DNA Melting: Model function and parameter estimation by nonlinear regression

From Course Wiki
Revision as of 17:52, 18 November 2012 by Steven Wasserman (Talk | contribs)

Jump to: navigation, search
20.309: Biological Instrumentation and Measurement

ImageBar 774.jpg


DNA melting curve raw data plotted versus temperature along with DNA apparatus model function evaluated with best-fit parameters from nonlinear regression.

Overview

In addition to the dsDNA concentration, $ \Delta H^{\circ} $, and $ \Delta S^{\circ} $, the measured fluorescence voltage, $ V_{f,measured} $, depends of several factors, including:

  • dynamics of the temperature cycling system
  • thermal quenching of the fluorophore
  • photobleaching
  • responsivity and offset of the instrument
  • binding kinetics of the dye

The goal is to write a model for $ V_f $ that takes these effects into account and use nonlinear regression to estimate the parameters.

Sample temperature

Lumped element model of thermal cycler.

The DNA melter does not directly measure sample temperature. Sample temperature significantly lags block temperature.

Assumptions

  • The RTD measures temperature of the metal heating block
  • The sample has a constant heat capacity with uniform temperature throughout.
  • The heating block has a much larger heat capacity than the sample, so the block can be modeled as a temperature source.
  • The cuvette is modeled as a thermal resistance; its heat capacity is neglected.
  • Heat escaping from the sample to the environment is modeled by a thermal resistance between the sample and room temperature

A circuit model for the heating system is shown in the figure. The circuit has a first-order, low-pass filter transfer function with two parameters:

$ \frac{\hat{\theta}_{sample}(\omega)} {\hat{\theta}_{block}(\omega)}=\frac{K_{thermal}}{j \omega \tau_{thermal} + 1} $
  • $ K_{thermal} $ is the low frequency gain of the system. In other words, if the heating block was set to 100° and the system was allowed to equilibrate for an infinite amount of time, the sample would reach a temperature of $ K_{thermal} \times 100^{\circ} $
  • In response to a step increase in block temperature, the sample temperature would rise exponentially with time constant $ \tau_{thermal} $
Block temperature, measured sample temperature, and model sample temperature. Sample temperature was measured by immersing an RTD in the DNA solution.
$ V_f $ versus model sample temperature, $ \theta_{sample} $.

Photobleaching

Excited dye molecules can react chemically with compounds in their environment and permanently lose their ability to fluoresce — a phenomenon called photobleaching. LED and ambient light illuminate the SYBR Green dye for 15 or more minutes over the course of a single melting and annealing cycle, which results in measurable photobleaching. Bleaching can be diminished by lowering the illumination; however, the signal is also proportionally reduced. Photobleaching can also be corrected by a mathematical model.

Assumptions

Photobleaching is a complicated phenomenon. The model proposed here adheres to Dr. George E. P. Box's excellent advice on modeling. Some of the assumptions are more suspect than others.

Which assumptions are probably significant sources of systematic error?

  • The dye is divided into two populations, bleached and unbleached, with concentrations S and S
  • The initial dye concentration is S + S
  • Only dye molecules bound to dsDNA may be excited.
  • Only excited fluorophore molecules bleach
  • An excited fluorophore will either bleach with probability $ p $ or return to the ground state with probability $ 1-p $.
    • The constant $ 1-p $ encompasses all of the mechanisms by which the fluorophore returns to the ground state unbleached, including fluorescence, phosphorescence, and non-radiative relaxation.
  • Bleaching is irreversible.
  • The number of molecules in the excited state is proportional to fluorescence.
  • Bleached and unbleached molecules bind to dsDNA with the same affinity.

Model

Under these assumptions, the bleaching rate is proportional to fluorescence.

$ \frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) $

$ K_{bleaching} $ encompasses several constants, including $ p $, illumination intensity, optical gain of the instrument, and lock-in amplifier gain.

Setting the initial dye concentration to 1, the fraction of unbleached SYBR Green can be calculated by integrating:

$ S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t $

Background fluorescence (optional)

SYBR Green has a finite contrast ratio. In reality, unbound dye fluoresces at a low level. Because the model includes an arbitrary constant to account for instrument offset, the background fluorescence is not measured. Adding a constant rate term to the bleaching equation can help describe this behavior. The revised model is:

$ \frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) + K_{background} $
$ S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t + K_{background} t $

Many melting curves are well described without the constant term. If this term is not needed in your model, leave it out. A model that explains the dataset well with fewer parameters is always better.

Thermal quenching

SYBR Green I fluoresces less efficiently at higher temperatures. The decrease in fluorescence is approximately linear. Defining the dye efficiency at the initial temperature to be 1, quenching can be modeled by:

$ Q(t)=1-K_{quench}[\theta_{sample}(t)-\theta(0)] $

Instrument gain and offset

There are several optical and electronic gain factors built into the DNA melter that determine the instrument responsively, $ \partial V_F / \partial [{dsDNA}] $. In addition, $ V_f $ may be offset from zero when the dsDNA concentration is zero.

  • $ K_{gain} $accounts for the arbitrary scale of$ V_f $.
  • $ K_{offset} $ is equal to the value of $ V_f $ when the dsDNA concentration is zero.

Expression for model $ V_f $

$ V_{f,model} = K_{gain} S(t) Q(t) \text{DnaFraction}(\theta_{sample}, \Delta H^{\circ}, \Delta S^{\circ}) - K_{offset} $

DnaFraction is the melting curve model from part 1 of the lab.

Finding double stranded DNA fraction from raw data

300 px Raw data transformed by applying the inverse of the model function to raw data

It is possible to use the inverse of the melting model function to transform raw experimental data into dsDNA fraction.

$ \text{DnaFraction}_{inverse-model}(t) = \frac{V_f(t) + K_{offset}} {K_{gain} S(t) Q(t)} $

An example plot of $ \text{DnaFraction}(t)_{inverse-model} $ versus $ \theta_{sample} $ ishown at right. $ \text{DnaFraction}_{model} $ versus $ \theta_{sample} $is plotted on the same set of axes to reveals discrepancies between the model and experimental data caused by noise and systematic error in the model. These discrepancies are better visualized on a residual plot, as described below.

Residual plot

300 px Residuals plotted versus time, temperature, fluorescence, and the cumulative sum of fluorescence

Observed values differ from predicted values because of noise and systematic errors in the model. Residuals are the difference between the experimental observations and the model predictions, $ V_{f,model}-V_{f,measured} $. Ideally, a plot of the residuals should demonstrate that the residuals are random and identically distributed — something like white noise.

The plots at right show $ V_{f,model}-V_{f,measured} $, plotted versus temperature, time, fluorescence, and the cumulative sum of fluorescence. The residuals are clearly not random and identically distributed. This suggests that the model does not perfectly explain the data. The scale of the plot is much smaller than the data plot — about one percent of the data scale.

The magnitude of the residuals is small, about one percent. A perfect model might require dozens of added parameters and additional physical measurements.

Plotting the residuals versus different variables can help suggest what factors are not modeled well.