DNA Melting: Model function and parameter estimation by nonlinear regression

Revision as of 17:16, 13 December 2013

20.309: Biological Instrumentation and Measurement


Measured photodiode voltage, Vf,measured, plotted versus block temperature, θblock, along with the model photodiode voltage, Vf,model.

Overview

In addition to the double stranded DNA concentration Cds(t), ΔH°, and ΔS°, the measured fluorescence voltage, Vf,measured(t), depends on several factors, including:

  • dynamics of the temperature cycling system
  • thermal quenching of the fluorophore
  • photobleaching
  • responsivity and offset of the instrument
  • binding kinetics of the dye

The goal is to write a model for Vf,measured that takes these effects into account and use nonlinear regression to estimate the parameters. The model proposed here adheres to Dr. George E. P. Box's excellent advice on modeling, in that it is both wrong and useful. Some of the assumptions are more dubious than others.

Sample temperature

Lumped element model of thermal cycler.

The DNA melter measures the temperature of the metal heating block instead of the sample. Because the glass cuvette has high thermal resistance, the sample temperature significantly lags the heating block temperature. A simple, lumped-element model can predict the sample temperature based on the block temperature history.

Direct measurements of the sample temperature, made by placing an RTD within the sample volume, validated the model's predictions to within 0.5 °C.

Assumptions

  • The RTD measures temperature of the metal heating block, which is uniform throughout.
  • The heating block has a much larger heat conductivity than the sample. (This allows the block to be modeled as a temperature source).
  • The thermal capacity of the glass cuvette is small relative to its thermal resistance. (The cuvette's heat capacity is neglected in the model.)
  • The sample has constant heat capacity and uniform temperature.
  • Heat escapes from the sample to the environment at a rate proportional to the difference between the sample and room temperatures.

Under these assumptions, the thermal system can be modeled by lumped elements. An equivalent circuit for the heating system is shown in the figure. The circuit is a first-order, low-pass filter. In this circuit, a step increase in block temperature causes the sample temperature to rise exponentially toward a final value of Kthermal times the step size, with time constant τthermal.

The transfer function of the low-pass filter has two parameters:

$ \frac{\hat{T}_{sample}(\omega)} {\hat{T}_{block}(\omega)}=\frac{K_{thermal}}{j \omega \tau_{thermal} + 1} $.
  • Kthermal is the low frequency gain of the system.
  • τthermal is the time constant.
Block temperature, measured sample temperature, and model sample temperature. Sample temperature was measured by immersing an RTD in the DNA solution.
$ V_{f,measured} $ versus model sample temperature, $ \theta_{sample} $.

Implementation

Matlab functions tf and lsim can be used to simulate this continuous-time system. Be certain to handle the initial conditions properly. (One way to avoid setting initial conditions is to subtract the initial value from the temperature signal and add it back in after evaluating lsim.)
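
The approach described above can be sketched as follows. This is a minimal illustration that assumes the Control System Toolbox is installed. The parameter values and the block temperature profile are made up for the example; they are not fitted values.

```matlab
% Illustrative values only (assumed, not fitted)
Kthermal = 1;          % low-frequency gain
tauThermal = 10;       % time constant, seconds
dt = 0.1;              % sample period, seconds

time = (0:dt:600)';                                % evenly spaced, starting at 0
blockTemperature = 25 + 70 * min(time / 300, 1);   % 25 -> 95 deg C ramp, then hold

% First-order low-pass filter: Kthermal / (tauThermal*s + 1)
thermalSystem = tf(Kthermal, [tauThermal 1]);

% lsim starts from zero initial conditions, so simulate the deviation
% from the starting temperature and add the starting temperature back
initialTemperature = blockTemperature(1);
sampleTemperature = lsim(thermalSystem, blockTemperature - initialTemperature, time) ...
    + initialTemperature;

plot(time, blockTemperature, time, sampleTemperature)
xlabel('Time, seconds')
ylabel('Temperature, deg C')
legend('Block', 'Model sample', 'Location', 'southeast')
```

With these made-up values, the model sample temperature trails the block by roughly the ramp rate times τthermal during the ramp and converges to the block temperature during the hold.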

Photobleaching

Excited dye molecules can react chemically with compounds in their environment and permanently lose their ability to fluoresce — a phenomenon called photobleaching. LED and ambient light illuminate the SYBR Green dye for 15 or more minutes over the course of a single melting and annealing cycle, which results in measurable photobleaching. The effect of photobleaching can be approximated by a mathematical model.

Assumptions

  • The dye is divided into two populations, bleached and unbleached, with concentrations $ \bar{S} $ and $ S $, respectively.
  • The initial dye concentration is $ S + \bar{S} $.
  • Only dye molecules bound to dsDNA may be excited.
  • Only excited fluorophore molecules bleach.
  • An excited fluorophore will either bleach with probability p or return to the ground state with probability 1-p.
    • The constant 1-p encompasses all of the mechanisms by which the fluorophore returns to the ground state unbleached, including fluorescence, phosphorescence, and non-radiative relaxation.
  • Bleaching is irreversible.
  • The number of molecules in the excited state is proportional to fluorescence.
  • Bleached and unbleached molecules bind to dsDNA with the same affinity.

Model

Under these assumptions, the bleaching rate is proportional to fluorescence.

$ \frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) $

Kbleaching encompasses several constants, including $ p $, illumination intensity, optical gain of the instrument, and lock-in amplifier gain.

Setting the initial dye concentration to 1, the fraction of unbleached SYBR Green can be calculated by integrating:

$ S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t $
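
With sampled data, the integral can be approximated numerically. The following is a minimal sketch using cumtrapz. The fluorescence signal and the value of Kbleaching are invented for illustration.

```matlab
% Made-up signal and constant, for illustration only
dt = 0.1;                                            % sample period, seconds
time = (0:dt:900)';
fluorescence = 1 ./ (1 + exp((time - 450) / 30));    % stand-in for f(t)
Kbleaching = 2e-4;                                   % assumed value

% Cumulative trapezoidal integral of f from 0 to each time point
bleachedFraction = Kbleaching * cumtrapz(time, fluorescence);

% Fraction of dye that is still unbleached, S(t)
unbleachedFraction = 1 - bleachedFraction;

plot(time, unbleachedFraction)
xlabel('Time, seconds')
ylabel('Unbleached fraction, S')
```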

Background fluorescence (optional)

SYBR Green has a finite contrast ratio. In reality, unbound dye fluoresces at a low level. Because the model includes an arbitrary constant to account for instrument offset, the background fluorescence is not measured. Adding a constant rate term to the bleaching equation can help describe this behavior. The revised model is:

$ \frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) + K_{background} $
$ S(t)=1-\bar{S}(t)=1 - K_{bleaching}\int_0^t f(t) \mathrm{d}t - K_{background} t $

Many melting curves are well described without the constant term. If this term is not needed in your model, leave it out. A model that explains the dataset well with fewer parameters is preferable.

Thermal quenching

SYBR Green I fluoresces less efficiently at higher temperatures. The decrease in fluorescence is approximately linear. Defining the dye efficiency at the initial temperature to be 1, quenching can be modeled by:

$ \left . Q(t)=1-K_{quench}[T_{sample}(t)-T_{sample}(0)]\right . $

Instrument gain and offset

There are several optical and electronic gain factors built into the DNA melter that determine the instrument responsivity, ∂Vf,measured / ∂Cds. In addition, Vf,measured may be offset from zero when Cds is zero.

  • Kgain is equal to the change in Vf,measured divided by the change in fraction of dsDNA. This is assumed to be constant.
  • Koffset is equal to the value of Vf,measured when the dsDNA concentration is zero.

Expression for model Vf,model

$ \left . V_{f,model}(t) = K_{gain} S(t) Q(t) C_{ds}(T_{sample}(t), \Delta H^\circ, \Delta S^\circ) + K_{offset}\right . $

Cds is the model melting curve produced by the DnaFraction function from part 1 of the lab. Differences between the model and the observations are most clearly visualized on a residual plot, described below.
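
One way the pieces might be assembled is sketched below. Several things here are assumptions, not part of the lab code: the sigmoid dnaFraction is a hypothetical stand-in for DnaFraction, the parameter values are invented, the sample temperature is taken as a simple ramp, and the fluorescence f(t) inside the bleaching integral is approximated by Cds multiplied by Q.

```matlab
% Hypothetical stand-in for DnaFraction: a sigmoid in temperature.
% Replace with the thermodynamic model from part 1 of the lab.
dnaFraction = @(T, Tm, width) 1 ./ (1 + exp((T - Tm) / width));

dt = 0.1;                                     % sample period, seconds
time = (0:dt:900)';
sampleTemperature = 25 + 70 * time / 900;     % assumed linear ramp, deg C

% Assumed parameter values, for illustration only
Kgain = 2;
Koffset = 0.1;
Kquench = 0.005;
Kbleaching = 1e-4;

% Fraction of double stranded DNA (stand-in model)
Cds = dnaFraction(sampleTemperature, 70, 2);

% Thermal quenching, Q(t)
Q = 1 - Kquench * (sampleTemperature - sampleTemperature(1));

% Photobleaching, S(t); f(t) is approximated by Cds .* Q (an assumption)
S = 1 - Kbleaching * cumtrapz(time, Cds .* Q);

% Model photodiode voltage
VfModel = Kgain * S .* Q .* Cds + Koffset;

plot(sampleTemperature, VfModel)
xlabel('Sample temperature, deg C')
ylabel('V_{f,model}')
```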

Finding double stranded DNA fraction from raw data

The expression for Vf,model can be inverted to solve for Cds in terms of $ V_{f,measured}(t) $. The result is a model that estimates the concentration of double stranded DNA based on observations and the models for bleaching and quenching. This estimated melting curve may be directly compared with simulations, measurements, or other predictions of the true melting curve.

$ C_{ds,inverse-model}(t) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)} $
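
A quick way to check an implementation of the inverse model is a round trip on synthetic data: build a voltage from a known melting curve, then invert it. In the sketch below, the melting curve, the bleaching curve, and all parameter values are invented for illustration.

```matlab
% Made-up data and parameters, for illustration only
time = (0:0.1:900)';
sampleTemperature = 25 + 70 * time / 900;                    % deg C
CdsTrue = 1 ./ (1 + exp((sampleTemperature - 70) / 2));      % stand-in melting curve
Kgain = 2;
Koffset = 0.1;
Kquench = 0.005;
Q = 1 - Kquench * (sampleTemperature - sampleTemperature(1));  % quenching
S = 1 - 1e-4 * time;                                           % stand-in bleaching curve

% Noise-free synthetic "measurement"
VfMeasured = Kgain * S .* Q .* CdsTrue + Koffset;

% Inverse model: recover the dsDNA fraction from the voltage
CdsInverse = (VfMeasured - Koffset) ./ (Kgain * S .* Q);

plot(sampleTemperature, CdsInverse)
xlabel('Sample temperature, deg C')
ylabel('C_{ds,inverse-model}')
```

Because the inverse divides by S(t)Q(t), noise in the measured voltage is amplified wherever that product is small.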

The plot at right shows an example of Cds,inverse-model(t) versus Tsample(t). Cds(ΔH°, ΔS°, T) versus T is plotted on the same set of axes. One can see discrepancies between the model and experimental data caused by random noise in Vf,measured(t) and systematic error in the model Vf,model.

A simulated melting curve from DinaMelt is plotted on the same axes. The estimated melting curve is shifted to the right compared to the inverse model, possibly due to systematic error in the sample temperature model. The estimated melting curve also serves as a comparison to the thermodynamic model developed in DNA Melting Thermodynamics or any independent measurement of the thermodynamic melting curve.

Residual plot

Residuals plotted versus time, temperature, fluorescence, and the cumulative sum of fluorescence

Observed values differ from predicted values because of noise and systematic errors in the model. Residuals are the difference between model predictions and experimental observations, Vf,model − Vf,measured. Ideally, the residuals should be random and identically distributed.

The plots at right show Vf,model − Vf,measured versus temperature, time, fluorescence, and the cumulative sum of fluorescence. The residuals are clearly not random and identically distributed. This suggests that the model does not perfectly explain the observations. The scale of the residual plot is much smaller than that of the data plot: about one percent of the data scale.

A perfect model might require dozens of added parameters and additional physical measurements.

Plotting the residuals versus different variables can help suggest what factors are not modeled well.
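
A set of residual plots like this can be produced along the following lines. The data here are synthetic stand-ins with an invented systematic error, and the residual is taken as model minus measured.

```matlab
% Synthetic stand-ins for the measured data and the fitted model
time = (0:0.1:900)';
temperature = 25 + 70 * time / 900;
VfModel = 1 ./ (1 + exp((temperature - 70) / 2));
VfMeasured = VfModel + 0.01 * sin(2 * pi * time / 300);   % made-up systematic error

residual = VfModel - VfMeasured;
cumulativeFluorescence = cumsum(VfMeasured);

% Plot the same residuals against four candidate explanatory variables
xData = {time, temperature, VfMeasured, cumulativeFluorescence};
xNames = {'Time, seconds', 'Temperature, deg C', ...
    'Fluorescence, V', 'Cumulative fluorescence'};
for k = 1:4
    subplot(2, 2, k)
    plot(xData{k}, residual, '.')
    xlabel(xNames{k})
    ylabel('Residual, V')
end
```

Structure that shows up clearly against one variable but not the others hints at which part of the model needs attention.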

Testing the fitting algorithm

The inputs to the model function, Vf,model, are the block temperature (from which sample temperature is derived) and the model parameters (to be determined). The model function is suitable for use with the Matlab nlinfit function (or other fitting routines). The inputs to nlinfit are then the block temperature, the fluorescence signal, the model function, and initial values of the model parameters. The outputs of nlinfit include the fitted parameter values.

In Part II, the block temperature will be controlled to produce a ramp up from room temperature to 95 °C, then back down to room temperature.

Model block temperature.
% generate temperature ramp
Tmin = 25;
Tmax = 95;
Temp = [ (0:0.1:900)*((Tmax-Tmin)/900) + Tmin, Tmax*ones(1,1200), ...
    (0.1:0.1:900)*((Tmin-Tmax)/900) + Tmax, Tmin*ones(1,1200) ];
time = (1:length(Temp))*0.1;
plot(time,Temp)
xlabel('Time, seconds')
ylabel('Temperature, deg C')
title('Model block temperature')

For testing the fitting algorithm, create a model data set using the model function with block temperature profile given above, and use that data set in the fitting routine. Of course, using model function data in the fit of that same model function should result in the exact parameters used to generate the data. Adding some noise to the simulated fluorescence signal will give a little bit of a challenge to the fitting algorithm. (Note: in the code below, simDNA is a stand-in for the model function with default parameters. Substitute your own model function with appropriate parameters.)

Noisy simulated DNA melting curve.
noise = 0.05;  % Roughly 5% noise
F = simDNA(Temp, struct);  % Fill in with default parameters
F = F + noise*randn(size(F));
plot(time,F)
xlabel('Time, seconds');
ylabel('Fluorescence Signal');
title('Simulated DNA Melting Curve');

Even at a relatively large amplitude, random noise by itself should not appreciably affect the fitted parameter values; however, noise will generally contribute to uncertainty in the fit values. This uncertainty is seen in the parameter confidence intervals. Use the Matlab function nlparci to obtain the 95% confidence intervals from the results returned by nlinfit, e.g.,

    [fitValues, residual, ~, COVB, ~] = nlinfit(Temp, F, fitFunc, beta0, fitOptions);
    CI = nlparci(fitValues, residual, 'covar',COVB);

where CI will be an N×2 array of values, where N is the length of fitValues. It may be informative to compute the normalized confidence intervals as a fraction of each fit value, e.g.,

    NormalizedCI = (CI(:,2) - CI(:,1))' ./ fitValues;
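
To show how these calls fit together, here is a self-contained sketch with a toy two-parameter sigmoid standing in for the full model function. The toy model, the parameter values, and the noise level are invented for illustration. nlinfit and nlparci require the Statistics and Machine Learning Toolbox.

```matlab
% Toy two-parameter model, standing in for the full model function.
% nlinfit calls the model function as fitFunc(parameters, x).
fitFunc = @(beta, T) 1 ./ (1 + exp((T - beta(1)) / beta(2)));

Temp = linspace(25, 95, 500);
trueValues = [70 2];                       % made-up midpoint and width
F = fitFunc(trueValues, Temp) + 0.02 * randn(size(Temp));

beta0 = [60 5];                            % initial guess (row vector)
fitOptions = statset('nlinfit');           % default fitting options

[fitValues, residual, ~, COVB, ~] = nlinfit(Temp, F, fitFunc, beta0, fitOptions);
CI = nlparci(fitValues, residual, 'covar', COVB);

% Width of each confidence interval as a fraction of the fitted value
NormalizedCI = (CI(:,2) - CI(:,1))' ./ fitValues;
disp(NormalizedCI)
```

Because beta0 is a row vector, fitValues is also a row vector, which is why the interval widths are transposed before the element-wise division.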


Simulate a DNA melting curve and vary the noise magnitude from 0 to 0.05. At each noise level, fit the curve, and compute the confidence intervals. Finally, plot the normalized confidence intervals as a function of the noise magnitude.


∂Vf / ∂θ versus θ plot

∂Vf / ∂θ versus θ

DNA melting curves are frequently used in high-throughput screens to rapidly and inexpensively search for SNPs and other variations. In these studies, the temperature at which the maximum value of the derivative ∂Vf / ∂θ occurs is frequently used as an estimate of melting temperature. This provides a computationally inexpensive way to compare samples without finding ΔH° and ΔS°. Large datasets are also frequently visualized using differential plots, with one curve subtracted from another.

Computing derivatives can be problematic. In frequency space, differentiation is equivalent to multiplying the signal's transform by j2πf. Differentiation amplifies the high-frequency content of the signal, which has the effect of emphasizing noise. Smoothing the data, an operation that reduces high frequencies, can help.

Implementation

Assuming the variables temperature and fluorescence are defined, the following Matlab code fragment computes ΔF/Δθ for the heating portion of the curve:

dTemperature = diff(smooth(temperature, 30));
dFluorescence = diff(smooth(fluorescence, 30));
temperatureAxis = (temperature(1:(end-1)) + temperature(2:end)) ./ 2;

goodOnes = abs(dTemperature) > 0.005;
dFdT = dFluorescence(goodOnes) ./ dTemperature(goodOnes);

plot( temperatureAxis(goodOnes), dFdT);

The Matlab smooth command implements a boxcar filter for data smoothing with selectable length.
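
If smooth is not available, a boxcar filter can be sketched with conv. This is a stand-in, not a drop-in replacement: conv pads the ends of the signal with zeros, so samples within half a window of either end should be discarded. The test signal is invented for illustration.

```matlab
windowLength = 31;                                % odd length keeps the filter centered
boxcar = ones(windowLength, 1) / windowLength;    % equal weights that sum to 1

x = (1:200)';
noisySignal = 0.05 * x + 0.5 * sin(x);            % made-up signal: ramp plus ripple

% 'same' returns the central part of the convolution, the same size as the input
smoothed = conv(noisySignal, boxcar, 'same');

% Keep only samples where the whole window lies inside the data
halfWindow = (windowLength - 1) / 2;
valid = (halfWindow + 1):(numel(x) - halfWindow);

plot(x, noisySignal, x(valid), smoothed(valid))
legend('Original', 'Boxcar smoothed', 'Location', 'northwest')
```

A longer window suppresses more high-frequency content but also blurs the melting transition, which lowers and broadens the peak in the derivative.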

Adjust the threshold value and the smoothing span to get a good curve. Discontinuities in fluorescence or temperature signals will produce large spikes in the derivative. Eliminate these before differentiating.

This method uses the block temperature, θblock, which differs significantly from the sample temperature, θsample. In addition, this method may fail if melting occurred during the constant-temperature portion of the thermal cycle. (In this case, the cooling curve may give better results.) It is possible to rectify the temperature discrepancy by using θsample instead of θblock; however, the thermal transfer function parameters must then be found by another means.

Comparing the known and unknown samples

One possible way to compare the unknown sample to the three knowns is to use Matlab's anova1 and multcompare functions. Analysis of variance (ANOVA) avoids the difficulties that arise when comparing multiple sample means using repeated Student's t-tests, chiefly the inflated false-positive rate.

The following code creates a simulated set of melting temperatures for three known samples and one unknown. In the simulation, each sample was run three times. The samples have melting temperatures of 270, 272, and 274 degrees. The unknown sample has a melting temperature of 272 degrees. Random noise is added to the true mean values to generate simulated results. Try running the code with different values of noiseStandardDeviation.

% create simulated dataset
noiseStandardDeviation = 0.5;
meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272] + noiseStandardDeviation * randn(1,12);
sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' '40bp' '40bp' '40bp' 'unknown' 'unknown' 'unknown'};

% compute anova statistics (anovaTable avoids shadowing Matlab's table function)
[p, anovaTable, anovaStatistics] = anova1(meltingTemperature, sampleName);

% do the multiple comparison
[comparison, means, plotHandle, groupNames] = multcompare(anovaStatistics);
Output of multcompare command.

The multcompare function generates a table of confidence intervals for each possible pair-wise comparison. You can use the table to determine whether the means of two samples are significantly different. Output of multcompare is shown at right. If your data has a lot of variation, you might have to use the options to reduce the confidence level. (Or there might not be a significant difference at all.)
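
The table can also be read programmatically. In the sketch below, a pair of samples is flagged as significantly different when the confidence interval for the difference in means excludes zero. It assumes the multcompare output layout in which columns 1 and 2 hold the group indices and columns 3 and 5 hold the lower and upper interval bounds. The simulated data and noise level are invented for illustration.

```matlab
% Simulated dataset with a small, made-up noise level
noiseStandardDeviation = 0.1;
meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272] ...
    + noiseStandardDeviation * randn(1, 12);
sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' ...
    '40bp' '40bp' '40bp' 'unknown' 'unknown' 'unknown'};

% Suppress the figures so that only the numerical results are produced
[~, ~, anovaStatistics] = anova1(meltingTemperature, sampleName, 'off');
[comparison, ~, ~, groupNames] = multcompare(anovaStatistics, 'Display', 'off');

% An interval that excludes zero indicates a significant difference
isDifferent = comparison(:, 3) > 0 | comparison(:, 5) < 0;

labels = {'not significantly different', 'significantly different'};
for k = 1:size(comparison, 1)
    fprintf('%s vs %s: %s\n', groupNames{comparison(k, 1)}, ...
        groupNames{comparison(k, 2)}, labels{isDifferent(k) + 1});
end
```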

Consider devising a more sophisticated method that uses both the ΔH° and ΔS° values, instead.








</div>