Difference between revisions of "DNA Melting: Model function and parameter estimation by nonlinear regression"

From Course Wiki
Jump to: navigation, search
(∂Vf / ∂θ versus θ plot)
(Comparing the known and unknown samples)
 
(22 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Category:20.309]]
 
[[Category:20.309]]
 
[[Category:DNA Melting Lab]]
 
[[Category:DNA Melting Lab]]
 +
[[Category:Matlab]]
 
{{Template:20.309}}
 
{{Template:20.309}}
  
Line 46: Line 47:
  
 
===Implementation===
 
===Implementation===
Matlab functions [http://www.mathworks.com/help/control/ref/tf.html tf] and [http://www.mathworks.com/help/control/ref/lsim.html lsim] can be used to simulate this continuous-time system. Be certain to handle the initial conditions properly. (One way to avoid setting initial conditions is to subtract the initial value from the temperature signal and add it back in after evaluating <code>lsim</code>.)
+
MATLAB includes several functions for simulating continuous-time, linear, shift-invariant (CTLSI) systems. You can write the transfer function of a CTLSI system as the ratio of two polynomials in <math>s</math>, where <math>s=j\omega</math>. For example, the transfer function of a low-pass filter can be expressed in the form <math>H(\omega)=\frac{1}{\tau s+1}=\frac{1}{\tau j \omega+1}</math>. The MATLAB function [http://www.mathworks.com/help/control/ref/tf.html tf] constructs a software object that represents a CTLSI transfer function. <tt>tf</tt> takes two arguments that specify the coefficients of the numerator and denominator polynomials. The arguments must be formatted as vectors containing the coefficients of <math>s</math> in order of decreasing power. For example, the transfer function <math>\frac{1}{60 s + 1}</math> can be created by:
 +
 
 +
<pre>
 +
>> exampleTransferFunction = tf( 1, [ 60 1 ] )
 +
exampleTransferFunction =
 +
 +
    1
 +
  --------
 +
  60 s + 1
 +
 +
Continuous-time transfer function.
 +
</pre>
 +
 
 +
Several functions operate on continuous-time transfer function objects. For example <tt>bode</tt> and <tt>impulse</tt> generate bode and impulse response plots as expected. (Perhaps the Vice President of Inconsistent Naming at MathWorks was out sick the day those functions were named.) There is also a CTLSI simulator called [http://www.mathworks.com/help/control/ref/lsim.html lsim] that will be useful for deriving the sample temperature from the block temperature using a lumped-element model of the thermal low-pass system realized by the heating block, glass vial, and sample. (Seems like the Director of Inconsistencies was back in the office when <tt>lsim</tt> got named.)
 +
 
 +
The SimulateLowPass MATLAB function below uses <tt>lsim</tt> to produce the simulated output of a low-pass filter given its time constant, a vector of input values, and an optional vector of time values. If supplied, the time vector must be the same length as the input vector (''i.e.'' each time value corresponds to one input value). If no time vector is passed, the function generates a time vector assuming that the input values are sampled uniformly at ten times per second, starting at <math>t=0</math>. The DNA melter software produces ten samples per second.
 +
 
 +
<pre>
 +
function SimulatedOutput = SimulateLowPass( TimeConstant, InputData, TimeVector )
 +
    if( nargin < 3 )
 +
        TimeVector = (0:(length(InputData)-1))/10;
 +
    end
 +
 
 +
    transferFunction = tf( 1, [TimeConstant, 1] );
 +
 
 +
    initalTemperature = InputData(1);
 +
    InputData = InputData - initalTemperature;
 +
 
 +
    SimulatedOutput = lsim( transferFunction, InputData, TimeVector' ) + initalTemperature;
 +
end
 +
</pre>
 +
 
 +
The function subtracts off the initial value before running the simulation and than adds the initial value back in after the simulation. This is a cheesy way to handle initial conditions &mdash; setting the temperature zero point to be equal to the initial temperature. This is equivalent to assuming that the block has been at its initial temperature during the time interval <math>t = [ -\infty, 0 )</math>.
  
 
==Photobleaching==
 
==Photobleaching==
Excited dye molecules can react chemically with compounds in their environment and permanently lose their ability to fluoresce &mdash; a phenomenon called photobleaching. LED and ambient light illuminate the SYBR Green dye for 15 or more minutes over the course of a single melting and annealing cycle, which results in measurable photobleaching. The effect of photobleaching can be approximated by a mathematical model.
+
Excited dye molecules can react chemically with compounds in their environment and permanently lose their ability to fluoresce &mdash; a phenomenon called photobleaching. LED and ambient light illuminate the LC Green dye for 15 or more minutes over the course of a single melting and annealing cycle, which results in measurable photobleaching. The effect of photobleaching can be approximated by a mathematical model.
  
 
===Assumptions===
 
===Assumptions===
* The dye is divided into two populations, bleached and unbleached, with concentrations S and <span style="text-decoration: overline">S</span>
+
* The dye is divided into two populations, bleached and unbleached, with concentrations <span style="text-decoration: overline">S</span> and S
 
* The initial dye concentration is S + <span style="text-decoration: overline">S</span>.
 
* The initial dye concentration is S + <span style="text-decoration: overline">S</span>.
 
* Only dye molecules bound to dsDNA may be excited.
 
* Only dye molecules bound to dsDNA may be excited.
Line 69: Line 102:
 
''K<sub>bleaching</sub>'' encompasses several constants, including <math>p</math>, illumination intensity, optical gain of the instrument, and lock-in amplifier gain.
 
''K<sub>bleaching</sub>'' encompasses several constants, including <math>p</math>, illumination intensity, optical gain of the instrument, and lock-in amplifier gain.
  
Setting the initial dye concentration to 1, the fraction of unbleached SYBR Green can be calculated by integrating:
+
Setting the initial dye concentration to 1, the fraction of unbleached LC Green can be calculated by integrating:
  
 
::<math>S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t</math>
 
::<math>S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t</math>
  
===Background fluorescence (optional)===
+
===Implementation===
SYBR Green has a finite contrast ratio. In reality, unbound dye fluoresces at a low level. Because the model includes an arbitrary constant to account for instrument offset, the background fluorescence is not measured. Adding a constant rate term to the bleaching equation can help describe this behavior. The revised model is:
+
A simple way to approximate an integral in discrete time is a cumulative sum: <math>\int_0^t f(t) \approx \sum_{n=1}^{f F_s}f(n T_s)</math>, where <math>F_s</math> is the sample rate. This works particularly well when the signal changes slowly relative to the sample period. It's a good bet that the bleaching time constant is much longer than the 0.1 s sample period. MATLAB has a built-in function for computing cumulative sums called <tt>cumsum</tt>. Here is an example:
  
::<math>\frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) + K_{background}</math>
+
<pre>
 
+
>> cumsum( 1:10 )
::<math>S(t)=1-\bar{S}(t)=1 - K_{bleaching}\int_0^t f(t) \mathrm{d}t - K_{background} t</math>
+
ans =
 +
    1     3    6    10    15    21    28    36    45    55
 +
</pre>
  
Many melting curves are well described without the constant term. If this term is not needed in your model, leave it out. A model that explains the dataset well with fewer parameters is preferable.
+
<tt>cumsum</tt> will generate a very good numerical approximation to <math>\int_0^t f(t) \mathrm{d}t</math>.
  
 
==Thermal quenching==
 
==Thermal quenching==
SYBR Green I fluoresces less efficiently at higher temperatures. The decrease in fluorescence is approximately linear. Defining the dye efficiency at the initial temperature to be 1, quenching can be modeled by:
+
LC Green I fluoresces less efficiently at higher temperatures. The decrease in fluorescence is approximately linear. Defining the dye efficiency at the initial temperature to be 1, quenching can be modeled by:
  
 
::<math>\left . Q(t)=1-K_{quench}[T_{sample}(t)-T_{sample}(0)]\right .</math>
 
::<math>\left . Q(t)=1-K_{quench}[T_{sample}(t)-T_{sample}(0)]\right .</math>
Line 102: Line 137:
 
[[Image:Corrected DNA data.png|thumb|right]]
 
[[Image:Corrected DNA data.png|thumb|right]]
  
The inverse function of the melting model with respect to ''V<sub>f,measured</sub>''(''t'') is helpful to visulaize discrepancies between the model and experimental data caused by random noise in ''V<sub>f,measured</sub>'' and systematic error in the model ''V<sub>f,model''.  The function,  
+
The inverse function of the melting model with respect to ''V<sub>f,measured</sub>''(''t'') is helpful to visualize discrepancies between the model and experimental data caused by random noise in ''V<sub>f,measured</sub>'' and systematic error in the model ''V<sub>f,model''.  The function,  
  
 
::<math>C_{ds,inverse-model}(V_{f,measured}(t)) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)}</math>,
 
::<math>C_{ds,inverse-model}(V_{f,measured}(t)) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)}</math>,
Line 139: Line 174:
 
</pre>
 
</pre>
  
For testing the fitting algorithm, create a model data set using the model function with block temperature profile given above, and use that data set in the fitting routine. Of course, using model function data in the fit of that same model function should result in the exact parameters used to generate the data.
+
To test your curve fitting code, create a synthetic data set with some added random noise:
Adding some noise to the simulated fluorescence signal will give a little bit of a challenge to the fitting algorithm. (Note: in the code below, simDNA is a stand-in for the model function with default parameters. Substitute your own model function with appropriate parameters.)
+
  
 
[[Image:Noisy Simulated DNA Melting Curve.png|thumb|right| Noisy simulated DNA melting curve.]]
 
[[Image:Noisy Simulated DNA Melting Curve.png|thumb|right| Noisy simulated DNA melting curve.]]
 
<pre>
 
<pre>
 +
dnaConcentration = 30E-6;
 +
deltaS = -500;
 +
deltaH = -180E3;
 
noise = 0.05;  % Roughly 5% noise
 
noise = 0.05;  % Roughly 5% noise
F = simDNA(Temp, struct); % Fill in with default parameters
+
F = DnaFraction( dnaConcentration, Temp + 273, deltaS, deltaH );
 
F = F + noise*randn(size(F));
 
F = F + noise*randn(size(F));
 
plot(time,F)
 
plot(time,F)
Line 158: Line 195:
 
     CI = nlparci(fitValues, residual, 'covar',COVB);
 
     CI = nlparci(fitValues, residual, 'covar',COVB);
 
</pre>
 
</pre>
where CI will be an N&times;2 array of values where N is the length of fitValues. It may informative to compute the normalized confidence intervals as a fraction of each fit value, e.g.,
+
where CI will be an N&times;2 array of values where N is the length of fitValues.
<pre>
+
    NormalizedCI = (CI(:,2) - CI(:,1))' ./ fitValues;
+
</pre>
+
  
 
{{Template:Assignment|message=Simulate a DNA melting curve and vary the noise magnitude from 0 to 0.05. At each noise level, fit the curve, and compute the confidence intervals.  Finally, plot the normalized confidence intervals as a function of the noise magnitude.}}
 
{{Template:Assignment|message=Simulate a DNA melting curve and vary the noise magnitude from 0 to 0.05. At each noise level, fit the curve, and compute the confidence intervals.  Finally, plot the normalized confidence intervals as a function of the noise magnitude.}}
 
==&part;''V<sub>f</sub>'' / &part;''T'' versus ''T'' plot==
 
[[Image:Delta Vf Delta theta versus theta.png|thumb|right| &part;''V<sub>f</sub>'' / &part; ''T'' versus ''T'']]
 
DNA melting curves are frequently used in high-throughput screens to rapidly and inexpensively search for SNPs and other variations. In these studies, the temperature at which the maximum value of the derivative &part;''V<sub>f</sub>'' / &part; ''T'' occurs is frequently used as an estimate of melting temperature. This provides a computationally inexpensive way to compare samples without finding &Delta;H&deg; and &Delta;S&deg;. Large datasets are also frequently visualized using differential plots &mdash; with one curve subtracted from another.
 
 
Computing derivatives can be problematic. In frequency space, differentiation is equivalent to multiplying the signal's transform by j2&pi;''f''. Differentiation amplifies the high-frequency content of the signal, which has the effect of emphasizing noise. Smoothing the data, an operation that reduces high frequencies, can help.
 
 
===Implementation===
 
Assuming the variables <code>temperature</code> and <code>fluorescence</code> are defined, the following Matlab code fragment computes &Delta;F/&Delta;T for the heating portion of the curve:
 
 
<pre>
 
dTemperature = diff(smooth(temperature, 30));
 
dFluorescence = diff(smooth(fluorescence, 30));
 
temperatureAxis = (temperature(1:(end-1)) + temperature(2:end)) ./ 2;
 
 
goodOnes = abs(dTemperature) > 0.005;
 
dFdT = dFluorescence(goodOnes) ./ dTemperature(goodOnes);
 
 
plot( temperatureAxis(goodOnes), dFdT);
 
</pre>
 
 
The Matlab [http://www.mathworks.com/help/curvefit/smooth.html smooth] command implements a boxcar filter for data smoothing with selectable length.
 
 
Adjust the threshold value and  to get a good curve. Discontinuities in fluorescence or temperature signals will produce large spikes in the derivative. Eliminate these before differentiating.
 
 
This method uses the block temperature, <math>T_{block}</math>, which is significantly different than the sample temperature, ''T<sub>sample</sub>''. In addition, this method may fail if melting occurred during the constant-temperature portion of the thermal cycle. (In this case, the cooling curve may give better results.) It is possible to rectify the temperature discrepancy by using ''T<sub>sample</sub>'' instead of ''T<sub>block</sub>''; however, the thermal transfer function parameters must be found by another means.
 
  
 
==Comparing the known and unknown samples==
 
==Comparing the known and unknown samples==
Line 199: Line 207:
 
% create simulated dataset
 
% create simulated dataset
 
noiseStandardDeviation = 0.5;
 
noiseStandardDeviation = 0.5;
meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272] + noiseStandardDeviation * randn(1,12);
+
meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272]
sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' '40bp' '40bp' '40bp' 'unknown' 'unknown' 'unknown'};
+
      + noiseStandardDeviation * randn(1,12);
 +
sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' '40bp' '40bp' '40bp'
 +
      'unknown' 'unknown' 'unknown'};
  
 
% compute anova statistics
 
% compute anova statistics
Line 210: Line 220:
  
 
[[Image:Multcompare output.png|right|thumb|Output of <code>multcompare</code> command.]]
 
[[Image:Multcompare output.png|right|thumb|Output of <code>multcompare</code> command.]]
The <code>multcompare</code> function generates a table of confidence intervals for each possible pair-wise comparison. You can use the table to determine whether the means of two samples are significantly different. Output of <code>multcompare</code> is shown at right. If your data has a lot of variation, you might have to use the options to reduce the confidence level. (Or there might not be a significant difference at all.)
+
The <code>multcompare</code> function generates a table of confidence intervals for each possible pair-wise comparison. You can use the table to determine whether the means of two samples are significantly different. Output of <code>multcompare</code> is shown at right. If your data has a lot of variation, you might have to use the options to reduce the confidence level. (Or there might not be a significant difference at all.) Note that <code>multcompare</code> has a default confidence level of 95% (alpha = 0.05). One way to assess how confident you are in your sample identification is by finding the lowest "alpha" value needed to identify your sample.
  
 
Consider devising a more sophisticated method that uses both the &Delta;H&deg; and &Delta;S&deg; values, instead.
 
Consider devising a more sophisticated method that uses both the &Delta;H&deg; and &Delta;S&deg; values, instead.

Latest revision as of 21:55, 17 August 2017

20.309: Biological Instrumentation and Measurement

ImageBar 774.jpg


Measured photodiode voltage, Vf,measured plotted versus block temperature, θblock along with model photodiode voltage, Vf,model.

Overview

In addition to the double stranded DNA concentration Cds(t), ΔH°, and ΔS°, the measured fluorescence voltage, Vf,measured(t), depends of several factors, including:

  • dynamics of the temperature cycling system
  • thermal quenching of the fluorophore
  • photobleaching
  • responsivity and offset of the instrument
  • binding kinetics of the dye

The goal is to write a model for Vf,measured that takes these effects into account and use nonlinear regression to estimate the parameters. The model proposed here adheres to Dr. George E. P. Box's excellent advice on modeling, in that it is both wrong and useful. Some of the assumptions are more dubious than others.

Sample temperature

The DNA melter measures the temperature of the metal heating block, not of the sample. Because the glass vial has high thermal resistance, the sample temperature significantly lags the heating block temperature. A simple, lumped-element circuit model can predict the sample temperature to within about 0.5°C based on the block temperature history.

The state variables in a thermal circuit model are temperature T and heat flow q. A fundamental assumption of lumped element models is that each element has a single value of both state variables. Because of this assumption, lumped element models only work well if the bodies you are modeling have a nearly uniform temperature. Components that are small in size, have high thermal conductivity, and have low heat transfer coefficients are most suitable for lumped element modeling. These attributes are summarized by the Biot number, Bi=hl/k, where h is the heat transfer coefficient, l is the characteristic length of the body (volume divided by surface area), and k is the thermal conductivity of the material. Bodies with Biot numbers less than 1 are good candidates for lumped element models. The Biot number for the heating block is on the order of 6x10-3, assuming h=50 W/m2 K, l=3 cm, and k=235 W/m K. This is much less than one, so the heating block is a good candidate for a lumped element model. The Biot number for the glass vial is about 0.03 (h=10.5 W/m2 K, l=3 mm, and k=1.05 W/m K).

The DNA melter measures the heating block's temperature directly. The heating block has a much higher thermal capacity than the sample, so a temperature source is a good model for the block.

The model neglects the heat capacity of the vial.

Lumped element model of thermal cycler.

The thermal circuit model for the heating system is shown in the figure on the right. The circuit is a first-order, low-pass filter. In this circuit, a step increase in block temperature causes the sample temperature to rise exponentially to a peak value of Kthermal with time constant τthermal.

The transfer function of the low-pass filter has two parameters:

$ \frac{\hat{T}_{sample}(\omega)} {\hat{T}_{block}(\omega)}=\frac{K_{thermal}}{j \omega \tau_{thermal} + 1} $.
  • Kthermal is the low frequency gain of the system.
  • τthermal is the time constant.
Block temperature, measured sample temperature, and model sample temperature. Sample temperature was measured by immersing an RTD in the DNA solution.
$ V_{f,measured} $ versus model sample temperature, $ \theta_{sample} $.

Implementation

MATLAB includes several functions for simulating continuous-time, linear, shift-invariant (CTLSI) systems. You can write the transfer function of a CTLSI system as the ratio of two polynomials in $ s $, where $ s=j\omega $. For example, the transfer function of a low-pass filter can be expressed in the form $ H(\omega)=\frac{1}{\tau s+1}=\frac{1}{\tau j \omega+1} $. The MATLAB function tf constructs a software object that represents a CTLSI transfer function. tf takes two arguments that specify the coefficients of the numerator and denominator polynomials. The arguments must be formatted as vectors containing the coefficients of $ s $ in order of decreasing power. For example, the transfer function $ \frac{1}{60 s + 1} $ can be created by:

>> exampleTransferFunction = tf( 1, [ 60 1 ] )
exampleTransferFunction =
 
     1
  --------
  60 s + 1
 
Continuous-time transfer function.

Several functions operate on continuous-time transfer function objects. For example bode and impulse generate bode and impulse response plots as expected. (Perhaps the Vice President of Inconsistent Naming at MathWorks was out sick the day those functions were named.) There is also a CTLSI simulator called lsim that will be useful for deriving the sample temperature from the block temperature using a lumped-element model of the thermal low-pass system realized by the heating block, glass vial, and sample. (Seems like the Director of Inconsistencies was back in the office when lsim got named.)

The SimulateLowPass MATLAB function below uses lsim to produce the simulated output of a low-pass filter given its time constant, a vector of input values, and an optional vector of time values. If supplied, the time vector must be the same length as the input vector (i.e. each time value corresponds to one input value). If no time vector is passed, the function generates a time vector assuming that the input values are sampled uniformly at ten times per second, starting at $ t=0 $. The DNA melter software produces ten samples per second.

function SimulatedOutput = SimulateLowPass( TimeConstant, InputData, TimeVector )
    if( nargin < 3 )
        TimeVector = (0:(length(InputData)-1))/10;
    end

    transferFunction = tf( 1, [TimeConstant, 1] );

    initalTemperature = InputData(1);
    InputData = InputData - initalTemperature;

    SimulatedOutput = lsim( transferFunction, InputData, TimeVector' ) + initalTemperature;
end

The function subtracts off the initial value before running the simulation and than adds the initial value back in after the simulation. This is a cheesy way to handle initial conditions — setting the temperature zero point to be equal to the initial temperature. This is equivalent to assuming that the block has been at its initial temperature during the time interval $ t = [ -\infty, 0 ) $.

Photobleaching

Excited dye molecules can react chemically with compounds in their environment and permanently lose their ability to fluoresce — a phenomenon called photobleaching. LED and ambient light illuminate the LC Green dye for 15 or more minutes over the course of a single melting and annealing cycle, which results in measurable photobleaching. The effect of photobleaching can be approximated by a mathematical model.

Assumptions

  • The dye is divided into two populations, bleached and unbleached, with concentrations S and S
  • The initial dye concentration is S + S.
  • Only dye molecules bound to dsDNA may be excited.
  • Only excited fluorophore molecules bleach.
  • An excited fluorophore will either bleach with probability p or return to the ground state with probability 1-p.
    • The constant 1-p encompasses all of the mechanisms by which the fluorophore returns to the ground state unbleached, including fluorescence, phosphorescence, and non-radiative relaxation.
  • Bleaching is irreversible.
  • The number of molecules in the excited state is proportional to fluorescence.
  • Bleached and unbleached molecules bind to dsDNA with the same affinity.

Model

Under these assumptions, the bleaching rate is proportional to fluorescence.

$ \frac{\partial \bar{S}}{\partial t} = K_{bleaching} f(t) $

Kbleaching encompasses several constants, including $ p $, illumination intensity, optical gain of the instrument, and lock-in amplifier gain.

Setting the initial dye concentration to 1, the fraction of unbleached LC Green can be calculated by integrating:

$ S(t)=1-\bar{S}(t)=1-K_{bleaching}\int_0^t f(t) \mathrm{d}t $

Implementation

A simple way to approximate an integral in discrete time is a cumulative sum: $ \int_0^t f(t) \approx \sum_{n=1}^{f F_s}f(n T_s) $, where $ F_s $ is the sample rate. This works particularly well when the signal changes slowly relative to the sample period. It's a good bet that the bleaching time constant is much longer than the 0.1 s sample period. MATLAB has a built-in function for computing cumulative sums called cumsum. Here is an example:

>> cumsum( 1:10 )
ans =
     1     3     6    10    15    21    28    36    45    55

cumsum will generate a very good numerical approximation to $ \int_0^t f(t) \mathrm{d}t $.

Thermal quenching

LC Green I fluoresces less efficiently at higher temperatures. The decrease in fluorescence is approximately linear. Defining the dye efficiency at the initial temperature to be 1, quenching can be modeled by:

$ \left . Q(t)=1-K_{quench}[T_{sample}(t)-T_{sample}(0)]\right . $

Instrument gain and offset

There are several optical and electronic gain factors built into the DNA melter that determine the instrument responsively, ∂Vf,measured / ∂ Cds. In addition, Vf,measured may be offset from zero when Cds concentration is zero.

  • Kgain is equal to the change in Vf,measured divided by the change in fraction of dsDNA. This is assumed to be constant.
  • Koffset is equal to the value of Vf,measured when the dsDNA concentration is zero.

Expression for model Vf,model

$ \left . V_{f,model}(t) = K_{gain} S(t) Q(t) C_{ds}(T_{sample}(t), \Delta H^\circ, \Delta S^\circ) + K_{offset}\right . $

Cds is the model melting curve produced by the DnaFraction function from part 1 of the lab. Differences between the model and the observations are precisely visualized on a residual plot, described below.

Finding double stranded DNA fraction from raw data

Corrected DNA data.png

The inverse function of the melting model with respect to Vf,measured(t) is helpful to visualize discrepancies between the model and experimental data caused by random noise in Vf,measured and systematic error in the model Vf,model. The function,

$ C_{ds,inverse-model}(V_{f,measured}(t)) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)} $,

is itself a model. This model estimates the concentration of double stranded DNA based on the observations $ V_{f,measured}(t) $ and the models for bleaching and quenching.

The estimated melting curve may be directly compared with simulations, measurements or other predictions of the true melting curve. The plot at right shows an example of Cds,inverse-model(t) versus Tsample(t). The estimated melting curve is shifted to the right compared to the simulated melting curve, possibly due to systematic error in the sample temperature model. The estimated melting curve also serves as a comparison to the thermodynamic model developed in DNA Melting Thermodynamics, or to any other independent measurement or model of the melting curve, i.e., the concentration of dsDNA vs sample temperature.

Residual plot

Residuals plotted versus time, temperature, fluorescence, and the cumulative sum of fluorescence

Observed values differ from predicted values because of noise and systematic errors in the model. Residuals are the difference between experimental observations and model predictions, Vf,modelVf,measured. Ideally, the residuals should be random and identically distributed.

The plots at right show Vf,modelVf,measured, versus temperature, time, fluorescence, and the cumulative sum of fluorescence. The residuals are clearly not random and identically distributed. This suggests that the model does not perfectly explain the observations. The scale of the plot is much smaller than the data plot — about one percent of the data scale.

A perfect model might require dozens of added parameters and additional physical measurements.

Plotting the residuals versus different variables can help suggest what factors are not modeled well.

Testing the fitting algorithm

The input to the model function, Vf,model, is the block temperature (from which sample temperature is derived) and the model parameters (to be determined). The model function is suitable for use with the Matlab nlinfit function (or other fitting routines). The input to nlinfit will then be the block temperature, fluorescence signal, the model function, and initial values of the model parameters. The output of nlinfit are the fitted parameter values.

In Part II, the block temperature will be controlled to produce a ramp up from room temperature to 95 degrees then back down to room temperature.

Model block temperature.
% generate temperature ramp
Tmin = 25;
Tmax = 95;
Temp = [ (0:0.1:900)*((Tmax-Tmin)/900) + Tmin, Tmax*ones(1,1200), ...
    (0.1:0.1:900)*((Tmin-Tmax)/900) + Tmax, Tmin*ones(1,1200) ];
time = (1:length(Temp))*0.1;
plot(time,Temp)
xlabel('Time, seconds')
ylabel('Temperature, deg C')
title('Model block temperature')

To test your curve fitting code, create a synthetic data set with some added random noise:

Noisy simulated DNA melting curve.
dnaConcentration = 30E-6;
deltaS = -500;
deltaH = -180E3;
noise = 0.05;  % Roughly 5% noise
F = DnaFraction( dnaConcentration, Temp + 273, deltaS, deltaH );
F = F + noise*randn(size(F));
plot(time,F)
xlabel('Temperature, deg C');
ylabel('Fluorescence Signal');
title('Simulated DNA Melting Curve');

Even at a relatively large amplitude, random noise by itself should not appreciably affect the fitted parameter values; however, noise will generally contribute to uncertainty in the fit values. This uncertainty is seen in the parameter confidence intervals. Use the Matlab function nlparci to obtain the 95% confidence intervals from the results returned by nlinfit, e.g.,

    [fitValues, residual, ~, COVB, ~] = nlinfit(Temp, F, fitFunc, beta0, fitOptions);
    CI = nlparci(fitValues, residual, 'covar',COVB);

where CI will be an N×2 array of values where N is the length of fitValues.


HWassignment.jpg Simulate a DNA melting curve and vary the noise magnitude from 0 to 0.05. At each noise level, fit the curve, and compute the confidence intervals. Finally, plot the normalized confidence intervals as a function of the noise magnitude.


Comparing the known and unknown samples

One possible way to compare the unknown sample to the three knowns is to use Matlab's anova and multcompare functions. Anova takes care of the difficulties that arise when comparing multiple sample means using Student's t-test.

The following code creates a simulated set of melting temperatures for three known samples and one unknown. In the simulation, each sample was run three times. The samples have melting temperatures of 270, 272, and 274 degrees. The unknown sample has a melting temperature of 272 degrees. Random noise is added to the true mean values to generate simulated results. Try running the code with different values of noiseStandardDeviation.

% create simulated dataset
noiseStandardDeviation = 0.5;
meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272]
      + noiseStandardDeviation * randn(1,12);
sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' '40bp' '40bp' '40bp'
      'unknown' 'unknown' 'unknown'};

% compute anova statistics
[p, table, anovaStatistics] = anova1(meltingTemperature, sampleName);

% do the multiple comparison
[comparison means plotHandle groupNames] = multcompare(anovaStatistics);
Output of multcompare command.

The multcompare function generates a table of confidence intervals for each possible pair-wise comparison. You can use the table to determine whether the means of two samples are significantly different. Output of multcompare is shown at right. If your data has a lot of variation, you might have to use the options to reduce the confidence level. (Or there might not be a significant difference at all.) Note that multcompare has a default confidence level of 95% (alpha = 0.05). One way to assess how confident you are in your sample identification is by finding the lowest "alpha" value needed to identify your sample.

Consider devising a more sophisticated method that uses both the ΔH° and ΔS° values, instead.








</div>