Assignment 9, Part 2: Simulating DNA melting data and testing the model function
This is Part 2 of Assignment 9.
Testing your model function
Now it's time to test your code. Just like in the particle tracking assignment, we will create some simulated data with a known functional form, run the code, then see whether the outputs match what you put in.
The input to the model function, V_{f,model}, is the block temperature (from which sample temperature is derived) and the model parameters (to be determined). In assignment 8, you upgraded your DNA melting instrument to control the block temperature (rather than just having it turn on and off). You produced a temperature ramp from room temperature up to 95 degrees, held it there for a bit, then cooled it back down to room temperature. Let's simulate this in MATLAB:
% generate temperature ramp Tmin = 25; Tmax = 95; blockTemperature = [ linspace( Tmin, Tmax, 900 * sampleFrequency ), ... Tmax * ones( 1, 1200 ), linspace( Tmax, Tmin, 900 * sampleFrequency ) ]; time = (1:length(blockTemperature))*0.1; plot(time, blockTemperature) xlabel('Time, seconds') ylabel('Temperature, deg C') title('Model block temperature')
Next, create a synthetic data set to mimic the DNA fluorescence. You'll need to choose some values for your initial parameters. Test your code to ensure that it outputs a reasonable fluorescence curve. The bleaching and quenching parameters are particularly sensitive, since large values can lead to unphysical results. It may be helpful to start with idealized parameters (for example, Kbleach = 0, Kgain = 1, etc.), and then change the parameters individually until you obtain a realistic curve.
dnaConcentration = 60E-6; deltaS = -500; deltaH = -180E3; kThermal = 0.95; kGain = ... % choose some reasonable values for your input parameters: F = Vfmodel( kGain, kOffset, ... , blockTemperature ); plot(time, F) xlabel('Time (s)'); ylabel('Fluorescence Signal'); title('Simulated DNA Melting Curve');
Finally, and some random noise:
noise = 0.05; % Roughly 5% noise F = F + noise * kGain * randn(size(F)); plot(time, F) xlabel('Time (s)'); ylabel('Fluorescence Signal'); title('Simulated DNA Melting Curve');
Now that we have some synthetic data, we can test our model function and fitting algorithm. First, you must adapt the model function you wrote to be suitable for use with the Matlab nlinfit function (or other fitting routines). In order for the function to be compatible, the first input argument must be a vector containing all the parameters to be fitted, and the second argument must contain a vector/matrix of the independent variable(s). The following code can be used inline to define the fitting function:
DnaMeltingFitFunction = @(parameters,InputData) Vfmodel( parameters(1), parameters(2), ..., dnaConcentration, InputData );
Don't forget to include parameters for the instrument gain and offset, and $ \Delta H^\circ, \Delta S^\circ $, as well as the bleaching, quenching, and thermal time constants. There are two variables that should not be treated as fit parameters. These are the DNA concentration, since this value is known, and the thermal circuit gain Kthermal. Unfortunately, Kthermal and the overall instrument gain Kgain are coupled in our model function. Therefore, if we try to fit both parameters, only their product will be well defined, not each individual value. We can get around this issue by choosing a fixed value for Kthermal, and only fitting Kgain. Luckily, we have a decent estimate of Kthermal from the direct measurement of the sample vs. block temperature (see image at right). In this case, the best fit parameter was found to be Kthermal = 0.95.
The input to nlinfit will then be the independent variable (block temperature), the dependent variable that you are trying to fit (fluorescence), the model function, and initial values of the model parameters. Additionally, there are many options you can set for the nlinfit algorithm [described here]. The output of nlinfit are the fitted parameter values.
fitValues = nlinfit(blockTemperature, F, DnaMeltingFitFunction, beta0, fitOptions);
The initial parameters you choose (beta0) are essential for getting a reasonable output from nlinfit. In this particular case, we know exactly what the parameters should be. Try altering the parameters slightly from their known values to get a feel for what it means for your initial guess to be 'reasonable'. Think about how you would guess the parameters for your real data. Use these slightly altered parameters in the following plot.
The effects of noise on fit parameters
Even at a relatively large amplitude, random noise by itself should not appreciably affect the fitted parameter values; however, noise will generally contribute to uncertainty in the fit values. This uncertainty is seen in the parameter confidence intervals. Use the MATLAB function nlparci to obtain the 95% confidence intervals from the results returned by nlinfit, e.g.,
[fitValues, residual, ~, COVB, ~] = nlinfit(blockTemperature, F, DnaMeltingFitFunction, beta0, fitOptions); CI = nlparci(fitValues, residual, 'covar',COVB);
where CI will be an N×2 array of values where N is the length of fitValues, and the two columns represent the upper and lower bound of the confidence interval. For example, if one of your parameters from fitValues is $ \alpha = 30.0903 $, and the corresponding row in CI is: [29.8243, 30.3563] what this really means is that if you were to repeat your experiment over and over, 95% of the time you will find a value for $ \alpha $ within the range of 29.8 to 30.4. A compact way to report this result is as follows: $ \alpha = 30.1 \pm 0.3 $ (in whatever units $ \alpha $ has). (Note the confidence intervals are not necessarily symmetric for all types of statistics. But if the statistics are gaussian, (which we're already assuming) then it should be equally likely to measure a parameter above or below the mean.) Uncertainty is typically reported with one or two significant figures. Round the uncertain quantity to the same decimal place as the uncertainty.
Calculate the uncertainty for each parameter from the fit you performed above. Fill in the following table, making sure to include appropriate units and significant figures. |
Parameter name | Input parameter value | Best fit parameter value | Best fit parameter uncertainty |
... |
To see the effects of noise on uncertainty, simulate a DNA melting curve and vary the noise magnitude from 0 to 0.05. At each noise level, fit the curve, and compute the uncertainty for each parameter.
Plot the normalized uncertainty for each fit parameter as a function of the noise magnitude. Which parameter appears to be the most sensitive to noise? the least? |
The ultimate goal of the DNA melting lab assignments is to identify an unknown sample of DNA by comparing it to three other samples. One thing you might notice while fitting your data is the values of $ \Delta H $ and $ \Delta S $ are rather unconstrained. The melting temperature, T_{m}, however, is much better defined, and will be a more robust metric to use to compare your data to one-another.
Using the expressions we derived in class for dsDNA concentration at a given temperature:
Derive an expression for the melting temperature, $ T_m $in terms of $ \Delta H $ and $ \Delta S $. The melting temperature is defined as the temperature for which the fraction of double stranded DNA equals 0.5. Calculate the melting temperature for each noise level from 0.01 to 0.05. |
- Assignment 9 Overview
- Part 1: model function
- Part 2: test your code with simulated data
- Part 3: fitting your data