Difference between revisions of "DNA melting: Identifying the unknown sample"


Revision as of 19:27, 5 May 2015

20.309: Biological Instrumentation and Measurement



In the DNA lab, you had four samples. Each sample has a true melting temperature $ M_j $ (where $ j $ is an integer from 1 to 4). The instructors told you that the fourth sample is identical to one of the other three samples. Therefore, the unknown should have exactly the same melting temperature as sample 1, 2, or 3. Your job was to figure out which one matched the unknown.

Most groups measured each sample group in triplicate: $ N=3 $. (Some special students did something a little bit different.) This resulted in 12 observations, $ O_{i,j} $, where $ j $ is the sample group and $ i $ is the experimental trial number (an integer from 1 to 3). The majority of lab groups calculated the average melting temperature of each sample group, $ \bar{O}_j $, and guessed that $ M_4 $ was the same as whichever of the known samples had the closest melting temperature.
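As a rough sketch, the averaging-and-nearest-match procedure might look like this in Python. The melting temperatures are made-up numbers for illustration, not data from the lab, and the names `observations`, `sample_means`, and `closest_known` are hypothetical.

```python
# Hypothetical melting temperatures in degrees C: three trials per sample.
# Keys are the sample index j; sample 4 is the unknown.
observations = {
    1: [62.1, 61.5, 62.4],
    2: [66.0, 65.2, 65.9],
    3: [70.3, 69.8, 70.6],
    4: [65.1, 66.3, 65.6],
}

def sample_means(obs):
    """Return the average melting temperature of each sample group."""
    return {j: sum(trials) / len(trials) for j, trials in obs.items()}

def closest_known(means, unknown=4):
    """Return the known sample whose mean is closest to the unknown's mean."""
    known = [j for j in means if j != unknown]
    return min(known, key=lambda j: abs(means[j] - means[unknown]))

means = sample_means(observations)
guess = closest_known(means)
print(means)
print("The unknown looks most like sample", guess)
```

With these invented numbers the unknown's average lands closest to sample 2. Nothing in this procedure says how much to trust that guess.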

Seems reasonable.

Your observations included measurement error: $ O_{i,j}=M_j+E_{i,j} $, where $ E_{i,j} $ is the error in trial $ i $ of sample $ j $. The presence of measurement error leads to the possibility that an unfortunate confluence of error terms might cause you to misidentify the unknown sample. It’s not hard to imagine what factors tend to increase the likelihood of such an unfortunate fluke: the true means are close together, or the error terms are large. To get a handle on the possibility that your results were crap due to bad luck (not incompetence), it is necessary to know something about the distribution of the error terms. How about this? The error terms are normally distributed with mean $ \mu=0 $ and standard deviation $ \sigma $. (Note that the error distribution is the same for all of the sample types.) Within the confines of this model, it is possible to estimate the chance that your result was a fluke.
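One way to get a feel for this model is to simulate it. The sketch below draws error terms from a normal distribution, applies the closest-mean rule, and counts how often the rule picks the wrong sample. The true means, the values of $ \sigma $, and the function name `misidentification_rate` are all assumptions made up for illustration.

```python
import random

def misidentification_rate(true_means, unknown_matches, sigma, n_trials=3,
                           n_experiments=20000, seed=0):
    """Estimate how often the closest-mean rule picks the wrong known sample.

    true_means: true melting temperatures of the known samples (M_1..M_3).
    unknown_matches: index into true_means of the sample the unknown equals.
    sigma: standard deviation of the normally distributed error terms.
    """
    rng = random.Random(seed)

    def observed_mean(m):
        # Average of n_trials observations O = M + E, with E ~ Normal(0, sigma).
        return sum(m + rng.gauss(0.0, sigma) for _ in range(n_trials)) / n_trials

    wrong = 0
    for _ in range(n_experiments):
        known = [observed_mean(m) for m in true_means]
        unknown = observed_mean(true_means[unknown_matches])
        guess = min(range(len(known)), key=lambda k: abs(known[k] - unknown))
        if guess != unknown_matches:
            wrong += 1
    return wrong / n_experiments

# Hypothetical case: true means 4 degrees apart, unknown equals the middle one.
for sigma in (0.5, 1.0, 2.0, 4.0):
    print(sigma, misidentification_rate([62.0, 66.0, 70.0], 1, sigma))
```

The estimated rate climbs as $ \sigma $ grows relative to the spacing between the true means, which is the fluke scenario described above. Note that the simulation needs the true means and $ \sigma $ as inputs, and in the lab you know neither.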

There are 6 possible pairwise hypotheses to test: $ M_4\stackrel{?}{=}M_1 $, $ M_4\stackrel{?}{=}M_2 $, $ M_4\stackrel{?}{=}M_3 $, $ M_1\stackrel{?}{=}M_2 $, $ M_1\stackrel{?}{=}M_3 $, $ M_2\stackrel{?}{=}M_3 $. If all goes well, one of the first 3 hypotheses is accepted and the other 5 are rejected. Student’s t-test offers a method for assigning a numerical degree of confidence to each hypothesis. Essentially, the test considers all the possible outcomes of the experimental study. Imagine that you repeated the study an infinite number of times while the null hypothesis (the two true means are equal) held; you would obtain all possible outcomes of the error terms $ E_{i,j} $. These outcomes are divided into two realms: those that are more favorable to the null hypothesis (i.e., the difference between the two sample means, relative to their scatter, is smaller than in the result you got); and those that are less favorable (the difference is larger than in the result you got). The t-test can be summarized by a p-value, which is the percentage of all possible outcomes that are at least as unfavorable to the null hypothesis as the result you got. A low p-value means that, if the null hypothesis were true, few possible results would be less favorable to it than yours, so it's probably a good idea to reject it. In most circumstances, the experimenter chooses a significance level such as 10% or 5% or 1% in advance of examining the data.
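The "repeat the study many times" picture can be acted out directly. The sketch below computes the pooled two-sample t statistic for each of the six pairs, then estimates each p-value by simulating many experiments in which the null hypothesis is true and counting the fraction with a t statistic at least as extreme as the observed one. The data are invented, the function names are hypothetical, and the simulation stands in for the usual t-distribution lookup rather than replacing it.

```python
import math
import random
from itertools import combinations

# Hypothetical melting temperatures in degrees C; sample 4 is the unknown.
observations = {
    1: [62.1, 61.5, 62.4],
    2: [66.0, 65.2, 65.9],
    3: [70.3, 69.8, 70.6],
    4: [65.1, 66.3, 65.6],
}

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (equal-variance model)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def simulated_p_value(a, b, n_sim=20000, seed=0):
    """Fraction of simulated null outcomes at least as extreme as the data."""
    rng = random.Random(seed)
    t_obs = abs(t_statistic(a, b))
    extreme = 0
    for _ in range(n_sim):
        # Under the null hypothesis both groups share one true mean. The t
        # statistic does not depend on that mean or on sigma, so use 0 and 1.
        sim_a = [rng.gauss(0.0, 1.0) for _ in a]
        sim_b = [rng.gauss(0.0, 1.0) for _ in b]
        if abs(t_statistic(sim_a, sim_b)) >= t_obs:
            extreme += 1
    return extreme / n_sim

p_values = {
    (i, j): simulated_p_value(observations[i], observations[j])
    for i, j in combinations(sorted(observations), 2)
}
for pair, p in p_values.items():
    print(pair, p)
```

With these made-up numbers, only the pair (2, 4) gets a large p-value; the other five are small. If SciPy is available, `scipy.stats.ttest_ind` computes the same pooled t statistic and an exact p-value from the t distribution.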

Student’s t-test tests the hypothesis that two populations have the same mean, $ \mu_a=\mu_b $. The test produces a number called a p-value. To interpret the p-value, consider all the possible outcomes of your experiment if you repeated the procedure an infinite number of times.

You can probably think of some ways this simple model might be deficient. For example, there is no attempt to include any kind of systematic error. If there were significant systematic error sources in your experiment, your estimate of the chance that an unlucky accident happened may be very far from the truth. Given the true mean values

A simple and reasonable assumption about the distribution of the error terms is that they are normally distributed with mean $ \mu=0 $ and standard deviation $ \sigma $. Okay. It’s a model. Now what?

There are six possible pairwise hypotheses that can be tested.

The analysis most of you did assumes that the unknown matches whichever known sample had the closest melting temperature. Under the assumed error model, this is reasonable. All the sample means have the same uncertainty (if you measured each of the samples the same number of times), because the error model assumes that all the error terms are drawn from the same normal distribution.
