DNA melting: Identifying the unknown sample


20.309: Biological Instrumentation and Measurement

Overview

In the DNA lab, you had four samples. Each sample had a true melting temperature $ M_j $ (where $ j $ is an integer from 1 to 4). The instructors told you that the fourth sample was identical to one of the other three samples. Therefore, the unknown should have exactly the same melting temperature as sample 1, 2, or 3. Your job was to figure out which one matched the unknown.

Measurement procedure

Most groups measured each sample group in triplicate: $ N=3 $. (Some special students did something a little bit different.) This resulted in 12 observations, $ O_{i,j} $, where $ j $ is the sample group and $ i $ is the experimental trial number (an integer from 1 to 3). The majority of lab groups calculated the average melting temperature of each sample group, $ \bar{O}_j $, and guessed that sample 4 matched whichever of the known samples had the closest melting temperature.

Seems reasonable.
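Here is a minimal MATLAB sketch of that procedure. It assumes the twelve observations are stored in a 3×4 matrix O, with rows indexed by trial $ i $ and columns by sample group $ j $; the matrix entries below are made up for illustration only.

  % O(i,j) is the melting temperature measured in trial i of sample group j.
  % (These numbers are invented for illustration.)
  O = [ 85.1  79.8  84.6  84.9 ;
        84.7  80.3  85.2  85.3 ;
        85.4  79.5  84.8  84.5 ];

  Obar = mean(O, 1);                            % average melting temperature of each group
  [~, guess] = min(abs(Obar(1:3) - Obar(4)));   % known group closest to the unknown
  fprintf('Sample 4 looks most like sample %d.\n', guess);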

Measurement uncertainty

The observations included measurement error, $ O_{i,j}=M_j+E_{i,j} $. The presence of measurement error leads to the possibility that an unfortunate confluence of error terms might cause you to misidentify the unknown sample. It’s not hard to imagine the factors that increase the likelihood of such an unfortunate fluke: true means that are close together, or error terms that are large. To get a handle on the possibility that your results were total crap due to bad luck alone (not incompetence), it is necessary to have some kind of model for the distribution of the error terms. How about this? The error terms are normally distributed with mean $ \mu=0 $ and standard deviation $ \sigma $. (Note that the error distribution is assumed to be the same for all of the sample groups.)
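Under that model, you can get a feel for how often bad luck alone flips the identification by simulating the experiment many times. The sketch below assumes hypothetical values for the true means M and the error standard deviation sigma; with true means this close together and this much noise, a nontrivial fraction of simulated studies misidentify the unknown.

  % Simulate the error model O(i,j) = M(j) + E(i,j), with E ~ Normal(0, sigma^2).
  M     = [85.0 80.0 84.7 84.7];   % hypothetical true melting temperatures; sample 4 matches sample 3
  sigma = 0.5;                     % hypothetical error standard deviation, degrees C
  nSim  = 1e4;                     % number of simulated studies

  flukes = 0;
  for s = 1:nSim
      O    = repmat(M, 3, 1) + sigma * randn(3, 4);   % one simulated triplicate study
      Obar = mean(O, 1);
      [~, guess] = min(abs(Obar(1:3) - Obar(4)));
      flukes = flukes + (guess ~= 3);                 % count misidentifications
  end
  fprintf('Misidentified the unknown in %.1f%% of studies.\n', 100 * flukes / nSim);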

Within the confines of this model, it is possible to estimate the chance that your result was a fluke. There are 6 possible pairwise hypotheses to test:

  1. $ M_4\stackrel{?}{=}M_1 $
  2. $ M_4\stackrel{?}{=}M_2 $
  3. $ M_4\stackrel{?}{=}M_3 $
  4. $ M_1\stackrel{?}{=}M_2 $
  5. $ M_1\stackrel{?}{=}M_3 $
  6. $ M_2\stackrel{?}{=}M_3 $
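In MATLAB, those six pairs fall out of nchoosek, although the rows come out in a different order than the numbering above:

  pairs = nchoosek(1:4, 2);   % all 6 unordered pairs of groups: [1 2; 1 3; 1 4; 2 3; 2 4; 3 4]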

Evaluating the hypotheses

Student’s t-test offers a method for assigning a numerical degree of confidence to each null hypothesis. Essentially, the test considers the entire universe of possible outcomes of your experimental study. Imagine that you repeated the study an infinite number of times. (This may not be hard for you to imagine.) Repeating the study ad infinitum will elicit all possible outcomes of $ E_{i,j} $. The t-test categorizes each outcome into one of two realms: those that are more favorable to the null hypothesis (i.e., the difference between the sample means is smaller than the one you got), and those that are less favorable (the difference is larger than the one you got).

The t-test can be summarized by a p-value, which is equal to the percentage of possible outcomes that are less favorable to the null hypothesis than the result you got. A low p-value means that there are not many possible results less favorable to the null hypothesis than the one you got, and many results that are more favorable. So it's probably reasonable to reject the null hypothesis. Rejecting the null hypothesis means that the means are likely not the same.
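One way to carry out all six tests is MATLAB's ttest2, which performs an unpaired two-sample t-test. This sketch assumes the observation matrix O and the pairs list from the sketches above:

  % Run a two-sample t-test for each pair of sample groups.
  for k = 1:size(pairs, 1)
      a = pairs(k, 1);
      b = pairs(k, 2);
      [h, p] = ttest2(O(:, a), O(:, b));   % h = 1 means "reject" at the default 5% level
      fprintf('M%d = M%d?  p = %.3f  (reject: %d)\n', a, b, p, h);
  end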

In most circumstances, the experimenter chooses a significance level such as 10% or 5% or 1% in advance of examining the data. Another way to think of this: if you chose a significance level of 5% and repeated the study 100 times on samples whose true means were identical, you would expect to incorrectly reject the null hypothesis because of bad luck on about 5 occasions.
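You can check that interpretation by simulation. The sketch below repeatedly draws two triplicate groups with identical true means (hypothetical values again) and counts how often ttest2 rejects the true null hypothesis; the answer should come out near 5%.

  % Estimate the false-rejection rate of ttest2 when the null hypothesis is true.
  sigma = 0.5;                          % hypothetical error standard deviation
  nSim  = 1e4;
  rejections = 0;
  for s = 1:nSim
      x = 85.0 + sigma * randn(3, 1);   % two groups with the same true mean
      y = 85.0 + sigma * randn(3, 1);
      rejections = rejections + ttest2(x, y, 'Alpha', 0.05);
  end
  fprintf('Rejected a true null in %.1f%% of studies.\n', 100 * rejections / nSim);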

If all went well, exactly two of the first three null hypotheses were rejected (the one that survived identifies the unknown), and null hypotheses 4-6 were all rejected. For example, if the unknown was the same as sample 3, null hypotheses 1, 2, 4, 5, and 6 ought to have been rejected.

Which unknown?

A problem comes up when you compare multiple means using the t-test. For example, if you chose a significance level for each t-test of 5%, the chance of incorrectly rejecting at least one null hypothesis somewhere among the 6 comparisons could be as high as 30% (about 26% if the comparisons were independent). The multcompare function in MATLAB implements a correction to the t-test procedure that keeps the family-wise error rate (FWER) below the significance level you choose. In other words, the chance of any true null hypothesis being incorrectly rejected due to bad luck is less than the FWER. The optional 'Alpha' argument to multcompare lets you set the FWER.
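Here is a sketch of that analysis, again assuming the 3×4 observation matrix O from above. anova1 treats each column of a matrix argument as a group, and multcompare corrects the subsequent pairwise comparisons. (In recent MATLAB versions, the sixth column of the multcompare output holds the corrected p-value for each pair.)

  % One-way ANOVA followed by corrected pairwise comparisons.
  [~, ~, stats] = anova1(O, {'1', '2', '3', '4'}, 'off');   % 'off' suppresses the ANOVA figure
  c = multcompare(stats, 'Alpha', 0.05, 'Display', 'off');
  % Each row of c describes one comparison:
  % [group A, group B, lower CI, estimated difference, upper CI, p-value]
  fprintf('M%d vs M%d: p = %.3f\n', [c(:, 1:2), c(:, 6)]');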

Some people argued that only the first three hypotheses are relevant to the question at hand. Maybe. But imagine what the defense counsel would say if it turned out that only hypotheses 1 and 2 were rejected and hypotheses 4-6 were not. You would testify that the murderer must be suspect number 3. But the defense would argue, "How can you say it's number 3 if you can't even tell suspect 3 from suspect 1 or 2?" It would be unconvincing to present a result that implicated suspect 3 unless at least 4 of the 6 hypotheses were rejected (numbers 1, 2, 5, and 6).

A few people also argued that the p-value of hypothesis 3 is a good measure of confidence. This is not correct, because the p-value of hypothesis 3 says nothing about how decisively the competing hypotheses were ruled out. Consider the case where hypothesis 3 has a p-value of 95% and hypothesis 2 has a p-value of 4.9% (so it is rejected, but only just). Clearly, you would have less confidence that the unknown is sample 3 than if hypothesis 2 had a p-value of 0.1%, even though the p-value of hypothesis 3 is the same in both cases.

If you used multcompare in your analysis, a good measure of confidence is the FWER that you chose. Since not all of the hypotheses are required to uniquely identify the unknown, the FWER is slightly conservative. So what. You required more things to be true than you strictly needed to. But it is likely that you would have gained very little by removing the unnecessary hypothesis from consideration. It is even more likely that the error terms did not perfectly satisfy the assumptions of the test, so your calculation is at best an approximation of the probability of this type of error.

You can probably think of ways the simple error model we relied on might be deficient. For example, there is no attempt to include any kind of systematic error. If there were significant systematic error sources in your experiment, your estimate of the chance that an unlucky accident happened may be very far from the truth. Because most real experiments do not perfectly satisfy the assumptions of the test, it is usually ridiculous to report an extremely small p-value. (That doesn't stop people from doing it, though.)