 
{{Template:20.309}}

<blockquote>
<div>
''… the safe use of regression requires a good deal of thought and a good dose of skepticism''

&mdash; [http://mitsloan.mit.edu/faculty/detail.php?in_spseqno=41132 Arnold Barnett]
</div>
</blockquote>
==Review of linear regression==

Linear regression is a method for estimating the magnitude of the relationship between two variables that co-vary. The technique assumes that a straight line characterizes the relationship between the two quantities: <i>y</i> = <i>&beta;x</i> + <i>&alpha;</i>, where <i>&beta;</i> is the true slope and <i>&alpha;</i> is the true intercept. If two points on the line, (<i>x<sub>i</sub></i>, <i>y<sub>i</sub></i>); <i>i</i> = {1, 2}, were known precisely, solving for the exact values of <i>&alpha;</i> and <i>&beta;</i> would be trivial. Unfortunately, all physical measurements include noise, which precludes finding the exact values of <i>&alpha;</i> and <i>&beta;</i>.

The function of linear regression is to produce estimates of <i>&alpha;</i> and <i>&beta;</i>, denoted by <i>&alpha;&#770;</i> and <i>&beta;&#770;</i>, from a sample of <i>N</i> value pairs (<i>x<sub>i</sub></i>, <i>y<sub>i</sub></i>); <i>i</i> = {1, ..., N} that includes noise in the <i>y</i>-values. The samples can therefore be modeled by adding a noise term, <i>&epsilon;<sub>i</sub></i>, to the right side of the equation: <i>y<sub>i</sub></i> = <i>&beta;x<sub>i</sub></i> + <i>&alpha;</i> + <i>&epsilon;<sub>i</sub></i>. The most common regression model assumes that <i>x</i> is known exactly. In practice, regression works well as long as the relative magnitude of the noise in <i>x</i> is much smaller than the noise in <i>y</i>.
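To make the estimates concrete, here is a minimal sketch in Python (the simulated dataset, noise level, and variable names are illustrative assumptions, not part of the course material). It computes <i>&alpha;&#770;</i> and <i>&beta;&#770;</i> from the closed-form least-squares formulas and checks them against <code>numpy.polyfit</code>:

<pre>
import numpy as np

# Simulated dataset: a known line plus Gaussian noise in y (values chosen only for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Closed-form least-squares estimates of the slope and intercept.
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# The same estimates from numpy's built-in first-order polynomial fit.
beta_np, alpha_np = np.polyfit(x, y, 1)

print(beta_hat, alpha_hat)   # from the formulas
print(beta_np, alpha_np)     # from numpy.polyfit
</pre>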
The most common type of linear regression, ordinary least squares, minimizes the sum of the squared vertical distances between the observed and predicted values.

Model: <i>y<sub>i</sub></i> = <i>&beta;x<sub>i</sub></i> + <i>&alpha;</i> + <i>&epsilon;<sub>i</sub></i>

Assumptions:
* the independent variable <i>x</i> is known with certainty (or at least with very much less error than <i>y</i>)
* <i>&epsilon;</i> is an independent, random variable with mean <i>&mu;</i> = 0
* the distribution of <i>&epsilon;</i> is symmetric around the origin
* large errors are less likely than small ones

===Uncertainty in the slope estimate===

* The error in the slope is <i>W</i> = <i>&beta;&#770;</i> &minus; <i>&beta;</i>.
* The variance of <i>W</i> characterizes the slope error.
* You can calculate a 95% (or other significance level) confidence interval for <i>&beta;&#770;</i> (see the sketch after this list).
* What factors should the uncertainty depend on?
* Estimate of <i>&sigma;</i><sup>2</sup>(<i>W</i>): <math>V^2(W) = \frac{\sum_i r_i^2}{(N-2)\sum_i (x_i - \bar{x})^2}</math>, where the <i>r<sub>i</sub></i> are the residuals.
* <i>N</i> &minus; 2 appears as a "penalty" because the regression line minimizes the variance of the residuals.
* If the interval contains 0, the null hypothesis that <i>&beta;</i> = 0 cannot be rejected.
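A minimal sketch of how the slope-uncertainty formula and confidence interval above might be evaluated in Python (the simulated dataset is the same illustrative one used in the previous sketch; the critical value is a Student-t quantile with <i>N</i> &minus; 2 degrees of freedom from <code>scipy.stats</code>):

<pre>
import numpy as np
from scipy import stats

# Illustrative simulated data: a known line plus Gaussian noise in y.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Least-squares fit and its residuals.
beta_hat, alpha_hat = np.polyfit(x, y, 1)
residuals = y - (beta_hat * x + alpha_hat)
N = x.size

# V^2(W) = sum(r_i^2) / ((N - 2) * sum((x_i - x_bar)^2))
var_slope = np.sum(residuals ** 2) / ((N - 2) * np.sum((x - x.mean()) ** 2))
se_slope = np.sqrt(var_slope)

# 95% confidence interval for the slope estimate.
t_crit = stats.t.ppf(0.975, df=N - 2)
print(beta_hat - t_crit * se_slope, beta_hat + t_crit * se_slope)

# If this interval contains 0, the null hypothesis beta = 0 cannot be rejected.
</pre>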
'''''Step 1: PLOT THE DATA'''''

===Examine the residuals===

* plot 'em for an informal look (see the sketch after this list)
* various formal tests of residuals also exist
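A sketch of the informal look in Python with matplotlib (again using the illustrative simulated dataset from the earlier sketches): plot the residuals against <i>x</i> and check that they scatter symmetrically around zero with no obvious structure or trend.

<pre>
import numpy as np
import matplotlib.pyplot as plt

# Illustrative simulated dataset and straight-line fit.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.5, size=x.size)
beta_hat, alpha_hat = np.polyfit(x, y, 1)
residuals = y - (beta_hat * x + alpha_hat)

# Residuals versus x should look like symmetric, structureless scatter around zero.
plt.scatter(x, residuals)
plt.axhline(0, color='gray', linestyle='--')
plt.xlabel('x')
plt.ylabel('residual')
plt.show()
</pre>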
==Overview of nonlinear regression==

{| class="wikitable" style="text-align: center;"
|[[Image:Regression block diagram.png|center|700px]]
|Block diagram of nonlinear regression
|}

==Practical nonlinear regression==
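A minimal sketch of a practical nonlinear fit in Python using <code>scipy.optimize.curve_fit</code>; the exponential-decay model, parameter values, and simulated data are illustrative assumptions rather than part of the course material:

<pre>
import numpy as np
from scipy.optimize import curve_fit

# Illustrative nonlinear model: exponential decay plus a constant offset.
def model(x, amplitude, rate, offset):
    return amplitude * np.exp(-rate * x) + offset

# Simulated noisy data generated from the model (assumed parameter values).
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 100)
y = model(x, 3.0, 1.2, 0.5) + rng.normal(scale=0.1, size=x.size)

# Nonlinear least squares; a sensible initial guess helps the optimizer converge.
p0 = [1.0, 1.0, 0.0]
popt, pcov = curve_fit(model, x, y, p0=p0)

# Parameter estimates and their standard errors from the covariance matrix.
print(popt)
print(np.sqrt(np.diag(pcov)))
</pre>

Just as in the linear case, the first sanity check is to plot the data, the fitted curve, and the residuals.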
  
 
{{Template:20.309 bottom}}
