Lixiaoxu

Membre des de 6 jul 2009
 
(Hi ha 4 revisions intermèdies del mateix usuari que no es mostren)
Línia 165: Línia 165:
cat('\nraw data')
cat('\nraw data')
if (rawdata) (x);
if (rawdata) (x);
</pre>
==方差分析教学课件==
理解ANOVA是在作回归
===总体与抽样次数设定===
ANOVA在R里头和回归是同一个东西,几句话讲完。麻烦的是模拟数据:
<Rform name="data">
所有男生(所谓总体)用中文卷的智商均值是:<br/>
  <input name="xmc" value="105" size="8"/><br/>
所有女生(所谓总体)用中文卷的智商均值是:<br/>
  <input name="xfc" value="101" size="8"/><br/>
所有男生(所谓总体)用英文卷的智商均值是:<br/>
  <input name="xme" value="80" size="8"/><br/>
所有女生(所谓总体)用英文卷的智商均值是:<br/>
  <input name="xfe" value="90" size="8"/><br/>
抽样分成男中、女中、男英、女英四组,每组内抽的人数是:<br/>
  <input name="n" value="5" size="8"/><br/><br/>
<input type="submit" name="submit" value="点击按键重复抽样,更新如下数据和结果"/><br/>
智商的抽样误差(不一定是误出来的,但一定是抽出来的)满足正态分布,还假定已知各组内的总体标准差相同。这些抽样误差彼此独立,每次实验都重新变化。
组内总体标准差为:<br/>
  <input name="sigma" value="10" size="8"/><br/><br/>
</Rform>
===可观测的数据及其影响的统计结论===
每个观测到的智商成绩都包含随机的成分,所以实际观测到的数据不是各组总体的均值。
<R name="data" iframe="height:1800px;">
s_group = as.factor(c('Male','Male','Female','Female'));
q_group = as.factor(c('Chinese ',' English','Chinese ',' English'));
xmc = ifelse(exists("xmc"),as.numeric(xmc),105);
xme = ifelse(exists("xme"),as.numeric(xme),80);
xfc = ifelse(exists("xfc"),as.numeric(xfc),101);
xfe = ifelse(exists("xfe"),as.numeric(xfe),90);
sigma = ifelse(exists("sigma"),as.numeric(sigma),10);
x_true_group = c(xmc,xme,xfc,xfe);
##每组找n(=10)个被试;
n = ifelse(exists("n"),as.integer(n),10);
s = rep(s_group,each=n); q =rep(q_group,each=n);x_true = rep(x_true_group,each=n);
## 但是每个被试都有随机抽样带来的偏差,标准差是sigma
error = rnorm(n*4,mean=0,sd=sigma);
x_observe = x_true + error;
## 所以最终看到的是如下
cat('<font color="blue">')
data.frame(s,q,x_observe,x_true)
cat('</font>')
cat('\n最后一列总体值看不到\n想研究智商成绩在性别、语言上的区别\n')
cat('也就是性别、语言对智商能不能部分地预测\n')
cat('\n回归的表述:成绩=常数截距+性别预测增值+语言预测增值+二者交互增值+预测残差\n')
cat('\n先理解一下真实的参数值,四组的总体\n')
(lm(x_true ~ 0 + s:q))
cat('\n先理解一下真实的参数值,四组的总体差异被拆解成三个差异分项\n')
(lm(x_true ~ 1 + s + q + s:q))
cat('\n基于观测值的预测会受到随机误差扰动,\n')
cat('\n四组组内均值的观测\n')
lm(x_observe ~ 0 + s:q)
cat('\n四组三个差异分项的观测\n')
(model = lm(x_observe ~ 1 + s + q + s:q))
cat('\n考虑这种扰动之后,各个预测的95%把握置信区间上下界\n')
cat('如果这个区间不包括0,表示有大于95%的把握确信这个系数有预测作用,无论大小。\n但是,交互项的存在使得解读变得困难。比如,男生-女生的效应包括sMale + (1/2) sMale:qChinese\n')
confint(model)
cat('\n<font color="red">ANOVA的表述:成绩的波动=性别预测增值的波动+语言预测增值的波动+残差波动\n')
cat('“波动”的操作化定义是Sum of Squares(SS),一列数与其均值的差距平方和\n')
cat('Df表示每个波动项吸收抽样误差波动的理论比例\n')
cat('Sum Sq表示每个波动项解释波动的观测比例,\n如果和Df的理论比例极端不匹配(F比1大很多),就支持其中包含由不同总体带入的非随机波动\n')
cat('最后一列Pr(>F)是一般的假设检验中反映F极端程度的p值\n\n')
anova(model)
cat('\n</font>')
</R>
===R代码===
把下面的代码copy paste到R中,或者将语句贴到[http://pbil.univ-lyon1.fr/Rweb/ Rweb](中文UTF-8码),结合#号后的注解理解方差分析
<pre>
## 数据:智商抽样,error是个体随机偏差(学名抽样误差,不一定是误出来的,但总是抽出来的)
##4组被试依次性别如下,考试卷子语言如下
(s_group = as.factor(c('Male','Male','Female','Female')));
(q_group = as.factor(c('Chinese','English','Chinese','English')));
## 男生总体用中文卷智商为105,女生总体用中文卷智商为101,男生总体用E文卷子智商为80,女生总体用E文卷智商为90
x_true_group = c(105,80,101,90);
## 组内总体标准差
sigma = 10;
##每组找n(=10)个被试;
n = 10; s = rep(s_group,each=n); q =rep(q_group,each=n);x_true = rep(x_true_group,each=n); data.frame(s,q,x_true)
## 但是每个被试都有随机抽样带来的偏差,标准差是15
error = rnorm(4*n,mean=0,sd=sigma);
x_observe = x_true + error;
## 所以最终看到的是如下
data.frame(s,q,x_observe)
boxplot(x_observe~s*q)
##以上都是数据模拟部分,如果已经有数据,直接完成下面的步骤
##linear model模型,就是回归了;
model = lm(x_observe ~ 1 + s + q + s:q ) ## ~符号后面的1表示截距,s:q表示交互作用项
anova(model)
plot(model,which=c(1,2))
</pre>
</pre>


Línia 352: Línia 453:
If ''df'' on this display is set to <math>\infty</math> (''Inf'' in R) and noncentral parameter set to 0, a standard normal distribution will be plotted and critical ''z'' score calculated.
If ''df'' on this display is set to <math>\infty</math> (''Inf'' in R) and noncentral parameter set to 0, a standard normal distribution will be plotted and critical ''z'' score calculated.


===Noncentral <math>F</math>===
===Noncentral ''F''===


The noncentral parameter of ''F'' is only defined on its numerator. The noncentral ''F'' distributed
The noncentral parameter of ''F'' is only defined on its numerator. The noncentral ''F'' distributed
Línia 489: Línia 590:
}
}


pdf(rpdf, horizontal=FALSE, width=8, height=4)
pdf(rpdf, width=8, height=4)




Línia 530: Línia 631:
For two-related-group case, the difference scores between each pair of samples can apply one-group mean test interface. Usually <math>\mu_{null}</math> is set to zero.
For two-related-group case, the difference scores between each pair of samples can apply one-group mean test interface. Usually <math>\mu_{null}</math> is set to zero.


===Power of <math>F</math> test for a given Cohen's <math>\tilde{f}^2</math>===
===Power of ''F'' test for a given Cohen's ''f''===
Let's use <math>\tilde{f}^2</math> denote the population of Cohen's <math>f^2</math>, specially
Let's use <math>\tilde{f}^2</math> denote the population of Cohen's <math>f^2</math>, specially
:<math>SS(\mu_1,\mu_2,\cdots,\mu_K) \over {K\times \sigma^2}</math>
:<math>SS(\mu_1,\mu_2,\cdots,\mu_K) \over {K\times \sigma^2}</math>
Línia 548: Línia 649:
==External links==
==External links==
* [http://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral ''t''-distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral ''t''-distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution Noncentral <math>\chi^2</math> distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution Noncentral chi-square distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Noncentral_F-distribution Noncentral ''F''-distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Noncentral_F-distribution Noncentral ''F''-distribution] on Wikipedia
* [http://en.wikipedia.org/wiki/Effect_size#Confidence_interval_and_relation_to_noncentral_parameters Confidence interval of Effect Size] on Wikipedia
* [http://en.wikipedia.org/wiki/Effect_size#Confidence_interval_and_relation_to_noncentral_parameters Confidence interval of Effect Size] on Wikipedia
Usuari anònim