Lixiaoxu

← Edició anterior

Usuari:Lixiaoxu (mostra el codi font)

Revisió del 16:29, 6 jul 2009

5.196 bytes afegits , 6 jul 2009

→‎Online calculator for critical values, cumulative probabilities, and critical noncentral parameters

Usuari anònim

116.77.168.69

@@ Línia 165: / Línia 165: @@
 cat('\nraw data')
 if (rawdata) (x);
+</pre>
+==方差分析教学课件==
+理解ANOVA是在作回归
+===总体与抽样次数设定===
+ANOVA在R里头和回归是同一个东西，几句话讲完。麻烦的是模拟数据：
+<Rform name="data">
+所有男生(所谓总体)用中文卷的智商均值是:<br/>
+　　<input name="xmc" value="105" size="8"/><br/>
+所有女生(所谓总体)用中文卷的智商均值是:<br/>
+　　<input name="xfc" value="101" size="8"/><br/>
+所有男生(所谓总体)用英文卷的智商均值是:<br/>
+　　<input name="xme" value="80" size="8"/><br/>
+所有女生(所谓总体)用英文卷的智商均值是:<br/>
+　　<input name="xfe" value="90" size="8"/><br/>
+抽样分成男中、女中、男英、女英四组，每组内抽的人数是：<br/>
+　　<input name="n" value="5" size="8"/><br/><br/>
+<input type="submit" name="submit" value="点击按键重复抽样，更新如下数据和结果"/><br/>
+智商的抽样误差（不一定是误出来的，但一定是抽出来的）满足正态分布，还假定已知各组内的总体标准差相同。这些抽样误差彼此独立，每次实验都重新变化。
+组内总体标准差为：<br/>
+　　<input name="sigma" value="10" size="8"/><br/><br/>
+</Rform>
+===可观测的数据及其影响的统计结论===
+每个观测到的智商成绩都包含随机的成分，所以实际观测到的数据不是各组总体的均值。
+<R name="data" iframe="height:1800px;">
+s_group = as.factor(c('Male','Male','Female','Female'));
+q_group = as.factor(c('Chinese ',' English','Chinese ',' English'));
+xmc = ifelse(exists("xmc"),as.numeric(xmc),105);
+xme = ifelse(exists("xme"),as.numeric(xme),80);
+xfc = ifelse(exists("xfc"),as.numeric(xfc),101);
+xfe = ifelse(exists("xfe"),as.numeric(xfe),90);
+sigma = ifelse(exists("sigma"),as.numeric(sigma),10);
+x_true_group = c(xmc,xme,xfc,xfe);
+##每组找n(=10)个被试；
+n = ifelse(exists("n"),as.integer(n),10);
+s = rep(s_group,each=n); q =rep(q_group,each=n);x_true = rep(x_true_group,each=n);
+## 但是每个被试都有随机抽样带来的偏差，标准差是sigma
+error = rnorm(n*4,mean=0,sd=sigma);
+x_observe = x_true + error;
+## 所以最终看到的是如下
+cat('<font color="blue">')
+data.frame(s,q,x_observe,x_true)
+cat('</font>')
+cat('\n最后一列总体值看不到\n想研究智商成绩在性别、语言上的区别\n')
+cat('也就是性别、语言对智商能不能部分地预测\n')
+cat('\n回归的表述：成绩=常数截距+性别预测增值+语言预测增值+二者交互增值+预测残差\n')
+cat('\n先理解一下真实的参数值,四组的总体\n')
+(lm(x_true ~ 0 + s:q))
+cat('\n先理解一下真实的参数值,四组的总体差异被拆解成三个差异分项\n')
+(lm(x_true ~ 1 + s + q + s:q))
+cat('\n基于观测值的预测会受到随机误差扰动，\n')
+cat('\n四组组内均值的观测\n')
+lm(x_observe ~ 0 + s:q)
+cat('\n四组三个差异分项的观测\n')
+(model = lm(x_observe ~ 1 + s + q + s:q))
+cat('\n考虑这种扰动之后，各个预测的95%把握置信区间上下界\n')
+cat('如果这个区间不包括0，表示有大于95%的把握确信这个系数有预测作用，无论大小。\n但是，交互项的存在使得解读变得困难。比如，男生-女生的效应包括sMale + (1/2) sMale:qChinese\n')
+confint(model)
+cat('\n<font color="red">ANOVA的表述：成绩的波动=性别预测增值的波动+语言预测增值的波动+残差波动\n')
+cat('“波动”的操作化定义是Sum of Squares(SS)，一列数与其均值的差距平方和\n')
+cat('Df表示每个波动项吸收抽样误差波动的理论比例\n')
+cat('Sum Sq表示每个波动项解释波动的观测比例，\n如果和Df的理论比例极端不匹配(F比1大很多)，就支持其中包含由不同总体带入的非随机波动\n')
+cat('最后一列Pr(>F)是一般的假设检验中反映F极端程度的p值\n\n')
+anova(model)
+cat('\n</font>')
+</R>
+===R代码===
+把下面的代码copy paste到R中，或者将语句贴到[http://pbil.univ-lyon1.fr/Rweb/ Rweb](中文UTF-8码)，结合#号后的注解理解方差分析
+<pre>
+## 数据:智商抽样，error是个体随机偏差（学名抽样误差，不一定是误出来的，但总是抽出来的）
+##4组被试依次性别如下，考试卷子语言如下
+(s_group = as.factor(c('Male','Male','Female','Female')));
+(q_group = as.factor(c('Chinese','English','Chinese','English')));
+## 男生总体用中文卷智商为105,女生总体用中文卷智商为101，男生总体用E文卷子智商为80，女生总体用E文卷智商为90
+x_true_group = c(105,80,101,90);
+## 组内总体标准差
+sigma = 10;
+##每组找n(=10)个被试；
+n = 10; s = rep(s_group,each=n); q =rep(q_group,each=n);x_true = rep(x_true_group,each=n); data.frame(s,q,x_true)
+## 但是每个被试都有随机抽样带来的偏差，标准差是15
+error = rnorm(4*n,mean=0,sd=sigma);
+x_observe = x_true + error;
+## 所以最终看到的是如下
+data.frame(s,q,x_observe)
+boxplot(x_observe~s*q)
+##以上都是数据模拟部分，如果已经有数据，直接完成下面的步骤
+##linear model模型，就是回归了；
+model = lm(x_observe ~ 1 + s + q + s:q ) ## ~符号后面的1表示截距,s:q表示交互作用项
+anova(model)
+plot(model,which=c(1,2))
 </pre>
@@ Línia 352: / Línia 453: @@
 If ''df'' on this display is set to <math>\infty</math> (''Inf'' in R) and noncentral parameter set to 0, a standard normal distribution will be plotted and critical ''z'' score calculated.
-===Noncentral <math>F</math>===
+===Noncentral ''F''===
 The noncentral parameter of ''F'' is only defined on its numerator. The noncentral ''F'' distributed
@@ Línia 489: / Línia 590: @@
 }
-pdf(rpdf, horizontal=FALSE, width=8, height=4)
+pdf(rpdf,  width=8, height=4)
@@ Línia 530: / Línia 631: @@
 For two-related-group case, the difference scores between each pair of samples can apply one-group mean test interface. Usually <math>\mu_{null}</math> is set to zero.
-===Power of <math>F</math> test for a given Cohen's <math>\tilde{f}^2</math>===
+===Power of ''F'' test for a given Cohen's ''f''===
 Let's use <math>\tilde{f}^2</math> denote the population of Cohen's <math>f^2</math>, specially
 :<math>SS(\mu_1,\mu_2,\cdots,\mu_K) \over {K\times \sigma^2}</math>
@@ Línia 548: / Línia 649: @@
 ==External links==
 * [http://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral ''t''-distribution] on Wikipedia
-* [http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution Noncentral <math>\chi^2</math> distribution] on Wikipedia
+* [http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution Noncentral chi-square distribution] on Wikipedia
 * [http://en.wikipedia.org/wiki/Noncentral_F-distribution Noncentral ''F''-distribution] on Wikipedia
 * [http://en.wikipedia.org/wiki/Effect_size#Confidence_interval_and_relation_to_noncentral_parameters Confidence interval of Effect Size] on Wikipedia