Xiaoxu LI (szpku.lixiaoxu@gmail.com), Shenzhen Graduate School of Peking University, Guangdong, China
Regression of User-Entered Data
<Rform name="owndata">
Enter your own data for a scatterplot:
You can use <a href="https://spreadsheets.google.com/ccc?key=0Aic4pmEZm32xclhJZm9hNWFyZlZOV1RSV19xWXRlbmc&hl=en">the free online spreadsheet</a> in Google Docs to edit your data before pasting. Just click the link; no login is needed.
<textarea name="mydata" rows="8">
1.262954285 3.8739569
-0.326233361 1.0400041
1.329799263 2.0161824
1.272429321 2.8284819
0.414641434 2.1324980
-1.539950042 0.4565291
-0.928567035 1.6093698
-0.294720447 0.9723025
-0.005767173 2.5310696
2.404653389 2.7861843
</textarea>
<input type="submit" value=" Submit ">
</Rform>
<R output="display" name="owndata" iframe="height:500px;">
if (exists("mydata")) {
  main <- "Data from user"
  x <- readdataSK(mydata, format="txt")
} else {
  main <- "Default data"
  set.seed(0)
  x <- matrix(rnorm(20), 10, 2)
  x[,2] <- 2.1 + x[,1]*.8 + x[,2]
  colnames(x) <- c('V1','V2')
}
pdf(rpdf, width=6, height=6)
lm.m <- lm(x[,2] ~ x[,1])
main <- paste(main, '\nV2 =', round(lm.m$coefficients[1],3), '+', round(lm.m$coefficients[2],3), '*V1 + ', round(summary(lm.m)$sigma,3), '*e')
plot(x, cex=2, main=main)
abline(lm.m)
</R>
Regression Analysis Courseware
Input Parameters
<Rform name="Tri"> 向量与、的夹角(90度为直角)分别是
- <Input name="cy1" value="89" size="5"/>度
和
- <Input name="cy2" value="89" size="5"/>度。
向量与的夹角是
- <Input name="c12" value="177.9" size="5"/>度。
这三个角度应当满足两两之和大于第三者。 这些向量的个分量代表个标准化之后的样本。
请设定样本量为
- <Input name="N" value="100" size="5"/>,输出模拟数据<Input name="rawdata" type="checkbox"/>
<input type="submit" /> </Rform>
Exercises
Observe how the two regression equations, Y on X_1 alone and Y on X_1 together with X_2, change before and after X_2 is added.
Two IVs that each correlate only minimally with the DV can nevertheless predict the DV extremely well
The three angles are 89, 89, and 177.9
Two IVs highly and positively correlated with the DV can nevertheless show a negative regression coefficient
The three angles are 5, 2.6, and 2.6
The predictive power (R^2) of two uncorrelated IVs for the DV is additive
The third angle is 90
Cases where R^2_1 + R^2_2 - R^2_12 grows from 0, then shrinks, and even turns negative (a sketch reproducing these cases follows this list)
Zero: the three angles are 60, 45, 90
Positive: the three angles are 60, 45, 45
Negative: the three angles are 60, 45, 15.1
Relation to redundancy: Cohen & Cohen (2003, p. 76)
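A standalone sketch that reproduces the quantity R^2_1 + R^2_2 - R^2_12 for the scenario angles above, reusing the same MASS::mvrnorm construction as the results code below (the helper name r2_gap is ours):
require(MASS)
r2_gap <- function(cy1, cy2, c12, N = 100) {
  S <- diag(3)
  S[1,2] <- S[2,1] <- cos(cy1/180*pi)
  S[1,3] <- S[3,1] <- cos(cy2/180*pi)
  S[2,3] <- S[3,2] <- cos(c12/180*pi)
  x <- mvrnorm(N, mu = c(0,0,0), Sigma = S, empirical = TRUE)
  Y <- x[,1]; X_1 <- x[,2]; X_2 <- x[,3]
  summary(lm(Y ~ 0 + X_1))$r.squared + summary(lm(Y ~ 0 + X_2))$r.squared -
    summary(lm(Y ~ 0 + X_1 + X_2))$r.squared
}
r2_gap(60, 45, 90)    # essentially zero: uncorrelated predictors, the R^2s add up
r2_gap(60, 45, 45)    # positive: redundancy
r2_gap(60, 45, 15.1)  # negative: suppression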
Results
<R output="html" name="Tri" iframe="height:400px;"> cy1 <- ifelse(exists("cy1"),as.numeric(cy1),89); cy2 <- ifelse(exists("cy2"),as.numeric(cy2),89); c12 <- ifelse(exists("c12"),as.numeric(c12),177.9); N <- ifelse(exists("N"),as.integer(N),100); rawdata <- ifelse(exists("rawdata"),as.logical(N),FALSE);
S <- matrix(rep(1,9),3); S[1,2]<-S[2,1]<-cos(cy1/180*pi); S[1,3]<-S[3,1]<-cos(cy2/180*pi); S[2,3]<-S[3,2]<-cos(c12/180*pi);
if ((det(S)<= 0 )|(N<1)) outHTML(rhtml,NA,title='Please check your input!\n Sum of any two angles should be larger than the third one.');
require(MASS);
x<-mvrnorm(n=N,mu=c(0,0,0),Sigma=S,empirical= TRUE); Y<-x[,1];X_1<-x[,2];X_2<-x[,3];
colnames(x)<-colnames(S)<-rownames(S)<-c('Y','X_1','X_2');
lm1 <- lm(Y~0+ X_1); lm2 <- lm(Y~0+ X_2); lm12 <- lm(Y~0+ X_1+X_2); R2<-matrix(rep(NA,3),nrow=3); rownames(R2)<-c('Y ~ 0+ X_1','Y ~ 0+ X_2','Y ~ 0+ X_1 + X_2'); R2[,1] <- c( summary(lm1)$r.squared, summary(lm2)$r.squared, summary(lm12)$r.squared); colnames(R2)[1]<-round( summary(lm1)$r.squared + summary(lm2)$r.squared - summary(lm12)$r.squared,4); outHTML(rhtml, t(R2), title="R^2_1+R^2_2-R^2_12", format="f", digits=4);
outHTML(rhtml, summary(lm1)$coefficients, title=rownames(R2)[1], format="f", digits=4);
outHTML(rhtml, summary(lm2)$coefficients, title=rownames(R2)[2], format="f", digits=4);
outHTML(rhtml, summary(lm12)$coefficients, title=rownames(R2)[3], format="f", digits=4);
outHTML(rhtml, S, title="correlation\n", format="f", digits=4); if (rawdata) outHTML(rhtml, x, title="Raw data\n", format="f", digits=4); </R>
R Code
cy1 <- 89    ## \angle YX_1
cy2 <- 89    ## \angle YX_2
c12 <- 177.9 ## \angle X_1X_2
N <- 100
rawdata <- TRUE
S <- matrix(rep(1, 9), 3)
S[1,2] <- S[2,1] <- cos(cy1/180*pi)
S[1,3] <- S[3,1] <- cos(cy2/180*pi)
S[2,3] <- S[3,2] <- cos(c12/180*pi)
require(MASS)  ## install.packages('MASS')
x <- mvrnorm(n=N, mu=c(0,0,0), Sigma=S, empirical=TRUE)
Y <- x[,1]; X_1 <- x[,2]; X_2 <- x[,3]
colnames(x) <- colnames(S) <- rownames(S) <- c('Y','X_1','X_2')
R2 <- matrix(rep(NA, 3), nrow=3)
colnames(R2) <- c('R^2')
rownames(R2) <- c('Y = b_1*X_1 + e', 'Y = b_2*X_2 + e', 'Y = b_1*X_1 + b_2*X_2 + e')
lm1  <- lm(Y ~ 0 + X_1)
lm2  <- lm(Y ~ 0 + X_2)
lm12 <- lm(Y ~ 0 + X_1 + X_2)
R2[,1] <- c(summary(lm1)$r.squared, summary(lm2)$r.squared, summary(lm12)$r.squared)
R2
R2[1,1] + R2[2,1] - R2[3,1]
summary(lm1)
summary(lm2)
summary(lm12)
cat('\ncorr')
S
cat('\nraw data')
if (rawdata) (x)
ANOVA Courseware
Understanding ANOVA as Regression
Setting the populations and the sample sizes
In R, ANOVA and regression are the same thing, which takes only a few sentences to explain. The troublesome part is simulating the data:
<Rform name="data">
The mean IQ of all male students (the so-called population) taking the Chinese-language test is:
<input name="xmc" value="105" size="8"/>
The mean IQ of all female students (the so-called population) taking the Chinese-language test is:
<input name="xfc" value="101" size="8"/>
The mean IQ of all male students (the so-called population) taking the English-language test is:
<input name="xme" value="80" size="8"/>
The mean IQ of all female students (the so-called population) taking the English-language test is:
<input name="xfe" value="90" size="8"/>
The sample is divided into four groups (male/Chinese, female/Chinese, male/English, female/English), and the number of participants drawn within each group is:
<input name="n" value="5" size="8"/>
<input type="submit" name="submit" value="Click to resample and update the data and results below"/>
The sampling error of IQ (not necessarily produced by a mistake, but always produced by sampling) follows a normal distribution, and the within-group population standard deviations are assumed known and equal. These sampling errors are mutually independent and are redrawn in every experiment.
The within-group population standard deviation is:
<input name="sigma" value="10" size="8"/>
</Rform>
The observable data and the statistical conclusions they support
Every observed IQ score contains a random component, so the data actually observed are not the population means of the groups.
<R name="data" iframe="height:1800px;">
s_group = as.factor(c('Male','Male','Female','Female'))
q_group = as.factor(c('Chinese ',' English','Chinese ',' English'))
xmc = ifelse(exists("xmc"), as.numeric(xmc), 105)
xme = ifelse(exists("xme"), as.numeric(xme), 80)
xfc = ifelse(exists("xfc"), as.numeric(xfc), 101)
xfe = ifelse(exists("xfe"), as.numeric(xfe), 90)
sigma = ifelse(exists("sigma"), as.numeric(sigma), 10)
x_true_group = c(xmc, xme, xfc, xfe)
- Draw n (= 10) participants within each group;
n = ifelse(exists("n"), as.integer(n), 10)
s = rep(s_group, each=n)
q = rep(q_group, each=n)
x_true = rep(x_true_group, each=n)
- But every participant carries a random deviation due to sampling, with standard deviation sigma:
error = rnorm(n*4, mean=0, sd=sigma)
x_observe = x_true + error
- So what is finally observed is the following:
cat('')
data.frame(s, q, x_observe, x_true)
cat('')
cat('\nThe last column of true values cannot be observed.\nWe want to study how IQ scores differ by gender and by test language,\n')
cat('that is, whether gender and language can partly predict IQ.\n')
cat('\nRegression formulation: score = intercept + gender increment + language increment + interaction increment + residual\n')
cat('\nFirst look at the true parameter values: the population means of the four groups\n')
(lm(x_true ~ 0 + s:q))
cat('\nFirst look at the true parameter values: the four population means decomposed into three difference terms\n')
(lm(x_true ~ 1 + s + q + s:q))
cat('\nEstimates based on the observed values are disturbed by random error.\n')
cat('\nObserved means of the four groups\n')
lm(x_observe ~ 0 + s:q)
cat('\nObserved three difference terms of the four groups\n')
(model = lm(x_observe ~ 1 + s + q + s:q))
cat('\nTaking this disturbance into account, the lower and upper bounds of the 95% confidence interval of each coefficient are\n')
cat('If an interval does not include 0, we can be more than 95% confident that the coefficient has some predictive value, whatever its size.\nHowever, the interaction term makes interpretation difficult. For example, the male-female effect includes sMale + (1/2) sMale:qChinese\n')
confint(model)
cat('\nANOVA formulation: variation of scores = variation from gender + variation from language + residual variation\n')
cat('The operational definition of "variation" is the Sum of Squares (SS): the sum of squared deviations of a column of numbers from its mean.\n')
cat('Df gives the theoretical share of the sampling-error variation that each term absorbs.\n')
cat('Sum Sq gives the observed share of variation explained by each term;\nif it is extremely mismatched with the theoretical share given by Df (F much larger than 1), this supports the presence of non-random variation contributed by different populations.\n')
cat('The last column Pr(>F) is the usual p value of a hypothesis test, reflecting how extreme F is.\n\n')
anova(model)
cat('\n')
</R>
R Code
Copy and paste the code below into R, or paste the statements into Rweb (Chinese UTF-8 encoding), and use the comments after the # signs to understand the analysis of variance.
## Data: IQ sampling; error is each individual's random deviation (formally the "sampling error": not necessarily produced by a mistake, but always produced by sampling)
## The genders of the 4 groups of participants, and the languages of their test papers, are as follows
(s_group = as.factor(c('Male','Male','Female','Female')))
(q_group = as.factor(c('Chinese','English','Chinese','English')))
## Population mean IQ: males on the Chinese test 105, males on the English test 80, females on the Chinese test 101, females on the English test 90
x_true_group = c(105,80,101,90)
## Within-group population standard deviation
sigma = 10
## Draw n (= 10) participants within each group
n = 10
s = rep(s_group, each=n)
q = rep(q_group, each=n)
x_true = rep(x_true_group, each=n)
data.frame(s, q, x_true)
## But every participant carries a random deviation due to sampling, with standard deviation sigma
error = rnorm(4*n, mean=0, sd=sigma)
x_observe = x_true + error
## So what is finally observed is the following
data.frame(s, q, x_observe)
boxplot(x_observe ~ s*q)
## Everything above is data simulation; if you already have data, go straight to the steps below
## The linear model is just regression
model = lm(x_observe ~ 1 + s + q + s:q)  ## the 1 after ~ is the intercept; s:q is the interaction term
anova(model)
plot(model, which=c(1,2))
Online calculator for critical values, cumulative probabilities, and critical noncentral parameters
Input
<Rform name="dncpx">
Choose a statistic: <input type="radio" name="name" value="t" checked/> t, <input type="radio" name="name" value="chisq"/> chi-square, or <input type="radio" name="name" value="F"/> F (the noncentral parameter of chi-square or F must be non-negative)
noncentral parameter (ncp)
- ncp = <input name="ncp" type="text" size="5" maxlength="10" value="4.5"> ( vs. ncp_c = <input name="ncp_c" type="text" size="5" maxlength="10" value="0"> for the blue comparison curve )
degrees of freedom (the numerator's for F)
- df1 = <input name="df1" type="text" size="5" maxlength="10" value="4">
degrees of freedom of the denominator (only for F)
- df2 = <input name="df2" type="text" size="5" maxlength="10" value="3">
Mark the points with cumulative probability
- q0 = <input type="input" name="q0" maxlength="10" value=".025" size="5">
and the critical statistic
- x1 = <input type="input" name="x1" maxlength="10" value="5.1" size="5">
Click to update display with precision <input type="submit" name="submit" value=".001"><input type="submit" name="submit" value=".0001"><input type="submit" name="submit" value=".000001"><input type="submit" name="submit" value="fully">
</Rform>
Results of critical statistic, cumulative probability, and critical noncentral parameter
<R output="display" name="dncpx" iframe="height:400px;"> digits <- ifelse(exists("submit"),ifelse(as.character(submit)=='fully',Inf,round(log(as.numeric(submit),.1))),3) if (exists("ncp")) ncp <- as.numeric(ncp) else ncp <- 4.5 ncp_c <- ifelse(exists("ncp_c"), as.numeric(ncp_c),0); if (exists("df1")) df1 <- as.numeric(df1) else df1 <- 4 if (exists("df2")) df2 <- as.numeric(df2) else df2 <- 3 if (exists("name")) name <- as.character(name) else name <- 't' if (exists("q0")) q0 <- as.numeric(q0) else q0 <- .025 if (exists("x1")) x1 <- as.numeric(x1) else x1 <- 5.1
ncpt<-function(x,q,df,confirm=FALSE){ .f<-function(ncp,x,df,q)abs(q-pt(x,df=df,ncp=ncp)) .n<-1; while ( ( (pt(x,df=df,ncp=-.n) < q+(1-q)/2 ) | (pt(x,df=df,ncp=.n) > q/2) ) & (.n < Inf) ) .n <- .n *2 ; if (confirm) optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n)) else optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))$minimum }
ncpchisq<-function(x,q,df,confirm=FALSE){ .f<-function(ncp,x,df,q)abs(q - pchisq(x,df=df,ncp=ncp)) if (pchisq(x,df=df)<=q){ if (confirm) { minimum <-0; objective <- pchisq(x,df=df)-q; data.frame(minimum,objective) }else 0 }else { .n<- 1; while ( (pchisq(x,df=df,ncp=.n) > q/2) & (.n < Inf) ) .n <- .n + 1; if (confirm) optimize(f=.f,x= x,df=df,q=q,interval=c(0,.n)) else optimize(f=.f,x= x,df=df,q=q,interval=c(0,.n))$minimum } }; ncpf<-function(x,q,df1,df2,confirm=FALSE){ .f<-function(ncp,x,df1,df2,q)abs(q - pf(x,df1=df1,df2=df2,ncp=ncp)) if (pf(x,df1=df1,df2=df2)<=q){ if (confirm) { minimum <-0; objective <- pf(x,df1=df1,df2=df2)-q; data.frame(minimum,objective) }else 0 }else { .n<- 1; while ( (pf(x,df1=df1,df2=df2,ncp=.n) > q/2) & (.n < Inf) ) .n <- .n +1 ; if (confirm) optimize(f=.f,x= x,df1=df1,df2=df2,q=q,interval=c(0,.n)) else optimize(f=.f,x= x,df1=df1,df2=df2,q=q,interval=c(0,.n))$minimum } };
d.f <- function(x,df1,df2,ncp=0) { delta <- 10^-6; (pf(x+delta,df1,df2,ncp)-pf(x-delta,df1,df2,ncp))/(2*delta) } df <- df1
if (name=='chisq'){ x <- seq(.001,ncp+df*9,length.out=500); pr <- dchisq(x,df=df,ncp=ncp); pb <- dchisq(x,df=df,ncp=ncp_c); x0 <- qchisq(q0,df=df,ncp=ncp); q1 <- pchisq(x1,df=df,ncp=ncp); ncp2 <- ncpchisq(x=x1,q=q0,df=df); pg <- dchisq(x,df=df,ncp=ncp2); q2 <- pchisq(x1,df=df,ncp=ncp2);
}else if (name=='F'){ x <- seq(.001,ncp/df1+9,length.out=500); pr <- d.f(x,df1=df,df2=df2,ncp=ncp); pb <- d.f(x,df1=df,df2=df2,ncp=ncp_c); x0 <- qf(q0,df1=df1,df2=df2,ncp=ncp); q1 <- pf(x1,df1=df1,df2=df2,ncp=ncp); ncp2 <- ncpf(x=x1,q=q0,df1=df1,df2=df2); pg <- d.f(x,df1=df1,df2=df2,ncp=ncp2); q2 <- pf(x1,df1=df1,df2=df2,ncp=ncp2);
}else{ x <- seq(min(ncp,ncp_c)-8,max(ncp,ncp_c)+8,length.out=500) pr <- dt(x,df=df,ncp=ncp); pb <- dt(x,df=df,ncp=ncp_c); x0 <- qt(q0,df=df,ncp=ncp); q1 <- pt(x1,df=df,ncp=ncp); ncp2 <- ncpt(x=x1,q=q0,df=df); pg <- dt(x,df=df,ncp=ncp2); q2 <- pt(x1,df=df,ncp=ncp2); } if (df <= 2) { pr[pr<=0] <- NA; pb[pb<=0] <- NA; pg[pg<=0] <- NA; pr[pr>10] <- NA; pb[pb>10] <- NA; pg[pg>10] <- NA; }
x0 <- round(x0,digits); q1 <- round(q1,digits); ncp2 <- round(ncp2,digits); q2 <- round(q2,digits);
pdf(rpdf, width=5, height=5)
main <- paste('Pr(', name, '<', x0, ';df=', df, ',ncp=', ncp, ')=', q0,
              '\nPr(', name, '<', x1, ';df=', df, ',ncp=', ncp, ')=', q1,
              '\nPr(', name, '<', x1, ';df=', df, ',ncp=', ncp2, ')=', q2, sep='')
sub <- paste('\nnoncentral parameter = ', ncp, '(Red), ', ncp_c, '(Blue),\nand ', round(ncp2, digits),
             '(Black) which fits \nPr(', name, '<', x1, ';ncp=', round(ncp2, digits), ')=', round(q2, digits), sep='')
plot(c(x, x), c(pr, pb), type='n', main=main, sub=sub, xlab='', ylab=paste(name, 'probability density'))
points(x[x<=max(x0,x1)], pr[x<=max(x0,x1)], col='green', type='h')
points(x[x<=min(x0,x1)], pr[x<=min(x0,x1)], col='yellow', type='h')
lines(x, pb, col='blue')
lines(x, pr, col='red')
lines(x, pg)
</R>
z and noncentral distributions (chi-square, t, and F)
Noncentral chi-square
Let $Z_i$, $i = 0, 1, 2, \ldots$ denote a series of independent random variables with the standard normal distribution.
$\sum_{i=1}^{df} Z_i^2$ will be a random variable with the $\chi^2$ distribution with df degrees of freedom. For any given series of constants $\delta_i$, $i = 1, 2, \ldots, df$,
$\sum_{i=1}^{df} (Z_i + \delta_i)^2$ will be a random variable with the respective noncentral $\chi^2$ distribution with the same df and the distinct noncentral parameter
- $ncp = \sum_{i=1}^{df} \delta_i^2$.
It is different from $\sum_{i=1}^{df} Z_i^2 + ncp$, the random variable of the respective central distribution with a central drift.
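A simulation sketch (not part of the page's calculator) illustrating the construction above:
## A sum of squared shifted standard normals follows the noncentral chi-square
## with ncp = sum(delta_i^2).
set.seed(1)
df <- 4
delta <- c(1, 0.5, -0.5, 2)                         # arbitrary constants delta_i
ncp <- sum(delta^2)
z <- matrix(rnorm(100000 * df), ncol = df)
x <- rowSums((z + rep(delta, each = 100000))^2)     # add delta_i to the i-th column
c(mean(x), df + ncp)                                # empirical vs. theoretical mean
c(mean(x <= 10), pchisq(10, df = df, ncp = ncp))    # empirical vs. theoretical CDF at 10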
Noncentral t
For any given constant $\delta$,
$t' = \frac{Z_0 + \delta}{\sqrt{\left(\sum_{i=1}^{df} Z_i^2\right)/df}}$ is a random variable with the noncentral t-distribution with noncentrality parameter
- $ncp = \delta$,
which is different from the central t-distributed random variable shifted by the same constant $\delta$.
If df on this display is set to $\infty$ (Inf in R) and the noncentral parameter set to 0, a standard normal distribution will be plotted and the critical z score calculated.
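A simulation sketch of the construction above, and of the df = Inf remark:
set.seed(1)
df <- 4; delta <- 1.5
tsim <- (rnorm(100000) + delta) / sqrt(rchisq(100000, df = df) / df)
quants <- c(-1, 0, 1, 2, 4)
rbind(empirical   = sapply(quants, function(q) mean(tsim <= q)),
      theoretical = pt(quants, df = df, ncp = delta))
all.equal(pt(quants, df = Inf), pnorm(quants))      # df = Inf, ncp = 0 gives the standard normal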
Noncentral F
The noncentral parameter of F is defined only on its numerator. The noncentral F-distributed random variable
$F' = \frac{\left(\sum_{i=1}^{df_1} (Z_i + \delta_i)^2\right)/df_1}{\left(\sum_{j=1}^{df_2} Z_j'^2\right)/df_2}$
with noncentral parameter
- $ncp = \sum_{i=1}^{df_1} \delta_i^2$
is different from the central F-distributed random variable plus the respective constant.
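A simulation sketch of this construction:
## A noncentral chi-square numerator over an independent central chi-square
## denominator (each divided by its df) gives the noncentral F.
set.seed(1)
df1 <- 4; df2 <- 3; ncp <- 4.5
fsim <- (rchisq(100000, df = df1, ncp = ncp) / df1) /   # noncentrality in the numerator only
        (rchisq(100000, df = df2)            / df2)
c(mean(fsim <= 5.1), pf(5.1, df1 = df1, df2 = df2, ncp = ncp))  # empirical vs. theoretical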
Confidence interval of standardized effect size by noncentral parameters
Confidence intervals of unstandardized effect sizes, such as a difference of means, can be found in common statistics textbooks and software, while confidence intervals of standardized effect sizes, especially Cohen's $d$ and $f$, rely on the calculation of confidence intervals of noncentral parameters (ncp).
A common method to find the confidence interval limits of the ncp is to solve for the critical ncp value at a marginal extreme quantile. The ncp of the black curve in the diagram above can be adopted directly. For example, with the page defaults (observed $t = 5.1$, df = 4, quantile .025), $ncp \le 8.968$ serves as a one-sided 97.5% confidence interval of the ncp; solving for the quantile .975 as well gives the two-sided interval (1.139, 8.968) at the 95% confidence level.
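The same solve step can be sketched with uniroot() instead of the ncpt() optimizer defined in the calculator code above (the helper name ncp_limit is ours); with the page defaults t = 5.1 and df = 4 it should roughly reproduce the interval quoted above:
## pt() is decreasing in ncp, so a sign change is guaranteed inside wide search bounds.
ncp_limit <- function(x1, df, q, lower = -30, upper = 30) {
  uniroot(function(ncp) pt(x1, df = df, ncp = ncp) - q, lower = lower, upper = upper)$root
}
upper_ncp <- ncp_limit(5.1, df = 4, q = 0.025)  # 97.5% one-sided upper limit (about 8.97)
lower_ncp <- ncp_limit(5.1, df = 4, q = 0.975)  # lower limit of the two-sided 95% CI (about 1.14)
c(lower_ncp, upper_ncp)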
In the case of a single group, M ($\mu$) denotes the sample (population) mean of the group, and SD ($\sigma$) denotes the sample (population) standard deviation; N is the sample size of the group. The t test is used for the hypothesis on the difference between the mean and a baseline $\mu_0$. Usually $\mu_0$ is zero, though this is not necessary. In the case of two related groups, the single group is constructed from the difference within each pair of observations, and SD ($\sigma$) then denotes the sample (population) standard deviation of the differences rather than that within the two original groups.
- $t = \sqrt{N}\,\frac{M - \mu_0}{SD}$, and Cohen's $d = \frac{M - \mu_0}{SD}$ is the point estimate of $\delta = \frac{\mu - \mu_0}{\sigma}$.
So,
- $ncp = \sqrt{N}\,\delta$.
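As a sketch (using the form's default N = 16 and delta = 0.75), the relation ncp = sqrt(N) * delta can be checked against power.t.test():
N <- 16; delta <- 0.75; alpha <- 0.05
ncp <- sqrt(N) * delta
tcrit <- qt(1 - alpha, df = N - 1)                 # one-sided critical value under H0
1 - pt(tcrit, df = N - 1, ncp = ncp)               # power via the noncentral t
power.t.test(n = N, delta = delta, sd = 1, sig.level = alpha,
             type = "one.sample", alternative = "one.sided")$power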
T test for mean difference between two independent groups
$n_1$ or $n_2$ is the sample size within the respective group.
- $t = \frac{M_1 - M_2}{SD_{pooled}\,\sqrt{1/n_1 + 1/n_2}}$, wherein $SD_{pooled} = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}$.
- Cohen's $d = \frac{M_1 - M_2}{SD_{pooled}}$ is the point estimate of $\delta = \frac{\mu_1 - \mu_2}{\sigma}$.
So,
- $ncp = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\delta$.
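A similar sketch for two independent groups with illustrative values; power.t.test() drops the negligible opposite-tail probability, so the two values agree to about four decimals:
n1 <- n2 <- 20; delta <- 0.6; alpha <- 0.05
ncp <- sqrt(n1 * n2 / (n1 + n2)) * delta
df <- n1 + n2 - 2
tcrit <- qt(1 - alpha/2, df = df)                  # two-sided critical value under H0
1 - pt(tcrit, df = df, ncp = ncp) + pt(-tcrit, df = df, ncp = ncp)   # power via noncentral t
power.t.test(n = n1, delta = delta, sd = 1, sig.level = alpha,
             type = "two.sample", alternative = "two.sided")$power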
One-way ANOVA test for mean difference across multiple independent groups
The one-way ANOVA test uses the noncentral F distribution; with a given (known) population standard deviation $\sigma$, the same test question uses the noncentral chi-square distribution instead.
For the j-th observation within the i-th group ($i = 1, \ldots, K$), denote $x_{ij} = \mu_i + e_{ij}$ with $e_{ij} \sim N(0, \sigma^2)$, and let $\bar\mu = \sum_i n_i \mu_i / N$ denote the grand population mean, where N is the total sample size.
With Cohen's $f$ defined by $f^2 = \sum_i n_i (\mu_i - \bar\mu)^2 / (N \sigma^2)$, both ncp(s) of F and $\chi^2$ equate
- $ncp = N f^2 = \frac{\sum_i n_i (\mu_i - \bar\mu)^2}{\sigma^2}$.
In the case of K independent groups of the same size n, the total sample size is $N = nK$ and
- $ncp = nK f^2$.
A t test between a pair of independent groups is a special case of one-way ANOVA. Note that the noncentral parameter of F is not directly comparable to the noncentral parameter of the corresponding t: actually $F = t^2$, and $ncp_F = ncp_t^2$ in that case.
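A quick numerical sketch of these two relations (arbitrary simulated data):
set.seed(2)
g <- factor(rep(c('A','B'), each = 10))
y <- rnorm(20, mean = ifelse(g == 'A', 0, 0.8))    # illustrative data, delta = 0.8
tval <- t.test(y ~ g, var.equal = TRUE)$statistic  # pooled-variance t
Fval <- anova(lm(y ~ g))[1, "F value"]             # one-way ANOVA F
c(t_squared = unname(tval)^2, F = Fval)            # identical up to rounding
ncp_t <- sqrt(10 * 10 / (10 + 10)) * 0.8           # ncp of t for n1 = n2 = 10
c(ncp_t = ncp_t, ncp_F = ncp_t^2)                  # ncp of the corresponding F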
RMSEA of Structural Equation Model
The ncp of the $\chi^2$ reported by structural equation modeling software is proportional to the population value of $RMSEA^2$, i.e., the squared distance per df from the population variance-covariance matrix to the model space.
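A sketch assuming the common ML-based sample formula (some software divides by N instead of N - 1), where max(chisq - df, 0) estimates the ncp; the fit statistics below are hypothetical:
rmsea <- function(chisq, df, N) sqrt(max((chisq - df) / (df * (N - 1)), 0))
rmsea(chisq = 85, df = 40, N = 300)                # hypothetical fit statistics, for illustration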
Power vs. Standardized Effect Size or ncp
Power of t test for a given Cohen's $d$
Example of one-group mean test
Input
<Rform name="tEx">
A normally distributed population, for example, the IQ distribution of students, is sampled
- N = <Input name="N" size="3" value="16"/>
times independently. The mean and standard deviation estimates from the sample are denoted M and s, respectively, in the current replication.
The statistical interest is usually in the population mean, named $\mu$; sometimes also in the population standard deviation, named $\sigma$. The t statistic is defined as $t = \frac{M - \mu_0}{s/\sqrt{N}}$.
It measures whether or not $\mu$ is significantly
- <select name="direction"><option value="gt" selected>greater than</option><option value="ne" >different from</option><option value="lt">less than</option></select> a baseline $\mu_0$ = <input name="mu0" size="5" value="100.00"/>,
relative to the scale of the standard error estimate of M. If $\mu$ really equals $\mu_0$, the distribution of the t statistic is known, with noncentral parameter 0 and $N - 1$ degrees of freedom.
Type I error, denoted
- =<Input name="alpha" size="3" value=".05"/>,
defines the probability domain of the extreme values.
However, the real $\mu$ may be some $\mu_1$ rather than $\mu_0$. Then the noncentral parameter of the distribution of the t statistic changes to $ncp = \sqrt{N}\,\delta$,
wherein $\delta = \frac{\mu_1 - \mu_0}{\sigma}$ is estimated by Cohen's $d = \frac{M - \mu_0}{s}$. A known/hypothesized $\delta$, e.g.
- $\delta$ = <input name="delta" size="4" value="0.75">,
together with the sample size N, gives a known/hypothesized noncentral t distribution, while a $\delta$ alone without a given N is not enough.
Change
- =<Input name="M" size="5" value="105.22"/> and =<Input name="s" size="5" value="16.72" />,
then verify whether they affect the statistical power.
<input type="submit" value="Click to Update"/> </Rform>
Results
<R output="display" name="tEx" iframe="height:320px;"> N <- ifelse(exists("N"), as.numeric(N),16) mu0 <- ifelse(exists("mu0"), as.numeric(mu0),100) alpha <- ifelse(exists("alpha"), as.numeric(alpha),.05) delta <- ifelse(exists("delta"), as.numeric(delta),0.75) M <- ifelse(exists("M"), as.numeric(M),105.22) s <- ifelse(exists("s"), as.numeric(s),16.72) direction <- ifelse(exists("direction"), as.character(direction),"gt")
ncpt<-function(x,q,df,confirm=FALSE){ if (q<=0) (+Inf) else if (q>=1) (-Inf) else if ((q>0)&(q<1)) { .f<-function(ncp,x,df,q)abs(q-pt(x,df=df,ncp=ncp)) .n<-1; while ( ( (pt(x,df=df,ncp=-.n) < q+(1-q)/2 ) | (pt(x,df=df,ncp=.n) > q/2) ) & (.n < Inf) ) .n <- .n *2 ; if (confirm) optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n)) else optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))$minimum } }
pdf(rpdf, width=8, height=4)
alpha_r <- ifelse(direction == "ne",alpha/2, ifelse(direction == "gt",alpha,0));
alpha_l <- alpha - alpha_r;
se <- s/sqrt(N);
df <- N-1;
ncp <- sqrt(N)*delta;
t <-(M-mu0)/se;
tc_r <- qt(1-alpha_r,df=df);
tc_l <- qt(alpha_l,df=df);
d_l <- ncpt(x=t,q=1-alpha_r,df=df)/sqrt(N);
d_r <- ncpt(x=t,q=alpha_l,df=df)/sqrt(N);
sub <- paste("H0:blue central t; H1:red noncentral t\n",1-alpha," confidence interval of Cohen's delta\n",round(d_l,4)," ~ ",round(d_r,4),sep="");
if (direction == "ne") main1=paste("two-tail p value", round(1-pt(abs(t),df=df)+pt(-abs(t),df=df),4));
if (direction == "gt") main1=paste("right-tail p value", round(1-pt(t,df=df),4));
if (direction == "lt") main1=paste("left-tail p value", round(pt(t,df=df),4));
main1=paste("t value",round(t,4),",",main1,"\n",1-alpha,"confidence interval of mean\n",round(M+se*qt(alpha_r,df=df),4),"~",round(M-se*qt(alpha_l,df=df),4));
x <- seq(min(-4, ncp-5), max(4, ncp+5), length.out=200)
op <- par(mfrow=c(1,2))
plot(x, dt(x, df=df), main=main1, sub=sub, xlab='', ylab='', type='l', col='blue')
points(x, dt(x, df=df, ncp=ncp), type='l', col='red')
x_reject <- c(x[(x > tc_r) | (x < tc_l)], tc_r, tc_l)
points(x_reject, dt(x_reject, df=df, ncp=ncp), type='h', col='grey90')
points(x_reject, dt(x_reject, df=df, ncp=0), type='h', col='grey70')
points(t, dt(t, df=df, ncp=0))
points(t, dt(t, df=df, ncp=0), type='h')
main2 = paste("statistical power is", round(1 - pt(tc_r, df=df, ncp=ncp) + pt(tc_l, df=df, ncp=ncp), 4), ".")
xN <- round(N/3):(3*N)
plot(xN, 1 - pt(tc_r, df=df, ncp=sqrt(xN)*delta) + pt(tc_l, df=df, ncp=sqrt(xN)*delta),
     main=main2, xlab='sample size', ylab='statistical power', col='gray')
points(N, 1 - pt(tc_r, df=df, ncp=sqrt(N)*delta) + pt(tc_l, df=df, ncp=sqrt(N)*delta), col='red', type='h')
par(op)
</R>
For the two-related-groups case, the difference scores between each pair of observations can be analyzed with the one-group mean test interface above. Usually $\mu_0$ is set to zero.
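A sketch of the paired case via power.t.test(type = "paired"), with illustrative values:
## The paired case is a one-group test on the difference scores; delta_d is Cohen's d
## of the differences.
N <- 16; delta_d <- 0.75; alpha <- 0.05
tcrit <- qt(1 - alpha, df = N - 1)
1 - pt(tcrit, df = N - 1, ncp = sqrt(N) * delta_d)               # power via noncentral t
power.t.test(n = N, delta = delta_d, sd = 1, sig.level = alpha,
             type = "paired", alternative = "one.sided")$power   # same machinery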
Power of F test for a given Cohen's f
Let us use $f$ to denote the population value of Cohen's $f$; specifically,
$f = \frac{\sqrt{\sum_{i=1}^{K} (\mu_i - \bar\mu)^2 / K}}{\sigma}$
in a one-way ANOVA setup of K groups with within-group sample size n and within-group population means $\mu_i$, respectively. The noncentral parameter of the corresponding F or $\chi^2$ distribution is $ncp = nKf^2$.
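A sketch computing this power from the noncentral F with ncp = n*K*f^2, cross-checked by a small simulation (the population means below are illustrative):
K <- 4; n <- 20; alpha <- 0.05
mu <- c(-0.3, -0.1, 0.1, 0.3)                       # within-group sd assumed to be 1
f <- sqrt(mean((mu - mean(mu))^2))                  # Cohen's f = sd of the means / sigma
ncp <- n * K * f^2
Fcrit <- qf(1 - alpha, df1 = K - 1, df2 = K * (n - 1))
1 - pf(Fcrit, df1 = K - 1, df2 = K * (n - 1), ncp = ncp)   # theoretical power
set.seed(3)
g <- factor(rep(1:K, each = n))
mean(replicate(2000, {                              # Monte Carlo rejection rate, approximate agreement expected
  y <- rnorm(n * K, mean = rep(mu, each = n))
  anova(lm(y ~ g))[1, "Pr(>F)"] < alpha
}))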
Power of SEM close-fit test for a given RMSEA
How to cite this page in APA style
In APA style this page can be cited in reference lists as follows --
Comparison of noncentral and central distributions. (yyyy, Month dd). In SlideWiki. Retrieved HH:MM, Month dd, yyyy, from http://mars.wiwi.hu-berlin.de/mediawiki/slides/index.php/Comparison_of_noncentral_and_central_distributions
For other styles, refer to the examples on Wikipedia.
External links
- Noncentral t-distribution on Wikipedia
- Noncentral chi-square distribution on Wikipedia
- Noncentral F-distribution on Wikipedia
- Confidence interval of Effect Size on Wikipedia