Usuari:Lixiaoxu
Xiaoxu LI szpku.lixiaoxu@gmail.com Shenzhen Graduate School of Peking University, Guangdong, China
Regression of Inputted Data
<Rform name="owndata">
Enter your own data for a scatterplot:
You can use <a href="https://spreadsheets.google.com/ccc?key=0Aic4pmEZm32xclhJZm9hNWFyZlZOV1RSV19xWXRlbmc&hl=en">the free online spreadsheet </a> of Google Docs to edit your data before pasting. Just click the link; no login is needed.
<textarea name="mydata" rows="8">
1.262954285 3.8739569
-0.326233361 1.0400041
1.329799263 2.0161824
1.272429321 2.8284819
0.414641434 2.1324980
-1.539950042 0.4565291
-0.928567035 1.6093698
-0.294720447 0.9723025
-0.005767173 2.5310696
2.404653389 2.7861843
</textarea>
<input type="submit" value=" Submit ">
</Rform>
<R output="display" name="owndata" iframe="height:500px;">
if (exists("mydata")) {
  main <- "Data from user"
  x <- readdataSK(mydata, format="txt")
} else {
  main <- "Default data"
  set.seed(0)
  x <- matrix(rnorm(20),10,2)
  x[,2] <- 2.1+x[,1]*.8+x[,2]
  colnames(x) <- c('V1','V2')
}
pdf(rpdf, width=6, height=6)
lm.m <- lm(x[,2]~x[,1])
main <- paste(main,'\nV2 =',round(lm.m$coefficients[1],3),'+',round(lm.m$coefficients[2],3),'*V1 + ',round(summary(lm.m)$sigma,3),'*e')
plot(x, cex=2, main=main)
abline(lm.m)
</R>
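Regression analysis courseware
Input parameters
<Rform name="Tri"> The angles (in degrees; 90 is a right angle) between vector <math>Y</math> and the vectors <math>X_1</math> and <math>X_2</math> are, respectively,
- <Input name="cy1" value="89" size="5"/> degrees
and
- <Input name="cy2" value="89" size="5"/> degrees.
The angle between vectors <math>X_1</math> and <math>X_2</math> is
- <Input name="c12" value="177.9" size="5"/> degrees.
The sum of any two of these three angles must be greater than the third. The <math>N</math> components of each vector represent <math>N</math> standardized observations.
Set the sample size <math>N</math> to
- <Input name="N" value="100" size="5"/>, output the simulated data <Input name="rawdata" type="checkbox"/>
<input type="submit" /> </Rform>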
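Exercises
Observe how the estimates of the regression equations
<math>Y = b_1 X_1 + e</math>
and
<math>Y = b_1 X_1 + b_2 X_2 + e</math>
change when <math>X_2</math> is added.
Two IVs that each correlate very weakly with the DV can nevertheless predict the DV extremely well
The three angles are 89, 89, 177.9
Two IVs that both correlate highly and positively with the DV can nevertheless yield a negative regression coefficient
The three angles are 5, 2.6, 2.6
The predictive power (<math>R^2</math>) of two uncorrelated IVs for the DV is additive
The third angle is 90
Cases in which (<math>R^2_1 + R^2_2 - R^2_{12}</math>) starts at 0, becomes positive, then shrinks and even turns negative
Zero: the three angles are 60, 45, 90
Positive: the three angles are 60, 45, 45
Negative: the three angles are 60, 45, 15.1
Relation to redundancy: Cohen & Cohen (2003, p. 76)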
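The exercise cases can also be checked without simulation. The sketch below assumes standardized variables, so that each correlation is the cosine of the corresponding angle, and uses the usual closed-form solution of a two-predictor regression without intercept. The helper name `angle_regression` is hypothetical and not part of the page's own code.

```r
# Hypothetical helper: closed-form results for standardized Y, X_1, X_2
angle_regression <- function(cy1, cy2, c12) {
  r1  <- cos(cy1 / 180 * pi)  # cor(Y, X_1)
  r2  <- cos(cy2 / 180 * pi)  # cor(Y, X_2)
  r12 <- cos(c12 / 180 * pi)  # cor(X_1, X_2)
  b1  <- (r1 - r2 * r12) / (1 - r12^2)  # coefficient of X_1 in the joint model
  b2  <- (r2 - r1 * r12) / (1 - r12^2)  # coefficient of X_2 in the joint model
  R2_12 <- b1 * r1 + b2 * r2            # R^2 of Y ~ 0 + X_1 + X_2
  c(b1 = b1, b2 = b2, R2_1 = r1^2, R2_2 = r2^2, R2_12 = R2_12,
    diff = r1^2 + r2^2 - R2_12)
}

round(rbind(
  weak_but_jointly_strong = angle_regression(89, 89, 177.9),
  negative_coefficient    = angle_regression(5, 2.6, 2.6),
  zero                    = angle_regression(60, 45, 90),
  positive                = angle_regression(60, 45, 45),
  negative                = angle_regression(60, 45, 15.1)
), 4)
```

The `diff` column is the quantity R^2_1+R^2_2-R^2_12 that the results table on this page uses as its title.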
Results
<R output="html" name="Tri" iframe="height:400px;">
# Form values, with defaults when the form has not been submitted
cy1 <- ifelse(exists("cy1"),as.numeric(cy1),89);    # angle between Y and X_1
cy2 <- ifelse(exists("cy2"),as.numeric(cy2),89);    # angle between Y and X_2
c12 <- ifelse(exists("c12"),as.numeric(c12),177.9); # angle between X_1 and X_2
N <- ifelse(exists("N"),as.integer(N),100);
# as.logical(N) is TRUE for any nonzero N, so a ticked checkbox switches the raw data on
rawdata <- ifelse(exists("rawdata"),as.logical(N),FALSE);
# Correlation matrix: each correlation is the cosine of the corresponding angle
S <- matrix(rep(1,9),3);
S[1,2]<-S[2,1]<-cos(cy1/180*pi);
S[1,3]<-S[3,1]<-cos(cy2/180*pi);
S[2,3]<-S[3,2]<-cos(c12/180*pi);
if ((det(S)<= 0 )|(N<1)) outHTML(rhtml,NA,title='Please check your input!\n Sum of any two angles should be larger than the third one.');
require(MASS);
# empirical=TRUE makes the sample covariance matrix equal S exactly
x<-mvrnorm(n=N,mu=c(0,0,0),Sigma=S,empirical= TRUE);
Y<-x[,1]; X_1<-x[,2]; X_2<-x[,3];
colnames(x)<-colnames(S)<-rownames(S)<-c('Y','X_1','X_2');
lm1 <- lm(Y~0+ X_1);
lm2 <- lm(Y~0+ X_2);
lm12 <- lm(Y~0+ X_1+X_2);
R2<-matrix(rep(NA,3),nrow=3);
rownames(R2)<-c('Y ~ 0+ X_1','Y ~ 0+ X_2','Y ~ 0+ X_1 + X_2');
R2[,1] <- c( summary(lm1)$r.squared, summary(lm2)$r.squared, summary(lm12)$r.squared);
# The column name carries the value of R^2_1+R^2_2-R^2_12
colnames(R2)[1]<-round( summary(lm1)$r.squared + summary(lm2)$r.squared - summary(lm12)$r.squared,4);
outHTML(rhtml, t(R2), title="R^2_1+R^2_2-R^2_12", format="f", digits=4);
outHTML(rhtml, summary(lm1)$coefficients, title=rownames(R2)[1], format="f", digits=4);
outHTML(rhtml, summary(lm2)$coefficients, title=rownames(R2)[2], format="f", digits=4);
outHTML(rhtml, summary(lm12)$coefficients, title=rownames(R2)[3], format="f", digits=4);
outHTML(rhtml, S, title="correlation\n", format="f", digits=4);
if (rawdata) outHTML(rhtml, x, title="Raw data\n", format="f", digits=4);
</R>
R code
cy1 <- 89; ## \angle YX_1
cy2 <- 89; ## \angle YX_2
c12 <- 177.9; ## \angle X_1X_2
N <- 100;
rawdata=TRUE;
S <- matrix(rep(1,9),3);
S[1,2]<-S[2,1]<-cos(cy1/180*pi);
S[1,3]<-S[3,1]<-cos(cy2/180*pi);
S[2,3]<-S[3,2]<-cos(c12/180*pi);
require(MASS); ## install.packages('MASS');
x<-mvrnorm(n=N,mu=c(0,0,0),Sigma=S,empirical= TRUE);
Y<-x[,1]; X_1<-x[,2]; X_2<-x[,3];
colnames(x)<-colnames(S)<-rownames(S)<-c('Y','X_1','X_2');
R2<-matrix(rep(NA,3),nrow=3);
colnames(R2)<-c('R^2');
rownames(R2)<-c('Y = b_1*X_1 + e','Y =b_2*X_2 + e','Y =b_1*X_1 + b_2*X_2 + e');
lm1 <- lm(Y~0+X_1);
lm2 <- lm(Y~0+X_2);
lm12 <- lm(Y~0+X_1+X_2);
R2[,1] <- c( summary(lm1)$r.squared, summary(lm2)$r.squared, summary(lm12)$r.squared);
R2
R2[1,1]+R2[2,1]-R2[3,1]
summary(lm1)
summary(lm2)
summary(lm12)
cat('\ncorr')
S
cat('\nraw data')
if (rawdata) (x);
Online calculator for critical values, cumulative probabilities, and critical noncentral parameters
Input
<Rform name="dncpx">
Choose a statistic: <input type="radio" name="name" value="t" checked/><math>t</math>, <input type="radio" name="name" value="chisq"/><math>\chi^2</math>, or <input type="radio" name="name" value="F"/><math>F</math> (the noncentral parameter of <math>\chi^2</math> or <math>F</math> must be non-negative)
noncentral parameter (ncp)
- <math>ncp</math> = <input name="ncp" type="text" size="5" maxlength="10" value="4.5"> ( vs. <input name="ncp_c" type="text" size="5" maxlength="10" value="0"> )
degrees of freedom
- <math>df</math> = <input name="df1" type="text" size="5" maxlength="10" value="4">
degrees of freedom of the denominator (only for <math>F</math>)
- <math>df_2</math> = <input name="df2" type="text" size="5" maxlength="10" value="3">
Mark the point with cumulative probability
- <math>q</math> = <input type="input" name="q0" maxlength="10" value=".025" size="5">
and the critical statistic
- <math>x</math> = <input type="input" name="x1" maxlength="10" value="5.1" size="5">
Click to update display with precision <input type="submit" name="submit" value=".001"><input type="submit" name="submit" value=".0001"><input type="submit" name="submit" value=".000001"><input type="submit" name="submit" value="fully">
</Rform>
Results of critical statistic, cumulative probability, and critical noncentral parameter
<R output="display" name="dncpx" iframe="height:400px;">
# Form values, with defaults when the form has not been submitted
digits <- ifelse(exists("submit"),ifelse(as.character(submit)=='fully',Inf,round(log(as.numeric(submit),.1))),3)
if (exists("ncp")) ncp <- as.numeric(ncp) else ncp <- 4.5
ncp_c <- ifelse(exists("ncp_c"), as.numeric(ncp_c),0);
if (exists("df1")) df1 <- as.numeric(df1) else df1 <- 4
if (exists("df2")) df2 <- as.numeric(df2) else df2 <- 3
if (exists("name")) name <- as.character(name) else name <- 't'
if (exists("q0")) q0 <- as.numeric(q0) else q0 <- .025
if (exists("x1")) x1 <- as.numeric(x1) else x1 <- 5.1
# Critical ncp of t: the ncp at which Pr(t < x; df, ncp) = q
ncpt<-function(x,q,df,confirm=FALSE){
  .f<-function(ncp,x,df,q)abs(q-pt(x,df=df,ncp=ncp))
  # double the bound until the interval (-.n, .n) brackets the solution
  .n<-1;
  while ( ( (pt(x,df=df,ncp=-.n) < q+(1-q)/2 ) | (pt(x,df=df,ncp=.n) > q/2) ) & (.n < Inf) ) .n <- .n *2 ;
  if (confirm) optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))
  else optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))$minimum
}
# Critical ncp of chi-square; returns 0 when even ncp = 0 gives Pr <= q
ncpchisq<-function(x,q,df,confirm=FALSE){
  .f<-function(ncp,x,df,q)abs(q - pchisq(x,df=df,ncp=ncp))
  if (pchisq(x,df=df)<=q){
    if (confirm) {
      minimum <-0;
      objective <- pchisq(x,df=df)-q;
      data.frame(minimum,objective)
    }else 0
  }else {
    .n<- 1;
    while ( (pchisq(x,df=df,ncp=.n) > q/2) & (.n < Inf) ) .n <- .n + 1;
    if (confirm) optimize(f=.f,x= x,df=df,q=q,interval=c(0,.n))
    else optimize(f=.f,x= x,df=df,q=q,interval=c(0,.n))$minimum
  }
};
# Critical ncp of F; same logic as ncpchisq
ncpf<-function(x,q,df1,df2,confirm=FALSE){
  .f<-function(ncp,x,df1,df2,q)abs(q - pf(x,df1=df1,df2=df2,ncp=ncp))
  if (pf(x,df1=df1,df2=df2)<=q){
    if (confirm) {
      minimum <-0;
      objective <- pf(x,df1=df1,df2=df2)-q;
      data.frame(minimum,objective)
    }else 0
  }else {
    .n<- 1;
    while ( (pf(x,df1=df1,df2=df2,ncp=.n) > q/2) & (.n < Inf) ) .n <- .n +1 ;
    if (confirm) optimize(f=.f,x= x,df1=df1,df2=df2,q=q,interval=c(0,.n))
    else optimize(f=.f,x= x,df1=df1,df2=df2,q=q,interval=c(0,.n))$minimum
  }
};
# Noncentral F density by central difference of the cumulative distribution
d.f <- function(x,df1,df2,ncp=0) {
  delta <- 10^-6;
  (pf(x+delta,df1,df2,ncp)-pf(x-delta,df1,df2,ncp))/(2*delta)
}
df <- df1
# pr: density at ncp (red); pb: density at ncp_c (blue); pg: density at the critical ncp (black)
if (name=='chisq'){
  x <- seq(.001,ncp+df*9,length.out=500);
  pr <- dchisq(x,df=df,ncp=ncp);
  pb <- dchisq(x,df=df,ncp=ncp_c);
  x0 <- qchisq(q0,df=df,ncp=ncp);
  q1 <- pchisq(x1,df=df,ncp=ncp);
  ncp2 <- ncpchisq(x=x1,q=q0,df=df);
  pg <- dchisq(x,df=df,ncp=ncp2);
  q2 <- pchisq(x1,df=df,ncp=ncp2);
}else if (name=='F'){
  x <- seq(.001,ncp/df1+9,length.out=500);
  pr <- d.f(x,df1=df,df2=df2,ncp=ncp);
  pb <- d.f(x,df1=df,df2=df2,ncp=ncp_c);
  x0 <- qf(q0,df1=df1,df2=df2,ncp=ncp);
  q1 <- pf(x1,df1=df1,df2=df2,ncp=ncp);
  ncp2 <- ncpf(x=x1,q=q0,df1=df1,df2=df2);
  pg <- d.f(x,df1=df1,df2=df2,ncp=ncp2);
  q2 <- pf(x1,df1=df1,df2=df2,ncp=ncp2);
}else{
  x <- seq(min(ncp,ncp_c)-8,max(ncp,ncp_c)+8,length.out=500)
  pr <- dt(x,df=df,ncp=ncp);
  pb <- dt(x,df=df,ncp=ncp_c);
  x0 <- qt(q0,df=df,ncp=ncp);
  q1 <- pt(x1,df=df,ncp=ncp);
  ncp2 <- ncpt(x=x1,q=q0,df=df);
  pg <- dt(x,df=df,ncp=ncp2);
  q2 <- pt(x1,df=df,ncp=ncp2);
}
# With df <= 2 the densities can be unbounded, so drop extreme values from the plot
if (df <= 2) {
  pr[pr<=0] <- NA; pb[pb<=0] <- NA; pg[pg<=0] <- NA;
  pr[pr>10] <- NA; pb[pb>10] <- NA; pg[pg>10] <- NA;
}
x0 <- round(x0,digits); q1 <- round(q1,digits); ncp2 <- round(ncp2,digits); q2 <- round(q2,digits);
pdf(rpdf, width=5, height=5)
main <- paste('Pr(',name,'<',x0,';df=',df,',ncp=',ncp,')=',q0,'\nPr(',name,'<',x1,';df=',df,',ncp=',ncp,')=',q1,'\nPr(',name,'<',x1,';df=',df,',ncp=',ncp2,')=',q2, sep="")
sub <-paste('\nnoncentral parameter = ',ncp,'(Red), ',ncp_c,'(Blue),\nand ',round(ncp2,digits),'(Black) which fits \nPr(',name,'<',x1,';ncp=',round(ncp2,digits),')=',round(q2,digits), sep="")
plot(c(x,x),c(pr,pb),type='n',main=main,sub=sub,xlab="",ylab=paste(name,'probability density'))
points(x[x<=max(x0,x1)],pr[x<=max(x0,x1)],col='green',type='h')
points(x[x<=min(x0,x1)],pr[x<=min(x0,x1)],col='yellow',type='h')
lines(x,pb,col='blue')
lines(x,pr,col='red')
lines(x,pg)
</R>
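<math>z</math> and noncentral distributions (<math>\chi^2</math>, <math>t</math>, and <math>F</math>)
Noncentral <math>\chi^2</math>
Let <math>z_i</math>, i=0,1,2,..., denote a series of independent random variables with the standard normal distribution.
<math>\sum_{i=1}^{df} z_i^2</math> is then a random variable with the <math>\chi^2</math> distribution with df degrees of freedom. For any given series of constants <math>c_i</math>, i=1,2,...,df,
<math>\sum_{i=1}^{df} (z_i + c_i)^2</math>
is a random variable with the corresponding noncentral <math>\chi^2</math> distribution, with the same df and the noncentral parameter
<math>ncp = \sum_{i=1}^{df} c_i^2 .</math>
It is different from a random variable of the corresponding central distribution shifted by a constant.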
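The difference between a noncentral chi-square variable and a shifted central one can be seen numerically. The following is a minimal sketch: the constants `c_i`, the evaluation point `x`, and the seed are arbitrary choices for illustration, and the shift is taken to be the noncentral parameter itself.

```r
df  <- 4
c_i <- c(1, 1, 1, sqrt(1.5))   # hypothetical constants, sum(c_i^2) = 4.5
ncp <- sum(c_i^2)

x <- 4.5
p_noncentral <- pchisq(x, df = df, ncp = ncp)  # Pr(sum((z_i + c_i)^2) < x)
p_shifted    <- pchisq(x - ncp, df = df)       # Pr(sum(z_i^2) + ncp < x)
c(noncentral = p_noncentral, shifted_central = p_shifted)

# Simulation check of the definition (seed chosen arbitrarily)
set.seed(1)
z <- matrix(rnorm(df * 1e5), ncol = df)
p_sim <- mean(rowSums(sweep(z, 2, c_i, "+")^2) < x)
p_sim  # should be close to p_noncentral
```

The shifted central variable can never fall below the shift, while the noncentral variable can take any positive value, so the two cumulative probabilities disagree in the lower tail.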
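Noncentral <math>t</math>
For any given constant <math>c</math>,
<math>\frac{z_0 + c}{\sqrt{\sum_{i=1}^{df} z_i^2 / df}}</math>
is a random variable with the noncentral t-distribution with noncentrality parameter
- <math>ncp = c</math>,
which is different from <math>\frac{z_0}{\sqrt{\sum_{i=1}^{df} z_i^2 / df}} + c</math>, the central t-distributed random variable shifted by the same constant.
If df on this display is set to <math>\infty</math> (Inf in R) and the noncentral parameter is set to 0, a standard normal distribution is plotted and the critical z score is calculated.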
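Both remarks about the t case can be checked in base R. This sketch uses the default inputs of the calculator (df = 4, ncp = 4.5, critical statistic 5.1); treating the shift of the central variable as equal to ncp is an assumption made for the comparison.

```r
df <- 4; ncp <- 4.5; x <- 5.1
p_noncentral <- pt(x, df = df, ncp = ncp)  # noncentral t with noncentral parameter ncp
p_shifted    <- pt(x - ncp, df = df)       # central t shifted by the constant ncp
c(noncentral = p_noncentral, shifted_central = p_shifted)

# With df = Inf and no noncentrality, the t quantile is the z quantile
c(t_quantile = qt(.975, df = Inf), z_quantile = qnorm(.975))
```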
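Noncentral <math>F</math>
The noncentral parameter of F is defined only on its numerator. The noncentral F distributed
<math>\frac{\sum_{i=1}^{df_1} (z_i + c_i)^2 / df_1}{\sum_{i=df_1+1}^{df_1+df_2} z_i^2 / df_2}</math>
with noncentral parameter
- ncp=<math>\sum_{i=1}^{df_1} c_i^2</math>
is different from the central F distributed random variable plus a constant.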
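The calculator on this page obtains the noncentral F density by a central difference of `pf`. As a cross-check, base R's `df()` also accepts an `ncp` argument. The sketch below compares the two; the helper name `d_f_numeric`, the step size, and the evaluation points are assumptions chosen for illustration.

```r
# Central-difference approximation of the noncentral F density
d_f_numeric <- function(x, df1, df2, ncp = 0, delta = 1e-4) {
  (pf(x + delta, df1, df2, ncp) - pf(x - delta, df1, df2, ncp)) / (2 * delta)
}

x <- c(0.5, 1, 2, 4)
dens_numeric <- d_f_numeric(x, 4, 3, ncp = 4.5)
dens_builtin <- stats::df(x, 4, 3, ncp = 4.5)  # built-in noncentral F density
cbind(x, numeric = dens_numeric, builtin = dens_builtin)
```

Calling `stats::df` explicitly avoids confusion with a variable named `df`, which the page's own code uses for the degrees of freedom.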
Confidence interval of standardized effect size by noncentral parameters
Confidence intervals of unstandardized effect sizes, such as a difference of means, can be found in common statistics textbooks and software, while confidence intervals of standardized effect sizes, especially Cohen's <math>d</math> and <math>f^2</math>, rely on the calculation of confidence intervals of noncentral parameters (ncp).
A common method to find the confidence limits of ncp is to solve for the critical ncp value at which the observed statistic falls on an extreme tail quantile. The ncp of the black curve in the above diagram can be adopted directly. For example, with the default inputs (<math>t</math> statistic, df = 4, critical statistic 5.1, cumulative probability .025), the black curve gives the limit of a 97.5% one-sided confidence interval of ncp; changing the quantile from .025 to .975 gives the other limit, so the two-sided interval (1.139, 8.968) has a 95% confidence level.
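The search for the critical ncp can be sketched with a root finder instead of the page's `optimize`-based functions. This is a swapped-in technique under stated assumptions: the helper name `critical_ncp_t` and the search interval are hypothetical, and the inputs are the calculator defaults (observed t of 5.1 with df = 4).

```r
# Hypothetical helper: the ncp at which Pr(t < t_obs; df, ncp) equals q
critical_ncp_t <- function(t_obs, q, df, interval = c(-5, 20)) {
  uniroot(function(ncp) pt(t_obs, df = df, ncp = ncp) - q, interval = interval)$root
}

t_obs <- 5.1; df <- 4
ncp_upper <- critical_ncp_t(t_obs, q = .025, df = df)  # Pr(t < 5.1) = .025
ncp_lower <- critical_ncp_t(t_obs, q = .975, df = df)  # Pr(t < 5.1) = .975
round(c(lower = ncp_lower, upper = ncp_upper), 3)
```

The two limits should be close to the interval (1.139, 8.968) quoted in the text. A larger ncp moves the distribution to the right, so the small quantile .025 yields the upper limit.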
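In the case of a single group, <math>M</math> (<math>\mu</math>) denotes the sample (population) mean of the group, and <math>SD</math> (<math>\sigma</math>) denotes the sample (population) standard deviation. <math>N</math> is the sample size of the group. The t test is used for the hypothesis on the difference between the mean and a baseline <math>\mu_0</math>. Usually <math>\mu_0</math> is zero, although this is not necessary. In the case of two related groups, the single group is constructed from the difference within each pair of samples, and <math>SD</math> (<math>\sigma</math>) denotes the sample (population) standard deviation of the differences rather than that within the original two groups.
- <math>t := \frac{M-\mu_0}{SD/\sqrt{N}}</math> has <math>ncp = \sqrt{N}\,\frac{\mu-\mu_0}{\sigma}</math>, and Cohen's <math>d := \frac{M-\mu_0}{SD}</math> is the point estimate of <math>\delta := \frac{\mu-\mu_0}{\sigma}</math>.
So,
- <math>\delta = \frac{ncp}{\sqrt{N}}</math>.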
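Dividing the confidence limits of ncp by the square root of the sample size gives a confidence interval for the standardized effect. The sketch below uses the default inputs of the one-group example further down this page (mean 105.22, standard deviation 16.72, sample size 16, baseline 100); the search interval passed to `uniroot` is an assumption.

```r
M <- 105.22; SD <- 16.72; N <- 16; mu0 <- 100
df    <- N - 1
t_obs <- (M - mu0) / (SD / sqrt(N))
d     <- (M - mu0) / SD            # Cohen's d, the point estimate

# ncp limits: Pr(t < t_obs; df, ncp) = .975 (lower) and .025 (upper)
ncp_lower <- uniroot(function(ncp) pt(t_obs, df = df, ncp = ncp) - .975, c(-10, 20))$root
ncp_upper <- uniroot(function(ncp) pt(t_obs, df = df, ncp = ncp) - .025, c(-10, 20))$root

lower <- ncp_lower / sqrt(N)
upper <- ncp_upper / sqrt(N)
round(c(d = d, lower = lower, upper = upper), 4)
```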
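T test for mean difference between two independent groups
<math>n_1</math> or <math>n_2</math> is the sample size within the respective group.
- <math>t := \frac{M_1 - M_2}{SD_{within} \Big/ \sqrt{\frac{n_1 n_2}{n_1+n_2}}}</math>, wherein <math>SD_{within} := \sqrt{\frac{SS_{within}}{df_{within}}} = \sqrt{\frac{(n_1-1)SD_1^2 + (n_2-1)SD_2^2}{n_1+n_2-2}}</math>.
- <math>ncp = \sqrt{\frac{n_1 n_2}{n_1+n_2}}\,\frac{\mu_1-\mu_2}{\sigma}</math> and Cohen's <math>d := \frac{M_1-M_2}{SD_{within}}</math> is the point estimate of <math>\delta := \frac{\mu_1-\mu_2}{\sigma}</math>.
So,
- <math>\delta = ncp \Big/ \sqrt{\frac{n_1 n_2}{n_1+n_2}}</math>.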
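The relation between the pooled-variance t statistic and Cohen's d for two independent groups can be verified against `t.test`. The data below are hypothetical values made up for illustration.

```r
# Hypothetical data for two independent groups
g1 <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
g2 <- c(4.2, 5.0, 4.8, 4.4, 5.1)
n1 <- length(g1); n2 <- length(g2)

# Pooled within-group standard deviation
sd_within <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
scale     <- sqrt(n1 * n2 / (n1 + n2))

d         <- (mean(g1) - mean(g2)) / sd_within   # Cohen's d
t_manual  <- d * scale                           # t = d * sqrt(n1*n2/(n1+n2))
t_builtin <- unname(t.test(g1, g2, var.equal = TRUE)$statistic)
c(d = d, t_manual = t_manual, t_builtin = t_builtin)
```

The same scale factor converts confidence limits of ncp into confidence limits of the standardized mean difference.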
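One-way ANOVA test for mean difference across multiple independent groups
The one-way ANOVA test applies the noncentral F distribution, while with a given population standard deviation <math>\sigma</math>, the same test question applies the noncentral chi-square distribution.
<math>F := \frac{SS_{between}/\sigma^2 / df_{between}}{SS_{within}/\sigma^2 / df_{within}}</math>
For each j-th sample within the i-th group <math>X_{i,j}</math>, denote <math>M_i(X_{i,j}) := \frac{\sum_{w=1}^{n_i} X_{i,w}}{n_i}</math> and <math>\mu_i(X_{i,j}) := \mu_i</math>. Here <math>SS(\cdot)</math> denotes the sum of squared deviations from the mean over the listed indices.
While,
<math>SS_{between}/\sigma^2 = \frac{SS\left(M_i(X_{i,j});\, i=1,\dots,K,\ j=1,\dots,n_i\right)}{\sigma^2} = SS\left(\frac{M_i(X_{i,j}-\mu_i)}{\sigma} + \frac{\mu_i}{\sigma};\, i=1,\dots,K,\ j=1,\dots,n_i\right) \sim \chi^2\left(df=K-1,\ ncp=SS\left(\frac{\mu_i(X_{i,j})}{\sigma};\, i=1,\dots,K,\ j=1,\dots,n_i\right)\right)</math>
So, the ncp of both <math>F</math> and <math>\chi^2</math> equals
- <math>SS\left(\frac{\mu_i(X_{i,j})}{\sigma};\, i=1,\dots,K,\ j=1,\dots,n_i\right)</math>.
In the case of <math>n := n_1 = n_2 = \cdots = n_K</math> for K independent groups of the same size, the total sample size is <math>N := n \cdot K</math>.
- Cohen's <math>\tilde{f}^2 := \frac{SS(\mu_1,\mu_2,\dots,\mu_K)}{K\cdot\sigma^2} = \frac{SS\left(\mu_i(X_{i,j})/\sigma;\, i=1,\dots,K,\ j=1,\dots,n\right)}{n\cdot K} = \frac{ncp}{n\cdot K} = \frac{ncp}{N}</math>.
The t-test for a pair of independent groups is a special case of one-way ANOVA. Note that the noncentral parameter <math>ncp_F</math> of F is not comparable to the noncentral parameter <math>ncp_t</math> of the corresponding t. Actually, <math>ncp_F = ncp_t^2</math>, and <math>\tilde{f} = \left|\frac{\delta}{2}\right|</math> in this case.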
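The two-group special case can be checked numerically. The sketch below assumes two groups of equal size with hypothetical population means and standard deviation, and compares the noncentral parameters and the resulting power of the F test and the two-sided t test.

```r
mu <- c(0.5, 0); sigma <- 1; n <- 20; K <- 2   # hypothetical population values
N     <- n * K
delta <- (mu[1] - mu[2]) / sigma               # standardized mean difference

ncp_t <- sqrt(n * n / (n + n)) * delta
ncp_F <- n * sum((mu - mean(mu))^2) / sigma^2  # SS of mu_i/sigma over all N observations
f2    <- ncp_F / N                             # Cohen's f^2

alpha <- .05
df2   <- N - K
power_F <- 1 - pf(qf(1 - alpha, 1, df2), 1, df2, ncp = ncp_F)
tc      <- qt(1 - alpha / 2, df2)
power_t <- 1 - pt(tc, df2, ncp = ncp_t) + pt(-tc, df2, ncp = ncp_t)

c(ncp_t = ncp_t, ncp_F = ncp_F, f = sqrt(f2), power_F = power_F, power_t = power_t)
```

The squared noncentral parameter of t matches that of F, f is half the absolute standardized difference, and the two power values agree up to numerical precision.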
RMSEA of Structural Equation Model
The ncp of <math>\chi^2</math> reported by Structural Equation Model software is proportional to the population value of <math>RMSEA^2</math>, that is, the squared distance per df from the population var-cov matrix to the model space.
Power vs. Standardized Effect Size or ncp
Power of t test for a given Cohen's <math>\delta</math>
Example of one-group mean test
Input
<Rform name="tEx">
A normally distributed population, for example the IQ distribution of students, is sampled
- <math>N</math>=<Input name="N" size="3" value="16"/>
times independently. The mean and standard deviation estimated from these samples are denoted <math>M</math> and <math>s</math>, respectively, in the current replication.
The statistical interest is usually in the mean of the population, named <math>\mu</math>, and sometimes also in the standard deviation of the population, named <math>\sigma</math>. The <math>t</math> statistic is defined as follows:
<math>t := \frac{M-\mu_0}{s/\sqrt{N}}</math>
It measures whether or not <math>\mu</math> is significantly
- <select name="direction"><option value="gt" selected>greater than</option><option value="ne" >different from</option><option value="lt">less than</option></select> a baseline <math>\mu_0</math>:=<input name="mu0" size="5" value="100.00"/>,
relative to the scale of the standard error estimate of <math>M</math>. If <math>\mu</math> really equals <math>\mu_0</math>, the statistic follows the <math>t</math> distribution with noncentral parameter 0 and <math>N-1</math> degrees of freedom.
The Type I error rate, denoted
- <math>\alpha</math>=<Input name="alpha" size="3" value=".05"/>,
defines the probability assigned to the region of extreme values.
However, the real <math>\mu</math> may differ from <math>\mu_0</math>. Then the noncentral parameter of the statistic distribution changes to
<math>ncp = \sqrt{N}\,\frac{\mu-\mu_0}{\sigma} = \sqrt{N}\,\delta ,</math>
wherein <math>\delta</math> is estimated by Cohen's <math>d</math>. A known or hypothesized <math>\delta</math>, e.g.
- <input name="delta" size="4" value="0.75">,
together with the sample size <math>N</math>, gives a known or hypothesized noncentral distribution, while <math>\mu-\mu_0</math> alone, without a given <math>\sigma</math>, is not enough.
Change
- <math>M</math>=<Input name="M" size="5" value="105.22"/> and <math>s</math>=<Input name="s" size="5" value="16.72" />,
then verify whether they affect the statistical power.
<input type="submit" value="Click to Update"/> </Rform>
Results
<R output="display" name="tEx" iframe="height:320px;">
# Form values, with defaults when the form has not been submitted
N <- ifelse(exists("N"), as.numeric(N),16)
mu0 <- ifelse(exists("mu0"), as.numeric(mu0),100)
alpha <- ifelse(exists("alpha"), as.numeric(alpha),.05)
delta <- ifelse(exists("delta"), as.numeric(delta),0.75)
M <- ifelse(exists("M"), as.numeric(M),105.22)
s <- ifelse(exists("s"), as.numeric(s),16.72)
direction <- ifelse(exists("direction"), as.character(direction),"gt")
# Critical ncp of t: the ncp at which Pr(t < x; df, ncp) = q
ncpt<-function(x,q,df,confirm=FALSE){
  if (q<=0) (+Inf)
  else if (q>=1) (-Inf)
  else if ((q>0)&(q<1)) {
    .f<-function(ncp,x,df,q)abs(q-pt(x,df=df,ncp=ncp))
    # double the bound until the interval (-.n, .n) brackets the solution
    .n<-1;
    while ( ( (pt(x,df=df,ncp=-.n) < q+(1-q)/2 ) | (pt(x,df=df,ncp=.n) > q/2) ) & (.n < Inf) ) .n <- .n *2 ;
    if (confirm) optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))
    else optimize(f=.f,x=x,df=df,q=q,interval=c(-.n,.n))$minimum
  }
}
pdf(rpdf, horizontal=FALSE, width=8, height=4)
alpha_r <- ifelse(direction == "ne",alpha/2, ifelse(direction == "gt",alpha,0));
alpha_l <- alpha - alpha_r;
se <- s/sqrt(N);
df <- N-1;
ncp <- sqrt(N)*delta;
t <-(M-mu0)/se;
tc_r <- qt(1-alpha_r,df=df);
tc_l <- qt(alpha_l,df=df);
d_l <- ncpt(x=t,q=1-alpha_r,df=df)/sqrt(N);
d_r <- ncpt(x=t,q=alpha_l,df=df)/sqrt(N);
sub <- paste("H0:blue central t; H1:red noncentral t\n",1-alpha," confidence interval of Cohen's delta\n",round(d_l,4)," ~ ",round(d_r,4),sep="");
if (direction == "ne") main1=paste("two-tail p value", round(1-pt(abs(t),df=df)+pt(-abs(t),df=df),4));
if (direction == "gt") main1=paste("right-tail p value", round(1-pt(t,df=df),4));
if (direction == "lt") main1=paste("left-tail p value", round(pt(t,df=df),4));
main1=paste("t value",round(t,4),",",main1,"\n",1-alpha,"confidence interval of mean\n",round(M+se*qt(alpha_r,df=df),4),"~",round(M-se*qt(alpha_l,df=df),4));
# Left panel: central (blue) and noncentral (red) t densities with the rejection region shaded
x <- seq(min(-4,ncp-5),max(4,ncp+5),length.out=200);
op <-par(mfrow=c(1,2));
plot(x,dt(x,df=df),main=main1,sub=sub,xlab="",ylab="",type='l',col='blue');
points(x,dt(x,df=df,ncp=ncp),type='l',col='red');
x_reject <- c(x[(x > tc_r) | (x < tc_l)],tc_r,tc_l);
points(x_reject,dt(x_reject,df=df,ncp=ncp),type='h',col='grey90');
points(x_reject,dt(x_reject,df=df,ncp=0),type='h',col='grey70');
points(t,dt(t,df=df,ncp=0));
points(t,dt(t,df=df,ncp=0),type='h');
# Right panel: power as a function of sample size
main2=paste("statistical power is", round(1-pt(tc_r,df=df,ncp=ncp)+pt(tc_l,df=df,ncp=ncp),4),".");
xN <- max(2,round(N/3)):(3*N);
# each candidate sample size uses its own df and critical values
power_N <- 1-pt(qt(1-alpha_r,df=xN-1),df=xN-1,ncp=sqrt(xN)*delta)+pt(qt(alpha_l,df=xN-1),df=xN-1,ncp=sqrt(xN)*delta);
plot(xN,power_N,main=main2,xlab='sample size',ylab='statistical power',col='gray');
points(N,1-pt(tc_r,df=df,ncp=sqrt(N)*delta)+pt(tc_l,df=df,ncp=sqrt(N)*delta),col='red',type='h');
par(op);
</R>
For the two-related-group case, the difference scores of each pair of samples can be entered in the one-group mean test interface. Usually <math>\mu_0</math> is set to zero.
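The power shown by the one-group example can be cross-checked against base R's `power.t.test`. This sketch uses the form defaults (sample size 16, standardized effect 0.75, Type I error .05, right-tailed test); setting `sd = 1` so that `delta` is already standardized is a choice made here for the comparison.

```r
N <- 16; delta <- 0.75; alpha <- .05
df  <- N - 1
ncp <- sqrt(N) * delta

# Right-tailed test: power is the noncentral t probability beyond the central critical value
power_manual  <- 1 - pt(qt(1 - alpha, df), df, ncp = ncp)
power_builtin <- power.t.test(n = N, delta = delta, sd = 1, sig.level = alpha,
                              type = "one.sample", alternative = "one.sided")$power
c(manual = power_manual, builtin = power_builtin)
```

Neither the sample mean nor the sample standard deviation enters this calculation, which is what the last step of the example asks the reader to verify.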
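Power of <math>F</math> test for a given Cohen's <math>f^2</math>
Let <math>\tilde{f}^2</math> denote the population value of Cohen's <math>f^2</math>, specifically
<math>\tilde{f}^2 := \frac{SS(\mu_1,\mu_2,\dots,\mu_K)}{K\cdot\sigma^2}</math>
in a one-way ANOVA setup of <math>K</math> groups with within-group sample size n and within-group population means <math>\mu_1,\mu_2,\dots,\mu_K</math> respectively. The noncentral parameter of the corresponding F or <math>\chi^2</math> distribution is <math>n \cdot K \cdot \tilde{f}^2</math>.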
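The power of the F test follows from the noncentral parameter in the same way as for the t test. The sketch below assumes a hypothetical design (3 groups of 20, population f of 0.25) and cross-checks the result with base R's `power.anova.test`, which expects the variance of the group means (denominator K - 1) rather than Cohen's f squared.

```r
K <- 3; n <- 20; f2 <- 0.0625; alpha <- .05   # hypothetical design, f = 0.25
ncp <- n * K * f2                             # noncentral parameter of F
df1 <- K - 1
df2 <- K * (n - 1)

power_manual <- 1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp = ncp)

# between.var / within.var = SS(mu) / ((K - 1) * sigma^2) = K * f2 / (K - 1)
power_builtin <- power.anova.test(groups = K, n = n,
                                  between.var = K * f2 / (K - 1),
                                  within.var = 1, sig.level = alpha)$power
c(manual = power_manual, builtin = power_builtin)
```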
Power of SEM close-fit test for a given RMSEA
How to cite this page in APA style
In APA style, this page can be cited in reference lists as follows:
Comparison of noncentral and central distributions. (yyyy, Month dd). In SlideWiki. Retrieved MM:SS, Month dd, yyyy, from http://mars.wiwi.hu-berlin.de/mediawiki/slides/index.php/Comparison_of_noncentral_and_central_distributions
For other styles, refer to the examples on Wikipedia.
External links
- Noncentral t-distribution on Wikipedia
- Noncentral <math>\chi^2</math> distribution on Wikipedia
- Noncentral F-distribution on Wikipedia
- Confidence interval of Effect Size on Wikipedia