[Thanks to Scott Glover for comments]
Someone said to me: "The smaller the p-value, the higher the likelihood under the alternative."
They probably mean:
"The smaller the p-value, the higher the likelihood ratio of the alternative relative to the null."
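For a one-sample test with known sd (as in the simulation below), this relationship is in fact exact: the two-sided p-value is a strictly decreasing function of the likelihood ratio statistic,

\[
p = 2\,\Phi\!\left(-\sqrt{\lambda}\right), \qquad \lambda = -2\log\frac{L(\mu_0)}{L(\hat{\mu})},
\]

so a smaller p-value does indeed go with a larger likelihood ratio.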
True as far as it goes; but this ignores the fact that under low-power conditions, 100% of the significant effects will be based on overestimates of the true effect. This is what Gelman's Type M error is all about, and it is a shocking fact, which I demonstrate below.
Suppose the alternative is that \(\mu = 0.1\), and let sd = 1. Thus, we assume here that this specific alternative is in fact true.
Power is low for sample size 10:
mu <- 0.1
power.t.test(delta = mu, n = 10, sd = 1, type = "one.sample",
             alternative = "two.sided", strict = TRUE)
##
## One-sample t test power calculation
##
## n = 10
## delta = 0.1
## sd = 1
## sig.level = 0.05
## power = 0.0592903
## alternative = two.sided
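This analytic power value can be double-checked with a quick simulation sketch (the seed is arbitrary; the rejection rate should come out near 0.059):

## simulation check of the analytic power calculation
set.seed(1)
mean(replicate(10000, t.test(rnorm(10, mean = mu, sd = 1))$p.value < 0.05))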
Now simulate data repeatedly and compute the likelihood ratio of Ha vs H0, plugging in the sample mean as the estimate under the alternative. This, rather than using the true value 0.1, is what we normally do:
nsim <- 10000
ttestpval <- likratsample <- means <- rep(NA, nsim)
for(i in 1:nsim){
  y <- rnorm(10, mean = mu, sd = 1)
  ttestpval[i] <- t.test(y)$p.value
  means[i] <- mean(y)
  ## -2 log likelihood ratio (null over alternative), with the sample
  ## mean plugged in under the alternative; summing log densities
  ## avoids the underflow risk of multiplying raw densities:
  likratsample[i] <- -2 * (sum(dnorm(y, mean = 0, sd = 1, log = TRUE)) -
                             sum(dnorm(y, mean = means[i], sd = 1, log = TRUE)))
}
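Since the sd is treated as known here, the statistic has a simple closed form: substituting \(\hat{\mu} = \bar{y}\) gives \(-2\log\frac{L(0)}{L(\bar{y})} = n\bar{y}^2\). A quick check:

## closed-form identity for the known-sd case: the statistic is n * ybar^2
all.equal(likratsample, 10 * means^2)  ## TRUE, up to floating-point error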
Next, compute the \(\chi^2_1\) critical value and the p-values implied by the likelihood ratio statistic, and collect the sample means, p-values, and likelihood ratios in a data frame:
(criticalval<-qchisq(0.05,df=1,lower.tail=FALSE))
## [1] 3.841459
likratpval<-pchisq(likratsample,df=1,lower.tail=FALSE)
dat<-data.frame(means,ttestpval,likratsample,likratpval)
head(dat)
## means ttestpval likratsample likratpval
## 1 0.05127913 0.9030325 0.02629549 0.8711808
## 2 0.46804303 0.1630380 2.19064275 0.1388514
## 3 0.17049681 0.6508300 0.29069161 0.5897777
## 4 0.24739473 0.4041785 0.61204151 0.4340202
## 5 0.11691442 0.7022102 0.13668981 0.7115942
## 6 0.09655920 0.7593891 0.09323679 0.7601019
Earlier I wrote, incorrectly: "Note that the t-test based p-values will be significant more often compared to the likelihood based p-values." Actually, the t-test is identical to the likelihood ratio test when the sd is estimated from the data (see the Casella and Berger textbook), so the two should deliver essentially the same proportion of significant results. The small discrepancy below arises because the likelihood ratio above treats the sd as known and fixed at 1, whereas the t-test estimates it:
mean(dat$ttestpval<0.05)
## [1] 0.0612
mean(dat$likratpval<0.05)
## [1] 0.0662
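The equivalence can be verified directly: when the sd is also estimated by maximum likelihood, the resulting statistic reduces to a monotone function of the t-statistic, \(-2\log\Lambda = n\log\left(1 + \frac{t^2}{n-1}\right)\). A minimal sketch checking this on one simulated data set (the seed is arbitrary):

## the LRT with ML-estimated sd is a monotone function of the t-statistic
set.seed(2)
y <- rnorm(10, mean = mu, sd = 1)
n <- length(y)
sigma0hat <- sqrt(mean(y^2))              ## ML estimate of sd under H0: mu = 0
sigma1hat <- sqrt(mean((y - mean(y))^2))  ## ML estimate of sd under Ha
lrt <- -2 * (sum(dnorm(y, mean = 0, sd = sigma0hat, log = TRUE)) -
               sum(dnorm(y, mean = mean(y), sd = sigma1hat, log = TRUE)))
tstat <- as.numeric(t.test(y)$statistic)
all.equal(lrt, n * log(1 + tstat^2/(n - 1)))  ## TRUE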
What we also see is that the t-test p-value is strongly (negatively) associated with the likelihood ratio statistic, as claimed: the smaller the p-value, the larger the likelihood ratio.
plot(likratsample~ttestpval)
abline(lm(likratsample~ttestpval),col="red")
If we focus only on the significant effects, the likelihood ratio statistic remains correlated with the p-values computed using the t-test:
dat<-subset(dat,ttestpval<0.05)
plot(dat$ttestpval,dat$likratsample)
abline(lm(dat$likratsample~dat$ttestpval))
But note that all of the estimates associated with significant effects overestimate the true mean in absolute magnitude:
hist(dat$means,main="estimated means")
## true mean
abline(v=mu)
## 100% of the means are overestimated:
mean(abs(dat$means)>mu)
## [1] 1
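Note also that some of the significant estimates have the wrong sign; this is Gelman's Type S error (a quick check, using the subsetted data frame dat from above):

## proportion of significant effects with the wrong sign (Type S error)
mean(dat$means < 0)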
Further, note that the Type M error, i.e., the factor by which the estimate exaggerates the true effect, goes as high as about 12 when we focus only on significant effects:
## distribution of type M error
hist(abs(dat$means)/mu)
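The exaggeration seen here can also be approximated analytically, following the retrodesign logic of Gelman and Carlin (2014). Below is a minimal sketch under a z-test (known-sd) approximation; the function name retrodesign_sketch is just illustrative, not the published retrodesign code:

## retrodesign-style sketch: power, Type S, and expected Type M error
## under a z-test approximation, following Gelman & Carlin's logic
retrodesign_sketch <- function(true_effect, se, alpha = 0.05, n_draws = 100000){
  z_crit <- qnorm(1 - alpha/2)
  ## two-sided power under the z approximation:
  power <- pnorm(true_effect/se - z_crit) + pnorm(-true_effect/se - z_crit)
  est <- rnorm(n_draws, mean = true_effect, sd = se)  ## hypothetical replications
  signif <- abs(est) > z_crit * se                    ## two-sided rejections
  typeS <- mean(est[signif] < 0)                      ## wrong-sign rate
  typeM <- mean(abs(est[signif])) / true_effect       ## expected exaggeration
  list(power = power, typeS = typeS, typeM = typeM)
}
retrodesign_sketch(true_effect = 0.1, se = 1/sqrt(10))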
Finally, note that the likelihood ratio grows with the distance of the sample mean from zero (with known sd the statistic is \(n\bar{y}^2\)), so the more extreme the overestimate, the larger the likelihood ratio:
plot(dat$means,dat$likratsample)
abline(v=mu)
So, all significant effects are the result of overestimated means; both Type S (wrong sign) and Type M (exaggerated magnitude) errors are in play here.
This is true even if we use the p-values computed using the likelihood ratio test:
plot(dat$means,dat$likratpval)
abline(v=mu)
Why should we care that a p-value is associated with a high likelihood ratio of the alternative over the null, if the estimates associated with 100% of those significant p-values are unrealistically high?