Jlovborg posted an interesting paper by Aguinis et al. (2010) in the comments. The paper challenges the systematic overprediction of minority performance that I mentioned in this post. The paper is technical and poorly presented, and I would not normally blog about it. However, the lead author is a big shot who co-authored an amicus curiae brief in the Ricci case contending that New Haven had not properly validated the firefighter test. That is, he wrote against Frank Ricci. The paper was well covered in the press because it asserts that the statistical procedures used to assess test bias may themselves be biased.
The most common method for assessing test bias across two populations is to use linear regression to check how well test scores (e.g., SATs) predict later outcomes (e.g., GPAs). If the relationship between test scores and later outcomes is the same in the two groups, then the test is considered "unbiased." Most studies find the minority overprediction phenomenon shown in the graph below. That is, both groups have the same slope, but the low-scoring group has a lower intercept, indicating a lower level of achievement associated with the same test score.
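Here is a minimal sketch of that regression check, using simulated data rather than anything from the paper. The function names, the 0/1 group coding, and every parameter value (slope 0.5, intercept gap -0.3, noise level) are my own illustrative assumptions. It fits outcome ~ score + group + score x group and reads the intercept and slope differences off the coefficients.

```python
import numpy as np

def fit_moderated_regression(score, outcome, group):
    """OLS fit of outcome ~ 1 + score + group + score*group.

    group is a 0/1 indicator. Returns [b0, b1, b2, b3] where
    b2 is the intercept difference and b3 is the slope difference
    (group 1 minus group 0).
    """
    score = np.asarray(score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    group = np.asarray(group, dtype=float)
    X = np.column_stack([np.ones_like(score), score, group, score * group])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

def simulate(n_per_group=2000, slope=0.5, intercept_gap=-0.3, seed=0):
    """Hypothetical data: same slope in both groups, lower intercept in group 1."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    # Assumed for illustration: group 1 scores one unit lower on average.
    score = rng.normal(loc=np.where(group == 1, -1.0, 0.0), scale=1.0)
    noise = rng.normal(scale=0.5, size=score.size)
    outcome = 3.0 + slope * score + intercept_gap * group + noise
    return score, outcome, group

if __name__ == "__main__":
    b0, b1, b2, b3 = fit_moderated_regression(*simulate())
    print(f"common slope            ~ {b1:.3f}")
    print(f"intercept difference b2 ~ {b2:.3f}")
    print(f"slope difference b3     ~ {b3:.3f}")
```

With equal slopes and a negative intercept difference for the lower-scoring group, a single pooled regression line overpredicts that group's outcomes, which is the pattern described above.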
The authors argue that the standard regression procedures are:
- Underpowered to detect slope differences
- Prone to overestimating differences in the intercepts (m1 - m2).
Point one, if true, has no methodological implications. Test creators just need larger samples (say n > 500) to rule out significant slope differences. The second point, if true, is a bigger problem, since it implies that regardless of sample size you cannot properly estimate the intercept difference.
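To make the sample-size point concrete, here is a rough Monte Carlo sketch of how often a true slope difference gets detected at two sample sizes. Everything in it is an assumption of mine, not the paper's design: the slope gap of 0.1, unit-variance scores and noise, equal group sizes, and a normal critical value of 1.96 in place of the exact t value.

```python
import numpy as np

def slope_difference_z(score, outcome, group):
    """z-statistic for the score*group coefficient in an OLS fit."""
    X = np.column_stack([np.ones_like(score), score, group, score * group])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[3] / np.sqrt(cov[3, 3])

def estimated_power(n_total, slope_gap=0.1, reps=300, seed=0):
    """Share of simulated samples in which the slope difference is flagged."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0.0, 1.0], n_total // 2)
    rejections = 0
    for _ in range(reps):
        score = rng.normal(size=group.size)
        noise = rng.normal(size=group.size)
        outcome = 0.5 * score + slope_gap * score * group + noise
        if abs(slope_difference_z(score, outcome, group)) > 1.96:
            rejections += 1
    return rejections / reps

if __name__ == "__main__":
    for n in (100, 2000):
        print(n, estimated_power(n))
```

Under these assumptions the small sample rarely flags the slope difference and the large one flags it far more often, which is why low power is a sample-size problem rather than a flaw in the method.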
I am not an expert on psychometrics, but I am skeptical of the second claim. Without getting into the technical details, the authors test for the difference in intercepts (i.e., m1 - m2) by looking at the difference in R-squared instead of the coefficient estimates themselves. The simulations are parameterized in terms of R-squared, and they produce some perplexing results, as shown in their graph below.
Panel C below shows that the type I error increases as the difference in the intercepts increases. Since a type I error is not defined when the intercept difference (m1 - m2) is nonzero, something else must be being tested. Panel D shows that the type I error increases with the sample size, which implies that the test used is useless, yet the authors do not name a replacement.
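For reference, here is a small sketch of what a type I error rate looks like for a coefficient-based intercept test when the model is correctly specified. This is not the paper's simulation design; the equal slopes, unit-variance noise, sample sizes, and the 1.96 cutoff are all my assumptions. The rejection rate is a type I error rate only when the true intercept gap is zero; with a nonzero gap, the same number is power.

```python
import numpy as np

def intercept_gap_z(score, outcome, group):
    """z-statistic for the group coefficient in outcome ~ 1 + score + group."""
    X = np.column_stack([np.ones_like(score), score, group])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[2] / np.sqrt(cov[2, 2])

def rejection_rate(n_total, intercept_gap=0.0, reps=400, seed=1):
    """Share of simulated samples in which the intercept gap is flagged."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0.0, 1.0], n_total // 2)
    hits = 0
    for _ in range(reps):
        score = rng.normal(size=group.size)
        noise = rng.normal(size=group.size)
        outcome = 0.5 * score + intercept_gap * group + noise
        hits += int(abs(intercept_gap_z(score, outcome, group)) > 1.96)
    return hits / reps

if __name__ == "__main__":
    # True gap of zero: these are type I error rates.
    print(rejection_rate(200), rejection_rate(2000))
    # True gap of 0.3: this is power, not a type I error rate.
    print(rejection_rate(2000, intercept_gap=0.3))
```

In this simplified setup the rejection rate under a zero gap should stay close to the nominal 5% at both sample sizes, so an error rate that climbs with n points to something other than the null being simulated.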


3 comments:
Thanks.
"Next, we provide examples of the types of mechanisms that may cause slope-based differences across ethnic-based groups...(p. 652)"
I remember going over this paper with that jackass Greg Landen over at Discover blog. My reply to him was: "Sorry, g-loaded differences; we're already beyond the "sociohistoric" explanations."
Landen is a Jackass. This paper was poorly refereed. I hope someone sets the authors straight.