Sunday, September 28, 2008

Does being fair pay? Not always

There is a lot of discussion on Sailer's Blog about "the diversity recession." This reminds me of this post, which references this post. Both conclude that in some cases it is unprofitable to treat people equally.

Suppose you have a measure (e.g., SAT scores, FICO scores) that is supposed to be a proxy, measured with error, for some other quality (e.g., "intelligence", "creditworthiness"). Your job is to pick a single cutoff score for distributing some good (college admission, a mortgage) so as to minimize something bad (dropouts, defaults). Suppose that

1) Your proxy is a good predictor of the bad thing

2) There are two populations with different normal distributions of your proxy measure, one with a lower mean than the other.

Should you be fair and use the same cutoff for both populations, or should you be mean and demand a higher score from the group with the lower average? The second link above uses Bayes' rule to claim that being mean is the most profitable. I was surprised at this until I wrote out Bayes' theorem. Suppose FICO scores measure creditworthiness. If you use the same FICO cutoff in both populations, a larger share of the people you approve in the lower-scoring group will in fact fall below the cutoff on true creditworthiness.
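To make the Bayes argument concrete, here it is written out in odds form. The notation is mine, not from either linked post: T is true creditworthiness, S is the observed score, and c is the cutoff. Treat it as intuition rather than a proof, since the likelihood ratio also depends on each group's distribution.

```latex
% Assumed notation (not from the linked posts):
% T = true creditworthiness, S = observed score, c = cutoff.
\[
P(T < c \mid S > c) \;=\; \frac{P(S > c \mid T < c)\, P(T < c)}{P(S > c)}
\]
\[
\frac{P(T < c \mid S > c)}{P(T \ge c \mid S > c)}
\;=\;
\frac{P(S > c \mid T < c)}{P(S > c \mid T \ge c)}
\times
\frac{P(T < c)}{P(T \ge c)}
\]
```

The last factor is the prior odds of being below the cutoff. It is larger in the group with the lower mean, which pushes up the share of approved people who should not have been approved.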

If you do not believe me, run the following simple simulation in R. Suppose you have two populations with different distributions of creditworthiness (distributed like FICO scores):

sampg1 <- rnorm(1000, 680, 100)  # Group 1: mean 680, SD 100
sampg2 <- rnorm(1000, 600, 100)  # Group 2: mean 600, SD 100

Suppose FICO scores are an unbiased measure of the above but are measured with error (SD = 50):

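coef <- 1                                            # unbiased: score = 1 * true + noise
true <- rbind(sampg1, sampg2)                        # row 1 = group 1, row 2 = group 2
fico.est <- coef*true + rnorm(length(true), 0, 50)   # add measurement error, SD 50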

You set a cutoff of FICO 700 for a "good risk". How many people are you going to incorrectly categorize as "good risk" in group 1?

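crit <- 700
clt1 <- (true[1,] < crit) * (fico.est[1,] > crit)   # truly below the cutoff but scored above it
denom1 <- (fico.est[1,] > crit)                     # everyone scored above the cutoff
c(sum(clt1), sum(denom1), sum(clt1)/sum(denom1))    # count wrong, count approved, share wrong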

About 17%. How about group 2?

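clt2 <- (true[2,] < crit) * (fico.est[2,] > crit)   # truly below the cutoff but scored above it
denom2 <- (fico.est[2,] > crit)                     # everyone scored above the cutoff
c(sum(clt2), sum(denom2), sum(clt2)/sum(denom2))    # count wrong, count approved, share wrong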

About 30%. So, if the real world matched this simple example, it would probably pay to discriminate. No wonder they made it illegal: this crime could pay.
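With only 1000 draws per group the percentages bounce around from run to run. Here is a rough sketch that wraps the same steps in one function and uses a bigger sample, so the gap between the groups is easier to see. The function name, its arguments, the seed and the sample size are my own choices for illustration, not anything from the linked posts.

```r
# Sketch only: false_good_rate() and its arguments are hypothetical names.
set.seed(1)  # arbitrary seed so the run is repeatable

# Share of people scored above the cutoff whose true value is below it.
false_good_rate <- function(true_mean, crit = 700, n = 1e5,
                            true_sd = 100, err_sd = 50) {
  true_score <- rnorm(n, true_mean, true_sd)      # underlying creditworthiness
  fico_est   <- true_score + rnorm(n, 0, err_sd)  # unbiased score with noise
  passed     <- fico_est > crit                   # approved at this cutoff
  sum(passed & true_score < crit) / sum(passed)
}

rate1 <- false_good_rate(680)  # group 1
rate2 <- false_good_rate(600)  # group 2
print(round(c(group1 = rate1, group2 = rate2), 3))
```

Calling false_good_rate(600, crit = 720), or any other cutoff, is a quick way to see how much higher the bar for group 2 would have to be before its error rate comes down to group 1's.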
