Discussion on: Understanding racial bias in machine learning algorithms


It's worth noting that 68.3% white is not far outside the general population's makeup. According to 2019 census estimates, the US is 60.4% white, and a broader self-reported measure of white identity in the same data puts the figure at 76.5%. Since Stack Overflow's data is both self-selected and self-reported, it's hard to say how far out of line with the population it really is. Note also that the white share of the population has been slightly higher in each earlier year, going back at least to the emergence of software development as a career.

If we looked at the breakdown by age, I would expect to see more skew toward white men in the older groups. That would reflect both the population makeup at the time and a stronger bias toward white men in STEM careers in previous decades (for reasons of class economics, social pressure, and prejudice).

In short, this single data point is not enough to show a current skew in hiring. In some markets, such as Silicon Valley, white workers are actually statistically underrepresented because Asian workers are overrepresented. That does not mean white workers are being biased against; it is more likely an artifact of other factors that have increased Asian representation in STEM in the US. From here, we could dig further into STEM graduation rates and similar data.

My only point is that we have to be cautious about how we interpret this data, too, and not jump to conclusions. It is useful as one metric, nothing more.

This in no way undermines your main point. We need to stay aware of all biases in our data and our algorithms, racial or otherwise. Researchers have known this for decades and still regularly get it wrong. By comparison, ML practitioners are babes in the woods.

Source: census.gov/quickfacts/fact/table/U...