DEV Community

Discussion on: Diversity Matters in The Workplace

 
Garador

Thanks for your patience!! This clarifies a lot of the issues. With this starting point, I can continue searching and learning on my own.

 
DrBearhands

I wish people would stop using AI examples for this; it is a very poor argument.

When training an ML model, or doing any kind of statistics, you must ensure your test set is representative of the population you are going to make a statement about. Sex and skin color are blatantly obvious cases in vision systems, but there are biases that are far harder to detect and have the same effect on an individual. Adding team members of a different sex or skin color might fix this particular symptom, but the underlying problem is that your data gathering is inadequate.
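To make the representativeness point concrete, here is a minimal sketch of the kind of sanity check a data-gathering process could run, comparing a dataset's group mix against an assumed target population. All group names, proportions, and the tolerance are invented for illustration; the catch, as noted above, is that a check like this only catches biases along axes you already thought to measure.

```python
from collections import Counter

# Hypothetical target mix and collected samples (all numbers invented).
population_share = {"light_skin": 0.6, "dark_skin": 0.4}
sample_groups = ["light_skin"] * 90 + ["dark_skin"] * 10

counts = Counter(sample_groups)
total = sum(counts.values())

# Flag any group whose sampled share strays too far from the target.
for group, expected in population_share.items():
    observed = counts[group] / total
    if abs(observed - expected) > 0.05:  # arbitrary tolerance
        print(f"{group}: sampled {observed:.0%}, expected {expected:.0%}")
```

Here both groups get flagged, since dark skin is sampled at 10% against an expected 40%. Any bias you never enumerated as a group passes this check silently.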

For instance, a while ago there was a post about a soap dispenser that used AI/computer vision to recognize whether a hand was beneath it, but only worked for light skin colors. The argument was made that a more diverse team would have spotted this problem, completely missing the point that a cheap sensor would have been a more robust solution and would have worked for different skin colors, missing fingers, tattoos...

There's a further problem: often there just aren't enough willing participants to get a representative dataset. This is a well-known problem in academia. Many people simply have better things to do than subject themselves to tests they don't understand for a few bucks. While we should be wary of unrepresentative datasets, often the only alternative is doing nothing at all.

There are good arguments (beyond public relations or social injustice) for at least male/female diversity, and there are excellent arguments for tearing down some of the 'soft barriers' keeping mostly women out of STEM. This just isn't one of them.

 
Alvaro Montoro

I understand the AI example may not be the best, but it's a sign of something bigger, and it is not limited to poor test data. As I said in a different comment, development is not only coding; it involves all the steps of the SDLC, and data gathering too.

The data gathering is definitely inadequate, but that's not an excuse either. Training data doesn't show up out of thin air; it is created and gathered by people (or by algorithms created by people), who may influence its representativeness and neutrality.

Even if the data is wrong and the training is wrong, nobody realizing that the accuracy was so lopsided is a sign that they were oblivious to a sex/skin color issue. No one thought, "hey, we have 99% accuracy for white men, but 65% for black women"? And if they did, nobody did anything? That's not a data gathering issue.

I agree that it is not always possible to get a good representation of the population. But in this day and age, with many free sources available for images and portraits, having bad data for a vision system is a poor excuse.

 
DrBearhands

I can't really see the point you're trying to make here. Nevertheless, I think there are a few problems with what you're saying.

> it's a sign of something bigger

Yes, it is a product of a divided society. The reasoning "biased AI → we need diversity in tech" does not hold though.

> And it is not limited to poor test data [...] data gathering too.

If you know a good example of how diversity in the development team can benefit the company, use that rather than AI. Let's not dilute good arguments with bad ones.

You also appear to assume the entire team is responsible for the whole process, which is often not true. Essentially this issue only matters for QA.

> The data-gathering [...] sex/skin color issue.

I think you've missed my point here. There are countless biases your dataset might have. A good data-gathering process ensures samples are representative of the final use case. Skin color issues are an indicator that the data-gathering process is poor and produces bad results; that is a problem in and of itself. Adding a black woman to the team might solve this particular issue, but the team is still going to produce dangerously biased models, with biases that are far harder to notice.

> and the training is wrong

This is unlikely to be the case. ML will just match the data, whatever that is. Beyond having a model that is too simple, which will result in low accuracy, bias in the model after training is a reflection of bias in the input data.
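The "bias in, bias out" point can be sketched with a deliberately trivial "model". The groups, labels, and counts below are all invented for illustration; the toy model just predicts the majority label it saw during training, which is enough to show per-group accuracy mirroring the skew of the data.

```python
from collections import Counter

# Hypothetical (group, label) training pairs: group "A" dominates.
train = [("A", "hand")] * 95 + [("B", "no_hand")] * 5

# The simplest "model" that matches its data: always predict the
# overall majority label seen during training.
majority_label = Counter(label for _, label in train).most_common(1)[0][0]

def predict(_group):
    return majority_label

# Per-group accuracy reflects the training skew: perfect for the
# over-represented group A, useless for the under-represented group B.
test_set = [("A", "hand")] * 10 + [("B", "no_hand")] * 10
for group in ("A", "B"):
    cases = [(g, y) for g, y in test_set if g == group]
    accuracy = sum(predict(g) == y for g, y in cases) / len(cases)
    print(group, accuracy)  # A 1.0, B 0.0
```

Nothing in the training step is "wrong" here; the model faithfully fits what it was given. The disparity comes entirely from the data.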

> with many free sources available for images and portraits, having bad data for a vision system is a poor excuse.

This would cause exactly the bias problems I was talking about. Data gathering is hard. You can't just download some pictures and expect them to form an unbiased dataset.

I'd like to reiterate: I'm not making an argument against diversity. I've had rather good experiences pair-programming with women; men and women have different ways of tackling problems, and there's definitely a "stronger together" effect. I would, however, like to see the argument from biased AI go away.

If you add bad arguments to good ones, the good arguments lose credibility.