StateOf Vizz

Testing the influence of the gender parameter in the State of JS 2021 survey

Following several complaints that the State of JS data is skewed towards male respondents, the purpose of this article is to test the influence of the gender parameter on the answers. For the 2022 survey, 71% of the respondents were men (out of 12,991 respondents). The aim is to determine whether gender has a significant impact on a respondent's answers.
If it does not, then the imbalance in the data does not bias the answers.
If it does, then oversampling male respondents introduces a bias into the answers.
Answering this question scientifically is a first step towards improving the survey's sampling process, and towards reducing the risk of bias during data analysis. In this survey, respondents first give information about themselves (gender, years of experience, etc.) and then answer various questions, notably on JavaScript or CSS language features, saying whether they have never heard of a feature, have heard of it, or have used it. In this article we focus on the technical-knowledge part of the survey.


1. Chi2 test

We want to run a segmented analysis on the gender parameter: testing whether the "non-male" sample is homogeneous with the over-sampled male sample. This can be done using a Chi2 test, which lets us answer our question in a scientific way, using statistical tests.
We run the test on 3 features of different popularity: a moderately used one (Proxies), a widely used one (Optional Chaining) and a rarely used one (Numeric Separator). We use a significance threshold of α = 5% and the following hypotheses for each test:

  • H0: There is no significant difference in popularity/use of the feature related to the gender of the respondents.

  • H1: “Male” and “non-male” respondents have different levels of awareness of this feature.

If H0 holds, the oversampling does not bias the answer to this question, because male and non-male respondents answer it the same way.
If H1 holds, we do have a significant bias.
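Before applying the test to the survey data, here is a minimal sketch of how such a Chi2 test of homogeneity can be run in Python with SciPy. The counts in the contingency table are hypothetical, purely to illustrate the procedure; they are not the survey's numbers.

```python
# Chi2 test of homogeneity on a gender x response contingency table.
# The counts below are HYPOTHETICAL, for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: gender group; columns: never heard / heard of it / have used it
observed = np.array([
    [1200, 3100, 4900],  # male (hypothetical counts)
    [ 480,  900,  800],  # non-male (hypothetical counts)
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4%}")

alpha = 0.05  # our acceptance threshold
if p_value < alpha:
    print("Reject H0: answers differ between gender groups.")
else:
    print("Fail to reject H0: no significant difference detected.")
```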

A) Proxies

Data

A low p-value is obtained: about 2.423%, below our 5% threshold. We reject H0 and accept H1 at the 5% risk level.

B) Optional chaining

Data

A very low p-value is obtained: about 0.1229%. We reject H0 and accept H1 at the 5% risk level.

C) Numeric separator

Data

A very low p-value is obtained: about 0.06094%. We reject H0 and accept H1 at the 5% risk level.

Observations

Thanks to the Chi2 tests performed on 3 features of different popularity, and to the residuals, we can conclude with 95% confidence that there is a difference in the level of awareness between male and female/non-binary/non-listed respondents. The test reveals that the samples are not homogeneous.
Many hypotheses can be raised to explain this difference; however, only a thorough qualitative study could confirm or refute them. The point here is to formally establish the existence of a bias, before conducting further research on how to correct it.
Nevertheless, this test does not indicate the magnitude of the difference between the samples. That information is given by the residuals: looking at them for the 3 cases studied, we can conclude that male respondents use these features more than the non-male group. To confirm this, we compare the frequencies of occurrence of the different responses, to study more precisely how the groups differ.
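To illustrate how those residuals can be obtained, the sketch below computes the Pearson residuals (observed − expected) / √expected on the same hypothetical counts as before; a large positive value flags a cell where a group answers a category more often than homogeneity would predict.

```python
# Pearson residuals: (observed - expected) / sqrt(expected).
# A large positive residual marks an over-represented cell.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [1200, 3100, 4900],  # male (hypothetical counts)
    [ 480,  900,  800],  # non-male (hypothetical counts)
])
_, _, _, expected = chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 2))
```

With counts like these, the "have used it" column shows a positive residual on the male row and a negative one on the non-male row, which is the pattern described above.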


2. Frequency test

The Chi2 test of equal frequencies checks whether the frequencies in each category are statistically different from each other. We want to compare the frequency of the response "have used it" among male respondents with that among the non-male group.

Prerequisites:

  • The sample sizes are larger than 30 individuals.

  • The number of individuals who answered "have used it" in each sample is more than 5.

  • The number of individuals who did not answer "have used it" in each sample is more than 5.

In this test, we use the following hypotheses:

  • H0: The proportion of men who responded 'have used it' (p1) is equal to p2, the proportion of women, non-binary and non-listed respondents who responded 'have used it'.

  • H1: p1 and p2 are statistically different.
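As a minimal sketch (again with hypothetical counts), the comparison can be run as a Chi2 test on the 2×2 table "used it / did not use it" by gender group, which for large samples is equivalent to a two-sided two-proportion test:

```python
# Frequency comparison for "have used it", male vs non-male.
# The counts are HYPOTHETICAL, for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

used_male, n_male = 4900, 9200        # hypothetical counts
used_nonmale, n_nonmale = 800, 2180   # hypothetical counts

# Prerequisites: each sample > 30, and at least 5 respondents
# in each cell ("have used it" / other answers).
for used, n in [(used_male, n_male), (used_nonmale, n_nonmale)]:
    assert n > 30 and used >= 5 and (n - used) >= 5

table = np.array([
    [used_male,    n_male - used_male],
    [used_nonmale, n_nonmale - used_nonmale],
])
chi2, p_value, _, _ = chi2_contingency(table, correction=False)

p1 = used_male / n_male        # proportion of "have used it" among males
p2 = used_nonmale / n_nonmale  # proportion among non-males
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, p-value = {p_value:.4%}")
```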

Table of results for the frequency test:

We therefore conclude, with 95% confidence, that the frequencies of "have used it" among male and non-male respondents are different in each case.
Also, for Proxies, Optional Chaining and Numeric Separators, more non-male respondents answered "never heard" or "know about it", while male respondents have used these features more.


Conclusion

From the results obtained, it can be said that there is a difference between the answers given by men and those given by the "non-binary, non-listed and women" group. As we suspected, the sampling bias does indeed translate into a bias in the overall results. We are continuing our research to design corrections for these biases, so that the answers of respondents who are currently a minority in the survey are better accounted for in our analysis.
Since this issue has only been addressed for one parameter, gender, a more in-depth study could be conducted to find other parameters that influence respondents' answers. For example, is there a different level of experience between male and female, non-binary and non-listed respondents? Does the male/female ratio vary with the level of experience? These questions may be addressed in a future article.


EPF (Ecole Polytechnique Feminine) is a French engineering school that trains generalist engineers and offers several majors, including the Data Engineering major.
This article was written by a team of 6 EPF students who are currently working on the State of JS survey: DELAHAYE Matthieu, DURAND Benjamin, GAUTRON Chloé, JOUBERT Maxime, JUSTE Lucas, RICHARD Solène.
Supervised by: BUREL Éric

Top comments (1)

Sacha Greif

I'm the lead survey maintainer. This is very interesting, thanks for doing this analysis! But I think there are actually two related issues here:

1) Are different genders correlated with different responses?
2) Are men over-represented in the survey?

I think you've proven that 1) is true, although as you say it's probably due to differing levels of experience, and not gender itself. Correlation vs causation, etc.

As for 2), I think it's quite probable that it's also true, but the difficulty is defining "over-represented compared to what?".

While I think 1) is not really something to worry about or try to "fix", 2) definitely is. Even if we can't quantify the bias because we don't have a good reference or goal to strive towards, we can always make efforts to increase diversity and inclusivity and hope to see improvements from year to year.