Over the two years of my MS in Data Science program at UPenn, I had the opportunity to work on quite a few projects. The one that really stands out is the cash bail project with the ACLU of Pennsylvania, which ran under the Data Science for Social Good initiative of the Penn Data Science Group (https://penndsg.com/). It stands out because the paper we wrote about it was accepted at the Data Science for Social Good workshop of The Web Conference 2020 (https://www2020.thewebconf.org/), which was definitely a highlight for PDSG.
A key component of pretrial justice reform has been to overhaul the cash bail system, which has been known to unfairly and disproportionately affect poor communities of color. The lack of clear standards and the variability in decision making by magistrate judges have resulted in a system that is inconsistent in determining both who must pay cash bail and how much they must pay. The lack of any metrics or transparency around this process has made the problem worse. The American Civil Liberties Union (ACLU) has been at the forefront of the fight to reform the bail determination system. To support their efforts, we used a dataset containing all criminal offenses charged within the state of Pennsylvania in 2016-17 to derive analytical insights, and performed a series of statistical tests to evaluate and quantify biases in the bail determination system in Pennsylvania. We also developed a predictive model and used model interpretability tools to evaluate the factors affecting bail type assignment. One of the key focus areas of our work was the determination of racial bias, particularly between defendants identified as Black and White in the court records. Through our work, we were able to develop a pipeline for identifying racial bias and conclusively establish that bail assignment patterns do vary across judges.
For this project, we had two key goals:
- Analyze the data to investigate bias in cash bail assignment
- Develop an informative dashboard to display key statistics and results
For the first goal, we relied mostly on the Python packages pandas and scikit-learn. We ran all our code in Jupyter notebooks.
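To give a flavor of the kind of statistical test involved in the first goal, here is a minimal sketch of a chi-square test of independence between a defendant's recorded race and the bail type assigned. This is an illustration only, not the project's code: the records are made-up toy data, the function names `contingency_table` and `chi_square_statistic` are hypothetical, and the choice of a chi-square test is an assumption about what such an analysis might use. It relies only on the standard library so it runs anywhere; in a notebook, `pandas.crosstab` would build the same kind of table.

```python
from collections import Counter

# Made-up toy records of (race, bail_type). NOT data from the project.
records = (
    [("Black", "monetary")] * 30
    + [("Black", "non-monetary")] * 20
    + [("White", "monetary")] * 20
    + [("White", "non-monetary")] * 30
)


def contingency_table(rows):
    """Count how often each (group, outcome) pair occurs."""
    counts = Counter(rows)
    groups = sorted({group for group, _ in rows})
    outcomes = sorted({outcome for _, outcome in rows})
    table = [[counts[(g, o)] for o in outcomes] for g in groups]
    return groups, outcomes, table


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


groups, outcomes, table = contingency_table(records)
statistic = chi_square_statistic(table)

# For a 2x2 table there is 1 degree of freedom; the 5% critical value is 3.841.
CRITICAL_VALUE_DF1_5PCT = 3.841
print(groups, outcomes, table)
print(f"chi-square = {statistic:.3f}")
print("reject independence at 5%:", statistic > CRITICAL_VALUE_DF1_5PCT)
```

A statistic above the critical value only says that bail type and group are associated in the sample. It does not control for factors such as offense type or prior record, so a test like this is a starting point rather than proof of bias.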
For the second goal, the dashboard was built entirely in R Shiny. We also had to write some R code to wrangle the data before it was passed to the dashboard for display. The dashboard was then hosted on the shinyapps.io platform.
This project was special because of the impact of our work: the results from our analysis are actively being used to bring about judicial reforms. Moreover, this was the first time that I led a data science project, and it was a great learning experience in how to manage a team. The project was handled completely by students, from connecting with the ACLU to the final publication of the paper. None of the project members had any previous experience in writing academic papers, but we were able to pull it off and get the paper accepted at such an important conference.