<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: jesusbaquiax</title>
    <description>The latest articles on DEV Community by jesusbaquiax (@jesusbaquiax).</description>
    <link>https://dev.to/jesusbaquiax</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F656609%2F22a4e462-9bd4-40d9-84a1-e4838e451465.png</url>
      <title>DEV Community: jesusbaquiax</title>
      <link>https://dev.to/jesusbaquiax</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jesusbaquiax"/>
    <language>en</language>
    <item>
      <title>Updating your GitHub Personal Access Token for Windows</title>
      <dc:creator>jesusbaquiax</dc:creator>
      <pubDate>Wed, 08 Dec 2021 19:23:07 +0000</pubDate>
      <link>https://dev.to/jesusbaquiax/updating-your-github-personal-access-token-for-windows-38p7</link>
      <guid>https://dev.to/jesusbaquiax/updating-your-github-personal-access-token-for-windows-38p7</guid>
      <description>&lt;p&gt;Happy December everyone! My Personal Access Token (PAT) for GitHub recently expired and I completely forgot how to update it. After spending about thirty minutes reading different online articles that had a variety of methods to update your PAT. I want to provide a condensed set of instructions that will hopefully be helpful to anyone whose PAT has recently expired. These instructions are for Windows and allows you to cache your PAT so that you don't have to input it every time you git clone/ git push a new repo.&lt;/p&gt;

&lt;h2&gt;Step 1) Create Personal Access Token&lt;/h2&gt;

&lt;p&gt;GitHub provides a clear and succinct set of instructions for creating a PAT on their website. After creating your new PAT, do not close the window that contains it until you have finished these steps. The instructions are available &lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Step 2) Add Credentials to Windows Credential Manager&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Go to Credential Manager&lt;/li&gt;
&lt;li&gt;Click on "Windows Credential"&lt;/li&gt;
&lt;li&gt;Click on "Add a generic credential"&lt;/li&gt;
&lt;li&gt;For "Internet or network address" Type in: 'git:&lt;a href="https://github.com"&gt;https://github.com&lt;/a&gt;'&lt;/li&gt;
&lt;li&gt;For "User name" type in 'GitHub User Name'&lt;/li&gt;
&lt;li&gt;For "Password" paste in 'Personal Access Token'&lt;/li&gt;
&lt;li&gt;Click OK&lt;/li&gt;
&lt;li&gt;Rejoice! You Are Done!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the previous GitHub credential is still listed under "Generic Credentials", you can update the PAT by clicking on the GitHub credential, clicking the "Edit" button, and pasting the new PAT into the password field.&lt;/p&gt;

&lt;p&gt;To ensure that everything was done successfully, try a git push of a local repo to GitHub. If you get no error messages, the update was a success. Let me know if this was helpful or not. Thank you!&lt;/p&gt;

</description>
      <category>github</category>
      <category>beginners</category>
      <category>codenewbie</category>
    </item>
    <item>
      <title>Acknowledging Societal Bias as a Data Scientist</title>
      <dc:creator>jesusbaquiax</dc:creator>
      <pubDate>Tue, 14 Sep 2021 00:03:30 +0000</pubDate>
      <link>https://dev.to/jesusbaquiax/acknowledging-societal-bias-as-a-data-scientist-2of6</link>
      <guid>https://dev.to/jesusbaquiax/acknowledging-societal-bias-as-a-data-scientist-2of6</guid>
      <description>&lt;p&gt;One of the many aspects that a data scientists must deal with on a day to day basis is addressing statistical bias within the models and datasets they use. Another aspect of the work that is less often mentioned is addressing societal bias on statistical and machine learning predictions. Mitchell et al's article: &lt;a href="https://arxiv.org/abs/1811.07867"&gt;"Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions"&lt;/a&gt; attempts to offer a concise reference for thinking through the choices, assumptions and fairness considerations of predictions-based decision systems. &lt;/p&gt;

&lt;p&gt;As prediction-based and decision-making machine learning models continue to grow and interweave themselves into the societal fabric of day-to-day life, it has become all the more important to investigate whether any bias, statistical or societal, is present. Mitchell et al. use the two real-world examples of pretrial risk assessment and lending models to summarize the definitions and results that have been formalized to date. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wHt4Ux5F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aij7vsh8zpj7zgj3j3ym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wHt4Ux5F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aij7vsh8zpj7zgj3j3ym.png" alt="types of Bias"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The article highlights three examples of societal bias within predictive modelling. The first is the data: even if the data were representative and accurate, they could perpetuate social inequalities that run counter to the decision maker's goals. In the case of pretrial risk assessments, using arrests as a measure of crime can introduce statistical bias from measurement error that is differential by race because of a racist policing system.&lt;/p&gt;

&lt;p&gt;This bias is compounded because statistical and machine learning models are designed to identify patterns in the data used to train them and, to an extent, reproduce unfair patterns. Bias also enters through the choices a data analyst makes in modelling, such as the class of model, which metrics to focus on for interpretability, and which covariates to include. &lt;/p&gt;

&lt;p&gt;A final choice in mathematical formulations of fairness is the axes along which fairness is measured. In most cases, deciding how attributes map individuals to these groups is important and highly context specific. In regulated domains such as employment, credit, and housing, these so-called “protected characteristics” are specified in the relevant discrimination laws. Even in the absence of formal regulation, though, certain attributes might be viewed as sensitive, given specific histories of oppression, the task at hand, and the context of a model’s use.&lt;/p&gt;

&lt;p&gt;For a junior data scientist, this paper can be informative both in the job application process and in understanding a company's work culture. One can present oneself as a stronger candidate by acknowledging, during the interview, the implicit societal bias in the data and models one will work with. Even if one doesn't fully comprehend all the factors at play, a recruiter will appreciate a fresh approach, or the fact that you, as a candidate, have domain knowledge about the industry. Knowing how a company addresses societal bias can also be a helpful indicator of the work culture at a company one is interested in applying to. &lt;/p&gt;

&lt;p&gt;As the two main examples in the article show, there are many important real-life applications where the policies and decisions made could have adverse effects on certain demographics but not others. As American society becomes more cognizant of the societal inequalities endemic in its systems and structures, it is becoming more important to also be aware of the bias in machine learning models that play decision or prediction roles.&lt;/p&gt;

&lt;h3&gt;Citation&lt;/h3&gt;

&lt;h5&gt;Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions&lt;/h5&gt;

&lt;p&gt;Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, Kristian Lum&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Logistic Regression Workflow Guide</title>
      <dc:creator>jesusbaquiax</dc:creator>
      <pubDate>Tue, 10 Aug 2021 12:19:16 +0000</pubDate>
      <link>https://dev.to/jesusbaquiax/logistic-regression-workflow-guide-ni1</link>
      <guid>https://dev.to/jesusbaquiax/logistic-regression-workflow-guide-ni1</guid>
      <description>&lt;p&gt;If you are one of the many whose first time working on a logistic regression project is through a boot camp program, you might feel overwhelmed about where to start to begin when working on your project. This guide will hopefully serve as a beacon of light in order for you to better approach your project and spend less time trying to figure out your approach and more time practicing and improving on your understanding of logistic regression modelling.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;I'll be using the project I worked on during my intensive bootcamp program: the Terry Stop dataset. This tutorial highlights the steps one should take when building a predictive logistic regression model through multiple iterations.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Dataset&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;I used the &lt;a href="https://catalog.data.gov/dataset/terry-stops"&gt;Terry Stop dataset&lt;/a&gt; that is publicly available on the seattle.gov website. This data represents records of police-reported stops under Terry v. Ohio, 392 U.S. 1 (1968). Each row represents a unique stop. Each record contains perceived demographics of the subject, as reported by the officer making the stop, and officer demographics as reported to the Seattle Police Department for employment purposes. Where available, data elements from the associated Computer Aided Dispatch (CAD) event (e.g. Call Type, Initial Call Type, Final Call Type) are included. I chose the 'arrest flag' column as my target and included the demographic, frisk, police officer YOB, and date columns as my features.&lt;/p&gt;
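&lt;p&gt;As a sketch, the feature/target split described above might look like the following, using a tiny hypothetical frame in place of the real seattle.gov export (the column names and rows here are illustrative, not the dataset's exact headers):&lt;/p&gt;

```python
import pandas as pd

# Tiny hypothetical stand-in for the Terry Stop CSV export;
# in practice this frame would come from pd.read_csv on the download.
df = pd.DataFrame({
    "Subject Perceived Race": ["White", "Black", "Asian", "White"],
    "Frisk Flag": ["Y", "N", "N", "Y"],
    "Officer YOB": [1985, 1990, 1978, 1982],
    "Arrest Flag": ["N", "Y", "N", "N"],
})

# 'Arrest Flag' is the target; everything else is a feature.
y = df["Arrest Flag"]
X = df.drop(columns=["Arrest Flag"])
print(X.shape, y.shape)
```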

&lt;p&gt;First Decision - Deciding which columns you will use in your modelling. You can mix and match and see which combinations work better than others. For my project, I decided to stick to one set of features; you could instead create one data frame with just demographics and precinct and another with demographics and date reported, and compare the two.&lt;/p&gt;

&lt;p&gt;Second Decision - Deciding how you will clean your data. This will change depending on which columns you use. Basic data cleaning includes removing NaN values and consolidating similar values into one value.&lt;/p&gt;
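&lt;p&gt;A minimal cleaning sketch of those two steps, on made-up rows (the placeholder spellings "-" and "Unknown" are illustrative, not the dataset's exact values):&lt;/p&gt;

```python
import pandas as pd

# Made-up rows for illustration: one NaN and two placeholder spellings.
df = pd.DataFrame({
    "Subject Perceived Race": ["White", None, "Unknown", "Black"],
    "Frisk Flag": ["Y", "N", "-", "N"],
})

# Remove rows containing NaN values.
df = df.dropna()

# Consolidate similar placeholder values into a single label.
df = df.replace({"-": "Unknown"})
print(df)
```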

&lt;p&gt;Third Decision - Deciding which predictive models you will use. I used LogisticRegression, KNeighborsClassifier, RandomForestClassifier, and DecisionTreeClassifier models. I used pipelines to expedite the process and cut back on code. &lt;/p&gt;
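&lt;p&gt;A sketch of the pipeline approach on toy data, using scikit-learn's Pipeline and ColumnTransformer with two of the models for brevity (my project used imblearn's Pipeline so SMOTE could be a step; this simplified version omits it, and the feature names are made up):&lt;/p&gt;

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in data: one categorical feature, one numeric, binary target.
X = pd.DataFrame({
    "race": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "officer_yob": [1980, 1990, 1985, 1975, 1992, 1988, 1979, 1983],
})
y = [0, 1, 0, 1, 0, 1, 0, 1]

# Shared preprocessing: one-hot encode categoricals, scale numerics.
prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["race"]),
    ("num", StandardScaler(), ["officer_yob"]),
])

# One pipeline per candidate model keeps the comparison loop short.
results = {}
for name, clf in [("logreg", LogisticRegression()),
                  ("tree", DecisionTreeClassifier(random_state=0))]:
    pipe = Pipeline([("prep", prep), ("clf", clf)])
    pipe.fit(X, y)
    results[name] = pipe.score(X, y)
print(results)
```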

&lt;p&gt;Fourth Decision - If you use a grid search, deciding which parameters to tune and how many options to provide for each parameter of each model. This might be limited by your hardware capabilities. &lt;/p&gt;
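&lt;p&gt;A small grid search sketch on synthetic data; the grid values here are illustrative, and the point is that total fits equal grid combinations times cross-validation folds, so each extra option multiplies runtime:&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Keep the grid small: total fits = (grid combinations) x (cv folds).
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring="f1",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```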

&lt;p&gt;Fifth Decision - Deciding which model will be your final one. This depends on many factors about your dataset. In this project, there was a class imbalance in my target column, so I focused on precision, recall, and F1 score.&lt;/p&gt;
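&lt;p&gt;A quick illustration of why those metrics matter under class imbalance: a baseline that always predicts the majority class gets high accuracy while catching no positives at all (the 90/10 split below is made up):&lt;/p&gt;

```python
from sklearn.metrics import f1_score, recall_score

# Imbalanced ground truth: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
# A naive baseline that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                   # 0.9, yet useless
print(recall_score(y_true, y_pred))               # 0.0: no positives caught
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0
```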

&lt;p&gt;Sixth Decision - Deciding which visualizations you will use. Ideally, you want visualizations that are easy to read, add to your story, and are catered to the stakeholder.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Libraries&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Many of the libraries imported below should be familiar to you by now. If you are interested, you can look at the documentation to learn more about each. I'll briefly go over the lesser-known ones that I used. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I used &lt;em&gt;datetime, date&lt;/em&gt; to transform the 'date reported' column into a datetime64[ns] dtype and then created two new columns with the month and the year respectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I used matplotlib's mdates, cbook, and mtick modules, along with calendar, to transform month numbers into names when creating the visualizations of Terry stops by month and year.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
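&lt;p&gt;The datetime transformation described above can be sketched like this (the column name and dates are illustrative):&lt;/p&gt;

```python
import pandas as pd

# Illustrative stand-in for the dataset's date-reported string column.
df = pd.DataFrame({"Reported Date": ["2020-01-15", "2020-06-03"]})

# Convert to datetime64[ns], then derive month and year columns.
df["Reported Date"] = pd.to_datetime(df["Reported Date"])
df["month"] = df["Reported Date"].dt.month
df["year"] = df["Reported Date"].dt.year
print(df[["month", "year"]])
```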

&lt;pre&gt;&lt;code&gt;import pandas as pd
import numpy as np
from datetime import datetime, date
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, plot_confusion_matrix, \
    accuracy_score, recall_score, precision_score, f1_score, log_loss, roc_auc_score, roc_curve
from sklearn.compose import ColumnTransformer
from imblearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import matplotlib.ticker as mtick
import calendar
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;&lt;strong&gt;Code&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Below is the link to the project repository on GitHub. You'll find markdown notes for almost every cell of code and an explanatory narrative of my process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/jesusbaquiax/dsc_phase3_project"&gt;Link&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Hopefully this guide serves as a basic outline as you begin your logistic regression project. Please feel free to reach out if you have any questions or comments about anything!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Jesus Baquiax: Using Data Analytics to Inform Educational Approaches</title>
      <dc:creator>jesusbaquiax</dc:creator>
      <pubDate>Tue, 29 Jun 2021 23:20:26 +0000</pubDate>
      <link>https://dev.to/jesusbaquiax/jesus-baquiax-using-data-analytics-to-inform-educational-approaches-2mg1</link>
      <guid>https://dev.to/jesusbaquiax/jesus-baquiax-using-data-analytics-to-inform-educational-approaches-2mg1</guid>
      <description>&lt;p&gt;Over the past 6 years, I have worked in an Operations role at two different charter schools in Brooklyn. I had a variety of roles and responsibilities, but the favorite part of my job was always the data analysis of student investment, academic and behavior data, and teacher performance using google spreadsheets and excel. I enjoyed the google search for a method that someone else used that would work with the issue at hand and I would get a rush of excitement whenever I would figure out a long formula I was working on. &lt;/p&gt;

&lt;p&gt;I first learned about data science from a co-worker who was always impressed with what I was doing with Google Sheets and strongly encouraged me to look into data analytics bootcamp programs. My co-worker also mentioned that someone they knew had taken such a program, enjoyed it, and was able to get a job afterwards. As I began doing my research, I realized that data science was similar to what I was already doing, but in a different and more robust "language" and at a much higher level of difficulty and intensity.&lt;/p&gt;

&lt;p&gt;I want to explore how data science can help schools and the education technology industry improve in order to better support the students they serve, especially in lower-income communities. &lt;/p&gt;

&lt;p&gt;One specific question I want to explore is how New York City (NYC) charter schools have performed over time on the New York State Exam (NYSE). The NYSE is a required exam that all NYC Department of Education (DOE) public schools have to administer, and it is used to determine whether a student will be promoted to the next grade. I don't believe an analysis exists that answers this question, and I think it is important 1) to see whether charter schools are a viable option for parents who want their children to be well educated and 2) to see which charter schools have performed more exceptionally than others. &lt;/p&gt;

&lt;p&gt;Flatiron School was the program my co-worker recommended to me and when I began looking deeper into the program, the two main factors that appealed most to me were its job application network and full-time program. Flatiron School has multiple locations across the U.S. and access to their respective job market and having the option to apply to various locations with the support and network Flatiron school offers will provide me an advantage when I begin applying for jobs. I was also intrigued by the schedule and curriculum that their full time program offers. I have discovered that the best way for me to learn a new skill is to fully dedicate myself to the absorption of the material and not get distracted by other commitments.&lt;/p&gt;

&lt;p&gt;In the future, I want to apply the data analytics skills I learn from this program as a data analyst, or in some similar capacity, for an education technology company, a large charter school network, or a state department of education.&lt;/p&gt;

&lt;p&gt;Hopefully I'll have more exciting news and updates to offer in the coming weeks and months. Stay tuned!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jesús Baquiax&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
