DEV Community

Durgesh kumar prajapati

10 Portfolio Projects you can try as an entry-level Data Analyst/Scientist

I hate the word newbie. If you are in a hurry, skip to the third paragraph. I always do this “catching up” thing before going straight to the point.

Over the course of my journey, here are 10 projects I worked on to build my portfolio and career.


1. Crop Recommendation System

Tools used: Python, HTML, CSS, Flask, Basic ML knowledge
Difficulty: Easy

This was the first project I ever did, and even though I hate it so much now, I’m so proud of it. I built a decision tree model that recommends the best crop for given weather and soil conditions. I deployed it locally using Flask. The version currently on my GitHub is terrible, so I do not want to link it. When I push a better version, I will link it here.


2. Movie recommender system

Tools used: Python, Knowledge of NLTK and Cosine Similarity, Heroku, Streamlit
Difficulty: Medium

Now, this was my second project, but it was nothing like the first. It uses NLP and cosine similarity. I had just finished Andrew Ng’s Machine Learning course on Coursera and watched a TMDB movie recommender tutorial on YouTube, so I built one on the Netflix dataset. I also built a Streamlit front end so users could try it, and deployed it on Heroku. For me, this is the hardest project I have ever done. I even cried. I have since learned better ways to do things, but I did learn a lot from it. This is a link to the GitHub. It needs some tidying but it’s not that terrible.
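To give a feel for the cosine similarity part, here is a minimal sketch using only the standard library. It is not the code from my repo: the titles and descriptions are made-up placeholders, the function names are hypothetical, and it uses plain word counts instead of the NLTK preprocessing a real version would need.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using raw word counts as vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, descriptions, top_n=2):
    """Return the top_n other titles whose descriptions are most similar."""
    query = descriptions[title]
    scores = [
        (other, cosine_similarity(query, text))
        for other, text in descriptions.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


# Placeholder catalogue, invented for illustration only.
catalogue = {
    "Title A": "space crew explores distant planet",
    "Title B": "space crew fights alien on distant ship",
    "Title C": "chef opens small bakery in paris",
}

print(recommend("Title A", catalogue, top_n=1))
```

Raw counts keep the sketch short. On a real dataset you would probably want stop-word removal, stemming, and something like TF-IDF weighting before comparing descriptions.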


3. Forbes 2022 EDA using Python

Tools used: Python (Pandas and Matplotlib)
Difficulty: Easy

This was the first EDA project that I published. I had written about it too on this link. The project was easy, and it made me realize you learn from small projects too. I revised my knowledge of Pandas and Matplotlib. I also learned how to ask the right questions, and how analysis is targeted toward uncovering something. A whole lot of people got to know me through this project too. This is a GitHub link to the project.


4. Market Basket Analysis

Tools used: Python (pandas, matplotlib, association rules)
Difficulty: Medium

I haven't posted about this project yet, but it’s one of the projects I think a data analyst should try. You get to understand association rules, how a company's products sell, which products sell best together, how a high-sales product can help sell a low-sales one, and so on. I enjoyed learning and doing this one and might push it to my GitHub soon, but before then you should research and try it. It is easy.
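If association rules are new to you, the three numbers to know are support, confidence, and lift. Below is a small sketch that computes them for pairs of items in plain Python. The baskets are invented, the function name `pair_rules` is hypothetical, and a real project would more likely use a library implementation of Apriori than this brute-force loop.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Compute support, confidence, and lift for every frequent item pair."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # Support of a single item: the share of baskets that contain it.
    support = {item: sum(item in b for b in baskets) / n for item in items}

    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in bk and b in bk for bk in baskets) / n
        if both < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = both / support[x]   # P(y | x)
            lift = confidence / support[y]   # > 1 means x and y co-occur more than chance
            rules.append(
                {"if": x, "then": y, "support": both,
                 "confidence": confidence, "lift": lift}
            )
    return rules


# Toy baskets, invented for illustration only.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread"],
    ["milk"],
]

for rule in pair_rules(baskets, min_support=0.5):
    print(rule)
```

In this toy data every basket with butter also has bread, so the rule "butter then bread" has a confidence of 1.0. A lift above 1 is the hint that pairing the two products is worth a look.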


5. Implementing the Gale-Shapley Stable Matching Algorithm

Tools Used: Python
Difficulty: Medium

Now, this isn’t a data-related project. I attended a Python-oriented academy program last year and was fortunate enough to implement this algorithm in Python. This algorithm is so interesting. The Gale-Shapley algorithm is aimed at producing a stable matching. In the classic example, everyone gets married to a man or a woman, and the matching is stable when no two people would both rather be with each other than with their assigned partners. I don’t think I am explaining it well enough. I might dedicate a whole post to it, but before then, you can read about it on Google.
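Since words don't do it justice, here is a compact sketch of the proposing loop. It is an illustration rather than the version I wrote in the program, and it assumes two equally sized groups where everyone ranks every member of the other group. The names `gale_shapley`, `proposer_prefs`, and `receiver_prefs` are my own choices for this example.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict {proposer: receiver}.

    Assumes both groups have the same size and every preference list is complete.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # r was unmatched, so r accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # r trades up and the old partner is free again
            free.append(current)
        else:
            free.append(p)               # r rejects p, who will try the next choice

    return {p: r for r, p in engaged_to.items()}


# Small example with made-up names.
proposer_prefs = {
    "a": ["x", "y", "z"],
    "b": ["y", "x", "z"],
    "c": ["x", "y", "z"],
}
receiver_prefs = {
    "x": ["b", "a", "c"],
    "y": ["a", "b", "c"],
    "z": ["a", "b", "c"],
}

print(gale_shapley(proposer_prefs, receiver_prefs))
```

Here "c" would prefer "x" or "y", but both of them prefer their own partners over "c", so nobody has a reason to break their match. That is exactly what stable means.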


6. The Bechdel test

Tools used: Tableau, Python (For analysis)
Difficulty: Easy

The Bechdel Test checks whether a movie has at least one scene where a woman speaks to another woman about something other than a man. I will definitely write a post about this project. It’s one of the ones that hooked me on the first read. The moment I heard of this test, I wanted to do something with it to tell people about it. I linked it with the evolution of feminism and researched whether feminism has improved how society views women. To do that, I grouped the years into different centuries and observed the number of movies that passed the test over the years. I even made a Tableau visualization for it, but I haven’t perfected it yet. I haven’t posted about it either.


7. Sentiment Analysis Project

Tools used: Python, NLTK, Power BI
Difficulty: Easy

I did a sentiment analysis project when Black Panther 2 came out, and I did another recently with two different libraries. It’s quite easy to do and I think it’s something every data analyst should try. I even visualized it using Power BI and I dared to use a black background. Yes. I did that. Here is a link to the post: Black Panther.


8. Data science job salaries

Tools Used: PostgreSQL, Excel, Power BI
Difficulty: Medium

Again, this is one of the projects that put me out there. I got so many reviews and so much feedback on it. I used Excel for cleaning, SQL for the analysis, and Power BI for visualization. I had written about it and published it too on this link. The data came from this link, and I explored the salaries of data professionals by profession, mobility, employment type, and more. I used window functions and subqueries, and honestly, I was able to properly practice what I had learned.
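For anyone who hasn't met window functions and subqueries yet, here is a tiny runnable sketch of both in one query. I used PostgreSQL for the project, but this example swaps in SQLite through Python's built-in `sqlite3` module so it runs without a server (it needs SQLite 3.25 or newer for window functions). The table, column names, and rows are all invented for illustration and are not the real dataset.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a few made-up salary rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salaries (job_title TEXT, employment_type TEXT, salary_usd INTEGER)"
    )
    conn.executemany(
        "INSERT INTO salaries VALUES (?, ?, ?)",
        [
            ("Data Analyst", "FT", 60000),
            ("Data Analyst", "CT", 80000),
            ("Data Engineer", "FT", 90000),
            ("Data Scientist", "FT", 100000),
            ("Data Scientist", "FT", 120000),
        ],
    )
    return conn


# The subquery keeps only rows above the overall average salary;
# the window function then ranks those rows within each job title.
QUERY = """
SELECT job_title,
       salary_usd,
       RANK() OVER (PARTITION BY job_title ORDER BY salary_usd DESC) AS rank_in_title
FROM salaries
WHERE salary_usd > (SELECT AVG(salary_usd) FROM salaries)
ORDER BY job_title, rank_in_title
"""


def above_average_ranked(conn):
    """Run the demo query and return its rows as a list of tuples."""
    return conn.execute(QUERY).fetchall()


for row in above_average_ranked(build_demo_db()):
    print(row)
```

The same query should work in PostgreSQL as written, since `RANK() OVER (PARTITION BY ...)` and scalar subqueries are standard SQL.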


9. Classification of a phishing mail

Tools used: Python
Difficulty: Hard

This is one of the toughest projects I have worked on. I built models that classify emails as phishing or non-phishing using email structure, stylometric features, and so on. It took quite some time. I worked on feature extraction, data cleaning, dimensionality reduction, cross-validation, and model building. I explored different evaluation methods too. I haven't pushed this to my GitHub either, but I will soon. I don’t think I can make a post about it though.
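To make "feature extraction" a bit more concrete, here is a rough sketch of turning one email into a row of numbers a model could train on. These particular features and the keyword list are hypothetical examples I picked for illustration, not the feature set from my project, and the sample email is made up.

```python
import re

# Simple pattern for http/https links; real emails need more careful parsing.
URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical list of words that often signal pressure in a subject line.
URGENT_WORDS = ("urgent", "verify", "suspended")


def extract_features(subject: str, body: str) -> dict:
    """Turn one email into a small dict of structural and stylometric features."""
    words = body.split()
    n_chars = len(body)
    return {
        "num_urls": len(URL_PATTERN.findall(body)),
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in body) / n_chars if n_chars else 0.0,
        "num_exclamations": body.count("!"),
        "subject_has_urgent_word": any(w in subject.lower() for w in URGENT_WORDS),
    }


# Made-up example email.
features = extract_features(
    "URGENT: verify your account",
    "Click http://example.com NOW!",
)
print(features)
```

Each email becomes one such dict, and a list of them converts easily into a table for cleaning, dimensionality reduction, and cross-validated model training.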


10. Open Source Contribution

There are still some more projects to talk about, but project number 10 is contributing to open source. I learned unit testing, Git, and so much more through open source. It is something I don't do often because I always have little jobs that keep me occupied, but once I have a full-time job, I will definitely become a regular contributor. There is so much to learn, and open source is one of the fastest ways to learn it.


Connect With Me:
LinkedIn : https://www.linkedin.com/in/durgesh4993/
GitHub : https://github.com/Durgesh4993
Twitter : https://twitter.com/durgesh4993
