It is a truth universally acknowledged that a data scientist in possession of a good portfolio must be in want of a job. A curated selection of your projects is the best way to showcase your work, interests, and thinking to potential employers. But what makes a "good portfolio"? Drawing on my own experience and discussions with colleagues, I have distilled four tips that you can apply to make your data science portfolio stand out from the crowd.
When you started learning data science, you most probably learned to predict Boston house prices, classify Iris flowers or handwritten digits, and wrangle features associated with Titanic survivors. Though these projects are a good start for learning the basic concepts of data science, they have become so common that they no longer impress anyone.
So instead of investing more time in these datasets, pick a new one that genuinely interests you, try different models, and answer questions that you find insightful. I personally focused on projects that reflect my interest in linguistics and NLP. You can likewise explore data related to your experience or to the industry you'd like to work in.
Kaggle is a great source of datasets on all sorts of topics, but if you want to be even more original, you can request access to a research dataset (e.g. JAFFE), export your personal data from an app (e.g. Goodreads), or create your own dataset.
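As a rough illustration of a first look at personal data exported from an app, here is a minimal sketch using only the Python standard library. The column name `My Rating` and the sample rows are assumptions made for the example; check the header of your own export file, since it may differ.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, column):
    """Count how often each non-empty value appears in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader if row.get(column))


# Tiny made-up sample standing in for a real export file (assumed columns).
sample = io.StringIO(
    "Title,Author,My Rating\n"
    "Book A,Author X,5\n"
    "Book B,Author Y,3\n"
    "Book C,Author X,5\n"
)
ratings = count_values(sample, "My Rating")
print(ratings.most_common())
```

With a real export you would pass an opened file, for example `open("my_export.csv", newline="", encoding="utf-8")`, instead of the in-memory sample; the file name here is hypothetical.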
The hard work is done: you've analysed the data, made predictions and maybe some pretty charts, and uploaded your Jupyter notebook to GitHub in a repository with a descriptive name.
You would probably not buy a book or read a paper without checking out its blurb or abstract first, to see if it's interesting and worth your time. Same goes for data science projects: don't expect people to browse through all the files or read the whole Jupyter notebook in your repository – a project portfolio is not a school assignment, it's a pitch! This means you have to give your audience a taste of your work and the value they can gain from it.
This is what the README is for. In this Markdown file, you should include a description of the project (what you did, why, and how), the results (key findings), setup instructions (how users can run the project and which libraries it uses), and at least one image (an insightful chart or a screencast). You should also adapt the writing style and structure to the target audience of your project.
For example, for my research project on psych-verbs I wrote a paper-like README targeted at academics/fellow linguists, whereas for my movie recommender system I wrote an informal short description and included a screencast, aimed at a general audience.
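The README checklist above can be turned into a reusable starting point. The following is a minimal sketch, not a prescribed layout: the section names, placeholder prompts, and the `build_readme` helper are all illustrative assumptions, and you should reshape them for your audience.

```python
from typing import Optional

# Section names mirror the checklist above; the prompt wording is illustrative.
SECTIONS = [
    ("Description", "What the project is, why you did it, and how."),
    ("Results", "Key findings, in a few sentences."),
    ("Setup", "How to run the project and which libraries it uses."),
]


def build_readme(title: str, image_path: Optional[str] = None) -> str:
    """Return a Markdown README skeleton for a project called `title`."""
    lines = [f"# {title}", ""]
    for heading, prompt in SECTIONS:
        lines += [f"## {heading}", "", prompt, ""]
        # Place the chart or screencast next to the findings it illustrates.
        if heading == "Results" and image_path:
            lines += [f"![Key result]({image_path})", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical project title and image path, for demonstration only.
    print(build_readme("Movie recommender", "images/demo.png"))
```

Printing the skeleton rather than writing it to disk avoids overwriting an existing README; redirect the output to a file once you are happy with it.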
The point of a data science portfolio is to give employers an idea of your potential and prove that you can complete a project from idea to presentation, thus displaying both hard skills (coding and analysis) and soft skills (communication and presentation). As in almost all areas of life, I believe it's important to focus on quality over quantity.
It's better to have only two projects that are really well done than dozens of repositories with incomplete projects or code errors. Also, people tend to get stressed and lose interest when they are given too many options (the so-called paradox of choice). That's why it's important to curate your work.
On GitHub, up to six repositories are shown at the top of your profile page. You can either make only your best projects public or pin the ones you want to highlight.
At the end of the day, you are more than your work. You are not only a data scientist, but maybe also a photographer, a former lawyer, a musician, or a DeFi enthusiast. Whatever it is, don't be afraid to show this side of you. All your experiences, interests, and hobbies shape your personality and give you a unique perspective in data science.
On GitHub you can easily personalize your profile by adding a special profile README and a custom status. You could share what you are currently working on, what languages or tools you're learning, and how people can get in touch with you. Here you can let your creativity flow and personality shine!
Creating a good data science portfolio takes time. Don't rush it; take the time to learn and to polish both your coding and your presentation skills. Good luck!