Graduating during quarantine has been a unique experience. As I reflect on the past few years, I'd like to share tips and advice for people interested in data science. I hope they are useful whether you are pursuing an advanced degree or a MOOC in applied data science.
Data science is an evolving field that roughly divides into three subfields:
- Analytics (expect to be well-versed in business jargon and non-technical communication)
- Inference (expect to understand experimentation, causality and statistical modeling)
- Machine learning (expect to know ML/DL algorithms at a deep level. Understanding cloud technologies can help too. You might need to specialize in a field such as computer vision or NLP)
Sometimes a role might entail a combination of these. Other related fields include data engineering, software engineering and data visualization.
Knowing which track interests you most can help. Personalized education is not yet a reality, so you might want to DIY a curriculum. Many programs offer independent studies, where you decide on a project and a lecturer provides guidance. Another setting I liked is the inverted classroom, where students learn from each other and the lecturer plays a facilitator role.
Getting into a lab or collaborating with a research team can help you develop domain knowledge even if your goal is not research. By default, most of your time will likely go to applied or practical problems, so consider setting aside some time for research too.
There will be no shortage of things to learn, only a shortage of time. Given limited energy, consider balancing these two approaches to achieve both breadth and depth:
- exploitation (focus on what you are already good at)
- exploration (try a bit of everything else)
Sharing concepts or projects helps in several ways:
- it reinforces your learning and helps you identify weaknesses.
- your learning journey, thought process, trials and tribulations are helpful to others. You do not have to be the world's top expert before you share what you have learned or offer advice. Your experience can be valuable to those who have not had it yet, and vice versa.
- at work you will notice that verbal, visual and written communication skills matter as much as, if not more than, your Python/R skills. It is also important to communicate impact concisely.
Try to explain concepts as simply as possible. There is no shortage of technical papers, but few explain complex concepts as if to a 4-year-old. This is probably why YouTube channels like 3Blue1Brown, Two Minute Papers and StatQuest are popular. Unless your goal is academic publishing, you might want to make your work more accessible.
After I learned more about design and web development, I made this table of recommended tools for building a portfolio, depending on whether or not you have design or web dev experience.
Before I chose my first elective, my advisor asked whether I preferred to ease into or ease out of the workload. Two ways of facing challenges and getting unstuck are:
- Decrescendos: if you habitually solve hard challenges, someday you will find everything easy. This applies if you are moving from the hard sciences to other fields.
- Crescendos: if the topic at hand is too challenging, find a simpler explanation first. After establishing some foundation, raise the level. This applies in more scenarios.
Learning is a lifelong journey, but life is more than learning. So it is important to find a balance and to seek help early when you feel stressed. You can plan different pathways to peace in advance and deploy them when stress levels go through the roof, because that will happen.
If you have a local or online community, look for ways to help organize or get involved. If not, you can find 2-3 like-minded friends to explore topics of interest and give each other feedback.
Going to conferences allows you to immerse yourself in different topics and talk to people with different perspectives. Many conferences offer student volunteering opportunities, which come with a free ticket.
Each conference has its own persona. Last year a lecturer recommended that I attend the Eyeo Festival for data visualization. It has a great sense of community. I then went to the Strata Data Conference in NYC, which has a more commercial feel. This year I volunteered at the data science conference ODSC East, where most speakers are PhDs working in industry, and quite a few are involved in open source and open tech. I've also seen more academically minded graduate students presenting at conferences like ICML.
For introverts, online conferences might be easier to handle, and organizers are working to make the online experience more immersive.
There is a natural tendency to procrastinate, and strategic procrastination might even be conducive to creativity. But the field of data science moves very fast, and there is an influx of people from other fields. There might be PR and economic reasons for this, but I also like to think it is partly because there is real value in leveraging data to solve organizational problems, whether commercial or social. Taking stock of the competition, you may want to start early in whatever you are doing.
In grad school, study is largely self-directed. As such, you need to be disciplined in owning your progress, ideas and unique contributions. This might be particularly relevant in the post-pandemic virtual "new normal".
- Clockify: a browser plug-in for tracking time
- Evernote: for note-taking
There are often humans behind the numbers and algorithms. It is good to be aware of biases, privacy concerns and the consequences of your work.
Some recommended books (which I have yet to read):
- Weapons of Math Destruction
- Invisible Women: Data Bias in a World Designed for Men
Cover image credit: Photo by Matt Howard on Unsplash