It is the year 2025, and everybody and their grandma has asked ChatGPT about the meaning of life. While we cannot be sure whether the answer was hallucinated, we do know that LLMs are developed using Python. Data scientists today are expected to work with AI/ML models and therefore Python (see below), effectively settling the age-old "Python vs. R" debate.
Package and environment manager
To keep your projects tidy and make your code reproducible on any machine, you need to keep track of the version numbers of your project's dependencies. Package and environment managers help you with that. There have been many package managers (conda, pip, etc.) and perhaps even more virtual environment managers (virtualenv, pyenv, pipenv, etc.). The general consensus nowadays is to just use uv, as it combines both functions and is faster than the alternatives.
Development environment
Jupyter notebooks are great to get started: easy to set up and run interactively (cell by cell). However, in the real world you will be expected to ship code to production as scripts and apps, not notebooks.
You could copy-paste code from a Jupyter notebook into a text editor, but there's a more convenient way: integrated development environments (IDEs) like VS Code and Cursor. Not only do they combine a file explorer, text editor and terminal in one application, they also offer many extensions that make your life easier, such as code formatters and linters. Plus, you don't need to give up Jupyter notebooks: you can create and run them inside VS Code/Cursor. Lastly, they let you take advantage of AI features like tab/auto completions, making you even more productive.
How to get started with VS Code and uv
- Download and install VS Code: https://code.visualstudio.com/Download
- Install uv by executing the following command in your VS Code terminal:
- (Linux/MacOS)
curl -LsSf https://astral.sh/uv/install.sh | sh
- (Windows)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Install Python:
uv python install
- Install a package into the current environment:
uv pip install pandas
- Create a project:
uv init your-project-name
- Add a dependency to your project (recorded in pyproject.toml and the lock file):
uv add pandas
- Create a script: example.py
print('Hello world!')
- Run a script:
uv run example.py
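After `uv init your-project-name` and `uv add pandas`, uv records the dependency in the project's `pyproject.toml` (exact pins live in the generated `uv.lock`). A minimal sketch of what that file might look like; the project name and version numbers here are illustrative:

```toml
[project]
name = "your-project-name"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "pandas>=2.2",
]
```

Committing both `pyproject.toml` and `uv.lock` to version control is what makes the environment reproducible on any machine.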
Skills
I have analyzed data science job postings from frontier AI labs, like OpenAI, to identify the skills that are most likely to be future-proof.
Programming languages
Python and SQL are listed as required qualifications in all listings. R was not mentioned even once.
General capabilities
- Design statistical experiments
- Conduct A/B tests
- Define and operationalize metrics
- Visualize results, dashboarding
- Communicate with stakeholders
- Prototyping
- Run simulations
- Version control (git)
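Several of these capabilities overlap in practice: an A/B test can be analyzed by running a simulation in plain Python. The sketch below is a hypothetical example with made-up conversion data (the 30% and 42% rates are invented for illustration); it uses a permutation test, resampling the pooled data to estimate how surprising the observed lift is:

```python
import random
import statistics

def permutation_test(a, b, n_iter=5000, seed=42):
    """Estimate a two-sided p-value for the difference in means
    between groups a and b by reshuffling the pooled data."""
    rng = random.Random(seed)
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Split the shuffled pool back into two groups of the original sizes
        diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_iter

# Hypothetical A/B data: 1 = converted, 0 = did not convert
control = [1] * 30 + [0] * 70  # 30% conversion rate
variant = [1] * 42 + [0] * 58  # 42% conversion rate

lift = statistics.mean(variant) - statistics.mean(control)
p = permutation_test(control, variant)
print(f"Observed lift: {lift:.2f}")
print(f"Permutation p-value: {p:.3f}")
```

The same resampling idea scales to metrics beyond conversion rates, which is why simulation shows up alongside experiment design in these job listings.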
Frameworks, modules and tools
A list of popular, though not always required, tools:
- Pandas, NumPy, scikit-learn, Flask
- Seaborn/Matplotlib, Tableau/Power BI
- GitHub
Conclusion
AI will certainly change how data scientists work going forward. However, I believe that LLMs will not replace them. Instead, there will be a growing need for capable data scientists who can uncover the failure modes of today's AIs and design better systems.