Steven here again.
Where to start... It was a busy, busy week. Yes, I know, I used the word 'busy' twice. Last week I found an amazing project, BEHAVIOR-1K, and it took me hours to understand the code related to the issue. So this time I wanted to find a more easy-going issue to contribute to. As I mentioned before, I have always been interested in Machine Learning, so I browsed through the pandas, numpy and scikit-learn repositories. They are three famous libraries used by Machine Learning projects. I thought it would be a good chance to leave my mark on a project of this size.
Finding a "Good First Issue" in Scikit-learn
After the tough first experience, I knew I needed a different approach. Since I’ve always been interested in ML, I decided to choose the Scikit-learn repository. I figured, why not try to contribute to a tool I actually use? I went directly to their Issues tab on GitHub and started looking for labels like good first issue or help wanted. It felt way more focused, and that’s when I found it.
The issue
The issue was quite simple: change relative import paths to absolute import paths in some of their Cython files. Of the two lines below, the first is the relative form and the second is the absolute form:
from ...utils._typedefs cimport ...
from sklearn.utils._typedefs cimport ...
The code change itself was just replacing a relative path with an absolute one. Even though it was a simple task, I had to go through the project files to find where the functions or types lived, to make sure I wrote the correct absolute path. It may seem simple, but I learned an interesting new technique through this process.
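To double-check an absolute path like this, here is a minimal Python sketch of how the leading dots resolve. It assumes Cython's relative cimport follows the same leading-dot rule as Python's relative imports, and it uses the standard library's `importlib.util.resolve_name`. The package name `sklearn.subpkg.inner` and the helper `to_absolute` are hypothetical, made up for illustration; they are not the actual location from the issue.

```python
from importlib.util import resolve_name

# Hypothetical package containing the .pyx file being edited.
# This name is an assumption for illustration, not the real file from the issue.
CONTAINING_PACKAGE = "sklearn.subpkg.inner"


def to_absolute(relative_module: str, package: str) -> str:
    """Resolve a dotted relative module name against the containing package.

    One leading dot means "this package", two dots mean its parent, and so on.
    """
    return resolve_name(relative_module, package)


absolute = to_absolute("...utils._typedefs", CONTAINING_PACKAGE)

# Prints: from sklearn.utils._typedefs cimport float64_t, float32_t, intp_t
print(f"from {absolute} cimport float64_t, float32_t, intp_t")
```

Three dots climb two levels up from `sklearn.subpkg.inner`, which lands on `sklearn`, so the relative path and the absolute path point at the same module.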
Cython???
Okay, so the task was just changing an import path. Simple, right? But I did learn a new technique that can be useful in the future.
The files I was editing weren't standard .py files; they were .pyx and .pxd files. This was my first real encounter with Cython, which is a programming language used to give Python C-level speed. It’s one of the secret sauces that makes libraries like Scikit-learn so fast.
from ...utils._typedefs cimport float64_t, float32_t, intp_t
At first, I had no clear idea what cimport or float64_t meant. I learned that cimport is a special Cython statement for importing C-level definitions. And things like float64_t are basically C-style "nicknames" for NumPy data types.
By using these C-style types, Cython knows to treat them as simple, super-fast C variables instead of slower Python objects. So, while I was just fixing a file path, I accidentally learned a fundamental concept about how Python libraries are optimized for performance.
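To get a feel for the difference between a raw C value and a Python object, here is a small sketch using only the standard library. The mapping from the Cython aliases to C types (`float64_t` to `double`, `float32_t` to `float`, `intp_t` to a signed pointer-sized integer) is my assumption about what these names stand for, not something copied from the scikit-learn source, and `C_EQUIVALENTS` and `c_size` are names I made up for this example.

```python
import ctypes
import sys

# Assumed C equivalents of the Cython aliases (illustrative, not from sklearn's source).
C_EQUIVALENTS = {
    "float64_t": ctypes.c_double,   # 64-bit floating point
    "float32_t": ctypes.c_float,    # 32-bit floating point
    "intp_t": ctypes.c_ssize_t,     # signed integer used for sizes and indices
}


def c_size(alias: str) -> int:
    """Return how many bytes the assumed C type behind an alias occupies."""
    return ctypes.sizeof(C_EQUIVALENTS[alias])


for alias in C_EQUIVALENTS:
    print(f"{alias}: {c_size(alias)} bytes as a raw C value")

# A Python float is a full object (type pointer, reference count, value),
# so it takes more memory than the bare C double it wraps.
print(f"Python float object: {sys.getsizeof(1.0)} bytes")
```

The size gap is only part of the story, but it hints at why typed C variables are cheaper to work with than Python objects.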
I don't know if you can get this feeling from my blog posts, but I am having lots of fun doing this. I hope everyone else is experiencing the same.