Mistakes to Avoid During Python Library Creation

Creating a Python library can be daunting. If you haven’t built one before, you will almost certainly make some mistakes unless you come across good advice first.

Luckily for you, I have listed some common, irritating issues and technical problems you might encounter, along with how to avoid them.

I made many of these mistakes while building my own library, unaware of how they would affect the overall design and hinder my productivity!

Why do I stress these and list them out? Because your design decisions can have a huge impact on your library. Let’s look at the common issues.

1. Choose a Name

Choosing a name is hard, especially if you are going to open-source the library! Good luck finding an unused repo name on GitHub and an unused package name on PyPI, seriously! I had to rename my entire package because I was a few days late in claiming the name.

2. Set Up a Git Workflow Early

Good luck trying to manage a new feature or fix a bug without Git set up. It just makes your life easier. Seriously, if you aren’t using Git, oh boi, I wish you luck; you are going to learn it the hard way.

3. Use Conventional Commits

Conventional Commits can help you identify which commit does what, and they also help when you are using Semantic Versioning.

SemVer is simple tbh: three numbers, x.y.z, one each for MAJOR, MINOR, and PATCH. Any commit using ‘fix:’ should increment z, any ‘feat:’ commit usually increments y, and any breaking change should increment x.

You can read more about both here:

Conventional Commits

Semantic Versioning 2.0.0 | Semantic Versioning (semver.org)
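To make the fix/feat/breaking rule concrete, here is a minimal sketch of how a commit message could map to a version bump. The function name `bump_version` is my own invention for illustration, and real release tools handle many more cases (pre-releases, multiple commits per release, and so on).

```python
import re


def bump_version(version: str, commit_message: str) -> str:
    """Return the next x.y.z version implied by one Conventional Commit message.

    Hypothetical helper for illustration only, not part of any release tool.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    header = commit_message.partition("\n")[0]

    # A "!" before the colon, or a BREAKING CHANGE footer, marks a breaking change.
    breaking = (
        re.match(r"^\w+(\([^)]*\))?!:", header) is not None
        or "BREAKING CHANGE:" in commit_message
    )
    if breaking:
        return f"{major + 1}.0.0"
    if re.match(r"^feat(\([^)]*\))?:", header):
        return f"{major}.{minor + 1}.0"
    if re.match(r"^fix(\([^)]*\))?:", header):
        return f"{major}.{minor}.{patch + 1}"
    # Other types (docs:, chore:, ...) do not trigger a release in this sketch.
    return version
```

For example, `bump_version("1.2.3", "fix: handle empty input")` gives `"1.2.4"`, while a `feat!:` commit would move the same version to `"2.0.0"`.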

4. Choose a Design Pattern

Choose a design pattern early and save yourself the time of converting modular code to OOP patterns later.

5. Use Multiple Python Environments

Yes, create multiple Python environments to test the library’s compatibility with different Python versions, be it with conda or virtualenv! A simple extra trick is to try it out on Google Colab, for example!

Managing environments — conda 4.10.3.post40+c1579681 documentation

How to create a virtual environment for python 3.7.0? — Stack Overflow
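When you hop between environments, it helps to know exactly which interpreter you are running. The sketch below uses only the standard library; the `MIN_SUPPORTED` value and both function names are assumptions made up for this example, so swap in whatever your library actually targets.

```python
import platform
import sys

# Assumption for illustration: the oldest Python version the library targets.
MIN_SUPPORTED = (3, 7)


def describe_environment() -> dict:
    """Summarize the interpreter that is running this code."""
    return {
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        "implementation": platform.python_implementation(),  # e.g. CPython, PyPy
        # True inside a venv (or a recent virtualenv); conda envs are not detected this way.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


def is_supported(version_info=sys.version_info) -> bool:
    """Check a (major, minor, ...) version tuple against MIN_SUPPORTED."""
    return tuple(version_info[:2]) >= MIN_SUPPORTED
```

Printing `describe_environment()` at the top of a Colab notebook or a test run is a cheap way to confirm you are testing the version you think you are.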

6. Use Pre-Commits

I wish I had known about Git hooks earlier. They can save so much time by formatting code, checking commit messages, enforcing proper docstrings, and sorting imports. It’s just grunt work, and it can be automated.

I highly recommend checking out this Medium post by Khuyen Tran for more on how to set up pre-commit.

4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | by Khuyen Tran | Towards Data Science

7. Use NumPy/Jax.NumPy/torch

Using a list is a bad choice for numerical work. NumPy is a lot faster than plain Python lists because it is implemented in C and uses BLAS and other libraries under the hood. It also makes matrix operations much easier; performing them with lists is inefficient.
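As a rough illustration, here is the same matrix-vector product written with nested lists and with NumPy. The function names are made up for this example, and it assumes NumPy is installed. I am not quoting timings here, since they depend on your machine and array sizes.

```python
import numpy as np


def matvec_with_lists(matrix, vector):
    """Matrix-vector product using plain Python lists (one Python-level loop per row)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matvec_with_numpy(matrix, vector):
    """The same product with NumPy; the looping happens in compiled code."""
    return np.asarray(matrix) @ np.asarray(vector)


matrix = [[1.0, 2.0], [3.0, 4.0]]
vector = [1.0, 1.0]

print(matvec_with_lists(matrix, vector))
print(matvec_with_numpy(matrix, vector))
```

Both calls compute the same values, but the NumPy version is a single expression and keeps working unchanged when the inputs grow to thousands of rows.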

8. Decide Soon Whether to Support Cython/PyPy/Numba

Although PyPy supports Python out of the box, it only supports 3.7.1 at the time of writing, and some features of the standard Python libraries may be missing. Numba, in turn, will force you to adopt a certain coding style built around its decorators, and other issues may pop up from using @jit and @njit.

Upon further testing, the Numba decorators didn’t speed up my code; however, that also comes down to my design decisions.

Cython: C-Extensions for Python

Numba: A High-Performance Python Compiler (pydata.org)

9. Avoid Multi-threading Due to GIL

10. Center the entire design pattern and mathematical library around the crucial decision of whether you want to support multiprocessing! Your code structure, including new features, should adapt well to this pattern.
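Here is a minimal sketch of what "adapting to multiprocessing" can look like in practice: keep the CPU-heavy work in a plain top-level function that takes its inputs as arguments, so it can be shipped to worker processes. The names `evaluate_candidate` and `evaluate_all` are hypothetical, and the cost function is just a stand-in.

```python
from concurrent.futures import ProcessPoolExecutor


def evaluate_candidate(candidate):
    """Hypothetical CPU-bound cost function (stand-in: sum of squares).

    Defined at module level so worker processes can import and call it.
    """
    return sum(x * x for x in candidate)


def evaluate_all(candidates, max_workers=2):
    """Evaluate every candidate in separate processes, preserving input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate_candidate, candidates))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(evaluate_all([[1, 2], [3, 4], [5, 6]]))
```

The design point is that `evaluate_candidate` depends only on its argument. If it reached into shared global state or a big object, each new feature would have to be reworked before it could run in parallel.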

11. Use gc or del to Delete Unwanted Objects
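A small sketch of what this can look like, using a made-up `Population` class as the large object. The weak reference is only there so we can observe that the object is really gone; it is not something you need in normal code.

```python
import gc
import weakref


class Population:
    """Hypothetical large object, e.g. candidate solutions held in memory."""

    def __init__(self, size):
        self.members = [[0.0] * 10 for _ in range(size)]


population = Population(1000)
tracker = weakref.ref(population)  # lets us observe when the object is freed

del population  # removes the name, dropping the last strong reference
gc.collect()    # forces a collection pass, which also clears reference cycles

print(tracker() is None)
```

Note that `del` only removes a name. The memory is released once no other references to the object remain, so a stray reference held elsewhere will keep it alive.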

12. Instantiate the self.parameter

Carefully instantiating these parameters prevents your locals from clashing with globals (namespace collisions), so you can be assured there is no weird behavior.
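A minimal sketch of the idea, with a hypothetical `HillClimber` class: every setting is stored on `self` in `__init__`, so the methods never accidentally pick up a module-level variable that happens to share the name.

```python
step_size = 0.5  # module-level global that other code might define or change


class HillClimber:
    """Hypothetical optimizer used only to illustrate instance parameters."""

    def __init__(self, step_size=0.1, max_iterations=100):
        # Store every setting on the instance so methods read self.*, not globals.
        self.step_size = step_size
        self.max_iterations = max_iterations

    def next_position(self, position):
        # Uses the instance's own step size; the global above is ignored.
        return position + self.step_size


climber = HillClimber(step_size=0.25)
print(climber.next_position(1.0))
```

If `next_position` used the bare name `step_size` instead of `self.step_size`, it would silently read the global `0.5`, which is exactly the kind of weird behavior this habit avoids.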

13. Write Unit Tests (unittest or pytest)

Again, this can save a lot of time: you just run the tests and are assured your code works.

Also check each return value yourself, despite the automated tests. Before you can define a test, you need to identify the possible sources of bugs, which may only be possible by running multiple tests on different values or by checking manually. You can do this by writing an examples notebook on Colab and testing different things out!
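For a feel of how little code this takes, here is a sketch of two pytest-style tests for a made-up `normalize` function. pytest collects any function whose name starts with `test_` and reports a failure when an `assert` fails, so the file below needs no imports.

```python
def normalize(values):
    """Hypothetical library function: scale values so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Check the actual return value, not just that the call succeeds.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_normalize_rejects_zero_total():
    # An edge case found by trying different inputs by hand.
    try:
        normalize([0, 0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero total")
```

With pytest installed, the second test is usually written with the `pytest.raises` context manager instead of the manual try/except; I kept it dependency-free here.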

14. Polish Current Features instead of Obsessing over New Features

I’m guilty of this and am probably still learning the lesson as we speak. Adding features isn’t bad, but there needs to be a fine balance; you shouldn’t get carried away!

Do not obsess over adding new features when your current features aren’t polished enough; new features can aggravate your already existing issues and bugs. Good luck trying to solve the same bugs across all the features you implemented.

And that’s it. Thank you for reading.

Ciao!

Stalk me on Twitter! 👀

Agrover112 (@agrover112) / Twitter

or see my garbage code on GitHub 💻:

Agrover112 (github.com)

The package I created that I talk about:

Agrover112/fliscopt: Algorithms for flight scheduling optimization. (github.com)