I made a follow-up package, scikit-multilearn-ng, to the widely used scikit-multilearn package for multilabel classification

northern-64bit — Sun, 28 Jan 2024 15:18:09 +0000

After using scikit-multilearn and running into errors, I opened a PR and waited. When I double-checked, I saw that there hadn't been any commits in 7 months (now 9 months) and that there had not been a release since 2018. I dug in and found out that no one had access to the PyPI credentials, among other problems. So I opened a discussion about creating a fork, and many were eager for it.

So after some development, I'm here to introduce scikit-multilearn-ng (GitHub: https://github.com/scikit-multilearn-ng/scikit-multilearn-ng), an advanced, open-source tool for multi-label classification in Python. It's a direct successor to scikit-multilearn and brings a host of improvements and new features.

What Makes scikit-multilearn-ng Stand Out?

Enhanced Integration with scikit-learn: This package not only integrates with the scikit-learn ecosystem but also extends its capabilities, making it a natural fit for those familiar with scikit-learn.
Expanded Algorithm Collection: Among its new offerings are StructuredGridSearchCV and the SMiLE algorithm, specifically designed for more complex multi-label classification tasks, including handling missing labels and heterogeneous features.
Open Source Philosophy: As a community-driven project, it's free to use and open for contributions, perfect for collaborative development.

Why Should You Consider Upgrading?

Ease of Transition: For those already using scikit-multilearn, upgrading is as simple as switching the dependency to scikit-multilearn-ng. Your existing code will work without any changes.
Active Development and Support: scikit-multilearn-ng offers bug fixes and new features, ensuring your projects stay current and robust.

Whether you're a seasoned Python developer or just starting out in machine learning, scikit-multilearn-ng is worth exploring.

Some Example Use Cases:

A simple example use case is iteratively splitting multi-label data into training and test sets while trying to maintain the distribution of each label across both. This is particularly useful for datasets where certain label combinations are rare.

from skmultilearn.model_selection import iterative_train_test_split
import numpy as np

# Assuming X is your feature matrix and y is your label matrix
# X should be a numpy array or a sparse matrix
# y should be a binary indicator matrix (each label is either 0 or 1)

# Define the size of your test set
test_size = 0.2

# Perform the split
# Note the return order: X_train, y_train, X_test, y_test
X_train, y_train, X_test, y_test = iterative_train_test_split(X, y, test_size=test_size)

# For dense NumPy inputs the label matrices should already be 2-D;
# the reshape below is a defensive step in case an output comes back flattened
num_labels = y.shape[1]
y_train = y_train.reshape(-1, num_labels)
y_test = y_test.reshape(-1, num_labels)
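
To check how well a split preserved the label distribution, you could compare the per-label frequencies of the two label matrices. The snippet below is only a sketch in plain Python: `label_frequencies` and `max_frequency_gap` are helper names made up for this illustration (they are not part of scikit-multilearn-ng), and the demo matrices are toy data. It assumes dense, row-iterable label matrices, so it won't work on sparse ones as written.

```python
def label_frequencies(y):
    """Return, for each label column, the fraction of samples that carry it."""
    rows = [list(row) for row in y]
    if not rows:
        return []
    n_samples = len(rows)
    # zip(*rows) walks the matrix column by column
    return [sum(column) / n_samples for column in zip(*rows)]


def max_frequency_gap(y_a, y_b):
    """Largest absolute difference in per-label frequency between two sets."""
    freq_a = label_frequencies(y_a)
    freq_b = label_frequencies(y_b)
    return max(abs(a - b) for a, b in zip(freq_a, freq_b))


# Toy label matrices (hypothetical data, two labels)
y_train_demo = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_test_demo = [[1, 0], [1, 1]]

train_freq = label_frequencies(y_train_demo)
test_freq = label_frequencies(y_test_demo)
gap = max_frequency_gap(y_train_demo, y_test_demo)
```

A small gap suggests the split kept the labels balanced. In this toy data the first label is over-represented in the test set, which is the kind of imbalance the iterative split tries to avoid.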

It also supports problem transformations that reduce a multi-label problem to single-label problems:

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC

# Initialize and train
classifier = BinaryRelevance(classifier=SVC(), require_dense=[False, True])
classifier.fit(X_train, y_train)

# Predict
predictions = classifier.predict(X_test)
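
For intuition, binary relevance trains one independent binary classifier per label column and then stacks the per-label predictions back into a label matrix. Here is a toy, dependency-free sketch of that idea. `ToyBinaryRelevance` and `NearestNeighbour` are hypothetical names invented for this illustration. This is not the library's implementation, and the nearest-neighbour classifier is just a stand-in for something like SVC.

```python
class NearestNeighbour:
    """Toy single-label classifier: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Squared Euclidean distance to every training row
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.X_
            ]
            predictions.append(self.y_[distances.index(min(distances))])
        return predictions


class ToyBinaryRelevance:
    """Fits one fresh binary classifier per label column (illustration only)."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, X, Y):
        n_labels = len(Y[0])
        self.classifiers_ = [
            self.make_classifier().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, X):
        # One prediction column per label, transposed back into rows
        columns = [clf.predict(X) for clf in self.classifiers_]
        return [list(row) for row in zip(*columns)]


# Toy data (hypothetical): label j is simply feature j rounded
X_demo = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_demo = [[0, 0], [0, 1], [1, 0], [1, 1]]

model = ToyBinaryRelevance(NearestNeighbour).fit(X_demo, Y_demo)
demo_predictions = model.predict([[0.9, 0.1], [0.1, 0.8]])
```

Because each label gets its own classifier, this approach ignores correlations between labels, which is the usual trade-off for its simplicity.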

Please contribute and star the project!

I'm looking forward to your feedback, questions, and how you might use it in your projects!

Aspiring 16 year old quant developer contributing to Open Source Application

northern-64bit — Fri, 24 Dec 2021 19:26:40 +0000

You may wonder who I am after reading the title. I’m a 16-year-old high school student with the dream of becoming a quant developer. The role appeals to me because it comes with many flexible tasks, uses math and finance, which I find very interesting, and lets me apply my coding skills.

I have been developing computer programs since I was 11 years old. The first programming language I learned was HTML, if you even dare to call it one. Soon after, I learned JavaScript, Python and then C. Python is by far my favourite programming language, since it’s easy to use, clear and has many powerful libraries.

I started to make my own programs and discovered that I could use Python libraries to speed up development. Through them I learned about open-source projects, which helped me a lot: reading code from more experienced developers taught me solutions I could use in my own code and improved my coding skills. After a while, I thought it was time to publish my own open-source repository: a stocks Discord bot, which is a bot version of one of my GitHub stock programs, to possibly help other developers and traders.

Another thing that I’ve heard is very important is networking. So I thought it would be best to contribute to a project with a large community, a high coding standard (to learn from) and experienced contributors who are ready to help. If I stuck with it, I thought, I might even make some valuable connections with industry professionals.

In August 2021, I sent a message to the Gamestonk Terminal (GST) Discord after discovering the project and seeing their brilliant work. At that time, I was working on my previously mentioned open-source Discord bot and was motivated to continue with it. However, I immediately changed my mind after seeing all the features of GST and its contributors’ work.

The repository was nearly a match made in heaven: it was written in Python, had multiple pull requests merged every day, and maintained a high code standard thanks to experienced developers. In addition, it’s the best financial open-source project on GitHub (at least according to me).

Based on my experience with Discord bots, I got the idea of implementing one specifically for GST. The more I thought about it, the better the idea seemed. The reasons were that a bot can be widely distributed to phones, since it’s so easy to use via the Discord chat that even non-tech-savvy users can use it, and that a bot makes it easier to get fresh and easily shareable data for your investment conversations.
I knew that by adding the bot to GST, the project would grow and improve, since more users would get to know it. So I asked in the repository’s Discord server if I could help and possibly make a Discord bot to make GST more widely distributed and usable on the phone.

To my surprise, the response was very positive and I started to develop it right away. In the beginning I was a bit lost, since I had never thought about code architecture (most of my other applications were 1000+ lines in one huge file) and I had been given responsibility for the whole project. These troubles were quickly resolved after some calls with the creator of GST, Didier R. Lopes, who really helped me; I learned a lot about making a robust, structured and easily understandable application.

From then on it was just adding feature after feature, along with improvements from many other GST contributors. This was the case until I ran into several difficult challenges. The first was that we wanted to implement a menu like the one the terminal has. This was solved by adding the reactions 0, 1, 2, 3, ..., 9 to the message so the user could select a command through a reaction. The next challenge came when a menu had more than 10 commands, which I solved by implementing “pagination”: a sort of scrolling system via buttons that formats the message like a book. However, this introduced a bug in the emoji detection system, because both it and the “pagination” ran as loops. I therefore started to experiment with multithreading, which I knew nothing about. But after some time, I managed to merge the code from the two loops into one loop.
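
To make the menu idea more concrete, here is a rough sketch of paginating commands into pages of ten and handling both digit selections and page turns in a single handler, so that only one loop is needed. It is plain Python with no Discord calls. The names `paginate` and `handle_event`, the `"next"`/`"prev"` event strings, and the demo command names are all assumptions made up for this illustration, and this is not the bot’s actual code.

```python
def paginate(commands, page_size=10):
    """Split the command list into pages of at most page_size entries."""
    return [commands[i:i + page_size] for i in range(0, len(commands), page_size)]


def handle_event(pages, page_index, event):
    """Handle one user event and return (new_page_index, selected_command).

    Page turns and digit selections go through the same function, so a single
    event loop can serve both the pagination buttons and the digit reactions.
    """
    if event == "next":
        return min(page_index + 1, len(pages) - 1), None
    if event == "prev":
        return max(page_index - 1, 0), None
    if isinstance(event, int) and 0 <= event < len(pages[page_index]):
        return page_index, pages[page_index][event]
    return page_index, None  # unknown event or digit outside the page: ignore


# Hypothetical menu with 23 commands, giving pages of 10, 10 and 3
demo_commands = [f"cmd{i}" for i in range(23)]
pages = paginate(demo_commands)

page_index, selected = 0, None
for event in ["next", 3]:  # turn one page, then pick entry 3 on that page
    page_index, selected = handle_event(pages, page_index, event)
```

In a real bot the events would come from Discord reactions and button presses rather than a hard-coded list, but the control flow could stay the same.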

Overall, the development process was exciting and a great learning experience that I wish every other young developer could have. My two cents: write real, useful code with simplicity and understandability in mind. It improves your coding more than LeetCode and super-theoretical programs that other programmers can never contribute to.

The hard part (or rather the time-consuming part) is understanding other people’s code well enough to contribute to it in a meaningful way, so I needed to learn to use many other libraries. I don’t say this to discourage you: it is how you get real hands-on experience with the libraries. It’s also more meaningful to learn new libraries and functions when the learning has a purpose.

The Discord bot has a multitude of functions from the terminal and is easy to set up & host so that it’s easy to use the terminal on any device and to share it with other people. It’s also awesome to use it to show the underlying data of your investing thesis quickly to your friends in your own Discord server.

Here’s a link to it:
https://github.com/GamestonkTerminal/GamestonkTerminal/tree/main/discordbot

Currently I’m looking forward to improving the bot and continuing to work with the GST team. My long-term goal is to become a quant (quantitative analyst/researcher/developer), but there’s a long way to go since I’m currently only in high school. In the meantime, I’m ready to contribute to other finance open-source applications at any time, so please contact me on GitHub: https://github.com/northern-64bit