Context
In my quest to showcase my coding prowess in both Python and CUDA, and to prove my adeptness at seamlessly navigating diverse codebases, I sought out an opportunity for contribution. The natural choice was PyTorch, a prominent open-source machine learning library. What better way to demonstrate my versatility than by engaging with a project at the forefront of the deep learning landscape? PyTorch presented the perfect challenge, requiring a blend of Python expertise and an understanding of CUDA for efficient GPU acceleration. By diving into this vibrant community, I aimed to contribute substantively and underscore my ability to transition seamlessly between programming languages and tackle intricate, large-scale codebases. It's a journey of skill validation and a testament to my commitment to mastering the intricacies of cutting-edge technologies. 🚀🐍🔥
Finding an issue
My initial foray into contributing to PyTorch involved the crucial step of identifying a suitable issue to tackle. Navigating through the repository, I honed in on issues tagged with "good first issue" labels, recognizing them as ideal entry points for newcomers like myself. These labels serve as beacons for code changes that are not only manageable but also self-contained, providing a perfect starting ground for someone newly acquainted with the project. In this way, I aimed to ensure that my initial contribution would be a positive and constructive addition to the PyTorch codebase. The "good first issue" label became my compass, guiding me toward an opportunity to make a meaningful impact while acclimating to the intricacies of the PyTorch ecosystem. 🌱👩💻🔍
I chose this issue: https://github.com/pytorch/pytorch/issues/113198
It was really easy to replicate the error, and there was already a minor discussion about the starting point and some strategies.
The bug was because, for CUDA tensors, a user couldn't call a pow
function using Tensor and Boolean as arguments. It was generating an error message about that.
In [52]: torch.pow(torch.tensor([1, 2]), True)
Out[52]: tensor([1, 2])
In [53]: torch.pow(torch.tensor([1, 2], device='cuda'), True)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[53], line 1
----> 1 torch.pow(torch.tensor([1, 2], device='cuda'), True)
RuntimeError: false INTERNAL ASSERT FAILED at "/home/isuruf/git/pytorch/aten/src/ATen/native/cuda/PowKernel.cu":199, please report a bug to PyTorch. invalid combination of type in Pow function, common dtype:Longexp is integral?0
The maintainer wanted just to disable this possibility because sending a boolean as an exponent is kind of strange, to say the least.
However, other people with the same issues were discussing that they were expecting the same already existing behaviour considering CPU tensors. Also removing this possibility could cause some compatibility breaks.
Reading through Python's official documentation, I discovered that booleans in Python are a subtype of integers, so it actually made sense that someone would try to use them as an integer exponent.
The last thing to convince me that this would be a good idea was:
x=True
x+x
2
So I decided to write code to support a boolean as an argument, even if the maintainer wanted the opposite. As it would be a minor change, I didn't mind writing the code to bring this discussion. Sometimes, I think it's better to start a discussion with a working code to showcase the solution and its implications. It worked out in the end! 😉
Starting to work
Compiling PyTorch
So, kicking off any software gig involves just building the darn thing as is; no tweaks are allowed. 🏗️ Sounds easy, right? Well, hold your horses! It's like trying to tame a wild beast because software projects are this crazy maze of environments and dependencies. 🤯 Getting that baseline set without messing around is a bit like laying the groundwork for an epic adventure. You gotta grasp all the nitty-gritty stuff before diving into the cool customization part. This seemingly basic step is actually a secret weapon—it’s the key to tackling all the twists and turns in the software jungle. 🌐✨
Compilation instructions
https://github.com/pytorch/pytorch#from-source
I won't repeat their instructions here because that is subject to change, and then whatever I write here will become obsolete.
Just remember, use conda
because it will definitely make your life easier.
When you finally type python setup.py develop
, be prepared to wait for some hours if you're using your personal computer. Unless you have a really high-end machine, that is going to take a while. Cross your fingers that you have more luck than me and that everything goes great so you don't have to start all over again.
Problem with GCC13
My First problem was that it wasn't working for gcc13 for some CUDA code. So, I had to install gcc12 and set CC and CXX for gcc12.
I don't remember exactly what it was at the time, but here's a thread from May 2023 that it's on my web history. I wonder if my errors were the same...who knows? :)
https://forums.developer.nvidia.com/t/strange-errors-after-system-gcc-upgraded-to-13-1-1/252441/9
Anyway, that was just a matter of installing GCC12 and setting CC and CXX accordingly:
yay -S gcc-12
export CC=gcc-12
export CXX=g++-12
And then compilation went ahead.
Build was failing for tests
I also had to set BUILD_TESTS=OFF
because tests were not compilating right.
Below is the file where I found all those flags and its description. I highly recommend always looking because you can always find something useful. Sometimes, you just want to disable something so you can save some compilation time.
https://github.com/pytorch/pytorch/blob/main/setup.py
GLIBC problem
I was having this error: libstdc++.so.6: version GLIBCXX_3.4.32
Then I had to update my glibc with Conda. I think this might be related to my default GCC being GCC13 within my OS.
But to fix that, I did the following:
conda install -c conda-forge libstdcxx-ng
I never thought that conda would have a repository that had generic dependencies, as in packages that are not directly related to Python. That is indeed an amazing tool. And now, more than even, I understand why Python has so good package managers and environment switchers. It's because it only works if everything is meticulously tightened to the same version or everything will fall apart.
Smoke test
After my first compilation success, my smoke test was just opening python
in the PyTorch folder and checking if import torch
was working. As I don't have pytorch somewhere else in my machine, if that line worked, I knew that my first step was done! 💪
Changing code
Embarking on large software projects for the first time can feel like diving into a code labyrinth—daunting and disorienting. Navigating through the intricacies of an expansive codebase without a roadmap can be a real head-scratcher. Having a starting point handed to you is like finding a treasure map; it might not lead you directly to the X, but it sure beats aimlessly wandering through the code wilderness. Imagine trying to trace changes from the main function—it's like a quest with no end in sight! Getting a hint or a guide in the right direction can be a game-changer, turning what seems like an impossible task into a manageable adventure. 🗺️🕵️♂️
So my treasure map was given by a maintainer in the issue's comment section. It was called native_functions.yaml
.
You can think of this file as an API descriptor of Pytorch. All the functions that are exported for the user, on Python or C++, must be described in this YAML file.
More info about its description and format: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native#readme
As I was trying to fix the pow
function, I went into this file looking for pow
, and then I found this block:
- func: pow.Tensor_Scalar(Tensor self, Scalar exponent) -> Tensor
device_check: NoCheck # TensorIterator
structured_delegate: pow.Tensor_Scalar_out
variants: function, method
dispatch:
SparseCPU, SparseCUDA: pow_sparse_scalar
tags: [core, pointwise]
A skilled software developer should embrace rather than fear the challenge of delving into unfamiliar code and definitions. It's a crucial skill to be able to decipher and understand written code, even when it's initially unfamiliar. The ability to glean insights just by glancing at the code is like having a superpower—it allows you to confidently navigate complex systems.
For this particular case, you can see that it has the same arguments as my broken pow
function (Tensor and a Scalar) and returns a Tensor. I have no idea what structured_delegate
means, to be honest, I still don't 😅. But I just figured that this pow.Tensor_Scalar_out
could be the function that I was looking for, and I was godd*mn right
I found this function implementation on cpp/Pow.cpp
:
TORCH_IMPL_FUNC(pow_Tensor_Scalar_out) (const Tensor& base, const Scalar& exp, const Tensor& out) {
if (exp.equal(0.0)) {
out.fill_(1);
} else if (exp.equal(1.0)) {
out.copy_(base);
} else {
pow_tensor_scalar_stub(device_type(), *this, exp);
}
}
After identifying an issue in the code, I needed to implement a fix. The problem was with the CUDA kernel, but upon examining the code, I noticed that for exponents 0 and 1, they didn't go into CUDA because it was trivial. For exponent=0
, the results were always 1, and for exponent=1
, it was the same value. Therefore, I decided to maintain this behavior, as there's always overhead involved in calling the kernel. As native Python treats True
as 1 and False
as 0, I modified the code to the following:
TORCH_IMPL_FUNC(pow_Tensor_Scalar_out) (const Tensor& base, const Scalar& exp, const Tensor& out) {
if (exp.equal(0.0) || exp.equal(false)) {
out.fill_(1);
} else if (exp.equal(1.0) || exp.equal(true) ) {
out.copy_(base);
} else {
pow_tensor_scalar_stub(device_type(), *this, exp);
}
}
After fixing I built my code again and tested if my changes did something recreating the scenario that the ticket creator has tried. Great success, no error message anymore!
Python 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 17:29:18) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.pow(torch.tensor([1, 2], device='cuda'), True)
tensor([1, 2], device='cuda:0')
>>> torch.pow(torch.tensor([1, 2]), True)
tensor([1, 2])
>>> torch.pow(torch.tensor([1, 2]), False)
tensor([1, 1])
>>> torch.pow(torch.tensor([1, 2], device='cuda'), False)
tensor([1, 1], device='cuda:0')
Submitting the PR
Preparation
Before diving into open source contribution, it's imperative for every aspiring contributor to thoroughly read and understand the community guidelines. These guidelines serve as the holy grail, outlining standards and instructions ranging from formatting commit messages to the nuances of crafting a PR. They detail essential information, such as required tests, legal obligations, and the community's code of conduct. Ignoring these guidelines is akin to starting a journey on the wrong foot, especially in the realm of open source where community is paramount. Trust is the bedrock of open source collaboration, and maintainers, often inundated with review requests and issues, rely on contributors to adhere to established norms. Missing a crucial step could lead to your PR being overlooked indefinitely. Therefore, approaching open source with humility, attentiveness, and a commitment to following guidelines is key—read, comprehend, and don't hesitate to seek clarification if needed. Your contributions will be all the more impactful when aligned with the community's expectations. 🌐💡
For Pytorch there's:
- Guide for contributions (of all kinds)
- Code of conduct (Pay attention to your language, i.e. remember it's the MAIN branch, and not the word)
- Contributor License Agreement - (don't sue them)
PR experience
After everything I just went there and created my PR from personal branch. In my humble opinion, just go there, present yourself briefly be as polite and objective as you can.
Follow the standard PR message. For PyTorch you must mention your issue with Fixes XXXX
. Explain your code decisions, show at least that you tested something. And be open-minded and patient, a maintainer will review your code and be prepared to criticism and don't take it personally. It's possible that you think your approach is the best one and you might be right but always respect the maintainer opinion and if he asks to change something even after your argumentation, drop your PR or make the change. Maintainers have final word for a reason, they have been there for a long time and you won't imagine the different scenarios they have seem before, they probably have a reason to deny some codes and have little to zero time to explain why they will.
Back to the experience I was asked to create a test for my PR. Something that I had some trouble to understand but eventually I did 😅.
PyTorch has a complex test framework that is written because of many scenarios and devices that they need to test. Is related to some annotations about what devices, input, and operations that you would like to test.
Explanation is within this file: https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py
For my case I just had to find the pow
test and add True
and False
as exponents on the test input and everything was done.
As I created my PR there was another person that sent me an e-mail saying that he was working on the same issue asking to work together. I added him as a co-author because it was the right thing to do for his courtesy.
PR approved! 🎉
Check for boolean values as argument on pow function. #114133
Hello everyone! 😄 Also @lezcano , nice to meet you! :)
Sorry if I miss anything, this is my first time around here. 🙃
This PR basically makes the same behaviour for cuda when using torch.pow
. Basically Python considers True as 1 and False as 0. I just added this check into pow
function. From what I understood, when I do .equal
for Scalar
that is boolean, I'm sure that types match so that won't cause more trouble.
I know that the issue suggest to disable this case but that could be a little more complicated, in my humble opinion. And that can create some compability problems too, I guess.
My argument is that code below is correct for native language, so I guess it does makes sense sending booleans as Scalar.
$ x = True
$ x + x
2
This was my first test:
Python 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 17:29:18) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.pow(torch.tensor([1, 2], device='cuda'), True)
tensor([1, 2], device='cuda:0')
>>> torch.pow(torch.tensor([1, 2]), True)
tensor([1, 2])
>>> torch.pow(torch.tensor([1, 2]), False)
tensor([1, 1])
>>> torch.pow(torch.tensor([1, 2], device='cuda'), False)
tensor([1, 1], device='cuda:0')
I've run test_torch.py
and got following results, so my guess is that I didn't break anything. I was just looking for a test that uses linear regression, as suggested.
Ran 1619 tests in 52.363s
OK (skipped=111)
[TORCH_VITAL] Dataloader.enabled True
[TORCH_VITAL] Dataloader.basic_unit_test TEST_VALUE_STRING
[TORCH_VITAL] CUDA.used true
(I can paste whole log, if necessary)
If this is a bad idea overall, dont worry about it. It's not a big deal, it's actually a two line change 😅 so can we talk of how do things in a different strategy.
For the record I've signed the agreement already. And I didn't run linter because it's not working 😞 . Looks like PyYaml 6.0 is broken and there's a 6.0.1 fix already but I have no idea how to update that 😅
Fixes #113198
Top comments (0)