DEV Community

Ronald R


Progress report 0.4

This week I started working on the issue. I tried to understand the layers upon layers of code I need to look into.

I ended up with a total of 4 functions I need to fully understand, along with how to make kwargs work with them.

all_gather_into_tensor
all_gather_tensor_inplace
legacy_allgather
all_gather_tensor
traceable_collective_remaps

From my understanding this is how it works.

We have all_gather_into_tensor as the main function. We import it from torch.distributed.distributed_c10d and assign it to legacy_allgather. From there it is picked up by traceable_collective_remaps so it can be rewritten in dynamo, and it is mapped to all_gather_tensor_inplace, where all the variables and attributes are placed. Finally, those are passed into all_gather_tensor, which calls all_gather_into_tensor again, and the result is placed back in all_gather_tensor_inplace.
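To make the lookup-table idea easier to follow, here is a minimal sketch of the remapping pattern in plain Python. It is not the real PyTorch code: lists stand in for tensors, the function bodies and simplified signatures are assumptions, and `remap` is a hypothetical helper that only illustrates how a table like traceable_collective_remaps could swap one callable for another.

```python
# Stand-ins only: plain Python lists play the role of tensors, and none of
# this is the real PyTorch code.

def all_gather_into_tensor(output, input, group=None, async_op=False):
    """Stand-in for the legacy in-place collective."""
    output[:] = input * 2  # pretend two ranks each contributed `input`


# The legacy function is imported under an alias.
legacy_allgather = all_gather_into_tensor


def all_gather_tensor(input, gather_dim=0, group=None, tag=""):
    """Stand-in for the functional version: returns a new value."""
    return input * 2


def all_gather_tensor_inplace(output, input, group=None, async_op=False,
                              tag="", gather_dim=0):
    """In-place adapter with the same call shape as the legacy function."""
    assert not async_op, "this sketch only handles the synchronous case"
    output[:] = all_gather_tensor(input, gather_dim, group, tag)
    return output


# Lookup table: legacy callable -> traceable replacement.
traceable_collective_remaps = {legacy_allgather: all_gather_tensor_inplace}


def remap(fn):
    """Hypothetical helper: return the replacement for `fn`, or `fn` itself."""
    return traceable_collective_remaps.get(fn, fn)


out = [0, 0, 0, 0]
remap(legacy_allgather)(out, [1, 2])
print(out)
```

The point of the table is that callers keep writing the legacy call, while the tracer looks the callable up and substitutes the adapter. That only works cleanly if both sides accept the same arguments.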

Confusing? I know, but that's the most straightforward way of saying it.

Now, I realize that we could just add **kwargs and be done with it, but after analyzing it I saw that there was a total mismatch in arguments that might be causing the bugs, so I went ahead and asked whether it is happening with another function. If my theory is correct and it is happening with another function, then we have to go into all_gather_into_tensor and fix the problem from there. If not, then a straightforward replacement is all we need.
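As a rough illustration of why adding **kwargs alone may not be enough, here is a small sketch with hypothetical stand-in functions. The keyword names `output_tensor` and `input_tensor` used by the caller are assumptions chosen for the example, not taken from the actual code.

```python
def gather_inplace(output, input, group=None, async_op=False):
    """Hypothetical stand-in whose parameters are named `output` and `input`."""
    return ("strict", output, input)


def gather_inplace_kwargs(output=None, input=None, group=None,
                          async_op=False, **kwargs):
    """Same stand-in, but with **kwargs added so it accepts any keyword."""
    return ("lenient", output, input, kwargs)


def call_with_keywords(func):
    """Call `func` the way a caller using different keyword names would."""
    try:
        return func(output_tensor=[0, 0], input_tensor=[1])
    except TypeError as exc:
        return ("error", type(exc).__name__)


strict_result = call_with_keywords(gather_inplace)
lenient_result = call_with_keywords(gather_inplace_kwargs)
print(strict_result)
print(lenient_result)
```

The strict version fails loudly with a TypeError. The lenient version stops the error, but the values land in `kwargs` and `output` and `input` stay `None`, so the mismatch in names is hidden rather than fixed.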

I am very thankful that I learned testing because I know it is going to come in very useful for this one.

The first test I did was to look up what the arguments look like and why it wasn't accepting kwargs.

The result I got was this:

First extra element 4:
'tag'

- ('args', 'kwargs', 'error', 'error_msg_dict')
+ ('output', 'input', 'group', 'async_op', 'tag', 'gather_dim')

This means there are mismatches in the elements being parsed, so that might be one of the issues.
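One possible explanation for a tuple like ('args', 'kwargs', ...) is that the inspected function is wrapped by a decorator and the names were read from `__code__.co_varnames`, which reports the wrapper's own variables. That is only a guess about what happened here. The sketch below uses a hypothetical `logged` decorator and a stand-in `gather` function to show the effect, and how `inspect.signature` sees through a `functools.wraps` wrapper.

```python
import functools
import inspect


def logged(func):
    """Hypothetical decorator; its locals are named to echo the test output."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            # A real logger would record this dict somewhere before re-raising.
            error_msg_dict = {"func": func.__name__, "error": str(error)}
            raise
    return wrapper


@logged
def gather(output, input, group=None, async_op=False, tag="", gather_dim=0):
    """Stand-in function; the parameter names are borrowed from the test output."""
    return output


# Names taken from the code object belong to the wrapper, not to `gather`.
raw_names = gather.__code__.co_varnames

# inspect.signature follows __wrapped__ and reports the original parameters.
signature_names = tuple(inspect.signature(gather).parameters)

print(raw_names)
print(signature_names)
```

If this is what is going on, comparing signatures with `inspect.signature` instead of `co_varnames` would show whether the real parameters differ, or whether only the wrapper's variables were being compared.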

I still have to keep checking. I had hoped to submit a PR this week, but there are a few more things I need to understand, so I might hop on a Zoom call with the dev team tomorrow to try to understand what is wrong. Let's see.

For now we continue!!
