<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Murage Kibicho</title>
    <description>The latest articles on DEV Community by Murage Kibicho (@muragekibicho).</description>
    <link>https://dev.to/muragekibicho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1111334%2Fd818baf1-e7de-4062-93de-96ccff56e22e.jpeg</url>
      <title>DEV Community: Murage Kibicho</title>
      <link>https://dev.to/muragekibicho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muragekibicho"/>
    <language>en</language>
    <item>
      <title>Residue Number Systems for GPU computing as indie-researcher. Thoughts?</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Mon, 19 May 2025 17:39:08 +0000</pubDate>
      <link>https://dev.to/muragekibicho/residue-number-systems-for-gpu-computing-as-indie-researcher-thoughts-4mo4</link>
      <guid>https://dev.to/muragekibicho/residue-number-systems-for-gpu-computing-as-indie-researcher-thoughts-4mo4</guid>
      <description>&lt;p&gt;I've been thinking about "Are there analogs to parallel computing rooted in number theory? Like a way to emulate a GPU on a regular CPU, but not through hardware. Rather by replacing GPU threads with concepts from prime numbers and finite field theory?" I know this sounds cookiesque.&lt;/p&gt;

&lt;p&gt;However, I care about this question because, in a world where AI is becoming commonplace, being GPU-poor is somewhat akin to being locked out of the future. There must be some way to perform lots of matmuls (or at least an intelligence-evoking amount of muls) on consumer CPUs. Maybe we just haven't invented the right number system. I believe math is mostly invented, rarely discovered: binary 1s and 0s are surely discovered, but floating-point and fixed-point are human inventions tweaked for specific use cases.&lt;/p&gt;

&lt;p&gt;In fact, researchers built Residue Number Systems (RNS) in the 1960s as an alternative to binary (base-2) arithmetic for massively parallel computing. However, they fell out of favor because finite field theory (the foundational math behind RNS) supports only addition and multiplication: neither division nor magnitude comparison is supported. Here's the thing: matrix multiplication is merely mults and adds. RNS is great for general matmuls but fails at operations like softmax and backprop.&lt;/p&gt;
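To make the trade-off concrete, here is a minimal sketch of RNS arithmetic (my own illustration, not from the original post; the moduli 7, 11, 13 are arbitrary small choices). Addition and multiplication are digit-wise and carry-free, with each residue channel fully independent, which is what makes RNS parallel-friendly; division and comparison have no such channel-local form:

```python
# Illustrative residue number system: pairwise-coprime moduli identify any
# integer in 0..M-1 uniquely by its residues (Chinese Remainder Theorem).
MODULI = (7, 11, 13)
M = 7 * 11 * 13          # dynamic range: 1001

def to_rns(x):
    return tuple(x % m for m in MODULI)

def add(a, b):
    # Digit-wise and carry-free: each channel can run on its own core.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    # CRT reconstruction; pow(Mi, -1, m) is the modular inverse (Python 3.8+).
    total = 0
    for x, m in zip(r, MODULI):
        Mi = M // m
        total += x * Mi * pow(Mi, -1, m)
    return total % M

a, b = to_rns(45), to_rns(17)
assert from_rns(add(a, b)) == 62
assert from_rns(mul(a, b)) == 765    # 45 * 17, still inside the range
```

A dot product, and hence a matmul, is exactly this add/mul pattern per channel. But notice there is no cheap way to ask which of two residue tuples encodes the larger integer without reconstructing first.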

&lt;p&gt;I argue that solving the division and comparison problems for RNS is the key to solving GPU poverty. RNS's foundational research was done in the 1960s. What if math invented in the 21st century is needed to solve this? For context, only after the invention of new mathematics in the 20th century did Wiles prove Fermat's Last Theorem. &lt;/p&gt;

&lt;p&gt;I know "Talk is Cheap" and that I need to "Be the change i want to see in the world". &lt;/p&gt;

&lt;p&gt;So here's everything I attempted with the RNS problem and how it failed:&lt;br&gt;
*Please be lenient with criticism. I'm not VC-funded, so I do this during my nights and weekends after coming home from my day job.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finite Field Gemm (Attempting to rewrite backpropagation within a residue number system)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It works in principle, but in practice it involves big-integer multiplications and additions. During the backward pass the deltas are rather large, making it impossible to converge towards a local minimum.&lt;br&gt;
  Link : &lt;a href="https://leetarxiv.substack.com/p/ffgemm-finite-field-general-matrix" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/ffgemm-finite-field-general-matrix&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Mediant32 (A fractional number system to get rid of explicit division in RNS) -
We can't divide efficiently in an RNS, so why not store an explicit numerator and denominator? It works extremely well; honestly, I was impressed. The only difficulty is handling overflow: the fractions tend to become quite large, and the absence of division makes simplification difficult.
Link : &lt;a href="https://leetarxiv.substack.com/p/mediant32-intro" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/mediant32-intro&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Discrete Logarithm Neural Networks (A network architecture built around big integers, because my biggest challenge was dealing with big ints) -
It actually works: I was surprised it reached 74% accuracy on the Iris dataset. The problem is that training the network involves solving a discrete logarithm problem, so the weights can only be found by brute force. I don't think I'll find a way to backpropagate; if I did, I would achieve the impossible: breaking modern-day cryptography.
Link : &lt;a href="https://leetarxiv.substack.com/p/discrete-logarithm-neural-networks" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/discrete-logarithm-neural-networks&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
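The overflow pressure on the fraction idea shows up in a few lines (a hypothetical sketch of the concept, not the actual Mediant32 code): keep explicit (numerator, denominator) pairs so no division is ever performed, and the terms grow without bound because reducing a fraction needs a gcd, i.e. division:

```python
# Division-free fractions: keep (numerator, denominator) pairs, never reduce.
def f_add(a, b):
    return (a[0] * b[1] + b[0] * a[1], a[1] * b[1])

def f_mul(a, b):
    return (a[0] * b[0], a[1] * b[1])

x = (1, 3)
for _ in range(8):
    x = f_add(x, (1, 3))     # add 1/3 eight more times

assert x == (59049, 19683)   # the value is just 3, but both terms ballooned
# Reducing 59049/19683 to 3/1 needs a gcd, and gcd needs division -- exactly
# the operation an RNS lacks. Hence the overflow problem.
```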
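The brute-force barrier on the discrete-logarithm weights can also be demonstrated at toy scale (hypothetical numbers, not the actual network): storing a weight w as g to the power w mod p is one cheap modular exponentiation, but recovering w means searching every exponent:

```python
# Toy discrete logarithm: the forward direction is one pow() call, while
# inversion is exhaustive search. p and g are small illustrative choices.
p, g = 1019, 2
stored = pow(g, 345, p)      # encoding the hidden weight w = 345 is instant

# Recovering w requires brute force; at cryptographic sizes this is infeasible,
# which is why such weights cannot be trained by gradient descent.
w = next(e for e in range(p) if pow(g, e, p) == stored)
assert w == 345
```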

&lt;p&gt;Next steps:&lt;br&gt;
I'm looking at the Factorial Number System. Its modulo function is well defined, and it supports floating-point and real numbers, so I don't have to worry about the RNS division problem.&lt;br&gt;
I bet I can define an RNS using factorials, and that it will be easy to work with. I started by going down the Enumerative Combinatorics rabbit hole here:&lt;br&gt;
Link : &lt;a href="https://leetarxiv.substack.com/p/counting-integer-compositions?r=2at73k" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/counting-integer-compositions?r=2at73k&lt;/a&gt;&lt;/p&gt;
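For readers who haven't met the factorial number system, here is a quick sketch of the integer side (illustrative only): the place values are 1!, 2!, 3!, and so on, and the digit at place i never exceeds i, which makes every non-negative integer's representation unique:

```python
# Factorial base: place values are 1!, 2!, 3!, ...; the digit at place i is
# at most i, which guarantees uniqueness of the representation.
def to_factorial_base(n):
    digits, base = [], 2
    while n:
        digits.append(n % base)   # digit for place value (base-1)!
        n //= base
        base += 1
    return digits[::-1] or [0]    # most significant digit first

assert to_factorial_base(463) == [3, 4, 1, 0, 1]
# Check: 3*120 + 4*24 + 1*6 + 0*2 + 1*1 = 360 + 96 + 6 + 0 + 1 = 463
```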

&lt;p&gt;Let me know what you guys think.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>webperf</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI in languages beyond Python</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Sat, 03 Feb 2024 09:56:43 +0000</pubDate>
      <link>https://dev.to/muragekibicho/ai-in-languages-beyond-python-3hgi</link>
      <guid>https://dev.to/muragekibicho/ai-in-languages-beyond-python-3hgi</guid>
      <description>&lt;p&gt;Hey y'all,&lt;/p&gt;

&lt;p&gt;I've been coding a compiler in my free time. It's built for programmers like me - people who are interested in AI but find the Python language super insufferable.&lt;/p&gt;

&lt;p&gt;It's free and open-source.&lt;/p&gt;

&lt;p&gt;Take a gander and please leave a GitHub⭐&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Fileforma/AntiPython-AI-Compiler-Colab"&gt;https://github.com/Fileforma/AntiPython-AI-Compiler-Colab&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>gpu</category>
      <category>beginners</category>
    </item>
    <item>
      <title>FFMPEG C Data Structures Memory Leak Cheatsheet</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Tue, 19 Dec 2023 14:08:08 +0000</pubDate>
      <link>https://dev.to/muragekibicho/ffmpeg-c-data-structures-memory-leak-cheatsheet-2nhb</link>
      <guid>https://dev.to/muragekibicho/ffmpeg-c-data-structures-memory-leak-cheatsheet-2nhb</guid>
      <description>&lt;h1&gt;
  
  
  FFMPEG C API Memory Leak Cheatsheet
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Every malloc has an equal and opposite free&lt;/strong&gt; ~ Newton's fourth law of Physics.&lt;/p&gt;

&lt;p&gt;Hello guys,&lt;br&gt;
My name is Murage Kibicho, and this is a quick guide to freeing memory when using the FFmpeg C API.&lt;br&gt;
For each data structure, I list how to allocate memory and how to free it. I also write about making a video player in less than 1000 lines of C &lt;a href="https://ffmpeg.substack.com/p/ffmpeg-and-sdl-part-1-of-making-a"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  AVFormatContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;avformat_close_input(AVFormatContext **s)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVFrame
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;av_frame_alloc()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Unreference buffers &lt;code&gt;av_frame_unref(AVFrame *frame)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free and leave the pointer NULL &lt;code&gt;av_frame_free(AVFrame **frame)&lt;/code&gt; - the matching free for &lt;code&gt;av_frame_alloc()&lt;/code&gt;; it also unreferences the frame's buffers
&lt;/li&gt;
&lt;li&gt;Generic helpers &lt;code&gt;av_freep(void *ptr)&lt;/code&gt; and &lt;code&gt;av_free(void *ptr)&lt;/code&gt; free only the struct itself - calling them on a frame that still references buffers leaks memory
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVPacket
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;av_packet_alloc()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Dereference buffers &lt;code&gt;av_packet_unref(AVPacket *pkt)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;av_packet_free(AVPacket **pkt)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;NOTE &lt;code&gt;av_free_packet&lt;/code&gt; is deprecated. &lt;strong&gt;NEVER&lt;/strong&gt; use it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVCodecContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;avcodec_alloc_context3(const AVCodec *codec)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;avcodec_free_context(AVCodecContext **avctx)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  struct SwsContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat, int dstW, int dstH, enum AVPixelFormat dstFormat, int flags, SwsFilter *srcFilter, SwsFilter *dstFilter, const double *param)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;sws_freeContext(struct SwsContext *swsContext)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ffmpeg</category>
      <category>audio</category>
      <category>video</category>
    </item>
  </channel>
</rss>
