<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Murage Kibicho</title>
    <description>The latest articles on DEV Community by Murage Kibicho (@muragekibicho).</description>
    <link>https://dev.to/muragekibicho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1111334%2Fd818baf1-e7de-4062-93de-96ccff56e22e.jpeg</url>
      <title>DEV Community: Murage Kibicho</title>
      <link>https://dev.to/muragekibicho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muragekibicho"/>
    <language>en</language>
    <item>
      <title>Residue Number Systems for GPU computing as indie-researcher. Thoughts?</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Mon, 19 May 2025 17:39:08 +0000</pubDate>
      <link>https://dev.to/muragekibicho/residue-number-systems-for-gpu-computing-as-indie-researcher-thoughts-4mo4</link>
      <guid>https://dev.to/muragekibicho/residue-number-systems-for-gpu-computing-as-indie-researcher-thoughts-4mo4</guid>
      <description>&lt;p&gt;I've been thinking about "Are there analogs to parallel computing rooted in number theory? Like a way to emulate a GPU on a regular CPU, but not through hardware. Rather by replacing GPU threads with concepts from prime numbers and finite field theory?" I know this sounds cookiesque.&lt;/p&gt;

&lt;p&gt;However, I care about this question because, in a world where AI is becoming commonplace, being GPU-poor is somewhat akin to being locked out of the future. There must be some way to perform lots of matmuls (or at least an intelligence-evoking amount of muls) on consumer CPUs. Maybe we just haven't invented the right number system. I believe math is mostly invented, rarely discovered: binary 1s and 0s are surely discovered, but floating-point and fixed-point are human inventions tweaked for specific use cases.&lt;/p&gt;

&lt;p&gt;In fact, researchers built Residue Number Systems (RNS) in the 1960s as an alternative to binary (base-2) arithmetic for massively parallel computing. However, they fell out of favor because finite field theory (the foundational math behind RNS) supports only addition and multiplication: neither division nor magnitude comparison is supported. Here's the thing: matrix multiplication is merely mults and adds. RNS is great for general matmuls but fails at operations like softmax and backprop.&lt;/p&gt;
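To make the trade-off concrete, here is a minimal sketch of RNS arithmetic (my own illustration, not from the original post; the moduli 7, 11, 13 are arbitrary small choices). Addition and multiplication are digit-wise and carry-free, with each residue channel fully independent, which is what makes RNS parallel-friendly; division and comparison have no such channel-local form:

```python
# Illustrative residue number system: pairwise-coprime moduli identify any
# integer in 0..M-1 uniquely by its residues (Chinese Remainder Theorem).
MODULI = (7, 11, 13)
M = 7 * 11 * 13          # dynamic range: 1001

def to_rns(x):
    return tuple(x % m for m in MODULI)

def add(a, b):
    # Digit-wise and carry-free: each channel can run on its own core.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    # CRT reconstruction; pow(Mi, -1, m) is the modular inverse (Python 3.8+).
    total = 0
    for x, m in zip(r, MODULI):
        Mi = M // m
        total += x * Mi * pow(Mi, -1, m)
    return total % M

a, b = to_rns(45), to_rns(17)
assert from_rns(add(a, b)) == 62
assert from_rns(mul(a, b)) == 765    # 45 * 17, still inside the range
```

A dot product, and hence a matmul, is exactly this add/mul pattern per channel. But notice there is no cheap way to ask which of two residue tuples encodes the larger integer without reconstructing first.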

&lt;p&gt;I argue that solving the division and comparison problems for RNS is the key to solving GPU poverty. RNS's foundational research was done in the 1960s. What if math invented in the 21st century is needed to solve this? For context, only after the invention of new mathematics in the 20th century did Wiles prove Fermat's Last Theorem. &lt;/p&gt;

&lt;p&gt;I know "Talk is Cheap" and that I need to "Be the change i want to see in the world". &lt;/p&gt;

&lt;p&gt;So here's everything I attempted with the RNS problem and how it failed:&lt;br&gt;
*Please be lenient with criticism. I'm not VC-funded, so I do this during my nights and weekends after coming home from my day job.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finite Field Gemm (Attempting to rewrite backpropagation within a residue number system)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It works in principle, but in practice it involves big-integer multiplications and additions. During the backward pass the deltas are rather large, making it impossible to converge towards a local minimum.&lt;br&gt;
  Link : &lt;a href="https://leetarxiv.substack.com/p/ffgemm-finite-field-general-matrix" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/ffgemm-finite-field-general-matrix&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Mediant32 (A fractional number system to get rid of explicit division in RNS) -
We can't divide efficiently in an RNS, so why not store an explicit numerator and denominator? It works extremely well; honestly, I was impressed. The only difficulty is handling overflow: the fractions tend to become quite large, and the absence of division makes simplification difficult.
Link : &lt;a href="https://leetarxiv.substack.com/p/mediant32-intro" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/mediant32-intro&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Discrete Logarithm Neural Networks (A network architecture built around big integers, because my biggest challenge was dealing with big ints) -
It actually works: I was surprised it reached 74% accuracy on the Iris dataset. The problem is that training the network involves solving a discrete logarithm problem, so the weights can only be found by brute force. I don't think I'll find a way to backpropagate; if I did, I would achieve the impossible: breaking modern-day cryptography.
Link : &lt;a href="https://leetarxiv.substack.com/p/discrete-logarithm-neural-networks" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/discrete-logarithm-neural-networks&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
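The overflow pressure on the fraction idea shows up in a few lines (a hypothetical sketch of the concept, not the actual Mediant32 code): keep explicit (numerator, denominator) pairs so no division is ever performed, and the terms grow without bound because reducing a fraction needs a gcd, i.e. division:

```python
# Division-free fractions: keep (numerator, denominator) pairs, never reduce.
def f_add(a, b):
    return (a[0] * b[1] + b[0] * a[1], a[1] * b[1])

def f_mul(a, b):
    return (a[0] * b[0], a[1] * b[1])

x = (1, 3)
for _ in range(8):
    x = f_add(x, (1, 3))     # add 1/3 eight more times

assert x == (59049, 19683)   # the value is just 3, but both terms ballooned
# Reducing 59049/19683 to 3/1 needs a gcd, and gcd needs division -- exactly
# the operation an RNS lacks. Hence the overflow problem.
```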
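The brute-force barrier on the discrete-logarithm weights can also be demonstrated at toy scale (hypothetical numbers, not the actual network): storing a weight w as g to the power w mod p is one cheap modular exponentiation, but recovering w means searching every exponent:

```python
# Toy discrete logarithm: the forward direction is one pow() call, while
# inversion is exhaustive search. p and g are small illustrative choices.
p, g = 1019, 2
stored = pow(g, 345, p)      # encoding the hidden weight w = 345 is instant

# Recovering w requires brute force; at cryptographic sizes this is infeasible,
# which is why such weights cannot be trained by gradient descent.
w = next(e for e in range(p) if pow(g, e, p) == stored)
assert w == 345
```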

&lt;p&gt;Next steps:&lt;br&gt;
I'm looking at the Factorial Number System. Its modulo function is well defined, and it supports floating-point and real numbers, so I don't have to worry about the RNS division problem.&lt;br&gt;
I bet I can define an RNS using factorials, and that it will be easy to work with. I started by going down the Enumerative Combinatorics rabbit hole here:&lt;br&gt;
Link : &lt;a href="https://leetarxiv.substack.com/p/counting-integer-compositions?r=2at73k" rel="noopener noreferrer"&gt;https://leetarxiv.substack.com/p/counting-integer-compositions?r=2at73k&lt;/a&gt;&lt;/p&gt;
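For readers who haven't met the factorial number system, here is a quick sketch of the integer side (illustrative only): the place values are 1!, 2!, 3!, and so on, and the digit at place i never exceeds i, which makes every non-negative integer's representation unique:

```python
# Factorial base: place values are 1!, 2!, 3!, ...; the digit at place i is
# at most i, which guarantees uniqueness of the representation.
def to_factorial_base(n):
    digits, base = [], 2
    while n:
        digits.append(n % base)   # digit for place value (base-1)!
        n //= base
        base += 1
    return digits[::-1] or [0]    # most significant digit first

assert to_factorial_base(463) == [3, 4, 1, 0, 1]
# Check: 3*120 + 4*24 + 1*6 + 0*2 + 1*1 = 360 + 96 + 6 + 0 + 1 = 463
```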

&lt;p&gt;Let me know what you guys think.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>webperf</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI in languages beyond Python</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Sat, 03 Feb 2024 09:56:43 +0000</pubDate>
      <link>https://dev.to/muragekibicho/ai-in-languages-beyond-python-3hgi</link>
      <guid>https://dev.to/muragekibicho/ai-in-languages-beyond-python-3hgi</guid>
      <description>&lt;p&gt;Hey y'all,&lt;/p&gt;

&lt;p&gt;I've been coding a compiler in my free time. It's built for programmers like me - people who are interested in AI but find the Python language super insufferable.&lt;/p&gt;

&lt;p&gt;It's free and open-source.&lt;/p&gt;

&lt;p&gt;Take a gander and please leave a GitHub⭐&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Fileforma/AntiPython-AI-Compiler-Colab"&gt;https://github.com/Fileforma/AntiPython-AI-Compiler-Colab&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>gpu</category>
      <category>beginners</category>
    </item>
    <item>
      <title>FFMPEG C Data Structures Memory Leak Cheatsheet</title>
      <dc:creator>Murage Kibicho</dc:creator>
      <pubDate>Tue, 19 Dec 2023 14:08:08 +0000</pubDate>
      <link>https://dev.to/muragekibicho/ffmpeg-c-data-structures-memory-leak-cheatsheet-2nhb</link>
      <guid>https://dev.to/muragekibicho/ffmpeg-c-data-structures-memory-leak-cheatsheet-2nhb</guid>
      <description>&lt;h1&gt;
  
  
  FFMPEG C API Memory Leak Cheatsheet
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Every malloc has an equal and opposite free&lt;/strong&gt; ~ Newton's fourth law of Physics.&lt;/p&gt;

&lt;p&gt;Hello guys,&lt;br&gt;
My name is Murage Kibicho, and this is a quick guide to freeing memory when using the FFmpeg C API.&lt;br&gt;
For each data structure, I list how to allocate memory and how to free it. I also write about making a video player in less than 1000 lines of C &lt;a href="https://ffmpeg.substack.com/p/ffmpeg-and-sdl-part-1-of-making-a"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  AVFormatContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;avformat_close_input(AVFormatContext **s)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVFrame
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;av_frame_alloc()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Unreference buffers &lt;code&gt;av_frame_unref(AVFrame *frame)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free and leave the pointer NULL &lt;code&gt;av_frame_free(AVFrame **frame)&lt;/code&gt; - the matching free for &lt;code&gt;av_frame_alloc()&lt;/code&gt;; it also unreferences the frame's buffers
&lt;/li&gt;
&lt;li&gt;Generic helpers &lt;code&gt;av_freep(void *ptr)&lt;/code&gt; and &lt;code&gt;av_free(void *ptr)&lt;/code&gt; free only the struct itself - calling them on a frame that still references buffers leaks memory
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVPacket
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;av_packet_alloc()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Dereference buffers &lt;code&gt;av_packet_unref(AVPacket *pkt)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;av_packet_free(AVPacket **pkt)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;NOTE &lt;code&gt;av_free_packet&lt;/code&gt; is deprecated. &lt;strong&gt;NEVER&lt;/strong&gt; use it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AVCodecContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;avcodec_alloc_context3(const AVCodec *codec)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;avcodec_free_context(AVCodecContext **avctx)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  struct SwsContext
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Allocate &lt;code&gt;sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat, int dstW, int dstH, enum AVPixelFormat dstFormat, int flags, SwsFilter *srcFilter, SwsFilter *dstFilter, const double *param)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free &lt;code&gt;sws_freeContext(struct SwsContext *swsContext)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ffmpeg</category>
      <category>audio</category>
      <category>video</category>
    </item>
  </channel>
</rss>
