<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sarkar-AGI</title>
    <description>The latest articles on DEV Community by Sarkar-AGI (@sarkaragi).</description>
    <link>https://dev.to/sarkaragi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946047%2F2a7f29fe-1483-4f31-b59c-ad0699b65871.png</url>
      <title>DEV Community: Sarkar-AGI</title>
      <link>https://dev.to/sarkaragi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sarkaragi"/>
    <language>en</language>
    <item>
      <title>TitanCore Core-1 – Trillion-parameter LLM training infra in C++/CUDA with ZeRO-3</title>
      <dc:creator>Sarkar-AGI</dc:creator>
      <pubDate>Fri, 22 May 2026 12:07:24 +0000</pubDate>
      <link>https://dev.to/sarkaragi/titancore-core-1-trillion-parameter-llm-training-infra-in-ccuda-with-zero-3-5lc</link>
      <guid>https://dev.to/sarkaragi/titancore-core-1-trillion-parameter-llm-training-infra-in-ccuda-with-zero-3-5lc</guid>
      <description>&lt;p&gt;Hi&lt;/p&gt;

&lt;p&gt;I built TitanCore Core-1, a lightweight training infrastructure (75+ files) written in C++ with custom CUDA kernels, designed to address the VRAM bottleneck in trillion-parameter LLM training.&lt;/p&gt;

&lt;p&gt;By implementing Fully Sharded Data Parallelism (FSDP) via ZeRO-3 and bypassing standard framework overhead with fused kernels, I reached 890 GB/s of memory bandwidth utilization (a 2.6× speedup over traditional pipelines).&lt;/p&gt;

&lt;p&gt;The code is fully open source. I would love your feedback on the custom memory handling and activation checkpointing logic!&lt;br&gt;
GitHub: &lt;a href="https://github.com/Sarkar-AGI/Core-1" rel="noopener noreferrer"&gt;https://github.com/Sarkar-AGI/Core-1&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>llm</category>
      <category>performance</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
