<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maaz Khan</title>
    <description>The latest articles on DEV Community by Maaz Khan (@maazkhanxo).</description>
    <link>https://dev.to/maazkhanxo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979385%2Ff81b930f-27f7-4093-8cc9-c162377a73e9.png</url>
      <title>DEV Community: Maaz Khan</title>
      <link>https://dev.to/maazkhanxo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maazkhanxo"/>
    <language>en</language>
    <item>
      <title>How I Built a Suite of 8 AI Tools with $0/Month in API Costs Using NVIDIA NIM</title>
      <dc:creator>Maaz Khan</dc:creator>
      <pubDate>Fri, 19 Jun 2026 10:13:02 +0000</pubDate>
      <link>https://dev.to/maazkhanxo/how-i-built-a-suite-of-8-ai-tools-with-0month-in-api-costs-using-nvidia-nim-33ge</link>
      <guid>https://dev.to/maazkhanxo/how-i-built-a-suite-of-8-ai-tools-with-0month-in-api-costs-using-nvidia-nim-33ge</guid>
      <description>&lt;p&gt;&lt;a href="https://jobeasyapply.com/blog/how-i-built-8-ai-tools-for-0-dollars-with-nvidia-nim" rel="noopener noreferrer"&gt;jobeasyapply.com/blog/how-i-built-8-ai-tools-for-0-dollars-with-nvidia-nim&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a SaaS is hard; driving traffic to it is even harder. &lt;/p&gt;

&lt;p&gt;Paid ads for career keywords are notoriously expensive, often costing anywhere from &lt;strong&gt;$2 to $5 per click&lt;/strong&gt;. For a bootstrapped indie hacker, that's a quick way to run out of money before you even find product-market fit. &lt;/p&gt;

&lt;p&gt;To solve this for our platform, &lt;a href="https://jobeasyapply.com" rel="noopener noreferrer"&gt;JobEasyApply&lt;/a&gt;, we decided to build a suite of &lt;strong&gt;8 free AI career tools&lt;/strong&gt; (ATS resume checkers, interview prep assistants, cover letter generators, etc.) to act as an SEO and utility marketing engine.&lt;/p&gt;

&lt;p&gt;But free AI tools are a double-edged sword. If they go viral or get indexed by bots, a spike in traffic can translate to hundreds of dollars in LLM API costs overnight. &lt;/p&gt;

&lt;p&gt;Here is the exact engineering stack, Python code, and Redis Lua rate-limiting setup we use to host and run all 8 tools for &lt;strong&gt;$0/month in infrastructure and API costs&lt;/strong&gt; while serving thousands of active users.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Challenge: LLM API Costs vs. Virality
&lt;/h2&gt;

&lt;p&gt;Free tools are the top of our funnel. When a user runs their resume through our &lt;a href="https://jobeasyapply.com/free-tools" rel="noopener noreferrer"&gt;Free ATS Resume Checker&lt;/a&gt;, our backend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parses the raw PDF/Word document text.&lt;/li&gt;
&lt;li&gt;Cross-references it against a job description.&lt;/li&gt;
&lt;li&gt;Scores match accuracy, scans for missing keywords, and compiles improvement tips.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because parsing and semantic analysis require high intelligence and a large context window, lightweight 8B models don't cut it. We needed a heavy-hitting model like &lt;strong&gt;Llama 3.3 70B Instruct&lt;/strong&gt; or &lt;strong&gt;Nemotron 70B&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;If we paid standard token rates on OpenAI or Anthropic for this volume of free traffic, we would have gone broke in weeks. We needed a model that was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fast (low time-to-first-token).&lt;/li&gt;
&lt;li&gt;Intelligent enough for resume analysis.&lt;/li&gt;
&lt;li&gt;Free or heavily subsidized for developer experimentation.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Enter NVIDIA NIM (NVIDIA Inference Microservice)
&lt;/h2&gt;

&lt;p&gt;NVIDIA NIM provides optimized API endpoints for open-weights models running on their infrastructure. &lt;/p&gt;

&lt;p&gt;For developers, they offer free API keys with a highly generous rate-limit quota. Since we wanted top-tier reasoning for ATS scoring, we chose &lt;code&gt;meta/llama-3.3-70b-instruct&lt;/code&gt; and &lt;code&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/code&gt; as our primary engines.&lt;/p&gt;

&lt;p&gt;To make this architecture robust enough for production traffic under free quotas, we had to solve two main problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;API Key Limits:&lt;/strong&gt; Handling rate limits (HTTP 429) dynamically without interrupting user sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abuse &amp;amp; Scraping:&lt;/strong&gt; Blocking bot traffic and spam aggressively at the IP level.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is how we implemented the solutions.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Dual-Key Failover Client (Python / FastAPI)
&lt;/h2&gt;

&lt;p&gt;To maximize our free quota and handle heavy spikes in traffic, we built a dual-key failover client. &lt;/p&gt;

&lt;p&gt;If our primary NVIDIA API key hits a rate limit (HTTP 429) or throws a connection error, the client catches the exception and immediately falls back to a secondary key. If that key also fails, it down-shifts to our secondary fallback model.&lt;/p&gt;

&lt;p&gt;Here is the Python implementation in our FastAPI backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;NVIDIA_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;NVIDIA_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.3-70b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# Primary: Best reasoning &amp;amp; speed
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;    &lt;span class="c1"&gt;# Fallback: Resilient secondary
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_nvidia&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Call NVIDIA NIM with dual-key + multi-model failover.
    Tries each model with each key before giving up.
    Returns parsed JSON dict or None on failure.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No NVIDIA API keys configured&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;NVIDIA_MODELS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_keys&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Initialize standard OpenAI client pointed at NVIDIA's endpoint
&lt;/span&gt;                &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NVIDIA_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

                &lt;span class="c1"&gt;# Clean up LLM output if it wraps response in markdown code blocks
&lt;/span&gt;                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;first_newline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;first_newline&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

                &lt;span class="c1"&gt;# Return the structured JSON response
&lt;/span&gt;                &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVIDIA success: model=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, key=#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;

            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVIDIA &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; key #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: JSON parse error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;err_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;404&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;err_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not available (404), skipping model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Skip to next model, don't waste time trying other keys
&lt;/span&gt;                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVIDIA &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; key #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Atomic Sliding Window Rate Limiting (Redis + Lua)
&lt;/h2&gt;

&lt;p&gt;Free API keys have limits. To prevent scraping scripts and bots from draining our quotas, we enforce a strict limit: &lt;strong&gt;5 requests per hour per IP address&lt;/strong&gt; for public endpoints.&lt;/p&gt;

&lt;p&gt;Using a simple counter in Redis (like &lt;code&gt;INCR&lt;/code&gt; with an &lt;code&gt;EXPIRE&lt;/code&gt; time) creates a vulnerability: if a user makes 5 requests in the final second of an hour, they can immediately make 5 more in the first second of the next hour (a spike of 10 requests in 2 seconds).&lt;/p&gt;

&lt;p&gt;To prevent this, we use a &lt;strong&gt;rolling sliding window&lt;/strong&gt; implemented with a Redis Sorted Set (&lt;code&gt;ZSET&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Race Condition Problem
&lt;/h3&gt;

&lt;p&gt;If you check the size of the sorted set, delete old keys, and add a new timestamp in multiple round-trips from Python, two concurrent requests from the same user can execute in parallel, bypass the count checks, and execute both actions.&lt;/p&gt;

&lt;p&gt;To make the rate check 100% atomic, we run the entire check on the Redis server using a &lt;strong&gt;Lua Script&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Redis Lua script for sliding window rate limiting&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;window_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;-- 1. Remove timestamps older than our 1-hour sliding window&lt;/span&gt;
&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZREMRANGEBYSCORE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- 2. Count active requests within the window&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZCARD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="c1"&gt;-- Deny request (limit reached)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;-- 3. If under limit, add current request timestamp and refresh expiration&lt;/span&gt;
&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZADD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tostring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'EXPIRE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;-- Allow request&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  FastAPI Route Middleware
&lt;/h3&gt;

&lt;p&gt;Here is how we integrate this Lua script into our FastAPI endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;

&lt;span class="c1"&gt;# Connect to Redis
&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decode_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register the Lua script
&lt;/span&gt;&lt;span class="n"&gt;_rate_limit_script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_RATE_LIMIT_LUA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;RATE_LIMIT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;RATE_WINDOW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt; &lt;span class="c1"&gt;# 1 hour in seconds
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Atomic Redis rate limiter (sliding window).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_limit:free_tools:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;window_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;RATE_WINDOW&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_rate_limit_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;window_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RATE_LIMIT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RATE_WINDOW&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fail-open to protect UX if Redis experiences hiccups
&lt;/span&gt;        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Redis rate limit failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Client-Side Browser Automation (Saving Hundreds in Server Bills)
&lt;/h2&gt;

&lt;p&gt;The free tools optimize the resumes, but once they are ready, users want to auto-apply to matching roles on LinkedIn.&lt;/p&gt;

&lt;p&gt;Running browser automation (Puppeteer, Playwright, or Selenium) on cloud servers is incredibly expensive. You need raw CPU cores to render chromium pages, and you must purchase residential proxy pools to bypass LinkedIn's bot detection.&lt;/p&gt;

&lt;p&gt;We solved this with a hybrid architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cloud for LLM Reasoning:&lt;/strong&gt; Parsing and scoring happen on our backend via NVIDIA NIM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Client for Action Execution:&lt;/strong&gt; The actual auto-apply click-actions run inside a &lt;strong&gt;Chrome Extension&lt;/strong&gt; on the user's local machine.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because the extension runs in the user's active browser, it utilizes their own residential IP and active LinkedIn session cookies. This keeps their account completely safe from bot detection and eliminates the need for us to pay for expensive cloud browser instances and residential proxies.&lt;/p&gt;




&lt;h2&gt;
  
  
  The $0/Month Economic Breakdown
&lt;/h2&gt;

&lt;p&gt;By combining cloud free tiers, static hosting, and NVIDIA NIM, our operational costs are exactly &lt;strong&gt;$0.00 / month&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA NIM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B &amp;amp; Nemotron Inference&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$0.00&lt;/code&gt; (Free Dev Quota)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vercel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js Frontend &amp;amp; SEO Landing Page hosting&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$0.00&lt;/code&gt; (Hobby Tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oracle Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI backend &amp;amp; Redis container host&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$0.00&lt;/code&gt; (Always-Free Tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Running 8 free AI tools in production&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;$0.00&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;If you are bootstrapping a SaaS in 2026, utility marketing via free tools is one of the most effective ways to build an organic traffic engine. &lt;/p&gt;

&lt;p&gt;Instead of treating LLM API calls as a cost center, you can shift the work to developer-friendly microservices like NVIDIA NIM, wrap them in failover loops, protect them with Redis Lua rate limiters, and offload browser heavy-lifting to local Chrome extensions.&lt;/p&gt;

&lt;p&gt;Have any questions about the Redis Lua setup or the failover loop? Ask in the comments below!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Feel free to check out the project live at &lt;a href="https://jobeasyapply.com" rel="noopener noreferrer"&gt;JobEasyApply&lt;/a&gt; or explore our open-source browser automation codebase on GitHub:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/maazkhanxo/jobeasyapply-linkedin-auto-apply" rel="noopener noreferrer"&gt;maazkhanxo/jobeasyapply-linkedin-auto-apply&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>redis</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Was Tired of Applying to 100+ Jobs Manually. So I Built an AI to Do It for Me.</title>
      <dc:creator>Maaz Khan</dc:creator>
      <pubDate>Thu, 11 Jun 2026 11:27:59 +0000</pubDate>
      <link>https://dev.to/maazkhanxo/i-was-tired-of-applying-to-100-jobs-manually-so-i-built-an-ai-to-do-it-for-me-2pmi</link>
      <guid>https://dev.to/maazkhanxo/i-was-tired-of-applying-to-100-jobs-manually-so-i-built-an-ai-to-do-it-for-me-2pmi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpaekrcwhzc7qgyybbpe5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpaekrcwhzc7qgyybbpe5.png" alt="JobEasyApply SaaS dashboard showing automated job applications count, success rates, and matching score charts" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The job hunt in 2026 is broken. &lt;/p&gt;

&lt;p&gt;You spend hours editing your resume, searching through LinkedIn, and filling out the exact same form fields over and over again. And then comes the worst part: writing custom paragraphs answering &lt;em&gt;"Why should we hire you?"&lt;/em&gt; or &lt;em&gt;"Describe your experience with React"&lt;/em&gt; for the 50th time.&lt;/p&gt;

&lt;p&gt;Only to get ghosted. &lt;/p&gt;

&lt;p&gt;As a developer, I couldn't stand the repetition anymore. We automate deployments, tests, and code reviews—so why are we still applying to jobs like it's 2010?&lt;/p&gt;

&lt;p&gt;I decided to build an automated solution: &lt;strong&gt;JobEasyApply&lt;/strong&gt;. Here is how it works and the tech stack behind it.&lt;/p&gt;




&lt;h3&gt;
  
  
  🛠️ The Tech Stack
&lt;/h3&gt;

&lt;p&gt;I wanted the tool to be fast, secure, and run in the background without needing my browser window to remain open. Here is what I chose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: React &amp;amp; Next.js for a sleek, responsive dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt;: Vanilla JavaScript to securely sync session cookies (like &lt;code&gt;li_at&lt;/code&gt; for LinkedIn) to the server so the backend can apply on the user's behalf.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend Workers&lt;/strong&gt;: Python Celery tasks running on a background server to handle job crawling, resume parsing, and auto-filling questionnaires.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Engine&lt;/strong&gt;: Groq / NVIDIA LLMs to read the form questions and match them with the user's resume content to write custom, high-scoring responses.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚙️ How It Works (Under the Hood)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Resume Parsing &amp;amp; Scoring
&lt;/h4&gt;

&lt;p&gt;When a user uploads a resume, the backend extracts their core skills, years of experience, and achievements. When the crawler finds a job listing, the AI reads the job description and generates a &lt;strong&gt;compatibility score&lt;/strong&gt;. If the score is high, it gets queued.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. The Auto-Apply Automation
&lt;/h4&gt;

&lt;p&gt;Instead of running a heavy headless browser like Puppeteer for hours on my local machine (which burns CPU), we use a Chrome Extension to sync session cookies. &lt;/p&gt;

&lt;p&gt;This lets our backend worker do the heavy lifting in the cloud. It simulates the application network requests directly, injecting custom, AI-generated answers for the employer's screening questions based on the candidate's actual resume.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
javascript
// A peak into how our frontend wizard coordinates the steps:
const STEPS = [
  { key: "resume", icon: "📄", title: "Upload Your Resume" },
  { key: "linkedin", icon: "🔗", title: "Connect LinkedIn" },
  { key: "preferences", icon: "🎯", title: "Job Preferences" },
  { key: "groq", icon: "🤖", title: "AI API Key" },
];

![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sog630n81ny0dnesed8t.png)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>career</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
