<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: synthpyt</title>
    <description>The latest articles on DEV Community by synthpyt (@synthpy).</description>
    <link>https://dev.to/synthpy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F783789%2Fa94c301c-8d29-4bd9-ae66-4cadd50953b1.jpg</url>
      <title>DEV Community: synthpyt</title>
      <link>https://dev.to/synthpy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/synthpy"/>
    <language>en</language>
    <item>
      <title>🚀 Building Recallr: How I Turned PDFs into Anki Flashcards with AI</title>
      <dc:creator>synthpyt</dc:creator>
      <pubDate>Thu, 15 May 2025 23:24:14 +0000</pubDate>
      <link>https://dev.to/synthpy/building-recallr-how-i-turned-pdfs-into-anki-flashcards-with-ai-133i</link>
      <guid>https://dev.to/synthpy/building-recallr-how-i-turned-pdfs-into-anki-flashcards-with-ai-133i</guid>
      <description>&lt;p&gt;i built a FastAPI tool that automatically converts dense PDFs into Anki flashcards using LLMs—but not without battling slow APIs, broken parsing, and cursed 404 errors. Here’s how I fixed it (and how you can too).&lt;/p&gt;

&lt;p&gt;As a student, I spent hours on end manually creating flashcards.&lt;/p&gt;

&lt;p&gt;Then I discovered that Anki has its own &lt;a href="https://ankiweb.net/shared/info/2055492159" rel="noopener noreferrer"&gt;API interface&lt;/a&gt;, which meant I could automate the task efficiently while also getting hands-on experience with &lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; and LLMs (but am I going to change the stack later?.. let's see)&lt;/p&gt;

&lt;p&gt;At first, the task seemed pretty easy:&lt;br&gt;
&lt;strong&gt;Parse the PDF -&amp;gt; chunk the PDF content into paragraphs -&amp;gt; send the chunks through the API to generate flashcards with DeepSeek -&amp;gt; get the response as a .txt file (to be changed later to the Anki format, &lt;code&gt;.apkg&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;
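
&lt;p&gt;To make the chunking step of that pipeline concrete, here is a minimal sketch that groups already-extracted text into paragraph chunks. The function name &lt;code&gt;chunk_paragraphs&lt;/code&gt; and the &lt;code&gt;max_chars&lt;/code&gt; limit are assumptions for illustration, not Recallr's actual code.&lt;/p&gt;

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    Hypothetical helper: assumes the PDF text was already extracted
    and that paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        # Start a new chunk once adding this paragraph would exceed the limit.
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

&lt;p&gt;A single paragraph longer than &lt;code&gt;max_chars&lt;/code&gt; is kept whole in this sketch, so the limit is a target rather than a hard cap.&lt;/p&gt;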

&lt;p&gt;Simple, right?.. &lt;strong&gt;right?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first problem I faced was how slow it was to send the LLM requests chunk by chunk, especially for large PDFs like the one I tested first.&lt;/p&gt;

&lt;p&gt;This problem stayed with me for a while: how could I optimize the response time? I knew the bottleneck was in the function that handled exactly that..&lt;/p&gt;

&lt;p&gt;Back to this problem in a bit. First, let's see what smaller problems I faced.&lt;/p&gt;

&lt;p&gt;Ready? It's &lt;strong&gt;PARSING HELL&lt;/strong&gt;&lt;br&gt;
and no, it's not parsing the input PDF.&lt;br&gt;
Problem: the API returned "Question/Answer" placeholders instead of real content.&lt;br&gt;
It was literally just:&lt;/p&gt;

&lt;p&gt;Question: Answer:&lt;br&gt;
Question: Answer:&lt;br&gt;
.&lt;br&gt;
.&lt;br&gt;
.&lt;br&gt;
Question: Answer: &lt;/p&gt;

&lt;p&gt;Debugging Steps:&lt;/p&gt;

&lt;p&gt;Realized the LLM response format was inconsistent.&lt;/p&gt;

&lt;p&gt;Switched to regex parsing for robustness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_flashcards_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question:\s*(.*?)\s*Answer:\s*(.*?)(?=(?:Question:|$))&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
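
&lt;p&gt;To see how that regex behaves, here is a small self-contained sketch that runs it on a made-up response. The sample text and the &lt;code&gt;drop_empty_cards&lt;/code&gt; helper are assumptions added for illustration; filtering out empty pairs is just one possible guard against the "Question: Answer:" placeholders.&lt;/p&gt;

```python
import re


def parse_flashcards_response(response: str) -> list[dict]:
    # Same regex as above: capture the text after each "Question:" and "Answer:" label.
    matches = re.findall(
        r"Question:\s*(.*?)\s*Answer:\s*(.*?)(?=(?:Question:|$))",
        response,
        re.DOTALL,
    )
    return [{"question": q.strip(), "answer": a.strip()} for q, a in matches]


def drop_empty_cards(cards: list[dict]) -> list[dict]:
    # Hypothetical extra step: discard placeholder pairs with no real content.
    return [c for c in cards if c["question"] and c["answer"]]


# Made-up LLM output containing one empty placeholder pair.
sample = (
    "Question: What does FastAPI build on?\n"
    "Answer: Starlette and Pydantic.\n"
    "Question: Answer:\n"
    "Question: What file type does Anki import?\n"
    "Answer: .apkg\n"
)
cards = drop_empty_cards(parse_flashcards_response(sample))
print(cards)
```

&lt;p&gt;With this sample, the empty pair in the middle is parsed as a card with blank fields and then dropped, leaving two usable cards.&lt;/p&gt;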



&lt;p&gt;And finally.. it worked.&lt;/p&gt;

&lt;p&gt;Then I hit another problem: OpenRouter's daily rate limit of 5 requests/day, which was a nightmare not just for scalability but even for testing.&lt;/p&gt;

&lt;p&gt;So I switched to &lt;a href="https://groq.com/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;'s llama3-8b-8192,&lt;br&gt;
which has very generous daily token and request limits.&lt;br&gt;
You can find all the models available on their free and paid tiers, along with their rate limits, on their &lt;a href="https://groq.com/" rel="noopener noreferrer"&gt;website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now my application is 10x faster, thanks to feeding chunks to the API asynchronously and my new &lt;em&gt;Cool&lt;/em&gt; model.&lt;/p&gt;
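
&lt;p&gt;For anyone curious what async chunk feeding can look like, here is a minimal sketch using &lt;code&gt;asyncio&lt;/code&gt;. The &lt;code&gt;fake_llm_call&lt;/code&gt; function is a stand-in for a real provider request, and &lt;code&gt;generate_all&lt;/code&gt; and &lt;code&gt;max_concurrency&lt;/code&gt; are hypothetical names, so treat this as an illustration of the idea rather than the code Recallr runs.&lt;/p&gt;

```python
import asyncio


async def fake_llm_call(chunk: str) -> str:
    # Stand-in for a real network request to an LLM provider (assumption).
    await asyncio.sleep(0.1)
    return f"Question: What is in this chunk?\nAnswer: {chunk}"


async def generate_all(chunks: list[str], max_concurrency: int = 5) -> list[str]:
    # Cap in-flight requests so provider rate limits are less likely to be hit.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(chunk: str) -> str:
        async with semaphore:
            return await fake_llm_call(chunk)

    # gather() runs the calls concurrently and preserves the input order.
    return await asyncio.gather(*(limited(c) for c in chunks))


results = asyncio.run(generate_all([f"chunk {i}" for i in range(10)]))
print(len(results))
```

&lt;p&gt;Sent one by one, ten calls of 0.1 seconds each would take about a second; with five in flight at a time, the same work finishes in roughly two rounds.&lt;/p&gt;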

&lt;p&gt;Next step -&amp;gt; actually adding the Anki file generation logic mentioned earlier &lt;code&gt;(.apkg)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Using &lt;a href="https://github.com/kerrickstaley/genanki" rel="noopener noreferrer"&gt;genanki&lt;/a&gt;, the process was straightforward: instead of my initial .txt output, it's now a fully functional .apkg that can be imported directly into the Anki app as a study deck.&lt;br&gt;
But here I ran into a problem with my old parsing and cleaning logic,&lt;br&gt;
which in the output looked something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0yoby4q5fev2yv0eipw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0yoby4q5fev2yv0eipw.png" alt="Image description" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After changing the logic behind parsing with even more (jumpscare warning) &lt;em&gt;regex&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;it kind of got fixed.&lt;/p&gt;
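
&lt;p&gt;For reference, the genanki step mentioned above can be sketched roughly like this. It assumes genanki is installed (&lt;code&gt;pip install genanki&lt;/code&gt;); the IDs, names, and templates are illustrative placeholders, and &lt;code&gt;to_note_fields&lt;/code&gt; and &lt;code&gt;build_deck&lt;/code&gt; are hypothetical helpers, not Recallr's real code.&lt;/p&gt;

```python
import html


def to_note_fields(cards: list[dict]) -> list[tuple[str, str]]:
    # genanki treats field values as HTML, so escape the raw text first.
    return [(html.escape(c["question"]), html.escape(c["answer"])) for c in cards]


def build_deck(cards: list[dict], deck_name: str = "Recallr deck"):
    """Build a genanki Deck from parsed question/answer dicts (sketch)."""
    import genanki  # imported lazily so to_note_fields works without it

    model = genanki.Model(
        1607392319,  # arbitrary but fixed model ID (assumption)
        "Recallr Basic (sketch)",
        fields=[{"name": "Question"}, {"name": "Answer"}],
        templates=[{
            "name": "Card 1",
            "qfmt": "{{Question}}",
            # A real template would usually add an HTML divider here.
            "afmt": "{{FrontSide}}\n\n{{Answer}}",
        }],
    )
    deck = genanki.Deck(2059400110, deck_name)  # arbitrary but fixed deck ID
    for question, answer in to_note_fields(cards):
        deck.add_note(genanki.Note(model=model, fields=[question, answer]))
    return deck
```

&lt;p&gt;Writing the file is then a single call: &lt;code&gt;genanki.Package(build_deck(cards)).write_to_file("output.apkg")&lt;/code&gt;.&lt;/p&gt;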

&lt;p&gt;Finally, some prompt engineering to ensure there is no numbering, no headings, and no fluff. I used this prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert to flashcards using EXACTLY this format:
Question: [Your question here]
Answer: [Your answer here]

NO other text, NO numbering, NO headers, just alternating Question/Answer pairs.

Content to convert:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding some simple error handling for each page and testing it, I could say that the backend is officially done and I couldn't be happier :)&lt;/p&gt;
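
&lt;p&gt;As an illustration of per-page error handling, here is a minimal sketch in which one bad page is recorded instead of aborting the whole run. The names &lt;code&gt;process_pages&lt;/code&gt; and &lt;code&gt;flaky_generate&lt;/code&gt; are hypothetical, and the real backend may handle failures differently.&lt;/p&gt;

```python
def process_pages(pages: list[str], generate) -> tuple[list[str], list[dict]]:
    """Run `generate` on every page, recording failures instead of aborting.

    `generate` is any callable that turns page text into flashcard text.
    """
    results = []
    errors = []
    for page_number, page_text in enumerate(pages, start=1):
        try:
            results.append(generate(page_text))
        except Exception as exc:  # broad on purpose: one bad page should not stop the rest
            errors.append({"page": page_number, "error": str(exc)})
    return results, errors


def flaky_generate(page_text: str) -> str:
    # Stand-in for the LLM call; fails on blank pages to exercise the handler.
    if not page_text.strip():
        raise ValueError("empty page")
    return f"cards for: {page_text}"


results, errors = process_pages(["intro", "   ", "summary"], flaky_generate)
print(results)
print(errors)
```

&lt;p&gt;Returning the errors alongside the results lets the API report which pages were skipped instead of failing the whole upload.&lt;/p&gt;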

&lt;p&gt;Now for the frontend: I decided to use React with TailwindCSS for a clean, sleek UI.&lt;/p&gt;

&lt;p&gt;I'm not a frontend master or anything, but I got it sorted out, and the frontend is MVP ready!&lt;br&gt;
Now, for some final E2E testing.. I used denser lecture PDFs, and thankfully it passed all checks ✅&lt;/p&gt;

&lt;p&gt;Hosting.. hosting... hosting...&lt;/p&gt;

&lt;p&gt;Hosting was a problem for me at first, since I didn't really want to pay for a hobby MVP with no revenue. I used &lt;a href="https://render.com/" rel="noopener noreferrer"&gt;Render&lt;/a&gt;'s free tier for the backend and &lt;a href="https://vercel.com/" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; for the frontend.&lt;/p&gt;

&lt;p&gt;And with that, I could say I have a viable MVP!&lt;br&gt;
Of course, the app still needs polishing, like..&lt;br&gt;
&lt;em&gt;better prompt engineering&lt;/em&gt;, &lt;em&gt;fixing frontend bugs&lt;/em&gt;, and so on..&lt;/p&gt;

&lt;p&gt;Thank you for reading! I hope that was a pretty good read. It was my first time writing about a project I'm working on.&lt;/p&gt;

&lt;p&gt;Here is the link in case you want to check out the MVP!&lt;br&gt;
&lt;a href="https://recallr.vercel.app/" rel="noopener noreferrer"&gt;Recallr&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
