<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dogukan Karademir</title>
    <description>The latest articles on DEV Community by Dogukan Karademir (@mido-dev).</description>
    <link>https://dev.to/mido-dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4000548%2F13eac3da-eebe-4bf5-9baf-e3c72b90129e.png</url>
      <title>DEV Community: Dogukan Karademir</title>
      <link>https://dev.to/mido-dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mido-dev"/>
    <language>en</language>
    <item>
      <title>Four Bugs Stood Between Me and "Sign in with Google"</title>
      <dc:creator>Dogukan Karademir</dc:creator>
      <pubDate>Fri, 26 Jun 2026 20:18:47 +0000</pubDate>
      <link>https://dev.to/mido-dev/four-bugs-stood-between-me-sign-in-with-google-2ajn</link>
      <guid>https://dev.to/mido-dev/four-bugs-stood-between-me-sign-in-with-google-2ajn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; I had a rough time adding Google login to my app, Kenning. Four separate issues stood in the way, and it took me a while to track each one down. They were unrelated to each other, and none of them was covered in any tutorial I read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is my second post about building Kenning, and this phase is about OAuth2 login. I thought it would be easy. It was not. I had to deal with four confusing bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 1: the client ID with a hidden character
&lt;/h2&gt;

&lt;p&gt;Google did not accept my login. It gave me an error message saying "Error 401: invalid_client". I checked my client ID in the .env file. It looked correct. I had copied it from the Cloud Console.&lt;/p&gt;

&lt;p&gt;When I looked at the actual request that was being sent, I saw the problem. The client ID had a hidden character at the end. This character was a carriage return, represented by %0D in URL encoding. My .env file had Windows line endings (CRLF), and that extra character was being included in the value.&lt;/p&gt;

&lt;p&gt;The fix was switching my editor's line ending setting from CRLF to LF and re-saving the file. (You can also strip it from an existing file with &lt;code&gt;sed -i 's/\r$//' .env&lt;/code&gt;, but the actual cause was the editor's line-ending mode, not a one-off corrupted file.)&lt;/p&gt;

&lt;p&gt;What I learned from this is that just because something looks correct does not mean it is correct. I should have checked the actual value instead of just looking at it.&lt;/p&gt;
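
&lt;p&gt;A minimal Java sketch of what checking the actual value can look like. The client ID below is a made-up placeholder and &lt;code&gt;EnvValueCheck&lt;/code&gt; is a hypothetical helper, not part of Kenning. It makes a trailing carriage return visible and shows how it turns into &lt;code&gt;%0D&lt;/code&gt; once the value is URL-encoded.&lt;/p&gt;

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EnvValueCheck {
    // Makes CR and LF visible, so a value ending in a carriage return no longer looks clean.
    static String visible(String value) {
        return value.replace("\r", "\\r").replace("\n", "\\n");
    }

    // True if the value ends with a carriage return, as left behind by a CRLF line ending.
    static boolean hasTrailingCarriageReturn(String value) {
        return value.endsWith("\r");
    }

    // URL-encodes the value the way it would appear in a query string.
    static String urlEncoded(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical client ID with a CR at the end, as read from a CRLF .env file.
        String clientId = "example-id.apps.googleusercontent.com\r";
        System.out.println(visible(clientId));
        System.out.println(hasTrailingCarriageReturn(clientId));
        System.out.println(urlEncoded(clientId));
        // strip() drops the trailing whitespace, including the carriage return.
        System.out.println(urlEncoded(clientId.strip()));
    }
}
```

&lt;p&gt;Printing a value through something like &lt;code&gt;visible()&lt;/code&gt; before it goes into a request is a cheap way to catch this class of problem.&lt;/p&gt;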

&lt;h2&gt;
  
  
  Bug 2: the user service Spring never called
&lt;/h2&gt;

&lt;p&gt;After I fixed that bug I was able to complete the login process. But I noticed that no user was being added to my database. I had written a custom user-loading service, and it was not being called.&lt;/p&gt;

&lt;p&gt;I looked into the auth object that Spring had built after login and saw that it had an authority called &lt;code&gt;OIDC_USER&lt;/code&gt;. This told me that Spring was treating the login as OpenID Connect and routing it through the &lt;code&gt;OidcUserService&lt;/code&gt; class rather than the plain OAuth2 user service. My custom service was extending the wrong base class, &lt;code&gt;DefaultOAuth2UserService&lt;/code&gt; instead of &lt;code&gt;OidcUserService&lt;/code&gt;, so it was simply never invoked, even though it was wired in correctly.&lt;/p&gt;

&lt;p&gt;To fix this I changed my custom service to extend &lt;code&gt;OidcUserService&lt;/code&gt; instead. This fixed the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 3: the CSRF cookie that needs to be asked for
&lt;/h2&gt;

&lt;p&gt;After fixing that, login worked end to end. But when I tried to upload a file, I got a 403 Forbidden error. I had set up CSRF protection on purpose, so a rejected request made sense in principle. The real problem was that the cookie it depends on, &lt;code&gt;XSRF-TOKEN&lt;/code&gt;, was never being written in the first place.&lt;/p&gt;

&lt;p&gt;It turns out Spring Security 6+ defers writing that cookie until something in the request actually reads the token. A &lt;code&gt;GET&lt;/code&gt; request that never touches it never triggers the write.&lt;/p&gt;

&lt;p&gt;To fix this I wrote a filter that forces the token to be read on every request, which triggers the cookie write.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 4: two cookies, same name, different values
&lt;/h2&gt;

&lt;p&gt;I spent a lot of time on this one. I kept copying the &lt;code&gt;XSRF-TOKEN&lt;/code&gt; cookie value into the &lt;code&gt;X-XSRF-TOKEN&lt;/code&gt; header of a manual request, and it kept getting rejected, even right after confirming in DevTools that the cookie existed.&lt;/p&gt;

&lt;p&gt;Looking closer, DevTools was showing two separate &lt;code&gt;XSRF-TOKEN&lt;/code&gt; entries with the same name but different values — one with an empty partition key, and one partitioned under &lt;code&gt;resource://devtools&lt;/code&gt;. I had been copying the DevTools-partitioned one, which isn't the value the browser actually sends on a real request. Once I copied the other one — the unpartitioned cookie — it worked immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  A thing I haven't solved yet
&lt;/h2&gt;

&lt;p&gt;Before any of this, I tried using the &lt;a href="https://github.com/paulschwarz/spring-dotenv" rel="noopener noreferrer"&gt;spring-dotenv&lt;/a&gt; library to load my &lt;code&gt;.env&lt;/code&gt; file automatically instead of exporting variables by hand every time. After adding it, login stopped working, and I genuinely don't know why — I never confirmed whether it was even loading the file, or something else entirely. I removed the dependency and went back to exporting variables manually. If anyone's gotten this working alongside Spring Security + OAuth2, I'd like to hear how.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually mattered
&lt;/h2&gt;

&lt;p&gt;None of these bugs were caused by Spring or OAuth2 being badly designed. Each one had a clear explanation once I found it. What would have saved me time is checking the actual outgoing request the moment something failed for a reason that didn't make sense, instead of trusting &lt;code&gt;echo&lt;/code&gt; output or DevTools at face value.&lt;/p&gt;

&lt;h2&gt;
  
  
  And the frontend?
&lt;/h2&gt;

&lt;p&gt;Comparatively quiet, which I'm counting as a win. I used Angular and PrimeNG to build the document list and chat screen. Once I had the right cookie and header names configured, the whole CSRF back-and-forth from Bug 3 just worked automatically, because Angular's &lt;code&gt;HttpClient&lt;/code&gt; has built-in XSRF support: it reads the token cookie and sends the value back as a request header on state-changing requests.&lt;/p&gt;

&lt;p&gt;Next up: a reader on the last post called the chunk-dilution theory exactly right and suggested keeping chunks to one topic each. So up next is testing that properly — comparing chat models, embedding models, and chunking strategies head to head, local and cloud, on quality, speed, and cost.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building Kenning in public. Corrections welcome — especially on the spring-dotenv mystery above.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spring</category>
      <category>oauth</category>
      <category>java</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built a RAG App, Then Asked It What Car I Like. It Didn't Know.</title>
      <dc:creator>Dogukan Karademir</dc:creator>
      <pubDate>Wed, 24 Jun 2026 20:33:23 +0000</pubDate>
      <link>https://dev.to/mido-dev/i-built-a-rag-app-then-asked-it-what-car-i-like-it-didnt-know-583n</link>
      <guid>https://dev.to/mido-dev/i-built-a-rag-app-then-asked-it-what-car-i-like-it-didnt-know-583n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Phase 1 of a from-scratch RAG app — Spring AI, pgvector, local Ollama — ends with a working pipeline and two failures that look identical from the outside but have nothing to do with each other. One was a chunking bug. The other was a 3B model running out of brain. Here's how I told them apart.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why build this
&lt;/h2&gt;

&lt;p&gt;I'm finishing my apprenticeship as an &lt;em&gt;IT specialist in application development&lt;/em&gt; and wanted a portfolio project that's more than another CRUD app. Kenning is a document-chat tool: upload a file, ask questions about it, get answers with sources attached. Standard RAG (Retrieval-Augmented Generation), but built end to end by hand instead of stitched together from a tutorial.&lt;/p&gt;

&lt;p&gt;Phase 0 was infrastructure: Docker Compose with &lt;code&gt;pgvector/pgvector:pg16&lt;/code&gt; and &lt;code&gt;ollama/ollama&lt;/code&gt;, a Spring Boot scaffold, an Angular scaffold. Phase 1's job was narrower and more important: prove the actual RAG loop works — upload one document, ask one question, get a real answer with the source attached. No login, no UI polish, no multi-document handling. Just: does this architecture actually do the thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Java 21, Spring Boot 4.1.0, Spring AI&lt;/li&gt;
&lt;li&gt;Angular (frontend, mostly untouched in this phase)&lt;/li&gt;
&lt;li&gt;PostgreSQL + pgvector as the vector store&lt;/li&gt;
&lt;li&gt;Ollama, running locally: &lt;code&gt;nomic-embed-text&lt;/code&gt; for embeddings and &lt;code&gt;llama3.2:3b&lt;/code&gt; for chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pipeline: upload → Apache Tika extracts text → &lt;code&gt;TokenTextSplitter&lt;/code&gt; chunks it → Spring AI's Ollama embedding client turns each chunk into a vector → pgvector stores it → a question gets embedded the same way → similarity search pulls the closest chunks → those get stuffed into a &lt;code&gt;ChatClient&lt;/code&gt; call alongside the question → the model answers, and I attach the source chunks it used.&lt;/p&gt;

&lt;p&gt;That's the theory. None of it survived contact with reality without a fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake #1: naming my own entity &lt;code&gt;Document&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Spring AI ships its own &lt;code&gt;Document&lt;/code&gt; class for representing a chunk of text plus metadata. I also wanted an entity called &lt;code&gt;Document&lt;/code&gt; for "a file the user uploaded." Same name, two completely different things, one annoying import ambiguity every time autocomplete guessed wrong. Renamed mine to &lt;code&gt;SourceDocument&lt;/code&gt; and moved on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake #2: two Ollamas, one port
&lt;/h2&gt;

&lt;p&gt;I ran &lt;code&gt;qwen2.5-coder:14b&lt;/code&gt; natively on Windows for my coding assistant to test out local models and forgot about it. Docker Compose also wants port 11434 for the Ollama container that's supposed to serve Kenning. Two processes, one port, predictable result. The fix was trivial — stop the native Windows process before starting the container — but the error message gave zero hint that this was the cause. Worth remembering if you run a local AI coding tool &lt;em&gt;and&lt;/em&gt; an Ollama-based app on the same machine.&lt;/p&gt;
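
&lt;p&gt;One way to check for this before starting the container, as a rough sketch. &lt;code&gt;PortCheck&lt;/code&gt; is a hypothetical helper, not part of Kenning. It simply tries to bind the port and treats a failed bind as a sign that something else is already listening. A bind can fail for other reasons too, so read a &lt;code&gt;true&lt;/code&gt; result as a hint, not proof.&lt;/p&gt;

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {
    // Tries to bind the port; a failed bind is taken to mean the port is already in use.
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return false; // bind succeeded, so nothing else was holding the port
        } catch (IOException e) {
            return true;  // bind failed, most likely because another process owns the port
        }
    }

    public static void main(String[] args) {
        // 11434 is the port both the native Ollama and the Ollama container want.
        System.out.println("11434 in use: " + isPortInUse(11434));
    }
}
```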

&lt;h2&gt;
  
  
  Mistake #3 (not really a mistake): the GPU that does nothing
&lt;/h2&gt;

&lt;p&gt;I have an AMD RX 6700 XT with 12 GB of VRAM sitting in this machine, doing nothing for local inference. Rather than assume, I checked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                       ID              SIZE      PROCESSOR    CONTEXT    UNTIL
llama3.2:3b                a80c4f17acd5    2.6 GB    100% CPU     4096       4 minutes from now
nomic-embed-text:latest    0a109f422b47    376 MB    100% CPU     2048       4 minutes from now
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;100% CPU, confirmed. As far as I can tell, the issue is that ROCm needs &lt;code&gt;/dev/kfd&lt;/code&gt; exposed to the container, and WSL2 doesn't expose it — so GPU acceleration for Ollama running inside Docker on WSL2 seems to be a dead end with this setup.&lt;/p&gt;

&lt;p&gt;One thing I haven't tried yet: running Ollama natively on Windows instead of inside WSL2/Docker. Native Windows Ollama has its own path to the GPU drivers that doesn't go through WSL2's passthrough limitations, so that's probably the more realistic way to actually use the 6700 XT — I just haven't switched Kenning over to test it. For now: CPU-only for local dev, on a machine where RAM is already sitting at 96% usage before the model even loads.&lt;/p&gt;

&lt;p&gt;For the eventual public demo, the plan is to be upfront about it — a small local model on a GPU-less VPS will be slow and occasionally wrong, and I'll say so, with an option to plug in your own API key if you want better answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real test: asking it about myself
&lt;/h2&gt;

&lt;p&gt;Once the loop worked mechanically, I needed a document with facts I could verify myself. So I wrote a short bio — what Kenning is, what stack it uses, what I'm into outside of code, including one line: &lt;em&gt;"He enjoys mechanical keyboards and is interested in BMW cars."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 1: "What embedding model does this project use?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is one unambiguous sentence in the document: &lt;em&gt;nomic-embed-text for embeddings, llama3.2:3b for chat&lt;/em&gt;. The chunk had actually been retrieved — the full document came back in &lt;code&gt;sources&lt;/code&gt; — so this wasn't a missing-context problem. Here's what the model actually answered:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I don't know which embedding model this project specifically uses. The text mentions that Ollama provides two models: nomic-embed-text for generating embeddings, but it does not specify which one is used by Kenning Project."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Read that twice. The model opens by saying it doesn't know, then in the very next sentence names the correct embedding model — &lt;code&gt;nomic-embed-text for generating embeddings&lt;/code&gt; — and then closes by claiming the project doesn't specify which one it uses. It has the right answer sitting in its own explanation and still doesn't commit to it.&lt;/p&gt;

&lt;p&gt;My read on it: that original sentence names two models in one clause, each bound to a different job ("X for embeddings, Y for chat"), and a 3B model running on CPU can apparently retrieve the right name but not lock it in as a confident final answer. I haven't tested this side by side against a bigger model yet, but based on what I'm seeing, I'd assume something like an 8B+ model would commit to the answer instead of second-guessing itself — that's the next experiment, not something I've actually confirmed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 2: "What car brand do I like?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zero source matches. No chunk even cleared the similarity threshold (&lt;code&gt;0.5&lt;/code&gt;) to make it into the prompt. Lowering the threshold to &lt;code&gt;0.3&lt;/code&gt; surfaced it — at an actual similarity score of &lt;code&gt;0.46&lt;/code&gt;, just under the original cutoff.&lt;/p&gt;

&lt;p&gt;This looks like the same kind of failure as Question 1, but I don't think it is. My test document was short enough (2,063 characters) to stay as a single chunk. That one chunk covers Spring AI, Tika, async processing, OAuth2 plans and, almost as an afterthought in the last sentence, BMW. My working theory: embedding a chunk this mixed produces a vector that's diluted across five unrelated topics, so a focused query like "what car brand" scores lower against it than it would against a clean, topic-specific chunk. I haven't actually re-run it with smaller, topic-coherent chunks to confirm yet, but that's the fix I'd try next, and my guess is it would clear the threshold comfortably.&lt;/p&gt;
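
&lt;p&gt;To make the dilution idea concrete, here is a toy sketch in Java. It is not how &lt;code&gt;nomic-embed-text&lt;/code&gt; works: each topic is a made-up one-hot vector, a mixed chunk is modeled as the plain average of its topic vectors, and &lt;code&gt;DilutionToy&lt;/code&gt; and its numbers are assumptions for illustration only. Under those assumptions, the same query scores lower against a five-topic chunk than against a single-topic one, and can fall below a cutoff like &lt;code&gt;0.5&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class DilutionToy {
    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
        double normA = Math.sqrt(Arrays.stream(a).map(x -> x * x).sum());
        double normB = Math.sqrt(Arrays.stream(b).map(x -> x * x).sum());
        return dot / (normA * normB);
    }

    // Toy "embedding" of a chunk whose content is spread evenly over the given number of topics.
    static double[] mixedChunk(int topics) {
        double[] v = new double[topics];
        Arrays.fill(v, 1.0 / topics);
        return v;
    }

    public static void main(String[] args) {
        double[] carQuery = {0, 0, 0, 0, 1};      // query about cars only
        double[] focusedChunk = {0, 0, 0, 0, 1};  // chunk about cars only
        double[] diluted = mixedChunk(5);         // one chunk covering five topics

        double threshold = 0.5; // same value as the original similarity cutoff
        double focusedScore = cosine(carQuery, focusedChunk);
        double dilutedScore = cosine(carQuery, diluted);

        System.out.printf("focused: %.3f passes=%b%n", focusedScore, focusedScore >= threshold);
        System.out.printf("diluted: %.3f passes=%b%n", dilutedScore, dilutedScore >= threshold);
    }
}
```

&lt;p&gt;In this toy setup the diluted score comes out at 1/sqrt(5), roughly 0.447, which is under a 0.5 cutoff but above 0.3. That it lands near the 0.46 from the real run is a coincidence of the toy model, not evidence for the theory. Real embeddings are not one-hot, so the actual test is still re-chunking and re-running the query.&lt;/p&gt;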

&lt;p&gt;Two questions, two failures that looked identical from where I was sitting ("the bot doesn't know basic facts about me") but, as far as I can tell, have different causes underneath. One I'd expect a bigger model to fix. The other I'd expect better chunking to fix. Telling them apart — even just as working theories — was the most useful thing to come out of this phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Phase 1 lands
&lt;/h2&gt;

&lt;p&gt;A document goes in, a question goes in, an answer comes back with its source chunk attached — all on a fully local stack, no API costs, end to end. It's slow, it's occasionally wrong in the specific way small models are wrong, and the chunking is still naive. But the architecture holds up, and proving that was the actual point of this phase.&lt;/p&gt;

&lt;p&gt;If you've actually debugged RAG pipelines at this level, I'd genuinely like to know whether the two working theories above hold up — the attribute-binding explanation for the embedding-model question, and the chunk-dilution explanation for the BMW question. I'm reasoning from what I observed here, not from having traced either one to the bottom, so if you've got more experience with this and either guess is wrong, half right, or missing something obvious, I'd love to hear it.&lt;/p&gt;

&lt;p&gt;Phase 2 is auth (Google OAuth2), a real upload UI, and multi-document support per user. I'll write that one up once it's running.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>spring</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
