<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fernando</title>
    <description>The latest articles on DEV Community by Fernando (@fernando77).</description>
    <link>https://dev.to/fernando77</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3264027%2F80f01c2d-286b-420d-80e0-a0eb8f50477b.png</url>
      <title>DEV Community: Fernando</title>
      <link>https://dev.to/fernando77</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fernando77"/>
    <language>en</language>
    <item>
      <title>RAG Architecture with n8n + PostgreSQL (pgvector) + Ollama Gemma4 on AWS EC2</title>
      <dc:creator>Fernando</dc:creator>
      <pubDate>Fri, 22 May 2026 21:31:12 +0000</pubDate>
      <link>https://dev.to/fernando77/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2-1ke</link>
      <guid>https://dev.to/fernando77/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2-1ke</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;A Complete Enterprise AI Knowledge Retrieval Architecture for Private Document Intelligence&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
This article explains a Retrieval-Augmented Generation (RAG) architecture using n8n, PostgreSQL with pgvector, Ollama, and Gemma4 running on AWS EC2. The platform automatically ingests emails and PDFs, creates embeddings, stores vectors in PostgreSQL, and retrieves contextual information to generate AI answers grounded in enterprise data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content&lt;/strong&gt;&lt;br&gt;
You can view a video of this Gemma4 architecture here:&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=bTP-sNKlsxc" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=bTP-sNKlsxc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG architectures combine vector search with large language models. In this solution, n8n orchestrates ingestion workflows and query processing. Emails and PDF documents are read automatically, text is extracted and cleaned, then split into semantic chunks. The chunks are embedded using the nomic-embed-text model and stored in PostgreSQL pgvector. When users ask questions, the question is embedded and compared against stored vectors to retrieve the most relevant chunks. Gemma4 then generates a final response using retrieved context.&lt;br&gt;
The architecture uses two AWS EC2 instances. The first server hosts n8n, PostgreSQL, Docker, and orchestration services. The second server hosts Ollama, Gemma4, and the embedding model. This separation improves scalability and isolates AI workloads from orchestration tasks.&lt;br&gt;
Docker containers simplify deployment and maintenance. PostgreSQL with pgvector enables semantic similarity search directly inside the relational database. This architecture is modular and can evolve with future embedding models and LLM technologies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business Applications&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;1.Customer Support AI&lt;/em&gt;&lt;br&gt;
Support teams can query manuals, troubleshooting guides, and ticket histories using natural language to accelerate customer assistance.&lt;br&gt;
&lt;em&gt;2.Enterprise Knowledge Management&lt;/em&gt;&lt;br&gt;
Organizations can centralize contracts, policies, reports, and procedures into an intelligent AI search platform.&lt;br&gt;
Financial Analytics&lt;br&gt;
Executives can ask natural language questions about sales trends, ERP reports, invoices, and operational metrics.&lt;br&gt;
Technical Details&lt;br&gt;
Infrastructure Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 #1: t3.large or t3.xlarge with PostgreSQL, pgvector, Docker, and n8n.&lt;/li&gt;
&lt;li&gt;EC2 #2: GPU-enabled instance such as g6e.2xlarge for Ollama and Gemma4.&lt;/li&gt;
&lt;li&gt;Ubuntu 22.04 recommended.&lt;/li&gt;
&lt;li&gt;SSD storage and private VPC networking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Implementation Details:&lt;/em&gt;&lt;br&gt;
n8n automates email ingestion, PDF extraction, chunking, embedding generation, and vector storage. Chunk sizes around 1000 characters with overlap improve semantic retrieval. PostgreSQL pgvector performs cosine similarity searches. Gemma4 receives contextual prompts generated from retrieved chunks.&lt;br&gt;
Security and Networking:&lt;br&gt;
Use HTTPS with reverse proxies, encrypted EBS volumes, private networking between EC2 instances, and restricted security groups to protect sensitive enterprise data.&lt;br&gt;
Estimated Costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 #1 daily cost: approximately USD $2–$3.&lt;/li&gt;
&lt;li&gt;EC2 #1 monthly cost: approximately USD $60–$90.&lt;/li&gt;
&lt;li&gt;EC2 #2 GPU server daily cost: approximately USD $25–$40.&lt;/li&gt;
&lt;li&gt;EC2 #2 monthly cost: approximately USD $750–$1,200.&lt;/li&gt;
&lt;li&gt;Total monthly infrastructure: approximately USD $850–$1,400 depending on workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firlyovfhd822dwnpm52k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firlyovfhd822dwnpm52k.png" alt="Figure 1: Architecture of two AWS EC2, one running gemma4 and the other N8N" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Figure 1: Architecture of two AWS EC2, one running gemma4 and the other N8N&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;COMPLETE STEP-BY-STEP FLOW&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;STEP 1 — User Sends Data Into the System&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;According to the infographic:&lt;/p&gt;

&lt;p&gt;Emails with PDFs are read on EC2 #1.&lt;/p&gt;

&lt;p&gt;This is the beginning of the Ingestion Workflow.&lt;/p&gt;

&lt;p&gt;The company may receive:&lt;/p&gt;

&lt;p&gt;invoices&lt;br&gt;
manuals&lt;br&gt;
contracts&lt;br&gt;
reports&lt;br&gt;
customer emails&lt;br&gt;
technical documentation&lt;br&gt;
support tickets&lt;/p&gt;

&lt;p&gt;n8n automatically monitors:&lt;/p&gt;

&lt;p&gt;IMAP mailboxes&lt;br&gt;
folders&lt;br&gt;
APIs&lt;br&gt;
SharePoint&lt;br&gt;
Google Drive&lt;br&gt;
CRMs&lt;br&gt;
ERPs&lt;br&gt;
What n8n Does&lt;/p&gt;

&lt;p&gt;n8n acts as the automation orchestrator.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;New email arrives&lt;br&gt;
n8n detects it&lt;br&gt;
Downloads PDF attachment&lt;br&gt;
Starts the AI pipeline automatically&lt;/p&gt;

&lt;p&gt;No human intervention is required.&lt;/p&gt;

&lt;p&gt;*&lt;em&gt;&lt;em&gt;STEP 2 — PDF Text Extraction&lt;/em&gt;&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Extract PDF Text&lt;/p&gt;

&lt;p&gt;At this stage:&lt;/p&gt;

&lt;p&gt;PDFs are parsed&lt;br&gt;
text is extracted&lt;br&gt;
metadata is collected&lt;/p&gt;

&lt;p&gt;Metadata may include:&lt;/p&gt;

&lt;p&gt;sender&lt;br&gt;
date&lt;br&gt;
filename&lt;br&gt;
document type&lt;br&gt;
department&lt;br&gt;
customer ID&lt;br&gt;
Why This Matters&lt;/p&gt;

&lt;p&gt;LLMs cannot directly understand PDFs.&lt;/p&gt;

&lt;p&gt;The system must convert documents into raw text before AI processing.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;A 200-page manual becomes machine-readable text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;_STEP 3 — Clean and Normalize Text&lt;/strong&gt;&lt;br&gt;
_&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Clean &amp;amp; Normalize Text&lt;/p&gt;

&lt;p&gt;Raw PDF extraction is usually messy.&lt;/p&gt;

&lt;p&gt;Problems include:&lt;/p&gt;

&lt;p&gt;broken lines&lt;br&gt;
duplicated spaces&lt;br&gt;
headers/footers&lt;br&gt;
page numbers&lt;br&gt;
encoding problems&lt;br&gt;
tables split incorrectly&lt;/p&gt;

&lt;p&gt;n8n cleans the content using scripts or functions.&lt;/p&gt;

&lt;p&gt;Example&lt;/p&gt;

&lt;p&gt;Before cleaning:&lt;/p&gt;

&lt;p&gt;Invoice      #2939&lt;/p&gt;

&lt;p&gt;Customer:&lt;br&gt;
ACME Corp&lt;/p&gt;

&lt;p&gt;Page 1&lt;/p&gt;

&lt;p&gt;After cleaning:&lt;/p&gt;

&lt;p&gt;Invoice #2939 Customer: ACME Corp&lt;br&gt;
&lt;strong&gt;&lt;em&gt;STEP 4 — Chunking the Text&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The infographic shows:&lt;/p&gt;

&lt;p&gt;Chunk Text&lt;/p&gt;

&lt;p&gt;This is one of the MOST important steps in RAG.&lt;/p&gt;

&lt;p&gt;Why Chunking Is Necessary&lt;/p&gt;

&lt;p&gt;LLMs have token limits.&lt;/p&gt;

&lt;p&gt;A 500-page document cannot be sent entirely to the model.&lt;/p&gt;

&lt;p&gt;So the document is split into smaller pieces called:&lt;/p&gt;

&lt;p&gt;Chunks&lt;/p&gt;

&lt;p&gt;Example chunk size:&lt;/p&gt;

&lt;p&gt;1000 characters&lt;br&gt;
200 overlap&lt;br&gt;
What Overlap Means&lt;/p&gt;

&lt;p&gt;Suppose chunk #1 ends with:&lt;/p&gt;

&lt;p&gt;The warranty expires after...&lt;/p&gt;

&lt;p&gt;Chunk #2 begins with:&lt;/p&gt;

&lt;p&gt;...after 24 months of operation.&lt;/p&gt;

&lt;p&gt;Overlap preserves semantic continuity.&lt;/p&gt;

&lt;p&gt;Without overlap:&lt;/p&gt;

&lt;p&gt;information can be lost&lt;br&gt;
meaning breaks between chunks&lt;br&gt;
&lt;strong&gt;&lt;em&gt;STEP 5 — Create Embeddings&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The infographic shows:&lt;/p&gt;

&lt;p&gt;Prepare Embedding Payload&lt;br&gt;
Call Embedding Server&lt;/p&gt;

&lt;p&gt;This is where semantic AI begins.&lt;/p&gt;

&lt;p&gt;What Is an Embedding?&lt;/p&gt;

&lt;p&gt;An embedding converts text into mathematical vectors.&lt;/p&gt;

&lt;p&gt;The embedding model understands meaning.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"car"&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;"vehicle"&lt;/p&gt;

&lt;p&gt;generate similar vectors.&lt;/p&gt;

&lt;p&gt;Embedding Process&lt;/p&gt;

&lt;p&gt;Chunk text is sent from EC2 #1 to EC2 #2.&lt;/p&gt;

&lt;p&gt;The embedding server uses:&lt;/p&gt;

&lt;p&gt;nomic-embed-text&lt;/p&gt;

&lt;p&gt;to transform text into vectors.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;[0.023, -0.991, 0.224, ...]&lt;/p&gt;

&lt;p&gt;These vectors may contain:&lt;/p&gt;

&lt;p&gt;768 dimensions&lt;br&gt;
1024 dimensions&lt;br&gt;
1536 dimensions&lt;/p&gt;

&lt;p&gt;depending on the model.&lt;/p&gt;

&lt;p&gt;Why Embeddings Are Powerful&lt;/p&gt;

&lt;p&gt;Traditional search uses keywords.&lt;/p&gt;

&lt;p&gt;Embeddings use:&lt;/p&gt;

&lt;p&gt;Semantic Meaning&lt;/p&gt;

&lt;p&gt;This means users can ask:&lt;/p&gt;

&lt;p&gt;How long is the warranty?&lt;/p&gt;

&lt;p&gt;Even if the document says:&lt;/p&gt;

&lt;p&gt;Coverage remains valid for 24 months.&lt;/p&gt;

&lt;p&gt;The system still finds the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;_STEP 6 — Store Embeddings in PostgreSQL pgvector&lt;/strong&gt;&lt;br&gt;
_&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Store Embeddings in PostgreSQL (pgvector)&lt;/p&gt;

&lt;p&gt;Now the vectors are saved in PostgreSQL.&lt;/p&gt;

&lt;p&gt;What Is pgvector?&lt;/p&gt;

&lt;p&gt;pgvector is an extension for PostgreSQL that adds:&lt;/p&gt;

&lt;p&gt;vector storage&lt;br&gt;
similarity search&lt;br&gt;
AI search capabilities&lt;/p&gt;

&lt;p&gt;Example table:&lt;/p&gt;

&lt;p&gt;id  chunk_text  embedding&lt;br&gt;
1   warranty info   [0.12, ...]&lt;br&gt;
Why PostgreSQL Is Used&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;p&gt;mature database&lt;br&gt;
ACID compliance&lt;br&gt;
reliability&lt;br&gt;
backups&lt;br&gt;
indexing&lt;br&gt;
SQL support&lt;br&gt;
enterprise-ready&lt;/p&gt;

&lt;p&gt;Instead of needing a separate vector DB like Pinecone or Weaviate, pgvector keeps everything inside PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;_STEP 7 — User Asks a Question&lt;br&gt;
_&lt;/strong&gt;&lt;br&gt;
The infographic says:&lt;/p&gt;

&lt;p&gt;User sends question (HTTPS POST /ask)&lt;/p&gt;

&lt;p&gt;A user opens the web interface and types:&lt;/p&gt;

&lt;p&gt;What is the warranty period for industrial pumps?&lt;/p&gt;

&lt;p&gt;The question goes to EC2 #1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;STEP 8 — Create Embedding for the Question&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The infographic shows:&lt;/p&gt;

&lt;p&gt;Create Embedding for Question&lt;/p&gt;

&lt;p&gt;The question itself is transformed into a vector using the SAME embedding model.&lt;/p&gt;

&lt;p&gt;This is critical.&lt;/p&gt;

&lt;p&gt;If documents and questions use different embedding models:&lt;/p&gt;

&lt;p&gt;similarity breaks&lt;br&gt;
retrieval quality drops&lt;br&gt;
*&lt;em&gt;&lt;em&gt;STEP 9 — Vector Similarity Search&lt;/em&gt;&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Vector Search in PostgreSQL&lt;/p&gt;

&lt;p&gt;This is the core of RAG.&lt;/p&gt;

&lt;p&gt;How Similarity Search Works&lt;/p&gt;

&lt;p&gt;The question vector is compared against ALL stored chunk vectors.&lt;/p&gt;

&lt;p&gt;Using:&lt;/p&gt;

&lt;p&gt;cosine similarity&lt;br&gt;
Euclidean distance&lt;br&gt;
inner product&lt;/p&gt;

&lt;p&gt;PostgreSQL finds the chunks mathematically closest in meaning.&lt;/p&gt;

&lt;p&gt;Example&lt;/p&gt;

&lt;p&gt;User asks:&lt;/p&gt;

&lt;p&gt;How many vacation days do employees receive?&lt;/p&gt;

&lt;p&gt;The database may retrieve:&lt;/p&gt;

&lt;p&gt;Employees are entitled to 15 annual leave days.&lt;/p&gt;

&lt;p&gt;even without keyword matching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;STEP 10 — Build Context&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The infographic shows:&lt;/p&gt;

&lt;p&gt;Build Context&lt;/p&gt;

&lt;p&gt;The best matching chunks are combined together.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Chunk 1: vacation policy&lt;br&gt;
Chunk 2: HR policy&lt;br&gt;
Chunk 3: employment handbook&lt;/p&gt;

&lt;p&gt;The system assembles them into context.&lt;/p&gt;

&lt;p&gt;Why Context Is Critical&lt;/p&gt;

&lt;p&gt;LLMs hallucinate when lacking information.&lt;/p&gt;

&lt;p&gt;RAG prevents hallucinations by giving:&lt;/p&gt;

&lt;p&gt;Relevant Ground Truth Data&lt;/p&gt;

&lt;p&gt;The model answers from company knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;_STEP 11 — Build Prompt for Gemma4&lt;br&gt;
_&lt;/strong&gt;&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Build Prompt for Gemma4&lt;/p&gt;

&lt;p&gt;A structured prompt is generated.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;You are an enterprise assistant.&lt;/p&gt;

&lt;p&gt;Answer ONLY using the provided context.&lt;/p&gt;

&lt;p&gt;Context:&lt;br&gt;
[retrieved chunks]&lt;/p&gt;

&lt;p&gt;Question:&lt;br&gt;
How many vacation days do employees receive?&lt;/p&gt;

&lt;p&gt;This prompt engineering layer is extremely important.&lt;/p&gt;

&lt;p&gt;*&lt;em&gt;&lt;em&gt;STEP 12 — Send to Gemma4 via Ollama&lt;/em&gt;&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
The infographic shows:&lt;/p&gt;

&lt;p&gt;Send to Gemma4 (EC2 #2)&lt;/p&gt;

&lt;p&gt;The prompt is sent to Ollama.&lt;/p&gt;

&lt;p&gt;Ollama exposes APIs like:&lt;/p&gt;

&lt;p&gt;/v1/chat/completions&lt;/p&gt;

&lt;p&gt;Gemma4 processes:&lt;/p&gt;

&lt;p&gt;context&lt;br&gt;
instructions&lt;br&gt;
user question&lt;/p&gt;

&lt;p&gt;Then generates the final response.&lt;/p&gt;

&lt;p&gt;Why Ollama Is Important&lt;/p&gt;

&lt;p&gt;Ollama simplifies:&lt;/p&gt;

&lt;p&gt;local LLM serving&lt;br&gt;
model management&lt;br&gt;
GPU usage&lt;br&gt;
API exposure&lt;/p&gt;

&lt;p&gt;Without Ollama:&lt;/p&gt;

&lt;p&gt;running LLMs locally is much harder.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;STEP 13 — Return Answer to User&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The infographic ends with:&lt;/p&gt;

&lt;p&gt;Answer is returned to the user&lt;/p&gt;

&lt;p&gt;The final answer travels back:&lt;/p&gt;

&lt;p&gt;Gemma4 → EC2 #1 → Web Client&lt;/p&gt;

&lt;p&gt;The user receives a grounded response.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Employees receive 15 vacation days annually after completing one year of employment.&lt;br&gt;
Why This Architecture Is Powerful&lt;/p&gt;

&lt;p&gt;This architecture creates:&lt;/p&gt;

&lt;p&gt;Private AI&lt;/p&gt;

&lt;p&gt;Data stays inside AWS infrastructure.&lt;/p&gt;

&lt;p&gt;Semantic Search&lt;/p&gt;

&lt;p&gt;Searches by meaning, not keywords.&lt;/p&gt;

&lt;p&gt;Scalable AI&lt;/p&gt;

&lt;p&gt;You can scale:&lt;/p&gt;

&lt;p&gt;database&lt;br&gt;
workflows&lt;br&gt;
LLM server&lt;br&gt;
embedding server&lt;/p&gt;

&lt;p&gt;independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Conclusions&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This RAG architecture demonstrates how organizations can build private and scalable AI systems using open-source technologies. The combination of n8n, PostgreSQL pgvector, Ollama, and Gemma4 enables intelligent retrieval of enterprise knowledge while maintaining full infrastructure control. The modular design supports future scalability, model upgrades, and advanced AI workflows.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
