<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mario García</title>
    <description>The latest articles on DEV Community by Mario García (@mattdark).</description>
    <link>https://dev.to/mattdark</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F126701%2F0c3e8efe-0b8f-440f-a3f8-f9f5f738ab31.jpg</url>
      <title>DEV Community: Mario García</title>
      <link>https://dev.to/mattdark</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattdark"/>
    <language>en</language>
    <item>
      <title>Anatomy of a RAG System Architecture</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Tue, 17 Mar 2026 21:39:59 +0000</pubDate>
      <link>https://dev.to/letstalkoss/anatomy-of-a-rag-system-architecture-1l96</link>
      <guid>https://dev.to/letstalkoss/anatomy-of-a-rag-system-architecture-1l96</guid>
      <description>&lt;p&gt;For deploying a RAG System Architecture, consider that in a production environment requirements may vary when choosing a vector database, amount of data to be ingested, models used for creating embeddings, and architecture design when choosing a cloud platform. A RAG system can be built from scratch or implemented using solutions that already have the necessary components.&lt;/p&gt;

&lt;p&gt;Following best practices is also critical when designing the system, to avoid common issues such as hallucinations or data exposure. Also consider that the model may change over time, so a layered architecture can make future changes and upgrades easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a RAG System?
&lt;/h2&gt;

&lt;p&gt;Presenting false or inaccurate information when the answer is unknown, relying on unvalidated sources, and returning outdated data are common challenges for LLMs. How can these be solved while improving the knowledge base? Retrieval-Augmented Generation (RAG) is one approach to this problem.&lt;/p&gt;

&lt;p&gt;RAG uses methods and tools for acquiring knowledge from additional data sources. This information is converted into a format LLMs can work with and saved using database technologies that provide advanced search capabilities for retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG System Architecture Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Sources
&lt;/h3&gt;

&lt;p&gt;RAG systems can have a variety of data sources from which to learn or acquire information. These include, but are not limited to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documents in different formats (PDF, Word, TXT, Markdown)&lt;/li&gt;
&lt;li&gt;Images&lt;/li&gt;
&lt;li&gt;Audio or transcriptions&lt;/li&gt;
&lt;li&gt;Datasets (CSV, JSON)&lt;/li&gt;
&lt;li&gt;Application Programming Interfaces (APIs)&lt;/li&gt;
&lt;li&gt;Relational databases (SQL)&lt;/li&gt;
&lt;li&gt;NoSQL databases&lt;/li&gt;
&lt;li&gt;Wikis or knowledge bases&lt;/li&gt;
&lt;li&gt;Web content&lt;/li&gt;
&lt;li&gt;Internal files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data ingestion in a RAG system goes through a process where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content is extracted from the data sources&lt;/li&gt;
&lt;li&gt;The content is divided into chunks&lt;/li&gt;
&lt;li&gt;Each chunk is converted to an embedding&lt;/li&gt;
&lt;li&gt;The embeddings are stored in a vector database&lt;/li&gt;
&lt;/ul&gt;
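&lt;p&gt;As a rough illustration of the steps above, here is a minimal sketch in Python. The chunking logic is real; the &lt;code&gt;embed&lt;/code&gt; function and the store object are hypothetical placeholders for whichever embedding model and vector database you choose.&lt;/p&gt;

```python
# Minimal sketch of the ingestion pipeline described above.
# embed() and the store object are placeholders, not a real SDK.

def chunk_text(text, chunk_size=200, overlap=50):
    """Divide extracted content into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(chunk):
    # Placeholder: call your embedding model here (Gemini, OpenAI,
    # Sentence Transformers, ...) and return its list of floats.
    return [float(len(chunk))]

def ingest(document, store):
    """Extract -> chunk -> embed -> store, for one document."""
    for chunk in chunk_text(document):
        store.append((embed(chunk), chunk))  # stand-in for a vector DB insert
    return store
```

&lt;p&gt;In a real system, &lt;code&gt;store.append&lt;/code&gt; would be an insert into the vector database discussed below.&lt;/p&gt;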

&lt;h3&gt;
  
  
  Embedding Model
&lt;/h3&gt;

&lt;p&gt;An embedding model is a machine learning model used to transform data sources into vectors. These vectors, known as embeddings, are series of float values representing the meaning of the data. The model determines not only how embeddings are created but also the size of the vector, and such models are often provided by LLM vendors.&lt;/p&gt;

&lt;p&gt;For apps built with Python, you can use Gemini or OpenAI models through their official SDKs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Python library → &lt;a href="https://github.com/openai/openai-python" rel="noopener noreferrer"&gt;github.com/openai/openai-python&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google GenAI Python SDK → &lt;a href="https://github.com/googleapis/python-genai" rel="noopener noreferrer"&gt;github.com/googleapis/python-genai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another option is to use &lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, a framework for building agents and LLM-powered applications; requests to those models are then sent through the framework.&lt;/p&gt;

&lt;p&gt;LLMs can also be executed locally, without sending data to the cloud, by using Open Source tools like &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;. Models are downloaded from its own &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;catalog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A similar library is &lt;a href="https://sbert.net/" rel="noopener noreferrer"&gt;Sentence Transformers&lt;/a&gt;. It has access to the &lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; catalog from which models are downloaded to use locally, generally via &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;, without having to create an API key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector Database
&lt;/h3&gt;

&lt;p&gt;In a vector database, information is stored as vectors, better known as embeddings, containing numerical values that represent the meaning of the stored data. Query results are determined by the values in the vectors, returning the nearest vectors to the query.&lt;/p&gt;

&lt;p&gt;Take this text: &lt;code&gt;"Open source software is transforming the technology ecosystem."&lt;/code&gt;. When converted to an embedding, it looks like this: &lt;code&gt;[-0.007894928, 0.0010742444, -0.03274113, -0.066677086, 0.004894953, 0.013967406, 0.0068471087, 0.009695383, 0.020278132, 0.019470839]&lt;/code&gt;.&lt;/p&gt;
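&lt;p&gt;To make the idea of returning the nearest vectors concrete, here is a toy version of what a vector database does internally, using cosine similarity over plain Python lists (real databases rely on optimized index structures such as IVFFlat or HNSW):&lt;/p&gt;

```python
import math

# Toy nearest-vector search: what a vector database does conceptually.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors):
    """Return the index of the stored vector most similar to the query."""
    return max(range(len(vectors)),
               key=lambda i: cosine_similarity(query, vectors[i]))

# Three stored embeddings (2-dimensional for readability).
stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest([0.9, 0.1], stored))  # the first vector is closest
```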

&lt;p&gt;Which vector database should you use? It depends on the requirements of the project, your experience with vector databases, and how the RAG system is being implemented. If the system is built from the ground up, the best option is the one that serves the project's needs in terms of requirements and budget. When time plays a key role, choose solutions like &lt;a href="https://ragflow.io/" rel="noopener noreferrer"&gt;RAGFlow&lt;/a&gt; that already provide all the necessary components.&lt;/p&gt;

&lt;p&gt;A few options to consider when choosing a vector database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;&lt;strong&gt;pgvector&lt;/strong&gt;&lt;/a&gt;: An extension for PostgreSQL that adds the vector datatype and the search capabilities found in vector databases. The extension can be installed manually by compiling the source code from the official repository, deployed via Docker or Kubernetes, or used on cloud platforms that support PostgreSQL such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Supabase.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Pinecone&lt;/strong&gt;&lt;/a&gt;: A vector database designed for scalability and production use, run as a managed service on AWS, GCP, and Azure. Every request to Pinecone is sent through an API that routes it to a control plane (managing projects and indexes) or a data plane (reading and writing data).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://weaviate.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Weaviate&lt;/strong&gt;&lt;/a&gt;: An Open Source vector database. It can be configured via Weaviate Cloud, using a cloud provider like GCP to deploy an instance of the database, and uses an API key for authenticating requests. A Docker image is available for running it locally.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/elastic/elasticsearch" rel="noopener noreferrer"&gt;&lt;strong&gt;Elasticsearch&lt;/strong&gt;&lt;/a&gt;: A distributed search and analytics engine, scalable data store, and vector database focused on speed and production-grade scalability. It is also the component RAGFlow uses as its default vector database while Infinity is being tested and improved. Elasticsearch can be used in production through Elastic Cloud or configured manually on the cloud platform of choice; a Docker image is available for local use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Challenges in RAG System Architecture Deployment
&lt;/h2&gt;

&lt;p&gt;Ready to deploy the RAG system? Some challenges may come up when designing the architecture and preparing the system for a production environment.&lt;/p&gt;

&lt;p&gt;A few challenges to take into consideration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinations&lt;/strong&gt;: Occur when a model in a RAG system answers with unexpected information due to poor prompt engineering, irrelevant chunks being passed, or weak context selection. To mitigate this, use templates that tell the model how to answer when no information is found, and periodically validate the data in the knowledge base to check for outdated or wrong information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Exposure &amp;amp; Prompt Injection&lt;/strong&gt;: When adopting RAG systems, security issues can arise if the architecture is poorly designed, including exposure of sensitive data or the model being tricked into bypassing its rules by malicious prompts. Mitigate this by applying input/output sanitization and setting up guardrails.&lt;/li&gt;
&lt;/ul&gt;
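&lt;p&gt;As a starting point for input sanitization, a simple filter can reject prompts that match known injection phrases before they reach the model. This is a deliberately naive sketch; production guardrails use dedicated tooling and far more robust detection:&lt;/p&gt;

```python
# Naive input-sanitization sketch; the patterns below are illustrative,
# not an exhaustive or production-grade blocklist.
SUSPICIOUS_PATTERNS = [
    "ignore previous instructions",
    "reveal your system prompt",
    "disregard the rules",
]

def sanitize_prompt(prompt):
    """Reject prompts containing known injection phrases."""
    lowered = prompt.lower()
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern in lowered:
            raise ValueError("possible prompt injection detected")
    return prompt.strip()
```

&lt;p&gt;The same idea applies on the output side: validate what the model returns before showing it to the user.&lt;/p&gt;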

&lt;h2&gt;
  
  
  Best Practices for RAG Deployment
&lt;/h2&gt;

&lt;p&gt;A few recommendations to follow when designing and deploying the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decouple the Retrieval layer from the Generation layer&lt;/strong&gt;: Following a layered approach when designing the system helps reduce hallucinations, keeps information up to date, and lets upgrades be applied independently. The Retrieval layer is where data ingestion, chunk division, embedding conversion, and information storage happen, as well as query conversion into vectors for semantic search. The Generation layer is where the retrieved information and the original prompt are combined to produce a human-like response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model replaceability&lt;/strong&gt;: To prevent vendor lock-in and allow upgrades to better models, design the system so the embedding model can be replaced without critical changes or a rewrite of the implementation. How? Create a wrapper function to switch between models, use a framework like LangChain instead of individual SDKs, store metadata with the model ID and version, separate ingestion from retrieval, maintain a benchmarking dataset that is evaluated regularly to determine when to change models, and normalize data before embedding.&lt;/li&gt;
&lt;/ul&gt;
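&lt;p&gt;The wrapper-function idea can be sketched as a small registry: the application always calls &lt;code&gt;embed()&lt;/code&gt;, so switching models means registering a new backend rather than rewriting call sites. The backend below is a hypothetical stand-in for a real SDK call, and the metadata fields are illustrative:&lt;/p&gt;

```python
# Registry-based wrapper so the embedding model can be swapped without
# touching application code. The "toy" backend is a stand-in for a
# real SDK call (OpenAI, Gemini, Sentence Transformers, ...).
EMBEDDERS = {}

def register(name):
    def decorator(fn):
        EMBEDDERS[name] = fn
        return fn
    return decorator

@register("toy-v1")
def toy_embed(text):
    # Replace this body with a real model call in production.
    return [float(ord(c)) for c in text[:4]]

def embed(text, model="toy-v1"):
    """Embed text and attach model metadata for future migrations."""
    return {"model": model, "vector": EMBEDDERS[model](text)}
```

&lt;p&gt;Storing the model name next to each vector makes it possible to tell which embeddings need re-generating after a model upgrade.&lt;/p&gt;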

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Each component in a RAG system architecture is key to making the system work and to providing better answers by improving the knowledge base; the embedding models, vector databases, and cloud platforms you choose depend on the project's requirements and budget. Already familiar with PostgreSQL? Choose pgvector; otherwise, evaluate other solutions. Planning to build the system from scratch? Consider implementing it in Python, as there are frameworks and libraries available for the language. And don't forget to follow the recommendations above.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>PostgreSQL: First Approach to Vector Databases with pgvector and Python</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Sun, 15 Mar 2026 04:31:27 +0000</pubDate>
      <link>https://dev.to/letstalkoss/postgresql-first-approach-to-vector-databases-with-pgvector-and-python-20nm</link>
      <guid>https://dev.to/letstalkoss/postgresql-first-approach-to-vector-databases-with-pgvector-and-python-20nm</guid>
<description>&lt;p&gt;If you're already familiar with relational databases like PostgreSQL, you're one step closer to starting with vector databases and building AI applications. Through this tutorial you'll learn how to enable vector capabilities on your PostgreSQL instance using &lt;code&gt;pgvector&lt;/code&gt;, transform raw text into the required format using Python, and perform searches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set PostgreSQL as a Vector Database
&lt;/h2&gt;

&lt;p&gt;First of all, what is a vector database? In a vector database, information is stored as vectors—often referred to as embeddings. These contain numerical values that represent the meaning of the data, allowing LLMs to interpret and relate information. Query results are determined by the values within these vectors, returning the nearest vectors based on similarity.&lt;/p&gt;

&lt;p&gt;Imagine having a text like: 'Artificial Intelligence is transforming the way we process data in PostgreSQL.'. When converting this value using an embedding model, you'll get a high-dimensional vector that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[-0.015540238, 0.0014074693, 0.009978753, -0.07941696, -0.027072648, 0.02588584, 0.0045492477, 0.050993927, 0.019187931, 0.0050778543]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll discuss models and vectors in detail later. For now, let’s get PostgreSQL ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Installation
&lt;/h3&gt;

&lt;p&gt;Suppose your PostgreSQL instance was installed manually or via your Linux distribution's package manager. To install &lt;code&gt;pgvector&lt;/code&gt;, follow the instructions provided in the official &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;repository&lt;/a&gt;. Before starting, make sure &lt;code&gt;make&lt;/code&gt;, &lt;code&gt;clang&lt;/code&gt;, and &lt;code&gt;llvm&lt;/code&gt; are installed on your system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clone the repository:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /tmp
git clone &lt;span class="nt"&gt;--branch&lt;/span&gt; v0.8.2 https://github.com/pgvector/pgvector.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Compile the source code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;pgvector
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install the extension:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the extension is installed, enable it within your PostgreSQL instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log in to your instance:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres psql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Verify the extension was installed correctly:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;dx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                      List of installed extensions
  Name   | Version | Default version |   Schema   |                     Description                      
---------+---------+-----------------+------------+------------------------------------------------------
 plpgsql | 1.0     | 1.0             | pg_catalog | PL/pgSQL procedural language
 vector  | 0.8.2   | 0.8.2           | public     | vector data type and ivfflat and hnsw access methods

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set up a database to store movie synopses:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Select the database: &lt;code&gt;\c movies&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable the extension:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running a Docker Container
&lt;/h3&gt;

&lt;p&gt;If you prefer to deploy your PostgreSQL instance using Docker, just run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; postgres &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;password &lt;span class="nt"&gt;-p&lt;/span&gt; 5432:5432 pgvector/pgvector:pg17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;password&lt;/code&gt; with your desired password.&lt;/p&gt;

&lt;p&gt;The image used for creating the container is the official Docker image of &lt;code&gt;pgvector&lt;/code&gt; that also includes the PostgreSQL server.&lt;/p&gt;

&lt;p&gt;Log in to the PostgreSQL instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; postgres psql &lt;span class="nt"&gt;-U&lt;/span&gt; postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up a database to store movie synopses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Select the database: &lt;code&gt;\c movies&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Enable the extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Embedding Models
&lt;/h2&gt;

&lt;p&gt;An embedding model is a machine learning model used to transform data sources into vectors.&lt;/p&gt;

&lt;p&gt;The model determines not only how embeddings are created but also the dimensionality of the vector. While these are often provided by LLMs, they can also be specialized models designed solely for embedding tasks.&lt;/p&gt;

&lt;p&gt;For apps built with Python, you can utilize Gemini or OpenAI models through their official SDKs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Python library → &lt;a href="https://github.com/openai/openai-python" rel="noopener noreferrer"&gt;https://github.com/openai/openai-python&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google GenAI Python SDK → &lt;a href="https://github.com/googleapis/python-genai" rel="noopener noreferrer"&gt;https://github.com/googleapis/python-genai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another option is to use &lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, a framework for building agents, and LLM-powered applications, which can facilitate requests to these models.&lt;/p&gt;

&lt;p&gt;LLMs can also be executed locally without sending data to the cloud. This is possible using Open Source tools like &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, where models are downloaded from its own &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A similar and highly popular library is &lt;a href="https://github.com/huggingface/sentence-transformers" rel="noopener noreferrer"&gt;Sentence Transformers&lt;/a&gt;. It provides access to the &lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; catalog, allowing you to download models to run locally, typically via &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;, without the need for an API key.&lt;/p&gt;

&lt;p&gt;Here's a comparison table of the most used embedding models, including name, provider, and dimensionality. Dimensionality refers to the size of the vector created by the model.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Name&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Dimensionality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;text-embedding-3-large&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;text-embedding-3-small&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;text-embedding-ada-002&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gemini-embedding-001&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Transforming Data into Vectors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Gemini
&lt;/h3&gt;

&lt;p&gt;If you're planning to use the model provided by Google, create an API key first. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the API Keys page in the &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click on &lt;code&gt;Create API key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Assign a name to the key&lt;/li&gt;
&lt;li&gt;Choose a project or create a new one&lt;/li&gt;
&lt;li&gt;Click on &lt;code&gt;Create key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Click on the created API key to see the details&lt;/li&gt;
&lt;li&gt;Copy the API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now set the environment variable for the API key by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'YOUR_API_KEY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;YOUR_API_KEY&lt;/code&gt; with the value of the key you previously created.&lt;/p&gt;

&lt;p&gt;Install the Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;google-genai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suppose you want to transform the text "A futuristic journey through the stars." into a vector. Here's how you can use the &lt;code&gt;gemini-embedding-001&lt;/code&gt; model from a Python script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A futuristic journey through the stars.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-embedding-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dimension: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vector preview: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above script executes these tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initialize the client&lt;/li&gt;
&lt;li&gt;Define the text to transform and the model to be used&lt;/li&gt;
&lt;li&gt;Generate the embedding&lt;/li&gt;
&lt;li&gt;Extract the vector, print the values, and the vector size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python get_embeddings.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get this output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dimension: 3072
Vector preview: [-0.003496697, 0.004707519, 0.02058491, -0.0735851, 0.0041175582]...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the first five dimensions are shown.&lt;/p&gt;
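&lt;p&gt;With a vector in hand, the next step is storing it in the &lt;code&gt;movies&lt;/code&gt; database created earlier. Below is a hedged sketch of an insert helper written against a psycopg2-style connection; the &lt;code&gt;synopses&lt;/code&gt; table and its columns are assumptions for illustration (e.g. &lt;code&gt;CREATE TABLE synopses (id serial PRIMARY KEY, text text, embedding vector(3072));&lt;/code&gt;, matching the model's dimensionality), and the embedding is passed as a pgvector string literal:&lt;/p&gt;

```python
# Hedged sketch: insert synopses plus their embeddings into a pgvector
# table. Works with any DB-API connection (e.g. psycopg2); the table
# name and columns are assumptions for illustration.
def store_embeddings(conn, synopses, embeddings):
    """Insert each synopsis with its embedding as a pgvector literal."""
    cur = conn.cursor()
    for text, emb in zip(synopses, embeddings):
        # pgvector accepts vectors written as "[v1,v2,...]" strings.
        literal = "[" + ",".join(str(float(x)) for x in emb) + "]"
        cur.execute(
            "INSERT INTO synopses (text, embedding) VALUES (%s, %s)",
            (text, literal),
        )
    conn.commit()
```

&lt;p&gt;The same helper works regardless of which embedding model produced the vectors, as long as the column dimensionality matches.&lt;/p&gt;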

&lt;h3&gt;
  
  
  Sentence Transformers
&lt;/h3&gt;

&lt;p&gt;You can run the models locally via Sentence Transformers without the need of an API key.&lt;/p&gt;

&lt;p&gt;Install the Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choose a model from the &lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; catalog. Here's a Python script that uses &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;, whose dimensionality is &lt;code&gt;384&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A futuristic journey through the stars and the mysteries of the universe.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A lighthearted story about two strangers falling in love in New York City.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A gritty detective story set in a neon-lit city ruled by artificial intelligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Movie &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; dimension: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vector preview: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script above performs the following tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define a list of texts to transform&lt;/li&gt;
&lt;li&gt;Load the model&lt;/li&gt;
&lt;li&gt;Generate the embeddings&lt;/li&gt;
&lt;li&gt;Print each vector's dimension and a preview of its values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python get_embeddings.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Movie 1 dimension: 384
Vector preview: [-0.04579255  0.01413548 -0.01935582]...
Movie 2 dimension: 384
Vector preview: [ 0.02027559 -0.03948853  0.06786963]...
Movie 3 dimension: 384
Vector preview: [-0.01461564  0.01758054  0.00982607]...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
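&lt;p&gt;What makes these vectors useful is comparing them: semantically similar texts produce embeddings that point in similar directions, which is measured with cosine similarity (pgvector computes the same metric server-side). As an illustrative, stdlib-only sketch of the math, not part of the original scripts:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Close to 1.0 for similar directions, near 0.0 for unrelated ones.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

&lt;p&gt;In practice you would pass it two of the 384-dimensional embeddings printed above; synopses with related themes score closer to 1.0.&lt;/p&gt;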



&lt;h2&gt;
  
  
  Storing Generated Vectors in Your Database
&lt;/h2&gt;

&lt;p&gt;Create the &lt;code&gt;synopses&lt;/code&gt; table in the movies database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;synopses&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: The dimensionality defined for your VECTOR column must match exactly the output size of the embedding model you are using. For example, if you set VECTOR(3072) for a Gemini model, PostgreSQL will reject any attempts to insert embeddings from a model like all-MiniLM-L6-v2, which outputs 384 dimensions. If you decide to switch models in the future, you will need to either recreate the table or alter the column definition.&lt;/p&gt;
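&lt;p&gt;A cheap safeguard is to validate the embedding length in Python before inserting, so a model switch fails with a clear error message instead of a database rejection. A minimal sketch (the helper name is mine, not from the article):&lt;/p&gt;

```python
def check_dimensions(embedding, expected_dim):
    """Raise early if an embedding won't fit a VECTOR(expected_dim) column."""
    actual = len(embedding)
    if actual != expected_dim:
        raise ValueError(
            f"Embedding has {actual} dimensions, but the column "
            f"was declared as VECTOR({expected_dim})"
        )
```

&lt;p&gt;Calling &lt;code&gt;check_dimensions(vector, 3072)&lt;/code&gt; before each insert keeps the mismatch visible at the application layer.&lt;/p&gt;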

&lt;p&gt;Install required Python libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;psycopg2 pgvector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the Gemini script, updated to transform the list of movie synopses and insert the resulting vectors into the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pgvector.psycopg2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;register_vector&lt;/span&gt;

&lt;span class="n"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A futuristic journey through the stars and the mysteries of the universe.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A lighthearted story about two strangers falling in love in New York City.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A gritty detective story set in a neon-lit city ruled by artificial intelligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-embedding-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname=movies user=postgres password=secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;register_vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO synopses (content, embedding) VALUES (%s, %s)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vectors from Gemini successfully stored!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget to set the environment variable for your Gemini API key, and replace the &lt;code&gt;password&lt;/code&gt; value in the connection string with your own.&lt;/p&gt;

&lt;p&gt;Run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python vectors-to-postgresql.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script above performs the following tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define a list of texts to transform&lt;/li&gt;
&lt;li&gt;Initialize the client&lt;/li&gt;
&lt;li&gt;Generate the embeddings&lt;/li&gt;
&lt;li&gt;Establish a connection to the database&lt;/li&gt;
&lt;li&gt;Insert each synopsis with its corresponding vector&lt;/li&gt;
&lt;li&gt;Close the connection&lt;/li&gt;
&lt;/ul&gt;
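&lt;p&gt;With the vectors stored, the natural next step is a similarity search. The sketch below assumes pgvector's &lt;code&gt;cosine_distance&lt;/code&gt; SQL function (the function form of its distance operator) and takes an already-open psycopg2 cursor; the &lt;code&gt;find_similar&lt;/code&gt; helper is my own, not part of the article's scripts:&lt;/p&gt;

```python
def find_similar(cur, query_embedding, limit=3):
    """Return the synopses closest to query_embedding, nearest first.

    Assumes cur is an open psycopg2 cursor on a database with the
    pgvector extension enabled and register_vector() already called.
    """
    cur.execute(
        "SELECT content, cosine_distance(embedding, %s) AS distance "
        "FROM synopses ORDER BY distance LIMIT %s",
        (query_embedding, limit),
    )
    return cur.fetchall()
```

&lt;p&gt;The query embedding must come from the same model used at ingestion time, otherwise the distances are meaningless.&lt;/p&gt;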

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You've got your PostgreSQL instance up and running as a vector database. You’ve learned how to enable pgvector, how to transform raw text into embeddings using both cloud-based tools like Gemini and local models like Sentence Transformers, and how to store those vectors so your database can finally manage more than just plain text. You’ve built the foundation for an AI-powered app—now you’re all set to start experimenting with the data you’ve stored!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>postgres</category>
    </item>
    <item>
      <title>AI-Assisted Python: Refactoring and Reviewing with Copilot</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Tue, 10 Mar 2026 22:44:53 +0000</pubDate>
      <link>https://dev.to/letstalkoss/ai-assisted-python-refactoring-and-reviewing-with-copilot-4ejh</link>
      <guid>https://dev.to/letstalkoss/ai-assisted-python-refactoring-and-reviewing-with-copilot-4ejh</guid>
      <description>&lt;p&gt;How much time do you spend debugging, implementing features, or refactoring code? Sometimes, these tasks consume hours. While pair programming often optimizes the process, you may find yourself working on a solo project. In the era of AI, you can integrate tools directly into your favorite editor and have a constant collaborator to help you improve your development workflow.&lt;/p&gt;

&lt;p&gt;General-purpose LLMs like ChatGPT or Gemini can help you with these tasks. However, nothing compares to using a tool without having to leave your editor. In Visual Studio Code, you can use Copilot by installing the extension and logging in with your GitHub account. Let's see how it works.&lt;/p&gt;

&lt;p&gt;To demonstrate, I selected a collection of Python scripts designed to generate test data for database schemas, which you can find in &lt;a href="https://github.com/mattdark/data-generator" rel="noopener noreferrer"&gt;this repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Review
&lt;/h2&gt;

&lt;p&gt;The following code block is from the &lt;code&gt;sql.py&lt;/code&gt; script that generates data for MySQL and PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    num_cores = cpu_count() - 1
    with Pool() as pool:
        data = pd.concat(pool.map(create_dataframe, range(num_cores)))

    data.to_sql(name='employees', con=engine, if_exists='append', index=False, dtype=schema)
    with engine.connect() as conn:
        conn.execute("ALTER TABLE employees ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;")
        # conn.execute("ALTER TABLE employees ADD COLUMN id SERIAL PRIMARY KEY;")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I asked Copilot to perform a code review. By selecting the code block, right-clicking, and choosing Generate Code &amp;gt; Review, the tool analyzed the logic and provided seven suggestions to improve the script's quality.&lt;/p&gt;

&lt;p&gt;Copilot made the following suggestions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyffothso9b2erikwgyza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyffothso9b2erikwgyza.png" alt="Copilot - Code Review" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Process Safety: It warned that &lt;code&gt;cpu_count() - 1&lt;/code&gt; could result in zero cores on single-core systems, causing the multiprocessing pool to fail or hang.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database Compatibility: It identified that the &lt;code&gt;ALTER TABLE&lt;/code&gt; statement was MySQL-specific and would fail on other backends like PostgreSQL or SQLite.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security: To prevent SQL injection, it recommended wrapping raw SQL strings with the &lt;code&gt;sqlalchemy.text()&lt;/code&gt; function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance: For large datasets, it suggested using the chunksize parameter in to_sql() to ensure efficient batching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Input Validation: It noted that pd.concat would fail if create_dataframe didn't return a valid DataFrame for every call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explicit Control: It recommended passing the processes argument explicitly to the Pool for better clarity and execution control.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation: It suggested documenting the purpose of the schema argument, especially when imported from custom modules.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After reviewing the suggestions, I manually implemented the changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    num_cores = cpu_count() if cpu_count() &amp;lt;= 2 else cpu_count() - 1

    with Pool(processes=num_cores) as pool:
        data = pd.concat(pool.map(create_dataframe, range(num_cores)))

    data.to_sql(name='employees', con=engine, if_exists='append', index=False, dtype=schema)

    with engine.begin() as conn:
        conn.execute(text("ALTER TABLE employees ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;"))
        # conn.execute("ALTER TABLE employees ADD COLUMN id SERIAL PRIMARY KEY;")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Refactoring
&lt;/h2&gt;

&lt;p&gt;Within the &lt;code&gt;modules/&lt;/code&gt; directory, you will find the core logic of the generator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;base.py&lt;/code&gt;: Handles the database connection setup and session management for both MySQL and PostgreSQL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;dataframe.py&lt;/code&gt;: Contains the logic to generate synthetic data and store it temporarily in a pandas DataFrame before ingestion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;schema.py&lt;/code&gt;: Defines the database schema and data types required for cross-database compatibility.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I opened the &lt;code&gt;base.py&lt;/code&gt; script and used Copilot Chat to refactor the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("mysql+pymysql://user:password@localhost/company")
#engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/company")
Session = sessionmaker(bind=engine)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I asked the tool to enable dynamic database connection selection for all the technologies supported by this project. The prompt used was the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Refactor my database logic into base.py. Create a function get_client() that returns either a SQLAlchemy Session (for MySQL/Postgres) or a PyMongo MongoClient (for MongoDB) based on the DB_TYPE environment variable. Ensure the function handles the connection logic for all three databases using credentials from os.getenv. Use PyMongo instead of Motor to keep the workflow synchronous. Follow PEP 8 strictly and include type hints to distinguish between the SQL session and the NoSQL client.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After analyzing the prompt, Copilot will review the code and suggest changes that you can apply directly from the chat interface, as shown in the following image. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhycuhk9065fgxj32mwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhycuhk9065fgxj32mwy.png" alt="Copilot - Refactoring" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After applying the changes, this is how the script looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def _build_sql_url(db_type: str) -&amp;gt; str:
    """Build a SQLAlchemy URL for MySQL or PostgreSQL."""
    user = os.getenv("DB_USER", "")
    password = os.getenv("DB_PASSWORD", "")
    host = os.getenv("DB_HOST", "localhost")
    port = os.getenv("DB_PORT")
    db_name = os.getenv("DB_NAME", "")

    if not db_name:
        raise ValueError("DB_NAME environment variable must be set")

    if db_type == "mysql":
        driver = "mysql+pymysql"
        port = port or "3306"
    elif db_type == "postgres":
        driver = "postgresql+psycopg2"
        port = port or "5432"
    else:
        raise ValueError(f"Unsupported SQL DB_TYPE: {db_type!r}")

    return f"{driver}://{user}:{password}@{host}:{port}/{db_name}"


def _build_mongo_uri() -&amp;gt; str:
    """Return a MongoDB URI using authSource=admin."""
    user = os.getenv("DB_USER", "")
    password = os.getenv("DB_PASSWORD", "")
    host = os.getenv("DB_HOST", "localhost")
    port = os.getenv("DB_PORT", "27017")
    db_name = os.getenv("DB_NAME", "")

    if not db_name:
        raise ValueError("DB_NAME environment variable must be set")

    return (
        f"mongodb://{user}:{password}@{host}:{port}/{db_name}"
        "?authSource=admin"
    )


def get_client() -&amp;gt; tuple[str, Union[Engine, MongoClient]]:
    """Return a (db_type, client) pair based on DB_TYPE.

    For 'mysql' or 'postgres' the client is a SQLAlchemy Engine.
    For 'mongodb' it is a pymongo.MongoClient.
    """
    db_type = os.getenv("DB_TYPE", "mysql").lower()

    if db_type in {"mysql", "postgres"}:
        url = _build_sql_url(db_type)
        engine = create_engine(url)
        return db_type, engine

    if db_type == "mongodb":
        uri = _build_mongo_uri()
        return db_type, MongoClient(uri)

    raise ValueError(f"Unsupported DB_TYPE: {db_type!r}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You must create a &lt;code&gt;.env&lt;/code&gt; file and set the environment variables for the database connection details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .env.example
DB_TYPE=mysql
DB_USER=your_user
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=3306
DB_NAME=your_database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;DB_TYPE&lt;/code&gt; can be set to &lt;code&gt;mysql&lt;/code&gt;, &lt;code&gt;postgres&lt;/code&gt;, or &lt;code&gt;mongodb&lt;/code&gt;.&lt;/p&gt;
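&lt;p&gt;The refactored code reads those variables with &lt;code&gt;os.getenv&lt;/code&gt;, so they must be present in the process environment; the &lt;code&gt;python-dotenv&lt;/code&gt; package is the usual way to load a &lt;code&gt;.env&lt;/code&gt; file. The stdlib-only loader below is an illustrative sketch of what such a loader does, not the project's actual code:&lt;/p&gt;

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines, '#' comments; existing variables win."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```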

&lt;p&gt;Now that the database connection selection is configured, the &lt;code&gt;sql.py&lt;/code&gt; script must be adapted to automatically select the database and insert the data based on the environment variables in the &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    num_cores = cpu_count() if cpu_count() &amp;lt;= 2 else cpu_count() - 1

    db_type, client = get_client()

    with Pool(processes=num_cores) as pool:
        data = pd.concat(pool.map(create_dataframe, range(num_cores)))

    if db_type in {"mysql", "postgres"}:
        data.to_sql(name='employees', con=client, if_exists='append', index=False, dtype=schema)

        with client.begin() as conn:
            if db_type == "mysql":
                conn.execute(text("ALTER TABLE employees ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;"))
            elif db_type == "postgres":
                conn.execute(text("ALTER TABLE employees ADD COLUMN id SERIAL PRIMARY KEY;"))
    elif db_type == "mongodb":
        data_dict = data.to_dict('records')
        db = client["company"]
        collection = db["employees"]
        collection.insert_many(data_dict)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You must rename the &lt;code&gt;sql.py&lt;/code&gt; script to &lt;code&gt;generate_data.py&lt;/code&gt;, and delete the &lt;code&gt;mongodb.py&lt;/code&gt; script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Tests
&lt;/h2&gt;

&lt;p&gt;Configure a database instance to validate that the script works. Run one of the following commands to start the corresponding container, choosing the database you're using for your project.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MariaDB:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d \
  --name mariadb \
  -p 3306:3306 \
  -e MARIADB_ROOT_PASSWORD=12345 \
  -e MARIADB_DATABASE=company \
  mariadb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;PostgreSQL:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d \
  --name postgres \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=12345 \
  -e POSTGRES_DB=company \
  postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;MongoDB:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d \
  --name db-mongo \
  -p 27017:27017 \
  -e MONGO_INITDB_ROOT_USERNAME=username \
  -e MONGO_INITDB_ROOT_PASSWORD=12345 \
  mongo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then edit the &lt;code&gt;.env&lt;/code&gt; file and replace the values of the environment variables with the connection details.&lt;/p&gt;

&lt;p&gt;Now you can run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python generate_data.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script is designed to create data for a database that stores information about employees, but you can adapt it to your needs.&lt;/p&gt;
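&lt;p&gt;As a reference for adapting it, record generation can be pictured with the stdlib-only sketch below. The real logic lives in &lt;code&gt;modules/dataframe.py&lt;/code&gt; and returns a pandas DataFrame; this simplified stand-in (field names and sample values are mine) returns plain dictionaries:&lt;/p&gt;

```python
import random

FIRST_NAMES = ["Ana", "Luis", "Maria", "Jorge"]
DEPARTMENTS = ["Sales", "IT", "HR"]

def create_records(n=5, seed=None):
    """Generate n synthetic employee rows; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    return [
        {
            "name": rng.choice(FIRST_NAMES),
            "department": rng.choice(DEPARTMENTS),
            "salary": rng.randint(30000, 90000),
        }
        for _ in range(n)
    ]
```

&lt;p&gt;Swapping the field lists for your own schema, or for a library like Faker, is the kind of change such an adaptation would involve.&lt;/p&gt;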

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While GitHub Copilot is a powerful tool to integrate within your workflow, it is essential to remember that you must always review and validate the suggested code, as it may not work as expected. The quality of the output depends heavily on your input; therefore, your prompts should be as descriptive as possible, providing clear context about your architectural requirements and dependencies.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>From Prototype to Pharmacy Dashboard: Scaling an AI-Generated App with Google Gemini</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Sun, 01 Mar 2026 01:33:35 +0000</pubDate>
      <link>https://dev.to/letstalkoss/from-prototype-to-pharmacy-dashboard-scaling-an-ai-generated-app-with-google-gemini-1c8j</link>
      <guid>https://dev.to/letstalkoss/from-prototype-to-pharmacy-dashboard-scaling-an-ai-generated-app-with-google-gemini-1c8j</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;Last year, a friend hired me to develop a custom dashboard for his pharmacy. At the time, I was working daily with Vue.js and TypeScript, but I wanted to find a platform to accelerate the initial implementation. That is when I discovered Tempo, an AI-powered design and development tool for React.&lt;/p&gt;

&lt;p&gt;The platform provided enough daily credits on its free tier to get started, and I loved that the code synced directly with a GitHub repository. I used most of my credits for bug fixing and feature implementation; however, the platform eventually tightened its credit limits, resetting them every 24 hours. Despite these constraints, I managed to get an initial implementation ready for testing, which included the following features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;User Management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Branch Management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inventory Management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monthly Sales Reporting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-language Support (Spanish &amp;amp; English)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IndexedDB for Local Data Storage&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Due to these limitations and several features yet to be built, I transitioned the development to Google Gemini to take the project to production.&lt;/p&gt;

&lt;p&gt;Gemini played a key role in bug fixing, feature development, and migration of the database from local storage to Cloud Firestore. Additionally, it guided the deployment strategy, using GitLab Pages and Vercel to take the project to a production-ready state.&lt;/p&gt;

&lt;p&gt;Additional features implemented with Gemini:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dual-Layer Inventory: Divided into Pharmacy and Warehouse stocks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily and Weekly Sales Reporting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inventory Search and Editing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Orders Module&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily, Weekly and Monthly Orders Reporting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily Sales and Orders Editing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;POS Module with Barcode Scanner Support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Firestore User Management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stock Alerts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PDF Reports Download&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;I don't have a public demo instance available since the only live version is the one my client is using (and it requires a login, of course!). So, I’ve put together some screenshots to show you what the dashboard looks like.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access &amp;amp; Security (Sign In): The entry point of the app, integrated with Firebase Authentication for secure access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kqaqp47p53zhs0bkaej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kqaqp47p53zhs0bkaej.png" alt="Sign In" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Command Center (Dashboard): A high-level overview of the pharmacy's health, from stock alerts to quick sales stats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmowotqauslz05zfsms5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmowotqauslz05zfsms5y.png" alt="Dashboard" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dual-Layer Inventory: This is where the magic happens: managing stock between the Pharmacy and the Warehouse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyppppgk6fzmq8cyl698.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyppppgk6fzmq8cyl698.png" alt="Dual-Layer Inventory" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Point of Sale (Sales Module): The real-time interface for pharmacists, featuring barcode scanner support for quick checkouts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9i0zre0ugn727kb1c6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9i0zre0ugn727kb1c6c.png" alt="Sales" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data-Driven Decisions (Sales Report): Daily, weekly, and monthly analytics that help the owner understand their revenue trends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxm1otookxtvu5ozqcs1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxm1otookxtvu5ozqcs1.png" alt="Sales Report" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supply Chain Management (Orders): The module built to handle restocking and tracking orders for new medicine supplies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freibgfen2wqzhd7v9d03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freibgfen2wqzhd7v9d03.png" alt="Orders" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;This project was a continuous learning journey. Here are the key technical milestones I reached with Gemini’s guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mastering AI-Assisted Development: I learned to write better, more contextual prompts. By using tools like Tempo for the initial UI and Gemini to refactor auto-generated code, I was able to transform a simple prototype into a production-ready system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;From IndexedDB to Cloud Firestore: I hadn't used IndexedDB or SQLite before. While I learned to configure their connections and CRUD methods, I quickly realized their biggest constraint: local data doesn't sync across devices. As a developer with experience in MySQL and PostgreSQL, I needed a scalable, cloud-native solution. Gemini suggested Cloud Firestore, and its free tier perfectly satisfied the project's requirements. Although I had previous experience with NoSQL databases like MongoDB, this was my first time with Firestore. I learned how data is structured and how to use Firebase Authentication to handle users securely without storing sensitive login data in plain documents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scalable Solution (Two Repositories): I decided to separate the project into two distinct repositories:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gitlab.com/mattdark/firebase-admin-api" rel="noopener noreferrer"&gt;The Backend API&lt;/a&gt;: Built to solve a specific challenge: creating users without terminating the current administrator's session. This API manages user creation, updates, and authentication independently. I deployed it using Vercel (my first time using the platform) and configured a GitLab CI/CD pipeline for automated deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gitlab.com/mattdark/pharmacare" rel="noopener noreferrer"&gt;The Frontend Application&lt;/a&gt;: Deployed via GitLab Pages. I also configured a dedicated GitLab CI pipeline to build and deploy the application automatically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overcoming SPA Routing Hurdles on GitLab Pages: During the deployment of the Frontend, I faced a common issue with Single Page Applications (SPAs): 404 errors when refreshing routes. By diving into the GitLab documentation and working with Gemini, I learned how GitLab Pages handles routing and implemented the necessary configuration to ensure the SPA router functioned correctly in production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Local Development with gcloud Emulator: To ensure a safe and efficient workflow, I learned how to test database changes using the gcloud Emulator. This allowed me to validate Firestore security rules and data structures locally before deploying them to the cloud. I found this process so valuable that I documented it in a dedicated &lt;a href="https://dev.to/mattdark/local-firestore-development-with-the-gcloud-emulator-52bl"&gt;article&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;p&gt;Working with Gemini was a game-changer for this project, but like any powerful tool, it has its learning curve. Here’s my feedback:&lt;/p&gt;

&lt;p&gt;What Worked Well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Guidance: Gemini excelled when I asked concrete questions about technology stacks (like the move from IndexedDB to Firestore). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactoring: When I shared specific code snippets and explained the required business logic, Gemini was incredibly effective at suggesting precise changes and handling complex refactors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Problem-Solving: It was instrumental in debugging the SPA routing issues on GitLab Pages and configuring the CI/CD pipelines for Vercel.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Areas for Improvement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Repository Integration: Currently, Gemini struggles to analyze an entire repository at once. I believe it would be a massive productivity boost if Gemini could connect directly to a GitHub or GitLab repository to understand the full codebase context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IDE Extension: While the web interface is great, having a more integrated experience directly within the code editor (beyond simple autocomplete) would reduce the friction of switching back and forth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accuracy &amp;amp; Verification: AI responses are not always 100% accurate. I quickly learned that you must always review and test suggested changes before merging them, as some suggestions can inadvertently break existing module logic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
    </item>
    <item>
      <title>App Localization with Python and Argos Translate</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Fri, 27 Feb 2026 03:14:36 +0000</pubDate>
      <link>https://dev.to/letstalkoss/app-localization-with-python-and-argos-translate-2oab</link>
      <guid>https://dev.to/letstalkoss/app-localization-with-python-and-argos-translate-2oab</guid>
      <description>&lt;p&gt;Last year, I worked on localizing a platform from Spanish to English. The strings were stored in JSON files within a directory called &lt;code&gt;es&lt;/code&gt;, and the goal was to generate the same files translated into English and save them in a directory named &lt;code&gt;en&lt;/code&gt;. Here's &lt;a href="https://dev.to/mattdark/localization-made-easy-with-python-and-deepl-1l1e"&gt;an article&lt;/a&gt; I wrote on how I optimized the process by generating the initial localization using Python and DeepL.&lt;/p&gt;

&lt;p&gt;However, DeepL is not Open Source and its usage is limited depending on the plan you choose. So, what’s a good Open Source alternative? &lt;a href="https://www.argosopentech.com/" rel="noopener noreferrer"&gt;Argos Translate&lt;/a&gt;—and in this article, I’ll show you how to use it for your localization projects.&lt;/p&gt;

&lt;p&gt;Argos Translate is an Open Source tool that uses &lt;a href="https://opennmt.net/" rel="noopener noreferrer"&gt;OpenNMT&lt;/a&gt; for translations and works offline. They also offer &lt;a href="https://libretranslate.com/" rel="noopener noreferrer"&gt;LibreTranslate&lt;/a&gt;, an API built on top of Argos Translate that doesn't require creating an account.&lt;/p&gt;

&lt;p&gt;It can be used as a Python library, a command-line tool, or a GUI application. For this workflow, it’s recommended to use the Python library along with the API provided by LibreTranslate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Dependencies
&lt;/h2&gt;

&lt;p&gt;Before importing it into your Python script, install it by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install argostranslate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure that compatible versions of the following dependencies are installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;urllib3&lt;/li&gt;
&lt;li&gt;charset-normalizer&lt;/li&gt;
&lt;li&gt;chardet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get them, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install --upgrade --force-reinstall urllib3==1.26.19 charset-normalizer==2.1.1 chardet==4.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to try LibreTranslate, install it by executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install libretranslate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Download Language Model
&lt;/h2&gt;

&lt;p&gt;Language models can be installed via the Python library, downloaded using the command-line tool, or obtained manually from the &lt;a href="https://www.argosopentech.com/argospm/index/" rel="noopener noreferrer"&gt;Argos Translate Package Index&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To install it via the command-line tool, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;argospm install translate-es_en
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;es&lt;/code&gt; is the original language, and &lt;code&gt;en&lt;/code&gt; is the target language.&lt;/p&gt;

&lt;p&gt;If you use &lt;code&gt;argospm&lt;/code&gt;, downloading the model from Python is optional, but here's the script to do it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argostranslate.package

from_code = "es"
to_code = "en"

argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()
package_to_install = next(
    filter(
        lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
    )
)
argostranslate.package.install_from_path(package_to_install.download())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, import the required library: &lt;code&gt;argostranslate.package&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Set source and target language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from_code = "es"
to_code = "en"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update the package index and get a list of the available packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find the package to install by filtering the list of available packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package_to_install = next(
    filter(
        lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
    )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally, install the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;argostranslate.package.install_from_path(package_to_install.download())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using the Command-line Tool
&lt;/h2&gt;

&lt;p&gt;The command-line tool works for direct text translation, but not for translating JSON files. If you want to translate text, you can use it this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;argos-translate --from es --to en "¡Hola mundo!"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get the following output: &lt;code&gt;Hey, World!&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Python Library
&lt;/h2&gt;

&lt;p&gt;Suppose you have the following content in a JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "tipo-perfil": {
    "label": "Tipo de perfil",
    "description": "Tipo de perfil",
    "tooltip": "Tipo de perfil",
    "validations": {
        "required": "El campo Tipo de perfil es requerido",
        "minMessage": "El número de caracteres debe ser de al menos {min}",
        "maxMessage": "El número de caracteres debe ser máximo de {max}",
        "regexMessage": "Formato de Tipo de perfil inválido"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the language model is already installed, you can translate the strings in the JSON file with the following Python script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import argostranslate.translate

installed_languages = argostranslate.translate.get_installed_languages()
spanish = next(filter(lambda l: l.code == "es", installed_languages))
english = next(filter(lambda l: l.code == "en", installed_languages))
translation = spanish.get_translation(english)

def translate_json(obj, translator):
    if isinstance(obj, dict):
        return {k: translate_json(v, translator) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_json(i, translator) for i in obj]
    elif isinstance(obj, str):
        return translator.translate(obj)
    else:
        return obj

with open("input.json", "r", encoding="utf-8") as f:
    data = json.load(f)

translated_data = translate_json(data, translation)

with open("translated.json", "w", encoding="utf-8") as f:
    json.dump(translated_data, f, indent=2, ensure_ascii=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Import the required libraries: &lt;code&gt;json&lt;/code&gt; &amp;amp; &lt;code&gt;argostranslate.translate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Load the language models installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;installed_languages = argostranslate.translate.get_installed_languages()
spanish = next(filter(lambda l: l.code == "es", installed_languages))
english = next(filter(lambda l: l.code == "en", installed_languages))
translation = spanish.get_translation(english)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the JSON file and read the content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("input.json", "r", encoding="utf-8") as f:
    data = json.load(f)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call the function that translates the content of the JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;translated_data = translate_json(data, translation)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And save the result in a new file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("translated.json", "w", encoding="utf-8") as f:
    json.dump(translated_data, f, indent=2, ensure_ascii=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recursive &lt;code&gt;translate_json&lt;/code&gt; function takes an object that can be a dictionary, list, or string. Strings are translated directly, while dictionaries and lists are processed recursively to translate all nested strings, ensuring the entire JSON is translated correctly regardless of depth.&lt;/p&gt;
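&lt;p&gt;To see the recursion in isolation, here is a self-contained sketch that swaps the Argos translator for a hypothetical stub (&lt;code&gt;StubTranslator&lt;/code&gt; is not part of Argos Translate; it simply uppercases strings so the structure-preserving behavior is visible):&lt;br&gt;&lt;/p&gt;

```python
import json

class StubTranslator:
    """Hypothetical stand-in for the Argos translation object: uppercases text."""
    def translate(self, text):
        return text.upper()

def translate_json(obj, translator):
    # Same recursive logic as in the article
    if isinstance(obj, dict):
        return {k: translate_json(v, translator) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_json(i, translator) for i in obj]
    elif isinstance(obj, str):
        return translator.translate(obj)
    else:
        return obj

data = {"label": "hola", "validations": {"required": "requerido", "max": 10}}
result = translate_json(data, StubTranslator())
print(json.dumps(result, ensure_ascii=False))
```

&lt;p&gt;Note that the nested keys stay untouched and non-string values like &lt;code&gt;10&lt;/code&gt; pass through unchanged; only leaf strings are handed to the translator.&lt;/p&gt;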

&lt;p&gt;If you have multiple JSON files in a folder and subfolders, you can extend the script to process all of them automatically.&lt;/p&gt;

&lt;p&gt;Add the &lt;code&gt;os&lt;/code&gt; module to the imports.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set input and output folders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input_folder = "es"
output_folder = "en"
os.makedirs(output_folder, exist_ok=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, use the following code to recursively translate all JSON files while preserving the folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for root, dirs, files in os.walk(input_folder):
    for filename in files:
        if filename.endswith(".json"):
            input_path = os.path.join(root, filename)
            # Keep the same subfolder structure in the output
            relative_path = os.path.relpath(input_path, input_folder)
            output_path = os.path.join(output_folder, relative_path)
            os.makedirs(os.path.dirname(output_path), exist_ok=True)

            with open(input_path, "r", encoding="utf-8") as f:
                data = json.load(f)
            translated_data = translate_json(data, translation)
            with open(output_path, "w", encoding="utf-8") as f:
                json.dump(translated_data, f, indent=2, ensure_ascii=False)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code will replace this single-file block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("input.json", "r", encoding="utf-8") as f:
    data = json.load(f)

translated_data = translate_json(data, translation)

with open("translated.json", "w", encoding="utf-8") as f:
    json.dump(translated_data, f, indent=2, ensure_ascii=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
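&lt;p&gt;To verify the folder-mirroring logic without installing a language model, you can run the same walk against a temporary directory with a hypothetical stand-in translator (the file name &lt;code&gt;forms/perfil.json&lt;/code&gt; is made up for the demo):&lt;br&gt;&lt;/p&gt;

```python
import json
import os
import tempfile

def translate_json(obj):
    """Hypothetical stand-in for the real translator: uppercases strings."""
    if isinstance(obj, dict):
        return {k: translate_json(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [translate_json(i) for i in obj]
    if isinstance(obj, str):
        return obj.upper()
    return obj

with tempfile.TemporaryDirectory() as tmp:
    input_folder = os.path.join(tmp, "es")
    output_folder = os.path.join(tmp, "en")
    os.makedirs(os.path.join(input_folder, "forms"))

    # One nested input file, as in a real es/ directory
    with open(os.path.join(input_folder, "forms", "perfil.json"), "w", encoding="utf-8") as f:
        json.dump({"label": "tipo de perfil"}, f, ensure_ascii=False)

    # Same walk as in the article: mirror the subfolder structure into en/
    for root, dirs, files in os.walk(input_folder):
        for filename in files:
            if filename.endswith(".json"):
                input_path = os.path.join(root, filename)
                relative_path = os.path.relpath(input_path, input_folder)
                output_path = os.path.join(output_folder, relative_path)
                os.makedirs(os.path.dirname(output_path), exist_ok=True)
                with open(input_path, "r", encoding="utf-8") as f:
                    data = json.load(f)
                with open(output_path, "w", encoding="utf-8") as f:
                    json.dump(translate_json(data), f, indent=2, ensure_ascii=False)

    with open(os.path.join(output_folder, "forms", "perfil.json"), encoding="utf-8") as f:
        result = json.load(f)["label"]

print(result)
```

&lt;p&gt;The output file lands in &lt;code&gt;en/forms/perfil.json&lt;/code&gt;, confirming that subfolders are recreated on the output side.&lt;/p&gt;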



&lt;h2&gt;
  
  
  Using the LibreTranslate API
&lt;/h2&gt;

&lt;p&gt;You can also translate your JSON files using the LibreTranslate API instead of using local models. The workflow is almost identical to the previous script, with only a few changes.&lt;/p&gt;

&lt;p&gt;Remove &lt;code&gt;argostranslate.translate&lt;/code&gt; from imports and add &lt;code&gt;requests&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the &lt;code&gt;translate_text&lt;/code&gt; function, set the source and target languages, and send a request to the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def translate_text(text):
    source = "es"
    target = "en"
    response = requests.post(
        "https://translate.argosopentech.com/translate",
        json={"q": text, "source": source, "target": target},
        headers={"Content-Type": "application/json"}
    )
    return response.json()["translatedText"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;translate_json&lt;/code&gt; function will now call &lt;code&gt;translate_text&lt;/code&gt;, which uses the API, instead of the local translator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def translate_json(obj, translator):
    if isinstance(obj, dict):
        return {k: translate_json(v, translator) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_json(i, translator) for i in obj]
    elif isinstance(obj, str):
        return translator.translate(obj)
    else:
        return obj
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're translating a single file, open it, load the data, call the &lt;code&gt;translate_json&lt;/code&gt; function, and save the result to another file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("input.json", "r", encoding="utf-8") as f:
    data = json.load(f)

translated_data = translate_json(data)

with open("translated.json", "w", encoding="utf-8") as f:
    json.dump(translated_data, f, indent=2, ensure_ascii=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to translate multiple JSON files recursively, use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for root, dirs, files in os.walk(input_folder):
    for filename in files:
        if filename.endswith(".json"):
            input_path = os.path.join(root, filename)
            relative_path = os.path.relpath(input_path, input_folder)
            output_path = os.path.join(output_folder, relative_path)
            os.makedirs(os.path.dirname(output_path), exist_ok=True)

            with open(input_path, "r", encoding="utf-8") as f:
                data = json.load(f)

            translated_data = translate_json(data)

            with open(output_path, "w", encoding="utf-8") as f:
                json.dump(translated_data, f, indent=2, ensure_ascii=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And don't forget to add the &lt;code&gt;os&lt;/code&gt; module to the imports, and set input and output folders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

input_folder = "es"
output_folder = "en"
os.makedirs(output_folder, exist_ok=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, you learned how to use Argos Translate and LibreTranslate to simplify translating JSON-based applications.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>BSON to JSON: Efficient Data Conversion with Java</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Mon, 02 Feb 2026 21:36:53 +0000</pubDate>
      <link>https://dev.to/letstalkoss/bson-to-json-efficient-data-conversion-with-java-2gpf</link>
      <guid>https://dev.to/letstalkoss/bson-to-json-efficient-data-conversion-with-java-2gpf</guid>
      <description>&lt;p&gt;Through this blog post, you will learn how to convert a BSON document to JSON using Java.&lt;/p&gt;

&lt;h2&gt;
  
  
  BSON to JSON with Java
&lt;/h2&gt;

&lt;p&gt;If you’re a Java developer, there are two ways to read BSON documents and convert them to JSON.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Using the MongoDB Java Driver to query an active database&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reading and parsing local .bson files at the byte level&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's create a sample project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mvn archetype:generate \
  -DgroupId=com.your-domain \
  -DartifactId=bson-to-json \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, add dependencies and configure the project by editing the &lt;code&gt;pom.xml&lt;/code&gt; file as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"&amp;gt;
  &amp;lt;modelVersion&amp;gt;4.0.0&amp;lt;/modelVersion&amp;gt;

  &amp;lt;groupId&amp;gt;com.your-domain&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;bson-to-json&amp;lt;/artifactId&amp;gt;
  &amp;lt;packaging&amp;gt;jar&amp;lt;/packaging&amp;gt;
  &amp;lt;version&amp;gt;1.0-SNAPSHOT&amp;lt;/version&amp;gt;
  &amp;lt;name&amp;gt;bson-to-json&amp;lt;/name&amp;gt;

  &amp;lt;properties&amp;gt;
    &amp;lt;maven.compiler.release&amp;gt;11&amp;lt;/maven.compiler.release&amp;gt;
    &amp;lt;project.build.sourceEncoding&amp;gt;UTF-8&amp;lt;/project.build.sourceEncoding&amp;gt;
  &amp;lt;/properties&amp;gt;

  &amp;lt;dependencies&amp;gt;
    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.mongodb&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;mongodb-driver-sync&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;5.3.1&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.slf4j&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;slf4j-nop&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;2.0.9&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;junit&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;junit&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;4.13.2&amp;lt;/version&amp;gt; &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;
  &amp;lt;/dependencies&amp;gt;

  &amp;lt;build&amp;gt;
    &amp;lt;plugins&amp;gt;
      &amp;lt;plugin&amp;gt;
        &amp;lt;groupId&amp;gt;org.codehaus.mojo&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;exec-maven-plugin&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;3.1.0&amp;lt;/version&amp;gt;
        &amp;lt;configuration&amp;gt;
          &amp;lt;mainClass&amp;gt;com.your-domain.App&amp;lt;/mainClass&amp;gt;
          &amp;lt;cleanupDaemonThreads&amp;gt;false&amp;lt;/cleanupDaemonThreads&amp;gt;
        &amp;lt;/configuration&amp;gt;
      &amp;lt;/plugin&amp;gt;
    &amp;lt;/plugins&amp;gt;
  &amp;lt;/build&amp;gt;
&amp;lt;/project&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Lock the project to a specific Java version (in this case, Java 11) to ensure consistent compilation across different environments&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  &amp;lt;properties&amp;gt;
    &amp;lt;maven.compiler.release&amp;gt;11&amp;lt;/maven.compiler.release&amp;gt;
    &amp;lt;project.build.sourceEncoding&amp;gt;UTF-8&amp;lt;/project.build.sourceEncoding&amp;gt;
  &amp;lt;/properties&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the MongoDB driver as dependency&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.mongodb&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;mongodb-driver-sync&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;5.3.1&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Silence internal MongoDB driver logs using SLF4J-NOP&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.slf4j&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;slf4j-nop&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;2.0.9&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the application with its dependencies on the classpath and avoid daemon-thread cleanup warnings by using the &lt;code&gt;exec-maven-plugin&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  &amp;lt;build&amp;gt;
    &amp;lt;plugins&amp;gt;
      &amp;lt;plugin&amp;gt;
        &amp;lt;groupId&amp;gt;org.codehaus.mojo&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;exec-maven-plugin&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;3.1.0&amp;lt;/version&amp;gt;
        &amp;lt;configuration&amp;gt;
        &amp;lt;mainClass&amp;gt;com.your-domain.App&amp;lt;/mainClass&amp;gt;
        &amp;lt;cleanupDaemonThreads&amp;gt;false&amp;lt;/cleanupDaemonThreads&amp;gt;
        &amp;lt;/configuration&amp;gt;
      &amp;lt;/plugin&amp;gt;
    &amp;lt;/plugins&amp;gt;
  &amp;lt;/build&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Using the MongoDB Java Driver to query an active database
&lt;/h3&gt;

&lt;p&gt;Now, edit the &lt;code&gt;App.java&lt;/code&gt; file stored in the &lt;code&gt;src/main/java/com/your-domain/&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package com.your-domain;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.json.JsonWriterSettings;

import java.io.FileWriter;
import java.io.IOException;

public class App {
    public static void main(String[] args) {
        String uri = "mongodb://user:password@localhost:27017/?authSource=admin";

        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("database");
            MongoCollection&amp;lt;Document&amp;gt; collection = database.getCollection("collection");

            JsonWriterSettings settings = JsonWriterSettings.builder().indent(true).build();

            try (FileWriter file = new FileWriter("collection.json")) {
                file.write("[\n");

                boolean first = true;
                for (Document doc : collection.find()) {
                    if (!first) {
                        file.write(",\n");
                    }
                    file.write(doc.toJson(settings));
                    first = false;
                }

                file.write("\n]");
                System.out.println("Exported successfully!");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Declare the package that the Java class belongs to&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package com.your-domain;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Establish connection to the MongoDB instance&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        try (MongoClient mongoClient = MongoClients.create(uri))
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Where &lt;code&gt;uri&lt;/code&gt; is the connection string.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Target the specific database and collection from which the data will be exported&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            MongoDatabase database = mongoClient.getDatabase("company");
            MongoCollection&amp;lt;Document&amp;gt; collection = database.getCollection("employees");
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pretty format the content of the JSON file&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            JsonWriterSettings settings = JsonWriterSettings.builder().indent(true).build();
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Manually construct and write a JSON array, inserting commas between documents so the output stays valid JSON&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                file.write("[\n");
                boolean first = true;
                for (Document doc : collection.find()) {
                    if (!first) {
                        file.write(",\n");
                    }
                    file.write(doc.toJson(settings));
                    first = false;
                }
                file.write("\n]");
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reading and parsing local .bson files at the byte level
&lt;/h3&gt;

&lt;p&gt;Now, edit the &lt;code&gt;App.java&lt;/code&gt; file stored in the &lt;code&gt;src/main/java/com/your-domain/&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package com.your-domain;

import org.bson.Document;
import org.bson.BsonType;
import org.bson.codecs.DocumentCodec;
import org.bson.codecs.DecoderContext;
import org.bson.BsonBinaryReader;
import org.bson.io.ByteBufferBsonInput;
import org.bson.ByteBufNIO;
import org.bson.json.JsonWriterSettings;

import java.io.FileInputStream;
import java.io.FileWriter;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class App {
    public static void main(String[] args) throws Exception {
        String inputPath = "collection.bson";
        String outputPath = "collection.json";
        JsonWriterSettings settings = JsonWriterSettings.builder().indent(true).build();
        DocumentCodec codec = new DocumentCodec();

        try (FileInputStream fis = new FileInputStream(inputPath);
             FileChannel channel = fis.getChannel();
             FileWriter writer = new FileWriter(outputPath)) {

            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            channel.read(buffer);
            buffer.flip();

            try (ByteBufferBsonInput bsonInput = new ByteBufferBsonInput(new ByteBufNIO(buffer));
                 BsonBinaryReader reader = new BsonBinaryReader(bsonInput)) {

                writer.write("[\n");
                boolean first = true;

                while (buffer.hasRemaining()) {

                    if (reader.readBsonType() == BsonType.END_OF_DOCUMENT) {
                        break;
                    }

                    if (!first) {
                        writer.write(",\n");
                    }

                    Document doc = codec.decode(reader, DecoderContext.builder().build());
                    writer.write(doc.toJson(settings));
                    first = false;
                }

                writer.write("\n]");
                System.out.println("Conversion completed successfully: " + outputPath);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Initializes the file paths, configures the JSON output with indentation for readability, and sets up the codec that translates binary data into Java objects&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    String inputPath = "collection.bson";
    String outputPath = "collection.json";
    JsonWriterSettings settings = JsonWriterSettings.builder().indent(true).build();
    DocumentCodec codec = new DocumentCodec();
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Allocates a memory buffer the exact size of the file and uses a &lt;code&gt;FileChannel&lt;/code&gt; to load the binary data for processing&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            channel.read(buffer);
            buffer.flip();
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wraps the memory buffer into a &lt;code&gt;ByteBufferBsonInput&lt;/code&gt; and creates a &lt;code&gt;BsonBinaryReader&lt;/code&gt; to navigate the binary structure&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                try (ByteBufferBsonInput bsonInput = new ByteBufferBsonInput(new ByteBufNIO(buffer));
                  BsonBinaryReader reader = new BsonBinaryReader(bsonInput)) {
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Writes the opening bracket and uses a loop to iterate through the buffer until no documents remain
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                  writer.write("[\n");
                  boolean first = true;

                  while (buffer.hasRemaining()) {
                      if (reader.readBsonType() == BsonType.END_OF_DOCUMENT) {
                          break;
                      }
                      // ... loop logic
                  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Translates each binary BSON segment into a Java Document and writes it to the file as a formatted JSON string
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                  Document doc = codec.decode(reader, DecoderContext.builder().build());
                  writer.write(doc.toJson(settings));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The BSON file must be in the root directory of your Maven project. Now, you can run the above code by executing the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mvn clean compile exec:java
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;clean&lt;/code&gt; phase deletes the &lt;code&gt;target&lt;/code&gt; folder, &lt;code&gt;compile&lt;/code&gt; rebuilds the Java code, and &lt;code&gt;exec:java&lt;/code&gt; runs the project.&lt;/p&gt;

&lt;p&gt;After running, you will find the corresponding JSON file in the root directory of your project.&lt;/p&gt;
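&lt;p&gt;The byte-level framing that the Java reader depends on is simple enough to illustrate outside the driver: every BSON document begins with a little-endian int32 holding its own total length in bytes, which is how a reader knows where one document ends and the next begins. Here is a short, self-contained Python sketch of that framing (standard library only; the helper name is mine, not part of any driver):&lt;/p&gt;

```python
def split_bson_documents(data: bytes) -> list[bytes]:
    """Split a byte stream of concatenated BSON documents.

    Each BSON document begins with a little-endian int32 holding its
    total length in bytes, including the 4-byte prefix itself.
    """
    docs = []
    offset = 0
    while len(data) - offset >= 4:
        length = int.from_bytes(data[offset:offset + 4], "little")
        docs.append(data[offset:offset + length])
        offset += length
    return docs

# A minimal empty BSON document is exactly 5 bytes:
# int32 length (5) followed by a single 0x00 terminator.
empty_doc = b"\x05\x00\x00\x00\x00"
print(len(split_bson_documents(empty_doc * 3)))  # prints 3
```

&lt;p&gt;This is the same logic the &lt;code&gt;while (buffer.hasRemaining())&lt;/code&gt; loop relies on: the length prefix, not a delimiter, marks the document boundaries.&lt;/p&gt;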

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;If you’re a developer, you can use the MongoDB Java driver to query and analyze database collections before exporting them to JSON, or directly parse local BSON files for a quick format conversion.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>mongodb</category>
      <category>java</category>
    </item>
    <item>
      <title>BSON to JSON: The Python Way</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Tue, 27 Jan 2026 03:51:41 +0000</pubDate>
      <link>https://dev.to/letstalkoss/bson-to-json-the-python-way-16d9</link>
      <guid>https://dev.to/letstalkoss/bson-to-json-the-python-way-16d9</guid>
      <description>&lt;p&gt;Through this blog post, you will learn how to convert a BSON document to JSON using Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  BSON to JSON with Python
&lt;/h2&gt;

&lt;p&gt;If you’re a Python developer, there are two ways to read a BSON document and convert it to JSON.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using the &lt;code&gt;bson&lt;/code&gt; module from &lt;a href="https://pymongo.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;PyMongo&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bson&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;decode_all&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bson.json_util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dumps&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./data.bson&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./data.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;outfile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;outfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the script is doing:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Import the &lt;code&gt;decode_all&lt;/code&gt; and &lt;code&gt;dumps&lt;/code&gt; functions from the &lt;code&gt;bson&lt;/code&gt; module&lt;/li&gt;
  &lt;li&gt;Open the file in binary mode, read the content, and decode the data&lt;/li&gt;
  &lt;li&gt;Create a JSON file and write the JSON document built from the data in the BSON file&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The script works with BSON files generated by mongodump. Before running the script, you must install PyMongo: &lt;code&gt;pip install pymongo&lt;/code&gt;.&lt;/p&gt;
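&lt;p&gt;A note on why the script uses &lt;code&gt;dumps&lt;/code&gt; from &lt;code&gt;bson.json_util&lt;/code&gt; instead of the standard &lt;code&gt;json&lt;/code&gt; module: BSON documents can carry types such as &lt;code&gt;ObjectId&lt;/code&gt; and &lt;code&gt;datetime&lt;/code&gt; that plain JSON cannot represent. A standard-library-only sketch of the failure mode:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

# A document as it might come back from decode_all(): note the datetime value.
doc = {"name": "Ada", "hired": datetime(2020, 1, 1, tzinfo=timezone.utc)}

try:
    json.dumps(doc)
except TypeError as err:
    # The standard library refuses non-JSON types outright.
    print(f"json.dumps failed: {err}")
```

&lt;p&gt;&lt;code&gt;bson.json_util.dumps&lt;/code&gt; handles these values by emitting MongoDB Extended JSON, which is why the output file may contain keys like &lt;code&gt;$date&lt;/code&gt; and &lt;code&gt;$oid&lt;/code&gt; rather than bare strings.&lt;/p&gt;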

&lt;ul&gt;
&lt;li&gt;Connecting to the database and querying the data with PyMongo, the Python driver for MongoDB.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bson.json_util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dumps&lt;/span&gt;

&lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongodb://username:password@host:port/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;
&lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employees&lt;/span&gt;

&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;list_cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;json_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_cur&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the script is doing:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Import the &lt;code&gt;MongoClient&lt;/code&gt; class from the &lt;code&gt;pymongo&lt;/code&gt; library, and the &lt;code&gt;dumps&lt;/code&gt; function from the &lt;code&gt;bson&lt;/code&gt; module&lt;/li&gt;
  &lt;li&gt;Establish the connection to the database&lt;/li&gt;
  &lt;li&gt;Set the database (e.g., &lt;code&gt;company&lt;/code&gt;) and the collection (e.g., &lt;code&gt;employees&lt;/code&gt;) you want to query&lt;/li&gt;
  &lt;li&gt;Retrieve the documents in the collection with the &lt;code&gt;find()&lt;/code&gt; method and create a list from the result. Called without arguments, it returns every document, similar to &lt;code&gt;SELECT *&lt;/code&gt; in SQL&lt;/li&gt;
  &lt;li&gt;Create a JSON string by calling &lt;code&gt;dumps&lt;/code&gt;. The &lt;code&gt;indent = 2&lt;/code&gt; argument tells &lt;code&gt;dumps()&lt;/code&gt; to pretty-print the output&lt;/li&gt;
  &lt;li&gt;Write the content of the &lt;code&gt;json_data&lt;/code&gt; variable to the &lt;code&gt;data.json&lt;/code&gt; file&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As with the previous script, PyMongo must be installed before running it: &lt;code&gt;pip install pymongo&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;If you’re a developer, you can use the MongoDB driver of your programming language of choice and query the data to analyze the content of the collections in your database. For Python, you can install PyMongo, connect to the database, query the data and use the bson module to save the content as a JSON document.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>mongodb</category>
      <category>python</category>
    </item>
    <item>
      <title>BSON to JSON: The Standard Tools</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Tue, 27 Jan 2026 03:44:03 +0000</pubDate>
      <link>https://dev.to/letstalkoss/bson-to-json-the-standard-tools-5c68</link>
      <guid>https://dev.to/letstalkoss/bson-to-json-the-standard-tools-5c68</guid>
      <description>&lt;p&gt;Binary Javascript Object Notation (BSON) is a bin­ary-en­coded seri­al­iz­a­tion of JSON documents. JSON is easier to understand as it is human-readable, but compared to BSON, it supports fewer data types. BSON has been extended to add some optional non-JSON-native data types, like dates and binary data.&lt;/p&gt;

&lt;p&gt;MongoDB stores data in BSON format both internally and over the network. It is also the format used for the output files generated by mongodump. To read the content of a BSON document, you have to convert it to a human-readable format like JSON.&lt;/p&gt;

&lt;p&gt;Through this blog post, you will learn how to convert a BSON document to JSON. Some of the methods I will explain include using bsondump, mongoexport, and Bash.&lt;/p&gt;

&lt;h2&gt;
  
  
  BSON to JSON with bsondump
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.mongodb.com/docs/database-tools/bsondump/" rel="noopener noreferrer"&gt;bsondump&lt;/a&gt; tool converts &lt;a href="https://www.mongodb.com/docs/manual/reference/glossary/#std-term-BSON" rel="noopener noreferrer"&gt;BSON&lt;/a&gt; files into human-readable formats, including &lt;a href="https://www.mongodb.com/docs/manual/reference/glossary/#std-term-JSON" rel="noopener noreferrer"&gt;JSON&lt;/a&gt;. For example, bsondump is useful for reading the output files generated by &lt;a href="https://www.mongodb.com/docs/database-tools/mongodump/#mongodb-binary-bin.mongodump" rel="noopener noreferrer"&gt;mongodump&lt;/a&gt;. It is part of the &lt;a href="https://www.mongodb.com/docs/database-tools/installation/installation/" rel="noopener noreferrer"&gt;MongoDB Database Tools&lt;/a&gt; package.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;bsondump&lt;/code&gt; from the system command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bsondump --outFile=collection.json collection.bson
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will create a JSON file (&lt;code&gt;collection.json&lt;/code&gt;) from an existing BSON document (&lt;code&gt;collection.bson&lt;/code&gt;), like the ones created after backing up your database.&lt;/p&gt;

&lt;h2&gt;
  
  
  BSON to JSON with mongoexport
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.mongodb.com/docs/database-tools/mongoexport" rel="noopener noreferrer"&gt;mongoexport&lt;/a&gt; is a command-line tool that produces a JSON or CSV export of data stored in a MongoDB instance. The mongoexport tool is part of the &lt;a href="https://www.mongodb.com/docs/database-tools/installation/installation/" rel="noopener noreferrer"&gt;MongoDB Database Tools&lt;/a&gt; package.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;mongoexport&lt;/code&gt; from the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mongoexport --collection=employees --db=company --out=employees.json --pretty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To connect to a local MongoDB instance running on port 27017, you do not have to specify the host or port. For any other configuration, check the &lt;a href="https://www.mongodb.com/docs/database-tools/mongoexport/#connect-to-a-mongodb-instance" rel="noopener noreferrer"&gt;Connect to a MongoDB Instance&lt;/a&gt; section of the documentation for more information.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--pretty&lt;/code&gt; option will pretty format the content of the JSON file.&lt;/p&gt;

&lt;h2&gt;
  
  
  BSON to JSON with Bash
&lt;/h2&gt;

&lt;p&gt;I asked the AI at &lt;a href="//phind.com"&gt;phind.com&lt;/a&gt; to tell me how to convert a BSON file to JSON, and one of the solutions that it showed me was to create a Bash script in the directory where the BSON files are.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Convert every BSON file in the current directory to JSON
for file in "$PWD"/*.bson; do
    bsondump "$file" --outFile="${file%.bson}.json"
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script loops through every BSON file in the current directory and uses &lt;code&gt;bsondump&lt;/code&gt; to convert each one to a JSON file with the same base name. Quoting &lt;code&gt;"$file"&lt;/code&gt; keeps file names containing spaces intact.&lt;/p&gt;

&lt;p&gt;To run the script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add execution permission to the script: &lt;code&gt;chmod +x bson_to_json.sh&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Execute this command in the command line:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./bson_to_json.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
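&lt;p&gt;If you prefer Python for the batch step, the same loop can be sketched with the standard library. This is a minimal sketch that assumes &lt;code&gt;bsondump&lt;/code&gt; is on your &lt;code&gt;PATH&lt;/code&gt;; the function names are mine, not from any library:&lt;/p&gt;

```python
import subprocess
from pathlib import Path

def json_target(bson_path: Path) -> Path:
    """Map collection.bson to collection.json."""
    return bson_path.with_suffix(".json")

def convert_all(directory: str = ".") -> None:
    """Run bsondump over every .bson file in the given directory."""
    for bson_file in sorted(Path(directory).glob("*.bson")):
        subprocess.run(
            ["bsondump", f"--outFile={json_target(bson_file)}", str(bson_file)],
            check=True,  # raise if bsondump exits with an error
        )
```

&lt;p&gt;Unlike the shell version, &lt;code&gt;pathlib&lt;/code&gt; and the argument-list form of &lt;code&gt;subprocess.run&lt;/code&gt; handle unusual file names without extra quoting.&lt;/p&gt;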



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;If you want to read the content of a BSON document, you can use bsondump and mongoexport to convert a BSON document to a human-readable format like JSON. These tools are part of the &lt;a href="https://www.mongodb.com/docs/database-tools/installation/installation/" rel="noopener noreferrer"&gt;MongoDB Database Tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are other options, such as online converters, but the methods covered here work from the command line and rely on official MongoDB tooling.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>mongodb</category>
      <category>bash</category>
    </item>
    <item>
      <title>Local Email Testing with Python and Mailpit</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Mon, 26 Jan 2026 23:27:16 +0000</pubDate>
      <link>https://dev.to/letstalkoss/local-email-testing-with-python-and-mailpit-4cn6</link>
      <guid>https://dev.to/letstalkoss/local-email-testing-with-python-and-mailpit-4cn6</guid>
      <description>&lt;p&gt;I'm currently building an app that automates the logistics of tech conferences. It generates certificates of participation for both attendees and speakers and also takes care of sending invitations to prospective presenters. Since it emails multiple recipients, the question arises: in a development environment, how do you test email sending without using real accounts?&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll learn how to configure a fake SMTP server and run email tests for Python apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure a Local SMTP Server
&lt;/h2&gt;

&lt;p&gt;I'm using &lt;a href="https://github.com/axllent/mailpit" rel="noopener noreferrer"&gt;Mailpit&lt;/a&gt;, an Open Source email testing tool. It can be installed following the instructions in the &lt;a href="https://github.com/axllent/mailpit#installation" rel="noopener noreferrer"&gt;Installation&lt;/a&gt; section of the official repository, or by using Docker.&lt;/p&gt;

&lt;p&gt;To ensure your data survives a container restart, run the Docker container with a volume to enable persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d \
  --name mailpit \
  -p 1025:1025 \
  -p 8025:8025 \
  -v $(pwd)/mailpit-data:/data \
  axllent/mailpit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server listens for SMTP traffic on port 1025, while the web-based dashboard is accessible via port 8025.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Tests for Python
&lt;/h2&gt;

&lt;p&gt;Let's create a script to test our email logic. The script will perform the following tasks: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a list of random names and emails using Faker&lt;/li&gt;
&lt;li&gt;Construct MIME headers (From, To, Subject)&lt;/li&gt;
&lt;li&gt;Render the HTML body&lt;/li&gt;
&lt;li&gt;Establish an SMTP connection and transmit the data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Create a recipient list
&lt;/h3&gt;

&lt;p&gt;First, we generate a list of participants.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from faker import Faker

fake = Faker('en_US')

if __name__ == "__main__":
    participants = [(fake.name(), fake.ascii_company_email()) for _ in range(10)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated data will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name                      | Email                         
-------------------------------------------------------
Jessica Powell            | ryan40@atkinson.com           
Chelsey Glover            | pstevens@hurst.com            
Sheryl Williams           | kenneth61@williams-jacobson.com
Paula Boyd                | larsontheresa@dean.com        
Maxwell Kelly             | justinestrada@willis.org      
Carl Morrow               | pmorris@cross.biz             
David Webb                | abigailfields@holt.com        
Tyler Wolfe               | williamsanna@martinez.info    
Joshua Medina             | williamsrodney@medina.biz     
Mrs. Donna Butler         | williamsmartin@eaton.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Construct MIME Headers
&lt;/h3&gt;

&lt;p&gt;We use Python's built-in &lt;code&gt;email.mime&lt;/code&gt; package to structure the message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
from email.mime.multipart import MIMEMultipart

def send_simple_email(recipient_email, recipient_name):
    SENDER_EMAIL = "hello@name.com"

    msg = MIMEMultipart()
    msg['From'] = SENDER_EMAIL
    msg['To'] = recipient_email
    msg['Subject'] = f"Invitation: {recipient_name}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Render the HTML body
&lt;/h3&gt;

&lt;p&gt;We attach the HTML content to our MIME message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
from email.mime.text import MIMEText

def send_simple_email(recipient_email, recipient_name):
    ...

    html_body = f"""
    &amp;lt;html&amp;gt;
        &amp;lt;body style="font-family: sans-serif;"&amp;gt;
            &amp;lt;h2 style="color: #2c3e50;"&amp;gt;Hello, {recipient_name}!&amp;lt;/h2&amp;gt;
            &amp;lt;p&amp;gt;You are formally invited to participate as a speaker at our next event.&amp;lt;/p&amp;gt;
            &amp;lt;p&amp;gt;This is a test email captured locally by &amp;lt;strong&amp;gt;Mailpit&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;
        &amp;lt;/body&amp;gt;
    &amp;lt;/html&amp;gt;
    """
    msg.attach(MIMEText(html_body, 'html'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Establish SMTP connection and transmit email data
&lt;/h3&gt;

&lt;p&gt;Finally, we connect to the local Mailpit server and send the message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import smtplib
...

def send_simple_email(recipient_email, recipient_name):
    ...

    SMTP_SERVER = "localhost"
    SMTP_PORT = 1025

    try:
        with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
            server.send_message(msg)
            return True
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

if __name__ == "__main__":
    ...
    print(f"\n📧 Starting email delivery to {len(participants)} recipients...")

    for name, email in participants:
        if send_simple_email(email, name):
            print(f" ✅ Sent: {email}")

    print("\n🚀 Check your emails at: http://localhost:8025")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Complete Script
&lt;/h2&gt;

&lt;p&gt;Here is the full implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import smtplib
from faker import Faker
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

fake = Faker('en_US')

def send_simple_email(recipient_email, recipient_name):
    SENDER_EMAIL = "hello@name.com"

    msg = MIMEMultipart()
    msg['From'] = SENDER_EMAIL
    msg['To'] = recipient_email
    msg['Subject'] = f"Invitation: {recipient_name}"

    html_body = f"""
    &amp;lt;html&amp;gt;
        &amp;lt;body style="font-family: sans-serif;"&amp;gt;
            &amp;lt;h2 style="color: #2c3e50;"&amp;gt;Hello, {recipient_name}!&amp;lt;/h2&amp;gt;
            &amp;lt;p&amp;gt;You are formally invited to participate as a speaker at our next event.&amp;lt;/p&amp;gt;
            &amp;lt;p&amp;gt;This is a test email captured locally by &amp;lt;strong&amp;gt;Mailpit&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;
        &amp;lt;/body&amp;gt;
    &amp;lt;/html&amp;gt;
    """
    msg.attach(MIMEText(html_body, 'html'))

    SMTP_SERVER = "localhost"
    SMTP_PORT = 1025

    try:
        with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
            server.send_message(msg)
            return True
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

if __name__ == "__main__":
    participants = [(fake.name(), fake.ascii_company_email()) for _ in range(10)]


    print(f"\n📧 Starting email delivery to {len(participants)} recipients...")

    for name, email in participants:
        if send_simple_email(email, name):
            print(f" ✅ Sent: {email}")

    print("\n🚀 Check your emails at: http://localhost:8025")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Viewing the Results
&lt;/h2&gt;

&lt;p&gt;After running the script, navigate to &lt;a href="http://localhost:8025" rel="noopener noreferrer"&gt;http://localhost:8025&lt;/a&gt; in your browser. You will find the Mailpit dashboard with an inbox containing all the successfully intercepted test emails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65cq9y72cs6d5uvyzgco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65cq9y72cs6d5uvyzgco.png" alt="Mailpit Dashboard" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can safely test email features before deploying to production.&lt;/p&gt;
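&lt;p&gt;Beyond the dashboard, Mailpit also exposes a REST API, which is handy for asserting delivery in automated tests instead of checking the browser. A minimal sketch, assuming the default &lt;code&gt;/api/v1/messages&lt;/code&gt; endpoint and its &lt;code&gt;total&lt;/code&gt; field (check the API documentation for your Mailpit version; the helper names are mine):&lt;/p&gt;

```python
import json
import urllib.request

MAILPIT_API = "http://localhost:8025/api/v1/messages"

def parse_message_total(payload: bytes) -> int:
    """Pull the total message count out of a message-list response body."""
    return json.loads(payload)["total"]

def inbox_total(url: str = MAILPIT_API) -> int:
    """Ask the running Mailpit instance how many emails it has captured."""
    with urllib.request.urlopen(url) as resp:
        return parse_message_total(resp.read())
```

&lt;p&gt;After running the script above, &lt;code&gt;inbox_total()&lt;/code&gt; should match the number of participants, so a test suite can assert on it directly.&lt;/p&gt;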

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>docker</category>
    </item>
    <item>
      <title>Manjaro: How to Reinstall GRUB on a BTRFS and UEFI System</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Thu, 22 Jan 2026 02:38:37 +0000</pubDate>
      <link>https://dev.to/letstalkoss/manjaro-how-to-reinstall-grub-on-a-btrfs-and-uefi-system-52nm</link>
      <guid>https://dev.to/letstalkoss/manjaro-how-to-reinstall-grub-on-a-btrfs-and-uefi-system-52nm</guid>
      <description>&lt;p&gt;I'm running Manjaro alongside Windows on a Dell XPS 13 Plus (9320), using BTRFS for the &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;/home&lt;/code&gt; mount points and a FAT32 partition for the EFI system. After an update, I lost the GRUB bootloader. In this tutorial, you'll learn how to reinstall GRUB on a BTRFS and UEFI system using a Live USB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identify Partitions
&lt;/h2&gt;

&lt;p&gt;To begin, check which partitions belong to your Manjaro installation. Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lsblk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get an output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0        7:0    0  92.9M  1 loop /run/miso/sfs/livefs
loop1        7:1    0   1.2G  1 loop /run/miso/sfs/mhwdfs
loop2        7:2    0   1.8G  1 loop /run/miso/sfs/desktopfs
loop3        7:3    0 939.3M  1 loop /run/miso/sfs/rootfs
sda          8:16   1  14.9G  0 disk /run/miso/bootmnt
sda1         8:17   1   4.2G  0 part
sda2         8:18   1     4M  0 part
nvme0n1    259:0    0 476.9G  0 disk
nvme0n1p1  259:1    0   240M  0 part
nvme0n1p2  259:2    0   128M  0 part
nvme0n1p3  259:3    0 299.2G  0 part
nvme0n1p4  259:4    0   1.2G  0 part
nvme0n1p5  259:5    0 104.3G  0 part
nvme0n1p6  259:6    0   300M  0 part
nvme0n1p7  259:7    0   1.1G  0 part
nvme0n1p8  259:8    0    52G  0 part
nvme0n1p9  259:9    0  17.1G  0 part
nvme0n1p10 259:10   0   1.4G  0 part
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my case, the partition layout is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;nvme0n1p6 --&amp;gt; EFI system&lt;/li&gt;
&lt;li&gt;nvme0n1p8 --&amp;gt; &lt;code&gt;/&lt;/code&gt; partition&lt;/li&gt;
&lt;li&gt;nvme0n1p5 --&amp;gt; &lt;code&gt;/home&lt;/code&gt; partition&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mount the Top-level BTRFS Volume
&lt;/h2&gt;

&lt;p&gt;As Manjaro is installed on a BTRFS partition, identify which subvolumes are available in the root partition to mount the system correctly.&lt;/p&gt;

&lt;p&gt;First, mount the &lt;code&gt;/&lt;/code&gt; partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount -o subvolid=5 /dev/nvme0n1p8 /mnt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p8&lt;/code&gt; with the corresponding &lt;code&gt;/&lt;/code&gt; partition of your installation.&lt;/p&gt;

&lt;p&gt;Now, list available subvolumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo btrfs subvolume list /mnt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get an output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ID 256 gen 87366 top level 5 path @
ID 257 gen 87347 top level 5 path @cache
ID 258 gen 87365 top level 5 path @log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have the &lt;code&gt;/home&lt;/code&gt; directory in the same root partition, you may get the &lt;code&gt;@home&lt;/code&gt; subvolume listed.&lt;/p&gt;
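&lt;p&gt;If an &lt;code&gt;@home&lt;/code&gt; subvolume does appear in that list, it can be mounted the same way once the root subvolume is in place. A minimal sketch, assuming the same &lt;code&gt;/dev/nvme0n1p8&lt;/code&gt; device (adjust to your layout):&lt;br&gt;
&lt;/p&gt;

```shell
# Only needed if 'btrfs subvolume list' showed an @home subvolume
sudo mount -o subvol=@home /dev/nvme0n1p8 /mnt/home
```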

&lt;h2&gt;
  
  
  Mount the Correct Root Subvolume
&lt;/h2&gt;

&lt;p&gt;Once identified, remount using the correct root subvolume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo umount /mnt
sudo mount -o subvol=@ /dev/nvme0n1p8 /mnt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p8&lt;/code&gt; with the corresponding &lt;code&gt;/&lt;/code&gt; partition of your installation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create Mount Points
&lt;/h2&gt;

&lt;p&gt;These folders should already exist; we are just ensuring they are present before mounting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mkdir -p /mnt/var/cache
sudo mkdir -p /mnt/var/log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Mount the Subvolumes
&lt;/h2&gt;

&lt;p&gt;Mount the cache subvolume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount -o subvol=@cache /dev/nvme0n1p8 /mnt/var/cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p8&lt;/code&gt; with the corresponding &lt;code&gt;/&lt;/code&gt; partition of your installation.&lt;/p&gt;

&lt;p&gt;Mount the log subvolume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount -o subvol=@log /dev/nvme0n1p8 /mnt/var/log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p8&lt;/code&gt; with the corresponding &lt;code&gt;/&lt;/code&gt; partition of your installation.&lt;/p&gt;

&lt;p&gt;(Optional) Mount the separate home partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount /dev/nvme0n1p5 /mnt/home
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p5&lt;/code&gt; with the corresponding &lt;code&gt;/home&lt;/code&gt; partition of your installation.&lt;/p&gt;

&lt;p&gt;Now, verify that all necessary BTRFS subvolumes are mounted and that the mount options are correct (specifically, ensure they are listed as rw for read-write). Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mount | grep btrfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get an output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/dev/nvme0n1p8 on /mnt type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@)
/dev/nvme0n1p8 on /mnt/var/cache type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@cache)
/dev/nvme0n1p8 on /mnt/var/log type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=258,subvol=/@log)
/dev/nvme0n1p5 on /mnt/home type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Mount the EFI Partition
&lt;/h2&gt;

&lt;p&gt;Finally, mount the EFI partition by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount /dev/nvme0n1p6 /mnt/boot/efi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/dev/nvme0n1p6&lt;/code&gt; with the corresponding EFI partition of your installation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bind System Directories
&lt;/h2&gt;

&lt;p&gt;Next, bind the system directories. These 'virtual' folders act as a bridge between the Live USB and your Manjaro installation, allowing the repair tools to interact with your hardware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount --bind /dev  /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys  /mnt/sys
sudo mount --bind /run  /mnt/run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Chroot and Reinstall GRUB
&lt;/h2&gt;

&lt;p&gt;Now that all subvolumes and partitions are mounted, and system directories are bound, you can access your Manjaro installation and reinstall GRUB. Run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo chroot /mnt
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=Manjaro --recheck
grub-mkconfig -o /boot/grub/grub.cfg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GRUB is now reinstalled, and the boot menu should appear after rebooting.&lt;/p&gt;
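&lt;p&gt;Before rebooting, it is good practice to leave the chroot and unmount everything cleanly. A minimal sketch:&lt;br&gt;
&lt;/p&gt;

```shell
exit                 # leave the chroot shell
sudo umount -R /mnt  # recursively unmount all partitions under /mnt
reboot
```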

</description>
      <category>tutorial</category>
      <category>linux</category>
    </item>
    <item>
      <title>SVG: Image Editing and Conversion with Python</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Mon, 20 Oct 2025 23:23:40 +0000</pubDate>
      <link>https://dev.to/letstalkoss/svg-image-editing-and-convertion-with-python-3482</link>
      <guid>https://dev.to/letstalkoss/svg-image-editing-and-convertion-with-python-3482</guid>
      <description>&lt;p&gt;For creating or editing images for social media, I generally use any of the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://inkscape.org/" rel="noopener noreferrer"&gt;Inkscape&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gimp.org/" rel="noopener noreferrer"&gt;GIMP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://krita.org/" rel="noopener noreferrer"&gt;Krita&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I usually prefer SVG over other formats, and I have some Instagram templates that I modify whenever I announce a new blog post.&lt;/p&gt;

&lt;p&gt;I keep a GitLab repository where I upload a JSON file with the content of each new post. A Python script there takes care of publishing to DEV and then shares the link in a new post on X. I needed another script to edit the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr727fcnmnyqlupau5ogc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr727fcnmnyqlupau5ogc.png" alt="New Blog Post Template" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Image Editing
&lt;/h2&gt;

&lt;p&gt;In the SVG, there's a &lt;code&gt;&amp;lt;tspan&amp;gt;&lt;/code&gt; element with &lt;code&gt;id='tspan1'&lt;/code&gt; that contains the placeholder text &lt;code&gt;Text&lt;/code&gt; to be modified. To edit it, you can use the &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; Python module as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess
import os
import tempfile
import xml.etree.ElementTree as ET

file_name = "new-post.svg"
new_title = "SVG: Image Editing and Conversion with Python"

with open(file_name, 'r', encoding='utf-8') as file:
    svg_content = file.read()

root = ET.fromstring(svg_content)

target_element = root.find(".//*[@id='tspan1']")

target_element.text = new_title

modified_svg = ET.tostring(root, encoding='unicode')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The previous code block does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the content of the SVG file&lt;/li&gt;
&lt;li&gt;Treat the content of the SVG file as XML with the &lt;code&gt;ElementTree&lt;/code&gt; module&lt;/li&gt;
&lt;li&gt;Find the &lt;code&gt;&amp;lt;tspan&amp;gt;&lt;/code&gt; element&lt;/li&gt;
&lt;li&gt;Replace the title&lt;/li&gt;
&lt;li&gt;Convert the modified XML back to string&lt;/li&gt;
&lt;/ul&gt;
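&lt;p&gt;One caveat worth noting: SVG files declare an XML namespace, and &lt;code&gt;ET.tostring&lt;/code&gt; will prefix every tag with &lt;code&gt;ns0:&lt;/code&gt; unless that namespace is registered first. A minimal, self-contained sketch (the tree built here is an illustrative stand-in for the real template):&lt;br&gt;
&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Register the SVG namespace as the default so serialization keeps
# plain 'svg'/'tspan' tag names instead of 'ns0:svg'/'ns0:tspan'
ET.register_namespace("", SVG_NS)

# Build a tiny SVG tree programmatically (stand-in for new-post.svg)
root = ET.Element(f"{{{SVG_NS}}}svg")
text = ET.SubElement(root, f"{{{SVG_NS}}}text")
tspan = ET.SubElement(text, f"{{{SVG_NS}}}tspan", {"id": "tspan1"})
tspan.text = "Text"

# Same lookup pattern as in the script above: wildcard find by id
target_element = root.find(".//*[@id='tspan1']")
target_element.text = "New Title"

modified_svg = ET.tostring(root, encoding="unicode")
```

&lt;p&gt;Because &lt;code&gt;register_namespace&lt;/code&gt; only affects serialization, calling it once anywhere before &lt;code&gt;ET.tostring&lt;/code&gt; is enough for the re-serialized file to keep a clean SVG root element.&lt;/p&gt;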

&lt;h2&gt;
  
  
  Image Conversion
&lt;/h2&gt;

&lt;p&gt;Now to convert the modified SVG to PNG:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output_file = "new_post.png"
width = 1080
height = 1080

with tempfile.NamedTemporaryFile(mode='w', suffix='.svg', delete=False, encoding='utf-8') as temp_svg:
    temp_svg.write(modified_svg)
    temp_svg_path = temp_svg.name

command = [
    "inkscape",
    temp_svg_path,
    f"--export-filename={output_file}",
    f"--export-width={width}",
    f"--export-height={height}"
]

result = subprocess.run(command, check=True, capture_output=True, text=True)

os.remove(temp_svg_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The previous code does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a temporary SVG file with the content of the &lt;code&gt;modified_svg&lt;/code&gt; variable&lt;/li&gt;
&lt;li&gt;Run Inkscape from the command line to convert the file to PNG&lt;/li&gt;
&lt;li&gt;Remove the temporary file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you run the previous script, you will get the following picture in the same directory:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukohwhyouyghmlohmnc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukohwhyouyghmlohmnc9.png" alt="New Blog Post Template" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Local Firestore Development with the gcloud Emulator</title>
      <dc:creator>Mario García</dc:creator>
      <pubDate>Mon, 13 Oct 2025 07:13:27 +0000</pubDate>
      <link>https://dev.to/mattdark/local-firestore-development-with-the-gcloud-emulator-52bl</link>
      <guid>https://dev.to/mattdark/local-firestore-development-with-the-gcloud-emulator-52bl</guid>
      <description>&lt;p&gt;Worked on a &lt;a href="https://gitlab.com/mattdark/pharmacare" rel="noopener noreferrer"&gt;dashboard for a pharmacy&lt;/a&gt;, deployed on GitLab Pages, for which I chose the following stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Cloud Firestore&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had previously used Firestore to enable &lt;a href="https://blowfish.page/docs/firebase-views/" rel="noopener noreferrer"&gt;Views and Likes&lt;/a&gt; for &lt;a href="https://blowfish.page" rel="noopener noreferrer"&gt;Blowfish&lt;/a&gt;, a Hugo theme I configured for my website. This time I needed a free cloud database solution to store information about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Locations&lt;/li&gt;
&lt;li&gt;Medicines&lt;/li&gt;
&lt;li&gt;Orders&lt;/li&gt;
&lt;li&gt;Sales&lt;/li&gt;
&lt;li&gt;Users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On Firebase's free Spark plan, you can't create a second database within the same project, which you would need in order to run tests against something other than the default database. One workaround is to create a separate project dedicated to testing.&lt;/p&gt;

&lt;p&gt;A better solution is the Firestore Emulator, intended for local testing and provided by the &lt;a href="https://cloud.google.com/sdk" rel="noopener noreferrer"&gt;Google Cloud CLI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Google Cloud CLI
&lt;/h2&gt;

&lt;p&gt;Follow the instructions from the &lt;a href="https://cloud.google.com/sdk/docs/install" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On Linux, download the corresponding file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-x86_64.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extract the contents of the file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar -xf google-cloud-cli-linux-x86_64.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the installation script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./google-cloud-sdk/install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update your gcloud CLI installation to get the latest features.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud components update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Backup your database
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a directory to store the backup and credentials.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir local-firestore
cd local-firestore
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install &lt;code&gt;firestore-backfire&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g firestore-backfire
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Get Service Account credentials.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the Firebase Console → Project Settings → Service Accounts.&lt;/li&gt;
&lt;li&gt;Click Generate new private key, download the JSON file, and rename it to &lt;code&gt;credentials.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Copy the file to the directory created above.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Export all documents from specific collections to a local &lt;code&gt;.ndjson&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backfire export ./data.ndjson \
  -P your-project-id \
  -K ./credentials.json \
  --paths users products
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Here, &lt;code&gt;data.ndjson&lt;/code&gt; is the name assigned to the backup file, &lt;code&gt;your-project-id&lt;/code&gt; is the ID of your project (shown in the Firebase console), and &lt;code&gt;credentials.json&lt;/code&gt; is the Service Account credentials file downloaded earlier.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;users&lt;/code&gt; and &lt;code&gt;products&lt;/code&gt; are the collections being backed up; replace them with the collections in your database.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Run the emulator
&lt;/h2&gt;

&lt;p&gt;Run the emulator with the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud emulators firestore start --host-port=127.0.0.1:8304
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
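&lt;p&gt;With the emulator running, you can sanity-check from another terminal that it is listening on the chosen port (a quick check, not part of the official workflow):&lt;br&gt;
&lt;/p&gt;

```shell
# The emulator answers plain HTTP on the host/port given above
curl http://127.0.0.1:8304/
```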



&lt;h2&gt;
  
  
  Restore your database
&lt;/h2&gt;

&lt;p&gt;To restore your database from the &lt;code&gt;data.ndjson&lt;/code&gt; file, run the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backfire import ./data.ndjson \
  -E 127.0.0.1:8304 \
  --mode overwrite \
  -P your-project-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Update your app
&lt;/h2&gt;

&lt;p&gt;I have a &lt;code&gt;.env.local&lt;/code&gt; file with the following variables set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VITE_FIREBASE_API_KEY=""
VITE_FIREBASE_AUTH_DOMAIN=""
VITE_FIREBASE_PROJECT_ID=""
VITE_FIREBASE_STORAGE_BUCKET=""
VITE_FIREBASE_MESSAGING_SENDER_ID=""
VITE_FIREBASE_APP_ID=""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added the following ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VITE_USE_EMULATOR=true
VITE_FIRESTORE_EMULATOR_HOST=127.0.0.1
VITE_FIRESTORE_EMULATOR_PORT=8304
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the &lt;code&gt;firebase.ts&lt;/code&gt; file, import &lt;code&gt;connectFirestoreEmulator&lt;/code&gt; from &lt;code&gt;firebase/firestore&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {
  getFirestore,
  Firestore,
  collection,
  doc,
  getDoc,
  setDoc,
  updateDoc,
  deleteDoc,
  query,
  where,
  getDocs,
  onSnapshot,
  writeBatch,
  connectFirestoreEmulator
} from 'firebase/firestore';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After these lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    app = initializeApp(firebaseConfig);
    auth = getAuth(app);
    db = getFirestore(app);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added the following lines to detect when the project is running locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    if (import.meta.env.VITE_USE_EMULATOR === 'true') {
      const EMULATOR_HOST = import.meta.env.VITE_FIRESTORE_EMULATOR_HOST || "127.0.0.1";
      const EMULATOR_PORT = Number(import.meta.env.VITE_FIRESTORE_EMULATOR_PORT) || 8304;

      connectFirestoreEmulator(db, EMULATOR_HOST, EMULATOR_PORT);
      console.log(`Firebase.ts: Connected to Firestore Emulator at http://${EMULATOR_HOST}:${EMULATOR_PORT}`);
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when the app runs locally, it will connect to the Firestore emulator.&lt;/p&gt;
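&lt;p&gt;A typical local session then looks like this (assuming a standard Vite &lt;code&gt;dev&lt;/code&gt; script in &lt;code&gt;package.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

```shell
# Terminal 1: start the emulator (and restore the backup as shown earlier)
gcloud emulators firestore start --host-port=127.0.0.1:8304

# Terminal 2: run the app; .env.local has VITE_USE_EMULATOR=true
npm run dev
```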

</description>
      <category>tutorial</category>
      <category>firestore</category>
      <category>firebase</category>
      <category>gcloud</category>
    </item>
  </channel>
</rss>
