Serverless Research Paper Intelligence: Docling, Lambda Containers, and Amazon Bedrock

1. 🚀 Introduction

Processing scientific PDFs is not as simple as extracting text.

Many papers include tables, multiple columns, formulas, figures, and structures that can easily break when we use traditional extractors.
The problem becomes even bigger when those documents are private. We do not always want to depend completely on multimodal models to analyze them, and the cost can also grow quickly when we work with many files.

A few months ago, I attended PyData Berlin and during one of the talks I discovered IBM Docling, an open source project focused on intelligent document processing. What caught my attention the most was its ability to extract structured information from complex PDFs, especially scientific documents with tables, multiple columns, formulas, and layouts that are difficult to process with traditional tools.

From that moment, I started thinking about how to bring this type of processing to the cloud in a simple and scalable way, while also keeping costs under control. Some current solutions for analyzing complex documents with generative AI rely heavily on multimodal models, but in scenarios where we work with large volumes of papers or private documents, cost and privacy can quickly become a problem.

If you have read some of my previous articles, you have probably seen that I like to build content around a real use case. In this tutorial, I decided to work with scientific papers related to research on GLP-1 receptor agonists, a class of medications widely studied for type 2 diabetes and obesity.

These treatments are currently very popular, largely because many people use them for weight loss.


The objective of the tutorial

The idea is not to build a generic search engine over the internet, but something much more interesting: a private knowledge base where you can query only your own research documents in a secure environment.
To solve this, we are going to build an architecture based on:

  • 📦 AWS Lambda Containers
  • 📑 Amazon Bedrock Knowledge Bases
  • 🐣 PDF processing with Docling
  • 🪣 Storage in Amazon S3
  • ✂️ Chunking strategies to improve information retrieval

During the tutorial, I will also show several real problems that I found while implementing this solution:

  • 〰️ size limits in Lambda,
  • 〰️ timeouts caused by model downloads,
  • 〰️ Docker image optimization,
  • 〰️ scientific document processing,
  • 〰️ and architecture decisions to keep a serverless and low cost approach.

The final objective will be to transform a set of scientific papers into a knowledge base that can be queried using natural language. This will allow us to ask questions about adverse effects, clinical criteria, study results, and comparisons between different research papers.


2. 🧪 Use case

In this tutorial, we are going to work with a set of scientific papers related to research on GLP-1 receptor agonists. These medications mimic GLP-1 (Glucagon-Like Peptide-1), a natural hormone involved in glucose regulation, insulin secretion, and the feeling of fullness.

In recent years, different treatments based on this family of molecules have appeared, and a large number of clinical studies, academic papers, and research documents have been published. These documents are related to cardiovascular outcomes, weight loss, adverse effects, and inclusion or exclusion criteria in clinical trials.

The objective of this use case is not to build a search engine over the internet or use public information in real time. The idea is to work with a private and curated set of scientific documents, simulating a scenario where researchers, medical teams, or research areas need to query only their own papers in a secure environment.

For this MVP, I am going to use 10 public papers as an example dataset, but the architecture is designed for scenarios where the documents can be private or belong to internal research processes.
From these documents, we are going to build a knowledge base that allows queries using natural language, for example:

  • 〰️ identify adverse effects reported in different studies,
  • 〰️ compare results between treatments,
  • 〰️ validate exclusion criteria in clinical trials,
  • 〰️ analyze cardiovascular outcomes,
  • 〰️ retrieve specific information across multiple scientific papers.

3. 🏗️ Solution Architecture

Before going into the theoretical concepts, we are going to describe the solution that we will build.

This solution is based on a serverless architecture that processes scientific papers in PDF format and later uses them as input for an Amazon Bedrock Knowledge Base to build a RAG system.

The architecture clearly separates the ingestion and processing flow from the intelligent query flow, while keeping the solution simple and scalable.

The following blueprint shows how each component connects inside the complete pipeline.

In summary, this pipeline processes PDF files using a Python Docker image with Docling, running inside an AWS Lambda function based on a container image.

This Lambda function transforms the files into structured documents in Markdown.

Then, these documents are stored in Amazon S3 and indexed by Amazon Bedrock, which generates embeddings and allows semantic queries over the content.


4. 📑 Docling: structured document extraction

One of the main challenges when working with scientific PDFs is that they are not “simple” documents. They are full of tables, columns, formulas, figures, and complex layouts that are not always preserved correctly when text is extracted.

IBM Docling is an open source library designed for PDF extraction and document structuring. Its goal is not only to extract text, but also to convert complex documents into a structured representation that can be used in artificial intelligence pipelines and RAG systems.

Instead of returning messy plain text, Docling tries to preserve the structure of the document, including the reading order, tables, formulas, images, and other key elements of the content.

The following image summarizes some of the key benefits of using Docling for complex document processing.


Why use Docling?

Traditional tools like PyPDF, PDFPlumber, or classic OCR are usually enough for simple documents, but they can struggle when working with scientific papers that have complex layouts.

In these cases, important information can be lost, such as:

  • 〰️ table structure
  • 〰️ column separation
  • 〰️ relationship between text and figures
  • 〰️ mathematical formulas

Docling appears as an alternative that tries to solve exactly these problems, generating a much more consistent output for later analysis.


Docling features

Below, you can find the main features published by the library on Hugging Face:

  1. 🏷️ DocTags for Efficient Tokenization – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with DoclingDocuments.
  2. 🔍 OCR (Optical Character Recognition) – Extracts text accurately from images.
  3. 📐 Layout and Localization – Preserves document structure and document element bounding boxes.
  4. 💻 Code Recognition – Detects and formats code blocks, including indentation.
  5. 🔢 Formula Recognition – Identifies and processes mathematical expressions.
  6. 📊 Chart Recognition – Extracts and interprets chart data.
  7. 📑 Table Recognition – Supports column and row headers for structured table extraction.
  8. 🖼️ Figure Classification – Differentiates figures and graphical elements.
  9. 📝 Caption Correspondence – Links captions to relevant images and figures.
  10. 📜 List Grouping – Organizes and structures list elements correctly.
  11. 📄 Full-Page Conversion – Processes entire pages for comprehensive document conversion including all page elements (code, equations, tables, charts etc.)
  12. 🔲 OCR with Bounding Boxes – OCR regions using a bounding box.
  13. 📂 General Document Processing – Trained for both scientific and non-scientific documents.

🏥 Practical example: processing a medical record with Docling

In this example, we will use a synthetically generated clinical record in PDF format to show how Docling can extract and structure information from a healthcare document.

All patient data, medical records, and clinical findings are completely fictional and were created only for educational purposes. No real patient information was used.

This example represents a common use case in the healthcare industry, where medical documents need to be processed, structured, and prepared for analysis with AI.

In the next steps, we will use Docling to:

  • 〰️ load and convert the PDF
  • 〰️ explore the document structure and identify sections
  • 〰️ extract structured patient data into a pandas DataFrame

📌The following image shows part of the clinical record that we will process in this example.


📑 Loading and converting the PDF

In this step, we load the clinical record PDF using Docling's DocumentConverter.

Docling automatically detects the document structure and exports the result in two formats:

  • 〰️ Markdown: a human-readable output to preview the content
  • 〰️ Dictionary: programmatic access to text, tables, images, and metadata

This structured output is what makes Docling more powerful than a basic PDF text extractor.

from docling.document_converter import DocumentConverter
import pandas as pd

# Convert the PDF with Docling's default pipeline
converter = DocumentConverter()
result = converter.convert("clinical_history_structured.pdf")

# Export to Markdown for a human-readable preview
data_markdown = result.document.export_to_markdown()

# Export to a dictionary for programmatic access
data_dict = result.document.export_to_dict()
texts = data_dict['texts']

🗂️ Exploring document sections

Every clinical record is organized into sections. Here, we extract all the section headers detected by Docling, such as Patient Identification, Chief Complaint, and Laboratory Results.

This gives us:

  • 〰️ A map of the document structure
  • 〰️ The ability to target specific sections for downstream processing
# Collect the text of every item that Docling labelled as a section header
[item['text'] for item in data_dict['texts'] if item['label'] == 'section_header']

Output:
['CITYVIEW MEDICAL CENTER CLINICAL HISTORY AND RECORD',
 '1. PATIENT IDENTIFICATION',
 '4. PAST MEDICAL HISTORY',
 '5. MEDICATIONS',
 '6. ALLERGIES',
 '2. CHIEF COMPLAINT',
 '3. HISTORY OF PRESENT ILLNESS',
 '7. FAMILY HISTORY',
 '8. SOCIAL HISTORY',
 '9. REVIEW OF SYSTEMS',
 '10. PHYSICAL EXAMINATION',
 '12. LABORATORY RESULTS',
 '13. ASSESSMENT',
 '14. PLAN',
 '11. IMAGING']
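
Note that the headers come back in the order Docling detected them on the page, not in numeric order (for example, section 4 appears before section 2). If a numerically ordered map of the document is needed, a minimal sketch like the following could sort them by their leading number. The helper name sort_section_headers is hypothetical and the list below is a shortened copy of the output above.

```python
import re


def sort_section_headers(headers):
    """Order headers by their leading section number; unnumbered titles stay first."""
    def sort_key(header):
        match = re.match(r"\s*(\d+)\.", header)
        # Headers without a number (such as the document title) sort before numbered ones
        return (0, 0) if match is None else (1, int(match.group(1)))

    return sorted(headers, key=sort_key)


headers = [
    "CITYVIEW MEDICAL CENTER CLINICAL HISTORY AND RECORD",
    "1. PATIENT IDENTIFICATION",
    "4. PAST MEDICAL HISTORY",
    "2. CHIEF COMPLAINT",
    "11. IMAGING",
    "10. PHYSICAL EXAMINATION",
]
ordered = sort_section_headers(headers)
for header in ordered:
    print(header)
```

Sorting on the parsed integer rather than on the raw string avoids placing "10." before "2.".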

🧩 Extracting patient data as a structured table

Now we extract the content of the first section, Patient Identification, by filtering the items that belong to #/groups/0.

Docling preserves the key value layout of the original PDF, so we can split the flat list into field names and values using Python slice notation.

The result is a clean pandas DataFrame ready for:

  • 〰️ Analysis
  • 〰️ Storage
  • 〰️ Downstream AI processing
# Filter group 0
group_0 = [item['orig'] for item in texts 
           if item.get('parent', {}).get('$ref') == '#/groups/0']
# Convert the flat list into key value pairs
keys   = group_0[0::2]  
values = group_0[1::2]  
df = pd.DataFrame({
    'field': [k.replace(':', '').strip() for k in keys],
    'value': values
})
df
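
The slicing above assumes that the flat list strictly alternates between a label and its value. As a hedged illustration of that assumption, the following sketch pairs the items explicitly and fails loudly when the list has an odd length. The function name pair_fields and the sample values are hypothetical and are not taken from the clinical record.

```python
def pair_fields(flat_items):
    """Pair alternating 'Label:' / value strings into (field, value) tuples."""
    if len(flat_items) % 2 != 0:
        # An odd length means a label or a value is missing, so slicing would misalign
        raise ValueError("Expected alternating label/value items")
    keys = flat_items[0::2]    # even positions: field labels
    values = flat_items[1::2]  # odd positions: field values
    return [(k.replace(":", "").strip(), v.strip()) for k, v in zip(keys, values)]


# Hypothetical items, shaped like the output of the group filter
sample = ["Name:", "Jane Doe", "Age:", "54", "MRN:", "000123"]
pairs = pair_fields(sample)
print(pairs)
```

The resulting list of tuples can be passed to pd.DataFrame(pairs, columns=["field", "value"]) to obtain the same two-column table.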


Tables that Docling detects can also be exported directly to a pandas DataFrame:

result.document.tables[0].export_to_dataframe()


5. ⚡ AWS Lambda

AWS Lambda is a serverless service that allows you to run code without managing infrastructure. It scales automatically and you only pay for what you use.
It is commonly used for file processing, service integration, scheduled tasks, and real time event processing.

However, even though it is one of the most used services in the serverless ecosystem, some limitations appear quickly when we start working with heavier workloads or complex dependencies.
Some of the main limitations are:

  • 〰️ memory and CPU limits
  • 〰️ maximum execution timeout
  • 〰️ deployment package size restrictions
  • 〰️ the need to use ZIP files or Layers for dependencies
  • 〰️ cold starts in heavier workloads

These restrictions mean that, in some cases, traditional Lambda is not enough to run workloads such as intensive PDF processing or libraries with large dependencies.

6. 🐳 AWS Lambda Containers

To solve part of these limitations, AWS Lambda allows you to run functions using container images instead of ZIP packages.
This approach allows you to package the function as a Docker image, push it to Amazon Elastic Container Registry, and run it directly from Lambda.
The main advantage is that it raises the deployment size limit from 250 MB (an unzipped ZIP package, including Layers) to 10 GB for a container image. This makes it possible to include heavy dependencies, predownloaded models, or complex libraries like Docling without needing workarounds with Layers.
In this project, this option is key because it allows us to run Docling inside Lambda without compromising dependencies or the runtime.
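
To make the size trade-off concrete, here is a minimal sketch that picks a packaging option from an estimated dependency size. The limits are Lambda's documented quotas (250 MB unzipped for ZIP packages including Layers, 10 GB for container images). The function name choose_packaging and the example sizes are hypothetical.

```python
ZIP_UNZIPPED_LIMIT_MB = 250           # ZIP package plus Layers, unzipped
CONTAINER_IMAGE_LIMIT_MB = 10 * 1024  # container image limit (10 GB)


def choose_packaging(dependency_size_mb):
    """Return which Lambda packaging option can hold dependencies of this size."""
    if dependency_size_mb <= ZIP_UNZIPPED_LIMIT_MB:
        return "zip"
    if dependency_size_mb <= CONTAINER_IMAGE_LIMIT_MB:
        return "container"
    return "too-large"


# Hypothetical sizes: a small utility library and an ML-heavy dependency set
print(choose_packaging(120))
print(choose_packaging(3500))
```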
The following image summarizes the key benefits of using Lambda Containers for this type of workload.


Deploying a Docling Lambda Container to AWS

As we saw in the previous section, the limitations of traditional Lambda make it difficult to run heavy libraries like Docling using ZIP packages or Layers.
To solve this, we are going to run AWS Lambda from a container image. This allows us to package Docling, its dependencies, and its models inside a Docker image, and deploy it using Amazon Elastic Container Registry (ECR).
In this section, we are going to build the image, push it to AWS, and use it inside Lambda to process our scientific papers.
The following image shows the deployment flow that we will follow step by step.


Prerequisites

Before starting, you need to have:

  1. AWS CLI installed and configured
  2. Docker installed with buildx support
  3. Repository cloned locally
  4. Amazon S3 bucket named docling-papers-tutorial, with the PDFs that we are going to process already uploaded
  5. You also need an IAM user with permissions to create images in ECR and deploy Lambda functions. In the repository, you will find JSON files with the required policies inside iam/user_policies.

GitHub repository

RominaElenaMendezEscobar / docling-bedrock-research-rag

Serverless RAG pipeline for turning research papers into a private knowledge base using Docling, AWS Lambda Containers, Amazon S3, and Amazon Bedrock.


Building the Docker image

Once the repository is cloned, we start by configuring the environment variables required for the deployment.

•••••

Setup

Create a .env file with your credentials:

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_DEFAULT_REGION=us-east-1

Then export the variables:

export $(grep -v '^#' .env | xargs)
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity \
--query Account --output text)
export ECR_REPO_NAME=docling-lambda
export LAMBDA_FUNCTION_NAME=docling-lambda
export IMAGE_NAME=docling-lambda

⚠️ Remember to add .env to your .gitignore.

•••••

Step 1: Verify your AWS identity

Before deploying, verify which AWS account and IAM user are currently configured in your environment.

aws sts get-caller-identity
•••••

Step 2: Authenticate Docker with Amazon ECR

This command generates a temporary ECR authentication token and passes it to docker login, so Docker can push images to your private ECR registry.

aws ecr get-login-password --region $AWS_DEFAULT_REGION | \
  docker login --username AWS --password-stdin \
  $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com

⚠️ This token expires after 12 hours. Run this step again if you get authentication errors.

•••••

Step 3: Build the Docker image

Now we build the Docker image from the Dockerfile.

docker buildx build \
  --platform linux/amd64 \
  --provenance=false \
  --sbom=false \
  --no-cache \
  --load \
  -t $IMAGE_NAME .

The most important flags are:

  • --platform linux/amd64: Builds for the x86_64 architecture, which must match the architecture configured on the Lambda function (Lambda also supports arm64). Setting it explicitly matters if you are building on an Apple Silicon Mac, such as M1, M2, or M3.
  • --provenance=false: Disables build attestation metadata, which can cause issues with Lambda image deployments.
  • --sbom=false: Disables Software Bill of Materials generation, which can also cause issues with Lambda deployments.
  • --no-cache: Builds the image from scratch, ignoring cached layers.
  • --load: Loads the image into your local Docker daemon after building.
  • -t $IMAGE_NAME: Tags the image with the selected image name.
•••••

Step 4: Tag the image for ECR

Before pushing the image, we need to create a new tag that points to the full ECR repository URI.
Docker requires the image name to match the complete ECR URI before it can push the image to the registry.

docker tag $IMAGE_NAME:latest \
$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$ECR_REPO_NAME:latest
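
The ECR URI used in the tag follows a fixed pattern: account ID, Region, repository name, and tag. As an illustration, this small sketch assembles the same string that the shell variables produce. The function name ecr_image_uri and the account ID 123456789012 are placeholders, not values from the tutorial.

```python
def ecr_image_uri(account_id, region, repo_name, tag="latest"):
    """Build the full ECR image URI that docker tag and docker push expect."""
    if not (account_id.isdigit() and len(account_id) == 12):
        # AWS account IDs are 12-digit numbers
        raise ValueError("account_id must be a 12-digit string")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


# Placeholder account ID; the Region and repository match the exports above
uri = ecr_image_uri("123456789012", "us-east-1", "docling-lambda")
print(uri)
```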
•••••

Step 5: Verify that the image exists locally

Before pushing the image to ECR, confirm that it exists in your local Docker environment.

docker images

The image should appear with both tags: the local tag and the ECR tag.

•••••

Step 6: Push the image to ECR

Now we push the image to your private ECR repository.

This step may take several minutes because the Docling image is large due to the ML models included inside the container.

docker push \
$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$ECR_REPO_NAME:latest
•••••

Step 7: Update the Lambda function

Run this step only if you need to update an existing Lambda function with a new image version.

aws lambda update-function-code \
--function-name $LAMBDA_FUNCTION_NAME \
--image-uri $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$ECR_REPO_NAME:latest

This command tells AWS Lambda to use the new image that you just pushed to ECR.
Lambda will pull the image from ECR and deploy it automatically.
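
The same update can be scripted with boto3, which is useful because the update is asynchronous and the function is not ready until Lambda finishes pulling the image. The following is a sketch, not the repository's code: the helper deploy_image is hypothetical, and it expects a client created with boto3.client("lambda").

```python
def deploy_image(lambda_client, function_name, image_uri):
    """Point a container-based function at a new image and wait for the update."""
    response = lambda_client.update_function_code(
        FunctionName=function_name,
        ImageUri=image_uri,
    )
    # Block until Lambda reports that the update has completed
    waiter = lambda_client.get_waiter("function_updated_v2")
    waiter.wait(FunctionName=function_name)
    return response


# Example call (requires boto3 and AWS credentials):
# import boto3
# deploy_image(boto3.client("lambda"), "docling-lambda", "<full ECR image URI>")
```

Passing the client as an argument keeps the helper easy to test with a stub and avoids creating AWS connections at import time.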


7. 🧯 Real problems during the deployment

What I had to solve to run Docling on Lambda

Up to this point, the flow looks relatively simple: build the image, push it to ECR, and deploy the Lambda function.
However, when working with heavy libraries like Docling, several problems started to appear. These problems were related to the image size, the Lambda runtime, and the download of models during execution.
This section summarizes some of the real problems I found during the implementation and the solutions I finally applied.

•••••

Reducing the image size

One of the first problems I ran into was related to the Docker image size. When working with libraries like Docling, which include ML models and multiple heavy dependencies, the final image can grow considerably.
To avoid issues during the build and push process, I added a cleanup step inside the Dockerfile to remove temporary files, __pycache__ folders, and compiled .pyc files.

# Clean up temporary files to reduce image size
RUN find /var/lang/lib/python3.12/site-packages \
-type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true && \
find /var/lang/lib/python3.12/site-packages \
-type f -name "*.pyc" -delete

Although this may look like a small optimization, this type of cleanup helps reduce the final size of images with many Python dependencies.
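
To check how much this cleanup would actually save before changing the Dockerfile, a short script can total the bytes involved. This is an illustrative sketch. The function name reclaimable_bytes is hypothetical, and the site-packages path in the usage comment is the one from the Dockerfile snippet above.

```python
import os


def reclaimable_bytes(root):
    """Sum the size of .pyc files and of everything inside __pycache__ folders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        in_pycache = "__pycache__" in dirpath.split(os.sep)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only, so broken symlinks do not raise
            if (in_pycache or name.endswith(".pyc")) and os.path.isfile(path):
                total += os.path.getsize(path)
    return total


# Example, run inside the container:
# print(reclaimable_bytes("/var/lang/lib/python3.12/site-packages") / 1024 / 1024, "MB")
```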

•••••

Avoiding timeouts and model downloads at runtime

Another important problem appeared during the first executions of the Lambda function.

In the version used in this project, Docling tried to automatically download the models at startup if they were not found locally. This caused timeouts and exposed another issue: the Lambda filesystem is read-only outside /tmp, so models cannot be downloaded or saved anywhere else at runtime.

To solve this, I decided to predownload the models during the Docker build and store them directly inside the image.

In the Dockerfile, I added the following:

# Copy and run model download script
COPY download_models.py /tmp/download_models.py

RUN mkdir -p /opt/docling-models && \
python3.12 /tmp/download_models.py && \
rm /tmp/download_models.py

The script initializes a DocumentConverter, which forces the required Docling models to be downloaded during the image build instead of during Lambda execution.

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from pathlib import Path

def main():

   artifacts_path = Path("/opt/docling-models")

   pipeline_options = PdfPipelineOptions(
       artifacts_path=artifacts_path,
       do_ocr=False
   )

   converter = DocumentConverter(
       format_options={
           InputFormat.PDF: PdfFormatOption(
               pipeline_options=pipeline_options
           )
       }
   )
if __name__ == "__main__":
   main()
   print("✓ Models downloaded successfully")

With this approach, the models are packaged inside the container and the Lambda function can start much faster, avoiding unnecessary downloads and problems related to the restricted filesystem.
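
On the runtime side, the handler has to point Docling at the same folder. The repository's handler is not reproduced here, so the following is only a sketch under stated assumptions: the event carries an s3_url field (as in the orchestrator shown later), the Markdown is written under an output/ prefix in the same bucket, and the helper names parse_s3_url and output_key_for are hypothetical. It also assumes the models under /opt/docling-models are in the layout Docling expects.

```python
from pathlib import Path, PurePosixPath

MODELS_DIR = Path("/opt/docling-models")  # same path used during the image build
OUTPUT_PREFIX = "output/"                 # assumption: prefix later indexed by the Knowledge Base


def parse_s3_url(s3_url):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"Not an S3 URL: {s3_url}")
    bucket, _, key = s3_url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"Missing bucket or key: {s3_url}")
    return bucket, key


def output_key_for(pdf_key):
    """Map 'papers/study.pdf' to 'output/study.md'."""
    return f"{OUTPUT_PREFIX}{PurePosixPath(pdf_key).stem}.md"


def lambda_handler(event, context):
    # Heavy imports are deferred so the helpers above can be tested without Docling
    import boto3
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    bucket, key = parse_s3_url(event["s3_url"])
    local_pdf = f"/tmp/{PurePosixPath(key).name}"  # /tmp is the only writable path
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_pdf)

    # Reuse the models baked into the image instead of downloading them
    options = PdfPipelineOptions(artifacts_path=MODELS_DIR, do_ocr=False)
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
    markdown = converter.convert(local_pdf).document.export_to_markdown()

    out_key = output_key_for(key)
    s3.put_object(Bucket=bucket, Key=out_key, Body=markdown.encode("utf-8"))
    return {"status": "ok", "output": f"s3://{bucket}/{out_key}"}
```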


8. 🔁 Orchestrated paper processing

The following function corresponds to the orchestration Lambda. Its goal is to list the papers stored in Amazon S3 and run the processing by invoking the docling-lambda function, which contains the Docker image with Docling.
Each PDF file is sent individually to the Lambda function responsible for converting the document into Markdown. Because the orchestrator uses a synchronous invocation (RequestResponse), the papers are processed one after another.
In the repository, you will find an implementation similar to the following:

import boto3
import json
from botocore.config import Config

s3 = boto3.client("s3")

lambda_client = boto3.client(
   "lambda",
   config=Config(
       read_timeout=900,
       connect_timeout=10,
       retries={"max_attempts": 0}
   )
)

BUCKET = "docling-papers-tutorial"
DOCLING_LAMBDA = "docling-lambda"

def lambda_handler(event, context):

   # List PDFs
   response = s3.list_objects_v2(
       Bucket=BUCKET
   )

   pdfs = [
       obj["Key"]
       for obj in response.get("Contents", [])
       if obj["Key"].endswith(".pdf")
   ]

   print(f"PDFs found: {len(pdfs)}")

   results = []

   for pdf_key in pdfs:

       s3_url = f"s3://{BUCKET}/{pdf_key}"

       print(f"Processing: {s3_url}")

       response = lambda_client.invoke(
           FunctionName=DOCLING_LAMBDA,
           InvocationType="RequestResponse",
           Payload=json.dumps({"s3_url": s3_url})
       )

       result = json.loads(response["Payload"].read())

       results.append({
           "input": s3_url,
           "output": result.get("output"),
           "status": result.get("status")
       })

       print(f"{pdf_key} -> {result.get('output')}")

   return {
       "processed": len(results),
       "results": results
   }

Once the function is deployed, we can execute the Lambda function and analyze the results from CloudWatch.
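
One detail to keep in mind when moving beyond a handful of papers: a single list_objects_v2 call returns at most 1,000 keys. A possible variation, sketched here with a hypothetical helper named iter_pdf_keys, uses the boto3 paginator so that larger buckets are listed completely. Unlike the code above, it also matches uppercase .PDF extensions.

```python
def iter_pdf_keys(s3_client, bucket, prefix=""):
    """Yield every PDF key in the bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty bucket or prefix have no "Contents" entry
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(".pdf"):
                yield obj["Key"]


# Example call (requires boto3 and AWS credentials):
# import boto3
# pdfs = list(iter_pdf_keys(boto3.client("s3"), "docling-papers-tutorial"))
```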


In my tests with these 10 papers, the average was approximately 3.8 seconds per page. This can vary significantly depending on document complexity.

This confirms something important: the processing time depends much more on the complexity of the content, such as tables, images, multiple columns, or figures, than on the file size or the number of pages.
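
As a rough planning aid, the average can be turned into a back-of-the-envelope estimate against Lambda's maximum timeout of 900 seconds. This is only a sketch: 3.8 seconds per page is the average reported above, the 25-page paper and the 0.8 safety margin are hypothetical, and real durations depend on content complexity.

```python
SECONDS_PER_PAGE = 3.8      # average measured on the 10 test papers
LAMBDA_MAX_TIMEOUT_S = 900  # Lambda's maximum timeout (15 minutes)


def estimate_seconds(pages, seconds_per_page=SECONDS_PER_PAGE):
    """Rough processing time for a paper with the given number of pages."""
    return pages * seconds_per_page


def max_pages_within_timeout(seconds_per_page=SECONDS_PER_PAGE, safety_margin=0.8):
    """Largest page count expected to fit in one invocation, leaving some headroom."""
    return int(LAMBDA_MAX_TIMEOUT_S * safety_margin / seconds_per_page)


print(f"25-page paper: about {estimate_seconds(25):.0f} s")
print(f"Pages per invocation with headroom: {max_pages_within_timeout()}")
```

The same arithmetic applies to the orchestrator, which waits for every paper in sequence and is itself subject to the 900-second limit.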


9. 🟩 Amazon Bedrock Knowledge Base

Now we are going to build our knowledge base, but first it is important to understand what a Knowledge Base is inside Amazon Bedrock and why it is key in this type of RAG architecture.

•••••

What is a Knowledge Base?

In simple terms, a Knowledge Base is a layer that connects private data with artificial intelligence models, so they can use that information as context to answer questions.

•••••

What is a Knowledge Base in Amazon Bedrock?

In Amazon Bedrock, a Knowledge Base is a fully managed service that allows you to build RAG systems over your own data.
This means that models can query information stored in a knowledge base to generate more accurate and contextualized answers based on private data.
The following image summarizes the key benefits of using Amazon Bedrock Knowledge Bases in this type of architecture.

Also, it includes capabilities such as:

  • 〰️ automatic embedding management
  • 〰️ context management
  • 〰️ source attribution in the answers
  • 〰️ direct integration with private data
•••••

Supported data sources

A Knowledge Base in Amazon Bedrock can connect to different data sources:

  • 〰️ 🪣 Amazon S3
  • 〰️ 🟦 Confluence (depending on availability)
  • 〰️ ☁️ Salesforce (depending on availability)
  • 〰️ 📑 Custom data sources
  • 〰️ 🕸️ Web Crawler

Availability can depend on the AWS Region and account configuration.

In this use case, we are mainly going to work with Amazon S3, where we store the documents processed with Docling.

•••••

Chunking: how information is divided

One of the most important concepts when building a Knowledge Base is chunking, which is the process of dividing documents into smaller parts called chunks.
This is necessary because models have context limitations and cannot process very long documents all at once.

We can understand this from two perspectives:

  • 〰️ Context limit: models can only handle a limited number of tokens
  • 〰️ Efficient search: dividing the content allows the system to retrieve more precise information faster

In this project, chunking is key because we are working with scientific papers, where the context between sections is very important, for example: results, methods, and adverse effects.

The following image describes the main chunking features available in Amazon Bedrock Knowledge Bases.

•••••

Step by step configuration

Creating the Knowledge Base in Amazon Bedrock

In this section, we are going to configure the Knowledge Base in Amazon Bedrock using the papers processed with Docling and stored in Amazon S3 in Markdown format.
The chunking strategy selected for this use case is Hierarchical Chunking, because it allows us to keep the relationship between document sections, for example results, methods, or adverse effects. This is key when working with scientific papers.

Below, I explain why I did not choose the other strategies and what each one implies:

  • 〰️ ❌ Default: splits content into chunks of roughly 300 tokens, which may cut across sections without preserving the full document structure.
  • 〰️ ❌ Fixed size: similar to the default strategy, but configurable. It still has the same problem of losing context.
  • 〰️ ❌ Semantic: groups content by semantic similarity. It can be useful, but it may add extra processing time and can be less predictable depending on the documents.
  • 〰️ ❌ No chunking: useful when documents are already small or manually preprocessed into meaningful units, which is not the case for full papers.
  • 〰️ ✅ Hierarchical: keeps a parent child structure, allowing each chunk to preserve its context inside the document.
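
To illustrate the parent child idea, here is a toy sketch that splits a text into large parent chunks and smaller child chunks, where each child keeps a reference to its parent. It is not Amazon Bedrock's implementation: it counts words instead of tokens, ignores overlap, and the function name hierarchical_chunks and the sizes are hypothetical.

```python
def hierarchical_chunks(text, parent_size=40, child_size=10):
    """Split text into child chunks that each remember their larger parent chunk."""
    words = text.split()
    chunks = []
    for parent_id, p_start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[p_start:p_start + parent_size]
        parent_text = " ".join(parent_words)
        for c_start in range(0, len(parent_words), child_size):
            chunks.append({
                "parent_id": parent_id,
                # Small chunk: what the search would match against
                "child_text": " ".join(parent_words[c_start:c_start + child_size]),
                # Large chunk: the wider context returned with the match
                "parent_text": parent_text,
            })
    return chunks


sample_text = " ".join(f"w{i}" for i in range(100))
chunks = hierarchical_chunks(sample_text)
print(len(chunks), "child chunks")
print(chunks[0]["child_text"])
```

The small child chunks keep retrieval precise, while the parent text gives the model the surrounding section as context.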
•••••

Prerequisites and permissions

To create a Knowledge Base, you need to consider the following permissions:

  • 〰️ IAM: create or select roles with the right permissions
  • 〰️ Bedrock: access to Knowledge Bases and embedding models
  • 〰️ S3: access to the bucket where the processed documents are stored
  • 〰️ KMS: optional, for data encryption
  • 〰️ Lambda: optional, for custom data transformations

⚠️ AWS does not support creating a Knowledge Base using root user credentials, so you need an IAM user or role with the right permissions. Permission configuration is usually one of the most delicate parts of this type of architecture.

_______________

Step 1: Create the Knowledge Base

In Amazon Bedrock, go to the Knowledge Bases section and select Create knowledge base with vector store.
Complete the configuration:

  • Name: docling-glp1-papers-kb
  • Description: Knowledge base with GLP-1 papers processed with Docling and Lambda
  • IAM Role: AmazonBedrockExecutionRoleForKnowledgeBase-docling
  • Data source: Amazon S3

_______________

Step 2: Configure the data source

Configure the data source with the following values:

  • 〰️ Source name: docling-glp1-papers-ds
  • 〰️ S3 path: s3://docling-papers-tutorial/output/
  • 〰️ Parsing strategy: Amazon Bedrock default parser
  • 〰️ Chunking strategy: Hierarchical chunking
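
The same data source settings can also be expressed through the API. The sketch below builds a request for the create_data_source operation of the boto3 bedrock-agent client, assuming that request shape. The helper name build_data_source_request, the Knowledge Base ID placeholder, and the token sizes (1500 for parents, 300 for children, 60 overlap) are illustrative assumptions, not values from this tutorial.

```python
def build_data_source_request(knowledge_base_id, bucket, prefix,
                              parent_max_tokens=1500, child_max_tokens=300,
                              overlap_tokens=60):
    """Build the keyword arguments for bedrock-agent create_data_source."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": "docling-glp1-papers-ds",
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {
                "bucketArn": f"arn:aws:s3:::{bucket}",
                "inclusionPrefixes": [prefix],
            },
        },
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {
                "chunkingStrategy": "HIERARCHICAL",
                "hierarchicalChunkingConfiguration": {
                    # First level is the parent chunk, second level the child chunk
                    "levelConfigurations": [
                        {"maxTokens": parent_max_tokens},
                        {"maxTokens": child_max_tokens},
                    ],
                    "overlapTokens": overlap_tokens,
                },
            }
        },
    }


request = build_data_source_request("KB_ID_PLACEHOLDER", "docling-papers-tutorial", "output/")

# Example call (requires boto3 and AWS credentials):
# import boto3
# boto3.client("bedrock-agent").create_data_source(**request)
```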

_______________

Step 3: Vector store and embeddings

Here, we are going to select the embeddings model used to vectorize the documents and the vector store where those vectors will be saved.

  • 〰️ Embeddings model: Titan Text Embeddings v2
  • 〰️ Vector store: Amazon S3 Vectors

In this case, we use on demand mode, although other models are available depending on the use case.
After that, we select the Amazon S3 bucket used by S3 Vectors to store the vector index.

To better understand how this type of storage works, you can check a previous article I wrote:


_______________

Step 4: Data synchronization

Once the Knowledge Base is created, its initial status will be Available.
To load the documents, you need to run a manual synchronization:

  • 〰️ Go to the Knowledge Base
  • 〰️ Select the data source
  • 〰️ Click on Sync

This synchronization processes the documents, generates the required embeddings, and makes the content available for natural language queries.
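
The synchronization can also be triggered from code, which helps when new papers are processed regularly. The following sketch starts an ingestion job and polls until it reaches a final status. It expects a client created with boto3.client("bedrock-agent"). The helper name sync_data_source and the polling values are hypothetical.

```python
import time

FINAL_STATUSES = ("COMPLETE", "FAILED", "STOPPED")


def sync_data_source(agent_client, knowledge_base_id, data_source_id,
                     poll_seconds=10, max_polls=180):
    """Start an ingestion job and wait until it finishes; return the final job."""
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    polls = 0
    while job["status"] not in FINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError("Ingestion job did not finish in time")
        time.sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1
    return job


# Example call (requires boto3 and AWS credentials):
# import boto3
# job = sync_data_source(boto3.client("bedrock-agent"), "<knowledge base id>", "<data source id>")
# print(job["status"])
```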


10. 🟩 BEDROCK: Test the Knowledge Base

Now we return to the point where we left off a few steps ago: testing our Knowledge Base with the processed papers.

The idea in this stage is to validate whether the system can retrieve relevant information from the 10 scientific papers that we previously loaded and processed.

To do this, we are going to ask some questions focused on clinical analysis and study comparison:

  • 〰️ What gastrointestinal adverse effects were reported in semaglutide clinical trials and what were the incidence rates?

  • 〰️ What were the cardiovascular outcomes reported in semaglutide clinical trials and which patient populations benefited most?

  • 〰️ How does semaglutide compare to liraglutide and tirzepatide in terms of weight loss efficacy and adverse effects across the clinical trials?

These queries allow us to evaluate how the system retrieves specific information across different studies, especially in scenarios where the results are distributed across multiple documents.
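
Besides the console, the same questions can be sent programmatically. The sketch below calls the retrieve operation of the boto3 bedrock-agent-runtime client and returns the matched passages with their score and source document, without generating an answer. The helper name retrieve_passages and the Knowledge Base ID placeholder are hypothetical.

```python
def retrieve_passages(runtime_client, knowledge_base_id, question, top_k=5):
    """Return the passages that the Knowledge Base retrieves for a question."""
    response = runtime_client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [
        {
            "score": item.get("score"),
            "text": item["content"]["text"],
            # For S3 data sources, the location points to the source Markdown file
            "source": item.get("location", {}).get("s3Location", {}).get("uri"),
        }
        for item in response.get("retrievalResults", [])
    ]


# Example call (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# question = "What gastrointestinal adverse effects were reported in semaglutide clinical trials?"
# for passage in retrieve_passages(client, "<knowledge base id>", question):
#     print(passage["score"], passage["source"])
```

Looking at the retrieved passages and their sources first makes it easier to judge whether a weak answer comes from retrieval or from generation.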


11. 🎯 Conclusions

This MVP shows that it is possible to build a queryable knowledge base over private scientific documents using AWS serverless services together with open source tools like Docling.

What I learned while building this system:

  • 〰️ The chunking strategy matters more than it may seem. In the case of scientific papers, Hierarchical Chunking preserves the context between sections such as Results or Adverse Effects better than fixed token based strategies.
  • 〰️ Docling can help reduce the cost and complexity of preprocessing when working with complex PDFs, especially those with tables, columns, and non linear structures. It allows us to convert these documents into structured information ready to be used in AI systems.
  • 〰️ Embeddings are not the same as security. Even though we work with vector representations, research has shown that in some scenarios it is possible to infer or reconstruct sensitive information from embedding vectors. Because of this, treating vector stores as sensitive data and applying access controls and encryption is a good practice in real scenarios.

If we take this to a production environment, three pieces become fundamental:

  • 〰️ CI/CD pipelines are necessary to automate processing and system updates as improvements are added.
  • 〰️ Infrastructure as Code with Terraform, or similar tools, is key to replicate, scale, and maintain the environment consistently across different stages.
  • 〰️ Any solution that is deployed, especially one that uses AI models, should include observability systems to detect and solve problems in production.

In terms of impact, this type of solution opens a very relevant space in industries such as healthcare and research, where controlled access to large volumes of knowledge can significantly accelerate scientific analysis and decision making.

Finally, beyond the tools used, the most interesting part of this architecture is how it combines different cloud services and generative AI capabilities to solve a very concrete problem: converting unstructured information into accessible, private, and queryable knowledge using natural language.



12. 📚 Technical references

  1. Amazon Web Services. (n.d.). AWS Lambda Developer Guide. AWS Documentation. Retrieved May 26, 2026, from https://docs.aws.amazon.com/es_es/lambda/latest/dg/welcome.html

  2. Amazon Web Services. (n.d.). Create a Lambda function using a container image. AWS Documentation. Retrieved May 26, 2026, from https://docs.aws.amazon.com/lambda/latest/dg/images-create.html

  3. Amazon Web Services. (n.d.). Amazon Bedrock Knowledge Bases. Retrieved May 26, 2026, from https://aws.amazon.com/es/bedrock/knowledge-bases/

  4. IBM. (n.d.). Docling. Retrieved May 26, 2026, from https://www.docling.ai/

  5. Docling Project. (n.d.). SmolDocling 256M preview. Hugging Face. Retrieved May 26, 2026, from https://huggingface.co/docling-project/SmolDocling-256M-preview

  6. University of Utah Health. (2026). GLP 1 FAQs answered by weight loss experts. Retrieved from https://healthcare.utah.edu/healthfeed/2026/03/preguntas-frecuentes-sobre-el-glp-1-respondidas-por-expertos-en-perdida-de-peso


13. 📄 Research papers used in the use case

  1. Han, S. H., Safeek, R., Ockerman, K., Trieu, N., Mars, P., Klenke, A., Furnas, H., & Sorice Virk, S. (2023). Public interest in the off label use of glucagon like peptide 1 agonists (Ozempic) for cosmetic weight loss: A Google Trends analysis. Aesthetic Surgery Journal. https://doi.org/10.1093/asj/sjad211

  2. Ryan, N., & Savulescu, J. (2026). The ethics of Ozempic and Wegovy. Journal of Medical Ethics, 52(3), 185–193. https://doi.org/10.1136/jme-2024-110374

  3. Mailhac, A., Pedersen, L., Pottegård, A., Søndergaard, J., Mogensen, T., Sørensen, H. T., & Thomsen, R. W. (2024). Semaglutide (Ozempic®) use in Denmark 2018 through 2023: User trends and off label prescribing for weight loss. Clinical Epidemiology. https://doi.org/10.2147/CLEP.S456170

  4. Manoharan, S. V. R. R., & Madan, R. (2024). GLP 1 agonists can affect mood: A case of worsened depression on Ozempic (Semaglutide). Case Reports in Psychiatry. https://pmc.ncbi.nlm.nih.gov/articles/PMC11208009/

  5. Humphrey, C. D., & Lawrence, A. C. (2023). Implications of Ozempic and other semaglutide medications for facial plastic surgeons. Facial Plastic Surgery. https://doi.org/10.1055/a-2148-6321

  6. Pillarisetti, L., & Agrawal, D. K. (2025). Semaglutide: Double edged sword with risks and benefits. Archives of Internal Medicine Research, 8(1), 1–13. https://doi.org/10.26502/aimr.0189

  7. Fong, S., Carollo, A., Lazuras, L., Corazza, O., & Esposito, G. (2024). Ozempic (Glucagon like peptide 1 receptor agonist) in social media posts: Unveiling user perspectives through Reddit topic modeling. Dialogues in Health. https://www.sciencedirect.com/science/article/pii/S2667118224000163

  8. Carboni, A., Woessner, S., Martini, O., Marroquin, N. A., & Waller, J. (2024). Natural weight loss or “Ozempic Face”: Demystifying a social media phenomenon. Journal of Drugs in Dermatology, 23(1). https://doi.org/10.36849/JDD.7613

  9. Grech, V. S., Lotsaris, K., Grech, I., & Kefala, V. (2024). Semaglutide (Ozempic) and obesity: A comprehensive guide for aestheticians. Review of Clinical Pharmacology and Pharmacokinetics, 38(Suppl. 1), 31–35. https://www.researchgate.net/publication/378300594_Semaglutide_Ozempic_and_obesity_A_comprehensive_guide_for_aestheticians

  10. Vambe, S. D., Zulu, W., Hough, E., & Luvhimbi, M. J. (2024). Semaglutide (Ozempic®): A comprehensive review of its pharmacology, efficacy, and safety profile in type 2 diabetes mellitus and weight management. SA Pharmaceutical Journal, 91(6), 31–34. https://www.researchgate.net/publication/388790459_Semaglutide_Ozempic_R_a_comprehensive_review_of_its_pharmacology_efficacy_and_safety_profile_in_type_2_diabetes_mellitus_and_weight_management
