A complete guide to building a serverless PDF conversion service using Rust, pdf_oxide, and cargo-lambda.
Demo: https://pdf-to-text-phi.vercel.app
Repo: https://github.com/fayismahmood/pdf-to-text
Prerequisites
You'll need Rust 1.70+ installed, AWS CLI configured with credentials, and basic familiarity with AWS Lambda concepts. For cargo-lambda installation, follow the official guide.
Your AWS IAM user also needs permissions for:
- lambda:* (or the required Lambda deployment permissions)
- iam:CreateRole
- iam:AttachRolePolicy
- iam:PassRole
Without these permissions, cargo lambda deploy may fail during the initial deployment when creating the Lambda execution role.
Installing cargo-lambda
cargo-lambda is a Cargo subcommand for building, locally testing, and deploying Rust functions on AWS Lambda.
macOS / Linux
curl -L https://cargo-lambda.info/install.sh | sh
Verify: cargo lambda --version
Project Setup
Create new project
cargo lambda new pdf-converter --http
cd pdf-converter
This scaffolds a Lambda project built on lambda_http, which handles HTTP events from API Gateway, Application Load Balancers, and Lambda function URLs.
Update Cargo.toml
[package]
name = "pdf-converter"
version = "0.1.0"
edition = "2021"
[dependencies]
lambda_http = "1.0"
pdf_oxide = "0.3"
tokio = { version = "1", features = ["macros"] }
Code Implementation
src/http_handler.rs
use lambda_http::{Body, Error, Request, RequestExt, Response};
use pdf_oxide::{PdfDocument, converters::ConversionOptions};

/// Output formats selectable through the `file_type` query parameter.
pub(crate) enum FileType {
    Html,
    Text,
    Markdown,
}

impl FileType {
    /// Case-insensitive parse; returns `None` for unknown values.
    fn from_str(s: &str) -> Option<Self> {
        match s.to_lowercase().as_str() {
            "html" => Some(FileType::Html),
            "text" => Some(FileType::Text),
            "markdown" => Some(FileType::Markdown),
            _ => None,
        }
    }
}

pub(crate) async fn function_handler(event: Request) -> Result<Response<Body>, Error> {
    // Read ?file_type=...; a missing or unknown value falls back to plain text.
    let file_type = event
        .query_string_parameters_ref()
        .and_then(|params| params.first("file_type"))
        .unwrap_or("text");
    let file_type = FileType::from_str(file_type).unwrap_or(FileType::Text);

    // The request body is the raw PDF.
    let body = event.body().to_vec();
    let pdf_data = PdfDocument::from_bytes(body)?;
    let options = ConversionOptions::default();
    let page_count = pdf_data.page_count()?;

    // Convert page by page and concatenate the output.
    let mut result = String::new();
    for i in 0..page_count {
        let page_content = match file_type {
            FileType::Html => pdf_data.to_html(i, &options)?,
            FileType::Text => pdf_data.to_plain_text(i, &options)?,
            FileType::Markdown => pdf_data.to_markdown(i, &options)?,
        };
        result.push_str(&page_content);
    }

    // Match the response content type to the requested format.
    let content_type = match file_type {
        FileType::Html => "text/html",
        FileType::Text => "text/plain",
        FileType::Markdown => "text/markdown",
    };

    let resp = Response::builder()
        .status(200)
        .header("content-type", content_type)
        .body(result.into())
        .map_err(Box::new)?;
    Ok(resp)
}
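The handler above quietly treats any unrecognised file_type as text. If you would rather reject bad input, one option is to parse the parameter through the standard FromStr trait. The following is only a sketch using the standard library: resolve_file_type and content_type are hypothetical helpers that are not part of the original project.

```rust
use std::str::FromStr;

/// Output formats accepted by the `file_type` query parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Html,
    Text,
    Markdown,
}

impl FileType {
    /// MIME type to send back in the `content-type` response header.
    pub fn content_type(self) -> &'static str {
        match self {
            FileType::Html => "text/html",
            FileType::Text => "text/plain",
            FileType::Markdown => "text/markdown",
        }
    }
}

impl FromStr for FileType {
    type Err = String;

    /// Case-insensitive parse that reports unknown values instead of hiding them.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "html" => Ok(FileType::Html),
            "text" => Ok(FileType::Text),
            "markdown" => Ok(FileType::Markdown),
            other => Err(format!("unsupported file_type: {other}")),
        }
    }
}

/// Hypothetical helper: a missing parameter defaults to text,
/// an unknown one is returned as an error.
pub fn resolve_file_type(param: Option<&str>) -> Result<FileType, String> {
    param.map_or(Ok(FileType::Text), |s| s.parse())
}
```

In the handler, the Err case could then be turned into a 400 response rather than a silent fallback.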
Local Testing
Start the local server
cargo lambda watch
Send a PDF via curl
curl -X POST 'http://localhost:9000/function/function_handler?file_type=markdown' \
-H 'Content-Type: application/pdf' \
--data-binary @document.pdf
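If you want to hit the local server from Rust instead of curl, here is a rough sketch that uses only the standard library. The names build_request and convert are made up for illustration, and it assumes the server closes the connection after responding, so the result is the raw HTTP response (status line and headers included) rather than a parsed body.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds the raw bytes of an HTTP/1.1 POST request carrying a PDF body.
pub fn build_request(host: &str, path: &str, pdf: &[u8]) -> Vec<u8> {
    let head = format!(
        "POST {path} HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/pdf\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\r\n",
        len = pdf.len()
    );
    let mut request = head.into_bytes();
    request.extend_from_slice(pdf);
    request
}

/// Sends the PDF to `host` (for example "localhost:9000") and returns
/// the raw response bytes: status line, headers, and body together.
pub fn convert(host: &str, path: &str, pdf: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(&build_request(host, path, pdf))?;
    let mut response = Vec::new();
    // Reads until the server closes the connection.
    stream.read_to_end(&mut response)?;
    Ok(response)
}
```

For anything beyond a quick local check, a proper HTTP client crate is the better choice, since this sketch does not handle chunked responses or redirects.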
Deployment
Build for production
cargo lambda build --release --arm64
The --arm64 flag targets AWS Graviton processors for better cost/performance.
Deploy to AWS
cargo lambda deploy
The first deployment will create an IAM role automatically. Subsequent deployments will reuse it.
Via AWS CLI
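# Package the function as a zip archive
cargo lambda build --release --arm64 --output-format zip
# Upload the archive to an existing function
aws lambda update-function-code \
--function-name pdf-converter \
--zip-file fileb://target/lambda/pdf-converter/bootstrap.zip
This updates a function that already exists. To create one from scratch, use aws lambda create-function with the provided.al2023 runtime and an execution role.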
Performance Benchmarks
pdf_oxide Performance
pdf_oxide is one of the fastest PDF libraries available. The figures below come from a benchmark run on 3,830 real-world PDFs:
Python PDF Libraries Comparison
| Library | Mean Time | Pass Rate | License |
|---|---|---|---|
| pdf_oxide | 0.8ms | 100% | MIT |
| PyMuPDF | 4.6ms | 99.3% | AGPL-3.0 |
| pypdfium2 | 4.1ms | 99.2% | Apache-2.0 |
| pdfminer | 16.8ms | 98.8% | MIT |
| pdfplumber | 23.2ms | 98.8% | MIT |
Rust PDF Libraries Comparison
| Library | Mean Time | Pass Rate |
|---|---|---|
| pdf_oxide | 0.8ms | 100% |
| unfpdf | 2.8ms | 95.1% |
| pdf_extract | 4.08ms | 91.5% |
| oxidize_pdf | 13.5ms | 99.1% |
Among the Rust libraries, pdf_oxide is about 5× faster than pdf_extract and about 17× faster than oxidize_pdf by mean time.
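The speedup figures are simply ratios of the mean times in the Rust table. A small sketch of that arithmetic, with the numbers copied from the table and the function names chosen only for illustration:

```rust
/// How many times faster the baseline is than another library,
/// given both mean times in milliseconds.
pub fn speedup(baseline_ms: f64, other_ms: f64) -> f64 {
    other_ms / baseline_ms
}

/// Mean times copied from the Rust comparison table.
pub const RUST_MEAN_MS: [(&str, f64); 4] = [
    ("pdf_oxide", 0.8),
    ("unfpdf", 2.8),
    ("pdf_extract", 4.08),
    ("oxidize_pdf", 13.5),
];

/// Speedup of pdf_oxide over each of the other Rust libraries.
pub fn speedups_vs_pdf_oxide() -> Vec<(&'static str, f64)> {
    let baseline = RUST_MEAN_MS[0].1;
    RUST_MEAN_MS[1..]
        .iter()
        .map(|&(name, ms)| (name, speedup(baseline, ms)))
        .collect()
}
```

The ratios come out at roughly 3.5 for unfpdf, 5.1 for pdf_extract, and 16.9 for oxidize_pdf, which is where the rounded 5× and 17× come from.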
AWS Lambda Cold Start
Rust's minimal runtime and small compiled binaries keep cold starts short:
| Runtime | Cold Start (typical) |
|---|---|
| Rust (provided.al2023) | ~50-100ms |
| Node.js | ~100-200ms |
| Python | ~100-300ms |
| Java | ~500-2000ms |
Memory Usage
With pdf_oxide's efficient design, memory usage stays low:
| PDF Size | Peak Memory |
|---|---|
| 100KB | ~5MB |
| 1MB | ~20MB |
| 10MB | ~100MB |
API Usage
Request
POST /function/function_handler?file_type={html|text|markdown}
Content-Type: application/pdf
<binary PDF data>
Response
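Returns the converted content with a matching Content-Type header: text/html, text/plain, or text/markdown. A missing or unrecognised file_type falls back to plain text.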
Example with AWS CLI
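Because the handler expects an HTTP event, a direct invocation has to pass an API Gateway-style proxy event rather than a bare JSON object. In event.json, file_type goes under queryStringParameters, the PDF goes in body as base64, and isBase64Encoded is set to true.
aws lambda invoke \
--function-name pdf-converter \
--payload file://event.json \
--cli-binary-format raw-in-base64-out \
response.json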
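Since the function is built on lambda_http, a direct invocation needs an HTTP-style event with the PDF base64-encoded in its body. Below is a minimal sketch of building such a payload with only the standard library. The field names follow the API Gateway HTTP API payload format 2.0, but real events carry many more fields, and whether lambda_http accepts this trimmed version is an assumption you should verify. base64_encode and build_event are hypothetical helpers.

```rust
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        if chunk.len() > 1 {
            out.push(B64[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(B64[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Builds a trimmed HTTP API (payload format 2.0) event as a JSON string.
/// Assumes `file_type` is one of html, text, or markdown, so no JSON
/// escaping is applied to it.
pub fn build_event(file_type: &str, pdf: &[u8]) -> String {
    format!(
        r#"{{
  "version": "2.0",
  "routeKey": "$default",
  "rawPath": "/",
  "rawQueryString": "file_type={ft}",
  "headers": {{ "content-type": "application/pdf" }},
  "queryStringParameters": {{ "file_type": "{ft}" }},
  "requestContext": {{ "http": {{ "method": "POST", "path": "/" }} }},
  "body": "{body}",
  "isBase64Encoded": true
}}"#,
        ft = file_type,
        body = base64_encode(pdf)
    )
}
```

Write the returned string to a file such as event.json and pass it to aws lambda invoke with --payload file://event.json. The response file then holds the proxy-style response, with the converted text in its body field.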