I've spent the last few months experimenting with fine-tuning foundation models on Amazon Bedrock, and I wanted to share what I've learned. Fine-tuning lets you customize these models for your specific needs, which can make a huge difference for domain-specific tasks. Bedrock handles most of the heavy lifting, so you don't need to worry about managing infrastructure.
What This Guide Covers
I'll walk you through the entire process: preparing your dataset, setting up a fine-tuning job, watching it train, and actually using your customized model in production.
What You'll Need
Here's what you should have before diving in:
- An AWS account with Bedrock permissions (you might need to request access if you haven't used Bedrock before)
- AWS CLI installed on your machine
- Some familiarity with machine learning basics
- Python 3.8 or newer
- Comfort navigating the AWS Console
Understanding What Bedrock Actually Does
Bedrock supports fine-tuning for several foundation models, including Meta's Llama family and Cohere's Command models. The real win here is that AWS manages all the infrastructure complexity. You focus on your data and evaluating results instead of babysitting EC2 instances.
One thing to note: not every model supports fine-tuning, and availability varies by region. Check the current docs before you get too far into planning.
Picking Your Dataset
For this walkthrough, I'm using publicly available data. Here are some solid options I've worked with:
- Hugging Face Datasets - massive collection, easy to access
- SQuAD - great if you're building a Q&A system
- Common Crawl - useful for general language tasks
- GitHub code datasets - perfect for code generation projects
Let's say you want to build a customer support chatbot. I'll use a customer support dataset from Hugging Face as an example:
from datasets import load_dataset
# Grab a public customer support dataset
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")
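Before converting anything, it's worth printing one record to confirm the column names. The conversion code below assumes this dataset exposes instruction and response fields; verify against the actual output if you swap in a different dataset:
# Peek at one record and the column names before writing conversion code
print(dataset['train'][0])
print(dataset['train'].column_names)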
Getting Your Data Ready
This part is critical. Bedrock wants your data as JSONL (JSON Lines) files. Each line needs to be a complete JSON object with your training example.
Here's the basic format:
{"prompt": "Customer: How do I reset my password?", "completion": "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your email."}
Converting your dataset looks like this:
import json

def format_for_bedrock(example):
    # Map the dataset's instruction/response fields onto Bedrock's format
    return {
        "prompt": f"Customer: {example['instruction']}",
        "completion": example['response']
    }

# Process your data
formatted_data = []
for item in dataset['train']:
    formatted_data.append(format_for_bedrock(item))

# Write it out as JSONL
with open('training_data.jsonl', 'w') as f:
    for item in formatted_data:
        f.write(json.dumps(item) + '\n')
A few things I learned the hard way:
- Keep your formatting consistent across every single example
- Strip out any PII - seriously, double check this
- You need at least 200-500 good examples, but more is definitely better
- Watch out for imbalanced datasets that lean heavily toward certain types of responses
- Test that your JSONL is valid before uploading (saves time later)
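On that last point, here's a minimal validation pass. It assumes the training_data.jsonl file from above and only checks that each line parses and carries both expected keys:
import json

# Every line must parse as JSON and carry both expected keys
with open('training_data.jsonl') as f:
    for line_number, line in enumerate(f, start=1):
        record = json.loads(line)  # raises JSONDecodeError on a malformed line
        if 'prompt' not in record or 'completion' not in record:
            raise ValueError(f"line {line_number} is missing a required field")
print("training_data.jsonl looks valid")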
Getting Your Data Into S3
Bedrock pulls training data from S3, so you'll need to upload it there:
# Make a new bucket if needed
aws s3 mb s3://my-bedrock-finetuning-bucket
# Push your data up
aws s3 cp training_data.jsonl s3://my-bedrock-finetuning-bucket/training-data/training_data.jsonl
Quick tip: make sure your bucket is in the same region where you're running the fine-tuning job, or the job will fail with confusing access errors.
Setting Up Permissions
You need an IAM role that gives Bedrock access to your S3 bucket. Here's what the permissions should look like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bedrock-finetuning-bucket/*",
        "arn:aws:s3:::my-bedrock-finetuning-bucket"
      ]
    }
  ]
}
And the trust policy so Bedrock can actually use this role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
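If you'd rather script the role creation, something like this works, assuming you've saved the two policies above as trust-policy.json and s3-permissions.json (placeholder file names):
# Create the role with the trust policy, then attach the S3 permissions
aws iam create-role \
  --role-name BedrockFineTuningRole \
  --assume-role-policy-document file://trust-policy.json

aws iam put-role-policy \
  --role-name BedrockFineTuningRole \
  --policy-name BedrockFineTuningS3Access \
  --policy-document file://s3-permissions.json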
Kicking Off a Fine-Tuning Job (Console Method)
Head over to the Bedrock console:
1. Get to the right place: Click "Custom models" in the left menu, then "Create fine-tuning job"
2. Pick your base model: Choose which foundation model you're starting with. I usually go with Meta Llama 2 or Cohere Command depending on the use case
3. Fill in the details:
   - Give it a name you'll remember
   - Name your fine-tuned model something descriptive
   - Point it to your S3 training data
   - Tell it where to save the results
   - Select the IAM role you just created
4. Tweak the hyperparameters:
   - Epochs: how many times to go through your data (I start with 3-5)
   - Batch size: how many examples to process at once
   - Learning rate: usually best to stick with the defaults unless you know what you're doing
5. Double-check everything and launch it
Starting a Job via CLI
If you prefer the command line (I usually do):
aws bedrock create-model-customization-job \
  --job-name my-customer-support-model \
  --custom-model-name customer-support-assistant-v1 \
  --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/BedrockFineTuningRole \
  --base-model-identifier arn:aws:bedrock:us-east-1::foundation-model/meta.llama2-13b-chat-v1 \
  --training-data-config s3Uri=s3://my-bedrock-finetuning-bucket/training-data/training_data.jsonl \
  --output-data-config s3Uri=s3://my-bedrock-finetuning-bucket/output/ \
  --hyper-parameters epochCount=3,batchSize=8,learningRate=0.00001
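If you're scripting in Python instead, the boto3 equivalent looks roughly like this. Note that the control-plane client is 'bedrock' (not 'bedrock-runtime'), and hyperparameter values are passed as strings:
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Same job as the CLI call above, expressed in Python
response = bedrock.create_model_customization_job(
    jobName='my-customer-support-model',
    customModelName='customer-support-assistant-v1',
    roleArn='arn:aws:iam::YOUR_ACCOUNT_ID:role/BedrockFineTuningRole',
    baseModelIdentifier='arn:aws:bedrock:us-east-1::foundation-model/meta.llama2-13b-chat-v1',
    trainingDataConfig={'s3Uri': 's3://my-bedrock-finetuning-bucket/training-data/training_data.jsonl'},
    outputDataConfig={'s3Uri': 's3://my-bedrock-finetuning-bucket/output/'},
    hyperParameters={'epochCount': '3', 'batchSize': '8', 'learningRate': '0.00001'},
)
print(response['jobArn'])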
Watching Your Job Run
Fine-tuning takes time. Sometimes a few hours, sometimes longer depending on your dataset size. Here's how to keep tabs on it:
In the Console: Just go to Bedrock > Custom models and you'll see the status. It'll say "InProgress", "Completed", or "Failed"
With CLI: Check programmatically:
aws bedrock get-model-customization-job \
--job-identifier my-customer-support-model
CloudWatch has detailed metrics too if you want to dig into loss curves and validation accuracy.
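If you'd rather script the wait than refresh the console, a simple polling loop with boto3 does the job. A sketch, assuming the job name from earlier:
import time
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Poll every few minutes until the job leaves the InProgress state
while True:
    job = bedrock.get_model_customization_job(jobIdentifier='my-customer-support-model')
    status = job['status']
    print(f"Job status: {status}")
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(300)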
Testing Your Model
Once it's done training, time to see how it performs. One caveat: custom models on Bedrock generally require purchasing Provisioned Throughput before you can invoke them, in which case the modelId below would be your provisioned model ARN rather than the custom model ARN:
import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Give it a test prompt
prompt = "Customer: What are your business hours?"

# Note: the request/response body schema depends on the base model family.
# Meta Llama 2 uses max_gen_len and returns the text in 'generation';
# check the docs for whichever model you fine-tuned.
response = bedrock_runtime.invoke_model(
    modelId='arn:aws:bedrock:us-east-1:YOUR_ACCOUNT_ID:custom-model/customer-support-assistant-v1',
    body=json.dumps({
        "prompt": prompt,
        "max_gen_len": 200,
        "temperature": 0.7
    })
)

result = json.loads(response['body'].read())
print(result['generation'])
Don't test on your training data. Set aside a separate validation set to get realistic results.
Actually Using Your Model
Now you've got a working fine-tuned model. Here's what you can do with it:
Plug it into your app: Use the AWS SDK to call it from your code
Set up provisioned throughput: If you need guaranteed capacity for production, you can purchase dedicated throughput
Build a chatbot: Hook it up to your customer-facing systems with Lambda or API Gateway
Here's a simple production example:
def get_customer_support_response(customer_query):
    bedrock_runtime = boto3.client('bedrock-runtime')
    # Same Llama 2 body schema as the test call above
    response = bedrock_runtime.invoke_model(
        modelId='arn:aws:bedrock:us-east-1:YOUR_ACCOUNT_ID:custom-model/customer-support-assistant-v1',
        body=json.dumps({
            "prompt": f"Customer: {customer_query}",
            "max_gen_len": 300,
            "temperature": 0.5
        })
    )
    return json.loads(response['body'].read())['generation']
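To put that behind API Gateway, a thin Lambda handler around it is all you need. A minimal sketch, assuming a proxy integration that sends a JSON body with a query field (and that get_customer_support_response from above is in scope):
import json

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a string
    body = json.loads(event.get('body') or '{}')
    query = body.get('query', '')
    if not query:
        return {'statusCode': 400, 'body': json.dumps({'error': 'missing query'})}
    answer = get_customer_support_response(query)
    return {
        'statusCode': 200,
        'body': json.dumps({'response': answer})
    }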
What I Wish I'd Known Earlier
Quality beats quantity every time: I've had better results with 300 really good examples than 2000 mediocre ones.
Don't mess with hyperparameters right away: Start with defaults. You can experiment later once you see baseline performance.
Keep track of versions: Name your jobs clearly and document which dataset you used. Trust me, you'll forget.
Watch your costs: Fine-tuning isn't free. Run small experiments first before going all-in on a massive dataset.
Iterate: Your first fine-tuned model probably won't be perfect. Look at where it fails, add more examples for those cases, and retrain.
Always validate properly: Keep a holdout set that the model never sees during training. This is how you know if it actually works.
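A simple way to carve that holdout set out of the formatted data from the preparation step. A quick sketch, assuming a 90/10 split is enough; this replaces the single write from earlier:
import json
import random

# Shuffle once, then split roughly 90/10 into training and holdout sets
random.seed(42)
random.shuffle(formatted_data)
split = int(len(formatted_data) * 0.9)
train_set, holdout_set = formatted_data[:split], formatted_data[split:]

with open('training_data.jsonl', 'w') as f:
    for item in train_set:
        f.write(json.dumps(item) + '\n')
with open('holdout_data.jsonl', 'w') as f:
    for item in holdout_set:
        f.write(json.dumps(item) + '\n')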
When Things Go Wrong
Job dies immediately: Nine times out of ten, it's permissions. Check that your IAM role can actually read from S3.
Model performs worse than expected: Look at your data quality first. Are the prompt-completion pairs actually good examples?
Format errors: Validate your JSONL file. Every line must be valid JSON, and field names need to be consistent.
Training loss stays high: Try lowering the learning rate or adding more epochs. Also check if you have contradictory examples in your data.
Wrapping Up
Fine-tuning with Bedrock has genuinely changed how I approach building AI applications. The managed infrastructure means I can focus on what matters - creating good training data and building useful products - instead of fighting with training infrastructure.
Start simple. Get something working end-to-end with a small dataset first. Then expand from there as you understand what works for your specific use case.
The secret sauce is really in the data preparation. Spend time there and the rest tends to fall into place.
Where to Go From Here
- Try different base models and compare results
- Set up A/B tests to measure improvement over the base model
- Build pipelines to automatically retrain as you collect more data
- Look into fine-tuning for multiple tasks at once
- Keep an eye on AWS docs - they're constantly adding new features
Good luck with your fine-tuning projects!