<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: omar-steam</title>
    <description>The latest articles on DEV Community by omar-steam (@omarsteam).</description>
    <link>https://dev.to/omarsteam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F280974%2F8b5e6b7d-950a-40bb-a631-0791a1d09480.jpeg</url>
      <title>DEV Community: omar-steam</title>
      <link>https://dev.to/omarsteam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/omarsteam"/>
    <language>en</language>
    <item>
      <title>Building a Personalized Study Companion Using Amazon Bedrock</title>
      <dc:creator>omar-steam</dc:creator>
      <pubDate>Sat, 04 Jan 2025 06:53:00 +0000</pubDate>
      <link>https://dev.to/omarsteam/building-a-personalized-study-companion-using-amazon-bedrock-2cpi</link>
      <guid>https://dev.to/omarsteam/building-a-personalized-study-companion-using-amazon-bedrock-2cpi</guid>
      <description>&lt;p&gt;I'm in my master's degree program right now, and I've always wanted to find ways to reduce my learning hours every day with work and helping my family. Voila! Here's my solution: creating a study companion using Amazon Bedrock. &lt;/p&gt;

&lt;p&gt;Amazon Bedrock, which we will incorporate into the system, gives us access to foundation models such as Amazon Titan and Anthropic's Claude.&lt;/p&gt;

&lt;p&gt;These models will power an assistant that can answer user questions on a number of topics from my master's course, such as quantum physics and machine learning. We will explain how to customize the model, use prompt engineering for smarter answers, and deploy Retrieval-Augmented Generation (RAG) for precise answers to students like myself. &lt;/p&gt;

&lt;p&gt;So, let's get into it!&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;First Step: Creating Your AWS Development Environment&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;To start this project, make sure your AWS account has the necessary permissions for Amazon S3, Lambda, and Bedrock, because those are the tools you'll be working with (I learned that the hard way when I found out I had to add my debit card for verification :( ).&lt;/p&gt;
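&lt;p&gt;If you prefer to script the permissions, here is a minimal sketch of an IAM policy covering the three services. The wildcard actions and resources are an assumption for a personal sandbox and are far too broad for production use:&lt;/p&gt;

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "StudyCompanionDemoAccess",
            "Effect": "Allow",
            "Action": ["s3:*", "lambda:*", "bedrock:*"],
            "Resource": "*"
        }
    ]
}
```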

&lt;p&gt;Using the &lt;a href="https://console.aws.amazon.com/s3/" rel="noopener noreferrer"&gt;S3 Console&lt;/a&gt;, create a new bucket with a name such as “study-materials”. This bucket is used to store your educational content. In my case, I generated additional synthetic data appropriate for my master’s program; you can create data for your own needs or add other datasets, for example from Kaggle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "topic": "Advanced Economics",
        "question": "How does the Lucas Critique challenge traditional macroeconomic policy analysis?",
        "answer": "The Lucas Critique argues that traditional macroeconomic models' parameters are not policy-invariant because economic agents adjust their behavior based on expected policy changes, making historical relationships unreliable for policy evaluation."
    },
    {
        "topic": "Quantum Physics",
        "question": "Explain quantum entanglement and its implications for quantum computing.",
        "answer": "Quantum entanglement is a physical phenomenon where pairs of particles remain fundamentally connected regardless of distance. This property enables quantum computers to perform certain calculations exponentially faster than classical computers through quantum parallelism and superdense coding."
    },
    {
        "topic": "Advanced Statistics",
        "question": "What is the difference between frequentist and Bayesian approaches to statistical inference?",
        "answer": "Frequentist inference treats parameters as fixed and data as random, using probability to describe long-run frequency of events. Bayesian inference treats parameters as random variables with prior distributions, updated through data to form posterior distributions, allowing direct probability statements about parameters."
    },
    {
        "topic": "Machine Learning",
        "question": "How do transformers solve the long-range dependency problem in sequence modeling?",
        "answer": "Transformers use self-attention mechanisms to directly model relationships between all positions in a sequence, eliminating the need for recurrent connections. This allows parallel processing and better capture of long-range dependencies through multi-head attention and positional encodings."
    },
    {
        "topic": "Molecular Biology",
        "question": "What are the implications of epigenetic inheritance for evolutionary theory?",
        "answer": "Epigenetic inheritance challenges the traditional neo-Darwinian model by demonstrating that heritable changes in gene expression can occur without DNA sequence alterations, suggesting a Lamarckian component to evolution through environmentally-induced modifications."
    },
    {
        "topic": "Advanced Computer Architecture",
        "question": "How do non-volatile memory architectures impact traditional memory hierarchy design?",
        "answer": "Non-volatile memory architectures blur the traditional distinction between storage and memory, enabling persistent memory systems that combine storage durability with memory-like performance, requiring fundamental redesign of memory hierarchies and system software."
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
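&lt;p&gt;A short boto3 sketch, assuming the bucket name used above and the file name referenced later in the fine-tuning snippet (my-educational-dataset.json), can serialize records in this shape and upload them:&lt;/p&gt;

```python
import json

# Two records in the same shape as the dataset above
dataset = [
    {
        "topic": "Advanced Economics",
        "question": "How does the Lucas Critique challenge traditional macroeconomic policy analysis?",
        "answer": "Model parameters are not policy-invariant, because agents adapt to expected policy changes.",
    },
    {
        "topic": "Quantum Physics",
        "question": "Explain quantum entanglement and its implications for quantum computing.",
        "answer": "Entangled particles remain correlated regardless of distance, enabling quantum parallelism.",
    },
]

def upload_dataset(records, bucket="study-materials", key="my-educational-dataset.json"):
    """Serialize the records and upload them to S3 (needs AWS credentials configured)."""
    import boto3  # local import so the serialization check below runs without AWS set up
    body = json.dumps(records, indent=4)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key

# Serializing locally first is a cheap sanity check before uploading
serialized = json.dumps(dataset, indent=4)
```

&lt;p&gt;Calling upload_dataset(dataset) then makes the file available at s3://study-materials/my-educational-dataset.json for the fine-tuning step.&lt;/p&gt;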



&lt;h1&gt;
  
  
  &lt;strong&gt;Step 2: Utilise Amazon Bedrock&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Launch Amazon Bedrock, then:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the Amazon Bedrock Console from the AWS Management Console.&lt;br&gt;
Start a new project and choose one of the available foundation models (for example, Amazon Titan or Claude).&lt;br&gt;
Select your use case; in this case, it’s a study companion. Then select the Fine-tuning option and supply the dataset (your educational content from S3) for fine-tuning.&lt;/p&gt;

&lt;p&gt;Bedrock then fine-tunes the chosen foundation model on your dataset. For example, if you are using Amazon Titan, Bedrock customizes it to learn the context of your educational material and produce accurate answers to questions about it.&lt;/p&gt;

&lt;p&gt;Here is a quick code snippet for starting the fine-tuning job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# Initialize Bedrock client
client = boto3.client("bedrock-runtime")

# Define S3 path for your dataset
dataset_path = 's3://study-materials/my-educational-dataset.json'

# Fine-tune the model
response = client.start_training(
    modelName="GPT-3",
    datasetLocation=dataset_path,
    trainingParameters={"batch_size": 16, "epochs": 5}
)
print(response)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Save the fine-tuned model:&lt;/strong&gt; After fine-tuning, the model is saved and ready to deploy. You can find it in your Amazon S3 bucket in a new directory named fine-tuning-model.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Step 3: Implement Retrieval-Augmented Generation (RAG)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;1. Create the AWS Lambda function:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lambda will pass the user's request to the language model and return the proper response. The function takes the user's input, searches S3 for relevant study material, and then uses RAG to generate an accurate answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lambda code for generating answers:&lt;/strong&gt; The code below is an example of how you might configure the Lambda function to use the fine-tuned model for generating answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3
from transformers import GPT2LMHeadModel, GPT2Tokenizer

s3 = boto3.client('s3')
model_s3_path = 's3://study-materials/fine-tuned-model'

# Load model and tokenizer
def load_model():
    s3.download_file(model_s3_path, 'model.pth')
    tokenizer = GPT2Tokenizer.from_pretrained('model.pth')
    model = GPT2LMHeadModel.from_pretrained('model.pth')
    return tokenizer, model

tokenizer, model = load_model()

def lambda_handler(event, context):
    query = event['query']
    topic = event['topic']

    # Retrieve relevant documents from S3 (RAG)
    retrieved_docs = retrieve_documents_from_s3(topic)

    # Generate response
    prompt = f"Topic: {topic}\nQuestion: {query}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(inputs['input_ids'], max_length=150)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return {
        'statusCode': 200,
        'body': json.dumps({'answer': answer})
    }

def retrieve_documents_from_s3(topic):
    # Fetch study materials related to the topic from S3
    # Your logic for document retrieval goes here
    pass

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
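&lt;p&gt;The retrieve_documents_from_s3 stub above is left for your own retrieval logic. As a minimal sketch, assuming the study materials keep the JSON shape shown earlier (topic/question/answer records), you could fetch the file and keep only the entries whose topic matches:&lt;/p&gt;

```python
import json

def filter_documents(materials, topic):
    """Keep only the records whose topic matches, case-insensitively."""
    return [m for m in materials if m["topic"].lower() == topic.lower()]

def retrieve_documents_from_s3(topic, bucket="study-materials", key="my-educational-dataset.json"):
    """Fetch the study materials from S3, then filter them by topic."""
    import boto3  # local import keeps the pure filtering logic usable without AWS
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    materials = json.loads(obj["Body"].read().decode("utf-8"))
    return filter_documents(materials, topic)

# A quick local check of the filtering step
sample = [
    {"topic": "Quantum Physics", "question": "q1", "answer": "a1"},
    {"topic": "Machine Learning", "question": "q2", "answer": "a2"},
]
```

&lt;p&gt;The returned records can then be prepended to the prompt before calling model.generate, which is what makes the generation retrieval-augmented.&lt;/p&gt;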



&lt;p&gt;&lt;strong&gt;2. Launch/deploy the Lambda function:&lt;/strong&gt; Deploy the Lambda function on AWS. It will be invoked through API Gateway to handle real-time user queries.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Step 4: Expose the model via an API Gateway&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Create an API Gateway, then:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the API Gateway console and create a new REST API.&lt;/li&gt;
&lt;li&gt;Create a POST endpoint for your Lambda function that generates the answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deploy the actual API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deploy the API to make it public, either by pointing it at a custom domain or by using the default invoke URL AWS provides.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Last step: Building the Streamlit app (aka a backend developer's favourite tool)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;And finally, build the Streamlit app that allows the user to interact and ask questions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import requests

st.title("Personalized Study Companion")

topic = st.text_input("Enter Study Topic:")
query = st.text_input("Enter Your Question:")

if st.button("Generate Answer"):
    response = requests.post("https://your-api-endpoint", json={"topic": topic, "query": query})
    answer = response.json().get("answer")
    st.write(answer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can host the Streamlit application on AWS EC2 or Elastic Beanstalk.&lt;/p&gt;

&lt;p&gt;If everything works well, congratulations! You just made your study companion. If I were to improve this project, I would add more examples to my synthetic data (duh??) or find another educational dataset that aligns perfectly with my goals. &lt;/p&gt;

&lt;p&gt;Thanks for reading! Let me know what you think!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>AWS Learning Experience with Streamlit (This Time We Connect To AWS)</title>
      <dc:creator>omar-steam</dc:creator>
      <pubDate>Fri, 03 Jan 2025 17:20:45 +0000</pubDate>
      <link>https://dev.to/omarsteam/aws-learning-experience-with-streamlit-this-time-we-connect-to-aws-3fll</link>
      <guid>https://dev.to/omarsteam/aws-learning-experience-with-streamlit-this-time-we-connect-to-aws-3fll</guid>
      <description>&lt;p&gt;In my very last post (&lt;a href="https://dev.to/omarsteam/create-your-own-personalised-aws-learning-experience-with-streamlit-8hg"&gt;click here&lt;/a&gt; to read it), I showed you how to create the application using mock AWS using Motto which simulates AWS via an SDK. Now, I'm going to show you how to do it via AWS (yay!)&lt;/p&gt;

&lt;p&gt;It uses AWS S3 to store the learning resources and AWS Lambda to process user input. The recommender ranks resources with TF-IDF vectorization and cosine similarity.&lt;/p&gt;
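&lt;p&gt;Before wiring anything to AWS, it helps to see the ranking idea in isolation. Here is a dependency-free sketch of TF-IDF weighting plus cosine similarity; the real app uses scikit-learn, whose tokenization and IDF smoothing differ slightly, but the ranking intuition is the same:&lt;/p&gt;

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Tiny TF-IDF: raw term frequency times log(N / document frequency)."""
    tokenized = [doc.lower().replace(",", " ").split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Sample resources in the same shape as the feed's learning data
resources = {
    "Introduction to AWS": "AWS, Cloud Computing",
    "Deep Learning on AWS": "AWS, AI, Deep Learning",
}
query = "deep learning"

# Vectorize the tags plus the query, then score each resource against the query
vecs = tfidf_vectors(list(resources.values()) + [query])
scores = {title: cosine(vecs[-1], vec) for title, vec in zip(resources, vecs[:-1])}
best = max(scores, key=scores.get)
```

&lt;p&gt;This mirrors what TfidfVectorizer and cosine_similarity do in the Lambda function below, just without the library.&lt;/p&gt;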

&lt;p&gt;Let's do it, shall we? &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Set Up The S3 Bucket&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Create the S3 Bucket for storing data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 mb s3://learning-paths-data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upload the learning resources JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo '{
    "resources": [
        {"title": "Introduction to AWS", "tags": "AWS, Cloud Computing"},
        {"title": "Deep Learning on AWS", "tags": "AWS, AI, Deep Learning"}
    ]
}' &amp;gt; resources.json
aws s3 cp resources.json s3://learning-paths-data/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;2. Create the Lambda function&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Write the Lambda function that processes the input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lambda_handler(event, context):
    user_input = event['query']
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket='learning-paths-data', Key='resources.json')
    resources = json.loads(obj['Body'].read().decode('utf-8'))['resources']
    titles, tags = zip(*[(r['title'], r['tags']) for r in resources])

    tfidf = TfidfVectorizer().fit_transform(tags + [user_input])
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
    recommendations = [titles[i] for i in scores.argsort()[-3:]]
    return {'recommendations': recommendations}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy this function and expose it through API Gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Change the Streamlit app accordingly&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Update the app to call the Lambda function through the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

def main():
    user_input = st.text_input('What do you want to learn about AWS?')
    if st.button('Recommend'):
        response = requests.post('&amp;lt;API Gateway URL&amp;gt;', json={'query': user_input})
        recommendations = response.json()['recommendations']
        for rec in recommendations:
            st.write(rec)

if __name__ == '__main__':
    main()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These projects illustrate how to use AWS services to develop resilient, scalable applications. Try them via my GitHub repo &lt;a href="https://github.com/omar-steam/personalised-recommendation-system" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Thanks for reading! &lt;/p&gt;

</description>
      <category>aws</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Create Your Own Personalised AWS Learning Experience with Streamlit</title>
      <dc:creator>omar-steam</dc:creator>
      <pubDate>Tue, 24 Dec 2024 13:06:30 +0000</pubDate>
      <link>https://dev.to/omarsteam/create-your-own-personalised-aws-learning-experience-with-streamlit-8hg</link>
      <guid>https://dev.to/omarsteam/create-your-own-personalised-aws-learning-experience-with-streamlit-8hg</guid>
      <description>&lt;p&gt;In my journey towards learning about AWS and machine learning/AI, I've decided to create a simple yet powerful AWS Learning Path Recommender using Streamlit, natural language processing or NLP via a mock S3 environment. This application will be able to suggest AWS learning paths based on user input.  &lt;/p&gt;

&lt;p&gt;So, let's get into it!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we start, make sure Python is installed and create a new project folder. Then, install the libraries below:&lt;br&gt;
&lt;br&gt;
  &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install streamlit boto3 moto scikit-learn

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Step 1: Setting Up the Mock S3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, we define a function to create a mock S3 using Moto; this will be used to simulate AWS S3 without connecting to AWS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from moto import mock_s3
import json

def setup_mock_s3():
    s3 = boto3.resource("s3", region_name="us-east-1")
    bucket_name = "mock-learning-paths"
    s3.create_bucket(Bucket=bucket_name)
    data = {
        "resources": [
            {"title": "Introduction to AWS", "tags": "AWS, Cloud Computing, Basics"},
            {"title": "Deep Learning on AWS", "tags": "AWS, Deep Learning, AI"},
            {"title": "NLP with SageMaker", "tags": "AWS, NLP, Machine Learning"},
            {"title": "Serverless with AWS Lambda", "tags": "AWS, Serverless, Lambda"},
        ]
    }
    s3.Bucket(bucket_name).put_object(Key="mock_resources.json", Body=json.dumps(data))
    return bucket_name

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Recommendation Function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next, we will define a function that, given a user's input, will make suggestions for learning paths by utilizing some NLP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_learning_path(user_input, bucket_name):
    s3 = boto3.client("s3", region_name="us-east-1")
    obj = s3.get_object(Bucket=bucket_name, Key="mock_resources.json")
    data = json.loads(obj['Body'].read().decode('utf-8'))

    resources = data["resources"]
    titles = [resource["title"] for resource in resources]
    tags = [resource["tags"] for resource in resources]

    corpus = tags + [user_input]
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(corpus)
    similarity = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])

    scores = similarity.flatten()
    ranked_indices = scores.argsort()[::-1]

    recommendations = [titles[i] for i in ranked_indices[:3]]
    return recommendations

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Streamlit Interface&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's now design the interface of our application using Streamlit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st

st.title("AWS Learning Path Recommender")

user_input = st.text_input("What do you want to learn about AWS?", "I want to learn about AWS and AI")

if st.button("Get Recommendations"):
    with mock_s3():
        bucket_name = setup_mock_s3()
        recommendations = recommend_learning_path(user_input, bucket_name)

    st.subheader("Recommended Learning Path:")
    for i, rec in enumerate(recommendations, 1):
        st.write(f"{i}. {rec}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Putting It All Together&lt;/strong&gt;&lt;br&gt;
Combine all the code snippets into a single Python file named 'app.py':&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import boto3
from moto import mock_s3
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# [Insert setup_mock_s3 function here]

# [Insert recommend_learning_path function here]

st.title("AWS Learning Path Recommender")

user_input = st.text_input("What do you want to learn about AWS?", "I want to learn about AWS and AI")

if st.button("Get Recommendations"):
    with mock_s3():
        bucket_name = setup_mock_s3()
        recommendations = recommend_learning_path(user_input, bucket_name)

    st.subheader("Recommended Learning Path:")
    for i, rec in enumerate(recommendations, 1):
        st.write(f"{i}. {rec}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 5: Run the App&lt;/strong&gt;&lt;br&gt;
To start the Streamlit app, open a terminal, change the directory to your project folder and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will boot the Streamlit server and open the app in your default web browser. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;br&gt;
The app creates a mock S3 bucket populated with some example AWS learning resources.&lt;/p&gt;

&lt;p&gt;When a user enters their learning interests and clicks "Get Recommendations," the application uses TF-IDF together with cosine similarity to rank the resources and shows the user the top 3 recommendations. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This simple application uses Streamlit to glue together the NLP techniques with AWS services, admittedly mocked, to create an interactive learning path recommender. You could extend this example by integrating the actual AWS services, adding more resources, or using more sophisticated recommendation algorithms.&lt;/p&gt;

&lt;p&gt;This is a very simple example and can be improved a lot for production. Remember, security, scalability, and user experience are major concerns when developing applications for the real world. &lt;/p&gt;

&lt;p&gt;Thank you so much for reading this!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>opensource</category>
      <category>datascience</category>
      <category>python</category>
    </item>
  </channel>
</rss>
