Use Amazon Titan Text Model with Lambda (Exploring 🤖 Generative AI on AWS)

#aws #cloud #ai #serverless

Have you ever tried calling the Amazon Bedrock service from an AWS Lambda function?

In this article we are going to create a simple application that returns a random joke using the Amazon Titan Text Express model.

The main parts of this article:
1- What is generative AI?
2- How does generative AI work?
3- Generative AI vs. AI
4- Generative AI services on AWS
5- Technical Part (code)
6- Result
7- Conclusion

What is generative AI?

Generative AI refers to deep-learning models that can produce various types of content, such as text, imagery, audio, and synthetic data (information that is artificially generated rather than produced by real-world events).

How does generative AI work?

Generative AI models can take inputs such as text, image, audio, video, and code and generate new content in any of the modalities mentioned. For example, a model can turn text inputs into an image, turn an image into a song, or turn video into text.

Foundation models

A foundation model is a machine learning or deep learning model that is trained on broad data so that it can be applied across a wide range of use cases. Foundation models have transformed artificial intelligence, powering prominent generative AI applications like ChatGPT.

Large language models

A large language model is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification.

Generative AI vs. AI

Generative AI often starts with a prompt that lets a user or data source submit a starting query or data set to guide content generation. This can be an iterative process to explore content variations. Traditional AI algorithms, on the other hand, often follow a predefined set of rules to process data and produce a result.

Both approaches have their strengths and weaknesses depending on the problem to be solved. Generative AI is well-suited for tasks that involve NLP and call for the creation of new content, while traditional algorithms are more effective for tasks that involve rule-based processing and predetermined outcomes.

Generative AI services on AWS

Here is a range of generative AI technologies on AWS:

Amazon CodeWhisperer is an AI coding companion that can improve developer productivity.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Amazon SageMaker JumpStart lets you discover, explore, and deploy open source FMs, or even create your own. SageMaker JumpStart provides managed infrastructure and tools to accelerate scalable, reliable, and secure model building, training, and deployment.
AWS HealthScribe is a HIPAA-eligible service empowering healthcare software vendors to build clinical applications that automatically generate clinical notes by analyzing patient-clinician conversations.
Amazon Q in QuickSight helps business analysts easily create and customize visuals using natural-language commands. The new Generative BI authoring capabilities extend the natural-language querying of QuickSight Q beyond answering well-structured questions (such as “what are the top 10 products sold in California?”) to help analysts quickly create customizable visuals from question fragments (such as “top 10 products”), clarify the intent of a query by asking follow-up questions, refine visualizations, and complete complex calculations.

Technical Part

The AWS services that will be used in this part:

1- AWS Lambda: Holds the code and the business logic
2- AWS IAM: Manages all the permissions inside the AWS cloud
3- Amazon Bedrock: Fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API

First, let's increase the Lambda function timeout (by default it's 3 seconds). In my case, I will set it to 1 minute.
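The same change can also be made from code instead of the console. Below is a minimal sketch that assumes a boto3 Lambda client and a hypothetical function name, "joke-generator"; the helper names are made up for this example and are not part of the original setup.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's upper limit (15 minutes)

def timeout_update_args(function_name, timeout_seconds):
    """Build the keyword arguments for update_function_configuration."""
    if not 1 <= timeout_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS:
        raise ValueError("timeout must be between 1 and 900 seconds")
    return {"FunctionName": function_name, "Timeout": timeout_seconds}

def set_timeout(lambda_client, function_name, timeout_seconds):
    """Apply the new timeout through an existing boto3 Lambda client."""
    return lambda_client.update_function_configuration(
        **timeout_update_args(function_name, timeout_seconds)
    )

# Usage (needs AWS credentials; "joke-generator" is a hypothetical name):
#   import boto3
#   set_timeout(boto3.client("lambda"), "joke-generator", 60)
print(timeout_update_args("joke-generator", 60))
```

Passing the client in as an argument keeps the helper easy to try out without touching a real AWS account.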

What is Amazon Titan Text G1 - Express? It is a large language model for text generation. It is useful for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG).

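import json
import boto3
from botocore.exceptions import ClientError

# Model ID of Amazon Titan Text G1 - Express
MODEL_ID = "amazon.titan-text-express-v1"

# Create the client once so that warm invocations can reuse it
client = boto3.client(
  service_name="bedrock-runtime",
  region_name="eu-central-1"
)

def lambda_handler(event, context):
  prompt = "Tell me a random joke, about cats"

  # Request body in the format expected by Titan Text models
  body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
      "maxTokenCount": 512,
      "stopSequences": [],
      "temperature": 0,
      "topP": 0.9
    }
  })

  try:
    response = client.invoke_model(
      body=body,
      modelId=MODEL_ID,
      accept="application/json",
      contentType="application/json"
    )
  except ClientError as error:
    # For example, the execution role is not allowed to invoke the model
    print(f"Could not invoke {MODEL_ID}: {error}")
    raise

  response_body = json.loads(response["body"].read())
  output_text = response_body["results"][0]["outputText"]

  # Drop everything up to the first newline, if there is one
  # (str.index would raise ValueError when the output has no newline)
  _, newline, rest = output_text.partition("\n")
  joke = (rest if newline else output_text).strip()
  print(joke)

  return {"statusCode": 200, "body": joke}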

Let's break down the JSON body object and the parameters that are being passed.

temperature: A float value that controls the randomness of the generated text. Higher values result in more diverse but potentially less coherent outputs, while lower values produce safer but potentially repetitive outputs. With a temperature of 0, as in the code above, the model will tend to return the same joke for the same prompt, so raise the value if you want more variety.

topP: Also known as nucleus sampling, this parameter sets a cumulative probability threshold for choosing the next token. Only the smallest set of most probable tokens whose combined probability reaches the threshold is considered, which helps avoid generating improbable or nonsensical sequences.

maxTokenCount: An integer indicating the maximum number of tokens to generate. This parameter limits the length of the generated text to prevent excessively long outputs.

stopSequences: An array of strings representing sequences that, if generated, will cause the generation process to stop. This allows for custom stopping criteria based on specific sequences that the user wants to avoid in the generated text.
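To make the parameters above more concrete, here is a small, self-contained sketch of the ideas behind temperature, topP, and stopSequences. It is a conceptual illustration in plain Python with made-up scores, not how Bedrock or Titan implements them internally, and the function names are hypothetical. In particular, how the real service treats the matched stop sequence itself may differ from this sketch.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the kept tokens form a valid distribution again
    return {i: probs[i] / mass for i in kept}

def cut_at_stop_sequence(text, stop_sequences):
    """Truncate text at the earliest occurrence of any stop sequence."""
    positions = [text.find(s) for s in stop_sequences if s and s in text]
    return text[:min(positions)] if positions else text

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(apply_temperature(logits, 1.0))
print(apply_temperature(logits, 0))
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
print(cut_at_stop_sequence("Why did the cat sit?\nUser: another", ["User:"]))
```

In this toy example, a topP of 0.9 drops the least likely of the three tokens, and a temperature of 0 always picks the same token, which is why a low temperature makes repeated calls return similar text.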

Also make sure that the Lambda function's execution role has permission to invoke the model (the bedrock:InvokeModel action).
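As a rough sketch, a policy that grants this permission for a single model could look like the one built below. The helper name is hypothetical, and the region and model ID are taken from the Lambda code above; adjust them if yours differ. The printed JSON can be attached to the function's execution role as an inline policy.

```python
import json

def invoke_model_policy(region, model_id):
    """Build an IAM policy document that allows invoking one foundation model."""
    # Foundation-model ARNs leave the account ID field empty
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": model_arn,
            }
        ],
    }

policy = invoke_model_policy("eu-central-1", "amazon.titan-text-express-v1")
print(json.dumps(policy, indent=2))
```

Scoping the Resource to one model ARN instead of "*" keeps the function limited to the model it actually uses.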

Result

We can see below how a joke about cats was generated using this simple code.

Conclusion

Generative AI will be very helpful in many applications. It will help us to build faster and it will enable us to be creative with new innovative ideas.

If you liked my content and want to see more, feel free to connect with me on ➡️ Awedis LinkedIn or X. I'm happy 😊 to guide 💁 or help with anything that needs further clarification.