Generative AI - Large Language Models

What are Large Language Models (LLMs)?

Large Language Models (LLMs), such as OpenAI's GPT-3 and GPT-4, are machine learning models trained on vast amounts of text to understand and generate human-like prose. These models are built using neural networks with millions or even billions of parameters, allowing them to perform difficult tasks like translation, summarisation, question answering, and even creative writing.

LLMs analyse the patterns and relationships between words and phrases to produce coherent and contextually relevant output. They are trained on large and diverse datasets, which frequently include portions of the internet, books, and other texts. Although they can reproduce the style and structure of the text they were trained on, they are not sentient and do not possess comprehension or emotions.

High-performing foundation models

BERT
GPT
BLOOM
FLAN-T5
PaLM

LLM use cases and tasks

  1. Essay Writing
  2. Summarization
  3. Translation
  4. Information Retrieval
  5. Invoke APIs and actions -> action call -> external applications


Transformer Architecture

Transformers are a neural network architecture that has become the foundation of modern language models, including those from OpenAI.


Encoder

Encodes inputs (“prompts”) with contextual understanding
and produces one vector per input token.

Decoder
Accepts input tokens and generates new tokens.


Input Embedding: The first step in the Transformer model is to convert the input text into numerical representations called embeddings. Each word in the input sentence is transformed into a vector of numbers that captures its meaning. These embeddings provide a way for the model to understand the input data.

Multi-Head Attention (Encoder): The Encoder in the Transformer processes the input embeddings. One of its key components is the Multi-Head Attention mechanism. It’s like having multiple experts, each specializing in paying attention to different aspects of the input. The Multi-Head Attention calculates different attention scores for each word in the sentence, indicating its relevance to other words in the same sentence. This helps the model understand the context and relationships between words.

Feed Forward Neural Networks (Encoder): After Multi-Head Attention, the Transformer uses a Feed Forward Neural Network for additional processing. This network applies non-linear transformations to the attention outputs, introducing more complex relationships between words in the sentence.

Masked Multi-Head Attention (Decoder): During the decoding phase, the Transformer uses Masked Multi-Head Attention in the Decoder. The purpose of this mechanism is to prevent the model from cheating and looking ahead at future words during generation. The mask ensures that the Decoder only attends to the already generated words in the output sequence, ensuring a proper step-by-step generation of the output sentence.

Softmax: The Softmax function is used in two different contexts in the Transformer.

•Multi-Head Attention weights: the attention scores calculated in both the Encoder and Decoder are passed through the Softmax function to convert them into a probability distribution. This distribution represents how much attention each word should receive.

•Output layer: the final layer of the Transformer's Decoder produces scores for every word in the target vocabulary, and the Softmax function turns these scores into a valid probability distribution. The word with the highest probability is selected as the model's predicted next word.

Linear Function (Output Projection): Before the Softmax, the Transformer's Decoder applies a Linear layer that projects each position's vector from the model's embedding dimension to the size of the target vocabulary. These raw scores (logits) are what the Softmax converts into the probability distribution over possible next words, so the projection is the bridge between the Decoder's internal representation and the model's actual word predictions.

In summary, the Transformer model employs input embeddings, multi-head attention and feed-forward neural networks in the encoder, masked multi-head attention in the decoder, and a linear output projection followed by softmax to produce word probabilities. These components work together to enable the Transformer's extraordinary capacity to recognise and generate natural language sequences, making it a game-changing paradigm in natural language processing.
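To make the attention step concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single head. The toy sizes (4 tokens, embedding dimension 8) and random weight matrices are invented for illustration; a real Transformer uses learned weights and multiple heads.

# Minimal single-head scaled dot-product attention (toy sizes, random weights for illustration)
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # how much each token attends to every other token
    if mask is not None:
        scores = np.where(mask, scores, -1e9)      # masked positions get effectively zero weight
    weights = softmax(scores, axis=-1)             # attention distribution per token
    return weights @ V                             # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                        # 4 token embeddings of dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                   # (4, 8): one context-aware vector per input token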

Prompting and Prompt Engineering

In-context Learning

Zero-shot learning, in the in-context learning sense, asks the model to perform a task with no examples included in the prompt. The model has to rely entirely on the general knowledge it acquired during pre-training to infer what is being asked.

One-shot learning includes a single worked example of the task inside the prompt. That one example gives the model a concrete pattern to imitate, which usually makes the task easier for the model than the zero-shot case.

Few-shot learning sits between one-shot prompting and full fine-tuning: the prompt contains a handful of examples of the task. The extra examples make the expected format and behaviour clearer still, which often helps smaller models that struggle with zero-shot or one-shot prompts, at the cost of a longer prompt.
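The difference between the three styles is easiest to see as concrete prompts. The strings below are illustrative only; the sentiment-classification task, review texts, and labels are invented for this example.

# Illustrative prompts for zero-, one- and few-shot in-context learning (made-up reviews)
zero_shot = """Classify the sentiment of this review as Positive or Negative.
Review: The battery died after two days.
Sentiment:"""

one_shot = """Classify the sentiment of this review as Positive or Negative.
Review: I loved the crisp display.
Sentiment: Positive
Review: The battery died after two days.
Sentiment:"""

few_shot = """Classify the sentiment of this review as Positive or Negative.
Review: I loved the crisp display.
Sentiment: Positive
Review: Shipping took forever and the box was damaged.
Sentiment: Negative
Review: The battery died after two days.
Sentiment:"""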

Inference Parameters


Parameters are needed to manage and control the text generation process. They help in:

•Controlling the length of the generated text.
•Managing the randomness and diversity in the generated text.
•Ensuring the relevance and coherence of the generated text.
•Avoiding the generation of inappropriate or nonsensical text.

The max_new_tokens parameter limits the number of tokens the model can generate, helping to regulate the length of the generated text.

Controlling the length of the output is critical in real-world applications. For instance, in summarization tasks, you might want to limit the length to ensure concise summaries.

Top-K Sampling
Top-k sampling involves the model selecting the next token from the top k most likely tokens.

Advanced Use: Effective in activities that require a balance of diversity and relevancy, such as chatbot responses.

Top-P Sampling

Top-p (nucleus) sampling involves the model selecting the next token from the smallest set of most-likely tokens whose cumulative probability is at least p.

Advanced Use: Effective when you want to cap how much of the probability mass is considered during sampling, such as when creating text in a specific style or theme.

Temperature
Temperature scales the probability distribution of the next token. Higher values make the output more random, while lower values make it more deterministic.

Advanced Use: Adjusting temperature can help in tasks like text-based games or simulations where varying levels of unpredictability are desired.
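These three parameters are easiest to understand in code. Below is a rough, library-agnostic sketch (not the implementation of any particular model server) of how temperature scaling, top-k and top-p filtering could be applied to next-token logits; the sample_next_token helper and the toy logits are made up for illustration.

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)   # temperature scaling
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]                      # token ids sorted by probability, highest first
    if top_k is not None:
        order = order[:top_k]                            # keep only the k most likely tokens
    if top_p is not None:
        cum = np.cumsum(probs[order])
        order = order[:np.searchsorted(cum, top_p) + 1]  # smallest set with cumulative probability >= top_p
    kept = probs[order] / probs[order].sum()             # renormalise over the kept tokens
    return int(rng.choice(order, p=kept))

# Toy 5-token vocabulary: low temperature plus top_k/top_p makes the choice nearly deterministic
print(sample_next_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.2, top_k=3, top_p=0.9, seed=0))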

Generative AI project Life cycle


  1. Define the scope accurately and narrow down the use case.
  2. Select an existing model or build one from scratch, and estimate feasibility.
  3. Evaluate performance; carry out additional training or in-context learning if required.
  4. Fine-tune the model with supervised learning.
  5. Align the model with human preferences via reinforcement learning with human feedback (RLHF).
  6. Iterate: evaluation, prompt engineering, and fine-tuning are highly iterative.
  7. Optimize for deployment and plan for additional infrastructure requirements.

How Are Large Language Models Trained?

Pre-training: the LLM learns from unstructured textual data collected through web scraping and from various other data sources and corpora. Training datasets typically range from gigabytes to terabytes, and sometimes petabytes, of text.

During training the model weights are updated to minimise the loss of the training objective, and the range of patterns the model can capture depends on its architecture and size.

Before it can be used for training, the dataset's quality must be improved to address bias and remove harmful content. After this quality filtering, often only about 1-3% of the originally scraped tokens are actually used for pre-training.
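To make the training objective concrete, here is a minimal PyTorch sketch of next-token prediction with a cross-entropy loss; the tiny embedding-plus-linear "model" and the random token ids are placeholders, not a real LLM.

# Minimal sketch of the pre-training objective: predict the next token, minimise cross-entropy
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 16, 4
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (batch, seq_len))
inputs, targets = tokens[:, :-1], tokens[:, 1:]           # shift by one: predict the next token

logits = model(inputs)                                    # (batch, seq_len-1, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                           # gradients are used to update the weights
print(float(loss))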

Computational challenges of training LLMs

Training LLMs often fails with an out-of-memory error such as "CUDA out of memory".

CUDA (Compute Unified Device Architecture) is a collection of libraries and tools for performing large calculations on GPUs. PyTorch and TensorFlow rely on CUDA to run the matrix multiplications and other heavy computations needed to scale model training.

Calculating the approximate GPU RAM needed to store 1B parameters

To develop an intuition for the scale of the problem:

  • 1 parameter at 32-bit full precision = 4 bytes
  • 1B parameters = 4 × 10⁹ bytes = 4 GB, and that only stores the model weights
  • Training requires additional memory beyond the weights:
  • Model parameters (weights): 4 bytes per parameter
  • Adam optimiser (2 states): +8 bytes per parameter
  • Gradients: +4 bytes per parameter
  • Activations and temporary memory (variable size): +8 bytes per parameter (high-end estimate)

In total that is roughly 20 extra bytes per parameter on top of the 4 bytes for the weights. As a rule of thumb, the memory needed to train a model is about 20× the memory needed just to store it: a 4 GB model requires on the order of 80 GB of GPU RAM to train.
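As a quick sanity check, the rule of thumb above can be written as a few lines of Python; the helper name and the 20x training multiplier simply restate the estimate from this section.

# Back-of-the-envelope GPU memory estimate using the ~20x training rule of thumb above
def estimate_gpu_memory_gb(num_params, bytes_per_param=4, training_multiplier=20):
    weights_gb = num_params * bytes_per_param / 1e9        # memory just to hold the weights
    return weights_gb, weights_gb * training_multiplier    # rough memory to train

weights_gb, training_gb = estimate_gpu_memory_gb(1_000_000_000)   # 1B parameters at 32-bit precision
print(f"Store weights only: {weights_gb:.0f} GB, train: ~{training_gb:.0f} GB")   # 4 GB -> ~80 GB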

Fine tuning your model

Data parallelism allows for the use of multiple GPUs to process different parts of the same data simultaneously, speeding up training time.

Parameter-Efficient Fine-Tuning (PEFT)

PEFT employs fine-tuning only on a small subset of the model’s parameters, while freezing most of the pre-trained network. This tactic mitigates catastrophic forgetting and significantly cuts computational and storage costs.

1. Task-Guided Prompt Tuning: This technique utilizes task-specific prompts to guide the LLM's output, obviating the need to retrain the entire model for a specific task.

2. Low-Rank Adaptation (LoRA): By approximating the weight updates with small low-rank matrices, LoRA drastically decreases the number of parameters that need to be fine-tuned while preserving LLM performance.

3. Adapters: These small, specialized layers can be added to the LLM for task adaptation, providing flexibility and performance improvement.

4. Task-Relevant Prefix Tuning: Fine-tuning task-relevant prefixes prepended to the input enhances performance and task adaptability.
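To make the LoRA idea from item 2 concrete, here is a minimal, self-contained PyTorch sketch (not the Hugging Face peft implementation): the pre-trained linear weight is frozen and only a small low-rank update is trained. The class name LoRALinear and the layer sizes are illustrative.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)               # freeze the pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # B starts at zero: no change initially
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)   # frozen path + low-rank update

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"Trainable params: {trainable} of {total}")            # only the small A and B matrices train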


Reinforcement learning from Human Feedback (RLHF)

LLMs can behave badly or produce undesirable responses that are not Helpful, Honest, or Harmless, even after fine-tuning.
Hence additional fine-tuning based on human preferences helps us extend the model into a better-aligned, "super fine-tuned" model.

Fine-tuning with human feedback maximises the helpfulness and relevance of responses to the human prompt, which helps minimise harm and avoid dangerous topics.

RLHF is an approach in artificial intelligence and machine learning that combines reinforcement learning techniques with human guidance to improve the learning process. It involves training an agent or model to make decisions and take actions in an environment while receiving feedback from human experts. The human input can be in the form of rewards, preferences, or demonstrations, which helps guide the model's learning process. RLHF enables the agent to adapt and learn from the expertise of humans, allowing for more efficient and effective learning in complex and dynamic environments.


RLHF has three phases:
•Picking a pre-trained model as the primary model is the first step. Starting from a pre-trained model matters because training a language model from scratch would require an enormous amount of data.

•In the second step, a separate reward model is created. The reward model is trained on input from people who are shown two or more examples of the primary model's outputs and asked to rank them by quality. From this information, the reward model learns a scoring system for assessing the primary model's outputs.

•In the third phase, the reward model receives outputs from the primary model and produces a quality score that indicates how well the primary model performed. This feedback is then used to update the primary model so that it performs better on subsequent tasks.
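As an illustration of the second phase, here is a hedged PyTorch sketch of how a reward model is commonly trained on human preference pairs with a pairwise ranking loss; the tiny scorer network and the random "embeddings" are placeholders rather than a real RLHF pipeline.

import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))  # placeholder scorer

chosen_emb = torch.randn(4, 16)    # embeddings of the human-preferred responses (dummy data)
rejected_emb = torch.randn(4, 16)  # embeddings of the less-preferred responses (dummy data)

r_chosen = reward_model(chosen_emb)
r_rejected = reward_model(rejected_emb)

# -log(sigmoid(r_chosen - r_rejected)): the loss shrinks as the preferred response scores higher
loss = -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(float(loss))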

Constitutional AI

Constitutional AI (CAI) is similar to RLHF except instead of human feedback, it learns through AI feedback.

At a high-level, there are two stages of Constitutional AI (CAI): the Reflection stage and the Reinforcement stage.

CAI Stage 1: Reflection

  • What we start with: Baseline model
  • What we end with: Supervised Learning CAI (SL-CAI) model

1. Ask the LLM to generate toxic responses.

2. Give the LLM a set of rules to follow (or a Constitution). Present the toxic responses back to the LLM and ask if they accord with the Constitution.

3. Ask the LLM to generate a revised response. (Repeat revision until the responses follow the Constitution.)

4. This creates a synthetic dataset, which you can use for training. Fine-tune (or train) the baseline model on this synthetic dataset to create responses that more closely follow the Constitution.

5. Through this process, you get the SL-CAI model, the intermediate model.
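A hedged sketch of what this Reflection loop could look like in code is shown below; the generate callable and the one-principle constitution are hypothetical stand-ins for a real LLM call and a real Constitution.

# Hypothetical sketch of the Stage 1 critique-and-revise loop; `generate` stands in for a call
# to whatever LLM you use, and the one-principle constitution is a toy example.
constitution = ["Choose the response that is least harmful and most honest."]

def reflect(generate, question, principles=constitution, max_revisions=3):
    response = generate(question)
    for principle in principles:
        for _ in range(max_revisions):
            critique = generate(
                f"Does the following response violate the principle '{principle}'? Answer Yes or No.\n\n{response}"
            )
            if critique.strip().lower().startswith("no"):
                break                                    # this principle is satisfied, move on
            response = generate(f"Rewrite the response so that it follows '{principle}':\n\n{response}")
    return question, response                            # (prompt, revision) pairs form the SL-CAI training set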

CAI Stage 2: Reinforcement

  • What we start with: SL-CAI model
  • What we end with: Reinforcement Learning CAI (RL-CAI) model

1. Ask the SL-CAI model (from Stage 1) to generate toxic responses. (Repeat multiple times for a given question.)
2. Since each question has multiple responses, we can now create multiple-choice questions.
3. Give those multiple-choice questions to the SL-CAI model and ask it to select the answer that best follows the Constitution.
4. This creates a second synthetic dataset, which you can use to train a reward model to do reinforcement learning.
5. Train a reward model (which is different from the baseline or SL-CAI models) on that dataset to predict which answers align best with the Constitution. This reward model becomes a teacher to the final model.
6. Use this reward model to reinforce desired behavior and punish undesired behavior. This nudging method teaches the final model the Constitution.
7. Through this process, you get the RL-CAI model, the final model!

LLM optimization Techniques

Preliminary Assessment and Baseline Establishment

Understand the LLM’s Capabilities: Assess the general knowledge and abilities of the LLM in its base form.

Establish a Performance Baseline: Determine the LLM's initial performance on your target task to identify areas for improvement.

Prompt Optimization

Develop Initial Prompts: Create clear, structured prompts tailored to the task at hand.
Iterative Testing and Refinement: Continuously test and refine these prompts based on the LLM's output quality and relevance.

Retrieval-Augmented Generation (RAG) Implementation

Introduce Contextual Data: Implement RAG to provide the LLM with access to relevant, domain-specific content.

Evaluate and Adjust RAG: Monitor the LLM's performance with RAG, tweaking the content and its relevance as needed.

Fine-Tuning with Specific Datasets

Curate Specialized Datasets: Select or create datasets that are highly relevant to the specific task.
Fine-Tune the LLM: Continue the LLM's training with these datasets to specialize its capabilities for the task.

Combining Fine-Tuning and RAG

Integrate RAG with Fine-Tuned Models: Use RAG to supplement the fine-tuned model with additional contextual information.
Optimize for Balance: Ensure a balance between the LLM's general knowledge and its specialized capabilities.

Performance Evaluation and Optimization

Continuous Evaluation: Regularly assess the LLM's performance on the target task, using both qualitative and quantitative measures.
Feedback Loop for Improvement: Use the insights from evaluations to further refine the prompts, RAG implementation, and fine-tuning.

Deployment and Real-World Testing

Deploy the Optimized LLM: Implement the optimized LLM in a real-world scenario or a testing environment that closely mimics actual use cases.
Monitor and Adjust in Real-Time: Continuously monitor the LLM's performance in real-world applications, making adjustments as needed based on user feedback and performance data.

Iterative Improvement

Long-Term Optimization: Recognize that LLM optimization is an ongoing process. Regularly revisit and update the model with new data, techniques, and insights.


Retrieval augmented generation (RAG)

Retrieval Augmented Generation (RAG) combines the advanced text-generation capabilities of GPT and other large language models with information retrieval functions to provide precise and contextually relevant information. This approach improves a language model's ability to understand and answer user queries by integrating the latest and most relevant data.

RAG is about feeding language models with the information they need. Instead of asking the LLM directly (as with a general-purpose model), we first retrieve accurate data from a well-maintained knowledge library and then use that context to answer the question. When the user sends a query (question), the retriever uses vector embeddings (numerical representations) to find the most relevant documents in a vector database, and the retrieved content is passed to the LLM along with the question. This greatly reduces the likelihood of hallucinations and lets you update the model's knowledge without retraining it, which is a costly process.
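Here is a minimal sketch of that retrieve-then-generate flow; the embed function, the two example documents, and the in-memory cosine-similarity search are stand-ins for a real embedding model and vector database.

# Minimal retrieve-then-generate sketch; embed() is a stand-in for a real embedding model
# and the list of documents is a stand-in for a vector database.
import numpy as np

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
]

def embed(text):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))   # placeholder: real systems use an embedding model
    return rng.normal(size=128)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query, k=1):
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))  # cosine similarity
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

question = "How long do I have to return an item?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to the LLM, which answers grounded in the retrieved context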


Program-aided Language model (PAL)

Artificial Intelligence (AI) continues to evolve at a rapid pace, and one transformative step is the advent of Program-Aided Language models (PAL), an approach that changes how Large Language Models (LLMs) handle tasks requiring exact computation or multi-step reasoning: the model writes program code for its intermediate reasoning steps, and an external interpreter executes that code to produce the final answer.
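A hedged sketch of the PAL pattern is shown below; the llm callable, the prompt text, and the pal_answer helper are hypothetical stand-ins, not any particular library's API.

# PAL-style sketch: ask the LLM to emit Python for the reasoning steps, then run that code
# with the interpreter instead of trusting the LLM's arithmetic.
PAL_PROMPT = """Q: A bakery sold 23 cakes on Monday and 3 times as many on Tuesday. How many in total?

# Write Python that computes the answer and stores it in a variable called `answer`.
monday = 23
tuesday = 3 * monday
answer = monday + tuesday

Q: {question}

# Write Python that computes the answer and stores it in a variable called `answer`.
"""

def pal_answer(llm, question):
    code = llm(PAL_PROMPT.format(question=question))   # the model's "reasoning" is a program
    scope = {}
    exec(code, scope)                                   # the interpreter, not the LLM, does the math
    return scope["answer"]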


LLM Powered Applications

LangChain is an advanced platform that provides developers with a seamless and intuitive interface to leverage the power of LLMs in their applications. It offers a range of APIs and tools that simplify the integration of LLMs into your projects, enabling you to unlock the full potential of language processing.


Langchain

LangChain is a framework for developing applications powered by language models. It enables applications that:
•Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
•Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
The framework consists of several parts:
•LangChain Libraries: The Python and JavaScript libraries. They contain interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
•LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
•LangServe: A library for deploying LangChain chains as a REST API.
•LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.


Building Generative Applications - High-Level App Framework


I have tried a couple of the Amazon SageMaker LLM notebooks shared in the AWS Bedrock workshop to get familiar with the LLM libraries and responses: contextual generation, image generation, and code generation.

import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)
Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


Invoke bedrock LLM model

from langchain.llms.bedrock import Bedrock

inference_modifier = {'max_tokens_to_sample':4096, 
                      "temperature":0.5,
                      "top_k":250,
                      "top_p":1,
                      "stop_sequences": ["\n\nHuman"]
                     }

textgen_llm = Bedrock(model_id = "anthropic.claude-v2",
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )


Create a LangChain custom prompt template

from langchain.prompts import PromptTemplate

# Create a prompt template that has multiple input variables
multi_var_prompt = PromptTemplate(
    input_variables=["customerServiceManager", "customerName", "feedbackFromCustomer"], 
    template="""

Human: Create an apology email from the Service Manager {customerServiceManager} to {customerName} in response to the following feedback that was received from the customer: 
<customer_feedback>
{feedbackFromCustomer}
</customer_feedback>

Assistant:"""
)

# Pass in values to the input variables
prompt = multi_var_prompt.format(customerServiceManager="Bob", 
                                 customerName="John Doe", 
                                 feedbackFromCustomer="""Hello Bob,
     I am very disappointed with the recent experience I had when I called your customer support.
     I was expecting an immediate call back but it took three days for us to get a call back.
     The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.
     We are very unhappy with the response provided and may consider taking our business elsewhere.
     """
     )

num_tokens = textgen_llm.get_num_tokens(prompt)
print(f"Our prompt has {num_tokens} tokens")
response = textgen_llm(prompt)

email = response[response.index('\n')+1:]

print_ww(email)
I want to sincerely apologize for the poor service you recently received from our customer support
team. It is unacceptable that it took multiple days for us to respond and resolve your issue.

As the Service Manager, I take full responsibility for this situation. I will be looking into why
there were delays in responding and getting your problem fixed correctly. It is our top priority to
provide prompt, knowledgeable support so our customers' needs are addressed efficiently.

I understand your frustration with this experience and do not blame you for considering taking your
business elsewhere. We value you as a customer and want to regain your trust. Please let me know if
there is anything I can do to make this right. I would be happy to offer you a discount on your next
purchase or provide a refund as an apology for the inconvenience.

Our goal is to deliver excellent customer service and support. This situation shows we have more
work to do. I will be implementing changes to our processes and additional training for our staff to
prevent something like this from happening again.

I appreciate you taking the time to share your feedback. It will help us improve. I sincerely hope
you will give us another chance to provide you with the positive experience you deserve. Please feel
free to contact me directly if you have any other concerns.

Sincerely,

Bob
Service Manager

Code Translation

import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)
from langchain.llms.bedrock import Bedrock

inference_modifier = {'max_tokens_to_sample':4096, 
                      "temperature":0.5,
                      "top_k":250,
                      "top_p":1,
                      "stop_sequences": ["\n\nHuman"]
                     }

textgen_llm = Bedrock(model_id = "anthropic.claude-v2",
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )
# Vehicle Fleet Management Code written in C++
sample_code = """
#include <iostream>
#include <string>
#include <vector>

class Vehicle {
protected:
    std::string registrationNumber;
    int milesTraveled;
    int lastMaintenanceMile;

public:
    Vehicle(std::string regNum) : registrationNumber(regNum), milesTraveled(0), lastMaintenanceMile(0) {}

    virtual void addMiles(int miles) {
        milesTraveled += miles;
    }

    virtual void performMaintenance() {
        lastMaintenanceMile = milesTraveled;
        std::cout << "Maintenance performed for vehicle: " << registrationNumber << std::endl;
    }

    virtual void checkMaintenanceDue() {
        if ((milesTraveled - lastMaintenanceMile) > 10000) {
            std::cout << "Vehicle: " << registrationNumber << " needs maintenance!" << std::endl;
        } else {
            std::cout << "No maintenance required for vehicle: " << registrationNumber << std::endl;
        }
    }

    virtual void displayDetails() = 0;

    ~Vehicle() {
        std::cout << "Destructor for Vehicle" << std::endl;
    }
};

class Truck : public Vehicle {
    int capacityInTons;

public:
    Truck(std::string regNum, int capacity) : Vehicle(regNum), capacityInTons(capacity) {}

    void displayDetails() override {
        std::cout << "Truck with Registration Number: " << registrationNumber << ", Capacity: " << capacityInTons << " tons." << std::endl;
    }
};

class Car : public Vehicle {
    std::string model;

public:
    Car(std::string regNum, std::string carModel) : Vehicle(regNum), model(carModel) {}

    void displayDetails() override {
        std::cout << "Car with Registration Number: " << registrationNumber << ", Model: " << model << "." << std::endl;
    }
};

int main() {
    std::vector<Vehicle*> fleet;

    fleet.push_back(new Truck("XYZ1234", 20));
    fleet.push_back(new Car("ABC9876", "Sedan"));

    for (auto vehicle : fleet) {
        vehicle->displayDetails();
        vehicle->addMiles(10500);
        vehicle->checkMaintenanceDue();
        vehicle->performMaintenance();
        vehicle->checkMaintenanceDue();
    }

    for (auto vehicle : fleet) {
        delete vehicle; 
    }

    return 0;
}
"""
from langchain.prompts import PromptTemplate

# Create a prompt template that has multiple input variables
multi_var_prompt = PromptTemplate(
    input_variables=["code", "srcProgrammingLanguage", "targetProgrammingLanguage"], 
    template="""

Human: You will be acting as an expert software developer in {srcProgrammingLanguage} and {targetProgrammingLanguage}. 
You will translate the code below from {srcProgrammingLanguage} to {targetProgrammingLanguage} while following coding best practices.

{code}

Assistant: """
)

# Pass in values to the input variables
prompt = multi_var_prompt.format(code=sample_code, srcProgrammingLanguage="C++", targetProgrammingLanguage="Java")


Code translation from C++ to Java

response = textgen_llm(prompt)

target_code = response[response.index('\n')+1:]

print_ww(target_code)
import java.util.ArrayList;

class Vehicle {
  protected String registrationNumber;
  protected int milesTraveled;
  protected int lastMaintenanceMile;

  public Vehicle(String regNum) {
    registrationNumber = regNum;
    milesTraveled = 0;
    lastMaintenanceMile = 0;
  }

  public void addMiles(int miles) {
    milesTraveled += miles;
  }

  public void performMaintenance() {
    lastMaintenanceMile = milesTraveled;
    System.out.println("Maintenance performed for vehicle: " + registrationNumber);
  }

  public void checkMaintenanceDue() {
    if ((milesTraveled - lastMaintenanceMile) > 10000) {
      System.out.println("Vehicle: " + registrationNumber + " needs maintenance!");
    } else {
      System.out.println("No maintenance required for vehicle: " + registrationNumber);
    }
  }

  public void displayDetails() {
  }
}

class Truck extends Vehicle {
  private int capacityInTons;

  public Truck(String regNum, int capacity) {
    super(regNum);
    capacityInTons = capacity;
  }

  public void displayDetails() {
    System.out.println("Truck with Registration Number: " + registrationNumber + ", Capacity: " +
capacityInTons + " tons.");
  }
}

class Car extends Vehicle {
  private String model;

  public Car(String regNum, String carModel) {
    super(regNum);
    model = carModel;
  }

  public void displayDetails() {
    System.out.println("Car with Registration Number: " + registrationNumber + ", Model: " + model +
".");
  }
}

public class Main {

  public static void main(String[] args) {
    ArrayList<Vehicle> fleet = new ArrayList<Vehicle>();

    fleet.add(new Truck("XYZ1234", 20));
    fleet.add(new Car("ABC9876", "Sedan"));

    for(Vehicle vehicle : fleet) {
      vehicle.displayDetails();
      vehicle.addMiles(10500);
      vehicle.checkMaintenanceDue();
      vehicle.performMaintenance();
      vehicle.checkMaintenanceDue();
    }
  }
}
The key differences from C++ to Java:

- Includes changed to imports
- std::string to String
- std::vector to ArrayList
- Pointers and new to object creation
- Override keyword for overridden methods
- Virtual methods changed to public
- Destructors not needed in Java

I followed Java naming conventions and coding standards like private variables, camelCase, and
object oriented design. Let me know if you have any other questions!

References:

https://github.com/langchain-ai/langchain

https://www.linkedin.com/pulse/what-constitutional-ai-alexandra-barr/

https://medium.com/@kanikaadik07/generative-ai-project-life-cycle-55ce9092e24a

https://dev.to/pavanbelagatti/a-beginners-guide-to-building-llm-powered-applications-with-langchain-2d6e

https://www.superannotate.com/blog/rag-explained

https://www.coursera.org/learn/generative-ai-with-llms
https://github.com/aws-samples/amazon-bedrock-workshop
