<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Victor Isaac Oshimua</title>
    <description>The latest articles on DEV Community by Victor Isaac Oshimua (@victor_isaac_king).</description>
    <link>https://dev.to/victor_isaac_king</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1095389%2F227eff63-4233-4c4c-af68-b5f636faae62.jpg</url>
      <title>DEV Community: Victor Isaac Oshimua</title>
      <link>https://dev.to/victor_isaac_king</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/victor_isaac_king"/>
    <language>en</language>
    <item>
      <title>Understanding Why Large Language Models Hallucinate</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Tue, 28 Jan 2025 14:02:59 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/understanding-why-large-language-models-hallucinate-2bc0</link>
      <guid>https://dev.to/victor_isaac_king/understanding-why-large-language-models-hallucinate-2bc0</guid>
      <description>&lt;p&gt;Around 10 to 30% of the time, large language models (LLMs) like GPT-4 and Gemini tend to produce factually incorrect responses.&lt;/p&gt;

&lt;p&gt;As a user of these LLMs, you might have come across scenarios like this:&lt;/p&gt;

&lt;p&gt;You ask the LLM a question such as:&lt;br&gt;
"Who was the first person to walk on Mars?"&lt;/p&gt;

&lt;p&gt;And the LLM responds:&lt;br&gt;
"The first person to walk on Mars was Alexei Ivanov in 2024 as part of the Mars One mission."&lt;/p&gt;

&lt;p&gt;In reality, no human has ever walked on Mars as of 2025. The LLM has fabricated a name, a date, and even a mission (Mars One) that do not correspond to actual events.&lt;/p&gt;

&lt;p&gt;This phenomenon is called &lt;strong&gt;hallucination.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Although I personally find the metaphor "hallucination" somewhat misleading (a hallucination is a sensory perception without real sensory input, and LLMs lack senses altogether), and "incorrect output" would be a more fitting term, we will stick with the standard terminology used by the AI community.&lt;/p&gt;

&lt;p&gt;In this article, you will learn what LLM hallucination is, why it happens, and how to reduce it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does Hallucination Mean in LLMs?
&lt;/h2&gt;

&lt;p&gt;Hallucinations in LLMs occur when their generated outputs deviate from facts or contextual logic. For instance, as shown in the example above, an LLM may generate outputs that appear correct but, in reality, are not grounded in factual information. &lt;br&gt;
This can result in misinformation, particularly in critical industries like education or law, where accuracy in generated outputs is essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Categories of LLM Hallucination
&lt;/h3&gt;

&lt;p&gt;Let’s break down LLM hallucinations into different levels to better understand how they occur.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output Contradiction:&lt;/strong&gt; This occurs when an LLM generates a sentence that contradicts a previously generated output. For instance, an LLM might generate a sentence stating that Argentina won the World Cup, and in the next sentence, it could claim that America won the World Cup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input Contradiction:&lt;/strong&gt; This occurs when an LLM generates responses that contradict the input prompt or instruction. &lt;br&gt;
For instance, if an LLM is prompted with:&lt;br&gt;
"Write a story about a person who enjoys going on long walks with their dog."&lt;/p&gt;

&lt;p&gt;But it generates:&lt;br&gt;
"The character dislikes walking and avoids their dog at all costs."&lt;/p&gt;

&lt;p&gt;This is an example of input contradiction, as the response does not align with the original prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Factual Contradictions:&lt;/strong&gt; As the term suggests, these are outputs generated by LLMs that are inaccurate or false. For example, an LLM stating that Michael Jordan is a boxer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Do LLMs Hallucinate?
&lt;/h2&gt;

&lt;p&gt;There is no single, straightforward answer to why LLMs hallucinate. Even the engineers who develop these models often struggle to see inside the "black box" of how LLMs generate their outputs. However, there are a few key causes we can point to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input Context:&lt;/strong&gt; The prompts provided to LLMs play a crucial role in guiding the model to generate relevant outputs. However, if the prompt is vague or ambiguous, it can confuse the model and lead to less accurate or irrelevant responses.&lt;br&gt;
For example, if a user asks, "Tell me about the history of space exploration," the LLM can provide a detailed and accurate response. &lt;br&gt;
But if the prompt is vague, such as "Tell me about space," the model might struggle to determine whether the user is asking about astronomy, space travel, science fiction, or something else entirely. &lt;br&gt;
This ambiguity can result in a response that is too broad, off-topic, or even factually incorrect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Quality:&lt;/strong&gt; The data used to train LLMs is often filled with noise, errors, and biases. For instance, if an LLM is trained on data scraped from platforms like Reddit, there is a high likelihood of inaccuracies. &lt;br&gt;
For example, a Reddit user might claim that aliens are currently living among us on Earth. Since the LLM cannot verify the accuracy of such claims, it may inadvertently learn and reproduce these inaccuracies in its outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Reduce Hallucination
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What can you do to reduce hallucinations?&lt;/strong&gt; Do you just keep prompting different LLMs and hope one avoids hallucinations better than the others? While trial-and-error experimentation can teach us about LLMs' strengths and performance, there are far more reliable ways to tackle inaccuracies.&lt;br&gt;
Here are some strategies to reduce hallucinations and improve the quality of your interactions with LLMs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Use Clear and Specific Prompts&lt;/strong&gt;&lt;br&gt;
Vague prompts can confuse the model, leading to less accurate responses. Detailed prompts help the model understand exactly what information you’re seeking.&lt;/p&gt;

&lt;p&gt;Example: Instead of asking, "What happens on December 25th?" try, "Can you explain the major holiday celebrated every year on December 25th?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Adjust Model Parameters&lt;/strong&gt;&lt;br&gt;
Many LLMs allow you to control parameters like temperature when prompting; this influences the randomness of outputs.&lt;br&gt;
For instance, a lower temperature produces more conservative and focused responses, reducing the likelihood of hallucinations.&lt;br&gt;
A higher temperature increases creativity but also raises the risk of inaccuracies. &lt;/p&gt;
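&lt;p&gt;To see why temperature has this effect, here is a minimal sketch of temperature-scaled sampling. The logit values are made up for illustration; real APIs apply this scaling internally before sampling the next token.&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into sampling probabilities.

    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it, increasing randomness.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
print(cold[0] > hot[0])  # True: low temperature concentrates probability on the top token
```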

&lt;p&gt;&lt;strong&gt;3. Employ Multi-Shot Prompting&lt;/strong&gt;&lt;br&gt;
Instead of single-shot prompting (one input), provide the model with multiple examples of the desired output format or context. This helps the model recognise patterns and expectations more effectively.&lt;/p&gt;
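&lt;p&gt;A multi-shot prompt can be assembled as a chat transcript that front-loads worked examples. The sketch below uses a generic message schema; the field names mirror common chat-completion APIs but are illustrative rather than tied to any particular provider.&lt;/p&gt;

```python
def build_few_shot_messages(examples, question):
    """Assemble a chat transcript: instruction, worked examples, then the real query."""
    messages = [{"role": "system",
                 "content": "Answer with the capital city only."}]
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": question})
    return messages

examples = [("What is the capital of France?", "Paris"),
            ("What is the capital of Japan?", "Tokyo")]
messages = build_few_shot_messages(examples, "What is the capital of Kenya?")
print(len(messages))  # 6: one system message, two examples (two messages each), one query
```

&lt;p&gt;The worked examples anchor the expected format, so the model is less likely to wander off into an unconstrained (and more hallucination-prone) answer.&lt;/p&gt;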

&lt;p&gt;&lt;strong&gt;4. Understand the Causes of Hallucination&lt;/strong&gt;&lt;br&gt;
Hallucinations often stem from ambiguous inputs or insufficient context. By identifying these factors, you can trace the causes of hallucination and improve the reliability of the model’s outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Hallucination remains a significant obstacle to getting accurate responses from LLMs. However, the issue is not unavoidable. By adopting strategies like clear prompting, parameter tuning, and multi-shot examples, we can mitigate hallucinations and improve output reliability.&lt;/p&gt;

&lt;p&gt;Furthermore, advancements in &lt;a href="https://www.youtube.com/watch?v=f0RbwrBcFmc" rel="noopener noreferrer"&gt;reasoning model&lt;/a&gt; architectures and training methodologies, such as improved fact-checking mechanisms, are reducing the rate of LLM hallucinations. &lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Learning how to build AI agents in 2025</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Fri, 03 Jan 2025 23:05:04 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/learning-how-to-build-ai-agents-in-2025-f4h</link>
      <guid>https://dev.to/victor_isaac_king/learning-how-to-build-ai-agents-in-2025-f4h</guid>
      <description>&lt;p&gt;In the past few months, we've experienced significant advancements in AI technologies. The rate at which new updates to AI technologies occur can make it easy to feel overwhelmed by everything.&lt;/p&gt;

&lt;p&gt;Currently, one of the hottest topics in AI is AI agents, with popular names like Baby AGI, GPT-4, Agent GPT, and more.&lt;/p&gt;

&lt;p&gt;If you're curious about this concept and interested in resources to help you learn how to create your own AI agents, this blog is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisite
&lt;/h2&gt;

&lt;p&gt;This blog will guide you from beginner to intermediate resources on building AI agents. The only prerequisite is a basic understanding of programming and a keen interest in AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  A primer on AI agents
&lt;/h2&gt;

&lt;p&gt;Before we dive into learning how to build AI agents, let’s take a moment to understand the concept of AI agents.&lt;/p&gt;

&lt;p&gt;In engineering, an agent is anything that can perceive its environment and take actions within it. The environment is defined by the agent's use case. &lt;/p&gt;

&lt;p&gt;For instance, if an agent is developed to perform natural language text-to-SQL queries, a database could be its environment. Alternatively, if an agent is designed for autonomous driving (like those used in self-driving cars), the real-world driving conditions would serve as its environment.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;AI agent&lt;/strong&gt; performs tasks based on the input provided by a user. It uses a foundation model as its "brain", which processes the user’s input, plans actions to complete the task, and determines whether the task has been completed successfully.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Let's go further and illustrate how an AI agent works.&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;Imagine you're building a personalised movie recommendation agent for a streaming service. The agent’s task is to recommend movies to users based on their past viewing behaviour and preferences. &lt;br&gt;
The agent follows these steps: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Understand the Task:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent first determines that to recommend personalised movies, it needs to analyse the user’s past viewing history, including genres, ratings, and favourite actors.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrieve User History:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It generates a query to gather data from the user’s watch history: movies they’ve watched, genres they prefer, and their ratings.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execute Data Query:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent retrieves the user’s historical viewing data from the database.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Analyse the Data:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It analyses the user’s preferences, such as frequent genres (e.g., action or drama) or actors, to understand what they like.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generate Movie Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Based on the analysis, the agent generates a new query to find movies that match the user’s preferences (e.g., action movies or those with a favourite actor).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execute Movie Query:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent retrieves a list of available movies that meet the user’s criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evaluate and Rank Movies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It ranks the recommended movies based on relevance, ratings, and user preferences (e.g., prioritising movies with high ratings).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Present Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent presents the top recommendations to the user, ensuring they align with the user’s tastes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
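&lt;p&gt;The workflow above can be sketched in a few lines of Python. This toy version uses an in-memory watch history and a rating-based ranking; in a real agent, the planning and analysis steps would be delegated to an LLM, and the data queries would hit an actual database.&lt;/p&gt;

```python
# Toy data standing in for the user's watch history and the movie catalog.
WATCH_HISTORY = [
    {"title": "Heat", "genre": "action", "rating": 5},
    {"title": "Drive", "genre": "action", "rating": 4},
    {"title": "Amelie", "genre": "drama", "rating": 2},
]
CATALOG = [
    {"title": "Mad Max: Fury Road", "genre": "action", "rating": 4.8},
    {"title": "The Notebook", "genre": "drama", "rating": 4.0},
    {"title": "John Wick", "genre": "action", "rating": 4.5},
]

def preferred_genre(history):
    """Steps 2 to 4: analyse the history, finding the genre with the highest average rating."""
    totals = {}
    for item in history:
        running = totals.setdefault(item["genre"], [0, 0])  # [rating sum, count]
        running[0] += item["rating"]
        running[1] += 1
    return max(totals, key=lambda genre: totals[genre][0] / totals[genre][1])

def recommend(history, catalog, top_n=2):
    """Steps 5 to 8: query the catalog, rank matches by rating, present the top picks."""
    genre = preferred_genre(history)
    matches = [movie for movie in catalog if movie["genre"] == genre]
    matches.sort(key=lambda movie: movie["rating"], reverse=True)
    return [movie["title"] for movie in matches[:top_n]]

print(recommend(WATCH_HISTORY, CATALOG))  # ['Mad Max: Fury Road', 'John Wick']
```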

&lt;h2&gt;
  
  
  &lt;strong&gt;Recommended Resources to Get Started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At the beginning of this blog, we introduced the concept of AI agents. Now, let’s dive deeper and explore valuable resources that will help you get started on your journey to building AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Getting Started with Programming for AI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before getting into the core concepts of Generative AI, it's essential to establish a solid foundation in programming. This course is perfect for beginners who want to learn Python, the primary language used in AI development.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. AI &amp;amp; Python for Beginners&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This course introduces the basics of programming with Python, focusing on how to apply it in AI contexts. It's an ideal starting point for those new to programming and AI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basics of Python programming and its application in AI.&lt;/li&gt;
&lt;li&gt;An introduction to machine learning concepts using Python.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/ai-python-for-beginners/" rel="noopener noreferrer"&gt;AI &amp;amp; Python for Beginners by DeepLearning.AI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Generative AI: The Core Foundations&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Introduction to Generative AI&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Generative AI is a breakthrough in AI. This area has expanded rapidly and is now used in fields like text, image, and video generation. Understanding generative AI will be your first step toward mastering AI agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The fundamentals of generative AI, its principles, and applications.&lt;/li&gt;
&lt;li&gt;Key models like GPT and GANs that power generative AI systems.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/courses/generative-ai-for-everyone/" rel="noopener noreferrer"&gt;Generative AI for Everyone by Andrew Ng (DeepLearning.AI)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Basics of Large Language Models (LLMs)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;LLMs like GPT form the backbone of many generative AI applications. Learning how they work will help you understand how AI agents can perform complex tasks like text generation, translation, and summarization.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The architecture behind LLMs and their training process.&lt;/li&gt;
&lt;li&gt;Real-world use cases where LLMs excel.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://h2o.ai/university/courses/large-language-models-level1/" rel="noopener noreferrer"&gt;H2O.ai Large Language Models (LLMs) - Level 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Fundamentals of Prompt Engineering&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Prompt engineering is the practice of crafting inputs that get LLMs to work for you. The quality of the prompts you craft determines how well the model generates the desired output.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to craft effective prompts to guide model behavior.&lt;/li&gt;
&lt;li&gt;Different types of prompts: zero-shot, few-shot, and more.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/" rel="noopener noreferrer"&gt;ChatGPT Prompt Engineering for Developers by DeepLearning.AI &amp;amp; OpenAI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Data Handling and Processing&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Data handling and preprocessing are fundamental skills for working with LLMs and AI agents. You'll need to prepare data effectively to train your models and improve their performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Techniques for processing unstructured data.&lt;/li&gt;
&lt;li&gt;How to clean and prepare text data for AI models.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/preprocessing-unstructured-data-for-llm-applications/" rel="noopener noreferrer"&gt;Preprocessing Unstructured Data for LLM Apps by DeepLearning.AI &amp;amp; Unstructured.io&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5. Introduction to API Wrappers&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;API wrappers simplify interactions with complex APIs, making it easier for you to integrate powerful AI models into your applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The concept of API wrappers and their benefits.&lt;/li&gt;
&lt;li&gt;How to create and use API wrappers for generative AI models.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.coursera.org/specializations/codio-generative-ai" rel="noopener noreferrer"&gt;Getting Started with Generative AI API Specialization by Coursera &amp;amp; Codio&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;6. Essentials of RAG (Retrieval-Augmented Generation)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is an important technique that enhances the capability of generative models by incorporating external data. It’s a key aspect of building AI agents that provide more contextually relevant outputs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How RAG works and why it’s so powerful.&lt;/li&gt;
&lt;li&gt;Practical applications of RAG in AI-driven systems.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.coursera.org/learn/introduction-to-retrieval-augmented-generation-rag" rel="noopener noreferrer"&gt;Introduction to Retrieval-Augmented Generation (RAG) by Coursera&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;AI Agents: Building and Expanding Your Knowledge&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Introduction to AI Agents&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;AI agents can take autonomous actions based on their environment, and they are often built using generative models like LLMs. They can assist in everything from virtual assistants to complex decision-making systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The key components of AI agents and how they function.&lt;/li&gt;
&lt;li&gt;How to combine RAG and LLMs to build more intelligent agents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.coursera.org/learn/fundamentals-of-ai-agents-using-rag-and-langchain" rel="noopener noreferrer"&gt;Fundamentals of AI Agents Using RAG and LangChain by Coursera &amp;amp; IBM&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Exploring Agent Frameworks&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Building a robust AI agent requires using frameworks that streamline the process. LangChain is one such powerful tool that integrates with LLMs to make agent development simpler and more scalable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How LangChain and other frameworks can help you build intelligent agents.&lt;/li&gt;
&lt;li&gt;Features that make these frameworks suitable for AI applications.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/" rel="noopener noreferrer"&gt;LangChain for LLM Application Development by DeepLearning.AI &amp;amp; LangChain&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Building a Simple AI Agent&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Creating your first AI agent can be a great way to apply what you've learned. This process involves designing a system that can make decisions autonomously based on pre-defined criteria.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step-by-step guide to building a simple AI agent from scratch.&lt;/li&gt;
&lt;li&gt;Tools and frameworks you can use to build your agent.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.udemy.com/course/build-autonomous-ai-agents-from-scratch-with-python/" rel="noopener noreferrer"&gt;Build Autonomous AI Agents From Scratch With Python by Udemy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Understanding Agent Workflows&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To create efficient AI agents, you need to design workflows that outline how agents will process information, make decisions, and take actions. This is where AI agent design patterns come into play.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design agent workflows for different use cases.&lt;/li&gt;
&lt;li&gt;Best practices for structuring agent workflows for scalability.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.coursera.org/learn/ai-agentic-design-patterns-with-autogen" rel="noopener noreferrer"&gt;AI Agentic Design Patterns with AutoGen by Coursera &amp;amp; Microsoft&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5. Learning About Agent Memory&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Memory enables AI agents to recall past interactions and make more informed decisions. Implementing memory can significantly enhance the performance and adaptability of your AI agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How agent memory works and why it’s essential.&lt;/li&gt;
&lt;li&gt;Techniques for implementing memory in AI agents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/llms-as-operating-systems-agent-memory/" rel="noopener noreferrer"&gt;LLMs as Operating Systems: Agent Memory by DeepLearning.AI &amp;amp; Letta&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;6. Evaluating Agent Performance&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Once your agent is built, it’s crucial to evaluate its performance to ensure it meets your objectives. This includes testing its ability to solve problems effectively and make autonomous decisions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics for evaluating agent performance: efficiency, accuracy, and user satisfaction.&lt;/li&gt;
&lt;li&gt;Continuous improvement techniques for AI agents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.coursera.org/learn/building-intelligent-troubleshooting-agents" rel="noopener noreferrer"&gt;Building Intelligent Troubleshooting Agents by Coursera &amp;amp; Microsoft&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;7. Collaborating with Multiple Agents&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In some scenarios, multiple agents need to collaborate to achieve complex goals. Learn how to manage communication between agents and design systems for multi-agent collaboration.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to build systems where multiple AI agents collaborate.&lt;/li&gt;
&lt;li&gt;Use cases of multi-agent systems in real-world applications.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/multi-ai-agent-systems-with-crewai/" rel="noopener noreferrer"&gt;Multi AI Agent Systems with crewAI by DeepLearning.AI &amp;amp; CrewAI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;8. Implementing RAG in AI Agents&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Incorporating RAG into your AI agent’s workflow can improve its ability to access and integrate external knowledge, making it more powerful and contextually aware.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to integrate RAG into AI agents for enhanced decision-making.&lt;/li&gt;
&lt;li&gt;Advanced use cases of RAG-enabled AI agents in industries like customer service.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource&lt;/strong&gt;: &lt;a href="https://www.deeplearning.ai/short-courses/building-evaluating-advanced-rag/" rel="noopener noreferrer"&gt;Building &amp;amp; Evaluating Advanced RAG Apps by DeepLearning.AI, LlamaIndex &amp;amp; TruEra&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final thoughts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Building AI agents is an exciting and growing field, with numerous opportunities to innovate and solve real-world problems. By following this roadmap and resources, you'll be well-equipped to start developing your own AI-driven systems, automate tasks, and create intelligent agents that can think and act autonomously.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a href="https://www.linkedin.com/in/that-aum/" rel="noopener noreferrer"&gt;Omn&lt;/a&gt; for sharing the roadmap that inspired the content of this blog.&lt;/p&gt;

&lt;p&gt;If you have any additional resources that others might benefit from, feel free to share them in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>career</category>
    </item>
    <item>
      <title>Lists of open-source frameworks for building RAG applications</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Thu, 02 Jan 2025 10:21:57 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/lists-of-open-source-frameworks-for-building-rag-applications-562e</link>
      <guid>https://dev.to/victor_isaac_king/lists-of-open-source-frameworks-for-building-rag-applications-562e</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Large Language Models (LLMs), as we know them, can sometimes produce inaccurate and unreliable responses. To address these challenges and reduce factual errors, Retrieval-Augmented Generation (RAG) techniques are used in building AI applications.&lt;/p&gt;

&lt;p&gt;RAG combines the generative power of LLMs with the ability to retrieve specific, up-to-date information from external sources. By doing so, it produces contextually relevant and accurate responses tailored to user needs.&lt;/p&gt;

&lt;p&gt;For instance, imagine you want to build a travel assistant chatbot. Instead of relying on just an LLM (which may not include the latest flight schedules or travel restrictions), a RAG-based assistant can query live travel databases to retrieve current information about flight options, hotel availability, or visa requirements. It can then use that data to generate personalised responses, such as:&lt;/p&gt;

&lt;p&gt;"Based on your location, there’s a direct flight to New York on January 10th at 8:00 AM. Shall I help you book it?"&lt;/p&gt;

&lt;p&gt;Another relatable example is a health advice assistant. While the LLM might have general knowledge about medical conditions, it could retrieve specific information from trusted medical sources, such as the Mayo Clinic or CDC, to give accurate, up-to-date responses. For example:&lt;/p&gt;

&lt;p&gt;"According to the CDC, the flu vaccine is recommended for everyone above the age of 6 months, especially during flu season. Would you like tips on finding a nearby clinic?"&lt;/p&gt;

&lt;p&gt;These example use cases demonstrate how RAG ensures responses are accurate, relevant, and grounded in domain-specific knowledge, making AI applications far more reliable. &lt;/p&gt;
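&lt;p&gt;The retrieve-then-generate pattern behind these examples can be sketched as follows. Retrieval here is naive keyword overlap and the generator is a stub; a real RAG system would use embedding-based search and an LLM, but the shape of the pipeline is the same.&lt;/p&gt;

```python
# A tiny stand-in for an external knowledge source.
DOCUMENTS = [
    "Direct flights to New York depart daily at 8:00 AM.",
    "The CDC recommends the flu vaccine for everyone over 6 months old.",
    "Visa applications for Canada take about two weeks to process.",
]

def retrieve(query, documents):
    """Return the document that shares the most words with the query."""
    query_words = set(query.lower().split())
    return max(documents,
               key=lambda doc: len(query_words.intersection(doc.lower().split())))

def generate(query, context):
    """Stand-in for an LLM call: in a real system, query and context
    would be combined into a prompt for the model."""
    return f"Based on our records: {context}"

query = "When is the flu vaccine recommended?"
print(generate(query, retrieve(query, DOCUMENTS)))
```

&lt;p&gt;Because the answer is assembled from the retrieved document rather than the model's parametric memory alone, the response stays grounded in the external source.&lt;/p&gt;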

&lt;p&gt;In this blog, you will learn about some of the top open-source tools for building RAG applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open-source RAG tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;R2R&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;R2R ("RAG to Riches") is an advanced framework for developing local RAG apps.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Ingestion&lt;/strong&gt;: Seamlessly processes various data formats, including &lt;code&gt;.txt&lt;/code&gt;, &lt;code&gt;.pdf&lt;/code&gt;, &lt;code&gt;.json&lt;/code&gt;, &lt;code&gt;.png&lt;/code&gt;, and others.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Search&lt;/strong&gt;: Combines semantic and keyword-based searches for precise information retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Graph Generation&lt;/strong&gt;: Automatically extracts entities and their relationships, organizing information into structured knowledge graphs.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Applications requiring dynamic data handling and complex relationships between entities.
&lt;a href="https://github.com/SciPhi-AI/R2R" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;
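&lt;p&gt;Hybrid search, as described above, blends keyword matching with semantic similarity. The pure-Python sketch below illustrates the idea only; R2R's actual API differs, and the scoring functions here (&lt;code&gt;keyword_score&lt;/code&gt;, &lt;code&gt;semantic_score&lt;/code&gt;, the &lt;code&gt;alpha&lt;/code&gt; blend weight) are illustrative assumptions:&lt;/p&gt;

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Simple term-overlap (keyword) score between query and document."""
    q_terms = Counter(query.lower().split())
    d_terms = Counter(doc.lower().split())
    return sum(min(q_terms[t], d_terms[t]) for t in q_terms)

def semantic_score(query_vec, doc_vec):
    """Cosine similarity between precomputed embedding vectors."""
    dot = sum(a * b for a, b in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(a * a for a in query_vec)) * math.sqrt(sum(b * b for b in doc_vec))
    return dot / norm if norm else 0.0

def hybrid_score(query, doc, query_vec, doc_vec, alpha=0.5):
    """Blend keyword and semantic signals; alpha weights the semantic side."""
    return alpha * semantic_score(query_vec, doc_vec) + (1 - alpha) * keyword_score(query, doc)
```

&lt;p&gt;Ranking documents by &lt;code&gt;hybrid_score&lt;/code&gt; catches both exact term matches and paraphrased queries, which is why hybrid retrieval tends to outperform either signal alone.&lt;/p&gt;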

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Cognita&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cognita is a modular framework designed to build scalable, production-ready RAG applications.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customizable Pipelines&lt;/strong&gt;: Tailored components for data ingestion, processing, and retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Focus&lt;/strong&gt;: Designed for real-world, enterprise-grade applications with extensive configuration options.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Scaling&lt;/strong&gt;: Built to adapt to growing data and retrieval demands.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Enterprises seeking a robust framework for large-scale AI applications.
&lt;a href="https://github.com/truefoundry/cognita" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;LLMWare&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LLMWare simplifies the creation of RAG workflows and AI agents by leveraging specialized, fine-tuned models.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Library&lt;/strong&gt;: Includes over 50 small, fine-tuned models optimized for specific enterprise tasks such as document retrieval, summarization, and sentiment analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive RAG Lifecycle Support&lt;/strong&gt;: Covers everything from data ingestion to model integration and deployment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-Friendly&lt;/strong&gt;: Built with corporate use cases in mind, such as customer support automation and knowledge management.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Organizations needing tailored, domain-specific solutions.
&lt;a href="https://github.com/SciPhi-AI/LLMWare" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;LangChain&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LangChain provides tools to build RAG applications by combining language models with custom logic and data sources.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chain Creation&lt;/strong&gt;: Lets developers create modular pipelines to link retrieval and generation tasks seamlessly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wide Integration&lt;/strong&gt;: Works with various vector databases, APIs, and embedding models.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customizable Components&lt;/strong&gt;: Offers tools for pre- and post-processing, ensuring flexibility.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Developers looking for an all-in-one framework to link multiple data sources with LLMs.
&lt;a href="https://github.com/hwchase17/langchain" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;
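&lt;p&gt;The retrieve-then-generate pattern that LangChain's chains formalize can be sketched in plain Python. This is an illustration of the pattern, not LangChain's API; &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;generate&lt;/code&gt; are hypothetical stand-ins (a real chain would call an embedding-based retriever and an LLM):&lt;/p&gt;

```python
def retrieve(query, corpus, top_k=1):
    """Toy retriever: rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def generate(query, context):
    """Stand-in for an LLM call: splice the retrieved context into a prompt."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # a real chain would send this prompt to an LLM

def rag_chain(query, corpus):
    """Link retrieval and generation into one pipeline."""
    context = "\n".join(retrieve(query, corpus))
    return generate(query, context)
```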

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Haystack by deepset&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Haystack is one of the most widely used frameworks for RAG applications, providing extensive flexibility and scalability.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Pipelines&lt;/strong&gt;: Supports advanced workflows for QA, summarization, and document retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend Agnostic&lt;/strong&gt;: Compatible with various vector search engines like Elasticsearch, OpenSearch, and FAISS.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-built Connectors&lt;/strong&gt;: Simplifies integration with third-party data sources and retrievers.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Building question-answering systems and document-heavy retrieval applications.
&lt;a href="https://github.com/deepset-ai/haystack" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. &lt;strong&gt;LlamaIndex (formerly GPT Index)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LlamaIndex streamlines the integration of external knowledge bases with LLMs.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Data Ingestion&lt;/strong&gt;: Converts unstructured data into structured indices for efficient retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Indexing&lt;/strong&gt;: Supports creating indices tailored to specific datasets, improving retrieval accuracy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Agnostic&lt;/strong&gt;: Works with various pre-trained language models for flexible application.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Developers who need seamless integration between proprietary data and LLMs.
&lt;a href="https://github.com/jerryjliu/llama_index" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;
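&lt;p&gt;To make "converting unstructured data into structured indices" concrete, here is a minimal inverted-index sketch in plain Python. It is not LlamaIndex's API, just the underlying idea of mapping terms to the documents that contain them:&lt;/p&gt;

```python
from collections import defaultdict

def build_index(documents):
    """Turn unstructured text into a structured index: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, documents, query):
    """Retrieve documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [documents[i] for i in sorted(ids)]
```

&lt;p&gt;Lookups against a prebuilt index avoid scanning every document per query, which is what makes retrieval efficient at scale.&lt;/p&gt;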

&lt;h3&gt;
  
  
  7. &lt;strong&gt;txtai&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;txtai is a scalable AI-powered search engine built for RAG workflows.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key Features&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Natural Language Search&lt;/strong&gt;: Enables intuitive text-based queries for information retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customizable Pipelines&lt;/strong&gt;: Supports embedding generation and advanced query processing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight and Efficient&lt;/strong&gt;: Can be deployed locally or on cloud infrastructure with minimal overhead.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ideal For&lt;/strong&gt;: Projects requiring quick setup and robust search capabilities.
&lt;a href="https://github.com/neuml/txtai" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;RAG is to AI development what a well-stocked library is to a researcher: it supplies accurate, context-aware information when it’s needed most. &lt;br&gt;
These open-source frameworks are transforming how RAG powers real-world applications, making it easier than ever to bring context-rich AI solutions into everyday life. &lt;/p&gt;

&lt;p&gt;Is this list missing your favourite framework? Are you building with a tool that isn’t featured here? Let us know—we’d love to hear about it! In the meantime, explore the frameworks listed above and start turning your RAG ideas into reality. &lt;/p&gt;

&lt;p&gt;Happy coding! 🚀  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>LLM APIs vs. Self-Hosted Models: Finding the Best Fit for Your Business Needs</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Fri, 06 Dec 2024 16:12:55 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/llm-apis-vs-self-hosted-models-finding-the-best-fit-for-your-business-needs-50i2</link>
      <guid>https://dev.to/victor_isaac_king/llm-apis-vs-self-hosted-models-finding-the-best-fit-for-your-business-needs-50i2</guid>
      <description>&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;h4&gt;
  
  
  LLM APIs are ideal for:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Quick Deployment: Great for businesses needing rapid integration of AI features.&lt;/li&gt;
&lt;li&gt;Non-Sensitive Data Applications: Perfect for scenarios where data privacy isn't a primary concern.&lt;/li&gt;
&lt;li&gt;Prototyping and Short-Term Projects: Allows fast experimentation with minimal setup.&lt;/li&gt;
&lt;li&gt;Limited In-House Expertise: A solution for teams without ML expertise.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Self-hosting is ideal for:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Custom AI Needs: Enables fine-tuning and adaptation for specialised business use cases.&lt;/li&gt;
&lt;li&gt;In-House Resources: Suitable for organizations with the technical expertise and infrastructure to manage models.&lt;/li&gt;
&lt;li&gt;High Privacy and Compliance Needs: Ensures data security and adherence to regulatory requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary, choose LLM APIs for ease, speed, and cost-effective integration. Opt for self-hosting if your business demands full control, security, and customization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqp5o43egz0zy1mr5r6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqp5o43egz0zy1mr5r6x.png" alt="Image description" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recent advancements in large language models (LLMs) have given AI applications abilities beyond simple natural language processing tasks. LLMs can interpret and generate information in ways that resemble human communication.&lt;/p&gt;

&lt;p&gt;These abilities have been particularly helpful to businesses in areas such as customer support, analysing unstructured business data, content creation, solving repetitive and tedious tasks, and even human resources management.&lt;/p&gt;

&lt;p&gt;While LLMs can be impactful in business, it is still unclear how businesses should best harness these models. Should they simply rely on third-party API calls, commonly known as GPT wrappers, or should they host their own LLMs?&lt;/p&gt;

&lt;p&gt;In this article, we will answer these questions. By the end of this read, you’ll know which setup is best for your business and when to use it. So, sit back, relax, grab a coffee, and let’s dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding LLM APIs
&lt;/h2&gt;

&lt;p&gt;LLM APIs are interfaces that allow developers to integrate large language models into their applications without needing to build the model themselves. Think of it as ordering food from a restaurant instead of cooking at home. You get access to a ready-made meal (LLM) without having to gather ingredients, follow recipes, and cook. &lt;/p&gt;

&lt;p&gt;Similarly, LLM APIs let you use advanced AI features without building and managing the model. These APIs provide a simplified way for businesses and developers to use AI functionalities like text generation, summarisation, sentiment analysis, and more, often through simple API calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kt1bvieo8nq3w4x8j7c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kt1bvieo8nq3w4x8j7c.png" alt="LLM_API" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the features of LLM APIs?
&lt;/h3&gt;

&lt;p&gt;Let's look at some features of LLM APIs and how they can align with your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Ease of Access:&lt;/strong&gt; LLM APIs are designed for simplicity. Developers can integrate them with minimal setup: just an API key and a basic understanding of how to make RESTful calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Scalability:&lt;/strong&gt; APIs provided by major vendors are built on powerful cloud infrastructures to ensure availability as demand increases. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cost:&lt;/strong&gt; Using an API eliminates the need for expensive hardware required to train or host large language models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Updated Models:&lt;/strong&gt; LLM providers often improve their models over time, and APIs ensure you get the latest advancements in AI technologies.&lt;/p&gt;
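&lt;p&gt;To show how little setup an LLM API call needs, here is a stdlib-only Python sketch. The endpoint URL, model name, and payload schema below are hypothetical placeholders; each provider documents its own:&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical endpoint and model name for illustration only; real providers
# publish their own URLs, headers, and payload schemas.
API_URL = "https://api.example.com/v1/chat/completions"

def build_request(prompt, api_key, model="example-model"):
    """Assemble the JSON payload and auth header a typical LLM API expects."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending is one line once the request is built:
# with urllib.request.urlopen(build_request("Summarise this text", key)) as resp:
#     print(json.load(resp))
```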

&lt;h3&gt;
  
  
  Popular LLM APIs Providers
&lt;/h3&gt;

&lt;p&gt;Now that you understand what LLM APIs are, let’s look at some popular examples of API providers/vendors offering these APIs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/overview" rel="noopener noreferrer"&gt;OpenAI GPT&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
OpenAI offers APIs for their GPT models, which are known for their performance in tasks like chatbots, content creation, and code generation. OpenAI also provides fine-tuning, which allows businesses to train the models to their specific needs, and embeddings for semantic search. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://bard.google.com" rel="noopener noreferrer"&gt;Google Gemini (formerly Bard)&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Gemini provides an API to build with Google’s ecosystem and benefit from Google's rich AI/ML infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com" rel="noopener noreferrer"&gt;Anthropic Claude&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Anthropic’s Claude is designed with an emphasis on safety and reliability. It offers NLP capabilities for tasks like summarisation, content creation, coding co-pilot, and more.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://azure.microsoft.com/en-us/products/cognitive-services/openai-service/" rel="noopener noreferrer"&gt;Microsoft Azure OpenAI Service&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft Azure OpenAI Service integrates OpenAI’s models with Azure’s enterprise-grade scalability. This service is ideal for businesses needing strong solutions for large-scale applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  When to Choose an LLM API
&lt;/h3&gt;

&lt;p&gt;If you're wondering how to integrate AI into your application or business and unsure if using an API is the right choice, here’s when opting for an LLM API makes the most sense:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limited In-House Expertise:&lt;/strong&gt; If your team lacks expertise in deploying and maintaining AI models, LLM APIs provide an accessible way to leverage advanced AI without the steep learning curve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moderate Usage Needs:&lt;/strong&gt; When your application doesn’t require heavy, continuous processing, APIs offer a cost-efficient pay-as-you-go model that aligns with your usage patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prototype Development:&lt;/strong&gt; For businesses testing new AI-driven features or building prototypes, APIs enable quick experimentation without committing to long-term infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-Term Projects:&lt;/strong&gt; For projects with a limited timeline, LLM APIs are ideal as they allow you to implement AI features quickly without the overhead of self-hosting or fine-tuning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Self-Hosted LLMs
&lt;/h2&gt;

&lt;p&gt;When we say self-hosted LLM, what exactly does it mean? Does it mean training your LLM from scratch, or does it refer to running a pre-trained model on your infrastructure? Let’s find out.&lt;/p&gt;

&lt;p&gt;Self-hosting an LLM is simply running a pre-trained LLM on your own infrastructure rather than relying on third-party API providers like OpenAI or Google. This means that the model is deployed and maintained on the company’s servers or cloud instances, giving full control over the model’s performance, usage, and data privacy.&lt;br&gt;
You can access model cards and platforms to run them on sites like &lt;a href="https://huggingface.co/models?other=LLM" rel="noopener noreferrer"&gt;Hugging Face Models&lt;/a&gt; and &lt;a href="https://www.kaggle.com/models" rel="noopener noreferrer"&gt;Kaggle Models&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the requirements for self-hosting LLMs?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Compute:&lt;/strong&gt; LLMs are resource-intensive and require high-performance hardware, particularly GPUs or TPUs. These resources are used to handle the large computations involved in training, fine-tuning, and inferencing models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering talent:&lt;/strong&gt; Deploying and managing an LLM requires technical knowledge in machine learning and model optimisation.&lt;br&gt;
Most importantly, you will need to optimise the deployed model for latency and high throughput. This involves applying ML engineering concepts such as quantising the model, using inference containers, sharding across GPUs, and much more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget&lt;/strong&gt;: Hosting an LLM can be expensive, and it's important to consider your budget and integration costs. For instance, if you want to host a 6-billion-parameter LLM like &lt;a href="https://github.com/graphcore/gpt-j" rel="noopener noreferrer"&gt;GPT-J&lt;/a&gt; on a cloud platform such as AWS, you’d likely choose a GPU instance, like the NVIDIA V100 GPU. These instances cost around $3.06 per hour. While this might seem affordable at first glance, it adds up to roughly $26,800 per year for a single instance. If you want to run the model across multiple regions for redundancy, the annual cost can quickly multiply.&lt;/p&gt;
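&lt;p&gt;The arithmetic behind that estimate is easy to reproduce:&lt;/p&gt;

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_gpu_cost(hourly_rate, instances=1):
    """Annual cost of running GPU instances around the clock."""
    return hourly_rate * HOURS_PER_YEAR * instances

# A single V100-class instance at ~$3.06/hour, as in the example above:
single = annual_gpu_cost(3.06)        # ~$26,800 per year
redundant = annual_gpu_cost(3.06, 3)  # three regions roughly triples it
```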

&lt;h3&gt;
  
  
  When to Choose Self-Hosting
&lt;/h3&gt;

&lt;p&gt;Self-hosting isn’t the right solution for every business looking to integrate AI, but it’s the best choice in the following scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Businesses with Long-Term Projects:&lt;/strong&gt; If your organisation relies heavily on AI for core operations and plans to scale significantly, self-hosting offers more control, long-term cost efficiency, and the ability to tailor the model to your specific needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy:&lt;/strong&gt; For industries like healthcare, finance, or cybersecurity, where sensitive data is involved, self-hosting ensures you retain complete control over your data, reducing exposure to third-party risks and simplifying regulatory compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customisation:&lt;/strong&gt; When LLM APIs don’t meet your business requirements, self-hosting allows you to fine-tune models with your proprietary data, create specialised features, and adapt the model to niche use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing LLM APIs and Self-Hosting
&lt;/h2&gt;

&lt;p&gt;Let's take a side-by-side look at both setups and see where each one shines:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;LLM APIs&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Self-Hosting&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Typically $0.01–$0.10 per 1,000 tokens, depending on usage and provider. Ideal for small budgets or infrequent tasks.&lt;/td&gt;
&lt;td&gt;Hardware costs typically range from $1–$5 per GPU per hour. Upfront hardware setup ($10k–$50k) for long-term use, plus ongoing maintenance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ready to use in minutes to a few hours.&lt;/td&gt;
&lt;td&gt;May take weeks or months to set up infrastructure and deploy the model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Technical Expertise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low—suitable for teams without ML or infrastructure skills.&lt;/td&gt;
&lt;td&gt;High—requires expertise in machine learning, DevOps, and hardware management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited to pre-built functionality; fine-tuning may require separate APIs.&lt;/td&gt;
&lt;td&gt;Fully customizable to meet specific business needs with proprietary data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data is sent to third-party servers, raising potential privacy concerns.&lt;/td&gt;
&lt;td&gt;Full control over data, meeting strict privacy or regulatory requirements.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easily scales up or down to meet demand. Providers manage infrastructure.&lt;/td&gt;
&lt;td&gt;Requires investment in additional GPUs or servers for scaling ($1–$5 per GPU per hour).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimised for general use; potential latency depending on API provider.&lt;/td&gt;
&lt;td&gt;Can be tailored for high performance and low latency, suitable for critical applications.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best for prototyping, short-term projects, or non-sensitive tasks.&lt;/td&gt;
&lt;td&gt;Ideal for long-term projects, industries with strict compliance needs, or specialised use cases.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
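&lt;p&gt;The table's trade-offs can be distilled into a rough rule of thumb. The decision criteria below are illustrative assumptions, not hard rules:&lt;/p&gt;

```python
def recommend_setup(sensitive_data, has_ml_team, long_term_scale, needs_custom_model):
    """Rough rule of thumb distilled from the comparison table (illustrative)."""
    wants_control = sensitive_data or needs_custom_model or long_term_scale
    if wants_control and has_ml_team:
        return "self-hosting"
    if wants_control and not has_ml_team:
        return "self-hosting, but budget for ML/DevOps hires"
    return "LLM API"
```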

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;So far, we've discussed key areas to consider for your AI integration setup. Beyond those highlighted, a few other factors deserve attention.  &lt;/p&gt;

&lt;p&gt;When planning to integrate AI into your business, keep in mind the scale you need. Do you want to lock yourself into proprietary models? Do you want to fine-tune your models? These questions will help guide your decision.&lt;/p&gt;

&lt;p&gt;Check out this helpful guide to see how other companies are deploying LLMs: &lt;a href="https://www.zenml.io/llmops-database" rel="noopener noreferrer"&gt;LLMOps Database&lt;/a&gt;.  If you have any questions or suggestions, feel free to reach out.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Wed, 27 Nov 2024 19:07:36 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/-4ah0</link>
      <guid>https://dev.to/victor_isaac_king/-4ah0</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/victor_isaac_king" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1095389%2F227eff63-4233-4c4c-af68-b5f636faae62.jpg" alt="victor_isaac_king"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/victor_isaac_king/my-journey-through-the-2024-kaggle-x-fellowship-programme-1o1e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;My Journey Through the 2024 Kaggle X Fellowship Programme&lt;/h2&gt;
      &lt;h3&gt;Victor Isaac Oshimua ・ Nov 16 '24&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#computerscience&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#showdev&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>My Journey Through the 2024 Kaggle X Fellowship Programme</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Sat, 16 Nov 2024 13:05:48 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/my-journey-through-the-2024-kaggle-x-fellowship-programme-1o1e</link>
      <guid>https://dev.to/victor_isaac_king/my-journey-through-the-2024-kaggle-x-fellowship-programme-1o1e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The most certain way to succeed is always to try just one more time.&lt;br&gt;
 —Thomas Edison&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How it all started
&lt;/h2&gt;

&lt;p&gt;Back in 2022, I was browsing through a tech community group on WhatsApp that I had just joined, when I saw someone drop a link to apply for the Kaggle Mentorship Programme for the 2022 cohort. At the time, I was just starting to learn about data science, but I thought, "This programme is organised by Kaggle—they surely have plenty of resources to mentor me into becoming a data scientist. So, yes, let me apply."&lt;/p&gt;

&lt;p&gt;You can probably guess what happened. Well, I wasn’t selected.&lt;/p&gt;

&lt;p&gt;Fast forward to 2023, I had started taking some data science courses and building projects. The opportunity to apply for the Kaggle mentorship came up again, and once more, I applied. But again, I wasn’t selected.&lt;/p&gt;

&lt;p&gt;My thought at that point was, maybe I need to upskill more to have the right skill set to be mentored. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp541mfoge885aoc1r9i5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp541mfoge885aoc1r9i5.png" alt="Encouraging Peers" width="800" height="429"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;This was me encouraging a fellow candidate who wasn’t selected in 2023.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"You miss 100% of the shots you don’t take." This famous quote by Wayne Gretzky, a hockey player, is often a reminder for me to take action to achieve my goals.&lt;/p&gt;

&lt;p&gt;The opportunity came up again this year, 2024. I gave it another shot, and I finally got accepted! It’s a win for me, and the first thing I did was announce my success on &lt;a href="https://x.com/cyber_holics/status/1824844839583715415" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Kaggle X fellowship programme all about?
&lt;/h2&gt;

&lt;p&gt;At this point, you might be wondering what the program is all about and why I am so excited about being accepted into it and completing it.&lt;br&gt;&lt;br&gt;
The &lt;a href="https://www.kaggle.com/kagglex" rel="noopener noreferrer"&gt;KaggleX Fellowship Program&lt;/a&gt; is designed to enhance representation and create career opportunities for Black, Indigenous, and People of Color (BIPOC) in the data science industry by pairing early- to mid-career practitioners with experienced mentors. &lt;br&gt;
Participants gain hands-on experience by working on impactful data science projects for their portfolios, while benefiting from a supportive, community-driven environment that encourages personal and professional growth. &lt;/p&gt;

&lt;p&gt;So, you can see why I was persistent in getting into the program. I am aware of the lack of opportunities available to me, so I wanted to participate and learn from mentors and industry practitioners. &lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights of My Journey in the Program
&lt;/h2&gt;

&lt;p&gt;This year's cohort focused on working with &lt;a href="https://aws.amazon.com/what-is/large-language-model/#:~:text=Large%20language%20models%2C%20also%20known,decoder%20with%20self%2Dattention%20capabilities" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt;. Specifically, every participant had to &lt;a href="https://ai.google.dev/gemma/docs/lora_tuning" rel="noopener noreferrer"&gt;fine-tune&lt;/a&gt; the &lt;a href="https://www.kaggle.com/models/google/gemma/code" rel="noopener noreferrer"&gt;Google Gemma&lt;/a&gt; model for a question-and-answer task in different domain use cases. &lt;/p&gt;

&lt;p&gt;Therefore, a participant interested in healthcare, for instance, can fine-tune the Google Gemma model to answer questions related to cancer. &lt;/p&gt;

&lt;h3&gt;
  
  
  My Project
&lt;/h3&gt;

&lt;p&gt;I am interested in the application of AI/ML to solve problems in cybersecurity, and I thought of a good use case that would be applicable. I fine-tuned Gemma to act as a cybersecurity help desk chatbot.&lt;/p&gt;

&lt;p&gt;This AI-driven help desk is designed to answer cybersecurity-related questions for employees, ensuring efficient support and enhancing organizational security protocols. It provides employees with 24/7 access to cybersecurity guidance.&lt;/p&gt;

&lt;p&gt;The chatbot could answer questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should I do if I suspect a phishing attempt?&lt;/li&gt;
&lt;li&gt;How do I know if my computer is infected with malware?&lt;/li&gt;
&lt;li&gt;What should I do if I think my computer has been hacked?&lt;/li&gt;
&lt;li&gt;How can I securely back up my data?&lt;/li&gt;
&lt;li&gt;How do I know if an email attachment is safe to open?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final Outcome
&lt;/h3&gt;

&lt;p&gt;The program lasted for 15 weeks. During that time, in addition to building a chatbot, I achieved a lot that would ordinarily have taken me much longer. Below are my accomplishments from the program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built a Chatbot&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The main focus of the program was to build a portfolio piece, and I completed the development of a cybersecurity help desk chatbot. Through building it, I learned about LoRA fine-tuning of LLMs and how to generate synthetic data for fine-tuning, and I wrote a &lt;a href="https://dev.to/victor_isaac_king/how-to-generate-high-quality-synthetic-data-for-fine-tuning-large-language-models-llms-3241"&gt;blog&lt;/a&gt; about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-on-One Mentoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was assigned a one-on-one advisor &lt;a href="https://www.linkedin.com/in/samuelwaweru2001/" rel="noopener noreferrer"&gt;Samuel Waweru&lt;/a&gt;, and I gained a lot from his guidance. Through our weekly one-on-one calls, I was able to build and complete the chatbot. He often shared opportunities for me to apply for internships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networked with Industry Experts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apart from mentoring, I also had the opportunity to meet with other advisors from various industries, many of whom are experts in AI/ML from top organizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Published the First Variant of the Cybersecurity Chatbot Gemma Model on Kaggle&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kaggle allows users to publish pre-trained models to the Kaggle Model Hub. Previously, there was no variant of the Gemma model fine-tuned for cybersecurity use cases. I was able to publish the &lt;a href="https://www.kaggle.com/models?query=cybersecurity&amp;amp;task=16715" rel="noopener noreferrer"&gt;first one&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4csg653x2jl5czzx6dja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4csg653x2jl5czzx6dja.png" alt="Gemma For Cybersecurity" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Rejections are not failures; they are opportunities to refine your approach and grow. By embracing persistence and following your curiosity, you open doors to possibilities that were once out of reach.&lt;/p&gt;

&lt;p&gt;My journey into the KaggleX Fellowship Program is a testament to the power of staying committed to your goals and continuously improving yourself, even in the face of rejection. Each attempt, whether successful or not, was a stepping stone towards gaining valuable skills.&lt;/p&gt;

&lt;p&gt;Following my curiosity led me to explore the intersection of AI and cybersecurity, an area I am deeply passionate about. This led me to create something impactful—a fine-tuned chatbot that addresses real-world challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project outcome
&lt;/h3&gt;

&lt;p&gt;Here is how the Gemma model responds to cybersecurity queries before fine-tuning. Initially, it had limited knowledge in the security domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfqqziigffimfyalej2e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfqqziigffimfyalej2e.png" alt="Inference Before Fine Tuning" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After fine-tuning, however, the Gemma model gained expertise in this domain and now functions effectively as a cybersecurity help desk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy2jfnjvmwto9wrg45ni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy2jfnjvmwto9wrg45ni.png" alt="Inference After Fine Tuning" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/cyberholics/Fine-Tuned-Gemma-Model-for-Cybersecurity-Helpdesk" rel="noopener noreferrer"&gt;Link to the project&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kaggle.com/code/victorkingoshimua/fine-tune-gemma-model-in-keras-using-lora" rel="noopener noreferrer"&gt;Link to the fine-tuning notebook&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.google.com/presentation/d/1jlgoZI9xskTMd1qlLFpCvLr79boJRTh-/edit?usp=sharing&amp;amp;ouid=113365558025915886394&amp;amp;rtpof=true&amp;amp;sd=true" rel="noopener noreferrer"&gt;Link to the project presentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.dropbox.com/scl/fi/qn5ghbt4gl4y68hgmk2ht/Screen-Recording-2024-11-16-131535.mp4?rlkey=oad9o3vo24s6m2hvjydjkyg02&amp;amp;st=fy27hgx8&amp;amp;dl=0" rel="noopener noreferrer"&gt;Link to the project video&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>computerscience</category>
      <category>showdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to Generate High-Quality Synthetic Data for Fine-Tuning Large Language Models (LLMs)</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Thu, 12 Sep 2024 15:58:28 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/how-to-generate-high-quality-synthetic-data-for-fine-tuning-large-language-models-llms-3241</link>
      <guid>https://dev.to/victor_isaac_king/how-to-generate-high-quality-synthetic-data-for-fine-tuning-large-language-models-llms-3241</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Large Language Models (LLMs) are powerful tools for understanding and generating human language. However, there are specific use cases where they fall short. For instance, if you want an LLM to have a deeper understanding of neurology, fashion, sports, or security, how can you achieve that? The answer lies in fine-tuning the LLM on a dataset tailored to that specific use case.&lt;/p&gt;

&lt;p&gt;But how do you fine-tune an LLM when most publicly available datasets have already been used to train it? This is a common challenge when trying to improve an LLM by either adding to its training data or fine-tuning it for better performance in various domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution lies in generating high-quality synthetic data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this blog post, you will learn how to generate synthetic data for your specific use case in just a few minutes. So, sit back and relax for an informative read!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzm9ppkvfatu4eah5t89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzm9ppkvfatu4eah5t89.png" alt="Opinions about synthetic data generation form industry experts" width="441" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;While working on a personal project, I aimed to fine-tune a Large Language Model (LLM) for a question-and-answer task focused on serving as a cybersecurity help desk. However, I encountered a significant challenge: obtaining a high-quality cybersecurity domain-specific dataset to fine-tune the LLM.&lt;/p&gt;

&lt;p&gt;After researching, I discovered an effective solution: using generative AI to create synthetic datasets tailored to specific needs. This led me to a platform called &lt;a href="https://gretel.ai/?kw=gretel&amp;amp;cpn=17389437963&amp;amp;gad_source=1&amp;amp;gclid=CjwKCAjwooq3BhB3EiwAYqYoEiRQ51EyWt8lzv3Fw-7V-n9Th1VVkrF9elzSlHXj5H0vxCItUzu5AhoCfjsQAvD_BwE" rel="noopener noreferrer"&gt;Gretel&lt;/a&gt;, which simplifies the process of generating synthetic data, making fine-tuning LLMs for niche applications much more accessible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Exactly is Synthetic Data?
&lt;/h2&gt;

&lt;p&gt;Synthetic data refers to data that is artificially created to resemble real-world data in terms of its structure, characteristics, and patterns. &lt;/p&gt;

&lt;p&gt;Synthetic data can be generated using a variety of techniques, such as generative models like GANs (Generative Adversarial Networks), or simulations. &lt;/p&gt;

&lt;p&gt;These methods offer a flexible and cost-effective alternative to collecting real-world data, which can be time-consuming or limited in availability. &lt;br&gt;
Additionally, synthetic data can be used to address privacy concerns, as it doesn’t rely on sensitive personal information. This makes it a valuable resource for industries like healthcare, finance, and autonomous systems.&lt;/p&gt;
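&lt;p&gt;As a tiny illustration of the simulation approach, you can generate a synthetic tabular dataset with nothing more than numpy and pandas by sampling from distributions that mimic real-world patterns. The column names and distributions below are invented for the example; they are not from any real dataset:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Simulate records that mimic the shape of real data without
# containing any actual personal information
synthetic = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "annual_income": rng.lognormal(mean=10.5, sigma=0.5, size=n).round(2),
    "num_transactions": rng.poisson(lam=12, size=n),
    "is_fraud": rng.binomial(1, 0.02, size=n).astype(bool),  # rare positive class
})

print(synthetic.shape)  # (1000, 4)
```

Because every value is sampled rather than collected, you control the size, balance, and rarity of each scenario, which is exactly the flexibility the benefits below describe.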

&lt;h3&gt;
  
  
  Benefits of Synthetic Data
&lt;/h3&gt;

&lt;p&gt;Here are five benefits of synthetic data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost-effective&lt;/strong&gt;: Collecting real-world data is expensive and time-consuming. Synthetic data is a cheaper and faster alternative to generating data on demand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable&lt;/strong&gt;: You can create as much synthetic data as needed. This allows you to scale up datasets easily, which is especially helpful for training large AI models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Privacy-friendly&lt;/strong&gt;: Since synthetic data doesn’t involve real personal information, it reduces privacy risks and helps comply with data protection regulations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Diverse and balanced&lt;/strong&gt;: Synthetic data can be generated to include underrepresented or rare scenarios, helping improve the fairness and accuracy of AI models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accessible for testing&lt;/strong&gt;: It allows developers to test models in different conditions or scenarios without waiting for rare events to happen in real-world data, making the development process more efficient.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Generating Synthetic Data
&lt;/h2&gt;

&lt;p&gt;Now that you have a basic understanding of synthetic data and its benefits, let’s move on to the main topic of this blog post: how to generate it. &lt;br&gt;
This should be a straightforward process, similar to prompting your favourite AI language model to perform tasks. Just follow these steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create an account on Gretel
&lt;/h3&gt;

&lt;p&gt;Head over to &lt;a href="https://www.googleadservices.com/pagead/aclk?sa=L&amp;amp;ai=DChcSEwjlzN-RvL2IAxXhkYMHHWooPBgYABAAGgJlZg&amp;amp;co=1&amp;amp;ase=2&amp;amp;gclid=CjwKCAjwooq3BhB3EiwAYqYoEgn4k2Pi5rDs-WgnEu-22fYcxRScAB-rFJ3qmQl9n427xDtInuJbahoC4YoQAvD_BwE&amp;amp;ohost=www.google.com&amp;amp;cid=CAESVuD2FC7L6rfZZNVvwgPsR1XQZ9J2arQiLr0nlCD-WjxGyYv0ylDRSUoYa0dLmlU_45k_T9qk9k9RGvvmt4xpF_nRwG0_emW6FVbspUTO4x_tAb_1jkHE&amp;amp;sig=AOD64_3m2ac0DsaLDZErwiMQMsm20z1QeQ&amp;amp;q&amp;amp;nis=4&amp;amp;adurl&amp;amp;ved=2ahUKEwinlM-RvL2IAxVk87sIHfCaK4AQ0Qx6BAgNEAE" rel="noopener noreferrer"&gt;Gretel&lt;/a&gt; to create an account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuttdjb569dtpp1wvjlb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuttdjb569dtpp1wvjlb.png" alt="gratel platform" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Generate data from a prompt
&lt;/h3&gt;

&lt;p&gt;Once you log in to Gretel, you'll have access to the Gretel dashboard. On the dashboard, you'll find the "Prompt to Data" feature. With this, you can easily input your prompt to generate data. Remember, the better your prompt, the better the quality of your data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd31wsenagfp9x44nte8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd31wsenagfp9x44nte8z.png" alt="gratel prompt to video feature" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this tutorial (and for the project I was building when I ran into the data challenge), I will create cybersecurity-related data. This data will help me fine-tune a language model to answer questions just like a cybersecurity help desk would. &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Enter prompt to create data
&lt;/h3&gt;

&lt;p&gt;Selecting Gretel's "Prompt to Data" feature will take you to the navigator. There, you can generate an API key with a single click, which will allow you to start creating datasets.&lt;/p&gt;

&lt;p&gt;In the navigator, enter a prompt to instruct Gretel’s language model to generate a dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxagqta40ujrddh34uee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxagqta40ujrddh34uee.png" alt="Gratel navigator" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the prompt I used.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Generate a dataset of cybersecurity-related questions and their corresponding answers, organized by category and difficulty level.&lt;br&gt;
For each question:&lt;br&gt;
Provide a realistic, common cybersecurity question employees or IT staff might ask.&lt;br&gt;
Give a detailed, accurate answer that reflects best cybersecurity practices.&lt;br&gt;
Assign a category from the list of topics provided.&lt;br&gt;
Indicate the appropriate difficulty level based on how technical the question is.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here is the result: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2waaqi1gfzu8fxdtm4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2waaqi1gfzu8fxdtm4g.png" alt="Result" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, this will generate a dataset with 50 rows for you to review. To create a larger dataset, such as 1,000 rows, click on the "Batch Data" button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh21g63ntpf0vsrqoawb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh21g63ntpf0vsrqoawb.png" alt="Generate batch data" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that's it! In just a few minutes, you've created a high-quality dataset tailored to your specific use case.&lt;/p&gt;
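&lt;p&gt;Once you export the generated dataset (assuming you download it as a CSV), turning it into fine-tuning examples is a short pandas exercise. The column names below match the prompt above but are assumptions; adjust them to whatever your export actually contains:&lt;/p&gt;

```python
import io
import pandas as pd

# Stand-in for the exported file; in practice: pd.read_csv("synthetic_data.csv")
csv_data = io.StringIO(
    "question,answer,category,difficulty\n"
    "What is phishing?,Phishing is a social-engineering attack that tricks users into revealing credentials.,Email Security,Beginner\n"
    "How do I report a suspicious email?,Forward it to the security team and delete it without clicking any links.,Incident Response,Beginner\n"
)
df = pd.read_csv(csv_data)

# Format each row as an instruction/response pair for fine-tuning
examples = [
    f"Instruction:\n{row.question}\n\nResponse:\n{row.answer}"
    for row in df.itertuples()
]

print(len(examples))  # 2
```

Each string in `examples` is one training example; most fine-tuning frameworks accept data in some variation of this instruction/response layout.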

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Creating powerful generative AI applications requires high-quality training data, which can be challenging to obtain. In this article, we've explored how synthetic data can address this issue and how &lt;a href="https://www.googleadservices.com/pagead/aclk?sa=L&amp;amp;ai=DChcSEwim_OGKyr2IAxXPsIMHHWR9NZoYABAAGgJlZg&amp;amp;co=1&amp;amp;ase=2&amp;amp;gclid=CjwKCAjwooq3BhB3EiwAYqYoEiq_Cn_DNeTOedmcLnuasodt4c-X6-oRJCqT3Z4eDDAuj8bxvJaQ3hoCEskQAvD_BwE&amp;amp;ohost=www.google.com&amp;amp;cid=CAESVuD2ySgMDhpNUwBguPdIXC4Vw53djB9vGqEwiSoMMVNml4usxRZWTemTjuVe45mFNwd5LxgO9hPeg1-ga1cNceP9e0pp8-ttt8G_axOSa3fbT8E2E9Qe&amp;amp;sig=AOD64_37ne6Ye8DwfZfeyzLjjPHbZKe0zA&amp;amp;q&amp;amp;nis=4&amp;amp;adurl&amp;amp;ved=2ahUKEwiFpNyKyr2IAxUR8rsIHSrbOTAQ0Qx6BAgYEAE" rel="noopener noreferrer"&gt;Gretel&lt;/a&gt; can help generate high-quality synthetic data efficiently.&lt;/p&gt;

&lt;p&gt;Gretel simplifies the process, making it quick and easy to produce quality data. I highly recommend trying it for your next ML or LLM project.&lt;/p&gt;

&lt;p&gt;If you have any questions or suggestions, feel free to reach out to me on &lt;a href="https://www.linkedin.com/in/victor-oshimua-4b2945214/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/cyber_holics" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;. Happy developing!&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Resources
&lt;/h3&gt;

&lt;p&gt;Link to the generated data: &lt;a href="https://www.kaggle.com/datasets/victorkingoshimua/cybersecurity-help-desk" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/victorkingoshimua/cybersecurity-help-desk&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>datascience</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to Deploy Segment Anything Model 2 (SAM 2) With Modelbit</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Thu, 29 Aug 2024 22:54:32 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/how-to-deploy-segment-anything-model-2-sam-2-with-modelbit-47ni</link>
      <guid>https://dev.to/victor_isaac_king/how-to-deploy-segment-anything-model-2-sam-2-with-modelbit-47ni</guid>
      <description>&lt;p&gt;Building on the success of the &lt;a href="https://ai.meta.com/sam2/" rel="noopener noreferrer"&gt;Segment Anything Model&lt;/a&gt;  (SAM), Meta has released an upgraded version called the  Segment Anything Model 2 (SAM 2). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://ai.meta.com/blog/segment-anything-2/" rel="noopener noreferrer"&gt;SAM 2  &lt;/a&gt; is a computer vision model designed to identify and separate objects in images or videos quickly. It operates in real-time and can be "prompted" to focus on specific objects, making it highly effective and advanced at recognizing and isolating objects from their backgrounds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8oier2100219hm3ow7ty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8oier2100219hm3ow7ty.png" alt="Image From Meta" width="512" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how to deploy the SAM 2 model to a REST API using Modelbit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To get the most out of this article, follow these steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Access the SAM 2 Model
&lt;/h3&gt;

&lt;p&gt;Start by downloading the SAM 2 model from the official Meta AI repository. Open your command line interface and run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/facebookresearch/segment-anything-2.git
cd segment-anything-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -e .

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, download the model checkpoints by navigating to the checkpoints directory and running the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd checkpoints
./download_ckpts.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more detailed installation instructions, refer to the &lt;a href="https://github.com/facebookresearch/segment-anything-2?tab=readme-ov-file" rel="noopener noreferrer"&gt;SAM 2 GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Set Up Modelbit
&lt;/h3&gt;

&lt;p&gt;To deploy the SAM 2 model, you'll need a Modelbit account. Head over to the &lt;a href="https://modelbit.com" rel="noopener noreferrer"&gt;Modelbit website&lt;/a&gt; and sign up.&lt;br&gt;
Once registered, install the Modelbit Python library by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install --upgrade modelbit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will allow you to interact with Modelbit and deploy your SAM 2 model as a REST API endpoint.&lt;/p&gt;

&lt;h1&gt;
  
  
  Overview of SAM 2
&lt;/h1&gt;

&lt;p&gt;The &lt;a href="https://github.com/facebookresearch/segment-anything-2" rel="noopener noreferrer"&gt;SAM 2 model&lt;/a&gt;, an advanced iteration of Meta AI's Segment Anything Model, significantly enhances image and video segmentation. It is engineered to deliver rapid and precise segmentation, making it six times faster than its predecessor.&lt;/p&gt;

&lt;p&gt;SAM 2's core features include its ability to handle real-time video segmentation and its superior accuracy across complex and diverse scenarios. &lt;/p&gt;

&lt;p&gt;Built on an extensive dataset of over 50,000 videos and millions of segmentation masks, SAM 2 can segment objects in both images and videos with exceptional detail. These capabilities make it ideal for applications in augmented reality, autonomous driving, environmental monitoring, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features and Enhancements of SAM 2
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Memory Mechanism:&lt;/strong&gt; Incorporates a memory encoder, memory bank, and memory attention module to store and use object information, enhancing user interaction throughout the video.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming Architecture:&lt;/strong&gt; Processes video frames sequentially, enabling real-time segmentation of long videos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Image Segmentation:&lt;/strong&gt; Offers superior performance in image segmentation compared to the original SAM, with exceptional capabilities in video tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple Mask Prediction:&lt;/strong&gt; Provides several potential segmentation masks when faced with uncertain image or video data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Occlusion Prediction:&lt;/strong&gt; Enhances the model’s ability to handle objects that are temporarily obscured or leave the frame.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Video Segmentation:&lt;/strong&gt; Tracks objects across all video frames, effectively managing occlusion.&lt;/p&gt;
&lt;h3&gt;
  
  
  SAM 2 in Action
&lt;/h3&gt;

&lt;p&gt;You can easily test and use SAM 2 through the Web UI provided by Meta. To get started, visit SAM 2 &lt;a href="https://sam2.metademolab.com/" rel="noopener noreferrer"&gt;Web UI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frq5qq45ztymejsfoavxo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frq5qq45ztymejsfoavxo.gif" alt="SAM 2 WEB UI" width="512" height="202"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Working With SAM 2
&lt;/h1&gt;

&lt;p&gt;Now that you have an understanding of SAM 2's capabilities, it's time to put it into action programmatically. Getting started with SAM 2 is straightforward. In this tutorial, we'll use SAM 2 to generate segmentation masks for an image.&lt;/p&gt;

&lt;p&gt;In the context of image segmentation, a &lt;strong&gt;mask&lt;/strong&gt; is typically a binary or multi-class image that matches the size of the input image. Each pixel in the mask is labeled or assigned a value indicating whether it belongs to a specific object or region of interest. &lt;/p&gt;
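&lt;p&gt;To make this concrete, here is a toy numpy example of a binary mask, independent of SAM 2 itself: a small "image" where the mask marks a square region of interest, which you can then use to isolate those pixels:&lt;/p&gt;

```python
import numpy as np

# A toy 6x6 grayscale "image"
image = np.arange(36).reshape(6, 6)

# A binary mask of the same size: True marks the object, False the background
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True  # the "object" occupies a 2x2 square

# Use the mask to pull out only the object's pixels
object_pixels = image[mask]
print(object_pixels)  # [14 15 20 21]

# Or blank out the background, keeping only the object visible
isolated = np.where(mask, image, 0)
```

SAM 2's masks work the same way, just at image resolution and with one mask per detected object.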

&lt;p&gt;When you feed an image into the SAM 2 model, the mask generator will output an image where different objects—such as cars, people, or animals—are highlighted with distinct colours or binary values. &lt;br&gt;
This capability is important for various real-world applications, including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autonomous driving:&lt;/strong&gt; Helping vehicles recognize and differentiate between roads, pedestrians, other vehicles, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medical imaging:&lt;/strong&gt; Allowing for the segmentation of different tissues, organs, or abnormalities within an image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image editing:&lt;/strong&gt; Facilitating the isolation of specific objects from their background for easier manipulation.&lt;/p&gt;

&lt;p&gt;Let's dive into how to get SAM 2 up and running.&lt;/p&gt;

&lt;p&gt;Make sure you’ve downloaded SAM 2 in your development environment. If not, refer back to the prerequisites. &lt;br&gt;
Next, check if a CUDA-compatible GPU is available on the system and optimize the execution of the PyTorch model accordingly by running the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Run GPU inference in bfloat16 and enable TF32 where supported
if torch.cuda.is_available():
    # Use bfloat16 autocast for the rest of the session
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
    # Turn on TF32 matmuls on Ampere (compute capability 8.0+) GPUs
    if torch.cuda.get_device_properties(0).major &amp;gt;= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
else:
    print("CUDA is not available. Running on CPU.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create a function called &lt;code&gt;display_image_with_annotations&lt;/code&gt;  to visually represent segmentation masks on an image. This function will overlay the masks with random colours and can optionally draw borders around the segmented regions, enhancing visibility and differentiation between various segments in the image. Below is the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def display_image_with_annotations(image, annotations, show_borders=True):
    """
    Display an image with annotations overlaid.

    Parameters:
    image (numpy array): The image to display.
    annotations (list): Segmentation masks produced by the mask generator.
    show_borders (bool): If True, borders around each annotation will be drawn.

    Returns:
    None
    """

    def display_annotations(annotations, show_borders=True):
        """
        Helper function to display annotations on an image.

        Parameters:
        annotations (list): Segmentation masks to draw.
        show_borders (bool): If True, borders around each annotation will be drawn.

        Returns:
        None
        """

        # Return immediately if there are no annotations to display
        if len(annotations) == 0:
            return

        # Sort annotations by area in descending order
        sorted_annotations = sorted(annotations, key=lambda x: x['area'], reverse=True)

        # Get the current axis for plotting
        axis = plt.gca()
        axis.set_autoscale_on(False)

        # Create an empty image with an alpha channel (RGBA) to hold the annotations
        overlay_img = np.ones((sorted_annotations[0]['segmentation'].shape[0],
                            sorted_annotations[0]['segmentation'].shape[1], 4))
        overlay_img[:,:,3] = 0  # Set alpha channel to 0 (transparent)

        # Iterate through each annotation and overlay it on the image
        for annotation in sorted_annotations:
            mask = annotation['segmentation']  # Get the segmentation mask
            # Generate a random color for the mask with 50% opacity
            mask_color = np.concatenate([np.random.random(3), [0.5]])  
            overlay_img[mask] = mask_color  # Apply the mask color to the overlay image

            # If borders are enabled, draw borders around each mask
            if show_borders:
                # Find contours of the mask
                contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
                # Smooth the contours slightly
                contours = [cv2.approxPolyDP(contour, epsilon=0.01, closed=True) for contour in contours]
                # Draw the contours with a specified color and thickness
                cv2.drawContours(overlay_img, contours, -1, (0, 0, 1, 0.4), thickness=1)

        # Display the annotated image
        axis.imshow(overlay_img)

    # Set up the plot with a large figure size to ensure detailed visualization.
    plt.figure(figsize=(20, 20))

    # Display the image that you want to annotate.
    plt.imshow(image)

    # Call the helper function to display annotations on the image.
    display_annotations(annotations, show_borders)

    # Remove the axis labels and ticks for a cleaner display.
    plt.axis('off')

    # Render and display the final image with the annotations.
    plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test this model, we need an image. For this tutorial, let's use a free image from Unsplash. You can download the image using the following link: &lt;a href="https://unsplash.com/photos/a-man-standing-on-a-rock-near-the-ocean-0Q6Z7hPK5b0" rel="noopener noreferrer"&gt;Download the image from Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make sure to download the image to your local environment for the demonstration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsyv227hj57kzwsfeel7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsyv227hj57kzwsfeel7.png" alt="Image From Unsplash" width="345" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Load the image in your notebook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;image = Image.open(Image Path)
image = np.array(image.convert("RGB")) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, initialize the SAM 2 model for the image segmentation task by running this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

# Specify the path to the model checkpoint.
# This checkpoint contains the pre-trained weights for the SAM 2 model.
checkpoint_path = "checkpoints/sam2_hiera_base_plus.pt"  # replace with your checkpoint path

# Specify the configuration file for the model.
# This YAML file contains the architecture and hyperparameters used to define the SAM 2 model.
model_config = "sam2_hiera_b+.yaml"

# Build the SAM 2 model using the configuration file and checkpoint.
# The model is loaded onto the device detected earlier (GPU if available, otherwise CPU).
# Post-processing is disabled (apply_postprocessing=False) to keep raw outputs.
sam2_model = build_sam2(model_config, checkpoint_path, device=device, apply_postprocessing=False)

# Initialize the automatic mask generator using the SAM 2 model.
# This will generate segmentation masks automatically based on the input data.
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)
masks = mask_generator.generate(image)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, call the &lt;code&gt;display_image_with_annotations&lt;/code&gt; function to show the segmentation mask on the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;display_image_with_annotations(image,masks) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the result: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fshovmlagtvd7yx2979.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fshovmlagtvd7yx2979.png" alt="Image Mask" width="511" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that the model accurately segments each region of the image, highlighting different sections with precision. You can repeat this for different images and see how powerful SAM 2 is. &lt;/p&gt;

&lt;h1&gt;
  
  
  Deploying SAM 2 Model With Modelbit
&lt;/h1&gt;

&lt;p&gt;The true value of an AI model is only realized when it is made available to end users, typically through deployment in a production environment. One effective method for achieving this is by deploying the model as a REST API. Modelbit offers a straightforward approach for rapidly deploying your AI models. You can learn more about this solution at &lt;a href="https://www.modelbit.com/" rel="noopener noreferrer"&gt;Modelbit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To begin deploying the SAM 2 model, import the Modelbit package and log in with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import modelbit
mb = modelbit.login()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's deploy using Modelbit's Python method. Remember, we already have a function, &lt;code&gt;display_image_with_annotations&lt;/code&gt;, to mask an image with SAM 2. Here’s how to do it:&lt;/p&gt;

&lt;p&gt;Modelbit will manage all dependencies for you, including any other Python functions and variables that the function depends on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mb.deploy(display_image_with_annotations)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8tf9z2jf0uacrpj47k2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8tf9z2jf0uacrpj47k2.png" alt="Deployment Confirmation" width="512" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Accessing the Model
&lt;/h1&gt;

&lt;p&gt;The model has been successfully deployed as a REST API endpoint using Modelbit. You can access the model easily via various methods, including curl or Python. Once accessed, you can integrate the API endpoint into your applications to make inferences effortlessly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc8r27y5jpfigzffjowl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc8r27y5jpfigzffjowl.png" alt="Api Endpoint" width="512" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is an example using Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;modelbit.get_inference(
  workspace="victorkingoshimua",
  deployment="display_image_with_annotations",
  data=[image, masks]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Deploying a model as a REST API endpoint using Modelbit simplifies the process of integrating advanced functionality into your applications. With easy access through tools like curl or Python, you can incorporate the model into your workflows, enabling efficient and scalable inferences. &lt;/p&gt;

&lt;p&gt;In this article, you’ve learned how to effortlessly deploy one of the latest and most advanced AI models as a REST API. Whether you're working with image recognition, natural language processing, or any other AI domain, the ease of integration provided by Modelbit can help you bring sophisticated AI features to your projects with minimal effort. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Exploring Get Pieces for Developers: A Personal Review</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Sat, 03 Aug 2024 18:15:03 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/exploring-get-pieces-for-developers-a-personal-review-52ek</link>
      <guid>https://dev.to/victor_isaac_king/exploring-get-pieces-for-developers-a-personal-review-52ek</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Since the emergence of generative AI, completing tasks has never been easier. Whether it's coding, writing, researching, or even studying, generative AI enables people to accomplish in minutes what used to take hours or even days.  &lt;/p&gt;

&lt;p&gt;I am a machine learning engineer and a technical writer. My day-to-day involves developing and documenting machine learning projects. I primarily write code in Python, often using Jupyter notebooks for experimentation and development. For documentation, I frequently use Google Colab.&lt;/p&gt;

&lt;p&gt;My workflow has been consistent since early 2021, before tools like ChatGPT emerged and became widely adopted. Initially hesitant to try AI tools, I soon discovered how they could significantly streamline my tasks, from coding to writing.&lt;/p&gt;

&lt;p&gt;Here’s a snapshot of my typical work environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Operating System:&lt;/strong&gt; I develop on both Mac and Linux.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; I regularly use Chrome, Visual Studio Code, Slack, and ChatGPT.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Projects:&lt;/strong&gt; My projects often involve developing machine learning models, creating technical content, and contributing to open-source documentation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experience:&lt;/strong&gt; I have been in the field for over three years and have been actively engaged in technical writing and project development throughout my academic and professional journey.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Integrating AI into my workflow has made my tasks more manageable and efficient, allowing me to focus on more complex and creative aspects of my work. &lt;/p&gt;

&lt;h3&gt;
  
  
  Previous Methods of Tracking Workstream Materials
&lt;/h3&gt;

&lt;p&gt;Before coming across &lt;a href="https://pieces.app/" rel="noopener noreferrer"&gt;Get Pieces&lt;/a&gt;, I used to keep track of small workstream materials like code snippets in a mix of different places. I often saved code snippets and error stacks in text files or sticky notes on my computer. For things like links and screenshots, I used browser bookmarks and a folder on my desktop.&lt;/p&gt;

&lt;p&gt;This process was quite unorganised. It was hard to find specific pieces of information when I needed them, and I often spent a lot of time searching through different files and notes. There was no central place to keep everything together, which made it inefficient and time-consuming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I Chose Pieces and Its Productivity Benefits
&lt;/h3&gt;

&lt;p&gt;What initially interested me about Pieces was the need for a centralised location to organise and manage the various types of information I work with. As a machine learning engineer and technical writer, I often multitask and handle a lot of information at once, such as code snippets, links, screenshots, and error stacks. Previously, I would save these in different places or bookmark links as I went along, which was disorganised and inefficient.&lt;/p&gt;

&lt;p&gt;When I came across Pieces, I saw an opportunity to streamline my workflow by having a single tool for all my storage needs. With Pieces, I can keep everything—whether it’s code, screenshots, or links—in one central place. Additionally, the built-in chatbot feature allows me to quickly learn more about the information I've saved, making the process even more productive and seamless. &lt;/p&gt;

&lt;h3&gt;
  
  
  Why I Reviewed Pieces and How It Can Help Others
&lt;/h3&gt;

&lt;p&gt;I thought it would be useful to review Pieces because it solves common problems with organising information. Many people, including my peers, struggle with keeping track of code, links, and notes. By sharing my review, I hope to show how Pieces can make their work easier and more organised. It’s a tool that can help save time and reduce hassle, which could be really helpful for anyone managing a lot of different information.  &lt;/p&gt;

&lt;p&gt;In this blog, we'll explore the strengths, weaknesses, and challenges of Get Pieces, and examine how it affects machine learning engineers like myself and impacts our productivity and ongoing projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Areas for Improvement
&lt;/h2&gt;

&lt;p&gt;Just like any other development tool out there, Get Pieces has areas that need improvement and may not fit everyone's needs perfectly. As a user, I have experienced some of these issues firsthand and would like to highlight these areas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Feature Overload&lt;/strong&gt;: Get Pieces comes with a multitude of features that can be very helpful. However, having too many features can sometimes overwhelm users, making the tool less user-friendly. This is especially true for new users, who might need extra time to learn how to use all the features effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tbxpu9oron3jx29z41f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tbxpu9oron3jx29z41f.gif" alt="Feature Overload" width="600" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2&lt;/strong&gt;. &lt;strong&gt;Resource Intensive&lt;/strong&gt;: Pieces OS enables the running of Large Language Models (LLMs) locally on your machine, ensuring that all operations, processing, and data handling are performed on your own device without relying on external servers. While this enhances privacy and security, it comes with a significant downside: resource intensity.&lt;/p&gt;

&lt;p&gt;Running LLMs locally can be extremely demanding on your computer's resources, requiring substantial CPU, GPU, and memory capacity. This high demand can lead to system freezes, where your machine becomes unresponsive, and other tasks slow down considerably. This issue is especially pronounced on lower-end or older hardware, making it less feasible for those without high-performance systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Potential security and privacy concerns:&lt;/strong&gt; The context-aware feature of Pieces provides substantial productivity enhancements, but it's important to be mindful of privacy and security concerns to safeguard sensitive information. Although Pieces is not trained on users' data, as a writer, I might have concerns when documenting sensitive APIs. Ensuring that this data is securely managed is essential to prevent unauthorised access and potential breaches.&lt;/p&gt;


&lt;h2&gt;
  
  
  Strengths and Highlights
&lt;/h2&gt;

&lt;p&gt;After using Get Pieces for a considerable amount of time, I have experienced significant positive impacts on my workflow. Here are some key areas:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;a href="https://docs.pieces.app/resources/live-context" rel="noopener noreferrer"&gt;Live context feature&lt;/a&gt;:&lt;/strong&gt; As someone who regularly engages in writing, including coding, documentation, and blogging, I often encounter errors. These errors, which are typically due to incorrect inputs, can be disruptive to my workflow. I have always wanted a tool that could help me identify and correct these mistakes in real-time. Get Pieces has proven invaluable in this regard.&lt;/p&gt;

&lt;p&gt;One notable instance was while working on a blog using an online editor. The Get Pieces browser extension seamlessly integrated into my workflow, allowing me to spot and correct errors in my Markdown text before publishing. This feature significantly enhanced my productivity and the quality of my work.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2ixxkhvws8xjbx2v8qq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2ixxkhvws8xjbx2v8qq.png" alt="Live context feature" width="292" height="512"&gt;&lt;/a&gt;   &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Multi-Modal Capability:&lt;/strong&gt; One of the standout features of Get Pieces is its multi-modal capability, which allows users to work with different types of input data without incurring additional costs. This is a significant advantage over other chat copilots like ChatGPT, where certain features, such as image processing, require a subscription.&lt;/p&gt;

&lt;p&gt;For instance, with Get Pieces, I can easily upload a screenshot of code and obtain detailed information about it without needing to pay for this feature. This capability has been a major boost to my productivity, allowing me to work more efficiently and effectively without worrying about additional tool costs. Overall, Get Pieces provides a cost-effective solution for handling diverse input data, making it an invaluable asset for my workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgs7rw5w2qdwifttibus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgs7rw5w2qdwifttibus.png" alt="MultiModal capability" width="512" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Universal Model Access:&lt;/strong&gt; Another feature that stands out on Pieces is Universal Model Access. This is a game-changer for users seeking flexibility and variety in AI tools. This feature consolidates access to a diverse range of language models, including premium options like GPT-4, PaLM 2, and Anthropic's Claude, all within a single platform. It eliminates the need for multiple subscriptions or separate interfaces, allowing users to experiment with and leverage various models without additional costs. The ease of switching between different models and integrating them into various workflows makes Pieces a powerful tool for enhancing productivity and creativity. Whether for coding, content generation, or data analysis, this feature streamlines access to cutting-edge AI capabilities, providing unparalleled convenience and versatility. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wsvvjz2b9q7r1sukr5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wsvvjz2b9q7r1sukr5p.png" alt="Universal Model Access" width="512" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moreover, these models are typically available as cloud-based options, but that's not all. Users also have the flexibility to utilise similar powerful models directly on their local devices. This dual availability ensures that users can access high-performance AI tools both online and offline, enhancing versatility and convenience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej51ybq3yu05o74qrd8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej51ybq3yu05o74qrd8a.png" alt="Universal Model Access" width="512" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Ask Copilot:&lt;/strong&gt; As a developer with experience using various copilots, I find that Pieces Copilot truly stands out. The Pieces &lt;a href="https://docs.pieces.app/extensions-plugins/vscode#pieces-copilot" rel="noopener noreferrer"&gt;Ask Copilot&lt;/a&gt; feature is particularly impressive, offering insightful information about files, snippets, or terminal outputs. For instance, when encountering an error, you can simply highlight the error message and, with one click, ask the Copilot for details about the error. Fascinating, right? Normally, you would copy the error message and paste it into your favourite AI chatbot, like ChatGPT. However, with Pieces Copilot, you can resolve bugs faster without leaving your IDE, making your development process more efficient and seamless.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyfklukxa71pbhcubbc4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyfklukxa71pbhcubbc4.gif" alt="Ask Copilot" width="512" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Pieces for Developers is undoubtedly a must-have tool for anyone looking to streamline their workflow. With everything you need in one place, it simplifies and enhances the development process. As a machine learning engineer, I find myself using this tool consistently and can’t imagine working without it. Whether you're a student, intern, or experienced developer, I strongly urge you to adopt this innovative tool.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Off-Chain Data Storage Significance in Blockchain</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Fri, 07 Jun 2024 09:10:26 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/off-chain-data-storage-significance-in-blockchain-cd0</link>
      <guid>https://dev.to/victor_isaac_king/off-chain-data-storage-significance-in-blockchain-cd0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You can’t solve a problem on the same level that it was created. You have to rise above it to the next level.&lt;br&gt;
 -Albert Einstein&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Blockchain data storage and security change how information is stored, accessed, and safeguarded. Fundamentally, blockchain technology provides a decentralised, tamper-proof ledger that offers a transparent and secure approach to storing data.&lt;/p&gt;

&lt;p&gt;However, if everyone were to store all their data directly on the blockchain, it would cause the chain to expand enormously, making it challenging for individuals to maintain complete copies, which is crucial for the chain's integrity. Additionally, this extensive data storage would hinder the speed and efficiency of the chain's operations.&lt;/p&gt;

&lt;p&gt;To address these challenges, blockchain systems often use off-chain data storage mechanisms. Off-chain data can include less critical or large data sets that are stored outside the main blockchain but can be referenced or verified through cryptographic proofs when needed. This approach is a solution to data storage problems in Web3.&lt;br&gt;
In this article, you will learn about off-chain data storage and its significance in blockchain application development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of blockchain technology
&lt;/h2&gt;

&lt;p&gt;A blockchain is a distributed ledger that is used by a network of related users. These users can come from various industries or sectors, such as aerospace companies, educational institutions, logistics firms, or even diverse groups like renewable energy providers.  Each participant in the network typically maintains a full copy of the distributed ledger; this ensures transparency, integrity, and the ability to verify and validate transactions across the blockchain network. &lt;/p&gt;

&lt;p&gt;Blockchain is considered to be secure due to its immutable nature. Once a block is added to the blockchain, previous blocks cannot be altered. Attempting to change data would require recomputing the hash of every subsequent block, and the network would reject the altered copy because it breaks the chain of hash links to the previous blocks.&lt;/p&gt;
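&lt;p&gt;The hash-linking that makes on-chain data tamper-evident can be illustrated with a short, self-contained Python sketch (a toy model for illustration only, not a real blockchain implementation):&lt;/p&gt;

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's full contents, including the previous block's hash.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(data, prev_hash):
    return {"data": data, "prev_hash": prev_hash}

def chain_is_valid(chain):
    # Each block must reference the current hash of the block before it.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

# Build a three-block chain.
genesis = make_block("genesis", "0" * 64)
block2 = make_block("Alice pays Bob 5", block_hash(genesis))
block3 = make_block("Bob pays Carol 2", block_hash(block2))
chain = [genesis, block2, block3]

print(chain_is_valid(chain))  # True

# Tampering with an earlier block invalidates every later hash link.
genesis["data"] = "genesis (tampered)"
print(chain_is_valid(chain))  # False
```

&lt;p&gt;Changing a single byte in an old block changes its hash, so every later block's stored reference no longer matches, and the rest of the network rejects the altered copy.&lt;/p&gt;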

&lt;h2&gt;
  
  
  Importance of data storage for blockchain applications
&lt;/h2&gt;

&lt;p&gt;Data storage is a key part of building blockchain technology. Here are some reasons why storing data matters in blockchain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Integrity:&lt;/strong&gt; Storing data ensures that information remains intact and unaltered over time. This maintains the accuracy and reliability of records on the blockchain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Historical Reference:&lt;/strong&gt; Stored data provides a historical reference point that allows users to track past transactions, changes, and events on the blockchain. This historical data is valuable for auditing, and decision-making.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analytics and Insights:&lt;/strong&gt; Stored data serves as a valuable resource for analytics, providing insights into user behaviour, market trends, and performance metrics. Analysing data helps improve decision-making and optimise business processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Innovation and Development:&lt;/strong&gt; Data storage fuels innovation and development in blockchain technology. By collecting data, developers can create new features, improve user experiences, and drive continuous improvement in blockchain applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  On-Chain vs Off-Chain Data Storage
&lt;/h2&gt;

&lt;p&gt;With the continued improvement in blockchain technologies, choosing the right data storage methods has become an important topic of discussion. Let us see the difference between on-chain and off-chain data storage and also understand which storage method should be adopted.&lt;/p&gt;

&lt;h3&gt;
  
  
  On-chain data storage
&lt;/h3&gt;

&lt;p&gt;On-chain storage is the most widely used data storage method in blockchain technology. It means saving data directly on the blockchain, where it becomes part of the ledger and can be accessed by all network participants. This method offers multiple benefits, such as transparency and immutability. Because the data is stored on the blockchain, it is openly available and resistant to tampering. &lt;/p&gt;

&lt;p&gt;One example of on-chain data storage is storing transaction details in a blockchain-based cryptocurrency network. In systems like Bitcoin or Ethereum, each transaction, including sender and recipient addresses, transaction amounts, and timestamps, is stored directly on the blockchain. This on-chain data is publicly accessible and immutable, providing a transparent and secure record of all transactions that have occurred on the network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Off-chain data storage
&lt;/h3&gt;

&lt;p&gt;Off-chain storage involves storing data externally from the blockchain using methods like centralised databases or other decentralised storage options. This method is often chosen when data is too large or intricate to be stored directly on the blockchain. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's a comparison table between on-chain and off-chain data storage:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;On-Chain Data Storage&lt;/th&gt;
&lt;th&gt;Off-Chain Data Storage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data Location&lt;/td&gt;
&lt;td&gt;Stored directly on the blockchain's ledger.&lt;/td&gt;
&lt;td&gt;Stored outside the blockchain, in centralised or decentralised systems like databases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessibility&lt;/td&gt;
&lt;td&gt;Publicly accessible to all participants in the network.&lt;/td&gt;
&lt;td&gt;Accessible based on permissions and protocols set by the storage system.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Immutability&lt;/td&gt;
&lt;td&gt;Immutable once recorded on the blockchain.&lt;/td&gt;
&lt;td&gt;May have varying degrees of immutability depending on the storage system's design.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Limited scalability due to blockchain constraints.&lt;/td&gt;
&lt;td&gt;More scalable for large or complex data sets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transaction Costs&lt;/td&gt;
&lt;td&gt;Typically incur transaction fees for data storage.&lt;/td&gt;
&lt;td&gt;Costs may vary depending on the off-chain storage solution used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;Slower due to blockchain consensus mechanisms.&lt;/td&gt;
&lt;td&gt;Faster access and retrieval compared to on-chain storage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;High level of security and trust due to blockchain's cryptographic features.&lt;/td&gt;
&lt;td&gt;Security depends on the chosen off-chain storage solution and its implementation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use Cases&lt;/td&gt;
&lt;td&gt;Best suited for critical and immutable data, such as financial transactions.&lt;/td&gt;
&lt;td&gt;Suitable for storing large files, sensitive information, or data that doesn't require blockchain-level security.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Data storage challenges in blockchain
&lt;/h2&gt;

&lt;p&gt;Blockchain networks face several challenges related to data storage, particularly in terms of scalability and other issues. Here's how off-chain data storage addresses these concerns:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Scalability Challenges
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue:&lt;/strong&gt; Limited transaction throughput, block size, and processing speed lead to congestion and higher fees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Off-Chain Solution:&lt;/strong&gt; Off-chain data storage reduces the burden on the main blockchain, improving transaction throughput, reducing congestion, and enhancing network performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Performance Issues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue:&lt;/strong&gt; Slow transaction confirmations and high latency impacting user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Off-Chain Solution:&lt;/strong&gt; Off-chain storage improves transaction processing speed, reduces confirmation times, and enhances network performance for time-sensitive applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Data Storage Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue:&lt;/strong&gt; On-chain storage limitations for large files, complex data structures, or frequent data updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Off-Chain Solution:&lt;/strong&gt; Off-chain storage solutions such as decentralised storage networks offer scalable and cost-effective solutions for managing diverse data types and frequent updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Privacy and Security Concerns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue:&lt;/strong&gt; Public blockchains compromise privacy as transaction details are visible to all participants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Off-Chain Solution:&lt;/strong&gt; Off-chain storage enhances privacy and security through encryption, zero-knowledge proofs, and private databases, ensuring data confidentiality while maintaining integrity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Off-chain data storage real-world use case
&lt;/h2&gt;

&lt;p&gt;We've discussed what off-chain data storage is and also understood the advantages it brings to blockchain applications. Now, let's explore a real-world example of an off-chain data storage solution and see how it brings these advantages to life.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bubble Protocol off-chain data storage
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://bubbleprotocol.com/" rel="noopener noreferrer"&gt;Bubble Protocol&lt;/a&gt; is an off-chain data storage solution integrated with on-chain permissions. This means that users can manage who has access to view or modify the data stored off-chain by leveraging the blockchain's inherent security and permission mechanisms.&lt;/p&gt;

&lt;p&gt;Data storage with Bubble Protocol is powered by bubbles: a bubble serves as a protected storage unit housing files and folders, and its access rights are overseen by a smart contract. When a request for content is made from the bubble, the protocol verifies the access rights through the smart contract, allowing the request only if the requester possesses the correct permissions. Follow this &lt;a href="https://bubbleprotocol.com/how-it-works.html" rel="noopener noreferrer"&gt;link&lt;/a&gt; to learn more about how Bubble Protocol works.&lt;/p&gt;
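&lt;p&gt;Conceptually, the contract-gated access check works like the following Python sketch. All names and structure here are hypothetical, invented for illustration; this is not Bubble Protocol's actual API:&lt;/p&gt;

```python
# Hypothetical sketch of smart-contract-gated off-chain access control.
# Class and method names are illustrative only, not Bubble Protocol's real API.

class AccessContract:
    """Stands in for the on-chain smart contract that records permissions."""
    def __init__(self):
        self._permissions = {}  # (address, path) -> set of rights

    def grant(self, address, path, rights):
        self._permissions.setdefault((address, path), set()).update(rights)

    def can(self, address, path, right):
        return right in self._permissions.get((address, path), set())

class Bubble:
    """Stands in for the off-chain storage unit guarded by the contract."""
    def __init__(self, contract):
        self.contract = contract
        self._files = {}

    def write(self, address, path, content):
        # Every request is checked against the contract before touching storage.
        if not self.contract.can(address, path, "write"):
            raise PermissionError(f"{address} may not write {path}")
        self._files[path] = content

    def read(self, address, path):
        if not self.contract.can(address, path, "read"):
            raise PermissionError(f"{address} may not read {path}")
        return self._files[path]

contract = AccessContract()
contract.grant("0xAlice", "notes.txt", {"read", "write"})
contract.grant("0xBob", "notes.txt", {"read"})

bubble = Bubble(contract)
bubble.write("0xAlice", "notes.txt", "hello")
print(bubble.read("0xBob", "notes.txt"))  # hello
```

&lt;p&gt;In the real protocol the permission check happens against a deployed smart contract rather than an in-memory object, but the shape is the same: storage only serves a request after the contract confirms the requester's rights.&lt;/p&gt;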

&lt;h3&gt;
  
  
  What Bubble Protocol's off-chain data storage solution offers
&lt;/h3&gt;

&lt;p&gt;Bubble Protocol offers several significant advantages for builders, which include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy Focus:&lt;/strong&gt; You can build applications with a strong emphasis on user privacy, ensuring that sensitive data remains secure and accessible only to authorised parties.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control Over Data Access:&lt;/strong&gt; The protocol provides you with granular control over data access through a combination of on-chain and off-chain mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Web2/Web3 Solution:&lt;/strong&gt; By offering a hybrid approach, you can leverage the benefits of both Web2 and Web3 technologies, combining the scalability and familiarity of Web2 with the decentralisation and security of Web3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customisable Access Controls:&lt;/strong&gt; The protocol's POSIX-like access controls, managed through smart contracts, allow you to tailor access permissions according to specific application requirements, ensuring data integrity and security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blockchain Agnosticism:&lt;/strong&gt; You can integrate Bubble Protocol with various blockchain platforms, providing flexibility and compatibility with different ecosystems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-User Encryption:&lt;/strong&gt; The protocol supports multi-user encryption, enabling developers to implement secure data sharing and collaboration features in their applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-Time Notifications:&lt;/strong&gt; You can implement real-time notifications, enhancing user experience and ensuring timely updates on data access and changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Delegation:&lt;/strong&gt; The protocol facilitates key delegation, allowing you to manage cryptographic keys effectively and delegate access rights as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NFT-Controlled Storage:&lt;/strong&gt; For applications involving Non-Fungible Tokens (NFTs), Bubble Protocol offers an ideal solution for secure and controlled storage, ensuring the integrity and ownership of digital assets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Off-chain data storage is the ideal choice for blockchain applications, providing significant advantages over on-chain storage, though it comes with limitations of its own. Bubble Protocol tackles these challenges by combining off-chain storage with on-chain capabilities. When choosing data storage for your Web3 application, off-chain storage should be a primary consideration, and if you're looking for a reliable off-chain storage solution, Bubble Protocol has you covered. Join the Bubble Protocol community on &lt;a href="https://discord.gg/sSnvK5C" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to learn more.&lt;/p&gt;

</description>
      <category>blockchain</category>
      <category>opensource</category>
      <category>database</category>
      <category>coding</category>
    </item>
    <item>
      <title>Fine-Tuning Google Gemma for Python Question and Answer Task</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Mon, 18 Mar 2024 15:44:50 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/fine-tuning-google-gemma-for-python-question-and-answer-task-27c7</link>
      <guid>https://dev.to/victor_isaac_king/fine-tuning-google-gemma-for-python-question-and-answer-task-27c7</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) are incredibly powerful tools. However, their performance can be significantly enhanced by training them on custom datasets tailored to specific tasks or contexts. In this article, you will discover how fine-tuning LLMs with personalised data can greatly improve their capabilities and effectiveness for various applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Fine-Tuning in LLMs?
&lt;/h2&gt;

&lt;p&gt;In Natural Language Processing (NLP), fine-tuning plays a crucial role in optimising pre-trained language models for specific tasks such as question and answer (Q&amp;amp;A). Let's delve into what fine-tuning LLMs entails, explore examples, understand its benefits, and discover various use cases.&lt;/p&gt;

&lt;p&gt;Fine-tuning refers to the process of customizing a pre-trained language model for a specific downstream task. Imagine having a talented friend who excels at drawing but wants to improve in drawing images of cars. Wouldn't you advise or train that friend to focus on drawing car images? That's essentially what fine-tuning does for language models—it enhances their performance in specific tasks. For instance, with Google Gemma for Python Q&amp;amp;A tasks, this process involves adjusting Gemma's parameters and training it on a Python question and answer dataset.&lt;/p&gt;
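In practice, "training it on a Python question and answer dataset" begins with serialising each question-answer pair into a single training string. Here is a minimal sketch using a generic Instruction/Response template; the exact template is an assumption for illustration, and you should use whichever prompt format your base model (e.g. Gemma) is documented to expect.

```python
def format_example(question: str, answer: str) -> str:
    """Serialize one Q&A pair into the single training string the model sees.
    The Instruction/Response template here is illustrative; substitute the
    template documented for your base model."""
    return f"Instruction:\n{question}\n\nResponse:\n{answer}"

# A tiny stand-in dataset of Python Q&A pairs
dataset = [
    {"question": "How do I reverse a list in Python?",
     "answer": "Use my_list[::-1] or my_list.reverse()."},
]

# The fine-tuning loop would consume these serialized strings
training_texts = [format_example(d["question"], d["answer"]) for d in dataset]
```

Every example in the dataset is flattened this way before tokenization, so the model learns to continue an "Instruction" with a matching "Response".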

&lt;h2&gt;
  
  
  Examples of Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Question Answering Systems:&lt;/strong&gt; Fine-tuning LLMs like Google Gemma for question answering involves training the model on a dataset of questions and their corresponding answers. This allows the model to understand and generate accurate responses to queries, enhancing the functionality of Q&amp;amp;A systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chatbots and Conversational Agents:&lt;/strong&gt; Fine-tuning LLMs for chatbots involves training the model on conversational data, including dialogues and interactions between users and the system. This enables the chatbot to engage in natural and context-aware conversations, improving user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Generation:&lt;/strong&gt; Fine-tuning LLMs for text generation tasks involves training the model on specific types of text, such as news articles, stories, or code snippets. This helps the model generate coherent and relevant content in the desired style or format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language Translation:&lt;/strong&gt; Fine-tuning LLMs for translation tasks involves training the model on parallel text data in multiple languages. The model learns to accurately translate text from one language to another, facilitating cross-lingual communication and content localization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentiment Analysis:&lt;/strong&gt; Fine-tuning LLMs for sentiment analysis involves training the model on a dataset of text samples labelled with sentiment (positive, negative, neutral). This enables the model to classify the sentiment of new text inputs, aiding in sentiment analysis tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Summarization:&lt;/strong&gt; Fine-tuning LLMs for text summarization tasks involves training the model on a dataset of longer texts paired with concise summaries. The model learns to generate accurate and concise summaries of new texts, aiding in document summarization and content extraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Improved Performance:&lt;/strong&gt; Fine-tuning allows Google Gemma to achieve better performance on specific tasks by leveraging domain-specific knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Efficient Resource Usage:&lt;/strong&gt; Rather than training a language model from scratch, fine-tuning optimises the model's existing capabilities, saving computational resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster Deployment:&lt;/strong&gt; Fine-tuned models can be quickly deployed in production environments, accelerating the development of Q&amp;amp;A applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases of Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Customer Support Chatbots:&lt;/strong&gt; Fine-tuning LLMs enables chatbots to provide accurate and context-aware responses to customer queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Educational Platforms:&lt;/strong&gt; Fine-tuned models can assist students by answering questions related to course materials, enhancing learning experiences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medical Diagnosis Systems:&lt;/strong&gt; Fine-tuning LLMs with medical data can improve diagnostic accuracy in healthcare applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Fine-Tune
&lt;/h2&gt;

&lt;p&gt;Welcome to the practical coding section of this article! Here, you will learn by doing, following a step-by-step guide on fine-tuning Gemma using a Python question and answer dataset. &lt;br&gt;
To ensure you get the most out of this tutorial, I have prepared a comprehensive Jupyter Notebook that will walk you through the process.&lt;/p&gt;

&lt;p&gt;Click on the link below to access the Jupyter Notebook and start coding:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/cyberholics/Fine-tune-Google-Gemma-LLM-with-keras/blob/main/gemma-llm-instruction-fine-tuning-for-python-q-a.ipynb" rel="noopener noreferrer"&gt;Link to Jupyter Notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have any questions or need further clarification, please don't hesitate to leave a comment or reach out to me directly. I'm here to help. Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>google</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How To Detect Cyber Injection Attack With Artificial intelligence</title>
      <dc:creator>Victor Isaac Oshimua</dc:creator>
      <pubDate>Thu, 07 Mar 2024 19:35:16 +0000</pubDate>
      <link>https://dev.to/victor_isaac_king/how-to-detect-cyber-injection-attack-with-artificial-intelligence-blp</link>
      <guid>https://dev.to/victor_isaac_king/how-to-detect-cyber-injection-attack-with-artificial-intelligence-blp</guid>
      <description>&lt;p&gt;In the world of cybersecurity, a big challenge is figuring out what is and what is not malicious. This is important for all sorts of security tools, like ones that look for hackers, viruses, or flaws in software. Typically, this is done by comparing the incoming attack with a bunch of patterns or rules we already know. But this method isn't always accurate. The rules don't get updated regularly, and sometimes they get mixed up with so many other rules that they can't work properly.&lt;/p&gt;

&lt;p&gt;A more accurate solution is artificial intelligence (AI). In today's world of technology, AI has affected almost every industry, and cybersecurity is no exception. What stands out about AI is its power to analyse vast amounts of data and detect anomalies within it.&lt;br&gt;
In this tutorial, we will harness that power: we will train an AI model on a large dataset of cyber attack vectors and teach it to distinguish between an attack and normal behaviour. &lt;/p&gt;
&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;To get the most out of this tutorial, you should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have a basic knowledge of Python programming.&lt;/li&gt;
&lt;li&gt;Understand machine learning; i.e., you have built a basic classification or regression model.&lt;/li&gt;
&lt;li&gt;Cultivate curiosity about the application of AI in security.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What is cyber injection?
&lt;/h2&gt;

&lt;p&gt;Cyber injection attacks, also known as code injection attacks, are a prevalent type of cyber threat where malicious code is injected into a system or application to compromise its integrity or steal sensitive information. These attacks can take various forms, including SQL injection, XSS (Cross-Site Scripting), and command injection, among others.&lt;/p&gt;

&lt;p&gt;SQL injection involves inserting malicious SQL code into the input fields of a web application to manipulate the database or gain unauthorised access to data. XSS attacks inject malicious scripts into web pages, allowing attackers to steal session cookies or redirect users to malicious sites. Command injection attacks exploit vulnerabilities in applications that execute shell commands, enabling attackers to run arbitrary commands on the server.&lt;/p&gt;
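The signature-based matching criticised in the introduction can be sketched in a few lines. The patterns below are illustrative examples, not a real rule set, and their brittleness is exactly what motivates the machine learning approach built in the rest of this tutorial.

```python
import re

# A handful of classic SQL-injection signatures (illustrative, far from complete)
SQLI_SIGNATURES = [
    re.compile(r"(?i)\bunion\b.+\bselect\b"),       # UNION-based extraction
    re.compile(r"(?i)\bor\b\s+'?\d+'?\s*=\s*'?\d+"), # tautologies like OR 1=1
    re.compile(r"--|;|/\*"),                         # comment / statement terminators
]

def looks_like_sqli(payload: str) -> bool:
    """Rule-based baseline: flag payloads matching any known signature."""
    return any(sig.search(payload) for sig in SQLI_SIGNATURES)

looks_like_sqli("1' OR 1=1 --")         # True: matches the tautology rule
looks_like_sqli("name=O'Reilly")        # False: benign quoting slips through
looks_like_sqli("UNIoN ALL SELECT id")  # True: (?i) handles mixed case
```

Each new evasion technique requires a new hand-written rule, which is why a model that learns the statistical shape of attack vectors scales better than a signature list.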
&lt;h2&gt;
  
  
  Building AI-based cyber injection detector
&lt;/h2&gt;

&lt;p&gt;Detecting cyber injection attacks can be challenging due to their diverse nature and the evolving tactics used by attackers. However, with the advancements in AI and machine learning (ML), detection methods have become more sophisticated. AI algorithms can analyse patterns in network traffic, application behaviour, and system logs to identify anomalies indicative of injection attacks.&lt;br&gt;
ML models trained on historical data can learn to recognise patterns associated with known injection attacks and detect deviations from normal behaviour. &lt;/p&gt;

&lt;p&gt;Now that you have an introductory understanding of cyber injection and how AI can assist in detecting it, let's follow these steps to proceed with building this AI.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Data collection
&lt;/h3&gt;

&lt;p&gt;For this project, we will use real-world data of past API request vectors. Each vector in the dataset encapsulates the attributes crucial for identifying an SQL injection attack: details of the API request such as parameters, headers, or payloads. A machine learning model can then scrutinise these components for indicators of an attack. &lt;br&gt;
You can download this data from &lt;a href="https://www.kaggle.com/competitions/wallarm-ml-hackathon/overview" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;The training data from Kaggle is divided into two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;info.csv:&lt;/strong&gt; This file contains the type label of each attack vector (i.e., whether a particular vector is an attack or not). &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;train.msgpack:&lt;/strong&gt; This is the main data for this project and will be used to train the machine learning model. It is stored in the messagepack format and contains various types of attack vectors.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Load the train.msgpack:
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import msgpack
import pandas as pd

# Load the data that contains the attack vectors
with open("/content/train.msgpack", "rb") as data_file:  # Open train.msgpack file in binary read mode
    train = msgpack.unpack(data_file)  # Unpack the data using msgpack

# Transforming train data to a pandas DataFrame
train = pd.DataFrame(train)  # Convert unpacked data into a DataFrame
train.columns = ['id', 'vector']  # Rename columns of the DataFrame
train.head()  # Display the first few rows of the DataFrame

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxbjs7tjha1hevipamf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxbjs7tjha1hevipamf9.png" alt="msgpack" width="402" height="200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Load the info.csv:
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Load data that contains the type label of each attack vector 
# (i.e., whether a particular vector is an attack or not)
url_1 = "https://raw.githubusercontent.com/cyberholics/Cyber-Injection-Attack-Detection-With-Machine-Learning/main/Data/info.csv"  # URL of the CSV file containing the attack vector labels
info = pd.read_csv(url_1)  # Reading the CSV file into a pandas DataFrame

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibqi5ilk43yd0616ag1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibqi5ilk43yd0616ag1n.png" alt="info" width="209" height="182"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Inspect an example of a vector in the dataset:
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# View the data sample to understand the data
train.iloc[2].values  # Retrieve values of the third row in the DataFrame to examine the data sample

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsob66sgxws9b9vcvpd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsob66sgxws9b9vcvpd3.png" alt="Train view" width="800" height="55"&gt;&lt;/a&gt;&lt;br&gt;
The data looks messy, with lots of symbols, strange word formats, and web links. Right now, we're not sure if cleaning it up will help; it might even make things worse. So, let's start with something simple: tokenizing the vectors at the character level.&lt;br&gt;
&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2: Data preparation
&lt;/h3&gt;

&lt;p&gt;After obtaining the data to build our machine learning model, the next step is to prepare the data. In this data preparation step, we will explore the data as well as convert it to a format that the machine learning algorithm can understand.&lt;br&gt;
The following code will prepare the data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Convert the 'vector' column in the train dataset to string type to ensure consistency
train['vector'] = train['vector'].astype(str)

# Initialize the TfidfVectorizer with specified parameters
# ngram_range=(1, 4) specifies that we want to consider uni-grams, bi-grams, tri-grams, and four-grams
# analyzer='char' indicates that we want to tokenize the input text into characters
vectorizer = TfidfVectorizer(ngram_range=(1, 4), analyzer='char')


# Fit the vectorizer on the combined train data to learn the vocabulary and IDF values
vectorizer.fit(list(train['vector'].values))

# Transform the 'vector' column of the train dataset into a sparse matrix representation using the fitted vectorizer
train_vectorized = vectorizer.transform(train['vector'])

# Create the label array for the model (1 = injection, 0 = benign)
y = np.array([1 if i else 0 for i in info.injection.values])
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(train_vectorized, y, test_size=0.2, random_state=42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
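To see what <code>analyzer='char'</code> with <code>ngram_range=(1, 4)</code> actually counts, here is a pure-Python sketch of the character n-gram enumeration (sklearn additionally lowercases input by default, which this sketch skips):

```python
def char_ngrams(text: str, n_min: int = 1, n_max: int = 4) -> list[str]:
    """Enumerate the character n-grams that TfidfVectorizer(analyzer='char')
    would count for a single string."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)   # each window size in turn
        for i in range(len(text) - n + 1)  # slide the window over the text
    ]

grams = char_ngrams("' OR", 1, 2)
# 1-grams: "'", " ", "O", "R"; 2-grams: "' ", " O", "OR"
```

Character-level n-grams are a good fit here because attack vectors are full of symbols and unusual byte sequences that word-level tokenization would mangle or discard.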



&lt;h3&gt;
  
  
  Step 3: Building the model
&lt;/h3&gt;

&lt;p&gt;This is the modeling phase of the project, and it's the most eagerly anticipated part for many machine learning practitioners. During this stage, we will train a classification model to learn from the prepared data. For this project, we will utilise the XGBoost model. Now, let's delve into what XGBoost is all about.&lt;/p&gt;

&lt;p&gt;XGBoost, short for Extreme Gradient Boosting, is a powerful and popular machine learning algorithm that is highly effective for both regression and classification tasks. It is an implementation of gradient boosting decision trees designed for speed and performance. XGBoost has gained widespread adoption in various machine learning competitions and real-world applications due to its ability to deliver high accuracy and efficiency.&lt;/p&gt;

&lt;h4&gt;
  
  
  Train the model:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import xgboost as xgb

# Define parameters for XGBoost model
params = {
    'max_depth': 6,  # Maximum tree depth
    'eta': 0.1,  # Learning rate
    'objective': 'binary:logistic',  # Binary classification
    'eval_metric': 'auc'  # Evaluation metric: AUC
}

# Convert data into DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train the XGBoost model
num_rounds = 100
xgb_model = xgb.train(params, dtrain, num_rounds)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Evaluate the trained model:
&lt;/h4&gt;

&lt;p&gt;After training the model, we have to evaluate it to see how well it performs. This evaluation helps us determine if the model is generating false positives. The evaluation metric we will use is the Area Under the Curve (AUC) score. &lt;/p&gt;

&lt;p&gt;The AUC score is a key metric in binary classification, measuring a model's ability to distinguish between positive and negative instances. It ranges from 0 to 1, where 0.5 corresponds to random guessing and higher scores indicate better performance. AUC is especially useful for imbalanced datasets and provides a reliable measure of predictive power.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import roc_auc_score

# Predict probabilities on the test set
preds = xgb_model.predict(dtest)

# Calculate AUC
auc = roc_auc_score(y_test, preds)
print("AUC:", auc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wy7i3wsgwayo2o7drjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wy7i3wsgwayo2o7drjt.png" alt="Auc score" width="222" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have a 0.99 AUC score. An AUC of 0.99 indicates a very high level of model performance.&lt;/p&gt;
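The probabilistic reading of AUC, the chance that a randomly chosen attack vector is scored above a randomly chosen benign one, can be checked with a short pure-Python sketch. It is numerically consistent with <code>roc_auc_score</code> (ties counted half), though not how the library computes it internally.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

auc_by_ranking([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.1])  # perfect ranking -> 1.0
```

An AUC of 0.99 therefore means the model scores a real injection above a benign request in roughly 99 out of 100 such pairs.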

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thank you for following this tutorial to the end. Throughout this tutorial, you have learned how to build an AI model capable of detecting cyber attacks. However, building the model is just the beginning. To fully utilise its capabilities, you should take the model to production, where it can be deployed in real-world scenarios.&lt;/p&gt;

&lt;p&gt;Deployment methods vary depending on the specific requirements and infrastructure of your organisation. One common approach is deploying the model as a web service or API, allowing it to integrate seamlessly with existing systems. Alternatively, you can deploy the model within containerised environments using platforms like Docker and Kubernetes for scalability and flexibility. Whichever method you choose, ensuring proper monitoring, security, and maintenance of the deployed model is crucial for its effectiveness in real-world applications. &lt;br&gt;
If you have any questions or suggestions, feel free to reach out to me. Your feedback is valuable. &lt;/p&gt;
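To make the web-service deployment route concrete, here is a minimal sketch using only Python's standard library. The <code>score_vector</code> stub is a hypothetical stand-in for the fitted TF-IDF vectorizer and XGBoost model from this tutorial, which you would load from disk in a real deployment.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def score_vector(vector: str) -> float:
    # Stand-in scorer so the sketch runs on its own: count classic SQLi tokens.
    # Replace with vectorizer.transform + xgb_model.predict in production.
    suspicious = ("'", "--", " or ", "union", ";")
    hits = sum(token in vector.lower() for token in suspicious)
    return min(1.0, hits / 3)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, score it, and reply with a JSON risk score
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": score_vector(payload["vector"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve_once() -> float:
    # Serve exactly one request on an ephemeral port, then query it
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.handle_request, daemon=True).start()
    req = Request(
        f"http://127.0.0.1:{server.server_port}/predict",
        data=json.dumps({"vector": "1' OR '1'='1' --"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["score"]
```

A production service would add the pieces discussed above: authentication, request logging, monitoring, and a proper WSGI/ASGI framework, but the request/response contract stays this simple.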

</description>
      <category>ai</category>
      <category>cybersecurity</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
