<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alfred Nutile</title>
    <description>The latest articles on DEV Community by Alfred Nutile (@alnutile).</description>
    <link>https://dev.to/alnutile</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F70060%2F92beb4c1-c526-4aac-930f-c9d68bcdcf0c.png</url>
      <title>DEV Community: Alfred Nutile</title>
      <link>https://dev.to/alnutile</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alnutile"/>
    <language>en</language>
    <item>
      <title>PHP and LLMs book - Local LLMs: Streamlining Your Development Workflow</title>
      <dc:creator>Alfred Nutile</dc:creator>
      <pubDate>Mon, 26 Aug 2024 15:08:38 +0000</pubDate>
      <link>https://dev.to/alnutile/php-and-llms-book-local-llms-streamlining-your-development-workflow-3d9i</link>
      <guid>https://dev.to/alnutile/php-and-llms-book-local-llms-streamlining-your-development-workflow-3d9i</guid>
      <description>&lt;p&gt;This is a chapter from the PHP and LLMs book you can buy at &lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.ly/php_llms" rel="noopener noreferrer"&gt;https://bit.ly/php_llms&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or get a sample at&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.ly/php_llms_sample" rel="noopener noreferrer"&gt;https://bit.ly/php_llms_sample&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

&lt;h2&gt;
  
  
  Big Picture
&lt;/h2&gt;

&lt;p&gt;This chapter covers the basics of working with local Large Language Models (LLMs) in a PHP development environment. We'll go through setting up Ollama on your machine and creating a simple PHP client to interact with its API. The chapter demonstrates how to use both completion and chat modes, and includes a practical example of using LLMs for news summarization. Along the way, we'll cover important development practices like mocking API responses and writing effective tests. By the end of this chapter, you should have a foundational understanding of integrating local LLMs into your PHP applications, setting the stage for more advanced topics in later chapters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Will Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to set up a local LLM environment using Ollama&lt;/li&gt;
&lt;li&gt;The advantages and considerations of working with local LLMs&lt;/li&gt;
&lt;li&gt;How to create a basic PHP client to communicate with the Ollama API&lt;/li&gt;
&lt;li&gt;How to implement and test both completion and chat functionalities in your client&lt;/li&gt;
&lt;li&gt;The practical application of LLMs for news summarization and article filtering&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting up your local Computer
&lt;/h2&gt;

&lt;p&gt;We are going to use different LLMs and models, but to start this chapter I will show how to set up Ollama on your machine.&lt;br&gt;
Keep in mind that an M1 or M2 Mac will not be great speed-wise, and you can overwhelm these systems with too many requests; but since we will mock most of the requests we make, this will not be an issue overall. Later in the book, I will cover how to use Laravel queues to help manage that, and how our Mock Driver can mock results so you can work in the UI without being impacted by this.&lt;/p&gt;
&lt;h2&gt;
  
  
  Ollama
&lt;/h2&gt;

&lt;p&gt;This is the easy part! They have a nice download page: &lt;a href="http://ollama.com/" rel="noopener noreferrer"&gt;http://ollama.com/&lt;/a&gt;. Once installed, you'll have an icon in your menu bar (or taskbar on Windows), and you can simply "Restart" it when needed, since they ship a ton of updates.&lt;/p&gt;

&lt;p&gt;After you've installed Ollama, we need to pull down two models to start this process.&lt;/p&gt;

&lt;p&gt;First, let's use a small model called &lt;code&gt;phi3&lt;/code&gt; so we do not tax your machine too much.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Some great larger models are &lt;code&gt;llama3.1&lt;/code&gt; and &lt;code&gt;mistral&lt;/code&gt;, but we'll stick to the smaller &lt;code&gt;phi3&lt;/code&gt;&lt;br&gt;
since it will take up less memory. Explore &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;https://ollama.com/library&lt;/a&gt; for more models.&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull phi3 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are their docs &lt;a href="https://github.com/ollama/ollama/blob/main/docs/README.md" rel="noopener noreferrer"&gt;https://github.com/ollama/ollama/blob/main/docs/README.md&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;OK, so now you have a chat model. To start a simple chat at the command line, open a terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run phi3:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Follama_run.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Follama_run.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty cool! But let's do more. Because you are running Ollama on your machine, it is accessible via a port, so you can make HTTP requests against the API: &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md" rel="noopener noreferrer"&gt;https://github.com/ollama/ollama/blob/main/docs/api.md&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "phi3:latest",
  "prompt": "What is PHP?"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Or without streaming.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "phi3:latest",
  "prompt": "What is PHP?",
  "stream": false
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
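&lt;p&gt;By default Ollama streams its answer as newline-delimited JSON (NDJSON): one object per line, each carrying a partial &lt;code&gt;response&lt;/code&gt;, with the final object marked &lt;code&gt;"done": true&lt;/code&gt;. As a rough sketch (plain PHP, no framework, made-up sample data), reassembling a streamed reply looks like this:&lt;/p&gt;

```php
<?php
// Sketch: each streamed line is its own JSON object; concatenating the
// "response" fields rebuilds the full answer. The sample data is made up.
$chunks = <<<'NDJSON'
{"response":"PHP ","done":false}
{"response":"is a scripting language.","done":true}
NDJSON;

$text = '';
foreach (explode("\n", $chunks) as $line) {
    $decoded = json_decode($line, true);
    $text .= $decoded['response'];
}

echo $text; // PHP is a scripting language.
```

&lt;p&gt;With &lt;code&gt;"stream": false&lt;/code&gt;, as above, you skip all of this and get one JSON object back.&lt;/p&gt;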



&lt;blockquote&gt;
&lt;p&gt;In the pursuit of accurate and reliable results, our approach often employs a 'multi-shot' strategy. Think of it as refining your initial request through multiple interactions with the LLM, much like crafting a masterpiece through successive drafts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am going to default to non-streaming. I will show how to stream, but I will mostly stick to non-streaming. This is because much of what I am building is not for chat, and, when it is for chat, I want the quality of the results to come from a multi-shot approach. Here is an explanation from Google Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the pursuit of accurate and reliable results, our approach often employs a "multi-shot" strategy. Think of it as refining your initial request through multiple interactions with the LLM, much like crafting a masterpiece through successive drafts.

Each interaction serves as a "shot," where the LLM processes your prompt and delivers a response. We then evaluate this response and, if necessary, provide additional guidance or context in subsequent prompts, shaping the output toward the desired outcome. This iterative process allows us to fine-tune the results, ensuring they align with your intent and meet the highest standards of accuracy.

While this multi-shot approach may take slightly longer than a single prompt, the emphasis is on delivering the most precise and useful information, even if it means sacrificing a bit of speed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OK, back to the fun part! Let's make our first "Client" to talk to this API and return results. From there, we will build up to abstracting it into a driver we can use with any LLM service.&lt;/p&gt;

&lt;h2&gt;
  
  
  HTTP Client with Tests
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: there are libraries for these, but I want you to see how easy it is, using plain HTTP, to talk to these APIs. This pays off later when we do things like pooling requests or talking to APIs that have no library yet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We will start with some simple code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;//app/Services/Ollama/Client.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Services\Ollama&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'http://localhost:11434/api/generate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'model'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'phi3:latest'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"options"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s2"&gt;"temperature"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'stream'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A note on &lt;code&gt;temperature&lt;/code&gt; and the elephant in the room: &lt;code&gt;hallucinations&lt;/code&gt;. We can prevent hallucinations almost entirely, and I am confident of that from past experience. Preventing hallucinations, or what I call &lt;code&gt;drifting&lt;/code&gt;, depends on a few things, like temperature, context building, and prompting skills, that I will go over in this book. So let us put that concern aside for a bit; I promise that by the end of this book you will see it as more of a problem with the prompt or the context than with the LLM.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;So let us put that concern (hallucinations) aside for a bit; I promise that by the end of this book you will see it as more of a problem with the prompt or the context than with the LLM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So here (and by default) I set the temperature to 0.1, or 0 in many cases, lowering the amount of "creativity" in the LLM's replies.&lt;/p&gt;
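&lt;p&gt;To make that concrete, the &lt;code&gt;temperature&lt;/code&gt; travels inside the &lt;code&gt;options&lt;/code&gt; object of each request, so you can tune it per call. A minimal sketch of the request body (plain PHP; field names follow the Ollama API):&lt;/p&gt;

```php
<?php
// Sketch: the body our client POSTs to /api/generate. Lower temperatures
// (0.0-0.2) make output more deterministic; higher values add variety.
$payload = [
    'model' => 'phi3:latest',
    'prompt' => 'What is PHP?',
    'options' => [
        'temperature' => 0.1, // near-deterministic, good for summarization
    ],
    'stream' => false,
];

echo json_encode($payload, JSON_PRETTY_PRINT);
```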

&lt;p&gt;Let's write a test. The first run below exists so we can capture a &lt;code&gt;fixture&lt;/code&gt; from the actual response; then we will update the test to use that &lt;code&gt;fixture&lt;/code&gt; with our HTTP mock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'should return response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'What is PHP?'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;put_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple_ollama_client_results.json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OK, I hate when people show a code example and introduce some custom thing that leaves me asking "why?". But I use &lt;br&gt;
this all the time and add it to my &lt;code&gt;app/helpers.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// app/helpers.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\File&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;function_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'put_fixture'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;put_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;base_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tests/fixtures'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;makeDirectory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;base_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tests/fixtures'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;base_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tests/fixtures/%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$file_name&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="nv"&gt;$content&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then register it in &lt;code&gt;composer.json&lt;/code&gt; (remember to run &lt;code&gt;composer dump-autoload&lt;/code&gt; afterwards):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;"autoload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"psr-4"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"App\\"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"app/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Database\\Factories\\"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database/factories/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Database\\Seeders\\"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database/seeders/"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"files"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"app/helpers.php"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;I know there is the &lt;a href="https://github.com/php-vcr/php-vcr" rel="noopener noreferrer"&gt;VCR library&lt;/a&gt; as well, but I just wanted something simpler for my day-to-day workflow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With our initial run, we got a file put into this folder &lt;code&gt;tests/fixtures/simple_ollama_client_results.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And, in that file, we have the payload that it returned.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"phi3:latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-08-22T15:19:44.208546Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"response"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PHP, which stands for &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;PHP: Hypertext Preprocessor,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; is a widely-used open source server-side scripting language. It was originally designed in 1994 by Rasmus Lerdorf to manage the creation and maintenance of websites..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"done_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;//lots&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;numbers&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6179024209&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"load_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7809625&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_eval_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_eval_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;760911000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eval_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;215&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eval_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5409058000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some details in the above response we can talk about later, but for now note the &lt;code&gt;context&lt;/code&gt; and the &lt;code&gt;done_reason&lt;/code&gt;. Every LLM provider uses different stop reasons. As we work on abstracting them out, we will define a consistent set that all results will conform to; for example, when we get to Tools, there will be a different stop reason for that.&lt;/p&gt;
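&lt;p&gt;As a hypothetical sketch of that abstraction (the names here are illustrative, not the book's driver code), normalizing provider-specific stop reasons might look like this:&lt;/p&gt;

```php
<?php
// Hypothetical sketch: map each provider's stop reason onto one consistent
// vocabulary so the rest of the app never branches on provider specifics.
function normalizeStopReason(string $providerReason): string
{
    return match ($providerReason) {
        'stop', 'end_turn', 'stop_sequence' => 'complete',
        'tool_use', 'tool_calls' => 'tool_call',
        'length', 'max_tokens' => 'truncated',
        default => 'unknown',
    };
}

echo normalizeStopReason('stop');     // complete
echo normalizeStopReason('tool_use'); // tool_call
```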

&lt;p&gt;Now let's update the test to use the fixture we just generated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'should return response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple_ollama_client_results.json'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nc"&gt;\Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'localhost:11434/*'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nc"&gt;\Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;preventStrayRequests&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nv"&gt;$client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'What is PHP?'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;//put_fixture('simple_ollama_client_results.json', $response-&amp;gt;json()); removed&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_fixture&lt;/code&gt; function is another helper I use to read a fixture from the &lt;code&gt;tests/fixtures&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// app/helpers.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;function_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'get_fixture'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$decode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;base_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s1"&gt;'tests/fixtures/%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$file_name&lt;/span&gt;
        &lt;span class="p"&gt;)));&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$decode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So now, using the generated fixture, we can fake the interaction. This is a good start; we will flesh it out more in the &lt;em&gt;"Agnostic Driver Chapter"&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Chat Method
&lt;/h2&gt;

&lt;p&gt;You can read the Ollama docs &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion" rel="noopener noreferrer"&gt;here&lt;/a&gt; about how chat works. Those docs will help us in this section since we want to have a threaded discussion, not just a prompt completion as seen in the previous section. &lt;/p&gt;

&lt;p&gt;We have our first PHP class that can ask the LLM a question. Let's make a new method for "chat". This allows us to add context to the question(s). Keep in mind that for many one-off integrations the &lt;code&gt;completion&lt;/code&gt; we did above is all you need. But you will see, as we get into tools, that it becomes more of a thread.&lt;/p&gt;
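&lt;p&gt;Per the Ollama chat docs, the request swaps the single &lt;code&gt;prompt&lt;/code&gt; for a &lt;code&gt;messages&lt;/code&gt; array of role/content pairs. A minimal sketch of that request body (plain PHP):&lt;/p&gt;

```php
<?php
// Sketch: the body a chat() method would POST to /api/chat. Each message
// has a role ("system", "user", "assistant", or "tool") plus its content.
$request = [
    'model' => 'phi3:latest',
    'messages' => [
        ['role' => 'system', 'content' => 'You summarize news articles.'],
        ['role' => 'user', 'content' => 'What is PHP?'],
    ],
    'stream' => false,
];

echo json_encode($request, JSON_PRETTY_PRINT);
```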

&lt;p&gt;A thread in this case would look like this, simplified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Find me all the events on this page and create a list of events and save them to the system"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;thinking&amp;gt;I will use the event_tool and pass in the data I found on the page.&amp;lt;/thinking&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"event_create_tool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;formmated&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I created the following events in your system..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is a good example of what comes back from a request to the LLM. In this case, you, the &lt;code&gt;user&lt;/code&gt; role, asked a question, and the LLM built a chain of responses because it needed to use tools to answer. We will show real results like this in later chapters.&lt;/p&gt;

&lt;p&gt;In the next section, we update the client to use chat instead of completion, then tackle our first use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Update Client for Chat
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: the tests are simple, but you'll see how they let us build on the code with confidence over time.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;//app/Services/Ollama/Client.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'http://localhost:11434/api/chat'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'model'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'phi3:latest'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'messages'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'stream'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"options"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s2"&gt;"temperature"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'should return chat response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'What is PHP?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nf"&gt;put_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple_ollama_client_chat_results.json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we take our fixture and use it. Here is what it looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"phi3:latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-08-22T17:11:40.160747Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PHP, which stands for Hypertext Preprocessor, is a widely-used open source server-side scripting language. It was originally created by Rasmus Lerdorf in 1994 and..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"done_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16670826417&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"load_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5122545583&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_eval_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_eval_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;84989000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eval_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;412&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eval_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11462114000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we use it for the test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'should return chat response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple_ollama_client_chat_results.json'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// added&lt;/span&gt;

    &lt;span class="nc"&gt;\Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt; &lt;span class="c1"&gt;// added&lt;/span&gt;
            &lt;span class="s1"&gt;'localhost:11434/*'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;// added&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// added&lt;/span&gt;

    &lt;span class="nc"&gt;\Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;preventStrayRequests&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nv"&gt;$client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'What is PHP?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;//put_fixture('simple_ollama_client_chat_results.json', $response-&amp;gt;json()); removed&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, just like before, we feed the saved results back in and prove it works. Again, this will change quite a bit in the Abstraction chapter.&lt;/p&gt;
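&lt;p&gt;As an aside, the &lt;code&gt;put_fixture&lt;/code&gt; and &lt;code&gt;get_fixture&lt;/code&gt; helpers used in these tests are never shown in this chapter. Here is a minimal sketch of what they might look like, assuming fixtures live in a &lt;code&gt;fixtures&lt;/code&gt; folder next to the tests; the real helpers may differ:&lt;/p&gt;

```php
<?php
// Hypothetical sketch of the fixture helpers used in these tests.
// The base path, names, and signatures are assumptions, not the book's code.

function fixture_path(string $fileName): string
{
    return __DIR__.'/fixtures/'.$fileName;
}

function put_fixture(string $fileName, mixed $content): void
{
    $dir = dirname(fixture_path($fileName));

    if (! is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    // Save the real API response so later runs can replay it without Ollama.
    file_put_contents(
        fixture_path($fileName),
        json_encode($content, JSON_PRETTY_PRINT)
    );
}

function get_fixture(string $fileName, bool $decode = true): mixed
{
    $raw = file_get_contents(fixture_path($fileName));

    // HTML fixtures are read as raw strings, hence the $decode flag.
    return $decode ? json_decode($raw, true) : $raw;
}
```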

&lt;h2&gt;
  
  
  Real Use Case: News Summary
&lt;/h2&gt;

&lt;p&gt;There is a feature in LaraLlama.io that allows you to search the web. I use it to search for news daily and then send myself a summary of the news for that day or week.&lt;/p&gt;

&lt;p&gt;There are three things I can use the LLM for.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse the HTML to get the article's content.&lt;/li&gt;
&lt;li&gt;Reject the article if it is not about the subject I want, e.g., a bad search result.&lt;/li&gt;
&lt;li&gt;Summarize all the articles that came in that day.&lt;/li&gt;
&lt;/ul&gt;
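&lt;p&gt;Before we build it, here is a rough sketch of how those three steps could fit together. The &lt;code&gt;$parser&lt;/code&gt; and &lt;code&gt;$summarize&lt;/code&gt; callables stand in for the &lt;code&gt;NewsFeedParser&lt;/code&gt; we build below and a summarization prompt we cover later; the function itself is hypothetical:&lt;/p&gt;

```php
<?php
// Hypothetical pipeline tying the three steps together. Only the parser
// (steps 1 and 2) is built in this chapter; the summarize step comes later.

function summarizeNewsFeed(array $pages, callable $parser, callable $summarize): string
{
    $articles = [];

    foreach ($pages as $html) {
        // Steps 1 and 2: extract the article, or get false back when the
        // LLM decides the page is not about the topic we want.
        $parsed = $parser($html);

        if ($parsed !== false) {
            $articles[] = $parsed;
        }
    }

    // Step 3: one final LLM call summarizing everything that survived.
    return $summarize(implode("\n\n", $articles));
}
```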

&lt;p&gt;We will talk about this more in the Prompt chapter, but it all comes down to Prompt and Context: we give the LLM context (the content to parse) along with the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;role&amp;gt;
You are extracting the content from the provided HTML or text that is related to technology news. 

&amp;lt;task&amp;gt;
First see if the article provided talks about technology news and if not just return bool false. Else pull out the content of the article as a Title, Summary, URL, and Content formatted as below.

&amp;lt;format&amp;gt;
Title:
Url:
Summary:
Content:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Steps 1 and 2: Filter Unrelated Data, Then Extract Data
&lt;/h3&gt;

&lt;p&gt;In the Prompting chapter, I'll show you how to test your prompt, but for now we will go with the prompt above.&lt;/p&gt;

&lt;p&gt;Then we use our client in our new &lt;code&gt;NewsFeedParser&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;//app/Domains/NewsFeedParser.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Domains&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Facades\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NewsFeedParser&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;PROMPT
&amp;lt;role&amp;gt;
You are extracting the content from the provided HTML or text that is related to technology news. 

&amp;lt;task&amp;gt;
First see if the article provided talks about technology news and if not just return bool false. Else pull out the content of the article as a Title, Summary, URL, and Content formatted as below.

&amp;lt;format&amp;gt;
Title:
Url:
Summary:
Content:

&amp;lt;content&amp;gt;
$context
PROMPT;&lt;/span&gt;

        &lt;span class="cd"&gt;/**
         * @NOTE
         * This is going to change quite a bit
         * for now keeping it simple
         */&lt;/span&gt;
        &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'false'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;NOTE: I use Laravel's &lt;a href="https://laravel.com/docs/11.x/facades#real-time-facades" rel="noopener noreferrer"&gt;Real Time Facades&lt;/a&gt; - watch how this pays off in the next test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first test will cover the &lt;code&gt;false&lt;/code&gt; response, where the LLM weeds out search results that are not about the technology news we want. Below you'll see &lt;code&gt;put_fixture('news_feed_false.json', $results);&lt;/code&gt;, which captures the real results and saves them for the mocked test that follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'can return false'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'This is about how to make a hamburger'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Domains\NewsFeedParser&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;put_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_false.json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeFalse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then our mocked test to see it work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'can return false'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'This is about how to make a hamburger'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_false.json'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//removed&lt;/span&gt;

    &lt;span class="nc"&gt;Facades\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;shouldReceive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'completion'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//removed&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;//removed&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;andReturn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//removed&lt;/span&gt;

    &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Domains\NewsFeedParser&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;//put_fixture('news_feed_false.json', $results); removed&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeFalse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;NOTE: See how we mock now at the &lt;code&gt;Client&lt;/code&gt; level not at &lt;code&gt;Http&lt;/code&gt;. I typically test one class at a time, even in my Feature tests and mock the surrounding classes. You can easily remove this though and test them together. By the way, we'll pass Data objects around using the Spatie Data object which will be tested as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OK, so we can handle &lt;code&gt;false&lt;/code&gt;. Now, let's do a good response example.&lt;/p&gt;

&lt;p&gt;First, we create a fake Laravel news article and save it to our &lt;code&gt;tests/fixtures&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!--tests/fixtures/news_feed_good.html--&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Laravel News&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;article&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Laravel 11 Released&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;Laravel 11 has been released, and it's packed with new features and improvements. Here's what's new in this version:&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;ul&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;li&amp;gt;&lt;/span&gt;New Blade Directives&lt;span class="nt"&gt;&amp;lt;/li&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;li&amp;gt;&lt;/span&gt;Improved Form Validation&lt;span class="nt"&gt;&amp;lt;/li&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;li&amp;gt;&lt;/span&gt;Improved File Uploads&lt;span class="nt"&gt;&amp;lt;/li&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/ul&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/article&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we create a test to generate the mock data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'can return an article summary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_good.html'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Domains\NewsFeedParser&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeFalse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another way to create a fixture is to temporarily update the method under test to write one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// app/Domains/NewsFeedParser.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nf"&gt;put_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_good_response.json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

&lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'response'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'false'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, my prompt gave me mixed results. This is a good example of why we keep iterating on and testing our prompts to get good results, but we will save that for a later chapter 🙂&lt;/p&gt;
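&lt;p&gt;One small hardening step worth considering, given those mixed results: models rarely return a bare &lt;code&gt;false&lt;/code&gt; exactly, so normalizing the response before comparing makes the rejection check less brittle. This helper is a suggestion of mine, not part of the book's code:&lt;/p&gt;

```php
<?php
// Hypothetical helper: normalize the LLM's reply before deciding it is a
// rejection, since models add whitespace, casing, and punctuation.

function isRejection(string $response): bool
{
    $normalized = strtolower(trim($response, " \n\r\t\"'.`"));

    return $normalized === 'false' || $normalized === 'bool false';
}
```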

&lt;p&gt;Now that we have the fixture, we use it in our test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'can return an article summary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_good_response.json'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nc"&gt;Facades\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;shouldReceive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'completion'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;andReturn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//added&lt;/span&gt;

    &lt;span class="nv"&gt;$context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'news_feed_good.html'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\App\Domains\NewsFeedParser&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;//put_fixture('news_feed_good_response.json', $results); removed&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toBeFalse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it for our &lt;code&gt;NewsFeedParser&lt;/code&gt; - we now have a class that uses the LLM to verify the content is what we want and then extract it from the HTML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 Summarize all the news
&lt;/h3&gt;

&lt;p&gt;Now that we have the news we want (in this example we can assume it was saved in a database for later use), let's do two examples of creating a "News Summary" experience, one with completion and one with chat. &lt;/p&gt;

&lt;p&gt;We will cover both, but first I asked Ollama what the advantage of one over the other is, and I liked some of the information it offered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When using LLMs like Ollama, the choice between sending input as chat or completion depends on how you want to provide context and what type of response you expect.

Chat:

In chat mode, you're essentially having a conversation with the model. You send a message, and the model responds. Then, you can send another message building upon the previous one, and so on. This is useful when:

You want to provide context incrementally, like in a conversation.
You need the model to maintain state or recall previous messages.
In your example, sending 5 news articles as separate chat messages would allow the model to process each article individually, taking into account any relationships between them. This might be useful if you want the model to:

Identify common themes or trends across multiple articles
Recognize patterns or connections between different articles
Completion:

In completion mode, you're providing a prompt and expecting a response that completes or generates text based on that input. You can send a single prompt with multiple components (like your 5 news articles) as long as they're concatenated into a single string.

When sending multiple inputs as a single completion prompt, the model will process them as a whole, without maintaining state between individual messages. This is useful when:

You want to provide a fixed amount of context upfront.
The relationships between different components are not crucial for the response.
In your example, sending all 5 news articles and the summary prompt in a single completion request would allow the model to process them as a whole, generating a summary that takes into account all the articles.

Context window:

When using chat, you get more control over the context window, as each message builds upon previous ones. The model maintains state and can recall previous messages, allowing for more nuanced responses.

In completion mode, the context window is fixed to the input prompt itself. While the model can still recognize patterns or relationships within that prompt, it won't maintain state across individual components like it would in a chat conversation.

So, if you want to provide more context and control over how the model processes your inputs, use chat. If you're comfortable providing all relevant information upfront and don't need incremental processing, completion might be the better choice.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Completion Example
&lt;/h4&gt;

&lt;p&gt;In this example we get all the articles for this summary; it could be 1, it could be 10. The concern is that too much content will overflow the "context window". These limits keep increasing over time; Google Gemini, for example, offers a 2-million-token context window!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;See the Terminology chapter. But the TLDR is this: tokens represent the space your question takes up, and we need to leave room for the reply of the LLM. Combined, this forms your context window.&lt;/p&gt;
&lt;/blockquote&gt;
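
&lt;p&gt;A rough rule of thumb is that one token is about four characters of English text. That heuristic (not the model's real tokenizer) is enough for a quick guard before sending a prompt:&lt;/p&gt;

```php
<?php

// Rough token estimate: ~4 characters per token is a common rule of
// thumb for English text. This is a heuristic, not a real tokenizer.
function estimate_tokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

// The prompt plus the room we leave for the reply must fit inside
// the model's context window. The reply budget here is an assumption.
function fits_context_window(string $prompt, int $window, int $replyBudget = 1024): bool
{
    return estimate_tokens($prompt) + $replyBudget <= $window;
}
```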

&lt;p&gt;In this case, we build up the "context" for our prompt from a query to the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;//app/Domains/News/NewsDigestCompletion.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Domains\News&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\News&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Carbon\Carbon&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Facades\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NewsDigestCompletion&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Carbon&lt;/span&gt; &lt;span class="nv"&gt;$start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Carbon&lt;/span&gt; &lt;span class="nv"&gt;$end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;News&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'News: Title: %s Content: %s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;PROMPT
&amp;lt;role&amp;gt;
You are my news digest assistant
&amp;lt;task&amp;gt;
Take the news articles from the &amp;lt;content&amp;gt; section below and create a TLDR followed by a title and summary of each one
If no news is passed in then just say "No News in this period"

&amp;lt;content&amp;gt;
{$messages}
PROMPT;&lt;/span&gt;

        &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'No Results'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, we get all the news, mash it into a big text blob (the LLM can do some interesting things to understand the text - it does not need to be perfectly formatted), then we add it to our prompt.&lt;/p&gt;

&lt;p&gt;From here, we get our results and return them to the class making the request.&lt;/p&gt;
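
&lt;p&gt;Under the hood, a call like &lt;code&gt;Client::completion&lt;/code&gt; boils down to a single POST to Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint. A minimal sketch of the payload, outside the Laravel client (the model name is an assumption; use whichever model you pulled):&lt;/p&gt;

```php
<?php

// Build the JSON body for Ollama's /api/generate (completion) endpoint.
// The model name is an assumption; use whichever model you pulled.
function build_completion_payload(string $prompt, string $model = 'llama3'): string
{
    return json_encode([
        'model'  => $model,
        'prompt' => $prompt,
        'stream' => false, // one JSON response instead of a token stream
    ]);
}

// Sending it is a plain HTTP POST to the local Ollama server, e.g.:
// curl http://localhost:11434/api/generate -d '{"model":"llama3","prompt":"...","stream":false}'
```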

&lt;h4&gt;
  
  
  Chat Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;//app/Domains/News/NewsDigest.php
&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Domains\News&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\News&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Carbon\Carbon&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Facades\App\Services\Ollama\Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NewsDigest&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Carbon&lt;/span&gt; &lt;span class="nv"&gt;$start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Carbon&lt;/span&gt; &lt;span class="nv"&gt;$end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;News&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;whereBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$end&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'user'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'News:  Title: %s Content: %s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$news&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;
                &lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;PROMPT
&amp;lt;role&amp;gt;
You are my news digest assistant
&amp;lt;task&amp;gt;
Take the news articles from this thread and create a TLDR followed by a title and summary of each one
If no news is passed in then just say "No News in this thread"
PROMPT;&lt;/span&gt;

        &lt;span class="nv"&gt;$messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$messages&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toArray&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="nv"&gt;$results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'No Results'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see above, we are sending the array of news articles, with our prompt appended, as the message thread.&lt;/p&gt;
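
&lt;p&gt;For reference, Ollama's &lt;code&gt;/api/chat&lt;/code&gt; endpoint expects each message as a role/content pair, which is exactly the shape we build above. A minimal payload builder (the model name is an assumption):&lt;/p&gt;

```php
<?php

// Build the JSON body for Ollama's /api/chat endpoint. Each message
// must be a ['role' => ..., 'content' => ...] pair. The model name
// is an assumption; use whichever model you pulled.
function build_chat_payload(array $messages, string $model = 'llama3'): string
{
    return json_encode([
        'model'    => $model,
        'messages' => $messages,
        'stream'   => false,
    ]);
}

// Example message thread: one news article plus the summary prompt.
$payload = build_chat_payload([
    ['role' => 'user', 'content' => 'News: Title: ... Content: ...'],
    ['role' => 'user', 'content' => 'Summarize the articles in this thread'],
]);
```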

&lt;p&gt;&lt;strong&gt;NOTE: We are taking this in small steps. Later, we will get into Tools and how to make a prompt like this below just work:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;role&amp;gt;
You are my assistant helping with news and events
&amp;lt;task&amp;gt;
Get all the news articles from this week, make a digest of them, and email them to info@dailyai.studio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So you can see three "Tools" in that prompt, &lt;code&gt;news_fetch&lt;/code&gt;, &lt;code&gt;news_digest&lt;/code&gt;, and &lt;code&gt;send_email&lt;/code&gt;, that will "just work" by the end of this book. The LLM will look at the prompt and the tools you tell it the system has, and know which ones to use and in what order. As for the news feature, this can all be wrapped up in a scheduler to search the web daily and then send off a digest. And once the Tools are part of the natural language of the prompt, a user can easily create these complex workflows.&lt;/p&gt;
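
&lt;p&gt;To make the idea concrete, here is a toy illustration of a tool registry and an LLM-produced plan. The tool names come from the prompt above, but the stub closures and dispatch logic are hypothetical, not the book's implementation:&lt;/p&gt;

```php
<?php

// Toy tool registry. The tool names match the prompt above; the
// closures are stubs, and this dispatch logic is hypothetical.
$tools = [
    'news_fetch'  => fn (array $args): array => ['articles' => []],
    'news_digest' => fn (array $args): array => ['digest' => 'TLDR ...'],
    'send_email'  => fn (array $args): array => ['sent_to' => $args['to']],
];

// The LLM would return an ordered plan of tool calls; we just run it.
function run_plan(array $tools, array $plan): array
{
    $results = [];
    foreach ($plan as $step) {
        $results[] = $tools[$step['tool']]($step['args']);
    }

    return $results;
}
```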

&lt;p&gt;More on this in the "Tools" chapter!&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The core concepts and benefits of utilizing local LLMs&lt;/li&gt;
&lt;li&gt;How to install and configure Ollama on your local machine&lt;/li&gt;
&lt;li&gt;The basics of interacting with the Ollama API using cURL&lt;/li&gt;
&lt;li&gt;The importance of considering context window limitations&lt;/li&gt;
&lt;li&gt;How to build a simple PHP client to communicate with Ollama&lt;/li&gt;
&lt;li&gt;How to mock API responses for efficient testing&lt;/li&gt;
&lt;li&gt;A practical example of using LLMs for news summarization&lt;/li&gt;
&lt;li&gt;The strategic use of both completion and chat modes for different LLM interactions&lt;/li&gt;
&lt;li&gt;A preview of the exciting possibilities that await as we delve deeper into LLMs&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>php</category>
      <category>laravel</category>
      <category>llm</category>
    </item>
    <item>
      <title>Mapping LLM Integration Levels to Levels of Autonomous Driving, it does not have to be all or nothing!</title>
      <dc:creator>Alfred Nutile</dc:creator>
      <pubDate>Tue, 02 Jul 2024 02:15:34 +0000</pubDate>
      <link>https://dev.to/alnutile/mapping-llm-integration-levels-to-levels-of-autonomous-driving-it-does-not-have-to-be-all-or-nothing-2m0j</link>
      <guid>https://dev.to/alnutile/mapping-llm-integration-levels-to-levels-of-autonomous-driving-it-does-not-have-to-be-all-or-nothing-2m0j</guid>
      <description>&lt;p&gt;I thought it would be interesting to compare LLM integration into your day to day business workflows to Levels of Automobile Automation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;BTW Follow me on YouTube &lt;a href="https://www.youtube.com/@alfrednutile" rel="noopener noreferrer"&gt;https://www.youtube.com/@alfrednutile&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here are the levels of Automobile Automation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Level 0 - No Automation: The driver performs all driving tasks.&lt;/li&gt;
&lt;li&gt;Level 1 - Driver Assistance: The vehicle can assist with either steering or acceleration/deceleration, but not both simultaneously.&lt;/li&gt;
&lt;li&gt;Level 2 - Partial Automation: The vehicle can control both steering and acceleration/deceleration in some situations, but the driver must remain engaged.&lt;/li&gt;
&lt;li&gt;Level 3 - Conditional Automation: The vehicle can handle all aspects of driving in certain conditions, but the driver must be ready to take control when prompted.&lt;/li&gt;
&lt;li&gt;Level 4 - High Automation: The vehicle can handle all driving tasks in specific circumstances without human intervention.&lt;/li&gt;
&lt;li&gt;Level 5 - Full Automation: The vehicle can perform all driving tasks under all conditions, with no human intervention needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now there seems to be a lot of “all or nothing” mindset when it comes to working with LLMs, and I think each task you face day to day might benefit differently, or not at all, from where this tech is right now.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The challenge—and opportunity—lies in identifying which of your daily tasks could benefit from LLM assistance and at what level&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Level 0
&lt;/h2&gt;

&lt;p&gt;Sometimes this is it! You might try and use it to write cover letters for job postings only to find out that Grammarly was enough. But you tried. You pasted a few bits of your writing style into a chat window, pasted the job posting and said “Write me a cover letter in my voice for this job” and what came out was embarrassing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Fmean-girls-embarrassed.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Fmean-girls-embarrassed.gif" alt="Embarrassed"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great stop right there for that task but keep an open mind to other tasks where LLMs might be more helpful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 1
&lt;/h2&gt;

&lt;p&gt;One example of this level is me writing this article and then giving what I write to the LLM to review. I can give it some examples of my voice and then ask it to just update any places that could use more “meat on the bones”, etc. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Ftasting-the-beef-matty-matheson.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Ftasting-the-beef-matty-matheson.gif" alt="Meat on the Bones"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I might not like what it outputs or maybe I do. Maybe I use it as is or maybe I compare the two and just move some bits over. But I might end up with something a little better than what Grammarly could have done. Using examples of my tone it does a good job of not sounding like an alien from another planet trying to sound human and professional. &lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2
&lt;/h2&gt;

&lt;p&gt;Let’s take the above and take it up a notch. Let’s say I have a dozen articles in my &lt;a href="http://www.larallama.io" rel="noopener noreferrer"&gt;www.larallama.io&lt;/a&gt; collection that I wrote, and then I start a chat with a bullet list of points I want to cover in a blog post and I tell it have at it!&lt;/p&gt;

&lt;p&gt;Maybe just maybe it can pull it off. Maybe it can do some great tweets or LinkedIn “tweets”? &lt;/p&gt;

&lt;p&gt;Anyways maybe it keeps the tone, or ten prompts later it pulls it off. That is great - it took the bones of my story and really fleshed it out for me, potentially saving time and enhancing the quality of the content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 3
&lt;/h2&gt;

&lt;p&gt;Ok, like above, we are building confidence, and we have even set up a system like &lt;a href="http://www.larallama.io" rel="noopener noreferrer"&gt;www.larallama.io&lt;/a&gt; to collect data from the week: news about Laravel, news about RAG systems, news about LaraLlama.io itself. And then I tell it “Every Friday write a newsletter and email me a copy”&lt;/p&gt;

&lt;p&gt;Great, but then I check it and if it looks good I post it to the mailing list using [INSERT YOUR MAILING LIST SERVICE HERE]&lt;/p&gt;

&lt;p&gt;And that is great, I have a moment to check it, even tell it to try again etc. I am not 100% trusting it yet but getting closer. &lt;/p&gt;

&lt;p&gt;This level of automation can significantly reduce your workload while maintaining quality control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 4
&lt;/h2&gt;

&lt;p&gt;Then comes Level 4! Level 3 proves itself week after week. Readers are writing back how impressed they are! They are even sending emails to LaraLlama that become suggestions for next articles and building reports on customer feedback. &lt;br&gt;
It just does not end!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Fwe-cant-stop-it-power-rangers-zeo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsundance-software.nyc3.digitaloceanspaces.com%2Flarallama%2Fwe-cant-stop-it-power-rangers-zeo.gif" alt="Can not Stop"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So we turn on FULL AUTOMATION!!! The system now sends the emails every Friday without my proofing, based on the consistent quality it has demonstrated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 5
&lt;/h2&gt;

&lt;p&gt;Ok, my example ended above; this one just goes over the top. Let’s list out what happened in Level 4 to see how crazy this could get, then wrap this up.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One, the system is sending emails every Friday about news of the week.&lt;/li&gt;
&lt;li&gt;Two, the system is reading customer feedback and adjusting each week to that feedback, while preventing the infamous trolling of the system that would lead it to write about some obscure topic.&lt;/li&gt;
&lt;li&gt;Three, the system is also taking suggestions based on the stats of which articles are read!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This level of automation represents a significant leap in AI-assisted content creation and management, though it's important to note that human oversight may still be necessary in some capacity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping it Up
&lt;/h2&gt;

&lt;p&gt;And that's how far we can potentially go with automation for our different day to day work tasks. However, it's important to remember that not all business processes are ready for any level of AI integration. For instance, tasks like helping a customer get a quote on a new web feature still require a human touch (though this might change in a few years). Similarly, creating engaging videos about innovative ideas for LaraLlama.io remains a primarily human-driven process. (For now 🤔)&lt;/p&gt;

&lt;p&gt;The key takeaway is that LLM integration isn't an all-or-nothing proposition. There's a wide spectrum of possibilities between no automation and full automation. The challenge—and opportunity—lies in identifying which of your daily tasks could benefit from LLM assistance and at what level. Some processes might be ready for Level 4 or 5 automation, while others might be best served by Level 1 or 2 support.&lt;/p&gt;

&lt;p&gt;As we move forward, it's crucial to approach LLM integration strategically. Start small, experiment with different levels of automation for various tasks, and gradually increase the AI's role as you build confidence in its capabilities. &lt;/p&gt;

&lt;p&gt;By understanding these different levels of LLM integration, you can make informed decisions about how to leverage this powerful technology in your day-to-day business workflows, potentially boosting productivity and innovation in ways we're only beginning to explore. (That is such an LLM worded sentence 🤦‍♂️)&lt;/p&gt;

</description>
      <category>llm</category>
      <category>larallama</category>
      <category>rag</category>
    </item>
    <item>
      <title>Laravel RAG System in 4 Steps!</title>
      <dc:creator>Alfred Nutile</dc:creator>
      <pubDate>Tue, 25 Jun 2024 11:35:38 +0000</pubDate>
      <link>https://dev.to/alnutile/laravel-rag-system-in-4-steps-599</link>
      <guid>https://dev.to/alnutile/laravel-rag-system-in-4-steps-599</guid>
      <description>&lt;h1&gt;
  
  
  Laravel RAG System in 4 Steps!
&lt;/h1&gt;

&lt;p&gt;This post will show how easy it is to get going with Laravel, Vectorized data and LLM chat. It can be the foundation to a RAG system. There are links to the code and more in this article &lt;a href="https://dev.to/alnutile/laravel-rag-system-in-4-steps-2jc"&gt;Original Article&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;📺 YouTube Channel - &lt;a href="https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9"&gt;https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📖 The Docs - &lt;a href="https://docs.larallama.io/"&gt;https://docs.larallama.io/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;🚀 The Site - &lt;a href="https://www.larallama.io"&gt;https://www.larallama.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🫶🏻 &lt;a href="https://patreon.com/larallama"&gt;https://patreon.com/larallama&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🐦 &lt;a href="https://x.com/alnutile"&gt;https://x.com/alnutile&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧑🏻‍💻 The Code - &lt;a href="https://github.com/LlmLaraHub/laralamma"&gt;https://github.com/LlmLaraHub/laralamma&lt;/a&gt; &lt;br&gt;
  📰 The NewsLetter - &lt;a href="https://sundance-solutions.mailcoach.app/larallama-app"&gt;https://sundance-solutions.mailcoach.app/larallama-app&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;🖊️ Medium - &lt;a href="https://medium.com/@alnutile"&gt;https://medium.com/@alnutile&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;🤝🏻 LinkedIn - &lt;a href="https://www.linkedin.com/in/alfrednutile/"&gt;https://www.linkedin.com/in/alfrednutile/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📺 YouTube Playlist - &lt;a href="https://www.youtube.com/watch?v=KM7AyRHx0jQ&amp;amp;list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq"&gt;https://www.youtube.com/watch?v=KM7AyRHx0jQ&amp;amp;list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💬 Discussions - &lt;a href="https://github.com/orgs/LlmLaraHub/discussions"&gt;https://github.com/orgs/LlmLaraHub/discussions&lt;/a&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>laravel</category>
      <category>llm</category>
      <category>ollama</category>
      <category>rag</category>
    </item>
    <item>
      <title>Laravel RAG System in 4 Steps!</title>
      <dc:creator>Alfred Nutile</dc:creator>
      <pubDate>Mon, 24 Jun 2024 21:38:00 +0000</pubDate>
      <link>https://dev.to/alnutile/laravel-rag-system-in-4-steps-2jc</link>
      <guid>https://dev.to/alnutile/laravel-rag-system-in-4-steps-2jc</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwz4040mq2bv03mnhfs2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwz4040mq2bv03mnhfs2.png" alt="Welcome"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make sure to follow me on &lt;a href="https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post will show how easy it is to get going with Laravel, Vectorized data and LLM chat. It can be the foundation to a RAG system. There are links to the code and more.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Retrieval augmented generation system — an architectural approach that can improve the efficacy of large language model (LLM) applications&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The repo is &lt;a href="https://github.com/alnutile/laravelrag/tree/main" rel="noopener noreferrer"&gt;here&lt;/a&gt;. The main branch will get you going on a fresh install of Laravel. If you copy .env.example to .env, you can get started; just follow along.&lt;/p&gt;

&lt;p&gt;Follow Along or Watch the Video COMING SOON!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 Setting up Vector
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: After each pull, run composer install. If that is not enough, run composer dump&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can see the Branch here &lt;a href="https://github.com/alnutile/laravelrag/tree/vector_setup" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag/tree/vector_setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have Laravel set up using Herd, DBngin, or the Postgres app, create the database using TablePlus, the command line, or whatever tool you like. In this example it is named “laravelrag”.&lt;/p&gt;

&lt;p&gt;Now we are going to install this library, which will set up the Vector extension for us.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

composer require pgvector/pgvector
php artisan vendor:publish &lt;span class="nt"&gt;--tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pgvector-migrations"&lt;/span&gt;
php artisan migrate


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2 Now for the Model
&lt;/h2&gt;

&lt;p&gt;The Branch is &lt;a href="https://github.com/alnutile/laravelrag/tree/model_setup" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag/tree/model_setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will keep it simple here and have a model named Chunk&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin98nsnwd3zw0hgt41db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin98nsnwd3zw0hgt41db.png" alt="Chunk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where we will store “chunks” of a document. In our example we will chunk up a long text document to keep things simple for now. But in the end everything becomes text: PDF, PPT, Docx, etc.&lt;/p&gt;

&lt;p&gt;You will see in the code that it is not about pages so much as chunks of a given size with a given overlap of content.&lt;/p&gt;

&lt;p&gt;In this code we will default to 600-character chunks with a bit of overlap; you can see the code &lt;a href="https://github.com/alnutile/laravelrag/blob/chat_with_data/app/Domains/Chunking/TextChunker.php" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
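
&lt;p&gt;As a rough sketch of the idea (illustrative only, not the repo's actual TextChunker), size-based chunking with overlap can be as simple as:&lt;/p&gt;

```php
<?php

// Hypothetical sketch of size-based chunking with overlap,
// loosely modeled on the TextChunker idea (not the repo's actual code).
function chunkText(string $text, int $chunkSize = 600, int $overlap = 100): array
{
    $chunks = [];
    // Guard against a non-positive step (overlap >= chunk size).
    $step = max(1, $chunkSize - $overlap);
    for ($start = 0; $start < mb_strlen($text); $start += $step) {
        $chunks[] = mb_substr($text, $start, $chunkSize);
    }

    return $chunks;
}
```

&lt;p&gt;With 600-character chunks and a 100-character overlap, a 1,500-character document yields three chunks, each sharing its edges with its neighbors so no context is lost at the boundaries.&lt;/p&gt;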
&lt;h2&gt;
  
  
  Step 3 Add the LLM Drivers
&lt;/h2&gt;

&lt;p&gt;Repo Link is &lt;a href="https://github.com/alnutile/laravelrag/tree/llm_driver" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag/tree/llm_driver&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: After each pull, run composer install. If that is not enough, run composer dump&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We bring in my LLMDriver folder, which is not an official package (sorry, just too lazy), and then some other libraries:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

composer require spatie/laravel-data laravel/pennant


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I am going to use my LLM Driver for this and then plug in Ollama and later Claude.&lt;/p&gt;

&lt;p&gt;But first, get Ollama going on your machine; you can read about it here.&lt;/p&gt;

&lt;p&gt;We are going to pull llama3 and mxbai-embed-large (for embedding data).&lt;/p&gt;

&lt;p&gt;Or just use your API credentials for OpenAI; it should make sense when you see the code and the config file “config/llmdriver.php”.&lt;/p&gt;

&lt;p&gt;Just set the Key value in your .env or checkout &lt;code&gt;config/llmdriver.php&lt;/code&gt; for more options.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

LLM_DRIVER=ollama


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now let’s open TinkerWell (I want to avoid coding a UI so we can focus on the concept more) &lt;a href="https://tinkerwell.app/" rel="noopener noreferrer"&gt;https://tinkerwell.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Load up the Provider &lt;code&gt;bootstrap/providers.php&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;

&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;App\Providers\AppServiceProvider&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;\App\Services\LlmServices\LlmServiceProvider&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;OK, so we see it is working. Now let's chunk a document.&lt;/p&gt;

&lt;p&gt;NOTE: Ideally this would all run in Horizon or queue jobs to deal with a ton of details like timeouts and more. We will see what happens if we just go at it this way for this demo.&lt;/p&gt;

&lt;p&gt;Also keep an eye on the tests folders; I have some “good” examples of how to test your LLM-centric applications, like &lt;code&gt;tests/Feature/ChunkTextTest.php&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;OK, now we run the command to embed the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jdpzg4c85hoys28cul5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jdpzg4c85hoys28cul5.png" alt="Embed Files"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now we have a ton of chunked data!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp8fadvpdtrcahgxsrqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp8fadvpdtrcahgxsrqn.png" alt="Database"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The columns are for the different embedding sizes, depending on the embedding model you are using. I got some feedback here and went the route you see above.&lt;/p&gt;

&lt;p&gt;Now let's chat with the data!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 Chatting with your Data
&lt;/h2&gt;

&lt;p&gt;OK, we want the user to ask the LLM a question, but the LLM needs “context” and a prompt that reduces drift, which is when an LLM makes up answers. I have seen drift almost completely eliminated in these systems.&lt;/p&gt;

&lt;p&gt;First let’s vectorize the input so we can search for related data.&lt;/p&gt;

&lt;p&gt;Since we embedded the data, the question also gets embedded (vectorized) so we can use it to do the search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifg525rl1aw6hep11wny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifg525rl1aw6hep11wny.png" alt="Prompt"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So we take the text question and pass it to the embedding API (Ollama and OpenAI both offer this).&lt;/p&gt;

&lt;p&gt;Here is the code so you can see how simple it really is with HTTP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnht6gpodygvxj5hjr8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnht6gpodygvxj5hjr8w.png" alt="Embed Data"&gt;&lt;/a&gt;&lt;/p&gt;
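
&lt;p&gt;For reference, here is a minimal sketch of that kind of HTTP call in plain PHP. The /api/embeddings endpoint and its model/prompt fields come from Ollama's API; the embedQuestion helper itself is hypothetical, not the book's actual client code:&lt;/p&gt;

```php
<?php

// Build the JSON body Ollama's embeddings endpoint expects.
function buildEmbedPayload(string $model, string $prompt): string
{
    return json_encode(['model' => $model, 'prompt' => $prompt]);
}

// Hypothetical helper: POST the question to a local Ollama instance
// and return the embedding vector (Ollama responds with {"embedding": [...]}).
function embedQuestion(string $question, string $model = 'mxbai-embed-large'): array
{
    $ch = curl_init('http://localhost:11434/api/embeddings');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_POSTFIELDS => buildEmbedPayload($model, $question),
    ]);
    $response = curl_exec($ch);
    curl_close($ch);

    $data = json_decode((string) $response, true);

    return $data['embedding'] ?? [];
}
```

&lt;p&gt;In the repo this is wrapped in the driver classes, but the request really is that small: a model name and the text to embed.&lt;/p&gt;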

&lt;blockquote&gt;
&lt;p&gt;You will see I use Laravel Data from Spatie, so no matter the LLM service it is always the same type of data in and out!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now we use the distance query to do a few things; let's break it down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ohv2p45yvxuvbxnopkp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ohv2p45yvxuvbxnopkp.png" alt="Distance Query"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We take the results that embedData gives us and pass them into the query, using the Pgvector\Laravel\Vector library to format them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Pgvector\Laravel\Vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then we use that in the distance query&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr459w7zaean8g4jm4m4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr459w7zaean8g4jm4m4.png" alt="Distance2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used cosine since I felt the results were a bit better. Why? I did a bit of ChatGPT work to decide which one and why. Here are some of the results:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The order of effectiveness for similarity metrics can vary depending on the nature of the data and the specific use case. However, here’s a general guideline based on common scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cosine Similarity&lt;/strong&gt;: Cosine similarity is often considered one of the most effective metrics for measuring similarity between documents, especially when dealing with high-dimensional data like text documents. It’s robust to differences in document length and is effective at capturing semantic similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inner Product&lt;/strong&gt;: Inner product similarity is another metric that can be effective, particularly for certain types of data. It measures the alignment between vectors, which can be useful in contexts where the direction of the vectors is important.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;L2 (Euclidean) Distance&lt;/strong&gt;: L2 distance is a straightforward metric that measures the straight-line distance between vectors. While it’s commonly used and easy to understand, it may not always be the most effective for capturing complex relationships between documents, especially in high-dimensional spaces.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In summary, the order of effectiveness is typically Cosine Similarity &amp;gt; Inner Product &amp;gt; L2 Distance. However, it’s important to consider the specific characteristics of your data and experiment with different metrics to determine which one works best for your particular application.&lt;/p&gt;
&lt;/blockquote&gt;
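
&lt;p&gt;To make the cosine option concrete, here is the math pgvector is doing under the hood, spelled out in plain PHP (an illustrative sketch; in the real query pgvector computes this inside the database):&lt;/p&gt;

```php
<?php

// Cosine distance between two equal-length vectors:
// 1 minus the cosine similarity (dot product over the product of norms).
// A distance of 0 means the vectors point in the same direction.
function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}
```

&lt;p&gt;Note that cosine distance ignores vector length and only compares direction, which is why it holds up well for text embeddings of different document lengths.&lt;/p&gt;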

&lt;p&gt;OK, back to the example. So now we have our question vectorized and we have search results. The code also takes a moment to knit the chunks back together with their siblings, so instead of getting just the chunk we get the chunk before and after it as well. &lt;a href="https://github.com/alnutile/laravelrag/blob/chat_with_data/app/Services/LlmServices/DistanceQuery.php#L33" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag/blob/chat_with_data/app/Services/LlmServices/DistanceQuery.php#L33&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have the results, we are going to build a prompt. This is tricky since it takes time to get right, so you might want to pull it into ChatGPT or Ollama and experiment a bit. The key here is setting the temperature to 0 to keep the system from drifting. That is not easy yet in Ollama &lt;a href="https://github.com/ollama/ollama/issues/2505" rel="noopener noreferrer"&gt;https://github.com/ollama/ollama/issues/2505&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oulye3gg5rtfbh13jzap.png" rel="noopener noreferrer"&gt;&lt;img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oulye3gg5rtfbh13jzap.png" alt="results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ok let’s break this down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Item 1 the Prompt
&lt;/h3&gt;

&lt;p&gt;Here we define a Role, Task and Format (JSON, Markdown, Table, etc.). Check out &lt;a href="https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/" rel="noopener noreferrer"&gt;https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/&lt;/a&gt; for some tips!&lt;/p&gt;

&lt;h3&gt;
  
  
  Item 2 the Input
&lt;/h3&gt;

&lt;p&gt;Here we pass the original text question to help the LLM understand the user's request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Garbage in Garbage Out&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Item 3 the Context
&lt;/h3&gt;

&lt;p&gt;Context is key. “Garbage in, garbage out” is still the rule here as well. Put good data into your RAG system. In this example I imported some Laravel docs. This is the data that comes back from the distance query!&lt;/p&gt;
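
&lt;p&gt;Putting the pieces together, a prompt like the one above might be assembled along these lines (a hypothetical template; the repo's actual prompt differs):&lt;/p&gt;

```php
<?php

// Illustrative prompt assembly: Role, Task, and Format, plus the retrieved
// context chunks and the user's question. This is a hypothetical sketch,
// not the exact prompt used in the repo.
function buildPrompt(string $question, array $contextChunks): string
{
    $context = implode("\n---\n", $contextChunks);

    return <<<PROMPT
Role: You are an assistant that answers questions about the provided documents.
Task: Answer the question using ONLY the context below. If the context does
not contain the answer, say you do not have enough information.
Format: Markdown

Context:
{$context}

Question: {$question}
PROMPT;
}
```

&lt;p&gt;The explicit “say you do not have enough information” instruction is what produces the honest “not enough data” answers shown below, instead of made-up ones.&lt;/p&gt;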

&lt;h3&gt;
  
  
  Item 4 LLM
&lt;/h3&gt;

&lt;p&gt;We are just using a completion API here. This is not “chat”, which is an array of questions and answers, though that would work as well; this is just the prompt we built passed in. I am using the Claude driver to show how easily we can switch systems. Also, I feel like Ollama, unless you set the temperature, is a bit trickier right now to keep on track. And Claude is FAST!&lt;/p&gt;

&lt;h3&gt;
  
  
  Item 5 The Answer
&lt;/h3&gt;

&lt;p&gt;As seen above you can get a sense of the answer, but I just want to share that sometimes the LLM will point out (see below) that it does not have enough data to answer your question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzhblwqg8timxyc16b80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzhblwqg8timxyc16b80.png" alt="I removed a #4 from a previous edit but this is the first answer until I cleaned up some data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is another example I like to share with people&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihgprbao1f37pvkcb24e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihgprbao1f37pvkcb24e.png" alt="It will not hallucinate!"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just more “evidence” of what a good RAG system can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;That is really how “easy” it is to get a RAG system going. LaraLlama.io has a ton more detail, but this article shares a very simple code base to start from.&lt;/p&gt;

&lt;p&gt;The next post will cover tools/functions, extending this code further. There are so many ways to use this in applications; I list a bunch of use cases here &lt;a href="https://docs.larallama.io/use-cases.html" rel="noopener noreferrer"&gt;https://docs.larallama.io/use-cases.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The code is all here &lt;a href="https://github.com/alnutile/laravelrag" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag&lt;/a&gt; you can work through the branches with the last one being &lt;a href="https://github.com/alnutile/laravelrag/tree/chat_with_data" rel="noopener noreferrer"&gt;https://github.com/alnutile/laravelrag/tree/chat_with_data&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make sure to follow me on YouTube &lt;a href="https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9" rel="noopener noreferrer"&gt;https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  And the list below for more ways to stay in touch!
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;📺 YouTube Channel — &lt;a href="https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9" rel="noopener noreferrer"&gt;https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📖 The Docs — &lt;a href="https://docs.larallama.io/" rel="noopener noreferrer"&gt;https://docs.larallama.io/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🚀 The Site — &lt;a href="https://www.larallama.io" rel="noopener noreferrer"&gt;https://www.larallama.io&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🫶🏻 &lt;a href="https://patreon.com/larallama" rel="noopener noreferrer"&gt;https://patreon.com/larallama&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🧑🏻‍💻 The Code — &lt;a href="https://github.com/LlmLaraHub/laralamma" rel="noopener noreferrer"&gt;https://github.com/LlmLaraHub/laralamma&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📰 The NewsLetter — &lt;a href="https://sundance-solutions.mailcoach.app/larallama-app" rel="noopener noreferrer"&gt;https://sundance-solutions.mailcoach.app/larallama-app&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🖊️ Medium — &lt;a href="https://medium.com/@alnutile" rel="noopener noreferrer"&gt;https://medium.com/@alnutile&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🤝🏻 LinkedIn — &lt;a href="https://www.linkedin.com/in/alfrednutile/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/alfrednutile/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📺 YouTube Playlist — &lt;a href="https://www.youtube.com/watch?v=KM7AyRHx0jQ&amp;amp;list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=KM7AyRHx0jQ&amp;amp;list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;💬 Discussions — &lt;a href="https://github.com/orgs/LlmLaraHub/discussions" rel="noopener noreferrer"&gt;https://github.com/orgs/LlmLaraHub/discussions&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>laravel</category>
      <category>llm</category>
      <category>ollama</category>
      <category>rag</category>
    </item>
    <item>
      <title>Past Year and a Half of Tech Used and Loved and Stuff Built</title>
      <dc:creator>Alfred Nutile</dc:creator>
      <pubDate>Mon, 18 Sep 2023 14:00:00 +0000</pubDate>
      <link>https://dev.to/alnutile/past-year-and-a-half-of-tech-used-and-loved-and-stuff-built-5hid</link>
      <guid>https://dev.to/alnutile/past-year-and-a-half-of-tech-used-and-loved-and-stuff-built-5hid</guid>
<description>&lt;p&gt;tl;dr: this is about my past year and a half becoming a full-time developer again, the different technologies I used, and my experience with them. I will share the tech I still like the most and the projects I used it on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;I was updating my resume and my list of experience and just wanted to share here. I will write more on the topic later about the "Developer to Director back to Developer" path, since I think the first part of this "growth" happens often and not always for the right reasons, as seen in the "Peter Principle".&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Peter Principle states that, if you perform well in your job, you will likely be promoted to the next level of your organization's hierarchy. You will continue to rise up the ladder until you reach the point where you can no longer perform well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Anyway, more on that later. For now, a look at my time since April 2022 and becoming a full-time developer again. As I look back I also want to share the tech that has resonated the most with my workflow and thinking. By the way, some of that time was spent at a full-time job building a product for them, so I did "lose" some time to that, but I learned a ton there, both about tech and about myself.&lt;/p&gt;

&lt;p&gt;All these stacks use GitHub Actions for CI/CD &lt;a href="https://medium.com/@alnutile/ci-github-actions-and-laravel-part-1-of-3-547d5515b116"&gt;https://medium.com/@alnutile/ci-github-actions-and-laravel-part-1-of-3-547d5515b116&lt;/a&gt;; honestly, it is just one file and done. And &lt;a href="https://envoyer.io"&gt;https://envoyer.io&lt;/a&gt; to deploy with "zero" downtime. And then there is &lt;a href="https://jetstream.laravel.com/introduction.html"&gt;https://jetstream.laravel.com/introduction.html&lt;/a&gt; providing not only the foundation but some great starter components and patterns for Vue and Inertia.&lt;/p&gt;

&lt;h2&gt;
  
  
  2022-2023 Tech Stack Winners for me 🏆
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Inertia &lt;a href="https://inertiajs.com/"&gt;https://inertiajs.com/&lt;/a&gt; the form api makes life sooooo easy&lt;/li&gt;
&lt;li&gt;Vite &lt;a href="https://vitejs.dev/"&gt;https://vitejs.dev/&lt;/a&gt; what a game changer for me and the dev workflow&lt;/li&gt;
&lt;li&gt;Forge + DigitalOcean &lt;a href="https://forge.laravel.com/"&gt;https://forge.laravel.com/&lt;/a&gt; gets the Ops out of the way so I can focus on the Dev&lt;/li&gt;
&lt;li&gt;Meilisearch &lt;a href="https://www.meilisearch.com/"&gt;https://www.meilisearch.com/&lt;/a&gt; wow saved my 🥓 on a large database &lt;/li&gt;
&lt;li&gt;Tailwind Yup Shane &lt;a href="https://www.linkedin.com/in/shane-rosenthal/"&gt;https://www.linkedin.com/in/shane-rosenthal/&lt;/a&gt; was right about me going with this over Bootstrap never looked back&lt;/li&gt;
&lt;li&gt;JetStream &lt;a href="https://jetstream.laravel.com/introduction.html"&gt;https://jetstream.laravel.com/introduction.html&lt;/a&gt; just what I need to get started on any project&lt;/li&gt;
&lt;li&gt;OpenAi PHP &lt;a href="https://github.com/openai-php"&gt;https://github.com/openai-php&lt;/a&gt; my TODO list for 2023-2024 is to make a "Storage" like driver so we can use any LLM but for now this is a great option!&lt;/li&gt;
&lt;li&gt;RapidAPI &lt;a href="https://rapidapi.com/hub"&gt;https://rapidapi.com/hub&lt;/a&gt; getting going quickly with some APIs like LinkedIn or Twitter, and more, like text to audio! Even a great web scraper.&lt;/li&gt;
&lt;li&gt;pgvector &lt;a href="https://github.com/pgvector/pgvector"&gt;https://github.com/pgvector/pgvector&lt;/a&gt; wow, so nice that I could just enter this new LLM phase&lt;/li&gt;
&lt;li&gt;Miro &lt;a href="https://miro.com"&gt;https://miro.com&lt;/a&gt; remote whiteboard &lt;/li&gt;
&lt;li&gt;Balsamiq &lt;a href="https://balsamiq.com/"&gt;https://balsamiq.com/&lt;/a&gt; the downloaded version to keep MVP goals clear and simple.&lt;/li&gt;
&lt;li&gt;Envoy.io &lt;a href="https://envoyer.io"&gt;https://envoyer.io&lt;/a&gt; just works!&lt;/li&gt;
&lt;li&gt;Horizon all things &lt;a href="https://laravel.com/docs/10.x/horizon"&gt;https://laravel.com/docs/10.x/horizon&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2022-2023 Tech I appreciate but not using ❤️
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Livewire, &lt;a href="https://laravel-livewire.com/"&gt;https://laravel-livewire.com/&lt;/a&gt; nice to have this option great work&lt;/li&gt;
&lt;li&gt;Vapor &lt;a href="https://vapor.laravel.com/"&gt;https://vapor.laravel.com/&lt;/a&gt; so glad this exists especially when/if I get a client that needs this level of scaling.&lt;/li&gt;
&lt;li&gt;Nova &lt;a href="https://nova.laravel.com/"&gt;https://nova.laravel.com/&lt;/a&gt; glad it is here &lt;/li&gt;
&lt;li&gt;Spark &lt;a href="https://spark.laravel.com/"&gt;https://spark.laravel.com/&lt;/a&gt; glad it is here as well just a bit too "simplistic" for most of my customers so far but still looking for a place for it.&lt;/li&gt;
&lt;li&gt;Statamic &lt;a href="https://statamic.com/"&gt;https://statamic.com/&lt;/a&gt; very glad they are doing the work just did not fit my style. Great editor. &lt;/li&gt;
&lt;li&gt;PEST &lt;a href="https://pestphp.com/"&gt;https://pestphp.com/&lt;/a&gt; glad this is here to help push our thinking about testing more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2023-2024 Tech I want to try 🚀
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Filament &lt;a href="https://filamentphp.com/"&gt;https://filamentphp.com/&lt;/a&gt; soon. I have so many Inertia Components to use at this point but this looks sooooo nice&lt;/li&gt;
&lt;li&gt;Alternative to Pusher.com; though it has been the most affordable, reliable service for the past 10 years, it would be nice to use the newer free options the Laravel docs talk about.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Built internal system using People Data Labs
&lt;/h2&gt;

&lt;p&gt;System using People Data Labs to more effectively use that API to build reports as needed. (top secret stuff)&lt;br&gt;
URL Is private&lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Livewire 2, Tailwind, Inertia, Meilisearch&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Wow, my first use of Livewire and Tailwind! I love Tailwind now, but at first I did tooooo much adding of classes to get it "right" and not enough removing. As for Livewire, it just did not work with my brain. I like how Livewire 3 works since it has a Vue-like component feel. Later on I moved new features to Inertia and it is going well. As for Meilisearch: wow! Using Forge I set it up with one click and the 3-million-row database is just sooooooo fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gallery Anywhere
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://galleryanywhere.com/"&gt;https://galleryanywhere.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Personal idea to have local art in Airbnbs; it did not go anywhere 🤷🏻‍♂️&lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Laravel, Nova, Tailwind, Inertia&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Fell in love with Inertia and Vue; it was just more in line with what I was used to. Nova was fine, but for me it took some of the fun out of Laravel.&lt;/p&gt;

&lt;h2&gt;
  
  
  RecallsNow (used to be TotalRecalls)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://recallsnow.com/"&gt;https://recallsnow.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fun project and "scratch my own itch". Lots of background api stuff and web scraping. &lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Laravel, OpenAi API for Tagging, Tailwind and Inertia, OpenAi API Finetuning&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
I just wanted one idea of mine to go from start to end (subscriptions: just me at this point🙂).&lt;br&gt;
The stack was great; I was still learning Tailwind, as you can see in the CSS.&lt;br&gt;
Some of the APIs I used lacked good tags or data, so I used OpenAI fine-tuning to train a model on how a recall should be tagged when it lacked one. &lt;a href="https://platform.openai.com/docs/guides/fine-tuning"&gt;https://platform.openai.com/docs/guides/fine-tuning&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Blog
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://alfrednutile.info/"&gt;https://alfrednutile.info/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We all have to do this! I really wanted to "show off" my GitHub stats.&lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Statamic &lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Like Livewire, I appreciate the community and effort. As a long-time Laravel dev I felt it was too different for me. I suck at design🙂&lt;/p&gt;

&lt;h2&gt;
  
  
  Well Registration Site and Clean water search site (Two of them)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cleanwater.kaweahwater.org/start"&gt;https://cleanwater.kaweahwater.org/start&lt;/a&gt;&lt;br&gt;
SECOND URL is in QA mode&lt;/p&gt;

&lt;p&gt;Working with a partner in California to build clean water / well registration sites. The internal part of the site is actually quite complex.&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean, Postgres and GIS plugin.&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Found my stack and it has been working really well. This site helped me grow in this skill and come to appreciate GIS, Postgres, MapBox, and Twilio for SMS.&lt;/p&gt;

&lt;h2&gt;
  
  
  BountifulTravel
&lt;/h2&gt;

&lt;p&gt;URL still closed other than a few standalone payment and management areas.&lt;/p&gt;

&lt;p&gt;This client site is moving a HUGE Wordpress site to Laravel that manages Group Trips.&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Vapor!, integration with Monday.com, Plaid (great tech stack for payments imo), and AFS for other payments.&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Overall it reminded me of why I need to work with the customer directly and not through a person between me and the customer. No matter how good we all are at our jobs and intentions, communication is key, and building the right thing really means the developer being in constant contact with the customer. The tech stack is good, but this was my last Vapor project, and if I could I would move it back to Forge and DO. I like Vapor and am glad I know it is there when needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Book Creation and Ai/LLM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nextchapters.com/"&gt;https://nextchapters.com/&lt;/a&gt;  (this is the sister site that leads into the site I made)&lt;/p&gt;

&lt;p&gt;Lots of work on a sister site, still in QA, to help authors write books using OpenAI as an assistant. Really fun stuff.&lt;br&gt;
Kind of like GitHub but for a community of writers.&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
The stack just works and delivers. I am getting to the point where I have a nice folder of components to share with all my projects. Using Echo and Pusher the user gets a very dynamic experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  LaraChain Open Ai LLM Integration
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://larachain.io/"&gt;https://larachain.io/&lt;/a&gt; and &lt;a href="https://www.youtube.com/playlist?list=PLL8JVuiFkO9I1EgzjPKKZNz-ZVbcBJWF9"&gt;https://www.youtube.com/playlist?list=PLL8JVuiFkO9I1EgzjPKKZNz-ZVbcBJWF9&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;This is an open-source project with a focus on using LLMs and "blocks" to quickly make a backend so any person or company can take advantage of vector databases and the OpenAI API (or other APIs).&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean, OpenAI PHP &lt;a href="https://github.com/openai-php"&gt;https://github.com/openai-php&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
It is amazing what a tech stack and set of libraries we have as PHP developers. With the rise of &lt;a href="https://www.langchain.com/"&gt;https://www.langchain.com/&lt;/a&gt;, I was really worried whether PHP and Laravel could keep up. After my deep dive into this, it is clear we can 🥂&lt;/p&gt;

&lt;h2&gt;
  
  
  TOP SECRET PROJECT
&lt;/h2&gt;

&lt;p&gt;One other OpenAI API LLM project reached MVP and did not make it past that stage, but it was a great try. I learned a lot and worked with someone who showed me a lot about building the right product, staying focused on the MVP, and just how to think about solutions.&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean, OpenAI PHP &lt;a href="https://github.com/openai-php"&gt;https://github.com/openai-php&lt;/a&gt; &lt;br&gt;
RapidAPI &lt;a href="https://rapidapi.com/hub"&gt;https://rapidapi.com/hub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Again, Horizon, Batches, etc. all let us build some amazing tools, and RapidAPI allowed me to get some data quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLMAssistant
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/alnutile/llm-assistant"&gt;https://github.com/alnutile/llm-assistant&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is my big effort to scratch my own itch: storing articles I like, helping me with my day-to-day work planning (Bullet Journal style), and, since it knows a lot about me, replying to job posts in a sensible way!&lt;/p&gt;

&lt;p&gt;Tech: &lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean, OpenAI PHP &lt;a href="https://github.com/openai-php"&gt;https://github.com/openai-php&lt;/a&gt; &lt;br&gt;
RapidAPI &lt;a href="https://rapidapi.com/hub"&gt;https://rapidapi.com/hub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
The RapidAPI web scraper is solid: &lt;a href="https://rapidapi.com/proxycrawl/api/proxycrawl-scraper"&gt;https://rapidapi.com/proxycrawl/api/proxycrawl-scraper&lt;/a&gt;. I tried PHPRoach but ended up with this and was done in under an hour. Otherwise it is a forum-like system (i.e., me and the LLM) for going back and forth about a topic, like a reply to a job posting or an article I want to tag. You can read more about that here: &lt;a href="https://medium.com/@alnutile/openai-function-calls-and-laravel-a2f8db9b0f97"&gt;https://medium.com/@alnutile/openai-function-calls-and-laravel-a2f8db9b0f97&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Colosseum Crew Gaming Tournament site
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://colosseumcrew.com/"&gt;https://colosseumcrew.com/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;This is one of my favorites. It was me trying to be a VC-type partner, putting up my dev time for later payment. 🤞&lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Nova (though we moved away from this), Tailwind, Inertia, an API integration with &lt;a href="https://challonge.com/"&gt;https://challonge.com/&lt;/a&gt;, and Spark at first, but we moved away from that as well, to PayPal.&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Nova was a good start, but the data is quite complex, so I moved away from it and just built a UI.&lt;br&gt;
Spark, too, just was not flexible enough, so I moved to PayPal (which fit some of their needs better).&lt;br&gt;
A very, very hard site for me: the domain was all new, and the data structure was harder than I imagined, with matches, teams, players, stats, etc. A really good bunch of people to work with, too. I am looking forward to it being used and being an enjoyable system for its users.&lt;/p&gt;

&lt;h2&gt;
  
  
  SimpleRecipes (personal project)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://simplerecipes.io/"&gt;https://simplerecipes.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the LLM craze, I had to try out some ideas. &lt;/p&gt;

&lt;p&gt;Tech:&lt;br&gt;
Inertia, Tailwind, Forge, DigitalOcean, OpenAI PHP &lt;a href="https://github.com/openai-php"&gt;https://github.com/openai-php&lt;/a&gt;, Nova&lt;/p&gt;

&lt;p&gt;Thoughts:&lt;br&gt;
Nova worked here because I stayed within the lines of what it can do and did not push it to a place where I had to learn a ton about Nova. The art is all DALL·E 2, with a prompt that focuses on food in a watercolor style. You are still never certain what you will get! A fun project.&lt;/p&gt;

&lt;p&gt;There are other items, like one customer who used LaraChain as a backend, where we are still integrating some chat and doc search into their frontend. But this is the list of tech used, stuff built, and the happy path back to being a developer.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
