<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TimeSurge Labs</title>
    <description>The latest articles on DEV Community by TimeSurge Labs (@timesurgelabs).</description>
    <link>https://dev.to/timesurgelabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7143%2Fe7c9d9c4-af8c-4e7c-b9e6-8e87fd478ce2.png</url>
      <title>DEV Community: TimeSurge Labs</title>
      <link>https://dev.to/timesurgelabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/timesurgelabs"/>
    <language>en</language>
    <item>
      <title>Using LLMs in 3 lines of Python</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Mon, 30 Jun 2025 19:26:33 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/using-llms-in-3-lines-of-python-gm1</link>
      <guid>https://dev.to/timesurgelabs/using-llms-in-3-lines-of-python-gm1</guid>
      <description>&lt;p&gt;When working with LLMs, the first thing people generally install is the &lt;code&gt;openai&lt;/code&gt; or &lt;code&gt;anthropic&lt;/code&gt; packages, if you’re a little more adventurous with your LLM choice it may be &lt;code&gt;litellm&lt;/code&gt; or &lt;code&gt;ollama&lt;/code&gt;. The issue is that all of these require a bit of code to get your started. For example, assuming you have an API key in your environment like I do, you’ll need at least this code to make an LLM call with OpenAI (also assuming you’re using the older Chat Completions endpoint).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# retrieve API key from environment
&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# initialize client
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# send a chat request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Say something concise.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# print assistant's answer
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you want to wrap your API call with a function so you can call it repeatedly, that’s even more lines!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_with_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;chat_with_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Say something concise.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that is simply unacceptable!&lt;/p&gt;

&lt;h2&gt;Do you really care?&lt;/h2&gt;

&lt;p&gt;No, I’m being facetious. For most LLM projects, consistency of output trumps everything else. Still, sometimes it’s nice to have a super simple way to add LLMs to my one-off Python scripts and tools without all the boilerplate.&lt;/p&gt;

&lt;h2&gt;Magentic&lt;/h2&gt;

&lt;p&gt;Magentic is a Python package that lets you create functions that call LLMs in 3 lines of code. No, really! Here’s an example ripped straight from &lt;a href="https://magentic.dev/#usage" rel="noopener noreferrer"&gt;their docs&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add more &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ness to: {phrase}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# No function body as this is never executed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thanks to some black box dark magic that I don’t feel like learning about, this is a completely valid Python function that’s callable anywhere in the script, assuming you have an OpenAI API key in your environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# "Hey, dude! What's up? How's it going, my man?"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;A Note On Package Management&lt;/h2&gt;

&lt;p&gt;I’m going to be using the &lt;a href="https://peps.python.org/pep-0723/" rel="noopener noreferrer"&gt;PEP 723&lt;/a&gt; standard at the top of all my scripts for the rest of this post. This allows you to use &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, the best package manager for Python, to run the scripts without having to make a virtual environment, install packages, and then run the script manually; all three of those tasks are automated into a single command.&lt;/p&gt;

&lt;p&gt;Here’s the above script with the added metadata and some slight modifications. This assumes you have &lt;a href="https://docs.astral.sh/uv/#installation" rel="noopener noreferrer"&gt;uv installed&lt;/a&gt; and the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; env var set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic"
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add more &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ness to: {phrase}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# No function body as this is never executed
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script can now be downloaded and run like an executable. I’ve uploaded it to &lt;a href="https://gist.github.com/chand1012/218372f3e1101dfa7f915dc35c0e66d8" rel="noopener noreferrer"&gt;a gist&lt;/a&gt; for easy download.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget &lt;span class="nt"&gt;-O&lt;/span&gt; dudeify https://gist.githubusercontent.com/chand1012/218372f3e1101dfa7f915dc35c0e66d8/raw/363f720d21fa8ebe2e6a484f6b389496c3452064/dudeify.py
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x dudeify
./dudeify &lt;span class="s2"&gt;"Hello how are you"&lt;/span&gt;
&lt;span class="c"&gt;# Installed 23 packages in 45ms&lt;/span&gt;
&lt;span class="c"&gt;# Yo dude, how's it hangin'?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time you run the script it’ll handle making a cached virtual environment for the next time you run it! For more information on how this works, you can check out the &lt;a href="https://docs.astral.sh/uv/guides/scripts/#using-a-shebang-to-create-an-executable-file" rel="noopener noreferrer"&gt;uv docs&lt;/a&gt;, and the &lt;a href="https://www.cottongeeks.com/articles/2025-06-24-fun-with-uv-and-pep-723" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; that inspired my constant use of this feature.&lt;/p&gt;

&lt;h2&gt;Structured Outputs&lt;/h2&gt;

&lt;p&gt;If you want to have structured outputs, like for example for an API response or just to make it easier to parse and use the data with your scripts, you can use a &lt;a href="https://docs.pydantic.dev/latest/concepts/dataclasses/" rel="noopener noreferrer"&gt;Pydantic Dataclass&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic",
#     "pydantic",
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Animal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;legs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;latin_species&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;predators&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;prey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me information on the animal {animal_name}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;animal_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;animal_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Animal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;animal_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s an example of that function being run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6fsnfs8f52p75cm1mtg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6fsnfs8f52p75cm1mtg.png" alt="Example output" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;
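&lt;p&gt;Because &lt;code&gt;animal_info&lt;/code&gt; returns a plain Pydantic model, the result is ordinary Python data. Here’s a hand-built &lt;code&gt;Animal&lt;/code&gt; instance (the field values are illustrative, standing in for whatever the LLM might return) showing how you’d use it:&lt;/p&gt;

```python
from pydantic import BaseModel


class Animal(BaseModel):
    species: str
    legs: int
    latin_species: str
    predators: list[str]
    prey: list[str]


# A hand-built instance standing in for what animal_info("cat") might return
cat = Animal(
    species="cat",
    legs=4,
    latin_species="Felis catus",
    predators=["coyote", "hawk"],
    prey=["mouse", "bird"],
)

print(cat.legs)               # access fields directly
print(cat.model_dump_json())  # or serialize straight to JSON for an API response
```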

&lt;h2&gt;Prompting and Function Calls&lt;/h2&gt;

&lt;p&gt;There are two ways to prompt the LLM with Magentic. You can use the &lt;code&gt;@prompt&lt;/code&gt; decorator, as I’ve been doing, which is the simplest and fastest way to create LLM-backed functions. There’s also &lt;code&gt;@chatprompt&lt;/code&gt;, which allows you to pass a list of chat messages to the LLM. This is especially useful for few-shot prompting, where you give the LLM examples of the output you want. After all, LLMs &lt;em&gt;are&lt;/em&gt; just fancy pattern-matching black boxes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic",
#     "pydantic",
# ]
# ///
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chatprompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AssistantMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="c1"&gt;# this is a modified version of magentic's example chatprompt code
# https://magentic.dev/#chatprompt
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;character&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@chatprompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a movie buff.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is your favorite quote from Harry Potter?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;AssistantMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It does not do to dwell on dreams and forget to live.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;character&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Albus Dumbledore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is your favorite quote from {movie}?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_movie_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Quote&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_movie_quote&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also pass &lt;a href="https://magentic.dev/#functioncall" rel="noopener noreferrer"&gt;function calls to LLMs&lt;/a&gt;, allowing them to return a Python callable that you can invoke later. Another use of this is the &lt;code&gt;@prompt_chain&lt;/code&gt; decorator, which allows an LLM to call a function and use the returned results to generate its response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic",
#     "duckduckgo_search",
# ]
# ///
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt_chain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;duckduckgo_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DDGS&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Searches the web for a given query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DDGS&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ddgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ddgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="nd"&gt;@prompt_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that can search the web for information. Use your tools to answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question: {query}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Using Other LLMs&lt;/h2&gt;

&lt;p&gt;If you’re a data-conscious person, or just want to keep your options open, Magentic can be configured to work with nearly any other LLM, as long as it’s supported by &lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; or offers an OpenAI-compatible API. Here’s an example of a script that runs entirely locally using &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; and &lt;a href="https://ollama.com/library/gemma3" rel="noopener noreferrer"&gt;Google’s Gemma 3&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic"
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OpenaiChatModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenaiChatModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma3:27b-it-qat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add more &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ness to: {phrase}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# No function body as this is never executed
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your chosen LLM is one of the &lt;a href="https://docs.litellm.ai/docs/providers" rel="noopener noreferrer"&gt;many supported by LiteLLM&lt;/a&gt;, you can use Magentic’s &lt;code&gt;litellm&lt;/code&gt; extra.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic[litellm]"
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic.chat_model.litellm_chat_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LitellmChatModel&lt;/span&gt;

&lt;span class="c1"&gt;# this specific example requires GEMINI_API_KEY env var to be set
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LitellmChatModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini/gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add more &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ness to: {phrase}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# No function body as this is never executed
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can access Anthropic’s Claude series of models via the LiteLLM method above, or through Magentic’s official Anthropic extension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic[anthropic]"
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic.chat_model.anthropic_chat_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnthropicChatModel&lt;/span&gt;

&lt;span class="c1"&gt;# this specific example requires GEMINI_API_KEY env var to be set
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnthropicChatModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-4-sonnet-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add more &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ness to: {phrase}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# No function body as this is never executed
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dudeify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No LLM left behind!&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Usage
&lt;/h2&gt;

&lt;p&gt;Need an async function? Just declare it with &lt;code&gt;async def&lt;/code&gt; instead of &lt;code&gt;def&lt;/code&gt;!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# incomplete snippet
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me more about {topic}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tell_me_more_about&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can combine Python’s &lt;code&gt;AsyncIterable&lt;/code&gt; with &lt;code&gt;asyncio&lt;/code&gt; tasks to make multiple simultaneous calls to the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# incomplete snippet
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncIterable&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List ten presidents of the United States&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;iter_presidents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;president&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;iter_presidents&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Use asyncio.create_task to schedule the coroutine for execution before awaiting it
&lt;/span&gt;    &lt;span class="c1"&gt;# This way descriptions will start being generated while the list of presidents is still being generated
&lt;/span&gt;    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tell_me_more_about&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;president&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;descriptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
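&lt;p&gt;To see how this scheduling pattern behaves end to end without calling an LLM, here’s a self-contained sketch in plain &lt;code&gt;asyncio&lt;/code&gt;. The functions &lt;code&gt;iter_items&lt;/code&gt; and &lt;code&gt;describe&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;iter_presidents&lt;/code&gt; and &lt;code&gt;tell_me_more_about&lt;/code&gt;.&lt;br&gt;&lt;/p&gt;

```python
import asyncio
from typing import AsyncIterable


async def iter_items() -> AsyncIterable[str]:
    # Hypothetical stand-in for iter_presidents(): yields items gradually,
    # the way a streaming LLM response would.
    for name in ["Washington", "Lincoln", "Roosevelt"]:
        await asyncio.sleep(0.01)
        yield name


async def describe(name: str) -> str:
    # Hypothetical stand-in for tell_me_more_about(): a slow per-item call.
    await asyncio.sleep(0.02)
    return f"{name} was a US president."


async def main() -> list[str]:
    tasks = []
    async for name in iter_items():
        # Schedule each description while the iterator is still producing
        # items, so the calls overlap instead of running one after another.
        tasks.append(asyncio.create_task(describe(name)))
    # gather preserves the order the tasks were created in.
    return await asyncio.gather(*tasks)


descriptions = asyncio.run(main())
print(descriptions)
```

&lt;p&gt;The same shape applies with Magentic: each decorated call is just a coroutine, so &lt;code&gt;asyncio.create_task&lt;/code&gt; starts it immediately rather than waiting for the loop to finish.&lt;/p&gt;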



&lt;p&gt;Need to stream the response back to the user? Use Magentic’s &lt;code&gt;StreamedStr&lt;/code&gt; to loop through the response chunks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic"
# ]
# ///
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StreamedStr&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me about {country}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;describe_country&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;StreamedStr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;describe_country&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This also works for multiple structured objects: simply annotate the return type as an &lt;code&gt;Iterable&lt;/code&gt; of your model, and each object is yielded as it’s generated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "&amp;gt;=3.10"
# dependencies = [
#     "fire",
#     "magentic",
#     "pydantic",
# ]
# ///
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections.abc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fire&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fire&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magentic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Animal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;legs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;latin_species&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;predators&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;prey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me information on the animals in the family {family}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;animal_family_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Animal&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;animal&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;animal_family_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;animal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nc"&gt;Fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Working with LLMs is now easier than ever, and Magentic makes it even easier than the standard SDKs to quickly add LLMs to any Python script, regardless of its complexity. Using it in tandem with something like uv and the new inline script metadata lets you quickly build command line tools that utilize AI effectively. I won’t use Magentic for every project that needs an LLM, but I’ll definitely use it all the time in my small one-offs and utilities.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Agentic Coding (Vibe Coding) Best Practices</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Fri, 28 Mar 2025 21:49:53 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/agentic-coding-vibe-coding-best-practices-b4b</link>
      <guid>https://dev.to/timesurgelabs/agentic-coding-vibe-coding-best-practices-b4b</guid>
      <description>&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwolb0rz23myp0eub2led.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwolb0rz23myp0eub2led.png" alt="Comic" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been living under a rock, you may not be aware of the "vibe coding" phenomenon.&lt;br&gt;
For a good explanation of what "vibe coding" (or, to use the more technical term, agentic coding) is, check out &lt;a href="https://youtu.be/Tw18-4U7mts?si=wmgKylbi-gEEmXzU" rel="noopener noreferrer"&gt;Fireship's video&lt;/a&gt; on the subject. He does a great job of explaining the concept in a way that's both easy to understand and objective about the pros and cons.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tooling
&lt;/h2&gt;

&lt;p&gt;If you're going to ignore all the cons and go ahead with agentic coding, there are some best practices I use to make sure my code doesn't turn into a complete mess of AI-generated garbage. I personally use Cursor, so this guide is going to use their &lt;a href="https://docs.cursor.com/context/rules-for-ai" rel="noopener noreferrer"&gt;Rules feature&lt;/a&gt; to organize and apply rules to the LLM for code generation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Rules
&lt;/h2&gt;

&lt;p&gt;Cursor has a concept of a rule file: a Markdown file in the &lt;code&gt;.cursor/rules&lt;/code&gt; directory, ending in &lt;code&gt;.mdc&lt;/code&gt; rather than &lt;code&gt;.md&lt;/code&gt;, with some extra front matter that specifies when and how the rule is applied to the LLM. Rules can be broken down into four categories.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language Rules

&lt;ul&gt;
&lt;li&gt;Applies to specific languages.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Framework Rules

&lt;ul&gt;
&lt;li&gt;Applies to specific frameworks.&lt;/li&gt;
&lt;li&gt;Can also apply to libraries that have special rules, like shadcn/ui.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Practice Rules

&lt;ul&gt;
&lt;li&gt;For coding practice guidelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Project Rules

&lt;ul&gt;
&lt;li&gt;Should be used to describe project specific guidelines like file structure, dependencies used, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rules can be applied in four different ways.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always Apply&lt;/li&gt;
&lt;li&gt;Auto Apply

&lt;ul&gt;
&lt;li&gt;Uses a glob pattern to apply the rule to all files that match the pattern.&lt;/li&gt;
&lt;li&gt;Especially useful for language and framework rules where specific file extensions are used.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Agent Requested

&lt;ul&gt;
&lt;li&gt;Uses a description of the rule to allow the agent to decide when to apply the rule.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Manual Apply

&lt;ul&gt;
&lt;li&gt;Only applied when you directly ask the agent to apply the rule.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
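&lt;p&gt;In a rule’s front matter, these application methods correspond roughly to which fields are set: &lt;code&gt;alwaysApply: true&lt;/code&gt; for Always Apply, a &lt;code&gt;globs&lt;/code&gt; pattern for Auto Apply, a &lt;code&gt;description&lt;/code&gt; for Agent Requested, and none of them for Manual Apply. As a sketch, a hypothetical auto-applied rule for TypeScript files might start like this:&lt;/p&gt;

```markdown
---
description: "TypeScript coding standards"
globs: **/*.ts
alwaysApply: false
---

# TypeScript Best Practices

- Prefer `interface` for object shapes and `type` for unions.
- Enable `strict` mode in `tsconfig.json`.
```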

&lt;p&gt;Rules can also link to other files within your project, which are then loaded into context alongside the rule. This is especially useful for project rules, where you can link the README as well as any other documentation the LLM should know about, like an architecture or contributing guide.&lt;/p&gt;
&lt;h2&gt;
  
  
  Writing Rules
&lt;/h2&gt;

&lt;p&gt;The actual contents of a rule are written in Markdown and should be concise, clear guidelines. They should be readable by both humans and LLMs, and should include only minimal code and command examples. Here are some examples of rules from my &lt;a href="https://github.com/chand1012/cursorrules" rel="noopener noreferrer"&gt;personal collection&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example Rules
&lt;/h2&gt;

&lt;p&gt;Here's an example I made for best practices when using Go.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coding&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;standards&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;best&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;practices&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;modern&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;development"&lt;/span&gt;
&lt;span class="na"&gt;globs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="err"&gt;**&lt;/span&gt;&lt;span class="s"&gt;/*.go&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Go Best Practices&lt;/span&gt;

&lt;span class="gu"&gt;## Package and Import Statements&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use meaningful package names that reflect their purpose (e.g., &lt;span class="sb"&gt;`auth`&lt;/span&gt;, &lt;span class="sb"&gt;`config`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Group imports in this order: standard library, third-party, then local packages, separated by blank lines.
&lt;span class="p"&gt;-&lt;/span&gt; Avoid import cycles to maintain clean dependency graphs.

&lt;span class="gu"&gt;## Type System&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`struct`&lt;/span&gt; types to define complex data structures.
&lt;span class="p"&gt;-&lt;/span&gt; Define &lt;span class="sb"&gt;`interface`&lt;/span&gt; types to specify behavior and enable polymorphism.
&lt;span class="p"&gt;-&lt;/span&gt; Use type aliases sparingly for clarity (e.g., &lt;span class="sb"&gt;`type ID string`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Leverage Go’s built-in types (e.g., &lt;span class="sb"&gt;`map`&lt;/span&gt;, &lt;span class="sb"&gt;`slice`&lt;/span&gt;) and composite types effectively.
&lt;span class="p"&gt;-&lt;/span&gt; Avoid unnecessary type conversions to maintain type safety.
&lt;span class="p"&gt;-&lt;/span&gt; Use struct embedding for composition instead of inheritance.

&lt;span class="gu"&gt;## Naming Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`camelCase`&lt;/span&gt; for variable and function names (e.g., &lt;span class="sb"&gt;`getUser`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`PascalCase`&lt;/span&gt; for type names and exported identifiers (e.g., &lt;span class="sb"&gt;`UserService`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`ALL_CAPS`&lt;/span&gt; for constants (e.g., &lt;span class="sb"&gt;`MAX_RETRIES`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Be descriptive yet concise in naming (e.g., &lt;span class="sb"&gt;`userCount`&lt;/span&gt; over &lt;span class="sb"&gt;`cnt`&lt;/span&gt;).

&lt;span class="gu"&gt;## Code Organization&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Follow the standard Go project layout (e.g., &lt;span class="sb"&gt;`cmd/`&lt;/span&gt;, &lt;span class="sb"&gt;`pkg/`&lt;/span&gt;, &lt;span class="sb"&gt;`internal/`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Keep related code within the same package for cohesion.
&lt;span class="p"&gt;-&lt;/span&gt; Use subdirectories for larger packages to organize functionality (e.g., &lt;span class="sb"&gt;`api/handlers`&lt;/span&gt;).

&lt;span class="gu"&gt;## Functions and Methods&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Keep functions short and focused on a single responsibility.
&lt;span class="p"&gt;-&lt;/span&gt; Use named return values for clarity in complex functions (e.g., &lt;span class="sb"&gt;`func getData() (data string, err error)`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Avoid side effects in functions to improve predictability.

&lt;span class="gu"&gt;## Best Practices&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Follow the Go proverb: "A little copying is better than a little dependency."
&lt;span class="p"&gt;-&lt;/span&gt; Use interfaces to define behavior and decouple components.
&lt;span class="p"&gt;-&lt;/span&gt; Prefer composition over inheritance using embedding.
&lt;span class="p"&gt;-&lt;/span&gt; Avoid unnecessary abstractions; prioritize simplicity.
&lt;span class="p"&gt;-&lt;/span&gt; Use the &lt;span class="sb"&gt;`init`&lt;/span&gt; function sparingly for package initialization.
&lt;span class="p"&gt;-&lt;/span&gt; Avoid global variables; if unavoidable, ensure they are thread-safe (e.g., with &lt;span class="sb"&gt;`sync.Mutex`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Be mindful of memory allocations; use profiling tools (e.g., &lt;span class="sb"&gt;`pprof`&lt;/span&gt;) to optimize performance.
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`gofmt`&lt;/span&gt; for consistent formatting and &lt;span class="sb"&gt;`go vet`&lt;/span&gt; for static analysis.

&lt;span class="gu"&gt;## Error Handling&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always check errors explicitly (e.g., &lt;span class="sb"&gt;`if err != nil`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Use descriptive error messages for debugging (e.g., &lt;span class="sb"&gt;`errors.New("failed to open file")`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Consider error wrapping with &lt;span class="sb"&gt;`fmt.Errorf`&lt;/span&gt; and &lt;span class="sb"&gt;`%w`&lt;/span&gt; for context (e.g., &lt;span class="sb"&gt;`fmt.Errorf("query failed: %w", err)`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`defer`&lt;/span&gt; with &lt;span class="sb"&gt;`recover`&lt;/span&gt; to handle panics in critical sections (e.g., HTTP handlers).

&lt;span class="gu"&gt;## Concurrency&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use goroutines for concurrent tasks (e.g., &lt;span class="sb"&gt;`go processData()`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Use channels for safe communication between goroutines (e.g., &lt;span class="sb"&gt;`ch := make(chan int)`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Avoid shared state when possible; prefer message passing via channels.

&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Write tests for all public functions using the &lt;span class="sb"&gt;`testing`&lt;/span&gt; package.
&lt;span class="p"&gt;-&lt;/span&gt; Use table-driven tests for multiple test cases (e.g., &lt;span class="sb"&gt;`tests := []struct{...}`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Aim for high test coverage with &lt;span class="sb"&gt;`go test -cover`&lt;/span&gt;.

&lt;span class="gu"&gt;## Documentation&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Write doc comments for all exported identifiers (e.g., &lt;span class="sb"&gt;`// UserService handles user operations`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Follow the standard Go doc format, starting with the identifier name (e.g., &lt;span class="sb"&gt;`// Package auth provides...`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; Include examples in doc comments when possible (e.g., &lt;span class="sb"&gt;`// Example: ...`&lt;/span&gt;).

&lt;span class="gu"&gt;## Patterns&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use interfaces for dependency injection to improve testability.
&lt;span class="p"&gt;-&lt;/span&gt; Implement the Repository pattern for data access (e.g., &lt;span class="sb"&gt;`UserRepository`&lt;/span&gt; interface).
&lt;span class="p"&gt;-&lt;/span&gt; Use the Factory pattern for object creation (e.g., &lt;span class="sb"&gt;`NewUserService()`&lt;/span&gt;).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the LLM to properly structure Go code, and its output is great! &lt;a href="https://gist.github.com/chand1012/05bbe89f2d41c2cc335f684f7281a2fa" rel="noopener noreferrer"&gt;Here&lt;/a&gt; is some code generated using the rule.&lt;/p&gt;

&lt;p&gt;Here's another example of a project-level rule that I use for a Supabase backend project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
&lt;span class="na"&gt;globs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
&lt;span class="na"&gt;alwaysApply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="gh"&gt;# Lancer DB&lt;/span&gt;

This is our monorepo for our Supabase Database migrations as well as our Supabase Edge Functions written in Deno.

&lt;span class="gu"&gt;## Directory Structure&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; docs/: Markdown documentation relevant to the repo and development.
&lt;span class="p"&gt;-&lt;/span&gt; supabase/: Contains all the Supabase related configurations, migrations, and edge functions.
&lt;span class="p"&gt;  -&lt;/span&gt; supabase/migrations/: Contains the migrations. All migrations names should be formatted like so: &lt;span class="sb"&gt;`20240821194157_subnets.sql`&lt;/span&gt;. That is a raw date with no spaces or formatting + &lt;span class="sb"&gt;`_`&lt;/span&gt; + followed by a snake case description of the migration.
&lt;span class="p"&gt;  -&lt;/span&gt; supabase/functions/: Contains Deno edge functions.
&lt;span class="p"&gt;    -&lt;/span&gt; supabase/functions/&lt;span class="ge"&gt;**&lt;/span&gt;/index.ts: Each of the main entrypoints for each edge function. Edge functions have folders which are their name, and any related files that the edge function uses that are not shared between functions should be included in the same directory as &lt;span class="sb"&gt;`index.ts`&lt;/span&gt;.
&lt;span class="p"&gt;    -&lt;/span&gt; supabase/functions/_shared/: Directory of all shared code that gets reused between multiple functions.
&lt;span class="p"&gt;  -&lt;/span&gt; supabase/seed.sql: Seed data for local development and testing only. If dummy data is needed for local testing, it should be added here.
&lt;span class="p"&gt;  -&lt;/span&gt; supabase/config.toml: Configuration data for the local Supabase instance for local dev and testing.
&lt;span class="p"&gt;-&lt;/span&gt; scripts/: Deno scripts for development and testing.
&lt;span class="p"&gt;-&lt;/span&gt; Justfile: Command runner script. Holds commands and bash scripts we use frequently as we work on the project. Automatically loads a &lt;span class="sb"&gt;`.env`&lt;/span&gt; if present. Commands can be run with &lt;span class="sb"&gt;`just &amp;lt;command name&amp;gt;`&lt;/span&gt;

&lt;span class="gu"&gt;## Code Style&lt;/span&gt;

&lt;span class="gu"&gt;### General Guidelines&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Follow DRY (Don't Repeat Yourself).
&lt;span class="p"&gt;-&lt;/span&gt; Code should be well-named while following the case practices defined below for the language.
&lt;span class="p"&gt;-&lt;/span&gt; Code should be readable by human devs as well as LLMs alike.
&lt;span class="p"&gt;-&lt;/span&gt; Use meaningful variable and function names.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells the LLM to follow the project's overall coding standards and file structure. In other repos, I've also linked other documentation the LLM should know about via an &lt;code&gt;@&lt;/code&gt; symbol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Rules
&lt;/h2&gt;

&lt;p&gt;Create a new file with the &lt;code&gt;.mdc&lt;/code&gt; extension in the &lt;code&gt;.cursor/rules&lt;/code&gt; directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .cursor/rules
&lt;span class="nb"&gt;touch&lt;/span&gt; .cursor/rules/go.mdc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor will by default open the rule file using a special editor that allows you to set the rule type and globs without having to manually edit the front matter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjaub6s8lir0yee1s2p9e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjaub6s8lir0yee1s2p9e.gif" alt="Cursor Rule Editor" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For manual and always-apply rules, simply write the rule contents and save the file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw04eaprjibls6fa3j40m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw04eaprjibls6fa3j40m.png" alt=" " width="580" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For auto-apply rules, use a glob pattern to attach the rule to every file that matches it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4btn9dtbfk9ei6067av1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4btn9dtbfk9ei6067av1.png" alt=" " width="561" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For agent-requested rules, write a clear description of the rule and the conditions that should trigger it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4c8440xkv1rkyeg7bdo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4c8440xkv1rkyeg7bdo3.png" alt=" " width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once that's done, your rules will be loaded into context when you open Cursor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;p&gt;Sometimes you'll need to link both internal and external documentation to the LLM. For internal documentation, such as a project's README, you can use the &lt;code&gt;@&lt;/code&gt; symbol, which brings up a menu of files to select from. Start typing the name of the file you want to link and the list filters down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqkokhrslezk9r3h4sas.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqkokhrslezk9r3h4sas.gif" alt=" " width="560" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For external documentation, you should link it via a markdown link.&lt;/p&gt;
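&lt;p&gt;For example, a rule line like this (the URL here is just illustrative):&lt;/p&gt;

```markdown
When writing edge functions, refer to the [Supabase Edge Functions docs](https://supabase.com/docs/guides/functions).
```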

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjuqc1eqm2ch3aragavwh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjuqc1eqm2ch3aragavwh.gif" alt=" " width="562" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I've been using these rules for a while now and they've helped me write better code. I've also found that the LLM is able to follow the rules more often than not, and when it doesn't, it's usually because I need to update the rule to be more specific.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Command Line Power-Ups: Boost Your Workflow with Mods and Freeze</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Thu, 27 Jun 2024 23:05:53 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/command-line-power-ups-boost-your-workflow-with-mods-and-freeze-407j</link>
      <guid>https://dev.to/timesurgelabs/command-line-power-ups-boost-your-workflow-with-mods-and-freeze-407j</guid>
      <description>&lt;p&gt;For developers and tech enthusiasts, the command line is a powerful tool. But did you know there are ways to make it even more efficient and visually appealing? Two of my favorite tools I’ve been using lately are &lt;strong&gt;Mods&lt;/strong&gt; and &lt;strong&gt;Freeze&lt;/strong&gt;. These tools, brought to you by the innovative team at Charm Bracelet (charm.sh), will revolutionize how you interact with your terminal, automating tasks and creating beautiful code snippets. &lt;/p&gt;

&lt;h3&gt;
  
  
  Mods
&lt;/h3&gt;

&lt;p&gt;Imagine having the capabilities of ChatGPT directly in your command line. That's the power of Mods. This AI-driven tool excels at scripting and automation, allowing you to generate code snippets and streamline repetitive tasks with ease.&lt;/p&gt;

&lt;p&gt;Let's say you need a basic Python script to print "Hello World" with user input. With Mods, it's as simple as typing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mods -f "Generate a python hello world app with user input. Only output the code and no other text" -r &amp;gt; test.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command instructs Mods to generate the code, output it in raw format (without Markdown), and save it to a file named &lt;code&gt;test.py&lt;/code&gt;.  &lt;/p&gt;

&lt;p&gt;Mods is still under development, so you might need to make minor adjustments to the output format. However, its ability to understand natural language commands and generate code is truly impressive. Plus, Mods remembers your conversation history, allowing you to reference previous commands and build upon your work seamlessly. &lt;/p&gt;

&lt;h3&gt;
  
  
  Freeze
&lt;/h3&gt;

&lt;p&gt;Sharing code snippets for documentation, presentations, or blog posts can be cumbersome. Freeze comes to the rescue, enabling you to generate visually stunning code screenshots with just a single command.&lt;/p&gt;

&lt;p&gt;To create a beautiful image of your Python script (&lt;code&gt;test.py&lt;/code&gt;), simply type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;freeze test.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will generate a PNG image file showcasing your code with elegant syntax highlighting. Freeze also offers a range of customization options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--window&lt;/code&gt;&lt;/strong&gt;: Adds macOS-style window controls for a realistic look. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--theme&lt;/code&gt;&lt;/strong&gt;: Allows you to apply various themes like the popular GitHub dark mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--execute&lt;/code&gt;&lt;/strong&gt;: Captures the output of terminal commands within the screenshot. &lt;/li&gt;
&lt;/ul&gt;
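&lt;p&gt;These flags can be combined in a single call. For instance (the theme name and output flag are assumptions; run &lt;code&gt;freeze --help&lt;/code&gt; to see what your version supports):&lt;/p&gt;

```
freeze test.py --window --theme github --output code.png
```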

&lt;p&gt;With Freeze, you can effortlessly create professional-looking code visuals, enhancing your projects and communication.&lt;/p&gt;

&lt;p&gt;Mods and Freeze offer developers and tech enthusiasts powerful tools to enhance their productivity and creativity. Whether you're automating tasks, generating scripts, or creating eye-catching code visuals, these tools will streamline your workflow and elevate your projects. Explore these and other innovative command line tools to unlock the full potential of your terminal!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mods:&lt;/strong&gt; &lt;a href="https://github.com/charmbracelet/mods" rel="noopener noreferrer"&gt;https://github.com/charmbracelet/mods&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freeze:&lt;/strong&gt; &lt;a href="https://github.com/charmbracelet/freeze" rel="noopener noreferrer"&gt;https://github.com/charmbracelet/freeze&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What are your favorite command line tools? Share them in the comments below!&lt;/strong&gt; &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Video Tutorial - How To Run Llama 3 locally with Ollama and OpenWebUI!</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Thu, 16 May 2024 16:52:07 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/video-tutorial-how-to-run-llama-3-locally-with-ollama-and-openwebui-1dn1</link>
      <guid>https://dev.to/timesurgelabs/video-tutorial-how-to-run-llama-3-locally-with-ollama-and-openwebui-1dn1</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/GT-Fwg124-I"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Learn how to run LLaMA 3 locally on your computer using Ollama and Open WebUI! In this tutorial, we'll take you through a step-by-step guide on how to install and set up Ollama, and demonstrate the power of LLaMA 3 in action. Whether you're a developer, AI enthusiast, or just curious about the possibilities of local AI, this video is for you. So sit back, relax, and let's dive into the world of LLaMA 3, Ollama, and OpenWeb UI!&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recommended &lt;a href="https://youtu.be/eGz9DS-aIeY?feature=shared" rel="noopener noreferrer"&gt;Docker Tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openwebui.com" rel="noopener noreferrer"&gt;Open WebUI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openwebui.com" rel="noopener noreferrer"&gt;Open WebUI Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Follow TimeSurge Labs!
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://twitter.com/TimeSurgeLabs" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/timesurge-labs" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/timesurgelabs"&gt;Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Disclosure
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Thumbnail Background by &lt;a href="https://openai.com/index/dall-e-3/" rel="noopener noreferrer"&gt;OpenAI Dalle 3&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Title and Description partially AI generated by Llama 3 on &lt;a href="https://console.groq.com/" rel="noopener noreferrer"&gt;GroqCloud&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;All music generated by &lt;a href="https://suno.com" rel="noopener noreferrer"&gt;Suno&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Video filmed and edited by &lt;a href="https://twitter.com/Chand1012Dev" rel="noopener noreferrer"&gt;Chandler&lt;/a&gt; (NOT AI).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Music Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://suno.com/song/73419c49-e237-47eb-8b11-c644e0384e8c" rel="noopener noreferrer"&gt;Putting In The Hours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://suno.com/song/4fd4615d-43e1-45e8-bb7a-32cb10848cab" rel="noopener noreferrer"&gt;The Chase Scene&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://suno.com/song/e213b6a8-6785-409d-a869-cdd60f66d72e" rel="noopener noreferrer"&gt;TimeSurge Labs Outro&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tutorial</category>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Said Goodbye to ChatGPT and Hello to Llama 3 on Open WebUI - You Should Too</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Wed, 24 Apr 2024 18:29:03 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/i-said-goodbye-to-chatgpt-and-hello-to-llama-3-on-open-webui-you-should-too-4g6k</link>
      <guid>https://dev.to/timesurgelabs/i-said-goodbye-to-chatgpt-and-hello-to-llama-3-on-open-webui-you-should-too-4g6k</guid>
      <description>&lt;p&gt;I’m a huge fan of open source models, especially the newly release &lt;a href="https://llama.meta.com/llama3/" rel="noopener noreferrer"&gt;Llama 3&lt;/a&gt;. Because of the performance of both the large 70B Llama 3 model as well as the smaller and self-host-able 8B Llama 3, I’ve actually cancelled my ChatGPT subscription in favor of &lt;a href="https://docs.openwebui.com/" rel="noopener noreferrer"&gt;Open WebUI&lt;/a&gt;, a self-hostable ChatGPT-like UI that allows you to use &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; and other AI providers while keeping your chat history, prompts, and other data locally on any computer you control.&lt;/p&gt;

&lt;p&gt;My &lt;a href="https://dev.to/timesurgelabs/how-to-run-llama-3-locally-with-ollama-and-open-webui-297d"&gt;previous article&lt;/a&gt; went over how to get Open WebUI set up with &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; and Llama 3; however, that isn’t the only way I take advantage of Open WebUI. The other way I use it is with external API providers, of which I use three. I’ll go over each of them, give you the pros and cons, and then show you how I set up all three in my Open WebUI instance!&lt;/p&gt;

&lt;h2&gt;
  
  
  External AIs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenAI
&lt;/h3&gt;

&lt;p&gt;OpenAI can be considered either the classic or the monopoly. Their AI tech is the most mature and trades blows with the likes of Anthropic and Google. Even though Llama 3 70B (and even the smaller 8B model) is good enough for 99% of people and tasks, sometimes you just need the best, so I like having the option either to quickly answer my question or to use it alongside other LLMs to get multiple answers to compare.&lt;/p&gt;

&lt;p&gt;OpenAI is the example most often used throughout the Open WebUI docs; however, Open WebUI can support any number of OpenAI-compatible APIs. Here’s another favorite of mine that I now use even more than OpenAI!&lt;/p&gt;

&lt;h3&gt;
  
  
  Groq Cloud
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://wow.groq.com/why-groq/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; is an AI hardware and infrastructure company that’s developing their own hardware LLM chip (which they call &lt;a href="https://wow.groq.com/groq-lpu-inference-engine-crushes-first-public-llm-benchmark/" rel="noopener noreferrer"&gt;an LPU&lt;/a&gt;). They offer an API to use their new LPUs with a number of open source LLMs (including Llama 3 8B and 70B) on their &lt;a href="https://console.groq.com" rel="noopener noreferrer"&gt;GroqCloud platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Their claim to fame is their insanely fast inference times - sequential token generation in the hundreds per second for 70B models and thousands for smaller models. Here’s Llama 3 70B running in real time on Open WebUI.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1782783466406322202-998" src="https://platform.twitter.com/embed/Tweet.html?id=1782783466406322202"&gt;
&lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Here’s the best part: GroqCloud is &lt;strong&gt;free&lt;/strong&gt; for most users. With no credit card required, they’ll grant you some pretty high rate limits, significantly higher than most AI API companies allow. Here are the limits for my newly created account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff628pyddn7ssjp8ay4kk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff628pyddn7ssjp8ay4kk.png" alt="API Example" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;14k requests per day is a lot, and 12k tokens per minute is significantly higher than the average person can use on an interface like Open WebUI. &lt;/p&gt;

&lt;p&gt;Using GroqCloud with Open WebUI is possible thanks to an &lt;a href="https://console.groq.com/docs/openai" rel="noopener noreferrer"&gt;OpenAI-compatible API&lt;/a&gt; that Groq provides. All you have to do is generate an API key &lt;a href="https://console.groq.com/keys" rel="noopener noreferrer"&gt;via the dashboard&lt;/a&gt;, set the API base URL in Open WebUI to &lt;code&gt;https://api.groq.com/openai/v1&lt;/code&gt;, and it’ll work just like OpenAI’s API!&lt;/p&gt;
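&lt;p&gt;"OpenAI-compatible" just means the same request shape pointed at a different base URL. Here's a minimal sketch in plain Python (standard library only) of what such a request looks like; the model name and key are placeholder assumptions, and the request is built but never sent:&lt;/p&gt;

```python
import json
from urllib import request

# Groq's OpenAI-compatible base URL, per their docs
BASE_URL = "https://api.groq.com/openai/v1"
API_KEY = "gsk_your_key_here"  # placeholder, not a real key

# Same payload shape the OpenAI Chat Completions API expects;
# the model name is an assumption and may differ on your account.
payload = {
    "model": "llama3-70b-8192",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# request.urlopen(req) would send it; omitted here since it needs a real key.
```

&lt;p&gt;Swap &lt;code&gt;BASE_URL&lt;/code&gt; back to OpenAI's and the exact same code works, which is why Open WebUI can treat all of these providers identically.&lt;/p&gt;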

&lt;p&gt;This is how I was able to use and evaluate Llama 3 as my replacement for ChatGPT!&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Workers AI
&lt;/h3&gt;

&lt;p&gt;This is the part where I toot my own horn a little. Using Open WebUI via Cloudflare Workers is not natively possible; however, I developed my own &lt;a href="https://github.com/chand1012/openai-cf-workers-ai" rel="noopener noreferrer"&gt;OpenAI-compatible API for Cloudflare Workers&lt;/a&gt; a few months ago. I recently &lt;a href="https://github.com/chand1012/openai-cf-workers-ai/pull/8" rel="noopener noreferrer"&gt;added the &lt;code&gt;/models&lt;/code&gt; endpoint&lt;/a&gt; to it to make it compatible with Open WebUI, and it’s been working great ever since. The main advantage of using Cloudflare Workers over something like GroqCloud is their &lt;a href="https://developers.cloudflare.com/workers-ai/models/#text-generation" rel="noopener noreferrer"&gt;massive variety of models&lt;/a&gt;. This allows you to test out many models quickly and effectively for many use cases, such as &lt;a href="https://developers.cloudflare.com/workers-ai/models/deepseek-math-7b-instruct/" rel="noopener noreferrer"&gt;DeepSeek Math&lt;/a&gt; (&lt;a href="https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct" rel="noopener noreferrer"&gt;model card&lt;/a&gt;) for math-heavy tasks and &lt;a href="https://developers.cloudflare.com/workers-ai/models/llamaguard-7b-awq/" rel="noopener noreferrer"&gt;Llama Guard&lt;/a&gt; (&lt;a href="https://huggingface.co/meta-llama/LlamaGuard-7b" rel="noopener noreferrer"&gt;model card&lt;/a&gt;) for moderation tasks. They even support &lt;a href="https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/" rel="noopener noreferrer"&gt;Llama 3 8B&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The main cons of Workers AI are token limits and model size. Currently Llama 3 8B is the largest model supported, and they have &lt;a href="https://developers.cloudflare.com/workers-ai/models/llama-2-7b-chat-int8/#properties" rel="noopener noreferrer"&gt;token generation limits&lt;/a&gt; much lower than those of other providers. I still think they’re worth having in this list due to the sheer variety of models available with no setup on your end other than the API itself. If you want to deploy the OpenAI-compatible API for Workers AI yourself, check out &lt;a href="https://github.com/chand1012/openai-cf-workers-ai?tab=readme-ov-file#deploying" rel="noopener noreferrer"&gt;the guide in the README&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding External AIs to Open WebUI
&lt;/h2&gt;

&lt;p&gt;Now, how do you add all these to your Open WebUI instance? Assuming you’ve installed Open WebUI (&lt;a href="https://docs.openwebui.com/getting-started/" rel="noopener noreferrer"&gt;Installation Guide&lt;/a&gt;), the best way is via environment variables.&lt;/p&gt;

&lt;p&gt;When running Open WebUI using Docker, you can set the &lt;code&gt;OPENAI_API_BASE_URLS&lt;/code&gt; and &lt;code&gt;OPENAI_API_KEYS&lt;/code&gt; environment variables to configure the API endpoints.&lt;/p&gt;

&lt;p&gt;For example, to integrate OpenAI, GroqCloud, and Cloudflare Workers AI, you would set the environment variables as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OPENAI_API_BASE_URLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.openai.com/v1;https://api.groq.com/openai/v1;https://openai-cf.yourusername.workers.dev/v1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OPENAI_API_KEYS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-proj-ABCDEFGHIJK1234567890abcdef;gsk_1234567890abcdefabcdefghij;0123456789abcdef0123456789abcdef"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;sk-proj-ABCDEFGHIJK1234567890abcdef&lt;/code&gt;, &lt;code&gt;gsk_1234567890abcdefabcdefghij&lt;/code&gt;, and &lt;code&gt;0123456789abcdef0123456789abcdef&lt;/code&gt; with your actual API keys. Make sure to list the keys in the same order as their respective base URLs; if you don’t, you’ll get errors saying the APIs could not authenticate.&lt;/p&gt;
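&lt;p&gt;The pairing is purely positional, like Python's &lt;code&gt;zip&lt;/code&gt;. A quick illustration with dummy values (this mimics the behavior conceptually; it is not Open WebUI's actual code):&lt;/p&gt;

```python
# Both variables are ";"-separated lists; entry N of one maps to entry N of the other.
base_urls = "https://api.openai.com/v1;https://api.groq.com/openai/v1".split(";")
api_keys = "sk-proj-EXAMPLE;gsk_EXAMPLE".split(";")

pairing = dict(zip(base_urls, api_keys))
# Swapping the two keys above would send the Groq key to OpenAI and vice versa,
# which is exactly the authentication failure described.
```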

&lt;p&gt;When using Docker Compose, you can define the environment variables in your &lt;code&gt;docker-compose.yaml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_BASE_URLS=${OPENAI_API_BASE_URLS}'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEYS=${OPENAI_API_KEYS}'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can define the values of these variables in an &lt;code&gt;.env&lt;/code&gt; file, placed in the same directory as the &lt;code&gt;docker-compose.yaml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_BASE_URLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.openai.com/v1;https://api.groq.com/openai/v1;https://openai-cf.yourusername.workers.dev/v1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEYS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-proj-ABCDEFGHIJK1234567890abcdef;gsk_1234567890abcdefabcdefghij;0123456789abcdef0123456789abcdef"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By following these steps, you can easily integrate multiple OpenAI-compatible APIs with your Open WebUI instance, unlocking the full potential of these powerful AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Open WebUI has opened up a whole new world of possibilities for me, allowing me to take control of my AI experiences and explore the vast array of OpenAI-compatible APIs out there. With the ability to seamlessly integrate multiple APIs, including OpenAI, Groq Cloud, and Cloudflare Workers AI, I've been able to unlock the full potential of these powerful AI models. By leveraging the flexibility of Open WebUI, I've been able to break free from the shackles of proprietary chat platforms and take my AI experiences to the next level. If you're tired of being limited by traditional chat platforms, I highly recommend giving Open WebUI a try and discovering the vast possibilities that await you.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>ai</category>
      <category>productivity</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Run Llama 3 Locally with Ollama and Open WebUI</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Sun, 21 Apr 2024 14:26:46 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/how-to-run-llama-3-locally-with-ollama-and-open-webui-297d</link>
      <guid>https://dev.to/timesurgelabs/how-to-run-llama-3-locally-with-ollama-and-open-webui-297d</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/GT-Fwg124-I"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I’m a big fan of Llama. Meta releasing their LLM open source is a net benefit for the tech community at large, and their permissive license allows most medium and small businesses to use their LLMs with little to no restrictions (within the bounds of the law, of course). Their latest release is Llama 3, which has been highly anticipated.&lt;/p&gt;

&lt;p&gt;Llama 3 comes in two sizes: 8 billion and 70 billion parameters. This kind of model is trained on a massive amount of text data and can be used for a variety of tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. Meta touts Llama 3 as one of the best open models available, but it is still under development. Here are the 8B model’s benchmarks compared to Mistral and Gemma (according to Meta).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax9r9z2w2zghv81grbh7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax9r9z2w2zghv81grbh7.png" alt="Benchmarks" width="800" height="899"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This raises the question: how can I, a regular individual, run these models locally on my computer?&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Ollama
&lt;/h2&gt;

&lt;p&gt;That’s where &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; comes in! Ollama is a free and open-source application that allows you to run various large language models, including Llama 3, on your own computer, even with limited resources. Ollama takes advantage of the performance gains of llama.cpp, an open source library designed to allow you to run LLMs locally with relatively low hardware requirements. It also includes a sort of package manager, allowing you to download and use LLMs quickly and effectively with just a single command. &lt;/p&gt;

&lt;p&gt;The first step is &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;installing Ollama&lt;/a&gt;. It supports all 3 of the major OSes, with &lt;a href="https://ollama.com/blog/windows-preview" rel="noopener noreferrer"&gt;Windows being a “preview”&lt;/a&gt; (a nicer word for beta).&lt;/p&gt;

&lt;p&gt;Once this is installed, open up your terminal. On all platforms, the command is the same.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run llama3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait a few minutes while it downloads and loads the model, and then start chatting! It should bring you to a chat prompt similar to this one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run llama3
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; Who was the second president of the united states?
The second President of the United States was John Adams. He served from 1797 to 1801, succeeding
George Washington and being succeeded by Thomas Jefferson.

&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; Who was the 30th?
The 30th President of the United States was Calvin Coolidge! He served from August 2, 1923, to March 4,
1929.

&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; /bye
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can chat all day in this terminal, but what if you want something more ChatGPT-like?  &lt;/p&gt;

&lt;h2&gt;
  
  
  Open WebUI
&lt;/h2&gt;

&lt;p&gt;Open WebUI is an extensible, self-hosted UI that runs entirely inside &lt;a href="https://docs.docker.com/desktop/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;. It can be used with either Ollama or other OpenAI-compatible LLM backends, like LiteLLM or my own &lt;a href="https://github.com/chand1012/openai-cf-workers-ai" rel="noopener noreferrer"&gt;OpenAI API for Cloudflare Workers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Assuming you already have &lt;a href="https://docs.docker.com/desktop/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; and Ollama running on your computer, &lt;a href="https://docs.openwebui.com/getting-started/#quick-start-with-docker-" rel="noopener noreferrer"&gt;installation&lt;/a&gt; is super simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="nt"&gt;--restart&lt;/span&gt; always ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then simply go to &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;, make an account, and start chatting away!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdi1d35zh09s78o8vqvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdi1d35zh09s78o8vqvb.png" alt="OpenWebUI Example" width="800" height="1169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you didn’t run Llama 3 earlier, you’ll have to pull some models down before you can start chatting. The easiest way to do this is to click the settings icon after clicking your name in the bottom left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqyetksyn0y4a0p12ylu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqyetksyn0y4a0p12ylu.png" alt="Settings" width="634" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click “models” on the left side of the modal and paste in the name of a model from the &lt;a href="https://ollama.com/models" rel="noopener noreferrer"&gt;Ollama registry&lt;/a&gt;. Here are some models that I’ve used and recommend for general purposes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llama3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mistral&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llama2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxc581jf4w3xszymjfbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxc581jf4w3xszymjfbg.png" alt="Models Setting Page" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Ollama API
&lt;/h2&gt;

&lt;p&gt;If you want to integrate Ollama into your own projects, Ollama offers both its own API and an OpenAI-compatible API. The APIs automatically load a locally held LLM into memory, run the inference, and then unload it after a certain timeout. You do have to pull whatever models you want to use before you can run them via the API, which can easily be done via the command line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull mistral
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
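&lt;p&gt;Ollama’s API also has a &lt;code&gt;/api/tags&lt;/code&gt; endpoint that lists the models you’ve already pulled, which is handy for checking whether a model is available before calling the API. Here’s a quick Python sketch of my own (not from the Ollama docs), assuming Ollama is running on its default port:&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

def local_model_names(tags_response: dict) -> list[str]:
    # /api/tags responds with {"models": [{"name": "mistral:latest", ...}, ...]}
    return [m["name"] for m in tags_response.get("models", [])]

def is_pulled(model: str, host: str = OLLAMA_HOST) -> bool:
    # Ask the running Ollama server which models are held locally.
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        tags = json.load(resp)
    # A bare name like "mistral" should match the tagged "mistral:latest".
    return any(name == model or name.startswith(model + ":")
               for name in local_model_names(tags))
```

&lt;p&gt;If &lt;code&gt;is_pulled("mistral")&lt;/code&gt; comes back &lt;code&gt;False&lt;/code&gt;, run &lt;code&gt;ollama pull mistral&lt;/code&gt; first.&lt;/p&gt;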



&lt;h3&gt;
  
  
  Ollama API
&lt;/h3&gt;

&lt;p&gt;Ollama has their own API available, which also has a &lt;a href="https://github.com/ollama/ollama?tab=readme-ov-file#libraries" rel="noopener noreferrer"&gt;couple of SDKs&lt;/a&gt; for Javascript and Python.&lt;/p&gt;

&lt;p&gt;Here is how you can do a simple text generation inference with the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "mistral",
  "prompt":"Why is the sky blue?"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s how you can do a Chat generation inference with the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/chat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the &lt;code&gt;model&lt;/code&gt; parameter with whatever model you want to use. See the &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md" rel="noopener noreferrer"&gt;official API docs&lt;/a&gt; for more information.&lt;/p&gt;
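&lt;p&gt;The same two calls can be made from Python with nothing but the standard library. This is my own sketch rather than official SDK code; it assumes Ollama is running locally and sets &lt;code&gt;"stream": false&lt;/code&gt; so each call returns a single JSON object instead of a stream of chunks.&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

def build_payload(model: str, *, prompt: str = None, messages: list = None) -> dict:
    # Mirror the curl examples above; stream=False returns one JSON reply.
    payload = {"model": model, "stream": False}
    if prompt is not None:
        payload["prompt"] = prompt       # for /api/generate
    if messages is not None:
        payload["messages"] = messages   # for /api/chat
    return payload

def ollama_post(endpoint: str, payload: dict, host: str = OLLAMA_HOST) -> dict:
    # POST the JSON payload to the local Ollama server and decode the reply.
    req = urllib.request.Request(
        f"{host}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Text generation, equivalent to the first curl example:
#   ollama_post("/api/generate",
#       build_payload("mistral", prompt="Why is the sky blue?"))["response"]
# Chat, equivalent to the second:
#   ollama_post("/api/chat", build_payload("mistral",
#       messages=[{"role": "user", "content": "why is the sky blue?"}]))["message"]["content"]
```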

&lt;h3&gt;
  
  
  OpenAI Compatible API
&lt;/h3&gt;

&lt;p&gt;You can also use Ollama as a drop-in replacement (depending on your use case) with the OpenAI libraries. Here’s an example from &lt;a href="https://github.com/ollama/ollama/blob/main/docs/openai.md" rel="noopener noreferrer"&gt;their documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# required but ignored
&lt;/span&gt;    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chat_completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Say this is a test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mistral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This also works for Javascript.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Javascript&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:11434/v1/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// required but ignored&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chatCompletion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Say this is a test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llama2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The release of Meta's Llama 3 and the open-sourcing of its Large Language Model (LLM) technology mark a major milestone for the tech community. With these advanced models now accessible through local tools like Ollama and Open WebUI, ordinary individuals can tap into their immense potential to generate text, translate languages, craft creative writing, and more. Furthermore, the availability of APIs enables developers to seamlessly integrate LLMs into new projects or enhance existing ones. Ultimately, the democratization of LLM technology through open-source initiatives like Llama 3 unlocks a vast realm of innovative possibilities and fuels creativity in the tech industry.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>ai</category>
      <category>productivity</category>
      <category>api</category>
    </item>
    <item>
      <title>Building a Fast, Efficient Web App: The Technology Stack of PromptSmithy Explained</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Tue, 26 Mar 2024 17:08:07 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/building-a-fast-efficient-web-app-the-technology-stack-of-promptsmithy-explained-184f</link>
      <guid>https://dev.to/timesurgelabs/building-a-fast-efficient-web-app-the-technology-stack-of-promptsmithy-explained-184f</guid>
      <description>&lt;p&gt;I’ve written a lot of one-off project, internal scripts for my own use, B2C apps, B2B apps, and everything in between. Every time I start a new project I like to use a new stack to try and diversify my own skillset, and so that if I’m ever tasked with doing another similar project in the future, my knowledge can accelerate my workflow. &lt;a href="https://promptsmithy.com/" rel="noopener noreferrer"&gt;PromptSmithy&lt;/a&gt; was slightly different though, as the stack hadn’t changed from other recent projects of mine, however the frontend tooling did greatly. In this article I’m going to break down the stack we used and talk about the new development flow we used for rapid development.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  React + Vite + React Router
&lt;/h3&gt;

&lt;p&gt;We all know what &lt;a href="https://react.dev/" rel="noopener noreferrer"&gt;React&lt;/a&gt; is at this point, but why use it with &lt;a href="https://vitejs.dev/" rel="noopener noreferrer"&gt;Vite&lt;/a&gt; and &lt;a href="https://reactrouter.com/en/main" rel="noopener noreferrer"&gt;React Router DOM&lt;/a&gt; over something like &lt;a href="https://nextjs.org/" rel="noopener noreferrer"&gt;NextJS&lt;/a&gt;? &lt;/p&gt;

&lt;p&gt;The reasons are twofold: we didn’t need any of the backend functionality of NextJS, and I wanted something that wouldn’t get in the way of our development with SSR or other special cases found only in NextJS.&lt;/p&gt;

&lt;p&gt;On top of that, Vite’s compiler is super fast, supports &lt;a href="https://www.typescriptlang.org/" rel="noopener noreferrer"&gt;Typescript&lt;/a&gt; (which we of course used), and built just fine on our host, &lt;a href="https://pages.cloudflare.com/" rel="noopener noreferrer"&gt;Cloudflare Pages&lt;/a&gt;. Cloudflare Pages is a super fast static website hosting service by Cloudflare, which lets your site take advantage of their global CDN to make sure it is served as close to your users as possible. It supports nearly any JS framework you could want to use for your site, and can even host plain old HTML if you’re of that persuasion.&lt;/p&gt;

&lt;p&gt;React Router is also super minimal, doesn’t get in the way, and provides all the same functionality as NextJS’s static-output router without making your compiled bundles massive. Our entire built site (before we added all the fancy physics animations) was just a bit over 135KB. &lt;/p&gt;

&lt;h3&gt;
  
  
  Tailwind + shadcn/ui + v0.dev
&lt;/h3&gt;

&lt;p&gt;For development of the UI components, we tried something new. Vercel has this new AI tool called &lt;a href="http://v0.dev" rel="noopener noreferrer"&gt;v0.dev&lt;/a&gt; that allows developers to take advantage of &lt;a href="https://ui.shadcn.com/" rel="noopener noreferrer"&gt;shadcn/ui&lt;/a&gt; and &lt;a href="https://tailwindcss.com/" rel="noopener noreferrer"&gt;Tailwind&lt;/a&gt; using nothing but words, which can then be easily downloaded to your local project using nothing but a simple &lt;code&gt;npx&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxoy51g12efo45ozzr77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxoy51g12efo45ozzr77.png" alt="Here is the original example code for a 404 page that I ended up using in the final app!" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the &lt;a href="https://v0.dev/t/RB8eJs2Kd6R" rel="noopener noreferrer"&gt;original example code&lt;/a&gt; for a 404 page that I ended up using in the final app! &lt;code&gt;npx v0 add RB8eJs2Kd6R&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;While I have experience with Tailwind and frontend development, I don’t really have the patience to use it. I usually end up using something like &lt;a href="https://mantine.dev/" rel="noopener noreferrer"&gt;Mantine&lt;/a&gt;, which is a complete component library UI kit, or &lt;a href="https://daisyui.com/" rel="noopener noreferrer"&gt;Daisy UI&lt;/a&gt;, a component library built on top of Tailwind. Shadcn/ui is quite similar to Daisy in this sense, but because the individual components get installed into your components folder, you can customize each one directly, which made development more streamlined. On top of that, being able to change my components’ style with natural language thanks to v0 made development super easy and fast. Shadcn may be too minimalist a style for some, but thanks to all the components being local, you can customize them quickly and easily!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsijgkghiwn9ibl4bru0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsijgkghiwn9ibl4bru0x.png" alt="This is the structure of the project’s components directory" width="746" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the structure of the project’s components directory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supabase
&lt;/h3&gt;

&lt;p&gt;Here’s the thing that accelerated my development the most: &lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;. Thanks to its Database, Authentication, and Edge Functions, we were able to rapidly develop the app. Their JS library made development super seamless, and their local development stack made testing a breeze. &lt;/p&gt;

&lt;p&gt;The development process is simple: install their &lt;a href="https://supabase.com/docs/guides/cli" rel="noopener noreferrer"&gt;CLI&lt;/a&gt;, run &lt;code&gt;supabase init&lt;/code&gt;, then run &lt;code&gt;supabase start&lt;/code&gt;. That’s it (assuming you have Docker installed, that is).&lt;/p&gt;

&lt;p&gt;The database service is pretty self-explanatory. Rather than writing SQL queries in a remote API, and hosting that API as well as the database to go with it, you simply create tables with migrations (created using &lt;code&gt;supabase migrations new name_here&lt;/code&gt;), then query the resulting tables using the frontend library. From there you can configure row level security, which restricts access to specific rows on a per-user basis, using either the migrations themselves or the local UI. I opted for the former so that I could easily apply the migrations. Here is one I wrote for the project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;metaprompt_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;references&lt;/span&gt; &lt;span class="n"&gt;metaprompts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;variables&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;
  &lt;span class="n"&gt;enable&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="k"&gt;security&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="nv"&gt;"Users can insert their own prompts"&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;check&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="nv"&gt;"Users can read their own prompts that are private"&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="nv"&gt;"Users can read all prompts that are public"&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This then got applied locally by running &lt;code&gt;supabase db reset&lt;/code&gt;, and deployed remotely with &lt;code&gt;supabase db push&lt;/code&gt;. We could then query on the frontend using the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prompts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see &lt;a href="https://supabase.com/docs/guides/getting-started/quickstarts/reactjs" rel="noopener noreferrer"&gt;Supabase’s excellent guides&lt;/a&gt; on how to do this for more information.&lt;/p&gt;

&lt;p&gt;We also took advantage of Supabase’s Authentication service, which allows us to quickly and effectively log in a user so we can handle authenticated requests (which, once Row Level Security is set up, happens automatically). Since this was a weekend-project sort of app, we went with Supabase Magic Links, which let our users log in by simply entering their email and clicking a link that gets sent to them. Here’s all the code to do that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;me@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;signInWithOtp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it! Then all we had to do was have the user click the link (assuming your website URL is configured properly in the Supabase settings) and they were logged in!&lt;/p&gt;

&lt;p&gt;Finally, we used Supabase Edge Functions to handle payments with Stripe, as well as to hold the business logic of PromptSmithy, which primarily involves calling Anthropic’s AI. Edge Functions are written in Deno, a NodeJS alternative that I like very much. You create a new edge function by running &lt;code&gt;supabase functions new name_here&lt;/code&gt; and then deploy it with &lt;code&gt;supabase functions deploy&lt;/code&gt;. You can also run these functions locally for testing (which is what we did, along with the Stripe CLI’s webhook feature) with &lt;code&gt;supabase functions serve&lt;/code&gt;.&lt;/p&gt;
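For a sense of what such a function looks like, here is a minimal sketch of an Edge Function handler in the Deno style. This is not PromptSmithy's actual code: the `{ task }` payload shape and the prompt logic are hypothetical, and a real function would call the Anthropic API where the comment indicates.

```typescript
// Hypothetical Edge Function handler sketch. In a real Supabase Edge Function
// you would register this with `Deno.serve(handler)`.
export const handler = async (req: Request): Promise<Response> => {
  // The { task } payload shape is an illustrative assumption.
  const { task } = await req.json();
  // A real function would call the LLM here; we just build a string instead.
  const prompt = `You are a prompt engineer. Create a prompt for: ${task}`;
  return new Response(JSON.stringify({ prompt }), {
    headers: { "Content-Type": "application/json" },
  });
};
```

Because the handler only depends on the standard `Request`/`Response` types, it can also be unit-tested outside of Deno.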

&lt;p&gt;Calling your functions from the frontend is super simple too. Whenever we wanted to call the AI, this is the code we ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Write me an email response to my boss asking for a raise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create-task-prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;public&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;metapromptID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value of &lt;code&gt;resp&lt;/code&gt; would be whatever the function responded with, which is always JSON for our application.&lt;/p&gt;
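Concretely (assuming supabase-js v2), `invoke()` resolves to an object with `data` and `error` fields, so a small helper like this hypothetical one can unwrap the payload or surface the failure:

```typescript
// Shape returned by supabase.functions.invoke() in supabase-js v2.
type InvokeResult<T> = { data: T | null; error: Error | null };

// Hypothetical helper: throw on failure, otherwise return the payload.
function unwrap<T>(resp: InvokeResult<T>): T {
  if (resp.error) throw resp.error;
  if (resp.data === null) throw new Error("function returned no data");
  return resp.data;
}

// e.g. const prompt = unwrap(await supabase.functions.invoke<string>("create-task-prompt", { body }));
```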

&lt;p&gt;Functions can also be invoked by remote applications, for example Stripe webhooks. If you want this, you’ll need to make sure that JWT verification is disabled for that function, which can be done simply in the &lt;code&gt;config.toml&lt;/code&gt; in the &lt;code&gt;supabase&lt;/code&gt; directory of your project. Here’s an example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[functions.stripe-webhook]&lt;/span&gt;
&lt;span class="py"&gt;verify_jwt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this function is deployed, you can check your Edge Functions page for a URL to give to Stripe!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Our choice of stack for PromptSmithy was primarily based on speed of development and the performance of the end product. Using tools like Vite, React, Supabase, and the innovative v0.dev, we were able to develop rapidly and effectively, resulting in a highly functional and efficient application.&lt;/p&gt;

&lt;p&gt;Want to give &lt;a href="https://promptsmithy.com/" rel="noopener noreferrer"&gt;PromptSmithy&lt;/a&gt; a try? All new users get &lt;strong&gt;$5 in free credits!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>react</category>
      <category>productivity</category>
    </item>
    <item>
      <title>🦉 AthenaDB: Distributed Vector Database Powered by Cloudflare 🌩️</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Mon, 19 Feb 2024 18:47:15 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/athenadb-distributed-vector-database-powered-by-cloudflare-4p59</link>
      <guid>https://dev.to/timesurgelabs/athenadb-distributed-vector-database-powered-by-cloudflare-4p59</guid>
      <description>&lt;h2&gt;
  
  
  What is AthenaDB?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/TimeSurgeLabs/athenadb" rel="noopener noreferrer"&gt;AthenaDB&lt;/a&gt; is a serverless vector database designed to be highly distributed and easily accessible as an API. It leverages &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Cloudflare’s Workers AI&lt;/a&gt; platform to create the vectors, &lt;a href="https://developers.cloudflare.com/vectorize/" rel="noopener noreferrer"&gt;Cloudflare Vectorize&lt;/a&gt; for handling vector querying, and &lt;a href="https://developers.cloudflare.com/d1/" rel="noopener noreferrer"&gt;Cloudflare D1&lt;/a&gt; as its database for storing text. This combination allows AthenaDB to offer a simple yet powerful set of API endpoints for inserting, querying, retrieving, and deleting vector text data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features of AthenaDB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple API Endpoints&lt;/strong&gt;: AthenaDB provides straightforward endpoints for various database operations, making it accessible for developers of all skill levels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed Nature&lt;/strong&gt;: With data replication across multiple data centers, AthenaDB ensures high availability and resilience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-In Data Replication&lt;/strong&gt;: Due to Cloudflare Workers’ underlying architecture, data is replicated across data centers automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: AthenaDB is designed to handle large amounts of vector text data, making it suitable for projects with high data volumes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Architecture&lt;/strong&gt;: With AthenaDB being serverless, you don't have to worry about managing infrastructure, allowing for more focus on development.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Are Vector Databases?
&lt;/h2&gt;

&lt;p&gt;Vector databases are a special kind of computer storage that helps artificial intelligence (AI) programs quickly understand and use information. They work by turning data into numbers (called vectors) that the AI can easily compare to find similarities. This is really useful for things like online searches, suggesting products you might like, or creating smart chatbots. &lt;/p&gt;

&lt;p&gt;For example, if a vector database has the following three items: “Python is cool”, “Java is cool”, and “C is statically typed”, and the user searches for “coffee”, it would return “Java is cool”. Why? Because while the user may not have been talking about the programming language Java, the words “java” and “coffee” are closely related in meaning, a relationship the neural network that created the vectors captures mathematically.&lt;/p&gt;

&lt;p&gt;You can learn more about them &lt;a href="https://www.cloudflare.com/learning/ai/what-is-vector-database/" rel="noopener noreferrer"&gt;in this article&lt;/a&gt;.&lt;/p&gt;
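To make the "complex math" a bit more concrete: embeddings are just arrays of numbers, and similarity is commonly measured with cosine similarity. Here is a toy illustration, not AthenaDB's actual implementation; the three-dimensional "embeddings" are made up, while real models output hundreds or thousands of dimensions.

```typescript
// Cosine similarity: 1 means the vectors point the same way, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up 3D "embeddings" for illustration only.
const java = [0.9, 0.8, 0.1];
const coffee = [0.85, 0.9, 0.15];
const cLang = [0.1, 0.2, 0.95];

// "coffee" is far closer to "java" than to "C is statically typed".
console.log(cosineSimilarity(coffee, java) > cosineSimilarity(coffee, cLang)); // true
```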

&lt;h2&gt;
  
  
  Why Cloudflare?
&lt;/h2&gt;

&lt;p&gt;Cloudflare has a serverless compute platform called &lt;a href="https://workers.cloudflare.com/" rel="noopener noreferrer"&gt;Workers&lt;/a&gt;. Workers are automatically replicated across all Cloudflare data centers, meaning that a developer can build an API or other application that automatically scales with zero infrastructure! Workers also automatically route user requests to their nearest data center, which significantly reduces latency.&lt;/p&gt;

&lt;p&gt;By building on Workers, AthenaDB gets many of the features of Cloudflare’s platform - data replication, distribution across data centers, and a highly scalable serverless architecture - with no complicated code base or infrastructure management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with AthenaDB
&lt;/h2&gt;

&lt;p&gt;Deploying and using AthenaDB involves a few steps, starting from setting up your environment to deploying your instance of AthenaDB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before you begin, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare account&lt;/li&gt;
&lt;li&gt;Node.js and npm installed&lt;/li&gt;
&lt;li&gt;Wrangler CLI installed (&lt;code&gt;npm install -g @cloudflare/wrangler&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployment Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clone the Repository&lt;/strong&gt;: Start by cloning the AthenaDB repository to your local machine, installing dependencies, and logging in to Wrangler.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/TimeSurgeLabs/athenadb.git
&lt;span class="nb"&gt;cd &lt;/span&gt;athenadb
npm i
npx wrangler login
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a Vector and Database&lt;/strong&gt;: Use the provided npm scripts to create a vector and database for your AthenaDB instance.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run create-vector
npm run create-db
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;After running these commands, copy the output Database ID and update the &lt;code&gt;wrangler.toml&lt;/code&gt; file under &lt;code&gt;database_id&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Initialize the Database&lt;/strong&gt;: Run the initialization script to set up the database schema.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run init-db
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deploy AthenaDB&lt;/strong&gt;: Finally, deploy your instance of AthenaDB using Wrangler.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run deploy
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Upon successful deployment, you will receive an output with your API URL, which indicates that AthenaDB is now ready for use.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
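For reference, the Database ID from step 2 ends up in a D1 binding block in `wrangler.toml`. The fragment below follows Cloudflare's standard D1 binding format, but the binding and database names are hypothetical; check the repository's actual `wrangler.toml` for the ones AthenaDB expects.

```toml
# Hypothetical fragment in Cloudflare's standard D1 binding format.
[[d1_databases]]
binding = "DB"            # name the Worker code uses to access the database
database_name = "athenadb"
database_id = "paste-the-output-id-here"
```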

&lt;h3&gt;
  
  
  Using AthenaDB
&lt;/h3&gt;

&lt;p&gt;With AthenaDB deployed, you can start interacting with the database through its API endpoints. Here are some examples of how you can use AthenaDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inserting Text Data&lt;/strong&gt;: Use the &lt;code&gt;/insert&lt;/code&gt; endpoint to add text data into the database.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://athenadb.yourusername.workers.dev/your-namespace/insert&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Your text here&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Querying the Database&lt;/strong&gt;: To find similar text embeddings, use the &lt;code&gt;/query&lt;/code&gt; endpoint.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://athenadb.yourusername.workers.dev/your-namespace/query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrieving an Entry&lt;/strong&gt;: Retrieve specific entries using their UUID with the &lt;code&gt;GET&lt;/code&gt; endpoint.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://athenadb.yourusername.workers.dev/your-namespace/your-uuid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deleting Data&lt;/strong&gt;: Use the &lt;code&gt;/delete&lt;/code&gt; endpoint to remove data from the database.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://athenadb.yourusername.workers.dev/your-namespace/your-uuid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DELETE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AthenaDB stands out as a powerful tool for developers needing a scalable, serverless database solution for managing vector text data. By following the steps outlined in this blog post, you can deploy your own instance of AthenaDB and begin leveraging its capabilities for your projects. Whether you're building search engines, recommendation systems, or any application that requires efficient handling of vector data, AthenaDB provides a robust, easy-to-use solution.&lt;/p&gt;

&lt;p&gt;If you’re looking to integrate AI into your existing workflow or products, &lt;a href="https://timesurgelabs.com/" rel="noopener noreferrer"&gt;TimeSurge Labs&lt;/a&gt; is here to help. Specializing in AI consulting, development, internal tooling, and LLM hosting, our team of passionate AI experts is dedicated to building the future of AI and helping your business thrive in this rapidly changing industry. &lt;a href="https://timesurgelabs.com/#contact" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; today!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>cloud</category>
      <category>serverless</category>
    </item>
    <item>
      <title>How I Use Google's Gemini Pro with LangChain</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Thu, 04 Jan 2024 17:00:26 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/how-to-use-googles-gemini-pro-with-langchain-1eje</link>
      <guid>https://dev.to/timesurgelabs/how-to-use-googles-gemini-pro-with-langchain-1eje</guid>
<description>&lt;p&gt;Google's Gemini Pro is one of the newest LLMs publicly available, and to the surprise of some, it's relatively price-competitive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvin3nlq60w8hd6m2wc99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvin3nlq60w8hd6m2wc99.png" alt="Gemini Pro Pricing" width="800" height="816"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's effectively free while you're developing, and after your development is complete it's relatively cheap, costing around $0.00025 per 1K characters (&lt;strong&gt;characters&lt;/strong&gt;, not &lt;strong&gt;tokens&lt;/strong&gt; like OpenAI), which is slightly more expensive than GPT-3.5-Turbo, and $0.0025 per image, which is effectively the same as OpenAI's GPT-4.&lt;/p&gt;
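Since Google bills text by characters rather than tokens, a back-of-the-envelope estimate is easy. Here is a small sketch using the prices quoted above; they were current as of this writing, so check Google's pricing page before budgeting anything real.

```python
# Prices quoted above, as of this article's writing. Verify against
# Google's current pricing before relying on these numbers.
TEXT_PRICE_PER_1K_CHARS = 0.00025  # USD per 1,000 characters
IMAGE_PRICE = 0.0025               # USD per image

def estimate_cost(num_chars: int, num_images: int = 0) -> float:
    """Rough USD cost for a Gemini Pro request."""
    return (num_chars / 1000) * TEXT_PRICE_PER_1K_CHARS + num_images * IMAGE_PRICE

# e.g. a 4,000-character prompt plus one image:
print(round(estimate_cost(4000, 1), 6))  # 0.0035
```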

&lt;h2&gt;
  
  
  Okay, I get it, how do I use it?
&lt;/h2&gt;

&lt;p&gt;Let's start fresh with a new project. Assuming you're using Python &amp;gt;= 3.10, let's initialize a new virtual environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv &lt;span class="nb"&gt;env
source env&lt;/span&gt;/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And let's install LangChain and our dotenv file loader first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install langchain python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that's done, we can install Gemini Pro's libraries and its LangChain adapter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install google-generativeai langchain-google-genai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next you need to acquire an API key, which can be done on the &lt;a href="https://makersuite.google.com" rel="noopener noreferrer"&gt;Google MakerSuite&lt;/a&gt;. In the top left of the page you should see a "Get API Key" button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9icwjzh0oco2n5d0zgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9icwjzh0oco2n5d0zgl.png" alt="API Key Button" width="620" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click that button, then click "Create API Key in new project". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw4uv7wwo2baq2ge5rim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw4uv7wwo2baq2ge5rim.png" alt="New Key Button" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the new API Key and save it to a .env file in your project directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;new_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can create a script that calls Gemini Pro via LangChain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_google_genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatGoogleGenerativeAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;GOOGLE_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;google_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tweet_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a content creator. Write me a tweet about {topic}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tweet_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tweet_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how ai is really cool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! You've now integrated Gemini Pro with LangChain! If you're interested in learning more about LangChain and AI, follow us here on Dev.to as well as on &lt;a href="https://twitter.com/TimeSurgeLabs" rel="noopener noreferrer"&gt;X&lt;/a&gt;! I also post a lot of AI and developer stuff on my &lt;a href="https://twitter.com/Chand1012Dev" rel="noopener noreferrer"&gt;personal X account&lt;/a&gt;! We also have more articles on LangChain and AI on &lt;a href="https://dev.to/timesurgelabs"&gt;our Dev Page&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Happy coding! &lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>Llamafile: AI Integration &amp; Deployment Made Easy</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Wed, 13 Dec 2023 18:45:47 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/llamafile-ai-integration-deployment-made-easy-44cg</link>
      <guid>https://dev.to/timesurgelabs/llamafile-ai-integration-deployment-made-easy-44cg</guid>
<description>&lt;p&gt;It's no secret that I think Llama 2 and its derivatives are the future of AI and ML. Rather than getting bigger and smarter (I think GPT-4 is enough for 99.5% of applications), AI should instead strive to get smaller, cheaper, and faster. If you want to run Llama 2 via llama.cpp, you can check out my &lt;a href="https://blog.chand1012.dev/posts/howtorunllama2onanything/" rel="noopener noreferrer"&gt;guide on how to do that&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, the problem with llama.cpp is that to get it working you have to have all the dependencies: either download a binary or clone and build the repo, make sure your drivers are working, and then you can finally run it. What if you just want to download the model and run it? Well, that's where Llamafile comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Mozilla-Ocho/llamafile" rel="noopener noreferrer"&gt;Llamafile&lt;/a&gt; is a project by a team over at Mozilla. It allows users to distribute and run LLMs using a single, platform-independent file. It accomplishes this by building all required code into a binary called &lt;code&gt;llamafile&lt;/code&gt; , then by using a zipping tool, you can combine the binary with the model and any other files you need. This allows you to run Llama 2, Mistral 7B, on anything, without having to have &lt;small&gt;most&lt;/small&gt; dependencies.&lt;/p&gt;

&lt;p&gt;Llamafiles come in two flavors: Main and Server. Main replicates the command-line interface of &lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt;, while Server serves Llama 2 and other models over HTTP with a basic but functional web interface. I'll be focusing on Server, as that's what I prefer to use, but the process is the same for both.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use Llamafiles
&lt;/h2&gt;

&lt;p&gt;There are &lt;em&gt;some&lt;/em&gt; dependencies you'll need depending on your platform. You can see &lt;a href="https://github.com/Mozilla-Ocho/llamafile#gotchas" rel="noopener noreferrer"&gt;here&lt;/a&gt; for the most up-to-date information on that, but here is the TLDR as of 13/12/2023:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On macOS, you need &lt;a href="https://developer.apple.com/xcode/" rel="noopener noreferrer"&gt;Xcode&lt;/a&gt; installed.&lt;/li&gt;
&lt;li&gt;For GPU support on Linux and Windows, you need &lt;a href="https://developer.nvidia.com/cuda-toolkit" rel="noopener noreferrer"&gt;CUDA&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For GPU support on Linux, you need to have the &lt;code&gt;cc&lt;/code&gt; compiler installed.&lt;/li&gt;
&lt;li&gt;On Linux and Windows you have to pass &lt;code&gt;--n-gpu-layers 35&lt;/code&gt; for the GPU to work properly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One other thing of note: due to a limitation imposed by Microsoft, you cannot run llamafiles larger than 4GB on any version of Windows. This can be bypassed by enabling &lt;a href="https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl" rel="noopener noreferrer"&gt;Nvidia CUDA on WSL&lt;/a&gt; and running it inside of Linux. Otherwise you'll be limited to models smaller than 4GB.&lt;/p&gt;

&lt;p&gt;Once that is all complete, you can simply find a llamafile and try to run it! Platform doesn't matter, except that on Windows you have to add &lt;code&gt;.exe&lt;/code&gt; to the end of the filename in order to execute it. On *nix systems, here's all you have to do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# replace with the url and name of the llamafile you want to run!&lt;/span&gt;
wget https://huggingface.co/jartine/mistral-7b.llamafile/resolve/main/mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! You can now run Llama 2 on anything, without having to worry about dependencies. I have compiled a list of available Llamafiles on HuggingFace, you can find the collection &lt;a href="https://huggingface.co/collections/TimeSurgeLabs/llamafiles-65749149983403462a2980b9" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making your own Llamafile
&lt;/h2&gt;

&lt;p&gt;If your favorite Llama-based model isn't available yet, or you want to run a newer version of the Llama code than an older Llamafile may be running, you may want to make your own Llamafile. Luckily, this is pretty easy to do!&lt;/p&gt;

&lt;p&gt;The first step is to obtain either binaries or source code for Llamafile. I opted to compile from source, but they do have precompiled binaries on their &lt;a href="https://github.com/Mozilla-Ocho/llamafile/releases" rel="noopener noreferrer"&gt;releases page&lt;/a&gt;. Here's how you can compile from source.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Mozilla-Ocho/llamafile.git
&lt;span class="nb"&gt;cd &lt;/span&gt;llamafile
&lt;span class="c"&gt;# optional. I compiled the latest tagged version. You can also just build from main.&lt;/span&gt;
git checkout 0.4 &lt;span class="c"&gt;# there may be a newer version by the time you read this&lt;/span&gt;
make &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;make &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;PREFIX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that you have Llamafile, you need to download your model. The models &lt;strong&gt;have&lt;/strong&gt; to be in the GGUF format for &lt;a href="https://ggml.ai" rel="noopener noreferrer"&gt;ggml&lt;/a&gt;, as Llamafile is based on llama.cpp. If you don't know how to convert models to GGUF, or just don't want to put in the effort, &lt;a href="https://huggingface.co/TheBloke" rel="noopener noreferrer"&gt;TheBloke&lt;/a&gt; has uploaded &lt;em&gt;thousands&lt;/em&gt; of GGUF conversions to HuggingFace. I'd check his collection before you try to convert anything yourself. If the model happens to be unavailable as GGUF, here's a &lt;a href="https://www.substratus.ai/blog/converting-hf-model-gguf-model/" rel="noopener noreferrer"&gt;guide by Substratus.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this example, we're going to package Intel's &lt;a href="https://huggingface.co/Intel/neural-chat-7b-v3-3" rel="noopener noreferrer"&gt;Neural Chat v3.3&lt;/a&gt; into a Llamafile. First, we need to download the model. Luckily for us, TheBloke has already converted it to &lt;a href="https://huggingface.co/TheBloke/neural-chat-7B-v3-3-GGUF" rel="noopener noreferrer"&gt;GGUF&lt;/a&gt;, so we can download it straight from HuggingFace. Once that's done, we should test it before packaging it. Conveniently, installing llamafile also gives you a quick way to run models from the command line, even before you package them into a llamafile!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The Q4_K_M model is a good balance between compression and performance.&lt;/span&gt;
wget https://huggingface.co/TheBloke/neural-chat-7B-v3-3-GGUF/resolve/main/neural-chat-7b-v3-3.Q4_K_M.gguf
llamafile-server &lt;span class="nt"&gt;-m&lt;/span&gt; neural-chat-7b-v3-3.Q4_K_M.gguf &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If there is a browser available on the machine you are running this on, it will automatically open a browser window to the server. If not, you can just go to &lt;code&gt;http://localhost:8080&lt;/code&gt; (or the IP of the server) to access the web interface. Once you are there, you can test out the model. If you are happy with the results, you can now package it into a Llamafile.&lt;/p&gt;
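Besides the browser UI, the server can also be queried programmatically. Llamafile embeds llama.cpp's server, which accepts JSON POSTs on a `/completion` endpoint; the sketch below assumes that endpoint and a server running on `localhost:8080`, so check your llamafile version's documentation if the fields differ.

```python
import json
import urllib.request

# Assumed llama.cpp-style endpoint exposed by the llamafile server.
SERVER_URL = "http://localhost:8080/completion"

def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """Build the JSON body for a llama.cpp-style completion request."""
    return {"prompt": prompt, "n_predict": n_predict}

def complete(prompt: str) -> str:
    """POST the prompt to the local server and return the generated text."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("content", "")

# complete("What is a llamafile?")  # requires the server to be running
```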

&lt;p&gt;First, we have to create an arguments file. This file is named &lt;code&gt;.args&lt;/code&gt; and gets appended to the end of the executable to act as its default arguments. Here are the contents we're going to put in the file; use whichever text editor you prefer. The file is newline delimited, so add a newline wherever you would normally add a space.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-m
neural-chat-7b-v3-3.Q4_K_M.gguf
--host
0.0.0.0
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
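Because the format is one argument per line, you can also generate the file from a script instead of a text editor. A small hypothetical helper:

```python
def write_args_file(args: list, path: str = ".args") -> None:
    """Write llamafile arguments one per line, ending with '...' so extra
    command-line arguments are still accepted at run time."""
    with open(path, "w") as f:
        f.write("\n".join(args + ["..."]) + "\n")

# The same arguments shown above.
write_args_file(["-m", "neural-chat-7b-v3-3.Q4_K_M.gguf", "--host", "0.0.0.0"])
```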



&lt;p&gt;The &lt;code&gt;...&lt;/code&gt; at the end means any other arguments passed to the executable will still be accepted. Now that we have our arguments file, we can package everything into a Llamafile. First, copy the llamafile binary from &lt;code&gt;/usr/local/bin/llamafile-server&lt;/code&gt; (or wherever you installed it) to the same directory as the model. Then rename the executable to the name you want for your Llamafile plus &lt;code&gt;.llamafile&lt;/code&gt;. In this case, we'll call it &lt;code&gt;neural-chat-7b-v3-3.Q4_K_M.llamafile&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; /usr/local/bin/llamafile-server neural-chat-7b-v3-3.Q4_K_M.llamafile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can package it all together into a Llamafile. Llamafile includes a command called &lt;code&gt;zipalign&lt;/code&gt; that will package everything for you. Here's how to use it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;zipalign &lt;span class="nt"&gt;-j0&lt;/span&gt; neural-chat-7b-v3-3.Q4_K_M.llamafile neural-chat-7b-v3-3.Q4_K_M.gguf .args
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this is complete, you can test your llamafile with &lt;code&gt;./neural-chat-7b-v3-3.Q4_K_M.llamafile&lt;/code&gt;. If it works, you can distribute it to your friends, or &lt;a href="https://huggingface.co/TimeSurgeLabs/intel-neural-chat-v3.3-llamafile" rel="noopener noreferrer"&gt;upload it to HuggingFace&lt;/a&gt; for others to use!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Llamafiles are a great way to distribute Llama 2 and other LLMs. They let you run a model on practically anything without worrying about dependencies, and they let you ship LLMs without worrying about whether the user has the right runtime installed. I hope this guide was helpful, and I hope to see more Llamafiles in the future!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>llama</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How I Made an AI Agent in 10 Minutes with LangChain</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Tue, 15 Aug 2023 16:49:12 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/how-to-make-an-ai-agent-in-10-minutes-with-langchain-3i2n</link>
      <guid>https://dev.to/timesurgelabs/how-to-make-an-ai-agent-in-10-minutes-with-langchain-3i2n</guid>
      <description>&lt;p&gt;&lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; is a powerful library for Python and Javascript/Typescript that allows you to quickly prototype large language model applications. It allows you to chain together LLM tasks (hence the name) and even allows you to run autonomous agents quickly and easily. In this blog post, we'll explore how to create agents and define custom tools that those agents can use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9

&lt;ul&gt;
&lt;li&gt;3.10 and up have some issues with some of LangChain’s modules.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;An OpenAI API Key&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;We’re going to create a Python virtual environment and install the dependencies that way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;myproject
&lt;span class="nb"&gt;cd &lt;/span&gt;myproject
&lt;span class="c"&gt;# or python3, python3.9, etc depending on your setup&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv &lt;span class="nb"&gt;env
source env&lt;/span&gt;/bin/activate &lt;span class="c"&gt;# this will need to be run every time before using your agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done we can install dependencies. For this tutorial we need LangChain, OpenAI, Requests, BeautifulSoup, and the DuckDuckGo search package. Finally, &lt;code&gt;python-dotenv&lt;/code&gt; will be used to load the OpenAI API key into the environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain openai python-dotenv requests duckduckgo-search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain is a very large library so that may take a few minutes. While this is downloading, create a new file called &lt;code&gt;.env&lt;/code&gt; and paste your API key in. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is complete we can make our first chain!&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Concepts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/agents" rel="noopener noreferrer"&gt;Agents&lt;/a&gt;&lt;/strong&gt; are a way to run an LLM in a loop in order to complete a task. Agents are defined with the following:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/agents/agent_types/" rel="noopener noreferrer"&gt;Agent Type&lt;/a&gt;&lt;/strong&gt; - This defines how the Agent acts and reacts to certain events and inputs. For this tutorial we will focus on the &lt;a href="https://python.langchain.com/docs/modules/agents/agent_types/react.html" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt; Agent Type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/chains/foundational/llm_chain" rel="noopener noreferrer"&gt;LLM&lt;/a&gt;&lt;/strong&gt; - The AI that actually runs your prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/agents/tools/" rel="noopener noreferrer"&gt;Tools&lt;/a&gt;&lt;/strong&gt; - These are Python (or JS/TS) functions that your Agent can call to interact with the world outside of itself. These can be as simple or as complex as you want them to be!

&lt;ul&gt;
&lt;li&gt;Many tools together make a &lt;a href="https://python.langchain.com/docs/modules/agents/toolkits/" rel="noopener noreferrer"&gt;Toolkit&lt;/a&gt;. There are many &lt;a href="https://python.langchain.com/docs/integrations/toolkits/" rel="noopener noreferrer"&gt;toolkits already available&lt;/a&gt; built into LangChain, but for this example we’ll make our own.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Agents
&lt;/h2&gt;

&lt;p&gt;Agents combine an LLM (or an LLM Chain) with a Toolkit in order to perform a series of steps to accomplish a goal. For this example, we’ll combine a couple of custom tools with LangChain’s provided DuckDuckGo search tool to create a research agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Importing Necessary Libraries&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DuckDuckGoSearchResults&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a breakdown of the imports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;requests&lt;/code&gt;: A popular Python library for making HTTP requests.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BeautifulSoup&lt;/code&gt;: A library for web scraping purposes to pull the data out of HTML and XML files.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;load_dotenv&lt;/code&gt;: A method to load environment variables from a &lt;code&gt;.env&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;LangChain specific imports: These are specific to the LangChain framework and are used to define tools, prompts, chat models, chains, and agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Loading Environment Variables&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line loads environment variables from a &lt;code&gt;.env&lt;/code&gt; file. This is useful if you have API keys or other sensitive information that you don't want to hard-code into your script.&lt;/p&gt;
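Conceptually, `load_dotenv` just reads `KEY=VALUE` lines from the file and exports them into the process environment. A rough stdlib-only sketch of that behavior (python-dotenv itself also handles quoting and interpolation, which this skips):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal stand-in for load_dotenv: parse KEY=VALUE lines and
    export them, skipping blank lines and comments."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the file
            os.environ.setdefault(key.strip(), value.strip())
```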

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Setting Up the DuckDuckGo Search Tool&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ddg_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DuckDuckGoSearchResults&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This initializes the DuckDuckGo search tool provided by LangChain. It allows you to search the web using DuckDuckGo and retrieve the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Defining Headers for Web Requests&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;HEADERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sets a user-agent header for our web requests. Some websites might block requests that don't have a user-agent set, thinking they're from bots.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Parsing HTML Content&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text_content_with_links&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_content_with_links&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function takes in HTML content, uses BeautifulSoup to parse it, and then extracts all the text from it.&lt;/p&gt;
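If you want to see what that extraction does without pulling in BeautifulSoup, the standard library's `html.parser` can approximate it. This is a simplified stand-in for illustration, not the function the agent uses:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)

    def text(self) -> str:
        return "".join(self.chunks)

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return parser.text()

# extract_text("<p>Hello <b>world</b></p>") -> "Hello world"
```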

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Fetching Web Page Content&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_web_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HEADERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;parse_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function fetches the content of a web page using the &lt;code&gt;requests&lt;/code&gt; library and then parses the HTML to extract the text.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7. Creating the Web Fetcher Tool&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;web_fetch_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fetch_web_page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WebFetcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetches the content of a web page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we're creating a new tool using the &lt;code&gt;Tool.from_function&lt;/code&gt; method. This tool will use our &lt;code&gt;fetch_web_page&lt;/code&gt; function to fetch and parse web pages.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;8. Setting Up the Summarizer&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the following content: {content}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo-16k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;summarize_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarizes a web page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This section sets up a summarizer using the ChatOpenAI model from LangChain. We define a prompt template for summarization, create a chain using the model and the prompt, and then define a tool for summarization. We use the 16k-context version of GPT-3.5 Turbo, as most web pages will exceed the 4k context of the standard model.&lt;/p&gt;
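Even 16k tokens isn't unlimited; a very long page can still overflow the context window. One crude guard is to truncate the fetched text before summarizing, using the rough rule of thumb of about 4 characters per token (a heuristic, not an exact tokenizer):

```python
def truncate_for_context(content: str, max_tokens: int = 14000,
                         chars_per_token: int = 4) -> str:
    """Trim content to roughly fit a model context window, leaving
    headroom for the prompt template and the generated summary."""
    max_chars = max_tokens * chars_per_token
    return content if len(content) <= max_chars else content[:max_chars]
```

You could apply this inside `fetch_web_page` before the text is handed to the summarizer chain.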

&lt;h3&gt;
  
  
  &lt;strong&gt;9. Initializing the Agent&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ddg_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;web_fetch_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarize_tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZERO_SHOT_REACT_DESCRIPTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we're initializing an agent with the tools we've defined. This agent will be able to search the web, fetch web pages, and summarize them. Notice how we can re-use the LLM from the summarize tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;10. Running the Agent&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research how to use the requests library in Python. Use your tools to search and summarize content into a guide on how to use the requests library.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we define a prompt for our agent and run it. The agent will search the web for information about Python’s Requests library, fetch some of the resulting content, and then summarize it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimentation
&lt;/h2&gt;

&lt;p&gt;In this section, we'll explore how to modify the code above to use the experimental Plan and Execute agent. The Plan and Execute agent accomplishes an objective by first planning what to do, then executing the sub-tasks. The planning is almost always done by an LLM, while the execution is usually done by a separate agent equipped with tools.&lt;/p&gt;

&lt;p&gt;First, install the LangChain experimental package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain_experimental
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you can import the necessary modules from the &lt;code&gt;langchain_experimental.plan_and_execute&lt;/code&gt; package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_experimental.plan_and_execute&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PlanAndExecute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_agent_executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_chat_planner&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Load the planner and executor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;planner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_chat_planner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_agent_executor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the Plan and Execute agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PlanAndExecute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the agent with a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research how to use the requests library in Python. Use your tools to search and summarize content into a guide on how to use the requests library.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the agent will first plan the steps needed to accomplish the objective, then execute the sub-tasks using the tools provided. The agent will search the web for information about the requests library in Python, fetch the content of the relevant results, and then summarize them into a guide.&lt;/p&gt;

&lt;p&gt;Note that the Plan and Execute agent is experimental and may not work as expected in all cases. However, it can be a powerful tool for automating complex tasks that require multiple steps and interactions with external tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LangChain is a game-changer for anyone looking to quickly prototype large language model applications. In just a few minutes, we’ve walked through the process of creating agents, defining custom tools, and even experimenting with the experimental Plan and Execute agent to automate complex tasks.&lt;/p&gt;

&lt;p&gt;The power of LangChain lies in its simplicity and flexibility. Whether you’re a seasoned developer or just starting out, LangChain’s intuitive design allows you to harness the capabilities of large language models like never before. From generating creative content to running autonomous agents, the possibilities are endless.&lt;/p&gt;

&lt;p&gt;So why wait? Dive into LangChain today and unleash the potential of AI in your projects. If you’re looking to integrate AI into your existing workflow or products, &lt;a href="https://timesurgelabs.com/" rel="noopener noreferrer"&gt;TimeSurge Labs&lt;/a&gt; is here to help. Specializing in AI consulting, development, internal tooling, and LLM hosting, our team of passionate AI experts is dedicated to building the future of AI and helping your business thrive in this rapidly changing industry. &lt;a href="https://timesurgelabs.com/#contact" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; today!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How To Use LangChain in 10 Minutes</title>
      <dc:creator>Chandler</dc:creator>
      <pubDate>Wed, 09 Aug 2023 16:19:04 +0000</pubDate>
      <link>https://dev.to/timesurgelabs/how-to-use-langchain-in-10-minutes-56e2</link>
      <guid>https://dev.to/timesurgelabs/how-to-use-langchain-in-10-minutes-56e2</guid>
      <description>&lt;p&gt;&lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; is a powerful library for Python and Javascript/Typescript that allows you to quickly prototype large language model applications. It allows you to chain together LLM tasks (hence the name) and even allows you to run autonomous agents quickly and easily. Today we will be going over the basics of chains, so you can hit the ground running with your newest LLM projects!&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9

&lt;ul&gt;
&lt;li&gt;Versions 3.10 and up have issues with some of LangChain’s modules.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;An OpenAI API Key&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;We’re going to create a Python virtual environment and install the dependencies that way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;myproject
&lt;span class="nb"&gt;cd &lt;/span&gt;myproject
&lt;span class="c"&gt;# or python3, python3.9, etc depending on your setup&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv &lt;span class="nb"&gt;env
source env&lt;/span&gt;/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done we can install the dependencies. We only need three for this tutorial: LangChain, OpenAI, and &lt;code&gt;python-dotenv&lt;/code&gt;, which will load the OpenAI API key into the environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain is a very large library so that may take a few minutes. While this is downloading, create a new file called &lt;code&gt;.env&lt;/code&gt; and paste your API key in. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is complete we can make our first chain!&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Concepts
&lt;/h2&gt;

&lt;p&gt;There are a few basic concepts you’ll need to understand in order to get started.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chains&lt;/strong&gt; can be thought of as a sequence of actions to take with an LLM; a chain may contain a single LLM call or several. A chain is made up of three simple parts.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/" rel="noopener noreferrer"&gt;Prompt Template&lt;/a&gt;&lt;/strong&gt; - so you can quickly change inputs without changing the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/chains/foundational/llm_chain" rel="noopener noreferrer"&gt;LLM&lt;/a&gt;&lt;/strong&gt; - The AI that actually runs your prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://python.langchain.com/docs/modules/model_io/output_parsers/" rel="noopener noreferrer"&gt;Output Parsers&lt;/a&gt;&lt;/strong&gt; - Converts the output into something useful, usually just another string.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
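&lt;p&gt;The three parts above can be sketched in plain Python, independent of LangChain. This is only a conceptual illustration; &lt;code&gt;fake_llm&lt;/code&gt; is a hypothetical stand-in for a real model call:&lt;/p&gt;

```python
# A minimal, framework-free sketch of a chain's three parts.
# `fake_llm` is a hypothetical stand-in for a real LLM call.

def prompt_template(topic: str) -> str:
    # Prompt Template: swap inputs without rewriting the prompt
    return f"Write me a description for a TikTok about {topic}"

def fake_llm(prompt: str) -> str:
    # LLM: the model that actually runs the prompt (stubbed here)
    return f"  A fun video about: {prompt.rsplit('about ', 1)[-1]}  "

def output_parser(raw: str) -> str:
    # Output Parser: turn raw model output into something useful
    return raw.strip()

def run_chain(topic: str) -> str:
    # A "chain" is just these three parts composed in order
    return output_parser(fake_llm(prompt_template(topic)))

print(run_chain("cats"))  # A fun video about: cats
```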

&lt;h2&gt;
  
  
  Writing the Chain
&lt;/h2&gt;

&lt;p&gt;For this example, we’re going to write a chain that generates a TikTok script (I am a &lt;a href="https://www.urbandictionary.com/define.php?term=Zoomer" rel="noopener noreferrer"&gt;Zoomer&lt;/a&gt; after all) for an educational channel. First, we need to generate a description for the TikTok. We will use prompt templating so we can reuse the prompt later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# prompts.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;description_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write me a description for a TikTok about {topic}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can then be used in a chain. Before we can define a chain, we need to define an LLM for the chain to use. LangChain recommends that most users use the &lt;code&gt;ChatOpenAI&lt;/code&gt; class to get the cost and simplicity benefits of the ChatGPT API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chain.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;description_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# loads the .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done we can create the chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;description_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can call the new chain with &lt;code&gt;.predict&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;description_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cats are cool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the output:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;😻 Unleash your inner feline aficionado! From their enchanting eyes to their purrfectly mysterious ways, cats are the epitome of coolness. 🐾 Watch as they effortlessly own their spaces, teaching us the art of relaxation and play. Whether they're mastering acrobatics or curling up for a catnap, their cool vibes are undeniable. 😎 Join the cat craze and embrace the awesomeness of these four-legged trendsetters! 🐱💫 #CatsRule #CoolCats #FelineVibes&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now that we have a description, we need to have it write a script. This is where chaining comes in - we can sequentially call the LLM again using a slightly different prompt. First, let’s define a new prompt for the next chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# add to prompts.py
&lt;/span&gt;
&lt;span class="n"&gt;script_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write me a script for a TikTok given the following description: {description}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what your &lt;code&gt;chain.py&lt;/code&gt; should look like now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chain.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="c1"&gt;# this line changed!!!!!
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;script_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# loads the .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;description_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;description_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cats are cool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# new code below this line
&lt;/span&gt;&lt;span class="n"&gt;script_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;script_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;script_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the new output:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[Opening shot: A close-up of a cat's mesmerizing eyes, slowly blinking.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "😻 Unleash your inner feline aficionado!"&lt;/p&gt;

&lt;p&gt;[Cut to a sleek cat walking confidently through a room, tail swaying gracefully.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "From their enchanting eyes to their purrfectly mysterious ways..."&lt;/p&gt;

&lt;p&gt;[Transition to a montage of cats lounging in different relaxed poses.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Cats are the epitome of coolness. 🐾"&lt;/p&gt;

&lt;p&gt;[Show a cat effortlessly jumping onto a high shelf, landing with precision.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Watch as they effortlessly own their spaces..."&lt;/p&gt;

&lt;p&gt;[Cut to a person lying on the couch while a cat playfully bats at a string toy.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Teaching us the art of relaxation and play."&lt;/p&gt;

&lt;p&gt;[Show a cat doing a graceful mid-air flip while chasing a feather toy.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Whether they're mastering acrobatics..."&lt;/p&gt;

&lt;p&gt;[Transition to a cozy scene of a cat curled up in a sunlit spot, eyes half-closed.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Or curling up for a catnap..."&lt;/p&gt;

&lt;p&gt;[Cut to a group of cats with various personalities and fur patterns.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Their cool vibes are undeniable."&lt;/p&gt;

&lt;p&gt;[Show a person petting a content cat, both sharing a moment of connection.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "😎 Join the cat craze and embrace the awesomeness..."&lt;/p&gt;

&lt;p&gt;[Cut to a playful cat chasing its tail, accompanied by a cheerful laugh.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "...of these four-legged trendsetters!"&lt;/p&gt;

&lt;p&gt;[End with a shot of a cat sitting regally, gazing confidently into the camera.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "🐱💫 #CatsRule #CoolCats #FelineVibes"&lt;/p&gt;

&lt;p&gt;[Fade out with a final glimpse of a cat's eyes.]&lt;/p&gt;

&lt;p&gt;Narrator (Voiceover): "Because when it comes to cool, cats wrote the book."&lt;/p&gt;

&lt;p&gt;[End screen: "Follow for more feline fun!"]&lt;/p&gt;

&lt;p&gt;[Background music fades out as the TikTok video concludes.]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Using them like this is &lt;em&gt;fine&lt;/em&gt;, but what if we want to chain them together? That’s where &lt;a href="https://python.langchain.com/docs/modules/chains/foundational/sequential_chains" rel="noopener noreferrer"&gt;Sequential Chains&lt;/a&gt; come in. These let you tie multiple chains into a single function call, executing them in the order they are defined. There are two types of sequential chains; we’re just going to focus on the simple sequential chain. Edit the chain import line to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleSequentialChain&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
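&lt;p&gt;Conceptually, a simple sequential chain is just function composition: each chain’s output becomes the next chain’s only input. As a rough sketch (not LangChain’s actual implementation):&lt;/p&gt;

```python
# Illustrative only: what a simple sequential chain does conceptually.
# Each "chain" is modeled as a function taking one string and
# returning one string; outputs feed forward in order.

def run_sequential(chains, text: str) -> str:
    for chain in chains:
        text = chain(text)
    return text

# Hypothetical stand-ins for description_chain and script_chain
description_chain = lambda topic: f"Description of {topic}"
script_chain = lambda desc: f"Script based on: {desc}"

result = run_sequential([description_chain, script_chain], "cats are cool")
print(result)  # Script based on: Description of cats are cool
```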



&lt;p&gt;Move the LLM chains to the top of the file and remove the print statements and the &lt;code&gt;.predict&lt;/code&gt; calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chain.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleSequentialChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="c1"&gt;# this line changed!!!!!
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;script_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# loads the .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;description_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;script_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;script_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tiktok_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleSequentialChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chains&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;description_chain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;script_chain&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktok_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cats are cool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the output:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Title: #CoolCatsRule&lt;/p&gt;

&lt;p&gt;INT. LIVING ROOM - DAY&lt;/p&gt;

&lt;p&gt;A trendy, upbeat song begins playing as the camera pans across a stylishly decorated living room. Various cat-themed decorations can be seen, setting the perfect atmosphere for showcasing the undeniable coolness of cats.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. BEDROOM - DAY&lt;/p&gt;

&lt;p&gt;A YOUNG WOMAN, in her early twenties, stands in front of a full-length mirror. She wears trendy clothes and holds a CAT, who seems equally as cool, in her arms.&lt;/p&gt;

&lt;p&gt;YOUNG WOMAN&lt;br&gt;
(looking into the mirror)&lt;br&gt;
Ready to show the world why cats rule!&lt;/p&gt;

&lt;p&gt;The young woman gently places the cat on the ground as the camera zooms in on the feline.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. KITCHEN - DAY&lt;/p&gt;

&lt;p&gt;A CAT sits on the kitchen counter, effortlessly balancing on one paw while wearing sunglasses. The camera pans around it, capturing its cool demeanor.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. BACKYARD - DAY&lt;/p&gt;

&lt;p&gt;A CAT lounges in a hammock, wearing a tiny hat and reading a book. The camera captures its relaxed and sophisticated vibe.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. LIVING ROOM - DAY&lt;/p&gt;

&lt;p&gt;The young woman sits on the couch, surrounded by a group of COOL CATS. Each cat showcases their unique coolness, like one wearing a leather jacket and another playing a tiny electric guitar.&lt;/p&gt;

&lt;p&gt;YOUNG WOMAN&lt;br&gt;
(points to the cats)&lt;br&gt;
See? Cats rule!&lt;/p&gt;

&lt;p&gt;The camera zooms in on the cats, showing their undeniable feline awesomeness.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. LIVING ROOM - DAY&lt;/p&gt;

&lt;p&gt;The young woman and her cool cats gather around a table, where they enjoy a mini-cat party. There are cat-themed snacks, funky drinks, and even a DJ cat scratching vinyl records.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. LIVING ROOM - DAY&lt;/p&gt;

&lt;p&gt;The young woman holds up a sign saying "#CoolCatsRule" as the cats pose beside her. The camera pans out to reveal a fun, energetic dance routine as they all groove to the beat.&lt;/p&gt;

&lt;p&gt;CUT TO:&lt;/p&gt;

&lt;p&gt;INT. LIVING ROOM - DAY&lt;/p&gt;

&lt;p&gt;The young woman and her cool cats strike a final pose, with the camera capturing their undeniable coolness.&lt;/p&gt;

&lt;p&gt;YOUNG WOMAN&lt;br&gt;
(looking at the camera)&lt;br&gt;
Remember, folks, cats rule!&lt;/p&gt;

&lt;p&gt;The screen fades out with the hashtag #CoolCatsRule displayed prominently.&lt;/p&gt;

&lt;p&gt;FADE OUT.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LangChain is a game-changer for anyone looking to quickly prototype large language model applications. In just a few minutes, we've walked through the process of creating chains, defining prompts, and even chaining together multiple LLM calls to create a dynamic TikTok script.&lt;/p&gt;

&lt;p&gt;The power of LangChain lies in its simplicity and flexibility. Whether you're a seasoned developer or just starting out, LangChain's intuitive design allows you to harness the capabilities of large language models like never before. From generating creative content to running autonomous agents, the possibilities are endless.&lt;/p&gt;

&lt;p&gt;So why wait? Dive into LangChain today and unleash the potential of AI in your projects. If you're looking to integrate AI into your existing workflow or products, TimeSurge Labs is here to help. Specializing in AI consulting, development, internal tooling, and LLM hosting, our team of passionate AI experts is dedicated to building the future of AI and helping your business thrive in this rapidly changing industry. &lt;a href="https://timesurgelabs.com/#contact" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; today!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;Cover image generated by Stable Diffusion.&lt;/small&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>chatgpt</category>
    </item>
  </channel>
</rss>
