DEV Community

CircArgs

Posted on • Originally published at ouellet.dev

Keymaker: Powerful, Flexible, and Extensible Language Model Control

TLDR;

Intro

Artificial intelligence and natural language processing have made significant strides in recent years. Large language models (LLMs) like ChatGPT and GPT-4 have demonstrated remarkable capabilities in generating human-like text. However, controlling the output of these models can be challenging, especially when ensuring the generated text meets specific requirements or follows a desired format.

Keymaker, a Python library, provides a powerful, flexible, and extensible way to control the output of large language models. Keymaker makes it easier than ever to ensure your model's output is exactly what you need. It offers a simple and straightforward way to create and apply constraints on generated tokens.

Example TLDR;

The example below demonstrates several powerful features of Keymaker:

  • Dynamic Prompts and Responses: Keymaker allows for dynamic generation of prompts and responses using a simple formatting and completion system.

  • Model Flexibility: You can use different models for different parts of the prompt; here, chatgpt and LlamaCpp are mixed freely.

  • Constraints: Constraints like OptionsConstraint, StopsConstraint, and RegexConstraint (and others not shown) give you the power to control the output of your model precisely.

  • Mapping Function: The mapping function allows for transformation of the generated output before it is returned.

  • Multiple Completions: You can generate multiple fully controlled completions for a single prompt, as demonstrated in the countdown example.

  • Plain Python: Everything is plain Python. Prompts are str objects, and control flow is ordinary, testable Python rather than logic tucked away in a string or DSL.

Now, let's get into it.

Keymaker in Practice

First, import the necessary modules and set up the model instances:

```python
from keymaker.models import chatgpt, LlamaCpp
from keymaker import Prompt, CompletionConfig
from keymaker.constraints import RegexConstraint, OptionsConstraint, StopsConstraint

chat_model = chatgpt()
llama_model = LlamaCpp(model_path="path/to/llama/model/file")
```

Create the prompt with format parameters. The placeholders in the prompt are for various completions that will be generated using different models and constraints.

```python
async def print_stream(completion):
    print(completion)

prompt = Prompt(
    """Time: {time}
User: {user_msg}
Assistant: Hello, {}{punctuation}
User: Can you write me a poem about a superhero named pandaman being a friend to {}?
Assistant:{poem}
User: What is 10+5?
Assistant: The answer is 10+5={math}

The final answer is {fin}!

User: Countdown from 5 to 0.
Assistant: 5, 4, {countdown}

""",
    chat_model,
    stream=print_stream,
)
```

Generate completions from different models by using the format method on the Prompt object. Different models and constraints are used for each completion:

```python
filled_in = await prompt.format(
    CompletionConfig(constraint=OptionsConstraint({"Sam", "Nick"})),
    lambda p: p.completions[0],
    punctuation="!",
    user_msg="Hi, my name is Nick.",
    time="2023-07-23 19:33:01",
    poem=CompletionConfig(
        llama_model,
        max_tokens=250,
        constraint=StopsConstraint("User|Assistant", include=False),
    ),
    math=CompletionConfig(
        llama_model,
        constraint=RegexConstraint("[0-9]+", terminate_on_match=False),
        map_fn=int,
    ),
    fin=lambda p: CompletionConfig(
        llama_model,
        constraint=RegexConstraint(rf"{p.completions.math}|16"),
    ),
    countdown=lambda p: (
        CompletionConfig(
            llama_model,
            constraint=RegexConstraint("[0-9]"),
            map_fn=lambda s: f"{s}, ",
        )
        for _ in range(5)
    ),
)
```

Print the resulting completed prompt to see the generated completions:

```python
print(filled_in)
```

This example demonstrates how to generate completions from different models like chatgpt and LlamaCpp using the format method in Keymaker. The prompt contains placeholders for various completions, and they are generated using different models and constraints.

With Keymaker installed, controlling the output of large language models becomes easier than ever. Check out the documentation for more information and examples on how to use Keymaker effectively.

In Depth Explanation

Let's break down the example to understand each portion of the code.

0. The Prompt

Take another look at the Prompt

```python
async def print_stream(completion):
    print(completion)

prompt = Prompt(
    """Time: {time}
User: {user_msg}
Assistant: Hello, {}{punctuation}
User: Can you write me a poem about a superhero named pandaman being a friend to {}?
Assistant:{poem}
User: What is 10+5?
Assistant: The answer is 10+5={math}

The final answer is {fin}!

User: Countdown from 5 to 0.
Assistant: 5, 4, {countdown}

""",
    chat_model,
    stream=print_stream,
)
```

Prompts are just strings, and Keymaker intends completions to feel the same way. The format method is another example of that: any {...} is a spot to insert text, just as with an ordinary format string.
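Since Keymaker's placeholders behave like ordinary format-string slots, a plain str.format call illustrates the mechanics (this sketch does not touch Keymaker itself):

```python
# Plain-Python illustration: named and positional placeholders fill in
# exactly as they would in Keymaker's prompt template.
template = "Assistant: Hello, {}{punctuation}"
filled = template.format("Nick", punctuation="!")
print(filled)  # Assistant: Hello, Nick!
```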

After the prompt string itself, the Prompt constructor lets us set defaults for all completions. Here, we set a default model, chat_model, and a stream function that prints completions as they are generated.

1. CompletionConfig with OptionsConstraint

We start by generating a completion using an OptionsConstraint. This constraint allows us to restrict the generated text to one of the provided options. In this case, the options are "Sam" and "Nick".

```python
CompletionConfig(constraint=OptionsConstraint({"Sam", "Nick"}))
```
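Conceptually, an options constraint only permits generations that remain a prefix of one of the allowed options. A minimal sketch of that idea (not Keymaker's actual implementation):

```python
# Sketch of the idea behind OptionsConstraint: a partial generation is valid
# only while it is still a prefix of some allowed option.
options = {"Sam", "Nick"}

def still_valid(partial: str) -> bool:
    return any(option.startswith(partial) for option in options)

still_valid("Ni")    # True: could still become "Nick"
still_valid("Sam")   # True: a complete option
still_valid("Bob")   # False: no option starts with "Bob"
```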

2. Completion Using a Lambda Function

Next, we use a lambda function to reference the previous completion. This allows us to dynamically include the value generated by the previous completion in our prompt.

```python
lambda p: p.completions[0]
```
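Completions are reachable both by position (for unnamed {} slots) and by name (for named slots like {math}). A sketch of that interface, using a hypothetical stand-in class rather than Keymaker's own:

```python
# Hypothetical stand-in illustrating the access pattern: completions can be
# read by index or by attribute name, as lambda p: p.completions[0] and
# p.completions.math do in the example.
class Completions(list):
    def __init__(self, items, **named):
        super().__init__(items)
        self.__dict__.update(named)

completions = Completions(["Nick"], math=15)
completions[0]      # "Nick" — positional access
completions.math    # 15 — named access
```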

3. Static and Dynamic Constraints

We use both static and dynamic constraints to control the generated text. The format method is flexible in what each placeholder accepts:

  • Stringable: Something castable to a str
  • Callable returning Stringable or CompletionConfig: Dynamic single component prompt
  • Callable returning iterable of Stringable or CompletionConfig: Dynamic multi-component prompt

The Callable options allow a completion to be customized dynamically based on the context, and returning a CompletionConfig allows configuring that completion directly in the prompt formatter.

The StopsConstraint means generation ends once "User" or "Assistant" is generated (or of course a stop token is reached). The RegexConstraint ensures the generated text matches a regular expression pattern.

```python
poem=CompletionConfig(
    llama_model,
    max_tokens=250,
    constraint=StopsConstraint("User|Assistant", include=False),
),
math=CompletionConfig(
    llama_model,
    constraint=RegexConstraint("[0-9]+", terminate_on_match=False),
    map_fn=int,
),
fin=lambda p: CompletionConfig(
    llama_model,
    constraint=RegexConstraint(rf"{p.completions.math}|16"),
),
```
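The effect of these constraints can be mimicked with the plain re module (a sketch of the behavior, not Keymaker internals): the stop pattern truncates at its first occurrence, and the regex pattern validates the text.

```python
import re

# Sketch: truncating at a stop pattern, as StopsConstraint with include=False
# would, and validating output against a regex, as RegexConstraint does.
generated = "A poem about pandaman.\nUser: nice!"
truncated = re.split(r"User|Assistant", generated)[0]
# truncated == "A poem about pandaman.\n"

re.fullmatch(r"[0-9]+", "15")  # matches: "15" satisfies the math pattern
```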

4. Mapping Function

We use a mapping function to transform the generated completion, which would otherwise simply be a string. When we later access the completion with filled_in.completions.math.value, we will find it is an int.

```python
map_fn=int,
```
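Because map_fn is just a callable applied to the raw string completion, its effect can be shown without Keymaker at all:

```python
# The raw completion is a string; map_fn transforms it before it is stored.
map_fn = int
raw = "15"
value = map_fn(raw)
print(type(value).__name__, value)  # int 15
```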

5. Generating Multiple Completions

Finally, we generate multiple completions for the "countdown" placeholder by returning a generator that yields a separate CompletionConfig for each completion.

```python
# p is the prompt passed to your function by Keymaker
countdown=lambda p: (
    CompletionConfig(
        llama_model,
        constraint=RegexConstraint("[0-9]"),
        map_fn=lambda s: f"{s}, ",
    )
    for _ in range(5)
),
```

At any point, we could instead return a Stringable, or stop yielding completions, based on the value of p: Prompt.
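The shape of the generator can be sketched in plain Python (using hypothetical dicts in place of CompletionConfig): one config per countdown step, each constraining a single digit and mapping it to "digit, ".

```python
# Sketch: the generator expression yields one config per countdown step;
# dicts stand in for CompletionConfig objects here.
configs = list(
    {"constraint": "[0-9]", "map_fn": (lambda s: f"{s}, ")}
    for _ in range(5)
)
len(configs)               # 5 — five separate completions will be generated
configs[0]["map_fn"]("3")  # '3, '
```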

By using Keymaker's powerful and flexible features, we can control the output of our large language models with ease and precision.
