Mateusz Charytoniuk

Posted on • Originally published at github.com

Structured: Extract Data from Unstructured Input with LLM

The Structured project started as a Go port of Instructor,
but it has grown into a more general-purpose library. It is designed to be easy to use and set up.

It also features a language-agnostic HTTP server that you can set up in front of llama.cpp.

It offers the same features as Instructor behind a Go-like API. It is model agnostic: it maps data from an arbitrary JSON schema to an arbitrary Go struct (or just plain JSON).

It is focused on llama.cpp. Support for other vendor APIs (like OpenAI or Anthropic) might be added in the future.

HTTP API

Start a server and point it to your local llama.cpp instance:

./structured \
    --llamacpp-host 127.0.0.1 \
    --llamacpp-port 8081 \
    --port 8080

The Structured server connects to llama.cpp to extract the data.

Now, you can issue requests. Include schema and data in your POST body.
The server will respond with JSON matching your schema:

POST http://127.0.0.1:8080/extract/entity
{
  "schema": {
    "type": "object",
    "properties": {
      "hello": {
        "type": "string"
      }
    },
    "required": ["hello"]
  },
  "data": "Say 'world'"
}

Response:
{
  "hello": "world"
}
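Because the server speaks plain HTTP, any language can call it. As a minimal sketch, a Go client for the request above might look like the following. The helper names `encodeExtractRequest` and `extractEntity` are hypothetical, and the `application/json` content type and the check for a 200 status are assumptions rather than documented behavior.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// extractRequest mirrors the POST body shown above: a JSON schema plus the raw input.
type extractRequest struct {
	Schema map[string]any `json:"schema"`
	Data   string         `json:"data"`
}

// encodeExtractRequest serializes the request body (hypothetical helper).
func encodeExtractRequest(schema map[string]any, data string) ([]byte, error) {
	return json.Marshal(extractRequest{Schema: schema, Data: data})
}

// extractEntity posts the schema and data to the server and decodes the JSON reply.
// Assumption: the server accepts application/json and answers 200 on success.
func extractEntity(baseURL string, schema map[string]any, data string) (map[string]any, error) {
	body, err := encodeExtractRequest(schema, data)
	if err != nil {
		return nil, err
	}

	resp, err := http.Post(baseURL+"/extract/entity", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var result map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	return result, nil
}

func main() {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"hello": map[string]any{"type": "string"},
		},
		"required": []string{"hello"},
	}

	// Requires the Structured server from the previous step to be running.
	result, err := extractEntity("http://127.0.0.1:8080", schema, "Say 'world'")
	if err != nil {
		fmt.Println("extraction failed:", err)
		return
	}
	fmt.Println(result["hello"])
}
```

Decoding into `map[string]any` keeps the client schema agnostic; if the schema is fixed, decoding straight into a struct with matching `json` tags works just as well.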

Programmatic Usage (Optional)

The API may change until all features are implemented.

Initializing the Mapper

Point it to your local llama.cpp instance:

import (
    "fmt"
    "net/http"

    "github.com/distantmagic/structured/structured"
    "github.com/distantmagic/paddler/llamacpp"
    "github.com/distantmagic/paddler/netcfg"
)

var entityExtractor *structured.EntityExtractor = &structured.EntityExtractor{
    LlamaCppClient: &llamacpp.LlamaCppClient{
        HttpClient: http.DefaultClient,
        LlamaCppConfiguration: &llamacpp.LlamaCppConfiguration{
            HttpAddress: &netcfg.HttpAddressConfiguration{
                Host:   "127.0.0.1",
                Port:   8081,
                Scheme: "http",
            },
        },
    },
    MaxRetries: 3,
}

Extracting Structured Data from String

import "github.com/distantmagic/structured/structured"

responseChannel := make(chan structured.EntityExtractorResult)

go entityExtractor.ExtractFromString(
    responseChannel,
    map[string]any{
        "type": "object",
        "properties": map[string]any{
            "name": map[string]string{
                "type": "string",
            },
            "surname": map[string]string{
                "type": "string",
            },
            "age": map[string]string{
                "description": "Age in years.",
                "type":        "integer",
            },
        },
    },
    "I am John Doe - living for 40 years and I still like to play chess.",
)

for result := range responseChannel {
    if result.Error != nil {
        panic(result.Error)
    }

    // e.g. map[age:40 name:John surname:Doe]
    fmt.Print(result.Result)
}
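The loop above panics on the first error. In a long-running service you may prefer to return the error to the caller instead. The following library-free sketch shows that pattern with a hypothetical `extractionResult` type standing in for `structured.EntityExtractorResult`. It assumes the producer closes the channel once it is done, which is what lets the `range` loop terminate; whether Structured does so is not confirmed here.

```go
package main

import (
	"errors"
	"fmt"
)

// extractionResult is a hypothetical stand-in for structured.EntityExtractorResult.
type extractionResult struct {
	Result map[string]any
	Error  error
}

// produce sends a single result and then closes the channel
// (assumption: the real extractor behaves the same way).
func produce(out chan<- extractionResult, fail bool) {
	defer close(out)

	if fail {
		out <- extractionResult{Error: errors.New("extraction failed")}
		return
	}
	out <- extractionResult{Result: map[string]any{"name": "John"}}
}

// collect reads results until the channel is closed and returns
// the first error it sees instead of panicking.
func collect(in <-chan extractionResult) ([]map[string]any, error) {
	var results []map[string]any

	for r := range in {
		if r.Error != nil {
			return nil, r.Error
		}
		results = append(results, r.Result)
	}
	return results, nil
}

func main() {
	ch := make(chan extractionResult)

	go produce(ch, false)

	results, err := collect(ch)
	fmt.Println(results, err)
}
```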

Mapping Extracted Result onto an Arbitrary Struct

Once you obtain the result:

import (
    "fmt"

    "github.com/distantmagic/structured/structured"
)

type myTestPerson struct {
    Name    string `json:"name"`
    Surname string `json:"surname"`
    Age     int    `json:"age"`
}

func DoUnmarshalsToStruct(result structured.EntityExtractorResult) {
    var person myTestPerson

    // Map the extraction result onto the struct.
    err := structured.UnmarshalToStruct(result, &person)

    if err != nil {
        panic(err)
    }

    fmt.Println(person.Name)    // John
    fmt.Println(person.Surname) // Doe
}
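If you work with the plain JSON returned by the HTTP server rather than with the Go library, a similar mapping can be done with the standard library alone. The sketch below round-trips a generic map through `encoding/json`. It is one common technique, not necessarily how `structured.UnmarshalToStruct` is implemented, and the `mapToStruct` helper and `person` type are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// person is an illustrative target type; the json tags must match the schema's property names.
type person struct {
	Name    string `json:"name"`
	Surname string `json:"surname"`
	Age     int    `json:"age"`
}

// mapToStruct copies a generic map onto target by encoding it to JSON
// and decoding it again (hypothetical helper, standard library only).
func mapToStruct(source map[string]any, target any) error {
	encoded, err := json.Marshal(source)
	if err != nil {
		return err
	}
	return json.Unmarshal(encoded, target)
}

func main() {
	extracted := map[string]any{"name": "John", "surname": "Doe", "age": 40}

	var p person
	if err := mapToStruct(extracted, &p); err != nil {
		panic(err)
	}

	fmt.Println(p.Name, p.Surname, p.Age) // John Doe 40
}
```

The round trip costs an extra encode and decode, but it reuses the struct's `json` tags and reports type mismatches (for example, a string where an integer is expected) as ordinary errors.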

Summary

That's it! :) You can use it as a language-agnostic server. Visit the repository and leave a star to show your support and get notified about new developments.
