DEV Community

ckmtools

Text Analysis in Go Without a Machine Learning Library

Go's standard library handles strings and Unicode well. strings.Fields, unicode.IsLetter, bufio.Scanner — you can build word count and basic stats without any third-party packages. Where the ecosystem gets thin is content quality metrics: readability grades, sentiment scoring, keyword extraction.
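To make that concrete, here's a minimal sketch of basic stats from the stdlib alone. Note the deliberately naive sentence detection — it just counts terminal punctuation, which is exactly the "careful about punctuation" caveat that comes up later:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// basicStats returns word, sentence, and letter counts using only
// the standard library. Sentence counting is naive: it treats every
// '.', '!', or '?' as a boundary, so abbreviations will overcount.
func basicStats(text string) (words, sentences, letters int) {
	words = len(strings.Fields(text))
	for _, r := range text {
		if unicode.IsLetter(r) {
			letters++
		}
		if r == '.' || r == '!' || r == '?' {
			sentences++
		}
	}
	if sentences == 0 {
		sentences = 1 // avoid division by zero downstream
	}
	return
}

func main() {
	w, s, l := basicStats("Go is fun. Really fun!")
	fmt.Println(w, s, l) // 5 2 16
}
```

Note that strings.Fields keeps trailing punctuation attached to words ("fun." is one field), which is fine for counting but matters if you later use the tokens for anything else.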

If you've worked with Python's textstat, textblob, or spacy, you've seen how much ground is already covered there. Go is a different story.

The Go NLP Landscape

Go does have some text processing packages worth knowing:

github.com/jdkato/prose is the most complete option. It handles tokenization, part-of-speech tagging, and named entity recognition. Solid for linguistic analysis, but it doesn't cover readability grades (Flesch-Kincaid, Gunning Fog, Coleman-Liau) or AFINN sentiment scoring.

Built-in strings and unicode packages get you word counts, sentence boundaries (if you're careful about punctuation), and character-level stats. You can compute a rough syllable count heuristic from there. But "rough" is doing a lot of work in that sentence — the standard readability formulas need accurate syllable counts, and Go has no widely-used syllabification package.
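To show just how rough "rough" is, here's the classic vowel-group heuristic plus the published Flesch-Kincaid grade formula built on top of it. The constants are the standard ones (0.39, 11.8, 15.59); the syllable counter is my own minimal sketch and will disagree with a dictionary-backed counter on plenty of words:

```go
package main

import (
	"fmt"
	"strings"
)

// countSyllables is a crude heuristic: count runs of consecutive
// vowels, then subtract a trailing silent 'e'. Wrong on many words
// ("queue", "cafe"), but close enough for ballpark readability.
func countSyllables(word string) int {
	word = strings.ToLower(word)
	count := 0
	prevVowel := false
	for _, r := range word {
		isVowel := strings.ContainsRune("aeiouy", r)
		if isVowel && !prevVowel {
			count++
		}
		prevVowel = isVowel
	}
	// crude silent-e adjustment: "make" -> 1, but keep "table" -> 2
	if strings.HasSuffix(word, "e") && !strings.HasSuffix(word, "le") && count > 1 {
		count--
	}
	if count == 0 {
		count = 1
	}
	return count
}

// fleschKincaidGrade applies the standard formula:
// 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
func fleschKincaidGrade(words, sentences, syllables int) float64 {
	w, s, sy := float64(words), float64(sentences), float64(syllables)
	return 0.39*(w/s) + 11.8*(sy/w) - 15.59
}

func main() {
	text := "Readability formulas need syllable counts."
	words := strings.Fields(text)
	syllables := 0
	for _, w := range words {
		syllables += countSyllables(strings.Trim(w, "."))
	}
	fmt.Printf("FK grade: %.1f\n", fleschKincaidGrade(len(words), 1, syllables))
}
```

The formula itself is three multiplications; the hard part is everything feeding it. Garbage syllable counts in, garbage grade out.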

The honest summary: Go NLP is early-stage compared to Python for content quality metrics specifically. If you need Flesch-Kincaid grade, SMOG index, sentiment polarity, and TF-IDF keywords from a single call, there's no Go package that covers all of that. You'd be writing it from scratch or stitching together multiple immature libraries.
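For a sense of what "from scratch" looks like, here's a naive keyword extractor — raw term frequency with a tiny stopword list, no IDF weighting, no stemming. It works on toy input and falls apart on real prose, which is the point:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// A token stopword list. A real implementation needs hundreds of
// entries, plus stemming so "run"/"running" aren't counted apart.
var stopwords = map[string]bool{
	"the": true, "a": true, "is": true, "and": true,
	"of": true, "to": true, "in": true,
}

// topKeywords ranks words by raw frequency — term frequency only,
// no IDF, so common-but-unremarkable words dominate on long texts.
func topKeywords(text string, n int) []string {
	freq := map[string]int{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, ".,!?;:\"'")
		if w == "" || stopwords[w] {
			continue
		}
		freq[w]++
	}
	words := make([]string, 0, len(freq))
	for w := range freq {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool {
		if freq[words[i]] != freq[words[j]] {
			return freq[words[i]] > freq[words[j]]
		}
		return words[i] < words[j] // stable alphabetical tiebreak
	})
	if n > len(words) {
		n = len(words)
	}
	return words[:n]
}

func main() {
	fmt.Println(topKeywords("Go is fast. Go is simple. Simplicity wins.", 2))
	// [go fast]
}
```

Proper TF-IDF needs a document corpus for the IDF half, which is exactly the kind of scope creep that turns "I'll just write it myself" into a maintenance burden.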

A REST API Sidestep

For content quality metrics, an HTTP endpoint sidesteps the library problem. The Go HTTP client is first-class — this pattern is idiomatic and unsurprising to anyone reading the code:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

type AnalysisResult struct {
    Readability struct {
        ConsensusGrade string  `json:"consensus_grade"`
        FleschKincaid  float64 `json:"flesch_kincaid_grade"`
    } `json:"readability"`
    Sentiment struct {
        Label string  `json:"label"`
        Score float64 `json:"score"`
    } `json:"sentiment"`
    Keywords struct {
        Top5 []string `json:"top_5"`
    } `json:"keywords"`
}

func analyzeText(apiKey, text string) (*AnalysisResult, error) {
    body, err := json.Marshal(map[string]string{"text": text})
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequest("POST", "https://api.ckmtools.dev/v1/analyze", bytes.NewBuffer(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", apiKey)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Don't silently decode an error body into the result struct.
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("analyze: unexpected status %s", resp.Status)
    }

    var result AnalysisResult
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return nil, err
    }
    return &result, nil
}

func main() {
    result, err := analyzeText(os.Getenv("TEXTLENS_KEY"), "Your content here...")
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }
    fmt.Printf("Grade: %s, Sentiment: %s\n",
        result.Readability.ConsensusGrade,
        result.Sentiment.Label,
    )
}

The struct tags match the JSON response directly. Add fields as you need them. If you want FleschKincaid, GunningFog, SMOG, and ColemanLiau, expand the Readability struct — they're all in the response.

When This Pattern Makes Sense

This is worth considering if you're:

  • Building a blog platform, CMS, or content review tool in Go and you need readability grades before publishing
  • Running automated content quality checks in a CI pipeline
  • Building a tool that auto-tags content with extracted keywords
  • Writing a Go service that wraps text analysis for downstream consumers

The key constraint is that you don't want to maintain a Python sidecar or pull in a large dependency for a feature that isn't your core product.

Honest Tradeoff

HTTP adds 20–100ms per request. For most editorial workflows — "analyze this article before it goes live" — that's fine. For interactive writing tools with keypress-level feedback, it's noticeable. For batch processing thousands of documents per minute, a local library would be faster if one existed.

That last part is the constraint. For high-throughput stream processing in Go, a local library would be the right call. Right now, the Go ecosystem doesn't have one that covers these metrics. So you're choosing between HTTP overhead and writing the implementation yourself.

Where to Find This

The TextLens API is in development — free tier at 1,000 requests/month. Waitlist is open at ckmtools.dev/api/ if this fits a project you're working on. Feedback on the Go client structure is welcome — I'm particularly curious whether the struct tag approach is the interface people actually want or whether a map-based response is more practical for dynamic field access.
