How to build a "smart search" for your Wiki using Go and Elasticsearch

#go #elasticsearch

Today there are many search engines built for different purposes. Google is the best-known example: it stands out for its advanced crawling and indexing techniques, among other technologies that make its searches extremely efficient.

In this post, we will learn how to build a smart search for your Wiki repository of Markdown files. It will not be as complex as Google, but with Elasticsearch it will be fast, useful and quite efficient.

Shall we?

To get started, the following software must be installed on your system (and, in the case of Elasticsearch, running):

Go
Elasticsearch

Note: this article does not cover how to install or configure this software. We assume it is ready to use.

I will be using ObsidianNotes - Cycro as the repository, a repository packed with knowledge collected and stored in .md files.

Before we start implementing, let me explain how the search engine will be structured.

Search engine

1 - Reading the Markdown files: we will walk through every .md file in the repository and extract the relevant content (title, body, tags, etc.).
2 - Indexing in Elasticsearch: each file will be turned into a JSON document and sent to Elasticsearch, which indexes it to allow fast and efficient searches.
3 - Searching and returning results: we will implement a simple function that queries Elasticsearch with keywords and returns the most relevant files.

Setting up the Go environment

Make sure you have Go installed, then run the following commands in a directory of your choice:

mkdir go-md-search
cd go-md-search
go mod init go-md-search

Next, let's install the official Elasticsearch client for Go:

go get github.com/elastic/go-elasticsearch/v9

Reading Markdown files

We can write a simple program that walks a folder and reads every .md file:

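package main

import (
    "fmt"
    "io/fs"
    "log"
    "os"
    "path/filepath"
)

func main() {
    home, err := os.UserHomeDir()
    if err != nil {
        log.Fatal(err)
    }
    root := filepath.Join(home, "ObsidianNotes") // folder where your files live

    // filepath.Glob expands neither "~" nor "**", so we walk the tree recursively.
    err = filepath.WalkDir(root, func(file string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if d.IsDir() || filepath.Ext(file) != ".md" {
            return nil
        }
        content, err := os.ReadFile(file)
        if err != nil {
            log.Println("Error reading file:", err)
            return nil
        }
        // Preview at most 100 bytes; slicing past the end of a short file would panic.
        preview := content
        if len(preview) > 100 {
            preview = preview[:100]
        }
        fmt.Println("File:", file)
        fmt.Println(string(preview), "...")
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
}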

Note that some files may be empty. We can easily skip them in Go before sending anything to Elasticsearch:

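package main

import (
    "fmt"
    "io/fs"
    "log"
    "os"
    "path/filepath"
    "strings"
    "unicode/utf8"
)

func main() {
    home, err := os.UserHomeDir()
    if err != nil {
        log.Fatal(err)
    }
    root := filepath.Join(home, "ObsidianNotes") // folder where your files live

    // filepath.Glob expands neither "~" nor "**", so we walk the tree recursively.
    err = filepath.WalkDir(root, func(file string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if d.IsDir() || filepath.Ext(file) != ".md" {
            return nil
        }

        content, err := os.ReadFile(file)
        if err != nil {
            log.Printf("Error reading file %s: %v\n", file, err)
            return nil
        }

        text := strings.TrimSpace(string(content))

        if len(text) == 0 || !utf8.Valid(content) {
            log.Printf("Skipping empty or invalid file: %s\n", file)
            return nil
        }

        // Strip NUL bytes: they are valid UTF-8 but useless in the index.
        clean := strings.ReplaceAll(text, "\x00", "")

        fmt.Printf("File: %s\n", file)
        fmt.Println(clean[:min(200, len(clean))], "...")
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
}

// min returns the smaller of two ints (Go 1.21+ also ships a built-in min).
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}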

Indexing the files in Elasticsearch

Now that we can read and clean our files, the next step is to send each document to Elasticsearch. Each Markdown file will be represented as a JSON document containing:

File path
File name
File content

Here is the implementation:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io/fs"
    "log"
    "os"
    "path/filepath"
    "strings"
    "unicode/utf8"

    elasticsearch "github.com/elastic/go-elasticsearch/v9"
)

// MarkdownDoc is the JSON document stored for each Markdown file.
type MarkdownDoc struct {
    Path    string `json:"path"`
    Name    string `json:"name"`
    Content string `json:"content"`
}

func main() {
    cfg := elasticsearch.Config{
        Addresses: []string{"http://localhost:9200"},
        Username:  "username",
        Password:  "password",
    }

    es, err := elasticsearch.NewClient(cfg)
    if err != nil {
        log.Fatal(err)
    }

    res, err := es.Indices.Create("wiki")
    if err != nil {
        log.Fatalf("Error creating index: %v", err)
    }
    if res.IsError() {
        log.Printf("Warning: index 'wiki' may already exist: %s\n", res.String())
    } else {
        fmt.Println("Index 'wiki' created successfully.")
    }
    res.Body.Close()

    // filepath.Glob does not support "**", so we walk the tree recursively.
    root := "/Users/josepaulomarinho/Documents/estudos/ObsidianNotes"
    err = filepath.WalkDir(root, func(file string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if d.IsDir() || filepath.Ext(file) != ".md" {
            return nil
        }

        content, err := os.ReadFile(file)
        if err != nil {
            return nil
        }

        text := strings.TrimSpace(string(content))
        if len(text) == 0 || !utf8.Valid(content) {
            return nil
        }

        clean := strings.ReplaceAll(text, "\x00", "")

        doc := MarkdownDoc{
            Path:    file,
            Name:    filepath.Base(file),
            Content: clean,
        }

        data, err := json.Marshal(doc)
        if err != nil {
            log.Printf("Error encoding %s: %v\n", file, err)
            return nil
        }

        res, err := es.Index("wiki", bytes.NewReader(data))
        if err != nil {
            log.Printf("Error indexing %s: %v\n", file, err)
            return nil
        }
        // Close each response right away: a defer here would keep every body
        // open until the whole walk has finished.
        failed, status := res.IsError(), res.Status()
        res.Body.Close()
        if failed {
            log.Printf("Elasticsearch rejected %s: %s\n", file, status)
            return nil
        }

        fmt.Println("Indexed:", file)
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
}

Note that we connect using basic authentication with a username and password. We then create the index if it does not exist yet, filter the files, skipping the invalid ones, and index the rest into the wiki index.

To check that the content was indexed, run the following curl command in your terminal (replace the placeholder credentials with your own):

curl -u username:password "http://localhost:9200/wiki/_search?pretty"
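One thing to keep in mind: when no document ID is given, Elasticsearch generates one, so running the indexer twice stores every file twice. A possible way around this, sketched below, is to derive a stable ID from the file path and pass it with the client's `WithDocumentID` option. The helper name `docID` and the choice of SHA-1 are assumptions made for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// docID derives a stable document ID from a file path, so that indexing the
// same file again overwrites the earlier document instead of adding a copy.
func docID(path string) string {
	sum := sha1.Sum([]byte(path))
	return hex.EncodeToString(sum[:])
}

func main() {
	id := docID("Concurrency/DistributedLocks.md")
	fmt.Println(len(id), id == docID("Concurrency/DistributedLocks.md"))

	// The ID could then be passed to the indexing call, for example:
	//   es.Index("wiki", bytes.NewReader(data), es.Index.WithDocumentID(id))
}
```

Hashing the path keeps the ID free of slashes and other characters that would need escaping in the request URL.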

Running searches in Elasticsearch

Now that our Markdown files are indexed in the wiki index, we can test searches directly against their content. Elasticsearch supports powerful queries that go beyond exact word matches and rank results by relevance and similarity.

1. Simple keyword query

We can build a simple query that searches for a term across all documents:

query := map[string]interface{}{
    "query": map[string]interface{}{
        "multi_match": map[string]interface{}{
            "query":  "locks",             // search term
            "fields": []string{"content"}, // fields to search
        },
    },
}

In this case I am searching for the word locks in the content field only.

multi_match - allows searching across several fields.
query - the term the user wants to find.
fields - specifies which document fields are searched (name, content, etc.).

2. Running the search in Go

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "log"

    elasticsearch "github.com/elastic/go-elasticsearch/v9"
)

func main() {
    cfg := elasticsearch.Config{
        Addresses: []string{"http://localhost:9200"},
        Username:  "username",
        Password:  "password",
    }

    es, err := elasticsearch.NewClient(cfg)
    if err != nil {
        log.Fatalf("Error creating client: %s", err)
    }

    query := map[string]interface{}{
        "query": map[string]interface{}{
            "multi_match": map[string]interface{}{
                "query":  "locks",
                "fields": []string{"content"},
            },
        },
    }

    var buf bytes.Buffer
    if err := json.NewEncoder(&buf).Encode(query); err != nil {
        log.Fatalf("Error encoding query: %s", err)
    }

    res, err := es.Search(
        es.Search.WithContext(context.Background()),
        es.Search.WithIndex("wiki"),
        es.Search.WithBody(&buf),
        es.Search.WithTrackTotalHits(true),
    )
    if err != nil {
        log.Fatalf("Error searching: %s", err)
    }
    defer res.Body.Close()

    // An error response has no "hits" object, so stop before the type assertions below.
    if res.IsError() {
        log.Fatalf("Elasticsearch returned an error: %s", res.String())
    }

    var r map[string]interface{}
    if err := json.NewDecoder(res.Body).Decode(&r); err != nil {
        log.Fatalf("Error decoding response: %s", err)
    }

    fmt.Printf("Results found: %d\n", int(r["hits"].(map[string]interface{})["total"].(map[string]interface{})["value"].(float64)))
    for _, hit := range r["hits"].(map[string]interface{})["hits"].([]interface{}) {
        source := hit.(map[string]interface{})["_source"].(map[string]interface{})
        fmt.Printf("- %s\n", source["name"])
    }
}

The output will be:

Results found: 1
- DistributedLocks.md

The code prints how many documents were found and the names of the matching files.

To make the search smarter, we could use a few Elasticsearch features to refine the results, such as: per-field boost (extra weight for titles), fuzzy search (tolerates typos) and highlight (shows the content excerpts that matched the search).

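An example with fuzzy search:

"multi_match": {
  "query": "locks",
  "fields": ["name^2", "content"],
  "fuzziness": "AUTO"
}

name^2 - gives the name field a higher weight (our documents store the file name in name and have no title field)
AUTO - tolerates small variations or typos, with the allowed edit distance chosen from the length of the term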

Web server

Below is a program that exposes a minimal web server putting together what we learned above:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"

    elasticsearch "github.com/elastic/go-elasticsearch/v9"
)

const MediaTypeApplicationJson = "application/json"

// MarkdownDoc is the document shape stored in the index and returned by the API.
type MarkdownDoc struct {
    Path    string `json:"path"`
    Name    string `json:"name"`
    Content string `json:"content"`
}

type MultiMatch struct {
    Query  string   `json:"query"`
    Fields []string `json:"fields"`
}

type Query struct {
    MultiMatch MultiMatch `json:"multi_match"`
}

type SearchQuery struct {
    Query Query `json:"query"`
}

var es *elasticsearch.Client

func searchHandle(w http.ResponseWriter, r *http.Request) {
    term := r.URL.Query().Get("q")
    if term == "" {
        http.Error(w, "Provide the 'q' parameter to search", http.StatusBadRequest)
        return
    }

    sq := SearchQuery{
        Query: Query{
            MultiMatch: MultiMatch{
                Query:  term,
                Fields: []string{"name", "content"},
            },
        },
    }

    var buf bytes.Buffer
    if err := json.NewEncoder(&buf).Encode(sq); err != nil {
        http.Error(w, "Error encoding query", http.StatusInternalServerError)
        return
    }

    res, err := es.Search(
        es.Search.WithContext(r.Context()), // cancel the search if the client disconnects
        es.Search.WithIndex("wiki"),
        es.Search.WithBody(&buf),
        es.Search.WithTrackTotalHits(true),
    )
    if err != nil {
        http.Error(w, fmt.Sprintf("Search error: %s", err), http.StatusInternalServerError)
        return
    }
    defer res.Body.Close()

    // An error response has no "hits" object, so report it instead of parsing it.
    if res.IsError() {
        http.Error(w, fmt.Sprintf("Elasticsearch returned an error: %s", res.Status()), http.StatusBadGateway)
        return
    }

    var rJSON map[string]interface{}
    if err := json.NewDecoder(res.Body).Decode(&rJSON); err != nil {
        http.Error(w, "Error decoding Elasticsearch response", http.StatusInternalServerError)
        return
    }

    // The two-value assertions leave zero values instead of panicking on missing fields.
    hits, _ := rJSON["hits"].(map[string]interface{})
    list, _ := hits["hits"].([]interface{})

    results := []MarkdownDoc{}
    for _, hit := range list {
        h, _ := hit.(map[string]interface{})
        source, _ := h["_source"].(map[string]interface{})
        name, _ := source["name"].(string)
        content, _ := source["content"].(string)
        path, _ := source["path"].(string)
        results = append(results, MarkdownDoc{Name: name, Content: content, Path: path})
    }

    w.Header().Set("Content-Type", MediaTypeApplicationJson)
    if err := json.NewEncoder(w).Encode(results); err != nil {
        log.Printf("Error writing response: %s", err)
    }
}

func main() {
    cfg := elasticsearch.Config{
        Addresses: []string{"http://localhost:9200"},
        Username:  "username",
        Password:  "password",
    }

    var err error
    es, err = elasticsearch.NewClient(cfg)
    if err != nil {
        log.Fatalf("Error creating client: %s", err)
    }

    http.HandleFunc("/search", searchHandle)
    fmt.Println("Search API running at http://localhost:8080/search?q=term")
    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}

The response will look like this:

[
  {
    "path": "~/ObsidianNotes/Concurrency/DistributedLocks.md",
    "name": "DistributedLocks.md",
    "content": "When we talk about Distributed Locks, we are mainly talking about distributed systems, before commenting on distributed Locks, we will talk about distributed systems.\n\n A distributed system is a collection of **independent  computers** (nodes) that work together and appear to the user as a single coherent system. Key characteristics:\n - Network Communication: nodes exchanges messages (they don't share memory directly)\n- Concurrency: multiple machines can execute tasks simultaneously\n- Partial failures: one machine can fail while others keep running (different from centralized systems).\n- Consistency and coordination: since data may be spread across nodes, the system must handle synchronization, consensus, and fault tolerante(e.g., Paxos or Raft algorithms)\n- Examples: distributes databases(Cassandra, MongoDb Cluster), distributed file systems (HDFS), coordination services (Zookeeper, Etcd), and cloud plataforms (AWS, Google Cloud)\n\nIn short:\n\u003e That’s a distributed system: **many people (computers) working together as if they were one.**\n\nNow, back to Distributed Locks. Distributed locks are sophisticated techniques used to synchronize access to shared resources in distributed environments, ensuring that only one process or node can perform a critical operation at a time, preventing conflicts, inconsistencies, and possible data corruption.\n\nIn distributed systems, multiple processes/nodes may attempt to access the same critical resource at the same time (e.g., updating a bank account balance, processing the same queue message, writing a file, start a schedule, job). Local locks (mutexes, semaphores) are insufficient because they only control concurrency within a single process or machine. For this we need a locking mechanism that works across multiple machines connected over the network.\n\n**A distributed lock must provide:**\n1. Mutual Exclusion\n\tOnly one client at a time can hold the lock.\n2. Deadlock prevention\n\tIf the client holding the lock crashes, the lock should eventually expire(lease or TTL)\n3. Availability\n\tEven under failures, the system should continue granting locks.\n4. Consistency\n\tNo two clients should ever believe they both hold the same lock.\n5. Fairness(**optional**)\n\tLocks can be granted in the order of requests (queue-based fairness).\n\n**Commons implementations**\n\n###### *Redis* (e.g., Redlock Algorithm)\n- Each client sets a key with a TTL.\n- If the client can successfully set the key in a majority of Redis nodes, it assumes the lock.\n- TTL ensures auto-release if the client crashes.\n\n**ASCII flow**:\n\n```sh\nClient A tries to lock \"resource-1\"\n ├──\u003e Redis Node 1: OK\n ├──\u003e Redis Node 2: OK\n └──\u003e Redis Node 3: FAIL (timeout)\n\nMajority acquired (2/3) → Client A holds the lock\n```\n\n###### *Zookeeper*\n- Uses ephemeral znodes: each client creates a temporary node.\n- The client with the smallest sequence number holds the lock.\n- If the client disconnects, its znode disappears automatically.\n\n**ASCII flow**:\n\n```sh\n/locks/resource-1/\n  ├── lock-0001 (Client A)  \u003c-- LOCK OWNER\n  ├── lock-0002 (Client B)\n  └── lock-0003 (Client C)\n```\n\n\u003e If the **Client A** crashes, Zookeeper deletes lock-0001, and **Client B** becomes the new owner\n\n###### *Etcd*\n- Uses leases (time-limited key ownership).\n- Clients acquire a key with PUT key --lease=10s.\n- When the leases expires (or client crases), the key is removed\n- Watchers notify other clients that the lock is free.\n\n\n##### *General Architecture*\n\n![[architecture-distributed-locks-system.png]]\n\n\u003e- All clients attempt to acquire a lock on resource X. \n\u003e- Only one  succeeds at a time.\n\u003e- The lock expires unless it is renewed (lease).\n\n\n**Summary**\n\nDistributed locks are a coordination mechanism in distributed systems, typically implemented with a middleware like Redis, Zookeeper, Etcd, or Consul. They guarantee **mutual exclusion**, **fault tolerance, and consistency** when multiple processes compete for shared resources."
  }
]
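As the response shows, the API returns the full content of every note, which can get large. One option, sketched here as an assumption rather than part of the server above, is to return only a short preview. The hypothetical `snippet` helper cuts on rune boundaries so that multi-byte characters are never split.

```go
package main

import "fmt"

// snippet returns at most maxRunes characters of text, appending "..."
// when the text was shortened. It counts runes, not bytes, so multi-byte
// UTF-8 characters are kept whole.
func snippet(text string, maxRunes int) string {
	runes := []rune(text)
	if len(runes) <= maxRunes {
		return text
	}
	return string(runes[:maxRunes]) + "..."
}

func main() {
	content := "When we talk about Distributed Locks, we are mainly talking about distributed systems"
	fmt.Println(snippet(content, 37))
}
```

In the handler, `Content: snippet(content, 200)` would keep each result small while still giving the reader some context.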