kamal

How LLMs Automate Manual Work and Enable Low-Code Solutions

Manual work slows down teams. Writing code for every small task wastes time. That’s where LLMs (Large Language Models) help.

In an era where efficiency and scalability are paramount, organizations are actively seeking ways to minimize manual processes and accelerate digital transformation. Enter Large Language Models (LLMs) — advanced AI systems capable of understanding and generating human-like text. These models are reshaping how developers and non-technical professionals interact with data, systems, and even code.

This shift is especially significant in low-code and no-code development environments, where the goal is to simplify complex workflows. Instead of hand-coding every step, teams can now rely on intelligent prompts and pre-trained models to automate repetitive tasks, analyze documents, generate structured data formats like JSON, and more.

LLMs read and write like people do. You can give them a simple request, and they return useful results. They can read documents, find what matters, and write code or JSON for you.

In this blog post, we explore a practical Python example demonstrating how LLMs, combined with vector databases and schema-aware embeddings, can drastically reduce manual effort. We’ll walk through each part of the code—what it does, why it’s important, and how it contributes to a fully automated low-code pipeline. From document preprocessing and chunking to embedding with HuggingFace and querying using ChromaDB, we’ll show how you can integrate LLMs to extract structured information effortlessly.

Whether you’re a data engineer, low-code developer, or curious technologist, this guide will clarify how modern language models are turning everyday coding tasks into streamlined workflows.

Computers need data in a specific format. People use everyday language. This gap often creates manual work. AI can bridge this gap. It turns simple words into structured data. This saves time and effort.

Enough theory. Let's see how it works with a Python script.

import os
import pdfplumber
import requests
import json
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_core.documents import Document
from typing import List
from langchain_chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter, SpacyTextSplitter

# Replace with your real token; in practice, load it from a secrets manager or .env file instead of hardcoding it.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_API_TOKEN"
token = os.environ["HUGGINGFACEHUB_API_TOKEN"]

Giving the AI Knowledge

The AI must know about our database. To do this, we will teach it the database column names.

def load_document():
    valid_pages: List[Document] = [] # Explicitly type hint for clarity

    column_names = [
        "pk_subscription",
        "fck_customer_account",
        "fck_subscription_plan",
        "ek_subscription_state",
        "ak_subscription_id",
        "ck_start_date",
        "end_date",
        "billing_interval",
        "subscription_price",
        "subscription_feature_price",
        "subscription_tracking_id",
        "cancellation_date",
        "cancellation_to_date",
        "created_by",
        "created_on",
        "updated_by",
        "updated_on",
    ]

    # Iterate through each column name and create a separate Document for it
    for i, col_name in enumerate(column_names):
        valid_pages.append(
            Document(
                page_content=col_name,
                metadata={"column_index": i, "column_name": col_name, "source": "schema_definition"}
            )
        )
    return valid_pages

# Example of how you would then use it
documents = load_document()
print("Documents loaded:", documents)
  • We listed our database column names. Each name is now a small document. This builds a knowledge base for the AI to search.

Turning Words into Numbers

Computers do not understand words. They work with numbers. We need to translate our column names into a number format. This process is called creating embeddings.

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2",
    task="feature-extraction",
    huggingfacehub_api_token=token,
)
print("embeddings: ", embeddings)
documents = load_document()
  • We use an AI model to convert each column name into a list of numbers. This helps the AI grasp the meaning of words. It can now see that a phrase like "cancel date" is very similar to the column cancellation_date.
  • An embedding is a numerical representation (a vector of numbers) of a discrete object (like a word or a category) that captures its meaning and its relationships with other objects in a form computers can process.
  • For more info, see the Hugging Face API docs.
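To build intuition for why a phrase like "cancel date" lands close to cancellation_date, here is a minimal, dependency-free sketch of cosine similarity, the measure typically used to compare embeddings. The vectors below are made-up toy values for illustration; real embeddings from all-mpnet-base-v2 have 768 dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (illustrative values, not real model output).
cancel_date = [0.9, 0.1, 0.8, 0.2]
cancellation_date = [0.85, 0.15, 0.75, 0.25]
billing_interval = [0.1, 0.9, 0.2, 0.8]

print(cosine_similarity(cancel_date, cancellation_date))  # close to 1.0
print(cosine_similarity(cancel_date, billing_interval))   # much lower
```

Vectors pointing in nearly the same direction score close to 1.0; unrelated ones score much lower. This is the comparison the vector database runs for us at query time.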

Breaking Down Knowledge: Document Chunking

Before we turn words into numbers, we need to handle how we feed information to the AI, especially for larger documents. While our current example uses small column names, in real-world scenarios, you'd be processing much larger texts like PDFs or articles. LLMs and vector databases work best with smaller, manageable pieces of information.

if not documents:
    print("No valid documents found to process. Exiting.")
else:
    print(f"Loaded {len(documents)} documents.")

    # Split documents into smaller chunks (important for RAG and efficient retrieval)
    # This helps in handling large documents and ensures that each chunk is semantically coherent for better retrieval.

    text_splitter = SpacyTextSplitter(
        chunk_size=10,        # The maximum size of each chunk
        chunk_overlap=0,      # The overlap between chunks to maintain context
    )
    # The split_documents method takes a list of Document objects and returns
    # a new list of smaller Document objects.
    chunked_documents = text_splitter.split_documents(documents)
    print(f"Split documents into {len(chunked_documents)} chunks.")
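Under the hood, character-based splitters do something like the following. This is a simplified, dependency-free sketch of sliding-window chunking with overlap, not the actual LangChain implementation:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters over the text,
    # stepping forward by (chunk_size - chunk_overlap) each time
    # so consecutive chunks share some context at their edges.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

text = "Subscriptions can be cancelled at any time before the billing date."
for c in chunk_text(text, chunk_size=30, chunk_overlap=10):
    print(repr(c))
```

Real splitters are smarter: they prefer to break on sentence or paragraph boundaries so each chunk stays semantically coherent, which is exactly why we reach for a library instead of this naive version in production.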

Building a Searchable AI Memory

Now we store these numbers in a special database. It is called a vector database. It is built for fast searching.

Create an in-memory vector database from our documents.

vectordb = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
)
  • This code creates a fast, temporary database. It stores the number versions of our column names. This gives the AI a searchable memory to find information instantly.
  • It's the highly optimized storage and search engine for the meaningful numerical representations (embeddings) of your data, enabling AI to find and understand similar information rapidly.
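Conceptually, a vector database does something like this. The sketch below is a naive in-memory version for illustration only; Chroma adds indexing, persistence, metadata filtering, and approximate search on top. The toy 2-dimensional vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class NaiveVectorStore:
    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def similarity_search(self, query_vector, k=2):
        # Rank every stored vector by similarity to the query and return the top k texts.
        ranked = sorted(self.items, key=lambda item: cosine(item[1], query_vector), reverse=True)
        return [text for text, _ in ranked[:k]]

store = NaiveVectorStore()
store.add("cancellation_date", [0.9, 0.1])
store.add("billing_interval", [0.1, 0.9])
store.add("ck_start_date", [0.7, 0.4])

print(store.similarity_search([0.95, 0.05], k=2))  # ['cancellation_date', 'ck_start_date']
```

A linear scan like this is fine for seventeen column names; real vector databases use approximate nearest-neighbor indexes so the same lookup stays fast over millions of entries.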

From Request to Result

The idea is simple: a person makes a request in plain English. The system finds the right data and tells the AI what to do.

# The user's request in normal language.
query = "the cancel date is jun 15 and start date is jan 1 all are year 2025 make a json using this information"
print(f"\nPerforming similarity search for the query: '{query}'")
# Find the most relevant columns from our AI memory.
docs = vectordb.similarity_search(query, k=4)
print("Similarity search complete.")

if docs:
    print(f"\nFound {len(docs)} relevant document chunks:")
    for i, doc in enumerate(docs):
        print(f"\n--- Relevant Document Chunk {i+1} ---")
        print(f"Content: {doc.page_content}")

    # --- NEW CODE TO INTERACT WITH LOCAL LLM ---
    print("\n--- SENDING DATA TO LOCAL LLM ---")

    # 1. Gather the context from the retrieved documents
    context = "\n".join([doc.page_content for doc in docs])

    # 2. Define the system prompt and the user query
    system_prompt = "Your output MUST be a JSON object. Do not include any other text or explanation in your response. Based on the following database columns, identify the most relevant ones for the user's query."
    user_query = f"User Query: '{query}'\n\nRelevant Columns:\n{context}"

    # 3. Construct the full prompt for the model
    full_prompt = f"{system_prompt}\n\n{user_query}"

    # 4. Format the JSON payload for the API request
    api_url = "http://localhost:11434/api/generate"
    payload = {"model": "qwen3:0.6b", "prompt": full_prompt, "stream": False}

    print("\nSending the following payload to the local model:")
    print(json.dumps(payload, indent=2))

    try:
        # 5. Send the POST request to the local model
        response = requests.post(api_url, json=payload, timeout=60)  # Fail fast if the local server hangs
        response.raise_for_status()  # This will raise an exception for bad status codes (4xx or 5xx)

        # 6. Process the response
        response_json = response.json()

        # The actual generated content is often in a 'response' or 'content' key.
        model_output_str = response_json.get("response", "")

        print("\n--- RESPONSE FROM LOCAL LLM ---")
        print(f"Raw model output string: {model_output_str}")

        # --- NEW & IMPROVED JSON EXTRACTION LOGIC ---
        if not model_output_str:
            print("Model returned an empty response.")
        else:
            try:
                # Find the first occurrence of '{' and the last occurrence of '}'
                start_index = model_output_str.find("{")
                end_index = model_output_str.rfind("}")

                if start_index != -1 and end_index != -1 and end_index > start_index:
                    # Extract the potential JSON substring
                    json_substring = model_output_str[start_index : end_index + 1]

                    # Now, try to parse this cleaned substring
                    structured_output = json.loads(json_substring)

                    print("\nSuccessfully extracted and parsed JSON output:")
                    print(json.dumps(structured_output, indent=2))
                else:
                    print(
                        "\nError: Could not find a valid JSON object within the model's output."
                    )

            except json.JSONDecodeError:
                print(
                    "\nError: The model's output contained a string that looked like JSON, but was invalid."
                )
                print(f"Attempted to parse: {json_substring}")
    except requests.exceptions.RequestException as e:
        print(f"Request to the local model failed: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


else:
    print("No relevant documents found to pass to the local model.")

print("\n--- LOCAL LLM INTERACTION COMPLETE ---")
  • The script takes the user's query. It searches the database for the most similar column names. It then builds a full prompt for the AI. The prompt includes the query and the relevant columns as context.
  • This process automates the work. The system interprets a simple sentence. It then generates structured data without any manual coding.
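The JSON-extraction logic in the script is worth factoring into a small helper, since local models often wrap their JSON in extra commentary. A sketch:

```python
import json

def extract_json(text: str):
    """Extract the first top-level JSON object found in a model's raw output.

    Returns the parsed dict, or None if no valid JSON object is present.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

raw = 'Sure, here is your JSON: {"cancellation_date": "2025-06-15"} Hope that helps!'
print(extract_json(raw))  # {'cancellation_date': '2025-06-15'}
```

Returning None instead of raising keeps the calling code simple: one check covers both "no JSON at all" and "looked like JSON but would not parse."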

The Final Output

After the script runs, the AI sends back a clean JSON object. It understands the dates and the correct fields from the context we provided.

{
  "cancellation_date": "2025-06-15",
  "ck_start_date": "2025-01-01"
}
  • This is the power of using AI to reduce manual effort. A simple request becomes structured data that a computer can use immediately. This makes work faster and easier for everyone.

Disadvantages

While LLMs offer exciting benefits for reducing manual work and enabling low-code solutions, directly linking them to databases or internal data sources comes with significant challenges and risks.

  • Data leaks: sensitive records can end up in prompts, logs, or model outputs.
  • Prompt attacks: crafted inputs can trick the model into ignoring its instructions.
  • Poor accuracy: the model can hallucinate fields or misread values.
  • Debugging issues: non-deterministic output makes failures hard to reproduce.
  • Performance lag: every request adds embedding and inference latency.

Mitigation Strategies

To address these risks, developers often employ strategies like:

  • Robust Access Controls: Implementing strict role-based access control (RBAC) and limiting what data the LLM can access.
  • Data Masking/Anonymization: Hiding or replacing sensitive data with fake values, especially in non-production environments.
  • Input Validation and Output Sanitization: Thoroughly checking what goes into and comes out of the LLM to prevent malicious inputs or problematic outputs.
  • Contextual Guardrails: Providing the LLM with very specific instructions and constraints to guide its behavior and limit its "agency."
  • Human-in-the-Loop (HITL): Keeping a human in the loop for critical decisions or data modifications suggested by the LLM.
  • Vector Databases (as in our example): Using a vector database as an intermediary. Instead of giving the LLM direct access to the entire database, you feed it only relevant "chunks" (embeddings) from the vector store, reducing the exposure of the raw database.
  • Fine-tuning and Retrieval-Augmented Generation (RAG): Instead of direct database access, LLMs are fine-tuned on specific datasets or use RAG to retrieve relevant information from controlled sources.
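As a concrete example of the masking strategy, sensitive values can be scrubbed before any text reaches the LLM. This is a minimal regex-based sketch; the patterns and placeholder names are illustrative, not a complete PII solution:

```python
import re

def mask_sensitive(text: str) -> str:
    # Replace email addresses and 16-digit card-like numbers with placeholders
    # before the text is sent to an external or local LLM.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)
    text = re.sub(r"\b(?:\d[ -]?){15}\d\b", "<CARD_NUMBER>", text)
    return text

prompt = "Customer jane.doe@example.com paid with 4111 1111 1111 1111, cancel on jun 15."
print(mask_sensitive(prompt))
```

The masked text still carries everything the model needs ("cancel on jun 15") while the identifying details never leave your system.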

Conclusion

As organizations strive for greater efficiency and agility, integrating Large Language Models into low‑code pipelines offers a transformative shortcut around tedious, error‑prone manual coding.

By converting natural‐language requests into structured embeddings and leveraging vector databases for rapid retrieval, teams can automate the mapping between human intent and machine‑readable formats—freeing developers to focus on higher‑value work.

While challenges such as data security, model accuracy, and performance must be carefully managed through access controls, validation, and human oversight, the payoff is clear: streamlined workflows, faster time to insight, and a democratized path for non‑technical stakeholders to interact directly with complex systems.

Embracing this paradigm—where prompts replace boilerplate and intelligent search replaces hand‑crafted queries—sets the stage for truly automated, scalable solutions and ushers in a new era of low‑code innovation.

Extras

Full code
