DEV Community

James Miller

9 Python Libraries That Reduced a One-Month AI Development Cycle to Just 3 Days

It's no secret that Python reigns supreme in the AI space. But to build an application that can call both OpenAI and Claude, you usually end up writing thick wrapper layers. I even used to hand-write regexes to parse messy PDFs sent by clients. The result? A bloated codebase and skyrocketing maintenance costs.

Today, I’m sharing 9 Python libraries that significantly reduce boilerplate code, covering the entire pipeline from data ingestion to model evaluation.

Setting Up the Development Environment

Before diving into these libraries, a stable and easy-to-manage foundation is crucial. For novice programmers, bouncing between different versions, virtual environments, and dependency conflicts can easily waste half a day.

You can use ServBay to deploy your Python environment with a single click. Whether you need to switch versions or manage databases, it only takes a few mouse clicks. This philosophy of liberating developers from trivial configurations aligns perfectly with the logic of the tools I’m about to share.

Once your environment is ready, pick the tools below based on your specific needs.

1. LiteLLM: Unified Multi-Platform Model Calling

Different vendors have different API standards. To compare the outputs of GPT, Claude, or Llama, I used to write three sets of request logic and three sets of error handling. LiteLLM eliminates this issue by standardizing these interfaces, allowing for seamless switching.


from litellm import completion

# Whether calling GPT-4 or Claude, the logic is exactly the same
def ask_ai(model_name, prompt):
    res = completion(
        model=model_name,
        messages=[{"role": "user", "content": prompt}]
    )
    return res.choices[0].message.content

# Switching models just requires changing a string
print(ask_ai("gpt-4o", "What is RAG?"))
print(ask_ai("claude-3-5-sonnet-20240620", "What is RAG?"))

This drastically lowers code coupling. Just keep in mind for production: when a vendor updates specific parameters, you might have to wait a bit for LiteLLM to adapt.
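Transient failures and rate limits are the other production concern. Since LiteLLM surfaces vendor errors as exceptions much like the underlying SDKs, a small retry wrapper keeps call sites clean. This is a hypothetical helper of my own, not part of LiteLLM:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage with the ask_ai helper above:
# answer = with_retries(lambda: ask_ai("gpt-4o", "What is RAG?"))
```

In real code you would catch only the retryable exception types rather than bare `Exception`, but the shape of the wrapper stays the same.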

2. MarkItDown: Multi-Format Document to Markdown

Parsing documents used to give me a massive headache. To handle Word, Excel, and PDF files, I had to install a different library for each format and handle each one's distinct failure modes. Thanks to Microsoft's MarkItDown, all documents are unified into the LLM's favorite format: Markdown.

from markitdown import MarkItDown

md_converter = MarkItDown()

# Parse PDF or Excel
doc_result = md_converter.convert("annual_report.pdf")
table_result = md_converter.convert("budget.xlsx")

print(doc_result.text_content)

It does a great job preserving headers and table structures, cutting down data cleaning workloads. However, it mainly processes the text layer, so results might fluctuate for scanned documents or complex image-based tables.
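Because of that text-layer limitation, it's worth sanity-checking the output before feeding it downstream. A crude but effective heuristic (a hypothetical helper, assuming you route flagged files to an OCR step of your choosing) is to check how much text actually came out:

```python
def looks_scanned(text_content: str, min_chars: int = 200) -> bool:
    """Heuristic: a converted PDF with almost no text is likely image-only."""
    return len(text_content.strip()) < min_chars

# result = md_converter.convert("annual_report.pdf")
# if looks_scanned(result.text_content):
#     print("Probably a scanned PDF: route it to OCR instead")
```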

3. LlamaIndex: The Data-to-LLM Framework

Formerly known as GPT Index, LlamaIndex focuses on solving the problem of connecting private data to LLMs. It provides a complete workflow from document reading and index building to query interfaces.


from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Automatically read all documents in a directory and build an index
data_docs = SimpleDirectoryReader("./docs").load_data()
data_index = VectorStoreIndex.from_documents(data_docs)

# Quickly create a query engine
engine = data_index.as_query_engine()
print(engine.query("Summarize the core arguments of the document"))

It performs robustly when handling complex document structures and building RAG systems, making it one of the mainstream data frameworks today.

4. PydanticAI: Type-Safe Agent Development

In the past, I was constantly begging the AI to return JSON, only for it to start its response with "Sure, here is your JSON," crashing my parsing script. PydanticAI solves this by strictly defining data boundaries.

from pydantic import BaseModel
from pydantic_ai import Agent

class AnalysisResult(BaseModel):
    summary: str
    score: float

analysis_agent = Agent(
    "openai:gpt-4o",
    output_type=AnalysisResult,  # current API; older versions used result_type
    system_prompt="Analyze user feedback and provide a score"
)

result = analysis_agent.run_sync("This feature is great and improved my workflow")
print(result.output.summary)

It essentially turns an unpredictable AI call into a type-safe function call.
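Under the hood, that guarantee is schema validation. With plain Pydantic you can see exactly the failure mode this protects against, re-declaring the `AnalysisResult` model for a self-contained sketch:

```python
from pydantic import BaseModel, ValidationError

class AnalysisResult(BaseModel):
    summary: str
    score: float

good = '{"summary": "Positive feedback", "score": 0.9}'
bad = 'Sure, here is your JSON: {"summary": "..."}'  # the chatty preamble problem

result = AnalysisResult.model_validate_json(good)
print(result.score)  # 0.9

try:
    AnalysisResult.model_validate_json(bad)
except ValidationError:
    print("Rejected at the boundary instead of crashing downstream")
```

PydanticAI goes further by feeding validation errors back to the model for a retry, but the contract itself is just this.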

5. Marvin: Encapsulate AI Capabilities as Functions

If I just need a simple classification or extraction feature, I don't want to write complex Prompts. Marvin allows me to write AI logic just like a standard Python function, making it perfect for classification, extraction, or generation tasks.

import marvin

@marvin.fn
def generate_tags(description: str) -> list[str]:
    """
    Generate 3 tags based on the product description
    """

tags = generate_tags("High-performance all-aluminum laptop, supports fast charging")
print(tags) # Outputs something like ['Tech', 'Office', 'Portable']

This approach allows AI capabilities to be integrated into existing systems with minimal intrusion.

6. Haystack: End-to-End Retrieval Pipelines

Haystack is built for large-scale search systems. It supports various vector databases (like Qdrant and Elasticsearch) and lets you snap together retrieval, ranking, and filtering components like Lego bricks.


from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Assemble pipeline nodes
query_pipeline = Pipeline()
query_pipeline.add_component("prompt_builder", PromptBuilder(template="Answer: {{query}}"))
query_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))
query_pipeline.connect("prompt_builder", "llm")

res = query_pipeline.run({"prompt_builder": {"query": "How do I learn Python?"}})
print(res["llm"]["replies"][0])

For applications that need to process massive amounts of documents and implement semantic search, Haystack offers excellent scalability.

7. tiktoken: Accurate Token Consumption Tracking

I once spent $1.50 on a single API call because a recursive logic error generated a massive Prompt. I learned my lesson: always calculate the cost before sending the request. tiktoken is an incredibly fast tokenizer commonly used to estimate costs for OpenAI models.

import tiktoken

tokenizer = tiktoken.encoding_for_model("gpt-4")
content = "Number of tokens in this test text"
token_list = tokenizer.encode(content)

print(f"Token count: {len(token_list)}")

This gives me real-time control over costs, ensuring I don't get a heart attack from the bill at the end of the month.
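Turning a token count into dollars is simple arithmetic once you know the per-million-token prices. A minimal sketch, with placeholder prices (check your vendor's current pricing page, these are assumptions for illustration):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimate a request's cost in USD from per-million-token prices."""
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

# e.g. a 1,200-token prompt expecting ~300 tokens back,
# at hypothetical $2.50 / $10.00 per million input/output tokens
cost = estimate_cost(1200, 300, input_price=2.50, output_price=10.00)
print(f"${cost:.4f}")  # $0.0060
```

Combined with `len(tokenizer.encode(content))` from the snippet above, this lets you gate oversized requests before they ever hit the API.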

8. FAISS: Highly Efficient Vector Similarity Search

When handling search queries across hundreds of thousands of records, standard linear searches will freeze your app. FAISS, an open-source vector library by Meta, can find the most relevant snippets from hundreds of millions of vectors in mere milliseconds.

import faiss
import numpy as np

# Initialize index
vector_dim = 64
search_index = faiss.IndexFlatL2(vector_dim)

# Simulate adding vector data
mock_data = np.random.random((1000, vector_dim)).astype('float32')
search_index.add(mock_data)

# Execute search
distances, results = search_index.search(mock_data[:1], 3)

It is the absolute benchmark tool in the vector retrieval space, performing exceptionally well, especially in localized deployments.
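To see what `IndexFlatL2` is doing under the hood, and why approximate indexes exist at all, here is the same exhaustive linear scan in plain NumPy, a sketch for intuition rather than something you'd ship:

```python
import numpy as np

def brute_force_search(index_vectors, query, k=3):
    """Linear scan: squared L2 distance to every stored vector."""
    dists = ((index_vectors - query) ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]
    return dists[nearest], nearest

rng = np.random.default_rng(0)
data = rng.random((1000, 64)).astype("float32")

d, idx = brute_force_search(data, data[0])
print(idx[0])  # the query vector is its own nearest neighbor: 0
```

This is O(n) per query, which is why at hundreds of millions of vectors you reach for FAISS's approximate index types (IVF, HNSW) that trade a little recall for orders-of-magnitude speedups.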

9. Pydantic Evals: Prompt Regression Testing

I used to tweak prompts based on gut feeling, run a couple of examples, and push to production if it looked okay. Inevitably, fixing one bug would spawn three new ones. Pydantic Evals lets me run automated regression tests to verify model performance against preset cases.


from pydantic_evals import Case, Dataset

# Define test dataset
eval_dataset = Dataset(
    cases=[
        Case(inputs="Extract company name: Microsoft released a new OS", expected_output="Microsoft"),
    ]
)

# Run evaluation and view report (your_extract_function is the task under test)
report = eval_dataset.evaluate_sync(your_extract_function)
report.print()

Having this kind of determinism is a prerequisite for developing any production-grade application.
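Stripped to its essence, the workflow is just "run a fixed suite, compute a score". A stdlib sketch of that loop, with a hypothetical `fake_extract` standing in for the real LLM call (the actual library adds custom evaluators, concurrency, and reporting on top):

```python
def run_regression(cases, task):
    """Run task over (input, expected) pairs; return the exact-match pass rate."""
    passed = sum(1 for inp, expected in cases if task(inp) == expected)
    return passed / len(cases)

cases = [
    ("Extract company name: Microsoft released a new OS", "Microsoft"),
    ("Extract company name: Apple unveiled a chip", "Apple"),
]

def fake_extract(text):  # stand-in for your real extraction call
    return text.rsplit(": ", 1)[1].split()[0]

print(run_regression(cases, fake_extract))  # 1.0
```

A pass rate below your threshold fails the CI run, which is exactly the regression safety net gut-feeling prompt tweaking lacks.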

Conclusion

LiteLLM unifies your interfaces, MarkItDown simplifies document processing, and PydanticAI guarantees your output structure. Integrating these libraries made my development efficiency skyrocket: work that used to fill a month now ships in a few days.
