<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TrustGraph</title>
    <description>The latest articles on DEV Community by TrustGraph (@trustgraph).</description>
    <link>https://dev.to/trustgraph</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9734%2F47f9c211-bbe2-429e-964d-2a067faa1108.png</url>
      <title>DEV Community: TrustGraph</title>
      <link>https://dev.to/trustgraph</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trustgraph"/>
    <language>en</language>
    <item>
      <title>Part 3: How TrustGraph's Knowledge Cores End the Memento Nightmare</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Mon, 28 Apr 2025 16:22:54 +0000</pubDate>
      <link>https://dev.to/trustgraph/part-3-how-trustgraphs-knowledge-cores-end-the-memento-nightmare-36ai</link>
      <guid>https://dev.to/trustgraph/part-3-how-trustgraphs-knowledge-cores-end-the-memento-nightmare-36ai</guid>
      <description>&lt;p&gt;In &lt;a href="https://blog.trustgraph.ai/p/the-memento-problem-with-ai-memory" rel="noopener noreferrer"&gt;Parts 1&lt;/a&gt; and &lt;a href="https://blog.trustgraph.ai/p/why-your-ai-is-stuck-in-a-memento-loop" rel="noopener noreferrer"&gt;2&lt;/a&gt;, we exposed the dangerous flaw in most current AI "memory": like Leonard Shelby in Memento, our systems often operate on disconnected fragments, unable to form the interconnected knowledge needed for reliable reasoning. We saw how this reliance on context-stripped, relationship-blind, and provenance-oblivious data dooms AI to a cycle of confident errors and hallucinations, just as Leonard's fragmented note system led him down dangerous paths.&lt;/p&gt;

&lt;p&gt;So, how do we break the loop? How do we give AI the ability to truly know, not just recall fragments? The answer isn't a slightly better system of Polaroids and notes. The answer is to build the integrated, structured understanding Leonard tragically lacked: a Knowledge Core.&lt;/p&gt;

&lt;p&gt;This is precisely what TrustGraph, the AI Provisioning Platform, delivers through its advanced TrustRAG engine. It moves beyond the limitations of fragmented recall by architecting genuine knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mapping the Connections (Solving Relationship Blindness): Unlike Leonard staring at isolated clues, TrustGraph automatically builds a Knowledge Graph (KG). It doesn't just store facts; it explicitly maps the relationships between them (e.g., "Person X works for Company Y," "Event A caused Event B"). This Knowledge Graph is the coherent narrative structure Leonard couldn't form – the understanding of how things connect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Delivering Contextualized Scenes (Solving Context Stripping): Leonard reviewed one Polaroid at a time, losing the big picture. TrustRAG uses a hybrid retrieval process. Vector search identifies relevant starting points within the Knowledge Graph, but then TrustRAG traverses the graph connections, constructing a subgraph of related entities and relationships. Instead of isolated fragments, the LLM receives a connected scene – a relevant slice of the knowledge core with inherent local context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verifying the Clues (Addressing Provenance Oblivion): Leonard couldn't be sure when or why he wrote his notes. TrustGraph's Knowledge Graph architecture is designed to incorporate provenance metadata directly with the facts and relationships it stores (source, timestamp, reliability). TrustRAG can then leverage this, allowing the AI to weigh information based on its origins, escaping the trap of treating all retrieved fragments as equally trustworthy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
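&lt;p&gt;To make the hybrid retrieval pattern concrete, here is a minimal Python sketch of the flow described above: a similarity search picks seed nodes, then a graph traversal expands them into a connected subgraph with provenance attached. The data model, function names, and keyword-overlap "similarity" are illustrative assumptions for the demo, not TrustGraph's actual API.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical in-memory knowledge core: (subject, predicate, object) triples,
# each carrying provenance metadata. Names are illustrative only.
TRIPLES = [
    ("Person X", "works_for", "Company Y", {"source": "hr_db", "year": 2024}),
    ("Company Y", "acquired", "Company Z", {"source": "news", "year": 2023}),
    ("Event A", "caused", "Event B", {"source": "report", "year": 2022}),
]

def seed_nodes(query, triples):
    """Stand-in for vector search: score nodes by word overlap with the query."""
    words = set(query.lower().split())
    scores = defaultdict(int)
    for s, _, o, _ in triples:
        for node in (s, o):
            scores[node] += len(words.intersection(node.lower().split()))
    return [n for n, sc in scores.items() if sc > 0]

def expand(seeds, triples, hops=1):
    """Traverse graph edges outward from the seeds, collecting a subgraph."""
    frontier, subgraph = set(seeds), []
    for _ in range(hops):
        next_frontier = set()
        for s, p, o, meta in triples:
            if s in frontier or o in frontier:
                if (s, p, o, meta) not in subgraph:
                    subgraph.append((s, p, o, meta))
                next_frontier.update((s, o))
        frontier = next_frontier
    return subgraph

context = expand(seed_nodes("who does Person X work for", TRIPLES), TRIPLES)
for s, p, o, meta in context:
    print(f"{s} --{p}--> {o}  [{meta['source']}, {meta['year']}]")
```

&lt;p&gt;In a real deployment the seed step would be an embedding-based vector search and the triples would carry richer provenance, but the shape of the retrieval stays the same: seed, traverse, return a contextualized subgraph rather than isolated fragments.&lt;/p&gt;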

&lt;h3&gt;
  
  
  Escaping the Memento Loop: The Power of a Knowledge Core
&lt;/h3&gt;

&lt;p&gt;By building and utilizing this structured Knowledge Core, TrustGraph fundamentally changes AI capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables Reliable Reasoning: Provides the interconnected facts and explicit relationships needed for complex reasoning, synthesis, and understanding causality – tasks impossible for Leonard (and fragment-based AI).&lt;/li&gt;
&lt;li&gt;Dramatically Reduces Hallucinations: Grounding responses in a verifiable graph of knowledge, potentially weighted by provenance, significantly reduces the chance of fabricating connections or asserting baseless claims.&lt;/li&gt;
&lt;li&gt;Offers Explainable Insight: The retrieved subgraph itself acts as an explanation, showing how the AI arrived at its context based on the knowledge core's structure – unlike Leonard's often opaque leaps of faith.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Provisioning Reliable Knowledge, Not Just Infrastructure
&lt;/h3&gt;

&lt;p&gt;TrustGraph isn't just a concept. It's an AI Provisioning Platform that containerizes the entire intelligent system – the LLMs, the necessary tools, and the essential TrustRAG Knowledge Cores – allowing you to reliably provision this complete, knowledgeable AI stack anywhere (Cloud, On-Prem, Edge). We're providing the robust, managed infrastructure for knowledge that Leonard's fragile system lacked.&lt;/p&gt;

&lt;p&gt;Stop building AI condemned to relive Leonard Shelby's nightmare. Stop provisioning systems based on fragmented recall and start delivering applications grounded in genuine understanding.&lt;/p&gt;

&lt;p&gt;Give your AI the gift of coherent memory. Build with a Knowledge Core.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore &lt;a href="https://trustgraph.ai" rel="noopener noreferrer"&gt;TrustGraph&lt;/a&gt; on &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and see how we structure knowledge&lt;/li&gt;
&lt;li&gt;Read the &lt;a href="https://github.com/trustgraph-ai/trustgraph?tab=readme-ov-file#-trustrag" rel="noopener noreferrer"&gt;TrustRAG&lt;/a&gt; documentation for technical details&lt;/li&gt;
&lt;li&gt;Join our &lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;community&lt;/a&gt; and discuss the future of AI knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Provision AI that knows. Provision it with TrustGraph.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
      <category>rag</category>
    </item>
    <item>
      <title>Part 2: Why Your AI is Stuck in a Memento Loop</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Sat, 26 Apr 2025 22:19:42 +0000</pubDate>
      <link>https://dev.to/trustgraph/part-2-why-your-ai-is-stuck-in-a-memento-loop-ilf</link>
      <guid>https://dev.to/trustgraph/part-2-why-your-ai-is-stuck-in-a-memento-loop-ilf</guid>
<description>&lt;p&gt;In &lt;a href="https://blog.trustgraph.ai/p/the-memento-problem-with-ai-memory" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;, we likened today's typical AI "memory" to the plight of Leonard Shelby in Memento: brilliant at accessing isolated fragments (Polaroids, notes, tattoos) but unable to weave them into the coherent tapestry of true knowledge. He remembers that he has a note, but not necessarily why it's reliable or how it connects to everything else. Now, let's diagnose why popular RAG approaches inherently create this dangerous, fragmented reality for our AI.&lt;/p&gt;

&lt;p&gt;Imagine Leonard's investigation. His "database" consists of disconnected snapshots and cryptic assertions. When he tries to solve a problem ("Who is John G?"), he shuffles through these fragments, looking for clues that feel related. This is strikingly similar to how typical RAG approaches use “memory”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The Polaroid Snapshot (Context Stripping): Just as Leonard's Polaroids capture only a single moment divorced from what came before or after, document chunking for vectorization strips vital context. A retrieved sentence saying "Project Titan deadline is critical" loses the surrounding discussion about why it's critical, who set it, or what happens if it's missed. The AI gets the snapshot, not the scene.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cryptic Notes &amp;amp; Missing Links (Relationship Blindness): Leonard's notes might say "Meet Natalie" and "Don't believe Dodd's lies." Vector search can find documents mentioning "Natalie" and documents mentioning "Dodd," but like Leonard, it lacks the explicit map connecting them. Does Natalie know Dodd? Is she part of the lies? The relationships aren't inherently encoded in the vector similarity. Finding similar topics doesn't mean understanding their causal or structural connection, leaving the AI to guess these critical links.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Trusting Faded Ink (Provenance Oblivion): Leonard must trust his fragmented notes, even if they were written under duress, based on misinformation, or are simply outdated. Standard RAG often does the same, treating all retrieved text fragments as equally valid assertions. It frequently lacks a robust mechanism to track provenance – the source, timestamp, or reliability score of the information. An old, debunked "fact" retrieved via vector similarity looks just as convincing to the LLM as a fresh, verified one.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
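&lt;p&gt;The context-stripping failure mode is easy to reproduce. The toy Python below (illustrative only: the document text, chunk size, and keyword-overlap "retrieval" are invented for the demo) chunks a short passage and retrieves the best-matching fragment. The fragment asserts that the deadline is critical, while the why (who set it, what happens if it's missed) lands in other chunks the model never sees.&lt;/p&gt;

```python
# Naive fixed-size chunking, as used in many vector-RAG pipelines,
# splits a document without regard to meaning.
DOC = (
    "The Project Titan deadline is critical. "
    "It was set by the CEO after the Q3 slip. "
    "Missing it voids the launch partner agreement."
)

def chunk(text, size=8):
    """Split into fixed-size word windows: the 'Polaroid snapshots'."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks):
    """Stand-in for vector retrieval: return the chunk sharing the most words."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q.intersection(c.lower().split())))

chunks = chunk(DOC)
hit = retrieve("why is the Titan deadline critical", chunks)
print(hit)
# The retrieved fragment says the deadline is critical, but the CEO,
# the Q3 slip, and the partner agreement all live in other chunks.
```

&lt;p&gt;A real pipeline would use embeddings rather than word overlap, but the structural problem is identical: whatever falls outside the retrieved window is simply gone.&lt;/p&gt;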

&lt;p&gt;The Leonard Shelby Effect in AI:&lt;/p&gt;

&lt;p&gt;When AI operates with only these disconnected, context-stripped, relationship-blind, and provenance-oblivious fragments, its reasoning becomes dangerously flawed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Hallucinating Connections: Like Leonard assuming connections between unrelated clues, the LLM invents relationships between text fragments simply because they were retrieved together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contradictory Actions: Acting on conflicting "facts" because it can't verify which source or connection is trustworthy or current.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inability to Synthesize: Unable to build a larger picture or draw reliable conclusions because the foundational links between data points are missing or inferred incorrectly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are building AI systems trapped in a Memento loop: forever re-reading fragmented clues, capable of impressive recall but incapable of forming the durable, interconnected knowledge needed for reliable reasoning and true understanding. They are architecturally destined to make potentially disastrous mistakes based on an incomplete and untrustworthy view of their "world."&lt;/p&gt;

&lt;p&gt;If we want AI to escape this loop, we need to fundamentally change how we provide information. We need to move beyond retrieving isolated Polaroids and start building systems that can understand the whole, interconnected story.&lt;/p&gt;

&lt;p&gt;How do we provide that interconnected narrative? How do we build AI memory that understands relationships and provenance? Stay tuned for Part 3 where we reveal the architecture for true AI knowledge.&lt;/p&gt;

&lt;p&gt;Have you seen an AI confidently stitch together unrelated facts like Leonard building a flawed theory? Let us know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;a href="https://trustgraph.ai" rel="noopener noreferrer"&gt;TrustGraph&lt;/a&gt; on &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; 🚢 &lt;/li&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; 👋 &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
      <category>rag</category>
    </item>
    <item>
      <title>Part 1: The Memento Problem with AI Memory</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Fri, 25 Apr 2025 21:58:31 +0000</pubDate>
      <link>https://dev.to/trustgraph/part-1-the-memento-problem-with-ai-memory-4ge7</link>
      <guid>https://dev.to/trustgraph/part-1-the-memento-problem-with-ai-memory-4ge7</guid>
      <description>&lt;p&gt;We're drowning in takes about AI "memory." RAG is hailed as the silver bullet, promising intelligent systems that learn and retain information. But let's be brutally honest: most implementations are building agents that are drowning in data and suffocating from a lack of knowledge.&lt;/p&gt;

&lt;p&gt;These systems excel at retrieving fragments – isolated data points plucked from documents and observations stripped of their origins. Ask them a question, and they surface a text snippet that looks relevant. This feels like memory - like recall.&lt;/p&gt;

&lt;p&gt;But it isn't knowledge.&lt;/p&gt;

&lt;p&gt;Real knowledge isn't just storing data points - it's understanding their context, their provenance (where did this information come from? is it reliable?), and their relationships with other data points. Human memory builds interconnected information networks while current AI "memory" approaches just hoard disconnected digital Post-it notes. We are mistaking the retrieval of isolated assertions for the synthesis of contextualized understanding.&lt;/p&gt;

&lt;p&gt;Think of Leonard Shelby in Christopher Nolan's film Memento. Suffering from anterograde amnesia, Leonard can't form new memories. To function, he relies on a system of Polaroids, handwritten notes, and even tattoos – externalized fragments representing supposed facts about his world and his mission to find his wife's killer.&lt;/p&gt;

&lt;p&gt;Today's RAG systems often operate eerily like Leonard. They receive a query and consult their "Polaroids" – the vector embeddings of text chunks. They retrieve the chunk that seems most relevant based on similarity, a fragment like "Don't believe his lies" or "Find John G." Unfortunately, like Leonard, these RAG systems lack the overarching context and the relationships between these fragments. They don't inherently know how the note about John G. relates to the warning about lies, or the sequence of events that led to these assertions being recorded.&lt;/p&gt;

&lt;p&gt;And this fragmentation is where disaster strikes. Leonard, working only with disconnected clues, makes fatal misinterpretations. He trusts the wrong people, acts on incomplete information, and is manipulated because he cannot form a cohesive, interconnected understanding of his reality. His "memory," composed of isolated data points, leads him not to truth, but deeper into confusion, madness, and catastrophe.&lt;/p&gt;

&lt;p&gt;An AI that can quote a source but doesn't inherently grasp how that source connects to related concepts or whether that source is trustworthy isn't remembering – it's echoing fragments, just like Leonard reading his own fragmented notes.&lt;/p&gt;

&lt;p&gt;This fundamental flaw leads to confident hallucinations, an inability to reason deeply about causality, and systems that can be misled. We're building articulate regurgitators, not truly knowledgeable thinkers.&lt;/p&gt;

&lt;p&gt;We need to stop celebrating glorified search indices as "memory" and start demanding systems capable of building actual knowledge. Until then, we're just building better mimics, doomed to repeat the mistakes born from disconnected understanding.&lt;/p&gt;

&lt;p&gt;Next time in Part 2: We dissect why this fragment-recall approach fundamentally breaks down when AI needs to reason, synthesize, or understand causality.&lt;/p&gt;

&lt;p&gt;Does your AI feel like it knows things, or just recalls text like Leonard Shelby reading his notes? Reach out to us below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌟 TrustGraph on &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; 🧠 &lt;/li&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; 👋 &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rag</category>
      <category>memory</category>
    </item>
    <item>
      <title>The Symphony of the AI System</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Mon, 14 Apr 2025 20:33:53 +0000</pubDate>
      <link>https://dev.to/trustgraph/the-symphony-of-the-ai-system-4abh</link>
      <guid>https://dev.to/trustgraph/the-symphony-of-the-ai-system-4abh</guid>
      <description>&lt;h2&gt;
  
  
  Beyond the Monolith: Why the Future of AI is a Symphony, Not a Soloist
&lt;/h2&gt;

&lt;p&gt;For years, the science fiction dream and much of the AI hype cycle have revolved around a singular goal: building the one, giant Artificial General Intelligence (AGI). A single, monolithic model capable of learning, reasoning, and solving any problem like a human. It's a captivating vision, but is it the only path forward? Or even the right one for practical, powerful, and responsible AI?&lt;/p&gt;

&lt;p&gt;It's not. The pursuit of a single, all-encompassing model overlooks the messy, beautiful complexity of intelligence itself and ignores the profound limitations inherent in monolithic approaches. The true future of advanced machine intelligence lies not in a singular soloist, but in a symphony of tightly interconnected, specialized software components working in harmony.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cracks in the Monolith
&lt;/h3&gt;

&lt;p&gt;Trying to build a single AI model to "solve" human-level intelligence faces immense hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Garbage In, Garbage Out: Training such models requires unfathomable amounts of data, plus human intervention to evaluate the quality of inputs and outputs, a process subject to individual biases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brittleness &amp;amp; Lack of Nuance: A single model can struggle with specialized tasks outside its core training distribution. It's the ultimate "jack of all trades, master of none," potentially failing when encountering edge cases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operational Nightmares: Deploying, managing, updating, and securing a single, gigantic model across diverse environments (cloud, on-prem, edge) is incredibly complex and inefficient. How do you provide fine-grained updates or tailored capabilities?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explainability &amp;amp; Auditability Black Holes: Understanding why a monolithic model made a specific decision can be nearly impossible, hindering trust, debugging, and crucial safety checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Concentration of Power &amp;amp; Risk: Placing all intelligent capabilities into a single entity creates immense concentrations of power and systemic risk.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Rise of the System: Intelligence as an Interconnected Network
&lt;/h3&gt;

&lt;p&gt;Nature offers a better blueprint. The human brain isn't a homogenous blob; it's a highly specialized, interconnected system of regions communicating dynamically. Complex tasks emerge from the coordinated activity of these specialized parts. Similarly, the future of advanced AI lies in building systems that mirror this principle.&lt;/p&gt;

&lt;p&gt;Imagine an AI architecture that functions less like a single giant brain and more like a biological nervous system – what we might call a Synaptic Automation System. This system possesses key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Modular Expertise: Instead of one model knowing everything, the system leverages specialized "Intelligent Cores" – components encapsulating deep expertise, algorithms, or processing for specific domains. These cores are the seeds of adaptable skill.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dynamic Synthesis &amp;amp; Deployment: The system doesn't just run pre-built applications. Based on the available Cores and the task at hand, it dynamically generates and deploys the necessary processing modules on the fly. Think of it assembling a specialized task force exactly when needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Emergent Learning &amp;amp; Adaptation: Faced with unique situations, the system doesn't just rely on past training. It can generate custom learning modules to analyze new data, identify patterns, and evolve its understanding over time through integrated feedback loops, constantly refining its capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inherent Connectivity &amp;amp; Communication: Like synapses firing, components constantly communicate, sharing context and triggering actions across the system. This allows for holistic reasoning and complex workflow execution far beyond simple pipelines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transparency &amp;amp; Trust: Crucially, because the system generates plans and modules dynamically, it can also be designed to make these processes transparent. The 'reasoning' behind an automated workflow can be audited, allowing for verification, compliance, and crucial safety checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Safety First: Built-in mechanisms constantly monitor the system's actions, detecting potential harms or deviations from desired boundaries, enabling adaptive responses to ensure responsible operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Universal Presence: This entire intelligent system isn't locked to specific hardware. It's designed as a fabric that can be deployed consistently across any cloud, bare-metal servers, or edge devices, bringing intelligence wherever it's needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TrustGraph: Embodying the Synaptic Vision
&lt;/h3&gt;

&lt;p&gt;This isn't just theory. Platforms like TrustGraph are pioneering this Synaptic Automation System approach. By focusing on dynamically connecting modular Intelligent Cores, synthesizing processes on demand, enabling continuous learning through feedback, ensuring auditability and safety, and running universally across infrastructures, TrustGraph demonstrates the power of this interconnected model over the monolithic dream.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Symphony Takes the Stage
&lt;/h3&gt;

&lt;p&gt;The future of impactful AI won't be a single, monolithic oracle attempting to know everything. It will be a dynamic, adaptable, and interconnected system – a symphony of specialized components working together seamlessly. This approach offers a path towards more scalable, resilient, trustworthy, and ultimately more powerful machine intelligence capable of tackling the world's complex challenges. It’s time to move beyond the monolith and embrace the power of the network.&lt;/p&gt;

&lt;p&gt;🌟 TrustGraph on &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; 🧠 &lt;/p&gt;

&lt;p&gt;Join the &lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; 👋 &lt;/p&gt;

&lt;p&gt;Watch tutorials on &lt;a href="https://www.youtube.com/@TrustGraphAI?sub_confirmation=1" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; 📺️ &lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stop Thinking AI Agents, Start Engineering Autonomous Knowledge Operations</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Wed, 09 Apr 2025 19:24:21 +0000</pubDate>
      <link>https://dev.to/trustgraph/stop-thinking-ai-agents-start-engineering-autonomous-knowledge-operations-20mj</link>
      <guid>https://dev.to/trustgraph/stop-thinking-ai-agents-start-engineering-autonomous-knowledge-operations-20mj</guid>
      <description>&lt;h2&gt;
  
  
  Beyond the Buzz: Why Autonomous Knowledge Operations Matters More Than Just AI Agents
&lt;/h2&gt;

&lt;p&gt;The tech world has been ablaze with talk of AI agents. We see demos of agents booking flights, writing code snippets, or summarizing articles. It's exciting, capturing the imagination with glimpses of AI performing tasks previously requiring human operators. But as we move from demos to deployment, simply thinking in terms of "agents" falls short.&lt;/p&gt;

&lt;p&gt;The real paradigm shift isn't just about creating smarter tools (agents); it's about building systems capable of continuous, reliable, and goal-directed operations that are powered by deep contextual understanding. This is the philosophy of TrustGraph’s &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Autonomous Knowledge Operations&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the Difference? Isn't an Agent Autonomous?
&lt;/h3&gt;

&lt;p&gt;An AI Agent, in its common definition today, is often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Task-Oriented: Designed to perform a specific, often short-lived task (e.g., answer a question, draft an email).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reactive: Primarily responds to direct input or triggers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Component-Level: Can be thought of as a sophisticated function call or a smart script.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Potentially Isolated &amp;amp; Knowledge-Poor: Might operate with limited context or struggle to access and reason over the complex web of information within an enterprise.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While powerful, these agents often lack the deep knowledge integration, robustness, persistence, and manageability needed for mission-critical business functions. Running a complex business process isn't like asking an agent to write a poem; it requires continuous awareness, adaptation, reliability, and critically, intelligent use of relevant knowledge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Autonomous Knowledge Operations&lt;/a&gt; is a broader, more systemic approach where autonomy is directly fueled by intelligent information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Goal-Oriented &amp;amp; Continuous: Focused on achieving and maintaining a desired state or objective over time. Action is driven by understanding the goal within its knowledge context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Proactive, Persistent &amp;amp; Knowledge-Driven: Actively monitors, plans, and acts by constantly interpreting its environment through a rich knowledge base. It runs continuously, learning and adapting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;System-Level: Encompasses not just agents but the entire infrastructure, knowledge pipelines (RAG, KG, VectorDBs), integration points, and feedback loops required for sustained, intelligent operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fueled by Deep Knowledge &amp;amp; Context: Leverages rich, relevant, and timely information drawn from enterprise sources. This requires sophisticated RAG pipelines with both vector databases and knowledge graphs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observable &amp;amp; Manageable: Designed with built-in monitoring, logging, tracing, and controls to ensure reliability, understand the knowledge-driven behavior, and allow for intervention or adjustments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliable &amp;amp; Scalable: Built on enterprise-grade infrastructure capable of handling failures, scaling resources, and meeting performance demands for both computation and knowledge processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why This Shift in Thinking Matters
&lt;/h3&gt;

&lt;p&gt;Focusing solely on "agents" leads to several potential pitfalls in enterprise adoption:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The "Demo-to-Production" Gap: Cool agent demos often bypass the hard parts: robust knowledge integration, error handling, scalability, security, and monitoring needed for real-world value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context Starvation: Agents without deep, structured context – the kind derived from integrated Knowledge Graphs combined with Vector DBs – struggle with complex reasoning and nuanced tasks common in business. This is a knowledge access problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Infrastructure Nightmare: Managing dozens of agents and their disparate, potentially inconsistent knowledge sources, ensuring reliability, and providing consistent data access is an operational burden.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lack of Trust: How do you monitor, debug, or guarantee the performance of agents acting on potentially incomplete or misunderstood information? Observability into the knowledge retrieval and reasoning process is non-negotiable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building for &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Autonomous Knowledge Operations&lt;/a&gt;: The TrustGraph Philosophy
&lt;/h3&gt;

&lt;p&gt;This is precisely the philosophy behind TrustGraph. We realized that the conversation needed to evolve beyond just the agent itself to encompass the entire knowledge-driven system. TrustGraph is an &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Autonomous Knowledge Operations&lt;/a&gt; Platform designed to provide the foundational elements missing from simple agent frameworks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enterprise-Grade Infrastructure: It provides the scalable, reliable backend needed to run operations continuously, managing both computation and knowledge flows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrated RAG (KG + VectorDB): It automates the deployment of sophisticated RAG pipelines, acknowledging that deep context and reliable autonomy stem from leveraging both semantic similarity (vectors) and structured relationships (knowledge graphs).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unified LLM Access: It abstracts the complexity of dealing with multiple LLM providers, allowing the system to focus on applying the best reasoning to the available knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full Observability Stack: It builds in logging, metrics, and tracing from the ground up, including insights into the RAG process, because trusting autonomous systems requires understanding how they arrive at decisions based on knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By focusing on the knowledge-driven operation rather than just the agent, we can build systems that don't just perform tasks but achieve persistent business outcomes reliably, efficiently, and intelligently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future is Systemic and Knowledge-Rich
&lt;/h3&gt;

&lt;p&gt;AI agents are a vital component of the future. But the true transformation lies in weaving these components into robust, knowledge-aware, observable, and continuous Autonomous Knowledge Operations. This requires a shift in mindset and tooling – moving from building smart tools to engineering intelligent, self-managing systems powered by deep understanding. That's the future we're building towards with TrustGraph.&lt;/p&gt;

&lt;p&gt;🌟 TrustGraph on &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; 🚀 &lt;/p&gt;

&lt;p&gt;Join the &lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; 👋 &lt;/p&gt;

&lt;p&gt;Watch tutorials on &lt;a href="https://www.youtube.com/@TrustGraphAI?sub_confirmation=1" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; 📺️ &lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How-to Use AI to See Your Data in 3D</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Mon, 30 Dec 2024 22:28:01 +0000</pubDate>
      <link>https://dev.to/trustgraph/how-to-use-ai-to-see-your-data-in-3d-f7h</link>
      <guid>https://dev.to/trustgraph/how-to-use-ai-to-see-your-data-in-3d-f7h</guid>
      <description>&lt;p&gt;We all know the struggle. You're drowning in massive amounts of unstructured data, trying to make sense of the complex web of relationships hidden within.  &lt;/p&gt;

&lt;p&gt;That's where &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;TrustGraph's&lt;/a&gt; Data Workbench 3D visualizer comes in. Forget clunky interfaces and limited perspectives. We're bringing the power of 3D visualization to your data analysis, giving you an intuitive, immersive, and interactive way to uncover hidden patterns and relationships.&lt;/p&gt;

&lt;h2&gt;Why 3D Matters for AI Developers&lt;/h2&gt;

&lt;p&gt;Everyone knows the power of GraphRAG. But everyone also knows the limitations of data visualizations in 2D. With 3D, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Uncover Hidden Relationships&lt;/strong&gt;: In a 2D graph, nodes can clutter the screen, obscuring vital connections. 3D lets you see the true depth of your data, revealing relationships you might otherwise miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intuitive Navigation&lt;/strong&gt;: Our brains are wired to understand spatial relationships. Navigating a 3D visualization is naturally more intuitive than panning and zooming across a flat surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Understanding&lt;/strong&gt;: The spatial layout of nodes in 3D can reveal clusters, hierarchies, and anomalies that are hard to spot in only 2D.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Data&lt;/strong&gt;: Let’s be honest, a 3D visualization looks cool! Impress stakeholders with compelling and interactive visualizations that show the hidden gems of wisdom in your data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Getting Started: Your 3D Data Journey&lt;/h2&gt;

&lt;p&gt;Before we can use the 3D visualizer, we must first deploy TrustGraph.&lt;/p&gt;

&lt;h3&gt;Step 1: Setting Up (If you haven't already)&lt;/h3&gt;

&lt;p&gt;Navigate to the &lt;a href="https://config-ui.demo.trustgraph.ai/" rel="noopener noreferrer"&gt;Configuration Portal&lt;/a&gt; to select all the components for your build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeyuvgndjp4pykkf75q0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeyuvgndjp4pykkf75q0.jpg" alt="TrustGraph Configuration Portal" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow the instructions on the Finish Deployment tab to get TrustGraph running. The Data Workbench will be accessible at port &lt;code&gt;8888&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Step 2: Data Ingestion&lt;/h3&gt;

&lt;p&gt;Navigate to the Data Workbench now running on port &lt;code&gt;8888&lt;/code&gt;. Use the Data Loader 📂 to upload your documents. TrustGraph supports &lt;code&gt;.pdf&lt;/code&gt;, &lt;code&gt;.txt&lt;/code&gt;, and &lt;code&gt;.md&lt;/code&gt; files. The data extraction agents will automatically process the files, extracting key information to build the knowledge graph and vector embeddings.&lt;/p&gt;
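&lt;p&gt;Since only those three formats are accepted, it can help to filter a folder before uploading. A minimal sketch of that check (the &lt;code&gt;ingestable&lt;/code&gt; helper below is ours for illustration, not part of TrustGraph):&lt;/p&gt;

```python
from pathlib import PurePath

# Formats the Data Loader accepts, per the docs above.
SUPPORTED = {".pdf", ".txt", ".md"}

def ingestable(filenames):
    """Keep only files in formats the Data Loader accepts."""
    return [f for f in filenames if PurePath(f).suffix.lower() in SUPPORTED]

print(ingestable(["report.pdf", "notes.md", "diagram.png"]))
# prints: ['report.pdf', 'notes.md']
```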

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d8kqzid0e2fxe1ekop3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d8kqzid0e2fxe1ekop3.jpg" alt="TrustGraph Data Workbench Loader" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Step 3: Query the Data through AI Chat&lt;/h3&gt;

&lt;p&gt;No need for &lt;code&gt;Cypher&lt;/code&gt; or &lt;code&gt;SPARQL&lt;/code&gt; queries! Once your data is processed, you can perform GraphRAG queries with natural language in the System Chat. &lt;/p&gt;
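&lt;p&gt;The same kind of query can also be scripted. A hedged sketch - the endpoint path and payload shape below are assumptions for illustration, not TrustGraph's documented API:&lt;/p&gt;

```python
import json

# Hypothetical sketch of driving a GraphRAG query programmatically.
# The URL and payload fields are assumed; check the TrustGraph docs
# for the real API.
API_URL = "http://localhost:8888/api/graph-rag"  # assumed endpoint

def build_query(question):
    """Package a natural-language question as a JSON request body."""
    return json.dumps({"question": question})

body = build_query("What caused the O-ring failure?")
# A running deployment could then be queried with, e.g.:
#   requests.post(API_URL, data=body, headers={"Content-Type": "application/json"})
print(body)
```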

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozc5815gr9up6xdzrb3g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozc5815gr9up6xdzrb3g.jpg" alt="TrustGraph Data Workbench Chat" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The System Chat will query both the knowledge graph and vector embeddings to generate a response. In addition, the Data Workbench will display a set of nodes from the knowledge graph on the left side of the screen. Clicking any one of these nodes will allow you to explore semantic relationships.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflhj6rqsh9xx9wh6itz1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflhj6rqsh9xx9wh6itz1.jpg" alt="TrustGraph Data Workbench Explorer" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Step 4: Exploring Your Data in 3D&lt;/h3&gt;

&lt;p&gt;Now comes the fun part! Once you’ve selected a node, you can generate a 3D visualization by either clicking GRAPH VIEW in the Data Explorer window or by simply clicking Data Visualizer from the Data Workbench set of tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw62bdnnd8i29z1o56lx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw62bdnnd8i29z1o56lx.jpg" alt="TrustGraph Data Workbench 3D" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once in the 3D visualizer, you can interact with the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zoom, Pan, and Rotate&lt;/strong&gt;: Use your mouse to explore the space. Zoom in to examine clusters in detail or rotate to gain a new perspective on relationships.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click and Drag&lt;/strong&gt;: Click and drag on an individual node to reshape the graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node Exploration&lt;/strong&gt;: Click on individual nodes to see additional related nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship Explorer&lt;/strong&gt;: Click on a relationship connecting two nodes to see a pulse travel, depicting the directionality of the semantic relationship.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Ready to Explore Your Data in 3D?&lt;/h2&gt;

&lt;p&gt;Ditch analysis in 2D, embrace the third dimension, and unlock the hidden potential of your unstructured data with TrustGraph’s 3D Visualizer.&lt;/p&gt;

&lt;p&gt;Give it a try today! Include the Data Workbench in your build through the Configuration Portal and experience the future of data analysis.&lt;/p&gt;

&lt;p&gt;We're excited to see how you leverage this powerful tool to push the boundaries of AI! Join the TrustGraph community and help shape the future of data analysis!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Launch TrustGraph from GitHub&lt;/a&gt; 🚀 &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Join the Discord&lt;/a&gt; 👋 &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/@TrustGraph?sub_confirmation=1" rel="noopener noreferrer"&gt;Watch tutorials on YouTube&lt;/a&gt; 📺️ &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>aiops</category>
      <category>open</category>
    </item>
    <item>
      <title>The Future of Agentic Systems Podcast</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Fri, 15 Nov 2024 15:19:04 +0000</pubDate>
      <link>https://dev.to/trustgraph/the-future-of-data-engineering-with-llms-podcast-3ce8</link>
      <guid>https://dev.to/trustgraph/the-future-of-data-engineering-with-llms-podcast-3ce8</guid>
      <description>&lt;p&gt;The founders of &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;TrustGraph&lt;/a&gt; discuss their journeys with big data, knowledge graphs, and data engineering. Knowledge graphs are hard to learn (no matter what Mark says), so he gives everyone a crash course on them, explains why querying graphs is tricky, and covers what makes for reliable data services. The conversation ends with a discussion of what makes for "explainable AI" and the future of AI security.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>The 2024 State of RAG Podcast</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Thu, 07 Nov 2024 16:25:11 +0000</pubDate>
      <link>https://dev.to/trustgraph/the-2024-state-of-rag-podcast-559b</link>
      <guid>https://dev.to/trustgraph/the-2024-state-of-rag-podcast-559b</guid>
      <description>&lt;p&gt;&lt;a href="https://x.com/trustspooky" rel="noopener noreferrer"&gt;Daniel Davis&lt;/a&gt; of &lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;TrustGraph&lt;/a&gt; and &lt;a href="https://x.com/kirkmarple" rel="noopener noreferrer"&gt;Kirk Marple&lt;/a&gt; from &lt;a href="https://graphlit.com" rel="noopener noreferrer"&gt;Graphlit&lt;/a&gt; discuss the 2024 state of RAG. Whether it's RAG, GraphRAG, or HybridRAG, a lot has changed since the term became ubiquitous in AI. Where we are, where we're going, and where we should be going are all answered in this discussion.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>aiops</category>
      <category>llm</category>
    </item>
    <item>
      <title>Are LLMs Still Lost in the Middle?</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Fri, 01 Nov 2024 02:49:20 +0000</pubDate>
      <link>https://dev.to/trustgraph/are-llms-still-lost-in-the-middle-52k0</link>
      <guid>https://dev.to/trustgraph/are-llms-still-lost-in-the-middle-52k0</guid>
      <description>&lt;p&gt;A few days ago, I talked about some of the inconsistency I've seen varying LLM temperature for knowledge extraction tasks.&lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/trustgraph" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9734%2F47f9c211-bbe2-429e-964d-2a067faa1108.png" alt="TrustGraph" width="500" height="500"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2227928%2F59d4b239-61b9-48be-89f3-a246f7680346.PNG" alt="" width="800" height="800"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/trustgraph/what-does-llm-temperature-actually-mean-18bd" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;What does LLM Temperature Actually Mean?&lt;/h2&gt;
      &lt;h3&gt;Daniel Davis for TrustGraph ・ Oct 28 '24&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#aiops&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#rag&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#opensource&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;I decided to revisit this topic and talk through the behavior I'm seeing. Not only did &lt;code&gt;Gemini-1.5-Flash-002&lt;/code&gt; not disappoint in producing yet more unexpected results, but I saw some strong evidence that long context windows still ignore data in the middle. Below is the Notebook I used during the video:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/trustgraph-ai/cookbooks/blob/main/temperature-testing/Temperature-Testing.ipynb" rel="noopener noreferrer"&gt;Notebook&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What does LLM Temperature Actually Mean?</title>
      <dc:creator>Daniel Davis</dc:creator>
      <pubDate>Mon, 28 Oct 2024 16:50:26 +0000</pubDate>
      <link>https://dev.to/trustgraph/what-does-llm-temperature-actually-mean-18bd</link>
      <guid>https://dev.to/trustgraph/what-does-llm-temperature-actually-mean-18bd</guid>
      <description>&lt;p&gt;At this point, I thought I knew what temperature means for an LLM. A lower temperature increases determinism, reducing the likelihood of hallucinations or inaccurate responses. Google’s definition echoes this perception:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The temperature controls the degree of randomness in token selection. The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest probability response is always selected.”&lt;/p&gt;
&lt;/blockquote&gt;
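&lt;p&gt;That definition maps onto the standard temperature-scaled softmax: divide the logits by the temperature before normalizing, so low temperatures concentrate probability on the top token and high temperatures flatten the distribution. A toy illustration:&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable
    # softmax. As temperature approaches 0 this approaches argmax
    # (the function itself needs temperature greater than 0).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy next-token scores

cold = softmax_with_temperature(logits, 0.1)
hot = softmax_with_temperature(logits, 2.0)

# Low temperature piles probability onto the top token;
# high temperature spreads it out.
print(round(cold[0], 3), round(hot[0], 3))  # prints: 1.0 0.502
```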

&lt;p&gt;That makes sense - but if temperature is so straightforward, why are my tests with &lt;code&gt;Gemini-1.5-Flash-002&lt;/code&gt; so nonsensical???&lt;/p&gt;

&lt;p&gt;We’ve been looking into adding what we’re calling “document-level metadata” in the TrustGraph extraction process. While we did add this feature in release &lt;code&gt;0.13.2&lt;/code&gt;, I had been evaluating using an LLM to extract important entities and topics for an entire text corpus. I normally set the temperature to &lt;code&gt;0.0&lt;/code&gt;, since this should produce the most accurate extraction. I ran an extraction with &lt;code&gt;Gemini-1.5-Flash-002&lt;/code&gt;. Looked good - except for one problem: I had accidentally set the temperature to &lt;code&gt;1.0&lt;/code&gt;. I reran it at &lt;code&gt;0.0&lt;/code&gt;, and the results were worse. What’s going on?&lt;/p&gt;

&lt;p&gt;I’ve never run comparison tests with TrustGraph where I did nothing but vary the temperature, but I decided, why not? For a single document, I did 3 runs at each of five temperatures: &lt;code&gt;0.0&lt;/code&gt;, &lt;code&gt;0.5&lt;/code&gt;, &lt;code&gt;1.0&lt;/code&gt;, &lt;code&gt;1.5&lt;/code&gt;, and &lt;code&gt;2.0&lt;/code&gt;. Yes, the temperature of Gemini goes to &lt;code&gt;2.0&lt;/code&gt;. No, I don’t know why. For other parameters, I set &lt;code&gt;top_p=1.0&lt;/code&gt;, &lt;code&gt;top_k=40&lt;/code&gt;, and output tokens maxed out at &lt;code&gt;8192&lt;/code&gt; for all runs. I also used a JSON schema object for the response type.&lt;/p&gt;
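&lt;p&gt;The sweep itself is easy to script. Here’s a sketch of the loop, with the actual model call stubbed out since it needs an API key (the &lt;code&gt;run_extraction&lt;/code&gt; stand-in is ours, not a real SDK call):&lt;/p&gt;

```python
# Sketch of the comparison harness described above. run_extraction is a
# stand-in for the real Gemini-1.5-Flash-002 call (which would also set
# a JSON response schema); here it only records the settings per run.
TEMPERATURES = [0.0, 0.5, 1.0, 1.5, 2.0]
RUNS_PER_TEMP = 3
FIXED = {"top_p": 1.0, "top_k": 40, "max_output_tokens": 8192}

def run_extraction(document, temperature, **params):
    # Real version: send the document plus the extraction prompt to the
    # model and return the output token count.
    return {"document": document, "temperature": temperature, **params}

results = [
    run_extraction("rogers_commission_report.pdf", t, **FIXED)
    for t in TEMPERATURES
    for _ in range(RUNS_PER_TEMP)
]
print(len(results))  # 5 temperatures x 3 runs = 15 extraction calls
```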

&lt;p&gt;Given my understanding of temperature, I expected Gemini to extract more information, returning more objects as the temperature increased. I would have thought a more deterministic response would be more conservative in how much information it extracted. Except that didn’t happen. Then again, my hypothesis wasn’t really proven wrong either. In fact, I’m not sure what these results mean.&lt;/p&gt;

&lt;p&gt;The first document I tested was the &lt;a href="https://sma.nasa.gov/SignificantIncidents/assets/rogers_commission_report.pdf?" rel="noopener noreferrer"&gt;Rogers Commission Report&lt;/a&gt; from the NASA Challenger disaster. That PDF extracts to 176k tokens, 17.6% of Gemini-1.5-Flash’s advertised context window. For each run, here’s the number of output tokens:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o9i0fwskt7v3hkofb2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o9i0fwskt7v3hkofb2.png" alt="table1" width="647" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second document was another NASA report on the decision making of the Columbia disaster. That PDF extracts to 24.4k tokens, 2.4% of the advertised context window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4j8e11590r4y0kv7suj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4j8e11590r4y0kv7suj.png" alt="table2" width="647" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The inconsistency of the first set of test runs is inexplicable. Most times, Gemini tried to extract more than the maximum 8192 tokens, returning an incomplete and invalid JSON object. Yet what about run 2, where even at a temperature of &lt;code&gt;0.0&lt;/code&gt; Gemini returned only 1511 tokens? Why did increasing the temperature to &lt;code&gt;2.0&lt;/code&gt; decrease the output so dramatically? The data is so inconsistent, I don’t know where to begin to draw conclusions.&lt;/p&gt;

&lt;p&gt;The second document’s data is more consistent. For instance, at a temperature of &lt;code&gt;0.0&lt;/code&gt;, it returned the same number of tokens all 3 times. When increasing the temperature to &lt;code&gt;0.5&lt;/code&gt;, the responses did increase as I predicted. And then there’s temperature &lt;code&gt;1.0&lt;/code&gt;, where the response amounts go down. Beyond &lt;code&gt;1.0&lt;/code&gt;, the responses mostly go down, with one outlier at &lt;code&gt;2.0&lt;/code&gt; where the output roughly doubled. &lt;/p&gt;

&lt;p&gt;With this data, can I draw any meaningful conclusions? Yes, I think I can.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long context windows still aren’t reliable. Even at only 17.6% of Gemini’s advertised context window, the behavior is shockingly inconsistent. &lt;/li&gt;
&lt;li&gt;At a much smaller context, the temperature behavior seems to be more consistent, but still a bit mysterious.&lt;/li&gt;
&lt;li&gt;For knowledge extraction tasks, temperature doesn’t work the way we think it should.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sure, the consistency of those 3 runs that returned the same output all 3 times seems great, but what if we want more? For knowledge extraction and graph building in TrustGraph, we’re trying to extract every important detail from the input document. We don’t want just facts, but any meaningful statements or opinions described in the text. It appears that allowing the LLM to introduce some randomness in the response tokens produces more objects for information extraction. Bizarrely, I also noticed that higher temperatures seemed to extract more people than lower temperatures did. Based on cursory glances, none of the responses seemed to be producing hallucinations, but that experiment will require more testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/trustgraph-ai/trustgraph" rel="noopener noreferrer"&gt;Download TrustGraph on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;a href="https://trustgraph.ai/docs/getstarted" rel="noopener noreferrer"&gt;Get Started&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/sQMwkRz5GX" rel="noopener noreferrer"&gt;Join the community&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>rag</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
