From Vector Search to a Cross-Domain Ontology Graph: How 2asy.ai Reads Tariff News

#ai #machinelearning #rag #knowledgegraph

A new tariff briefing went up on 2asy.ai this week, and for the first time you can see the graph it was built from, right there on the page. That graph is the visible end of a quiet rebuild. The retrieval behind 2asy.ai went from plain vector search, to a simple per-article graph, to a cross-domain ontology Graph RAG. This is what each step bought, and what is still missing.

Where 2asy.ai started: plain vector RAG

The first version of 2asy.ai was ordinary vector RAG. I chunked each trade and tariff article, embedded the chunks, and retrieved by similarity. For a question like "what is happening with steel duties," that works. The system finds passages that look like the question and hands them to a model to summarize.
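As a minimal sketch of that chunk, embed, and retrieve loop: the `embed` function below is a toy bag-of-words stand-in for a real embedding model, and the chunks are invented examples, not 2asy.ai data. Only the shape of the pipeline is meant to carry over.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Invented example chunks, for illustration only.
chunks = [
    "Commerce raised antidumping duties on steel imports.",
    "A sunset review was opened on passenger tires.",
    "Steel producers expect duties to stay in place.",
]
top = retrieve("what is happening with steel duties", chunks)
```

The two steel chunks come back and the tire chunk does not, which is the whole behavior: ranking by resemblance, with nothing that links one chunk to another.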

What vector RAG cannot do is answer "why." Similarity retrieval finds text that resembles your query. It does not know that a sunset review on one product is connected to an antidumping order on another, or that an action against one country pulls in suppliers in a second. Each chunk sits alone. The causal structure that makes trade news worth reading is exactly the thing embeddings throw away.

The next step: a simple per-article graph

So I moved to Graph RAG. For each article I extracted entities, events, and the relations between them, and stored them as a small graph instead of a bag of chunks. This was a real improvement. Inside a single article you could now follow a chain: this investigation led to this duty, which affected this set of producers.
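A per-article graph of this kind can be sketched as a list of (source, relation, target) triples plus a walk over them. The triples and relation names below are hypothetical placeholders, not the extraction schema 2asy.ai actually uses.

```python
from collections import defaultdict

# Hypothetical triples extracted from a single article.
triples = [
    ("investigation", "led_to", "duty"),
    ("duty", "affected", "producer_a"),
    ("duty", "affected", "producer_b"),
]

def build_graph(triples):
    # Adjacency list: node -> list of (relation, target).
    graph = defaultdict(list)
    for src, rel, dst in triples:
        graph[src].append((rel, dst))
    return graph

def chains(graph, start, path=None):
    """Yield every path from start until no unvisited target remains."""
    path = path or [start]
    # Skip targets already on the path so a cycle cannot loop forever.
    edges = [(rel, dst) for rel, dst in graph.get(start, []) if dst not in path]
    if not edges:
        yield path
        return
    for rel, dst in edges:
        yield from chains(graph, dst, path + [rel, dst])

graph = build_graph(triples)
all_chains = list(chains(graph, "investigation"))
```

Each chain reads as investigation, led_to, duty, affected, producer. Everything here is keyed by names local to one article, which is where the next limit comes from.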

The limit showed up between articles. Each document produced its own little graph, and those graphs did not talk to each other. The "South Korea" mentioned in a steel story and the "South Korea" in a tire story were two unrelated nodes. There was no shared vocabulary of entity types and relation types, so the system could not connect a cause reported in one article to its effect reported in another. The graph was per-document, not per-world.

What cross-domain ontology Graph RAG changes

The current version of 2asy.ai runs on a shared ontology. Every entity, event, and relation is extracted against the same fixed set of types and the same canonical relation vocabulary, and entities are resolved across documents so that the same real-world thing becomes one node no matter how many articles mention it. That is the cross-domain part. A "Sunset Review" is the same Sunset Review whether it shows up in a methionine story, a tire story, or a steel story, and the edges between events can now cross from one domain to another.
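One way to picture the resolution step, as a sketch under assumptions: the type sets, the alias table, and the ID scheme below are all hypothetical, and a real resolver would do far more than a dictionary lookup. The point is only that two articles using different surface names end up on one shared node.

```python
# Hypothetical fixed vocabularies; the real ontology is not shown in this post.
ENTITY_TYPES = {"Country", "Product", "Proceeding"}
RELATION_TYPES = {"TARGETS", "COVERS"}

# Hypothetical alias table mapping surface names to one canonical ID.
ALIASES = {
    "south korea": "country:kr",
    "republic of korea": "country:kr",
}

def canonical_id(name, etype):
    key = name.strip().lower()
    return ALIASES.get(key, f"{etype.lower()}:{key.replace(' ', '_')}")

def merge(articles):
    """Merge per-article triples into one graph keyed by canonical IDs."""
    nodes, edges = {}, set()
    for article_id, triples in articles.items():
        for (s_name, s_type), rel, (t_name, t_type) in triples:
            # Reject anything outside the shared vocabulary.
            if rel not in RELATION_TYPES or not {s_type, t_type} <= ENTITY_TYPES:
                raise ValueError(f"outside ontology: {s_type} {rel} {t_type}")
            s, t = canonical_id(s_name, s_type), canonical_id(t_name, t_type)
            nodes.setdefault(s, set()).add(article_id)
            nodes.setdefault(t, set()).add(article_id)
            edges.add((s, rel, t))
    return nodes, edges

# Invented example input: two articles naming the same country differently.
articles = {
    "steel": [(("Steel order", "Proceeding"), "TARGETS", ("South Korea", "Country"))],
    "tire": [(("Tire review", "Proceeding"), "TARGETS", ("Republic of Korea", "Country"))],
}
nodes, edges = merge(articles)
```

After the merge, `nodes["country:kr"]` records both the steel and the tire article, so an edge from either story can be followed into the other.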

You can see the result on the latest briefing. The causal map for the June 2 story, "US Trade Remedies Expand Amid Global Investigations," puts a Sunset Review node at the center as the root, with directed relations fanning out to the entities it touches. Some edges carry qualifiers like "via South Korea" or "via Taiwan," which is the system recording the path a cause took, not just that two things are related. The full extraction for that one story is around 41 nodes and 60 edges, and the view focuses on the root and the bridge nodes so the chain stays readable.
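A qualified edge and a focused view could be sketched like this. The `Edge` fields, the entity names, and the definition of a bridge node (a node with both incoming and outgoing edges) are assumptions made for illustration; the post does not spell out how 2asy.ai picks bridge nodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    via: Optional[str] = None  # path qualifier, e.g. a transit country

def describe(edge):
    label = f"{edge.source} -[{edge.relation}]-> {edge.target}"
    return f"{label} (via {edge.via})" if edge.via else label

def focus(edges, root):
    # Assumed definition: a bridge node has both incoming and outgoing edges.
    sources = {e.source for e in edges}
    targets = {e.target for e in edges}
    keep = {root} | (sources & targets)
    return [e for e in edges if e.source in keep and e.target in keep]

# Invented edges; only the "Sunset Review" root and the "via" style come from the briefing.
edges = [
    Edge("Sunset Review", "AFFECTS", "Exporter Group", via="South Korea"),
    Edge("Exporter Group", "SUPPLIES", "Importer"),
    Edge("Sunset Review", "COVERS", "Product X"),
    Edge("Importer", "SELLS_TO", "Retailer"),
]
view = focus(edges, "Sunset Review")
lines = [describe(e) for e in view]
```

Here the view keeps the two edges that run through the root and the bridge nodes and drops the leaf edges, which is the trade-off behind a readable chain: less of the extraction on screen, more of the path.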

The graph is thin right now, and that is expected

If you open it today, the graph will look sparse. For this story it is a few dozen nodes, and some of them are still coarse: a node typed as "unknown" rather than placed cleanly in the ontology, or an entity that is more general than it should be. I want to be honest about that rather than hide it.

This is a data-accumulation problem, not a design ceiling. A cross-domain graph gets better the more documents flow through it, because resolution and typing both improve with volume. The first time an entity appears it has little context, so it lands as a weak or untyped node. The tenth time it appears, across different stories, it has enough surrounding structure to resolve confidently and connect to the rest of the world. The shape is already correct. What it needs is time and throughput, and both are accruing as the pipeline keeps running.
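To make the volume argument concrete, here is a sketch of one way typing could firm up with repeated mentions: each mention casts a vote for a type, and a node stays "unknown" until there are enough votes and enough agreement. The class name, the thresholds, and the voting rule are hypothetical, not a description of the actual pipeline.

```python
from collections import Counter, defaultdict

class TypeAccumulator:
    def __init__(self, min_mentions=3, min_share=0.6):
        # Hypothetical thresholds, chosen only for the example.
        self.min_mentions = min_mentions
        self.min_share = min_share
        self.votes = defaultdict(Counter)

    def observe(self, entity, proposed_type):
        # One mention in one article proposes one type.
        self.votes[entity][proposed_type] += 1

    def resolved_type(self, entity):
        counts = self.votes.get(entity)
        if not counts:
            return "unknown"
        total = sum(counts.values())
        best, n = counts.most_common(1)[0]
        if total >= self.min_mentions and n / total >= self.min_share:
            return best
        return "unknown"

acc = TypeAccumulator()
acc.observe("Sunset Review", "Proceeding")
first = acc.resolved_type("Sunset Review")   # one mention: too little context
acc.observe("Sunset Review", "Proceeding")
acc.observe("Sunset Review", "Proceeding")
later = acc.resolved_type("Sunset Review")   # three agreeing mentions
```

With these thresholds the node is "unknown" after one mention and typed after three agreeing ones, the same pattern as a coarse node sharpening as more stories pass through.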

Why I built it this way

The whole point of 2asy.ai is causal chains: not "here is some news that matches your query," but "here is how this trade action connects to that one, and to the producers and countries in between." Vector RAG cannot represent that. A per-article graph can represent it inside one story but not across the corpus. A cross-domain ontology graph is the first version where the connections can actually span the whole body of news, which is where the interesting causality lives.

All of this runs on local hardware, an RTX 4090 and an AMD W6800, with open models doing the extraction and resolution. There is no cloud inference bill behind it. The latest briefing and its graph are live at https://www.2asy.ai/ if you want to see where it stands. It is early, and the graph will get denser as the corpus grows.