🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - How Mary Technology is building the legal Fact Layer for agentic AI on AWS
In this video, Dan, CEO of Mary Technology, explains why large language models fail at legal document review for dispute resolution. He identifies four key problems: lack of training data due to sensitive information, LLMs being compression machines that lose critical legal nuance, facts not being readily extractable from uploaded data (like disambiguating "A. Smith" or "PT" for patient), and lawyers needing confidence verification rather than just answers. Mary solves this through a fact manufacturing pipeline that treats facts as first-class citizens, extracting entities, dates, and events with full explainability and provenance tracking. The platform has achieved a 75-85% time reduction in document review and a 96 NPS score, working with major firms like Arnold Bloch Leibler.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Why Large Language Models Fall Short for Legal Document Review
Hello everyone, my name is Dan and I am the CEO and co-founder of Mary Technology. We're a legal tech firm based in Sydney, but now with a global presence, and we help law firms automate document review. That's a major challenge for large language models, and I want to talk to you today about how Mary is trying to solve that. Just before we start, can I ask how many people here are heads of legal operations inside of large enterprises or have your own law firm? Yeah, okay, great, cool.
So here's the problem. Large language models, even with Retrieval-Augmented Generation or agentic frameworks, are not fit for purpose for legal dispute resolution workloads. There are a number of problems. I'm going to talk about four today, the first one being the training problem. So what do I mean by the availability of training data? The sorts of data that we work on every day for law firms and legal teams that deal with disputes are very sensitive, and so this sort of information isn't available publicly, and you certainly can't collect and train on that data when it contains sensitive information from law firms' customers or your internal employees.
A related challenge is that there isn't a single right answer to tell a large language model to trend towards, because there are always at least two sides to a matter. So you can't just say, hey, here's the right answer, and optimize towards it. You have to include and understand all of the potential narratives and correct answers depending on which side you're representing.
The second problem, and maybe the biggest one, is that large language models are compression machines. That's what they do really well, and I'm going to talk you through some of those stages of compression. The first thing a large language model does when it receives a document is turn that page into an image. Then, whether that image has words on it or is actually just a picture, it will ultimately convert it into text. Particularly in legal documents, where you actually have lots of words, that conversion strips away some of the legal nuance and important meaning that may be present, things like handwriting or a small note in the margin.
Once it's turned that document into text, it then turns the text into tokens and then into embeddings, applies contextual compression, and ultimately produces something suited to chunking and summarization. Each layer of that compression removes meaning, and it's that meaning and nuance that's so important to law firms and anybody trying to understand a dispute, or the facts, which are the core of a dispute.
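To make that compression point concrete, here is a toy sketch, not Mary's pipeline and with invented helper names, showing how each stage keeps less of the page and how a handwritten margin note can vanish before anything is ever summarized:

```python
# Toy illustration of the compression stages described above (not Mary's
# pipeline); every helper name and document value here is invented.

def ocr_page(page: dict) -> str:
    """Pretend OCR step: only the typed body survives; the handwritten
    margin note never makes it into the text layer."""
    return page["body"]

def chunk(text: str, size: int = 80) -> list:
    """Fixed-size chunking for embedding; sentence boundaries and
    surrounding context are cut wherever the window happens to fall."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(chunks: list) -> str:
    """Stand-in for LLM summarization: keeps only the opening of the
    first chunk, the way a summary keeps only what looks salient."""
    return chunks[0][:60] + "..."

page = {
    "body": "On 03/05 A. Smith reported an error in the delivery schedule. "
            "The supplier disputed the claim and requested written notice.",
    "margin_note": "handwritten: spoke to A. Smith by phone, claim withdrawn",
}

text = ocr_page(page)          # the margin note is already gone here
print(summarize(chunk(text)))  # and the withdrawal never reaches the summary
```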
Here's what they're really good at, though. We're not saying large language models are bad; we're actually saying they're really, really good, but they're particularly good at being generally capable. They handle a massive range of tasks, they scale across massive corpuses of documents, and they generate fluent, plausible text without deep preprocessing. They're generalists and they're really good at that. And, as you can probably tell, this slide was written by an LLM. It's added its lovely emojis and done a very good job of telling me what the slide should say.
The Challenge of Facts: Context, Provenance, and the Limits of Compression
The third problem, facts are not in the uploaded data. This is a bit of a strange one and something that might take a small amount of explanation, but the facts are what is at the core of a legal matter or a dispute. So I'm just going to give you an example. Here's a fact that might be present inside of a document. It gives the date, which is wonderful, and then it says A. Smith reported an error.
So here are a few of those challenges about why you can't just extract this piece of data and assume that it's ready in its current state to be used for downstream processing using AI. What if there are multiple people within that corpus of documents called A. Smith? One could be Alice, one could be Andrew, but in order to make this a useful or meaningful fact, you actually have to understand which A. Smith that is. And by just using a large language model, you can't do that.
We're in the US today, but where we're from, Australia, the same written date reads completely differently, so the date on this slide could be the 5th of March or, read the other way, the 3rd of May. You've got to understand the context of the matter so that you can say, great, it's probably this date. And in this matter, is a reported error on this date even important?
That's obviously more to do with context, so maybe this isn't a relevant fact for you to dig into further. It's also fragmented: how many times is this particular fact mentioned throughout all of these documents, and do those mentions conflict with it or support it? And finally, provenance. What kind of document did it come from? If it's a primary document, maybe it comes from somebody describing what a CCTV camera saw; if it's hearsay from a statement given to a police officer, that carries a different weight. All of those have different meanings as to how relevant and meaningful they are when relied on in court or in a litigation process.
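As a rough illustration of the entity and date ambiguities just described, here is a minimal sketch; the helper names and the entity registry are assumptions for illustration, not Mary's API:

```python
from datetime import datetime

# Assumed entity registry: people already known in the matter.
KNOWN_ENTITIES = {
    "A. Smith": ["Alice Smith (claimant)", "Andrew Smith (site supervisor)"],
}

def resolve_entity(surface_form: str) -> list:
    """Return every candidate the mention could refer to; until one is
    chosen, the extracted sentence is not yet a usable fact."""
    return KNOWN_ENTITIES.get(surface_form, [surface_form])

def parse_ambiguous_date(raw: str) -> dict:
    """'03/05/2024' is 5 March under the US month-first reading and
    3 May under the Australian day-first reading."""
    return {
        "month_first": datetime.strptime(raw, "%m/%d/%Y").date(),
        "day_first": datetime.strptime(raw, "%d/%m/%Y").date(),
    }

print(resolve_entity("A. Smith"))          # two candidates -> needs disambiguation
print(parse_ambiguous_date("03/05/2024"))  # two readings -> needs matter context
```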
Here's another example. This is actually from my co-founder Rowan's medical documents, and there are a couple of challenges in here that I'll show you how Mary, our platform, deals with a little later. The one thing I'm going to point out, and this is incredibly common, is PT. What it actually means here is patient. Now, what would happen if you put this fact into a large language model? It wouldn't understand that it's talking about Rowan, the person it's actually referring to, the patient. So you need to resolve and correct pieces of information like this so that you can actually leverage them later in the fact review, or rather the document review, process.
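A minimal sketch of that kind of correction step is below; the abbreviation table and the role-to-person mapping are invented for illustration, and the real pipeline is far heavier than a dictionary lookup:

```python
# Assumed lookup tables, purely illustrative.
ABBREVIATIONS = {"PT": "patient", "Dx": "diagnosis", "Rx": "prescription"}
MATTER_ROLES = {"patient": "Rowan McNamee"}  # established elsewhere in the matter

def normalize_fact(text: str) -> str:
    """Expand domain shorthand, then bind role words to the person
    already identified in the matter."""
    words = []
    for word in text.split():
        expanded = ABBREVIATIONS.get(word, word)           # "PT" -> "patient"
        words.append(MATTER_ROLES.get(expanded, expanded))  # "patient" -> "Rowan McNamee"
    return " ".join(words)

print(normalize_fact("PT reassessed, swelling to right bicep remains"))
# -> "Rowan McNamee reassessed, swelling to right bicep remains"
```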
This is a more drawn-out example of something a large language model would do incredibly poorly, compared to a system designed and built to support litigation workflows and document review in a field with as low a fault tolerance as law. Imagine I've written a letter. Within it, I don't write my name, and I don't say who I'm writing it to. And I detail a crime I've committed, but I don't state it plainly; I don't write that I stole that car, I say it in some colloquial way.
Now place that document among 4,000 other documents. If you were to ask a large language model, did Daniel steal a car, it would never be able to say yes, because Daniel isn't mentioned, I never say that I stole a car, and I don't say who I've written it to. What Mary, or any tool that's going to do this type of work in a legal document review process, needs to be able to do is look at things like the handwriting of that letter. Is that handwriting present in any of the other documents? Can we work out who actually wrote it? Maybe I also wrote on it a date when I went to the park. We need to be able to see that in this other document over here, Daniel said that he went to the park on that date, and then draw a conclusion and say, brilliant, maybe you should review whether or not this is Daniel, because we've got some supporting evidence. That's an example of where large language models just fall short in this type of work.
Building Confidence Through Verification: Treating Facts as First-Class Citizens
And the last problem is that even if I did all of that fact extraction perfectly, that's not really what lawyers and legal teams need when undertaking an investigation. They actually need to feel really confident about those facts and the narrative that they're going to present on their client's behalf or for their company. So here's an example. Is anybody a lawyer? OK, well, we've got one, brilliant.
So this is just an example, hopefully one that gets you thinking about why this is so important, but it's an exercise: the perfect letter of demand. Imagine a large language model spits out an answer and says, here's a perfect legal document, whatever it is, and I've done the work for you. I've gone through this entire, massive corpus of documents. I've extracted all of the facts. I've reviewed what's relevant in the context of the case, and I'm now going to give you the perfect document, in this case a letter of demand. I can assure you it's the ideal letter to file. It's supported with the perfect evidence, it's in the template you normally use, all of that good stuff. Please now go and file it with the other side or with the court. Would you go and file that?
That's the correct answer, good. You can't, because ultimately you have an obligation to whoever it is you're representing, and more importantly, you have a responsibility to make sure you're confident in the action you're taking. And so, unlike a large language model, which is built to receive a question and then deliver a correct answer, what's required in this type of work, in this document review and litigation workflow, is something that doesn't know what the question is going to be, yet can understand all of the facts and give you all of the potential narratives for you yourself to review, verify, and become confident in.
So how do you fix all of these problems? Well, in a way that large language models simply don't like, because it's incredibly process heavy and it's not a generalized task, it's very specific.
And the first thing you've got to do is treat facts as first-class citizens. In the same way that a large language model says the most important thing is having an incredibly efficient embeddings model, a fact review platform needs to have the best fact model: take those facts, put them through a manufacturing pipeline that's incredibly heavy, and deliver something you can rely on and then ultimately verify. Which is why you also need a world-class review and verification experience. This is where the lawyer or the team undertaking the investigation goes to review the facts that come out, build their narratives, and more. And finally, and this is maybe the one piece that I think is missing from what I've spoken about before, you need to take this layer of facts that you can feel confident about and pipe it through to downstream AI applications, things like OpenAI or any other unified interface, and have them working with it.
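As a hedged sketch of what piping the fact layer downstream could look like, the snippet below assembles verified facts, with their sources, into a grounded prompt. The fact values and the prompt shape are invented for illustration, and any chat-style model or unified interface could sit on the receiving end:

```python
# Illustrative verified facts; values are invented, not real case data.
verified_facts = [
    {
        "text": "Rowan McNamee is reassessed, swelling to right bicep remains",
        "source": "GP clinical notes, p. 4",
        "verified": True,
    },
]

def build_grounded_prompt(question: str) -> str:
    """Inject only verified facts, with their sources, as the model's context."""
    context = "\n".join(
        f"- {f['text']} (source: {f['source']})"
        for f in verified_facts
        if f["verified"]
    )
    return (
        "Answer using only the verified facts below, and cite the source "
        "for every claim.\n"
        f"Verified facts:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt("Did Rowan ever present with arm swelling?")
print(prompt)  # this string can be sent to whichever downstream LLM you use
```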
Mary Technology's Solution: A Fact Manufacturing Pipeline for Legal Workflows
Okay, so I've got a short video to show you how we've solved some of these problems. As a lawyer, when you receive a case, your first goal is to get the facts straight. But this is never straightforward. It means digging through endless emails, PDFs and records, splitting documents, cross-checking dates, piecing together a clear timeline. It's slow, it's manual, and it can take anywhere from hours to days before you've even started the legal work. We call this fact chaos.
But what if the moment a case landed in your inbox, everything was set in motion? We could take the attached documents in the email or find uploaded documents in the tools you already use, then scan and process them. The messy bundled files could be split into clear, structured documents. They could be categorized, renamed, and seamlessly organized back into your workflow, exactly where you need them.
But what if organizing documents was just the start? What if we could unblock you completely so you can get started on the real legal work? We can pull key entities from every document, names, businesses, and their role in the case, giving you instant insight into exactly who matters. Find and capture significant dates when events occurred. Get a concise case summary, distilling the entire matter into a few clear paragraphs. Identify gaps that need assessing, detect possible data leaks, build a timeline of events, and extract any other key details relevant to your case.
Then bring all these insights together in a single dashboard, so anyone can get a firm grip on a case in minutes, even if they've never seen it before. Delve deeper with generated chronologies, surfacing only what's most relevant to your case. Invite experts to work alongside you and your colleagues in real time and draft directly into the tools you already use. Because Mary connects with your existing systems, it adapts as new evidence, events, and documents emerge, keeping your case aligned every step of the way. When the facts are clear, decisions are faster. Fact chaos, solved.
Okay, so to conclude what I was getting at there, and hopefully what you could see in the video: it requires a novel approach. And interestingly, we couldn't use what most people can, which is Retrieval-Augmented Generation or agentic workflows, to just go into the documents, extract the facts that are meaningful, and present them to a user. That's what I'm saying up here: we can't just use good enough, it has to be brilliant.
And so we built a fact manufacturing processing pipeline, where we extract every event, entity, actor, issue, and loads of other things. Imagine a fact as an object with lots of metadata underneath it that lets you build relationships and construct the case itself, almost as a digital object. It will then do things like tell you whether or not a fact contradicts another. And the important part here is that every piece of the metadata underneath that object has to be explainable. We'll surface and expose the rationale for every single decision we make. If we tell you a date, we're going to tell you how we got to that date. If we tell you something is relevant, we're going to tell you why it's relevant.
Only after producing that high-quality fact layer do we use the more standard technologies, not traditional, they're actually very new, such as RAG and agentic frameworks. And the result is a persistent, auditable fact layer that you can rely on, both in the platform itself when you're doing that investigation, and downstream when you want to pipe that information into drafting or other associated legal tasks.
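A minimal sketch of "a fact as an object" along those lines is shown below; the field names and values are assumptions for illustration rather than Mary's actual fact model, but they reflect the kinds of metadata named in the talk (provenance, rationale, relevance, contradictions):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    text: str                       # concise, lawyer-readable statement
    date: Optional[str]             # normalized date, if one could be resolved
    entities: list                  # disambiguated people and organizations
    source_document: str            # which document the fact came from
    source_page: int                # exact page, so a reviewer can jump back
    provenance: str                 # e.g. primary record vs hearsay in a statement
    rationale: str                  # how the system reached this conclusion
    relevance: str                  # why it is, or is not, relevant to the matter
    contradicts: list = field(default_factory=list)  # ids of conflicting facts

# Illustrative values only; the page number and provenance are assumptions.
fact = Fact(
    text="Rowan McNamee is reassessed, swelling to right bicep remains",
    date=None,
    entities=["Rowan McNamee"],
    source_document="GP clinical notes",
    source_page=4,
    provenance="primary medical record",
    rationale="'PT' expanded to 'patient' and matched to Rowan McNamee",
    relevance="The entry focuses on a separate medical issue",
)
print(fact.rationale)
```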
So I'm just going to show you very briefly what the platform looks like for a single fact, to highlight that challenge from before, when I spoke to you about patient. You can see there's a fact at the top: a date with a time, and it says a chap called Rowan McNamee is reassessed, swelling to right bicep remains. You'll notice that it's an incredibly summarized and concise fact. That's because that's what lawyers need. They need to be able to scan all of these facts, because the majority of them won't be relevant.
So if we zoom in, imagine I've moved my mouse over to the right-hand side and hovered over that relevance field, and it tells me the entry focuses on a separate medical issue. Now bear in mind, I know a lot of my examples have been in personal injury, but you can do this with employment or any other type of law; in this particular case, it's personal injury. So it tells me why this isn't relevant, but then I'm able to dive deeper if I think it might have some relevance after all.
You can see I can pull up the actual document at the exact page and spot where that fact has come from, and I can also rely on Mary to give me more rationale as to how it came up with this fact, along with more detail that I can swap into the fact if I want more information. You'll notice this handwriting's terrible, well, it's actually pretty good handwriting, but still. Mary works primarily on unstructured data rather than things like contracts, where all of the information is very easy to get out. We have to focus on the documents that are really difficult.
But the reason I bring this up is that if we look in that document, that's where the fact comes from: PT, or patient. We don't just rely on that; this is one of those elements where we correct the fact as it goes through the pipeline. We say Rowan McNamee, so that when we ultimately pipe this fact down into another downstream AI capability, it knows it's looking at Rowan McNamee. So when you ask, hey, did Rowan ever go into a hospital with this, it can say yes with confidence, and you can go directly back to where that was found.
So just quickly on where we're at in our journey: we now work with many of the largest firms in Australia, including Arnold Bloch Leibler, who's one of the largest law firms in the world, and with firms both here and over in the UK and everywhere else, and we're bringing on more firms every single week. Across all of our customers, we've achieved a 75 to 85% reduction in time spent on what is probably the biggest bottleneck in litigation: document review. It's where so much of the time is spent and so much of the cost is accrued, and we're reducing it significantly.
And overall, we've achieved a 96 out of 100 NPS score. People really love using Mary, because this is one of the most difficult, annoying, frustrating jobs you can do as part of this process. I'm just going to leave this up on the screen briefly. It is a little bit Aussie, and I've had to redact the name, which is why there's a little dot here, but that's what one of our customers has said about how they use Mary.
So I'll open it up to questions if anybody's got any, but that's Mary Technology and how we're building this fact layer. Any questions? No? Cool. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.