LLM operators, an approach that opens up new possibilities

Abdallah Meddah — Fri, 21 Jun 2024 16:12:04 +0000

I'm sure that, like many of us, you can't wait for the day AI finally replaces us so we can finish all our side projects. However, in my opinion, we are not there yet. My intuition is that LLMs have learned to model a function that takes in many words (tokens) and outputs one word; I am not sure they can form abstractions and build upon ideas the way humans do. Still, I think LLMs bring something new to the table, something that, again in my opinion, we have not been exploiting correctly so far.

When it comes to processing data with LLMs, the community has been very inventive over the last two years, coming up with a range of prompting and RAG techniques to provide the context needed to extract something valuable from these models. However, from what I have seen at my workplace and from discussions with a few people, the results are rarely entirely satisfying, and in practice LLMs mostly end up acting as an enhanced search engine.

I believe we can use LLMs as more than just a search engine: we can write operators that leverage them to operate on data whose exact structure we don't know. For example, say you're scraping a web page for a certain date. The underlying HTML might change often, breaking the selectors you rely on, yet the page always shows the date in a way a human can understand. An algorithm that depends on the HTML structure or on specific wording won't always work. But if you feed the content of the page to an LLM, it should be able to get you the correct value even when the markup changes. That's why I wrote a library in JavaScript with a few operators that leverage LLMs to process data. Here is an example of extracting a piece of text from HTML:

console.log(
  await Unsure("<html><body><div class=\"scrapable\">Target Content</div></body></html>")
  .flatMapTo("content of the div with the class scrapable")
); // this will yield "target content"

With this, we can process data that is not well-polished and doesn't necessarily have a consistent structure.

We can also evaluate data whose content we can't predict. Say you want better feedback on your app than just a 5-star rating, so you ask users to leave a comment, and you prioritize the bug backlog according to those comments (I know you have one, just like the rest of us). You can do something like this:

console.log(
  await Unsure(userComment)
  .categorize([
    "bug ticket 139",
    "bug ticket 639",
    "bug ticket 420",
    "none"
  ])
); // this will yield "bug ticket 639" for example if the user comments about a bug that's described by ticket 639
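
To make the idea of an "operator" more concrete, here is a minimal sketch of how a categorize-style operator could be built on top of any LLM. This is not how the Unsure library is implemented; the names makeCategorize, complete, and fakeComplete are hypothetical and exist only for illustration. The one design choice it shows is validating the model's answer against the allowed list, so free-form output never leaks into the rest of the program.

```javascript
// Hypothetical sketch, NOT the Unsure library's implementation.
// `complete` is any async function (prompt: string) => Promise<string>
// wrapping the LLM of your choice; it is injected so the operator stays testable.
function makeCategorize(complete, fallback = "none") {
  return async function categorize(text, categories) {
    const prompt = [
      "Classify the text into exactly one of the categories below.",
      "Answer with the category name only.",
      "Categories:",
      ...categories.map((c) => `- ${c}`),
      "Text:",
      text,
    ].join("\n");

    const answer = (await complete(prompt)).trim().toLowerCase();
    // Accept the answer only if it matches an allowed category;
    // otherwise return the fallback instead of trusting free-form output.
    const match = categories.find((c) => c.toLowerCase() === answer);
    return match ?? fallback;
  };
}

// Stand-in for a real model call, used only so the sketch runs offline.
async function fakeComplete(prompt) {
  return prompt.includes("crash") ? " Bug ticket 639\n" : "I am not sure";
}
```

With the stub above, makeCategorize(fakeComplete) applied to a comment mentioning a crash resolves to "bug ticket 639", and any answer outside the list falls back to "none". Swapping fakeComplete for a real model call is the only change needed to try it against an actual LLM.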

I'll give one last example and then I'll give you a link to the library where there are other examples.

Suppose you have to filter a list of documents, keeping only the ones about a legal matter between two given entities in a given year. You can simply do something like this:

if (await Unsure(document.text)
  .is("A legal document between entity1 and entity2 that happened in 2024")
) {
  wantedDocuments.push(document.id);
}
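
The snippet above checks a single document. To apply the same check to a whole list, note that Array.prototype.filter does not wait for promises, so an async predicate has to be resolved first. The sketch below shows one way to do that; filterDocumentIds and fakeMatches are hypothetical names, and the predicate is injected so that the example runs without a model.

```javascript
// Hypothetical helper, not part of the Unsure library.
// `matches` is any async predicate (text: string) => Promise<boolean>,
// for example a thin wrapper around the `.is(...)` call shown above.
async function filterDocumentIds(documents, matches) {
  // Array.prototype.filter ignores promises, so resolve every verdict first.
  const verdicts = await Promise.all(documents.map((doc) => matches(doc.text)));
  // Keep the documents whose verdict is true and return only their ids.
  return documents.filter((_, i) => verdicts[i]).map((doc) => doc.id);
}

// Offline stand-in for the LLM-backed check, so the sketch can run without a model.
async function fakeMatches(text) {
  return text.includes("2024");
}
```

Promise.all starts every check at once, which is fine for a handful of documents. For large lists you would probably want to process them in batches to stay within the rate limits of whatever model you call.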

There are many more cases where we can leverage LLMs to process data that we couldn't have processed correctly before. The JavaScript library is called Unsure, and there is also a Python package. It's available in JavaScript and Python so far; I am currently rewriting it in Rust and Go.

Thank you for reading!

DEV Community: Abdallah Meddah
