LLMs have been around for a while now, since roughly 2020. Even though multimodal models (image, audio, and now video) have been introduced, there is still little they can do on their own. Yes, they can write code, book your flight, and do research for you, but they are not truly performing actions.
Define Action
An action is something that changes the environment around it in a way that a third person can visually see.
Large Action Model (LAM)
Here are the basic checkpoints a model needs to cross to be considered a LAM:
- It should be able to understand the screen, no matter what app is open
- It should be able to control the cursor and send keystrokes of its own
- It should understand the user's input and convert it into doable actions to execute
- If a completely new user interface appears, it should be able to navigate it easily
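The checkpoints above amount to an observe, plan, act loop. Here is a minimal sketch of that loop, not a real implementation: `Action`, `Screen`, `Planner`, and `run` are all hypothetical names made up for illustration, and the screen is reduced to a list of visible element labels instead of real pixels.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Action:
    """One step the model wants to take (hypothetical shape)."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # label of the UI element to act on
    text: str = ""     # text to type, if any


class Screen(Protocol):
    """Assumed interface for screen capture plus cursor/keyboard control."""
    def capture(self) -> list[str]: ...           # labels of visible UI elements
    def apply(self, action: Action) -> None: ...  # move cursor / send keystrokes


class Planner(Protocol):
    """Assumed interface for the model that turns a goal into the next step."""
    def next_action(self, goal: str, visible: list[str]) -> Action: ...


def run(goal: str, screen: Screen, planner: Planner, max_steps: int = 20) -> list[Action]:
    """Look at the screen, pick one action, execute it, and repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        visible = screen.capture()                   # understand the screen
        action = planner.next_action(goal, visible)  # user input -> doable action
        if action.kind == "done":
            break
        screen.apply(action)                         # control cursor and keyboard
        history.append(action)
    return history
```

Re-capturing the screen on every step is what would let the loop cope with a user interface it has never seen before: the planner only ever reasons about what is visible right now.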
Why do we need one?
Here is a scenario: you want to send an email to a friend, and these are the steps you need to perform to get it done:
- Click on start menu
- Open a browser
- Search for your email page
- Compose an email and add recipient
- Hit send
Now imagine you have a LAM. Here is how it goes:
- Prompt: "Send an email to RECIPIENT saying MESSAGE..." ...
That's it, the model does it for you.
Wait, wait, wait... Why is the model using screen capture and controlling my personal computer just to send an email? Why can't I hook up an email service API and call it instead? The thing is, in that case you can't do anything more than send emails. A LAM that works through the screen is not limited to one service.
The Unlimited Use Cases of LAM
LLMs have changed the world, and recently they have changed web search too. Many people are going to use ChatGPT instead of Google Search, and services like Perplexity are making a strong move as well. Adding LAMs to this stream is going to change how we interact with computers.
Above I gave just an email example. Here is a list of things a LAM could do if built right:
- emailing, messaging, posting
- searching and finding answers (including finding that lost file in your file manager)
- doing basic designs in Blender
- editing videos for you
- if it is powerful enough, maybe even running a YouTube channel for you
...and the list goes on.
One thing to remember: we are going to get there.
How is a LAM going to do good?
How is this tech going to benefit me? I am going to give it full access to my personal computer, so how do you make sure it doesn't do anything shady?
That is why I am planning to build a LAM that runs locally on the device, with every action approved by the user.
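To make "every action should be approved by the user" concrete, here is a minimal sketch of an approval gate. It is only one possible design, not the planned implementation: `run_with_approval` and `ask_on_terminal` are hypothetical names, and actions are plain strings to keep the example short.

```python
from typing import Callable, Iterable


def run_with_approval(
    actions: Iterable[str],
    execute: Callable[[str], None],
    approve: Callable[[str], bool],
) -> list[str]:
    """Execute each proposed action only if the user approves it.

    Stops at the first rejection, so later steps never run against
    a state the user did not agree to.
    """
    executed: list[str] = []
    for action in actions:
        if not approve(action):
            break
        execute(action)
        executed.append(action)
    return executed


def ask_on_terminal(action: str) -> bool:
    """Hypothetical approval prompt: ask the user in the terminal."""
    return input(f"Allow '{action}'? [y/N] ").strip().lower() == "y"
```

Calling `run_with_approval(steps, execute, ask_on_terminal)` would pause before every step and default to "no" on anything other than an explicit `y`. Stopping on the first rejection is an assumption of this sketch; skipping the rejected step and continuing would be another option.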