As with any overhyped technology, I think it's wise to be a bit skeptical about claims made by companies pushing large language models (LLMs).
The eagerness to add AI to everything over the last couple of years reminds me of the hype around "big data" when that was a thing. The technology was useful, but teams adopted it uncritically even when more established technologies would have been fine in their business context. There is definitely pressure to do the same with LLMs and AI in general, but as technologists we should consider what these things are actually good at and what they're not good at.
With that in mind, here are some observations from the sidelines, based on my personal usage of ChatGPT as a senior developer, how I've seen LLMs being misused, and my attempt to follow along with the general discourse.
Reasonable use cases
Speeding up work tasks
If you don't mind some manual review, LLMs are good at things like:
- coming up with names for things
- classifying items in a long list (example)
- formatting data (e.g. converting to CSV/JSON, or different date formats)
- data extraction from unstructured text (e.g. email addresses or URLs)
- rephrasing or adjusting the tone of your writing
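Some of that manual review can be scripted. As a minimal sketch for the data extraction case: the function name `check_extracted_emails`, the simplified regex, and the sample data are all my own illustration, not part of any library. The idea is to flag any address the LLM returned that is malformed or does not actually appear in the source text.

```python
import re

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def check_extracted_emails(source_text, llm_emails):
    """Split LLM-extracted emails into verified and suspicious ones."""
    source_lower = source_text.lower()
    verified, suspicious = [], []
    for email in llm_emails:
        candidate = email.strip()
        # Accept only well-formed addresses that literally occur in the source.
        if EMAIL_RE.fullmatch(candidate) and candidate.lower() in source_lower:
            verified.append(candidate)
        else:
            suspicious.append(candidate)
    return verified, suspicious

# Hypothetical input text and a hypothetical LLM response with one invented address.
source = "Contact alice@example.com or bob@example.org for details."
llm_output = ["alice@example.com", "bob@example.org", "carol@example.com"]

verified, suspicious = check_extracted_emails(source, llm_output)
print("verified:", verified)
print("needs review:", suspicious)
```

A check like this only catches invented or mangled values. It won't tell you about addresses the LLM missed, so it narrows the review rather than replacing it.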
Code assistants
LLMs can be good for quick prototyping, provided you can understand the code they are generating. You will likely have better results if you split the task into very small steps and commit often. Perhaps consider using the Mikado method with the LLM.
If AI is generating the code, I believe you should write the unit tests yourself, so that you are forced to check its correctness.
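To illustrate what I mean, here is a small sketch. The `slugify` function is a hypothetical stand-in for something an assistant might generate. The tests are the part you would write by hand, each one encoding what you expect rather than what the model happened to produce.

```python
import re

def slugify(title):
    # Hypothetical stand-in for assistant-generated code.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Hand-written tests: plain assert-based functions that pytest can discover,
# or that can be called directly as below.
def test_basic_title():
    assert slugify("Hello World") == "hello-world"

def test_punctuation_and_whitespace():
    assert slugify("  LLMs: hype, or not?  ") == "llms-hype-or-not"

def test_empty_string():
    assert slugify("") == ""

for test in (test_basic_title, test_punctuation_and_whitespace, test_empty_string):
    test()
print("all hand-written tests passed")
```

Writing the edge cases yourself (empty input, stray punctuation) is where you find out whether you actually understand what the generated code does.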
Be careful not to use code assistant tools or paste non-public code into LLMs without your employer's (or the copyright holder's) permission.
I think code assistants are a bad idea for learning new frameworks and libraries, for several reasons:
- it discourages you from reading the docs and forming a good mental model of how the thing works
- it's not always smart enough to fix bugs for you or explain why your code isn't working
- you can't recognise when the coding style is outdated or there is a simpler way of doing things
I think refactoring is best done by hand unless you want to apply a single refactoring many times across a large codebase. AI-generated refactorings are not guaranteed to preserve behaviour, so they need checking for correctness.
Retrieval-augmented generation (RAG)
RAG is a multi-step process that first uses vector embeddings to fetch relevant content from a knowledge base, and then feeds that content to an LLM to answer a user's question. For question answering, I'd expect this to outperform traditional search in cases where information is scattered amongst a lot of similar-looking documents, for example Slack messages or helpdesk tickets. It only really makes sense if your knowledge base is large enough that it would be costly for a technical writer to trawl and summarise.
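The two steps can be sketched in a few lines. This is a toy under loud assumptions: `embed` uses bag-of-words counts as a stand-in for a real embedding model, the ticket texts are made up, and the LLM call itself is left out, so the sketch stops at building the prompt. A real system would use learned embeddings and a vector store.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=2):
    # Step 1: rank the knowledge base by similarity to the question.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=2):
    # Step 2: feed the retrieved content to the LLM as context.
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Made-up helpdesk tickets standing in for a knowledge base.
tickets = [
    "Password reset emails are delayed when the mail queue is full.",
    "Invoices can be exported as CSV from the billing page.",
    "Users reset their password from the login page via the forgot password link.",
]
prompt = build_prompt("How do I reset my password?", tickets)
print(prompt)
```

The quality of the final answer depends heavily on the retrieval step: if the wrong tickets are fetched, the LLM has nothing correct to work from.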
I'm a bit skeptical of building such systems in-house, though, as opposed to using something like RunLLM. It feels like this effort would be better invested in improving your own product/service or its documentation.
Questionable use cases
LLMs are not oracles
LLMs are not good at:
- doing research for you
- communicating factual information
- weighing up evidence
It is nonsense to ask an LLM for opinions on ideas, because LLMs can support any position depending on their prompt and context.
Building user-facing services on top of LLMs is risky
- LLMs are costly to train and run due to the amount of compute required. This has a high energy cost, to the point that big tech companies have walked back their commitments to carbon neutrality in order to expand data centres. I wouldn't be surprised if companies hike up prices as the technology matures.
- LLM outputs cannot be trusted to be free of copyrighted or sensitive data without more transparency over how they were trained
- Allowing LLMs to act as "agents" is open to abuse from prompt injection attacks, and they can be misled by untrustworthy information
- LLMs will happily lie to customers
- It might be possible for AI to perform more complex reasoning by chaining many LLM operations, but I think this is unproven and expensive at this point(?)