Hello,
I wanted to share the result of a four-month exercise in taming Anthropic's Claude 3.5 Sonnet (after migrating from the "AI on steroids", GPT-4o).
The project is to summarize and pull e-learning content out of a series of thousands of vocational training videos in the wheelchair and mobility device custom seating vertical using multimodal inference.
Although there is room for improvement on the foundation model side, I think the output I'm getting is finally salable. I'm seeing brief glimmers of excellence and mostly acceptable output. It's like working with a petulant teenager: sometimes the AI gets sullen and refuses to work; sometimes it does something completely different from what I asked it to do; mostly it gets it right enough; and sometimes (rarely but consistently) it comes up with fantastic and brilliant insights.
I have open-sourced my inference management logic, because using the raw API wasn't cutting it when I had to create long summaries of structured, summarized video content. I use inceptive techniques: I prepare a prompt to run inference that writes a script, which creates a prompt to run inference that produces part of an overall result. Then I write multilevel merge logic to make sure I catch everything useful coming back from the AI and assemble it into a larger, more cohesive output. The AI is really bad at knowing when it is actually done, so I've had to go through some backflips here.
https://gist.github.com/mdear/4cd8ee0bfb807840c64b34a6569a8949
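As a rough illustration of the pattern described above, here is a minimal Python sketch. It is not the code in the gist, and it simplifies the chain to two stages. Every name in it (`generate`, `DONE_MARKER`, `run_until_done`, `merge_sections`, `two_stage_summary`) is hypothetical. `generate` stands in for whatever function sends a prompt to the model and returns its text, so no particular API is assumed. The sentinel-based "are we done yet" check is just one possible approach.

```python
from typing import Callable, List

# Hypothetical sentinel the model is asked to emit when it has finished.
DONE_MARKER = "<<END_OF_SUMMARY>>"


def run_until_done(generate: Callable[[str], str], prompt: str,
                   max_rounds: int = 5) -> str:
    """Call `generate` repeatedly until the sentinel appears or rounds run out."""
    pieces: List[str] = []
    for _ in range(max_rounds):
        if pieces:
            # Feed back what we have so far and ask the model to continue.
            context = (prompt + "\n\nOutput so far:\n" + "".join(pieces)
                       + "\n\nContinue, and finish with " + DONE_MARKER)
        else:
            context = prompt + "\n\nFinish with " + DONE_MARKER
        chunk = generate(context)
        if DONE_MARKER in chunk:
            # Keep only the text before the sentinel, then stop.
            pieces.append(chunk.split(DONE_MARKER, 1)[0])
            break
        pieces.append(chunk)
    return "".join(pieces).strip()


def merge_sections(parts: List[str]) -> str:
    """Merge partial results, keeping the first occurrence of each line."""
    seen = set()
    merged: List[str] = []
    for part in parts:
        for line in part.splitlines():
            key = line.strip().lower()  # case-insensitive duplicate check
            if key and key not in seen:
                seen.add(key)
                merged.append(line.strip())
    return "\n".join(merged)


def two_stage_summary(generate: Callable[[str], str],
                      transcript_chunks: List[str]) -> str:
    """Stage 1 asks the model to write a prompt; stage 2 runs that prompt."""
    parts: List[str] = []
    for chunk in transcript_chunks:
        meta_prompt = ("Write a prompt that would extract the key teaching "
                       "points from this transcript segment:\n" + chunk)
        generated_prompt = run_until_done(generate, meta_prompt)
        parts.append(run_until_done(generate, generated_prompt + "\n\n" + chunk))
    return merge_sections(parts)
```

Passing `generate` in as a parameter keeps the retry and merge logic testable with a fake model. A real merge step would likely need fuzzier matching than exact line comparison.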