Inceptive techniques to pull summaries and e-learning out of vocational training video series

#anthropic #ai

Hello,

I wanted to share the result of a four-month exercise in taming Anthropic Claude Sonnet 3.5 (after migrating from the "AI on steroids" GPT-4o).

The project is to summarize and pull e-learning content out of a series of thousands of vocational training videos in the wheelchair and mobility device custom seating vertical using multimodal inference.

Although there is room for improvement on the foundation model side, I think the output I'm getting is finally salable. I'm seeing brief glimmers of excellence and mostly acceptable output. Like working with a petulant teenager, sometimes the AI gets sullen and refuses to work, sometimes it does something completely different than what I asked it to do, mostly it gets it right enough, and sometimes (rarely but consistently) it comes up with fantastic and brilliant insights.

I have open-sourced my inference management logic, because using the raw API wasn't cutting it when I had to create long summaries of structured, summarized video content. I use inceptive techniques: I prepare a prompt to run inference that writes a script, that script creates a prompt, and that prompt runs inference to produce part of an overall result. Then I write multilevel merge logic to make sure I catch everything useful coming back from the AI and assemble it into a larger, more cohesive output. The AI is really bad at knowing when it is actually done, so I've had to go through some backflips here.
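For anyone who wants the general shape before opening the gist, here is a minimal Python sketch of the three ideas: staged prompts, a "keep going until done" loop, and a multilevel merge. It is not the code from my gist. Every name in it (`call_model`, `DONE_MARKER`, `run_until_done`, `staged_inference`, `multilevel_merge`) is hypothetical, `call_model` stands in for whatever function sends a prompt to the model and returns text, and the intermediate "script" step is collapsed into a single generated prompt to keep it short.

```python
from typing import Callable, List

# Hypothetical sentinel that the prompt asks the model to emit when it is finished.
DONE_MARKER = "<<END>>"


def run_until_done(call_model: Callable[[str], str], prompt: str,
                   max_rounds: int = 5) -> str:
    """Keep asking for continuations until the sentinel shows up
    or the round budget is spent (the model may never say it is done)."""
    pieces: List[str] = []
    request = prompt
    for _ in range(max_rounds):
        reply = call_model(request)
        if DONE_MARKER in reply:
            # Keep only the text before the sentinel and stop.
            pieces.append(reply.split(DONE_MARKER)[0])
            break
        pieces.append(reply)
        # Feed back the tail of the output so the model can resume from there.
        request = prompt + "\n\nContinue after:\n" + "".join(pieces)[-500:]
    return "".join(pieces).strip()


def staged_inference(call_model: Callable[[str], str], task: str, chunk: str) -> str:
    """Stage 1 asks the model to write a prompt for the task;
    stage 2 runs that generated prompt against one chunk of input."""
    meta_prompt = (f"Write a prompt that instructs a model to: {task}. "
                   f"End with {DONE_MARKER}")
    generated_prompt = run_until_done(call_model, meta_prompt)
    return run_until_done(
        call_model,
        f"{generated_prompt}\n\nINPUT:\n{chunk}\n\nEnd with {DONE_MARKER}",
    )


def multilevel_merge(parts: List[str], merge: Callable[[List[str]], str],
                     fan_in: int = 3) -> str:
    """Merge partial results in groups of fan_in, level by level,
    until a single combined result remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not parts:
        return ""
    level = list(parts)
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]
```

In practice the `merge` callable would itself be another inference call that reconciles a handful of partial summaries, and the `max_rounds` cap is the crude safety net for the cases where the model never emits the sentinel.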

https://gist.github.com/mdear/4cd8ee0bfb807840c64b34a6569a8949

https://ai.stackexchange.com/questions/40753/how-to-generate-original-training-videos-based-on-existing-videoset

https://readmultiplex.com