Dev Week 17

This post is actually being written 2025-01-01, but I'm backdating some posts to make up for the weeks I missed. This post is about OpenAI's recent announcements of its o1 models, which makes it all the more puzzling that I'm backdating something to a time before it became known/relevant to the world.

The typical end-user AI that most people have access to is still pretty dumb, and I wouldn't count on it getting a whole lot smarter. I'm not saying it won't; I'm saying I'm keeping my expectations low. However, I always thought it was a bad idea to feed input directly from the user to an LLM and return the output directly to the user. Even with all the attempts at "guardrails", this assumes a lot of trust in both the user and the AI, and researchers have shown that models can be made to spill their secrets or act/speak against their safety directives.

I always felt there should be some smaller, simpler AI watching the output as it comes out of the model, checking basics such as: does the response actually address the request? Is its reasoning sound? Is it hallucinating? This seemed obvious to me, and apparently one of OpenAI's biggest recent breakthroughs is along these lines: have the model produce many (hundreds? thousands?) candidate outputs, then have another model, trained specifically to assess the output of other models, pick the best (or at least most sound?) of all the responses. My idea would have involved a more programmatic/software approach with more communication overhead, but I'm certain those smart people over there could figure out how to integrate it all into a single model architecture and reduce communication between the components if that's what they're going for.
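The generate-many, pick-the-best idea above can be sketched in a few lines. To be clear, this is a toy illustration, not OpenAI's actual method: `generate_candidates` and `verifier_score` are hypothetical stand-ins for the generator model and the judge model.

```python
def generate_candidates(prompt, n=8):
    # Stand-in for sampling n responses from a generator model.
    # Here we just fabricate toy variants.
    return [f"answer-{i} to: {prompt}" for i in range(n)]

def verifier_score(prompt, candidate):
    # Stand-in for a smaller "judge" model trained to rate whether
    # a response addresses the prompt and reasons soundly.
    # A real scorer would return a learned quality estimate;
    # this toy version just prefers lower-numbered answers.
    return -int(candidate.split("-")[1].split(" ")[0])

def best_of_n(prompt, n=8):
    # Sample many candidates, keep the one the verifier rates highest.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: verifier_score(prompt, c))

print(best_of_n("What is 2 + 2?"))
```

The appeal of this shape is that the verifier only has to recognize a good answer, which is usually easier than producing one, so even a smaller, simpler model can add value as the gatekeeper.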

Apparently their new stuff is blowing everything previous out of the water. Every time a new benchmark gets created with the intention of stumping the models for a while longer, it gets beaten within a month or two.

The foreshadowing to OpenAI's release of these models was Sam Altman's earlier unsupported claim that we could be hitting AGI soon. This originally sounded like a way to appease shareholders, but now we know what he was so excited about.

This raises a million possible next conversation points, from politics to capitalism to marginalization to World War 3, but I'll leave it at that and get to my next back-dated post.
