What you absolutely cannot vibe code right now

The fellow hoping to “generate an operating system” faces many challenges. LLMs are trained on a mountain of CRUD (create, read, update, delete) code and web apps. If that is what you are writing, then use an LLM to generate virtually all of it — there is no reason not to. If you get down into the dirty weeds of an algorithm, you can generate parts of it, but you’ll have to know what you’re doing and constantly re-align the model. It will not be simple.

Good at easy

This isn’t just me saying this; studies show it as well. LLMs fail at hard and medium-difficulty problems, where they can’t stitch together well-known templates. They also have a half-life: success rates decay as problems get longer. Despite o3’s (erroneous, in this case) supposition that my planning system caused the problem, that system succeeds most of the time precisely by breaking the problem into smaller parts and forcing the LLM to align to a design without having to understand the whole context. In short, I give it small tasks it can succeed at. However, one reason the models failed is that, despite all the tools that have been created, there are only about 50 patch systems out there in public code. With few examples to learn from, they inferred that unified diffs might be a good approach (in general, they aren’t). For web apps, there are many, many examples. They know that field very well.
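To make the unified-diff point concrete, here is a minimal sketch using Python’s standard `difflib` (the filenames and snippet are hypothetical). A unified diff anchors each hunk to line numbers and surrounding context lines, which is exactly why it is fragile for LLM output: if the model’s reproduction of the context drifts by even a character, the patch no longer applies cleanly.

```python
import difflib

# Hypothetical "before" and "after" versions of a tiny file.
original = ["def add(a, b):\n",
            "    return a + b\n"]
modified = ["def add(a, b):\n",
            "    # handle the common case\n",
            "    return a + b\n"]

# Produce a unified diff: hunks are anchored by line numbers (@@ -1,2 +1,3 @@)
# and by verbatim context lines. Any drift in that context breaks the patch.
diff = list(difflib.unified_diff(original, modified,
                                 fromfile="before.py", tofile="after.py"))
print("".join(diff))
```

The `@@ -1,2 +1,3 @@` hunk header and the verbatim context lines are the fragile part: a patching tool rejects the hunk if they don’t match the target file exactly.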

What to take from this? Ignore the hype. LLMs are helpful, but truly autonomous agents are not producing production-level code, at least not yet. LLMs do best at repetitive, well-understood areas of software development (which are also the most boring). They fail at novel ideas and real algorithmic design, and they probably won’t (by themselves) succeed anywhere there aren’t a lot of examples on GitHub.
