Ask HN: Imagine coding LLMs 1M times faster; what uses might there be?
Unlimited speed would probably help things like AlphaEvolve. Given a codebase / paper / idea and some definable "tests" or "goals", go nuts: evolve, run adversarial variants, n x m tries with the documentation, n x m tries with papers in that field, etc.
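A minimal sketch of what that loop might look like, assuming hypothetical llm() and run_tests() helpers standing in for the model call and the test harness:

    # Minimal sketch of an AlphaEvolve-style loop. llm() and run_tests()
    # are hypothetical stand-ins, not a real API.
    import random

    def llm(prompt: str) -> str: ...          # assumed: returns candidate code
    def run_tests(code: str) -> float: ...    # assumed: fraction of tests passed

    def evolve(seed_code, generations=100, population=20):
        pool = [seed_code]
        for _ in range(generations):
            parents = random.sample(pool, min(len(pool), 5))
            children = [llm("Improve this code:\n" + p) for p in parents]
            # fitness = test pass rate; keep the strongest candidates
            pool = sorted(pool + children, key=run_tests, reverse=True)[:population]
        return pool[0]

At 1M times current speed, generations and population could be absurdly large.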
For the moment, though, I'd take a "smarter" but slower model over one x times faster. The current models are plenty fast already: they pump out 300-500 LoC files in seconds to minutes. That's plenty of speed.
I think the answer is obvious, so maybe you're looking for something different.
More perf means more attempts in parallel, with some sort of arbiter model deciding which to pick. This can happen at the token level, the prompt level, the agent level, or all three.
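At the prompt level that's just best-of-n sampling. A rough sketch, where generate() and the arbiter judge() are assumed model calls rather than a real API:

    # Rough sketch of best-of-n at the prompt level.
    from concurrent.futures import ThreadPoolExecutor

    def generate(prompt: str) -> str: ...                 # assumed: one sampled completion
    def judge(prompt: str, candidate: str) -> float: ...  # assumed: arbiter score

    def best_of_n(prompt, n=8):
        with ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(lambda _: generate(prompt), range(n)))
        # arbiter model scores every candidate; return the highest-rated one
        return max(candidates, key=lambda c: judge(prompt, c))

The same shape scales to the agent level if generate() kicks off a whole agent run instead of a single completion.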
$5,000 of hardware can run GPT 120B at 40-60 tps, which is plenty to code actively through the day.
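For scale: 50 tps sustained over an 8-hour day is 50 x 3600 x 8 = 1.44M output tokens, far more than anyone reviews in a day.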
You could instead use Cerebras and get 3,000 tps, but judging by usage stats people really don't do that. It's 100x the speed, yet it's not like everyone is rushing to their service.
The speed of LLMs after the advent of MoE has hit a good spot. What we need now is smarter models.
Well, with the way things currently work, where the output is only as good as the input, you'd probably just get noise.
However, for dynamic on-the-fly search queries and other such uses, it would be a game changer.
I'd love to have millions of one-off AI-generated code paths that auto-update things quickly for personalization at scale.
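Something like this speculative sketch, where llm() is a hypothetical model call and the generated rank() function would need a real sandbox before anyone ran it:

    # Speculative sketch: generate, cache, and reuse one tiny
    # personalization function per user.
    def llm(prompt: str) -> str: ...  # assumed: returns code defining rank(items)

    _cache = {}

    def personalized_ranker(user_id: str, profile_summary: str):
        if user_id not in _cache:
            code = llm("Write rank(items) tuned for: " + profile_summary)
            namespace = {}
            exec(code, namespace)     # trust boundary: sandbox this in practice
            _cache[user_id] = namespace["rank"]
        return _cache[user_id]

Today that's too slow and too expensive per user; at 1M times faster it could regenerate on every profile change.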