Things I've learned in my 7 years implementing AI
One issue is the accuracy of these AI models: you can't -really- trust them to do a task fully, which makes it hard to fully automate things with them. But the other is cost. Anyone using these models to do something at scale is paying maybe 100x over what it would cost in compute to run deterministic code to do the same thing. So in cases where you can write deterministic code to do something, or build a UI for a user to do it themselves, that still seems to be the best way. Once AI gets to the point where you can fully trust some model, we've probably already hit AGI, and at that point we're probably all in pods with cables in our brainstems, so who cares...
I like the engineering part at the top, but projecting AI perspectives blindly through the lens of LLMs is effectively "looking backwards".
So this is nice
> productionizing their proof-of-concept code and turning it into something people could actually use.
because it's so easy to glamorize research, while ignoring what actually makes ideas products.
This is also the problem. It's a backward-looking perspective, and it's so easy to miss the forest for the trees when you're down in the weeds. I'm speaking from experience, and it's a feeling I get when reading the post.
In the grand scheme of things our current "AI" will probably look like a weird detour.
Note that a lot of these perspectives are presented (and thought) without a timeline in mind. We're actually witnessing timelines getting compressed. It's easy to see the effects of one track while missing the general trend.
This take is looking at (arguably over-focusing on) the LLM timeline, while missing everything else that is happening.
> Creating AI models is hard, but working with them is simple
I'm not disagreeing with the overall post, but from closely observing end users of LLM-backed products for a while now, I think this needs nuance.
The average Joe, whether a developer, a random business type, a school teacher, or your mum, is very bad at telling an LLM what it should do.
- In general, people are bad at expressing their thoughts and desires clearly. Frontier LLMs are still mostly sycophantic, so in the absence of clear instructions they will make things up. People are prone to treating the LLM as a mind reader, without critically assessing whether their prompts are self-contained and sufficiently detailed (see the sketch below).
- People are pretty bad at estimating what kind of data an LLM understands well. In general data literacy, and basic data manipulation skills, are beneficial when the use case requires operating on data besides natural language prompts. This is not a given across user bases.
- Very few people have a sensible working model of what goes on inside an autoregressive black box, so they have no intuition for managing context.
User education still has a long way to go, and IMO is a big determining factor in people getting any use at all from the shiny new AI stuff that gets slathered onto every single software product these days
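To make the "mind reader" point concrete, here's a hypothetical before/after of the same request. The task, column names, and formats are invented purely for illustration:

```python
# Hypothetical example: the same request, first as a "mind reader" prompt,
# then as a self-contained one with explicit data, format, and constraints.

vague_prompt = "Clean up this spreadsheet for me."

detailed_prompt = """You are given a CSV export of a sales sheet (pasted below).
Tasks:
1. Drop rows where the 'amount' column is empty.
2. Normalise the 'date' column to YYYY-MM-DD.
3. Return only CSV, same column order, no commentary, and do not invent rows.

CSV:
{csv_data}"""

sample_csv = "date,amount\n2024/1/3,100\n2024/1/4,\n2024/1/5,250"
print(detailed_prompt.format(csv_data=sample_csv))
```

The second version still isn't magic, but it removes most of the guessing the model would otherwise have to do.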
> What I do see is a boom in internal tools.
It’s easy now to get something good enough for use by you, friends, colleagues etc.
As it’s always been, developing an actual product is at least one order of magnitude more work. Maybe two.
But both internal tools and full products are made one order of magnitude easier by AI. Whole products can be made by tiny teams. And that’s amazing for the world.
This article aligns very well with my frustration with the current view of AI in media and discussion.
> AI tools like KNNs are very limited but still valuable today.
I've seen discussions calling even feed-forward CNNs, Markov chain Monte Carlo, or GANs "antiquated" because transformers and diffusion have surpassed their performance in many domains. There is a hyper-fixation on large transformers and a sentiment that they somehow replace everything that came before in every domain.
It's a tool that unlocks things we could not do before. But it doesn't do everything better. It does plenty of things worse (at least once you take power and compute into account). Even if it can do algebra now (as is so proudly proclaimed in the benchmarks), Wolfram Alpha remains, and will continue to remain, far more suited to the task. Even if it can write code, it does NOT replace programming languages, as I've seen people claim in very recent posts here on HN.
Contrary to the OP, there is a useless chatbot on the Amazon homepage ("Rufus" sparkle button).
> AI as a product isn’t viable: It’s either a tool or a feature
This correlates with the natural world. Intelligence isn’t a direct means of survival for anything. It isn’t a requirement for physical health.
It is an indirect means, i.e. a tool.
The summary of this post: you can't rely on them directly for business value, but you can rely on them indirectly.
"There’s a reason we’re not seeing a “Startup Boom” AI skeptics ask, “If AI is so good, why don’t we see a lot of new startups?” Ask any founder. Coding isn’t even close to the most challenging part of creating a startup."
-- uhhh... am I the only one seeing a startup boom??? There are a bajillion kids working on AI start ups these days.
>> What I do see is a boom in internal tools.
This has been my main use case for AI. I have lots of ideas for little tools to take some of the drudgery out of regular work tasks. I'm a developer and could build them, but I don't have the time. However, they're simple enough that I can throw them together in basic script form really quickly with Cursor. Recently I built a tool to analyse some files, pull out data, and give it to me in the format I needed. A relatively simple Python script. Then I used Cursor to put it together with a simple file-input UI in an Electron app so I could easily share it with colleagues. Like I say, I've been a developer for a long time but had never written Python or packaged an Electron app, and this made it so easy. The whole thing took less than 20 minutes, and it was quick enough that I could do it as part of the task I was doing anyway, rather than additional work I needed to find time to do.
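For flavour, a minimal sketch of that kind of throwaway extraction script; the input format, field names, and output shape here are all invented, since the comment doesn't specify them:

```python
# Hypothetical example of a small "pull data out of files" helper.
# Assumes a folder of CSV exports and produces one summary CSV; adjust to taste.
import csv
import sys
from pathlib import Path

def extract(folder: Path) -> list[dict]:
    rows = []
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                # Keep only the fields we care about (placeholder names).
                rows.append({
                    "source_file": path.name,
                    "id": record.get("id", ""),
                    "value": record.get("value", ""),
                })
    return rows

def write_summary(rows: list[dict], out_path: Path) -> None:
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["source_file", "id", "value"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    write_summary(extract(folder), folder / "summary.csv")
```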
I stopped reading at "AI skeptics ask, “If AI is so good, why don’t we see a lot of new startups?”" because what???
Many good points in the article, but I'd caveat that if you judge performance only by Elo score, you are not applying the best criteria.
SWE-bench performance moved from 25% to 70% solved since the beginning of 2025, and even with the narrowest possible lens, from 65% to 70% since May. ARC-AGI-2 keeps rapidly climbing. We have experimental models able to (maybe) hold their ground at IMO gold, as well as performing at IPhO gold level.
And that leaves out the point that LMArena is a popularity contest: not "was this correct", but "which answer did you like better". The thing that brought us glazing. A leveling-off Elo (or a very slowly climbing one) is kind of expected, and really says nothing about progress in the field.
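As a quick illustration of why a flat arena rating says so little: the Elo expected-score formula depends only on rating differences, so if every model improves at roughly the same pace, the leaderboard barely moves. A minimal sketch (standard Elo formula, nothing LMArena-specific):

```python
# Elo is purely relative: the expected win rate depends only on the rating
# *difference*, not on absolute capability.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 50-point gap predicts ~57% preference, whatever the absolute level.
print(expected_score(1300, 1250))   # ~0.57
print(expected_score(1500, 1450))   # ~0.57, same gap, same expectation
```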
Still doesn't mean "THE MACHINE GODS ARE COMING", but I would expect to see continued if slowing improvement. (I mean, how much better can you get at math if you already can be a useful assistant to Terence Tao and win IMO gold? And wouldn't we expect that progress to slow?)
But more than "how hard of a problem can you solve", I expect we'll see a shift toward looking at missing capabilities instead. E.g. memory - currently, you can tell Claude 500 times to use uv, not pip, and it will cheerfully not know that again the 501st time. That's much more important than "oh, it now solves slightly harder problems most of us don't have". And if you look at arXiv papers, a lot are peeking in that direction.
I'd also expect work on efficiency. "Can we not make it cost roughly India's annual budget to move the industry forward every year" is kind of a nice idea. No, I'm not making the number up, or taking OAI's fantasy numbers - 2025 AI industry capex is expected to be $375B. We'll likely need that efficiency if we want to get significantly better at difficulty level or task length, too.
nit: space bar doesn't scroll
this list isn't learnings.