Ask HN: Is the rate of progress in AI exponential?

I've frequently seen the claim that progress in machine learning and artificial intelligence, particularly with respect to large language models, has been exponential over the recent past. This arises often in discussions around the existential risk posed by artificial general or super intelligence, but even those more positively disposed to such technology seem to believe in rapid advancements.

My problem is that I just don't see it? I see progress, yes, but not exponentially accelerating progress. Focusing on large language models, it took several years to go from the invention of the transformer architecture to a broadly useful autoregressive text completer. Between March of 2022 and March of 2023, OpenAI went from GPT-3.5 to GPT-4: a clear improvement, but not a transformationally massive one. GPT-4 remains the most capable model widely available and that shows no signs of changing -- it's not clear at this point if GPT-5 is even a glimmer in Ilya Sutskever's eye.

Further progress towards more capable large language models requires either ever-increasing amounts of data and compute for marginal gains, or (IMO) the development of new approaches that better leverage available data. But the research and development I'm seeing is all about:

1. Better ways to prompt LLMs (chain of thought, tree of thoughts; a rough sketch follows this list)

2. New LLMs that use less compute than GPT-4 (but are almost uniformly worse, e.g. Bard)

3. Better ways of running smaller (i.e. worse) models on commodity hardware

4. New software tools that employ LLMs (document search, web search, etc)
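
To make (1) concrete, here's roughly what chain-of-thought prompting amounts to in code. This is a sketch under my own assumptions: the model name, the legacy OpenAI Python client, and the prompt wording are purely illustrative, not a claim about how any particular product works.

    # Minimal chain-of-thought sketch (assumes the legacy OpenAI Python client,
    # ~v0.27, and an OPENAI_API_KEY set in the environment).
    import openai

    QUESTION = ("A jug holds 3 litres and another holds 5 litres. "
                "How do I measure exactly 4 litres?")

    def ask(prompt):
        # One chat completion call; the model name is an assumption.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Plain prompt: just ask the question.
    baseline = ask(QUESTION)

    # Chain-of-thought prompt: same model, same question, but ask for
    # step-by-step reasoning before the final answer. That's the whole trick.
    cot = ask(QUESTION + "\n\nThink it through step by step, then give the final answer.")

    print(baseline)
    print(cot)

The point being: (1) is a change in how we ask, not a change in the underlying model.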

That stuff might be important, but does it count as exponential progress? I think it only seems exponential because ChatGPT seemed to come out of nowhere and now LLMs have sucked all the energy out of the room, but high interest and activity aren't perfect indicators of rapid progress; precisely how many versions of "use embeddings to search through documents" do we need? If progress were truly exponential, I would expect something noticeably better than GPT-4 on the horizon, if not already released. But I'm not even sure such a thing is at all imminent. Heck, OpenAI can't even keep up with the GPT-4 demand right now, let alone a more compute-intensive successor.

Maybe something like GPT-5 comes out tomorrow and makes me look like a fool, but I just don't see any indication that it will.

I'm not necessarily arguing that this means we shouldn't worry about the risks posed by AI. Disinformation is still an issue, and it's possible that jobs will be automated by GPT-4 (though I think such job losses will be marginal for the time being). I just don't see the rapid progress that I'm apparently supposed to be seeing, and I don't know if I'm just missing something obvious?

  • I went to a talk by Geoff Hinton around 2005, before he became a celebrity, and he was very much thinking about the problems of “blackboard architectures” and similar systems where the layers are treated separately, and proposing that we train all the levels together. By that measure, I'd say we got on the S-curve for “deep learning” about 20 years ago, and we might be closer to the end than we are to the beginning.

    Overall my take is that we are approaching an asymptote, and if anything the sign of the exponent might be negative: in a certain sense, we can put in more and more resources and get a diminishing effect. As many people see it, GPT-4 is already too expensive, and I think there is going to be no stomach for a model with 100x the parameters, 100x the running cost, and (I suspect) only a limited improvement in performance.

  • I believe that what might be seen as exponential progress is actually the accumulation of a few tremendous leaps. For instance, on the architecture side, we've witnessed leaps such as:

    Stacked neural networks (deep learning) / CNN / Transformer

    From the application perspective, there are also areas that have seen significant progress:

    AlphaGo (An application of reinforcement learning) / AlphaFold (An application for biological sequences) / ChatGPT (An application of Large Language Models / Reinforcement Learning from Human Feedback)

    You may pick different examples. However, what I want to say is that these great advancements have happened roughly once every two or three years, and it may take some time before we witness another major leap.

    In truth, the majority of research and development does not contribute to exponential progress. It's sad. However, this does not render such effort meaningless. Although it may seem as though there's too much competition within a narrow field, this may not necessarily be detrimental to humanity.

  • Well, really, what people in the field felt and saw was that what had happened in computer vision, with giant jumps in performance on tasks previously considered unlikely to ever reach human level (like identifying cats in photos), then came to pass for NLP: tasks and metrics we never thought we'd see anywhere close to human level without some new theories of thinking and language jumped up overnight. The metrics and applications of NLP before 2017 relied heavily on operating on words and tokens and adding in rules-based (GOFAI/expert system) modifications to do things like old-style chatbots. So many tasks now handled locally by tiny 355M models had dismal performance back then: repunctuation, faithful translation, what turned out to be basically "easy" stuff for a network to learn.

    And then add on RLHF and instruct tuning, which are bizarrely more capable than anybody reading the RL/T5 papers that led to them had largely expected at the time.

    It's the gap between researcher experiences and assumptions as of 2017 that leads to this description, imo. If you had projected performance in NLP from 2012 to 2017, you'd have about 3 major advancements, around word embeddings, and there wasn't much room for innovation there (as it turned out; although obviously semantic search and the like followed to an extent, it kind of already existed as "IR"). But from 2017 to 2023 we've had 30 of similar calibre, and they've stacked (GPT-2, then GPT-3, then instruct/RLHF, basically).

    The real point, as I see it, is that if we have the same kind of jump again, outperforming expectations by an order of magnitude, we'd be in trouble. Now maybe the expectations have adjusted, but I can tell you that even as of a year ago, many NLP labs still disliked and ignored GPT models in favour of projects that are now clearly obsolete or of mostly academic interest.

  • This is impossible to know, because our advancements in this field are basically at the PhD-of-the-PhD level. Every advancement arrives on an unknown timeline with unknown impact.

  • I'm not expert enough to chime in but the people that I respect on this subject agree with you. I would also be surprised if GPT-5 was miles ahead of 4. The airplane of today is extraordinarily similar to one from 50 years ago.

    I think the biggest improvement we'll see in LLMs is us normal people finally getting access to what already exists but which we currently can't use, and the slow leak of these technologies into our day-to-day lives.

  • It depends on what you define as progress. From a purely philosophical viewpoint, the answer is most certainly no. From a speed/operational/deployment perspective, probably yes. How you define and measure progress is the problem.

    Consider what happens when you use output from an LLM as training data for LLMs. If there are any "falsehoods" or "inaccuracies", then the very process of LLM training will introduce "bias" into its output. That will then compound exponentially. That is the disinformation issue you mention. I would submit that this is (and will be, and should be) the paramount area for research and hence progress. In that regard there doesn't seem to be much progress at all, except to deny or cover it up.
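
    Here's a toy way to picture that loop; the starting error rate and the compounding factor are invented purely for illustration, not measured from any real model:

        # Toy sketch of the "train on your own output" loop described above.
        # The 1% starting error rate and 1.3x compounding factor are assumptions.
        error_rate = 0.01      # fraction of false statements in generation 0's output
        amplification = 1.3    # assumed compounding per retraining round

        for generation in range(10):
            print(f"gen {generation}: ~{error_rate:.1%} of output is wrong")
            error_rate = min(1.0, error_rate * amplification)

    Any compounding factor above 1 and the falsehoods grow exponentially, even though each individual round only makes things slightly worse.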

    It's just like the self-perpetuating myths on Wikipedia, where entries are corrected and then those corrections are repeatedly reverted, because people read the old (bad) information in articles that quoted Wikipedia in the first place.

    I guess we could have an army of moderators (probably several hundred thousand) correcting the data or "preselecting" only truthful, real data for training. That means, for the most part, it's not Artificial Intelligence, it's human.

    oh wait we have that :D

  • The progress is exponential, but the noticeable impact scales only linearly with exponential improvement. Case in point: LLMs only get linearly better on benchmarks with an exponential increase in parameter count.
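
    As a back-of-the-envelope illustration (the coefficients are invented, not fitted to any real benchmark): if a benchmark score behaves like a + b * log10(parameters), then every constant bump in score costs another 10x in parameters.

        # Toy numbers only: assumed log-linear scaling of benchmark score
        # with parameter count.
        import math

        a, b = -20.0, 10.0   # assumed intercept and slope

        for params in (1e9, 1e10, 1e11, 1e12):
            score = a + b * math.log10(params)
            print(f"{params:.0e} params -> score {score:.0f}/100")

    Exponential growth on the input side shows up as a linear-looking climb on the leaderboard.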