Ask HN: How does an LLM perform addition?
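It doesn’t. It comes up with a probability distribution over possible next tokens, almost in a Bayesian sense, and the highest-probability one is chosen (or one is sampled, depending on the decoding settings). This is why it’s not great at arithmetic: it’s restricted to picking from discrete outputs rather than actually computing the sum.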
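To make the "highest probability option" idea concrete, here is a toy sketch in Python. The scores in `toy_logits` are made up for illustration and do not come from any real model, and `softmax` and `greedy_pick` are hypothetical helper names. It only shows the selection step, not how a model arrives at the scores.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy_pick(logits):
    """Return the single most probable candidate token."""
    probs = softmax(logits)
    return max(probs, key=probs.get)

# Invented scores for the token following "17 + 25 = " (assumption, not real model output).
toy_logits = {"42": 3.0, "41": 1.5, "43": 1.2, "52": 0.4}

answer = greedy_pick(toy_logits)
print(answer)
```

Note that nothing here adds 17 and 25. The "answer" is just whichever candidate scored highest, so a near miss like "41" still gets some probability and could be emitted if the scores were slightly different or if sampling were used instead of a greedy pick.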
Or subtraction?