Ask HN: What is the future of Mass Metadata Collection for training LLMs?

The current crop of LLMs has been trained on three decades of the public internet and not on private comms (AFAIK). This makes them weak in real-life applications that don't involve computer-assisted tasks.

What measures do you foresee "AI" corps taking to increase the amount of data they have available for training?
