How is the Herculean task of human supervision in LLM pre-training conducted?

Human supervision in LLM pre-training is a Herculean task.

For example, the LLM produces an essay from a prompt, and a human then has to read it and judge how closely it matches the output the prompt was expected to produce.

For an LLM with 200 billion parameters, the process above would (presumably) need to be repeated quadrillions of times.
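As a rough sanity check on the scale, here is a back-of-the-envelope sketch of how many training tokens a 200-billion-parameter model might see. The ~20 tokens-per-parameter ratio is the Chinchilla compute-optimal heuristic, an assumption on my part rather than anything stated above:

```python
# Back-of-the-envelope estimate of pre-training scale.
# Assumption: the Chinchilla-style heuristic of ~20 training tokens
# per parameter (not stated in the post, just a common rule of thumb).
params = 200e9               # 200 billion parameters
tokens_per_param = 20        # assumed compute-optimal ratio
training_tokens = params * tokens_per_param

print(f"{training_tokens:.0e} tokens")  # on the order of 4e+12
```

Under that assumption the model sees on the order of trillions of tokens, each one a separate prediction step, which gives a feel for why per-example human review at that scale would be infeasible.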

How are they achieving such scale?

This post does not have any comments yet