Ask HN: Algorithm for searching text with similar meaning

For example, I have a long text that says, "I have to change the screw number 1234," and a short one that says, "screw number 1234 changed."

Both were inserted by different people, referring to the same thing.

I considered using an LLM (GPT-4), but my dataset is too large (millions of entries), so it would be expensive.

Is there a better, or at least good enough, alternative?

Thank you.

  • Try https://www.sbert.net/

    These models can be self-hosted and are cheap to run. They are much smaller than GPT-3 or GPT-4, but they are trained specifically for this purpose.
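
To make that concrete, here is a rough sketch of how it could look in Python. It assumes the `sentence-transformers` package is installed and uses the `all-MiniLM-L6-v2` model purely as an example choice; the helper names (`embed_with_sbert`, `cosine`, `most_similar`, `demo`) are made up for illustration. The idea is to embed every entry once, then compare entries by cosine similarity.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def embed_with_sbert(texts: List[str],
                     model_name: str = "all-MiniLM-L6-v2") -> List[List[float]]:
    """Embed texts with a sentence-transformers model.

    The model name is an assumption; any sentence-embedding model should work.
    """
    # Imported lazily so the similarity helpers below run without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec: Vector,
                 corpus_vecs: Sequence[Vector],
                 top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k closest corpus vectors."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def demo() -> None:
    """Compare the two example entries from the question (needs the model)."""
    entries = [
        "I have to change the screw number 1234",
        "screw number 1234 changed",
    ]
    vectors = embed_with_sbert(entries)
    print(most_similar(vectors[0], vectors[1:], top_k=1))
```

The brute-force scan in `most_similar` is only meant to show the idea. With millions of entries you would probably store the embeddings in an approximate nearest-neighbor index instead of comparing against every vector on each query.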