Ask HN: Tool to find text reuse, similar paragraphs, fuzzy/near dupes in folder?

Do you know of any tool I can use to search my own vault of notes and documents for copied paragraphs or near-identical phrases? Normal diffing or hashing won't work, since the documents have been slightly modified and each file has to be compared against all the others.
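
To make the question concrete, here is a minimal sketch of the kind of comparison I mean, using word shingles and Jaccard similarity. It is only an illustration, not how any of the tools below work: the function names, the shingle size `k=3`, the `0.5` threshold, and splitting paragraphs on blank lines are all assumptions chosen for the example.

```python
import itertools
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Paragraph shorter than k words: treat it as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Compare every paragraph against every paragraph in the other documents.

    docs maps a (hypothetical) file name to its text. Returns a list of
    (score, name1, paragraph_index1, name2, paragraph_index2), best first.
    """
    paragraphs = []
    for name, text in docs.items():
        # Assumption: paragraphs are separated by blank lines.
        for idx, para in enumerate(re.split(r"\n\s*\n", text)):
            sh = shingles(para, k)
            if sh:
                paragraphs.append((name, idx, sh))

    hits = []
    for (n1, i1, s1), (n2, i2, s2) in itertools.combinations(paragraphs, 2):
        if n1 == n2:
            continue  # only report reuse across different files
        score = jaccard(s1, s2)
        if score >= threshold:
            hits.append((score, n1, i1, n2, i2))
    return sorted(hits, reverse=True)
```

Calling `near_duplicates({"a.md": text_a, "b.md": text_b})` would list paragraph pairs that share most of their three-word sequences. The all-pairs loop is quadratic in the number of paragraphs, so it would only suit a small vault; as far as I can tell, the usual way to scale this idea is MinHash with locality-sensitive hashing.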

I found the following tools, which seem related but are not quite what I need. Maybe I'm missing a particular term of art?

https://github.com/YaleDHLab/intertext

Python app. It requires loading and tagging a corpus of text, and is used to compare different works visually.

https://github.com/e-orlov/neardup

Java CLI tool. It looks like a reupload to GitHub of an old project. I haven't tried it.
