How to Build a Search Engine
This is a textbook on information retrieval co-authored by the head of Yahoo Research and the authors of 'Foundations of Statistical NLP' (another great textbook).
Kind of like a more up-to-date edition of 'Managing Gigabytes', and it's just as good if not better.
Thanks! Some of it (http://nlp.stanford.edu/IR-book/html/htmledition/near-duplic...) looks useful for a project I'm on. And no, we're not trying to index the Internet or do a Google Desktop search.
Jurafsky and Martin, 'Speech and Language Processing'