Ask HN: Good service provider for running CPU-intensive algorithms

We have an algorithm that we need to run for a couple of months. It is a very CPU-intensive task. You might ask what kind of algorithm it is: assume that we want to index the whole internet. It's something like that.

Can anyone suggest a service provider that can give us the infrastructure to run this? Is cloud-based infrastructure a good option?

  • DeWitt Clinton mentioned some techniques for building something on the order of an effective web crawler (internet indexing), but I'm not sure what its limiting factors were (CPU, memory, number of threads/bots, asynchronous bots).

    Here's the link: http://www.google.com/buzz/dclinton/KuXDg9P8Q8z/Jesse-Stay-A...

    Check DeWitt's first comment:

    DeWitt Clinton - I'm not even sure how to respond, mostly because I can't even guess how you came up with the number 3, or how you can say anyone "owns" the web.

    There are currently hundreds, no, thousands, of companies, universities, and individuals who have indexed large portions of the web. Getting the basics isn't even that hard. No exaggeration, you could build a perfectly suitable crawl for a sgapi of your own in a few weeks.

    The hardest bit is probably the node mapping, but fortunately, that's open source:

    http://code.google.com/p/google-sgnodemapper/

    Even high quality crawlers are available under open source licences. Nutch, for example:

    http://lucene.apache.org/nutch/

    And I highly recommend this book on the fundamentals of web crawl and indexing:

    http://www.amazon.com/gp/product/0136072240/

    Storing the node graph can be done on any number of open source or commercial backends. If I wanted to do it myself, I'd probably host it all on EC2 at Amazon. The cost wouldn't even be prohibitive, depending on how deep and broad you want the crawl to go, and for sgapi stuff, you don't need to go that big to be useful.

    No one ever has indexed, or ever will index, the whole web.
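
To make the "how deep and broad" point concrete, here is a minimal sketch of a bounded breadth-first crawl. It is not what Nutch or the sgapi does: it walks a small in-memory link graph rather than fetching pages, and the names `crawl`, `max_depth`, `max_pages`, and `demo_graph` are made up for illustration. Seeds are assumed to be unique.

```python
from collections import deque


def crawl(link_graph, seeds, max_depth, max_pages):
    """Breadth-first walk over a link graph, bounded by depth and page count.

    link_graph maps a URL to the list of URLs it links to (a stand-in for
    fetching and parsing a page). Returns the node graph discovered so far
    as {url: [outlinks]}.
    """
    seen = set(seeds)
    frontier = deque((url, 0) for url in seeds)
    edges = {}
    while frontier and len(edges) < max_pages:
        url, depth = frontier.popleft()
        outlinks = link_graph.get(url, [])
        edges[url] = list(outlinks)
        if depth == max_depth:
            continue  # record the page, but do not follow its links
        for target in outlinks:
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return edges


# Hypothetical four-page "web" used only for the demo.
demo_graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": []}

if __name__ == "__main__":
    print(crawl(demo_graph, ["a"], max_depth=1, max_pages=10))
```

The two limits are what keep the cost in check: `max_depth` caps how far the crawl wanders from the seeds, and `max_pages` caps the total work regardless of how the graph fans out.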

  • BlueLock vCloud Express offers 4- and 8-CPU VMs in a self-service "cloud," much like EC2. The big VMs are expensive, but cheaper than buying 8x "high CPU" Amazon instances. Also, memory is not tied to CPU: you can have an 8-way VM with 256 MB of RAM, if you really want it.

    http://www.bluelock.com/bluelock-cloud-hosting/bluelock-vclo...
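
For a job that runs a couple of months, it may help to compare total cost rather than hourly rates. The sketch below is only arithmetic; the function name `run_cost` is made up, and the rates in the example are placeholders, not real BlueLock or Amazon prices. Plug in current quotes from each provider.

```python
def run_cost(hourly_rate, instances, days):
    """Total cost of keeping `instances` machines up for `days` days."""
    return hourly_rate * instances * days * 24


if __name__ == "__main__":
    # Placeholder rates for illustration only, not real provider prices.
    one_big_vm = run_cost(hourly_rate=1.00, instances=1, days=60)
    eight_small = run_cost(hourly_rate=0.25, instances=8, days=60)
    print(one_big_vm, eight_small)
```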