Ask HN: Is it feasible to have real-time price comparison?

Recently I've been thinking of building a project that would aggregate and sort prices of products from a specific industry. There are at least hundreds of thousands of these products. As a feature, I'd like to let the user see whether ordering multiple products from one site would be cheaper, shipping included, than ordering them separately, or to suggest whichever combination of orders ends up cheapest.
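
To make the shipping feature concrete, here is a minimal sketch of the "cheapest combination" idea. It assumes each store charges one flat shipping fee per order, which real stores may not do, and it brute-forces every assignment of items to stores, so it only suits small baskets. The function name `cheapest_plan`, the stores, and the prices are all made up for illustration.

```python
from itertools import product as cartesian


def cheapest_plan(prices, shipping, wanted):
    """Return (total_cost, {item: store}) for the cheapest way to buy `wanted`.

    prices:   {store: {item: price}}  (hypothetical scraped data)
    shipping: {store: flat fee charged once per order at that store}
    Returns None if some wanted item is not sold anywhere.
    """
    # For each wanted item, list the stores that carry it.
    options = [[s for s in prices if item in prices[s]] for item in wanted]
    if any(not stores for stores in options):
        return None

    best = None
    # Try every assignment of items to stores (up to stores ** items of them).
    for assignment in cartesian(*options):
        items_cost = sum(prices[s][i] for s, i in zip(assignment, wanted))
        # Shipping is paid once per distinct store used.
        ship_cost = sum(shipping[s] for s in set(assignment))
        total = items_cost + ship_cost
        if best is None or total < best[0]:
            best = (total, dict(zip(wanted, assignment)))
    return best


# Made-up example: B is cheaper for x, A is cheaper for y.
prices = {"A": {"x": 10, "y": 12}, "B": {"x": 8, "y": 13}}
shipping = {"A": 5, "B": 5}
print(cheapest_plan(prices, shipping, ["x", "y"]))
```

With these numbers, buying each item at its cheapest store costs 8 + 12 plus two shipping fees (30), while buying both from B costs 21 plus one fee (26), so the single-store order wins.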

However, at its core, every time a user submits a query for a product, I must route this request (read: scrape) to a set of websites that carry these products and sort the data however the user wants it. If the data is scraped from 10 websites, then 1 user = 10 requests from my end, 2 users = 20. Just thinking it through, I wonder at what point my traffic will be interpreted as an attack and then blocked, not to mention that at least one of these sites is Cloudflare-protected. Because of the nature of the products, a single user may need to find several products, so that means several more queries I must route.
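
The fan-out arithmetic above can be written down as a rough sketch. It assumes every query goes to every site and that nothing is cached or deduplicated; the function names and the example figures beyond "10 sites" are made up.

```python
def upstream_requests(users: int, queries_per_user: int, sites: int) -> int:
    """Total scrape requests sent out across all sites."""
    return users * queries_per_user * sites


def requests_per_site(users: int, queries_per_user: int) -> int:
    """Requests that each individual site sees coming from the service."""
    return users * queries_per_user


print(upstream_requests(1, 1, 10))    # 1 user, 1 query, 10 sites
print(upstream_requests(2, 1, 10))    # 2 users
print(upstream_requests(100, 5, 10))  # 100 users looking up 5 products each
print(requests_per_site(100, 5))      # what a single site would see
```

The per-site number is the one that matters for getting blocked, since each site only sees its own share of the traffic.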

The data cannot be prescraped because I would have to issue hundreds of thousands of search queries per site, and the prices may change afterwards because of a discount, a price hike, or a price reduction. Usually it will be discounts; the overall price may not change as frequently as, say, GPU prices these days.

  • There are two problems that I think would limit the value of this as a service.

    The first is that any consumer product offered somewhere with a "lowest price guarantee" or "price match guarantee" is likely to be a product unique to that retailer. Low-cost appliances and electronics are notorious for this. A corollary is that many product brands have exclusive distribution deals with retailers, for things like lightbulbs, yard equipment, and power tools, or really anything you can buy at a hardware store (compare stuff at Home Depot, Menards, and Ace, for example).

    The second is that retailers also pull the trick of selling "exclusive" products that are really just the same product under a different name. Furniture is a great example of this; there is even a startup out there (the name escapes me) that catalogs the same furniture sold online by multiple retailers under different names. Reverse image searches of the product can get you pretty far here.

    Just a random example: I needed kitchen appliances. I have a big incentive to lower my cost here, but also specific features I want out of each item. Refrigerators are the worst about this: if I wanted a particular combination of product series, finish, ice maker, door config (French door, freezer on the left or right, etc.), and form factor, it would essentially determine which store I would buy it from.

  • It's not feasible. Google or Bing could do this because they're explicitly allowed by most sites, but little people like us can't.

    > I wonder at what point my traffic will be interpreted as an attack and then blocked

    For most major e-commerce sites, that point would be "from the first request" (assuming it triggers their anti-scraping protections). Anti-scraping measures can include simple browser fingerprinting, sophisticated fingerprinting, rate limits, Cloudflare, and IP bans. For well-funded sites, it's usually going to include all of those.

    This is the reason Honey is a browser extension. Their users have to essentially work as a consensual botnet.

  • There is a proven way to make money from price differentials.

    It is not building a price comparison web service.

    It is buying low and selling high.

    The logical problem with a price comparison business is that it matches price sensitive buyers with lowest cost suppliers.

    Logically, paying for a price comparison service is at odds with lowest cost and lowest price because it raises the cost and/or price.

    Good luck.

  • I think something like GasBuddy would work, but it would have to be for the right products.

    It would have to be for products people buy often that have few or no substitutes.