Ask HN: How to run queries on 20B clicks log

I have a click log with 20 Billion clicks. I want to perform a group by, count, sum, average queries on this data set and make it available to all the customers. Each record includes Browser name, Country, Carrier, Campaign ID. Queries include a) Get the number of clicks from X to Y date, for each specific browser. What options do I have?

  • BigQuery would probably be the easiest, low maintenance approach. Normally I would leverage the Hadoop ecosystem for this type of ad-tech/analytics problem, but I am not 100% certain of your use case, budget, etc. Providing clients read-only SQL access would be much better than a REST API.

  • You mention about making it available to customers - are you looking for APIs ?

    https://github.com/o1lab/xmysql - one command to generate rest api for any mysql database in seconds. (shameless plug - something i've been workng on)

    This supports grouping, count and where as query params of api. I'll be happy to help - pls raise an issue if you get stuck. Thank you.

  • I'd go for Druid + Pivot (or BigQuery if you happen to be in GCloud already).

    http://druid.io/

    https://docs.imply.io/pivot/