A programmer’s guide to big data: tools to know

  • Here's another list, sans product hype:

    Unix command-line tools like awk and grep. Set operation commands are essential too (http://www.catonmat.net/blog/set-operations-in-unix-shell/).

    Ruby/Python/Perl for more complex massaging and wrangling

    Excel (yes, really) for quick stats and graphs. It's a great first step in understanding what you have.

    D3.js for visualization

    I've used R in the past, but I found I was trying to squeeze data into R models. D3 is pretty hard to grasp initially (I'm very much still learning) but I'm finding that it's helping me think through how to visualize the data I have, rather than just forcing it into one of a few standard charts.
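
    To make the set-operations point from the list above concrete, here is a minimal Python sketch of the same idea: treat each file's lines as a set and combine them. The function name `set_operations` and the sample lines are made up for illustration, and the shell commands in the comments are only rough equivalents that assume files `a` and `b` are already sorted and de-duplicated.

```python
def set_operations(lines_a, lines_b):
    """Compute common set operations over two collections of text lines.

    Illustrative helper (not from any library). Results are sorted so the
    output is deterministic, much like piping through `sort`.
    """
    a, b = set(lines_a), set(lines_b)
    return {
        "union": sorted(a | b),                 # roughly: sort -u a b
        "intersection": sorted(a & b),          # roughly: comm -12 a b
        "difference": sorted(a - b),            # roughly: comm -23 a b
        "symmetric_difference": sorted(a ^ b),  # lines in exactly one input
    }


# Hypothetical sample data standing in for the lines of two files.
users_monday = ["alice", "bob", "carol"]
users_tuesday = ["bob", "carol", "dave"]

result = set_operations(users_monday, users_tuesday)
for name, lines in result.items():
    print(name, lines)
```

    This holds everything in memory, so it only suits inputs that fit there; the shell tools stream sorted files and cope with much larger ones.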

  • I still don't understand big data.

    Is it machine learning + analytics?

  • I don't get it. Why not just learn Hive (it's essentially SQL), or use Python with Hadoop?
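
    For anyone wondering what "Python with Hadoop" can look like: one common route is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, with the framework sorting by key in between. The sketch below is a word count that simulates that flow locally. The names `mapper`, `reducer`, and `run_local` are illustrative only, and a real job would split the two halves into separate scripts that read `sys.stdin`.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum counts per word. Assumes records arrive sorted by key,
    which is what the shuffle/sort phase provides between map and reduce."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process (assumption:
    small input; this stands in for the cluster, it does not use Hadoop)."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical input lines.
for record in run_local(["big data big", "Data tools"]):
    print(record)
```

    The same logic in Hive would be a GROUP BY with COUNT, which is the appeal of the SQL route when the job is that simple.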