Digg Saying Yes to NoSQL; Going Steady with Cassandra
For someone who knows nothing about NoSQL and is decent with MySQL, can someone give a brief idiot's overview of how NoSQL works? If I add a record to, say, a table named "news", where is the data stored? If I need to search by news ID or description, what's the front-end API like, and what happens at the backend when the API is called?
I'm looking pretty seriously at MongoDB and I've heard that Cassandra is worth considering. I do a lot of data warehousing / statistical analytics which generally means some sort of star schema-based reporting with lots of crosstabs, dimensions, etc.
If anyone can relate their experience with either of these two platforms, would either be a good choice for live querying for these types of applications? I know you can use MapReduce to eventually get the data you need, but I need to support queries that respond in (well) less than a second, even for very large data sets.
I'd love to see an example of how people are redefining their schema with NoSQL databases (especially with document-based databases like CouchDB). A common example you hear is "Your blog document can contain an array of comment nodes". That's all great in theory, but it obviously won't scale.
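To make the "array of comment nodes" idea concrete, here is a minimal sketch using a plain Python dict as a stand-in for a document. The field names (`_id`, `title`, `comments`) and the `add_comment` helper are assumptions for illustration only; this is not CouchDB's API or anyone's actual schema.

```python
import json

# Hypothetical blog post document with its comments embedded as an array.
post = {
    "_id": "post-1",
    "title": "Hello",
    "body": "...",
    "comments": [],
}

def add_comment(doc, author, text):
    """Append a comment node to the document's embedded array."""
    doc["comments"].append({"author": author, "text": text})
    return doc

add_comment(post, "alice", "First!")
add_comment(post, "bob", "Nice post.")

# Saving means serializing the whole document, comments included.
serialized = json.dumps(post)
```

In this shape every new comment means rewriting the whole document, which is presumably the scaling worry for a post with thousands of comments.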
If Digg are using Cassandra as a big key-value store, then how do they look up comments for posts? If they're storing one entry per comment with an index on post ID, then it's not really key-based any more.
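One possible answer, sketched under assumptions: keep the lookup key-based by making the post ID the row key and storing each comment as a column in that row, so fetching a post's comments is a single key lookup. The nested dict below is only an in-memory stand-in for that layout; `comments_by_post`, `add_comment`, and `get_comments` are hypothetical names, and nothing here is Cassandra's real API or Digg's actual schema.

```python
from collections import defaultdict

# Hypothetical stand-in for a column family:
# row key (post ID) -> {column name (comment ID) -> value (comment text)}
comments_by_post = defaultdict(dict)

def add_comment(post_id, comment_id, text):
    """Write one comment as a column in the row for its post."""
    comments_by_post[post_id][comment_id] = text

def get_comments(post_id):
    """Fetch all comments for a post with one key lookup, ordered by comment ID."""
    row = comments_by_post.get(post_id, {})
    return [row[cid] for cid in sorted(row)]

add_comment("post-1", 2, "second")
add_comment("post-1", 1, "first")
add_comment("post-2", 1, "other post")

post_1_comments = get_comments("post-1")
```

The trade-off in this sketch is that the "index" is baked into how the data is written: looking up comments by author instead would need a second structure keyed by author.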
We hear a lot of case studies about performance increases, but I never seem to see any practical details on how the databases should be used correctly.
For those who may have missed this a couple of weeks ago, Twitter is also considering Cassandra: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-...
Nice to see Digg pouring effort into making Cassandra itself better for everyone.