MongoDB Days
Shrug. Oracle suffers from problems that you don't get on your z/OS running IMS on top of VSAM files (which, by the way, is hierarchical, not relational, and is what your big ole bank will be using).
Talking about the relational model and "sound mathematical underpinnings" is fine, but I've seen dozens.. hundreds?.. of production relational databases and they've all been monsters. Most have been partially or significantly denormalized for good/bad reasons. All have their own warts, usually significant.
If people end up normalizing their MongoDB as the author suggests - well, I'd expect that. It's pretty rare to get your DB model right the first time. Any thought that you can is probably tinged with madness. If you can have a good stab at it and iterate, you're ahead.
Plus, if you've ever worked with a TPS (or a TM doing 2PC), you'll know they're a nightmare. Every high-volume system I've worked on dispensed with them and used a reconcile/compensate mechanism instead.
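To sketch the reconcile/compensate idea (all names and the fake ledger here are made up for illustration, not taken from any real system): instead of holding distributed locks under a transaction manager, you run each step and, if one fails, undo the steps that already completed.

```python
# Minimal sketch of a compensate-on-failure flow: run steps in order,
# and on failure run the compensations for completed steps in reverse.

def run_with_compensation(steps):
    """steps: list of (action, compensate) pairs."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):  # undo in reverse order
            compensate()
        raise

log = []

def debit_a():
    log.append("debit A")

def credit_b():
    raise RuntimeError("credit B failed")  # simulate a mid-flight failure

def undo_debit_a():
    log.append("undo debit A")

def undo_credit_b():
    log.append("undo credit B")

try:
    run_with_compensation([(debit_a, undo_debit_a), (credit_b, undo_credit_b)])
except RuntimeError:
    pass

# log == ["debit A", "undo debit A"]: the debit was compensated;
# credit_b's compensation never runs because credit_b never completed.
```

The trade-off is that the system is briefly inconsistent between the failure and the compensation, which is exactly what high-volume systems accept in exchange for not coordinating a 2PC.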
MongoDB has gone down a replica/sharded model that focuses on Map/Reduce. How successful this is, well that's a different argument - but comparing it to an old-school big-iron mentality is a waste of time.
I'm most interested in the features that commercial DBs like Oracle have that free/open-source DBs like Postgres, MySQL, & NoSQL DBs don't. Are things like "a materialized view (1996!), a continuous query, the result cache" available in any free DBs nowadays?
There's more of this sort of criticism in the following old thread, "SQL Databases Don't Scale":
https://news.ycombinator.com/item?id=690656
where a few commenters say (somewhat unpleasant) things like:
- I find that this type of FUD comes about from people that aren't good at designing and implementing large databases, or can't afford the technology that can pull it off, so they slam the technology rather than accept that they, themselves, are the ones lacking. Most of them tend to come from the typical LAMP/SlashDot crowd that only have experience with the minor technologies.
- For me, thousands of transactions per second and 10s of terabytes of data on a single database is normal. It's unremarkable, it's everyday, it's what we do, we have done it for years. And I know of installations handling 10x that. It's only people who's only "experience" is websites that whinge about how RDBMS can't handle their tiny datasets.
- Mr. Wiggins article would be better titled something like "ACID databases have scalability problems, especially cheap ones startups use"
How true are these criticisms nowadays? Is open-source still far behind, or is it (as I think) more than good-enough for 98% of use-cases?
edit: Thanks for the responses, sounds like I'll be trying out Postgres for my upcoming personal project.
Last year I wrote a tool for a bank to suck in MongoDB data from 5 big nodes on physical hardware, into an Oracle database running on a virtual machine. The idea was to make it easier for others to write their reporting queries against an SQL database that they understand. It turns out with the right tweaks the Oracle database also performed a lot better, on a lot less hardware. It was one of the things that really improved my impression of the Oracle database product.
The article also reminds me of how a father and son went to a Microsoft presentation in 2000, where Microsoft showed their solution to the tricky problem of integrating multiple backend servers. Their solution was to have front-end tiers close to the client, with the client getting thinner. The son was very impressed. The father said 'that's what IBM did before the 70s!'
One of the comments in the original article said that developers use MongoDB because they're too lazy to use RDBMS. I don't see anything wrong with that - laziness is a virtue among developers!
Seriously though, we are using MongoDB with great success at StartHQ (https://starthq.com) having done a lot of work with relational databases before. It's a great fit for startups where the schema is constantly evolving & the amount of data stored can be quite small.
Also, by talking to the DB directly, without an ORM, we can keep things really simple. I dread to think of what the same code would look like if we were to use a relational database, either with or without an ORM.
Data locality is not about seek times on disk, it's about network transfer times between different nodes! Linear horizontal scalability often needs sharding of data, and with a document store like MongoDB, you can easily shard a complex denormalized document.
Now, if I have this complex document normalized into several tables, how am I going to shard those tables such that every join can still execute entirely on one leaf node? And what if I start reusing a small piece of data from one of those normalized tables? I might be forced to go between network nodes to get it.
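A toy sketch of the locality point (shard count, field names, and data are all invented; real MongoDB sharding uses ranged or hashed shard keys configured per collection): a denormalized document hashes to one shard as a unit, while normalized rows keyed differently can scatter across shards.

```python
import hashlib

NUM_SHARDS = 3

def shard_for(key):
    # Deterministic hash -> shard index (Python's built-in hash() is
    # seed-randomized per process, so use a stable digest instead)
    return hashlib.md5(str(key).encode()).digest()[0] % NUM_SHARDS

order = {
    "_id": "order-42",
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A1", "qty": 2},  # line items embedded in the document,
        {"sku": "B7", "qty": 1},  # so no cross-node join is ever needed
    ],
}

# The whole document, items and all, lives on exactly one shard:
home_shard = shard_for(order["_id"])

# Normalized line-item rows keyed by SKU could land on different shards,
# forcing a join to reach across the network:
item_shards = {item["sku"]: shard_for(item["sku"]) for item in order["items"]}
```

The query "fetch order-42 with its items" touches `home_shard` only in the document version; in the normalized version it may touch every shard in `item_shards`.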
Normalization is like premature optimization. I can take any program and modify it so that every piece executes optimally fast, but I'm likely to compromise clarity or add complexity in the refactoring, and in the end it probably got me no real-world performance boost that mattered. The 80/20 rule and all that.
Same thing with normalization: automatically making all my data fully normalized from the start is like a bad premature-optimization habit that relational databases force us into through A) sheer habit and school teachings, and B) lack of easy support for nested structured data.
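Concretely, here is the same (made-up) blog post in both shapes. The nested, document-style version is one read; the normalized, table-style version has to be reassembled with a join, sketched here in memory.

```python
# Nested, document-style: one lookup returns the whole post.
post_doc = {
    "_id": 1,
    "title": "Hello",
    "comments": [
        {"author": "bob", "text": "hi"},
        {"author": "eve", "text": "hey"},
    ],
}

# Normalized, table-style: two "tables" linked by a foreign key.
posts = {1: {"title": "Hello"}}
comments = [
    {"post_id": 1, "author": "bob", "text": "hi"},
    {"post_id": 1, "author": "eve", "text": "hey"},
]

def post_with_comments(post_id):
    # In-memory stand-in for: SELECT ... FROM posts JOIN comments ON ...
    row = posts[post_id]
    return {
        "title": row["title"],
        "comments": [{"author": c["author"], "text": c["text"]}
                     for c in comments if c["post_id"] == post_id],
    }

# Both shapes yield the same data; one of them needed a join to do it.
assert post_with_comments(1)["comments"] == post_doc["comments"]
```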
"We don’t work the way we do because tables are a limitation of the technology, we use the relational model because it has sound mathematical underpinnings, and the technology reflects that†. Where’s the rigour in MongoDB’s model?"
It seems all anti-NoSQL rants are the same whining and refusal to understand. "Sound mathematical base"? Really? So, if I can't describe something mathematically (I can, by the way), it's not worth it? A computer program is a mathematical description; there you have it.
The relational model breaks down for very common use cases nowadays. Yes, maybe you think it's fun to run a query across who knows how many tables to get the information you want, but if your website has non-trivial traffic then the solution is usually to add more cache.
That's (one of the reasons) why PostgreSQL has hstore. Beyond the fanboy insistence that you can do everything with relational DBs, Postgres has accepted the reality that you need a more flexible data structure.
Edit: yes, please continue showing your contempt while I have to code around the limitation of relational databases.
A better title might be:
"MongoDB solves problems MySQL didn't solve at the time."
Relational databases are a kernel of mathematical relational beauty buried under a million tons of ugly hacks you have to do to keep them performant.
He has a DBA mindset in a FullStack world. There are many projects now that only require one or two FullStack engineers. Not one DBA, one QA, two developers, one deployment/operations person, one requirements analyst, etc. In this new FullStack world MongoDB is a great fit: changes are easy (no "alter table add new_thing varchar(200)"); its interface is JavaScript, the same language we use in other places; it's super easy to get started with (definitely not the case with Oracle); oh, and if we need to scale later, it might scale better too.
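The "no alter table" point can be sketched in a few lines (a plain Python list stands in for a MongoDB collection here; the field names are invented): new documents simply gain a field, and application code supplies a default for old ones.

```python
# Schema-less evolution: v2 of the app adds an "avatar" field without
# any ALTER TABLE; old v1 documents stay valid as written.

users = []  # pretend this is a MongoDB collection

users.append({"name": "ada"})                     # written by v1 of the app
users.append({"name": "bob", "avatar": "b.png"})  # written by v2

def avatar_of(user):
    # Tolerate both document shapes with a default for missing fields
    return user.get("avatar", "default.png")

assert [avatar_of(u) for u in users] == ["default.png", "b.png"]
```

The cost, of course, is that the schema now lives implicitly in the application code instead of in the database.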
I'm not so sure about the criticism of applications being exposed as web services.
If you have a system set up where different parts run as different services, you don't want to have to co-ordinate and resync all the applications whenever their underlying data structures change.
Web services solve this by having an agreed contract for communication between systems: service A knows better ways of accessing its own data than service B ever could, so B goes through the contract rather than touching A's data directly.
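A tiny sketch of that contract idea (the field names and both storage layouts are invented for illustration): service A can reorganize its internal storage freely, as long as the response it publishes keeps the agreed shape.

```python
CONTRACT_FIELDS = {"id", "name"}  # the shape service B relies on

# Service A, internal storage v1: one flat record
def fetch_user_v1(uid):
    internal = {"id": uid, "name": "ada", "pw_hash": "x"}
    # Publish only the agreed fields, never the internals
    return {k: internal[k] for k in CONTRACT_FIELDS}

# Service A, internal storage v2: columns renamed, layout changed...
def fetch_user_v2(uid):
    profile = {"user_id": uid, "display_name": "ada"}
    # ...but the published response still honours the same contract
    return {"id": profile["user_id"], "name": profile["display_name"]}

# Service B can't tell the difference between the two versions:
assert fetch_user_v1(7) == fetch_user_v2(7)
```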
Perhaps better put: MongoDB solves problems the OP doesn't understand, because he sees them purely in terms of relational SQL databases.
Use the best tool for the job - it's that simple.
An RDBMS is NOT a silver bullet, and neither is MongoDB.