Comparing Fauna and DynamoDB: Architecture and Pricing
I'm sure Fauna is a great database and probably cheaper in many cases. I just have some issues with the "Complex Example". It doesn't feel realistic that anyone familiar with DynamoDB would create such a schema. It comes across as if a schema that suits Fauna is being forced onto DynamoDB, without any evaluation of what the recommended "DynamoDB way" of solving the customer's needs would be.
> We have an accounts table with 20 secondary indexes defined for all the possible sort fields (DynamoDB’s maximum—Fauna has no limit).
Needing 20 secondary indexes in DDB is extremely rare. It should arguably be considered an anti-pattern, acceptable only for an application transitioning between query patterns in some way. If this is the norm for an application, I'd argue the product managers/developers don't understand their customers' needs well enough. At that stage in the product's life, a basic Postgres installation is likely a better choice.
Additionally, if the query patterns really need to be "super flexible" for the long term, you'll eventually find you need more and more of Elasticsearch (or similar search tech). A very common pattern is a DDB Streams-to-Elasticsearch connector, obviously sacrificing read-after-write consistency, as sketched below.
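Something like this, i.e. a Lambda handler attached to the table's stream (the endpoint, index name, and key attribute are all hypothetical, and I'm assuming the official Python Elasticsearch client):

```python
from elasticsearch import Elasticsearch

# Hypothetical search endpoint and index name; adjust to your setup.
es = Elasticsearch("https://search.example.com:9200")

def handler(event, context):
    """Lambda handler on a DynamoDB Stream: mirrors every item
    change into Elasticsearch for flexible querying."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = record["dynamodb"]["NewImage"]
            # "pk" is an assumed partition-key attribute name.
            es.index(index="accounts", id=image["pk"]["S"], document=image)
        elif record["eventName"] == "REMOVE":
            keys = record["dynamodb"]["Keys"]
            es.delete(index="accounts", id=keys["pk"]["S"])
```

The tradeoff is exactly the one above: the stream is asynchronous, so searches can briefly return stale results.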
> Viewing just the default account screen queries 7 indexes and 25 documents. A typical activity update transactionally updates 3 documents at a time with 10 dependency checks and modifies all 35 indexes.
This is such a red flag. If your application requires this from DDB, you should change your schema (probably more denormalization). However, the example doesn't provide enough information for me to suggest a better schema that would meet the customer's needs.
Disclaimer: I work at Amazon, but not in AWS. My opinions are my own.
IMO, it's hard to put a price on strong isolation and consistency. Being able to write an app that uses atomic transactions, that is isolated from concurrently running transactions, and that always sees the correct data translates to reduced programmer time and effort, and improves the user experience. Many programmers discount those important features when they start out, but they'd be better served by including them in price comparisons of the different products out there.
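To make that concrete, here's roughly what an atomic, isolated multi-item write looks like on the DynamoDB side with transact_write_items (just a sketch; table, key, and attribute names are made up):

```python
import boto3

client = boto3.client("dynamodb")

# Either both operations commit or neither does; the ConditionCheck
# guards the write against concurrently running transactions.
client.transact_write_items(
    TransactItems=[
        {
            "ConditionCheck": {
                "TableName": "accounts",  # hypothetical table
                "Key": {"pk": {"S": "ACCOUNT#123"}},
                "ConditionExpression": "#s = :active",
                "ExpressionAttributeNames": {"#s": "status"},
                "ExpressionAttributeValues": {":active": {"S": "active"}},
            }
        },
        {
            "Update": {
                "TableName": "activities",  # hypothetical table
                "Key": {"pk": {"S": "ACTIVITY#456"}},
                "UpdateExpression": "SET assignee = :a",
                "ExpressionAttributeValues": {":a": {"S": "alice"}},
            }
        },
    ]
)
```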
In their simple example of a website hit-counter, can someone explain how you would aggregate batches of 50 requests to amortize compute costs? I thought the whole point of the DB was to store information between disparate requests?
Dear HN
I've been commenting on this thread and I'd like to add a disclaimer. While I'm not a Fauna employee, I've been paid by Fauna to write articles that have been published in their blog. My opinions are my own though.
That said, I've been using and studying Fauna for almost a year now, so if you have any questions, let me know!
> "Read operations assume a data size of 4K or less; each additional 4K costs an additional operation. Write operations assume a data size of 1K or less. Notably, index writes count as entirely separate write operations; they are not included in the document’s 1K."
So many customers don't account for this, and it ends up costing $$$ if your data model isn't a good fit. Cosmos takes it even further with 1 KB units (I have spent hours on Cosmos pricing and am still baffled about how to price a workload). Although... it does incentivize decent data-modeling practices, which often lead to more performant apps.
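Back-of-the-envelope, this is how I read the quoted rules (my own sketch, not official pricing):

```python
import math

def read_ops(doc_bytes: int) -> int:
    # Reads are metered in 4 KB units per the quoted rules.
    return math.ceil(doc_bytes / 4096)

def write_ops(doc_bytes: int, indexes_touched: int) -> int:
    # Writes are metered in 1 KB units, and each index a write
    # touches is billed as a separate write operation.
    return math.ceil(doc_bytes / 1024) + indexes_touched

# E.g. updating a ~1 KB document behind the article's 35 indexes:
print(write_ops(1024, 35))  # -> 36; the indexes dominate the bill
```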
1) What is the latency within an AWS Region for a key lookup?
2) What is the latency of a global sync between all AWS regions for, say, a 100 KB update?
Amazingly, Fauna's pricing is even more confusing than DynamoDB's.
The data consistency seems like the most attractive point. I'm wondering where precisely Fauna clusters are located, so I could run my Lambda functions in the same location. What sort of latency do we see when connecting from various Azure/AWS datacenters? Are they in most AWS datacenters?
> Finally, let’s imagine we have something more like a typical SaaS application, for example, a CRM. We have an accounts table with 20 secondary indexes defined for all the possible sort fields
What makes you think you've imagined a good CRM schema here?
One problem with this article is that it doesn't have any code. You'd think it would, right? You're selling this thing to developers and architects. Why aren't we linking to a supplementary repo with the examples used in the article, for both DynamoDB and Fauna?
One possibility is that the example DynamoDB design is a very bad one (I mean, you're actually using all 20 GSIs; what?), and that anyone familiar with DynamoDB would say "Actually, you can cover all the query patterns with 3 GSIs if you do it this way."
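For instance, the usual trick is overloading a generic GSI key so one index serves many access patterns. A made-up sketch (entity and attribute names are hypothetical, just to show the shape of it):

```python
# Two different entity types share the table and the same GSI
# (GSI1PK/GSI1SK), so one index answers unrelated query patterns.
account = {
    "PK": "ACCOUNT#123",
    "SK": "METADATA",
    "GSI1PK": "OWNER#alice",         # pattern: accounts by owner...
    "GSI1SK": "CREATED#2021-03-01",  # ...sorted by creation date
}
activity = {
    "PK": "ACCOUNT#123",
    "SK": "ACTIVITY#2021-03-02",
    "GSI1PK": "STATUS#open",         # pattern: activities by status...
    "GSI1SK": "DUE#2021-03-10",      # ...sorted by due date
}
```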
Why do this? One possibility is that the ways Fauna actually is better than DynamoDB are too subtle to get anyone's attention. They're real and useful, but not dramatic. The people who actually use DynamoDB at massive scale might understand them, but they probably won't want to switch either.
So you go after people who aren't using DynamoDB at massive scale: say, early-stage startup founders who want to be on DynamoDB from day 1 because someday their product will be Web Scale, but who don't have a lot of time to carefully evaluate claims like this. They just say "10x cost reduction? Wow, Fauna is the new best DB!" Most of these guys fail, but a few of them are a runaway success (and would have been equally successful had they used DynamoDB), are now stuck with Fauna whether they like it or not (but let's assume they like it at least as well as DynamoDB, maybe even slightly more), and are now listed as large-scale users of Fauna on their website. You too could be a unicorn startup! Start using Fauna today!
Basically, I think the makers of Fauna are trying to con you with this article. It's not that their product is bad, it's that they're trying to get you to buy it for reasons other than that it's good.
One of the really cool DynamoDB features I love (at least in theory) is CDC / Streams. Also, the fact that it automagically hooks up to Kinesis is neat. Unfortunately, for personal projects, this can lead to runaway spending.
Does Fauna have a strongly ordered CDC stream?
For simple use cases [1], isn't replicated Redis much better in terms of cost?
With in-memory DBs there is no per-request dollar cost for reads and writes, and the IOPS will be way better than DynamoDB's.
AWS offers Redis as its ElastiCache service.
[1] No indexes, strongly consistent get & put, < 10 GB
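To make [1] concrete, the article's hit-counter on Redis would be something like this (the host is made up; I'm assuming the Python redis client):

```python
import redis

r = redis.Redis(host="my-cache.example.com", port=6379)

def record_hit(page: str) -> int:
    # INCR is atomic, and reads/writes against the primary are
    # strongly consistent; you pay for the instance, not per request.
    return r.incr(f"hits:{page}")

def get_hits(page: str) -> int:
    return int(r.get(f"hits:{page}") or 0)
```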