Updating a 50 terabyte PostgreSQL database (2018)

  • They're halting database traffic for 15-30 minutes. I'm confused about how this doesn't count as downtime. The application can queue transactions, but how does it handle reads that depend on those queued transactions, and what happens if two queued transactions conflict?

  • Based on their numbers, they should be archiving their historical data in Parquet format, partitioned by YYYYMMDD, into something like ClickHouse. That way they can run a lean Postgres instance (or instances) at all times and still get the benefits of real-time reporting. For their use case, they could retain up to 30 days of data in Postgres and offload the rest to ClickHouse.
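    A minimal sketch of that archival step, assuming a hypothetical "payments" table with a "created_at" timestamp column and an "archive/" output path: rows older than 30 days are dumped to Parquet files partitioned by YYYYMMDD, which ClickHouse (or any Parquet-capable engine) can then read.

      import os
      import psycopg2
      import pyarrow as pa
      import pyarrow.parquet as pq

      RETENTION_DAYS = 30
      conn = psycopg2.connect("dbname=app")  # hypothetical DSN

      with conn.cursor() as cur:
          # List the distinct days that have fallen out of the retention window.
          cur.execute(
              "SELECT DISTINCT created_at::date FROM payments "
              "WHERE created_at < now() - %s * interval '1 day' ORDER BY 1",
              (RETENTION_DAYS,),
          )
          days = [row[0] for row in cur.fetchall()]

      for day in days:
          with conn.cursor() as cur:
              cur.execute(
                  "SELECT id, account_id, amount, created_at FROM payments "
                  "WHERE created_at::date = %s",
                  (day,),
              )
              columns = [d[0] for d in cur.description]
              rows = cur.fetchall()
          table = pa.Table.from_pylist([dict(zip(columns, r)) for r in rows])
          out_dir = f"archive/dt={day:%Y%m%d}"  # one partition directory per day
          os.makedirs(out_dir, exist_ok=True)
          pq.write_table(table, f"{out_dir}/payments.parquet")

    Once a day's file is verified, the corresponding detail rows can be deleted from Postgres, keeping the hot instance small.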

  • I would probably implement an "archive" system that stores static data that is mostly accessed via "sum" functions, in order to keep the main database relatively small. If the data is immutable/static, replication and caching become much easier.
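    A minimal sketch of that idea, assuming hypothetical "payments" and "payment_daily_totals" tables: immutable rows older than a cutoff are rolled up into a small daily-totals table, so "sum"-style queries never have to touch the large historical detail rows.

      import psycopg2

      ROLLUP_SQL = """
          INSERT INTO payment_daily_totals (day, currency, total_amount, tx_count)
          SELECT created_at::date, currency, sum(amount), count(*)
          FROM payments
          WHERE created_at < %s
          GROUP BY created_at::date, currency
          ON CONFLICT (day, currency) DO NOTHING  -- archived days never change
      """

      def roll_up_history(conn, cutoff):
          # One transaction: aggregate first, then the detail rows can be
          # archived or dropped in a follow-up step.
          with conn, conn.cursor() as cur:
              cur.execute(ROLLUP_SQL, (cutoff,))

    Because the rolled-up data never changes, the totals table can be replicated and cached aggressively.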

  • Vertical scaling like this runs into quite a few physical limits, e.g. maximum core count (is it 64?), and I would think they may be fast approaching them. For a scalable enterprise solution, would it not be better, and also more cost-efficient, to scale horizontally and shard the tables over many nodes?
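    A toy sketch of the sharding idea, not the article's setup: route each row to one of N Postgres nodes by hashing a shard key. The DSNs and the "account_id" key are invented for illustration; a real deployment would more likely use Citus or application-level routing with consistent hashing and rebalancing.

      import hashlib
      import psycopg2

      SHARD_DSNS = [
          "host=pg-shard-0 dbname=payments",
          "host=pg-shard-1 dbname=payments",
          "host=pg-shard-2 dbname=payments",
          "host=pg-shard-3 dbname=payments",
      ]

      def shard_for(account_id: str) -> str:
          # Stable hash so a given account always maps to the same node.
          digest = hashlib.sha1(account_id.encode()).digest()
          return SHARD_DSNS[int.from_bytes(digest[:4], "big") % len(SHARD_DSNS)]

      def insert_payment(account_id: str, amount_cents: int) -> None:
          with psycopg2.connect(shard_for(account_id)) as conn, conn.cursor() as cur:
              cur.execute(
                  "INSERT INTO payments (account_id, amount_cents) VALUES (%s, %s)",
                  (account_id, amount_cents),
              )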

  • Am I the only one who is shocked to discover that payments (even very small sums) are stored for years on end, maybe indefinitely?

    Is simply deleting rows that are more than 366 days old not an option for keeping the PostgreSQL DB relatively small?
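    One possible answer, sketched under the assumption that the payments table is range-partitioned by month (child tables named e.g. payments_201706, all names hypothetical): keeping ~366 days of data then becomes a cheap DROP TABLE of whole partitions rather than a huge, bloat-inducing DELETE.

      import psycopg2
      from datetime import date, timedelta

      def drop_expired_partitions(conn, keep_days: int = 366) -> None:
          cutoff = date.today() - timedelta(days=keep_days)
          with conn, conn.cursor() as cur:
              # Find the child partitions of the (hypothetical) payments table.
              cur.execute(
                  "SELECT c.relname FROM pg_inherits i "
                  "JOIN pg_class c ON c.oid = i.inhrelid "
                  "JOIN pg_class p ON p.oid = i.inhparent "
                  "WHERE p.relname = 'payments'"
              )
              for (name,) in cur.fetchall():
                  year, month = int(name[-6:-2]), int(name[-2:])
                  # Last date covered by this monthly partition.
                  next_month = date(year, month, 1) + timedelta(days=32)
                  last_day = next_month.replace(day=1) - timedelta(days=1)
                  if last_day < cutoff:
                      cur.execute(f'DROP TABLE IF EXISTS "{name}"')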

  • (2018)

  • Terms like "master / slaves" carry racial overtones, so many prefer terms like "primary / replicas". Similarly, "allowlist / blocklist" rather than "whitelist / blacklist".

  • > As PostgreSQL options were not suitable for the next upgrade, in parallel we considered other possibilities. Our storage devices were able to make instant snapshots and also make them available on remote storage devices over the network, within a much smaller timeframe

    ZFS?