Cluster? F#*k One machine is all you need

  • Agree 100%.

    A 96 vCPU box with > 200 GB of RAM is ~$1.55/hour as an EC2 Spot instance. Back that with a big SSD and you have a data processing monster. Write some nice Go with well-structured goroutines to use all those cores, and a good many data processing tasks could be crushed on a single box for sure.

    Metaphorically: Every gear you add to a machine (distributed this and that) is a gear that needs to be cared for (configured, managed) and could break the overall machine.

    Simpler is better.
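
    As a rough illustration of the "goroutines across all cores" idea, here is a minimal sketch. The names processChunk and parallelSum are hypothetical, and the squared-sum workload is only a stand-in for real per-record processing; it assumes the work splits into independent chunks.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processChunk is a hypothetical stand-in for real per-record work:
// it returns the sum of squares of the values in the chunk.
func processChunk(chunk []int) int {
	sum := 0
	for _, v := range chunk {
		sum += v * v
	}
	return sum
}

// parallelSum splits data into at most `workers` contiguous chunks,
// processes each chunk in its own goroutine, and adds up the results.
func parallelSum(data []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	chunkSize := (len(data) + workers - 1) / workers
	if chunkSize == 0 {
		chunkSize = 1
	}
	// Each goroutine writes only to its own slot, so no mutex is needed.
	partial := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunkSize
		if start >= len(data) {
			break
		}
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(w int, chunk []int) {
			defer wg.Done()
			partial[w] = processChunk(chunk)
		}(w, data[start:end])
	}
	wg.Wait()

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i + 1
	}
	// One worker per logical CPU reported by the runtime.
	fmt.Println(parallelSum(data, runtime.NumCPU()))
}
```

    Giving each goroutine its own result slot keeps the workers from contending on shared state; whether that scales to 96 cores depends on the real workload being CPU-bound rather than I/O-bound.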

  • Scalability! But at what COST? (2015)

    from the wonderfully opinionated Frank McSherry et al.,

    where 'COST' stands for 'Configuration that Outperforms a Single Thread'.

    https://www.usenix.org/system/files/conference/hotos15/hotos...

    P.S. By pure coincidence, McSherry's new company is also on the front page today:

    https://news.ycombinator.com/item?id=22359769

  • Indeed.

    Don't use multiple processes if one will do.

    Don't use threads if multiple processes will do.

    Don't use some sort of multi-host framework if something dumb driven by ssh (etc.) will do.
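
    For a sense of what "something dumb driven by ssh" might look like, here is a minimal sketch that runs one command on several hosts in parallel and collects the output. It assumes an OpenSSH client on the PATH and key-based login already set up; sshArgs and runOnHosts are hypothetical names, and the hosts and command come from the command line.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

// sshArgs builds the argument list for one ssh invocation.
// BatchMode=yes makes ssh fail instead of prompting for a password;
// ConnectTimeout=5 gives up on unreachable hosts after 5 seconds.
func sshArgs(host, command string) []string {
	return []string{"-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, command}
}

// result holds the combined stdout/stderr and error for one host.
type result struct {
	host string
	out  string
	err  error
}

// runOnHosts runs command on every host concurrently via the ssh
// binary and returns the results in the same order as hosts.
func runOnHosts(hosts []string, command string) []result {
	results := make([]result, len(hosts))
	var wg sync.WaitGroup
	for i, h := range hosts {
		wg.Add(1)
		go func(i int, h string) {
			defer wg.Done()
			out, err := exec.Command("ssh", sshArgs(h, command)...).CombinedOutput()
			results[i] = result{host: h, out: string(out), err: err}
		}(i, h)
	}
	wg.Wait()
	return results
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: fanout <command> <host>...")
		return
	}
	for _, r := range runOnHosts(os.Args[2:], os.Args[1]) {
		if r.err != nil {
			fmt.Printf("%s: error: %v\n%s", r.host, r.err, r.out)
			continue
		}
		fmt.Printf("%s: %s", r.host, r.out)
	}
}
```

    There is no scheduling, retry, or state here, which is the point: if a host fails, the error is printed and a human reruns it.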