Cluster? F#*k One machine is all you need
Agree 100%.
A 96 vCPU box with more than 200 GB of RAM is ~$1.55/hour as an EC2 Spot instance. Back that with a big SSD and you have a data processing monster. Write some nice Go with well-structured goroutines to use all those cores, and a damn good many data processing tasks could be crushed on a single box for sure.
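For what it's worth, here is a minimal sketch of the "goroutines across all cores" idea: a fixed pool of workers pulling chunks off a channel. The names `processChunk` and `parallelSum` are made up for illustration, and the sum-of-squares work is just a placeholder for whatever your real per-record processing is.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processChunk is a hypothetical stand-in for real per-record work.
// Here it just sums the squares of the values in one chunk.
func processChunk(chunk []int) int {
	total := 0
	for _, v := range chunk {
		total += v * v
	}
	return total
}

// parallelSum splits data into chunks of chunkSize and processes them
// on a fixed pool of worker goroutines, then adds up the partial results.
func parallelSum(data []int, workers, chunkSize int) int {
	if workers < 1 {
		workers = 1
	}
	if chunkSize < 1 {
		chunkSize = 1
	}

	chunks := make(chan []int)
	partials := make(chan int)

	// Start the workers; each one drains the chunks channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range chunks {
				partials <- processChunk(c)
			}
		}()
	}

	// Feed chunks to the workers, then signal that no more are coming.
	go func() {
		for start := 0; start < len(data); start += chunkSize {
			end := start + chunkSize
			if end > len(data) {
				end = len(data)
			}
			chunks <- data[start:end]
		}
		close(chunks)
	}()

	// Close partials once every worker has finished.
	go func() {
		wg.Wait()
		close(partials)
	}()

	total := 0
	for p := range partials {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i + 1
	}
	// One worker per logical CPU is a reasonable starting point for CPU-bound work.
	fmt.Println(parallelSum(data, runtime.NumCPU(), 64))
}
```

Sizing the pool to `runtime.NumCPU()` is an assumption that the work is CPU-bound; for I/O-heavy jobs you would likely want more workers than cores.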
Metaphorically: every gear you add to a machine (distributed this and that) is a gear that needs to be cared for (configured, managed) and that could break the whole machine.
Simpler is better.
"Scalability! But at what COST?" (2015), from the wonderfully opinionated Frank McSherry et al., where 'COST' stands for 'Configuration that Outperforms a Single Thread':
https://www.usenix.org/system/files/conference/hotos15/hotos...
P.S. By pure coincidence, McSherry's new company is also on the front page today:
Indeed.
Don't use multiple processes if one will do.
Don't use threads if multiple processes will do.
Don't use some sort of multi-host framework if something dumb driven by ssh (etc.) will do.
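To make "something dumb driven by ssh" concrete, a rough sketch: run the same command on a list of hosts in parallel by shelling out to the `ssh` binary. Everything here is an assumption for illustration: the host names, the `fanOut`, `sshRun` and `dryRun` helpers, and that key-based login already works (`BatchMode=yes` makes ssh fail rather than prompt for a password). `main` uses the dry-run runner so nothing touches the network until you swap in `sshRun`.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// result holds what one host returned.
type result struct {
	Host   string
	Output string
	Err    error
}

// runner executes a command for one host. Keeping it as a function type
// lets you swap the real ssh call for a dry run or a test fake.
type runner func(host, command string) (string, error)

// sshRun shells out to the system ssh client (hypothetical helper).
func sshRun(host, command string) (string, error) {
	out, err := exec.Command("ssh",
		"-o", "BatchMode=yes",
		"-o", "ConnectTimeout=5",
		host, command).CombinedOutput()
	return string(out), err
}

// dryRun only reports what would be executed.
func dryRun(host, command string) (string, error) {
	return fmt.Sprintf("would run on %s: %s", host, command), nil
}

// fanOut runs command on every host concurrently and returns the
// results in the same order as the hosts slice.
func fanOut(hosts []string, command string, run runner) []result {
	results := make([]result, len(hosts))
	var wg sync.WaitGroup
	for i, h := range hosts {
		wg.Add(1)
		go func(i int, h string) {
			defer wg.Done()
			out, err := run(h, command)
			results[i] = result{Host: h, Output: out, Err: err}
		}(i, h)
	}
	wg.Wait()
	return results
}

func main() {
	// Hypothetical host names; replace with your own.
	hosts := []string{"worker1.example.com", "worker2.example.com"}
	// Swap dryRun for sshRun to actually execute the command remotely.
	for _, r := range fanOut(hosts, "uptime", dryRun) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.Host, r.Err)
			continue
		}
		fmt.Printf("%s: %s\n", r.Host, r.Output)
	}
}
```

No scheduler, no retries, no service discovery. If that turns out to be all you need, you just skipped a framework.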