Ask HN: Which configuration management software would/should you use in 2020?
What is your team using at work? What should be used at scale (FAANG, or similar)? What are you planning to switch to?
Not FAANG but for small to medium "cloud native" businesses I like to use this approach with minimal dependencies:
Managed Kubernetes cluster such as GKE for each environment, set up in the cloud provider UI since this is not done often. If you automate it with terraform, chances are that the next time you run it the cloud provider has subtly changed some options and your automation is out of date.
Cluster services repository with Helm charts for ingress controller, centralized logging and monitoring, etc. Use a values-${env}.yaml for environment differences. Deploy with CI service such as Jenkins.
Configuration repository for each application, with a Helm chart. If the app has one service, or all of its services live in a single repo, this can go in the same repo. If its services are spread across multiple repos, create a new repo. Use a values-${env}.yaml for environment differences. Deploy with a CI service such as Jenkins.
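The per-environment values pattern above might look like this as a CI step (a sketch; the release name, chart path, and values file names are illustrative, not a prescribed layout):

```shell
# Deploy one application chart into a given environment.
# The base values.yaml holds shared defaults; values-${ENV}.yaml overrides them.
ENV=prod
helm upgrade --install myapp ./chart \
  --namespace "myapp-${ENV}" \
  --create-namespace \
  -f ./chart/values.yaml \
  -f "./chart/values-${ENV}.yaml"
```

Because `upgrade --install` is idempotent, the same pipeline step works for both first deploys and updates.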
Store secrets in cloud secrets manager and interpolate to Kubernetes secrets at deploy time.
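One hedged sketch of that interpolation step, assuming GCP Secret Manager and a secret named `db-password` (all names here are illustrative):

```shell
# Fetch a secret from the cloud secrets manager at deploy time and
# materialize it as a Kubernetes secret, without ever committing it to git.
DB_PASS="$(gcloud secrets versions access latest --secret=db-password)"
kubectl create secret generic myapp-db \
  --from-literal=password="${DB_PASS}" \
  --dry-run=client -o yaml | kubectl apply -f -
```

The `--dry-run=client -o yaml | kubectl apply` idiom makes the step re-runnable: it updates the secret if it already exists instead of failing.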
Cloud provider keeps the cluster and VMs up-to-date, CI pipelines do the builds and deployments. No terraform/ansible/other required. Again, this only works for "cloud native" models.
I still prefer the Open Source edition of https://puppet.com/ to manage larger, diverse environments - which may include not just servers, but workstations, network appliances and so on. It's well established with lots of quite portable modules. But it can also be a bit on the slower side and comes with a steeper learning curve than some of the others.
https://www.ansible.com/ is surely a good solution for bootstrapping Linux cloud machines and can be quite flexible. I personally feel that its use of YAML manifests instead of a domain-specific language can make complex playbooks harder to read and maintain.
If all you do is deploy containers on a managed Kubernetes or a similar platform, you might get away with some solution for YAML templating (jsonnet et al.) and some shell glue.
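That "templating plus shell glue" approach can be as small as a one-liner (a sketch; the file name and the `env` top-level argument are assumptions):

```shell
# Render environment-specific manifests from jsonnet and apply them.
# deployment.jsonnet is assumed to take a top-level "env" argument.
jsonnet --tla-str env=staging deployment.jsonnet | kubectl apply -f -
```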
I am keeping an eye on https://github.com/purpleidea/mgmt, a newer contender with many interesting features, but it still lacks more complex examples.
Others like saltstack and chef still see some usage as far as I know, but I've got no personal experience with them.
I favor Ansible for 2 main reasons:
- If you have SSH access, you can use it. No matter what environment or company you work for, there's no agent to install and no need to get approval to use the tool. It's easy to build up a reproducible library of your shell habits that works locally or remotely, where each step is idempotent and gets skipped if there's a need to rerun things.
- If you get into an environment where performance across many machines matters more, you can switch to pull-based execution. Because of that, I see very little advantage in any of the other tools that outweighs the advantages of Ansible.
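The agentless, SSH-only workflow described above boils down to commands like these (hostnames and file names are illustrative):

```shell
# Ad-hoc: anything reachable over SSH is manageable, no agent needed.
# The trailing comma makes ansible treat the string as an inline inventory.
ansible -i 'web1.example.com,web2.example.com,' -m ping all

# Repeatable: run a playbook; tasks that are already converged report "ok"
# and do no work, so reruns after a partial failure are cheap.
ansible-playbook -i inventory.ini site.yml
```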
I'm curious why people use configuration management software in 2020. All of that seems like the old way of approaching problems to me.
What I prefer to do is use Terraform to create immutable infrastructure from code. CoreOS and most Linux variants can be configured at boot time (cloud-config, Ignition, etc) to start and run a certain workload. Ideally, all of your workloads would be containerised, so there's no need for configuration drift, or for any management software to be running on the box. If you need to update something, create the next version of your immutable machine and replace the existing ones.
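A minimal sketch of that boot-time configuration flow, assuming a Terraform root module that accepts a `user_data` variable and a `cloud-config.yaml` next to it (all names here are illustrative):

```shell
# Configure at boot instead of mutating later: pass cloud-config as
# user data, then roll forward by replacing instances, not patching them.
terraform plan -var "user_data=$(cat cloud-config.yaml)" -out=tfplan
terraform apply tfplan

# "Updating" a machine means changing the image or user data and
# re-applying; Terraform replaces the instance rather than
# reconfiguring it in place, so there is no drift to manage.
```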
Surprised more people here are not using Salt. Having used both Salt and Ansible, I much prefer Salt, especially when working with larger teams.
When working solo I use Guix, both Guix and Nix are _seriously_ amazing.
I use Ansible, mostly because it works pretty well for deployments (on traditional, non-dockerized applications), and then I can just gradually put more configuration under management.
So it's a very good tool to gradually get a legacy system under configuration management and thus source control.
My default tends to be Ansible because it is really versatile and lightweight on the systems being managed. That versatility can bite you though because it's easy to use it as a good solution and miss a great one. Also, heaven help you if you need to make a change on 1000s of hosts quickly.
I also use (in order of frequency): Terraform, Invoke (sometimes there is no substitute for a full programming language like Python), and Saltstack (1000s of machines in a heterogeneous environment).
If I were going to deploy a new app on k8s today, I would probably use something like https://github.com/fluxcd/flux.
I haven't really had a pleasant time with the tooling around serverless ecosystem yet once you get beyond hello worlds and canned code examples.
I might be a fanboy of type safety and a quick feedback loop, but I cannot imagine a better configuration management system than straight configuration as code, e.g. in Go: https://github.com/bwplotka/mimic
I really don't see why so many weird, unreadable languages like jsonnet or CUE were created when there is already a type-safe, script-like language (Go compiles in milliseconds, and there is even a `go run` command) with full-fledged IDE autocompletion support, abstractions and templating capabilities, mature dependency management, and much more. Please tell me why we are inventing a thousand weird things when we already have tools that help with configuration as well! (:
Hashicorp tools are quite solid, and give you a lot for free. Ansible can automate host-level changes in places where hashicorp cannot reach. There shouldn't be many such places.
Alternatively, if you have the option of choosing the whole stack, Nix/NixOS and their deployment tools.
I would recommend staying away from large systems like k8s.
Here's what we're using which I'm pretty happy with:
0. Self-hosted Gitlab and Gitlab CI.
1. Chef. I'd hardly mention it because its use is so minimal, but we have it set up for our base images, for the nitpicky stuff like connecting to LDAP/AD.
2. Terraform for setting up base resources (network, storage, allocating infrastructure VMs for Grafana).
3. Kubernetes. We use a bare minimum of manually maintained configuration files; basically only for the long-lived services hosted in cluster plus the resources they need (ie: databases + persistent volumes), ACL configuration.
4. Spinnaker for managing deployments into Kubernetes. It really simplifies a lot of the day-to-day headaches; we have it poll our Gitlab container repository and deploy automatically when new containers are available. Works tremendously well and is super responsive.
Nix (nixos, nixops) is worth looking into if you want a full solution and can dedicate the time and energy.
We use Ansible with Packer to create immutable OS images for VMs.
Or Dockerfile/compose for container images.
Cloud resources are managed by Terraform/Terragrunt.
Dhall: https://dhall-lang.org/
You can never go wrong with bash. You should not put secrets in the 169.254.169.254 metadata service, and you should not have IAM profiles with overreaching privileges. For any IAM profile you use - or whatever the equivalent is on Azure or GCP - always consider what somebody could do with it if they got access to it.
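The metadata-service concern is concrete. On AWS with IMDSv1, for example, any process on the box (or an SSRF bug) can pull the instance role's temporary credentials, which is exactly why overreaching IAM profiles are dangerous (a sketch; the paths are the standard IMDSv1 endpoints):

```shell
# List the instance's IAM role, then fetch its temporary credentials.
# Anything that can make a local HTTP request can do this under IMDSv1,
# so the role's privileges are effectively available to every process.
ROLE=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
curl -s "http://169.254.169.254/latest/meta-data/iam/security-credentials/${ROLE}"
```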
Salt because it's declarative and runs on linux, windows and osx.
I have been using Ansible for over four years now, my current use case has around 1k VMs and a handful of baremetal in a couple of different datacenters running 100s of services.
No orchestration either, FWIW; we usually have ansible configure Docker and pull the images...
As for the future I have been meaning to explore Terraform and some Orchestration platforms (Nomad).
I would go with Ansible for side projects/smaller tasks, and use Puppet at large.
Shameless plug for a thing I maintain, which is in the config management space but a little bit different from the usual tools: https://github.com/sipb/config-package-dev#config-package-de...
config-package-dev is a tool for building site-specific Debian packages that override the config files in other Debian packages. It's useful when you have machines that are easy to reimage / you have some image-based infrastructure, but you do want to do local development too, since it integrates with the dpkg database properly and prevents upgraded distro packages from clobbering your config.
My current team uses it - and started using it before I joined the company (I didn't know we were using it when I joined, and they didn't know I was applying, I discovered this after starting on another team and eventually moved to this team). I take that as a sign that it's objectively useful and I'm not biased :) We also use some amount of CFEngine, and we're generally shifting towards config-package-dev for sitewide configuration / things that apply to a group of machines (e.g. "all developer VMs") and CFEngine or Ansible for machine-specific configuration. Our infrastructure is large but not quite FAANG-scale, and includes a mix of bare metal, private cloud and self-run Kubernetes, and public cloud.
I've previously used it for
- configuring Kerberos, AFS, email, LDAP, etc. for a university, both for university-run computer labs where we owned the machines and could reimage them easily and for personal machines that we didn't want to sysadmin and only wanted to install some defaults
- building an Ubuntu-based appliance where we shipped all updates to customers as image-based updates (a la CrOS or Bottlerocket) but we'd tinker with in-place changes and upgrades on our test machines to keep the edit/deploy/test cycle fast
Ansible for dev boxes or smaller deployments. For large-scale deployments CFEngine3. When deployed within a cloud environment one doesn't even need a master node for CFE3 but the agents can just pull the latest config state from some object storage.
If you want massively parallel remote script execution, nothing beats GNU parallel or xargs + "ssh user@host bash < yourscript.sh".
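Spelled out, those two variants look like this (user name, host list, and script are illustrative; both `parallel` and `xargs -a` here are the GNU versions):

```shell
# GNU parallel: run the same script on every host in hosts.txt, 8 at a time.
parallel -j8 'ssh user@{} bash < yourscript.sh' :::: hosts.txt

# xargs equivalent: each child re-opens the script itself, so parallel
# workers don't compete over a single shared stdin.
xargs -a hosts.txt -P8 -I{} sh -c 'ssh "user@$1" bash < yourscript.sh' _ {}
```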
All of the configuration management tools (ansible, puppet, chef, salt, etc.) are bloated.
We already have a perfectly FINE SHELL. Why do we need a crappy, ugly DSL or weird YAML?
These days, newbies write ansible playbooks without even basic knowledge of the Unix shell and commands. What the hell?
I like the ssh + pure POSIX shell approach, like
Show HN: Posixcube, a shell script automation framework alternative to Ansible https://news.ycombinator.com/item?id=13378852
I typically use terraform and ansible. tf creates/manages the infrastructure and then ansible completes any configuration.
Funnily, I wrote my take on this not too long back:
http://madhadron.com/posts/choosing_your_base_stack.html
Don't be distracted by FAANG scale. It's not relevant to most software and is usually dictated by what they started using and then deployed lots of engineering time to make work.
My suggestion is to figure out how you will manage your database server and monitoring for it. If you can do that, almost everything else can fall into line as needed.
I've prototyped ansible for rolling out ssl certs to a handful of unfortunately rather heterogeneous Linux boxes - and it worked pretty well for that.
I still think there's too much setup to get started - but I'm somewhat convinced ansible does a better job than a bunch of bespoke shell would, partly because ansible comes with some "primitives"/concepts such as "make sure this version of this file is in this location on that server" - which is quick to get wrong across heterogeneous distributions.
We're moving towards managed kubernetes (for applications currently largely deployed with Docker and docker-compose on individual vms).
I do think the "make an appliance; run an appliance; replace the appliance" life cycle makes a lot of sense - I'm not sure if k8s does yet.
I think we could be quite happy on a docker swarm style setup - but apparently everything but k8s is being killed or at least left for dead by various upstream.
And k8s might be expensive to run in the cloud (a VM per pod?) - but it comes with abstractions we (everyone) need.
Trying to offload to SaaS that which makes sense as SaaS - primarily managed DB (we're trying out ElephantSQL) - and some file storage (PDF files hundreds of MB in size).
For bespoke servers we lean a bit on etckeeper, in order to at least keep track of changes. If we were to invest in something beyond k8s (it's such a big hammer that one becomes a bit reluctant to put it down once picked up...), I'd probably look at GNU Guix.
Fabric https://www.fabfile.org/ (just one step above shell scripts, using Python), sticking with 1.x as the 2.x line is still missing things. The key is having structure, almost like Ansible, where you kind of have "playbooks" and "roles" (I had this structure going before Ansible)... I'll probably have to move off this soon though.
I use OPS https://ops.city which uses the nanos unikernel https://github.com/nanovms/nanos and since I work on it would appreciate any suggestions/comments/etc. on how to make it better.
I'll tell you the one tool I DON'T use. Cloudformation. I've touched it a grand total of once and it burned me so hard I set a company policy to never use it again.
It's like terraform, except you can't review things for mistakes until it's already in the process of nuking something. Which is terrible when you're inheriting an environment.
I enjoy using mage (https://github.com/magefile/mage). I like having a full language at my disposal for configuring things rather than yaml or json or whatever else.
I operate a couple of Elixir apps and so far a simple Makefile with a couple of shell scripts has been enough. This simplicity is due to the fact that the only external dependency is a database server, everything else (language runtime, web server, caching, job scheduling, etc.) is baked in the Elixir release. One unfortunate annoyance though is that Elixir releases are not portable and can't be cross-compiled (e.g. building on latest Ubuntu and deploying to Debian stable won't work) so we have to build them in a container matching the target OS version. So to be really honest I should mention that Docker is also part of our deployment stack, although we don't run it on production hosts.
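The "build in a container matching the target OS" workaround described above can be a single step in such a Makefile; the image tag and commands here are assumptions for illustration, not the author's actual setup:

```shell
# Build the release inside a container whose OS matches production,
# since Elixir releases aren't portable across distro/libc versions.
docker run --rm -v "$PWD":/app -w /app elixir:1.13 \
  sh -c 'mix local.hex --force && mix local.rebar --force && \
         mix deps.get && MIX_ENV=prod mix release'
```

The resulting release tarball in `_build/prod` can then be copied to any host running the same OS version, with no Erlang/Elixir installed there.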
Easy and flexible, but not super fast (SSH): ansible. Still pretty easy but very fast (ZeroMQ): saltstack.
Terraform for everything 'outside' your runtime (VM, container), SaltStack for everything 'inside' (VMs and containers) and for appliances (where Terraform has no provider available) as well.
I think we've developed multiple layers in our infrastructure (Cloud Infra - AWS, GCP.., Paas - Kubernetes, ECS.., Service mesh - Istio, linkerd.., application containers..). So it depends on how many layers you have and how you want to manage a particular layer. Companies at `any` scale can get away with just using Google App Engine (Snap) or have 5+ layers in their infrastructure.
I find Jenkins X really interesting for my applications. It seems to solve a lot of issues related to CI/CD and automation in Kubernetes. However, it still lacks multi-cluster support.
I'm pretty happy using both Puppet and ansible. I use Puppet for configuring hosts and rolling out configuration changes (because immutable infrastructure isn't a thing you can just do; there's overhead and it does not fit all problems) and ansible for orchestrating actions such as upgrades. They work well together.
I very much dislike ansible's YAML-based language and would hate to use it for configuration management beyond tiny systems, but it's pretty decent as a replacement for clusterssh and custom scripts.
Ansible Ansible Ansible for me!
I’ve tried Puppet and SaltStack, and I constantly find they are harder and more complex than Ansible. I can get something going in Ansible in short order.
Ansible really is my hammer.
We use terraform to describe cloud infrastructure, check all k8s configmaps and secrets into source control (using sops to securely store secrets in git).
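A hedged sketch of that sops flow (file names and the key reference are placeholders):

```shell
# Encrypt the secrets manifest before committing it; only the encrypted
# file ever lands in git.
sops --encrypt --kms "arn:aws:kms:REGION:ACCOUNT:key/KEY-ID" \
  secrets.yaml > secrets.enc.yaml
git add secrets.enc.yaml

# In the deploy pipeline, decrypt and apply without writing plaintext to disk.
sops --decrypt secrets.enc.yaml | kubectl apply -f -
```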
I won't talk much about FAANG scale, because that is hyper specialized.
A small startup shouldn't use any configuration management (assuming configuration management means software like Puppet, Chef, Salt, and Ansible). That is because small startups shouldn't be running anything on VMs (or bare metal). There are so many fully managed solutions out there. There is no reason to be running on VM, SSHing to servers, etc. App Engine, Heroku, GKE, Cloud Run, whatever.
Once you get to the point where you need to run VMs (or bare metal), there are many options. A lot of systems are going to a more image + container based solution. Think something like Container-Optimized OS[1] or Bottlerocket[2], where most of the file system is read-only, it is updated by swapping images (no package updates), and everything runs in containers.
If you are actually interested in config management, I'll give my opinions, and a bit of history. I've used all four of the current major config management systems (Puppet, Chef, Salt, and Ansible).
Puppet was the first of the bunch; it had its issues, but it was better than previous config management systems. Twitter was one of the first big tech companies to use Puppet, and AFAIK they still do.
Chef was next; it was created by people who did Puppet consulting for a living. It follows a very similar model to Puppet, and solves most of the problems with Puppet, while introducing some problems of its own (mainly complexity in getting started). In my opinion Chef is a clear win over Puppet, and I don't think there is a good reason to pick Puppet anymore. One of the biggest advantages is that the config language is an actual programming language (Ruby). All the other systems started with a language that was missing things like loops, and they have slowly grafted on programming language features. It is so much nicer to use an actual programming language. Facebook is a huge Chef user.
Salt was next. It was created by someone who wanted to run commands on a bunch of servers, and it grew into a configuration management system. The underlying architecture of Salt is very nice: it is basically a bunch of nodes communicating over a message bus. Salt has different "renderers"[3], which are the languages you write the config in, including ones that use a real programming language (Python). I'll come back to Salt in a minute.
Ansible... it is very popular. This is going to sound harsh, but I'm just going to say it: I think it is popular with people who don't know how to use configuration management systems. You know how the Flask framework started as an April Fools' joke[4], where the author created something with what he thought were obviously bad ideas, but people liked some of them? Ansible is so obviously bad, at its core, that I actually went and read the first dozen Git commits to see if there were any signs that it was an April Fools' joke.
There was a time a few years ago when Ansible's website said things like "agentless", "masterless", "fast", "secure", "just YAML". They are all a joke.
Ansible isn't agentless. It has a network agent that you have to install and configure (SSH). Yes, to do it correctly you have to actually configure SSH, a user, keys, etc. It also has a runtime agent that you have to install (Python). You have to install Python, and all the Python dependencies your Ansible code needs. Then it has the actual code of the agent, which it copies to the machine each time it runs, which is stupidly inefficient. It is actually easier to install and configure the agents of all the other config management systems than it is to properly install, configure, and secure Ansible's agent(s).
Masterless isn't a good thing, and a proper Ansible setup wouldn't be masterless. The way Ansible is designed, developers run the Ansible code from their laptops. That means anyone making code changes needs to be able to SSH to every single server in production, with root permissions. It also risks them running code that hasn't been committed to Git or approved. Any reasonable Ansible setup will have a server from which it runs - Tower, a CI system, etc.
Fast. Ha! I benchmarked it against Salt: I wrote the same code in both, managing the exact same things, using local execution so Ansible wouldn't have an SSH disadvantage. Ansible was 9 times slower for a run with no changes (which is important, because 99.9% of runs have no or few changes). It is even slower in real life. Why is it so slow? Well, SSH is part of it. SSH is wonderful, but it isn't a high-performance RPC system. But an even bigger part of the slowness is the insane code execution. You'd think that when you use the `package` or `apt` modules to ensure a package is installed, it would internally call some `package.installed` function/method, and that the arguments you pass are passed to that function. That is what all the other configuration management systems do. But not Ansible. No, it execs a script, passing the arguments as args to the script. That means every time you want to ensure a package is still installed (it is, you just want to make sure), Ansible execs a whole new Python VM to run the "function". It is incredibly inefficient.
Secure. Having a network that allows anyone to SSH to any machine in production and get root isn't the first step I'd take in making servers secure.
It isn't just YAML. It is a programming language that happens to sort of look like YAML. It has its own loop and variable syntax, in YAML. Then it has Jinja templating on top of that. "Just YAML" isn't a feature. To do config management correctly you need actual programming language features, so use an actual programming language.
If I had to pick one again, I'd pick Salt. Specifically I'd use Salt with PyObjects[5] and PillarStack[6].
But I'll reiterate, you shouldn't start with a config management system. Start with something fully managed. Once you need a config management system, take the time to do it correctly. Like it should be a six week project, not a thing you do in an hour. Chef and Salt will take more time to get started, but if setup correctly they will be much better than any Ansible setup. If you don't have the time or knowledge to do Chef or Salt correctly, then you don't have the time or knowledge to manage VMs correctly, so don't.
[1] https://cloud.google.com/container-optimized-os
[2] https://aws.amazon.com/bottlerocket/
[3] https://docs.saltstack.com/en/latest/ref/renderers/
[4] https://en.wikipedia.org/wiki/Flask_(web_framework)#History
[5] https://docs.saltstack.com/en/latest/ref/renderers/all/salt....
[6] https://docs.saltstack.com/en/master/ref/pillar/all/salt.pil...
If you already know and/or use Ruby, use Chef.
It is silly to ask "what should be used at FAANG scale", because either you are working at a FAANG and you are using what they use, or you are very unlikely to ever be at that scale -- and somewhere along the journey to getting there, you will either find or write the system that you need.
For anyone here who isn't yet using an end-to-end setup like terraform, ansible, puppet, etc. and has more basic needs around managing environment variables and application properties, I highly recommend https://configrd.io.
I used to use Chef, but I really didn’t like it. For small projects now, I just use a set of shell scripts, where each installs and/or configures one thing. Pair it with a Phoenix server pattern. It has treated me very well the last two years
What about Nix?
puppet is pretty good in my experience
For most teams: Docker or Ansible all the things.
For teams that have a large IaaS footprint: Chef (agent-less actually adds complexity in this environment.)
Ansible where possible, Chef when I have to (for legacy reasons, usually), and Terraform/Docker/Packer when given the option.
I've been working with both Ansible and Puppet on a daily basis for the last 6 years. Ansible - which I absolutely love and adore: extremely easy and much more pleasant to read; sometimes Ansible feels like poetry to me - for: ad-hoc sysadministration (I don't mean the "ansible" command, but the actual style of work when you need to do something right here and right now), prototyping, dev and staging environment setup, and experiments.
Puppet for polished production. Puppet has a robust and stable ecosystem and infrastructure. It has been a client-server model from the beginning. It is easier to create and put into production a library of all your puppet modules, and it has hiera for central config values and secrets management. At the same time, I hate Puppet's resource relations, and Puppet's architecture feels like something developed in 1991 - an ugly, monolithic monster, and extremely heavy.
Terraform for actual low-level infrastructure management. And I don't like to put whole high-level host configuration into IaC! IaC should have minimal host configuration responsibilities: set the hostname, set the IP, register with Puppet or call ansible - only a few lines in user-data or a bash script on boot, which then calls the actual configuration management!
Gitlab-CI - switched from Jenkins. Concourse CI looks extremely interesting! Also reviewing some GitOps frameworks. Kubernetes: bare metal runs self-made, puppet-based pure k8s; also kops and EKS on AWS. Applications in k8s are managed via Helm.
Compared to Puppet, Ansible is less enterprisey; it's more like a hipster tool. I would like to replace Puppet with Ansible, but maybe I need help from all of YOU who have voted for Ansible. How do you achieve Puppet's level of management with Ansible? How do you achieve a client-server setup with ansible - somehow I don't see lots of people using ansible-pull (without using Tower!)? Do you create a cron job with ansible-pull on node boot? :D Or is your whole ansible usage limited to running ansible-playbook from your console manually? OK, maybe you sometimes put it in the last action of your CI/CD pipeline ;) What about node classification and review? Central config value management for everything?
I use HashiCorp Vault and lots of other things too. Some of the questions above are rhetorical; I've just expressed my mistrust of ansible, which doesn't feel complete. :(
How do you manage a fleet of 1000, 500, or even 200 hosts with ansible? After provisioning you need to review your fleet, count groups, list groups, check states. Ah, you want to suggest Consul for that role? :)
Kubernetes for the win. It will replace config management diversity. It gives you node discovery, state review, and much much much more.
We're using Terraform for infrastructure and Ansible for deployments with great success.
Shameless self-plug: ChangeGear. We’re cheapest in-class for medium-sized companies.
At G scale you could never afford to run something as grossly wasteful as chef. It would be cheaper to have several full-time engineers maintaining a dedicated on-host config service daemon and associated tools, than it would be for some ruby script to cron itself every 15 minutes.
Salt + Serverless.
I'm also really interested in what companies at scale are using. Anyone here from FAANG?
docker-compose + custom stuff + reduce all dependency on tooling
Kind of surprised there isn't really a consistent answer for this. Just skimming through these answers.