Configuration and Architecture Experiments
by Justin Michalicek on Sept. 12, 2017, 11:21 p.m. UTC

I've recently been playing with new tools for configuration management and deployment, both at work and with my own projects. These include Docker, Ansible, and an InfluxDB/Telegraf/Grafana stack. I have been using all of them for production in-house tools at work as well as for this blog.
I first got started with Docker at work, where it works great as a production environment. I can build an image which is easily deployable anywhere Docker is found. Configuration of images and containers is dead simple, rarely more complex than writing a bash script. After years of fighting with Chef and cookbook dependency hell¹ this is a relief. It's simple and straightforward: you install packages, configure things, etc. pretty much as if you were typing into a shell. It just works. I have also found Docker a lifesaver in dev environments, but that's for a different post.
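To illustrate what I mean by "like typing into a shell," here's a rough sketch of a Dockerfile for a Python app. The package names and paths are made up for the example; this is not this site's actual image:

```dockerfile
# Hypothetical image for a Python web app; names and paths are illustrative.
FROM debian:9

# Install system packages, just like you would type into a shell
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy the application in and install its dependencies
COPY . /app
WORKDIR /app
RUN pip3 install -r requirements.txt

# Run the app
CMD ["python3", "app.py"]
```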
Even when using Docker, there's still some level of configuration which must be done on the server itself. You need to install Docker, configure Docker, manage images, set up local users, and handle the things you do outside of Docker containers, such as a tmux instance running irssi. For that, I've come to love Ansible. You run it locally and it just uses an ssh connection to connect to your server and configure things. You can do everything you need: install apt packages, edit local configurations, upload files, etc. Ansible comes with a huge library of modules for configuring just about anything. If it doesn't have what you need, there's always Ansible Galaxy, which I have not used but fear may result in issues similar to Chef's dependency problems. So far I have not needed anything which did not come with Ansible, though.
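As a rough sketch of what that looks like (the hosts, package names, and file paths here are placeholders, not my real playbook), an Ansible play is just YAML describing the end state, using stock modules like apt and copy:

```yaml
# Illustrative playbook fragment; hosts, packages, and paths are examples.
- hosts: webservers
  become: true
  tasks:
    - name: Install packages from apt
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
      with_items:
        - docker-ce
        - tmux

    - name: Upload a local config file for the Docker daemon
      copy:
        src: files/daemon.json
        dest: /etc/docker/daemon.json
```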
For server monitoring, alerting, and pretty graphs I've been liking the stack from InfluxData. I found it when I started out researching a statsd/collectd/Graphite stack. Being a Python and Django developer, going that route seemed an obvious choice. What initially turned me away was that Graphite is still Python 2 only. That right there is a showstopper for me in just about every case. It's time to move on from Python 2, and big projects should have been putting effort into moving to Python 3 for a long time now. Then there was the StatsD GitHub repo, which didn't have any recent commits (I know, that could be a sign of stability, but in the fast-paced web and devops world, it's odd for a popular project), and I'm not a big fan of server side JS anyway.

So, searching for a replacement for Graphite, I found InfluxData's stack. I was initially going to use StatsD, Grafana, and InfluxDB, but the StatsD plugins I found for sending data in InfluxDB's preferred format were out of date, and InfluxDB's authentication was lacking when using StatsD directly (I believe; it's actually been several months and I've forgotten some of the specifics). Further research led me to Telegraf, the last bit of the InfluxData stack. Telegraf replaces both collectd and StatsD, including letting applications report to it using StatsD metrics. So I settled on using the full InfluxData stack and have been happy with it so far, both running Telegraf remotely to test HTTP responses and locally to monitor things like memory and CPU usage.
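To give an idea of how much ground Telegraf covers, here's a stripped-down sketch of a telegraf.conf. The plugin names are real Telegraf plugins; the addresses, database name, and timeout are placeholders rather than my actual config:

```toml
# Local system metrics (the ground collectd would have covered)
[[inputs.cpu]]
[[inputs.mem]]

# Remote HTTP response checks
[[inputs.http_response]]
  address = "https://example.com"
  response_timeout = "5s"

# Accept StatsD metrics pushed from applications
[[inputs.statsd]]
  service_address = ":8125"

# Ship everything to InfluxDB
[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"
```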
So, after all of that, what I've ended up with for bash-shell.net is as follows. I run nginx and postgres as standard daemons, mostly out of laziness; nginx will likely move to Docker or even be replaced by Traefik. The Django app for bash-shell.net runs under gunicorn in a Debian 9 Docker container, with static files hosted on Digital Ocean's new (at the time of writing) beta S3-compatible object storage called Spaces. Though it's only one service, it runs as a Docker Compose stack. I also run a separate Docker Compose stack with a container each for InfluxDB, Telegraf, and Grafana. These are all deployed using Ansible. I have broken the Ansible playbook up into several steps: installing required packages from apt, configuring the Docker daemon, and then separate tasks for pulling Docker images for each container, uploading configuration files for the daemons running in the containers, and starting the compose stacks. This allows me to easily configure an entire server with one command, or use the same playbooks to only upload new config files for the InfluxData stack or only pull the latest Docker image for the blog's container and restart its stack.
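For the curious, the monitoring stack's compose file is shaped roughly like this. The image tags, ports, and volume paths are illustrative, not my exact file:

```yaml
# Illustrative docker-compose.yml for the InfluxDB/Telegraf/Grafana stack.
version: '3'
services:
  influxdb:
    image: influxdb:1.3
    volumes:
      - /srv/influxdb:/var/lib/influxdb
  telegraf:
    image: telegraf:1.4
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    depends_on:
      - influxdb
  grafana:
    image: grafana/grafana:4.5.2
    ports:
      - "3000:3000"
    depends_on:
      - influxdb
```

With that in place, `docker-compose pull && docker-compose up -d` (or the equivalent Ansible tasks) brings the whole stack up or rolls it forward to new images.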
¹ There's little in the way of quality control or oversight that I've seen for Chef cookbooks. There are tons of cookbooks and recipes that do 99% the same thing. Cookbooks also upgrade somewhat haphazardly, causing tons of dependency conflicts. I've even had one where the GitHub link on Chef Supermarket went to different code with the same version number than what you could download from Opscode, which itself was somehow different from what you got when actually installing it with Berkshelf.↩