We’ve found that DevOps tooling has become a key part of our tech and operations. We spend a lot of time selecting and improving our DevOps toolset. The vast majority of the tools that we use are open source. By sharing the tools that we use and like, we hope to start a discussion within the DevOps community on what further improvements can be made.
We hope that you enjoy browsing through the list below. You may already be well acquainted with some of the tools below, and some may be newer to you.
What is it? Nagios is an open source tool for monitoring systems, networks and infrastructure. Nagios provides alerting and monitoring services for servers, switches, applications and services.
Why use Nagios? Nagios’s main strengths are that it is open source, relatively robust and reliable, and highly configurable. It has an active development community, and runs on many different kinds of operating systems. You can use Nagios to monitor services such as DHCP, DNS, FTP, SSH, Telnet, HTTP, NTP, POP3, IMAP, SMTP and more. It can also be used to monitor database servers such as MySQL, Postgres, Oracle and SQL Server.
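To make the configurability a little more concrete: Nagios checks are small programs (plugins) that report their result through the exit code, with 0 meaning OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The sketch below follows that convention in Python. The function names and the 80/90 percent thresholds are our own assumptions for illustration, not something that ships with Nagios.

```python
import shutil

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, message) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk_usage(path="/"):
    """Hypothetical check: measure disk usage for path and classify it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(100.0 * usage.used / usage.total)
```

A real plugin would print the message and exit with the returned code, which is how Nagios would pick up the status.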
Has it had any criticism? Nagios has been criticized as lacking scalability and usability. However, Nagios is stable and its limitations and problems are well-known and understood. And certainly some, including Etsy, are happy to see Nagios live on a little longer.
What is Monit? Monit is a utility for managing and monitoring processes, programs, directories and filesystems on a Unix system. Monit is able to conduct automatic maintenance and repair, and can execute meaningful causal actions in error situations.
Why use Monit? Monit helps you monitor and manage server programs to make sure they stay online consistently. It also monitors checksums, file systems and permissions to make sure they’re always correct, and has a basic web interface which allows you to set up processes.
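The checksum monitoring idea is easy to picture in code. The following is only a conceptual sketch in Python, not how Monit is configured (Monit uses its own control file): it records a SHA-256 baseline for a file and later reports whether the file has changed. All function names here are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_digest):
    """True if the file's current checksum differs from the recorded one."""
    return sha256_of(path) != recorded_digest


def demo():
    """Record a baseline for a temporary file, modify it, and report both checks."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"listen 80\n")
        os.close(fd)
        baseline = sha256_of(path)
        before = has_changed(path, baseline)  # nothing has touched the file yet
        with open(path, "ab") as handle:
            handle.write(b"listen 8080\n")
        after = has_changed(path, baseline)   # the content now differs
        return before, after
    finally:
        os.remove(path)
```

A monitoring tool would run a check like this on a schedule and trigger an alert or a repair action when the result flips.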
Any problems? The fact that Monit typically needs to run with root privileges could be a dealbreaker for some.
What is ELK? The ELK stack actually refers to three technologies – Elasticsearch, Logstash and Kibana. Elasticsearch is a NoSQL database that is based on the Lucene search engine, Logstash is a log pipeline tool that accepts inputs from different sources, and exports the data to various targets, and Kibana is a visualization layer for Elasticsearch. And they work very well together.
What are its use cases? Together they’re often used in log analysis in IT environments (although you can also use the ELK stack for BI, security and compliance and analytics.)
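To give a feel for what happens to a log line on its way through such a stack, here is a minimal Python sketch: a raw line is parsed into a structured document (roughly the job of the pipeline stage), and the documents are then aggregated (roughly what a dashboard would chart). The log format, the field names and both functions are assumptions made for this example; this is not Logstash’s own configuration syntax.

```python
import re

# Pattern for a simplified access-log line (an assumed format for this sketch).
LINE = re.compile(
    r'(?P<client>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def parse_line(line):
    """Turn one raw log line into a structured document, or None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = int(doc["bytes"])
    return doc


def count_by_status(docs):
    """Aggregate documents by status code, the kind of summary a dashboard might show."""
    counts = {}
    for doc in docs:
        counts[doc["status"]] = counts.get(doc["status"], 0) + 1
    return counts


SAMPLE = [
    '10.0.0.1 [01/Jan/2016:10:00:00 +0000] "GET /index.html" 200 512',
    '10.0.0.2 [01/Jan/2016:10:00:01 +0000] "GET /missing" 404 0',
    '10.0.0.1 [01/Jan/2016:10:00:02 +0000] "POST /login" 200 128',
]
```

Parsing `SAMPLE` and passing the results to `count_by_status` gives a per-status tally; lines that do not match the pattern are returned as `None` and should be filtered out first.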
Why is it popular? ELK is incredibly popular. The stack is downloaded 500,000 times every month. This makes it the world’s most popular log management platform. SaaS and web startups in particular are not overly keen to stump up for enterprise products such as Splunk.
There’s an increasing amount of discussion as to whether open source products are overtaking Splunk, with many seeing 2014 as a tipping point.
What is Consul.io? Consul is a tool for discovering and configuring services in your infrastructure. It can be used to present nodes and services in a flexible interface, allowing clients to have an up-to-date view of the infrastructure they’re part of.
Why use Consul.io? Consul.io comes with a number of features for providing consistent information about your infrastructure. Consul provides service and node discovery, tagging, health checks, consensus based election routines, key value storage and more. Consul allows you to build awareness into your applications and services.
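The core idea of service discovery with health checks and tags can be sketched in a few lines of Python. This is a toy, single-process registry that we made up for illustration; it is not Consul’s API, and it leaves out everything that makes Consul interesting in production (distribution, consensus, key value storage).

```python
class ServiceRegistry:
    """Toy in-memory registry; a real system would replicate this across nodes."""

    def __init__(self):
        self._instances = {}  # service name -> list of instance dicts

    def register(self, name, address, port, tags=()):
        """Add an instance of a service; new instances start out healthy."""
        self._instances.setdefault(name, []).append(
            {"address": address, "port": port, "tags": set(tags), "healthy": True}
        )

    def set_health(self, name, address, healthy):
        """Record the outcome of a health check for one instance."""
        for inst in self._instances.get(name, []):
            if inst["address"] == address:
                inst["healthy"] = healthy

    def discover(self, name, tag=None):
        """Return (address, port) pairs for healthy instances, optionally filtered by tag."""
        return [
            (inst["address"], inst["port"])
            for inst in self._instances.get(name, [])
            if inst["healthy"] and (tag is None or tag in inst["tags"])
        ]
```

A client asks the registry for "web" and only gets back instances whose last health check passed, which is what gives it an up-to-date view of the infrastructure.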
Anything else I should know? Hashicorp has a really strong reputation within the developer community for releasing strong documentation with its products, and Consul is no exception. Consul is distributed, highly available, and datacenter aware.
What is Jenkins? Everyone loves Jenkins! Jenkins is an open source CI tool, written in Java. CI is the practice of running tests on a non-developer machine automatically every time someone pushes code into a source repo. Continuous Integration is considered a prerequisite for Continuous Delivery.
Why would I want to use Jenkins? Jenkins helps automate a lot of the work of frequent builds, allows you to detect and resolve issues quickly, and reduces integration costs because serious integration issues become less likely.
Any problems with Jenkins? Jenkins configuration can be tricky. Jenkins UI has evolved over many years without a guiding vision – and it’s arguably got more complex over the years. It has been compared unfavourably to more modern tools such as Travis CI (which of course isn’t open source).
What is it? There was a time last year when it seemed that all anyone wanted to talk about was Docker. Docker provides a portable application environment which enables you to package an application in a unit for application development.
Should I use it? Depending on who you ask, Docker is either the next big thing in software development or a case of the emperor’s new clothes. Docker has some neat features, including Docker Hub, a public repository of Docker images, and docker-compose, a tool for managing multiple containers as a unit on a single machine.
It’s been suggested that Docker can be a way of reducing server footprint by packing containers on physical tin without running a separate kernel for each one – but equally Docker’s security story is a hot topic. Docker’s UI also continues to improve – Docker has just released a new Mac and Windows client.
What’s the verdict? Docker can be a very useful technology – particularly in development and QA – but you should think carefully about whether you need or want to run it in production. Not everyone needs to operate at Google scale.
What is it? Ansible is a free platform for configuring and managing servers: it combines multi-node software deployment, task execution and configuration management.
Why use Ansible? Configuration management tools such as Ansible are designed to automate away much of the work of configuring machines.
Manually configuring machines via SSH, and running the commands you need to install your application stack, editing config files, and copying application code can be tedious work, and lead to each machine being its own ‘special snowflake’ depending on who configured it. This can compound if you are setting up tens, or thousands of machines.
What are the problems with using Ansible? Ansible is considered to have a fairly weak UI. Tools such as Ansible Tower exist, but many consider them a work in progress, and using Ansible Tower drives up the TCO of using Ansible.
Ansible also has no notion of state – it just executes a series of tasks, stopping when it finishes, fails, or encounters an error. Ansible has also been around for less time than Chef and Puppet, meaning that it has a smaller developer community than some of its more mature competitors.
What is it? Saltstack, much like Ansible, is a configuration management tool and remote execution engine. It’s primarily designed to allow the management of infrastructure in a predictable and repeatable way. Saltstack was designed to manage large infrastructures with thousands of servers – the kind seen at LinkedIn, Wikipedia and Google.
What are the benefits of using Salt? Because Salt uses the ZeroMQ framework, and serializes messages using msgpack, Salt is able to achieve significant speed and bandwidth gains over traditional transport layers, and is thus able to fit far more data more quickly through a given pipe. Getting set up is very simple, and someone new to configuration management can be productive before lunchtime.
Any problems with using Saltstack? Saltstack is considered to have a weaker Web UI than some of its more mature competitors. It also lacks deep reporting capabilities. Some of these issues have been addressed in Saltstack Enterprise, but this may be out of budget for you.
What is it? Collectd is a daemon which collects statistics on system performance, and provides mechanisms to store the values in different ways.
Why should I use collectd? Collectd helps you collect and visualize data about your servers, and thus take informed decisions. It’s useful for working with tools like Graphite, which can render the data that collectd collects.
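To show what handing data to Graphite looks like on the wire, here is a small Python sketch that formats samples in Graphite’s plaintext protocol, where each line is a metric path, a value and a Unix timestamp. collectd has its own plugin for writing to Graphite, so you would not normally write this yourself; the `servers.<host>.load` naming scheme and the helper names are assumptions made for this example.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample as 'path value timestamp' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def load_samples(host, load1, load5, load15, timestamp):
    """Build lines for the three load averages under a hypothetical naming scheme."""
    prefix = f"servers.{host}.load"
    return [
        graphite_line(f"{prefix}.shortterm", load1, timestamp),
        graphite_line(f"{prefix}.midterm", load5, timestamp),
        graphite_line(f"{prefix}.longterm", load15, timestamp),
    ]
```

Each returned string is one line that could be sent over a TCP connection to a Graphite listener, which then stores and renders the series.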
Collectd is an incredibly simple tool, and requires very few resources. It can even run on a Raspberry Pi! It’s also popular because of its pervasive modularity. It’s written in C, and contains almost no code that would be specific to any operating system, and will run on any Unix-like operating system.
What is Git? Git is the most widely used version control system in the world today. An incredibly large number of products use Git for version control, from hobbyist projects to large enterprises, from commercial products to open source. Git is designed with speed, flexibility and security in mind, and is an example of a distributed version control system.
Should I use Git? Git is an incredibly impressive tool – combining speed, functionality, performance and security. When compared side by side to other SCM tools, Git often comes out ahead. Git has also emerged as a de facto standard, meaning that vast numbers of developers already have Git experience.
Why shouldn’t I use Git? Git has an initially steep learning curve. Its terminology can seem a little arcane to novices. Revert, for instance, has a very different meaning in Git than in SVN and CVS. However, Git rewards that investment with increased development speed once mastered.
What is Rudder? Rudder is (yet another!) open source audit and configuration management tool that’s designed to help automate system config across large IT infrastructures.
What are the benefits of Rudder? Rudder allows users (even non-experts) to define parameters in a single console, and check that IT services are installed, running and in good health. Rudder is useful for keeping configuration drift low. Managers are also able to access compliance reports and access audit logs. Rudder is built in Scala.
What is Maven? Maven is a build automation tool, designed primarily for use with Java projects.
How does Maven work? Maven is used to define how your .java files get compiled to .class files, packaged into .jar files, and processed with other tools. Maven aims to be completely self contained, so you don’t need any additional scripts and tasks.
It works by automatically downloading all the libraries you use, and the libraries they use in turn, helping you avoid “dependency hell” on large projects.
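The “libraries you use and the libraries they use” part is a graph traversal at heart. Here is a minimal Python sketch of resolving transitive dependencies. It is only an illustration of the idea: the artifact names are invented, and Maven’s real resolution also has to deal with versions, scopes and conflicts, none of which are modelled here.

```python
def resolve(dependencies, roots):
    """Return every artifact reachable from roots, each listed once, in visit order."""
    resolved = []
    seen = set()
    stack = list(reversed(roots))
    while stack:
        artifact = stack.pop()
        if artifact in seen:
            continue  # already pulled in via another path
        seen.add(artifact)
        resolved.append(artifact)
        # Push direct dependencies so they are visited in their declared order.
        stack.extend(reversed(dependencies.get(artifact, [])))
    return resolved


# Hypothetical project: my-app declares two libraries, one of which needs two more.
GRAPH = {
    "my-app": ["web-lib", "json-lib"],
    "web-lib": ["json-lib", "log-lib"],
    "json-lib": [],
    "log-lib": [],
}
```

Calling `resolve(GRAPH, ["my-app"])` lists `log-lib` even though `my-app` never mentions it, and `json-lib` appears only once despite being reachable by two routes.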
Where’s the best place to get started? The best way to get started using Maven is by following the “Maven in five minutes” guide, which sets you up with a project ready for you to code in, with all the necessary files and folders in place.
What is Chef? Chef is a config management tool designed to automate machine setup on physical servers, VMs and in the cloud. Many companies use Chef software to manage and control their infrastructure – including Facebook, Etsy and Indiegogo. Chef is designed to define Infrastructure as Code.
What is infrastructure as code? Infrastructure as Code means that, rather than manually changing and setting up machines, the machine setup is defined in a Chef recipe. Leveraging Chef allows you to easily recreate your environment in a predictable manner by automating the entire system configuration.
What are the next steps for Chef? Chef has released Chef Delivery, a tool for creating automated workflows around enterprise software development, establishing a pipeline from creation to production. Chef Delivery establishes a pipeline that every new piece of software should go through in order to prepare it for production use. Chef Delivery works in a similar way to Jenkins, but offers greater reporting and auditing capabilities.
What is it? Much like Chef, Puppet is designed to provide a standard way of delivering and operating software, no matter where it runs.
Why use Puppet? Planning ahead and using config management tools like Puppet can cut down on the amount of time you spend repeating basic tasks, and help ensure that your configurations are consistent and accurate across your infrastructure.
What are the problems with Puppet? Puppet is considered to be unwieldy, and to require constant practice to maintain proficiency. It can be difficult for a new team of developers to pick up. This blog post by Ryan Lane of Lyft is particularly instructive in that regard.
What is it? Cobbler is a Linux provisioning server that facilitates and automates the network-based system installation of multiple OSes from a central point using services such as DHCP, TFTP and DNS.
Cobbler can be configured for PXE, reinstallations and virtualized guests using Xen, KVM and VMware. Cobbler also comes with a lightweight configuration management system, as well as support for integrating with Puppet.
What is it? Vagrant – another tool from Hashicorp – provides easy to configure, easily reproducible and portable work environments that are built on top of industry standard technology. Vagrant helps enforce a single consistent workflow to maximise the flexibility of you and your team.
Why use Vagrant? Vagrant provides operations engineers with a disposable environment and consistent workflow for developing and testing infrastructure management scripts. Vagrant can be downloaded and installed within minutes on Mac OS X, Linux and Windows.
Vagrant allows you to create a single file for your project to define the kind of machine you want to create, the software that needs to be installed, and the way you want to access the machine.
Are there any problems with Vagrant? Vagrant has been criticized as being painfully, troublingly slow.
What is it? AWS is a secure cloud services platform, which offers compute, database storage, content delivery and other functionality to help businesses scale and grow.
Why use AWS? EC2 is the most popular AWS service, and provides a very easy way for DevOps teams to run tests. Whenever you need them, you can set up an EC2 server with a machine image up and running in seconds.
EC2 is also great for scaling out systems. You can set up bundles of servers for different services, and when there is additional load on servers, scripts can be configured to spin up additional servers. You can also handle this automatically through Amazon auto-scaling.
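The decision behind “spin up additional servers when load goes up” can be written as a small function. The Python sketch below is our own simplification and not the AWS API or Amazon’s algorithm: it sizes the fleet so that average CPU moves towards a target, within a minimum and maximum. The 60 percent target and the bounds are arbitrary assumptions.

```python
import math


def desired_instances(current, avg_cpu_percent, target_cpu_percent=60.0,
                      minimum=1, maximum=10):
    """Scale the fleet so that average CPU moves towards the target, within bounds."""
    if current <= 0:
        return minimum
    # If 4 servers run at 90% and we want 60%, we need 4 * 90 / 60 = 6 servers.
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    return max(minimum, min(maximum, wanted))
```

A script would call this periodically with fresh metrics and launch or terminate servers to match the result; a managed auto-scaling service makes the same kind of decision for you.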
What are the downsides of AWS? The main downside of AWS is that all of your servers are virtual. There are options available on AWS for single tenant access, and different instance types exist, but performance will vary and never be as stable as physical infrastructure.
If you don’t need elasticity, EC2 can also be expensive at on-demand rates.
What is it? CoreOS is a Linux distribution that is designed specifically to solve the problem of making large, scalable deployments on varied infrastructure easy to manage. It maintains a lightweight host system, and uses containers to provide isolation.
Why use CoreOS? CoreOS is a barebones Linux distro. It’s known for having a very small footprint, built for “automated updates” and geared specifically for clustering.
If you’ve installed CoreOS on disk, it updates by keeping two system partitions – one “known good” because you’ve booted from it, and another that updates are downloaded to. It will then automatically reboot and switch to the updated partition.
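The two-partition scheme is simple enough to model in a few lines of Python. This is a toy state machine for illustration only, with hypothetical names throughout. Falling back to the known-good partition when the new one fails to boot is an assumption of the sketch.

```python
class DualPartitionHost:
    """Toy model of a host with two system partitions, 'A' and 'B'."""

    def __init__(self, version):
        self.versions = {"A": version, "B": None}
        self.active = "A"  # the partition we booted from, i.e. the known-good one

    @property
    def passive(self):
        return "B" if self.active == "A" else "A"

    def stage_update(self, new_version):
        """Write the update to the passive partition; the running system is untouched."""
        self.versions[self.passive] = new_version

    def reboot(self, boot_ok=True):
        """Switch to the staged partition, or stay on the known-good one if boot fails."""
        if self.versions[self.passive] is not None and boot_ok:
            self.active = self.passive
        return self.versions[self.active]
```

The point of the design is that an update never overwrites the system you are currently running, so there is always something safe to boot.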
CoreOS gives you a stack of systemd, etcd, Fleet, Docker and rkt with very little else. It’s useful for spinning up a large cluster where everything is going to run in Docker containers.
What is it? Weave focuses on creating overlay networking for Docker hosts. Weave creates a virtual network that connects each host machine together. This simplifies application routing, as it gives the appearance of every container being plugged into a single network switch.
Why use Weave? If you use Docker, you’re probably already at least aware of Weave. Weave creates a virtual SDN across every Docker host in your infrastructure. Weave gives every container its own IP, which allows you to design your application topology without changing your application’s behaviour. Weave Run, a companion tool for service discovery, also contributes to an easier dev experience.
Why does Weave exist? Docker began as a single host solution, and was not initially intended to run across multiple hosts. As a result, when people started running Docker across multiple hosts, they had to do complex port management – if you have two webservers, they can’t both listen on port 80 on the same host!
Weave solves this by giving each container its own IP address, so that both webservers can listen on port 80 on the same host. It also means, of course, that other services can access those webservers via port 80 and the correct IP.
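Here is a small Python sketch of why one IP per container removes the port juggling. It is a toy address allocator of our own, not Weave’s implementation, and the `10.32.0.0/24` subnet is an arbitrary choice for the example.

```python
import ipaddress


class OverlayNetwork:
    """Toy allocator that hands each container its own address from one subnet."""

    def __init__(self, cidr="10.32.0.0/24"):
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.containers = {}  # container name -> IP address string

    def attach(self, name):
        """Give a container the next free address on the virtual network."""
        ip = str(next(self._free))
        self.containers[name] = ip
        return ip

    def endpoint(self, name, port=80):
        """Every container can use the same port because its address is unique."""
        return (self.containers[name], port)
```

Two webservers attached to this network both expose port 80, and their endpoints still differ because the address part is different.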
What is Chocolatey? Chocolatey is apt-get for Windows. Once installed, you can install Windows applications quickly and easily using the command line. You could install Git, 7-Zip, Ruby, or even Microsoft Office! The catalog is now incredibly complete – you really can install a wide array of apps using Chocolatey.
Why should I use Chocolatey? Because manual installs are slow and inefficient. Chocolatey promises that you can install a program (including dependencies, such as the .NET framework) without user intervention.
You could use Chocolatey on a new PC to run a simple command, and download and install a fully functioning dev environment in a few hours. It’s really cool.
What is it? Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these services are used in one form or another by distributed applications.
Why use Zookeeper? Zookeeper is a co-ordination system for maintaining distributed services. It’s best to see Zookeeper as a giant properties file for different processes, telling them which services are available and where they are located. This post from the Engineering team at Pinterest outlines some possible use cases for Zookeeper.
Where can I read more? Aside from Zookeeper’s documentation, which is pretty good, chapter 14 of “Hadoop: The Definitive Guide” has around 35 pages, describing in some level of detail what Zookeeper does.
What is it? Drone is a continuous integration platform, based on Docker and built in Go. Drone works with Docker to run tests, and works with Github, Gitlab and Bitbucket.
Why use Drone? The use case for Drone is much the same as any other continuous integration solution. CI is the practice of making regular commits to your code base. Since with CI you will end up building and testing your code more frequently, the development process will be sped up. Drone does this – speeding up the process of building and testing.
How does it work? Drone pulls code from a Git repository, and then runs scripts that you define. Drone allows you to run any test suite, and will report back to you via email or indicate the status with a badge on your profile. Because Drone is integrated with Docker, it can support a huge number of languages including PHP, Go, Ruby and Python, to name just a few.
What is it? PagerDuty is an alarm aggregation and monitoring system that is used predominantly by support and sysadmin teams.
How does it work? PagerDuty allows support teams to pull all of their incident reporting tools into a single place, and receive an alert when an incident occurs. Before PagerDuty came along, companies used to cobble together their own incident management solutions. PagerDuty is designed to plug into whatever monitoring systems a team is using, and manage the incident reporting from one place.
Anything else? PagerDuty provides detailed metrics on response and resolution times too.
What is it? Dokku is a mini-Heroku, running on Docker.
Why should I use it? If you’re already deploying apps the Heroku way, but don’t like the way that Heroku is getting more expensive for hobbyists, running Dokku on a provider such as DigitalOcean could be a great solution.
Having the ability to deploy a site to a remote and have it go live immediately using Git is a huge boon. Here’s a tutorial for getting it up and running.
What is TeamCity? TeamCity is a Java based build management and CI tool. Although it predates DevOps – it’s definitely a strong DevOps tool. TeamCity works, much like Jenkins, as a major, all-in-one, CI server.
What are some of TeamCity’s cool features? TeamCity understands your tests. It uses different formatters for RSpec, Cucumber, TestUnit and Shoulda. This means that it provides more than a pass/fail on each build. TeamCity will also report the number of tests that have run.
What sets TeamCity apart is the support for larger teams. Users are able to set or take responsibility for a broken build or an individual test. That makes it easy to keep your team informed about who’s working on what.
What is it? Sublime Text is a cross-platform source code editor with a Python API. It supports many different programming languages and markup languages, and has extensive code highlighting functionality.
What’s good about it? Sublime Text is featureful, it’s stable, and it’s being continuously developed. It is also built from the ground up to be extremely customizable (with a great plugin architecture, too).
What is it? Gradle is an open source build automation tool that builds upon the concepts of Apache Ant and Apache Maven and introduces a Groovy-based DSL instead of the XML form used by Maven.
Why use Gradle instead of Ant or Maven? For many years, build tools were simply about compiling and packaging software. Today, projects tend to involve larger and more complex software stacks, have multiple programming languages, and incorporate many different testing strategies. It’s now really important (particularly with the rise of Agile) that build tools support early integration of code as well as easy delivery to test and prod.
Gradle allows you to map out your problem domain using a domain specific language, which is implemented in Groovy rather than XML. Writing code in Groovy rather than XML cuts down on the size of a build, and is far more readable.
What is it? Spinnaker is an open-source, multi-cloud CD platform for releasing software changes with high velocity and confidence.
What’s it designed to do? Spinnaker was designed by Netflix as the successor to its “Asgard” project. Spinnaker is designed to allow companies to hook into and deploy assets across multiple cloud providers at the same time.
What’s good about it? It’s battle-tested on Netflix’s infrastructure, and allows the creation of pipelines that begin with the creation of some deployable asset (say a Docker image or a jar file), and end with a deployment. Spinnaker offers an out of the box setup, and engineers can make and re-use pipelines on different workflows.
What is it? Kubernetes is an open-source container cluster manager by Google. It aims to provide a platform for automating deployment, scaling and operations of container clusters across hosts.
Why should I use it? Kubernetes is a system for managing containerized applications across a cluster of nodes. Kubernetes was designed to address some of the disconnect between the way that modern, clustered applications work, and the assumptions they make about some of their environments.
On the one hand, users shouldn’t have to care too much about where work is scheduled – the unit of work is presented at the service level, and can be accomplished by any of the member nodes. On the other hand, placement does matter, because a sysadmin will want to make sure that not all instances of a service are assigned to the same host. Kubernetes is designed to make these scheduling decisions easier.
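The “don’t put every instance on the same host” concern can be illustrated with a tiny scheduler in Python. This is a deliberately naive sketch of our own, not the Kubernetes scheduler: it places each replica on whichever node currently has the fewest replicas and still has room. Node names and capacities are hypothetical.

```python
def schedule(replicas, nodes, capacity):
    """Assign replica indices to nodes, always picking the least-loaded node with room."""
    placement = {node: [] for node in nodes}
    for replica in range(replicas):
        candidates = [n for n in nodes if len(placement[n]) < capacity[n]]
        if not candidates:
            raise RuntimeError(f"no capacity left for replica {replica}")
        # min() keeps the first node on ties, so placement is deterministic.
        target = min(candidates, key=lambda n: len(placement[n]))
        placement[target].append(replica)
    return placement
```

With four replicas and three nodes, the first three replicas land on three different nodes, so losing any single host never takes out the whole service.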
What is it? Flynn is one of the most popular open source Docker PaaS solutions. Flynn aims to provide a single platform that Ops can provide to developers to power production, testing and development, freeing developers to focus.
Why should you use Flynn? Flynn is an open source PaaS built from pluggable components that you can mix and match however you want. Out of the box, it works in a very similar way to Heroku, but you are able to replace pieces and put whatever you need into Flynn.
Is Flynn production-ready? The Flynn team correctly point out that “production ready” means different things to different people. As with many of the tools in this list, the best way to find out if it’s a fit for you is to try it!