Nix all the things (part 1): Ansible
It’s been nearly four years since my last blog post. During this time, my initial skepticism of Nix has been completely assimilated. Resistance, as they say, was futile. Like the Borg of DevOps, Nix has quietly infiltrated and taken over every aspect of my work and homelab, systematically replacing conventional tools like Docker, Terraform, and Ansible. Even shell scripting isn’t safe.
In this multi-part series (ambitious for someone who couldn’t write one post in 4 years, I know), I’ll share some practical examples of how I’ve used Nix to replace traditional DevOps tools, starting today with something small yet illustrative: how I’ve fully transitioned from Ansible to Nix for managing Proxmox VE server configurations.
grafting nix onto ansible
Why?
YAML is terrible. Using a typed language like Nix to directly generate JSON avoids a lot of footguns. Nix is also a far more powerful language for doing transformations and substitutions. Over the years, when writing nix code I’ve found myself drifting towards representing the underlying information in the densest possible form, and then using functions to transform it into the expected structure. For a very basic example, I could define the list of hostnames for a five-node consul cluster as:
[
  "consul-1.example.com"
  "consul-2.example.com"
  "consul-3.example.com"
  "consul-4.example.com"
  "consul-5.example.com"
]
The much better alternative would be:
with lib; let
  n = 5;
in n |> range 1 |> (map (x: "consul-" + (toString x) + ".example.com"))
Or, if you don’t have the pipe operator enabled or you (really (like (lisp))):
with lib; let
  n = 5;
in (map (x: "consul-" + (toString x) + ".example.com") (range 1 n))
Here, n encodes the fact that your consul cluster has 5 nodes. Your terraform and nixos configs can use this one value to generate other properties. For example:
with lib; let
  n = 3;
  toFQDN = x: "consul-" + (toString x) + ".example.com";
  hosts = map toFQDN (range 1 n);
in genAttrs hosts (host: lib.nixosSystem {
  system = "x86_64-linux";
  modules = [{
    services.consul.extraConfig = {
      bootstrap_expect = 1 + (n / 2);
      retry_join = hosts;
      advertise_addr = host;
    };
  }];
})
# similarly generate terranix VM definitions using n
# but let's save that for part 2
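To make the arithmetic concrete: with n = 3, the module generated for consul-1 evaluates to roughly the following (Nix’s / on integers is integer division):

services.consul.extraConfig = {
  bootstrap_expect = 2; # 1 + (3 / 2), a majority of three nodes
  retry_join = [
    "consul-1.example.com"
    "consul-2.example.com"
    "consul-3.example.com"
  ];
  advertise_addr = "consul-1.example.com";
};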
To scale the above consul cluster from 3 to 5, I would:
- change n = 3 to n = 5
- apply the terraform (terranix) code to spin up the remaining two nodes
- do a rolling deployment of the nixos config across the cluster
In addition to the ease of templating, using nix also allows for better dependency management and reproducibility. For example, if I want to install nix on a PVE server, I can precisely pin the version of the role I want as a flake input:
{
  inputs.nix-install = {
    url = "github:danielrolls/nix-install";
    flake = false;
  };
  outputs = { ... }@args: {
    # ...
  };
}
Now to use this, the ansible YAML might have looked like:
- name: install nix
  hosts: all
  vars:
    flakes: true
  roles:
    - role: <path to nix-install role>
while the equivalent nix code would look like:
{ nix-install, ... }: builtins.toJSON [{
  name = "install nix";
  hosts = "all";
  vars.flakes = true;
  roles = [{ role = "${nix-install}"; }];
}]
No submodule shenanigans, no fiddling with local copies of the role. For a given state of the flake lock, I can be sure that the exact same version of the role will be run no matter where/when I run it. Updating it is also as simple as nix flake update.
How?
The underlying tools like Ansible and Terraform are still essential for the imperative steps they perform. Here, Nix acts as the declarative specification layer that these tools follow. In my opinion, the right balance is to use something like system-manager (a tool from Numtide for declarative management of services, packages, and files on non-NixOS systems) to handle as much of the system config as possible, and use ansible only as a way to deliver this payload and set up other things that system-manager can’t. Although limited compared to a full NixOS installation with its extensive modules, simply being able to declaratively handle symlinks, packages, and systemd units with nix makes things much easier.
In the following example, the Ansible playbook does two essential things:
- Install Nix itself
- Run system-manager to set up everything else
For more complex environments, such as at my workplace, additional imperative tasks beyond system-manager’s scope are necessary. For example:
- Configuring network bonds
- Setting kernel flags

In these scenarios, I’ve ensured idempotency explicitly within the Ansible tasks to prevent drift and unintended changes, as sketched below.
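To sketch what that looks like (a hypothetical task, not from my real playbook), here is a sysctl tweak in the same nix-to-JSON style as the plays below, using the ansible.posix.sysctl module, which is idempotent out of the box:

{
  name = "enable IP forwarding";
  # the sysctl module only reports "changed" when the value actually differs
  "ansible.posix.sysctl" = {
    name = "net.ipv4.ip_forward";
    value = "1";
    sysctl_set = true;
  };
}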
Example
ansible/inventory.nix:
{ target, ... }: builtins.toJSON {
  all.hosts.${target}.ansible_user = "root";
}
target will be passed to this file from flake.nix.
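With the target defined later in flake.nix, this evaluates to a single line of JSON; and since JSON is a subset of YAML, ansible happily reads it as inventory.yml:

{"all":{"hosts":{"machine-1.example.com":{"ansible_user":"root"}}}}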
ansible/playbook.nix:
{ self, nix-install, ... }: builtins.toJSON (map (x: x // { hosts = "all"; }) [
  {
    name = "install nix";
    vars.flakes = true;
    roles = [{ role = "${nix-install}"; }];
  }
  {
    name = "install system-manager";
    tasks = [
      {
        name = "copy flake";
        "ansible.builtin.copy" = {
          src = "${self}/";
          dest = "/opt/config-flake";
        };
      }
      {
        name = "activate system config";
        "ansible.builtin.shell" = ''
          bash -lc "nix run /opt/config-flake#systemConfigs.default"
        '';
        register = "system_result";
        changed_when = "'Activating' in system_result.stdout";
        failed_when = "system_result.rc != 0";
      }
    ];
  }
])
nix-install is the flake input passed through to this file from flake.nix. The map at the top injects hosts = "all" into every play, so it doesn’t have to be repeated. The playbook will:
- install nix
- copy the flake to the server
- activate the system-manager configuration
Instead of copying the flake and building system-manager on the target machine, systemConfigs.default could have been copied to the server with nix-copy-closure. But I wanted to have a local copy of the deployed config just in case.
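For reference, that alternative would look something like this (building locally, then pushing the closure):

nix build .#systemConfigs.default
nix-copy-closure --to root@machine-1.example.com "$(readlink ./result)"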
flake.nix:
{
  description = "OVH VPS";
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
    system-manager = {
      url = "github:numtide/system-manager";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    # pvemon: a jank PVE metrics exporter that avoids using the API
    pvemon = {
      url = "github:illustris/pvemon";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    nix-install = {
      url = "github:danielrolls/nix-install";
      flake = false;
    };
  };

  # While it is only used in one place, defining target in a let block
  # up here makes it easier to find and change.
  outputs = { self, nixpkgs, system-manager, pvemon, ... }@args: let
    target = "machine-1.example.com";
  in {
    # Define a package that can easily be run from the flake. This can easily
    # be passed to eachDefaultSystemMap to make it work on other platforms,
    # but I can't be bothered.
    packages.x86_64-linux.ansible-apply = let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in with nixpkgs.lib; pkgs.writeShellApplication {
      name = "ansible-apply";
      runtimeInputs = [ pkgs.ansible ];
      # Generate a shell script that will run the ansible-playbook command
      # with the generated inventory and playbook files passed in.
      text = readFile (pkgs.replaceVars ./ansible/ansible.sh {
        inventory = pkgs.writeText "inventory.yml"
          (import ./ansible/inventory.nix (args // {
            inherit (pkgs) system;
            inherit target;
          }));
        playbook = pkgs.writeText "playbook.yml"
          (import ./ansible/playbook.nix (args // {
            inherit (pkgs) system;
          }));
      });
    };

    # Define the system-manager config
    systemConfigs.default = with nixpkgs.lib; (system-manager.lib.makeSystemConfig {
      modules = [({ pkgs, lib, ... }: {
        nixpkgs.hostPlatform = "x86_64-linux";
        system-manager.allowAnyDistro = true;
        environment.systemPackages = with pkgs; [
          htop
          tmux
          sysstat
        ];
        systemd.services = {
          node-exporter = {
            serviceConfig = {
              Type = "simple";
              Restart = "on-failure";
            };
            script = concatStringsSep " " [
              (lib.getExe pkgs.prometheus-node-exporter)
              "--web.listen-address=127.0.0.1:9100"
            ];
            wantedBy = [ "system-manager.target" ];
            unitConfig = {
              Description = "Prometheus Node Exporter";
              After = "network-online.target";
            };
          };
          pvemon = {
            serviceConfig = {
              Type = "simple";
              Restart = "on-failure";
            };
            script = "/bin/bash -lc '${lib.getExe pvemon.packages.${pkgs.system}.pvemon} --host 127.0.0.1'";
            wantedBy = [ "system-manager.target" ];
            unitConfig = {
              Description = "PVEmon";
              After = "network-online.target";
            };
          };
        };
      })];
    # This bit makes it possible to directly run the systemConfigs attribute
    }).overrideAttrs (old: {
      meta = (old.meta or {}) // {
        mainProgram = "activate";
      };
    });
  };
}
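One file not shown above is ansible/ansible.sh, the template that replaceVars fills in; @inventory@ and @playbook@ are the placeholders it substitutes with the generated store paths. A minimal sketch of what it plausibly contains (writeShellApplication supplies the shebang and error handling itself):

ansible-playbook -i @inventory@ @playbook@ "$@"

With all of this in place, a deployment is a single nix run .#ansible-apply.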
The complete flake can be found here.
Recap, and what’s next
In this part, we went over:
- replacing yaml with nix in ansible
- creating reproducible playbook wrapper scripts
- declaratively managing services and packages with nix on non-nixos systems
In the next part, which at current pace will be ready some time this decade, we will build on top of this initial setup, and spawn a cluster of nixos VMs with terranix.