Declarative Homelab Management – Overview

Red Tomato's Blog

2025-03-30

Colmena, Homelab, Linux, NixOS, self-hosting

Intro

For a very long time, managing servers was a manual task for me. I had to write a bunch of shell scripts or rely on cloud-init (when supported) to configure a server exactly as I wanted. Ansible helps a lot, but it is still largely an imperative approach, and I needed multiple playbooks to configure a server.

Everything changed when I started using Nix to manage my dotfiles. It completely transformed how I think about application and configuration management. Over time, this mindset extended to how I manage my daily driver and then all my servers. As I mentioned in my last video, I want to know exactly how my server is configured and be able to spin up a new server with the exact settings I need.

With the rise of Configuration as Code and Infrastructure as Code, more and more applications can now be managed declaratively, either through dedicated tools or through Terraform providers. Being able to declare everything for my machines has been a game-changer. It allows me to achieve:

Reproducibility: I can reinstall a server with the exact configuration if it fails.
Portability: My configuration can be easily transferred across different environments.
Backups & security: My configurations are stored securely and can be restored when needed.

Since everything is declarative, I can leverage CI/CD pipelines to build and manage my applications and their configuration. I have been running almost all machines using this approach for quite some time and it has been incredible.

In today’s post, I want to walk you through how I set up my tools server, the applications I use and how I use them to improve both security and flexibility of my workflows.

I might cover how I access my tools server from anywhere in a separate blog post.

Hopefully, this post will help you get started on building a similar setup tailored to your needs. Below is a high-level diagram of my tools server:

GitHub Repo

You can find my tools server configuration on GitHub. Here is the folder structure:

├── servers
│   ├── tools
│   │   ├── minio
│   │   ├── secrets
│   │   ├── vault
│   │   ├── default.nix
│   │   ├── github-runner.nix
│   │   ├── postgres.nix
│   │   ├── redis.nix
│   │   ├── keycloak
│   │   ├── traefik.nix
│   │   └── networking.nix
│   ├── configuration.nix
│   ├── home.nix
│   ├── ca.nix
│   └── disk-config.nix
├── terraform
│   ├── minio
│   ├── vault
│   └── keycloak
├── flake.nix
├── flake.lock
└── README.md

The repository is structured into three main parts:

flake

I use a Nix flake to lock dependencies, which provides reproducibility. I want to highlight two key components of this flake.nix.

mkColmenaConfig

Since many configuration parameters (tags, username, modules, timezone) are shared, I created this function to help me manage homelab servers. It requires host and hostModule and allows overriding values such as user, buildOnTarget, system and extraModules as needed. This reduces repetitive code while keeping things flexible.
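To make the idea concrete, here is a minimal sketch of what a helper like this could look like. It is not the code from my repository: the default values, the "homelab" tag, the host name and the shared module path are all assumptions for illustration, and the system argument is left out because how it is wired into the hive depends on your flake.

```nix
let
  # Modules shared by every server (hypothetical path).
  sharedModules = [ ./servers/configuration.nix ];

  # Hypothetical helper: builds one Colmena node from a few parameters.
  mkColmenaConfig =
    { host                      # address Colmena connects to
    , hostModule                # server-specific NixOS module
    , user ? "admin"            # assumed default deploy user
    , buildOnTarget ? false     # build on the server instead of locally
    , extraModules ? [ ]        # optional per-server additions
    }:
    {
      deployment = {
        targetHost = host;
        targetUser = user;
        inherit buildOnTarget;
        tags = [ "homelab" ];   # assumed shared tag
      };
      time.timeZone = "UTC";    # assumed shared timezone
      imports = sharedModules ++ [ hostModule ] ++ extraModules;
    };
in
{
  # Example node definition using the helper.
  tools = mkColmenaConfig {
    host = "tools.example.internal";
    hostModule = ./servers/tools;
    buildOnTarget = true;
  };
}
```

The resulting attribute set would sit next to meta in the colmena output of the flake, one entry per server.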

app.apply

I have also created an app on this flake. To deploy changes to a server, simply run nix run .#apply <server_name>. Nix pulls colmena and runs colmena apply --experimental-flake-eval --on "$serverName" behind the scenes.
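A flake app of this kind can be sketched roughly as follows. This is an illustration under assumptions, not a copy of my flake.nix: the nixpkgs branch, the hard-coded system and the script name are placeholders, and only the colmena command itself comes from the description above.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumption: machine you deploy from
      pkgs = nixpkgs.legacyPackages.${system};

      # Small wrapper script with colmena available on its PATH.
      applyScript = pkgs.writeShellApplication {
        name = "apply";
        runtimeInputs = [ pkgs.colmena ];
        text = ''
          # Fail with a usage message when no server name is given.
          serverName="''${1:?usage: nix run .#apply <server_name>}"
          colmena apply --experimental-flake-eval --on "$serverName"
        '';
      };
    in
    {
      apps.${system}.apply = {
        type = "app";
        program = "${applyScript}/bin/apply";
      };
    };
}
```

With something like this in place, nix run .#apply tools would deploy only the node named tools.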

Server Configuration

The servers folder contains shared configurations at the root level, while each individual server has its own directory with specific settings.

Applications

On this tools server, I host the following applications:

Valkey: Valkey is a high-performance data structure server that primarily serves key/value workloads. I use it as a replacement for Redis after the Redis license change. I did not enable TLS because this Valkey instance is only accessible on my local network and enabling TLS has some performance penalty according to this GitHub issue.
PostgreSQL: PostgreSQL is the database for my applications. Currently I use it for Keycloak, and I can easily create databases for other applications in the future. This instance is restricted to local access only.
MinIO: MinIO is a high-performance, S3-compatible object store. I use it to replace AWS S3 for my projects.
Traefik: Traefik is my favourite reverse proxy. I use it to expose applications running on the server and to handle the SSL certificate lifecycle.
Keycloak: I need an IdP for my daily workflow and Keycloak is my go-to choice. It provides SSO for some applications, such as HashiCorp Vault, and issues tokens to protect my other applications. Since I don’t plan to use tls_client_auth, I put Traefik in front of it to handle SSL.
HashiCorp Vault: I use it not only as a secret store but also as the ACME provider. Vault greatly simplifies certificate management: Traefik relies on it to issue SSL certificates for all applications. Another key advantage is that secrets can be accessed consistently across all my machines. Many tools, such as External Secrets Operator for Kubernetes, support Vault natively.
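As an example of how little is needed to declare one of these services, here is a minimal sketch of a PostgreSQL instance with a database for Keycloak, using the standard NixOS module. The database and user names are assumptions, and my actual postgres.nix may differ.

```nix
{ ... }:
{
  services.postgresql = {
    enable = true;

    # With TCP/IP disabled (the default), PostgreSQL only listens on
    # localhost and its Unix socket, which keeps access local.
    enableTCPIP = false;

    # Declaratively create a database and a matching owner role
    # (names are placeholders).
    ensureDatabases = [ "keycloak" ];
    ensureUsers = [
      {
        name = "keycloak";
        ensureDBOwnership = true;
      }
    ];
  };
}
```

Adding a database for another application later is then a matter of appending one more entry to each list.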

Configuration management

Since the server runs NixOS, I can declare all configurations. My previous video demonstrates how you can use nixos-anywhere and colmena to bootstrap and manage NixOS configurations.

There are a few tools I want to call out:

disko handles disk partitioning.
home-manager is used to manage user specific configurations.
NixOS service modules are preferred for running applications. If a service module is missing, I use a systemd unit to run the application on NixOS. Docker is the last resort for applications that are not available in Nix.
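For the second case, a hand-written systemd unit on NixOS can be sketched like this. The service name, the command and the state directory are purely illustrative (a Python static file server standing in for a real application), not something that runs on my tools server.

```nix
{ pkgs, ... }:
{
  # Hypothetical application that has no NixOS service module.
  systemd.services.example-app = {
    description = "Example application managed by a plain systemd unit";
    wantedBy = [ "multi-user.target" ];
    after = [ "network-online.target" ];
    wants = [ "network-online.target" ];

    serviceConfig = {
      # Placeholder command: serve files from the state directory on localhost.
      ExecStart = "${pkgs.python3}/bin/python3 -m http.server 8080 --bind 127.0.0.1 --directory /var/lib/example-app";
      # systemd creates /var/lib/example-app and an unprivileged user for us.
      StateDirectory = "example-app";
      DynamicUser = true;
      Restart = "on-failure";
    };
  };
}
```

The unit is still fully declarative: removing the block from the configuration removes the service on the next deploy.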

Terraform

I use Terraform to manage application data for Keycloak, MinIO and HashiCorp Vault. There are two main benefits:

It helps me restore application data easily.
I can declare changes in code and apply them through pipelines.

Security

Security is a core principle in my setup regardless of the environment or use case. I follow a secure-by-default approach by implementing the following measures:

TLS Everywhere: I enable TLS wherever possible and only allow HTTP in my home environment when the service or data is short-lived.
Strict Secret Management: Any secret stored in a Git repository, whether public or private, must be encrypted. If encryption is not possible, the secret should never be committed to Git.
Additional Security Measures: I set other security rules for myself that bring some inconvenience. However, I believe these trade-offs are worthwhile. Over time, these precautions help create a more secure and resilient environment.

For this server specifically, there are three key security aspects to be aware of:

Secrets encryption

First, I use age and sops to encrypt secrets in the <server_name>/secrets folder. For the tools server, I put everything in a single YAML file for convenience. You are free to split your secrets and store them per service.

Then I use sops-nix to decrypt these secrets on the target machine, where each secret is written to a separate file. The application can then reference the path of its secret file.

This is possible because one of the recipients is the age key derived from the target machine’s Ed25519 SSH host key. sops-nix automatically imports the SSH keys on the target server as age keys.
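The host-level part of this can be sketched as below, assuming the sops-nix NixOS module is already imported. As far as I know sops-nix picks up the Ed25519 host key on its own when OpenSSH is enabled, so setting the path explicitly is optional; the default file shown here is an assumption based on the layout of this repository.

```nix
{ ... }:
{
  sops = {
    # Encrypted file used when a secret does not set its own sopsFile
    # (path is an assumption, relative to this module).
    defaultSopsFile = ../secrets/secrets-enc.yaml;

    # SSH host key that sops-nix converts into an age identity
    # in order to decrypt secrets on the target machine.
    age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
  };
}
```

The matching public key has to be listed as an age recipient when the file is encrypted, otherwise the server cannot decrypt it.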

For example, in servers/tools/minio/sops.nix you will see a sops secret called minio-root-credential.

sops.secrets."minio-root-credential" = {
  sopsFile = ../secrets/secrets-enc.yaml;
  format = "yaml";
  mode = "0440";
  owner = config.users.users.minio.name;
  group = config.users.users.minio.group;
};

Then in servers/tools/minio/service.nix I can reference the secret file path as:

minioRootCredential = config.sops.secrets."minio-root-credential".path;

And use it in my service module:

services.minio = {
  enable = true;
  browser = true;
  region = "us-east-1";
  rootCredentialsFile = minioRootCredential;
};

This setup ensures that all secrets are encrypted before they reach GitHub.

TLS

TLS and PKI are fundamental security components that deserve more attention. I have maintained my own CA for years, using openssl to generate all certificates as needed. To simplify the process, I wrote a shell script to help me request certificates. This worked very well, but I knew I needed automation for SSL certificate provisioning via ACME. As there was no immediate urgency, I did not go down the path of hosting my own ACME service. After discovering that HashiCorp Vault supports ACME natively, I decided to set this up with Vault, with Traefik as the first client to manage its certificate lifecycle through it.

Given Vault’s critical role in this setup, I decided to manage its own certificate manually to minimize dependencies. At first I thought about using the acme config to create the certificate locally via DNS validation, as demonstrated in the NixOS manual. However, since I currently only need one certificate for Vault, setting up a renewal reminder with a cron job is much simpler. Additionally, a brief downtime for certificate renewal is an acceptable trade-off in my case.

Once Vault and ACME were configured, Traefik could start requesting certificates from Vault via http-01 validation. This setup has worked flawlessly.
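On the Traefik side, pointing a certificate resolver at Vault can be sketched like this with the NixOS Traefik module. The Vault host name, the pki mount path, the email address and the resolver name are assumptions; adjust them to match your own PKI secrets engine.

```nix
{ config, ... }:
{
  services.traefik = {
    enable = true;
    staticConfigOptions = {
      entryPoints = {
        web.address = ":80";        # used for the http-01 challenge
        websecure.address = ":443";
      };

      # ACME resolver backed by Vault's PKI engine (hypothetical URL).
      certificatesResolvers.vault.acme = {
        caServer = "https://vault.example.internal:8200/v1/pki/acme/directory";
        email = "admin@example.internal";
        storage = "${config.services.traefik.dataDir}/acme.json";
        httpChallenge.entryPoint = "web";
      };
    };
  };
}
```

Routers then opt in by naming the resolver in their TLS settings. Because the certificates come from a private CA, Traefik also needs to trust that CA when it talks to Vault.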

Terraform

When using Terraform for this server, I focus on two key aspects:

State Management: I store the Terraform state in a Cloudflare R2 bucket to ensure durability and accessibility.
Secrets Management: I encrypt secrets with age and sops here as well, then use the sops provider to make them available to my other Terraform resources.

Summary

This approach has allowed me to manage my home servers efficiently, securely, and with full reproducibility. Hopefully, this post provides inspiration for building your own declarative infrastructure.

That’s all I want to share with you today. See you in the next one.