I like self-hosting because I can control failure domains, maintenance windows, and upgrade timing. I also like sleeping through the night, so I stopped treating every service like a one-off snowflake.
The biggest improvement I made was not a new tool. It was standardizing the parts nobody brags about.
1) Name things once and keep the pattern
I use the same naming format for hosts, containers, and DNS records:
- `role-site-env` for services
- `site-env` for DNS names
- explicit zone ownership notes in the repo
That sounds small, but it cuts troubleshooting time sharply. When something fails, I know where to look first instead of parsing creative naming choices from six months ago.
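
A naming pattern only helps if it is enforced, so here is a minimal sketch of a check that could run before a host or DNS record is added. The allowed roles, sites, and environments are hypothetical placeholders, and the function names are mine; swap in whatever your repo actually uses.

```python
import re

# Hypothetical allowed values; replace with your own roles, sites, and environments.
ROLES = {"proxy", "db", "backup"}
SITES = {"home", "colo"}
ENVS = {"prod", "stage"}

# role-site-env for services, site-env for DNS names.
SERVICE_RE = re.compile(r"^(?P<role>[a-z0-9]+)-(?P<site>[a-z0-9]+)-(?P<env>[a-z0-9]+)$")
DNS_RE = re.compile(r"^(?P<site>[a-z0-9]+)-(?P<env>[a-z0-9]+)$")


def check_service_name(name: str) -> bool:
    """Return True if name follows role-site-env with known values."""
    m = SERVICE_RE.match(name)
    return bool(m) and m["role"] in ROLES and m["site"] in SITES and m["env"] in ENVS


def check_dns_label(label: str) -> bool:
    """Return True if label follows site-env with known values."""
    m = DNS_RE.match(label)
    return bool(m) and m["site"] in SITES and m["env"] in ENVS
```

With these placeholder sets, `check_service_name("proxy-home-prod")` passes and `check_service_name("my-cool-box")` does not, which is the whole point.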
2) Treat reverse proxy config like code, not notes
Every upstream I expose has the same baseline:
- TLS only
- explicit `proxy_read_timeout` and `proxy_connect_timeout`
- consistent forwarded headers
- health endpoint and a simple smoke check command
When I need to onboard a new app, I copy a known-good pattern and adjust two or three values. I do not freehand configs in production anymore.
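
One way to make "copy a known-good pattern" concrete is to render the nginx location block from a single template. This is a rough sketch, not my exact config: the timeout defaults, the header set, and the `render_location` name are assumptions for illustration, and the TLS listener is assumed to live in the enclosing server block.

```python
from string import Template

# Baseline shared by every upstream. "$$" renders as a literal "$" so nginx
# variables such as $host survive templating.
LOCATION_TEMPLATE = Template("""\
location / {
    proxy_pass $upstream;
    proxy_connect_timeout ${connect_timeout}s;
    proxy_read_timeout ${read_timeout}s;
    proxy_set_header Host $$host;
    proxy_set_header X-Real-IP $$remote_addr;
    proxy_set_header X-Forwarded-For $$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $$scheme;
}
""")


def render_location(upstream: str, connect_timeout: int = 5, read_timeout: int = 60) -> str:
    """Render the baseline location block for one upstream URL."""
    if not upstream.startswith(("http://", "https://")):
        raise ValueError(f"upstream must be an http(s) URL, got {upstream!r}")
    return LOCATION_TEMPLATE.substitute(
        upstream=upstream,
        connect_timeout=connect_timeout,
        read_timeout=read_timeout,
    )
```

Onboarding a new app then means calling `render_location("http://127.0.0.1:8080")` and reviewing the diff, so the two or three values that change are the only things that can go wrong.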
3) Backups are only real if restore checks are scheduled
A backup job that exits zero is not proof of recoverability. My baseline now:
- daily backup job logs kept for review
- weekly checksum or file-count sanity check
- monthly restore test into an isolated location
- short runbook note every time restore behavior changes
The restore check is the part that catches drift. Permissions, path changes, or bad excludes show up there, not during backup.
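
The checksum and file-count comparison can be sketched in a few lines. This assumes the restore test writes into a plain directory you can compare against the source tree; `tree_digest` and `compare_restore` are illustrative names, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, chunk it for large files.
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source: Path, restored: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means the trees match."""
    src, dst = tree_digest(source), tree_digest(restored)
    problems = []
    for rel in sorted(src.keys() - dst.keys()):
        problems.append(f"missing in restore: {rel}")
    for rel in sorted(dst.keys() - src.keys()):
        problems.append(f"unexpected in restore: {rel}")
    for rel in sorted(src.keys() & dst.keys()):
        if src[rel] != dst[rel]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

A bad exclude shows up as "missing in restore" lines. Note that this sketch does not compare ownership or modes, so permission drift needs a separate check.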
4) Write the runbook while the fix is fresh
As a veteran systems engineer and a stay-at-home dad, I care about time. Relearning old incidents at 11 PM is a time tax I no longer accept.
If I touch production config, I leave behind:
- what changed
- why it changed
- how to verify it
- how to roll it back
Nothing fancy. Plain text in the repo is enough.
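
If you want a nudge to fill in all four fields, a tiny helper can refuse to produce an entry with a blank one. This is only a sketch; the field labels mirror the list above, and the `runbook_entry` name and the date-first layout are my own assumptions.

```python
from datetime import date
from typing import Optional


def runbook_entry(changed: str, why: str, verify: str, rollback: str,
                  when: Optional[date] = None) -> str:
    """Format a plain-text runbook note, rejecting any empty field."""
    when = when or date.today()
    fields = [
        ("What changed", changed),
        ("Why it changed", why),
        ("How to verify it", verify),
        ("How to roll it back", rollback),
    ]
    missing = [label for label, value in fields if not value.strip()]
    if missing:
        raise ValueError("empty runbook fields: " + ", ".join(missing))
    lines = [when.isoformat()]
    lines += [f"{label}: {value.strip()}" for label, value in fields]
    return "\n".join(lines) + "\n"
```

Appending the returned string to a text file in the repo keeps the note next to the config it describes.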
What changed in practice
After standardizing naming, proxy defaults, and restore checks, incident handling became more boring. That is a win. Most failures now fall into known buckets, and recovery steps are predictable.
KISS is still the best scaling strategy for small teams and solo operators. Complexity will show up on its own. You do not need to invite more.