Three-tier backup system - hot (server) / warm (NAS) / cold (detachable SSD until HDD) #98

Open
opened 2026-05-24 00:01:07 +00:00 by coilysiren · 0 comments
Owner

Area - homelab / backups / DR

Goal

Stand up a three-tier backup scheme using full independent copies at every tier. No dedup or incremental chains: a solo operator can't babysit repo integrity, so confining corruption to a single backup beats space savings. Each colder tier is harder to write to on demand than the warmer one above it.

Tiers

  • Hot - server, ~1 week, ~25GB budget - daily full backups, fast restore of recent files, prune oldest once over budget.
  • Warm - NAS, ~3 months, ~320GB budget - DR tier for full server loss, retention-versioned, NAS pulls from the server so a compromised server cannot write to it.
  • Cold - detachable SSD freed from the Windows NVMe swap, monthly connect - offline air-gapped copy that survives corruption or ransomware hitting every online tier at once. Format ext4, homelab-managed, labeled and dated.
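The hot tier's "prune oldest once over budget" rule could be sketched as below. This is a minimal illustration, not the final job: it assumes each full backup is a single file in one directory and that filenames sort chronologically (the `full-YYYY-MM-DD` naming in the usage note is hypothetical).

```python
from pathlib import Path

HOT_BUDGET_BYTES = 25 * 1000**3  # ~25GB hot-tier budget from the tier list


def prune_oldest(backup_dir: Path, budget_bytes: int = HOT_BUDGET_BYTES) -> list[Path]:
    """Delete the oldest backup files until the directory fits the budget.

    Assumption: one file per full backup, with names that sort oldest-first.
    Returns the paths that were removed.
    """
    backups = sorted(p for p in backup_dir.iterdir() if p.is_file())
    total = sum(p.stat().st_size for p in backups)
    removed: list[Path] = []
    # Always keep the newest backup, even if it alone exceeds the budget.
    while total > budget_bytes and len(backups) > 1:
        oldest = backups.pop(0)
        total -= oldest.stat().st_size
        oldest.unlink()
        removed.append(oldest)
    return removed
```

Calling `prune_oldest(Path("/srv/backups/hot"))` after each daily run would keep the directory under budget; the path is a placeholder. Keeping at least one backup regardless of size is a design choice in this sketch, not something the issue specifies.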

Key decisions

  • Full copies only - corruption stays isolated to one backup instead of sharing fate across a dedup repo.
  • NAS pulls, server does not push - keeps the DR tier unwritable by a compromised or runaway server.
  • Verify on connect - write a sha256 beside each cold copy, re-check on mount, since unpowered NAND leaks charge (~1yr consumer retention spec).
  • Periodic rewrite - the cold job occasionally rewrites the full set while mounted to refresh aging NAND, not just detect rot.
  • Cold entrypoint - one script triggered by plugging in: detect mount, rsync latest full, hash and verify, log to vault inbox, print safe-to-unplug.
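The "verify on connect" decision could look roughly like this: write a checksum file beside each cold copy, then recompute and compare on the next mount. The `.sha256` sidecar suffix and the function names are assumptions for illustration; the sidecar line follows the `hash  filename` layout that `sha256sum` emits.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large backups never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(path: Path) -> Path:
    # Hypothetical naming: backup.tar -> backup.tar.sha256
    return path.with_name(path.name + ".sha256")


def write_sidecar(path: Path) -> Path:
    """Record the current hash beside the backup."""
    sidecar = sidecar_for(path)
    sidecar.write_text(f"{sha256_of(path)}  {path.name}\n")
    return sidecar


def verify_sidecar(path: Path) -> bool:
    """Recompute the hash and compare it with the recorded one."""
    expected = sidecar_for(path).read_text().split()[0]
    return sha256_of(path) == expected
```

On each monthly connect, running `verify_sidecar` over every existing copy before writing the new one would surface rot while an intact copy may still exist on a warmer tier.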

Tasks

  • Server daily full-backup job with space-bounded prune at 25GB
  • NAS pull job with ~3-month retention
  • ext4 format the retired SSD, label and date
  • Cold script: mount-detect, rsync latest full, sha256 write plus verify, vault-inbox log
  • Decide cold retention count (open question)
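A rough shape for the cold script task, under stated assumptions: the trigger mechanism (udev, systemd, or manual) is left out, backups are single files whose names sort chronologically, and the vault inbox is represented by a plain log file path passed in by the caller. Only the `rsync -a` invocation is a real CLI call; every function name and path here is hypothetical.

```python
import hashlib
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def latest_full(source_dir: Path) -> Path:
    """Return the newest backup, assuming filenames sort chronologically."""
    backups = sorted(p for p in source_dir.iterdir() if p.is_file())
    if not backups:
        raise FileNotFoundError(f"no backups in {source_dir}")
    return backups[-1]


def rsync_command(src: Path, dest_dir: Path) -> list[str]:
    # -a preserves metadata; the trailing slash copies the file into dest_dir.
    return ["rsync", "-a", str(src), f"{dest_dir}/"]


def file_sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cold_sync(source_dir: Path, mount_point: Path, log_file: Path) -> bool:
    """Detect mount, copy the latest full, hash and verify, log, report."""
    if not os.path.ismount(mount_point):
        print(f"{mount_point} is not mounted; nothing to do")
        return False
    src = latest_full(source_dir)
    subprocess.run(rsync_command(src, mount_point), check=True)
    copy = mount_point / src.name
    digest = file_sha256(copy)
    ok = digest == file_sha256(src)
    # Sidecar hash beside the cold copy, for re-checking on the next mount.
    copy.with_name(copy.name + ".sha256").write_text(f"{digest}  {copy.name}\n")
    stamp = datetime.now(timezone.utc).isoformat()
    with log_file.open("a") as fh:
        fh.write(f"{stamp} {src.name} {'ok' if ok else 'MISMATCH'}\n")
    print("safe to unplug" if ok else "verification failed; do not trust this copy")
    return ok
```

One caveat with this sketch: hashing the copy immediately after writing may read from the page cache rather than the SSD, so a real script would likely sync and remount (or otherwise drop caches) before verifying.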

Open question

Cold retention horizon. Rolling 12 monthlies is ~42GB (12 x ~3.5GB per full copy) and sets a one-year worst-case recovery window; the alternative is to keep adding monthlies until the disk is full. Disk size and recovery horizon both fall out of this choice.
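The trade-off is simple arithmetic, sketched here so the numbers can be rechecked as the per-copy size drifts. The ~3.5GB figure comes from this issue; the 500GB disk in the usage note is a made-up example, not the actual SSD size.

```python
FULL_COPY_GB = 3.5  # approximate size of one full copy, as quoted in this issue


def retention_footprint_gb(copies: int, copy_gb: float = FULL_COPY_GB) -> float:
    """Space needed to keep `copies` independent full backups."""
    return copies * copy_gb


def copies_that_fit(disk_gb: float, copy_gb: float = FULL_COPY_GB) -> int:
    """How many full copies fit if the policy is to fill the disk."""
    return int(disk_gb // copy_gb)


print(retention_footprint_gb(12))  # rolling 12 monthlies -> 42.0
print(copies_that_fit(500))        # hypothetical 500GB disk -> 142
```

The same helper reproduces the other tier budgets: 7 dailies come to 24.5GB against the ~25GB hot budget, and 90 dailies come to 315GB against the ~320GB warm budget.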

coilysiren added P4 and removed P3 labels 2026-05-31 07:00:43 +00:00
Reference
coilyco-flight-deck/infrastructure#98