icyleaf

How to Build a Home Lab: Hardware and Architecture

The previous introductory article covered what a home lab can do, in both hardware and software. In practice, everyone's home layout and network wiring are different, and budgets vary, so there are countless possible setups. I cannot offer a one-size-fits-all solution, but I will walk through how my device architecture evolved: what needs arose at each stage, what problems I ran into, and how I responded. At the end of the article, I also highlight some factors that should not be overlooked when building a home lab.

My Device Architecture#

To be clear, my device architecture is not the best possible solution; it is simply what currently meets my requirements, and it will keep changing as my needs and the technology change. My dream is to have a basement like the one in the picture below where I can tinker freely.

The dream of a homelab, inspired by @HCocoa@twitter

memo-devices-changes

When the house was renovated I kept things minimal, which left the network wiring poorly planned:

  • The TV wall, bedroom, and studio were pre-wired with Cat 6 cable, but each got only a single run.
  • The weak current box (the low-voltage wiring panel) was too small to expand, so in the early stages many devices piled up behind the TV.
  • The house is not large enough for a separate, well-ventilated cabinet.
  • There are Wi-Fi dead zones in the rooms.

After our family grew, the studio was converted into a children's room and some devices had to be relocated. I could only optimize within the existing structure. Below is a glimpse of that painful history.

Evolution of Device Architecture#

2014 ~ 2016#

v0 Device Architecture Topology Diagram

homelab-diagram-v0

During this period I changed jobs and had a long daily commute, so I didn't tinker much with the home network. The router was an old Netgear WGR614 carried over from my renting days. I bought a cheap first-generation LeEco smart TV, which met my basic needs, but the system became unresponsive in less than six months, and I have never considered an Android TV since. In the first two years I tried to save money: I knew Unicom was the right choice in Beijing, yet I foolishly went with Northern Telecom, whose speeds were terrible. I also picked up some unreliable junk components that are not worth mentioning.

2016 ~ 2018#

v0.1 Device Architecture Topology Diagram

homelab-diagram-v0.1

Thanks to work perks, I traveled more and more and needed extra storage for the growing pile of photos. In 2016, I learned about the HP MicroServer Gen8 from What Worth Buying and imported one from Germany. It has 4 drive bays, dual Gigabit Ethernet ports, and iLO management, but all I did was install Black Synology (Synology's DSM running on non-Synology hardware) on it, which made it my earliest NAS setup. DSM 5.x did not support Docker, so it served purely as a NAS for photos and videos, replacing Dropbox as my cloud drive. As the diagram shows, I had no idea what link aggregation was at the time; otherwise I would certainly have connected both network ports.

The router was upgraded to a Netgear 6300v2, flashed with KoolCenter's customized Merlin firmware for better internet access. At this point I was also forced to rely on NAT traversal (intranet penetration). After one Black Synology upgrade, the Synology QuickConnect service I had been using could not be fully cleaned up, so I abandoned it. Instead, I asked Unicom customer service for a public IP and kept the address updated through DDNS (the script was provided by Tangcu Nose).

The TV was replaced with a 60-inch Sony 4K internet TV, and I connected an export-model Amlogic S912 box running Kodi and YouTube to build a home audio-visual system.

The PS4 came bundled with GTA5 and The Last of Us in a Black Friday promotion on Amazon.

2018 ~ 2019#

v1 Device Architecture Topology Diagram

homelab-diagram-v1.0

In 2018, I complained on Twitter about how difficult it was to flash Merlin firmware. anbutu recommended OpenWrt and gave me an industrial PC with a 32-bit x86 N270 and dual network ports, which introduced me to a new field: soft routers.

Black Synology was upgraded to 6.x, which supports Docker, and I began running some basic services such as aria2, Home Assistant, and AdGuard Home. The neighborhood would occasionally lose power for a few minutes without warning. Once, Synology's health check reported a hard drive with many bad sectors, which scared me into immediately buying a Schneider APC BK650 that supports a communication protocol, so the machines could shut down safely after a power loss.

At the beginning of 2019, I started learning to shoot videos and bought a new Intel desktop to replace my aging MacBook Pro 2015, relearning how to build a Hackintosh after a 9-year gap. I did not buy a newer MacBook Pro mainly because Intel was underperforming and Apple had soldered all the hardware, so a top-spec model was not worth the money.

2020 ~ 2021#

v2 Device Architecture Topology Diagram

homelab-diagram-v2.0

At the end of 2020, I officially entered year one of my homelab; before that, I was only meeting storage needs with a NAS.

The soft router was upgraded to an E3845 four-port industrial PC (codename Larva), which handled only basic services: dial-up, DNS, ad blocking, DDNS, and so on.

With mysterious confidence, I chose a Netgear GS105 4-port unmanaged switch. Many people asked why I didn't pick an 8-port one; I did not foresee the changes to come and only considered that the weak current box could fit nothing bigger than a 4-port switch.

Application services were handled by the then newly popular Honey Badger Super Storage mining machine (codename Corruptor)[1]: 6 drive bays / dual Gigabit Ethernet ports / J1900 CPU / 8 GB memory / 64 GB mSATA. I mainly liked its chassis size and design, which looked much better than Snail Star and similar products. I researched and tried the following systems on it:

  • Debian: Running Docker services directly on it felt a bit unsatisfying.
  • OMV: The simplest NAS system, offering Docker and a Proxmox kernel, but every operation took a long time to apply, which was unacceptable.
  • Proxmox: Running a dozen or so lightweight systems and services was barely acceptable.
  • Harvester: Rancher's open-source hyper-converged infrastructure software, providing fully integrated storage and virtualization on top of k8s, but I never expected the J1900 to be unable to handle it...

You might wonder why I didn't make better use of the Gen8 server. The G1610T is too weak for ESXi virtualization, and upgrade CPUs such as the E3 1265L v2 are overpriced, so I have always treated it as a pure NAS. The four drive bays hold two RAID 1 pairs: one 3 TB group stores photos, and the other 3 TB group provides data storage for the Honey Badger Super Storage. I also added an SSD in the optical drive bay as a cache drive.

After some struggle, the stubborn J1900 could not take the load. I bought a 17x17 Douxi ITX motherboard on Xianyu, had a flex power supply customized, fitted a silent fan, and paired the board with an 8700es CPU, a single 32 GB memory stick from PDD, and a Cool Beast 256 GB M.2 SSD. I reused the Honey Badger Super Storage chassis, but it is narrow, so the only CPU cooler that fit was the Thermalright AXP90-X36, and the rear exhaust fan is a Cooler Master Vortex 80 silent fan.

I kept Proxmox as the system, set up backup and restore tooling[2], and ran VMs for jellyfin, portainer, vaultwarden, uptime kuma, traefik, and other monitoring, database, and application services. For storage, the Cool Beast SSD serves as the system disk and four idle hard drives form a btrfs RAID 10. The fifth drive is a 3 TB download disk for movies and TV shows, and the last one is for backups.

The hosts retired by these two hardware upgrades still found a use:

  • The J1900 board and CPU would have sold for only a few bucks, so I simply put them in a new MAXT mini case for testing experimental systems and services, under the new codename Deadpool.
  • The N270 soft router was sent to a friend in Wuhan who needed it during the early outbreak of the pandemic.

2022 ~ 2023#

This section reflects the setup for the whole of 2022; I will update it if anything changes later.

v2.1 Device Architecture Topology Diagram

homelab-diagram-v2.1

The original plan was to run smoothly for 1-2 years without major changes: keep experimenting on Proxmox, settle on several VMs running k8s/k3s clusters, and retire in peace. Two incidents disrupted that plan. First, after I upgraded to 64 GB of memory, a hard drive failed; fortunately, the array worked normally once the drive was removed from the RAID. Second, the CPU cooling fan stopped working and the whole all-in-one machine went down with it. While buying a new fan and improving the chassis airflow, I did some reflecting: I had always claimed to reject the all-in-one design, yet my main development machine was exactly that. The more services you run, the more you must ensure their availability, which means adding at least 1-2 new hosts, but there really is no more space at home. After reviewing material from China and abroad, I narrowed it down to a few options:

  • NEC M700: 6th-generation platform that can be modified to support 7th/8th/9th-generation CPUs.
  • Raspberry Pi 4B: There are many success stories, but the price is simply outrageous[3].
  • Rongpin king3399: The RK3399 performs well, but I missed the boat; the price skyrocketed, and 2 GB of memory is on the small side.

One day, while browsing recommendations on Xianyu, I found something better: the EAIDK 610, an RK3399 development board with 4 GB of memory, priced at just over 200. I bought two to try and, with help from anbutu and the community eaidk-610 armbian-build project, successfully compiled and flashed Armbian. Once k3s was running on them, I bought two more and formed a mixed amd64 and arm64 cluster together with the Proxmox VMs.

eaidk-610

While I was testing the cluster, the J1900 host finished its trial run of Nomad and gradually stabilized, so it could also form a mixed cluster with the Proxmox VMs.

For NAT traversal (intranet penetration), besides a public IP with port forwarding, there is also the Traefik Hub solution. As one of its earliest beta testers I received extra free quota; the free version now allows at most 5 public services.

While I was transferring data to new helium drives, the Gen8 server had a lot of video transcoding to do, which strained the CPU. I bought a 1230v2 for 150 on Xianyu to keep it going, though Synology needed modifications to recognize the new CPU correctly.

My Device Choices#

I have spent a long time on how my devices evolved, and the text and topology diagrams should give everyone a rough picture. For convenience, here is a summary list. When it comes to tinkering with hardware and software, I always think of Mingcheng brother.

For hardware and software, I borrow the approach of the ThoughtWorks Technology Radar and sort every solution into one of four stages: Evaluate, Experiment, Adopt, and Hold. The lists below therefore contain many solutions; those marked Adopt can be used with confidence.

Hardware#

Codename | Arch | Hardware | System | Stage | Quantity | Purpose
--- | --- | --- | --- | --- | --- | ---
Immortal | amd64 | 9700k/32G/6TB/6600xt | macOS, Windows | Adopt | 1 | Personal productivity tool
Queen | amd64 | HP Gen8 (1230v2/8G/40TB) | Synology | Adopt | 1 | NAS
Corruptor | amd64 | Honey Badger Super Storage (8700es/64G/10TB) | Proxmox | Adopt | 1 | Virtual development machine
Bunker | amd64 | J1900/8G/1TB | Debian | Adopt | 1 | Nomad cluster member
Larva | amd64 | E3845/2G/16GB/i211x4 | OpenWrt | Adopt | | Soft router
Splitter | arm64 | EAIDK610 (RK3399/4G/6+128GB) | Armbian | Experiment | 4 | k3s cluster
Colossus | arm64 | Apple TV 2021 (4K/32G) | Apple TV | Adopt | | New TV box
Lair | armv8 | H96 Pro+ (S912/4G/32GB) | Android TV | Adopt | | Backup TV box
Drone | arm64 | Orange Pi 3 LTS | Armbian | Evaluate | | Incomplete IP KVM

When a host fails and needs maintenance, you need a separate monitor, keyboard, and mouse. An IP KVM is a better solution, but existing options either support only the Raspberry Pi 4B or require two development boards. I flashed TinyPilot onto a Raspberry Pi 3B and an Orange Pi 3 LTS, but both could only capture the HDMI screen and could not emulate keyboard or mouse input.

I have always been weak in low-level and hardware aspects. I think when the apocalypse comes, those of us who write software services will be the first to go, haha.

Hard Drives#

Within my budget, I prioritize: M.2 SSD > SATA SSD > helium drives > non-SMR (non-shingled) drives. I also looked at other people's purchasing advice. My advice is to keep good records of your hard drives: purchase date, purchase channel, quantity, serial number, warranty period, warranty expiration date, and regular SMART data.
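As a rough sketch of the "regular SMART data" part, the function below saves one report per drive per day so that attributes can be compared over time. It assumes smartmontools is installed; the function name, the default log directory, and the file naming scheme are hypothetical and not taken from my actual setup.

```shell
# Sketch: keep a dated SMART report for each drive.
# record_smart, the default log directory and the file naming are illustrative assumptions.
record_smart() {
  local device="$1"                      # e.g. /dev/sda
  local log_dir="${2:-/var/log/smart}"   # hypothetical default location
  local out
  out="$log_dir/$(basename "$device")-$(date +%F).log"
  mkdir -p "$log_dir"
  # `smartctl -a` prints identity, health status and the attribute table.
  # smartctl's exit status is a bitmask, so a non-zero code is not treated as fatal here.
  smartctl -a "$device" > "$out" || true
  echo "$out"
}
```

Calling `record_smart /dev/sda` as root from a daily cron job or systemd timer would build up the history; how long to keep the files is up to you.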

The army of hard drives

harddisk-2022-full

Distribution of hard drives
harddisk-category

UPS Power Supply#

Even a brief power outage makes server hardware (especially hard drives) more likely to fail. To keep hardware and data safe, a UPS is essential, and one that supports a communication protocol should be the priority. A UPS keeps devices running on battery during an outage, but battery capacity is limited; if the server knows the UPS's current status, it can shut itself down safely before the battery runs out.

UPS Device | Coverage Area | Description
--- | --- | ---
APC BK650 | Proxmox + Black Synology + WIFI AP | Connects to Proxmox, which enables the NUT service; apcupsd data is exported to Prometheus
APC BK650 | Hackintosh + Armbian cluster + Nomad | Connects to Nomad, which enables the NUT service; apcupsd data is exported to Prometheus
Weak Current Box UPS | Optical modem + Soft router + Switch | The soft router receives notifications from the other two NUTs; four-port 12V with one spare

I have only used Schneider APC UPS units that support a communication protocol; they can generally be managed through apcupsd or NUT (most NAS systems, such as Synology and QNAP, support this). These services also let devices without a direct communication cable to the UPS receive notifications and shut down safely[4].
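To illustrate the kind of decision a networked client makes, here is a minimal sketch of the shutdown logic based on the status strings that NUT's `upsc` reports ("OL" for on line power, "OB" for on battery, "LB" for low battery). In practice NUT's own upsmon handles this; the function name and the 30% threshold are assumptions for illustration only.

```shell
# Sketch: decide whether to shut down from NUT status values.
# $1 = ups.status (e.g. "OL", "OL CHRG", "OB", "OB LB")
# $2 = battery.charge in percent
# $3 = minimum charge before shutting down (assumed default: 30)
should_shutdown() {
  local status=" $1 " charge="${2%%.*}" min="${3:-30}"
  case "$status" in
    *" OL "*) return 1 ;;   # on line power: keep running
  esac
  case "$status" in
    *" LB "*) return 0 ;;   # the UPS itself reports low battery
  esac
  case "$status" in
    *" OB "*)
      # on battery: shut down once the charge drops below the threshold
      if [ "$charge" -lt "$min" ]; then return 0; fi
      ;;
  esac
  return 1
}

# Example use (the UPS name "myups" and the host address are hypothetical):
#   status="$(upsc myups@192.168.1.10 ups.status)"
#   charge="$(upsc myups@192.168.1.10 battery.charge)"
#   should_shutdown "$status" "$charge" && systemctl poweroff
```

Keeping the decision in a small function makes it easy to check the thresholds before wiring it to a real shutdown command.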

Software#

Operating Systems#

Operating System | Stage | Description
--- | --- | ---
Proxmox | Adopt | A highly playable virtual machine system with automated management
Debian | Adopt | The base amd64 server OS I am personally most familiar with
Armbian | Adopt | ARM version of Debian, for the same reasons as above
OpenWrt | Adopt | A highly playable open-source soft router system
Talos | Experiment | A k8s-based distribution managed 100% through an API, supporting multiple deployment environments
Pi-hole | Experiment | A DNS management system that is very popular abroad, with a user-friendly interface
Rockstor | Evaluate | A NAS system based on openSUSE + btrfs, supports SMART and NUT; note that it is not compatible with the Asia/Beijing timezone
Kairos | Evaluate | A newly released containerized system; I am interested but have not yet run it successfully
TrueNAS | Evaluate | A NAS system developed on FreeBSD (ZFS preferred)
OMV | Hold | A highly complete NAS system, but personally I am not fond of it
PhotonOS | Hold | Optimized for VMware virtualization, but I dislike Red Hat-style systems
SmarterOS | Hold | A NAS system supporting virtualization and ZFS, but it relies heavily on memory

Powerful machines use Proxmox as the host, and the required services, container orchestration services, or containerized systems (Linux container OS[5]) run inside it on top of Debian or Armbian.

File Systems#

Type | Stage | Description
--- | --- | ---
btrfs | Adopt | Convenient disk management, supports snapshots and COW
ext4 | Adopt | The most reliable file system
zfs | Evaluate | Robust, reliable, scalable, and easy to manage, but consumes memory
xfs | Evaluate | Reportedly very fast, personally haven't researched much; the default file system of talos

My personal priority is btrfs > ext4 > zfs > xfs. Note that btrfs is currently not recommended for use with RAID 5/6, and I don't consider zfs because adding new disks after forming RAID is troublesome and costly. I don't know much about xfs, but those interested can check the benchmark tests of the above file systems in PostgreSQL.

Regarding btrfs, my view is that you only know by trying it yourself. Although Proxmox shipped btrfs only as a technology preview in version 7.0, I have used it for nearly two years without major issues; the one minor fault was caused by a poor-quality hard drive from Taobao with too many bad sectors. In RAID 10, which needs a minimum of 4 disks, btrfs kept working normally after I removed the bad disk (just run a balance after the removal), and I have not hit any other problems. The COW feature may slow down disk IO, but I can accept that.
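The removal and balance steps can be sketched roughly as follows. This is not the exact procedure I ran: the function name, the dry-run switch, and the example paths are assumptions, and it presumes the volume is mounted and still has enough devices left for its RAID profile after the removal.

```shell
# Sketch: drop a failing disk from a mounted btrfs volume, then rebalance.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
btrfs_drop_disk() {
  local dev="$1" mnt="$2" run=""
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; fi
  $run btrfs device stats "$mnt"           # per-device error counters
  $run btrfs device remove "$dev" "$mnt"   # moves data off the device, then drops it
  $run btrfs balance start "$mnt"          # spread chunks across the remaining disks
  $run btrfs filesystem usage "$mnt"       # check the resulting layout
}
```

For example, `btrfs_drop_disk /dev/sdX /mnt/pool` prints the four commands for review, and `DRY_RUN=0 btrfs_drop_disk /dev/sdX /mnt/pool` runs them as root. If the disk has already disappeared, the volume typically has to be mounted with `-o degraded` and the device removed as `missing` instead.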

For those interested in btrfs, I recommend watching @Houge's teaching video or the official beginner's tutorial video released by openSUSE. For those who have used or are familiar with btrfs, you can read more about the differences in snapshot implementation between btrfs and zfs, comparison between btrfs and xfs, Five Years of Btrfs and BTRFS Best Practices to be well-informed.

Storage Services#

Service | Stage | Description
--- | --- | ---
samba | Adopt | Highest compatibility and practicality, only recommended for manual file mounting
nfs | Adopt | Can serve as a minimum guarantee for data mounting
minio | Adopt | Open-source storage service compatible with S3 applications
juicefs | Experiment | S3 compatible and highly POSIX compliant open-source storage service
longhorn | Experiment | Simple and easy-to-use open-source block storage service, disk migration is very easy
rook ceph | Evaluate | A cloud-native storage service with great potential; not recommended for small clusters or weak CPUs
mayastor | Evaluate | Block storage service optimized for NVMe

Previously, my storage was mainly Samba, NFS, or even just AFP. I only started seriously experimenting with storage services in a production environment in 2022, and I am still a novice, especially with k8s-related storage.

Container Management and Orchestration Services#

Service | Stage | Description
--- | --- | ---
portainer | Adopt | A management service supporting multiple orchestration services like Docker/k3s/nomad
kubesphere | Hold | A user-friendly k8s front-end container management service for beginners and enterprises, overall a bit heavy
nomad | Adopt | An orchestration service with a low entry threshold but lacking teaching materials
k3s | Experiment | A lightweight k8s distribution highly optimized for edge computing and IoT scenarios
kubernetes | Evaluate | 100% authentic k8s, I dare not approach :D
docker swarm | Hold | An orchestration service that its maintainers have almost abandoned, not recommended

Portainer is a user-friendly container management tool that I still use today. K3s is the easiest way into the k8s world and is edge-friendly.

Gateways#

Service | Stage | Description
--- | --- | ---
traefik | Adopt | The best gateway service in my opinion
caddy | Adopt | A simple and easy-to-use gateway service supporting Let's Encrypt
nginx | Adopt | For managing multiple domains, consider nginx proxy manager

Although all three are marked Adopt, I mainly use the first two. Traefik is my top choice for gateways, and Caddy is the simple option. Both are powerful and easy to use, so I cannot think of a reason to use the third.

Automated Deployment#

Service | Stage | Description
--- | --- | ---
ansible | Adopt | A configuration tool for automated deployment without agents (using SSH)
terraform | Adopt | A tool for automating deployment of any service with an interface; Ansible is still the best choice
fluxcd | Experiment | The best tool for automatic configuration deployment for k8s in gitops
argocd | Experiment | Automatic configuration deployment for k8s in gitops with a visual topology
pulumi | Experiment | A terraform-like tool that supports configuration in multiple native programming languages; excellent architecture, user-friendly, but painful for plugin developers
salt | Hold | Has agents, initially launched to crush ansible; judging by what the market chose, it is not that great

As long as it involves operating systems, ansible + terraform is unbeatable! Fluxcd has no issues with configuring and deploying k8s services, but the entry threshold exists; it depends on whether you can get started. I recommend using it after familiarizing yourself with the basic concepts of k8s and having some practical deployment experience.

Factors Not to Be Overlooked#

A large portion of the article introduced my personal homelab device architecture evolution and hardware/software choices. What other easily overlooked factors are there?

If devices are the core buildings, these easily overlooked factors are the infrastructure. Both must be handled well for the homelab to run at full efficiency; no one wants performance below 100% or unexpected failures.

Network Cable Specifications#

Ensure that all homelab devices are connected to a wired network of gigabit or higher. Wi-Fi can be affected by surrounding channel interference, transmission attenuation, and other instability issues.

Different specifications of network cables and their speeds, sourced from IEEE 802 LMSC

cable-ethernet-data-rates

Specification | Type | Speed | Interface | Remarks
--- | --- | --- | --- | ---
Cat 5 | 100Base-T, 10Base-T | 100Mbps | RJ45 | Not recommended
Cat 5E | 100Base-T | 1000Mbps, 2.5Gbps | RJ45 | 2.5G networks limited to within 100 meters
Cat 6 | 100Base-T | 1Gbps, 10Gbps | RJ45 | 10G networks limited to within 50 meters
Cat 6A | 100Base-T | 10Gbps | RJ45 | 10G networks can reach 100 meters; there is no 6E standard
Cat 7 | 100Base-T | 10Gbps | GG45/TERA | Shielded
Fiber Optic | - | - | - | Not familiar, see Wikipedia for details

To reiterate: a gigabit or faster network is indispensable. CAT 5E is the minimum, and I strongly recommend CAT 6/6A. For those with deep pockets, CAT 7 or fiber optics are fine. If you are unsure about the state of your home network, here are two ways to check:

  1. Check the printing on the cable for the cable specification label.
  2. Use iperf3 on two connectable wired devices to act as server and client for testing.
# One device starts the server, assuming the server IP is 192.168.1.100
iperf3 -s

# Another device starts the client, connecting to the server for testing
iperf3 -c 192.168.1.100
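
To read the result, compare the reported bitrate with the nominal link speed. The helper below is a small sketch of that comparison; the function name and the 90% threshold (to allow for protocol overhead) are assumptions rather than anything defined by iperf3.

```shell
# Sketch: classify an iperf3 bitrate (in Mbits/sec) against common link speeds.
# The cut-offs are roughly 90% of each nominal speed, chosen here as an assumption.
link_tier() {
  local mbps="${1%%.*}"   # drop any decimal part
  if [ "$mbps" -ge 9000 ]; then echo "10G"
  elif [ "$mbps" -ge 2250 ]; then echo "2.5G"
  elif [ "$mbps" -ge 900 ]; then echo "1G"
  elif [ "$mbps" -ge 90 ]; then echo "100M"
  else echo "below 100M"
  fi
}
```

For instance, `link_tier 941` prints `1G`, while `link_tier 94.2` prints `100M`. A reading of that second kind between two gigabit devices usually suggests that a cable, connector, or port along the path has negotiated down to 100 Mbps.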

Noise and Heat Dissipation#

  • Hardware
    • The noise from mechanical hard drives during read/write (if you can afford it, go all SSD or wait for EDSFF E1/E3 drives to reach the consumer market).
    • The bearings, speed, and size of fans can also produce noise (CPU cooling, graphics card, chassis, power supply, etc.).
    • Motherboard DEBUG buzzers (some can be turned off or removed).
  • Software
    • For system stability, Synology by default writes a backup copy of the system to every disk. To reduce the noise, consider selectively removing it from some disks.
    • Linux systems can consider using lm-sensors for detection and configuration.
  • Space
    • The placement location determines noise tolerance and heat dissipation efficiency.
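
For the lm-sensors item above, a minimal sketch of reading temperatures: after running `sensors-detect` once, `sensors -u` prints raw values such as `temp1_input: 45.000`, which are easy to parse. The helper name `max_temp` is hypothetical.

```shell
# Sketch: print the highest temperature (in °C) found in `sensors -u` output on stdin.
max_temp() {
  awk '/temp[0-9]+_input:/ {
         v = $2 + 0                      # force numeric comparison
         if (!seen || v > max) { max = v; seen = 1 }
       }
       END { if (seen) printf "%.1f\n", max }'
}

# Example use:
#   sensors -u | max_temp
```

The value can then feed a fan-control script or an alert; which threshold counts as too hot depends on the hardware.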

Power Saving and Consumption#

The CPU's TDP figure is only a reference; overall consumption also depends on hard drives, memory, and graphics cards, as well as peak power. There isn't much to elaborate on here. Energy efficiency is worth considering but not obsessing over, especially when it means buying overpriced products just to cut TDP by 5-10 W, as I also mentioned in the summary of the previous article.
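
To put 5-10 W in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes the device runs 24/7; the function names and the electricity price used in the example are hypothetical.

```shell
# Sketch: yearly energy and cost of a constant load running 24/7.
# annual_kwh WATTS
annual_kwh() {
  awk -v w="$1" 'BEGIN { printf "%.1f\n", w * 24 * 365 / 1000 }'
}

# annual_cost WATTS PRICE_PER_KWH
annual_cost() {
  awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 * p }'
}
```

`annual_kwh 10` prints `87.6`, and at an assumed price of 0.5 per kWh, `annual_cost 10 0.5` prints `43.80`. If a lower-power part costs far more than a few years of that saving, the premium is hard to justify on electricity alone.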

Conclusion#

In the journey of homelab, the initial choice of hardware is not crucial. All-in-one setups are something every beginner will experience. Over time, just like houses need maintenance and cars require regular servicing, the stability of services and data security also need attention and maintenance.

You might say that everything has been stable for years with my all-in-one setup, but I can only say: those who have never experienced pain will never know what it means to suffer.

Footnotes#

  1. I remember seeing the launch in the Mining Community and later reading Awen Jun's article.

  2. The official provides backup and restore methods, and there are also backup scripts on GitHub Gist.

  3. Due to rising costs under the pretext of the pandemic, the Raspberry Pi 4B can sell for as high as 1200 in the domestic market, while my unused 3B sold for 600.

  4. Configuration tutorial and shutdown solutions 1, 2.

  5. For options regarding containerized OS, you can check Reimu's blog post.
