The previous introductory article covered the hardware and software possibilities of building a home lab. In practice, everyone's living space, network wiring, and budget are different, so there are countless valid solutions. I cannot offer a one-size-fits-all answer; instead, I will walk through how my device architecture evolved, what needs arose at each stage, the problems I ran into, and how I responded. At the end of the article, I will also highlight some factors that are easy to overlook when building a home lab.
My Device Architecture#
To be clear, my personal device architecture is not the best solution; it is simply what currently meets my needs, and it will keep being iterated as needs and technology change. My dream is to have a basement like the one in the picture below where I can tinker freely.
The dream of a homelab, inspired by @HCocoa@twitter
At the time, the house was renovated with minimal planning, which led to a poorly thought-out network wiring design:
- The TV wall, bedroom, and studio were pre-wired with Cat 6 cables, but each run is only a single cable.
- The weak-current (low-voltage) box was too small to expand, so in the early days many devices piled up behind the TV.
- The house is not large enough for a standalone, well-ventilated equipment cabinet.
- There are Wi-Fi dead zones in the rooms.
After adding family members, the original studio was converted into a children's room, and some devices were relocated. I could only optimize and improve based on the existing structure. Below is a glimpse of my painful history.
Evolution of Device Architecture#
2014 ~ 2016#
v0 Device Architecture Topology Diagram
During this period I changed jobs and my daily commute was long, so I didn't tinker much with the home network. The router was an old Netgear WGR614 carried over from my renting days. I bought a cheap first-generation LeEco smart TV, which just about met my basic needs, but the system became unresponsive in less than six months, and I have never considered an Android TV since. For the first two years I tried to save money: I knew I should use Unicom in Beijing, but I foolishly went with Northern Telecom, whose internet speed was terrible. I also picked up some unreliable junk components that I won't mention.
2016 ~ 2018#
v0.1 Device Architecture Topology Diagram
Thanks to work perks, I traveled more and more and needed extra storage for the growing pile of photos I was taking. In 2016 I learned about the HP MicroServer Gen8 on What Worth Buying and imported one from Germany. The server has 4 drive bays, dual Gigabit Ethernet ports, and iLO management, but I only installed a black Synology (DSM on non-Synology hardware) on it, making it the first NAS in my architecture. DSM 5.x did not support Docker yet, so it served purely as a NAS for photos and videos, replacing Dropbox as my cloud drive. As you can see from the diagram, I had no idea what link aggregation was at the time; otherwise I would definitely have connected the black Synology to both network ports.
The router was upgraded to a Netgear R6300v2, and I flashed KoolCenter's customized Merlin firmware for better internet access. Remote access was also forced on me around this time: after reinstalling the black Synology I could never fully clean up the old Synology QuickConnect binding, so I abandoned it. Instead I requested a public IP from Unicom customer service and kept it updated through DDNS (using a script provided by Tangcu Nose).
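The script itself came from Tangcu Nose, so I won't reproduce it here, but the general shape of any DDNS updater is roughly this sketch (the domain, the DNS server, and the provider update call are placeholders you would swap for your own):

```shell
#!/bin/sh
# Minimal DDNS idea: compare the current public IP against the DNS record
# and only call the DNS provider's update API when they differ.
DOMAIN="ddns.example.com"                                  # placeholder domain
CURRENT_IP=$(curl -s https://api.ipify.org)                # ask an external service for our public IP
RECORD_IP=$(dig +short "$DOMAIN" @223.5.5.5 | tail -n1)    # what the record currently resolves to

if [ -n "$CURRENT_IP" ] && [ "$CURRENT_IP" != "$RECORD_IP" ]; then
  echo "IP changed: ${RECORD_IP:-none} -> $CURRENT_IP, updating record..."
  # call your DNS provider's update API here (curl + API token, provider specific)
fi
```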
The TV was replaced with a 60-inch Sony 4K internet TV, and I added an Amlogic S912 export ("foreign trade") box running Kodi and YouTube, forming a home audio-visual setup.
The PS4 was bundled with GTA5 and The Last of Us during some Black Friday promotion on Amazon.
2018 ~ 2019#
v1 Device Architecture Topology Diagram
In 2018 I complained on Twitter about how painful it was to flash Merlin firmware, and anbutu recommended OpenWrt and gifted me an N270 x86 (32-bit) industrial mini PC with dual network ports, which introduced me to a new field: soft routing.
The black Synology was upgraded to DSM 6.x, which supports Docker, and I began running some basic services such as aria2, Home Assistant, and AdGuard Home. The neighborhood would occasionally lose power for a few minutes without warning, and once Synology's disk health check reported a large number of bad sectors on a drive. That scared me into immediately buying a Schneider APC BK650 UPS with a communication port so the machines could shut down safely after losing power.
At the beginning of 2019 I started learning to shoot video and bought a new Intel machine, relearning how to install a Hackintosh after a nine-year gap, to replace my aging MacBook Pro 2015. I never upgraded to a newer MacBook mainly because Intel was underperforming and Apple soldered all the hardware, so a top-spec configuration wasn't worth it.
2020 ~ 2021#
v2 Device Architecture Topology Diagram
At the end of 2020 I officially entered year one of my homelab; before that I had only been covering storage needs with NAS services.
The soft router was upgraded to a four-port E3845 industrial mini PC (codename Larva), which only handled basics such as PPPoE dial-up, DNS, ad blocking, and DDNS.
With misplaced confidence I chose a Netgear GS105 4-port unmanaged switch. Many people asked why I didn't go for an 8-port model; I simply didn't foresee the changes that would come later and only considered that the weak-current box could fit 4 ports at most.
Application services were handled by the then-popular Honey Badger Super Storage (codename Corruptor) mining machine1: 6 drive bays / dual Gigabit Ethernet / J1900 CPU / 8G RAM / 64G mSATA. I mainly liked its chassis size and design, which looked much better than the Snail Star and similar products. I researched and tried the following systems on it:
- Debian: Running Docker services directly felt a bit unsatisfactory.
- OMV: the simplest NAS system, offering Docker and a Proxmox kernel, but every settings change took ages to apply, which was unacceptable.
- Proxmox: running a dozen or so undemanding systems and services was just about acceptable.
- Harvester: Rancher's open-source hyper-converged infrastructure software with fully integrated storage and virtualization built on k8s; unsurprisingly, the J1900 couldn't handle it...
You might wonder why I didn't make better use of the Gen8 server. The reason is that the G1610T is too weak to support ESXi virtualization, and upgrade CPUs such as the E3-1265L v2 were selling at inflated prices, so I always treated it as a pure NAS. Across the four drive bays I set up two RAID 1 pairs: one 3T pair for photo storage, the other 3T pair for data storage shared with the Honey Badger Super Storage. I also added an SSD in the optical drive bay as a cache drive.
After some struggling, the stubborn J1900 couldn't take the pressure. I bought a 17x17cm Douxi ITX motherboard on Xianyu, ordered a custom flex power supply, modified a quiet fan, and paired it with an 8700ES CPU, a single 32G memory stick from PDD, and a Cool Beast 256G M.2 SSD. I reused the Honey Badger Super Storage chassis, but it is narrow, so the only CPU cooler that fit was the Thermalright AXP90-X36, with a Cooler Master Vortex 80 quiet fan as rear exhaust.
The system stayed on Proxmox, with backup and restore tooling2 set up, and then ran VMs for Jellyfin, Portainer, Vaultwarden, Uptime Kuma, Traefik, and other monitoring, database, and application services. For storage, the Cool Beast SSD served as the system disk, four idle hard drives formed a btrfs RAID 10, a fifth 3T drive held movie and TV downloads, and the last one was for backups.
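To give a concrete picture of that storage layout, here is a minimal sketch of how a four-disk btrfs RAID 10 is created and mounted; the device names are hypothetical and these are not my exact commands:

```shell
# Create a btrfs RAID 10 across four disks (both data and metadata as RAID 10).
# /dev/sd[b-e] are placeholder device names -- double-check yours first.
mkfs.btrfs -L datapool -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Mount it; naming any one member is enough, btrfs assembles the whole array.
mkdir -p /mnt/datapool
mount -o compress=zstd /dev/sdb /mnt/datapool

# Verify the layout and space usage.
btrfs filesystem usage /mnt/datapool
```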
The hosts eliminated through two hardware upgrades did not lose their value:
- The J1900 board and CPU would only fetch a few bucks secondhand, so I instead bought a new MATX mini case and kept it for testing experimental systems and services, under the new codename Deadpool.
- The N270 soft router was sent to a friend in Wuhan who needed it during the early outbreak of the pandemic.
2022 ~ 2023#
This actually reflects the version for the entire year of 2022, and I will update it if there are changes later.
v2.1 Device Architecture Topology Diagram
The original plan was to run smoothly for 1-2 years without major changes, just continuing to experiment on top of Proxmox and eventually settling on multiple VMs running k8s/k3s clusters, after which I could retire in peace. Two incidents disrupted that plan. First, after upgrading to 64G of memory, a hard drive failed; fortunately the array kept working normally once the drive was removed from the RAID. Second, the CPU cooling fan stopped working and took the whole All-in-One down with it. While sourcing a new fan and optimizing the chassis airflow, I did some soul-searching: I had clearly rejected the All-in-One design, yet my main development machine was exactly that. The more services you run, the more you have to care about availability, which meant adding at least 1-2 new hosts, but there really was no more space at home. After going through domestic and foreign material, I narrowed it down to a few options:
- NEC M700: 6th-gen Intel, moddable to support 7th/8th/9th-gen CPUs.
- Raspberry Pi 4B: There are many successful cases, but the price is simply outrageous3.
- Rongpin King3399: the RK3399 is reasonably powerful, but I missed the boat, the price skyrocketed, and 2G of memory is a bit small.
One day while browsing Xianyu recommendations I saw an even better deal: an EAIDK-610 RK3399 development board with 4G of memory for just over 200. I bought two to try and, with help from anbutu and the community's eaidk-610 armbian-build project, successfully compiled and flashed Armbian. After getting k3s running I bought two more, forming a mixed amd64 and arm64 cluster together with the Proxmox VMs.
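For reference, joining the boards into k3s is pleasantly simple; a minimal sketch (the server address and token are placeholders) under the assumption that the first server runs in a Proxmox VM:

```shell
# On the first node (e.g. an amd64 Proxmox VM), install the k3s server:
curl -sfL https://get.k3s.io | sh -

# Read the join token from the server:
sudo cat /var/lib/rancher/k3s/server/node-token

# On each EAIDK-610 board running Armbian, join as an agent
# (replace the IP and token with your own values):
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 K3S_TOKEN=<node-token> sh -

# Back on the server, both amd64 and arm64 nodes should appear:
sudo kubectl get nodes -o wide
```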
While I was testing that cluster, the J1900 host also finished a trial run of Nomad and gradually stabilized, so it too formed a mixed cluster with the Proxmox VMs.
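The Nomad side is similarly light; a minimal sketch of a client config (the server address is a placeholder, and this assumes a Nomad server already runs in a Proxmox VM):

```shell
# Sketch of /etc/nomad.d/client.hcl on the J1900 host
cat <<'EOF' | sudo tee /etc/nomad.d/client.hcl
data_dir = "/opt/nomad"

client {
  enabled = true
  # Address of the existing Nomad server (placeholder IP)
  servers = ["192.168.1.20:4647"]
}
EOF

# Run the agent with that config directory
sudo nomad agent -config /etc/nomad.d
```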
For remote access, besides public IP plus port forwarding, there was also the Traefik Hub option. As one of the earliest internal testers I got extra free quota; the free tier now caps out at 5 public services.
The Gen8 server had a lot of video transcoding to do while data was being migrated to new helium drives, which strained the CPU. I bought an E3-1230 v2 for 150 on Xianyu to keep it going, though Synology needed some tweaking to recognize the new CPU correctly.
My Device Choices#
I have spent a long time on the evolution of my devices, and the text and topology diagrams should give you a rough picture. For convenience, here is a summarized list. When it comes to hardware and software tinkering, I always think of brother Mingcheng.
For both hardware and software I borrowed the ThoughtWorks Technology Radar approach and place every solution into one of four stages: Evaluate, Experiment, Adopt, and Hold. The lists therefore include many solutions; the ones marked Adopt can be used with confidence.
Hardware#
Host Codename | System | Stage | Quantity | Purpose |
---|---|---|---|---|
Immortal, amd64 (9700K/32G/6TB/6600 XT) | macOS / Windows | Adopt | 1 | Personal productivity tool |
Queen, amd64, HP Gen8 (1230v2/8G/40TB) | Synology | Adopt | 1 | NAS |
Corruptor, amd64, Honey Badger Super Storage (8700ES/64G/10TB) | Proxmox | Adopt | 1 | Virtual development machine |
Bunker, amd64 (J1900/8G/1TB) | Debian | Adopt | 1 | Nomad cluster member |
Larva, amd64 (E3845/2G/16GB/i211x4) | OpenWrt | Adopt | | Soft router |
Splitter, arm64, EAIDK-610 (RK3399/4G/6+128GB) | Armbian | Experiment | 4 | k3s cluster |
Colossus, arm64, Apple TV 2021 (4K/32G) | Apple TV | Adopt | | New TV box |
Lair, armv8, H96 Pro+ (S912/4G/32GB) | Android TV | Adopt | | Backup TV box |
Drone, arm64, Orange Pi 3 LTS | Armbian | Evaluate | | Incomplete IP KVM |
When a host fails and needs maintenance, you normally have to drag over a separate monitor and keyboard/mouse; the nicer solution is an IP KVM. Existing solutions either only support the Raspberry Pi 4B or require two development boards. I once tried flashing TinyPilot onto a Raspberry Pi 3B and an Orange Pi 3 LTS, but only got HDMI screen capture working and could not emulate keyboard/mouse input.
I have always been weak in low-level and hardware aspects. I think when the apocalypse comes, those of us who write software services will be the first to go, haha.
Hard Drives#
Within my personal capability, my priority order is: M.2 SSD > SATA SSD > helium drives > non-SMR (CMR) drives. I also read everyone's purchase suggestions here. The main advice is to keep good records of your drives: purchase date, purchase channel, quantity, serial numbers, warranty period and expiry date, plus regular SMART data snapshots.
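By "regular SMART data recording" I mean nothing fancier than smartmontools plus cron; a minimal sketch (the device path and log location are up to you):

```shell
# Install smartmontools (Debian/Ubuntu), then inspect a drive.
sudo apt install -y smartmontools

# One-off health summary and full attribute table:
sudo smartctl -H /dev/sda
sudo smartctl -A /dev/sda

# Append a timestamped snapshot to a log file; run this from root's crontab, e.g. weekly.
{ echo "=== $(date -Iseconds) /dev/sda ==="; smartctl -a /dev/sda; } >> /var/log/smart-sda.log
```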
The army of hard drives
Distribution of hard drives
UPS Power Supply#
Even a brief power outage can easily damage server hardware, especially hard drives. To protect hardware and data, a UPS is essential, preferably one with a communication (data) port. A UPS can keep things running on battery during an outage, but battery capacity is limited; if the server knows the UPS's current status, it can shut itself down safely.
UPS Device | Coverage Area | Description |
---|---|---|
APC BK650 | Proxmox + Black Synology + Wi-Fi AP | Connects to Proxmox; runs the NUT service and feeds apcupsd data into Prometheus |
APC BK650 | Hackintosh + Armbian cluster + Nomad | Connects to the Nomad host; runs the NUT service and feeds apcupsd data into Prometheus |
Weak-current box UPS | Optical modem + Soft router + Switch | The soft router receives notifications from the other two NUT instances; four-port 12V output with one port spare |
I have only used Schneider APC UPSes with communication ports, which can be managed through apcupsd or NUT (most NAS systems such as Synology and QNAP support this). This also allows devices without a direct communication cable to receive notifications and perform a safe shutdown4.
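As a concrete example, the host that has the USB cable only needs a few apcupsd settings; this sketch shows the core of /etc/apcupsd/apcupsd.conf and how to read the values back (the thresholds are examples, not recommendations):

```shell
# Core of /etc/apcupsd/apcupsd.conf on the machine the UPS is plugged into
cat <<'EOF' | sudo tee /etc/apcupsd/apcupsd.conf
UPSCABLE usb
UPSTYPE usb
DEVICE
# Shut down when battery charge drops below 20%...
BATTERYLEVEL 20
# ...or when estimated runtime drops below 5 minutes.
MINUTES 5
EOF

sudo systemctl restart apcupsd

# Query the UPS status; these are the fields that end up in Prometheus.
apcaccess status
```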
Software#
Operating Systems#
Operating System | Stage | Description |
---|---|---|
Proxmox | Adopt | A highly playable virtual machine system with automated management |
Debian | Adopt | The personal most familiar basic amd64 OS for servers |
Armbian | Adopt | ARM version of Debian, for the same reasons as above |
OpenWrt | Adopt | A highly playable open-source soft router system |
Talos | Experiment | 100% API-managed distribution based on k8s supporting multiple deployment environments |
Pi-hole | Experiment | A very popular DNS management system abroad, user-friendly interface |
Rockstor | Evaluate | A NAS system based on openSUSE + btrfs; supports SMART and NUT. Note: not compatible with the Asia/Beijing timezone |
Kairos | Evaluate | A newly released containerized system, interested but not yet successfully run |
TrueNAS | Evaluate | A NAS system developed based on FreeBSD (ZFS preferred) |
OMV | Hold | A highly complete NAS system, but personally not fond of it |
PhotonOS | Hold | Optimized for VMware virtualization, but I dislike Red Hat-style systems |
SmarterOS | Hold | A NAS system supporting virtualization and ZFS but relies heavily on memory |
On the more powerful machines, Proxmox is the host, and Debian- or Armbian-based guests inside it run the required services, container orchestration/management services, or containerized systems (Linux container OS5).
File Systems#
Type | Stage | Description |
---|---|---|
btrfs | Adopt | Convenient disk management, supports snapshots and COW |
ext4 | Adopt | The most reliable file system |
zfs | Evaluate | Robust, reliable, scalable, and easy to manage but consumes memory |
xfs | Evaluate | Reportedly very fast; I haven't researched it much. It is Talos's default file system |
My personal priority order is btrfs > ext4 > zfs > xfs. Note that btrfs is currently not recommended for RAID 5/6, and I don't consider zfs because adding new disks after the pool is created is troublesome and costly. I don't know much about xfs, but those interested can check benchmarks of the above file systems under PostgreSQL.
Regarding btrfs, my view is that you only know once you have tried it yourself. Although Proxmox shipped btrfs only as a technology preview in 7.0, I have been using it for nearly two years without major issues, apart from one minor fault caused by a cheap Taobao drive with too many bad sectors. As long as a btrfs RAID 10 keeps its minimum of 4 disks, it works normally even after a bad disk is removed (just run a balance after the removal), and I haven't hit any other problems. The COW feature may slow disk IO somewhat, but I can accept that.
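For reference, this is roughly what dropping a failing disk from a btrfs RAID 10 and rebalancing looks like (device and mount names are placeholders; with only 4 members you would normally use `btrfs replace` instead of shrinking the array):

```shell
# If the bad disk still responds, remove it; btrfs relocates its data
# to the remaining members as part of the removal.
sudo btrfs device remove /dev/sdd /mnt/datapool

# If it is already dead, mount degraded and drop the missing member instead:
#   sudo mount -o degraded /dev/sdb /mnt/datapool
#   sudo btrfs device remove missing /mnt/datapool

# Rebalance so data and metadata are spread evenly across what is left.
sudo btrfs balance start /mnt/datapool

# Check the result.
sudo btrfs filesystem usage /mnt/datapool
```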
For those interested in btrfs, I recommend watching @Houge's teaching video or the official beginner's tutorial video released by openSUSE. For those who have used or are familiar with btrfs, you can read more about the differences in snapshot implementation between btrfs and zfs, comparison between btrfs and xfs, Five Years of Btrfs and BTRFS Best Practices to be well-informed.
Storage Services#
Service | Stage | Description |
---|---|---|
samba | Adopt | Highest compatibility and practicality, only recommended for manual file mounting |
nfs | Adopt | Can serve as a minimum guarantee for data mounting |
minio | Adopt | Open-source object storage service compatible with S3 applications |
juicefs | Experiment | S3 compatible and highly POSIX compliant open-source storage service |
longhorn | Experiment | Simple and easy-to-use open-source block storage service, disk migration is very easy |
rook ceph | Evaluate | A cloud-native storage service with great potential; not recommended for small clusters or weak CPUs |
mayastor | Evaluate | Block storage service optimized for NVME |
Previously my storage was mainly Samba, NFS, or even just AFP; I only started seriously experimenting with it in a production-like setting in 2022, especially k8s-related storage, where I am still a novice.
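For the "manual file mounting" I mentioned, the client side is a one-liner per protocol; a minimal sketch with placeholder addresses and share names:

```shell
# NFS: mount an exported directory from the NAS (placeholder address and path)
sudo mount -t nfs 192.168.1.10:/volume1/media /mnt/media

# Samba/CIFS: mount a share with credentials (requires the cifs-utils package)
sudo mount -t cifs //192.168.1.10/photo /mnt/photo \
  -o username=me,password=secret,vers=3.0

# Add a matching line to /etc/fstab to make either mount permanent.
```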
Container Management and Orchestration Services#
Service | Stage | Description |
---|---|---|
portainer | Adopt | A management service supporting multiple orchestration services like Docker/k3s/nomad |
kubesphere | Hold | A user-friendly k8s front-end container management service for beginners and enterprises, overall a bit heavy |
nomad | Adopt | An orchestration service with a low entry threshold but lacking teaching materials |
k3s | Experiment | A lightweight k8s distribution highly optimized for edge computing and IoT scenarios |
kubernetes | Evaluate | 100% authentic k8s, I dare not approach :D |
docker swarm | Hold | An orchestration service that upstream has all but abandoned; not recommended |
Portainer is a user-friendly container management tool that I still use today. K3s is also the easiest orchestration service to step into the k8s world and is edge-friendly.
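Getting Portainer CE onto any Docker host is essentially one command; this sketch follows the upstream instructions as I remember them, so check the official docs before copying:

```shell
# Persistent volume for Portainer's own data
docker volume create portainer_data

# Run Portainer CE, exposing the web UI on 9443 (HTTPS)
docker run -d \
  --name portainer \
  --restart=always \
  -p 9443:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest
```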
Gateways#
Service | Stage | Description |
---|---|---|
traefik | Adopt | The best gateway service in my opinion |
caddy | Adopt | A simple and easy-to-use gateway service supporting Let's Encrypt |
nginx | Adopt | For managing multiple domains, consider nginx proxy manager |
Although all three are marked Adopt, I mainly use the first two. Traefik is my top choice for a gateway, and Caddy is simple to use; both are powerful and easy to work with, and I can't think of a reason to reach for the third.
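As an illustration of why I call Caddy simple: a one-off reverse proxy is a single command, and the persistent version is a two-line Caddyfile (the domain and upstream port are placeholders; Caddy obtains the Let's Encrypt certificate by itself):

```shell
# Quick one-off reverse proxy with automatic HTTPS
caddy reverse-proxy --from photos.example.com --to 127.0.0.1:2342

# The same thing as a Caddyfile, for running as a service
cat <<'EOF' > Caddyfile
photos.example.com {
    reverse_proxy 127.0.0.1:2342
}
EOF
caddy run --config Caddyfile
```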
Automated Deployment#
Service | Stage | Description |
---|---|---|
ansible | Adopt | A configuration tool for automated deployment without agents (using SSH) |
terraform | Adopt | A tool for automating the deployment of any service that exposes an interface/API; for everything OS-level, Ansible is still the best choice |
fluxcd | Experiment | The best tool for automatic configuration deployment for k8s in gitops |
argocd | Experiment | Automatic configuration deployment for k8s in gitops with a visual topology |
pulumi | Experiment | A Terraform-style tool supporting configuration in multiple general-purpose languages; excellent architecture and user-friendly, but painful for plugin developers |
salt | Hold | Agent-based; launched with the ambition of crushing Ansible, but judging by market adoption it hasn't fared that well |
As long as operating systems are involved, ansible + terraform is unbeatable! Fluxcd handles configuring and deploying k8s services without issues, but there is an entry threshold; whether you can get over it is up to you. I recommend picking it up only after you are familiar with basic k8s concepts and have some hands-on deployment experience.
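A taste of the ansible side, since that is where I would point beginners first: a throwaway inventory, an ad-hoc connectivity check, and a tiny playbook (host names, addresses, and the package are placeholders):

```shell
# Minimal inventory listing the homelab hosts (placeholder names and IPs)
cat <<'EOF' > hosts.ini
[homelab]
bunker ansible_host=192.168.1.30
larva  ansible_host=192.168.1.1
EOF

# Ad-hoc: ping every host over SSH to confirm connectivity
ansible -i hosts.ini homelab -m ping

# A tiny playbook that makes sure smartmontools is installed everywhere
cat <<'EOF' > base.yml
- hosts: homelab
  become: true
  tasks:
    - name: Install smartmontools
      ansible.builtin.apt:
        name: smartmontools
        state: present
EOF
ansible-playbook -i hosts.ini base.yml
```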
Factors Not to Be Overlooked#
A large portion of the article introduced my personal homelab device architecture evolution and hardware/software choices. What other easily overlooked factors are there?
If we compare the devices to the core buildings, then these factors are the infrastructure. Both have to be handled well for the homelab to run at full efficiency; nobody wants less-than-expected performance or surprise failures.
Network Cable Specifications#
Make sure every homelab device is connected to a wired network of gigabit or faster. Wi-Fi is subject to channel interference from neighbors, transmission attenuation, and other sources of instability.
Different specifications of network cables and their speeds, sourced from IEEE 802 LMSC
Specification | Type | Speed | Interface | Remarks |
---|---|---|---|---|
Cat 5 | 10Base-T / 100Base-TX | 100Mbps | RJ45 | Not recommended |
Cat 5e | 1000Base-T / 2.5GBase-T | 1Gbps / 2.5Gbps | RJ45 | 2.5G limited to within 100 meters |
Cat 6 | 1000Base-T / 10GBase-T | 1Gbps / 10Gbps | RJ45 | 10G limited to within 50 meters |
Cat 6A | 10GBase-T | 10Gbps | RJ45 | 10G up to 100 meters; there is no "Cat 6E" standard |
Cat 7 | 10GBase-T | 10Gbps | GG45/TERA | Shielded |
Fiber optic | - | - | - | Not familiar; see Wikipedia for details |
To reiterate, a gigabit or faster network is indispensable: Cat 5e is the recommended minimum, and I strongly recommend Cat 6/6A. If you have deep pockets, Cat 7 or fiber is fine too. If you are unsure what is installed in your home, here are two ways to check:
- Check the printing on the cable for the cable specification label.
- Use iperf3 on two connectable wired devices to act as server and client for testing.
# One device starts the server, assuming the server IP is 192.168.1.100
iperf3 -s
# Another device starts the client, connecting to the server for testing
iperf3 -c 192.168.1.100
Noise and Heat Dissipation#
- Hardware
- The noise from mechanical hard drives during read/write (if you can afford it, go all SSD or wait for EDSFF E1/E3 cards for civilian use).
- The bearings, speed, and size of fans can also produce noise (CPU cooling, graphics card, chassis, power supply, etc.).
- Motherboard DEBUG buzzers (some can be turned off or removed).
- Software
- By default Synology writes its system partition to every disk for resilience; if the resulting disk noise bothers you, consider trimming this behavior as appropriate.
- On Linux systems, consider lm-sensors for detection and configuration (see the sketch after this list).
- Space
- The placement location determines noise tolerance and heat dissipation efficiency.
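A minimal lm-sensors run, for reference (Debian-style package names):

```shell
# Install and detect available sensor chips (answer the interactive prompts),
# then read temperatures and fan speeds.
sudo apt install -y lm-sensors
sudo sensors-detect
sensors

# Optional: watch values refresh every 2 seconds while tuning fans.
watch -n 2 sensors
```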
Power Saving and Consumption#
The CPU's idle TDP is only a reference; total consumption also has to account for hard drives, memory, and graphics cards, as well as peak power draw. There isn't much more to say about this: consider energy efficiency, but don't obsess over it, especially not by buying overpriced products just to shave 5-10 W off the TDP, which was also mentioned in the summary of the previous article.
Conclusion#
In the journey of homelab, the initial choice of hardware is not crucial. All-in-one setups are something every beginner will experience. Over time, just like houses need maintenance and cars require regular servicing, the stability of services and data security also need attention and maintenance.
You might say that everything has been stable for years with my all-in-one setup, but I can only say: those who have never experienced pain will never know what it means to suffer.
Footnotes#
1. I remember seeing the launch in the Mining Community and later reading Awen Jun's article.
2. The official documentation provides backup and restore methods, and there are also backup scripts on GitHub Gist.
3. Due to rising costs under the pretext of the pandemic, the Raspberry Pi 4B can sell for as high as 1200 in the domestic market, while my unused 3B sold for 600.
4. Configuration tutorial and shutdown solutions 1, 2.
5. For options regarding containerized OS, you can check Reimu's blog post.