Homelab

Homelab: 802.1x 2021

One technology I have played around with a little at work but wanted to get a better handle on is 802.1x. I have taken and passed the Cisco ISE cert a few years back and have used that with other services at work, but for the home setup I mostly wanted to be able to put different wireless devices onto different vlans based on device and user. Windows Server natively makes this possible with Network Policy Server (NPS).

An example of me playing with Network Policy Server

At the end of the day, NPS is a RADIUS server in Windows. It gives you conditions and a rules system to respond to different RADIUS calls, as well as a way to set up Accounting. It is fairly simple compared to something like ISE, which can also do Posture and Profiling for devices, but as a quick free solution it works well for home. You can say that if a client is attempting to authenticate over a system like wireless, then accept x methods, whereas if it’s wired or a switch login, then accept other forms. Instead of going point by point through how to set it up, which you can find elsewhere online, I want to give some high level edge cases you may run into. First, NPS needs Windows Server with the Desktop Experience; if you are running member servers or domain controllers as Server Core to simplify the environment, then it will not work. NPS also does not easily do HA. You can run multiple servers with it running, export the config from one, then import it into another, but there is not a good system for dynamically syncing these (unless you call random people’s PowerShell scripts a good system).
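To make the "conditions and rules" idea concrete, here is a minimal sketch of first-match policy evaluation in Python. It is not how NPS is implemented or configured; the policy names, AD group names, and VLAN numbers are made up for illustration. The NAS-Port-Type values (15 for Ethernet, 19 for Wireless-IEEE-802.11) come from RFC 2865, and the three Tunnel attributes are the ones RFC 3580 describes for dynamic VLAN assignment.

```python
from dataclasses import dataclass
from typing import Optional

# NAS-Port-Type values defined in RFC 2865
NAS_PORT_ETHERNET = 15
NAS_PORT_WIRELESS_80211 = 19


@dataclass(frozen=True)
class Request:
    nas_port_type: int
    auth_method: str  # e.g. "EAP-TLS" or "PEAP-MSCHAPv2"
    group: str        # hypothetical AD group of the user or machine


@dataclass(frozen=True)
class Policy:
    name: str
    nas_port_type: int
    allowed_methods: frozenset
    group: str
    vlan: int


# Hypothetical policies: names, groups and VLAN IDs are illustrative only.
POLICIES = [
    Policy("wifi-machines", NAS_PORT_WIRELESS_80211,
           frozenset({"EAP-TLS"}), "Domain Computers", 20),
    Policy("wifi-users", NAS_PORT_WIRELESS_80211,
           frozenset({"PEAP-MSCHAPv2"}), "Domain Users", 30),
    Policy("wired", NAS_PORT_ETHERNET,
           frozenset({"EAP-TLS", "PEAP-MSCHAPv2"}), "Domain Computers", 10),
]


def evaluate(req: Request) -> Optional[dict]:
    """Return reply attributes for the first matching policy, or None to reject."""
    for policy in POLICIES:
        if (req.nas_port_type == policy.nas_port_type
                and req.auth_method in policy.allowed_methods
                and req.group == policy.group):
            # RFC 3580 attributes that tell the switch or AP which VLAN to use.
            return {
                "policy": policy.name,
                "Tunnel-Type": 13,        # VLAN
                "Tunnel-Medium-Type": 6,  # IEEE 802
                "Tunnel-Private-Group-ID": str(policy.vlan),
            }
    return None
```

With these example policies, a domain machine doing cert based auth over wireless lands on VLAN 20, while a wireless request using a method no policy allows matches nothing and is rejected.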

One good reason to use NPS is the simple AD integration: you can have users use their domain auth and easily get access. Or do as I do, which is really too much for home, or possibly anywhere: set up a domain CA, have a GPO that creates certs for each machine, then use cert based auth via 802.1x deployed via GPO. If anyone has questions about this I am happy to answer, but there are many places online that will talk about each of those configs and how to do them. Another place to integrate RADIUS, other than 802.1x for wired and wireless, is network device login. I use RADIUS for the stack of Ruckus switches I have at home (2 is considered a stack, like when you run k3s as a “cluster” of 1).

This is one of those Windows services that works well, but also has not been touched in YEARS by Microsoft; like WSUS, or any other service that is useful. To back up this point, I installed several old versions of Windows Server that I had lying around in ESXi. Lesson 1 that I learned: the web console doesn’t work well with some of the legacy mouse support systems. Second, you may need the legacy VMware Tools ISO; see “VMware Tools support for Windows 2000, Windows XP, and Windows Server 2003” (81466). The internet seems to say NPS came out in Server 2008.

https://social.microsoft.com/Forums/getfile/51145/

Homelab: NAS 2021

One piece that sits at the heart of my Homelab is the NAS I have. This is actually the same NAS I wrote about years ago; looking back on that post brought back memories of the previous system and Server 2008 that I didn’t recall. In the last year I have added several drives and a new network card to this box. I thought that, as well as my experience running FreeNAS, now TrueNAS Core, for 8+ years, was worth discussing.

When I built out that box, I had 5x3TB drives, each around $125. Now those same drives are $40. The rough rule of thumb I was always told is 1GB of RAM for ZFS for every TB of storage you have. So I maxed the mini-ITX motherboard out at 16GB of RAM to get as close as I could. This let me run basic services, and I was running a few small VMs/Jails on the box. This did cut into the RAM I had available, but was a nice feature. It allowed me to run the Unifi controller without another system running. Back then, Raspberry Pis came with 256MB of RAM, making them not ideal for running too many services. I would later end up moving all of those to dedicated Raspberry Pis, then later to VM hosts.
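The rule of thumb is easy to check against the numbers above. This is just the arithmetic as a small Python sketch; the function name is made up, and the 1GB-per-TB ratio is the folk guideline quoted here, not a hard ZFS requirement.

```python
def zfs_ram_rule_of_thumb_gb(drive_count: int, drive_size_tb: float) -> float:
    """Suggested RAM in GB using the "1GB of RAM per TB of storage" rule of thumb."""
    gb_per_tb = 1.0
    return drive_count * drive_size_tb * gb_per_tb


# The original build: 5 drives of 3TB each.
suggested_gb = zfs_ram_rule_of_thumb_gb(5, 3.0)
installed_gb = 16
print(f"suggested {suggested_gb} GB, installed {installed_gb} GB")
```

For the original 5x3TB layout the rule suggests 15GB, so the 16GB board maximum just covers it before any VMs or Jails take their share.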

These 5 disks served me well for a while; every year or two I would have a drive die, and it got cheaper and cheaper to replace them. I use this NAS for backups from my Windows desktop and MacBook. Time Machine backups over the network from Macs work very well with TrueNAS. I ended up getting a smaller version of this box for my parents’ home, and my sister’s; you can run the OS off a USB drive with one or 2 small hard drives in a box like an Intel NUC, then have it always back up their PCs. Reminding people to “plug in that USB drive” to back up never seems to stick. TrueNAS offers one click updates, with optional automatic check-in; this makes keeping the system up to date easy.

There have been reports of recent corruption with 12.0, but I have not seen that. Also there was a bug where you could get a banner saying “THIS IS A TESTING RELEASE NOT FOR PRODUCTION” on a production branch, so that is fun. These days those backups and my Veeam backups are done to the NAS. I tried to use it as an iSCSI and then an NFS target, but the IO was a bit too much for these old spinning drives. Now I use vSAN, as mentioned, which has performed well for VMs; that leaves the NAS as just dumb storage for Veeam. Veeam is a good product that makes it very easy to back up VMs; I will probably write an article on it later. The software has a free 10 VM backup license for Homelabs.

In 2020 I was using a high percentage of the storage for backups and VMs, and was pondering upgrading. I didn’t want to throw down enough money to build a whole new system, and I liked this case a lot, so I started to look at what I could do to add to it. I was using 5 drives, but the case technically supports 7, with 2 being on the bottom. The issue was, I didn’t have enough SATA ports to add them to the system. This brings me to one of the scariest, worst, best cards I have bought. This card adds 4 ports through a mini PCI-E connection. It actually works really well, with the drives coming up like any other; it gives you 1 PCI-E lane at roughly 2.5Gb/s for my version. 2 of the now 7 drives I have in a RAIDZ2 (the ZFS equivalent of RAID 6) sit on this card, and for over a year it has worked well. The one other thing I added to the box was a 10Gb networking card; I did a push a bit ago to move most of the Homelab server stuff to 10Gb, and this box was part of that. TrueNAS is built on FreeBSD and has good hardware compatibility; I got an old Intel X520 for compatibility and ease. I have seen it get near 5Gb/s, averaging closer to 2Gb/s with writes.
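For a rough feel of what 7 drives in RAIDZ2 gives you, here is a small Python sketch of the capacity arithmetic. It assumes all 7 drives are 3TB (the post only states that for the original 5) and ignores ZFS metadata, padding, and reserved space, so real usable space will be lower. The function names are made up for this example.

```python
def raidz_usable_tb(drive_count: int, drive_size_tb: float, parity: int) -> float:
    """Approximate usable capacity of a single RAIDZ vdev, before ZFS overhead."""
    if drive_count <= parity:
        raise ValueError("a RAIDZ vdev needs more drives than parity disks")
    return (drive_count - parity) * drive_size_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raidz_usable_tb(drive_count=7, drive_size_tb=3.0, parity=2)
print(f"{usable_tb} TB usable, about {tb_to_tib(usable_tb):.2f} TiB as most tools report it")
```

RAIDZ2 spends two drives' worth of space on parity, so 7 drives leave the capacity of 5. The TB to TiB conversion is included because drive vendors quote decimal units while storage tools usually report binary ones.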

First of all, yes the card is at a slight angle, but it works fine and is secure, so we will ignore that. I also used this time to upgrade the CPU. If you look for 7 year old CPUs on eBay, they are actually not that much money. I went from the Celeron from when I bought the system to an i5-4590. With this new CPU (and breaking a leg on the stock cooler) I ordered a new CPU cooler. That turned into an issue because they sent me the wrong version, with an AMD mount instead of the Intel one. You can see the very, very tiny clearance that the CPU cooler has to the chipset heatsink. I also had this system in the office, since adding disks to an existing RAIDZ vdev in ZFS means you need to destroy the pool and rebuild it. I had to move all the data off to another system, destroy the array, then move it all back. Dynamically adding disks is a dream ZFS has always had and is always around the corner. Hopefully with OpenZFS 2.0, and the merging of the Linux and FreeBSD code bases, we will get shiny new features like that.

Overall the system has worked well for the last 8 or so years; I still have 4TB free, which is about 30%. I could probably clean it out more if I tried. I also have been using OneDrive to back up critical things like family photos, which slightly lowers my need for the system. The homelab AD has all the machines automount a chunk of storage as a shared drive, which makes normal home things and transferring files easier. I will continue to run this, and see how vSAN works for me going forward. I am a bit wary of vSAN running into issues on the consumer level gear I have, so having a whole backup of my VMs on the NAS gives me some peace of mind.

The years of using FreeNAS/TrueNAS were a good jumping off point as we recently got new Netapp Appliances at work, and I was tasked with learning them. Netapp ONTAP uses very similar concepts; instead of zVol you have FlexVol, instead of Datasets you have FlexGroups. Netapp also does some weird things like using Raid-4 or Raid-4 with added protection, instead of a traditional Raid-5/Raid-Z. If you work for a company that has a Netapp and want to learn more about it, I would push you to get the Netapp Simulator. It is a VM image that contains a virtual Netapp to play around with. It’s much better to break a virtual Netapp than a production one.

Homelab: Hypervisors – Part 2 – VMware

What I want to say is that, after deciding it was time to move to VMware and attempt to use vSAN instead of Storage Spaces Direct (S2D), I researched the hardware I had to see if it would work on ESXi 7.0. But of course I did not thoroughly read all of the changes vSphere 7.0 has brought. The holiday was approaching and I was going to use this time to do my migration. I had read up on vSAN and knew I needed cache drives. I bought a few small (250GB) NVMe drives to put into each system. Getting those drives installed took a day because I needed to create a custom 3D printed mount. The drives would give me a good speed boost for my storage no matter what. Having recently upgraded to 10Gb networking, I already had HP and SolarFlare 10Gb networking cards. The time came and I copied all of the VMs I had in Microsoft VHDX format to my NAS (which wasn’t getting changed), then unplugged the first Hypervisor, and attempted an ESXi 7.0 install.

One hardware change I should note: I am using USB 3.0 128GB thumb drives for the ESXi OS. This also allowed me to leave the original Windows drive untouched, allowing for easy rollback if this was a nightmare. I put the ESXi 7.0 disk into the first system AND! Error, no networking card found… I started searching online and quickly found a lot of people pointing to this article. ESXi 7.0 had cut a ton of network driver support; everything from the Realtek motherboard NIC to the 10Gb SolarFlare card would not be supported, with no way around it (I tried). It comes down to this: 6.x had a compatibility layer in it where Linux drivers could be used if there were no native drivers, and 7.0 removes it. I then got an ESXi 6.7 installer (VMware doesn’t allow you to just download older versions on a random account, but Dell still hosts their version) and installed that. Everything came online and started working. Now that I knew that was the one thing blocking me, I installed all my systems with 6.7 while I waited for the 3 new Supermicro AOC-STGN-i2S Rev 2.0 Intel 82599 2-Port 10GbE SFP+ cards I ordered. Using the Intel 82599 chipset, they have wide support. 2 ports is nice, and the 2.0 revision of the card is compact, allowing them to fit into my cases. So far I recommend them; they also are around $50 on eBay, which is not bad.

I played with a few of the systems, but decided to wait until the new network cards were in a few days later to initialize vSAN and copy all of the data back over. I used this guide, from the same author as the other post about ESXi 7.0 changes, to configure the disks in the system how I wanted them. At one point I thought I was stuck, but I just had to have VMware rescan the drives. I set up a vCenter Server appliance (the vSphere management appliance) on one of the hosts. This gives me all the cluster functionality, and a single webpage to manage all the hosts. Here I can also create a “Distributed Switch”, which is a virtual switch template that can be applied to each of the hosts. I can set the vlans I have, and how I want them to work, in one place, then deploy it to all the systems easily. This works as long as all your hosts have identical network configurations. After watching a YouTube video or two on vSAN setup I went ahead with setting that up. The setup was straightforward, the drives reported healthy, and now I was ready to put some data on it.

A small flag about vSAN: it uses a lot of RAM to manage itself and track which system has what. I was seeing about 10-12GB of RAM used on each of my hosts, which have 32GB to begin with. There are guides online for this, and I believe it can be tweaked. It has to do with how large a cache drive you have, and your total storage. Not a big deal, but if you are running a full cluster, it is something to be aware of.

Migrating the old VMs from their Hyper-V disk images to VMware was not too difficult. I used qemu-img to convert from VHDX to VMDK. The VMDK images that qemu creates are the desktop version of the VMDK format; VMware’s desktop products create slightly different disk images than the server versions. I then uploaded these VMDKs onto the vSAN and used the built-in vmkfstools in the ESXi Shell to convert those images to the server versions. The Windows systems noticed the changes, did a hardware reset, and worked right away. The Linux systems (mostly CentOS 8) would not boot under any of the SCSI controllers VMware had. After reading online, and a bit of guessing, I booted them with the IDE controller, which appeared to be the only one the dracut-built initrd had modules for. Once the systems were online I could do updates, and with the new kernel version they had available they made new initrd images. These images, being created on the platform with the new virtual hardware, included the SCSI controller modules, and the VMs could then be changed from IDE to SCSI mode.
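The two conversion steps look roughly like the commands this Python sketch builds. It only constructs and prints the command lines, it does not run anything. The file names and datastore path are hypothetical, and the "thin" disk format is an assumption, since the post does not say which target format was used. The qemu-img step runs wherever the VHDX lives, and the vmkfstools step runs in the ESXi Shell.

```python
import shlex
from typing import List


def qemu_img_convert_cmd(src_vhdx: str, dst_vmdk: str) -> List[str]:
    """Step 1: convert a Hyper-V VHDX into a (desktop-style) VMDK with qemu-img."""
    return ["qemu-img", "convert", "-f", "vhdx", "-O", "vmdk", src_vhdx, dst_vmdk]


def vmkfstools_clone_cmd(src_vmdk: str, dst_vmdk: str, disk_format: str = "thin") -> List[str]:
    """Step 2: on the ESXi host, clone the uploaded VMDK into an ESXi-native disk."""
    return ["vmkfstools", "-i", src_vmdk, dst_vmdk, "-d", disk_format]


# Hypothetical paths, for illustration only.
step1 = qemu_img_convert_cmd("web01.vhdx", "web01-desktop.vmdk")
step2 = vmkfstools_clone_cmd(
    "/vmfs/volumes/vsanDatastore/web01/web01-desktop.vmdk",
    "/vmfs/volumes/vsanDatastore/web01/web01.vmdk",
)

print(shlex.join(step1))
print(shlex.join(step2))
```

Keeping the commands as argument lists makes it easy to loop over a folder of VHDX files and review exactly what would run before handing each list to something like subprocess.run.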

So far other than the hardware changes that needed to happen, moving to VMware has worked out well. I am using a VMware Users Group license, https://www.vmug.com/, which is perfect for homelabs, and doesn’t break the bank. I am starting to experiment with some of the newer or just more advanced VMware features that I have not used before. We spoke of vSAN, I also have setup DRS (Distributed Resource Scheduler, allowing for VMs to move between hosts as resources are needed), and want to setup a key manager server to play with VM encryption and virtual TPMs.

Now that I am off of that… unsupported… Storage Spaces Direct configuration, updates are much easier. I can put a host into maintenance mode, which moves any running VMs, then reboot it, and once it’s back online, things reshuffle. This does mean I need enough space on the cluster for 1/3 of it to be off at a time, but that is ok. I am running 32GB of RAM, with 2 empty DIMM slots in each system; when the time comes I can inexpensively add more RAM.
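To put a number on "enough space for 1/3 of it to be off", here is a back-of-the-envelope RAM calculation in Python. It assumes 3 hosts (implied by the 1/3), 32GB per host, and the upper end of the 10-12GB vSAN overhead mentioned earlier; it ignores ESXi's own footprint and per-VM overhead, so treat the results as rough ceilings. The function name is made up.

```python
def ram_for_vms_gb(hosts: int, ram_per_host_gb: int,
                   overhead_per_host_gb: int, hosts_down: int = 0) -> int:
    """RAM left for VMs across the cluster after per-host overhead."""
    active_hosts = hosts - hosts_down
    if active_hosts < 0:
        raise ValueError("cannot take down more hosts than the cluster has")
    return active_hosts * (ram_per_host_gb - overhead_per_host_gb)


all_hosts_up = ram_for_vms_gb(hosts=3, ram_per_host_gb=32, overhead_per_host_gb=12)
one_in_maintenance = ram_for_vms_gb(hosts=3, ram_per_host_gb=32,
                                    overhead_per_host_gb=12, hosts_down=1)
print(all_hosts_up, one_in_maintenance)
```

Under those assumptions the cluster has about 60GB for VMs with every host up and about 40GB with one host in maintenance mode, so the running VMs need to fit in the smaller number for patching to stay painless.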

If you/your work has a NetApp subscription, there is a NetApp Simulator which is a cool OVA you can deploy on VMware to learn NetApp related things. I was using that at work to learn how to do day to day management of NetApps. Another neat VM image that comes in the form of OVA I found recently is Nextcloud’s appliance. They have a single OVA that has a great flow for taking you through configuring their product.

Overall the VMware setup has been as easy as I thought it could be. Coming from a workplace that runs its management systems without a lot of access, it has been nice having vSphere 7.0. It automatically checks in online, and lets me know when there are updates for different parts of the system.

Homelab: Hypervisors – Part 1 – Hyper-V

For the last year I have been running Microsoft Hyper-V on Server 2019. Due to mounting issues I moved over to VMware vSphere; this first post will discuss my Hyper-V setup and feedback about it, then the next post will speak about my migration and new setup. When I started building out my home setup I was studying to take a Windows Server certification for work; with that, and about half of the virtual machines I had at home being Windows, Hyper-V was the choice for hypervisor. One feature that stood out to me was Dynamic Memory on Hyper-V, because my home setup was not that large, as well as the automatic virtual machine activation (Microsoft Doc). Later, I was attempting to run Storage Spaces Direct (S2D; SSD would be a confusing acronym, so Storage Spaces Direct goes by S2D), except my setup was not supported, which made me run a… not recommended configuration… more on that soon; and I kept having issues around the Hyper-V management tools. I decided it was time to migrate from Hyper-V and S2D to VMware vSphere and vSAN.

(Please note for feedback I am discussing Windows Server 2019 here, and VMware vSphere 7.0)

Selecting a Hypervisor

I wanted to briefly go over a bit more of my thought process when selecting a Hypervisor. I already mentioned some of the reasons it made sense, but I also wanted to mention more of my thoughts. I started the search for a Type-1 Hypervisor a while ago; I was going to be running on an Intel NUC with a few Windows and Linux VMs. Being a homelab, I thought I would look at free options.

Having used Proxmox years ago with a ton of issues, I wanted to steer clear of it (looking back this probably was not fair; it was several years since I last used it and I believe it has gotten better). I had also used Citrix Hypervisor (formerly XenServer) with many issues, including a storage array killing itself randomly in one reboot. One of the requirements I gave myself was to have a real management system; I did not want to run KVM on random Linux hosts. That brought me to the 2 big ones, VMware and Microsoft. VMware has a lot of licensing around different features, but was the system I knew better. I could get a VMware Users Group membership for homelabs, and that would take care of the licensing. On the other hand, with me studying for Windows Server tests, and the book speaking of different Windows Server and Hyper-V features, I thought I would give Hyper-V a try. The following are things I liked about it, and then what turned me away from it.

Great Things About Hyper-V

I want to give a fair overview of my year plus running Hyper-V. There are some great features; dynamic memory allows you to run modern OSes with an upper and lower limit on memory, and then most of the time, while the VMs are idling, your memory footprint is very low. Another great feature is the earlier mentioned automatic activation: as long as your Hyper-V host (Windows Server 2019 Standard or Datacenter, not the free Hyper-V Server) is activated, it can pass that activation to your guests and allow you to run Server 2012+. All of the services are running on Windows, so out of the box you get all the benefits there, such as creating group policies for your servers and using that to do a lot of your fleet management. I recently have started using Windows Admin Center, which gives you a single view on all your Windows systems and allows you to update them all in one place. Hyper-V works well if you have a single node and want to do basic things with it; when you move to doing clustering and advanced storage, Hyper-V starts to give you a lot of issues.

Hyper-V Manager on Server 2016 (and 2019)

General Hyper-V Issues

To dive more into the woes I was having with Hyper-V, some of it is my own doing, some of it is the tools. Even before I was running S2D, I was running several Hyper-V boxes each with its own storage. I will go into my issues with S2D soon. Hyper-V’s management tools are not good. You have several options on how you will manage the systems, the first and easiest is Hyper-V Manager. This is a simple program that allows you to 1-1 manage a Hyper-V system. I mean 1-1 because if you have VMs that are part of a failover cluster, you can connect to them here to view them but that is it. Hyper-V Manager only allows you to manage VMs that live on one hypervisor with no redundancy; for casual use, it works. I use it for my primary AD host because I don’t want anything fancy going on with that box, when I need to start everything from scratch, I need AD and DNS to come up cleanly.

Maybe you have outgrown the one off server management and want to move your systems into a cluster. Now it’s time for Failover Cluster Manager. You add all the servers into a Failover Cluster together, and get through the checks you have to pass. Then there is a wizard to migrate your VMs from Hyper-V Manager into Failover Cluster Manager. One requirement to do this is to have storage that every box in the cluster can use, either S2D or iSCSI (you can do things like Fibre Channel, but I was not going to do that). I used the tool and the VM said all its files were moved onto shared iSCSI storage that all the machines could use. Should be good, right? Things seem to be working. Then I would move certain VMs to other hosts and it would fail, just for some of them. It came down to either an ISO, or one of the HDD hibernation files, or checkpoints (Microsoft’s version of Snapshots) being on one of the hosts, and the UI NOT mentioning this. Thus, when the VM tried to load on another system, a file it needed was not there and it could not load. Failover Cluster Manager is also fairly simplistic and does not give you a ton of tools. Again, Windows Admin Center adds some nice info on a standard cluster, but it is not fantastic, leaving you to dig through PowerShell to try to manage your Failover Cluster.

On occasion the Hyper-V Virtual Machine Management service, which is in charge of managing the VMs and provides the interface to monitor, modify, and access them, would lock up. Hyper-V Manager and Failover Cluster Manager would show no status for the VMs, and I would have to restart the service. These minor issues would stack up over time.

To manage Hyper-V remotely (meaning from any other system) you need to set up Windows Remote Management, winrm. This system by default uses unencrypted HTTP. Encryption can be turned on with a few commands in the command line, but it creates a cert based on your hostname and IP address. If you have more than one IP, OR you are in a Failover Cluster, this means you will be spending a lot of time customizing these certificates, because it will just get a cert for your host, and when that node becomes the Failover Cluster manager, it needs that virtual IP and hostname in the cert. I had to create different certs for that virtual interface and put them on the different nodes manually; there are people in the Microsoft support forum talking about this. In case it helps anyone, here is an example of creating a cluster listener after manually creating a cluster cert.

winrm create winrm/config/listener?Address=IP:192.168.3.8+Transport=HTTPS '@{Hostname="home-cluster.home.ntbl.co";CertificateThumbprint="BFCDE6C85A0B12426A44BC3F44236313317C63CC";ListeningOn="192.168.3.8"}'

There is also System Center Virtual Machine Manager, another package you can purchase from Microsoft to manage Hyper-V. Having dealt with System Center to manage Windows systems at work, I did not want to touch that at all. Hyper-V has a lot of things going for it, and the underlying code running VMs works well 99% of the time. I wish Microsoft put more time into growing the tools you use for managing it. Parts of the process, like setting up networking on different nodes, could be much smoother in comparison with VMware Distributed Switching. I installed one of the systems I had on Windows Server Core (no user GUI) to learn more about that. If your primary interface needs a VLAN for management, this is a painful experience. You have to create the Hyper-V virtual switch, attach your management interface to it, and assign the VLAN, all from within PowerShell. If you need to do it, this is a good resource. Things like this, and the winrm issues, make Hyper-V feel unpolished even after being in the market for years.

S2D Issues

I wanted to put these systems into a failover cluster, allowing them to move VMs between each other as needed, except then I needed shared storage. I attempted to use iSCSI from my FreeNAS box; alas, with 7 old spinning drives, the speed was not great with more than a few VMs. Then I thought, I had some spare SATA SSDs and I could use S2D to do shared storage. For those who have not attempted to set up S2D, your drives have to be either NVMe or attached to an internal HDD controller. The system will refuse to work on any configuration that it does not like. With most of my systems being small form factor PCs, and since I am just using a few SATA drives, I got USB 3.1 4 bay SATA enclosures. Not optimal, but it gives decent speed and allows me to add a good number of drives to each system without the large expense of a full RAID or SAS controller.

S2D refused to work with these drives. I believe it came down to the controller the USB enclosures were using and it not signaling something the systems wanted. The drives showing up as Removable also made Windows refuse. There are commands you can run, like the one below, that will enable more disk types to work, but I could not get my drives to show up.

(Get-Cluster).S2DBusTypes=4294967295
Powershell command to enable all disk types in Storage Spaces Direct from this article
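That magic number is less mysterious in hex. The short Python check below shows it is simply 32 one-bits, which is why setting it reads as "allow everything". The idea that each bit stands for one bus type is an assumption based on the value itself; which bit maps to which bus is not covered here.

```python
ALL_BUS_TYPES = 4294967295  # the value used in the command above

# The same number written as hex and as a power of two.
print(hex(ALL_BUS_TYPES))
print(ALL_BUS_TYPES == 2**32 - 1)


def is_bit_set(mask: int, bit: int) -> bool:
    """True if the given bit position is enabled in the mask."""
    return (mask >> bit) & 1 == 1


# Every one of the 32 bit positions is enabled.
print(all(is_bit_set(ALL_BUS_TYPES, bit) for bit in range(32)))
```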

Then I had an idea, an evil and terrible and great idea. I created 3 VMs, one on each Hyper-V box, then gave the 3 disks of each server to the VM in full. Now I had 3 VMs, each with 3 drives (and a separate OS drive) to run S2D. To the VM OS, it looked like it had 3 SCSI HDDs, which it was happy to use for S2D. I put these three Windows Servers into a failover cluster together, and set up S2D. Overall setup was not too bad. If you have Windows Admin Center configured, it is much easier to set up and use Storage Spaces Direct than with the GUI in Windows Server. There are a ton of PowerShell commands for configuring S2D and you will probably end up using a bunch of them.

This worked! The systems were in a failover cluster of their own, and my main failover cluster that controlled the VMs could use it as shared storage. If you use Windows Admin Center you can get nice stats from the storage cluster about the sync status of the disks. Every time one of the storage nodes reboots, the cluster needs to re-sync itself. There are different RAID levels you can set the S2D setup to; I set it to have 2 additional copies of each set of data, which means each node has a full copy of the data. This uses a lot of space, but I can have 1 node run everything (which ended up being overkill).
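The space cost of keeping 2 additional copies is easy to see with a little arithmetic. In this Python sketch the node and drive counts match the setup described (3 nodes, 3 data drives each), but the 1TB drive size is a placeholder, since the post does not give the SSD sizes, and real S2D pools also reserve some capacity that is ignored here.

```python
def mirrored_usable_tb(nodes: int, drives_per_node: int,
                       drive_size_tb: float, copies: int) -> float:
    """Usable capacity when every block is stored `copies` times across the pool."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    raw_tb = nodes * drives_per_node * drive_size_tb
    return raw_tb / copies


# 1 original + 2 additional copies = 3 copies, one per node.
usable = mirrored_usable_tb(nodes=3, drives_per_node=3, drive_size_tb=1.0, copies=3)
print(usable)
```

With three copies, only a third of the raw capacity is usable, which is the "uses a lot of space" tradeoff; in exchange any single node holds everything.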

This setup ran for a while decently, other than the small VM overhead, it was fast and worked. The issues arose when the second Tuesday of the month came around and I needed to do patching. The storage network was sitting on top of the Hypervisors, and they didn’t really understand that. I often ran into problems where I would shutdown one of the storage nodes to patch it, and patch the host, then the other 2 nodes would lock up or say all storage was lost. This would occur even when preemptively moving who was the main node, and prepping to restart. With storage dropping out from under all the VMs, they would die and need to manually be rebooted or repaired. This made me start to look for a new setup after a few of these months.

All in all, I ended up running for over a year about 5 Windows VMs and 5 Linux VMs on Hyper-V with good uptime. One benefit of Hyper-V is you get the hardware compatibility of Windows, which is vast. The big downside of Hyper-V is the tools around it. At times they seem unfinished, at other times buggy. My next post will be about the migration, and my experience with vSphere 7.0!

Resources

Good guide for Storage Spaces Direct

http://woshub.com/configure-storage-spaces-direct-s2d-windows-server-2016/

Homelab: Network 2020

As a younger person in my career I got a few Cisco certs, the study material was available to me, and I thought it would be an interesting thing to learn. At this point, I have had a CCNP for almost 10 years and I still enjoy messing with networking even if it is not my day to day job. While I historically have used Cisco a lot, there are many other brands out there these days that have good gear, some even low power enough that I can run at home and not worry about the power bill. Below is my current home setup, it has changed a lot over time, and this is more of a snapshot than a proper design document. That is what homelabs are for right? Messing around with things.

Firewall

The firewall I am running is one I have mentioned on here before. The system itself is an OLD Dell Optiplex 990, released in early 2011 and soon to have its 10th birthday! Idling at ~30 watts, it works well for what I need it to with a gen 2 i7, and 8gb of ram. I added a 4 port Intel gigabit ethernet card to it, which allows for more ports and hardware offloading of a lot of IP tasks.

I looked around at different firewall OS options. pfSense is the obvious one, but I found its interface lacking (I use Palo Alto Networks firewalls at work, and that interface/flow is more what I am used to). OPNsense is a bit better, but still leaves something to be desired on the UI side. Then I tried a Home License of Sophos XG. It is free as long as you stay at 4 cores or less, with 6GB or less of RAM used; you are given an “evaluation license” until 2099-12-31, and if it runs out I will ask for an extension. For more than a year I have been enjoying it; the interface is slick, and you get the enterprise auto patching built in. In the time I have run it, there has been 1 zero-day attack on the product and it was immediately patched without me having to log in. I use it as my home firewall between vlans and as a DHCP server, and I also have IPSec and SSL VPNs for when I am away from home. The system does DNS for the house (on the less secure vlans, AD does those) and allows for block lists to be used. This is like a Pi-hole but built into the product.

There are a few things it does a little odd, but I enjoy not having to go and write weird config files on the backend of some Linux/BSD to have my firewall work. I have it hooked into AD for auth, and that way I can login with a domain admin, and allow users who have domain accounts to VPN into home. It has been VERY stable, and usually only reboots when I tell it to do an update, or that one time the ~10 year old PC blew a power supply.

Cross Room Link

At the start of the year, I was running a Ubiquiti Wi-Fi mesh at home; it got decent speeds, and allowed me to not run wires over the apartment. The access points used were these models, link. They were only 2×2 802.11AC Wave 1 and got decent speeds (around 400Mbps), but being in a New York City apartment, I would get interference sometimes, even on 5GHz. The interference would cause issues when playing games or transferring files. The bigger issue was that my desk with a bunch of computers and the firewall were on different sides of this link, meaning any data headed to a different vlan had to go over and back on this Wi-Fi link. On top of that, I will mention I basically HAVE to use 5GHz; I did a site survey with one of the APs and the LOWEST used 2.4GHz channel near me was 79% utilized…

Anyway, I started looking around for what I would replace it with. I always thought fiber could be a way to go since it’s small, and if I could get a white jacket on it, it would blend in with the wall. I spent a few weeks emailing and calling different vendors trying to find someone who would do a single cable run of white jacketed fiber. Keep in mind this is early 2020 with Covid starting up. Lots of places could not do orders of 1, or their website would say they could and later they would say they couldn’t and refund me. Finally I found blackbox.com; I have no affiliation with them, they just did the job quickly and I appreciate that. I got a 50 meter or so run, and was able to install that with the switches below.

Switches

Now that I had the fiber, I needed some small switches I could run at home. After looking at what others have on reddit and www.servethehome.com I found the Ruckus ICX 7150-C12P: a switch with 14 1Gb/s Ethernet ports and 2 1/10Gb SFP+ ports. The switch is compact, fan-less, and has 150 watts of POE! I can run access points and cameras off of it without other power supplies. One thing I have learned to look for before buying this sort of gear off eBay is whether I can get the newest firmware. Cisco and HPE love to put it behind a wall that requires an active support contract. Not only does Ruckus NOT do that, they have firmware available for their APs that allows them to run without a controller, more on that later.

I ended up buying 1 of the Ruckus switches “used”, but it came sealed in box. Then I got another one broken, after seeing some people online mention that they sometimes overheat if kept somewhere without proper ventilation, and that can kill the power supply. The unit is fan-less, but the tradeoff is that nothing can sit on top of it, because it needs to vent. I was able to get one for around $40, then a new power supply for $30; all in I spent $70 for a layer 3 switch with 10gb ports! Now I have these 2 units on opposite sides of the room, in a switch stack. This way they act as one and I only need to manage “one switch”.

With the Ubiquiti gear no longer acting as a Wi-Fi link, which I have written about before, I only had one of the APs running. As mentioned before, the access point was only 2×2 antennas and 802.11AC Wave 1. I was pondering getting a new Wi-Fi 6 access point; while looking around, someone on reddit, again, suggested looking at Ruckus access points. Their antenna design is very good, and with their “Unleashed” firmware you get similar features to running a Ubiquiti controller. After looking at the prices I had to decide if I wanted to go Ubiquiti with Wi-Fi 6 and wait for their access points to come out, or get something equally priced but more enterprise level, like an 802.11AC Wave 2 access point (a Ruckus R510 or R610 off eBay).

I recently had a bad experience with some Ubiquiti firmware, then all of a sudden they killed Ubiquiti Video with very little warning, and some of the more advanced functions I would want to use are either minimally documented or not documented at all with Ubiquiti. One could argue that I am used to enterprise gear, and Ubiquiti is more “pro-sumer” than enterprise; thus, I should not be upset at the lack of enterprise features. All of that made me decide to try something new. I ended up getting a Ruckus R610 off eBay and loading the “Unleashed” firmware on it. I can say the speeds and coverage are much better than with the older access point. It is 3×3 802.11AC Wave 2, and with most of my devices still being 802.11AC I figured that was a good call.

One feature of the Unleashed firmware is it can manage all your Ruckus hardware. The web management portal has a place to attach your switches as well, and do some management of them there. I have been scared to do this, and coming from a traditional CLI switch management background have yet to do so.

Unleashed Home Screen

I was able to POE boot the AP just like I did with Ubiquiti. Converting the firmware was easy, and there are many guides on YouTube for it. The UI does not have the same polish that Ubiquiti does, but the controller is in the AP itself, which is very nice. There is a mobile app, but it is fairly simplistic. The web interface allows for auto updating, and can natively connect to Active Directory, making it very easy to manage authentication.

There are 3 wireless networks in the home. The first is the main one for guests with their 6 year old unpatched Android phones; it has a legacy name and a meh password, so that I don’t have to reset the Wi-Fi settings on some smart light switches. This is where all the IoT junk lives. The second has a better password and connects to the same vlan; I am slowly moving things in the house over to it, so at least the key is more secure. Then there is the X wireless network, which is not broadcast and has 802.1X on it. When a user authenticates with their domain creds, depending on the user and device I send them to a different vlan. This is mostly used for trusted devices like our laptops, and the iPad when I want to do management things. For my domain account, this network puts me on the management network.
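As a rough illustration of the "different vlan per user and device" idea, here is a minimal sketch of the three RADIUS attributes that RFC 3580 describes for dynamic VLAN assignment. The user names, device types, and VLAN IDs are all made up for the example, and the policy table is hypothetical; in a real setup these rules would live in the RADIUS server (NPS in this case), not in Python.

```python
# Hypothetical policy table: (user, device_type) -> VLAN ID.
# All names and VLAN numbers below are invented for illustration.
POLICY = {
    ("dan", "laptop"): 10,   # e.g. a management VLAN
    ("dan", "ipad"): 20,
    ("guest", "phone"): 99,
}
DEFAULT_VLAN = 99  # assumption: unknown combinations land on a restricted VLAN


def vlan_attributes(user: str, device_type: str) -> dict:
    """Return the RFC 3580 tunnel attributes a RADIUS server would include
    in an Access-Accept to place the client on a VLAN."""
    vlan_id = POLICY.get((user, device_type), DEFAULT_VLAN)
    return {
        "Tunnel-Type": 13,                        # 13 = VLAN
        "Tunnel-Medium-Type": 6,                  # 6 = IEEE 802
        "Tunnel-Private-Group-ID": str(vlan_id),  # the VLAN ID as a string
    }


print(vlan_attributes("dan", "laptop"))
```

The AP or switch reads those three attributes from the accept message and drops the client onto the matching vlan, which is why one SSID can serve several networks.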

10gb/s Upgrade!

The latest upgrade I have embarked on was 10gb/s. I moved my active VM storage off of the NAS to Storage Spaces Direct for perf. While the NAS has worked well for years, the seven 3 TB disks do not give fantastic IOPS when different VMs are doing a lot of transactions. After lots of thought and trials I went with Storage Spaces Direct and will write about it later. The main requirement was that it allows all the hypervisors to have shared storage and keep it in sync, and to do that they need good interconnects. This setup is the definition of lab-do-not-do-in-prod, with 3 nodes each with 3 SSDs over USB 3. I knew with USB 3 my theoretical bottleneck was 5gb/s, which is much better than the 1gb/s I had, which also had to be shared with all server and other traffic.

First I had to decide how I would lay out the 10gb/s network. While the ICX 7150 has 2 10gb/s ports, 1 is in use to go between the switches. After looking around and comparing my needs/wants/power/loudness-the-significant-other-would-put-up-with I got a MikroTik CRS309-1G-8S+IN. I wasn’t super excited to use MikroTik, since their security history is not fantastic, but I didn’t want to pay a ton or have a loud switch. I run the switch with the layer 2 firmware, and put its management interface on a cut off vlan, so it is very limited in what it can do.

After that I got some HP 10gb/s server cards, and tried a Solarflare S7120. Each had their ups and downs. The HPs are long and would not fit into some of the slim desktops I had. But when they would work, like in the Dells, they would work right away without issues. The Solarflare cards are shorter, which is nice, but most of them ship with a firmware that will not work on some motherboards or newer operating systems. For these you need to find a system they work in, boot to Windows (perhaps an older version), then flash them with a tool off their website. After that they work great. I upgraded the 3 main hypervisors and the NAS. I have seen the hypervisors hit 6.1gb/s when syncing Storage Spaces. With memory caching I can get over the rated disk speed.

That is the general layout of the network at this point. I am using direct attached cables for most of the systems. I did order some “genuine” Cisco 10gb/s SFP+ modules off Amazon for ~$20. I didn’t believe they would be real, but I had someone I know who works at Cisco look them up and they are real. Old stock shipped to Microsoft in 2012 or so, but genuine parts. The Ruckus switches and these NICs do not care which brand the SFPs are, so I figured I would get one I knew. The newer Intel NICs will not work with non-Intel SFPs, so look out.

To summarize, everything comes in from my ISP to the Sophos XG box, which connects to a port on one of the Ruckus switches. Those two Ruckus switches have a fiber link between them. Then one of the SFP+ ports on the Ruckus switch goes to an SFP+ port in the MikroTik switch. All the hypervisors hang off that MikroTik switch with SFP+ DACs. Desktops, video game consoles, and APs all attach to 1gb/s ethernet ports on the Ruckus switches. I have tried my best to label all the ports to make managing everything easier. I’m sure this will evolve more with time, but for my apartment now, 10gb/s networking with a Ruckus R610 AP has been working very well.

ESXi Migration & Lenovo ThinkCentre M710s

I have started a transition from Hyper-V and Storage Spaces Direct to VMware vSphere and vSAN. I apologize that the order of these blog posts is all over the place. Part of the transition is upgrading the hardware on some of the hosts I have, including getting 250GB NVME drives for vSAN cache. I started the migration with one of the desktops that run in the cluster, a Lenovo ThinkCentre M710s. After finding the small slot the NVME drive goes in, I realized there is a piece of plastic from the manufacturer that you are supposed to get to install an NVME drive. Since I do not have that, and do not want to pay for it, I spent a good bit more than an hour on the first day of the migration creating this bracket and 3D printing it. Then while that was printing, I realized one of the feet on the system had gone missing, so I made a small one of those.

This post is just a quick update and a preview of more to come.

NVME Drive Holder: Lenovo ThinkCentre M710s NVME Bracket by danberk – Thingiverse

Foot: Lenovo ThinkCentre M710s Foot by danberk – Thingiverse

Booting VMware vSphere ESXi 7.0 on Certain Dell Hardware

I recently attempted to boot a Dell Precision M6800 into ESXi 7.0u1 to test some functionality before going to prod. Unfortunately this was met with “Invalid Partition Table”. Switching between UEFI and BIOS boot didn’t fix it, giving “No boot device available” instead. After searching online I found this thread, https://communities.vmware.com/t5/ESXi-Discussions/quot-Invalid-Partion-Table-quot-Error-booting-ESXi-7-from-USB/m-p/1823852, which had comments such as “just dont run on a laptop”, which was not very helpful.
I spent a chunk of time playing with the partitions and seeing how they were configured. I noticed when I went into the UEFI on the laptop it said it couldn’t find any file systems available, but when I loaded Windows or Linux on the system, the UEFI could see those boot partitions. I tried updating the firmware like Dell recommended, with no change. I then realized the ESXi 7.0 image uses FAT16 for the EFI partition, while all other EFI partitions I have seen are FAT32.

I copied the files and folders out of the boot partition, reformatted it with FAT32 instead of FAT16, marked it as EFI type (ESP in Gparted), and moved the files back. The system booted fine the first time, with ESXi running happily. If you need to boot ESXi on a Dell M6800, M4800, or similar, give that a try. If this worked or didn’t work for you, leave a comment below.
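If you want to confirm which FAT variant a partition really uses before and after reformatting, the check can be sketched in a few lines. This is a minimal example, not a tool from the original troubleshooting: it reads the BIOS Parameter Block from the first sector and applies the cluster-count rule from Microsoft's FAT specification (fewer than 4085 clusters is FAT12, fewer than 65525 is FAT16, otherwise FAT32). The function name is my own.

```python
import struct


def fat_type(boot_sector: bytes) -> str:
    """Classify a FAT volume from the first sector of the partition."""
    # BIOS Parameter Block fields, little-endian, starting at byte offset 11.
    (bytes_per_sec, sec_per_clus, rsvd_sec_cnt, num_fats,
     root_ent_cnt, tot_sec16) = struct.unpack_from("<HBHBHH", boot_sector, 11)
    (fat_sz16,) = struct.unpack_from("<H", boot_sector, 22)
    tot_sec32, fat_sz32 = struct.unpack_from("<II", boot_sector, 32)

    # The 16-bit fields are zero when the 32-bit ones are in use.
    fat_sz = fat_sz16 or fat_sz32
    tot_sec = tot_sec16 or tot_sec32

    root_dir_sectors = (root_ent_cnt * 32 + bytes_per_sec - 1) // bytes_per_sec
    data_sectors = tot_sec - (rsvd_sec_cnt + num_fats * fat_sz + root_dir_sectors)
    clusters = data_sectors // sec_per_clus

    if clusters < 4085:
        return "FAT12"
    if clusters < 65525:
        return "FAT16"
    return "FAT32"
```

To use it, read the first 512 bytes of the EFI partition (as root, from whatever device node your USB stick shows up as) and pass them to `fat_type`. It does no validation, so garbage input will give garbage output or a division error.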

Homelab: Overview

I am starting a series about my homelab and how it is all laid out. I have written this article a few times, with months in between. Each time the setup changes, but we seem to be at a stable-ish point where I will start this series. Since I wrote this whole article and now a while later am editing it, I will mark with italics and underline when present me is filling in. I think it will give a neat split of growth in the last year or so I have been working on this. Or it will make it illegible, we will see. My home setup gives me a good chance to test out different operating systems and configs in a domain environment before using that tech elsewhere like at work.

Hypervisor

Starting off with virtualization technology, I settled a while ago on Microsoft Hyper-V instead of ESXi. The main reasons are that I already had Windows Server, and that Hyper-V allows for Dynamic Memory, allocating a range of memory for a VM. When something like an AD controller is idling, it doesn’t need much memory; when it starts it may. Dynamic Memory allows me to take that into account. I will say one place that has bitten me later is file storage, but that will be a later post.

The setup is technically “router on a stick”, where the Sophos XG firewall functions as the router, and the rest of the devices hang off of that. The Sophos XG machine is an old Dell Optiplex 990 (almost 10 years old!) with an Intel quad NIC in it, so that it can do hardware offloading for most of the traffic. I intend to do posts for networking, hypervisors, file storage, domain, and more; thus I will not get too in the weeds right now on the particulars.

The file storage is a FreeNAS box, recently updated to seven 3TB HDDs. I have had this box for a bit over 6 years (I just looked it up in November 2020, one of the drives has 55257 hours or 6.3 years of run time on it); it is older but has worked well for me so far.

The network backbone is a new switch I really like that I was able to get 2 of off eBay; they were broken but I was able to repair them, more on that later as well. They are Brocade, now Ruckus, ICX 7150-C12P; 12 1gb/s POE ports, 2 additional 1gb/s uplink ports, and 2 1/10gb/s SFP/SFP+ ports. These switches can run at layer 3, but I have the layer 2 firmware on them currently. They have a fiber connection between them. Before that I was using 2 Unifi APs in a bridge, which didn’t work fantastically because A. I am in NY, B. they were only 2×2 802.11AC Wave 1, and C. I am in NY. I custom ordered (so the significant other would not get mad) a white 50m fiber cable to go around the wall of the apartment.

With SSDs in the hypervisor boxes (I call them HV# for short) and iSCSI storage for VMs as well, which VMs are on which host doesn’t particularly matter. Flash forward 6 months or so since that first sentence was written: I still use the NAS for backups, but the hypervisors are now running Storage Spaces Direct and doing shared storage. This allows the hypervisors to move VMs around during patching, or pause them during a system update if they are less critical. The Intel NUC and small Dell Inspiron are very underpowered compared to the mid tower hypervisors, so they usually run only 1 or 2 things. The NUC runs the primary older domain controller, and that is it. It is an older NUC that I got about 7 years ago, so it’s not that fast. The “servers” in the hypervisor failover cluster are a Lenovo and 2 Dell Optiplex 5050s. I like these Dells because they go for about $200 on eBay, while having an Intel i5-7600, supporting 64GB of ram, and having expansion slots for things like 10gb SFP+ cards. These machines also idle at about 30 watts, which makes the power bill more reasonable.

Some of the services I run include:

  • 2 Domain Controllers (Server 2016, and 2019)
    • Including Network Policy and Access Services (NPS) for RADIUS and 802.1x on wifi and wired
  • Windows Admin Center Server (Windows Server 2019)
  • Windows Bastion (This box does Windows Management) (Server 2019)
  • Veeam Server (Server 2019)
  • Unifi Controller/Unifi Video for security camera (Ubuntu)
  • 3 Elasticsearch boxes for ELK (CentOS 8)
  • Linux Bastion (CentOS 8)
  • Foreman Server (CentOS 8)
  • LibreNMS (This I grew to really like) (CentOS 8)
  • Nessus Server (CentOS 8)
  • Jira Server (CentOS 8)

That is the general overview. I will spend the next while diving into each bit and discussing how it is configured and what I learned in doing that.

Redhat/CentOS 7-8 PKI/CAC/Smart Card SSH Login with Active Directory and SSSD

I was experimenting with integrating CentOS with my home Active Directory (AD) cluster. I wanted centralized user management and, as a stretch goal, to get PKI login working for Smart Card auth. I have used winbind before to connect CentOS 6 to Active Directory, and that configuration was a bit annoying. These days with CentOS/RHEL 7 and 8 we have SSSD, which is more straightforward. For all the following tests I used Putty-CAC (link), a Windows app that allows GSSAPI and Smart Card auth.

SSSD Config

I will start off with my experience, then follow up with a how to. For this article I already have AD configured to support Smart Card auth, and have stored the Smart Card public key for my user; I will follow up with an article about that configuration. Active Directory integration is straightforward and easy. One setting you can enable is hiding the domain name from the username, which allows the users to feel native to the system. Using users and groups is easy; I made a group to which I gave sudo access. When using Smart Cards you will need to put NOPASSWD in the sudo entry for that group, because the Smart Card users usually do not have passwords, usually… You can use Smart Card auth with Active Directory AND a password as long as you do not set “Smart card is required for interactive logon”. If you do check that box, AD sets a random password on the backend for that user.

After setup, with this config we store the authorized_keys entry in AD under the attribute altSecurityIdentities. The main tool to debug Smart Card auth is sss_ssh_authorizedkeys, which has the system attempt to pull a user’s ssh key on demand. A big warning about SSSD: it loves to cache information. If you run that command, then make changes to your sssd.conf or AD and re-run sss_ssh_authorizedkeys, it can fail because it is caching the failed lookup from before. My recommended command, as root, between tests where it may be caching is:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd

SSSD Config

1. Setup hostnamectl (make sure your host knows what its name is supposed to be) and dns. For SSSD to work well the system needs to be able to find itself in DNS; you can set up SSSD to auto register with dynamic DNS (more on that later).
2. Install Packages
     - Ubuntu
       apt -y install realmd sssd sssd-tools libnss-sss libpam-sss adcli samba-common-bin oddjob oddjob-mkhomedir packagekit    
     - CentOS
       sudo yum install realmd sssd oddjob oddjob-mkhomedir adcli samba-common samba-common-tools krb5-workstation       

At this point running “# realm discover your_domain_fqdn” will list out what your system needs for domain users to login. Usually the main service you need to enable is oddjobd, which will create home directories when users login. Note, for these examples I find it easier to have a real domain in them than to substitute one, so I will use my home test domain “home.ntbl.co” here.

3. systemctl enable oddjobd
4. systemctl start oddjobd
5. realm join -U admin_user_on_domain home.ntbl.co
6. vim /etc/sudoers.d/winadmins
Add the line “%domain\ admins@home.ntbl.co ALL=(ALL) ALL”, where “domain admins” is a group I have in AD, and “home.ntbl.co” is my domain. This line does not support Smart Card login with sudo, since you need NOPASSWD for that; for example "%domain\ admins@home.ntbl.co ALL=(ALL) NOPASSWD:ALL". You can create a sub sudo file like I did here, or use visudo to edit sudoers and have it syntax checked.


7. Below is my /etc/sssd/sssd.conf without Smart Card auth setup.

 [sssd]
 domains = home.ntbl.co
 config_file_version = 2
 services = nss, pam
  
 [domain/home.ntbl.co]
 ad_domain = home.ntbl.co
 krb5_realm = HOME.NTBL.CO
 realmd_tags = manages-system joined-with-adcli
 cache_credentials = True
 id_provider = ad
 krb5_store_password_if_offline = True
 default_shell = /bin/bash
 ldap_id_mapping = True
 use_fully_qualified_names = false
 fallback_homedir = /home/%u@%d
 access_provider = ad
  
 dyndns_update = true
 dyndns_refresh_interval = 43200
 dyndns_update_ptr = true
 dyndns_ttl = 3600 

Setting “use_fully_qualified_names” to false changes your username from “dan@home.ntbl.co” to “dan”. Not a requirement, but a nice quality of life setting. The bottom block adds dynamic dns, which will push your IP to AD DNS. Windows does dynamic DNS updates by default, and whether or not the systems are statically assigned, this can be a nice feature. Now "systemctl stop sssd" and “systemctl start sssd”, then you should be able to login with your AD account.

GSSAPI

Before getting into Smart Card auth, I wanted to briefly mention GSSAPI. This is a method to do auth between systems. It allows Windows clients to one click login to SSH by passing an auth token from your Windows session right to SSH. If you setup SSSD and enable GSSAPIAuthentication in /etc/ssh/sshd_config, then you can use an app like Putty-CAC to SSH with GSSAPI. I have found this usually works with SSSD by just setting GSSAPIAuthentication to yes. If you just want to admin Linux from AD, and have no other requirements, I would suggest you look into this for your environment because it is so easy. If you are going to follow the rest of the guide, make sure to turn GSSAPI back off, or it will log you in automatically and you may think it’s Smart Card auth working; that fooled me for a few minutes.

Smart Card Auth

For all of my tests, I used the following Smart Card, Amazon link. I think these other cards would work as well, and they are cheaper; but I have not personally tried them. Amazon link. I may write an article later about setting up these cards, if you are interested write a comment below.

Add Certs to AD

You need the Smart Card’s public key data in SSH authorized_keys format. This guide will show you how to get that string from Putty CAC. You have to enjoy when a .gov site tells you to go to user NoMoreFood and get security software, the open source world is great.

In Active Directory, go to Active Directory Users and Computers and turn on Advanced Features from the View menu. Then open the user you want to add ssh keys for, and select the “Attribute Editor” tab. You will find an entry near the top called “altSecurityIdentities”; add the line that would usually be in ~/.ssh/authorized_keys there. It should look like “ssh-rsa key_stuff”.

Configuring SSSD for Cert Auth

To add Smart Card auth to SSSD, add the following to your sssd.conf, merging the sections with the ones from above.

[sssd]
services = nss, pam, ssh, sudo

[pam]
pam_cert_auth = True

[domain/home.ntbl.co]
enumerate = True
ldap_user_extra_attrs = altSecurityIdentities:altSecurityIdentities
ldap_user_ssh_public_key = altSecurityIdentities
ldap_use_tokengroups = True
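Since "merge the sections" is easy to get wrong by hand, here is a minimal sketch of what the merged file looks like, using Python's configparser to overlay the Smart Card fragment on a shortened copy of the base config. The `merge_ini` helper is my own illustration, not an SSSD tool, and the base fragment is trimmed to a few keys for brevity.

```python
import configparser
import io

# Shortened copy of the base config from earlier in the post.
BASE = """
[sssd]
domains = home.ntbl.co
config_file_version = 2
services = nss, pam

[domain/home.ntbl.co]
ad_domain = home.ntbl.co
id_provider = ad
access_provider = ad
"""

# The Smart Card additions from this section.
SMART_CARD = """
[sssd]
services = nss, pam, ssh, sudo

[pam]
pam_cert_auth = True

[domain/home.ntbl.co]
enumerate = True
ldap_user_extra_attrs = altSecurityIdentities:altSecurityIdentities
ldap_user_ssh_public_key = altSecurityIdentities
ldap_use_tokengroups = True
"""


def merge_ini(*fragments: str) -> str:
    """Overlay INI fragments in order; later values win for the same key."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep option names exactly as written
    for fragment in fragments:
        parser.read_string(fragment)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


merged = merge_ini(BASE, SMART_CARD)
print(merged)
```

The point to notice is that `services` is replaced rather than appended to, and that `[pam]` is a new section of its own. If you write the result back to /etc/sssd/sssd.conf, keep it owned by root with mode 0600 or SSSD will refuse to start.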

Now restart sssd. If you run "sss_ssh_authorizedkeys dan" with dan replaced with your name, then you SHOULD get a key back if everything is setup correctly. If you do not get a key back, use the command below to reset sssd and reload. If you still do not get a key then you will need to edit settings in sssd.conf, and continue to tweak:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd

I will say this does seem to take some trial and error. /var/log/sssd/ has some good logs that can help point you in the correct direction if you are running into issues. One quick note: you may see people online say “use the command ‘sss_ssh_authorizedkeys -debug 4 home.ntbl.co’ to debug the command.” This command does not have a debug flag. What that does is use the -d argument, which is domain, then try to parse the rest. You end up with key lookup attempts on domain “ebug” for user 4. Sadly sss_ssh_authorizedkeys is not very verbose, and debugging it is a bit of a pain; do not listen to people who mention the above debug command, at least on CentOS/RHEL 7 and 8 it does not work.
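The way "-debug 4" turns into domain "ebug" and user 4 is ordinary short-option parsing, and it can be reproduced with Python's getopt. This is only a stand-in: the real tool is a C program with its own option parser, and the single "-d takes a value" option string below is my assumption based on the behaviour described above.

```python
import getopt

# Assumption: one short option "-d" that takes a value (the domain).
argv = ["-debug", "4", "home.ntbl.co"]
opts, rest = getopt.getopt(argv, "d:")

print(opts)  # [('-d', 'ebug')]
print(rest)  # ['4', 'home.ntbl.co']
```

Everything after the "d" in "-debug" is consumed as the value of -d, and the first leftover argument, "4", ends up in the position where the username is expected.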

As long as you are getting a key back from the above command, you can wire it into SSH. Edit /etc/ssh/sshd_config with the following. Note some sites say AuthorizedKeysCommandUser should be root, some say it should be nobody. I err on the side of lesser permissions and set it to nobody:

 AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
 AuthorizedKeysCommandUser nobody

Hope something here has helped someone, feel free to drop a comment.

iOS/macOS On-Demand IPsec VPN with Sophos XG

Having a small home lab, I wanted to be able to setup internal services and then access them on the go. While I could setup an L2TP or SSL VPN and connect whenever I wanted to use these services, I thought I would give On-Demand VPN via an iOS/macOS configuration profile a try. Little did I know the world of hurt I was entering. I will start with the settings you need to get it working, since a lot of people just want that. Then I will talk about the crazy and painful road I went down before finding 1, just 1, set of settings that seem to work. If you have any questions, thoughts, or success stories please comment below!

Fun fact: I will be calling the protocol IPsec here. That is what the original RFC called it, what the original working group was called, and the capitalization they used. Sophos agrees and uses that capitalization, while Cisco and Microsoft, depending on which web page you are on, may call it IPSEC or IPSec or IPsec.

On-Demand VPN gives you the ability to set certain websites or IPs, and when your phone or laptop attempts to connect to them, the machine silently brings an IPsec tunnel online and uses it for that traffic. This allows you to run services at home, and to users (your mom or cat or whomever) it looks like just another website. Apple has 1 big requirement for this: you have to use certificate based auth. You can not use a pre-shared key/password. Also up front, to save you a few days of trying things: iOS and macOS will NOT check your certificate store for the CA behind your VPN endpoint (Sophos XG) certificate, it HAS to ship with the firmware or you will get the fantastic and descriptive “Could not validate the server certificate.” Believe it or not, that is one of the most descriptive errors you will get here. There are some posts on the Apple support forums from Apple engineers saying the root CA has to already be on the device. If anyone gets it to work with your own CA let me know.

Sophos XG Setup

I am using Sophos XG v18 with a Home license, backed by AD, running on a Dell Optiplex for this guide (don’t worry, it has a cool Intel NIC in it). To setup the IPsec server in Sophos XG, first we need to make 2 certificates. Login to the admin portal, then on the bottom left select “Certificates”. The first is our “local certificate” (we will call it Cert-A); this is the cert that is used for the server (Sophos) end. As previously mentioned, this has to be a real signed cert. I ended up pointing a subdomain on my site at the firewall, and then using Let’s Encrypt to create a cert for that URL. I used this site, https://hometechhacker.com/letsencrypt-certificate-dns-verification-noip/ to guide me in creating the cert on my laptop, then I uploaded that to the Sophos firewall. This will require you to have access to your domain’s DNS settings or be able to host a web file.

The second cert (Cert-B) is for the client, Sophos will call it “The Remote Cert”; this is to auth to the firewall, that can just be a locally generated cert. All devices will share this cert. The devices will use their username and password combination to identify the user. I used email as the cert ID, note this email does not have to exist, I just made one up on my domain so I will know what this cert is. Once created, go back to the main Certificates page and download the client/remote certificate, I suggest putting an encryption password on it since the Apple tools seem to freak out if that is missing. But ALSO the password for this cert will be in clear text in your config, so don’t make it a password you care about. These certs all need to be rotated at least once a year, with the newer requirements; Let’s Encrypt is every 90 days and I intend on automating that on one of the Linux machines I have.
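Because both certs expire (and Let's Encrypt ones do so every 90 days), a small check for "how long do I have left" is handy when automating the rotation. The sketch below is only one way to do it: it takes the `notAfter` string in the format Python's `ssl.getpeercert()` returns and reports the days remaining. The 30 day renewal threshold is my own assumption, not something from Sophos or Let's Encrypt.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter time.

    `not_after` uses the format found in ssl.getpeercert()['notAfter'],
    e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when fewer than `threshold_days` remain (threshold is an assumption)."""
    return days_until_expiry(not_after, now) < threshold_days


example_now = datetime(2018, 1, 1, tzinfo=timezone.utc)
print(needs_renewal("Jan  5 09:34:43 2018 GMT", example_now))  # True
```

In a cron job you would pass `datetime.now(timezone.utc)` and the `notAfter` value read from the live certificate, then kick off the renewal and re-upload when it returns True.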

Self-signed client end cert

Now that we have our 2 certificates, let’s go over to “VPN” on the left hand navigation. I have tried many settings in the main “IPsec Connections,” and none of them have worked for me. I get fun and generic errors from the Mac such as “received IKE message with invalid SPI (759004) from other side” or “PeerInvalidSyntax: Failed to process IKE SA Init packet (connect)”.

Click the “Sophos Connect Client” tab, the back end of this client is just a well setup IPsec connection. Fill in the form, from the external interface you want to use, to selecting “Digital certificate” as your auth method, followed by the “Local certificate” which is the Let’s Encrypt one (Cert-A). “Remote certificate” is the one we will load on your device (Cert-B).

Now you select which users you want to have access to this. I have Active Directory backing my system, so I can select the AD users who have logged in before to the User Portal. This is a Sophos XG trick you may need: if you use AD and a user doesn’t show up, that means they need to login to the User Portal first.

Select an IP range to give these clients, I suggest something outside any of your normal ranges, then you can set the firewall rules and know no other systems are getting caught in them. Once you are happy, or fill in other settings you want like DNS servers, click “Apply”. After a second it will activate, you can download the Windows and Mac client here, or follow along to make a profile.

Apple Configuration

To create a configuration file you need to download Apple Configurator 2, https://apps.apple.com/us/app/apple-configurator-2/id1037126344 onto a Mac. I know what you are thinking, 2.1 Stars, Apple must love enterprises. Download that from the store and open it up. If you do not have a Mac, I attached a template down below that you can edit as a text document. This profile needs a Name, as well as an identifier. The identifier is used to track this config uniquely; if you update the profile, your device will override old configs instead of merging. You will see on the left there are LOTS of options you can set, the only 2 we need are “Certificates” and “VPN”.

Starting with Certificates, click into that section, then hit the Plus in the top right. Upload the cert we exported from Sophos (Cert-B) earlier for the end device, and enter the password for it. Again note, this password is in plain text in the config file.

Now for the VPN Section. Click the Plus in the top right again to make a new profile, name the connection anything. Set the Connection Type to “IPsec”. IKEv2 is IPsec but a newer version, I will get into some of this later after our config is done and I can rant. Server is your Sophos XG URL. Account and password can be entered here to ease setup, or you can leave one or both blank to make the user enter it when they import the config. You can leave the user/password fields blank (it will give you a yellow triangle but that is fine) and then give it out widely and not have your creds in it… For “Machine Authentication” you want “Certificate”; you will see in selecting “Certificate” all of a sudden the On-Demand area appears. For “Identity Certificate” select the one we uploaded before. Finally we can enable “Enable VPN On Demand” and select the IPs or URLs you want to trigger the VPN.

Once that is done, save the profile and open it on a Mac, or use this configuration tool to upload it to an iOS device. That should be it! Your devices should be able to start the connection if you ask them to, and if you go to the website it should auto VPN. Make sure you have firewall rules in Sophos XG for this new IP range, or that can block you from being able to access things.

A small note from my tinkerings with the On Demand profile: if you go to Safari on an iOS device, it will connect when you visit a website that is in the configuration. If you use a random app, such as an SSH application, I didn’t find it always bringing the tunnel up, and at times it had to be started manually. Something to look out for. A nice part of the IPsec tunnel is that it starts quickly.

Now that the config is done, I want to mention some of the other things I have learned in tinkering with this for several days. The only way I got it to work is using that Sophos Connect area, and the other big undocumented thing is you have to use a publicly trusted cert for the Sophos end. I found 1 Apple engineer mention this on their forum, and a TON of people talking about how they couldn’t get the tunnel to work with their private CA. I have tried uploading a CA, and injecting it in different places with different privileges on the Mac, and never could get it to work. The Let’s Encrypt cert immediately worked.

For IPsec v1, aka IKEv1, Apple uses the BSD program racoon on the backend to manage the connection. Using the “Console” app you can find the logs of this. For IKEv2 it seems Apple wrote their own client around 2016-2018, and there are a lot of reports online that it just doesn’t work at all with cert based auth. All the guides about it working stop around 2016. You can find earlier ones, or people using pre-shared keys, but selecting pre-shared keys doesn’t allow us to do an On Demand VPN. The bug has been reported for a while, https://github.com/lionheart/openradar-mirror/issues/6082. If you try to do this, you can expect A LOT of “An unexpected error has occurred” from the VPN client. Even looking at the Wireshark traffic didn’t lend any help on tuning Sophos to give the IKEv2 client something it would accept. If someone figures out how to get that to work in this setup please let me know.

Now that everything is setup you can host things yourself. I give the auto connecting VPN less rights than when I do a full tunnel on my laptop, but it allows for things like Jira to be hosted, then mobile clients to easily connect.

Template

For your cert to work in the template it needs to be converted. Sophos will give you a .p12 file for your cert, use the following command to get the version that needs to be in the .mobileconfig file. You’ll at minimum want to edit the cert area and put yours in there, set the password for the cert, and any URLs you need.

openssl enc -a -in user.p12 -out user.enc
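If you would rather script this than paste base64 by hand, the same conversion can be sketched with Python's plistlib, which writes raw bytes as a base64 `<data>` element. This is not the attached template: it builds only the certificate payload inside a bare profile, with the VPN payload left out. The payload keys are the standard configuration profile ones; the identifiers, display names, and helper names are placeholders of mine.

```python
import plistlib
import uuid


def pkcs12_payload(p12_bytes: bytes, password: str, identifier: str) -> dict:
    """Certificate payload holding the client .p12 exported from Sophos."""
    return {
        "PayloadType": "com.apple.security.pkcs12",
        "PayloadVersion": 1,
        "PayloadIdentifier": identifier + ".cert",
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "VPN client certificate",  # placeholder name
        "PayloadContent": p12_bytes,  # plistlib emits bytes as base64 <data>
        "Password": password,         # ends up in clear text in the file
    }


def build_profile(p12_bytes: bytes, password: str,
                  identifier: str = "com.example.vpn") -> bytes:
    """Wrap the certificate payload in a minimal configuration profile."""
    profile = {
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadIdentifier": identifier,  # placeholder identifier
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Home VPN",  # placeholder name
        "PayloadContent": [pkcs12_payload(p12_bytes, password, identifier)],
    }
    return plistlib.dumps(profile)
```

Read user.p12 in binary mode, pass the bytes and the cert password to `build_profile`, and write the result to a .mobileconfig file. You would still need to add the VPN payload from the template before it does anything useful.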