Software

Missing Email Alerts from LibreNMS

I realized recently that I haven't gotten any alerts from LibreNMS, including when I rebooted devices for patching. After going to "Alert Transport" and attempting to send a test message, I got "SMTP Error: Could not authenticate." Others seem to have hit this recently as well. (Link)

It turns out that after May 31st, 2022 (although for me it seemed more like June 6th), Google disabled simple password logins for Gmail accounts. You need to enable two-factor auth, then create an app-specific password for LibreNMS. This was a good quick guide on how to do that. Since LibreNMS sends alerts when something is wrong but has no alert to tell you that alerting itself still works, it is worth going and checking this if you use LibreNMS with Gmail.
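
If you want to sanity-check the new app password outside of LibreNMS first, a throwaway Python sketch like the one below is enough (the addresses and the 16-character password here are placeholders, not real settings):

import smtplib
from email.message import EmailMessage

# Placeholder account and app password - substitute your own values.
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 587
USERNAME = "alerts@example.com"
APP_PASSWORD = "abcdefghijklmnop"  # the app-specific password, not the account password

msg = EmailMessage()
msg["From"] = USERNAME
msg["To"] = "me@example.com"
msg["Subject"] = "LibreNMS transport test"
msg.set_content("If you can read this, the app password works.")

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as smtp:
    smtp.starttls()                     # Gmail requires TLS on port 587
    smtp.login(USERNAME, APP_PASSWORD)  # plain account passwords now fail here
    smtp.send_message(msg)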

Computer Vision for Datacenter Auditing

I am going to start a series of posts on random ideas I have had but not had time to fully implement. The first in this series is an idea I started working on about three years ago (November 2019) for auditing a datacenter as well as mapping systems' physical locations to their logical ones on the network.

The core of the idea is to use cameras in a datacenter, which could be existing security cameras, to see the servers in each rack, then use that data to map out the datacenter and save administrators from having to perform these audits manually. The process begins by training a computer vision model on what a server looks like. Most of the time at work I am working with Dell servers, so I thought that was a good starting point. To keep the model generic enough, I was simply trying to train it on what a 1U server looks like versus what a 2U server looks like.

At this point I needed A LOT of photos of servers with different lighting and angles. I took a bunch myself of different racks I had as a seed set, then I turned to the web. Where could I get a large assortment of photos of Dell servers in different configurations and lighting? The homelab section on Reddit! People post their setups at home and what they have all the time. I went through and downloaded several hundred photos of different people's setups. Another place to get photos was eBay, where a lot of sellers put up photos of servers in different settings; the downside is that a lot of people reuse the same photos again and again. I don't think the internet has figured out yet what the copyright rules are for using photos from online to train a model.

I researched a bunch of different techniques and played around with OpenCV, but then found a tutorial that was in line with what I was looking to do. (This one is also good, and covers very similar material.) I also looked at different image-processing models and played around with several.

Now that I had the photos, I downloaded labelImg (GitHub – tzutalin/labelImg), a graphical image annotation tool for labeling object bounding boxes in images. With this tool you go through each photo, select the item you are trying to learn, and label it. This took a while; it is fully manual work. A lot of the photos from the web had multiple servers in them, and each one needed to be selected, which proved to be one of the more time-consuming parts of the project. I also had to manipulate some photos so the rectangular bounding boxes could fit the servers, even when the photos were taken at odd angles.

I had to pick some of the photos for the training set and others for the testing set. With everything marked and that metadata ready, I converted the final metadata from XML to CSV using the xml_to_csv.py provided in the example repos above. That was then fed into TensorFlow. The only system I had for this other than a laptop was a CentOS 7 server, which proved very annoying because some dependencies, such as protobuf, were not available at new enough versions and had to be custom compiled.
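
For anyone curious what that conversion step boils down to, here is a minimal sketch of flattening labelImg's Pascal VOC XML into one CSV row per bounding box (the file paths and column names are my own choices for illustration, not necessarily what the linked script uses):

import csv
import glob
import xml.etree.ElementTree as ET

# Flatten every labelImg XML annotation into one CSV row per bounding box.
with open("labels.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"])
    for xml_file in glob.glob("annotations/*.xml"):
        root = ET.parse(xml_file).getroot()
        filename = root.findtext("filename")
        width = root.findtext("size/width")
        height = root.findtext("size/height")
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            writer.writerow([
                filename, width, height,
                obj.findtext("name"),  # e.g. "1u_server" or "2u_server"
                box.findtext("xmin"), box.findtext("ymin"),
                box.findtext("xmax"), box.findtext("ymax"),
            ])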

It was time to let the model run for a while and see what it could learn. Several important things were learned in this process. First, if you have GPUs, make sure you have a TensorFlow build that is compiled and ready to use them; the speed difference with and without them is kind of crazy. More RAM and GPU memory also help speed up the process a lot. At first I was playing with this on just a laptop, which didn't have the GPU drivers for CUDA, and it was taking DAYS to work on the model. Later I switched to using GPUs I had in a server, and this greatly increased the iteration cycle speed.
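
A quick sanity check worth running before kicking off training, so you don't silently fall back to the CPU (a small sketch against the TensorFlow 2.x API; at the time I was on an older release, so the exact calls may differ):

import tensorflow as tf

# If the GPU list comes back empty, training will quietly run on the CPU
# and take days instead of hours.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
print("Built with CUDA support:", tf.test.is_built_with_cuda())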

Off the bat it was able to recognize a decent percentage of the servers in the photos I presented it! I do think a lot of the photos I then tested it on were taken in fairly ideal conditions, with good lighting and camera angles, which may give a better-than-real-world impression of how well it works. To improve the model I can always find more photos and train it on more images. At this point I was able to get the model to recognize about 80% of the servers in the racks I showed it. Another factor that could help in the future is the evolution of cameras: a lot of places are replacing 720p/1080p cameras with 4K ones, and the more resolution the system has to work with, the better.

The next step I wanted to take was matching physical location to logical. The idea is that I can find the regions in a photo or video where servers are, and each server, through its iDRAC/IPMI, lets me blink the front chassis identification light. So, one host at a time, automation would send the command to blink the front chassis light, and perhaps some lights on the HDDs, then scan for which region in the image has started to blink!
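
A rough sketch of what that loop could look like, assuming ipmitool can reach each BMC and the candidate regions already come from the detection model; the addresses, credentials, camera URL, and region boxes below are all made up for illustration:

import subprocess
import time

import cv2
import numpy as np

def blink_chassis_light(bmc_ip, seconds=15):
    """Ask the BMC to blink its chassis identify LED via ipmitool."""
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc_ip, "-U", "root", "-P", "calvin",
         "chassis", "identify", str(seconds)],
        check=True,
    )

def region_activity(camera_url, regions, seconds=10):
    """Score how much each candidate region of the camera feed changes frame to frame."""
    cap = cv2.VideoCapture(camera_url)
    scores = [0.0] * len(regions)
    prev = None
    end = time.time() + seconds
    while time.time() < end:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diff = cv2.absdiff(gray, prev)
            for i, (x, y, w, h) in enumerate(regions):
                scores[i] += float(np.mean(diff[y:y + h, x:x + w]))
        prev = gray
    cap.release()
    return scores

# Hypothetical detection output: (x, y, w, h) boxes for each server found in the frame.
REGIONS = [(120, 200, 300, 40), (120, 260, 300, 40), (120, 320, 300, 40)]

# Blink each host in turn and see which detected region lights up the most.
for bmc in ["10.0.0.11", "10.0.0.12", "10.0.0.13"]:
    blink_chassis_light(bmc)
    scores = region_activity("rtsp://camera.local/stream", REGIONS)
    print(f"{bmc} maps to region {int(np.argmax(scores))} (scores: {scores})")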

This is the idea I have slowly worked on for the last little while; I have prototypes of most of it working, but have not had a lot of time to put into it. The hope is that we could use existing cameras to get the footage needed to map the datacenters we already have, then perhaps in the future port this system to something like HoloLens or an Apple/Meta AR system. Once we have that mapping, we can start to draw out the physical servers and their locations in the racks on a webpage, and make it easier for people working in a datacenter to find the boxes they need. Hopefully one day this would let people click a server on a webpage and connect into its controller, without a human painstakingly going to each box and doing this mapping by hand. Of course, all of this could be solved by a team labeling each server, but where is the fun in that?

CentOS 8 Migration

I have a pipeline that creates live images to network boot different systems. Historically this has been based on CentOS. A little while ago I moved it to CentOS 8 because I had some newer hardware that was not supported by the older kernel in 7. Everything was working well until recently, when CentOS 8 went end of life and I could no longer rely on the CentOS 8 Docker containers.

The journey began to find a new EL8 base. I wanted to stay on EL8 instead of switching to CentOS Stream because all the other systems I had running were EL8 (CentOS 8 or RHEL 8), and I wanted to keep compatibility. At the same time, I didn't want to do a new build of the image, have things break, and not realize it was because of an upstream CentOS Stream change. I had also been using the CentOS 8 Docker container, which seems to have been pulled, so that forced me to make this change now.

My first thought was Oracle Linux. It has been around for a while, is ALMOST drop-in compatible, and can be used without having to go get licenses (unlike RHEL). (There are some small silly things, like the package being "oracle-epel-release-el8" instead of "epel-release".) This led to nothing but issues. I replaced all the repos I had in the image creation stage with Oracle Linux ones, and then every build produced a ton of "nothing provides module(platform:el8)" lines for any package that used yum/dnf modules. I spent a chunk of time on this, finding no real answers, and the one Oracle support page that looked like it could help said I needed to buy a support contract. Classic Oracle. At one point I thought it had something to do with this centos-release commit (rpms/centos-release, 89457ca3bf36c7c29d47c5d573a819dd7ee054fe, on the CentOS Git server), where dnf gets confused if a certain line is missing from os-release, but that line was there for me. Also, Oracle doesn't seem to have a kickstart URL repo, which is needed to do this sort of network boot; they want the end user to set that repo up, which may be the source of my issues. This also touches on the issue in "Disable Modular Filtering in Kickstart Repos" on the Red Hat Customer Portal, but I wasn't even getting to a base OS setup, after which I could have changed how the OS and dnf process modules.

In my searches I did find this nice script for getting the OS name and version into bash variables: https://unix.stackexchange.com/a/6348
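
If you ever need the same information from Python instead of bash, a minimal sketch that parses /etc/os-release does the trick (assuming an os-release-style distro):

def read_os_release(path="/etc/os-release"):
    """Return /etc/os-release as a dict, e.g. {'ID': 'rocky', 'VERSION_ID': '8.5', ...}."""
    info = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                info[key] = value.strip('"')
    return info

release = read_os_release()
print(release.get("ID"), release.get("VERSION_ID"))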

Then I figured I would try either AlmaLinux or Rocky Linux. They both came out around the time Red Hat said CentOS 8 was going away. Looking into both projects, they are backed by big players such as AWS and Equinix, which made me feel a bit better about them. I had heard a bit more about Rocky and its support, so I tried that. I dropped in the new repos and kickstart location, and everything just worked… Even things that were an issue when playing with Oracle Linux went away; for example, epel-release was once again called what it should be.

In the end so far it seems to be happy! We will see if any other small differences pop up and bite me…

Below is an example of the top of the kickstart I am using. If anyone is interested in more of how I create live images, leave a comment and I can do a post on it:

lang en_US.UTF-8
keyboard us
timezone Europe/Brussels --isUtc
auth --useshadow --enablemd5
selinux --disabled
network --device=eno1 --bootproto=dhcp
skipx
part / --size 4096 --fstype ext4
part /opt --size 4096 --fstype ext4
firewall --disabled

url --url=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/kickstart/

# Root password
rootpw --iscrypted <Insert encrypted password here>

repo --name=baseos --baseurl=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/os/ --install
repo --name=extras --baseurl=https://download.rockylinux.org/pub/rocky/8/extras/x86_64/os/ --install
repo --name=appstream --baseurl=https://download.rockylinux.org/pub/rocky/8/AppStream/x86_64/os/ --install

Migrating Chrome Plugins from Manifest v2 to v3 Impressions

Google has recently decided that soon everyone will need to migrate Chrome plugins from Manifest v2 to Manifest v3. The one big change for me, other than some new syntax, was that you can no longer inject scripts into webpages. A lot of the changes in Manifest v3 are around the security context for plugins, which is good to see. In the past I could append to a webpage's <script> data and have the page process that script in the page's context; now all that processing has to take place within the plugin itself instead of on the page. You can still add to pages, but it has to be more static content instead of dynamic.

One change that creates for you is around which browser context you are working in. If you are running on the page, you can directly hit all aspects of the page and make AJAX requests under the user's context. Now any scripting you want done has to happen in the plugin itself, and if you want to access a non-public asset, the plugin has to have the user log in to it directly. If you attempt to inject scripts onto the page, you will get a CORS error stating it is from a different context.

For the main plugin I dabble with and work on, the API I access is open, which lets me not worry too much about which browser context I am working in. If it were an authed API, I would have to worry about having the user auth to the plugin itself. I moved all the logic from a split model, where the plugin did some of the work and then handed high-level data to scripts injected into the webpage, to doing all the work in the plugin and then injecting the final results (HTML) and the assets I want to change onto the page. In the end, this leads to a cleaner solution and centralizes all the logic.

A big added benefit I saw in switching from Manifest v2 to v3 was the security review process that runs when you upload an updated plugin: you get approved faster than in the past. My updated plugin was approved in around a day (note that the plugin I was working on is relatively small).

Hardening Embedded Apache Tomcat 9

I was recently working to make sure some of my web apps could pass a Tenable Nessus security scan. Since I tend to use the same embedded Tomcat setup for a lot of the apps, I kept hitting similar findings. I had to do a bit of digging to find some of these answers, so I thought I would document them. If anyone else has helpful tips for embedded Tomcat, please feel free to comment!

Apache Tomcat Default Files

Apache Tomcat Default Files | Tenable®

The main issue behind this finding is that the 404 page the app presents includes the Tomcat version number. This could be an issue because if there is a vulnerability in that version, you can be targeted.

final Tomcat tomcat = new Tomcat();

// Swap in an ErrorReportValve with the report body and server info hidden,
// so error pages stop leaking the Tomcat version.
var host = (StandardHost) tomcat.getHost();
var errorReportValve = new org.apache.catalina.valves.ErrorReportValve();
errorReportValve.setShowReport(false);
errorReportValve.setShowServerInfo(false);
host.addValve(errorReportValve);

errorReportValve.setProperty("errorCode.0", "empty.html");

The above line can be used if you want to specify a 404 page to use instead.

Source: https://stackoverflow.com/a/59967152

Web Application Potentially Vulnerable to Clickjacking

Web Application Potentially Vulnerable to Clickjacking | Tenable®

This finding is because the application is not sending the proper X-Frame-Options or Content-Security-Policy headers.

final Tomcat tomcat = new Tomcat();

final Context ctx = tomcat.addContext("/", MY_FILE_LOC);

// Tomcat's built-in HttpHeaderSecurityFilter adds X-Frame-Options,
// X-XSS-Protection, and X-Content-Type-Options headers to responses.
FilterDef httpHeaderSecurityFilter = new FilterDef();
httpHeaderSecurityFilter.setFilterName("httpHeaderSecurity");
httpHeaderSecurityFilter.setFilterClass("org.apache.catalina.filters.HttpHeaderSecurityFilter");
httpHeaderSecurityFilter.addInitParameter("antiClickJackingEnabled", String.valueOf(Boolean.TRUE));
httpHeaderSecurityFilter.addInitParameter("antiClickJackingOption", "DENY");
httpHeaderSecurityFilter.addInitParameter("xssProtectionEnabled", String.valueOf(Boolean.TRUE));
httpHeaderSecurityFilter.addInitParameter("blockContentTypeSniffingEnabled", String.valueOf(Boolean.TRUE));
httpHeaderSecurityFilter.setAsyncSupported(String.valueOf(Boolean.TRUE));

// Map the filter to every request in the context.
FilterMap httpHeaderSecurityFilterMap = new FilterMap();
httpHeaderSecurityFilterMap.setFilterName("httpHeaderSecurity");
httpHeaderSecurityFilterMap.addURLPattern("/*");
httpHeaderSecurityFilterMap.setDispatcher("REQUEST");

ctx.addFilterDef(httpHeaderSecurityFilter);
ctx.addFilterMap(httpHeaderSecurityFilterMap);

Source: https://github.com/jiaguangzhao/base/blob/905aaf4111f4779e236043ff423951672ade848a/src/main/java/com/example/base/aop/configure/TomcatConfigure.java

CentOS/RHEL 8 Autologin Fix

I have a PXE environment that requires systems to boot up, then automatically log in and start a program on boot. All of a sudden this stopped working after years of working fine. It took me a while to figure out, so I figured I would post it in case anyone else runs into this.

I have been doing autologin the recommended systemd way for a while, as shown here: https://wiki.archlinux.org/title/Getty. I copied /lib/systemd/system/getty@.service to /etc/systemd/system/getty@tty1.service, then edited it with sed in the build pipeline. In the end the line was:

ExecStart=-/usr/bin/agetty --noclear %I $TERM --autologin username

This worked for YEARS, then suddenly stopped. While investigating, I saw another file being written next to mine at /etc/systemd/system/getty@tty1.servicee, with an extra e added to the end of service, making it servicee. After a lot of playing around and looking at other guides, I figured out that there had been an update to systemd/getty and it now cares that all options come before the terminal argument. Changing the line to the following fixed it.

ExecStart=-/usr/bin/agetty --noclear --autologin username %I $TERM 
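
The edit in the pipeline is just a one-line substitution on that unit file; here is the same thing sketched in Python, in case it is clearer (I actually use sed, and the username is a placeholder):

import pathlib
import re

# Rewrite the ExecStart line of the copied getty unit so --autologin comes
# before the terminal arguments, which is the ordering agetty now insists on.
unit = pathlib.Path("/etc/systemd/system/getty@tty1.service")
text = re.sub(
    r"^ExecStart=.*$",
    "ExecStart=-/usr/bin/agetty --noclear --autologin username %I $TERM",
    unit.read_text(),
    flags=re.MULTILINE,
)
unit.write_text(text)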

Homelab: 802.1x 2021

One technology I have played around with a little at work but wanted to get a better handle on is 802.1x. I took and passed the Cisco ISE cert a few years back and have used it with other services at work, but for the home setup I mostly wanted to be able to put different wireless devices onto different VLANs based on device and user. Windows Server natively makes this possible with Network Policy Server (NPS).

An example of me playing with Network Policy Server

NPS is, at the end of the day, a RADIUS server in Windows. It gives you a conditions-and-rules system to respond to different RADIUS calls, as well as a way to set up accounting. It is fairly simple compared to something like ISE, which can also do posture and profiling for devices, but it works well as a quick, free solution for home. You can say, if a client is attempting to authenticate over something like wireless, then accept these methods, versus if it is a wired or switch login, then accept other forms. Instead of going point by point through how to set it up, which you can find elsewhere online, I want to give some high-level edge cases you may run into. First, NPS needs Windows Server with the Desktop Experience; if you are running member servers or domain controllers as Server Core to simplify the environment, it will not work. NPS also does not do HA easily. You can run multiple servers with it, export the config from one, then import it into another, but there is no good system for dynamically syncing these (unless you call random people's PowerShell scripts a good system).

One good reason to use NPS is the simple AD integration: users can authenticate with their domain credentials and easily get access. Or do as I do (really too much for home, or possibly anywhere): set up a domain CA, have a GPO that issues certs for each machine, then use cert-based auth via 802.1x, deployed via GPO. If anyone has questions about this I am happy to answer, but there are many places online that cover each of those configs and how to do them. Another place to integrate RADIUS, other than 802.1x for wired and wireless, is network device login. I use RADIUS for the stack of Ruckus switches I have at home (2 is considered a stack, like when you run k3s as a "cluster" of 1).

This is one of those Windows services that works well but also has not been touched by Microsoft in YEARS, like WSUS or any other service that is useful. To back up this point, I installed several old versions of Windows Server I had laying around on ESXi. Lesson 1 that I learned: the web console doesn't work well with some of the legacy mouse support. Second, you may need the legacy VMware Tools ISO (see "VMware Tools support for Windows 2000, Windows XP, and Windows Server 2003 (81466)"). The internet seems to say NPS first appeared in Server 2008.

https://social.microsoft.com/Forums/getfile/51145/

Converting .heic on Windows With Open Source Tools and a Context Menu Shortcut

While taking photos and uploading them places, like this blog, I get the photos from the iPhone in .heic format and then need to convert them to JPEG for WordPress. There are a few paid options and some questionable freeware out there to do it, but I wanted to use open source tools. ImageMagick is an open source tool that can do the conversion, but it requires the command line, so I found the registry keys needed to add a right-click context menu to convert the images!

The context menu only shows up when selecting a .heic file, which is a nice way to do it. How to install:

  • Install ImageMagick (link), the version I got was “ImageMagick-7.0.11-4-Q16-HDRI-x64-dll.exe”
  • Copy the following lines into a text document
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\SystemFileAssociations\.heic\Shell\convertojpeg]
@="Convert To JPEG"

[HKEY_CLASSES_ROOT\SystemFileAssociations\.heic\Shell\convertojpeg\command]
@="\"C:\\Windows\\System32\\cmd.exe\" /C magick.exe mogrify -verbose -format jpg \"%1\""

  • Name it install_imagemagick.reg (or really anything.reg)
  • Open that file from File Explorer

After installing, you should be able to right-click a .heic photo and select "Convert To JPEG". I did not need to restart, log out, or restart Explorer. I am calling cmd.exe first instead of the program directly because this lets you update ImageMagick without needing to link directly to the executable's path.
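
If you would rather batch-convert a whole folder from a script instead of right-clicking each file, here is a rough sketch that shells out to the same ImageMagick command (it assumes magick is on your PATH, and the folder path is a placeholder):

import pathlib
import subprocess

# Convert every .heic file in the folder to JPEG using the same mogrify
# command the context-menu entry runs.
folder = pathlib.Path(r"C:\Users\me\Pictures\import")
for heic in folder.glob("*.heic"):
    subprocess.run(["magick", "mogrify", "-verbose", "-format", "jpg", str(heic)], check=True)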

Using a Custom User-Agent with Google OAuth Client in Java

I have been using the Google OAuth client for some of my projects at work for a while. A recent request was to add custom user-agent strings to different apps for the people doing analytics on which apps are using the authentication servers. I have some functions that make custom HTTP GET calls using the Bearer token we get from the OAuth flow, and the library also makes its own calls behind the scenes. I was able to add a user-agent to my own calls easily, but the under-the-hood calls the library makes kept coming up as "Google-HTTP-Java-Client/1.34.2 (gzip)". I tried a few different approaches, and at the same time was searching online, and didn't see anyone speaking about this. Below is a quick block to put into your app if you want to set the user-agent.

These are the current versions of the OAuth library and the HTTP client I have been using to do auth.

compile group: 'com.google.oauth-client', name: 'google-oauth-client', version: '1.31.4'
compile group: 'com.google.oauth-client', name: 'google-oauth-client-servlet', version: '1.31.4'
compile group: 'com.google.http-client', name: 'google-http-client', version: '1.39.0'
compile group: 'com.google.http-client', name: 'google-http-client-jackson2', version: '1.39.0'

For my setup, I have the OAuth servlet that initializes the OAuth flow, then a second servlet that handles the callback, as documented here. In the "class OauthCallback extends AbstractAuthorizationCodeCallbackServlet" I added the following ConnectionFactory inside the override of the initializeFlow() function. Replace "myApp-v1.0.1" with your app name. Hope this helps someone!

@Override
protected final AuthorizationCodeFlow initializeFlow() throws IOException {
    // Wrap every connection the library opens so we can set our own User-Agent header.
    ConnectionFactory connectionFactory = url -> {
        HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
        httpURLConnection.setRequestProperty("user-agent", "myApp-v1.0.1");
        return httpURLConnection;
    };
    return new AuthorizationCodeFlow.Builder(BearerToken.authorizationHeaderAccessMethod(),
            new NetHttpTransport.Builder().setConnectionFactory(connectionFactory).build(),
            new JacksonFactory(),
            .... (code removed);
}

Homelab: Hypervisors – Part 2 – VMware

What I want to say is that after deciding it was time to move to VMware and try vSAN instead of Storage Spaces Direct (S2D), I intended to research the hardware I had and make sure it would work on ESXi 7.0. But of course I did not thoroughly read all of the changes vSphere 7.0 brought. The holiday was approaching, and I was going to use that time to do my migration. I had read up on vSAN and knew I needed cache drives, so I bought a few small (250GB) NVMe drives to put into each system. Getting those drives installed took a day because I needed to create a custom 3D-printed mount, but that would give me a good speed boost for my storage no matter what. Having recently upgraded to 10Gb networking, I already had HP and SolarFlare 10Gb network cards. The time came, and I copied all of the VMs I had, in Microsoft VHDX format, to my NAS (which wasn't getting changed), then unplugged the first hypervisor and attempted an ESXi 7.0 install.

One hardware change I should note: I am using 128GB USB 3.0 thumb drives for the ESXi OS. This also let me leave the original Windows drive untouched, allowing an easy rollback if this turned into a nightmare. I put the ESXi 7.0 disk into the first system and... error, no network card found… I started searching online and quickly found a lot of people pointing to this article. ESXi 7.0 cut a ton of network driver support; everything from the Realtek motherboard NIC to the 10Gb SolarFlare card was no longer supported, with no way around it (I tried). It comes down to this: 6.x had a compatibility layer where Linux drivers could be used when there were no native drivers, and 7.0 removes it. I then got an ESXi 6.7 installer (VMware doesn't let you just download older versions on a random account, but Dell still hosts their version) and installed that. Everything came online and started working. Now that I knew that was the only thing blocking me, I installed all my systems with 6.7 while I waited for the 3 new Supermicro AOC-STGN-i2S Rev 2.0 Intel 82599 2-port 10GbE SFP+ cards I had ordered. Using the Intel 82599 chipset, they have wide support. Two ports is nice, and the 2.0 revision of the card is compact, allowing it to fit into my cases. So far I recommend them; they are also around $50 on eBay, which is not bad.

I played with a few of the systems, but decided to wait until the new network cards arrived a few days later before initializing vSAN and copying all of the data back over. I used this guide, from the same author as the other post about ESXi 7.0 changes, to configure the disks in the systems the way I wanted them. At one point I thought I was stuck, but I just had to have VMware rescan the drives. I set up a vSphere appliance on one of the hosts. This gives me all the cluster functionality and a single webpage to manage all the hosts. Here I can also create a "Distributed Switch," which is a virtual switch template that can be applied to each of the hosts. I can set the VLANs I have, and how I want them to work, in one place, then deploy it to all the systems easily. This works as long as all your hosts have identical network configurations. After watching a YouTube video or two on vSAN setup, I went ahead and set that up. The setup was straightforward, the drives reported healthy, and I was ready to put some data on it.

A small flag about vSAN: it uses a lot of RAM to manage itself and track which system has what. I was seeing about 10-12GB of RAM used on each of my hosts, which only have 32GB to begin with. There are guides online for this, and I believe it can be tweaked; it has to do with how large your cache drive is and your total storage. Not a big deal, but if you are running a full cluster it is something to be aware of.

Migrating the old VMs from their Hyper-V disk images to VMware was not too difficult. I used qemu-img to convert from VHDX to VMDK. The VMDK images that qemu-img creates are the desktop version of the VMDK format; VMware's desktop products create slightly different disk images than the server products. I then uploaded these VMDKs onto the vSAN and used vmkfstools in the ESXi shell to convert the images to the server version. The Windows systems noticed the changes, did a hardware reset, and worked right away. The Linux systems (mostly CentOS 8) would not boot under any of the SCSI controllers VMware offered. After reading online and a bit of guessing, I booted them with the IDE controller, which appeared to be the only one dracut had modules for. Once the systems were online I could do updates, and with the new kernel version available they generated new initrd images. Because those images were created on the platform with the new virtual hardware, they included the SCSI controller modules, and the VMs could then be switched from IDE to SCSI.
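
For reference, here is a rough sketch of how that conversion step could be batched with qemu-img; the paths are placeholders, and the vmkfstools command noted in the comment is the follow-up step run from the ESXi shell:

import pathlib
import subprocess

# Convert each Hyper-V VHDX into a (desktop-format) VMDK with qemu-img.
# After uploading to the datastore, something like:
#   vmkfstools -i staged.vmdk -d thin server-format.vmdk
# run from the ESXi shell re-creates the disk in the server format.
src = pathlib.Path("/mnt/nas/hyperv-exports")
dst = pathlib.Path("/mnt/nas/vmdk-staging")
dst.mkdir(exist_ok=True)

for vhdx in src.glob("*.vhdx"):
    out = dst / (vhdx.stem + ".vmdk")
    subprocess.run(
        ["qemu-img", "convert", "-p", "-f", "vhdx", "-O", "vmdk", str(vhdx), str(out)],
        check=True,
    )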

So far, other than the hardware changes that needed to happen, moving to VMware has worked out well. I am using a VMware User Group license, https://www.vmug.com/, which is perfect for homelabs and doesn't break the bank. I am starting to experiment with some of the newer or just more advanced VMware features I have not used before. We spoke of vSAN; I have also set up DRS (Distributed Resource Scheduler, which allows VMs to move between hosts as resources are needed), and I want to set up a key manager server to play with VM encryption and virtual TPMs.

Now that I am off of that… unsupported… Storage Spaces Direct configuration, updates are much easier. I can put a host into maintenance mode, which moves any running VMs, then reboot it, and once it is back online things reshuffle. This does mean I need enough capacity on the cluster for a third of it to be off at a time, but that is OK. I am running 32GB of RAM with 2 empty DIMM slots in each system, so when the time comes I can inexpensively add more RAM.

If you or your work has a NetApp subscription, there is a NetApp Simulator, a cool OVA you can deploy on VMware to learn NetApp-related things. I was using that at work to learn how to do day-to-day management of NetApps. Another neat VM image in OVA form that I found recently is Nextcloud's appliance. It is a single OVA with a great flow for taking you through configuring their product.

Overall, the VMware setup has been as easy as I thought it could be. Coming from a workplace that runs its management systems without a lot of access, it has been nice having vSphere 7.0; it automatically checks in online and lets me know when there are updates for different parts of the system.