Projects

Missing Email Alerts from LibreNMS

I realized that I hadn’t gotten any alerts from LibreNMS recently, including when I rebooted devices for patching. After going to the “Alert Transport” settings and attempting to send a test message, I got “SMTP Error: Could not authenticate.” Others seem to have started getting this recently as well. (Link)

It turns out that after May 31st, 2022 (although for me it seemed more like June 6th) Google disabled simple password logins for Gmail accounts. You need to enable two-factor authentication, then create an app-specific password for LibreNMS. This was a good quick guide on how to do that. LibreNMS sends alerts when something is wrong, but nothing alerts you that the alerting itself still works, so if you use LibreNMS with Gmail it may be worth checking.
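If you want to confirm the app password works outside of LibreNMS, a small script can do the same SMTP login. This is only a sketch: the host smtp.gmail.com and port 587 with STARTTLS are my assumptions about a typical Gmail setup, and the addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message to prove the login works."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "LibreNMS SMTP test"
    msg.set_content("If you can read this, the app password works.")
    return msg


def send_via_gmail(msg: EmailMessage, username: str, app_password: str,
                   host: str = "smtp.gmail.com", port: int = 587) -> None:
    """Log in with the app-specific password and send the message.

    Host and port are assumptions; use whatever your alert transport uses.
    A rejected password raises smtplib.SMTPAuthenticationError.
    """
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(username, app_password)
        smtp.send_message(msg)


# Example (placeholder values, not run here):
# msg = build_test_message("me@example.com", "me@example.com")
# send_via_gmail(msg, "me@example.com", "app-password-here")
```

If the login is rejected you should see an authentication error here too, which at least tells you the problem is the account and not LibreNMS.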

Computer Vision for Datacenter Auditing

I am going to start a series of posts on random ideas I have had but have not had time to fully implement. The first in this series is an idea I worked on about 3 years ago (November 2019): auditing a datacenter and mapping each system’s physical location to its logical one on the network.

The core of the idea is to use cameras in a datacenter (these could be the security cameras) to see the servers in the racks, then use that data to map out the datacenter and save the administrators from having to perform these actions manually. The process begins by training a machine vision model on what a server looks like. Most of the time at work I am working with Dell servers, so I thought that was a good starting point. To keep the model generic enough, I was just attempting to train it on what a 1U server looks like vs. what a 2U server looks like.

At this point I needed A LOT of photos of servers with different lighting and angles. I took a bunch myself of different racks I had as a seed set, then I turned to the web. Where could I get a large assortment of photos of Dell servers in different configurations and lighting? The homelab section on Reddit! People post their home setups there all the time. I went through and downloaded several hundred photos of different people’s setups. Another place to get photos was eBay, where a lot of sellers put up photos of servers in different settings; the downside is that a lot of people reuse the same photos again and again. I don’t know if the internet has figured out yet what the copyright rules are for using photos from online to train a model.

I researched a bunch of different techniques and was playing around with OpenCV, but then found a tutorial that seemed to be in line with what I was looking to do. (This one is also good, and very similar material) I researched different image processing models, and played around with several.

Now that I had the photos, I downloaded LabelImg (GitHub: tzutalin/labelImg), a graphical image annotation tool for labeling object bounding boxes in images. With this tool you go to each photo, select the item you are trying to learn, and label it. This took a while; it’s fully manual work. A lot of the photos from the web had multiple servers in them, and each one needed to be selected. This proved to be one of the more time-consuming parts of the project. I had to manipulate the photos so the rectangular bounding boxes could fit the servers, even when the photos were taken at weird angles.

I had to pick some of the photos to be the training set, then other photos to be the testing set. With that metadata ready and everything marked, I converted the final metadata from XML to CSV using the xml_to_csv.py provided in the example repos above. That was then fed into TensorFlow. The system I had to start with for this, other than a laptop, was a CentOS 7 server. This proved to be very annoying because some dependencies, such as protobuf, were not available in new enough versions and had to be custom compiled.
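To give a feel for the XML-to-CSV step and the train/test split, here is a minimal sketch. It is not the xml_to_csv.py from the tutorial. It assumes LabelImg saved Pascal VOC style XML, and the column order, the 80/20 split, and the label names are my own illustrative choices.

```python
import csv
import io
import random
import xml.etree.ElementTree as ET

COLUMNS = ["filename", "width", "height", "class",
           "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Turn one Pascal VOC annotation into one row per labeled box."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows


def split_train_test(filenames, test_fraction=0.2, seed=0):
    """Shuffle the photos reproducibly and hold some back for testing."""
    names = sorted(filenames)
    random.Random(seed).shuffle(names)
    n_test = max(1, int(len(names) * test_fraction))
    return names[n_test:], names[:n_test]


def rows_to_csv(rows):
    """Serialize rows with a header line, returning the CSV as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()
```

Splitting by photo rather than by bounding box keeps all the servers from one photo on the same side of the split, so the test set is made of images the model has never seen.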

It was time to let the model run for a while and see what it could learn. Several important things were learned in this process. First, if you have GPUs, make sure you have a TensorFlow build that is compiled and ready to use them. The speedup you get with them versus without is kind of crazy. More RAM and more GPU also help a lot to speed up the process. At first I was playing with this on just a laptop, and that one didn’t have the GPU drivers for CUDA. It was taking DAYS to work on the model. Later I switched to using GPUs I had in a server, and this greatly increased the iteration cycle speed.

Off the bat it was able to get a decent percentage recognition of the servers in the photos I had presented it! I do think a lot of the photos I then tested it on were fairly ideal conditions, with good lighting and camera angle. This may give a better than real world experience with it working. To improve the model I can always find more photos and train it with more images. I was able to get the model to recognize about 80% of the servers in racks I showed it at this time. Another factor that could help in the future is the evolution of cameras. A lot of places are replacing 720p/1080P cameras with 4k cameras, the more resolution the system has to work with the better.

The next step I wanted to do was start matching physical location to logical. The idea behind this is, I can find regions in a photo or video where servers are, and each server through its iDrac/IPMI allows me to blink front chassis lights. So one host at a time I will have automation send the command to blink the front chassis lights, and perhaps some lights on the HDDs, then scan for which region in the photo has started to blink!
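The “which region started to blink” step can be sketched as a frame difference. This is a toy version under my own assumptions: frames are grayscale images stored as lists of rows, the server regions are already known as (x0, y0, x1, y1) boxes from the detector, and the threshold and region names are made up.

```python
def mean_abs_diff(frame_a, frame_b, box):
    """Average per-pixel brightness change inside one bounding box."""
    x0, y0, x1, y1 = box
    total = 0
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            total += abs(frame_a[y][x] - frame_b[y][x])
            count += 1
    return total / count if count else 0.0


def find_blinking_region(frame_off, frame_on, boxes, min_change=10.0):
    """Return the name of the region that changed most, or None.

    frame_off is captured before the blink command is sent and
    frame_on while the chassis light should be lit.
    """
    best_name, best_score = None, min_change
    for name, box in boxes.items():
        score = mean_abs_diff(frame_off, frame_on, box)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

In practice you would probably compare several frames over a few seconds, since a single pair can be fooled by someone walking past the rack or by a blink that falls between captures.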

This is the idea I have slowly worked on for the last little while, I have prototypes of most of it working, but have not had a lot of time to put into it.  The hope would be we could use existing cameras to get the footage we need to map existing datacenters we have. Then perhaps in the future port this system to something like Hololens, or Apple/Meta AR system. Once we have that mapping, now we can start to draw out the physical servers and their location in the world/racks on a webpage, and make it easier for people working in a datacenter to find boxes they need. Hopefully one day allowing for people to click a server on a webpage, and then connect into its controller without a human painstakingly going to each box and doing this mapping. Of course all of this is fixed by a team labeling each server, but where is the fun in that.

Cisco ISE 2.X Certificate Expiration

Quick post: the other day an HA pair of ISE boxes I have in a lab had their certificates, which I had made with a Windows Certificate Authority, expire, and I ran into some odd behavior. To be clear, in this scenario the certificates had a valid chain of trust, but they were past their expiration date.

I logged in after realizing this and saw odd behavior: node-A could not read node-B’s certificates. Both nodes said they were no longer on the domain, even though the domain disagreed and I had logged in with domain credentials that were recently changed. Then when I went to make a Certificate Signing Request (CSR), I was able to create it, but when I went to download it I got a generic message of “Cannot connect to node-a”. While all these issues were going on, both nodes were still sharing health data under “Node Status” on the dashboard.

In the end, ISE gets weird when the cert date has expired. I generated a new self-signed cert for node-A, then deleted the expired certs, because the system didn’t want me to make a CSR for the same thing it thought it already had a cert for. This allowed me to properly make a CSR and export it. That gave me “ciscoisenodea.pem”. I brought that over to my Windows CA and, from an admin command prompt, ran certreq -submit -attrib "CertificateTemplate:WebServer" ciscoisenodea.pem. I saved the result to my local desktop and went into ISE to bind it to the CSR. Node-A then rebooted. All of a sudden things like the domain pairing started showing they were working again. I did the same process on the second node, and everything was happy again. Note: make sure you have your admin backup password. One of the nodes DID refuse to talk to AD and I had to use it, while the other one said it wasn’t on the domain but did work…
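Since the whole mess started with a date quietly passing, a tiny check on the certificate’s end date would have warned me. The sketch below assumes you already have the “notAfter” string in the format OpenSSL prints (for example from openssl x509 -enddate); the function name and the idea of running it on a schedule are mine, not an ISE feature.

```python
import ssl
import time


def days_until_expiry(not_after, now_epoch=None):
    """Days left before a certificate's notAfter date (negative if expired).

    not_after looks like "Jan  5 09:34:43 2018 GMT".
    """
    if now_epoch is None:
        now_epoch = time.time()
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now_epoch) / 86400


# Example: warn when fewer than 30 days remain (threshold is arbitrary).
# if days_until_expiry("Jan  5 09:34:43 2018 GMT") < 30:
#     print("Renew the ISE certificate soon")
```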

Hope this helps someone out there!

Ruckus ICX 7150-C12P Switch Repair

A while ago I purchased a Ruckus ICX 7150-C12P off eBay to use at home. It gives 14x 1 Gb/s ports and 2 SFP+ ports. The SFP+ ports are limited to 1 Gb/s by default, and there is an honor-system license for upgrading them to 10 Gb/s. These switches go for $600 to $1200 depending on where you get them and which license you get (1 Gb/s vs 10 Gb/s). The switch also does PoE, and can do 4 PoE+ (30 watt) ports. I had one of these switches and it worked great. I wanted to get a second one to replace the WiFi link I was using across my apartment with a fiber link.

Instead of paying ~$250, which was their going rate on eBay, I followed a forum post I had seen about replacing this model’s power supply and thought I would give that a shot. I got a broken switch for $45, and then a PSU for $50. The PSU I used was an SL Power LB130S56K (56V, 2.32A, 130W). Armed with someone’s photos of doing this repair, it ended up going fine. The hardest part of the whole operation is that the pins going onto the main board are reversed from how the power supply comes, so you need to flip them. I have been running the unit for almost 2 years now without issue.

This model of switch is great because of its features and because it is fanless. The fanless part is nice for homelabs near your desk, because the switches are silent. Because they are fanless, they can’t have anything put on top of them, and they need some room to breathe. I think a lot of the dead ones you see online are because someone didn’t give the switch enough air and the PSU died. Note that when looking for a similar dead switch on eBay, you really want the seller saying “when plugged in nothing happens”, not “it periodically blinks”, because that could be bad RAM with the switch stuck in a boot loop.

Having run two of these switches for over a year, I can give some feedback. I really like them. I have the two in a stack, so I log in once and manage both. When it comes time for firmware updates, you SCP the file to the management IP, and it copies the file to both, then flashes and reloads. I came from usually using Cisco gear, or sometimes Arista; the CLI is a bit different, and Ruckus handles VLAN setup a bit weirdly, but once you get used to it, it makes sense. They are solid switches, with PoE, that you can set and forget for a while.

New PC Build 2022

Having built my PC a few years ago, I was thinking about upgrading it, but with chip prices being what they are, and graphics cards costing more than a new car, I thought I would wait. Then a friend of mine happened to have an AMD 5800X that he was willing to give me a deal on… After years of having the custom case, while it was nice, it was HUGE and made it difficult to set up a desk in a tiny New York apartment. I used this opportunity to shrink a bit and update some of the components. I say some, because some of them (like the graphics card) were going to stay to save cost.

Old case being taken apart; it’s a bit of a mess

At this point, this post is mostly a standard PC build, with a few hiccups along the way. Looking on the Small Form Factor PC subreddit, and https://caseend.com/ (a website for small pc cases), I settled on the ZZAW C3. It is relatively small, supports Micro-ATX, full size graphics cards, and all-in-one water coolers in a ~22 liter case. I always try to get Micro-ATX over Mini-ITX for my desktop because you get more PCI slots; as well as 4 DIMMs for RAM instead of 2. I also wanted to try one of these all-in-one water coolers, since I never had and thought I could get good results (and a quiet case).

The case came nicely wrapped. There were not a lot of instructions on how to put the case and system together; you kind of just had to know. That took a bit of trial and error. There are a few screws that are very similar sizes, and not labeled. The whole setup went smoothly after that, except that getting the cooler to fit in the case was a bit of a challenge, and the motherboard… So, the motherboard… I got an ASRock X570 Pro4 motherboard; it had all the features I wanted. I got the case put together, installed my previous power supply, kept the RAM and graphics card, moved my SSD over (I had a PCIe Gen 4 Samsung 980 Pro on order, but it hadn’t come in yet), put thermal paste on the CPU, installed the cooler AND… nothing. The system would not boot.

There were lights on the motherboard saying CPU, RAM, and DISK failed. I started searching online and trying different things. A bit in, someone said “make sure your motherboard is updated to support 5000 series AMD”. I just got this board, it’s fairly new, it has to work, right? Well, it turns out you need BIOS version 3.20 to support the CPU I have; ASRock is at 4.20 for this motherboard. After taking my old 2600X out of my old motherboard, taking out the new CPU, cleaning the thermal paste, and seating and re-pasting the 2600X, it turned out the board shipped with version 3.10… One revision before what I needed. I updated the BIOS, then swapped the CPU back, doing all those steps again. This time, the system worked.

I later got the Samsung 980 Pro SSD, which was a bit of a headache to migrate to because I had BitLocker on, and relocating the bootloader was not straightforward. I kept getting “Boot Device Not Found”, and eventually doing a system restore to before the last Windows Update somehow got Windows 10 to reinstall the bootloader on the drive. I have no idea why that would have fixed it. I had suspended BitLocker before starting this ordeal, but the suspension only lasts one reboot; I really needed to fully disable it to save me from typing in the recovery PIN many times.

The system is working well. The only remaining issue, which is very odd but which I am just living with, is that whenever the system attempts a restart, it freezes before coming back. A full shutdown works fine, and if I hold the power button and then turn it back on everything is fine. It just refuses to gracefully restart. Odd…

Towing a U-Haul with a Subaru Forester

Recently I was helping a family member move states. They had some larger, but light, furniture to move and we were trying to figure out a solution. Having recently gotten a 2021 Subaru Forester with a tow hitch on it I thought I would help them move those items with a U-Haul trailer. I could not find a lot online about this, other than a few Reddit/Forum posts; so I thought I would post about what I learned.

One of the main reasons we went with towing a rented U-Haul trailer over renting a truck was cost. U-Haul truck pricing is based on distance, while U-Haul trailers come with unlimited miles. The estimate we got for renting a truck was around $1,300 for 3 days; the trailer was $550, and we were renting it for 4 days.

First, the 2021 Forester (non-Wilderness package) is rated to tow 1,500lbs in the US. That is the big issue and the ceiling that you will hit. The 4×8 is 850lbs empty, and the 5×8 is 900lbs empty. That leaves us with only 600lbs of capacity when getting the larger one that can fit a bed. Is this a hard ceiling? No, but as people on Reddit and other sites have pointed out, towing over that can wear out your car (mostly the transmission) more, especially if you do it often. I knew the route I was going to take, and that 98% of it would be flat interstates. That, along with knowing I was towing lighter things, made me less worried about the weight.
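The weight math above is simple enough to keep in a tiny helper if you are comparing trailers. The numbers come from this post; the names are just for illustration.

```python
TOW_RATING_LBS = 1500  # 2021 Forester (non-Wilderness), US rating
EMPTY_WEIGHT_LBS = {"4x8": 850, "5x8": 900}


def cargo_headroom(trailer):
    """Pounds of cargo left before reaching the tow rating."""
    return TOW_RATING_LBS - EMPTY_WEIGHT_LBS[trailer]


for trailer in EMPTY_WEIGHT_LBS:
    print(trailer, cargo_headroom(trailer), "lbs of cargo")
```

So the smaller trailer only buys you an extra 50lbs of cargo, which is why the 5×8 that fits a bed was the easy choice.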

The next thing to worry about is how to hook it up. The U-Hauls come with a 2 inch ball mount coupler. My Forester had the hitch installed from Subaru. Subaru puts in a Class I, 1-1/4 inch receiver. I used the following ball mount, which ended up working perfectly.

Ball Mount: https://www.amazon.com/CURT-45572-Euro-Mount-Ball/dp/B003B3GX5E/

I spoke to U-Haul on the phone and they said you want the hitch to be level at about 18.5 to 18.75 inches off the ground. The Forester mount is about 14 inches off the ground, and the ball mount posted above adds 4 5/8 inches, giving you the height you want.
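Checking the height is just fraction arithmetic. A quick sketch using the measurements from this post (the 14 inch receiver height is approximate, so treat the result the same way):

```python
from fractions import Fraction

RECEIVER_HEIGHT = Fraction(14)             # inches, approximate
MOUNT_RISE = Fraction(4) + Fraction(5, 8)  # 4 5/8 inches
TARGET_LOW = Fraction(37, 2)               # 18.5 inches
TARGET_HIGH = Fraction(75, 4)              # 18.75 inches

ball_height = RECEIVER_HEIGHT + MOUNT_RISE


def within_target(height):
    """True when the ball sits inside the range U-Haul quoted."""
    return TARGET_LOW <= height <= TARGET_HIGH


print(float(ball_height), within_target(ball_height))
```

That works out to 18 5/8 inches, right in the middle of the range.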

I got the CURT 21410 Trailer hitch pin Amazon recommended. DO NOT GET THIS ONE. It is 2 inches long, with the extra metal around the mount for the hitch, this pin would not fit the stock Subaru hitch. It didn’t fit just by a hair. I ended up at the last minute going to Home Depot and getting a 2 1/4 inch long, 1/2 inch thick pin that fit.

Now that we had the hitch, we were off! I ordered a 5×8 U-Haul on U-Haul’s website 2 months before we needed it. I quickly got a call saying they didn’t have one at the local rental place, but that they would get back to me within 48 hours of the rental to tell me where to pick it up. The call never came. With less than 24 hours left, I called the main 1-800 number and worked with a nice lady on the phone to find where I could pick one up. A different facility than the one I had selected online, 30 miles away, had it available.

The next day I was there when I told them I would be to pick it up, and no one was there. There was a sign on the door saying “for help call X number”; someone answered and within 5 minutes was over to help me. The pickup place was a small business attached to a self storage place that checks once a day or so for pickups, and since I had changed the location at the last minute, it wasn’t noticed. I was just happy the person came so quickly to help me get going.

With having the hitch already on the car, pickup took less than 5 minutes. I didn’t know if they would inspect the car, or check anything, but he said just back up to the trailer and we will hook it up. I asked the older gentleman if there is any advice he would have for someone who has not towed before. He said to leave extra stopping room, and make wide turns. The trailer dropped right on the ball, then we attached the wiring for the lights, and checked they were working. The standard 4 pin hookup the Forester hitch came with was exactly what the trailer had. Another important thing I was told for towing, CROSS THE SAFETY CHAINS BEFORE ATTACHING TO THE CAR. Apparently if you don’t cross the chains you can get a ticket, and State Troopers LOVE to give out tickets for it.

The smaller trailers (4×8, 5×8) do not have brakes; they rely fully on your car’s brakes. This is something to think about; for instance, when parking on a hill, all the weight of the trailer rests on your car and its transmission. I used the parking brake a lot when parking the trailer. When I grabbed the trailer it had 2 wood blocks under the tires to keep it in place, and I asked to take those with me (which ended up being a very good idea). They helped when parking in some locations, as well as when you want to take the trailer off; if you don’t have them it will want to roll because it has no brakes of its own. When I went to drop it off, the other U-Haul location (I did a one-way trip) required those blocks to hold the trailer in place, so I once again was glad I had them. They may have had some more of their own, but get the blocks when you pick it up. You’ll be glad you did.

The trailer says max speed 55, and after a bit of getting used to it, I felt comfortable with that. You just have to get used to being in the right lane, and giving plenty of room when changing lanes. It takes a while to start and stop; be prepared for that. Some people online mentioned, and I felt it one or two times, that if you brake too hard and the trailer pushes forward on your car, the automatic transmission does not like that and can rev up in situations where you wouldn’t want it to.

I hope this post helps anyone who has a similar situation, and feel free to drop questions or your experience!

CentOS 8 Migration

I have a pipeline which creates live images to network boot different systems. Historically this has been based on CentOS. A little while ago I moved it to CentOS 8 because I had some newer hardware that was not supported on the older kernel of 7. Everything was working well until recently when CentOS 8 went end of life, and I could no longer rely on the CentOS 8 Docker containers.

The journey began for a new EL8 system. I wanted to stay on EL8 instead of switching to Stream because all the other systems I had running were EL8 (CentOS 8 or RHEL 8), and I wanted to keep compatibility. At the same time, I didn’t want to do a new build of the image, have things break, and not realize it was because of a CentOS Stream change upstream. The CentOS 8 Docker container I used also seems to have been pulled, which forced me to make this change now.

My first thought was Oracle Linux. It has been around for a while, is ALMOST drop-in compatible, and can be used without going and getting licenses (unlike RHEL). (There are some small silly things, like the package being “oracle-epel-release-el8” instead of “epel-release”.) This led to nothing but issues. I replaced all the repos I had in the image creation stage with Oracle Linux ones, then on every build I got a ton of “nothing provides module(platform:el8)” lines for any package that used yum/dnf modules. I spent a chunk of time on this, finding no real answers, and the one Oracle support page that looked like it could help said I needed to buy a support contract. Classic Oracle. At one point I thought it had something to do with a centos-release commit (rpms/centos-release, 89457ca3bf36c7c29d47c5d573a819dd7ee054fe on the CentOS Git server) where a line in os-release confuses dnf, but that line was present. Also, Oracle doesn’t seem to have a kickstart URL repo, which is needed to do this sort of network boot. They want the end user to set that repo up, which may be the source of my issues. This also touched on the issue described in “Disable Modular Filtering in Kickstart Repos” on the Red Hat Customer Portal, but I wasn’t even getting as far as a base OS install, after which I could have changed how the OS and dnf process modules.

In my searches I did find this nice script to get bash variables for OS and version. https://unix.stackexchange.com/a/6348
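The linked answer is a bash script. If your build tooling is in Python, the same information can be read from /etc/os-release with a few lines. This is a sketch of a different approach than the linked script, and it only handles the simple KEY=value and KEY="value" forms.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value.strip().strip('"').strip("'")
    return values


# Example use on a real system:
# with open("/etc/os-release") as handle:
#     info = parse_os_release(handle.read())
# print(info.get("ID"), info.get("VERSION_ID"))
```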

Then I figured I would try either AlmaLinux or Rocky Linux. They both came out around when Red Hat said CentOS 8 was going away. Looking into both projects, they are both backed by AWS and Equinix, who are big players, which made me feel a bit better about it. I had heard a bit more about Rocky and its support, so I tried that. I dropped in the new repos and kickstart location, and everything just worked… Even things that were an issue when playing with Oracle Linux went away. For example, epel-release was once again called what it should be.

In the end so far it seems to be happy! We will see if any other small differences pop up and bite me…

Below is an example of the top of the kickstart I am using, if anyone is interested in more of how I create live images, leave a comment and I can do a post on it:

lang en_US.UTF-8
keyboard us
timezone Europe/Brussels --isUtc
auth --useshadow --enablemd5
selinux --disabled
network --device=eno1 --bootproto=dhcp
skipx
part / --size 4096 --fstype ext4
part /opt --size 4096 --fstype ext4
firewall --disabled

url --url=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/kickstart/

# Root password
rootpw --iscrypted <Insert encrypted password here>

repo --name=baseos --baseurl=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/os/ --install
repo --name=extras --baseurl=https://download.rockylinux.org/pub/rocky/8/extras/x86_64/os/ --install
repo --name=appstream --baseurl=https://download.rockylinux.org/pub/rocky/8/AppStream/x86_64/os/ --install
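Since swapping distros came down to changing the url and repo lines, one option is to generate them from a single mirror variable in the build pipeline. The function below is a hypothetical helper, not part of my actual pipeline, and it assumes the mirror uses the same directory layout as the Rocky URLs above.

```python
def kickstart_sources(mirror, release="8", arch="x86_64",
                      repos=("BaseOS", "extras", "AppStream")):
    """Render the url and repo lines of a kickstart for one mirror."""
    base = "{}/{}".format(mirror.rstrip("/"), release)
    lines = ["url --url={}/BaseOS/{}/kickstart/".format(base, arch), ""]
    for repo in repos:
        lines.append(
            "repo --name={} --baseurl={}/{}/{}/os/ --install".format(
                repo.lower(), base, repo, arch))
    return "\n".join(lines)


print(kickstart_sources("https://download.rockylinux.org/pub/rocky"))
```

With that, trying another EL8 rebuild that keeps the same layout is a one-line change instead of a search and replace.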

Migrating Chrome Plugins from Manifest v2 to v3 Impressions

Google has recently decided that soon everyone will need to migrate from Manifest v2 of Chrome Plug-ins to Manifest v3. The one big change for me, other than some of the new syntax, was that you can no longer inject scripts into webpages. A lot of the changes for Manifest v3 are around the security context for plugins, which is good to see. In the past I could append to a webpage’s <script> data, and then have the page process that script in the page’s context; now all that processing has to take place within the plugin itself, instead of on the page. You can still add to pages, but it has to be more static content, instead of dynamic.

One thing this changes for you is which browser context you are working in. If you are on the page, you can directly hit all aspects of the page, and do AJAX requests under the user’s context. Now, any scripting you want done has to be done in the plugin itself, and if you want to access a non-public asset, the plugin requires the user to log in to the plugin itself. If you attempt to inject scripts onto the page you will get a CORS error, stating it’s from a different context.

For the main plugin I dabble with and work on, the API I access is open. This allows me to not worry about the context I am working in the browser too much. If it was an authed API, I would have to worry about having the user auth to the plugin itself. I moved all the logic from a split context of the plugin doing some of the work, then handing high level data to scripts injected into the webpages; to doing all the work in the plugin, then injecting final results (HTML data), and assets I want to change onto the page. In the end, this leads to a cleaner solution, and centralizes all the logic.

A big added benefit I saw to switching from Manifest v2 to v3 was in the security review process that is done when you upload an updated plugin, you get approved faster than in the past. For me, I got my new plugin approved in around a day (note the plugin I was working on is relatively small).

Hardening Embedded Apache Tomcat 9

I recently was working to make sure some of my web apps can pass a Tenable Nessus security scan. Since I tend to use the same embedded Tomcat for a lot of the apps I kept hitting similar findings. I had to do a bit of digging to find some of these answers so I thought I would document them. If anyone else has any helpful tips for embedded Tomcat please feel free to comment!

Apache Tomcat Default Files

Apache Tomcat Default Files | Tenable®

The main issue with this finding is that the 404 page the app presents includes the Tomcat version number. This could be an issue because if there is a vulnerability in that version, you can be targeted.

final Tomcat tomcat = new Tomcat();

var host = (StandardHost) tomcat.getHost(); 
var errorReportValve = new org.apache.catalina.valves.ErrorReportValve();
errorReportValve.setShowReport(false); 
errorReportValve.setShowServerInfo(false); 
host.addValve(errorReportValve);

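errorReportValve.setProperty("errorCode.0", "empty.html");

The above line can be used if you want to specify a page to serve instead. The special code 0 sets the default page for any status code without its own entry; use "errorCode.404" to target 404 responses specifically.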

Source: https://stackoverflow.com/a/59967152

Web Application Potentially Vulnerable to Clickjacking

Web Application Potentially Vulnerable to Clickjacking | Tenable®

This finding is because the application is not sending the proper X-Frame-Options or Content-Security-Policy headers.

final Tomcat tomcat = new Tomcat();

final Context ctx = tomcat.addContext("/", MY_FILE_LOC);

FilterDef httpHeaderSecurityFilter = new FilterDef();
httpHeaderSecurityFilter.setFilterName("httpHeaderSecurity");
httpHeaderSecurityFilter.setFilterClass("org.apache.catalina.filters.HttpHeaderSecurityFilter");
httpHeaderSecurityFilter.addInitParameter("antiClickJackingEnabled", String.valueOf(Boolean.TRUE)); 
httpHeaderSecurityFilter.addInitParameter("antiClickJackingOption", "DENY");
httpHeaderSecurityFilter.addInitParameter("xssProtectionEnabled", String.valueOf(Boolean.TRUE));
httpHeaderSecurityFilter.addInitParameter("blockContentTypeSniffingEnabled", String.valueOf(Boolean.TRUE));
httpHeaderSecurityFilter.setAsyncSupported(String.valueOf(Boolean.TRUE));

FilterMap httpHeaderSecurityFilterMap = new FilterMap();
httpHeaderSecurityFilterMap.setFilterName("httpHeaderSecurity");
httpHeaderSecurityFilterMap.addURLPattern("/*");
httpHeaderSecurityFilterMap.setDispatcher("REQUEST");

ctx.addFilterDef(httpHeaderSecurityFilter);
ctx.addFilterMap(httpHeaderSecurityFilterMap);

Source: https://github.com/jiaguangzhao/base/blob/905aaf4111f4779e236043ff423951672ade848a/src/main/java/com/example/base/aop/configure/TomcatConfigure.java
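Before re-running the full scan, it can be handy to check the response headers yourself. The sketch below is my own rough approximation of what the finding looks for (an X-Frame-Options of DENY or SAMEORIGIN, or a Content-Security-Policy with frame-ancestors); it is not how Nessus actually implements the check.

```python
def clickjacking_protected(headers):
    """True if the response headers restrict framing of the page."""
    normalized = {key.lower(): value for key, value in headers.items()}
    frame_options = normalized.get("x-frame-options", "").strip().upper()
    if frame_options in ("DENY", "SAMEORIGIN"):
        return True
    policy = normalized.get("content-security-policy", "").lower()
    return "frame-ancestors" in policy


# Example against a running app (URL is a placeholder):
# from urllib.request import urlopen
# with urlopen("http://localhost:8080/") as response:
#     print(clickjacking_protected(dict(response.headers.items())))
```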

CentOS/RHEL 8 Auto Login Fix

I have a PXE environment that requires systems to boot up, then automatically login and start a program on boot. All of a sudden this stopped working after years of working. It took me a while to figure it out so figured I would post in case anyone else ran into this.

I have been doing auto login the recommended systemd way for a while, as shown at https://wiki.archlinux.org/title/Getty. I copied /lib/systemd/system/getty@.service to /etc/systemd/system/getty@tty1.service, then edited it with a sed script in the build pipeline. In the end the line was:

ExecStart=-/usr/bin/agetty --noclear %I $TERM --autologin username

This worked for YEARS, then suddenly stopped. In investigating, I saw another file was being written next to mine at /etc/systemd/system/getty@tty1.servicee, with another e added to the end of service, making it servicee. After a lot of playing around with it and looking at other guides, I figured out that there had been an update to systemd/getty, and it now cares that all options come before the terminal arguments. Changing that line to the following fixed it.

ExecStart=-/usr/bin/agetty --noclear --autologin username %I $TERM
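My pipeline does this edit with sed, but the same rewrite can be sketched in Python if that fits your build better. This is an illustrative helper, not my actual script: it replaces whatever ExecStart line the copied unit has with one where every option comes before %I $TERM.

```python
def set_autologin(unit_text, username, agetty="/usr/bin/agetty"):
    """Rewrite the ExecStart line so options precede the terminal args."""
    new_line = "ExecStart=-{} --noclear --autologin {} %I $TERM".format(
        agetty, username)
    out = []
    for line in unit_text.splitlines():
        out.append(new_line if line.startswith("ExecStart=") else line)
    return "\n".join(out) + "\n"


# Example use in a build step (path taken from this post):
# path = "/etc/systemd/system/getty@tty1.service"
# with open(path) as handle:
#     fixed = set_autologin(handle.read(), "username")
# with open(path, "w") as handle:
#     handle.write(fixed)
```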