Author: danberk

Ender 3 Pro Safety PSA

I have had an Ender 3 Pro for over 3 years. A year and a half ago I replaced the main controller board with the v4.2.7 silent board, and I was surprised how much that lowered the noise from the 3D printer. I have also added more parts like the auto-leveling bed probe; doing that meant compiling my own firmware to make sure all the add-ons worked.

Everything was great until recently, when I noticed a bad smell coming from the printer. I had gotten an enclosure over the holidays and thought I just wasn’t used to the smell being so concentrated. It was much worse than the normal PLA smell, though, and after a few more prints I started searching to see if anyone else had this happen. After searching the Ender 3 subreddits, I found posts talking about how the terminals can melt.

I opened up the controller compartment and was shocked to see how badly the terminals had been melting! Apparently the wires are tinned, and over time the movement of the printer works them loose from the terminal. This leads to the power arcing and melting the plastic. The suggested fix is to replace the terminals (or the board), and then install “ferrules” on the ends of the cleaned cables.

I contacted Creality for support; they redirected me to Amazon, which opened an email chain that Amazon never responded to. Since the v4.2.7 board was only $30, I bought a new one rather than deal with all of that. After I installed the ferrules and the new board, the printer came right back up, and inserting my SD card, which still had the last firmware I compiled on it, brought it 100% back.

Quick Game Review: Firewatch

Firewatch is a game I have had on my virtual shelf for a long time. I recently got a Steam Deck and figured I would give it a try; the game is even “Verified” for the Steam Deck. I went in not knowing anything about the story, gameplay, or art style. This is a story-driven game that, through a quick opening scene, puts you into the shoes of the main character, who is spending a summer in an old fire lookout tower.

There are moments where the game feels action-y, like you have to get somewhere quickly, but in the end it is similar to other games in the “walking simulator” genre. Most of the game you are going from point to point while the story unfolds as you go. I went through the whole game in a little over 3 hours and HIGHLY enjoyed it. The story was great and takes your actions and conversations into account, changing the outcome as you go. There is also a way to play through the game with creator commentary on: you play again like normal, but around the map you find stands that mark commentary points. Shortly after finishing the game for the first time, I started another playthrough with this turned on. You know a game has to be good when, years after it was released, it still has an active subreddit.

I would recommend this game, but parts of the story are sad, so be prepared for that. At its standard price of $20 USD I think it is worth it, and it also goes on sale regularly. I was excited to see that the developer was planning another game, set in Egypt; then it turned out the studio got bought by Valve and the team is now on different projects.

(Art from fans on the subreddit)

Adding Content Security Policy (CSP) Support to Embedded Tomcat 10

Continuing the series on hardening embedded Tomcat in Java to pass Nessus security scans, I am back with an example of adding a Content Security Policy to your app. There are ways to provide CSP headers in a standalone Tomcat server, but with an embedded server it can be more difficult.

I have used an embedded Tomcat server for years to build applications. The following example uses Tomcat 10, but the principle is the same for Tomcat 9. The main difference in the Tomcat 9 to 10 transition is moving from the javax namespace to jakarta. With more and more libraries, such as jOOQ, moving to modern Java versions, and with newer Java versions offering good performance improvements out of the box, it may be time for everyone to move to the Jakarta namespace (even if that means leaving some libraries, such as Google's OAuth client, behind).

In my recent example project going over how to use Pac4J for OAuth with Tomcat 10, I have added an example here of what the FilterBase class would look like; a rough sketch of the same idea is below. You then need to initialize the filter where you are starting the Tomcat thread. That will add the needed header to all the web requests your application processes.
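
The sketch below shows the general shape of that, implementing jakarta.servlet.Filter directly and registering it with Tomcat’s own FilterDef/FilterMap classes. The class name, policy string, and helper method here are placeholders of mine, not the exact code from the repo; adjust the policy to whatever your app and your scanner need.

import java.io.IOException;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;

import org.apache.catalina.Context;
import org.apache.tomcat.util.descriptor.web.FilterDef;
import org.apache.tomcat.util.descriptor.web.FilterMap;

// Stamps a Content-Security-Policy header on every response the app serves.
public class CspFilter implements Filter {

    // Example policy only; tighten or loosen to match what your app actually loads.
    private static final String POLICY =
            "default-src 'self'; frame-ancestors 'self'; form-action 'self'";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Add the header, then continue down the filter chain as normal.
        ((HttpServletResponse) response).setHeader("Content-Security-Policy", POLICY);
        chain.doFilter(request, response);
    }

    // Call this where you build the embedded Tomcat Context, before tomcat.start().
    public static void register(Context ctx) {
        FilterDef def = new FilterDef();
        def.setFilterName("cspFilter");
        def.setFilter(new CspFilter());
        ctx.addFilterDef(def);

        FilterMap map = new FilterMap();
        map.setFilterName("cspFilter");
        map.addURLPattern("/*");
        ctx.addFilterMap(map);
    }
}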

Pac4J Integration with Embedded Tomcat 10 using Generic OAuth via Keycloak

(I will ramble for a bit; if you just want the guide, jump below.) I want to start a series more around programming than the other articles I have put up here. I know everyone here knows me as the good-looking hardware-hacking guy, but most of my time at work is spent on programming and systems automation. I haven’t used the programming tag on this blog in a while, so I want to begin this new series by discussing upgrading to Tomcat 10.

I have for years been using an embedded Tomcat + Servlet backend with a jQuery frontend for the different small webapps I have made. I know that sounds very old to anyone who learned webdev in the recent past. I am using Dropwizard and React more and more these days, but that does leave my legacy projects on this old framework. While not the newest or flashiest thing, it performs well, with some systems handling hundreds to thousands of calls a second. (My hope is to get approval to open source some of them soon.) With the changes in the Java universe (after Oracle bought Sun, decided to ruin everyone’s fun, and caused splintering), I had to start moving from the traditional javax servlet namespace that Tomcat 9 and earlier used to Tomcat 10’s Jakarta namespace. Migrating the servlets themselves was not so bad. The first big issue arose around Google’s OAuth library. I have used this library for a long time; it provides the easy-ish ability to connect to any OAuth server you want (I have specific ones at work I use) for authentication. Recently Google, doing what they do, marked this library as maintenance mode only, stating they would only do emergency security fixes; overall, it’s abandoned. Not what you want to hear from your authentication library. They are also not planning to move it over to the Jakarta namespace, leaving me stuck on Tomcat 9 for as long as it has support. That should be a long time, since many big companies and projects are right where I was, and the plan is for 9 and 10 to be developed in parallel for a while. It does mean that I cannot use the newer features of Java, though. From this I knew I had to start looking at other options.

Every time I have to work on auth systems, it is a maze, and once I get it working, I want it to stay working for a while so I hopefully don’t have to touch it. There are not a ton of OAuth libraries for Java backend systems, and I wanted one that I knew had community support and would last. That brought me to the popular PAC4J project. A lot of the guides I found for it were built around javax and/or used PAC4J to integrate with Google Auth, Facebook Auth, or other specific providers. I want to be able to use the systems at work, or a more generic provider such as Keycloak. I spent a good amount of time pulling different bits of information together to get a fully working PAC4J 5.7 + Tomcat 10 + Servlets setup. I posted it on Github, and below is a guide on how to configure Keycloak in Docker to demonstrate it. This took a fair amount of time to put together, and I hope it helps you out there; if it does, please give the post a like or star the repo, it pushes me to keep doing these tutorials.

Keycloak Setup

To start at the beginning, Keycloak is a webserver that works as an Identity Provider (IDP). It has its own database for users and groups, or it can link into many other systems such as Google, Facebook, and more. When hooking up to it, you can decide to use SAML or OpenID. I tend to use plain OAuth, which OpenID Connect is built on top of (I think that’s the order). This guide will use Keycloak as our IDP, then connect to it with PAC4J. I do believe PAC4J has a more native and easier-to-use Keycloak integration, but where is the fun in that.

Keycloak out of the box has a Docker image for development. You can optionally attach persistent storage so users and groups survive deleting and rebuilding the container. Using this container with local storage is not a good setup for production; you should back the container with a database like Postgres instead. But for our home testing this setup is fine and will work well. I use persistent storage so I can have a local Keycloak that can be updated without recreating everything. More information about the Keycloak Docker container can be found here.

Setting up Keycloak with persistent storage

docker volume create keycloak
# We need to set the keycloak user to be the owner of that folder
docker run --rm --entrypoint chown -v keycloak:/keycloak alpine -R 1000:0 /keycloak
docker run -p 8081:8080 -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin -v keycloak:/opt/keycloak/data/h2 quay.io/keycloak/keycloak:20.0.2 start-dev --http-relative-path /auth

Setting up Keycloak WITHOUT persistent storage

docker run -p 8081:8080 -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin quay.io/keycloak/keycloak:20.0.2 start-dev --http-relative-path /auth

After a minute you should be able to browse to “http://127.0.0.1:8081/auth” and get:

Keycloak can use OpenID Connect as its connector, which is more of a superset of OAuth itself. For this guide I want to go over using plain OAuth, so we will tweak some of the Keycloak configs to make it work how we want, as a generic OAuth provider.

  • Log in with the “admin” username and password that we set when creating the container
  • Click the dropdown in the top left where it says “master” and create a new realm; we will call it “example”
  • Now that you are in your new realm, go to “Users” on the left
    • Create a new user; let’s name this new user “Jon” (or whatever your heart desires)
    • Once created, click the “Credentials” tab and set a password. I disable “temporary password” because this is not a production auth system
  • Click “Client Scopes” in the top left
    • Create client scope
    • Name “openid”
    • Type “Default”
    • Save
    • “Mappers” tab
    • Add predefined mapper
    • I did “full name”, “email”, “username”, “groups”
    • Note: I found the add screen here buggy; if you page to the next list of mappers, saving doesn’t work, so add them one page at a time
  • Click “Clients” in the top left
    • Create client
    • Client ID “example-client”; I like to enable “Always display in console”; then “Next”
    • Enable “Client Authentication”, then save.
      • “Client Authentication” enables the “confidential access type”; this is classic OAuth, where you need a client secret to access the server
    • For this demo we need to add “http://127.0.0.1:8080/oauth/redirect*” to the “Valid redirect URIs”, and save
    • Go up to “Client scopes” and add the “openid” scope as a “Default” type
    • Note: I have found with PAC4J + Keycloak specifically that if you do not add the openid scope, you will get through auth, but when PAC4J goes to get user information you will get an error of WARN org.keycloak.events type=USER_INFO_REQUEST_ERROR, realmId=59d04435-daf4-4ca7-8623-195769911c0e, clientId=null, userId=null, ipAddress=172.17.0.1, error=access_denied, auth_method=validate_access_token. If your client is not requesting the openid scope, you may also see WARN org.keycloak.services KC-SERVICES0091: Request is missing scope 'openid' so it's not treated as OIDC, but just pure OAuth2 request.
    • Go to the “Credentials” tab, view the “Client secret” and save that for later

At this point going to http://127.0.0.1:8081/auth/realms/example/.well-known/openid-configuration will give you the OAuth endpoints we need.

Using The Example Code

Clone down the repository over at my Github. The README.md should contain what you need to get going! After adding your client secret from above, if you followed this guide, that demo should work for you with http://127.0.0.1:8080/ being the web app, and http://127.0.0.1:8081/auth being the auth server.
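
For reference, the heart of the PAC4J wiring ends up looking something like the sketch below: a GenericOAuth20Client from pac4j-oauth pointed at the endpoints the .well-known document above lists for the “example” realm. Treat it as an outline under those assumptions, not a copy of the repo’s exact code; the factory class is a placeholder of mine, and the secret is the one you saved from the Credentials tab.

import org.pac4j.core.config.Config;
import org.pac4j.oauth.client.GenericOAuth20Client;

public final class AuthConfigFactory {

    // Builds a PAC4J Config for the Keycloak "example" realm set up in this guide.
    public static Config build(String clientSecret) {
        GenericOAuth20Client client = new GenericOAuth20Client();
        client.setKey("example-client");   // client id created in the "Clients" step
        client.setSecret(clientSecret);    // client secret from the Credentials tab
        client.setAuthUrl("http://127.0.0.1:8081/auth/realms/example/protocol/openid-connect/auth");
        client.setTokenUrl("http://127.0.0.1:8081/auth/realms/example/protocol/openid-connect/token");
        client.setProfileUrl("http://127.0.0.1:8081/auth/realms/example/protocol/openid-connect/userinfo");
        client.setScope("openid");         // matches the client scope we added above

        // The callback URL must match the "Valid redirect URIs" entry configured in Keycloak.
        return new Config("http://127.0.0.1:8080/oauth/redirect", client);
    }
}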

In the end you get a simple screen like this one that lets you play around with the functionality and experiment with the code.

Bitbucket: Convert From Standalone ElasticSearch to Embedded OpenSearch

At work I maintain random stacks of software, and sometimes help people with other stacks that they maintain. Recently I was asked to help bring an Atlassian Bitbucket stack up to date. In the past, Atlassian always included a built-in ElasticSearch (ES) server with Bitbucket. It was used to index code in Bitbucket and allow searching. It’s not a hard requirement for the server to function, but it is important for the user experience.

When an environment moves from Bitbucket Server to Bitbucket Data Center, you are supposed to move to a standalone ES instance over the embedded one for performance. I don’t know if people elsewhere commonly do this, but the stacks I have seen have just continued to use the embedded version. Admittedly, these are smaller instances; at scale I would understand it. That worked until recently, when, due to a licensing change, Atlassian could no longer embed an up-to-date ElasticSearch. For a while they decided the best way forward was to keep bundling the last version from before the licensing change (I think 7.10).

That works until your infosec team runs Nessus and finds you have an out-of-date ES sitting around while 7.16 or the 8.0 branch are out. Because of all that, this one stack had moved to a standalone ES cluster. We also had to install the Atlassian security plugin into ES; this was not a simple task, and the plugin only supports a few versions of ES, none of which were current. At least then we were in a BETTER spot with security.

Now fast forward a few months of this mess, and Atlassian moved Bitbucket from ElasticSearch to OpenSearch. OpenSearch is Amazon’s fork of ElasticSearch at version 7.10.2, created to get around the new licensing terms. Normally, if you were still using the embedded version of ES, your next Bitbucket upgrade would move you to OpenSearch. Because this stack had already moved to a standalone instance, it did not migrate over. We were now in the worst of both worlds: off the supported path and unable to get back on it. If you search the Atlassian documentation there are guides on how to move to a standalone search instance, but not back. A big catch I found is that the embedded version uses default passwords that are not easy to find, which makes migrating back harder.

Migrating Back

Below are some notes I have on migrating back. Hopefully they help someone.

There are two main folders we will work in. One is your Atlassian Bitbucket installation folder for the current version, which I will call %atlassian-install%. The other is your Bitbucket data folder that carries over between versions as you upgrade, which I will call %bitbucket-home%. (Note: I did all this on Linux, but I am naming the variables that way because it is easy.)

Default %atlassian-install% is /opt/atlassian/bitbucket/7.21.7, or your current version. Default %bitbucket-home% is /var/atlassian/application-data/bitbucket, but I tend to move that to /opt.

Under %atlassian-install%/opensearch/plugins/opensearch-security/securityconfig/internal_user.yml are the details Bitbucket needs to connect to this OpenSearch instance. The default password is “bitbucket-changeit”. To create a new hash of a password, use %atlassian-install%/opensearch/plugins/opensearch-security/tools/hash.sh; that file needs to be given execute permission first, which it does not come with on Linux.
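
Something along these lines worked for me; the -p flag is the one the OpenSearch security plugin’s hash tool documents, but double-check it against the version bundled with your Bitbucket.

# Give the bundled hash tool execute permission, then hash the new password
chmod +x %atlassian-install%/opensearch/plugins/opensearch-security/tools/hash.sh
%atlassian-install%/opensearch/plugins/opensearch-security/tools/hash.sh -p 'MyNewSearchPassword'
# Note: the script needs Java available (set JAVA_HOME if it complains), and the
# resulting hash goes over the "bitbucket" user's hash in the internal user file above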

Go into %bitbucket-home%/shared/bitbucket.properties if you have one (this file is created as you migrate between versions or databases) and remove any legacy Elasticsearch username/password/URL settings, for example plugin.search.elasticsearch.baseurl or plugin.search.config.baseurl as shown in the documentation; see the example below. The properties file overrides settings stored in the instance/database. You may also have a systemd service file that automatically starts Bitbucket; if it launches start-bitbucket.sh with -ns or --no-search to run against a standalone search instance, remove the no-search option.
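
As a hypothetical example of what to delete (key names per the Atlassian search configuration docs, older installs may use the plugin.search.elasticsearch.* spelling instead; the values here are obviously placeholders):

# Legacy standalone-search settings to remove from %bitbucket-home%/shared/bitbucket.properties
plugin.search.config.baseurl=http://search.example.com:9200
plugin.search.config.username=bitbucket
plugin.search.config.password=changeme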

Now start Bitbucket and go to Administration -> Troubleshooting and support tools -> System Information; you will see that search failed to connect. Go to Administration -> Server settings and enter your new search information there. If you just removed ElasticSearch and started OpenSearch with the server, all you have to do is make sure the port is right (7992 by default, I believe), the username is “bitbucket”, and the password is “bitbucket-changeit”. If you get a connection error, you may have to set up a TLS trust between Bitbucket and OpenSearch, but that is outside the scope of this guide.

Below is the default %bitbucket-home%/shared/search/config/opensearch.yml

cluster.name: bitbucket_search
node:
  name: bitbucket_bundled

network.host: _local_
discovery.type: single-node

path:
  logs: ${BITBUCKET_HOME}/log/search
  data: ${BITBUCKET_HOME}/shared/search/data

action.auto_create_index: false

http.port: 7992
transport.tcp.port: 7993

# The OpenSearch security plugin stores its configuration in an index in the cluster itself. On startup if the
# security index doesn't exist yet, setting this to true will cause the security plugin to read the yml files and
# configure the index using the contents of the files.
plugins.security.allow_default_init_securityindex: true

# Using the yml files with default initialisation, we create a bitbucket user and give it the all_access in-built role.
# However, access to the REST API is disabled by default even for the all_access role so we need to explicitly give
# it permission here so that the bitbucket user can access the OpenSearch REST API.
plugins.security.restapi.roles_enabled: ["all_access"]

# Mandatory TLS setup for transport layer
plugins.security.authcz.admin_dn:
  - CN=BITBUCKET
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.pemcert_filepath: bitbucket.pem
plugins.security.ssl.transport.pemkey_filepath: bitbucket-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem

# Logs audit events to bitbucket_search_server.json
plugins.security.audit.type: log4j
plugins.security.audit.config.log4j.logger_name: audit
plugins.security.audit.config.log4j.level: INFO
Optiplex 5050 Back view

Dell Optiplex 5050 Micro Windows Server Installation

Recently I was able to pick up some Dell Optiplex 5050 Micros for $60 on eBay. These are tiny machines, with an Intel i5-7500T (4 core/4 thread) CPU, 8GB of RAM, and a 256GB SSD. For $60 they needed a power supply, but those are easy to come by. My plan was to replace my aging Intel NUC that runs the core domain services for the house (AD, RADIUS, CA), and perhaps the aging firewall if I can figure out how to get a second NIC into the system; more on that later.

My philosophy when running a standalone network (even one with internet access) is to have at least one of your Domain Controllers (DCs) be a physical box at all times. An alternative is a dedicated hypervisor with local disks, but anyone who has tried to start a VM manually on VMware knows how painful it can be with no interface to the system other than the command line. These days it’s easy to make all the DCs virtual, but if you ever have to cold boot your environment you run into not having DNS; without DNS, things like vCenter and vSAN can’t come up cleanly, and the knock-on effects keep chaining. Having a physical machine allows you to bring DNS and core services up first, then start all the other services that rely on your domain.

The first task was to get one of the Optiplex 5050s ready for Windows Server. I started by upgrading the RAM to 16GB, because I had it lying around. After that, since this was an eBay purchase, I updated the firmware/BIOS and ran diagnostics before it touched the home network. The seller was nice enough to install Windows 10 Pro on the machine, which has a license in the BIOS, but I formatted the drive before ever booting that install. People are generally nice, but who knows what was in that image. After getting Windows Server 2022 installed I hit my first issue: Server 2022 does not have a driver for the Intel I219-V NIC that is in this chassis.

I tried getting the drivers from the Dell site, but Windows refused to use them because they were for Windows 10 and not the Server edition. My fix was to go into the driver update dialog, choose “Browse my computer for drivers”, let me pick from a list, and then manually select the “Intel” “Intel(R) Ethernet Connection (2) I219-V” driver. A USB ethernet dongle got me online long enough to be able to see that driver at all. Now the box is happily online. The main issue with this technique is that I keep getting an “Optional” Windows Update for an updated driver that never seems to install; I think that is because I installed the Dell driver, and the update never applies correctly.

Another thing I try to do with most systems, especially the ones in charge of security, is get Virtualization Based Security running. This is a newer Windows feature where core elements that need to hold secrets run in tiny Hyper-V containers. The user never sees it, but it gives the system added protection. If you run “msinfo32”, you can see its status. Most of the time you need to enable virtualization support in the firmware, then add the “Host Guardian Hyper-V Support” feature. On older systems (Windows Server 2019) and desktops, I think it’s just called “Hyper-V”; enable that and you get these features.

On paper this machine is 78% faster than the old Intel i5-3427U, and that makes a world of difference. The old system took a while to boot and a while to back up, which is what spurred me to upgrade. This machine feels amazingly fast for $60. Keep in mind that it cost less than a Raspberry Pi 4, runs an Intel CPU, and didn’t require the years-long wait Raspberry Pis take right now!

I have the main DC run domain services, DNS, Network Policy Server (RADIUS), and certificate services. For the first two, I just installed Domain Services and DNS and the system started acting in that role. For NPS I exported the config from the old DC, then installed the service on the new one and imported the config. As a reminder, Domain Services has to be installed first; if you already have NPS or Certificate Services installed and then try to add Domain Services, it will tell you it can’t install. For Certificate Services, I added a new CA, stopped the old one’s service, and removed it as an enrollment agent in ADSI. My 802.1x and other certs handed out by GPO are short-lived, around 90 days, so I will wait for the old ones to expire and let systems naturally pick up newer certs.

With the second system I got, I thought I would try some hardware hacking. My hope was to repurpose it as a firewall, replacing my aging Dell Optiplex 990 from 2011. To do this I would want to add at least one more NIC to the system. I ordered a 1Gb ethernet NIC that goes in the WLAN slot. At first it did not show up in Linux and I was worried; it turns out the system BIOS had “WLAN” disabled, and enabling that turned the PCIe lane back on so the card showed up. Mounting the ethernet port in the extra serial blank this system has made it look very clean. I had to tuck the wire away, since it ran from the front of the unit to the back and had the SATA drive sitting on it. After playing with it a good amount (removing the card, reseating it, putting electrical tape under it) I was able to get the link up, but not reliably at 1Gb/s; it tended to drop down to 100Mb/s a lot when coming up. Things like loosening the screw holding it down and putting electrical tape under it helped, but the system was not reliable enough for me to feel comfortable using it in homelab production. I also shaved down the connectors at the end of the card, since they were large enough that the screw couldn’t easily fit between them; that did not help much.

In the end I am enjoying the one system as a new DC, and I will eventually figure out what to do with the other one. With an NVMe slot and internal SATA, plus the ability to go up to 32GB of RAM on a low power budget, they are very capable little machines.

Stable Diffusion Image, "computer with redhat logo on screen, in a field with mountains and a dinosaur in the background"

Clean Tenable Nessus Scans for RHEL 7 with Podman

There can be an alert misfire for Tenable Nessus plugins 137561, 138032, 142002 based on your YUM repo configuration. This leads to 3 medium alerts that should not be there.

Plugins:

  • RHEL 7 : OpenShift Container Platform 4.3.25 containernetworking-plugins (RHSA-2020:2443) (Plugin 137561)
  • RHEL 7 : OpenShift Container Platform 4.2.36 containernetworking-plugins (RHSA-2020:2592) (Plugin 138032)
  • RHEL 7 / 8 : OpenShift Container Platform 4.6.1 package (RHSA-2020:4297) (Plugin: 142002)

If you have a stack that uses Podman on RHEL 7 and does not have the default redhat.repo file, then you have packages installed that have newer versions in the OpenShift repos. Normally this would be fine: the Nessus scanner is supposed to check whether you have the OpenShift repos enabled, and if not, stop and report that the latest versions from the RHEL 7 OS repo are good. But that check fails if you are missing the RHEL 7 OS repo definition; the OS repo HAS to be enabled too, or the check will show as failing. This situation can easily happen on an air-gapped system, or a system on Satellite where you are not using the default repo in redhat.repo. Luckily the baseurl does not matter; as long as you set the repo ID to “rhel-7-server-rpms” (I put the name= line in there for good measure), the check will come back clean.

/etc/yum.repos.d/redhat.repo

[rhel-7-server-rpms]
name = Red Hat Enterprise Linux 7 Server (RPMs)
baseurl=file:///opt/rhel_7_x86_64_os/
enabled = 1
gpgcheck = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

Before (or after) setting that, you will need to disable the YUM Red Hat Subscription Manager plugin, or the next time you run “yum” it will wipe your redhat.repo and reload it from subscription-manager. To do this, edit /etc/yum/pluginconf.d/subscription-manager.conf and set “enabled=0”, and also run subscription-manager config --rhsm.manage_repos=0; both steps are sketched below.
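
For example, on a typical RHEL 7 box:

# Stop the subscription-manager YUM plugin from rewriting redhat.repo
sed -i 's/^enabled=.*/enabled=0/' /etc/yum/pluginconf.d/subscription-manager.conf
# And tell subscription-manager itself to leave repo management alone
subscription-manager config --rhsm.manage_repos=0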

Below are examples of the errors you can see from Nessus.

Remote package installed : containernetworking-plugins-0.8.3-3.el7_8
Should be                : containernetworking-plugins-0.8.6-1.rhaos4.2.el7
OR
Remote package installed : runc-1.0.0-69.rc10.el7_9
Should be                : runc-1.0.0-81.rhaos4.6.git5b757d4.el7

The default thinking may be: it says I need to update to the OpenShift packages, so it makes sense to install the OpenShift repos. And if you grab a Red Hat developer account to debug this, you will see the OpenShift repos there; that is because the developer account gives you a lot of entitlements, including OpenShift. If you add the OpenShift repos to a bunch of systems, you may become liable for OpenShift licenses, or get errors because those systems do not have the entitlements. The key is that the installed packages say “.el7_8”/“.el7_9” instead of “.rhaos4.2”. This is a plugin misclassification, not a need for updates.

Note: The image is a random AI generated one: Stable Diffusion Image, “computer with redhat logo on screen, in a field with mountains and a dinosaur in the background”. I think they are fun.

THE µKENBAK-1

Back again with another retro computer kit from the same creator as THE ALTAIR-DUINO: a small, quick build in the µKENBAK-1. The µ in front denotes one of the earlier versions of the kit, which is smaller than the original computer; the creator now also offers a full-sized replica and a nanoKENBAK-1. This is a small and simple kit. Running off an Atmel processor (the same family as the Arduino), this little recreation offers a fun, simple front panel and relatively quick assembly.

Compared to some of the other kits that have been posted here, this one is straightforward to put together. While you have the classic soldering, the kit is all through-hole components and is a pleasant hour or so of assembly. Most of my time actually went into getting the PCB, with its stand-offs, into the case and lined up with the back holes. This proved to be a difficult and time-consuming process: you need to pre-tighten the stand-offs on the front panel, which then slides into the case, and line them up with the holes in the back. Between the stand-offs being plastic and wanting to strip, and their wanting to wiggle all over, most of my time went into this instead of soldering. In the end I got 5 of 6 in place and called it a day.

Evil Stand-Offs

The creator of this kit shows his experience in the little details, which make it a nice experience. One example is the USB extension cable that gives you an easy connection out the back; it is the perfect length to do the job without being in your way. Another is the instruction booklet, which includes a bunch of examples of how to use the computer right after the assembly instructions, all in a nice spiral-bound book included in the box.

The creator’s website, https://adwaterandstir.com/kenbak/, also goes into detail about the creator of the original machine in the 1970s and its history.

This is one of the easier kits I have done, but enjoyable in how smoothly it goes together. I would recommend it to someone who is looking to get started with these kinds of kits.

Missing Email Alerts from LibreNMS

I realized recently that I haven’t gotten any alerts from LibreNMS in a while, including when I rebooted devices for patching. After going to the “Alert Transport” page and attempting to send a test message, I got “SMTP Error: Could not authenticate.” Others seem to have hit this recently as well. (Link)

It turns out that after May 31st (although for me it seemed more like June 6th, 2022) Google disabled simple password logins for Gmail accounts. You need to enable two-factor auth, then create an app-specific password for LibreNMS. This was a good quick guide on how to do that. Since LibreNMS sends alerts when something is wrong but has no alert that alerting itself is working, it may be worth going and checking if you use LibreNMS with Gmail.

Computer Vision for Datacenter Auditing

I am going to start a series of posts about random ideas I have had but not had time to fully implement. The first in this series is an idea I worked on about three years ago (November 2019) for auditing a datacenter and mapping systems’ physical locations to their logical ones on the network.

The core of the idea is to use cameras in a datacenter (these could be security cameras) to see the servers in the racks, then use that data to map out the datacenter and save administrators from having to perform these audits manually. The process begins by training a machine vision model on what a server looks like. Most of the time at work I am working with Dell servers, so I thought that was a good starting point. To keep the model generic enough, I was just attempting to train it on what a 1U server looks like versus what a 2U server looks like.

At this point I needed A LOT of photos of servers with different lighting and angles. I took a bunch myself of different racks I had as a seed set, then turned to the web. Where could I get a large assortment of photos of Dell servers in different configurations and lighting? The homelab section on Reddit! People post their home setups and what they have all the time. I went through and downloaded several hundred photos of different people’s setups. Another place to get photos was eBay, where a lot of sellers put up photos of servers in different settings; the downside is that many people reuse the same photos again and again. I don’t know if the internet has figured out yet what the copyright rules are for using photos from online to train a model.

I researched a bunch of different techniques and played around with OpenCV, but then found a tutorial that seemed to be in line with what I was looking to do. (This one is also good, with very similar material.) I researched different image processing models and played around with several.

Now that I had the photos, I downloaded tzutalin/labelImg from GitHub, a graphical image annotation tool for labeling object bounding boxes in images. With it you go through each photo, select the item you are trying to learn, and label it. This took a while; it is fully manual work. A lot of the photos from the web had multiple servers in them, and each one needed to be selected. This proved to be one of the more time-consuming parts of the project. I also had to manipulate some photos so the rectangular bounding boxes could fit the servers, even when the photos were taken at weird angles.

I had to pick some of the photos to be the training set and others to be the testing set. With everything marked and that metadata ready, I converted the final metadata from XML to CSV using the xml_to_csv.py provided in the example repos above, and that was then fed into TensorFlow. The only system I had to start with, other than a laptop, was a CentOS 7 server; this proved to be very annoying because some dependencies, such as protobuf, were not available at new enough versions and had to be custom compiled.

It was time to let the model train for a while and see what it could learn. I learned several important things in this process. First, if you have GPUs, make sure you have a TensorFlow build that is compiled and ready to use them; the speed difference with and without them is kind of crazy. More RAM and GPU also help speed up the process a lot. At first I was playing with this on just a laptop, which didn’t have the GPU drivers for CUDA, and it was taking DAYS to work on the model. Later I switched to using GPUs I had in a server, and this greatly increased the iteration cycle speed.

Off the bat it was able to recognize a decent percentage of the servers in the photos I presented it! I do think a lot of the photos I tested it on were taken in fairly ideal conditions, with good lighting and camera angles, which may make it look better than it would perform in the real world. To improve the model I can always find more photos and train it on more images. At this point I was able to get the model to recognize about 80% of the servers in the racks I showed it. Another factor that could help in the future is the evolution of cameras: a lot of places are replacing 720p/1080p cameras with 4K cameras, and the more resolution the system has to work with, the better.

The next step was to start matching physical location to logical. The idea is that I can find the regions in a photo or video where servers are, and each server, through its iDRAC/IPMI, lets me blink the front chassis light. So, one host at a time, automation sends the command to blink the front chassis light (and perhaps some HDD lights), then scans for which region in the photo has started to blink; a sketch of that command is below.
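
As a sketch of that step, the standard IPMI identify command that most BMCs (including iDRAC) support looks like this; the address and credentials here are placeholders.

# Blink the chassis identify LED on one host for 30 seconds via its BMC/iDRAC,
# then watch the camera feed for which detected bounding box starts blinking
ipmitool -I lanplus -H 10.0.0.50 -U root -P 'calvin' chassis identify 30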

This is the idea I have slowly worked on for the last little while; I have prototypes of most of it working, but have not had a lot of time to put into it. The hope is that we could use existing cameras to get the footage needed to map the datacenters we already have, and perhaps in the future port the system to something like HoloLens or an Apple/Meta AR headset. Once we have that mapping, we can start drawing out the physical servers and their locations in the racks on a webpage, making it easier for people working in a datacenter to find the boxes they need. Hopefully one day it would let people click a server on a webpage and connect to its controller, without a human painstakingly going to each box and doing the mapping by hand. Of course, all of this could be solved by a team labeling each server, but where is the fun in that.