Quick Blurb

Missing Email Alerts from LibreNMS

I recently realized that I hadn’t gotten any alerts from LibreNMS in a while, including when I rebooted devices for patching. When I went to “Alert Transport” and tried to send a test message, I got “SMTP Error: Could not authenticate.” Others seem to have hit this recently as well. (Link)

It turns out that after May 31st (although for me it seemed more like June 6th, 2022) Google disabled simple password logins for Gmail accounts. You need to enable two-factor auth, then create an app-specific password for LibreNMS. This was a good quick guide on how to do that. LibreNMS sends alerts when something is wrong, but there is no alert telling you that alerting itself still works, so if you use LibreNMS with Gmail it may be worth going and checking.
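One way to check the app password outside of LibreNMS is to send a test mail by hand. Below is a minimal sketch, assuming a curl build with SMTP support and Gmail's SMTP endpoint on port 465. The function names, the addresses, and the GMAIL_APP_PASSWORD variable are placeholders I made up for the example, not anything LibreNMS uses.

```shell
#!/bin/sh
# Sketch: send a test mail through Gmail SMTP using an app password.
# Function names, addresses and GMAIL_APP_PASSWORD are placeholders.

build_test_message() {
    # Prints a minimal mail (headers, blank line, body) with CRLF line endings.
    from=$1
    to=$2
    printf 'From: %s\r\nTo: %s\r\nSubject: LibreNMS SMTP test\r\n\r\nIf you can read this, SMTP auth works.\r\n' "$from" "$to"
}

send_test_mail() {
    # Pipes the message to curl, which logs in with the app password.
    from=$1
    to=$2
    build_test_message "$from" "$to" |
        curl --silent --show-error \
             --url 'smtps://smtp.gmail.com:465' \
             --user "${from}:${GMAIL_APP_PASSWORD}" \
             --mail-from "$from" \
             --mail-rcpt "$to" \
             --upload-file -
}
```

Usage would look something like GMAIL_APP_PASSWORD='your app password' send_test_mail you@gmail.com you@gmail.com. If the login is rejected, curl prints the server's error and exits non-zero.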

Cisco ISE 2.X Certificate Expiration

Quick post: the other day an HA pair of ISE boxes in a lab had their certificates expire, and I ran into some odd behavior. The certificates were ones I had made with a Windows Certificate Authority. To be clear, in this scenario the certificates had a valid chain of trust, but they were past their expiration date.

I logged in after realizing this and saw odd behavior: node-A could not read node-B’s certificates. Both nodes said they were no longer on the domain, even though the domain disagreed and I had logged in with domain credentials that were recently changed. Then when I went to make a Certificate Signing Request (CSR), I was able to create it, but when I went to download it I got a generic message of “Cannot connect to node-a”. While all of these issues were going on, both nodes were still sharing health data under “Node Status” on the dashboard.

In the end, ISE gets weird when the cert has expired. I generated a new self-signed cert for node-A, then deleted the expired certs, because the system didn’t want me to make a CSR for something it thought it already had a cert for. This allowed me to properly make a CSR and export it. That gave me “ciscoisenodea.pem”, which I brought over to my Windows CA, and from an admin command prompt ran:

certreq -submit -attrib "CertificateTemplate:WebServer" ciscoisenodea.pem

I saved the result to my local desktop and went into ISE to bind it to the CSR. Node-A then rebooted. All of a sudden things like the domain pairing started showing that they were working again. I did the same process on the second node, and everything was happy again. Note: make sure you have your admin backup password. One of the nodes DID refuse to talk to AD and I had to use it, while the other one said it wasn’t on the domain, but did work…

Hope this helps someone out there!

Systemctl: Assignment outside of section. Ignoring.

I wanted to throw together a quick post about a recent issue I have seen on Red Hat 7/CentOS 7 boxes. A recent OS update brought a small but important change to systemd. In the past, if you wanted to add environment variables to a systemd service, you could enter # systemctl edit postgresql-14 (note I will be using postgresql-14 as the example service in this post), then add a line such as:

Environment=PGDATA=/opt/postgres/14/data/

After saving the file and starting the service, you were good to go. Recently, after a minor update, I started getting the error “[/etc/systemd/system/postgresql-14.service.d/override.conf:1] Assignment outside of section. Ignoring.”, and then the service would not start. It turns out you can no longer drop Environment lines directly into a systemd override; you need to mark which section of the unit file you are overriding. Below is the new proper way to go about this:

[Service]
Environment=PGDATA=/opt/postgres/14/data/

It is a quick fix, but it can take a bit of digging to find. Also, for systemd and Postgres 14, this is currently the easy way to redirect the data folder. Hope this helps someone!
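If you build boxes from a script instead of using systemctl edit, the same override can be written out by hand. This is a minimal sketch: the function name is made up, and the defaults are just the unit name and data path used in this post. A drop-in written this way needs a daemon-reload afterwards, which systemctl edit would otherwise do for you.

```shell
#!/bin/sh
# Sketch: write a drop-in that carries the required [Service] header.
# write_pgdata_override is a made-up helper; defaults match the post's example.

write_pgdata_override() {
    dropin_dir=${1:-/etc/systemd/system/postgresql-14.service.d}
    pgdata=${2:-/opt/postgres/14/data/}
    mkdir -p "$dropin_dir" || return 1
    # The section header must come before any assignment.
    printf '[Service]\nEnvironment=PGDATA=%s\n' "$pgdata" > "$dropin_dir/override.conf"
}

# Typical use as root:
#   write_pgdata_override
#   systemctl daemon-reload
#   systemctl restart postgresql-14
```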

SSSD with Active Directory Only Showing Primary Group

I was domain joining some Red Hat Enterprise Linux 7 boxes to a Windows domain. Everything went smoothly, except that many of my users could only see their primary group. Some users who had more permissions on the domain could see all of their groups; it was only particular users who could not. This seems to be a common failure scenario for SSSD with AD, and many people have opened bugs or chimed in with different fixes online. I found the solution in one forum post, it saved me, and I wanted to amplify it.

As long as some of your users can see all their groups, you know it’s not exactly a problem with RHEL connecting to AD, or a protocol like LDAP being blocked. An odd side effect of this setup was that periodically the groups could be scanned and would then show the users in them. If I ran “sss_cache -E”, then “getent group SecondaryGroup”, some of the time it would show the users inside the group. Then, once the user logged in, they would disappear from that output again, and the group was also missing when I ran “groups” as the user.

The SSSD log didn’t offer much help other than saying it couldn’t read all the groups. I tried a TON of the recommended settings, like enumerate = True, enumerate = false, ldap_use_tokengroups = true, ldap_use_tokengroups = false; none of these changed anything. Then https://serverfault.com/a/938893 mentioned it may be a permissions problem between the computer object in AD and the user object. I looked and sure enough, my system had NO permissions on the users that were failing. I attempted to add the tokenGroups permission mentioned in that answer and that still didn’t help, but we were on the right track!

The answer came from https://serverfault.com/a/796005: there is a permission called “Read Remote Access Information”, and once your computer object is granted that permission on the user, secondary groups start populating. I gave “Domain Computers” that permission, since the problem only seemed to be affecting some of the Linux systems and Windows was happy to have it as well.

Some random commands that can help you debug SSSD:

SSSD likes to cache a lot, which makes it hard to troubleshoot. The following clears all caches and restarts SSSD:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd
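After clearing the cache it helps to look at what the box actually resolves, both from the user side and from the group side. Here is a small sketch of that check. The two function names are made up for the example; they only wrap id and getent. Group names are looked up by GID so that AD group names containing spaces print cleanly.

```shell
#!/bin/sh
# Sketch: show what NSS (and therefore SSSD) currently resolves.
# show_user_groups and show_group_members are illustrative helpers, not SSSD tools.

show_user_groups() {
    # Prints one group name per line for the given user.
    for gid in $(id -G "$1"); do
        getent group "$gid" | cut -d: -f1
    done
}

show_group_members() {
    # Prints the raw group entry (name:password:gid:members).
    getent group "$1"
}
```

Running show_user_groups someuser and show_group_members SecondaryGroup before and after a permission change on the AD side makes it easier to tell whether the secondary groups are really populating or just coming out of a stale cache.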

CentOS/RHEL 8 Auto Login Fix

I have a PXE environment that requires systems to boot up, then automatically log in and start a program. All of a sudden this stopped working after years of working. It took me a while to figure out, so I figured I would post in case anyone else runs into this.

I have been doing auto login the recommended systemd way for a while, as shown at https://wiki.archlinux.org/title/Getty. I copied /lib/systemd/system/getty@.service into /etc/systemd/system/getty@tty1.service, then edited it with a sed script in the build pipeline. In the end the line was:

ExecStart=-/usr/bin/agetty --noclear %I $TERM --autologin username

This worked for YEARS, then suddenly stopped. While investigating, I saw another file being written next to mine at /etc/systemd/system/getty@tty1.servicee, with another e added to the end of service, making it servicee. After a lot of playing around with it and looking at other guides, I figured out that there was an update to systemd/getty, and it now cares that all options come before the terminal arguments. Changing that line to the following fixed it.

ExecStart=-/usr/bin/agetty --noclear --autologin username %I $TERM 
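Since the line gets edited by sed in a build pipeline, the reorder itself can be scripted. The sketch below assumes the ExecStart line looks exactly like the old one shown in this post; the function name is made up. It reads the unit file on stdin and writes the fixed text to stdout, leaving any line that does not match untouched.

```shell
#!/bin/sh
# Sketch: move "--autologin USER" ahead of "%I $TERM" on the agetty ExecStart line.
# fix_agetty_autologin is a made-up helper; it only matches the exact line shape from the post.

fix_agetty_autologin() {
    sed -E 's|^(ExecStart=-/usr/bin/agetty --noclear) (%I \$TERM) (--autologin [^[:space:]]+)[[:space:]]*$|\1 \3 \2|'
}
```

Usage would be something like fix_agetty_autologin < getty@tty1.service > getty@tty1.service.new. As an aside, GNU sed reads the e in sed -ie as a backup suffix, which would leave a copy named like the stray getty@tty1.servicee. That is only a guess about where that file came from, since the pipeline's sed call is not shown here.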

Quick Update

I have not written in a bit, so I thought I would give a quick update. I haven’t had a lot of time recently to work on things because I was moving, and had to re-certify for Cisco. Cisco gave everyone 6 months of additional time on certifications because of Covid last year (yay), but then I saw a reddit post that mentioned they had not extended the time between tests for multi-test certs (booo). With a bit over a month left, I studied for the CCNP Core Security cert and was able to pass. Now I have a CCNP “Enterprise” (the old Route and Switch) and a CCNP Security. The new Cisco cert system is very odd, with each test being a “specialty” and then combinations of different tests adding up to other certs.

I have continued to play with the MiSTer FPGA. I never had an Amiga, and there are some fun games for that system. A lot of the other games I have been playing on it are from my childhood, running on the ao486 core. A bit ago I got another retro computer kit that will eventually be added to the list, but this one is a bit more involved. It is a full replica IBM 5170 motherboard. It needs to be soldered together, and I need to find add-on cards for it. Hopefully I will have more time for projects in the upcoming months, and at the same time I will try to do some more documentation of the current homelab.

Cisco ISR 4451 Serial Password Recovery

I had to password recover a Cisco ISR 4451, and kept having issues getting to the ROMMON prompt. Every guide mentioned sending a BREAK character during startup, but I could not get that to work. I was using the mini-USB port on the front, and as far as I knew password recovery was not disabled. It turns out there is a problem with the mini-USB port and the Mac driver. I switched to a traditional serial cable with a DB-9 connector going to the RJ45 serial port, and suddenly I could get into ROMMON. I wanted to post in case anyone else runs into this.

Below is the startup process, at the end there you should be able to send a BREAK character.

Initializing Hardware ...

System integrity status: 00000610
Rom image verified correctly


System Bootstrap, Version 15.3(3r)S1, RELEASE SOFTWARE
Copyright (c) 1994-2013  by cisco Systems, Inc.

Current image running: Boot ROM0

Last reset cause: PowerOn
Cisco ISR4451-X/K9 platform with 4194304 Kbytes of main memory


Warning: filesystem is not clean
File size is 0x1d482044
Located isr4400-universalk9.03.16.04b.S.155-3.S4b-ext.SPA.bin 
<SEND BREAK HERE>

Using a Custom User-Agent with Google OAuth Client in Java

I have been using the Google OAuth client for some of my projects at work for a while. A recent request was to add custom user-agent strings to different apps, for the people doing analytics on which apps are using the authentication servers. I have some functions that do custom HTTP GET calls using the Bearer token we get from the OAuth flow, and the library also makes its own calls behind the scenes. I was able to add a user-agent to my calls easily, but the under-the-hood ones the library makes kept coming up as “Google-HTTP-Java-Client/1.34.2 (gzip)”. I tried a few different ways while also searching online, and didn’t see anyone speaking about this. Below is a quick block to put into your app if you want to set the user-agent.

These are the current versions of the OAuth library, and the http client I have been using to do auth.

compile group: 'com.google.oauth-client', name: 'google-oauth-client', version: '1.31.4'
compile group: 'com.google.oauth-client', name: 'google-oauth-client-servlet', version: '1.31.4'
compile group: 'com.google.http-client', name: 'google-http-client', version: '1.39.0'
compile group: 'com.google.http-client', name: 'google-http-client-jackson2', version: '1.39.0'

For my setup, I have the OAuth servlet that initializes the OAuth flow, then a second servlet which handles the callback, as documented here. In “class OauthCallback extends AbstractAuthorizationCodeCallbackServlet”, I added the following ConnectionFactory inside the override of the initializeFlow() function. Replace “myApp-v1.0.1” with your app name. Hope this helps someone!

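import com.google.api.client.auth.oauth2.AuthorizationCodeFlow;
import com.google.api.client.auth.oauth2.BearerToken;
import com.google.api.client.http.javanet.ConnectionFactory;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import java.io.IOException;
import java.net.HttpURLConnection;

@Override
protected final AuthorizationCodeFlow initializeFlow() throws IOException {
    // Connections opened by this transport go through the factory, so the header is set on them.
    ConnectionFactory connectionFactory = url -> {
        HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
        httpURLConnection.setRequestProperty("user-agent", "myApp-v1.0.1");
        return httpURLConnection;
    };
    // Hand the flow a transport built with the custom factory instead of a default one.
    return new AuthorizationCodeFlow.Builder(BearerToken.authorizationHeaderAccessMethod(),
            new NetHttpTransport.Builder().setConnectionFactory(connectionFactory).build(),
            new JacksonFactory(),
            .... (code removed);
}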

ESXi Migration & Lenovo ThinkCentre M710s

I have started a transition from Hyper-V and Storage Spaces Direct to VMware vSphere and vSAN. I apologize that the order of these blog posts is all over the place. Part of the transition is upgrading the hardware on some of the hosts I have, including getting 250GB NVMe drives for vSAN cache. I started the migration with one of the desktops that run in the cluster, a Lenovo ThinkCentre M710s. After finding the small slot the NVMe drive goes in, I realized there is a piece of plastic from the manufacturer that you are supposed to get in order to install an NVMe drive. Since I do not have that, and do not want to pay for it, I spent a good bit more than an hour on the first day of the migration creating this bracket and 3D printing it. Then, while that was printing, I realized one of the feet on the system had gone missing, so I made a small one of those too.

This post is just a quick update and a preview of more to come.

NVME Drive Holder: Lenovo ThinkCentre M710s NVME Bracket by danberk – Thingiverse

Foot: Lenovo ThinkCentre M710s Foot by danberk – Thingiverse

Booting VMware vSphere ESXi 7.0 on Certain Dell Hardware

I recently attempted to boot a Dell Precision M6800 into ESXi 7.0u1 to test some functionality before going to prod. Unfortunately this was met with “Invalid Partition Table”, and switching between UEFI and BIOS boot didn’t fix it, giving “No boot device available” instead. After searching online I found https://communities.vmware.com/t5/ESXi-Discussions/quot-Invalid-Partion-Table-quot-Error-booting-ESXi-7-from-USB/m-p/1823852, which had comments such as “just dont run on a laptop”, which was not very helpful.

I spent a chunk of time playing with the partitions and seeing how they were configured. I noticed that when I went into the UEFI on the laptop it said it couldn’t find any file systems, but when I loaded Windows or Linux on the system, the UEFI could see those boot partitions. I tried updating the firmware like Dell recommended, with no change. I then realized the ESXi 7.0 image uses FAT16 for the EFI partition, while all other EFI partitions I have seen are FAT32.

I copied the files and folders out of the boot partition, reformatted it as FAT32 instead of FAT16, marked it as EFI type (ESP in GParted), and moved the files back. The system booted fine the first time, with ESXi running happily. If you need to boot ESXi on a Dell M6800, M4800, or similar, give that a try. If this worked or didn’t work for you, leave a comment below.
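I did this in GParted, but for anyone who prefers a terminal, the same steps can be sketched roughly as below from a Linux live environment. Everything here is an assumption rather than what I actually ran: the function names, the /dev/sdX style device name, and the backup and mount paths are placeholders, and the script only prints the commands unless DRY_RUN is set to 0. Double check the device before running anything like this for real, since mkfs wipes the partition.

```shell
#!/bin/sh
# Sketch: rebuild an EFI system partition as FAT32 while keeping its files.
# Names and paths are placeholders. DRY_RUN=1 (the default) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reformat_esp_fat32() {
    disk=$1                       # whole disk, e.g. /dev/sdX (hypothetical)
    partnum=$2                    # partition number of the EFI partition
    part="${disk}${partnum}"      # note: NVMe disks use a "p" before the number
    backup=${3:-/tmp/esp-backup}
    mnt=${4:-/mnt/esp}

    run mkdir -p "$backup" "$mnt"
    run mount "$part" "$mnt"
    run cp -r "$mnt/." "$backup/"                 # copy the boot files out
    run umount "$mnt"
    run mkfs.vfat -F 32 "$part"                   # reformat as FAT32
    run parted -s "$disk" set "$partnum" esp on   # mark the partition as ESP
    run mount "$part" "$mnt"
    run cp -r "$backup/." "$mnt/"                 # copy the boot files back
    run umount "$mnt"
}
```

Calling reformat_esp_fat32 /dev/sdX 1 with the default dry run prints the nine commands in order, which is a cheap way to sanity check the device and paths first.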