Work

IsoFileReader

A while ago I was working on a system to handle network boot operations. The main server is written in Java, and I needed to be able to read contents out of ISO files. The easy solution would have been to extract each ISO to the hard drive. I wanted to avoid that, both to save space and, with all the different images, to keep thousands of tiny files from littering the drive.

Then I found stephenc’s repo (java-iso-tools) for reading ISO files in Java. This library worked great! It had examples which helped me get started, and it was fast at diving through a file. It supported the traditional ISO-9660 formatted files, which I needed, and I was good to go. Years later, the people over at CentOS and Red Hat Linux had the idea to start using giant SHA hashes as file names. Suddenly the disc images I was getting contained filenames that were 128 characters in length, and sadly java-iso-tools failed to parse these names. To explain why, we need a bit of a dive into how the ISO-9660 standard works.

ISO-9660 was developed by ECMA (European Computer Manufacturers Association) as ECMA-119, and was then adopted as ISO-9660. Thus I was able to get the standards documents and investigate how ECMA-119 worked. Images start with a header that points to several tables and to the root directory record. The information about files on the disc spans out from that root record, which describes the root directory of the image. From there, every entry is either a directory (with or without children) or a file which can be read.
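To make that layout concrete, here is a minimal Python sketch of walking the header of an image. It is not code from java-iso-tools or from my library. It assumes the usual ECMA-119 layout: 2048-byte sectors, volume descriptors starting at sector 16, each tagged with the identifier "CD001", and a 34-byte root directory record at offset 156 of the primary descriptor. The function names are made up for illustration.

```python
import io
import struct

SECTOR_SIZE = 2048            # assumed logical sector size
FIRST_DESCRIPTOR_SECTOR = 16  # sectors 0-15 are the reserved system area


def read_volume_descriptors(image):
    """Yield (type_code, version, raw_sector) for each volume descriptor."""
    sector = FIRST_DESCRIPTOR_SECTOR
    while True:
        image.seek(sector * SECTOR_SIZE)
        raw = image.read(SECTOR_SIZE)
        # Every descriptor carries the standard identifier in bytes 1-5.
        if len(raw) < SECTOR_SIZE or raw[1:6] != b"CD001":
            return
        yield raw[0], raw[6], raw
        if raw[0] == 255:  # volume descriptor set terminator
            return
        sector += 1


def describe_primary(raw):
    """Pull the volume name and root directory location out of a primary descriptor."""
    if raw[0] != 1:
        raise ValueError("not a primary volume descriptor")
    volume_id = raw[40:72].decode("ascii", errors="replace").rstrip()
    root_record = raw[156:190]
    # Numbers are stored in both byte orders; read the little-endian half.
    (root_extent,) = struct.unpack_from("<I", root_record, 2)
    (root_length,) = struct.unpack_from("<I", root_record, 10)
    return {"volume_id": volume_id, "root_extent": root_extent, "root_length": root_length}
```

The root extent is a sector number, so the root directory itself starts at root_extent * 2048 bytes into the image, and every other file is found by following directory records from there.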

The standard has had many changes over the years. The original ECMA-119/ISO-9660 standard dates back to the start of the CD-ROM, and over time people added to it. PCs at the time ran MS-DOS and saved files to a FAT file system with 8-character names and 3-character extensions, so the format needed to be extended before CentOS could one day have 128-character file names. Some early additions to the format were Rock Ridge and the Enhanced tables. When reading the first bytes from an image, there are several byte blocks which each state which version of the standard they work with; the format was forward thinking in this way. The basic tables help simple devices read the discs easily. They offer short file names, and point to the same binary data that the later tables do. The enhanced tables can then offer more information and add features to the disc, such as file permissions.

At this point I had decided I needed to fix the problem and was going to write my own library to do it. While it sounds crazy, I enjoy writing these low-level libraries. I started with the ECMA-119 standard and went through the flow as if I were a CD-ROM device reading an image. I would later add code for Rock Ridge and for reading all the enhanced tables, and even a UDF parser.

I don’t want to spend too much time going through the standard. If you are interested, the standard documents cover it in depth: ECMA-119/ISO 9660 Standard, ECMA-167/ISO_IEC 13346/Original UDF Standard, Rock Ridge, UDF 2.60. This post is more about the project in general, and how I enjoyed working on it. One constraint I set for myself was that I wanted it to be 100% Java 8. That way it could be natively compiled if someone wanted to do that, it wouldn’t just be a wrapper around some native binary tool, and it would work with older Java code bases. The project currently targets Java 11, which was the LTS release at the time I was working on it. I know there are many code bases out there which are still on Java 8, and I actually don’t think there is any code using Java 9+ features except some tests. If someone had a Java 8 project, they could remove the tests and compile for 8. We live in a bit of an odd time now, where a project like this targets enterprise users who tend to be on older versions, while at the same time Java 24 is coming out. I also wanted to provide high-level classes that a user needing a simple tool could use, while keeping the deeper-level objects publicly available.

I was using this in the network booting environment mentioned earlier. There I can be building 100+ servers at a time, so small and fast code was important. I ended up adding some performance benchmarks as tests. I test the old library as my control, then I do normal file lookups as well as pre-indexed ones. I developed a system where certain heuristics of the image are taken and can be stored. You can then feed in this initial “vector”, as I called it, of the image along with a file vector. If the image matches the initial vector on a few characteristics, we can reasonably assume it’s the same image that was originally scanned; then, instead of reading all the header tables, we jump straight to the location given by the file vector, trusting it. This does leave it up to the developer to make sure they are matching pre-indexed images with the right vectors, but if you do, you can serve files much faster.
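The pre-indexed idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the library’s actual API or the heuristics it really uses: the class names, the choice of “image size plus a hash of the first descriptor sector” as the fingerprint, and the function names are all assumptions made for the example.

```python
import hashlib
import io
from dataclasses import dataclass

SECTOR_SIZE = 2048


@dataclass(frozen=True)
class ImageVector:
    """Hypothetical fingerprint of an image: cheap to compute, stored after the first scan."""
    size: int
    header_digest: str


@dataclass(frozen=True)
class FileVector:
    """Hypothetical stored location of one file inside the image."""
    offset: int
    length: int


def image_vector(image):
    """Compute the fingerprint from the image size and the sector at index 16."""
    image.seek(0, io.SEEK_END)
    size = image.tell()
    image.seek(16 * SECTOR_SIZE)
    header = image.read(SECTOR_SIZE)
    return ImageVector(size, hashlib.sha256(header).hexdigest())


def read_preindexed(image, expected, file_vector):
    """Read a file directly by offset, but only if the image matches the stored fingerprint."""
    if image_vector(image) != expected:
        raise ValueError("image does not match the stored vector; rescan it")
    image.seek(file_vector.offset)
    return image.read(file_vector.length)
```

The trade-off is the one described above: the fingerprint check is far cheaper than walking the directory tables, but it only proves the image is probably the one that was indexed, so keeping vectors paired with the right images is the caller’s job.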

This project was fairly straightforward to test: I had many ISO images, and there are many more out on the internet. Plenty of them are Linux images! I also had the older library, which I could use as a control to test against. I ended up writing many tests, which help make sure nothing has broken when people send Pull Requests. I needed this project done to support what I was working on, so there were a few places where I didn’t fully flesh out the metadata, but left that to the end user if they cared about that data type. I spent a lot of time in the Hex Fiend hex editor, marking segments and trying to understand where my code was breaking down.

Over the years of working in Open Source, and going to a technical college, I have seen many strong technical projects with very impressive code that can do a ton of interesting things. But the developer focuses on the interesting things they can make their code do, and spends no time putting documentation together. At the same time there are many projects that get the job done but aren’t anything special; these projects put a few documents together and maybe an example, and then get all the usage. Documentation is the area where developers hate to spend time, but it can be the most valuable. That pushed me to spend a lot of time commenting the code, and writing a large README file showing how to work with the library.

I hope you will take a look at the project, maybe use it, and feel free to open issues as they arise! I have been using the library in production for years now. It doesn’t get a ton of updates, because there hasn’t been a lot I need to add to it. When a PR or Issue arises I take care of it. And with the project being published under my work, I get a lot of automated PRs to help upgrade the library.

Take a look! https://github.com/palantir/isofilereader

Using Kerberos to Authenticate WinRM for Ansible

I have been trying to get Kerberos working as the WinRM authentication mechanism within Ansible. I want to configure a Windows system from the non-domain-joined Linux host that runs my automations. Getting these two hosts to talk over WinRM introduces a bunch of options, each with its own difficulties. If you look at the table on Ansible’s website for Windows auth with WinRM, you see only a few options for a domain joined machine:

https://docs.ansible.com/ansible/latest/os_guide/windows_winrm.html#credssp

I specifically needed it for an Active Directory account, because part of my setup was creating lab machines and building domain controllers on the fly. Basic auth is out, Certificate is out; what is left is Kerberos, NTLM, or CredSSP. Then, to throw another wrench in this, the Ansible host and server are in FIPS mode, and FIPS disables MD5. NTLMv2 uses MD5 internally, which means it does not work on a FIPS-enabled machine. CredSSP is backed by NTLM hashes as well, making Kerberos the only option.

I did not want to have to domain join my Ansible machine to my Windows Domain; this is a test environment. Through a bunch of tinkering I have found a way to run Ansible, and have Ansible use a local krb5.conf file, instead of your system one in /etc/krb5.conf.

I am on Rocky and installed the following:
- dnf install krb5-devel krb5-libs krb5-workstation python3.12-devel
- pip3.12 install pykerberos gssapi krb5 "pypsrp[kerberos]<=1.0.0"
- (Note I am using python 3.12 for my Ansible)
The host you wish to connect to must have its FQDN resolvable from your Ansible system (we will assume Linux)
- This can be in the hosts file or DNS
Then you need to set the inventory.yml similar to:
- my-host-post-domain:
        ansible_host: host.example.com
        ansible_user: Admin@EXAMPLE.COM
        ansible_password: WindowsPassword123
        ansible_connection: winrm
        ansible_winrm_transport: kerberos
        ansible_winrm_kinit_cmd: "./kinit.sh"
        ansible_winrm_message_encryption: never
        ansible_winrm_server_cert_validation: ignore
Create a file where you launch ansible from, kinit.sh:
- #!/bin/bash
  cd "$(dirname "$0")"
  export KRB5_CONFIG=./krb5.conf
  kinit "$1"
Create your krb5.conf file
- [libdefaults]
  default_realm = EXAMPLE.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true
  rdns = false
  
  [realms]
  EXAMPLE.COM = {
  kdc = 192.168.100.2
  admin_server = 192.168.100.2
  }
  
  [domain_realm]
  .example.com = EXAMPLE.COM
  example.com = EXAMPLE.COM
  
  (I am purposefully disabling DNS lookup and using my IP addresses, that is up to you.)
Then I run my Ansible with the following:
- KRB5_CONFIG=./krb5.conf ansible-playbook -i inventory.yml site.yml

It seems if you do not have the kinit.sh file, then kinit does not see the config. And if you don’t have the environment variable before the Ansible command, when Ansible goes to use GSS to connect to the Windows system, Ansible will not see the config.

Troubleshooting

Some fun errors along the way:

Server not found in Kerberos database
- This means the server you are CONNECTING TO can’t be found; usually this means the ansible_host is not the FQDN. Then, after kinit is done, Ansible tries to connect to AD via the IP and that fails.
Kerberos auth failure for principal Admin@EXAMPLE.COM with subprocess: kinit: Cannot find KDC for realm "EXAMPLE.COM" while getting initial credentials
- It can’t find the krb5.conf file, OR your mapping under [domain_realm] has an issue
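
The first error above is easy to catch before running a playbook. Below is a small Python sketch of a pre-flight check; the function name and messages are made up for this post, and it only looks at the shape of the ansible_host value, it does not talk to DNS or the KDC.

```python
import ipaddress


def kerberos_target_problems(ansible_host):
    """Return a list of reasons this ansible_host value is likely to break Kerberos."""
    problems = []
    try:
        ipaddress.ip_address(ansible_host)
    except ValueError:
        pass  # not an IP address, which is what we want
    else:
        problems.append("ansible_host is an IP address; Kerberos needs the FQDN")
        return problems
    # A name with no dot in it is a short hostname, not an FQDN.
    if "." not in ansible_host.strip("."):
        problems.append("ansible_host is a short name, not an FQDN")
    return problems
```

An empty list means the value at least looks like an FQDN; the name still has to resolve through the hosts file or DNS.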

Bitbucket: Convert From Standalone ElasticSearch to Embedded OpenSearch

At work I maintain random stacks of software, and sometimes help people with other stacks that they maintain. Recently I was asked to help bring an Atlassian Bitbucket stack up to date. In the past Atlassian always included a built-in ElasticSearch (ES) server. This was used to index code in Bitbucket and allow searching. It’s not a hard requirement for the server to function, but it is important for user experience.

When an environment moves from Bitbucket Server to Bitbucket enterprise you are supposed to go to a standalone ES over the embedded one for performance. I don’t know if people elsewhere commonly do this, but the stacks I have seen have just continued to use the embedded version. Admittedly, these are smaller instances; at scale I would understand moving. That was until recently, when due to a licensing change Atlassian could no longer embed an up-to-date ElasticSearch. For a while they decided the best way forward was to keep bundling the version from before the licensing change (I think 7.10).

This works until you have an infosec team use Nessus and find you have an out-of-date ES sitting around when 7.16 or the 8.0 branch is out. Because of all that, this one stack had moved to a standalone ES cluster. We also had to install the Atlassian security plugin into ES; this was not a simple task, and the plugin only supports a few versions of ES, none of which were current. At least then we were in a BETTER spot with security.

Now fast forward through a few months of this mess, and Atlassian moved Bitbucket from ElasticSearch to OpenSearch. OpenSearch is Amazon’s fork of ElasticSearch at version 7.10.2, made to get around the new licensing terms. Normally, if you were still using the embedded version of ES, your next upgrade of Bitbucket would move you to OpenSearch. Because this stack had already moved to a standalone instance, it did not migrate over. We were now in the worst of both worlds: off the supported path, and unable to get back on it. If you search the Atlassian documentation there are guides on how to move to a standalone version, but not back. A big catch I found was that the embedded version uses default passwords that are not easy to find, which makes it hard to migrate back.

Migrating Back

Below are some notes I have on migrating back. Hopefully they help someone.

There are two main folders we will work in. One is your Atlassian Bitbucket installation folder for this version, which I will call %atlassian-install%. The other is your Bitbucket data folder, which moves between versions with your upgrades; we will call that %bitbucket-home%. (Note: I did all this on Linux, but I am using that variable style because it is easy to read.)

Default %atlassian-install% is /opt/atlassian/bitbucket/7.21.7, or your current version. Default %bitbucket-home% is /var/atlassian/application-data/bitbucket, but I tend to move that to /opt.

Under %atlassian-install%/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml are the details Bitbucket needs to connect to this OpenSearch instance. The default password is “bitbucket-changeit”. To create a new hash of a password, use %atlassian-install%/opensearch/plugins/opensearch-security/tools/hash.sh; this file needs to be given execute privileges first, since it does not come with them on Linux.

Go into %bitbucket-home%/shared/bitbucket.properties, if you have one (this file is created as you migrate between versions or databases), and remove any legacy ElasticSearch username/password/URL settings, for example plugin.search.elasticsearch.baseurl or plugin.search.config.baseurl as shown in the documentation. The properties file overrides settings you have in the instance/database. You may also have a systemd service file to automatically start Bitbucket; if that file runs start-bitbucket.sh with -ns or --no-search, which was needed while search ran as a standalone instance, remove the no-search option.
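
If you have several instances to clean up, the properties edit can be scripted. Here is a minimal Python sketch; the function name is made up, and the two key prefixes are an assumption based only on the two example keys above, so check them against your own file before trusting it.

```python
# Assumed prefixes, taken from the two example keys mentioned in this post.
LEGACY_PREFIXES = ("plugin.search.elasticsearch.", "plugin.search.config.")


def strip_legacy_search_settings(lines):
    """Return the properties lines with legacy search connection settings removed."""
    kept = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key.startswith(LEGACY_PREFIXES):
            continue  # drop baseurl/username/password style overrides
        kept.append(line)
    return kept
```

Comments and unrelated settings pass through untouched. Take a copy of bitbucket.properties before writing the result back.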

Now start Bitbucket and go to Administration -> Troubleshooting and support tools -> System Information; you will see that Search failed to connect. Go to Administration -> Server settings, then enter your new search information there. If you just removed ElasticSearch and started OpenSearch with the server, all you have to do is make sure the port is right (by default 7992, I believe), then make sure the username is “bitbucket” and the password is “bitbucket-changeit”. If you get a connection error, it may be that you have to set up a TLS trust between Bitbucket and OpenSearch, but that is outside the scope of this guide.

Below is the default %bitbucket-home%/shared/search/config/opensearch.yml

cluster.name: bitbucket_search
node:
  name: bitbucket_bundled

network.host: _local_
discovery.type: single-node

path:
  logs: ${BITBUCKET_HOME}/log/search
  data: ${BITBUCKET_HOME}/shared/search/data

action.auto_create_index: false

http.port: 7992
transport.tcp.port: 7993

# The OpenSearch security plugin stores its configuration in an index in the cluster itself. On startup if the
# security index doesn't exist yet, setting this to true will cause the security plugin to read the yml files and
# configure the index using the contents of the files.
plugins.security.allow_default_init_securityindex: true

# Using the yml files with default initialisation, we create a bitbucket user and give it the all_access in-built role.
# However, access to the REST API is disabled by default even for the all_access role so we need to explicitly give
# it permission here so that the bitbucket user can access the OpenSearch REST API.
plugins.security.restapi.roles_enabled: ["all_access"]

# Mandatory TLS setup for transport layer
plugins.security.authcz.admin_dn:
  - CN=BITBUCKET
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.pemcert_filepath: bitbucket.pem
plugins.security.ssl.transport.pemkey_filepath: bitbucket-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem

# Logs audit events to bitbucket_search_server.json
plugins.security.audit.type: log4j
plugins.security.audit.config.log4j.logger_name: audit
plugins.security.audit.config.log4j.level: INFO

Clean Tenable Nessus Scans for RHEL 7 with Podman

There can be an alert misfire for Tenable Nessus plugins 137561, 138032, 142002 based on your YUM repo configuration. This leads to 3 medium alerts that should not be there.

Plugins:

RHEL 7 : OpenShift Container Platform 4.3.25 containernetworking-plugins (RHSA-2020:2443) (Plugin 137561)
RHEL 7 : OpenShift Container Platform 4.2.36 containernetworking-plugins (RHSA-2020:2592) (Plugin 138032)
RHEL 7 / 8 : OpenShift Container Platform 4.6.1 package (RHSA-2020:4297) (Plugin: 142002)

If you have a stack that is using podman with RHEL 7 and does not have the default redhat.repo file, the installed packages have newer versions available in the OpenShift repos. Normally this would be fine: the Nessus scanner is supposed to check whether you have OpenShift repos enabled and, if not, stop and accept the latest versions from the RHEL 7 OS repos as good. But that check fails if you are missing the RHEL 7 OS repos. The OS repo HAS to be enabled as well, or the check will show as failing. This situation can easily happen if you have an air-gapped system, or a system on Satellite, where you are not using the default repo in redhat.repo. Luckily the baseurl does not matter; as long as you set the repo ID to “rhel-7-server-rpms” (I put the name= line in there for good measure), the check will come back clean.

/etc/yum.repos.d/redhat.repo

[rhel-7-server-rpms]
name = Red Hat Enterprise Linux 7 Server (RPMs)
baseurl=file:///opt/rhel_7_x86_64_os/
enabled = 1
gpgcheck = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

Before (or after) setting that, you will need to disable the YUM Red Hat Subscription Manager plugin, or the next time you run “yum” it will wipe your redhat.repo and reload it from Subscription Manager. To do this, go to /etc/yum/pluginconf.d/subscription-manager.conf and set “enabled=0”. Also run, as root: subscription-manager config --rhsm.manage_repos=0

Below are examples of the errors you can see from Nessus.

Remote package installed : containernetworking-plugins-0.8.3-3.el7_8
Should be                : containernetworking-plugins-0.8.6-1.rhaos4.2.el7
OR
Remote package installed : runc-1.0.0-69.rc10.el7_9
Should be                : runc-1.0.0-81.rhaos4.6.git5b757d4.el7

The default thinking may be: it says I need to update to the OpenShift packages, so it makes sense to install the OpenShift repos. And if you get a Red Hat developer account to debug this, you will have the OpenShift repos there. That is because the developer account gives you a lot of entitlements, including OpenShift. But if you add the OpenShift repos to a bunch of systems, you may become liable for OpenShift licenses, or get errors because those systems do not have the entitlements. The key is that the installed packages say “.el7_8”/“.el7_9” instead of “.rhaos4.2”. This is a plugin misclassification, not a need for updates.
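
When triaging a long Nessus report it can help to sort findings by that release tag. Here is a small Python sketch; the function name is hypothetical, and the rule (a release containing “.rhaos” is an OpenShift build, a release ending in “.el7” or “.el7_N” is a plain RHEL 7 build) is my reading of the package names above, not anything documented by Tenable or Red Hat.

```python
import re


def package_origin(nvr):
    """Classify a name-version-release string by its release tag."""
    release = nvr.rsplit("-", 1)[-1]
    if ".rhaos" in release:
        return "openshift"
    if re.search(r"\.el7(_\d+)?$", release):
        return "rhel7"
    return "unknown"
```

If the installed package comes back as "rhel7" and the “Should be” package comes back as "openshift" on a host with no OpenShift entitlement, you are most likely looking at the misfire described here rather than a missing update.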

Note: The image is a random AI generated one: Stable Diffusion Image, “computer with redhat logo on screen, in a field with mountains and a dinosaur in the background”. I think they are fun.

SSSD with Active Directory Only Showing Primary Group

I was domain joining some Red Hat Enterprise Linux 7 boxes to a Windows domain. Everything went smoothly, except that many of my users could only see their primary groups. Some users who had more permissions on the domain could see all their groups, but some particular users could not. This seems to be a common failure scenario for SSSD with AD, and many people have opened bugs or chimed in with different fixes online. I found the solution in one forum post, it saved me, and I wanted to amplify it.

As long as some of your users can see all their groups, you know it’s not exactly a problem with RHEL connecting to AD, or a protocol like LDAP being blocked. An odd side effect of this setup was that periodically a group could be scanned and would then show its users. If I ran “sss_cache -E”, then “getent group SecondaryGroup”, some of the time it would show the users inside the group. Then, once the user logged in, the user would disappear from that command’s output, as well as from “groups” run as that user.

The SSSD log didn’t offer a ton of help other than saying it couldn’t read all the groups. I tried a TON of the recommended settings, like enumerate = True, enumerate = false, ldap_use_tokengroups = true, ldap_use_tokengroups = false; none of these changed anything. Then https://serverfault.com/a/938893 mentioned it may be a permissions problem between the computer object in AD and the user object. I looked and, sure enough, my system had NO permissions on the users that were failing. I attempted to add the tokenGroups permission mentioned in that answer, and that still didn’t help, but we were on the right track!

The answer came from https://serverfault.com/a/796005: there is a permission needed called “Read Remote Access Information”. Once that is granted to your computer object on the user, secondary groups will start populating. I gave “Domain Computers” that permission, since it seemed to only be affecting some of the Linux systems and Windows was happy to have it as well.

Some random commands that can help you debugging SSSD:

SSSD likes to cache a lot, which makes it hard to troubleshoot. The following clears all caches and restarts SSSD:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd

CentOS/Rhel 8 Auto login Fix

I have a PXE environment that requires systems to boot up, then automatically log in and start a program. All of a sudden this stopped working after years of working. It took me a while to figure it out, so I figured I would post in case anyone else runs into this.

I have been doing auto login the recommended systemd way for a while, as shown at https://wiki.archlinux.org/title/Getty. I copied /lib/systemd/system/getty@.service to /etc/systemd/system/getty@tty1.service, then edited it with a sed script in the build pipeline. In the end the line was:

ExecStart=-/usr/bin/agetty --noclear %I $TERM --autologin username

This worked for YEARS, then suddenly stopped. While investigating, I saw another file being written next to mine at /etc/systemd/system/getty@tty1.servicee, with an extra e on the end of “service”. After a lot of playing around with it and looking at other guides, I figured out that there had been an update to systemd/getty, and it now cares that all options come before the terminal arguments. Changing that line to the following fixed it.

ExecStart=-/usr/bin/agetty --noclear --autologin username %I $TERM
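
Since my original edit was done by a script in the build pipeline, the reordering can be scripted too. Below is a Python sketch of that one fix; the function name is made up, and it assumes the simple case shown above where --autologin takes exactly one value and %I marks the start of the terminal arguments.

```python
def move_autologin_before_tty(exec_start):
    """Move '--autologin <user>' in front of '%I' if it currently comes after it."""
    parts = exec_start.split()
    if "--autologin" not in parts or "%I" not in parts:
        return exec_start
    login_at = parts.index("--autologin")
    tty_at = parts.index("%I")
    # Already in the right place, or the option is missing its value: leave it alone.
    if login_at < tty_at or login_at + 1 >= len(parts):
        return exec_start
    option = parts[login_at:login_at + 2]
    del parts[login_at:login_at + 2]
    parts[tty_at:tty_at] = option
    return " ".join(parts)
```

Running it on a line that is already in the new order returns the line unchanged, so it is safe to apply on every build.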

Cisco ISR 4451 Serial Password Recovery

I had to password recover a Cisco ISR 4451, and kept having issues getting into the ROMMON prompt. Every guide mentioned sending a BREAK character during startup, but I could not get that to work. I was using the mini-USB port on the front and, as far as I knew, did not have password recovery disabled. It turns out there is a problem with the mini-USB port and the Mac driver; I switched to a traditional serial cable with a DB-9 connector/RJ45 serial port and suddenly I could get into ROMMON. I wanted to post in case anyone else runs into this.

Below is the startup process, at the end there you should be able to send a BREAK character.

Initializing Hardware ...

System integrity status: 00000610
Rom image verified correctly


System Bootstrap, Version 15.3(3r)S1, RELEASE SOFTWARE
Copyright (c) 1994-2013  by cisco Systems, Inc.

Current image running: Boot ROM0

Last reset cause: PowerOn
Cisco ISR4451-X/K9 platform with 4194304 Kbytes of main memory


Warning: filesystem is not clean
File size is 0x1d482044
Located isr4400-universalk9.03.16.04b.S.155-3.S4b-ext.SPA.bin 
<SEND BREAK HERE>

Redhat/CentOS 7-8 PKI/CAC/Smart Card SSH Login with Active Directory and SSSD

I was experimenting with integrating CentOS with my home Active Directory (AD) cluster. I wanted centralized user management and, as a stretch goal, to get PKI login working for Smart Card auth. I have used winbind before to connect CentOS 6 to Active Directory, and that configuration was a bit annoying. These days with CentOS/RHEL 7 and 8 we have SSSD, which is more straightforward. For all the following tests I used Putty-CAC (link), a Windows app that allows GSSAPI and Smart Card auth.

SSSD Config

I will start off with my experience, then follow up with a how-to. For this article I already have AD configured to support Smart Card auth, and have stored the Smart Card public key for my user; I will follow up with an article about that configuration. Active Directory integration is straightforward and easy. One setting you can enable hides the domain name from the username, which allows the users to feel native to the system. Using users and groups is easy; I made a group to which I gave sudo access. When using Smart Cards you will need to put NOPASSWD in the sudo entry for that group, because Smart Card users usually do not have passwords. Usually… You can use Smart Card auth with Active Directory AND a password, as long as you do not set “Smart card is required for interactive logon”. If you do check that box, AD sets a random password on the backend for that user.

After setup, with this config we store the authorized_keys entries in AD under the attribute altSecurityIdentities. The main tool for debugging Smart Card auth is sss_ssh_authorizedkeys, which has the system attempt to pull a user’s SSH key on demand. A big warning about SSSD: it loves to cache information. If you run that command, then make changes to your sssd.conf or AD and re-run sss_ssh_authorizedkeys, it will fail because it is caching the failed lookup from before. My recommended command, as root, between tests where it may be caching is:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd

SSSD Config

1. Set up hostnamectl (make sure your host knows what its name is supposed to be) and DNS. For SSSD to work well the system needs to be able to find itself in DNS; you can set up SSSD to auto register with dynamic DNS (more on that later).
2. Install Packages
     - Ubuntu
       apt -y install realmd sssd sssd-tools libnss-sss libpam-sss adcli samba-common-bin oddjob oddjob-mkhomedir packagekit    
     - CentOS
       sudo yum install realmd sssd oddjob oddjob-mkhomedir adcli samba-common samba-common-tools krb5-workstation

At this point running “# realm discover your_domain_fqdn” will list out the services needed for users on your domain to log in. Usually the main program you need to enable is oddjobd, which will create home directories when users log in. Note: for these examples I find it easier to use a real domain than to substitute a placeholder, so I will use my home test domain “home.ntbl.co” here.

3. systemctl enable oddjobd
4. systemctl start oddjobd
5. realm join -U admin_user_on_domain home.ntbl.co
6. vim /etc/sudoers.d/winadmins
Add the line “%domain\ admins@home.ntbl.co ALL=(ALL) ALL”, where “domain admins” is a group I have in AD, and “home.ntbl.co” is my domain. As written, this does not support Smart Card login with sudo, since you need NOPASSWD for that; for example, "%domain\ admins@home.ntbl.co ALL=(ALL) NOPASSWD:ALL". You can create a sub sudo file like I did here, or use visudo to edit the sudo configuration and have it syntax checked.

7. Below is my /etc/sssd/sssd.conf without Smart Card auth setup.

 [sssd]
 domains = home.ntbl.co
 config_file_version = 2
 services = nss, pam
  
 [domain/home.ntbl.co]
 ad_domain = home.ntbl.co
 krb5_realm = HOME.NTBL.CO
 realmd_tags = manages-system joined-with-adcli
 cache_credentials = True
 id_provider = ad
 krb5_store_password_if_offline = True
 default_shell = /bin/bash
 ldap_id_mapping = True
 use_fully_qualified_names = false
 fallback_homedir = /home/%u@%d
 access_provider = ad
  
 dyndns_update = true
 dyndns_refresh_interval = 43200
 dyndns_update_ptr = true
 dyndns_ttl = 3600

Setting “use_fully_qualified_names = false” changes your username from “dan@home.ntbl.co” to “dan”. It is not a requirement, but a nice quality-of-life setting. The bottom block adds dynamic DNS, which will push your IP to AD DNS. Windows does dynamic DNS updates by default, and whether or not the systems are statically assigned, this can be a nice feature. Now run "systemctl stop sssd" and “systemctl start sssd”, and you should be able to log in with your AD account.

GSSAPI

Before getting into Smart Card auth, I wanted to briefly mention GSSAPI. This is a method to do auth between systems. It allows Windows clients to log in to SSH with one click by passing an auth token from your Windows session right to SSH. If you set up SSSD and enable GSSAPIAuthentication in /etc/ssh/sshd_config, then you can use an app like Putty-CAC to SSH with GSSAPI. I have found this usually works with SSSD by just setting GSSAPIAuthentication to yes. If you just want to admin Linux from AD, and have no other requirements, I would suggest you look into this for your environment because it is so easy. If you are going to follow the rest of the guide, make sure to turn GSSAPI back off, or it will log you in automatically and you may think it’s Smart Card auth working; that fooled me for a few minutes.

Smart Card Auth

For all of my tests, I used the following Smart Card, Amazon link. I think these other cards would work as well, and they are cheaper; but I have not personally tried them. Amazon link. I may write an article later about setting up these cards, if you are interested write a comment below.

Add Certs to AD

You need the Smart Card’s public key data in SSH authorized_keys format. This guide will show you how to get that string from Putty CAC. You have to enjoy when a .gov site tells you to go to user NoMoreFood and get security software, the open source world is great.

In Active Directory, go to Active Directory Users and Computers, turn on Advanced Features, by going to the View menu, and enabling Advanced Features. Then select the user you want to add ssh keys for, and select the “Attribute Editor” tab. You will find an entry at the top called “altSecurityIdentities”, add the line that would usually be in ~/.ssh/authorized_keys there, it should look like “ssh-rsa key_stuff”.

Configuring SSSD for Cert Auth

To add Smart Card auth to SSSD, just add the following to your sssd.conf, merge the sections with the ones from above.

[sssd]
services = nss, pam, ssh, sudo

[pam]
pam_cert_auth = True

[domain/home.ntbl.co]
enumerate = True
ldap_user_extra_attrs = altSecurityIdentities:altSecurityIdentities
ldap_user_ssh_public_key = altSecurityIdentities
ldap_use_tokengroups = True

Now restart sssd. If you run "sss_ssh_authorizedkeys dan", with dan replaced by your username, then you SHOULD get a key back if everything is set up correctly. If you do not get a key back, use the command below to reset sssd and reload. If you still do not get a key then you will need to edit settings in sssd.conf, and continue to tweak:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd

I will say this does seem to take some trial and error. /var/log/sssd/ has some good logs that can help point you in the correct direction if you are running into issues. One quick note: you may see people online say “use the command ‘sss_ssh_authorizedkeys -debug 4 home.ntbl.co’ to debug the command.” This command does not have a debug switch; what happens instead is that “-d” is parsed as the domain argument, and the rest of the string becomes its value. You end up with key lookup attempts on domain “ebug” for user 4. Sadly sss_ssh_authorizedkeys is not very verbose, and debugging it is a bit of a pain; do not listen to people who mention the above debug command, since at least on CentOS/RHEL 7 and 8 it does not work.
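
The “-debug” behaviour is just the usual short-option convention, where anything glued to a single-dash option letter is treated as that option’s value. Python’s getopt module follows the same convention, so it makes a handy stand-in for showing it. This is only an illustration: the real tool is written in C, and its parser may differ in other details. The wrapper function name is made up.

```python
import getopt


def parse_like_short_options(argv):
    """Parse argv with a single short option, -d, that takes a value (the domain)."""
    options, positional = getopt.getopt(argv, "d:")
    return options, positional
```

Calling parse_like_short_options(["-debug", "4", "home.ntbl.co"]) gives the option -d with the value "ebug", and leaves "4" as the first positional argument, which is where the user name goes.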

As long as you are getting a key back from the above command, you can wire it into SSH. Edit /etc/ssh/sshd_config and add the following. Note that some sites say AuthorizedKeysCommandUser should be root, and some say it should be nobody. I err on the side of lesser permissions and set it to nobody:

 AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
 AuthorizedKeysCommandUser nobody
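A quick way to confirm that both directives are actually active (and not commented out) is to grep for them. This is a minimal sketch; the function name is hypothetical, and the file path is a parameter so that a copy of the config can be checked before touching the live one.

```shell
# Hypothetical helper: succeeds only if the given sshd_config contains both
# directives on uncommented lines.
# -i is used because sshd_config keywords are case-insensitive
# (note that this also loosens the match on the command path).
has_sss_authorized_keys_command() {
    grep -Eiq '^[[:space:]]*AuthorizedKeysCommand[[:space:]]+/usr/bin/sss_ssh_authorizedkeys([[:space:]]|$)' "$1" &&
        grep -Eiq '^[[:space:]]*AuthorizedKeysCommandUser[[:space:]]+[^[:space:]]+' "$1"
}

# Example use:
#   has_sss_authorized_keys_command /etc/ssh/sshd_config && echo "directives present"
```

After editing, running "sshd -t" checks the config file for syntax errors before you restart sshd.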

Hope something here has helped someone, feel free to drop a comment.

Windows Server DNSSEC Error 9110

TL;DR: Check that your Domain Controllers are in the correct OU and that the Microsoft Key Distribution Service is running.

I ran into an issue recently when DNSSEC signing a DNS zone, where Windows Server 2019 gave a very vague error and would only display that error after a 10 minute timeout. This made iterating on it very slow, since every change I made meant a 10 minute wait. Every guide to setting up DNSSEC mentioned right clicking the zone, then clicking sign, and as long as you select the defaults it should just work. On another domain that is exactly what happened and it just worked; only the original domain kept timing out.

While setting a custom DNSSEC signing policy, I noticed that there were different keystores, each of which gave a different error. This made me think it was something to do with the specific one I was using. It was time to troubleshoot the service itself, not DNSSEC.

I got a list of the services from a known good, and signing, domain controller; then compared that to the bad one to see what was different. Part way down the list I noticed that Microsoft Key Distribution Service was failing to start, and if I tried to start it, there was an error.
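The comparison itself is easy to script once both lists are in text files. The sketch below is an assumption-laden illustration, not what I ran at the time: it assumes each domain controller's running services were exported one name per line to files (the file names are hypothetical), for example with PowerShell's Get-Service, and that line endings have been normalized.

```shell
# Hypothetical helper: print services that appear in the known-good list
# but not in the list from the failing domain controller.
#   $1 = file with one service name per line from the good DC
#   $2 = same, from the failing DC
# Assumes both files use the same line endings (convert CRLF first).
missing_services() {
    grep -Fxv -f "$2" "$1"
}

# Example use:
#   missing_services good-dc-services.txt bad-dc-services.txt
```

Anything this prints is running on the good controller but not on the bad one, which is where the Key Distribution Service stood out.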

Group Key Distribution Service cannot connect to the domain controller on local host Status 0x80070020.

Checking the Event Log showed an issue in finding the Domain Controllers on the network (error above), which was weird because it is a Domain Controller… Looking at where this system was placed in the domain tree, I saw it had been moved from the original OU for domain controllers to another place. I applied all the GPOs that were on that other folder to the original Domain Controller folder, then dragged it back. Then I held my breath, hit start on the Key Distribution Service, and it started right away.

After that DNSSEC signed with no issues. Long story short, don’t move your DCs; it’ll only end in pain. And to the one other person on the internet who has seen this problem and never solved it, 5+ years ago, https://www.reddit.com/r/sysadmin/comments/3dedwm/dnssec_will_not_sign/ there is your answer!

US Patent US10530642B1

One of the projects I work on at my job, and have for the last few years, is how to go from a blank stack of servers to a fully configured cluster with my company’s software running on it. While some open source projects in this space were getting started when I began this project 5+ years ago, a lot of them kept rewriting their API every minor version rev. That started me down a path that has now become a decently large internal network booting infrastructure, which also manages interconnects to our inventory system as well as other systems such as Tenable Nessus. I was recently awarded my first patent! This one is specifically about how my system interacts with the inventory to dynamically assign systems to clusters as they come online.

https://patents.google.com/patent/US10530642B1/en?oq=US10530642

My part of the code was all written in Java and continues to evolve as a platform; I hope to open source a good amount of it down the road. I started the project by reading the RFCs for DHCP/PXE and then writing code. I have grown to enjoy writing libraries and some projects this way, adhering to the standard (more on that some other time). The general platform can handle ProxyDHCP PXE booting, and then uses iPXE to create menus and boot systems. I spent many hours debugging different vendors’ PXE code, and BIOS vs UEFI, to get all the systems to work. The platform now supports plugins for many different aspects of server configuration.

I could write pages about small details I have learned along the way. One issue that has been driving me crazy recently, if you want to use ProxyDHCP instead of your main DHCP stack, is Secure Boot. iPXE does not have a Secure Boot signed image. I have tried to get Microsoft to sign it, but they will not unless you are selling a product that uses the signed iPXE. I am not; I just wanted it for internal use. That means you may want to use grub2 as your loader, but there is a bug that has been outstanding for over 6 years and makes ProxyDHCP with grub basically impossible, https://savannah.gnu.org/bugs/?55636, which is sad.
