

Step-By-Step Getting Started with High Availability OpenShift 4.19 for a Homelab

In the last post, we looked at getting started with a SNO (Single Node OpenShift) system. Next we will look at a build with multi-node, or multi-master, OpenShift. This runs the core etcd service on more than one node, allowing the cluster to survive a single node failure. Some services, like the virtual machine services, need to run on a master as well, and having more than one relieves pressure on that system. With SNO, if your master does not start, the entire cluster cannot start. In addition, SNO upgrades will always introduce downtime, since the single master has to reboot.

Master nodes do run more services than a simple worker. If you are running a small cluster with 3 nodes, you may want to decide whether the extra overhead on the second and third masters is worth it, or whether you want to run leaner with SNO plus extra workers. In my experience with vanilla OpenShift, masters use about 20GB more RAM than worker nodes with no additional services on them.

I have a 3-node cluster that I was migrating from VMware and wanted to run HA. This allows me to do no-downtime upgrades, with the three nodes sharing the control-plane role.

My Setup

I am installing onto 3 HP EliteDesk 800 G5s, each with an Intel 9700 and 96GB of RAM (they can go to 128GB when RAM prices aren't insane). I have a dual 10Gb/s NIC in each for networking since I will be running Ceph. This is the same homelab cluster I have had for a bit. These machines aren't too expensive; they have 8 cores each, can go to 128GB of RAM, and have several PCIe slots and NVMe slots. I have used this guide to install OpenShift 4.17-4.20.

Installation Steps for HA OpenShift

Any line starting with $ is a terminal command to use. The whole process will take about an hour: 30 minutes or so to collect binaries and prep your config files, a minute or two to create the ISO, then 30 minutes of the cluster sitting there and installing.

One important thing to say up front to those who have not used OpenShift or Kubernetes before: there is one IP that all the applications share. The router looks at the incoming request, including WHICH DNS NAME YOU CONNECTED TO, and routes your traffic that way. You can have 100% of things set up right and still get "Application is not available" when you browse to the IP to access the console. This means the system is working! You just need to connect via the correct DNS name.
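For example, once the cluster described below is up, hitting the ingress VIP by bare IP versus by name shows the difference. This is just a sketch using the addresses from my config later in this post:

$ curl -k https://192.168.4.7      # no route matches a bare IP, so you get the "Application is not available" page
$ curl -k --resolve console-openshift-console.apps.cluster1.example.com:443:192.168.4.7 https://console-openshift-console.apps.cluster1.example.com      # reaches the real console route, even before your workstation DNS is set up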

  1. Prerequisites: Start by going to the same place as the original post to get a pull secret and the binaries you will need for the install. These include openshift-install and oc.
  2. I am on Fedora 42 and needed to run sudo dnf install nmstate to install nmstate. This is required to transform the configs in the agent-config.yaml into the configs that will be injected into the installation ISO.
  3. Make a folder, called something like “ha-openshift”, and put all the binaries in there.
  4. Config Files: Before we only had install-config.yaml; now we will have that AND agent-config.yaml.
  5. Below is an install-config.yaml; I will call out things you will want to change for your setup:
    • apiVersion: v1
      baseDomain: example.com
      compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform: {}
        replicas: 0
      controlPlane:
        architecture: amd64
        hyperthreading: Enabled
        name: master
        platform: {}
        replicas: 3
      metadata:
        name: cluster1
      networking:
        clusterNetwork:
        - cidr: 10.131.0.0/16
          hostPrefix: 23
        machineNetwork:
        - cidr: 192.168.4.0/24
        networkType: OVNKubernetes
        serviceNetwork:
        - 172.30.0.0/16
      platform:
        baremetal:
          apiVIPs:
          - 192.168.4.5
          ingressVIPs:
          - 192.168.4.7
      pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"b3Blbn==","email":"not-my-real-email@gmail.com"}}}'
      sshKey: ssh-rsa AAAAB
    • The "baseDomain" is the main domain to use. Your hosts will be named like master0.<metadata.name>.<baseDomain>, and the cluster name will be <metadata.name>.<baseDomain>. Make sure you put in what you want here because you can't change it later; this is how users will reference the cluster.
    • Under compute (workers) and controlPlane, you set how many worker nodes and master nodes you want. This is a big difference between SNO and HA: we are saying 3 masters instead of 1.
    • metadata.name is the sub-name of this exact cluster. You can have multiple clusters at, let's say, "example.com"; setting this makes the cluster apps.cluster1.example.com. (Yes, the DNS names get long with OpenShift.)
    • clusterNetwork and serviceNetwork are used internally for backend services; only change these if you are worried about the presets conflicting with your IP space.
    • machineNetwork.cidr is the IP space your nodes will live on, so it needs to match your DHCP network. Some of the IPs below will need static reservations in your DHCP server; the worker and master nodes can have general-pool DHCP addresses. We are assuming DHCP here; you can statically assign IPs, but it is more work and not something I am going to cover right here.
    • platform.baremetal.apiVIPs is where the API for your cluster will live; this is an additional IP the HA masters will hand back and forth to give the appearance of a single control plane.
    • platform.baremetal.ingressVIPs is another IP that will be handed back and forth, but this one is the HTTPS front door for applications.
  6. agent-config.yaml, I will call out things you will want to change:
    • apiVersion: v1alpha1
      kind: AgentConfig
      rendezvousIP: 192.168.4.10
      hosts:
        - hostname: hv1
          role: master
          rootDeviceHints:
            serialNumber: "AA22122369"
          interfaces:
            - name: enp1s0f0
              macAddress: 0c:c4:7b:1e:42:14
            - name: enp1s0f1
              macAddress: 0c:c4:7b:1e:42:15
          networkConfig:
            interfaces:
              - name: bond0.4
                type: vlan
                state: up
                vlan:
                  base-iface: bond0
                  id: 4
                ipv4:
                  enabled: true
                  address:
                    - ip: 192.168.4.10
                      prefix-length: 24
                  dhcp: false
              - name: bond0
                type: bond
                state: up
                mac-address: 0c:c4:7b:1e:42:14
                ipv4:
                  enabled: false
                ipv6:
                  enabled: false
                link-aggregation:
                  mode: 802.3ad
                  options:
                    miimon: "150"
                  port:
                    - enp1s0f0
                    - enp1s0f1
            dns-resolver:
              config:
                server:
                  - 192.168.3.5
            routes:
              config:
                - destination: 0.0.0.0/0
                  next-hop-address: 192.168.4.1
                  next-hop-interface: bond0.4
                  table-id: 254
        - hostname: hv2
          role: master
          rootDeviceHints:
            serialNumber: "AA22628"
          interfaces:
            - name: enp1s0f0
              macAddress: 0c:c4:7b:1f:06:e2
            - name: enp1s0f1
              macAddress: 0c:c4:7b:1f:06:e3
          networkConfig:
            interfaces:
              - name: bond0.4
                type: vlan
                state: up
                vlan:
                  base-iface: bond0
                  id: 4
                ipv4:
                  enabled: true
                  address:
                    - ip: 192.168.4.20
                      prefix-length: 24
                  dhcp: false
              - name: bond0
                type: bond
                state: up
                mac-address: 0c:c4:7b:1f:06:e2
                ipv4:
                  enabled: false
                ipv6:
                  enabled: false
                link-aggregation:
                  mode: 802.3ad
                  options:
                    miimon: "150"
                  port:
                    - enp1s0f0
                    - enp1s0f1
            dns-resolver:
              config:
                server:
                  - 192.168.3.5
            routes:
              config:
                - destination: 0.0.0.0/0
                  next-hop-address: 192.168.4.1
                  next-hop-interface: bond0.4
                  table-id: 254
        - hostname: hv3
          role: master
          rootDeviceHints:
            serialNumber: "203129F9D7"
          interfaces:
            - name: enp1s0f0
              macAddress: 0c:c4:7b:1f:03:c2
            - name: enp1s0f1
              macAddress: 0c:c4:7b:1f:03:c3
          networkConfig:
            interfaces:
              - name: bond0.4
                type: vlan
                state: up
                vlan:
                  base-iface: bond0
                  id: 4
                ipv4:
                  enabled: true
                  address:
                    - ip: 192.168.4.30
                      prefix-length: 24
                  dhcp: false
              - name: bond0
                type: bond
                state: up
                mac-address: 0c:c4:7b:1f:03:c2
                ipv4:
                  enabled: false
                ipv6:
                  enabled: false
                link-aggregation:
                  mode: 802.3ad
                  options:
                    miimon: "150"
                  port:
                    - enp1s0f0
                    - enp1s0f1
            dns-resolver:
              config:
                server:
                  - 192.168.3.5
            routes:
              config:
                - destination: 0.0.0.0/0
                  next-hop-address: 192.168.4.1
                  next-hop-interface: bond0.4
                  table-id: 254
    • rendezvousIP is the IP of the node in charge of the setup. You pick one of the masters; it will wait for all the other masters/workers to be online, check that they are ready, install them, then install itself.
    • The rest of the config is the same per-host setup repeated three times (once per host). For each host you will want to change: the hostname, the rootDeviceHints serial number (so the installer picks the right disk), the interface names and MAC addresses, the static IP on the VLAN interface, and your DNS server and default gateway.
  7. DNS Entries: Having created those two files, you know what you want your DNS to be. It's time to go into your location's DNS servers and enter addresses just like in the original post. These entries can be made at any time before you start the installation. In the end you should have 1 IP for ingress, 1 for the API, then one per node. (You can sanity-check these with dig; see the sketch after this list.)
    • api.cluster1.example.com -> apiVIPs, in my config 192.168.4.5
    • api-int.cluster1.example.com -> apiVIPs, in my config 192.168.4.5
    • *.apps.cluster1.example.com -> ingressVIPs, in my config 192.168.4.7
    • master0.cluster1.example.com -> node1 IP, in my config hv1 so I put 192.168.4.10
    • master1.cluster1.example.com -> node2 IP, in my config hv2 so I put 192.168.4.20
    • master2.cluster1.example.com -> node3 IP, in my config hv3 so I put 192.168.4.30
  8. Image Creation:
  9. $ mkdir ocp
  10. $ cp *.yaml ocp
  11. $ ./openshift-install --dir ./ocp/ agent create image
  12. This will create an ocp/agent.x86_64.iso
  13. Installation: Boot that ISO on all servers. The image will use the hardware you specified in agent-config.yaml and DNS lookups to identify each node. Make sure each system's NTP is working and its time looks correct, then check that each node can curl the following (a quick check loop is sketched after this list):
    • registry.redhat.io 
      quay.io 
      cdn01.quay.io 
      api.openshift.com 
      access.redhat.com
  14. The stack should now install. The rendezvous server will show a screen with the state of the other masters, and when they are all ready it will proceed with the install. This can easily take 30 minutes, and the screen on the rendezvous server can be slow to update.
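Before booting anything, you can sanity-check the DNS entries from step 7 with dig. A quick sketch; swap in your own names, and the expected answers here come from my example config:

$ dig +short api.cluster1.example.com            # expect 192.168.4.5
$ dig +short api-int.cluster1.example.com        # expect 192.168.4.5
$ dig +short anything.apps.cluster1.example.com  # any name under *.apps should return 192.168.4.7
$ dig +short master0.cluster1.example.com        # expect that node's IP, 192.168.4.10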
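For the reachability check in step 13, a quick loop like this from each node (or any box on the same network) is enough to confirm the pull endpoints answer. This is just a sketch, not part of the installer:

$ for host in registry.redhat.io quay.io cdn01.quay.io api.openshift.com access.redhat.com; do curl -sS -o /dev/null -w "%{http_code} $host\n" "https://$host" || echo "FAILED $host"; done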

With any luck you will have all the nodes reboot and a running stack you can access at your console location; here that would be console-openshift-console.apps.cluster1.example.com. Each node should show a normal Linux boot-up sequence, then a login prompt with that node's name and IP address(es). While you are learning, feel free to restart the installation; the system will wipe the machines again.

In the ha-openshift folder, under the ocp subfolder, there will be an auth folder. That folder holds the kubeadmin-password and kubeconfig files used to authenticate to the cluster. The kubeadmin password can be used to log in via OAuth at console-openshift-console.apps.cluster1.example.com. The kubeconfig file can be used with the oc command downloaded from Red Hat: running $ ./oc --kubeconfig ./ocp/auth/kubeconfig get nodes will show the nodes and their status from your installation machine.

Properly installed cluster example: 
~/homelab_openshift $ ./oc --kubeconfig ./ocp/auth/kubeconfig get nodes
NAME   STATUS   ROLES                         AGE   VERSION
hv1    Ready    control-plane,master,worker   44d   v1.32.9
hv2    Ready    control-plane,master,worker   44d   v1.32.9
hv3    Ready    control-plane,master,worker   44d   v1.32.9

This is an example of a successfully upgraded cluster, queried with the standard OpenShift oc get nodes command. Note: the version shown is the version of Kubernetes being run, not of OpenShift.

I will continue this series with posts about Networking, Storage, and VM setup for OpenShift.

Troubleshooting

The install process for OpenShift has a big learning curve. You can make it a bit easier by using Red Hat's web installer, but that also puts some requirements on the system that a homelab usually can't hit; the agent-based installer bypasses those checks. Once you get your configs dialed in, I have found it easy to reinstall a stack, but getting the configs for a stack set up correctly the first few times is tough. The installer also does not do a ton to make it easier on you. If something goes wrong, the biggest indicators I have found are: the memory usage when SSHed into the installer, the journalctl logs in the installer, and, about 8-10 minutes into a good install, the DVD image starting to read a lot of data, with constant activity on the indicator for a few minutes (that is CoreOS being written to the disk).

Random things to check in a failing install:

  • SSH into a node using the SSH key from install-config.yaml, then run $ sudo journalctl and scroll to the bottom to see what's going on, or just run $ sudo journalctl -f.
    • You may see something like:
      • “failing to pull image”: It can’t hit Redhat, or your pull secret expired
      • “ip-10-123-123-132.cluster.local node not recognized”: DNS entries need to be updated
  • If the system successfully reboots after an install, but you are not seeing the console start, SSH into a node using the SSH key in the install-config.yaml, run $ top. If your RAM usage is about:
    • around 1GB: Kubernetes is failing to start; this could be a DNS or image-download issue.
    • around 8GB: the core systems are attempting to come online, but something is stopping them, such as an issue with the api or apps DNS names.
    • 12-16+GB of RAM used: the system should be online.
  • Worth repeating for those who haven't used OpenShift before: internal routing is done via the DNS name in your request. If you attempt to go to the ingress VIP by IP, you will get “Application is not available”. This is good! Everything is up; you just need to navigate to the correct URL.
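As a concrete example of the first bullet: the nodes run Red Hat CoreOS, so you log in as the core user with whichever private key matches the sshKey you embedded in install-config.yaml (the key path and node IP below are just my examples):

$ ssh -i ~/.ssh/id_rsa core@192.168.4.10
$ sudo journalctl -f
$ sudo journalctl -u kubelet -f      # once the node is partially up, narrowing to the kubelet cuts the noise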

Footnotes

Helpful examples: https://gist.github.com/thikade/9210874f322e72fb9d7096851d509e35

Mellanox Driver Ignoring VLANs

Recently I spent more hours than I want to talk about fixing a server with a Mellanox ConnectX-6 Lx card, where I could not get OpenShift to pass traffic to VMs. I was creating bridges just like I normally do, and traffic was working to the main interface. After a lot of trial and error, I wanted to make a quick post in case anyone else runs into this.

All of this assumes a trunk port going to an interface on Linux (or a bonded interface). If you have an interface in Linux on the native VLAN, say a standard interface like eno1, and you then add a sub-interface for tagged traffic, eno1.10, the Mellanox mlx5 driver will, in hardware, ignore your VLAN tag and just send the traffic to the main interface.

One way to see if your card is doing this is to search dmesg for “mlx5” (dmesg | grep mlx5); you may see the following:

mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): S-tagged traffic will be dropped while C-tag vlan stripping is enabled

(https://github.com/oracle/linux-uek/issues/20)

The Mellanox card is worried about double-tagged packets and will drop tags on incoming data. It does this in hardware. You can see that the 8021q kernel module is loaded and VLAN filtering is disabled, but this won't matter. If you change settings like ethtool -K <interface> rx-vlan-offload off, it will report the setting as off, but the underlying driver applied this at init time, and the settings you set will be ignored. The only fix I found is to move all the IPed interfaces off the main interface.

Once you move the IPed interface onto its own sub-interface and reboot, data will start flowing to your VMs.
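What “moving the IPed interface” looks like depends on how you manage networking. As a rough sketch with plain NetworkManager (the connection name, interface name eno1, VLAN ID 10, and addresses are all example assumptions, not anything OpenShift sets up for you):

# leave the physical interface itself without an IP (assumes a connection named eno1)
nmcli con mod eno1 ipv4.method disabled ipv6.method disabled
# carry the host IP on a tagged sub-interface instead
nmcli con add type vlan con-name eno1.10 ifname eno1.10 dev eno1 id 10 ipv4.method manual ipv4.addresses 192.168.10.5/24 ipv4.gateway 192.168.10.1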


Using Kerberos to Authenticate WinRM for Ansible

I have been trying to get Kerberos auth working with WinRM as the authentication for the transport mechanism within Ansible. I want to configure a Windows system from the non-domain-joined Linux host that runs my automations. Getting these two hosts to talk over WinRM introduces a bunch of options, each with its own difficulties. If you look at the table on Ansible's website for Windows auth with WinRM, you see only a few options for a domain-joined machine:

https://docs.ansible.com/ansible/latest/os_guide/windows_winrm.html#credssp

I specifically needed it for an Active Directory account; part of my setup was creating lab machines and building domain controllers on the fly. Basic auth is out, certificate auth is out; what is left is Kerberos, NTLM, or CredSSP. Then, to throw another wrench into this, the Ansible host and the server are in FIPS mode, and FIPS disables MD5. NTLMv2 uses MD5 internally, which means it does not want to work on a FIPS-enabled machine. CredSSP is backed by NTLM hashes as well, making Kerberos your only option.

I did not want to have to domain join my Ansible machine to my Windows Domain; this is a test environment. Through a bunch of tinkering I have found a way to run Ansible, and have Ansible use a local krb5.conf file, instead of your system one in /etc/krb5.conf.

  1. I am on Rocky and installed the following:
    • dnf install krb5-devel krb5-libs krb5-workstation python3.12-devel
    • pip3.12 install pykerberos gssapi krb5 'pypsrp[kerberos]<=1.0.0' (the quotes keep the shell from interpreting the brackets and <=)
    • (Note I am using python 3.12 for my Ansible)
  2. The host you wish to connect to needs its FQDN to be resolvable from your Ansible system (we will assume Linux)
    • This can be in the hosts file or DNS
  3. Then you need to set the inventory.yml similar to:
    • my-host-post-domain:
            ansible_host: host.example.com
            ansible_user: Admin@EXAMPLE.COM
            ansible_password: WindowsPassword123
            ansible_connection: winrm
            ansible_winrm_transport: kerberos
            ansible_winrm_kinit_cmd: "./kinit.sh"
            ansible_winrm_message_encryption: never
            ansible_winrm_server_cert_validation: ignore
  4. Create a file named kinit.sh in the directory you launch Ansible from:
    • #!/bin/bash
      cd "$(dirname "$0")"
      export KRB5_CONFIG=./krb5.conf
      kinit "$1"
  5. Create your krb5.conf file
    • [libdefaults]
          default_realm = EXAMPLE.COM
          dns_lookup_realm = false
          dns_lookup_kdc = false
          ticket_lifetime = 24h
          renew_lifetime = 7d
          forwardable = true
          rdns = false

      [realms]
          EXAMPLE.COM = {
              kdc = 192.168.100.2
              admin_server = 192.168.100.2
          }

      [domain_realm]
          .example.com = EXAMPLE.COM
          example.com = EXAMPLE.COM

      (I am purposefully disabling DNS lookup and using my IP addresses, that is up to you.)
  6. Then I run my Ansible with the following:
    • KRB5_CONFIG=./krb5.conf ansible-playbook -i inventory.yml site.yml

It seems that if you do not have the kinit.sh file, kinit does not see the config. And if you don't have the environment variable in front of the Ansible command, then when Ansible goes to use GSSAPI to connect to the Windows system, Ansible will not see the config.
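A quick way to test the ticket side by itself, before involving Ansible at all, is to run kinit and klist against the same local config:

KRB5_CONFIG=./krb5.conf kinit Admin@EXAMPLE.COM
KRB5_CONFIG=./krb5.conf klist

If klist shows a ticket for krbtgt/EXAMPLE.COM, the realm and KDC settings are fine and any remaining failures are on the WinRM side.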

Troubleshooting

Some fun errors along the way:

  • Server not found in Kerberos database
    • This means the server you are CONNECTING TO can't be found; usually the ansible_host is not the FQDN. Then when kinit is done, it tries to connect to AD via the IP and that fails.
  • Kerberos auth failure for principal Admin@EXAMPLE.COM with subprocess: kinit: Cannot find KDC for realm \”EXAMPLE.COM\” while getting initial credentials
    • It can't find the krb5.conf file, OR your mapping under [domain_realm] has an issue

Systemctl: Assignment outside of section. Ignoring.

I wanted to throw together a quick post for a recent issue I have seen on Red Hat 7/CentOS 7 boxes. A recent OS update has brought a small but important change to SystemD. In the past, if you wanted to add environment variables to a SystemD service, you could enter # systemctl edit postgresql-14 (note I will be using postgresql-14 as the example service in this post), then add a line such as:

Environment=PGDATA=/opt/postgres/14/data/

After saving the file and starting the service, you were good to go. Recently, after a minor update, I started getting the error "[/etc/systemd/system/postgresql-14.service.d/override.conf:1] Assignment outside of section. Ignoring.", and the service would not start. It turns out you can no longer drop Environment lines directly into a SystemD override; you need to mark which section of the unit file you are overriding. Below is the new proper way to go about this:

[Service]
Environment=PGDATA=/opt/postgres/14/data/
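A quick way to confirm the override landed and pick it up, sticking with the postgresql-14 example (systemctl edit normally reloads for you, but it does not hurt to be explicit):

systemctl cat postgresql-14      # the override, with its [Service] header, shows at the bottom
systemctl daemon-reload
systemctl restart postgresql-14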

Quick fix, but can take a bit of digging. Also for SystemD and Postgres 14, this is the current way to easily redirect the data folder. Hope this helps someone!

CentOS 8 Migration

I have a pipeline which creates live images to network boot different systems. Historically this has been based on CentOS. A little while ago I moved it to CentOS 8 because I had some newer hardware that was not supported on the older kernel of 7. Everything was working well until recently when CentOS 8 went end of life, and I could no longer rely on the CentOS 8 Docker containers.

The journey began for a new EL8 system. I wanted to keep on EL8 instead of switching to Streams because all the other systems I had running were EL8 (CentOS 8 or RHEL8), and I wanted to keep compatibility. At the same time, I didn’t want to do a new build of the image, have things break, and not realize it was because of a CentOS Streams change upstream. I also used the CentOS 8 docker container which seems to have been pulled, so that forced me to do this change now.

My first thought was Oracle Linux. It has been around for a while, is ALMOST drop-in compatible, and can be used without going out and getting licenses (RHEL). (There are some small silly things, like the package being called "oracle-epel-release-el8" instead of "epel-release".) This led to nothing but issues. I replaced all the repos I had in the image creation stage with Oracle Linux ones, and then every build produced a ton of "nothing provides module(platform:el8)" lines for any package that used yum/dnf modules. I spent a chunk of time on this, finding no real answers, plus one Oracle support page that looked like it could help, which said I needed to buy a support contract. Classic Oracle. At one point I thought it had something to do with a centos-release commit (rpms/centos-release, 89457ca3bf36c7c29d47c5d573a819dd7ee054fe on the CentOS Git server) where a line in os-release confuses dnf, but that line was there. Oracle also doesn't seem to have a kickstart URL repo, which is needed to do this sort of network boot; they want the end user to set that repo up, which may be the source of my issues. This also touched on the issue described in "Disable Modular Filtering in Kickstart Repos" on the Red Hat Customer Portal, but I wasn't even getting to a base OS setup where I could change how dnf processes modules.

In my searches I did find this nice script to get bash variables for OS and version. https://unix.stackexchange.com/a/6348
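The gist of it, for anyone who does not want to click through, is reading /etc/os-release; this is my shortened paraphrase rather than the exact script from that answer:

# every EL8 rebuild mentioned here ships /etc/os-release
. /etc/os-release
echo "Distro: $ID  Version: $VERSION_ID"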

Then I figured I would try either AlmaLinux or Rocky Linux. They both came out around the time Red Hat said CentOS 8 was going away. Looking into both projects, they are both backed by AWS and Equinix, who are big players, which made me feel a bit better about it. I had heard a bit more about Rocky and its support, so I tried that. I dropped in the new repos and kickstart location, and everything just worked… Even things that were an issue when playing with Oracle Linux went away. For example, epel-release was once again called what it should be.

In the end so far it seems to be happy! We will see if any other small differences pop up and bite me…

Below is an example of the top of the kickstart I am using; if anyone is interested in more of how I create live images, leave a comment and I can do a post on it:

lang en_US.UTF-8
keyboard us
timezone Europe/Brussels --isUtc
auth --useshadow --enablemd5
selinux --disabled
network --device=eno1 --bootproto=dhcp
skipx
part / --size 4096 --fstype ext4
part /opt --size 4096 --fstype ext4
firewall --disabled

url --url=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/kickstart/

# Root password
rootpw --iscrypted <Insert encrypted password here>

repo --name=baseos --baseurl=https://download.rockylinux.org/pub/rocky/8/BaseOS/x86_64/os/ --install
repo --name=extras --baseurl=https://download.rockylinux.org/pub/rocky/8/extras/x86_64/os/ --install
repo --name=appstream --baseurl=https://download.rockylinux.org/pub/rocky/8/AppStream/x86_64/os/ --install

SSSD with Active Directory Only Showing Primary Group

I was domain-joining some Red Hat Enterprise Linux 7 boxes to a Windows domain. Everything went smoothly except that many of my users could only see their primary groups. Some users who had more permissions on the domain could see all their groups; it was just some particular users who could not. This seems to be a common failure scenario for SSSD with AD, and many people have opened bugs or chimed in with different fixes online. I found the solution in one forum post, it saved me, and I wanted to amplify it.

As long as some of your users can see all their groups, you know it's not exactly a problem with RHEL connecting to AD, or a protocol like LDAP being blocked. An odd side effect of this setup was that periodically the groups could be scanned and would then show the users in that group. If I ran "sss_cache -E", then "getent group SecondaryGroup", some of the time it would show the users inside the group. Then once the user logged in, the user would be removed from that command's output, as well as from "groups" run as the user.

The SSSD log didn't have a ton of help other than saying it couldn't read all the groups. I tried a TON of the recommended settings, like enumerate = True, enumerate = False, ldap_use_tokengroups = True, ldap_use_tokengroups = False; none of these changed anything. Then https://serverfault.com/a/938893 mentioned it may be a permissions problem between the computer object in AD and the user object. I looked and, sure enough, my system had NO permissions on the users that were failing. I attempted to add the tokenGroups permission mentioned in that article and that still didn't help, but we were on the right track!

The answer came from https://serverfault.com/a/796005: there is a permission needed called "Read Remote Access Information". Once that is granted to your computer object on the user, secondary groups will start populating. I gave "Domain Computers" that permission, since it seemed to only be affecting some of the Linux systems, and Windows was happy to have it as well.

Some random commands that can help you debug SSSD:

SSSD likes to cache a lot, making it hard to troubleshoot; the following clears all caches and restarts SSSD:

systemctl stop sssd && rm -rf /var/lib/sss/db/* && rm -rf /var/lib/sss/mc/* && systemctl start sssd
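And the checks I kept re-running after each change, gathered in one place (the user and group names are placeholders):

sss_cache -E                     # expire every SSSD cache without a restart
id someaduser                    # should now list the secondary groups
getent group SecondaryGroup      # should now list the members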

Installing HipChat 4 on Fedora/RHEL/CentOS/EL 7

HipChat 4 has recently come out, and shortly after it was released to my company's internal HipChat server. Being a Linux user, I hoped that the aged HipChat 2 client was finally updated for Fedora or Red Hat or CentOS 7 so I could just use yum to install it. When I went to the download page, the old yum instructions were replaced by only Ubuntu/Debian instructions! After playing around with the Debian package and getting it to load, I thought I would look at the repo a little more. Lo and behold, Atlassian is making a yum repo! Just not publishing instructions on how to use it! The downside is they seem to not be signing the repo, but the code below works with yum to download the latest version.

 

sudo bash -c 'cat > /etc/yum.repos.d/hipchat.repo << EOF_hipchat
[atlassian-hipchat]
name=Atlassian Hipchat
baseurl=https://atlassian.artifactoryonline.com/atlassian/hipchat-yum-client/
enabled=1
gpgcheck=0
EOF_hipchat'

sudo yum update

sudo yum install hipchat4

Fixing CentOS 6.6 Kickstart Issues

I recently have been working on a system automating CentOS 6 installs for servers. When upgrading to 6.6, my test environment (VMware Fusion) stopped working: I got a hard kernel panic and halt on loading. The VMware forums and the CentOS site have posts about workarounds for this; a bunch of them are complex and involve changing modules and other files around. There is a very easy fix for this, and it's detailed below.

NOTE: I am running VMware Fusion, so I will open a package; on Windows and Linux you don't have to do this, just go to the folder.

  1. Stop the VM
  2. Find the VM files
    1. For Fusion there will be a %Your VM%.vmwarevm file; you have to right-click it and choose "Show package contents"
  3. There should be a %Your VM%.vmx file, open that with a text editor
    1. If you are on a Mac, or another machine that likes to do smart quotes, make sure to use a program like vim or Sublime Text that doesn't add "smart quotes"
  4. A line will read ethernet0.virtualDev = "e1000e"; change it to ethernet0.virtualDev = "e1000", just removing the last e. This changes the card from an E1000e (enhanced) to a normal E1000, and CentOS 6.6 will boot. The exact before and after is shown below the list.
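For clarity, the change is a single value in the .vmx file.

Before:
ethernet0.virtualDev = "e1000e"

After:
ethernet0.virtualDev = "e1000"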

Here are some places where people have discussed the issue:

https://communities.vmware.com/message/2443777

VM Experimentation

I am the type of programmer/IT person who enjoys having all my experimentation on systems done inside a virtual machine. That way if I break something, I can easily roll back the virtual machine or just delete it. As seen in my last post, I recently built a new NAS. The original plan was to turn my old server into a Proxmox or ESXi box; the downside to that plan, I found out quickly, is that the old box used DDR2, and at this point DDR2 memory is quite expensive. That, along with my worry about the old box's power usage, made me decide to give another solution a try.

After researching around I found my local Fry's Electronics had the Intel NUC in stock. This is a tiny, tiny PC that can take up to 16GB of RAM, has an Intel Core i5, and only uses 17 watts. The box also has Intel vPro; what is vPro, you ask? vPro allows you to remotely manage the system, so I can remote into it without buying a fancy management card, remote power the box on and off, or mount a virtual CD. Not bad for a ~$300 box. The model I got, the DC53427, is a last-gen i5, so it was a little cheaper, at the cost of having only 1 USB 3.0 port. It came with a VESA mount so the NUC could be attached to the back of a monitor, which was a nice feature. I got a USB 3.0 enclosure for 2 older 500GB hard drives and used those as my storage. I installed Proxmox on the system since my work has been starting to use that software more and more, and this was a chance for me to learn it.

A quick note about Proxmox for those who have not used it; I had come from a VMware background, so my work was my first experience with Proxmox. It is a free system; the company offers paid subscriptions for patches and such, and without one the web page bothers you once when you log in, and you just dismiss the message. The software is a wrapper around KVM and some other Linux virtualization technologies. It can handle Windows and Linux systems without a problem. The interface is completely web based, with a Java virtual console; if you don't update to the latest patches, the Java console can break with Java 7 Update 51. The software works well enough. There are still some areas where it needs improvement: in VMware, if you want to make a separate virtual network, you can use their interface; on Proxmox, that's when you go to the Linux console and start creating virtual bridges. But once I got everything working, it seemed to work well. I don't know how long I will keep it without trying another system, but for now it is nice. Since the system relies on KVM, it can do features like dynamic memory allocation: if a VM is only using 1GB of RAM but is allocated 6, it will only take 1GB at that time. KVM can also do deduplication of memory, so if two VMs are running the same OS, it only stores those files in memory once, freeing up more memory space.

I ran into one problem during the install of Proxmox: the NUC is so fast that it would start to boot before the USB 3.0 hard drives had been mounted. After searching around everywhere, I found a fix on http://forum.proxmox.com/threads/12922-Proxmox-Install-on-USB-Device; adding a delay in the GRUB boot loader allows enough time for the system to mount the LVM disks correctly and then start. At first I just went to the GRUB boot menu, hit "e", and added "rootdelay=10" to the "linux /vmlinuz-2.6.32-17-pve root=/dev/mapper/pve-root ro rootdelay=10 quiet" line. After the system loaded, I went into /boot and added the same entry to the real GRUB config. Now I had an Intel NUC with 1TB of storage and 16GB of RAM. I could have used the NAS with iSCSI, but that was a lot of config I didn't want to do; besides, I was setting up some databases on the system and didn't want the overhead of the NAS's RAIDZ2 at this time.
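For reference, the persistent version of that change on a grub2-based install looks roughly like this (the file path and update command are the standard Debian-style ones, assumed here rather than taken from Proxmox docs):

# in /etc/default/grub, add rootdelay=10 to the default kernel arguments
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"
# then regenerate the GRUB config
update-grub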

I have been using it for a few weeks, and it's a nice little box. It never makes an audible level of noise (although it does sit next to its louder brother, the NAS). Down the road, if I want more power, I can always get another NUC and put Proxmox into clustered mode. These boxes keep going down in price and up in power, so this can grow with my needs.

LDAP Authentication RPI Tutorial (Part 2)

Last time I spoke about how to set up LDAP with PHP and briefly touched on using the "ldapsearch" command. I would like to go more in-depth on "ldapsearch" and show you how you can use it to craft searches for your PHP application. Specifically for RPI, if the user has an RCS account, they can SSH into "rcs-ibm.rpi.edu" and run the following commands. (RCS-IBM puts you on either clark.server.rpi.edu or lewis.server.rpi.edu; these two have the commands you need on them and run AIX.) To briefly review the command:

  • First we give the command, then the host we are searching (-h), and tell the server to try simple anonymous authentication (-x). Next we give the server a base to start the search from (-b; I am using RPI-specific domain components). Finally we give the heart of our search: the filter. Here I am looking for any unique ID (username) that starts with "berk" and ends with anything ("*").
  • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' 'uid=berk*'

The main part of the search we will be editing is the ending. Here we specify a filter to find the information we are attempting to access. Each LDAP server has different attributes it can give about each object. For example, the ldap.rpi.edu server gives out "givenName, objectClass, cn (the full concatenated name, or common name), sn (surname), loginShell," and many others, while "ldap1.server.rpi.edu" returns a much different list of results.

Finding Which Attributes Will be Returned

The best way to find which fields are available is by doing a search without a filter. Just running the search below will return an unfiltered list of everything in the directory, up until you hit the individual server's limit. I am purposefully not publishing full results from these searches for privacy reasons; here are some results for me with some data omitted.

  • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu'
  • # berkod2, accounts, rpi, edu
    dn: uid=berkod2,ou=accounts,dc=rpi,dc=edu
    sn: Berkowitz
    cn: Berkowitz, Daniel
    objectClass: top
    objectClass: posixAccount
    objectClass: inetOrgPerson
    objectClass: eduPerson
    objectClass: rpiDirent
    objectClass: mailRecipient
    objectClass: organizationalPerson
    objectClass: person
    uid: berkod2
    loginShell: /bin/bash
    uidNumber: #####
    mailAlternateAddress: berkod2@rpi.edu
    givenName: Daniel
    gecos: Daniel  Berkowitz
    rpiclusterhomedir: /home/berkod2
    description: PRIMARY-STU
    homeDirectory: /home/06/berkod2
    gidNumber: ###

Now that we have an idea about the data structure and what this server has on it, we can reverse the lookup and tweak it. I know "uid" will be the username, and I can get the user's name from that! So using CAS I can log a user in and get their username, then look up their LDAP information. (Example 1) If a user enters a name, they can search for the UID doing the reverse. (Example 2) The wildcard can also be used if the full name is not known. (Example 3) Last, we can combine multiple fields with an AND filter to narrow down the result. (Example 4)

  • Example 1
    • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' 'uid=berkod2'
  • Example 2
    • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' 'sn=Berkowitz'
  • Example 3
    • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' 'sn=Berko*'
  • Example 4
    • ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' '(&(sn=Berko*)(uid=berkod*))'
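One last trick that pairs well with these examples: anything you list after the filter is treated as the attributes to return, so you can trim the output down to just the fields you care about. A quick sketch against the same server and base:

ldapsearch -h 'ldap.rpi.edu' -x -b 'dc=rpi, dc=edu' 'uid=berkod2' cn uid loginShell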