Dragged by the roots

This one had me scratching my head for a while today. A client and an ex-client both contacted me with strange HTTP connectivity issues, which manifested as errors occurring on one server while the exact same code was working elsewhere. The logs revealed that an HTTPS connection was being rejected because the external site’s certificate could not be validated. The problem was that the server’s root certificates were out of date, and the external site was using Let’s Encrypt SSL certificates, which as of this month (October 2021) have a new compatibility restriction: their certs can only be validated by a client if the client trusts the ISRG Root X1 certificate. That restriction prevents functionality on iPhones running anything before iOS 10, anything earlier than macOS 10.12.1, various Kindles, early versions of Java 7 and 8, Firefox before v50, and much more. If your Web page makes use of a third-party service that behind-the-scenes is connecting to a site with the latest Let’s Encrypt certificate, that interaction with the third-party could fail due to certificate rejection.

In some cases it’s easy to resolve. Just update the OS/Browser/VM (e.g. AMI)/platform/etc. or whatever it is that is connecting to the LE-certified site. In other cases, such as older pre-iOS 10 devices, your luck has run out.
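
One quick way to tell whether a given server is affected is to look for ISRG Root X1 in its CA bundle. The following is a minimal sketch rather than a definitive test: the function name bundle_has_isrg_x1 is made up for illustration, and the default path assumes the RHEL-family bundle location, so other platforms will need a different path.

```shell
# Succeeds (exit 0) if the PEM bundle contains a certificate whose subject
# or issuer line mentions "ISRG Root X1"; fails otherwise.
# ASSUMPTION: the default path is the RHEL-family CA bundle location.
bundle_has_isrg_x1() {
  bundle="${1:-/etc/pki/tls/certs/ca-bundle.crt}"
  [ -r "$bundle" ] || return 2   # no readable bundle at that path
  # Wrap every certificate in the bundle in a PKCS#7 structure, print the
  # subject/issuer lines, and look for the root's common name.
  openssl crl2pkcs7 -nocrl -certfile "$bundle" 2>/dev/null \
    | openssl pkcs7 -print_certs -noout 2>/dev/null \
    | grep -q 'ISRG Root X1'
}
```

Typical use would be something like:

bundle_has_isrg_x1 && echo "ISRG Root X1 present" || echo "ISRG Root X1 missing"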

This is going to get worse as the weeks roll on. Many sites still have older LE certs installed so their clients are currently OK, but the sites automatically update and at some point they will be issued a new LE cert that has the ISRG Root X1 requirement. Once that site gets that upgrade, many of its clients could be affected.

Prepare for increased customer service calls and a lot of tearing hair out by the roots.

Bare bones bleeding edge cluster

Tomcat(xN)+Kubernetes+Docker+Rocky

Audience: SysAdmin, DevOps, Un*x coders

Sometimes, just for exercise, I go nuts. This time I figured my exercise would be to create a Hello World micro-services demo by building a bare bones cluster on the bleeding edge Rocky 8.4 operating system. I say “bleeding edge” but in fact I would just be using recent stable versions of several technologies, rather than the actual bleeding edges.

You can read about my chosen technologies elsewhere, along with all manner of explanations, charts, diagrams and occasionally some sample configurations or lines of code. For this document I stick to command lines and raw configurations, with the aim of providing something that you can copy/paste verbatim and achieve the same result. If you need pictures, follow the links.

The command line samples in the text below are assumed to be executed as root so either use sudo or an elevated shell. Also assume that config files created manually by root will have permission 644 unless chmod is indicated subsequently.

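For reference, this is late August 2021 and the technologies I have chosen are Rocky 8.4, Docker CE, Apache Tomcat 10.0.10 on JDK 16, Kubernetes v1.22 and Flannel, with VirtualBox 6.1 hosting the VM.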

This isn’t a ground-up tutorial. Some of the assumptions I make are that the reader is familiar, and somewhat experienced, with the following:

  • bash and other Un*x/Linux staples
  • vi (vim, nano, ed or whatever text editor you prefer, though I’ll be using here-docs a lot)
  • SELinux (no, don’t disable it, that’s a bad habit to acquire)
  • iptables, firewall-cmd, wget and other networking utilities
  • VirtualBox, VMWare or any other VM solutions (or even bare metal!)

Let’s start with the overall architecture. There will be one small Hello application that supports a single URL path that says “Hello, I am X” where X is the identity of the instance within the cluster that is running the application. The running demo will have multiple such instances, and the possible responses to an HTTP GET will be “Hello, I am 1”, “Hello, I am 2”, “Hello, I am 3” etc. By refreshing the browser (or repeating a wget) you will see that all the instances get opportunities to respond.

Hello (one line of Java contained in JSP)

The Hello application is implemented by a simple three-line JSP file named index.jsp that prints out the name of the host server on which it is running (assuming a HOSTNAME environment variable has been set):

<%@page contentType="text/plain" session="false"%><%
  out.println("Hello, I am "+System.getenv("HOSTNAME"));
%>

There is also this one-line file called context.xml:

<?xml version="1.0" encoding="UTF-8"?><Context path=""/>

These two files are placed in a Zip file named ROOT.war (a “Web Archive”) so that it contains:

  • META-INF/context.xml
  • index.jsp

This is one of the smallest deployable applications that you can create. You just need an editor and a Zip tool, though obviously you could use an IDE or similar tools to create the files (and others like MANIFEST.MF) and generate the WAR file. Nevertheless, a Zip called ROOT.war with those two files inside is all you actually need. I will create the ROOT.war file from the command line later.

Tomcat Container

The Hello application cannot run on its own. The index.jsp is a Jakarta Server Page (known as JavaServer Pages until Jakarta EE took over) and a JSP runs inside a Servlet container. The container translates the JSP at runtime into Java source, which is in turn compiled to Java bytecode and executed. Recompilation only happens if the JSP is modified. JSP is a mix of content and code, and that’s not always a good idea but there are places for such things and this demo is one of them.

Apache Tomcat is a popular open-source Servlet-enabled Web server and is perfect for running JSP. Tomcat is written in Java and runs in almost any environment that has a compatible Java implementation. When the ROOT.war is placed into Tomcat’s “webapps” directory, Tomcat detects the new file, decompresses it to produce “webapps/ROOT” and serves the content of that new directory via HTTP.

Docker Container

Instead of deploying Tomcat to a dedicated server, the plan is to put Tomcat in a Docker container. Then three of these would be hosted by a Rocky 8.4 server under instructions from Kubernetes. To make things more interesting, this Rocky 8.4 server would itself be a virtual machine hosted by VirtualBox on a Windows 10 PC.

Docker containers make use of the Linux kernel in which they are hosted, so my containers would be using the Rocky 8.4 kernel.

Rocky

This is where the real work starts. The prerequisites are:

  • Windows 10 Pro 21H1 build 19043.1165 Quad x64 32 GB, though any recent Win64 with enough RAM is OK
  • VirtualBox 6.1
  • Latest ISO of Rocky 8.4
  • Coffee (the drink…)

Start VirtualBox (or equivalent VM solution, or physical components!) and create a new “machine” with the following characteristics:

  • Type=Linux, Version=RedHat64
  • 4 GB RAM, PIIX3, I/O APIC, h/w UTC clock
  • 2 processor cores, no cap, PAE/NX
  • Bridged network, Realtek controller, make a note of the MAC address if you are doing DHCP
  • 32 GB storage on SATA controller
  • Connect the downloaded Rocky-8.4-x86_64-dvd1.iso file to the IDE Controller

Start the VM, which will launch the setup GUI after a few seconds of scrolling diagnostics. Here’s a brief summary of my preferred settings:

  • Keyboard and language to match the host machine
  • Complex root password
  • Ethernet on, connected, fixed IP (or DHCP if your local DHCP has the MAC mapped)
  • NTP connected to the nearest national pool of time servers
  • Software selection: “Server” with some additional bits, including:
    • Basic Web Server
    • Headless Management
    • System Tools

The setup can take a few minutes to complete. In the end you will have a Rocky 8.4 VM, to which you can connect via SSH (using PuTTY, for example). The first thing you do when you log in is this:

dnf -y update

The software selection during setup ensures that the following are already installed when you first log in:

  • Apache httpd 2.4.37
  • OpenSSL 1.1.1g
  • Perl 5.26.3

There’s still some more housekeeping to do. Here’s my suggestion, based on a mixed bag of things I find useful:

dnf -y install epel-release
dnf clean all
dnf -y upgrade epel-release
dnf -y install gcc openssl-devel openssl-perl
dnf -y install perl-CPAN perl-App-cpanminus
PERL_MM_USE_DEFAULT=1 cpan   # NOTE: if asked, accept all defaults
  cpan[1]> o conf prerequisites_policy follow
  cpan[1]> o conf commit
  cpan[1]> exit
cpanm --force HTTP::Daemon HTTP::Daemon::SSL
dnf -y install perl-Crypt-OpenSSL-RSA perl-LWP-UserAgent-Determined
cpanm MIME::Types Email::MIME Digest::SHA1 IO::Socket::SSL LWP::Protocol::https
dnf -y install perl-JSON
cpanm HTTP::Request::Params
cpan -i Time::Piece

You’ll notice a mix of dnf, cpan and cpanm in there, all to install Perl modules. Why not just one tool? It turns out that some modules install better with specific tools, and often in a specific order. The above incantation works for me. YMMV.

Note: remember that the Perl Crypt::SSLeay module has been superseded, so don’t install it.

You should also check that your server’s hostname is known to DNS or listed in /etc/hosts, as this will be needed later when initialising Kubernetes.
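
A quick way to run that check is sketched below. The function name hostname_resolves is hypothetical, and the sketch assumes getent is available, as it is on Rocky. getent goes through the system resolver as configured in /etc/nsswitch.conf, which normally means /etc/hosts first and then DNS.

```shell
# Succeeds if the given name (default: this machine's hostname) can be
# resolved by the system resolver.
hostname_resolves() {
  name="${1:-$(hostname)}"
  getent hosts "$name" > /dev/null
}
```

For example:

hostname_resolves || echo "add $(hostname) to /etc/hosts or DNS before initialising Kubernetes"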

Fresh restart

It’s probably a good time to check if Rocky needs to be rebooted:

dnf needs-restarting -r

If necessary, reboot as follows:

reboot

Docker Community Edition (CE)

Check the list of repositories that are currently installed:

dnf repolist

The Docker CE repo is not on that list by default, so let’s add the CentOS docker-ce-stable to the list (because there’s currently no Rocky version but CentOS should be the same). Unfortunately, Rocky (and RHEL8+) comes with podman and buildah pre-installed but they clash with Docker, so although Podman is a cool feature of Cockpit (which is also pre-installed) these have to go. Podman could be used as a drop-in replacement for Docker, but there are some subtle differences such as Podman not using a daemon like Docker. Maybe I’ll switch to Podman later, but for now, it’s being removed in favour of Docker.

dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
dnf -y erase podman buildah
dnf -y update
dnf -y install docker-ce docker-ce-cli containerd.io
setsebool -P container_manage_cgroup on # to allow containers to use systemd

Podman uses the systemd cgroup driver but Docker by default uses cgroupfs. Later, kubelet will expect Docker to be using systemd as well, so I need to make a few changes. Get the two unit files (docker.service and docker.socket) from GitHub and copy them to the /etc/systemd/system directory.

cd /etc/systemd/system
wget https://raw.githubusercontent.com/moby/moby/master/contrib/init/systemd/docker.service
wget https://raw.githubusercontent.com/moby/moby/master/contrib/init/systemd/docker.socket
cd ~

Docker doesn’t have a /etc/docker/daemon.json file by default, so create that file and put the following into it:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
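
In keeping with the copy/paste aim of this document, the file can also be written with a here-doc. This is only a sketch: the function name write_daemon_json is made up for illustration, and the target path is a parameter so that it can be tried on a scratch location before touching /etc/docker.

```shell
# Write the daemon.json shown above to the given path
# (default: /etc/docker/daemon.json), creating the directory if needed.
write_daemon_json() {
  target="${1:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$target")"
  # Quoted delimiter: the body is written verbatim, with no shell expansion.
  cat > "$target" <<'EOT'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOT
}
```

Called as root with no argument it writes to /etc/docker/daemon.json. Once Docker has been started in the next step, the following should report systemd as the cgroup driver:

docker info | grep -i cgroup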

It’s now just a matter of making sure Docker will start on boot, starting it right now for immediate use, and checking that it has in fact started:

systemctl daemon-reload
systemctl enable docker
systemctl start docker
systemctl status docker --no-pager   # avoids needing to press 'q'

If a non-root user is planning to run Docker, that other user should be added to the ‘docker’ group:

usermod -aG docker otheruser

At this point Docker should be ready to use, and as no core elements have been changed, you shouldn’t need to reboot the server, but unless there’s something else running, a reboot might be advised at this point.

Tomcat from DockerHub

The docker command interacts with DockerHub by default, and there one will find many different Docker images free to use. There are many Tomcat-related images to choose from. At the time of writing, this is what I see:

docker search tomcat
NAME                          DESCRIPTION                                 
tomcat                        Apache Tomcat is an open source implementati...
tomee                         Apache TomEE is an all-Apache Java EE certif...
dordoka/tomcat                Ubuntu 14.04, Oracle JDK 8 and Tomcat 8 base...
kubeguide/tomcat-app          Tomcat image for Chapter 1                  
consol/tomcat-7.0             Tomcat 7.0.57, 8080, "admin/admin"          
cloudesire/tomcat             Tomcat server, 6/7/8                        
aallam/tomcat-mysql           Debian, Oracle JDK, Tomcat & MySQL          
arm32v7/tomcat                Apache Tomcat is an open source implementati...
andreptb/tomcat               Debian Jessie based image with Apache Tomcat...
rightctrl/tomcat              CentOS , Oracle Java, tomcat application ssl...
unidata/tomcat-docker         Security-hardened Tomcat Docker container.  
arm64v8/tomcat                Apache Tomcat is an open source implementati...
amd64/tomcat                  Apache Tomcat is an open source implementati...
fabric8/tomcat-8              Fabric8 Tomcat 8 Image                      
cfje/tomcat-resource          Tomcat Concourse Resource                   
oobsri/tomcat8                Testing CI Jobs with different names.       
jelastic/tomcat               An image of the Tomcat Java application serv...
camptocamp/tomcat-logback     Docker image for tomcat with logback integra...
ppc64le/tomcat                Apache Tomcat is an open source implementati...
99taxis/tomcat7               Tomcat7                                     
picoded/tomcat7               tomcat7 with jre8 and MANAGER_USER / MANAGER...
s390x/tomcat                  Apache Tomcat is an open source implementati...
softwareplant/tomcat          Tomcat images for jira-cloud testing        
secoresearch/tomcat-varnish   Tomcat and Varnish 5.0

That’s a lot of Tomcat. The first one on the list, named simply “tomcat”, is the one I want, and its source can be inspected on GitHub.

Running a test Tomcat Docker image

First I pull the official Tomcat image from DockerHub. (I could alternatively pull it from a private mirror, if I had created a private mirror.) The GitHub repo shows there’s a 10.0.10 on JDK 16, with the Dockerfile found at tomcat/10.0/jdk16/openjdk-buster/Dockerfile so that’s the image I’m going to use:

docker pull tomcat:10.0.10-jdk16-openjdk-buster

Note the structure of the image name: name:version-jvm-dist. The command produces a nice TTY animated download progress map and within a minute I have the image stored locally for use.

The details of this image are stored in a large JSON blob that can be found in this directory:

/var/lib/docker/image/overlay2/imagedb/content/sha256

The Tomcat image contains a deployment of Tomcat that listens to HTTP requests on port 8080, and serves content from its webapps directory (if there were any content deployed there). As the out-of-the-box image of Tomcat contains no content, any HTTP request is going to be getting a “404” response (Not Found), but that would be enough to show that Tomcat was running.

For this test I will map the host’s local port 18080 to the Docker Tomcat’s port 8080. The -i option lets the container consume stdin and -t gives it a pseudo TTY for stdout. The --rm option removes the container when it is stopped, and -p is the port mapping. Run the command in a separate SSH/TTY session, because it holds on to stdin (the keyboard) and keeps streaming Tomcat’s stdout until the container is stopped. If you examine that output you will notice that Tomcat reports version 10.0.10 with JVM 16, exactly as expected.

docker run -it --rm -p 18080:8080 tomcat:10.0.10-jdk16-openjdk-buster

You can test that this instance of Tomcat is listening on port 18080 using wget in the second terminal session:

wget -O- http://localhost:18080/
--2021-08-DD 01:23:45-- http://localhost:18080/
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:18080... connected.
HTTP request sent, awaiting response... 404
2021-08-DD 01:23:45 ERROR 404: (no description).

As expected, Tomcat returned a 404 because there is no content to return.

Stopping Tomcat

If Tomcat were running directly on the VM I could shut it down by making a request to its shutdown port as defined in its server.xml configuration file. In the case of Docker, there really is no point in shutting down just Tomcat. The entire Docker instance should be shut down. So, given that stdin is being consumed by the running container, in a separate console session I stop the container as follows:

First get the details of the running container processes:

docker ps
CONTAINER ID IMAGE                               COMMAND           CREATED
STATUS        PORTS                                       NAMES
3a1470c24d01 tomcat:10.0.10-jdk16-openjdk-buster "catalina.sh run" 22 minutes ago
Up 22 minutes 0.0.0.0:18080->8080/tcp, :::18080->8080/tcp unruffled_saha

From this one can see that the container is identified as 3a1470c24d01 and can be stopped via:

docker stop 3a1470c24d01

It might have been easier to identify the container if I had used --name demo in the docker run command.

Docker Tomcat instance with a deployed ROOT.war

Now it gets really interesting. I want to run a Docker Tomcat container that launches with the ROOT.war in place so that it expands and becomes visible via HTTP. There are several ways this can be achieved. Here are some possibilities.

Mapping the webapps directory to a host path

If you check the configuration of the image via:

docker image inspect tomcat:10.0.10-jdk16-openjdk-buster | grep CATALINA

you will discover that CATALINA_HOME is defined to be /usr/local/tomcat

The docker run command can take a -v option that mounts a volume onto a container path, and Tomcat will be expecting to find its Web applications located at /usr/local/tomcat/webapps within the container, so you could run the image so that the container’s webapps directory is actually a directory on the host machine (e.g. /var/webapps).

docker run -v /var/webapps:/usr/local/tomcat/webapps ...etc...

If the Web application file is located in the host server’s filesystem at /var/webapps/ROOT.war then the Tomcat running in the Docker container will decompress the file into directory /var/webapps/ROOT and run the application therein.

However, when the container is shut down, the contents of /var/webapps will still contain the directories and files created by Tomcat.

This is messy. Furthermore, if you have two or more containers mounting the same host path, they will contest each other (and make a bigger mess if the .war file is updated causing all the Tomcats to start decompressing it…).

Separate mount points per container

You could have /var/webapps1, /var/webapps2 and so on. That’s still messy.

Deploy without unpacking

You could modify Tomcat’s server.xml file to set unpackWARs to false and that would mean that Tomcat would run the Web application from within the .war file without writing anything new to the webapps directory. This is OK if you are 100% sure that the application is not going to try accessing co-resident resources as if they were files, because they won’t be available as files. As Tomcat’s default behaviour is to unpack first, it is common for Web applications to be implemented with the assumption that resources are available as files.

Also, editing server.xml within a Docker image at this point in the demo is a bit of Ninja gymnastics too far, so let’s park this one for now.

Copy ROOT.war from the host into the container

What I need is an image that is like the official Tomcat image I’ve already pulled from DockerHub, except it now contains a copy of my ROOT.war in the image’s file system at /usr/local/tomcat/webapps/ROOT.war (which will be the full path within the container). To do this, I will first create a directory in the host server (my Rocky 8.4 instance) that will represent the context for my custom Tomcat Docker image that I am about to create.

mkdir -p /app/contexts/mytc/ROOT/META-INF

Create the bare-bones Hello World application file named ROOT.war as follows:

cat <<EOT > /app/contexts/mytc/ROOT/index.jsp
<%@page contentType="text/plain" session="false"%><%
  out.println("Hello, I am "+System.getenv("HOSTNAME"));
%>
EOT
cat <<EOT > /app/contexts/mytc/ROOT/META-INF/context.xml
<?xml version="1.0" encoding="UTF-8"?><Context path=""/>
EOT
(cd /app/contexts/mytc/ROOT && zip -r ../ROOT.war *)

Now create a text file named /app/contexts/mytc/Dockerfile containing just two lines (FROM and COPY):

cat <<EOT > /app/contexts/mytc/Dockerfile
FROM tomcat:10.0.10-jdk16-openjdk-buster
COPY ROOT.war /usr/local/tomcat/webapps/ROOT.war
EOT

Build the custom image:

docker build -t mytc /app/contexts/mytc

You can use docker image ls to confirm that the new image is present. In my case it is 670 MB in size, similar to the original Tomcat image on which it is based because the extra ROOT.war that I’ve added is tiny. Run the custom image in a new named container, but be warned this will be slow, for reasons that will be explained:

docker run --name mytc1 -it --rm -p 18080:8080 mytc

As before, the running Tomcat displays its startup diagnostics via stdout (on screen), which contains the following lines:

org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/10.0.10]
org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive [/usr/local/tomcat/webapps/ROOT.war]

However, there could now be a pause of a minute or two (or three…) on account of the container being low on entropy, which Java needs in order to get its random number generator working properly. Meanwhile, in a separate TTY session, use wget to confirm that the Tomcat running in container mytc1 is serving the demo application:

wget -qO- http://localhost:18080/

(This is the point where you could be waiting a while.) By the time this returns a result, you should be seeing the following in Tomcat’s stdout:

org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom Creation of SecureRandom
   instance for session ID generation using [SHA1PRNG] took [281,278] milliseconds.
org.apache.catalina.startup.HostConfig.deployWAR Deployment of web application archive
   [/usr/local/tomcat/webapps/ROOT.war] has finished in [282,242] ms
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
org.apache.catalina.startup.Catalina.start Server startup in [282491] milliseconds

That was surprising for me. 282 seconds was a long time to wait for some kind of response. However, it is clear that the delay is in the SecureRandom setup. You can get around this nuisance a number of ways, such as adding the following to the JAVA_OPTS environment variable within the container:

-Djava.security.egd=file:/dev/./urandom

Since the default Docker image for Tomcat does not actually define JAVA_OPTS, it can be added as an environment variable setting to Docker’s run command, as follows (but don’t run this while mytc1 is already running!):

docker run --name mytc1 --env "JAVA_OPTS=-Djava.security.egd=file:/dev/./urandom" -t --rm -p 18080:8080 mytc &

Note that in this version of the run command I used -t instead of -it (dropping the i option) to avoid the container capturing my keyboard inputs, and added a & at the end to make the docker process run independently of my console. The stdout/stderr still appears on my screen, but I could redirect that elsewhere too if I wanted.

Also, if I want the JAVA_OPTS to be baked into the image, I can use this alternative version of the Dockerfile when building the image:

FROM tomcat:10.0.10-jdk16-openjdk-buster
ENV JAVA_OPTS=-Djava.security.egd=file:/dev/./urandom
COPY ROOT.war /usr/local/tomcat/webapps/ROOT.war
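
Going back to the entropy shortage for a moment: if you want to see what the kernel itself reports, the sketch below reads its current estimate. The function name entropy_avail is hypothetical. Containers share the host’s kernel, so the value read on the Rocky host applies to the containers too, and a value that stays close to zero while Tomcat is stalled would be consistent with the diagnosis above.

```shell
# Print the kernel's current estimate of available entropy (in bits),
# or "unknown" where the /proc entry does not exist.
entropy_avail() {
  f=/proc/sys/kernel/random/entropy_avail
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo unknown
  fi
}
entropy_avail
```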

Meanwhile, what about the output of the wget command? You’ll get something like this:

Hello, I am f28da8e35a8a

The hex code is the value of the HOSTNAME environment variable and it is unique to the container. You can see the complete list of environment variables inside the running container as follows:

docker exec -it mytc1 env

Also, I can take a look at the contents of the running container’s webapps directory to see it has decompressed ROOT.war in-situ:

docker exec -it mytc1 ls -l /usr/local/tomcat/webapps

Finally, stop the container (which, thanks to the earlier --rm option, will delete itself once stopped):

docker stop mytc1

Note that the ROOT.war is part of the mytc image that I built. If I change or delete the /app/contexts/mytc/ROOT.war it will have no impact on the mytc image. If I want to update the image, I have to rebuild it (i.e. the same docker build command as before, thus replacing the mytc image). Since I prefer not to have to add constant env options when running containers, rebuilding the mytc image is what I did.

Multiple containers

At this point I have a custom-built Docker image that runs Tomcat 10 with Java 16 to serve my ROOT.war application, which simply displays the HOSTNAME of the running container. Now I am going to run two containers from it, listening on different ports, and show that they are both running and have different host names.

docker run --name mytc1 --rm -t -p 18080:8080 mytc &
docker run --name mytc2 --rm -t -p 28080:8080 mytc &

The second container is named mytc2 and is listening on port 28080. Other than that, it is essentially the same as the first container. Now let’s see their output:

wget -qO- http://localhost:18080/
Hello, I am a8de90afe9c0
wget -qO- http://localhost:28080/
Hello, I am b8fb90f97a3e

The responses to the requests on the different ports show that the HOSTNAME environment variable is different in each Docker container.

Finally, I shut them down (combined in a single command):

docker stop mytc1 mytc2

I noticed that shutting down the two containers freed just over 200 MB of RAM, establishing the low-water mark of memory consumption for the containers alone. Once you replace Hello World with something more substantial, resource usage will grow.

Kubernetes (Kυβερνήτης, K8s, KooBehrNayTees, “Pilot”)

Next I am going to set up Kubernetes to have a cluster of just one server node (the Rocky 8.4 instance), wrap a Kubernetes pod around a single Docker container running an instance of the mytc image, then tell Kubernetes to run multiple pods, exposing the cluster via port 8080. I should then be able to connect to port 8080 and see an arbitrary “Hello, I am X” response from one of the running Tomcats each time I make the request.
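
Once the cluster is answering, a simple tally makes it easy to see how the responses are spread across the instances. This is a sketch under assumptions: the function name tally_hello and the FETCH override variable are made up for illustration, and the URL in the usage line assumes the cluster is reachable on local port 8080 as planned above.

```shell
# Request the URL a number of times (default 10) and count how often each
# distinct response was seen. FETCH can be set to replace the default
# "wget -qO-", which is handy for a dry run without a server.
tally_hello() {
  url="$1"
  count="${2:-10}"
  i=0
  while [ "$i" -lt "$count" ]; do
    ${FETCH:-wget -qO-} "$url"
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, to make twenty requests:

tally_hello http://localhost:8080/ 20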

Before I can do any of that, I need to install Kubernetes. I could install minikube instead of the full Kubernetes but where’s the fun in that?

Preparing K8s for Rocky

Kubernetes will be installed on the single Rocky 8.4 server, so this will be a one-node cluster, a single control plane with no worker nodes other than itself. Obviously at least two more worker nodes would be nice, but this is just “Hello World”.

Retain SELinux

Docker is already installed. Using dnf you can confirm that container-selinux is installed as well. Some K8s docs suggest disabling SELinux, but for security reasons it’s better to ensure that containers can work with it. As an active SELinux can sometimes cause unexpected failures, especially when some installers use mv instead of cp (thus carrying the context of the moved file instead of setting a fresh dir-inherited one on the copy), I find it useful to have a separate console watching for failures:

tail -f /var/log/audit/audit.log | perl -pe 'BEGIN { $| = 1 } s/(\d{9,}\.\d\d\d)/localtime($1)/e' | grep -e '\(failed\|denied\)'   # $| = 1 stops perl buffering its output in the pipe

Omit the grep if you want to watch all the fireworks.

Network ports

Kubernetes uses several TCP ports, namely 2379, 2380, 6443, 10250, 10251, 10252 and 30000 upwards (the NodePort range). If there are multiple servers acting as nodes in the cluster, make sure the firewall allows these as needed, that your iptables setup can see bridged network traffic and that each node has a unique MAC address and product_uuid (used to identify individual nodes). The recipe for this is:

firewall-cmd --zone=public --permanent --add-port={6443,2379,2380,10250,10251,10252}/tcp
firewall-cmd --zone=public --permanent --add-rich-rule 'rule family=ipv4 source address=172.17.0.0/16 accept'
firewall-cmd --reload

On Rocky, bridged packets traverse the iptables rules and thus containers in the same host can communicate with each other. You can confirm the settings are in place using:

sysctl -a | grep bridge-nf-call

However, if they are not in place then you can easily force them via:

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
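
The same two settings can also be read straight from /proc, which has the advantage of telling you when they are missing altogether. In this sketch the function name check_bridge_nf is hypothetical. The entries under /proc/sys/net/bridge only exist while the br_netfilter kernel module is loaded.

```shell
# Report the two bridge netfilter settings, or say so if they are absent.
check_bridge_nf() {
  for key in bridge-nf-call-iptables bridge-nf-call-ip6tables; do
    f="/proc/sys/net/bridge/$key"
    if [ -r "$f" ]; then
      echo "$key = $(cat "$f")"
    else
      echo "$key: not available (is the br_netfilter module loaded?)"
    fi
  done
}
check_bridge_nf
```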

Disable swap space

Initially I wasn’t going to worry about this. If this were a production exercise then sure, I’d have server instances with much more RAM and I’d forego the swap space. That way the production images would not be impacted by having their underlying memory swapped out by the host, nor would the Kube scheduler be fooled by misleading resource availability, when a node with an apparent abundance of memory might actually be thrashing like crazy.

This is just Hello World so I figured it would be OK to try using the swap space, especially as now the bleeding edge (as of three weeks ago, since v1.22 in fact) Kubernetes has started to dabble in swap support.

It’s an Alpha enhancement so while the capability is in there somewhere, the documentation is completely inadequate. There are too many moving parts to be tamed just to get this experimental option to work. I looked into the NodeSwap feature gate, and the failSwapOn (--fail-swap-on) option, and the --ignore-preflight-errors option, using KUBELET_KUBEADM_ARGS and more. Despite all the magical incantations and several from-scratch VMs, it never worked for me. So in the end, I gave up and did this:

swapoff -a

Then I commented out the “swap” line in /etc/fstab and then:

systemctl daemon-reload
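
The /etc/fstab edit can be scripted too. The following is a sketch, not a battle-tested tool: the function name comment_out_swap is made up for illustration, and it assumes the usual whitespace-separated fstab layout in which the third field is the filesystem type.

```shell
# Comment out every active fstab entry whose third field is "swap".
# A copy of the original file is kept alongside with a .bak suffix.
comment_out_swap() {
  fstab="${1:-/etc/fstab}"
  # Skip lines that are already comments; prefix matching lines with '#'.
  sed -i.bak -E '/^[[:space:]]*#/!s/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Try it on a copy of the file first. After the real run, swapon --show should print nothing, confirming that no swap is active.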

Bye-bye swap. This is the setup that Kubernetes users are most familiar with. Not supporting swap was an understandable initial design decision, but in time the community has come to understand that there are valid cases where supporting swap would be highly advantageous (e.g. images whose init burst is a memory hog but whose steady-state is miserly). Unix has been handling swap exceptionally well for decades, and SysAdmins know a thing or two about it, so it was inevitable that Kubernetes would support it. Nevertheless, the current Alpha is just that: Alpha, and I’m going to wait until Beta before checking it out again.

Installing K8s on Rocky

The installation repositories known to Rocky do not include Kubernetes, so you will have to manually add the repository to the package manager by creating a file named /etc/yum.repos.d/kubernetes.repo as follows:

cat <<EOT > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOT

Make that repo available for installations:

dnf -y upgrade

Install Kubernetes:

dnf -y install kubeadm kubelet kubectl --disableexcludes=kubernetes

In my case, this installed v1.22 of Kubernetes, the one with the alpha-grade support for swap, which sadly I’m going to ignore.

Now for some more SELinux-fu:

mkdir -p /var/lib/etcd
chcon -Rt svirt_sandbox_file_t /var/lib/etcd
mkdir -p /etc/kubernetes/pki/etcd
chcon -Rt svirt_sandbox_file_t /etc/kubernetes/pki
mkdir -p /etc/cni/net.d
chcon -Rt svirt_sandbox_file_t /etc/cni/net.d

In theory that should be enough preparation for Kubernetes to operate in an SELinux environment.

K8s init

Make sure kubelet starts on boot, and then initialize Kubernetes using kubeadm init, specifying the private network address space that will be used for the cluster (as hard-coded by the Flannel network that will be installed subsequently):

systemctl enable kubelet
kubeadm init --pod-network-cidr=10.244.0.0/16

It takes a minute or two to complete. The kubelet has already been enabled to run on boot, so all that remains is to make sure it is started now and ready for use:

systemctl start kubelet

You can check to see that the Kubernetes API server is running:

docker ps | grep kube-apiserver

This is a one-node cluster to run a simple test, so it’s important to override Kubernetes’ default behaviour that prevents running containers on the Master node. Before using kubectl for this I will need a context, so I’m going to reference the master admin.conf:

export KUBECONFIG=/etc/kubernetes/admin.conf

Note: non-root users would put a copy of the admin.conf into their home directory as follows:

cp /etc/kubernetes/admin.conf $HOME/
chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

Make sure this is set up each time the user (root) logs in:

echo "export KUBECONFIG=$KUBECONFIG" >> $HOME/.bash_profile
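Running that echo more than once appends a duplicate line each time. A small sketch of an idempotent variant, using a helper name (append_once) that is purely my own:

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    # -x matches whole lines only, -F disables regex interpretation.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

The equivalent of the command above would be: append_once "export KUBECONFIG=$KUBECONFIG" "$HOME/.bash_profile"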

Removing the node-role.kubernetes.io/master taint will allow the scheduler to deploy containers to the master node, which is the only node in this small demo. Once de-tainted, confirm that the server is now a Kubernetes node with the dual role of “master” and “control-plane”:

kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get nodes
NAME            STATUS   ROLES                AGE  VERSION
r84.example.com NotReady control-plane,master 114m v1.22.1

Flannel

Flannel is an inter-container pan-cluster layer 3 comms solution. It is intended for a multi-node K8s setup, but it works fine on a single “untainted” master node. By default, the YAML for Flannel will result in a virtual network for the pods using 10.244.*.* IP addresses. Installation is simple:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Check the nodes again after 30 seconds, repeating if necessary, and (eventually) the status is reported as “Ready”:

kubectl get nodes
NAME            STATUS   ROLES                AGE  VERSION
r84.example.com Ready    control-plane,master 122m v1.22.1
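The check-and-repeat step can be scripted rather than done by hand. The sketch below is illustrative only: all_nodes_ready, retry_until and nodes_ready_now are hypothetical helper names, and the parsing assumes the column layout shown above (STATUS in the second column).

```shell
#!/usr/bin/env bash
# Reads "kubectl get nodes" output on stdin; succeeds only if there is at
# least one node and every node reports STATUS (column 2) as exactly "Ready".
all_nodes_ready() {
    awk 'NR > 1 { total++; if ($2 == "Ready") ready++ }
         END { exit !(total > 0 && ready == total) }'
}

# Runs a command up to <attempts> times, sleeping <delay> seconds between
# tries, and succeeds as soon as the command does.
retry_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # No need to sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Glue for this demo: ask kubectl, then judge its output.
nodes_ready_now() {
    kubectl get nodes | all_nodes_ready
}
```

With those in place, "retry_until 10 30 nodes_ready_now" polls every 30 seconds for up to ten attempts. kubectl also has a built-in for this kind of thing, "kubectl wait --for=condition=Ready node --all", which is probably the better choice outside of a demo.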

Deployment YAML

I’m now going to create a deployment of two K8s pods, each pod having a single “mytc” Docker container, which will be represented in a YAML file named /app/cluster.yaml. Note that the pull policy is “if not present”, overriding the default that would have K8s go looking elsewhere for mytc (and failing).

cat <<EOT > /app/cluster.yaml
# Pods with Hello World in each
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hw-deploy
  labels:
    app: hw
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hw
  template:
    metadata:
      labels:
        app: hw
    spec:
      containers:
        - name: hw-contain
          image: mytc
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 8080
          resources:
            limits:
              memory: "256Mi"
              cpu: "500m"
EOT

Now launch the cluster:

kubectl apply -f /app/cluster.yaml
deployment.apps/hw-deploy created

To watch things evolve, look at the status of the pods:

kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
hw-deploy-5dc46768bb-nkw5j   0/1     ContainerCreating   0          14s
hw-deploy-5dc46768bb-zstlz   0/1     ContainerCreating   0          14s

And seconds later the “get pods” should produce something like this:

NAME                         READY   STATUS    RESTARTS   AGE
hw-deploy-5dc46768bb-nkw5j   1/1     Running   0          50s
hw-deploy-5dc46768bb-zstlz   1/1     Running   0          50s

There are now two pods available in hw-deploy:

kubectl get deploy
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hw-deploy   2/2     2            2           3m48s

To see the Flannel IP addresses assigned to these pods:

kubectl get pod -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
hw-deploy-5dc46768bb-nkw5j   1/1     Running   0          13m   10.244.0.8   r84.example.com   <none>           <none>
hw-deploy-5dc46768bb-zstlz   1/1     Running   0          13m   10.244.0.9   r84.example.com   <none>           <none>

This means that one Tomcat is listening on 10.244.0.8:8080 and the other on 10.244.0.9:8080. While these are IPs within the virtual Flannel network, intended to keep the pod network separate from the “real” network outside, it is still possible to interact with those IPs from the host:

wget -qO- http://10.244.0.8:8080/
Hello, I am hw-deploy-5dc46768bb-nkw5j
wget -qO- http://10.244.0.9:8080/
Hello, I am hw-deploy-5dc46768bb-zstlz

Note that these take a few seconds the first time you call them, because Tomcat is compiling the index.jsp upon first use. Subsequent calls to Tomcat will be using the already-compiled code and the response will be instantaneous.
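As a sanity check, those pod addresses should fall inside the 10.244.0.0/16 range that was handed to kubeadm init. The arithmetic can be sketched in plain shell; ip_to_int and ip_in_cidr are hypothetical helper names, and the code assumes well-formed dotted-quad IPv4 input with no leading zeros in any octet.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    # Split the argument on dots into the positional parameters $1..$4.
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds when the address (arg 1) lies inside the CIDR block (arg 2).
ip_in_cidr() {
    local ip="$1" network="${2%/*}" bits="${2#*/}"
    # Build the netmask: the top <bits> bits set, the rest clear.
    local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$network") & mask )) ]
}
```

For example, "ip_in_cidr 10.244.0.8 10.244.0.0/16 && echo inside" should print "inside" for either of the pods above.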

One could, at this point, set up an instance of nginx to load balance across those two IPs, but there is a problem: those IPs are not stable. Pods can be added and deleted by Kubernetes as needed, such as automatically scaling the size of a set of replica pods in response to changing workloads. Any viable load balancing solution has to be aware of the ephemeral nature of the number of pods and their IP addresses.

NodePort load balancing

There are lots of ways of solving this challenge, and there are some already built-in. You can instruct Kubernetes to expose a Service using NodePort that is load-balanced across similarly labelled pods:

kubectl expose deployment hw-deploy --type=NodePort --name=hw
service/hw exposed
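For those who prefer to keep everything in YAML, roughly the same Service can be declared in a file and applied instead of using expose. This is a sketch of what I would expect the equivalent to look like, not a dump of what kubectl expose actually generated; the helper name write_hw_service and the file path in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Write a NodePort Service manifest for the hw-deploy pods to the given file.
write_hw_service() {
    local out="$1"
    cat <<'EOT' > "$out"
# Load-balanced entry point for the Hello World pods
apiVersion: v1
kind: Service
metadata:
  name: hw
spec:
  type: NodePort
  selector:
    app: hw            # matches the pod label set in the deployment template
  ports:
    - port: 8080       # port on the Service's own IP
      targetPort: 8080 # containerPort of the Tomcat container
EOT
}
```

Usage would be "write_hw_service /app/service.yaml" followed by "kubectl apply -f /app/service.yaml", as an alternative to the expose command rather than in addition to it.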

Now let’s see what IP address and port this new service is exposing:

kubectl describe services hw | grep "\\b\(IP\|Port\):"
IP:             10.106.150.68
Port:           <unset>  8080/TCP

Knowing the Service’s IP and port, it is now possible to query the cluster of Tomcats from the host as follows:

wget -qO- http://10.106.150.68:8080/

Sometimes this produces:

Hello, I am hw-deploy-5dc46768bb-nkw5j

And sometimes it produces:

Hello, I am hw-deploy-5dc46768bb-zstlz

This shows that the Service is balancing the request traffic across the pods.
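To put rough numbers on that, the request can be repeated and the replies counted per pod. A sketch, with tally_responses as a made-up helper; the command that fetches a reply is passed in as arguments, so the function itself makes no assumption about the Service address.

```shell
#!/usr/bin/env bash
# Run a command <count> times and print how often each distinct output line
# occurred, as "<occurrences> <line>", sorted by the line text.
tally_responses() {
    local count="$1"
    shift
    local i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done | awk '{ seen[$0]++ } END { for (line in seen) print seen[line], line }' | sort -k2
}
```

For the Service above that would be "tally_responses 20 wget -qO- http://10.106.150.68:8080/". Don't expect a perfect 10/10 split: as far as I know the default kube-proxy mode picks a pod at random rather than strictly taking turns.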

Cleaning up

To clean up, I will first tell Kubernetes to scale the size of the cluster down to zero:

kubectl scale --replicas=0 deployment/hw-deploy

Then I check that the pods are gone (i.e. there are none ready or available):

kubectl get deploy
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hw-deploy   0/0     0            0           2h

Finally I delete the deployment:

kubectl delete deploy hw-deploy -n default
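One leftover from the steps above: the Service named hw that was created with kubectl expose is still defined. Assuming it should go too, the whole teardown could be wrapped up as below. cleanup_demo and the KUBECTL override are my own additions, the latter so that the sequence can be previewed with echo before running it for real.

```shell
#!/usr/bin/env bash
# Tear down the Hello World demo: drain the pods, then remove the deployment
# and the Service. Set KUBECTL=echo to print the arguments instead of
# running kubectl.
cleanup_demo() {
    local kubectl="${KUBECTL:-kubectl}"
    "$kubectl" scale --replicas=0 deployment/hw-deploy &&
    "$kubectl" delete deployment hw-deploy -n default &&
    "$kubectl" delete service hw -n default
}
```

Running "KUBECTL=echo cleanup_demo" prints the three sets of arguments without touching the cluster; plain "cleanup_demo" does the real thing.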

At this point, if I want, I can shut down the server:

shutdown

Glitch

The other day I noticed my laptop’s CPU widget was showing near 100% usage, but Task Manager was showing only 3%, and applications were crawling. I did the usual incantations to no avail. Safe Mode with everything off seemed OK, and a solid two days of virus scanning revealed nothing (as I would hope, given the safeguards in place). Driver updates led me eventually to a BIOS update. No change. SysInternals revealed nothing. Then Windows decided it was time to do its own updates. More delay.

Everything was running slow, including networking, which seemed to be a problem even when in Safe Mode. Connecting a second laptop into the gigabit switch revealed the same network bottleneck. “Aha!”, I thought, “the switch is kaput.” Bypassing the switch and attaching its uplink cable directly into either laptop didn’t help. “Problem with another gigabit switch further upstream,” I thought.

Bypassing the switch in the attic didn’t help either. Now I’m heading to the main router, reset, shuffling of cable connections, no improvement. Then I attach my problematic laptop to a separate line that is running directly down to the router. Joy!

Interestingly, not only did the networking go back to normal, so did the odd CPU readings.

The evidence was now pointing to a problem in the cabling from the downstream switch all the way to the main router. The worst case scenario was that the cable would have to be replaced, and most of it is behind walls that were put in when the house was massively reworked over a decade ago. I was not looking forward to that, and was envisaging needing some surface-mounted ducting as the existing cable is inaccessible.

However, there are other places that cables can fail, and these places are accessible. The ends of the cables, for example. I checked the cables at the main router, then at the first downstream switch and at the second switch closest to the laptop. No problems were found.

Then I remembered that the uplink cable of the second switch goes to a wall junction, buried behind a desk. On my hands and knees, with a torch held in my teeth, I got to the junction and tried to extract the connector. This thing had been in place, untouched, since the builders were here and it was not going to move easily. Some space would have to be cleared so I could access it more easily. Climbing out from under the desk, I briefly rechecked the laptop and noticed the network had somewhat improved. No longer at 10% of its usual capacity, it was now at about 30%.

At this point I was thinking that either it’s the wall connector, meaning the desk would have to come out, and some work done on the wall to replace the connector, or maybe the short uplink cable between the wall and the switch. I hoped for the latter, grabbed a fresh cable from my supplies, clamped the torch in my teeth and went back to under-desk cave snorkelling.

One more tug on the old cable and the plastic lug snapped as the cable was withdrawn, turning the plastic into a blade-like weapon. My first thought was that I’d have to crimp a new connector onto the cable, but with a high-grade new cable in my other hand I decided the old one could be retired, forever. I clicked the new cable into place, and went back to the laptop.

100% network throughput.

Changing the cable took just a minute. Discovering that the cable was causing my odd CPU behaviour took THREE DAYS, a bruised shoulder (from crawling in the attic), a sore head (from hitting it while snorkelling), and a cut finger (blade-like plastic lug). All good reasons why we pay other people to do tech support.

Food, glorious food

I have not been inside a restaurant for a million years. Well, not exactly true, but it feels like that. The risk of contagion in confined spaces is too great to allow people to dine indoors so, where feasible, businesses have been providing outdoor services. That’s OK when the sun is shining (and unusually for Ireland it has been sunny recently) but outdoor dining is not strong in the culture of a people whose primary topic of conversation is the imminent rain.

However, the problems with indoor dining when it comes to Covid are not due to the rain-resistant roof, but more to do with the proximity of other diners and the reduced flow of air to take contaminants out of harm’s way.

Nevertheless, it looks like the government’s strategy with regards to the reopening of indoor dining is to base it on the vaccination (immunity) status of the diners. I don’t hear any talk about restaurant capacity, airflow, etc. The medical status of the diners is a factor, of course, and limiting your clientele to those who are at less risk makes sense. But there should also be guidance regarding the environment into which these lucky people will be placed.

Something simple would work, like the maximum number of people per square meter (I’d guess 0.25), no table to have more than 8 people, and no table to be more than N* meters from an open window/door with noticeable airflow. These would be based on sound medical/scientific principles, and yet simple enough for the average person to comprehend. More importantly, simple enough to be calculated and applied by the typical restaurateur.

We shall see how things pan out over the coming days.

* I have no idea what N should be.

Eggs and baskets

About a week ago a large chunk of the Web vanished for a few hours as Fastly experienced a major outage in their Web cache service. Popular sites like Reddit, the BBC, Amazon and much of the UK government online services suddenly presented blank pages. There was much finger-pointing for days afterwards, then Cloudflare went down last Friday (more fingers pointing), and today, with many of those fingers finally holstered, we suddenly find ourselves in the middle of an Akamai outage. Fingers out and reloading!

The thing about these services is that they are mainly caches: intermediary services that optimise the delivery of content from the origin sites. Unfortunately, when they go down, the origin sites do not just go back to some kind of sub-optimal delivery, they go back to nothing. Caches and content delivery networks are not just a means of optimising delivery, they have become the actual means of delivery, and so many major sites on the Web are totally dependent on them.

Expect more baskets to tumble and eggs to break.