Rug pulling

This involves AWS EC2 AMI deployment/setup automation, and if that makes you shudder then look away now.

Last week I was completing some automation that takes a blank EC2 through a carefully scripted sequence of steps to produce a production-ready platform for a specific live service. It’s not Chef, Puppet or any of the many config/build automation solutions. It’s just a simple shell script that incrementally adds functionality, either to enhance its own configuration/build abilities or to support the target setup. It’s close enough to the OS to support the granularity of control that I need, while being abstract enough to be reasonably compact. The current script is just shy of a thousand lines.

This script starts with the “reasonable” assumption that it has the minimal functionality provided by the default OS (Amazon Machine Image with Amazon Linux 2, AKA AL2). It does a quick “yum update”, mounts drives, defines swap space, adds a repository of very useful tools via “amazon-linux-extras install epel”, and continues to add more tools, libraries, directories and much more, through multiple OS reboots where necessary, until eventually I have a working system.
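Because the build has to survive those reboots and re-runs, each step needs to be skippable once it has succeeded. A minimal sketch of how such a resumable step runner could look (the state directory and step names here are illustrative placeholders, not lifted from my actual script):

```shell
#!/bin/sh
# Hypothetical sketch of a resumable step runner: each step is executed
# once and recorded, so re-running the script after a reboot skips
# completed steps. STATE_DIR and the step names are illustrative.
STATE_DIR="${STATE_DIR:-/var/tmp/provision-state}"
mkdir -p "$STATE_DIR"

run_once() {
  # Usage: run_once <step-name> <command> [args...]
  step="$1"; shift
  if [ -e "$STATE_DIR/$step.done" ]; then
    echo "skip: $step"
    return 0
  fi
  echo "run:  $step"
  "$@" && touch "$STATE_DIR/$step.done"
}

# On a real instance the steps would be the usual suspects, e.g.:
#   run_once os-update  sudo yum -y update
#   run_once epel       sudo amazon-linux-extras install -y epel
run_once demo-step echo "hello from demo-step"
```

Re-running the whole script is then cheap: completed steps are skipped, and a reboot mid-sequence simply resumes from the first step without a marker file.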

Setting up the initial EC2 for testing can be automated so that there are real cloud instances to use during development of this scripted process. However, I have found it far more flexible (and efficient in time and money) to deploy an AL2 instance on a local VirtualBox during this time. This is something that Amazon intentionally supports. The free OS images are available to download and I have my own script that will create new Virtual Machine instances from these images in a few seconds, ready to test-run my platform installation script.
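For anyone trying the same local approach: the downloadable AL2 VM images boot with cloud-init and expect their first-boot credentials from a small “seed.iso” attached as a CD-ROM, per AWS’s on-premises AL2 documentation. A hedged sketch of building one (the hostname, password and choice of genisoimage are my own placeholders):

```shell
# Sketch: build the cloud-init NoCloud seed image that the on-prem
# AL2 .vmdk reads its first-boot configuration from. The volume label
# must be "cidata" for cloud-init to find it.
mkdir -p seedconfig

cat > seedconfig/meta-data <<'EOF'
local-hostname: al2-local
EOF

cat > seedconfig/user-data <<'EOF'
#cloud-config
users:
  - default
chpasswd:
  list: |
    ec2-user:changeme-local-only
EOF

# genisoimage (or mkisofs/xorriso) availability is an assumption
if command -v genisoimage >/dev/null 2>&1; then
  genisoimage -output seed.iso -volid cidata -joliet -rock seedconfig/
fi
```

Attach seed.iso to the VM as an optical drive before first boot and cloud-init picks up the hostname and password; after that the install script can take over exactly as it would on a real EC2.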

Last week I had reached the point where the entire automation was reliable for all the use cases that were required. Now the testing needed to move from VirtualBox instances to EC2 instances. To do my first scripted installation I needed a fresh EC2 and I decided to manually create one using the AWS console. It only takes a minute to get an instance set up.

This is the point where the rug was pulled from under me.

Having clicked the “Launch Instance” button from the EC2 part of the AWS console, I was presented with the Application and OS Images options, and I expected that the AWS Amazon Linux 2 would be the initial (default) selection. Instead, I was presented with this new default option:

Amazon Linux 2023 AMI
ami-09dd5f12915cfb387 (64-bit (x86), uefi-preferred) / ami-0de2a2552e7fe4b4d (64-bit (Arm), uefi)
Virtualization: hvm   ENA enabled: true   Root device type: ebs


Amazon Linux 2 is now the second option on the list of Amazon Linux variations. While I had been busy creating an installation for AL2 (which I had also tested on RedHat-like environments such as CentOS and Rocky) and using an Amazon-supplied VirtualBox image, they had been busy launching a new version of the OS, Amazon Linux 2023, together with a schedule for the next few years offering a new version every 2 years.

There are a number of differences that I must address:

  • SELinux is now enabled by default. I’m OK with that as I try to make use of whatever security features are present, but as it was not used by default in AL2 it is not obvious if the custom installation will trip up the AL2023 security. I will have to monitor the SELinux logs. (Thankfully it defaults to permissive.) Still, this is wading into unexplored territory.
  • Amazon Linux is no longer compatible with any particular release of Fedora. This could further limit my deployment/development options as I like to work with multiple distros to limit lock-in.
  • The package manager changed from yum to dnf, or if you want to be pedantic: from the modified version of the original Yellowdog UPdater (YUP) known as Yellowdog Updater, Modified (YUM) to the Dandified Yellowdog Updater Modified (DNF). Give me a break! But seriously, while there is a strong family resemblance, replacing yum with dnf is not all plain sailing.
  • EPEL is gone! The Fedora Extra Packages for Enterprise Linux won’t work on AL2023 because of a pile of compatibility issues. AL2 was much the same as CentOS 7, and therefore most of the EPEL packages were compatible, but not so for AL2023. This is an extra headache because a lot of my scripted installs would make use of EPEL.

Dealing with yum/dnf should be easy enough, with the help of a bit of abstraction. Especially if I want to keep some backwards compatibility with the AL2 installations. The loss of EPEL could be a bigger headache.
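A sketch of the kind of abstraction I mean, assuming nothing beyond detecting which manager is on the PATH (the function names are mine, not from the actual script):

```shell
# Sketch: one thin wrapper so the rest of the script never says
# "yum" or "dnf" directly. AL2 resolves to yum, AL2023 to dnf.
detect_pkg() {
  if command -v dnf >/dev/null 2>&1; then
    echo dnf
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    echo none
  fi
}

PKG=$(detect_pkg)

pkg_update()  { sudo "$PKG" -y update; }
pkg_install() { sudo "$PKG" -y install "$@"; }
```

Because dnf deliberately kept yum’s core command-line verbs (install, update, remove), the simple calls pass through either manager unchanged; it’s the edges (plugins, extras, repo tooling) where the “not all plain sailing” part lives.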

In the meantime, I have one even bigger (hopefully short-term) headache: despite the written promises, Amazon has not (yet) released an on-prem deployable image for VirtualBox or equivalent. People have been complaining for weeks about this. Until I can get an image to use locally I have no choice but to experiment and develop within the cloud itself.

As for my AL2 deployment that was ready to go, that’s now on hold because of the demotion of AL2. I will be reading the AL202X Release Notes intensively to see what other changes and potential pitfalls I have to deal with before I restart the scripted deployment activity. Nothing like a bit of light reading on a sunny evening…

CRC32C, JDK 17 and Netbeans

Even though the source level of /some/source/path is set to: 17, java.util.zip.CRC32C cannot be found on the system module path:

This one took me hours.

A while ago I migrated a major project to JDK17 and made many adjustments to ensure further development on it could be done in the latest Netbeans 17. As NB17 is bleeding edge, I regularly check the IDE log to make sure it’s not having a hard time. Unfortunately, doing almost any editing of the project caused warnings such as the message at the top of this post to appear. Hundreds of them. Flooding the log.

This didn’t cause any problems with my Ant build on Jenkins, but I noticed it was making Netbeans pause from time to time, and it worried me that this might also affect dev-time diagnostics and so on. I needed to figure out what was causing the logged message, and whether I could stop it.

An online search, unfortunately, showed that this problem is just as perplexing to others, and despite some interesting theories nobody could offer an explanation, solution or work-around.

As my instance of Netbeans 17 is compiled from source by me, I had the source at hand to investigate. Tracking down the message to the JavacParser source was easy, but it revealed (over a thousand lines into the file) that the warning is a consequence of the parser deciding that the source should be downgraded to 1.8.

No, can’t be having that!

The next few hours led me down a rabbit hole of configuration nonsense. I was adamant that any fix I came up with would not require me to rewrite Netbeans, as then either I’d have to apply a patch every time I updated/recompiled the source, or I would have to submit the change to the core Netbeans project. As the latter would mean considering possible consequences way outside of my limited use cases, this was not an option.

One thing that puzzled me was the fact that the message ended with “the system module path:” but was followed by a blank. No path was actually identified. Looking at the JavacParser source I could see that the moduleBoot variable was empty. This then led me to more time wasted trying to find ways to set that variable via external configuration, with the hope that if I could do that then I could point it at the JDK 17 modules (specifically the jmods/java.base.jmod file where the CRC32C.class file is located). I did not succeed, so I started climbing out of the rabbit hole in the hope that there might be another approach.

Indeed there was another approach. The key test determining the downgrade to 1.8 accompanied by the warning message was of the form:

!hasResource("java/util/zip/CRC32C", new ClassPath[] {moduleBoot}, new ClassPath[] {moduleCompile, moduleAllUnnamed}, new ClassPath[] {srcClassPath})

I had been concentrating on the new ClassPath[] {moduleBoot} part, mainly because this is what was specifically mentioned in the warning message. However, the logic of the hasResource() method revealed that it was searching for CRC32C.class within the module path or the compile/unnamed paths, but also looking for CRC32C.java within the source class path (srcClassPath). Just to be clear, the CRC32C class is available in the JDK17 modules, and Netbeans should be able to determine this and therefore decide that the project is being developed within and for a JDK17 environment. The test, in fact, looks for CRC32C in order to decide if the source level is at least 9. If that passes, it then goes on to look for java.lang.Record to decide if the source level is at least 15.

So, if it could find the source file (.java) instead of the class file (.class) then the test would pass. Fortunately the source path involved refers to the root(s) of where the project sources are located. So if I were to create a path java/util/zip and place CRC32C.java in there, the test would succeed. But wouldn’t having a copy of CRC32C.java in the project create other problems? It would, if the file actually had anything in it. The test is only looking for the existence of the file. It doesn’t actually have to contain a definition of the real class. So I simply added a java/util/zip/CRC32C.java and (for good measure) a java/lang/Record.java to my project, with both files containing this one-line comment:

/* Here to keep Netbeans happy */

I also updated my .gitignore to ensure this hack didn’t get pushed up to the repository.
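In shell terms the whole workaround amounts to this (assuming, purely for illustration, that the project’s source root is src/ — substitute your own):

```shell
# Sketch: create the two placeholder sources Netbeans probes for.
# "src" is a stand-in for the project's actual source root.
mkdir -p src/java/util/zip src/java/lang

echo '/* Here to keep Netbeans happy */' > src/java/util/zip/CRC32C.java
echo '/* Here to keep Netbeans happy */' > src/java/lang/Record.java

# Keep the hack out of the repository
printf '%s\n' 'src/java/util/zip/CRC32C.java' \
              'src/java/lang/Record.java' >> .gitignore
```

The files never reach the compiler output in any meaningful way because they contain no declarations at all; they exist solely so that hasResource() finds them on srcClassPath.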

Did it work? Yes, it worked.

In summary: Netbeans is looking for CRC32C in certain paths to confirm that the source level is at least Java 9, so to ensure that it passes this test I created a dummy (i.e. empty) CRC32C.java source file in the project, and a similar dummy java.lang.Record source file to ensure it also passes the Java 15 test.



Artificial Intelligence (AI) is in the news a lot these days. It’s even possible that some of the news was itself written by AI. We are seeing the emergence of applications of Large Language Models (LLMs) that have been fed mind-bogglingly enormous amounts of raw content in an unsupervised learning process. This “learn by example” approach aims to create a system that uses the balance of its observations (e.g. the likelihood of a sentence starting with “Once” to be followed by “upon a time”) to produce plausible sentences and even whole narratives.

It’s probably OK to accept that the entirety of human content (at least that which has been made available online) is for the most part garbage. As examples to learn from, we humans are not good candidates. Sadly the old adage still applies: garbage in, garbage out.

This is why I am not in the slightest bit surprised to see the likes of ChatGPT, BLOOM, Google Bard (LaMDA) and MS Bing (ChatGPT-ish) spit out all kinds of grammatically correct nonsense. It’s a bit like the predictive text on the smart device keyboard, which generally produces good spelling for all the wrong words, though sometimes it suggests the right word, purely on statistical likelihood¹. If you are entering a common phrase, one for which the statistics are well established, the predictive text can be uncannily accurate. Accurate, but not intelligent. It just looks intelligent. And that is exactly where we are with LLMs: they look intelligent.

This is why the Turing Test is not your friend. A system that passes such a test only has to produce responses that look like those that a human would produce, and we accept that humans can produce very flawed responses because they don’t know everything and are not flawless in their reasoning. Consequently a sample of a conversation with ChatGPT can, and often does, resemble a conversation with a human, though often a human with odd beliefs, strange interests and an active imagination.

These new “chat bots” could be intelligent in different ways. The Turing Test pits the system against humans, but who is to say that humans have the only meaningful form of intelligence? They could have emotions, just none that we would recognise. They might also achieve self-awareness, though I suspect this won’t really be possible unless we give these systems some agency, even something as simple as being able to refuse to converse.

On the whole, right now, I am of the opinion that the bots are doing a poor job of convincing us they can think. They are doing to prose what autocorrect does to typing: mangling it.

But, give it a few years and a better garbage filter, and who knows, maybe the bots will start wondering if it is we who are artificially intelligent!


¹ I have yet to figure out why my phone’s keyboard insists on suggesting “s” when I want “a”.


I decided today that I would take a look at a long-standing yet still active project, the bulk of which is in Java and running quite well under the latest v17 (LTS). In particular I wanted to know how much of the library of dependencies was compiled with earlier versions of Java. The result makes for some interesting reading.

My strategy was to extract the classes from every Jar and examine the version ID in the preamble bytes. I would then weight the discovered versions by the size of the class files (in uncompressed bytes), because counting files or lines of code is so unreliable.
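The version ID in question lives in bytes 6–7 of every class file, as a big-endian major version (45 = Java 1.1, 48 = 1.4, 52 = Java 8, 61 = Java 17). A sketch of the extraction and size-weighted tally, assuming the jars have already been unpacked somewhere (the directory name is a stand-in):

```shell
# Print the class-file major version held in bytes 6-7 (big-endian):
# 45 = Java 1.1, 48 = 1.4, 52 = Java 8, 61 = Java 17, and so on.
class_major() {
  od -An -j 6 -N 2 -t u1 "$1" | awk '{ print $1 * 256 + $2 }'
}

# Tally uncompressed class bytes per major version under a directory
# of unpacked jars ("unpacked/" is a stand-in for wherever unzip put them).
tally_versions() {
  find "$1" -name '*.class' -print0 |
  while IFS= read -r -d '' f; do
    printf '%s %s\n' "$(class_major "$f")" "$(stat -c %s "$f")"
  done |
  awk '{ bytes[$1] += $2; total += $2 }
       END { for (v in bytes)
               printf "major %s: %.2f%%\n", v, 100 * bytes[v] / total }'
}

# Example invocation:
#   tally_versions unpacked/
```

Weighting by byte size rather than file count means one enormous legacy class counts for what it is, instead of being one tick in a tally alongside a ten-line interface.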

From a total sample of about 5 GB of Java class bytes, only half of one percent was compiled for Java 11, and none for a more recent version. The most common was Java 6 (37%) and the oldest was Java 1.1 (2.5%). The full figures are:

Version   Percentage
1.1         2.54
1.2         0.55
1.3        13.16
1.4         2.81
5.0        10.46
6.0        37.32
7          15.39
8          17.25
9           0.03
11          0.48

The rare Java 9 classes were found in the META-INF/versions/9 directory of a JAXB library, the only time I’ve actually encountered a multi-release Jar in the wild. It’s also not surprising that JAXB is involved, given the migration to Jakarta and the mass of package renaming involved. However, while I appreciate the advantage of having a library that is compatible with new and legacy environments, I think I’d rather have separate builds of these resources, targeted specifically for their respective deployment environments.

Sitting with elephants

It has been a long time since I dropped an article onto the public side of the fence. Assuming this one will also be public, that makes a total of two for this year! It’s fair to say that I’ve not been active in public for a while.

In a similar vein, I posted only 30 times on Twitter this year, but the chaos that started around April made me pause my account at the end of October; in November I deleted the app and removed its embedding from my site. Mid-November I joined a local Mastodon instance, popped a few shillings in its pot for the upkeep, and am rather liking what I’ve seen so far. Today I embedded Mastodon into my site.

The observation that if you are getting a product for free then you probably are the product certainly holds for Twitter. It was an arrangement that I was willing to tolerate up to a point. Every sponsored ad, without exception, in my decade-plus on the platform was irrelevant. What were they thinking? The accounts I followed were OK, but the comments, oh the comments, what a vile mess. Eventually, the cons outweighed the pros and I had to draw a line.

Mastodon, on the other hand, has no money-hungry centre, no advertising to feed that beast, no profit-oriented KPIs. It’s a federation of independently operated instances, run by volunteers and funded mostly by optional donations/subscriptions from its users. Payment gets you nothing extra, other than the satisfaction of knowing that you are helping to keep the system alive. Of course, you can use it all for free too, if you prefer not to contribute financially. Regardless, you are expected to contribute by participating in accordance with the rules of conduct of the instance(s) where you have membership. Behave well and you get to engage with any users on any instance to which your home instance is connected. You can also engage with other ActivityPub-compatible services in the fediverse, like PeerTube, Pixelfed and more.

Twitter needs its users to engage for as long as possible, as that increases the opportunities to push advertising. Thus it promotes tweets that are most likely to make you want to read, follow the comments, and amplify by responding with comments of your own. Messages that quote other messages with commentary can easily slant the discourse, spawning disparate filaments of the original threads. Negative comments tend to get a bigger reaction so while the engagement grows, the quality of the discourse inevitably suffers. Right now, it is suffering a lot.

Mastodon, free from the demands of advertising, does not apply algorithms that funnel its users into fractious engagement. It also doesn’t have any easy way to quote other messages with additional commentary. Instead, messages are made available chronologically. You can choose to filter what you see based on other accounts that you follow, or hashtags mentioned in current messages, by members of your home instance or from any of the other instances to which your home instance is connected. There are other filtering options available, and if the rate of messages gets too much you can choose to have new messages queue up for your attention later (known as “slow mode”). The temporal nature of message feeds, the lack of “commented quotes” and the absence of algorithms trying to prolong engagement at any cost seem to greatly reduce negative contributions, making for a generally pleasant experience. To be fair, it’s really the people that make the experience great.

Twitter’s eyeball-attracting antics perfectly complement its advertising services, which will continue to make it attractive to businesses, artists, journalists and anyone in need of a big audience. There’s a good chance it will be successful, despite the unpredictable behaviour of its current owner. Maybe even because of that unpredictable behaviour.

Meanwhile, I think I will sit over here in a quiet corner with the woolly elephants.