AI, AI captain

Legal and Political, Security, Technology

Artificial Intelligence is appearing everywhere and it is increasingly difficult to stop it seeping into our lives. It learns and grows by observing everything we do, in our work, in our play, in our conversations, in everything we express to our communities and everything that community says to us. We are being watched. Many think it is just a natural progression from what we already created. To me, it is anything but natural.

Spellchecking: an AI precursor

Half a century ago, automatic spell-checking was introduced to word processing systems. Simple pattern matching built into the software enabled it to detect unknown words and suggest similar alternatives. By adding statistical information it could rearrange the alternatives so that the most likely correct word would be suggested first. Expand the statistics to include nearby words and the words typed to date and the accuracy of the spell-checking can become almost prescient. Nevertheless, it is all based on statistical information baked into your software.
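As a rough illustration of the idea (a minimal sketch, not how any particular word processor does it), the snippet below flags unknown words, finds similar known words by simple pattern matching, and then uses frequency counts to put the most likely correction first. The tiny frequency table and the `suggest` function are made up for the example.

```python
import difflib

# Hypothetical word-frequency table; a real checker would ship a far larger one.
WORD_FREQ = {
    "the": 5000, "they": 1200, "there": 1000,
    "then": 900, "them": 800, "than": 700,
}

def suggest(word, freq=WORD_FREQ, limit=3):
    """Return [] for a known word, else similar known words, most frequent first."""
    word = word.lower()
    if word in freq:
        return []  # known word: nothing to correct
    # Pattern matching: collect known words that look similar to the typo.
    candidates = difflib.get_close_matches(word, list(freq), n=10, cutoff=0.6)
    # Statistics: reorder the candidates so the most common word comes first.
    return sorted(candidates, key=lambda w: -freq[w])[:limit]

if __name__ == "__main__":
    print(suggest("thn"))
```

Everything the function needs is in the table that ships with it, which is the point of the paragraph above: the statistics are baked in.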

But where did those statistics come from? We know that over a thousand years ago the military cryptographers were determining word frequency in various languages as an aid to deciphering battlefield communications. Knowledge of letter, word and phrase frequencies was a key component of the effort to defeat the Enigma machine during World War II. So by the time the word processor was commonplace, the statistical basis of spellchecking was also present. It evolved from hundreds of years of analysis, and one could not in any way discern any of the original analysed text from the resulting statistics.

Grammar checking: pseudo-intelligence

In time, spellcheckers were enhanced with the ability to parse sentences and detect syntactic errors. The language models, lexical analysers, pattern matchers and everything else that goes into a grammar checker can be self-contained. The rules and procedures are generally unchanging, though one could gradually build up some adjustments to the recorded statistics based on previous text that was exposed to the system. It appears somewhat intelligent but only because there is a level of complexity involved that a human might find challenging.

Predictive text: spooky cleverness

Things started to get interesting when predictive text systems became mainstream, especially among mobile device users where text entry was cumbersome. Once again, statistics played a huge role, but over time these systems were enhanced to update themselves based on contemporary analysis. Eventually the emergence of (large) language models “trained” on massive amounts of content (much of it from the Web) enabled these tools to make seemingly mind-reading predictions of the next words you would type. Accepting the predicted text could save time, but sometimes the predictions are wildly off base, or comically distracting. Worse, however, is the risk that the more people accept the predicted text, the more we lose the unique voice of human writers.

Certain risks surface from the use of predictive text based on public and local content, notably plagiarism and loss of privacy. Unlike the simple letter/word counting of the military cryptographers of the ninth century, today’s writing assistance tools have been influenced by vast amounts of other people’s creative works beyond mere words, and their suggestions can be near copies of substantial portions of this material.

While unintended plagiarism is worrying, the potential for one’s own content to become part of an AI’s corpus of knowledge is a major concern. In the AI industry’s endless quest for more training data, every opportunity is being exhausted, whether or not the original creators agree. In many cases the content was created by people long before feeding it to an AI became a realistic possibility. The authors would never have imagined how their work could be used (abused?), and many are no longer with us to voice their opinions on it. If they were asked, that is.

And what of your local content? You might not want to feed that to some AI in the cloud so that it influences what the AI delivers to other people. Maybe it is content that you must protect. Maybe you are both morally and legally obliged to protect it. In that case, knowing that an AI is nearby you would take precautions to not expose your sensitive content to such an AI. Right?

Embedded AI: the hidden danger

What if the AI were embedded in many of the tools at your disposal? Protecting your sensitive content (legal correspondence, medical reports etc.) from the “eyes” of an AI would be challenging. Your first task would be to make yourself aware of its presence. That, unfortunately, is where it is getting harder every day.

Microsoft introduced Windows Copilot in 2023, and added Copilot to the business versions of its Office suite, meaning that AI is present in your computer’s operating system and your main productivity tools. Thankfully it’s either an optional feature or a paid-for feature so you are not forced to use it. But that may change.

A particularly worrying development, and the motivation behind this post, is Adobe’s recent announcement (Feb 2024) of its AI Assistant embedded into Acrobat and Reader. These are the tools that most people use to create and read PDF documents. It will allow the user to easily search through a PDF document for important information (not just simple pattern searching), create short summaries of the content and much more. Adobe states that the new AI is “governed by data security protocols and no customer document content is stored or used for training AI Assistant without their consent”. It’s currently in beta, and when it is finally released it will be a paid-for service.

Your consent regarding the use of AI is all-or-nothing because you accept (or reject) certain terms when you are installing/updating the software. Given how tempting the features are, granting consent could be commonplace. Today you might have nothing sensitive to worry about, so you grant consent. Some time later, when getting one-paragraph summaries of your PDFs seems a natural part of your daily workflow, you might receive something important, sensitive, perhaps something you are legally obliged to protect. You open the PDF and now the AI in the cloud has it too, and there is no way for you to re-cork the genie.

“No AI here”

We are entering choppy waters for sure. Maybe we need something we can add to our content that says “not for AI consumption”? Without such control by authors and readers alike we could be facing a lot more trouble.

Amazon Linux 2023 on VirtualBox

Operating Systems, Technology

About seven months ago I threw my hat into a GitHub thread that had opened over a year before (March 2022!) asking Amazon to make good on its promise to release off-prem images of its AL 2023 operating system. My jab at Amazon was picked up in an article on The Register and a few weeks later there was finally some movement by Amazon, raising the profile of the issue and eventually leading to a release of KVM and VMware images mid-November. There was no image for VirtualBox and I mentioned this omission in a follow-up on GitHub. The current January 2024 release still only supports KVM and VMware. The online instructions also omit VirtualBox. This is unusual because Amazon had provided VirtualBox images for previous versions of its OS.

Two weeks after the failure of Amazon to produce a VirtualBox image I decided to solve the problem myself. Here’s the environment in which I created the solution:

  • Windows 10
  • Oracle VirtualBox v7
  • WinZip / 7Zip or similar Zip tool
  • CDBurnerXP

First get the OVA file from the latest release page by navigating to the VMware sub-page and downloading the .ova file from the link therein. For the Jan 2024 release you want the file named al2023-vmware_esx-2023.3.20240122.0-kernel-6.1-x86_64.xfs.gpt.ova, and remember to check the SHA256 checksum!
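If you have Python to hand, the SHA256 check can be done with a few lines like the following. This is only a sketch: the function names are mine, and the expected value is whatever hex string is published alongside the download.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the file's digest with the published value (case-insensitive)."""
    return sha256_of(path) == published_hex.strip().lower()

# Example use (file name shortened with the same "___" shorthand):
#   matches_published("al2023-___.ova", "<hex string from the release page>")
```

Reading in chunks keeps memory use flat, which matters for a disk image of this size.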

Using your preferred Zip tool open the .ova file and extract the .vmdk file therein.
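An .ova file is a tar archive, so the extraction can also be scripted instead of done by hand. A minimal sketch, assuming Python is available (the `extract_vmdk` helper is just an illustration, not part of any tool mentioned here):

```python
import tarfile
from pathlib import Path

def extract_vmdk(ova_path, dest_dir="."):
    """Extract every .vmdk member of an OVA (a tar archive); return the new paths."""
    dest = Path(dest_dir)
    extracted = []
    with tarfile.open(ova_path) as ova:
        for member in ova.getmembers():
            name = Path(member.name).name  # ignore any directory part
            if member.isfile() and name.lower().endswith(".vmdk"):
                member.name = name  # write straight into dest_dir
                ova.extract(member, path=dest)
                extracted.append(dest / name)
    return extracted

# Example use:
#   extract_vmdk("al2023-___.ova", dest_dir=".")
```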

You will find the VBoxManage.exe program under Program Files\Oracle\VirtualBox (the default install location) and you can use it to generate a .vdi file for VirtualBox as follows:

  VBoxManage.exe clonehd al2023-___.vmdk al2023-___.vdi --format VDI

(I am using “___” as a shorthand.) Now create three files named “meta-data”, “network-config” and “user-data” as follows:

meta-data

local-hostname: myhost.mydomain.example.org

network-config

network:
  version: 2
  ethernets:
    enp0s3:
      dhcp4: false
      addresses:
        - 192.168.1.234/24
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8]

user-data

#cloud-config
package_upgrade: false
ssh_pwauth: True
chpasswd:
  list: |
    ec2-user:mY-C0mpl3x-Pwd
  expire: False
write_files:
  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg
    content: |
      network:
        config: disabled

These are YAML files with two-space indenting. If you are interested in such configurations, check out some official examples! Feel free to use a different IP address for your VM and whatever DNS nameserver you want, and choose a different (complex) password to your liking.
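If you expect to build more than one VM, the three seed files can be generated rather than typed. The sketch below simply templates the same content with your own values; the function and its parameter names are hypothetical, and it writes Unix line endings on the assumption that this is the safest choice for files read by a Linux guest. It also puts the `#cloud-config` header on the first line of user-data, which cloud-init expects for this kind of file.

```python
from pathlib import Path

def write_seed_files(dest_dir, hostname, address_cidr, gateway, dns, password):
    """Write meta-data, network-config and user-data into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    files = {
        "meta-data": f"local-hostname: {hostname}\n",
        "network-config": (
            "network:\n"
            "  version: 2\n"
            "  ethernets:\n"
            "    enp0s3:\n"
            "      dhcp4: false\n"
            "      addresses:\n"
            f"        - {address_cidr}\n"
            f"      gateway4: {gateway}\n"
            "      nameservers:\n"
            f"        addresses: [{dns}]\n"
        ),
        "user-data": (
            "#cloud-config\n"
            "package_upgrade: false\n"
            "ssh_pwauth: True\n"
            "chpasswd:\n"
            "  list: |\n"
            f"    ec2-user:{password}\n"
            "  expire: False\n"
            "write_files:\n"
            "  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg\n"
            "    content: |\n"
            "      network:\n"
            "        config: disabled\n"
        ),
    }
    for name, text in files.items():
        # newline="\n" keeps Unix line endings even when run on Windows.
        with open(dest / name, "w", encoding="utf-8", newline="\n") as fh:
            fh.write(text)
    return sorted(files)

# Example use, with the same values as above:
#   write_seed_files("seed", "myhost.mydomain.example.org",
#                    "192.168.1.234/24", "192.168.1.1", "8.8.8.8", "mY-C0mpl3x-Pwd")
```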

Finally use the command line tool from CDBurnerXP to create an ISO containing the above three files:

cdbxpcmd.exe --burn-data -name:cidata -file:meta-data -file:network-config -file:user-data -iso:seed.iso -format:iso -changefiledates

Run VirtualBox and add the al2023-___.vdi file to the collection of virtual media images. Then set up a new VM with the following configuration:

  • Type: Linux 64-bit
  • System: 4 GB RAM, 1 or 2 CPUs
  • Storage [Controller=IDE] mounted image seed.iso
  • Storage [Controller=SATA] mounted image al2023-___.vdi
  • Display: 33MB, 1 monitor, VMSVGA.
  • Network: bridged adapter, Realtek

Boot the VM and after some initialisation sequences you should be at a login prompt in a minute or two. Log in via the console or use PuTTY (SSH). The user name is ec2-user and the password is per the user-data file above. At this point you can unmount the seed.iso as it has done its job.
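The VM setup above was done through the VirtualBox GUI, but roughly the same configuration can be sketched with VBoxManage. Treat the following as an untested outline rather than a recipe: the VM name, the controller names and the bridged adapter name are placeholders, and the option spellings should be checked against the VBoxManage help for your VirtualBox version. By default the script only prints the commands.

```python
import subprocess

def build_vm_commands(name, vdi_path, iso_path, bridge_adapter,
                      memory_mb=4096, cpus=2):
    """Return the VBoxManage command lines for the VM described above."""
    vbox = "VBoxManage"
    return [
        [vbox, "createvm", "--name", name, "--ostype", "Linux_64", "--register"],
        [vbox, "modifyvm", name,
         "--memory", str(memory_mb), "--cpus", str(cpus),
         "--vram", "33", "--graphicscontroller", "vmsvga",
         "--nic1", "bridged", "--bridgeadapter1", bridge_adapter],
        [vbox, "storagectl", name, "--name", "SATA", "--add", "sata"],
        [vbox, "storagectl", name, "--name", "IDE", "--add", "ide"],
        # The system disk goes on the SATA controller...
        [vbox, "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vdi_path],
        # ...and the seed ISO on the IDE controller.
        [vbox, "storageattach", name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive", "--medium", iso_path],
    ]

def create_vm(commands, dry_run=True):
    """Print the commands, or run them one by one when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # "al2023" and the adapter name are placeholders for your own values.
    create_vm(build_vm_commands("al2023", "al2023-___.vdi", "seed.iso",
                                "Your bridged adapter name"))
```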

WUps

Operating Systems, Technology

Windows Update is both essential and painful. Regularly interrupting the normal flow of work, sometimes sapping all the energy out of the computers, taking control for long periods of time (on older machines this could be hours!) and occasionally “whoops…” Like the past few days where all except one of my PCs have choked on KB5034441. There are suggestions that the problem is due to the relatively new requirement that the Windows recovery partition have at least 250 MB of available space. All of mine have more than double that, so the update failure is likely more complex. The remedy (partition resizing) proposed by Microsoft is far more convoluted than anything the average user would be familiar with, and infeasible for any central IT administrator to apply to their many users. It comes with significant risks, notably disk corruption, and while the patch is an essential fix for a security issue, it only applies to the minority of people who have BitLocker enabled. Even for those affected, it only applies if physical access to the affected PC by an attacker is possible. That’s a lot of “if”s.

What should be done while we wait for Microsoft to fix their fix? Since the failed patch keeps insisting on a retry, my strategy is simple: ignore it. Or at least, instruct my PCs to ignore this particular patch.

Ignoring a WU patch

Microsoft once offered a tool called “Show or Hide Updates” that scanned for available updates and allowed you to select which of them would be hidden from the WU process. This tool doesn’t require any installation. Just run the wushowhide.diagcab file, select the Hide option, wait for it to present the list of available updates and (in this case) select the offending KB5034441. Sadly Microsoft no longer offer the S&HU tool on their site, but thanks to the Wayback Machine you can download wushowhide.diagcab from the archive.

After hiding the offending update via the S&HU tool, if it is still marked as “retry” in the Windows Update section of Windows Settings, just click the retry link and watch the update disappear.

What next?

Microsoft will eventually release a fix for KB5034441. This might be a revision of the patch, in which case the patch identifier may stay the same, which unfortunately means the S&HU configuration will prevent the fix from being applied. You could re-run S&HU to un-hide the patch, but only if you are sure the patch has been fixed.

Alternatively, Microsoft could withdraw the broken patch so it is no longer offered via WU. In its place they would issue a new patch with a new ID to be applied automatically via WU in the usual way. Hopefully this time without choking.

Wet January

LUE

My small patch on planet Earth has not much climate but plenty of weather. An island subject to ocean buffeting, chills from northern icy regions and occasional heat from the nearby continent. Often on the same day. I recall being greeted by snow in the morning, beaming sunshine in the afternoon and torrential rain that evening. It has been a bit turbulent of late, two storms in two days. Winds at 100km/h, gusts even worse. And rain.

This has me a little peeved, to be honest. I like to go for a short walk now and then, clear the cobwebs out of my head, put some air in my lungs, stop staring at screens for a while. This January I was looking forward to my walks on account of my new hat, a deep blue pure wool Fedora, which sadly in this weather won’t last a minute unless it is nailed to my head. So I sit here with the rain drumming a tattoo on the window behind me while I stare at one of my screens and ponder another wet January day without a nice walk.

OK then. Coffee break is over. Time to get on with writing that report. I wonder if it would be odd to wear a hat while typing…?

Power trip

Hardware, Technology

Over the past several weeks we have had multiple power outages (long, short, brown, buzzing…). Partly due to recent storms, but mostly due to major work being done on local distribution lines. Some of my systems are in the clouds where industrial-grade power management is in place. (I hope.) My personal servers and dev/test systems are on-site and are subject to the vagaries of suburban power services. While “backup, backup, backup” is the mantra that ensures I won’t lose much, recovering from system corruption can be time-consuming.

Thankfully I also have an uninterruptible power supply (UPS) parked below the server shelving. Over the past month (and several outages) I have been pushed to refine and improve how the environment deals with sudden power issues. Here are some observations along the way:

  • apcupsd is brilliant. I have it running in my host server, interacting with the UPS over USB. To this I have added a number of new outage event scripts to deal with the various power-related scenarios.
  • My UPS can offer about half an hour of supply once the mains goes. But this is from a fully charged state, and with multiple outages happening on the same day, the battery might not have had enough time to recharge by the second or third time the panic alarm sounds. Therefore the event handling scripts should read the “minutes remaining” information from the UPS and act accordingly.
  • Don’t panic. One of the outages last week was for just 40 seconds. So, if the UPS minutes remaining will allow it, wait a bit before commencing a controlled shutdown.
  • The controlled shutdown of my host server will take care of saving the state of any running VMs. But there are also some NAS boxes, some of which are mounted over the network onto some of the VMs. I wanted my host server to also take care of shutting down the NAS boxes. Unfortunately they are from different manufacturers and none of them have UPS signalling support, but they have either SSH access or a Web interface, and I was able to script some shutdown commands from the host server to the NAS (after the VMs are saved). To ensure network connectivity, I also added a small Ethernet switch to the UPS. Power goes, switch stays up, host server saves VMs, shuts down NAS boxes, then shuts itself down.
  • I was not able to find a satisfactory way to shut down the UPS programmatically from the host server, while giving enough time for the host server to shut itself down before the UPS goes. More experimentation may be needed, but maybe on a separate mock-up environment rather than the real thing. After all, even if the UPS is left running, all it is powering is the small Ethernet switch as all other things have powered down.
  • There is no automated recovery when power is restored. I am OK with that, as I am generally on-prem anyway, and to be honest I don’t actually trust the power to be stable until at least 30 minutes after it has been restored.
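To make the “read the minutes remaining, but don’t panic” logic concrete, here is a minimal sketch of the decision an event script could take. It is not my actual script: it assumes the status comes from apcupsd’s apcaccess command in its usual “KEY : value” form, and the 10-minute floor and 2-minute grace period are made-up thresholds to tune for your own UPS.

```python
import subprocess

def parse_apcaccess(text):
    """Turn 'KEY : value' lines from apcaccess into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            status[key.strip()] = value.strip()
    return status

def decide(status, seconds_on_battery, min_minutes=10.0, grace_seconds=120):
    """Return 'none', 'wait' or 'shutdown' for the current UPS state."""
    if "ONBATT" not in status.get("STATUS", ""):
        return "none"  # mains is present, nothing to do
    # A missing TIMELEFT is treated as 0 minutes, the cautious choice.
    minutes_left = float(status.get("TIMELEFT", "0 Minutes").split()[0])
    if minutes_left < min_minutes:
        return "shutdown"  # battery too low to risk waiting
    if seconds_on_battery < grace_seconds:
        return "wait"  # short outages often end by themselves
    return "shutdown"

def read_status():
    """Query the local apcupsd daemon (requires apcaccess on the PATH)."""
    result = subprocess.run(["apcaccess", "status"],
                            capture_output=True, text=True, check=True)
    return parse_apcaccess(result.stdout)

# Example use inside an event script:
#   action = decide(read_status(), seconds_on_battery=40)
```

With these thresholds a 40-second outage on a well-charged battery results in “wait”, while the same outage on a battery that has not recovered from an earlier one goes straight to “shutdown”.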

Finally, one thought does occur to me every time the power goes: does the UPS have enough juice left to power the coffee machine?