<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Cantrell</title>
    <description>The latest articles on DEV Community by David Cantrell (@drhyde).</description>
    <link>https://dev.to/drhyde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F98675%2F87292a43-3bd4-47dd-8169-4683aad0aceb.jpeg</url>
      <title>DEV Community: David Cantrell</title>
      <link>https://dev.to/drhyde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/drhyde"/>
    <language>en</language>
    <item>
      <title>Automatic cross-platform testing: part 7: 32 bit, again</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Fri, 27 Feb 2026 21:23:41 +0000</pubDate>
      <link>https://dev.to/drhyde/automatic-cross-platform-testing-part-7-32-bit-again-1ipf</link>
      <guid>https://dev.to/drhyde/automatic-cross-platform-testing-part-7-32-bit-again-1ipf</guid>
      <description>&lt;p&gt;I have &lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-6-32-bit-linux-fh2"&gt;written about this before&lt;/a&gt;, but I've had to do it again. I noted previously that Github's actions for things like checking out a repository and for retrieving build artifacts are now 64-bit only. You've been able to work around that by using older versions of the actions but I didn't want to do that so I'm taking a different approach.&lt;/p&gt;

&lt;p&gt;My new method builds on work I've done previously to pull all my testing for different Unixy platforms together into one workflow. &lt;a href="https://github.com/DrHyde/perl-modules-Scalar-Type/blob/38e64fe918da7feee0b0e7dba39a6be00bdc0ddb/.github/workflows/install-various-OSes.yml" rel="noopener noreferrer"&gt;This&lt;/a&gt; tests on 64-bit versions of NetBSD, FreeBSD, OpenBSD, illumos (that is, the modern version of OpenSolaris), and Linux. And as of earlier today it also runs the tests on 32-bit Linux. As in my previous blog entry it's still really 64-bit hardware, with modern x86 ISA extensions, but it's a 32-bit OS image with 32-bit addressing, userland, compiler toolchain and so on. If you absolutely must have a pure 32-bit x86 environment then you'll need to acquire some obsolete e-waste and run it yourself. Something like &lt;a href="https://icop-shop.com/product/ebox-3352dx3-gl/" rel="noopener noreferrer"&gt;this&lt;/a&gt; based on the Vortex86 chipset looks like a good choice.&lt;/p&gt;

&lt;p&gt;There are only two interesting bits of the workflow. The first implements enough of Github's &lt;code&gt;actions/download-artifact&lt;/code&gt; for my purposes. In the YAML file it's all one line because YAML and everything that uses it is hateful, so I've tidied it up here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -L -H "X-GitHub-Api-Version: 2022-11-28" \
        -H "Authorization: Bearer ${GH_TOKEN}" \
        -o dist-for-install.zip \
  $(curl -L -s \
         -H "Accept: application/vnd.github+json" \
         -H "Authorization: Bearer ${GH_TOKEN}" \
         -H "X-GitHub-Api-Version: 2022-11-28" \
         https://api.github.com/repos/${GH_REPOSITORY}/actions/artifacts \
         -o - \
    |jq -r '
         [.artifacts[]|select(.expired == false)]
         |max_by(.created_at).archive_download_url
    '
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inner &lt;code&gt;curl&lt;/code&gt; uses the &lt;a href="https://docs.github.com/en/rest/actions/artifacts?apiVersion=2022-11-28#list-artifacts-for-a-repository" rel="noopener noreferrer"&gt;Github API&lt;/a&gt; to fetch a list of all the build artifacts that have ever been created for this repository, as JSON, which it filters using &lt;code&gt;jq&lt;/code&gt; to find the most recent one which hasn't expired, and then extract its &lt;code&gt;archive_download_url&lt;/code&gt; field. The outer &lt;code&gt;curl&lt;/code&gt; then fetches that.&lt;/p&gt;

&lt;p&gt;The second interesting bit is specific to testing my perl code. Perl can be built on 32-bit platforms to support either 32- or 64-bit integers. On 64-bit machines perl integers are always 64-bit - and there's been a note in the docs for 20 years about how a build-time parameter for 32-bit ints may be added later! I need to test with 32-bit ints, but most OS packages, even on 32-bit machines, will provide a perl with 64-bit ints. So I need to build my own. This is dead simple if a little wordy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -o perl-5.42.0-src.tar.gz https://cpan.metacpan.org/authors/id/B/BO/BOOK/perl-5.42.0.tar.gz
tar xzf perl-5.42.0-src.tar.gz
cd perl-5.42.0
sh Configure -de -Dprefix=$HOME/perl-5.42.0-installed
make -j $(nproc)
make install
$HOME/perl-5.42.0-installed/bin/perl -MCPAN -e 'install qw(App::cpanminus)'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The call to the &lt;code&gt;Configure&lt;/code&gt; script is where the OS packaging people make it have 64-bit ints by adding the &lt;code&gt;-Duse64bitint&lt;/code&gt; flag. Normally you would &lt;code&gt;make test&lt;/code&gt; too but I've satisfied myself that it does indeed pass its tests using this Docker image, so I skip that step to save a lot of time.&lt;/p&gt;

&lt;p&gt;Finally, I can extract my software and test it. It is bloody irritating that Github automatically wrap my tarball in the obsolete &lt;code&gt;zip&lt;/code&gt; format. I blame Microsoft:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;unzip dist-for-install.zip
mkdir dist-for-test
tar -C dist-for-test -xzf *.tar.gz
cd dist-for-test/*
$HOME/perl-5.42.0-installed/bin/cpanm --installdeps .
$HOME/perl-5.42.0-installed/bin/perl Makefile.PL
make test TEST_VERBOSE=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there you have it - how to test your code automatically, using Github actions, on a 32-bit platform.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>githubactions</category>
      <category>ci</category>
      <category>32bit</category>
    </item>
    <item>
      <title>Unicode-Search - my first Electron app!</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Sat, 23 Nov 2024 11:35:46 +0000</pubDate>
      <link>https://dev.to/drhyde/unicode-search-my-first-electron-app-41op</link>
      <guid>https://dev.to/drhyde/unicode-search-my-first-electron-app-41op</guid>
      <description>&lt;p&gt;I am a Unix greybeard who mostly works in perl and shell programming, with a bit of dabbling in C and Rust, but for a long time I've wanted to do some Javascript. I just needed an interesting enough project to use it in. Well, at work I spend a lot of time toiling in the Unicode mines, fixing mojibÃ©ke errors in a somewhat elderly code base. As part of that I wrote &lt;a href="https://dev.to/drhyde/a-brief-guide-to-perl-character-encoding-if7"&gt;a guide&lt;/a&gt; for my colleagues, which points them at a very useful &lt;a href="http://xahlee.info/comp/unicode_index.html" rel="noopener noreferrer"&gt;reference website&lt;/a&gt; run by Xah Lee. That has been around for years, but as the single maintainer of several projects myself I am wary of relying too much on &lt;a href="https://www.xkcd.com/2347/" rel="noopener noreferrer"&gt;projects with a single maintainer&lt;/a&gt;. So I decided to write my own, and that this would be my interesting little Javascript project.&lt;/p&gt;

&lt;p&gt;It seems like a sensible choice for learning a new language, as it's not doing much, just looking up data in a static structure and displaying it to the user. It's all synchronous code, there's no I/O beyond updating a web page, and the user interface can be dead simple.&lt;/p&gt;

&lt;p&gt;Electron's &lt;a href="https://www.electronjs.org/docs/latest/tutorial/quick-start" rel="noopener noreferrer"&gt;quick start guide&lt;/a&gt; is excellent and gave me the basic boilerplate for a "Hello World" application, and then the rest was just lots (and lots, and lots) of looking things up on Stack Overflow and &lt;a href="https://www.w3schools.com/js/default.asp" rel="noopener noreferrer"&gt;W3Schools&lt;/a&gt;. From start to finish it took about 6 hours, and &lt;a href="https://github.com/DrHyde/unicode-search" rel="noopener noreferrer"&gt;my code is on Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb476po3alvmd6wwqowij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb476po3alvmd6wwqowij.png" alt="Screenshot of my little app running" width="600" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm quite sure that those of you skilled in the ways of Javascript will find the code "idiosyncratic", but that's OK, these are baby's first steps. If you have any useful tips for improving it, and can explain them &lt;em&gt;simply&lt;/em&gt;, then they would be most welcome. I would also welcome pull requests that make it stop looking ugly as sin. Learning how to make rounded corners and stuff in CSS wasn't in scope :-)&lt;/p&gt;

&lt;p&gt;Update: because working on the command line is &lt;em&gt;always&lt;/em&gt; better than using a GUI, it has sprouted a CLI tentacle. The same code does most of the work of parsing user input and looking up characters, it just has a different function for spitting the results back out to the user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcq4df917ks2lardybry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcq4df917ks2lardybry.png" alt="Image description" width="682" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>electron</category>
      <category>javascript</category>
      <category>unicode</category>
    </item>
    <item>
      <title>Programmable tab completion with bash</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Sun, 10 Nov 2024 23:27:15 +0000</pubDate>
      <link>https://dev.to/drhyde/programmable-tab-completion-with-bash-5ama</link>
      <guid>https://dev.to/drhyde/programmable-tab-completion-with-bash-5ama</guid>
      <description>&lt;p&gt;We all use tab completion in the shell, but have you stopped to think about how it works? How with one command it will auto-complete only directory names, with another directories and filenames, but with another it will complete branch names for your VCS, for example? It's programmable! And so we can bend it to our will!&lt;/p&gt;

&lt;p&gt;I recently had an annoyance. I use a tool called &lt;a href="https://viric.name/soft/ts/" rel="noopener noreferrer"&gt;ts&lt;/a&gt; (task spooler) to run some CPU-intensive processes in the background, and to queue them so that only one such process runs at once. It is invoked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ts command arg1 arg2 arg...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wanted to be able to use tab completion for the &lt;code&gt;command&lt;/code&gt;, and to then use &lt;em&gt;that command's&lt;/em&gt; tab completion for subsequent arguments. After much swearing and cursing - the documentation in the &lt;code&gt;bash&lt;/code&gt; manpage is not the best - and a bit of help from some nice people &lt;a href="https://stackoverflow.com/questions/79173765/how-to-include-completions-for-one-command-in-those-of-another" rel="noopener noreferrer"&gt;on Stack Overflow&lt;/a&gt; I came up with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;complete&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; bashdefault &lt;span class="nt"&gt;-o&lt;/span&gt; default &lt;span class="nt"&gt;-F&lt;/span&gt; __ts_bash_completions ts

__ts_bash_completions &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;COMPREPLY&lt;/span&gt;&lt;span class="o"&gt;=()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMP_CWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 1 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nv"&gt;COMPREPLY&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;compgen&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="p"&gt;[COMP_CWORD]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else
        &lt;/span&gt;&lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;command_completion_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;complete&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="p"&gt;[1]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; 2&amp;gt;/dev/null|sed &lt;span class="s1"&gt;'s/.*-F \([^ ]*\) .*/\1/'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$command_completion_function&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
            &lt;/span&gt;&lt;span class="nv"&gt;COMP_CWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt; COMP_CWORD &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="k"&gt;))&lt;/span&gt;
            &lt;span class="nv"&gt;COMP_LINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$COMP_LINE&lt;/span&gt;|sed &lt;span class="s2"&gt;"s/^&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="p"&gt;[0]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; //"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
            &lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;:1&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;)&lt;/span&gt;

            &lt;span class="nv"&gt;$command_completion_function&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMP_WORDS&lt;/span&gt;&lt;span class="p"&gt;[0]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$3&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;fi
    fi&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go through it in detail. The first line tells the shell to use the function &lt;code&gt;__ts_bash_completions&lt;/code&gt; when the user is typing the &lt;code&gt;ts&lt;/code&gt; command and its subsequent arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;complete -F __ts_bash_completions ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then define that function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;__ts_bash_completions () {
    COMPREPLY=()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and the first thing we do is create an empty array &lt;code&gt;COMPREPLY&lt;/code&gt;. &lt;code&gt;bash&lt;/code&gt; completions populate this global variable to tell the shell what options are available. We then see how many complete words there are on the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    if [ "$COMP_CWORD" -eq 1 ]; then
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;COMP_CWORD&lt;/code&gt; is another global variable. Strictly it is the index, counting from zero, of the word the cursor is in, which when you are typing at the end of the line is the same as the number of complete words currently in the command. If that is 1 then the only complete word is &lt;code&gt;ts&lt;/code&gt; itself, so we want to autocomplete the name of a command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        COMPREPLY=($(compgen -c -- "${COMP_WORDS[COMP_CWORD]}"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;compgen&lt;/code&gt; (&lt;code&gt;comp&lt;/code&gt;letion &lt;code&gt;gen&lt;/code&gt;erator) generates a list of all the commands available in the &lt;code&gt;$PATH&lt;/code&gt; which begin with &lt;code&gt;${COMP_WORDS[COMP_CWORD]}&lt;/code&gt;. This introduces yet another magic global variable: &lt;code&gt;COMP_WORDS&lt;/code&gt; is an array (zero indexed) of all the words currently on the command line, including the one currently being typed, which may be empty. We pick the last one and pass that to &lt;code&gt;compgen&lt;/code&gt; for it to use as a filter.&lt;/p&gt;
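
&lt;p&gt;You can poke at &lt;code&gt;compgen&lt;/code&gt; outside of a completion function to see what it produces. The following is only a sketch: &lt;code&gt;list_commands&lt;/code&gt; is a made-up helper name, and it runs &lt;code&gt;compgen&lt;/code&gt; via &lt;code&gt;bash -c&lt;/code&gt; because it is a &lt;code&gt;bash&lt;/code&gt; builtin that other shells don't have:&lt;/p&gt;

```shell
# list_commands PREFIX: print command names starting with PREFIX.
# compgen is a bash builtin, so run it under bash explicitly.
# The function name is hypothetical.
list_commands() {
    bash -c 'compgen -c -- "$1"' _ "$1"
}

# Every line of output starts with "ech".
list_commands ech
```

&lt;p&gt;The output should include &lt;code&gt;echo&lt;/code&gt;, because &lt;code&gt;compgen -c&lt;/code&gt; lists shell builtins, functions and aliases as well as executables found in the &lt;code&gt;$PATH&lt;/code&gt;.&lt;/p&gt;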

&lt;p&gt;At this point we've tab completed the name of the command that &lt;code&gt;ts&lt;/code&gt; is to run and deserve a beer. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fammk7zx1viqmu3ble72q.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fammk7zx1viqmu3ble72q.JPG" alt="Cheers!" width="800" height="1127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to deal with tab completion of the arguments to the command that we want &lt;code&gt;ts&lt;/code&gt; to run. First we need to find what function is used for completions for that command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    else
        local command_completion_function="$(complete -p ${COMP_WORDS[1]} 2&amp;gt;/dev/null|sed 's/.*-F \([^ ]*\) .*/\1/')"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;complete -p&lt;/code&gt; tells you exactly how tab completion is configured for a given command, and spits it out in the form of the command to use to configure it. The function will be the argument to the &lt;code&gt;-F&lt;/code&gt; option, so we use a rather crude &lt;code&gt;sed&lt;/code&gt; invocation to extract it, if it is present. The resulting string will be empty if no completion function is defined. We then check that it isn't empty and play around with the contents of the various &lt;code&gt;COMP_*&lt;/code&gt; variables, setting them up to pretend that &lt;code&gt;ts&lt;/code&gt; isn't involved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        if [ ! -z "$command_completion_function" ]; then
            COMP_CWORD=$(( COMP_CWORD - 1 ))
            COMP_LINE=$(echo $COMP_LINE|sed "s/^${COMP_WORDS[0]} //")
            COMP_WORDS=( "${COMP_WORDS[@]:1}" )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
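
&lt;p&gt;As an aside on the &lt;code&gt;sed&lt;/code&gt; step: a plain &lt;code&gt;s/.../.../&lt;/code&gt; prints any line that doesn't match unchanged, so a command whose completion is configured without &lt;code&gt;-F&lt;/code&gt; would yield the whole &lt;code&gt;complete&lt;/code&gt; line rather than an empty string. A possible variation, offered as a sketch and not as what the script above does, is to add &lt;code&gt;-n&lt;/code&gt; and the &lt;code&gt;p&lt;/code&gt; flag so that only successful matches are printed. The helper name and the sample input lines are made up:&lt;/p&gt;

```shell
# extract_completion_function SPEC: print the argument to -F, or nothing.
# The function name is hypothetical. -n plus the p flag makes sed print
# only when the substitution matched, so a spec without -F gives "".
extract_completion_function() {
    printf '%s\n' "$1" | sed -n 's/.*-F \([^ ]*\) .*/\1/p'
}

# Sample specs, made up to resemble what `complete -p` prints.
extract_completion_function 'complete -o bashdefault -o default -F __git_wrap__git_main git'
extract_completion_function 'complete -c nohup'
```

&lt;p&gt;The first call prints &lt;code&gt;__git_wrap__git_main&lt;/code&gt; and the second prints nothing at all.&lt;/p&gt;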



&lt;p&gt;We decrement &lt;code&gt;COMP_CWORD&lt;/code&gt; because we want to ignore one of the completed words - that being &lt;code&gt;ts&lt;/code&gt; itself. We remove &lt;code&gt;ts&lt;/code&gt; from the start of &lt;code&gt;COMP_LINE&lt;/code&gt; - which is a string containing the entire line of input. And we remove the first element from the &lt;code&gt;COMP_WORDS&lt;/code&gt; array. Finally, we run the command's own completion function which will set &lt;code&gt;COMPREPLY&lt;/code&gt; for us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            $command_completion_function "${COMP_WORDS[0]}" "$2" "$3"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
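
&lt;p&gt;The variable shuffling can be tried in isolation, away from the completion machinery. This sketch uses made-up sample values, as if the user had typed &lt;code&gt;ts git log ma&lt;/code&gt; and hit tab, and a hypothetical function name. It goes through &lt;code&gt;bash -c&lt;/code&gt; because arrays are a &lt;code&gt;bash&lt;/code&gt; feature:&lt;/p&gt;

```shell
# shift_completion_context: show the COMP_* variables before and after
# dropping the leading word. The values are made-up sample input and the
# function name is hypothetical. Arrays need bash, hence bash -c.
shift_completion_context() {
    bash -c '
        COMP_WORDS=(ts git log ma)
        COMP_CWORD=3
        COMP_LINE="ts git log ma"
        echo "before: $COMP_CWORD [$COMP_LINE] (${COMP_WORDS[*]})"

        # The same three steps as in the completion function.
        COMP_CWORD=$(( COMP_CWORD - 1 ))
        COMP_LINE=$(echo $COMP_LINE|sed "s/^${COMP_WORDS[0]} //")
        COMP_WORDS=( "${COMP_WORDS[@]:1}" )
        echo "after:  $COMP_CWORD [$COMP_LINE] (${COMP_WORDS[*]})"
    '
}

shift_completion_context
```

&lt;p&gt;The "before" line shows 3 and &lt;code&gt;ts git log ma&lt;/code&gt;, the "after" line shows 2 and &lt;code&gt;git log ma&lt;/code&gt;, which is what &lt;code&gt;git&lt;/code&gt;'s completion function would have seen had &lt;code&gt;ts&lt;/code&gt; never been there. Note that the order matters: &lt;code&gt;COMP_LINE&lt;/code&gt; has to be edited while &lt;code&gt;COMP_WORDS[0]&lt;/code&gt; is still &lt;code&gt;ts&lt;/code&gt;.&lt;/p&gt;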



&lt;p&gt;It appears that the calling convention for completion functions has changed over time. At some point in the past they were called with arguments, which only contain the command name, the word being completed, and the word before it; then they were changed to instead use those global variables. In this case we can just use the arguments that were passed to our function, altering only the first one, the name of the command whose arguments we are completing. Depending on how modern the completion function is for a given command, our &lt;code&gt;ts&lt;/code&gt; completion needs to support both calling conventions.&lt;/p&gt;

&lt;p&gt;There's one final wrinkle, which the observant amongst you may have noticed when I started going through this line by line. In my script I have &lt;code&gt;-o bashdefault -o default&lt;/code&gt; when defining the relationship between &lt;code&gt;ts&lt;/code&gt; and its completion function. Those are to cope with the case where the completion function leaves &lt;code&gt;COMPREPLY&lt;/code&gt; empty. If that happens then &lt;code&gt;-o bashdefault&lt;/code&gt; applies &lt;code&gt;bash&lt;/code&gt;'s own defaults for tab completion, ie helps you pick a filename, and if that doesn't return anything &lt;code&gt;-o default&lt;/code&gt; applies &lt;code&gt;readline&lt;/code&gt;'s defaults. You can see this in action for &lt;code&gt;git&lt;/code&gt;'s completions. If you type &lt;code&gt;git log &amp;lt;tab&amp;gt;&lt;/code&gt; in a git repo you get a list of branch names, and if you type &lt;code&gt;git log foo&amp;lt;tab&amp;gt;&lt;/code&gt; you get a list of branches whose names begin with &lt;code&gt;foo&lt;/code&gt;. But if there are no such branches you get a list of &lt;em&gt;files&lt;/em&gt; whose names begin with &lt;code&gt;foo&lt;/code&gt;, which is the shell's default action.&lt;/p&gt;

&lt;p&gt;Anyway, after a great deal of wrestling with poor documentation, that's a little annoyance dealt with, and exactly the same code can also be used for other similar commands such as &lt;code&gt;sudo&lt;/code&gt; and &lt;code&gt;nohup&lt;/code&gt;. If you think you'll find it useful then the code is &lt;a href="https://github.com/DrHyde/configurations/blob/c4e881670ec21a61ac1eac6cab296c5d8771bcce/bash/dot-bash_completion.d/ts" rel="noopener noreferrer"&gt;on Github&lt;/a&gt; and it's also in the &lt;code&gt;ts&lt;/code&gt; &lt;a href="https://viric.name/wsgi-bin/hgweb.wsgi/ts/rev/9e80ac43e6d2" rel="noopener noreferrer"&gt;mercurial repo&lt;/a&gt; so will no doubt be in a future release.&lt;/p&gt;

</description>
      <category>bash</category>
      <category>shell</category>
    </item>
    <item>
      <title>The typeface you didn't know you wanted and were trained to hate</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Tue, 05 Sep 2023 14:59:32 +0000</pubDate>
      <link>https://dev.to/drhyde/the-typeface-you-didnt-know-you-wanted-and-were-trained-to-hate-5f3f</link>
      <guid>https://dev.to/drhyde/the-typeface-you-didnt-know-you-wanted-and-were-trained-to-hate-5f3f</guid>
      <description>&lt;p&gt;For the last several weeks I've been using &lt;a href="https://dtinth.github.io/comic-mono-font/"&gt;Comic Mono&lt;/a&gt; in my terminal. It's a fixed width typeface based on the font that we've all been trained to despise and sneer at for almost 30 years, Comic Sans.&lt;/p&gt;

&lt;p&gt;But you know what? Comic Mono works really well at small sizes even with my becoming-elderly failing eyes and my inability to always find my reading glasses. I can see plenty of code in my editor with lots of context. On a 40" screen at arm's length I can comfortably get 100 lines of text - so lots of context for the code I'm working on - in a terminal using 12pt text &lt;em&gt;and read it easily&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Give it a go for a few days!&lt;/p&gt;

</description>
      <category>ui</category>
      <category>font</category>
      <category>terminal</category>
    </item>
    <item>
      <title>Number::Phone is now 64-bit only</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Tue, 20 Jun 2023 10:40:02 +0000</pubDate>
      <link>https://dev.to/drhyde/numberphone-is-now-64-bit-only-eje</link>
      <guid>https://dev.to/drhyde/numberphone-is-now-64-bit-only-eje</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/drhyde/deprecating-32-bit-perl-3512"&gt;A couple of years ago&lt;/a&gt; I wrote that I would be dropping support for perls with 32-bit integers in Number::Phone. Well, that two year deprecation cycle is up, and last night I switched over to the &lt;a href="https://metacpan.org/dist/Data-CompactReadonly/view/lib/Data/CompactReadonly/V0/Format.pod"&gt;new database format&lt;/a&gt;, the software for which requires 64-bit integers. As of the next release it will use under 10MB of disk space instead of about 100. An order of magnitude improvement, and it was fun to write as well.&lt;/p&gt;

&lt;p&gt;Perl has supported 64-bit integers on all reasonable platforms for the last 20 years, even on those platforms which still use 32-bit pointers, so I don't expect that anyone will notice any change at all except that their hosted Docker containers will now be &lt;a href="https://github.com/DrHyde/perl-modules-Number-Phone/issues/95"&gt;faster to deploy and cheaper to run&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>perl</category>
      <category>32bit</category>
      <category>deprecation</category>
      <category>64bit</category>
    </item>
    <item>
      <title>cpulimit annoyed me so I improved it</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Sat, 25 Mar 2023 19:07:07 +0000</pubDate>
      <link>https://dev.to/drhyde/cpulimit-annoyed-me-so-i-improved-it-5c4n</link>
      <guid>https://dev.to/drhyde/cpulimit-annoyed-me-so-i-improved-it-5c4n</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/drhyde/gnu-timeout-annoyed-me-so-i-replaced-it-or-an-extremely-simple-introduction-to-fork-and-signal-handling-in-perl-32f7"&gt;Previously&lt;/a&gt; | &lt;a href="https://dev.to/drhyde/gnu-tree-annoyed-me-so-i-fixed-it-2pnk"&gt;more previously&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few days ago I discovered &lt;a href="https://github.com/opsengine/cpulimit"&gt;&lt;code&gt;cpulimit&lt;/code&gt;&lt;/a&gt;. It's a great tool that nicely (haha) complements &lt;code&gt;nice&lt;/code&gt;. Where &lt;code&gt;nice&lt;/code&gt; is normally used to reduce the amount of CPU a process uses by changing its priority, a &lt;code&gt;nice&lt;/code&gt;d process can still end up using more CPU than you want, and will of course use all that it wants if nothing with a higher priority comes along.&lt;/p&gt;

&lt;p&gt;But sometimes you want to restrict a process to using no more than some particular fraction of CPU time regardless of priority. A good example is when you don't want those noisy PC fans to kick in and you don't care how long a job takes because whether it finishes in 15 minutes or 8 hours it's still going to finish while you're asleep.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cpulimit&lt;/code&gt; is perfect for this, and is simple to use. An invocation like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cpulimit &lt;span class="nt"&gt;-l&lt;/span&gt; 50 somecommand ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will run &lt;code&gt;somecommand&lt;/code&gt; but restrict it to only use 50% of a CPU. It does this by forking a watchdog process that periodically checks to see how much work &lt;code&gt;somecommand&lt;/code&gt; is doing, and if it's used too much briefly pauses it by sending the &lt;code&gt;STOP&lt;/code&gt; signal. After a little while it will un-pause it with the &lt;code&gt;CONT&lt;/code&gt; signal. Of course, because the load on your machine from other processes will never quite be constant, &lt;code&gt;cpulimit&lt;/code&gt; rarely hits the target exactly, but it gets close enough.&lt;/p&gt;

&lt;p&gt;But I wanted to tweak it a bit. I wanted to be able to interactively "turn the volume knob" so that I could give &lt;code&gt;somecommand&lt;/code&gt; more or less CPU whenever I fancied. The result is &lt;a href="https://github.com/opsengine/cpulimit/pull/116/files"&gt;a pull request&lt;/a&gt; which unfortunately is unlikely to ever get merged, as the original author hasn't touched the project in years. If any of you want the nifty new feature, applying the patch and building your own custom &lt;code&gt;cpulimit&lt;/code&gt; is pretty easy.&lt;/p&gt;

&lt;p&gt;How the patch works is simple. It installs signal handlers for &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt; which respectively increase and decrease the CPU allocation by 1%. Want to turn it up by 50%? Just write a little shell loop to send the signal 50 times:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 50&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;&lt;span class="nb"&gt;kill&lt;/span&gt; &lt;span class="nt"&gt;-SIGUSR1&lt;/span&gt; &lt;span class="nv"&gt;$pid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Determining which process to send the signal to is a bit tricky, as there are &lt;em&gt;two&lt;/em&gt; &lt;code&gt;cpulimit&lt;/code&gt; processes running. There's the first one, which is just waiting in the background for &lt;code&gt;somecommand&lt;/code&gt; to finish, then there's the watchdog that got forked off. It's the watchdog you want to send the signals to. You can tell which is the watchdog as it will generally have a higher PID and be using a little bit of CPU. If you are &lt;code&gt;cpulimit&lt;/code&gt;ing multiple processes then you can tell which watchdog is related to which process because the watchdog will have the command and its arguments on its command line. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ps aux|grep ffmpeg|grep -v grep
david   90311 103.3  0.7 36105472 485448 s011  T     6:42pm   1:07.89 ffmpeg ...
david   90312   5.6  0.0 34221044    828 s011  S     6:42pm   0:02.82 cpulimit -l 100 ffmpeg ...
david   90310   0.0  0.0 34122740    796 s011  S     6:42pm   0:00.01 cpulimit -l 100 ffmpeg ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see here that I asked &lt;code&gt;cpulimit&lt;/code&gt; to allow &lt;code&gt;ffmpeg&lt;/code&gt; to only use 1 CPU of the several available on this machine (ie to use 100% of a CPU - on modern machines the maximum allowed is the number of CPU cores * 100%). My shell accordingly started process 90310 which forked and execed &lt;code&gt;ffmpeg&lt;/code&gt; with pid 90311 and forked the watchdog process as pid 90312. The watchdog is using a little bit of CPU. It is therefore to process 90312 that I should send signals.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ for i in $(seq 1 400); do kill -SIGUSR1 90312; done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That will send the "turn it up a bit" signal 400 times, so &lt;code&gt;ffmpeg&lt;/code&gt; is now limited to at most 500% of a CPU, and a few moments later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ps aux|grep ffmpeg|grep -v grep
david   90311 497.0  0.8 36105472 507160 s011  T     6:42pm  12:42.37 ffmpeg ...
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can see that &lt;code&gt;ffmpeg&lt;/code&gt; is running a lot harder, now taking just under 500% of the CPU.&lt;/p&gt;

</description>
      <category>c</category>
      <category>patching</category>
      <category>signals</category>
    </item>
    <item>
      <title>TIL: diff-so-fancy; and some funky git config</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Wed, 25 Jan 2023 19:12:13 +0000</pubDate>
      <link>https://dev.to/drhyde/til-diff-so-fancy-and-some-funky-git-config-2a2o</link>
      <guid>https://dev.to/drhyde/til-diff-so-fancy-and-some-funky-git-config-2a2o</guid>
      <description>&lt;p&gt;I just discovered &lt;a href="https://github.com/so-fancy/diff-so-fancy" rel="noopener noreferrer"&gt;&lt;code&gt;diff-so-fancy&lt;/code&gt;&lt;/a&gt;, and very nice it is too. I immediately added it to my standard git config, which is semi-automatically installed on every machine I use. However, I've not (yet) installed &lt;code&gt;diff-so-fancy&lt;/code&gt; on all the machines I use, and for those platforms for which it's not packaged I probably won't bother installing it from source.&lt;/p&gt;

&lt;p&gt;But if I just follow the author's instructions, which amount to adding this to my &lt;code&gt;~/.gitconfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[core]
    pager = "diff-so-fancy | less --tabs=4 -RFX"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then &lt;code&gt;git diff&lt;/code&gt; will break:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ git diff HEAD^
diff-so-fancy | less --tabs=4 -RFX: diff-so-fancy: command not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but there's an easy fix! Whatever is in &lt;code&gt;pager&lt;/code&gt; is just shell code, so this works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[core]
    pager = "if [ ! -z \"$(which diff-so-fancy)\" ]; then diff-so-fancy | less --tabs=4 -RFX; else less; fi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output from &lt;code&gt;git diff&lt;/code&gt; is piped into that little script. If &lt;code&gt;diff-so-fancy&lt;/code&gt; is installed (ie if &lt;code&gt;"$(which diff-so-fancy)"&lt;/code&gt; is not zero-length) then it does exactly what &lt;code&gt;diff-so-fancy&lt;/code&gt;'s author suggests. Otherwise it just runs &lt;code&gt;less&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>discuss</category>
    </item>
    <item>
      <title>A Rant</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Fri, 20 Jan 2023 17:54:56 +0000</pubDate>
      <link>https://dev.to/drhyde/a-rant-2i0c</link>
      <guid>https://dev.to/drhyde/a-rant-2i0c</guid>
      <description>&lt;p&gt;How the &lt;em&gt;hell&lt;/em&gt;, given these interests:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lvUw8Zx2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9460vhaynhf7g9bxzr8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lvUw8Zx2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9460vhaynhf7g9bxzr8t.png" alt="My tags, which don't contain any web stuff" width="96" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;is this post in any way relevant and deserving of being promoted to me on the front page of this site?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---YVzFovq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bhkk899spc5yj1thz0ev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---YVzFovq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bhkk899spc5yj1thz0ev.png" alt='A post tagged with "css", "webdev", and "html"' width="672" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And no, negative weights being treated as "anti-follows" isn't enough. There are so many topics I'm not interested in that configuring them all is just too much work.&lt;/p&gt;

</description>
      <category>rant</category>
    </item>
    <item>
      <title>Number::Phone release candidate</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Thu, 12 Jan 2023 21:52:39 +0000</pubDate>
      <link>https://dev.to/drhyde/numberphone-release-candidate-3h7</link>
      <guid>https://dev.to/drhyde/numberphone-release-candidate-3h7</guid>
      <description>&lt;p&gt;I've recently tackled a feature request that's been sitting in the backlog for several years, to use libphonenumber's data to validate numbers with non-geographic country codes. That's codes like +800 (international freephone), +870 (Inmarsat) and so on.&lt;/p&gt;

&lt;p&gt;Previous versions would sort of work with these numbers, in that they could be instantiated as objects, but there was no information on validity or any other properties.&lt;/p&gt;

&lt;p&gt;For example here it's looking at an Iridium number. First with the previous release:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Number::Phone-&amp;gt;new("+881 672520333333333333333732")-&amp;gt;format()&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;+8816 72520333333333333333732&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;Number::Phone-&amp;gt;new("+881 672520732")-&amp;gt;is_mobile()&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;undef (ie, dunno)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and with the release candidate:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Number::Phone-&amp;gt;new("+881 672520333333333333333732")&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;undef (ie, not valid)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;Number::Phone-&amp;gt;new("+881 672520732")-&amp;gt;format()&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;+881 6 725 20732&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;Number::Phone-&amp;gt;new("+881 672520732")-&amp;gt;is_mobile()&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;1&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Much of the code is auto-generated, and this involved quite a bit of hacking around in the build system and object instantiation, so there's the potential for exciting new bugs even for numbers in boring old geographic country codes, although I didn't have to change any of the existing tests so I am hopeful that I avoided that. &lt;a href="https://github.com/DrHyde/perl-modules-Number-Phone/pull/114/files"&gt;Here are the changes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'd be grateful if you could test this, and assuming that no-one finds any bugs it will be in either the March or June quarterly release. You can download it from &lt;a href="https://cpan.metacpan.org/authors/id/D/DC/DCANTRELL/Number-Phone-3.8099_01.tar.gz"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>perl</category>
      <category>telecoms</category>
      <category>testing</category>
    </item>
    <item>
      <title>Automatic cross-platform testing: part 6: 32 bit Linux</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Sun, 17 Jul 2022 17:25:26 +0000</pubDate>
      <link>https://dev.to/drhyde/automatic-cross-platform-testing-part-6-32-bit-linux-fh2</link>
      <guid>https://dev.to/drhyde/automatic-cross-platform-testing-part-6-32-bit-linux-fh2</guid>
      <description>&lt;h1&gt;
  
  
  Previously on this channel
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-1-linux-40ih"&gt;Introduction and testing on Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-2-freebsd-2394"&gt;Testing on FreeBSD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-3-macos-2h2i"&gt;Testing on MacOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-4-windows-1p7e"&gt;Testing on Windows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/drhyde/automatic-cross-platform-testing-part-5-openbsd-40gg"&gt;Testing on OpenBSD&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Testing on 32 bit Linux
&lt;/h1&gt;

&lt;p&gt;Just about every virtual machine out there in the various CI platforms and hosting companies is a 64 bit x86 machine. But I wanted to test my code on a 32 bit machine as well. Unfortunately without having real 32 bit hardware available, or emulating it on more modern hardware, you can't &lt;em&gt;quite&lt;/em&gt; do this, but you can at least run an OS and use libraries compiled for 32 bit hardware, 32 bit memory management, and with 32 bit pointers.&lt;/p&gt;

&lt;p&gt;It's surprisingly easy, and a lot less hackish than the solution I found for testing on OpenBSD.&lt;/p&gt;

&lt;p&gt;Github workflows normally run directly on whatever host type you specify in the &lt;code&gt;runs-on&lt;/code&gt; option. But you can instead tell them to &lt;a href="https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container"&gt;run in a &lt;code&gt;container&lt;/code&gt;&lt;/a&gt;, which can be anything available on Docker Hub. Jobs running in a container can have all the same steps that you would have in a non-containerized job, including using other Github workflow actions.&lt;/p&gt;

&lt;p&gt;To test my code on 32-bit Linux I use the &lt;code&gt;i386/ubuntu:latest&lt;/code&gt; image. I am also using the same technique to run tests for one of my projects, which is sensitive to exactly which Linux distribution you are using, on Arch Linux, using the &lt;code&gt;archlinux:latest&lt;/code&gt; image.&lt;/p&gt;
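
&lt;p&gt;To confirm that a job really is running in a 32-bit userland, something like the following sketch could be run as one of the steps. It assumes only that a perl is installed in the container, and it uses the core &lt;code&gt;Config&lt;/code&gt; module to report how that perl was built. The variable name is mine, purely for illustration:&lt;/p&gt;

```perl
use strict;
use warnings;
use Config;

# ptrsize is the size of a pointer, in bytes, for this build of perl:
# 4 on a 32-bit build, 8 on a 64-bit one
my $pointer_bits = $Config{ptrsize} * 8;

print "perl built for $Config{archname} with $pointer_bits bit pointers\n";
```

&lt;p&gt;Integer size is a separate build setting (&lt;code&gt;$Config{ivsize}&lt;/code&gt;), so a 32-bit perl can still have 64-bit integers.&lt;/p&gt;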

</description>
      <category>testing</category>
      <category>githubactions</category>
      <category>ci</category>
      <category>32bit</category>
    </item>
    <item>
      <title>Virtual machine efficiency</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Wed, 23 Feb 2022 21:27:22 +0000</pubDate>
      <link>https://dev.to/drhyde/virtual-machine-efficiency-in1</link>
      <guid>https://dev.to/drhyde/virtual-machine-efficiency-in1</guid>
      <description>&lt;p&gt;I'm one of the &lt;a href="https://www.cpantesters.org/" rel="noopener noreferrer"&gt;CPAN testers&lt;/a&gt;, a bunch of people who test everything that gets uploaded to the CPAN. We test with multiple different builds of perl on multiple operating systems and different hardware platforms, and send reports of failing tests. When I'm wearing my Author Hat as opposed to my Tester Hat I very much appreciate test failure reports from platforms I don't have access to, or which are generated using perl builds with features I hadn't considered. I test on 5 different operating systems (6 if you count 32 bit x86 and 64 bit amd64 Linux separately) and 40 different builds of perl.&lt;/p&gt;

&lt;p&gt;I mostly test in virtual machines, running on a single Intel Mac Mini. Using VMs means I don't need multiple noisy real machines of course, but it also makes them all easier to manage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipwdai3wxrfn3fbvh1h6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipwdai3wxrfn3fbvh1h6.png" alt="10 VMs running in my dock, and if one VM should accidentally crash there'd be ..."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Until very recently I used &lt;a href="https://www.virtualbox.org/" rel="noopener noreferrer"&gt;Virtualbox&lt;/a&gt; for this, mostly because back when I started running VMs on a Mac it was the best free game in town. But it's not the most efficient and I have now switched to the much better, and still free, &lt;a href="https://mac.getutm.app/" rel="noopener noreferrer"&gt;UTM&lt;/a&gt;, which is a simple GUI wrapper around the notoriously hard to configure &lt;a href="https://www.qemu.org/" rel="noopener noreferrer"&gt;QEMU&lt;/a&gt;, patched to use Apple's hypervisor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpv9euupoipvkipdxut2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpv9euupoipvkipdxut2.png" alt="CPU usage log for a month, consistently around 50% for the first half, short spikes of 100% and intervening periods of much lower usage after that"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This chart shows CPU usage over the last month. The first half is when I was using Virtualbox. With all those test builds running, usage was a fairly constant 50% of what was available. The VMs could keep up with the workload, but there was little capacity for more despite all that unused CPU. No matter how many CPUs I allocated to the various VMs, Virtualbox just wouldn't use all the resources. It's very clear in the chart when I started migrating VMs from Virtualbox to UTM. CPU usage starts spiking much higher - indeed, it reaches 100% and stays there when the VMs test a batch of uploads - but then drops to almost zero when that batch has finished. The work is done faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u8vuaalasvtrstiros1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u8vuaalasvtrstiros1.png" alt="Memory usage log for a month, very high at the start then dropping off a cliff"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Virtualbox is also a memory hog. This chart shows that I couldn't really add any more Virtualbox VMs, as there was little memory left. And worse, most of what was in use (the blue section of the chart) is "wired" memory, in kernel-space. Yes, most of the memory Virtualbox was using was being used by its kernel extension. It is again clear when I started migrating VMs to UTM. Memory usage falls off a cliff, and &lt;em&gt;"wired"&lt;/em&gt; memory usage falls even faster, with a far greater proportion of the lower total being just normal user-space memory (the red section of the chart).&lt;/p&gt;

&lt;p&gt;Now that I've got back all that wasted memory I am, of course, beginning to fill it with more VMs, this time for development as opposed to testing. I have recently become a larval stage &lt;a href="https://www.rust-lang.org/" rel="noopener noreferrer"&gt;Rustacean&lt;/a&gt; and have dived headlong into reporting &lt;a href="https://github.com/dylni/process_control/issues/12" rel="noopener noreferrer"&gt;obscure platform-specific bugs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But you can see from my dock that I've still got three Virtualbox VMs running. Why?&lt;/p&gt;

&lt;p&gt;While UTM is much better at resource usage, it's not perfect. Virtualbox seems to emulate more of a system instead of passing it through to the hypervisor and so is better for running more unusual OSes. I have two VMs there running &lt;a href="https://omnios.org/" rel="noopener noreferrer"&gt;Illumos&lt;/a&gt; which I have yet to figure out how to boot in UTM. A problem that I think is related to UTM's greater use of the hypervisor is that you can't suspend and resume VMs that use it instead of being emulated, so I've still got a few VMs hanging around in Virtualbox which spend most of their time suspended. Finally, what stops me from using UTM at work is that you &lt;a href="https://github.com/hashicorp/vagrant/issues/12518" rel="noopener noreferrer"&gt;can't use it as a Vagrant provider&lt;/a&gt;. This is incredibly annoying, as the lack of a decent virtualization application makes the otherwise very nice M1 Macs nothing more than pretty toys. I expect that this glaring lack will be fixed within the next couple of years.&lt;/p&gt;

</description>
      <category>virtualization</category>
      <category>virtualbox</category>
      <category>utm</category>
      <category>macos</category>
    </item>
    <item>
      <title>A brief guide to perl character encoding</title>
      <dc:creator>David Cantrell</dc:creator>
      <pubDate>Mon, 31 Jan 2022 18:43:59 +0000</pubDate>
      <link>https://dev.to/drhyde/a-brief-guide-to-perl-character-encoding-if7</link>
      <guid>https://dev.to/drhyde/a-brief-guide-to-perl-character-encoding-if7</guid>
      <description>&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;I originally wrote this at work, after my team spent far too many days yelling at the computer because of &lt;a href="https://en.wikipedia.org/wiki/Mojibake" rel="noopener noreferrer"&gt;Mojibake&lt;/a&gt;. Thanks to my employer for allowing me to publish it, and the several colleagues who provided helpful feedback. Any errors are, naturally, not their fault.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;12:45. Restate my assumptions&lt;/li&gt;
&lt;li&gt;
The Royal Road

&lt;ul&gt;
&lt;li&gt;Characters, representations, and strings&lt;/li&gt;
&lt;li&gt;Source code encoding, the utf8 pragma, and why you shouldn’t use it&lt;/li&gt;
&lt;li&gt;
Input and output

&lt;ul&gt;
&lt;li&gt;PerlIO layers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
The Encode module

&lt;ul&gt;
&lt;li&gt;Encode::encode&lt;/li&gt;
&lt;li&gt;Encode::decode&lt;/li&gt;
&lt;li&gt;Encode:: everything else&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Debugging

&lt;ul&gt;
&lt;li&gt;The UTF8 flag&lt;/li&gt;
&lt;li&gt;Devel::Peek&lt;/li&gt;
&lt;li&gt;hexdump&lt;/li&gt;
&lt;li&gt;PerlIO::get_layers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
The many ways of writing a character

&lt;ul&gt;
&lt;li&gt;String literals&lt;/li&gt;
&lt;li&gt;The chr function&lt;/li&gt;
&lt;li&gt;Octal&lt;/li&gt;
&lt;li&gt;Hexadecimal&lt;/li&gt;
&lt;li&gt;By codepoint name&lt;/li&gt;
&lt;li&gt;Other hexadecimal&lt;/li&gt;
&lt;li&gt;In regular expressions&lt;/li&gt;
&lt;li&gt;ASCII-encoded JSON strings in your code&lt;/li&gt;
&lt;li&gt;Accented character vs character + combining accent&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Odds and ends&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;12:45. Restate my assumptions
&lt;/h2&gt;

&lt;p&gt;We will normally want to read and write UTF-8 encoded data. Therefore you should make sure that your terminal can handle it. While we will occasionally have to deal with other encodings, and will often want to look at the byte sequences that we are reading and writing and not just the characters they represent, your life will still be much easier if you have a UTF-8 capable terminal. You can test your terminal thus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;perl &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'binmode(STDOUT, ":encoding(UTF-8)"); say "\N{GREEK SMALL LETTER LAMDA}"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That should print &lt;code&gt;λ&lt;/code&gt;, a letter that looks a bit like a lower-case &lt;code&gt;y&lt;/code&gt; mirrored through the horizontal axis.&lt;/p&gt;

&lt;p&gt;And if you pipe the output from that into &lt;code&gt;hexdump -C&lt;/code&gt; you should see the byte sequence &lt;code&gt;0xce 0xbb 0x0a&lt;/code&gt;.&lt;/p&gt;
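
&lt;p&gt;If you don't have &lt;code&gt;hexdump&lt;/code&gt; to hand, the same check can be sketched in pure perl. This assumes the core &lt;code&gt;Encode&lt;/code&gt; module, and the variable names are only illustrative:&lt;/p&gt;

```perl
use strict;
use warnings;
use Encode qw(encode);

# GREEK SMALL LETTER LAMDA followed by the newline that "say" would add
my $characters = "\x{3bb}\n";

# Turn the characters into the UTF-8 bytes that would be sent to the terminal
my $bytes = encode("UTF-8", $characters);

# Show each byte in hex
my $hex = join(" ", map { sprintf("0x%02x", ord($_)) } split(//, $bytes));
print "$hex\n";    # prints 0xce 0xbb 0x0a
```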

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;The Royal Road
&lt;/h2&gt;

&lt;p&gt;Ideally, your code will only have to care about any of this at the edges - that is, where data enters and leaves the application. That could be when reading or writing a file, sending/receiving data across the network, making system calls, or talking to a database. And in many of these cases - especially talking to a database - you will be using a library which already handles everything for you. In a brand new code-base which doesn’t have to deal with any legacy baggage you should, in theory, only have to read this first section of this document.&lt;/p&gt;

&lt;p&gt;Alas, most real programming is a habitation of devils, who will beset you from all around and make you have to care about the rest of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Characters, representations, and strings
&lt;/h3&gt;

&lt;p&gt;Perl can work with strings containing any character in Unicode. Characters are written in source code either as a literal character such as "m" or in several other ways. These are all equivalent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;m&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x6d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# or chr(109), of course&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x{6d}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{U+6d}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{LATIN SMALL LETTER M}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As are these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x3bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x{3bb}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{U+3bb}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{GREEK SMALL LETTER LAMDA}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Non-ASCII characters can also appear as literals in your code, for example &lt;code&gt;"λ"&lt;/code&gt;, but this is not recommended - see the discussion of the &lt;code&gt;utf8&lt;/code&gt; pragma below. You can also use octal - &lt;code&gt;"\155"&lt;/code&gt; for "m" - but this too is not recommended as hexadecimal encodings are marginally more familiar and easier to read.&lt;/p&gt;
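
&lt;p&gt;The equivalences listed above can be checked with a short script. This is only a sketch, with illustrative variable names. The explicit &lt;code&gt;use charnames&lt;/code&gt; is there for older perls, as newer ones load it automatically when they see a named character:&lt;/p&gt;

```perl
use strict;
use warnings;
use charnames ':full';

# Every one of these should be the single character "m" (code point 0x6d)
my @m = ("m", chr(0x6d), chr(109), "\x{6d}", "\N{U+6d}", "\N{LATIN SMALL LETTER M}");
my $all_m = (grep { $_ eq "m" } @m) == @m;

# And every one of these should be the single character at code point 0x3bb
my @lamda = (chr(0x3bb), "\x{3bb}", "\N{U+3bb}", "\N{GREEK SMALL LETTER LAMDA}");
my $all_lamda = (grep { length($_) == 1 and ord($_) == 0x3bb } @lamda) == @lamda;

print "m: ", ($all_m ? "all equivalent" : "mismatch"), "\n";
print "lamda: ", ($all_lamda ? "all equivalent" : "mismatch"), "\n";
```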

&lt;p&gt;Internally, characters have a &lt;em&gt;representation&lt;/em&gt;, a sequence of bytes that is unique for a particular combination of character and encoding. Most modern languages default to using UTF-8 for that representation, but perl is old enough to pre-date UTF-8 - and indeed to pre-date any concern for most character sets. For backward-compatibility reasons, and for compatibility with the many C libraries for which perl bindings exist, it was decided when perl sprouted its Unicode tentacle that the default representation should be ISO-Latin-1. This is a single-byte character set that covers most characters used in most modern Western European languages, and is a strict superset of ASCII.&lt;/p&gt;

&lt;p&gt;Any string consisting solely of characters in ISO-Latin-1 will by default be represented internally in ISO-Latin-1. Consider these strings:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Release the raccoon!&lt;/strong&gt; - consists solely of ASCII characters. ASCII is a subset of ISO-Latin-1, so the string’s internal representation is an ISO-Latin-1-encoded string of bytes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Libérez le raton laveur!&lt;/strong&gt; - consists solely of characters that exist in ISO-Latin-1, so the string’s internal representation is an ISO-Latin-1-encoded string of bytes. The "é" character has &lt;em&gt;code point&lt;/em&gt; 0xe9 and is represented as the byte 0xe9 internally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rhyddhewch y racŵn!&lt;/strong&gt; - the "ŵ" does not exist in ISO-Latin-1. But it does exist in Unicode, with code point 0x175. As soon as perl sees a non-ISO-Latin-1 character in a string, it switches to using something UTF-8-ish, so code point 0x175 is represented by &lt;em&gt;byte sequence&lt;/em&gt; 0xc5 0xb5. Note that while valid characters’ internal representations are valid UTF-8 byte sequences, this can also encode &lt;em&gt;invalid&lt;/em&gt; characters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Libérez le raton laveur! Rhyddhewch y racŵn!&lt;/code&gt; - this contains both an "é" (which is in ISO-Latin-1) and a "ŵ" (which is not), so the whole string is UTF-8 encoded. The "ŵ" is as before encoded as byte sequence 0xc5 0xb5, but the "é" must also be UTF-8 encoded instead of ISO-Latin-1-encoded, so becomes byte sequence 0xc3 0xa9.&lt;/p&gt;

&lt;p&gt;But notice that ISO-Latin-1 not only contains ASCII, and characters like "é" (at code point 0xe9, remember), it also contains characters "Ã" (capital A with a tilde, code point 0xc3) and "©" (copyright symbol, code point 0xa9). So how do we tell the difference between the ISO-Latin-1 byte sequence 0xc3 0xa9 representing "Ã©" and the UTF-8 byte sequence 0xc3 0xa9 representing "é"? Remember that a representation is "a sequence of bytes that is unique for a particular combination of character and encoding". So perl stores the encoding as well as the byte sequence. It is stored as a single bit flag. If the flag is unset then the sequence is ISO-Latin-1, if it is set then it is UTF-8.&lt;/p&gt;
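
&lt;p&gt;That flag can be inspected with &lt;code&gt;utf8::is_utf8&lt;/code&gt;. The following is a minimal sketch, intended purely as a debugging aid, using the example strings from above written with ASCII-only escapes:&lt;/p&gt;

```perl
use strict;
use warnings;

my $french = "Lib\x{e9}rez le raton laveur!";   # every character is in ISO-Latin-1
my $welsh  = "Rhyddhewch y rac\x{175}n!";       # \x{175} (w-circumflex) is not
my $both   = "$french $welsh";

# utf8::is_utf8() reports the state of the internal flag. It is always
# available and does not need "use utf8".
for my $string ($french, $welsh, $both) {
    printf("flag %-3s %d characters\n", (utf8::is_utf8($string) ? "on" : "off"), length($string));
}
```

&lt;p&gt;The flag should be off for the first string and on for the other two. &lt;code&gt;length&lt;/code&gt; counts characters in every case, whatever the internal representation.&lt;/p&gt;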

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Source code encoding, the utf8 pragma, and why you shouldn’t use it
&lt;/h3&gt;

&lt;p&gt;It is possible to put non-ASCII characters into your source code. For example, consider this file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;é&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$string&lt;/span&gt;&lt;span class="s2"&gt; contains &lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; characters&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;from which some problems arise. First, if the file is encoded in UTF-8, how can perl tell when it comes across the byte sequence 0xc3 0xa9 what encoding that is? Is it ISO-Latin-1? Well, it could be. Is it UTF-8? Again, it could be. In general, it isn’t possible to tell from a sequence of bytes what encoding is in use. For backward-compatibility reasons, perl assumes ISO-Latin-1.&lt;/p&gt;

&lt;p&gt;If you save that file encoded in UTF-8, and have a UTF-8-savvy terminal, that code will output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;é contains 2 characters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which is quite clearly wrong. It interpreted the 0xc3 0xa9 as two characters, but then when it spat those two characters out your terminal treated them as one.&lt;/p&gt;
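
&lt;p&gt;The same effect can be reproduced in ASCII-only code, which makes it easier to see what is going on. This is a sketch that assumes the core &lt;code&gt;Encode&lt;/code&gt; module; the variable names are mine:&lt;/p&gt;

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that a UTF-8 encoded source file contains for e-acute
my $as_read = "\xc3\xa9";

# Without "use utf8" perl treats each byte as one ISO-Latin-1 character
print length($as_read), " characters\n";    # prints 2 characters

# Decoding the same bytes as UTF-8 gives the single intended character
my $decoded = decode("UTF-8", $as_read);
print length($decoded), " character, code point ", sprintf("0x%x", ord($decoded)), "\n";
```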

&lt;p&gt;We can tell perl that the file contains UTF-8-encoded source code by adding a &lt;code&gt;use utf8&lt;/code&gt;. We also need to fix the output encoding - &lt;code&gt;use utf8&lt;/code&gt; doesn’t do that for you, it only asserts that the source file is UTF-8 encoded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;utf8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nb"&gt;binmode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;STDOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;:encoding(UTF-8)&lt;/span&gt;&lt;span class="p"&gt;");&lt;/span&gt;

&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;é&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$string&lt;/span&gt;&lt;span class="s2"&gt; contains &lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; character&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(For more on output encoding see the next section)&lt;/p&gt;

&lt;p&gt;And now we get this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;é contains 1 character
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hurrah!&lt;/p&gt;

&lt;p&gt;At this point a second problem arises. Some editors aren’t very clever about encodings and even if they correctly read a file that is encoded in UTF-8, they will save it in ISO-Latin-1. VSCode for example is known to do this at least some of the time. If that happens, you’re still asserting via &lt;code&gt;use utf8&lt;/code&gt; that the file is UTF-8, but the &lt;code&gt;"é"&lt;/code&gt; in the sample file will be encoded as byte 0xe9, and the following double-quote and semicolon as 0x22 0x3b. This results in a fatal error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Malformed UTF-8 character: \xe9\x22\x3b (unexpected non-continuation byte 0x22,
immediately after start byte 0xe9; need 3 bytes, got 1) at ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So given that you’re basically screwed if you have non-ASCII source code no matter whether you use utf8 or not, I recommend that you just don’t do it. If you need a non-ASCII character in your code, use any of the many other ways of specifying it, and if necessary put a comment nearby so that whoever next has to fiddle with the code knows what it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xe9&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;# e-acute&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Input and output
&lt;/h3&gt;

&lt;p&gt;Strings aren’t the only things that have encodings. File handles do too. Just like how perl defaults to assuming that your source code is encoded in ISO-Latin-1, it assumes unless told otherwise that file handles similarly are ISO-Latin-1, and so if you try to print "é" to a handle, what actually gets written is the byte 0xe9.&lt;/p&gt;

&lt;p&gt;Even if your source code has the &lt;code&gt;use utf8&lt;/code&gt; pragma, and your code contains the byte sequence 0xc3 0xa9, which will internally be decoded as the character "é", your handles are still ISO-Latin-1 and you'll get a single byte for that character. For how this happens see "PerlIO layers" below.&lt;/p&gt;

&lt;p&gt;Things get a bit more interesting if you try to send a non-ISO-Latin-1 character to an ISO-Latin-1 handle. Perl does the best it can and sends the internal representation - which is UTF-8, remember - to the handle and emits a warning "Wide character in print". Pay attention to the warnings!&lt;/p&gt;

&lt;p&gt;This behaviour is another common source of bugs. If you send the two strings "Libérez le raton laveur!" followed by "Rhyddhewch y racŵn!" to an ISO-Latin-1 handle, then the first one will sail through, correctly encoded as ISO-Latin-1, but the second will also go through, encoded as UTF-8 and with only a warning to show for it. You’ve now got two different character encodings in your output stream and no matter what encoding is expected at the other end you’ll get mojibake.&lt;/p&gt;
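
&lt;p&gt;Here is a minimal sketch of that failure. It assumes an in-memory file handle with no encoding layer, so that we can look at the bytes afterwards, and the names &lt;code&gt;$buffer&lt;/code&gt;, &lt;code&gt;$fh&lt;/code&gt; and &lt;code&gt;@warnings&lt;/code&gt; are invented for the illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;

# collect warnings instead of letting them go to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# no encoding layer, so this behaves like a default ISO-Latin-1 handle
open(my $fh, "&amp;gt;", \my $buffer) or die "Can't open in-memory handle: $!";
print $fh "Lib\x{e9}rez";    # e-acute fits in ISO-Latin-1: one byte, 0xe9
print $fh "rac\x{175}n";     # w-circumflex doesn't: two UTF-8 bytes and a warning
close($fh);

# show the bytes in hex: the e-acute is e9, the w-circumflex is c5 b5
print join(" ", map { sprintf("%02x", ord($_)) } split(//, $buffer)), "\n";
print "Warned: $warnings[0]" if @warnings;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One stream, two encodings, and nothing but a warning to tell you about it.&lt;/p&gt;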

&lt;h4&gt;
  
  
  &lt;a&gt;&lt;/a&gt;PerlIO layers
&lt;/h4&gt;

&lt;p&gt;We’ve seen how by default input and output is assumed to be in ISO-Latin-1. But that can be changed. Perl has supported different encodings for I/O since the dawn of time - since at least perl 3.016. That’s when it started to automatically convert "\n" into "\r\n" and vice versa on MSDOS, and the &lt;code&gt;binmode()&lt;/code&gt; function was introduced in case you wanted to open a file on DOS without any translation.&lt;/p&gt;

&lt;p&gt;These days this is implemented via PerlIO layers, which allow you to open a file with all kinds of translation layers, including those which you write yourself or grab from the CPAN (see for example &lt;a href="https://metacpan.org/pod/File::BOM" rel="noopener noreferrer"&gt;File::BOM&lt;/a&gt;). You can also add and remove layers from an already open handle.&lt;/p&gt;

&lt;p&gt;In general these days, you always want to read/write UTF-8 or raw binary, so will open files something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;:encoding(UTF-8)&lt;/span&gt;&lt;span class="p"&gt;",&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;some.log&lt;/span&gt;&lt;span class="p"&gt;")&lt;/span&gt;

&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;:raw&lt;/span&gt;&lt;span class="p"&gt;",&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image.jpg&lt;/span&gt;&lt;span class="p"&gt;")&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or to change the encoding of an already open handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;binmode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;STDOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;:encoding(UTF-8)&lt;/span&gt;&lt;span class="p"&gt;")&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;strong&gt;NB&lt;/strong&gt; that encodings applied to bare-word file handles such as STDOUT have global effect!)&lt;/p&gt;

&lt;p&gt;Provided that we don’t have to worry about Windows, we generally will only ever have one layer doing anything significant on a handle (on Windows the &lt;code&gt;:crlf&lt;/code&gt; layer is useful in addition to any others, to cope with Windows’s endearing backward-compatibility with &lt;a href="https://en.wikipedia.org/wiki/CP/M" rel="noopener noreferrer"&gt;CP/M&lt;/a&gt;), but it's possible to have more. In general, when a handle is opened for reading, encodings are applied to data in the order that they are specified in the &lt;code&gt;open()&lt;/code&gt; function call, from left to right. When writing, they are applied from right to left.&lt;/p&gt;

&lt;p&gt;If you ever think you need more than one layer, or want a layer other than those in the examples above, see &lt;a href="https://metacpan.org/pod/PerlIO" rel="noopener noreferrer"&gt;PerlIO&lt;/a&gt;.&lt;/p&gt;
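
&lt;p&gt;To see an encoding layer at work without touching the disk, here is a small sketch that assumes an in-memory file handle (the variable names are made up for the example). One character goes in, two bytes come out, and reading them back through the same layer gives one character again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;

# write one character through a UTF-8 encoding layer ...
open(my $out, "&amp;gt;:encoding(UTF-8)", \my $buffer) or die "Can't open for writing: $!";
print $out chr(0xe9);
close($out);

# ... and two bytes arrive at the other end
print length($buffer), " bytes written\n";

# read those bytes back through the same layer to get one character
open(my $in, "&amp;lt;:encoding(UTF-8)", \$buffer) or die "Can't open for reading: $!";
my $char = readline($in);
close($in);
print length($char), " character read\n";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;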

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;The Encode module
&lt;/h2&gt;

&lt;p&gt;The above explains the "royal road", where you are in complete control of how data gets into and out of your code. In that situation, you should never need to re-encode data, as it will always be Just A Bunch Of Characters whose underlying representation you don’t care about. That is, however, often not the case in the real world where we are beset by demons. We sometimes have to deal with libraries that do their own encoding/decoding and expect us to supply them with a byte stream (&lt;a href="https://metacpan.org/pod/XML::LibXML" rel="noopener noreferrer"&gt;XML::LibXML&lt;/a&gt;, for example), or which have had incorrect or partial bug fixes applied for any of the problems mentioned above and for which we can’t easily provide a proper fix because of other code now relying on the buggy behaviour (by for example having work-arounds to correct badly-encoded data).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Encode::encode
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Encode::encode()&lt;/code&gt; function takes a string of characters and returns a string of bytes that represent that string in your desired encoding. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Libérez le raton laveur!&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
&lt;span class="nv"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;("&lt;/span&gt;&lt;span class="s2"&gt;UTF-8&lt;/span&gt;&lt;span class="p"&gt;",&lt;/span&gt; &lt;span class="nv"&gt;$string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;FB_CROAK&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;LEAVE_SRC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will return a string where the character "é" has been replaced by the two bytes 0xc3 0xa9. If the original string was encoded in UTF-8 then the underlying representation of the input and output strings will be the same, but their encodings (as stored in the single bit flag we mentioned earlier) will be different, and the output will be reported as being one character longer by the &lt;code&gt;length()&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Encode::encode&lt;/code&gt; can sometimes for Complicated Internals Optimisation Reasons modify its input. To avoid this set the &lt;code&gt;Encode::LEAVE_SRC&lt;/code&gt; bit in its third argument.&lt;/p&gt;

&lt;p&gt;If you are encoding to anything other than UTF-8 or your string may contain characters outside of Unicode then you should consider telling &lt;code&gt;encode()&lt;/code&gt; to be strict about characters that it can't encode, such as if you try to encode "ŵ" into an ISO-Latin-1 byte sequence. That's what the &lt;code&gt;Encode::FB_CROAK&lt;/code&gt; bit is about in the example - in real code the encode should be in a &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt; block to deal with the exception that may arise. &lt;code&gt;Encode&lt;/code&gt;'s documentation has a whole section on &lt;a href="https://metacpan.org/pod/Encode#Handling-Malformed-Data" rel="noopener noreferrer"&gt;handling malformed data&lt;/a&gt;.&lt;/p&gt;
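
&lt;p&gt;A sketch of what that might look like, using a plain &lt;code&gt;eval&lt;/code&gt; block as the try/catch so that it runs on any perl. What you do after catching the error is up to your application - printing it, as here, is only for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;
use Encode qw(encode);

my $string = "rac\x{175}n";    # contains w-circumflex, which isn't in ISO-Latin-1

my $bytes = eval {
    encode("ISO-8859-1", $string, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
if(!defined($bytes)) {
    # $@ holds Encode's complaint about the character it couldn't map
    print "Couldn't encode: $@";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;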

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Encode::decode
&lt;/h3&gt;

&lt;p&gt;It is quite common for us to receive data, either from a network connection or from a library, which is a UTF-8-encoded byte stream. Naively treating this as &lt;em&gt;ISO-Latin-1 characters&lt;/em&gt; will lead to doom and disaster, as the byte sequence 0xc3 0xa9 will, as already explained, be interpreted as the characters "Ã" and "©". &lt;code&gt;Encode::decode()&lt;/code&gt; takes a bunch of bytes and turns them into characters assuming that they are in a specified encoding. For example, this will return a "é" character:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nv"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;("&lt;/span&gt;&lt;span class="s2"&gt;UTF-8&lt;/span&gt;&lt;span class="p"&gt;",&lt;/span&gt; &lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xc3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xa9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;FB_CROAK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should consider how to handle a byte stream that turns out to not be valid in your desired encoding and again I recommend use of &lt;code&gt;Encode::FB_CROAK&lt;/code&gt;.&lt;/p&gt;
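
&lt;p&gt;For example, this sketch feeds &lt;code&gt;decode()&lt;/code&gt; some bytes that are not valid UTF-8 and catches the resulting exception. The fallback to ISO-Latin-1 is purely hypothetical, one of several things you might reasonably do, and not something that &lt;code&gt;Encode&lt;/code&gt; does for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;
use Encode qw(decode);

# 0xe9 followed by a space is e-acute in ISO-Latin-1, but is not valid UTF-8
my $bytes = "caf\xe9 au lait";

my $chars = eval {
    decode("UTF-8", $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
if(!defined($chars)) {
    print "Not valid UTF-8: $@";
    # hypothetical fallback: assume that the sender meant ISO-Latin-1
    $chars = decode("ISO-8859-1", $bytes);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;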

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Encode:: everything else
&lt;/h3&gt;

&lt;p&gt;The "Encode" module provides some other functions that, on the surface, look useful. They are, mostly, not.&lt;/p&gt;

&lt;p&gt;Remember how waaaay back I briefly mentioned that perl’s internal representation for non-ISO-Latin-1 characters was UTF-8-ish and how they could contain invalid characters? That’s why you shouldn’t use &lt;code&gt;encode_utf8&lt;/code&gt; or &lt;code&gt;decode_utf8&lt;/code&gt;. You may be tempted to use &lt;code&gt;Encode::is_utf8()&lt;/code&gt; to check a string's encoding. Don't, for the same reason.&lt;/p&gt;

&lt;p&gt;You will generally be calling &lt;code&gt;encode()&lt;/code&gt; with a variable as its input, not with a string literal. However, any errors like "Modification of a read-only value attempted" are your fault: you should have told it to &lt;code&gt;Encode::LEAVE_SRC&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Don't even think about using the &lt;code&gt;_utf8_on&lt;/code&gt; and &lt;code&gt;_utf8_off&lt;/code&gt; functions. They are only useful for deliberately breaking things at a lower level than you should care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Debugging
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;the UTF8 flag
&lt;/h3&gt;

&lt;p&gt;The UTF8 flag &lt;em&gt;is&lt;/em&gt; a reliable indicator that the underlying representation uses multiple bytes per non-ASCII character, but that’s about it. It is &lt;em&gt;not&lt;/em&gt; a reliable indicator whether a string’s underlying representation is valid UTF-8 or that the string is valid Unicode.&lt;/p&gt;

&lt;p&gt;The result of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;("&lt;/span&gt;&lt;span class="s2"&gt;UTF-8&lt;/span&gt;&lt;span class="p"&gt;",&lt;/span&gt; &lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xe9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;LEAVE_SRC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is a string whose underlying representation is valid UTF-8 but the flag is off.&lt;/p&gt;

&lt;p&gt;This, on the other hand, has the flag on but the underlying representation is not valid UTF-8 because the character is out of range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2097153&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a valid character in Unicode (the code point is unassigned), but perl encodes it (it has to encode it so it can store it) and turns the UTF8 flag on (so that it knows how the underlying representation is encoded):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xfff8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally, this variable that someone else’s broken code might pass to you contains an invalid encoding of a valid character:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xf0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x82&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x82&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x1c&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nn"&gt;Encode::&lt;/span&gt;&lt;span class="nv"&gt;_utf8_on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$str&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Devel::Peek
&lt;/h3&gt;

&lt;p&gt;This is a very useful module for looking at the internals of perl variables, in particular for looking at what perl thinks the characters are and what their underlying representation is. It exports a &lt;code&gt;Dump()&lt;/code&gt; function, which prints details about its argument’s internal structure to STDERR. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ perl -MDevel::Peek -E 'Dump(chr(0xe9))'
SV = PV(0x7fa98980b690) at 0x7fa98a00bf90
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK)
  PV = 0x7fa989408170 "\351"\0
  CUR = 1
  LEN = 10

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the purposes of debugging character encoding issues, the two important things to look at are the lines beginning with &lt;code&gt;FLAGS =&lt;/code&gt; and &lt;code&gt;PV =&lt;/code&gt;. Note that there is no UTF8 flag set, indicating that the string uses the single-byte ISO-Latin-1 encoding. And the string’s underlying representation is shown (in octal, annoyingly), as &lt;code&gt;"\351"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And here’s what it looks like when the string contains code points outside ISO-Latin-1, or has been decoded from a byte stream into UTF-8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ perl -MDevel::Peek -E 'Dump(chr(0x3bb))'
SV = PV(0x7ff37e80b090) at 0x7ff388012390
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
  PV = 0x7ff37f907350 "\316\273"\0 [UTF8 "\x{3bb}"]
  CUR = 2
  LEN = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the UTF8 flag has appeared, and that we are shown both the underlying representation as two octal bytes &lt;code&gt;"\316\273"&lt;/code&gt; and the characters (in hexadecimal if necessary - mmm, consistency) that those bytes represent.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;hexdump
&lt;/h3&gt;

&lt;p&gt;For debugging input and output I recommend the external &lt;code&gt;hexdump&lt;/code&gt; utility. Feed it a file and it will show you the bytes therein, avoiding any clever UTF-8-decoding that your terminal might do if you were to simply &lt;code&gt;cat&lt;/code&gt; the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cat greek
αβγ
$ hexdump -C greek
00000000  ce b1 ce b2 ce b3 0a                              |.......|
00000007
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can of course also read from STDIN.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;PerlIO::get_layers
&lt;/h3&gt;

&lt;p&gt;Once you’re sure that your code isn’t doing anything perverse, but your data is still getting screwed up on input/output, you can see what encoding layers are in use on a handle with the &lt;code&gt;PerlIO::get_layers&lt;/code&gt; function. &lt;code&gt;PerlIO&lt;/code&gt; is a special built-in namespace, so you don’t need to &lt;code&gt;use&lt;/code&gt; it. Indeed, if you do try to &lt;code&gt;use&lt;/code&gt; it you will fail, as it doesn’t exist as a module. Layers are returned as a list, in the order that you would tell &lt;code&gt;open()&lt;/code&gt; about them.&lt;/p&gt;

&lt;p&gt;Layers can apply to any handle, not just file handles. If you’re dealing with a socket then remember that they have both an input side and an output side which may have different layers - see &lt;a href="https://metacpan.org/pod/PerlIO" rel="noopener noreferrer"&gt;the PerlIO manpage&lt;/a&gt; for details. And also see the doco if you care about the difference between &lt;code&gt;:utf8&lt;/code&gt; and &lt;code&gt;:encoding(UTF-8)&lt;/code&gt; - although if you diligently follow the sage advice in this document you won’t care, because you won’t use &lt;code&gt;:utf8&lt;/code&gt;.&lt;/p&gt;
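
&lt;p&gt;A quick sketch, again assuming an in-memory handle so that it is self-contained. The exact list of layers that you get back depends on your platform and on how the handle was opened, so treat whatever it prints as informational only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;

open(my $fh, "&amp;gt;:encoding(UTF-8)", \my $buffer) or die "Can't open in-memory handle: $!";

# no "use PerlIO" needed
my @layers = PerlIO::get_layers($fh);
print join(" ", @layers), "\n";

# for handles with separate input and output sides, ask about the output side
my @output_layers = PerlIO::get_layers(\*STDOUT, output =&amp;gt; 1);
print join(" ", @output_layers), "\n";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;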

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;The many ways of writing a character
&lt;/h2&gt;

&lt;p&gt;There are numerous different ways of representing a character in your code.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;String literals
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;m&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the reasons outlined above please only use this for ASCII characters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;The chr function
&lt;/h3&gt;

&lt;p&gt;This function takes a number as its argument and returns the character with the corresponding codepoint. For example, &lt;code&gt;chr(0x3bb)&lt;/code&gt; returns &lt;code&gt;λ&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Octal
&lt;/h3&gt;

&lt;p&gt;You can use up to three octal digits, as in &lt;code&gt;"\155"&lt;/code&gt;, for ISO-Latin-1 characters only, but please don’t. Octal is less familiar than hexadecimal, so hex is marginally easier to read, and octal also suffers from the “how long is this number” problem described below.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Hexadecimal
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x{e9}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can put any number of hexadecimal digits between the braces. There is also a version of this which doesn’t use braces: &lt;code&gt;"\xe9"&lt;/code&gt;. It can only take one or two hexadecimal digits and so is only valid for ISO-Latin-1 characters. The lack of delimiters can lead to confusion and error. Consider &lt;code&gt;"\xa9"&lt;/code&gt;. Brace-less &lt;code&gt;\x&lt;/code&gt; can take one or two hex digits, so is that &lt;code&gt;\xa&lt;/code&gt; (a line-feed character) followed by the digit &lt;code&gt;9&lt;/code&gt;, or is it &lt;code&gt;\xa9&lt;/code&gt;, the copyright symbol? Brace-less &lt;code&gt;\x&lt;/code&gt; is greedy, so if it looks like there are two hex digits it will assume that there are. Only if the first digit is followed by the end-of-string or by a non-hex-digit will it assume that you meant to use the single digit form. This means that &lt;code&gt;\xap&lt;/code&gt;, for example, is a single hex digit, so is equivalent to &lt;code&gt;\x{0a}p&lt;/code&gt;, a new line followed by the letter &lt;code&gt;p&lt;/code&gt;. I think you will agree that use of braces makes things much clearer, so the brace-less variant is deprecated.&lt;/p&gt;
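
&lt;p&gt;If you want to convince yourself of the greedy behaviour, this little sketch checks it. Note that with warnings enabled perl will also complain about the &lt;code&gt;p&lt;/code&gt; being an illegal hexadecimal digit, which is one more hint that the brace-less form is trouble:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;

# two hex digits are available, so brace-less \x takes both: one character
print length("\xa9"), "\n";

# only one hex digit before the "p", so this is a line-feed and then a "p"
print "\xap" eq "\x{0a}p" ? "same\n" : "different\n";

# with braces there is never any doubt about where the number ends:
# this is a line-feed followed by the digit 9, so two characters
print length("\x{a}9"), "\n";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;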

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;By codepoint name
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{GREEK SMALL LETTER LAMDA}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may sometimes be preferable to providing the (hexa)decimal codepoint with an associated comment, but it gets awful wordy awful fast. By default the name must correspond &lt;em&gt;exactly&lt;/em&gt; to that in the Unicode standard. Shorter aliases are available if you ask for them, via the &lt;code&gt;charnames&lt;/code&gt; pragma. The documentation only mentions this for the Greek and Cyrillic scripts, but they are available for all scripts which have letters. For example, these are equivalent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x{5d0}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;

&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{HEBREW LETTER ALEF}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;charnames&lt;/span&gt; &lt;span class="sx"&gt;qw(hebrew)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{ALEF}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;                  &lt;span class="c1"&gt;# א&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be careful if you ask for character-set-specific aliases as there may be name clashes. Both Arabic and Hebrew have a letter called "alef", for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;charnames&lt;/span&gt; &lt;span class="sx"&gt;qw(arabic)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{ALEF}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;                  &lt;span class="c1"&gt;# ا&lt;/span&gt;

&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;charnames&lt;/span&gt; &lt;span class="sx"&gt;qw(arabic hebrew)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{ALEF}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;                  &lt;span class="c1"&gt;# Always Hebrew, no matter the order of the imports!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A happy medium is to ask for &lt;code&gt;:short&lt;/code&gt; aliases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;charnames&lt;/span&gt; &lt;span class="sx"&gt;qw(:short)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{ALEF}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;                           &lt;span class="c1"&gt;# error&lt;/span&gt;
&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{hebrew:alef}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\N{arabic:alef}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;    &lt;span class="c1"&gt;# does what it says on the tin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Other hexadecimal
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\N{U+3bb}&lt;/span&gt;&lt;span class="p"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This notation looks a little bit more like the U-ish hexadecimal notations used in other languages while also being a bit like the &lt;code&gt;\N{...}&lt;/code&gt; notation for codepoint names. Unless you want to mix hexadecimal along with codepoint names you should probably not use this, and prefer &lt;code&gt;\x{...}&lt;/code&gt; which is more familiar to perl programmers.&lt;/p&gt;
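
&lt;p&gt;Whichever notation you pick, you end up with the same character. This sketch compares the ones described so far against &lt;code&gt;chr()&lt;/code&gt;. The explicit &lt;code&gt;use charnames&lt;/code&gt; is only there so that it also works on older perls which don't load it automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;
use charnames qw(:full);

my $lambda = chr(0x3bb);

# all of these spell the same single character
for my $other ("\x{3bb}", "\N{U+3bb}", "\N{GREEK SMALL LETTER LAMDA}") {
    print $other eq $lambda ? "same\n" : "different\n";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;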

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;In regular expressions
&lt;/h3&gt;

&lt;p&gt;You can use any of the &lt;code&gt;\x&lt;/code&gt; and &lt;code&gt;\N{...}&lt;/code&gt; variants in regular expressions. You may also see &lt;code&gt;\p&lt;/code&gt;, &lt;code&gt;\P&lt;/code&gt;, and &lt;code&gt;\X&lt;/code&gt;. See &lt;a href="https://metacpan.org/dist/perl/view/pod/perlunicode.pod" rel="noopener noreferrer"&gt;perlunicode&lt;/a&gt; and &lt;a href="https://metacpan.org/dist/perl/view/pod/perlrebackslash.pod" rel="noopener noreferrer"&gt;perlrebackslash&lt;/a&gt;. You should consider use of the &lt;code&gt;/a&lt;/code&gt; modifier as that does things like force &lt;code&gt;\d&lt;/code&gt; to only match ASCII and not, say, &lt;code&gt;৪&lt;/code&gt; which looks like &lt;code&gt;8&lt;/code&gt; but is actually &lt;code&gt;BENGALI DIGIT FOUR&lt;/code&gt;.&lt;/p&gt;
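
&lt;p&gt;A short sketch of the difference that &lt;code&gt;/a&lt;/code&gt; makes, assuming perl 5.14 or later (which is when that modifier appeared):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;

my $bengali_four = chr(0x9ea);    # BENGALI DIGIT FOUR

# Unicode rules: \d matches digits in any script, so this prints
print "Bengali four matches \\d\n" if $bengali_four =~ /\d/;

# ASCII rules: \d only matches 0 to 9, so this doesn't print ...
print "Bengali four matches \\d under /a\n" if $bengali_four =~ /\d/a;

# ... but this does
print "ASCII 4 matches \\d under /a\n" if "4" =~ /\d/a;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;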

&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;ASCII-encoded JSON strings in your code
&lt;/h3&gt;

&lt;p&gt;You may need to embed JSON strings in your code, especially in tests. I recommend that JSON should always be ASCII-encoded as this minimises the chances of it getting mangled anywhere. This introduces yet another annoying way of embedding a bunch of hex digits into text. This example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nv"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x3c0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;ascii&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will produce the string &lt;code&gt;"\u03c0"&lt;/code&gt;. That’s the sequence of eight characters &lt;code&gt;"&lt;/code&gt; &lt;code&gt;\&lt;/code&gt; &lt;code&gt;u&lt;/code&gt; &lt;code&gt;0&lt;/code&gt; &lt;code&gt;3&lt;/code&gt; &lt;code&gt;c&lt;/code&gt; &lt;code&gt;0&lt;/code&gt; &lt;code&gt;"&lt;/code&gt;. The double quotes are how JSON says “this is a string”, and the two characters &lt;code&gt;\&lt;/code&gt; and &lt;code&gt;u&lt;/code&gt; are how JSON says “here comes a hexadecimal code point”. If you want to put ASCII-encoded JSON in your code then you need to be careful about quoting and escaping.&lt;/p&gt;

&lt;p&gt;Perl will treat the character sequence &lt;code&gt;\&lt;/code&gt; &lt;code&gt;u&lt;/code&gt; as a real back-slash followed by the letter &lt;code&gt;u&lt;/code&gt; when it is single-quoted. In general, though, it is always good practice to escape a back-slash that you want to be a real back-slash. That avoids confusing the reader who may not have been paying attention to whether you’re single- or double-quoting, and it protects you in case you later change the code to use double-quotes and interpolate some variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;"I like &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;u03c0, especially Greek pie"&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;

&lt;span class="c1"&gt;# or double-quoted with interpolation&lt;/span&gt;
&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sx"&gt;qq{"I like \\u03c0, especially $nationality pie"}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
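
&lt;p&gt;To check that the escaping comes out right, you can round-trip the string. This sketch uses &lt;code&gt;JSON::PP&lt;/code&gt;, which ships with perl, in place of the &lt;code&gt;JSON&lt;/code&gt; module used above, and the variable names are made up for the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;
use JSON::PP;

# allow_nonref lets us encode and decode a bare string instead of an array or hash
my $json = JSON::PP-&amp;gt;new-&amp;gt;ascii-&amp;gt;allow_nonref;

# a single non-ASCII character becomes eight ASCII characters, quotes included
my $encoded = $json-&amp;gt;encode(chr(0x3c0));
print length($encoded), " characters: $encoded\n";

# the escaped back-slash in the single-quoted literal is one real back-slash,
# so the JSON decoder sees \u03c0 and gives us chr(0x3c0) back
my $decoded = $json-&amp;gt;decode('"I like \\u03c0"');
print length($decoded), " characters after decoding\n";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;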



&lt;h3&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Accented character vs character + combining accent
&lt;/h3&gt;

&lt;p&gt;For many characters there are two different valid ways of representing them. &lt;code&gt;chr(0xe9)&lt;/code&gt; is &lt;code&gt;LATIN SMALL LETTER E WITH ACUTE&lt;/code&gt;. The same character can be obtained with the two codepoints &lt;code&gt;"e".chr(0x301)&lt;/code&gt; - that is &lt;code&gt;LATIN SMALL LETTER E&lt;/code&gt; and &lt;code&gt;COMBINING ACUTE ACCENT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Whether those should sort the same, compare the same, or one should be converted to t’other will vary depending on your application, so the best I can do is point you at &lt;a href="https://metacpan.org/pod/Unicode::Normalize" rel="noopener noreferrer"&gt;Unicode::Normalize&lt;/a&gt;.&lt;/p&gt;
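
&lt;p&gt;As a starting point, this sketch shows that the two representations compare as different until you normalise them both to the same form. Whether you want NFC, NFD, or something else entirely is, as I said, up to you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = chr(0xe9);          # LATIN SMALL LETTER E WITH ACUTE
my $combining   = "e" . chr(0x301);   # LATIN SMALL LETTER E, COMBINING ACUTE ACCENT

print "different as-is\n" if $precomposed ne $combining;
print "same after NFC\n"  if NFC($precomposed) eq NFC($combining);
print "same after NFD\n"  if NFD($precomposed) eq NFD($combining);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;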

&lt;h2&gt;
  
  
  &lt;a&gt;&lt;/a&gt;Odds and Ends
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;This website is extremely useful for looking up characters' names, codepoints, and UTF-8 encoding:

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://xahlee.info/comp/unicode_index.html" rel="noopener noreferrer"&gt;Xah Lee’s Unicode Search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Configure your editor to highlight non-ASCII characters!

&lt;ul&gt;
&lt;li&gt;vim: add this to your ~/.vimrc file:
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight viml"&gt;&lt;code&gt;&lt;span class="c"&gt;" replace the * with *.pl,*.pm,*.js etc if you only want this for some file types&lt;/span&gt;
autocmd &lt;span class="nb"&gt;BufReadPost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;BufNewFile&lt;/span&gt; * &lt;span class="nb"&gt;syntax&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; nonascii &lt;span class="s2"&gt;"[^\u0000-\u007F]"&lt;/span&gt; containedin&lt;span class="p"&gt;=&lt;/span&gt;ALL
&lt;span class="nb"&gt;highlight&lt;/span&gt; nonascii guibg&lt;span class="p"&gt;=&lt;/span&gt;Red ctermbg&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="nb"&gt;term&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;standout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>perl</category>
      <category>unicode</category>
      <category>mojibake</category>
    </item>
  </channel>
</rss>
