<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Boyd Duffee</title>
    <description>The latest articles on DEV Community by Boyd Duffee (@duffee).</description>
    <link>https://dev.to/duffee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F650770%2F23cd7948-61b9-41c9-ad04-56cf99cfb801.jpg</url>
      <title>DEV Community: Boyd Duffee</title>
      <link>https://dev.to/duffee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/duffee"/>
    <language>en</language>
    <item>
      <title>Wait a minute, Mr POSTman</title>
      <dc:creator>Boyd Duffee</dc:creator>
      <pubDate>Thu, 16 Oct 2025 10:39:06 +0000</pubDate>
      <link>https://dev.to/duffee/wait-a-minute-mr-postman-3l3f</link>
      <guid>https://dev.to/duffee/wait-a-minute-mr-postman-3l3f</guid>
      <description>&lt;p&gt;&lt;em&gt;Debugging POST request headers in under 40 screen rows. Doesn't actually use Postman&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;While developing a &lt;a href="https://metacpan.org/pod/Astro::ADS" rel="noopener noreferrer"&gt;Perl client&lt;/a&gt; for the Harvard &lt;a href="https://ui.adsabs.harvard.edu/" rel="noopener noreferrer"&gt;Astrophysics Data System&lt;/a&gt; &lt;a href="https://ui.adsabs.harvard.edu/help/api/api-docs.html" rel="noopener noreferrer"&gt;API&lt;/a&gt;, I was getting errors from my &lt;strong&gt;POST&lt;/strong&gt; requests. Not usually a problem to fix, but I connect to the host machine via a dodgy terminal session that hangs up when the screensaver kicks in or when the wind blows from the East. To keep from having to restart the 6+ windows that I have open in my dev env every time I go for a walk, I start a &lt;a href="https://www.tomshardware.com/software/linux/heres-how-i-multi-task-in-the-linux-terminal-with-tmux" rel="noopener noreferrer"&gt;tmux&lt;/a&gt; session running on the host which ignores the SIGHUP and is just where I left it when I reconnect, no .swp files to clean up.&lt;/p&gt;

&lt;p&gt;The problem is that I can't scroll up in tmux like I do in a regular terminal&lt;sup id="fnref1"&gt;1&lt;/sup&gt; which leaves me with 40 rows to read the error returned from the POST request. This is nowhere near enough. Hmmm...&lt;/p&gt;

&lt;p&gt;💡 Remember that you're using &lt;a href="https://metacpan.org/pod/LWP::UserAgent::Mockable" rel="noopener noreferrer"&gt;LWP::UserAgent::Mockable&lt;/a&gt; (or its &lt;a href="https://metacpan.org/dist/Mojo-UserAgent-Mockable" rel="noopener noreferrer"&gt;Mojo&lt;/a&gt; cousin) to record the tests to &lt;a href="https://dev.to/duffee/keep-on-mocking-with-a-key-girrrrl-53oj"&gt;avoid using the network during the test pipeline&lt;/a&gt;. Realise that all the traffic from those network calls is stored as plain text files and you don't have to mess around with &lt;a href="https://www.tcpdump.org/" rel="noopener noreferrer"&gt;tcpdump&lt;/a&gt; just to inspect the HTTP headers anymore.&lt;/p&gt;

&lt;p&gt;The raw file itself is a bit messy to look at (maybe I should write a quick tool that deserializes it for STDOUT), but it shows me that the &lt;strong&gt;Authorization&lt;/strong&gt; header just isn't there. &lt;em&gt;But it's in my code, I made sure. See right after the call to &lt;a href="https://docs.mojolicious.org/Mojo/UserAgent#post" rel="noopener noreferrer"&gt;post&lt;/a&gt; ...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ahh, after some reflection I realize that the &lt;code&gt;post&lt;/code&gt; method makes the HTTP request as soon as it's invoked, which is why I was using &lt;a href="https://docs.mojolicious.org/Mojo/UserAgent#build_tx" rel="noopener noreferrer"&gt;build_tx&lt;/a&gt; in the GET request to add my Dev Key, like so&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$tx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$self&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;ua&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;build_tx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$url&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$tx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;req&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;authorization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer &lt;/span&gt;&lt;span class="p"&gt;'&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$self&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;token&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="nv"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$tx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$self&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;ua&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the response has changed from &lt;strong&gt;UNAUTHORIZED&lt;/strong&gt; to &lt;strong&gt;INTERNAL SERVER ERROR&lt;/strong&gt;. I try the &lt;code&gt;curl&lt;/code&gt; command suggested by the &lt;a href="https://ui.adsabs.harvard.edu/help/api/api-docs.html#post-/metrics" rel="noopener noreferrer"&gt;docs&lt;/a&gt; and that works fine. Look back in the mock file and see ... no payload. &lt;/p&gt;

&lt;p&gt;Quietly add the &lt;code&gt;json&lt;/code&gt; attribute to the transaction constructor&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$tx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$self&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;ua&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;build_tx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$hash&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;because sending a JSON payload is &lt;a href="https://leanpub.com/mojo_web_clients/" rel="noopener noreferrer"&gt;so damn easy in Mojo&lt;/a&gt;.&lt;/p&gt;
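
&lt;p&gt;Putting the two fixes together, here is a minimal sketch, assuming Mojolicious is installed. The URL, token and payload are placeholders made up for illustration, not real ADS values. Rendering the request with &lt;code&gt;to_string&lt;/code&gt; shows the headers and body without touching the network, which is another way to spot a missing header or payload. Note that &lt;code&gt;to_string&lt;/code&gt; finalizes the message, so set the headers first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;use v5.24;
use Mojo::UserAgent;

my $ua    = Mojo::UserAgent-&amp;gt;new;
my $url   = 'https://api.example.org/v1/metrics';    # placeholder endpoint
my $token = 'NOT_A_REAL_TOKEN';                       # placeholder token
my $hash  = { bibcodes =&amp;gt; ['PLACEHOLDER_BIBCODE'] };  # placeholder payload

# build the transaction without sending it
my $tx = $ua-&amp;gt;build_tx( POST =&amp;gt; $url, json =&amp;gt; $hash );
$tx-&amp;gt;req-&amp;gt;headers-&amp;gt;authorization( 'Bearer ' . $token );

# print the request line, headers and JSON body for inspection
say $tx-&amp;gt;req-&amp;gt;to_string;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the real client, drop the &lt;code&gt;say&lt;/code&gt; line and hand the transaction to &lt;code&gt;start&lt;/code&gt; as in the GET example.&lt;/p&gt;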

&lt;p&gt;All done - and &lt;a href="https://en.wikipedia.org/wiki/Bob%27s_your_uncle" rel="noopener noreferrer"&gt;Robert&lt;/a&gt; is your mother's brother.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;why yes, you &lt;em&gt;can&lt;/em&gt; scroll up and down in a tmux session when you remember to enter Copy mode with &lt;code&gt;Prefix [&lt;/code&gt; so you get the arrow keys and the Page Up/Down buttons to play with. &lt;code&gt;Enter&lt;/code&gt; to exit. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>perl</category>
      <category>restapi</category>
      <category>tmux</category>
      <category>tdd</category>
    </item>
    <item>
      <title>Neural Networks and Perl</title>
      <dc:creator>Boyd Duffee</dc:creator>
      <pubDate>Sat, 28 Jun 2025 21:37:02 +0000</pubDate>
      <link>https://dev.to/duffee/neural-networks-and-perl-4oek</link>
      <guid>https://dev.to/duffee/neural-networks-and-perl-4oek</guid>
      <description>&lt;p&gt;&lt;a href="https://www.flickr.com/photos/21649179@N00/3238536057" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32su3l9xmbkr5773fcjr.jpg" alt="Perceptron" width="640" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; &lt;em&gt;What is the State of the Art for creating Artificial Neural Networks with Perl?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Why would I want to use an ANN in the first place? Well, maybe I have some crime/unusual incident data that I want to correlate with the &lt;a href="https://metacpan.org/pod/Astro::MoonPhase::Simple" rel="noopener noreferrer"&gt;Phases of the Moon&lt;/a&gt; to test the &lt;a href="https://www.bbc.co.uk/reel/video/p0972jxw/can-a-full-moon-really-make-strange-things-happen-" rel="noopener noreferrer"&gt;Lunar Effect&lt;/a&gt;, but the data is noisy, the effect is non-linear or confounded by &lt;a href="https://metacpan.org/pod/Weather::Meteo" rel="noopener noreferrer"&gt;weather&lt;/a&gt;. For &lt;a href="https://en.wikipedia.org/wiki/Neural_network_(machine_learning)#Applications" rel="noopener noreferrer"&gt;whatever reason&lt;/a&gt; you want to “learn” a general pattern going from input to output, neural networks are one more method in your data science toolbox.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://metacpan.org/search?size=20&amp;amp;q=Neural+Network" rel="noopener noreferrer"&gt;search of CPAN&lt;/a&gt; for Neural Networks yields one page of results for you to sift through. The back propagation algorithm is a nice exercise in programming and it attracted a few attempts at the beginning of the century, starting with &lt;a href="https://metacpan.org/dist/Statistics-LTU/view/LTU.pod" rel="noopener noreferrer"&gt;Statistics::LTU&lt;/a&gt; in 1997 before there was an AI namespace in CPAN. Neural networks then got their own namespace, leading to &lt;a href="https://metacpan.org/pod/AI::NeuralNet::BackProp" rel="noopener noreferrer"&gt;AI::NeuralNet::BackProp&lt;/a&gt;, &lt;a href="https://metacpan.org/pod/AI::NeuralNet::Mesh" rel="noopener noreferrer"&gt;AI::NeuralNet::Mesh&lt;/a&gt; and &lt;a href="https://metacpan.org/pod/AI::NeuralNet::Simple" rel="noopener noreferrer"&gt;AI::NeuralNet::Simple&lt;/a&gt; (&lt;em&gt;for those wanting a gentle introduction to AI&lt;/em&gt;). Perl isn’t one for naming rigidity, so there’s also &lt;a href="https://metacpan.org/pod/AI::Perceptron" rel="noopener noreferrer"&gt;AI::Perceptron&lt;/a&gt;, &lt;a href="https://metacpan.org/pod/AI::NNFlex" rel="noopener noreferrer"&gt;AI::NNFlex&lt;/a&gt;, &lt;a href="https://metacpan.org/pod/AI::NNEasy" rel="noopener noreferrer"&gt;AI::NNEasy&lt;/a&gt; and &lt;a href="https://metacpan.org/pod/AI::Nerl::Network" rel="noopener noreferrer"&gt;AI::Nerl::Network&lt;/a&gt; (&lt;em&gt;love the speeling&lt;/em&gt;). &lt;a href="https://metacpan.org/pod/AI::LibNeural" rel="noopener noreferrer"&gt;AI::LibNeural&lt;/a&gt; is the first module in this list to wrap an external C++ library for use with Perl.&lt;/p&gt;

&lt;p&gt;Most of these have been given the thumbs up (look for &lt;strong&gt;++&lt;/strong&gt; icons near the name) by interested Perl users to indicate that it’s been of some use to them. It means the documentation is there, it installs and works for them. Is it right for you? NeilB puts a lot of work into &lt;a href="http://neilb.org/reviews/" rel="noopener noreferrer"&gt;his reviews&lt;/a&gt;, but hasn’t scratched the AI itch yet, so I’ll have to give one a try.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sometimes trawling the CPAN dredges up interesting results you weren’t thinking about. I had no idea we had &lt;a href="https://metacpan.org/pod/AI::PSO" rel="noopener noreferrer"&gt;AI::PSO&lt;/a&gt; for running Particle Swarm Optimizations, &lt;a href="https://metacpan.org/pod/AI::DecisionTree" rel="noopener noreferrer"&gt;AI::DecisionTree&lt;/a&gt; or &lt;a href="https://metacpan.org/pod/AI::Categorizer" rel="noopener noreferrer"&gt;AI::Categorizer&lt;/a&gt; to help with categorization tasks and &lt;a href="https://metacpan.org/pod/AI::PredictionClient" rel="noopener noreferrer"&gt;AI::PredictionClient&lt;/a&gt; for TensorFlow Serving. Maybe I’ll come back to these one day. Searching specifically for [Py]Torch gets you almost nothing, but I did find &lt;a href="https://metacpan.org/pod/AI::TensorFlow::Libtensorflow" rel="noopener noreferrer"&gt;AI::TensorFlow::Libtensorflow&lt;/a&gt; which provides bindings for the &lt;code&gt;libtensorflow&lt;/code&gt; deep learning library.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  MXNet
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A flexible and efficient library for Deep Learning&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://metacpan.org/pod/AI::MXNet" rel="noopener noreferrer"&gt;AI::MXNet&lt;/a&gt; gets lots of love from users (not surprising given the popularity of convolutional neural networks). With a recent update for recurrent neural networks (&lt;a href="https://metacpan.org/pod/AI::MXNet::RNN" rel="noopener noreferrer"&gt;RNN&lt;/a&gt;) in June 2023 and the weight of an Apache project behind the underlying library, it should be the obvious choice. But check out the project page and decision-making disaster strikes!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mxnet.apache.org/versions/1.9.1/" rel="noopener noreferrer"&gt;MXNet&lt;/a&gt; had a lot of work on it, but then was &lt;a href="https://attic.apache.org/projects/mxnet.html" rel="noopener noreferrer"&gt;retired in Sep 2023&lt;/a&gt; because the Project Management Committee were unresponsive over several months, having uploaded their consciousnesses to a datacube in Iceland or maybe they just went on to other things because of … &lt;a href="https://whimsy.apache.org/board/minutes/MXNet.html" rel="noopener noreferrer"&gt;reasons&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It should still be perfectly fine to use. That Apache project had 87 contributors, so I expect it to be feature-rich and generally bug-free. Any bugs in the Perl module could be reported/fixed and you always have the source code for the library to hack on to suit your needs. I’ll skip it this time because I’m really only after a simple ANN, not the whole Deep Learning ecosystem, and I couldn’t find the package in the Fedora repository (adding the extra friction of building it myself).&lt;/p&gt;

&lt;h2&gt;
  
  
  FANN
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A Fast Artificial Neural Network&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://leenissen.dk/" rel="noopener noreferrer"&gt;FANN&lt;/a&gt; has been around for over 15 years and is generally faster to train and run than either &lt;a href="https://leenissen.dk/fann/wp/comparing-fann-with-tensorflow-and-pytorch-which-is-right-for-you/" rel="noopener noreferrer"&gt;TensorFlow or PyTorch&lt;/a&gt;. The speed and lightweight nature make it ideal for embedded systems. Its &lt;a href="https://github.com/libfann/fann/graphs/contributors" rel="noopener noreferrer"&gt;smaller community&lt;/a&gt; may have an impact on your choice. From my 10 minute inspection, AI::FANN seemed to be the easier of the two to get up to speed with. It had a short, simple example at the top of the docs that I could understand and run without much fuss.&lt;/p&gt;

&lt;p&gt;In contrast, AI::MXNet leads with a Convolutional Neural Net (CNN) for recognizing hand-written digits in the MNIST dataset. It gives you a feel for the depth of the feature set, at the risk of intimidating the casual reader. Mind you, if I was looking for image classification (where CNNs shine) or treating history as an input (using RNNs as mentioned above), I’d put the time in going through AI::MXNet.&lt;/p&gt;

&lt;p&gt;The downside to the original FANN site is that the documentation consists of a series of blog posts that tell you all the things you &lt;em&gt;can&lt;/em&gt; do, but not &lt;em&gt;how&lt;/em&gt; to do them. Your best bet is to read the &lt;a href="https://github.com/libfann/fann/tree/master/examples" rel="noopener noreferrer"&gt;examples source code&lt;/a&gt; like all the other C programmers out there.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Installation was easy. You just need the FANN build libraries (header files, etc) and the Perl module that interfaces to them. You could build from &lt;a href="https://github.com/libfann/fann" rel="noopener noreferrer"&gt;source&lt;/a&gt; or get &lt;a href="https://packages.ubuntu.com/search?keywords=libfann-dev&amp;amp;searchon=names&amp;amp;suite=all&amp;amp;section=all" rel="noopener noreferrer"&gt;libfann-dev&lt;/a&gt; on Ubuntu. For me on Fedora, it was just a matter of&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dnf install fann-devel
cpanm AI::FANN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;em&gt;See &lt;a href="https://perldatascience.wordpress.com/tools/" rel="noopener noreferrer"&gt;Tools&lt;/a&gt; for using &lt;strong&gt;cpanm&lt;/strong&gt;&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;To get started, I tried out the &lt;a href="https://metacpan.org/pod/AI::FANN#SYNOPSIS" rel="noopener noreferrer"&gt;XOR example&lt;/a&gt; in the docs. XOR is a classic example of how a multi-layered perceptron (MLP) can tackle problems that are not linearly separable. The hidden layers of the MLP can solve problems inaccessible to single layer perceptrons. It gave me confidence in using a data structure to initialize the network and importing data from a file. An hour later, I was already scratching the itch that drew me to neural networks in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network design and evaluation
&lt;/h3&gt;

&lt;p&gt;A nice introduction is FANN’s &lt;a href="https://leenissen.dk/fann/wp/building-your-first-neural-network-with-fann-a-step-by-step-guide/" rel="noopener noreferrer"&gt;step-by-step guide&lt;/a&gt; which will take you through a bit about learning rates and activation functions as you consider how to build and tweak your first neural network. There are few heuristics to go by, so just start playing around until you get a result.&lt;/p&gt;

&lt;p&gt;Be careful that too many neurons in the hidden layers will lead to overfitting of your data. You’ll end up with a network that can reproduce the training data perfectly, but fail to learn the underlying signal you wanted to discover. You might start with something between the number of input and output neurons. And be aware that machine learning algorithms are data-hungry.&lt;/p&gt;

&lt;p&gt;Activation functions can affect how long it takes to train your network. Previous experience with other neural network tools way back in 2005 taught us the importance of normalizing the input, ideally to a range of &lt;strong&gt;[-1, 1]&lt;/strong&gt;, because most of the training time was spent just adjusting the weights to the point where the real learning could begin. Use your own judgement.&lt;/p&gt;
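
&lt;p&gt;Scaling a column of inputs to &lt;strong&gt;[-1, 1]&lt;/strong&gt; takes only a few lines. This is a minimal sketch using plain min-max scaling, which is just one of several ways to normalize. The &lt;code&gt;normalize&lt;/code&gt; helper is a name made up for this post, not part of AI::FANN.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use v5.24;
use List::Util qw(min max);

# scale a list of numbers linearly so the smallest maps to -1 and the largest to 1
sub normalize {
    my @values = @_;
    my ($lo, $hi) = ( min(@values), max(@values) );
    return map { 0 } @values if $hi == $lo;    # a constant column has no range to scale
    return map { 2 * ($_ - $lo) / ($hi - $lo) - 1 } @values;
}

say join q{ }, normalize(0, 5, 10);    # prints: -1 0 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Whatever minimum and maximum you use for the training data, keep them and apply the same scaling to the testing and live data.&lt;/p&gt;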

&lt;p&gt;While we see the &lt;code&gt;train_on_data&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; methods in the example, you have to look down in the docs for the &lt;a href="https://metacpan.org/pod/AI::FANN#$ann-%3Etest($input,-$desired_output)" rel="noopener noreferrer"&gt;&lt;code&gt;test&lt;/code&gt;&lt;/a&gt; method which you’ll need to evaluate the trained network. The &lt;code&gt;MSE&lt;/code&gt; method will tell you the Mean Squared Error for your model and lower values are better. There’s &lt;a href="https://github.com/search?q=repo%3Alibfann%2Ffann%20MSE&amp;amp;type=code" rel="noopener noreferrer"&gt;no documentation for it yet&lt;/a&gt;, but it should do what it says on the tin.&lt;/p&gt;

&lt;p&gt;A network that gives you rubbish is no good, so we need to &lt;a href="https://stackoverflow.com/questions/44832369/how-to-correctly-evaluate-a-neural-network-model" rel="noopener noreferrer"&gt;evaluate&lt;/a&gt; how well it has learned from the training data. The &lt;a href="https://stats.stackexchange.com/search?q=%5Bneural-networks%5D+evaluation" rel="noopener noreferrer"&gt;usual process&lt;/a&gt; is to split the dataset into training and testing sets, reserving 20-30% of the data for testing. Once the network has finished training, its weights are fixed and it is run on the testing data, with the network’s output compared against the expected output given in the dataset.&lt;/p&gt;
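
&lt;p&gt;The split itself needs nothing beyond core Perl. Here is a minimal sketch, assuming the data is already in the array-of-arrayrefs shape used by the XOR example. The &lt;code&gt;split_dataset&lt;/code&gt; name and the made-up rows are only for illustration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use v5.24;
use List::Util qw(shuffle);

# shuffle the rows, then hold back a fraction of them for testing
sub split_dataset {
    my ($test_fraction, @rows) = @_;
    my @shuffled = shuffle @rows;
    my $n_test   = int( $test_fraction * @shuffled );
    my @test     = splice @shuffled, 0, $n_test;
    return ( \@shuffled, \@test );    # (training set, testing set)
}

my @rows = map { [ [$_], [ $_ % 2 ] ] } 1 .. 10;    # made-up rows
my ($train, $test) = split_dataset( 0.2, @rows );
say scalar(@$train), ' training rows, ', scalar(@$test), ' testing rows';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Shuffling first matters when the file is sorted by date or by outcome, otherwise the testing set ends up unrepresentative.&lt;/p&gt;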

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)" rel="noopener noreferrer"&gt;Cross-validation&lt;/a&gt; is another popular method of evaluation, splitting the dataset into 10 subsets where you train on 9 sets and test on the 10th, rotating the sets to &lt;a href="https://stats.stackexchange.com/questions/79490/neural-network-with-and-without-cross-validation" rel="noopener noreferrer"&gt;improve&lt;/a&gt; the network’s response. Once you are satisfied with the performance of your network, you are ready to run it on live data. Just remember to sanity check the results while you build trust in the responses.&lt;/p&gt;
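
&lt;p&gt;The rotation can be sketched in a few lines of core Perl. This is only an illustration of the bookkeeping, with a made-up &lt;code&gt;k_folds&lt;/code&gt; helper. Training and scoring the network on each pair is left out, and you would normally shuffle the rows before dealing them into folds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use v5.24;

# deal the rows into $k folds, then return one [train, test] pair per fold
sub k_folds {
    my ($k, @rows) = @_;
    my @folds = map { [] } 1 .. $k;
    push $folds[ $_ % $k ]-&amp;gt;@*, $rows[$_] for 0 .. $#rows;

    return map {
        my $i = $_;
        # the training set is every fold except number $i
        my @train = map { $folds[$_]-&amp;gt;@* } grep { $_ != $i } 0 .. $k - 1;
        [ \@train, $folds[$i] ];
    } 0 .. $k - 1;
}

for my $pair ( k_folds( 10, 1 .. 20 ) ) {
    my ($train, $test) = $pair-&amp;gt;@*;
    say scalar(@$train), ' train / ', scalar(@$test), ' test';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Averaging the evaluation score over all the pairs gives a steadier estimate than a single split.&lt;/p&gt;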

&lt;p&gt;Going back every time and manually creating networks with different sizes of layers sounds tedious. Ideally, I’d have a script that takes the network layers and sizes as arguments and returns the evaluation score. Couple this with the &lt;a href="https://docs.mojolicious.org/Minion" rel="noopener noreferrer"&gt;Minion&lt;/a&gt; job queue from Mojolicious (it’s nice!) and you’d have a great tool for finding the best available neural network for the given data while you’re doing other things.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Datafile Format
&lt;/h2&gt;

&lt;p&gt;The one thing not easy to find on the website is the file format specification for the datafiles, so this is what I worked out. They are space separated files of integers or floats like this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Number_of_runs Number_of_inputs Number_of_outputs
Input row 1
Output row 1
Input row 2
Output row 2
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a script that will turn an array of arrayrefs from the XOR example into the &lt;a href="https://github.com/libfann/fann/blob/master/examples/xor.data" rel="noopener noreferrer"&gt;file format&lt;/a&gt; used by &lt;strong&gt;libfann&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
use v5.24; # postfix dereferencing is cool

my @xor_data = ( [[-1, -1], [-1] ],
                 [[-1, 1], [1] ],
                 [[1, -1], [1] ],
                 [[1, 1], [-1] ] ); 
write_datafile('xor.data', @xor_data);

sub write_datafile {
    my ($filename, @data) = @_;

    open my $fh, '&amp;gt;', $filename or die "Cannot open $filename: $!";
    my ($in, $out) = $data[0]-&amp;gt;@*;
    say $fh join q{ }, scalar @data, scalar @$in, scalar @$out; 

    for my $test (@data) {
        say $fh join q{ }, $test-&amp;gt;[0]-&amp;gt;@*;
        say $fh join q{ }, $test-&amp;gt;[1]-&amp;gt;@*;
    }
    close $fh;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
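
&lt;p&gt;Going the other way is a handy check that the file matches the layout described above. This is a minimal sketch of a reader, assuming exactly that layout (a header line, then alternating input and output rows). &lt;code&gt;read_datafile&lt;/code&gt; is a made-up helper, not something AI::FANN provides.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use v5.24;

# read a libfann-style datafile back into an array of [ \@inputs, \@outputs ] pairs
sub read_datafile {
    my ($filename) = @_;

    open my $fh, '&amp;lt;', $filename or die "Cannot open $filename: $!";
    my ($rows, $n_in, $n_out) = split q{ }, scalar &amp;lt;$fh&amp;gt;;

    my @data;
    for ( 1 .. $rows ) {
        my @in  = split q{ }, scalar &amp;lt;$fh&amp;gt;;
        my @out = split q{ }, scalar &amp;lt;$fh&amp;gt;;
        # every row must have the sizes promised by the header line
        die "Bad row in $filename" unless @in == $n_in and @out == $n_out;
        push @data, [ \@in, \@out ];
    }
    close $fh;
    return @data;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reading &lt;strong&gt;xor.data&lt;/strong&gt; back with this should return the same four input/output pairs that &lt;code&gt;write_datafile&lt;/code&gt; was given.&lt;/p&gt;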



&lt;h2&gt;
  
  
  Your turn ...
&lt;/h2&gt;

&lt;p&gt;Have you used any of these modules? Share your experience to help the next person choose. Have I missed anything or got something wrong? Let us know in the comments below.&lt;/p&gt;

&lt;p&gt;Thank you for your time!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7p11x05wu7l7v4h0kevd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7p11x05wu7l7v4h0kevd.png" alt="new Perl logo" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image credit: “&lt;a href="https://www.flickr.com/photos/21649179@N00/3238536057" rel="noopener noreferrer"&gt;Perceptron&lt;/a&gt;” by &lt;a href="https://www.flickr.com/photos/21649179@N00" rel="noopener noreferrer"&gt;fdecomite&lt;/a&gt; is licensed under &lt;a href="https://creativecommons.org/licenses/by/2.0/?ref=openverse" rel="noopener noreferrer"&gt;CC BY 2.0&lt;/a&gt;&lt;/p&gt;

</description>
      <category>perl</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Keep on Mocking with a Key, Girrrrl</title>
      <dc:creator>Boyd Duffee</dc:creator>
      <pubDate>Wed, 07 May 2025 08:53:35 +0000</pubDate>
      <link>https://dev.to/duffee/keep-on-mocking-with-a-key-girrrrl-53oj</link>
      <guid>https://dev.to/duffee/keep-on-mocking-with-a-key-girrrrl-53oj</guid>
      <description>&lt;p&gt;(with apologies to &lt;a href="https://en.wikipedia.org/wiki/Rockin%27_in_the_Free_World" rel="noopener noreferrer"&gt;Neil Young&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;tl;dr - a story is told about how the author tests a module against a third-party web API when that service is not always available and without leaking sensitive authentication tokens&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You find yourself to be an aspiring &lt;a href="https://blogs.perl.org/users/neilb/2016/04/the-perl-toolchain-pause-and-cpan.html" rel="noopener noreferrer"&gt;CPAN author&lt;/a&gt; of a web API and as a righteous follower of &lt;a href="https://en.wikipedia.org/wiki/Test-driven_development" rel="noopener noreferrer"&gt;Test Driven Development&lt;/a&gt; you want to write tests to verify that your API works as advertised. Testing is &lt;a href="https://perldoc.perl.org/Test::Tutorial" rel="noopener noreferrer"&gt;a big part&lt;/a&gt; of Perl culture, so a &lt;a href="https://stackoverflow.com/a/1731289/13056452" rel="noopener noreferrer"&gt;skeleton module&lt;/a&gt; usually comes with a &lt;strong&gt;t/&lt;/strong&gt; directory to hold your tests.&lt;/p&gt;

&lt;p&gt;Once uploaded to &lt;a href="https://metacpan.org/about/faq" rel="noopener noreferrer"&gt;CPAN&lt;/a&gt;, the &lt;a href="https://www.cpantesters.org/" rel="noopener noreferrer"&gt;CPAN Testers&lt;/a&gt; will run your tests on every OS and version of Perl possible, but you shouldn't require an active network connection on either end. After all, has your API failed because the end service is down for annual maintenance or the local internet company van has turned up on the CPAN Testers volunteer's street?&lt;/p&gt;

&lt;p&gt;No, of course not! So you &lt;strong&gt;&lt;em&gt;mock&lt;/em&gt;&lt;/strong&gt; the service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to Mock and Roll
&lt;/h2&gt;

&lt;p&gt;When things get &lt;a href="https://www.oreilly.com/library/view/perl-testing-a/0596100922/" rel="noopener noreferrer"&gt;difficult to test&lt;/a&gt;, you could run up a &lt;a href="https://metacpan.org/pod/Test::DB::Mysql" rel="noopener noreferrer"&gt;tiny working version&lt;/a&gt; of the object in question or &lt;a href="https://metacpan.org/dist/DBI-Test/view/lib/DBI/Mock.pm" rel="noopener noreferrer"&gt;intercept the calls&lt;/a&gt; your module makes to the object &lt;a href="https://metacpan.org/pod/Test2::Mock" rel="noopener noreferrer"&gt;or module&lt;/a&gt; and return simulated responses. A mock is a bit like a cardboard cutout of what you want to test. It looks just like the real thing ... from the right angle. Here's &lt;a href="https://stackoverflow.com/questions/2665812/what-is-mocking#2666006" rel="noopener noreferrer"&gt;a longer explanation&lt;/a&gt; for the curious.&lt;/p&gt;

&lt;p&gt;There are a few different ways of mocking in Perl. I really like  &lt;a href="https://metacpan.org/pod/LWP::UserAgent::Mockable" rel="noopener noreferrer"&gt;LWP::UserAgent::Mockable&lt;/a&gt; for testing web services. It lets you record a live version of the network conversation and "playback" the response afterwards, so you don't need that connection anymore. This module runs on environment variables so you set up your defaults in a &lt;a href="https://perldoc.perl.org/perlmod#BEGIN%2C-UNITCHECK%2C-CHECK%2C-INIT-and-END" rel="noopener noreferrer"&gt;BEGIN&lt;/a&gt; block.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEGIN {
    $ENV{ LWP_UA_MOCK } ||= 'playback';
    $ENV{ LWP_UA_MOCK_FILE } ||= __FILE__ . '-mock.out';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My recorded filename default is &lt;strong&gt;-mock.out&lt;/strong&gt; tacked on to the end of the test file name in the same directory. There's an option of skipping the mock with the &lt;code&gt;LWP_UA_MOCK=passthrough&lt;/code&gt; option. You'll need that when you add a new network query that you haven't recorded yet.&lt;/p&gt;

&lt;p&gt;Having been inspired by this post, you'll now run off and add &lt;strong&gt;L::U::Mockable&lt;/strong&gt; to all your tests and record them. Go look at the &lt;strong&gt;-mock.out&lt;/strong&gt; files. They're all the plain text-ish traffic to and from the service. Here are some of &lt;a href="https://github.com/duffee/astro-ads/tree/master/t" rel="noopener noreferrer"&gt;mine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what if all you see in the file is only this?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pt0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go and check the UserAgent. If instead of the standard Perl web module, &lt;a href="https://metacpan.org/dist/libwww-perl" rel="noopener noreferrer"&gt;LWP&lt;/a&gt;, you're using &lt;a href="https://metacpan.org/pod/Mojo::UserAgent" rel="noopener noreferrer"&gt;Mojolicious&lt;/a&gt;, the mock has just been sitting there twiddling its thumbs. You'll need to use &lt;a href="https://metacpan.org/pod/Mojo::UserAgent::Mockable" rel="noopener noreferrer"&gt;Mojo::UserAgent::Mockable&lt;/a&gt; instead, but don't despair! You haven't lost all that effort getting &lt;strong&gt;L::U::Mockable&lt;/strong&gt; to work. &lt;strong&gt;Mojo::UserAgent::Mockable&lt;/strong&gt; has a &lt;code&gt;mode=lwp-ua-mockable&lt;/code&gt; option to make it behave the same way the LWP module does. You can even remove the &lt;code&gt;END&lt;/code&gt; block which you don't need now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait, what key are we in?
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Some APIs are for services that restrict access to registered users to prevent resource abuse. To access them requires an authorisation token or developer key.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ooops! When you looked at the &lt;strong&gt;-mock.out&lt;/strong&gt; files, did you see your &lt;strong&gt;SECRET_DEV_KEY&lt;/strong&gt; in the Authorization header? Well you certainly don't want to upload &lt;em&gt;that&lt;/em&gt; to a public repository!&lt;/p&gt;

&lt;p&gt;Scrub the &lt;strong&gt;-mock.out&lt;/strong&gt; files with &lt;a href="https://github.com/duffee/astro-ads/blob/master/tools/scrub_mock_headers.pl" rel="noopener noreferrer"&gt;something like this&lt;/a&gt; substitution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s/Bearer \w{10,}/Bearer TOKEN_REMOVED/g
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and set the Mockable option &lt;a href="https://metacpan.org/pod/Mojo::UserAgent::Mockable#ignore_headers" rel="noopener noreferrer"&gt;ignore_headers&lt;/a&gt; because our recorded test doesn't care about actually authenticating.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use Mojo::UserAgent::Mockable;
my $ua = Mojo::UserAgent::Mockable-&amp;gt;new(
            mode           =&amp;gt; 'lwp-ua-mockable',
            ignore_headers =&amp;gt; 'all'
         );
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can still test how your API handles an authorisation failure by recording this subtest with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;subtest 'Bad Key - Authorisation failure' =&amp;gt; sub {
    # local restores the real key when the subtest's scope ends
    local $ENV{SECRET_DEV_KEY} = 'BAD';
    ...
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget to run your &lt;strong&gt;scrub_mock_headers.pl&lt;/strong&gt; script &lt;em&gt;every single time&lt;/em&gt; before committing the recorded mocks. You don't want that Key getting out "in the wild" for naughty children to misuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;a href="https://en.wikipedia.org/wiki/Hook_(song)" rel="noopener noreferrer"&gt;Hook&lt;/a&gt; brings you back
&lt;/h2&gt;

&lt;p&gt;Or should I say &lt;em&gt;DO&lt;/em&gt; forget about running the script, because you're going to save it as &lt;strong&gt;.git/hooks/pre-commit&lt;/strong&gt; (check first that there isn't already a pre-commit hook there). A &lt;a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks" rel="noopener noreferrer"&gt;git commit hook&lt;/a&gt; runs every time you commit, so you can get on with coding that API. Just make sure the pre-commit file is executable. Try it with a minor commit before you commit the mocks, and look for any error messages during the commit process.&lt;/p&gt;
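&lt;p&gt;Here's a minimal sketch of what such a hook could look like in Perl. It is not necessarily what the linked script does: the sub names are made up for illustration, it assumes the mocks all end in &lt;strong&gt;-mock.out&lt;/strong&gt; and that the token is a plain run of word characters, and it re-stages each scrubbed file with &lt;code&gt;git add&lt;/code&gt; so the commit picks up the clean copy.&lt;/p&gt;

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::File;

# Replace anything that looks like a bearer token with a placeholder.
sub scrub_text {
    my ($text) = @_;
    $text =~ s/Bearer \w{10,}/Bearer TOKEN_REMOVED/g;
    return $text;
}

# Keep only the recorded mock files from a list of paths.
# Assumption: every mock file name ends in -mock.out.
sub mock_files {
    return grep { /-mock\.out\z/ } @_;
}

# Scrub one file on disk; returns true if the file was changed.
sub scrub_file {
    my ($path) = @_;
    my $in = IO::File->new($path, 'r') or die "Cannot read $path: $!";
    my $before = join '', $in->getlines;
    $in->close;

    my $after = scrub_text($before);
    return 0 if $after eq $before;

    my $out = IO::File->new($path, 'w') or die "Cannot write $path: $!";
    $out->print($after);
    $out->close;
    return 1;
}

# Scrub every staged mock file and stage the cleaned copy again.
# Returns the exit status for the hook: 0 lets the commit proceed.
sub run_hook {
    chomp(my @staged = `git diff --cached --name-only`);
    for my $path (mock_files(@staged)) {
        next unless -f $path;            # skip files staged for deletion
        next unless scrub_file($path);   # nothing to re-stage if unchanged
        system('git', 'add', '--', $path) == 0 or return 1;
    }
    return 0;
}

# Only act when this file is installed (and invoked) as the hook itself.
exit run_hook() if $0 =~ /pre-commit\z/;
```

&lt;p&gt;Because &lt;code&gt;git add&lt;/code&gt; stages the whole file, any other unstaged edits to that mock go into the commit too. Tokens containing characters outside &lt;code&gt;\w&lt;/code&gt;, such as dots or dashes, would need a wider pattern.&lt;/p&gt;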

&lt;p&gt;Happy Mocking!&lt;/p&gt;




&lt;p&gt;Image remixed from
"&lt;a rel="noopener noreferrer" href="https://www.flickr.com/photos/8744852@N03/1408685465"&gt;Neil Young, Heart of Gold&lt;/a&gt;" by &lt;a rel="noopener noreferrer" href="https://www.flickr.com/photos/8744852@N03"&gt;Stoned59&lt;/a&gt;
and
"&lt;a rel="noopener noreferrer" href="https://www.flickr.com/photos/48889115061@N01/2121280234"&gt;Neil Young (Crazy Horse) + Sonic Youth + Social Distorion May 15, 1991&lt;/a&gt;" by &lt;a rel="noopener noreferrer" href="https://www.flickr.com/photos/48889115061@N01"&gt;Howdy, I'm H. Michael Karshis&lt;/a&gt;
, licensed under &lt;a rel="noopener noreferrer" href="https://creativecommons.org/licenses/by/2.0/?ref=openverse"&gt;CC BY 2.0&lt;/a&gt;.
The &lt;a href="https://github.com/metacpan/perl-assets/blob/main/blessed/exports/perl-080-300.png" rel="noopener noreferrer"&gt;Perl logo&lt;/a&gt; is Copyright (c) 2024 Olaf Alders, licensed under the &lt;a href="https://creativecommons.org/licenses/by/4.0/" rel="noopener noreferrer"&gt;CC-BY License, Version 4.0&lt;/a&gt;.
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmirrors.creativecommons.org%2Fpresskit%2Ficons%2Fcc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmirrors.creativecommons.org%2Fpresskit%2Ficons%2Fcc.png" width="64" height="64"&gt;&lt;/a&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmirrors.creativecommons.org%2Fpresskit%2Ficons%2Fby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmirrors.creativecommons.org%2Fpresskit%2Ficons%2Fby.png" width="64" height="64"&gt;&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>perl</category>
      <category>testing</category>
      <category>git</category>
    </item>
    <item>
      <title>Faster tetranucleotide (k-mer) frequencies!</title>
      <dc:creator>Boyd Duffee</dc:creator>
      <pubDate>Fri, 15 Mar 2024 08:31:22 +0000</pubDate>
      <link>https://dev.to/duffee/faster-tetranucleotide-k-mer-frequencies-4pnf</link>
      <guid>https://dev.to/duffee/faster-tetranucleotide-k-mer-frequencies-4pnf</guid>
      <description>&lt;p&gt;I saw &lt;a href="https://dev.to/jmeneghin/calculating-tetranucleotide-k-mer-frequencies-969"&gt;Jennifer's post&lt;/a&gt; about re-writing her &lt;a href="https://github.com/jmeneghin/perl-for-reysenbach-lab/blob/master/get_kmer_frequencies.pl" rel="noopener noreferrer"&gt;perl scripts&lt;/a&gt; in python and how she saw a 2.5 times improvement.&lt;/p&gt;

&lt;p&gt;How could this be?  My favourite language can't be that slow.&lt;br&gt;
It must be programmer error.&lt;/p&gt;

&lt;p&gt;I have an interest in Perl and Science, so time to roll up sleeves and learn me some profiling/benchmarking. What follows is my internal monologue and the notes I scribbled down during the learning process. For those that want to follow along, I've &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/tree/main/get_kmer_frequencies" rel="noopener noreferrer"&gt;created a small repo&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;Voltaire said that Hell is other people's code. My first step was to &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies1.pl" rel="noopener noreferrer"&gt;re-write it&lt;/a&gt; into &lt;a href="http://modernperlbooks.com/" rel="noopener noreferrer"&gt;Modern Perl&lt;/a&gt; and in the process, understand what each line does. When it's written idiomatically, it's easier to refactor and I should be able to make some minor performance improvements along the way.&lt;/p&gt;

&lt;p&gt;Assume that the original script has been tested enough.  For me to be correct, I've got to produce the exact same output.  I got close, except for the header line.&lt;br&gt;
&lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies_original.pl#L99" rel="noopener noreferrer"&gt;line 99&lt;/a&gt; &lt;code&gt;print OUT "\t$prefix_$j";&lt;/code&gt; becomes&lt;br&gt;
&lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies1.pl#L89" rel="noopener noreferrer"&gt;line 89&lt;/a&gt; &lt;code&gt;print $out_fh "\t$j";&lt;/code&gt;  Yes, that's a bug because &lt;code&gt;$prefix_&lt;/code&gt; doesn't exist.&lt;/p&gt;

&lt;p&gt;Search "benchmarking tools for linux" and decide that &lt;a href="https://github.com/sharkdp/hyperfine" rel="noopener noreferrer"&gt;hyperfine&lt;/a&gt; is good for what I'm doing. Run Jennifer's new python script against my refactored perl and find that the python is 1.26 times faster for k=3 and 1.47 times faster for k=4. For the Covid-19 sequence, these are both on the order of hundreds of milliseconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hyperfine &lt;span class="nt"&gt;--warmup&lt;/span&gt; 3 &lt;span class="s1"&gt;'perl/get_kmer_frequencies.pl Covid-19_seq.fasta 3 boyd1'&lt;/span&gt; &lt;span class="s1"&gt;'python/get_kmer_frequencies.py -i Covid-19_seq.fasta -k 3 -p boyd2'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, not bad. Better than 2.9 times faster, but that's probably down to the way that &lt;strong&gt;hyperfine&lt;/strong&gt; warms the cache and separates out User time from System time.&lt;/p&gt;

&lt;p&gt;Oh, I should just check how much I improved when I refactored.  Run it against Jennifer's original perl script and ... hers was 1.1 times faster. Well, that was a bit embarrassing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ahem&lt;/em&gt; I was ... aiming at improving readability, .. maintainability, y'know best practice and all that. That's my story and I'm sticking to it. ;)&lt;/p&gt;

&lt;h2&gt;
  
  
  For sanity's sake
&lt;/h2&gt;

&lt;p&gt;Check that the output of the new file is the same as the original, otherwise you've messed up the refactoring. I started using this test script with &lt;a href="https://metacpan.org/dist/Test-Harness/view/bin/prove" rel="noopener noreferrer"&gt;prove&lt;/a&gt; to make it quick and easy.&lt;br&gt;
Saved as &lt;strong&gt;i.t&lt;/strong&gt;, I run it with &lt;code&gt;prove i.t&lt;/code&gt; for the lols.&lt;br&gt;
It gets noisy when there's a problem, so I go back to running it by hand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;Test2::&lt;/span&gt;&lt;span class="nv"&gt;V0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$standard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;get_kmer_frequencies.pl&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;
&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;@files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;M&lt;/span&gt; &lt;span class="nv"&gt;$a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;M&lt;/span&gt; &lt;span class="nv"&gt;$b&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nb"&gt;glob&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;get_kmer*&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;
&lt;span class="nv"&gt;ok&lt;/span&gt; &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$latest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;shift&lt;/span&gt; &lt;span class="nv"&gt;@files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;isnt&lt;/span&gt; &lt;span class="nv"&gt;$latest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$standard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Files to compare&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;

&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;@args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sx"&gt;qw'Covid-19.fasta 3'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;ok&lt;/span&gt; &lt;span class="nb"&gt;system&lt;/span&gt;&lt;span class="p"&gt;('&lt;/span&gt;&lt;span class="s1"&gt;perl&lt;/span&gt;&lt;span class="p"&gt;',&lt;/span&gt; &lt;span class="nv"&gt;$standard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;@args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;A&lt;/span&gt;&lt;span class="p"&gt;')&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Make A_kmers.txt&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;
&lt;span class="nv"&gt;ok&lt;/span&gt; &lt;span class="nb"&gt;system&lt;/span&gt;&lt;span class="p"&gt;('&lt;/span&gt;&lt;span class="s1"&gt;perl&lt;/span&gt;&lt;span class="p"&gt;',&lt;/span&gt; &lt;span class="nv"&gt;$latest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;@args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;B&lt;/span&gt;&lt;span class="p"&gt;')&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Make B_kmers.txt&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;

&lt;span class="nv"&gt;is&lt;/span&gt; &lt;span class="p"&gt;`&lt;/span&gt;&lt;span class="sb"&gt;diff A_kmers.txt B_kmers.txt&lt;/span&gt;&lt;span class="p"&gt;`,&lt;/span&gt; &lt;span class="sx"&gt;q{}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No differences in output&lt;/span&gt;&lt;span class="p"&gt;';&lt;/span&gt;

&lt;span class="nv"&gt;done_testing&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clever people do this from the start.&lt;br&gt;
I did this after a bug I introduced messed up the output and I didn't notice straight away. I had changed the key separator to a character that also appeared in some of the keys, so those keys got split in the wrong place. Oops.&lt;/p&gt;
&lt;h2&gt;
  
  
  NYTProf time
&lt;/h2&gt;

&lt;p&gt;When you get serious about optimizing programs, trying to enhance performance, you reach for &lt;a href="https://en.wikipedia.org/wiki/Profiling_(computer_programming)" rel="noopener noreferrer"&gt;profiling tools&lt;/a&gt; that can analyze your code's memory or time complexity. In Perl, &lt;a href="https://metacpan.org/pod/Devel::NYTProf" rel="noopener noreferrer"&gt;Devel::NYTProf&lt;/a&gt; comes highly recommended. I use it to collect data on the number of times each statement is called and how long it spends executing it. That way I can work out where to invest the effort making the script faster, what gives the most bang for the buck.&lt;/p&gt;

&lt;p&gt;Grab the profiler and run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perl -d:NYTProf get_kmer_frequencies.pl Covid-19_seq.fasta 3 boyd1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and open up the &lt;strong&gt;nytprof/index.html&lt;/strong&gt; using &lt;code&gt;nytprofhtml --open&lt;/code&gt; to see&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Calls   P   F   Exclusive Time  Inclusive Time  Subroutine
9653    1   1   31.6ms  31.6ms  main::rc_seq
25      2   1   28.1ms  59.7ms  main::process_it
82498   7   1   9.46ms  9.46ms  main::CORE:print (opcode)
3175    4   1   7.89ms  7.89ms  main::CORE:sort (opcode)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sorting out the sort
&lt;/h3&gt;

&lt;p&gt;Obviously, the &lt;code&gt;rc_seq&lt;/code&gt; is the big sub that needs attention, but what about that &lt;code&gt;sort&lt;/code&gt;? Quickly looking at the sort on &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies1.pl#L78" rel="noopener noreferrer"&gt;Line 78&lt;/a&gt; &lt;code&gt;for my $i (keys %knucs)&lt;/code&gt; I see that there's no reason to sort those keys. Saved one sort and the script runs about the same. There's another sort &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies1.pl#L96" rel="noopener noreferrer"&gt;inside a loop&lt;/a&gt; which can be extracted out of the loop. &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies2.pl#L84" rel="noopener noreferrer"&gt;Extracting that&lt;/a&gt; made it run 1.15 times faster!&lt;/p&gt;

&lt;p&gt;Changing the header line (&lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies1.pl#L87" rel="noopener noreferrer"&gt;line 87&lt;/a&gt;) from a for loop to a &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies3.pl#L87" rel="noopener noreferrer"&gt;join over a list&lt;/a&gt; is 1 or 2 percent faster.&lt;/p&gt;
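&lt;p&gt;As a rough illustration of both changes (the k-mer list is invented for the example, not taken from the script): sort the list once before any loop, then build the header with a single &lt;code&gt;join&lt;/code&gt; instead of appending column by column.&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical column names; sorted once, before any loop uses them.
my @kmers = sort qw(TTT AAA ACG);

# Loop version: append one tab-prefixed column at a time.
my $loop_header = '';
$loop_header .= "\t$_" for @kmers;

# join version: one list operation builds the same line.
my $join_header = "\t" . join("\t", @kmers);

print "same header\n" if $loop_header eq $join_header;
```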

&lt;h3&gt;
  
  
  How do I print thee? Let me count the ways.
&lt;/h3&gt;

&lt;p&gt;Messing about with printing in the inner loop didn't gain much, but changing the key separator from a tab &lt;code&gt;"\t"&lt;/code&gt; (interpolated string) to an underscore &lt;code&gt;'_'&lt;/code&gt; (a string literal) made a 10% improvement. (It also introduced the bug noted above, because some keys already contained an underscore. Changing the separator to a colon fixed it.)&lt;/p&gt;
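&lt;p&gt;The separator bug is easy to reproduce in miniature. This is only a sketch: the sub names are hypothetical and the record name is just an example of an identifier that contains an underscore.&lt;/p&gt;

```perl
use strict;
use warnings;

# Build a composite hash key from a record name and a k-mer.
sub make_key {
    my ($sep, $record, $kmer) = @_;
    return join $sep, $record, $kmer;
}

# Recover the parts of a composite key by splitting on the separator.
sub split_key {
    my ($sep, $key) = @_;
    return split /\Q$sep\E/, $key;
}

# The underscore also appears inside the record name, so the key splits wrongly.
my @bad  = split_key('_', make_key('_', 'NC_045512.2', 'ACG'));
# A colon does not appear in either part, so the key round-trips.
my @good = split_key(':', make_key(':', 'NC_045512.2', 'ACG'));

print scalar(@bad),  " parts: @bad\n";    # 3 parts: NC 045512.2 ACG
print scalar(@good), " parts: @good\n";   # 2 parts: NC_045512.2 ACG
```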

&lt;p&gt;&lt;code&gt;say&lt;/code&gt; is marginally slower than &lt;code&gt;print&lt;/code&gt; so use &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies5.pl#L93" rel="noopener noreferrer"&gt;print inside the loop&lt;/a&gt; that gets called a lot to save maybe 10% on that call. From 32ms to 28ms is a small, but nice gain for a one line change.&lt;/p&gt;

&lt;h3&gt;
  
  
  rc_seq - transforming the sequence
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;rc_seq&lt;/code&gt; sub is an &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies4.pl#L151" rel="noopener noreferrer"&gt;if-elsif block&lt;/a&gt; that splits a string into individual characters, translates &lt;strong&gt;ACGT&lt;/strong&gt; into their complement (&lt;strong&gt;TGCA&lt;/strong&gt;), reverses the array and joins it back into a string.&lt;/p&gt;

&lt;p&gt;Being Perl, we can &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies5.pl#L144" rel="noopener noreferrer"&gt;manipulate and reverse the string&lt;/a&gt; in-place. The change makes it shorter and more obvious (sometimes it runs faster). Actually, I ran this through the profiler and the sub now runs 5 times faster.&lt;/p&gt;
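&lt;p&gt;A minimal sketch of the idea, assuming the sequence contains only the four bases (the linked revision may handle other characters differently): reverse the string as a whole and let &lt;code&gt;tr&lt;/code&gt; swap each base for its complement.&lt;/p&gt;

```perl
use strict;
use warnings;

# Reverse-complement a DNA string without splitting it into characters.
sub rc_seq {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # swap each base for its complement
    return $rc;
}

print rc_seq('AACG'), "\n";   # prints CGTT
```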

&lt;h3&gt;
  
  
  process_it - collecting the frequencies
&lt;/h3&gt;

&lt;p&gt;This sub does the work of splitting the sequence into &lt;strong&gt;kmers&lt;/strong&gt; and counting them. The longest time spent here is &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies5.pl#L136" rel="noopener noreferrer"&gt;incrementing&lt;/a&gt; the &lt;code&gt;%knucs&lt;/code&gt; hash.&lt;/p&gt;

&lt;p&gt;The second longest time is spent turning the sequence into an array of letters to create all the kmer substrings. Splitting isn't bad, but &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies5.pl#L129" rel="noopener noreferrer"&gt;joining sets of letters&lt;/a&gt; together is. Use the &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies6.pl#L130" rel="noopener noreferrer"&gt;string function&lt;/a&gt; &lt;code&gt;substr&lt;/code&gt; instead and that line runs 5 times faster.&lt;/p&gt;
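&lt;p&gt;Roughly, the &lt;code&gt;substr&lt;/code&gt; approach looks like this. It's a sketch with a hypothetical sub name, not the code from the repo: slide a window of length k along the string and count each substring directly.&lt;/p&gt;

```perl
use strict;
use warnings;

# Count every overlapping k-mer in a sequence using substr.
sub count_kmers {
    my ($seq, $k) = @_;
    my %count;
    # The last window starts at length - k; the range is empty if k is too big.
    for my $start (0 .. length($seq) - $k) {
        $count{ substr($seq, $start, $k) }++;
    }
    return \%count;
}

my $freq = count_kmers('ACGACG', 3);
print "$_ $freq->{$_}\n" for sort keys %$freq;
```

&lt;p&gt;For &lt;code&gt;ACGACG&lt;/code&gt; with k=3 the windows are ACG, CGA, GAC and ACG, so ACG is counted twice and the others once.&lt;/p&gt;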

&lt;p&gt;Now the perl script is faster than the python script: over 20% faster for k=3, and 5% (+/- 5%) faster for k=4. That's a decent improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Like the end of a great song ... a Key change!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies6.pl#L91" rel="noopener noreferrer"&gt;line 91&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my @items = map { my $key = join ':', $_, $i; $knucs{$key} // 0 } @record_keys;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script spends most of its time (55ms!!!), longer than &lt;em&gt;anything else&lt;/em&gt;, on this line.&lt;/p&gt;

&lt;p&gt;Assume that the problem isn't the &lt;code&gt;map&lt;/code&gt; but constructing the key for the lookup. Change the key to a &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies7.pl#L92" rel="noopener noreferrer"&gt;2 dimensional lookup&lt;/a&gt; and see if that improves things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHEN&lt;/strong&gt; you finally get it right (and remember the correct order that you construct the keys in), &lt;a href="https://github.com/duffee/faster-perl-for-reysenbach/blob/main/get_kmer_frequencies/get_kmer_frequencies7.pl#L92" rel="noopener noreferrer"&gt;line 92&lt;/a&gt; is now 2.5 times faster than before and the perl script is now 40% faster than the python script.&lt;/p&gt;

&lt;p&gt;Keys are constructed/used on lines 79, 91, 135&lt;/p&gt;
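&lt;p&gt;To show the difference between the two lookups, here is a small sketch with made-up record names and counts (the variable names other than &lt;code&gt;@record_keys&lt;/code&gt; are hypothetical). The flat version has to build a string for every lookup; the 2 level version indexes straight into the nested hash.&lt;/p&gt;

```perl
use strict;
use warnings;

my @record_keys = ('seq1', 'seq2');       # hypothetical record names
my $kmer        = 'ACG';

# Flat hash: one composite key per (record, kmer) pair.
my %flat   = ('seq1:ACG' => 2);
# 2 level hash: record name first, then kmer.
my %nested = (seq1 => { ACG => 2 });

# Flat lookup: join a new key string on every iteration.
my @flat_items = map { my $key = join ':', $_, $kmer; $flat{$key} // 0 } @record_keys;

# Nested lookup: no string building, but the key order must match how the
# hash was filled. Note this autovivifies an empty $nested{seq2}.
my @nested_items = map { $nested{$_}{$kmer} // 0 } @record_keys;

print "@flat_items\n";     # prints 2 0
print "@nested_items\n";   # prints 2 0
```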

&lt;h2&gt;
  
  
  STOP!!!!
&lt;/h2&gt;

&lt;p&gt;Know when to stop.&lt;/p&gt;

&lt;p&gt;There are no more obvious or easy gains here. Any more work is likely to yield small returns. Go outside, have a life or at the least consult &lt;a href="https://xkcd.com/1205/" rel="noopener noreferrer"&gt;the relevant chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Well, after thinking a while, maybe constructing the output could be improved, but I'm moving on. I've exceeded my goal of making the perl script as fast as the python script and learned more about refactoring and profiling. A bit like how audiophiles use your music to listen to their equipment, I've used Jennifer's science to better understand my Perl and had fun doing it.&lt;/p&gt;

&lt;p&gt;There's a niggling thought at the back of my mind, now that I feel I better understand the purpose of the script, whether &lt;a href="https://bioperl.org" rel="noopener noreferrer"&gt;BioPerl&lt;/a&gt; can do this even faster. I will leave that for another day. Oh, look &lt;em&gt;glycine&lt;/em&gt; has already done &lt;a href="https://blogs.perl.org/users/glycine/2024/03/reading-sequences-from-fasta-foramt-alignment-by-bioperl.html" rel="noopener noreferrer"&gt;most of the hard work&lt;/a&gt; for me. Many thanks!&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;In summary, these are reflections on the changes that I made in chronological order. This may be someone's first time considering performance, so I include basic rules of thumb I used along with the things I did not know before.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modern Perl style adds a small amount of overhead, but the sanity it brings is a price worth paying.&lt;/li&gt;
&lt;li&gt;Streamline a method of checking the output hasn't changed&lt;/li&gt;
&lt;li&gt;Don't &lt;code&gt;sort&lt;/code&gt; when order is not important&lt;/li&gt;
&lt;li&gt;Calculate constant values &lt;em&gt;outside&lt;/em&gt; of loops&lt;/li&gt;
&lt;li&gt;Use built-in list functions over loops (&lt;code&gt;join&lt;/code&gt; instead of &lt;code&gt;for&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Interpolated strings are slower than string literals (prefer single quotes over double quotes)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;say&lt;/code&gt; is slightly slower than &lt;code&gt;print&lt;/code&gt;.  Avoid it in heavy loops.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;substr&lt;/code&gt; is faster than &lt;code&gt;split&lt;/code&gt;ing and &lt;code&gt;join&lt;/code&gt;ing&lt;/li&gt;
&lt;li&gt;Creating a single hash key is slower than using a 2 level hash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Human Genome is &lt;em&gt;way&lt;/em&gt; too large. Grab the protein sequence for Caenorhabditis elegans. It takes about 5 minutes to run.&lt;/p&gt;

&lt;p&gt;Run your frequent tests with the Covid-19 sequence. Repeated runs with anything larger take too long for rapid turnaround.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WARNING:&lt;/strong&gt; hyperfine will run each program 20 to 40 times to get decent statistics. You won't want to wait around for a file that takes 5 minutes to process a single run.&lt;/p&gt;

&lt;p&gt;I'll leave you with a couple of related references for further reading: &lt;em&gt;chrisarg&lt;/em&gt;'s work on &lt;a href="https://blogs.perl.org/users/chrisarg/2023/09/of-go-c-perl-and-fastq-file-conversion-vol-i-intro.html" rel="noopener noreferrer"&gt;parsing FastQ files fast&lt;/a&gt;&lt;br&gt;
and a marketsplash tutorial on &lt;a href="https://marketsplash.com/tutorials/perl/perl-code-profiling-tools/" rel="noopener noreferrer"&gt;Perl code profiling tools&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Conclusion
&lt;/h2&gt;

&lt;p&gt;My corollary to Cunningham's Law:&lt;br&gt;
Don't ask people how to make your code run faster;&lt;br&gt;
&lt;a href="https://xkcd.com/386/" rel="noopener noreferrer"&gt;Tell them their language is slow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's taken a lot longer to reply to Jennifer's post than I'd anticipated, but right now I have the warm glow that comes from being able to say, (until someone iterates on the above corollary)&lt;/p&gt;

&lt;p&gt;... &lt;strong&gt;Python is SLOW!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>perl</category>
      <category>bioinformatics</category>
      <category>profiling</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
