🌌 Sébastien Feugère ☔

Load a list of lines into an array (easily)

This blog post covers a task my colleagues often ask about: repeating a string built from each item of a list, adding some or, and, or = in between, and finishing smartly (no dangling separator at the end).

I like to use Perl's __DATA__ token at the end of my scripts for this. Its strength is that it lets you « "embed" a file inside a Perl program then read it from the DATA filehandle ». It saves you from creating and opening a real file, and is very handy for quick prototypes and tests.

#!/usr/bin/env perl
use strict;
use warnings;

# Your script here

# Everything below the __DATA__ token is treated
# as data, not as code
__DATA__
a
lot
lot
of
stuff
here
...

A common practice is to load this data into an array by reading it from the DATA filehandle:

my @lines = <DATA>;

But each value would still include its trailing newline, which you obviously don't want. I have used two solutions for this:

my @lines;
push @lines, 
 split while <DATA>;

This is quite readable and self-explanatory (remember, Perl reads like a natural language; it was created by a linguist). Feel free to comment if something is unclear so I can improve the post.
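One thing worth knowing about the split approach: with no arguments, split breaks $_ on whitespace, so a line containing spaces becomes several elements, whereas chomp only removes the line ending. Here is a minimal sketch of the difference. The helper names words_from and lines_from are made up for this illustration, and an in-memory filehandle stands in for DATA so the snippet can be pasted anywhere; in a real script you would pass \*DATA instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post):
# collects every whitespace-separated word from a filehandle.
sub words_from {
    my ($fh) = @_;
    local $_;    # keep the caller's $_ untouched
    my @words;
    # split with no arguments splits $_ on whitespace,
    # which also discards the trailing newline
    push @words, split while <$fh>;
    return @words;
}

# Hypothetical helper: one element per line, newline removed.
sub lines_from {
    my ($fh) = @_;
    # chomp strips only the line ending, so inner spaces survive
    chomp( my @lines = <$fh> );
    return @lines;
}

# Assumption: this string plays the role of the __DATA__ section
my $text = "one\ntwo words\nthree\n";

open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
my @words = words_from($in);
close $in;

open $in, '<', \$text or die "Cannot open in-memory handle: $!";
my @lines = lines_from($in);
close $in;

print scalar(@words), " words, ", scalar(@lines), " lines\n";
# prints "4 words, 3 lines"
```

With the single-word lines used in this post, both approaches produce the same array; the difference only shows up once a line contains spaces.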

Ok, I have to admit a little secret:

push my @lines, 
  split while <DATA>;

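... without the pre-declaration of @lines, does the same. I had to double-check that it worked, but as often with Perl, when you spontaneously think of something silly, it actually works naturally (I have to admit it sometimes looks like a miracle). One caveat: perlsyn documents the behaviour of my combined with a statement modifier as undefined, so the pre-declared form is the safer choice for anything beyond a quick prototype.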

If you want unique values (you surely do), one way is to use the core module List::Util:

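use List::Util qw(uniq);

my @lines;
push @lines, split while <DATA>;
# De-duplicate once everything is read: inside the loop,
# uniq would only see the words of one line at a time
@lines = uniq @lines;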

There is, as always, another way to do it:

chomp( my @lines = uniq <DATA> );

I actually prefer this list-context solution for its brevity. I don't know which one is more readable, though, and readability is a good reason to pick one over the other.
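If the List::Util on your machine is too old to export uniq (I believe it arrived around version 1.45, so treat that number as an assumption), the classic %seen idiom does the same job with no module at all. A minimal sketch, where unique_in_order is a name made up for this post:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post):
# keeps the first occurrence of each value, in input order.
sub unique_in_order {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a value shows up,
    # so the negation lets it through exactly once
    return grep { !$seen{$_}++ } @_;
}

my @lines = unique_in_order(qw(a lot lot of stuff here));
print "@lines\n";
# prints "a lot of stuff here"
```

Like uniq, this compares values as strings and preserves the original order.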

Let's say you want to generate a series of conditions joined by or for your colleagues or customers. We are actually doing some super advanced language generation here:

#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(uniq);

chomp( my @lines = uniq <DATA> );

for ( @lines ) {
  # $_ is the current loop element
  print generate_string( $_ );
  # $lines[-1] is the last array element; comparing values
  # is only safe here because uniq removed the duplicates
  if ( $_ ne $lines[-1] ) {
    print ' or ';
  } else {
    print "\n";
  }
}    

sub generate_string {
  return 'line == "' . shift . '"';
}

__DATA__
a
lot
lot
of
stuff
here
...

Running the script prints a single line:

$ perl lines.pl
line == "a" or line == "lot" or line == "of" or line == "stuff" or line == "here" or line == "..."
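
The last-element test in the loop above can also be avoided entirely by letting join place the separators. This is only a sketch of that alternative, not a replacement for the script: generate_condition is a hypothetical helper name, and the list is hard-coded instead of read from DATA to keep it short.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same job as generate_string in the script above,
# with a named argument instead of a bare shift
sub generate_string {
    my ($value) = @_;
    return 'line == "' . $value . '"';
}

# Hypothetical helper: join inserts ' or ' between the pieces
# only, so there is no trailing separator to clean up
sub generate_condition {
    return join ' or ', map { generate_string($_) } @_;
}

print generate_condition(qw(a lot of stuff here)), "\n";
# prints: line == "a" or line == "lot" or line == "of" or line == "stuff" or line == "here"
```

Because join never looks at the values themselves, this version also behaves correctly when the list contains duplicates.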

Lots of other solutions exist. Have a look at Perl one-liners, which teach a lot more about this kind of practice.

The number of cool things you can do inside such a loop is infinite, from log parsing to code generation or data munging, thanks to the kindness of Perl.

Note

I wrote this because my memory is awful and I was tired of always searching for the exact syntax of the __DATA__ token to array process. I hope it will help all kinds of people, including me when I type it into a search engine.
