AWK can do almost anything with delimited data tables of millions of rows in a few lines of code, and roughly as fast as an equivalent Go or C program.
In Python you need to write many lines and pull in third-party libraries to normalize columns with regexes, drop rows, and so on. In AWK it's just a one-liner, or you can put it in an *.awk file as well. AWK is a hidden gem for dealing with big data.
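For example, a one-liner that drops blank lines and trims whitespace from the second column of a pipe-delimited file could look like this (a minimal sketch; the filename and column layout are assumptions):

  awk -F'|' -v OFS='|' 'NF { gsub(/^ +| +$/, "", $2); print }' input.csv

The NF pattern skips blank lines (zero fields), and assigning to $2 makes awk rebuild the record with OFS before printing.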
Update:
Here is the AWK script I use to normalize a delimited CSV table millions of lines long, with AWK alone, about as fast as the equivalent Go program:
  #!/usr/bin/awk -f

  # This block runs only once at the start, before the first line.
  # Use it to print a CSV header at the top.
  BEGIN {
      FS = "|";  # input field separator
      OFS = "|"; # output field separator
  }

  # This block runs for every line
  {
      # Assign a named variable to every column
      line = $0; # `$0` stores the entire line
      url = $1;
      title = $2;
      body = $3;
      tags = $4;

      if (line ~ /^$/) next; # if the line is blank, skip it
      if (NF != 4) next;     # if the column count is not exactly 4, skip the line

      # Skip any line where the tags column contains the word "cars"
      if (index(tags, "cars") != 0) {
          next;
      }

      # Normalize the url column with a regex by keeping only the article id
      # Example input: <a href="https://example.com/article/foo123456">Hello</a>
      gsub(/.*example\.com\/article\/|[\042].*/, "", url); # outputs: foo123456

      # Skip lines that have non-alphanumeric characters in the url column (like <>#&@),
      # an empty url column (after the gsub normalization),
      # or a url starting with foo or bar
      if (url ~ /[^[:alnum:]]/ || length(url) == 0 || url ~ /^foo|^bar/) {
          next;
      }

      # Collapse runs of ; into a single one (needed for error-free CSV import in Postgres)
      gsub(/[\073]+/, ";", tags);

      # Print the line with OFS, aka: profit! :)
      print url, title, body, tags;
  }
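To run it, save the script (the filenames below are just examples), make it executable, and pass the input file; the normalized table goes to stdout:

  chmod +x normalize.awk
  ./normalize.awk input.csv > output.csv

or equivalently: awk -f normalize.awk input.csv > output.csv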
I'm going to put Ruby under this one, as well. Ruby inherited a ton of Perlisms that make it competitive with Perl, awk, and sed for these kinds of use cases, but they're really underused. E.g. the following are all relevant here: the -n, -p, -e, -i, -l, -a, -s, -0, -c, and -F flags; the 2-letter globals (ruby -e 'puts global_variables.grep /\$.$/'); BEGIN { ... } and END { ... }; ARGF; flip-flops (which most people don't even know exist); regex literals in conditionals; and the private methods that are added to main when the -n and -p flags are set. IDK, probably other stuff too; that's all off the top of my head.
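As a minimal sketch of the awk-ish style those flags enable (the pipe-delimited input.csv is hypothetical): -n supplies the implicit line loop, -a and -F autosplit each line into $F, -l handles trailing newlines, and END runs once at end of input:

  ruby -l -a -n -F'|' -e 'next if $F.size != 4; puts $F[0]; END { warn "#{$.} lines read" }' input.csv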
100% agree, I love awk and use it all the time! Also: combining AWK with the other UNIX utilities such as cat, sort, uniq, etc.

Have you ever read Ryan Tomayko's AWK-ward Ruby? I didn't realize that Ruby had inherited so much from AWK, but it makes me happy as a Ruby user.
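For instance, a quick frequency count of the tags column from the script above might look like this (a sketch, with the same hypothetical input.csv):

  awk -F'|' '{ print $4 }' input.csv | sort | uniq -c | sort -rn | head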