Felipe Gasper

Posted on May 7, 2021

Perling and Curling

#perl #programming

Most of us probably know curl as a quick and easy way to send HTTP requests from the command line.

That tool, though, is just an interface to the curl project’s real gold: the libcurl API. Using this API, applications in all sorts of languages have easy access to the awesome power that libcurl provides. This article will discuss how to use that power in Perl.

A Quick Example

use Net::Curl::Easier;

my $easy = Net::Curl::Easier->new(
    url => 'http://perl.org',
    followlocation => 1,
)->perform();

print $easy->head(), $easy->body();

Let’s talk about what just happened.

Net::Curl::Easier is a thin wrapper around Net::Curl’s “easy” interface—“easy” is what libcurl calls it!—that smooths over some rough edges in Net::Curl.

(Full disclosure: I am Net::Curl::Easier’s maintainer.)

Once we create our “Easier” object, having given it the proper URL and told it to follow HTTP redirects (followlocation refers to HTTP’s Location header), we run perform() on the Easier object.

After that, we print the HTTP response headers and body, and we’re done!
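For comparison, here is a rough sketch of the same request written against plain Net::Curl::Easy, adapted from that module's synopsis. The $head and $body variables are my own names, and the sketch assumes your Net::Curl build exposes the CURLOPT_WRITEHEADER and CURLOPT_FILE constants and accepts scalar references for them:

```perl
use strict;
use warnings;

use Net::Curl::Easy qw(:constants);

# Buffers that libcurl appends to as data arrives.
my ( $head, $body ) = ( '', '' );

my $easy = Net::Curl::Easy->new();
$easy->setopt( CURLOPT_URL,            'http://perl.org' );
$easy->setopt( CURLOPT_FOLLOWLOCATION, 1 );
$easy->setopt( CURLOPT_WRITEHEADER,    \$head );    # response headers
$easy->setopt( CURLOPT_FILE,           \$body );    # response body

# Throws an exception on failure.
$easy->perform();

print $head, $body;
```

The Easier version sets up those two buffers for you and lets you pass options by name, which is most of what the wrapper buys you in a simple case like this.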

Why not just use HTTP::Tiny?

Indeed. Well, error reporting, for one. Consider:

Net::Curl::Easier->new(
    url => 'http://blahblah',
)->perform();

If you run this you’ll probably just see Couldn't resolve host name printed to standard error. But if you dig deeper you’ll see something nifty:

use Net::Curl::Easier;
use Data::Dumper;

eval {
    Net::Curl::Easier->new(
        url => 'http://blahblah',
    )->perform();
};
print Dumper $@;

It turns out that that error isn’t just a string; it’s an exception object.

In large systems I often want to handle certain failure types differently from others. HTTP::Tiny’s errors are just strings, so type-specific failure handling with HTTP::Tiny entails parsing strings, which is brittle. What if someone decides to reword some error message for clarity, thus breaking my string parser?

With Net::Curl I can check for specific numeric error codes, which the curl project itself documents. This is much more robust.
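A minimal sketch of what that can look like. The classify_failure() helper and its category names are hypothetical; the sketch assumes that perform() throws a Net::Curl::Easy::Code object that compares numerically against the CURLE_* constants, as Net::Curl's documentation describes:

```perl
use strict;
use warnings;

use Scalar::Util qw(blessed);
use Net::Curl::Easy ();
use Net::Curl::Easier;

# Hypothetical helper: map a failure from perform() to a coarse category.
sub classify_failure {
    my ($err) = @_;

    # Only curl's own "easy" errors carry a numeric CURLE_* code.
    return 'other'
      unless blessed($err) && $err->isa('Net::Curl::Easy::Code');

    return 'dns'     if $err == Net::Curl::Easy::CURLE_COULDNT_RESOLVE_HOST();
    return 'timeout' if $err == Net::Curl::Easy::CURLE_OPERATION_TIMEDOUT();

    return 'curl';
}

# perform() returns the object on success; eval returns undef on failure.
my $easy = eval {
    Net::Curl::Easier->new( url => 'http://blahblah' )->perform();
};

if ( !$easy ) {
    my $err  = $@;
    my $kind = classify_failure($err);

    if ( $kind eq 'dns' ) {
        warn "DNS lookup failed; check the host name.\n";
    }
    elsif ( $kind eq 'timeout' ) {
        warn "Timed out; retrying later might help.\n";
    }
    else {
        die $err;    # rethrow anything we don't know how to handle
    }
}
```

Because the comparison is numeric, a reworded error message would not change which branch runs.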

Don’t care. What else you got?

OK. How about this:

my $easy = Net::Curl::Easier->new(
    username => 'hal',
    password => 'itsasecret',
    url => 'imap://mail.example.com/INBOX/;UID=123',
)->perform();

I just queried … an email inbox?!?

Curl doesn’t just speak HTTP; it speaks many other protocols including IMAP, LDAP, SCP, and MQTT. To see the full list of protocols that your curl supports, run curl --version.
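You can ask the same question from Perl. This is a small sketch that assumes Net::Curl::version_info() returns a hash reference with version and protocols keys, mirroring libcurl's curl_version_info():

```perl
use strict;
use warnings;

use Net::Curl ();

# version_info() mirrors libcurl's curl_version_info().
my $info = Net::Curl::version_info();

print "libcurl $info->{version}\n";
print "Protocols: @{ $info->{protocols} }\n";
```

This reports on the libcurl that Net::Curl was built against, which may differ from the curl binary on your PATH.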

Concurrency

Curl can also run concurrent queries. To do that I recommend using Net::Curl::Promiser. (Full disclosure: I also maintain this module.)

Example, assuming use of Mojolicious:

use Net::Curl::Easier;
use Net::Curl::Promiser::Mojo;
use Mojo::Promise;

my $easy1 = Net::Curl::Easier->new(
    url => 'http://perl.org',
    followlocation => 1,
);

my $easy2 = Net::Curl::Easier->new(
    username => 'hal',
    password => 'itsasecret',
    url => 'imap://mail.example.com/INBOX/;UID=123',
);

my $easy3 = Net::Curl::Easier->new(
    username => 'hal',
    password => 'itsasecret',
    url => 'scp://tty.example.com/path/to/file',
);

my $promiser = Net::Curl::Promiser::Mojo->new();

Mojo::Promise->all_settled(
    $promiser->add_handle($easy1)->then( sub {
        print $easy1->head(), $easy1->body();
    } ),
    $promiser->add_handle($easy2)->then( sub {
        # ... whatever you want with the IMAP result
    } ),
    $promiser->add_handle($easy3)->then( sub {
        # ... whatever you want with the SCP result
    } ),
)->wait();

We just grabbed a web page, queried a mailbox, and downloaded a file via SCP, all in parallel!

Note, too, that this method interfaces seamlessly with other promises. So if you have existing Mojo::UserAgent-based code, you can add requests for other protocols alongside it.
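Here is a sketch of that kind of mixing. The FTP URL is a made-up placeholder, and the example assumes a Mojolicious recent enough to provide get_p() and all_settled():

```perl
use strict;
use warnings;

use Mojo::Promise;
use Mojo::UserAgent;
use Net::Curl::Easier;
use Net::Curl::Promiser::Mojo;

my $ua       = Mojo::UserAgent->new();
my $promiser = Net::Curl::Promiser::Mojo->new();

# Hypothetical FTP URL, used only to show a non-HTTP transfer.
my $ftp = Net::Curl::Easier->new(
    url => 'ftp://ftp.example.com/README',
);

Mojo::Promise->all_settled(
    # Existing Mojo::UserAgent code ...
    $ua->get_p('http://perl.org')->then( sub {
        my ($tx) = @_;
        print 'HTTP status: ', $tx->res->code(), "\n";
    } ),

    # ... and a curl transfer, handled the same way.
    $promiser->add_handle($ftp)->then( sub {
        print $ftp->body();
    } ),
)->wait();
```

Both branches are ordinary Mojo::Promise objects, so they can share the same all_settled() call.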

Net::Curl::Promiser also works natively with AnyEvent and IO::Async, should those be of greater interest to you. It also provides a convenience layer for custom select-based event loops, in case that’s how you roll.
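For AnyEvent users, a sketch modeled on the Net::Curl::Promiser::AnyEvent synopsis might look like the following. It assumes the default promise implementation, which supports then() and finally():

```perl
use strict;
use warnings;

use AnyEvent;
use Net::Curl::Easier;
use Net::Curl::Promiser::AnyEvent;

my $easy = Net::Curl::Easier->new(
    url            => 'http://perl.org',
    followlocation => 1,
);

my $promiser = Net::Curl::Promiser::AnyEvent->new();

# The condvar lets us block until the transfer settles.
my $cv = AnyEvent->condvar();

$promiser->add_handle($easy)->then(
    sub { print $easy->head(), $easy->body() },
    sub { warn "Request failed: $_[0]\n" },
)->finally($cv);

$cv->recv();
```

The only event-loop-specific parts are the promiser class and the condvar; the Easier object is built exactly as before.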

Other Modules

Some alternatives to the modules presented above:

AnyEvent::YACurl: A newer library than Net::Curl that simplifies the interface a bit. It assumes use of AnyEvent, though, so if you’re not using AE then this may not be for you.
WWW::Curl: The library of which Net::Curl is a fork. It can do much of what Net::Curl does but lacks access to libcurl’s MULTI_SOCKET interface, which is faster and more flexible than curl’s internal select-based manager for concurrent requests.
Net::Curl::Simple: A wrapper by Net::Curl’s original author. It provides some of the same conveniences as Net::Curl::Promiser and Net::Curl::Easier but uses callbacks rather than promises.

Closing Thoughts

Curl exposes an awesome breadth of functionality, of which the above examples have just scratched the surface. Check it out!

Top comments (3)

James Smith • May 7 '21

You are wrong about WWW::Curl when it comes to forking (remember, forking is really bad if the codebase is large!). It has its own way of doing multiple requests, which is very good, as it makes it much easier to write a dynamically queued request chain. I use this for parallelised web crawling: each request gets handled while other requests are being fetched, and can add more URLs to the search string.

This is not so easy with the methods you are using. If you are using AnyEvent it works in a similar way, but again, I think, not as efficiently as WWW::Curl::Multi.

WWW::Curl::Multi is a bit harder to use - but with a few lines of wrapper script modules this is resolved nicely...

Felipe Gasper • May 7 '21

Hi!

The above should all happen within the same process, so I’m not sure where you’re coming from regarding forking.

WWW::Curl::Multi, as far as I can tell, uses curl’s internal select-based event loop exclusively, which won’t be as efficient as platform-native polling methods like epoll or kqueue. It appears to mimic the C API pretty closely, much like Net::Curl::Multi.

Do you mind clarifying what you see as “not so easy” regarding adding more URLs to the request queue?

E. Choroba • May 7 '21

HTTP::Request::FromCurl is also worth mentioning. It suits those who want to translate curl commands into their HTTP::Request equivalents.