grant horwood

Posted on Nov 2, 2021 • Edited on Mar 15, 2023

writing command line scripts in php: part 1; args, preflighting and more

#php

php has a lot of strengths as a web language, but it is also perfectly serviceable for general-purpose scripting. in this series of posts we're going to go over building command line scripts in php. this is part one.

why even do this?

when we tell people that we are writing command line scripts in php, the first question is usually 'why?'

certainly, there are already plenty of good scripting languages out there: bash, of course, but also python and even perl (once called "the duct tape of the internet"). still, there are good reasons to use php.

reuse of existing code: if you have a php web application and you wish to write a command line interface to leverage some or all of its functionality, it makes much more sense to use php for your script and re-use your existing code, rather than rewrite everything from scratch in, say, python.
available skills: if you or your team is long on php skills and short on python or bash, it often makes sense to just play to your strengths.
the language itself: php is actually a pretty powerful language for command-line scripting. not only does it provide all the standard features, but it also gives us access to things like signal handling, process forking and even semaphores if we need them.

assumptions

we're going to be looking at writing command line scripts for unix-like operating systems. all the examples here were built on ubuntu 20.04 using php 7.4.9.

the flyover

in this installment, we're going to go over:

making our php script runnable on the command line
preflighting our environment to ensure our script runs right
parsing common command line argument structures, and finally
giving our script a nice name that ps recognizes

first, make it runnable

traditionally, when running a php script on the command line, you invoke the php command and pass the script as an argument, i.e. php /path/to/my/script.php. this works, but it's ugly.

to fix this, we'll put a shebang at the top of our php script. the shebang is a specially formatted first line that tells the operating system which interpreter to use when executing the script. if you've done any shell scripting, you're probably familiar with #!/bin/bash. that's a shebang.

let's open our sample script ourfancyscript.php in the editor of our choice and add:

#!/usr/bin/env php
<?php

one thing we notice about this shebang is that it calls env. normally, in shell scripting, we reference the direct path to bash with #!/bin/bash. here, though, we call env so that the operating system searches the user's $PATH to find the php interpreter. this matters because, while bash is almost always in /bin, we have no guarantee of where the php interpreter lives; it may be /usr/bin/php on one system and /usr/local/bin/php on another. using env in our shebang makes our script more portable across different systems. that's a good thing.

now that we have our shebang, we'll make our script executable. we'll use the standard permissions for this (read, write and execute for the owner; read and execute for everyone else):

chmod 755 ourfancyscript.php

once we have the execution permissions set, we can run the script simply by calling it:

/path/to/ourfancyscript.php

of course, the script does nothing, but it actually does do that nothing. so that's progress.

preflight

when we write php for a web application, we generally have a good idea of the environment it will run in. we provisioned the web server, after all.
command line scripts, however, are a different story. we have no control over the environment. it's someone else's computer.

with that in mind, it's a good idea to always start your script with a call to a 'preflight' function that confirms the system has everything necessary to run the script. if any of our preflight tests fail, we can halt the script with an appropriate error instead of just barging ahead and making a mess.
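one way to structure this, sketched here under our own assumptions, is to run every check, collect the failures, and report them all at once. the names run_preflight_checks() and halt_on_preflight_errors() are hypothetical, not php built-ins. the sketch also sends errors to STDERR and exits with status 1, because die('some message') prints to STDOUT and exits with status 0, which tells a calling shell script that everything went fine.

```php
<?php

/**
 * Hypothetical helper: run every preflight check and collect the
 * messages for the ones that fail, so the user sees all problems at once.
 *
 * @param array $checks  map of error message => callable that returns true when the check passes
 * @return array         messages for the failed checks (empty when everything passes)
 */
function run_preflight_checks(array $checks)
{
    $errors = [];
    foreach ($checks as $message => $check) {
        if (!$check()) {
            $errors[] = $message;
        }
    }
    return $errors;
}

/**
 * Hypothetical helper: print any failures to STDERR and stop with exit status 1.
 * Returns normally when there are no errors.
 */
function halt_on_preflight_errors(array $errors)
{
    if (count($errors) === 0) {
        return;
    }
    foreach ($errors as $error) {
        fwrite(STDERR, $error . PHP_EOL);
    }
    exit(1);
}

// example usage at the top of a script:
// halt_on_preflight_errors(run_preflight_checks([
//     'posix extension required. exiting.' => function () { return extension_loaded('posix'); },
//     'must be able to write to /tmp. exiting.' => function () { return is_writable('/tmp'); },
// ]));
```

because the checks are closures, a failing check doesn't stop the others from running, and the script only halts once every problem has been reported.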

common things we can check for in our preflight include:

minimum php version: of course we're writing for the lowest php version we can (portability and all!), but we should still check that we meet that minimum. keep in mind that a three-year-old amazon linux ec2 instance runs php 5.3 by default!

checking the php version is as straightforward as a call to the built-in phpversion() command.
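if we'd rather not pick version strings apart by hand, php's built-in version_compare() understands dotted version strings. here is a minimal sketch; php_version_at_least() is a hypothetical helper name, and its optional second parameter exists only so we can try it against versions other than the one we're running.

```php
<?php

/**
 * Hypothetical helper: true if $version is at least $minimum.
 * $version defaults to PHP_VERSION, the same string phpversion() returns.
 */
function php_version_at_least($minimum, $version = PHP_VERSION)
{
    // version_compare() compares each dotted part numerically,
    // so '7.10.0' correctly counts as newer than '7.4.0'
    return version_compare($version, $minimum, '>=');
}

// example: refuse to run on anything older than 5.6
// if (!php_version_at_least('5.6.0')) {
//     fwrite(STDERR, "minimum php required is 5.6. exiting\n");
//     exit(1);
// }
```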

necessary extensions are loaded: if our script calls for an extension, we should make sure that it's actually loaded before we start. we can do this with the built-in command extension_loaded(). so, for instance, if we want to confirm php has 'imagick' available, we could test that in our 'preflight' function like so:

if (!extension_loaded('imagick')) {
    die('imagick extension required. exiting.');
}
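when a script needs more than one extension, it's friendlier to report every missing one in a single message. a minimal sketch; missing_extensions() is a hypothetical helper, and the extension names in the usage comment are only examples.

```php
<?php

/**
 * Hypothetical helper: return the names of any required extensions
 * that are not loaded, in the order they were listed.
 */
function missing_extensions(array $required)
{
    $missing = [];
    foreach ($required as $extension) {
        // extension_loaded() treats the name case-insensitively and returns a bool
        if (!extension_loaded($extension)) {
            $missing[] = $extension;
        }
    }
    return $missing;
}

// example usage:
// $missing = missing_extensions(['posix', 'imagick']);
// if ($missing) {
//     fwrite(STDERR, 'missing extensions: ' . implode(', ', $missing) . "\n");
//     exit(1);
// }
```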

file access: our script may need to read from or write to files and directories. it's a good idea to confirm that the user running our script has the permissions to do that before starting. php has a number of built-in commands to accomplish this:

file_exists() to confirm that the file exists
is_dir() to determine whether the path is a directory
is_writable() to test if the user has write access to the file or directory
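these three functions can be chained to give a specific error for a directory we plan to write into. a sketch, assuming we want a human-readable reason back; output_dir_problem() is a hypothetical name, not something from php itself.

```php
<?php

/**
 * Hypothetical helper: explain why $path can't be used as a writable
 * output directory, or return null if it is fine.
 */
function output_dir_problem($path)
{
    if (!file_exists($path)) {
        return "$path does not exist";
    }
    if (!is_dir($path)) {
        return "$path is not a directory";
    }
    if (!is_writable($path)) {
        return "$path is not writable by this user";
    }
    return null;
}

// example usage inside a preflight function:
// $problem = output_dir_problem('/tmp');
// if ($problem !== null) {
//     fwrite(STDERR, $problem . ". exiting.\n");
//     exit(1);
// }
```

checking in this order means the message names the first thing that is actually wrong, rather than a generic "can't write here".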

let's put all of that together into a short sample preflight function:

#!/usr/bin/env php
<?php

/**
 * Confirm system can run script
 */
function preflight()
{
    $phpversion_array = explode('.', phpversion());
    if ((int)($phpversion_array[0].$phpversion_array[1]) < 56) {
        die('minimum php required is 5.6. exiting');
    }

    if(!extension_loaded('posix')) {
        die('posix required. exiting');
    }

    if(!is_writable('/tmp')) {
        die('must be able to write to /tmp to continue. exiting.');
    }

    if(!file_exists(posix_getpwuid(posix_getuid())['dir'].'/.aws/credentials')) {
        die('an aws credentials file is required. exiting');
    }
}

// run the checks before the script does anything else
preflight();

in the first if block of this function, we check that the php version is at least 5.6. we do a little clumsy joining and casting here, since we only care about the major and minor numbers: '7.4.9' becomes the integer 74, which we then compare against 56.

next, we confirm that the posix extension is loaded. posix gives us a set of standard functions for interfacing with the host operating system: user and process ids, user account details and so on. definitely something we will want for our script.

we then quickly confirm that we can write to the /tmp directory and, finally, check that the user has an aws credentials file.

one thing to note: in the last if block, we used a couple of those posix commands to get the running user's home directory. the '~/' shorthand will not work here, because expanding ~ to the home directory is something the shell does; php's file functions treat it as an ordinary character. instead, we get the user's id number with posix_getuid() and then pass that to posix_getpwuid() to get an array of information about the user, including their home directory, from the system's user database (usually the /etc/passwd file).
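if our script reads paths that never pass through the shell (from a config file, say) and we want '~/' to work in them anyway, we can do the expansion ourselves with the same posix lookup. a minimal sketch, assuming the posix extension is loaded (our preflight already checks for it); expand_home() is a hypothetical helper, and it only handles the current user's '~/' prefix, not the '~otheruser/' form some shells support.

```php
<?php

/**
 * Hypothetical helper: replace a leading '~/' with the current user's home directory.
 * Only the current user's '~/' is handled; '~otheruser/' is left alone.
 */
function expand_home($path)
{
    // nothing to expand unless the path starts with '~/'
    if (strpos($path, '~/') !== 0) {
        return $path;
    }

    // look up the running user's account entry, as in the preflight example
    $user = posix_getpwuid(posix_getuid());
    if ($user === false) {
        // no account entry for this uid; leave the path untouched
        return $path;
    }

    // keep everything after the '~', including the leading slash
    return $user['dir'] . substr($path, 1);
}

// example: expand_home('~/.aws/credentials') gives something like '/home/someuser/.aws/credentials'
```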

parse command line arguments

handling command line arguments and switches is something most scripts need to do, so we're going to write a short function that takes the arguments passed to our script and processes them into an array that we can reference later when determining what functionality to provide.

this function handles four basic types of arguments:

switches
these are single-letter arguments preceded by a dash; think of the -a argument to ls, which shows hidden files.

long switches
these are the same as switches except... longer. an example would be curl accepting --silent as a synonym for -s. long switches are preceded by two dashes.

assignments
this is for passing data into our script. assignment arguments take two dashes and use an equal sign to separate the name from the value, e.g. --outfile=/path/to/file or mysql's horrifying --password=mynothiddenpassword.

positional arguments
these are arguments without any preceding dashes; their meaning is determined entirely by their position. think of the linux 'move' command, mv /path/to/origin /path/to/destination. there are two positional arguments here, and we know which value is the origin and which is the destination by the order in which they are written.

with that in mind, we can add this function to our command line scripts to parse arguments for us:

#!/usr/bin/env php
<?php

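/**
 * Parses command line args and returns array of args and their values
 *
 * @param array $args   The array from $argv
 * @return array
 */
function parseargs($args)
{
    $parsed_args = [];

    // skip $argv[0], which is the name of the script itself
    foreach (array_slice($args, 1) as $arg) {

        // count the leading dashes, capped at two. strspn() only looks at the
        // start of the string, so 'a-b' or a one-character arg stays positional
        switch (min(strspn($arg, "-"), 2)) {
            // switches like -a or -vvv: each letter is a key, its value is a count
            case 1:
                foreach (str_split(ltrim($arg, "-")) as $a) {
                    $parsed_args[$a] = isset($parsed_args[$a]) ? $parsed_args[$a] + 1 : 1;
                }
                break;

            // long switches (--silent) get 1; assignments (--my-name=x) get their value
            case 2:
                $eq = strpos($arg, '=');
                $key = ltrim($eq === false ? $arg : substr($arg, 0, $eq), '-');
                $parsed_args[$key] = $eq === false ? 1 : substr($arg, $eq + 1);
                break;

            // no leading dash: a positional argument, kept in order
            default:
                $parsed_args['positional'][] = $arg;
        }
    }

    return $parsed_args;
}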

we can then call that function in our script with

$our_parsed_args = parseargs($argv);

a few things to note here:

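the argument to parseargs() must always be $argv. this is a special variable that holds the script name in $argv[0], followed by each of the arguments it was called with; parseargs() drops that first element. note that $argv is only defined in the global scope, so inside another function you would need $_SERVER['argv'] instead. you can read up on $argv in the docs.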
the return is an associative array that indicates which arguments were passed and their value, if any, or a count of the number of times a switch was passed.

let's take a look at an example to better explain all this. if we call our script with the following arguments,

./myscript.php -vvv --silent -ab --my-name=gbhorwood -c /path/to/inputfile /path/to/outputfile

then parseargs() will return an array that looks like this:

Array
(
    [v] => 3
    [silent] => 1
    [a] => 1
    [b] => 1
    [my-name] => gbhorwood
    [c] => 1
    [positional] => Array
        (
            [0] => /path/to/inputfile
            [1] => /path/to/outputfile
        )

)

there are a couple of noteworthy things here.

first, all switches and arguments that were passed to the script are now keys in this array. if this array does not have a key for an arg, it was not passed.

second, the value of each single-letter switch in this array is the number of times it was passed. we called our script with the switch -vvv. that's three 'v's and, thus, the value in the array for the key 'v' is 3. this also works if we pass -v -v -v, or any other combination, e.g. -v -vv. long switches like --silent simply get the value 1.

third, the value of an assignment argument is the value that was passed. on the command line, we called the script with --my-name=gbhorwood, and in the array parseargs() returned, the key my-name has the value 'gbhorwood'.

lastly, the positional key holds an array of positional arguments in the order they were passed to the script.

once we have this array from parseargs(), we can validate the request and take the necessary action. important stuff!
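as an illustration, here is a sketch of turning the parsed array into settings for the rest of the script. the helper name validate_args(), the usage message, treating -s and --silent as synonyms, and the rule that we need exactly two positional arguments (an input file and an output file, echoing the example call above) are all assumptions for this example, not part of parseargs() itself.

```php
<?php

/**
 * Hypothetical helper: turn the array returned by parseargs() into settings,
 * or return ['error' => usage message] if the call doesn't make sense.
 */
function validate_args(array $parsed_args)
{
    // we expect exactly two positional arguments: input file, then output file
    $positional = isset($parsed_args['positional']) ? $parsed_args['positional'] : [];
    if (count($positional) !== 2) {
        return ['error' => 'usage: ourfancyscript.php [-v...] [-s|--silent] <infile> <outfile>'];
    }

    return [
        // -v, -vv, -vvv: the switch count doubles as a verbosity level
        'verbosity' => isset($parsed_args['v']) ? (int)$parsed_args['v'] : 0,
        // accept either the short or the long form of the same option
        'silent'    => isset($parsed_args['s']) || isset($parsed_args['silent']),
        'infile'    => $positional[0],
        'outfile'   => $positional[1],
    ];
}

// example usage:
// $settings = validate_args(parseargs($argv));
// if (isset($settings['error'])) {
//     fwrite(STDERR, $settings['error'] . "\n");
//     exit(1);
// }
```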

give our script a meaningful name

we're basically at the end of this installment, but one last worthwhile thing to do with our script is to name it.

we can set the 'title' of our script, which linux shows when we run ps(1), by using php's built-in command cli_set_process_title() (available since php 5.5).

let's cook up a short script that sets the title then runs until killed (so we have time to run ps):

#!/usr/bin/env php
<?php

/**
 * Set the title of our script that ps(1) sees
 */
cli_set_process_title("ourfancyscript");

while (true) {
    print ".";
    sleep(1);
}

we've titled our fancy script ourfancyscript, so let's check that out in ps.

ps -ef | grep "ourfancyscript"
ghorwood 1981824 1981445  0 08:32 pts/2    00:00:00 ourfancyscript
ghorwood 1982111 1135804  0 08:32 pts/8    00:00:00 grep --color=auto ourfancyscript

there it is! we look like a real process, now.

it should be noted that this title is not used by either top(1) or killall(1), but it is recognized by pidof(1):

pidof ourfancyscript
2012707
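cli_set_process_title() returns false when it can't set the title, so a script that relies on its title (for pidof, say) may want to check. a minimal sketch; set_script_title() is a hypothetical wrapper, and the warning messages are our own.

```php
<?php

/**
 * Hypothetical wrapper: try to set the process title and warn on STDERR if it fails.
 * Returns true when the title was set.
 */
function set_script_title($title)
{
    // cli_set_process_title() is only provided by the cli sapi (php 5.5+)
    if (!function_exists('cli_set_process_title')) {
        fwrite(STDERR, "cannot set process title on this php build\n");
        return false;
    }

    if (!cli_set_process_title($title)) {
        fwrite(STDERR, "could not set process title to '$title'\n");
        return false;
    }

    return true;
}

// example usage, near the top of the script:
// set_script_title('ourfancyscript');
```

failing to set a title is rarely fatal, so this warns and carries on rather than halting the script.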

next steps

with these basics under our belt, we can now start to write effective command line scripts in php. in future installments, we will cover things like interactive user input, effective and formatted output and, eventually, some nitty-gritty unix-y stuff like trapping signals and forking processes.