DEV Community

Shellnium: Simple Selenium WebDriver for Bash

Rasukarusan ・5 min read

Selenium can be used in many languages

Selenium, as we all know, is a tool for automating browser operations.
Selenium drives the browser through a WebDriver, and WebDriver bindings exist for many languages, such as PHP, Python, and JavaScript.

For example, in PHP you can use facebook-webdriver like this.

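use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
// connect to the driver (the $host value is an assumption) and start Chrome
$host = 'http://localhost:9515';
$driver = RemoteWebDriver::create($host, DesiredCapabilities::chrome());
// open the URL
$driver->get('https://www.google.co.jp/');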

In JavaScript, you can use selenium-webdriver to do the same thing.

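const { Builder, Capabilities } = require('selenium-webdriver')
// start a Chrome session (run inside an async function)
const capabilities = Capabilities.chrome()
const driver = await new Builder().withCapabilities(capabilities).build()
// open the URL
await driver.get('https://www.google.co.jp/')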

It can be written in much the same way in almost any language, so if you have used Selenium in one language, you should be able to pick it up easily in another.

Yes, except for ShellScript.

Why isn't there a WebDriver for Bash or Zsh?

As far as I can see, there is no WebDriver for ShellScript, so if you want to try Selenium from the shell, you'll have a hard time.
You first need to run npm install, composer install, or the like just to get an environment ready.
I expected automating browser operations to be easier than that, but it isn't.

Selenium for Shell Hackers - Shellnium

So, Shellnium is for those who want to automate browser operations easily with ShellScript.
Browser automation can be achieved with ShellScript alone. All you need is bash or zsh.

How to perform automatic browser operations with Shell

The flow of automatic browser operation is as follows. All you have to do is POST JSON to the browser (driver) with curl.

[Figure: post_json]

  1. Browser launch
  2. URL transition
  3. Keyboard input

Let's go through these one by one to get a feel for it.

0. Launch chromedriver

First of all, you need to run chromedriver. If you don't have chromedriver, just run brew install chromedriver (or download it directly from the official website).

$ chromedriver
Starting ChromeDriver 2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363) on port 9515
Only local connections are allowed.

chromedriver now waits for requests at http://localhost:9515.
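Before sending any commands, it can help to confirm that the driver is actually accepting requests. The sketch below polls the driver's /status endpoint until it answers. The function name wait_for_driver and the retry count are illustrative choices for this example, not part of Shellnium.

```bash
# Hypothetical helper: poll the driver until it responds or the retries run out.
wait_for_driver() {
    local root=${1:-http://localhost:9515}   # driver address from the step above
    local tries=${2:-10}                     # assumed retry count, one second apart
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors as well as on refused connections
        if curl -sf -o /dev/null "${root}/status"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_driver || { echo "chromedriver is not running" >&2; exit 1; }
```

The function only reports success or failure through its exit status, so the calling script decides what to do when the driver never comes up.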

1. Launching the Browser & Getting the Session ID

Let's start the browser from Shell.

curl -X POST -H 'Content-Type: application/json' \
     -d '{"desiredCapabilities": { "browserName": "chrome" }}' \
     http://localhost:9515/session

That's it, it's that simple. If you run it, you will see the browser start up.
To send further commands to this browser, every request has to specify the session, so we need to obtain the session ID.

The session ID is returned as JSON in the response to the curl command above, so extract the sessionId field from it. Since we are using ShellScript, jq makes this easy. Of course, grep works too.

$ curl ... http://localhost:9515/session | jq -r '.sessionId'
# Get the sessionID
e1c3d895e7c7d42383c110e5e4cd7bbf
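For the grep route mentioned above, a minimal sketch could look like the following. The helper name extract_session_id is made up for this example. Because it searches for the "sessionId" key wherever it appears, it should also cope with drivers that follow the W3C WebDriver protocol, which nest the field under "value" rather than placing it at the top level.

```bash
# Hypothetical helper: read a New Session response on stdin, print the session ID.
extract_session_id() {
    grep -o '"sessionId"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | head -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (requires a running chromedriver):
#   SESSION_ID=$(curl -s -X POST -H 'Content-Type: application/json' \
#       -d '{"desiredCapabilities": { "browserName": "chrome" }}' \
#       http://localhost:9515/session | extract_session_id)
#   [ -n "$SESSION_ID" ] || { echo "could not start a session" >&2; exit 1; }
```

Storing the result in SESSION_ID is what lets the later steps build their URLs, and the empty-string check stops the script early if the browser failed to start.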

2. URL transition

curl -X POST -H 'Content-Type: application/json' \
    -d '{"url":"https://google.co.jp"}' http://localhost:9515/session/${SESSION_ID}/url

The browser navigates to Google's top page. The ${SESSION_ID} is the session ID you obtained earlier.

3. Getting elements & Keyboard input

Now let's do some Selenium-like things. First, we'll do something like findElementByName() to get the element from the name attribute.

curl -X POST -H 'Content-Type: application/json' \
    -d '{"using":"name", "value": "q"}' http://localhost:9515/session/${SESSION_ID}/element \
    | jq -r '.value.ELEMENT'

You can get an element with {"using": "locator strategy", "value": "value to match"}. Google's search box has name="q", so you can specify it as {"using": "name", "value": "q"}.

[Figure: search_box]

You can get the element ID of the search box by executing the above curl.

$ curl ... http://localhost:9515/session/${SESSION_ID}/element | jq -r '.value.ELEMENT'
# Get the element ID
0.5757440182130869-1
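The same request shape works with locator strategies other than name. As a rough sketch, the hypothetical helper locator_payload below prints the JSON body for a few of them. "css selector" and "xpath" are standard WebDriver strategy names, while the selectors themselves are only examples for Google's search box.

```bash
# Hypothetical helper: print the JSON body for a find-element request.
# Assumes neither argument contains a double quote or a backslash.
locator_payload() {
    printf '{"using":"%s","value":"%s"}' "$1" "$2"
}

locator_payload 'name' 'q'; echo
# prints {"using":"name","value":"q"}
locator_payload 'css selector' 'input[name=q]'; echo
# prints {"using":"css selector","value":"input[name=q]"}
locator_payload 'xpath' "//input[@name='q']"; echo
# prints {"using":"xpath","value":"//input[@name='q']"}
```

Any of these bodies can be passed to curl with -d in place of the hand-written JSON.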

An "element ID" is required to perform any action using the element, such as clicking on the element, keyboard input, or focus.
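Clicking, for instance, can be sketched as a POST to the element's click endpoint. The function name click_element is an assumption for this example, and it expects SESSION_ID to hold the session ID obtained earlier.

```bash
# Hypothetical helper: click the element with the given element ID.
# Assumes SESSION_ID was set from the New Session response.
click_element() {
    local elementId=$1
    curl -s -X POST -H 'Content-Type: application/json' \
        -d '{}' "http://localhost:9515/session/${SESSION_ID}/element/${elementId}/click" >/dev/null
}

# Usage:
#   click_element "0.5757440182130869-1"
```

The request body is an empty JSON object because the click needs no parameters beyond the element ID in the URL.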

In the case of keyboard input, it looks like this:

curl -s -X POST -H 'Content-Type: application/json' \
-d '{"value": ["animal\n"]}' http://localhost:9515/session/${SESSION_ID}/element/${elementId}/value

The one thing to be careful about is that the input string must be passed as an array. Appending \n to the end makes it act as the ENTER key.
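Hand-written JSON like this breaks as soon as the text contains a double quote or a backslash. One possible way around that, sketched below with the hypothetical helpers json_escape and send_keys_payload, is to escape the string before wrapping it in the array. Note that with this approach ENTER is passed as a real newline, written $'\n' in bash, rather than as the two characters \n.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Only backslash, double quote, and newline are handled in this sketch.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}       # backslash    -> \\
    s=${s//\"/\\\"}       # double quote -> \"
    s=${s//$'\n'/\\n}     # newline      -> \n
    printf '%s' "$s"
}

# Hypothetical helper: print the request body for the /value endpoint.
send_keys_payload() {
    printf '{"value": ["%s"]}' "$(json_escape "$1")"
}

send_keys_payload $'animal\n'; echo
# prints {"value": ["animal\n"]}
send_keys_payload 'say "hi"'; echo
# prints {"value": ["say \"hi\""]}
```

The output of send_keys_payload can then be handed to curl with -d, keeping the quoting logic in one place.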

Works like Selenium

If you combine the above into a single ShellScript and run it, you have a complete Selenium-style script.

#!/usr/bin/env bash

main() {
    # Open the URL
    navigate_to 'https://google.co.jp'

    # Get the search box
    local searchBox=$(find_element 'name' 'q')

    # keyboard input and searching
    send_keys "$searchBox" "animal\n"
}

get_session_id() {
    curl -s -X POST -H 'Content-Type: application/json' \
        -d '{
            "desiredCapabilities": {
                "browserName": "chrome"
            }
        }' \
        "${ROOT}/session" | jq -r '.sessionId'
}

ROOT=http://localhost:9515
SESSION_ID=$(get_session_id)
BASE_URL=${ROOT}/session/${SESSION_ID}


navigate_to() {
    local url=$1
    # Quote every expansion so values containing spaces stay a single argument
    curl -s -X POST -H 'Content-Type: application/json' -d '{"url":"'"${url}"'"}' "${BASE_URL}/url"
}

find_element() {
    local property=$1
    local value=$2
    curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"using":"'"$property"'", "value": "'"$value"'"}' "${BASE_URL}/element" | jq -r '.value.ELEMENT'
}

send_keys() {
    local elementId=$1
    local value=$2
    curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"value": ["'"$value"'"]}' "${BASE_URL}/element/${elementId}/value" >/dev/null
}

main

It's very simple: once each curl call that sends JSON is wrapped in a small function, the main function alone already reads like a complete WebDriver script.
The helper functions can be collected in a file such as selenium.sh and sourced as shown below, which keeps the script itself very short.

#!/usr/bin/env bash
source ./selenium.sh

main() {
    # Open the URL
    navigate_to 'https://google.co.jp'

    # Get the search box
    local searchBox=$(find_element 'name' 'q')

    # keyboard input and searching
    send_keys "$searchBox" "animal\n"
}
main
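One thing the script leaves out is cleanup: the browser session stays open after main returns. A possible addition to selenium.sh, sketched under the assumption that BASE_URL is set to ${ROOT}/session/${SESSION_ID} as in the script, is to delete the session when the script exits. The name delete_session is made up for this example.

```bash
# Hypothetical helper: end the session, which also closes the browser window.
# Assumes BASE_URL is ${ROOT}/session/${SESSION_ID}.
delete_session() {
    curl -s -X DELETE "${BASE_URL}" >/dev/null
}

# Run the cleanup whenever the script exits, including after an error.
trap delete_session EXIT
```

Registering the cleanup with trap rather than calling it at the end of main means the browser is closed even if an earlier command fails.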

Bonus: iTerm 🤝 Selenium

So, I attached iTerm and Selenium together with ShellScript.

While working with iTerm, the browser is automatically operating behind the scenes.

More

I've created a repository for shellnium, so have a look at that too!
https://github.com/Rasukarusan/shellnium

Reference

WebDriver
