DEV Community

Serverless web scraper in Ruby - tutorial

Marcin K. on October 05, 2019

Imagine you have this awesome web app that will make you very rich someday. This app has some end-user tests. You used Selenium to automate all the...
 
Kronos35

I am getting this error my dude:

Function<Selenium::WebDriver::Error::UnknownError>","errorMessage":"unknown error: Chrome failed to start: exited abnormally\n  (chrome not reachable)\n  (The process started from chrome location bin/headless-chromium is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
 
Marcin K.

Hmmm, I have just tried redoing this tutorial on a new Lambda function, but I was not able to replicate this issue. So it works fine in Docker, and the issue appears only after uploading to Lambda?
What is the exact selenium-webdriver gem version that you're using?

Kronos35 • Edited

Hey, I used the exact same version as in this tutorial, but it looks like they changed headless Chrome a little bit. Whatever the case, I managed to make it work by adding --disable-dev-shm-usage to the Selenium Chrome options.

You should update the tutorial to include this option.
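
A minimal sketch of that change, assuming the same Selenium::WebDriver::Chrome::Options object the tutorial builds. CHROME_ARGS and apply_chrome_args are hypothetical names used only for illustration, and the flag list is shortened; the only selenium-webdriver call relied on is add_argument. The flag tells Chrome to write its shared memory files to /tmp instead of /dev/shm, which is commonly reported to be limited or unavailable on Lambda.

```ruby
# Hypothetical, abbreviated flag list. --disable-dev-shm-usage is the
# option reported in this thread to fix "Chrome failed to start".
CHROME_ARGS = %w[
  --headless
  --no-sandbox
  --single-process
  --disable-dev-shm-usage
].freeze

# Adds every flag to any object that responds to add_argument
# (for example Selenium::WebDriver::Chrome::Options) and returns it.
def apply_chrome_args(options, args = CHROME_ARGS)
  args.each { |arg| options.add_argument(arg) }
  options
end

# Assumed usage with selenium-webdriver 3.x, as in the tutorial:
#   options = Selenium::WebDriver::Chrome::Options.new(binary: 'bin/headless-chromium')
#   apply_chrome_args(options)
```

Keeping the flags in one list makes it easy to compare what runs in Docker with what runs on Lambda.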

Marcin K.

OK, I will. Thanks for your comment.

Henry Miguel Guzmán Escorcia • Edited

My code does not work... :(

require 'json'
require 'selenium-webdriver'

def lambda_handler(event:, context:)
  setup_driver
  # driver.navigate.to 'http://www.google.com'
  # element = driver.find_element(name: 'q')
  # element.send_keys 'Pizza'
  # element.submit
  # title = driver.title
  # driver.quit
  { statusCode: 200, body: JSON.generate("Hola mundo") }
end

def setup_driver
  options = Selenium::WebDriver::Chrome::Options.new(binary: 'bin/headless-chromium')
  options.add_argument('--headless')
  options.add_argument('--disable-gpu')
  options.add_argument('--window-size=1280x1696')
  options.add_argument('--disable-application-cache')
  options.add_argument('--disable-infobars')
  options.add_argument('--no-sandbox')
  options.add_argument('--hide-scrollbars')
  options.add_argument('--enable-logging')
  options.add_argument('--log-level=0')
  options.add_argument('--single-process')
  options.add_argument('--ignore-certificate-errors')
  options.add_argument('--homedir=/tmp')
  service = Selenium::WebDriver::Service.chrome(path: 'bin/chromedriver')
  @driver = Selenium::WebDriver.for :chrome, service: service, options: options
  # @driver.manage.timeouts.implicit_wait = 30
end

START RequestId: c6fca6cd-e2e6-4782-8ca6-e10197058471 Version: $LATEST
Error raised from handler method{
  "errorMessage": "unable to connect to chromedriver 127.0.0.1:9515",
  "errorType": "Function<Selenium::WebDriver::Error::WebDriverError>",
  "stackTrace": [
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:200:in `connect_until_stable'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:111:in `block in start'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/socket_lock.rb:41:in `locked'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:108:in `start'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:303:in `service_url'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/chrome/driver.rb:40:in `initialize'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:46:in `new'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:46:in `for'",
    "/var/task/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver.rb:88:in `for'",
    "/var/task/prueba.rb:30:in `setup_driver'",
    "/var/task/prueba.rb:5:in `lambda_handler'"
  ]
}END RequestId: c6fca6cd-e2e6-4782-8ca6-e10197058471
REPORT RequestId: c6fca6cd-e2e6-4782-8ca6-e10197058471  Duration: 20236.86 ms   Billed Duration: 20300 ms   Memory Size: 512 MB Max Memory Used: 62 MB  Init Duration: 303.13 ms

source 'https://rubygems.org'
gem 'selenium-webdriver'

Chrome 81.0.4044.138
stable-headless-chromium-amazonlinux-2017-03.zip

AWS Lambda

Please help me.

Marcin K.

Hi,

Sorry for the late reply.
Your code and Gemfile are OK.
It looks like you're running it on Ruby 2.7 in Lambda, which is not compatible with this chromedriver version.
Unfortunately, chromedriver must be compatible with both your serverless Chrome and your Ruby version, and it's not easy to find a match.
The easiest solution, for now, would be to downgrade to Ruby 2.5 in Lambda: just create a new Lambda function with this version.
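
One part of that match can be checked mechanically. The sketch below compares the major version numbers reported by chromedriver and by the Chrome binary. It assumes chromedriver uses Chrome's own version scheme (true for chromedriver 73 and later; older 2.x releases are numbered differently), and the helper names major_version and same_major? are hypothetical. It says nothing about the Ruby runtime or the system libraries available in Lambda, so treat it as a first sanity check only.

```ruby
# Extracts the major version from a "--version" style string.
# Returns nil when the string contains no dotted version number.
def major_version(version_output)
  match = version_output.match(/(\d+)\.\d+/)
  match && match[1].to_i
end

# True only when both strings contain a version and the majors are equal.
def same_major?(chromedriver_output, chrome_output)
  driver = major_version(chromedriver_output)
  chrome = major_version(chrome_output)
  !driver.nil? && driver == chrome
end

# Assumed usage from the function package root (paths as in this thread;
# the exact text each binary prints is an assumption):
#   same_major?(`bin/chromedriver --version`, `bin/headless-chromium --version`)
```

If the majors differ, replacing one of the two binaries is usually a quicker first step than changing the Lambda runtime.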

Hoonki

Hello. AWS plans to deprecate Ruby 2.5 soon,
so we have to migrate from Ruby 2.5 to 2.7.
How can I find a chromedriver version compatible with Ruby 2.7? Can you send me a reference? Thank you.

activklaus

Hi Marcin,
AWS will stop supporting Ruby 2.5 in a few weeks. Do you have any update on a chromedriver compatible with Ruby 2.7?

Your article was the most helpful source for creating a scraper with selenium and ruby for AWS lambda (great work btw!).

So I was hoping you have some news about how to build the scraper with Ruby 2.7.
Thanks

Kronos35

I am working on that as well. If you find a way to do it, send me a message, and I'll share the info I gather with you too.

Hoonki

Hello Kronos, did you find the solution? I am working on that too, but I don't have a solution so far. If you find one, could you tell me about it? Thank you.

Kronos35

As a matter of fact, I did. I posted a short answer to a question on Stack Overflow.
I provided some guidance there; you can check my solution here:

stackoverflow.com/questions/678419...

Kronos35 • Edited

Now that Ruby 2.5 is being deprecated by the end of July, it'd be useful to update this tutorial to include a compatible chromedriver binary.

Otherwise this tutorial, and all projects inspired by it, would be rendered useless.

Asha E • Edited

When I run this command, I get the init error shown after it:
docker run --rm -v "$PWD":/var/task --mount type=tmpfs,target=/dev/shm,readonly=true lambci/lambda:ruby2.5 lambda_function.lambda_handler

Init error when loading handler lambda_function.lambda_handler

"errorMessage": "Could not find childprocess-3.0.0 in any of the sources",
"errorType": "InitBundler::GemNotFound",

I used the same code and gem versions as yours.

Asha E

I had to create Ruby layers:
stackoverflow.com/questions/536342...

Thanks for this post

Angel Buzany

Hi Marcin, I have a question

In the step "Install chromedriver and serverless chrome" where should I run the commands?

Kronos35

In your bash console. I assume this was developed on Linux, so to open a terminal press Ctrl+Alt+T. There you could use the cd command to change to the directory you're working in and download the drivers directly there.