Intro
First of all, I want to clarify that this is an alternative way to reach a goal that can be achieved through a much better documented path on the web: running a docker-compose setup where one of the services is a Selenium standalone instance, and our application connects to it via localhost:4444. I am also willing to improve, and any constructive criticism that helps the project or my own knowledge is more than welcome; please, if it is not respectful, do not comment. I really don’t know why I decided not to follow the better documented path. There is plenty of help on the internet, and when I tried it, it worked. But I was not comfortable with my app depending on the Selenium standalone service, mainly because it was slow. So I decided to look for an alternative.
Story
For work, I needed to build an RPA bot that logs in to a home banking site, navigates to the bank statement section, downloads the records for the next 5 business days, processes them (the fun, hard part that has nothing to do with this post) and loads them into our ERP, where they will later be used for reports and administrative tasks. The idea is simple. The problem is that the home banking site has no API (disappointing) and was a bit difficult to scrape, as it was built with little enthusiasm, so I decided to use Selenium to download the information in CSV format and start processing it. I should mention that my project is written in TypeScript and runs in a Docker container based on an Ubuntu Server 22.04 image. The approach can also be easily adapted to other technologies.
Selenium standalone
As I said in the introduction, the easiest option was a container running Selenium standalone, with the app connecting to it over the network. This felt slow and forced, it didn’t behave the way it did on my machine (which is exactly what I’m looking for when dockerizing something), and I just wasn’t convinced. I admit that it is a good tool and you can probably use it better than I did. There is also a way to run a container with a grid of Selenium instances, but I did not see the need for it; as they say in Argentina, it seemed like “smoke” to me.
Process
Having decided to run Selenium on the server as if it were on my machine, I started with some tests, which I recommend to get an idea of what is going to happen. First we need to pin our dependencies: Chromedriver, Chrome (yes, we treat it as a dependency here) and selenium-webdriver (which is less critical). In my case, I’m using Chrome version 105.x.x (we’ll lock it in the Dockerfile), so my Chromedriver version is locked at 105.0.0. My “package.json” pins these versions, since I developed the project in TypeScript.
Then I made sure to get the best performance out of my headless webdriver (headless because it runs on a server). For this I pass a set of arguments to Chrome.
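As a minimal sketch of what such an argument list can look like: the flags below are common choices for headless Chrome inside containers, and they are an assumption on my part, not necessarily the exact set used in this project. The helper name `buildChromeArgs` and its options are hypothetical, introduced only for illustration.

```typescript
// Hypothetical options for illustration; none of these names come from the project.
interface HeadlessOptions {
  width?: number;        // browser window width in pixels
  height?: number;       // browser window height in pixels
  inContainer?: boolean; // add flags that are usually needed inside Docker
}

// Builds the list of Chrome command-line flags for a headless run.
function buildChromeArgs(options: HeadlessOptions = {}): string[] {
  const { width = 1920, height = 1080, inContainer = true } = options;

  const args: string[] = [
    "--headless",                       // run without a visible window
    "--disable-gpu",                    // no GPU is available on a typical server
    `--window-size=${width},${height}`, // fixed viewport so page layout is predictable
  ];

  if (inContainer) {
    // Chrome's sandbox usually cannot start when running as root in a container,
    // and /dev/shm is often too small there, so Chrome is told to use /tmp instead.
    args.push("--no-sandbox", "--disable-dev-shm-usage");
  }

  return args;
}
```

With selenium-webdriver, a list like this can be passed through `new chrome.Options().addArguments(...buildChromeArgs())` and then `new Builder().forBrowser("chrome").setChromeOptions(options).build()`. Keeping the flags in a plain function makes it easy to use a different set on your own machine and inside the container.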
With my webdriver up and running, I needed to make sure it would work on an instance of Ubuntu Server (or your preferred server OS). When doing this you will notice that a Linux without a graphical interface will most likely not have Chrome installed. We also need a specific version of Chrome, otherwise our webdriver will break. I found a way to download that exact version on Ubuntu.
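Since a version mismatch between Chrome and Chromedriver is what breaks the webdriver, a small check at startup can catch it early. The following is a minimal sketch, assuming the two strings come from the output of `google-chrome --version` and `chromedriver --version`; the function names are hypothetical, and only the major version is compared.

```typescript
// Extracts the major version from strings such as
// "Google Chrome 105.0.5195.102" or "ChromeDriver 105.0.5195.52 (...)".
// Returns null when no version number is found.
function majorVersion(versionOutput: string): number | null {
  const match = versionOutput.match(/(\d+)\.\d+\.\d+(?:\.\d+)?/);
  return match ? Number(match[1]) : null;
}

// True only when both strings contain a version and the major numbers are equal.
function versionsMatch(chromeOutput: string, driverOutput: string): boolean {
  const chrome = majorVersion(chromeOutput);
  const driver = majorVersion(driverOutput);
  return chrome !== null && chrome === driver;
}
```

In a real project the two strings could be read with Node’s `child_process.execSync`, and the app could refuse to start when `versionsMatch` returns false, instead of failing later in the middle of a run.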
With Chrome installed, we run our tests and, surprisingly, it works. In the end it was easy! Now we have to put this in a container, so we create a Dockerfile for our project, similar to mine (always keeping in mind that I chose to develop in TypeScript and use Ubuntu Server 22.04).
Finally, we modify our “package.json” by adding Docker scripts (such as docker:run), changing the project name to match our own.
Now we go to our terminal, run what we have to run (in my case, npm run docker:run), and WOW!
Conclusions
I do not know if it is the best way, the shortest, the most optimal or the most practical. What I do know is that it worked for me and that I could not find documentation for it. I hope it will be useful to some traveler who needs to solve the same problem. Thanks so much for reading.