DEV Community: Olavo

Old PHP 5, cURL, and TLS 1.2 = SSL connect error

Olavo — Thu, 18 May 2023 20:01:41 +0000

You have good old PHP projects that never break: they work well and give you nothing to worry about. But they use cURL for API connections, and one day you suddenly get this error:

SSL connect error

Immediately you think… it'll be easy to fix. Just force cURL to ignore SSL checks:

CURLOPT_SSL_VERIFYPEER    => false,
CURLOPT_SSL_VERIFYHOST    => false,

Now, just relax and test:

SSL connect error

At that point you start to worry about how to fix it. Then you get an idea: force the TLS version and cross your fingers.

$ch = curl_init('https://google.com/');

// Force requests to use TLS 1.2 (6 is the value of CURL_SSLVERSION_TLSv1_2)
curl_setopt($ch, CURLOPT_SSLVERSION, 6);

$result = curl_exec($ch);
curl_close($ch);

And again:

SSL connect error

Ok, maybe now is a good time to start crying.

The solution

Jokes aside, what happens is that cURL can only speak the TLS versions supported by the SSL library it was built against. If that library is too old to negotiate TLS 1.2, neither skipping the certificate checks nor setting CURLOPT_SSLVERSION will help.
On the other hand, it may be impossible to update the infrastructure of some systems, especially if they are isolated and were designed a long time ago around old versions of Apache and PHP.
Luckily for us, we have stream contexts 👏👏👏
stream_context_create() creates and returns a stream context with any options supplied.

stream_context_create(?array $options = null, ?array $params = null): resource

As with cURL, a stream context lets you set headers, choose the method, and pass parameters, like this:

$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"Accept-language: en\r\n" .
              "Cookie: foo=bar\r\n"
  )
);

$context = stream_context_create($opts);

/* Sends an http request to www.example.com 
   with additional headers shown above */
$fp = fopen('http://www.example.com', 'r', false, $context);
fpassthru($fp);
fclose($fp);

One reason why a stream context might work when cURL does not is that PHP streams go through PHP's own OpenSSL extension rather than the SSL library cURL was built against, so they may be able to negotiate TLS 1.2 even when the installed cURL cannot. Stream contexts can also be made more forgiving of server issues. For example, if a server certificate is invalid or expired, cURL may fail to connect, whereas a stream context may still establish a connection if the verify_peer option is set to false.
Wow, that was close!

Finally, we have a solution and I wanna share it with you. Maybe it can help you to sleep better :)

// Requiring TLS 1.2:
$ctx = stream_context_create([
    'ssl' => [
        'crypto_method' => STREAM_CRYPTO_METHOD_TLSv1_2_CLIENT
    ]
]);
$html = file_get_contents('https://google.com/', false, $ctx);

Hope it helps you!
Cheers, and let's stay connected on LinkedIn!

Nodejs Asynchronous Multithreading Web Scraping

Olavo — Thu, 18 May 2023 19:57:01 +0000

Reading online data multiple times faster ;)

What is Web Scraping?

Web scraping is the process of extracting data from websites. In today’s world, web scraping has become an essential technique for businesses and organizations to gather valuable data for their research and analysis. Node.js is a powerful platform that enables developers to perform web scraping in an efficient and scalable manner.

What is Multithreaded Web Scraping?

Multithreaded web scraping is a technique that involves dividing the web scraping task into multiple threads. Each thread performs a specific part of the scraping process, such as downloading web pages, parsing HTML, or saving data to a database. By using multiple threads, the scraping process can be performed in parallel, which can significantly improve the speed and efficiency of the scraping task.

Why use Multithreaded Web Scraping?

There are several reasons why multithreaded web scraping is beneficial. Firstly, it can significantly reduce the time required to scrape large amounts of data from multiple websites. Secondly, it can improve the performance of the scraping process by using the machine’s resources more efficiently. Keep in mind, though, that all workers on one machine still share a single IP address, so parallelism alone will not stop a website from blocking you for sending too many requests; throttling each worker helps avoid that.

How to implement Multithreaded Web Scraping in Node.js?

To implement this in Node.js, we can use the built-in “cluster” module. Strictly speaking, cluster creates child processes rather than threads: the workers run in parallel and communicate with the primary process by passing messages over an IPC channel, not through a shared memory space. By creating multiple worker processes, we can distribute the scraping task across all available CPU cores.
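As a rough illustration of that idea (not the code from the GitHub repository), here is a minimal sketch of a primary process that forks one worker per CPU and hands each a share of the pages over IPC. The names assignPages, startPrimary, startWorker and the scrapePage callback are hypothetical, chosen only for this example.

```javascript
const cluster = require('node:cluster');
const os = require('node:os');

// Deal page numbers 1..totalPages round-robin into one bucket per worker.
function assignPages(totalPages, workerCount) {
  const buckets = Array.from({ length: workerCount }, () => []);
  for (let page = 1; page <= totalPages; page++) {
    buckets[(page - 1) % workerCount].push(page);
  }
  return buckets;
}

// Primary side: fork one worker per CPU and send it its pages as a message.
function startPrimary(totalPages) {
  const workerCount = os.cpus().length;
  assignPages(totalPages, workerCount)
    .filter((pages) => pages.length > 0) // no idle workers when pages < CPUs
    .forEach((pages) => {
      const worker = cluster.fork();
      worker.on('online', () => worker.send({ pages }));
      worker.on('message', (msg) => console.log(`worker ${worker.id} finished`, msg));
    });
}

// Worker side: wait for the page list, scrape each page, report back, exit.
// scrapePage is a hypothetical async function supplied by the caller.
function startWorker(scrapePage) {
  process.on('message', async ({ pages }) => {
    for (const page of pages) {
      await scrapePage(page);
    }
    // Exit only after the report has been handed to the IPC channel.
    process.send({ pages, count: pages.length }, () => process.exit(0));
  });
}
```

Appending a line such as `if (cluster.isPrimary) startPrimary(10); else startWorker(myScrapePage);` to the same file would launch it; cluster.isPrimary assumes Node.js 16 or newer, and older versions expose the same flag as cluster.isMaster.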

Running the code
In this code example, we use tabnews.com.br as the target. The objective is to generate JSON files listing each article’s title and URL for every page.

Our code will:

1. Start the primary (master) process and fork one worker process per available CPU;

2. Apply the web scraping engine in each worker;

3. Read the page, generate the screenshot, and break the content down into the article list;

4. Save a .json file with each article’s title and URL;

5. Finish the process and start another.
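Steps 3 and 4 can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the repository's code: it takes HTML that has already been downloaded, pulls titles and URLs out of anchor tags with a regular expression instead of a headless browser, and skips the screenshot. The function names, the page-N.json file naming, and the output directory are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Extract { title, url } pairs from the anchor tags of an HTML string.
// A regex is enough for a sketch; a real scraper would use an HTML parser.
function extractArticles(html, baseUrl) {
  const articles = [];
  const anchor = /<a\s[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi;
  for (const match of html.matchAll(anchor)) {
    // Drop nested tags such as <b> and surrounding whitespace from the title.
    const title = match[2].replace(/<[^>]+>/g, '').trim();
    if (title) {
      // Resolve relative links against the site's base URL.
      articles.push({ title, url: new URL(match[1], baseUrl).href });
    }
  }
  return articles;
}

// Write one JSON file per listing page and return its path.
function saveArticles(outputDir, pageNumber, articles) {
  fs.mkdirSync(outputDir, { recursive: true });
  const file = path.join(outputDir, `page-${pageNumber}.json`);
  fs.writeFileSync(file, JSON.stringify(articles, null, 2));
  return file;
}
```

A worker could then call `saveArticles('./output', page, extractArticles(html, 'https://www.tabnews.com.br'))` for each page it is assigned. In practice the anchor pattern would need narrowing so that only article links are matched.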

The Code!

Get all code on GitHub.

Let’s stay connected

Hope it’s useful and you enjoy it!

Connect with me on LinkedIn and follow me to see what comes next ;)

Cya! :)

Creating a Google Chrome Extension using HTML + CSS + Javascript

Olavo — Thu, 18 May 2023 19:50:23 +0000

An extension can be useful for several projects and specific situations, so I decided to share this one to help you understand a little more about how useful and simple to implement a Google Chrome extension can be. In addition, if you are a regular reader of TabNews, you will have an easy way to follow the articles.

You can check this out on GitHub.

Feel free to send a PR and improve the implementation of this small project. Below I describe in a little more detail how it works and some of the challenges in implementing it.

TabNews Reader

TabNews RSS reader with recent article listing function and Dark Mode option enabled according to user’s default selection.
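
One plausible way to wire up the Dark Mode option, assuming the "user's default selection" means the system or browser color-scheme preference, is the standard prefers-color-scheme media query. The sketch below is not taken from the extension's source; pickTheme, applyTheme, and the data-theme attribute are hypothetical names.

```javascript
// Decide the theme from the color-scheme preference.
// matchMediaFn is passed in so the logic can run outside a browser too.
function pickTheme(matchMediaFn) {
  return matchMediaFn('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
}

// Tag the popup's root element so the CSS can style each theme
// (the attribute name is an arbitrary choice for this sketch).
function applyTheme(rootElement, theme) {
  rootElement.setAttribute('data-theme', theme);
}
```

In the popup script this could be called as `applyTheme(document.documentElement, pickTheme((query) => window.matchMedia(query)))`.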

Challenges
Naturally, this is not an official TabNews project, so the browser blocks direct access to the RSS feed via CORS. To get around this, I used a free proxy that simply loads the data and returns it to the application.
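
The proxy workaround might look roughly like this. The proxy address below is a placeholder, and the feed path is an assumption rather than something stated in this post; buildProxyUrl and loadFeedXml are hypothetical helper names.

```javascript
// Hypothetical values: replace with the proxy and feed URL you actually use.
const PROXY_BASE = 'https://cors-proxy.example/raw?url=';
const FEED_URL = 'https://www.tabnews.com.br/recentes/rss';

// The proxy receives the real target as an encoded query parameter.
function buildProxyUrl(proxyBase, targetUrl) {
  return proxyBase + encodeURIComponent(targetUrl);
}

// Fetch the feed through the proxy and return the raw XML text.
// fetchImpl defaults to the global fetch available in browsers.
async function loadFeedXml(fetchImpl = fetch, proxyBase = PROXY_BASE, feedUrl = FEED_URL) {
  const response = await fetchImpl(buildProxyUrl(proxyBase, feedUrl));
  if (!response.ok) {
    throw new Error(`Proxy responded with HTTP ${response.status}`);
  }
  return response.text();
}
```

The returned XML can then be parsed in the popup, for example with the browser's DOMParser. Bear in mind that a third-party proxy sees every request, so it is best kept to public data like this feed.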

Usage
Open Google Chrome;
More Tools;
Extensions;
Activate “Developer mode” at the top right;
Click the “Load unpacked” button at the top left and select the extension’s folder;
Done. The TabNews Reader extension will be installed in your browser. Just access it along with your other extensions.

Google Chrome Store
It is possible to package and submit the extension to the Chrome Web Store, but extension developers must pay a one-time registration fee. As I am not registered as a developer, I will not publish it to the store (for now) ;)

For those who want to know more about uploading their extension to the Google Chrome Store: https://developer.chrome.com/docs/webstore/register/

Finishing
Well, that’s it. I hope this material is useful to you and your projects. Follow me on LinkedIn and stay on top of many things to come ;)