DEV Community

Muhammad MP

Create a feed reader bot with PHP (2)

In the previous part, we worked out the main functionality of our bot and implemented three commands: list the feeds, add a new feed, and remove an existing one.

In this part, we create another file that reads the feeds. Note that bot.php is called by the Telegram server, while this new script should be called by a job scheduler (cron, for example) every five minutes, or at any interval you prefer. I use cron-job.org for this; it is free and easy to use.

Creating cron.php

Create a file named cron.php in the project directory, require autoload.php and our configuration files, and create an instance of TeleBot:



<?php

use TeleBot\TeleBot;

require_once __DIR__ . '/vendor/autoload.php';

// Load the bot settings (bot_token, owner_user_id, ...) and the subscribed feeds
$config = require_once __DIR__ . '/config/bot.php';
$feeds = json_decode(file_get_contents(__DIR__ . '/config/feeds.json'))->feeds;

$tg = new TeleBot($config['bot_token']);


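If feeds.json is missing or contains invalid JSON, the loading line above only produces warnings and leaves $feeds as null. A slightly more defensive loader could look like the sketch below. The function name loadFeeds() is hypothetical and not part of the project; it assumes the same feeds.json shape used in this tutorial (an object with a "feeds" array).

```php
<?php

/**
 * Hypothetical helper: load the feed objects from feeds.json.
 * Returns an empty array when the file does not exist yet.
 */
function loadFeeds(string $path): array
{
    if (!is_file($path)) {
        return [];
    }

    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException instead of a silent null.
    $data = json_decode(file_get_contents($path), false, 512, JSON_THROW_ON_ERROR);

    // Same shape as in cron.php: an object whose "feeds" property is an array of stdClass objects.
    return $data->feeds ?? [];
}

// Usage in cron.php:
// $feeds = loadFeeds(__DIR__ . '/config/feeds.json');
```

Throwing on bad JSON stops the cron run loudly instead of silently sending nothing.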

Reading the feeds

Next, we add the following code:



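$links = "<b>🆕 New posts:</b>\n\n";
$linkCount = 0;

// Dev.to rejects our requests without a User-Agent, so we send one with every request.
$context = stream_context_create([
    "http" => [
        "method" => "GET",
        "header" => "User-Agent: FeedReaderBot",
    ],
]);

foreach ($feeds as $feed) {
    // Download the feed and parse the RSS XML.
    $feedContent = new SimpleXMLElement(file_get_contents($feed->url, false, $context));

    foreach ($feedContent->channel->item as $item) {
        // Escape <, > and & so Telegram's HTML parser accepts titles and links.
        $url = htmlspecialchars((string) $item->link);
        $title = htmlspecialchars((string) $item->title);
        $links .= "<a href=\"{$url}\">▪️ {$title}</a>\n";
        $linkCount++;
    }
}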

if ($linkCount === 0) {
    die();
}

$tg->sendMessage([
    'chat_id' => $config['owner_user_id'],
    'text' => $links,
    'parse_mode' => 'html',
    'disable_web_page_preview' => true,
]);


  1. Dev.to refuses to serve the feed to our plain request; once we add a User-Agent header, we no longer look like a bot. 😎
  2. We parse the XML with the SimpleXMLElement class. It is built into PHP, so, just as with JSON, we can parse it easily without installing any packages.
  3. We build the list of links in a nested foreach loop.
  4. After the outer loop has gone through all of the feed objects, we send a single message, using HTML formatting (because the list contains <a> tags) and disabling link previews.
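To try the parsing step without contacting Dev.to, here is a minimal sketch that builds the same kind of message from an RSS string. The function name buildLinkList() is hypothetical. The htmlspecialchars() calls are an addition: Telegram's HTML mode requires <, > and & outside of tags to be written as entities, so a title such as "Tips & Tricks" would otherwise break the message.

```php
<?php

/**
 * Hypothetical helper: turn an RSS 2.0 document into the HTML list cron.php sends.
 * Returns the message text and the number of links it contains.
 */
function buildLinkList(string $xml): array
{
    $feedContent = new SimpleXMLElement($xml);
    $links = "<b>🆕 New posts:</b>\n\n";
    $linkCount = 0;

    // RSS 2.0 keeps its entries under <channel><item>.
    foreach ($feedContent->channel->item as $item) {
        // Escape <, > and & so Telegram's HTML parser accepts the text.
        $url = htmlspecialchars((string) $item->link);
        $title = htmlspecialchars((string) $item->title);
        $links .= "<a href=\"{$url}\">▪️ {$title}</a>\n";
        $linkCount++;
    }

    return [$links, $linkCount];
}

// A tiny sample feed; SimpleXML decodes &amp; to "&" when reading the title.
$rss = <<<XML
<rss version="2.0"><channel>
  <item><title>Tips &amp; Tricks</title><link>https://dev.to/a</link></item>
  <item><title>Second post</title><link>https://dev.to/b</link></item>
</channel></rss>
XML;

[$text, $count] = buildLinkList($rss);
echo $text;
```

Keeping the parsing in a function that takes a string makes it easy to test with saved feed files instead of live requests.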

Now, open the address of cron.php in your web browser; the bot should send you an elegant list in Telegram:
Our elegant list

Oh, wait! If I refresh the page, it sends the same list again! That is unacceptable: we only want the new posts, not simply the latest ones!

Only new posts!

Yes, you are completely right! To avoid this problem, we must save the link of the latest post in each feed object:



{
    "url": "feed-url-here",
    "reader": "dev.to",
+    "last_item_url": "some-url-here"
}



Add the new field in bot.php, where a new feed is appended to the list:



$feeds[] = [
    'url' => $url,
    'reader' => 'dev.to',
    'last_item_url' => '',
];


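Feeds that were added before this change have no last_item_url field, so reading $feed->last_item_url for them in cron.php raises an "Undefined property" warning. One way to handle this, sketched here as an assumption rather than part of the project, is a one-off helper that fills in the missing field. The name addMissingLastItemUrl() is hypothetical.

```php
<?php

/**
 * Hypothetical one-off migration: give every feed object a last_item_url field.
 * Feeds that already have one are left untouched. Returns how many were updated.
 */
function addMissingLastItemUrl(string $path): int
{
    $data = json_decode(file_get_contents($path));
    $updated = 0;

    foreach ($data->feeds as $feed) {
        if (!property_exists($feed, 'last_item_url')) {
            // An empty string means "nothing seen yet", exactly like a freshly added feed.
            $feed->last_item_url = '';
            $updated++;
        }
    }

    file_put_contents($path, json_encode($data));

    return $updated;
}
```

You would run it once, for example with addMissingLastItemUrl(__DIR__ . '/config/feeds.json'), before the next cron run.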

We also need some changes in cron.php:



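$links = "<b>🆕 New posts:</b>\n\n";
$linkCount = 0;

// Dev.to rejects our requests without a User-Agent, so we send one with every request.
$context = stream_context_create([
    "http" => [
        "method" => "GET",
        "header" => "User-Agent: FeedReaderBot",
    ],
]);

foreach ($feeds as $feed) {
    $feedContent = new SimpleXMLElement(file_get_contents($feed->url, false, $context));

    // The first item is the most recent post.
    $latestPostLink = (string) $feedContent->channel->item[0]->link;

    // Nothing new in this feed: skip it and check the next one.
    if ($latestPostLink === $feed->last_item_url) {
        continue;
    }

    foreach ($feedContent->channel->item as $item) {
        // Stop once we reach the post we already sent last time.
        if ((string) $item->link === $feed->last_item_url) {
            break;
        }

        $url = htmlspecialchars((string) $item->link);
        $title = htmlspecialchars((string) $item->title);
        $links .= "<a href=\"{$url}\">▪️ {$title}</a>\n";
        $linkCount++;
    }

    // Remember the newest link for the next run.
    $feed->last_item_url = $latestPostLink;
}

if ($linkCount === 0) {
    die();
}

// Persist the updated feed objects before sending the message.
file_put_contents(__DIR__ . '/config/feeds.json', json_encode(['feeds' => $feeds]));

$tg->sendMessage([
    'chat_id' => $config['owner_user_id'],
    'text' => $links,
    'parse_mode' => 'html',
    'disable_web_page_preview' => true,
]);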




If the latest post's link is the same as the one already stored in feeds.json, we skip the current feed object and move on to the next one. Otherwise, we append items to the list until we reach the saved link. Finally, we assign $latestPostLink to the feed object's last_item_url property. Because json_decode() gave us objects rather than arrays, this change is also visible in the $feeds array, which we save back to feeds.json after the loop.
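The "stop at the saved link" rule can be isolated in a small pure function, which makes it easy to check without a network connection. This is a sketch: newItemLinks() is a hypothetical name, and it works on a plain array of links ordered newest first, as items in a typical RSS feed are.

```php
<?php

/**
 * Hypothetical helper: return the links that are newer than $lastItemUrl.
 * $links must be ordered newest first.
 */
function newItemLinks(array $links, string $lastItemUrl): array
{
    $new = [];

    foreach ($links as $link) {
        // Everything from the saved link onwards was already sent.
        if ($link === $lastItemUrl) {
            break;
        }
        $new[] = $link;
    }

    return $new;
}

$links = ['https://dev.to/c', 'https://dev.to/b', 'https://dev.to/a'];

print_r(newItemLinks($links, 'https://dev.to/b')); // only https://dev.to/c is new
print_r(newItemLinks($links, 'https://dev.to/c')); // nothing new
print_r(newItemLinks($links, ''));                 // first run: every link is new
```

One consequence of this rule: if the saved link has dropped out of the feed entirely, every item currently in the feed is treated as new.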

Add a cron-job

We do not want to open the URL ourselves every five minutes; if we had that much time, we would not need RSS/Atom/JSON feeds! So create a cron job that sends a request to our script every five minutes.
Creating a cron-job in cron-job.org

Hooray! Wait for the updates...
Thank you for following the tutorial; I hope you enjoyed it.


We used TeleBot to build this bot; you can support me by starring the repository:
https://github.com/muhammadmp97/TeleBot

And here is what we have built so far:
https://github.com/muhammadmp97/FeedReaderBot/tree/d238ac8429db6777b24a51f8618d17c2547a7254

Top comments (2)

Ali Bayat

Good article! I just have a few minor notes that might come in handy when you want to scale this app up for production.

As I see, there are references to the SimpleXMLElement class along with the file_get_contents and file_put_contents functions, which is totally cool in any generic PHP application. But as you may know, those functions do blocking, synchronous work behind the scenes.

I would suggest using Guzzle's async approach for the network calls (it might show some strange behaviors, but it won't hold up php-fpm).

An asynchronous implementation of the file system can be a bit tricky, but since we have Fibers in newer versions of PHP, that's not going to be a problem. You can do stuff like:

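namespace Async; // the built-in file_get_contents() cannot be redeclared in the global namespace

use Fiber;
use RuntimeException;

function file_get_contents(string $pathname): string
{
    $result = '';
    $fp = fopen($pathname, 'rb');

    if ($fp === false) {
        throw new RuntimeException("Cannot open {$pathname}");
    }

    while (!feof($fp)) {
        $chunk = 1024;
        // Inside a Fiber, yield to the scheduler; it may resume us with a different chunk size.
        Fiber::getCurrent() && ($chunk = (Fiber::suspend() ?: 1024));
        $result .= fread($fp, $chunk);
    }

    fclose($fp);

    return $result;
}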

And finally, cron jobs are what all PHP developers are used to, but nowadays we have various event loops in PHP.

Keep up the good work.

Muhammad MP

Thank you for reading and providing these good points.

I might improve the whole project after finishing the tutorial and make it flexible enough to support different types of feeds. We could have a feed reader for Twitter and other social networks as well! 😍