DEV Community

Eakam

Telescope - filtering feed URLs - progress

I have been working on filtering feed URLs for Telescope (issue 3688). I started by looking at the main blog hosts mentioned in the issue: dev.to, medium.com, blogspot.com, and wordpress.com. For most of them, I was able to find blog URLs to test with and see which feed URLs the feed discovery service returned for them.

For wordpress.com, I had to create an account and play around with the UI to find out how posts are created and how to visit the site. Basically, once you add a post and publish it, you can click Visit Site to get redirected to the site URL. I used this URL to get the list of feed URLs for wordpress.com.

To find out which feed URLs could be used for viewing posts, I simply viewed their contents in the browser. If the output was unreadable, I navigated to the URL in a new tab in Firefox, downloaded the response to a file, and opened the file in VS Code. Then, I used an XML formatter to make the contents more readable and confirmed that the response contained the posts for the blog.
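The manual check above could also be approximated in code. This is only a rough sketch, not what I actually did: the function name countFeedPosts is made up for illustration, and it assumes that posts appear as <item> elements (RSS) or <entry> elements (Atom). It uses a regular expression rather than a real XML parser, so it is a quick sanity check and nothing more.

```typescript
// Hypothetical helper: roughly counts the posts in a feed document.
// Assumes RSS feeds list posts as <item> and Atom feeds as <entry>.
function countFeedPosts(feedXml: string): number {
  // Match opening tags only, e.g. "<item>" or "<entry xml:lang=...>",
  // but not longer tag names such as "<items>".
  const matches = feedXml.match(/<(item|entry)[\s>]/g);
  return matches ? matches.length : 0;
}

// Example: a tiny hand-written RSS document with two posts.
const sampleRss =
  "<rss><channel><title>Blog</title>" +
  "<item><title>First</title></item>" +
  "<item><title>Second</title></item>" +
  "</channel></rss>";

console.log(countFeedPosts(sampleRss));
```

A count of zero would suggest that the URL does not return the posts for the blog, or that it is not a feed at all.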

Once I had collected a list of valid feed URLs for the various hosts, I noticed that there were three patterns:

  • https://.../feed/userName (dev.to and medium.com)
  • https://blogName.blogspot.com/feeds/posts/default
  • https://blogName.wordpress.com/feed
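To make the three patterns concrete, they can be written as regular expressions. This is a sketch for illustration only. The names feedPatterns and classifyFeedUrl are hypothetical and are not part of Telescope, and the expressions assume the URL shapes exactly as listed above.

```typescript
// Hypothetical patterns mirroring the three URL shapes listed above.
const feedPatterns: { name: string; pattern: RegExp }[] = [
  // https://<host>/feed/<userName> (dev.to and medium.com)
  { name: "feed/userName", pattern: /^https:\/\/[^/]+\/feed\/[^/]+\/?$/ },
  // https://<blogName>.blogspot.com/feeds/posts/default
  {
    name: "blogspot",
    pattern: /^https:\/\/[^/.]+\.blogspot\.com\/feeds\/posts\/default\/?$/,
  },
  // https://<blogName>.wordpress.com/feed
  { name: "wordpress", pattern: /^https:\/\/[^/.]+\.wordpress\.com\/feed\/?$/ },
];

// Returns the name of the first matching pattern, or null if none match.
function classifyFeedUrl(url: string): string | null {
  const match = feedPatterns.find(({ pattern }) => pattern.test(url));
  return match ? match.name : null;
}

console.log(classifyFeedUrl("https://dev.to/feed/userName"));
console.log(classifyFeedUrl("https://blogName.blogspot.com/feeds/posts/default"));
console.log(classifyFeedUrl("https://blogName.wordpress.com/feed"));
```

Note that a feed served from some other domain, for example https://blog.example.com/feed, would match none of these patterns.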

I also found that these blog hosts offer an option to set up a custom domain. Initially, my plan was to use some sort of whitelist that only allowed valid feed URLs. However, with custom domains, this could cause false positives or false negatives. So, I decided to use a blacklist instead. Only a couple of unwanted feed URLs were returned, such as the WordPress comments feed: https://blogName.wordpress.com/comments/feed.
I would simply add a function to filter out any feed URLs that match a pattern in the blacklist. For example, a feed URL that ends with /comments/feed should not be returned.
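A minimal sketch of what such a blacklist filter could look like is below. This is not the actual Telescope code: the names blacklistedFeedPatterns and filterFeedUrls are hypothetical, and the only blacklist entry is the /comments/feed example mentioned above.

```typescript
// Hypothetical blacklist: patterns for feed URLs that should not be returned.
const blacklistedFeedPatterns: RegExp[] = [
  /\/comments\/feed\/?$/, // e.g. the WordPress comments feed
];

// Keeps only the feed URLs that match none of the blacklisted patterns.
function filterFeedUrls(feedUrls: string[]): string[] {
  return feedUrls.filter(
    (feedUrl) => !blacklistedFeedPatterns.some((pattern) => pattern.test(feedUrl))
  );
}

// Example: the comments feed is dropped, the posts feed is kept.
const discovered = [
  "https://blogName.wordpress.com/feed",
  "https://blogName.wordpress.com/comments/feed",
];
console.log(filterFeedUrls(discovered));
```

Because the check looks only at the end of the URL and not at the host name, it would behave the same way for a blog on a custom domain.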

Thus, I added a function to filter the feed URLs before returning them. Next, I need to test the sign-up process with various blog hosts to confirm that the feed URLs are returned correctly and that posts can be pulled successfully. I also need to write some tests for the new function.
