How to automatically monitor new ML research papers on arXiv by keyword

Siddhant Sharma — Thu, 25 Jun 2026 16:20:12 +0000

Staying on Top of ML Research

With ~10,000 new papers on arXiv every month, staying current in your specific niche is nearly impossible through manual browsing.

The Automation

I built an arXiv scraper on Apify with the following features:

Keyword search: Define the topics you care about (e.g., "diffusion models", "LLM alignment", "RLHF")
Scheduled runs: Set it to check daily or hourly
Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
Easy integration: JSON output works with any webhook, Slack bot, or Notion database
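
To make the structured output concrete, here is a minimal sketch of what one record might look like and how it could be turned into a digest entry. The field names (`title`, `authors`, `abstract`, `url`, `pdfUrl`, `categories`) and the placeholder values are assumptions for illustration; check the actor's actual output schema before relying on them.

```python
# Hypothetical record shape with placeholder values; real field names may differ.
sample_paper = {
    "title": "An Example Paper Title",
    "authors": ["A. Author", "B. Author", "C. Author", "D. Author"],
    "abstract": "A placeholder abstract used only to illustrate formatting.",
    "url": "https://arxiv.org/abs/0000.00000",
    "pdfUrl": "https://arxiv.org/pdf/0000.00000",
    "categories": ["cs.LG", "stat.ML"],
}


def format_paper(paper: dict, max_authors: int = 3) -> str:
    """Render one paper record as a short, human-readable digest entry."""
    authors = paper.get("authors", [])
    shown = ", ".join(authors[:max_authors])
    # Abbreviate long author lists so the entry stays on one line.
    if len(authors) > max_authors:
        shown += " et al."
    categories = ", ".join(paper.get("categories", []))
    return f"{paper['title']} ({shown}) [{categories}]\n{paper['url']}"


print(format_paper(sample_paper))
```

The same function could feed a Slack message, an email digest, or a Notion row, since it only depends on the record being a plain dictionary.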

Example use: Slack Bot

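import requests

# Run the scraper and wait for it to finish.
# Note: Apify API calls normally need your API token as well (not shown here).
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync",
    json={"keywords": ["diffusion models"], "maxResults": 10},
    timeout=300,
)
result.raise_for_status()  # stop early if the run failed

# Post each paper to Slack
for paper in result.json():
    requests.post("YOUR_SLACK_WEBHOOK", json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    }, timeout=10)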
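
Because a scheduled run can return papers that were already posted, it may help to remember which ones you have seen. Here is a minimal sketch, assuming each record carries a stable `url` field as in the example above. The three helper functions are hypothetical and not part of the scraper.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set:
    """Load the set of already-posted paper URLs (empty on the first run)."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def filter_new(papers: list, seen: set) -> list:
    """Return only unseen papers and record their URLs in `seen`."""
    fresh = []
    for paper in papers:
        url = paper["url"]
        if url not in seen:
            seen.add(url)
            fresh.append(paper)
    return fresh


def save_seen(path: Path, seen: set) -> None:
    """Persist the seen URLs so the next scheduled run can skip them."""
    path.write_text(json.dumps(sorted(seen)))
```

In the Slack loop, you would call `load_seen` once, iterate over `filter_new(result.json(), seen)` instead of the raw results, and finish with `save_seen`.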

Why This Matters

Researchers and engineers waste hours browsing arXiv. An automated pipeline means:

Fewer missed papers in your niche, as long as your keywords cover it
Daily digest delivered to your preferred platform
Easy collaboration with teams (shared paper feeds)

Try it on the Apify Store — free tier available.

How to automatically detect any company's tech stack and logo from just their domain name

Siddhant Sharma — Wed, 24 Jun 2026 16:04:21 +0000

The Problem

When you're doing sales prospecting, competitor research, or lead generation, one of the most tedious tasks is manually visiting each company's website to figure out:

What tech stack do they use?
Where can you get their logo for your CRM?
Where are they on social media?
How can you contact them?

The Solution

I built a simple API that takes a domain name and returns all this data automatically. Let me walk you through how it works.

How It Works

Input: A company domain name (e.g., example.com)
Process: The scraper visits the website and analyzes its HTML, headers, and scripts
Output: Tech stack detection, company logo URL, social profiles, contact info, and industry classification

Tech Stack Detection

The API uses pattern matching against 50+ known indicators:

Framework-specific meta tags and script patterns
CDN and hosting headers
Analytics and tracking scripts
CMS signatures
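
The pattern-matching idea can be sketched as below. This is an illustration, not the API's actual rule set: the `SIGNATURES` table, its handful of patterns, and the `detect_tech` function are assumptions chosen for the example.

```python
import re

# Illustrative signatures only; a real detector would use many more indicators.
# Each technology maps to (where to look, regex) pairs.
SIGNATURES = {
    "WordPress": [
        ("html", r"/wp-content/"),
        ("html", r'<meta name="generator" content="WordPress'),
    ],
    "Next.js": [
        ("html", r"/_next/static/"),
        ("header:x-powered-by", r"Next\.js"),
    ],
    "Google Analytics": [("html", r"googletagmanager\.com/gtag/js")],
    "Cloudflare": [("header:server", r"cloudflare")],
}


def detect_tech(html: str, headers: dict) -> list:
    """Return the sorted names of technologies whose signatures match."""
    # Header names are case-insensitive, so normalise them once.
    lowered = {k.lower(): v for k, v in headers.items()}
    found = set()
    for tech, rules in SIGNATURES.items():
        for source, pattern in rules:
            if source == "html":
                text = html
            else:
                text = lowered.get(source.split(":", 1)[1], "")
            if re.search(pattern, text, re.IGNORECASE):
                found.add(tech)
                break  # one matching rule is enough for this technology
    return sorted(found)


sample_html = '<script src="/_next/static/chunks/main.js"></script>'
sample_headers = {"Server": "cloudflare"}
print(detect_tech(sample_html, sample_headers))  # ['Cloudflare', 'Next.js']
```

Keeping the rules in a data table rather than in code means new indicators can be added without touching the matching logic.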

Logo Extraction

Multiple fallback strategies:

Open Graph image tags
Apple touch icons
JSON-LD structured data
Clearbit API fallback
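
A fallback chain like this can be sketched with the standard library's HTML parser. The sketch below covers only the first two strategies (Open Graph image, then Apple touch icon). The class and function names are hypothetical, and the JSON-LD and Clearbit steps are left out for brevity.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LogoCandidateParser(HTMLParser):
    """Collect logo candidates from <meta> and <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None
        self.touch_icon: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:image":
            # Keep the first Open Graph image found.
            self.og_image = self.og_image or attr.get("content")
        elif tag == "link" and "apple-touch-icon" in (attr.get("rel") or ""):
            # Keep the first Apple touch icon found.
            self.touch_icon = self.touch_icon or attr.get("href")


def extract_logo(html: str, base_url: str) -> Optional[str]:
    """Try each strategy in order and return the first hit as an absolute URL."""
    parser = LogoCandidateParser()
    parser.feed(html)
    for candidate in (parser.og_image, parser.touch_icon):
        if candidate:
            return urljoin(base_url, candidate)
    return None


page = '<head><link rel="apple-touch-icon" href="/icon.png"></head>'
print(extract_logo(page, "https://example.com"))  # https://example.com/icon.png
```

Ordering the candidates in a tuple makes the priority explicit, so adding another strategy is a matter of appending one more candidate to the chain.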

API Usage

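import requests

# Run the actor for one domain and wait for the result.
# Note: Apify API calls normally need your API token as well (not shown here).
response = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~company-intelligence-api/run-sync",
    json={"domains": ["example.com"]},
    timeout=300,
)
response.raise_for_status()  # stop early if the run failed
data = response.json()
print(data["techStack"], data["logo"])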

Why I Built This

As someone who builds web scrapers and automation tools, I found myself repeatedly writing the same domain-analysis code for different projects. This API consolidates all that into one endpoint.

Check it out on the Apify Store. A free tier is available with 1,000 results/month.

DEV Community: Siddhant Sharma