Most of us write across multiple platforms: personal blogs, dev.to, company blogs, and more. But our GitHub profile — the place other developers often visit first — rarely reflects all of that activity.
In this article, we’ll build a simple system that:
- 📚 Pulls your latest articles from:
  - dev.to
  - cosine.sh/blog (only the posts you've authored); you can also add your personal blog, your company's blog, or whatever else you're using
- 📃 Merges them into a single list
- 📅 Shows the latest 6 on your GitHub profile README
- ♻️ Automatically updates once per day via GitHub Actions
P.S. If you'd rather watch than read, here's a video walking through the whole process:
## What we're building
End goal:
- Your GitHub profile README (e.g. https://github.com/YOUR_USERNAME) will have a section like:

```markdown
## Recent Articles

<!-- recent-blog-posts start -->
<!-- recent-blog-posts end -->
```
A GitHub Action runs daily, fetches your latest posts from dev.to and cosine.sh/blog, and replaces everything between those markers with a grid of:
- Cover image
- Article title (linked)
- Source (dev.to or Cosine, or whatever platforms you add)
- Date

It always shows the 6 most recent posts across both platforms, sorted by date.
High-level architecture:

1. A GitHub Action runs on a schedule (cron + workflow_dispatch).
2. A Python script that:
   - Calls the dev.to API for your username.
   - Scrapes cosine.sh/blog and filters for your posts only.
   - Merges and sorts the posts.
   - Renders a small HTML grid.
   - Replaces the marker section in README.md.
3. The Action commits the updated README back to your profile repo.
## Step 1: Add markers to your GitHub profile README
First, in your profile repository (same name as your username, e.g. EleftheriaBatsou/EleftheriaBatsou), edit README.md and add a section like this:
```markdown
## Recent Blog Posts

<!-- recent-blog-posts start -->
<!-- recent-blog-posts end -->
```
Those comments are the “anchors” the script will use to know where to inject the generated content. Everything between them will be replaced automatically.
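Mechanically, the update is nothing more than a regex replacement between those two comments. Here's a minimal sketch of the idea (the full script in Step 3 does exactly this):

```python
import re

START = "<!-- recent-blog-posts start -->"
END = "<!-- recent-blog-posts end -->"

def inject(readme: str, content: str) -> str:
    # Swap out everything between the markers, keeping the markers themselves.
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    return pattern.sub(START + "\n" + content + "\n" + END, readme)
```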
## Step 2: Create the GitHub Action workflow
Create a file at `.github/workflows/update_blog.yml` with this content:
```yaml
name: Update Recent Blog Posts

on:
  schedule:
    - cron: "0 3 * * *" # daily at 03:00 UTC
  workflow_dispatch: # allow manual runs from the Actions tab

permissions:
  contents: write # the workflow pushes the updated README back to the repo

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests beautifulsoup4 python-dateutil

      - name: Update README with latest posts
        env:
          # Optional: enable verbose logs during development
          RECENT_BLOG_VERBOSE: "false"
          # Optional: seed known Cosine URLs if you want to be explicit
          # COSINE_AUTHOR_URLS: "https://cosine.sh/blog/cosine-vs-codex-vs-windsurf,https://cosine.sh/blog/projects-you-can-build-with-cosine"
        run: |
          python scripts/update_blog.py

      - name: Commit changes
        run: |
          if [[ -n "$(git status --porcelain)" ]]; then
            git config user.name "github-actions[bot]"
            git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
            git add README.md
            git commit -m "chore: update recent blog posts"
            git push
          else
            echo "No changes to commit."
          fi
```
Notes:

- `workflow_dispatch` lets you trigger the workflow manually from GitHub's UI.
- `cron` runs it once a day.
- The `permissions: contents: write` block makes sure the built-in `GITHUB_TOKEN` is allowed to push the commit.
- We install only a few standard Python libraries: `requests`, `beautifulsoup4`, `python-dateutil`.
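For example, to refresh more often you could change the schedule line to `- cron: "0 */6 * * *"` (every six hours); any standard cron expression works here.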
## Step 3: The Python script that does the work
Create `scripts/update_blog.py` with the following:
```python
import os
import re
import sys
import json
import time
import unicodedata
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timezone
from dateutil import parser as dateparser
import xml.etree.ElementTree as ET

README_PATH = "README.md"
START_MARK = "<!-- recent-blog-posts start -->"
END_MARK = "<!-- recent-blog-posts end -->"

DEVTO_USERNAME = "eleftheriabatsou"
COSINE_BLOG_INDEX = "https://cosine.sh/blog"
COSINE_SITEMAP = "https://cosine.sh/sitemap.xml"

AUTHOR_NAME = "Eleftheria Batsou"
AUTHOR_ROLE = "Developer Advocate"
AUTHOR_TWITTER_HANDLE = "BatsouElef"  # used for x.com/twitter.com handle checks

MAX_POSTS = 6
TIMEOUT = 20
VERBOSE = os.getenv("RECENT_BLOG_VERBOSE", "").lower() in {"1", "true", "yes"}

# Optional manual fallback: extra Cosine post URLs (comma-separated env var)
COSINE_AUTHOR_URLS = [u.strip() for u in os.getenv("COSINE_AUTHOR_URLS", "").split(",") if u.strip()]

# Known Cosine posts (helps when structure changes)
DEFAULT_KNOWN_COSINE = {
    "https://cosine.sh/blog/cosine-vs-codex-vs-windsurf",
    "https://cosine.sh/blog/projects-you-can-build-with-cosine",
    "https://cosine.sh/blog/ai-coding-tools-comparison",
}

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; GitHubAction; +https://github.com/EleftheriaBatsou)",
    "Accept-Language": "en",
    "Referer": "https://cosine.sh/",
})


def log(msg):
    if VERBOSE:
        print(msg)


def normalize_text(s):
    if not s:
        return ""
    s = unicodedata.normalize("NFKC", s)
    s = s.replace("\u00A0", " ")
    s = re.sub(r"\s+", " ", s)
    return s.strip()


def normalize_date(dt):
    if not dt:
        return None
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def safe_get(url, timeout=TIMEOUT):
    try:
        r = session.get(url, timeout=timeout)
        r.raise_for_status()
        return r
    except Exception as e:
        log(f"[WARN] GET failed: {url} -> {e}")
        return None


def fetch_devto_posts():
    url = f"https://dev.to/api/articles?username={DEVTO_USERNAME}"
    resp = safe_get(url)
    if not resp:
        return []
    try:
        data = resp.json()
    except Exception as e:
        log(f"[WARN] dev.to JSON parse failed: {e}")
        return []
    posts = []
    for item in data:
        published = item.get("published_at") or item.get("created_at")
        try:
            dt_raw = dateparser.parse(published)
        except Exception:
            dt_raw = None
        dt = normalize_date(dt_raw)
        posts.append({
            "source": "dev.to",
            "title": item.get("title") or "",
            "url": item.get("url") or "",
            "cover_image": item.get("cover_image") or item.get("social_image") or "",
            "date": dt,
            "date_str": dt.strftime("%Y-%m-%d") if dt else (published or ""),
        })
    log(f"[INFO] dev.to posts fetched: {len(posts)}")
    return posts


def get_cosine_links_from_index():
    resp = safe_get(COSINE_BLOG_INDEX)
    if not resp:
        return set()
    soup = BeautifulSoup(resp.text, "html.parser")
    links = set()
    for a in soup.select("a[href^='/blog/']"):
        href = a.get("href", "").strip()
        if not href:
            continue
        if href.rstrip("/").endswith("/blog"):
            continue
        if href.count("/") >= 2:
            full = f"https://cosine.sh{href}" if href.startswith("/") else href
            links.add(full)
    log(f"[INFO] Cosine index links found: {len(links)}")
    return links


def get_cosine_links_from_sitemap():
    resp = safe_get(COSINE_SITEMAP)
    if not resp:
        return set()
    links = set()
    try:
        root = ET.fromstring(resp.text)
        for loc in root.iter():
            if loc.tag.endswith("loc"):
                url = (loc.text or "").strip()
                if "/blog/" in url:
                    links.add(url)
        log(f"[INFO] Cosine sitemap blog links found: {len(links)}")
    except Exception as e:
        log(f"[WARN] sitemap parse error: {e}")
    return links


def detect_author(page):
    # 1) Meta name="author"
    meta_author = page.find("meta", attrs={"name": "author"})
    if meta_author and meta_author.get("content"):
        return normalize_text(meta_author["content"])
    # 2) Common meta properties
    for prop in ["article:author", "og:article:author"]:
        m = page.find("meta", attrs={"property": prop})
        if m and m.get("content"):
            return normalize_text(m["content"])
    # 3) rel=author links
    rel_author = page.select_one("a[rel='author']")
    if rel_author:
        txt = normalize_text(rel_author.get_text())
        if txt:
            return txt
    # 4) JSON-LD
    for ld in page.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(ld.string or "")
        except Exception:
            continue
        if isinstance(data, dict):
            a = data.get("author")
            if isinstance(a, dict) and a.get("name"):
                return normalize_text(a["name"])
            if isinstance(a, list) and a:
                entry = a[0]
                if isinstance(entry, dict) and entry.get("name"):
                    return normalize_text(entry["name"])
        elif isinstance(data, list):
            for item in data:
                if isinstance(item, dict):
                    a = item.get("author")
                    if isinstance(a, dict) and a.get("name"):
                        return normalize_text(a["name"])
                    if isinstance(a, list) and a:
                        entry = a[0]
                        if isinstance(entry, dict) and entry.get("name"):
                            return normalize_text(entry["name"])
    # 5) Visible text fallback: name + role or twitter handle
    page_text = normalize_text(page.get_text(" "))
    name_match = re.search(r"Eleftheria\s+Batsou", page_text, flags=re.IGNORECASE)
    role_match = re.search(r"Developer\s+Advocate", page_text, flags=re.IGNORECASE)
    twitter_match = re.search(rf"(x\.com|twitter\.com)/{re.escape(AUTHOR_TWITTER_HANDLE)}", page_text, flags=re.IGNORECASE)
    if name_match and (role_match or twitter_match):
        return AUTHOR_NAME
    return None


def parse_post_page(url):
    r = safe_get(url)
    if not r:
        return None
    page = BeautifulSoup(r.text, "html.parser")

    author = detect_author(page)
    if not author or "eleftheria" not in author.lower() or "batsou" not in author.lower():
        log(f"[SKIP] Not authored by {AUTHOR_NAME}: {url} (detected: {author})")
        return None

    # Title
    title = None
    og_title = page.find("meta", property="og:title")
    if og_title and og_title.get("content"):
        title = normalize_text(og_title["content"])
    if not title and page.title and page.title.string:
        title = normalize_text(page.title.string)
    if not title:
        h1 = page.find("h1")
        if h1:
            title = normalize_text(h1.get_text())
    if not title:
        title = url

    # Date
    dt = None
    date_str = ""
    pub_meta = page.find("meta", property="article:published_time")
    if pub_meta and pub_meta.get("content"):
        try:
            dt = dateparser.parse(pub_meta["content"])
        except Exception:
            date_str = pub_meta["content"]
    if not dt:
        time_el = page.find("time")
        candidate = (
            (time_el.get("datetime") if time_el else None)
            or (normalize_text(time_el.get_text()) if time_el else None)
        )
        if candidate:
            try:
                dt = dateparser.parse(candidate)
            except Exception:
                date_str = candidate
    if not dt:
        page_text = normalize_text(page.get_text(" "))
        m = re.search(r"(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}", page_text)
        if m:
            candidate = m.group(0)
            try:
                dt = dateparser.parse(candidate)
            except Exception:
                date_str = candidate
    dt = normalize_date(dt)
    if dt and not date_str:
        date_str = dt.strftime("%Y-%m-%d")

    # Cover image
    cover = ""
    og_img = page.find("meta", property="og:image")
    if og_img and og_img.get("content"):
        cover = og_img["content"].strip()
    if not cover:
        twitter_img = page.find("meta", property="twitter:image")
        if twitter_img and twitter_img.get("content"):
            cover = twitter_img["content"].strip()
    if not cover:
        img = page.select_one("article img") or page.find("img")
        if img and img.get("src"):
            src = img["src"].strip()
            cover = src if src.startswith("http") else f"https://cosine.sh{src}" if src.startswith("/") else src

    return {
        "source": "Cosine",
        "title": title,
        "url": url,
        "cover_image": cover,
        "date": dt,
        "date_str": date_str or (dt.strftime("%Y-%m-%d") if dt else ""),
    }


def fetch_cosine_author_posts():
    index_links = get_cosine_links_from_index()
    sitemap_links = get_cosine_links_from_sitemap()
    manual_links = set(COSINE_AUTHOR_URLS)
    links = sorted(index_links.union(sitemap_links).union(DEFAULT_KNOWN_COSINE).union(manual_links))
    posts = []
    for url in links:
        time.sleep(0.25)  # be gentle
        p = parse_post_page(url)
        if p:
            posts.append(p)
    log(f"[INFO] Cosine posts authored by {AUTHOR_NAME}: {len(posts)}")
    return posts


def render_markdown_grid(posts):
    # HTML grid: 2 columns x 3 rows; images width-limited to 280px
    rows = []
    items = posts[:MAX_POSTS]
    if len(items) % 2 == 1:
        items.append({"title": "", "url": "", "cover_image": "", "source": "", "date_str": ""})

    def cell_html(p):
        if not p.get("title"):
            return "<td></td>"
        img_html = f'<img src="{p["cover_image"]}" alt="cover" style="width:280px; max-width:100%; border-radius:8px;" />' if p.get("cover_image") else ""
        title_html = f'<a href="{p["url"]}">{p["title"]}</a>'
        meta_html = f'{p.get("source","")} • {p.get("date_str","")}'
        return f"<td valign='top' style='padding:8px;'>{img_html}<div style='margin-top:6px; font-weight:600;'>{title_html}</div><div style='color:#666;'>{meta_html}</div></td>"

    for i in range(0, len(items), 2):
        left = cell_html(items[i])
        right = cell_html(items[i + 1])
        rows.append(f"<tr>{left}{right}</tr>")

    html = []
    html.append("")
    html.append("### Recent Articles")
    html.append("")
    html.append("<table>")
    for r in rows[:3]:  # 3 rows max
        html.append(r)
    html.append("</table>")
    html.append("")
    html.append("_Auto-updated daily from dev.to and cosine.sh/blog_")
    html.append("")
    return "\n".join(html)


def update_readme_section(new_content):
    if not os.path.exists(README_PATH):
        print("README.md not found.", file=sys.stderr)
        sys.exit(1)
    with open(README_PATH, "r", encoding="utf-8") as f:
        readme = f.read()
    if START_MARK not in readme or END_MARK not in readme:
        print("Markers not found in README.md. Please add the markers to enable updates.", file=sys.stderr)
        sys.exit(1)
    pattern = re.compile(
        re.escape(START_MARK) + r"(.*?)" + re.escape(END_MARK),
        re.DOTALL
    )
    updated = pattern.sub(
        START_MARK + "\n" + new_content + "\n" + END_MARK,
        readme
    )
    if updated != readme:
        with open(README_PATH, "w", encoding="utf-8") as f:
            f.write(updated)
        print("README.md updated.")
    else:
        print("README.md already up to date.")


def main():
    devto = fetch_devto_posts()
    cosine = fetch_cosine_author_posts()
    all_posts = devto + cosine

    def sort_key(p):
        return p["date"] or datetime.min.replace(tzinfo=timezone.utc)

    all_posts.sort(key=sort_key, reverse=True)
    latest = all_posts[:MAX_POSTS]
    md = render_markdown_grid(latest)
    update_readme_section(md)


if __name__ == "__main__":
    main()
```
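Before wiring everything into the Action, you can test the script locally from the repo root with `RECENT_BLOG_VERBOSE=true python scripts/update_blog.py` and inspect the diff it makes to README.md; the env var just turns on the `log()` output shown above.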
## Debugging: when sites don't behave like APIs
One interesting part of this build is cosine.sh/blog itself. Unlike dev.to, it doesn’t expose a dedicated public JSON API for blog posts, so we:
- Use the blog index for links: https://cosine.sh/blog
- Use the sitemap for a more complete list: https://cosine.sh/sitemap.xml
- Crawl each page and detect the author, the date, and the cover image.
The author detection has to be robust, because templates and metadata can vary. We look for:
- `meta name="author"` or `meta property="article:author"`.
- JSON-LD `author.name`.
- A visible byline that looks like "Eleftheria Batsou", "Developer Advocate", and/or your X/Twitter handle.
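To make that concrete, here's a minimal, invented example of markup that would satisfy the first rule (with JSON-LD as a fallback); the HTML is hypothetical, not copied from cosine.sh:

```python
from bs4 import BeautifulSoup

# Hypothetical page markup (invented for illustration); any one of these
# signals is enough for detect_author() to identify the post's author.
sample_html = """
<meta name="author" content="Eleftheria Batsou">
<script type="application/ld+json">
  {"@type": "BlogPosting", "author": {"@type": "Person", "name": "Eleftheria Batsou"}}
</script>
"""

page = BeautifulSoup(sample_html, "html.parser")
meta = page.find("meta", attrs={"name": "author"})
print(meta["content"])  # -> Eleftheria Batsou (rule 1 fires before the JSON-LD fallback)
```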
This is the kind of thing Cosine is very good at automating. You could easily imagine giving Cosine a task like:
“Make sure my GitHub profile README always shows my last 6 articles from dev.to and cosine.sh/blog. Parse the author name and role correctly, and don’t break if Cosine’s blog markup changes slightly.”
and letting it iterate until the pipelines and selectors are robust.
## Why this is useful
Some practical benefits:
- Your GitHub profile stays up to date without you thinking about it.
- Recruiters or collaborators see your latest thinking, not just your repos.
- You can write wherever it makes sense (personal blog, dev.to, company blog like Cosine) and still have a single “portfolio view” on GitHub.
- The setup is simple: one workflow file, one Python script.
This also fits nicely with the way Cosine approaches development work:
- Small, automatable tasks
- Clear, visible diffs (your README changes are just regular commits)
- Easy to extend over time
## Ideas for future features
Once this is working, there are several directions you can take it:
1. Include tags or topics:
   Parse tags from dev.to and Cosine and show them under each article title:
   - "React, TypeScript"
   - "AI tooling, Developer Experience"
2. Filter by category:
   For example, only show "Insights" posts from Cosine:
   - Filter by a URL pattern like /blog/… plus a tag
   - Or parse category labels from the page.
3. Add a fallback text mode:
   Some people prefer plain Markdown instead of HTML tables. You could add a configuration flag that switches between:
   - Grid layout (HTML table, smaller covers)
   - Simple list layout (Markdown): `- [Title](url) — Source • 2025-11-17`
   There's a sketch of this (combined with the tags from idea 1) right after this list.
4. Cache responses:
   To be friendly to dev.to and Cosine, you could store a simple cache file in the repo (or in Actions artifacts). This isn't strictly necessary for a daily cron, but it becomes helpful if you start running the workflow more frequently; see the cache sketch after the list.
5. Integrate Cosine directly:
   Right now, we're using raw Python + GitHub Actions. You could use Cosine to:
   - Generate and maintain this script over time.
   - Automatically adjust selectors when the Cosine blog layout changes.
   - Add tests to validate that your Cosine posts are still being detected correctly.
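For ideas 1 and 3, a plain-Markdown renderer is only a few lines. A hypothetical sketch (`render_markdown_list` is not part of the script above, and it assumes you've added a `tags` key to each post dict, e.g. from the `tag_list` field the dev.to API returns):

```python
def render_markdown_list(posts):
    # Hypothetical alternative to render_markdown_grid(): one bullet per post,
    # with optional tags shown after the date.
    lines = []
    for p in posts:
        line = f'- [{p["title"]}]({p["url"]}) — {p["source"]} • {p["date_str"]}'
        if p.get("tags"):
            line += f' — _{", ".join(p["tags"])}_'
        lines.append(line)
    return "\n".join(lines)
```

Switching between the two layouts could then be a single env var check in `main()`.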
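And for idea 4, a tiny JSON file is plenty. A sketch that caches raw response bodies keyed by URL (`CACHE_PATH` and the TTL are arbitrary choices for illustration, not part of the script above):

```python
import json
import os
import time

CACHE_PATH = ".blog_cache.json"  # hypothetical filename
TTL = 6 * 60 * 60  # re-fetch after 6 hours

def cached_fetch(url, fetch_fn):
    # Load the cache, reuse a fresh entry if we have one, otherwise fetch and store.
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, encoding="utf-8") as f:
            cache = json.load(f)
    entry = cache.get(url)
    if entry and time.time() - entry["ts"] < TTL:
        return entry["body"]
    body = fetch_fn(url)
    cache[url] = {"ts": time.time(), "body": body}
    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return body
```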
This is similar to another workflow where we synced YouTube videos into a GitHub profile. These are exactly the kinds of “small but annoying” tasks that Cosine can take off your plate while keeping everything transparent and reviewable.
## Conclusion
We’ve built a small but powerful system:
→ A scheduled GitHub Action
→ A Python script that:
- Fetches your dev.to posts via their API
- Scrapes cosine.sh/blog and filters posts authored by you
- Merges, sorts, and renders the latest 6 into a 2×3 grid
- Updates your GitHub profile README automatically
It’s not a big framework or a complicated service—just a focused tool that keeps your profile in sync with your writing across platforms.
This is the sweet spot where tools like Cosine shine: automating the repetitive glue work around your developer presence, while still giving you full control and visibility.
If you’d like to go further, you could:
- Extend this to YouTube, personal blogs, or newsletters.
- Have Cosine manage the workflow, selectors, and tests for you.
- Turn this into a reusable template for your team’s GitHub profiles.
Either way, you now have a pattern: small automations that keep your developer identity consistent across platforms—with your GitHub profile as the source of truth.
Happy Coding ✌️