Olamide Olaniyan

Posted on May 6

What I Learned Building a Diff Engine for Social Profiles, Posts, and Follower Counts

#ai #webdev #programming #tutorial

The first version of my social monitoring system was technically correct and practically useless.

Every poll produced changes.

Follower count up by 3.

Bio spacing changed.

A post metric moved slightly.

A timestamp field came back in a different format.

The system was full of "change," and almost none of it mattered.

That was when I realized a social monitoring product does not really need polling logic first.

It needs a good diff engine.

Because if you cannot decide what changed meaningfully, the rest of the stack becomes noise generation.

So this is the diff model I use now for profiles, posts, and follower counts: what I compare, what I ignore, how I implement it in JavaScript and Python, and where a public social data layer like SociaVault makes the whole workflow much easier.

What a Good Diff Engine Actually Does

I think of a diff engine as having three jobs:

normalize the old and new state into comparable shapes
detect meaningful differences
classify those differences by importance

That third part is the one most implementations skip.

A change is not automatically an event.

That distinction is the difference between useful monitoring and alert fatigue.
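The three jobs above can be sketched as three small functions. This is a minimal illustration, not the implementation used later in this post: the field names and the 100-follower cutoff are assumptions chosen for the example.

```python
def normalize(snapshot):
    """Job 1: coerce a raw snapshot into a comparable shape."""
    return {
        "followers": int(snapshot.get("followers") or 0),
        "bio": " ".join((snapshot.get("bio") or "").split()),
    }


def detect(previous, current):
    """Job 2: list the fields whose normalized values differ."""
    return [
        {"field": key, "previous": previous[key], "current": current[key]}
        for key in previous
        if previous[key] != current[key]
    ]


def classify(change):
    """Job 3: decide whether a difference is an event, and how important it is."""
    if change["field"] == "followers":
        delta = abs(change["current"] - change["previous"])
        # Hypothetical cutoff: under 100 followers of movement is not an event.
        return "medium" if delta >= 100 else None
    return "low"


def diff(previous_raw, current_raw):
    previous, current = normalize(previous_raw), normalize(current_raw)
    events = []
    for change in detect(previous, current):
        severity = classify(change)
        if severity is not None:
            events.append({**change, "severity": severity})
    return events


old = {"followers": 10240, "bio": "Helping creators  grow faster "}
new = {"followers": 10243, "bio": "Helping creators grow faster"}
events = diff(old, new)
print(events)  # []
```

Both snapshots differ byte for byte, yet no event comes out: the bio difference disappears in normalization and the follower difference is dropped in classification.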

The Three Objects I Compare Most Often

In social systems, I usually diff three categories.

1. Profiles

follower count
following count
bio
display name
profile image URL
post count or video count

2. Posts

caption or body text
likes
comments
views
pinned status if available
availability or deletion state

3. Collections

newest post IDs
newest comment IDs
active ad signatures
landing page URL sets

The rules are different for each one.

That matters.
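As one illustration of how the rules differ, a collection such as the newest post IDs is compared by membership rather than by value or order. The sketch below uses made-up IDs and a hypothetical `diff_ids` helper.

```python
def diff_ids(previous_ids, current_ids):
    """Compare two ID collections by membership, ignoring order and duplicates."""
    prev, curr = set(previous_ids), set(current_ids)
    return {
        "added": sorted(curr - prev),
        "removed": sorted(prev - curr),
    }


# Hypothetical post IDs from two consecutive polls of the same account.
previous_ids = ["p101", "p102", "p103"]
current_ids = ["p104", "p103", "p101"]  # reordered, one new, one missing

result = diff_ids(previous_ids, current_ids)
print(result)  # {'added': ['p104'], 'removed': ['p102']}
```

One caveat worth keeping in mind: if each poll only fetches the newest N items, an ID in `removed` may simply have aged out of the window rather than been deleted, so it is probably safer to confirm before raising a removal alert.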

My Threshold Rule

This one changed the quality of my alerts immediately:

Numeric fields need thresholds. Text fields need normalization. Collection fields need set comparison.

If you diff all three types the same way, you get garbage.

For example:

follower changes need percentage or absolute thresholds
bios should be trimmed and whitespace-normalized first
post IDs should be compared as sets
metric deltas should often be classified, not just detected

That is the whole trick.
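The same normalize-first idea applies to the timestamp noise mentioned at the start of this post. Here is a rough sketch, assuming the source returns either epoch seconds or ISO 8601 strings, and assuming that timestamps without an offset are already UTC; `normalize_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Return a timezone-aware UTC datetime for epoch seconds or ISO 8601 text."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    text = str(value).strip()
    if text.endswith("Z"):
        # Older Python versions of fromisoformat do not accept a trailing "Z".
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps from the source are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


a = normalize_timestamp("2024-05-06T12:00:00Z")
b = normalize_timestamp("2024-05-06T14:00:00+02:00")
c = normalize_timestamp(1714996800)
print(a == b == c)  # True
```

Three different representations of the same instant compare equal after normalization, so a format change alone no longer registers as a change.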

JavaScript Version: Diffing Social State Without Lying

This is the kind of diff function I like in Node services.

function normalizeText(value) {
  return (value || '').replace(/\s+/g, ' ').trim();
}

function diffNumber(previous, current, options = {}) {
  const { minAbsolute = 0, minPercent = 0 } = options;
  const prev = Number(previous || 0);
  const curr = Number(current || 0);
  const delta = curr - prev;
  const percent = prev > 0 ? Math.abs(delta / prev) * 100 : 0;

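  // A zero delta is never a change, and a threshold of 0 means "not configured".
  // Without these guards the default of 0 makes every comparison report a change.
  const absHit = minAbsolute > 0 && Math.abs(delta) >= minAbsolute;
  const pctHit = minPercent > 0 && percent >= minPercent;
  const noThresholds = minAbsolute <= 0 && minPercent <= 0;
  const changed = delta !== 0 && (absHit || pctHit || noThresholds);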

  return {
    changed,
    previous: prev,
    current: curr,
    delta,
    percent: Number(percent.toFixed(2)),
  };
}

function diffText(previous, current) {
  const prev = normalizeText(previous);
  const curr = normalizeText(current);

  return {
    changed: prev !== curr,
    previous: prev,
    current: curr,
  };
}

function diffSet(previous = [], current = []) {
  const prev = new Set(previous);
  const curr = new Set(current);

  const added = [...curr].filter(item => !prev.has(item));
  const removed = [...prev].filter(item => !curr.has(item));

  return {
    changed: added.length > 0 || removed.length > 0,
    added,
    removed,
  };
}

function diffProfile(previous, current) {
  const followers = diffNumber(previous.followers, current.followers, {
    minAbsolute: 100,
    minPercent: 5,
  });

  const posts = diffNumber(previous.posts, current.posts, {
    minAbsolute: 1,
  });

  const bio = diffText(previous.bio, current.bio);
  const displayName = diffText(previous.displayName, current.displayName);

  const changes = [];

  if (followers.changed) {
    changes.push({ type: 'follower_change', ...followers, severity: 'medium' });
  }

  if (posts.changed && posts.delta > 0) {
    changes.push({ type: 'new_post_count', ...posts, severity: 'high' });
  }

  if (bio.changed) {
    changes.push({ type: 'bio_updated', ...bio, severity: 'low' });
  }

  if (displayName.changed) {
    changes.push({ type: 'display_name_updated', ...displayName, severity: 'low' });
  }

  return changes;
}

const previousProfile = {
  followers: 10240,
  posts: 84,
  bio: 'Helping creators grow faster',
  displayName: 'Creator Ops',
};

const currentProfile = {
  followers: 10980,
  posts: 85,
  bio: 'Helping creators grow faster with better workflows',
  displayName: 'Creator Ops',
};

console.log(diffProfile(previousProfile, currentProfile));

That pattern scales well because each diff type stays simple.

Then you can route the resulting changes into alerts, digests, dashboards, or logs.

Python Version: Same Diff Model, Easy to Batch

In Python I use almost the same logic, just in a more batch-friendly format.

def normalize_text(value):
    return ' '.join((value or '').split())


def diff_number(previous, current, min_absolute=0, min_percent=0):
    prev = float(previous or 0)
    curr = float(current or 0)
    delta = curr - prev
    percent = abs(delta / prev) * 100 if prev > 0 else 0

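    # A zero delta is never a change, and a threshold of 0 means "not configured".
    # Without these guards the default of 0 makes every comparison report a change.
    abs_hit = min_absolute > 0 and abs(delta) >= min_absolute
    pct_hit = min_percent > 0 and percent >= min_percent
    no_thresholds = min_absolute <= 0 and min_percent <= 0
    changed = delta != 0 and (abs_hit or pct_hit or no_thresholds)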

    return {
        'changed': changed,
        'previous': prev,
        'current': curr,
        'delta': delta,
        'percent': round(percent, 2),
    }


def diff_text(previous, current):
    prev = normalize_text(previous)
    curr = normalize_text(current)
    return {
        'changed': prev != curr,
        'previous': prev,
        'current': curr,
    }


def diff_set(previous=None, current=None):
    prev = set(previous or [])
    curr = set(current or [])

    added = sorted(curr - prev)
    removed = sorted(prev - curr)

    return {
        'changed': bool(added or removed),
        'added': added,
        'removed': removed,
    }


def diff_profile(previous, current):
    followers = diff_number(previous.get('followers'), current.get('followers'), min_absolute=100, min_percent=5)
    posts = diff_number(previous.get('posts'), current.get('posts'), min_absolute=1)
    bio = diff_text(previous.get('bio'), current.get('bio'))
    display_name = diff_text(previous.get('displayName'), current.get('displayName'))

    changes = []

    if followers['changed']:
        changes.append({'type': 'follower_change', 'severity': 'medium', **followers})

    if posts['changed'] and posts['delta'] > 0:
        changes.append({'type': 'new_post_count', 'severity': 'high', **posts})

    if bio['changed']:
        changes.append({'type': 'bio_updated', 'severity': 'low', **bio})

    if display_name['changed']:
        changes.append({'type': 'display_name_updated', 'severity': 'low', **display_name})

    return changes


previous_profile = {
    'followers': 10240,
    'posts': 84,
    'bio': 'Helping creators grow faster',
    'displayName': 'Creator Ops',
}

current_profile = {
    'followers': 10980,
    'posts': 85,
    'bio': 'Helping creators grow faster with better workflows',
    'displayName': 'Creator Ops',
}

print(diff_profile(previous_profile, current_profile))
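The profile diff above extends to posts in the same spirit. The following is only a sketch: the field names (`id`, `caption`, `likes`), the 50-like threshold, the `medium` severity for caption edits, and the convention that a missing post arrives as `None` are all assumptions for illustration.

```python
def diff_post(previous, current, min_like_delta=50):
    """Compare two snapshots of one post; `current` is None if it was not returned."""
    if current is None:
        # Assumption: a post missing from the latest fetch is treated as removed.
        return [{"type": "post_removed", "id": previous["id"], "severity": "high"}]

    changes = []

    # Text: normalize whitespace before comparing, as with bios.
    prev_caption = " ".join((previous.get("caption") or "").split())
    curr_caption = " ".join((current.get("caption") or "").split())
    if prev_caption != curr_caption:
        changes.append({"type": "caption_edited", "id": current["id"],
                        "previous": prev_caption, "current": curr_caption,
                        "severity": "medium"})

    # Numbers: only report movement at or above the threshold.
    like_delta = int(current.get("likes") or 0) - int(previous.get("likes") or 0)
    if abs(like_delta) >= min_like_delta:
        changes.append({"type": "likes_moved", "id": current["id"],
                        "delta": like_delta, "severity": "low"})

    return changes


before = {"id": "p101", "caption": "Launch day", "likes": 120}
after = {"id": "p101", "caption": "Launch  day ", "likes": 190}

print(diff_post(before, after))
# [{'type': 'likes_moved', 'id': 'p101', 'delta': 70, 'severity': 'low'}]
print(diff_post(before, None))
# [{'type': 'post_removed', 'id': 'p101', 'severity': 'high'}]
```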

The Most Important Thing: Severity Classification

This is where the diff engine becomes a monitoring system instead of just a comparison utility.

I try to classify changes like this:

low: bio text, display name formatting, minor stat movement
medium: meaningful follower movement, comment spikes, important profile metadata changes
high: new post detected, post removed, active campaign shift, landing page change

Once you do that, it becomes much easier to decide:

what goes to Slack immediately
what gets saved for a daily digest
what only belongs in the audit log

Without severity, every change competes for attention equally. That is how monitoring systems become unreadable.
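A small router is usually enough to act on that classification. In this sketch the destination names are plain labels, not real integrations, and sending unknown severities to the audit log is my own assumption.

```python
ROUTES = {
    "high": "slack",
    "medium": "daily_digest",
    "low": "audit_log",
}


def route_changes(changes):
    """Group change records by destination based on their severity."""
    routed = {destination: [] for destination in ROUTES.values()}
    for change in changes:
        # Unknown or missing severities fall back to the audit log.
        destination = ROUTES.get(change.get("severity"), "audit_log")
        routed[destination].append(change)
    return routed


changes = [
    {"type": "new_post_count", "severity": "high"},
    {"type": "follower_change", "severity": "medium"},
    {"type": "bio_updated", "severity": "low"},
]
routed = route_changes(changes)
print({name: [c["type"] for c in items] for name, items in routed.items()})
# {'slack': ['new_post_count'], 'daily_digest': ['follower_change'], 'audit_log': ['bio_updated']}
```

Keeping the mapping in one table means a change in alert policy is a one-line edit rather than a hunt through the diff functions.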

Honest Alternatives

There are a few ways to approach this.

Raw object diff libraries

Great for debugging.

Usually too noisy for social monitoring.
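To make the noise concrete, here is a toy comparison. `raw_diff` is a hypothetical stand-in for what a generic object diff reports, not the API of any particular library, and the 100-follower threshold is an assumption.

```python
def raw_diff(previous, current):
    """Naive field-by-field comparison: any inequality counts as a change."""
    return sorted(k for k in previous.keys() | current.keys()
                  if previous.get(k) != current.get(k))


def meaningful_diff(previous, current, min_followers=100):
    """Same fields, but with a numeric threshold and whitespace normalization."""
    fields = []
    if abs(current["followers"] - previous["followers"]) >= min_followers:
        fields.append("followers")
    if " ".join(previous["bio"].split()) != " ".join(current["bio"].split()):
        fields.append("bio")
    return fields


previous = {"followers": 10240, "bio": "Helping creators grow faster"}
current = {"followers": 10243, "bio": "Helping  creators grow faster "}

print(raw_diff(previous, current))         # ['bio', 'followers']
print(meaningful_diff(previous, current))  # []
```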

Event sourcing everything

Powerful if you have a larger architecture and a reason to keep full history.

Overkill for many small tools.

Handwritten per-entity diff logic

This is still my favorite for social systems.

It is boring, explicit, and much easier to reason about than magic diff output.

Where SociaVault Fits

This is one of those layers where I want the upstream data source to be the boring part.

I use SociaVault for the public social data collection layer, then keep my own diff logic in application code.

That lets me work on the part that actually creates value: deciding what changed and why it matters.

That is the product problem. Collection is just a dependency.

Final Take

The hardest part of social monitoring is not polling.

It is meaning.

Your diff engine decides whether your product becomes a useful signal layer or a machine for generating trivia.

Normalize first. Use thresholds for numbers. Use text normalization for strings. Use set logic for collections. Add severity.

That combination took my monitoring systems from noisy to actually usable.

And if you want to spend more of your time on that diff logic instead of on collection plumbing, SociaVault is a good place to start.

#webdev #monitoring #javascript #python #backend
