DEV Community

Howard Shaw


Training a model to predict a persuasion score for documents, but hitting a wall

I’m building DocBeacon, a secure document sharing and tracking platform. It shows exactly how readers interact with each page: scroll depth, dwell time, replays, and even attention heatmaps.

Recently I’ve been thinking about going a step further. Instead of just reporting behavior metrics, what if DocBeacon could estimate how emotionally or cognitively engaged a reader is with a document? Something like a persuasion score that reflects how much a sales proposal actually moved them: did they seem convinced, neutral, or totally uninterested?

The basic idea sounds simple: train a model on past reading behavior to predict the likelihood of acceptance. But here’s the problem: I have plenty of behavioral data (how people read), yet no solid labels for what happened after they read. Without knowing whether the reader ended up accepting the proposal, replying, or ghosting, the model can’t learn meaningful correlations.

Has anyone here tackled a similar cold-start problem?
Would it make sense to infer pseudo labels using proxy signals such as follow-up activity or revisit patterns? Or maybe combine user feedback loops to build a semi-supervised system?
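To make the proxy-signal idea concrete, here's a minimal pseudo-labeling sketch. All field names (`dwell_seconds`, `followed_up`, the thresholds) are hypothetical placeholders for whatever DocBeacon actually tracks, and the heuristic deliberately abstains on ambiguous sessions so only confident pseudo-labels feed a semi-supervised trainer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadingSession:
    # Hypothetical behavioral fields; adapt to the real tracking schema.
    dwell_seconds: float   # total time spent on the document
    scroll_depth: float    # 0.0-1.0, furthest fraction of the doc reached
    revisits: int          # how many times the reader came back
    followed_up: bool      # any follow-up activity (reply, link click, etc.)

def pseudo_label(s: ReadingSession) -> Optional[int]:
    """Weak heuristic: 1 = likely engaged, 0 = likely not, None = abstain.

    Abstaining keeps label noise down: only high-confidence sessions
    get a pseudo-label; the rest stay unlabeled for semi-supervision.
    """
    # Strong positive proxies: follow-up activity, or repeated deep reads.
    if s.followed_up or (s.revisits >= 2 and s.scroll_depth > 0.8):
        return 1
    # Strong negative proxies: bounced quickly, barely scrolled, no follow-up.
    if s.dwell_seconds < 15 and s.scroll_depth < 0.2 and not s.followed_up:
        return 0
    return None  # too ambiguous to guess
```

The abstain branch is the important design choice: proxy signals are noisy, so treating the gray zone as unlabeled (rather than forcing a 0/1) is what makes a self-training or label-model approach viable later.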

Curious to hear how others would approach building a persuasion predictor when you only have half the story.

Top comments (1)

Howard Shaw

What I find tricky here is defining ground truth for persuasion. Even if we could collect post-reading actions like replies or deal closures, they’re still noisy indicators of actual conviction. Maybe the better approach is to start with relative comparisons (for example, which proposal version performed better given similar readers) and use ranking models instead of pure classification.
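The ranking idea could start as small as a pairwise logistic (RankNet-style) update: instead of labeling any single document, you only assert "version A beat version B for similar readers" and learn a scoring weight vector from those pairs. This is a sketch under that assumption; the feature vectors are made up, not DocBeacon data.

```python
import math

def pairwise_logistic_update(w, x_win, x_lose, lr=0.1):
    """One SGD step on the pairwise logistic loss.

    Models P(win beats lose) = sigmoid(w . (x_win - x_lose)) and
    nudges w to increase that probability. Loss = -log(p), so the
    gradient step is w += lr * (1 - p) * (x_win - x_lose).
    """
    diff = [a - b for a, b in zip(x_win, x_lose)]
    score = sum(wi * di for wi, di in zip(w, diff))
    p = 1.0 / (1.0 + math.exp(-score))
    return [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]

def score(w, x):
    """Persuasion score for a document's behavior-feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Feeding it a stream of (winner, loser) pairs from A/B'd proposal versions gives you a relative persuasion score without ever needing an absolute "convinced / not convinced" label.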
Curious if anyone has tried that path.