DEV Community: Pragnesh Shah

The AI Question That Silences Every Leadership Meeting

Pragnesh Shah — Tue, 16 Jun 2026 07:40:00 +0000

One question will tell you whether your AI programme will deliver returns this year. It has nothing to do with which model you're running.

Every AI strategy meeting follows the same script. Energy is high. Decks are polished. Everyone agrees AI is transformative.

Then someone asks:

"What percentage of our AI investment is going into people versus platforms?"

The room goes quiet.

The ratio worth examining

When less than 70% goes toward people — training, workflow redesign, change management, giving teams space to work differently — it raises a question worth sitting with: are we solving the right problem?

Most organisations I've seen succeed get this right. The rest get it backwards. They spend 80% on platforms, models, and infrastructure. Then they wonder why adoption is low, value is invisible, and the board keeps asking "where's the ROI?"

The technology was never the bottleneck. Readiness was.

You're spreading thin and calling it strategy

Here's another pattern worth naming: AI deployed thinly across every function. A chatbot here. A summarisation tool there. A coding assistant in engineering.

Each produces a modest gain. None produces transformation.

The freed-up capacity goes unmonetised because nobody designed what happens next with the time saved.

The teams I've seen ship actual AI value all did the same counterintuitive thing: they picked three domains or fewer, went deep, and refused to expand until the economics were proven. Everyone else is running twenty pilots and measuring nothing.

What does this look like in practice?

A financial services firm concentrating AI on credit decisioning, fraud detection, and client onboarding — three domains, measurable results in six months.
A manufacturer applying AI to demand forecasting, quality inspection, and supply chain optimisation — repeatable capability before expansion.

The principle: depth before breadth. Build the muscle in one area, prove it pays, then replicate.

If you had to pick only three areas where AI must deliver measurable revenue or cost reduction this year — which three would survive the cut?

Where the returns actually are

The productivity gains are real. But they follow a specific pattern.

For developers: Claude Code, Kiro, OpenAI Codex — these aren't incrementally faster. They're structurally different. What once required an eight-person squad now collapses to a product owner and a full-stack engineer working alongside AI. I've seen teams report 3-5x throughput on well-scoped tasks. Not on everything. On well-scoped tasks.

For decision-makers: Strategy teams synthesise market signals, model scenarios, and pressure-test assumptions in hours rather than weeks. The value comes not from replacing judgement but from compressing the time to exercise it.

For operations: Document processing, compliance checking, scheduling, reporting — redesigned end-to-end rather than patched with point solutions. Admin teams that spent 60% of their time on manual processes now spend that time on exception handling and relationship management.

The common thread? None of these gains came from choosing a better model. They came from redesigning how people work.

The Ceiling Isn't Technical

Put simply: the ceiling on your AI impact isn't the technology. It's your organisation's readiness to absorb and scale it.

The organisations capturing value aren't those with the fanciest tools. They're the ones with an execution engine — the combination of talent, operating model, and cultural readiness that allows them to repeatedly turn AI capability into outcomes that show up in the numbers.

Boards are no longer asking "Are you doing AI?" They're asking "Where is it appearing in the financials?"

If your answer is still a slide deck — the window is narrowing.

The question

Next time you're debating which AI platform to adopt or which model to fine-tune, try this instead:

"What percentage of our AI investment is going into people versus platforms?"

If it's below 70% on people — you've found your problem.

What's your ratio? I'd genuinely like to know.

#ai #discuss #career #programming

I Used DeepSeek R1 + Open Data to Challenge AI ROI

Pragnesh Shah — Thu, 04 Jun 2026 13:00:00 +0000

Three conversations are happening in organisations right now.

"We deployed AI right before the rebound — the numbers recovered immediately."

"Our AI rollout worked. Numbers are up, team is faster."

"We adopted AI because everyone else did. Eighteen months later, it hasn't moved the bottom line."

Here is the uncomfortable question that runs through all three: would those results have happened anyway?

The concept that explains why that question is so hard to answer is 150 years old — and it has never been more relevant than it is right now. It is called regression to the mean. Let me show you how it applies across two lenses of real data.

TL;DR: Regression to the mean explains why AI investments appear to work — even when they haven't moved anything. SEC data shows 0% natural recovery in 27 trough companies. Stack Overflow and METR data show a 39-point gap between perceived and measured productivity. Three questions to ask before your next board deck are at the bottom.

Contents

Regression to the Mean: The Concept Nobody Applies to AI
Lens A — SEC Revenue Data: The Baseline You Never Set
Lens B — SO 2024 & METR 2025: The 39-Point Perception Gap
Solution Design: DeepSeek R1 on AWS Bedrock
The Three Questions Worth Asking

Regression to the Mean: The Concept Nobody Applies to AI

Regression to the mean (RTM) describes a simple statistical reality: extreme outcomes drift back toward the average on their own. Not because anything changed — because extreme performance contains more noise than signal, and noise does not persist.

Let me explain with an example. Take every footballer rated in the top 10% one season. The following season, most score lower — not because they declined, but because extreme performance is partly skill and partly circumstance, and circumstance does not persist. The lowest-rated players improve for the same reason. Neither group changed fundamentally. The distribution pulled them back toward the centre.
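The footballer example can be reproduced with a small simulation. This is a minimal sketch, assuming each rating is a stable skill plus independent noise per season; the distributions, sample size, and function name are illustrative assumptions, not data from this article.

```python
import random
import statistics

def simulate_rtm(n_players=10_000, seed=42):
    """Simulate two seasons where rating = stable skill + fresh noise."""
    rng = random.Random(seed)
    skill = [rng.gauss(50, 10) for _ in range(n_players)]
    # Each season adds independent noise ("circumstance") to the same skill.
    season1 = [s + rng.gauss(0, 10) for s in skill]
    season2 = [s + rng.gauss(0, 10) for s in skill]

    # Rank players using their season-1 rating only.
    order = sorted(range(n_players), key=lambda i: season1[i])
    decile = n_players // 10
    bottom, top = order[:decile], order[-decile:]

    def mean_of(values, idx):
        return statistics.fmean(values[i] for i in idx)

    return {
        "top_s1": mean_of(season1, top),
        "top_s2": mean_of(season2, top),
        "bottom_s1": mean_of(season1, bottom),
        "bottom_s2": mean_of(season2, bottom),
    }

result = simulate_rtm()
print(result)
```

Nobody's skill changes between the two seasons, yet the top decile's average falls and the bottom decile's average rises. Any intervention applied to the bottom group between seasons would appear to have worked.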

Once seen in sport, it is impossible to unsee in a quarterly revenue report. When a metric recovers after an AI deployment, nobody asks "would this have happened anyway?" It should be the first question. In AI investment, RTM is the most systematically ignored concept in the room. The AI gets the credit. The measurement goes unquestioned. The baseline is never set.

RTM takes two forms in AI measurement — one statistical, one cognitive. Both produce the same wrong conclusion.

Lens A — SEC Revenue Data: The Baseline You Never Set

Data: SEC EDGAR Financial Statement Data Sets 2024 Q4 · SIC 7372

Software companies that hit a revenue trough recovered — with and without an AI initiative. Among the companies that did invest in AI, some genuinely outperformed. Others recovered at the same rate as those that did nothing. The aggregate signal does not separate them cleanly.

The FOMO dynamic made this worse. Many organisations did not invest because of operational need. They invested because a competitor mentioned AI in an earnings call. Because the strategy deck required it. Because 'AI first' became the default answer before the question was asked.

That is a fundamentally different trigger — and it produces a different measurement problem.

When the decision to invest is caused by an extreme trough, statistical reversion is already loaded. The AI initiative and the natural recovery happen simultaneously. Attributing the recovery to the initiative is not dishonest. It is just unmeasured.

The companies that cannot answer 'what would have happened without it?' have not measured their AI. They have measured their quarter.

Top 20 most volatile companies — Q1 vs Q2 revenue (USD M, log scale). Green: recovered. Red: declined. SEC EDGAR, SIC 7372, Q3 2023 – Q4 2024.

What DeepSeek R1 found — Lens A

0% recovery in 27 large-drop companies contradicts RTM expectations. Mean QoQ (+131.1%) is 34× the median (+3.8%) — a handful of outliers distorting the aggregate signal.

Technical finding (DeepSeek R1)

While 55% of companies showed sequential revenue growth, the complete absence of recovery among 27 largest decliners contradicts classical RTM patterns. Extreme positive skew (median 3.8% vs mean 131.1%) suggests concentrated outliers rather than systemic improvement. This distribution questions blanket AI impact claims — most recoveries cluster in already-stable performers.

Executive summary (DeepSeek R1)

Not one company with severe revenue drops recovered naturally. The staggering 131% average growth figure hides that 3/4 of companies saw less than 4% improvement — the true drivers were pre-existing strong performers, not AI interventions.

So what? Before the next board deck: what was the 12-month rolling average before the trough that triggered this investment? That is your baseline. The quarter you invested from is not. Demand counterfactual modelling before renewing any AI contract.
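To make the baseline question concrete, here is a minimal sketch that compares a rebound against the trough and against the 12-month average before it. The function names and revenue figures are hypothetical, chosen only to show how different the two comparisons can look.

```python
import statistics

def baseline_before_trough(monthly_revenue, trough_index, window=12):
    """Mean of the `window` months immediately before the trough month."""
    if trough_index < window:
        raise ValueError("need at least `window` months of history before the trough")
    return statistics.fmean(monthly_revenue[trough_index - window:trough_index])

def recovery_report(monthly_revenue, trough_index, window=12):
    """Express the latest month relative to the trough and to the baseline."""
    baseline = baseline_before_trough(monthly_revenue, trough_index, window)
    trough = monthly_revenue[trough_index]
    latest = monthly_revenue[-1]
    return {
        "baseline": baseline,
        "vs_trough_pct": (latest - trough) / trough * 100,
        "vs_baseline_pct": (latest - baseline) / baseline * 100,
    }

# Hypothetical figures (USD M): twelve steady months, a trough, then a rebound.
revenue = [100.0] * 12 + [70.0, 85.0, 98.0]
report = recovery_report(revenue, trough_index=12)
print(report)
```

Measured from the trough, the rebound looks like strong growth. Measured from the rolling baseline, the same series has not yet returned to where it started.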

Lens B — SO 2024 & METR 2025: The 39-Point Perception Gap

Data: Stack Overflow Developer Survey 2024 (n=33,239) · METR 2025 controlled study

Developers using AI tools took 19% longer on real tasks whilst believing they were 20% faster.

That is a 39-percentage-point gap between perception and measured reality. It holds across every experience level — junior developers, senior developers, architects. The number does not move.

AI sentiment (normalised) by experience — SO 2024, n=33,239. Orange dashed: +20% perceived (METR 2025). Red dotted: −19% measured (METR 2025).

Why this is not classical RTM — and why that matters more

Classical RTM requires the same subjects measured twice. The SO 2024 survey and METR 2025 are different populations, measured once each. Strict RTM does not apply.

What this is instead is RTM's cognitive cousin — and it produces the same wrong conclusion by a different route.

Regression to the mean describes what happens to numbers measured from an extreme. The 39-point perception gap describes what happens to judgement formed at an extreme moment — adoption. Developers anchor their baseline at maximum enthusiasm. Every measurement taken from that anchor inflates the perceived gain. Different mechanism, identical error.

What DeepSeek R1 found — Lens B

R1 received adoption rates, trust scores, and sentiment distributions. The chart below is what it returned.

Mechanism: cognitive attribution bias. Perceived productivity gain +20% vs measured change −19%. The shaded gap is the 39-point perception-reality chasm.

Technical finding (DeepSeek R1)

The 39-point METR gap aligns with predictable variance patterns: senior developers show both highest volatility (σ=0.91) and largest sentiment/reality disconnect. While AI adoption correlates with a 0.49-point sentiment lift, the −19 actual METR reveals systemic attribution errors. Experience cohorts' inverse trust/performance relationship suggests senior teams over-attribute natural workflow variance to tools.

Executive summary (DeepSeek R1)

Engineering leaders report 20% productivity gains from AI tools, but objective metrics show 19% decline — a 39-point reality gap. This mirrors the 13.4% sentiment drop from 2023's AI hype cycle, showing how teams misattribute normal performance variance to technology interventions.

So what? The 39-point gap is the number to take into your next vendor conversation. Ask for task-completion data from a controlled study — not survey sentiment. If they cannot produce it, you are buying confidence, not capability.

What this means for your measurement

If your organisation is in boardroom conversation two — "our AI rollout worked, the numbers are up" — this lens applies directly. The question is not whether the numbers went up. They probably did. The question is whether the team's enthusiasm at the moment of adoption has been baked into the baseline you are measuring from.

Most AI productivity measurements are taken by people who chose to believe in the tool, at the moment they chose to believe in it. That is the least reliable baseline possible. The 39-point gap is what happens when you measure from there and call it evidence.

The business reality in 2025: AI is now a board mandate in most organisations, not an operational choice. That means adoption is happening regardless of individual conviction — which makes the attribution problem worse, not better. FOMO-driven adoption inflates the enthusiast pool, inflates the baseline sentiment, and inflates the reported gains. The underlying performance data, as METR measured it, points the other way.

Solution Design: DeepSeek R1 on AWS Bedrock

Same pipeline for both lenses: Python computes stats → DeepSeek R1 reasons over them → structured JSON drives charts and written findings.
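The shape of that pipeline can be sketched in a few lines. This is an illustration under assumptions, not the code behind this article: the prompt wording, `run_lens`, and `fake_model` are hypothetical, and the model call is injected as a plain callable so the sketch runs without AWS credentials. In a real run that callable would wrap the Bedrock request.

```python
import json

# Hypothetical prompt wording; only the context block varies per lens.
PROMPT_TEMPLATE = (
    "Context: {context}\n"
    "Statistics (JSON): {stats}\n"
    "Return a single JSON object describing whether regression to the mean is present."
)

def run_lens(stats, context, invoke_model):
    """Stats in, structured findings out.

    `invoke_model` is any callable that takes a prompt string and returns the
    model's text reply.
    """
    prompt = PROMPT_TEMPLATE.format(context=context, stats=json.dumps(stats, sort_keys=True))
    reply = invoke_model(prompt)
    # Replies may wrap the JSON in prose, so keep only the outermost object.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("model reply contained no JSON object")
    return json.loads(reply[start:end + 1])

def fake_model(prompt):
    """Stand-in for the real model call, used here so the sketch is runnable."""
    return 'Here is the result: {"rtm_present": false, "gap_value": 127.3}'

findings = run_lens(
    {"recovery_rate": 55.1, "large_drop_recovery_rate": 0.0, "mean_qoq": 131.1, "median_qoq": 3.8},
    context="Lens A: SEC revenue data, executive audience",
    invoke_model=fake_model,
)
print(findings)
```

Keeping the model behind a callable also makes it cheap to swap models, which mattered once the first model had to be replaced.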

One prompt, two lenses

One DeepSeek R1 prompt handles both datasets. Only the context block changes per lens — topic, audience, purpose. The model receives the full stats dictionary and returns structured JSON. Here is the actual Lens A output:

{
  "rtm_present": false,
  "mechanism": "Extreme Value Distortion",
  "key_finding": "0% recovery in large-drop companies contradicts RTM despite 55.1% overall recovery.",
  "surprise": "Mean QoQ of 131.1% vs median 3.8% reveals extreme outlier-driven skew.",
  "chart_points": [
    {"label": "Median QoQ growth",        "value": 3.8,   "annotation": "Typical company performance"},
    {"label": "Large-drop recovery rate", "value": 0.0,   "annotation": "Zero bounce-back despite RTM expectations"},
    {"label": "Mean QoQ growth",          "value": 131.1, "annotation": "Skewed by extreme outliers"}
  ],
  "gap_label": "Median-Mean Disparity",
  "gap_value": 127.3,
  "highlight_note": "29 companies drove 98% of growth",
  "narration": "While 55% of companies showed sequential revenue growth, the complete absence of recovery among 27 largest decliners contradicts classical RTM. Extreme positive skew (median 3.8% vs mean 131.1%) suggests concentrated outliers rather than systemic improvement.",
  "executive_summary": "Not one company with severe revenue drops recovered naturally. The 131% average hides that 3/4 of companies saw less than 4% improvement — true drivers were pre-existing strong performers."
}

Chart axes, annotations, and shaded gap regions are all derived from this output. Change the dataset, the chart changes. Change the audience context, the written outputs change.
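As a rough sketch of how such output could drive a chart, the function below extracts plot-ready values and recomputes the gap instead of trusting the model's arithmetic. `chart_series` and its return shape are assumptions for illustration; the sample input is an abbreviated copy of the Lens A JSON.

```python
def chart_series(findings, high="Mean QoQ growth", low="Median QoQ growth"):
    """Turn model JSON into labels and values for plotting, verifying the gap."""
    points = {p["label"]: float(p["value"]) for p in findings["chart_points"]}
    gap = round(points[high] - points[low], 1)
    # Reject output whose stated gap disagrees with its own chart points.
    if abs(gap - findings["gap_value"]) > 0.05:
        raise ValueError(f"gap_value {findings['gap_value']} does not match {gap}")
    return {
        "labels": list(points),
        "values": list(points.values()),
        "shaded_gap": (points[low], points[high]),
        "gap_label": findings["gap_label"],
    }

lens_a = {
    "chart_points": [
        {"label": "Median QoQ growth", "value": 3.8},
        {"label": "Large-drop recovery rate", "value": 0.0},
        {"label": "Mean QoQ growth", "value": 131.1},
    ],
    "gap_label": "Median-Mean Disparity",
    "gap_value": 127.3,
}
series = chart_series(lens_a)
print(series)
```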

R1's thinking (excerpt — Lens A)

Before producing JSON, R1 reasons through the data. Here is a direct excerpt from its chain-of-thought:

The overall recovery rate is 55.1% — slightly above half. RTM would predict that
companies performing worse than average would improve. That's consistent with
the 55% figure. But the mean QoQ is 131.1%, way higher than the median of 3.8%.
A highly skewed distribution. A few companies had massive increases pulling the
mean up. That skewness might indicate outliers influencing the average.

The large drop recovery rate being 0% complicates things. If those with the
biggest drops didn't recover, maybe RTM isn't the main factor for them. Their
issues might be structural — not random fluctuations that naturally correct.

RTM might be present in the general recovery (55%), but the 0% recovery in
large drops is unexpected. The mechanism: cognitive bias where companies
attribute the high mean (from a few strong recoveries) to AI, ignoring the
complete lack of recovery in the worst performers.

Three numbers fed the model: QoQ recovery rate (55.1%), large-drop recovery rate (0.0%), mean-median gap (131.1 − 3.8 = 127.3 points).
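Here is a minimal sketch of how those three numbers could be computed. It rests on assumptions not stated in this article: each company is represented by its previous and current quarter-on-quarter change, "recovered" means the current change is positive, and a "large drop" is a previous change at or below a hypothetical threshold of −30%. The sample data is invented.

```python
import statistics

def rtm_inputs(changes, large_drop_threshold=-30.0):
    """changes: list of (previous_qoq_pct, current_qoq_pct), one per company."""
    current = [cur for _, cur in changes]
    recovered = [cur > 0 for cur in current]
    # Recovery flags for companies whose previous quarter was a large drop.
    large_drop = [cur > 0 for prev, cur in changes if prev <= large_drop_threshold]
    mean_qoq = statistics.fmean(current)
    median_qoq = statistics.median(current)
    return {
        "recovery_rate": 100 * sum(recovered) / len(recovered),
        "large_drop_recovery_rate": (
            100 * sum(large_drop) / len(large_drop) if large_drop else None
        ),
        "mean_qoq": mean_qoq,
        "median_qoq": median_qoq,
        "mean_median_gap": mean_qoq - median_qoq,
    }

# Hypothetical sample: one outlier drags the mean far above the median.
sample = [(-40.0, -5.0), (-35.0, -2.0), (2.0, 3.0), (1.0, 4.0), (5.0, 500.0)]
stats = rtm_inputs(sample)
print(stats)
```

Even in five invented rows the pattern from the real data shows up: a single outlier is enough to pull the mean far from the median.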

What failed first

Mistral 7B hallucinated the acronyms it was told not to expand. Three runs, three different inventions for "RTM": Root Mean Square Error, Return on Technology Investment, Relative Technological Maturity. MATH-500 score: ~42% vs DeepSeek R1's 97.3%. At that capability gap, token priors beat instructions. Model replaced.

Over-prescribed prompts killed the output. Ten-point briefs — paragraph structure, tone, anchoring numbers — produced technically correct, completely bland text. Stripping to context + data + one instruction fixed it. The model needs room to reason, not a script to follow.

A normalisation rule in the prompt broke JSON output. "Multiply raw scores by 50 if on a −2 to +2 scale" caused R1 to burn its entire thinking budget on a scale conflict. Empty JSON returned. Fix: handle normalisation in plot_rtm(), not in natural language. Scale conversion belongs in code.
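The rule that broke the prompt takes only a few lines of code. The helper below is a sketch of what could sit inside plot_rtm(); the name `normalise_scores` and its bounds checking are assumptions, not the shipped implementation.

```python
def normalise_scores(scores, raw_min=-2.0, raw_max=2.0, out_max=100.0):
    """Linearly rescale symmetric raw scores to the range -out_max..+out_max."""
    if raw_min != -raw_max:
        raise ValueError("expected a symmetric raw scale such as -2..+2")
    factor = out_max / raw_max  # 50 for a -2..+2 scale
    normalised = []
    for score in scores:
        if not raw_min <= score <= raw_max:
            raise ValueError(f"score {score} is outside {raw_min}..{raw_max}")
        normalised.append(score * factor)
    return normalised

scaled = normalise_scores([-2, -0.5, 0, 1.5, 2])
print(scaled)  # [-100.0, -25.0, 0.0, 75.0, 100.0]
```

In code, a value on the wrong scale raises an error straight away, where the prompt version returned empty JSON.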

Numeric outputs were stable across runs; interpretive labels were not (even at temperature: 0). The same data produced consistent numbers every time. The mechanism label varied. Worth keeping in mind when using reasoning models in a pipeline: anchor on the numbers, treat the prose framing as context.
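A simple way to act on this is to compare repeated runs field by field. In this sketch the function name is hypothetical and the second and third mechanism labels are invented for illustration; only the first label and the gap value appear in the Lens A output.

```python
def split_stable_fields(runs, numeric_keys=("gap_value",), label_keys=("mechanism",)):
    """Report which fields agree across repeated runs on the same data."""
    report = {}
    for key in numeric_keys + label_keys:
        distinct = {run[key] for run in runs}
        report[key] = {
            "stable": len(distinct) == 1,
            "distinct": sorted(distinct, key=str),
        }
    return report

runs = [
    {"gap_value": 127.3, "mechanism": "Extreme Value Distortion"},
    {"gap_value": 127.3, "mechanism": "Outlier-Driven Skew"},  # hypothetical label
    {"gap_value": 127.3, "mechanism": "Attribution Bias"},     # hypothetical label
]
stability = split_stable_fields(runs)
print(stability)
```

Fields flagged as unstable are candidates for human review or for dropping from downstream automation.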

The Three Questions Worth Asking

1. What was my baseline before the trough that triggered this investment?
Not the crisis quarter. The 12-month rolling average before it.

2. What would have happened without the intervention?
If you cannot model the counterfactual, you have not measured the AI. You have measured the quarter.

3. Have I measured across at least three consecutive cycles?
One data point is noise. Two is coincidence. Three is a pattern worth acting on.
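Questions 2 and 3 can be combined into one check. This sketch assumes the crudest possible counterfactual, the pre-trough baseline carried forward unchanged; the function name and the figures are hypothetical, and a real counterfactual model would replace the flat list.

```python
def sustained_uplift(observed, counterfactual, required_cycles=3):
    """True only if the latest `required_cycles` all beat the counterfactual."""
    if len(observed) != len(counterfactual):
        raise ValueError("observed and counterfactual must cover the same cycles")
    uplift = [o - c for o, c in zip(observed, counterfactual)]
    recent = uplift[-required_cycles:]
    sustained = len(recent) == required_cycles and all(u > 0 for u in recent)
    return {"uplift": uplift, "sustained": sustained}

# Hypothetical quarterly revenue (USD M) after the intervention.
observed = [98.0, 103.0, 104.0, 106.0]
# Assumed counterfactual: the pre-trough baseline carried forward.
counterfactual = [100.0, 100.0, 100.0, 100.0]
verdict = sustained_uplift(observed, counterfactual)
print(verdict)
```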

What's your experience measuring AI ROI in your organisation? Has your team found a reliable counterfactual approach? Drop a comment — I'd genuinely like to know what's working.

I Built a Document Triage with Telegram, n8n, and AWS Bedrock — 6 Decisions That Shaped a Self-Hosted AI Document Analyst

Pragnesh Shah — Mon, 01 Jun 2026 16:02:03 +0000

The volume of documents landing on my mobile outpaces my ability to read them. Research papers, articles, books — shared as PDFs across channels faster than any individual can process. The question isn't 'how do I read more?' — it's 'how do I decide what deserves my attention in 30 seconds, not 30 minutes?'

So I built this triage system

Telegram mandates HTTPS for webhooks. I had no domain to attach a certificate to. Here's how a self-signed cert, an Nginx proxy, and a reverse-engineered secret token got a PDF-summarising bot into production on EC2 with AWS Bedrock doing the thinking.

TL;DR: No domain → self-signed cert uploaded to Telegram API → Nginx TLS termination → n8n orchestration → Bedrock summarisation. Zero access keys. Full architecture below.

Table of Contents

Constraints
Architecture: Telegram to Bedrock on EC2
Key Design Decisions
What Failed First
The Implementation That Shipped
What I'd Do Differently
Takeaways for Architects

Constraints

| Constraint | Value |
|---|---|
| Domain | None available: no Route 53 hosted zone, no ACM certificate possible |
| HTTPS | Mandatory: Telegram rejects webhook registration without TLS |
| Auth model | No long-lived credentials in config files or environment variables |
| Orchestration | Visual workflow preferred: must be modifiable without redeployment |
| LLM | AWS-native, no external API keys beyond what IAM provides |

Architecture: Telegram to Bedrock on EC2

Data flow: User sends PDF → Telegram delivers webhook POST to EC2:443 → Nginx terminates TLS, proxies to n8n:5678 → n8n downloads file, extracts text, invokes Bedrock → response sent back via Telegram Bot API.

Key Design Decisions

Every architecture is a set of trade-offs made explicit. Here are the ones that shaped this system:

| # | Decision | Why | Trade-off Accepted |
|---|---|---|---|
| 1 | Self-signed cert + Nginx (no domain needed) | Telegram accepts uploaded certs via `setWebhook` API; eliminates domain dependency entirely | Browser shows cert warnings; webhook registration is manual |
| 2 | n8n over custom code | Visual workflow with built-in Telegram, PDF extraction, and LLM chain nodes. Hours to build, not weeks. | Undocumented webhook secret behaviour; version-pinning required |
| 3 | EC2 direct deployment | Cloud infrastructure with native internet connectivity. Predictable networking for webhook delivery. | Monthly compute cost; single point of failure without ASG |
| 4 | IAM role, not access keys | Zero rotation burden, no exposure risk, automatic credential refresh via instance metadata | None; strictly superior for EC2-hosted workloads |
| 5 | Manual webhook with computed secret | n8n can't upload self-signed certs when registering webhooks; manual `setWebhook` call bridges the gap | Must re-run registration script after workflow changes |
| 6 | Encryption key in Secrets Manager | Key loss = total credential loss. Secrets Manager provides audit logging, durability, and prevents accidental exposure in config files | Extra API call at startup (~100ms latency) |

What Failed First

Tunnel services: unreliable for webhook delivery

I tried localhost.run, cloudflared, and ngrok to expose a local n8n instance. All three established a tunnel, but HTTP traffic never arrived reliably. Tunnel services that hand out provider-controlled subdomains (*.lhr.life, *.trycloudflare.com) introduce a dependency outside your control: DNS resolution, uptime, and connection stability are all delegated to a third party. For a webhook endpoint that needs to be reachable 24/7, that's a risk I wasn't willing to accept.

Webhook 403s: proxy trust misconfiguration

After deploying to EC2 with Nginx in front of n8n, every Telegram webhook returned 403 Forbidden. The fix: N8N_PROXY_HOPS=1. Without it, n8n doesn't trust the X-Forwarded-For header from Nginx and rejects requests as spoofed.
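To see what a hop count changes, here is a generic sketch of hop-count proxy trust as Express-style servers apply it. It is not n8n's code, and `resolve_client_ip` is a hypothetical name; the point is only which address the application treats as the client.

```python
def resolve_client_ip(remote_addr, x_forwarded_for, trusted_hops):
    """Pick the client address after skipping `trusted_hops` proxies."""
    forwarded = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    # Nearest hop first: the socket peer, then X-Forwarded-For read right to left.
    chain = [remote_addr] + forwarded[::-1]
    return chain[min(trusted_hops, len(chain) - 1)]

# Nginx on the same host connects from 127.0.0.1 and forwards the real client IP.
print(resolve_client_ip("127.0.0.1", "203.0.113.7", trusted_hops=0))  # 127.0.0.1
print(resolve_client_ip("127.0.0.1", "203.0.113.7", trusted_hops=1))  # 203.0.113.7
```

With zero trusted hops, every request appears to come from the proxy itself and the forwarded headers are ignored. With one hop, the address Nginx recorded is used.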

Encryption key mismatch: silent data loss

I recreated the Docker container with a different encryption key. n8n showed 'Set up owner account' — all existing workflows and credentials were gone. The encryption key and the Docker volume are a coupled pair. Change one without the other and you lose everything.

The undocumented secret token

n8n v2.22.5 enforces a secret token on Telegram webhook requests. After extensive testing and reading the source code, I found the formula follows this pattern:

secret_token = {workflowId}_{nodeId}

The workflowId is visible in the browser URL. The nodeId appears when you click the Telegram Trigger node. This isn't surfaced in the n8n documentation or UI — it required empirical discovery. Without the correct secret, Telegram receives a 403 on every delivery and your bot stays silent.
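Both sides of that check can be sketched briefly. The example builds the token from the pattern above and validates it the way a receiver might, using the `X-Telegram-Bot-Api-Secret-Token` header that Telegram sends with each delivery. The function names and IDs are hypothetical, and n8n's own validation may differ in detail.

```python
import hmac
import re

# Telegram accepts 1-256 characters from A-Z, a-z, 0-9, underscore and hyphen.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,256}")

def build_secret_token(workflow_id, node_id):
    """Join the two IDs using the {workflowId}_{nodeId} pattern."""
    token = f"{workflow_id}_{node_id}"
    if not TOKEN_PATTERN.fullmatch(token):
        raise ValueError("token contains characters Telegram does not accept")
    return token

def request_is_authentic(headers, expected_token):
    """Constant-time comparison of the received header with the expected token."""
    received = headers.get("X-Telegram-Bot-Api-Secret-Token", "")
    return hmac.compare_digest(received, expected_token)

# Hypothetical IDs for illustration only.
token = build_secret_token("AbC123xyz", "1f2e3d4c-0000-4000-8000-abcdefabcdef")
print(token)
print(request_is_authentic({"X-Telegram-Bot-Api-Secret-Token": token}, token))
print(request_is_authentic({}, token))
```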

The Implementation That Shipped

Docker: n8n with IAM role access

# Fetch encryption key from Secrets Manager (never hardcode)
N8N_KEY=$(aws secretsmanager get-secret-value \
  --secret-id <your-secret-id> \
  --region <your-region> \
  --query SecretString \
  --output text)

sudo docker run -d \
  --name n8n \
  --restart always \
  -e WEBHOOK_URL=https://<your-ec2-public-ip> \
  -e N8N_EDITOR_BASE_URL=https://<your-ec2-public-ip> \
  -e N8N_ENCRYPTION_KEY="${N8N_KEY}" \
  -e GENERIC_TIMEZONE=Europe/London \
  -e N8N_AWS_SYSTEM_CREDENTIALS_ACCESS_ENABLED=true \
  -e N8N_PROXY_HOPS=1 \
  -e NODE_TLS_REJECT_UNAUTHORIZED=0 \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n:2.22.5

Key decisions embedded in this config:

N8N_AWS_SYSTEM_CREDENTIALS_ACCESS_ENABLED=true — uses EC2 IAM role, no access keys
N8N_PROXY_HOPS=1 — trusts exactly one proxy layer (Nginx)
N8N_ENCRYPTION_KEY fetched from Secrets Manager at runtime, never hardcoded in scripts or config files (it is still visible to anyone who can run docker inspect on the host)
Version pinned to 2.22.5 — webhook secret behaviour changes between versions

Nginx: TLS termination

server {
    listen 443 ssl;
    server_name _;

    ssl_certificate     /etc/ssl/certs/n8n.pem;
    ssl_certificate_key /etc/ssl/private/n8n.key;

    client_max_body_size 50m;

    location / {
        proxy_pass http://localhost:5678;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_buffering off;
        proxy_request_buffering off;
    }
}

proxy_request_buffering off is critical — without it, Nginx buffers the request body and n8n fails to parse multipart uploads (PDFs).

Webhook registration: the self-signed cert trick

curl -s \
  -F "url=https://<your-ec2-public-ip>/webhook/<your-webhook-id>/webhook" \
  -F "certificate=@/etc/ssl/certs/n8n.pem" \
  -F "secret_token=<workflowId>_<nodeId>" \
  "https://api.telegram.org/bot<your-bot-token>/setWebhook"

Three things happen in this single call:

Telegram learns the webhook URL
The self-signed public cert is uploaded — Telegram will trust it for future deliveries
The secret token is registered — Telegram includes it as a header, n8n validates it
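After registration, Telegram's getWebhookInfo method reports what it has on record. This sketch interprets the `result` object of that response. The field names are Telegram's; `diagnose_webhook`, the sample values, and the documentation-range IP address are illustrative assumptions.

```python
def diagnose_webhook(info, expected_url):
    """Interpret the `result` object returned by getWebhookInfo."""
    problems = []
    if info.get("url") != expected_url:
        problems.append("webhook URL is not the one that was registered")
    if not info.get("has_custom_certificate", False):
        problems.append("no custom certificate on record for this webhook")
    if info.get("last_error_message"):
        problems.append(f"last delivery error: {info['last_error_message']}")
    if info.get("pending_update_count", 0) > 0:
        problems.append(f"{info['pending_update_count']} updates waiting for delivery")
    return problems

# Illustrative response: registration succeeded but deliveries are being rejected.
sample_info = {
    "url": "https://198.51.100.10/webhook/example-id/webhook",
    "has_custom_certificate": True,
    "pending_update_count": 3,
    "last_error_message": "Wrong response from the webhook: 403 Forbidden",
}
problems = diagnose_webhook(sample_info, "https://198.51.100.10/webhook/example-id/webhook")
for problem in problems:
    print(problem)
```

A registered URL and certificate combined with a 403 in the last error points to the application layer, which is where the secret token and proxy settings live.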

Workflow node chain

The LLM prompt:

Analyse the following document and provide:
1. Title/Subject
2. Key Insights (3-5 bullet points)
3. Summary (2-3 paragraphs)

Document text:
{{ $json.text }}

System message: 'You are a document analyst. Provide clear summaries. Format for Telegram.'

What I'd Do Differently

Register a domain. Self-signed certs work but add operational friction: manual webhook re-registration after cert renewal, browser warnings on the editor. A domain costs as little as $5/year, and with a free CA-issued certificate (ACM behind a load balancer or CloudFront, or Let's Encrypt directly on the instance), the entire self-signed complexity disappears. For an MVP this was acceptable; for anything beyond, it's the first thing I'd change.

Switch to PostgreSQL for n8n's backend. n8n officially supports PostgreSQL as its production database (SQLite is the default; MySQL/MariaDB are deprecated). SQLite lives inside the Docker volume — it locks on writes, doesn't support safe hot backups, and is incompatible with n8n's queue mode for horizontal scaling. Amazon RDS for PostgreSQL or Aurora Serverless v2 would give managed backups, point-in-time recovery, and a path to multi-worker deployments without touching the application layer.

Add an Auto Scaling Group with min=1 and a launch template. The current architecture is a single EC2 instance — one availability zone, one point of failure. An ASG with the same user-data script gives self-healing (automatic replacement on health check failure) without adding architectural complexity. Combined with a domain and ACM, this moves the system from 'working prototype' to 'production-grade' with minimal additional cost.

Takeaways for Architects

Self-signed certificates are a valid pattern — but only when the webhook consumer explicitly supports certificate upload. Telegram does. Most services (Stripe, GitHub, Slack) do not. Validate this before committing to the architecture.

IAM roles are strictly superior to access keys for EC2 workloads. No rotation, no exposure risk, automatic refresh. There's no trade-off here — just use them.

Pin your versions. n8n's webhook secret enforcement appeared between versions without a migration path. n8n:latest is a liability. Pin, test upgrades in staging, keep a rollback plan.

Debug webhooks layer by layer. Five distinct failure modes: network, TLS, authentication, application logic, response formatting. Each must pass before testing the next. Skipping layers leads to circular troubleshooting.

Put intelligence in the prompt, not the infrastructure. The entire 'AI' part of this system is a 5-line prompt template. The other 95% of effort was infrastructure — certificates, secrets, networking, Docker volumes. The model is a commodity. Getting data to and from it reliably is the craft.

What's your approach when you need HTTPS webhooks but can't get a domain? I'd be curious whether others have hit the same self-signed cert pattern — or found a better workaround. Drop a comment below.
