Iurii Rogulia

PDF Integrity Report: February 2026

Originally published at htpbe.tech. The version on htpbe.tech stays in sync with the latest detection algorithm — refer to it for the canonical text.

Every month we look at aggregate, anonymized data from checks processed through the HTPBE? web interface and publish what we find. No file contents, no personally identifiable information — only the structural and metadata signals our algorithm uses to detect modifications.

February 2026: 418 PDFs analyzed through the website. The month has 28 calendar days; the daily breakdown later in this report lists checks on 23 of them, February 6 through 28.


The Top Line

| Metric | Value |
| --- | --- |
| Total PDFs analyzed | 418 |
| Flagged as modified | 169 (40.4%) |
| Clean | 249 (59.6%) |
| Total data volume | 210.3 MB |
| Total pages analyzed | 1,902 |

Two in five PDFs submitted through the website in February showed signs of post-creation modification. That is a higher rate than cross-industry averages suggest — but it reflects the selection bias of fraud detection workflows: people check documents when they have a reason to be concerned.


Modification Confidence Distribution

| Confidence level | Count | Share |
| --- | --- | --- |
| None (no modification detected) | 211 | 50.5% |
| High (strong structural evidence) | 24 | 5.7% |
| 100% (cryptographic proof or definitive markers) | 145 | 34.7% |
| Cannot determine (consumer software origin) | 38 | 9.1% |

More than a third of all uploaded PDFs carried 100% modification confidence — meaning the evidence was unambiguous, not probabilistic. These documents carry stacked forensic signals — a date mismatch, incremental update artifacts, and tool-signature inconsistencies.

Files with high-confidence (but not 100%) findings deserve attention: 24 files, 5.7% of the total. These documents show strong structural evidence — suspicious fields, questionable timestamps — but no single finding rises to the level of cryptographic proof. In a compliance workflow, these warrant manual review.


How Modifications Are Detected

Among the 169 flagged files, the algorithm identified the following signals:

| Detection signal(s) | Files | % of modified |
| --- | --- | --- |
| Modification date differs (only) | 58 | 34.3% |
| Incremental updates + modification date differs | 31 | 18.3% |
| Incremental updates (only) | 15 | 8.9% |
| Incremental updates + suspicious update pattern | 15 | 8.9% |
| No explicit signal (rule-based verdict) | 15 | 8.9% |
| All three: incremental + suspicious + date | 8 | 4.7% |
| Invalid date sequence + anomalies + date differs | 6 | 3.6% |
| Tool signature mismatch combinations | 7 | 4.1% |

The single most common detection signal — appearing in 62% of flagged files — is a discrepancy between the embedded creation and modification timestamps. A document edited in an external tool will often have its modification date updated while the original creation date remains as set by the authoring software. This divergence, when combined with other signals, becomes a strong forensic indicator.
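
As an illustration of the timestamp comparison described above, here is a minimal sketch that parses two PDF date strings (the `D:YYYYMMDDHHmmSS` form with an optional UTC offset) and reports whether they differ. The function names `parse_pdf_date` and `dates_differ` are hypothetical, a missing offset is assumed to mean UTC, and this is not a description of the HTPBE? algorithm itself.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: D:YYYY[MM[DD[HH[mm[SS]]]]] followed by Z or +HH'mm' / -HH'mm'.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    parts = m.groups()[:6]
    defaults = (0, 1, 1, 0, 0, 0)  # omitted month/day default to 1, time to 0
    year, month, day, hour, minute, second = (
        int(p) if p is not None else d for p, d in zip(parts, defaults)
    )
    tz = timezone.utc  # assumption: no offset given is treated as UTC
    if m.group(8):
        offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10) or 0))
        tz = timezone(offset if m.group(8) == "+" else -offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


def dates_differ(creation: str, modification: str, tolerance_seconds: int = 0) -> bool:
    """True when the two timestamps are further apart than the tolerance."""
    created = parse_pdf_date(creation)
    modified = parse_pdf_date(modification)
    return abs((modified - created).total_seconds()) > tolerance_seconds
```

Because both values are converted to timezone-aware datetimes, `D:20260210093000+02'00'` and `D:20260210073000Z` compare as the same instant. A difference on its own is only one signal; many legitimate workflows also update the modification date.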

Incremental updates were detected in 97 files (23.2% of all February checks). This is the PDF mechanism that allows appending content (annotations, form data, revised pages) without rewriting the file. Among those 97 files, the average update chain length was 2.6 revisions. Crucially, 59 of those 97 files (60.8%) were also classified as modified. The remaining 38 files (39.2%) showed incremental updates consistent with legitimate workflows: annotations, digital signatures, or form completion.
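
One rough way to see an update chain in a raw file is to count end-of-file markers, since each appended revision normally ends with its own `%%EOF`. The sketch below assumes that heuristic; `count_revisions` and `has_incremental_updates` are hypothetical names, and the approach is a simplification rather than the detection logic used for this report.

```python
import re

# Each revision of a PDF normally ends with its own %%EOF marker.
_EOF_MARKER = re.compile(rb"%%EOF")


def count_revisions(pdf_bytes: bytes) -> int:
    """Count %%EOF markers as a rough proxy for the number of revisions."""
    return len(_EOF_MARKER.findall(pdf_bytes))


def has_incremental_updates(pdf_bytes: bytes) -> bool:
    """True when the file appears to contain more than one revision."""
    return count_revisions(pdf_bytes) > 1


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print(f"revisions (approx.): {count_revisions(data)}")
```

The count is only approximate: linearized PDFs carry two `%%EOF` markers without having been updated, and the byte sequence can also occur inside stream data. A real parser would walk the cross-reference chain instead.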

Critical modification markers across all flagged files:

  1. Different creation and modification dates — 113 files
  2. Multiple cross-reference tables (incremental updates) — 40 files
  3. Known PDF editing tool detected — 15 files

The Software Ecosystem

PDF metadata reveals which software created and last touched a document. February showed a clearly Microsoft-centric picture, with significant freelance-platform presence.

Top producers (the application that last wrote the file):

| Producer | Files | Share |
| --- | --- | --- |
| Microsoft: Print To PDF | 24 | 5.7% |
| PDFium | 20 | 4.8% |
| mPDF 8.2.5 | 18 | 4.3% |
| Upwork | 16 | 3.8% |
| Microsoft® Word for Microsoft 365 | 12 | 2.9% |
| iLovePDF | 11 | 2.6% |
| Style Report | 11 | 2.6% |
| OpenPDF 1.3.26 | 11 | 2.6% |
| PDFsharp 1.50 | 10 | 2.4% |

Top creators (the original authoring application):

Several patterns worth noting.

Microsoft Word fragments into multiple entries. Word 2016, Word 2019, Word for Microsoft 365, and the generic “Microsoft Word” string together account for 41 files — the single largest authoring platform if consolidated. Organizations upgrading their Office installations leave version-heterogeneous document archives, and all of those versions end up in fraud detection queues.

iLovePDF in the producer field signals documents that were processed through an online PDF manipulation service after their original creation. When a file lists iLovePDF as producer but names Microsoft Word or Chromium as creator, the document went through an intermediate editing step that the creator field does not acknowledge. Eleven files carried this pattern in February.
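
The creator and producer comparison above can be sketched as a simple string check. This is a minimal illustration only: `ONLINE_EDITOR_MARKERS` and `passed_through_online_editor` are hypothetical names, and the marker list contains just the one service named in this report.

```python
# Illustrative marker list; only iLovePDF is taken from the report itself.
ONLINE_EDITOR_MARKERS = ("ilovepdf",)


def passed_through_online_editor(creator: str, producer: str) -> bool:
    """True when the producer names an online editor that the creator does not."""
    creator_l = creator.casefold()
    producer_l = producer.casefold()
    return any(
        marker in producer_l and marker not in creator_l
        for marker in ONLINE_EDITOR_MARKERS
    )


if __name__ == "__main__":
    print(passed_through_online_editor("Microsoft® Word for Microsoft 365", "iLovePDF"))
    print(passed_through_online_editor("Chromium", "PDFium"))
```

A match says only that the file was rewritten by the service after authoring. Merging or compressing a PDF produces the same producer string as editing its content.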

Upwork appears in both creator and producer (16 files each). The Upwork platform generates its own PDFs — contracts, payment statements, work history reports — and they are being submitted for authenticity analysis by counterparties before acting on them. This reflects a real-world use case: recipients checking freelance platform documents before releasing funds or signing agreements.

mPDF 8.2.5 (18 files as producer) is a PHP PDF library used by web applications to generate invoices, receipts, and reports programmatically. These are application-generated documents, not user-authored files — which makes any structural inconsistency more notable, since they should be templated and uniform.

PDFium appearing in both creator and producer (20 and 21 files respectively) reflects Chrome-based PDF generation — printouts from web applications, saved browser pages, Google Docs exports.


PDF Version Landscape

| PDF Version | Files | Share |
| --- | --- | --- |
| 1.7 | 154 | 36.8% |
| 1.4 | 113 | 27.0% |
| 1.5 | 66 | 15.8% |
| 1.3 | 36 | 8.6% |
| 1.6 | 35 | 8.4% |
| 2.0 | 3 | 0.7% |
| 1.2 | 3 | 0.7% |
| Invalid/missing | 7 | 1.7% |

PDF 1.7 leads at 36.8%, with 1.4 a strong second at 27%. Together they account for nearly two thirds of the sample. PDF 2.0 — the ISO 32000-2 standard from 2017 — appears in just 3 files (0.7%), reflecting how slowly the ecosystem adopts new specifications.

Seven files had an invalid or unparseable version string. A well-formed PDF should always declare its version in the file header; losing this field is a sign of either corruption or aggressive editing that stripped the header.
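
Reading the declared version is a matter of finding the `%PDF-M.N` header near the start of the file. The sketch below assumes a 1024-byte search window, a leniency some readers apply, and uses the hypothetical name `header_version`; it returns `None` for the invalid or missing case.

```python
import re
from typing import Optional

# The header has the form %PDF-M.N, for example %PDF-1.7 or %PDF-2.0.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(pdf_bytes: bytes, search_window: int = 1024) -> Optional[str]:
    """Return the version declared in the file header, or None if absent."""
    m = _HEADER.search(pdf_bytes[:search_window])
    if m is None:
        return None
    return f"{m.group(1).decode()}.{m.group(2).decode()}"


if __name__ == "__main__":
    print(header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))
    print(header_version(b"not a pdf"))
```

The header is not the whole story: from PDF 1.4 onward, a `/Version` entry in the document catalog can declare a later version than the header does, which this sketch does not inspect.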


Digital Signatures: Present but Not Protective

Eleven PDFs carried embedded digital signatures (2.6% of the total). Of those, 3 had been modified after the signature was applied: a 27.3% post-signature modification rate among signed documents.

The mechanism most commonly exploited here is incremental updates. The PDF specification permits content to be appended after a signature is applied, provided the additions are limited to explicitly permitted operations. Some editors exploit the ambiguity of what constitutes a “permitted” change to introduce substantive content modifications — revised figures, changed dates, altered party names — while preserving a signature that remains cryptographically valid within its original scope.

The result: a document that displays a valid signature indicator in a viewer, but whose content has changed since signing. The signature covers what it covered when it was applied; it does not cover what was added afterward.
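
The scope of a signature is recorded in its `/ByteRange` array, which lists the byte spans the signature digest covers. A minimal sketch of checking for bytes beyond that scope follows; it assumes the array appears uncompressed in the file, uses the hypothetical name `bytes_after_last_signature`, and does not validate the signature cryptographically.

```python
import re
from typing import Optional

# /ByteRange [start1 length1 start2 length2]: the two spans the signature covers.
_BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def bytes_after_last_signature(pdf_bytes: bytes) -> Optional[int]:
    """Bytes lying beyond the furthest-reaching ByteRange, or None if unsigned."""
    ends = [
        int(m.group(3)) + int(m.group(4))  # end of the second covered span
        for m in _BYTE_RANGE.finditer(pdf_bytes)
    ]
    if not ends:
        return None
    return len(pdf_bytes) - max(ends)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        trailing = bytes_after_last_signature(handle.read())
    print("no signature found" if trailing is None else f"unsigned trailing bytes: {trailing}")
```

A positive result means something was appended after signing, not that it was malicious. Permitted additions such as form data also land outside the signed range, so the appended revision still has to be inspected.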

In practice, most organizations treat the presence of a signature field as sufficient fraud detection. Active signature validation — which would surface these post-signature modifications — is rarely performed outside of legal and financial workflows with formal fraud detection requirements.


Document Profile

The average PDF checked through the website in February:

  • Average size: 0.50 MB
  • Largest file: 9.70 MB
  • Average page count: 4.6 pages
  • Total pages analyzed: 1,902

The half-megabyte average is consistent with the document types typically submitted for fraud detection: invoices, contracts, bank statements, certificates. These are short documents with specific numerical or legal content, where a changed figure or date has real financial or legal consequences.

Metadata completeness averaged 76 out of 100. The score measures how many of the eight standard PDF metadata fields (title, author, creator, producer, creation date, modification date, subject, keywords) are populated. Missing creation dates affected 53 files (12.7%) — removing one of the cleaner forensic signals and increasing reliance on structural analysis.
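
A completeness score of this kind can be sketched as the share of the eight fields that hold a non-empty value. The equal weighting is an assumption, since the report does not state how fields are weighted, and `completeness_score` is a hypothetical name; the keys are the standard document information dictionary entries.

```python
# The eight fields named above, using their document information dictionary keys.
STANDARD_FIELDS = (
    "Title", "Author", "Creator", "Producer",
    "CreationDate", "ModDate", "Subject", "Keywords",
)


def completeness_score(info: dict) -> float:
    """Share of standard fields that are populated, on a 0 to 100 scale."""
    populated = sum(
        1 for field in STANDARD_FIELDS if str(info.get(field) or "").strip()
    )
    # Assumption: every field counts equally toward the score.
    return 100 * populated / len(STANDARD_FIELDS)


if __name__ == "__main__":
    sample = {
        "Title": "Invoice 1042",
        "Author": "Accounts",
        "Creator": "Microsoft Word",
        "Producer": "Microsoft: Print To PDF",
        "CreationDate": "D:20260210093000Z",
        "ModDate": "D:20260210093000Z",
        "Subject": "",
    }
    print(completeness_score(sample))
```

In the sample, six of the eight fields are populated (the subject is empty and keywords are absent), so the score is 75.0.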


Daily Volume

Usage was steady throughout February, without dramatic spikes:

Feb 06: 28    Feb 14: 11    Feb 22:  1
Feb 07: 10    Feb 15: 20    Feb 23: 24
Feb 08:  1    Feb 16: 25    Feb 24: 12
Feb 09: 11    Feb 17: 28    Feb 25: 47
Feb 10: 25    Feb 18: 11    Feb 26: 20
Feb 11: 16    Feb 19: 20    Feb 27: 24
Feb 12: 32    Feb 20: 14    Feb 28:  4
Feb 13: 20    Feb 21: 14

The peak day was February 25 with 47 checks, roughly 2.6× the average of 18.2 checks per day across the 23 days listed (418 checks from February 6 to 28). No batch processing, no anomalous spikes. The distribution reflects organic usage: higher on weekdays, quieter on weekends. The listing starts on February 6, so the first days of the month are not represented.
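
The weekday and weekend pattern can be checked directly against the counts listed above. This is a small sketch using only the figures from the listing; `DAILY_CHECKS` and `split_by_day_type` are hypothetical names.

```python
from datetime import date
from statistics import mean

# Daily check counts for February 6 to 28, 2026, copied from the listing above.
DAILY_CHECKS = {
    6: 28, 7: 10, 8: 1, 9: 11, 10: 25, 11: 16, 12: 32, 13: 20,
    14: 11, 15: 20, 16: 25, 17: 28, 18: 11, 19: 20, 20: 14, 21: 14,
    22: 1, 23: 24, 24: 12, 25: 47, 26: 20, 27: 24, 28: 4,
}


def split_by_day_type(counts, year=2026, month=2):
    """Split daily counts into (weekday counts, weekend counts)."""
    weekdays, weekends = [], []
    for day, checks in counts.items():
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        is_weekend = date(year, month, day).weekday() >= 5
        (weekends if is_weekend else weekdays).append(checks)
    return weekdays, weekends


weekdays, weekends = split_by_day_type(DAILY_CHECKS)
print(f"total checks: {sum(DAILY_CHECKS.values())}")
print(f"weekday mean: {mean(weekdays):.1f} over {len(weekdays)} days")
print(f"weekend mean: {mean(weekends):.1f} over {len(weekends)} days")
```

On these counts the 16 weekdays average about 22.3 checks and the 7 weekend days about 8.7, which matches the weekday-heavy pattern described.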


Other Signals

JavaScript in PDFs: zero across all 418 files. No embedded JavaScript was detected in February. This is consistent with the document types: invoices, contracts, and certificates do not use interactive scripting.

Embedded files: 4 (less than 1%). PDFs can contain binary attachments. Four documents carried embedded content. Not unusual, but worth flagging in any workflow where file attachments introduce compliance risk.

Suspicious tool patterns: 50 files (12.0%). This flag indicates that the creator–producer metadata combination is internally inconsistent in ways that suggest an unacknowledged intermediate processing step. The file claims a creation toolchain that does not match its structural fingerprint.


Summary

February 2026 by the numbers:

  • 40.4% of submitted PDFs showed modification signals — significantly above the commonly cited 25–30% industry baseline, consistent with the self-selection of fraud detection workflows
  • Modification date discrepancy is the leading forensic indicator, present in 62% of flagged files
  • Microsoft Office ecosystem (Word across multiple versions, Print to PDF) is the primary authoring environment in this sample
  • iLovePDF and online editors leave traceable producer-field evidence in files that subsequently pass through fraud detection
  • Upwork documents are a recurring fraud detection target — freelance contracts and payment records being checked by counterparties
  • Digital signatures do not guarantee post-signature integrity — 27% of signed files in this sample were modified after signing
  • PDF 2.0 adoption remains below 1% despite being available for nearly a decade

The 40.4% modification rate is the most important number from February. It means that when someone uploaded a PDF to check its authenticity, there was roughly a two-in-five chance the document came back flagged. That is not a marginal outcome; it is why fraud detection workflows exist.


Data covers all checks submitted through the HTPBE? web interface in February 2026 (UTC). File contents are not stored or analyzed; only structural metadata signals are retained. All figures are aggregate and anonymized.
