DEV Community

Adda

How to Automate Canadian T4 Slip Parsing with an API (No OCR Setup Required)

Every year, thousands of mortgage brokers, accountants, and HR professionals in Canada manually re-enter data from T4 slips. Box 14, box 22, box 16, box 18 — copied one by one from a PDF into a spreadsheet or loan application.

It's slow, error-prone, and completely unnecessary in 2025.

In this article, I'll show you how to automate T4 extraction in under 10 minutes using a REST API — no OCR libraries, no ML model, no infrastructure to maintain.


What is a T4 Slip?

A T4 (Statement of Remuneration Paid) is a Canadian tax document issued by employers to employees every year. It reports employment income and all payroll deductions to the CRA.

Key boxes that matter for most workflows:

| Box | Description |
| --- | --- |
| Box 14 | Employment income |
| Box 16 | CPP contributions |
| Box 17 | QPP contributions (Quebec) |
| Box 18 | EI premiums |
| Box 22 | Income tax deducted |
| Box 24 | EI insurable earnings |
| Box 26 | CPP/QPP pensionable earnings |

The Problem With Traditional OCR

Most OCR tools (Tesseract, AWS Textract, Google Document AI) are general-purpose. They extract raw text, but they don't understand that "14" followed by a number means "employment income."

You still end up writing a parser, maintaining regex patterns, and handling every variation of T4 layout across different payroll systems (ADP, Ceridian, Payworks, Nethris...).
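
To make the maintenance problem concrete, here is a minimal sketch of the kind of parser you end up writing on top of raw OCR text. The three sample strings are invented for illustration (they are not real OCR output), and `naiveBoxAmount` is a hypothetical helper, not part of any library.

```javascript
// Invented raw text, roughly what a general-purpose OCR engine might return
// for three different slip layouts.
const layoutA = 'Employment income 14 72,000.00 Income tax deducted 22 17,850.00';
const layoutB = 'Box 14: 72000.00\nBox 22: 17850.00';
const layoutC = 'Case 14 : 72 000,00'; // French number formatting

// Naive approach: find the box number, then take the next token that looks
// like an amount with a decimal point and two decimals.
function naiveBoxAmount(rawText, boxNumber) {
  const pattern = new RegExp(`\\b${boxNumber}\\b\\D*?([\\d,]+\\.\\d{2})`);
  const match = rawText.match(pattern);
  return match ? Number(match[1].replace(/,/g, '')) : null;
}

console.log(naiveBoxAmount(layoutA, 14)); // amount found
console.log(naiveBoxAmount(layoutB, 22)); // amount found
console.log(naiveBoxAmount(layoutC, 14)); // null: the pattern misses "72 000,00"
```

The first two layouts parse, but the French-formatted amount silently comes back as `null`. Every new layout tends to mean another pattern like this one.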


A Better Approach: Document Intelligence API

Instead of building and maintaining OCR + parsing logic, you can call an API that handles everything — including scanned T4s.

Here's how it works:

  1. Convert the T4 PDF to base64
  2. Send it to the API
  3. Get back structured JSON with all boxes

Step-by-Step Implementation

Step 1 — Convert the PDF to Base64

Node.js:

const fs = require('fs');

const pdfBuffer = fs.readFileSync('./t4-2024.pdf');
const base64 = pdfBuffer.toString('base64');

Python:

import base64

with open("t4-2024.pdf", "rb") as f:
    base64_str = base64.b64encode(f.read()).decode("utf-8")

PowerShell (Windows):

$base64 = [Convert]::ToBase64String([IO.File]::ReadAllBytes("C:\path\to\t4-2024.pdf"))

Step 2 — Call the API

const response = await fetch('https://docusense.stackapi.dev/api/v1/documents/t4', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    fileBase64: base64,
    mimeType: 'application/pdf',
    taxYear: 2024,
    language: 'fr'  // or 'en'
  })
});

const result = await response.json();
console.log(result.data);
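
The call above assumes everything went well. In practice you probably want to check the HTTP status and the `success` flag before touching `data`. Here is a minimal sketch. `unwrapT4Result` is a hypothetical helper, and since this article only shows a successful response, the sketch makes no assumption about what an error body looks like.

```javascript
// Returns body.data when the call succeeded, throws otherwise.
// Only `success` and `data` are read, because those are the only
// fields shown in the sample response.
function unwrapT4Result(httpStatus, body) {
  if (httpStatus < 200 || httpStatus >= 300) {
    throw new Error(`T4 request failed with HTTP ${httpStatus}`);
  }
  if (!body || body.success !== true || !body.data) {
    throw new Error('T4 extraction did not succeed');
  }
  return body.data;
}
```

Usage would look like `const data = unwrapT4Result(response.status, await response.json());`.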

Step 3 — Handle the Response

{
  "success": true,
  "data": {
    "taxYear": 2024,
    "employerName": "Entreprise ABC Inc.",
    "payrollAccountNumber": "123456789RP0001",
    "employeeName": "Jean Tremblay",
    "socialInsuranceNumber": "***-***-456",
    "province": "QC",
    "boxes": {
      "box14": 72000.00,
      "box16": null,
      "box17": 3799.80,
      "box18": 1049.12,
      "box22": 17850.00,
      "box24": 61500.00,
      "box26": 68500.00,
      "box44": null,
      "box46": 500.00
    }
  }
}

Notice that the SIN is automatically masked (***-***-456) — privacy is handled for you.
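
Some boxes come back as `null` (here `box16`, since this Quebec slip reports QPP in `box17` instead). Here is a small sketch of how you might flatten the `boxes` object into the fields a workflow needs. `summarizeT4` is a hypothetical helper, and treating `null` as "not reported on the slip" is my assumption about the API's behavior.

```javascript
// The `boxes` object from the sample response above.
const boxes = {
  box14: 72000.00, box16: null, box17: 3799.80, box18: 1049.12,
  box22: 17850.00, box24: 61500.00, box26: 68500.00, box44: null, box46: 500.00,
};

function summarizeT4(boxes) {
  return {
    employmentIncome: boxes.box14 ?? null,
    // Report whichever pension plan the slip carries (CPP in box 16, QPP in box 17).
    pensionPlan: boxes.box16 != null ? 'CPP' : boxes.box17 != null ? 'QPP' : null,
    // `??` only falls through on null/undefined, so a legitimate 0 is kept.
    pensionContributions: boxes.box16 ?? boxes.box17 ?? null,
    eiPremiums: boxes.box18 ?? null,
    incomeTaxDeducted: boxes.box22 ?? null,
  };
}

console.log(summarizeT4(boxes));
```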


Real-World Example: Mortgage Pre-Qualification

Here's a more complete example that extracts T4 data and runs a simplified pre-qualification check based on the Gross Debt Service (GDS) ratio:

const fs = require('fs');

async function checkMortgageEligibility(t4PdfPath, propertyPrice, downPayment) {
  // Step 1: Extract T4 data
  const base64 = fs.readFileSync(t4PdfPath).toString('base64');

  const t4Response = await fetch('https://docusense.stackapi.dev/api/v1/documents/t4', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${process.env.API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileBase64: base64, mimeType: 'application/pdf', taxYear: 2024 })
  });

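  if (!t4Response.ok) throw new Error(`T4 extraction failed: HTTP ${t4Response.status}`);
  const { data: t4 } = await t4Response.json();
  const annualIncome = t4.boxes.box14;
  if (annualIncome == null) throw new Error('Box 14 (employment income) was not found');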

  // Step 2: Calculate mortgage using FinGuard API
  const loanAmount = propertyPrice - downPayment;

  const mortgageResponse = await fetch('https://api.stackapi.dev/api/v1/calculators/mortgage', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${process.env.API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      loanAmount,
      interestRate: 5.5,
      loanTermYears: 25
    })
  });

  const { data: mortgage } = await mortgageResponse.json();

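  // Step 3: Check a simplified GDS ratio (guideline: under 32%).
  // Only the mortgage payment is counted here; a full GDS also includes
  // property taxes and heating costs.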
  const monthlyIncome = annualIncome / 12;
  const gdsRatio = (mortgage.monthlyPayment / monthlyIncome) * 100;

  return {
    employeeName: t4.employeeName,
    annualIncome,
    monthlyPayment: mortgage.monthlyPayment,
    gdsRatio: gdsRatio.toFixed(2),
    eligible: gdsRatio < 32
  };
}

// Usage (top-level await is not available in a CommonJS script, so use .then)
checkMortgageEligibility('./jean-t4.pdf', 450000, 90000)
  .then(console.log)
  .catch(console.error);
// Example output:
// {
//   employeeName: "Jean Tremblay",
//   annualIncome: 72000,
//   monthlyPayment: 2134.50,
//   gdsRatio: "35.57",
//   eligible: false
// }
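
The ratio in Step 3 only counts the mortgage payment. A fuller GDS calculation also adds property taxes, heating, and half of any condo fees to the housing costs. Here is a sketch of that arithmetic as a standalone function. `grossDebtServiceRatio` is a hypothetical helper, and the tax and heating amounts in the usage lines are made-up example values.

```javascript
// GDS = (mortgage payment + property taxes + heating + 50% of condo fees)
//       / gross monthly income, expressed as a percentage.
function grossDebtServiceRatio({
  annualIncome,
  monthlyPayment,
  monthlyPropertyTax = 0,
  monthlyHeating = 0,
  monthlyCondoFees = 0,
}) {
  if (!(annualIncome > 0)) {
    throw new RangeError('annualIncome must be a positive number');
  }
  const monthlyHousingCosts =
    monthlyPayment + monthlyPropertyTax + monthlyHeating + 0.5 * monthlyCondoFees;
  return (monthlyHousingCosts / (annualIncome / 12)) * 100;
}

// Mortgage payment only, using the figures from the example output above.
const simple = grossDebtServiceRatio({ annualIncome: 72000, monthlyPayment: 2134.5 });

// Same applicant with example property tax and heating costs added.
const fuller = grossDebtServiceRatio({
  annualIncome: 72000,
  monthlyPayment: 2134.5,
  monthlyPropertyTax: 300,
  monthlyHeating: 100,
});

console.log(simple.toFixed(1), fuller.toFixed(1));
```

Adding the other housing costs only pushes the ratio higher, so an applicant who fails the simplified check will also fail the fuller one.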

What About Scanned T4s?

Scanned T4s (image-based PDFs) are handled automatically — no extra configuration needed. The same API call works for both digital and scanned documents.

You can also send image files directly:

// Works with JPG, PNG, WEBP too
body: JSON.stringify({
  fileBase64: base64,
  mimeType: 'image/jpeg',  // or image/png, image/webp
  taxYear: 2024
})
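
If your uploads arrive as a mix of PDFs and images, you could derive `mimeType` from the file extension instead of hardcoding it. Here is a minimal sketch covering only the four formats mentioned above. `mimeTypeFor` is a hypothetical helper, and trusting the extension is a simplification.

```javascript
const path = require('path');

// Only the formats this article mentions: PDF, JPG, PNG, WEBP.
const MIME_BY_EXTENSION = {
  '.pdf': 'application/pdf',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.webp': 'image/webp',
};

function mimeTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mimeType = MIME_BY_EXTENSION[ext];
  if (!mimeType) {
    throw new Error(`Unsupported file type: ${ext || '(no extension)'}`);
  }
  return mimeType;
}
```

You would then pass `mimeType: mimeTypeFor(filePath)` in the request body.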

Quebec Specifics: RL-1 Support

If you work with Quebec employees, you'll also need to process RL-1 slips (Relevé 1 from Revenu Québec). The same approach works:

const rl1Response = await fetch('https://docusense.stackapi.dev/api/v1/documents/rl1', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${process.env.API_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    fileBase64: base64,
    mimeType: 'application/pdf',
    taxYear: 2024,
    language: 'fr'
  })
});

This returns the RL-1 boxes: A (employment income), B (QPP contributions), C (EI premiums), E (Quebec income tax withheld), G (pensionable salary under the QPP), H/I (QPIP premiums and eligible salary)...


Getting Started

  1. Sign up for a free API key at RapidAPI — DocuSense API
  2. The free tier includes 100 documents/month — enough to test thoroughly
  3. T4 extraction requires the PRO plan ($19/month)

Summary

| Approach | Setup time | Maintenance | Scanned PDFs |
| --- | --- | --- | --- |
| Tesseract OCR + custom parser | 2-3 days | High | Poor |
| AWS Textract | 1 day | Medium | Good |
| Google Document AI | 1 day | Medium | Good |
| DocuSense API | 10 minutes | Zero | Yes |

If you're building any Canadian fintech workflow that touches T4s — mortgage applications, payroll software, accounting tools, income verification — this will save you significant development and maintenance time.


Have questions about the API or want to share what you're building? Drop a comment below.
