
Crime Brasil

Posted on • Originally published at crimebrasil.com.br

How We Built an Open Crime Data Platform for Brazil (3 Million Records)


Brazil has one of the world's highest homicide rates — over 47,000 murders per year. Yet accessing crime statistics is surprisingly hard. Data is fragmented across 27 state secretariats, each with different formats, different crime taxonomies, and wildly inconsistent update schedules.

We built Crime Brasil to fix that.

What it does

Crime Brasil is an interactive crime map covering Rio Grande do Sul, Rio de Janeiro, and Minas Gerais — with bairro (neighborhood) level granularity for RS, and municipality-level for the others.

Key features:

  • 3+ million individual crime records for RS (2022–2026)
  • Choropleth maps at state → city → neighborhood drill-down
  • Per-100K-inhabitants rate normalization (so you can actually compare)
  • Crime type filters (homicide, robbery, drug offenses, etc.)
  • Time range selector (last 6/12/24/36 months or full history)
  • API access — all data available via REST endpoints
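The per-100K normalization in the list above is simple but easy to get subtly wrong. As a minimal sketch (the function name and signature are illustrative, not the platform's actual code):

```python
def rate_per_100k(crime_count: int, population: int) -> float:
    """Crimes per 100,000 inhabitants — the normalization that makes a
    40,000-person bairro comparable to a 400,000-person city.

    Hypothetical helper for illustration; not Crime Brasil's real code.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count / population * 100_000
```

The guard matters in practice: census polygons occasionally report zero population (industrial zones, parks), and dividing through silently produces absurd rates.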

The tech stack

  • Backend: FastAPI + SQLAlchemy + SQLite (yeah, SQLite — handles 3M rows fine with proper indexes)
  • Frontend: Next.js 14 + Leaflet.js + Tailwind CSS
  • Deployment: Docker + zero-downtime atomic alias swap (no Kubernetes needed at this scale)
  • Data pipeline: Python ETL from SSP/RS ZIP files, ISP/RJ CSV, SEJUSP/MG CSV
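The "atomic alias swap" deploy boils down to POSIX rename semantics: build the new release in its own directory, then atomically repoint a symlink at it, so readers never observe a half-deployed state. A sketch of the idea (hypothetical helper; the real deploy script may differ):

```python
import os

def atomic_symlink_swap(new_release_dir: str, live_link: str) -> None:
    """Repoint `live_link` at `new_release_dir` in one atomic step.

    Illustrative sketch of the zero-downtime swap, assuming a POSIX
    filesystem where rename(2) replaces the destination atomically.
    """
    tmp = live_link + ".swap"
    if os.path.lexists(tmp):        # clean up a leftover from a crashed deploy
        os.remove(tmp)
    os.symlink(new_release_dir, tmp)  # create the new alias under a temp name
    os.replace(tmp, live_link)        # atomic: no window where the link is missing
```

Because the swap is a single rename, a rollback is just another swap back to the previous release directory.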

The hardest part: bairro name normalization

The same neighborhood appears as:

  • CENTRO HISTÓRICO
  • CENTRO HISTORICO
  • centro historico
  • CENTRO-HISTÓRICO
  • CTR HIST

Across police records, census data, and map boundaries.

We built a phonetic normalization pipeline using custom Brazilian Portuguese rules — strips accents, normalizes common abbreviations (RUA → R, AVENIDA → AV), handles compound names. This alone took about 3 weeks of iteration against real data.
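The core of that pipeline can be sketched in a few lines. This is a simplified stand-in — the abbreviation table below is a tiny illustrative subset, and the real rules are far more extensive:

```python
import re
import unicodedata

# Illustrative subset only; the real pipeline carries a much larger
# Brazilian Portuguese abbreviation table, tuned against real data.
ABBREVIATIONS = {"RUA": "R", "AVENIDA": "AV", "CENTRO": "CTR", "HISTORICO": "HIST"}

def normalize_bairro(name: str) -> str:
    """Canonicalize a neighborhood name: strip accents, unify case,
    collapse hyphens/whitespace, apply abbreviation rules. A sketch."""
    # Strip accents: decompose (NFD), then drop combining marks
    name = unicodedata.normalize("NFD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    # Uppercase; treat hyphens and runs of whitespace as one separator
    name = re.sub(r"[-\s]+", " ", name.upper()).strip()
    # Contract token-by-token toward the canonical short form
    return " ".join(ABBREVIATIONS.get(t, t) for t in name.split(" "))
```

With this sketch, all five spellings from the list above collapse to the same key, which is exactly the property needed before joining police records against census and boundary data.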

Why SQLite, not PostgreSQL

Conventional wisdom says "use Postgres for anything serious." But SQLite with covering indexes on (state, year_month, tipo_enquadramento) serves our heatmap queries in <100ms at 3M rows. The database is ~800MB. It fits on any $5 VPS. Zero connection pooling headaches. Zero migrations for most schema changes.

We do have one performance rule: always use BETWEEN for year filters, never LIKE. SQLite uses the covering index for BETWEEN '{ano}-01' AND '{ano}-12' but not for LIKE '{ano}-%'.
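You can verify this yourself with `EXPLAIN QUERY PLAN` against a toy table shaped like ours (schema mirrors the post; the data and index name here are synthetic):

```python
import sqlite3

# In-memory demo of the BETWEEN-vs-LIKE rule.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE crimes (
    state TEXT, year_month TEXT, tipo_enquadramento TEXT, n INTEGER)""")
con.execute("""CREATE INDEX idx_cov
    ON crimes (state, year_month, tipo_enquadramento)""")

def plan(sql: str) -> str:
    """Return the query-plan detail strings for a statement."""
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

cols = "SELECT state, year_month, tipo_enquadramento FROM crimes "
# Range predicate: the optimizer narrows on year_month via the index...
p_between = plan(cols + "WHERE state='RS' "
                        "AND year_month BETWEEN '2024-01' AND '2024-12'")
# ...while LIKE (case-insensitive by default, BINARY collation) cannot,
# so only the state=? equality uses the index.
p_like = plan(cols + "WHERE state='RS' AND year_month LIKE '2024-%'")
```

With SQLite's defaults, the first plan shows a `year_month > ? AND year_month < ?` range on the index while the second constrains only `state = ?`. (Turning on `PRAGMA case_sensitive_like` or giving the column `NOCASE` collation can let LIKE prefixes use the index, but we'd rather not depend on pragma state.)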

Open data

All datasets are published openly — everything shown on the map is also available through the API.

What's next

  • São Paulo coverage (aggregated municipal-level data already being processed)
  • API authentication for high-volume consumers
  • Historical comparison charts per location

The entire REST API is public: https://crimebrasil.com.br/api/heatmap/municipios returns municipality-level heatmap data, and https://crimebrasil.com.br/api/location-stats returns detailed stats per location.
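Building a request against those endpoints is straightforward. A sketch — note that the query parameter names (`state`, `months`) are my illustrative guesses, not documented parameters, so check the API before relying on them:

```python
from urllib.parse import urlencode

BASE = "https://crimebrasil.com.br/api"

def heatmap_url(state: str, months: int) -> str:
    """Build a request URL for the public heatmap endpoint.

    Parameter names here are assumptions for illustration only.
    """
    return f"{BASE}/heatmap/municipios?" + urlencode({"state": state, "months": months})
```

From there, any HTTP client (`urllib.request`, `requests`, plain `curl`) can fetch the JSON.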

Happy to answer questions about the data pipeline, the normalization approach, or the zero-downtime deploy architecture.


Crime Brasil: https://crimebrasil.com.br — open crime data for Brazil
