Leandro Lima

Demystifying CPF and CNPJ Check Digit Algorithms: A Clear and Concise Approach

I vividly remember my first encounter with the CPF (Brazilian ID) validation algorithm during my undergraduate studies. When I applied for an internship at the Institute of Exact Sciences at UFMG, the Federal University of Minas Gerais, candidates were asked to write Java code, by hand, to validate CPF check digits after a brief explanation of the algorithm.

Since then, I've come across this problem several times in different professional contexts, often resorting to copying solutions from the internet and adding some unit tests. However, each time, I'm struck by the recurring issues in these solutions. They tend to be rooted more in an imperative paradigm than in the object-oriented approach expected of Java code. What bothers me even more is the high cognitive load these implementations impose, which makes it impractical to read the code and understand its intent.

An interested developer who hasn't yet needed to implement this code can easily find solutions in any programming language. However, they all tend to be presented the same way: a naive, step-by-step transcription of the usual explanation of how CPF check digits are calculated. It seems that few people take the time to understand the reasoning behind this approach.

The collision problem

In software development, collision avoidance usually comes up in hash code algorithms, particularly in the use of a prime number as the modulus. The check digits in CPF (Brazilian ID) and CNPJ (Brazilian company ID) work similarly, focusing on avoiding collisions. A simple summation of the digits would not be enough, because many different combinations of digits produce the same sum, so an incorrect entry could be mistakenly validated.

To mitigate this, a common practice is to use a weighted sum, multiplying each digit by a specific factor. You can think of this as spreading the digits out along a line; the multiplication makes it less likely for multiple digits to land in the same position. It makes sense, then, that the digit's position in the number determines its weight.
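To make the difference concrete, here is a minimal sketch comparing a plain digit sum with a position-weighted sum when two neighbouring digits are swapped. The class and method names are hypothetical and exist only for this illustration; the weighting (2 for the rightmost digit, increasing to the left) follows the usual CPF convention.

```java
// Illustrative sketch only: names are hypothetical, not taken from the article's repository.
class CollisionDemo {

  // Plain sum: the position of each digit is ignored.
  static int plainSum(String digits) {
    int sum = 0;
    for (int i = 0; i < digits.length(); i++) {
      sum += Character.getNumericValue(digits.charAt(i));
    }
    return sum;
  }

  // Weighted sum: the rightmost digit gets weight 2, the next one 3, and so on.
  static int weightedSum(String digits) {
    int sum = 0;
    for (int i = 0; i < digits.length(); i++) {
      int weight = digits.length() - i + 1;
      sum += Character.getNumericValue(digits.charAt(i)) * weight;
    }
    return sum;
  }

  public static void main(String[] args) {
    String original = "123456789";
    String transposed = "213456789"; // first two digits swapped

    // prints "45 vs 45": the plain sum cannot tell the two apart
    System.out.println(plainSum(original) + " vs " + plainSum(transposed));
    // prints "210 vs 211": the weighted sum changes with the swap
    System.out.println(weightedSum(original) + " vs " + weightedSum(transposed));
  }
}
```

The swap is invisible to the plain sum but shifts the weighted sum, which is the property the check digits rely on.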

To further enhance reliability and minimize the risk of collisions, the sum is taken modulo 11, a prime number, and the remainder is subtracted from 11. This yields a value between 1 and 11, so to ensure that the check digit remains a single digit, results of 10 and 11 are converted to 0.
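The final step can be sketched in isolation as follows. The class and method names are hypothetical; the sample value 162 is the weighted sum of the CPF base digits 111444777 with weights 10 down to 2, and the other values were chosen to exercise the 10 and 11 cases.

```java
// Illustrative sketch only: names are hypothetical.
class CheckDigitStep {

  static int toCheckDigit(int weightedSum) {
    int candidate = 11 - weightedSum % 11; // always between 1 and 11
    return candidate > 9 ? 0 : candidate;  // 10 and 11 collapse to 0
  }

  public static void main(String[] args) {
    System.out.println(toCheckDigit(162)); // 162 % 11 = 8, 11 - 8 = 3, prints 3
    System.out.println(toCheckDigit(120)); // 120 % 11 = 10, 11 - 10 = 1, prints 1
    System.out.println(toCheckDigit(12));  // 12 % 11 = 1, 11 - 1 = 10, prints 0
    System.out.println(toCheckDigit(22));  // 22 % 11 = 0, 11 - 0 = 11, prints 0
  }
}
```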

The cognitive load

The algorithm used to calculate the check digits for CPF and CNPJ can be difficult to understand. While the overall motivation behind the algorithm might be clear, it's often challenging to grasp the specific role of each part. This complexity arises partly because the calculation involves a series of mathematical computations that are often lumped together in a single, large method. Additionally, the weights, typically presented as an inexplicable array, can appear illogical.
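The weight arrays look less arbitrary once they are read from right to left: they are a cycle that starts at 2 and restarts after a maximum weight. The sketch below regenerates the commonly published arrays from that rule. The class and method names are hypothetical, and the maximum weights (9 for CNPJ, 11 for CPF) are assumptions based on the usual description of the algorithm.

```java
import java.util.Arrays;

// Illustrative sketch only: names and constants are assumptions for this example.
class WeightPattern {

  // Builds the weights in left-to-right order for a base with the given number of digits.
  static int[] weights(int length, int maxWeight) {
    int[] result = new int[length];
    for (int i = 0; i < length; i++) {
      int positionFromRight = length - 1 - i;
      // Cycle through 2..maxWeight, counting from the rightmost digit.
      result[i] = 2 + positionFromRight % (maxWeight - 1);
    }
    return result;
  }

  public static void main(String[] args) {
    // CNPJ, first check digit: prints [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
    System.out.println(Arrays.toString(weights(12, 9)));
    // CPF, first check digit: prints [10, 9, 8, 7, 6, 5, 4, 3, 2]
    System.out.println(Arrays.toString(weights(9, 11)));
  }
}
```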

To combat this, I focus on reducing the amount of code that lacks self-explanation. By adhering to the Single Responsibility Principle (the "S" in SOLID), I strive to create simpler, more understandable methods. I also make an effort to define key concepts through meaningful variable names, aiming to establish a ubiquitous language within the codebase. With this approach, I sought to identify what differentiates the method used for CPF check digits from that used for CNPJ, as software that requires one often needs the other. The core functionality of the code is shown below. For a fuller view, including the complete code and associated unit tests, please visit my GitHub repository.

  private String getCheckDigits(String document, int maxWeight) {
    final int lengthWithoutCheckDigits = getBaseDigitsLength(document);

    int firstWeightedSum = 0;
    int secondWeightedSum = 0;
    for (int i = 0; i < lengthWithoutCheckDigits; i++) {
      final int digit = Character.getNumericValue(document.charAt(i));
      final int maxIndex = lengthWithoutCheckDigits - 1;
      final int reverseIndex = maxIndex - i;
      firstWeightedSum += digit * calculateWeight(reverseIndex, maxWeight);
      // For the second check digit the index is shifted by one, so the weights start from 3.
      // The position skipped here belongs to the first check digit, which is added after
      // the loop, once it is known, multiplied by its corresponding weight.
      secondWeightedSum += digit * calculateWeight(reverseIndex + 1, maxWeight);
    }

    final int firstDigit = getCheckDigit(firstWeightedSum);
    // Add the term skipped in the loop: the first check digit times the minimum weight.
    secondWeightedSum += MIN_WEIGHT * firstDigit;
    final int secondDigit = getCheckDigit(secondWeightedSum);

    return String.valueOf(firstDigit) + secondDigit;
  }

  private int calculateWeight(int complementaryIndex, int maxWeight) {
    return complementaryIndex % (maxWeight - 1) + MIN_WEIGHT;
  }

  private int getCheckDigit(int weightedSum) {
    final var checkDigit = enhanceCollisionAvoidance(weightedSum);
    return checkDigit > 9 ? 0 : checkDigit;
  }

  private int enhanceCollisionAvoidance(int weightedSum) {
    final var weightSumLimit = 11;
    return weightSumLimit - weightedSum % weightSumLimit;
  }
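The excerpt above relies on a `MIN_WEIGHT` constant and a `getBaseDigitsLength` helper that are not shown, and it does not say which `maxWeight` each document type uses. The following self-contained sketch fills those gaps with assumptions (minimum weight 2, two trailing check digits, maximum weight 11 for CPF and 9 for CNPJ) so the logic can be run. The class name, the constants and `hasValidCheckDigits` are hypothetical and may differ from the code in the repository.

```java
// Runnable sketch under stated assumptions; not the repository's actual class.
class CheckDigitsSketch {

  private static final int MIN_WEIGHT = 2;          // assumed value
  private static final int CHECK_DIGITS_LENGTH = 2; // assumed: two trailing check digits
  static final int CPF_MAX_WEIGHT = 11;             // assumed: weights 2..11 never wrap for a CPF
  static final int CNPJ_MAX_WEIGHT = 9;             // assumed: weights cycle through 2..9

  // Expects an unformatted, digits-only document that already includes its check digits.
  static boolean hasValidCheckDigits(String document, int maxWeight) {
    return document.endsWith(getCheckDigits(document, maxWeight));
  }

  static String getCheckDigits(String document, int maxWeight) {
    final int baseLength = document.length() - CHECK_DIGITS_LENGTH;

    int firstWeightedSum = 0;
    int secondWeightedSum = 0;
    for (int i = 0; i < baseLength; i++) {
      final int digit = Character.getNumericValue(document.charAt(i));
      final int reverseIndex = baseLength - 1 - i;
      firstWeightedSum += digit * calculateWeight(reverseIndex, maxWeight);
      // Shifted by one to leave room for the first check digit.
      secondWeightedSum += digit * calculateWeight(reverseIndex + 1, maxWeight);
    }

    final int firstDigit = getCheckDigit(firstWeightedSum);
    secondWeightedSum += MIN_WEIGHT * firstDigit;
    final int secondDigit = getCheckDigit(secondWeightedSum);

    return String.valueOf(firstDigit) + secondDigit;
  }

  private static int calculateWeight(int reverseIndex, int maxWeight) {
    return reverseIndex % (maxWeight - 1) + MIN_WEIGHT;
  }

  private static int getCheckDigit(int weightedSum) {
    final int checkDigit = 11 - weightedSum % 11;
    return checkDigit > 9 ? 0 : checkDigit;
  }

  public static void main(String[] args) {
    System.out.println(getCheckDigits("11144477735", CPF_MAX_WEIGHT));       // prints 35
    System.out.println(getCheckDigits("11222333000181", CNPJ_MAX_WEIGHT));   // prints 81
    System.out.println(hasValidCheckDigits("11144477736", CPF_MAX_WEIGHT));  // prints false
  }
}
```

Under these assumptions, the only thing that changes between CPF and CNPJ is the `maxWeight` argument; the base length follows from the length of the input.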

Compare this code, which calculates check digits for both CNPJ and CPF, with the typical solution found on the Internet:

import java.util.InputMismatchException;

public class ValidaCNPJ {

  public static boolean isCNPJ(String CNPJ) {
// CNPJs made up of a single repeated digit are considered invalid
    if (CNPJ.equals("00000000000000") || CNPJ.equals("11111111111111") ||
        CNPJ.equals("22222222222222") || CNPJ.equals("33333333333333") ||
        CNPJ.equals("44444444444444") || CNPJ.equals("55555555555555") ||
        CNPJ.equals("66666666666666") || CNPJ.equals("77777777777777") ||
        CNPJ.equals("88888888888888") || CNPJ.equals("99999999999999") ||
       (CNPJ.length() != 14))
       return(false);

    char dig13, dig14;
    int sm, i, r, num, peso;

// "try" - guards the code against possible type conversion errors (int)
    try {
// Calculation of the 1st check digit
      sm = 0;
      peso = 2;
      for (i=11; i>=0; i--) {
// converts the i-th character of the CNPJ into a number:
// for example, turns the character '0' into the integer 0
// (48 is the position of '0' in the ASCII table)
        num = (int)(CNPJ.charAt(i) - 48);
        sm = sm + (num * peso);
        peso = peso + 1;
        if (peso == 10)
           peso = 2;
      }

      r = sm % 11;
      if ((r == 0) || (r == 1))
         dig13 = '0';
      else dig13 = (char)((11-r) + 48);

// Calculation of the 2nd check digit
      sm = 0;
      peso = 2;
      for (i=12; i>=0; i--) {
        num = (int)(CNPJ.charAt(i)- 48);
        sm = sm + (num * peso);
        peso = peso + 1;
        if (peso == 10)
           peso = 2;
      }

      r = sm % 11;
      if ((r == 0) || (r == 1))
         dig14 = '0';
      else dig14 = (char)((11-r) + 48);

// Checks whether the calculated digits match the digits provided.
      if ((dig13 == CNPJ.charAt(12)) && (dig14 == CNPJ.charAt(13)))
         return(true);
      else return(false);
    } catch (InputMismatchException erro) {
        return(false);
    }
  }
}

This piece of code is just for CNPJ!

Conclusion

While the resulting code may appear somewhat verbose, my emphasis on clarity and self-explanation led to an outcome that I am satisfied with. The code is designed to be more intuitive, offering greater confidence in its correctness. In addition, most of the core functionality is visible without scrolling down the page.

I welcome any suggestions for further improvement, so please feel free to share your feedback.
