<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Camila Cardoso</title>
    <description>The latest articles on DEV Community by Camila Cardoso (@camilacrdoso).</description>
    <link>https://dev.to/camilacrdoso</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F574348%2F2ade0340-befe-4d26-801a-b30307c1d9a8.jpg</url>
      <title>DEV Community: Camila Cardoso</title>
      <link>https://dev.to/camilacrdoso</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/camilacrdoso"/>
    <language>en</language>
    <item>
      <title>Obtendo dados de CNPJ's com a API Minha Receita</title>
      <dc:creator>Camila Cardoso</dc:creator>
      <pubDate>Mon, 08 Feb 2021 19:39:55 +0000</pubDate>
      <link>https://dev.to/camilacrdoso/obtendo-dados-de-cnpj-s-com-a-api-minha-receita-2hcd</link>
      <guid>https://dev.to/camilacrdoso/obtendo-dados-de-cnpj-s-com-a-api-minha-receita-2hcd</guid>
      <description>&lt;p&gt;Dados de CNPJ's podem ser uma rica fonte de informações para diversos tipos de análises. Entretanto, frequentemente só temos acesso ao número de um CNPJ, não conhecendo informações importantes como sua atividade principal, situação cadastral, sócios da empresa ou capital social, por exemplo. Felizmente, podemos obter essas e outras informações através da &lt;a href="https://docs.minhareceita.org/" rel="noopener noreferrer"&gt;API Minha Receita&lt;/a&gt;, uma iniciativa da sociedade civil que disponibiliza, de forma gratuita e acessível, dados da Receita Federal.&lt;/p&gt;

&lt;p&gt;Neste projeto, utilizaremos Python para fazer requisições à API Minha Receita e armazenar informações sobre &lt;strong&gt;+46mil CNPJ's&lt;/strong&gt;. Veremos:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Como realizar conexão com PostgreSQL para que o Python possa acessar o conteúdo de um banco de dados;&lt;/li&gt;
&lt;li&gt;Como fazer requisições para a API Minha Receita e obter informações associadas a um CNPJ segundo a Receita Federal;&lt;/li&gt;
&lt;li&gt;Como armazenar a resposta da API em um banco de dados PostgreSQL.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conexão com PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Nosso objetivo é fazer requisições à API Minha Receita para obter dados de +46mil CNPJ's. Esses CNPJ's estão armazenados em uma tabela PostgreSQL, por isso, primeiro precisamos realizar a conexão com o banco de dados e fazer com que o Python tenha acesso à lista de CNPJ's.  &lt;/p&gt;

&lt;p&gt;Para fazer essa conexão, estamos usando a biblioteca &lt;a href="https://www.psycopg.org/docs/" rel="noopener noreferrer"&gt;Psycopg&lt;/a&gt;, que permite que o Python conecte-se a um banco de dados PostgreSQL. A função &lt;code&gt;sql_conection()&lt;/code&gt; realiza a conexão e será utilizada em todos os momentos em que for necessário interagir com o banco de dados.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
import psycopg2
def sql_connection():
    # Realiza a conexão com o banco de dados PostgreSQL e retorna a variável que será utilizada para interagir com o banco
    database = ''
    user = ''
    password = ''
    host = ''
    port = ''
    con = psycopg2.connect(database=database,
                           user=user,
                           password=password,
                           host=host,
                           port=port)
    return con


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nosso banco de dados contém uma tabela com vários dados, entre os quais está o CNPJ. Vamos fazer uma query para essa tabela e armazenar os CNPJ's únicos em um dataframe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
import pandas as pd
import pandas.io.sql as sqlio
def query_to_df(query):
    # Dada uma query PostgreSQL, realiza a query no banco de dados e armazena o resultado em um dataframe
    con = sql_connection()
    data_set = sqlio.read_sql_query(query, con)
    return data_set


query = 'select distinct(cnpj) from table_name'
df_cnpj = query_to_df(query)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Fazendo requisições para a API Minha Receita
&lt;/h2&gt;

&lt;p&gt;No meu caso, o código acima retorna um dataframe com 46.911 CNPJ's, mas, por enquanto, não temos nenhuma informação adicional sobre eles. Felizmente, podemos fazer requisições à API Minha Receita e obter informações da Receita Federal sobre cada um desses CNPJ's. &lt;/p&gt;

&lt;p&gt;A função &lt;code&gt;get_cnpj_data(cnpj)&lt;/code&gt; utiliza a biblioteca &lt;a href="https://requests.readthedocs.io/en/master/" rel="noopener noreferrer"&gt;Requests&lt;/a&gt; para realizar uma requisição à API, procurando por um CNPJ, e retorna os dados encontrados.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnpj&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Dado um CNPJ, faz uma requisição para a API Minha Receita. Caso a requisição seja bem sucedida, retorna o conteúdo da requisição em formato json
&lt;/span&gt;    &lt;span class="n"&gt;minha_receita_api_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://minhareceita.org/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minha_receita_api_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cnpj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Para nos familiarizarmos com a resposta da API, vamos fazer uma requisição de exemplo e utilizar a função jprint() para gerar uma visualização amigável do objeto json retornado pela API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;jprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Cria visualização amigável de um objeto json
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;cnpj_example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;19131243000197&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response_example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnpj_example&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;jprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_example&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;O código acima deve retornar algo semelhante a:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
    "bairro": "BELA VISTA",
    "capital_social": 0,
    "cep": "01311902",
    "cnae_fiscal": 9430800,
    "cnae_fiscal_descricao": "Atividades de associações de defesa de direitos sociais",
    "cnaes_secundarias": [
        {
            "codigo": 9493600,
            "descricao": "Atividades de organizações associativas ligadas à cultura e à arte"
        },
        {
            "codigo": 9499500,
            "descricao": "Atividades associativas não especificadas anteriormente"
        },
        {
            "codigo": 8599699,
            "descricao": "Outras atividades de ensino não especificadas anteriormente"
        },
        {
            "codigo": 8230001,
            "descricao": "Serviços de organização de feiras, congressos, exposições e festas"
        },
        {
            "codigo": 6204000,
            "descricao": "Consultoria em tecnologia da informação"
        }
    ],
    "cnpj": "19131243000197",
    "codigo_municipio": 7107,
    "codigo_natureza_juridica": 3999,
    "complemento": "ANDAR 4",
    "data_exclusao_do_simples": null,
    "data_inicio_atividade": "2013-10-03",
    "data_opcao_pelo_simples": null,
    "data_situacao_cadastral": "2013-10-03",
    "data_situacao_especial": null,
    "ddd_fax": "",
    "ddd_telefone_1": "11  23851939",
    "ddd_telefone_2": "",
    "descricao_matriz_filial": "Matriz",
    "descricao_porte": "Demais",
    "descricao_situacao_cadastral": "Ativa",
    "descricao_tipo_logradouro": "AVENIDA",
    "identificador_matriz_filial": 1,
    "logradouro": "PAULISTA 37",
    "motivo_situacao_cadastral": 0,
    "municipio": "SAO PAULO",
    "nome_cidade_exterior": "",
    "nome_fantasia": "REDE PELO CONHECIMENTO LIVRE",
    "numero": "37",
    "opcao_pelo_mei": false,
    "opcao_pelo_simples": false,
    "porte": 5,
    "qsa": [
        {
            "cnpj": "19131243000197",
            "cnpj_cpf_do_socio": "",
            "codigo_qualificacao_representante_legal": 0,
            "codigo_qualificacao_socio": 16,
            "cpf_representante_legal": "",
            "data_entrada_sociedade": "2019-10-25",
            "identificador_de_socio": 2,
            "nome_representante_legal": "",
            "nome_socio": "FERNANDA CAMPAGNUCCI PEREIRA",
            "percentual_capital_social": 0
        }
    ],
    "qualificacao_do_responsavel": 16,
    "razao_social": "OPEN KNOWLEDGE BRASIL",
    "situacao_cadastral": 2,
    "situacao_especial": "",
    "uf": "SP"
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Obtendo dados de +46mil CNPJ's
&lt;/h2&gt;

&lt;p&gt;Agora que fizemos uma requisição de exemplo e estamos familiarizados com a resposta da API, podemos seguir para a parte divertida e finalmente obter os dados da Receita Federal sobre os 46.911 CNPJ's nos quais estamos interessados.  &lt;/p&gt;

&lt;p&gt;Primeiro, vamos criar uma tabela PostgreSQL para armazenar nossos dados. Entre todas as informações que retornam da API, escolhemos apenas algumas variáveis principais, que são mais interessantes para as análises futuras.  &lt;/p&gt;

&lt;p&gt;Utilizamos o método &lt;a href="https://www.psycopg.org/docs/cursor.html" rel="noopener noreferrer"&gt;connection.cursor()&lt;/a&gt;, que permite que o Python execute comandos em um banco de dados PostgreSQL. Repare que, após executar o comando, é necessário utilizar também o método &lt;a href="https://www.psycopg.org/docs/connection.html#connection.commit" rel="noopener noreferrer"&gt;connection.commit()&lt;/a&gt; para garantir que as mudanças no banco de dados sejam feitas.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_sql_table&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Cria uma tabela PostgreSQL para armazenar os dados dos CNPJ's
&lt;/span&gt;    &lt;span class="n"&gt;con&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sql_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;create_table_command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;create table if not exists cnpj_data(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj varchar(100),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;razao_social varchar(250),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nome_fantasia varchar(250),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_codigo varchar(50),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_descricao varchar(1000),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;situacao_cadastral varchar(50),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capital_social float,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;porte varchar(50),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codigo_natureza_juridica int,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_abertura date,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cep varchar(50),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;municipio varchar(100),&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uf varchar(2)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;create_table_command&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;create_sql_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Uma vez criada a tabela, podemos começar a alimentá-la com os dados dos CNPJ's. Para isso, vamos considerar a orientação da &lt;a href="https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM" rel="noopener noreferrer"&gt;documentação do PostgreSQL&lt;/a&gt; de que a melhor forma de alimentar uma tabela é usando o comando COPY FROM. Uma demonstração sobre o melhor desempenho do COPY FROM em relação a outros métodos pode ser encontrada &lt;a href="https://hakibenita.com/fast-load-data-python-postgresql" rel="noopener noreferrer"&gt;nesta análise&lt;/a&gt;, que avalia a performance de diferentes formas de importar dados de uma fonte remota para uma tabela PostgreSQL.&lt;/p&gt;

&lt;p&gt;Assim, para alimentar nossa tabela, seguiremos o seguinte fluxo:&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsnkj5902388jh90zgo5f.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsnkj5902388jh90zgo5f.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vamos começar definindo as funções que utilizaremos para armazenar os dados da resposta da API em um dataframe. Repare que, na função &lt;code&gt;cnpj_data_to_df()&lt;/code&gt;, caso o CNPJ não seja encontrado e a resposta da API volte com valor nulo, iremos adicionar o CNPJ ao dataframe mesmo assim, porém com valores nulos nas colunas de informações. Isso nos ajudará a ter controle sobre quais CNPJ's foram encontrados e quais não foram.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Limpa um texto, retirando quebras de linha e ponto-e-vírgulas
&lt;/span&gt;    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;clean_text&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cnpj_data_to_df&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cnpj&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Recebe um dataframe contendo os CNPJ's e, a partir das requisições à API Minha Receita, estrutura um dataframe contendo os dados do CNPJ
&lt;/span&gt;    &lt;span class="n"&gt;data_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;razao_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nome_fantasia&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_codigo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_descricao&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;situacao_cadastral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capital_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;porte&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codigo_natureza_juridica&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_abertura&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cep&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;municipio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cnpj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_cnpj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpf_cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;cnpj_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;new_row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;razao_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;razao_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nome_fantasia&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nome_fantasia&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_codigo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnae_fiscal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_descricao&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnae_fiscal_descricao&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;situacao_cadastral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;descricao_situacao_cadastral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capital_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capital_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;porte&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;descricao_porte&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codigo_natureza_juridica&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codigo_natureza_juridica&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_abertura&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_inicio_atividade&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cep&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cep&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;municipio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;municipio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;new_row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cnpj&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cnpj&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;data_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_set&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data_set&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Agora podemos definir a função &lt;code&gt;df_to_csv()&lt;/code&gt;, que exporta o dataframe para um csv temporário, e a função &lt;code&gt;csv_to_sql_table()&lt;/code&gt;, que utiliza o COPY FROM para alimentar a tabela PostgreSQL.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;df_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_set&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Exporta um dataframe para um arquivo csv
&lt;/span&gt;    &lt;span class="n"&gt;data_set&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quoting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QUOTE_MINIMAL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;csv_to_sql_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Alimenta a tabela PostgreSQL com os dados dos CNPJ's
&lt;/span&gt;    &lt;span class="n"&gt;con&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sql_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;copy cnpj_data from %s delimiter %s csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Com todas as funções definidas, podemos agora iterar sobre o dataframe &lt;code&gt;df_cnpj&lt;/code&gt;, que contém os CNPJ's, e fazer requisições à API Minha Receita. Vamos armazenar as respostas na tabela PostgreSQL em lotes de 1000 CNPJ's para garantir que, caso ocorra um erro enquanto o código roda, não perderemos os dados das requisições já realizadas.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;csv_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\temp_cnpj_data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;df_cnpj_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cnpj&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;new_batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cnpj_data_to_df&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cnpj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;df_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;csv_to_sql_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df_cnpj_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Para finalizar, vamos checar quantos CNPJ's foram encontrados e quantos não foram.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;p&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;select * from cnpj_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;br&gt;
&lt;span class="n"&gt;df_cnpj_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query_to_df&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class="n"&gt;empty_columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;razao_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nome_fantasia&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_codigo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atividade_principal_descricao&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;situacao_cadastral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capital_social&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;porte&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codigo_natureza_juridica&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_abertura&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cep&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;municipio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;br&gt;
                 &lt;span class="p"&gt;]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class="n"&gt;df_found_cnpj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;empty_columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;br&gt;
&lt;span class="n"&gt;percentage_not_found_cnpj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_found_cnpj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;&lt;br&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;De&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cnpj_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CNPJs, foram encontrados&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_found_cnpj&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;br&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Não foram encontrados&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;percentage_not_found_cnpj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;% dos CNPJs buscados.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;p&gt;De 46911 CNPJs, foram encontrados 46851&lt;br&gt;
Não foram encontrados 0.13 % dos CNPJs buscados.&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Conclusões&lt;br&gt;
&lt;/h2&gt;

&lt;p&gt;Neste projeto, utilizamos Python para acessar e manipular bancos de dados PostgreSQL. Conseguimos obter informações de +46mil CNPJ's, com apenas 0,13% dos dados não encontrados. Para isso, a API Minha Receita foi um recurso essencial, já que ela disponibiliza os dados da Receita Federal de forma gratuita e acessível.  &lt;/p&gt;

&lt;p&gt;Com este tutorial, espero ter contribuído para desvendar possíveis dúvidas sobre como utilizar a biblioteca Psycopg para conectar o Python a um banco de dados PostgreSQL e a biblioteca Requests para realizar requisições a uma API. &lt;/p&gt;

&lt;p&gt;Espero também que o projeto contribua para que mais pessoas conheçam e utilizem a API Minha Receita. &lt;/p&gt;

&lt;p&gt;O meu repositório no Github com o notebook do projeto está disponível &lt;a href="https://github.com/camilacrdoso/fetching-storing-cnpj-data" rel="noopener noreferrer"&gt;neste link&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>postgres</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
