Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

Overview

cnpj-mysql

Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

Dados públicos de cnpj no site da Receita:

A partir de 2021 os dados da Receita Federal estão disponíveis no link https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/cadastros/consultas/dados-publicos-cnpj (ou http://200.152.38.155/CNPJ/) em formato csv zipado.

Pré-requisitos:

Python 3.8;
Bibliotecas pandas, dask, sqlalchemy e pymysql.

Utilizando o script:

Baixe todos os arquivos zipados do site da Receita e salve na pasta "dados-publicos-zip".
O download no site da Receita é lento, pode demorar várias horas. Sugiro utilizar um gerenciador de downloads.

Crie uma pasta com o nome "dados-publicos".

No servidor MYSQL, crie um database, por exemplo, cnpj.
Especifique os parâmetros no começo do script:
dbname = 'cnpj'
username = 'root'
password = ''
host = '127.0.0.1'

Para iniciar esse script, em um console DOS digite
python dados_cnpj_para_mysql.py

A execução durou cerca de 5hs em um notebook i7 de 8a geração.

Outras referências:

Para trabalhar com os dados de cnpj no formato SQLITE, use o meu projeto (https://github.com/rictom/cnpj-sqlite).
O projeto (https://github.com/rictom/rede-cnpj) utiliza os dados públicos de CNPJ para visualização de relacionamentos entre empresas e sócios.

Histórico de versões

versão 0.1 (novembro/2021)

  • primeira versão
Sample code to extract data directly from the NetApp AIQUM MySQL Database

This sample code shows how to connect to the AIQUM Database and pull user quota details from it. AIQUM Requirements: 1. AIQUM 9.7 or higher. 2. An

1 Nov 08, 2021
Makes it easier to write raw SQL in Python.

CoolSQL Makes it easier to write raw SQL in Python. Usage Quick Start from coolsql import Field name = Field("name") age = Field("age") condition =

Aber 7 Aug 21, 2022
GINO Is Not ORM - a Python asyncio ORM on SQLAlchemy core.

GINO - GINO Is Not ORM - is a lightweight asynchronous ORM built on top of SQLAlchemy core for Python asyncio. GINO 1.0 supports only PostgreSQL with

GINO Community 2.5k Dec 29, 2022
Confluent's Kafka Python Client

Confluent's Python Client for Apache KafkaTM confluent-kafka-python provides a high-level Producer, Consumer and AdminClient compatible with all Apach

Confluent Inc. 3.1k Jan 05, 2023
Dlsite-doujin-renamer - Dlsite doujin renamer tool with python

dlsite-doujin-renamer Features 支持深度查找带有 RJ 号的文件夹 支持手动选择文件夹或拖拽文件夹到软件窗口 支持在 config

111 Jan 02, 2023
Py2neo is a client library and toolkit for working with Neo4j from within Python

Py2neo Py2neo is a client library and toolkit for working with Neo4j from within Python applications. The library supports both Bolt and HTTP and prov

py2neo.org 1.2k Jan 02, 2023
dask-sql is a distributed SQL query engine in python using Dask

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily

Nils Braun 271 Dec 30, 2022
Baserow is an open source no-code database tool and Airtable alternative

Baserow is an open source no-code database tool and Airtable alternative

1.3k Jan 01, 2023
A collection of awesome sqlite tools, scripts, books, etc

Awesome Series @ Planet Open Data World (Countries, Cities, Codes, ...) • Football (Clubs, Players, Stadiums, ...) • SQLite (Tools, Books, Schemas, ..

Planet Open Data 205 Dec 16, 2022
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d

Zdenek Svoboda 47 Dec 07, 2022
Python Wrapper For sqlite3 and aiosqlite

Python Wrapper For sqlite3 and aiosqlite

6 May 30, 2022
Official Python low-level client for Elasticsearch

Python Elasticsearch Client Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Py

elastic 3.8k Jan 01, 2023
A wrapper for SQLite and MySQL, Most of the queries wrapped into commands for ease.

Before you proceed, make sure you know Some real SQL, before looking at the code, otherwise you probably won't understand anything. Installation pip i

Refined 4 Jul 30, 2022
Async database support for Python. 🗄

Databases Databases gives you simple asyncio support for a range of databases. It allows you to make queries using the powerful SQLAlchemy Core expres

Encode 3.2k Dec 30, 2022
MySQL database connector for Python (with Python 3 support)

mysqlclient This project is a fork of MySQLdb1. This project adds Python 3 support and fixed many bugs. PyPI: https://pypi.org/project/mysqlclient/ Gi

PyMySQL 2.2k Dec 25, 2022
Simple Python demo app that connects to an Oracle DB.

Cloud Foundry Sample Python Application Connecting to Oracle Simple Python demo app that connects to an Oracle DB. The app is based on the example pro

Daniel Buchko 1 Jan 10, 2022
High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

elastic 3.6k Jan 03, 2023
SQL for Humans™

Records: SQL for Humans™ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Ken Reitz 6.9k Jan 03, 2023
Records is a very simple, but powerful, library for making raw SQL queries to most relational databases.

Records: SQL for Humans™ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Kenneth Reitz 6.9k Jan 03, 2023
Python client for Apache Kafka

Kafka Python client Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the offici

Dana Powers 5.1k Jan 08, 2023