Bigdata Simulation Library Of Dream By Sandman Books


Solution Architecture


Description


In the realm of the Dreaming, its ruler Dream, the SANDMAN, has a particular hobby: books. His castle houses a Library that keeps, among other things, stories conceived by their authors but never written in our reality. Lucien, the librarian responsible for organizing it, needs some help. Many people dream of published books, sales markets, and stories they would never imagine conceiving in their own reality, and this voluminous data needs to be processed. So that he does not get lost in the information, Lucien receives all of these dreams in a non-relational database, MongoDB. He needs them organized relationally, with each author in their proper place. To do that, he reached into our dream and saw this architecture: data arriving in MongoDB goes through a transformation process in the STAGING area and is then loaded into MySQL.

During the load, the data is split into two final tables: one in its raw state, for complete queries, and another with metrics reporting the number of dreamers, their books, and the total amount of data. This way the data is better organized, having gone through deduplication and consolidation.
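The README does not say how deduplication is done, so the following is only a minimal Python sketch of the step, not the project's code. The function name `deduplicate` is hypothetical. Keying on `(title, author, identifier)` is also an assumption, since the README does not define which fields make two records duplicates.

```python
from typing import Iterable


def _hashable(value):
    # Lists (e.g. Mongo's author array) become tuples so they can be part of a set key.
    return tuple(value) if isinstance(value, list) else value


def deduplicate(records: Iterable[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the first record seen for each key; later duplicates are dropped.

    key_fields is an assumption: the project does not document its duplicate key.
    """
    seen: set[tuple] = set()
    unique: list[dict] = []
    for record in records:
        key = tuple(_hashable(record.get(field)) for field in key_fields)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        unique.append(record)
    return unique
```

Keeping the first occurrence makes the result depend on input order. If a consolidation rule such as "keep the latest dreaming_date" were needed, the records would have to be sorted before this step.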

Data Glossary


Field          |Type   |Description                                       |
---------------+-------+--------------------------------------------------+
_id            |long   |underscore ID (unique record identifier)          |
kind           |string |type of book or text file                         |
title          |string |title of the book                                 |
subtitle       |string |subtitle of the book                              |
author         |array  |one or more authors who dreamed the story         |
publisher      |string |publisher, whether or not dreamed up by the author|
publishedDate  |string |year of publication                               |
edition        |string |edition the book belongs to                       |
sample         |string |sample text from the book                         |
type           |string |ISBN type (e.g. ISBN_10)                          |
identifier     |string |ISBN identification number                        |
pageCount      |integer|number of pages                                   |
capCount       |integer|number of chapters                                |
wordCount      |integer|number of words                                   |
categories     |string |literary genre                                    |
original_price |double |original price                                    |
current_prefix |string |country's currency code                           |
current_sufix  |string |country's currency name                           |
barcode        |string |barcode                                           |
dreaming_date  |string |date on which the dream occurred                  |
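The glossary can also be expressed as a Python dataclass, which makes the expected types explicit. This is a hypothetical sketch: `LibraryRecord` is not a name from the project. Glossary types are mapped as follows: long and integer to `int`, double to `float`, string to `str`, and array to `list[str]`. Field order follows the glossary.

```python
from dataclasses import dataclass, fields


@dataclass
class LibraryRecord:
    """One library row as described by the glossary (hypothetical name)."""
    _id: int                # long
    kind: str
    title: str
    subtitle: str
    author: list[str]       # array of one or more authors
    publisher: str
    publishedDate: str      # year, stored as a string
    edition: str
    sample: str
    type: str               # ISBN type, e.g. "ISBN_10"
    identifier: str         # ISBN number
    pageCount: int
    capCount: int
    wordCount: int
    categories: str
    original_price: float   # double
    current_prefix: str
    current_sufix: str
    barcode: str
    dreaming_date: str
```

Note that the MySQL sample further down stores `author` as a single string. The `list[str]` type here follows the glossary, not the loaded table.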


Start the Project


To run the project, install the dependencies located in the "dependencies" folder, then run the shell script "run_script.sh" from the root of the project.

Sample MongoDB Payload


mongo

{
        "_id" : ObjectId("61b1fe6944dd42158674af31"),
        "kind" : "books#volume",
        "volumeInfo" : {
                "title" : "STORE VISIT",
                "Subtitle" : "GLASS IT HAIR MEMBER KEY ALMOST QUALITY. MARKET ALREADY AIR STILL ARTICLE. DECADE DECADE MEASURE PRESENT HUMAN MORNING. BIG BLOOD ECONOMIC FRONT SUCCESS AGO THEM. EVERY SON TROUBLE SIMPLE.",
                "author" : [
                        "PETER RODRIGUEZ",
                        "KELLY TORRES"
                ]
        },
        "publisher" : "FALL AWAY ABOUT INDEPENDENT",
        "publishedDate" : "1994",
        "edition" : "7º EDITION",
        "sample" : "...onto sport room audience. page dinner hundred. week statement should watch she even ball.\nour able tv break defense seek baby. employee last around music produce reach tv..",
        "industryIdentifiers" : [
                {
                        "type" : "ISBN_10",
                        "identifier" : "1-55027-208-X"
                },
                {
                        "type" : "ISBN_10",
                        "identifier" : "0-405-30324-6"
                }
        ],
        "pageCount" : 796,
        "wordCount" : 83331,
        "capCount" : 14,
        "categories" : [
                "NOVEL"
        ],
        "saleInfo" : {
                "original_price" : 78,
                "current_prefix" : "LAK",
                "current_sufix" : "Lao kip",
                "barcode" : "6747254889534"
        }
}
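In the MongoDB payload above, `title`, `Subtitle` and `author` are nested under `volumeInfo`, and the price fields under `saleInfo`. The MySQL row, by contrast, is flat, so the staging step has to flatten each document. The sketch below shows one way to do that, under several assumptions that the README does not confirm:

- `flatten_volume` is a hypothetical name.
- It emits one row per `industryIdentifiers` entry.
- It joins authors and categories with ", ".
- It takes `dreaming_date` from the caller.
- It does not carry over the Mongo `_id`, because the MySQL sample shows a numeric `_id`, which suggests a key assigned at load time.

```python
from typing import Any


def flatten_volume(doc: dict[str, Any], dreaming_date: str) -> list[dict[str, Any]]:
    """Flatten one MongoDB volume document into flat library rows (hypothetical sketch)."""
    info = doc.get("volumeInfo", {})
    sale = doc.get("saleInfo", {})
    base = {
        "kind": doc.get("kind"),
        "title": info.get("title"),
        # The sample payload spells this key "Subtitle"; accept either spelling.
        "subtitle": info.get("subtitle", info.get("Subtitle")),
        "author": ", ".join(info.get("author", [])),
        "publisher": doc.get("publisher"),
        "publishedDate": doc.get("publishedDate"),
        "edition": doc.get("edition"),
        "sample": doc.get("sample"),
        "pageCount": doc.get("pageCount"),
        "wordCount": doc.get("wordCount"),
        "capCount": doc.get("capCount"),
        "categories": ", ".join(doc.get("categories", [])),
        # The glossary types original_price as double, so cast the Mongo integer.
        "original_price": float(sale["original_price"]) if "original_price" in sale else None,
        "current_prefix": sale.get("current_prefix"),
        "current_sufix": sale.get("current_sufix"),
        "barcode": sale.get("barcode"),
        "dreaming_date": dreaming_date,
    }
    # Assumption: one output row per ISBN entry; a document without identifiers yields one row.
    identifiers = doc.get("industryIdentifiers") or [{}]
    return [
        {**base, "type": ident.get("type"), "identifier": ident.get("identifier")}
        for ident in identifiers
    ]
```

Applied to the sample above (without the `ObjectId` field), this produces two rows, one per ISBN_10 identifier, both with `author` set to "PETER RODRIGUEZ, KELLY TORRES".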

Sample MySQL Payload


library

_id  |kind        |title                                                           |subtitle                                                                                                                                                                                                                                                       |author                                       |publisher                               |publishedDate|edition   |sample                                                                                                                                                                                                     |type   |identifier       |pageCount|wordCount|capCount|categories               |original_price|current_prefix|current_sufix              |barcode      |dreaming_date|
-----+------------+----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+----------------------------------------+-------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+-----------------+---------+---------+--------+-------------------------+--------------+--------------+---------------------------+-------------+-------------+
  670|csv#volume  |GOOD DETERMINE OF                                               |DON'T HAVE                                                                                                                                                                                                                                                     |KAREN ODOM                                   |DARK WAR INDEPENDENT                    |1982         |9º EDITION|...involve star apply later including truth. next while nor worry staff economic.¶condition region write college. return half offer. popular could direction above fish..                                  |ISBN_10|978-1-4340-7508-6|      479|    64333|      11|EPISTOLARY NOVEL         |          97.0|BHD           |Bahraini dinar             |3894426059691|     20211209|
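Loading flat rows into the raw `library` table can be sketched as below. To keep the sketch runnable without a server, it uses Python's built-in `sqlite3` in place of MySQL, so the column types are SQLite's, not the project's MySQL DDL. The names `COLUMNS` and `load_library` are hypothetical.

```python
import sqlite3

# Column order follows the library sample above; types follow the glossary
# (long/integer -> INTEGER, double -> REAL, everything else -> TEXT).
COLUMNS = [
    ("kind", "TEXT"), ("title", "TEXT"), ("subtitle", "TEXT"), ("author", "TEXT"),
    ("publisher", "TEXT"), ("publishedDate", "TEXT"), ("edition", "TEXT"),
    ("sample", "TEXT"), ("type", "TEXT"), ("identifier", "TEXT"),
    ("pageCount", "INTEGER"), ("wordCount", "INTEGER"), ("capCount", "INTEGER"),
    ("categories", "TEXT"), ("original_price", "REAL"), ("current_prefix", "TEXT"),
    ("current_sufix", "TEXT"), ("barcode", "TEXT"), ("dreaming_date", "TEXT"),
]


def load_library(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Create the library table if needed and insert flat rows; returns rows inserted."""
    column_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in COLUMNS)
    # _id is an auto-assigned integer key, matching the numeric _id in the sample.
    conn.execute(f'CREATE TABLE IF NOT EXISTS library ("_id" INTEGER PRIMARY KEY, {column_sql})')
    names = [name for name, _ in COLUMNS]
    quoted = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join(f":{name}" for name in names)
    # Missing keys are filled with None so partial rows still insert.
    params = [{name: row.get(name) for name in names} for row in rows]
    conn.executemany(f"INSERT INTO library ({quoted}) VALUES ({placeholders})", params)
    conn.commit()
    return len(params)
```

Named placeholders keep the insert independent of key order in each row dict. A MySQL driver would use its own parameter style instead.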

resume

metric          |value|
----------------+-----+
Total of Dreams |71395|
Total of Books  |59154|
Total of Data   |78000|
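The README does not define how each metric in `resume` is computed, so the definitions below are illustrative guesses:

- "Total of Data" is every loaded row.
- "Total of Books" is distinct titles.
- "Total of Dreams" is distinct `(title, author, identifier)` combinations.

As in the loading sketch, `sqlite3` stands in for MySQL so the code runs without a server. `build_resume` is a hypothetical name.

```python
import sqlite3

# Hypothetical metric definitions; the project does not document its own.
# The derived table is aliased ("AS d") because MySQL requires an alias there.
RESUME_SQL = """
INSERT INTO resume (metric, value)
SELECT 'Total of Dreams', COUNT(*)
  FROM (SELECT DISTINCT title, author, identifier FROM library) AS d
UNION ALL
SELECT 'Total of Books', COUNT(DISTINCT title) FROM library
UNION ALL
SELECT 'Total of Data', COUNT(*) FROM library
"""


def build_resume(conn: sqlite3.Connection) -> dict[str, int]:
    """Rebuild the resume table from library and return it as {metric: value}."""
    conn.execute("CREATE TABLE IF NOT EXISTS resume (metric TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("DELETE FROM resume")  # recompute from scratch on every run
    conn.execute(RESUME_SQL)
    conn.commit()
    return dict(conn.execute("SELECT metric, value FROM resume"))
```

Recomputing the whole table on each run keeps the metrics consistent with whatever is currently in `library`, at the cost of a full scan.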
Owner
Maycon Cypriano
DATA ENGINEER | DATA SCIENCE | DATA PYTHON | DATA DRIVEN |
This mini project showcases how to build and debug an Apache Spark application using Python

A Spark app can't be debugged using the normal procedure. This mini project showcases how to build and debug an Apache Spark application using the Python programming language. There are also options to run Spark a

Denny Imanuel 1 Dec 29, 2021
An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

largeVis This is an implementation of the largeVis algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates: A very fast algori

336 May 25, 2022
MotorcycleParts DataAnalysis python

We work with the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

NASEEM A P 1 Jan 12, 2022
PyPDC is a Python package for calculating asymptotic Partial Directed Coherence estimations for brain connectivity analysis.

Python asymptotic Partial Directed Coherence and Directed Coherence estimation package for brain connectivity analysis. Free software: MIT license Doc

Heitor Baldo 3 Nov 26, 2022
Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris, Fractional Factorial and FAST methods.

Sensitivity Analysis Library (SALib) Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the

SALib 663 Jan 05, 2023
Stitch together Nanopore tiled amplicon data without polishing a reference

Stitch together Nanopore tiled amplicon data using a reference guided approach Tiled amplicon data, like those produced from primers designed with pri

Amanda Warr 14 Aug 30, 2022
INF42 - Topological Data Analysis

TDA INF421 (Design and Analysis of Algorithms) Project: Topological Data Analysis SphereMin Given a point cloud, this program contains

2 Jan 07, 2022
Random dataframe and database table generator

Random database/dataframe generator Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, USA Introduction Often, beginners in SQL or data scien

Tirthajyoti Sarkar 249 Jan 08, 2023
Meltano: ELT for the DataOps era. Meltano is open source, self-hosted, CLI-first, debuggable, and extensible.

Meltano is open source, self-hosted, CLI-first, debuggable, and extensible. Pipelines are code, ready to be version c

Meltano 625 Jan 02, 2023
A library to create multi-page Streamlit applications with ease.

A library to create multi-page Streamlit applications with ease.

Jackson Storm 107 Jan 04, 2023
Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 01, 2023
An ETL framework + Monitoring UI/API (experimental project for learning purposes)

Fastlane An ETL framework for building pipelines, and Flask based web API/UI for monitoring pipelines. Project structure fastlane |- fastlane: (ETL fr

Dan Katz 2 Jan 06, 2022
Creating a statistical model to predict 10 year treasury yields

Predicting 10-Year Treasury Yields Initially, I wanted to see if the volatility in the stock market, represented by the VIX index (data source), had

10 Oct 27, 2021
This is a repo documenting the best practices in PySpark.

Spark-Syntax This is a public repo documenting all of the "best practices" of writing PySpark code from what I have learnt from working with PySpark f

Eric Xiao 447 Dec 25, 2022
Bearsql allows you to query pandas DataFrames with SQL syntax.

Bearsql adds SQL syntax to pandas DataFrames. It uses DuckDB to speed up pandas processing and as the SQL engine

14 Jun 22, 2022
PySpark bindings for H3, a hierarchical hexagonal geospatial indexing system

h3-pyspark: Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in PySpark PySpark bindings for the H3 core library. For available functions,

Kevin Schaich 12 Dec 24, 2022
Accurately separate the TLD from the registered domain and subdomains of a URL, using the Public Suffix List.

tldextract Python Module tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and s

John Kurkowski 1.6k Jan 03, 2023
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

PandasVault ⁠— Advanced Pandas Functions and Code Snippets The only Pandas utility package you would ever need. It has no exotic external dependencies

Derek Snow 374 Jan 07, 2023
ETL process (extraction, transformation, loading) carried out by the team in the final project of the Soul Code Academy course.

ETL process (extraction, transformation, loading) carried out by the team in the final project of the Soul Code Academy course.

Débora Mendes de Azevedo 1 Feb 03, 2022
track your GitHub statistics

GitHub-Stalker track your github statistics 👀 features find new followers or unfollowers find who got a star on your project or remove stars find who

Bahadır Araz 34 Nov 18, 2022