Proyecto - Análisis de texto de eventos históricos

Overview

Acceder al código desde Google Colab para poder ver de manera adecuada todas las visualizaciones y poder interactuar con ellas.

Link de acceso: https://colab.research.google.com/drive/1XqDm6szrNG8ZdH37EVITPCSw7BDZFFQ5?usp=sharing

Corto video explicativo: https://youtu.be/ZDPXc56jOj4

Proyecto Big Data - Análisis de texto de eventos históricos

Declaración del conjunto de datos

Contamos con un dataset en formato JSON proveniente del repositorio 'awesome-json-datasets' en la sección 'Historical Events' sobre eventos históricos (disponible en: https://github.com/jdorfman/awesome-json-datasets). Este dataset cuenta con información desde el año 299 A.C. hasta el año 2013. Recopila sucesos importantes en el mundo a lo largo de este periodo señalado.

La estrucutra de cada recopilación es la siguiente:

{
    "date": "fecha del acontecimiento",
    "description": "descripción del evento en cuestión",
    "lang": "lenguaje de la descripción",
    "category1": "catergoría interna del dataset",
    "granularity": "granularidad"
}

Como se puede ver, no cuenta con una estructura compleja, y sus campos más importantes son 'date' que nos indica la fecha del suceso y 'description' donde se encuentran todos los detalles del evento. Este dataset cuenta con 20.330 registros diferentes.

Planteamiento de la problemática y diseño de la solución (tecnologías a implementar)

Se plantea realizar un análisis descriptivo de esta información a nivel de país, agrupando sus eventos históricos y ver qué palabras son recurrentes en estos eventos. Así nos podemos dar una rápida percepción de la historia de un país en concreto. También se plantea analizar palabras clave en los eventos históricos como lo son 'guerra', 'atentados', 'ataque', 'muertos', 'descubrimiento', 'invención' y ver que tan concurrentes son a lo largo de la historia.

Para esta labor, nos apoyaremos de la herramienta MongoDB en su entorno de Python Pymongo. Este sistema de base de datos NoSQL nos ayudará a manejar adecuadamente el formato de este dataset (JSON) y más importante aún con el tratamiento de textos. Para esto último nos apoyaremos en dos funcionalidades de MongoDB: En el uso de expresiones regulares para busqueda en campos de texto y en las operaciones Map-Reduce. Junto con MongoDB, nos apoyaremos en las librerías propias de analítica de datos de Python. Con esto se pretenderá alcanzar los objetivos de este proyecto.

Hitchhikers-guide - The Hitchhiker's Guide to Data Science for Social Good

Welcome to the Hitchhiker's Guide to Data Science for Social Good. What is the Data Science for Social Good Fellowship? The Data Science for Social Go

Data Science for Social Good 907 Jan 01, 2023
A program to generate random numbers b/w 0 to 10 using time

random-num-using-time A program to generate random numbers b/w 0 to 10 using time it uses python's in-built module datetime and an equation which retu

Atul Kushwaha 1 Oct 01, 2022
Functions to analyze Cell-ID single-cell cytometry data using python language.

PyCellID (building...) Functions to analyze Cell-ID single-cell cytometry data using python language. Dependecies for this project. attrs(=21.1.0) fo

0 Dec 22, 2021
Beatsaber for Python

beatsaber Beatsaber for Python It was automatically generated with mkpylib. If you're reading this message, it m

Shawn Presser 3 Jul 30, 2021
This is the Code Institute student template for Gitpod.

Welcome AnaG0307, This is the Code Institute student template for Gitpod. We have preinstalled all of the tools you need to get started. It's perfectl

0 Feb 02, 2022
A general illumination correction method for optical microscopy.

CIDRE About CIDRE is a retrospective illumination correction method for optical microscopy. It is designed to correct collections of images by buildin

Kevin Smith 31 Sep 07, 2022
A one place destination to check whatever is trending on the top social and news websites at present.

UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu

Google Developer Student Clubs - JGEC 10 Oct 03, 2021
A Powerful Tool For Making Combo List(All possible modes)

ComboMaker A Powerful Tool For Making Combo List Introduction Check out all possible Combo list build modes with this tool =) How to Install Open the

MasterBurnt 7 Jan 07, 2023
A minimalist personal blogging system that natively supports Markdown, LaTeX, and code highlighting.

December Welcome to the December blogging system's code repository! Introduction December is a minimalist personal blogging system that natively suppo

TriNitroTofu 10 Dec 05, 2022
An app to automatically take attendance by scanning students' bar coded ID card as they enter the classroom.

Auto Classroom Attendance This application may be run on a PC to automatically scan students' ID card using a generic bar code scanner and output the

1 Nov 10, 2021
This Curve Editor, written by Jehee Lee in 2015

Splines Abstract This Curve Editor, written by Jehee Lee in 2015, is a freeware. You can use, modify, redistribute the code without restriction. This

Movement Research Lab 8 Mar 11, 2022
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Olga Tsiouri 1 Jan 23, 2022
An interactive course to git

OperatorEquals' Sandbox Git Course! Preface This Git course is an ongoing project containing use cases that I've met (and still meet) while working in

John Torakis 62 Sep 19, 2022
A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI

A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI As a small project for school or college hope it is useful

Sina Org 1 Jan 11, 2022
Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

3 Feb 11, 2022
A toolkit for developing and deploying serverless Python code in AWS Lambda.

Python-lambda is a toolset for developing and deploying serverless Python code in AWS Lambda. A call for contributors With python-lambda and pytube bo

Nick Ficano 1.4k Jan 03, 2023
A python script to get your activity

activities A python script to get your activity Not complete Requirements Python (=3.7) Pip (for python = 3.7) Git Pip packages psutil asyncio aioht

StarNumber 3 Nov 07, 2021
Export transactions for an algorand wallet to a CSV file

algorand_txn_csv_exporter - (Algorand transaction CSV exporter) This script will export transactions for an algorand wallet to a CSV file. It is inten

TeneoPython01 5 Jun 19, 2022
Dungeon Dice Rolls is an aplication that the user can roll dices (d4, d6, d8, d10, d12, d20 and d100) and store the results in one of the 6 arrays.

Dungeon Dice Rolls is an aplication that the user can roll dices (d4, d6, d8, d10, d12, d20 and d100) and store the results in one of the 6 arrays.

Bracero 1 Dec 31, 2021
Simple but maybe too simple config management through python data classes. We use it for machine learning.

👩‍✈️ Coqpit Simple, light-weight and no dependency config handling through python data classes with to/from JSON serialization/deserialization. Curre

coqui 67 Nov 29, 2022