Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
Xbps-install wrapper written in Python that doesn't care about case sensitiveness and package versions

xbi Xbps-install wrapper written in Python that doesn't care about case sensitiveness and package versions. Description This Python script can be easi

Emanuele Sabato 5 Apr 11, 2022
LSO, also known as Linux Swap Operator, is a software with both GUI and terminal versions that you can manage the Swap area for Linux operating systems.

LSO - Linux Swap Operator Türkçe - LSO Nedir? LSO, diğer adıyla Linux Swap Operator Linux işletim sistemleri için Swap alanını yönetebileceğiniz hem G

Eren İnce 4 Feb 09, 2022
Winxp_python3.6.15 - Python 3.6.15 For Windows XP SP3

This is Python version 3.6.15 Copyright (c) 2001-2021 Python Software Foundation. All rights reserved. See the end of this file for further copyright

Alex Free 13 Sep 11, 2022
synchronize projects via yaml/json manifest. built on libvcs

vcspull - synchronize your repos. built on libvcs Manage your commonly used repos from YAML / JSON manifest(s). Compare to myrepos. Great if you use t

python utilities for version control 200 Dec 20, 2022
Dev-meme - A repository that contains memes just for people like us

A repository that contains memes just for people like us. Coders are constantly

Padmashree Jha 4 Oct 31, 2022
Basic cryptography done in Python for study purposes

criptografia Criptografia básica feita em Python para fins de estudo Converte letras em numeros partindo do indice 0 e vice-versa A criptografia é fei

Carlos Eduardo 2 Dec 05, 2021
A basic python project which replicates the functionalities on an 8 Ball.

Magic-8-Ball To the people who wish to make decisions using a Magic 8 Ball but can't get one? I gotchu. This is a basic python project which replicate

3 Jun 24, 2021
E5自动续期

AutoApi v6.3 (2021-2-18) ———— E5自动续期 AutoApi系列: AutoApi(v1.0) 、 AutoApiSecret(v2.0) 、 AutoApiSR(v3.0) 、 AutoApiS(v4.0) 、 AutoApiP(v5.0) 说明 E5自动续期程序,但是

34 Feb 20, 2021
Demo of connecting Rasa with Zalo

Demo of connecting Rasa with Zalo

6 Jul 25, 2022
A Notifier Program that Notifies you to relax your eyes Every 15 Minutes👀

Every 15 Minutes is an application that is used to Notify you to Relax your eyes Every 15 Minutes, This is fully made with Python and also with the us

FSP Gang s' YT 2 Nov 11, 2021
Projeto de Jogo de dados em Python 3 onde é definido o lado a ser apostado e número de jogadas, pontuando os acertos e exibindo se ganhou ou perdeu.

Jogo de DadoX Projeto de script que simula um Jogo de dados em Python 3 onde é definido o lado a ser apostado (1, 2, 3, 4, 5 e 6) ou se vai ser um núm

Estênio Mariano 1 Jul 10, 2021
Forward RSS feeds to your email address, community maintained

Getting Started With rss2email We highly recommend that you watch the rss2email project on GitHub so you can keep up to date with the latest version,

248 Dec 28, 2022
This script can be used to get unlimited Gb for WARP.

Warp-Unlimited-GB This script can be used to get unlimited Gb for WARP. How to use Change the value of the 'referrer' to warp id of yours You can down

Anix Sam Saji 1 Feb 14, 2022
Este script añade la config de s4vitar a bspwm automaticamente!

Se ha testeado este script en ParrotOS, Kali y Ubuntu. Funciona para todos los sistemas operativos basados en Debian. Instalación git clone https://gi

yorkox 201 Dec 30, 2022
Performance monitoring and testing of OpenStack

Browbeat Browbeat is a performance tuning and analysis tool for OpenStack. Browbeat is free, Open Source software. Analyze and tune your Cloud for opt

cloud-bulldozer 83 Dec 14, 2022
Python solution of advent-of-code 2021

Advent of code 2021 Python solutions of Advent of Code 2021 written by Eric Bouteillon Requirements The solutions were developed and tested using Pyth

Eric Bouteillon 3 Oct 25, 2022
This is a batch script created to WEB-DL.

widevine-L3-WEB-DL-Script This is a batch script created to WEB-DL. Works well with .mpd files , for m3u8 please use n_m3u8 program (not included in t

Paranjay Singh 312 Dec 31, 2022
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

5 Jun 19, 2021
An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Python Language.

Python Development Welcome to the world of Python. An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Pyt

Paul Veillard 2 Dec 24, 2021
A python script to simplify recompiling, signing and installing reverse engineered android apps.

urszi.py A python script to simplify the Uninstall Recompile Sign Zipalign Install cycle when reverse engineering Android applications. It checks if d

Ahmed Harmouche 4 Jun 24, 2022