MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Finds price floor for every single attribute in a given collection

Solana Solanart Scanner Enjoy the Free Code Steps to run Download VS Code

Dalton Nisbett 19 Oct 20, 2022
Genart - Generate random art to sell as nfts

Genart - Generate random art to sell as nfts Usage git clone

Will 13 Mar 17, 2022
A simple python implementation of Decision Tree.

DecisionTree A simple python implementation of Decision Tree, using Gini index. Usage: import DecisionTree node = DecisionTree.trainDecisionTree(lab

1 Nov 12, 2021
Early version for manipulate Geo localization data trough API REST.

Backend para obtener los datos (beta) Descripción El servidor está diseñado para recibir y almacenar datos enviados en forma de JSON por una aplicació

Víctor Omar Vento Hernández 1 Nov 14, 2021
It is a tool that looks for a specific username in social networks

It is a tool that looks for a specific username in social networks

MasterBurnt 6 Oct 07, 2022
API Rate Limit Decorator

ratelimit APIs are a very common way to interact with web services. As the need to consume data grows, so does the number of API calls necessary to re

Tomas Basham 575 Jan 05, 2023
Utility to play with ADCS, allows to request tickets and collect information about related objects.

certi Utility to play with ADCS, allows to request tickets and collect information about related objects. Basically, it's the impacket copy of Certify

Eloy 185 Dec 29, 2022
A python app which aggregates and splits costs from multiple public cloud providers into a csv

Cloud Billing This project aggregates the costs public cloud resources by accounts, services and tags by importing the invoices from public cloud prov

1 Oct 04, 2022
Display your calendar on the wallpaper.

wallCal Have your calendar appear as the wallpaper. disclaimer Use at your own risk. Don't blame me if you miss a meeting :-) Some parts of the script

7 Jun 14, 2022
.bvh to .mcfunction file converter.

bvh-to-mcf .bvh file to .mcfunction converter

Hanmin Kim 28 Nov 21, 2022
cssOrganizer - organize a css file by grouping them into categories

This python project was created to scan through a CSS file and produce a more organized CSS file by grouping related CSS Properties within selectors. Created in my spare time for fun and my own utili

Andrew Espindola 0 Aug 31, 2022
A library to easily convert climbing route grades between different grading systems.

pyclimb A library to easily convert climbing route grades between different grading systems. In rock climbing, mountaineering, and other climbing disc

Ilias Antonopoulos 4 Jan 26, 2022
About Library for extract infomation from thai personal identity card.

ThaiPersonalCardExtract Library for extract infomation from thai personal identity card. imprement from easyocr and tesseract New Feature v1.3.2 🎁 In

ggafiled 26 Nov 15, 2022
Shypan, a simple, easy to use, full-featured library written in Python.

Shypan, a simple, easy to use, full-featured library written in Python.

ShypanLib 4 Dec 08, 2021
This is Cool Utility tools that you can use in python.

This is Cool Utility tools that you can use in python. There are a few tools that you might find very useful, you can use this on pretty much any project and some utils might help you a lot and save

Senarc Studios 6 Apr 18, 2022
This tool lets you perform some quick tasks for CTFs and Pentesting.

This tool lets you convert strings and numbers between number bases (2, 8, 10 and 16) as well as ASCII text. You can use the IP address analyzer to find out details on IPv4 and perform abbreviation a

Ayomide Ayodele-Soyebo 1 Jul 16, 2022
'ToolBurnt' A Set Of Tools In One Place =}

'ToolBurnt' A Set Of Tools In One Place =}

MasterBurnt 5 Sep 10, 2022
Give you a better view of your Docker registry disk usage.

registry-du Give you a better view of your Docker registry disk usage. This small tool will analysis your Docker registry(vanilla or Harbor both work)

Nova Kwok 16 Jan 07, 2023
A plugin to simplify creating multi-page Dash apps

Multi-Page Dash App Plugin A plugin to simplify creating multi-page Dash apps. This is a preview of functionality that will of Dash 2.1. Background Th

Plotly 19 Dec 09, 2022
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark - Parsing Library & Toolkit 3.5k Jan 05, 2023