MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Dynamic key remapper for Wayland Window System, especially for Sway

wayremap Dynamic keyboard remapper for Wayland. It works on both X Window Manager and Wayland, but focused on Wayland as it intercepts evdev input and

Kay Gosho 50 Nov 29, 2022
Aggregating gridded data (xarray) to polygons

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons.

Kevin Schwarzwald 42 Nov 09, 2022
Homebase Name Changer for Fortnite: Save the World.

Homebase Name Changer This program allows you to change the Homebase name in Fortnite: Save the World. How to use it? After starting the HomebaseNameC

PRO100KatYT 7 May 21, 2022
๐ŸŒฒ A simple BST (Binary Search Tree) generator written in python

Tree-Traversals (BST) ๐ŸŒฒ A simple BST (Binary Search Tree) generator written in python Installation Use the package manager pip to install BST. Usage

Jan Kupczyk 1 Dec 12, 2021
Conveniently measures the time of your loops, contexts and functions.

Conveniently measures the time of your loops, contexts and functions.

Maciej J Mikulski 79 Nov 15, 2022
A way to write regex with objects instead of strings.

Py Idiomatic Regex (AKA iregex) Documentation Available Here An easier way to write regex in Python using OOP instead of strings. Makes the code much

Ryan Peach 18 Nov 15, 2021
Networkx with neo4j back-end

Dump networkx graph into nodes/relations TSV from neo4jnx.tsv import graph_to_tsv g = pklload('indranet_dir_graph.pkl') graph_to_tsv(g, 'docker/nodes.

Benjamin M. Gyori 1 Oct 27, 2021
Deep Difference and search of any Python object/data.

DeepDiff v 5.6.0 DeepDiff Overview DeepDiff: Deep Difference of dictionaries, iterables, strings and other objects. It will recursively look for all t

Sep Dehpour 1.6k Jan 08, 2023
Exports the local variables into a global dictionary for later debugging.

PyExfiltrator Juliaโ€™s @exfiltrate for Python; Exports the local variables into a global dictionary for later debugging. Installation pip install pyexf

6 Nov 07, 2022
๐Ÿ’‰ ์ฝ”๋กœ๋‚˜ ์ž”์—ฌ๋ฐฑ์‹  ์˜ˆ์•ฝ ๋งคํฌ๋กœ ์ปค์Šคํ…€ ๋นŒ๋“œ (์†๋„ ํ–ฅ์ƒ ๋ฒ„์ „)

Korea-Covid-19-Vaccine-Reservation ์ฝ”๋กœ๋‚˜ ์ž”์—ฌ ๋ฐฑ์‹  ์˜ˆ์•ฝ ๋งคํฌ๋กœ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์ปค์Šคํ…€ ๋นŒ๋“œ์ž…๋‹ˆ๋‹ค. ๋” ๋น ๋ฅธ ๋ฐฑ์‹  ์˜ˆ์•ฝ์„ ๋ชฉํ‘œ๋กœ ํ•˜๋ฉฐ, ์†๋„๋ฅผ ์šฐ์„ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ์ž๋Š” ์ด์— ๋Œ€์ฒ˜๊ฐ€ ๊ฐ€๋Šฅํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ง€์ •ํ•œ ์ขŒํ‘œ ๋‚ด ๋Œ€๊ธฐ์ค‘์ธ ๋ณ‘์›์—์„œ ์ž”์—ฌ ๋ฐฑ์‹ 

Queue.ri 21 Aug 15, 2022
Group imports from Windows binaries

importsort This is a tool that I use to group imports from Windows binaries. Sometimes, you have a gigantic folder full of executables, and you want t

ใ€โ˜† ใ‚†ใ† โ˜† ใ€‘ 15 Aug 27, 2022
Color box that provides various colorsโ€˜ rgb decimal code.

colorbox Color box that provides various colorsโ€˜ rgb decimal code

1 Dec 07, 2021
Simple script to export contacts from telegram into vCard file

Telegram Contacts Exporter Simple script to export contacts from telegram into vCard file Getting Started Prerequisites You must to put your Telegram

Pere Antoni 1 Oct 17, 2021
SH-PUBLIC is a python based cloning script. You can clone unlimited UID facebook accounts by using this tool.

SH-PUBLIC is a python based cloning script. You can clone unlimited UID facebook accounts by using this tool. This tool works on any Android devices without root.

(Md. Tanvir Ahmed) 5 Mar 09, 2022
A utility that makes it easy to work with Python projects containing lots of packages, of which you only want to develop some.

Mixed development source packages on top of stable constraints using pip mxdev [mษชks dษ›v] is a utility that makes it easy to work with Python projects

BlueDynamics Alliance 6 Jun 08, 2022
Use generator for range function

Use the generator for the range function! installation method: pip install yrange How to use: First import yrange in your application. You can then wo

1 Oct 28, 2021
A BlackJack simulator in Python to simulate thousands or millions of hands using different strategies.

BlackJack Simulator (in Python) A BlackJack simulator to play any number of hands using different strategies The Rules To keep the code relatively sim

Hamid 4 Jun 24, 2022
'ToolBurnt' A Set Of Tools In One Place =}

'ToolBurnt' A Set Of Tools In One Place =}

MasterBurnt 5 Sep 10, 2022
Collection of code auto-generation utility scripts for the Horizon `Boot` system module

boot-scripts This is a collection of code auto-generation utility scripts for the Horizon Boot system module, intended for use in Atmosphรจre. Usage Us

4 Oct 11, 2022
A tiny Python library for generating public IDs from integers

pids Create short public identifiers based on integer IDs. Installation pip install pids Usage from pids import pid public_id = pid.from_int(1234) #

Simon Willison 7 Nov 11, 2021