A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
Cloud Optimized GeoTIFF creation and validation plugin for rasterio

rio-cogeo Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio. Documentation: https://cogeotiff.github.io/rio-cogeo/ Source Code

216 Dec 31, 2022
Creates 3D geometries from 2D vector graphics, for use in geodynamic models

geomIO - creating 3D geometries from 2D input This is the Julia and Python version of geomIO, a free open source software to generate 3D volumes and s

3 Feb 01, 2022
A modern, geometric typeface by @chrismsimpson (last commit @ 85fa625 Jun 9, 2020 before deletion)

Metropolis A modern, geometric typeface. Influenced by other popular geometric, minimalist sans-serif typefaces of the new millenium. Designed for opt

Darius 183 Dec 25, 2022
Constraint-based geometry sketcher for blender

Geometry Sketcher Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like

1.7k Jan 02, 2023
Fiona reads and writes geographic data files

Fiona Fiona reads and writes geographic data files and thereby helps Python programmers integrate geographic information systems with other computer s

987 Jan 04, 2023
Build, deploy and extract satellite public constellations with one command line.

SatExtractor Build, deploy and extract satellite public constellations with one command line. Table of Contents About The Project Getting Started Stru

Frontier Development Lab 70 Nov 18, 2022
Expose a GDAL file as a HTTP accessible on-the-fly COG

cogserver Expose any GDAL recognized raster file as a HTTP accessible on-the-fly COG (Cloud Optimized GeoTIFF) The on-the-fly COG file is not material

Even Rouault 73 Aug 04, 2022
Google maps for Jupyter notebooks

gmaps gmaps is a plugin for including interactive Google maps in the IPython Notebook. Let's plot a heatmap of taxi pickups in San Francisco: import g

Pascal Bugnion 747 Dec 19, 2022
Python project to generate Kerala's distrcit level panchayath map.

Kerala-Panchayath-Maps Python project to generate Kerala's distrcit level panchayath map. As of now, geojson files of Kollam and Kozhikode are added t

Athul R T 2 Jan 10, 2022
Asynchronous Client for the worlds fastest in-memory geo-database Tile38

This is an asynchonous Python client for Tile38 that allows for fast and easy interaction with the worlds fastest in-memory geodatabase Tile38.

Ben 53 Dec 29, 2022
Processing and interpolating spatial data with a twist of machine learning

Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da

Fatiando a Terra 468 Dec 20, 2022
a Geolocator made in python

Geolocator A Geolocator made in python ✨ Features locates ur location using ur ip thats it! 💁‍♀️ How to use first download the locator.py file instal

Portgas D Ace 1 Oct 27, 2021
When traveling in the backcountry during winter time, updating yourself on current and recent weather data is important to understand likely avalanche danger.

Weather Data When traveling in the backcountry during winter time, updating yourself on current and recent weather data is important to understand lik

Trevor Allen 0 Jan 02, 2022
A utility to search, download and process Landsat 8 satellite imagery

Landsat-util Landsat-util is a command line utility that makes it easy to search, download, and process Landsat imagery. Docs For full documentation v

Development Seed 681 Dec 07, 2022
Raster processing benchmarks for Python and R packages

Raster processing benchmarks This repository contains a collection of raster processing benchmarks for Python and R packages. The tests cover the most

Krzysztof Dyba 13 Oct 24, 2022
glTF to 3d Tiles Converter. Convert glTF model to Glb, b3dm or 3d tiles format.

gltf-to-3d-tiles glTF to 3d Tiles Converter. Convert glTF model to Glb, b3dm or 3d tiles format. Usage λ python main.py --help Usage: main.py [OPTION

58 Dec 27, 2022
A compilation of several single-beam bathymetry surveys of the Caribbean

Caribbean - Single-beam bathymetry This dataset is a compilation of several single-beam bathymetry surveys of the Caribbean ocean displaying a wide ra

Fatiando a Terra Datasets 0 Jan 20, 2022
Ingest and query genomic intervals from multiple BED files

Ingest and query genomic intervals from multiple BED files.

4 May 29, 2021
Search and download Copernicus Sentinel satellite images

sentinelsat Sentinelsat makes searching, downloading and retrieving the metadata of Sentinel satellite images from the Copernicus Open Access Hub easy

837 Dec 28, 2022
gpdvega is a bridge between GeoPandas and Altair that allows to seamlessly chart geospatial data

gpdvega gpdvega is a bridge between GeoPandas a geospatial extension of Pandas and the declarative statistical visualization library Altair, which all

Ilia Timofeev 49 Jul 25, 2022