A Python framework to transform natural language questions into queries in a database query language.

Last update: Dec 18, 2022

  __ _ _   _  ___ _ __  _   _
 / _` | | | |/ _ \ '_ \| | | |
| (_| | |_| |  __/ |_) | |_| |
 \__, |\__,_|\___| .__/ \__, |
    |_|          |_|    |___/

What's quepy?

Quepy is a Python framework to transform natural language questions into queries in a database query language. It can be easily customized to handle different kinds of natural language questions and database queries, so with little coding you can build your own system for natural language access to your database.

Currently, Quepy supports the SPARQL and MQL query languages. We plan to extend it to other database query languages.

An example

To illustrate what you can do with quepy, we included an example application to access DBpedia contents via their SPARQL endpoint.

You can try the example online here: Online demo

Or, you can try the example yourself by doing:

python examples/dbpedia/main.py "Who is Tom Cruise?"

And it will output something like this:

SELECT DISTINCT ?x1 WHERE {
    ?x0 rdf:type foaf:Person.
    ?x0 rdfs:label "Tom Cruise"@en.
    ?x0 rdfs:comment ?x1.
}

Thomas Cruise Mapother IV, widely known as Tom Cruise, is an...

The transformation from natural language to SPARQL is done by first using a special form of regular expressions:

person_name = Group(Plus(Pos("NNP")), "person_name")
regex = Lemma("who") + Lemma("be") + person_name + Question(Pos("."))

And then using a convenient way to express semantic relations:

person = IsPerson() + HasKeyword(person_name)
definition = DefinitionOf(person)
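
To make the pattern stage more concrete, here is a toy matcher in plain Python. It is only a sketch and not quepy's matcher: the function match_who_is and the (token, lemma, pos) tuples are hypothetical stand-ins for the tagged words that the regex above operates on.

```python
# Toy illustration only: NOT quepy's implementation. It mimics, in plain
# Python, the pattern  who + be + one-or-more NNP + optional "." tag.

def match_who_is(tagged_words):
    """Return the person name if the question fits the pattern, else None.

    tagged_words is a list of (token, lemma, pos) tuples, a hypothetical
    stand-in for the output of a part-of-speech tagger.
    """
    if len(tagged_words) < 3:
        return None
    # Lemma("who") + Lemma("be")
    if tagged_words[0][1] != "who" or tagged_words[1][1] != "be":
        return None

    # Plus(Pos("NNP")): consume one or more proper nouns.
    i = 2
    name_tokens = []
    while i < len(tagged_words) and tagged_words[i][2] == "NNP":
        name_tokens.append(tagged_words[i][0])
        i += 1
    if not name_tokens:
        return None

    # Question(Pos(".")): the trailing punctuation is optional.
    if i < len(tagged_words) and tagged_words[i][2] == ".":
        i += 1

    # In this toy version the whole question must be consumed.
    if i != len(tagged_words):
        return None
    return " ".join(name_tokens)


question = [
    ("Who", "who", "WP"),
    ("is", "be", "VBZ"),
    ("Tom", "tom", "NNP"),
    ("Cruise", "cruise", "NNP"),
    ("?", "?", "."),
]
person_name = match_who_is(question)
print(person_name)  # Tom Cruise
```

The captured name plays the role of the "person_name" group, which is then passed to HasKeyword in the semantic relations above.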

The rest of the transformation is handled automatically by the framework to finally produce this SPARQL:

SELECT DISTINCT ?x1 WHERE {
    ?x0 rdf:type foaf:Person.
    ?x0 rdfs:label "Tom Cruise"@en.
    ?x0 rdfs:comment ?x1.
}
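
As a rough picture of that last step, the query can be seen as a list of subject, predicate, object triples rendered as text. The sketch below assumes exactly that and nothing more: render_sparql is a hypothetical helper written for this README, not quepy's actual code generator.

```python
# Sketch under stated assumptions: render_sparql is a hypothetical helper,
# not part of quepy. It only formats already-built triples as SPARQL text.

def render_sparql(triples, target):
    """Render (subject, predicate, object) triples as a SELECT DISTINCT query."""
    lines = ["SELECT DISTINCT {} WHERE {{".format(target)]
    for subject, predicate, obj in triples:
        lines.append("    {} {} {}.".format(subject, predicate, obj))
    lines.append("}")
    return "\n".join(lines)


# The three triples from the "Who is Tom Cruise?" example.
triples = [
    ("?x0", "rdf:type", "foaf:Person"),
    ("?x0", "rdfs:label", '"Tom Cruise"@en'),
    ("?x0", "rdfs:comment", "?x1"),
]
query = render_sparql(triples, "?x1")
print(query)
```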

Using a very similar procedure, you could generate an MQL query for the same question, obtaining:

[{
    "/common/topic/description": [{}],
    "/type/object/name": "Tom Cruise",
    "/type/object/type": "/people/person"
}]
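
Since an MQL query is plain JSON, the same question can also be pictured as a small Python structure serialized with the standard json module. This is only an illustrative sketch: build_mql_definition_query is a hypothetical name, not a quepy function, and the whitespace of its output may differ from the query shown above.

```python
import json

# Sketch only: build_mql_definition_query is a hypothetical helper,
# not part of quepy's API.

def build_mql_definition_query(person_name):
    """Return an MQL query (as a JSON string) asking for a person's description."""
    query = [{
        "/common/topic/description": [{}],  # empty object: "fill this in"
        "/type/object/name": person_name,
        "/type/object/type": "/people/person",
    }]
    return json.dumps(query, indent=4, sort_keys=True)


mql = build_mql_definition_query("Tom Cruise")
print(mql)
```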

Installation

You need to have docopt and numpy installed. Other than that, you can just type:

pip install quepy

You can get more details on the installation here:

http://quepy.readthedocs.org/en/latest/installation.html

Learn more

You can find a tutorial here:

http://quepy.readthedocs.org/en/latest/tutorial.html

And the full documentation here:

http://quepy.readthedocs.org/

Join our mailing list

Contribute!

Want to help develop quepy? Welcome aboard! Find us at http://groups.google.com/group/quepy

Owner

Machinalis
