Index different CKAN entities in Solr, not just datasets

Last update: Dec 02, 2022

Overview

ckanext-sitesearch

Requirements

This extension requires CKAN 2.9 or higher and Python 3.

Features

Search actions

ckanext-sitesearch allows Solr-powered searches on the following CKAN entities:

| Entity | Action | Permissions | Notes |
| --- | --- | --- | --- |
| Organizations | `organization_search` | Public | |
| Groups | `group_search` | Public | |
| Users | `user_search` | Sysadmins only | |
| Pages | `page_search` | Public (individual page permissions apply) | Requires ckanext-pages |

All *_search actions support most of the same parameters as package_search, except the facet* and include_* ones. Supported parameters include q, fq, rows, start and sort.
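Because rows and start behave as they do in package_search, paging through a large result set can be sketched as below. This is only an illustration: `iter_search_results` and the `search` callable are hypothetical names, not part of the extension. The callable is assumed to accept the parameters as keyword arguments and return the count/results object that the actions produce.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical type: any callable that runs one of the *_search actions
# and returns its {"count": ..., "results": [...]} object.
SearchFn = Callable[..., Dict[str, Any]]


def iter_search_results(
    search: SearchFn, q: str, rows: int = 50, **extra: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every matching entity dict, requesting `rows` results per call."""
    start = 0
    while True:
        page = search(q=q, rows=rows, start=start, **extra)
        results = page["results"]
        yield from results
        start += len(results)
        # Stop when a page comes back empty or everything has been seen.
        if not results or start >= page["count"]:
            break
```

Inside a CKAN process, one way to supply the callable would be a small wrapper around `toolkit.get_action("organization_search")` that passes the keyword arguments as the data dict.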

In all actions, the output also matches that of package_search: an object with a count key and a results key, which is a list of the corresponding entity dicts (i.e. the result of organization_show, user_show, etc.):

{
    "count": 2,
    "results": [
        <entity dict 1>,
        <entity dict 2>
    ]
}
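As a rough sketch, one of these actions could be called remotely through CKAN's standard Action API, which wraps every action result in a `success`/`result` envelope. The helper names (`build_request`, `unwrap`, `call_action`) and the site URL in the usage note are assumptions made for illustration. The request uses POST with a JSON body, since that works for any CKAN action.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request, urlopen


def build_request(
    site_url: str,
    action: str,
    data_dict: Dict[str, Any],
    api_token: Optional[str] = None,
) -> Request:
    """Build a POST request for a CKAN action (hypothetical helper)."""
    url = "{}/api/3/action/{}".format(site_url.rstrip("/"), action)
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Needed for restricted actions such as user_search (sysadmins only).
        headers["Authorization"] = api_token
    body = json.dumps(data_dict).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def unwrap(body: str) -> Dict[str, Any]:
    """Return the action result from the Action API envelope."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise RuntimeError("CKAN action failed: {}".format(envelope.get("error")))
    return envelope["result"]


def call_action(site_url: str, action: str, **params: Any) -> Dict[str, Any]:
    """Send the request and return the count/results object."""
    with urlopen(build_request(site_url, action, params)) as response:
        return unwrap(response.read().decode("utf-8"))
```

For example, `call_action("https://ckan.example.org", "organization_search", q="transport", rows=5)` would return the count/results object for matching organizations, assuming a CKAN site with the plugin enabled is running at that placeholder URL.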

Additionally, the plugin registers a site_search action that performs a search across all entities that the user is allowed to search, including datasets. Results are returned in an object containing one key for each entity type the user has permission to search. For instance, for a sysadmin user, who has access to all searches:

{
    "datasets": <results object>,
    "organizations": <results object>,
    "groups": <results object>,
    "users": <results object>,
    "pages": <results object>
}

For each item, the results object is the one described above (count and results keys).

Note that all parameters are passed unchanged to each of the search actions, so this site-wide search is mostly useful for free-text searches like q=flood.
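Since the set of keys depends on the caller's permissions, code consuming a site_search result should iterate over whatever keys are present rather than expect all five. A minimal sketch follows. The function names are hypothetical and the sample data in the usage note is made up.

```python
from typing import Any, Dict


def count_by_entity(site_results: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Map each entity type present in a site_search result to its hit count."""
    return {entity: block["count"] for entity, block in site_results.items()}


def total_hits(site_results: Dict[str, Dict[str, Any]]) -> int:
    """Total number of hits across every entity type the user could search."""
    return sum(count_by_entity(site_results).values())
```

With a made-up result such as `{"datasets": {"count": 2, "results": [...]}, "organizations": {"count": 1, "results": [...]}}`, `count_by_entity` gives one count per key and `total_hits` adds them up. A user without access to user_search would simply not have a "users" key to count.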

CLI

The plugin includes a ckan command to reindex in Solr the entities currently stored in the database:

ckan sitesearch rebuild <entity_type>

Where entity_type is one of organizations, groups, users or pages. You can also pass the id or name of a particular entity to index just that particular one:

ckan sitesearch rebuild organization department-of-transport

Check the command help for additional options:

ckan sitesearch rebuild --help
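To script the rebuild, for instance from a deployment task, the command line can be assembled in Python and passed to `subprocess`. The sketch below is an assumption-laden illustration: `rebuild_command` and `run_rebuild` are hypothetical helpers, and the optional `-c` argument is the general ckan CLI option for pointing at a config file, not something specific to this plugin.

```python
import subprocess
from typing import List, Optional


def rebuild_command(
    entity_type: str,
    entity_id: Optional[str] = None,
    config: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `ckan sitesearch rebuild` (hypothetical helper)."""
    cmd = ["ckan"]
    if config:
        # Global ckan CLI option selecting the config file to use.
        cmd += ["-c", config]
    cmd += ["sitesearch", "rebuild", entity_type]
    if entity_id:
        # Optional id or name restricting the reindex to a single entity.
        cmd.append(entity_id)
    return cmd


def run_rebuild(entity_type: str, entity_id: Optional[str] = None,
                config: Optional[str] = None) -> None:
    """Run the rebuild and raise CalledProcessError if the command fails."""
    subprocess.run(rebuild_command(entity_type, entity_id, config), check=True)
```

Passing the arguments as a list avoids shell quoting problems when entity names contain unusual characters.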

Installation

To install ckanext-sitesearch:

1. Activate your CKAN virtual environment, for example:

   . /usr/lib/ckan/default/bin/activate

2. Clone the source and install it on the virtualenv:

   git clone https://github.com/okfn/ckanext-sitesearch.git
   cd ckanext-sitesearch
   pip install -e .
   pip install -r requirements.txt

3. Add sitesearch to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).

4. Restart CKAN.
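A quick way to confirm that the plugin was added to the config file is to read the ckan.plugins setting back. The sketch below assumes the usual CKAN layout, where the setting lives in the `[app:main]` section as a whitespace-separated list. `enabled_plugins` is a hypothetical helper, not part of CKAN or this extension.

```python
import configparser
from typing import List


def enabled_plugins(ini_text: str, section: str = "app:main") -> List[str]:
    """Return the plugin names listed in ckan.plugins (hypothetical helper)."""
    # Interpolation is disabled so that %(...)s values elsewhere in the file,
    # such as logging format strings, cannot trigger interpolation errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "ckan.plugins", fallback="").split()


def check_config(path: str = "/etc/ckan/default/ckan.ini") -> bool:
    """True if sitesearch is listed in the config file at `path`."""
    with open(path, encoding="utf-8") as ini_file:
        return "sitesearch" in enabled_plugins(ini_file.read())
```

This only checks the file contents. CKAN still needs to be restarted before a newly added plugin is loaded.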

Config settings

None at present

Developer installation

To install ckanext-sitesearch for development, activate your CKAN virtualenv and do:

git clone https://github.com/okfn/ckanext-sitesearch.git
cd ckanext-sitesearch
python setup.py develop

Tests

To run the tests, do:

pytest --ckan-ini=test.ini

License

AGPL

Owner

Open Knowledge Foundation
