Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Related tags

Searchwagtail-clip
Overview

Wagtail CLIP

Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Intro Video

This project was inspired by, and draws heavily from memery, by deepfates et al. It makes use of OpenAI's CLIP model (it was very nice of them to open source it, cheers).

An example project is available here

Installation

You can install this package as follows (requires Git):

pip install \
    [email protected]+https://github.com/MattSegal/wagtail-clip.git \
    -f https://download.pytorch.org/whl/torch_stable.html \
    torch==1.7.1+cpu \
    torchvision==0.8.2+cpu

You will find that this installs ~200MB of deep learning libraries (PyTorch). You will also need to update your Django project's settings:

INSTALLED_APPS = [
    # ... whatever ...
    "wagtailclip",
]

# A place to store ~330MB of downloaded model parameters
WAGTAIL_CLIP_DOWNLOAD_PATH = "/clip"
# Maximum number of search results
WAGTAIL_CLIP_MAX_IMAGE_SEARCH_RESULTS = 256
# A unique name for the search backend.
WAGTAIL_CLIP_SEARCH_BACKEND_NAME = "clip"
# Recommended model, or your can roll your own (read the source).
WAGTAILIMAGES_IMAGE_MODEL = "wagtailclip.NaturalSearchImage"
# Add the search backend.
WAGTAILSEARCH_BACKENDS = {
    # ... whatever ...
    WAGTAIL_CLIP_SEARCH_BACKEND_NAME: {
        "BACKEND": "wagtailclip.search.CLIPSearchBackend",
    },
}

That's enough to get started, however if you want pre-download the ~330MB of model parameters, you can run this management command:

./manage.py download_clip

How it works

This package wraps the CLIP model. which can be used for:

  • encoding text into 1x512 float vectors
  • encoding images into 1x512 float vectors

These vectors can be thought of as points in a 512 dimensional space, where the closer two points are to each other, the more "related" they are. Importantly, CLIP encodes both text and images into the same space, meaning that we can:

  • encode all Wagtail images into vectors and store them in the database
  • encode a user's search query text into a vector; and then
  • compare the search query vector with all the image vectors

This comparison is done using a dot product to get a similarity score for each image. The operation is performed in Python. Once we have a similarity score we pick the top N (say, 256) most similar images and return those as the results.

Will this scale?

Haha probably not. I've tested my naive implementation on up to 2048 images and it runs OK (~3s / query). There are specialized Postgres extensions and vector similarity databases that you can use if you want to do this for tens of thousands of images.

Contributing

If you want to help out, make a pull request and/or email me at [email protected] or DM me on Twitter. Probably better to talk to me first before writing a bunch of code.

Owner
Matt Segal
Matt Segal
High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

elastic 3.6k Dec 30, 2022
TG-searcherBot - Search any channel/chat from keyword

TG-searcherBot Search any channel/chat from keyword. Commands /start - Starts th

TechiError 12 Nov 04, 2022
Pythonic Lucene - A simplified python impelementaiton of Apache Lucene

A simplified python impelementaiton of Apache Lucene, mabye helps to understand how an enterprise search engine really works.

Mahdi Sadeghzadeh Ghamsary 2 Sep 12, 2022
Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.

yagooglesearch - Yet another googlesearch Overview yagooglesearch is a Python library for executing intelligent, realistic-looking, and tunable Google

115 Dec 29, 2022
solrpy is a Python client for Solr

solrpy solrpy is a Python client for Solr, an enterprise search server built on top of Lucene. solrpy allows you to add documents to a Solr instance,

Jiho Persy Lee 37 Jul 22, 2021
基于RSSHUB阅读器实现的获取P站排行和P站搜图,使用时需使用代理

基于RSSHUB阅读器实现的获取P站排行和P站搜图

34 Dec 05, 2022
A play store search application programming interface ( API )

Play-Store-API A play store search application programming interface ( API ) Made with Python3

Fayas Noushad 8 Oct 21, 2022
Home for Elasticsearch examples available to everyone. It's a great way to get started.

Introduction This is a collection of examples to help you get familiar with the Elastic Stack. Each example folder includes a README with detailed ins

elastic 2.5k Jan 03, 2023
cve-search - a tool to perform local searches for known vulnerabilities

cve-search cve-search is a tool to import CVE (Common Vulnerabilities and Exposures) and CPE (Common Platform Enumeration) into a MongoDB to facilitat

cve-search 2k Jan 01, 2023
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

elastic 463 Dec 30, 2022
PwnWiki 数据库搜索命令行工具;该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目

PWSearch PwnWiki 数据库搜索命令行工具。该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目。

K4YT3X 72 Dec 20, 2022
PwnWiki Telegram database searching bot

pwtgbot PwnWiki Telegram database searching bot. Screenshots How it looks like in the terminal when running How it looks like in Telegram Run Directly

K4YT3X 3 Jan 25, 2022
An image inline search telegram bot.

Image-Search-Bot An image inline search telegram bot. Note: Use Telegram picture bot. That is better. Not recommending to deploy this bot. Made with P

Fayas Noushad 24 Oct 21, 2022
Reverse-ikea-image-search - A simple image of ikea search using jina.ai

IKEA Reverse Image Search This is a demo project to fetch ikea product images(IK

SOUVIK GHOSH 4 Mar 08, 2022
This is a Telegram Bot written in Python for searching data on Google Drive.

This is a Telegram Bot written in Python for searching data on Google Drive. Supports multiple Shared Drives (TDs). Manual Guide for deploying the bot

Levi 158 Dec 27, 2022
ForFinder is a search tool for folder and files

ForFinder is a search tool for folder and files. You can use that when you Source Code Analysis at your project's local files or other projects that you are download. Enter a root path and keyword to

Çağrı Aliş 7 Oct 25, 2022
🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

Kaneki 33 Dec 31, 2022
Free and Open, Distributed, RESTful Search Engine

Elasticsearch Elasticsearch is the distributed, RESTful search and analytics engine at the heart of the Elastic Stack. You can use Elasticsearch to st

elastic 62.4k Jan 08, 2023
User-friendly, tiny source code searcher written by pure Python.

User-friendly, tiny source code searcher written in pure Python. Example Usages Cat is equivalent in the regular expression as '^Cat$' bor class Cat

Furkan Onder 106 Nov 02, 2022
Inverted index creation and query search mechanism on Wikipedia pages.

WikiPedia Search Engine Step 1 : Installing Requirements Install "stemming" module for python using pip. Step 2 : Parsing the Data To parse the data,

Piyush Atri 1 Nov 27, 2021