Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Overview

FeatureHasher

Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Note, this requires Jina>=2.2.4.

Example

Here I use FeatureHasher to hash each sentence of Pride and Prejudice into a 128-dim vector, and then use .match to find top-K similar sentences.

from jina import Document, DocumentArray, Flow

# load 
   
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').convert_uri_to_text()

# cut into non-empty sentences store in a DA
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# use FeatureHasher in a Flow
f = Flow().add(uses='jinahub://FeatureHasher')

embed_da = DocumentArray()
with f:
    f.post('/', da, on_done=lambda req: embed_da.extend(req.docs), show_progress=True)

print('self-matching...')
embed_da.match(embed_da, exclude_self=True, limit=5, normalization=(1, 0))
print('total sentences: ', len(embed_da))
for d in embed_da:
    print(d.text)
    for m in d.matches:
        print(m.scores['cosine'], m.text)
    input()
           [email protected][I]:๐ŸŽ‰ Flow is ready to use!
	๐Ÿ”— Protocol: 		GRPC
	๐Ÿ  Local access:	0.0.0.0:52628
	๐Ÿ”’ Private network:	192.168.178.31:52628
	๐ŸŒ Public address:	217.70.138.123:52628
โ น       DONE โ”โ•ธโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 0:00:01 100% ETA: 0 seconds 40 steps done in 1 second
total sentences:  12153
๏ปฟThe Project Gutenberg eBook of Pride and Prejudice, by Jane Austen

   
     *** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

    
      *** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

     
       production, promotion and distribution of Project Gutenberg-tm

      
        Pride and Prejudice

       
         By Jane Austen This eBook is for the use of anyone anywhere in the United States and 
        
          This eBook is for the use of anyone anywhere in the United States and 
         
           by the awkwardness of the application, and at length wholly 
          
            Elizabeth passed the chief of the night in her sisterโ€™s room, and 
           
             the happiest memories in the world. Nothing of the past was 
            
              charities and charitable donations in all 50 states of the United 
            
           
          
         
        
       
      
     
    
   

In practice, you can implement matching and storing via an indexer inside Flow. This example is only for demo purpose so any non-feature hashing related ops are implemented outside the Flow to avoid distraction.

Owner
Jina AI
A Neural Search Company. We provide the cloud-native neural search solution powered by state-of-the-art AI technology.
Jina AI
Phishing-Crack tools to punish friends

Phishing-Crack Phishing Tool Version 1.0.0 Created By temirovazat A Phishing Tool With PHP and Python3 Features Fake Instagram Phishing Page Fake Face

3 Oct 04, 2022
Python implementation for PrintNightmare using standard Impacket.

PrintNightmare Python implementation for PrintNightmare (CVE-2021-1675 / CVE-2021-34527) using standard Impacket. Installtion $ pip3 install impacket

ollypwn 141 Dec 31, 2022
Polkit - Local Privilege Escalation (CVE-2021-3560)

CVE-2021-3560 Polkit - Local Privilege Escalation Original discovery by kevin_backhouse from GitHub Security Lab References https://github.blog/2021-0

Salman Asad 1 Nov 12, 2021
Volunteer & Campaign Management System

Cleansweep Requirements A Linux (or Mac OS X) node with the following software installed. Ubuntu 14.04 is preferred. PostgreSQL 9.3 database server Py

Aam Aadmi Party 39 May 24, 2022
Springboot directory scanning

Springboot directory scanning

WINEZERO 87 Dec 28, 2022
client attack remotely , this script was written for educational purposes only

client attack remotely , this script was written for educational purposes only, do not use against to any victim, which you do not have permission for it

9 Jun 05, 2022
An automated header extensive scanner for detecting log4j RCE CVE-2021-44228

log4j An automated header extensive scanner for detecting log4j RCE CVE-2021-44228 Usage $ python3 log4j.py -l urls.txt --dns-log REPLACE_THIS.dnslog.

2 Dec 16, 2021
An auxiliary tool for iot vulnerability hunter

firmeye - IoTๅ›บไปถๆผๆดžๆŒ–ๆŽ˜ๅทฅๅ…ท firmeye ๆ˜ฏไธ€ไธช IDA ๆ’ไปถ๏ผŒๅŸบไบŽๆ•ๆ„Ÿๅ‡ฝๆ•ฐๅ‚ๆ•ฐๅ›žๆบฏๆฅ่พ…ๅŠฉๆผๆดžๆŒ–ๆŽ˜ใ€‚ๆˆ‘ไปฌ็Ÿฅ้“๏ผŒๅœจๅ›บไปถๆผๆดžๆŒ–ๆŽ˜ไธญ๏ผŒไปŽๆ•ๆ„Ÿ/ๅฑ้™ฉๅ‡ฝๆ•ฐๅ‡บๅ‘๏ผŒๅฏปๆ‰พๅ…ถๅ‚ๆ•ฐๆฅๆบ๏ผŒๆ˜ฏไธ€็งๅพˆๆœ‰ๆ•ˆ็š„ๆผๆดžๆŒ–ๆŽ˜ๆ–นๆณ•๏ผŒไฝ†็จ‹ๅบไธญ่ฐƒ็”จๆ•ๆ„Ÿๅ‡ฝๆ•ฐ็š„ๅœฐๆ–น้žๅธธๅคš๏ผŒไบบๅทฅๅˆ†ๆž่€—ๆ—ถ่ดนๅŠ›๏ผŒ้€š่ฟ‡่ฏฅๆ’ไปถ๏ผŒๅฏไปฅๅธฎๅŠฉๆŽ’้™คๅคง้ƒจๅˆ†็š„ๅฎ‰ๅ…จ

Firmy Yang 171 Nov 28, 2022
Js File Scanner This is Js File Scanner

Js File Scanner This is Js File Scanner . Which are scan in js file and find juicy information Toke,Password Etc.

122 Dec 12, 2022
Jolokia Exploitation Toolkit (JET) helps exploitation of exposed jolokia endpoints.

jolokia-exploitation-toolkit Jolokia Exploitation Toolkit (JET) helps exploitation of exposed jolokia endpoints. Core concept Jolokia is a protocol br

Laluka 194 Jan 01, 2023
Exploit tool for Adminer 1.0 up to 4.6.2 Arbitrary File Read vulnerability

AdminerRead Exploit tool for Adminer 1.0 up to 4.6.2 Arbitrary File Read vulnerability Installation git clone https://github.com/p0dalirius/AdminerRea

Podalirius 58 Dec 05, 2022
RedlineSpam - Python tool to spam Redline Infostealer panels with legit looking data

RedlineSpam Python tool to spam Redline Infostealer panels with legit looking da

4 Jan 27, 2022
Python3 script for scanning CVE-2021-44228 (Log4shell) vulnerable machines.

Log4j_checker.py (CVE-2021-44228) Description This Python3 script tries to look for servers vulnerable to CVE-2021-44228, also known as Log4Shell, a v

lfama 8 Feb 27, 2022
A simple multi-threaded distributed SSH brute-forcing tool written in Python.

OrbitalDump A simple multi-threaded distributed SSH brute-forcing tool written in Python. How it Works When the script is executed without the --proxi

K4YT3X 408 Jan 03, 2023
this keylogger is only for pc not for android but it will only work on those pc who have python installed it is made for all linux,windows and macos

Keylogger this keylogger is only for pc not for android but it will only work on those pc who have python installed it is made for all linux,windows a

Titan_Exodous 1 Nov 04, 2021
Omega - From Wordpress admin to pty

The Linux tool to automate the process of getting a pty once you got admin credentials in a Wordpress site. Keep in mind that right now Omega only can attack Linux hosts.

รngel Heredia 12 Nov 09, 2022
A python module for retrieving and parsing WHOIS data

pythonwhois A WHOIS retrieval and parsing library for Python. Dependencies None! All you need is the Python standard library. Instructions The manual

Sven Slootweg 384 Dec 23, 2022
Genpyteal - Experiment to rewrite Python into PyTeal using RedBaron

genpyteal Converts Python to PyTeal. Your mileage will vary depending on how muc

Jason Livesay 9 Oct 19, 2022
A malware to encrypt all the .txt and .jpg files in target computer using RSA algorithms

A malware to encrypt all the .txt and .jpg files in target computer using RSA algorithms. Change the Blackgound image of targets' computer. and decrypt the targets' encrypted files in our own compute

Li Ka Lok 2 Dec 02, 2022
On the 11/11/21 the apache 2.4.49-2.4.50 remote command execution POC has been published online and this is a loader so that you can mass exploit servers using this.

ApacheRCE ApacheRCE is a small little python script that will allow you to input the apache version 2.4.49-2.4.50 and then input a list of ip addresse

3 Dec 04, 2022