kawadi is a versatile tool that used as a form of weapon and is used to cut, shape and split wood.

Overview

kawadi

Build Status Code Coverage kawadi

kawadi (કવાડિ in Gujarati) (Axe in English) is a versatile tool that used as a form of weapon and is used to cut, shape and split wood.

kawadi is collection of small tools that I found useful for me more often. Currently it contains text search which searches a string inside another string.

Text Search

Text search in kawadi uses sliding window technique to search for a word or phrase in another text. The step size in the sliding window is 1 and the window size is the size of the word/phrase we are interested in.

For example, if the text we are interested in searching is "The big brown fox jumped over the lazy dog" and the word that we want to search is "brown fox".

len(["brown", ["fox"]]) slides = sliding_window(text, window_size) -> ['The', 'big']['big', 'brown']['brown', 'fox']['fox', 'jumped']['jumped', 'over']['over', 'the']['the', 'lazy']['lazy', 'dog'] for each slide in slides score(" ".join(slide), interested_word) if score >= threshold then select slide else continue ">
text = "The big brown fox jumped over the lazy dog"
interested_word = "brown fox"
window_size = len(interested.split()) -> len(["brown", ["fox"]])

slides = sliding_window(text, window_size) -> ['The', 'big']['big', 'brown']['brown', 'fox']['fox', 'jumped']['jumped', 'over']['over', 'the']['the', 'lazy']['lazy', 'dog']

for each slide in slides
  score(" ".join(slide), interested_word)
  if score >= threshold then
    select slide
  else
    continue

Currently, there are 3 similarity scores are calculated and averaged to calculate the final score. These similarity scores are Cosine, JaroWinkler and Normalized Levinstine similarities.

In development

  • Add functionality to accept custom user similarity metrics.
  • [] Generate documentation.
  • [] Write the custom counter

You can follow the project development in the Projects tab.

Quick Start

from kawadi.text_search import SearchInText

search = SearchInText()

text_to_find = "String distance algorithm"
text_to_search = """SIFT4 is a general purpose string distance algorithm inspired by JaroWinkler and Longest Common Subsequence. It was developed to produce a distance measure that matches as close as possible to the human perception of string distance. Hence it takes into account elements like character substitution, character distance, longest common subsequence etc. It was developed using experimental testing, and without theoretical background."""

result = search.find_in_text(text_to_find, text_to_search)

print(result)
[
    {
        "sim_score": 1.0,
        "searched_text": "string distance algorithm",
        "to_find": "string distance algorithm",
        "start": 27,
        "end": 52,
    }
]

If the text that needs to be searched is big, SearchInText can utilize multiprocessing to make the search fast.

from kawadi.text_search import SearchInText

search = SearchInText(multiprocessing=True, max_workers=8)

Custom user defined score calculation.

Its often the case that the provided string similarity score is not enough for the use case that you may have. For this very case, you can add, your own score calculation.

from kawadi.text_search import SearchInText


def my_custom_fun(**kwargs):

  slide_of_text:str = kwargs["slide_of_text"]
  text_to_find:str = kwargs["text_to_find"]

  # Here you can then go on to do preprocessing if you like,
  # or use them to count char based n-gram string matching scores.

  return score: float

search = SearchInText(search_threshold=0.9, custom_score_func= your custom func)

This custom score function will have access to two things slide_of_text for every slide in text (From the example above, "The big", "big brown" and so on...) and text_to_find.

Note: The return type of this custom function should be same as the type of search_threshold as you can see from the above example.

Installation

Stable Release: pip install kawadi
Development Head: pip install git+https://github.com/jdvala/kawadi.git

Development

See CONTRIBUTING.md for information related to developing the code.

Free software: MIT license

You might also like...
A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis.

Jupyter Pylinter A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis. If you find this tool us

A python tool give n number of inputs and parallelly you will get a output by separetely

http-status-finder Hello Everyone!! This is kavisurya, In this tool you can give n number of inputs and parallelly you will get a output by separetely

NetConfParser is a tool that helps you analyze the rpcs coming and going from a netconf client to a server

NetConfParser is a tool that helps you analyze the rpcs coming and going from a netconf client to a server

Basic loader is a small tool that will help you generating Cloudflare cookies

Basic Loader Cloudflare cookies loader This tool may help some people getting valide cloudflare cookies Installation 🔌 : pip install -r requirements.

Python tool to check a web applications compliance with OWASP HTTP response headers best practices

Check Your Head A quick and easy way to check a web applications response headers!

A tool for testing improper put method vulnerability
A tool for testing improper put method vulnerability

Putter-CUP A tool for testing improper put method vulnerability Usage :- python3 put.py -f live-subs.txt Result :- The result in txt file "result.txt"

Tool for generating Memory.scan() compatible instruction search patterns

scanpat Tool for generating Frida Memory.scan() compatible instruction search patterns. Powered by r2. Examples $ ./scanpat.py arm.ks:64 'sub sp, sp,

Stubmaker is an easy-to-use tool for generating python stubs.

Stubmaker is an easy-to-use tool for generating python stubs. Requirements Stubmaker is to be run under Python 3.7.4+ No side effects during

PyHook is an offensive API hooking tool written in python designed to catch various credentials within the API call.
PyHook is an offensive API hooking tool written in python designed to catch various credentials within the API call.

PyHook is the python implementation of my SharpHook project, It uses various API hooks in order to give us the desired credentials. PyHook Uses

Comments
  • :sparkles: Accept custom user defined score functions

    :sparkles: Accept custom user defined score functions

    This pull request adds feature to accept user defined custom score function for calculating string similarity.

    Pull request recommendations:

    • [x] Name your pull request your-development-type/short-description. Ex: feature/read-tiff-files
    • [ ] Link to any relevant issue in the PR description. Ex: Resolves [gh-12], adds tiff file format support
    • [x] Provide context of changes.
    • [x] Provide relevant tests for your feature or bug fix.
    • [x] Provide or update documentation for any feature added by your pull request.

    Thanks for contributing!

    opened by jdvala 1
Releases(0.0.2)
  • 0.0.2(Oct 31, 2021)

    What's Changed

    • :memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1
    • :memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2
    • :memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3
    • :sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4
    • :memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

    New Contributors

    • @jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

    Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.2

    Source code(tar.gz)
    Source code(zip)
  • 0.0.1(Oct 31, 2021)

    What's Changed

    • :memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1
    • :memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2
    • :memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3
    • :sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4
    • :memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

    New Contributors

    • @jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

    Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.1

    Source code(tar.gz)
    Source code(zip)
Owner
Jay Vala
Data Scientist at scoutbee
Jay Vala
A pythonic dependency injection library.

Pinject Pinject is a dependency injection library for python. The primary goal of Pinject is to help you assemble objects into graphs in an easy, main

Google 1.3k Dec 30, 2022
python package for generating typescript grpc-web stubs from protobuf files.

grpc-web-proto-compile NOTE: This package has been superseded by romnn/proto-compile, which provides the same functionality but offers a lot more flex

Roman Dahm 0 Sep 05, 2021
Deep Difference and search of any Python object/data.

DeepDiff v 5.6.0 DeepDiff Overview DeepDiff: Deep Difference of dictionaries, iterables, strings and other objects. It will recursively look for all t

Sep Dehpour 1.6k Jan 08, 2023
A program will generate a eth key pair that has the public key that starts with a defined amount of 0

ETHAdressGenerator This short program will generate a eth key pair that has the public key that starts with a defined amount of 0 Requirements Python

3 Nov 19, 2021
✨ Un générateur de lien raccourcis en fonction d'un lien totalement fait en Python par moi, et en français.

Shorter Link ❗ Un générateur de lien raccourcis en fonction d'un lien totalement fait en Python par moi, et en français. Dépendences : pip install pys

MrGabin 3 Jun 06, 2021
ColorController is a Pythonic interface for managing colors by english-language name and various color values.

ColorController.py Table of Contents Encode color data in various formats. 1.1: Create a ColorController object using a familiar, english-language col

Tal Zaken 2 Feb 12, 2022
Just some scripts to export vector tiles to geojson.

Vector tiles to GeoJSON Nowadays modern web maps are usually based on vector tiles. The great thing about vector tiles is, that they are not just imag

Lilith Wittmann 77 Jul 26, 2022
Dice Rolling Simulator using Python-random

Dice Rolling Simulator As the name of the program suggests, we will be imitating a rolling dice. This is one of the interesting python projects and wi

PyLaboratory 1 Feb 02, 2022
Monte Carlo simulation of 3G rules

mc3g Monte Carlo simulation of 3G rules This project contains the Python code to do simulations of events according to the 3G rule (in German: "Geimpf

Jan Christoph Terasa 4 Nov 01, 2021
A multipurpose python module

pysherlock pysherlock is a Python library for dealing with web scraping using images, it's a Python application of the rendertron headless browser API

Sachit 2 Nov 11, 2021
Produce a simulate-able SDF of an arbitrary mesh with convex decomposition.

Mesh-to-SDF converter Given a (potentially nasty, nonconvex) mesh, automatically creates an SDF file that describes that object. The visual geometry i

Greg Izatt 22 Nov 23, 2022
✨ Un générateur de mot de passe aléatoire totalement fait en Python par moi, et en français.

Password Generator ❗ Un générateur de mot de passe aléatoire totalement fait en Python par moi, et en français. 🔮 Grâce a une au module random et str

MrGabin 3 Jul 29, 2021
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

Seth Morton 712 Dec 23, 2022
Toolkit for collecting and applying templates of prompting instances

PromptSource Toolkit for collecting and applying templates of prompting instances. WIP Setup Download the repo Navigate to root directory of the repo

BigScience Workshop 1k Jan 05, 2023
We provide useful util functions. When adding a util function, please add a description of the util function.

Utils Collection Motivation When we implement codes, we often search for util functions that are already implemented. Here, we are going to share util

6 Sep 09, 2021
Collection of code auto-generation utility scripts for the Horizon `Boot` system module

boot-scripts This is a collection of code auto-generation utility scripts for the Horizon Boot system module, intended for use in Atmosphère. Usage Us

4 Oct 11, 2022
Script to generate a massive volume of data in sql, csv, json or xml format

DataGenerator Made with Python Open for pull requests 1. Dependencies To install required dependencies run pip install -r requirements.txt 2. Executi

icrescenti 3 Sep 20, 2022
A tool written in python to generate basic repo files from github

A tool written in python to generate basic repo files from github

Riley 7 Dec 02, 2021
Python Random Number Genrator

This Genrates Random Numbers. This Random Number Generator was made using python. This software uses Time and Random extension. Download the EXE file and run it to get your answer.

Krish Sethi 2 Feb 03, 2022
Simple tool for creating changelogs

Description Simple utility for quickly generating changelogs, assuming your commits are ordered as they should be. This tool will simply log all lates

2 Jan 05, 2022