Measure file similarity in a many-to-many fashion

Overview

Mesi

Lint and Test codecov PyPI PyPI - Downloads License


Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The output can be useful in determining which of a collection of files are the most similar to each other.

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

lab-one
├── StudentOne
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── StudentTwo
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each student's deliverables/python_program.py file, run the command:

mesi lab-one/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi implements all the algorithms provided by TextDistance. In general levenshtein is never a bad choice, which is why it is the default.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi uses two primary dependencies for text similarity calculation: polyleven, and TextDistance. Polyleven is the default, as its singular implementation of Levenshtein distance can be faster in most situations. However, if a different edit distance algorithm is requested, TextDistance's implementations will be used.

License

Distributed under the terms of the GPL v3 license, mesi is free and open source software.

You might also like...
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file with several blocks.

A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

This program can help you to move and rename many files at once
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

A platform independent file lock for Python
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Releases(v1.1.0)
  • v1.1.0(Dec 8, 2021)

    This release adds an --average option, which prints the average distance computed against the given files after the individual comparison distances. An unimplemented --distribution option, intended for use in the future, was also added.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Oct 26, 2021)

    This release adds more documentation, a --version option, and removes the bright white table styling from previous versions that were hard to read on light-themed terminals.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.1...v1.0.2

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Oct 26, 2021)

    This release fixes a bug where in some situations, some distinct parts of compared file names were not displayed. Additionally, a new option, --table-format, was added. This allows configuration of the output table format, using tabulate's table formats.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.0...v1.0.1

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Oct 25, 2021)

    Mesi is a tool to check the similarity between many different files; this is its first release, with support for many basic features and lots of text similarity/distance algorithms. You can even try out Mesi without downloading it!

    pipx run mesi --help
    

    If you don't have pipx installed, you may need to install it, but that's something you should do anyways. :smile:

    Source code(tar.gz)
    Source code(zip)
Owner
GatorEducator
Software tools developed at Allegheny College for computer science courses
GatorEducator
Organizer is a python program that organizes your downloads folder

Organizer Organizer is a python program that organizes your downloads folder, it can run as a service and so will start along with the system, and the

Gustavo 2 Oct 18, 2021
Kartothek - a Python library to manage large amounts of tabular data in a blob store

Kartothek - a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store

15 Dec 25, 2022
FileGenerator - File Generator for sites that accepts documents

File Generator for sites that accepts documents This code generates files as per

Shaunak 2 Mar 19, 2022
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".

the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If

ActiveState Software 948 Dec 31, 2022
Get Your TXT File Length !.

TXTLen Get Your TXT File Length !. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Learning CSh

Alireza Hasanzadeh 1 Jan 06, 2022
Ini adalah program python untuk mengubah background foto dalam 1 folder, tidak perlu satu satu

Myherokuapp my web drive You can see my web drive and can request film/Application do you want in here my blog you can visit my blog RemBg ini adalah

XnuxersXploitXen 13 Dec 01, 2022
csv2ir is a script to convert ir .csv files to .ir files for the flipper.

csv2ir csv2ir is a script to convert ir .csv files to .ir files for the flipper. For a repo of .ir files, please see https://github.com/logickworkshop

Alex 38 Dec 31, 2022
Various technical documentation, in electronically parseable format

a-pile-of-documentation Various technical documentation, in electronically parseable format. You will need Python 3 to run the scripts and programs in

Jonathan Campbell 2 Nov 20, 2022
This project is a set of programs that I use to create a README.md file.

🤖 codex-readme 📜 codex-readme What is it? This project is a set of programs that I use to create a README.md file. How does it work? It reads progra

Tom Dörr 224 Jan 07, 2023
pytiff is a lightweight library for reading chunks from a tiff file

pytiff is a lightweight library for reading chunks from a tiff file. While it supports other formats to some extend, it is focused on reading tiled greyscale/rgb images, that can also be bigtiffs. Wr

Big Data Analytics group 9 Mar 21, 2022
Utils for streaming large files (S3, HDFS, gzip, bz2...)

smart_open — utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor

RARE Technologies 2.7k Jan 06, 2023
Media file renamer and organizion tool

mnamer mnamer (media renamer) is an intelligent and highly configurable media organization utility. It parses media filenames for metadata, searches t

Jessy Williams 533 Dec 29, 2022
Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule.

simaster.ics Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule. Usage Getting the events.json file from SIMASTER O

Faiz Jazadi 8 Nov 02, 2022
PyDeleter - delete a specifically formatted file in a directory or delete all other files

PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down

Amirabbas Motamedi 1 Jan 30, 2022
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

João Assalim 1 Oct 10, 2022
Some-tasks - Files for some of the tasks for the group sessions

Files for some of the tasks for the group sessions Here you can find some of the

<a href=[email protected] Computer Networks"> 0 Aug 25, 2022
A simple file sharing tool written in python

Share it A simple file sharing tool written in python Installation If you are using Windows os you can directly Run .exe file -- download If you are

Sachit Yadav 7 Dec 16, 2022
fast change directory with python and ruby

fcdir fast change directory with python and ruby run run python script , chose drirectoy and change your directory need you need python and ruby deskt

XCO 2 Jun 20, 2022
Yadl - it is a simple library for working with both dotenv files and environment variables.

Yadl Yadl - it is a simple library for working with both dotenv files and environment variables. Features Validation of whitespaces. Validation of num

Ivan Kapranov 3 Oct 19, 2021
This simple python script pcopy reads a list of file names and copies them to a separate folder

pCopy This simple python script pcopy reads a list of file names and copies them to a separate folder. Pre-requisites Python 3 (ver. 3.6) How to use

Madhuranga Rathnayake 0 Sep 03, 2021