Semantic similarity computation with different state-of-the-art metrics

Last update: Jun 22, 2022

Description

TaxoSS is a Python library for semantic similarity that implements several state-of-the-art metrics, including Resnik, Jiang-Conrath (JCN), and HSS.

Requirements

Python 3.6 or later
NLTK
NumPy
Pandas

Installation

TaxoSS can be installed through pip (the Python package manager) in the following way:

pip install taxoss

Usage

Semantic similarity functions

Compute the semantic similarity between two words as follows:

from TaxoSS.functions import semantic_similarity
semantic_similarity('brother', 'sister', 'hss')

3.353513521371089

The function semantic_similarity(word1, word2, kind, ic) accepts the following values for the argument kind:

hss -> HSS (default)
wup -> WUP (Wu-Palmer)
lcs -> LC (Leacock-Chodorow)
path_sim -> Shortest Path
resnik -> Resnik
jcn -> Jiang-Conrath
lin -> Lin
seco -> Seco

For the argument ic see the following section.
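To compare several metrics on the same word pair, one option is to loop over the kind values listed above. The following is a minimal sketch, not part of TaxoSS: the helper compare_metrics and the constant KINDS are hypothetical names, and the similarity function is passed in as a parameter so the helper does not depend on how the library is installed.

```python
# The kind values listed in this README, in the same order.
KINDS = ("hss", "wup", "lcs", "path_sim", "resnik", "jcn", "lin", "seco")


def compare_metrics(word1, word2, similarity_fn, kinds=KINDS):
    """Return {kind: score} by calling similarity_fn(word1, word2, kind) per kind.

    compare_metrics is a hypothetical helper for illustration only.
    similarity_fn is expected to behave like TaxoSS's semantic_similarity.
    """
    return {kind: similarity_fn(word1, word2, kind) for kind in kinds}


# Usage with TaxoSS installed (left as comments so the sketch runs without it):
# from TaxoSS.functions import semantic_similarity
# for kind, score in compare_metrics("brother", "sister", semantic_similarity).items():
#     print(kind, score)
```

According to the benchmark in this README, the metrics based on Information Content (resnik, jcn, lin) take roughly half a second per pair, so a loop over all kinds is dominated by those three.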

Information Content

Using a Wikipedia corpus to calculate the Information Content (the default for the argument ic):

from TaxoSS.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')

6.169410755220327

Calculating Information Content from a given corpus:

from TaxoSS.calculate_IC import calculate_IC
from TaxoSS.functions import semantic_similarity

calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)

Here path_to_save_IC_file is a path inside the TaxoSS package directory of your virtual environment, e.g. venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv.
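For readers new to Information Content, the sketch below illustrates the textbook definition on a toy example: IC(c) = -log p(c), where p(c) is the probability of meeting concept c or any of its descendants in the corpus, and the Resnik similarity of two concepts is the highest IC among their common ancestors. This is not the TaxoSS implementation. The taxonomy, the counts, the function names, and the choice of the natural logarithm are all assumptions made for illustration.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "cat": "mammal",
    "dog": "mammal",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"cat": 30, "dog": 50, "bird": 20}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content():
    """Return {concept: -ln p(concept)}, counting descendants' occurrences too."""
    cumulative = {concept: 0 for concept in PARENT}
    for concept, count in COUNTS.items():
        for node in ancestors(concept):
            cumulative[node] += count
    total = sum(COUNTS.values())
    return {c: -math.log(n / total) for c, n in cumulative.items() if n > 0}


def resnik(c1, c2, ic):
    """Resnik similarity: the highest IC among the common ancestors."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic[c] for c in common)


ic = information_content()
print(resnik("cat", "dog", ic))   # IC of "mammal", i.e. -ln(0.8)
print(resnik("cat", "bird", ic))  # IC of the root "animal", which is zero
```

The example shows why the corpus matters: changing the counts changes p(c), and therefore every IC-based score (resnik, jcn, lin), without touching the taxonomy.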

Benchmark

	HSS (ours)	HSS (ours)	WUP	WUP	LC	LC	Shortest Path	Shortest Path	Resnik	Resnik	Jiang-Conrath	Jiang-Conrath	Lin	Lin	Seco	Seco
	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman	Pearson	Spearman
MEN	0.41	0.33	0.36	0.33	0.14	0.05	0.07	0.03	0.05	0.03	-0.05	-0.04	0.05	0.04	-0.01	0.03
MC30	0.74	0.69	0.74	0.73	0.33	0.21	0.22	0.3	0.13	0.03	-0.06	-0.01	0.05	0.01	0.13	-0.09
WSS	0.68	0.65	0.58	0.59	0.36	0.23	0.16	0.1	0.02	-0.03	0.04	0.06	0.03	0.06	-0.01	-0.04
Simlex999	0.4	0.38	0.45	0.43	0.26	0.15	0.2	0.16	-0.04	-0.04	0.12	0.14	0.12	0.14	-0.02	-0.08
MT287	0.46	0.31	0.4	0.28	0.26	0.12	0.11	0.11	0.03	0.04	0.18	0.16	0.22	0.17	0	-0.06
MT771	0.44	0.4	0.43	0.49	0.06	0.02	0.1	0.13	0	-0.01	0	0	0	0	-0.05	-0.03
Time per pair (s)	0.0007	0.0007	0.008	0.008	0.0055	0.0055	0.0064	0.0064	0.5586	0.5586	0.551	0.551	0.5866	0.5866	0.0013	0.0013
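The Pearson and Spearman columns above are correlation coefficients, presumably between each metric's scores and the human similarity ratings shipped with each dataset. The sketch below shows how such an evaluation can be computed in plain Python. It is an illustration under that assumption, not the script used to produce the table, and the function names and sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: human ratings and metric scores for four word pairs.
human = [1.0, 2.0, 3.0, 4.0]
metric = [1.0, 4.0, 9.0, 16.0]
print(pearson(human, metric), spearman(human, metric))
```

Pearson measures linear agreement, while Spearman only checks that both lists rank the pairs in the same order, which is why a metric can score differently in the two columns.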
