SimCSE在中文任务上的简单实验

Related tags

MiscellaneousSimCSE
Overview

SimCSE 中文测试

SimCSE在常见中文数据集上的测试,包含ATECBQLCQMCPAWSXSTS-B共5个任务。

介绍

文件

- utils.py  工具函数
- eval.py  评测主文件

评测

命令格式:

python eval.py [model_type] [pooling] [task_name] [dropout_rate]

使用例子:

python eval.py BERT cls ATEC 0.3

其中四个参数必须传入,含义分别如下:

- model_type: 模型,必须是['BERT', 'RoBERTa', 'WoBERT', 'RoFormer', 'BERT-large', 'RoBERTa-large', 'SimBERT', 'SimBERT-tiny', 'SimBERT-small']之一;
- pooling: 池化方式,必须是['first-last-avg', 'last-avg', 'cls', 'pooler']之一;
- task_name: 评测数据集,必须是['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B']之一;
- dropout_rate: 浮点数,dropout的比例,如果为0则不dropout;

环境

测试环境:tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5,如果在其他环境组合下报错,请根据错误信息自行调整代码。

下载

Google官方的两个BERT模型:

关于语义相似度数据集,可以从数据集对应的链接自行下载,也可以从作者提供的百度云链接下载。

其中senteval_cn目录是评测数据集汇总,senteval_cn.zip是senteval目录的打包,两者下其一就好。

相关

交流

QQ交流群:808623966,微信群请加机器人微信号spaces_ac_cn

Owner
苏剑林(Jianlin Su)
科学爱好者
苏剑林(Jianlin Su)
Identify and annotate mutations from genome editing assays.

CRISPR-detector Here we propose our CRISPR-detector to facilitate the CRISPR-edited amplicon and whole genome sequencing data analysis, with functions

hlcas 2 Feb 20, 2022
The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of the first few terms.

Blancmange-curve The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of th

Shankar Mahadevan L 1 Nov 30, 2021
Python programming language Test

Exercise You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirem

Monirul Islam Khan 1 Dec 13, 2021
Python library to decorate and beautify strings

outputformater Python library to decorate and beautify your standard output 💖 I

Felipe Delestro Matos 259 Dec 13, 2022
Find all solutions to SUBSET-SUM, including negative, positive, and repeating numbers

subsetsum The subsetsum Python module can enumerate all combinations within a list of integers which sums to a specific value. It works for both negat

Trevor Phillips 9 May 27, 2022
This python module allows to extract data from the RAW-file-format produces by devices from Thermo Fisher Scientific.

fisher_py This Python module allows access to Thermo Orbitrap raw mass spectrometer files. Using this library makes it possible to automate the analys

8 Oct 14, 2022
WGGCommute - Adding Commute Times to WG-Gesucht Listings

WGGCommute - Adding Commute Times to WG-Gesucht Listings This is a barebones implementation of a chrome extension that can be used to add commute time

Jannis 2 Jul 20, 2022
Fused multiply-add (with a single rounding) for Python.

pyfma Fused multiply-add for Python. Fused multiply-add computes (x*y) + z with a single rounding. Useful for dot products, matrix multiplications, po

Nico Schlömer 18 Nov 08, 2022
Extended functionality for Namebase past their web UI

Namebase Extended Extended functionality for Namebase past their web UI.

RunDavidMC 12 Sep 02, 2022
Repository to store sample python programs for python learning

py Repository to store sample Python programs. This repository is meant for beginners to assist them in their learning of Python. The repository cover

codebasics 5.8k Dec 30, 2022
Fithub is a website application for athletes and fitness enthusiasts of all ages and experience levels.

Fithub is a website application for athletes and fitness enthusiasts of all ages and experience levels. Our website allows users to easily search, filter, and sort our comprehensive database of over

Andrew Wu 1 Dec 13, 2021
通过简单的卷积神经网络直接预测出验证码图片中滑块的位置

使用说明 1. 在本地测试 运行python3 prdict_one.py即可,默认需要预测的图片路径位于testImg文件夹下的test1.png 运行python3 predict_folder.py预测testImg下的所有图片 2. 部署到服务器 运行python3 run_a_server

12 Mar 08, 2022
p5 is a Python package based on the core ideas of Processing.

p5 p5 is a Python library that provides high level drawing functionality to help you quickly create simulations and interactive art using Python. It c

p5py 645 Jan 04, 2023
A compilation of useful scripts to automate common tasks

Scripts-To-Automate-This A compilation of useful scripts for common tasks Name What it does Type Add file extensions Adds ".png" to a list of file nam

0 Nov 05, 2021
A slapdash script to solve Wordle or Absurdle automatically

A slapdash script to solve Wordle or Absurdle automatically

Michael Anthony 1 Jan 19, 2022
Un Assistente Vocale scritto in Python e altamente personalizzabile

Un Assistente Vocale scritto in Python e altamente personalizzabile

Marco 2 May 06, 2022
NeurIPS'19: Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting (Pytorch implementation for noisy labels).

Meta-Weight-Net NeurIPS'19: Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting (Official Pytorch implementation for noisy labels). The

243 Jan 03, 2023
A script to add issues to a project in Github based on label or status.

Add Github Issues to Project (Beta) A python script to move Github issues to a next-gen (beta) Github Project Getting Started These instructions will

Kate Donaldson 3 Jan 16, 2022
A performant state estimator for power system

A state estimator for power system. Turbocharged with sparse matrix support, JIT, SIMD and improved ordering.

9 Dec 12, 2022
a package that provides a marketstrategy for whitelisting on golem

filterms a package that provides a marketstrategy for whitelisting on golem watching requestor logs distribute 10 tasks asynchronously is fun. but you

KJM 3 Aug 03, 2022