SimCSE在中文任务上的简单实验

Related tags

MiscellaneousSimCSE
Overview

SimCSE 中文测试

SimCSE在常见中文数据集上的测试,包含ATECBQLCQMCPAWSXSTS-B共5个任务。

介绍

文件

- utils.py  工具函数
- eval.py  评测主文件

评测

命令格式:

python eval.py [model_type] [pooling] [task_name] [dropout_rate]

使用例子:

python eval.py BERT cls ATEC 0.3

其中四个参数必须传入,含义分别如下:

- model_type: 模型,必须是['BERT', 'RoBERTa', 'WoBERT', 'RoFormer', 'BERT-large', 'RoBERTa-large', 'SimBERT', 'SimBERT-tiny', 'SimBERT-small']之一;
- pooling: 池化方式,必须是['first-last-avg', 'last-avg', 'cls', 'pooler']之一;
- task_name: 评测数据集,必须是['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B']之一;
- dropout_rate: 浮点数,dropout的比例,如果为0则不dropout;

环境

测试环境:tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5,如果在其他环境组合下报错,请根据错误信息自行调整代码。

下载

Google官方的两个BERT模型:

关于语义相似度数据集,可以从数据集对应的链接自行下载,也可以从作者提供的百度云链接下载。

其中senteval_cn目录是评测数据集汇总,senteval_cn.zip是senteval目录的打包,两者下其一就好。

相关

交流

QQ交流群:808623966,微信群请加机器人微信号spaces_ac_cn

Owner
苏剑林(Jianlin Su)
科学爱好者
苏剑林(Jianlin Su)
VCC-Generator is a python script that generate VCC for testing purposes only

VCC-Generator is a python script that generate VCC for testing purposes only

Spider Anongreyhat 10 Oct 23, 2022
Sigma coding youtube - This is a collection of all the code that can be found on my YouTube channel Sigma Coding.

Sigma Coding Tutorials & Resources YouTube • Facebook Support Sigma Coding Patreon • GitHub Sponsor • Shop Amazon Table of Contents Overview Topics Re

Alex Reed 927 Jan 08, 2023
Генератор отчетов на Python с использованием библиотеки docx для работы с word-файлами и запросов к сервису

Генератор отчетов на Python с использованием библиотеки docx для работы с word-файлами и запросов к сервису

Semyon Esaev 2 Jun 24, 2022
Intelligent Employer Profiling Platform.

Intelligent Employer Profiling Platform Setup Instructions Generating Model Data Ensure that Python 3.9+ and pip is installed. Install project depende

Harvey Donnelly 2 Jan 09, 2022
Dungeon Dice Rolls is an aplication that the user can roll dices (d4, d6, d8, d10, d12, d20 and d100) and store the results in one of the 6 arrays.

Dungeon Dice Rolls is an aplication that the user can roll dices (d4, d6, d8, d10, d12, d20 and d100) and store the results in one of the 6 arrays.

Bracero 1 Dec 31, 2021
Scripts for hosting urbit in production-ish

Urbit Sysops Contains some helpful scripts for hosting Urbit. There are two variants included in this repo: one using docker, and one using plain syst

Jōshin 12 Sep 25, 2022
Rock-paper-scissors basic game in terminal with Python

piedra-papel-tijera Juego básico de piedra, papel o tijera en terminal con Python. El juego incluye: Nombre de jugador Número de veces a jugar Resulta

Isaías Flores 1 Dec 14, 2021
Appointment Tracker that allows user to input client information and update if needed.

Appointment-Tracker Appointment Tracker allows an assigned admin to input client information regarding their appointment and their appointment time. T

IS Coding @ KSU 1 Nov 30, 2021
Heads Down Application for Mac OSX

Heads Down A Mac app that lives in your ribbon—with a click of the mouse, temporarily block distracting websites and applications to encourage "heads

20 Mar 10, 2021
Python3 Interface to numa Linux library

py-libnuma is python3 interface to numa Linux library so that you can set task affinity and memory affinity in python level for your process which can help you to improve your code's performence.

Dalong 13 Nov 10, 2022
This is a practice on Airflow, which is building virtual env, installing Airflow and constructing data pipeline (DAGs)

airflow-test This is a practice on Airflow, which is Builing virtualbox env and setting Airflow on that env Installing Airflow using python virtual en

Jaeyoung 1 Nov 01, 2021
A reproduction repo for a Scheduling bug in AirFlow 2.2.3

A reproduction repo for a Scheduling bug in AirFlow 2.2.3

Ilya Strelnikov 1 Feb 09, 2022
Rufus port to linux, writed on Python3

Rufus-for-Linux Rufus port to linux, writed on Python3 Программа будет иметь тот же интерфейс что и оригинал, и тот же функционал. Программа создается

6 Jan 07, 2022
Class and mathematical functions for quaternion numbers.

Quaternions Class and mathematical functions for quaternion numbers. Installation Python This is a Python 3 module. If you don't have Python installed

3 Nov 08, 2022
Python project setup, updater, and launcher

Launcher Python project setup, updater, and launcher Purpose: Increase project productivity and provide features easily. Once installed as a git submo

DAAV, LLC 1 Jan 07, 2022
a simple functional programming language compiler written in python

Functional Programming Language A compiler for my small functional language. Written in python with SLY lexer/parser generator library. Requirements p

Ashkan Laei 3 Nov 05, 2021
The code submitted for the Analytics Vidhya Jobathon - February 2022

Introduction On February 11th, 2022, Analytics Vidhya conducted a 3-day hackathon in data science. The top candidates had the chance to be selected by

11 Nov 21, 2022
I³ Tracker for Essential Open Innovation Datasets

I³ Tracker for Essential Open Innovation Datasets This repository is set up to track, version, and contribute updates to the I³ Essential Open Innovat

1 Feb 08, 2022
Islam - This is a simple python script.In this script I have written all the suras of Al Quran. As a result, by using this script, you can know the number of any sura at the moment.

Introduction: If you want to know sura number of al quran by just typing the name of sura than you can use this script. Usage in termux: $ pkg install

Fazle Rabbi 1 Jan 02, 2022
Python with braces. Because Python is awesome, but whitespace is awful.

Bython Python with braces. Because Python is awesome, but whitespace is awful. Bython is a Python preprosessor which translates curly brackets into in

1 Nov 04, 2021