SimCSE在中文任务上的简单实验

Related tags

MiscellaneousSimCSE
Overview

SimCSE 中文测试

SimCSE在常见中文数据集上的测试,包含ATECBQLCQMCPAWSXSTS-B共5个任务。

介绍

文件

- utils.py  工具函数
- eval.py  评测主文件

评测

命令格式:

python eval.py [model_type] [pooling] [task_name] [dropout_rate]

使用例子:

python eval.py BERT cls ATEC 0.3

其中四个参数必须传入,含义分别如下:

- model_type: 模型,必须是['BERT', 'RoBERTa', 'WoBERT', 'RoFormer', 'BERT-large', 'RoBERTa-large', 'SimBERT', 'SimBERT-tiny', 'SimBERT-small']之一;
- pooling: 池化方式,必须是['first-last-avg', 'last-avg', 'cls', 'pooler']之一;
- task_name: 评测数据集,必须是['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B']之一;
- dropout_rate: 浮点数,dropout的比例,如果为0则不dropout;

环境

测试环境:tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5,如果在其他环境组合下报错,请根据错误信息自行调整代码。

下载

Google官方的两个BERT模型:

关于语义相似度数据集,可以从数据集对应的链接自行下载,也可以从作者提供的百度云链接下载。

其中senteval_cn目录是评测数据集汇总,senteval_cn.zip是senteval目录的打包,两者下其一就好。

相关

交流

QQ交流群:808623966,微信群请加机器人微信号spaces_ac_cn

Owner
苏剑林(Jianlin Su)
科学爱好者
苏剑林(Jianlin Su)
Annotates sequences with Eggnog-mapper and hhblits against PDB70

Annotating "hypothetical" proteins with the PDB See config/ for configuration information. This workflow takes as input a set of protein sequences. It

1 Apr 05, 2022
Translation patch for Hololive ERROR

Translation patch for Hololive ERROR How do I install the patch? Grab the Translation.zip file for the latest version from the releases page, and unzi

18 Jul 20, 2022
Nag0mi ctf problem 2021 writeup

Nag0mi ctf problem 2021 writeup

3 Apr 04, 2022
Pyhexdmp - Python hex dump module

Pyhexdmp - Python hex dump module

25 Oct 23, 2022
SimBiber - A tool for simplifying bibtex with official info

SimBiber: A tool for simplifying bibtex with official info. We often need to sim

336 Jan 02, 2023
A simple script written using symbolic python that takes as input a desired metric and automatically calculates and outputs the Christoffel Pseudo-Tensor, Riemann Curvature Tensor, Ricci Tensor, Scalar Curvature and the Kretschmann Scalar

A simple script written using symbolic python that takes as input a desired metric and automatically calculates and outputs the Christoffel Pseudo-Tensor, Riemann Curvature Tensor, Ricci Tensor, Scal

2 Nov 27, 2021
A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python

A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python

Windel Bouwman 277 Dec 26, 2022
The Playwright Workshop for TAU: The Homecoming

tau-playwright-workshop This repository contains the instructions and example code for the Playwright workshop for TAU: The Homecoming on December 1,

Pandy Knight 134 Dec 30, 2022
1. 네이버 카페 댓글을 빨리 다는 기능

naver_autoprogram 기능 설명 네이버 카페 댓글을 빨리 다는 기능 네이버 카페 자동 출석 체크 기능 동작 방식 카페 댓글 기능 기본 동작은 주기적인 스케쥴 동작으로 해당 카페 ID 와 특정 API 주소로 대상이 새글을 작성했는지 체크. 해당 대상이 새글 등

1 Dec 22, 2021
A tool to determine optimal projects for Gridcoin crunchers. Maximize your magnitude!

FindTheMag FindTheMag helps optimize your BOINC client for Gridcoin mining. You can group BOINC projects into two groups: "preferred" projects and "mi

7 Oct 04, 2022
A simple package for interacting with the 9kw.eu anti-captcha service.

Welcome to captcha9kw’s documentation! captcha9kw is a smallish Python package for making use of the 9kw.eu services, including solving of interactive

2 Feb 26, 2022
A funny alarm clock I made in python

Wacky-Alarm-Clock Basically, I kept forgetting to take my medications, so I thought it would be a fun project to code my own alarm clock and make it r

1 Nov 18, 2021
Convert your Gyrosco.pe travels to GPX files

gyroscope2gpx This little python joint will do you a favor of taking your "Travel" export from Gyroscope (https://gyrosco.pe) and turn it into a bunch

nick g 4 Oct 02, 2022
Poetry workspace plugin for Python monorepos.

poetry-workspace-plugin Poetry workspace plugin for Python monorepos. Inspired by Yarn Workspaces. Adds a new subcommand group, poetry workspace, whic

Jack Smith 74 Jan 01, 2023
Team Hash Brown Science4Cast Submission

Team Hash Brown Science4Cast Submission This code reproduces Team Hash Brown's (@princengoc, @Xieyangxinyu) best submission (ee5a) for the competition

3 Feb 02, 2022
A Brainfuck interpreter written in Python.

A Brainfuck interpreter written in Python.

Ethan Evans 1 Dec 05, 2021
Simple project to assist in tracking/logging my working hours

Fill working hours Basic script to assist in the logging/tracking of my working hours How it works Create a file called projects.json in this director

Robin Kennedy-Reid 2 Oct 31, 2022
Plugins for Agisoft Metashape

Данные плагины предназначены для расширения функциональных возможностей Agisoft Metashape. Плагины представляют собой отдельные программы с собственным интерфейсом, которые запускаются внутри Agisoft

GeoScan 17 Dec 10, 2022
Simple Denial of Service Program yang di bikin menggunakan bahasa pemograman Python,

Peringatan Tujuan kami share code Indo-DoS hanya untuk bertujuan edukasi / pembelajaran! Dilarang memperjual belikan source ini / memperjual-belikan s

SonLyte 8 Nov 07, 2021
This repo created to complete the task HACKTOBER 2021, contribute now and get your special T-Shirt & Sticker. TO SUPPORT OWNER PLEASE PRESS STAR BUTTON

❤ THIS REPO WILL CLOSED IN 31 OCT 00:00 ❤ This repository will automatically assign the hacktoberfest and hacktoberfest-accepted labels to all submitt

Rajendra Rakha 307 Dec 27, 2022