To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment

Overview

COLIEE 2021 - task 2: Legal Case Entailment

This repository contains the code to reproduce NeuralMind's submissions to COLIEE 2021, presented in the paper To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment. There is mounting evidence that pretrained language models fine-tuned on large and diverse supervised datasets transfer well to a variety of out-of-domain tasks. In this work, we investigate this transfer ability in the legal domain: we participated in the legal case entailment task of COLIEE 2021, using such models with no adaptation to the target domain. Our submissions achieved the highest scores, surpassing the second-best submission by more than six percentage points. Our experiments confirm a counter-intuitive result in the new paradigm of pretrained language models: given limited labeled data, models with little or no adaptation to the target task can be more robust to changes in the data distribution, and can perform better on held-out data than models fine-tuned on the target task.

Models

monoT5-zero-shot: We use a T5-large model fine-tuned on MS MARCO, a dataset of approximately 530k pairs of queries and relevant passages. We use a checkpoint available on Hugging Face's model hub that was trained with a learning rate of 10⁻³ using batches of 128 examples for 10k steps, or approximately one epoch of the MS MARCO dataset. In each batch, a roughly equal number of positive and negative examples is sampled.
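As a minimal sketch of how such a checkpoint can be queried for zero-shot relevance scoring, the snippet below assumes the public castorini/monot5-large-msmarco checkpoint and the standard monoT5 prompt format; the relevance_score helper is illustrative, not part of this repository.

```python
# Hedged sketch: zero-shot relevance scoring with a monoT5 checkpoint.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("castorini/monot5-large-msmarco")
model = T5ForConditionalGeneration.from_pretrained("castorini/monot5-large-msmarco")
model.eval()

def relevance_score(query: str, document: str) -> float:
    # monoT5 casts ranking as generation: given the prompt below, it is
    # trained to output "true" for relevant pairs and "false" otherwise.
    text = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Feed a single decoder start token and read the first position's logits.
    decoder_input_ids = torch.full(
        (1, 1), model.config.decoder_start_token_id, dtype=torch.long
    )
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    true_id = tokenizer.encode("true")[0]    # id of the "true" token
    false_id = tokenizer.encode("false")[0]  # id of the "false" token
    # Softmax over the two candidate tokens gives the relevance probability.
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()
```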

monoT5: We further fine-tune monoT5-zero-shot on the COLIEE 2020 training set, following a training procedure similar to the one described for monoT5-zero-shot. The model is fine-tuned with a learning rate of 10⁻³ for 80 steps using batches of size 128, which corresponds to 20 epochs. Each batch has the same number of positive and negative examples.
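To make the balanced sampling concrete, here is a hedged sketch of how fine-tuning examples could be assembled from COLIEE entailment pairs; the field names (base, candidate, label) and the build_t5_examples helper are assumptions for illustration, not the repository's actual data schema.

```python
import random

def build_t5_examples(pairs, seed=42):
    """pairs: list of dicts with keys "base", "candidate", "label" (1 = entailing)."""
    rng = random.Random(seed)
    positives = [p for p in pairs if p["label"] == 1]
    negatives = [p for p in pairs if p["label"] == 0]
    # Downsample the larger class so batches stay balanced, mirroring the
    # equal positive/negative sampling described above.
    n = min(len(positives), len(negatives))
    sampled = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(sampled)
    # Each example is (input prompt, target token) in the monoT5 format.
    return [
        (f"Query: {p['base']} Document: {p['candidate']} Relevant:",
         "true" if p["label"] == 1 else "false")
        for p in sampled
    ]
```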

DeBERTa: Decoding-enhanced BERT with disentangled attention (DeBERTa) improves on the original BERT and RoBERTa architectures by introducing two techniques: a disentangled attention mechanism and an enhanced mask decoder. Both improvements inject positional information into the pretraining procedure, both the absolute position of a token and the relative positions between tokens. We fine-tune DeBERTa on the COLIEE 2020 training set, following a training procedure similar to the one described for monoT5.
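Below is a minimal sketch of this setup, assuming DeBERTa is fine-tuned as a binary sequence-pair classifier with the Hugging Face transformers library; microsoft/deberta-large is a public checkpoint, but the inputs shown are placeholders.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large")
# The classification head is randomly initialized and learned during
# fine-tuning on the entailment pairs.
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-large", num_labels=2
)

# Score one (base paragraph, candidate paragraph) pair.
inputs = tokenizer(
    "fragment of the base case decision ...",       # placeholder text
    "candidate paragraph from a noticed case ...",  # placeholder text
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
logits = model(**inputs).logits  # shape (1, 2): [not entailing, entailing]
```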

DebertaT5 (Ensemble): We use the following method to combine the predictions of monoT5 and DeBERTa (both fine-tuned on the COLIEE 2020 dataset): we concatenate the final sets of paragraphs selected by each model and remove duplicates, preserving the highest score. Note that our method does not combine scores across models; the final answer for each test example is composed of individual answers from one or both models. This ensures that only answers with a certain degree of confidence are kept, which generally increases precision.
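This rule maps directly to a few lines of code. The sketch below assumes each model outputs (paragraph_id, score) pairs for its selected answers; the function and argument names are illustrative.

```python
def ensemble(monot5_selected, deberta_selected):
    """Each argument is a list of (paragraph_id, score) pairs that one
    model selected as final answers for a single test example."""
    best = {}
    for pid, score in monot5_selected + deberta_selected:
        # On duplicates, keep the higher score; scores from the two models
        # are never added or averaged.
        if pid not in best or score > best[pid]:
            best[pid] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```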

Results

| Model | Train data | Evaluation | F1 (%) | Description |
|-------|------------|------------|--------|-------------|
| Median of submissions | | COLIEE | 58.60 | |
| COLIEE 2nd best team | | COLIEE | 62.74 | |
| DeBERTa (ours) | COLIEE | COLIEE | 63.39 | Single model |
| monoT5 (ours) | COLIEE | COLIEE | 66.10 | Single model |
| monoT5-zero-shot (ours) | MS MARCO | COLIEE | 68.72 | Single model |
| DebertaT5 (ours) | COLIEE | COLIEE | 69.12 | Ensemble |

In this table, we present our results. Our main finding is that our zero-shot model achieved the best result among single models on the 2021 test data, outperforming DeBERTa and monoT5, which were fine-tuned on the COLIEE dataset. As far as we know, this is the first time a zero-shot model has outperformed fine-tuned models on the task of legal case entailment. Given limited annotated data for fine-tuning and held-out test data, as in the COLIEE dataset, our results suggest that a zero-shot model fine-tuned on a large out-of-domain dataset may be more robust to changes in data distribution and may generalize better to unseen data than models fine-tuned on a small domain-specific dataset. Moreover, our ensemble method effectively combines the predictions of DeBERTa and monoT5, achieving the best score among all submissions (row 6 of the table). Note that although DebertaT5 performed best in the COLIEE competition, the ensemble method requires training time, computational resources, and perhaps also data augmentation to perform well on the task, while monoT5-zero-shot needs no adaptation at all. The model is available online and ready to use.

Conclusion

Based on these results, we question the common assumption that labeled training data from the target domain is necessary to perform well on a task. Our results suggest that fine-tuning on a large labeled dataset, even one from another domain, may be enough.

How do I get the dataset?

If you wish to use previous COLIEE data for a trial, please contact rabelo(at)ualberta.ca.

How do I evaluate?

As our best model is a zero-shot one, we provide only the evaluation script.
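For reference, here is a hedged sketch of a COLIEE-style micro-averaged F1 computation. It is an illustrative stand-in for the evaluation script shipped with the repository, and the dict-of-sets input format is an assumption.

```python
def micro_f1(predictions, gold):
    """predictions, gold: dicts mapping a test case id to the set of
    paragraph ids predicted / annotated as entailing."""
    tp = sum(len(predictions.get(c, set()) & gold[c]) for c in gold)
    n_pred = sum(len(predictions.get(c, set())) for c in gold)
    n_gold = sum(len(gold[c]) for c in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```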

References

[1] Document Ranking with a Pretrained Sequence-to-Sequence Model

[2] DeBERTa: Decoding-enhanced BERT with Disentangled Attention

[3] ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

[4] Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment

How do I cite this work?

@inproceedings{to_tune,
    title={To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment},
    author={Moraes, Guilherme and Rodrigues, Ruan and Lotufo, Roberto and Nogueira, Rodrigo},
    booktitle={ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law},
    pages={295--300},
    url={https://dl.acm.org/doi/10.1145/3462757.3466103},
    year={2021}
}