Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Overview

This repository accompanies the ICAIL 2021 paper entitled "Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains". It contains all the data and code used in the experiments reported in the paper.

Data

The data set consists of 807 adjudicatory decisions from 7 different countries (6 languages) annotated in terms of the following type system:

  • Out of Scope - Parts outside of the main document body (e.g., metadata, editorial content, dissents, end notes, appendices).
  • Heading - Typically an incomplete sentence or marker starting a section (e.g., “Discussion,” “Analysis,” “II.”).
  • Background - The part where the court describes procedural history, relevant facts, or the parties’ claims.
  • Analysis - The section containing reasoning of the court, issues, and application of law to the facts of the case.
  • Introductory Summary - A brief summary of the case at the beginning of the decision.
  • Outcome - A few sentences stating how the case was decided (i.e., the overall outcome of the case).

The country-specific subsets:

  • Canada - A random selection of cases retrieved from www.canlii.org from multiple provinces. The selection is not limited to any specific topic or court.
  • Czech Republic - A random selection of cases from the Constitutional Court (30), Supreme Court (40), and Supreme Administrative Court (30). Temporal distribution was taken into account.
  • France - A selection of cases decided by Cour de cassation between 2011 and 2019. A stratified sampling based on the year of publication of the decision was used to select the cases.
  • Germany - A stratified sample from the federal jurisprudence database spanning all federal courts (civil, criminal, labor, finance, patent, social, constitutional, and administrative).
  • Italy - The top 100 cases of the criminal courts stored between 2015 and 2020 mentioning “stalking” and keyed to Article 612 bis of the Criminal Code.
  • Poland - A stratified sample from trial-level, appellate, and administrative courts, the Supreme Court, and the Constitutional Tribunal. The cases mention “democratic country ruled by law.”
  • U.S.A. I - Federal district court decisions in employment law mentioning “motion for summary judgment,” “employee,” and “independent contractor.”
  • U.S.A. II - Administrative decisions from the U.S. Department of Labor. The top 100 rulings in reverse chronological order, starting in October 2020, were selected.

For more detailed information, please refer to the original paper.

How to Use

ICAIL 2021 Data

The data used in the ICAIL 2021 experiments can be found in the following paths:

data/Country-Language-*/annotator-*-ICAIL2021.csv
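A minimal sketch of how these files might be collected, assuming the data directory sits under the current working directory. The helper name find_icail_files is hypothetical and not part of the repository. The column layout of the CSV files is not described here, so the sketch only locates the files and does not parse them.

```python
from collections import defaultdict
from pathlib import Path


def find_icail_files(data_root="data"):
    """Group ICAIL 2021 annotation files by subset directory name.

    Hypothetical helper (not part of the repository). It only mirrors the
    path pattern data/Country-Language-*/annotator-*-ICAIL2021.csv.
    """
    files_by_subset = defaultdict(list)
    for path in sorted(Path(data_root).glob("*/annotator-*-ICAIL2021.csv")):
        # The parent directory name identifies the subset, e.g. "Canada-EN-1".
        files_by_subset[path.parent.name].append(path)
    return dict(files_by_subset)
```

Calling find_icail_files() returns a dictionary that maps each subset directory name to the list of matching annotator files, which can then be read with whatever CSV reader you prefer.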

Note that the Canadian subset could not be included in this repository due to concerns about personal information protection in Canada. However, it can be obtained upon request at [email protected]. Once you obtain the data, create the data/Canada-EN-1 directory and place all the files there.
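The placement step could be scripted roughly as follows. This is a sketch under assumptions: install_canadian_subset is a hypothetical helper, and source_dir stands for wherever you unpacked the files you received. It requires Python 3.8 or newer because of the dirs_exist_ok argument.

```python
import shutil
from pathlib import Path


def install_canadian_subset(source_dir, data_root="data"):
    """Copy the separately obtained Canadian files into data/Canada-EN-1.

    Hypothetical helper (not part of the repository). Subdirectories of
    source_dir are copied as well.
    """
    target = Path(data_root) / "Canada-EN-1"
    # Create data/Canada-EN-1 if needed and copy everything into it.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```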

If you would like to experiment with different preprocessing techniques, the original texts are placed in the following paths:

data/Country-Language-*/texts

You can find the annotations corresponding to these texts here:

data/Country-Language-*/annotator-*.csv

The texts cleaned of the Out of Scope and Heading segments (via dataset_clean.py) are placed in the following paths:

data/Country-Language-*/texts-clean-annotator-*

Note that the processing depends on annotations. Hence, there are several versions of documents at this stage if there were multiple annotators. The annotations corresponding to the cleaned texts are here:

data/Country-Language-*/annotator-*-clean.csv
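The idea behind the cleaning step can be illustrated with a small sketch. This is not the code in dataset_clean.py: the function name, the (label, text) pair representation, and the toy segments are all assumptions made for illustration, and the real annotation files may be structured differently.

```python
# Labels removed during cleaning, as described above.
DROPPED_LABELS = {"Out of Scope", "Heading"}


def drop_non_body_segments(segments):
    """Keep only segments whose label is neither Out of Scope nor Heading.

    `segments` is an assumed in-memory form: an iterable of (label, text)
    pairs. This is an illustration, not the repository's implementation.
    """
    return [(label, text) for label, text in segments
            if label not in DROPPED_LABELS]


# Made-up toy segments, for illustration only.
example = [
    ("Out of Scope", "Editorial note"),
    ("Heading", "II."),
    ("Analysis", "The court considers the claim."),
    ("Outcome", "The appeal is dismissed."),
]
print(drop_non_body_segments(example))
```

Because the filter depends on the labels assigned by a particular annotator, running it on two annotators' versions of the same document can give different cleaned texts, which is why several versions exist at this stage.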

dataset_ICAIL2021.py contains the processing code that was applied to the cleaned texts and annotations to generate the ICAIL 2021 dataset (see above). Note that the code skips the Czech Republic subset by default, because this subset requires an external resource for sentence segmentation (czech-pdt-ud-X.X-XXXXXX.udpipe). To include it, first obtain the file at https://universaldependencies.org/ and place it in the data directory. Next, remove the Czech_Republic-CZ-1 string from the EXCLUDED tuple in dataset_ICAIL2021.py. Finally, replace the data/czech-pdt-ud-2.5-191206.udpipe string in utils.py so that it corresponds to the file you have downloaded. After these changes, the code will also process the Czech Republic part of the dataset.
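The manual edits described above could be approximated by a small check like the one below. It is a sketch, not part of the repository: resolve_excluded is a hypothetical helper, and the default tuple is assumed to contain only the Czech_Republic-CZ-1 string, which may not match the actual EXCLUDED tuple in dataset_ICAIL2021.py.

```python
from pathlib import Path

# Assumed default: the subset that is skipped unless a model file is present.
DEFAULT_EXCLUDED = ("Czech_Republic-CZ-1",)


def resolve_excluded(data_dir="data", excluded=DEFAULT_EXCLUDED):
    """Return (excluded, model_path) based on whether a UDPipe model exists.

    Hypothetical helper. It looks for data/czech-pdt-ud-*.udpipe and, if a
    file is found, drops the Czech subset from the exclusion tuple.
    """
    models = sorted(Path(data_dir).glob("czech-pdt-ud-*.udpipe"))
    if not models:
        # No model file: keep skipping the Czech Republic subset.
        return tuple(excluded), None
    kept = tuple(name for name in excluded if name != "Czech_Republic-CZ-1")
    # If several model files are present, pick the last one in sorted order.
    return kept, models[-1]
```

The returned model path is the value that would replace the data/czech-pdt-ud-2.5-191206.udpipe string in utils.py.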

Dataset Statistics

To replicate the inter-annotator agreement analysis performed in the ICAIL 2021 paper, you can use the ia_agreement.ipynb notebook.

To generate the dataset statistics reported in the ICAIL 2021 paper, you can use the dataset_statistics.ipynb notebook.

Experiments

The file ICAIL2021_experiments.ipynb contains the code necessary to run the experiments presented in the paper. This includes the code to embed the sentences of the cases into a multilingual vector representation, the definition of the Gated Recurrent Unit model, and the code to train and evaluate it across the different experiments described in the paper. It also contains the code to create the visualizations presented in the discussion section of the paper.

The notebook can be run in two different ways:

Attribution

We kindly ask you to cite the following paper:

@inproceedings{savelka2021,
    title={Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains},
    author={Jaromir Savelka and Hannes Westermann and Karim Benyekhlef and Charlotte S. Alexander and Jayla C. Grant and David Restrepo Amariles and Rajaa El Hamdani and S\'{e}bastien Mee\`{u}s and Aurore Troussel and Micha\l\ Araszkiewicz and Kevin D. Ashley and Alexandra Ashley and Karl Branting and Mattia Falduti and Matthias Grabmair and Jakub Hara\v{s}ta and Tereza Novotn\'a and Elizabeth Tippett and Shiwanni Johnson},
    year={2021},
    booktitle={Proceedings of the 18th International Conference on Artificial Intelligence and Law},
    publisher={Association for Computing Machinery},
    doi={10.1145/3462757.3466149}
}

Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S. Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani, Sébastien Meeùs, Aurore Troussel, Michał Araszkiewicz, Kevin D. Ashley, Alexandra Ashley, Karl Branting, Mattia Falduti, Matthias Grabmair, Jakub Harašta, Tereza Novotná, Elizabeth Tippett, and Shiwanni Johnson. 2021. Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains. In Eighteenth International Conference for Artificial Intelligence and Law (ICAIL’21), June 21–25, 2021, São Paulo, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3462757.3466149
