Automate the case review on legal case documents and find the most critical cases using network analysis

Overview

Automation on Legal Court Cases Review

This project is to automate the case review on legal case documents and find the most critical cases using network analysis.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

  • Python : pdfminer, LexNLP, nltk sklearn
  • R: igraph

Scope:

  1. Parse court documents, extract citations from raw text.
  2. Build citation network, identify important cases in the network.
  3. Extract judge's opinion text and meta information including opinion author, court, decision.
  4. Model training to predict court decision based on opinion text.

Polit Study on 159 Legal Court Documents (in pilot_159 folder)

1. Process PDF documents using Python

Ipython Notebook Description
1.Extraction by LexNLP.ipynb Extract meta inforation use LexNLP package.
2.Layer Analysis on Sigle File. ipynb Use pdfminer to extract the raw text and the paragraph segamentation in the PDF document.
3.Patent Position by Layer.ipynb Identify the position of patent number in extracted layers from PDF.
4.Opinion and Author by Layer.ipynb Extract opinion text, author, decisions from the layers list.
5.Wrap up to Meta Data.ipynb Store extracted meta data to .json or .csv
6.Visualize citation frequency.ipynb Bar plot of the citation frequencies

2. Data: Parse PDF documents via Python

These datasets are NOT included in this public repository for intellectual property and privacy concern

File
pdf2text159.json A dictionary of 3 list: file_name, raw_text, layers.
cite_edge159.csv Edge list of citation network
cite_node159.csv Meta information of each case: case_number, court, dates
reference_extract.csv cited cases in a list for every case, untidy format for analysis
citation159.csv file citation pair, tidy format for calculation
regulation159.csv file regulation pair, tidy format for calculation

3. Analyze and Visualize using R

File
Calculate Citation Frequency.Rmd Analyze reference_extract.csv
Citation Network.Rmd Analyze cite_edge159

4. Visulization Chart Sample

Citation Frequencycase_freq

Citation Networkcitation_net

Network Visulization and Predictive Modeling on 854 Legal Court Cases (in Extraction_Modelling folder)

1. Extract opinion and meta information from raw text data

.ipynb notebook Description
Full Dataset Merge.ipynb Merge the 854 cases dataset
Edge and Node List.ipynb Create edge and node list
Full Extractions.ipynb Extract author, judge panel, opinion text
Clean Opinion Text.ipynb Remove references and special characters in opinion text

2. Datasets

These datasets are NOT included in this public repository for intellectual property and privacy concern

Dataset Description
amy_cases.json large dictionary {file name: raw text} for 854 cases, from Lilian's PDF parsing
full_name_text.json convert amy_cases.json key value pair to two list: file_name, raw_text
cite_edge.csv edge list of citation
cite_node.csv node list contains case_code, case_name, court_from, court_type
extraction854.csv full extractions include case_code, case_name, court_from, court_type, result, author, judge_panel
decision_text.json json file include author, decision(result of the case), opinion (opinion text), cleaned_text (cleaned opinion text)
cleaned_text.csv csv file contains allt the cleaned text
predict_data.csv cleaned dataset for NLP modeling predict court decision

3. Visulization using R

R markdown file
Full Network Graph.Rmd draw the full citation network
Citation Betwwen Nodes.Rmd draw citation between all the available cases
Clean Data For Predictive Modelling.rmd clean text data for predictive modeling

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

ipynb notebook
NLP Predictive Modeling.ipynb Try different preprocessing, and build a logistic regression to predict court decision.

Visulization of the Bi-gram (words) with the strongest coefficient

Bigram

Owner
Yi Yin
Tech & Business Alignment @ Wolfram Research, Social Sciences Research @ Columbia University
Yi Yin
A deceptively simple plotting library for Streamlit

🍅 Plost A deceptively simple plotting library for Streamlit. Because you've been writing plots wrong all this time. Getting started pip install plost

Thiago Teixeira 192 Dec 29, 2022
a plottling library for python, based on D3

Hello August 2013 Hello! Maybe you're looking for a nice Python interface to build interactive, javascript based plots that look as nice as all those

Mike Dewar 1.4k Dec 28, 2022
Generate the report for OCULTest.

Sample report generated in this function Usage example from utils.gen_report import generate_report if __name__ == '__main__': # def generate_rep

Philip Guo 1 Mar 10, 2022
A python script editor for napari based on PyQode.

napari-script-editor A python script editor for napari based on PyQode. This napari plugin was generated with Cookiecutter using with @napari's cookie

Robert Haase 9 Sep 20, 2022
DrawBot lets you draw images taken from the internet on Skribbl.io, Gartic Phone and Paint

DrawBot You don't speak french? No worries, english translation is over here. C'est quoi ? DrawBot est un logiciel codé par V2F qui va prendre possess

V2F 205 Jan 01, 2023
An adaptable Snakemake workflow which uses GATKs best practice recommendations to perform germline mutation calling starting with BAM files

Germline Mutation Calling This Snakemake workflow follows the GATK best-practice recommandations to call small germline variants. The pipeline require

12 Dec 24, 2022
Plot toolbox based on Matplotlib, simple and elegant.

Elegant-Plot Plot toolbox based on Matplotlib, simple and elegant. 绘制效果 绘制过程 数据准备 每种图标类型的目录下有data.csv文件,依据样例数据填入自己的数据。

3 Jul 15, 2022
A guide for using Bootstrap 5 classes in Dash Bootstrap Components V1

dash-bootstrap-cheatsheet This handy interactive cheatsheet makes it easy to use the Bootstrap 5 classes with your Dash app made with the latest versi

10 Dec 22, 2022
Farhad Davaripour, Ph.D. 1 Jan 05, 2022
WebApp served by OAK PoE device to visualize various streams, metadata and AI results

DepthAI PoE WebApp | Bootstrap 4 & Vue.js SPA Dashboard Based on dashmin (https:

Luxonis 6 Apr 09, 2022
A streamlit component for bi-directional communication with bokeh plots.

Streamlit Bokeh Events A streamlit component for bi-directional communication with bokeh plots. Its just a workaround till streamlit team releases sup

Ashish Shukla 123 Dec 25, 2022
Visualization of numerical optimization algorithms

Visualization of numerical optimization algorithms

Zhengxia Zou 46 Dec 01, 2022
Generate knowledge graphs with interesting geometries, like lattices

Geometric Graphs Generate knowledge graphs with interesting geometries, like lattices. Works on Python 3.9+ because it uses cool new features. Get out

Charles Tapley Hoyt 5 Jan 03, 2022
A Bokeh project developed for learning and teaching Bokeh interactive plotting!

Bokeh-Python-Visualization A Bokeh project developed for learning and teaching Bokeh interactive plotting! See my medium blog posts about making bokeh

Will Koehrsen 350 Dec 05, 2022
GDSHelpers is an open-source package for automatized pattern generation for nano-structuring.

GDSHelpers GDSHelpers in an open-source package for automatized pattern generation for nano-structuring. It allows exporting the pattern in the GDSII-

Helge Gehring 76 Dec 16, 2022
Show Data: Show your dataset in web browser!

Show Data is to generate html tables for large scale image dataset, especially for the dataset in remote server. It provides some useful commond line tools and fully customizeble API reference to gen

Dechao Meng 83 Nov 26, 2022
Python code for solving 3D structural problems using the finite element method

3DFEM Python 3D finite element code This python code allows for solving 3D structural problems using the finite element method. New features will be a

Rémi Capillon 6 Sep 29, 2022
Smarthome Dashboard with Grafana & InfluxDB

Smarthome Dashboard with Grafana & InfluxDB This is a complete overhaul of my Raspberry Dashboard done with Flask. I switched from sqlite to InfluxDB

6 Oct 20, 2022
An(other) implementation of JSON Schema for Python

jsonschema jsonschema is an implementation of JSON Schema for Python. from jsonschema import validate # A sample schema, like what we'd get f

Julian Berman 4k Jan 04, 2023
A tool for creating Toontown-style nametags in Panda3D

Toontown-Nametag Toontown-Nametag is a tool for creating Toontown Online/Toontown Rewritten-style nametags in Panda3D. It contains a function, createN

BoggoTV 2 Dec 23, 2021