triztian/caselawcite
An analysis project that inspects citations in rulings from Illinois.
As mentioned in the project report, we seek to answer the following questions:
- Who is the attorney with the most participation in cases? In cases from private parties? From the government?
- How often is the work in which an attorney is involved cited, i.e. how influential is that work?
- What is the page count of cases in which an attorney has participated?
Those questions are answered by the following respective Jupyter Notebooks, and the findings are presented in the project report:
Bulk case data can be downloaded from the following URL:
```
mkdir Data && cd Data
curl https://api.case.law/v1/bulk/22341/download/
```
Citation data can be found here:
```
cd Data/Illinois-20200302-text/data
curl https://case.law/download/citation_graph/2020-04-28/citations.csv.gz
```
First, be sure to install the required tools as listed here:
After downloading the data into the `Data` directory, we can use the Python script included at `./ETL/hcapetl.py` to transform, clean, and insert the data into a SQLite database that will simplify our analysis.
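As a rough, hypothetical sketch of the kind of transform-and-load step that script performs (the actual `hcapetl.py` may differ, and the table and field names below are assumptions rather than the real schema defined in `./Database`):

```python
import json
import sqlite3

def load_attorneys(db_path, data_json_path):
    """Hypothetical sketch: read processed case records and insert attorney names.

    Table and field names here are assumptions for illustration only;
    the actual schema comes from the DDL files in ./Database.
    """
    with open(data_json_path) as f:
        cases = json.load(f)  # data.json is a JSON array produced by `jq -s`

    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS attorneys (name TEXT PRIMARY KEY)")

    for case in cases:
        # Attorney strings are assumed to be nested under the case body data.
        for name in case.get("casebody", {}).get("data", {}).get("attorneys", []):
            conn.execute(
                "INSERT OR IGNORE INTO attorneys (name) VALUES (?)",
                (name.strip(),),
            )

    conn.commit()
    conn.close()

# Example (hypothetical paths):
# load_attorneys("./hcap.sqlite", "./Data/Processed/data.json")
```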
The data must be extracted first with these commands:
```
DATA=Data/Illinois-20200302-text/data
DPROC=Data/Processed
xzcat $DATA/data.jsonl.xz > data.jsonl
jq -s . $DATA/data.jsonl > $DPROC/data.json
```
The database will be named `hcap.sqlite` and it can be created by the following commands:
```
dbpath=./hcap.sqlite
./ETL/hcapetl.py create tables "$dbpath" ./Database/*.ddl.sql
./ETL/hcapetl.py create attorneys "$dbpath" ./Data/Processed/data.json
./ETL/hcapetl.py create cases "$dbpath" ./Data/Processed/data.json
./ETL/hcapetl.py create citations "$dbpath" ./Data/Processed/data.json
```
Running the full ETL pipeline should take about 10 minutes (excluding data download).
The previous commands can be found in the `gendata.sh` script at the root of this project.
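Once `hcap.sqlite` has been built, the first question above (which attorney has participated in the most cases) can be sketched as a direct SQL query. The table and column names below are assumptions based on the ETL subcommands and may not match the actual schema defined in `./Database`:

```python
import sqlite3

conn = sqlite3.connect("./hcap.sqlite")

# Hypothetical schema: a join table linking attorneys to the cases they
# appear in; adjust names to match the DDL files in ./Database.
rows = conn.execute(
    """
    SELECT a.name, COUNT(*) AS case_count
    FROM attorneys AS a
    JOIN case_attorneys AS ca ON ca.attorney_id = a.id
    GROUP BY a.name
    ORDER BY case_count DESC
    LIMIT 10
    """
).fetchall()

for name, case_count in rows:
    print(f"{case_count:6d}  {name}")

conn.close()
```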
The `ETL` directory has all of the Python source necessary to work with the data. To aid with the exploration and cleanup we have the following Jupyter Notebooks:
The Data Exploration notebook has information about the commands used to gain insights into fragments of the data and to determine a SQL database schema.
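As a minimal sketch of that kind of inspection, assuming the processed `Data/Processed/data.json` file is in place, one might do:

```python
import json
from collections import Counter

# Peek at a sample of processed records to see which fields exist and how
# often they appear -- the kind of information used to decide on a schema.
with open("Data/Processed/data.json") as f:
    cases = json.load(f)

print(f"records: {len(cases)}")

field_counts = Counter()
for case in cases[:1000]:  # sample a fragment of the data
    field_counts.update(case.keys())

for field, count in field_counts.most_common():
    print(f"{count:6d}  {field}")
```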