The `rankings.ipynb` notebook analyzes and compares centrality metrics for a citation network. It loads node and edge data from JSON files, computes various centrality metrics (degree, betweenness, closeness, and others), and compares them against ground-truth measures such as importance and document type.
Project Setup
To set up the project, follow these steps:
Clone the Repository: Clone the repository to your local machine (replace the placeholder with the actual repository URL):
git clone <repository-url>
Navigate to the Project Directory: Change into the project directory:
cd rankings
Create a Virtual Environment (optional but recommended): You can create a virtual environment to manage dependencies:
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install Dependencies: Install the required packages using pip:
pip install -r requirements.txt
Launch Jupyter Notebook: Start Jupyter Notebook with the following command:
jupyter notebook
Open the Rankings Notebook: In the Jupyter interface, open `rankings.ipynb` to begin your analysis.
Functionality of `rankings.ipynb`
1. Data Loading
The notebook begins by loading node and edge data from JSON files. Ensure that your data files are in the correct format and located in the specified directory. You can run the `load.py` script to load the data into the `data/ECHR` directory.
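As a minimal sketch of this step, the loaded node and edge lists can be turned into a directed NetworkX graph. The field names (`"id"`, `"source"`, `"target"`) and the toy data below are assumptions for illustration; the actual JSON schema may differ.

```python
import networkx as nx

def load_citation_graph(nodes, edges):
    """Build a directed citation graph from parsed JSON node/edge lists.

    Assumes each node dict has an "id" key and each edge dict has
    "source"/"target" keys; the real field names may differ.
    """
    G = nx.DiGraph()
    G.add_nodes_from(n["id"] for n in nodes)
    G.add_edges_from((e["source"], e["target"]) for e in edges)
    return G

# Toy data standing in for the contents of the JSON files in data/ECHR.
nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "A", "target": "C"}]
G = load_citation_graph(nodes, edges)
print(G.number_of_nodes(), G.number_of_edges())  # 3 2
```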
2. Data Preprocessing
Preprocessing steps are applied to clean and prepare the data for analysis. These include:
Converting document types to numeric values.
Filtering out rows with uncomputed metric values.
Filtering out rows with a NaN `doctypebranch` value.
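The preprocessing steps above can be sketched in pandas. The column names `doctypebranch` and `importance` come from the notebook's domain, but the specific mapping values here are illustrative assumptions:

```python
import pandas as pd

# Toy DataFrame; the doctype-to-number mapping below is illustrative only.
df = pd.DataFrame({
    "doctypebranch": ["CHAMBER", "GRANDCHAMBER", None, "CHAMBER"],
    "importance": [1.0, 2.0, 3.0, None],
})

# Convert document types to numeric values.
doctype_map = {"CHAMBER": 0, "GRANDCHAMBER": 1}
df["doctype_numeric"] = df["doctypebranch"].map(doctype_map)

# Filter out rows with uncomputed metric values or a NaN doctypebranch.
df = df.dropna(subset=["importance", "doctypebranch"])
print(len(df))  # 2
```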
3. Centrality Calculation
The notebook calculates various centrality measures using the NetworkX library. Key centrality metrics include:
Degree Centrality
Betweenness Centrality
Closeness Centrality
Eigenvector Centrality
PageRank
Disruption
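Most of these measures map directly onto NetworkX functions; a minimal sketch on a toy graph (disruption has no built-in NetworkX implementation, so it is omitted here):

```python
import networkx as nx

# Small directed toy graph standing in for the citation network.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

centralities = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    # Eigenvector centrality on the undirected view avoids convergence
    # issues on directed acyclic citation graphs.
    "eigenvector": nx.eigenvector_centrality_numpy(G.to_undirected()),
    "pagerank": nx.pagerank(G),
}

# Report the top-ranked node under each measure.
for name, values in centralities.items():
    print(name, max(values, key=values.get))
```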
4. Composite Ranking
The notebook creates composite rankings based on the best-performing centrality measures for predicting high and low relevance scores. It includes:
Error bar plots for centrality measures against ground truth scores.
Functions to find the best centrality measures and create composite rankings.
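One simple way to build such a composite is to rank nodes under each of the best-performing measures and average the ranks. This is an illustrative scheme under assumed measure names, not necessarily the exact formula used in the notebook:

```python
import pandas as pd

# Toy scores for two hypothetical best-performing centrality measures.
df = pd.DataFrame({
    "pagerank": [0.4, 0.1, 0.3, 0.2],
    "betweenness": [0.5, 0.0, 0.2, 0.3],
})

# Rank nodes under each measure (rank 1 = highest score), then average
# the ranks to get a composite ranking.
ranks = df.rank(ascending=False)
df["composite"] = ranks.mean(axis=1)
print(df["composite"].tolist())  # [1.0, 4.0, 2.5, 2.5]
```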
5. Correlation Analysis
The notebook calculates correlations between individual centrality measures and ground truth scores, as well as between composite rankings and ground truths. It visualizes these correlations using plots.
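Since both centrality scores and ground truths induce rankings, a rank correlation such as Spearman's is a natural choice for this comparison (the notebook may use a different coefficient):

```python
from scipy.stats import spearmanr

# Toy centrality scores vs. ground-truth importance values.
centrality = [0.9, 0.4, 0.7, 0.1]
importance = [4, 2, 3, 1]

# Spearman's rho compares the two rankings, ignoring the raw scales.
rho, p_value = spearmanr(centrality, importance)
print(round(rho, 3))  # 1.0 -- the two rankings agree perfectly here
```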
6. Network Analysis
The `analyze_network()` function performs comprehensive network analysis using various centrality measures and composite rankings. It returns an `AnalysisResults` dictionary containing:
Correlation coefficients between rankings and ground truths
Best performing centrality measures for each ground truth
Composite ranking results
The final processed DataFrame with all measures included
7. Comparison of Networks
The `compare_networks()` function compares results across different networks. It analyzes:
Correlation comparisons between centrality measures and ground truth metrics across networks.
Ranking comparisons to see how centrality measures rank relative to each other in different networks.
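A cross-network comparison of this kind can be tabulated as a DataFrame with one column per network. The per-network correlation values below are invented for illustration; they stand in for the output of `analyze_network()` on each network:

```python
import pandas as pd

# Hypothetical per-network correlations between centrality measures
# and a ground-truth score (values are made up for illustration).
results = {
    "ECHR": {"pagerank": 0.62, "betweenness": 0.41},
    "other": {"pagerank": 0.55, "betweenness": 0.48},
}

# One row per centrality measure, one column per network.
comparison = pd.DataFrame(results)
print(comparison)

# Ranking the measures within each network shows whether they trade
# places across networks.
ranking = comparison.rank(ascending=False)
```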
Conclusion
By following the steps outlined above, you can use the `rankings.ipynb` notebook to analyze and visualize centrality metrics in citation networks. Feel free to modify the notebook to suit your specific analysis needs.