mechanistic-interpretability
Here are 135 public repositories matching this topic.
Stanford NLP Python library for understanding and improving PyTorch models via interventions (Python, updated Oct 13, 2025)
This repository collects all relevant resources about interpretability in LLMs (updated Nov 1, 2024)
A curated collection of resources on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs), aggregating surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally (updated Oct 20, 2025)
Performant framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and their frontier variants (Python, updated Nov 29, 2025)
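For readers new to the SAE entries above: a sparse autoencoder reconstructs a model's activations through an overcomplete ReLU bottleneck, trained with an L1 penalty so that only a few features fire per input. The sketch below shows that core computation with NumPy; all dimensions, names, and coefficients are illustrative, not taken from any listed framework.

```python
import numpy as np

# Minimal sparse autoencoder (SAE) forward pass: an overcomplete ReLU
# encoder plus a linear decoder, trained to reconstruct model activations
# under an L1 sparsity penalty. Shapes and names are illustrative.
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64          # SAEs are typically overcomplete (d_sae > d_model)

W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coeff=1e-3):
    """Encode activations x, decode them, and return the training loss terms."""
    f = np.maximum(0.0, x @ W_enc + b_enc)        # sparse feature activations
    x_hat = f @ W_dec + b_dec                     # reconstruction
    recon_loss = np.mean((x - x_hat) ** 2)        # how well x is reconstructed
    sparsity_loss = l1_coeff * np.abs(f).sum(axis=-1).mean()  # L1 on features
    return x_hat, f, recon_loss + sparsity_loss

x = rng.normal(size=(8, d_model))                 # stand-in batch of activations
x_hat, f, loss = sae_forward(x)
```

In practice the weights are fit by gradient descent on this loss over large activation datasets; the interpretability payoff is that individual columns of `W_dec` often correspond to human-describable features.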
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods (Python, updated Jun 25, 2025)
Decomposing and Editing Predictions by Modeling Model Computation (Jupyter Notebook, updated Jun 12, 2024)
Steering vectors for transformer language models in PyTorch / Hugging Face (Python, updated Feb 21, 2025)
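The steering-vector entry above refers to a simple idea: compute the difference of mean hidden states between two contrastive prompt sets, then add that direction to the residual stream at inference time to steer generations. A minimal NumPy sketch, with random arrays standing in for real transformer activations (all names and scales are hypothetical):

```python
import numpy as np

# Contrastive steering-vector sketch: derive a direction from the mean
# difference of hidden states on two prompt sets, then add it (scaled)
# to hidden states at inference. Arrays stand in for real activations.
rng = np.random.default_rng(0)
d_model = 32

pos_acts = rng.normal(0.5, 1.0, (100, d_model))   # e.g. activations on "positive" prompts
neg_acts = rng.normal(-0.5, 1.0, (100, d_model))  # e.g. activations on "negative" prompts

steering_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden, vec, scale=1.0):
    """Add the steering direction to every token position's hidden state."""
    return hidden + scale * vec                    # broadcasts over seq_len

hidden = rng.normal(size=(10, d_model))            # (seq_len, d_model)
steered = apply_steering(hidden, steering_vec, scale=2.0)
```

Real implementations register a forward hook on a chosen transformer layer so the addition happens inside the model's forward pass; the arithmetic is the same.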
Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs into computer code and discovering new algorithms that generalize out-of-distribution and outperform human-designed algorithms (Python, updated Feb 20, 2024)
Interpreting how transformers simulate agents performing RL tasks (Jupyter Notebook, updated Oct 23, 2023)
[ICLR 2025] Code and data repo for the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" (Python, updated Dec 19, 2024)
Repo accompanying the paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers" (Jupyter Notebook, updated Mar 11, 2024)
🧠 Starter templates for doing interpretability research (updated Jul 16, 2023)
Full code for the sparse probing paper (Jupyter Notebook, updated Dec 17, 2023)
Sparse and discrete interpretability tool for neural networks (Python, updated Feb 12, 2024)
Unified access to Large Language Model modules using NNsight (Python, updated Nov 19, 2025)
Generating and validating natural-language explanations for the brain (Jupyter Notebook, updated Nov 11, 2025)
[ICLR 2023 spotlight] An automatic and efficient tool for describing the functionality of individual neurons in DNNs (Jupyter Notebook, updated Nov 6, 2023)
CausalGym: Benchmarking causal interpretability methods on linguistic tasks (Python, updated Nov 30, 2024)
Mapping out the "memory" of neural nets with data attribution (Python, updated Nov 29, 2025)
Arrakis is a library for conducting, tracking, and visualizing mechanistic interpretability experiments (Jupyter Notebook, updated Apr 22, 2025)