# mechanistic-interpretability

Here are 135 public repositories matching this topic...

pyvene

Stanford NLP Python library for understanding and improving PyTorch models via interventions

  • Updated Oct 13, 2025
  • Python

This repository collects all relevant resources about interpretability in LLMs

  • Updated Nov 1, 2024

A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.

  • Updated Oct 20, 2025

Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.

  • Updated Nov 29, 2025
  • Python
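The entry above describes tooling for training sparse autoencoders (SAEs). As a hedged illustration only (the class, dimensions, and loss coefficient below are made up for this sketch and are not the framework's API), the core object such libraries train is a small overcomplete autoencoder with a sparsity penalty:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Illustrative SAE: overcomplete ReLU encoder + linear decoder."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(x, recon, features, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    mse = (recon - x).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coef * sparsity

sae = SparseAutoencoder(d_model=64, d_hidden=256)
acts = torch.randn(8, 64)  # stand-in for residual-stream activations
recon, feats = sae(acts)
loss = sae_loss(acts, recon, feats)
```

In practice the input batch would be activations collected from a specific layer of a language model rather than random tensors.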

Stanford NLP Python library for benchmarking the utility of LLM interpretability methods

  • Updated Jun 25, 2025
  • Python

Decomposing and Editing Predictions by Modeling Model Computation

  • Updated Jun 12, 2024
  • Jupyter Notebook

Steering vectors for transformer language models in PyTorch / Hugging Face

  • Updated Feb 21, 2025
  • Python
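The steering-vector entry above can be illustrated with a minimal sketch: the general technique adds a fixed direction to a layer's activations via a forward hook. The toy model, vector, and scale below are stand-ins invented for this example, not the library's API; real usage would hook a decoder layer of a Hugging Face model.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer block stack.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

# Hypothetical steering direction; in practice derived from contrasting
# activations on paired prompts (e.g. "happy" vs. "sad" completions).
steering_vector = torch.randn(16)
scale = 2.0

def add_steering(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output.
    return output + scale * steering_vector

handle = model[0].register_forward_hook(add_steering)
x = torch.randn(4, 16)
steered = model(x)
handle.remove()          # detach the hook to restore normal behavior
unsteered = model(x)
```

Removing the hook restores the original forward pass, which makes it easy to compare steered and unsteered outputs on the same inputs.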

Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs into computer code and discovering new algorithms that generalize out-of-distribution and outperform human-designed algorithms

  • Updated Feb 20, 2024
  • Python

Interpreting how transformers simulate agents performing RL tasks

  • Updated Oct 23, 2023
  • Jupyter Notebook

[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"

  • Updated Dec 19, 2024
  • Python

Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

  • Updated Mar 11, 2024
  • Jupyter Notebook

🧠 Starter templates for doing interpretability research

  • Updated Jul 16, 2023

Full code for the sparse probing paper.

  • Updated Dec 17, 2023
  • Jupyter Notebook

Sparse and discrete interpretability tool for neural networks

  • Updated Feb 12, 2024
  • Python

Unified access to Large Language Model modules using NNsight

  • Updated Nov 19, 2025
  • Python
automated-brain-explanations

[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs

  • Updated Nov 6, 2023
  • Jupyter Notebook

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

  • Updated Nov 30, 2024
  • Python

Mapping out the "memory" of neural nets with data attribution

  • Updated Nov 29, 2025
  • Python

Arrakis is a library to conduct, track, and visualize mechanistic interpretability experiments.

  • Updated Apr 22, 2025
  • Jupyter Notebook
