Giulio Starace (thesofakillers)


openai/mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
Python · 643 stars · 82 forks

openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Python · 15.7k stars · 2.7k forks

GPTrue-or-False
📝🔍 A browser extension that displays the GPT-2 log probability of selected text.
JavaScript · 113 stars · 11 forks

nlgoals
Official repository for my MSc thesis: "Addressing Goal Misgeneralization with Natural Language Interfaces."
TeX · 3 stars

infoshare
Official repository for the paper: "Probing LLMs for Joint Encoding of Linguistic Categories." Findings of EMNLP 2023.
Python · 6 stars

dlml-tutorial
🤓 A tutorial on the Discretized Logistic Mixture Likelihood (DLML).
Python · 8 stars