Can Rager
Independent
Verified email at northeastern.edu
Title
Cited by
Year
Sparse feature circuits: Discovering and editing interpretable causal graphs in language models
S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller
arXiv preprint arXiv:2403.19647, 2024
Cited by: 23, Year: 2024
Attribution patching outperforms automated circuit discovery
A Syed, C Rager, A Conmy
arXiv preprint arXiv:2310.10348, 2023
Cited by: 22, Year: 2023
The advantage of foraging myopically
CL Rager, U Bhat, O Bénichou, S Redner
Journal of Statistical Mechanics: Theory and Experiment 2018 (7), 073501, 2018
Cited by: 6, Year: 2018
Linearly Structured World Representations in Maze-Solving Transformers
M Ivanitskiy, AF Spies, T Räuker, G Corlouer, C Mathwin, L Quirke, ...
Proceedings of UniReps: the First Workshop on Unifying Representations in …, 2024
Cited by: 3*, Year: 2024
A Configurable Library for Generating and Manipulating Maze Datasets
M Igorevich Ivanitskiy, R Shah, AF Spies, T Räuker, D Valentine, C Rager, ...
arXiv e-prints, arXiv:2309.10498, 2023
Cited by: 2*, Year: 2023
The quest for the right mediator: A history, survey, and theoretical grounding of causal interpretability
A Mueller, J Brinkmann, M Li, S Marks, K Pal, N Prakash, C Rager, ...
arXiv preprint arXiv:2408.01416, 2024
Cited by: 1, Year: 2024
Measuring progress in dictionary learning for language model interpretability with board game models
A Karvonen, B Wright, C Rager, R Angell, J Brinkmann, L Smith, ...
arXiv preprint arXiv:2408.00113, 2024
Cited by: 1, Year: 2024
NNsight and NDIF: Democratizing access to foundation model internals
J Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, C Juang, K Pal, ...
arXiv preprint arXiv:2407.14561, 2024
Cited by: 1, Year: 2024
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
J Dao, YT Lao, C Rager, J Janiak
arXiv preprint arXiv:2310.07325, 2023
Cited by: 1, Year: 2023
Safety of self-assembled neuromorphic hardware
C Rager, K Webster
arXiv preprint arXiv:2301.10201, 2023
Year: 2023