Samuel Marks
Postdoctoral researcher, Northeastern University
Verified email at northeastern.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by 282, 2023
The geometry of truth: Emergent linear structure in large language model representations of true/false datasets
S Marks, M Tegmark
arXiv preprint arXiv:2310.06824, 2023
Cited by 56, 2023
Sparse feature circuits: Discovering and editing interpretable causal graphs in language models
S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller
arXiv preprint arXiv:2403.19647, 2024
Cited by 23, 2024
Open problems and fundamental limitations of reinforcement learning from human feedback. CoRR, abs/2307.15217, 2023. doi: 10.48550
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217
Cited by 8
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
C Denison, M MacDiarmid, F Barez, D Duvenaud, S Kravec, S Marks, ...
arXiv preprint arXiv:2406.10162, 2024
Cited by 5, 2024
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (arXiv: 2307.15217). arXiv
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
Cited by 5, 2023
& Hadfield-Menell, D.(2023). Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217
Cited by 5
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
J Treutlein, D Choi, J Betley, C Anil, S Marks, RB Grosse, O Evans
arXiv preprint arXiv:2406.14546, 2024
Cited by 2, 2024
The quest for the right mediator: A history, survey, and theoretical grounding of causal interpretability
A Mueller, J Brinkmann, M Li, S Marks, K Pal, N Prakash, C Rager, ...
arXiv preprint arXiv:2408.01416, 2024
Cited by 1, 2024
Measuring progress in dictionary learning for language model interpretability with board game models
A Karvonen, B Wright, C Rager, R Angell, J Brinkmann, L Smith, ...
arXiv preprint arXiv:2408.00113, 2024
Cited by 1, 2024
NNsight and NDIF: Democratizing access to foundation model internals
J Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, C Juang, K Pal, ...
arXiv preprint arXiv:2407.14561, 2024
Cited by 1, 2024
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules
S Marks
arXiv preprint arXiv:2303.07620, 2023
2023
Laurent F-Crystals and Lubin-Tate (φq, Γ)-Modules
S Marks
Harvard University, 2023
2023
p-adic Modular Forms à la Serre
S Marks
2020
Derivatives of p-adic Siegel Eisenstein series and p-adic degrees of arithmetic cycles
SP Marks
Princeton University, 2019
2019
p-Adic Properties of Hauptmoduln with Applications to Moonshine
RC Chen, S Marks, M Tyler
SIGMA. Symmetry, Integrability and Geometry: Methods and Applications 15, 033, 2019
2019
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules
S Marks
Articles 1–17