Christopher Olah
Anthropic
Verified email at google.com - Homepage
Title
Cited by
Year
TensorFlow: Large-scale machine learning on heterogeneous systems
M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ...
Cited by 59782* · 2015
Conditional image synthesis with auxiliary classifier gans
A Odena, C Olah, J Shlens
International conference on machine learning, 2642-2651, 2017
Cited by 4387 · 2017
Understanding LSTM Networks
C Olah
colah.github.io, 2015
Cited by 3240* · 2015
Concrete problems in AI safety
D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané
arXiv preprint arXiv:1606.06565, 2016
Cited by 3190 · 2016
Deconvolution and Checkerboard Artifacts
A Odena, V Dumoulin, C Olah
Distill, 2016
Cited by 1961 · 2016
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
Cited by 1658 · 2022
Feature visualization
C Olah, A Mordvintsev, L Schubert
Distill 2 (11), e7, 2017
Cited by 1555* · 2017
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 1284 · 2022
Inceptionism: Going deeper into neural networks
A Mordvintsev, C Olah, M Tyka
Google research blog 20 (14), 5, 2015
Cited by 1100* · 2015
The building blocks of interpretability
C Olah, A Satyanarayan, I Johnson, S Carter, L Schubert, K Ye, ...
Distill 3 (3), e10, 2018
Cited by 898* · 2018
Document embedding with paragraph vectors
AM Dai, C Olah, QV Le
arXiv preprint arXiv:1507.07998, 2015
Cited by 592 · 2015
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1 (1), 12, 2021
Cited by 591* · 2021
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
Cited by 572* · 2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 477 · 2022
Zoom in: An introduction to circuits
C Olah, N Cammarata, L Schubert, G Goh, M Petrov, S Carter
Distill 5 (3), e00024.001, 2020
Cited by 424 · 2020
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
Cited by 392 · 2021
Multimodal neurons in artificial neural networks
G Goh, N Cammarata, C Voss, S Carter, M Petrov, L Schubert, A Radford, ...
Distill 6 (3), e30, 2021
Cited by 380 · 2021
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by 310 · 2022
Towards monosemanticity: Decomposing language models with dictionary learning
T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, N Turner, ...
Transformer Circuits Thread 2, 2023
Cited by 303 · 2023
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by 297 · 2022
Articles 1–20