BERT and PALs: Projected attention layers for efficient adaptation in multi-task learning AC Stickland, I Murray International Conference on Machine Learning, 5986-5995, 2019 | 303 | 2019 |
GPQA: A graduate-level Google-proof Q&A benchmark D Rein, BL Hou, AC Stickland, J Petty, RY Pang, J Dirani, J Michael, ... arXiv preprint arXiv:2311.12022, 2023 | 175 | 2023 |
The reversal curse: LLMs trained on "A is B" fail to learn "B is A" L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ... arXiv preprint arXiv:2309.12288, 2023 | 130 | 2023 |
Recipes for adapting pre-trained monolingual and multilingual models to machine translation AC Stickland, X Li, M Ghazvininejad arXiv preprint arXiv:2004.14911, 2020 | 47 | 2020 |
Multilingual domain adaptation for NMT: Decoupling language and domain information with adapters AC Stickland, A Berard, V Nikoulina arXiv preprint arXiv:2110.09574, 2021 | 30 | 2021 |
Taken out of context: On measuring situational awareness in LLMs L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, D Kokotajlo, O Evans 2023 | 26 | 2023 |
Deep transformers with latent depth X Li, A Cooper Stickland, Y Tang, X Kong Advances in Neural Information Processing Systems 33, 1736-1746, 2020 | 23 | 2020 |
Diverse ensembles improve calibration AC Stickland, I Murray arXiv preprint arXiv:2007.04206, 2020 | 22 | 2020 |
Taken out of context: On measuring situational awareness in LLMs L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ... arXiv preprint arXiv:2309.00667, 2023 | 17 | 2023 |
Targeted latent adversarial training improves robustness to persistent harmful behaviors in LLMs A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... arXiv e-prints, arXiv:2407.15549, 2024 | 13 | 2024 |
Taken out of context: On measuring situational awareness in LLMs L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, D Kokotajlo, O Evans arXiv preprint arXiv:2309.00667, 2023 | 10 | 2023 |
When does Parameter-Efficient Transfer Learning Work for Machine Translation? A Üstün, AC Stickland arXiv preprint arXiv:2205.11277, 2022 | 8 | 2022 |
Steering without side effects: Improving post-deployment control of language models AC Stickland, A Lyzhov, J Pfau, S Mahdi, SR Bowman arXiv preprint arXiv:2406.15518, 2024 | 7 | 2024 |
Robustification of multilingual language models to real-world noise in crosslingual zero-shot settings with robust contrastive pretraining AC Stickland, S Sengupta, J Krone, S Mansour, H He arXiv preprint arXiv:2210.04782, 2022 | 7 | 2022 |
Future events as backdoor triggers: Investigating temporal vulnerabilities in LLMs S Price, A Panickssery, S Bowman, AC Stickland arXiv preprint arXiv:2407.04108, 2024 | 3 | 2024 |
Latent adversarial training improves robustness to persistent harmful behaviors in LLMs A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... arXiv preprint arXiv:2407.15549, 2024 | 1 | 2024 |
Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining AC Stickland, S Sengupta, J Krone, S Mansour, H He arXiv preprint arXiv:2210.04782, 2022 | 1 | 2022 |
Regularising Fisher Information Improves Cross-lingual Generalisation AC Stickland, I Murray Proceedings of the 1st Workshop on Multilingual Representation Learning, 238-241, 2021 | 1 | 2021 |
Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods J Doshi, AC Stickland arXiv preprint arXiv:2411.12103, 2024 | | 2024 |
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs A Ewart, A Sheshadri, PH Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... Workshop on Socially Responsible Language Modelling Research | | |