Constitutional AI: Harmlessness from AI feedback
As AI systems become more capable, we would like to enlist their help to supervise other AIs.
We experiment with methods for training a harmless AI assistant through self-improvement, …
Training a helpful and harmless assistant with reinforcement learning from human feedback
We apply preference modeling and reinforcement learning from human feedback (RLHF) to
finetune language models to act as helpful and harmless assistants. We find this alignment …
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
We describe our early efforts to red team language models in order to simultaneously discover,
measure, and attempt to reduce their potentially harmful outputs. We make three main …
Language models (mostly) know what they know
We study whether language models can evaluate the validity of their own claims and predict
which questions they will be able to answer correctly. We first show that larger models are …
In-context learning and induction heads
"Induction heads" are attention heads that implement a simple algorithm to complete token
sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence …
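The [A][B] ... [A] -> [B] completion pattern described in this abstract can be illustrated with a minimal sketch (the function name and list-based token representation here are illustrative, not from the paper):

```python
def induction_complete(tokens):
    """Toy sketch of the induction pattern [A][B] ... [A] -> [B]:
    find the most recent earlier occurrence of the final token
    and predict the token that followed it."""
    last = tokens[-1]
    # Scan earlier positions (excluding the final token) from right to left.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier occurrence: no induction-based prediction

# Example: having seen "A B" earlier, a trailing "A" suggests "B" comes next.
print(induction_complete(["A", "B", "C", "A"]))  # prints B
```

An induction head implements this copy-and-complete behavior inside an attention layer; the sketch above only mimics its input-output pattern on explicit token lists.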
A general language assistant as a laboratory for alignment
Given the broad capabilities of large language models, it should be possible to work towards
a general-purpose, text-based assistant that is aligned with human values, meaning that it …
Predictability and surprise in large generative models
Large-scale pre-training has recently emerged as a technique for creating capable, general-purpose,
generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many …
Discovering language model behaviors with model-written evaluations
As language models (LMs) scale, they develop many novel behaviors, good and bad,
exacerbating the need to evaluate how they behave. Prior work creates evaluations with …
The capacity for moral self-correction in large language models
We test the hypothesis that language models trained with reinforcement learning from human
feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful …
Primary Tumor Hypoxia Recruits CD11b+/Ly6Cmed/Ly6G+ Immune Suppressor Cells and Compromises NK Cell Cytotoxicity in the Premetastatic Niche
Hypoxia within a tumor acts as a strong selective pressure that promotes angiogenesis,
invasion, and metastatic spread. In this study, we used immune competent bone marrow …