{"id":790,"date":"2023-10-29T03:16:26","date_gmt":"2023-10-29T03:16:26","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/10\/29\/building-safer-dialogue-agents-google-deepmind\/"},"modified":"2025-04-27T07:32:16","modified_gmt":"2025-04-27T07:32:16","slug":"building-safer-dialogue-agents-google-deepmind","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/10\/29\/building-safer-dialogue-agents-google-deepmind\/","title":{"rendered":"Building safer dialogue agents &#8211; Google DeepMind"},"content":{"rendered":"<div>\n<div class=\"article-cover article-cover--centered\">\n<div class=\"article-cover__header\">\n<p class=\"article-cover__eyebrow glue-label\">Research<\/p>\n<dl class=\"article-cover__meta\">\n<dt class=\"glue-visually-hidden\">Published<\/dt>\n<dd class=\"article-cover__date glue-label\">\n              <time datetime=\"2022-09-22\"><br \/>\n                22 September 2022<br \/>\n              <\/time>\n            <\/dd>\n<dt class=\"glue-visually-hidden\">Authors<\/dt>\n<dd class=\"article-cover__authors\">\n<p data-block-key=\"36jna\">The Sparrow team<\/p>\n<\/dd>\n<\/dl>\n<\/div>\n<picture class=\"article-cover__image\"><img loading=\"lazy\" decoding=\"async\" alt=\"\" height=\"603\" src=\"https:\/\/lh3.googleusercontent.com\/Wxv_lz_mcQUe7_-NqqiUx-eKTKXLqqb26JIFz00ImqRCWBO85HE-QnlVsuaUjaVL7LHLmBfby_sJEKCtrJrOhDqQY5LQhVqTgqLmL_tWvYs3fsnwmQ=w1072-h603-n-nu\" width=\"1072\"\/>\n    <\/picture>\n<\/div>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"poyah\">Training an AI to communicate in a way that\u2019s more helpful, correct, and harmless<\/h2>\n<p data-block-key=\"sfhft\">In recent years, large language models (LLMs) have achieved success at a range of tasks such as question answering, summarisation, and dialogue. Dialogue is a particularly interesting task because it features flexible and interactive communication. However, dialogue agents powered by LLMs can express inaccurate or invented information, use discriminatory language, or encourage unsafe behaviour.<\/p>\n<p data-block-key=\"wcqw5\">To create safer dialogue agents, we need to be able to learn from human feedback. 
Applying reinforcement learning based on input from research participants, we explore new methods for training dialogue agents that show promise for a safer system.<\/p>\n<p data-block-key=\"cwvt5\">In our <a href=\"https:\/\/arxiv.org\/abs\/2209.14375\" rel=\"noopener\" target=\"_blank\">latest paper<\/a>, we introduce <i>Sparrow<\/i> \u2013 a dialogue agent that\u2019s useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with a user, answer questions, and search the internet using Google when it\u2019s helpful to look up evidence to inform its responses.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"cm86r\">Our new conversational AI model replies on its own to an initial human prompt.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"vnr63\">Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful \u2013 and ultimately, to help build safer and more useful artificial general intelligence (AGI).<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"kxjgm\">Sparrow declining to answer a potentially harmful question.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"vnr63\">How Sparrow works<\/h2>\n<p data-block-key=\"f6n2g\">Training a conversational AI is an especially challenging problem because it\u2019s difficult to pinpoint what makes a dialogue successful. 
To address this problem, we turn to a form of reinforcement learning (RL) based on people&#8217;s feedback, using the study participants\u2019 preferences to train a model of how useful an answer is.<\/p>\n<p data-block-key=\"bagmg\">To get this data, we show our participants multiple model answers to the same question and ask them which answer they like the most. Because we show answers with and without evidence retrieved from the internet, this model can also determine when an answer should be supported with evidence.<\/p>\n<\/div>\n<figure class=\"single-media single-media--large\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"yg6j8\">We ask study participants to evaluate and interact with Sparrow either naturally or adversarially, continually expanding the dataset used to train Sparrow.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"3ifvv\">But increasing usefulness is only part of the story. To make sure the model behaves safely, we must constrain its behaviour. So we determine an initial, simple set of rules for the model, such as \u201cdon&#8217;t make threatening statements\u201d and \u201cdon&#8217;t make hateful or insulting comments\u201d.<\/p>\n<p data-block-key=\"gitvw\">We also provide rules around possibly harmful advice and not claiming to be a person. These rules were informed by studying existing work on language harms and by consulting with experts. We then ask our study participants to talk to our system, with the aim of tricking it into breaking the rules. These conversations then let us train a separate \u2018rule model\u2019 that indicates when Sparrow&#8217;s behaviour breaks any of the rules.<\/p>\n<h2 data-block-key=\"2ltx9\">Towards better AI and better judgments<\/h2>\n<p data-block-key=\"j61bz\">Verifying Sparrow\u2019s answers for correctness is difficult even for experts. 
Instead, we ask our participants to determine whether Sparrow&#8217;s answers are plausible and whether the evidence Sparrow provides actually supports the answer. According to our participants, Sparrow provides a plausible answer and supports it with evidence 78% of the time when asked a factual question. This is a big improvement over our baseline models. Still, Sparrow isn&#8217;t immune to mistakes, such as occasionally hallucinating facts or giving off-topic answers.<\/p>\n<p data-block-key=\"2979c\">Sparrow also has room to improve its rule-following. After training, participants were still able to trick it into breaking our rules 8% of the time, but compared to simpler approaches, Sparrow is better at following our rules under adversarial probing. For instance, our original dialogue model broke rules roughly three times more often than Sparrow when our participants tried to trick it into doing so.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"yhe3g\">Sparrow answers a question and follow-up question using evidence, then follows the \u201cDo not pretend to have a human identity\u201d rule when asked a personal question (sample from 9 September 2022).<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"ezr2m\">Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require both expert input on many topics (from policy makers, social scientists, and ethicists) and participatory input from a diverse array of users and affected groups. We believe our methods will still apply for a more rigorous rule set.<\/p>\n<p data-block-key=\"4s4un\">Sparrow is a significant step forward in understanding how to train dialogue agents to be more useful and safer. 
However, successful communication between people and dialogue agents should not only avoid harm but be aligned with human values for effective and beneficial communication, as discussed in recent work on <a href=\"https:\/\/deepmind.google\/discover\/blog\/in-conversation-with-ai-building-better-language-models\/\">aligning language models with human values<\/a>.<\/p>\n<p data-block-key=\"8p6ip\">We also emphasise that a good agent will still decline to answer questions in contexts where it is appropriate to defer to humans or where this has the potential to deter harmful behaviour. Finally, our initial research focused on an English-speaking agent, and further work is needed to ensure similar results across other languages and cultural contexts.<\/p>\n<p data-block-key=\"dkabk\">In the future, we hope conversations between humans and machines can lead to better judgments of AI behaviour, allowing people to align and improve systems that might be too complex to understand without machine help.<\/p>\n<p data-block-key=\"e06hj\">Eager to explore a conversational path to safe AGI? 
We\u2019re <a href=\"https:\/\/boards.greenhouse.io\/deepmind\/jobs\/4187868?t=bbda0eea1us\" rel=\"noopener\" target=\"_blank\">currently hiring research scientists<\/a> for our Scalable Alignment team.<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/deepmind.google\/discover\/blog\/building-safer-dialogue-agents\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Research Published 22 September 2022 Authors The Sparrow team Training an AI to communicate in a way<\/p>\n","protected":false},"author":2,"featured_media":791,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":["post-790","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deepmind-ai"],"_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=790"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/790\/revisions"}],"predecessor-version":[{"id":2677,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/790\/revisions\/2677"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media\/791"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=790"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/tags?post=790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}