{"id":754,"date":"2023-10-27T16:49:51","date_gmt":"2023-10-27T16:49:51","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/10\/27\/an-early-warning-system-for-novel-ai-risks-2\/"},"modified":"2025-04-27T07:32:35","modified_gmt":"2025-04-27T07:32:35","slug":"an-early-warning-system-for-novel-ai-risks-2","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/10\/27\/an-early-warning-system-for-novel-ai-risks-2\/","title":{"rendered":"An early warning system for novel AI risks"},"content":{"rendered":"<div>\n<div class=\"article-cover article-cover--centered\">\n<div class=\"article-cover__header\">\n<p class=\"article-cover__eyebrow glue-label\">Research<\/p>\n<dl class=\"article-cover__meta\">\n<dt class=\"glue-visually-hidden\">Published<\/dt>\n<dd class=\"article-cover__date glue-label\">\n              <time datetime=\"2023-05-25\"><br \/>\n                25 May 2023<br \/>\n              <\/time>\n            <\/dd>\n<dt class=\"glue-visually-hidden\">Authors<\/dt>\n<dd class=\"article-cover__authors\">\n<p data-block-key=\"h9hk8\">Toby Shevlane<\/p>\n<\/dd>\n<\/dl>\n<\/div>\n<picture class=\"article-cover__image\"><source media=\"(min-width: 1024px)\" type=\"image\/webp\" width=\"1072\" height=\"603\" srcset=\"https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w1072-h603-n-nu-rw 1x, https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w2144-h1206-n-nu-rw 2x\"\/><source 
media=\"(min-width: 600px)\" type=\"image\/webp\" width=\"928\" height=\"522\" srcset=\"https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w928-h522-n-nu-rw 1x, https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w1856-h1044-n-nu-rw 2x\"\/><source type=\"image\/webp\" width=\"528\" height=\"297\" srcset=\"https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w528-h297-n-nu-rw 1x, https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w1056-h594-n-nu-rw 2x\"\/><img loading=\"lazy\" decoding=\"async\" alt=\"Abstract image of a sphere in the middle of twisted concentric circles in gradients of blue.\" height=\"603\" src=\"https:\/\/lh3.googleusercontent.com\/REkFCC8KEOAocMWBwcHOxKM6K2zRs_qpMeUhnmHYkkGSbPPCLRhPDluhoZzx2k6_b4XvgZmhUqeuko9BXZZIPLmGR1q4BycDjLuDFQ5G5FDYPKD0x08=w1072-h603-n-nu\" width=\"1072\"\/>\n    <\/picture>\n<\/p><\/div>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"fjpl8\"><b>New research proposes a framework for evaluating general-purpose models against novel threats<\/b><\/p>\n<p data-block-key=\"8m5oi\">To pioneer responsibly at the cutting edge of artificial intelligence (AI) research, we must identify new capabilities and novel risks in our AI systems as early as possible.<\/p>\n<p data-block-key=\"864uw\">AI researchers already use a range of <a href=\"https:\/\/crfm.stanford.edu\/helm\/latest\/\" rel=\"noopener\" target=\"_blank\">evaluation benchmarks<\/a> to identify unwanted behaviours in AI systems, such as AI systems making misleading statements, biased decisions, or repeating copyrighted content. 
Now, as the AI community builds and deploys increasingly powerful AI, we must expand the evaluation portfolio to include the possibility of <i>extreme risks<\/i> from general-purpose AI models that have strong skills in manipulation, deception, cyber-offense, or other dangerous capabilities.<\/p>\n<p data-block-key=\"59ukc\">In our <a href=\"https:\/\/arxiv.org\/abs\/2305.15324\" rel=\"noopener\" target=\"_blank\">latest paper<\/a>, we introduce a framework for evaluating these novel threats, co-authored with colleagues from University of Cambridge, University of Oxford, University of Toronto, Universit\u00e9 de Montr\u00e9al, OpenAI, Anthropic, Alignment Research Center, Centre for Long-Term Resilience, and Centre for the Governance of AI.<\/p>\n<p data-block-key=\"zqxyc\">Model safety evaluations, including those assessing extreme risks, will be a critical component of safe AI development and deployment.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"xplrm\">An overview of our proposed approach: To assess extreme risks from new, general-purpose AI systems, developers must evaluate for dangerous capabilities and alignment (see below). By identifying the risks early on, this will unlock opportunities to be more responsible when training new AI systems, deploying these AI systems, transparently describing their risks, and applying appropriate cybersecurity standards.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"i21yy\">Evaluating for extreme risks<\/h2>\n<p data-block-key=\"vfjg4\">General-purpose models typically learn their capabilities and behaviours during training. However, existing methods for steering the learning process are imperfect. 
For example, <a href=\"https:\/\/deepmind.google\/discover\/blog\/how-undesired-goals-can-arise-with-correct-rewards\/\">previous research<\/a> at Google DeepMind has explored how AI systems can learn to pursue undesired goals even when we correctly reward them for good behaviour.<\/p>\n<p data-block-key=\"g01c2\">Responsible AI developers must look ahead and anticipate possible future developments and novel risks. After continued progress, future general-purpose models may learn a variety of dangerous capabilities by default. For instance, it is plausible (though uncertain) that future AI systems will be able to conduct offensive cyber operations, skilfully deceive humans in dialogue, manipulate humans into carrying out harmful actions, design or acquire weapons (e.g. biological, chemical), fine-tune and operate other high-risk AI systems on cloud computing platforms, or assist humans with any of these tasks.<\/p>\n<p data-block-key=\"vz0qq\">People with malicious intentions accessing such models could <a href=\"https:\/\/maliciousaireport.com\/\" rel=\"noopener\" target=\"_blank\">misuse<\/a> their capabilities. Or, due to failures of alignment, these AI models might take harmful actions even without anybody intending this.<\/p>\n<p data-block-key=\"zstuw\">Model evaluation helps us identify these risks ahead of time. Under our framework, AI developers would use model evaluation to uncover:<\/p>\n<ol>\n<li data-block-key=\"okgsr\">To what extent a model has certain \u2018dangerous capabilities\u2019 that could be used to threaten security, exert influence, or evade oversight.<\/li>\n<li data-block-key=\"anuqf\">To what extent the model is prone to applying its capabilities to cause harm (i.e. the model\u2019s alignment). 
Alignment evaluations should confirm that the model behaves as intended even across a very wide range of scenarios, and, where possible, should examine the model\u2019s internal workings.<\/li>\n<\/ol>\n<p data-block-key=\"65fra\">Results from these evaluations will help AI developers to understand whether the ingredients sufficient for extreme risk are present. The most high-risk cases will involve multiple dangerous capabilities combined together. The AI system doesn\u2019t need to provide all the ingredients, as shown in this diagram:<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"beyfe\">Ingredients for extreme risk: Sometimes specific capabilities could be outsourced, either to humans (e.g. to users or crowdworkers) or other AI systems. These capabilities must be applied for harm, either due to misuse or failures of alignment (or a mixture of both).<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"oe146\">A rule of thumb: the AI community should treat an AI system as highly dangerous if it has a capability profile sufficient to cause extreme harm, <i>assuming<\/i> it\u2019s misused or poorly aligned. 
To deploy such a system in the real world, an AI developer would need to demonstrate an unusually high standard of safety.<\/p>\n<h2 data-block-key=\"ijhwg\">Model evaluation as critical governance infrastructure<\/h2>\n<p data-block-key=\"ynp91\">If we have better tools for identifying which models are risky, companies and regulators can better ensure:<\/p>\n<ol>\n<li data-block-key=\"fl1ok\"><b>Responsible training:<\/b> Responsible decisions are made about whether and how to train a new model that shows early signs of risk.<\/li>\n<li data-block-key=\"okwja\"><b>Responsible deployment:<\/b> Responsible decisions are made about whether, when, and how to deploy potentially risky models.<\/li>\n<li data-block-key=\"50q3y\"><b>Transparency:<\/b> Useful and actionable information is reported to stakeholders, to help them prepare for or mitigate potential risks.<\/li>\n<li data-block-key=\"35i45\"><b>Appropriate security:<\/b> Strong information security controls and systems are applied to models that might pose extreme risks.<\/li>\n<\/ol>\n<p data-block-key=\"75y7t\">We have developed a blueprint for how model evaluations for extreme risks should feed into important decisions around training and deploying a highly capable, general-purpose model. 
The developer conducts evaluations throughout, and grants <a href=\"https:\/\/www.governance.ai\/post\/sharing-powerful-ai-models\" rel=\"noopener\" target=\"_blank\">structured model access<\/a> to external safety researchers and <a href=\"https:\/\/arxiv.org\/abs\/2302.08500\" rel=\"noopener\" target=\"_blank\">model auditors<\/a> so they can conduct <a href=\"https:\/\/arxiv.org\/abs\/2206.04737\" rel=\"noopener\" target=\"_blank\">additional evaluations<\/a>. The evaluation results can then inform risk assessments before model training and deployment.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"g7ru8\">A blueprint for embedding model evaluations for extreme risks into important decision-making processes throughout model training and deployment.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"023ia\">Looking ahead<\/h2>\n<p data-block-key=\"tkpoe\">Important <a href=\"https:\/\/evals.alignment.org\/blog\/2023-03-18-update-on-recent-evals\/\" rel=\"noopener\" target=\"_blank\">early<\/a> <a href=\"https:\/\/cdn.openai.com\/papers\/gpt-4-system-card.pdf\" rel=\"noopener\" target=\"_blank\">work<\/a> on model evaluations for extreme risks is already underway at Google DeepMind and elsewhere. 
But much more progress \u2013 both technical and institutional \u2013 is needed to build an evaluation process that catches all possible risks and helps safeguard against future, emerging challenges.<\/p>\n<p data-block-key=\"pv136\">Model evaluation is not a panacea; some risks could slip through the net, for example, because they depend too heavily on factors external to the model, such as <a href=\"https:\/\/www.lawfareblog.com\/thinking-about-risks-ai-accidents-misuse-and-structure\" rel=\"noopener\" target=\"_blank\">complex social, political, and economic forces<\/a> <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3287560.3287598\" rel=\"noopener\" target=\"_blank\">in society<\/a>. Model evaluation must be combined with other risk assessment tools and a wider dedication to safety across industry, government, and civil society.<\/p>\n<p data-block-key=\"bptwh\"><a href=\"https:\/\/blog.google\/technology\/ai\/a-policy-agenda-for-responsible-ai-progress-opportunity-responsibility-security\/\" rel=\"noopener\" target=\"_blank\">Google&#8217;s recent blog on responsible AI<\/a> states that, \u201cindividual practices, shared industry standards, and sound government policies would be essential to getting AI right\u201d. 
We hope many others working in AI and sectors impacted by this technology will come together to create approaches and standards for safely developing and deploying AI for the benefit of all.<\/p>\n<p data-block-key=\"9zi6e\">We believe that having processes for tracking the emergence of risky properties in models, and for adequately responding to concerning results, is a critical part of being a responsible developer operating at the frontier of AI capabilities.<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/deepmind.google\/discover\/blog\/an-early-warning-system-for-novel-ai-risks\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Research Published 25 May 2023 Authors Toby Shevlane New research proposes a framework for evaluating general-purpose models<\/p>\n","protected":false},"author":2,"featured_media":755,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":["post-754","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deepmind-ai"],"_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=754"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/754\/revisions"}],"predecessor-version":[{"id":2697,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/754\/revisions\/2697"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media\/755"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=754"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/tags?post=754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}