{"id":502,"date":"2023-04-14T15:12:41","date_gmt":"2023-04-14T15:12:41","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/04\/14\/beyond-automatic-differentiation-google-ai-blog\/"},"modified":"2025-04-27T07:33:42","modified_gmt":"2025-04-27T07:33:42","slug":"beyond-automatic-differentiation-google-ai-blog","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/04\/14\/beyond-automatic-differentiation-google-ai-blog\/","title":{"rendered":"Beyond automatic differentiation \u2013 Google AI Blog"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div id=\"post-body-5107802027865733485\">\n<span class=\"byline-author\">Posted by Matthew Streeter, Software Engineer, Google Research<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgeF_4QxTZliBxybI6a8fpfDt-IC6AG1GkRjM6mQqSLQ64RQTP5BJYq8WcGkzss_Ft-5QF7nU36hT8CahVaq04fVThFV_zWbgbfutAzQU3g8yoghkEmb603znIRbX5jT74LQxBKQ1qKp6sx9GTKVEgcHaWJOpmiipwcuUj6FXC1_hCbqHHX05yMA_SaLw\/s660\/AutoBound-Crop.gif\" style=\"display: none;\"\/><\/p>\n<p>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Derivative\">Derivatives<\/a> play a central role in optimization and machine learning. By locally approximating a <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/descending-into-ml\/training-and-loss\">training loss<\/a>, derivatives guide an optimizer toward lower values of the loss. Automatic differentiation frameworks such as <a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a>, <a href=\"https:\/\/pytorch.org\/\">PyTorch<\/a>, and <a href=\"https:\/\/github.com\/google\/jax\">JAX<\/a> are an essential part of modern machine learning, making it feasible to use gradient-based optimizers to train very complex models.\n<\/p>\n<p><a name=\"more\"\/> <\/p>\n<p>\nBut are derivatives all we need? 
By themselves, derivatives only tell us how a function behaves on an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Infinitesimal\">infinitesimal<\/a> scale. To use derivatives effectively, we often need to know more than that. For example, to choose a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Learning_rate\">learning rate<\/a> for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">gradient descent<\/a>, we need to know something about how the loss function behaves over a small but <em>finite<\/em> window. A finite-scale analogue of automatic differentiation, if it existed, could help us make such choices more effectively and thereby speed up training.\n<\/p>\n<p>\nIn our new paper &#8220;<a href=\"https:\/\/arxiv.org\/pdf\/2212.11429.pdf\">Automatically Bounding The Taylor Remainder Series: Tighter Bounds and New Applications<\/a>&#8221;, we present an algorithm called AutoBound that computes polynomial upper and lower bounds on a given function, which are valid over a user-specified <a href=\"https:\/\/en.wikipedia.org\/wiki\/Interval_(mathematics)\">interval<\/a>. We then begin to explore AutoBound&#8217;s applications. Notably, we present a meta-optimizer called SafeRate that uses the upper bounds computed by AutoBound to derive learning rates that are guaranteed to monotonically reduce a given loss function, without the need for time-consuming hyperparameter tuning.\u00a0We are also making AutoBound available as an <a href=\"https:\/\/github.com\/google\/autobound\">open-source library<\/a>.<\/p>\n<p><\/p>\n<h2>The AutoBound algorithm<\/h2>\n<p>\nGiven a function <code><em>f<\/em><\/code> and a reference point <code><em>x<sub>0<\/sub><\/em><\/code>, AutoBound computes polynomial upper and lower bounds on <code><em>f<\/em><\/code> that hold over a user-specified interval called a <em>trust region<\/em>. 
Like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Taylor_series\">Taylor polynomials<\/a>, the bounding polynomials are equal to <code><em>f<\/em><\/code> at <code><em>x<sub>0<\/sub><\/em><\/code>. The bounds become tighter as the trust region shrinks, and approach the corresponding Taylor polynomial as the trust region width approaches zero.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEjb0WLo5_NiobuGRFacHeB-xYdA4i6kQEFF0BDeF5FkXuqHHTX-ddE1qzK1ekOgtOwu8c3JHknR8FFU14sgobiTj5tx23yn183Yf8KP6ca5O5y6nKRmgaYi-2Vs8ZL4Gt9kxITBlmNSc6Zmm3UAamc84Ry_TR1GqIiexxrkHVCREuKSh9Y5GT7cIBDGZQ\/s3091\/taylor_enclosure_animation.gif\" style=\"margin-left: auto; margin-right: auto;\"><img loading=\"lazy\" decoding=\"async\" border=\"0\" data-original-height=\"2217\" data-original-width=\"3091\" height=\"459\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEjb0WLo5_NiobuGRFacHeB-xYdA4i6kQEFF0BDeF5FkXuqHHTX-ddE1qzK1ekOgtOwu8c3JHknR8FFU14sgobiTj5tx23yn183Yf8KP6ca5O5y6nKRmgaYi-2Vs8ZL4Gt9kxITBlmNSc6Zmm3UAamc84Ry_TR1GqIiexxrkHVCREuKSh9Y5GT7cIBDGZQ\/w640-h459\/taylor_enclosure_animation.gif\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Automatically-derived quadratic upper and lower bounds on a one-dimensional function f, centered at x<sub>0<\/sub>=0.5. The upper and lower bounds are valid over a user-specified trust region, and become tighter as the trust region shrinks.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nLike automatic differentiation, AutoBound can be applied to any function that can be implemented using standard mathematical operations. 
In fact, AutoBound is a generalization of <a href=\"https:\/\/openreview.net\/pdf?id=SkxEF3FNPH\">Taylor mode automatic differentiation<\/a>, and is equivalent to it in the special case where the trust region has a width of zero.\n<\/p>\n<p>\nTo derive the AutoBound algorithm, there were two main challenges we had to address:\n<\/p>\n<ol>\n<li>We had to derive polynomial upper and lower bounds for various elementary functions, given an arbitrary reference point and arbitrary trust region.\n<\/li>\n<li>We had to come up with an analogue of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Chain_rule\">chain rule<\/a> for combining these bounds.\n<\/li>\n<\/ol>\n<p><\/p>\n<h2>Bounds for elementary functions<\/h2>\n<p>\nFor a variety of commonly-used functions, we derive <em>optimal<\/em> polynomial upper and lower bounds in closed form. In this context, &#8220;optimal&#8221; means the bounds are as tight as possible, among all polynomials where only the maximum-<a href=\"https:\/\/en.wikipedia.org\/wiki\/Degree_of_a_polynomial\">degree coefficient<\/a> differs from the Taylor series. Our theory applies to elementary functions, such as <code><em>exp<\/em><\/code> and <code><em>log<\/em><\/code>, and common neural network activation functions, such as <code><a href=\"https:\/\/www.cs.toronto.edu\/~hinton\/absps\/reluICML.pdf\">ReLU<\/a><\/code> and <code><a href=\"https:\/\/arxiv.org\/pdf\/1710.05941.pdf\">Swish<\/a><\/code>. 
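For a concrete sense of what such bounds look like, here is a minimal, self-contained sketch (not the AutoBound implementation, and not its closed-form optimal coefficients) that builds valid quadratic upper and lower bounds on exp from its endpoint divided differences, in the same setting as the figure below (x0 = 0.5, trust region [0, 2]). The function name and construction are illustrative assumptions.

```python
import math

def exp_quadratic_bounds(x0, a, b):
    """Quadratic enclosure of exp on [a, b], tight at x0 (a hypothetical helper).

    Uses the second-order divided difference
        R(x) = (exp(x) - exp(x0) - exp(x0)*(x - x0)) / (x - x0)**2,
    which is increasing for exp, so its range over [a, b] is [R(a), R(b)].
    """
    def R(x):
        return (math.exp(x) - math.exp(x0) - math.exp(x0) * (x - x0)) / (x - x0) ** 2
    lo, hi = R(a), R(b)          # interval for the quadratic coefficient
    c0, c1 = math.exp(x0), math.exp(x0)  # value and slope of exp at x0
    lower = lambda x: c0 + c1 * (x - x0) + lo * (x - x0) ** 2
    upper = lambda x: c0 + c1 * (x - x0) + hi * (x - x0) ** 2
    return lower, upper

lower, upper = exp_quadratic_bounds(0.5, 0.0, 2.0)
# Numerically verify lower(x) <= exp(x) <= upper(x) on the trust region.
for i in range(201):
    x = i * 0.01
    assert lower(x) - 1e-9 <= math.exp(x) <= upper(x) + 1e-9
```

Both bounds agree with exp at x0, the lower bound touches exp at the left endpoint, and the upper bound touches it at the right endpoint, mirroring the tightness visible in the figure.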
It builds upon and generalizes <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3129714\/pdf\/nihms90438.pdf\">earlier work<\/a> that applied only to quadratic bounds, and only for an unbounded trust region.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEh5WY6ZTX0Rpwa4dpCoa4pyu5nFBoGDFnDqpQrVqHkOmTAAXivb7AhueTYHGwWFJLUYuvP6daRXsp8A7k_U1kJ8OaSHhW3TyIFAvNeoIOki9ek9GY95DXAfGzZXLqsKlHeb2_8Z4yHMBx1kPYJx4LXU_mQQO0u0JOLgWHaOZqYflTrKd6n9Z4NouU5wAQ\/s1999\/image7.png\" style=\"margin-left: auto; margin-right: auto;\"><img loading=\"lazy\" decoding=\"async\" border=\"0\" data-original-height=\"1446\" data-original-width=\"1999\" height=\"463\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEh5WY6ZTX0Rpwa4dpCoa4pyu5nFBoGDFnDqpQrVqHkOmTAAXivb7AhueTYHGwWFJLUYuvP6daRXsp8A7k_U1kJ8OaSHhW3TyIFAvNeoIOki9ek9GY95DXAfGzZXLqsKlHeb2_8Z4yHMBx1kPYJx4LXU_mQQO0u0JOLgWHaOZqYflTrKd6n9Z4NouU5wAQ\/w640-h463\/image7.png\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Optimal quadratic upper and lower bounds on the exponential function, centered at x<sub>0<\/sub>=0.5 and valid over the interval [0, 2].<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/p>\n<h2>A new chain rule<\/h2>\n<p>\nTo compute upper and lower bounds for arbitrary functions, we derived a generalization of the chain rule that operates on polynomial bounds. To illustrate the idea, suppose we have a function that can be written as\n<\/p>\n<p style=\"text-align: center;\"><em>f(x) = g(h(x))<\/em>,<\/p>\n<p>\nand suppose we already have polynomial upper and lower bounds on <code><em>g<\/em><\/code> and <code><em>h<\/em><\/code>. 
How do we compute bounds on <code><em>f<\/em><\/code>?<\/p>\n<p>\nThe key turns out to be representing the upper and lower bounds for a given function as a <em>single<\/em> polynomial whose highest-degree coefficient is an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Interval_(mathematics)\">interval<\/a> rather than a scalar. We can then plug the bound for <code><em>h<\/em><\/code> into the bound for <code><em>g<\/em><\/code>, and convert the result back to a polynomial of the same form using <a href=\"https:\/\/epubs.siam.org\/doi\/book\/10.1137\/1.9780898717716\">interval arithmetic<\/a>. Under suitable assumptions about the trust region over which the bound on <code><em>g<\/em><\/code> holds, it can be shown that this procedure yields the desired bound on <code><em>f<\/em><\/code>.<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgXlrNk32-9fDM8A8ooF6TnLk3CNpNpo-QRl79dL9PXRNRa4O4l6gX2NxfBH9WxVFoG0Ks7R80E3-HxeORDu_OjSI446cntCytscoKTPODAHUU20h8hkXi3EV0mJLsrm4dCT5iROfgi5_2rLX0aM1S3oMREuYqKAB_yy9B2wo4aV2bY3qxPxzMIKS7c9Q\/s1999\/image3.png\" style=\"margin-left: auto; margin-right: auto;\"><img decoding=\"async\" border=\"0\" data-original-height=\"591\" data-original-width=\"1999\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgXlrNk32-9fDM8A8ooF6TnLk3CNpNpo-QRl79dL9PXRNRa4O4l6gX2NxfBH9WxVFoG0Ks7R80E3-HxeORDu_OjSI446cntCytscoKTPODAHUU20h8hkXi3EV0mJLsrm4dCT5iROfgi5_2rLX0aM1S3oMREuYqKAB_yy9B2wo4aV2bY3qxPxzMIKS7c9Q\/s16000\/image3.png\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">The interval polynomial chain rule applied to the functions h(x) = sqrt(x) and g(y) = exp(y), with x<sub>0<\/sub>=0.25 and trust region [0, 0.5].<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nOur chain rule applies 
not only to one-dimensional functions, but also to multivariate functions, such as matrix multiplications and convolutions.\n<\/p>\n<p><\/p>\n<h2>Propagating bounds<\/h2>\n<p>\nUsing our new chain rule, AutoBound propagates interval polynomial bounds through a computation graph from the inputs to the outputs, analogous to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Automatic_differentiation#Forward_accumulation\">forward-mode automatic differentiation<\/a>.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEjb3Iw5MwfRFRP6MUo8JdubaBYOMqZqbUI-Xjjla07M8_faGpdxsI_TfRnS3OZjrUGDQrie_mwPCroYO_2xKiutmjUCCSHSOKQHqI8tbEYDRiIE-gD4UmV252Z1IKFsR6wYRVZtT9X1bbHdxrJofymjtFTlm3lrTG8tR28hOI8wNmfUwl_v8DnGluTRYw\/s960\/image2.gif\" style=\"margin-left: auto; margin-right: auto;\"><img decoding=\"async\" border=\"0\" data-original-height=\"161\" data-original-width=\"960\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEjb3Iw5MwfRFRP6MUo8JdubaBYOMqZqbUI-Xjjla07M8_faGpdxsI_TfRnS3OZjrUGDQrie_mwPCroYO_2xKiutmjUCCSHSOKQHqI8tbEYDRiIE-gD4UmV252Z1IKFsR6wYRVZtT9X1bbHdxrJofymjtFTlm3lrTG8tR28hOI8wNmfUwl_v8DnGluTRYw\/s16000\/image2.gif\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Forward propagation of interval polynomial bounds for the function f(x) = exp(sqrt(x)). We first compute (trivial) bounds on x, then use the chain rule to compute bounds on sqrt(x) and exp(sqrt(x)).<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nTo compute bounds on a function <code><em>f(x)<\/em><\/code>, AutoBound requires memory proportional to the dimension of <code><em>x<\/em><\/code>. For this reason, practical applications apply AutoBound to functions with a small number of inputs. 
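The propagation in the figure above can be sketched end-to-end for the caption's example, using degree-1 interval polynomials, i.e. enclosures of the form c + [lo, hi]·(x − x0). This is a simplified illustration, not AutoBound's implementation or API; it exploits the fact that the first-order divided differences of sqrt and exp are monotone (so their extremes over an interval occur at its endpoints), an assumption specific to these two functions.

```python
import math

def dd(f, y0):
    """Divided difference x -> (f(x) - f(y0)) / (x - y0)."""
    return lambda x: (f(x) - f(y0)) / (x - y0)

def interval_mul(i, j):
    """Product of two intervals, as (min, max) of the endpoint products."""
    p = [u * v for u in i for v in j]
    return (min(p), max(p))

x0, (a, b) = 0.25, (0.0, 0.5)

# Step 1: enclose h(x) = sqrt(x) as sqrt(x0) + I_h * (x - x0).
# The divided difference of sqrt is monotone, so its range over [a, b]
# is spanned by the endpoint values.
s = dd(math.sqrt, x0)
I_h = tuple(sorted((s(a), s(b))))

# Step 2: enclose g(y) = exp(y) over y in h([a, b]) = [sqrt(a), sqrt(b)],
# centered at y0 = sqrt(x0); the divided difference of exp is increasing.
y0 = math.sqrt(x0)
t = dd(math.exp, y0)
I_g = (t(math.sqrt(a)), t(math.sqrt(b)))

# Step 3 (interval polynomial chain rule): substitute the bound on h into
# the bound on g, multiplying the interval coefficients.
# Then f(x) = exp(sqrt(x)) lies in exp(y0) + I_f * (x - x0).
I_f = interval_mul(I_g, I_h)

# Numerically verify the enclosure over the trust region.
for k in range(101):
    x = a + k * (b - a) / 100
    fx = math.exp(math.sqrt(x))
    lo = math.exp(y0) + min(I_f[0] * (x - x0), I_f[1] * (x - x0))
    hi = math.exp(y0) + max(I_f[0] * (x - x0), I_f[1] * (x - x0))
    assert lo - 1e-9 <= fx <= hi + 1e-9
```

The same pattern, carried out with quadratic (or higher-degree) interval polynomials and closed-form coefficient ranges, is what the forward pass in the figure performs.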
However, as we will see, this does not prevent us from using AutoBound for neural network optimization.<\/p>\n<p><\/p>\n<h2>Automatically deriving optimizers, and other applications<\/h2>\n<p>\nWhat can we do with AutoBound that we couldn&#8217;t do with automatic differentiation alone?\n<\/p>\n<p>\nAmong other things, AutoBound can be used to automatically derive problem-specific, hyperparameter-free optimizers that converge from any starting point. These optimizers iteratively reduce a loss by first using AutoBound to compute an upper bound on the loss that is tight at the current point, and then minimizing the upper bound to obtain the next point.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj2JJN-D-nFQ7Bl4BYySe3oe4YBFLAnqROlRvrZuRURTicKO8Luhti9I_kuJ1nJKQe4chKqldyddqGXQKBMB1ltXNWKu06SfDxUEjaG52pNCdgyhgNYpo8DQmGnlr5FoSVJSvZMbi0GtD1w60iQF6hV4eRky-SSmfFY2fCUWLKxTcVcMIwVc8T8QyA-6w\/s3074\/mm_animation.gif\" style=\"margin-left: auto; margin-right: auto;\"><img loading=\"lazy\" decoding=\"async\" border=\"0\" data-original-height=\"2197\" data-original-width=\"3074\" height=\"457\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj2JJN-D-nFQ7Bl4BYySe3oe4YBFLAnqROlRvrZuRURTicKO8Luhti9I_kuJ1nJKQe4chKqldyddqGXQKBMB1ltXNWKu06SfDxUEjaG52pNCdgyhgNYpo8DQmGnlr5FoSVJSvZMbi0GtD1w60iQF6hV4eRky-SSmfFY2fCUWLKxTcVcMIwVc8T8QyA-6w\/w640-h457\/mm_animation.gif\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Minimizing a one-dimensional logistic regression loss using quadratic upper bounds derived automatically by AutoBound.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nOptimizers that use upper bounds in this way are called <a 
href=\"https:\/\/epubs.siam.org\/doi\/book\/10.1137\/1.9781611974409\">majorization-minimization<\/a> (MM) optimizers. Applied to one-dimensional logistic regression, AutoBound rederives an MM optimizer <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3129714\/\">first published in 2009<\/a>. Applied to more complex problems, AutoBound derives novel MM optimizers that would be difficult to derive by hand.\n<\/p>\n<p>\nWe can use a similar idea to take an existing optimizer such as <a href=\"https:\/\/arxiv.org\/pdf\/1412.6980.pdf\">Adam<\/a> and convert it to a hyperparameter-free optimizer that is guaranteed to monotonically reduce the loss (in the full-batch setting). The resulting optimizer uses the same update direction as the original optimizer, but modifies the learning rate by minimizing a one-dimensional quadratic upper bound derived by AutoBound. We refer to the resulting meta-optimizer as SafeRate.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgutnTe8tbfFqdlv7G1CWmk5LgRRVRIuw2veGSrIOv3N7k-dKGCWZMmIb8n3drlkWcZSc2dcwd2-BmxCm-9I7y5q3lAcCkB17wC2nMrIeSbZeQeck6P2W_bu4VvQ38QkVKr5CDB62pwIV0zEGmG_n_r8oY9vnv92Jv-Dh-ZS3Qb_20QV2rQTEdmRH1_pg\/s1656\/image8.png\" style=\"margin-left: auto; margin-right: auto;\"><img loading=\"lazy\" decoding=\"async\" border=\"0\" data-original-height=\"1094\" data-original-width=\"1656\" height=\"423\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgutnTe8tbfFqdlv7G1CWmk5LgRRVRIuw2veGSrIOv3N7k-dKGCWZMmIb8n3drlkWcZSc2dcwd2-BmxCm-9I7y5q3lAcCkB17wC2nMrIeSbZeQeck6P2W_bu4VvQ38QkVKr5CDB62pwIV0zEGmG_n_r8oY9vnv92Jv-Dh-ZS3Qb_20QV2rQTEdmRH1_pg\/w640-h423\/image8.png\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Performance of 
SafeRate when used to train a single-hidden-layer neural network on a subset of the <a href=\"https:\/\/yann.lecun.com\/exdb\/mnist\/\">MNIST<\/a> dataset, in the full-batch setting.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nUsing SafeRate, we can create more robust variants of existing optimizers, at the cost of a single additional forward pass that increases the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Elapsed_real_time\">wall time<\/a> for each step by a small factor (about 2x in the example above).\n<\/p>\n<p>\nIn addition to the applications just discussed, AutoBound can be used for verified <a href=\"https:\/\/en.wikipedia.org\/wiki\/Numerical_integration\">numerical integration<\/a> and to automatically prove sharper versions of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Jensen%27s_inequality\">Jensen&#8217;s inequality<\/a>, a fundamental mathematical inequality used frequently in statistics and other fields.\n<\/p>\n<p><\/p>\n<h2>Improvement over classical bounds<\/h2>\n<p>\nBounding the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Taylor%27s_theorem\">Taylor remainder term<\/a> automatically is not a new idea. 
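The majorization-minimization recipe from the previous section, bound the loss above, then minimize the bound, can be sketched for one-dimensional logistic regression. For simplicity this sketch uses a crude global curvature bound as the majorizer rather than the tighter bounds AutoBound derives, and the data values are made up for illustration.

```python
import math

def logistic_loss(w, xs, ys):
    # mean_i log(1 + exp(-y_i * x_i * w))
    return sum(math.log(1 + math.exp(-y * x * w)) for x, y in zip(xs, ys)) / len(xs)

def logistic_grad(w, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += -y * x / (1 + math.exp(y * x * w))
    return total / len(xs)

xs = [0.5, 2.0, -1.0, 1.5]   # made-up 1-d features
ys = [1, 1, -1, -1]          # labels in {-1, +1}

# Global curvature bound: each term's second derivative is
# x**2 * s * (1 - s) <= x**2 / 4, so loss''(w) <= C for all w.
C = sum(x * x for x in xs) / (4 * len(xs))

# MM loop: minimize the quadratic majorizer
#   loss(w0) + loss'(w0) * d + (C / 2) * d**2,  i.e.  w <- w0 - loss'(w0) / C
w, losses = 0.0, [logistic_loss(0.0, xs, ys)]
for _ in range(20):
    w -= logistic_grad(w, xs, ys) / C
    losses.append(logistic_loss(w, xs, ys))

# Each majorizer is tight at the current point, so the loss never increases.
assert all(b <= a + 1e-12 for a, b in zip(losses, losses[1:]))
```

Because the majorizer touches the loss at the current iterate and lies above it everywhere, every step is guaranteed not to increase the loss, with no learning rate to tune; AutoBound's bounds play the role of the constant C, but adapt to the current point and trust region.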
A classical technique produces degree <code><em>k<\/em><\/code> polynomial bounds on a function <code><em>f<\/em><\/code> that are valid over a trust region <code>[<em>a<\/em>, <em>b<\/em>]<\/code> by first computing an expression for the <code><em>k<\/em><\/code>th derivative of <code><em>f<\/em><\/code> (using automatic differentiation), then evaluating this expression over <code>[<em>a<\/em>,<em>b<\/em>]<\/code> using interval arithmetic.<\/p>\n<p>\nWhile elegant, this approach has some inherent limitations that can lead to very loose bounds, as illustrated by the dotted blue lines in the figure below.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgENtTpbXUtRpnOVihpobiGYrw4L_jEUASrWaRvGcPBgTk-HLuy2_gUevqyhtmDMpELAmjYaszDdniJRndog7u8zH-o0Sjp3OrWF5lB4q1nOcyYay8AY2mgE9xGK3ZwRsAfARybfzzO7o2SFIcSWz66Ghyw66tWQYf8rLwxBoG1Zkkagm3I8tQVweYZTg\/s1510\/image5.jpg\" style=\"margin-left: auto; margin-right: auto;\"><img loading=\"lazy\" decoding=\"async\" border=\"0\" data-original-height=\"1107\" data-original-width=\"1510\" height=\"469\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgENtTpbXUtRpnOVihpobiGYrw4L_jEUASrWaRvGcPBgTk-HLuy2_gUevqyhtmDMpELAmjYaszDdniJRndog7u8zH-o0Sjp3OrWF5lB4q1nOcyYay8AY2mgE9xGK3ZwRsAfARybfzzO7o2SFIcSWz66Ghyw66tWQYf8rLwxBoG1Zkkagm3I8tQVweYZTg\/w640-h469\/image5.jpg\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Quadratic upper and lower bounds on the loss of a multi-layer <a href=\"https:\/\/en.wikipedia.org\/wiki\/Perceptron\">perceptron<\/a> with two hidden layers, as a function of the initial learning rate. 
The bounds derived by AutoBound are much tighter than those obtained using interval arithmetic evaluation of the second derivative.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/p>\n<h2>Looking forward<\/h2>\n<p>\nTaylor polynomials have been in use for over three hundred years, and are omnipresent in numerical optimization and scientific computing. Nevertheless, Taylor polynomials have significant limitations, which can limit the capabilities of algorithms built on top of them. Our work is part of a growing literature that recognizes these limitations and seeks to develop a new foundation upon which more robust algorithms can be built.\n<\/p>\n<p>\nOur experiments so far have only scratched the surface of what is possible using AutoBound, and we believe it has many applications we have not discovered. To encourage the research community to explore such possibilities, we have made AutoBound available as an open-source library built on top of <a href=\"https:\/\/github.com\/google\/jax\">JAX<\/a>. To get started, visit our <a href=\"https:\/\/github.com\/google\/autobound\">GitHub repo<\/a>.<\/p>\n<p><\/p>\n<h2>Acknowledgements<\/h2>\n<p>\n<em>This post is based on joint work with Josh Dillon. 
We thank Alex Alemi and Sergey Ioffe for valuable feedback on an earlier draft of the post.<\/em>\n<\/p>\n<\/div>\n<p><a href=\"http:\/\/ai.googleblog.com\/2023\/04\/beyond-automatic-differentiation.html\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Posted by Matthew Streeter, Software Engineer, Google Research Derivatives play a central role in optimization and machine<\/p>\n","protected":false},"author":2,"featured_media":503,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-502","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-google-ai"],"_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/502","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=502"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/502\/revisions"}],"predecessor-version":[{"id":2821,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/502\/revisions\/2821"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media\/503"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=502"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=502"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v
2\/tags?post=502"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}