{"id":478,"date":"2023-03-28T19:45:18","date_gmt":"2023-03-28T19:45:18","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/03\/28\/leveraging-transfer-learning-for-large-scale-differentially-private-image-classification-google-ai-blog\/"},"modified":"2025-04-27T07:33:50","modified_gmt":"2025-04-27T07:33:50","slug":"leveraging-transfer-learning-for-large-scale-differentially-private-image-classification-google-ai-blog","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/03\/28\/leveraging-transfer-learning-for-large-scale-differentially-private-image-classification-google-ai-blog\/","title":{"rendered":"Leveraging transfer learning for large scale differentially private image classification \u2013 Google AI Blog"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div id=\"post-body-8499243478892035015\">\n<span class=\"byline-author\">Posted by Harsh Mehta, Software Engineer, and Walid Krichene, Research Scientist, Google Research<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEiLQvL5pFQZ3D2NH8DXg_h7vPTF7IGiscC9zSISa7WCuX2GRvsrySJRcVsT1WWxhHMF72zmbOvWc_0wElk3bXpmVSyz4k0IVhONETBaLzTOI-C5Wrf-Q2Qk0TkPcHUy6w2-9mJChJLnVIy2RdKEgQE9eOg3bzbOhbCYgXTfWlufh7q4uInlW6f3KqCHvQ\/s1100\/DPTransfer.png\" style=\"display: none;\"\/><\/p>\n<p>\nLarge deep learning models are becoming the workhorse of a variety of critical machine learning (ML) tasks. However, it has been shown that without any protection it is plausible for bad actors to attack a variety of models, across modalities, to reveal information from individual training examples. As such, it\u2019s essential to protect against this sort of information leakage.\n<\/p>\n<p><a name=\"more\"\/><\/p>\n<p>\n<a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/11681878_14\">Differential privacy<\/a> (DP) provides formal protection against an attacker who aims to extract information about the training data. 
The most popular method for DP training in deep learning is <a href=\"https:\/\/arxiv.org\/abs\/1607.00133\">differentially private stochastic gradient descent<\/a> (DP-SGD). The core recipe implements a common theme in DP: \u201cfuzzing\u201d an algorithm\u2019s outputs with noise to obscure the contributions of any individual input.\n<\/p>\n<p>\nIn practice, <a href=\"https:\/\/ai.googleblog.com\/2022\/02\/applying-differential-privacy-to-large.html\">DP training<\/a> can be very expensive or even ineffective for very large models. Not only does the computational cost typically increase when requiring privacy guarantees, but the noise also increases proportionally. Given these challenges, there has recently been much interest in developing methods that enable <em><a href=\"https:\/\/arxiv.org\/abs\/2010.09063\">efficient<\/a><\/em> DP training. The goal is to develop simple and practical methods for producing high-quality large-scale private models.\n<\/p>\n<p>\nThe <a href=\"https:\/\/image-net.org\/index.php\">ImageNet classification benchmark<\/a> is an effective test bed for this goal because 1) it is a challenging task even in the non-private setting, one that requires sufficiently large models to successfully classify large numbers of varied images, and 2) it is a public, open-source dataset, which other researchers can access and use for collaboration. With this approach, researchers can simulate a practical situation where a large model must be trained on private data with DP guarantees.\n<\/p>\n<p>\nTo that end, today we discuss improvements we\u2019ve made in training high-utility, large-scale private models. First, in \u201c<a href=\"https:\/\/arxiv.org\/abs\/2205.02973\">Large-Scale Transfer Learning for Differentially Private Image Classification<\/a>\u201d, we share strong results on the challenging task of image classification on the ImageNet-1k dataset with DP constraints. 
We show that with a combination of large-scale <a href=\"https:\/\/www.wikiwand.com\/en\/Transfer_learning\">transfer learning<\/a> and carefully chosen <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperparameter_optimization\">hyperparameters<\/a> it is indeed possible to significantly reduce the gap between private and non-private performance even on challenging tasks and high-dimensional models. Then in \u201c<a href=\"https:\/\/arxiv.org\/abs\/2211.13403\">Differentially Private Image Classification from Features<\/a>\u201d, we further show that privately fine-tuning just the last layer of a pre-trained model with more advanced optimization algorithms improves performance even further, leading to new state-of-the-art DP results across a variety of popular image classification benchmarks, including ImageNet-1k. To encourage further development in this direction and enable other researchers to verify our findings, we are also releasing the associated <a href=\"https:\/\/github.com\/google-research\/google-research\/tree\/master\/dp_transfer\">source code<\/a>.\n<\/p>\n<h2>Transfer learning <em>and<\/em> differential privacy<\/h2>\n<p>\nThe main idea behind transfer learning is to reuse the knowledge gained from solving one problem and then apply it to a related problem. This is especially useful when there is limited or low-quality data available for the target problem, as it allows us to leverage the knowledge gained from a larger and more diverse public dataset.\n<\/p>\n<p>\nIn the context of DP, transfer learning has emerged <a href=\"https:\/\/ai.googleblog.com\/2022\/02\/applying-differential-privacy-to-large.html\">as a promising technique to improve the accuracy of private models<\/a>, by leveraging knowledge learned from pre-training tasks. For example, if a model has already been trained on a large public dataset for a similar privacy-sensitive task, it can be fine-tuned on a smaller and more specific dataset for the target DP task. 
More specifically, one first pre-trains a model on a large dataset with no privacy concerns, and then privately fine-tunes the model on the sensitive dataset. In our work, we improve the effectiveness of DP transfer learning and illustrate it by simulating private training on publicly available datasets, namely ImageNet-1k, <a href=\"https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html\">CIFAR-100, and CIFAR-10<\/a>.\n<\/p>\n<h2>Better pre-training improves DP performance<\/h2>\n<p>\nTo start exploring how transfer learning can be effective for differentially private image classification tasks, we carefully examined hyperparameters affecting DP performance. Surprisingly, we found that with carefully chosen hyperparameters (e.g., initializing the last layer to zero and choosing large batch sizes), privately fine-tuning just the last layer of a pre-trained model yields significant improvements over the baseline. Training just the last layer also significantly improves the cost-utility ratio of training a high-quality image classification model with DP.\n<\/p>\n<p>\nAs shown below, we compare the performance on ImageNet of the best hyperparameter recommendations both with and without privacy and across a variety of model and pre-training dataset sizes. We find that scaling the model and using a larger pre-training dataset decreases the gap in accuracy coming from the addition of the privacy guarantee. Typically, privacy guarantees of a system are characterized by a positive parameter \u03b5, with smaller \u03b5 corresponding to better privacy. 
In the following figure, we use the privacy guarantee of \u03b5 = 10.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgeXZ6zumu9happIaEL1sUtVwo4w7fVejo2S4uodu5uQWN0M4fVbzNWGNd_FEMu9ceUnQlE8mAbJ4CdSsALfqULkONss2w4YyVh09gBHpJ-zAQhghJ2l6Rd_BSgDrflT7PYLFTNZy-UEzJpWsfm5H39PqeP8K9vR8X74uVve_tsQP4BRbYR1_bS2B8-Ng\/s840\/image1.png\" style=\"margin-left: auto; margin-right: auto;\"><img fetchpriority=\"high\" decoding=\"async\" border=\"0\" data-original-height=\"608\" data-original-width=\"840\" height=\"464\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEgeXZ6zumu9happIaEL1sUtVwo4w7fVejo2S4uodu5uQWN0M4fVbzNWGNd_FEMu9ceUnQlE8mAbJ4CdSsALfqULkONss2w4YyVh09gBHpJ-zAQhghJ2l6Rd_BSgDrflT7PYLFTNZy-UEzJpWsfm5H39PqeP8K9vR8X74uVve_tsQP4BRbYR1_bS2B8-Ng\/w640-h464\/image1.png\" width=\"640\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Comparing our best models with and without privacy on ImageNet across model and pre-training dataset sizes. The X-axis shows the different <a href=\"https:\/\/arxiv.org\/abs\/2010.11929\">Vision Transformer<\/a> models we used for this study in ascending order of model size from left to right. We used <a href=\"https:\/\/paperswithcode.com\/dataset\/jft-300m\">JFT-300M<\/a> to pretrain B\/16, L\/16 and H\/14 models, JFT-4B (a larger version of <a href=\"https:\/\/paperswithcode.com\/dataset\/jft-3b\">JFT-3B<\/a>) to pretrain H\/14-4b and <a href=\"https:\/\/paperswithcode.com\/dataset\/jft-3b\">JFT-3B<\/a> to pretrain G\/14-3b. We do this in order to study the effectiveness of jointly scaling the model and pre-training dataset (JFT-3B or 4B). 
The Y-axis shows the <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/precision-and-recall#precision\">Top-1 accuracy<\/a> on ImageNet-1k test set once the model is finetuned (in the private or non-private way) with the ImageNet-1k training set. We consistently see that the scaling of the model and the pre-training dataset size decreases the gap in accuracy coming from the addition of the privacy guarantee of \u03b5 = 10.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Better optimizers improve DP performance<\/h2>\n<p>\nSomewhat surprisingly, we found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies [<a href=\"https:\/\/ai.googleblog.com\/2022\/02\/applying-differential-privacy-to-large.html\">1<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2204.13650\">2<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2205.10683\">3<\/a>] largely relied on using first-order differentially private training algorithms like DP-SGD for training large models, in the specific case of privately learning just the last layer from features, we observe that computational burden is often low enough to allow for more sophisticated optimization schemes, including second-order methods (e.g., <a href=\"https:\/\/en.wikipedia.org\/wiki\/Newton%27s_method\">Newton<\/a> or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Quasi-Newton_method\">Quasi-Newton<\/a> methods), which can be more accurate but also more computationally expensive.\n<\/p>\n<p>\nIn \u201c<a href=\"https:\/\/arxiv.org\/abs\/2211.13403\">Differentially Private Image Classification from Features<\/a>\u201d, we systematically explore the effect of <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/descending-into-ml\/training-and-loss\">loss functions<\/a> and optimization algorithms. 
We find that while the commonly used <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\">logistic regression<\/a> performs better than <a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_regression\">linear regression<\/a> in the non-private setting, the situation is reversed in the private setting: least-squares linear regression is much more effective than logistic regression from both a privacy and a computational standpoint for the typical range of \u03b5 values ([1, 10]), and even more effective for stricter \u03b5 values (\u03b5 &lt; 1).\n<\/p>\n<p>\nWe further explore using DP Newton\u2019s method to solve <em>logistic<\/em> regression. We find that this is still outperformed by DP linear regression in the high-privacy regime. Indeed, Newton\u2019s method involves computing a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hessian\">Hessian<\/a> (a matrix that captures second-order information), and making this matrix differentially private requires adding far more noise in logistic regression than in linear regression, which has a highly structured Hessian.\n<\/p>\n<p>\nBuilding on this observation, we introduce a method that we call <em>differentially private SGD with feature covariance<\/em> (DP-FC), where we simply replace the Hessian in logistic regression with a privatized feature covariance. Since feature covariance only depends on the inputs (and neither on model parameters nor class labels), we are able to share it across classes and training iterations, thus greatly reducing the amount of noise that needs to be added to protect it. 
This allows us to combine the benefits of using logistic regression with the efficient privacy protection of linear regression, leading to an improved privacy-utility trade-off.\n<\/p>\n<p>\nWith DP-FC, we surpass previous state-of-the-art results considerably on three private image classification benchmarks, namely ImageNet-1k, CIFAR-10 and CIFAR-100, just by performing DP fine-tuning on features extracted from a powerful pre-trained model.\n<\/p>\n<table align=\"center\" cellpadding=\"0\" cellspacing=\"0\" class=\"tr-caption-container\" style=\"margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><a href=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj4Gp5Z0q5vYQkXhOoFLaZUt0ah5hQzlDhZD2nzN35pqFDArejojDHQTiKN8v4j9JMc8wSdyrZwUXlWCXX1ha4VuRLokuoLJPRGy570hqZ1YdUlx_PQE0OXo-uhv63XQneAcejswYE4F_Idfz3-kA1rU5Q9dGIy-4tGPKo1EEUm2hFaEtMCFMSSieN2YQ\/s1825\/image2.png\" style=\"margin-left: auto; margin-right: auto;\"><img decoding=\"async\" border=\"0\" data-original-height=\"521\" data-original-width=\"1825\" src=\"https:\/\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj4Gp5Z0q5vYQkXhOoFLaZUt0ah5hQzlDhZD2nzN35pqFDArejojDHQTiKN8v4j9JMc8wSdyrZwUXlWCXX1ha4VuRLokuoLJPRGy570hqZ1YdUlx_PQE0OXo-uhv63XQneAcejswYE4F_Idfz3-kA1rU5Q9dGIy-4tGPKo1EEUm2hFaEtMCFMSSieN2YQ\/s16000\/image2.png\"\/><\/a><\/td>\n<\/tr>\n<tr>\n<td class=\"tr-caption\" style=\"text-align: center;\">Comparison of <a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/precision-and-recall#precision\">top-1 accuracies<\/a> (Y-axis) with private fine-tuning using the DP-FC method on all three datasets across a range of \u03b5 (X-axis). 
We observe that better pre-training helps even more for lower values of \u03b5 (stricter privacy guarantee).<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Conclusion<\/h2>\n<p>\nWe demonstrate that large-scale pre-training on a public dataset is an effective strategy for obtaining good results when fine-tuned privately. Moreover, scaling both model size and pre-training dataset improves performance of the private model and narrows the quality gap compared to the non-private model. We further provide strategies to effectively use transfer learning for DP. Note that this work has several limitations <a href=\"https:\/\/arxiv.org\/abs\/2212.06470\">worth considering<\/a> \u2014 most importantly our approach relies on the availability of a large and trustworthy public dataset, which can be challenging to source and vet. We hope that our work is useful for training large models with meaningful privacy guarantees!\n<\/p>\n<h2>Acknowledgements<\/h2>\n<p>\n<em>In addition to the authors of this blogpost, this research was conducted by Abhradeep Thakurta, Alex Kurakin and Ashok Cutkosky. We are also grateful to the developers of Jax, Flax, and Scenic libraries. Specifically, we would like to thank Mostafa Dehghani for helping us with Scenic and high-performance vision baselines and Lucas Beyer for help with deduping the JFT data. We are also grateful to Li Zhang, Emil Praun, Andreas Terzis, Shuang Song, Pierre Tholoniat, Roxana Geambasu, and Steve Chien for stimulating discussions on differential privacy throughout the project. Additionally, we thank anonymous reviewers, Gautam Kamath and Varun Kanade for helpful feedback throughout the publication process. 
Finally, we would like to thank John Anderson and Corinna Cortes from Google Research, Borja Balle, Soham De, Sam Smith, Leonard Berrada, and Jamie Hayes from DeepMind for generous feedback.<\/em>\n<\/p>\n<\/div>\n<p>[ad_2]<br \/>\n<br \/><a href=\"http:\/\/ai.googleblog.com\/2023\/03\/leveraging-transfer-learning-for-large.html\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>[ad_1] Posted by Harsh Mehta, Software Engineer, and Walid Krichene, Research Scientist, Google Research Large deep learning models<\/p>\n","protected":false},"author":2,"featured_media":479,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-478","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-google-ai"],"featured_image_urls":{"full":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer.png",1100,630,false],"thumbnail":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-150x150.png",150,150,true],"medium":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-300x172.png",300,172,true],"medium_large":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-768x440.png",640,367,true],"large":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-1024x586.png",640,366,true],"1536x1536":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer.png",1100,630,false],"2048x2048":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer.png",1100,630,false],"broadnews-featured":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-1024x586.png",1024,586,true],"broadnews-large":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-825x575.png",825,575,true],"broadnews-medium":["https:\/\/todaysainews.com\/wp-content\/uploads\/2023\/03\/DPTransfer-590x410.png",590,410,t
rue]},"author_info":{"info":["Sanna"]},"category_info":"<a href=\"https:\/\/todaysainews.com\/index.php\/category\/google-ai\/\" rel=\"category tag\">Google AI<\/a>","tag_info":"Google AI","comment_count":"0","_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/478","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=478"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/478\/revisions"}],"predecessor-version":[{"id":2833,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/478\/revisions\/2833"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media\/479"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=478"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=478"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/tags?post=478"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}