{"id":31,"date":"2023-01-24T06:01:33","date_gmt":"2023-01-24T06:01:33","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/01\/24\/new-and-improved-embedding-model\/"},"modified":"2025-04-27T07:36:22","modified_gmt":"2025-04-27T07:36:22","slug":"new-and-improved-embedding-model","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/01\/24\/new-and-improved-embedding-model\/","title":{"rendered":"New and Improved Embedding Model"},"content":{"rendered":"<div>\n        <!--kg-card-begin: markdown-->\n<div class=\"js-excerpt\">\n<p>We are excited to announce a new embedding model which is significantly more capable, cost effective, and simpler to use. The new model, <code>text-embedding-ada-002<\/code>, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.<\/p>\n<section class=\"btns\"><a href=\"https:\/\/beta.openai.com\/docs\/guides\/embeddings\" class=\"btn btn-dark btn-padded icon-external right\">Read documentation<\/a><\/section>\n<\/div>\n<p>Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. 
Since the <a href=\"https:\/\/openai.com\/blog\/introducing-text-and-code-embeddings\/\">initial launch<\/a> of the OpenAI <a href=\"https:\/\/beta.openai.com\/docs\/api-reference\/embeddings\">\/embeddings<\/a> endpoint, many applications have incorporated embeddings to personalize, recommend, and search content.<\/p>\n<p>You can query the <a href=\"https:\/\/beta.openai.com\/docs\/api-reference\/embeddings\">\/embeddings<\/a> endpoint for the new model with two lines of code using our <a href=\"https:\/\/github.com\/openai\/openai-python\">OpenAI Python Library<\/a>, just as you could with previous models:<\/p>\n<div class=\"mb-3 endpoint-code\">\n<div class=\"endpoint-code-call\">\n<pre><code class=\"language-py\">import openai  # legacy interface (openai-python versions before 1.0)\n\n# The library reads the API key from the OPENAI_API_KEY environment variable.\nresponse = openai.Embedding.create(\n  input=\"porcine pals say\",\n  model=\"text-embedding-ada-002\"\n)\n<\/code><\/pre>\n<\/div>\n<div class=\"endpoint-code-response\">\n<pre><code class=\"language-py\">print(response)\n{\n  \"data\": [\n    {\n      \"embedding\": [\n        -0.0108,\n        -0.0107,\n        0.0323,\n        ...\n        -0.0114\n      ],\n      \"index\": 0,\n      \"object\": \"embedding\"\n    }\n  ],\n  \"model\": \"text-embedding-ada-002\",\n  \"object\": \"list\"\n}\n<\/code><\/pre>\n<\/div>\n<\/div>\n<h2 id=\"model-improvements\">Model Improvements<\/h2>\n<p><strong>Stronger performance.<\/strong> <code>text-embedding-ada-002<\/code> outperforms all the old embedding models on text search, code search, and sentence similarity tasks and achieves comparable performance on text classification. 
For each task category, we evaluate the models on the same datasets used for our <a href=\"https:\/\/arxiv.org\/abs\/2201.10005\">previous embedding models<\/a>.<\/p>\n<p><strong>Unification of capabilities.<\/strong> We have significantly simplified the interface of the <a href=\"https:\/\/beta.openai.com\/docs\/api-reference\/embeddings\">\/embeddings<\/a> endpoint by merging five separate models (<code>text-similarity<\/code>, <code>text-search-query<\/code>, <code>text-search-doc<\/code>, <code>code-search-text<\/code>, and <code>code-search-code<\/code>) into a single new model. This single representation performs better than our previous embedding models across a diverse set of text search, sentence similarity, and code search benchmarks.<\/p>\n<p><strong>Longer context.<\/strong> The context length of the new model is increased by a factor of four, from 2048 to 8192 tokens, making it more convenient to work with long documents.<\/p>\n<p><strong>Smaller embedding size.<\/strong> The new embeddings have only 1536 dimensions, one-eighth the size of <code>davinci-001<\/code> embeddings (12,288 dimensions), making the new embeddings more cost effective to store and query in vector databases.<\/p>\n<p><strong>Reduced price.<\/strong> We have reduced the price of new embedding models by 90% compared to old models of the same size. 
The new model performs better than or similarly to the old Davinci models at a 99.8% lower price.<\/p>\n<p>Overall, the new embedding model is a much more powerful tool for natural language processing and code tasks. We are excited to see how our customers will use it to create even more capable applications in their respective fields.<\/p>\n<h2 id=\"limitations\">Limitations<\/h2>\n<p>The new <code>text-embedding-ada-002<\/code> model does not outperform <code>text-similarity-davinci-001<\/code> on the SentEval linear probing classification benchmark. For tasks that require training a lightweight linear layer on top of embedding vectors for classification, we suggest comparing the new model to <code>text-similarity-davinci-001<\/code> and choosing whichever model performs better.<\/p>\n<p>Check the <a href=\"https:\/\/beta.openai.com\/docs\/guides\/embeddings\/limitations-risks\">Limitations &amp; Risks<\/a> section in the embeddings documentation for general limitations of our embedding models.<\/p>\n<h2 id=\"examples-of-embeddings-api-in-action\">Examples of the Embeddings API in Action<\/h2>\n<p><strong><a href=\"https:\/\/kalendar.ai\/\">Kalendar AI<\/a><\/strong> is a sales outreach product that uses embeddings to match the right sales pitch to the right customers out of a dataset containing 340M profiles. This automation relies on the similarity between embeddings of customer profiles and sales pitches to rank the most suitable matches, eliminating 40\u201356% of unwanted targeting compared to their old approach.<\/p>\n<p><strong><a href=\"https:\/\/www.notion.so\/\">Notion<\/a><\/strong>, the online workspace company, will use OpenAI&#8217;s new embeddings to improve Notion search beyond today&#8217;s keyword matching systems.<\/p>\n<hr\/>\n<!--kg-card-end: markdown-->\n<\/div>\n<p><a href=\"https:\/\/openai.com\/blog\/new-and-improved-embedding-model\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We are excited to announce a new embedding model which is significantly more capable, cost effective, and<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,1],"tags":[],"class_list":["post-31","post","type-post","status-publish","format-standard","hentry","category-ai","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/31","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=31"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/31\/revisions"}],"predecessor-version":[{"id":3029,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/31\/
revisions\/3029"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=31"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=31"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/tags?post=31"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}