<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[tech-at-instacart - Medium]]></title>
        <description><![CDATA[Instacart Engineering - Medium]]></description>
        <link>https://tech.instacart.com?source=rss----587883b5d2ee---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>tech-at-instacart - Medium</title>
            <link>https://tech.instacart.com?source=rss----587883b5d2ee---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 13 May 2026 22:39:30 GMT</lastBuildDate>
        <atom:link href="https://tech.instacart.com/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Empowering Carrot Ads with Domain Adaptive Learning]]></title>
            <link>https://tech.instacart.com/empowering-carrot-ads-with-domain-adaptive-learning-870730e6add5?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/870730e6add5</guid>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Xiyu Wang]]></dc:creator>
            <pubDate>Mon, 04 May 2026 19:11:17 GMT</pubDate>
            <atom:updated>2026-05-04T19:11:16.311Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aG_kQkxgI-QtdnXMlc3BjA.png" /></figure><p>Authors: Trey Zhong, Xiyu Wang</p><p>Contributors: Joseph Haraldson, Sharad Gupta, Sarah Lamacchia</p><h3>Introduction</h3><p>Carrot Ads is Instacart’s omnichannel retail media solution that allows retailer partners to build and scale their own advertising businesses on either their owned-and-operated (O&amp;O) websites and apps or their whitelabel Storefront hosted by Instacart. Carrot Ads empowers retailers and CPG brands to accelerate revenue, while improving the customer experience, engagement and Ads return on investment. It features enterprise-grade infrastructure, AI-powered optimization, years of proprietary first-party data and flexibility to choose from retailer-sourced Ads demand, Instacart-sourced demand from 7,500+ CPG brands, or both.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g1qSHMiz-liv6_SnFZmeCQ.png" /></figure><p>However, onboarding a new partner onto Carrot Ads introduces a key challenge: the ‘cold start’ problem, where limited historical interactions make it difficult to predict user behavior accurately.</p><p>To serve performant ads, our systems rely on predicting a user’s Click-Through Rate (CTR) to generate a ranking score. On the Instacart Marketplace, we have billions of historical signals to train a model to do so. But when a partner launches a new ads experience on their O&amp;O e-commerce site, there is often little to no interaction history for that property, so training an accurate model becomes challenging. User behavior can vary dramatically between websites — for example, browsing patterns on a grocery site differ from those on a pet supply or electronics site.</p><p>Training a model from scratch for a new domain is data hungry. 
Conversely, directly deploying Instacart’s existing Marketplace model often fails to capture the nuances of the partner’s specific inventory and user base.</p><p>To address this, we developed a Domain Adaptive Learning approach that transfers knowledge from Instacart’s data-rich environment to new partner environments. By treating the Instacart Marketplace as a source domain and the partner’s website as a target domain, we can transfer knowledge to bootstrap performance with a relatively small amount of data. We also found that even when there is enough data to train a model directly on the target domain, the domain adaptive model still performs better because it benefits from Instacart’s first-party data.</p><h3>Domain Adaptive Learning</h3><h3>What is Domain Adaptive Learning?</h3><p>At a high level, Domain Adaptive Learning is a subset of <strong>transfer learning</strong>. It focuses on transferring knowledge gained from solving a problem in a data-rich environment (source domain) to improve performance in a related, often data-scarce environment (target domain).</p><p>Instead of initializing a new model with random weights for every partner, we reuse representations and relationship signals learned from Instacart Marketplace data to “warm start” the model. 
This saves labeled data and computational power, but more importantly, it allows us to deploy performant models in scenarios where the target domain lacks sufficient history to converge on its own.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*da2MWtutJq-jfNJcApJTKw.png" /></figure><h3>Benefits &amp; Challenges</h3><p>The benefits of domain adaptive learning are significant.</p><ul><li><strong>Performance:</strong><br>- <strong>In Low-Data Scenarios:</strong> Allows models to perform well, even with limited labeled data in the target domain.<br>- <strong>In High-Data Scenarios:</strong> Improves performance beyond what models trained solely on the target domain can achieve.</li><li><strong>Efficiency:</strong> Drastically reduces training time and development costs for new domains by reusing pre-trained components.</li><li><strong>Generalization:</strong> Enables robust models that can generalize well in different but related domains, even when there are distribution shifts.</li></ul><h3>Model Architecture</h3><p>The Domain Adaptive Learning method is based on a wide and deep Predicted Click-Through-Rate (pCTR) model architecture commonly used in large-scale recommendation systems. This model predicts CTR by first transforming raw inputs, like user IDs and product text, into dense feature embeddings. These features are concatenated and processed through two parallel paths: an interaction layer for learning explicit feature interactions and a deep Multi-layer Perceptron (MLP) tower for learning complex, hidden patterns. The outputs are then merged and passed through a final MLP to synthesize the findings. Finally, a Sigmoid activation squashes the result into a probability score (pCTR) between 0 and 1. This architecture combines a linear “wide” model (for memorization of specific feature interactions) with a “deep” neural network (for generalization). 
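</p><p>The forward pass described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the production model: the dimensions, the random weights, the pairwise dot-product interaction layer, the single linear layer standing in for the final MLP, and every function name (<code>predict_ctr</code>, <code>pairwise_interactions</code>, and so on) are hypothetical choices made for readability.</p>
<pre>
```python
import math
import random

# Hypothetical toy sizes; a production pCTR model is far larger.
EMB_DIM = 4
HIDDEN = 8


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases for an n_in to n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def pairwise_interactions(embeddings):
    """Explicit interactions: dot product of every pair of feature embeddings."""
    out = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            out.append(sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
    return out


def predict_ctr(embeddings, params):
    """Interaction path plus deep MLP path, merged, then squashed by a sigmoid."""
    concat = [v for emb in embeddings for v in emb]
    interactions = pairwise_interactions(embeddings)
    deep = relu(dense(concat, *params["deep"]))
    merged = interactions + deep
    # A single linear layer stands in for the final MLP (assumption).
    logit = dense(merged, *params["final"])[0]
    return sigmoid(logit)


rng = random.Random(0)
n_features = 3  # e.g. user, product, and context embeddings (illustrative)
n_pairs = n_features * (n_features - 1) // 2
params = {
    "deep": init_layer(rng, HIDDEN, n_features * EMB_DIM),
    "final": init_layer(rng, 1, n_pairs + HIDDEN),
}
embeddings = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(n_features)]
pctr = predict_ctr(embeddings, params)
print(pctr)  # a probability strictly between 0 and 1
```
</pre>
<p>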
More details about this architecture can be found in this other <a href="https://tech.instacart.com/one-model-to-serve-them-all-0eb6bf60b00d">blog post</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nZ1xn-fW5NL8cf-RIg_Seg.png" /></figure><p>Our strategy for domain adaptation operates at two distinct levels: the <strong>Neural Network</strong> level and the <strong>Training Data</strong> level.</p><h3>Domain Adaptation At Neural Network Level</h3><p>In Instacart Ads’ pCTR model, transfer learning at the neural network level involves reusing and fine-tuning components from a pre-trained model that originated from a related domain or task. Specifically:</p><ol><li><strong>Shared Embedding Layers</strong>: The model utilizes embedding representation layers pre-trained on shopping contexts. These embeddings capture fundamental signals that are transferable.</li><li><strong>Feature Transfer Domain Adaptation</strong>: The model structure allows seamless integration of pre-trained embeddings with domain-specific input features. For instance, “Wide” components might focus on explicit features (e.g. historical CTR for a product category) sampled from the new domain, while “Deep” components adapt pre-trained dense representations.</li><li><strong>Fine-Tuning Specific Layers</strong>: While shared layers are reused without major alterations, subsequent layers are fine-tuned using limited partner-specific training data to capture domain-specific behavior.</li><li><strong>Generalization</strong>: Transfer learning ensures the model can generalize knowledge learned from user interactions on the Instacart Marketplace to predict user responses in the partner’s domain. 
This removes the need to train the deep ranker entirely from scratch.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/495/1*P1kdkuVVXJc-RsfaMblKSw.png" /></figure><h3>Domain Adaptation At Training Data Level</h3><p>Transfer learning at the data level involves aligning the input signals of the source and target domains so the model “speaks the same language.”</p><p>We rely on aggregated historical performance signals to normalize features across domains, but there are a variety of details that contribute to the quality of our training data.</p><ul><li><strong>Source Data</strong>: Large-scale user behavior data from the Instacart Marketplace is leveraged as the source domain. This data is used to pre-train embeddings and build a foundational model.</li><li><strong>Matching Features Between Domains</strong>: Common contextual and catalog-level features between the Instacart Marketplace’s catalog data and the Carrot Ads Partner’s catalog are aligned (e.g. ensuring product category uses the same taxonomy) to ensure the source domain knowledge is transferable.</li><li><strong>Feature Trimming for Latency Optimization</strong>: To meet real-time auction latency requirements and accommodate varying feature availability across partners, we apply a feature trimming technique to balance performance and speed. We analyze feature importance in the target domain and prune inputs that do not contribute to prediction accuracy for that specific partner, ensuring the model remains lightweight.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sZm6Y0-TlSzMMrUlj090RA.png" /></figure><h3><strong>Learnings &amp; Conclusions</strong></h3><p>Our evaluation of Domain Adaptive Learning demonstrates that it is possible to achieve satisfactory pCTR prediction accuracy with limited data from our partners. 
By leveraging the “source” knowledge of the Instacart Marketplace, we achieved higher CTR, total clicks per user and ads revenue across search ads and product category ads. This approach enables us to launch high-performing ad networks for partners immediately, eliminate the traditional data ramp-up period and converge to a better stable state.</p><p>However, this process is not yet fully autonomous. The complexity of mapping data schemas and verifying model alignment currently requires human-in-the-loop verification to prevent negative transfer.</p><p>Looking ahead, we are building an automated <strong>Domain Adaptation Platform</strong> that can detect domain shifts and fundamentally streamline the workflow. This allows us to onboard new retail partners faster and in a more scalable way, while continuing to deliver performant ad systems from day one.</p><h3>References</h3><ul><li><a href="https://tech.instacart.com/one-model-to-serve-them-all-0eb6bf60b00d">One model to serve them all</a></li></ul><hr><p><a href="https://tech.instacart.com/empowering-carrot-ads-with-domain-adaptive-learning-870730e6add5">Empowering Carrot Ads with Domain Adaptive Learning</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Our Early Journey to Transform Instacart’s Discovery Recommendations with LLMs]]></title>
            <link>https://tech.instacart.com/our-early-journey-to-transform-instacarts-discovery-recommendations-with-llms-cf4591a8602b?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/cf4591a8602b</guid>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[recommender-systems]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Moein Hasani]]></dc:creator>
            <pubDate>Thu, 26 Feb 2026 18:55:35 GMT</pubDate>
            <atom:updated>2026-02-26T18:55:34.156Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4cPvbqERfSPT-ZrLgROqXg.png" /></figure><p><strong>Key Contributors: </strong>Moein Hasani, Hamidreza Shahidi, Trace Levinson, Guanghua Shu</p><h3>Introduction</h3><p>At Instacart, we are laser-focused on improving the user experience by making shopping feel easy, engaging, and personalized. Our discovery surfaces play a central role in bringing this to life. Alongside explicit Search intents, discovery is our opportunity to meet customers’ implicit needs, presenting them with the most relevant and inspiring content we have to offer. The main discovery surface within the Instacart app, referred to here as the “Shopping Hub”, is one of the most critical in this regard. This is the surface a customer lands on within the Instacart app after selecting their desired retailer, guiding them along their entire journey. What users see here shapes not just what they buy, but how intuitive and enjoyable their experience feels.</p><p>Given its importance, our team runs dozens of Shopping Hub experiments per year, constantly evaluating new ways to enrich the discovery experience. Historically, these experiments have been constrained by static content libraries feeding our recommendation systems.</p><p>With the rapid advancement of generative AI, a critical opportunity began to emerge: rather than incrementally improving a swath of legacy systems, could we leverage LLMs to rethink how content shows up for a user from the ground up? Which new primitives could we build to uplevel quality, personalization, and cohesion across the page?</p><p>This blog post walks through our early journey to answer these questions. By investing in a new AI-native platform for content generation, evaluation, and retrieval, we have found generative models to show real promise in improving recommendations at scale. 
Below, we highlight the approach we took in developing this platform, a few key learnings so far, and where we’re most bullish moving forward.</p><h4>Limitations of Traditional Recommendation Engines</h4><p>Our Shopping Hub page is constructed from multiple subcomponents called placements. Each placement contains a number of products or other entities within it. The example below can help us visualize how these various pieces ladder up to the full page.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Olbz2OD5tftVpOyj4zWTZg.png" /><figcaption>Fig 1. Current Shopping Hub</figcaption></figure><p>Today, the flow to generate and serve Shopping Hub content looks something like the following:</p><ol><li>Text, visual assets, and the underlying retrieval sources for a new placement are defined explicitly. Titles are often generic, such as “Dairy”, with a narrow set of retrieval sources used to fetch corresponding products. Once created, each placement enters the content library and becomes eligible for serving, universally, across all users. This generation process is human-driven and inherently assumes a one-size-fits-all content library that applies to all users.</li><li>Retrieval systems fetch these candidate placements and corresponding products at runtime.</li><li>From the retrieved set, ranking models order products and placements across the page to optimize against a static set of business metrics. Each placement is treated as an independent entity on the page within ranking models.</li></ol><p>This setup can perform well for optimizing average engagement under the above constraints. However, the reality is that different users have different needs on the platform, and business objectives and the broader environment are constantly changing. 
This results in a couple of key limitations for our recommendations:</p><ol><li><strong>Difficulty in scaling personalized content: </strong>The human-driven process above is expensive and time-intensive, with teams managing both generation and content QA by hand. As a result, traditional architectures inhibit the ability to quickly deploy and personalize new content — not only per user, but also according to seasonal and other shifting dimensions and business objectives.</li><li><strong>Lack of cohesion: </strong>Placements are often created by different siloed teams with divergent focus areas and goals. As a result, the series of placements can result in a chaotic surface presentation. Users are required to scroll without the ability to easily navigate the page to solve their needs.</li></ol><h4>So, where does AI come in?</h4><p>Large language models offer a natural mechanism for producing cohesive, dynamic, and personalized output. We began to explore ways to tackle the above limitations by introducing generative models into our recommendations stack. To narrow down our approach, we first began thinking through objectives for the system to ensure any solution would be anchored to North Star principles.</p><ul><li><strong>Delightful Personalization:</strong> The system should be capable of leveraging our rich user data to meet users where they are on the platform. One user may be health-conscious and focused on soups and salads. Another user looks to Instacart for home improvement goods and other category needs beyond only food and drinks. Given the diverse reasons our customers turn to Instacart, our primary motivator for the work was to enable rich, delightful experiences for <em>every</em> customer on our platform, rather than solving for averages. 
This may also include very different products retrieved for the same intent — a thematic “Breakfast” placement may prioritize waffles and pancakes for one user, but granola and yogurt for the other.</li><li><strong>Cohesion:</strong> The system should enable full cohesion across the page — every placement should be intentionally grouped, ordered, and aware of others around it. We want the discovery journey to feel seamless.</li><li><strong>Adaptability:</strong> The system should be responsive to rapid adjustments as our business environment shifts. This includes support for varying business objectives, such as relevance versus novelty, as well as temporal dimensions such as seasonal winter placements that can be spun up dynamically as they become relevant, then phased out.</li></ul><p>With these guiding principles, we narrowed consideration down to two core generative paradigms:</p><ol><li><strong>Bottoms-up generation:</strong> Directly generate all possible products to serve to a user, then cluster and organize them into placements.</li><li><strong>Top-down generation: </strong>Begin by generating ordered placements to structure the entire page, then generate products per placement.</li></ol><p>To visualize this distinction, let’s take a simple example and assume we are building two placements to recommend to a user. Generative models can compose the problem in one of two ways:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FvI-9buekpASiBQz-fm99A.png" /><figcaption>Fig 2. 
Comparing the generation methods</figcaption></figure><p>In the bottoms-up approach, a model generates a raw sequence with all relevant products, then clusters those products into <strong>Breakfast Staples </strong>and<strong> Health-Conscious Snacks </strong>themes.</p><p>Under the top-down approach, <strong>Easy Pasta Night </strong>and <strong>Gourmet Salad Fixings</strong> themes are first generated and ordered to meet user relevance, cross-theme cohesion, and other business goals. Products are then generated to map to each theme.</p><h4><em>Comparing the two methods for our use case</em></h4><p>The bottoms-up approach contains interesting benefits, such as deep flexibility with less constrained recall. However, it also presents difficulties in real-world settings due to latency requirements and catalog turnover. Further, with a much broader modeling task, it can be difficult to ensure generated products meet a diverse set of page requirements and intents, and may require significant fine tuning efforts as needs evolve. In other words, while the first two tenets could be achieved, we felt our adaptability goal would be put at risk. To best balance personalization, cohesion, and adaptability, we landed on a top-down, cascaded approach.</p><h3>Methodology</h3><p>After landing on the top-down approach, our next question was how exactly to decompose the problem. In early explorations, we evaluated the possibility of an all-in-one model that would directly generate placement content from raw signals. We started with this approach for simplicity, but ultimately found great value in decomposing generation into multiple targeted tasks. 
This opened the door to using retrieval‑augmented generation (RAG) and other techniques that aren’t feasible in a single‑step model, enabling us to achieve higher quality while improving cost efficiency.</p><h4>Overview</h4><p>The system consists of a few main phases:</p><ol><li>Page design &amp; theme generation</li><li>Retrieval keyword generation</li><li>Quality and diversity filtering</li><li>Product and pagewise ranking</li></ol><p>The first three phases form our generative content pipeline, while the final phase leverages existing ranking infrastructure for scalable serving.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NTZgQZlx5LLMaCxoWTh7RA.png" /><figcaption>Fig 3. Our generative content pipeline</figcaption></figure><p>Over the next few sections, we’ll walk through each of these components in detail and how they ladder up to the final Shopping Hub users see on the platform.</p><h3>Phase 1: Page Design &amp; Theme Generation</h3><p>First, a page design agent leverages user context (purchase history, engagement signals, and other derived preferences) to produce a set of high-level themes personalized to each user. Themes are designed to represent discrete and coherent shopping intents (for example, “Flavor builders for weeknight meals” or “Functional hydration, lower sugar”). We leverage constrained decoding with a structured schema to ensure interpretability and downstream usability.</p><p>To optimize downstream token efficiency, Phase 1 outputs both placement entities as well as a set of derived signals, such as user personas and freeform product concepts that align well with user context and placement intent. 
This removes the need for redundant context passthrough along each stage of the pipeline.</p><h3>Phase 2: Retrieval Keyword Generation</h3><p>Once generated in Phase 1, each theme is mapped to one or more retrieval-compatible descriptors (e.g., search query strings, categories from our catalog taxonomy, product attribute filters). We explored various descriptor representations and ultimately found structured, taxonomy-grounded representations to perform best in upholding relevance. For simplicity, we will refer broadly to these descriptors as <em>keywords</em> here.</p><h4>Teacher-Student Fine-Tuning</h4><p>To meet latency and cost requirements, we leveraged teacher–student learning: a closed-weight LLM first generates high-quality supervised data, validated on a small sample by human annotators. An LLM judge is then used to prune poor-quality data from our fine-tuning dataset, and an internal model is fine-tuned to imitate the teacher while satisfying domain-specific constraints.</p><p>Finally, we performed a number of ablation studies to converge toward the optimal student model:</p><ul><li>Open-weight base model explorations across the Llama and Qwen families</li><li>LoRA adapter addition at varying ranks</li><li>Fine-tuning sample size augmentation</li></ul><h4>RAG</h4><p>To further improve prompt efficiency while maintaining strong precision, we incorporated retrieval-augmented generation (RAG) into the keyword generation pipeline. First, the page design LLM in Phase 1 draws on its general knowledge to generate freeform product concepts, such as “eggs”, that align well with user context and placement intent. Embeddings are generated for these concepts. In the keyword generation model, we then restrict eligible candidate keywords per theme using embedding‑based similarity. Roughly 100 nearest neighbors are retrieved from a 300,000‑term keyword corpus, and only this refined subset is passed down for the second LLM to select from as final recommendations. 
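</p><p>The candidate-pruning step above can be illustrated with a small brute-force example. It is a sketch under stated assumptions rather than the production retrieval code: the use of cosine similarity, the 3-dimensional toy vectors, and the helper names <code>cosine</code> and <code>prune_candidates</code> are all hypothetical, and a corpus of 300,000 terms would more plausibly be searched with an approximate nearest-neighbor index than with a full scan.</p>
<pre>
```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_candidates(concept_vecs, keyword_vecs, k):
    """Return the k keywords most similar to any of the theme's product concepts."""
    scored = []
    for keyword, vec in keyword_vecs.items():
        best = max(cosine(vec, c) for c in concept_vecs)
        scored.append((best, keyword))
    scored.sort(reverse=True)
    return [keyword for _, keyword in scored[:k]]


# Toy 3-dimensional embeddings; real ones would come from an embedding model.
keyword_vecs = {
    "eggs": [0.9, 0.1, 0.0],
    "egg whites": [0.8, 0.2, 0.1],
    "dog food": [0.0, 0.1, 0.9],
    "granola": [0.3, 0.9, 0.1],
}
concept_vecs = [[1.0, 0.0, 0.0]]  # stand-in embedding for the freeform concept "eggs"

# Only this shortlist, not the full corpus, would be placed in the second LLM's prompt.
shortlist = prune_candidates(concept_vecs, keyword_vecs, k=2)
print(shortlist)  # ['eggs', 'egg whites']
```
</pre>
<p>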
This first-pass candidate pruning reduces input context significantly in the second LLM, reducing all-in generation costs by 15–20% in each generation. This became a core motivator for adopting a cascaded generation architecture. A single‑LLM setup would instead require the full keyword corpus to be passed directly into the prompt to maintain the same level of precision.</p><h3>Phase 3: Quality and Diversity Filtering</h3><p>Given the dynamic nature of this system, guardrails help to prevent cross-placement redundancies and ensure high-quality content. The system handles this in a few stages:</p><ol><li>To ensure sufficient diversity, embeddings are generated for each placement’s content, and similarity-thresholded deduplication is applied to remove redundant placements.</li><li>For broad quality validation, LLM-as-a-judge workflows are deployed against a small proportion of users to ensure overall theme quality and brand compliance. Theme-product relevance is then enforced through a fine-tuned cross encoder, which explicitly classifies the relevance of each placement’s products to their overarching theme. Low-scoring entries are flagged for offline filtering or repair before serving deployment. Our full suite of evaluators is described in more detail in the Evals section below.</li><li>Finally, additional guardrails kick in to enforce business and policy constraints. For example, we should ensure all original business objectives from agent instructions are addressed. Furthermore, it is critical to ensure themes do not misalign with Instacart’s brand, or even hallucinate with harmful or inappropriate pairings like alcoholic products for a child’s birthday party.</li></ol><h3>Phase 4: Product &amp; Pagewise Ranking</h3><p>Finalized placements and keywords are cached for runtime retrieval. Existing product and placement ranking services retrieve all generated entities, perform additional ranking and post-processing, and return finalized ordered entities on the page. 
This design modularizes the system, decoupling generative retrieval from mature ranking systems and providing a path to deeper pagewise control as the generative component matures.</p><h3>Designing for Rapid Iteration: Treating Evals as a First-Class Citizen</h3><p>When developing any AI-native system, particularly one that generates dynamic content served to millions of users, quality enforcement is essential. Potential for off-brand or other low-quality content can quickly degrade trust with our customers, so our team invested deeply in designing a robust suite of LLM-based and other evaluators, enabling us to iterate with confidence. This not only helped us derisk adverse behavior; it also became a massive accelerant. Given the vast exploration space for generative recommendations, online iteration would be slow, variance-prone, and cost-prohibitive. After a temporary slowdown upfront, the benefits of our QA investments have begun to compound across both velocity and output quality.</p><p>Below, we’ll walk through our three-pronged Eval framework:</p><ol><li>LLM-as-a-judge evaluators</li><li>Fine-tuned QA at scale</li><li>Traditional ML and metric-based evaluators</li></ol><h4>LLM-as-a-Judge</h4><p>First, a rich suite of LLM-as-a-judge evaluators audits output along each level of the content hierarchy. Quality is graded along dimensions such as the following:</p><p>At the page level:</p><ul><li>Does the page feel cohesive enough? 
Diverse enough?</li><li>Does the full set of generated placements cover all of our business needs?</li></ul><p>At the placement level:</p><ul><li>Are the titles of high quality and aligned with our brand?</li><li>Do placement themes align with user preferences and order behavior?</li></ul><p>At the product level:</p><ul><li>Have we maintained sufficient product recall in the final output?</li><li>Are the underlying retrieval keywords and products still aligned with the title’s thematic intent?</li></ul><p>To build trust in this framework, we developed a series of human-in-the-loop (HITL) workflows to build ground truth data, tuning the LLMs until passing high human-alignment thresholds.</p><h4>Unlocking Evaluation at e-Commerce Scale</h4><p>LLM-as-a-judge evaluators are a powerful tool. However, we found that while this framework guided us well at the averages, it failed at the edges. Since evaluating millions of candidates is cost-prohibitive, LLMs are unable to <em>take action </em>and improve quality at scale. Certain quality dimensions hit diminishing returns, such as preserving end-to-end model context: final products retrieved did not always align well with the placement’s upstream thematic intent.</p><p>Given this insight, we made the decision to supplement Evals with a fine-tuned DeBERTa model, classifying product-title relevance for every generated placement. This model is trained on the same HITL ground truth data generated for LLM-as-a-judge evaluators, synthetically augmented for broader teacher-student model learning.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bgMlr3QkmlcRYO3T4jlsVg.png" /><figcaption>Fig 4. The finetuned placement evaluator</figcaption></figure><p>This model unlocked over a 99% cost reduction relative to closed-weight LLM inference. 
This enabled us to leverage it not only for evaluation, but also for full-scale quality filtering, where any placements classified as a severe violation are pruned before deploying to production.</p><h4>Classical ML and Metric-Based Evaluations</h4><p>Lastly, we rounded out the suite with a number of classical ML and metric-based evaluators. These span both explicit and derived signals already built for other use cases. We have found these to be useful proxies for broad relevance and quality:</p><ul><li>Average proportion of products represented in the user’s purchase history</li><li>Predicted user-product engagement scores from our existing ranking models</li><li>Average products per placement (density)</li></ul><h3>Bringing the Pieces Together</h3><p>Let’s see how the above pieces come together with a more detailed view of our content generation &amp; evaluation architecture:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sruzhzOFGhO3840VnLrC9g.png" /><figcaption>Fig 5. The final architecture</figcaption></figure><h3>Results</h3><p>This generative merchandising framework leads to a meaningful shift in placement composition. A sample of static vs. AI-generated placements for the same user can be seen in Figures 6a and 6b below. Compared to rigid single-category placements, generative placements tie closely to the user’s shopping history and build engaging themes composed of several underlying categories (e.g. meats, cheeses, and breads).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*S4X4zIJ4Wiejp1HT.jpg" /><figcaption>Fig 6a. Control static placements</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sPUqQmY-tZIxI3Ql9cvxCA.png" /><figcaption>Fig 6b. 
Generative recommendation placements</figcaption></figure><p>After dozens of iterations, we began to observe generative policies outperforming our baseline in offline evaluations. This validation finally built the level of confidence needed to perform large-scale A/B experiments comparing the generative page to the production baseline. While work remains to enable us to fully overhaul our prior systems, initial results have been quite promising.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rHAXa5h5PHGfjjrrhwT55g.png" /></figure><h3>Key Learnings</h3><p>Over the course of this work, we have taken away broad learnings that we hope can be valuable to other AI developers.</p><p><strong><em>Keep each modeling task focused:</em></strong> Models perform best with a well-defined task. While all-in-one workflows are tempting when working with frontier models, we have found stronger, more easily tunable performance by decomposing into multiple tasks. This became most evident in our fine-tuning efforts — smaller models required a high level of handholding in sample data and label definitions to reach strong performance.</p><p><strong><em>Evals are worth the pain:</em></strong> It is difficult to overstate the value Evals provided in this work. Building comprehensive and human-validated LLM judges is admittedly a daunting task when kicking off a new project. But particularly for domains with highly variable output, such as personalized recommendations, a clear quality definition helps to prevent paralysis later on. We consistently steered our generation strategy to make progress against the evaluators we had set up.</p><p><strong><em>Adding structure to input and output layers meaningfully improves outcomes:</em></strong> A number of optimizations to our inference data flows were impactful in bringing this system to production. 
Efficient context handling — especially through RAG and aggressive token compression — unlocked richer input signals without ballooning cost. In the output layer, constrained generation ensures the model always produces reliable, production‑safe outputs. Together, these examples speak to a broader principle: well-structured flows lead to more dependable and scalable agentic systems.</p><h3>Looking Forward</h3><p>We are just getting started on this platform. Moving forward, we will expand the system to balance multiple objectives, such as relevance and novel inspiration, and introduce deeper personalization through real-time and sparse signals. We are also exploring reward modeling and reinforcement fine tuning (RFT) to enable self-improvement with tight feedback. An exciting direction here will be learning how our stack of traditional ranking models can be fused as reward models within post-training, bringing recommendations fully into the generative paradigm.</p><p>In parallel, our teams are also exploring how to scale generative recommendations to surfaces beyond the Shopping Hub, such as landing pages and Search results. Early experiments are showing promising signs — stay tuned!</p><h3>Acknowledgements and Final Notes</h3><p>We would like to extend deep gratitude to our cross-functional partners Dhruv Khanna, Logan Murdock, Roy Li, Aref Kashani, Shayaan Nadeem, Amish Popli, Lauren Downey, Shrikar Archak, Brett Brownell, and Brandon Silberstein, who have provided critical ongoing design feedback and driven system integrations to bring this research to production. Vinesh Gudla, Hechao Sun, Jingying Zhou, and Tejaswi Tenneti also made meaningful contributions in our early stages of development. Additional thanks to Pramod Adiddam and Venkatesh Shankar for steady leadership and support, enabling the team to push forward.</p><p>Our team is investing deeply to optimize how generative AI and traditional machine learning systems intersect. 
Interested in helping us advance the frontier? <a href="https://instacart.careers/current-openings/">Our machine learning teams are hiring</a>!</p><hr><p><a href="https://tech.instacart.com/our-early-journey-to-transform-instacarts-discovery-recommendations-with-llms-cf4591a8602b">Our Early Journey to Transform Instacart’s Discovery Recommendations with LLMs</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Turning Data into Velocity: Caper’s Edge and Cloud Data Flywheel with Capsight]]></title>
            <link>https://tech.instacart.com/turning-data-into-velocity-capers-edge-and-cloud-data-flywheel-with-capsight-544a49ca3db7?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/544a49ca3db7</guid>
            <dc:creator><![CDATA[Youming Luo]]></dc:creator>
            <pubDate>Tue, 17 Feb 2026 16:24:56 GMT</pubDate>
            <atom:updated>2026-02-17T23:12:47.544Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Key Contributors:</strong> Youming Luo, Andrew Tanner, Matas Sriubiskis, Sylvia Lin, Sikun Zhu, Lei Li, Xiao Zhou</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VFb9zvLNvDFsUn4fZOsZCg.png" /></figure><h3>Introduction</h3><p>Caper is Instacart’s AI-powered smart cart that provides customers with a fast, seamless, and intuitive shopping experience. We achieve this through computer vision and multi-sensor fusion to power accurate product recognition and effortless checkout. Delivering this experience requires Caper’s AI models to understand what truly happens in stores — the movement, intention, and decisions unfolding across every grocery aisle.</p><p>Historically, our ability to learn from production environments was limited. Even though the carts were deployed in stores, we lacked a scalable way to collect real‑world data that would allow us to rapidly iterate and improve our models. This resulted in three core challenges:</p><ul><li><strong>Scalable Onboard Observability</strong>: We had little visibility into what was happening on the cart, in the stores. When something went wrong, it was hard to understand or reproduce the scenario. At the same time, each cart generates gigabytes of multimodal data, from sources such as cameras, weight sensors, and localization sensors. We needed a centralized way to capture key moments so the team could clearly understand what the cart experiences, how users interact with it, and where to improve — all while maintaining a magical user experience and minimal impact on the network.</li><li><strong>Data Quality and Diversity: </strong>Our models were primarily trained on manually-collected data that didn’t fully reflect the complexity of real-world stores, including lighting changes, occlusions, damaged packaging, motion blur, unusual angles, and store-specific products. This gap can hurt the user experience. 
We needed a reliable way to systematically gather high-quality, diverse production data that could increase our models’ robustness and accuracy.</li><li><strong>End-to-End Model Cycle Time</strong>: Turning production data into model updates required manual data cleaning, triage, labelling and training. This made the end‑to‑end model development cycle not only slow but also expensive. We wanted a rapid, automated way to learn from the vast diversity of real‑world data so the full end‑to‑end model iteration cycle could run on a weekly cadence instead of a monthly one, and so the cost of iteration would not grow linearly with deployment size.</li></ul><p>To address these challenges, we built <strong>Capsight, </strong>an end-to-end platform that transforms our fleet of Caper carts into a distributed data collection and model improvement engine, enabling our carts to get smarter on their own over time.</p><h3>The Capsight Ecosystem: Core Components</h3><p>Capsight creates a continuous feedback loop connecting live edge-captured data directly to our ML training workflow. It consists of three components that work together to create our data flywheel:</p><p><strong>Collect → Manage → Label → Train → Deploy</strong></p><ul><li><strong>Capsight Collector:</strong> An intelligent, on-device agent that captures high-value data from a variety of cart sensors (camera, weight, location, etc.).</li><li><strong>Capsight Depot:</strong> A centralized cloud platform for data management. 
It ingests, processes, and indexes all incoming data while performing data cleaning and quality processing, making the data searchable, explorable, and ready for annotation.</li><li><strong>Capsight Learner:</strong> A distributed training platform that consumes curated datasets from the Depot to train, evaluate, and accelerate model iteration.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KI-KPJVwII9dOyOnD4RVRA.png" /><figcaption>Fig 1: Capsight Ecosystem</figcaption></figure><h4>Capsight Collector: Intelligent Capture on the Edge</h4><p>The Capsight Collector is the foundation of the entire system: the cart’s central data-gathering agent. Its mission is to capture a holistic view of every interaction by collecting synchronized data from all of the cart’s sensors.</p><p>While the initial phase focuses on high-value computer vision data from the cameras, the Collector is designed as a multi-modal platform. We integrated other crucial data streams, such as weight data from the scale and location data within the store, to build a complete picture of each event.</p><p>To achieve this <em>without</em> impacting cart performance or overwhelming store networks, we solved several key engineering challenges:</p><ul><li><strong>Intelligent, Trigger-Based Capture: </strong>To avoid collecting terabytes of irrelevant data, the Collector operates on a trigger-based system. It only begins capturing data when an important event is detected. The initial trigger is a combination of an activity signal (like a hand motion) and a recognized barcode, with more signals being developed. This dual signal gives us high confidence that a meaningful interaction is occurring. The sensitivity of this trigger is an important trade-off. 
Collecting useless data is expensive and increases noise, but missing signals decreases training input.</li><li><strong>Optimized On-Device Processing</strong>: We leverage dedicated hardware for video encoding, which ensures the entire collection process runs with zero performance regression on the cart’s primary AI tasks. We also built a dedicated communication protocol so that weight and location data can be collected without any performance degradation.</li><li><strong>Resilient Uploading Workflow: </strong>The collected data is stored locally on the cart’s storage device. The Uploader is designed to work seamlessly within the store environment, carefully managing upload timing and bandwidth to avoid any impact on retailer operations or network performance. To prevent disk space issues, it includes a storage check to pause collection if usage exceeds a configurable threshold and an auto-cleanup mechanism to remove the oldest files if the upload fails.</li></ul><h4>Capsight Depot: Ingestion, Curation, and AI-Assisted Annotation</h4><p>Once the Collector uploads the raw sensor packages, the Capsight Depot begins the process of transforming these massive volumes of raw files into structured, high-quality training datasets:</p><ul><li><strong>Processing:</strong> A distributed data processing system ingests the raw files, extracts metadata, and performs quality checks to ensure the data is consistent and ready for downstream use.</li><li><strong>Indexing, Search and Visualization:</strong> All data and metadata are securely stored, indexed, and enriched to support search and deeper semantic exploration through a user‑friendly web interface. This allows the team to quickly investigate production scenarios by filtering relevant metadata and immediately accessing corresponding videos and logs.</li><li><strong>AI-Accelerated Annotation and Curation:</strong> A key function of the Depot is preparing data for annotation. 
However, as Capsight scales to millions of images daily, manual labeling becomes a significant bottleneck in both cost and time. To break this bottleneck, we’ve integrated a powerful <strong>Vision Language Model (VLM)-based pre-labeling service</strong> directly into the Depot and Labelling Platform. Instead of sending raw images for manual annotation, the pipeline first filters out empty background images. Then, a VLM, in combination with our teacher models, automatically generates high-quality pre-labels for items and barcodes. These pre-labeled images are then sent to human annotators for rapid correction rather than slow, from-scratch creation. This AI-assisted approach is projected to <strong>reduce annotation costs by over 70%</strong> and cut down a multi-day labeling task to just a few hours. It even allows us to efficiently clean errors from our historical ground truth data.</li></ul><h4>Capsight Learner: Closing the Training Loop</h4><p>With a curated, labeled dataset ready in the Depot, the Capsight Learner completes the flywheel. This distributed, Ray-based training platform automates the process of consuming these datasets to train new model versions. An automated evaluation pipeline benchmarks models against standardized test sets, ensuring only validated improvements make it to production. By providing a direct path from labeled data to a trained model, the Learner dramatically accelerates our model training cycle. As a result, we reduced the model training stage from one week to two days.</p><h3>Impact: The Flywheel in Motion</h3><p>By closing the data loop, Capsight has transformed our AI development process from a reactive, manual effort to a proactive, automated one. 
This shift provides a measurable and immediate impact for retailers and their customers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*KQSmfL3YJrgQ8uIv8bkNBg.png" /><figcaption>Fig 2: Example Capsight Data</figcaption></figure><p>We’re already seeing significant accuracy gains. Within just weeks of deployment, we collected enough diverse, real-world data to train improved models that showed a more than 5% improvement in accuracy, with continued gains as the deployment scales. Our dataset is now richer and systematically captures edge cases, lighting variations, and store-specific products that make our AI models more robust.</p><p>The iteration cycle is also dramatically faster. The full loop of collecting data, labeling, training, and releasing a model used to take about a month, and now completes in a week. Customers can benefit from these improvements much sooner.</p><h3>Future Work: What Capsight Unlocks</h3><p>Our journey with Capsight is just beginning. Next, we’re expanding beyond vision to full sensor fusion, combining camera, weight, motion, and location data. This richer, multi-modal dataset will fuel a foundation model capable of understanding real-world store environments across vision, motion, weight, and behavior.</p><p>This foundation model will enable powerful capabilities:</p><ul><li>Detecting complex multi-item interactions and intent</li><li>Improving location-based experiences</li><li>Automatically surfacing the most valuable data for model improvements</li></ul><p>As the system scales, we’re also optimizing costs: from multi-attribute extraction in a single pass to efficient VLM inference, ensuring that Capsight grows smarter and more scalable with every iteration.</p><h3>Conclusion</h3><p>Capsight represents a step change in how we build and ship AI with Caper at Instacart. 
By wiring observability and data feedback directly into our ML loop, we’ve created a closed-loop system that accelerates learning, boosts model accuracy, and strengthens reliability across our in-store retailer technology.</p><p>With higher accuracy, faster debugging, and dramatically shorter iteration cycles, Capsight allows retailers and their customers to feel improvements as soon as they’re made. It transforms real‑world data into a continuous innovation engine — one that strengthens Caper today and lays the foundation for the next generation of in‑store AI.</p><hr><p><a href="https://tech.instacart.com/turning-data-into-velocity-capers-edge-and-cloud-data-flywheel-with-capsight-544a49ca3db7">Turning Data into Velocity: Caper’s Edge and Cloud Data Flywheel with Capsight</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From print to digital: Making weekly flyers shoppable at Instacart through computer vision and LLMs]]></title>
            <link>https://tech.instacart.com/from-print-to-digital-making-weekly-flyers-shoppable-at-instacart-through-computer-vision-and-llms-739cae1f5629?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/739cae1f5629</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[llm-applications]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Prithvi Srinivasan]]></dc:creator>
            <pubDate>Mon, 09 Feb 2026 20:45:50 GMT</pubDate>
            <atom:updated>2026-02-09T20:45:49.239Z</atom:updated>
            <content:encoded><![CDATA[<h3>From Print to Digital: Making Weekly Flyers Shoppable at Instacart Through Computer Vision and LLMs</h3><p>Key contributors: Prithvi Srinivasan, Shishir Kumar Prasad, Kristen Morgan, Bryan Pham, Rick Shukla, Preeti Chadha, Vipul Bahubali, Ahmad Sajedi, and Ali Maleky</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5LdpGLfEgdvt9dZybF3tBQ.png" /></figure><h3>Introduction</h3><p>Grocery flyers have long been a cornerstone of retail promotions, from paper inserts in the newspaper to email blasts featuring weekly deals. As more grocery shopping shifts online, however, these static promotions haven’t kept pace with customer expectations for convenience and interactivity. At Instacart, we recognized the opportunity to transform static promotional content into interactive, shoppable experiences.</p><p>In 2024, we launched <a href="https://www.instacart.com/company/updates/new-ways-to-save-on-instacart/">grocery flyers on our platform[1]</a>, enabling retailers to upload their weekly and monthly promotions. This enabled our customers to browse through weekly deals for their favourite retailers, providing easy ways to save.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*f8fJsidq2f-TBF2WzT1vmQ.png" /><figcaption>Fig 1: Sample grocery flyer</figcaption></figure><p>Customers expect digital flyers to look and feel like the physical versions they’re used to, with the added ability to tap on items and shop directly. To deliver that experience early on, we relied on a manual digitization process. This involved drawing bounding boxes around every deal and accurately matching those deals to products in our catalog — a painstaking task that required 3–4 hours per flyer.</p><p>As the feature gained traction with retailers, this manual approach quickly became unsustainable. With dozens of retailers uploading weekly flyers, our team faced a mounting workload of hundreds of hours each week. 
The manual process also required retailers to send us the flyers well ahead of time so we could process them before the deals went live.</p><p>With multiple retailers eager to adopt weekly flyers, we needed a scalable solution that could handle the complexity and variety of flyer designs, from simple grid layouts to complex promotional spreads featuring everything from branded packaged goods to fresh produce. Each flyer presented unique challenges in product presentation and layout, making it clear that a one-size-fits-all approach wouldn’t work.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*03K7kicpWKrtJmPPZzRj9w.png" /><figcaption>Fig 2: Manual workflow to digitize flyer image</figcaption></figure><p>While existing solutions like <a href="https://arxiv.org/abs/2308.05938#:~:text=,detector%2C%20which%20renders%20FoodSAM%20to">FoodSAM[2]</a> showed promise for food-specific segmentation, they fell short of addressing the breadth and variety of products featured in retail flyers. It became clear we needed a purpose-built approach: one that combined state-of-the-art computer vision models with custom algorithms tailored to the unique challenges of grocery flyer digitization.</p><h3>Our Approach: A Two-Phase Pipeline</h3><p>We developed a two-phase pipeline, shown in figure 3, to transform static flyer images into interactive shopping experiences. The entire process now takes less than 30 minutes once a flyer is uploaded, a dramatic improvement from the 3–4 hours of manual work previously required.</p><p><strong>Phase 1: Image Segmentation</strong> — Identifying and extracting bounding boxes around each product or deal on the flyer. 
This phase uses a custom algorithm built on Meta’s Segment Anything Model (SAM), enhanced with techniques to handle the unique challenges of retail flyers: overlapping products, decorative text, varying layouts, and products of all sizes.</p><p><strong>Phase 2: Product Identification</strong> — Matching each segmented box to actual products in our catalog. This phase leverages optical character recognition (OCR), large language models, and our existing search infrastructure to accurately identify products and their attributes, even when deals feature multiple items or generic produce.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Zz1B0bls5NajPhg_zSeVOw.png" /><figcaption>Fig 3: Automated flyer tagging workflow</figcaption></figure><h3>Phase 1: Image Segmentation</h3><p>The first step in digitizing flyers is image segmentation. Initial experiments with off-the-shelf ML solutions to extract bounding boxes around each deal revealed significant limitations. <a href="https://huggingface.co/blog/vlms">Vision Language Models (VLMs)[3]</a> or multimodal LLMs work for very simple flyers where the boxes are well separated and few in number. For simple flyers, we iteratively ask multimodal LLMs where each box begins (X &amp; Y coordinates) by drawing uniform grid lines, as shown in figure 4 below. Once we identify the first coordinates, we divide the selected box into smaller boxes to find the starting and ending coordinates for each segmentation box. We achieved high accuracy (~90%) for simple flyers through this method.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iRofi9Br7Tw0yZQnA2eXyg.png" /><figcaption>Fig 4: Image of LLM based solution for simple flyers</figcaption></figure><p>However, for more complex flyer images like the one in figure 5, multimodal LLMs produce imprecise bounding boxes. 
Traditional segmentation and contour detection models generated excessive noise, rendering their outputs unusable without extensive post-processing. These challenges led us to develop a hybrid approach that leverages the strengths of multiple techniques while addressing their individual weaknesses.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jm4kNCeLgJAxaQaVk7wbXQ.png" /><figcaption>Fig 5: Sample image of complex flyers</figcaption></figure><p>Our solution builds upon Meta’s <a href="https://segment-anything.com/">Segment Anything Model (SAM)[4]</a> as a foundation, but it requires custom techniques to denoise smaller boxes, merge multiple products belonging to the same deal, remove text-only boxes, and more:</p><h4>Text Box Removal</h4><p>Flyers often contain decorative elements and promotional text that don’t correspond to specific products. Our system intelligently identifies and removes these extraneous elements, ensuring our segmentation focuses exclusively on product-related content. This preprocessing step significantly improves the accuracy of subsequent stages.</p><h4>Box Merging with Weighted Boxes Fusion (WBF)</h4><p>To consolidate overlapping detections and improve localization accuracy, we employed the <a href="https://arxiv.org/abs/1910.13302">Weighted Boxes Fusion (WBF)[5]</a> technique. Unlike traditional Non-Maximum Suppression (NMS), which may discard valuable information by eliminating lower-confidence boxes, WBF combines all overlapping boxes by computing a confidence-weighted average of their coordinates. This approach retains more information and often results in more precise bounding boxes.</p><p>WBF has demonstrated significant improvements in various applications. For instance, in medical imaging, combining outputs from multiple detectors using WBF has led to an increase in mean Average Precision (mAP) by approximately 3–10% over the best single model. 
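</p><p>The confidence-weighted averaging behind WBF can be sketched in a few lines of Python. This is an illustration only, not the pipeline described in this post: the function names, the (x1, y1, x2, y2) box format, and the 0.55 IoU threshold are assumptions, and the sketch handles a single class and leaves out the confidence rescaling that the WBF paper applies when fusing outputs from several models.</p>

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Simplified single-class WBF (hypothetical helper, not a library API).

    boxes: list of (x1, y1, x2, y2); scores: list of confidences.
    Returns a list of (fused_box, fused_score) pairs.
    """
    clusters = []  # clusters[i] holds the (box, score) pairs fused so far
    fused = []     # fused[i] is the current (box, score) for clusters[i]
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    for i in order:
        box, score = boxes[i], scores[i]
        # Find the existing fused box with the highest overlap above the threshold.
        best, best_iou = None, iou_thr
        for c, (fbox, _) in enumerate(fused):
            overlap = iou(box, fbox)
            if overlap > best_iou:
                best, best_iou = c, overlap
        if best is None:
            # No match: this box starts a new cluster.
            clusters.append([(box, score)])
            fused.append((tuple(box), score))
            continue
        clusters[best].append((box, score))
        members = clusters[best]
        total = sum(s for _, s in members)
        # Each fused coordinate is the confidence-weighted average of its members.
        fbox = tuple(sum(b[k] * s for b, s in members) / total for k in range(4))
        fused[best] = (fbox, total / len(members))
    return fused
```

<p>In this sketch, a lower-confidence box that overlaps an existing cluster shifts the fused coordinates in proportion to its confidence instead of being discarded, which is the contrast with NMS drawn above. Boxes that overlap nothing are kept as their own detections.</p><p>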
Similarly, in our application, merging nearby boxes that likely represent the same product enhances detection accuracy and reduces redundancy.</p><h4>Model Ensembling</h4><p>To leverage the strengths of different detection approaches, we combined outputs from segmentation models and contour detection algorithms. The decision whether or not to use contour detection models was based on how densely the flyer images were packed. This varied from retailer to retailer. This ensemble strategy allows us to capture a broader range of product representations, as different models may excel in detecting various features. By integrating their outputs, we achieve a more comprehensive and robust detection system.</p><h4>Filtering with Heuristics and Machine Learning</h4><p>Post-processing is crucial to eliminate false positives and refine detections. We applied a combination of heuristic rules, such as filtering based on the relative size and aspect ratio of bounding boxes, and machine learning-based filters trained to distinguish between valid product boxes and noise. This dual approach ensures that only the most relevant and accurate detections are retained for further processing. By combining these methods, we’re now able to accurately extract most of our targeted bounding boxes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wbBS-JWjPj0j9ZZ0LN7M7g.png" /><figcaption>Fig 6: Flow diagram of custom segmentation</figcaption></figure><p>In the images below, you’ll see how some of the out of the box approaches work in comparison to our developed algorithm. Below in figures 7–9 you can find the image processed through the LLM multimodal model, the image after processing with Meta’s Segment Anything Model, and the final image after running through our algorithm.</p><p>In figure 7 below you can see flyer segmentation through a multimodal LLM as a one-shot approach. 
The resulting coordinates are unusable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MlI7q3SPfZSkt12R4IuDwQ.png" /><figcaption>Fig 7: Segmentation by multimodal LLM</figcaption></figure><p>In figure 8 below, you can see flyer segmentation with the Segment Anything Model (SAM). This produces too many noisy boxes. The off-the-shelf model is designed to segment down to the smallest possible region, without any context for grouping. For example, each coffee bean and scoop of ice cream is rendered as a separate segment or box, as shown by the different colors in figure 8. This complicates fusing the boxes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*beQi6Bw1l2Xu1FNTGK04lg.png" /><figcaption>Fig 8: Segmentation by SAM</figcaption></figure><p>In figure 9 below, you can see the accuracy of the custom pipeline.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*B7MbeYbqHt2Ag6qViH2KdQ.png" /><figcaption>Fig 9: Flyer Segmentation with our custom pipeline</figcaption></figure><p>By combining these techniques, we’re now able to accurately extract 75–90% of targeted bounding boxes depending on flyer design, with significantly higher precision than any single off-the-shelf solution.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TXE3XAqPG8LL4SaN_YB-dg.png" /><figcaption>Fig 10: Precision and recall performance of custom segmentation model</figcaption></figure><h3>Phase 2 — Image to Product Identification</h3><p>The second step of this process is to identify the elements or products within each box. We wanted to use the power of our existing Search infrastructure to identify elements within the image instead of reinventing image-based search from scratch. 
Instacart’s <a href="https://www.instacart.com/company/how-its-made/how-instacart-uses-embeddings-to-improve-search-relevance/">search algorithm[6]</a> uses a two-tower model with query features and product features to find the top matching products for any given query.</p><p>To transform image information into queries that are similar to Instacart searches and match attributes from the image to the product catalog, we developed a pipeline that combines Optical Character Recognition (OCR), an LLM, and our Search ANN (Approximate Nearest Neighbors) cluster to get a highly precise match for every item on the flyer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z7ZvWhR8DOrcGrgCpLYS6Q.png" /><figcaption>Fig 11: Flow diagram for phase 2 — item identification</figcaption></figure><p>We use <a href="https://github.com/PaddlePaddle/PaddleOCR">PaddleOCR[7]</a>, a state-of-the-art OCR model, to detect and extract text from each image. We pass this output along with the original image to an LLM to separate deal images into queries and attributes for each product, as some deals might contain multiple products. We have noticed that adding the output of the OCR model increases the find rate of products by 15% on average.</p><p>Next, using our Search Approximate Nearest Neighbors (ANN) algorithm, we are able to find the top 10–15 products that match the product query embedding. To complete our matches, we retrieve product attributes stored in our Instacart feature store and use an LLM to rank the matches based on the matching attributes. This step of the process also helps us eliminate noisy boxes from the segmentation step, which might not contain any products. This part of the algorithm achieves 95% recall for finding the product in the top position.</p><p>Together, these techniques deliver transformative improvements to our workflow. 
What previously required 3–4 hours of manual effort per flyer, drawing bounding boxes and matching products, can now be reviewed and finalized in just 30 minutes. This represents a <strong>10x reduction in processing time</strong>, enabling us to scale flyer digitization across our entire retail network.</p><h3>Future directions for flyers</h3><p>Our work on digital flyers demonstrates how combining multiple AI approaches can solve complex real-world challenges. By integrating SAM, contour detection, OCR, and large language models with our existing search infrastructure, we’ve transformed static flyer images into interactive shopping experiences. This approach opens up opportunities for new retail experiences and deal discovery for customers, while also laying a foundation for future exploration of time-saving innovations in retail AI.</p><h3>References</h3><ol><li><a href="https://www.instacart.com/company/updates/new-ways-to-save-on-instacart/">https://www.instacart.com/company/updates/new-ways-to-save-on-instacart/</a></li><li><a href="https://arxiv.org/abs/2308.05938#:~:text=,detector%2C%20which%20renders%20FoodSAM%20to">https://arxiv.org/abs/2308.05938#:~:text=,detector%2C%20which%20renders%20FoodSAM%20to</a></li><li><a href="https://huggingface.co/blog/vlms">https://huggingface.co/blog/vlms</a></li><li><a href="https://segment-anything.com/">https://segment-anything.com/</a></li><li><a href="https://arxiv.org/abs/1910.13302">https://arxiv.org/abs/1910.13302</a></li><li><a href="https://www.instacart.com/company/how-its-made/how-instacart-uses-embeddings-to-improve-search-relevance/">https://www.instacart.com/company/how-its-made/how-instacart-uses-embeddings-to-improve-search-relevance/</a></li><li><a href="https://github.com/PaddlePaddle/PaddleOCR">https://github.com/PaddlePaddle/PaddleOCR</a></li></ol><hr><p><a 
href="https://tech.instacart.com/from-print-to-digital-making-weekly-flyers-shoppable-at-instacart-through-computer-vision-and-llms-739cae1f5629">From print to digital: Making weekly flyers shoppable at Instacart through computer vision and LLMs</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Migrating to Jetpack Compose]]></title>
            <link>https://tech.instacart.com/migrating-to-jetpack-compose-587e912ca858?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/587e912ca858</guid>
            <category><![CDATA[android]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[jetpack-compose]]></category>
            <category><![CDATA[android-development]]></category>
            <dc:creator><![CDATA[Matt Kranzler]]></dc:creator>
            <pubDate>Tue, 03 Feb 2026 21:58:14 GMT</pubDate>
            <atom:updated>2026-02-03T21:58:12.947Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Migrating to Jetpack Compose: How AI Accelerated Our Journey at Caper</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*abTgQpORJ0AeroMQolmSPQ.png" /></figure><h3>Introduction</h3><p>At Instacart, our Caper smart carts bring together AI, computer vision, and real-time data to power the future of in-store shopping. Customers scan items as they shop and pay directly on the cart. The Android app powering each cart handles everything from barcode scanning to payment processing, making stability critical in live retail environments.</p><p>Like many Android teams, we built our app using Fragments and XML layouts. By 2024, the ecosystem had moved to Compose-first, and staying on Fragments meant accumulating technical debt. Migrating hundreds of Fragment-based screens to Jetpack Compose is a significant undertaking, especially when a crash can lead to cart abandonment.</p><p>We initially scoped the migration as a multi-quarter effort. With AI coding assistants, our refactoring became significantly faster than anticipated.</p><p>This post outlines our four-phase migration strategy and how we used AI to transform tedious refactoring into a major accelerator.</p><h3>The Four-Phase Migration Strategy</h3><p>When we finally decided to adopt Compose, we started with a hybrid approach using Fragments as wrappers around Compose UI:</p><pre>class ShoppingCartFragment : BaseComposeFragment() {<br>    @Inject lateinit var viewModel: ShoppingCartViewModel<br><br>    @Composable<br>    override fun Content() {<br>        val viewState by viewModel.state.collectAsStateWithLifecycle()<br>        ShoppingCartView(viewState)<br>    }<br>}</pre><p>Our goal was clear: get off Fragments entirely and build a fully Compose-based app. 
Rather than waiting for a “big bang” migration, we designed a four-phase approach that would let us make progress incrementally while keeping the app stable in production.</p><h3>Phase 1: Implicit Fragment Hosts</h3><p><strong>Goal</strong>: Remove explicit Fragment classes from Compose-based features while maintaining Fragment-based navigation.</p><p><strong>Key Insight</strong>: Google released navigation-fragment-compose, a library that creates implicit Fragment hosts for Composables. This allowed us to write pure Compose screens without manually creating Fragment wrappers.</p><p>We established a new pattern for Compose features:</p><pre>// MyFeatureScreen.kt<br><br>@Composable<br>fun MyFeatureScreen() {<br>    // Required parameterless function for fragment-based navigation<br>    val bindings = provideMyFeatureBindings()<br>    val viewModel = provideMyFeatureViewModel()<br>    val fragment = LocalFragment.current<br>    val navController = fragment.findNavController()<br><br>    MyFeatureScreenInternal(<br>        viewModel = viewModel,<br>        scaffold = bindings.scaffold(),<br>        navigateBack = { navController.popBackStack() },<br>        navigateToDetails = { id -&gt; navController.navigate(R.id.details_fragment) }<br>    )<br>}<br><br>@Composable<br>internal fun MyFeatureScreenInternal(<br>    viewModel: MyFeatureViewModel,<br>    scaffold: ScaffoldProvider,<br>    navigateBack: () -&gt; Unit,<br>    navigateToDetails: (String) -&gt; Unit,<br>) {<br>    val state by viewModel.state.collectAsStateWithLifecycle()<br><br>    MyFeatureView(state = state, scaffold)<br><br>    LaunchedEffect(viewModel) {<br>        // handle effects &amp; invoke navigation callbacks<br>    } <br>}</pre><p>The split between the parameterless MyFeatureScreen() and internal MyFeatureScreenInternal() was intentional. 
The outer function handles navigation and dependencies (tied to Fragment world), while the inner function is pure Compose with callbacks, preparing us for eventual Compose Navigation.</p><h4>AI Acceleration in Phase 1</h4><p>We completed this phase manually without AI assistance. We adopted this pattern for new features while gradually migrating existing ones. This manual work was essential for understanding the migration patterns, edge cases, and conventions that would later inform our AI-assisted approach in subsequent phases.</p><h3>Phase 2: Type-Safe Navigation</h3><p><strong>Goal</strong>: Migrate from XML nav graphs and resource IDs to Kotlin DSL with type-safe routing.</p><p><strong>Key Insight</strong>: Compose Navigation doesn’t support resource IDs. Rather than migrate directly to Compose Navigation (which would be a massive lift), we could migrate to the Fragment-based Kotlin DSL graph first. This gave us type-safe routing while keeping Fragment navigation, making the eventual move to Compose Navigation much simpler.</p><p>For each feature, we needed to:</p><p><strong>1. Define type-safe route objects:</strong></p><pre>// feature/api/CouponDetailsRoutes.kt<br><br>@Serializable<br>data class CouponDetailsDialogRoute(<br>    val couponId: String,<br>    val eligibleItemsCollapsed: Boolean = false,<br>    val showClipAnimations: Boolean = false,<br>    val featuredItemImageUrl: String? = null,<br>) : DialogRoute</pre><p><strong>2. Create NavGraphBuilder extensions:</strong></p><pre>// feature/impl/CouponDetailsNavigation.kt<br><br>fun NavGraphBuilder.couponDetailsRoutes() {<br>    composableDialogFragment&lt;CouponDetailsDialogRoute&gt;(<br>        fullyQualifiedName = &quot;com.instacart.cart.feature.coupondetails.impl.ui.CouponDetailsDialogKt\$CouponDetailsDialog&quot;<br>    )<br>}</pre><p><strong>3. 
Build the Kotlin DSL nav graph:</strong></p><pre>// Before: XML-based<br>val xmlGraph = navController.navInflater.inflate(R.navigation.nav_graph)<br>navController.setGraph(xmlGraph)<br><br>// After: Kotlin DSL<br>navController.graph = navController.createGraph(<br>    startDestination = InitializationNavGraphRoute<br>) {<br>    initializationRoutes()<br>    onboardingRoutes()<br>    shoppingCartRoutes()<br>    productDetailsRoutes()<br>    checkoutRoutes()<br>    couponDetailsRoutes()<br>    // ... all other routes<br>}</pre><p><strong>4. Support gradual migration by combining old and new:</strong></p><pre>val xmlGraph = navController.navInflater.inflate(R.navigation.nav_graph)<br>navController.graph = navController.createGraph(<br>    startDestination = startDestination<br>) {<br>    // New type-safe routes<br>    onboardingRoutes()<br>    couponDetailsRoutes()<br>}.also { navGraph -&gt;<br>    // Still include XML graph for unmigrated features<br>    navGraph.addAll(xmlGraph)<br>}</pre><p>This allowed us to migrate features one by one without breaking anything. Navigation calls could use either approach during the transition:</p><pre>// Old style (still worked during migration)<br>navController.navigate(R.id.checkout_fragment)<br><br>// New style (type-safe)<br>navController.navigate(CheckoutScreenRoute)</pre><h4>AI Acceleration in Phase 2</h4><p>We started Phase 2 in late 2024 by manually migrating navigation graphs. The work was methodical but slow, with each sub-graph migration taking anywhere from a couple of hours to much longer depending on complexity. We had 30+ sub-navigation graphs and 130+ destinations to migrate.</p><p>By early 2025, we began experimenting with AI-assisted migrations. The capabilities of large language models had improved significantly, representing a major leap in code understanding and refactoring accuracy. 
We developed an iterative workflow:</p><ol><li><strong>Learn by Doing</strong>: Complete a few migrations manually to understand patterns and edge cases</li><li><strong>Use Git History as Context</strong>: Provide the AI with commit diffs showing complete transformations</li><li><strong>Correct and Refine</strong>: Watch changes in real-time and immediately correct anything that doesn’t match conventions</li><li><strong>Update the Migration Guide</strong>: Capture learnings in markdown (naming conventions, edge cases, common mistakes)</li><li><strong>Repeat with Improved Context</strong>: Each iteration used updated guides and examples from previous successes</li></ol><p>This interactive correction was crucial. As models improved, tasks that once required manual fixes and smaller chunks became single-session successes with cross-file awareness and consistent style.</p><p><strong>The impact was substantial:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z3kZX6ZEFKpmuPhf5MWbIQ.png" /></figure><p><strong>Notes:</strong></p><ul><li>AI reduced variance across migrations, improving quality and predictability.</li><li>Parallelization amplified speed gains: migrations continued while feature development stayed unblocked.</li><li>Migration time still varied by graph complexity, but the AI workflow dramatically reduced the long tail.</li><li>Improvements compounded as the migration guide improved over time.</li></ul><h3>Phase 3: Fragment to Compose Migration</h3><p><strong>Goal:</strong> Migrate all Fragment-based features to pure Compose screens following our established patterns.</p><p><strong>Key Insight:</strong> With type-safe routes in place and the fragment-less pattern proven in Phase 1, we can now systematically migrate legacy Fragments to pure Compose.</p><p>At the start of Phase 3, we had:</p><ul><li>Type-safe routing with Kotlin DSL nav graph</li><li>Fragment-less pattern for new Compose features</li><li>100+ features still 
implemented as Fragments (mix of XML views and Compose in Fragments)</li></ul><p>Each feature needs to be converted to follow our Phase 1 pattern: a &lt;Feature&gt;Screen.kt or &lt;Feature&gt;Dialog.kt file with the parameterless and internal Composable functions. This represents the heaviest lift in both technical complexity and developer time.</p><h4>AI Acceleration in Phase 3</h4><p>For Phase 3, we evolved our approach even further by creating a comprehensive AI command that serves as both a migration checklist and AI instruction guide. This 17-step workflow breaks down the migration into four main stages:</p><p><strong>1. Analysis and Baselining</strong></p><ul><li>Identify the fragment type (Fragment vs DialogFragment)</li><li>Analyze the XML layout and implementation</li><li>Create a baseline screenshot test using Paparazzi to capture the current UI</li></ul><p><strong>2. Compose Implementation</strong></p><ul><li>Create the new Compose View with identical UI</li><li>Build the Screen/Dialog structure following Phase 1 patterns</li><li>Set up dependency injection</li><li>Add Compose previews for development</li></ul><p><strong>3. Verification and Integration</strong></p><ul><li>Run automated visual parity check between baseline and new Compose screenshots</li><li>Update the navigation graph to use the new Compose route</li><li>Run all tests to ensure no regressions</li></ul><p><strong>4. Cleanup</strong></p><ul><li>Remove the old Fragment code</li><li>Final verification</li></ul><p>As we completed migrations using this workflow, we continued refining the guide based on learnings. Eventually, we formalized this migration guide into an AI skill, a reusable module that packages the entire 17-step workflow with context and conventions. Skills enable progressive disclosure of information, allowing the AI to access exactly what it needs at each step without overwhelming the context window. 
This evolution from markdown guide to structured skill dramatically improved migration effectiveness and consistency.</p><p><strong>The impact so far:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w6jEzMBhHGPCzpEA303frg.png" /></figure><p><strong>Notes:</strong></p><ul><li>Phase 3 uses AI skills with <strong>engineer verification checkpoints</strong>, unlike Phase 2’s fully automated approach. We generate Paparazzi screenshots of the original Fragment view, which inform the AI in building the Compose version. Engineers then compare screenshots of both versions to verify visual parity, ensuring pixel-perfect matches.</li><li>Progressive disclosure of context allowed AI to handle complex, multi-file migrations consistently.</li><li>Migration time varies by fragment complexity, but the AI workflow dramatically reduces the long tail.</li></ul><h3>Principles for AI-Assisted Refactoring</h3><p>Our migration from Fragments to Compose, accelerated by AI tooling, surfaced several core principles about how large-scale modernization projects are evolving. Success depends not just on adopting new tools, but on integrating them with strong engineering practices and a pragmatic approach to risk management.</p><h4>1. The Economics of Technical Debt Have Changed</h4><p>The emergence of powerful AI coding assistants has fundamentally altered the cost-benefit analysis of addressing technical debt.</p><p>AI excels at repetitive, mechanical refactoring. The 5–7x speed increase we observed in Phase 2 saved an estimated 300–350 engineering hours and made migrations feasible that were previously too tedious to justify.</p><p>This shifts the engineering role. Previously, the engineer’s time was spent executing the changes. 
Now, the engineer’s primary responsibility moves from execution to definition and validation:</p><ul><li><strong>Architecture and Planning</strong>: Defining the migration phases and the desired end-state architecture</li><li><strong>Pattern Definition</strong>: Establishing the explicit conventions the AI must follow (see Principle 2)</li><li><strong>Validation and Oversight</strong>: Reviewing the AI-generated code for correctness, performance, and subtle runtime errors that the AI might miss</li></ul><p>Large-scale refactoring, once a months-long grind that slowed feature development, is becoming far more manageable.</p><h4>2. Treat AI Instructions as Code</h4><p>We achieved the most consistency and highest accuracy when we stopped writing documentation for humans and started writing instructions specifically for AI consumption. Our 325+ line migration guide is effectively a program that the AI executes.</p><p><strong>Documentation written for AI agents must be far more explicit</strong>. It forces you to articulate conventions, edge cases, and decision points that human developers may or may not intuit. This documentation serves triple duty: AI agents execute it autonomously, human developers use it as a checklist, and code reviewers verify that the steps were followed.</p><p>Like code, these guides require investment and must be iterated upon. After 5–6 migrations, we refined the guide until the process became highly predictable. The evolution from markdown guides to structured AI skills represents the formalization of this principle.</p><h4>3. Incrementalism Mitigates AI Risk</h4><p>Our four-phase strategy was essential for managing risk, especially in a critical hardware environment like ours. By decoupling the navigation refactor (Phase 2) from the UI refactor (Phase 3), we were able to ship small, AI-generated changes incrementally. 
Each phase had clear success criteria and could be validated in production before moving forward, reducing the blast radius of any potential errors.</p><h4>4. Invest in the Workflow, Not Just the Tool</h4><p>The AI tooling landscape moves incredibly fast. The specific tool matters less than the workflow surrounding it. Our iterative feedback loop unlocked velocity: learn by doing, provide context through git history, correct in real-time, update the guides, and repeat.</p><p>During our migration, new capabilities like AI skills emerged that dramatically improved our effectiveness. However, we had to adapt our workflow to capture the full benefit. Teams that simply adopt new tools without evolving their processes will miss most of the gains.</p><p>Realizing these productivity gains requires a human shift. Effectively collaborating with AI requires new skills: prompt engineering, creating effective migration guides and skills, and developing intuition for when to trust the AI versus when to intervene. Share your learnings with your team and demonstrate the impact.</p><h3>What’s Next</h3><h4>Phase 4: Compose Navigation</h4><p>With Phase 3 well underway, we’ve already begun work on the final step: migrating from Fragment-based navigation to Compose-based navigation. We’re running Phase 4 in parallel with completing Phase 3, using feature flags to support both navigation systems during the transition.</p><p>Because we completed the previous phases, Phase 4 is significantly simpler. The type-safe routes are in place, screens are migrating to pure Compose, and we’ve established proven migration patterns. Early experiments with AI skills for this phase have shown promising results, and we expect this final phase to proceed even faster than Phase 3.</p><h4>Scaling AI-Assisted Refactoring Across Instacart</h4><p>The learnings from this migration extend far beyond our Caper team. 
We’ve shared our approach through internal tech talks and documentation, and other engineering teams across Instacart are now using AI skills to tackle their own large-scale refactoring challenges.</p><p>Teams have adopted the same patterns we developed: treating AI instructions as code, building structured skills for repetitive migrations, and establishing iterative workflows with human oversight. Engineers across the company are applying these techniques to address technical debt that had been deprioritized for years.</p><p>The fundamental shift is this: <strong>the economics of technical debt have changed.</strong> Projects that previously had poor ROI due to high manual labor costs are now feasible. With significant efficiency gains and the ability to run migrations in parallel with feature development, teams can maintain velocity on their roadmap while systematically paying down technical debt.</p><p>This isn’t just about our Fragment-to-Compose migration. It’s about establishing a repeatable playbook for how engineering teams at Instacart can leverage AI to modernize our codebase at a pace that was unimaginable just a year ago.</p><h3>Conclusion</h3><p>At Instacart, our Caper smart carts are a key part of our Connected Stores technology, helping grocery retailers move faster and innovate with confidence. Delivering a best-in-class product requires disciplined refactors like our Compose migration. Investing in AI is core to our engineering mission, and we use it throughout our internal work to move quickly and deliver value. 
Our partners count on our AI expertise to unlock results faster, and we remain focused on advancing the in-store experience.</p><hr><p><a href="https://tech.instacart.com/migrating-to-jetpack-compose-587e912ca858">Migrating to Jetpack Compose</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs]]></title>
            <link>https://tech.instacart.com/building-the-intent-engine-how-instacart-is-revamping-query-understanding-with-llms-3ac8051ae7ac?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/3ac8051ae7ac</guid>
            <category><![CDATA[context-engineering]]></category>
            <category><![CDATA[fine-tuning-llm]]></category>
            <category><![CDATA[llm-applications]]></category>
            <category><![CDATA[ecommerce-search]]></category>
            <category><![CDATA[query-understanding]]></category>
            <dc:creator><![CDATA[Yuanzheng Zhu]]></dc:creator>
            <pubDate>Thu, 13 Nov 2025 23:22:56 GMT</pubDate>
            <atom:updated>2025-12-04T20:48:46.604Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i-xL0bOg4Oy0EBh9nhAULg.png" /></figure><p>Authors: Yuanzheng Zhu, Guanghua Shu, Raochuan Fan, Vinesh Gudla, Tejaswi Tenneti</p><h3>Introduction</h3><p>When people search for items on Instacart, they don’t always type perfectly worded phrases. They might write <em>“bread no gluten”</em> or <em>“x large zip lock”</em>, and that’s okay. Our job is to understand what they mean, not just what they type. This process, called Query Understanding (QU), is the intent engine that helps millions of customers find what they need on Instacart. Getting QU right is essential.</p><p>For years, we relied on traditional machine learning models. They worked well for many searches, but we wanted to deliver a truly intelligent experience for the endless variety of uncommon, highly specific, or creatively phrased queries, which we call <em>long-tail searches</em>.</p><p>This pursuit led us to a new paradigm. Instead of building another bespoke model from the ground up, we opted to “stand on the shoulders of giants.” We turned to Large Language Models (LLMs) for their vast pre-trained knowledge. We saw the opportunity not just to <strong>use</strong> these models, but to <strong>steer</strong> them into becoming deep domain experts for our vertical. This post details that journey. Our strategy was layered, moving from <strong>context-engineering</strong> with <strong>guardrails</strong> to our ultimate goal: <strong>fine-tuning</strong> to distill proprietary knowledge directly into an LLM. This approach transforms a generalist model into a true specialist. It has shifted our core challenge from feature engineering to productionizing these powerful backbones while managing latency and cost.</p><h3>Challenges in Traditional Query Understanding</h3><p>Our journey to LLMs began with examining where traditional QU falls short. 
While essential for search at Instacart, accurately interpreting user intent is notoriously difficult for several reasons:</p><ul><li><strong>Broad Queries:</strong> Queries like <em>“healthy food”</em> or <em>“frozen snacks”</em> are common but difficult to act on. Their lack of specificity makes it challenging to narrow down relevant results, as they can span dozens of categories.</li><li><strong>Lack of Labeled Data:</strong> QU operates upstream and doesn’t benefit from direct feedback like clicks or conversions. The pseudo-labels we derive from user behaviors are inherently noisy: a user might search for <em>“bread”</em> but ultimately purchase bananas. Generating clean labels requires costly and time-consuming human evaluation.</li><li><strong>Tail Queries:</strong> Highly specific or rare searches like <em>“red hot chili pepper spice”</em> or <em>“2% reduced-fat ultra-pasteurized chocolate milk”</em> suffer from data sparsity. Models trained on engagement data struggle due to limited historical clicks or conversions, leading to poor generalization.</li><li><strong>System Complexity:</strong> To solve these problems, we historically trained and maintained multiple independent models for individual QU tasks. For instance, query classification and query rewrites were handled by entirely separate systems, each with its own logic (Figure 1). Each of these bespoke solutions demanded its own data pipeline, training and serving architecture. This heterogeneity introduced inconsistencies, slowed down development cycles, and made the overall QU system difficult to scale and evolve.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-C3_1SdXKNF_cMe2E0IsVA.png" /><figcaption><em>Fig 1. Our previous QU system involved multiple independent models for individual QU tasks. 
For instance, query classification relied on a FastText model for multi-label classification, while query rewrites were generated by a separate system that mined user session behavior.</em></figcaption></figure><h3>The Advantages of LLMs</h3><p>To solve these problems, we turned to LLMs to consolidate and enhance our QU models. They offer several key advantages that improve the accuracy and efficiency of Instacart Search:</p><ul><li><strong>World Knowledge and Inference Capabilities:</strong> Trained on diverse textual data, LLMs possess world knowledge that enables them to make logical inferences from user queries. For example, an LLM already understands that <em>“Italian parsley”</em> is a synonym for <em>“flat parsley”</em>, while <em>“curly parsley”</em> is a common substitute. This capability dramatically reduces the manual engineering and specialized data required by conventional models, giving us a powerful head start.</li><li><strong>Simplified System:</strong> Because LLMs possess broad linguistic abilities, they enable us to consolidate numerous bespoke models. By replacing specialized models with a single LLM that can handle multiple NLP tasks, we eliminate the complexity of maintaining separate models and their inconsistencies.</li></ul><h3>LLM as QU: Our Strategy in Action</h3><p>We integrated LLMs by adding Instacart’s domain context in three ways:</p><ol><li><strong>Context-Engineering</strong>: Our primary method is Retrieval-Augmented Generation (RAG). We build data pipelines that retrieve and inject Instacart-specific context, such as conversion history and catalog data, directly into the prompt. This grounds the model in our business reality.</li><li><strong>Post-Processing Guardrails</strong>: We refine LLM outputs through validation layers. 
These guardrails filter out hallucinations and enforce alignment with Instacart’s product taxonomy.</li><li><strong>Fine-Tuning for Deep Expertise</strong>: For the most advanced use cases, we fine-tune models on proprietary data. This embeds deep domain expertise directly into the model’s weights and represents a key part of our long-term strategy for handling complex, long-tail queries.</li></ol><p>The following examples illustrate how we leverage some of these techniques to transform critical QU components.</p><h3>1. Query Category Classification</h3><p>Instacart’s catalog is organized into a vast, hierarchical product taxonomy that structures billions of items, from broad departments like “Meat” down to specific sub-categories like “Beef Ribs &gt; Short Ribs”. Accurately classifying queries into our product taxonomy is essential. It directly powers recall and ranking, helping us retrieve items from the right categories and intelligently expand the search when a query is broad or ambiguous.</p><p>Our legacy approach treated this as a massive multi-class classification problem. For a given query, the model would predict the top-K most likely categories from a flat list. For example, for <em>“butter milk”</em>, it might predict (“Dairy”, 0.95) and (“Milk”, 0.92) as distinct, non-hierarchical outputs.</p><p>This legacy approach suffered from two primary pitfalls. First, being trained on noisy conversion data (e.g., a user searches <em>“bread”</em> but buys bananas) means it can produce irrelevant suggestions. 
Second, it lacked deeper contextual understanding, preventing it from using world knowledge to classify new or nuanced queries like <em>“vegan roast”</em> correctly, as shown in Table 1.</p><p>Our new LLM-powered approach greatly improves precision and recall through a three-step process: first, we retrieve the top-K converted categories for each query as initial candidates; second, we use an LLM to re-rank them with injected Instacart context; and finally, we apply a post-processing guardrail. This filter computes a semantic similarity score between the embeddings of the original query and the LLM’s predicted category path, discarding any pair that falls below our relevance threshold.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6UFFj2Ki4CvFZTGlf1Kqyg.png" /><figcaption><em>Table 1: Comparison of category classification between the legacy model and the new LLM-based approach.</em></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZY5ZCQpXfi3gcD2nRrkVBw.png" /><figcaption><em>Fig 2: Overview of the LLM for Query Category Classification system</em></figcaption></figure><h3>2. Query Rewrites</h3><p>Query rewrites are critical for improving recall, especially when the original query does not return sufficient results. Our legacy system mined candidate rewrites from user session data, but this approach was limited, covering only 50% of search traffic and often failing to generate useful alternatives for product discovery.</p><p>To address this, we turned to LLMs. Our initial attempt involved a simple prompt asking a single model to generate rewrites for recall enhancement. This proved too ambiguous. 
For example, for <em>“1% milk”</em>, the model might return <em>“one percent milk”</em>, a valid synonym but not a useful rewrite for discovering alternative products.</p><p>This led us to design specialized prompts for three distinct rewrite types: <em>Substitutes</em>, <em>Broader queries</em>, and <em>Synonyms</em>. Each type is handled by a dedicated prompt with advanced prompt engineering that incorporates specific instructions, chain-of-thought (CoT) reasoning, and few-shot examples. To ensure the results are logical and useful, we apply post-processing guardrails, including filters for semantic relevance. This structured approach increased our query rewrite coverage to over 95% with 90%+ precision across all three types.</p><p>Building on this success, we are now adopting context engineering to make rewrites more convertible, personalized, and session-aware. We achieve this by injecting user engagement signals, such as the top-converting product categories from their subsequent searches in the same session.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fQCVmfoK_K6rnYxpiF08dA.png" /><figcaption><em>Table 2: Examples of structured query rewrites generated by specialized LLMs</em></figcaption></figure><h3>3. Semantic Role Labeling (SRL)</h3><p>Semantic Role Labeling (SRL) is the task of extracting structured concepts from a user query, such as <strong>product</strong>, <strong>brand</strong>, and <strong>attributes</strong>. These tags are critical for everything from search retrieval and ranking to ad targeting and filters.</p><p>Our goal was to leverage the power of LLMs to generate high-quality tags. However, the power-law nature of search traffic presents a challenge: we can’t pre-compute results for every possible query because the “long-tail” of new and unique searches is effectively infinite, and offline LLM processing is expensive.</p><p>To solve this, we designed a <strong>hybrid system</strong>. 
A powerful offline process generates high-quality data that serves two purposes: populating a cache for our most common “head” queries and creating the training data for a fast, real-time model that handles the “long-tail.” The system’s flow, shown in the diagram below, is determined simply by whether the query hits the cache.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wJEg1YYYA01rJ4x55mMDVA.png" /><figcaption><em>Fig 3. </em><strong><em>Architecture of the hybrid SRL system.</em></strong><em> Live traffic is routed based on a cache hit. High-frequency “head” queries are served instantly from the cache, while “tail” queries are handled by a real-time, fine-tuned model. The entire system is powered by an offline pipeline that generates data to both populate the cache and train the real-time model.</em></figcaption></figure><h4>The Offline System (“Teacher”): Generating High-Quality Data at Scale</h4><p>For our high-frequency “head” queries, we run an offline <strong>Retrieval-Augmented Generation (RAG)</strong> and <strong>caching</strong> pipeline. Because latency is not a concern here, we can use complex techniques to ensure the highest possible quality. The core of this is <strong>context-engineering</strong>: enriching the prompt with deep Instacart-specific knowledge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7r2UQapr0Ri40DIe4cBJfA.png" /><figcaption><em>Fig 4. Overview of the RAG pipeline for query tagging. Context-engineering injects Instacart domain knowledge to ground the LLM’s inference and generate far more accurate intent signals. (Note: Brand examples used for illustration are fictitious.)</em></figcaption></figure><p>Consider the query <em>“verdant machine”</em>. Without context, an LLM might assume it’s for machinery. 
Our offline pipeline, however, automatically enriches the prompt with crucial context from our internal data systems, including:</p><ul><li><strong>Historical Conversion Data:</strong> The top-converting brand (<em>MuchPure</em>) and categories (<em>Smoothie Juices</em>).</li><li><strong>Product Catalog Information:</strong> Product brand names with high semantic similarity, ranked by embedding scores.</li></ul><p>Armed with this context, the model correctly infers the user’s intent: they are looking for a smoothie brand. After generation, a post-processing guardrail validates the tags against our catalog. This rigorous process has two critical outputs:</p><ol><li>A low-latency cache containing the validated, high-quality tags for our most common queries.</li><li>A high-quality training dataset, which is used to teach a lightweight real-time model.</li></ol><h4>The Real-Time System (“Student”): A Fine-Tuned Model for the Long-Tail</h4><p>When a user’s query results in a cache miss (indicating a long-tail query), it is routed to our real-time model. This is a language model with a much smaller backbone (like Llama-3-8B) that is fast and cost-effective for live inference.</p><p>Crucially, this model was fine-tuned on the high-quality “curriculum” dataset produced by our offline “teacher” pipeline. Through this fine-tuning, the smaller model learns to replicate the accuracy of its much larger counterpart, along with the domain context we injected. This allows us to deliver a consistent, high-quality experience for virtually any query a user types. 
This hybrid approach gives us the best of both worlds: the raw power of massive LLMs, and the speed and efficiency of a lightweight, learnable model.</p><h3>Building a New Foundation: Fine-Tuning for Real-Time Inference</h3><p>The success of the real-time “student” model in our SRL system was more than just a win for one project; it proved the viability of a new foundational capability for Instacart: <strong>fine-tuning smaller, open-source models to serve our specific needs at scale</strong>.</p><p>While the SRL system was the first production application, the process of building and deploying this model established a blueprint for future innovation across our platform. Here’s a closer look at how we did it.</p><h3>Distilling Knowledge via Fine-Tuning</h3><p>For the real-time SRL model, we fine-tuned an open-source <strong>Llama-3-8B</strong> model using LoRA (Low-Rank Adaptation). The model was trained on the dataset from the offline “teacher” pipeline. This process effectively distilled the knowledge and nuanced context from the larger model into the smaller, more efficient one.</p><p>The results were remarkable. Our fine-tuned 8B model performs on par with the much larger frontier model it learned from, achieving a similar F1 score with higher precision and slightly lower recall.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WTWYlNwXI73Fm4EC5wzlqQ.png" /><figcaption><em>Fig 5. Our fine-tuned 8B model achieves performance on par with a much larger foundation model. Compared with the baseline (dark blue), our production model (orange) has higher precision (96.4% vs. 95.4%), lower recall (95% vs. 96.2%), and an on-par F1 score (95.7% vs. 95.8%).</em></figcaption></figure><h3>The Path to Production: Taming Real-Time Latency</h3><p>Having a great model is only half the battle; serving it in production with a latency target in the low hundreds of milliseconds was a significant engineering challenge. Out of the box, latency was nearly 700ms on an A100 GPU. 
We reduced latency through a series of crucial optimizations:</p><ul><li><strong>Adapter Merging &amp; Hardware Upgrade:</strong> Merging the LoRA adapter weights directly into the base model and upgrading to H100 GPUs got us to our 300ms target.</li><li><strong>Quantization Trade-Offs:</strong> We explored quantization (FP8), which cut latency by another 10% but caused a slight drop in recall. We deployed the unquantized model to prioritize quality.</li><li><strong>Cost Management:</strong> We enabled GPU autoscaling to run on fewer GPUs during off-peak hours, reducing costs without compromising performance.</li></ul><p>A/B testing confirmed the success: the real-time LLM meaningfully improved search quality for the bottom 2% of queries. With the new SRL tagging for the tail queries, we reduced “average scroll depth” by 6% (users find items faster), with only a marginal latency increase. The system is now live, serving millions of cold-start queries weekly and reducing user complaints related to poor search results for tail queries by 50%.</p><h3>Key Takeaways</h3><p>Here’s what we learned from putting LLMs into our production search system:</p><ul><li><strong>Context is the Defensible Moat</strong>: A generic LLM is a commodity; your business context is what makes your application defensible, because domain knowledge is the most valuable asset. That knowledge is vast, noisy, and dynamic. It includes everything from user engagement signals (<em>what products are actually purchased after a search?</em>) to real-world constraints (<em>what’s on the shelf at a specific store right now?</em>). In the past, injecting this data into traditional ML models was difficult and brittle. The central challenge today is how to effectively encode this knowledge into an LLM. Through our work, we found a clear hierarchy of effectiveness, with each level carrying its own engineering trade-offs: <strong>Fine-tuning &gt; Context-Engineering (RAG) &gt; Prompting</strong>. 
Each method progressively transforms a generalist model into a true domain expert.</li><li><strong>Start Offline, Go Real-Time Strategically</strong>: To manage costs and prove value, we began with an offline LLM pipeline on high-frequency “head” queries. This cost-effective approach handled the bulk of traffic and generated the data needed to later train a “student” model for the long tail.</li><li><strong>Consolidate, Don’t Complicate</strong>: We simplified our stack by replacing numerous legacy models with a single LLM backbone, reducing maintenance and accelerating development.</li><li><strong>The Model is Only Half the Battle</strong>: A great model is useless if it can’t serve traffic at scale. We turned potential into impact through crucial production engineering: adapter merging cut latency by 30%, smart caching meant only 2% of queries needed real-time inference, and GPU autoscaling managed costs effectively.</li></ul><p>Ultimately, this journey has armed us with more than just a more intelligent QU system; it has laid a new foundation for the future of eCommerce search. Looking ahead, we are expanding beyond single-query search to build a smarter, context-aware system. This means building a system that can understand a user’s entire journey and distinguish between complex intents — differentiating a search for <em>“lasagna ingredients”</em> (item search) from a query for a <em>“quick lasagna recipe”</em> (content discovery) or a request for <em>“lasagna delivery near me”</em> (restaurant search). By understanding this context, we can guide users to the perfect experience, creating a seamless journey across all of Instacart’s offerings.</p><p><strong>Acknowledgments</strong></p><p>This project required the collaboration of multiple teams across the company including ML, backend and infra teams to be realized. 
Special thanks to<strong> Taesik Na, Tina He, Akshay Nair, Xiao Xiao, Mostafa Rashed, Kevin Lei, Callum Wood</strong>, <strong>Sudha Rani Kolavali</strong> and <strong>Jonathan Bender</strong> who also contributed to this work and made this vision a reality. I’d also like to thank <strong>Naval Shah, Jane Ross </strong>and <strong>Eric Hacke</strong> for their thoughtful and thorough review of the blog post.</p><hr><p><a href="https://tech.instacart.com/building-the-intent-engine-how-instacart-is-revamping-query-understanding-with-llms-3ac8051ae7ac">Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Simplifying Large-Scale LLM Processing across Instacart with Maple]]></title>
            <link>https://tech.instacart.com/simplifying-large-scale-llm-processing-across-instacart-with-maple-63df4508d5be?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/63df4508d5be</guid>
            <dc:creator><![CDATA[Paul Baranowski]]></dc:creator>
            <pubDate>Wed, 27 Aug 2025 17:19:31 GMT</pubDate>
            <atom:updated>2025-08-27T17:19:29.961Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8tuaKKHLFzNcy__Ys9pCBw.png" /></figure><p>At Instacart, we’re powering a grocery platform that cuts across millions of items, customers, and deliveries thanks to the application of LLMs at scale. In this post, we’ll detail how LLMs are embedded in critical workflows across the company and span teams and domains. Our engineering teams leverage them to clean up catalog product data, enrich item listings with detailed attributes, assist in routing perishable goods, and even improve search relevance.</p><p>For example, the Catalog team uses LLMs to detect and fix data errors, like inaccurate sizes causing tax and pricing losses for Instacart, while enriching listings with new attributes to aid user choices. The Fulfillment team leverages LLMs to help identify perishable items that need special handling during fulfillment. And the Search team trains advanced ranking models with LLMs, enhancing relevance through better query understanding and personalization for superior item matches.</p><p>The challenge? Many of these tasks require <em>millions</em> of LLM calls — and real-time LLM provider APIs just aren’t designed for that scale. To address this, we built Maple, a service that makes large-scale LLM batch processing fast, cost-effective (saving up to 50% on LLM costs compared to standard real-time calls), and developer-friendly. With Maple, teams across Instacart can process millions of prompts reliably and efficiently, unlocking new workflows and accelerating development without needing to reinvent infrastructure.</p><h3>Why We Built Maple</h3><p>The Catalog team had been developing AI pipelines for some time, and it was becoming increasingly clear that we needed better tooling to support our growing needs. Real-time LLM calls were frequently rate-limited, forcing us to throttle requests to stay within token limits. 
This introduced delays and made it difficult to keep up with demand. At the same time, multiple teams across the company were independently writing similar code to handle their AI workflows, leading to duplicated effort and fragmented solutions. Our existing pipelines also lacked reusability, as any new use case typically required modifying the underlying code; and as our workloads scaled, we became more conscious of cost and efficiency.</p><p>To address these challenges, we decided to build Maple, a reusable service designed to streamline LLM-based workflows. This service was built not just for our team, but for anyone at the company. However, working with the LLM provider’s batch system interface brought its own complexities. Each batch is limited to 50,000 prompts or 200MB, so large jobs, like those with a million prompts, require at least 20 separate batches. Unlike real-time calls, batch workflows have a more involved process: encoding requests in a specific format, uploading them, monitoring job status, downloading result files, parsing them, and retrying failed prompts in new batches. Without shared tooling, each team would be forced to implement this workflow from scratch. Maple solves this by abstracting the complexity and offering a consistent, reusable foundation for scalable AI work.</p><h3>How Maple Works</h3><p>Maple accepts a CSV or Parquet file and a prompt as input and delivers an output file with the input merged with the AI response. 
It automates:</p><ul><li><strong>Batching</strong>: Splits large input files into smaller batches.</li><li><strong>Encoding/Decoding</strong>: Automates conversions to and from the LLM batch file format.</li><li><strong>File Management</strong>: Automates input uploads, job monitoring, and result downloads.</li><li><strong>Retries</strong>: Ensures failed tasks are retried automatically for consistent outputs.</li><li><strong>Cost Tracking</strong>: Tracks detailed cost usage for each team.</li></ul><p>By managing these steps, Maple eliminates the need for teams to write custom batch processing code, significantly enhancing productivity.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bgjKL1vvuWvZ7asY" /></figure><h3>Where Maple Fits in the AI Stack</h3><p>Maple sits at the center of Instacart’s large-scale LLM processing pipeline, serving as the orchestration layer between internal teams and external model providers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*PuBe4xFiDfgrs_ZW" /></figure><p>Multiple internal Instacart applications, ranging from product enrichment pipelines to ML training workflows, send prompt data to Maple, which handles all aspects of batching, job coordination, retries, and result merging. From there:</p><ul><li>Maple proxies requests through our AI Gateway, another internal Instacart service that acts as a centralized abstraction layer for communicating with multiple LLM providers.</li><li>The AI Gateway is also responsible for integrating with the Cost Tracker, logging detailed usage and spend per job and team.</li><li>Finally, prompt batches are dispatched to external LLM providers, and results flow back through the same path.</li></ul><p>This architecture ensures that teams don’t need to worry about how to call various LLMs, and that prompt jobs are fully traceable, cost-monitored, and fault-tolerant. 
By abstracting away complexity, Maple enables faster iteration and experimentation while enforcing consistency and cost controls across the company.</p><h3>Under the Hood</h3><p>Maple was designed for scalability and fault tolerance using a combination of modern tools:</p><ol><li><a href="https://temporal.io/product"><strong>Temporal</strong></a>: Ensures guaranteed completion of long-running tasks. Even if exceptions occur, Temporal’s fault tolerance safeguards data integrity and guarantees job completion.</li><li><strong>RPC API</strong>: Provides a streamlined interface for submitting jobs and tracking progress.</li><li><strong>Efficient Storage</strong>: Inputs and outputs are stored in S3, avoiding costly database operations. This approach is not only cheaper but also allows handling large datasets.</li></ol><p>Implemented in Python, Maple uses PyArrow to efficiently process input files. Large CSV files are split into smaller Parquet batch files, and stored on S3 to avoid costly database usage. Parquet is an efficient file format for data table storage, where out-of-the-box compression reduces file sizes up to 25x compared to CSV. It also allows non-linear access into the file, making data access extremely fast. These batches are converted into the LLM provider batch input format, respecting the max file size cap, which can reduce prompt counts for large prompts.</p><p>Maple uploads each file, polls for completion, and when complete, downloads results to S3. It then matches responses to inputs, creating per-batch Parquet result files. Finally, all batch results are combined into a single output file, mirroring the input format.</p><h3>LLM Batch Processing: How fast is it?</h3><p>Batch LLM providers say that we should expect results to be returned within 24 hours. In our experience, most of the time batches return relatively quickly, but there are rare periods where there is an obvious delay in processing. 
We have compiled stats from a sample set of ~580 batches with 40K–50K tasks per batch, with most batches having 50K. Below are the real-world results from processing these batches.</p><h4>Prompt Processing Speed</h4><p>The histogram below shows how many prompts per second were processed for batches of 40K–50K prompts. LLM batches average 2.6 tasks per second. At that rate, a full 50K-prompt batch would take a little over five hours. Note that processing time can vary based on the prompt, especially when including images with the prompt, which is a common case for us. We can see that most batches are clustered between 1 and 4 prompts per second:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*OW83wGq66-LEwCZf" /></figure><h4>Batch Completion Time</h4><p>In the histogram below, we can see that most batches complete in under 12 hours, with occasional batches that take almost a full day:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*nvi5bZhg5_tcXE8b" /></figure><h4>Job Size vs Time</h4><p>In the scatter plot below, we can see that the time to complete a batch job increases with the number of tasks in the job, as you would expect. Note the log scale on the Y-axis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-NiWqoXGy27ZfU8H" /></figure><h3>Lessons Learned</h3><p>Building Maple to handle tens of millions of prompts reliably and efficiently required a lot of iteration. Here are some of the key lessons we learned while scaling the system and making it robust enough for widespread use across Instacart.</p><h4>Optimizing for High-Volume Workloads</h4><p>As our internal clients sent larger and larger input files, we hit storage, memory, and processing limitations. 
To address this:</p><ol><li>We moved task data storage from the database to S3 using Parquet files, which improved load/save speed and reduced cost.</li><li>We adopted stream-based processing to minimize memory consumption when handling large files.</li><li>We replaced some of Python’s standard-library modules with faster alternatives, such as swapping ‘json’ for ‘orjson’, which is faster and more memory-efficient.</li></ol><p>These optimizations allowed Maple to scale efficiently to 10M+ prompt jobs.</p><h4>Ensuring Reliable Execution</h4><p>Running batch jobs at this scale means errors are inevitable: network issues, provider failures, or bugs can happen mid-run. We use the Temporal durable execution engine to ensure that jobs can resume exactly where they left off without losing any work. This not only protects against data loss but also avoids wasting money on partially completed jobs.</p><h4>Handling Failure Modes Gracefully</h4><p>LLM providers return various types of task-level failures, each requiring tailored handling logic:</p><ol><li><strong>Expired</strong>: Occasionally, the LLM provider fails to return results within 24 hours and instead returns an error/expired message with no result for that task. Maple retries these indefinitely by default by constructing a new batch with the failed tasks.</li><li><strong>Rate limited</strong>: When we hit the LLM provider token limit, it returns an error message with no result for that task. We retry these indefinitely by default.</li><li><strong>Refused</strong>: The LLM provider can refuse to execute a request outright, either because of bad parameters or because the image or prompt was filtered as unacceptable. We retry these at most twice by default, because a retry will probably return the same result.</li><li><strong>Invalid images</strong>: Sometimes a request has an image URL that is invalid. The endpoint might not exist or the image might not be available. In this case, the request could fail. 
We provide an option to retry these, but the second time around, Maple will check if the image exists before sending it back to our LLM provider. We don’t do this the first time around because checking each image in a large batch can add significant overhead.</li></ol><h4>Building a Robust Foundation</h4><p>These lessons helped shape Maple into a resilient, high-throughput batch processing system. By building in fault tolerance, efficient processing, and robust failure handling, we enabled any team at Instacart to run massive LLM workloads without needing to build their own infrastructure or worry about what happens when things go wrong.</p><h3>Extending Maple to Additional LLM Providers</h3><p>Not all LLM providers offer a batch interface; some only support real-time APIs. As internal teams requested access to these additional providers, we extended Maple to abstract the complexity of handling large-scale real-time prompts while maintaining our simple CSV input/output interface.</p><p>Behind the scenes, we implemented automatic parallelization, exponential backoff on rate-limited requests, intelligent retry policies, and failure tracking, bringing the same operational maturity we applied to batch file-based workflows. Later on, if a provider starts offering a batch interface, we can switch it over seamlessly without our users needing to do anything.</p><p>Teams no longer need to write custom scripts or pipelines to handle bulk real-time calls. Instead, they can use the same Maple interface, and the underlying platform will handle the complexities of interacting with real-time APIs at scale. Enabling real-time support also had the additional benefit of making small batches complete more quickly, which is important for ops-related tasks where teams are iterating on a problem.</p><h3>Adoption and Impact</h3><p>Maple has become part of the backbone of our AI infrastructure at Instacart, with several different teams leveraging it as a universal bulk LLM provider. 
It has dramatically reduced costs by automating repetitive, manual tasks, freeing up teams to focus on higher-impact work. The cost of many processes has dropped from hundreds of thousands of dollars per year to just thousands of dollars per year.</p><p>Maple democratizes access to bulk LLM prompt processing at Instacart. Teams can now explore new ideas, automate repetitive work, and ship faster without becoming LLM infrastructure experts. By simplifying bulk prompting, it accelerates innovation, lowers costs, and ultimately helps us deliver new features to our customers.</p><hr><p><a href="https://tech.instacart.com/simplifying-large-scale-llm-processing-across-instacart-with-maple-63df4508d5be">Simplifying Large-Scale LLM Processing across Instacart with Maple</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI-Driven Development at Instacart: Scaling Impact and Increasing Velocity]]></title>
            <link>https://tech.instacart.com/ai-driven-development-at-instacart-scaling-impact-and-increasing-velocity-43f6b3902a32?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/43f6b3902a32</guid>
            <dc:creator><![CDATA[Riddhima]]></dc:creator>
            <pubDate>Thu, 21 Aug 2025 16:00:12 GMT</pubDate>
            <atom:updated>2025-08-21T16:00:12.111Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LpC4kkfRPoWbfzv71gY4nA.png" /></figure><p><strong>Key Contributors</strong>: John Stuppy, Min Kim, Laimonas Turauskas, Nathan Marks, Ben Bernard</p><p>At Instacart, artificial intelligence (AI) is an accelerant reshaping how we build, scale, and innovate. Whether it’s fostering a company-wide culture of AI-assisted engineering, or bringing a product like <a href="https://www.instacart.com/company/updates/introducing-fizz-the-best-way-to-order-drinks-and-snacks-as-a-group/">Fizz</a> from idea to launch in record time, we’ve embraced AI to work smarter and faster.</p><p>This blog explores how AI has unlocked productivity gains across individual projects and at organizational scale, the lessons learned along the way, and how these insights are influencing our engineering culture.</p><h3>From Grassroots Adoption to Scaled Enablement</h3><p>Our AI journey started with individual engineers experimenting, from writing unit tests to navigating legacy services, with AI copilots becoming their powerful assistants. Informal demos, Slack threads, and “how I use AI” discussions began organically surfacing across teams.</p><p>But to make AI adoption consistent and applied in a thoughtful, durable way, we needed a structured approach.</p><h3>Case Study: Project Tomato — Accelerating Workflows with AI</h3><p>One of our most pivotal internal initiatives has been <em>Project Tomato</em>, a focused effort to explore how AI can meaningfully accelerate engineering workflows in real-world scenarios. We called it <strong>Project Tomato</strong> because, like fruit in a garden, some AI use cases are ripe for immediate use, while others need more time to mature.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*R-Y24OQ_Hm4v-YJC" /></figure><p>Born within our Commerce organization, Project Tomato explored practical use cases across the engineering lifecycle. 
From outlining an ERD and generating code to debugging, analyzing logs, and ensuring test coverage, engineers embedded AI tools like <a href="https://tech.instacart.com/unlocking-efficiency-how-ava-became-our-ai-productivity-partner-f1a560686361">Ava</a>, <a href="https://cursor.com/en">Cursor</a>, and <a href="https://www.glean.com/">Glean</a> directly into their day-to-day tasks. What we uncovered was a set of high-leverage patterns (ripe tomatoes!) and equally valuable constraints (unripe tomatoes).</p><h3>✅ Ripe Tomatoes</h3><p><strong>🚀 <em>AI as a Productivity Companion</em></strong></p><p>One of the early wins came from using agentic AI coding assistants to automate repetitive and often neglected tasks. For instance, when tasked with cleaning up stale feature flags or deprecated logic, the AI agent not only identified unused paths but rewrote the class with precision. We cleaned up <strong>15+</strong> feature flags from a single service just with the use of an AI coding agent in a single PR. Refactoring code, a task often deferred, became faster and safer.</p><p>🖼️ <strong><em>Design-to-Execution Made Faster</em></strong></p><p>We also saw success in bridging the design-to-dev handoff. Using AI coding assistant features like image upload, engineers uploaded Figma screenshots of UI components. The AI interpreted layouts and generated usable scaffolds, turning static mocks into functional components with impressive fidelity. This accelerated the build process while preserving designer intent.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CPovrvzm6JPvu3IbVuO4RA@2x.png" /><figcaption>Example: Carebot’s figma design screenshot to code</figcaption></figure><p>🐛 <strong><em>Smart Debugging</em></strong></p><p>AI can help debug complex issues by analyzing code and dependencies. For example, we used our AI coding agent to identify why certain error codes weren’t returned from some cross-service API calls. 
The AI pinpointed that enabling a specific parsing option in one of our libraries would resolve the issue. We also used an AI agent to identify “N+1” queries in slow-performing code, where it successfully pinpointed the core issue.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-S1c4mxiuv3MEFyh" /><figcaption>Example: Ava helping debug N+1 queries</figcaption></figure><p><strong>✍️ <em>Planning and Architecture Support</em></strong></p><p>AI wasn’t limited to coding. In redesigning parts of our invoicing system, engineers worked with our AI coding agent to brainstorm ERDs for migrating tax calculation logic from legacy flows. The back-and-forth interactions generated diagrams, structured schemas, and code snippets, all while allowing engineers to maintain design ownership and judgment.</p><p>🚨 <strong><em>RCA Handling</em></strong></p><p>AI also proved valuable during incident response. With tools like Ava and Glean, teams were able to rapidly analyze logs and stack traces, draft root cause narratives, and even auto-generate “<a href="https://en.wikipedia.org/wiki/Five_whys">Five Whys</a>” to feed into RCA documentation. The AI coding assistant then helped implement follow-up action items, tightening the loop between identification and resolution.</p><p><strong>🏗️ <em>Structuring the Environment for Success</em></strong></p><p>One of the most consistent learnings was just how much the development environment impacted AI performance. AI coding agents were most effective in modular, well-annotated workspaces. Teams that created scoped workspaces in their IDE, focused on a single service instead of a big repository, saw improved indexing speed, smarter suggestions, and more reliable outputs.</p><p><strong>🤖 <em>Task-Specific Model Thinking</em></strong></p><p>As engineers gained hands-on experience, they began treating different LLMs like specialized teammates. 
For example, earlier in our AI adoption journey, Claude-3.7 was preferred for planning-oriented tasks like designing ERDs or generating flow diagrams, while Claude-3.5 shined when asked to generate, refine, or review code. Other models were chosen for log analysis or QA tasks. While the specific models have changed as newer, more capable ones have become available, the approach remains the same: align each model’s strengths with the right task profile to unlock more accurate and context-aware results, effectively treating LLMs as expert “personas” for different phases of development.</p><p><strong>⚒️ <em>Prompt Engineering as a Skillset</em></strong></p><p>Another key insight was just how much prompt quality affects AI output: prompt structure, clarity, and context layering all significantly influenced the results. Engineers who adopted structured prompting techniques, such as clearly scoping tasks, reusing successful phrasing, or chaining context, consistently achieved better outcomes. This resulted in an organic sharing of techniques across teams and even led to the creation of internal prompt engineering playbooks. As a result, prompt crafting is increasingly seen as a core skill in the AI-assisted developer’s toolkit.</p><h3>❌ Unripe Tomatoes</h3><p><strong>⛰️ <em>Context Challenges in a Monolith</em></strong></p><p>Of course, not every task was ready for automation. In legacy systems like a monolith codebase, AI coding agents struggled to navigate large, interconnected code. Without clear boundaries or modular organization, AI had to process too much unrelated context, which led to slower suggestions, incomplete answers, or even hallucinated code. In contrast, smaller and well-scoped codebases provided a cleaner working context, which resulted in better outcomes.</p><p><strong>📁 <em>Large Files</em></strong></p><p>Even in a well-structured codebase, file size itself could be a bottleneck and could lead to incomplete or inconsistent outputs. 
For example, with basic prompts and minimal guidance, the AI coding assistant successfully converted protobuf option comments to standard // notation in a 500-line file. However, when the same task was attempted on a 5,000-line file (10x larger), the tool repeatedly failed to complete it.</p><p><strong>🔄 <em>Code Translation Without Transformation</em></strong></p><p>AI streamlines the translation of existing code into new languages, but it can also carry over flawed logic without addressing it. Our team encountered this during a system rewrite. The experience reinforced the importance of not blindly trusting autogenerated code and of providing clear, explicit instructions before initiating a rewrite.</p><p>Project Tomato helped us spot where AI shines within our engineering work. What became clear across all of these use cases is that the real unlock was in learning how to use these tools effectively in a complex codebase. AI is like a racecar: with the right skills and a well-paved road, it can achieve incredible speed. But in the jungle of a complex codebase, you first have to learn to drive and clear a path. The lessons we’ve learned continue to shape how we design AI workflows, support teams, and think about productivity at scale. What began as a small experiment is now guiding how we build across the company.</p><h3>Case Study: Fizz — Building Fast and Smart</h3><p>While Project Tomato focused on internal engineering enablement and workflow transformation, Fizz offers a complementary perspective: how AI can accelerate a product from a seed idea to a fully launched customer experience.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*x-JqNRzzd0tN1Jol" /></figure><p>A small product group at Instacart had an idea for a new group ordering app for drinks and snacks. 
Thanks to our AI-first mindset, we went from concept to customer-ready in just a few short months — something that would’ve taken far longer a year ago. What began as a hackathon prototype became a fully functional product through extensive use of AI-assisted workflows.</p><p><strong><em>AI at Every Step: From Prototypes to Polish</em></strong></p><p>Leveraging AI, our team autogenerated UI scaffolding from Figma mocks and rapidly built out complex client-side systems. Tasks that traditionally took several days — such as creating a dynamic search experience for iOS — were completed in hours, guided by intent and constraints.</p><p><strong><em>Code Reviews and Automation</em></strong></p><p>AI tools like Cursor reduced the burden of code reviews while boosting confidence in debugging and refactoring. Mock data, documentation, and test generation were also streamlined, cutting down repetitive tasks.</p><p><strong><em>Final Refinement</em></strong></p><p>AI even supported creative tasks like generating notification copy. Despite its limitations in precise visual details (e.g., animations and transitions), it allowed us to focus engineering time on the user experience.</p><p>Building Fizz showed us just how much AI can accelerate development when used thoughtfully, and where human insight still plays a critical role. We’re continuing to refine our approach as we build products at Instacart.</p><h3>Best Practices for AI-First Development</h3><p>Through our experiences with both Fizz and Project Tomato, we’ve surfaced a set of practices that help teams get the most out of AI-assisted development, not just in theory but in production.</p><p><strong>🧑‍🤝‍🧑 <em>Use AI as a teammate — not a replacement</em></strong></p><p>Think of AI as a capable but inexperienced engineer. Like any teammate, it needs clear instructions and context to perform well.
We found that the best results came when engineers framed tasks precisely, reviewed output critically, and iterated in tight feedback loops. Structured prompts and layered context often made the difference between success and confusion.</p><p><strong>🌱 <em>Start small to build trust</em></strong></p><p>Adopting AI doesn’t require a full rewrite of your workflow. The most effective teams began with lower-risk tasks like generating test stubs, adding comments, or drafting boilerplate before extending AI into deeper parts of the codebase. These early wins built confidence and helped engineers develop a feel for the tools’ strengths and boundaries.</p><p><strong>🧠 <em>Context is everything</em></strong></p><p>The quality of AI output is directly tied to its understanding of the codebase. Modular, well-annotated services with predictable patterns yielded dramatically better results. Conversely, large legacy systems written in untyped languages often created too much ambiguity for AI agents to navigate reliably. Creating clean workspaces, isolating services, and scoping tasks made a meaningful difference.</p><p><strong>🎯 <em>Apply AI strategically — not universally</em></strong></p><p>AI doesn’t need to be used everywhere to be effective. We saw the greatest ROI when applying it to high-leverage, well-scoped tasks like prototyping UI components, scaffolding backend services, comparing logs, and writing tests. For complex architectural work or nuanced design decisions, human expertise still leads the way.</p><p><strong>📊 <em>Measure impact to focus your efforts</em></strong></p><p>Whenever possible, we tried to quantify the value. For example, in the development of Fizz, we observed up to 20% time savings on frontend workflows like rendering web UI and integrating client logic. 
These insights help us focus our efforts and prioritize where AI adds the most value, and where human experience is irreplaceable.</p><h3>Structuring Success: Workshops, Playbooks, and Mindset Shifts</h3><p>These efforts gave us a set of learnings, but to scale their impact we had to turn insight into infrastructure. We created structured learning programs for the whole company, such as bootcamps and playbooks, to help all functions, not just engineering, adopt and adapt AI tools thoughtfully.</p><ul><li><strong>Bootcamps:</strong> Hands-on sessions that guide teams through real use cases, from writing infrastructure scripts to generating QA test cases.</li><li><strong>Playbooks:</strong> Living documentation capturing prompt engineering tips, model selection strategies, and feedback techniques.</li><li><strong>Cross-functional involvement:</strong> We’re embedding AI across product, design, and data science teams because engineering isn’t the only place where velocity matters.</li></ul><p>This systematization ensures that AI adoption isn’t left to chance. It’s built into our culture of continuous learning.</p><p>While we’ve explored multiple tools, we believe the tools themselves are just one part of the story. The real unlock comes from a shift in mindset. We’re investing in the idea of the “AI-enabled engineer” who sees AI as a creative partner.
We’re fostering a culture where experimentation is encouraged and engineers are empowered to lead the next wave of productivity.</p><h3>Final Thoughts</h3><p>Instacart’s AI journey is still unfolding, but we’re establishing systems, practices, and mindsets that allow AI to thrive as part of our engineering craft and let us rethink how we build.</p><hr><p><a href="https://tech.instacart.com/ai-driven-development-at-instacart-scaling-impact-and-increasing-velocity-43f6b3902a32">AI-Driven Development at Instacart: Scaling Impact and Increasing Velocity</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Scaling Catalog Attribute Extraction with Multi-modal LLMs]]></title>
            <link>https://tech.instacart.com/multi-modal-catalog-attribute-extraction-platform-at-instacart-b9228754a527?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/b9228754a527</guid>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[multimodal-learning]]></category>
            <category><![CDATA[prompt-engineering]]></category>
            <dc:creator><![CDATA[Shih-Ting Lin]]></dc:creator>
            <pubDate>Fri, 01 Aug 2025 16:53:58 GMT</pubDate>
            <atom:updated>2025-08-01T17:04:01.609Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JTe7UDeHg1O6zM5RRSpxZA.png" /></figure><p><strong>Key Contributors</strong>: Shishir Kumar Prasad, Matt Darcy, Paul Baranowski, Sonali Parthasarathy, DK Kwun, Peggy Men, Talha Maswala</p><p>When you search for almond milk, are you looking for something unsweetened? Organic? Vanilla-flavored? The answer likely depends on your preferences, dietary needs, and even your household habits.</p><p>At Instacart, many of the items in the product catalog are specified by structured product data known as attributes — such as flavor, size / volume, fat content, and more. These attributes are more than just labels — they form the invisible infrastructure behind a smooth and personalized shopping experience. They power helpful features like narrowing your search, choosing different sizes or flavors, and highlighting badges that call out key details, such as ‘Gluten-Free’ or ‘Low Sugar’. Together, these capabilities enable Instacart’s <a href="https://www.instacart.com/company/updates/introducing-smart-shop-personalization/">Smart Shop</a> experience — helping customers find what they need faster and more easily.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3EMhVKjeeTAxWJHB6ywOMg.png" /><figcaption>Figure 1. Examples of attribute-driven experiences at Instacart</figcaption></figure><p>Supporting these experiences at scale requires a robust attribute creation system — one that can deliver high-accuracy, high-coverage attribute data across a vast product catalog. With millions of SKUs across thousands of categories, the system must handle a wide variety of attributes with different requirements: attributes like <em>sheet count</em> require numeric reasoning, while others like <em>flavor</em> involve long-tail or novel values that evolve over time. 
Achieving high coverage also demands extracting information from multiple product data sources, such as titles, descriptions, and images, since no single source is consistently complete. These challenges have historically resulted in slow development cycles and inconsistent attribute quality — underscoring the need for a more scalable and efficient solution.</p><p>In this blog post, we discuss how Instacart is leveraging Large Language Models (LLMs) to tackle attribute extraction challenges at scale. We’ll introduce PARSE (Product Attribute Recognition System for E-commerce), our self-serve, multi-modal platform for LLM-based catalog attribute extraction, and share how it works, why it’s effective, and what we’ve learned from applying it across millions of products.</p><h3>The Limitations of Pre-LLM Attribute Creation Approaches</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aJOWEcoFxRl4ba_dj8j5Vw.png" /><figcaption>Figure 2. Challenging Attribute Extraction Example<br>Identifying the primary flavor (‘Orange’) for the item above was difficult for our previous attribute creation system, since the description also includes the flavors of other variants (e.g., Grape, Strawberry). Extracting the correct flavor here requires specific contextual understanding that is often missing or hard to learn in traditional approaches.</figcaption></figure><p>Prior to leveraging LLMs, attribute creation at Instacart relied heavily on either SQL-based rules or traditional text-based machine learning models. But these methods come with notable constraints.</p><p>SQL-based approaches, while scalable, are limited in quality — they’re effective for attributes extractable through simple rules, like identifying ‘organic’ claims via keywords, but struggle with more complex cases that require contextual understanding, as illustrated in Figure 2.</p><p>ML models can handle greater complexity thanks to their generalization capabilities.
However, achieving high-quality results for each attribute requires significant effort — from collecting and labeling specialized datasets to developing, training, and maintaining separate models and pipelines for every attribute of interest. This leads to a slower, more resource-intensive process as the catalog and attribute set grow. Both approaches also share a key limitation: they operate only on product text, leaving important gaps when attribute information is available solely in product images.</p><p>These limitations underscored the need for a new approach — one that could deliver high-quality attribute data at scale, support both text and image inputs, and minimize redundant engineering effort.</p><h3>PARSE — LLM-based Multi-modal Catalog Attribute Extraction Platform</h3><p>To address these challenges, we built PARSE — a scalable, self-serve platform that uses multi-modal LLMs to automate attribute extraction. PARSE allows Instacart teams to extract accurate product attributes from both text and images, significantly reducing development time and engineering overhead. With the zero-shot and few-shot capabilities of LLMs, teams can quickly configure and launch new attributes without building separate pipelines. Multi-modal support helps close quality gaps when information is only available in product images. And with a user-friendly interface, teams can rapidly iterate on prompts and evaluate results — all without writing custom code.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*V-z4piN-3Sh0E1l3G06o8w.png" /><figcaption>Figure 3.
Overview of the PARSE platform</figcaption></figure><p>As shown in Figure 3, PARSE consists of four main components that together automate the end-to-end attribute extraction workflow — from retrieving input product data, to running the LLM-based extraction, to managing quality, to ingesting the final results into the catalog.</p><p>To extract an attribute, teams first use the platform in “development mode” to experiment with different models, prompts, and input sources. Once a working configuration is found, it can be deployed to production, where it runs automatically across the catalog and feeds results back into the catalog data pipeline. In the following sections, we’ll walk through each component in more detail.</p><h4>Platform UI</h4><p>The Platform UI component allows clients to individually configure each step of an attribute creation task, including input data fetching and LLM extraction. Specifically, users will input the following configurations:</p><ul><li>Define an attribute to extract by setting the attribute name, type, and description. 
Some example attribute types are string, dictionary, number, and boolean.</li><li>Determine attribute extraction parameters, including the choice of LLM extraction algorithm, its required parameters, and the prompt template.</li><li>Input product data SQL to define which product features will be fed to the LLM for attribute extraction and how to retrieve them from the database.</li><li>Optionally provide few-shot examples to help the LLM follow the instructions in the prompt.</li></ul><p>All of the configurations are versioned, allowing users to track changes, identify contributors, and revert to previous configurations if necessary.</p><p>Once the configuration is done, a backend orchestration layer fetches the product data using the input SQL and sends it, along with the other input parameters, to the subsequent components to execute the extraction.</p><h4>ML Extraction endpoint</h4><p>This component is responsible for executing an LLM-based attribute extraction for each product fetched by the input SQL. Specifically, given a product, the endpoint first constructs LLM extraction prompts by inserting product features and attribute definitions into the input prompt template. Then it uses the selected LLM extraction algorithm to extract the attribute value for the product. We also obtain a confidence score for the extracted attribute value.</p><p>To accommodate different extraction use cases and balance cost against accuracy, the endpoint supports several extraction algorithms for clients to choose from:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nqdYhH5kfwm_49OtH-TuxA.png" /></figure><p>To obtain a confidence score for an extracted attribute value, we use the following self-verification technique, which has proven useful in the literature, such as in <a href="#f7d6">[2]</a>.</p><ul><li>We query the LLM with a second scoring prompt.
The prompt asks the LLM to perform an entailment task: deciding whether the attribute value produced by the extraction prompt is correct given the product features and attribute definition.</li><li>In the scoring prompt, we specifically ask the LLM to output “yes” or “no” first. We then read the logits of the first generated token and use the token probability of “yes” as the confidence score.</li></ul><p>The confidence score is useful later for improving the quality of the extracted attributes. For example, if an attribute value has a low confidence score, we can send it to humans for review.</p><h4>Quality Screening</h4><p>The final component provides a framework for quality evaluation in both development and production mode:</p><p><strong>Development</strong></p><ul><li>Here, clients submit a small sample of products to the PARSE platform to assess the quality of the extraction results, so we can decide if further iteration is required.</li><li>The component provides a human evaluation interface that allows human auditors to label the gold attribute values and compute quality metrics for the extraction results.</li><li>We also incorporate LLM auto evaluation (LLM-as-a-judge) to speed up the evaluation.</li></ul><p><strong>Production</strong></p><ul><li>We have human-in-the-loop quality assessment and error correction for the attribute extraction results in production.</li><li>First, the component periodically creates a sample set from the attribute extraction results of new products and has it evaluated by either human auditors or an LLM. This helps us monitor for quality drops that require attention.</li><li>In addition, the component runs proactive error detection.
This process treats the extracted values of products with a low confidence score as potentially incorrect and has them reviewed and corrected by human auditors.</li><li>The final extraction results are passed into the catalog data pipeline for ingestion.</li></ul><p>These components of the PARSE platform work together to improve our attribute creation process, addressing past limitations by combining multi-modal LLMs with automation.</p><h3>PARSE in Practice — Insights</h3><p>Now that we’ve begun to apply PARSE in our attribute extraction efforts, we would like to share a few learnings about how PARSE improves our work in practice.</p><h4>Multi-modal Reasoning with LLMs Enables Robust Attribute Extraction</h4><p>One of the most significant advancements we’ve seen with PARSE is the power of LLMs to reason across multiple sources of product information. Our experience shows that LLMs’ multi-modal reasoning abilities allow us to flexibly extract attributes from images, text, or both — depending on what information is available — greatly improving both accuracy and coverage.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ujKvkBaZF1utJ-GQOXc5TA.png" /><figcaption>Figure 4. Example dry sheet product and its product information in the catalog.</figcaption></figure><p>For instance, consider the challenge of extracting the sheet_count attribute from a household product. In some cases, the sheet count is clearly visible on the product image (“80 sheets” on the packaging in Figure 4), while the accompanying text lacks this detail. Here, PARSE’s multi-modal LLM easily identifies and extracts the value directly from the image — something rule-based systems, text-only ML models, or even text-only LLMs would consistently miss.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*v-uRyyqBcsH7C1ppmiqm-A.png" /><figcaption>Figure 5.
A multi-pack product for which traditional models struggle to produce the correct sheet_count.</figcaption></figure><p>However, there are also cases where only textual clues are available, and the relevant information isn’t stated explicitly. For example, a product description might read: “3 boxes of 124 tissues” (see Figure 5). Even though “total sheet count” isn’t directly mentioned, the LLM can use its reasoning abilities to extract the pack count and sheets per pack, perform the necessary multiplication (3 × 124 = 372), and output the correct total. This kind of logical deduction from unstructured text was previously challenging for traditional approaches.</p><p>In many cases, the LLMs can even cross-reference both the text and image, using details from one to verify or supplement the other — making the extraction even more reliable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yepP2hxdsZrKsh6Q7IzHEg.png" /><figcaption>Figure 6. sheet_count quality comparison between different attribute extraction methods.</figcaption></figure><p>To quantify these improvements, we ran experiments comparing SQL-based, text-only LLM, and multi-modal LLM methods on the sheet_count attribute. As shown in Figure 6, the results were clear:</p><ul><li>Text-only LLMs already delivered a significant jump in both recall and precision compared to legacy SQL approaches, thanks to their ability to reason through complex or implicit product descriptions.</li><li>Multi-modal LLMs further increased recall by 10% over text-only models, since they could pull in image-based cues when available — capturing cases where key details appear solely on packaging or where cross-referencing both sources is necessary.</li></ul><p>In other words, our LLM-powered platform can adapt to the available information, intelligently combining both text and image inputs for the highest possible quality.
This enables robust and reliable attribute extraction.</p><h4>Different Attributes Require Varying Levels of Effort in Prompt Tuning and LLM Capability</h4><p>Another insight we’ve gained through practice is that different attributes require different levels of prompt-tuning effort and LLM capability.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3qKHqlQEEhP-JBdnhWimYg.png" /><figcaption>Table 1. Comparison of attribute definitions and extraction time required between simple and complex attributes</figcaption></figure><p>Simpler attributes, such as the “organic” claim in Table 1, can be extracted with high quality by LLMs because they have more straightforward definitions and guidelines. For instance, our initial prompt for organic extraction achieved 95% accuracy. With our PARSE platform, this took only one day of effort, compared to one week with traditional methods. Conversely, difficult attributes such as the “low sugar” claim have more complex guidelines and require multiple prompt iterations for high-quality extraction. Even so, with PARSE, the iteration process for these more challenging attributes was reduced to just three days thanks to the easy-to-use PARSE UI.</p><p>Moreover, we found that for simpler attributes, a cheaper but less powerful LLM delivered similar quality to more powerful ones at a 70% cost reduction. However, for difficult attributes, the less powerful LLMs suffered a 60% accuracy drop. This emphasizes the importance of selecting the right extraction model to balance cost and quality effectively.</p><h3>Ongoing and Future Work</h3><p>The LLM-powered PARSE platform described above has helped Instacart both accelerate catalog attribute extraction and boost attribute data quality. However, challenges remain, and we plan to continue iterating on the platform and the underlying ML algorithms.
Below we share two exciting directions that could further improve our platform.</p><h4>LLM Cost Reduction Techniques</h4><p>While LLMs can achieve high performance in attribute extraction, the scale of the catalog makes it critical to keep an eye on resources and balance costs. In the ML extraction endpoint section, we describe the LLM cascade algorithm that can help balance cost and accuracy, but there are additional optimization techniques we can apply here, such as:</p><p><strong>Multi-attribute extraction:</strong></p><ul><li>Attribute extraction via PARSE is currently done on a per-attribute basis so that we can easily tune the prompt to achieve the highest-fidelity answers. However, there is an opportunity to batch multiple attributes in a single prompt and extract them for a product at the same time. This avoids sending the same product information to the LLM APIs once per attribute, which saves cost.</li><li>Similarly, we can batch multiple products into the same prompt and ask the LLM to output extraction results per product. This avoids sending the same attribute extraction guideline to the LLM APIs for every product.</li></ul><p><strong>LLM approximation </strong><a href="#f7d6">[2]</a></p><ul><li>Another idea for avoiding redundant LLM prompt processing is to ensure we only ask the LLM to process completely new products. To accomplish this, we first store all previous attribute extraction results in a cache. Then, to extract an attribute for a new product, we check whether the attribute has already been extracted for a similar product. If not, we query the LLM as before. If so, the extraction result is retrieved from the cache and returned, which saves cost.</li><li>For this approach to succeed, we will need to define a similarity function that can determine whether two products have the same attribute values.
This will be a challenging problem, but there is ongoing work in duplicate product detection that we can take advantage of.</li></ul><p>We plan to explore these techniques within our PARSE platform so that we can tackle attribute extraction in the most cost-efficient way.</p><h4>Automatic Prompt Tuning</h4><p>One main bottleneck of attribute extraction via PARSE is prompt iteration, which is currently done by humans and is therefore time-consuming. This is an issue for LLM applications in general, since a carefully engineered prompt is usually required to achieve high output quality. Automating prompt generation and tuning has recently become a hot topic, and a growing body of literature proposes solutions. For example, <a href="#f7d6">[6]</a> finds that an LLM itself can be used as an optimizer to generate better prompts. In <a href="#f7d6">[7]</a>, evolutionary algorithms are applied to make LLM prompt optimization more efficient. The work in <a href="#f7d6">[8]</a> even proposes a framework to optimize prompts for pipelines that require multiple LLM calls. We plan to explore these ideas in our attribute extraction setting to scale our attribute creation process even further.</p><p>These are just a couple of the directions we’re exploring to continue improving our attribute extraction pipelines.</p><h3>Conclusion</h3><p>PARSE represents a significant leap forward in attribute extraction technology at Instacart. By addressing the limitations of previous approaches, such as limited coverage and difficulty extracting complex or context-dependent attributes, PARSE not only enhances the efficiency and accuracy of our process but also gives us a foundation for exploring better attribute extraction algorithms.
Looking ahead, integrating cost reduction strategies and automated prompt tuning will allow us to further optimize our attribute extraction processes, so that the product catalog can deliver high-quality attribute data at scale and help elevate the customer experience.</p><h3>References</h3><p>[1] <a href="https://arxiv.org/pdf/2305.05176">FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance</a></p><p>[2] <a href="https://arxiv.org/pdf/2310.12963">AutoMix: Automatically Mixing Language Models</a></p><p>[3] <a href="https://arxiv.org/pdf/2310.03094">Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning</a></p><p>[4] <a href="https://arxiv.org/abs/2306.13063">Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs</a></p><p>[5] <a href="https://arxiv.org/pdf/2312.09300">Self-Evaluation Improves Selective Generation in Large Language Models</a></p><p>[6] <a href="https://arxiv.org/pdf/2309.03409">Large Language Models as Optimizers</a></p><p>[7] <a href="https://arxiv.org/pdf/2309.08532">EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers</a></p><p>[8] <a href="https://arxiv.org/pdf/2406.11695">Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs</a></p><hr><p><a href="https://tech.instacart.com/multi-modal-catalog-attribute-extraction-platform-at-instacart-b9228754a527">Scaling Catalog Attribute Extraction with Multi-modal LLMs</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introducing PIXEL: Instacart’s Unified Image Generation Platform]]></title>
            <link>https://tech.instacart.com/introducing-pixel-instacarts-unified-image-generation-platform-6d7dd0efe4c1?source=rss----587883b5d2ee---4</link>
            <guid isPermaLink="false">https://medium.com/p/6d7dd0efe4c1</guid>
            <category><![CDATA[image-generation]]></category>
            <category><![CDATA[ai-image-generator]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Prithvi Srinivasan]]></dc:creator>
            <pubDate>Thu, 17 Jul 2025 22:01:54 GMT</pubDate>
            <atom:updated>2025-07-17T22:01:54.627Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Key contributor:</strong> Shishir Kumar Prasad</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GlvC6il9znAJ4RzLUWNX_Q.png" /></figure><h3>Introduction</h3><p>Selling groceries online has a fundamental challenge: customers can’t pick up and examine products like they would in-store. This is especially true for prepared foods, butcher items, and fresh bakery goods where visual appeal drives purchasing decisions. Without clear, accurate images, customers hesitate — and often abandon their carts. At Instacart, we understand that high-quality product images are essential for building trust and ensuring customer satisfaction. They’re the digital equivalent of holding a product in your hands. <a href="https://www.researchgate.net/publication/287267271_The_impact_of_product_photo_on_online_consumer_purchase_intention_An_image-processing_enabled_empirical_study">Industry research[1]</a> consistently shows a direct correlation between high-quality imagery and increased customer conversion rates.</p><p>Yet generating accurate, high-quality images at scale is a non-trivial challenge — especially across various applications. As our teams began exploring AI-powered solutions to fill image gaps, we witnessed that image generation was siloed within the organization. Different teams experimented with different models, prompting strategies, and evaluation criteria. This created duplication of effort and inconsistent results. 
Each team faced its own steep learning curve — figuring out what prompt worked best for a food image, which model produced the most realistic outputs, and how to measure quality.</p><p>That’s why we’ve made a significant investment in PIXEL, our one-stop image generation platform, to enable faster iteration, improve consistency, and offer a more efficient path to creating high-quality visuals that meet our standards at Instacart.</p><h3>PIXEL: Instacart’s Image Platform</h3><p>Instacart has been experimenting with generative AI models for imagery for a few years. The problem was that each team had to figure out the right models to use and the best prompting strategies for each of those models, and then spend time working out how to access and integrate with different providers. PIXEL was created to simplify that entire process for food imagery. It provides access to a variety of models, generates the right parameters and configurations, and ships strong default prompts for both generating and evaluating images, which teams can modify as needed. Teams using PIXEL have seen a 10x reduction in the time taken to generate new imagery, along with a notable increase in overall quality.</p><p>It starts with a straightforward user interface that can be used by anyone at Instacart, regardless of their technical knowledge or role. They simply select a model from all the models available in PIXEL, enter a prompt, and generate images, so they can easily explore potential applications for their projects.
It’s easy to change to a different model and adjust prompts — so teams can move fast without needing specific model training.</p><h3>Technical implementation and innovations</h3><p>PIXEL addresses the challenges of fragmented image generation through several key innovations:</p><ul><li><strong>Unified parameter protocol</strong> — Standardizes parameters across all image generation models</li><li><strong>Prompt templates &amp; few-shot prompting</strong> — Pre-built, tested prompts optimized for various kinds of food-related imagery, along with default few-shot prompt examples based on image type for better-quality outputs</li><li><strong>Fine-tuned models</strong> — Custom models trained on Instacart’s specific product categories</li><li><strong>Automated quality evaluation</strong> — Vision-language models that assess output quality</li><li><strong>Infrastructure integration</strong> — Seamless API access through Instacart’s existing systems, with storage across S3 and Snowflake</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/980/0*mD0nkGgY0BhadKey" /><figcaption>PIXEL features</figcaption></figure><p>Let’s explore how each of these components works:</p><p><strong>Unified Parameter Protocol</strong></p><p>Behind the scenes lies a unified parameter protocol that standardizes how parameters such as image style, size, and cfg_scale (which determines how closely the image follows the prompt) are set across multiple image generation models. This means teams can switch between models from various providers by changing just the model name — PIXEL handles all the parameter translation automatically.</p><p><strong>Prompt Templates and Few-Shot Prompting</strong></p><p>Once a model is chosen, team members can leverage a number of <strong>prompt templates</strong> to maintain consistency. These prompt templates define characteristics such as lighting, backgrounds, and image context, and are injected as few-shot examples for each application. 
Teams can follow practical guidelines to create effective prompts across different models, reducing trial and error in the process. Below are example images showing the original prompt and the new prompt, which was rewritten using our few-shot prompting technique to produce the final image. The rewritten prompt adds focus to the overall style and presentation of the picture.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*DN8wVjBuTKDXUdec" /></figure><p><strong>Fine-tuned models</strong></p><p>We have also implemented fine-tuned models for generating images of products using the <a href="https://dreambooth.github.io/">DreamBooth[2]</a> technique. DreamBooth works by fine-tuning a pre-trained text-to-image diffusion model — such as Stable Diffusion — on just a handful of product images, associating them with a unique identifier or keyword. This allows the model to generate highly realistic and detailed images of specific products in a wide variety of environments, poses, and lighting conditions, while preserving the unique characteristics and fine details of each item. By utilizing DreamBooth’s class-specific prior preservation loss, the technique ensures that the generated images not only maintain fidelity to the original product but also enable consistent and creative re-contextualization — placing products in new scenes or styles without losing their defining features.</p><p>This technique was highly useful for generating images of products against different backgrounds based on retailer requirements and other characteristics such as packaging and quantity. It can be used for unbranded products like produce or meat items to get custom images from models trained on existing photographs. It can also be used for advertising to display the same product across different backgrounds. 
This approach is especially valuable for e-commerce, as it allows for the rapid creation of high-quality, customized product images that would traditionally require extensive manual photography and editing.</p><p><strong>Automated quality control and assessment</strong></p><p>The standards for food-related images are high. Initially, AI-generated images had a poor approval rate with our human-in-the-loop judges. The images need to be accurate to the product and visually consistent. Since its creation, PIXEL has used vision-language models (VLMs) as a feedback loop, improving our human judges’ image approval rate from 20% to 85%. Our evaluation system follows the steps below.</p><ol><li>We generate a first pass of images with a prompt generated by an LLM.</li><li>We judge the image output using a curated set of evaluation questions, generated by an LLM based on the project needs.</li><li>We then pass the questions and the image to a VLM for evaluation. We decide whether to use the image based on the number of questions that passed the evaluation.</li><li>If the image fails the evaluation, we feed the failed questions back into the prompt-generator LLM to produce a revised prompt for the image generation model, and we repeat these steps until the image passes our threshold.</li></ol><p>VLMs were prompted with curated questions that checked for composition, consistency, style and overall appeal. For example, “does the given image contain &lt;X&gt;?”, “does the given image contain a warm neutral background?”, “does the given image contain non food content?”, etc. 
This provided a significant improvement in image quality while decreasing manual review effort and cost.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*aOaaDAyrvGfEiX5k" /><figcaption>Image evaluation workflow</figcaption></figure><p><strong>Infrastructure integration</strong></p><p>We built PIXEL on top of Instacart’s existing service infrastructure, which creates an RPC service, giving teams access to PIXEL for their workflows through an API call. We also let users store the generated images and easily access their URLs through a unique ID stored in Snowflake. This reliable system will grow and evolve with Instacart’s product and platform needs.</p><h3>Key Product Applications</h3><p>Let’s take a look at three of PIXEL’s applications:</p><h3>Butcher Cuts</h3><p>When we needed to develop a set of images for different types of butcher cuts and meats, PIXEL allowed us to test several models quickly to determine which one was optimal for this specific category of images. This category of products has its own set of challenges for customers, and these images helped them quickly search for and navigate to the right meat cut based on a visual cue instead of all-text descriptions. Overall navigation time and “add to cart” time dropped by over 25% for these items once we introduced images.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8CpTW-4VItZjrCHCvbYyTg.png" /></figure><h3>Lifestyle Imagery</h3><p>PIXEL is also used to generate lifestyle imagery for our product carousels and customer recommendations. For example, when our customers purchase herbed cheese, we offer highly explainable pairing recommendations of related cheese and appetizer options including crackers, meats, and pickled items. PIXEL looks across those recommendations to create an overall category image, which in this case is a cheese platter. 
This increased our personalized carousel recommendation cart conversion by 15%.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*vyr9Ff2OvYeMia9M" /></figure><h3>FoodStorm Prepared Foods</h3><p>Last year, an interesting use of PIXEL was its application within FoodStorm, Instacart’s all-in-one solution for prepared foods and catering. PIXEL enhanced image content for their platforms and gave retailers the opportunity to generate images for their prepared food offerings. Retailers could generate images for ingredients and make their order management system more visually appealing to customers. PIXEL empowers retailers by giving them the tools to quickly set up the ordering experience without having to commission expensive food photography for these images. Read more about it <a href="https://tech.instacart.com/enhancing-foodstorm-with-ai-image-generation-d76a74867fa4">here[3]</a>!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*NAVCW7u_K8E4r0Ob" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/863/0*BR34YV7OpBg1_zfq" /></figure><p>One interesting finding from launching various applications was that the best-performing model varied from project to project. PIXEL enabled project leads to initiate projects using pre-configured, optimal model and parameter recommendations. Subsequently, they could rapidly test other models with a sample dataset and decide which one works best before moving to production image generation at scale.</p><h3>Conclusion and Next steps</h3><p>We’re actively investing in the next phase of the platform. We’re integrating newer models to expand the creative range and quality of output. For teams seeking more expressive control, PIXEL will soon offer fine-grained knobs for adjusting image composition, lighting, and background. 
Finally, we will offer easier access control for fine-tuning image models and serving them through the PIXEL platform.</p><p>PIXEL has transformed how Instacart creates product imagery. It centralizes model access, simplifies prompt engineering, enforces visual quality, and integrates with infrastructure for scale.</p><p>Follow <a href="https://tech.instacart.com">tech-at-instacart[4]</a> to stay updated.</p><h3>References</h3><ol><li><a href="https://www.researchgate.net/publication/287267271_The_impact_of_product_photo_on_online_consumer_purchase_intention_An_image-processing_enabled_empirical_study">https://www.researchgate.net/publication/287267271_The_impact_of_product_photo_on_online_consumer_purchase_intention_An_image-processing_enabled_empirical_study</a></li><li><a href="https://dreambooth.github.io/">https://dreambooth.github.io/</a></li><li><a href="https://tech.instacart.com/enhancing-foodstorm-with-ai-image-generation-d76a74867fa4">https://tech.instacart.com/enhancing-foodstorm-with-ai-image-generation-d76a74867fa4</a></li><li><a href="https://tech.instacart.com/">https://tech.instacart.com/</a></li></ol><hr><p><a href="https://tech.instacart.com/introducing-pixel-instacarts-unified-image-generation-platform-6d7dd0efe4c1">Introducing PIXEL: Instacart’s Unified Image Generation Platform</a> was originally published in <a href="https://tech.instacart.com">tech-at-instacart</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>