Optimizing at the Edge:
Using Regression Discontinuity Designs to Power Decision-Making

Tilman Drerup
Published in tech-at-instacart · Apr 15, 2024

Levi Boxell, Robert Fletcher, and Tilman Drerup

In our previous post, we introduced the Economics Team at Instacart and talked about our unique team structure and the various problems we work on. In a series of follow-up posts, we will take a closer look at how we deploy econometric and machine learning tools to tackle specific business problems.

In the first post of this series, we will focus on regression discontinuity designs, a powerful econometric technique that can be used to learn causal effects from observational data. We also provide a simple framework for how these estimates can be used to make business-relevant trade-off decisions. We then take this framework to one of the problems we recently worked on and show how we have used it to revise authorization buffers.

Framework: Regression Discontinuity Designs & Trade-Off Optimization

Regression Discontinuity Designs

At Instacart, we frequently rely on a classic quasi-experimental method to answer business questions: regression discontinuity designs. In a regression discontinuity design, we rely on natural break points in a system to causally estimate an effect of interest.

To gain intuition for the concept, consider an ecommerce site that offers customers different service options based on their time of arrival on the site. For example, imagine the site decided to offer every customer arriving before noon a 2-day delivery window, whereas every customer who arrives right after noon is offered a 3-day window. A customer’s arrival time on the site is of course not random, so we cannot directly learn the causal effect of the change in the delivery window on all customers.

However, there is an interesting quasi-experiment hidden in this data. Arguably, what is random is whether a customer arrives just before or after noon. Imagine now that we plotted a customer’s likelihood of checking out against the time of arrival relative to the noon cutoff and saw something like the following, where the x-axis shows the time of arrival relative to noon and the y-axis shows the average checkout propensity:

Right around noon, we see a sizable drop in checkout propensity. At this time, however, the only thing that arguably changes is that we stop showing customers the faster delivery options.

This drop thus presents us with a local average treatment effect (LATE) for the impact of the expanded delivery window. This effect is local because it is only valid for the customers just to the left and to the right of the cutoff. Or, put differently, it estimates the treatment effect for the specific subpopulation affected by the threshold. LATEs are valuable pieces of information that can point us in the right direction, but they cannot provide the average treatment effect (ATE) across the entire user population from a change in policy.
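To make the mechanics concrete, here is a minimal sketch of how such a LATE could be estimated: simulate arrival times with a 5-point drop in checkout propensity at noon, fit a local linear regression on each side of the cutoff, and compare the two fits at the cutoff itself. All numbers and the bandwidth choice are illustrative assumptions, not Instacart data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: arrival time in minutes relative to noon, and whether the
# customer checked out. Customers arriving after noon (x >= 0) see the slower
# 3-day window, which lowers checkout propensity by 5 points at the cutoff.
n = 200_000
x = rng.uniform(-120, 120, n)            # minutes relative to noon
p = 0.60 + 0.0004 * x - 0.05 * (x >= 0)  # smooth trend plus a jump at zero
y = rng.binomial(1, p)                   # 1 = checked out

# Local linear fit on each side of the cutoff within a bandwidth, then
# compare the two fits' predictions at the cutoff itself.
bw = 30.0
left = (x < 0) & (x > -bw)
right = (x >= 0) & (x < bw)
fit_left = np.polyfit(x[left], y[left], deg=1)
fit_right = np.polyfit(x[right], y[right], deg=1)
late = np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)
print(f"estimated LATE at the cutoff: {late:.3f}")  # true jump is -0.05
```

In practice, bandwidth selection and inference matter a great deal; packages such as rdrobust implement data-driven choices for both.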

Trading Off Business Objectives

Regression discontinuity frameworks provide a powerful tool to examine various tradeoffs between metrics around the threshold. Imagine, for instance, the ecommerce site from the example above wanted to evaluate whether it should move the delivery threshold to later or earlier in the day. In such a trade-off, the decision would likely be based on comparing the incremental value of moving the threshold against the associated cost. Specifically, moving the threshold to later in the day may generate a number of incremental orders but also put a strain on delivery systems as more orders have to be fulfilled in less time.

Here is what the site could do to evaluate such a trade-off. Let’s say the site only cares about the number of checkouts and the cost of fulfillment per order. It’s common practice to have a guardrail informed by the long-term value of an incremental order to make such tradeoff decisions. This generates a decision framework that takes the form of:

Incremental orders - guardrail * (incremental fulfillment cost) > 0.

Given this framework, the site can simply plug in the LATE estimates for orders and fulfillment cost, respectively, to determine whether to expand or contract eligibility for the shorter delivery window.
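As a stylized sketch of that decision rule (all numbers below are hypothetical, not Instacart estimates), plugging LATE estimates into the guardrail inequality looks like:

```python
def expand_eligibility(incremental_orders: float,
                       incremental_cost: float,
                       guardrail: float) -> bool:
    """Expand if incremental orders outweigh the guardrail-weighted
    incremental fulfillment cost: orders - guardrail * cost > 0.

    guardrail is an orders-per-dollar exchange rate, in practice informed
    by the long-term value of an incremental order.
    """
    return incremental_orders - guardrail * incremental_cost > 0

# Hypothetical LATEs: moving the threshold later yields 120 incremental
# orders near the cutoff at $500 of extra fulfillment cost.
print(expand_eligibility(120, 500, guardrail=0.2))  # True:  120 - 100 > 0
print(expand_eligibility(120, 500, guardrail=0.3))  # False: 120 - 150 < 0
```

The same comparison, repeated for each candidate threshold, traces out the trade-off curve described below.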

Of course, the site would have to ensure that the assumptions underlying this estimation remain valid in the process. To further fine-tune the threshold, the platform could run an A/B test (or better yet a Multi-Armed Bandit — more on that in a future post!) to evaluate a series of potential thresholds, repeating the above exercise for each of them and charting out the entire trade-off curve.

Now let’s turn to an example of how we used this line of thinking to optimize Instacart’s Authorization Buffers.

Application: Authorization Buffer Optimization

Discontinuous Authorization Buffers at Instacart

When customers shop on Instacart, we face a challenge: How large of a hold should we put on a customer’s credit card? When a customer clicks “Place Order,” we know the initial cost of the items in the customer’s cart as well as the associated fees, taxes, and tips. However, customers can still make changes or add items after the order has been placed. Shopper-initiated replacements for missing items can also change the final charge amounts.

To enable such post-checkout alterations, we place a so-called authorization hold based on the initial cart total plus an additional “buffer” amount. The scheme we use to determine such buffers needs to trade off competing effects: While higher authorization holds evidently reduce the risk of potential unpaid amounts, they also create a confusing customer experience, even if customers are only charged the actual amount of their final order upon the order’s completion.

How can we determine the optimal authorization buffer to add? Here’s where a natural experiment in Instacart’s buffer policy comes in.

To illustrate, we will turn to a hypothetical version of the authorization buffer scheme that resembles the one that was deployed at Instacart at some point in the past. In this scheme, buffers were allocated as follows:

  1. Add a buffer of 10% to the order total, and
  2. Round the resulting total up to the next multiple of $5.

The second step was added to avoid odd-looking authorizations to the customer. Notice how the second step in the scheme creates a discontinuous effect. Every time step 1 results in an amount that just exceeds a multiple of five, the total authorization amount jumps by $5! As a result, two orders can generate a $5 difference in the authorization hold even if their actual order amounts are only a single cent apart.
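A short sketch of this hypothetical scheme makes the jump easy to see. The code works in integer cents, and the assumption that fractional cents in step 1 round up is ours, for illustration:

```python
import math

def auth_hold_cents(order_total_cents: int) -> int:
    """Hypothetical buffer scheme: add a 10% buffer, then round the result
    up to the next multiple of $5. Works in integer cents; fractional cents
    from step 1 are rounded up (an illustrative assumption)."""
    buffered = (order_total_cents * 110 + 99) // 100  # ceil of a 10% markup
    return math.ceil(buffered / 500) * 500            # round up to next $5

# Two orders a single cent apart can differ by $5 in the hold:
print(auth_hold_cents(10_000))  # $100.00 order -> 11000 cents ($110 hold)
print(auth_hold_cents(10_001))  # $100.01 order -> 11500 cents ($115 hold)
```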

This discontinuous jump gives us everything we need to estimate a local average treatment effect of the higher buffer amount on the business metrics we care about. Of course, the crucial assumption for a causal interpretation is that users just below the $5 multiple and users just above are the same on average. For example, users who place $100 orders should not be doing so because they know that a $100.01 order would trigger a $5 larger authorization buffer. In our setting, we believe this is a safe assumption.
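One standard way to probe that assumption is a density check on the running variable: if customers were manipulating their totals to avoid the larger hold, we would see bunching just below the $5 multiples. Here is a stylized version of that check with simulated order totals (not Instacart data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated order totals in cents ($20 to $200). Absent manipulation, the
# post-buffer amount's distance to the nearest $5 multiple should place
# roughly equal mass just below and just above each threshold.
totals = rng.integers(2_000, 20_000, size=100_000)
buffered = totals + totals // 10           # +10% buffer in integer cents
dist = buffered % 500                      # cents past the last $5 multiple

window = 25                                # cents on each side of a threshold
just_above = np.sum(dist < window)         # landed just past a multiple
just_below = np.sum(dist >= 500 - window)  # landed just short of the next one
ratio = just_above / just_below
print(f"mass just above vs. just below: {ratio:.2f}")  # ~1 if no bunching
```

A ratio far from one (in real data, assessed with a formal density test) would cast doubt on the comparability of users on either side of the cutoff.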

Local Average Treatment Effect Estimation

When applying this method to our data, we see something like the following graph (which is for illustrative purposes only). On the x-axis, the graph shows the difference in cents between the customer’s order total and the threshold. The y-axis shows the rate of card declines. At the discontinuity, where the rounding suddenly jumps up to the next $5, users just to the right have significantly higher card decline rates than users just to the left.

*The plot is for illustrative purposes and does not reflect actual treatment effects or levels in the data.

We can of course repeat this analysis with other outcomes we may care about, such as unpaid amounts or re-authorization rates. Combining these estimates with a tradeoff framework helps inform us on whether we should be more or less aggressive in our authorization buffers.

Authorization Buffers: Navigating the Payment Predicament

Following the initial analysis, we developed a revised policy based on the estimated effects. To validate our proposed policy and the ATE, however, we still needed to run an A/B experiment. As part of a series of experiments intended to improve the payment experience for our customers, we ran an experiment where we significantly reduced the amount of upward rounding in the initial authorization buffer. The results validated the insights from our regression discontinuity analysis and generated a significant increase in order volume on Instacart through a reduction in the number of card declines!

By reducing frictions associated with the initial authorization buffer amounts, we were able to generate a win-win for both customers and Instacart, on average. However, within the average treatment effects may lie important heterogeneity. More broadly, our series of experiments pointed to the realization that adjusting the initial authorization buffer is just the start.

For some users, the initial authorization is a burdensome friction that needs to be reduced. However, reducing the initial authorization amount has the undesired consequence of increasing the likelihood that a second authorization is needed. For many customers, the subsequent authorization attempts may be a confusing and ultimately poor order experience. Our experiments in this domain are building towards the vision of an authorization and payment platform that can tailor the initial and subsequent authorization rulesets based on current order context and past customer interactions. Our rounding experiment was a stepping stone in this direction.

Wrapping Up

Applying regression discontinuity to rounding in authorization buffers is just one example of how causal inference can improve customer experiences and generate significant business impact. Discontinuous thresholds naturally appear throughout consumer-facing products beyond payment systems and provide an opportunity to estimate (local) treatment effects without having to first run A/B tests. Other areas where regression discontinuity could be applied include relevance thresholds in search or ads, targeting thresholds for incentives, or product availability thresholds. If there is a threshold that generates a discontinuous product experience, regression discontinuity can be applied!

If you would like to learn more about our work, check out the intro to our team or our upcoming posts on projects we’ve worked on. You can follow tech-at-instacart to be notified as they are published.

And a special shoutout goes to Aditya Karan, a former Instacart PhD Intern, who was instrumental in bringing this project to life!
