Know Your Carrots with Jonathan Phillips

Viktor Evdokimov · Published in tech-at-instacart · 4 min read · Jun 18, 2018

Subscribe on iTunes, Stitcher, or TuneIn

Hi, this is your host Viktor, and I am presenting the second episode of Know Your Carrots, starring Jon Phillips. Jon is Canadian but spent his early years in Bangladesh and Thailand. In this interview, we speak about Jon’s startup experience, how he joined Instacart, and his roles and responsibilities on the Search Infrastructure Team. Jon makes sure the “lights” stay on and customers can find what they’re looking for on the storefront. The nature of his role at Instacart requires him to think about improved tooling to aid stability as Instacart grows. Jon is also a serial hobbyist, so stay tuned for his thoughts on how to hobby!

A few quotes from our guest:

What are you doing at Instacart?

Most of our roles and responsibilities on the team are directly related to scaling our infrastructure, keeping the lights on, making sure we are not on fire, and data ingestion. We handle data ingestion from our partners … and search powers all of it.

… on stability of the search

We have multiple different search clusters, which are isolated by business use … we have a cluster for background/reporting use, and we have two frontend clusters that are divided by fast and slow queries. You want to keep slow queries separated from the fast queries that are critical for checkout.
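As a rough illustration of that split, a thin routing layer might choose a cluster per query. The endpoints and the slow/fast heuristic below are invented for the sketch, not Instacart’s actual setup:

```python
# Sketch of routing queries to isolated clusters by business use.
# Cluster endpoints and the classification heuristic are hypothetical.

CLUSTERS = {
    "background": "http://es-background:9200",  # reporting / batch use
    "fast": "http://es-fast:9200",              # checkout-critical lookups
    "slow": "http://es-slow:9200",              # heavy analytical searches
}

def pick_cluster(query: dict, background: bool = False) -> str:
    """Route a query so slow searches never queue behind fast ones."""
    if background:
        return CLUSTERS["background"]
    # Hypothetical heuristic: aggregations tend to be slow, ID fetches fast.
    if "aggs" in query or "aggregations" in query:
        return CLUSTERS["slow"]
    return CLUSTERS["fast"]

print(pick_cluster({"query": {"ids": {"values": ["42"]}}}))  # fast cluster
print(pick_cluster({"aggs": {"by_store": {"terms": {"field": "store_id"}}}}))
```

The point of the isolation is blast-radius control: a runaway analytical query can only exhaust the slow cluster’s resources.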

… on visibility of what is going on in the cluster

We built tools to help us know where the issues are. In the past we had issues with visibility into which queries were actually executing and what was actually causing the issues, and now we have tooling around it.

… on an example of problems we have with Elasticsearch

Elasticsearch has a finite capacity … its resources directly relate to how many queries can be running at the same time … so if you have slow-running queries, you can get request queuing in ES, and as a result, other queries that are usually pretty fast, like fetch by ID, all of a sudden start timing out, and you start getting 500s and no one knows what is going on. So without knowing which queries are holding up threads, it is very difficult to find the real slow queries, because in aggregate they are not executed that often.
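The starvation Jon describes can be seen in a toy model of a shared search thread pool. Worker counts, durations, and the timeout below are made-up numbers for illustration, not ES internals:

```python
# Toy model of a shared search thread pool: a few slow queries occupy
# all workers, so normally-fast fetch-by-ID requests queue up and time out.
import heapq

def simulate(queries, workers=2, timeout=1.0):
    """Run (name, duration) queries FIFO on `workers` threads.
    Returns names of queries whose wait before starting exceeds `timeout`."""
    free_at = [0.0] * workers            # when each worker becomes free
    heapq.heapify(free_at)
    timed_out = []
    for name, duration in queries:
        start = heapq.heappop(free_at)   # earliest available worker
        if start > timeout:
            timed_out.append(name)       # waited too long in the queue
        heapq.heappush(free_at, start + duration)
    return timed_out

# Two slow aggregations grab both workers; cheap ID fetches behind them time out.
queue = [("slow_agg_1", 5.0), ("slow_agg_2", 5.0),
         ("fetch_by_id_1", 0.01), ("fetch_by_id_2", 0.01)]
print(simulate(queue))  # → ['fetch_by_id_1', 'fetch_by_id_2']
```

Note that the slow queries never show up as failures themselves, which is exactly why they are hard to find without query-level visibility.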

… on tooling for ES

…most engineers don’t know ES JSON query formatting, and it is hard to go into Kibana and create a dashboard when you don’t know the underlying technology. How many know about Timelion? The search team knows, but probably no one outside the team. So having SQL as the underlying language for tooling and for understanding what is going on in a cluster was a requirement. We also needed 100 percent coverage of what is executed on a cluster, and accountability. … with Eventer, Kinesis and Druid, we also have a Blazer SQL interface; we track every single query with the code owner and execution location. Every time we have a slow query we know where it is [in code]…
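The accountability piece, every query tagged with a code owner and call site, might look roughly like this in application code. The function and field names here are hypothetical, not the actual Eventer/Kinesis/Druid pipeline:

```python
# Hypothetical sketch: pair every outgoing ES query with its code owner
# and call site, so slow-query logs are attributable. Field names are
# invented; this is not Instacart's actual tracking pipeline.
import inspect

def describe_query(query: dict, owner: str) -> dict:
    """Build a log record pairing a query with its owner and call site."""
    caller = inspect.stack()[1]          # the frame that built this query
    return {
        "owner": owner,
        "location": f"{caller.filename}:{caller.lineno}",
        "query": query,
    }

rec = describe_query({"query": {"match": {"name": "carrots"}}},
                     owner="search-infra")
print(rec["owner"])  # search-infra
```

Shipping records like this to a queryable store is what turns “some query is slow” into “this team’s query at this line is slow.”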

… on how tooling helped with our search infrastructure

Two months ago we were at about 3,000 searches a second per cluster, which means about 5,000–6,000 queries per second across both the slow and fast clusters. Now we are down to 250 queries a second. We dramatically decreased the number of queries just by having visibility into what is being executed on the cluster. So the takeaway here is to never use a data storage technology that does not provide visibility [into core metrics].

… on Snowflake

One of the things that is super nice is that writes and reads are completely separated. And they also have a concept of data warehouses, which have read quotas that we can provision by team. That’s what didn’t work with Redshift and our multi-tenant design. You just provision warehouses, and you can provision a DWH that has a subset of your data. If you start request queueing you can automatically provision read nodes, and queries scale linearly, so node provisioning helps a lot. … one example is that some queries that data scientists were running used to take 4 hours on Redshift and are now executed in under 4 minutes on Snowflake.
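The scale-out-on-queueing behavior Jon describes could be sketched as a toy policy. The team names, node counts, and threshold below are invented for the illustration:

```python
# Toy sketch of per-team read warehouses that scale out when requests
# queue, in the style Jon describes for Snowflake. All values are invented.

WAREHOUSES = {"data-science": {"nodes": 1}, "reporting": {"nodes": 1}}
QUEUE_THRESHOLD = 10   # queued queries before we add a read node

def on_queue_depth(team: str, queued: int) -> int:
    """Scale a team's warehouse out when its request queue grows."""
    wh = WAREHOUSES[team]
    if queued > QUEUE_THRESHOLD:
        wh["nodes"] += 1   # read capacity scales roughly linearly with nodes
    return wh["nodes"]

print(on_queue_depth("data-science", queued=25))  # → 2
```

Because each team reads through its own warehouse, one team’s backlog triggers scaling only for that team and cannot starve the others, which is the multi-tenant property Redshift lacked here.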

… on tools Jon uses every day:

I am a zsh guy … I am a firm believer that if you did something 3 times you have to automate it … I have a little bash script that detects the local branch and repo, creates a pull request for you, and opens it up in the browser … I use autojump to jump between directories … I use vim with NERDTree in some workflows … I didn’t like Atom … the new VS Code is an excellent IDE if it’s your thing
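A helper in the spirit of Jon’s bash script might look like this in Python. The URL handling assumes GitHub, and the repo and branch names are invented; in the real script they would come from git itself:

```python
# Rough rendering of the kind of helper Jon describes: derive a GitHub
# pull-request URL from the current repo and branch. In a real script,
# `remote` and `branch` come from git, e.g.:
#   git config --get remote.origin.url
#   git rev-parse --abbrev-ref HEAD

def pr_url(remote: str, branch: str) -> str:
    """Turn an origin URL and branch name into a GitHub compare/PR URL."""
    repo = remote.removesuffix(".git")
    for prefix in ("git@github.com:", "https://github.com/"):
        repo = repo.removeprefix(prefix)
    return f"https://github.com/{repo}/compare/{branch}?expand=1"

url = pr_url("git@github.com:acme/storefront.git", "fix/search-timeout")
print(url)  # https://github.com/acme/storefront/compare/fix/search-timeout?expand=1
# The real script would then open this in a browser, e.g. webbrowser.open(url).
```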

Links for the episode:

Big thanks to Jon Phillips, Jon Hsieh, Muffy, Dominic and Bill for helping to make this happen. If you have any feedback about the format or the content of the podcast, please send it to Viktor at instacart dot com.

Also, stay tuned for our next episode starring Gordon, where we share excitement about infrastructure, catalog, tooling, distributed systems, and Gordon’s personal projects. ’Til next time!
