Instacart Distinguished Speaker Series with Professor Tao Yu

Instacart · Published in tech-at-instacart · Apr 5, 2024


For our ongoing Distinguished Speaker Series, Instacart Engineering invites renowned researchers and practitioners to present state-of-the-art work on the technology that lies behind much of the digital economy.

Join us for the next session of our series, in which we’ll host Professor Tao Yu from the University of Hong Kong for a discussion on “OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.”

Event details:

Date: Wednesday, April 10, 2024

Time: 4:00–5:00 pm PT

Zoom Link HERE, Password: 313553

Abstract: The advent of autonomous digital agents, powered by advances in vision-language models (VLMs), promises to revolutionize human-computer interaction by enhancing accessibility and productivity. These multimodal agents can autonomously perform sophisticated reasoning, decision-making, and multi-step action planning in diverse environments. In this talk, Professor Yu will present OSWorld, a real computer environment designed to promote the development of agents capable of performing a wide range of digital tasks across operating systems, interfaces, and applications. He will also share insights into how cutting-edge VLMs perform on open-ended tasks within the OSWorld environment, and present recent work in this direction, including instruction-finetuned retrievers for diverse environment adaptation and the enhancement of LLM capabilities with tool integration. The talk will conclude with an exploration of current and future research prospects in this rapidly evolving domain.

Speaker: Tao Yu is an Assistant Professor of Computer Science at The University of Hong Kong and Director of the XLANG Lab (part of the HKU NLP Group). His main research interest is Natural Language Processing. He completed his Ph.D. at Yale University and was a postdoctoral fellow in the UW NLP group at the University of Washington. His research aims to build language model agents that transform language instructions into code or actions executable in real-world environments (a process known as “grounding”), including databases, web applications, and the physical world. This work lies at the heart of the next generation of natural language interfaces, which can interact with and learn from these real-world environments to facilitate human interaction with data analysis, web applications, and robotic instruction through conversation. It involves executable language grounding (such as semantic parsing and code generation), efficient and generalizable large language models, and interactive systems. Tao is a recipient of the Google Research Scholar Award and the Amazon Research Award.
