Real-time Large Scale Data Ingestion and Analytical Service

Time: July 28. The doors will open at 5:30 PM Pacific Daylight Time, and the talk will start at 6:00 PM.

Introduction:

In the first half of the meetup, Boyang will talk about real-time, large-scale data ingestion at Rockset. Rockset is a cloud-native data analytics platform that provides sub-second queries on real-time data.

One of the most critical problems to solve is how to ingest large-scale customer data from a variety of sources while keeping it fresh and accurate. In this talk, Boyang will dive into the data ingestion pipeline Rockset has built, with a focus on two critical features: SQL rollup and bulk load. SQL rollup helps users extract useful information at ingestion time without loading all the raw data into Rockset, while bulk load reduces the ingestion time of terabyte-scale datasets from days to hours or minutes. These two features are key to the success of Rockset’s data ingestion.
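As a rough illustration of the rollup idea only, and not Rockset's implementation or API, the sketch below aggregates raw events into one row per time bucket and key before anything is stored. The event layout `(timestamp, key, value)`, the `rollup` function, and the 60-second bucket size are all assumptions made for this example.

```python
from collections import defaultdict


def rollup(events, bucket_seconds=60):
    """Aggregate raw (timestamp, key, value) events into one row per (bucket, key).

    Hypothetical helper for illustration: only the aggregated rows are kept,
    so the raw events never need to be stored.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, key, value in events:
        # Truncate the timestamp to the start of its time bucket.
        bucket = ts - ts % bucket_seconds
        row = totals[(bucket, key)]
        row["count"] += 1
        row["sum"] += value
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {group: dict(row) for group, row in totals.items()}


# Made-up sample data: four raw events, timestamps in seconds.
raw_events = [
    (0, "clicks", 1.0),
    (30, "clicks", 2.0),
    (59, "views", 5.0),
    (60, "clicks", 4.0),
]

rolled = rollup(raw_events)
for group, row in sorted(rolled.items()):
    print(group, row)
```

Here four raw events collapse into three stored rows. With real traffic, where many events share a bucket and key, the reduction would be far larger, which is the motivation for aggregating at ingestion time.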

In the second half of the talk, Liquan will talk about data import and export in TiDB. TiDB is a Hybrid Transactional and Analytical Processing (HTAP) database. Data import and export in TiDB need to guarantee data consistency and achieve scalability. Moreover, the challenge of building data import and export solutions is often overlooked. Liquan will start by demonstrating why data import and export at scale are challenging, and then go over the design considerations of data import and export and the trade-offs TiDB has made.

By attending this meetup, you will learn from Rockset’s engineering experience, especially how to build real-time, large-scale data ingestion applications. You will also learn why an HTAP database is a game changer for Software-as-a-Service (SaaS) applications.

Speakers

Boyang Chen

Engineering lead, Rockset

Boyang is the engineering lead at Rockset and an Apache Kafka committer. Prior to Rockset, Boyang spent two years at Confluent on various technical initiatives, including Kafka Streams, exactly-once semantics, and Apache ZooKeeper removal. He also co-authored the paper “Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka.” Boyang also worked on the ads infrastructure team at Pinterest, where he helped rebuild the entire budgeting and pacing pipeline. He holds bachelor’s and master’s degrees in Computer Science from the University of Illinois at Urbana-Champaign.

Liquan Pei

Senior Database Engineer, PingCAP

Liquan is a Senior Database Engineer at PingCAP. Before PingCAP, he was the tech lead of the ads stream processing system at Pinterest. Prior to that, he worked at Confluent, focusing on Kafka and Kafka Connect. He is an open-source contributor to Apache Kafka and Apache Spark and was a speaker at Kafka Summit 2018 and 2019.