2017

Unified Batch and Stream processing with Apache Beam

PyData, July 2017: Apache Beam is a parallel programming model that allows one to implement batch and streaming data processing jobs that can run on a variety of execution engines, such as Apache Spark and Google Cloud Dataflow. (Slides | Video)

Presented similar talks on Apache Beam at:

Intro to Recommendation Systems

Metis, August 2017: Guest lecture on building real-world recommendation systems. The talk walks through multiple iterations of such a system, each more complex than the last.

2016

Cassandra batch loading for building Data Products

Cassandra Meetup, July 2016: Talk about Nostos (the batch loading service at Coursera) and some of its use cases in data products such as recommendations, search and prediction models. Covers some of the design choices and tradeoffs made in building Nostos and explains how the system evolved over time. (Slides | Video)

Extending our workflow service for use cases beyond ETL

Big Data Meetup, May 2016: Talk about Dataduct, the workflow / ETL service at Coursera, and how it is now used for use cases beyond ETL, such as machine learning, predictions and bulk loading into Cassandra. (Slides)

2015

Large-Scale ETL Data Flows with AWS Data Pipeline & Dataduct

AWS re:Invent, Oct 2015: A deep dive into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns that improve developer productivity. (Slides | Video)