Unified Batch and Stream processing with Apache Beam

PyBay, August 2017: Talk on how Apache Beam's architecture, built on the Runner API and Fn API, makes it possible to support multiple languages and runners, and how it enables batch and stream processing in Python. (Slides)

Intro to Recommendation Systems

Metis, August 2017: Guest lecture / workshop on building real-world recommendation systems. The talk walks through building several iterations of the system, each more complex than the last.

Big data processing with Apache Beam

PyData, July 2017: Presents the Python SDK for Apache Beam, a parallel programming model for implementing batch and streaming data processing jobs that can run on a variety of execution engines such as Apache Spark and Google Cloud Dataflow. (Slides | Video)


Cassandra batch loading for building Data Products

Cassandra Meetup, July 2016: Talk about Nostos, the batch loading service at Coursera, and some of its use cases in data products such as recommendations, search, and prediction models. Covers some of the design choices and tradeoffs made in building Nostos and explains how the system evolved over time. (Slides | Video)

Extending our workflow service for use cases beyond ETL

Big Data Meetup, May 2016: Talk about Dataduct, the workflow / ETL service at Coursera, and how it is now used for use cases beyond ETL, such as machine learning, predictions, and bulk loading into Cassandra. (Slides)


Large-Scale ETL Data Flows with AWS Data Pipeline & Dataduct

AWS re:Invent, Oct 2015: A deep dive into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns that improve developer productivity. (Slides | Video)