This is Sourabh, a software engineer on the Analytics team at Coursera. Prior to Coursera I was a student at Georgia Tech and BITS Pilani. In my free time I try to learn something, contribute to open source and go hiking.
- Cassandra Meetup, July 2016: Cassandra batch loading for building Data Products (Slides)
- Big Data Meetup, May 2016: Extending our workflow service for use cases beyond ETL (Slides)
- AWS re:Invent, Oct 2015: Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct (Slides)
- : Core service for all recommendation systems at Coursera, currently used on the homepage and throughout the content discovery process. Worked on both offline training and online serving.
- : Improved content discovery by building a new onboarding experience on Coursera, which is used to personalize the search and browse experience. Also worked on ranking and indexing improvements.
- : Service for sending email, push and in-app notifications. Involved in features such as delivery time optimization, tracking, queuing and A/B testing. Built an internal app to run batch campaigns for marketing etc.
- : Bulk data processing and injection service that moves data from Hadoop to Cassandra and provides a thin REST layer on top for serving offline-computed data online.
- Workflow Service: Dataduct, an open source workflow framework for creating and managing data pipelines, which leverages reusable patterns to improve developer productivity.
- : Designed the internal survey/data collection system, which supports various question types and the ability to ask different questions based on learner responses.
- : Analytics environment based on Docker and AWS that standardized the Python and R dependencies. Wrote the core libraries that are shared by all data scientists.
- : Setup, schema design and management of Amazon Redshift. Built an internal app for access to the data using a web interface. Dataduct integration for daily ETL.
- Course Dashboards: Instructor dashboards and learner surveying tools, which helped instructors run their class better by providing data on Assignments and Learner Activity.
- QuantSoftware Toolkit: Open source Python library for financial data analysis and machine learning for finance.
- : Created models for portfolio hedging, portfolio optimization and price forecasting. Also created a backtesting engine for simulating and backtesting strategies.
- Mac-Setup: Open source book that gives step-by-step instructions on setting up a developer environment on macOS.
- : Prototyped a motion capture system for controlling a 3D image in real time.