AWS re:Invent 2015 | (BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
As data volumes grow, managing and scaling data pipelines for ETL and batch processing can be daunting. With more than 13.5 million learners worldwide, hundreds of courses, and thousands of instructors, Coursera manages over a hundred data pipelines for ETL, batch processing, and new product development. In this session, we dive deep into AWS Data Pipeline and Dataduct, an open-source framework built at Coursera to manage pipelines and create reusable patterns that boost developer productivity. We share the lessons learned on our journey from basic ETL processes, such as loading data from Amazon RDS into Amazon Redshift, to more sophisticated pipelines that power recommendation engines and search services.

Attendees learn:
- Do's and don'ts of AWS Data Pipeline
- Using Dataduct to streamline your data pipelines
- How to use Data Pipeline to power other data products, such as recommendation systems
- What's next for Dataduct
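As a rough illustration of the "basic ETL" step the abstract mentions, the sketch below extracts rows from an RDS instance, stages them in S3, and bulk-loads them into Redshift with COPY. It assumes a PostgreSQL-flavored RDS source; the hosts, credentials, bucket, table names, and IAM role are all placeholders, and this is not Dataduct's actual implementation.

```python
"""Minimal RDS -> S3 -> Redshift load sketch (illustrative only)."""
import csv
import io

import boto3
import psycopg2

# Placeholder connection details and resources -- replace with your own.
RDS_DSN = "host=my-rds-host dbname=app user=etl password=..."
REDSHIFT_DSN = "host=my-redshift-host port=5439 dbname=dw user=etl password=..."
S3_BUCKET = "my-etl-staging-bucket"
S3_KEY = "staging/users.csv"
COPY_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-copy"


def extract_from_rds() -> bytes:
    """Pull source rows out of RDS (assumed PostgreSQL) as CSV bytes."""
    with psycopg2.connect(RDS_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, email, created_at FROM users")
        buf = io.StringIO()
        csv.writer(buf).writerows(cur.fetchall())
        return buf.getvalue().encode("utf-8")


def stage_to_s3(data: bytes) -> None:
    """Stage the extract in S3, the usual intermediate step before a Redshift COPY."""
    boto3.client("s3").put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=data)


def load_into_redshift() -> None:
    """Bulk-load the staged file with COPY (Redshift speaks the PostgreSQL wire protocol)."""
    with psycopg2.connect(REDSHIFT_DSN) as conn, conn.cursor() as cur:
        cur.execute(
            f"""
            COPY analytics.users (id, email, created_at)
            FROM 's3://{S3_BUCKET}/{S3_KEY}'
            IAM_ROLE '{COPY_ROLE_ARN}'
            CSV
            """
        )


if __name__ == "__main__":
    stage_to_s3(extract_from_rds())
    load_into_redshift()
```

In a Data Pipeline or Dataduct setup, each of these stages would typically run as a separate, scheduled pipeline activity rather than a single script, with retries and dependencies handled by the pipeline definition.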