AWS re:Invent 2019: Insert, upsert, and delete data in Amazon S3 using Amazon EMR (ANT239)
Published on Dec 05, 2019
Building data lakes in Amazon S3 offers scale and reliability for open-source data formats and a common data store for reporting and BI as well as big data analytics and ML/AI. However, most commonly used big data frameworks make customers rewrite large volumes of data to handle incremental changes to individual records. Apache Hudi provides the ability to create, upsert, and delete records and simplifies the handling of change data capture and the ingestion of real-time streams. In this session, we dive deep into using Apache Hudi with Amazon EMR, exploring fundamental concepts, common scenarios, and how to use Hudi to optimize workflows.