AWS re:Invent 2017: Best Practices for Building a Data Lake in Amazon S3 and Amazon (STG312)
Published on Dec 01, 2017
Learn how to build a data lake for analytics in Amazon S3 and Amazon Glacier. In this session, we discuss best practices for data curation, normalization, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon Athena and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR. You'll also get a chance to hear from Airbnb & Viber about their solutions for Big Data analytics using S3 as a data lake.