AWS re:Invent 2018: Build and Govern Your Data Lakes with AWS Glue (ANT309)

Published on Nov 28, 2018

As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable or available for analytics. Learn how AWS Glue makes it easy to build and manage enterprise-grade data lakes on Amazon S3. AWS Glue can ingest data from a variety of sources into your data lake, clean it, transform it, and automatically register it in the AWS Glue Data Catalog, making the data readily available for analytics. Learn how you can set appropriate security policies in the Data Catalog and make data available for a variety of use cases, such as running ad hoc analytics in Amazon Athena, running queries across your data warehouse and data lake with Amazon Redshift Spectrum, running big data analysis in Amazon EMR, and building machine learning models with Amazon SageMaker and AWS Glue. Additionally, Robinhood will share how they moved from a world of data silos to a robust, petabyte-scale data lake on Amazon S3 with AWS Glue. Robinhood is one of the fastest-growing brokerages, serving over five million users with an easy-to-use investment platform that offers commission-free trading of equities, ETFs, options, and cryptocurrencies. Learn about the design paradigms and tradeoffs that Robinhood made to achieve a cost-effective and performant data lake that unifies all data access, analytics, and machine learning use cases.