AWS re:Invent 2017: Architecting a data lake with Amazon S3, Amazon Kinesis, and Ama (ABD318)

Published on Nov 29, 2017

Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users - from developers, to business analysts, to data scientists. Each of these user groups employs different tools, have different data needs and access data in different ways. In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.