AWS Video Catalog

AWS re:Invent 2015 | (BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR

Published on Oct 12, 2015

Organizations need to perform increasingly complex analysis on their data (streaming analytics, ad-hoc querying, and predictive analytics) in order to get better customer insights and actionable business intelligence. However, the growing volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations, and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You will learn why Spark is well suited to ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.

50:25

AWS re:Invent 2015 | (SEC403) Diving into AWS CloudTrail Events w/Apache Spark on EMR

45:03

AWS re:Invent 2015 | (BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR

49:13

AWS re:Invent 2015 | (BDT305) Amazon EMR Deep Dive and Best Practices

2021

AWS Summit Online ASEAN 2021 | Run Amazon EMR at low cost - Amazon EC2 Spot Instances & Amazon MWAA

How can I access applications on an Amazon EMR cluster if the cluster is in a private subnet?

Amazon EMR on EKS - Build Custom Images for Apache Spark on Kubernetes

AWS What's Next ft. Amazon EMR Studio | AWS Events

How can I modify the Spark configuration in an Amazon EMR notebook?

Amazon EMR Support for Targeted ODCR

EMR on EKS - Optimizing Apache Spark jobs on EMR on EKS

EMR on EKS - Orchestrating workflows with Apache Airflow

EMR on EKS - Accessing a Hive metastore or Glue Data Catalog

EMR on EKS - Running Apache Spark jobs on EMR on EKS

Amazon EMR on Amazon EKS - What is EMR on EKS?

Using Amazon EMR Studio to Launch EMR clusters in AWS Service Catalog

Intro to Amazon EMR Studio

Incremental Data Processing using Delta Lake with EMR

AWS re:Invent 2020: What’s new with Amazon EMR

AWS re:Invent 2020: Run big data analytics faster at lower cost with Amazon EMR

AWS re:Invent 2020: How Nielsen built a multi-petabyte data platform using Amazon EMR

AWS re:Invent 2020: Run Spark on Kubernetes with Amazon EMR on Amazon EKS

AWS re:Invent 2020: Introducing EMR Studio: a new notebook-first IDE experience

AWS re:Invent 2020: Implement data access controls for multi-tenant Amazon EMR clusters

AWS re:Invent 2020: Turbocharging query execution on Amazon EMR
