Amazon Web Services

Building data lakes on AWS

About the Course: Building data lakes on AWS

The “Building Data Lakes on AWS” course provides a comprehensive guide to creating, managing, and utilizing data lakes on the AWS cloud platform. It is designed to help learners understand the value of data lakes, differentiate them from data warehouses, and recognize the crucial components that make up a data lake.

The course covers essential topics such as data ingestion, cataloging, preparation, and processing using a variety of AWS services, including AWS Glue, Amazon Athena, and AWS Lake Formation. Learners gain practical experience through hands-on labs: setting up a simple data lake, building a data lake with AWS Lake Formation, automating data lake creation, and visualising data with Amazon QuickSight. By the end of the course, participants will be equipped to build a data lake on AWS and to make fuller use of their data assets in the cloud.

Audience Profile

This course is intended for individuals who are responsible for designing cloud infrastructure and reference architectures, individuals who have attended the Architecting on AWS course, and systems engineers and developers who are responsible for designing and implementing advanced architectures on AWS.

Learning Objectives and Outcomes:

  • Understand the fundamental value and concepts of data lakes compared to traditional data warehouses.
  • Learn the key components that constitute a data lake and explore common architectures integrating data lakes.
  • Gain knowledge of data ingestion methods, cataloging with AWS Glue, and preparation techniques for optimal data storage and retrieval in AWS.
  • Acquire hands-on experience in setting up a basic data lake on AWS through practical labs.
  • Recognise the importance of data processing within a data lake and how to apply these concepts using AWS Glue.
  • Learn to analyse data efficiently using Amazon Athena within a data lake environment.
  • Explore the features, benefits, and security model of AWS Lake Formation for creating and managing data lakes.
  • Gain practical skills in building a data lake using AWS Lake Formation through guided laboratory exercises.
  • Understand how to automate data lake creation with AWS Lake Formation blueprints and workflows and enforce security and access controls.
  • Develop the ability to match records and visualise data effectively using AWS Lake Formation FindMatches and Amazon QuickSight, respectively.

Course Objectives

After completing this course, students will be able to:

  • Manage multiple AWS accounts for your organisation
  • Connect an on-premises data centre to the AWS cloud
  • Discuss the billing implications of connecting multi-region VPCs
  • Move large volumes of data from an on-premises data centre to AWS
  • Design large datastores for the AWS cloud
  • Understand different architectural designs for scaling a large website
  • Protect your infrastructure from DDoS attacks
  • Secure your data on AWS with encryption
  • Design protection for data at rest as well as data in flight
  • Enhance the performance of your solutions

To ensure that participants are well-prepared and can fully benefit from the Building Data Lakes on AWS course, the following prerequisites are recommended:

  • Basic understanding of database concepts, including traditional database management systems and SQL.
  • Familiarity with the concept of data warehousing and the differences between structured and unstructured data.
  • Some experience with cloud computing, particularly with Amazon Web Services (AWS), is beneficial, including an understanding of core AWS services such as Amazon S3, AWS Glue, Amazon Athena, and AWS Lake Formation.
  • Knowledge of data processing and analytics concepts, which will aid in understanding how data is transformed and analysed within a data lake environment.
  • Basic proficiency in using the AWS Management Console and the AWS Command Line Interface (CLI), which will be helpful for the lab components of the course.
  • A willingness to engage with hands-on lab exercises that reinforce the concepts taught in the lessons.

These prerequisites are intended to provide a foundation that will allow students to engage with the course content effectively. They are not meant to be barriers to entry, but rather to ensure that students have a positive and productive learning experience.

Students with varying levels of prior knowledge have successfully completed the course by taking advantage of the resources provided and actively participating in the learning process.

1 Day

Online/Instructor Led



  • Describe the value of data lakes
  • Compare data lakes and data warehouses
  • Describe the components of a data lake
  • Recognise common architectures built on data lakes
  • Describe the relationship between data lake storage and data ingestion
  • Describe AWS Glue crawlers and how they are used to create a data catalog
  • Identify data formatting, partitioning, and compression for efficient storage and query
  • Lab 1: Set up a simple data lake
  • Recognise how data processing applies to a data lake
  • Use AWS Glue to process data within a data lake
  • Describe how to use Amazon Athena to analyse data in a data lake
  • Describe the features and benefits of AWS Lake Formation
  • Use AWS Lake Formation to create a data lake
  • Understand the AWS Lake Formation security model
  • Lab 2: Build a data lake using AWS Lake Formation
  • Automate AWS Lake Formation using blueprints and workflows
  • Apply security and access controls to AWS Lake Formation
  • Match records with AWS Lake Formation FindMatches
  • Visualise data with Amazon QuickSight
  • Lab 3: Automate data lake creation using AWS Lake Formation blueprints
  • Lab 4: Data visualisation using Amazon QuickSight
  • Post-course knowledge check
  • Architecture review
  • Course review
