Commit 371ee1a

Updated EKS tutorial introduction
1 parent e7477fd commit 371ee1a

3 files changed, +14 -12 lines changed


.DS_Store

0 Bytes
Binary file not shown.

lustre/SageMaker-training-using-FSxL-on-EKS/06-cleanup-resources/readme.adoc

Lines changed: 3 additions & 3 deletions
@@ -9,11 +9,11 @@

 This section will clean up resources created as part of your tutorial.

-In this tutorial, I showed you how to use *Amazon FSx for Lustre persistent file system* with *Amazon SageMaker* for machine learning training on an *Amazon EKS cluster*. First, we setup SageMaker Operator on your Kubernetes cluster. Next we configured Amazon FSx for Lustre persistent file system as a persistent volume claim using the CSI driver on our EKS cluster. Then, we prepared the training job to use Amazon FSx for Lustre, and initiated training a gradient-boosting model with MNIST dataset using Amazon SageMaker Training Operator.
+In this tutorial, I showed you how to use an *Amazon FSx for Lustre persistent file system* with *Amazon SageMaker* to train a machine learning model on an *Amazon EKS cluster*. First, we set up SageMaker Operator on your Kubernetes cluster. Next, we configured an Amazon FSx for Lustre persistent file system as a persistent volume using the CSI driver on our EKS cluster. Then, we configured the training job to use Amazon FSx for Lustre for your input data source, and initiated training on a gradient-boosting model using the Amazon SageMaker Training Operator.

-Using Amazon FSx for Lustre accelerates your training jobs by enabling faster download of large datasets. Repeat training jobs can make use of dataset already available on Amazon FSx file system and avoid Amazon S3 request costs.
+Using Amazon FSx for Lustre accelerates your training jobs by enabling faster download of large datasets. Subsequent training jobs can make use of the dataset already available on an Amazon FSx file system and avoid repeated Amazon S3 request costs.

-In this tutorial,we focused primarily on machine learning use case. Amazon FSx for Lustre persistent file system can be used with any high-performance workload on EKS clusters when applications need access to shared, persistent, and high-performance POSIX-compliant file system.
+In this blog post, we focused primarily on a machine learning use case. Amazon FSx for Lustre persistent file systems can be used with any high-performance workload on EKS clusters when applications need access to a shared, persistent, and high-performance POSIX-compliant file system.


 == Duration

lustre/SageMaker-training-using-FSxL-on-EKS/readme.adoc

Lines changed: 11 additions & 9 deletions
@@ -7,24 +7,26 @@ image:FSx-SageMaker-EKS-Tutorial.png[alt="Amazon EFS", align="left",width=420]

 This tutorial covers how to use *Amazon FSx for Lustre persistent deployment option*, a high-performance, highly available, scalable file storage for *machine learning* workloads on *Kubernetes containers*.

-Organizations are modernizing their applications by adopting containers and microservices-based architectures. Because containers are transient in nature, long-running applications can benefit from keeping state in durable storage. *Amazon FSx for Lustre* is a simple, scalable, fully managed, high-performance file system enabling customers to build modern applications, persist and share data from their Kubernetes containers. Customers seek out Lustre for workloads where speed matters, such as machine learning, high performance computing (HPC), video processing, and financial modeling. By eliminating the traditional complexity of setting up and managing Lustre file systems, Amazon FSx allows you to spin up a high-performance file system in minutes, providing sub-millisecond latencies, up to hundreds of gigabytes per second of throughput and millions of IOPS.
+Organizations are modernizing their applications by adopting containers and microservices-based architectures. Because containers are transient in nature, long-running applications benefit from keeping state in durable storage. *Amazon FSx for Lustre* is a simple, scalable, fully managed, high-performance file system enabling customers to build modern applications, persist and share data from their Kubernetes containers. Customers use FSx for workloads where speed matters, such as machine learning, high performance computing (HPC), video processing, and financial modeling. By eliminating the traditional complexity of setting up and managing Lustre file systems, Amazon FSx allows you to spin up a high-performance file system in minutes, providing sub-millisecond latencies, up to hundreds of gigabytes per second of throughput and millions of IOPS.

-*Kubernetes* is an open-source container-orchestration system for automating the deployment, scaling, and management of containerized applications. *Amazon Elastic Kubernetes Service (Amazon EKS)* is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own Kubernetes cluster.
+*Kubernetes* is an open-source container-orchestration system for automating the deployment, scaling, and management of containerized applications. AWS makes it easy to run Kubernetes without needing to install and operate your own Kubernetes control plane or worker nodes using our managed service *Amazon Elastic Kubernetes Service (Amazon EKS)*.

-First, let’s review some basic components of Kubernetes containers and why they need shared persistent storage in bit more detail. A *Pod* is the basic execution unit of a Kubernetes application and comprises of one or more containers with shared storage/network, and a specification for how to run containers. A Pod always runs on a *Node* and each Node is managed by a Kubernetes Master. A Node is a worker machine in Kubernetes and may be either a virtual or a physical machine. A Node can have multiple pods, and the Kubernetes master automatically handles scheduling the pods across the Nodes in the cluster.
+First, let’s review some basic components of Kubernetes containers and why we need shared persistent storage. A *Pod* is the basic execution unit of a Kubernetes application and comprises one or more containers with shared storage/network, and a specification for how to run containers. A Pod always runs on a Node and each Node is managed by a Kubernetes Master. A *Node* is a worker machine in Kubernetes and may be either a virtual or a physical machine. A Node can have multiple pods, and the Kubernetes master automatically handles scheduling the pods across the Nodes in the cluster.

-A Pod can use two type of volumes to store data: Regular and Persistent volumes. Regular volumes on Kubernetes clusters are deleted when the Pod hosting them shuts down. As a result, regular volumes are useful for storing temporary data that does not need to exist outside of the pod’s lifetime. A persistent volume is a cluster-wide resource that you can use to store data beyond the lifetime of a pod. A *persistent volume* is hosted in its own Pod and can remain alive for as long as necessary for ongoing operations. A Pod can specify a set of shared storage Volumes. All containers in the Pod can access the shared volumes, allowing those containers to share data. Amazon offers customers a choice of Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS) and Amazon FSx for Lustre CSI drivers to provision volumes.
+A Pod can use two types of volumes to store data: Regular and Persistent volumes. Regular volumes on Kubernetes clusters are deleted when the Pod hosting them shuts down. As a result, regular volumes are useful for storing temporary data that does not need to exist outside of the pod’s lifetime. A *persistent volume* is a cluster-wide resource that you can use to store data beyond the lifetime of a pod. A persistent volume is hosted in its own Pod and can remain alive for as long as necessary for ongoing operations. A Pod can specify a set of shared storage Volumes. All containers in the Pod can access the shared volumes, allowing those containers to share data. Amazon offers customers a choice of *Amazon Elastic Block Store (Amazon EBS)*, *Amazon Elastic File System (Amazon EFS)* and *Amazon FSx for Lustre* *CSI drivers* to provision persistent volumes.

-In this tutorial, I will focus on FSx for Lustre and cover how to provision Amazon FSx for Lustre persistent file system with Amazon EKS cluster, and accelerate your machine learning training using Amazon FSx and *Amazon SageMaker*. High performance workloads running on EKS clusters that require fast, highly available persistent storage can benefit from using Amazon FSx for Lustre.
+In this tutorial, I will focus on FSx for Lustre and cover how to provision an *Amazon FSx for Lustre persistent file system* with an Amazon EKS cluster, and accelerate your machine learning training using Amazon FSx and *Amazon SageMaker*. High performance workloads running on EKS clusters that require fast, highly available persistent storage can benefit from using Amazon FSx for Lustre.

-*Amazon SageMaker* is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
+Earlier this year, we announced availability of persistent storage file system deployment option with Amazon FSx for Lustre. The persistent file system option provides highly available and durable storage for workloads that run for extended periods, or indefinitely, and are sensitive to disruptions.

-We recently announced availability of persistent storage file system deployment option with Amazon FSx for Lustre. The persistent file system option provides highly available and durable storage for workloads that run for extended periods, or indefinitely, and are sensitive to disruptions.
-
-If a file server becomes unavailable on a persistent file system, it is replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data on persistent file systems is replicated on disks and any failed disks are automatically replaced, transparently.
+Amazon FSx for Lustre stores data across multiple network file servers to maximize performance and reduce bottlenecks. These file servers have multiple disks. If a file server becomes unavailable on a persistent file system, it is replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data on persistent file systems is replicated on disks and any failed disks are automatically replaced, transparently.

 We recommend using Amazon FSx persistent file system option to provision persistent storage for your Kubernetes clusters. The *Amazon FSx for Lustre Container Storage Interface (CSI) Driver* provides a CSI interface that allows Amazon EKS clusters to manage the lifecycle of Amazon FSx for Lustre file systems.

+Next, let’s review how you can run machine learning workloads using Amazon SageMaker Operators for Kubernetes. Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
+
+Amazon SageMaker Operators for Kubernetes makes it easier for developers and data scientists using Kubernetes to train, tune, and deploy machine learning (ML) models in Amazon SageMaker. You can install these SageMaker Operators on your Kubernetes cluster in Amazon Elastic Kubernetes Service (EKS) to create SageMaker jobs natively using the Kubernetes API and command-line Kubernetes tools such as ‘kubectl’.
+

 This is a tutorial designed for architects and engineers who would like to learn how to use *Amazon FSx for Lustre*, a high-performance persistent storage, with your *Kubernetes* workloads.
