Commit e7477fd

committed
Uploaded FSx+EKS tutorial
1 parent 0824714 commit e7477fd

File tree

18 files changed

+1347
-0
lines changed

lustre/.DS_Store

4 KB
Binary file not shown.
20 KB
Binary file not shown.
Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
= Prerequisites
2+
:toc:
3+
:icons:
4+
:linkattrs:
5+
:imagesdir: ../resources/images
6+
7+
8+
== Summary
9+
10+
This section covers the prerequisites for completing this tutorial. *You will need a machine you can use to control the Kubernetes cluster (for example, an EC2 instance or AWS Cloud9).*
11+
12+
If you do not have an EC2 instance available, deploy one first.
13+
14+
Open the link:https://console.aws.amazon.com/ec2/[Amazon EC2] console.
15+
16+
TIP: *_Context-click (right-click)_* the link above and open it in a new tab or window to make it easy to navigate between this GitHub tutorial and the Amazon EC2 console.
17+
18+
19+
== Duration
20+
21+
NOTE: It will take approximately 30 minutes to complete this section.
22+
23+
24+
== Step-by-step Guide
25+
26+
IMPORTANT: Read through all steps below before continuing.
27+
28+
29+
. Complete the following prerequisites:
30+
31+
.. If the AMI for your EC2 instance does not already have Python 3, boto3, numpy, and argparse installed, install them as shown below:
32+
+
33+
[source,bash]
34+
----
35+
sudo yum install -y python3 git
36+
export PATH=~/.local/bin:$PATH
37+
sudo pip3 install boto3 numpy argparse
38+
39+
----
40+
+
41+
.. Please make sure you have configured your AWS CLI credentials using the command below.
42+
+
43+
[source,bash]
44+
----
45+
aws configure
46+
----
47+
+
48+
.. Please make sure you have version *1.18.17* or later of the *AWS CLI* installed. You can check your currently installed version with the *aws --version* command. To install or upgrade the AWS CLI, see link:https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html[Installing the AWS CLI].
49+
+
50+
[source,bash]
51+
----
52+
aws --version
53+
sudo pip3 install awscli --upgrade
54+
----
55+
+
56+
.. Please make sure you have version *0.16.0* or later of *eksctl* installed. You can check your currently installed version with the *eksctl version* command. To install or upgrade *eksctl*, see link:https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl[Installing or Upgrading eksctl]. You can see example commands below for Linux.
57+
+
58+
[source,bash]
59+
----
60+
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
61+
sudo mv /tmp/eksctl /usr/local/bin
62+
eksctl version
63+
----
64+
+
65+
.. Please make sure you have the latest version of kubectl that aligns with your cluster version installed. You can check your currently installed version with the *kubectl version --short --client* command. For more information, see link:https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html[Installing kubectl]. You can see example commands below for Linux.
66+
+
67+
[source,bash]
68+
----
69+
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
70+
chmod +x ./kubectl
71+
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
72+
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
73+
kubectl version --short --client
74+
----
75+
+
76+
.. Please make sure you have the AWS IAM Authenticator for Kubernetes installed. For more information, see link:https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html[Installing aws-iam-authenticator]. You can see example commands below for Linux.
77+
+
78+
[source,bash]
79+
----
80+
curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/aws-iam-authenticator
81+
openssl sha1 -sha256 aws-iam-authenticator
82+
chmod +x ./aws-iam-authenticator
83+
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
84+
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
85+
aws-iam-authenticator help
86+
----
87+
+
88+
.. You need an existing Amazon EKS cluster. If you don’t currently have a cluster, see link:https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html[Getting Started with Amazon EKS] to create one. You can use the example shown below to create a new EKS cluster:
89+
+
90+
[source,bash]
91+
----
92+
eksctl create cluster --name <Cluster Name> --region <AWS region> --zones <Availability zones> --nodegroup-name <Nodegroup Name> --node-type <EC2 instance type> --nodes 1 --nodes-min 1
93+
----
94+
+
95+
===============================
96+
*Example*:
97+
eksctl create cluster --name FSxL-Persistent-Cluster --region us-east-1 --zones us-east-1a,us-east-1b --nodegroup-name FSxL-Persistent-Cluster-workers --node-type c5.large --nodes 1 --nodes-min 1
98+
===============================
99+
+
100+
101+
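The minimum versions listed above (AWS CLI 1.18.17, eksctl 0.16.0) can be checked with a small script rather than by eye. The following is a minimal sketch, assuming GNU `sort -V` is available (it is on Amazon Linux); `version_ge` is a hypothetical helper name introduced here for illustration, not part of any AWS tool.

```shell
#!/usr/bin/env bash
# version_ge is a hypothetical helper (not part of any AWS tool):
# it succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V (GNU coreutils) orders version strings; if the minimum ($2)
  # sorts first, then $1 is at least the minimum.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Check the AWS CLI only if it is installed on this machine.
if command -v aws >/dev/null 2>&1; then
  # `aws --version` prints a line such as "aws-cli/1.18.17 Python/3.7.4 ..."
  aws_version="$(aws --version 2>&1 | cut -d/ -f2 | cut -d' ' -f1)"
  if version_ge "$aws_version" "1.18.17"; then
    echo "AWS CLI $aws_version is new enough"
  else
    echo "AWS CLI $aws_version is older than 1.18.17; please upgrade"
  fi
fi
```

The same helper could be applied to the eksctl minimum, for example `version_ge "<your eksctl version>" "0.16.0"`, once you have read the version from `eksctl version`.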
102+
103+
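The kubectl and aws-iam-authenticator steps above each append the same `export PATH` line to `~/.bashrc`, so running both leaves a duplicate entry. One way to avoid that is to append the line only when it is missing. This is a sketch only; `append_once` is a hypothetical helper name, and the demonstration writes to a scratch file rather than your real `~/.bashrc`.

```shell
#!/usr/bin/env bash
# append_once is a hypothetical helper: add a line to a file only if
# that exact line is not already there.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the text literally (no regex).
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file rather than the real ~/.bashrc.
demo_rc="$(mktemp)"
append_once 'export PATH=$PATH:$HOME/bin' "$demo_rc"
append_once 'export PATH=$PATH:$HOME/bin' "$demo_rc"
demo_count="$(grep -c 'HOME/bin' "$demo_rc")"
echo "lines added: $demo_count"
```

To use it for real, you would pass `~/.bashrc` as the second argument in place of the scratch file.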
104+
== Next section
105+
106+
Click the button below to go to the next section.
107+
108+
image::02-create-SageMaker-Operator-for-Kubernetes.png[link=../02-create-SageMaker-Operator-for-Kubernetes/, align="left",width=420]
109+
110+
111+
112+
Lines changed: 266 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,266 @@
1+
= Configure Amazon SageMaker operator for Kubernetes
2+
:toc:
3+
:icons:
4+
:linkattrs:
5+
:imagesdir: ../resources/images
6+
7+
8+
== Summary
9+
10+
This section covers configuring *Amazon SageMaker Operator for Kubernetes*. First, we will set up IAM roles and permissions for the SageMaker operator, and then set up the operator on the Kubernetes cluster.
11+
12+
13+
== Duration
14+
15+
NOTE: It will take approximately 10 minutes to complete this section.
16+
17+
18+
== Step-by-step Guide
19+
20+
IMPORTANT: Read through all steps below before continuing.
21+
22+
For the operator to access your SageMaker resources, you first need to configure a Kubernetes service account with an OIDC authenticated role that has the proper permissions. For more information, see link:https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html[Enabling IAM Roles for Service Accounts on your Cluster].
23+
24+
=== Set up IAM roles and permissions for SageMaker Operator:
25+
26+
1. Associate an IAM OpenID Connect (OIDC) provider with your EKS cluster for authentication with AWS resources. First, set the cluster name and AWS region using the commands shown below:
27+
+
28+
[source,bash,subs="verbatim,quotes"]
29+
----
30+
# Set the AWS region and EKS cluster name
31+
export CLUSTER_NAME=<EKS CLUSTER NAME>
32+
export AWS_REGION=<AWS Region>
33+
----
34+
+
35+
===============================
36+
*Example*:
37+
38+
export CLUSTER_NAME="FSxL-Persistent-Cluster"
39+
40+
export AWS_REGION="us-east-1"
41+
===============================
42+
+
43+
Next, run the command shown below:
44+
+
45+
[source,bash,subs="verbatim,quotes"]
46+
----
47+
eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --region ${AWS_REGION} --approve
48+
49+
----
50+
+
51+
52+
You should see output as shown below:
53+
+
54+
[source,bash,subs="verbatim,quotes"]
55+
----
56+
[i] eksctl version 0.16.0
57+
[i] using region us-east-1
58+
[i] will create IAM Open ID Connect provider for cluster "FSxL-Persistent-Cluster" in "us-east-1"
59+
[✔] created IAM Open ID Connect provider for cluster "FSxL-Persistent-Cluster" in "us-east-1"
60+
61+
----
62+
+
63+
64+
Now that your Kubernetes cluster in EKS has an OIDC identity provider, you can create a role and give it permissions.
65+
66+
2. Obtain the OIDC issuer URL using the command below:
67+
68+
+
69+
[source,bash,subs="verbatim,quotes"]
70+
----
71+
aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query cluster.identity.oidc.issuer --output text
72+
----
73+
+
74+
75+
The command returns a URL as shown below. If the output is None, make sure your AWS CLI version meets the minimum listed in Prerequisites.
76+
+
77+
[source,bash,subs="verbatim,quotes"]
78+
----
79+
https://oidc.eks.${AWS_REGION}.amazonaws.com/id/{Your OIDC ID}
80+
----
81+
+
82+
83+
===============================
84+
*Example*: https://oidc.eks.us-east-1.amazonaws.com/id/*3A12CB18E75BD1133D141F6C7B5FB266*
85+
===============================
86+
87+
3. Next, use the OIDC ID returned by the previous command to create your role. Create a new file named *trust.json* with the contents shown below. Replace *AWS account ID*, *EKS cluster region*, and *OIDC ID* with your own values:
88+
+
89+
[source,json]
90+
----
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::<AWS account ID>:oidc-provider/oidc.eks.<EKS cluster region>.amazonaws.com/id/<OIDC ID>"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.<EKS cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
        "oidc.eks.<EKS cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
      }
    }
  }]
}
107+
----
108+
+
109+
110+
===============================
111+
*Example*:
112+
[source,json]
113+
----
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::012345678910:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/3A12CB18E75BD1133D141F6C7B5FB266"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/3A12CB18E75BD1133D141F6C7B5FB266:aud": "sts.amazonaws.com",
        "oidc.eks.us-east-1.amazonaws.com/id/3A12CB18E75BD1133D141F6C7B5FB266:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
      }
    }
  }]
}
130+
----
131+
===============================
132+
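Filling in the three placeholders of *trust.json* by hand is error-prone, because the OIDC provider path appears three times. The sketch below shows one way to derive the values from the issuer URL and generate the file. It uses the example values from this tutorial; the variable names (`OIDC_ISSUER_URL`, `AWS_ACCOUNT_ID`, `OIDC_PROVIDER`, `OIDC_ID`) are assumptions introduced here for illustration.

```shell
#!/usr/bin/env bash
# Example values from this tutorial; replace them with your own.
# In practice the issuer URL is the output of the describe-cluster
# command in step 2.
OIDC_ISSUER_URL="https://oidc.eks.us-east-1.amazonaws.com/id/3A12CB18E75BD1133D141F6C7B5FB266"
AWS_ACCOUNT_ID="012345678910"

# Strip the scheme to get the provider path used in the trust policy.
OIDC_PROVIDER="${OIDC_ISSUER_URL#https://}"
# Everything after the last slash is the OIDC ID.
OIDC_ID="${OIDC_ISSUER_URL##*/}"

# Write the trust policy; the unquoted heredoc expands the variables.
cat > trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "${OIDC_PROVIDER}:aud": "sts.amazonaws.com",
        "${OIDC_PROVIDER}:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
      }
    }
  }]
}
EOF
echo "Wrote trust.json for OIDC ID ${OIDC_ID}"
```

Because the region is already part of the issuer URL, this approach needs only the URL and your account ID as inputs.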
133+
134+
4. Create a new IAM role that can be assumed by the cluster service accounts. The output from the command will contain the role ARN:
135+
+
136+
[source,bash,subs="verbatim,quotes"]
137+
----
138+
aws iam create-role --role-name <rolename> --assume-role-policy-document file://trust.json --output=text
139+
----
140+
+
141+
===============================
142+
*Example*:
143+
144+
aws iam create-role --role-name *eks-fsx-cluster-role* --assume-role-policy-document file://trust.json --output=text
145+
146+
ROLE *arn:aws:iam::012345678910:role/eks-fsx-cluster-role* 2020-04-07T21:11:34Z / AROAQMDZVU6IANBHRF4QQ eks-fsx-cluster-role
147+
ASSUMEROLEPOLICYDOCUMENT 2012-10-17
148+
STATEMENT sts:AssumeRoleWithWebIdentity Allow
149+
STRINGEQUALS sts.amazonaws.com system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default
150+
PRINCIPAL arn:aws:iam::012345678910:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/3A12CB18E75BD1133D141F6C7B5FB266
151+
===============================
152+
153+
154+
5. Give this new role access to Amazon SageMaker by attaching the AmazonSageMakerFullAccess policy using the command below:
155+
+
156+
[source,bash,subs="verbatim,quotes"]
157+
----
158+
aws iam attach-role-policy --role-name <rolename> --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
159+
----
160+
+
161+
===============================
162+
*Example*:
163+
164+
aws iam attach-role-policy --role-name eks-fsx-cluster-role --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
165+
===============================
166+
167+
168+
169+
=== Set up the operator on the Kubernetes cluster:
170+
171+
6. Install Amazon SageMaker Operators for Kubernetes from the link:https://github.com/aws/amazon-sagemaker-operator-for-k8s[GitHub repo] by downloading a YAML configuration file that configures your Kubernetes cluster with the custom resource definitions and operator controller service. See command below:
172+
+
173+
[source,bash,subs="verbatim,quotes"]
174+
----
175+
wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/installer.yaml
176+
----
177+
+
178+
179+
7. In the *installer.yaml* file, set the *eks.amazonaws.com/role-arn* annotation to the ARN of the OIDC-based role you created in step 4. In this tutorial, that is the *eks-fsx-cluster-role* role. You can see the example below:
180+
+
181+
[source,yaml,subs="verbatim,quotes"]
182+
----
metadata:
  annotations:
    eks.amazonaws.com/role-arn: <arn of OIDC-based role>
  name: sagemaker-k8s-operator-default
  namespace: sagemaker-k8s-operator-system
188+
----
189+
+
190+
191+
===============================
192+
*Example*:
193+
[source,yaml]
194+
----
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::012345678910:role/eks-fsx-cluster-role
  name: sagemaker-k8s-operator-default
  namespace: sagemaker-k8s-operator-system
200+
----
201+
===============================
202+
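The annotation edit in step 7 can also be scripted instead of done in an editor. The following is a minimal sketch, assuming the annotation sits on a single line of the form `eks.amazonaws.com/role-arn: <value>` as in the fragment above. `set_role_arn` is a hypothetical helper name, and the demonstration works on a scratch file rather than the downloaded *installer.yaml*.

```shell
#!/usr/bin/env bash
# set_role_arn is a hypothetical helper: rewrite the value of the
# eks.amazonaws.com/role-arn annotation in a YAML file.
set_role_arn() {
  local file="$1" arn="$2"
  # Keep the indentation and key, replace whatever value follows it.
  # "|" is the sed delimiter because the ARN contains "/".
  sed "s|^\([[:space:]]*eks\.amazonaws\.com/role-arn:\).*|\1 ${arn}|" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Demonstrate on a scratch copy of the fragment from step 7.
demo_yaml="$(mktemp)"
cat > "$demo_yaml" <<'EOF'
metadata:
  annotations:
    eks.amazonaws.com/role-arn: <arn of OIDC-based role>
  name: sagemaker-k8s-operator-default
  namespace: sagemaker-k8s-operator-system
EOF
set_role_arn "$demo_yaml" "arn:aws:iam::012345678910:role/eks-fsx-cluster-role"
cat "$demo_yaml"
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed. It is still worth inspecting the file before running `kubectl apply`.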
203+
204+
8. On your Kubernetes cluster, install the Amazon SageMaker CRDs and set up your operators as shown below:
205+
+
206+
[source,bash,subs="verbatim,quotes"]
207+
----
208+
kubectl apply -f installer.yaml
209+
----
210+
+
211+
212+
You will see output as shown below:
213+
+
214+
[source,bash,subs="verbatim,quotes"]
215+
----
216+
namespace/sagemaker-k8s-operator-system created
217+
customresourcedefinition.apiextensions.k8s.io/batchtransformjobs.sagemaker.aws.amazon.com created
218+
customresourcedefinition.apiextensions.k8s.io/endpointconfigs.sagemaker.aws.amazon.com created
219+
customresourcedefinition.apiextensions.k8s.io/hostingdeployments.sagemaker.aws.amazon.com created
220+
customresourcedefinition.apiextensions.k8s.io/hyperparametertuningjobs.sagemaker.aws.amazon.com created
221+
customresourcedefinition.apiextensions.k8s.io/models.sagemaker.aws.amazon.com created
222+
customresourcedefinition.apiextensions.k8s.io/trainingjobs.sagemaker.aws.amazon.com created
223+
serviceaccount/sagemaker-k8s-operator-default created
224+
role.rbac.authorization.k8s.io/sagemaker-k8s-operator-leader-election-role created
225+
clusterrole.rbac.authorization.k8s.io/sagemaker-k8s-operator-manager-role created
226+
clusterrole.rbac.authorization.k8s.io/sagemaker-k8s-operator-proxy-role created
227+
rolebinding.rbac.authorization.k8s.io/sagemaker-k8s-operator-leader-election-rolebinding created
228+
clusterrolebinding.rbac.authorization.k8s.io/sagemaker-k8s-operator-manager-rolebinding created
229+
clusterrolebinding.rbac.authorization.k8s.io/sagemaker-k8s-operator-proxy-rolebinding created
230+
service/sagemaker-k8s-operator-controller-manager-metrics-service created
231+
deployment.apps/sagemaker-k8s-operator-controller-manager created
232+
----
233+
+
234+
235+
9. Verify that Amazon SageMaker operators are available in your Kubernetes cluster using the command below:
236+
+
237+
[source,bash,subs="verbatim,quotes"]
238+
----
239+
kubectl get crd | grep sagemaker
240+
----
241+
+
242+
243+
You will see output as shown below:
244+
+
245+
[source,bash,subs="verbatim,quotes"]
246+
----
247+
batchtransformjobs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
248+
endpointconfigs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
249+
hostingdeployments.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
250+
hyperparametertuningjobs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
251+
models.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
252+
trainingjobs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z
253+
----
254+
+
255+
256+
With these operators, all of Amazon SageMaker’s managed and secured ML infrastructure and software optimization at scale is now available as custom resources in your Kubernetes cluster.
257+
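The verification in step 9 relies on reading the list and confirming that all six custom resource definitions are present. The sketch below automates that comparison. `missing_crds` and `EXPECTED_CRDS` are hypothetical names introduced here, and the demonstration feeds the function sample text rather than calling a live cluster.

```shell
#!/usr/bin/env bash
# The six CRD name prefixes shown in the step 9 output.
EXPECTED_CRDS="batchtransformjobs endpointconfigs hostingdeployments hyperparametertuningjobs models trainingjobs"

# missing_crds is a hypothetical helper: read `kubectl get crd` style
# output on stdin and print each expected CRD that is absent.
missing_crds() {
  local listing name
  listing="$(cat)"
  for name in $EXPECTED_CRDS; do
    printf '%s\n' "$listing" | grep -q "^${name}\.sagemaker\.aws\.amazon\.com" \
      || echo "${name}.sagemaker.aws.amazon.com"
  done
}

# Demonstrate with sample text in which one CRD (trainingjobs) is absent.
sample_missing="$(printf '%s\n' \
  'batchtransformjobs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z' \
  'endpointconfigs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z' \
  'hostingdeployments.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z' \
  'hyperparametertuningjobs.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z' \
  'models.sagemaker.aws.amazon.com 2020-04-07T21:17:59Z' | missing_crds)"
echo "missing: ${sample_missing}"
```

Against a real cluster, the function would be used as `kubectl get crd | missing_crds`; empty output means all six definitions were found.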
258+
== Next section
259+
260+
Click the button below to go to the next section.
261+
262+
image::03-Prepare-SageMaker-Training.png[link=../03-Prepare-SageMaker-Training/, align="left",width=420]
263+
264+
265+
266+
