Running R Code on AWS Batch in a Production Environment


In this blog, we will see how to run a job on AWS Batch with the help of a container, S3, EC2, and environment variables that parameterize the job.

I am using R (the language) as the base container, but the same approach works for anything that can run in a container.

First, let us understand the benefits and limitations of AWS Batch.

AWS Batch: This is a service provided by AWS whose essential task is to run code on EC2 machines, with elastic memory and storage, without you having to worry about configuring the machines.

Limitation: It is easy to run a job, but when the code changes (even by a single dot) and no DevOps pipeline is in place, we need to repeat the process below from step 2 to step 9. With a DevOps pipeline (Jenkins, for example), the same change takes only a few seconds.

Process

  1. Create a Dockerfile with the required configuration.
  2. Create a Docker image from this Dockerfile.
  3. Push this Docker image to AWS ECR.
  4. Create a launch template in EC2.
  5. Create the EC2 compute environment with this launch template.
  6. Create a job queue.
  7. Create a Batch job definition that uses the ECR image.
  8. Run the Batch job.
  9. Sit back, monitor, relax, and hope the job succeeds.

Now let's dive into the process and start our task.

1. Create a Dockerfile for the Docker image.

The Dockerfile lists all the libraries and software a program needs to run, just as you need sugar, tea leaves, water, and other powders to make tea.

Dockerfile

I am using r-base as the base image for my Dockerfile and packrat as the dependency management tool. Let me give an overview of packrat.

PACKRAT: This is a library for dependency management in R (just like Maven, sbt, and Poetry).

To create a sample packrat file, install all the libraries you need in R, then install packrat using install.packages("packrat").

Then run the commands below.

You can find the packrat.lock file in the packrat folder under the path you specified in init. A sample packrat file is available in my Git repository (packrat.lock).

Now we will create a container image using this Dockerfile.

2. Create a Docker image with this Dockerfile.

Go to the folder where you created the Dockerfile. The directory structure will look like this. We have created the packrat file in the same directory.


3. Push this Docker image to AWS ECR.

Before pushing this image, we need to create a repository in AWS ECR (Elastic Container Registry).

For this, go to the AWS ECR console and click Create repository. Give your repository a name (for example, r-code-batch; ECR repository names must be lowercase). Turn on the scan-on-push feature (it scans the image for vulnerabilities) and KMS encryption. These two features are optional.

Now we will push this Docker image to ECR using the following commands (run with the AWS CLI).

Prerequisite: you have set up the AWS CLI on your server/system. (I have written a separate blog post on setting it up.)
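As a rough sketch, the build-and-push sequence usually looks like the commands generated below. The account ID, region, repository name, and tag are hypothetical placeholders, and the helper functions are illustrative names rather than part of any library. The script only prints the commands so you can review them before running them in a shell where Docker and the AWS CLI are installed.

```python
from typing import List


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build the image URI in the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def ecr_push_commands(account_id: str, region: str, repo: str, tag: str = "latest") -> List[str]:
    """Return the shell commands that log in to ECR, build, tag, and push the image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote = ecr_image_uri(account_id, region, repo, tag)
    return [
        # Authenticate the local Docker client against the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Build the image from the Dockerfile in the current directory.
        f"docker build -t {repo}:{tag} .",
        # Tag the local image with the full ECR URI, then push it.
        f"docker tag {repo}:{tag} {remote}",
        f"docker push {remote}",
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): replace them with your own account, region, and repo.
    for command in ecr_push_commands("123456789012", "us-east-1", "r-code-batch"):
        print(command)
```

The URI returned by ecr_image_uri is the same value you later give to the job definition as the image name.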

If all the commands succeed, you will see the image in your AWS ECR repository.

4. Create a launch template in EC2.

It can be created from a JSON file or through the AWS GUI. In our example, we will use the GUI.

a. Give the launch template a name. The steps below are optional.

b. Choose an AMI (the machine's operating system image).

c. Choose an instance type (t2, m4, r4, etc.).

d. Choose a key pair, if you want to log in to the machine from a shell/terminal.

e. Choose a VPC (if enabled in your organization).

f. Configure the storage volume (EBS). Use this property when you want to attach block storage to your EC2 instance.

g. Choose an IAM instance profile (the role the EC2 machine will start with).

h. Add user data. This is important if you want to use EFS for your job.

Make sure you replace the EFS ID on line 11 of the user data; otherwise, the EFS mount will fail. If you are only going to work with EBS, you don't need to worry about EFS.
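For orientation, here is a minimal sketch of user data that mounts an EFS file system. AWS Batch expects launch template user data in MIME multi-part archive format, so the sketch wraps a cloud-config part in that envelope. The file system ID, the mount directory, and the function name are hypothetical, the layout is not guaranteed to match the line numbering mentioned above, and it assumes an Amazon Linux based AMI where the amazon-efs-utils package can be installed.

```python
def efs_user_data(file_system_id: str, mount_dir: str = "/mnt/efs") -> str:
    """Return MIME multi-part user data that installs the EFS helper and mounts the file system."""
    lines = [
        "MIME-Version: 1.0",
        'Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="',
        "",
        "--==MYBOUNDARY==",
        'Content-Type: text/cloud-config; charset="us-ascii"',
        "",
        # cloud-config section: install the EFS mount helper (assumes Amazon Linux).
        "packages:",
        "- amazon-efs-utils",
        "",
        # Commands run on boot: create the directory, register the mount, mount it.
        "runcmd:",
        f"- mkdir -p {mount_dir}",
        f'- echo "{file_system_id}:/ {mount_dir} efs tls,_netdev" >> /etc/fstab',
        "- mount -a -t efs defaults",
        "",
        "--==MYBOUNDARY==--",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "fs-12345678" is a placeholder (assumption): use your own EFS file system ID.
    print(efs_user_data("fs-12345678"))
```

Paste the printed text into the User data field of the launch template. If the file system ID is wrong, the instance still starts but the mount fails, which is the failure the note above warns about.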

5. Create the EC2 compute environment in Batch with this launch template.

Now go to the AWS Batch console, open Compute environments, and create a compute environment.

a. Set the compute environment name.

b. Set the minimum vCPUs (for optimal cost, set it to 0).

c. Set the maximum vCPUs.

d. Set the desired vCPUs (for optimal cost, set it to 0).

e. Set the instance type according to your needs (t2, m4, r4, etc.).

f. [Important] In the additional settings, select the launch template we created above.
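The same settings can also be expressed as an API request. The sketch below builds the request for boto3's create_compute_environment call; every name, subnet, security group, and role in the example is a hypothetical placeholder, and the helper functions are illustrative rather than an official API.

```python
from typing import Any, Dict, List


def compute_environment_request(
    name: str,
    launch_template_name: str,
    instance_types: List[str],
    subnets: List[str],
    security_group_ids: List[str],
    instance_role: str,
    max_vcpus: int = 16,
) -> Dict[str, Any]:
    """Build the request for a managed EC2 compute environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            # Minimum and desired vCPUs of 0 let Batch scale to zero between jobs.
            "minvCpus": 0,
            "desiredvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": instance_types,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
            # Attach the launch template created in the previous step.
            "launchTemplate": {
                "launchTemplateName": launch_template_name,
                "version": "$Latest",
            },
        },
    }


def create_compute_environment(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS Batch (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    return boto3.client("batch").create_compute_environment(**request)


if __name__ == "__main__":
    # All identifiers below are placeholders (assumptions).
    request = compute_environment_request(
        name="r-batch-env",
        launch_template_name="r-batch-template",
        instance_types=["m4.large"],
        subnets=["subnet-00000000"],
        security_group_ids=["sg-00000000"],
        instance_role="ecsInstanceRole",
    )
    print(request)
```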

6. Create a job queue.

Jobs are submitted to a job queue. We will create a job queue using the process below. (I have set only the compulsory properties.)

a. Give a name to your job queue.

b. Select the compute environment we created above.

c. Set the priority for your job queue (1–1000). Job queues with a higher integer priority are given preference for compute environments.
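As a sketch, the three settings above map onto boto3's create_job_queue call as shown below. The queue and compute environment names are hypothetical, and the 1 to 1000 range check simply mirrors the range given in this post.

```python
from typing import Any, Dict


def job_queue_request(name: str, compute_environment: str, priority: int = 1) -> Dict[str, Any]:
    """Build the request for a job queue backed by a single compute environment."""
    if not 1 <= priority <= 1000:
        # Range taken from the step above; adjust if your account allows other values.
        raise ValueError("priority must be between 1 and 1000")
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        # A queue can list several compute environments; "order" ranks them.
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment},
        ],
    }


def create_job_queue(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS Batch (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    return boto3.client("batch").create_job_queue(**request)


if __name__ == "__main__":
    # Placeholder names (assumptions).
    print(job_queue_request("r-batch-queue", "r-batch-env", priority=10))
```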

7. Create a Batch job definition with the help of ECR.

AWS Batch job definitions specify how jobs are to be run. While each job must reference a job definition, many of the parameters that are specified in the job definition can be overridden at runtime.

a. Give a name to your job definition.

b. Give the name of the image we pushed to AWS ECR, using the reference URL below.

c. You can edit the vCPUs, memory, job role, volumes, and mount points, and add environment variables.

d. The other properties are optional, so I have not covered them.
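To make the fields concrete, here is a sketch of the equivalent request for boto3's register_job_definition call. The image URI follows the ECR pattern <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>; the account ID, names, and the INPUT_PATH variable are hypothetical placeholders, and the helper names are illustrative.

```python
from typing import Any, Dict, Optional


def job_definition_request(
    name: str,
    image_uri: str,
    vcpus: int = 1,
    memory_mib: int = 2048,
    environment: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build the request for a container job definition."""
    env = environment or {}
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image_uri,
            # Batch expects resource values as strings; memory is in MiB.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
            # Default environment variables; they can be overridden per job.
            "environment": [{"name": k, "value": v} for k, v in sorted(env.items())],
        },
    }


def register_job_definition(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS Batch (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    return boto3.client("batch").register_job_definition(**request)


if __name__ == "__main__":
    # Placeholder values (assumptions).
    print(
        job_definition_request(
            "r-batch-job-def",
            "123456789012.dkr.ecr.us-east-1.amazonaws.com/r-code-batch:latest",
            environment={"INPUT_PATH": "s3://my-bucket/input.csv"},
        )
    )
```

Because no command is set, the container runs whatever CMD or ENTRYPOINT the image defines.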

8. Run the Batch job.

Now we are all set to run a job on Batch. We will configure the last piece of the puzzle.

a. Give a name to your job.

b. Select the job definition we created above.

c. Select the job queue we created above.

d. Set an execution timeout (if your code gets stuck in an infinite loop or hits a permission issue, the job should be killed automatically).

e. You can set the vCPUs and memory for your job again here.

f. Now click Submit job. Hurray! The job is submitted.
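The same submission can be scripted, which is also where environment variables come in for parameterizing a run. The sketch below builds a request for boto3's submit_job call; the job, queue, and definition names and the RUN_DATE variable are hypothetical. Inside the container, the R code can read such a variable with Sys.getenv("RUN_DATE").

```python
from typing import Any, Dict, Optional


def submit_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    timeout_seconds: int = 3600,
    environment: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build the request for one job run with a timeout and per-run environment variables."""
    if timeout_seconds < 60:
        # AWS Batch does not accept timeouts shorter than 60 seconds.
        raise ValueError("timeout_seconds must be at least 60")
    env = environment or {}
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # The attempt is terminated if it runs longer than this.
        "timeout": {"attemptDurationSeconds": timeout_seconds},
        # Values here override the defaults from the job definition for this run only.
        "containerOverrides": {
            "environment": [{"name": k, "value": v} for k, v in sorted(env.items())],
        },
    }


def submit_job(request: Dict[str, Any]) -> str:
    """Submit the job and return its job ID (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    return boto3.client("batch").submit_job(**request)["jobId"]


if __name__ == "__main__":
    # Placeholder names and values (assumptions).
    print(
        submit_job_request(
            "r-job-2020-01-01",
            "r-batch-queue",
            "r-batch-job-def",
            timeout_seconds=1800,
            environment={"RUN_DATE": "2020-01-01"},
        )
    )
```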

9. Sit back, monitor, relax, and hope the job succeeds.

The job is submitted to AWS Batch. Now we need to monitor it and look at the logs.

The logs of the job can be seen in two places.

  1. CloudWatch (a link is provided in the job status).
  2. ECS.
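If you prefer to pull the logs from a script, the following sketch shows one way to do it. It assumes the job uses the default log configuration, in which case Batch writes to the /aws/batch/job log group and reports the stream name in the describe_jobs response; the function names are illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple

# Default CloudWatch log group for Batch jobs (assumes no custom log configuration).
DEFAULT_LOG_GROUP = "/aws/batch/job"


def log_location(describe_jobs_response: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Extract (log group, log stream) for the first job in a describe_jobs response."""
    job = describe_jobs_response["jobs"][0]
    # The stream name is only present once the job has reached the RUNNING state.
    stream = job.get("container", {}).get("logStreamName")
    return DEFAULT_LOG_GROUP, stream


def fetch_log_lines(job_id: str) -> List[str]:
    """Fetch the first page of log lines for a job (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so log_location works without boto3 installed

    response = boto3.client("batch").describe_jobs(jobs=[job_id])
    group, stream = log_location(response)
    if stream is None:
        return []
    events = boto3.client("logs").get_log_events(
        logGroupName=group, logStreamName=stream, startFromHead=True
    )["events"]
    return [event["message"] for event in events]
```

Note that get_log_events returns one page at a time, so a long-running job may need repeated calls with the returned token.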

Job States

A job moves through the following states, each with a specific meaning.

i. Submitted: The job has been submitted to the queue and has not yet been evaluated by the Batch scheduler.

ii. Pending: The job is in the queue but cannot run yet because it depends on another job or resource.

iii. Runnable: The job has no outstanding dependencies and is ready to be scheduled. Batch will now try to create the EC2 machine we configured above, which takes some time. If the job stays in this state for a long time, there are three likely reasons:

1. Insufficient resources: Your job specifies more CPU or memory resources than the compute environment can allocate.

2. No assigned container instance: Instances can't be created, or networking or security issues prevent the container instance from joining the underlying Amazon Elastic Container Service (Amazon ECS) cluster.

3. Host-level problems: There could be problems inside the container instance at the level of the host or Docker daemon. For example, the volumes of the instance could be full, or the Docker daemon or Amazon ECS container agent could have trouble stopping or starting.

iv. Starting: The EC2 machine has been created, the container has been deployed to it, and the container is now being started.

v. Running: Your code is running on an EC2 machine.

vi. Succeeded: As the name suggests, your code completed with a success status.

vii. Failed: As the name suggests, your code completed with a failure status.
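Monitoring these states can be automated with a small polling loop. The sketch below is one possible approach: wait_for_job takes any function that returns the current status, so it can be tried without AWS, while batch_status_getter wraps boto3's describe_jobs call. Both function names and the polling interval are illustrative choices.

```python
import time
from typing import Callable, List

# Job states in the order a job normally passes through them.
STATES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING", "SUCCEEDED", "FAILED"]
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> List[str]:
    """Poll until the job reaches a terminal state; return the distinct states seen in order."""
    seen: List[str] = []
    for _ in range(max_polls):
        status = get_status()
        # Record a state only when it differs from the previous poll.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATES:
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still in state {seen[-1]} after {max_polls} polls")


def batch_status_getter(job_id: str) -> Callable[[], str]:
    """Return a function that reads the job status from AWS Batch (needs boto3 and credentials)."""
    import boto3  # imported here so wait_for_job works without boto3 installed

    client = boto3.client("batch")
    return lambda: client.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
```

A job that sits in RUNNABLE for many polls is the case worth investigating against the three reasons listed above.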

Conclusion

We have seen how to create a job definition and submit a job on AWS Batch. Why would we want Batch? In big data and data engineering, once the code is developed, mostly only the data changes and very few code changes are made. Once the code is final, just create a job and enjoy life; AWS handles the environment.

I have tried to keep this blog short (although most of the information is present). If you need any help or anything is unclear, comment below. I will be happy to help.

Written by

Senior Data Engineer at lumiq.ai
