
In this blog, we will see how to run a job on AWS Batch using a container, S3, EC2, and environment variables to parameterize the job.

I am using R as the base container language, but the possibilities are limitless.

First, let us understand the benefits and limitations of AWS Batch.

AWS Batch: a service provided by AWS whose essential task is to run your code on EC2 machines, with elastic memory and storage, without you having to worry about configuring the machines.

Limitation: it's easy to run a job, but when the code changes (even by a dot) and a DevOps pipeline is not in place, we need to repeat the process below from step 2 to step 9. With a DevOps pipeline (e.g., Jenkins), it becomes a task of a few seconds. …
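To make the parameterization concrete, here is a minimal sketch of submitting a Batch job from Python with boto3. The queue name, job definition, and environment variables below are hypothetical placeholders for whatever you register in your own setup.

```python
import boto3

# A minimal sketch: submit an AWS Batch job parameterized with
# environment variables, so one container image can handle many inputs.
# Queue, job definition, and variable names here are hypothetical.
batch = boto3.client("batch", region_name="us-east-1")

response = batch.submit_job(
    jobName="r-report-job",
    jobQueue="my-job-queue",              # hypothetical queue name
    jobDefinition="my-r-job-definition",  # hypothetical job definition
    containerOverrides={
        # These variables are read by the code inside the container.
        "environment": [
            {"name": "INPUT_S3_PATH", "value": "s3://my-bucket/input.csv"},
            {"name": "OUTPUT_S3_PATH", "value": "s3://my-bucket/output/"},
        ]
    },
)
print(response["jobId"])
```

Changing only these environment values lets you rerun the same job definition against new data without rebuilding the image.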


Hello there! I hope you are doing great and staying safe in this COVID situation. In this blog, I will share my fitness story: how I went from fit to fat and back from fat to fit, losing many kilograms in a short span of three months. I will share the daily routine and diet I followed during the transformation.

My Background

Presently, I am a Senior Data Engineer at Lumiq.ai, where I create data pipelines.

I completed my engineering degree in May 2017 and was one of the few engineers who got an opportunity to work at Accenture right after graduating. …



Introduction

When it comes to big data and modern warehousing technology, you must have heard about Apache Hive.

Official definition: The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. The structure can be projected onto data already in storage. A command-line tool and JDBC driver are provided to connect users to Hive.

Hive was created at Facebook, which later donated it to the Apache community.

Hive provides an SQL-like language called HiveQL with schema-on-read, and it transparently converts queries into MapReduce, Apache Tez, or Spark jobs.

One thing that makes Hive different from other databases/warehouses is that it can digest multiple data formats (structured and semi-structured), and it uses Tez/MapReduce in the background, which reduces the execution time of a Hive query. …
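As a quick illustration of the connectivity mentioned in the definition, here is a minimal sketch of running a HiveQL query from Python using the PyHive library. It assumes a HiveServer2 instance on the default port 10000; the host and table name are hypothetical placeholders.

```python
from pyhive import hive  # pip install "pyhive[hive]"

# A minimal sketch: connect to HiveServer2 (default port 10000) and run a
# HiveQL query. Hive transparently compiles this into MapReduce/Tez tasks.
# Host, database, and table name are hypothetical placeholders.
conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()

cursor.execute("SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```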


In this blog, I will try to explain one piece of NiFi functionality, the REST API, which is used for purposes like starting and stopping a processor and changing the state of processors, controller services, process groups, input ports, etc.

Definition according to the documentation

The REST API provides programmatic access to command and control a NiFi instance in real time. Start and stop processors, monitor queues, query provenance data, and more.

Prerequisite: NiFi is installed. In my case, I have installed NiFi on port 8081, but by default it runs on port 8080.

Scenario 1: We need to start a processor with the API

In the first basic flow, we have a GenerateFlowFile processor, and we need to start it. …
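As a preview of what this scenario boils down to, here is a minimal Python sketch of starting a processor over the REST API, assuming an unsecured NiFi on port 8081 as in my setup. The processor ID is a hypothetical placeholder; copy the real one from the NiFi UI.

```python
import requests

# A minimal sketch of starting a processor through the NiFi REST API.
# The processor ID below is a hypothetical placeholder.
base = "http://localhost:8081/nifi-api"
processor_id = "016a1000-ab12-1234-ffff-0123456789ab"

# Fetch the processor first to get its current revision; NiFi requires
# the matching revision with every state-changing request.
proc = requests.get(f"{base}/processors/{processor_id}").json()

resp = requests.put(
    f"{base}/processors/{processor_id}/run-status",
    json={"revision": proc["revision"], "state": "RUNNING"},
)
resp.raise_for_status()
print(resp.json()["component"]["state"])  # expected: RUNNING
```

Stopping the processor is the same call with "STOPPED" as the state.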



In this article, we will go through the boto3 documentation and list files from AWS S3. Personally, when I was going through the documentation, I didn't find a direct solution for this functionality. In this tutorial, we will learn how to install boto3, set up AWS credentials, create buckets, and then list all the files in a bucket.

Boto3

As per the documentation, Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. …
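As a preview of where the tutorial ends up, here is a minimal sketch of listing all files in a bucket with boto3; the bucket name is a hypothetical placeholder.

```python
import boto3  # pip install boto3; credentials via `aws configure`

# A minimal sketch of listing every object in a bucket. A paginator is
# used because list_objects_v2 returns at most 1,000 keys per call.
# The bucket name is a hypothetical placeholder.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="my-example-bucket"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```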

About

Shubham Kanungo

Senior Data Engineer at Lumiq.ai
