davidjegan/AWS-EMR-Node-Calculator: AWS-EMR-Node-Calculator



AWS Elastic MapReduce (EMR) Node Calculator – a serverless way


Context

To get good parallelism, an EMR cluster needs the right number of nodes. Working that number out by hand involves tedious lookups and cross-referencing. This tool simplifies that process: it returns an estimate of the number of nodes your application needs to run seamlessly.

Cluster Node Calculation Formulae

Read the default mapred-site.xml
Get the mapreduce.map.memory.mb and yarn.scheduler.maximum-allocation-mb values
Instance Mapper Capacity (number of mappers per instance) = yarn.scheduler.maximum-allocation-mb / mapreduce.map.memory.mb

Total Mappers Required = Total Size of Input / Input Split Size

Numerator = Total Mappers Required * Time to Process Sample Files
Denominator = Instance Mapper Capacity * Desired Processing Time

Estimated number of nodes = Numerator / Denominator
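The formulae above can be sketched in Python as follows. This is a minimal illustration rather than the tool's actual code: the function names are made up for this example, the input values are hypothetical, and rounding up to whole mappers and whole nodes is an assumption. Both times must use the same unit.

```python
import math


def instance_mapper_capacity(max_allocation_mb: int, map_memory_mb: int) -> int:
    """Mappers one instance can host: yarn.scheduler.maximum-allocation-mb
    divided by mapreduce.map.memory.mb, counting whole containers only."""
    return max_allocation_mb // map_memory_mb


def total_mappers(total_input_bytes: int, split_size_bytes: int) -> int:
    """Total size of input / input split size, rounded up (assumption)."""
    return -(-total_input_bytes // split_size_bytes)  # integer ceiling division


def estimated_nodes(mappers: int, sample_time: float,
                    capacity: int, desired_time: float) -> int:
    """Numerator / Denominator from the formula, rounded up to whole nodes."""
    numerator = mappers * sample_time
    denominator = capacity * desired_time
    return math.ceil(numerator / denominator)


# Hypothetical inputs, for illustration only.
capacity = instance_mapper_capacity(max_allocation_mb=11520, map_memory_mb=1440)
mappers = total_mappers(total_input_bytes=1024**4,          # 1 TiB of input
                        split_size_bytes=128 * 1024**2)     # 128 MiB splits
nodes = estimated_nodes(mappers, sample_time=3, capacity=capacity, desired_time=60)
print(capacity, mappers, nodes)  # 8 8192 52
```

With these numbers the numerator is 8192 * 3 = 24576 and the denominator is 8 * 60 = 480, which gives 51.2 and rounds up to 52 nodes.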

Pre-Requisite

Get a test workload
The number of sample files should match the number of mappers
Run an EMR cluster with a single core node and process the sample files
The time taken to process them is the processing time (the time to process sample files in the formula above)


Services and components

DynamoDB: the NoSQL database offering of AWS
Lambda: a compute service that runs code without provisioning servers
API Gateway: the AWS API management service, used here to invoke the Lambda function
Front-end components: HTML, CSS, JS, jQuery and AJAX

Process Flow

Get the details of all instances in AWS Compute and store them in a DB
Create a Lambda function that reads this DB and returns its contents
Create an API endpoint to invoke this Lambda function
Embed this API in the front-end code
Parse the response and render the contents of the webpage dynamically
(Optional) A Lambda function can be created to listen for AWS SNS notifications of service changes and update the DynamoDB contents on the fly
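The Lambda step of this flow could look roughly like the Python sketch below. It is not the repository's own function: the table name `emr-instance-types` and the optional `table` parameter (used to pass in a stand-in table for local testing) are assumptions made for this example. It scans the whole table, follows pagination, and returns the response shape that Lambda proxy integration expects, including a CORS header so that a browser page can call the endpoint.

```python
import json
from decimal import Decimal

TABLE_NAME = "emr-instance-types"  # hypothetical table name


def _to_jsonable(value):
    """DynamoDB numbers arrive as Decimal, which json cannot encode directly."""
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def scan_all(table):
    """Collect every item, following LastEvaluatedKey across result pages."""
    items, kwargs = [], {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def lambda_handler(event, context, table=None):
    """Return the table contents in Lambda proxy integration format."""
    if table is None:
        import boto3  # provided by the AWS Lambda Python runtime
        table = boto3.resource("dynamodb").Table(TABLE_NAME)
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",  # lets the web page call the API
        },
        "body": json.dumps(scan_all(table), default=_to_jsonable),
    }
```

A full scan is reasonable here only because the instance catalogue is small; a larger table would call for a query on a key instead.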

Set-up

DynamoDB => Contains the instance data
- Load the following contents into DynamoDB using the following script
Lambda => To retrieve the DB contents
- Create a Lambda function in the AWS console
API Gateway
- Go to API Gateway and create an API
- Provide a name, description and endpoint type
- Create a GET method
- Choose Lambda Function as the Integration type
- Turn on Use Lambda Proxy Integration
- Provide the region and the name of the Lambda function created in the previous step
- Click OK when the popup asks you to grant access to the Lambda function
- Click on Actions and Deploy API
- Provide a stage name and description
- Deploy the API
- Note the Invoke URL; it will be used in the next step
Front-end update
- Embed this endpoint in the code in the JS file
- Open the HTML file, provide the inputs and find the number of nodes with ease!
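The deployed endpoint can also be checked outside the browser. The sketch below is one way to do that with the Python standard library; `fetch_instances` and its `opener` parameter are names invented for this example, and it assumes the API returns a JSON body as the front end expects.

```python
import json
import urllib.request


def fetch_instances(invoke_url, opener=urllib.request.urlopen, timeout=10):
    """GET the deployed API and decode its JSON body.

    `opener` defaults to urllib's urlopen; it is a parameter only so that
    a stand-in can be supplied when testing without network access.
    """
    with opener(invoke_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the Invoke URL noted after deployment (placeholders left as-is):
# instances = fetch_instances("https://<api-id>.execute-api.<region>.amazonaws.com/<stage>")
# print(len(instances), "instance records returned")
```

If this call returns the instance data, the same URL can be embedded in the JS file.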


PS

The number of mappers depends on the number of Hadoop splits
If your files are smaller than the HDFS or Amazon S3 split size, the number of mappers is equal to the number of files
If some or all of your files are larger than the HDFS or Amazon S3 split size (fs.s3.block.size), the number of mappers is equal to the sum, over all files, of each file's size divided by the HDFS/Amazon S3 block size
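These rules can be expressed as a short Python function. This is a sketch under two assumptions that the notes above do not state: a partial split is rounded up to a full mapper, and every file, however small, gets at least one mapper. The function name and the file sizes are hypothetical.

```python
def mappers_for_files(file_sizes_bytes, split_size_bytes):
    """Count mappers per file: one for a file no larger than a split,
    otherwise the file size divided by the split size, rounded up."""
    return sum(
        max(1, -(-size // split_size_bytes))  # integer ceiling, at least 1
        for size in file_sizes_bytes
    )


MIB = 1024 * 1024
sizes = [10 * MIB, 64 * MIB, 300 * MIB]    # hypothetical input files
print(mappers_for_files(sizes, 128 * MIB))  # 5
```

Here the two small files take one mapper each and the 300 MiB file takes three, for five mappers in total.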
