About The Course
The Hadoop Cluster Administration training course is designed to provide the knowledge and skills needed to become a successful Hadoop Architect. It starts with the fundamental concepts of Apache Hadoop and the Hadoop Cluster, then covers how to deploy, configure, manage, monitor, and secure a Hadoop Cluster. The course also covers HBase Administration. It includes many challenging, practical, and focused hands-on exercises. By the end of this Hadoop Cluster Administration training, you will be prepared to understand and solve real-world problems that you may come across while working on a Hadoop Cluster.
After completing the 'Hadoop Administration' course at Edureka, you should be able to:
1. Get a clear understanding of Apache Hadoop, HDFS, Hadoop Cluster and Hadoop Administration
2. Gain insight into Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN, MapReduce v2
3. Plan and Deploy a Hadoop Cluster
4. Load Data and Run Applications
5. Configure a Hadoop Cluster and tune its performance
6. Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster
7. Secure a deployment and understand Backup and Recovery
8. Understand Oozie, HCatalog/Hive, and HBase Administration
Who should go for this course?
This course is best suited to systems administrators, Windows administrators, Linux administrators, infrastructure engineers, DB administrators, Big Data architects, mainframe professionals and IT managers who are interested in learning Hadoop Administration.
Why Learn Hadoop Administration?
The adoption of Hadoop has created a need for professionals skilled in Hadoop Administration. Building these skills as a Hadoop Admin can lead to better career, salary and job opportunities.
How will I execute the Practicals?
For your practical work, we will help you set up a Virtual Machine on your system, which you can access locally. You can also create an account on AWS EC2 and use 'Free tier usage' eligible servers to create your Hadoop Cluster on AWS EC2. The step-by-step procedure is documented and shared in the LMS. Our 24/7 expert support team will also be available to assist you.
Which Case-Studies will be a part of the Course?
Towards the end of the Course, you will work on a live project in which the different Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems.
1. Set up a Hadoop Cluster with a minimum of 2 nodes
Node 1 - Namenode, datanode, tasktracker
Node 2 - Jobtracker, datanode, tasktracker
2. Create a simple text file and copy it to HDFS
Find out the location of the node to which it went
Find in which data node the output files are written
3. Create a large text file and copy it to HDFS with a block size of 256 MB. Keep all the other files at the default block size and find out how block size impacts performance
4. Set a spaceQuota of 200MB for projects and copy a file of 70MB with replication=2
What is the reason it is not letting you copy the file?
How will you solve this problem without increasing the spaceQuota?
5. Configure Rack Awareness and copy the file to HDFS
Find its rack distribution and the command used for it
How do you change the replication factor of an existing file?
The final certification project is based on real-world use cases, as follows:
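The arithmetic behind exercise 4 above can be sketched in a few lines of Python. This is only an illustrative model, not Hadoop code: it assumes that HDFS checks the space quota when each block is allocated, that a full block per replica must fit at that moment, and that every replica counts against the quota. The function name `can_write` and the block sizes tried here are hypothetical choices for the illustration.

```python
MB = 1024 * 1024

def can_write(file_size, quota, block_size, replication):
    """Return True if every block allocation fits under the space quota.

    Assumption: at allocation time a full block per replica is reserved,
    and once a block is finished only its actual size is charged.
    """
    consumed = 0
    remaining = file_size
    while remaining > 0:
        # A full block for each replica must fit before writing starts.
        if consumed + block_size * replication > quota:
            return False
        written = min(block_size, remaining)
        consumed += written * replication
        remaining -= written
    return True

if __name__ == "__main__":
    # 70 MB file, 200 MB spaceQuota, replication = 2, varying block size.
    for block_mb in (128, 64, 32):
        ok = can_write(70 * MB, 200 * MB, block_mb * MB, 2)
        print(f"block size {block_mb} MB -> {'fits' if ok else 'quota exceeded'}")
```

Under these assumptions the 70MB file only needs 140MB of raw space, yet the copy is rejected with larger block sizes because a whole block per replica is reserved up front. Writing the file with a smaller block size is one way to stay under the quota without raising it.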
Problem Statement 1:
1. Set up a single-node or 2-node Hadoop cluster with block size = 128MB, in which all daemons (namenode, datanode, jobtracker, tasktracker) must be running
2. Write a Namespace ID for the cluster and create a directory with a namespace quota of 10 and a space quota of 100MB
3. Use the distcp command to copy the projects to the same cluster and create a list of the data nodes participating in the cluster
Problem statement 2:
1. Save the namespace of the Namenode without using the secondary namenode; the edits file must be merged without stopping the namenode daemon.
2. Set an include file so that no other nodes can talk to the namenode
3. Set the cluster rebalancer threshold to 40%.
4. Set the map and reduce slots to 4 and 2 respectively for each node
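To make the rebalancer threshold in item 3 above more concrete, here is a minimal Python sketch of what a 40% threshold means. It assumes the usual definition: a data node counts as balanced when its disk utilization differs from the overall cluster utilization by no more than the threshold, in percentage points. The function name `unbalanced_nodes` and the sample figures are hypothetical and are not taken from the course material.

```python
def unbalanced_nodes(nodes, threshold_pct):
    """Return the names of nodes outside the balancing threshold.

    nodes maps a node name to a (used, capacity) pair in the same unit.
    Assumption: a node is balanced when its utilization is within
    threshold_pct percentage points of the cluster-wide utilization.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    result = []
    for name, (used, cap) in sorted(nodes.items()):
        node_util = 100.0 * used / cap
        if abs(node_util - cluster_util) > threshold_pct:
            result.append(name)
    return result

if __name__ == "__main__":
    # Hypothetical 2-node cluster: one node nearly full, one nearly empty.
    sample = {"node1": (95, 100), "node2": (5, 100)}
    print(unbalanced_nodes(sample, 40))  # both nodes deviate by 45 points
    print(unbalanced_nodes(sample, 50))  # a looser threshold accepts them
```

A threshold as high as 40% tolerates large differences between nodes, so under this model the balancer would move data only when a node is far from the cluster average.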