Apache Hadoop

A software framework for distributed storage and processing. It uses a network of many computers to solve problems involving massive amounts of data and computation, using the MapReduce programming model.

+ Distributed Computing: Hadoop enables the processing of large data sets across clusters of computers.
+ Scalability: It can scale from single servers to thousands of machines.
+ Storage: Each machine offers local computation and storage.
+ Programming Model: Uses simple programming models for distributed data processing.
+ Fault Tolerance: Designed to detect and handle failures at the application layer.
+ HDFS: High-throughput access to application data via the Hadoop Distributed File System.
+ YARN: Short for "Yet Another Resource Negotiator"; manages resources and job scheduling across the cluster.
+ MapReduce: A YARN-based system for parallel processing of large data sets.
- Complex Setup: Intricate setup process, particularly challenging for beginners.
- Batch Processing: Slower batch-processing model compared to alternatives like Apache Spark.
- No Real-Time Processing: Absence of support for real-time data processing.
- High Latency: Elevated latency stemming from batch processing.
- Single Point of Failure: Vulnerability due to reliance on a single master node.
- Data Locality Constraints: Difficulties in ensuring data locality.
- Scalability Challenges: Complications associated with scaling Hadoop clusters.
- Resource-Intensive: Significant hardware resources required for operation.
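
The MapReduce programming model named in the list above can be illustrated without a cluster. The following is a minimal single-machine sketch in plain Python, not Hadoop's own API: the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen only for illustration. It shows the three phases a MapReduce job passes through: map emits key/value pairs, shuffle groups them by key, and reduce collapses each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

On a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data over the network; this sketch keeps everything in one process so that the data flow is easy to follow.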

System Requirements

#  Minimum
1  Java:
  • Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Compile Hadoop with Java 8; compiling Hadoop with Java 11 is not supported.
  • Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
  • Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and Java 8.
  • Apache Hadoop 2.6.x and earlier supports Java 6.
2  SSH must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
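
The Java compatibility list above can be turned into a small lookup, which may help when checking an environment before installation. This is a sketch that encodes only what the list states; the function name supported_java_versions is hypothetical, and the function assumes a version string of the form "major.minor" or "major.minor.patch".

```python
def supported_java_versions(hadoop_version: str) -> list[int]:
    """Return the Java major versions the list above names for a Hadoop release.

    Java 11 for Hadoop 3.3 and later is runtime-only; compiling still needs Java 8.
    For 2.6.x and earlier, Java 6 is the only version the list names.
    """
    # Compare (major, minor) as a tuple so that 2.10 sorts after 2.7.
    major, minor = (int(part) for part in hadoop_version.split(".")[:2])
    release = (major, minor)
    if release >= (3, 3):
        return [8, 11]
    if release >= (3, 0):
        return [8]
    if release >= (2, 7):
        return [7, 8]
    return [6]


if __name__ == "__main__":
    for version in ("3.3.6", "3.2.4", "2.10.2", "2.6.5"):
        print(version, supported_java_versions(version))
```

Comparing versions as integer tuples rather than strings matters here: as plain strings, "2.10" would sort before "2.7" and be misclassified.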

Ratings

4.15 / 5

G2CROWD: 4.3 / 5, based on 81 reviews
TrustRadius: 8.0 / 10, based on 214 reviews
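
The page does not say how the 4.15 headline figure is derived. One reading, offered here as an assumption rather than a documented method, is that each source score is rescaled to a 5-point scale and the results are averaged without weighting by review count. The helper name normalized_mean below is hypothetical.

```python
def normalized_mean(scores: list[tuple[float, float]], target_scale: float = 5.0) -> float:
    """Rescale each (value, scale) pair to target_scale and take the plain mean.

    Assumption: sources are weighted equally, regardless of review count.
    """
    rescaled = [value / scale * target_scale for value, scale in scores]
    return sum(rescaled) / len(rescaled)


# G2CROWD is 4.3 out of 5; TrustRadius is 8.0 out of 10 (4.0 out of 5).
overall = normalized_mean([(4.3, 5.0), (8.0, 10.0)])

if __name__ == "__main__":
    print(round(overall, 2))
```

Under this assumption the two listed scores reproduce the 4.15 figure. A mean weighted by review count would give a different result, so the match is suggestive but not proof of the method.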

Written in

Java, C++, C

Initial Release

1 April 2006

Alternatives

Distributed File System
No alternative software available under 'Distributed File System' category.
Cloud Computing
Eureka, Kubecost, Pulumi IaC, Infracost, Terraform by HashiCorp, Velero, Apache Mahout, Apache Spark

Notes