Apache Hadoop

A software framework for distributed storage and processing that uses a network of many computers to solve problems involving massive amounts of data and computation, based on the MapReduce programming model.

by Apache Software Foundation · 4.2 / 5 · ⚖️ Free · Open

Features & Limitations

+	Distributed Computing	Hadoop enables the processing of large data sets across clusters of computers.
+	Scalability	It can scale from single servers to thousands of machines.
+	Storage	Each machine in the cluster offers local computation and storage.
+	Programming Model	Utilizes simple programming models for distributed data processing.
+	Fault Tolerance	Designed to handle failures at the application layer.
+	HDFS	High-throughput access to application data via the Hadoop Distributed File System.
+	YARN	Short for “Yet Another Resource Negotiator”; manages resources and job scheduling across the cluster.
+	MapReduce	A system for parallel processing of large data sets within the YARN framework.
-	Complex Setup	Intricate setup process, particularly challenging for beginners.
-	Batch Processing	Slower batch-processing model compared to alternatives like Apache Spark.
-	No Real-Time Processing	No built-in support for real-time data processing.
-	High Latency	Elevated latency stemming from the batch-processing model.
-	Single Point of Failure	Vulnerability due to reliance on a single master node when high availability is not configured.
-	Data Locality Constraints	Difficulties in ensuring data locality.
-	Scalability Challenges	Complications associated with scaling Hadoop clusters.
-	Resource-Intensive	Significant hardware resources required for operation.
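
The MapReduce programming model named in the feature list can be illustrated with a small word count. The following is a minimal sketch in plain Python that runs in memory on one machine; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration. On a real cluster, the framework distributes the map and reduce calls across machines and performs the grouping step itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run in parallel on different machines, which is what makes the model suitable for large clusters.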

System Requirements

#	Minimum
1	Java: Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only; compile Hadoop with Java 8, as compiling with Java 11 is not supported). Apache Hadoop 3.0.x to 3.2.x supports only Java 8. Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and Java 8. Apache Hadoop 2.6 and earlier supports Java 6.
2	SSH: ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
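
The Java compatibility rules in the table above can be expressed as a small lookup. This is a sketch only: `supported_java_runtimes` is a hypothetical helper, not part of Hadoop, and it encodes just what the table states. It assumes the version string has at least a major and a minor component.

```python
from typing import Tuple


def supported_java_runtimes(hadoop_version: str) -> Tuple[int, ...]:
    """Return the Java major versions listed for a 'major.minor[.patch]' Hadoop version."""
    major, minor = (int(part) for part in hadoop_version.split(".")[:2])
    release = (major, minor)
    if release >= (3, 3):
        return (8, 11)  # Java 11 is runtime only; compile with Java 8
    if release >= (3, 0):
        return (8,)
    if release >= (2, 7):
        return (7, 8)
    return (6,)  # the table only states that Java 6 is supported here


print(supported_java_runtimes("3.3.6"))
```

Comparing `(major, minor)` tuples rather than raw strings avoids the ordering mistake where "2.10" would sort before "2.7".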

Ratings

4.15

G2CROWD	4.3 / 5 based on 81 reviews
TrustRadius	8.0 / 10 based on 214 reviews
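
The page does not say how the overall figure of 4.15 is derived. One reading that is consistent with the numbers shown is an unweighted mean of the two scores after converting both to a five-point scale. The sketch below works through that arithmetic; the function names are hypothetical and the unweighted-mean method is an assumption, not something the page confirms.

```python
from typing import Iterable, Tuple


def to_five_point(score: float, scale: float) -> float:
    """Convert a score on an arbitrary scale to a five-point scale."""
    return score / scale * 5.0


def mean_rating(ratings: Iterable[Tuple[float, float]]) -> float:
    """Unweighted mean of (score, scale) pairs after normalizing to five points."""
    normalized = [to_five_point(score, scale) for score, scale in ratings]
    return sum(normalized) / len(normalized)


# G2CROWD: 4.3 out of 5; TrustRadius: 8.0 out of 10 (4.0 on a five-point scale)
overall = mean_rating([(4.3, 5), (8.0, 10)])
print(round(overall, 2))
```

A mean weighted by review count (81 and 214 reviews) would give a different figure, so the unweighted reading is the one that matches the value on the page.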

Developer

Apache Software Foundation

Written in

Java, C++, C

Initial Release

1 April 2006

Repository

https://git-wip-us.apache.org/repos/asf/hadoop.git
https://github.com/apache/hadoop

License

Apache License 2.0

Alternatives

Distributed File System
No alternative software available under 'Distributed File System' category.
Cloud Computing
Apache Mahout · Eureka · Kubecost · Pulumi IaC · Infracost · Terraform by HashiCorp · Velero · Apache Spark

Notes

Apache, the Apache Hadoop name, and the Apache Hadoop logo are trademarks of the Apache Software Foundation.
Optimal hardware system requirements are not taken from the official website.