Apache Software Foundation

Apache Hadoop

Apache(TM) Hadoop(R) is a software library and framework that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides distributed storage and processing of big data using the MapReduce programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
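The MapReduce programming model mentioned above can be illustrated with a minimal single-process sketch. This is plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data at rest"])
print(counts)
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both steps could in principle be distributed across machines, which is the property the model relies on.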

Apache Mahout

Apache Mahout(TM) is an open source project primarily used for creating scalable machine learning algorithms. It implements machine learning techniques such as collaborative filtering, clustering, recommendation, and classification. It also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. "Mahout" is a word used in South Asian countries for a person who drives an elephant as its master. The name reflects the project's close association with Apache Hadoop, which uses an elephant as its logo.
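As a rough illustration of collaborative filtering, one of the techniques listed above, here is a small user-based sketch in plain Python. It does not use Mahout; the `ratings` data and the helpers `cosine` and `recommend` are made up for this example, and cosine similarity is only one of several possible similarity measures.

```python
import math

# Hypothetical ratings: user -> {item: score}
ratings = {
    "alice": {"A": 5.0, "B": 3.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0},
    "carol": {"B": 4.0, "C": 4.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(score * score for score in u.values()))
    norm_v = math.sqrt(sum(score * score for score in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted scores from other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, score in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * score
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("alice", ratings))
```

A scalable implementation would distribute the similarity computation across a cluster rather than loop over every pair of users in memory.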

Apache MXNet (Incubating)

Apache MXNet is an open source, multi-language machine learning (ML) library used primarily to train and deploy deep neural networks on a wide array of devices. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.
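The difference between imperative computation and declarative symbolic expression can be sketched in plain Python. This is a toy model, not MXNet's API: the `Sym` class and the `var` helper are hypothetical and support only addition and multiplication on scalars.

```python
class Sym:
    """A node in a tiny expression graph (hypothetical; not MXNet's Symbol API)."""

    def __init__(self, op, args=(), name=None):
        self.op = op
        self.args = args
        self.name = name

    def __add__(self, other):
        return Sym("add", (self, other))

    def __mul__(self, other):
        return Sym("mul", (self, other))

    def eval(self, **bindings):
        """Evaluate the graph once concrete values are bound to its variables."""
        if self.op == "var":
            return bindings[self.name]
        left, right = (arg.eval(**bindings) for arg in self.args)
        return left + right if self.op == "add" else left * right


def var(name):
    """Create a named placeholder with no value attached yet."""
    return Sym("var", name=name)


# Imperative style: every operation runs immediately on concrete values.
a, b = 2.0, 3.0
imperative_result = (a + b) * b

# Declarative style: first describe the computation, then execute it later.
x, y = var("x"), var("y")
graph = (x + y) * y
symbolic_result = graph.eval(x=2.0, y=3.0)

print(imperative_result, symbolic_result)
```

Because the symbolic version exists as a data structure before it runs, a library has the chance to analyze and optimize the whole graph, which is the role the paragraph assigns to the graph optimization layer.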

Apache Spark

Apache Spark(TM) is an open-source, general-purpose distributed cluster computing framework with a (mostly) in-memory data processing engine. It can perform ETL, analytics, machine learning, and graph processing on large volumes of data at rest (batch processing) or in motion (stream processing), and it offers rich, concise high-level APIs for Scala, Python, Java, R, and SQL. In contrast to Hadoop’s two-stage, disk-based MapReduce computation engine, Spark’s multi-stage engine runs most computations in memory and therefore usually provides better performance for certain applications, e.

Apache SystemML

Apache SystemML is a machine learning system with support for Java 8+, Scala 2.11+, Python 2.7/3.5+, Hadoop 2.6+, and Spark 2.1+. It provides a workplace for machine learning on big data. Because it runs on top of Apache Spark, it scales automatically with the data, determining line by line whether code should run on the driver or on an Apache Spark cluster. Future releases may add deep learning capabilities with GPU support, such as importing and running neural network architectures and pre-trained models for training.
