Apache Spark

Apache Spark is a distributed general-purpose cluster-computing framework

Features

Apache Spark™ is an open-source, distributed, general-purpose cluster-computing framework with a (mostly) in-memory data processing engine. It can do ETL, analytics, machine learning, and graph processing on large volumes of data at rest (batch processing) or in motion (stream processing), and it offers concise high-level APIs for Scala, Python, Java, R, and SQL.

In contrast to Hadoop’s two-stage disk-based MapReduce computation engine, Spark’s multi-stage (mostly) in-memory computing engine allows for running most computations in memory, and hence most of the time provides better performance for certain applications, e.g. iterative algorithms or interactive data mining.
- Mastering Apache Spark by Jacek Laskowski

See Apache Hadoop.
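
The contrast drawn in the quote above, reloading input from disk on every pass versus keeping it in memory, can be illustrated with a toy model. The sketch below is plain Python, not Spark or Hadoop code; `SimulatedDisk`, `iterate_from_disk`, and `iterate_in_memory` are hypothetical names made up for this illustration, and the "iterative algorithm" is an arbitrary running average.

```python
# Toy model only: counts simulated "disk reads" for an iterative job.
# Nothing here is Spark or Hadoop API; all names are hypothetical.


class SimulatedDisk:
    """Stands in for slow storage and counts how often it is read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        # Each call models one full scan of the input from storage.
        self.reads += 1
        return list(self._records)


def iterate_from_disk(disk, iterations):
    """Disk-based style: every pass reloads the input from storage."""
    estimate = 0.0
    for _ in range(iterations):
        data = disk.load()
        estimate = (estimate + sum(data) / len(data)) / 2
    return estimate


def iterate_in_memory(disk, iterations):
    """In-memory style: load once, then reuse the cached copy on every pass."""
    cached = disk.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = (estimate + sum(cached) / len(cached)) / 2
    return estimate


if __name__ == "__main__":
    disk_a = SimulatedDisk(range(1, 101))
    disk_b = SimulatedDisk(range(1, 101))
    result_a = iterate_from_disk(disk_a, iterations=10)
    result_b = iterate_in_memory(disk_b, iterations=10)
    print("disk-based reads:", disk_a.reads)  # one read per iteration
    print("in-memory reads:", disk_b.reads)   # a single read, then reuse
    print("same result:", result_a == result_b)
```

Both functions compute the same value; only the number of storage reads differs. In Spark itself, the comparable step is persisting a dataset (for example with `cache()` on an RDD or DataFrame) so that later passes reuse it. Real speedups depend on data size, available memory, and the workload.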

Libraries:

  1. Spark SQL is Apache Spark’s module for working with structured data.
  2. Spark Streaming makes it easy to build scalable fault-tolerant streaming applications.
  3. MLlib is Apache Spark’s scalable machine learning library.
  4. GraphX is Apache Spark’s API for graphs and graph-parallel computation.

News | Stack Overflow Q&A | Community/Mailing Lists | Documentation | FAQ | IRC

System Requirements

#   Minimum
1   4-8 disks per node
2   8 GB to hundreds of GB of memory
3   10 Gigabit or higher network
4   8-16 cores per machine

Ratings

Aggregate: 4.08 / 5

PAT RESEARCH: 7.7 / 10 (based on professional's opinion)
PAT RESEARCH: 8.2 / 10 (based on 2 reviews)
TrustRadius: 8.6 / 10 (based on 101 reviews)

Developer

Matei Zaharia (original developer) at UC Berkeley's AMPLab; Apache Software Foundation

Written in

Scala, Java, Python, R

Initial Release

26 May 2014

Alternatives

Data Analytics

No alternative software available under 'Data Analytics' category.


Machine Learning
OpenCV, Apache Mahout, Apache MXNet (Incubating), Apache SystemML, Eclipse Deeplearning4j, MALLET, Massive Online Analysis (MOA), mlpack, Orange, PyTorch, scikit-learn, TensorFlow, The Microsoft Cognitive Toolkit, Torch, Weka, Yooreeka

