Apache Spark
Apache Spark is a distributed general-purpose cluster-computing framework
Features
Apache Spark(TM) is an open-source distributed general-purpose cluster computing framework with (mostly) in-memory data processing engine that can do ETL, analytics, machine learning and graph processing on large volumes of data at rest (batch processing) or in motion (streaming processing) with rich concise high-level APIs for the programming languages: Scala, Python, Java, R, and SQL.
In contrast to Hadoop’s two-stage disk-based MapReduce computation engine, Spark’s multi-stage (mostly) in-memory computing engine allows for running most computations in memory, and hence most of the time provides better performance for certain applications, e.g. iterative algorithms or interactive data mining.
- Mastering Apache Spark by Jacek Laskowski
See Apache Hadoop.
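The contrast in the quote above, between rereading data from disk at every stage and keeping it in memory across stages, can be made concrete with a small plain-Python sketch. It uses no Spark or Hadoop APIs; `DiskSource` and `run_iterations` are hypothetical names introduced only for this illustration, and the "disk scan" is simulated with a counter. It is a toy model of why iterative algorithms tend to benefit from in-memory reuse, not a description of how either engine is implemented.

```python
# Plain-Python illustration only: no Spark or Hadoop APIs are used.
# DiskSource and run_iterations are hypothetical names for this sketch.


class DiskSource:
    """Stands in for a dataset stored on disk; counts how often it is scanned."""

    def __init__(self, values):
        self._values = list(values)
        self.scans = 0

    def read(self):
        self.scans += 1  # each call models one full scan of the stored data
        return list(self._values)


def run_iterations(source, iterations, cache):
    """Iteratively refine an estimate of the mean, one pass over the data per step."""
    # With cache=True the data is loaded once and reused from memory.
    cached = source.read() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        # With cache=False every iteration goes back to the simulated disk.
        data = cached if cache else source.read()
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)  # move halfway toward the mean
    return estimate


disk_based = DiskSource(range(1, 101))
in_memory = DiskSource(range(1, 101))

result_disk = run_iterations(disk_based, iterations=10, cache=False)
result_mem = run_iterations(in_memory, iterations=10, cache=True)

# Same answer either way; only the number of simulated disk scans differs.
print(disk_based.scans, in_memory.scans)  # prints "10 1"
```

Both runs compute the same estimate; the cached run touches the simulated disk once instead of once per iteration, which is the kind of saving the quote attributes to in-memory processing for iterative workloads.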
Libraries:
- <strong>Spark SQL</strong> is Apache Spark’s module for working with structured data.
- <strong>Spark Streaming</strong> makes it easy to build scalable fault-tolerant streaming applications.
- <strong>MLlib</strong> is Apache Spark’s scalable machine learning library.
- <strong>GraphX</strong> is Apache Spark’s API for graphs and graph-parallel computation.
Developer
Matei Zaharia (OD) at UC Berkeley's AMPLab, Apache Software Foundation
Written in
Scala, Java, Python, R
Initial Release
26 May 2014
License
Apache License 2.0
Categories
Alternatives
Data Analytics
No alternative software is available under the 'Data Analytics' category.
Machine Learning
OpenCV, Apache Mahout, Apache MXNet (Incubating), Apache SystemML, Eclipse Deeplearning4j, MALLET, Massive Online Analysis (MOA), mlpack, Orange, PyTorch, scikit-learn, TensorFlow, The Microsoft Cognitive Toolkit, Torch, Weka, Yooreeka