A machine learning system for the end-to-end data science lifecycle, encompassing data integration, cleaning, feature engineering, efficient model training, and deployment.

+ Algorithm Customizability: Supports custom algorithms written in R-like and Python-like languages.
+ Hybrid Execution Plans: Combines local, in-memory CPU and GPU operations with distributed operations on Apache Spark.
+ Multiple Execution Modes: Includes Spark MLContext, Spark Batch, Hadoop Batch, Standalone, and JMLC for varied operational needs.
+ Automatic Optimization: Optimizes execution based on data and cluster characteristics for efficiency and scalability.
+ Declarative Languages: Provides a declarative, R-like syntax for data science tasks.
+ Compressed Linear Algebra: Uses compressed linear algebra to support large-scale machine learning.
+ Principal Component Analysis: Provides scripts for statistical analysis, including principal component analysis.
+ Compatibility: Supports Java 11 and Python 3.5+.
+ Integration: Works with Hadoop 3.3.x and Spark 3.5.x for big data processing.
+ NVIDIA CUDA and Intel MKL Support: Uses NVIDIA CUDA 10.2 and Intel MKL (<=2019.x) for enhanced performance.
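
The hybrid execution and automatic optimization features above can be illustrated with a minimal sketch. This is not the system's actual optimizer or API: the `MatrixStats` type, the `estimate_bytes` and `choose_backend` functions, and the cost constants are all hypothetical, chosen only to show the general idea of picking local or distributed execution from data size and a memory budget.

```python
# Illustrative sketch only: hypothetical names and cost model,
# not the real optimizer of the system described above.
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8       # size of one double-precision value
BYTES_PER_INDEX = 4        # assumed size of one column index in a sparse format


@dataclass
class MatrixStats:
    rows: int
    cols: int
    sparsity: float        # fraction of non-zero cells, in [0, 1]


def estimate_bytes(m: MatrixStats) -> int:
    """Rough in-memory size: the cheaper of a dense and a sparse layout."""
    cells = m.rows * m.cols
    dense = cells * BYTES_PER_DOUBLE
    # Assumed sparse cost: one value plus one index per non-zero cell.
    sparse = int(cells * m.sparsity) * (BYTES_PER_DOUBLE + BYTES_PER_INDEX)
    return min(dense, sparse)


def choose_backend(inputs: list[MatrixStats], memory_budget_bytes: int) -> str:
    """Return "local" if all inputs fit in the budget, otherwise "spark"."""
    total = sum(estimate_bytes(m) for m in inputs)
    return "local" if total <= memory_budget_bytes else "spark"


if __name__ == "__main__":
    small = MatrixStats(rows=1_000, cols=100, sparsity=1.0)
    large = MatrixStats(rows=10_000_000, cols=1_000, sparsity=1.0)
    budget = 1_000_000_000  # hypothetical 1 GB local memory budget
    print(choose_backend([small], budget))
    print(choose_backend([small, large], budget))
```

A real optimizer would also account for intermediate results, operator-specific costs, and cluster configuration. The sketch only captures the basic decision between in-memory and distributed execution.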
System Requirements

Apache Software Foundation

Written in

Java, R, Python, Scala

Initial Release

