scikit-learn

A SciPy-based Python library for machine learning tasks such as classification, regression, and clustering.

by David Cournapeau, Other contributors

4.7 / 5 · ⚖️ Free · Open

Documentation · Wiki · Mailing list · Stack Overflow · FAQ · IRC

Features & Limitations

+	Supervised Learning Algorithms	Tools for common supervised learning algorithms, such as linear regression, support vector machines, and random forests, for building models for prediction tasks
+	Unsupervised Learning Algorithms	Unsupervised learning methods, such as clustering, factor analysis, and principal component analysis, for exploring unlabeled data and uncovering hidden patterns
+	Cross-validation	Techniques to assess the predictive performance of the models, choose the best model and prevent overfitting
+	Preprocessing	Functions for preprocessing data, such as scaling, centring, normalization, binarization, and imputation of missing values
+	Model Evaluation	Metrics and scoring functions to evaluate the performance of models
+	Pipeline	Streamlining the machine learning workflow by chaining transformations and models
+	Grid Search	Methods for parameter tuning to determine the best model parameters and avoid manual exploration
+	Persistence	Allows saving and loading models for later use, facilitating deployment and reusability
+	Scalability	Handles larger datasets through efficient algorithms, joblib-based parallelism, and incremental (out-of-core) learning for estimators that implement partial_fit
+	Visualization	Offers tools for visualizing data and model performance through integration with libraries like Matplotlib
+	Feature Extraction	Tools for extracting features from data such as text and images for machine learning algorithms
+	Dimensionality Reduction	Methods like PCA and feature selection techniques to reduce the number of features
+	Ensemble Methods	Combines the predictions of several base estimators to improve generalizability and robustness over a single estimator
+	Feature Selection	Techniques for feature selection to improve estimators’ accuracy scores or to boost their performance on very high-dimensional datasets
+	Datasets	Provides several toy datasets to practice machine-learning techniques
+	Metrics	Offers a wide range of performance metrics for classification, regression, clustering, and pairwise metrics
+	Semi-Supervised Learning	Algorithms for semi-supervised learning problems
+	Nearest Neighbors	Algorithms for unsupervised and supervised neighbors-based learning methods
+	Gaussian Processes	Tools for Gaussian process regression and classification
+	Manifold Learning	Algorithms for manifold learning with an emphasis on non-linear dimensionality reduction
+	Covariance Estimation	Methods for robust covariance estimation and for computing Mahalanobis distances
+	Isotonic Regression	Implements isotonic regression to fit a non-decreasing function to data
+	Multiclass and Multilabel Algorithms	Strategies to solve multiclass and multilabel classification problems
+	Random Projection	Methods for reducing dimensionality through random projection matrix generation
-	Limited Deep Learning Support	Limited capabilities for deep learning tasks
-	High-Dimensional Data	Challenges in effectively handling high-dimensional data
-	Graph Algorithms	Not optimized for graph algorithms
-	String Processing	Not very efficient at processing strings
-	Hyperparameter Spaces	Awkward definition of hyperparameters and search spaces in models
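
Several of the features listed above (Datasets, Preprocessing, Pipeline, Grid Search, and Cross-validation) are typically used together. The following is a minimal sketch of how they might combine, not a recommended configuration: the choice of the iris toy dataset, the SVC estimator, the step names "scale" and "svc", and the grid values are all illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Datasets: a toy dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Pipeline + Preprocessing: standardize the features, then fit a
# support vector classifier. The step names are arbitrary labels.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC()),
])

# Grid Search: try every combination in the grid, scoring each one
# with 5-fold cross-validation. The grid values are illustrative only.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

# Cross-validation: score the best pipeline found by the search.
scores = cross_val_score(search.best_estimator_, X, y, cv=5)
print("best parameters:", search.best_params_)
print("mean CV accuracy: %.3f" % scores.mean())
```

Putting the scaler inside the pipeline means it is refit on the training portion of every fold, so no information from the held-out fold leaks into preprocessing.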

System Requirements

#	Minimum
1	numpy >= 1.19.5, scipy >= 1.6.3, joblib >= 1.2, threadpoolctl >= 3.1
2	Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4. Scikit-learn 0.21 supported Python 3.5-3.7. Scikit-learn 0.22 supported Python 3.5-3.8. Scikit-learn 0.23-0.24 required Python 3.6 or newer. Scikit-learn 1.0 supported Python 3.7-3.10. Scikit-learn 1.1, 1.2 and 1.3 support Python 3.8-3.12. Scikit-learn 1.4 requires Python 3.9 or newer.
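
The Python compatibility notes above can be turned into a small lookup when deciding which release to install. This is only a sketch that transcribes the ranges stated in the table: the names `PYTHON_SUPPORT` and `supports` are hypothetical helpers, not part of scikit-learn, and release 0.20 is left out because the table gives no full range for it.

```python
# Each entry: (scikit-learn minor releases, minimum Python, maximum Python).
# A maximum of None means the table states only "or newer".
PYTHON_SUPPORT = [
    ({(0, 21)}, (3, 5), (3, 7)),
    ({(0, 22)}, (3, 5), (3, 8)),
    ({(0, 23), (0, 24)}, (3, 6), None),
    ({(1, 0)}, (3, 7), (3, 10)),
    ({(1, 1), (1, 2), (1, 3)}, (3, 8), (3, 12)),
    ({(1, 4)}, (3, 9), None),
]


def supports(sklearn_version, python_version):
    """Return True or False per the table, or None if the release is not listed."""
    for releases, low, high in PYTHON_SUPPORT:
        if sklearn_version in releases:
            return low <= python_version and (high is None or python_version <= high)
    return None


print(supports((1, 3), (3, 12)))  # within the stated 3.8-3.12 range
print(supports((1, 4), (3, 8)))   # below the stated minimum of 3.9
print(supports((0, 20), (3, 6)))  # release not covered by this sketch
```

Versions are compared as (major, minor) tuples rather than strings, so that 3.10 sorts after 3.9.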

Ratings

4.70 / 5

G2CROWD	4.9 / 5 based on 30 reviews
InfoWorld	4.5 / 5 based on a professional's opinion

Developer

David Cournapeau, Other contributors

Written in

Python, Cython, C, C++

Initial Release

June 2007

Repository

https://github.com/scikit-learn/scikit-learn

License

BSD-3

Alternatives

Machine Learning
Apache Mahout, Massive Online Analysis, TensorFlow, Apache Spark, Apache MXNet, Apache SystemDS, Eclipse Deeplearning4j, MALLET, mlpack, OpenCV, Orange, PyTorch, The Microsoft Cognitive Toolkit, Torch, Weka, Yooreeka
Data Mining
KNIME Analytics Platform, Massive Online Analysis, ELKI, OpenNN, Orange, Weka, Yooreeka
Data Analysis
KNIME Analytics Platform, Orange, pandas
