Original author(s) | David Cournapeau |
---|---|
Initial release | June 2007 |
Stable release | |
Repository | |
Written in | Python, Cython, C and C++[2] |
Operating system | Linux, macOS, Windows |
Type | Library for machine learning |
License | New BSD License |
Website | scikit-learn |
scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language.[3] It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project.[4]
The scikit-learn project started as scikits.learn, a Google Summer of Code project by French data scientist David Cournapeau. The name of the project stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy.[5] The original codebase was later rewritten by other developers. In 2010, contributors Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, from the French Institute for Research in Computer Science and Automation in Saclay, France, took leadership of the project and released the first public version of the library on February 1, 2010.[6] In November 2012, scikit-learn as well as scikit-image were described as two of the "well-maintained and popular" scikits libraries.[7] In 2019, it was noted that scikit-learn is one of the most popular machine learning libraries on GitHub.[8]
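- A consistent way of running machine learning models (estimator.fit() and estimator.predict()), which libraries can implement
- A way of structuring a data science process (the Pipeline), including data pre-processing and model fitting

Fitting a random forest classifier: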

    >>> from sklearn.ensemble import RandomForestClassifier
    >>> classifier = RandomForestClassifier(random_state=0)
    >>> X = [[1, 2, 3],  # 2 samples, 3 features
    ...      [11, 12, 13]]
    >>> y = [0, 1]  # classes of each sample
    >>> classifier.fit(X, y)
    RandomForestClassifier(random_state=0)
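
The shared fit/predict interface and the Pipeline can be illustrated together. The following is a minimal sketch rather than an example from the article's sources: the toy data, the step names "scale" and "forest", and the choice of StandardScaler as the pre-processing step are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): 4 samples, 3 features, 2 classes
X = [[1, 2, 3], [2, 3, 4], [11, 12, 13], [12, 13, 14]]
y = [0, 0, 1, 1]

# A pipeline chains pre-processing and model fitting behind one estimator.
# The step names "scale" and "forest" are arbitrary labels chosen here.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

pipeline.fit(X, y)                 # fits the scaler, then the classifier
predictions = pipeline.predict(X)  # same call as on a bare estimator
print(predictions)
```

Because the pipeline itself exposes fit() and predict(), it can be used anywhere a single estimator is expected.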
scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible.
scikit-learn integrates well with many other Python libraries, such as Matplotlib and plotly for plotting, NumPy for array vectorization, Pandas dataframes, SciPy, and many more.
scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and started to use it as a part of his thesis work. In 2010, INRIA, the French Institute for Research in Computer Science and Automation, got involved and the first public release (v0.1 beta) was published in late January 2010.
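
From the user's side, the wrapped C/C++ libraries sit behind the ordinary estimator interface. The sketch below is illustrative only: the toy data is assumed, and the mapping of estimators to backends (SVC to LIBSVM; LinearSVC and LogisticRegression with solver="liblinear" to LIBLINEAR) follows the library's own documentation, not anything stated in this article.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Toy, linearly separable data (assumed for illustration)
X = [[-2.0, -2.0], [-1.5, -1.5], [1.5, 1.5], [2.0, 2.0]]
y = [0, 0, 1, 1]

# Each estimator delegates its optimisation to a wrapped native library,
# yet all three expose the same fit/predict calls.
models = {
    "SVC (LIBSVM)": SVC(kernel="linear"),
    "LinearSVC (LIBLINEAR)": LinearSVC(),
    "LogisticRegression (LIBLINEAR)": LogisticRegression(solver="liblinear"),
}

results = {}
for name, model in models.items():
    model.fit(X, y)
    results[name] = model.predict([[-1.0, -1.0], [1.0, 1.0]]).tolist()
    print(name, results[name])
```

The optimisation routine itself runs in compiled code, which is why customising these particular solvers from Python may not be possible.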
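
As a rough illustration of this interoperability, the sketch below fits the same estimator on a NumPy array and on a SciPy sparse matrix. The toy data (y = 2x + 1) and the choice of LinearRegression are assumptions made for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression

# Toy data (assumed for illustration): y = 2*x + 1, held in a NumPy array
X_dense = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X_dense.ravel() + 1.0

dense_model = LinearRegression().fit(X_dense, y)

# The same estimator also accepts a SciPy sparse matrix as input
X_sparse = sparse.csr_matrix(X_dense)
sparse_model = LinearRegression().fit(X_sparse, y)

# Fitted parameters and predictions come back as NumPy arrays
print(type(dense_model.coef_), dense_model.coef_, dense_model.intercept_)
print(sparse_model.predict(X_sparse[:3]))
```

Estimators generally accept array-like inputs, so plain Python lists, NumPy arrays and, for many estimators, SciPy sparse matrices can be passed to the same methods.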