dict_learning
- sklearn.decomposition.dict_learning(X, n_components, *, alpha, max_iter=100, tol=1e-08, method='lars', n_jobs=None, dict_init=None, code_init=None, callback=None, verbose=False, random_state=None, return_n_iter=False, positive_dict=False, positive_code=False, method_max_iter=1000)
Solve a dictionary learning matrix factorization problem.
Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:

(U^*, V^*) = argmin_{U, V} 0.5 * ||X - UV||_Fro^2 + alpha * ||U||_1,1
             with ||V_k||_2 = 1 for all 0 <= k < n_components

where V is the dictionary and U is the sparse code. ||.||_Fro stands for the Frobenius norm and ||.||_1,1 stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix.
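As a concrete illustration of this objective (a minimal sketch with random matrices, not part of the original docstring), the quantity being minimized can be evaluated directly with NumPy:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((6, 5))                  # data matrix
>>> U = rng.standard_normal((6, 3))                  # sparse code
>>> V = rng.standard_normal((3, 5))                  # dictionary
>>> V /= np.linalg.norm(V, axis=1, keepdims=True)    # enforce ||V_k||_2 = 1
>>> alpha = 0.1
>>> # 0.5 * ||X - UV||_Fro^2 + alpha * ||U||_1,1
>>> objective = 0.5 * np.sum((X - U @ V) ** 2) + alpha * np.sum(np.abs(U))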
Read more in the User Guide.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Data matrix.
- n_components : int
Number of dictionary atoms to extract.
- alpha : int or float
Sparsity controlling parameter.
- max_iter : int, default=100
Maximum number of iterations to perform.
- tol : float, default=1e-8
Tolerance for the stopping condition.
- method : {'lars', 'cd'}, default='lars'
The method used:
'lars': uses the least angle regression method to solve the lasso problem (linear_model.lars_path);
'cd': uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse. A usage sketch combining this option with others follows the parameter list.
- n_jobs : int, default=None
Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
- dict_init : ndarray of shape (n_components, n_features), default=None
Initial value for the dictionary for warm restart scenarios. Only used if code_init and dict_init are not None.
- code_init : ndarray of shape (n_samples, n_components), default=None
Initial value for the sparse code for warm restart scenarios. Only used if code_init and dict_init are not None.
- callback : callable, default=None
Callable that gets invoked every five iterations.
- verbose : bool, default=False
To control the verbosity of the procedure.
- random_state : int, RandomState instance or None, default=None
Used for randomly initializing the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.
- return_n_iter : bool, default=False
Whether or not to return the number of iterations.
- positive_dict : bool, default=False
Whether to enforce positivity when finding the dictionary.
Added in version 0.20.
- positive_code : bool, default=False
Whether to enforce positivity when finding the code.
Added in version 0.20.
- method_max_iter : int, default=1000
Maximum number of iterations to perform for the inner lasso solver (the chosen method).
Added in version 0.22.
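For instance, a call combining several of these options might look like the following sketch (not part of the original docstring; parameter values are illustrative, the data comes from the Examples below, and the callback merely counts invocations while ignoring the argument it receives):

>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> calls = []
>>> code, dictionary, errors = dict_learning(
...     X,
...     n_components=15,
...     alpha=1.0,
...     method='cd',                            # coordinate descent solver
...     positive_code=True,                     # constrain the code to be >= 0
...     callback=lambda env: calls.append(1),   # fires every five iterations
...     random_state=42,
... )
>>> bool((code >= 0).all())                     # positivity constraint holds
True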
- Returns:
- code : ndarray of shape (n_samples, n_components)
The sparse code factor in the matrix factorization.
- dictionary : ndarray of shape (n_components, n_features)
The dictionary factor in the matrix factorization.
- errors : array
Vector of errors at each iteration.
- n_iter : int
Number of iterations run. Returned only if return_n_iter is set to True.
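With return_n_iter=True the iteration count is appended to the returned tuple; a minimal sketch, reusing the synthetic signal from the Examples below:

>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> code, dictionary, errors, n_iter = dict_learning(
...     X, n_components=15, alpha=0.1, random_state=42, return_n_iter=True
... )
>>> code.shape, dictionary.shape
((30, 15), (15, 20))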
See also
dict_learning_online : Solve a dictionary learning matrix factorization problem online.
DictionaryLearning : Find a dictionary that sparsely encodes data.
MiniBatchDictionaryLearning : A faster, less accurate version of the dictionary learning algorithm.
SparsePCA : Sparse Principal Components Analysis.
MiniBatchSparsePCA : Mini-batch Sparse Principal Components Analysis.
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> U, V, errors = dict_learning(X, n_components=15, alpha=0.1, random_state=42)
We can check the level of sparsity of U:

>>> np.mean(U == 0)
np.float64(0.62)
We can compare the average squared Euclidean norm of the reconstruction error of the sparse coded signal relative to the squared Euclidean norm of the original signal:

>>> X_hat = U @ V
>>> np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))
np.float64(0.0192)
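Continuing the example, the unit-norm constraint on the dictionary atoms from the objective above should also hold, up to numerical precision:

>>> bool(np.allclose(np.linalg.norm(V, axis=1), 1.0))
True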
