dict_learning_online
- sklearn.decomposition.dict_learning_online(X, n_components=2, *, alpha=1, max_iter=100, return_code=True, dict_init=None, callback=None, batch_size=256, verbose=False, shuffle=True, n_jobs=None, method='lars', random_state=None, positive_dict=False, positive_code=False, method_max_iter=1000, tol=0.001, max_no_improvement=10)
Solve a dictionary learning matrix factorization problem online.
Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:
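$$(U^*, V^*) = \underset{(U, V)}{\arg\min}\; 0.5\,\lVert X - UV \rVert_{\mathrm{Fro}}^2 + \alpha\,\lVert U \rVert_{1,1} \quad \text{with } \lVert V_k \rVert_2 = 1 \text{ for all } 0 \le k < n_{\mathrm{components}}$$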
where V is the dictionary and U is the sparse code. ||.||_Fro stands for the Frobenius norm and ||.||_1,1 stands for the entry-wise matrix norm, which is the sum of the absolute values of all the entries in the matrix. This is accomplished by repeatedly iterating over mini-batches by slicing the input data.
Read more in the User Guide.
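To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a given X, U and V. The helpers `dict_learning_objective` and `normalize_atoms` are hypothetical names used only for illustration, not part of scikit-learn, and the sketch only scores a candidate factorization; it does not perform the online optimization itself.

```python
import numpy as np


def dict_learning_objective(X, U, V, alpha):
    """Return 0.5 * ||X - U V||_Fro^2 + alpha * ||U||_1,1.

    Hypothetical helper for illustration only (not a scikit-learn API).
    """
    residual = X - U @ V
    data_term = 0.5 * np.sum(residual ** 2)  # squared Frobenius norm of the error
    penalty = alpha * np.sum(np.abs(U))      # entry-wise l1,1 norm of the code
    return data_term + penalty


def normalize_atoms(V):
    """Rescale each atom (row V_k of V) to unit l2 norm, as the constraint states.

    Hypothetical helper; assumes no row of V is all zeros.
    """
    return V / np.linalg.norm(V, axis=1, keepdims=True)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
V = normalize_atoms(np.eye(2))  # two atoms that already have unit norm
U = X.copy()                    # exact code, so X - U @ V is zero
print(dict_learning_objective(X, U, V, alpha=0.5))  # 0.5 * 0 + 0.5 * (1 + 2 + 3 + 4) = 5.0
```

Because the reconstruction here is exact, the whole objective comes from the l1,1 penalty on U, which is what pushes the solver toward sparse codes.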
- Parameters:
- X : array-like of shape (n_samples, n_features)
Data matrix.
- n_components : int or None, default=2
Number of dictionary atoms to extract. If None, then n_components is set to n_features.
- alpha : float, default=1
Sparsity controlling parameter.
- max_iter : int, default=100
Maximum number of iterations over the complete dataset before stopping, independently of any early stopping criterion heuristics.
Added in version 1.1.
- return_code : bool, default=True
Whether to also return the code U or just the dictionary V.
- dict_init : ndarray of shape (n_components, n_features), default=None
Initial values for the dictionary for warm restart scenarios. If None, the initial values for the dictionary are created with an SVD decomposition of the data via randomized_svd.
- callback : callable, default=None
A callable that gets invoked at the end of each iteration.
- batch_size : int, default=256
The number of samples to take in each batch.
Changed in version 1.3: The default value of batch_size changed from 3 to 256 in version 1.3.
- verbose : bool, default=False
To control the verbosity of the procedure.
- shuffle : bool, default=True
Whether to shuffle the data before splitting it in batches.
- n_jobs : int, default=None
Number of parallel jobs to run.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
- method : {'lars', 'cd'}, default='lars'
'lars': uses the least angle regression method to solve the lasso problem (linear_model.lars_path).
'cd': uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso).
Lars will be faster if the estimated components are sparse.
- random_state : int, RandomState instance or None, default=None
Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.
- positive_dict : bool, default=False
Whether to enforce positivity when finding the dictionary.
Added in version 0.20.
- positive_code : bool, default=False
Whether to enforce positivity when finding the code.
Added in version 0.20.
- method_max_iter : int, default=1000
Maximum number of iterations to perform when solving the lasso problem.
Added in version 0.22.
- tol : float, default=1e-3
Control early stopping based on the norm of the differences in the dictionary between 2 steps.
To disable early stopping based on changes in the dictionary, set tol to 0.0.
Added in version 1.1.
- max_no_improvement : int, default=10
Control early stopping based on the number of consecutive mini-batches that do not yield an improvement on the smoothed cost function.
To disable convergence detection based on the cost function, set max_no_improvement to None.
Added in version 1.1.
- Returns:
- code : ndarray of shape (n_samples, n_components)
The sparse code (only returned if return_code=True).
- dictionary : ndarray of shape (n_components, n_features)
The solutions to the dictionary learning problem.
- n_iter : int
Number of iterations run. Returned only if return_n_iter is set to True.
See also
dict_learning : Solve a dictionary learning matrix factorization problem.
DictionaryLearning : Find a dictionary that sparsely encodes data.
MiniBatchDictionaryLearning : A faster, less accurate, version of the dictionary learning algorithm.
SparsePCA : Sparse Principal Components Analysis.
MiniBatchSparsePCA : Mini-batch Sparse Principal Components Analysis.
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning_online
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> U, V = dict_learning_online(
...     X, n_components=15, alpha=0.2, max_iter=20, batch_size=3, random_state=42
... )
We can check the level of sparsity of U:
>>> np.mean(U == 0)
np.float64(0.53)
We can compare the average squared Euclidean norm of the reconstruction error of the sparse coded signal relative to the squared Euclidean norm of the original signal:
>>> X_hat = U @ V
>>> np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))
np.float64(0.053)
