uwot 0.2.4

Bug fixes and minor improvements

The installation status of optional dependencies was not being detected correctly. This meant that different packages could be used for initialization in unpredictable ways depending on whether they had been explicitly loaded or not. Thank you hsuknowledge for the report (https://github.com/jlmelville/uwot/issues/134).
Users of the bbknnR package were suffering from an unhelpful error message when the custom neighbor data contained missing values. An explicit check has been added and although this is still a fatal error, the message should be more informative (https://github.com/jlmelville/uwot/issues/135).
Fixed a partially-specified parameter name being passed to irlba. Thank you Hugo Gruson for the fix (https://github.com/jlmelville/uwot/pull/136).
Development-only: fixed an incorrect use of testthat::expect in a unit test. Thank you Hadley Wickham for the fix (https://github.com/jlmelville/uwot/pull/138).

uwot 0.2.3

New features:

New parameter: rng_type. This will be used in favor of the boolean pcg_rand parameter, although pcg_rand will still work for backwards compatibility.
New negative sampling option: set rng_type = "deterministic" to use a deterministic sampling of vertices during the optimization phase. This should give qualitatively similar results to using a real PRNG, but has the advantage of being faster and giving more reproducible output. This feature was inspired by a comment by Leland McInnes on Reddit.
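
As a minimal sketch of the new option, assuming uwot 0.2.3 or later is installed and using the built-in iris data purely for illustration, the deterministic sampler can be selected like this:

```r
library(uwot)

# rng_type = "deterministic" swaps the PRNG used for negative sampling
# for a deterministic choice of vertices during optimization.
set.seed(42)
iris_det <- umap(iris, rng_type = "deterministic", verbose = FALSE)

# One row per observation, two output dimensions by default
dim(iris_det)
```

The set.seed call is still included because other stages, such as initialization, may draw random numbers.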

Bug fixes and minor improvements

Setting num_threads directly in umap2 did not result in the number of SGD threads being updated to that value when batch = TRUE, which it should have been.
Despite assertions to the contrary in version 0.2.1, umap_transform continued to return the fuzzy graph in transposed form. Thank you PedroMilanezAlmeida for reopening the issue (https://github.com/jlmelville/uwot/issues/118).
Relative paths could not be used to save a model. Thank you Wouter van der Bijl for the bug report (https://github.com/jlmelville/uwot/issues/131) and the suggested fix.
repulsion_strength was silently ignored if used with tumap or umap2 with a = 1, b = 1. Ignoring the setting was on purpose, but it was not documented anywhere. repulsion_strength is now compatible with these settings.
It’s no longer an error to provide a pca argument if the input data has a maximum rank smaller than the value of pca. No PCA is applied in this case. If verbose = TRUE, a message will be printed to inform the user.

uwot 0.2.2

Bug fixes and minor improvements

RSpectra is now a required dependency (again). It was a required dependency up until version 0.1.12, when it became optional (irlba was used in its place). However, problems with interactions of the current version of irlba with an ABI change in the Matrix package mean that it’s hard for downstream packages and users to build uwot without re-installing Matrix and irlba from source, which may not be an option for some people. Also it was causing a CRAN check error. I have changed some tests, examples and vignettes to use RSpectra explicitly, and to only test irlba code-paths where necessary. See https://github.com/jlmelville/uwot/issues/115 and links therein for more details.

uwot 0.2.1

New features:

The HNSW approximate nearest neighbor search algorithm is now supported via the RcppHNSW package. Set nn_method = "hnsw" to use it. The behavior of the method can be controlled by the new nn_args parameter, a list which may contain M, ef_construction and ef. See the hnswlib library’s ALGO_PARAMS documentation for details on these parameters. Although typically faster than Annoy (for a given accuracy), be aware that the only supported metric values are "euclidean", "cosine" and "correlation". Finally, RcppHNSW is only a suggested package, not a requirement, so you need to install it yourself (e.g. via install.packages("RcppHNSW")). Also see the article on HNSW in uwot in the documentation.
The nearest neighbor descent approximate nearest neighbor search algorithm is now supported via the rnndescent package. Set nn_method = "nndescent" to use it. The behavior of the method can be controlled by the new nn_args parameter. There are many supported metrics and possible parameters that can be set in nn_args, so please see the article on nearest neighbor descent in uwot in the documentation, and also the rnndescent package’s documentation for details. rnndescent is only a suggested package, not a requirement, so you need to install it yourself (e.g. via install.packages("rnndescent")).
New function: umap2, which acts like umap but with modified defaults, reflecting my experience with UMAP and correcting some small mistakes. See the umap2 article for more details.
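
The HNSW option described above might be used as in the following sketch. It assumes uwot 0.2.1 or later; the nn_args values are illustrative choices rather than recommendations, and the call is guarded because RcppHNSW is only a suggested package:

```r
library(uwot)

# RcppHNSW is optional, so only run the HNSW search if it is installed
if (requireNamespace("RcppHNSW", quietly = TRUE)) {
  set.seed(42)
  iris_hnsw <- umap(
    iris,
    nn_method = "hnsw",
    # Illustrative values: M and ef_construction control index building,
    # ef controls the search; larger values trade speed for accuracy
    nn_args = list(M = 16, ef_construction = 200, ef = 50),
    verbose = FALSE
  )
}
```

Setting nn_method = "nndescent" follows the same pattern, with rnndescent as the package to check for.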

Bug fixes and minor improvements

init_sdev = "range" caused an error with a user-supplied init matrix.
Transforming new data with the correlation metric was actually using the cosine metric if you saved and reloaded the model. Thank you Holly Hall for the report and helpful detective work (https://github.com/jlmelville/uwot/issues/117).
umap_transform could fail if the new data to be transformed had the scaled:center and scaled:scale attributes set (e.g. from applying the scale function).
If you asked umap_transform to return the fuzzy graph (ret_extra = c("fgraph")), it was transposed when batch = TRUE, n_epochs = 0. Thank you PedroMilanezAlmeida for reporting (https://github.com/jlmelville/uwot/issues/118).
Setting n_sgd_threads = "auto" with umap_transform caused a crash.
A warning was being emitted due to not being specific enough about what dist class was meant, which may have been particularly affecting Seurat users. Thank you AndiMunteanu for reporting (and suggesting a solution) (https://github.com/jlmelville/uwot/issues/121).

uwot 0.1.16

Bug fixes and minor improvements

A small change to a header file was required to fully support the next version of RcppAnnoy. Thank you Dirk Eddelbuettel for the PR (https://github.com/jlmelville/uwot/issues/112).

uwot 0.1.15

New features:

New function: optimize_graph_layout. Use this to produce optimized output coordinates that reflect an input similarity graph (such as that produced by the similarity_graph function). similarity_graph followed by optimize_graph_layout is the same as running umap, so the purpose of these functions is to allow for more flexibility and decoupling between generating the nearest neighbor graph and optimizing the low-dimensional approximation to it. Based on a request by user Chengwei94 (https://github.com/jlmelville/uwot/issues/98).
New functions: simplicial_set_union and simplicial_set_intersect. These allow for the combination of different fuzzy graph representations of a dataset into a single fuzzy graph using the UMAP simplicial set operations. Based on a request in the Python UMAP issues tracker by user Dhar xion.
New parameter for umap_transform: ret_extra. This works like the equivalent parameter for umap, and should be a character vector specifying the extra information you would like returned in addition to the embedding, in which case a list will be returned with an embedding member containing the optimized coordinates. Supported values are "fgraph", "nn", "sigma" and "localr". Based on a request by user PedroMilanezAlmeida (https://github.com/jlmelville/uwot/issues/104).
New parameter for umap, tumap and umap_transform: seed. This will do the equivalent of calling set.seed internally, and hence will help with reproducibility. The chosen seed is exported if ret_model = TRUE and umap_transform will use that seed if present, so you only need to specify it in umap_transform if you want to change the seed. The default behavior remains to not modify the random number state. Based on a request by SuhasSrinivasan (https://github.com/jlmelville/uwot/issues/110).
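
The two-step workflow described above (similarity_graph followed by optimize_graph_layout) can be sketched as follows. The 30-row iris subset and n_neighbors = 10 are arbitrary choices made to keep the example small:

```r
library(uwot)

# A small illustrative subset: 10 observations from each species
iris30 <- iris[c(1:10, 51:60, 101:110), ]

set.seed(42)

# Step 1: build the fuzzy similarity graph only; no embedding is produced
iris30_graph <- similarity_graph(iris30, n_neighbors = 10, verbose = FALSE)

# Step 2: optimize a 2D layout of that graph. X is passed so that
# initialization methods that need the input data remain available
iris30_layout <- optimize_graph_layout(iris30_graph, X = iris30, verbose = FALSE)

dim(iris30_layout)
```

Keeping the graph as a separate object means it can be inspected, modified or combined with another graph before any layout is computed.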

Bug fixes and minor improvements

A new setting for init_sdev: set init_sdev = "range" and initial coordinates will be range-scaled so each column takes values between 0-10. This pre-processing was added to the Python UMAP package at some point after uwot began development and so should probably always be used with the default init = "spectral" setting. However, it is not set by default to maintain backwards compatibility with older versions of uwot.
ret_extra = c("sigma") is now supported by lvish. The Gaussian bandwidths are returned in a sigma vector. In addition, a vector of intrinsic dimensionalities estimated for each point using an analytical expression of the finite difference method given by Lee and co-workers is returned in the dint vector.
The min_dist and spread parameters are now returned in the model when umap is run with ret_model = TRUE. This is just for documentation purposes; these values are not used directly by the model in umap_transform. If the parameters a and b are set directly when invoking umap, then both min_dist and spread will be set to NULL in the returned model. This feature was added in response to a question from kjiang18 (https://github.com/jlmelville/uwot/issues/95).
Some new checks for NA values in input data have been added. Also a warning will be emitted if n_components seems to have been set too high.
If n_components was greater than n_neighbors then umap_transform would crash the R session. Thank you to ChVav for reporting this (https://github.com/jlmelville/uwot/issues/102).
Using umap_transform with a model where dens_scale was set could cause a segmentation fault, destroying the session. Even if it didn’t, it could give an entirely artifactual “ring” structure. Thank you FemkeSmit for reporting this and providing assistance in diagnosing the underlying cause (https://github.com/jlmelville/uwot/issues/103).
If you set binary_edge_weights = TRUE, this setting was not exported when ret_model = TRUE, and was therefore not respected by umap_transform. This has now been fixed, but you will need to regenerate any models that used binary edge weights.
The rdoc for the init param said that if there were multiple disconnected components, a spectral initialization would attempt to merge multiple sub-graphs. Not true: actually, spectral initialization is abandoned in favor of PCA. The documentation has been updated to reflect the true state of affairs. No idea what I was thinking of there.
load_model and save_model didn’t work on Windows 7 due to how the version of tar there handles drive letters. Thank you mytarmail for the report (https://github.com/jlmelville/uwot/issues/109).
Warn if the initial coordinates have a very large scale (a standard deviation > 10.0), because this can lead to small gradients and poor optimization. Thank you SuhasSrinivasan for the report (https://github.com/jlmelville/uwot/issues/110).
A change to accommodate a forthcoming version of RcppAnnoy. Thank you Dirk Eddelbuettel for the PR (https://github.com/jlmelville/uwot/issues/111).

uwot 0.1.14

New features

New function: similarity_graph. If you are more interested in the high-dimensional graph/fuzzy simplicial set representation of your input data, and don’t care about the low dimensional approximation, the similarity_graph function offers a similar API to umap, but neither the initialization nor optimization of low-dimensional coordinates will be performed. The return value is the same as that which would be returned in the results list as the fgraph member if you had provided ret_extra = c("fgraph"). Compared to getting the same result via running umap, this function is a bit more convenient to use, makes your intention clearer if you would be discarding the embedding, and saves a small amount of time. A t-SNE/LargeVis similarity graph can be returned by setting method = "largevis".

Bug fixes and minor improvements

If a model was generated without using pre-generated nearest neighbors, you couldn’t use umap_transform with pre-generated nearest neighbors (also the error message was completely useless). Thank you to AustinHartman for reporting this (https://github.com/jlmelville/uwot/issues/97).

uwot 0.1.13

This is a resubmission of 0.1.12 but with an internal function (fuzzy_simplicial_set) refactored to behave more like that of previous versions. This change was breaking the behavior of the CRAN package bbknnR.

uwot 0.1.12

New features

New parameter: dens_weight. If set to a value between 0 and 1, an attempt is made to include the relative local densities of the input data in the output coordinates. This is an approximation to the densMAP method. A large value of dens_weight will use a larger range of output densities to reflect the input data. If the data is too spread out, reduce the value of dens_weight. For more information see the documentation at the uwot repo.
New parameter: binary_edge_weights. If set to TRUE, instead of smoothed knn distances, non-zero edge weights all have a value of 1. This is how PaCMAP works and there are practical and theoretical reasons to believe this won’t have a big effect on UMAP but you can try it yourself.
New options for ret_extra:
- "sigma": the return value will contain a sigma entry, a vector of the smooth knn distance scaling normalization factors, one for each observation in the input data. A small value indicates a high density of points in the local neighborhood of that observation. For lvish the equivalent bandwidths calculated for the input perplexity are returned.
- also, a vector rho will be exported, which is the distance to the nearest neighbor after the number of neighbors specified by the local_connectivity. Only applies for umap and tumap.
- "localr": exports a vector of the local radii, the sum of sigma and rho, which is used to scale the output coordinates when dens_weight is set. Even if not using dens_weight, visualizing the output coordinates using a color scale based on the value of localr can reveal regions of the input data with different densities.
For functions umap and tumap only: new data type for precomputed nearest neighbor data passed as the nn_method parameter: you may use a sparse distance matrix of format dgCMatrix with dimensions N x N where N is the number of observations in the input data. Distances should be arranged by column, i.e. a non-zero entry in row j of the ith column indicates that the jth observation in the input data is a nearest neighbor of the ith observation with the distance given by the value of that element. Note that this is a different format to the sparse distance matrix that can be passed as input to X: notably, the matrix is not assumed to be symmetric. Unlike other input formats, you may have a different number of neighbors for each observation (but there must be at least one neighbor defined per observation).
umap_transform can also take a sparse distance matrix as its nn_method parameter if precomputed nearest neighbor data is used to generate an initial model. The format is the same as for the nn_method with umap. Because distances are arranged by columns, the expected dimensions of the sparse matrix are N_model x N_new where N_model is the number of observations in the original data and N_new is the number of observations in the data to be transformed.

Bug fixes and minor improvements

Models couldn’t be re-saved after loading. Thank you to ilyakorsunsky for reporting this (https://github.com/jlmelville/uwot/issues/88).
RSpectra is now a ‘Suggests’, rather than an ‘Imports’. If you have RSpectra installed, it will be used automatically where previous versions required it (for spectral initialization). Otherwise, irlba will be used. For two-dimensional output, you are unlikely to notice much difference in speed or accuracy with real-world data. For highly-structured simulation datasets (e.g. spectral initialization of a 1D line) then RSpectra will give much better, faster initializations, but these are not the typical use cases envisaged for this package. For embedding into higher dimensions (e.g. n_components = 100 or higher), RSpectra is recommended and will likely out-perform irlba even if you have installed a good linear algebra library.
init = "laplacian" returned the wrong coordinates because of a slightly subtle issue around how to order the eigenvectors when using the random walk transition matrix rather than normalized graph laplacians.
The init_sdev parameter was ignored when the init parameter was a user-supplied matrix. Now the input will be scaled.
Matrix input was being converted to and from a data frame during pre-processing, causing R to allocate memory that it was disinclined to ever give up even after the function exited. This unnecessary manipulation is now avoided.
The behavior of the bandwidth parameter has been changed to give results more like the current version (0.5.2) of the Python UMAP implementation. This is likely to be a breaking change for non-default settings of bandwidth, but this is not a parameter which is actually exposed by the Python UMAP public API any more, so is on the road to deprecation in uwot too and I don’t recommend you change this.
Transforming data with multiple blocks would give an error if the number of rows of the new data did not equal the number of rows in the original data.

uwot 0.1.11

New features

New parameter: batch. If TRUE, then results are reproducible when n_sgd_threads > 1 (as long as you use set.seed). The price to be paid is that the optimization is slightly less efficient (because coordinates are not updated as quickly and hence gradients are staler for longer), so it is highly recommended to set n_epochs = 500 or higher. Thank you to Aaron Lun who not only came up with a way to implement this feature, but also wrote an entire C++ implementation of UMAP which does it (https://github.com/jlmelville/uwot/issues/83).
New parameter: opt_args. The default optimization method when batch = TRUE is Adam. You can control its parameters by passing them in the opt_args list. As Adam is a momentum-based method it requires extra storage of previous gradient data. To avoid the extra memory overhead you can also use opt_args = list(method = "sgd") to use a stochastic gradient descent method like that used when batch = FALSE.
New parameter: epoch_callback. You may now pass a function which will be invoked at the end of each epoch. Mainly useful for producing an image of the state of the embedding at different points during the optimization. This is another feature taken from umappp.
New parameter: pca_method, used when the pca parameter is supplied to reduce the initial dimensionality of the data. This controls which method is used to carry out the PCA and can be set to one of:
- "irlba" which uses irlba::irlba to calculate a truncated SVD. If this routine deems that you are trying to extract 50% or more of the singular vectors, you will see a warning to that effect logged to the console.
- "rsvd", which uses irlba::svdr for truncated SVD. This method uses a small number of iterations which should give an accuracy/speed up trade-off similar to that of the scikit-learn TruncatedSVD method. This can be much faster than using "irlba" but potentially at a cost in accuracy. However, for the purposes of dimensionality reduction as input to nearest neighbor search, this doesn’t seem to matter much.
- "bigstatsr", which uses the bigstatsr package. Note that this is not a dependency of uwot. If you want to use bigstatsr, you must install it yourself. On platforms without easy access to fast linear algebra libraries (e.g. Windows), using bigstatsr may give a speed up to PCA calculations.
- "svd", which uses base::svd. Warning: this is likely to be very slow for most datasets and exists as a fallback for small datasets where the "irlba" method would print a warning.
- "auto" (the default) which uses "irlba" to calculate a truncated SVD, unless you are attempting to extract 50% or more of the singular vectors, in which case "svd" is used.
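
A rough sketch of how batch, opt_args and epoch_callback might be combined is shown below. The callback name count_epochs and the counter epochs_seen are hypothetical, and the callback is assumed to take the arguments (epoch, n_epochs, coords) as described in the umap help page:

```r
library(uwot)

# Hypothetical callback: counts how many times it is invoked. coords holds
# the embedding at the end of the epoch and could be plotted or saved instead
epochs_seen <- 0
count_epochs <- function(epoch, n_epochs, coords) {
  epochs_seen <<- epochs_seen + 1
}

set.seed(42)
iris_batch <- umap(
  iris,
  batch = TRUE,
  n_epochs = 500,                   # a higher epoch count is advised with batch = TRUE
  opt_args = list(method = "sgd"),  # avoid Adam's extra gradient storage
  epoch_callback = count_epochs,
  verbose = FALSE
)

epochs_seen
```

Leaving out opt_args keeps the default Adam optimizer for batch mode.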

Bug fixes and minor improvements

If row names are provided in the input data (or nearest neighbor data, or initialization data if it’s a matrix), this will be used to name the rows of the output embedding (https://github.com/jlmelville/uwot/issues/81), and also the nearest neighbor data if you set ret_nn = TRUE. If the names exist in more than one of the input data parameters listed above, but are inconsistent, no guarantees are made about which names will be used. Thank you jwijffels for reporting this.
In umap_transform, the learning rate is now down-scaled by a factor of 4, consistent with the Python implementation of UMAP. If you need the old behavior back, use the (newly added) learning_rate parameter in umap_transform to set it explicitly. If you used the default value in umap when creating the model, the correct setting in umap_transform is learning_rate = 1.0.
Setting nn_method = "annoy" and verbose = TRUE would lead to an error with datasets with fewer than 50 items in them.
Using multiple pre-computed nearest neighbors blocks is now supported with umap_transform (this was incorrectly documented to work).
Documentation around pre-calculated nearest neighbor data for umap_transform was wrong in other ways: it has now been corrected to indicate that there should be neighbor data for each item in the test data, but the neighbors and distances should refer to items in training data (i.e. the data used to build the model).
n_neighbors parameter is now correctly ignored in model generation if pre-calculated nearest neighbor data is provided.
Documentation incorrectly said grain_size didn’t do anything.

uwot 0.1.10

This release is mainly to allow for some internal changes to keepcompatibility with RcppAnnoy, used for the nearest neighborcalculations.

Bug fixes and minor improvements

Passing in data with missing values will now raise an error early. Missing data in factor columns intended for supervised UMAP is still ok. Thank you David McGaughey for tweeting about this issue.
The documentation for the return value of umap and tumap now notes that the contents of the model list are subject to change and not intended to be part of the uwot public API. I recommend not relying on the structure of the model, especially if your package is intended to appear on CRAN or Bioconductor, as any breakages will delay future releases of uwot to CRAN.

uwot 0.1.9

New features

New metric: metric = "correlation" a distance based on the Pearson correlation (https://github.com/jlmelville/uwot/issues/22). Supporting this required a change to the internals of how nearest neighbor data is stored. Backwards compatibility with models generated by previous versions using ret_model = TRUE should have been preserved.

Bug fixes and minor improvements

New parameter, nn_method, for umap_transform: pass a list containing pre-computed nearest neighbor data (identical to that used in the umap function). You should not pass anything to the X parameter in this case. This extends the functionality for transforming new points to the case where nearest neighbor data between the original data and new data can be calculated external to uwot. Thanks to Yuhan Hao for contributing the PR (https://github.com/jlmelville/uwot/issues/63 and https://github.com/jlmelville/uwot/issues/64).
New parameter, init, for umap_transform: provides a variety of options for initializing the output coordinates, analogously to the same parameter in the umap function (but without as many options currently). This is intended to replace init_weighted, which should be considered deprecated, but won’t be removed until uwot 1.0 (whenever that is). Instead of init_weighted = TRUE, use init = "weighted"; replace init_weighted = FALSE with init = "average". Additionally, you can pass a matrix to init to act as the initial coordinates.
Also in umap_transform: previously, setting n_epochs = 0 was ignored: at least one iteration of optimization was applied. Now, n_epochs = 0 is respected, and will return the initialized coordinates without any further optimization.
Minor performance improvement for single-threaded nearest neighbor search when verbose = TRUE: the progress bar calculations were taking up a detectable amount of time; this has now been fixed. With very small data sets (< 50 items) the progress bar will no longer appear when building the index.
Passing a sparse distance matrix as input now supports upper/lower triangular matrix storage rather than wasting storage using an explicitly symmetric sparse matrix.
Minor license change: uwot used to be licensed under GPL-3 only; now it is GPL-3 or later.
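
The umap_transform changes above (the init parameter and the n_epochs = 0 behavior) can be illustrated with the following sketch. The train/test split of iris is an arbitrary choice for demonstration:

```r
library(uwot)

# Arbitrary split: 40 training and 10 test observations per species
iris_train <- iris[c(1:40, 51:90, 101:140), ]
iris_test <- iris[c(41:50, 91:100, 141:150), ]

set.seed(42)
model <- umap(iris_train, ret_model = TRUE, verbose = FALSE)

# init = "weighted" replaces the deprecated init_weighted = TRUE;
# n_epochs = 0 returns the initialized coordinates without optimization
test_init <- umap_transform(iris_test, model, init = "weighted", n_epochs = 0)

# init = "average" replaces init_weighted = FALSE; the default number
# of epochs is used here
test_avg <- umap_transform(iris_test, model, init = "average")

dim(test_init)
```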

uwot 0.1.8

Bug fixes and minor improvements

Default for n_threads is now NULL to provide a bit more protection from changing dependencies.
Parallel code now uses the standard C++11 implementation of threading rather than tinythread++.
The grain_size parameter has been undeprecated. As the version that deprecated this never made it to CRAN, this is unlikely to have affected many people.

uwot 0.1.7

Bug fixes and minor improvements

uwot should no longer trigger undefined behavior in sanitizers, due to the temporary replacement of the RcppParallel package with code “borrowed” from that package and using tinythread++ rather than tbb (https://github.com/jlmelville/uwot/issues/52).
Further sanitizer improvements in the nearest neighbor search code due to the upstream efforts of erikbern and eddelbuettel (https://github.com/jlmelville/uwot/issues/50).
The grain_size parameter is now ignored and remains to avoid breaking backwards compatibility only.

uwot 0.1.6

New features

New parameter, ret_extra, a vector which can contain any combination of: "model" (same as ret_model = TRUE), "nn" (same as ret_nn = TRUE) and "fgraph" (see below).
New return value data: If the ret_extra vector contains "fgraph", the returned list will contain an fgraph item representing the fuzzy simplicial input graph as a sparse N x N matrix. For lvish, use "P" instead of "fgraph" (https://github.com/jlmelville/uwot/issues/47). Note that there is a further sparsifying step where edges with a very low membership are removed if there is no prospect of the edge being sampled during optimization. This is controlled by n_epochs: the smaller the value, the more sparsifying will occur. If you are only interested in the fuzzy graph and not the embedded coordinates, set n_epochs = 0.
New function: unload_uwot, to unload the Annoy nearest neighbor indices in a model. This prevents the model from being used in umap_transform, but allows for the temporary working directory created by both save_uwot and load_uwot to be deleted. Previously, both load_uwot and save_uwot were attempting to delete the temporary working directories they used, but would always silently fail because Annoy is making use of files in those directories.
An attempt has been made to reduce the variability of results due to different compiler and C++ library versions on different machines. Visually results are unchanged in most cases, but this is a breaking change in terms of numerical output. The best chance of obtaining floating point determinism across machines is to use init = "spca", fixed values of a and b (rather than allowing them to be calculated through setting min_dist and spread) and approx_pow = TRUE. Using the tumap method with init = "spca" is probably the most robust approach.
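
As an illustration of ret_extra, the sketch below asks for the nearest neighbor data and the fuzzy graph while skipping optimization with n_epochs = 0. It assumes the default n_neighbors = 15 and the default Euclidean metric, which is why the neighbor data is looked up under the name euclidean:

```r
library(uwot)

set.seed(42)
res <- umap(iris, ret_extra = c("nn", "fgraph"), n_epochs = 0, verbose = FALSE)

# The result is now a list rather than a bare matrix
names(res)

# Fuzzy simplicial graph as a sparse N x N matrix
dim(res$fgraph)

# Neighbor indices: one row per observation, one column per neighbor
dim(res$nn$euclidean$idx)
```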

Bug fixes and minor improvements

New behavior when n_epochs = 0. This used to behave like (n_epochs = NULL) and gave a default number of epochs (dependent on the number of vertices in the dataset). Now it more usefully carries out all calculations except optimization, so the returned coordinates are those specified by the init parameter, so this is an easy way to access e.g. the spectral or PCA initialization coordinates. If you want the input fuzzy graph (ret_extra vector contains "fgraph"), this will also prevent the graph having edges with very low membership being removed. You still get the old default epochs behavior by setting n_epochs = NULL or to a negative value.
save_uwot and load_uwot have been updated with a verbose parameter so it’s easier to see what temporary files are being created.
save_uwot has a new parameter, unload, which if set to TRUE will delete the working directory for you, at the cost of unloading the model, i.e. it can’t be used with umap_transform until you reload it with load_uwot.
save_uwot now returns the saved model with an extra field, mod_dir, which points to the location of the temporary working directory, so you should now assign the result of calling save_uwot to the model you saved, e.g. model <- save_uwot(model, "my_model_file"). This field is intended for use with unload_uwot.
load_uwot also returns the model with a mod_dir item for use with unload_uwot.
save_uwot and load_uwot were not correctly handling relative paths.
A previous bug fix to load_uwot in uwot 0.1.4 to work with newer versions of RcppAnnoy (https://github.com/jlmelville/uwot/issues/31) failed in the typical case of a single metric for the nearest neighbor search using all available columns, giving an error message along the lines of: Error: index size <size> is not a multiple of vector size <size>. This has now been fixed, but required changes to both save_uwot and load_uwot, so existing saved models must be regenerated. Thank you to reporter OuNao.
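
The save, unload and reload cycle described above might look like the following sketch. The training and test subsets and the temporary file name are arbitrary choices for illustration:

```r
library(uwot)

iris_train <- iris[c(1:10, 51:60), ]
iris_test <- iris[100:110, ]

set.seed(42)
model <- umap(iris_train, ret_model = TRUE, n_epochs = 20, verbose = FALSE)

# Assign the result so the model picks up the mod_dir field
model_file <- tempfile("iris_umap")
model <- save_uwot(model, file = model_file)

# The model is still usable after saving
emb_before <- umap_transform(iris_test, model)

# Unload the nearest neighbor index and clean up the working directory;
# the saved model file itself is not touched
unload_uwot(model)

# Reload from file to use the model again
model2 <- load_uwot(file = model_file)
emb_after <- umap_transform(iris_test, model2)

unload_uwot(model2)
```

Passing unload = TRUE to save_uwot combines the save and unload steps when the model is not needed again in the current session.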

uwot 0.1.5

Bug fixes and minor improvements

The R API was being accessed from inside multi-threaded code to seed the (non-R) random number generators. Probably this was causing users in downstream projects (seurat and monocle) to experience strange RcppParallel-related crashes. Thanks to aldojongejan for reporting this (https://github.com/jlmelville/uwot/issues/39).
Passing a floating point value smaller than one to n_threads caused a crash. This was particularly insidious if running with a system with only one default thread available as the default n_threads becomes 0.5. Now n_threads (and n_sgd_threads) are rounded to the nearest integer.
Initialization of supervised UMAP should now be faster (https://github.com/jlmelville/uwot/issues/34). Contributed by Aaron Lun.

uwot 0.1.4

Bug fixes and minor improvements

Fixed incorrect loading of Annoy indexes to be compatible with newer versions of RcppAnnoy (https://github.com/jlmelville/uwot/issues/31). My thanks to Dirk Eddelbuettel and Erik Bernhardsson for aid in identifying the problem.
Fix for ERROR: there is already an InterruptableProgressMonitor instance defined.
If verbose = TRUE, the a, b curve parameters are now logged.

uwot 0.1.3

Bug fixes and minor improvements

Fixed an issue where the session would crash if the Annoy nearest neighbor search was unable to find k neighbors for an item.

Known issue

Even with a fix for the bug mentioned above, if the nearest neighbor index file is larger than 2GB in size, Annoy may not be able to read the data back in. This should only occur with very large or high-dimensional datasets. The nearest neighbor search will fail under these conditions. A work-around is to set n_threads = 0, because the index will not be written to disk and re-loaded under these circumstances, at the cost of a longer search time. Alternatively, set the pca parameter to reduce the dimensionality or lower n_trees, both of which will reduce the size of the index on disk. However, either may lower the accuracy of the nearest neighbor results.

uwot 0.1.2

Initial CRAN release.

New features

New parameter, tmpdir, which allows the user to specify the temporary directory where nearest neighbor indexes will be written during Annoy nearest neighbor search. The default is base::tempdir(). Only used if n_threads > 1 and nn_method = "annoy".

Bug fixes and minor improvements

Fixed an issue with lvish where there was an off-by-one error when calculating input probabilities.
Added a safe-guard to lvish to prevent the gaussian precision, beta, becoming overly large when the binary search fails during perplexity calibration.
The lvish perplexity calibration uses the log-sum-exp trick to avoid numeric underflow if beta becomes large.

uwot 0.0.0.9010 (31 March 2019)

New features

New parameter: pcg_rand. If TRUE (the default), then a random number generator from the PCG family is used during the stochastic optimization phase. The old PRNG, a direct translation of an implementation of the Tausworthe “taus88” PRNG used in the Python version of UMAP, can be obtained by setting pcg_rand = FALSE. The new PRNG is slower, but is likely superior in its statistical randomness. This change in behavior will break backwards compatibility: you will now get slightly different results even with the same seed.
New parameter: fast_sgd. If TRUE, then the following combination of parameters is set: n_sgd_threads = "auto", pcg_rand = FALSE and approx_pow = TRUE. These will result in a substantially faster optimization phase, at the cost of being slightly less accurate and results not being exactly repeatable. fast_sgd = FALSE by default but if you are only interested in visualization, then fast_sgd gives perfectly good results. For more generic dimensionality reduction and reproducibility, keep fast_sgd = FALSE.
New parameter: init_sdev, which specifies how large the standard deviation of each column of the initial coordinates should be. This will scale any input coordinates (including user-provided matrix coordinates). init = "spca" can now be thought of as an alias of init = "pca", init_sdev = 1e-4. This may be too aggressive a scaling for some datasets. The typical UMAP spectral initializations tend to result in standard deviations of around 2 to 5, so a value in that range might be more appropriate in some cases. If spectral initialization detects multiple components in the affinity graph and falls back to scaled PCA, it uses init_sdev = 1.
As a result of adding init_sdev, the init options sspectral, slaplacian and snormlaplacian have been removed (they weren’t around for very long anyway). You can get the same behavior by e.g. init = "spectral", init_sdev = 1e-4. init = "spca" is sticking around because I use it a lot.
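
The effect of init_sdev on a set of initial coordinates can be pictured with a small base R sketch. The helper name scale_to_sdev is hypothetical and centering the columns first is an assumption of this sketch; it is not taken from the uwot source.

```r
# Rescale each column of X so that its standard deviation equals sdev.
# Centering the columns first is an assumption made for this sketch.
scale_to_sdev <- function(X, sdev = 1e-4) {
  X <- sweep(X, 2, colMeans(X))                   # subtract column means
  sweep(X, 2, apply(X, 2, stats::sd) / sdev, "/") # divide by sd / sdev
}

# Hypothetical initial coordinates with a spread of about 3 per column.
set.seed(42)
init <- matrix(rnorm(100 * 2, sd = 3), ncol = 2)

scaled <- scale_to_sdev(init, sdev = 1e-4)
col_sdevs <- apply(scaled, 2, stats::sd) # both columns now have sd 1e-4
```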

Bug fixes and minor improvements

Spectral initialization (the default) was sometimes generating coordinates that had too large a range, due to an erroneous scale factor that failed to account for negative coordinate values. This could give rise to embeddings with very noticeable outliers distant from the main clusters.
Also during spectral initialization, the amount of noise being added had a standard deviation an order of magnitude too large compared to the Python implementation (this probably didn’t make any difference though).
If requesting a spectral initialization, but multiple disconnected components are present, fall back to init = "spca".
Removed dependency on C++ <random> header. This breaks backwards compatibility even if you set pcg_rand = FALSE.
metric = "cosine" results were incorrectly using the unmodified Annoy angular distance.
Numeric matrix columns can be specified as the target for the categorical metric (fixes https://github.com/jlmelville/uwot/issues/20).

uwot 0.0.0.9009 (1 January 2019)

Data is now stored column-wise during optimization, which should result in an increase in performance for larger values of n_components (e.g. approximately 50% faster optimization time with MNIST and n_components = 50).
New parameter: pca_center, which controls whether to center the data before applying PCA. It would be typical to set this to FALSE if you are applying PCA to binary data (although note that you can’t use this setting with metric = "hamming").
PCA will now be used when the metric is "manhattan" and "cosine". It’s still not applied when using "hamming" (data still needs to be in binary format, not real-valued).
If using mixed datatypes, you may override the pca and pca_center parameter values for a given data block by using a list for the value of the metric, with the column ids/names as an unnamed item and the overriding values as named items, e.g. instead of manhattan = 1:100, use manhattan = list(1:100, pca_center = FALSE) to turn off PCA centering for just that block. This functionality exists mainly for the case where you have mixed binary and real-valued data and want to apply PCA to both data types. It’s normal to apply centering to real-valued data but not to binary data.
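
A per-block override might be put together as in the sketch below. The data, the block sizes and the choice of pca = 10 are hypothetical, and the umap call only runs if uwot is installed.

```r
# Hypothetical mixed data: 20 real-valued columns followed by 30 binary ones.
set.seed(42)
real_block <- matrix(rnorm(100 * 20), nrow = 100)
binary_block <- matrix(rbinom(100 * 30, size = 1, prob = 0.3), nrow = 100)
X <- cbind(real_block, binary_block)

# Column ids go in as the unnamed item, overriding values as named items.
mixed_metric <- list(
  euclidean = 1:20,                           # centered before PCA (default)
  manhattan = list(21:50, pca_center = FALSE) # no centering for binary block
)

if (requireNamespace("uwot", quietly = TRUE)) {
  embedding <- uwot::umap(X, metric = mixed_metric, pca = 10)
}
```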

Bug fixes and minor improvements

Fixed bug that affected umap_transform, where negative sampling was over the size of the test data (should be the training data).
Some other performance improvements (around 10% faster for the optimization stage with MNIST).
When verbose = TRUE, log the Annoy recall accuracy, which may help tune values of n_trees and search_k.

uwot 0.0.0.9008 (December 23 2018)

New features

New parameter: n_sgd_threads, which controls the number of threads used in the stochastic gradient descent. By default this is now single-threaded and should result in reproducible results when using set.seed. To get back the old, less consistent, but faster settings, set n_sgd_threads = "auto".
API change for consistency with Python UMAP:
- alpha is now learning_rate.
- gamma is now repulsion_strength.
Default spectral initialization now looks for disconnected components and initializes them separately (also applies to laplacian and normlaplacian).
New init options: sspectral, snormlaplacian and slaplacian. These are like spectral, normlaplacian, laplacian respectively, but scaled so that each dimension has a standard deviation of 1e-4. This is like the difference between the pca and spca options.
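
As a rough check of the reproducibility described above, two runs with the same seed can be compared. This is a sketch with hypothetical data and a hypothetical helper, embed; the values passed for the renamed learning_rate and repulsion_strength parameters are arbitrary, and the comparison only runs if uwot is installed.

```r
# Hypothetical stand-in data.
set.seed(42)
X <- matrix(rnorm(150 * 10), nrow = 150)

# Embed with a fixed seed, using the renamed parameters and
# single-threaded stochastic gradient descent.
embed <- function(seed) {
  set.seed(seed)
  uwot::umap(X, learning_rate = 1, repulsion_strength = 1, n_sgd_threads = 0)
}

if (requireNamespace("uwot", quietly = TRUE)) {
  # TRUE if the two runs produced the same coordinates.
  same <- isTRUE(all.equal(embed(1), embed(1)))
}
```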

Bug fixes and minor improvements

Fixed Hamming distance support (it was actually using Euclidean distance).
Smooth knn/perplexity calibration results had a small dependency on the number of threads used.
Anomalously long spectral initialization times should now be reduced.
Internal changes and fixes thanks to a code review by Aaron Lun.

uwot 0.0.0.9007 (December 9 2018)

New features

New parameter pca: set this to a positive integer to reduce a matrix or data frame to that number of columns using PCA. Only works if metric = "euclidean". If you have > 100 columns, this can substantially improve the speed of the nearest neighbor search. t-SNE implementations often set this value to 50.

Bug fixes and minor improvements

Laplacian Eigenmap initialization convergence failure is now correctly detected.
C++ code was over-writing data passed from R as a function argument.

uwot 0.0.0.9006 (December 5 2018)

New features

Highly experimental mixed data type support for metric: instead of specifying a single metric name (e.g. metric = "euclidean"), you can pass a list, where the name of each item is the metric to use and the value is a vector of the names of the columns to use with that metric, e.g. metric = list("euclidean" = c("A1", "A2"), "cosine" = c("B1", "B2", "B3")) treats columns A1 and A2 as one block, using the Euclidean distance to find nearest neighbors, whereas B1, B2 and B3 are treated as a second block, using the cosine distance.
Factor columns can also be used in the metric, using the metric name categorical.
y may now be a data frame or matrix if multiple target data is available.
New parameter target_metric, to specify the distance metric to use with numerical y. This has the same capabilities as metric.
Multiple external nearest neighbor data sources are now supported. Instead of passing a list of two matrices, pass a list of lists, one for each external metric.
More details on mixed data types can be found at https://github.com/jlmelville/uwot#mixed-data-types.
Compatibility with older versions of RcppParallel (contributed by sirusb).
New scaling option, scale = "Z", to Z-scale each column of input (synonym for scale = TRUE or scale = "scale").
New scaling option, scale = "colrange", to scale columns to the range (0, 1).
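
A mixed-metric specification like the one described above might look as follows. The data frame, its column names and the factor column grp are hypothetical, and the umap call only runs if uwot is installed.

```r
# Hypothetical data frame: two numeric blocks and one factor column.
set.seed(42)
df <- data.frame(
  A1 = rnorm(100), A2 = rnorm(100),
  B1 = runif(100), B2 = runif(100), B3 = runif(100),
  grp = factor(sample(c("a", "b", "c"), 100, replace = TRUE))
)

# Item names are the metrics, values are the columns each metric applies to.
metric <- list(
  euclidean   = c("A1", "A2"),
  cosine      = c("B1", "B2", "B3"),
  categorical = "grp"
)

if (requireNamespace("uwot", quietly = TRUE)) {
  embedding <- uwot::umap(df, metric = metric)
}
```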

uwot 0.0.0.9005 (November 4 2018)

New features

Hamming distance is now supported, due to upgrade to RcppAnnoy 0.0.11.

uwot 0.0.0.9004 (October 21 2018)

New features

For supervised UMAP with numeric y, you may pass nearest neighbor data directly, in the same format as that supported by X-related nearest neighbor data. This may be useful if you don’t want to use Euclidean distances for the y data, or if you have missing data (and have a way to assign nearest neighbors for those cases, obviously). See the Nearest Neighbor Data Format section for details.

uwot 0.0.0.9003 (September 22 2018)

New features

New parameter ret_nn: when TRUE returns nearest neighbor matrices as an nn list: indices in item idx and distances in item dist. Embedded coordinates are in embedding. Both ret_nn and ret_model can be TRUE, and should not cause any compatibility issues with supervised embeddings.
nn_method can now take precomputed nearest neighbor data. Must be a list of two matrices: idx, containing integer indexes, and dist containing distances. By no coincidence, this is the format returned by ret_nn.
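
Precomputed neighbor data in the idx/dist layout can be built by brute force for a small dataset, as in this sketch. The data and the choice of k are hypothetical, and treating each point as its own first neighbor is an assumption of the sketch; the umap call only runs if uwot is installed.

```r
# Hypothetical small dataset and neighborhood size.
set.seed(42)
X <- matrix(rnorm(100 * 5), nrow = 100)
k <- 10

# Exact Euclidean distances between all pairs of rows.
D <- unname(as.matrix(dist(X)))

# Row i of idx holds the k nearest rows to row i, nearest first
# (each point is its own nearest neighbor at distance 0).
idx <- t(apply(D, 1, order))[, seq_len(k)]

# Row i of dist_k holds the matching distances.
dist_k <- t(sapply(seq_len(nrow(D)), function(i) D[i, idx[i, ]]))

nn <- list(idx = idx, dist = dist_k)

if (requireNamespace("uwot", quietly = TRUE)) {
  embedding <- uwot::umap(X, nn_method = nn)
}
```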

Bug fixes and minor improvements

Embedding to n_components = 1 was broken (https://github.com/jlmelville/uwot/issues/6).
User-supplied matrices to init parameter were being modified, in defiance of basic R pass-by-copy semantics.

uwot 0.0.0.9002 (August 14 2018)

Bug fixes and minor improvements

metric = "cosine" is working again for n_threads greater than 0 (https://github.com/jlmelville/uwot/issues/5).

uwot 0.0.0.9001

New features

August 5 2018. You can now use an existing embedding to add new points via umap_transform. See the example section below.
August 1 2018. Numerical vectors are now supported for supervised dimension reduction.
July 31 2018. (Very) initial support for supervised dimension reduction: categorical data only at the moment. Pass in a factor vector (use NA for unknown labels) as the y parameter and edges with bad (or unknown) labels are down-weighted, hopefully leading to better separation of classes. This works remarkably well for the Fashion MNIST dataset.
July 22 2018. You can now use the cosine and Manhattan distances with the Annoy nearest neighbor search, via metric = "cosine" and metric = "manhattan", respectively. Hamming distance is not supported because RcppAnnoy doesn’t yet support it.
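
Supervised embedding with partially known labels, followed by adding new points, could be sketched like this. The train/test split of iris and the number of hidden labels are arbitrary choices for illustration, and the uwot calls only run if the package is installed.

```r
# iris ships with R; split it into training and test rows.
set.seed(42)
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, ]
test <- iris[-train_rows, ]

# Factor labels for the training data, with some marked as unknown.
y <- train$Species
y[sample(length(y), 20)] <- NA # NA marks an unknown label

if (requireNamespace("uwot", quietly = TRUE)) {
  # Supervised embedding of the training data, keeping the model.
  model <- uwot::umap(train[, 1:4], y = y, ret_model = TRUE)
  # Place the held-out points into the existing embedding.
  test_coords <- uwot::umap_transform(test[, 1:4], model)
}
```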
