`testthat::expect` in a unit test. Thank you Hadley Wickham for the fix (https://github.com/jlmelville/uwot/pull/138).
* `rng_type`: a new parameter, which will be used in favor of the boolean `pcg_rand` parameter, although `pcg_rand` will still work for backwards compatibility.
* Set `rng_type = "deterministic"` to use a deterministic sampling of vertices during the optimization phase. This should give qualitatively similar results to using a real PRNG, but has the advantage of being faster and giving more reproducible output. This feature was inspired by a comment by Leland McInnes on Reddit.
* Setting `num_threads` directly in `umap2` did not result in the number of SGD threads being updated to that value when `batch = TRUE`, which it should have.
* `umap_transform` continued to return the fuzzy graph in transposed form. Thank you PedroMilanezAlmeida for reopening the issue (https://github.com/jlmelville/uwot/issues/118).
* `repulsion_strength` was silently ignored if used with `tumap` or `umap2` with `a = 1, b = 1`. Ignoring the setting was on purpose, but it was not documented anywhere. `repulsion_strength` is now compatible with these settings.
* `pca` argument: if the input data has a maximum rank smaller than the value of `pca`, no PCA is applied. If `verbose = TRUE`, a message will be printed to inform the user.
* RSpectra is now a required dependency (again). It was a required dependency up until version 0.1.12, when it became optional (irlba was used in its place). However, problems with interactions between the current version of irlba and an ABI change in the Matrix package mean that it's hard for downstream packages and users to build uwot without re-installing Matrix and irlba from source, which may not be an option for some people. It was also causing a CRAN check error. I have changed some tests, examples and vignettes to use RSpectra explicitly, and to only test irlba code paths where necessary. See https://github.com/jlmelville/uwot/issues/115 and links therein for more details.
* Nearest neighbor search with HNSW, via the RcppHNSW package: set `nn_method = "hnsw"` to use it.
The behavior of the method can be controlled by the new `nn_args` parameter, a list which may contain `M`, `ef_construction` and `ef`. See the hnswlib library's ALGO_PARAMS documentation for details on these parameters. Although HNSW is typically faster than Annoy (for a given accuracy), be aware that the only supported `metric` values are `"euclidean"`, `"cosine"` and `"correlation"`. Finally, RcppHNSW is only a suggested package, not a requirement, so you need to install it yourself (e.g. via `install.packages("RcppHNSW")`). Also see the article on HNSW in uwot in the documentation.
* Nearest neighbor descent, via the rnndescent package: set `nn_method = "nndescent"` to use it. The behavior of the method can be controlled by the new `nn_args` parameter. There are many supported metrics and possible parameters that can be set in `nn_args`, so please see the article on nearest neighbor descent in uwot in the documentation, and also the rnndescent package's documentation for details. rnndescent is only a suggested package, not a requirement, so you need to install it yourself (e.g. via `install.packages("rnndescent")`).
* New function: `umap2`, which acts like `umap` but with modified defaults, reflecting my experience with UMAP and correcting some small mistakes. See the `umap2` article for more details.
* `init_sdev = "range"` caused an error with a user-supplied `init` matrix.
* The `correlation` metric was actually using the `cosine` metric if you saved and reloaded the model. Thank you Holly Hall for the report and helpful detective work (https://github.com/jlmelville/uwot/issues/117).
* `umap_transform` could fail if the new data to be transformed had the `scaled:center` and `scaled:scale` attributes set (e.g. from applying the `scale` function).
* If you asked `umap_transform` to return the fuzzy graph (`ret_extra = c("fgraph")`), it was transposed when `batch = TRUE, n_epochs = 0`. Thank you PedroMilanezAlmeida for reporting (https://github.com/jlmelville/uwot/issues/118).
* Setting `n_sgd_threads = "auto"` with `umap_transform` caused a crash.
* `dist` class was meant that may have been particularly affecting Seurat users.
Thank you Andi Munteanu for reporting (and suggesting a solution) (https://github.com/jlmelville/uwot/issues/121).
* New function: `optimize_graph_layout`. Use this to produce optimized output coordinates that reflect an input similarity graph (such as that produced by the `similarity_graph` function). Running `similarity_graph` followed by `optimize_graph_layout` is the same as running `umap`, so the purpose of these functions is to allow for more flexibility and decoupling between generating the nearest neighbor graph and optimizing the low-dimensional approximation to it. Based on a request by user Chengwei94 (https://github.com/jlmelville/uwot/issues/98).
* New functions: `simplicial_set_union` and `simplicial_set_intersect`. These allow for the combination of different fuzzy graph representations of a dataset into a single fuzzy graph using the UMAP simplicial set operations. Based on a request in the Python UMAP issues tracker by user Dhar xion.
* New parameter for `umap_transform`: `ret_extra`. This works like the equivalent parameter for `umap`, and should be a character vector specifying the extra information you would like returned in addition to the embedding, in which case a list will be returned with an `embedding` member containing the optimized coordinates. Supported values are `"fgraph"`, `"nn"`, `"sigma"` and `"localr"`. Based on a request by user PedroMilanezAlmeida (https://github.com/jlmelville/uwot/issues/104).
* New parameter for `umap`, `tumap` and `umap_transform`: `seed`. This will do the equivalent of calling `set.seed` internally, and hence will help with reproducibility. The chosen seed is exported if `ret_model = TRUE`, and `umap_transform` will use that seed if present, so you only need to specify it in `umap_transform` if you want to change the seed. The default behavior remains to not modify the random number state. Based on a request by Suhas Srinivasan (https://github.com/jlmelville/uwot/issues/110).
* New option for `init_sdev`: set `init_sdev = "range"` and initial coordinates will be range-scaled so each column takes values between 0 and 10.
This pre-processing was added to the Python UMAP package at some point after uwot began development, and so should probably always be used with the default `init = "spectral"` setting. However, it is not set by default, to maintain backwards compatibility with older versions of uwot.
* `ret_extra = c("sigma")` is now supported by `lvish`. The Gaussian bandwidths are returned in a `sigma` vector. In addition, a vector of intrinsic dimensionalities, estimated for each point using an analytical expression of the finite difference method given by Lee and co-workers, is returned in the `dint` vector.
* The `min_dist` and `spread` parameters are now returned in the model when `umap` is run with `ret_model = TRUE`. This is just for documentation purposes: these values are not used directly by the model in `umap_transform`. If the parameters `a` and `b` are set directly when invoking `umap`, then both `min_dist` and `spread` will be set to `NULL` in the returned model. This feature was added in response to a question from kjiang18 (https://github.com/jlmelville/uwot/issues/95).
* `n_components` seems to have been set too high.
* If `n_components` was greater than `n_neighbors`, then `umap_transform` would crash the R session. Thank you to ChVav for reporting this (https://github.com/jlmelville/uwot/issues/102).
* Using `umap_transform` with a model where `dens_scale` was set could cause a segmentation fault, destroying the session. Even if it didn't, it could give an entirely artifactual "ring" structure. Thank you Femke Smit for reporting this and providing assistance in diagnosing the underlying cause (https://github.com/jlmelville/uwot/issues/103).
* If you set `binary_edge_weights = TRUE`, this setting was not exported when `ret_model = TRUE`, and was therefore not respected by `umap_transform`. This has now been fixed, but you will need to regenerate any models that used binary edge weights.
* The documentation for the `init` parameter said that if there were multiple disconnected components, a spectral initialization would attempt to merge multiple sub-graphs. Not true: actually, spectral initialization is abandoned in favor of PCA.
The documentation has been updated to reflect the true state of affairs. No idea what I was thinking of there.
* `load_model` and `save_model` didn't work on Windows 7 due to how the version of `tar` there handles drive letters. Thank you mytarmail for the report (https://github.com/jlmelville/uwot/issues/109).
* New function: `similarity_graph`. If you are more interested in the high-dimensional graph/fuzzy simplicial set representation of your input data, and don't care about the low-dimensional approximation, the `similarity_graph` function offers a similar API to `umap`, but neither the initialization nor the optimization of low-dimensional coordinates will be performed. The return value is the same as that which would be returned in the results list as the `fgraph` member if you had provided `ret_extra = c("fgraph")`. Compared to getting the same result via running `umap`, this function is a bit more convenient to use, makes your intention clearer if you would be discarding the embedding, and saves a small amount of time. A t-SNE/LargeVis similarity graph can be returned by setting `method = "largevis"`.
* `umap_transform` with pre-generated nearest neighbors (also the error message was completely useless). Thank you to Austin Hartman for reporting this (https://github.com/jlmelville/uwot/issues/97).
* `fuzzy_simplicial_set`) refactored to behave more like that of previous versions. This change was breaking the behavior of the CRAN package bbknnR.
* New parameter: `dens_weight`. If set to a value between 0 and 1, an attempt is made to include the relative local densities of the input data in the output coordinates. This is an approximation to the densMAP method. A large value of `dens_weight` will use a larger range of output densities to reflect the input data. If the data is too spread out, reduce the value of `dens_weight`. For more information see the documentation at the uwot repo.
* New parameter: `binary_edge_weights`. If set to `TRUE`, instead of smoothed knn distances, non-zero edge weights all have a value of 1.
This is how PaCMAP works, and there are practical and theoretical reasons to believe this won't have a big effect on UMAP, but you can try it yourself.
* New options for `ret_extra`:
  * `"sigma"`: the return value will contain a `sigma` entry, a vector of the smooth knn distance scaling normalization factors, one for each observation in the input data. A small value indicates a high density of points in the local neighborhood of that observation. For `lvish`, the equivalent bandwidths calculated for the input perplexity are returned.
  * Also, `rho` will be exported, which is the distance to the nearest neighbor after the number of neighbors specified by `local_connectivity`. Only applies for `umap` and `tumap`.
  * `"localr"`: exports a vector of the local radii, the sum of `sigma` and `rho`, used to scale the output coordinates when `dens_weight` is set. Even if not using `dens_weight`, visualizing the output coordinates using a color scale based on the value of `localr` can reveal regions of the input data with different densities.
* For `umap` and `tumap` only: a new data type for precomputed nearest neighbor data passed as the `nn_method` parameter. You may use a sparse distance matrix of format `dgCMatrix` with dimensions `N x N`, where `N` is the number of observations in the input data. Distances should be arranged by column, i.e. a non-zero entry in row `j` of the `i`th column indicates that the `j`th observation in the input data is a nearest neighbor of the `i`th observation, with the distance given by the value of that element. Note that this is a different format to the sparse distance matrix that can be passed as input to `X`: notably, the matrix is not assumed to be symmetric. Unlike other input formats, you may have a different number of neighbors for each observation (but there must be at least one neighbor defined per observation).
* `umap_transform` can also take a sparse distance matrix as its `nn_method` parameter if precomputed nearest neighbor data was used to generate the initial model. The format is the same as for `nn_method` with `umap`.
Because distances are arranged by columns, the expected dimensions of the sparse matrix are `N_model x N_new`, where `N_model` is the number of observations in the original data and `N_new` is the number of observations in the data to be transformed.
* `n_components = 100` or higher), RSpectra is recommended and will likely out-perform irlba even if you have installed a good linear algebra library.
* `init = "laplacian"` returned the wrong coordinates because of a slightly subtle issue around how to order the eigenvectors when using the random walk transition matrix rather than normalized graph Laplacians.
* The `init_sdev` parameter was ignored when the `init` parameter was a user-supplied matrix. Now the input will be scaled.
* The `bandwidth` parameter has been changed to give results more like the current version (0.5.2) of the Python UMAP implementation. This is likely to be a breaking change for non-default settings of `bandwidth`, but this is not a parameter which is actually exposed by the Python UMAP public API any more, so it is on the road to deprecation in uwot too, and I don't recommend you change it.
* New parameter: `batch`. If `TRUE`, then results are reproducible when `n_sgd_threads > 1` (as long as you use `set.seed`). The price to be paid is that the optimization is slightly less efficient (because coordinates are not updated as quickly and hence gradients are staler for longer), so it is highly recommended to set `n_epochs = 500` or higher. Thank you to Aaron Lun, who not only came up with a way to implement this feature, but also wrote an entire C++ implementation of UMAP which does it (https://github.com/jlmelville/uwot/issues/83).
* New parameter: `opt_args`. The default optimization method when `batch = TRUE` is Adam. You can control its parameters by passing them in the `opt_args` list. As Adam is a momentum-based method, it requires extra storage of previous gradient data. To avoid the extra memory overhead you can also use `opt_args = list(method = "sgd")` to use a stochastic gradient descent method like that used when `batch = FALSE`.
* New parameter: `epoch_callback`.
You may now pass a function which will be invoked at the end of each epoch. This is mainly useful for producing an image of the state of the embedding at different points during the optimization. This is another feature taken from umappp.
* New parameter: `pca_method`, used when the `pca` parameter is supplied to reduce the initial dimensionality of the data. This controls which method is used to carry out the PCA and can be set to one of:
  * `"irlba"`, which uses `irlba::irlba` to calculate a truncated SVD. If this routine deems that you are trying to extract 50% or more of the singular vectors, you will see a warning to that effect logged to the console.
  * `"rsvd"`, which uses `irlba::svdr` for truncated SVD. This method uses a small number of iterations which should give an accuracy/speed-up trade-off similar to that of the scikit-learn TruncatedSVD method. This can be much faster than using `"irlba"`, but potentially at a cost in accuracy. However, for the purposes of dimensionality reduction as input to nearest neighbor search, this doesn't seem to matter much.
  * `"bigstatsr"`, which uses the bigstatsr package. Note: this is not a dependency of uwot. If you want to use bigstatsr, you must install it yourself. On platforms without easy access to fast linear algebra libraries (e.g. Windows), using bigstatsr may give a speed-up to PCA calculations.
  * `"svd"`, which uses `base::svd`. Warning: this is likely to be very slow for most datasets and exists as a fallback for small datasets where the `"irlba"` method would print a warning.
  * `"auto"` (the default), which uses `"irlba"` to calculate a truncated SVD, unless you are attempting to extract 50% or more of the singular vectors, in which case `"svd"` is used.
* `ret_nn = TRUE`. If the names exist in more than one of the input data parameters listed above, but are inconsistent, no guarantees are made about which names will be used. Thank you jwijffels for reporting this.
* In `umap_transform`, the learning rate is now down-scaled by a factor of 4, consistent with the Python implementation of UMAP.
If you need the old behavior back, use the (newly added) `learning_rate` parameter in `umap_transform` to set it explicitly. If you used the default value in `umap` when creating the model, the correct setting in `umap_transform` is `learning_rate = 1.0`.
* Setting `nn_method = "annoy"` and `verbose = TRUE` would lead to an error with datasets with fewer than 50 items in them.
* `umap_transform` (this was incorrectly documented to work).
* The documentation for pre-calculated nearest neighbor data in `umap_transform` was wrong in other ways: it has now been corrected to indicate that there should be neighbor data for each item in the test data, but the neighbors and distances should refer to items in the training data (i.e. the data used to build the model).
* The `n_neighbors` parameter is now correctly ignored in model generation if pre-calculated nearest neighbor data is provided.
* `grain_size` didn't do anything.

This release is mainly to allow for some internal changes to keep compatibility with RcppAnnoy, used for the nearest neighbor calculations.
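The corrected `umap_transform` documentation says pre-computed neighbor data needs one row per item in the test data, while the indices and distances refer to items in the training data. The base R sketch below builds such a list by brute-force Euclidean search, using the `idx`/`dist` list layout described for `nn_method` elsewhere in these notes. The helper `knn_to_train` is hypothetical and is not part of uwot.

```r
# Hypothetical helper: for each test row, find its k nearest training rows.
# Returns a list of two n_test x k matrices: idx holds 1-based row indices
# into the training data, dist holds the matching Euclidean distances.
knn_to_train <- function(train, test, k) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  n_test <- nrow(test)
  idx <- matrix(0L, nrow = n_test, ncol = k)
  dist <- matrix(0, nrow = n_test, ncol = k)
  for (i in seq_len(n_test)) {
    # distance from test item i to every training item
    d <- sqrt(colSums((t(train) - test[i, ])^2))
    nbrs <- order(d)[seq_len(k)]
    idx[i, ] <- nbrs
    dist[i, ] <- d[nbrs]
  }
  list(idx = idx, dist = dist)
}

train <- as.matrix(iris[1:100, 1:4])
test <- as.matrix(iris[101:150, 1:4])
nn <- knn_to_train(train, test, k = 15)
dim(nn$idx)  # 50 test items by 15 neighbors
```

A list like this is what the `nn_method` parameter of `umap_transform` takes for pre-computed neighbors, with nothing passed to `X`, for a model built on `train`.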
* The documentation for `umap` and `tumap` now notes that the contents of the `model` list are subject to change and not intended to be part of the uwot public API. I recommend not relying on the structure of the `model`, especially if your package is intended to appear on CRAN or Bioconductor, as any breakages will delay future releases of uwot to CRAN.
* New metric: `metric = "correlation"`, a distance based on the Pearson correlation (https://github.com/jlmelville/uwot/issues/22). Supporting this required a change to the internals of how nearest neighbor data is stored. Backwards compatibility with models generated by previous versions using `ret_model = TRUE` should have been preserved.
* New parameter for `umap_transform`: `nn_method`. Pass a list containing pre-computed nearest neighbor data (identical to that used in the `umap` function). You should not pass anything to the `X` parameter in this case. This extends the functionality for transforming new points to the case where nearest neighbor data between the original data and new data can be calculated externally to uwot. Thanks to Yuhan Hao for contributing the PR (https://github.com/jlmelville/uwot/issues/63 and https://github.com/jlmelville/uwot/issues/64).
* New parameter for `umap_transform`: `init`. This provides a variety of options for initializing the output coordinates, analogously to the same parameter in the `umap` function (but without as many options currently). This is intended to replace `init_weighted`, which should be considered deprecated, but won't be removed until uwot 1.0 (whenever that is). Instead of `init_weighted = TRUE`, use `init = "weighted"`; replace `init_weighted = FALSE` with `init = "average"`. Additionally, you can pass a matrix to `init` to act as the initial coordinates.
* Also in `umap_transform`: previously, setting `n_epochs = 0` was ignored, and at least one iteration of optimization was applied. Now, `n_epochs = 0` is respected, and will return the initialized coordinates without any further optimization.
* With `verbose = TRUE`, the progress bar calculations were taking up a detectable amount of time; this has now been fixed.
With very small data sets (< 50 items), the progress bar will no longer appear when building the index.
* The default value of `n_threads` is now `NULL`, to provide a bit more protection from changing dependencies.
* The `grain_size` parameter has been undeprecated. As the version that deprecated this never made it to CRAN, this is unlikely to have affected many people.
* The `grain_size` parameter is now ignored and remains only to avoid breaking backwards compatibility.
* New parameter: `ret_extra`, a vector which can contain any combination of: `"model"` (same as `ret_model = TRUE`), `"nn"` (same as `ret_nn = TRUE`) and `"fgraph"` (see below).
* If the `ret_extra` vector contains `"fgraph"`, the returned list will contain an `fgraph` item representing the fuzzy simplicial input graph as a sparse N x N matrix. For `lvish`, use `"P"` instead of `"fgraph"` (https://github.com/jlmelville/uwot/issues/47). Note that there is a further sparsifying step where edges with a very low membership are removed if there is no prospect of the edge being sampled during optimization. This is controlled by `n_epochs`: the smaller the value, the more sparsifying will occur. If you are only interested in the fuzzy graph and not the embedded coordinates, set `n_epochs = 0`.
* New function: `unload_uwot`, to unload the Annoy nearest neighbor indices in a model. This prevents the model from being used in `umap_transform`, but allows the temporary working directory created by both `save_uwot` and `load_uwot` to be deleted. Previously, both `load_uwot` and `save_uwot` were attempting to delete the temporary working directories they used, but would always silently fail because Annoy is making use of files in those directories.
* `init = "spca"`, fixed values of `a` and `b` (rather than allowing them to be calculated through setting `min_dist` and `spread`) and `approx_pow = TRUE`. Using the `tumap` method with `init = "spca"` is probably the most robust approach.
* New behavior when `n_epochs = 0`. This used to behave like `n_epochs = NULL` and gave a default number of epochs (dependent on the number of vertices in the dataset).
Now it more usefully carries out all calculations except optimization, so the returned coordinates are those specified by the `init` parameter. This is an easy way to access e.g. the spectral or PCA initialization coordinates. If you want the input fuzzy graph (the `ret_extra` vector contains `"fgraph"`), this will also prevent edges with very low membership from being removed from the graph. You can still get the old default epochs behavior by setting `n_epochs = NULL` or to a negative value.
* `save_uwot` and `load_uwot` have been updated with a `verbose` parameter, so it's easier to see what temporary files are being created.
* `save_uwot` has a new parameter, `unload`, which if set to `TRUE` will delete the working directory for you, at the cost of unloading the model, i.e. it can't be used with `umap_transform` until you reload it with `load_uwot`.
* `save_uwot` now returns the saved model with an extra field, `mod_dir`, which points to the location of the temporary working directory, so you should now assign the result of calling `save_uwot` to the model you saved, e.g. `model <- save_uwot(model, "my_model_file")`. This field is intended for use with `unload_uwot`.
* `load_uwot` also returns the model with a `mod_dir` item for use with `unload_uwot`.
* `save_uwot` and `load_uwot` were not correctly handling relative paths.
* A previous fix to `load_uwot` in uwot 0.1.4 to work with newer versions of RcppAnnoy (https://github.com/jlmelville/uwot/issues/31) failed in the typical case of a single metric for the nearest neighbor search using all available columns, giving an error message along the lines of: `Error: index size <size> is not a multiple of vector size <size>`. This has now been fixed, but it required changes to both `save_uwot` and `load_uwot`, so existing saved models must be regenerated. Thank you to reporter OuNao.
* Non-integer values of `n_threads` caused a crash. This was particularly insidious if running on a system with only one default thread available, as the default `n_threads` becomes `0.5`.
Now `n_threads` (and `n_sgd_threads`) are rounded to the nearest integer.
* `ERROR: there is already an InterruptableProgressMonitor instance defined`.
* If `verbose = TRUE`, the `a`, `b` curve parameters are now logged.

Even with a fix for the bug mentioned above, if the nearest neighbor index file is larger than 2GB in size, Annoy may not be able to read the data back in. This should only occur with very large or high-dimensional datasets. The nearest neighbor search will fail under these conditions. A work-around is to set `n_threads = 0`, because the index will not be written to disk and re-loaded under these circumstances, at the cost of a longer search time. Alternatively, set the `pca` parameter to reduce the dimensionality or lower `n_trees`, both of which will reduce the size of the index on disk. However, either may lower the accuracy of the nearest neighbor results.
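The `ret_extra` entry above says that low-membership edges are removed from `fgraph` when they have no prospect of being sampled, and that smaller `n_epochs` values prune more. The base R sketch below shows why under one stated assumption: it borrows the threshold used by the Python UMAP implementation (weights below `max(weight) / n_epochs` are dropped), and uwot's exact rule may differ. `prune_fgraph` is a hypothetical helper that works on a plain vector of edge weights rather than a sparse matrix, and it treats `n_epochs = 0` as "keep every edge", matching the `n_epochs = 0` entry above.

```r
# Hypothetical helper `prune_fgraph`: drop edge weights that would be
# sampled less than once during optimization. Assumes (as in Python UMAP)
# that an edge of weight w is sampled about n_epochs * w / max(w) times.
prune_fgraph <- function(weights, n_epochs) {
  # n_epochs <= 0 means no optimization: keep every edge
  if (n_epochs <= 0) {
    return(weights)
  }
  threshold <- max(weights) / n_epochs
  weights[weights < threshold] <- 0
  weights
}

w <- c(1.0, 0.5, 0.1, 0.01, 0.001)
prune_fgraph(w, n_epochs = 2000)  # threshold 0.0005: all five edges kept
prune_fgraph(w, n_epochs = 200)   # threshold 0.005: the 0.001 edge is dropped
prune_fgraph(w, n_epochs = 5)     # threshold 0.2: only 1.0 and 0.5 survive
```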
Initial CRAN release.
* New parameter: `tmpdir`, which allows the user to specify the temporary directory where nearest neighbor indexes will be written during Annoy nearest neighbor search. The default is `base::tempdir()`. Only used if `n_threads > 1` and `nn_method = "annoy"`.
* Fixed an issue with `lvish` where there was an off-by-one error when calculating input probabilities.
* Added a safeguard to `lvish` to prevent the Gaussian precision, beta, from becoming overly large when the binary search fails during perplexity calibration.
* The `lvish` perplexity calibration uses the log-sum-exp trick to avoid numeric underflow if beta becomes large.
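The `lvish` entries above describe calibrating the Gaussian precision beta by binary search to match a target perplexity, using the log-sum-exp trick to avoid underflow, and capping beta if the search fails. The base R sketch below does this for a single point. It is not uwot's C++ implementation: the function names, tolerance, iteration limit and the cap `beta_max` are illustrative assumptions.

```r
# Log-sum-exp: log(sum(exp(x))) without underflow or overflow
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Hypothetical sketch: find beta so that p_j = exp(-beta * d2_j) / Z has
# perplexity exp(H) equal to `perplexity`. `d2` holds squared distances
# from one point to its neighbors.
calibrate_beta <- function(d2, perplexity, tol = 1e-5, max_iter = 200,
                           beta_max = 1e10) {
  target_h <- log(perplexity)  # target entropy in nats
  beta <- 1
  lo <- 0
  hi <- Inf
  for (iter in seq_len(max_iter)) {
    logits <- -beta * d2
    log_p <- logits - log_sum_exp(logits)  # stable log-probabilities
    h <- -sum(exp(log_p) * log_p)          # Shannon entropy
    if (abs(h - target_h) < tol) break
    if (h > target_h) {
      # distribution too flat: increase the precision
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {
      # distribution too peaked: decrease the precision
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  # Safeguard: stop beta becoming overly large if the search fails
  beta <- min(beta, beta_max)
  logits <- -beta * d2
  list(beta = beta, p = exp(logits - log_sum_exp(logits)))
}

set.seed(42)
d2 <- rexp(30)  # made-up squared distances to 30 neighbors
res <- calibrate_beta(d2, perplexity = 10)
sum(res$p)
```

The naive `log(sum(exp(c(-1000, -1001))))` underflows to `-Inf`, while `log_sum_exp(c(-1000, -1001))` stays finite, which is why large beta values do not break the calibration.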
* New parameter: `pcg_rand`. If `TRUE` (the default), then a random number generator from the PCG family is used during the stochastic optimization phase. The old PRNG, a direct translation of an implementation of the Tausworthe "taus88" PRNG used in the Python version of UMAP, can be obtained by setting `pcg_rand = FALSE`. The new PRNG is slower, but is likely superior in its statistical randomness. This change in behavior will break backwards compatibility: you will now get slightly different results even with the same seed.
* New parameter: `fast_sgd`. If `TRUE`, then the following combination of parameters is set: `n_sgd_threads = "auto"`, `pcg_rand = FALSE` and `approx_pow = TRUE`. These will result in a substantially faster optimization phase, at the cost of being slightly less accurate and results not being exactly repeatable. `fast_sgd = FALSE` by default, but if you are only interested in visualization, then `fast_sgd` gives perfectly good results. For more generic dimensionality reduction and reproducibility, keep `fast_sgd = FALSE`.
* New parameter: `init_sdev`, which specifies how large the standard deviation of each column of the initial coordinates should be. This will scale any input coordinates (including user-provided matrix coordinates). `init = "spca"` can now be thought of as an alias of `init = "pca", init_sdev = 1e-4`. This may be too aggressive a scaling for some datasets. The typical UMAP spectral initializations tend to result in standard deviations of around `2` to `5`, so this might be more appropriate in some cases. If spectral initialization detects multiple components in the affinity graph and falls back to scaled PCA, it uses `init_sdev = 1`.
* As a result of adding `init_sdev`, the `init` options `sspectral`, `slaplacian` and `snormlaplacian` have been removed (they weren't around for very long anyway). You can get the same behavior by e.g. `init = "spectral", init_sdev = 1e-4`. `init = "spca"` is sticking around because I use it a lot.
* `init = "spca"`.
* `<random>` header.
This breaks backwards compatibility even if you set `pcg_rand = FALSE`.
* `metric = "cosine"` results were incorrectly using the unmodified Annoy angular distance.
* `categorical` metric (fixes https://github.com/jlmelville/uwot/issues/20).
* `n_components` (e.g. approximately 50% faster optimization time with MNIST and `n_components = 50`).
* New parameter: `pca_center`, which controls whether to center the data before applying PCA. It would be typical to set this to `FALSE` if you are applying PCA to binary data (although note you can't use this setting with `metric = "hamming"`).
* PCA can now be used when `metric` is `"manhattan"` or `"cosine"`. It's still not applied when using `"hamming"` (data still needs to be in binary format, not real-valued).
* You can now override the `pca` and `pca_center` parameter values for a given data block by using a list for the value of the metric, with the column ids/names as an unnamed item and the overriding values as named items, e.g. instead of `manhattan = 1:100`, use `manhattan = list(1:100, pca_center = FALSE)` to turn off PCA centering for just that block. This functionality exists mainly for the case where you have mixed binary and real-valued data and want to apply PCA to both data types. It's normal to apply centering to real-valued data but not to binary data.
* Fixed a bug in `umap_transform`, where negative sampling was over the size of the test data (it should be the training data).
* If `verbose = TRUE`, the Annoy recall accuracy is logged, which may help tune values of `n_trees` and `search_k`.
* New parameter: `n_sgd_threads`, which controls the number of threads used in the stochastic gradient descent. By default this is now single-threaded and should result in reproducible results when using `set.seed`. To get back the old, less consistent, but faster settings, set `n_sgd_threads = "auto"`.
* `alpha` is now `learning_rate`.
* `gamma` is now `repulsion_strength`.
* `laplacian` and `normlaplacian`).
* New `init` options: `sspectral`, `snormlaplacian` and `slaplacian`. These are like `spectral`, `normlaplacian` and `laplacian` respectively, but scaled so that each dimension has a standard deviation of 1e-4.
This is like the difference between the `pca` and `spca` options.
* New parameter: `pca`. Set this to a positive integer to reduce a matrix or data frame to that number of columns using PCA. Only works if `metric = "euclidean"`. If you have > 100 columns, this can substantially improve the speed of the nearest neighbor search. t-SNE implementations often set this value to 50.
* `metric`: instead of specifying a single metric name (e.g. `metric = "euclidean"`), you can pass a list, where the name of each item is the metric to use and the value is a vector of the names of the columns to use with that metric, e.g. `metric = list("euclidean" = c("A1", "A2"), "cosine" = c("B1", "B2", "B3"))` treats columns `A1` and `A2` as one block, using the Euclidean distance to find nearest neighbors, whereas `B1`, `B2` and `B3` are treated as a second block, using the cosine distance.
* `categorical`.
* `y` may now be a data frame or matrix if multiple target data is available.
* New parameter: `target_metric`, to specify the distance metric to use with numerical `y`. This has the same capabilities as `metric`.
* `scale = "Z"`: Z-scale each column of input (synonym for `scale = TRUE` or `scale = "scale"`).
* `scale = "colrange"`: scale columns in the range (0, 1).
* For `y`, you may pass nearest neighbor data directly, in the same format as that supported by `X`-related nearest neighbor data. This may be useful if you don't want to use Euclidean distances for the `y` data, or if you have missing data (and have a way to assign nearest neighbors for those cases, obviously). See the Nearest Neighbor Data Format section for details.
* New parameter: `ret_nn`. When `TRUE`, returns nearest neighbor matrices as an `nn` list: indices in item `idx` and distances in item `dist`. Embedded coordinates are in `embedding`. Both `ret_nn` and `ret_model` can be `TRUE`, and should not cause any compatibility issues with supervised embeddings.
* `nn_method` can now take precomputed nearest neighbor data. This must be a list of two matrices: `idx`, containing integer indexes, and `dist`, containing distances.
By no coincidence, this is the format returned by `ret_nn`.
* `n_components = 1` was broken (https://github.com/jlmelville/uwot/issues/6).
* Matrices passed to the `init` parameter were being modified, in defiance of basic R pass-by-copy semantics.
* `metric = "cosine"` is working again for `n_threads` greater than `0` (https://github.com/jlmelville/uwot/issues/5).

August 5 2018. You can now use an existing embedding to add new points via `umap_transform`. See the example section below.
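The `init_sdev` entry above describes rescaling every column of the initial coordinates to a chosen standard deviation, so that `init = "spca"` behaves like `init = "pca", init_sdev = 1e-4`. The base R sketch below reproduces that idea with `prcomp` for the PCA step. The helper `scale_to_sdev` is hypothetical, and centering each column first is an assumption of this sketch; uwot's own PCA initialization may differ in such details.

```r
# Hypothetical helper: center each column of `coords`, then rescale it so
# that its standard deviation equals `sdev`.
scale_to_sdev <- function(coords, sdev) {
  coords <- sweep(coords, 2, colMeans(coords))
  sweep(coords, 2, apply(coords, 2, sd) / sdev, "/")
}

x <- as.matrix(iris[, 1:4])
# Two-dimensional PCA coordinates as an initialization
pca_coords <- prcomp(x, center = TRUE, rank. = 2)$x
# Scaled in the spirit of init = "spca" (init = "pca", init_sdev = 1e-4)
spca_coords <- scale_to_sdev(pca_coords, sdev = 1e-4)
apply(spca_coords, 2, sd)
```

Passing `sdev = 1` instead gives the scaling that the same entry says is used when spectral initialization falls back to scaled PCA.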
August 1 2018. Numerical vectors are now supported forsupervised dimension reduction.
July 31 2018. (Very) initial support for supervised dimension reduction: categorical data only at the moment. Pass in a factor vector (use `NA` for unknown labels) as the `y` parameter, and edges with bad (or unknown) labels are down-weighted, hopefully leading to better separation of classes. This works remarkably well for the Fashion MNIST dataset.
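The supervised entry above says that edges whose endpoints have different (or unknown) labels are down-weighted. The base R sketch below shows one way to do that with a multiplicative penalty, in the style of the categorical intersection in the Python UMAP implementation. The function name `downweight_edges` and the penalty strengths `far` and `unknown` are illustrative assumptions, not uwot's actual code or constants.

```r
# Hypothetical sketch: edges are (i, j, w) triples over labelled points.
# Edges between points with different labels are multiplied by exp(-far);
# edges touching an unknown (NA) label are multiplied by exp(-unknown).
downweight_edges <- function(i, j, w, labels, far = 5, unknown = 1) {
  li <- labels[i]
  lj <- labels[j]
  is_unknown <- is.na(li) | is.na(lj)
  # FALSE & NA is FALSE, so is_different contains no NAs
  is_different <- !is_unknown & (li != lj)
  w[is_unknown] <- w[is_unknown] * exp(-unknown)
  w[is_different] <- w[is_different] * exp(-far)
  w
}

labels <- factor(c("a", "a", "b", NA))
i <- c(1, 1, 2, 3)
j <- c(2, 3, 4, 4)
w <- c(0.9, 0.8, 0.7, 0.6)
downweight_edges(i, j, w, labels)
```

Here the same-label edge (1, 2) keeps its weight, the mixed-label edge (1, 3) is penalized most heavily, and the two edges touching the unlabelled point 4 get the milder penalty.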
July 22 2018. You can now use the cosine and Manhattan distances with the Annoy nearest neighbor search, via `metric = "cosine"` and `metric = "manhattan"`, respectively. Hamming distance is not supported because RcppAnnoy doesn't yet support it.
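For reference, the cosine and Manhattan distances can be written in a few lines of base R. Annoy's "angular" distance is `sqrt(2 * (1 - cos(u, v)))`, so the cosine distance `1 - cos(u, v)` is recovered as half its square; an earlier entry in these notes records a fix for cosine results that were using the unmodified Annoy angular distance. The helper names below are hypothetical and do not call Annoy itself.

```r
# Hypothetical helpers for the distances mentioned above
cosine_dist <- function(u, v) {
  1 - sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}
manhattan_dist <- function(u, v) sum(abs(u - v))

# Annoy-style angular distance: sqrt(2 * (1 - cos(u, v)))
annoy_angular <- function(u, v) sqrt(2 * cosine_dist(u, v))
# Convert an angular distance back to a cosine distance
angular_to_cosine <- function(d) d^2 / 2

u <- c(1, 2, 3)
v <- c(3, 2, 1)
cosine_dist(u, v)                       # 1 - 10/14 = 2/7, about 0.2857
manhattan_dist(u, v)                    # 4
angular_to_cosine(annoy_angular(u, v))  # equals cosine_dist(u, v)
```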