Public Interface

Reference for the public interface of UMAP.jl.

UMAP.fit (Function)
function fit(data[, n_components=2]; <kwargs>) -> UMAPResult

Embed data into an n_components-dimensional space. Returns a UMAPResult.
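
A minimal usage sketch of the call described above. It assumes an installed version of UMAP.jl that provides this fit interface, and it assumes points are stored as the columns of the data matrix; the synthetic data and the chosen keyword values are illustrative only.

```julia
using UMAP
using Random

Random.seed!(42)          # make the synthetic data reproducible
data = rand(10, 200)      # assumption: 200 points stored as columns, 10 features each

# Embed into 2 dimensions, overriding a few of the documented keyword arguments.
result = UMAP.fit(data, 2; n_neighbors = 10, min_dist = 0.05, n_epochs = 200)

embedding = result.embedding   # low-dimensional coordinates of the input points
```

The call is written in qualified form (UMAP.fit) so that the sketch does not depend on whether fit is exported.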

Keyword Arguments

  • n_neighbors::Integer = 15: the number of neighbors to consider as locally connected. Larger values capture more global structure in the data, while smaller values capture more local structure.
  • metric::{SemiMetric, Symbol} = Euclidean(): the metric used to calculate distances in the input space. It is also possible to pass metric = :precomputed to treat data as a precomputed distance matrix.
  • n_epochs::Integer = 300: the number of training epochs for embedding optimization.
  • learning_rate::Real = 1: the initial learning rate during optimization.
  • init::AbstractInitialization = UMAP.SpectralInitialization(): how to initialize the output embedding; valid options are UMAP.SpectralInitialization() and UMAP.UniformInitialization().
  • min_dist::Real = 0.1: the minimum spacing of points in the output embedding.
  • spread::Real = 1: the effective scale of embedded points. Determines how clustered embedded points are in combination with min_dist.
  • set_operation_ratio::Real = 1: interpolates between fuzzy set union and fuzzy set intersection when constructing the UMAP graph (global fuzzy simplicial set). The value of this parameter should be between 0.0 and 1.0: 1.0 indicates pure fuzzy union, while 0.0 indicates pure fuzzy intersection.
  • local_connectivity::Integer = 1: the number of nearest neighbors that should be assumed to be locally connected. The higher this value, the more connected the manifold becomes. This should not be set higher than the intrinsic dimension of the manifold.
  • repulsion_strength::Real = 1: the weighting of negative samples during the optimization process.
  • neg_sample_rate::Integer = 5: the number of negative samples to select for each positive sample. Higher values increase computational cost but yield slightly better accuracy.
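
To make the set_operation_ratio interpolation concrete, the following is a sketch on a small dense matrix. It assumes the usual UMAP definitions of the fuzzy union (probabilistic t-conorm, A + Aᵀ − A∘Aᵀ) and fuzzy intersection (product t-norm, A∘Aᵀ). The helper combine_memberships is hypothetical and is not part of UMAP.jl's API; the package's internal implementation may differ.

```julia
# Hypothetical helper for illustration only; not part of UMAP.jl.
# A[i, j] is the directed membership strength between points i and j.
function combine_memberships(A::AbstractMatrix, ratio::Real)
    0 <= ratio <= 1 || throw(ArgumentError("ratio must lie in [0, 1]"))
    At = transpose(A)
    fuzzy_intersection = A .* At                     # product t-norm
    fuzzy_union = A .+ At .- fuzzy_intersection      # probabilistic t-conorm
    # Linear interpolation: ratio = 1 gives the union, ratio = 0 the intersection.
    return ratio .* fuzzy_union .+ (1 - ratio) .* fuzzy_intersection
end

A = [0.0 0.8;
     0.5 0.0]

pure_union        = combine_memberships(A, 1.0)  # off-diagonal entries ≈ 0.9
pure_intersection = combine_memberships(A, 0.0)  # off-diagonal entries ≈ 0.4
halfway           = combine_memberships(A, 0.5)  # off-diagonal entries ≈ 0.65
```

In every case the result is symmetric, which is why the combined graph can be treated as undirected.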
UMAP.transform (Method)
transform(result::UMAPResult, queries, knn_params, src_params, gbl_params, tgt_params, opt_params)

Embed new queries using an existing UMAP result. This method allows overriding the transform-time parameters by passing the configuration structs directly.

UMAP.transform (Method)
transform(result::UMAPResult, queries) -> UMAPTransformResult

Use the given UMAP result to embed new points into an existing embedding. queries is a matrix or vector of points in the same space as result.data. The returned embedding places these points in n-dimensional space, where n is the dimensionality of result.embedding. It is created by finding neighbors of queries in result.embedding and optimizing cross entropy with respect to the membership strengths derived from those neighbors.

The transform is parameterized by the config found in result. For that reason, the type of queries must exactly match the type of result.data, including being a named tuple if necessary.
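
A minimal sketch of the two-argument transform, under the same assumptions as any fit call in this reference: an installed version of UMAP.jl that provides this interface, and points stored as matrix columns. The synthetic train and queries matrices are illustrative; they share the same container type, element type, and number of features so that the type requirement above is met.

```julia
using UMAP
using Random

Random.seed!(1)
train   = rand(10, 200)   # assumption: 200 training points as columns, 10 features each
queries = rand(10, 20)    # 20 new points with the same type and feature count as `train`

# Fit once, then reuse the stored configuration to place the new points.
result = UMAP.fit(train, 2; n_neighbors = 10)
transformed = UMAP.transform(result, queries)   # documented to return a UMAPTransformResult
```

The sketch deliberately stops at the returned object, since the fields of UMAPTransformResult are not described in this reference.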
