Usage
Graph Construction
nndescent
builds the approximate kNN graph:
nndescent(data, n_neighbors, metric; max_iters, sample_rate, precision) -> graph
data
: The set of points to build the graph from. This must either be a vector of pointsAbstractVector{V}
or anAbstractMatrix
. In the matrix case, each column is assumed to be a point.n_neighbors
: A positive integer specifying the number of neighbors to find. As the quality of the approximate neighbors returned depends on this parameter, this should probably never be less than5
.metric
: Any subtype of thePreMetric
abstract type fromDistances.jl
, including user-defined subtypes.
The behavior of NNDescent can be tuned with several keyword arguments.
max_iters
: This controls the maximum number of iterations to search for neighbors. Default is10
.sample_rate
: Controls how many candidate neighbors are evaluated each iteration. Lower values result in fewer distance calculations per iteration, trading off accuracy for speed. Must be a value in (0, 1]; default is1
.precision
: Execution will terminate early (i.e. beforemax_iters
is reached) if the proportion of edges updated after an iteration is less than this value. Default.001
.
The k-nearest neighbors can be accessed with
knn_matrices(graph) -> indices, distances
The indices
and distances
are both KxN
matrices containing ids and distances to each point's neighbors, respectively, where K = n_neighbors
and N
is the number of points in the dataset.
Example:
using NearestNeighborDescent
data = [rand(10) for _ in 1:1000]
# OR data = rand(10, 1000)
# nn descent search
graph = nndescent(data, 10, Euclidean())
# access point i's jth nearest neighbor:
indices, distances = knn_matrices(graph)
indices[j, i]
distances[j, i]
Querying
Once constructed, the ApproximateKNNGraph
can be used to find the nearest neighbors to new points. This is done via the search
method:
search(graph, queries, n_neighbors; max_candidates) -> indices, distances
graph
: An instance ofApproximateKNNGraph
queries
: A vector of new data points of typeVector{V}
orAbstractMatrix
. Note that the type of the query points must be compatible with the original dataset points.n_neighbors
: The number of neighbors to find for each query. This does not have to be the same as the number used to constructgraph
.max_candidates
: Each query maintains a heap of candidate neighbors - the maximum size of that heap is controlled with this keyword arg (defaultmax(n_neighbors, 20)
).
This will automatically use all available threads for the search. Similar to knn_matrices
, this returns two matrices for the indices and distances to the nearest neighbors of each query.
Example:
queries = [rand(10) for _ in 1:100]
# OR queries = rand(10, 100)
indices, distances = search(graph, queries, 5)