Benchmarking Recall and Latency

bigANNOY includes exported benchmark helpers so you can measure three related things with the same interface: index build time, approximate search latency, and recall against an exact baseline.

This vignette shows how to use those helpers for both quick one-off runs and small parameter sweeps.

What the Benchmark Helpers Do

The package currently exports four benchmark functions:

  - benchmark_annoy_bigmatrix() for a single benchmark run
  - benchmark_annoy_recall_suite() for a grid of n_trees and search_k values
  - benchmark_annoy_vs_rcppannoy() for a direct comparison with plain RcppAnnoy
  - benchmark_annoy_volume_suite() for that same comparison across a grid of data sizes

These helpers can work with:

  - synthetic data generated from the n_ref, n_query, and n_dim arguments
  - user-supplied reference and query matrices
  - file-backed big.matrix inputs when filebacked = TRUE

They can also write summaries to CSV so results can be saved outside the current R session. In addition, the comparison helpers add byte-oriented fields for the reference data, query data, Annoy index file, and total persisted artifacts.

Load the Package

library(bigANNOY)

Create a Benchmark Workspace

We will write any temporary benchmark files into a dedicated directory so the workflow is easy to inspect.

bench_dir <- tempfile("bigannoy-benchmark-")
dir.create(bench_dir, recursive = TRUE, showWarnings = FALSE)
bench_dir
#> [1] "/var/folders/h9/npmqbtmx4wlblg4wks47yj5c0000gn/T//RtmpBEyDSE/bigannoy-benchmark-b1f6e2ddb03"

A Single Synthetic Benchmark Run

The simplest benchmark call uses synthetic data. This is useful when you want a quick sense of how build and search times respond to n_trees, search_k, and the problem dimensions.

single_csv <- file.path(bench_dir, "single.csv")

single <- benchmark_annoy_bigmatrix(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = single_csv,
  load_mode = "eager"
)

single$summary
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.228 0.001 annoy-20260327203932-1c5bd3cc4ee2

The returned object contains more than just the summary row.

names(single)
#> [1] "summary"         "params"          "index_path"      "metadata_path"  
#> [5] "exact_available" "validation"     
single$params
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1
single$exact_available
#> [1] FALSE

Because exact = FALSE, the benchmark skips the exact bigKNN comparison and focuses only on the approximate Annoy path.

Validation Is Part of the Benchmark Workflow

The benchmark helpers also validate the built Annoy index before measuring the search step. That helps ensure the timing result corresponds to a usable, reopenable index rather than a partially successful build.

single$validation$valid
#> [1] TRUE
single$validation$checks[, c("check", "passed", "severity")]
#> check passed severity
#> index_file TRUE error
#> metric TRUE error
#> dimensions TRUE error
#> items TRUE error
#> file_size TRUE error
#> file_md5 TRUE error
#> file_mtime TRUE warning
#> load TRUE error

The same summary is also written to CSV when output_path is supplied.

read.csv(single_csv, stringsAsFactors = FALSE)
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.228 0.001 annoy-20260327203932-1c5bd3cc4ee2

External-Query Versus Self-Search Benchmarks

One subtle but important detail is how synthetic data generation works: when n_query (or an explicit query input) is supplied, the helper generates a separate query set and benchmarks external-query search; when query is NULL and no query count is given, the reference rows themselves are used as queries.

That difference is reflected in the self_search and n_query fields.

external_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  n_query = 12L,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

self_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  query = NULL,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

shape_cols <- c("self_search", "n_ref", "n_query", "k")

rbind(
  external = external_run[["summary"]][, shape_cols],
  self = self_run[["summary"]][, shape_cols]
)
#> self_search n_ref n_query k
#> external FALSE 120 12 3
#> self TRUE 120 120 3

That distinction matters when you are benchmarking workflows that mirror either training-set neighbour search or truly external query traffic.
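The self-search case has one practical wrinkle worth keeping in mind: each reference point's nearest neighbour is itself, so self-search workflows commonly request k + 1 neighbours and drop the self match. The base-R sketch below (no bigANNOY calls, exact distances only) illustrates why:

```r
set.seed(1)
ref <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)

# Exact pairwise distances; in self-search the smallest distance in
# each row is to the point itself (zero on the diagonal).
d <- as.matrix(dist(ref))
nearest <- apply(d, 1, which.min)

all(nearest == seq_len(nrow(ref)))
#> [1] TRUE
```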

Benchmark a Recall Suite Across Parameter Grids

For tuning work, a single benchmark point is usually not enough. The suite helper runs a grid of n_trees and search_k values on the same dataset so you can compare trade-offs more systematically.

suite_csv <- file.path(bench_dir, "suite.csv")

suite <- benchmark_annoy_recall_suite(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = c(5L, 10L),
  search_k = c(-1L, 50L),
  exact = FALSE,
  path_dir = bench_dir,
  output_path = suite_csv,
  load_mode = "eager"
)

suite$summary
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 -1 -1 0.006 0.000 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 50 -1 0.006 0.001 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 -1 -1 0.008 0.001 annoy-20260327203932-1c5bd3cc4ee2
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.008 0.000 annoy-20260327203932-1c5bd3cc4ee2

Each row corresponds to one (n_trees, search_k) configuration on the same underlying benchmark dataset.

The saved CSV contains the same summary table.

read.csv(suite_csv, stringsAsFactors = FALSE)
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 -1 -1 0.006 0.000 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 50 -1 0.006 0.001 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 -1 -1 0.008 0.001 annoy-20260327203932-1c5bd3cc4ee2
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.008 0.000 annoy-20260327203932-1c5bd3cc4ee2
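Once suite results exist, picking a configuration is ordinary data-frame work. The sketch below uses a hypothetical results table (the recall values are made up, since the suite above ran with exact = FALSE) whose columns match the suite summary:

```r
# Hypothetical suite results with the same columns as suite$summary.
suite_df <- data.frame(
  n_trees        = c(5L, 5L, 10L, 10L),
  search_k       = c(-1L, 50L, -1L, 50L),
  search_elapsed = c(0.000, 0.001, 0.001, 0.000),
  recall_at_k    = c(0.90, 0.93, 0.95, 0.97)
)

# Keep rows that meet the recall target, then take the fastest one.
target <- 0.94
ok <- suite_df[suite_df$recall_at_k >= target, ]
best <- ok[which.min(ok$search_elapsed), c("n_trees", "search_k")]
best
```

With these made-up numbers the winner is n_trees = 10 with search_k = 50: it meets the recall target and has the lowest search time among the rows that do.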

Optional Exact Recall Against bigKNN

For Euclidean workloads, the benchmark helpers can optionally compare Annoy results against the exact bigKNN baseline and report the elapsed time of the exact search (exact_elapsed) alongside the fraction of exact neighbours that the approximate search recovered (recall_at_k).

That comparison is only available when the runtime package bigKNN is installed.

if (length(find.package("bigKNN", quiet = TRUE)) > 0L) {
  exact_run <- benchmark_annoy_bigmatrix(
    n_ref = 150L,
    n_query = 15L,
    n_dim = 5L,
    k = 3L,
    n_trees = 10L,
    search_k = 50L,
    metric = "euclidean",
    exact = TRUE,
    path_dir = bench_dir
  )

  exact_run$exact_available
  exact_run$summary[, c("build_elapsed", "search_elapsed", "exact_elapsed", "recall_at_k")]
} else {
  "Exact baseline example skipped because bigKNN is not installed."
}
#> build_elapsed search_elapsed exact_elapsed recall_at_k
#> 0.007 0 0.003 0.956

This is the most direct way to answer the practical question, “How much search speed am I buying, and what recall do I lose in return?”
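Recall@k itself is straightforward to compute by hand. The sketch below uses only base R and hypothetical neighbour-id matrices (one row per query) to show the quantity the recall_at_k column reports, assuming it is defined as the average overlap between approximate and exact neighbour sets:

```r
# Hypothetical neighbour-id matrices: one row per query, k columns.
exact_ids  <- rbind(c(1L, 4L, 7L), c(2L, 5L, 8L))
approx_ids <- rbind(c(1L, 4L, 9L), c(2L, 5L, 8L))

# recall@k = fraction of exact neighbours found by the approximate
# search, averaged over queries.
recall_at_k <- mean(vapply(
  seq_len(nrow(exact_ids)),
  function(i) mean(exact_ids[i, ] %in% approx_ids[i, ]),
  numeric(1)
))
recall_at_k
#> [1] 0.8333333
```

Here the first query recovers 2 of 3 exact neighbours and the second recovers all 3, so recall@3 is (2/3 + 1)/2 = 5/6.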

Benchmark User-Supplied Data

Synthetic data is convenient, but real benchmarking usually needs real data. Both benchmark helpers can accept user-supplied reference and query inputs.

ref <- matrix(rnorm(80 * 4), nrow = 80, ncol = 4)
query <- matrix(rnorm(12 * 4), nrow = 12, ncol = 4)

user_run <- benchmark_annoy_bigmatrix(
  x = ref,
  query = query,
  k = 3L,
  n_trees = 12L,
  search_k = 40L,
  exact = FALSE,
  filebacked = TRUE,
  path_dir = bench_dir,
  load_mode = "eager"
)

user_run$summary[, c(
  "filebacked",
  "self_search",
  "n_ref",
  "n_query",
  "n_dim",
  "build_elapsed",
  "search_elapsed"
)]
#> filebacked self_search n_ref n_query n_dim build_elapsed search_elapsed
#> TRUE FALSE 80 12 4 0.006 0

When filebacked = TRUE, dense reference inputs are first converted into a file-backed big.matrix before the Annoy build starts. That can be useful when you want the benchmark workflow to resemble the package’s real persisted data path more closely.

Compare bigANNOY with Direct RcppAnnoy

When you want to understand the cost of the bigmemory-oriented wrapper itself, the most useful benchmark is not an exact Euclidean baseline. It is a direct comparison with plain RcppAnnoy, using the same synthetic dataset, the same metric, the same n_trees, and the same search_k.

That is what benchmark_annoy_vs_rcppannoy() provides.

compare_csv <- file.path(bench_dir, "compare.csv")

compare_run <- benchmark_annoy_vs_rcppannoy(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = compare_csv,
  load_mode = "eager"
)

compare_run$summary[, c(
  "implementation",
  "reference_storage",
  "n_ref",
  "n_query",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
#> implementation reference_storage n_ref n_query n_dim total_data_bytes index_bytes build_elapsed search_elapsed
#> bigANNOY bigmatrix 200 20 6 10560 35840 0.007 0.000
#> RcppAnnoy dense_matrix 200 20 6 10560 35840 0.004 0.001

This benchmark is useful for a different question from the earlier exact baseline: not how much recall approximate search gives up, but how much build and search time the bigmemory-oriented wrapper adds on top of plain RcppAnnoy for the same workload.

The output also includes data-volume fields: ref_bytes and query_bytes for the input data, index_bytes for the Annoy index file, metadata_bytes for the persisted metadata, artifact_bytes for the total persisted artifacts, and total_data_bytes for the combined reference and query data.

The generated CSV contains the same comparison table.

read.csv(compare_csv, stringsAsFactors = FALSE)[, c(
  "implementation",
  "ref_bytes",
  "query_bytes",
  "index_bytes",
  "metadata_bytes",
  "artifact_bytes"
)]
#> implementation ref_bytes query_bytes index_bytes metadata_bytes artifact_bytes
#> bigANNOY 9600 960 35840 1188 37028
#> RcppAnnoy 9600 960 35840 0 35840

In practice, the comparison table helps answer two operational questions: how much extra build and search time the wrapper costs relative to plain RcppAnnoy, and how much extra disk space its persisted metadata and artifacts take.
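Those overheads are easiest to read as ratios. The sketch below uses values transcribed from the comparison tables above (it does not call bigANNOY):

```r
# Values transcribed from the comparison output above.
cmp <- data.frame(
  implementation = c("bigANNOY", "RcppAnnoy"),
  build_elapsed  = c(0.007, 0.004),
  artifact_bytes = c(37028, 35840)
)

# Wrapper overhead expressed as ratios against plain RcppAnnoy.
build_ratio   <- cmp$build_elapsed[1] / cmp$build_elapsed[2]
storage_ratio <- cmp$artifact_bytes[1] / cmp$artifact_bytes[2]
round(c(build = build_ratio, storage = storage_ratio), 2)
```

For this tiny run the wrapper's build takes 1.75x as long as plain RcppAnnoy, and its persisted artifacts are about 3% larger; at such small timings, the build ratio is dominated by fixed per-run costs.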

Benchmark Scaling by Data Volume

A single comparison point is useful, but it does not tell you whether the wrapper overhead stays modest as the problem gets larger. The volume suite runs the same bigANNOY versus RcppAnnoy comparison across a grid of synthetic data sizes.

volume_csv <- file.path(bench_dir, "volume.csv")

volume_run <- benchmark_annoy_volume_suite(
  n_ref = c(200L, 500L),
  n_query = 20L,
  n_dim = c(6L, 12L),
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = volume_csv,
  load_mode = "eager"
)

volume_run$summary[, c(
  "implementation",
  "n_ref",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
#> implementation n_ref n_dim total_data_bytes index_bytes build_elapsed search_elapsed
#> bigANNOY 200 6 10560 35840 0.007 0.001
#> RcppAnnoy 200 6 10560 35840 0.004 0.000
#> bigANNOY 200 12 21120 36864 0.007 0.000
#> RcppAnnoy 200 12 21120 36864 0.003 0.001
#> bigANNOY 500 6 24960 89440 0.011 0.000
#> RcppAnnoy 500 6 24960 89440 0.009 0.001
#> bigANNOY 500 12 49920 99072 0.010 0.001
#> RcppAnnoy 500 12 49920 99072 0.008 0.000

This kind of table is especially useful when you want to prepare a more formal benchmark note for a package release or for internal performance regression tracking: the same grid can be re-run on each release and the resulting CSVs compared directly.
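To check whether wrapper overhead stays modest as the data grows, compute the per-grid-point build-time ratio. The sketch below transcribes the build times from the volume table above into a small data frame (no bigANNOY calls):

```r
# Build times transcribed from the volume table above, one
# bigANNOY / RcppAnnoy pair per grid point.
vol <- data.frame(
  n_ref = c(200, 200, 500, 500),
  n_dim = c(6, 12, 6, 12),
  big   = c(0.007, 0.007, 0.011, 0.010),
  rcpp  = c(0.004, 0.003, 0.009, 0.008)
)

# Per-point build-time ratio; values near 1 mean low wrapper overhead.
vol$build_ratio <- round(vol$big / vol$rcpp, 2)
vol[, c("n_ref", "n_dim", "build_ratio")]
```

At these sizes the ratio shrinks as n_ref grows (from roughly 1.75-2.33 at 200 rows to roughly 1.2 at 500), which is consistent with a fixed per-run setup cost being amortised over a larger build.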

Interpreting the Main Summary Columns

The most useful summary fields are build_elapsed (index construction time, in seconds), search_elapsed (approximate search time for all queries), exact_elapsed (exact bigKNN baseline time, when available), and recall_at_k (the fraction of exact neighbours the approximate search recovered).

In practice, increasing n_trees improves recall at the cost of build time and index size, while increasing search_k improves recall at the cost of search latency, so tuning is a matter of finding the cheapest configuration that still meets your recall target.
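Elapsed times are easier to compare across runs with different query counts when converted to throughput. A minimal sketch, using hypothetical values:

```r
# Hypothetical summary values: 20 queries answered in 0.002 seconds.
n_query <- 20
search_elapsed <- 0.002

# Queries per second is comparable across runs even when n_query
# differs between them.
queries_per_second <- n_query / search_elapsed
queries_per_second
#> [1] 10000
```

With sub-millisecond totals like many of the runs above, consider a larger n_query so the elapsed time is measurable before quoting a throughput figure.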

Installed Benchmark Runner

The package also installs a command-line benchmark script. That is convenient when you want to run a benchmark outside an interactive R session or save CSV output from shell scripts.

The installed path is:

system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")
#> [1] "/private/var/folders/h9/npmqbtmx4wlblg4wks47yj5c0000gn/T/RtmpZQIr85/Rinstb1a52c9fdab4/bigANNOY/benchmarks/benchmark_annoy.R"

Example single-run command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=single \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager

Example suite command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=suite \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --suite_trees=10,50,100 \
  --suite_search_k=-1,2000,10000 \
  --output_path=/tmp/bigannoy_suite.csv

Example direct-comparison command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=compare \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager

Example volume-suite command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=volume \
  --suite_n_ref=2000,5000,10000 \
  --suite_n_query=200 \
  --suite_n_dim=20,50 \
  --k=10 \
  --n_trees=50 \
  --search_k=1000 \
  --output_path=/tmp/bigannoy_volume.csv

Recommended Workflow

A practical tuning workflow usually looks like this:

  1. start with a small single benchmark to confirm dimensions and plumbing
  2. switch to a suite over a small n_trees by search_k grid
  3. enable exact Euclidean benchmarking when bigKNN is available
  4. compare recall and latency together
  5. repeat the same workflow on user-supplied data before drawing conclusions

Recap

bigANNOY’s benchmark helpers are designed to make performance work part of the normal package workflow, not a separate ad hoc script: the same interface covers single runs, recall suites, optional exact bigKNN baselines, direct RcppAnnoy comparisons, and volume scaling, with CSV output and an installed command-line runner for non-interactive use.

The next vignette to read after this one is usually Metrics and Tuning, which goes deeper on how to choose metrics and search/build controls.