Benchmarking Recall and Latency

bigANNOY includes exported benchmark helpers so you can measure three related things with the same interface: index build time, approximate search latency, and recall against an exact baseline.

This vignette shows how to use those helpers for both quick one-off runs and small parameter sweeps.

What the Benchmark Helpers Do

The package currently exports four benchmark functions:

  - benchmark_annoy_bigmatrix() for a single benchmark run
  - benchmark_annoy_recall_suite() for a grid of n_trees and search_k values
  - benchmark_annoy_vs_rcppannoy() for a direct comparison with plain RcppAnnoy
  - benchmark_annoy_volume_suite() for that same comparison across a grid of data sizes

These helpers can work with:

  - synthetic data generated from the n_ref, n_query, and n_dim arguments
  - user-supplied reference and query matrices
  - file-backed big.matrix inputs when filebacked = TRUE

They can also write summaries to CSV so results can be saved outside the current R session. In addition, the comparison helpers add byte-oriented fields for the reference data, query data, Annoy index file, and total persisted artifacts.

Load the Package

library(bigANNOY)

Create a Benchmark Workspace

We will write any temporary benchmark files into a dedicated directory so the workflow is easy to inspect.

bench_dir <- tempfile("bigannoy-benchmark-")
dir.create(bench_dir, recursive = TRUE, showWarnings = FALSE)
bench_dir
#> [1] "/var/folders/h9/npmqbtmx4wlblg4wks47yj5c0000gn/T//RtmpBEyDSE/bigannoy-benchmark-b1f6e2ddb03"

A Single Synthetic Benchmark Run

The simplest benchmark call uses synthetic data. This is useful when you want a quick sense of how build and search times respond to n_trees, search_k, and the problem dimensions.

single_csv <- file.path(bench_dir, "single.csv")

single <- benchmark_annoy_bigmatrix(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = single_csv,
  load_mode = "eager"
)

single$summary
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.228 0.001 annoy-20260327203932-1c5bd3cc4ee2

The returned object contains more than just the summary row.

names(single)
#> [1] "summary"         "params"          "index_path"      "metadata_path"  
#> [5] "exact_available" "validation"     
single$params
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1
single$exact_available
#> [1] FALSE

Because exact = FALSE, the benchmark skips the exact bigKNN comparison and focuses only on the approximate Annoy path.

Validation Is Part of the Benchmark Workflow

The benchmark helpers also validate the built Annoy index before measuring the search step. That helps ensure the timing result corresponds to a usable, reopenable index rather than a partially successful build.

single$validation$valid
#> [1] TRUE
single$validation$checks[, c("check", "passed", "severity")]
#> check passed severity
#> index_file TRUE error
#> metric TRUE error
#> dimensions TRUE error
#> items TRUE error
#> file_size TRUE error
#> file_md5 TRUE error
#> file_mtime TRUE warning
#> load TRUE error

The same summary is also written to CSV when output_path is supplied.

read.csv(single_csv, stringsAsFactors = FALSE)
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.228 0.001 annoy-20260327203932-1c5bd3cc4ee2

External-Query Versus Self-Search Benchmarks

One subtle but important detail is how synthetic data generation works: when n_query (or an explicit query input) is supplied, the helper generates a separate query set and benchmarks external-query search; when query is NULL and no query count is given, the reference rows themselves are used as queries.

That difference is reflected in the self_search and n_query fields.

external_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  n_query = 12L,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

self_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  query = NULL,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

shape_cols <- c("self_search", "n_ref", "n_query", "k")

rbind(
  external = external_run[["summary"]][, shape_cols],
  self = self_run[["summary"]][, shape_cols]
)
#> self_search n_ref n_query k
#> external FALSE 120 12 3
#> self TRUE 120 120 3

That distinction matters when you are benchmarking workflows that mirror either training-set neighbour search or truly external query traffic.
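The self-search case has one practical wrinkle worth keeping in mind: each reference point's nearest neighbour is itself, so self-search workflows commonly request k + 1 neighbours and drop the self match. The base-R sketch below (no bigANNOY calls, exact distances only) illustrates why:

```r
set.seed(1)
ref <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)

# Exact pairwise distances; in self-search the smallest distance in
# each row is to the point itself (zero on the diagonal).
d <- as.matrix(dist(ref))
nearest <- apply(d, 1, which.min)

all(nearest == seq_len(nrow(ref)))
#> [1] TRUE
```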

Benchmark a Recall Suite Across Parameter Grids

For tuning work, a single benchmark point is usually not enough. The suite helper runs a grid of n_trees and search_k values on the same dataset so you can compare trade-offs more systematically.

suite_csv <- file.path(bench_dir, "suite.csv")

suite <- benchmark_annoy_recall_suite(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = c(5L, 10L),
  search_k = c(-1L, 50L),
  exact = FALSE,
  path_dir = bench_dir,
  output_path = suite_csv,
  load_mode = "eager"
)

suite$summary
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 -1 -1 0.006 0.000 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 50 -1 0.006 0.001 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 -1 -1 0.008 0.001 annoy-20260327203932-1c5bd3cc4ee2
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.008 0.000 annoy-20260327203932-1c5bd3cc4ee2

Each row corresponds to one (n_trees, search_k) configuration on the same underlying benchmark dataset.

The saved CSV contains the same summary table.

read.csv(suite_csv, stringsAsFactors = FALSE)
#> metric backend filebacked self_search load_mode n_ref n_query n_dim k n_trees search_k build_threads build_elapsed search_elapsed exact_elapsed recall_at_k index_id
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 -1 -1 0.006 0.000 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 5 50 -1 0.006 0.001 annoy-20260327203932-8ca097928d75
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 -1 -1 0.008 0.001 annoy-20260327203932-1c5bd3cc4ee2
#> euclidean cpp FALSE FALSE eager 200 20 6 3 10 50 -1 0.008 0.000 annoy-20260327203932-1c5bd3cc4ee2
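Once suite results exist, picking a configuration is ordinary data-frame work. The sketch below uses a hypothetical results table (the recall values are made up, since the suite above ran with exact = FALSE) whose columns match the suite summary:

```r
# Hypothetical suite results with the same columns as suite$summary.
suite_df <- data.frame(
  n_trees        = c(5L, 5L, 10L, 10L),
  search_k       = c(-1L, 50L, -1L, 50L),
  search_elapsed = c(0.000, 0.001, 0.001, 0.000),
  recall_at_k    = c(0.90, 0.93, 0.95, 0.97)
)

# Keep rows that meet the recall target, then take the fastest one.
target <- 0.94
ok <- suite_df[suite_df$recall_at_k >= target, ]
best <- ok[which.min(ok$search_elapsed), c("n_trees", "search_k")]
best
```

With these made-up numbers the winner is n_trees = 10 with search_k = 50: it meets the recall target and has the lowest search time among the rows that do.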

Optional Exact Recall Against bigKNN

For Euclidean workloads, the benchmark helpers can optionally compare Annoy results against the exact bigKNN baseline and report the elapsed time of the exact search (exact_elapsed) alongside the fraction of exact neighbours that the approximate search recovered (recall_at_k).

That comparison is only available when the runtime package bigKNN is installed.

if (length(find.package("bigKNN", quiet = TRUE)) > 0L) {
  exact_run <- benchmark_annoy_bigmatrix(
    n_ref = 150L,
    n_query = 15L,
    n_dim = 5L,
    k = 3L,
    n_trees = 10L,
    search_k = 50L,
    metric = "euclidean",
    exact = TRUE,
    path_dir = bench_dir
  )

  exact_run$exact_available
  exact_run$summary[, c("build_elapsed", "search_elapsed", "exact_elapsed", "recall_at_k")]
} else {
  "Exact baseline example skipped because bigKNN is not installed."
}
#> build_elapsed search_elapsed exact_elapsed recall_at_k
#> 0.007 0 0.003 0.956

This is the most direct way to answer the practical question, “How much search speed am I buying, and what recall do I lose in return?”
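Recall@k itself is straightforward to compute by hand. The sketch below uses only base R and hypothetical neighbour-id matrices (one row per query) to show the quantity the recall_at_k column reports, assuming it is defined as the average overlap between approximate and exact neighbour sets:

```r
# Hypothetical neighbour-id matrices: one row per query, k columns.
exact_ids  <- rbind(c(1L, 4L, 7L), c(2L, 5L, 8L))
approx_ids <- rbind(c(1L, 4L, 9L), c(2L, 5L, 8L))

# recall@k = fraction of exact neighbours found by the approximate
# search, averaged over queries.
recall_at_k <- mean(vapply(
  seq_len(nrow(exact_ids)),
  function(i) mean(exact_ids[i, ] %in% approx_ids[i, ]),
  numeric(1)
))
recall_at_k
#> [1] 0.8333333
```

Here the first query recovers 2 of 3 exact neighbours and the second recovers all 3, so recall@3 is (2/3 + 1)/2 = 5/6.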

Benchmark User-Supplied Data

Synthetic data is convenient, but real benchmarking usually needs real data. Both benchmark helpers can accept user-supplied reference and query inputs.

ref <- matrix(rnorm(80 * 4), nrow = 80, ncol = 4)
query <- matrix(rnorm(12 * 4), nrow = 12, ncol = 4)

user_run <- benchmark_annoy_bigmatrix(
  x = ref,
  query = query,
  k = 3L,
  n_trees = 12L,
  search_k = 40L,
  exact = FALSE,
  filebacked = TRUE,
  path_dir = bench_dir,
  load_mode = "eager"
)

user_run$summary[, c(
  "filebacked",
  "self_search",
  "n_ref",
  "n_query",
  "n_dim",
  "build_elapsed",
  "search_elapsed"
)]
#> filebacked self_search n_ref n_query n_dim build_elapsed search_elapsed
#> TRUE FALSE 80 12 4 0.006 0

When filebacked = TRUE, dense reference inputs are first converted into a file-backed big.matrix before the Annoy build starts. That can be useful when you want the benchmark workflow to resemble the package’s real persisted data path more closely.

Compare bigANNOY with Direct RcppAnnoy

When you want to understand the cost of the bigmemory-oriented wrapper itself, the most useful benchmark is not an exact Euclidean baseline. It is a direct comparison with plain RcppAnnoy, using the same synthetic dataset, the same metric, the same n_trees, and the same search_k.

That is what benchmark_annoy_vs_rcppannoy() provides.

compare_csv <- file.path(bench_dir, "compare.csv")

compare_run <- benchmark_annoy_vs_rcppannoy(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = compare_csv,
  load_mode = "eager"
)

compare_run$summary[, c(
  "implementation",
  "reference_storage",
  "n_ref",
  "n_query",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
#> implementation reference_storage n_ref n_query n_dim total_data_bytes index_bytes build_elapsed search_elapsed
#> bigANNOY bigmatrix 200 20 6 10560 35840 0.007 0.000
#> RcppAnnoy dense_matrix 200 20 6 10560 35840 0.004 0.001

This benchmark is useful for a different question from the earlier exact baseline: not how much recall approximate search gives up, but how much build and search time the bigmemory-oriented wrapper adds on top of plain RcppAnnoy for the same workload.

The output also includes data-volume fields: ref_bytes and query_bytes for the input data, index_bytes for the Annoy index file, metadata_bytes for the persisted metadata, artifact_bytes for the total persisted artifacts, and total_data_bytes for the combined reference and query data.

The generated CSV contains the same comparison table.

read.csv(compare_csv, stringsAsFactors = FALSE)[, c(
  "implementation",
  "ref_bytes",
  "query_bytes",
  "index_bytes",
  "metadata_bytes",
  "artifact_bytes"
)]
#> implementation ref_bytes query_bytes index_bytes metadata_bytes artifact_bytes
#> bigANNOY 9600 960 35840 1188 37028
#> RcppAnnoy 9600 960 35840 0 35840

In practice, the comparison table helps answer two operational questions: how much extra build and search time the wrapper costs relative to plain RcppAnnoy, and how much extra disk space its persisted metadata and artifacts take.
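Those overheads are easiest to read as ratios. The sketch below uses values transcribed from the comparison tables above (it does not call bigANNOY):

```r
# Values transcribed from the comparison output above.
cmp <- data.frame(
  implementation = c("bigANNOY", "RcppAnnoy"),
  build_elapsed  = c(0.007, 0.004),
  artifact_bytes = c(37028, 35840)
)

# Wrapper overhead expressed as ratios against plain RcppAnnoy.
build_ratio   <- cmp$build_elapsed[1] / cmp$build_elapsed[2]
storage_ratio <- cmp$artifact_bytes[1] / cmp$artifact_bytes[2]
round(c(build = build_ratio, storage = storage_ratio), 2)
```

For this tiny run the wrapper's build takes 1.75x as long as plain RcppAnnoy, and its persisted artifacts are about 3% larger; at such small timings, the build ratio is dominated by fixed per-run costs.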

Benchmark Scaling by Data Volume

A single comparison point is useful, but it does not tell you whether the wrapper overhead stays modest as the problem gets larger. The volume suite runs the same bigANNOY versus RcppAnnoy comparison across a grid of synthetic data sizes.

volume_csv <- file.path(bench_dir, "volume.csv")

volume_run <- benchmark_annoy_volume_suite(
  n_ref = c(200L, 500L),
  n_query = 20L,
  n_dim = c(6L, 12L),
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = volume_csv,
  load_mode = "eager"
)

volume_run$summary[, c(
  "implementation",
  "n_ref",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
#> implementation n_ref n_dim total_data_bytes index_bytes build_elapsed search_elapsed
#> bigANNOY 200 6 10560 35840 0.007 0.001
#> RcppAnnoy 200 6 10560 35840 0.004 0.000
#> bigANNOY 200 12 21120 36864 0.007 0.000
#> RcppAnnoy 200 12 21120 36864 0.003 0.001
#> bigANNOY 500 6 24960 89440 0.011 0.000
#> RcppAnnoy 500 6 24960 89440 0.009 0.001
#> bigANNOY 500 12 49920 99072 0.010 0.001
#> RcppAnnoy 500 12 49920 99072 0.008 0.000

This kind of table is especially useful when you want to prepare a more formal benchmark note for a package release or for internal performance regression tracking: the same grid can be re-run on each release and the resulting CSVs compared directly.
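To check whether wrapper overhead stays modest as the data grows, compute the per-grid-point build-time ratio. The sketch below transcribes the build times from the volume table above into a small data frame (no bigANNOY calls):

```r
# Build times transcribed from the volume table above, one
# bigANNOY / RcppAnnoy pair per grid point.
vol <- data.frame(
  n_ref = c(200, 200, 500, 500),
  n_dim = c(6, 12, 6, 12),
  big   = c(0.007, 0.007, 0.011, 0.010),
  rcpp  = c(0.004, 0.003, 0.009, 0.008)
)

# Per-point build-time ratio; values near 1 mean low wrapper overhead.
vol$build_ratio <- round(vol$big / vol$rcpp, 2)
vol[, c("n_ref", "n_dim", "build_ratio")]
```

At these sizes the ratio shrinks as n_ref grows (from roughly 1.75-2.33 at 200 rows to roughly 1.2 at 500), which is consistent with a fixed per-run setup cost being amortised over a larger build.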

Interpreting the Main Summary Columns

The most useful summary fields are build_elapsed (index construction time, in seconds), search_elapsed (approximate search time for all queries), exact_elapsed (exact bigKNN baseline time, when available), and recall_at_k (the fraction of exact neighbours the approximate search recovered).

In practice, increasing n_trees improves recall at the cost of build time and index size, while increasing search_k improves recall at the cost of search latency, so tuning is a matter of finding the cheapest configuration that still meets your recall target.
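Elapsed times are easier to compare across runs with different query counts when converted to throughput. A minimal sketch, using hypothetical values:

```r
# Hypothetical summary values: 20 queries answered in 0.002 seconds.
n_query <- 20
search_elapsed <- 0.002

# Queries per second is comparable across runs even when n_query
# differs between them.
queries_per_second <- n_query / search_elapsed
queries_per_second
#> [1] 10000
```

With sub-millisecond totals like many of the runs above, consider a larger n_query so the elapsed time is measurable before quoting a throughput figure.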

Installed Benchmark Runner

The package also installs a command-line benchmark script. That is convenient when you want to run a benchmark outside an interactive R session or save CSV output from shell scripts.

The installed path is:

system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")
#> [1] "/private/var/folders/h9/npmqbtmx4wlblg4wks47yj5c0000gn/T/RtmpZQIr85/Rinstb1a52c9fdab4/bigANNOY/benchmarks/benchmark_annoy.R"

Example single-run command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=single \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager

Example suite command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=suite \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --suite_trees=10,50,100 \
  --suite_search_k=-1,2000,10000 \
  --output_path=/tmp/bigannoy_suite.csv

Example direct-comparison command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=compare \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager

Example volume-suite command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=volume \
  --suite_n_ref=2000,5000,10000 \
  --suite_n_query=200 \
  --suite_n_dim=20,50 \
  --k=10 \
  --n_trees=50 \
  --search_k=1000 \
  --output_path=/tmp/bigannoy_volume.csv

Recommended Workflow

A practical tuning workflow usually looks like this:

  1. start with a small single benchmark to confirm dimensions and plumbing
  2. switch to a suite over a small n_trees by search_k grid
  3. enable exact Euclidean benchmarking when bigKNN is available
  4. compare recall and latency together
  5. repeat the same workflow on user-supplied data before drawing conclusions

Recap

bigANNOY’s benchmark helpers are designed to make performance work part of the normal package workflow, not a separate ad hoc script: the same interface covers single runs, recall suites, optional exact bigKNN baselines, direct RcppAnnoy comparisons, and volume scaling, with CSV output and an installed command-line runner for non-interactive use.

The next vignette to read after this one is usually Metrics and Tuning, which goes deeper on how to choose metrics and search/build controls.