% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/find_outliers.R
\name{find_outliers}
\alias{find_outliers}
\title{Find Outlier Groups Based on Energy Distance}
\usage{
find_outliers(formula, data, cutoff = 0.99, R = 500, plot = TRUE)
}
\arguments{
\item{formula}{A formula with the grouping variable on the left-hand side and the
variables to compare on the right-hand side, e.g., \code{study ~ var1 + var2 + ...}.
The grouping variable should be a factor; otherwise it is converted to one.}

\item{data}{A data frame containing the variables specified in the formula.}

\item{cutoff}{Numeric. Quantile level, given as a proportion, for the permutation-based
cutoff (default 0.99). The cutoff is obtained by permuting the group labels \code{R} times
and taking this quantile of the permuted median distances.}

\item{R}{Integer. Number of permutations for determining the cutoff (default 500).}

\item{plot}{Logical. If \code{TRUE} (default), the returned list also contains the heatmap and bar plot of the outlier analysis.}
}
\value{
If \code{plot = TRUE}, returns a list with:
\itemize{
  \item \code{cutoff_value}: The permutation-based cutoff value used for outlier detection.
  \item \code{summary}: Data frame with the columns \code{group}, \code{median_distance}, \code{outlier_score}, and \code{is_outlier}.
  \item \code{heatmap}: A ggplot2 heatmap of pairwise energy distances.
  \item \code{barplot}: A ggplot2 bar plot showing each group's median distance to the other groups.
}
If \code{plot = FALSE}, returns a list with only \code{cutoff_value} and \code{summary}.
}
\description{
Identifies groups (e.g., studies) whose distribution is unusually far from
those of the other groups, based on pairwise energy distances across multiple variables.
}
\details{
Groups with high median distance to other groups are identified as potential outliers.
The \code{outlier_score} is a z-score: the number of standard deviations by which a
group's median distance differs from the overall median distance.
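
The following is a minimal base R sketch of the idea above, not the package's internal
code. The helper \code{energy_dist()}, the toy \code{groups} list, and the exact form of
the score (centered on the median of the median distances, scaled by their standard
deviation) are assumptions made for illustration; the actual implementation may compute
the energy distance and the score differently.

\preformatted{
# Hypothetical helper: sample energy distance between two numeric matrices
# (rows are observations), 2 * E|X - Y| - E|X - X'| - E|Y - Y'|.
energy_dist <- function(x, y) {
  n  <- nrow(x)
  d  <- as.matrix(dist(rbind(x, y)))   # all pairwise Euclidean distances
  ix <- seq_len(n)                     # rows belonging to x
  iy <- n + seq_len(nrow(y))           # rows belonging to y
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}

# Toy data: groups A and B share a distribution, group C is shifted
set.seed(1)
groups <- list(
  A = matrix(rnorm(40), ncol = 2),
  B = matrix(rnorm(40), ncol = 2),
  C = matrix(rnorm(40, mean = 3), ncol = 2)
)

# Pairwise energy distances between groups
k <- length(groups)
D <- matrix(NA_real_, k, k, dimnames = list(names(groups), names(groups)))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    if (i != j) D[i, j] <- energy_dist(groups[[i]], groups[[j]])
  }
}

# Median distance of each group to the other groups (diagonal is NA)
med <- apply(D, 1, median, na.rm = TRUE)

# Assumed score: deviation from the overall median, in standard deviations
score <- (med - median(med)) / sd(med)
}

In this toy setup the shifted group C has the largest median distance and therefore the
largest score.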

Before distance calculation, all covariates are scaled to mean 0 and standard deviation 1.
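
A minimal sketch of this standardization, assuming it is applied column-wise to the pooled
data (all groups together) with base R's \code{scale()}. The data frame \code{covars} is a
made-up example, and the function's internal code may differ.

\preformatted{
set.seed(1)
covars <- data.frame(var1 = rnorm(50, 10, 2), var2 = rnorm(50, 5, 1))

# Center each column and divide by its standard deviation
scaled <- scale(covars)

colMeans(scaled)       # approximately 0 for every column
apply(scaled, 2, sd)   # 1 for every column
}

Standardizing puts the variables on a common scale, so that a variable measured in large
units does not dominate the Euclidean distances underlying the energy distance.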
}
\examples{

# Example 1: 10 studies with real outliers (Study-8, Study-9, Study-10)
set.seed(123)
dat <- data.frame(
  study = factor(rep(paste0("Study-", 1:10), each = 20)),
  var1 = c(rnorm(20, 10, 1), rnorm(20, 10, 1), rnorm(20, 10, 1), rnorm(20, 10, 1),
           rnorm(20, 10, 1), rnorm(20, 10, 1), rnorm(20, 10, 1), rnorm(20, 15, 1),
           rnorm(20, 10, 1), rnorm(20, 16, 1)),
  var2 = c(rnorm(20, 5, 1), rnorm(20, 5, 1), rnorm(20, 5, 1), rnorm(20, 5, 1),
           rnorm(20, 5, 1), rnorm(20, 5, 1), rnorm(20, 5, 1), rnorm(20, 5, 1),
           rnorm(20, 10, 1), rnorm(20, 5, 1))
)
out <- find_outliers(study ~ var1 + var2, data = dat, R = 200)
out$summary      # Study-8, Study-9, Study-10 should be flagged
out$cutoff_value # Permutation-based threshold

# Example 2: 20 studies with NO real outliers (all from same distribution)
set.seed(456)
dat_no_outliers <- data.frame(
  study = factor(rep(paste0("Study-", 1:20), each = 15)),
  var1 = rnorm(300, 10, 2),
  var2 = rnorm(300, 5, 1)
)
out2 <- find_outliers(study ~ var1 + var2, data = dat_no_outliers, R = 200)
out2$summary     # Should have few or no outliers flagged
sum(out2$summary$is_outlier)  # Count of flagged outliers (expected: 0 or very few)

}
