Multiple testing procedures are important tools for identifying statistically significant findings in massive and complex data while controlling a specific error rate. An important focus has been given to methods controlling the false discovery rate (FDR), i.e., the expected proportion of falsely rejected hypotheses among all rejected hypotheses, which has become the standard error rate for high dimensional data analysis. Since the original procedure of Benjamini and Hochberg (1995), much effort has been undertaken to design FDR controlling procedures that adapt to various underlying structures of the data, such as the quantity of signal, the signal strength and the dependencies, among others.

The **R** package **DiscreteFDR**,
presented in this paper, deals with adaptation to discrete and
non-identically distributed test statics by implementing procedures
developed by Döhler, Durand
and Roquain (2018) (in the sequel abbreviated as [DDR]). This type of data
arises in many relevant applications, in particular when data represent
frequencies or counts. Examples can be found in clinical studies (see
e.g., Westfall
and Wolfinger (1997)), genome-wide association studies (GWAS) (see
e.g., Dickhaus,
Straßburger et al. (2012)) and next generation sequencing data (NGS)
(see e.g., Doerge and Chen
(2015)}).

To give a first impression of how **DiscreteFDR** works,
we consider an artificial toy example. A more realistic example
involving pharmacovigilance data is given in Section 2.

Suppose we would like to compare two treatments in nine different populations. For each population we do this by evaluating the responders and non-responders for each treatment. This leads to categorical data which can be represented, for each population \(i = 1, \ldots, 9\) in the following 2 \(\times\) 2 table:

Responders | Non-responders | ||
---|---|---|---|

Treatment 1 | \(x_{1i}\) | \(y_{1i}\) | \(n_{1i}\) |

Treatment 2 | \(x_{2i}\) | \(y_{2i}\) | \(n_{2i}\) |

Total |
\(x_{1i} + x_{2i}\) | \(y_{1i} + y_{2i}\) | \(n = n_{1i} + n_{2i}\) |

Denoting the responder probabilities for population \(i\) by \(\pi_{1i}\) and \(\pi_{2i}\) we can test e.g.

\[H_{0i}: \pi_{1i} = \pi_{2i} \qquad \text{vs.} \qquad H_{1i}: \pi_{1i} \neq \pi_{2i}\]

by using Fisher’s (two-sided) exact test (see Lehmann and Romano
(2006)), which is implemented in the **R** function
`fisher.test`

. Suppose the data in the nine populations are
independent and we observe the following data frame `df`

```
library(knitr)
X1 <- c(4, 2, 2, 14, 6, 9, 4, 0, 1)
X2 <- c(0, 0, 1, 3, 2, 1, 2, 2, 2)
N1 <- rep(148, 9)
N2 <- rep(132, 9)
Y1 <- N1 - X1
Y2 <- N2 - X2
df <- data.frame(X1, Y1, X2, Y2)
kable(df, caption = "Toy Example")
```

X1 | Y1 | X2 | Y2 |
---|---|---|---|

4 | 144 | 0 | 132 |

2 | 146 | 0 | 132 |

2 | 146 | 1 | 131 |

14 | 134 | 3 | 129 |

6 | 142 | 2 | 130 |

9 | 139 | 1 | 131 |

4 | 144 | 2 | 130 |

0 | 148 | 2 | 130 |

1 | 147 | 2 | 130 |

In this data frame each of the 9 rows represents the data of an
observed 2 \(\times\) 2 table: e.g.,
the third row of the data corresponds to \(x_{13} = 2, y_{13} = 146, x_{23} = 1, y_{23} =
131\). Even though in this example, the total number of tested
hypotheses \(m = 9\) is very small, for
illustrative purposes we deal with the multiplicity problem here by
controlling FDR at level \(\alpha =
5\%\). The DBH step-down procedure can be applied directly to the
data frame object `df`

and perform Fisher’s exact test
in-between. This yields an S3 object of class `DiscreteFDR`

,
for which we provide both `print`

and `summary`

methods:

```
library(DiscreteFDR)
DBH.sd.fast <- direct.discrete.BH(df, "fisher", direction = "sd")
print(DBH.sd.fast)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: df
#> Number of tests = 9
#> Number of rejections = 2 at global FDR level 0.05
#> (Original BH rejections = 0)
#> Largest rejected p value: 0.02126871
summary(DBH.sd.fast)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: df
#> Number of tests = 9
#> Number of rejections = 2 at global FDR level 0.05
#> (Original BH rejections = 0)
#> Largest rejected p value: 0.02126871
#>
#> Index P.value Adjusted Rejected
#> 1 4 0.01243145 0.03819796 TRUE
#> 2 6 0.02126871 0.03819796 TRUE
#> 3 1 0.12476691 0.25630985 FALSE
#> 4 8 0.22135177 0.47895996 FALSE
#> 5 5 0.28849298 0.51482782 FALSE
#> 6 2 0.49984639 1.00000000 FALSE
#> 7 9 0.60329543 1.00000000 FALSE
#> 8 7 0.68723229 1.00000000 FALSE
#> 9 3 1.00000000 1.00000000 FALSE
```

The output of the `summary`

function contains the same
output as the `print`

method, but adds a table that lists the
raw \(p\)-values, their adjusted
counterparts and their respective rejection decisions. It is sorted by
raw \(p\)-values in ascending order.
Our `summary`

method actually creates a
`summary.DiscreteFDR`

object, which includes all contents of
an `DiscreteFDR`

object plus the aforementioned table. This
table can be accessed directly by the `$Table`

command.

```
DBH.sd.fast.summary <- summary(DBH.sd.fast)
summary.table <- DBH.sd.fast.summary$Table
kable(summary.table, caption = "Summary table")
```

Index | P.value | Adjusted | Rejected |
---|---|---|---|

4 | 0.0124314 | 0.0381980 | TRUE |

6 | 0.0212687 | 0.0381980 | TRUE |

1 | 0.1247669 | 0.2563098 | FALSE |

8 | 0.2213518 | 0.4789600 | FALSE |

5 | 0.2884930 | 0.5148278 | FALSE |

2 | 0.4998464 | 1.0000000 | FALSE |

9 | 0.6032954 | 1.0000000 | FALSE |

7 | 0.6872323 | 1.0000000 | FALSE |

3 | 1.0000000 | 1.0000000 | FALSE |

Thus we can reject two hypotheses at FDR-level \(\alpha = 5\%\). Note, that our
`print`

method also gives the number of rejections of the
usual (continuous) BH procedure. In order to compare its adjusted \(p\)-values with ours, we have to determine
the raw \(p\)-values first. This would
be possible by applying the `fisher.test`

function to all
nine 2 \(\times\) 2 tables.
Alternatively, we may use the more convenient function
`generate.pvalues`

, which is included in our package, for
accessing the raw \(p\)-values. Since
it only accept hypothesis test functions from the package
`DiscreteTests`

(either as a function object or a character
string that can be abbreviated), we could also use such a function
directly, e.g. `fisher.test.pv`

. An even more simple way is
to extract them from the `DiscreteFDR`

object that we
obtained before and contains the results:

```
# compute results of Fisher's exact test for each row of 'df'
fisher.p <- generate.pvalues(df, "fisher", list(alternative = "two.sided"))
# extract raw observed p-values
raw.pvalues <- fisher.p$get_pvalues()
# or
library(DiscreteTests)
fisher.p.2 <- fisher.test.pv(df, "two.sided")
raw.pvalues.2 <- fisher.p.2$get_pvalues()
# or:
raw.pvalues.3 <- DBH.sd.fast$Data$raw.pvalues
all(raw.pvalues == raw.pvalues.2) && all(raw.pvalues == raw.pvalues.3)
#> [1] TRUE
p.adjust(raw.pvalues, method = "BH")
#> [1] 0.37430072 0.74976959 1.00000000 0.09570921 0.51928737 0.09570921 0.77313633
#> [8] 0.49804147 0.77313633
```

Applying the continuous BH procedure from the **stats**
package in the last line of code, we find that we cannot reject any
hypotheses at FDR-level \(\alpha =
5\%\). Another approach of determining this is to compare the
critical values of the discrete and continuous BH procedures. In the
discrete case, we need the observed \(p\)-values and their distributions. Both
were computed by our `generate.pvalues`

function above. We
can either extract them and pass them to the `DBH`

function,
or directly apply the function to the test results object itself:

```
# extract raw observed p-values
raw.pvalues <- fisher.p$get_pvalues()
# extract p-value support sets
pCDFlist <- fisher.p$get_pvalue_supports()
# use raw pvalues and list of supports:
DBH.sd.crit <- DBH(raw.pvalues, pCDFlist, 0.05, "sd", TRUE)
crit.vals.BH.disc <- DBH.sd.crit$Critical.values
# use test results object directly
DBH.sd.crit.2 <- DBH(fisher.p, 0.05, "sd", TRUE)
crit.vals.BH.disc.2 <- DBH.sd.crit.2$Critical.values
# compare
all(crit.vals.BH.disc == crit.vals.BH.disc.2)
#> [1] TRUE
```

The latter way enables the use of a pipe:

```
df |>
fisher.test.pv(alternative = "two.sided") |>
DBH(0.05, "sd", TRUE) |>
with(Critical.values)
#> [1] 0.01243145 0.02832448 0.03109596 0.04839433 0.05014119 0.07657062 0.07657062
#> [8] 0.10328523 0.10328523
```

Now, we can compare the critical values:

```
crit.vals.BH.cont <- 1:9 * 0.05/9
tab <- data.frame(raw.pvalues = sort(raw.pvalues), crit.vals.BH.disc, crit.vals.BH.cont)
kable(tab, caption = "Observed p-values and critical values")
```

raw.pvalues | crit.vals.BH.disc | crit.vals.BH.cont |
---|---|---|

0.0124314 | 0.0124314 | 0.0055556 |

0.0212687 | 0.0283245 | 0.0111111 |

0.1247669 | 0.0310960 | 0.0166667 |

0.2213518 | 0.0483943 | 0.0222222 |

0.2884930 | 0.0501412 | 0.0277778 |

0.4998464 | 0.0765706 | 0.0333333 |

0.6032954 | 0.0765706 | 0.0388889 |

0.6872323 | 0.1032852 | 0.0444444 |

1.0000000 | 0.1032852 | 0.0500000 |

Obviously, the critical values of the discrete approach are considerably larger than those of the continuous method. As a result, the two smallest \(p\)-values are rejected by the discrete BH procedure, because they are smaller than or equal to the respective critical values. The continuous BH approach does not reject any hypothesis, because all its critical values are smaller than the observed \(p\)-values.

For illustration, our package includes a `plot`

method for
`DiscreteFDR`

S3 class objects. It allows to define colors,
line types and point characters separately for accepted and rejected
\(p\)-values and for critical values
(if present in the object).

```
plot(DBH.sd.crit, col = c("red", "blue", "green"), pch = c(4, 2, 19), lwd = 2, type.crit = 'o',
legend = "topleft", cex = 1.3)
```

Now, it is visible which observed \(p\)-values are rejected. A comparison with the continuous BH procedure could be done as follows:

```
plot(DBH.sd.crit, col = c("red", "blue", "green"), pch = c(4, 2, 19), lwd = 2, type.crit = 'o',
cex = 1.3, ylim = c(0, 0.25), main = "Comparison of discrete and continuous BH procedures")
points(crit.vals.BH.cont, pch = 19, cex = 1.3, lwd = 2)
legend("topright", c("Rejected", "Accepted", "Critical Values (disc.)", "Critical Values (cont.)"),
col = c("red", "blue", "green", "black"), pch = c(4, 2, 19, 19), lwd = 2, lty = 0)
```

As this example illustrates, the discrete approach can potentially yield a large increase in power. The gain depends on the involved distribution functions and the raw \(p\)-values. To appreciate where this comes from, it is instructive to consider the distribution functions \(F_1, \ldots, F_9\) of the \(p\)-values under the null in more detail. Take for instance the first 2 \(\times\) 2 table:

Responders | Non-responders | ||
---|---|---|---|

Treatment 1 | 4 | 144 | 148 |

Treatment 2 | 0 | 132 | 132 |

Total |
4 | 276 | 280 |

Fisher’s exact test works by determining the probability of observing
this (or a more ‘extreme’) table, given that the margins are fixed. So
each \(F_i\) is determined by the
margins of table \(i\). Since \(x_{11} + x_{21} = 4\), the only potentially
observable tables are given by \(x_{11} = 0,
\ldots, 4\). For each one of these 5 values a \(p\)-value can be determined using the
hypergeometric distribution. Therefore, the \(p\)-value of any 2 \(\times\) 2 table with margins given by the
above table can take (at most) 5 distinct values, say \(x_1, \ldots, x_5\). Combining these 5
values into a set, we obtain the *support* \(\mathcal{A}_1 = \{x_1, \ldots, x_5\}\) of
distribution \(F_1\). Now we can
continue in this vein for the remaining 2 \(\times\) 2 tables to obtain the supports
\(\mathcal{A}_1, \ldots,
\mathcal{A}_9\) for the distribution functions \(F_1, \ldots, F_{9}\). The supports can be
accessed via the `$get_pvalue_supports()`

function of the
results object, e.g.

```
# extract p-value support sets
pCDFlist <- fisher.p$get_pvalue_supports()
# extract 1st and 5th support set
pCDFlist[c(1,5)]
#> [[1]]
#> [1] 0.04820493 0.12476691 0.34598645 0.62477763 1.00000000
#>
#> [[2]]
#> [1] 0.002173856 0.007733719 0.028324482 0.069964309 0.154043258 0.288492981
#> [7] 0.481808361 0.726262402 1.000000000
```

Here, this returns \(\mathcal{A}_1\) and \(\mathcal{A}_5\). Panel (a) in the following figure shows a graph of the distribution functions \(F_1, \ldots, F_9\). Each \(F_i\) is a step-function with \(F_i(0) = 0\), the jumps occurring only on the support \(\mathcal{A}_i\) and \(F_i(x) = x\) only for \(x \in \mathcal{A}_i\). In particular, all distributions are stochastically larger than the uniform distribution (i.e., \(F_i(x) \le x\)), but in a heterogeneous manner. This heterogeneity can be exploited e.g., by transforming the raw \(p\)-values from the exact Fisher’s test using the function \[\displaystyle \xi_{\text{SD}}(x) = \sum_{i = 1}^{9} \frac{F_i(x)}{1 - F_i(x)}.\] Panel (b) of the following plot shows that \(\xi_{\text{SD}}\) is a step function. Its jumps occur on the joint support \(\mathcal{A}= \mathcal{A}_1 \cup \ldots \cup \mathcal{A}_9\). Panel (b) also shows that \(\displaystyle \xi_{\text{SD}}(x) \ll x\), at least for small values of \(x\). It turns out that the critical values of our new DBH step-down procedure are essentially given by inverting \(\xi_{\text{SD}}\) at the critical values of the [BH] procedure \(1 \cdot \alpha / 9, 2 \cdot \alpha / 9, \ldots, \alpha\), so that these values are considerably larger than the [BH] critical values. This is illustrated in panel (c) along with the ordered \(p\)~values. In particular, all asterisks are located above the green [BH] dots, therefore this procedure can not reject any hypothesis. In contrast, the two smallest \(p\)~values are located below red DBH step-down dots, so that this procedure rejects two hypotheses as we had already seen earlier.

```
stepf <- lapply(pCDFlist, function(x) stepfun(x, c(0, x)))
par(mfcol = c(1, 3), mai = c(1, 0.5, 0.3, 0.1))
plot(stepf[[1]], xlim = c(0, 1), ylim = c(0, 1), do.points = FALSE, lwd = 1, lty = 1, ylab = "F(x)",
main = "(a)")
for(i in (2:9)){
plot(stepf[[i]], add = TRUE, do.points = FALSE, lwd = 1, col = i)
}
segments(0, 0, 1, 1, col = "grey", lty = 2)
# Plot xi
support <- sort(unique(unlist(pCDFlist)))
components <- lapply(stepf, function(s){s(support) / (1 - s(support))})
xi.values <- 1/9 * Reduce('+', components)
xi <- stepfun(support, c(0, xi.values))
plot(xi, xlim = c(0, 0.10), ylim = c(0, 0.10), do.points = FALSE, ylab = expression(xi), main = "(b)")
segments(0, 0, 0.1, 0.1, col = "grey", lty = 2)
# Plot discrete critical values as well a BH constants and raw p-values
DBH.sd <- DBH(fisher.p, direction = "sd", ret.crit.consts = TRUE)
plot(DBH.sd, col = c("black", "black", "red"), pch = c(4, 4, 19), type.crit = 'p', ylim = c(0, 0.15),
cex = 1.3, main = "(c)", ylab = "Critical Values")
points(1:9, 0.05 * (1:9) / 9, col = "green", pch = 19, cex = 1.3)
mtext("Figure 1", 1, outer = TRUE, line = -2)
```

Panel (a) depicts the distribution functions \(F_1, \ldots, F_9\) in various colors, (b) is a graph of the transformation \(\xi_{\text{SD}}\). The uniform distribution function is shown in light grey in (a) and (b). Panel (c) shows the [BH] critical values (green dots), the DBH step-down critical values (red dots) and the sorted raw \(p\)-values (asterisks).

To illustrate how the procedures in **DiscreteFDR** can
be used for real data, we revisit the analysis of the pharmacovigilance
data from Heller and Gur
(2011) performed in [DDR]. This data set is
obtained from a database for reporting, investigating and monitoring
adverse drug reactions due to the Medicines and Healthcare products
Regulatory Agency in the United Kingdom. It contains the number of
reported cases of amnesia as well as the total number of adverse events
reported for each of the \(m = 2446\)
drugs in the database. For more details we refer to Heller and Gur (2011) and
to the accompanying R-package **discreteMTP** (Heller et
al. (2012)) (no longer available on CRAN), which also contains the
data. Heller and Gur
(2011) investigate the association between reports of amnesia and
suspected drugs by performing for each drug a Fisher’s exact test
(one-sided) for testing association between the drug and amnesia while
adjusting for multiplicity by using several (discrete) FDR procedures.
In what follows we present code that reproduces parts of Figure 2 and
Table 1 in [DDR].

We proceed as in the example in Section 1. Since we need to access
the critical values, we first determine the \(p\)-values and their support for the data
set `amnesia`

contained for convenience in the package
**DiscreteFDR**. For this, we use
`generate.pvalues`

in conjunction with the pre-processing
function `reconstruct_two`

from package
`DiscreteDatasets`

, which rebuilds \(2 \times 2\) tables from single columns or
rows by using additional knowledge of the marginals.

```
library(DiscreteFDR)
library(DiscreteDatasets)
data(amnesia)
amnesia.formatted <- generate.pvalues(amnesia, "fisher", list(alternative = "greater"), reconstruct_two)
```

A more comprehensible way is the use of a pipe:

```
library(DiscreteDatasets)
library(DiscreteTests)
amnesia.formatted <- amnesia |>
reconstruct_two() |>
fisher.test.pv(alternative = "greater")
```

Then we perform the FDR analysis with functions `DBH`

and
`ADBH`

(SU and SD) and `DBR`

at level \(\alpha = 0.05\) including critical
values.

```
DBH.su <- DBH(amnesia.formatted, ret.crit.consts = TRUE)
DBH.sd <- DBH(amnesia.formatted, direction = "sd", ret.crit.consts = TRUE)
ADBH.su <- ADBH(amnesia.formatted, ret.crit.consts = TRUE)
ADBH.sd <- ADBH(amnesia.formatted, direction = "sd", ret.crit.consts = TRUE)
DBR <- DBR(amnesia.formatted, ret.crit.consts = TRUE)
```

It is helpful to have a histogram of the observed \(p\)-pvalues. For this, this package
provides a `hist`

method for `DiscreteFDR`

class
objects, too.

This histogram indicates a highly discrete \(p\)-value distribution, which strongly suggests the use of discrete methods.

By accessing the critical values we can now generate a plot similar to Figure 2 from [DDR]. Note that both [DBH-SU] and [DBH-SD] are visually indistinguishable from their adaptive counterparts.

```
raw.pvalues <- amnesia.formatted$get_pvalues()
m <- length(raw.pvalues)
crit.values.BH <- 0.05 * seq_len(m) / m
scale.points <- 0.7
plot(DBH.su, col = c("black", "black", "orange"), pch = NA, type.crit = 'p', xlim = c(1, 100),
ylim = c(0, DBH.su$Critical.values[100]), ylab = "critical values", cex = scale.points, main = "")
points(crit.values.BH[1:105], col = "green", pch = 19, cex = scale.points)
points(DBH.sd$Critical.values[1:105], col = "red", pch = 19, cex = scale.points)
points(ADBH.su$Critical.values[1:105], col = "blue", pch = 19, cex = scale.points)
points(ADBH.sd$Critical.values[1:105], col = "purple", pch = 19, cex = scale.points)
points(DBR$Critical.values[1:105], col = "yellow", pch = 19, cex = scale.points)
points(sort(raw.pvalues), pch = 4, cex = scale.points)
mtext("Figure 2", 1, outer = TRUE, line = -1)
```

Critical values for [BH] (green dots), [DBH-SU] (orange dots), [DBH-SD] (red dots), [A-DBH-SU] (blue dots), [A-DBH-SD] (purple dots), [DBR] (yellow dots). The sorted raw \(p\)-values are represented by asterisks.

The rejected hypotheses can be accessed via the command
`$Indices`

. The following code yields some of the values from
Table 1 in [DDR]:

```
rej.BH <- length(which(p.adjust(raw.pvalues, method = "BH") <= 0.05))
rej.DBH.su <- length(DBH.su$Indices)
rej.DBH.sd <- length(DBH.sd$Indices)
rej.ADBH.su <- length(ADBH.su$Indices)
rej.ADBH.sd <- length(ADBH.sd$Indices)
rej.DBR <- length(DBR$Indices)
c(rej.BH, rej.DBH.su, rej.DBH.sd, rej.ADBH.su, rej.ADBH.sd, rej.DBR)
#> [1] 24 27 27 27 27 27
```

The (continuous) BH rejects only 24 hypotheses whereas all the
discrete procedures implemented in **DiscreteFDR** are able
to identify three additional drug candidates potentially associated with
amnesia.

In this section we sketch how can be used to analyze arbitrary
multiple discrete tests. Jiménez-Otero et
al. (2018) used **DiscreteFDR** to detect disorder in
NGS experiments based on one-sample tests of the Poisson mean. Rather
than reproducing their analysis in detail, we illustrate the general
approach by using a toy example similar to the one in Section 1 and show
how the test of the Poisson mean can be accommodated by
**DiscreteFDR**.

To fix ideas, suppose we observe \(m = 9\) independent Poisson distributed counts \(N_1, \ldots, N_9\) (Jiménez-Otero et al. (2018) used this to model the read counts of different DNA bases). We assume that \(N_i \sim \text{Pois}(\lambda_i)\) and the goal is to identify cases where \(\lambda_i\) is larger than some pre-specified value \(\lambda^0_i\), i.e., we have the (one-sided) multiple testing problem \[H_{0i}: \lambda_i = \lambda^0_i \qquad \text{vs.} \qquad H_{1i}: \lambda_i > \lambda^0_i.\] As in Section 1, the goal is to adjust for multiple testing by using the [DBH-SD] procedure at FDR-level \(\alpha = 5\%\). In our example the observations \(n_1,\ldots, n_9\) and parameters \(\lambda^0_1, \ldots, \lambda^0_9\) are given as follows:

```
lambda.vector <- c(0.6, 1.2, 0.7, 1.3, 1.0, 0.2, 0.8, 1.3, 0.9)
observations <- c(3, 3, 1, 2, 3, 3, 1, 2, 4)
configuration <- cbind(observations, lambda.vector)
alpha <- 0.05
m <- length(observations)
print(configuration)
#> observations lambda.vector
#> [1,] 3 0.6
#> [2,] 3 1.2
#> [3,] 1 0.7
#> [4,] 2 1.3
#> [5,] 3 1.0
#> [6,] 3 0.2
#> [7,] 1 0.8
#> [8,] 2 1.3
#> [9,] 4 0.9
```

Denote by \(G_i\) the distribution
of \(N_i\) under \(H_{0i}\) i.e., \(G_i(x) = P(N_i \le x)\). For observations
\(n_1,\ldots, n_9\) of \(N_1, \ldots, N_9\) the \(p\)-values for the above one-sided test are
given by \[p_i = P(N_i \ge n_i) = P(N_i >
n_i - 1) = \overline{G_i}(n_i - 1),\] where \(\overline{G_i}(x) = P(N_i > x) = 1 -
G_i(x)\) denotes the survival function of the Poisson
distribution with parameter \(\lambda^0_i\). Thus the raw \(p\)-values are determined by the following
**R** code:

```
raw.pvalues <- ppois(observations - 1, lambda.vector, lower.tail = FALSE)
poisson.p <- poisson.test.pv(observations, lambda.vector, "greater")
raw.pvalues.2 <- poisson.p$get_pvalues()
print(raw.pvalues.2)
#> [1] 0.023115288 0.120512901 0.503414696 0.373176876 0.080301397 0.001148481
#> [7] 0.550671036 0.373176876 0.013458721
```

Following the definition of the function in **R** we
define the inverse function of \(\overline{G_i}\) by \[\left(\overline{G_i}\right)^{-1}(p) = \min\{n \in
\mathbb{N}: \overline{G_i}(n) \le p\}\] and obtain for the
distribution function of the \(i\)-th
\(p\)-value under the null \[F_i(x) =
\overline{G_i}\left(\left(\overline{G_i}\right)^{-1}(x)\right).\]
Each function \(F_i\) is a step
function with \(F_i(0) = 0\), \(F_i(1) = 1\) and there exists an infinite
sequence of jumps at locations \(1 = x_1 >
x_2 > \ldots > x_n > x_{n + 1} > \ldots > 0\) such
that \(F(x_j) = x_j\) for \(j \in \mathbb{N}\).

Initially it seems that we run into a problem if we want to determine the critical values of [DBH-SD] since the supports of \(F_1, \ldots, F_9\) are no longer finite (but still discrete). We can deal with this problem by using the observation that it is sufficient to consider new, restricted supports \(\mathcal{A}_i \cap [s^{\tiny \mbox{min}},1]\) where the lower threshold satisfies \[\begin{align} s^{\tiny \mbox{min}} &\le \tau^{\tiny \mbox{min}}_1 = \max \left\{t \in \mathcal{A}\::\: t \leq y^{\tiny \mbox{min}} \right\} \qquad \text{where} \qquad y^{\tiny \mbox{min}} = \frac{\alpha}{m} \cdot \left(1 + \frac{\alpha}{m} \right)^{-1}. \end{align}\] To determine such an \(s^{\tiny \mbox{min}}\) we proceed as follows. Define \(n^{\tiny \mbox{max}}_i = \left(\overline{G_i}\right)^{-1}(y^{\tiny \mbox{min}}) + 1\), \(t^{\tiny \mbox{min}}_i = \overline{G_i}(n^{\tiny \mbox{max}}_i - 1)\) and set \(s^{\tiny \mbox{min}} = \min\left(t^{\tiny \mbox{min}}_1, \ldots, t^{\tiny \mbox{min}}_9 \right)\). It is easily checked that this choice of \(s^{\tiny \mbox{min}}\) satisfies the above equation. We can determine \(s^{\tiny \mbox{min}}\) by the following code

```
y.min <- alpha/m * (1 + alpha/m)^(-1)
n.max <- qpois(y.min, lambda.vector, lower.tail = FALSE) + 1
t.min <- ppois(n.max - 1, lambda.vector, lower.tail = FALSE)
s.min <- min(t.min)
print(s.min)
#> [1] 0.0007855354
```

The `poisson.test.pv`

function from package
`DiscreteTests`

computes the support with \(y^{\tiny \mbox{min}}\) being the smallest
observable p-value which can be represented by double precision,
i.e. the smallest one that is not rounded to 0.

```
sapply(poisson.p$get_pvalue_supports(), min)
#> [1] 1.482197e-323 7.905050e-323 2.519735e-322 5.434722e-323 9.881313e-324
#> [6] 8.893182e-323 4.397184e-322 5.434722e-323 1.333977e-322
```

For determining the restricted supports it is actually more
convenient to work with \(n^{\tiny
\mbox{max}}_i\) than \(s^{\tiny
\mbox{min}}\). We can subsequently use these supports as the
`pCDFlist`

argument in the usual way when calling the
`DBH`

function:

```
supports <- lapply(1:m, function(w){sort(ppois(0:n.max[w] - 1, lambda.vector[w], lower.tail = FALSE))})
DBH.sd <- DBH(raw.pvalues, supports, direction = "sd", ret.crit.consts = TRUE)
print(DBH.sd)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: raw.pvalues and supports
#> Number of tests = 9
#> Number of rejections = 3 at global FDR level 0.05
#> (Original BH rejections = 1)
#> Largest rejected p value: 0.02311529
```

We can also use the results object of
`poisson.test.pv`

:

```
DBH.sd.2 <- DBH(poisson.p, direction = "sd", ret.crit.consts = TRUE)
print(DBH.sd.2)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: poisson.p
#> Number of tests = 9
#> Number of rejections = 3 at global FDR level 0.05
#> (Original BH rejections = 1)
#> Largest rejected p value: 0.02311529
```

Figure 3 shows a summary similar to Figure 1. Applying the continuous BH procedure

```
p.adjust(raw.pvalues, method = "BH")
#> [1] 0.06934586 0.21692322 0.55067104 0.47979884 0.18067814 0.01033633 0.55067104
#> [8] 0.47979884 0.06056424
```

results in one rejection at FDR-level \(\alpha = 5\%\), whereas the DBH step-down procedure can reject three hypotheses:

```
DBH.sd$Adjusted
#> [1] 0.039602625 0.101622881 0.580898946 0.522450788 0.101509307 0.001935955
#> [7] 0.626257875 0.522450788 0.033073393
```

This information can also be obtained by our `print`

or
`summary`

methods:

```
print(DBH.sd)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: raw.pvalues and supports
#> Number of tests = 9
#> Number of rejections = 3 at global FDR level 0.05
#> (Original BH rejections = 1)
#> Largest rejected p value: 0.02311529
summary(DBH.sd)
#>
#> Discrete Benjamini-Hochberg procedure (step-down)
#>
#> Data: raw.pvalues and supports
#> Number of tests = 9
#> Number of rejections = 3 at global FDR level 0.05
#> (Original BH rejections = 1)
#> Largest rejected p value: 0.02311529
#>
#> Index P.value Critical.value Adjusted Rejected
#> 1 6 0.001148481 0.009079858 0.001935955 TRUE
#> 2 9 0.013458721 0.018988157 0.033073393 TRUE
#> 3 1 0.023115288 0.033768968 0.039602625 TRUE
#> 4 5 0.080301397 0.034141584 0.101509307 FALSE
#> 5 2 0.120512901 0.043095453 0.101622881 FALSE
#> 6 4 0.373176876 0.047422596 0.522450788 FALSE
#> 7 8 0.373176876 0.062856934 0.522450788 FALSE
#> 8 3 0.503414696 0.062856934 0.580898946 FALSE
#> 9 7 0.550671036 0.080301397 0.626257875 FALSE
```

As in Figure 1, Panel (c) presents a graphical comparison between the two procedures applied to the \(p\)-values.

```
stepf <- lapply(supports, function(x) stepfun(x, c(0, x)))
par(mfcol = c(1, 3), mai = c(1, 0.5, 0.3, 0.1))
plot(stepf[[1]], xlim = c(0,1), ylim = c(0,1), do.points = FALSE, lwd = 1, lty = 1, ylab = "F(x)",
main = "(a)")
for(i in (2:9)){
plot(stepf[[i]], add = TRUE, do.points = FALSE, lwd = 1, col = i)
}
segments(0, 0, 1, 1, col = "grey", lty = 2)
# Plot xi
support <- sort(unique(unlist(supports)))
components <- lapply(stepf, function(s){s(support) / (1 - s(support))})
xi.values <- 1/9 * Reduce('+', components)
xi <- stepfun(support, c(0, xi.values))
plot(xi, xlim = c(0, 0.10), ylim = c(0, 0.10), do.points = FALSE, ylab = expression(xi), main = "(b)")
segments(0, 0, 0.1, 0.1, col = "grey", lty = 2)
# Plot discrete critical values as well a BH constants
DBH.sd <- DBH(raw.pvalues, supports, direction = "sd", ret.crit.consts = TRUE)
plot(DBH.sd, col = c("black", "black", "red"), pch = c(4, 4, 19), type.crit = 'p', ylim = c(0, 0.15),
cex = 1.3, main = "(c)", ylab = "Critical Values")
points(1:9, 0.05 * (1:9) / 9, col = "green", pch = 19, cex = 1.3)
mtext("Figure 3", 1, outer = TRUE, line = -2)
```

Panel (a) depicts the distribution functions \(F_1, \ldots, F_9\) in various colors, (b) is a graph of the transformation function \(\xi_{\text{SD}}\). The uniform distribution function is shown in light grey in (a) and (b). Panel (c) shows the [BH] critical values (green dots), the DBH step-down critical values (red dots) and the sorted raw \(p\)-values (asterisks).