repo.data

options(repos = c("@CRAN@" = "https://CRAN.R-project.org"))
library(repo.data)

This vignette is written for concerned package maintainers or users who want to check how their package relates to other packages. For general usage of the other functions, check the manual or the package index (a vignette might come too).

Keeping up with the repositories

Packages required by a package might have their own dependencies with minimum version requirements. Maintainers or developers might wonder what is the oldest version of each recursive dependency that their users are required to have. This is useful when developing packages that should remain compatible with old versions of R and of other packages.

pd <- package_dependencies("ggeasy")
head(pd)
#>        Name Version    Type   Op Package
#> 1         R   4.4.0 Depends   >=    <NA>
#> 2       cli   3.4.0 Imports   >=    <NA>
#> 3 lifecycle   1.0.3 Imports   >=    <NA>
#> 4     rlang   1.1.0 Imports   >=    <NA>
#> 5     vctrs   0.6.0 Imports   >=    <NA>
#> 6 grDevices    <NA>    <NA> <NA>    <NA>

package_dependencies() identifies the minimum version required for each dependency. If no version is required by any of the dependencies, NA is used.
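To illustrate the idea of a minimum required version, here is a minimal sketch in base R. It does not use repo.data: the `reqs` table and the `strictest()` helper are hypothetical, and the versions are toy values. When several packages request the same dependency, the highest requested version is the one users must satisfy.

```r
# Hypothetical requirements collected from several packages (toy data,
# not real output of repo.data).
reqs <- data.frame(
  Name    = c("rlang", "rlang", "cli", "vctrs", "cli"),
  Version = c("1.0.0", "1.1.0", "3.4.0", NA, "3.0.0"),
  stringsAsFactors = FALSE
)

# Keep the highest version requested for each dependency; NA when no
# package asks for a specific version.
strictest <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(NA_character_)
  as.character(max(package_version(v)))
}
min_required <- vapply(split(reqs$Version, reqs$Name), strictest, character(1))
min_required
```

Comparing through package_version() rather than as plain strings matters, because "1.10.0" is newer than "1.9.0" even though it sorts earlier as text.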

We can identify dependencies whose required version is lower than the version already required by one of the other dependencies with:

# Discover the requirements that can be upgraded
update_dependencies("ggeasy")
#>       Name Version
#> 1 testthat   3.2.0

Increasing these version requirements in {ggeasy} won’t affect users, as they should already have these versions installed because other dependencies require them.
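The comparison behind this kind of check could look like the following sketch. It is only an illustration in base R with toy values, not the implementation of update_dependencies(): `declared` stands for what a package asks for itself and `forced` for what its dependencies already impose (both hypothetical).

```r
# Toy data: what the package declares and what its dependencies already force.
declared <- c(testthat = "3.0.0", rlang = "1.1.0")
forced   <- c(testthat = "3.2.0", rlang = "1.1.0", cli = "3.4.0")

# Only dependencies present on both sides can be compared.
shared <- intersect(names(declared), names(forced))
lower  <- package_version(unname(declared[shared])) <
  package_version(unname(forced[shared]))

# Requirements that could be raised without affecting users.
upgradable <- data.frame(
  Name    = shared[lower],
  Version = unname(forced[shared[lower]]),
  stringsAsFactors = FALSE
)
upgradable
```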

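We might also be interested in the date since which users can install a package. There are two possible answers: the date the package itself was published, and the date since which all the dependencies it requires have been available.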

We can use package_date() to get those answers:

package_date("ggeasy")
#>                  Published             deps_available 
#> "2025-06-15 08:40:04 CEST" "2021-05-18 09:05:22 CEST"

Why are they important? The first one tells us whether the package hasn’t been updated in a long time. The second one helps estimate whether it can be installed on old systems without updating anything else. If the date the dependencies became available is close to the published date, users will need up-to-date systems and dependencies.
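As a rough sketch of how the second date can be reasoned about (an assumption about the logic, not the actual code of package_date()): a package becomes installable only once the last of its required dependency versions has been released. The dates below are toy values.

```r
# Toy release dates of the minimum required version of three
# hypothetical dependencies.
dep_release <- as.Date(c("2019-03-01", "2021-05-18", "2020-11-26"))
published   <- as.Date("2025-06-15")

# All requirements are met only after the latest of these releases.
deps_available <- max(dep_release)

# A large gap suggests the package can be installed on older systems.
gap <- published - deps_available
deps_available
gap
```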

Improving packages

Help pages are found via aliases: when a user types ?word, R searches for a matching alias. Checking for existing aliases might help you find packages and reduce confusion on the help pages.

alias <- cran_alias(c("fect", "gsynth"))
#> Retrieving cran_aliases, this might take a bit.
#> Caching results to be faster next call in this session.
dup_alias <- duplicated_alias(alias)
head(dup_alias)
#>          Target Package             Source
#> 1         XXinv    fect   fect-internal.Rd
#> 2         XXinv  gsynth gsynth-internal.Rd
#> 3      Y_demean    fect   fect-internal.Rd
#> 4      Y_demean  gsynth gsynth-internal.Rd
#> 5 _gsynth_XXinv    fect   fect-internal.Rd
#> 6 _gsynth_XXinv  gsynth gsynth-internal.Rd

For example, these two packages have the same aliases for their internal functions, but most of them point to the same file.
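The idea of a duplicated alias can be sketched in base R on a toy table with the same columns as above. The packages, targets and files here are hypothetical, and this is not the code of duplicated_alias().

```r
# Toy alias table with hypothetical packages and help files.
aliases <- data.frame(
  Target  = c("plot_fit", "helper", "helper", "summary_fit"),
  Package = c("pkgA", "pkgA", "pkgB", "pkgB"),
  Source  = c("plot_fit.Rd", "pkgA-internal.Rd",
              "pkgB-internal.Rd", "summary_fit.Rd"),
  stringsAsFactors = FALSE
)

# Flag every row whose Target appears more than once, scanning from
# both ends so that the first occurrence is kept too.
is_dup <- duplicated(aliases$Target) |
  duplicated(aliases$Target, fromLast = TRUE)
dup <- aliases[is_dup, ]
dup[order(dup$Target, dup$Package), ]
```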

Connecting help pages

Often it is helpful to link help pages to each other. We can look for help pages without links and for help pages that are not linked:

pkg <- "BaseSet"
head(cran_help_pages_wo_links(pkg))
#> Retrieving base_aliases, this might take a bit.
#> Caching results to be faster next call in this session.
#> Warning: Omitting packages bartMachineJARs, clean, fontBitstreamVera, fontLiberation, GreedyExperimentalDesignJARs, hse, LifeInsuranceContracts, openNLPdata, positron.tutorials, RKEAjars, RMOAjars, ROI.plugin.cplex, ROI.plugin.glpk, ROI.plugin.ipop, ROI.plugin.symphony, rsparkling, RWekajars, Sejong.
#> Maybe they are currently not on CRAN?
#> Warning: Packages 'Hmisc', 'geneHapR', 'sfsmisc' have targets not present in a
#> OS.
#> Retrieving cran_rdxrefs, this might take a bit.
#> Caching results to be faster next call in this session.
#> Warning in cran_links(cran_pkgs): Omitting packages bartMachineJARs, clean, fontBitstreamVera, fontLiberation, GreedyExperimentalDesignJARs, hse, LifeInsuranceContracts, openNLPdata, positron.tutorials, RKEAjars, RMOAjars, ROI.plugin.cplex, ROI.plugin.glpk, ROI.plugin.ipop, ROI.plugin.symphony, rsparkling, RWekajars, Sejong.
#> Maybe they are currently not on CRAN?
#> Warning: Some pages point to different places according to the OS.
#> Warning: Some links are distinct depending on the OS.
#>   Package             Source
#> 1 BaseSet BaseSet-package.Rd
#> 2 BaseSet   TidySet-class.Rd
#> 3 BaseSet        activate.Rd
#> 4 BaseSet      add_column.Rd
#> 5 BaseSet    add_elements.Rd
#> 6 BaseSet    add_relation.Rd
head(cran_help_pages_not_linked(pkg))
#>   Package             Source
#> 1 BaseSet BaseSet-package.Rd
#> 2 BaseSet   TidySet-class.Rd
#> 3 BaseSet        activate.Rd
#> 4 BaseSet      add_column.Rd
#> 5 BaseSet    add_elements.Rd
#> 6 BaseSet    add_relation.Rd

In addition to help pages that are not well connected, some pages might link only to each other, without connecting to other help pages of the package or of other packages.

Retrieving the help pages that form such a clique requires the suggested package igraph.

cliques <- cran_help_cliques(pkg)
#> Warning in cran_targets_links(pkges): Omitting packages BaseSet, dplyr, methods, rlang, utils, cli, generics, glue, lifecycle, magrittr, pillar, R6, tibble, tidyselect, vctrs, utf8, pkgconfig, withr, graphics, grDevices.
#> Maybe they are currently not on CRAN?
# Number of help pages connected
table(cliques$n) 
#> < table of extent 0 >

If there is more than one size, some pages are not linked to the rest of the package.
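To make the idea of isolated groups of pages concrete, here is a small base R sketch that groups pages connected by links. It computes connected components (treating links as undirected), which is a simplification swapped in for illustration and may differ from what cran_help_cliques() does with igraph. The `links` table and file names are hypothetical.

```r
# Toy links between help pages (hypothetical file names).
links <- data.frame(
  from = c("a.Rd", "b.Rd", "d.Rd"),
  to   = c("b.Rd", "c.Rd", "e.Rd"),
  stringsAsFactors = FALSE
)
pages <- sort(unique(c(links$from, links$to)))

# Start with one label per page and merge labels across every link
# until nothing changes.
label <- setNames(seq_along(pages), pages)
repeat {
  old <- label
  for (i in seq_len(nrow(links))) {
    ends <- c(links$from[i], links$to[i])
    label[ends] <- min(label[ends])
  }
  if (identical(old, label)) break
}

# Number of pages in each connected group.
group_sizes <- table(label)
group_sizes
```

In this toy example there are two groups, so the pages d.Rd and e.Rd are not reachable from the other three.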

Sometimes, even if links exist, they might not resolve correctly in the HTML version. This happens, for example, if they link to a help page of a package that is not on the strong dependency list.

cran_help_pages_links_wo_deps(pkg)
#> [1] Package Source  Anchor  Target 
#> <0 rows> (or 0-length row.names)

If there is some output, those links cannot be resolved correctly unless the other package is independently installed on the same machine.
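The check can be pictured as a set difference, as in this base R sketch with toy values (not the code of cran_help_pages_links_wo_deps()). It assumes that strong dependencies and the packages shipped with R are always installed, so only links to other packages are at risk.

```r
# Toy data: packages targeted by links in the help pages, and the
# strong dependencies of the package (Depends, Imports, LinkingTo).
link_targets <- c("dplyr", "rlang", "ggplot2", "stats")
strong_deps  <- c("dplyr", "rlang")

# Packages distributed with R, assumed to be always present.
base_pkgs <- c("base", "stats", "utils", "methods",
               "graphics", "grDevices", "datasets")

# Link targets that might be missing on a user's machine.
unresolved <- setdiff(link_targets, c(strong_deps, base_pkgs))
unresolved
```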

Reproducibility

If you wish to know which packages were available on CRAN on any given date, you can use:

cs <- cran_snapshot(as.Date("2020-01-31"))
#> Warning: There are 4 packages both archived and published
#> This indicate manual CRAN intervention.
#> Retrieving comments, this might take a bit.
#> Caching results to be faster next call in this session.
nrow(cs)
#> [1] 102015

This might help to know what was available for an old project and why some feature of a given package wasn’t used. Maybe it wasn’t available on that date!
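Conceptually, a snapshot keeps, for each package, the latest release published on or before the chosen date. The following base R sketch shows this on a toy archive with hypothetical packages. It is a simplification: it ignores packages that were archived (removed) before the date, which cran_snapshot() has to consider.

```r
# Toy release history of three hypothetical packages.
archive <- data.frame(
  Package   = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgC"),
  Version   = c("0.1.0", "0.2.0", "1.0.0", "2.3.1", "0.9.0"),
  Published = as.Date(c("2018-05-01", "2019-11-12", "2021-02-03",
                        "2019-07-30", "2020-06-15")),
  stringsAsFactors = FALSE
)
snapshot_date <- as.Date("2020-01-31")

# Keep releases published on or before the date, newest first, and
# then take the first (latest) row of each package.
past <- archive[archive$Published <= snapshot_date, ]
past <- past[order(past$Package, past$Published, decreasing = TRUE), ]
snapshot <- past[!duplicated(past$Package), ]
snapshot
```

Here pkgC is absent from the snapshot because its first release came after the chosen date, and pkgA appears at version 0.2.0 rather than 1.0.0.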

Local versions

While working, it might be good to update packages. To decide whether it is needed, you might want to know when packages were last updated on the system:

cran_session()
#> Warning in cran_archive(versions[, "Package"]): Omitting packages repo.data.
#> Maybe they were not on CRAN?
#> [1] "2025-08-27 18:40:06 CEST"

This uses the sessionInfo() output to find the date of last installation. Under the hood it uses a function that accepts arbitrary packages and their versions:

versions <- data.frame(Package = c("dplyr", "Rcpp", "rlang"),
                       Version = c("1.1.4", "0.8.9", NA))
cran_date(versions)
#> [1] "2023-11-17 17:50:03 CET"

This is the first date on which these packages were at the requested version number (or available). Currently these packages might have releases with higher version numbers (this can be easily checked with old.packages()).

To answer the original question of this section we can use:

ip <- cran_date(installed.packages())
#> Warning in cran_archive(versions[, "Package"]): Omitting packages repo.data, airway, alabaster.base, alabaster.matrix, alabaster.ranges, alabaster.sce, alabaster.schemas, alabaster.se, annotate, AnnotationDbi, AnnotationFilter, AnnotationHub, assorthead, Biobase, BiocFileCache, BiocGenerics, BiocIO, BioCor, BiocParallel, BiocPkgTools, BiocStyle, BiocVersion, biocViews, biomformat, Biostrings, cransays, DelayedArray, DESeq2, ensembldb, ExperimentHub, fgsea, GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicFeatures, GenomicRanges, GO.db, GOSemSim, GSEABase, gypsum, h5mread, HDF5Array, IRanges, KEGGREST, MatrixGenerics, microshades, org.Hs.eg.db, phyloseq, preprocessCore, ProtGenerics, reactome.db, resios, rhdf5, rhdf5filters, Rhdf5lib, Rhtslib, rostemplate, rotemplate, Rsamtools, rtracklayer, rutils, S4Arrays, S4Vectors, scRNAseq, SingleCellExperiment, SparseArray, SummarizedExperiment, UCSC.utils, XVector.
#> Maybe they were not on CRAN?
ip
#> [1] "2025-08-29 16:00:06 CEST"

Risk of being archived

If you ever wonder which packages are at risk of being archived you can use cran_doom():

cd <- cran_doom(bioc = TRUE)
#> Retrieving CRAN_db, this might take a bit.
#> Caching results to be faster next call in this session.
#> Retrieving bioc_available_release, this might take a bit.
#> Caching results to be faster next call in this session.
cd[c("time_till_last", "last_archived", "npackages")]
#> $time_till_last
#> Time difference of 71 days
#> 
#> $last_archived
#> [1] "2025-11-20"
#> 
#> $npackages
#>  CRAN   all 
#> 22691 26397
knitr::kable(head(cd$details))
Package     Deadline    type    repo  n_affected
geeasy      2025-09-03  direct  CRAN           6
spsComps    2025-09-03  direct  CRAN           2
GLDreg      2025-09-03  direct  CRAN           1
maxLik      2025-09-03  direct  CRAN           1
geotargets  2025-09-06  direct  CRAN           2
fgdr        2025-09-11  direct  CRAN           5

There are websites dedicated to tracking these packages and providing information about new version submissions to CRAN that fix the issues. I participate in the cranhaven.org dashboard (and project).

Note that if a package is archived it can be brought back to the repository.

Reproducibility

For reproducibility here is the session info:

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Madrid
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] repo.data_0.1.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.5.1    magrittr_2.0.3    cli_3.6.5         rversions_2.1.2  
#>  [5] tools_4.5.1       igraph_2.1.4      rstudioapi_0.17.1 curl_7.0.0       
#>  [9] xml2_1.4.0        knitr_1.50        xfun_0.53         lifecycle_1.0.4  
#> [13] pkgconfig_2.0.3   rlang_1.1.6       evaluate_1.0.5