The SomaDataIO R package loads and exports ‘SomaScan’ data via the SomaLogic Operating Co., Inc. structured text file called an ADAT (*.adat). The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory. Basic familiarity with the R environment is assumed, as is the ability to install contributed packages from the Comprehensive R Archive Network (CRAN).
If you run into any issues or problems with SomaDataIO, full documentation of the most recent release can be found at our website of articles and workflows. If the issue persists, we encourage you to consult the issues page and, if appropriate, submit an issue and/or feature request.
The SomaDataIO package is licensed under the MIT license and is intended for research use only (“RUO”). The code contained herein may not be used for diagnostic, clinical, therapeutic, or other commercial purposes.
The easiest way to install SomaDataIO is directly from CRAN:
install.packages("SomaDataIO")
Alternatively, install from GitHub:
remotes::install_github("SomaLogic/SomaDataIO")
which installs the most current “development” version from the repository HEAD. To install the most recent release, use:
remotes::install_github("SomaLogic/SomaDataIO@*release")
To install a specific tagged release, use:
remotes::install_github("SomaLogic/SomaDataIO@v5.3.0")
The SomaDataIO package was intentionally developed to contain a limited number of dependencies from CRAN. This makes the package more stable against design changes in external software, but also limits its feature set. With this in mind, SomaDataIO aims to strike a balance between long(er)-term stability and a limited set of features. The full list of package dependencies is given in the DESCRIPTION file.
The Biobase package is suggested, being required by only two functions, pivotExpressionSet() and adat2eSet(). Biobase must be installed separately from Bioconductor by entering the following from the R Console:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Biobase", version = remotes::bioc_version())
Information about Bioconductor can be found here: https://bioconductor.org/install/
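Once Biobase is available, the two Biobase-dependent functions can be chained. The sketch below is illustrative only: it loads the 10-sample example ADAT shipped with the package, converts it with adat2eSet(), and pivots the result with pivotExpressionSet(). It guards on requireNamespace() (an assumption about how you might want to fail gracefully) so it still runs when Biobase is absent.

```r
library(SomaDataIO)

# Load the abbreviated 10-sample example ADAT shipped with the package
f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# Only attempt the conversion if the suggested Biobase package is installed
has_biobase <- requireNamespace("Biobase", quietly = TRUE)
eSet <- NULL
long <- NULL
if (has_biobase) {
  eSet <- adat2eSet(my_adat)        # soma_adat -> Biobase ExpressionSet
  long <- pivotExpressionSet(eSet)  # ExpressionSet -> long-format tibble
}
```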
Upon successful installation, load SomaDataIO as normal:
library(SomaDataIO)
For an index of available commands:
library(help = SomaDataIO)
The SomaDataIO package comes with four (4) objects available to users to run canned examples (or analyses). They can be accessed once SomaDataIO has been attached via library(). They are:
example_data: the original ‘SomaScan’ file (example_data.adat) can be found in the SomaLogic-Data GitHub repository or downloaded directly via:
wget https://raw.githubusercontent.com/SomaLogic/SomaLogic-Data/main/example_data.adat
Within SomaDataIO, the file has been replaced by an abbreviated, light-weight version containing only the first 10 samples:
dir(system.file("extdata", package = "SomaDataIO"), full.names = TRUE)
ex_analytes: the analyte (feature) variables in example_data
ex_anno_tbl: the annotations table associated with example_data
ex_target_names: a mapping object for analyte -> target
See also ?SomaScanObjects
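A quick way to get oriented with these objects is to inspect them directly. The sketch below only prints their class and shape. The assumption that analyte names carry a "seq." prefix reflects the naming used by getAnalytes(); check ?SomaScanObjects for the authoritative description of each object.

```r
library(SomaDataIO)

# The full example dataset is bundled as a soma_adat R object
is.soma_adat(example_data)
dim(example_data)

# Analyte (feature) column names, prefixed with "seq."
head(ex_analytes)

# The annotations table is a tibble; inspect its first rows
head(ex_anno_tbl)
```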
read_adat(): load a *.adat text file into an R session as a soma_adat object.
write_adat(): write a soma_adat object as a *.adat text file.
Loading an ADAT text file is simple using read_adat():
# Sample file name
f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)
# test object class
is.soma_adat(my_adat)
#> [1] TRUE
# S3 print method (forwards -> tibble)
my_adat
#> ══ SomaScan Data ═══════════════════════════════════════════════════════════════
#> SomaScan version V4 (5k)
#> Signal Space 5k
#> Attributes intact ✓
#> Rows 10
#> Columns 5318
#> Clinical Data 34
#> Features 5284
#> ── Column Meta ─────────────────────────────────────────────────────────────────
#> ℹ SeqId, SeqIdVersion, SomaId, TargetFullName, Target, UniProt, EntrezGeneID,
#> ℹ EntrezGeneSymbol, Organism, Units, Type, Dilution, PlateScale_Reference,
#> ℹ CalReference, Cal_Example_Adat_Set001, ColCheck,
#> ℹ CalQcRatio_Example_Adat_Set001_170255, QcReference_170255,
#> ℹ Cal_Example_Adat_Set002, CalQcRatio_Example_Adat_Set002_170255, Dilution2
#> ── Tibble ──────────────────────────────────────────────────────────────────────
#> # A tibble: 10 × 5,319
#> row_names PlateId PlateRunDate ScannerID PlatePosition SlideId Subarray
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 258495800012_3 Example… 2020-06-18 SG152144… H9 2.58e11 3
#> 2 258495800004_7 Example… 2020-06-18 SG152144… H8 2.58e11 7
#> 3 258495800010_8 Example… 2020-06-18 SG152144… H7 2.58e11 8
#> 4 258495800003_4 Example… 2020-06-18 SG152144… H6 2.58e11 4
#> 5 258495800009_4 Example… 2020-06-18 SG152144… H5 2.58e11 4
#> 6 258495800012_8 Example… 2020-06-18 SG152144… H4 2.58e11 8
#> 7 258495800001_3 Example… 2020-06-18 SG152144… H3 2.58e11 3
#> 8 258495800004_8 Example… 2020-06-18 SG152144… H2 2.58e11 8
#> 9 258495800001_8 Example… 2020-06-18 SG152144… H12 2.58e11 8
#> 10 258495800004_3 Example… 2020-06-18 SG152144… H11 2.58e11 3
#> # ℹ 5,312 more variables: SampleId <chr>, SampleType <chr>,
#> # PercentDilution <int>, SampleMatrix <chr>, Barcode <lgl>, Barcode2d <chr>,
#> # SampleName <lgl>, SampleNotes <lgl>, AliquotingNotes <lgl>,
#> # SampleDescription <chr>, …
#> ════════════════════════════════════════════════════════════════════════════════
Please see the vignette
vignette("tips-loading-and-wrangling", package = "SomaDataIO")
for more details and options.
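Writing is the inverse of loading. As a minimal sketch, the example below writes the 10-sample object to a temporary file with write_adat() and reads it back to confirm the structure survives the round trip. The temporary path is an illustrative choice; the .adat extension is used to match the file convention described above.

```r
library(SomaDataIO)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# Write to a temporary *.adat file
out <- tempfile(fileext = ".adat")
write_adat(my_adat, file = out)

# Read it back in and compare basic structure with the original
my_adat2 <- read_adat(out)
dim(my_adat2)
identical(getAnalytes(my_adat), getAnalytes(my_adat2))
```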
The soma_adat class comes with numerous class-specific S3 methods for the most popular dplyr and tidyr generics.
# see full complement of `soma_adat` methods
methods(class = "soma_adat")
#> [1] [ [[ [[<- [<- ==
#> [6] $ $<- anti_join arrange count
#> [11] filter full_join getAdatVersion getAnalytes getMeta
#> [16] group_by inner_join is_seqFormat left_join Math
#> [21] median merge mutate Ops print
#> [26] rename right_join row.names<- sample_frac sample_n
#> [31] semi_join separate slice_sample slice summary
#> [36] Summary transform ungroup unite
#> see '?methods' for accessing help and source code
Please see the vignette
vignette("tips-loading-and-wrangling", package = "SomaDataIO")
for more details about available soma_adat methods.
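Because filter(), arrange(), and mutate() appear in the method list, piping a soma_adat through them should return a soma_adat with its attributes preserved. The sketch below assumes dplyr is attached and uses the SampleType and PlatePosition fields visible in the printed object; the new column name log_first_apt is hypothetical and chosen only for illustration.

```r
library(SomaDataIO)
library(dplyr)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# Pick the first analyte column to transform
apt <- getAnalytes(my_adat)[1L]

# Keep study samples, sort by plate position, add a log10 copy of one analyte
sub <- my_adat |>
  filter(SampleType == "Sample") |>
  arrange(PlatePosition) |>
  mutate(log_first_apt = log10(.data[[apt]]))

# The soma_adat methods keep the class and its attributes
is.soma_adat(sub)
is_intact_attr(sub)
```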
The soma_adat object also carries additional structure (e.g., column metadata and annotations) that is useful to users. Please see ?colmeta or ?annotations for further details about these fields.
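One practical way to reach the column metadata is getAnalyteInfo(), which returns one row per analyte, while getMeta() returns the non-analyte (clinical/sample) field names. The sketch below selects a few of the Column Meta fields shown in the printed object; AptName is the analyte-name column added by getAnalyteInfo().

```r
library(SomaDataIO)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# One row per analyte, built from the ADAT column metadata
anno <- getAnalyteInfo(my_adat)
anno[, c("AptName", "SeqId", "Target", "EntrezGeneSymbol")]

# Non-analyte (clinical/sample) fields
head(getMeta(my_adat))
```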
This section now lives in individual package articles. For further detail, please see:
stats::t.test(): vignette("stat-two-group-comparison", package = "SomaDataIO")
stats::aov(): vignette("stat-three-group-analysis-anova", package = "SomaDataIO")
stats::glm(): vignette("stat-binary-classification", package = "SomaDataIO")
stats::lm(): vignette("stat-linear-regression", package = "SomaDataIO")
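The articles above cover each model in depth. As a minimal sketch of the general per-analyte pattern (one test per analyte, then a multiple-testing correction), the loop below runs stats::t.test() on the first ten analytes of the 10-sample example file, on the log10 scale. The two groups are hypothetical (first five vs. last five samples) and carry no biological meaning, and the loop-and-adjust structure is an illustrative assumption rather than necessarily how the vignette proceeds.

```r
library(SomaDataIO)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# HYPOTHETICAL grouping for illustration only: first 5 vs last 5 samples
grp <- factor(rep(c("A", "B"), each = 5L))

# Welch two-sample t-test per analyte on log10-transformed RFU values
apts <- head(getAnalytes(my_adat), 10L)
res <- do.call(rbind, lapply(apts, function(apt) {
  tt <- stats::t.test(log10(my_adat[[apt]]) ~ grp)
  data.frame(AptName = apt,
             t_stat  = unname(tt$statistic),
             p_value = tt$p.value)
}))

# Benjamini-Hochberg adjustment across the tested analytes
res$fdr <- stats::p.adjust(res$p_value, method = "BH")
res[order(res$p_value), ]
```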