Using a custom controlled vocabulary for rMZQC

This vignette shows R users how to set up a custom controlled vocabulary (CV) before creating an mzQC document.

Warning: settle on a CV before instantiating any mzQC objects. This ensures that all CV terms are consistent (and checked for existence) and that the CV meta information within the mzQC document is accurate.

Target Audience: R users

Create a minimal mzQC document with a custom CV

Let’s first consider what happens by default:

library(rmzqc)

print(getCVInfo())
## Reference class object of class "MzQCcontrolledVocabulary"
## Field "name":
## [1] "Proteomics Standards Initiative Mass Spectrometry Ontology"
## Field "uri":
## [1] "https://github.com/HUPO-PSI/psi-ms-CV/releases/download/v4.1.147/psi-ms.obo"
## Field "version":
## [1] "4.1.147"
## test if the default CV is usable
toQCMetric(id = "MS:4000059", value = 13405) ## number of MS1 scans
## Reference class object of class "MzQCqualityMetric"
## Field "accession":
## [1] "MS:4000059"
## Field "name":
## [1] "number of MS1 spectra"
## Field "description":
## [1] "\"The number of MS1 events in the run.\" [PSI:MS]"
## Field "value":
## [1] 13405
## Field "unit":
## list()

However, if you run this code without an internet connection, rmzqc falls back to the PSI-MS CV that ships with the package, which may not contain the latest CV terms.

## With internet:
myCV = getCVSingleton()
myCV$setData(getCVDictionary("latest")) ## this is done internally by default when you load the package
## Downloading obo from 'https://github.com/HUPO-PSI/psi-ms-CV/releases/download/v4.1.147/psi-ms.obo' ...
cat("Number of entries in latest CV: ", nrow(getCVSingleton()$getCV()), "\n")
## Number of entries in latest CV:  6744
print(getCVInfo())
## Reference class object of class "MzQCcontrolledVocabulary"
## Field "name":
## [1] "Proteomics Standards Initiative Mass Spectrometry Ontology"
## Field "uri":
## [1] "https://github.com/HUPO-PSI/psi-ms-CV/releases/download/v4.1.147/psi-ms.obo"
## Field "version":
## [1] "4.1.147"
## simulate missing internet connection by invoking the function manually
myCV$setData(getCVDictionary("local"))
cat("Number of entries in local CV: ", nrow(getCVSingleton()$getCV()), "\n")
## Number of entries in local CV:  6700
print(getCVInfo())
## Reference class object of class "MzQCcontrolledVocabulary"
## Field "name":
## [1] "Proteomics Standards Initiative Mass Spectrometry Ontology"
## Field "uri":
## [1] "https://github.com/HUPO-PSI/psi-ms-CV/releases/download/v4.1.129/psi-ms.obo"
## Field "version":
## [1] "4.1.129"
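
To see at a glance whether the bundled CV lags behind the latest release, the two version strings can be compared. The following is a minimal sketch using only base R. The version strings are copied from the output above; reading them from a live session via `getCVInfo()$version` is an assumption based on the printed field name.

```r
## Version strings copied from the printed output above.
## (Assumption: in a live session they could be read via getCVInfo()$version.)
latest_version = "4.1.147"
local_version  = "4.1.129"

## utils::compareVersion() returns -1, 0 or 1 (first argument older, equal, newer)
cmp = utils::compareVersion(local_version, latest_version)
if (cmp < 0) {
  message("The bundled CV (", local_version,
          ") is older than the latest release (", latest_version, ").")
}
```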

The PSI-MS CV bundled with the package might still not suit you, for example if you want to use an unpublished CV that you downloaded, or one that you handcrafted for testing. In that case, use a custom .obo file:

myOBO = system.file("./cv/psi-ms.obo", package="rmzqc") ## we use a local file here, but you can point to any .obo file you have (even a URI)
myCV$setData(getCVDictionary("custom", myOBO))
cat("Number of entries in custom CV: ", nrow(getCVSingleton()$getCV()), "\n")
## Number of entries in custom CV:  6700
print(getCVInfo())
## Reference class object of class "MzQCcontrolledVocabulary"
## Field "name":
## [1] "Proteomics Standards Initiative Mass Spectrometry Ontology"
## Field "uri":
## [1] "C:/Users/bielow/AppData/Local/Temp/Rtmp2fWCUu/Rinst352c48655dee/rmzqc/./cv/psi-ms.obo"
## Field "version":
## [1] "4.1.129"
## you may want to change the CV entries, the URI, or the version manually before creating an mzQC file:
newCV = list(CV = myCV$getData()$CV,
             URI = "https://myURI.com",
             version = "9.9.2")
myCV$setData(newCV)
print(getCVInfo())
## Reference class object of class "MzQCcontrolledVocabulary"
## Field "name":
## [1] "Proteomics Standards Initiative Mass Spectrometry Ontology"
## Field "uri":
## [1] "https://myURI.com"
## Field "version":
## [1] "9.9.2"
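
For testing, a handcrafted CV can be as small as a single term. The sketch below writes such a file with base R only. The header values and the term `MYCV:0000001` are hypothetical, and whether `getCVDictionary("custom", ...)` accepts a file this minimal is an assumption that has not been verified here, so that call appears only as a comment.

```r
## Hypothetical minimal OBO content (header plus one term), for testing only
obo_lines = c(
  "format-version: 1.2",
  "data-version: 0.0.1",
  "ontology: mycv",
  "",
  "[Term]",
  "id: MYCV:0000001",
  "name: my test metric",
  "def: \"A made-up metric used for testing.\" [MYCV:test]"
)

## Write the content to a temporary .obo file
my_obo_file = tempfile(fileext = ".obo")
writeLines(obo_lines, my_obo_file)

## Sanity check: count the [Term] stanzas in the written file
n_terms = sum(readLines(my_obo_file) == "[Term]")
cat("Number of terms written: ", n_terms, "\n")

## Assumption: such a file could then be loaded like the custom CV above, e.g.
## myCV$setData(getCVDictionary("custom", my_obo_file))
```

If the file loads, printing `getCVInfo()` again is a quick way to confirm which URI and version ended up in the CV meta information.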