rsdmx quickstart guide

The goal of this document is to get you up and running with rsdmx as quickly as possible.

rsdmx provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.

SDMX - a short introduction

The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data:

SDMX can be used to disseminate both data (a dataset) and metadata (the description of the dataset).

For this, the SDMX standard provides various types of documents, also known as messages. Hence there will be:

For more information about the SDMX standards, you can visit the SDMX website, or this introduction by EUROSTAT.

How to deal with SDMX in R

rsdmx offers a low-level set of tools to read data and metadata in the SDMX-ML format. Its strategy is to keep things simple for the user: a single function, readSDMX, reads any document, whether it contains data or metadata, and whether the source is local or remote.

What rsdmx does support:

Let’s see then how to use rsdmx!

Install rsdmx

rsdmx can be installed from CRAN or from its development repository hosted on GitHub. For the latter, you will need the remotes package; then run:

remotes::install_github("opensdmx/rsdmx")

Load rsdmx

To load rsdmx in R, do the following:

library(rsdmx)

Read dataset documents

This section shows how to read SDMX dataset documents, either from remote data sources or from local SDMX files.

Read remote datasets

using the raw approach (specifying the complete request URL)

The following code snippet shows how to read a dataset from a remote data source, taking as an example the OECD StatExtracts portal: https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020
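A minimal sketch of the raw approach, assuming rsdmx is installed and the OECD endpoint is reachable. The variable names myUrl, dataset and stats are illustrative, and the request is wrapped in interactive() only so that the script does nothing when run without a console or network access.

```r
# Complete request URL, copied from the example above
myUrl <- paste0(
  "https://sdmx.oecd.org/public/rest/data/",
  "DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/",
  "?startPeriod=2020&endPeriod=2020"
)

# The request needs network access, so it only runs in an interactive session
if (interactive()) {
  library(rsdmx)
  dataset <- readSDMX(myUrl)       # fetch and parse the remote SDMX-ML document
  stats <- as.data.frame(dataset)  # flatten the dataset into a data.frame
  print(head(stats))
}
```

Any other complete SDMX request URL can be substituted for myUrl.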

You can try it out with other datasources, such as from the EUROSTAT portal: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP/A.CP_MEUR.B1GQ.BE+LU

The online rsdmx documentation also provides a list of data providers, either from international or national institutions, and more request examples.

using the helper approach

The service providers mentioned above are known to rsdmx, which lets users call readSDMX with the helper parameters instead of a complete URL. The list of service providers can be retrieved with:

providers <- getSDMXServiceProviders()
as.data.frame(providers)

Note that it is also possible to add an SDMX service provider at runtime. To have a new SDMX service provider registered by default, please contact me!

Let’s see what this looks like when querying an OECD data source:

sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
                 key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020)
## [rsdmx][INFO] Fetching 'https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020'
df <- as.data.frame(sdmx)
head(df)
##   REF_AREA FREQ METHODOLOGY MEASURE UNIT_MEASURE EXPENDITURE ADJUSTMENT
## 1      GRC    M           N     CPI           PA        CP01          N
## 2      GRC    M           N     CPI           PA        CP01          N
## 3      GRC    M           N     CPI           PA        CP01          N
## 4      GRC    M           N     CPI           PA        CP01          N
## 5      GRC    M           N     CPI           PA        CP01          N
## 6      GRC    M           N     CPI           PA        CP01          N
##   TRANSFORMATION DECIMALS obsTime    obsValue OBS_STATUS
## 1             GY        2 2020-08  1.87933600          A
## 2             GY        2 2020-09  2.44928100          A
## 3             GY        2 2020-10  1.89048000          A
## 4             GY        2 2020-11  1.88282800          A
## 5             GY        2 2020-12  0.78369880          A
## 6             GY        2 2020-01 -0.06878891          A
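The INFO line above shows how the helper arguments end up in the request URL: the key dimensions are joined with dots (an unset dimension stays empty), and start and end become startPeriod and endPeriod. The sketch below reproduces that URL with base R only. The helper build_key is hypothetical and is not part of rsdmx; joining several codes of one dimension with "+" is an assumption based on the usual SDMX REST convention (as in BE+LU in the EUROSTAT URL above).

```r
# Hypothetical helper (not part of rsdmx): join the key dimensions with ".",
# leaving NULL dimensions empty and joining several codes with "+"
build_key <- function(key) {
  parts <- vapply(
    key,
    function(k) if (is.null(k)) "" else paste(k, collapse = "+"),
    character(1)
  )
  paste(parts, collapse = ".")
}

# Same key as in the readSDMX call above: first dimension set, seven left open
key <- list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL)

# Assemble the request URL from its parts
url <- sprintf(
  "%s/data/%s/%s/all/?startPeriod=%s&endPeriod=%s",
  "https://sdmx.oecd.org/public/rest",
  "DSD_PRICES@DF_PRICES_N_CP01",
  build_key(key),
  2020, 2020
)
url
```

The resulting string is the same URL that rsdmx reports fetching, which makes it easy to switch between the helper approach and the raw approach.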

It is also possible to query a dataset together with its “definition”, handled in a separate SDMX-ML document named DataStructureDefinition (DSD). It is particularly useful when you want to enrich your dataset with all labels. For this, you need the DSD which contains all reference data.

To do so, append dsd = TRUE (the default is FALSE) to the previous request, and specify labels = TRUE when calling as.data.frame, as follows:

sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
                 key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020,
                 dsd = TRUE)
## [rsdmx][INFO] Fetching 'https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020' 
## [rsdmx][INFO] Attempt to fetch DSD ref from dataflow description 
## [rsdmx][INFO] Fetching 'https://sdmx.oecd.org/public/rest/dataflow/all/DSD_PRICES@DF_PRICES_N_CP01/latest/' 
## [rsdmx][INFO] Fetching 'https://sdmx.oecd.org/public/rest/datastructure/all/DSD_PRICES/latest/?references=children' 
## [rsdmx][INFO] DSD fetched and associated to dataset!
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
##   REF_AREA REF_AREA_label.fr REF_AREA_label.ar REF_AREA_label.en
## 1      GRC             Grèce           اليونان            Greece
## 2      GRC             Grèce           اليونان            Greece
## 3      GRC             Grèce           اليونان            Greece
## 4      GRC             Grèce           اليونان            Greece
## 5      GRC             Grèce           اليونان            Greece
## 6      GRC             Grèce           اليونان            Greece
##   REF_AREA_label.he FREQ FREQ_label.fr FREQ_label.en METHODOLOGY
## 1              יוון    M     Mensuelle       Monthly           N
## 2              יוון    M     Mensuelle       Monthly           N
## 3              יוון    M     Mensuelle       Monthly           N
## 4              יוון    M     Mensuelle       Monthly           N
## 5              יוון    M     Mensuelle       Monthly           N
## 6              יוון    M     Mensuelle       Monthly           N
##   METHODOLOGY_label.en MEASURE     MEASURE_label.en UNIT_MEASURE
## 1             National     CPI Consumer price index           PA
## 2             National     CPI Consumer price index           PA
## 3             National     CPI Consumer price index           PA
## 4             National     CPI Consumer price index           PA
## 5             National     CPI Consumer price index           PA
## 6             National     CPI Consumer price index           PA
##   UNIT_MEASURE_label.fr UNIT_MEASURE_label.en EXPENDITURE
## 1    Pourcentage par an     Percent per annum        CP01
## 2    Pourcentage par an     Percent per annum        CP01
## 3    Pourcentage par an     Percent per annum        CP01
## 4    Pourcentage par an     Percent per annum        CP01
## 5    Pourcentage par an     Percent per annum        CP01
## 6    Pourcentage par an     Percent per annum        CP01
##                                EXPENDITURE_label.fr
## 1 Produits alimentaires et boissons non alcoolisées
## 2 Produits alimentaires et boissons non alcoolisées
## 3 Produits alimentaires et boissons non alcoolisées
## 4 Produits alimentaires et boissons non alcoolisées
## 5 Produits alimentaires et boissons non alcoolisées
## 6 Produits alimentaires et boissons non alcoolisées
##               EXPENDITURE_label.en ADJUSTMENT
## 1 Food and non-alcoholic beverages          N
## 2 Food and non-alcoholic beverages          N
## 3 Food and non-alcoholic beverages          N
## 4 Food and non-alcoholic beverages          N
## 5 Food and non-alcoholic beverages          N
## 6 Food and non-alcoholic beverages          N
##                                                          ADJUSTMENT_label.fr
## 1 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
## 2 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
## 3 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
## 4 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
## 5 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
## 6 Ni corrigé des variations saisonnières ni corrigé des effets de calendrier
##                                 ADJUSTMENT_label.en TRANSFORMATION
## 1 Neither seasonally adjusted nor calendar adjusted             GY
## 2 Neither seasonally adjusted nor calendar adjusted             GY
## 3 Neither seasonally adjusted nor calendar adjusted             GY
## 4 Neither seasonally adjusted nor calendar adjusted             GY
## 5 Neither seasonally adjusted nor calendar adjusted             GY
## 6 Neither seasonally adjusted nor calendar adjusted             GY
##        TRANSFORMATION_label.fr  TRANSFORMATION_label.en DECIMALS
## 1 Taux de croissance, sur 1 an Growth rate, over 1 year        2
## 2 Taux de croissance, sur 1 an Growth rate, over 1 year        2
## 3 Taux de croissance, sur 1 an Growth rate, over 1 year        2
## 4 Taux de croissance, sur 1 an Growth rate, over 1 year        2
## 5 Taux de croissance, sur 1 an Growth rate, over 1 year        2
## 6 Taux de croissance, sur 1 an Growth rate, over 1 year        2
##   DECIMALS_label.en obsTime    obsValue OBS_STATUS OBS_STATUS_label.en.label
## 1               Two 2020-08  1.87933600          A              Normal value
## 2               Two 2020-09  2.44928100          A              Normal value
## 3               Two 2020-10  1.89048000          A              Normal value
## 4               Two 2020-11  1.88282800          A              Normal value
## 5               Two 2020-12  0.78369880          A              Normal value
## 6               Two 2020-01 -0.06878891          A              Normal value
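With labels = TRUE the data.frame gets one label column per language, which quickly becomes wide. A minimal base R sketch for keeping only the codes and the English labels is shown below. The small data.frame is a hand-built stand-in that borrows a few column names and values from the output above; it is not fetched from the OECD.

```r
# Hand-built stand-in for a labelled result (a few columns only)
df <- data.frame(
  FREQ = "M",
  FREQ_label.fr = "Mensuelle",
  FREQ_label.en = "Monthly",
  UNIT_MEASURE = "PA",
  UNIT_MEASURE_label.fr = "Pourcentage par an",
  UNIT_MEASURE_label.en = "Percent per annum",
  obsTime = "2020-08",
  obsValue = 1.879336,
  check.names = FALSE
)

# Label columns contain "_label."; English ones contain "_label.en"
is_label <- grepl("_label\\.", names(df))
is_english <- grepl("_label\\.en", names(df))

# Keep every non-label column plus the English labels
df_en <- df[, !is_label | is_english, drop = FALSE]
names(df_en)
```

The same two grepl calls can be applied to the data.frame returned by as.data.frame(sdmx, labels = TRUE).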

For embedded service providers that require user authentication (a subscription key or token), the key can be passed to readSDMX with the providerKey argument. If it is provided and the embedded provider requires a specific key parameter, the key is appended to the SDMX web request.

Note that if you are reading SDMX-ML documents with the raw approach (with URLs) rather than through the embedded providers, it is also possible to associate a DSD with a dataset by using the function setDSD. The example below reads the data and the DSD separately and then associates them:

#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
                 key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020)

#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "DSD_PRICES")

#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
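The same association can be sketched with complete URLs instead of the helper parameters. Both URLs below are taken from the rsdmx log output shown earlier in this guide; the names data_url and dsd_url are illustrative, and the requests are wrapped in interactive() because they need network access.

```r
# Data and DSD request URLs, as reported in the rsdmx log output above
data_url <- paste0(
  "https://sdmx.oecd.org/public/rest/data/",
  "DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/",
  "?startPeriod=2020&endPeriod=2020"
)
dsd_url <- paste0(
  "https://sdmx.oecd.org/public/rest/datastructure/all/",
  "DSD_PRICES/latest/?references=children"
)

if (interactive()) {
  library(rsdmx)
  sdmx.data <- readSDMX(data_url)            # data without DSD
  sdmx.dsd <- readSDMX(dsd_url)              # DSD
  sdmx.data <- setDSD(sdmx.data, sdmx.dsd)   # associate data and DSD
  df <- as.data.frame(sdmx.data, labels = TRUE)
  print(head(df))
}
```

Once the DSD is attached, as.data.frame can be called with labels = TRUE exactly as in the dsd = TRUE example.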

Read local datasets

This example shows you how to use rsdmx with local SDMX files, previously downloaded from EUROSTAT.

#bulk download from Eurostat
tdir <- tempdir() #temp folder
tf <- tempfile(tmpdir = tdir) #temp file
download.file("https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)

#read local SDMX (set isURL = FALSE)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)

By default, readSDMX assumes that the data source is remote. To read a local file, add isURL = FALSE.
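The index used in sdmx_files[2] depends on the order in which the archive lists its files. A slightly more defensive sketch selects the data file by name instead. It assumes (this guide does not confirm it) that the archive holds a structure file ending in ".dsd.xml" and a data file ending in ".sdmx.xml"; the paths below are placeholders standing in for the value returned by unzip.

```r
# Placeholder paths standing in for the value returned by unzip() (hypothetical)
sdmx_files <- c(
  "/tmp/example/rd_e_gerdsc.dsd.xml",
  "/tmp/example/rd_e_gerdsc.sdmx.xml"
)

# Keep only the file whose name ends in ".sdmx.xml"
data_file <- grep("\\.sdmx\\.xml$", sdmx_files, value = TRUE)
data_file
```

The selected path can then be passed to readSDMX(data_file, isURL = FALSE).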

Read metadata documents

This section shows how to read SDMX metadata documents, focusing on the complete data structure definition (DSD).

Data Structure Definition (DSD)

This example illustrates how to read a complete DSD using an OECD StatExtracts portal data source.
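The snippets in this section operate on an object named dsd, which first has to be read from the provider. A minimal sketch follows, reusing the OECD DSD request URL that appears in the rsdmx log output earlier in this guide. The choice of DSD_PRICES is an assumption made for illustration; a codelist identifier taken from another DSD may not exist in it, so it is worth listing the available codelist ids before reading one.

```r
# DSD request URL, as it appears in the rsdmx log output earlier in this guide
dsdUrl <- paste0(
  "https://sdmx.oecd.org/public/rest/datastructure/all/",
  "DSD_PRICES/latest/?references=children"
)

# The request needs network access, so it only runs in an interactive session
if (interactive()) {
  library(rsdmx)
  dsd <- readSDMX(dsdUrl)   # read the complete DSD document
  print(class(dsd))         # inspect the class of the returned S4 object
}
```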

rsdmx is implemented in an object-oriented way, with S4 classes and methods. The properties of S4 objects are called slots and can be accessed with the slot function. The following code snippet extracts the list of codelists contained in the DSD document and reads one codelist as a data.frame.

#get codelists from DSD
cls <- slot(dsd, "codelists")

#get list of codelists
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))

#get a codelist
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") 

In a similar way, the concepts of the dataset can be extracted from the DSD and read as data.frame.

#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))

Save & Reload SDMX R objects

SDMX R objects can be saved to R data files (.RData, .rda, .rds) and later reloaded into an R session. This is useful for users who want to keep their SDMX objects in R data files, and for fast loading of large SDMX objects (e.g. DSD objects) in statistical analyses and R-based web applications.

To save an SDMX R object to an RData file:

saveSDMX(sdmx, "tmp.RData")

To reload an SDMX R object from an RData file:

sdmx <- readSDMX("tmp.RData", isRData = TRUE)