volkeR-Package

Lifecycle: experimental

High-level functions for tabulating, charting and reporting survey data.

Getting started

# Install the package (see below), then load it
library(volker)

# Load example data from the package
data <- volker::chatgpt

# Create your first table and plot, counting answers to an item battery
report_counts(data, starts_with("cg_adoption_social"))

# Create your first table and plot, reporting mean values of the item battery
report_metrics(data, starts_with("cg_adoption_social"))

See further examples in vignette("introduction", package="volker").

Don’t miss the template feature: In RStudio, create a new R Markdown document, select "From Template", choose the volkeR Report and knit it. It’s a blueprint for your own tidy reports.


Concept

The volkeR package is made for creating quick and easy overviews of datasets. It handles standard cases with a handful of functions. Basically, you select one of the report functions, report_counts() for categorical variables or report_metrics() for metric variables, and throw your data in.

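The report functions combine tables, plots and, optionally, effect size calculations. To request only one of those outputs, directly use the respective function: tab_counts() or tab_metrics() for tables, plot_counts() or plot_metrics() for plots, and effect_counts() or effect_metrics() for effect sizes.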

Which one is best? That depends on your objective:

Examples

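|                  | Metric                | Categorical            |
|------------------|-----------------------|------------------------|
| One variable     | Density plot          | Bar chart              |
| Group comparison | Group comparison      | Stacked bar chart      |
| Multiple items   | Item battery boxplots | Item battery bar chart |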


All functions take a data frame as their first argument, followed by a column selection, and optionally a grouping column. Reproduce the examples above:
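The example calls themselves are not preserved in this section. As a hedged sketch, the six cells of the table could be reproduced roughly as follows, assuming the chatgpt example data and the columns sd_age, sd_gender, adopter and cg_adoption_* that appear elsewhere in this README. Which plot type each call produces is decided by the package:

```r
library(volker)
data <- volker::chatgpt

# One variable: metric, then categorical
plot_metrics(data, sd_age)
plot_counts(data, sd_gender)

# Group comparison: add a categorical grouping column as third argument
plot_metrics(data, sd_age, sd_gender)
plot_counts(data, adopter, sd_gender)

# Multiple items: select an item battery by its common prefix
plot_metrics(data, starts_with("cg_adoption_"))
plot_counts(data, starts_with("cg_adoption_"))
```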

The column selections determine which type of output is generated. In the second parameter (after the dataset), you can either provide a single column or a selection of multiple items. To compare groups, provide an additional categorical column in the third parameter. To calculate correlations, provide a metric column in the third parameter and set the metric parameter to TRUE.
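A brief sketch of how the third parameter changes the output, shown with the table functions. The column names are taken from other examples in this README; that these particular combinations are implemented in your installed version is an assumption:

```r
library(volker)
data <- volker::chatgpt

# Single column: distribution parameters of one metric variable
tab_metrics(data, sd_age)

# Categorical column in the third parameter: compare groups
tab_metrics(data, sd_age, sd_gender)

# Metric column in the third parameter plus metric = TRUE: correlation
tab_metrics(data, cg_adoption_advantage_01, sd_age, metric = TRUE)
```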

Note: Some column combinations are not implemented yet.

Effect sizes and statistical tests

You can calculate effect sizes and conduct basic statistical tests using effect_counts() and effect_metrics(). Effect calculation is included in the reports if you request it via the effect parameter, for example:

report_counts(data, adopter, sd_gender, prop="cols", effect=TRUE)
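To get only the statistics, without table and plot, the effect functions can presumably be called directly with the same arguments as the report functions. A minimal sketch, assuming the columns used above; which measures and tests are reported depends on the package version:

```r
library(volker)
data <- volker::chatgpt

# Categorical outcome by group
effect_counts(data, adopter, sd_gender)

# Metric outcome by group
effect_metrics(data, sd_age, sd_gender)
```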

A word of warning: Statistics is the world of uncertainty. All procedures require mindful interpretation. Counting stars might evoke illusions.

Factors and Clusters

You can generate tables and plots for clustering and factor analysis of metric variables. Both clustering and factor analysis are included in the reports when requested using the factors or clusters parameters.

Set the respective parameters to TRUE to generate a scree plot and let the diagnostics choose the optimal number:


report_metrics(data, starts_with("cg_adoption"), factors = TRUE, clusters = TRUE)

Set the desired number directly:


report_metrics(data, starts_with("cg_adoption"), factors = 3, clusters = 4)

You don’t need to add both parameters at the same time if you are only interested in factors or clusters.

Where do all the labels go?

One of the strongest package features is labeling. You know the pain. Labels are stored in the column attributes. Inspect the current labels of columns and values with the codebook() function:

codebook(data)

This results in a table with item names, item values, value names and value labels.

You can set specific column labels by providing a named list to the items parameter of labs_apply():

data %>%
  labs_apply(
    items = list(
      "cg_adoption_advantage_01" = "Allgemeine Vorteile",
      "cg_adoption_advantage_02" = "Finanzielle Vorteile",
      "cg_adoption_advantage_03" = "Vorteile bei der Arbeit",
      "cg_adoption_advantage_04" = "Macht mehr Spaß"
    )
  ) %>% 
  tab_metrics(starts_with("cg_adoption_advantage_"))

Labels for values inside a column can be adjusted by providing a named list to the values parameter of labs_apply(). In addition, use the cols parameter to select the columns where value labels should be changed:


data %>%
  labs_apply(
    cols=starts_with("cg_adoption"),  
    values = list(
      "1" = "Stimme überhaupt nicht zu",
      "2" = "Stimme nicht zu",
      "3" = "Unentschieden",
      "4" = "Stimme zu",
      "5" =  "Stimme voll und ganz zu"
    ) 
  ) %>% 
  plot_metrics(starts_with("cg_adoption"))

To conveniently manage all labels of a dataset, save the result of codebook() to an Excel file, change the labels manually in a copy of the Excel file, and finally call labs_apply() with your revised codebook.


library(readxl)
library(writexl)

# Save codebook to a file
codes <- codebook(data)
write_xlsx(codes,"codebook.xlsx")

# Load and apply a codebook from a file
codes <- read_xlsx("codebook_revised.xlsx")
data <- labs_apply(data, codes)

Be aware that some data operations such as mutate() from the tidyverse lose labels on their way. In this case, store the labels (in the codebook attribute of the data frame) before the operation and restore them afterwards:

# mutate() and the pipe come from dplyr
library(dplyr)

data %>%
  labs_store() %>%
  mutate(sd_age = 2024 - sd_age) %>%
  labs_restore() %>%
  tab_metrics(sd_age)
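To see why storing and restoring is needed, here is a base R sketch of the underlying problem. It does not reproduce volker's implementation; it only assumes that a column label is kept in the comment attribute of a vector, and uses made-up values:

```r
# A labeled column: the label lives in the "comment" attribute
age <- c(34, 51, 27)
comment(age) <- "Age"

# Store all attributes before the operation
stored <- attributes(age)

# as.integer() strips attributes, so the label is gone afterwards
recoded <- as.integer(2024 - age)
is.null(comment(recoded))

# Restore the attributes on the result
attributes(recoded) <- stored
comment(recoded)
```

Not every operation drops attributes, which is why label loss can come as a surprise; storing and restoring around any transformation is the defensive choice.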

SoSci Survey integration

The labeling mechanisms follow a technique used, for example, on SoSci Survey. Sidenote for techies: Labels are stored in the column attributes. That’s why you can directly throw in labeled data from the SoSci Survey API:

library(volker)

# Get your API link from SoSci Survey with settings "Daten als CSV für R abrufen"
eval(parse("https://www.soscisurvey.de/YOURPROJECT/?act=YOURKEY&rScript", encoding="UTF-8"))

# Generate reports
report_counts(ds, A002)

For best results, use sensible prefixes and captions for your SoSci questions. The labels come directly from your questionnaire.
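For the curious, a small base R sketch of what attribute-based labels can look like. The layout shown here (item label in the comment attribute, value labels in attributes named after the values) is an assumption about typical SoSci Survey exports, and the question text and labels are made up:

```r
# Hypothetical data in the shape of a SoSci Survey export
ds <- data.frame(A002 = c(1, 2, 2, 1))

# Item label
comment(ds$A002) <- "Have you used ChatGPT?"

# Value labels, one attribute per value
attr(ds$A002, "1") <- "yes"
attr(ds$A002, "2") <- "no"

# Inspect what is attached to the column
attributes(ds$A002)
```

If the assumption holds, codebook(ds) should list these labels once volker is loaded.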

Please note: The values -9, -2, -1 and [NA] nicht beantwortet, [NA] keine Angabe, [no answer] are automatically recoded to missing values within all plot, tab, effect, and report functions. See the help for the clean parameter to learn how to disable this automatic removal of residual values.
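A minimal sketch of switching the cleaning step off. It assumes that the clean parameter accepts FALSE, and it uses the chatgpt example data instead of a SoSci Survey download:

```r
library(volker)
data <- volker::chatgpt

# Default: residual values are recoded to missing before counting
tab_counts(data, sd_gender)

# Assumption: clean = FALSE keeps the values as they are in the data
tab_counts(data, sd_gender, clean = FALSE)
```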

Customization

You can change plot colors using the theme_vlkr() function:

# theme_set() comes from ggplot2
library(ggplot2)

theme_set(
  theme_vlkr(
    base_fill = c("#F0983A","#3ABEF0","#95EF39","#E35FF5","#7A9B59"),
    base_gradient = c("#FAE2C4","#F0983A")
  )
)

Plot and table functions share a number of parameters that can be used to customize the outputs. Look up the available parameters in the help of the specific function.

Data preparation

Calculations

Labeling

Tables

Plots

Installation

As with all other packages you’ll have to install the package first.

# The "owner/repo" form refers to GitHub and requires the remotes package
remotes::install_github("strohne/volker")

You can try alternative versions:

Special features

Troubleshooting

The kableExtra package produces an error in R 4.3 when knitting documents: .onLoad in loadNamespace() für 'kableExtra' fehlgeschlagen (in English: .onLoad failed in loadNamespace() for 'kableExtra'). As a workaround, remove the PDF and Word settings from the output options in your Markdown document (the YAML section at the top). Alternatively, install the latest development version:

remotes::install_github("kupietz/kableExtra")

Roadmap

| Version | Features           | Status           |
|---------|--------------------|------------------|
| 1.0     | Descriptives       | 80% done         |
| 2.0     | Effects            | 60% done         |
| 3.0     | Factors & clusters | 80% done         |
| 4.0     | Text analysis      | work in progress |

Similar packages

The volker package is inspired by outputs used in the textbook Einfache Datenauswertung mit R (Gehrau & Maubach et al., 2022), which provides an introduction to univariate and bivariate statistics and data representation using RStudio and R Markdown.

Other packages with high-level reporting functions:
- https://github.com/joon-e/tidycomm
- https://github.com/kassambara/rstatix
- https://github.com/easystats/easystats

Authors and citation

Authors
Jakob Jünger (University of Münster)
Henrieke Kotthoff (University of Münster)

Contributors
Chantal Gärtner (University of Münster)

Citation
Jünger, J. & Kotthoff, H. (2024). volker: High-level functions for tabulating, charting and reporting survey data. R package version 3.0.