bmggum: Bayesian Estimation of Multidimensional Generalized Graded Unfolding Model

Naidan Tu, Bo Zhang

Overview

The bmggum package was developed to estimate Multidimensional Unfolding Models (MGGUM) using Bayesian method. Specifically,the R package rstan that utilizes the Hamiltonian Monte Carlo sampling algorithm was used for estimation. Below are some important features of the bmggum package:

  1. Allows users to incorporate person covariates (e.g., age, gender, education) into the estimation process to improve estimation accuracy.

  2. Automatically deals with missing data in a way similar to how full information maximum likelihood handles missing data.

  3. Allows users to estimate the multidimensional version of three unfolding models that are available in the software GGUM2004 (Roberts, Fang, Cui, & Wang, 2006).

    • UM8: The Generalized Graded Unfolding Model (GGUM).
    • UM4: The Partial Credit Unfolding Model, which is the GGUM with all alphas constrained to 1.
    • UM7: The Generalized Rating Scale Unfolding Model, which is the GGUM with equal taus across items.
  4. Five functions (i.e., bmggum( ), extract( ), modfit( ), bayesplot( ), and itemplot( )) are provided for model estimation, results extraction, model fit examination (e.g.,waic, loo, chisq/df), and plottings, respectively. See below for function details.

Tutorial

Step 1: Input data

A randomly generated dataset is used as an example in this tutorial. The first input (GGUM.Data) is a dataset including responses from 10 respondents answering a 4-item measure on a 4-point Likert scale measuring 2 traits. Missingness was also simulated for demonstration. Note that data is stored in a wide format.

ID item 1 item 2 item 3 item 4
1 1 4 4 2
2 4 1 2 1
3 4 1 2 NA
4 1 3 1 NA
5 2 1 3 NA
6 1 1 2 1
7 1 NA NA 3
8 3 2 2 NA
9 1 NA 1 1
10 1 3 1 4

The second input (delindex) is a two-row matrix specifying item numbers and the positivity/negativity of items. Users need to specify the delindex by themselves. The first row is item number (i.e., 1, 2, 3, 4…), and the second row indicates signs of delta of each item (-1, 0, 1). For items with negative deltas, “-1” should be assigned; for items with positive deltas, “1” should be assigned; for uncertain items whose deltas may be either positive or negative (e.g., intermediate items), “0” should assigned. We recommend at least two positive and two negative items per trait for better estimation. In this example, item 1 and 3 are negative, and item 2 and 4 are positive.

row item 1 item 2 item 3 item 4
1 1 2 3 4
2 -1 1 -1 1

The next part of the data is a row vector mapping items to traits. For example, c(1, 1, 1, 2, 2, 2) means that the first 3 items measure trait 1 and the last 3 items measure trait 2. In the current example, item 1 and 2 measure trait 1, and item 3 and 4 measure trait 2. Note that the current implementation of bmggum cannot deal with within-item multidimensionality (e.g., an item loading on two or more factors).

row item 1 item 2 item 3 item 4
1 1 1 2 2

If person covariates are to be included, a p*c person covariate matrix where p equals sample size and c equals the number of covariates is also needed. In this example, 1 person covariate is included. However, the default is a pure measurement model with no person covariate.

ID covariate
1 0.70
2 -1.25
3 0.48
4 -0.47
5 0.86
6 1.25
7 1.17
8 -1.35
9 -0.84
10 -0.55

Step 2: Estimate using the function bmggum()

# Fit the MGGUM model
#>mod <- bmggum(GGUM.Data=GGUM.Data, delindex=delindex, trait=2, ind=ind, option=4, model="UM8", covariate=covariate)
#>mod

The function bmggum() implements full Bayesian estimation of MGGUM using rstan. The returned object stores information including the (1)stanfit object (item parameter estimates in this object are organized in delta-ascending order), (2)estimated item parameters, (3)estimated person parameters, (4)correlations among traits, (5)regression coefficients linking person covariates to each trait, (6)response data (excluding respondents who endorse a single option across all items), and (7)the input row vector mapping each item to each trait. Note that when covariates are included, output (4) represents residual correlations among the traits after controlling for the covariates. If standardized regression coefficients are expected, users can standardize covariates before inputting them. Below are a list of other arguments it contains, the default of which can be manually replaced:

Step 3: Extract the estimated results using the function extract()

# Extract theta estimates 
#>theta <- extract(x=mod, pars='theta')
#>theta
# Turn theta estimates into p*trait matrix where p equals sample size and trait equals the number of latent traits
#>theta <- theta[,1]
# nrow=trait
#>theta <- matrix(theta, nrow=2, byrow = TRUE)  
#>theta <- t(theta)
# theta estimates in p*trait matrix format
#>theta


# Extract tau estimates 
#>tau <- extract(x=mod, pars='tau')
#>tau
# Turn the tau estimates into I*(option-1) matrix where I equals the number of items and option equals the number of response options
#>tau <- tau[,1]
# nrow=option-1
#>tau <- matrix(tau, nrow=3, byrow = TRUE)  
#>tau <- t(tau)
# tau estimates in I*(option-1) matrix format
#>tau


# Extract lambda estimates 
#>lambda <- extract(x=mod, pars='lambda')
# lambda[1,1] is the coefficient linking person covariate 1 to latent trait 1
# lambda[1,2] is the coefficient linking person covariate 1 to latent trait 2
#>lambda

The function extract() extracts bmggum estimation results.

Step 4: Obtain model fit statistics using the function modfit()

# Obtain the model fit statistic loo
#>loo <- modfit(mod)
#>loo

# Obtain the model fit statistic waic 
#>waic <- modfit(x=mod, index='waic')
#>waic

Step 5: Plotting using the function bayesplot()

# Obtain density plots for all alphas. 
#>bayesplot(x=mod, pars='alpha', plot='density', inc_warmup=FALSE)
# Obtain trace plots for the discrimination parameter of the first two items (alpha[1] and alpha[2]).
#>bayesplot(x=mod, pars=paste0("alpha[",1:2,"]"), plot='trace', inc_warmup=FALSE)

The function bayesplot() provides plots including density plots, trace plots, and auto-correlation plots to aid model convergence diagnosis. The smoothness of density plots, the stationary status of trace plots, and low degree of auto-correlation in auto-correlation plots all indicate good convergence. In this example, the density plots for alpha look ok. More iterations are needed to achieve stationary status of trace plots for alpha[1] and alpha[2]. The auto-correlation for theta[1,2] is high, and increase thinning might help. Note that results presented above are just for demonstration and may not reflect typical GGUM results.

Step 6: Plotting observable response categories (ORCs) for items using the function itemplot()

# Obtain item plots with ORCs for all items. 
#>itemplot(x=mod)
# Obtain item plots with ORCs for item 2, 3, 4.
#>itemplot(x=mod, items = 2:4)