Many studies have two variates where each variate is a score on an ordinal scale (e.g., an integer on a 1, …, M scale). Such data are typically organized into a rank-ordered matrix of frequency values, where the element in the [I, J] cell is the frequency of occasions on which one variate has rank value I while the corresponding rank for the other variate is J. For such matrices, Goodman and Kruskal (1954) provided a frequentist, distribution-free concordance correlation statistic that has come to be called Goodman and Kruskal's gamma, or the G statistic (Siegel & Castellan, 1988). The dfba_gamma() function provides a corresponding Bayesian distribution-free analysis, given a rank-ordered matrix as input.
Chechile (2020) showed that the Goodman-Kruskal gamma is equivalent to the more general Kendall $\tau_A$ nonparametric correlation coefficient. Historically, gamma was considered a different metric from $\tau$ because the version of $\tau$ in standard use was typically $\tau_B$, which is a flawed metric because it does not properly correct for ties. Note that the commands cor(x, y, method = "kendall") and cor.test(x, y, method = "kendall") (from the stats package) return the $\tau_B$ correlation, which is incorrect when there are ties. The correct $\tau_A$ is computed by the dfba_bivariate_concordance() function (see the vignette for the dfba_bivariate_concordance() function for more details and examples of the difference between $\tau_A$ and $\tau_B$). The dfba_gamma() function is similar to the dfba_bivariate_concordance() function. The main difference is that dfba_gamma() takes data that have already been organized into a rank-ordered table or matrix, whereas the input for dfba_bivariate_concordance() is two paired vectors, x and y, of continuous values.
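The following base-R sketch shows how ties make the two coefficients diverge. It uses a hypothetical toy data set (not from the original vignette) in which both variates have tied ranks. The sketch counts concordant and discordant pairs by brute force and compares the ratio $(n_c - n_d)/(n_c + n_d)$, which this vignette identifies with $\tau_A$ and G, against cor(x, y, method = "kendall"). The helper pair_counts() is an illustrative name, not a DFBA function.

```r
# Hypothetical toy data with ties on both variates (illustration only)
x <- c(1, 1, 2, 2, 3)
y <- c(1, 2, 2, 3, 3)

# Illustrative helper (not part of DFBA): brute-force count of concordant
# and discordant pairs; a pair tied on either variate counts as neither
pair_counts <- function(x, y) {
  n_c <- 0
  n_d <- 0
  n <- length(x)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s <- sign(x[j] - x[i]) * sign(y[j] - y[i])
      if (s > 0) n_c <- n_c + 1
      if (s < 0) n_d <- n_d + 1
    }
  }
  c(nc = n_c, nd = n_d)
}

counts <- pair_counts(x, y)
tau_a <- unname((counts["nc"] - counts["nd"]) / (counts["nc"] + counts["nd"]))
tau_b <- cor(x, y, method = "kendall")   # stats::cor returns the tie-adjusted tau-b

counts   # nc = 6, nd = 0
tau_a    # 1
tau_b    # 0.75
```

Every untied pair in this toy data is concordant, so the concordance ratio is 1. The tau-b denominator, however, is the geometric mean of the number of pairs untied on x (8) and on y (8), which gives 6 / 8 = 0.75.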
The gamma statistic is equal to
$$G = \frac{n_c - n_d}{n_c + n_d},$$
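As a worked check of this formula, take the counts reported for the example matrix in the dfba_gamma() output later in this vignette ($n_c = 6588$, $n_d = 566$):
$$G = \frac{6588 - 566}{6588 + 566} = \frac{6022}{7154} \approx 0.8418,$$
which agrees with the Goodman-Kruskal Gamma value (0.8417668) printed in that output.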
where $n_c$ is the number of occasions when the variates change in a concordant way, and $n_d$ is the number of occasions when the variates change in a discordant way. For an ordered matrix, $n_c = \sum_{i,j} n_{ij} N^{+}_{ij}$, where $n_{ij}$ is the frequency in cell [I, J] and $N^{+}_{ij}$ is the sum of the frequencies in the cells whose row value is greater than I and whose column value is greater than J. Likewise, $n_d = \sum_{i,j} n_{ij} N^{-}_{ij}$, where $N^{-}_{ij}$ is the sum of the frequencies in the cells whose row value is greater than I and whose column value is less than J. The $n_c$ and $n_d$ values computed in this fashion are equal, respectively, to the $n_c$ and $n_d$ values found when the bivariate measures are entered as paired vectors into the dfba_bivariate_concordance() function.
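The following base-R sketch implements the cell-sum definitions and checks them against a brute-force count. The helpers gamma_counts() and pair_counts_from_matrix() are hypothetical names written for illustration; they are not DFBA functions. The brute-force helper expands the matrix into paired rank vectors, which mirrors the paired-vector route of dfba_bivariate_concordance() without calling that function.

```r
# Example frequency matrix from this vignette
N <- matrix(c(38, 4, 5, 0,
              6, 40, 1, 2,
              4, 8, 20, 30),
            ncol = 4, byrow = TRUE)

# Hypothetical helper: n_c and n_d from the cell sums N+_ij and N-_ij
gamma_counts <- function(m) {
  m <- as.matrix(m)
  nr <- nrow(m)
  k <- ncol(m)
  n_c <- 0
  n_d <- 0
  for (i in seq_len(nr - 1)) {
    below <- m[(i + 1):nr, , drop = FALSE]   # cells with row value > i
    for (j in seq_len(k)) {
      if (j < k) n_c <- n_c + m[i, j] * sum(below[, (j + 1):k])   # columns > j
      if (j > 1) n_d <- n_d + m[i, j] * sum(below[, 1:(j - 1)])   # columns < j
    }
  }
  c(n_c = n_c, n_d = n_d)
}

# Hypothetical helper: expand the matrix into paired rank vectors and
# count concordant/discordant pairs directly (tied pairs count as neither)
pair_counts_from_matrix <- function(m) {
  m <- as.matrix(m)
  idx <- which(m > 0, arr.ind = TRUE)
  x <- rep(idx[, "row"], m[idx])
  y <- rep(idx[, "col"], m[idx])
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  # each unordered pair appears twice in s, so halve the counts
  c(n_c = sum(s > 0) / 2, n_d = sum(s < 0) / 2)
}

counts <- gamma_counts(N)
counts                        # n_c = 6588, n_d = 566
pair_counts_from_matrix(N)    # same values
unname((counts["n_c"] - counts["n_d"]) / sum(counts))   # G, about 0.8418
```

The cell-sum version touches each cell once per row below it, so it stays cheap for large frequencies. The expansion version builds an $n \times n$ sign matrix and is only meant as a check on small tables.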
As with the dfba_bivariate_concordance() function, the Bayesian analysis focuses on the population concordance proportion parameter $\phi$, which is linked to the G statistic by $G = 2\phi - 1$. The likelihood function is proportional to $\phi^{n_c}(1-\phi)^{n_d}$. As in the Bayesian analysis of the concordance parameter in the dfba_bivariate_concordance() function, the prior distribution is a beta distribution with shape parameters $a_0$ and $b_0$, and the posterior distribution is the conjugate beta distribution with shape parameters $a = a_0 + n_c$ and $b = b_0 + n_d$.
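The conjugate update can be reproduced with base R's beta quantile function qbeta(). This sketch assumes that dfba_gamma() reports equal-tail quantiles of the Beta(a, b) posterior, as its printed labels suggest. It uses the counts from the example matrix later in this vignette with the default uniform prior. The resulting values should agree, up to printing precision, with the posterior median and 95% limits shown in that output. The last two lines map the results onto the G scale through $G = 2\phi - 1$ for illustration.

```r
# Counts from the example matrix in this vignette; default uniform prior
n_c <- 6588
n_d <- 566
a0 <- 1
b0 <- 1

# Conjugate update: Beta(a0, b0) prior -> Beta(a0 + n_c, b0 + n_d) posterior
a <- a0 + n_c
b <- b0 + n_d

post_median <- qbeta(0.5, a, b)                 # posterior median of phi
ci <- qbeta(c(0.025, 0.975), a, b)              # 95% equal-tail interval for phi

post_median
ci

# Because G = 2 * phi - 1 is monotone, phi quantiles map directly to G
2 * post_median - 1
2 * ci - 1
```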
dfba_gamma() Function
The dfba_gamma() function has one required argument, x, which must be an object in the form of a matrix or a table.
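When raw paired ordinal scores are available instead of a precomputed matrix, base R's table() can build the required input. The scores below are hypothetical and serve only as an illustration. Fixing the factor levels keeps every scale category as a row and column of the table, even if a category is unobserved. The dfba_gamma() call is guarded so that the sketch still runs when DFBA is not installed.

```r
# Hypothetical paired ordinal scores on a 1-4 scale (illustration only)
rater1 <- c(1, 2, 2, 3, 4, 4, 1, 3, 2, 4)
rater2 <- c(1, 2, 3, 3, 4, 3, 1, 4, 2, 4)

# Fix the levels so the full 1-4 scale appears as rows and columns
lev <- 1:4
tab <- table(factor(rater1, levels = lev), factor(rater2, levels = lev))
tab

# Analyze the table with dfba_gamma() if the DFBA package is available
if (requireNamespace("DFBA", quietly = TRUE)) {
  print(DFBA::dfba_gamma(tab))
}
```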
The following example demonstrates how to create a matrix of data and analyze it using the dfba_gamma() function.
```r
library(DFBA)

# Frequency matrix: rows are the ranks of one variate, columns the ranks of the other
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4,
            byrow = TRUE)
colnames(N) <- c('C1', 'C2', 'C3', 'C4')
rownames(N) <- c('R1', 'R2', 'R3')
A <- dfba_gamma(N)
A
#> Descriptive Statistics
#> ========================
#> Concordant Pairs Discordant Pairs
#> 6588 566
#> Proportion of Concordant Pairs
#> 0.9208834
#> Goodman-Kruskal Gamma
#> 0.8417668
#>
#> Bayesian Analyses
#> ========================
#> Posterior Beta Shape Parameters for the Concordance Phi
#> a b
#> 6589 567
#> Posterior Median
#> 0.920805
#> 95% Equal-tail interval limits:
#> Lower Limit Upper Limit
#> 0.914398 0.9269112
```
The dfba_gamma() function also has three optional arguments. Listed with their respective default values, they are: a0 = 1, b0 = 1, and prob_interval = .95.
The a0 and b0 arguments are the shape parameters for the prior beta distribution; the default value of 1 for each corresponds to a uniform prior. The prob_interval argument specifies the probability value for the interval estimate of the $\phi$ concordance parameter.
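A call with non-default settings might look like the sketch below. The Beta(0.5, 0.5) prior and the 90% interval are illustrative choices, not recommendations from this vignette. The base-R lines repeat the conjugate update described earlier ($a = a_0 + n_c$, $b = b_0 + n_d$), using the counts reported for the example matrix. This makes it possible to check by hand what the posterior shape parameters and interval limits should be.

```r
# Same frequency matrix as in the example above
N <- matrix(c(38, 4, 5, 0,
              6, 40, 1, 2,
              4, 8, 20, 30),
            ncol = 4, byrow = TRUE)

# Illustrative (assumed) settings: Beta(0.5, 0.5) prior, 90% interval
a0 <- 0.5
b0 <- 0.5
prob_interval <- 0.90

# Conjugate update with the counts reported for this matrix
a <- a0 + 6588
b <- b0 + 566
alpha_half <- (1 - prob_interval) / 2
limits <- qbeta(c(alpha_half, 1 - alpha_half), a, b)   # equal-tail limits for phi
limits

# The same settings passed to dfba_gamma(), if DFBA is installed
if (requireNamespace("DFBA", quietly = TRUE)) {
  print(DFBA::dfba_gamma(N, a0 = a0, b0 = b0, prob_interval = prob_interval))
}
```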
Chechile, R. A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge, MA: MIT Press.
Goodman, L. A., and Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732-764.
Siegel, S., and Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.