This documents gives some instructions on how to create graphical
representations of correlation matrices (within sets and between sets of
variables) in the statistical environment R with package
Correlplot, using a variety of different statistical
methods (Graffelman and De Leeuw (2023)).
For the within-set correlation matrix, we use principal component
analysis (PCA), multidimensional scaling (MDS), principal factor
analysis (PFA), weighted alternating least squares (WALS), correlograms
(CRG) and corrgrams to produce displays of correlation structure. The
next section shows how to use the functions of the package in order to
create the different graphical representations, using both R base
graphics and ggplot2 graphics (Wickham (2016)). The computation of
goodness-of-fit statistics is also addressed. All these methods are
illustrated on a single data set, the wheat kernel data introduced
below. A separate section is dedicated to the representation of the
between-set correlation matrix by means of canonical correlation
analysis (CCA) and various adjustments of this matrix by means of an
alternating generalised least squares (AGLS) algorithm (Graffelman (2026)). The canonical methods are
illustrated with the psychology-academic achievement data.
We first load some packages we will use:
library(calibrate)
#> Loading required package: MASS
library(ggplot2)
library(corrplot)
#> corrplot 0.95 loaded
library(Correlplot)Throughout this section, we will use the wheat kernel data set taken from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/seeds) in order to illustrate the different plots. The wheat kernel data (Charytanowicz et al. (2010)) consists of 210 wheat kernels, for which the variables area (\(A\)), perimeter (\(P\)), compactness (\(C = 4*\pi*A/P^2\)), length, width, asymmetry coefficient and groove (length of the kernel groove) were registered. There are 70 kernels of each of three varieties Kama, Rosa and Canadian; here we will only use the kernels of variety Kama. The data is made available with:
data("Kernels")
X <- Kernels[Kernels$variety==1,]
X <- X[,-8]
head(X)
#> area perimeter compactness length width asymmetry groove
#> 1 15.26 14.84 0.8710 5.763 3.312 2.221 5.220
#> 2 14.88 14.57 0.8811 5.554 3.333 1.018 4.956
#> 3 14.29 14.09 0.9050 5.291 3.337 2.699 4.825
#> 4 13.84 13.94 0.8955 5.324 3.379 2.259 4.805
#> 5 16.14 14.99 0.9034 5.658 3.562 1.355 5.175
#> 6 14.38 14.21 0.8951 5.386 3.312 2.462 4.956The correlation matrix of the variables is given by:
p <- ncol(X)
R <- cor(X)
round(R,digits=3)
#> area perimeter compactness length width asymmetry groove
#> area 1.000 0.976 0.371 0.835 0.900 -0.050 0.721
#> perimeter 0.976 1.000 0.165 0.921 0.802 -0.054 0.794
#> compactness 0.371 0.165 1.000 -0.146 0.667 0.037 -0.131
#> length 0.835 0.921 -0.146 1.000 0.551 -0.037 0.866
#> width 0.900 0.802 0.667 0.551 1.000 -0.027 0.447
#> asymmetry -0.050 -0.054 0.037 -0.037 -0.027 1.000 -0.011
#> groove 0.721 0.794 -0.131 0.866 0.447 -0.011 1.000The corrgram (Friendly (2002)) is a
tabular display of the entries of a correlation matrix that uses colour
and shading to represent correlations. Corrgrams can be made with the
fuction corrplot
A corrgram of the wheat kernel data.
This shows most correlations are positive, and correlations with asymmetry are weak.
The correlogram (Trosset (2005)) represents correlations by the cosines of angles between vectors.
The correlogram of the wheat kernel data.
The vector theta.cos contains the angles (in radians) of
each variable with respect to the positive \(x\)-axis. The approximation provided by
these angles to the correlation matrix is calculated by
The correlogram always perfectly represents the correlations of the
variables with themselves, and these have a structural contribution of
zero to the loss function. We calculate the root mean squared error
(RMSE) of the approximation by using function rmse. We
include the diagonal of the correlation matrix in the RMSE calculation
by supplying a weight matrix of only ones.
This gives and RMSE of 0.2438, which shows this representation has a large amount of error. The correlogram can be modified by using a linear interpretation rule, rendering correlations linear in the angle (Graffelman (2013)). This representation is obtained by:
theta.lin <- correlogram(R,ifun="lincos",labs=colnames(R),xlim=c(-1.1,1.1),
ylim=c(-1.1,1.1),main="Linear Correlogram")Linear correlogram of the wheat kernel data.
The approximation to the correlation matrix by using this linear interpretation function is calculated by
Rhat.corlin <- angleToR(theta.lin,ifun="lincos")
rmse.lin <- rmse(R,Rhat.corlin,W=W1)
rmse.lin
#> [1] 0.1667556and this gives a RMSE of 0.1668. In this case, the linear representation is seen to improve the approximation.
Function ggcorrelogram can be used to create
correlograms on a ggplot2 canvas (Wickham (2016)). The output object contains the
field theta containing the vector of angles.
Correlogram of the wheat kernel data on a ggplot2 canvas.
We create a PCA biplot of the correlation matrix, doing the
calculations for a PCA by hand, using the singular value decomposition
of the (scaled) standardized data. Alternatively, standard R function
princomp may be used to obtain the coordinates needed for
the correlation biplot. We use function bplot from package
calibrate to make the biplot:
n <- nrow(X)
Xt <- scale(X)/sqrt(n-1)
res.svd <- svd(Xt)
Fs <- sqrt(n)*res.svd$u # standardized principal components
Gp <- res.svd$v%*%diag(res.svd$d) # biplot coordinates for variables
bplot(Fs,Gp,colch=NA,collab=colnames(X),xlab="PC1",ylab="PC2",main="PCA")PCA biplot of the (standardized) wheat kernel data.
The joint representation of kernels and variables emphasizes this is
a biplot of the (standardized) data matrix. However, this plot is a
double biplot because scalar products between variable vectors
approximate the correlation matrix. We stress this by plotting the
variable vectors only, and adding a unit circle. In order to facilitate
recovery of the approximations to the correlations between the
variables, correlation tally sticks can be added as with the
tally function, as is shown in the figure below:
bplot(Gp,Gp,colch=NA,rowch=NA,collab=colnames(X),xl=c(-1,1),yl=c(-1,1),main="PCA")
circle()
tally(Gp[-6,1:2],values=seq(-0.2,0.8,by=0.2))PCA biplot of the correlation matrix with correlation tally sticks.
The PCA biplot of the correlation matrix can be obtained from a correlation-based PCA or also directly from the spectral decomposition of the correlation matrix. The rank two approximation, obtained by means of scalar products between vectors, is calculated by:
In principle, PCA also tries to approximate the ones on the diagonal of the correlation matrix. These are included in the RMSE calculation by supplying a unit weight matrix.
This gives a RMSE of 0.1460, which is an improvement over the
previous correlograms. Function rmse can also be used to
calculate the RMSE of each variable separately:
rmse(R,Rhat.pca,W=W1,per.variable=TRUE)
#> area peri comp leng widt asym groo
#> 0.01429494 0.02168169 0.03158330 0.02386245 0.02047550 0.27686959 0.06000407This shows that asymmetry, the shortest biplot vector, has the worst fit.
PCA biplots can also be made on a ggplot2 canvas using
the function ggbplot. We redo the biplots of the data
matrix and the correlation matrix. It is convenient first to establish
the range of variation along both axes considering both row and column
markers. We find these ranges using jointlim:
limits <- jointlim(Fs,Gp)
limits
#> $xlim
#> [1] -2.250838 2.529940
#>
#> $ylim
#> [1] -2.650718 1.960994Next, we prepare two dataframes, one with the coordinates and names for the rows, and one for the columns.
df.rows <- data.frame(Fs[,1:2])
colnames(df.rows) <- c("PA1","PA2")
df.rows$strings <- 1:n
df.cols <- data.frame(Gp[,1:2])
colnames(df.cols) <- c("PA1","PA2")
df.cols$strings <- colnames(R)We compute the variance decomposition table:
lambda <- res.svd$d^2
lambda.frac <- lambda/sum(lambda)
lambda.cum <- cumsum(lambda.frac)
decom <- round(rbind(lambda,lambda.frac,lambda.cum),digits=3)
colnames(decom) <- paste("PC",1:p,sep="")
decom
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> lambda 4.205 1.516 0.999 0.208 0.051 0.020 0.001
#> lambda.frac 0.601 0.217 0.143 0.030 0.007 0.003 0.000
#> lambda.cum 0.601 0.817 0.960 0.990 0.997 1.000 1.000And use the table to construct axis labels that inform about the percentage of variance explained by each PC.
xlab <- paste("PC1 (",round(100*lambda.frac[1],digits=1),"%)",sep="")
ylab <- paste("PC2 (",round(100*lambda.frac[2],digits=1),"%)",sep="")Finally, we construct the PCA biplot of the data matrix with
ggbplot.
biplotX <- ggbplot(df.rows,df.cols,main="PCA biplot",xlab=xlab,
ylab=ylab,xlim=limits$xlim,ylim=limits$ylim,
colch="",size=2)
biplotXPCA biplot on a ggplot canvas
The biplot of the correlation matrix is obtained, because of its symmetry, by supplying the same biplot markers twice, once for the rows and once for the columns. Because the goodness-of-fit of the correlation matrix requires the squared eigenvalues (Gabriel (1971); Graffelman and De Leeuw (2023)), we first establish new labels for the axes:
lambdasq <- lambda^2
lambdasq.frac <- lambdasq/sum(lambdasq)
lambdasq.cum <- cumsum(lambdasq.frac)
decomsq <- round(rbind(lambdasq,lambdasq.frac,lambdasq.cum),
digits=3)
colnames(decomsq) <- paste("PC",1:p,sep="")
decomsq
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> lambdasq 17.682 2.299 0.997 0.043 0.003 0 0
#> lambdasq.frac 0.841 0.109 0.047 0.002 0.000 0 0
#> lambdasq.cum 0.841 0.950 0.998 1.000 1.000 1 1
xlab <- paste("PC1 (",round(100*lambdasq.frac[1],digits=1),"%)",sep="")
ylab <- paste("PC2 (",round(100*lambdasq.frac[2],digits=1),"%)",sep="")biplotR <- ggbplot(df.cols,df.cols,
main="PCA Correlation biplot",
xlab=xlab,ylab=ylab,
xlim=c(-1,1),ylim=c(-1,1),
rowarrow=TRUE,rowcolor="blue",
colch="",rowch="",size=3)
biplotRPCA Correlation biplot on a ggplot canvas.
We now add correlation tally sticks to the biplot, in order to favour
“reading off” the approximations of the correlations. Increments of 0.2
in the correlation scale are marked off along the biplot vectors. For
the shortest biplot vector, asym, we use a modified scale
with 0.01 increments. Note that the goodness-of-fit of the correlation
matrix is (always) better or as good as the goodness-of-fit of the
standardized data matrix (Graffelman and De Leeuw
(2023)).
biplotR <- ggtally(biplotR,Gp,Gp,R,ind=c(1:5,7),
values=seq(-0.2,0.8,by=0.2),dotsize=1.0,
verbose=FALSE)
biplotR <- ggtally(biplotR,Gp,Gp,R,ind=6,
values=seq(-0.01,0.01,by=0.01),
dotsize=0.2)
biplotRPCA Correlation biplot with correlation tally sticks.
We transform correlations to distances with the \(\sqrt{2(1-r)}\) transformation (Hills (1969)), and use the cmdscale
function from the stats package to perform metric
multidimensional scaling. We mark negative correlations with a dashed
red line.
Di <- sqrt(2*(1-R))
out.mds <- cmdscale(Di,eig = TRUE)
Fp <- out.mds$points[,1:2]
opar <- par(bty = "l")
plot(Fp[,1],Fp[,2],asp=1,xlab="First principal axis",
ylab="Second principal axis",main="MDS")
textxy(Fp[,1],Fp[,2],colnames(R),cex=0.75)
par(opar)
ii <- which(R < 0,arr.ind = TRUE)
for(i in 1:nrow(ii)) {
segments(Fp[ii[i,1],1],Fp[ii[i,1],2],
Fp[ii[i,2],1],Fp[ii[i,2],2],col="red",lty="dashed")
}MDS map of the correlation matrix of the wheat kernel data.
We calculate distances in the map, and convert these back to correlations:
Again, correlations of the variables with themselves are perfectly approximated (zero distance), and we include these in the RMSE calculations:
The fit by distances offered by MDS is better than the approximation
by scalar products obtained in PCA. The same MDS map can be made on a
ggplot2 canvas with
colnames(Fp) <- paste("PA",1:2,sep="")
rownames(Fp) <- as.character(1:nrow(Fp))
Fp <- data.frame(Fp)
Fp$strings <- colnames(R)
MDSmap <- ggplot(Fp, aes(x = PA1, y = PA2)) +
coord_equal(xlim = c(-1,1), ylim = c(-1,1)) +
ggtitle("MDS") +
xlab("First principal axis") + ylab("Second principal axis") +
theme(plot.title = element_text(hjust = 0.5,
size = 8),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank())
MDSmap <- MDSmap + geom_point(data = Fp, aes(x = PA1, y = PA2),
colour = "black", shape = 1)
MDSmap <- MDSmap + geom_text(data = Fp, aes(label = strings),
size = 3, alpha = 1,
vjust = 2)
Z <- matrix(NA,nrow=nrow(ii),ncol=4)
for(i in 1:nrow(ii)) {
Z[i,] <- c(Fp[ii[i,1],1],Fp[ii[i,1],2],Fp[ii[i,2],1],Fp[ii[i,2],2])
}
colnames(Z) <- c("X1","Y1","X2","Y2")
Z <- data.frame(Z)
MDSmap <- MDSmap + geom_segment(data=Z,aes(x=X1,y=Y1,
xend=X2,yend=Y2),
alpha=0.75,color="red",linetype=2)
MDSmapMDS map on ggplot canvas.
Principal factor analysis can be performed by the function
pfa of package Correlplot. We extract the
factor loadings.
The biplot of the correlation matrix obtained by PFA is in fact the same as what is known as a factor loading plot in factor analysis, to which a unit circle can be added. The approximation to the correlation matrix and its RMSE are calculated as:
In this case, the diagonal is excluded, for PFA explicitly avoids fitting the diagonal. To make the factor loading plot, aka PFA biplot of the correlation matrix:
bplot(L,L,pch=NA,xl=c(-1,1),yl=c(-1,1),xlab="Factor 1",ylab="Factor 2",main="PFA",rowch=NA,
colch=NA)
circle()PFA biplot of the correlation matrix of the wheat kernel data.
The RMSE of the plot obtained by PFA is 0.0112, lower than the RMSE obtained by PCA. Note that variable area reaches the unit circle for having a communality of 1, or, equivalently, specificity 0, i.e. PFA produces what is known as a Heywood case in factor analysis. The specificities are given by:
diag(out.pfa$Psi)
#> area peri comp leng widt asym
#> 0.000000000 0.011494664 0.158097108 0.007885055 0.022509545 0.997799829
#> groo
#> 0.258768379We also construct the PFA biplot on a ggplot2 canvas
with
L.df <- data.frame(L,rownames(L))
colnames(L.df) <- c("PA1","PA2","strings")
ggbplot(L.df,L.df,xlab="Factor 1",ylab="Factor 2",main="PFA biplot",
rowarrow=TRUE,rowcolor="blue",colch="",rowch="",size=3)PFA biplot on a ggplot canvas.
The correlation matrix can also be factored using weighted
alternating least squares, avoiding the fit of the ones on the diagonal
of the correlation matrix by assigning them weight 0, using function
ipSymLS (De Leeuw
(2006)).
bplot(Fp.als,Fp.als,rowch=NA,colch=NA,collab=colnames(R),
xl=c(-1.1,1.1),yl=c(-1.1,1.1),main="WALS")
circle()WALS biplot of the correlation matrix of the wheat kernel data.
Weighted alternating least squares has, in contrast to PFA, no restriction on the vector length. If the vector lengths in the biplot are calculated, then variable area is seen to just move out of the unit circle.
Rhat.wals <- Fp.als%*%t(Fp.als)
sqrt(diag(Rhat.wals))
#> [1] 1.00124368 0.99394213 0.91345321 0.99646265 0.99026217 0.04686397 0.86124152
rmse.als <- rmse(R,Rhat.wals)
rmse.als
#> [1] 0.01118619The RMSE of the approximation obtained by WALS is 0.011186, slightly
below the RMSE of PFA. The WALS low rank approximation to the
correlation matrix can also be obtained by the more generic function
wAddPCA, which allows for non-symmetric matrices and
adjustments (see the next section). In order to get uniquely defined
biplot vectors for each variable, corresponding to symmetric input, an
eigendecomposition is applied to the fitted correlation matrix.
out.wals <- wAddPCA(R, Wdiag0, add = "nul", verboseout = FALSE, epsout = 1e-10)
Rhat.wals <- out.wals$a%*%t(out.wals$b)
out.eig <- eigen(Rhat.wals)
Fp.adj <- out.eig$vectors[,1:2]%*%diag(sqrt(out.eig$values[1:2]))
rmse.als <- rmse(R,Rhat.wals)
rmse.als
#> [1] 0.01118619We also make the WALS biplot on a ggplot2 canvas
with
Fp.als.df <- data.frame(Fp.als,colnames(R))
colnames(Fp.als.df) <- c("PA1","PA2","strings")
ggbplot(Fp.als.df,Fp.als.df,xlab="Dimension 1",
ylab="Dimension 2",main="WALS biplot",
rowarrow=TRUE,rowcolor="blue",colch="",rowch="",
size=3)WALS biplot of the correlation matrix of the wheat kernel data.
A scalar adjustment can be employed to improve the approximation of the correlation matrix, and the adjusted correlation matrix is factored for a biplot representation. That means we seek the factorization
\[ {\mathbf R} - \delta {\mathbf J} = {\mathbf R}_a = {\mathbf G} {\mathbf G}', \]
where both \(\delta\) and \(\mathbf G\) are chosen such that the
(weighted) residual sum of squares is minimal. This problem is solved by
using function wAddPCA.
out.wals <- wAddPCA(R, Wdiag0, add = "one", verboseout = FALSE, epsout = 1e-10)
delta <- out.wals$delta[1,1]
Rhat <- out.wals$a%*%t(out.wals$b)
out.eig <- eigen(Rhat)
Fp.adj <- out.eig$vectors[,1:2]%*%diag(sqrt(out.eig$values[1:2]))The optimal adjustment \(\delta\) is 0.071. The corresponding biplot is shown below.
bplot(Fp.adj,Fp.adj,rowch=NA,colch=NA,collab=colnames(R),
xl=c(-1.2,1.2),yl=c(-1.2,1.2),main="WALS adjusted")
circle()WALS biplot of the correlation matrix of the wheat kernel data, with the use of an additive adjustment.
Note that, when calculating the fitted correlation matrix, adjustment \(\delta\) is added back. The fitted correlation matrix and its RMSE are now calculated as:
This gives RMSE 0.0056. This is smaller than the RMSE obtained by WALS without adjustment. We summarize the values of the RMSE of all methods in a table below:
rmsevector <- c(rmse.crg,rmse.lin,rmse.pca,rmse.mds,rmse.pfa,rmse.als,rmse.adj)
methods <- c("CRG (cos)","CRG (lin)","PCA","MDS",
"PFA","WALS R","WALS Radj")
results <- data.frame(methods,rmsevector)
results <- results[order(rmsevector),]
results
#> methods rmsevector
#> 7 WALS Radj 0.005560242
#> 6 WALS R 0.011186190
#> 5 PFA 0.011196885
#> 4 MDS 0.068374694
#> 3 PCA 0.145959040
#> 2 CRG (lin) 0.166755585
#> 1 CRG (cos) 0.243753461A summary of RMSE calculations for four methods (PCA and WALS, both
with and without \(\delta\) adjustment)
can be obtained by the functions FitRwithPCAandWALS and
rmsePCAandWALS; the first applies the four methods to the
correlation matrix, and the latter calculates the RMSE statistics for
the four approximations. The bottom line of the table produced by
rmsePCAandWALS gives the overall RMSE for each methods.
output <- FitRwithPCAandWALS(R,eps=1e-15)
rmsePCAandWALS(R,output)
#> PCA PCA-A WALS WALS-A
#> area 0.0143 0.0542 0.0081 0.0043
#> peri 0.0217 0.0405 0.0088 0.0064
#> comp 0.0316 0.0590 0.0110 0.0054
#> leng 0.0239 0.0485 0.0067 0.0045
#> widt 0.0205 0.0626 0.0076 0.0068
#> asym 0.2769 0.1182 0.0181 0.0049
#> groo 0.0600 0.0675 0.0135 0.0061
#> All 0.1460 0.0706 0.0112 0.0056The scalar adjustment changes the interpretation of the origin, and
it is convenient to show the zero correlation point on each biplot
vector. We do so by creating tally sticks, redoing the biplot on a
ggplot2 canvas. The origin now represents correlation 0.07
for all variables.
Fp.adj.df <- data.frame(Fp.adj,colnames(R))
colnames(Fp.adj) <- c("PA1","PA2")
colnames(Fp.adj.df) <- c("PA1","PA2","strings")
WALSbiplot.adj <- ggbplot(Fp.adj.df,Fp.adj.df,
xlab="Dimension 1",ylab="Dimension 2",
main="WALS adjusted",rowarrow=TRUE,
rowcolor = "blue",rowch="",colch="",
size=3)
WALSbiplot.adj <- ggtally(WALSbiplot.adj,Fp.adj[,1:2],
Fp.adj[,1:2],R,ind=c(1:5,7),
adj=-out.wals$delta[1,1],
values=seq(-0.2,0.8,by=0.2),
dotsize=1.0)
WALSbiplot.adjWALS biplot of the correlation matrix of the wheat kernel data, with additive adjustment and tally sticks.
We generate an asymmetric approximation to the correlation matrix using a column adjustment, based on the decomposition
\[ {\mathbf R} = \delta {\mathbf 1} {\mathbf 1}' + {\mathbf 1} {\mathbf q}' + {\mathbf G} {\mathbf G}' + {\mathbf E}. \]
This decomposition can be obtained with function
FitDeltaQSym.
out.walsq.sym <- FitRDeltaQSym(R,Wdiag0,eps=1e-8)
Gq.sym <- out.walsq.sym$G
rownames(Gq.sym) <- colnames(R)
Rhat.wsym <- out.walsq.sym$Rhat
rmse.wsym <- rmse(R,Rhat.wsym)
rmse.wsym
#> [1] 0.00540002This approximation has an RMSE of 0.0054, a very minor improvement in comparison with using a single scalar adjustment. The corresponding biplot can be obtained with
Gq.sym.df <- data.frame(Gq.sym)
Gq.sym.df$strings <- colnames(R)
colnames(Gq.sym.df) <- c("PA1","PA2","strings")
ll <- 1.5
WALSbiplotq.sym <- ggbplot(Gq.sym.df,Gq.sym.df,
main="WALS-q-sym biplot",
xlab="Dimension 1",
ylab="Dimension 2",
ylim=c(-ll,ll),xlim=c(-ll,ll),circle=TRUE,
rowarrow=TRUE,rowcolor="blue",rowch="",
colch="",size=3)
for(i in c(1:5,7)) {
WALSbiplotq.sym <- ggtally(WALSbiplotq.sym,Gq.sym[,1:2],
Gq.sym[,1:2],R,ind=i,
adj=-out.walsq.sym$delta-out.walsq.sym$q[i],
values=seq(-0.2,1,by=0.2),dotsize=1.0)
}
WALSbiplotq.symWALS biplot of the correlation matrix with column adjustment and tally sticks.
This biplot is similar in appearance to the previous biplot that uses a single scalar adjustment only. However, in principle it no longer has a unique origin, as the origin represents a (slightly) different correlation for each variable:
out.walsq.sym$delta + out.walsq.sym$q
#> area peri comp leng widt asym groo
#> 0.06640039 0.06234622 0.06582835 0.06548268 0.06719080 0.06612880 0.06899342For this data, a single scalar adjustment is preferable for simplicity.
Canonical correlation analysis (CCA) is a classic tool to analyse the correlations between two sets of variables, the \(X\) set and the \(Y\) set. CCA allows the construction of a biplot of the between-set correlation matrix \(R_{xy}\). We again use adjustments by scalar, row and/or column effects to improve the approximation obtained by CCA. We use a data set consisting of three psychological variables (locus, self and motivation), four acadamic achievement variables (read, write, math, science) and gender (female) to illustrate the use of CCA and the various adjustments fitted with an alternating generalised least squares algorithm. We first calculate the between-set and within-set correlation matrices.
data(achievement)
X <- achievement[,1:3]
Y <- achievement[,4:ncol(achievement)]
Rxy <- cor(X,Y)
Rxx <- cor(X)
Ryy <- cor(Y)
round(Rxy,3)
#> read write math science female
#> locus 0.374 0.359 0.337 0.325 0.113
#> self 0.061 0.019 0.054 0.070 -0.126
#> motivation 0.211 0.254 0.195 0.116 0.098
Rxx.inv <- solve(Rxx)
Ryy.inv <- solve(Ryy)Next, we fit all models (CCA plus all adjusted version of CCA with scalar, row and/or column adjustment) and compare their loss and RMSE.
Results <- FitAllModelsRxy(Rxy,Rxx,Ryy,verbose=FALSE,
eps=1e-08,ndim=2)
#> Algorithm converged in 449 iterations.
#> Algorithm converged in 22 iterations.
#> Algorithm converged in 15 iterations.
#> Algorithm converged in 18 iterations.
print(round(Results,6))
#> LOSS GLSRMSE OLSRMSE
#> cca 0.010814 0.026850 0.018816
#> delta 0.000409 0.005220 0.003104
#> row 0.000408 0.005216 0.003038
#> col 0.000000 0.000015 0.000017
#> r&c 0.000000 0.000028 0.000023We first run CCA using the canocor function from the
calibrate package, obtain the canonical correlations and
the biplot coordinates.
out.cca <- canocor(X,Y)
Fp <- out.cca$Fp
Gs <- out.cca$Gs
Gp <- out.cca$Gp
Fs <- out.cca$Fs
diag(out.cca$ccor)
#> [1] 0.4640861 0.1675091 0.1039911We calculate the root-mean-squared error of a two-dimensional
approximation to the between-set correlation matrix with function
rmse.rxy.
Rhat.cca <- tcrossprod(Fp[,1:2],Gs[,1:2])
rmse.cca <- round(rmse.rxy(Rxy,Rhat.cca,R=Rxx.inv,C=Ryy.inv),4)
rmse.cca
#> [1] 0.0269We set up title, axis labels, and dataframes with biplot markers for the rows and the columns, and make a biplot of the between-set correlation matrix.
header <- paste("CCA (",toString(rmse.cca),")",sep="")
fra1 <- 100*out.cca$fitRxy[2,1]
fra2 <- 100*out.cca$fitRxy[2,2]
xlab <- paste("First canonical axis (",
round(fra1,digits=1),"%)",sep="")
ylab <- paste("Second canonical axis (",
round(fra2,digits=1),"%)",sep="")
df.row <- data.frame(Fp[,1:2])
colnames(df.row) <- c("PA1","PA2")
df.row$strings <- rownames(Rxy)
df.row$ve <- c( 2, 1, -0.5)
df.row$ho <- c( 1, 0, 1)
df.col <- data.frame(Gs[,1:2])
colnames(df.col) <- c("PA1","PA2")
df.col$strings <- colnames(Rxy)
df.col$ve <- c( 1.15, -1, -1, 1, 0)
df.col$ho <- c( 0.5, 0.75, 0.5, -0.5, -0.5)
biplotRxy <- ggbplot(df.row,df.col,main=header,xlab=xlab,ylab=ylab,
rowch="",colch="",rowarrow = TRUE, size = 3,
linewidth = 0.1)
biplotRxyCCA biplot of the between-set correlation matrix.
We illustrate the construction of a calibrated correlation scale for
variable science, (ind=4) marking off 0.01, 0.05
and 0.10 increments, and showing the approximations of its correlations
with perpendiculars.
biplotRxy <- ggtally(biplotRxy,Fp,Gs,Rxy,ind=4,
values=round(seq(-0.2,0.8,by=0.01),2),dotsize=0.1,
W=Rxx.inv,dotcolour = "grey",linewidth = 0.1,
dp=TRUE)
biplotRxy <- ggtally(biplotRxy,Fp,Gs,Rxy,ind=4,
values=round(seq(-0.2,0.8,by=0.05),2),dotsize=0.2,
W=Rxx.inv,dotcolour = "dimgrey",linewidth = 0.1,
dp=TRUE)
biplotRxy <- ggtally(biplotRxy,Fp,Gs,Rxy,ind=4,
values=round(seq(-0.2,0.8,by=0.1),2),dotsize=1.0,
W=Rxx.inv,dotcolour = "black",linewidth = 0.1,
dp=TRUE)
biplotRxyCCA biplot of the between-set correlation matrix with calibration of variable science.
The output of FitAllModelsRxy above shows adjustment
with a single scalar brings the RMSE below 0.01. We fit the model with a
scalar adjustment (adjust="delta"), and obtain the
approximation and its RMSE.
out.delta <- FitRxy(Rxy,Rxx.inv,Ryy.inv,
adjust="delta",eps=1e-08,
verbose=FALSE)
#> Algorithm converged in 449 iterations.
out.delta$delta
#> [1] -0.268789
round(out.delta$y,3)
#> read write math science female
#> locus 0.373 0.358 0.343 0.324 0.112
#> self 0.062 0.021 0.045 0.071 -0.123
#> motivation 0.211 0.255 0.190 0.117 0.100
rmse.delta <- rmse.rxy(Rxy,out.delta$y,R=Rxx.inv,C=Ryy.inv)
rmse.delta
#> [1] 0.005220467We construct the biplot according to the adjusted analysis.
header <- paste("CCA-delta (",round(rmse.delta,4),")",sep="")
xlab <- "First adjusted canonical axis"
ylab <- "Second adjusted canonical axis"
Gc <- out.delta$Gc
Fc <- out.delta$Fc
delta <- out.delta$delta
df.col <- data.frame(out.delta$Gc[,1:2])
df.row <- data.frame(out.delta$Fc[,1:2])
colnames(df.col) <- c("PA1","PA2")
colnames(df.row) <- c("PA1","PA2")
df.col$strings <- colnames(Rxy)
df.row$strings <- rownames(Rxy)
df.col$ve <- c( 1.15, -1, -1, 1, 0)
df.col$ho <- c( 0.5, 0.75, 0.5, -0.5, -0.5)
df.row$ve <- c( 2, 1, 1)
df.row$ho <- c( 1, 0, 1)
biplotRxy.delta <- ggbplot(df.row,df.col,main=header,
xlab=xlab,ylab=ylab,
rowarrow = TRUE, rowch="", colch="",
size=3,linewidth = 0.1)
biplotRxy.deltaCCA biplot of the between-set correlation matrix with delta adjustment.
To emphasise the change in interpretation of the origin, we, calibrate all biplot vectors marking the positive part of the correlation scale in blue and the negative part in red.
biplotRxy.delta <- ggtally(biplotRxy.delta,Fc,Gc,Rxy,
linewidth=0.1,ind=1,
values=round(seq(-0.4,0.4,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=delta,
dotcolour = "black",dp=FALSE)
biplotRxy.delta <- ggtally(biplotRxy.delta,Fc,Gc,Rxy,
linewidth=0.1,ind=2,
values=round(seq(-0.4,0.3,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=delta,
dotcolour = "black",dp=FALSE)
biplotRxy.delta <- ggtally(biplotRxy.delta,Fc,Gc,Rxy,
linewidth=0.1,ind=3,
values=round(seq(-0.4,0.4,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=delta,
dotcolour = "black",dp=FALSE)
biplotRxy.delta <- ggtally(biplotRxy.delta,Fc,Gc,Rxy,
linewidth=0.1,ind=4,
values=round(seq(-0.4,0.6,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=delta,
dotcolour = "black",dp=TRUE)
biplotRxy.delta <- ggtally(biplotRxy.delta,Fc,Gc,Rxy,
linewidth=0.1,ind=5,
values=round(seq(-0.4,0.2,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=delta,
dotcolour = "black",dp=FALSE)As an alternative to the RMSE, the goodness-of-fit of the two-dimensional approximation can be calculated as
TSS <- tr(Rxx.inv%*%(Rxy)%*%Ryy.inv%*%t(Rxy))
ESS <- tr(Rxx.inv%*%out.delta$y%*%Ryy.inv%*%t(out.delta$y))
gof2D <- ESS/TSS
gof2D
#> [1] 0.9980534
fitstring <- paste("(",toString(round(100*gof2D,1)),"%)",sep="")biplotRxy.delta <- biplotRxy.delta + annotate("text", x = 0.9, y = 0.9,
label = fitstring, size = 2)
biplotRxy.deltaAdjustment of the columns reduces the RMSE to almost zero (0.000015).
The iteration history (verbose=TRUE), the approximation of
\(R_{xy}\), the RMSE and the
estimations of the column adjustments can be obtained as follows.
out.col <- FitRxy(Rxy,Rxx.inv,Ryy.inv,
adjust="col",eps=1e-08,verbose=TRUE)
#> 3 X-variables and 5 Y-variables.
#> itel 1 loss.old 0.0108141590 loss.new 0.0046605043
#> itel 2 loss.old 0.0046605043 loss.new 0.0018029505
#> itel 3 loss.old 0.0018029505 loss.new 0.0006702195
#> itel 4 loss.old 0.0006702195 loss.new 0.0002455383
#> itel 5 loss.old 0.0002455383 loss.new 0.0000894779
#> itel 6 loss.old 0.0000894779 loss.new 0.0000325442
#> itel 7 loss.old 0.0000325442 loss.new 0.0000118284
#> itel 8 loss.old 0.0000118284 loss.new 0.0000042980
#> itel 9 loss.old 0.0000042980 loss.new 0.0000015616
#> itel 10 loss.old 0.0000015616 loss.new 0.0000005674
#> itel 11 loss.old 0.0000005674 loss.new 0.0000002061
#> itel 12 loss.old 0.0000002061 loss.new 0.0000000749
#> itel 13 loss.old 0.0000000749 loss.new 0.0000000272
#> itel 14 loss.old 0.0000000272 loss.new 0.0000000099
#> itel 15 loss.old 0.0000000099 loss.new 0.0000000036
#> Algorithm converged in 15 iterations.
out.col$y
#> read write math science female
#> locus 0.37356822 0.35887972 0.33727196 0.32462839 0.11340611
#> self 0.06063429 0.01942905 0.05357759 0.06981648 -0.12591984
#> motivation 0.21058993 0.25423008 0.19499482 0.11566035 0.09813196
rmse.col <- out.col$rmse.approximation
rmse.col
#> [1] 1.547339e-05
ce <- out.col$ce
ce
#> read write math science female
#> 0.03217693 0.03821020 0.03154600 -0.00584184 -0.07241002The column adjustments are relatively small, and similar for
read, write and math. For simplicity, the
previous adjustment by a single scalar may be preferred, which already
has its RMSE close to zero (0.0052). We nevertheless show how a biplot
for the column-adjusted analysis can be obtained. For calibration, the
adjustment argument adj of function ggtally
should now be set to the estimated column adjustment for each
corresponding variable.
header <- paste("CCA-col (",round(rmse.col,6),")",sep="")
xlab <- "First adjusted canonical axis"
ylab <- "Second adjusted canonical axis"
Gc <- out.col$Gc
Fc <- out.col$Fc
df.col <- data.frame(out.col$Gc[,1:2])
df.row <- data.frame(out.col$Fc[,1:2])
colnames(df.col) <- c("PA1","PA2")
colnames(df.row) <- c("PA1","PA2")
df.col$strings <- colnames(Rxy)
df.row$strings <- rownames(Rxy)
df.col$ve <- c( 1.15, -1, -1, 1, 0)
df.col$ho <- c( 0.5, 0.75, 0.5, -0.5, -0.5)
df.row$ve <- c( 2, 1, 1)
df.row$ho <- c( 1, 0, 1)
biplotRxy.col <- ggbplot(df.row,df.col,main=header,
xlab=xlab,ylab=ylab,
rowarrow = TRUE, rowch="", colch="",
size=3,linewidth = 0.1)
biplotRxy.col <- ggtally(biplotRxy.col,Fc,Gc,Rxy,
linewidth=0.1,ind=1,
values=round(seq(-0.2,0.6,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=ce[1],
dotcolour = "black",dp=FALSE)
biplotRxy.col <- ggtally(biplotRxy.col,Fc,Gc,Rxy,
linewidth=0.1,ind=2,
values=round(seq(-0.2,0.6,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=ce[2],
dotcolour = "black",dp=FALSE)
biplotRxy.col <- ggtally(biplotRxy.col,Fc,Gc,Rxy,
linewidth=0.1,ind=3,
values=round(seq(-0.2,0.4,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=ce[3],
dotcolour = "black",dp=FALSE)
biplotRxy.col <- ggtally(biplotRxy.col,Fc,Gc,Rxy,
linewidth=0.1,ind=4,
values=round(seq(-0.2,0.6,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=ce[4],
dotcolour = "black",dp=TRUE)
biplotRxy.col <- ggtally(biplotRxy.col,Fc,Gc,Rxy,
linewidth=0.1,ind=5,
values=round(seq(-0.2,0.4,by=0.10),2),
dotsize=1.0,W=Rxx.inv,adj=ce[5],
dotcolour = "black",dp=FALSE)
ESS <- tr(Rxx.inv%*%out.col$y%*%Ryy.inv%*%t(out.col$y))
gof2D <- ESS/TSS
gof2D
#> [1] 0.9999522
fitstring <- paste("(",toString(round(100*gof2D,3)),"%)",sep="")
biplotRxy.col <- biplotRxy.col + annotate("text", x = 0.9, y = 0.9,
label = fitstring, size = 2)
biplotRxy.colCCA biplot of the between-set correlation matrix with column adjustment.