
Multilevel confirmatory factor analysis (MLV CFA) modeling permits more sophisticated construct validity research by examining relationships among factor structures, factor loadings, and errors at different hierarchical levels. In MLV CFA models, the latent variable or variables have two kinds of elements: 1) the between-group elements (Level 2 or higher level) and 2) the within-group elements (Level 1 or lower level). The between-group elements represent the general part of the model and the within-group elements the individual part. The within-level variation includes an individual-level measurement error variance, which generally expands the impact of the within-level variation on the intraclass correlations. Multilevel CFA therefore generates results corresponding to those generated by perfectly reliable measures. If the same measurement model is specified across levels, by constraining each item loading to be invariant with its across-level counterpart, the researcher can equate the factor scales across levels. Thus, the factor variances at different levels are directly comparable. The fit of this constrained MLV CFA model can be evaluated by comparing it with an unconstrained model specified with freely estimated factor loadings at each level. In the present work, the steps of the above procedure are fully described and additional issues relevant to the use of MLV CFA are discussed in detail.

In most analyses carried out in social and behavioral research, data are treated as if they were organized at a single level. Nevertheless, real-world data are frequently structured in multiple levels. These data structures are called hierarchical. Such hierarchical structures are also termed nested data or clustered data (Byrne, 2012). This means that some variables are clustered or nested within other variables (Field, 2013; Geiser, 2013; Nezlek, 2011). For example, to study the attachment type of a child to its mother (Bowlby, 1969, 1973; Ainsworth, 1978), a researcher studies the mother-infant relationship of 300 infants in 100 families. The infants are nested in families. Infants are the first level of analysis and families are the second level. Lower-level units are also called micro-level units and higher-level units macro-level units (Heck & Thomas, 2015; Geiser, 2013). Macro-level variables are alternatively called groups or contexts (Kreft & de Leeuw, 1998; Heck & Thomas, 2015). The infants grow up in different family environments. Therefore, the researcher expects that they will have different attachment types. Generally, research in psychology deals with designs about individuals acting within a context, like families in the previous example, or schools (see Byrne, 2012; Geiser, 2013), organizations (see Brown, 2015; Darlington & Hayes, 2017), or neighborhoods (see Tabachnick & Fidell, 2013). The family in the above example is a contextual variable (Field, 2013) that multilevel modeling allows to be taken into consideration (Hox, 2013; Loehlin & Beaujean, 2017). Models used to analyze clustered data are called Multilevel Models, Hierarchical Linear Models, Random Coefficient Models, or Mixed Models (Geiser, 2013; Field, 2013 among many others). Multilevel models are not a new conceptualization (cf. hierarchical linear models; Raudenbush & Bryk, 2002; Bickel, 2007; as quoted by Brown, 2015). However, only in recent decades have they been efficiently incorporated into CFA (cf. Muthén, 1994, 1997, 2004; Brown, 2015).

The purpose of this study is to describe the procedure of Multilevel Confirmatory Factor Analysis Modeling (MLV CFA, Byrne, 2012), i.e., how to incorporate the multilevel approach into a CFA model.

Multilevel models are a category of statistical techniques for studying hierarchically structured datasets in which the scores 1) are nested into larger units (clusters) and 2) may not be independent of one another within each cluster. For example, repeated measurements generate inherently hierarchical datasets with multiple scores clustered within each respondent (Kline, 2016: p. 444). Similarly, Selig, Card, and Little (2008) commented, as reproduced by Byrne (2012), that any model representable as a multigroup SEM can also be specified as a multilevel SEM/CFA if the data are hierarchically clustered.

The relatively late adoption of MLV CFA modeling could be attributed to the inability of older CFA software packages to deal effectively with the inherent complexities of MLV CFA, e.g., the computation of separate covariance matrices for sampling units and the use of robust estimators (Heck & Thomas, 2009; Hox, 2002; McArdle & Hamagami, 1996 as quoted by Byrne, 2012). In MLV EFA and MLV CFA, both direct and indirect effects are considered simultaneously before the assessment of the overall model fit, which makes these models very flexible (Hox, 2013). For a comparison of the multilevel design to the cross-sectional design see

A two-level structure (like in

Design | Parents (Level 2) | Children (Level 1) |
---|---|---|
Multilevel | A | 1, 2 |
Multilevel | B | 3, 4 |
Crossed | A | 1, 2, 3, 4 |
Crossed | B | 1, 2, 3, 4 |

Source. Adapted from Schumacker & Lomax, 2016, pp. 195-196.

variables, possibly reflecting differences in social and economic status or/and culture (see also Field, 2013 and Byrne, 2012 for analogous examples).

To use another common example (e.g., Field, 2013; Kline, 2016; Hox, 2013; Geiser, 2013), let us assume that a sample includes 3000 students who attend 20 different schools. Scores from students (1^{st} level) attending the same classroom (2^{nd} level) may not be independent, and scores from students enrolled in the same school (3^{rd} level) may not be independent either. This is likely because students in the same classroom are exposed to similar influences, such as the teacher's character and peers' behavior. According to Kline (2016), students of the same school could equally be influenced by school staff, school discipline frameworks, established curricula, the number and nature of midterm exams, and the like. Depending on the sampling design, there could be additional higher levels, e.g., schools, districts, cities, states, countries (Geiser, 2013).

This similarity among cases, which arises from common contextual influences in clustered sampling, is problematic because it violates two core assumptions of quantitative measurement: 1) that all cases are independent, and 2) that all random errors of cases are also independent, normally distributed, and homoscedastic (Byrne, 2012). These assumptions underlie traditional statistical approaches like Ordinary Least Squares (OLS) regression analysis and Analysis of Variance (Cohen, Cohen, West, & Aiken, 2003; Geiser, 2013; Field, 2013). Therefore, using conventional statistical approaches to analyze clustered data may lead to biased results (Geiser, 2013). Specifically, a vital reason for MLV use is the correct estimation of standard errors or the assignment of probability weights in complex sampling designs (Kline, 2016; Kelloway, 2015; Brown, 2015; Field, 2013). The bias, Kline (2016) continues, arises because standard errors are the denominators of significance tests; when they are underestimated, the p values of the tests are too small and the null hypothesis is rejected too often (Geiser, 2013). Moreover, ignoring clustering leads to overestimation of the effective sample size, which biases statistical inference by increasing the alpha error rate (Cohen et al., 2003; Snijders & Bosker, 1999; Geiser, 2013). Muthén and Satorra (1995) argue that the more similar the individuals within groups are, the more biased the parameter estimates, standard errors, and related significance tests will be (as reproduced by Byrne, 2012).
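The overestimation of the effective sample size can be illustrated with the design effect commonly attributed to Kish, 1 + (m − 1) × ICC, where m is the average cluster size. The following is a minimal sketch only: the function names are hypothetical, the ICC of 0.10 is an assumed value, and the 3000 students are treated as nested directly within the 20 schools for simplicity.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect 1 + (m - 1) * ICC for clusters of average size m."""
    if avg_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1]")
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, n_clusters: int, icc: float) -> float:
    """Approximate number of independent cases in a clustered sample."""
    avg_cluster_size = n_total / n_clusters
    return n_total / design_effect(avg_cluster_size, icc)


# 3000 students in 20 schools; the ICC of 0.10 is an assumed value.
deff = design_effect(3000 / 20, 0.10)
n_eff = effective_sample_size(3000, 20, 0.10)
print(f"design effect = {deff:.1f}, effective n = {n_eff:.0f}")
# prints: design effect = 15.9, effective n = 189
```

Under these assumed values, a test that treats all 3000 scores as independent would behave as if it had roughly sixteen times more information than the data actually carry.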

An additional reason that multilevel structure should not be overlooked is that interactions of variables at different levels are often of central research interest (Geiser, 2013), see

Although multilevel modeling was introduced to study individuals within groups, the method was extended to repeated measures data (as in longitudinal designs), where measurement occasions (also termed time points) are nested within individuals (Bryk & Raudenbush, 1987; Goldstein, 1987; Singer & Willett, 2003; Geiser, 2013). Multilevel modeling of longitudinal data is a powerful approach because it offers many possibilities for the metric treatment of time points and deals effectively with missing data from dropouts and panel attrition (Hox, 2013). Crucially, structural equation models are also more flexible than traditional multilevel regression models, because regression models are based on unrealistic assumptions, e.g., that predictor variables are perfectly reliable. Structural equation models do not assume perfect reliability of variables, because they can specify a measurement model for the predictor or

Term | Definition |
---|---|
Between groups | Model for the group-level structure (second level). This term becomes vague as higher levels are added, so it should be combined with an explicit indication of the grouping element of each level (e.g., class or school level). |
Cross-level interaction | Higher-level variables can directly influence lower-level variables. This is usually displayed by an interaction between higher-level and lower-level variables. |
Fixed effect, fixed coefficient | Factor loadings and path coefficients that do not change across the units of the higher level. |
Intraclass correlation | Used to examine the population similarity of the individuals of the same group. It is also a measure of the amount of population variance at the group level. |
Multilevel model | A model comprising variables at multiple levels of a hierarchically structured population. Also called hierarchical model. |
Random effect, random coefficient | Factor loadings and path coefficients that vary across the units of the higher level. |
Variance component | Variances and covariances of the random (varying) coefficients. |
Within groups | Model representing the structure at the lowest level, i.e., the individual (first) level. |

Source. Hox (2013: p. 292).

・ Multilevel models properly account for the hierarchical data structure that causes data dependencies.

・ Multilevel modeling methodology overcomes the standard error bias due to clustering, which generates inflated Type I error rates and inaccurate confidence intervals.

・ Multilevel models permit analyzing variables at different levels, taking into account cross-level interactions.

・ Multilevel analysis is a more flexible method requiring fewer assumptions than other statistical methods such as repeated measures ANOVA.

Source. Geiser, 2013, page 197.

outcome variables. Additionally, they can model more complicated interactions, like indirect effects of mediation analysis ( Hox, 2013).

Conventional SEM/CFA software can estimate two-level models by treating the two levels as two groups (Muthén, 1994). Mehta and Neale (2005) described in detail how multilevel models can be incorporated in SEM/CFA. However, because using conventional SEM/CFA software requires complicated model specifications, recent versions of most SEM software packages (EQS, Bentler, 2005; LISREL, Jöreskog & Sörbom, 1989, 1993; Mplus, Muthén & Muthén, 1998-2012; and Stata, StataCorp, 2015) include dedicated multilevel procedures. Some extensions of this approach permit the use of categorical and ordinal data, incomplete data, and more than two levels (Hox, 2013; Kline, 2016). These new capabilities are summarized next. For more detailed applications of the MLV approach in the literature please refer to Dedrick and Greenbaum (2010, 2011); Dyer, Hanges, and Hall (2005); Kaplan and Kreisman (2000); J. Little (2013); Byrne (2012); Heck and Thomas (2015); and Brown (2015). See a summary of the main advantages of MLV in

One important feature of multilevel modeling is the flexibility to decide whether the effects of micro-level variables are fixed to be the same across macro-level research units (a fixed effect) or are permitted to vary (a random effect; Darlington & Hayes, 2017). Thus, random coefficients are parameters in a model that vary across clusters. Covariates can be included in a multilevel model to represent variability within and between clusters. To elaborate further on the example of classrooms nested within schools (see Brown, 2015 for a similar example), a multilevel regression model could examine, e.g., whether a student's gender is a significant predictor of achievement in verbal ability. Gender would be a within-level effect (Level 1 or micro level) because gender is a characteristic of individuals, and the gender covariate accounts for variation in verbal achievement among individuals. As an example of a between-level effect (Level 2), the age of the teacher (a classroom variable) may account for variability in verbal achievement across classrooms. If the effect of gender on verbal achievement is allowed to vary across classrooms, it is a random slope, and the Level 2 covariate of teacher age can be used to explain the variability of this coefficient across clusters/classrooms (Hox, 2010, 2013; Brown, 2015).
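The distinction between fixed and random coefficients in the classroom example can be made concrete with a small sketch. Everything in it is hypothetical: the `Classroom` class, the coefficient names `gamma_00`, `gamma_10`, and `gamma_11`, and all numeric values are illustrative assumptions rather than estimates from any study.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    teacher_age: float   # Level 2 covariate
    u_intercept: float   # random deviation of this classroom's intercept
    u_slope: float       # random deviation of this classroom's gender slope


def gender_slope(room, gamma_10, gamma_11, mean_age):
    """Classroom-specific effect of gender (a random slope).

    gamma_10 is the average gender effect, gamma_11 the cross-level effect
    of (centered) teacher age on that slope, u_slope what remains unexplained.
    """
    return gamma_10 + gamma_11 * (room.teacher_age - mean_age) + room.u_slope


def expected_score(room, gender, gamma_00, gamma_10, gamma_11, mean_age):
    """Expected verbal score for a student with gender coded 0 or 1."""
    intercept = gamma_00 + room.u_intercept  # random intercept
    return intercept + gender_slope(room, gamma_10, gamma_11, mean_age) * gender


rooms = [Classroom(30.0, -1.0, 0.5), Classroom(50.0, 2.0, -0.5)]
for room in rooms:
    slope = gender_slope(room, gamma_10=3.0, gamma_11=0.1, mean_age=40.0)
    print(room.teacher_age, round(slope, 2))
```

Setting every `u_slope` and `gamma_11` to zero would turn the gender effect into a fixed effect, identical in all classrooms.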

To return to the previous example of students nested within schools, this multilevel structure suggests that the total covariance matrix, Σ, would be divided into a within-covariance matrix Σ_{W} and a between-covariance matrix Σ_{B}. The Σ_{W} matrix contains covariances at the individual level (i.e., individual score differences in verbal achievement) and their correlates accounting for variation across schools. In contrast, the Σ_{B} matrix represents covariation at the school level (i.e., differences across schools in the teaching experience and age of the teaching staff). The Σ_{W} and Σ_{B} covariance matrices can have either similar or totally different factor structures (Byrne, 2012). For each student in Level 1, the total score comprises a Level 1 component accounting for the individual deviation from the group mean and a Level 2 component accounting for the disaggregated school group mean. This decomposition of individual scores allows separate calculation of within- and between-group covariance matrices (Heck, 2001; Hox, 2002 as quoted by Byrne, 2012). The related effects are termed within-cluster effects and between-cluster effects (Bentler, 2005). If a mean structure is necessary, it is used to represent the between-group means (Byrne, 2012).
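The decomposition of each score into a group mean and a deviation from that mean can be sketched in plain Python. This is an illustrative sketch, not the routine of any SEM package: the function names and the tiny dataset are hypothetical, and the divisors N − G and G − 1 follow the pooled-within and between sample matrices usually associated with Muthén (1994).

```python
from collections import defaultdict


def mean_vector(rows):
    """Column means of a list of equally long rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, center):
    """Sum of outer products of (row - center)."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - c for x, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def within_between_scatter(data, groups):
    """Split the total scatter into a within-group and a between-group part."""
    by_group = defaultdict(list)
    for row, g in zip(data, groups):
        by_group[g].append(row)
    grand = mean_vector(data)
    p = len(grand)
    within = [[0.0] * p for _ in range(p)]
    between = [[0.0] * p for _ in range(p)]
    for rows in by_group.values():
        gmean = mean_vector(rows)
        w = scatter(rows, gmean)                      # deviations from group mean
        d = [m - c for m, c in zip(gmean, grand)]     # group mean vs grand mean
        for i in range(p):
            for j in range(p):
                within[i][j] += w[i][j]
                between[i][j] += len(rows) * d[i] * d[j]
    return within, between, len(by_group)


def pooled_covariances(data, groups):
    """Pooled within matrix (divisor N - G) and between matrix (divisor G - 1)."""
    within, between, n_groups = within_between_scatter(data, groups)
    n = len(data)
    s_pw = [[v / (n - n_groups) for v in row] for row in within]
    s_b = [[v / (n_groups - 1) for v in row] for row in between]
    return s_pw, s_b


# Hypothetical scores on two indicators for six students in three schools.
data = [[1, 2], [3, 4], [5, 5], [7, 9], [2, 1], [4, 3]]
groups = ["a", "a", "b", "b", "c", "c"]
s_pw, s_b = pooled_covariances(data, groups)
print(s_pw)
print(s_b)
```

The within and between scatter matrices add up exactly to the total scatter, which is the sample counterpart of the orthogonal, additive decomposition described above. Note that the sample between matrix is not by itself an estimate of Σ_{B}, because it also carries within-group sampling variation.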

In two-level structures, the observed individual-level variables are decomposed by the following within and between equations:

$$y_{W} = \Lambda_{W}\,\eta_{W} + \varepsilon_{W} \qquad \text{(within level)} \tag{1}$$

$$\mu_{B} = \mu + \Lambda_{B}\,\eta_{B} + \varepsilon_{B} \qquad \text{(between level)} \tag{2}$$

μ = vector of between-level means

Λ_{W} = within-level factor loading matrix

Λ_{B} = between-level factor loading matrix

η_{W} = within-level factor

η_{B} = between-level factor

ε_{W} = within-level indicator residual variance

ε_{B} = between-level indicator residual variance

( Hox, 2013: p. 287; Brown, 2015: p. 421)

In the first equation the within-groups variation is represented. The second equation denotes the between-groups variation and the group level means while the factor loading matrices (Λ_{W}, Λ_{B}) and cluster-level means μ are considered fixed effects ( Brown, 2015). Importantly, μ_{B} represents the random intercepts of the X variables that are the focus of the between-level means. By their combination Equation (3) is obtained:

$$X_{ij} = \mu + \Lambda_{W}\,\eta_{W} + \Lambda_{B}\,\eta_{B} + \varepsilon_{B} + \varepsilon_{W} \tag{3}$$

where the symbols are as defined for Equations (1) and (2).

( Hox, 2013: p. 288; Brown, 2015: p. 422)
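Under the usual assumption that the within and between components are uncorrelated, Equation (3) implies an additive covariance structure. The block below is a sketch of that implication; the residual covariance matrices Θ_{W} and Θ_{B} are notation assumed here for illustration, and Ψ_{W} and Ψ_{B} denote the factor variances (or covariance matrices) at each level.

```latex
\begin{aligned}
\Sigma_{T} &= \Sigma_{W} + \Sigma_{B}, \\
\Sigma_{W} &= \Lambda_{W}\,\Psi_{W}\,\Lambda_{W}^{\top} + \Theta_{W}, \\
\Sigma_{B} &= \Lambda_{B}\,\Psi_{B}\,\Lambda_{B}^{\top} + \Theta_{B}.
\end{aligned}
```

Each level therefore has its own CFA-type covariance structure, which is why the two levels may carry the same or different factor models.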

Equation (2) is similar to the equations used by random intercept regression models (except for the symbols), with the factor loadings in the place of the fixed regression coefficients, the factors in the place of the predictors, and a level-one and a level-two error term. If the factor loadings are also allowed to vary at the group level, the model becomes a generalized random coefficient model. The model in Equation (3) is a two-level factor model. If structural relationships between the latent factors are added at both levels, a multilevel SEM/CFA with two levels results (Hox, 2013; Brown, 2015).

Multilevel models can be employed to analyze both EFA and CFA models. The within and between levels might have a different number of latent variables: applied research suggests that fewer factors typically emerge at the between level than at the within level, because the variability across groups is lower than among individuals. Any CFA parameter (like factor loadings or indicator intercepts) might be handled as a random coefficient, if this is justified by substantive theory and supported empirically (Brown, 2015). Additionally, more complex data structures like cross-classifications, multiple memberships, or covariates are only a few of the possible extensions of the basic CFA models developed (see Goldstein & Browne, 2005; Byrne, 2012).

Another feature of multilevel CFA modeling is the decomposition of the total variance (Ψ) of the latent variables into the part attributed to between-cluster variation (Ψ_{B}) and the part attributed to within-cluster variation (Ψ_{W}). Based on these variances, the intraclass correlation (ICC) can be estimated as:

$$\mathrm{ICC} = \frac{\Psi_{B}}{\Psi_{B} + \Psi_{W}} \tag{4}$$

( Finch & Bolin, 2017: p. 237)

ICC values can range from 0.0 to 1.0 (Byrne, 2012). Generally, if the ICCs are all small, e.g., <0.05, the between-group variance is low and there may be no need to specify an MLV CFA model (Hox, 2013; Brown, 2015). Muthén (1997) noted, as reproduced by Byrne (2012), that while ICC values usually range from 0.00 to 0.50, ICC values of 0.10 or larger combined with a group size of 15 or larger suggest that MLV data should definitely be modeled. However, Julian (2001) and Selig et al. (2008) cautioned that even with ICC < 0.10, the hierarchical structure of the data should be taken into account (Byrne, 2012). Mehta and Neale (2005) proposed a method to compare the factor variances at levels 1 and 2. Specifically (as reproduced by Finch & Bolin, 2017 and Heck & Thomas, 2015), the factor loadings must be invariant across levels: the loading of each indicator at level 1 is constrained to equal the corresponding loading at level 2.
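Equation (4) and the rules of thumb cited above can be combined into a short screening helper. This is only a sketch: the function names and the wording of the returned notes are hypothetical, and the cutoffs of 0.05, 0.10, and a group size of 15 are simply the values quoted in the text, not hard decision rules.

```python
def icc(psi_between: float, psi_within: float) -> float:
    """Intraclass correlation from between- and within-level variances."""
    total = psi_between + psi_within
    if psi_between < 0 or psi_within < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return psi_between / total


def screening_note(icc_value: float, avg_group_size: float) -> str:
    """Map an ICC to the guidance quoted in the text (rules of thumb only)."""
    if icc_value >= 0.10 and avg_group_size >= 15:
        return "multilevel modeling indicated"
    if icc_value < 0.05:
        return "small ICC: a multilevel model may be unnecessary"
    return "borderline: consider modeling the hierarchy anyway"


# Hypothetical factor variances from a model with loadings equated across levels.
value = icc(psi_between=0.2, psi_within=0.8)
print(round(value, 2), "-", screening_note(value, avg_group_size=25))
```

Because the factor variances are only comparable when the loadings are constrained to equality across levels, the inputs to `icc` should come from such a constrained model.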

Multilevel CFA models are evaluated in multiple steps ( Hox, 2013). Byrne (2012) states that three different methods emerged over the years. The first was a method proposed by Muthén (1994) initially containing four phases with the MUML as an estimator. Muthén (1989, 1990, 1991, 1994) simplified the multilevel data analysis by using conventional SEM software by computing separate within and between-groups covariance matrices, which are orthogonal (uncorrelated) and additive ( Heck & Thomas, 2015). However, as Byrne (2012) comments, elaboration of the MLV modeling estimation―moving from MUML to FIML―plus the evolution of statistical software used ( Kaplan et al., 2009) and Bayesian methods of estimation ( Heck & Thomas, 2015) inevitably altered the original methodology proposed by Muthén (1994). Specifically, Byrne (2012) explains that some phases (2 - 4) were unified (c.f. Mplus, 1998-2012). The second method was proposed by Hox (2002) and it tests the fundamental assumptions of MLV modeling by establishing benchmark models. Finally, the third method was developed by Mehta and Neale (2005) and is based on a process of 3 phases of fitting the univariate random intercepts to the data. The Hox (2002) method is described as the most uncomplicated to carry out ( Selig et al., 2008; Byrne, 2012), but the Muthén (1994) approach is still the most frequently used ( Cheung & Au, 2005; Byrne, 2012). See Byrne (2012) for details. A brief description of the steps of the most widely used method proposed by Muthén (1994) or the general-specific method ( Heck & Thomas, 2015) follows.

The Steps of the method

The following steps were described by Hox (2013) for regression and SEM models and they were further detailed for CFA models by Brown (2015) and by Heck and Thomas (2015), Byrne (2012) and Finch and Bolin (2017). The following three steps are suggested for the estimation of a two-level model with the within-structure fully specified ( Hox, 2013; Brown, 2015). This method was originally proposed by Muthén (1994) using MUML estimator. However, as Byrne (2012) comments, subsequent elaboration of MLV modeling―from MUML estimator to FIML estimator―plus evolution of statistical software simplified the method. A two-level model can be analyzed following three steps.

・ Step 1: The intraclass correlations (ICCs) of the indicators are examined first to assess group-level properties, i.e., how much variance in each indicator is explained by group membership (Schumacker & Lomax, 2016). In other words, the ICCs show the extent to which individual scores within groups are dependent because of similarities among individuals (Field, 2013; Brown, 2015; Byrne, 2012; Tabachnick & Fidell, 2013; Kalaian & Kasim, 2007). The higher the ICC, the more score variance is attributed to the stratification or cluster (grouping variable). A design effect can be used to estimate how much a multilevel nested design differs from a simple random sample (Schumacker & Lomax, 2016). Alternatively, non-hierarchical methods that allow for a certain minor dependency in the data can be used (Brown, 2015; Muthén & Muthén, 1998-2012). If the between-group variances are substantial, the between structure must be taken into account (Hox, 2013).

・ Step 2: Then the data of the within structure is analyzed (Level 1). At this level (the individual level) a standard CFA is used ( Hox, 2013) to ensure a viable measurement model at the within level with the between level unstructured ( Brown, 2015), or (beyond CFA) more generally statistical techniques for clustered samples (cf. de Leeuw, Hox, & Dillman, 2008). First, we carry out a CFA to test the validity of the hypothesized structure based on the covariance matrix of the full sample, without taking into account the data hierarchy. If model modifications suggested by MIs are supported by substantive theory, the model can be re-specified accordingly to include additional parameters used for the individual level only ( Byrne, 2012: p. 355). The fit of the model is then examined with conventional fit criteria (e.g., Hu & Bentler, 1999) and if satisfactory the researcher proceeds to the next step. As a rule, fit indices used are ( Byrne, 2012): χ^{2}, Comparative Fit Index (CFI; Bentler, 1990), Root Mean Square Error of Approximation (RMSEA; Steiger & Lind, 1980), and Standardized Root Mean Square Residual (SRMR).

・ Step 3: If an acceptable measurement model emerges, the final step is to examine the between-level factor structure (Level 2) with the within-level factor structure (Level 1) completely modeled (Hox, 2013; Brown, 2015). Many MLV models with latent variables are found in the literature, but few of them are psychometrically oriented (Dedrick & Greenbaum, 2010; Byrne, 2012). Given an adequate fit for the single-level CFA model, the factor structures of the individual-level and group-level data are tested simultaneously. Analyses can be based on the robust maximum likelihood (MLR; Muthén & Muthén, 1998-2012) estimator. However, in this step an error message related to the higher level of the model may occur, because the higher level must be overidentified for the model to be estimated properly. Such error messages typically arise from the usually small sample at the higher level. Unfortunately, even when the estimated parameters at the higher level are adequate (i.e., the model is overidentified), the same error message may appear again (cf. Byrne, 2012). Note that the number of variances and covariances available when a group level is added can be obtained from the variance-covariance formula p(p + 1)/2: in MLV CFA this number is doubled, and the k intercept parameters estimated at Level 2 are added (Heck & Thomas, 2015). If error messages persist, Byrne (2012) proposes carrying out the MLV CFA analysis using the MUML estimator instead of MLR. Note, however, that MUML cannot handle deviations from multivariate normality. According to studies on MUML (Hox & Maas, 2001; Yuan & Hayashi, 2005), the likelihood of inadmissible solutions is greater if the sample size at the higher level is less than 50 (quoted in Byrne, 2012). In an admissible solution, according to Hox and Maas (2001) as reproduced by Byrne (2012), the factor loadings are generally accurate, but the residual variances and the standard errors may be underestimated.
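The counting rule mentioned in Step 3 can be sketched as follows. The sketch assumes that the number of Level 2 intercepts k equals the number of indicators p, and the parameter tally in the example (a one-factor model with six indicators and one loading fixed per level for scaling) is a hypothetical illustration.

```python
def unique_moments(p: int) -> int:
    """Number of distinct variances and covariances among p indicators."""
    return p * (p + 1) // 2


def two_level_information(p: int) -> int:
    """Available information in a two-level CFA: p(p + 1)/2 at each level
    plus the p between-level intercepts (assumes k = p)."""
    return 2 * unique_moments(p) + p


def degrees_of_freedom(p: int, n_free_parameters: int) -> int:
    """Model degrees of freedom; a negative value means under-identification."""
    return two_level_information(p) - n_free_parameters


# Hypothetical one-factor model with 6 indicators at both levels:
# per level 5 free loadings + 1 factor variance + 6 residual variances,
# plus 6 intercepts at the between level.
p = 6
free = (5 + 1 + 6) + (5 + 1 + 6) + 6
print(two_level_information(p), degrees_of_freedom(p, free))  # prints: 48 18
```

A positive count is necessary but not sufficient for estimation, since, as noted in Step 3, the between part of the model must itself be overidentified.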

If the estimation of the model produces no errors, the initial information examined is the following: (a) model summary results like the number of clusters in the analysis and the average cluster size, and (b) the ICCs pertinent to each of the observed variables. If the ICCs of the observed variables calculated in this step, based on the simultaneous analysis at both levels, are >0.10 (see Muthén, 1997 and Byrne, 2012), then the continuation of the MLV analysis is supported (Julian, 2001; Byrne, 2012). Model fit is evaluated by the following measures (Byrne, 2012): the chi-square test of model fit, the Comparative Fit Index (CFI; Bentler, 1990), the Tucker-Lewis Index (TLI; Tucker & Lewis, 1973), the Root Mean Square Error of Approximation (RMSEA; Steiger & Lind, 1980), the Standardized Root Mean Square Residual (SRMR) for the within model, and the SRMR for the between model. Akaike's Information Criterion (AIC; Akaike, 1987) and the Bayesian Information Criterion (BIC; Raftery, 1993; Schwarz, 1978) can also be used for MLV model fit comparison (Heck & Thomas, 2015). Crucially, even if model fit is acceptable, the estimated parameters must be examined as well to decide if the model is acceptable (i.e., significant factor loadings and relatively low measurement errors). These goodness-of-fit indices apply to the entire model: they show to what extent the model fits the data of the within-group model and of the between-group model. Moreover, the deviance statistic can be calculated by multiplying the log likelihood by −2 (−2LL), where the log is the natural logarithm and the likelihood is the value of the likelihood function at convergence (Heck & Thomas, 2015). Generally, models with lower deviance show better fit than models with higher deviance (Hox, 2002; Heck & Thomas, 2015).
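The deviance and the two information criteria follow directly from the log likelihood at convergence. The sketch below uses the textbook definitions AIC = −2LL + 2k and BIC = −2LL + k·ln(N); the log likelihoods and parameter counts are hypothetical, and using the total number of observations as N is an assumption, since software may define the sample size for BIC differently in multilevel models.

```python
import math


def deviance(log_likelihood: float) -> float:
    """Deviance: -2 times the log likelihood at convergence."""
    return -2.0 * log_likelihood


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike's Information Criterion: deviance + 2k."""
    return deviance(log_likelihood) + 2 * n_parameters


def bic(log_likelihood: float, n_parameters: int, n_observations: int) -> float:
    """Bayesian Information Criterion: deviance + k * ln(N).
    Taking N as the total number of observations is an assumption here."""
    return deviance(log_likelihood) + n_parameters * math.log(n_observations)


# Hypothetical results: loadings free at each level vs. equated across levels.
models = {
    "unconstrained": {"ll": -5200.0, "k": 30},
    "constrained": {"ll": -5203.5, "k": 25},
}
for name, m in models.items():
    print(name, deviance(m["ll"]), aic(m["ll"], m["k"]),
          round(bic(m["ll"], m["k"], 3000), 1))
```

Lower values indicate better fit for all three quantities; unlike the deviance, AIC and BIC penalize the number of estimated parameters, so a constrained model can be preferred even when its deviance is slightly higher.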

Consider an example (

The path diagram in

Heck & Thomas, 2015). On the between levels, the single factor (F_{B}) is specified to account for the variation and covariation among these random intercepts ( Brown, 2015). For a similar applied example, the readers can refer to Brown (2015). For instructions on how to extend the CFA model to three levels readers can refer to Heck & Thomas (2015).

Brown (2015) notes that the following parameters are freely estimated (Muthén & Muthén, 1998-2012): factor variances at both levels, fixed intercepts at the between level, and indicator residual variances at both levels. By default, the latent-variable means and the covariances of the residuals are fixed to zero at both levels. Note that the magnitudes of the variances of the parental satisfaction factors at both levels are not directly comparable unless the factors have a common metric. If the within and between levels have the same measurement model, the equality of factor loadings across levels can be tested. If the factor loadings are equivalent, the metrics of the within- and between-level factors are equated, and the factor variances become directly comparable (Mehta & Neale, 2005; Brown, 2015). Without a common scale of measurement across levels, the magnitude of the factor variances at each level is not directly comparable (Mehta & Neale, 2005; Heck & Thomas, 2015). Consequently, establishing a common scale of measurement across levels is often useful (Heck & Thomas, 2015). Alternatively, Byrne (2012) follows the same procedure described above but omits the initial calculation of the ICC. A second difference in the applied example proposed by Byrne (2012) is the inclusion of a different measurement model across levels. Finally, Byrne comments that ideally, for a more accurate description of the results, model fit must be evaluated separately for each of the two levels. This procedure is described next, along with other important issues in MLV CFA.

Model Estimation

During early MLV SEM modeling, as Byrne (2012) describes, parameter estimation was carried out mainly by full information maximum likelihood estimation (FIML) adjusted for multilevel data (MUML), as proposed by Muthén (1994). More recent advancements in SEM research brought about important refinements in ML estimation and MLV modeling (Heck & Thomas, 2009; Kaplan et al., 2009). These newer estimation methods can be distinguished by their approach to the computation of standard errors. The first of these methods is based on the MLF estimator; the second is based on the usual ML estimator, with standard errors based on second-order derivatives; and the third is based on the MLR estimator, which is robust to nonnormality and also permits MLV analyses based on unbalanced groups. Given these new possibilities, it was suggested that the MUML estimator may no longer be needed (Yuan & Hayashi, 2005; Byrne, 2012). These estimation options increased the flexibility of SEM MLV modeling and added computational power (Heck & Thomas, 2009). However, Byrne (2012) showed that MUML could be useful in case of errors generated during model estimation, typically caused by a small sample size at levels > 1.

Model fit evaluation

As Finch and Bolin (2017) argue, fit statistics (perhaps with the exception of the Standardized Root Mean Square Residual, e.g., in Mplus; Muthén & Muthén, 1998-2012) typically present combined model fit information about both levels (also Byrne, 2012). Because the number of data points at Level 1 is usually greater than at Level 2, fit indices primarily measure the Level 1 model fit (Ryu & West, 2009; Byrne, 2012). Stapleton (2013) provides instructions on separate model fit evaluation at each level, reproduced here based on Finch and Bolin (2017).

First, the chi-square model fit statistics are calculated for the Level 1 baseline model, and the process is repeated for the Level 2 baseline model. To calculate the baseline value for the Level 1 part of the model, the covariances of the observed indicators are constrained to 0 at Level 1 and freely estimated at Level 2; this yields the Level 1 baseline chi-square fit statistic. Similarly, to estimate the baseline value for the Level 2 part of the model, the covariances of the observed indicators are constrained to 0 at Level 2 and freely estimated at Level 1. See an example of the path diagram of an MLV CFA Model with two factors in

Following Stapleton's (2013) steps for calculating fit statistics at each level separately, a saturated model is specified at level 2 (i.e., with a perfect fit at that level), with the level 1 model fully specified. The resulting chi-square fit statistic is then examined. Using an equation described by Ryu and West (2009), the comparative fit index (CFI) for the level 1 part of the model is then obtained. Likewise, the level 2 CFI value can be obtained in a comparable way, i.e., by obtaining the chi-square goodness-of-fit statistic for the model that is saturated at level 1. The fit at level 2 is then examined following typical CFA guidelines (e.g., Hu & Bentler, 1999; Brown, 2015; Kline, 2016). Based on these analyses, the CFI values at levels 1 and 2 are both examined. The SRMR is also examined, as it provides model fit information at both levels separately. From these results, we can decide if model fit at each level is acceptable. If the model shows a good fit to the data, the evaluation of model parameters comes next.
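The level-specific CFI can be sketched by applying the conventional CFI formula to the chi-square of the partially saturated model and the chi-square of the corresponding level-specific baseline model. This is an assumption about how the Ryu and West (2009) equation is applied, not a reproduction of it, and all chi-square values and degrees of freedom below are hypothetical.

```python
def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Conventional comparative fit index computed from two chi-squares.

    For a level-specific CFI, chi2_model is assumed to come from the model
    that is saturated at the other level, and chi2_baseline from the baseline
    model of the level being evaluated.
    """
    noncentrality_model = max(chi2_model - df_model, 0.0)
    noncentrality_base = max(chi2_baseline - df_baseline, noncentrality_model)
    if noncentrality_base == 0.0:
        return 1.0  # neither model shows any misfit beyond its df
    return 1.0 - noncentrality_model / noncentrality_base


# Hypothetical values for the level 1 and level 2 parts of a model.
cfi_within = cfi(chi2_model=30.0, df_model=9, chi2_baseline=450.0, df_baseline=15)
cfi_between = cfi(chi2_model=14.0, df_model=9, chi2_baseline=80.0, df_baseline=15)
print(round(cfi_within, 3), round(cfi_between, 3))
```

Computing the index separately for each level makes it visible when a well-fitting within model masks a poorly fitting between model, which the combined fit statistics tend to hide.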

Moreover, to estimate the amount of variance in the observed indicators that is attributable to each data level, the same latent structure must be specified at each level and the factor loadings must be constrained to equality across levels ( Mehta & Neale, 2005; Brown, 2015; Finch & Bolin, 2017; Heck & Thomas, 2015). To obtain the ICC for each factor, we would employ Equation 4 above, using the factor variances resulting from this constrained model ( Finch & Bolin, 2017).
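As a worked illustration of the factor-level ICC, the sketch below assumes the usual definition: the between-level factor variance divided by the sum of the between-level and within-level factor variances. It is assumed, not confirmed here, that Equation 4 has this form. The function name `factor_icc` and the variance values are hypothetical. The result is only meaningful when the loadings have been constrained equal across levels, as described above.

```python
def factor_icc(var_between, var_within):
    """Intraclass correlation of a latent factor.

    Assumes the conventional form
        ICC = var_between / (var_between + var_within),
    with both variances taken from a model whose factor loadings are
    constrained to equality across levels, so the factor scales match.
    """
    if var_between < 0 or var_within < 0:
        raise ValueError("factor variances must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total factor variance is zero")
    return var_between / total


# Hypothetical factor variances from a constrained two-level model.
icc = factor_icc(var_between=0.2, var_within=0.8)
print(round(icc, 3))
```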

Sample Size

ML estimation is notorious for requiring large sample sizes, and this is also true for MLV CFA ( Heck, 2001; Hox, 2002; Hox & Maas, 2001; Muthén, 1994; Yuan & Bentler, 2002, 2004; Byrne, 2012). As a rule, in multilevel modeling the sample size at the highest level is of primary importance, because higher-level sample sizes are smaller than lower-level sample sizes ( Hox, 2013). A minimum sample size of 60 at the highest level was recommended by Eliason (1993) when ML is used as the estimator ( Hox, 2013). However, Maas and Hox (2005) set this value to at least 100 groups, although for uncomplicated models even 50 groups may suffice. Although these recommendations were initially made for multilevel regression models, the findings on the accuracy of higher-level variance estimates also apply to SEM and CFA models, because multilevel SEM is likewise based on the within-group and between-group covariance matrices ( Hox, 2013; Hox, Maas, & Brinkhuis, 2010).

Finally, an unequal sample size at each level is not a problem for the estimation of the model, because unequal sample sizes are accommodated by FIML (MLR or MLM). However, the interpretation of model fit indicators must be made cautiously. In longitudinal models, missing values commonly attributed to missing occasions or panel dropout can also be handled easily. Van Buuren (2011) has elaborated on incomplete multilevel data ( Hox, 2013).

For in-depth analysis of the theoretical background and applied examples of MLV modeling, the reader is referred to the following resources: Bovaird (2007), Heck (2001), Hox (2002, 2010), Kaplan et al. (2009), Little et al. (2000), Reise and Duan (2003), Selig et al. (2008), Hoffman (2007), Byrne (2012), Heck and Thomas (2015) and Finch and Bolin (2017). Typically, when MLV CFA is carried out to establish construct validity, additional analyses are required ( Byrne, 2012) that are beyond the scope of this work. However, a detailed description of a multi-phased method of construct validation was provided by Kyriazos (2018), and applied examples were provided by Kyriazos, Stalikas, Prassa, and Yotsidi (2018a, 2018b), Kyriazos, Stalikas, Prassa, Yotsidi, Galanakis, and Pezirkianidis (2018) and Kyriazos, Stalikas, Prassa, Galanakis, Yotsidi, and Lakioti (2018). Some specialized applications of MLV modeling within the SEM framework are recommended by SEM experts ( Byrne, 2012). Specifically, for information on longitudinal analyses and/or latent growth curve modeling, Byrne refers readers to Chen, Kwok, Luo, and Willson (2010); Chou, Bentler, and Pentz (1998); Ecob and Der (2003); Hung (2010); Jo and Muthén (2003); Kwok, West, and Green (2007); MacCallum and Kim (2000); and Muthén, Khoo, Francis, and Boscardin (2003).

Multilevel modeling is an extremely complicated topic, and we can only skim the surface of these intriguing sets of methods ( Tabachnick & Fidell, 2013; Field, 2013). The need for multilevel modeling arose because data are sometimes collected from people or other units that are “nested” in some fashion under different higher-level research units ( Darlington & Hayes, 2017). These hierarchical models explicitly model lower and higher levels by taking into account the interdependence of individuals within each sample group. In MLV CFA analysis, biases in parameter estimates, standard errors, and tests of model fit emerge if the hierarchical structure is ignored. In particular, if the nonindependence of the observations is not taken into account, the standard errors of parameter estimates may be underestimated, resulting in positively biased statistical significance testing ( Brown, 2015).

The MLV CFA procedure assumes that latent factors contain between- and within-group elements. The between-group element is typically the general part and the within-group element is the specific part of the model. If each factor loading is constrained to be invariant with its counterpart at the other level, the factor scales are equated across levels, thus enabling the direct comparison of factor variances across levels ( Mehta & Neale, 2005; Heck & Thomas, 2015; Brown, 2015; Finch & Bolin, 2017).

Alternatively, clustered data can be evaluated with a different method, the hierarchical factor model ( Bauer, 2003; Curran, 2003; Harnqvist, Gustafsson, Muthén, & Nelson, 1994; Mehta & Neale, 2005; cited in Heck & Thomas, 2015). In this procedure, the assumption of invariant factor loadings across levels is examined, and the assumption of zero variability of the observed indicators at the cluster level is also evaluated ( Mehta & Neale, 2005; Heck & Thomas, 2015). If these two assumptions hold, a hierarchical factor model emerges, where latent variables at the individual level define the latent factor at the higher level ( Mehta & Neale, 2005; Heck & Thomas, 2015).

During the estimation of multilevel CFA models, problems frequently arise. Specifically, due to the smaller sample size or the necessity of a more parsimonious structure, the between-group structure may be more prone to errors during model estimation. Researchers also debate whether missing data are a problem ( Heck & Thomas, 2015) or not ( Hox, 2013). The MLV CFA is carried out in several steps. If a problem occurs, it may be necessary to define starting values for the model to help the software iterate to a solution ( Heck & Thomas, 2015). Alternatively, it could help to specify the model progressively, e.g. by defining one factor at a time within the multilevel model, which can reveal exactly where the problem lies. In any case, patience is a valuable strength of character when carrying out a multilevel CFA ( Heck & Thomas, 2015)!

The author declares no conflicts of interest regarding the publication of this paper.

Kyriazos, T. A. (2019). Applied Psychometrics: The Modeling Possibilities of Multilevel Confirmatory Factor Analysis (MLV CFA). Psychology, 10, 777-798. https://doi.org/10.4236/psych.2019.106051