Is the AAQ-II that bad?

Background and objectives

The Acceptance and Action Questionnaire-II (AAQ-II; Bond et al., 2011) has been heavily criticized in recent literature based on factor-analytic studies that question its discriminant validity. Several studies have suggested that the AAQ-II may primarily measure rumination about negative emotions rather than genuine psychological flexibility. These criticisms have prompted substantial debates about whether the AAQ-II should be reinterpreted or discontinued in scientific research and clinical practice.

The primary objective of this study was to examine the discriminant validity of the Spanish-language version of the AAQ-II through novel analytical approaches, comparing its psychometric properties with measures of negative emotional symptoms. The researchers sought to clarify whether the AAQ-II measures a unique and independent construct or whether it overlaps substantially with measures of depression, anxiety, and stress.

Method

Participants

The study included three different samples:

Study 1:

Sample 1a (general): N = 2,398 participants (age M = 27.96, SD = 9.83; 70% identified as female). There were 42 missing values (2.40% of the dataset).
Sample 1b (treatment-seeking): N = 358 participants (age M = 27.54, SD = 9.47; 75.1% identified as female). There were 25 missing values (2.51% of the dataset).
Recruitment: Participants responded to questionnaires administered online via social media and institutional channels. All provided informed consent prior to participation. Participants received no compensation.

Study 2:

Sample 2: N = 444 participants (age M = 27.54, SD = 9.47; 66.7% identified as female). Recruited at a university clinical psychology center. Two participants did not complete the acceptance/rejection eligibility question.

Instrument Evaluated

Acceptance and Action Questionnaire-II (AAQ-II): 7-item instrument (Bond et al., 2011). Spanish version by Ruiz et al. (2013, 2016). Uses 7-point Likert scale (never = 1; always = 7). The Spanish version has shown excellent internal consistency and unidimensional factor structure in Colombian samples (Ruiz et al., 2013, 2016). The AAQ-II was originally designed to measure experiential avoidance, though its final version was presented as a measure of psychological flexibility. Items measure painful experiences, difficulty living a valued life, inability to control worries, and other aspects related to inflexibility and avoidance.

Other Measurement Instruments

Depression Anxiety Stress Scales-21 (DASS-21): 21-item instrument (Lovibond & Lovibond, 1995). Spanish version by Daza et al. (2002). 4-point Likert scale (0 = did not apply to me; 3 = applied to me most of the time). Measures three subscales: Depression (7 items), Anxiety (7 items), and Stress (7 items). In this study, the DASS-21 showed Cronbach's alphas of .92 for Depression, .90 for Anxiety, and .90 for Stress.

Big Five Inventory-2 Neuroticism Subscale (BFI-2): 12-item neuroticism (Negative Emotionality) subscale of the BFI-2, a measure of the Big Five personality model (Soto & John, 2017). Spanish version by Gallardo-Pujol et al. (2022). 5-point response scale. Measures neuroticism traits, including the tendency to experience negative emotions.
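
The Cronbach's alphas reported for the DASS-21 subscales follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below is only an illustration of that formula; the function name `cronbach_alpha` and the toy responses are hypothetical and are not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of per-item score lists; position i in every list
    belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")

    # Total score per respondent (sum across items).
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to three 0-3 items (not study data).
toy_items = [
    [0, 1, 2, 3, 1],
    [0, 1, 3, 3, 2],
    [1, 1, 2, 3, 1],
]
toy_alpha = cronbach_alpha(toy_items)
print(round(toy_alpha, 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a homogeneous 7-item subscale can reach values around .90.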

Data Analysis

Exploratory Graph Analysis (EGA):

EGA was employed as the primary method to analyze the AAQ-II's discriminant validity. EGA is a network approach that does not rely on the latent variable model of traditional factor analytic techniques.
EGA estimates a network of regularized partial correlations among items (GLASSO; Friedman et al., 2008) and then applies a community detection algorithm to identify clusters of items (communities), which are interpreted as dimensions.
EGAnet package version 1.2.3 was used (Golino & Christensen, 2023).
The number of dimensions was estimated using bootstrap EGA (bootEGA) with 500 resamples.
Communities were detected with the Louvain algorithm, and dimension stability was evaluated across the bootstrap resamples.
Fit indices were calculated: RMSEA (root mean square error of approximation), CFI (comparative fit index), TLI (Tucker-Lewis index), SRMR (standardized root mean square residual).
Values of RMSEA ≤ .08, CFI and TLI > .90, and SRMR ≤ .08 were considered acceptable.
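
As a quick way to apply these cutoffs, the following sketch checks a set of fit indices against the thresholds stated above. The helper name `check_fit` is hypothetical, and the two example dictionaries simply copy fit values reported later in this summary; the cutoffs are conventions rather than strict decision rules.

```python
# Cutoffs as stated in this summary (conventions, not strict rules).
CUTOFFS = {
    "RMSEA": lambda v: v <= 0.08,
    "CFI": lambda v: v > 0.90,
    "TLI": lambda v: v > 0.90,
    "SRMR": lambda v: v <= 0.08,
}


def check_fit(indices):
    """Return, per index, whether the value meets its cutoff."""
    return {name: CUTOFFS[name](value) for name, value in indices.items()}


# Values copied from the results reported in this summary.
ega_study1 = {"RMSEA": 0.059, "CFI": 0.97, "TLI": 0.97, "SRMR": 0.031}
efa_study2 = {"RMSEA": 0.063, "CFI": 0.89, "TLI": 0.87, "SRMR": 0.037}

print(check_fit(ega_study1))
print(check_fit(efa_study2))
```

Checking each index separately makes it visible when a model passes the residual-based indices (RMSEA, SRMR) but not the incremental ones (CFI, TLI).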

Exploratory Factor Analysis (EFA):

EFA was used as a comparative analysis to EGA. Principal axis factoring method was employed.
The number of factors to extract was guided by parallel analysis, together with visual inspection of the data and researcher judgment.
Promax rotation was used.

Multiple Regression Models:

Multiple regression analyses were used to examine the incremental validity of the AAQ-II. In these analyses, DASS-Total was used as the criterion variable, and the AAQ-II was included as a predictor.
Analyses examined predictions of DASS-Depression, DASS-Anxiety, and DASS-Stress separately.
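
To make the logic of incremental validity concrete, the sketch below computes standardized regression coefficients and the gain in R² for a two-predictor model directly from correlations. It assumes exactly two predictors (for example, an AAQ-II score and a neuroticism score) and uses made-up correlations; the function names are hypothetical, and this is not the authors' analysis code.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized betas and R^2 for y regressed on two predictors.

    r_y1, r_y2: correlations of the criterion with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


def incremental_r2(r_y1, r_y2, r_12):
    """R^2 gained by adding predictor 1 to a model that has only predictor 2."""
    _, _, r2_full = standardized_betas(r_y1, r_y2, r_12)
    return r2_full - r_y2 ** 2


# Hypothetical correlations, chosen only for illustration (not study values).
beta_1, beta_2, r2 = standardized_betas(r_y1=0.6, r_y2=0.5, r_12=0.5)
gain = incremental_r2(r_y1=0.6, r_y2=0.5, r_12=0.5)
print(round(beta_1, 3), round(beta_2, 3), round(r2, 3), round(gain, 3))
```

A predictor shows incremental validity when its beta stays clearly non-zero and R² increases after the other predictor is already in the model.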

Results

Study 1

Exploratory Graph Analysis (EGA):

The bootEGA estimation using the EBIC GLASSO method identified a four-community/dimension solution in both samples.
Sample 1a (general, N = 2,398): EGA identified four dimensions. AAQ-II items consistently loaded on a single dimension, and DASS-21 items formed two theoretically interpretable dimensions:
- First dimension: Anxiety and Stress (DASS-21 items)
- Second dimension: Depression (DASS-21 items)
- Third dimension: AAQ-II (all items)
AAQ-II items showed network loadings of virtually zero (.00 to .02) on the Anxiety/Stress dimension and somewhat stronger associations with the Depression dimension.
Sample 1b (treatment-seeking, N = 358): Results were similar. EGA identified four dimensions, and the AAQ-II items again formed their own dimension, with slightly higher network loadings on the DASS dimensions than in the general sample (.01 to .03). The AAQ-II dimension showed a weak but present relationship with Depression.
EGA fit indices for Sample 1 (combined): RMSEA = .059, 90% CI [.057, .061], CFI = .97, TLI = .97, SRMR = .031.

Exploratory Factor Analysis (EFA):

Parallel analysis suggested extracting two factors in Sample 1a, which explained 82.2% of the variance. The two main factors were:
- Factor 1: Anxiety and Stress (loadings between .50 and .88; DASS-21 items)
- Factor 2: Depression (loadings between .72 and .89; included DASS-21 Depression items)
AAQ-II items showed high loadings on their own factor (between .72 and .89). Cross-loadings were negligible.
The two-factor solution showed adequate fit indices: RMSEA = .070, 90% CI [.068, .072], CFI = .91, TLI = .90, SRMR = .033.
Sample 1b: Similar analysis was conducted. EFA suggested a three-factor model (Factor 1: Anxiety/Stress = 40.7% of variance; Factor 2: Depression = 19.9% of variance; Factor 3: AAQ-II = loadings between .38 and .74). The factors showed strong correlations (Factor 1 and 2: r = .74; Factor 1 and 3: r = .68; Factor 2 and 3: r = .68).

Incremental Validity Analysis:

Multiple regression analyses indicated that AAQ-II significantly predicted DASS-Total (β = -.43, p < .001) and its subscales:
- DASS-Depression: β = -.40, p < .001
- DASS-Anxiety: β = -.43, p < .001
- DASS-Stress: β = -.36, p < .001
Coefficients were consistently higher for the AAQ-II in all analyses except for DASS-Stress prediction.

Study 2

Exploratory Graph Analysis (EGA):

Sample 2 (N = 444): EGA using bootEGA with EBIC Glasso estimation identified a four-community solution.
- First dimension: AAQ-II + DASS-21 Anxiety (items loaded together)
- Second dimension: DASS-21 Depression
- Third dimension: Neuroticism (BFI-2)
- Fourth dimension: AAQ-II + DASS-21 Stress (in some cases)
BootEGA analyses showed that the number of dimensions and their spatial organization were reproduced in 52.4% of the 500 bootstrap resamples. A four-dimension solution was obtained in 47% of the resamples, with a five-dimension solution as the main alternative.
Dimension stability: DASS-21 Depression and Anxiety showed dimension stability of D2 = .99. The dimension stability of the AAQ-II was .99. The DASS items showed dimension stability values of .99, .78, .56, and .46 across different dimensions.
Fit indices: RMSEA = .061, 90% CI [.056, .067], CFI = .96, TLI = .96, SRMR = .050.

Exploratory Factor Analysis (EFA):

Parallel analysis suggested extracting four factors in Sample 2. The first factor explained 37.8% of the variance (included DASS-21 with loadings between .47 and .89). However, item S7 showed a cross-loading of .33 with the fourth factor.
The second factor explained 5.21% of the variance and included all AAQ-II items with high loadings between .67 and .88 and negligible cross-loadings.
The third and fourth factors included BFI-2 Neuroticism items. The fourth factor explained an additional 3.9% of variance and included six BFI-2 items with loadings between .51 and .70.
The four-factor solution showed acceptable RMSEA and SRMR values, although CFI and TLI fell slightly below the .90 cutoff: RMSEA = .063, CFI = .89, TLI = .87, SRMR = .037.
The factors showed strong correlations:
- Factor 1 and Factor 2: r = .69
- Factor 1 and Factor 3: r = .54
- Factor 1 and Factor 4: r = .52
- Factor 2 and Factor 3: r = .51
- Factor 2 and Factor 4: r = .53
- Factor 3 and Factor 4: r = .42

Item Stability Analysis:

Dimension stability of all instruments was examined. Despite the lack of clear hierarchical equivalence across dimensions, the dimension stability of the AAQ-II was perfect (i.e., all items assigned to AAQ-II across all resampled samples).
DASS-21 Depression items showed adequate stability except D2. Similar results were found for Anxiety items (only A3 showed stability below .80). Stress items showed greater variability in stability, with all items except S6 showing stability values of .80 or higher.
In contrast to the AAQ-II, the dimension stability values obtained for the other dimensions identified in the EGA solution were considerably lower: .42, .56, .42, and .24.
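
The stability figures above can be read as simple proportions over bootstrap replicates. The sketch below illustrates two such proportions under a strong simplifying assumption: community labels are taken to be already aligned across replicates, which a real bootEGA analysis has to handle itself. The function names, item names, and replicate assignments are all hypothetical and are not taken from the EGAnet package or the study.

```python
def item_stability(reference, replicates):
    """Share of replicates in which each item lands in its reference dimension."""
    n = len(replicates)
    return {
        item: sum(rep[item] == dim for rep in replicates) / n
        for item, dim in reference.items()
    }


def structural_consistency(reference, replicates):
    """Share of replicates in which each dimension has exactly its reference items."""
    n = len(replicates)
    result = {}
    for dim in sorted(set(reference.values())):
        members = {item for item, d in reference.items() if d == dim}
        hits = sum(
            {item for item, d in rep.items() if d == dim} == members
            for rep in replicates
        )
        result[dim] = hits / n
    return result


# Hypothetical reference solution and four bootstrap replicates (not study data).
reference = {"AAQ1": "AAQ", "AAQ2": "AAQ", "S6": "Stress", "S7": "Stress"}
drifted = {"AAQ1": "AAQ", "AAQ2": "AAQ", "S6": "Anxiety", "S7": "Stress"}
replicates = [dict(reference), drifted, dict(reference), dict(drifted)]

print(item_stability(reference, replicates))
print(structural_consistency(reference, replicates))
```

In this toy case the AAQ items never leave their dimension, so both proportions equal 1.0 for them, whereas one drifting Stress item is enough to halve the consistency of the Stress dimension.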

Incremental Validity Analysis of the AAQ-II

Multiple regression analyses demonstrated that the AAQ-II significantly predicted DASS-Total scores, as well as individual DASS subscales (Depression, Anxiety, and Stress):
- DASS-Total: AAQ-II: β = -.43, p < .001; BFI-NE: β = -.37, 95% CI [.47, .74], p < .001
- DASS-Depression: AAQ-II: β = -.40, p < .001; BFI-NE: β = -.26, p < .001
- DASS-Anxiety: AAQ-II: β = -.43, p < .001; BFI-NE: β = -.40, p < .001
- DASS-Stress: AAQ-II: β = -.36, p < .001; BFI-NE: β = -.40, p < .001
Standardized coefficients were higher for the AAQ-II in all analyses except for DASS-Stress prediction.

Discussion and Conclusions

The results of the EGA and EFA analyses supported the discriminant validity of the AAQ-II when compared to the DASS-21 and BFI-2 Neuroticism measures. Overall, the EGA assigned the AAQ-II to its own exclusive community/dimension without significant network loadings with other factors. BootEGA indicated that AAQ-II items demonstrated perfect dimension stability.

The researchers concluded that:

Discriminant validity demonstrated: EGA and EFA analyses showed that AAQ-II items clustered in a single community/factor that was discriminable from DASS-21 and Neuroticism measures. Although the AAQ-II showed moderate to strong associations with negative emotion measures, these findings are consistent with ACT theory. Previous studies that found problems with AAQ-II discriminant validity may have used larger samples or procedures not explicitly designed to evaluate it.
Reinterpretation of use: The study recommends reinterpreting the AAQ-II as a measure of genuine psychological flexibility rather than a simple measure of experiential avoidance. AAQ-II items reflect a broader construct than mere experiential avoidance and include aspects of inability to live a valued life and acceptance of emotions.
ACT context: The findings suggest that the AAQ-II functions in ways that make theoretical sense in the context of Acceptance and Commitment Therapy. Psychological flexibility, as measured by the AAQ-II, correlates more strongly with depression symptoms than with anxiety or stress, a finding consistent with previous research that found medium to strong associations between flexibility and depressive symptoms.
Limitations and recommendations: The study acknowledges several limitations: (a) the studies were conducted with the Spanish version of the AAQ-II in Colombian samples; (b) the study did not analyze AAQ-II discriminant validity as a function of participant characteristics such as age, gender, or socioeconomic status; (c) although Study 1 included a treatment-seeking sample, it did not collect information on participants' psychological problems; (d) the study did not analyze whether the AAQ-II is discriminable from neuroticism in a treatment-seeking sample; and (e) given recent criticisms of AAQ-II discriminant validity, more systematic analyses are suggested for future research.
Theoretical implications: The current study's findings support the idea that the AAQ-II demonstrates adequate discriminant validity in relation to negative emotions and neuroticism. The study suggests that the problem with the AAQ-II may lie in how it has been used rather than in the instrument per se.

Relevance for Measurement in ACT/CBS

This study has direct relevance for professionals and researchers using the AAQ-II as a measure of psychological flexibility in ACT/CBS contexts:

Construct clarification: The study clarifies that the AAQ-II measures a unique and independent construct that, while correlating with negative emotional symptoms, is not identical to them. This has important implications for interpreting AAQ-II scores in clinical and research contexts.
Continued justification for use: Despite recent criticisms, the findings support continued use of the AAQ-II as a measure of psychological flexibility. Researchers can have greater confidence that the AAQ-II provides valid information about changes in psychological flexibility independent of changes in emotional symptoms.
Latin American context: By demonstrating solid psychometric properties in Colombian samples (both general population and treatment-seeking), the study validates the use of the Spanish version of the AAQ-II in Latin American contexts specifically.
Process vs. outcome measurement: The study suggests that the AAQ-II may be a more informative measure of the therapeutic process in ACT than a simplistic measure of emotional symptoms. Therapists can use the AAQ-II to assess changes in psychological flexibility as a mechanism of change distinct from symptom reduction.
Interpretation recommendations: For clinicians and researchers using the AAQ-II, the study recommends a nuanced interpretation that considers the AAQ-II as a valid measure of psychological flexibility while recognizing its moderate correlation with negative emotional symptoms as theoretically consistent and not as evidence of invalidity.
