General Health Questionnaire-12 validity in Colombia and factorial equivalence between clinical and nonclinical participants

Full reference: Ruiz, F. J., García-Beltrán, D. M., & Suárez-Falcón, J. C. (2017). General Health Questionnaire-12 validity in Colombia and factorial equivalence between clinical and nonclinical participants. Psychiatry Research, 256, 53-58. https://doi.org/10.1016/j.psychres.2017.06.020

Study type: Cross-cultural validation and measurement invariance analysis

Background and objectives

The General Health Questionnaire-12 (GHQ-12) is one of the most widely used mental health screening instruments globally for detecting emotional disorders in adult populations. Originally developed by Goldberg and Williams (1988) from a 60-item version, the GHQ-12 has gained broad acceptance due to its brevity and favorable psychometric properties. Although multiple Spanish translations exist with varying psychometric properties, evidence regarding its validity in Colombian contexts remains limited. A Colombian study by Campos-Arias (2007) found a Cronbach's alpha of .78, which is relatively low compared with studies in other Spanish-speaking populations that reported alphas between .86 and .90. Additionally, no previous study had evaluated the factorial equivalence of the GHQ-12 between clinical and nonclinical samples, which is critical for justifying score comparisons across these groups.

Aims and Design

Method

Participants

The study comprised three distinct samples:

Sample 1 (n=925): Undergraduate students from seven universities in Bogotá, aged 18-63 years (M=21.37, SD=3.83). Of these, 56% were Psychology majors and 66% were women. In total, 30% had received psychological or psychiatric treatment at some point, but only 5.4% were in active treatment, and 2.9% were taking psychotropic medication.

Sample 2 (n=372): Participants from the general Colombian population, recruited through an online survey distributed via the internet and institutional social media. Ages ranged from 18 to 89 years (M=26.65, SD=9.81), and 62% were women. Regarding education, 49.2% had completed primary or secondary education, 33.4% were students or professionals, and 16.4% had or were pursuing postgraduate degrees. In total, 40% reported having received psychological or psychiatric treatment, but only 7.5% were in active treatment, and 4.3% were taking psychotropic medication.

Sample 3 (n=344): Clinical patients evaluated at an institutional psychological consultation center (91%) or in private offices (9%) in Bogotá, aged 18-67 years (M=28.41, SD=11.23), of whom 67.7% were women. Emotional symptoms were the reason for consultation for 79.7% of patients, 9% presented with sexual disorders, and 11.3% presented with other problems. Only 7.1% were taking psychotropic medication.

Instrument(s) under study

The GHQ-12 is a 12-item scale with 4-point Likert response format containing 6 positively worded and 6 negatively worded items. Two scoring methods were used: the Likert method (0-1-2-3) and the GHQ method (0-0-1-1), with higher scores indicating greater psychological distress.
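The two scoring methods described above can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring script: the function name `score_ghq12` is hypothetical, and the sketch assumes each response has already been coded as a category index from 0 to 3 in which a higher index means more distress, for positively and negatively worded items alike.

```python
# Hypothetical helper: scores one respondent's GHQ-12 answers.
# Assumption: each response is a category index 0..3, already oriented
# so that a higher index means more distress.

LIKERT_WEIGHTS = (0, 1, 2, 3)  # Likert method: 0-1-2-3
GHQ_WEIGHTS = (0, 0, 1, 1)     # GHQ method: 0-0-1-1


def score_ghq12(responses, method="likert"):
    """Return the total score for 12 item responses."""
    if len(responses) != 12:
        raise ValueError("the GHQ-12 has exactly 12 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be a category index from 0 to 3")
    if method == "likert":
        weights = LIKERT_WEIGHTS
    elif method == "ghq":
        weights = GHQ_WEIGHTS
    else:
        raise ValueError("method must be 'likert' or 'ghq'")
    return sum(weights[r] for r in responses)


# Made-up responses for one respondent (illustration only).
example = [0, 1, 2, 3, 1, 1, 0, 2, 3, 1, 0, 2]
likert_total = score_ghq12(example, "likert")  # possible range 0..36
ghq_total = score_ghq12(example, "ghq")        # possible range 0..12
```

Under these assumptions the Likert total ranges from 0 to 36 and the GHQ total from 0 to 12, which is the scale on which the cutoff scores reported in the Results are expressed.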

Other outcome measures

Convergent validation instruments included the DASS-21 (Depression, Anxiety and Stress Scale-21) to assess emotional symptoms, the AAQ-II (Acceptance and Action Questionnaire-II) to measure experiential avoidance and psychological inflexibility, and the SWLS (Satisfaction with Life Scale) in Sample 1 to measure self-perceived well-being.

Procedure

Prior to the main administration, two pilot studies were conducted. First, the GHQ-12 was administered to 64 clinical psychology trainees to evaluate item comprehensibility, and no problems were reported. Second, 3 experts in emotional disorders rated the items' representativeness, comprehensibility, interpretation, and clarity; Aiken's V exceeded the .50 threshold for all items.
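For readers unfamiliar with Aiken's V, the coefficient for one item is the sum of the raters' scores above the lowest scale point, divided by the maximum possible sum. The sketch below shows that computation. The rating scale used by the experts is not stated in this summary, so the 1 to 5 scale and the three ratings are assumptions made purely for illustration, and the function name `aikens_v` is hypothetical.

```python
def aikens_v(ratings, lowest, highest):
    """Aiken's V for one item: sum(r - lowest) / (n * (highest - lowest))."""
    n = len(ratings)
    if n == 0 or highest <= lowest:
        raise ValueError("need at least one rating and highest > lowest")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("ratings must lie within the scale")
    return sum(r - lowest for r in ratings) / (n * (highest - lowest))


# Hypothetical ratings from 3 experts on an assumed 1-5 scale.
v_example = aikens_v([4, 5, 4], lowest=1, highest=5)
meets_threshold = v_example > 0.50  # the threshold named in the text
```

V ranges from 0 (all raters choose the lowest point) to 1 (all raters choose the highest point).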

Sample 1 responded to questionnaires in classrooms at the beginning of a regular class. Sample 2 completed an anonymous online survey (www.typeform.com) titled "Survey of Emotional Health in Colombia." Sample 3 responded during clinical assessment interviews at treatment initiation. All participants provided informed consent and received debriefing upon completion.

Data Analysis

LISREL 8.71 was used for confirmatory factor analysis (CFA) with robust DWLS estimation and polychoric correlations. One- and two-factor models were evaluated using the following goodness-of-fit indices: RMSEA, CFI, NNFI, ECVI, and SRMR. For the measurement invariance analysis (metric and scalar), nested models were compared according to the criteria of Cheung and Rensvold (2002) and Chen (2007). Cronbach's alpha coefficients with 95% confidence intervals, corrected item-total correlations, Pearson correlations, and receiver operating characteristic (ROC) curves were calculated. The remaining analyses were conducted in SPSS 20.

Results

Item Psychometric Quality

All items demonstrated good discrimination, with corrected item-total correlations ranging from .43 to .68 in Sample 1, .44 to .78 in Sample 2, and .55 to .74 in Sample 3. Cronbach's alpha for the GHQ-12 ranged from .88 (Sample 1) to .91 (Samples 2 and 3), with an overall alpha of .90 (95% CI: .89 to .91). Gender differences in GHQ-12 scores were statistically significant only in Sample 1, where women scored higher (t=-3.55, p<.001); no significant differences were found in Samples 2 and 3.
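The two item-level statistics reported here can be illustrated with a short sketch. It uses only the Python standard library and a tiny made-up data set of three items and five respondents; it does not reproduce the study's values, which came from SPSS, and the function names are hypothetical.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, j):
    """Correlate item j with the total of the remaining items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Made-up scores (illustration only): 5 respondents, 3 items.
toy = [[0, 1, 0], [1, 1, 2], [2, 2, 1], [3, 2, 3], [1, 0, 1]]
alpha_toy = cronbach_alpha(toy)
r_item0 = corrected_item_total(toy, 0)
```

The correlation is called "corrected" because the item is removed from the total before correlating, so the item does not inflate its own coefficient.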

Factor Structure

The one-factor model demonstrated acceptable fit: S-Bχ²(54)=603.98, p<.01; CFI=.98, NNFI=.98, SRMR=.05, RMSEA=.079 (90% CI: .073 to .085). The two-factor model (positive and negative items on separate factors) showed better fit (RMSEA=.066, CFI=.99, NNFI=.98), but the interfactor correlation was .90, suggesting the fit improvement represented primarily a method effect. The one-factor model was selected for theoretical parsimony.
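As a rough plausibility check, the RMSEA point estimate can be recomputed from the reported chi-square and degrees of freedom. The sketch below uses the common single-group formula, sqrt(max(chi-square - df, 0) / (df * (N - 1))), and assumes the model was fitted to the three samples pooled together (N = 925 + 372 + 344). Both the formula variant and the pooled N are assumptions; LISREL's robust computation may differ in detail.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA using the common single-group formula."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Assumption: the one-factor model was fitted to the pooled samples.
pooled_n = 925 + 372 + 344

# Chi-square and degrees of freedom as reported for the one-factor model.
rmsea_one_factor = rmsea(603.98, 54, pooled_n)
rounded = round(rmsea_one_factor, 3)
```

Under these assumptions the rounded value agrees with the reported RMSEA of .079.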

Measurement Invariance

Both metric and scalar invariance were demonstrated across samples (clinical versus nonclinical) and genders. Changes in RMSEA were below .01, and decreases in CFI and NNFI were smaller than .01 (that is, the differences stayed above -.01), meeting the invariance criteria. This indicates that the GHQ-12 measures the same construct across clinical and nonclinical participants and across men and women, which supports score comparisons between groups.
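The decision rule described above amounts to comparing each more constrained model with the less constrained one it is nested in. A minimal sketch follows, with thresholds taken from this summary. The fit values are hypothetical placeholders, since the summary does not report the indices of the individual invariance models, and the function name is likewise hypothetical.

```python
def supports_invariance(less_constrained, more_constrained, tolerance=0.01):
    """Apply the change-in-fit rule described in the text.

    Each argument is a dict with keys 'cfi', 'nnfi', and 'rmsea'.
    Invariance is supported when CFI and NNFI drop by less than the
    tolerance and RMSEA rises by less than the tolerance.
    """
    d_cfi = more_constrained["cfi"] - less_constrained["cfi"]
    d_nnfi = more_constrained["nnfi"] - less_constrained["nnfi"]
    d_rmsea = more_constrained["rmsea"] - less_constrained["rmsea"]
    return d_cfi > -tolerance and d_nnfi > -tolerance and d_rmsea < tolerance


# Hypothetical fit indices, for illustration only (not from the study).
configural = {"cfi": 0.980, "nnfi": 0.980, "rmsea": 0.080}
metric = {"cfi": 0.978, "nnfi": 0.979, "rmsea": 0.078}
scalar = {"cfi": 0.975, "nnfi": 0.977, "rmsea": 0.081}

metric_ok = supports_invariance(configural, metric)  # equal loadings
scalar_ok = supports_invariance(metric, scalar)      # plus equal thresholds/intercepts
```

Each step is tested against the previous one because the models are nested: the metric model adds equality constraints to the configural model, and the scalar model adds further constraints to the metric model.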

Convergent Validity

The GHQ-12 demonstrated theoretically coherent correlations with other variables. Strong positive correlations emerged between the GHQ-12 and emotional symptoms (DASS-21 Depression: r range .66 to .79; DASS-21 Anxiety: r range .47 to .62; DASS-21 Stress: r range .56 to .70) and between the GHQ-12 and experiential avoidance (AAQ-II: r range .57 to .75). A moderate negative correlation was observed between the GHQ-12 and life satisfaction (SWLS: r=-.44 in Sample 1, the only sample that completed this scale).

Criterion Validity

Mean GHQ-12 scores significantly differentiated between samples: clinical participants (M=16.54, SD=7.86) obtained significantly higher scores than Sample 1 (M=11.08, SD=6.37; t=-11.54, p<.001) and Sample 2 (M=11.87, SD=7.47; t=-8.13, p<.001).
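The group comparisons can be approximately recomputed from the summary statistics alone. The summary does not say which form of the t-test was used; the sketch below assumes the unequal-variance (Welch) form, and under that assumption the recomputed statistics land close to the reported values. The sample sizes are taken from the Participants section, and the function name is hypothetical.

```python
from math import sqrt


def welch_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t statistic for two independent groups without pooling variances."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Nonclinical sample minus clinical sample, so the sign is negative.
t_students_vs_clinical = welch_t(11.08, 6.37, 925, 16.54, 7.86, 344)
t_general_vs_clinical = welch_t(11.87, 7.47, 372, 16.54, 7.86, 344)
```

Small discrepancies in the second decimal are expected, because the means and standard deviations entered here are already rounded.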

ROC Curves and Cutoff Scores

ROC curves indicated that the GHQ-12 performed better than chance at identifying emotional disorders. The area under the curve was .80 (95% CI: .77 to .83) for the Likert method and .78 (95% CI: .75 to .81) for the GHQ method. A cutoff of 11/12 on the Likert method was optimal (sensitivity=.82, specificity=.63), while a cutoff of 2/3 on the GHQ method was adequate (sensitivity=.80, specificity=.64). These cutoff scores matched those found by Goldberg et al. (1997) in most countries.
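A cutoff written as 11/12 means that scores of 11 or lower are classified as negative and scores of 12 or higher as positive. The sketch below shows how sensitivity and specificity follow from such a threshold, using a handful of made-up scores. The function names and data are hypothetical, and the summary does not state which criterion the authors used to choose the optimal cutoff; Youden's J is shown only as one common summary of the trade-off.

```python
def sensitivity_specificity(scores, is_clinical, positive_from):
    """Classify a score as positive when it is >= positive_from.

    For a cutoff written as 11/12, pass positive_from=12.
    """
    pairs = list(zip(scores, is_clinical))
    true_pos = sum(1 for s, c in pairs if c and s >= positive_from)
    false_neg = sum(1 for s, c in pairs if c and s < positive_from)
    true_neg = sum(1 for s, c in pairs if not c and s < positive_from)
    false_pos = sum(1 for s, c in pairs if not c and s >= positive_from)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def youden_j(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Made-up Likert totals (illustration only): 5 nonclinical, 5 clinical.
toy_scores = [5, 9, 12, 14, 8, 20, 13, 18, 25, 11]
toy_clinical = [False] * 5 + [True] * 5
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_clinical, 12)

# Youden's J for the sensitivity/specificity pairs reported in the text.
j_likert = youden_j(0.82, 0.63)
j_ghq = youden_j(0.80, 0.64)
```

As the authors note among their limitations, the criterion in the study was membership in the clinical versus nonclinical samples rather than a diagnostic gold standard, which is why the label in this sketch is `is_clinical`.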

Discussion and conclusions

The Spanish version of the GHQ-12 by Rocha et al. (2011) constitutes a valid mental health screening instrument for Colombia. It demonstrates excellent internal consistency, a coherent unidimensional factor structure, measurement invariance between clinical and nonclinical samples (which justifies score comparisons), convergent validity with measures of emotional symptoms and experiential avoidance, and adequate criterion validity. The authors acknowledged several limitations. First, systematic diagnostic information was not collected for clinical participants. Second, some aspects of validity, such as sensitivity to treatment change, were not evaluated. Third, women outnumbered men in the samples, although the measurement invariance found across genders mitigates this concern. Finally, the ROC curves used clinical versus nonclinical sample membership as the criterion rather than a gold standard.

Significance and contribution

This study contributes to cross-cultural validation of the GHQ-12 by providing solid evidence of its validity as a mental health screening instrument in Colombian contexts. By demonstrating measurement invariance between clinical and nonclinical samples and providing empirical cutoff scores specific to the region (11/12 for Likert method, 2/3 for GHQ method), the work extends the applicability of the instrument for screening and clinical assessment in Spanish-speaking populations. Findings regarding unidimensional factor structure and convergent validity with emotional symptoms suggest the GHQ-12 is a transculturally relevant measure for clinical evaluation.