Choosing the Right Acoustic Measure: A Decision Guide for Voice Clinicians
🎯 Key Takeaways
- Start with the clinical question—the right measure depends on what you need to know, not on what your software can compute
- For overall dysphonia: CPP/CPPS and AVQI are the most robust single and multiparametric options
- For breathiness: ABI or HNR, depending on whether you want a validated composite or a single parameter
- For gender-affirming work: F0 plus formants—pitch alone explains less than half of perceived gender
- For telehealth: CPP over HNR/shimmer, because cepstral measures survive mobile recording conditions while shimmer and HNR largely do not
Modern voice analysis software can compute dozens of acoustic parameters in seconds. PhonaLab reports more than twenty. But the fact that we can measure something does not mean every measurement answers a useful question for every patient. Running all parameters on every recording is not thoroughness—it is noise.
The better question is: what do I need to know about this voice, and which measure is designed to tell me? Every acoustic parameter asks an implicit question. Jitter asks how stable the pitch period is from cycle to cycle. CPP asks how well-defined the harmonic structure is. HNR asks how much of the signal is periodic versus noise. Formants ask where the vocal tract resonates. When the clinical question matches the parameter's question, the measure is informative. When it does not, the number is just a number.
This guide is a practical decision framework. It organizes acoustic measures by the clinical question they are designed to answer, shows which are currently best supported by evidence, and flags the contexts in which each one fails.
Start With the Clinical Question
Before running any analysis, it helps to decide which of a small number of clinical questions you are actually trying to answer. Most voice assessments reduce to one or more of the following:
Five Common Clinical Questions
- 1. Is this voice dysphonic overall, and how severely? — screening, triage, tracking global change.
- 2. Which perceptual dimension is driving the disorder? — breathiness, roughness, strain.
- 3. Is the voice gender-congruent for this speaker's goals? — gender-affirming therapy.
- 4. Are resonance or articulation affected? — vocal tract involvement, dysarthria, resonant voice therapy.
- 5. Can I trust a remote or smartphone recording? — telehealth, asynchronous monitoring.
Each of these questions has a short list of measures that work and a shorter list that should be preferred. The rest of this guide walks through them in turn.
1. Overall Dysphonia Severity
This is the most common clinical question: how disordered is this voice, independent of what specifically is wrong with it? You want a measure that correlates with the overall grade (G) in GRBAS, the overall severity rating in CAPE-V, or the global perceptual impression of dysphonia.
Recommended Measures
- CPP / CPPS — the single best-supported acoustic predictor of overall dysphonia. Works on both sustained vowels and connected speech. Robust across severity levels and recording conditions. ASHA's 2018 expert consensus recommends CPP as a primary acoustic measure.
- AVQI (Acoustic Voice Quality Index) — a validated multiparametric composite that combines six parameters: CPPS, HNR, shimmer (local, in % and in dB), and the slope and tilt of the long-term average spectrum. Outperforms any single measure for severity prediction and has published cutoffs across several languages.
Avoid as a Primary Severity Measure
- Jitter and shimmer alone — historically central, now known to fail for severe dysphonia because they require reliable pitch period detection. Still useful as components of AVQI, but not as standalone severity indices.
- HNR alone — correlates more with breathiness than with overall severity, and breaks down on highly aperiodic voices.
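The dependence on pitch period detection is easy to see from how local jitter and shimmer are defined. The sketch below is a minimal illustration, not PhonaLab's or Praat's implementation: the function name `local_perturbation` and the sample periods are made up for this example. It assumes the cycles have already been detected, which is exactly the step that becomes unreliable on severely aperiodic voices.

```python
from statistics import mean

def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Applied to cycle periods this gives jitter (local); applied to cycle peak
    amplitudes it gives shimmer (local). The result is a percentage.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)

# Illustrative (made-up) cycle periods in seconds, close to 200 Hz.
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]
jitter_pct = local_perturbation(periods)
```

If the period detector skips or doubles a cycle, the input list itself is wrong, and the resulting percentage no longer describes the voice.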
Practical Default
For most adult voice patients, report AVQI as the composite severity score and CPPS as the single-parameter anchor. If they disagree—a common situation with very short samples or unusual voices—the discrepancy itself is clinically informative and worth inspecting visually on a spectrogram.
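One way to make the agreement check explicit is sketched below. The function name and return strings are hypothetical, and the cutoffs are deliberately left as required arguments: they must come from norms that match your language, software version, and recording protocol. The only facts built in are directional: AVQI rises with dysphonia severity, while CPPS falls.

```python
def severity_agreement(avqi, cpps, avqi_cutoff, cpps_cutoff):
    """Compare what AVQI and CPPS each indicate relative to a cutoff.

    AVQI above its cutoff and CPPS below its cutoff both point toward
    dysphonia. The cutoffs are supplied by the caller, not assumed here.
    """
    avqi_flags = avqi > avqi_cutoff
    cpps_flags = cpps < cpps_cutoff
    if avqi_flags == cpps_flags:
        return "agree: dysphonic" if avqi_flags else "agree: within normal limits"
    return "disagree: inspect spectrogram and sample length"

# Placeholder cutoffs for illustration only; substitute published norms.
result = severity_agreement(avqi=4.0, cpps=16.0, avqi_cutoff=2.5, cpps_cutoff=14.0)
```

Returning a label rather than a boolean keeps the disagreement case visible in the report instead of silently resolving it in favor of one measure.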
2. Which Perceptual Dimension Is Driving the Disorder?
Once you know a voice is dysphonic, the next question is usually what kind of dysphonia. The three main perceptual dimensions beyond overall severity are breathiness, roughness, and strain. These dimensions behave differently acoustically and are not equally well-served by the measures currently in clinical use.
| Dimension | Best Composite | Useful Single Parameters | Status |
|---|---|---|---|
| Breathiness | ABI | HNR, CPP | Well supported |
| Roughness | No validated composite in routine use | Shimmer, jitter, GNE | Active research |
| Strain | No validated composite in routine use | HNR, F0, spectral tilt | Active research |
Breathiness
Breathiness is the best-served of the three dimensions. The Acoustic Breathiness Index (ABI) combines multiple parameters into a validated composite that tracks perceptual breathiness ratings closely. When a composite is not available, HNR is the strongest single-parameter proxy because incomplete glottal closure physically adds aperiodic noise to the signal, which is exactly what HNR measures. CPP also correlates with breathiness but is less specific.
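To make the noise-versus-periodic idea concrete, HNR is conventionally expressed in decibels as ten times the base-10 logarithm of periodic energy over noise energy. The helper below is a minimal sketch with a hypothetical name; it takes the periodic fraction of the signal energy as given and does not show how that fraction is estimated from audio.

```python
import math

def hnr_db(periodic_fraction):
    """HNR in dB from the fraction of signal energy that is periodic (0 < f < 1)."""
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("periodic_fraction must be strictly between 0 and 1")
    noise_fraction = 1.0 - periodic_fraction
    return 10.0 * math.log10(periodic_fraction / noise_fraction)

# 99% periodic energy is roughly 20 dB; equal periodic and noise energy is 0 dB.
clear_voice = hnr_db(0.99)
very_breathy = hnr_db(0.50)
```

Because the scale is logarithmic, a few percent more noise energy costs several decibels at the high end, which is one reason HNR responds strongly to added turbulence.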
Roughness
Roughness is harder. There is no widely adopted validated composite for roughness analogous to ABI, and clinicians typically combine shimmer and jitter with their perceptual judgment. Glottal-to-Noise Excitation ratio (GNE) has shown promise, and emerging work on dimension-specific multiparametric indices is beginning to offer better options. For now, interpret roughness-related single parameters cautiously and always in combination with perceptual assessment.
Strain
Strain is the least well-served dimension acoustically. It reflects effortful phonation (hyperfunction, compression, pressed voice quality) and has no routinely used validated composite. Elevated F0 with low HNR, rising spectral tilt, and changes in harmonic structure can all point toward strain, but the evidence base is thinner than for severity or breathiness. If strain is the primary clinical concern, perceptual assessment and laryngeal examination remain essential; acoustic measures are supportive at best.
Why Some Dimensions Are Harder to Measure
Breathiness has a clear physical mechanism—turbulent airflow through an incompletely closed glottis—that maps cleanly onto a noise-versus-harmonic distinction. Roughness and strain involve more complex combinations of irregular vibration, supraglottic constriction, and muscular tension, and no single acoustic feature captures them as directly. This is an active area of research, and tools like PhonaLab will continue to add dimension-specific indices as validated composites emerge.
3. Gender-Affirming Voice Work
Gender-affirming voice work asks a different question entirely: is this voice perceived as congruent with the speaker's gender goals? The measures that matter here overlap only partially with those used for dysphonia.
Recommended Measures
- F0 mean and range — the primary pitch target. A mean F0 around 180 Hz is a commonly cited threshold for feminine perception, but it is only part of the picture.
- Formant frequencies (F1, F2, F3) — resonance is what pitch alone cannot provide. Research consistently shows that F0 explains less than half of perceived gender; formants account for a substantial additional share.
- Intonation measures — F0 variability and contour shape matter for gender perception beyond the mean pitch value.
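F0 mean and range can be summarized as in the sketch below. It reports the range in semitones, a logarithmic unit that is often used because equal pitch intervals correspond to equal frequency ratios rather than equal Hz differences. The function name, the convention that unvoiced frames are coded as 0, and the sample track are all assumptions made for illustration.

```python
import math
from statistics import mean

def f0_summary(f0_hz):
    """Mean F0 in Hz and F0 range in semitones for a frame-wise F0 track."""
    voiced = [f for f in f0_hz if f > 0]  # assumes unvoiced frames are coded as 0
    if not voiced:
        raise ValueError("no voiced frames")
    lowest, highest = min(voiced), max(voiced)
    return {
        "mean_hz": mean(voiced),
        "range_semitones": 12.0 * math.log2(highest / lowest),
    }

# Made-up F0 track in Hz; zeros mark unvoiced frames.
summary = f0_summary([0, 160, 180, 200, 0, 240, 320])
```

In this made-up track the highest value is double the lowest, so the range is one octave, or 12 semitones.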
For a deeper dive into evidence-based targets, hormonal effects, and WPATH SOC-8 guidelines, see our dedicated guide on gender-affirming voice therapy and the companion piece on formant analysis.
4. Resonance and Articulation
When the clinical question involves the vocal tract rather than the source—resonant voice therapy, dysarthria, articulation changes after surgery or neurological insult—source-based measures like jitter and HNR are largely irrelevant. The relevant measures shift to formants and vowel space.
Recommended Measures
- Formants (F1–F3) — for resonance focus, tongue position, and vocal tract shape.
- Vowel Space Area (VSA) — a single composite that summarizes articulatory working space; reduced in many dysarthrias and a common outcome measure in rehabilitation.
- Spectral tilt and alpha ratio — useful adjuncts for detecting resonant voice quality changes, where energy redistributes across the spectrum.
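Vowel Space Area is usually computed as the area of the polygon whose vertices are the corner vowels plotted in the F1/F2 plane. The sketch below uses the shoelace formula for that polygon. The function name is hypothetical and the formant values are made up for illustration, not normative data.

```python
def vowel_space_area(corners):
    """Polygon area in Hz^2 from (F1, F2) corner-vowel points (shoelace formula).

    Points must be listed in order around the polygon, for example
    /i/, /a/, /u/ for a triangular vowel space.
    """
    points = list(corners)
    if len(points) < 3:
        raise ValueError("need at least three corner vowels")
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping back to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Made-up (F1, F2) values in Hz for /i/, /a/, /u/.
vsa = vowel_space_area([(300, 2300), (750, 1300), (350, 800)])
```

Because the area is in Hz squared, values are only comparable across sessions when the same vowels, the same formant settings, and ideally the same speaker are used.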
5. Telehealth and Smartphone Recordings
Not every parameter survives a consumer-grade microphone and a less-than-ideal room. When you cannot control the recording environment, some measures remain clinically useful and others become meaningless. Choosing the right subset is as important for remote work as choosing the right measure for the clinical question.
| Measure | Telehealth Reliability | Notes |
|---|---|---|
| F0 | High | Robust across recording devices |
| CPP / CPPS | High | Cepstral domain is relatively insensitive to microphone variation |
| Jitter | Moderate | Usable with reasonable recording conditions |
| Shimmer | Low | Highly sensitive to automatic gain control and microphone response |
| HNR | Low | Degrades quickly with background noise and non-flat mic response |
| Formants | Moderate | Generally usable if recording conditions are controlled |
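The table can be turned into a simple filter for remote sessions. The sketch below transcribes the ratings from the table and returns the measures that meet a chosen minimum reliability. The names `TELEHEALTH_RELIABILITY` and `reportable_measures` are hypothetical, and the "Moderate" default is an assumption rather than a recommendation from the source.

```python
# Reliability ratings transcribed from the table above.
TELEHEALTH_RELIABILITY = {
    "F0": "High",
    "CPP/CPPS": "High",
    "Jitter": "Moderate",
    "Shimmer": "Low",
    "HNR": "Low",
    "Formants": "Moderate",
}

_RANK = {"Low": 0, "Moderate": 1, "High": 2}

def reportable_measures(minimum="Moderate"):
    """Measures whose telehealth reliability is at least `minimum`."""
    threshold = _RANK[minimum]
    return [
        measure
        for measure, rating in TELEHEALTH_RELIABILITY.items()
        if _RANK[rating] >= threshold
    ]

strict = reportable_measures("High")
```

Keeping the ratings in one place means the reporting rule changes in a single line if recording conditions improve or new evidence shifts a rating.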
For more detail on recording protocols that make smartphone assessment clinically valid, see our guide on smartphone voice recording for clinical assessment.
Putting It Together: A Minimal Default Protocol
For a general adult voice patient, a compact and defensible acoustic protocol looks like this:
- 1. AVQI as the primary severity composite.
- 2. CPPS as the single-parameter anchor and a robust fallback when recording quality is imperfect.
- 3. ABI if breathiness is part of the clinical picture.
- 4. F0 mean and SD for pitch characterization, extended to formants when gender or resonance is a target.
- 5. Spectrogram inspection alongside the numbers, especially when measures disagree.
Add more parameters only when a specific clinical question requires them. Reporting every available number rarely helps communication with patients or referring physicians, and in some cases it introduces spurious significance where none exists.
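The default protocol can be written down as a small function so that extra measures are added only when a clinical flag calls for them. This is a sketch of the list above, not a PhonaLab feature: the function name and flag names are hypothetical.

```python
def default_protocol(breathiness=False, gender_or_resonance=False):
    """Return the measures to report for a general adult voice patient."""
    measures = ["AVQI", "CPPS"]
    if breathiness:
        measures.append("ABI")
    measures.append("F0 mean and SD")
    if gender_or_resonance:
        measures.append("Formants")
    measures.append("Spectrogram inspection")
    return measures

baseline = default_protocol()
breathy_patient = default_protocol(breathiness=True)
```

Each optional measure is tied to an explicit flag, which makes it harder for a report to grow by habit rather than by clinical question.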
Frequently Asked Questions
Q: If AVQI already includes CPPS, why report both?
AVQI is a weighted combination; it can be pushed up or down by any one of its components. Reporting CPPS separately lets you see whether the composite is being driven primarily by cepstral structure or by one of the other parameters—useful when tracking change over time.
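A per-component breakdown shows which parameter is moving the composite. The sketch below assumes the AVQI v03.01 regression equation as it is commonly reproduced in Praat scripts. Both the choice of version and the coefficients are assumptions here and should be verified against the published source and your software before use. The function name and input values are made up for illustration.

```python
# Coefficients as commonly reproduced for AVQI v03.01 (assumption: verify
# against the published source and your software version before use).
AVQI_INTERCEPT = 4.152
AVQI_SCALE = 2.8902
AVQI_WEIGHTS = {
    "cpps": -0.177,
    "hnr": -0.006,
    "shimmer_pct": -0.037,
    "shimmer_db": 0.941,
    "ltas_slope": 0.01,
    "ltas_tilt": 0.093,
}

def avqi_contributions(params):
    """Return (avqi, contributions) for a dict holding the six input parameters."""
    contributions = {
        name: AVQI_SCALE * weight * params[name]
        for name, weight in AVQI_WEIGHTS.items()
    }
    avqi = AVQI_SCALE * AVQI_INTERCEPT + sum(contributions.values())
    return avqi, contributions

# Made-up input values for illustration only.
example = {
    "cpps": 15.0, "hnr": 20.0, "shimmer_pct": 3.0,
    "shimmer_db": 0.3, "ltas_slope": -25.0, "ltas_tilt": -11.0,
}
avqi, parts = avqi_contributions(example)
```

Comparing `parts` between two sessions shows whether a change in AVQI came mainly from the CPPS term or from another component, which is the reason for reporting CPPS alongside the composite.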
Q: Is it wrong to still use jitter and shimmer?
Not wrong, just not sufficient on their own. They remain valuable as components of AVQI and as descriptors for mild-to-moderate dysphonia. What has changed is that they should no longer be the primary severity measures, because they fail precisely on the severe voices where objective measurement matters most.
Q: What if my patient only produces a short vowel sample?
CPPS is the most forgiving of sample length. AVQI and ABI require enough signal for all their components to be computed reliably. If the sample is very short, report CPPS and F0 and note the limitation rather than forcing a composite score.
Q: Should I interpret acoustic measures without perceptual assessment?
No. Acoustic measures are designed to complement perceptual ratings, not replace them. The strongest clinical interpretations come from agreement between acoustic, perceptual, and (where indicated) visual-laryngoscopic findings. Disagreement between sources is itself informative and worth investigating.
Bottom Line: A Measure Is a Question
- 1. Decide the clinical question first: severity, dimension, gender, resonance, or remote monitoring
- 2. Match the question to a measure whose design addresses it; not every parameter fits every question
- 3. Prefer validated composites (AVQI, ABI) when available; fall back to robust single parameters (CPPS) otherwise
- 4. Know which measures survive telehealth: F0 and CPP do; shimmer and HNR largely do not
- 5. Report fewer measures, interpreted well: a short, defensible protocol beats an exhaustive dump of every available number
📊 Build Your Protocol with PhonaLab
PhonaLab computes AVQI, ABI, CPPS, HNR, jitter, shimmer, formants, and more in a single analysis—so you can focus on choosing which measures to report, not on computing them. All algorithms are implemented via Praat (Parselmouth) for consistency with published research.
⚠️ Clinical Documentation Tool
The information in this article is provided for educational purposes and clinical documentation support. Acoustic measures are intended to supplement—not replace—comprehensive voice evaluation including perceptual assessment, patient history, and laryngoscopic examination when indicated. All clinical decisions should be made by qualified healthcare professionals. PhonaLab tools do not provide medical diagnoses.
References & Further Reading
- Patel RR, Awan SN, Barkmeier-Kraemer J, et al. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887–905.
- Maryn Y, Corthals P, Van Cauwenberge P, Roy N, De Bodt M. (2010). Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. Journal of Voice, 24(5), 540–555.
- Barsties B, Maryn Y. (2012). Der Acoustic Voice Quality Index in Deutsch [The Acoustic Voice Quality Index in German]. HNO, 60, 715–720.
- Barsties v. Latoszek B, Maryn Y, Gerrits E, De Bodt M. (2017). The Acoustic Breathiness Index (ABI): a multivariate acoustic model for breathiness. Journal of Voice, 31(4), 511.e11–511.e27.
- Awan SN, Roy N, Jetté ME, Meltzner GS, Hillman RE. (2010). Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: comparisons with auditory-perceptual judgements from the CAPE-V. Clinical Linguistics & Phonetics, 24(9), 742–758.
- Hillenbrand J, Houde RA. (1996). Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39(2), 311–321.
- Leemann A, Jeszenszky P, Steiner C, Studerus M, Messerli J. (2020). Linguistic fieldwork in a pandemic: Supervised data collection combining smartphone recordings and videoconferencing. Linguistics Vanguard, 6(s3), 20200061.