MPT and DSI: A Clinical Implementation Guide

May 9, 2026 · 15 min read · Jorge C. Lucero

🎯 Key Takeaways

  • MPT measures how efficiently air is converted into voice, not lung capacity: a short MPT can reflect poor glottal closure, weak respiratory support, hyperfunction, or a combination
  • Take three trials and use the longest—MPT shows learning effects across trials; the best of three is the standard
  • DSI combines four parameters—MPT, F0-high, I-low, and jitter—into a single score that ranges from +5 (normal) to –5 (severe dysphonia)
  • DSI is sensitive to therapy change—designed specifically to track treatment progress, with strong correlation to perceptual ratings and Voice Handicap Index
  • Cross-system DSI values are not directly comparable—jitter algorithms and pitch-tracking methods differ across software; use one system consistently within a patient

Maximum Phonation Time (MPT) is one of the oldest and simplest voice measurements in clinical use. A patient takes a deep breath and sustains the vowel /a/ as long as possible, and the clinician times it with a stopwatch. No software, no microphone, no analysis pipeline—just a reproducible behavioral measure that takes thirty seconds to administer.

The Dysphonia Severity Index (DSI), introduced by Wuyts and colleagues in 2000, builds on this tradition by combining MPT with three acoustic and phonatory measurements into a single weighted score. The result is one of the most widely used objective indices in voice clinics worldwide, valued specifically because it is sensitive to therapy progress and correlates with both perceptual ratings and patient-reported outcomes.

This guide covers what MPT actually measures (and what it does not), how to administer it correctly, the DSI formula and its components, normative values, and the practical limitations clinicians need to be aware of when interpreting these measures.

What MPT Actually Measures

MPT is sometimes described as a measure of respiratory function, sometimes as a measure of vocal fold efficiency. Both descriptions are partly correct and partly misleading.

"MPT measures how efficiently a patient converts available air into sustained phonation."

A short MPT can reflect any of three underlying problems—or some combination of them:

  • Glottal incompetence. Air leaks through an incompletely closed glottis (vocal fold paresis, presbyphonia, sulcus vocalis), so the patient burns through their breath faster than the vocal folds convert it into sound.
  • Reduced respiratory support. The patient simply has less air available to begin with (pulmonary disease, deconditioning, neuromuscular weakness).
  • Hyperfunction. Pressed phonation with high subglottal pressure can also shorten MPT, even when glottal closure is otherwise efficient.

Because MPT cannot distinguish among these mechanisms on its own, it is best interpreted alongside other measures—the S/Z ratio, perceptual evaluation, vital capacity if available, and laryngoscopic findings. As an isolated number it is suggestive; in a profile it becomes diagnostic.

The S/Z Ratio Companion

Pairing MPT with the S/Z ratio (sustained voiceless /s/ versus voiced /z/, longest of three trials each) helps separate respiratory from glottal contributions. A short MPT with an S/Z ratio close to 1.0 suggests a respiratory issue affecting both productions equally; a short MPT with an S/Z ratio above 1.4 suggests glottal incompetence affecting only the voiced production.
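The S/Z comparison can be sketched in a few lines of Python. This is only an illustration: the function names, the 10-second "short MPT" threshold, and the ±0.2 tolerance for "close to 1.0" are assumptions chosen for the example, not published cutoffs; only the 1.4 value comes from the text above.

```python
def sz_ratio(s_trials_s, z_trials_s):
    """Longest /s/ duration divided by longest /z/ duration (both in seconds)."""
    if not s_trials_s or not z_trials_s:
        raise ValueError("need at least one /s/ trial and one /z/ trial")
    longest_s = max(s_trials_s)
    longest_z = max(z_trials_s)
    if longest_s <= 0 or longest_z <= 0:
        raise ValueError("durations must be positive")
    return longest_s / longest_z


def sz_hint(mpt_s, ratio, short_mpt_s=10.0, near_one_tol=0.2, glottal_cutoff=1.4):
    """Return a screening hint, not a diagnosis.

    short_mpt_s and near_one_tol are illustrative assumptions.
    """
    if mpt_s >= short_mpt_s:
        return "MPT not short; pattern not applicable"
    if ratio > glottal_cutoff:
        return "short MPT, high S/Z: suggests glottal incompetence"
    if abs(ratio - 1.0) <= near_one_tol:
        return "short MPT, S/Z near 1: suggests respiratory limitation"
    return "short MPT, S/Z indeterminate"


# Hypothetical trial durations in seconds.
ratio = sz_ratio([14.2, 15.0, 13.8], [9.1, 10.0, 9.6])
print(ratio)                # 15.0 / 10.0 = 1.5
print(sz_hint(8.0, ratio))  # short MPT, high S/Z: suggests glottal incompetence
```

Ratios between the two patterns are deliberately reported as indeterminate rather than forced into one category.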

Recording Protocol

MPT looks simple, but small protocol differences produce surprisingly large variations in measured values. Standardizing the procedure within and across sessions matters more than the exact protocol chosen.

1. Position and demonstrate

Patient seated, upright, feet on the floor. The clinician demonstrates first—a deep inhalation followed by a sustained, comfortable /a/ at modal pitch and conversational loudness. The demonstration matters: patients who are not shown a model often produce strained, falsetto, or unusually loud productions that do not represent their habitual phonation.

2. Prompt and time three trials

"Take a deep breath in, then say /a/ for as long as you can." Time from voice onset to voice offset. Repeat three times with brief rest between trials. The longest of the three is the reported MPT—this is the convention adopted in most published normative studies and in the original DSI protocol.

3. Record what affects interpretation

Document the patient's age, sex, smoking status, current respiratory complaints, and whether they can sustain modal pitch throughout. A trial that drifts into vocal fry or breaks at the end is informative—note it rather than discarding it.

A Common Source of Error

A frequent timing error is to count the run-up to phonation rather than the phonation itself. Coach the patient to complete a deep inhalation and then begin the /a/, rather than starting the /a/ as the inhalation finishes. The clock starts at voice onset, not at breath onset.
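As a minimal sketch of the protocol above, the following Python records three trials timed from voice onset to voice offset and reports the longest. The `MptTrial` class and its field names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MptTrial:
    onset_s: float   # clock time at voice onset (not breath onset)
    offset_s: float  # clock time at voice offset
    note: str = ""   # e.g. "drifted into vocal fry"; keep the trial, note it

    def __post_init__(self):
        if self.offset_s <= self.onset_s:
            raise ValueError("voice offset must come after voice onset")

    @property
    def duration_s(self):
        return self.offset_s - self.onset_s


def reported_mpt(trials):
    """Return the longest of exactly three trials, in seconds."""
    if len(trials) != 3:
        raise ValueError("the protocol described here uses three trials")
    return max(t.duration_s for t in trials)


# Hypothetical session: durations are 12.5 s, 15.5 s, and 14.0 s.
trials = [
    MptTrial(2.0, 14.5),
    MptTrial(1.5, 17.0, note="vocal fry in final second"),
    MptTrial(2.0, 16.0),
]
print(reported_mpt(trials))  # 15.5
```

Keeping the note on each trial preserves the qualitative observations that the reported maximum would otherwise hide.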

Normative Values: What Counts as Normal

MPT varies substantially with sex, age, body size, and physical conditioning, so a single "normal value" is not very useful. The ranges below come from healthy adult reference samples in the published literature and are reasonable clinical anchors, not strict thresholds.

| Population | Typical MPT range | Source |
| --- | --- | --- |
| Adult males (18–45) | 25–35 s | Iowa protocols; multiple normative studies |
| Adult females (18–45) | 15–25 s | Iowa protocols; multiple normative studies |
| Healthy older males | ~20–25 s | Maslan et al. (2011) |
| Healthy older females | ~18–22 s | Maslan et al. (2011) |
| Children (school age) | 8–15 s (age-dependent) | Pediatric normative studies |

Two patterns from the literature deserve clinical attention:

  • MPT below ~10 seconds in an adult is clinically significant regardless of sex or age. Such values reliably indicate either glottal incompetence, severely reduced respiratory support, or both, and warrant fuller workup.
  • Healthy older adults preserve MPT better than expected. When normative samples exclude pulmonary disease, neurological conditions, and voice complaints, mean MPT in the seventh-to-ninth decade does not differ much from younger adults. A short MPT in an older patient is not "just aging"—it suggests a process worth investigating.

From MPT Alone to the Dysphonia Severity Index

MPT alone captures only one dimension of voice function. Wuyts and colleagues, working at the University of Antwerp, asked a more ambitious question: could a small set of objective measurements be combined into a single score that reliably predicts perceptual dysphonia severity?

Their 2000 study on 387 subjects identified four measurements—each capturing a different aspect of vocal function—that together explained perceptual severity better than any one measure alone. The result was the Dysphonia Severity Index, defined by the formula:

DSI = 0.13 × MPT + 0.0053 × F0-high − 0.26 × I-low − 1.18 × Jitter(%) + 12.4

MPT in seconds, F0-high in Hz, I-low in dB, Jitter in %
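The formula translates directly into code. The sketch below is a plain transcription of the weights quoted above; the function name and the two example profiles are invented for illustration and do not come from any dataset.

```python
def dsi(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """Dysphonia Severity Index from its four components.

    mpt_s:       maximum phonation time in seconds
    f0_high_hz:  highest fundamental frequency in Hz
    i_low_db:    lowest intensity in dB
    jitter_pct:  jitter in percent
    """
    return (0.13 * mpt_s
            + 0.0053 * f0_high_hz
            - 0.26 * i_low_db
            - 1.18 * jitter_pct
            + 12.4)


# Two hypothetical profiles.
print(round(dsi(18, 600, 52, 0.6), 2))  # about 3.69
print(round(dsi(7, 350, 62, 2.5), 2))   # about -3.9
```

Because the formula is a simple linear combination, the result depends entirely on how each input was measured, which is why the units on each argument matter.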

The DSI scale was designed so that +5 corresponds to a perceptually normal voice (GRBAS Grade 0) and –5 to severe dysphonia (Grade 3), with intermediate values reflecting graded severity. The more negative the DSI, the worse the perceived voice quality. However, the +5 anchor is a calibration point, not the population average, and because the formula is linear, values outside the +5/–5 range are possible. A meta-analysis of 14 studies and 1,330 healthy subjects found a mean DSI of 3.05 for perceptually normal voices, with a 95% confidence interval of 2.13–3.98 (Sobol & Sielska-Badurek, 2022). Many healthy voices therefore produce DSI values well below +5, and clinical interpretation bands should be anchored to the empirical distribution rather than to the +5 anchor.

What Each DSI Component Captures

The DSI is not arbitrary—each of its four components was selected because it indexes a different physiological aspect of voice production. Understanding what each measures helps interpret why a patient's DSI is what it is, beyond just the global score.

MPT (positive coefficient: 0.13)

Reflects glottal efficiency and respiratory-phonatory coordination. Longer MPT pushes the DSI up (better voice).

F0-high (positive coefficient: 0.0053)

The highest fundamental frequency the patient can produce. Reflects vocal range, cricothyroid function, and overall laryngeal flexibility. Loss of upper range is one of the earliest signs of many voice disorders, including presbyphonia and unilateral paresis. Patients are typically allowed to use glides or scales to reach their highest frequency, with the best of three trials recorded.

I-low (negative coefficient: –0.26)

The lowest intensity (in dB SPL) at which the patient can sustain phonation. Reflects fine motor control of the larynx and the ability to phonate at low subglottal pressure. Patients with hyperfunction or glottal incompetence often cannot produce soft voice cleanly. Note the negative coefficient: a lower I-low (softer minimum) increases the DSI, because it indicates better laryngeal control.

Jitter % (negative coefficient: –1.18)

Cycle-to-cycle frequency perturbation in a sustained vowel. Captures vocal fold vibratory irregularity. The large negative coefficient (–1.18) means jitter has substantial weight in the DSI: even a one-percentage-point increase in jitter shifts the DSI down by about 1.2 units.

Why These Four, Specifically

Wuyts et al. started with thirteen candidate measures and used proportional odds logistic regression to identify the smallest set that retained predictive power. The four that survived capture distinct dimensions: MPT (efficiency), F0-high (range), I-low (low-pressure control), and jitter (cycle stability). The DSI's parsimony is part of its clinical appeal—four numbers, each independently obtainable, combine into one interpretable score.
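To see how much each component moves the score, the weighted terms can be listed separately. The sketch below assumes nothing beyond the published coefficients; the dictionary keys and function names are hypothetical, and the example values are invented.

```python
DSI_WEIGHTS = {
    "mpt_s": 0.13,
    "f0_high_hz": 0.0053,
    "i_low_db": -0.26,
    "jitter_pct": -1.18,
}
DSI_CONSTANT = 12.4


def dsi_contributions(*, mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """Return each component's weighted term (in DSI units)."""
    measures = {
        "mpt_s": mpt_s,
        "f0_high_hz": f0_high_hz,
        "i_low_db": i_low_db,
        "jitter_pct": jitter_pct,
    }
    return {name: DSI_WEIGHTS[name] * value for name, value in measures.items()}


def dsi_from_contributions(contributions):
    """Sum the weighted terms and add the constant."""
    return sum(contributions.values()) + DSI_CONSTANT


terms = dsi_contributions(mpt_s=18, f0_high_hz=600, i_low_db=52, jitter_pct=0.6)
for name, term in terms.items():
    print(f"{name:>11}: {term:+.3f}")
print(f"        DSI: {dsi_from_contributions(terms):+.3f}")

# Trade-off implied by the weights: one percentage point of jitter costs as
# much as roughly nine seconds of MPT gains (1.18 / 0.13).
print(round(-DSI_WEIGHTS["jitter_pct"] / DSI_WEIGHTS["mpt_s"], 1))
```

Reading the terms side by side makes it easier to explain why two patients with similar scores may need different therapy targets.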

Interpreting DSI Scores

The DSI is most useful in two clinical scenarios.

1. Establishing baseline severity. The DSI value at first evaluation, contextualized against the +5/–5 scale, gives a reproducible numerical anchor for the perceptual impression. A DSI of +1.5, for example, quantifies "mildly impaired" in a way that two clinicians can compare across sessions and across cases.

2. Documenting therapy progress. This is what the DSI was specifically designed for. Because it integrates four parameters, it is more sensitive to incremental change than any single measure—improvement on one component (longer MPT) will show up in the DSI even if jitter and intensity remain stable. Wuyts and colleagues explicitly framed the DSI as a tool for tracking "therapeutic evolution," and subsequent studies have validated this use.

DSI Interpretation Bands

  • Within normal limits: DSI ≥ 2.0—consistent with the meta-analytic 95% CI lower bound for healthy voices (2.13; mean = 3.05)
  • Mild dysphonia: DSI 1.0 to 2.0—below the normal population range; approximately GRBAS Grade 1
  • Moderate dysphonia: DSI –1.0 to 1.0—approximately GRBAS Grade 2
  • Severe dysphonia: DSI < –1.0, extending toward the –5 anchor of the scale; approximately GRBAS Grade 3

Bands are based on Wuyts et al. (2000) and the meta-analytic normative mean of 3.05 for healthy voices (Sobol & Sielska-Badurek, 2022). These are approximate clinical guidance, not diagnostic cutoffs. Population-specific norms vary; institutional baselines are recommended.
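The bands above can be expressed as a small lookup. This sketch assumes each band includes its lower edge (so a DSI of exactly 1.0 counts as mild and exactly –1.0 as moderate); the list does not specify edge handling, so that choice and the function name are assumptions.

```python
def dsi_band(dsi_value):
    """Map a DSI value to the approximate band from this guide.

    Assumption: lower edges are inclusive. These are approximate clinical
    guidance bands, not diagnostic cutoffs.
    """
    if dsi_value >= 2.0:
        return "within normal limits"
    if dsi_value >= 1.0:
        return "mild"
    if dsi_value >= -1.0:
        return "moderate"
    return "severe"


for value in (3.05, 1.5, 0.0, -3.9):
    print(value, "->", dsi_band(value))
```

A clinic using institutional baselines would replace the three threshold values with its own.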

Known Limitations

The DSI is widely used, but it has well-documented limitations that affect how its values should be interpreted.

Cross-system values are not directly comparable

Aichinger and colleagues compared DSI values from two commercial systems (lingWAVES and DiVAS) on the same patients and found significant differences in F0-high and MPT, though the umbrella DSI score itself was similar. The original DSI was developed using the Voice Range Profile module of one specific system and the MDVP jitter algorithm. Other software combinations produce systematically different component values, and the DSI inherits these differences. Use a single measurement system consistently across sessions for the same patient.

Jitter is the weakest link

The –1.18 coefficient on jitter gives this single parameter substantial influence over the DSI. But jitter requires reliable F0 tracking, which fails in severely aperiodic voices. In Type 3 signals under Titze's voice signal typing (or Type 4 in later extensions of that scheme), jitter values may be unreliable or mathematically undefined, and the DSI inherits that instability.
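The dependence on period tracking is easy to see in code. The sketch below uses the common "local jitter" definition (mean absolute difference between consecutive periods, divided by the mean period); this is an assumption for illustration and is not guaranteed to match the MDVP algorithm or any other commercial implementation. It also assumes the glottal periods have already been extracted by an upstream pitch tracker.

```python
def local_jitter_percent(periods):
    """Local jitter in percent from consecutive glottal periods.

    periods: cycle durations in any consistent unit (e.g. milliseconds).
    Raises ValueError when the input cannot support the calculation, which
    is the situation a failed pitch tracker would produce.
    """
    if len(periods) < 2:
        raise ValueError("need at least two consecutive periods")
    if any(p <= 0 for p in periods):
        raise ValueError("periods must be positive")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period


# Hypothetical period sequences in milliseconds.
print(local_jitter_percent([5.0, 5.0, 5.0, 5.0]))               # 0.0
print(round(local_jitter_percent([10.0, 10.2, 10.0, 10.2]), 2))  # about 1.98
```

Every step depends on the period list being trustworthy; when the tracker skips or doubles cycles in an aperiodic voice, the output is a number but not a meaningful one.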

The +5/–5 anchors are population-relative

The DSI was developed on a Belgian sample, with specific demographic and clinical characteristics. Subsequent studies in other populations (Brazilian Portuguese, Korean, Turkish, others) have reported different cutoffs separating normal from dysphonic voices. Treat the +5/–5 boundaries as approximate, especially outside the original development population.

DSI tracks severity, not type

Two patients with the same DSI of +1.0 can have very different voice profiles—one with breathy glottal incompetence and short MPT, another with hyperfunction and elevated jitter. The single number compresses real clinical information. Use the DSI alongside the perceptual rating and the individual component values, not in place of them.

A Practical Workflow

For clinicians integrating MPT and DSI into routine voice assessment, a simple, reproducible workflow tends to work best:

  1. Demonstrate the task. Record three MPT trials and three S/Z trials. Use the longest of each.
  2. Record a sustained /a/ at habitual pitch and loudness for ~3 seconds for jitter analysis. Use the central 1 second of the cleanest of three trials.
  3. Capture F0-high using a glide or scale up. Capture I-low at habitual pitch with a sound level meter or calibrated software. Best of three for each.
  4. Compute the DSI and document the four component values, not just the score. Save to the patient record alongside the perceptual rating.
  5. At follow-up, repeat the same protocol with the same equipment. Compare the component values, not just the DSI score, to see which dimensions are responding to therapy.
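The follow-up comparison in the last step might look like the sketch below. The `VoiceSession` record, its `system` label, and the example values are hypothetical; the only fixed element is the DSI formula itself. The comparison refuses to run when the two sessions were measured on different systems, reflecting the advice to keep equipment constant.

```python
from dataclasses import dataclass

COMPONENTS = ("mpt_s", "f0_high_hz", "i_low_db", "jitter_pct")


@dataclass(frozen=True)
class VoiceSession:
    system: str        # label for the measurement software and hardware
    mpt_s: float
    f0_high_hz: float
    i_low_db: float
    jitter_pct: float

    def dsi(self):
        return (0.13 * self.mpt_s + 0.0053 * self.f0_high_hz
                - 0.26 * self.i_low_db - 1.18 * self.jitter_pct + 12.4)


def compare_sessions(baseline, follow_up):
    """Return follow-up minus baseline for each component and for the DSI."""
    if baseline.system != follow_up.system:
        raise ValueError("sessions were measured on different systems")
    deltas = {name: getattr(follow_up, name) - getattr(baseline, name)
              for name in COMPONENTS}
    deltas["dsi"] = follow_up.dsi() - baseline.dsi()
    return deltas


# Hypothetical patient, same system at both visits.
before = VoiceSession("system-A", 9.0, 420.0, 58.0, 1.4)
after = VoiceSession("system-A", 14.0, 430.0, 57.0, 1.3)
for name, change in compare_sessions(before, after).items():
    print(f"{name:>11}: {change:+.3f}")
```

In this invented case most of the DSI gain comes from the longer MPT, which is the kind of detail the score alone would not reveal.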

Summary

  1. MPT is a quick, reproducible behavioral measure of glottal-respiratory efficiency. Take three trials, use the longest, and standardize the protocol.
  2. MPT alone is suggestive, not diagnostic. Pair it with the S/Z ratio to separate respiratory from glottal contributions.
  3. The DSI integrates MPT, F0-high, I-low, and jitter into a single score on a +5 (normal) to –5 (severe) scale, designed specifically to track therapy progress.
  4. DSI values are system-dependent. Use one measurement system consistently within a patient; do not compare DSI values across software platforms without caution.
  5. Document the component values, not just the score. The DSI compresses four dimensions into one number; the components themselves carry clinical information that the score alone obscures.

📊 Compute MPT and DSI in PhonaLab

The PhonaLab MPT & DSI Calculator measures maximum phonation time, F0-high, I-low, and jitter from your recordings, then computes the DSI using the Wuyts formula. No installation required, and audio is processed in memory and never stored.


⚠️ Educational Information

This article presents voice assessment concepts and summarizes published research findings for educational purposes. It does not constitute clinical advice, diagnostic guidance, or treatment recommendations. Clinical decisions regarding voice assessment and intervention should be made by qualified, licensed healthcare professionals based on comprehensive evaluation of individual patients. PhonaLab provides acoustic measurement tools; it does not provide clinical interpretations or medical diagnoses.

References & Further Reading

  • Wuyts FL, De Bodt MS, Molenberghs G, Remacle M, Heylen L, Millet B, Van Lierde K, Raes J, Van de Heyning PH. (2000). The Dysphonia Severity Index: An objective measure of vocal quality based on a multiparameter approach. Journal of Speech, Language, and Hearing Research, 43(3), 796–809.
  • Maslan J, Leng X, Rees C, Blalock D, Butler SG. (2011). Maximum phonation time in healthy older adults. Journal of Voice, 25(6), 709–713.
  • Speyer R, Bogaardt HC, Passos VL, Roodenburg NP, Zumach A, Heijnen MA, Baijens LW, Fleskens SJ, Brunings JW. (2010). Maximum phonation time: Variability and reliability. Journal of Voice, 24(3), 281–284.
  • Aichinger P, Feichter F, Aichstill B, Bigenzahn W, Schneider-Stickler B. (2012). Inter-device reliability of DSI measurement. Logopedics Phoniatrics Vocology, 37(4), 167–173.
  • Hakkesteegt MM, Brocaar MP, Wieringa MH, Feenstra L. (2008). The relationship between perceptual evaluation and objective multiparametric evaluation of dysphonia severity. Journal of Voice, 22(2), 138–145.
  • Sobol M, Sielska-Badurek EM. (2022). The Dysphonia Severity Index (DSI)—Normative values. Systematic review and meta-analysis. Journal of Voice, 36(1), 143.e9–143.e13. doi:10.1016/j.jvoice.2020.04.010
  • Goy H, Fernandes DN, Pichora-Fuller MK, van Lieshout P. (2013). Normative voice data for younger and older adults. Journal of Voice, 27(5), 545–555.
  • Barsties B, De Bodt M. (2015). Assessment of voice quality: Current state-of-the-art. Auris Nasus Larynx, 42(3), 183–188.
  • Maryn Y, De Bodt M, Roy N. (2010). The Acoustic Voice Quality Index: Toward improved treatment outcomes assessment in voice disorders. Journal of Communication Disorders, 43(3), 161–174.