Spectrogram Reading for SLPs: A Visual Guide to Voice Quality
Key Takeaways
1. Narrowband spectrograms are essential for clinical voice assessment: they reveal harmonic structure, F0, subharmonics, and periodicity
2. Always classify voice signal type (1-4) before calculating acoustic measures, because the type determines which measures produce valid results
3. Type 1 signals allow all perturbation measures; Types 2-4 require CPP as the primary measure with perturbation data used cautiously or not at all
4. Interharmonic noise indicates breathiness; subharmonics indicate asymmetric vibration; wavy harmonics indicate instability
5. Spectrograms inform clinical hypotheses but don't diagnose; laryngeal visualization is always required for definitive diagnosis
6. Count harmonics, assess clarity, and check for subharmonics: this systematic approach improves reliability of visual analysis
7. Adjust window length for high-pitched voices, since standard settings may produce hybrid displays for women and children
The spectrogram is arguably the most powerful visual tool in voice assessment. Unlike single-number measures like jitter or CPP, a spectrogram displays all facets of vocal sound in a single picture—fundamental frequency, harmonics, noise, stability, voice breaks, and more. Yet many clinicians find spectrograms intimidating, uncertain how to translate those colorful bands into clinical insight.
This guide will demystify spectrogram interpretation for clinical voice assessment. You'll learn the critical difference between wideband and narrowband displays, how to classify voice signals using Titze's typing system (essential for knowing which acoustic measures are valid), and what specific visual patterns tell you about voice quality and pathology.
What a Spectrogram Actually Shows
A spectrogram is a three-dimensional representation of sound compressed into two dimensions. The horizontal axis represents time, the vertical axis represents frequency, and the darkness or color intensity represents amplitude (energy) at each time-frequency point.
Think of it as taking thousands of frequency snapshots of your audio and stacking them side by side. Dark regions indicate strong energy at that frequency and time; light or white regions indicate little energy.
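The "stacked snapshots" idea can be sketched in a few lines of Python. This is a minimal illustration, not how Praat or any clinical tool is implemented: it uses a plain (slow) discrete Fourier transform, a Hann window, and a synthetic two-component tone in place of a real voice recording. The function names, the 8000 Hz sample rate, and the 30 ms window are all assumptions chosen for the example.

```python
import cmath
import math


def hann(n):
    """Hann window: tapers each snapshot to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, sample_rate, window_s, hop_s):
    """Return one list of bin magnitudes per time frame (plain DFT, no FFT)."""
    n_win = round(window_s * sample_rate)
    n_hop = round(hop_s * sample_rate)
    window = hann(n_win)
    frames = []
    for start in range(0, len(signal) - n_win + 1, n_hop):
        chunk = [signal[start + i] * window[i] for i in range(n_win)]
        frame = []
        for k in range(n_win // 2 + 1):  # bins from 0 Hz up to Nyquist
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / n_win)
                      for i, x in enumerate(chunk))
            frame.append(abs(acc))
        frames.append(frame)
    return frames


# Synthetic "voice": 200 Hz fundamental plus a weaker second harmonic.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
          for n in range(800)]  # 0.1 s of audio

frames = spectrogram(signal, SAMPLE_RATE, window_s=0.03, hop_s=0.01)
bin_hz = SAMPLE_RATE / round(0.03 * SAMPLE_RATE)  # frequency step between bins
peak_hz = [frame.index(max(frame)) * bin_hz for frame in frames]
print(len(frames), "frames; strongest component per frame (Hz):", peak_hz)
```

Each entry of `frames` is one vertical slice of the spectrogram; plotting the magnitudes as darkness, slice by slice, gives the familiar picture.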
The Fundamental Trade-Off
Spectrograms face an inherent physics constraint: you can have excellent time resolution or excellent frequency resolution, but not both simultaneously. This is why we have two types of spectrograms, wideband and narrowband, each optimized for different clinical questions.
Wideband vs. Narrowband: Which to Use When
The distinction between wideband and narrowband spectrograms is the most important concept for clinical spectrogram reading. Each reveals different aspects of voice production.
| Feature | Wideband | Narrowband |
|---|---|---|
| Window length | Short (3-5 ms) | Long (30-50 ms) |
| Frequency bandwidth | Wide (~300 Hz) | Narrow (~20-45 Hz) |
| Time resolution | Excellent | Poor |
| Frequency resolution | Poor | Excellent |
| Shows clearly | Formants, vertical striations (glottal pulses) | Harmonics, F0, subharmonics |
| Primary clinical use | Articulation, formant transitions | Voice quality, signal typing |
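
The link between window length and frequency bandwidth in the table can be checked with a quick calculation. The sketch below assumes a Gaussian analysis window and uses the -3 dB relation described in the Praat manual (bandwidth ≈ 1.2982804 / window length); other window shapes and other tools use different factors, so treat the results as approximate. The function name is illustrative.

```python
def gaussian_bandwidth_hz(window_length_s):
    """Approximate -3 dB bandwidth (Hz) for a Gaussian analysis window.

    Assumption: the 1.2982804 / window-length relation for Gaussian windows.
    """
    return 1.2982804 / window_length_s


for label, window_s in [("wideband", 0.005),
                        ("narrowband", 0.03),
                        ("narrowband", 0.05)]:
    bandwidth = gaussian_bandwidth_hz(window_s)
    print(f"{label:10s} window {window_s * 1000:4.0f} ms -> ~{bandwidth:5.0f} Hz")
```

Under this assumption a 5 ms window gives a bandwidth of roughly 260 Hz, and 30-50 ms windows give roughly 26-43 Hz, in line with the ranges in the table.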
Wideband Spectrograms: See the Vocal Tract
Wideband spectrograms use short analysis windows (typically 4-5 ms in Praat). This provides excellent temporal resolution—you can see individual glottal pulses as vertical striations. The dark horizontal bands represent formants, the resonant frequencies of the vocal tract.
Use wideband for: Analyzing articulation, tracking formant transitions during connected speech, visualizing voice onset time, and seeing rapid temporal events.
Narrowband Spectrograms: See the Voice Source
Narrowband spectrograms use longer analysis windows (30-50 ms). This provides excellent frequency resolution—you can see individual harmonics as distinct horizontal lines. The spacing between harmonics equals F0 (fundamental frequency).
Use narrowband for: Voice quality assessment, signal typing, identifying subharmonics, detecting F0 instability, and evaluating periodicity.
Clinical Bottom Line
For clinical voice assessment, the narrowband spectrogram is preferred because it displays the characteristics of vocal fold vibration: F0, harmonics, noise, subharmonics, and periodicity. This is what you need for voice signal typing and determining which acoustic measures are valid for your patient.
Voice Signal Typing: The Critical First Step
Before you calculate jitter, shimmer, or any perturbation measure, you must determine your patient's voice signal type. This classification, originally proposed by Ingo Titze in the 1995 NCVS Workshop on Acoustic Voice Analysis and later expanded by Sprecher et al. in 2010, determines which acoustic measures will produce valid results.
This step is often missed in clinical practice—yet it's essential. Calculating jitter on a Type 3 voice is meaningless; the algorithm cannot reliably track pitch periods in an aperiodic signal.
Type 1: Nearly Periodic
Spectrogram appearance: Clear, well-defined horizontal lines (harmonics) that appear nearly straight. Minimal noise between harmonics.
Clinical meaning: Healthy or mildly disordered voice with regular vocal fold vibration.
✓ Valid measures: Jitter, shimmer, HNR, CPP, all perturbation measures
Type 2: Subharmonics/Modulations
Spectrogram appearance: Alternating dark and light horizontal lines between harmonics. Harmonics may appear undulated rather than straight. Subharmonic frequencies visible.
Clinical meaning: Irregular vocal fold vibration with period doubling or amplitude modulation. Often seen in moderate dysphonia.
⚠ Valid measures: CPP, visual spectrogram analysis. Perturbation measures unreliable.
Type 3: Aperiodic/Chaotic
Spectrogram appearance: No clearly defined harmonic structure. Chaotic appearance with irregular energy distribution. Some pattern may still be discernible.
Clinical meaning: Severe dysphonia with chaotic vocal fold vibration. Low-dimensional chaos (finite correlation dimension).
⚠ Valid measures: CPP, perceptual rating, spectrogram analysis. Jitter/shimmer invalid.
Type 4: Stochastic Noise
Spectrogram appearance: No discernible periodicity. Appears as random noise with no harmonic structure—similar to white noise or severe breathiness.
Clinical meaning: Severely breathy voice, significant glottal incompetence, or aphonic segments. Infinite dimensionality.
✗ Valid measures: Perceptual rating only. No acoustic measures produce reliable results.
Why This Matters Clinically
If you report jitter = 2.3% for a Type 3 voice, that number is meaningless. The algorithm couldn't reliably identify pitch periods, so it's essentially measuring noise, not actual cycle-to-cycle variation. This is why ASHA's 2018 recommendations emphasize CPP as the primary measure: it does not depend on tracking individual pitch periods, so it remains usable when the signal is not nearly periodic.
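One way to make this check routine is to encode the per-type lists above as a small lookup and consult it before reporting a number. The sketch below simply transcribes the "Valid measures" lines for Types 1-4 from this article; the dictionary, the measure labels, and the function name are illustrative and not part of any published tool.

```python
# Valid measures per signal type, transcribed from the lists in this article.
VALID_MEASURES = {
    1: {"jitter", "shimmer", "hnr", "cpp"},
    2: {"cpp", "spectrogram"},
    3: {"cpp", "perceptual", "spectrogram"},
    4: {"perceptual"},
}


def is_measure_valid(signal_type, measure):
    """Return True if the measure is listed as valid for this signal type."""
    if signal_type not in VALID_MEASURES:
        raise ValueError(f"signal type must be 1-4, got {signal_type!r}")
    return measure.lower() in VALID_MEASURES[signal_type]


for signal_type in (1, 3):
    verdict = "valid" if is_measure_valid(signal_type, "jitter") else "not valid"
    print(f"Jitter on a Type {signal_type} signal: {verdict}")
```

A guard like this is only as good as the signal typing that feeds it, which still has to come from reading the narrowband spectrogram.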
Reading Harmonic Structure
In a narrowband spectrogram of a healthy voice, you'll see a series of horizontal lines stacked vertically. Each line is a harmonic—an integer multiple of the fundamental frequency (F0).
What Harmonics Tell You
- Harmonic spacing = F0: If the first harmonic is at 150 Hz and the second at 300 Hz, the fundamental frequency is 150 Hz.
- Number of visible harmonics: A healthy voice typically shows 8-15 clear harmonics in a narrowband spectrogram. Fewer visible harmonics suggests increased noise or reduced harmonic energy.
- Harmonic clarity: Sharp, well-defined harmonics indicate good periodicity. Fuzzy, smeared, or undulating harmonics suggest instability.
- Interharmonic noise: Energy between harmonic lines indicates turbulent airflow (breathiness) or aperiodic vibration (roughness).
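The harmonic-spacing rule can be written as a short calculation. This sketch assumes you have already read the harmonic frequencies off the spectrogram (by eye or with a peak picker), that no harmonic is missing from the list, and that no subharmonics are mixed in; the function name is illustrative.

```python
from statistics import median


def f0_from_harmonics(harmonic_freqs_hz):
    """Estimate F0 as the median spacing between adjacent harmonics."""
    freqs = sorted(harmonic_freqs_hz)
    if len(freqs) < 2:
        raise ValueError("need at least two harmonics to measure spacing")
    spacings = [upper - lower for lower, upper in zip(freqs, freqs[1:])]
    return median(spacings)


# Harmonics read from a narrowband spectrogram (Hz).
estimated_f0 = f0_from_harmonics([150, 300, 450, 600])
print("Estimated F0 (Hz):", estimated_f0)
```

Using the median rather than a single pair of lines makes the estimate less sensitive to one misread harmonic.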
Subharmonics: The Period-Doubling Pattern
Subharmonics appear as additional horizontal lines between the regular harmonics. If you see lines at 0.5×F0, F0, 1.5×F0, 2×F0, 2.5×F0, etc., you're observing subharmonics at multiples of half the fundamental frequency, which indicates that the vocal folds are alternating between two different vibratory cycles.
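As a rough illustration of the period-doubling pattern, the sketch below labels spectral peaks as harmonics (near integer multiples of F0) or subharmonics (near half-integer multiples). It assumes F0 is already known and that peaks have been picked from the spectrogram; the 5% tolerance and the function name are arbitrary choices for the example.

```python
def classify_peaks(peak_freqs_hz, f0_hz, tolerance=0.05):
    """Label each peak 'harmonic', 'subharmonic', or 'other'.

    tolerance is the allowed deviation, expressed as a fraction of F0.
    """
    labels = []
    for freq in peak_freqs_hz:
        ratio = freq / f0_hz
        nearest_half = round(ratio * 2) / 2  # nearest multiple of 0.5
        if nearest_half < 0.5 or abs(ratio - nearest_half) > tolerance:
            labels.append("other")
        elif nearest_half.is_integer():
            labels.append("harmonic")
        else:
            labels.append("subharmonic")
    return labels


peaks = [150, 225, 300, 375, 450]
labels = classify_peaks(peaks, f0_hz=150)
for freq, label in zip(peaks, labels):
    print(f"{freq} Hz: {label}")
```

Here the peaks at 225 Hz and 375 Hz fall at 1.5×F0 and 2.5×F0, so they are labeled as subharmonics.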
Subharmonics are common in:
- Vocal fold lesions (nodules, polyps) affecting vibratory symmetry
- Unilateral vocal fold paralysis
- Pubertal voice change
- Intentional vocal effects (vocal fry, growl)
Visual Patterns of Voice Quality
Different voice quality dimensions produce characteristic spectrographic appearances. Learning to recognize these patterns accelerates your clinical interpretation.
Breathiness
Spectrogram Features:
- Increased energy between harmonics (interharmonic noise)
- Fuzzy, less distinct harmonic boundaries
- Reduced number of visible higher harmonics
- High-frequency noise band (aspiration noise) often visible above 2000 Hz
Underlying Physiology:
Incomplete glottal closure allows turbulent airflow through the glottis, creating broadband noise that fills the spaces between harmonics. The voice source has both periodic and aperiodic components.
Roughness
Spectrogram Features:
- Undulating or wavy harmonic contours
- Presence of subharmonics (lines between harmonics)
- Irregular harmonic spacing over time
- Variable F0 visible as wobbling fundamental
Underlying Physiology:
Irregular vocal fold vibration from mass asymmetry, stiffness differences, or neuromuscular dysfunction. The periodicity is disrupted, creating cycle-to-cycle variation in both frequency and amplitude.
Strain/Pressed Voice
Spectrogram Features:
- Strong, prominent harmonics extending to high frequencies
- Enhanced energy in higher harmonics relative to F0
- Sharp spectral "peaks" with steep roll-off
- May show high-frequency noise from supraglottic constriction
Underlying Physiology:
Increased medial compression and longitudinal tension of the vocal folds creates a more abrupt glottal closure, which generates stronger high-frequency harmonics. Often accompanied by supraglottic hyperfunction.
Tremor
Spectrogram Features:
- Regular, rhythmic undulation of harmonics
- Cyclic frequency or amplitude modulation (typically 4-7 Hz)
- Pattern repeats predictably throughout sustained phonation
- May affect pitch, loudness, or both
Underlying Physiology:
Rhythmic oscillation of laryngeal or respiratory muscles, often related to neurological conditions (essential tremor, Parkinson's disease) or normal aging. The modulation frequency can help differentiate pathological from physiological tremor.
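If an F0 contour is available (for example, exported from a pitch tracker), the modulation rate can be estimated numerically rather than by counting undulations on screen. The sketch below is one simple approach under stated assumptions: it scans candidate rates from 1 to 15 Hz and picks the one with the most energy in the mean-removed contour. The synthetic contour, the 100 frames-per-second rate, and the function name are all made up for the example.

```python
import math


def modulation_rate_hz(f0_contour_hz, frame_rate_hz,
                       lo_hz=1.0, hi_hz=15.0, step_hz=0.1):
    """Return the candidate rate (Hz) with the most energy in the F0 contour."""
    mean_f0 = sum(f0_contour_hz) / len(f0_contour_hz)
    centered = [value - mean_f0 for value in f0_contour_hz]
    best_rate, best_power = lo_hz, -1.0
    n_steps = round((hi_hz - lo_hz) / step_hz)
    for step in range(n_steps + 1):
        rate = lo_hz + step * step_hz
        # Correlate the contour with a cosine and a sine at this rate.
        re = sum(v * math.cos(2 * math.pi * rate * i / frame_rate_hz)
                 for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * rate * i / frame_rate_hz)
                 for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_rate, best_power = rate, power
    return best_rate


# Synthetic contour: 180 Hz mean F0 with a 5 Hz, +/-6 Hz tremor, 2 s long.
FRAME_RATE = 100  # F0 values per second (assumed)
contour = [180 + 6 * math.sin(2 * math.pi * 5 * i / FRAME_RATE)
           for i in range(2 * FRAME_RATE)]
rate = modulation_rate_hz(contour, FRAME_RATE)
print(f"Estimated modulation rate: {rate:.1f} Hz")
```

Real contours contain voice breaks and tracking errors, so any estimate like this should be checked against the visual pattern in the spectrogram.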
Spectrographic Patterns in Common Pathologies
While spectrograms alone cannot diagnose specific pathologies (that requires laryngeal examination), certain patterns are commonly associated with specific conditions. These associations can guide your clinical hypothesis and inform your referral decisions.
Vocal Fold Nodules
- Mild to moderate interharmonic noise (breathiness)
- Reduced higher harmonic energy
- Usually Type 1 or borderline Type 2 signal
- Voice breaks may appear as sudden discontinuities
Vocal Fold Polyp
- More pronounced subharmonics (asymmetric vibration)
- Often Type 2 signal with visible period doubling
- Diplophonia may be visible as parallel harmonic tracks
- Variable appearance depending on polyp size and location
Unilateral Vocal Fold Paralysis
- Significant interharmonic noise (glottal incompetence)
- Often prominent subharmonics from asymmetric vibration
- May show diplophonia (two distinct F0 tracks)
- Usually Type 2 or Type 3 signal
Reinke's Edema (Polypoid Corditis)
- Abnormally low F0 (harmonic lines spaced closer together)
- Increased harmonic instability (wavy contours)
- Roughness pattern with aperiodicity
- Bilateral involvement often creates irregular subharmonics
Remember: Spectrograms Inform, Don't Diagnose
These patterns are associated with specific pathologies, but similar patterns can result from different underlying conditions. A spectrogram showing subharmonics could indicate a polyp, paralysis, cyst, or functional disorder. Laryngeal visualization is always required for diagnosis.
Practical Protocol: Spectrogram Analysis in Clinical Workflow
Here's a systematic approach to incorporating spectrogram analysis into your voice assessment:
1. Generate a narrowband spectrogram first
   In Praat, use View range 0-4000 Hz and Window length 0.03-0.05 seconds. For PhonaLab's Spectrogram Generator, select "Narrowband" mode.
2. Classify the voice signal type (1-4)
   Are harmonics clearly defined? Are there subharmonics? Is the signal aperiodic or noise-dominated? Document this classification, because it determines which acoustic measures are valid.
3. Assess harmonic clarity and number
   Count visible harmonics. Note if they're sharp or fuzzy. Look for interharmonic noise indicating breathiness or aperiodicity.
4. Check for subharmonics and diplophonia
   Look for additional horizontal lines between harmonics. Note if they're consistent (period doubling) or irregular (chaotic bifurcation).
5. Evaluate stability over time
   Are harmonics straight (stable F0) or undulating (F0 instability, tremor)? Note any voice breaks, onset difficulties, or inconsistent segments.
6. Select appropriate acoustic measures
   Based on signal type: Type 1 → all measures valid; Type 2-4 → use CPP as primary measure, report perturbation measures with caution or not at all.
Recommended Praat Settings
For those using Praat, here are optimized settings for clinical voice spectrogram analysis:
Narrowband (Voice Quality Assessment)
- View range: 0 – 4000 Hz
- Window length: 0.03 – 0.05 s
- Dynamic range: 50 – 70 dB
- Number of time steps: 1000
- Number of frequency steps: 250
Wideband (Formant/Articulation Analysis)
- View range: 0 – 5000 Hz
- Window length: 0.004 – 0.005 s
- Dynamic range: 50 – 70 dB
- Number of time steps: 1000
- Number of frequency steps: 250
Note for high-pitched voices: When analyzing women or children with F0 above 250 Hz, the default wideband window may start resolving individual harmonics, producing a hybrid display instead of clear glottal-pulse striations. If this happens, reduce the window length slightly (try 0.0035 s for wideband, 0.025 s for narrowband).
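The reason for this adjustment can be shown with a quick check: a window starts to resolve individual harmonics once its analysis bandwidth drops below the harmonic spacing (F0). The sketch below assumes a Gaussian window and the -3 dB relation described in the Praat manual (bandwidth ≈ 1.2982804 / window length); the function names and the 300 Hz example voice are illustrative, and the threshold is a rule of thumb rather than a sharp boundary.

```python
def gaussian_bandwidth_hz(window_length_s):
    """Approximate -3 dB bandwidth (Hz), assuming a Gaussian analysis window."""
    return 1.2982804 / window_length_s


def resolves_harmonics(window_length_s, f0_hz):
    """True if the bandwidth is narrower than the harmonic spacing (F0)."""
    return gaussian_bandwidth_hz(window_length_s) < f0_hz


F0_CHILD = 300  # Hz, example high-pitched voice
for window_s in (0.005, 0.0035):
    bandwidth = gaussian_bandwidth_hz(window_s)
    hybrid = resolves_harmonics(window_s, F0_CHILD)
    print(f"window {window_s * 1000:.1f} ms: ~{bandwidth:.0f} Hz bandwidth, "
          f"harmonics resolved at {F0_CHILD} Hz F0: {hybrid}")
```

Under these assumptions a 5 ms window (about 260 Hz) is narrower than a 300 Hz harmonic spacing, so harmonics begin to appear, while a 3.5 ms window (about 370 Hz) restores a true wideband display.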
⚠️ Clinical Documentation Tool
The information in this article is provided for educational purposes and clinical workflow support. Spectrographic analysis should be interpreted within the context of comprehensive voice evaluation including perceptual assessment, patient history, and laryngeal visualization. All clinical decisions should be made by qualified healthcare professionals.
References & Further Reading
- Baken RJ, Orlikoff RF (1999). Clinical Measurement of Speech and Voice, 2nd ed. Cengage Learning.
- Titze IR (1995). Workshop on Acoustic Voice Analysis: Summary Statement. National Center for Voice and Speech.
- Sprecher A et al. (2010). Updating signal typing in voice: addition of type 4 signals. Journal of the Acoustical Society of America, 127(6), 3710-3716.
- Patel RR et al. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887-905.
- Maryn Y, Weenink D (2015). Objective dysphonia measures in the program Praat: smoothed cepstral peak prominence and acoustic voice quality index. Journal of Voice, 29(1), 35-43.
- Barsties B, Hoffmann U, Maryn Y (2015). The evaluation of voice quality via signal typing in voice using narrowband spectrograms. Laryngo-Rhino-Otologie, 95(2), 105-111.
Dr. Jorge C. Lucero
Professor of Computer Science, University of Brasília
Dr. Lucero has over 30 years of experience researching voice production, vocal fold dynamics, and acoustic analysis. He developed SimuVox, a physics-based voice disorder simulator, and created PhonaLab to make professional voice analysis accessible to clinicians worldwide.