
Acoustic Voice Assessment via Telehealth: What Survives the Compression

May 10, 2026 · 14 min read · Jorge C. Lucero

🎯 Key Takeaways

  • F0 measures survive videoconferencing transmission—mean fundamental frequency was the only acoustic measure unaffected by transmission across all six platforms tested in Weerathunge et al. (2021)
  • CPPS, HNR, and SPL measures degrade—platform audio enhancements compress dynamics, suppress noise, and distort spectral structure
  • Microsoft Teams was least disruptive; Zoom-with-enhancements most—but no platform preserved all measures with clinical accuracy
  • Asynchronous recording beats live capture—patient records locally on a phone or tablet with a headset microphone, then uploads the file; this preserves all measures
  • Ambient noise is the strongest predictor of distortion—a quiet room matters more than which platform you use

Voice therapy over telehealth is no longer experimental. Medicare telehealth coverage for SLP services has been extended through 2027, payer reimbursement is established for several CPT codes with telehealth modifiers, and outcome studies have demonstrated efficacy across vocal fold nodules, muscle tension dysphonia, and Parkinson's disease. The therapy works.

The harder question is the assessment. Voice therapy progress is documented quantitatively—pre and post acoustic measures, baseline-to-discharge comparisons, outcome scores. If those measurements are collected over Zoom or Microsoft Teams during a live session, can clinicians trust them? Or does the platform itself distort the signal in ways that invalidate the analysis?

This guide answers that question with the available evidence and outlines a practical protocol that works around the technical limitations of live videoconferencing. The short version: most acoustic measures other than F0 are meaningfully degraded by telepractice transmission, and the right solution is asynchronous recording rather than live capture.

Why Videoconferencing Distorts Voice Measurements

Videoconferencing platforms are optimized for intelligible conversation, not for high-fidelity acoustic capture. To deliver clear speech under variable bandwidth conditions, every major platform applies a chain of audio processing:

  • Lossy compression. Audio is encoded in formats designed to discard "perceptually irrelevant" information—exactly the kind of fine spectral detail that acoustic measures depend on.
  • Noise suppression. Algorithms identify and attenuate non-speech components, which can include the breathy or aperiodic energy that defines dysphonic voice.
  • Automatic gain control. Quiet passages are amplified and loud passages compressed, distorting the dynamic range that intensity measures depend on.
  • Echo cancellation and de-reverberation. These can introduce subtle nonlinear processing that changes spectral content.

Each of these processes is helpful for conversation and harmful for measurement. The result is a signal that sounds like the patient's voice but is acoustically a different object than what the microphone originally captured.
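The effect of automatic gain control on intensity measures can be illustrated with a toy numerical sketch. The `simple_agc` function below is a hypothetical static compressor, not any platform's actual algorithm; real implementations add attack/release smoothing, but the dynamic-range compression it demonstrates is the same basic mechanism that degrades SPL range.

```python
import math

def db(amplitude):
    """Linear amplitude to decibels (re: 1.0)."""
    return 20 * math.log10(amplitude)

def simple_agc(amplitude, target=0.25, strength=0.7):
    """Toy static gain control: pull each frame's level toward a
    target. Illustrative only -- real platforms smooth the gain
    over time, but the range compression is the same idea."""
    gain = (target / amplitude) ** strength
    return amplitude * gain

# Hypothetical per-frame amplitudes: a quiet passage, then a loud one.
frames = [0.05, 0.08, 0.06, 0.5, 0.7, 0.6]

orig_range = db(max(frames)) - db(min(frames))
processed = [simple_agc(a) for a in frames]
proc_range = db(max(processed)) - db(min(processed))

print(f"Level range before AGC: {orig_range:.1f} dB")  # ~22.9 dB
print(f"Level range after AGC:  {proc_range:.1f} dB")  # ~6.9 dB
```

With these illustrative settings, a roughly 23 dB dynamic range shrinks to about 7 dB, which is why SPL range measured from platform audio cannot be taken at face value.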

A Key Insight About Noise Suppression

Noise suppression algorithms are trained to remove background noise from the signal. They cannot reliably distinguish background noise (HVAC hum, traffic) from voice noise (turbulent airflow through an incompletely closed glottis). For a patient with breathiness, the platform may be actively suppressing the very acoustic signature the clinician is trying to measure.

What the Evidence Shows

The most systematic study of this question is Weerathunge and colleagues (2021), who recorded 29 voice samples from patients with dysphonia, transmitted each sample through six HIPAA-compliant videoconferencing platforms, and compared the resulting acoustic measures against the original recordings.

The platforms tested were Zoom (with and without audio enhancements), Cisco WebEx, Microsoft Teams, Doxy.me, and VSee Messenger. The acoustic measures included mean F0, F0 variability, F0 range, SPL range, HNR, L/H spectral ratio, and CPPS (sustained vowel and connected speech).

The summary of findings:

| Acoustic Measure | Telepractice Validity | Notes |
| --- | --- | --- |
| Mean F0 | Preserved | Only measure unaffected by any platform |
| F0 SD / range | Mostly preserved | Some platform effects, but smaller than for other measures |
| CPPS | Degraded | Significant decreases; large effect with WebEx |
| HNR | Degraded | Affected by ambient noise plus platform processing |
| SPL range | Degraded | Automatic gain control compresses dynamics |
| L/H spectral ratio | Degraded | Variable across platforms; enhancement-driven |
| Jitter / shimmer | Not validated | Earlier work shows perturbation measures fail below an SNR of 30 dB |

Two findings from this work are particularly important for clinical practice. First, Microsoft Teams was the least disruptive platform, while Zoom with enhancements was the most disruptive—even though Zoom is the most commonly used platform for clinical telepractice. Second, ambient noise from the patient's environment was a significant predictor of measurement differences across nearly every metric. The patient's room mattered more than the patient's internet speed.

A Cautionary Note on Cross-Session Comparison

Even when a measure is "preserved" on average, platform-specific systematic biases mean that comparing a baseline measurement collected in person with a follow-up measurement collected over telepractice can produce spurious "improvements" or "deteriorations." Use one collection method consistently across sessions for the same patient, or use the asynchronous workflow described below.

Why Asynchronous Recording Solves the Problem

The fundamental issue with live videoconferencing is that the audio is processed in real time by the platform before reaching the clinician. The signal that arrives is not the signal the patient produced.

The workaround is to bypass the platform entirely for the recording itself. The patient records the voice sample locally on their own device, using a recording app that produces an uncompressed or losslessly compressed file (WAV, FLAC, or Apple Lossless). They then upload the file separately—via secure portal, secure email, or HIPAA-compliant file transfer. The clinician analyzes the uploaded file rather than the live transmission.

This approach has several advantages:

  • No platform compression. The signal reaching the analysis software is the signal the microphone captured.
  • Headset microphone is feasible. Patients can use a basic headset placed at the validated 2.5–5 cm distance, which dramatically reduces the impact of ambient room noise.
  • Quality control before analysis. The clinician can listen to the file before computing metrics and request a re-recording if there is obvious clipping, environmental noise, or off-task production.
  • Reproducible across sessions. Each recording is collected with the same device and protocol, eliminating session-to-session platform variability.
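The quality-control step can be partially automated before any metrics are computed. The sketch below is a hypothetical pre-analysis check (the `qc_report` function and its thresholds are illustrative, not a validated protocol) that flags clipping and near-silence in an uploaded 16-bit PCM WAV file, using only the Python standard library.

```python
import math
import struct
import tempfile
import wave

def qc_report(path, clip_level=0.99):
    """Quick pre-analysis check on an uploaded 16-bit PCM WAV:
    flags clipped and near-silent recordings so the clinician can
    request a re-recording before computing any acoustic metrics.
    Thresholds here are illustrative, not validated."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        n = wf.getnframes()
        samples = struct.unpack(f"<{n * wf.getnchannels()}h",
                                wf.readframes(n))
    peak = max(abs(s) for s in samples) / 32768.0
    clipped = sum(1 for s in samples if abs(s) / 32768.0 >= clip_level)
    return {"peak": round(peak, 3),
            "clipped_samples": clipped,
            "ok": clipped == 0 and peak > 0.05}

# Demo on a synthetic, well-recorded 220 Hz tone (one second).
rate = 16000
tone = [int(0.5 * 32767 * math.sin(2 * math.pi * 220 * t / rate))
        for t in range(rate)]
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    path = f.name
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(rate)
    wf.writeframes(struct.pack(f"<{len(tone)}h", *tone))

print(qc_report(path))  # no clipping, healthy peak level
```

A check like this catches the most common re-recording triggers (clipping and a microphone that was too far away) before clinician time is spent on analysis.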

Bridge2AI-Voice consortium validation work has reported that smartphone or tablet recordings made with a headset microphone at 2.5–5 cm distance achieve correlations greater than 0.90 with research-grade equipment for CPP and other key acoustic measures. This is the empirical anchor for the asynchronous workflow.

A Practical Telehealth Workflow

For a clinician integrating acoustic assessment into a telepractice voice caseload, the following workflow balances technical reliability with patient burden:

1. Provide a recording protocol document

A one-page handout with screenshots: which app to use (any voice memo app that saves uncompressed audio), microphone placement, room setup (quiet space, away from windows and HVAC), and the exact vowel and connected-speech prompts. Brevity matters—patients will not follow a four-page protocol.

2. Send the patient a basic headset

A wired USB or 3.5 mm headset with a boom microphone, mailed at intake or provided at the first in-person visit. A consumer headset under $30 supports the 2.5–5 cm fixed-distance recording that drives the validation evidence. This is the highest-leverage investment in measurement quality.

3. Have the patient record before the session

The patient records sustained /a/ (3–5 seconds, three trials) plus a connected-speech sample (a standard sentence set, a Rainbow Passage excerpt, or the institutional standard), and uploads the files via secure portal a few hours before the live session. The clinician runs the acoustic analysis offline and discusses the results during the session.

4. Use the live videoconference for everything else

Perceptual evaluation, case history, voice therapy practice, patient education, and outcome discussion all work well over Zoom or Teams. The video session captures the clinical interaction; the asynchronous recording captures the measurement.

5. Document the recording method in the chart

Note the device, microphone, and room conditions for each recording. This makes session-to-session comparison defensible and supports objective documentation of progress.

When Live Telepractice Capture Is Acceptable

Asynchronous recording is the right default, but live capture has legitimate clinical use cases when the limitations are understood:

  • F0 monitoring during gender-affirming voice therapy. Mean F0 is preserved across platforms, and within-session pitch tracking during practice is valuable for biofeedback.
  • Subjective screening when no objective measure is required. If the goal is to hear the voice and rate perceptually, the platform's audio fidelity is sufficient even when its acoustic fidelity is not.
  • Initial triage when no asynchronous option exists. A live impression is better than no impression, provided the clinician documents that the assessment was platform-mediated and acoustic measures were not collected.
  • Patient education and feedback. Showing a patient their pitch or loudness in real time during a session does not require absolute accuracy; relative change within a session is informative.

The line to draw is between using the platform's audio for clinical interaction and quantifying the platform's audio as if it were a clean recording. The first is fine; the second is not.
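For readers curious why mean F0 is the survivor: it depends only on the signal's periodicity, which compression largely preserves even when fine spectral detail is lost. The sketch below is a minimal autocorrelation pitch estimator in plain Python; it illustrates the core idea only, and is not the algorithm any platform or clinical analysis suite actually uses (those rely on more robust cepstral or YIN-style methods).

```python
import math

def estimate_f0(frame, rate, fmin=60.0, fmax=500.0):
    """Minimal autocorrelation pitch estimator for one frame of
    samples: find the lag (within the plausible pitch range) at
    which the signal best matches a delayed copy of itself."""
    lo = int(rate / fmax)            # shortest lag to search
    hi = int(rate / fmin)            # longest lag to search
    n = len(frame)
    best_lag, best_r = lo, -math.inf
    for lag in range(lo, min(hi, n - 1)):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return rate / best_lag

# Synthetic 196 Hz (G3) tone, 100 ms frame at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 196.0 * t / rate) for t in range(800)]
print(f"Estimated F0: {estimate_f0(frame, rate):.1f} Hz")  # close to 196 Hz
```

Because only integer lags are searched, the estimate lands within a few Hz of the true value; that coarse periodicity is exactly what survives lossy transmission, while measures built on fine spectral and amplitude detail (CPPS, HNR, shimmer) do not.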

Documentation and Defensibility

Telepractice acoustic data needs to be defensible if it ends up in an outcome report, a peer-reviewed publication, or a third-party payer audit. Three documentation practices help:

  • Describe the recording chain explicitly. "Patient recorded sustained /a/ on iPhone 14 with Apple EarPods at approximately 3 cm from the mouth, in their home office, using Voice Memos app at lossless quality, uploaded via [secure portal name]."
  • Use the same chain for baseline and follow-up. Session-to-session comparisons require equipment continuity. If the device changes, document it and treat the new baseline as a new reference point.
  • Note any compromises. If a measurement was collected over live videoconferencing despite the limitations, document it explicitly: "F0 collected via Zoom; CPPS not collected this session due to platform limitations."

These practices make the methodology auditable and protect the clinician from later questions about whether observed changes reflect real voice change or measurement artifact.

Summary

  1. Videoconferencing platforms distort acoustic measurements. Compression, noise suppression, and gain control all alter the signal in ways that affect measurement.
  2. F0 survives transmission; most other measures do not. CPPS, HNR, SPL range, and L/H ratio are all platform-affected in published evidence.
  3. Asynchronous recording is the practical solution. The patient records locally with a headset and uploads the file; the clinician analyzes a clean signal.
  4. Ambient noise matters more than internet speed. A quiet recording environment is the highest-leverage protocol element.
  5. Live videoconferencing is fine for clinical interaction, not for quantification. Use the platform for the session; use uploaded files for the measurements.

📊 Analyze Uploaded Recordings in PhonaLab

PhonaLab accepts WAV, MP3, M4A, and other common audio formats directly in the browser. Patients can upload recordings from any device, and clinicians get the full PhonaLab measurement suite—F0, CPPS, AVQI, ABI, jitter, shimmer, HNR—without installing anything. Audio is processed in memory and never stored on PhonaLab's servers.

Open Voice Analyzer →

⚠️ Educational Information

This article presents telepractice acoustic assessment concepts and summarizes published research findings for educational purposes. It does not constitute clinical advice, regulatory guidance, or billing recommendations. Clinical decisions regarding voice assessment, telepractice protocols, and reimbursement should be made by qualified, licensed healthcare professionals based on jurisdiction-specific regulations and individual patient circumstances. PhonaLab provides acoustic measurement tools; it does not provide clinical interpretations or medical diagnoses.

References & Further Reading

  • Weerathunge HR, Segina RK, Tracy L, Stepp CE. (2021). Accuracy of acoustic measures of voice via telepractice videoconferencing platforms. Journal of Speech, Language, and Hearing Research, 64(7), 2586–2599. doi:10.1044/2021_JSLHR-20-00625
  • Awan SN, Shaikh MA, Awan JA, Abdalla I, Lim KO, Misono S. (2024). Validity of acoustic measures obtained using various recording methods including smartphones with and without headset microphones. Journal of Speech, Language, and Hearing Research.
  • Deliyski DD, Shaw HS, Evans MK. (2005). Adverse effects of environmental noise on acoustic voice quality measurements. Journal of Voice, 19(1), 15–28.
  • Lebacq J, Schoentgen J, Cantarella G, Bernardoni NH, Behlau M, Pepiot E. (2017). Maximal ambient noise level and type of voice material required for valid use of smartphones in clinical voice research. Journal of Voice, 31(5), 550–556.
  • Maryn Y, Ysenbaert F, Zarowski A, Vanspauwen R. (2017). Mobile communication devices, ambient noise, and acoustic voice measures. Journal of Voice, 31(2), 248.e11–248.e23.
  • Marsano-Cornejo MJ, Roco-Videla A. (2021). Variation of the acoustic parameters in different recording conditions: Their use in clinical settings. Acta Otorrinolaringológica Española, 72(2), 103–108.
  • Grillo EU. (2019). Building a successful voice telepractice program. Perspectives of the ASHA Special Interest Groups, 4(1), 100–110.
  • Schneider SL, Weston ZM, Rosen CA. (2021). Observations and considerations for implementing remote acoustic voice recording and analysis in clinical practice. Journal of Voice.
  • American Speech-Language-Hearing Association (n.d.). Telepractice. (Practice Portal). Retrieved May 10, 2026, from www.asha.org/Practice-Portal/Professional-Issues/Telepractice/.