
The Implicit Question: What AI Voice Analysis Actually Measures

January 20, 2026 · 9 min read · Dr. Jorge C. Lucero

"An AI model is a measure whose question is implicit rather than explicit."

I've been thinking about this idea a lot recently, in the context of voice analysis and the tools we build for clinicians.

Every measurement is, at its core, a question posed to reality. When we measure something, we're asking: "How much of this particular thing is present?" The value of any measurement depends on whether we're asking the right question—and whether we understand what question we're actually asking.

Traditional acoustic measures are transparent about the questions they ask. AI-based voice analysis changes the situation in ways that deserve careful attention—not because AI is inferior, but because it shifts what clinicians need to understand.

Traditional Measures Ask Explicit Questions

Consider the acoustic measures that speech-language pathologists have used for decades. Each one embodies a specific, articulable question about the voice signal:

Jitter

Question: "How stable are the durations of consecutive vocal fold cycles?"

The measure assumes you can identify individual cycles—and when you can't (severe dysphonia), the question stops making sense. This limitation is visible.
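To make that explicitness concrete, here is a minimal sketch of local jitter as a percentage, assuming the cycle durations have already been extracted upstream. The example periods are hypothetical, and real implementations differ in how they detect and smooth cycle boundaries, but the question the number answers is right there in the arithmetic:

```python
import numpy as np

def local_jitter_percent(periods: np.ndarray) -> float:
    """Local jitter: mean absolute difference between consecutive cycle
    durations, relative to the mean duration, expressed in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two cycles to compute jitter")
    cycle_to_cycle = np.abs(np.diff(periods))
    return 100.0 * cycle_to_cycle.mean() / periods.mean()

# Hypothetical example: a ~200 Hz voice (5 ms cycles) with slight irregularity
rng = np.random.default_rng(0)
periods = 0.005 + 0.00005 * rng.standard_normal(100)
print(f"Local jitter: {local_jitter_percent(periods):.2f}%")
```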

CPP (Cepstral Peak Prominence)

Question: "How prominent is the harmonic structure relative to noise?"

The cepstral transform and regression line are mathematically defined and inspectable. You know exactly what's being calculated.
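A simplified single-frame sketch of that calculation is below. Published CPP implementations differ in windowing, smoothing, and the quefrency range used for the regression line, so treat this as an illustration of the idea rather than a reference implementation:

```python
import numpy as np

def cepstral_peak_prominence(frame: np.ndarray, sr: int,
                             f0_range: tuple = (60.0, 330.0)) -> float:
    """Height of the cepstral peak above a straight line regressed over
    the cepstrum within the F0 search range (simplified sketch)."""
    n = len(frame)
    windowed = frame * np.hanning(n)
    log_spectrum = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.abs(np.fft.ifft(log_spectrum))   # "spectrum of the log spectrum"
    quefrency = np.arange(n) / sr                   # seconds

    # Look for the peak only at quefrencies corresponding to plausible F0
    in_range = (quefrency >= 1.0 / f0_range[1]) & (quefrency <= 1.0 / f0_range[0])
    peak_idx = int(np.argmax(np.where(in_range, cepstrum, -np.inf)))

    # Linear regression of cepstrum on quefrency over the same range
    slope, intercept = np.polyfit(quefrency[in_range], cepstrum[in_range], 1)
    baseline = slope * quefrency[peak_idx] + intercept
    return float(cepstrum[peak_idx] - baseline)

# Hypothetical example: a synthetic 150 Hz harmonic-rich voiced frame
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 20))
print(f"CPP (sketch): {cepstral_peak_prominence(frame, sr):.2f} dB")
```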

AVQI (Acoustic Voice Quality Index)

Question: "How do these six acoustic features combine to predict perceived dysphonia severity?"

The regression weights are published. You can see exactly how CPPS, shimmer, and the other components contribute.
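The form of the answer is nothing more exotic than a weighted sum. The sketch below shows that structure; the weights and intercept are deliberately left as inputs rather than hard-coded, so as not to misquote the published regression coefficients:

```python
def avqi_style_index(features: dict, weights: dict, intercept: float) -> float:
    """Illustrative linear combination in the spirit of AVQI: a weighted
    sum of acoustic measures fitted to predict perceived severity.
    Consult the AVQI literature for the actual measures and weights."""
    return intercept + sum(weights[name] * features[name] for name in weights)

# Hypothetical values, for the shape of the calculation only
features = {"cpps": 12.4, "hnr": 18.1, "shimmer_local": 4.2}
weights = {"cpps": -0.25, "hnr": -0.15, "shimmer_local": 0.10}
print(f"Index (illustrative): {avqi_style_index(features, weights, 9.0):.2f}")
```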

Each of these measures makes its assumptions visible. You may disagree with them. You may decide they don't apply in a given case. But you know what question is being asked—and, critically, you can recognize when that question stops making sense.

AI Models Ask Implicit Questions

Now consider an AI model trained to detect dysphonia from voice recordings.

The model still answers a question. But the question is no longer explicitly stated. Instead, it is learned—encoded in the training data, the labeling scheme, the recording conditions, the population demographics, and the optimization process.

The Implicit Question

When an AI model outputs "dysphonia probability: 0.87," the implicit question might be something like: "Based on patterns in 10,000 recordings labeled by three expert raters using GRBAS, recorded in sound-treated booths with head-mounted microphones, from patients aged 20-65 at a tertiary voice clinic, how similar is this recording to those labeled as disordered?"

But that question is rarely stated. Often, it's not even fully known.

The output may be accurate, sometimes remarkably so. AI models can detect patterns that no hand-crafted measure would capture. But why the model is accurate, and when it will fail, are often much harder to see.

What Gets Encoded in an AI Model?

Training labels

Who labeled the data? What criteria did they use? How reliable were they?

Recording conditions

What microphones? What environments? What sample rates?

Population characteristics

Age, gender, language, pathology distribution, severity range?

Optimization targets

What was the model rewarded for? Accuracy? Sensitivity? AUC?
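None of this has to stay invisible. Below is a sketch of the kind of provenance record that could travel with a model; the field names and example values are hypothetical, loosely echoing the scenario above, and real model cards are far richer:

```python
from dataclasses import dataclass, field

@dataclass
class ModelProvenance:
    """A minimal record of what shaped a model's implicit question.
    Field names are illustrative; real documentation goes deeper."""
    # Training labels
    labeling_scheme: str
    rater_reliability: str
    # Recording conditions
    microphones: list
    environments: list
    sample_rates_hz: list
    # Population characteristics
    age_range: tuple
    pathologies: list
    n_recordings: int
    # Optimization target
    objective: str
    known_limitations: list = field(default_factory=list)

provenance = ModelProvenance(
    labeling_scheme="GRBAS grade, three expert raters",
    rater_reliability="agreement reported, reliability statistic unpublished",
    microphones=["head-mounted condenser"],
    environments=["sound-treated booth"],
    sample_rates_hz=[44100],
    age_range=(20, 65),
    pathologies=["nodules", "unilateral paralysis"],
    n_recordings=10_000,
    objective="maximize AUC, dysphonic vs. non-dysphonic",
    known_limitations=["smartphone audio untested",
                       "spasmodic dysphonia underrepresented"],
)
print(provenance.objective)
```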

The Clinician's Task Has Shifted

This doesn't make AI inferior to traditional measures. In many contexts, it makes AI more powerful. But it does shift what the clinician needs to understand.

Traditional Measures

The clinician's task is to interpret a clearly defined measure:

  • Understand what the measure calculates
  • Know its assumptions and limitations
  • Recognize when conditions invalidate it
  • Apply clinical judgment to the value

AI Models

The clinician's task is to interpret the behavior of a learned model:

  • Understand what data trained the model
  • Recognize distribution shifts from training
  • Identify when the model's "question" doesn't fit
  • Apply clinical judgment to the prediction

The second task is harder. It requires a different kind of literacy—not about signal processing algorithms, but about the nature of learned representations and the ways they can silently fail.

When the Implicit Question Doesn't Fit

Consider some scenarios where an AI model's implicit question might not match the clinical situation:

Recording condition mismatch

A model trained on studio-quality recordings may behave unpredictably on smartphone audio—not because it's "wrong," but because it's answering a question about a different acoustic reality.

Population shift

A model trained predominantly on nodules and paralysis may not generalize to spasmodic dysphonia or muscle tension dysphonia. Its implicit question was shaped by pathologies it saw during training.

Spurious correlations

Perhaps all the "disordered" recordings happened to have a particular background noise pattern. The model may have learned to detect that—not dysphonia itself.

Traditional measures can fail too. But their failure modes are generally predictable from the algorithm definition. AI failure modes are often discovered empirically, after the fact, when something unexpected happens.
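One partial safeguard is to check, before trusting a prediction, whether the incoming recording even resembles what the model was trained on. Here is a minimal sketch, assuming the training-set means and standard deviations of a few basic features were stored at training time; the feature names, values, and threshold are illustrative:

```python
def out_of_range_features(features: dict, training_stats: dict,
                          z_threshold: float = 3.0) -> dict:
    """Flag features of a new recording that sit far outside the training
    distribution: a crude, partial proxy for distribution shift."""
    flags = {}
    for name, value in features.items():
        mean, std = training_stats[name]
        z = abs(value - mean) / (std + 1e-12)
        if z > z_threshold:
            flags[name] = round(z, 1)
    return flags

# Hypothetical case: smartphone recording vs. booth-recorded training data
training_stats = {"snr_db": (35.0, 5.0), "cpp_db": (14.0, 3.0)}
new_recording = {"snr_db": 12.0, "cpp_db": 13.1}
print(out_of_range_features(new_recording, training_stats))
# {'snr_db': 4.6} -- the noise level alone suggests the model's implicit
# question may not apply to this recording
```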

Transparency Becomes More Important, Not Less

As AI enters clinical voice assessment, transparency hasn't become less important. It has become harder—and more necessary.

When the question a measure asks is explicit, transparency means explaining the algorithm. When the question is implicit, transparency requires characterizing what the model learned, documenting its training conditions, validating across populations, and honestly acknowledging where uncertainty remains.

AI and Traditional Measures Work Together

The ideal may not be "AI versus traditional measures" but AI alongside traditional measures. AI provides powerful pattern recognition; traditional measures provide interpretable checkpoints. When they agree, confidence increases. When they disagree, the disagreement itself is informative—and the explicit measures give you somewhere to start investigating.

This is the approach we take at PhonaLab. We use AI to generate clinical interpretations, but we always show the underlying acoustic parameters—F0, jitter, shimmer, HNR, CPP, AVQI. The AI assists clinical reasoning; it doesn't replace the transparent measures that make reasoning possible.
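As a sketch of what that cross-check can look like, the snippet below compares an AI probability against a crude reading of two explicit measures and surfaces disagreement rather than hiding it. The thresholds are placeholders for illustration, not validated clinical cutoffs, and the function name is hypothetical:

```python
def cross_check(ai_probability: float, cpp_db: float, jitter_pct: float) -> str:
    """Compare an AI dysphonia probability with a crude reading of two
    explicit measures. Thresholds are illustrative placeholders only."""
    measures_suggest_dysphonia = (cpp_db < 11.0) or (jitter_pct > 1.5)
    ai_suggests_dysphonia = ai_probability >= 0.5

    if ai_suggests_dysphonia == measures_suggest_dysphonia:
        return "AI and explicit measures agree: higher confidence"
    return ("AI and explicit measures disagree: inspect the recording, the "
            "individual measures, and the model's training conditions first")

print(cross_check(ai_probability=0.87, cpp_db=16.2, jitter_pct=0.4))
```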

Implications for Clinical Practice

What does this mean practically?

  1. Ask about training data

     When evaluating an AI tool, ask: what data trained it? What populations? What recording conditions? If this information isn't available, that's a red flag.

  2. Don't abandon interpretable measures

     Traditional acoustic measures provide a window into why a voice sounds the way it does. They support clinical reasoning, not just classification.

  3. Demand transparency from tool developers

     As a field, we should expect AI voice tools to document their training, validation, and limitations. "It works" is not sufficient. How it works, and when it doesn't, matter just as much.

  4. Remember that you are the clinician

     AI outputs are inputs to clinical decision-making. Your role is to integrate them with patient history, perceptual assessment, and clinical context, recognizing that no source, including AI, is infallible.

Why This Matters Now

I've been thinking through these issues while working on a longer-form project about voice acoustics and measurement. One thing has become very clear: as our tools become more sophisticated, the need to understand what our measurements actually mean doesn't go away. It quietly becomes the central issue.

The promise of AI in clinical voice assessment is real. Models can detect patterns humans miss, process data at scale, and potentially democratize access to expertise. But realizing that promise requires us to be thoughtful about what we're gaining and what we might be losing.

The goal isn't to reject AI. The goal is to use it wisely—which means understanding what question it's implicitly asking, and whether that question applies to the patient in front of us.

🔬 Transparent Voice Analysis

PhonaLab shows you the full picture: explicit acoustic measures (F0, jitter, shimmer, HNR, CPP, AVQI) plus AI-assisted interpretation. See what the measures calculate. Understand what they mean. Make informed clinical decisions.


All acoustic parameters displayed • AI interpretation as supplement • Your clinical judgment central

💭 A Note on This Essay

This piece reflects my thinking as a voice researcher and tool developer. It's not a systematic review or a clinical guideline—it's an attempt to articulate something I think is important as AI becomes more prevalent in our field. I welcome disagreement and discussion. The goal is to think carefully together, not to have the last word.

Further Reading

  • Patel RR, Awan SN, Barkmeier-Kraemer J, et al. (2018). Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel. American Journal of Speech-Language Pathology, 27(3), 887-905.
  • Rudin C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215.
  • Ghassemi M, Oakden-Rayner L, Beam AL. (2021). The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11), e745-e750.
  • Ribeiro MT, Singh S, Guestrin C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144.

Dr. Jorge C. Lucero

Professor of Computer Science, University of Brasília

Dr. Lucero has spent 30+ years researching voice production, from the physics of vocal fold vibration to the mathematical models underlying acoustic analysis. He created PhonaLab to make professional voice analysis accessible while maintaining the transparency he believes clinical tools require. He is currently working on a book about voice acoustics and measurement for clinicians.