UltraSpeech
Real-time Ultrasound Tongue Imaging for Speech Therapy
Zero-to-one clinical biofeedback system built on a custom reactive DAG framework (sigflow-rt) that abstracts modalities by tensor shape and runs real-time inference inside the acquisition loop. The end-to-end pipeline captures B-mode ultrasound, audio, and video; runs ONNX inference for DeepLabCut tongue/lip pose, MediaPipe face mesh, and wav2vec2 phoneme transcription; renders live landmarks and a 3D tongue model; and scores tongue pose against UltraSuite typically developing (TD) references via Procrustes alignment. A zero-copy architecture (SIMD, shared memory, triple buffering) with asynchronous inference achieves real-time performance without a GPU.
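The Procrustes scoring step can be illustrated with a minimal sketch. It assumes 2D landmarks given as (x, y) pairs and uses the complex-number form of planar Procrustes analysis, which removes translation, scale, and rotation before measuring residual shape difference. The function names and the choice of disparity as the score are hypothetical and are not taken from the UltraSpeech code.

```python
import math


def _normalize(points):
    """Center 2D landmarks and scale them to unit norm, as complex numbers."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    if norm == 0:
        raise ValueError("degenerate shape: all landmarks coincide")
    return [p / norm for p in z]


def procrustes_disparity(reference, observed):
    """Return (disparity, aligned) for two equally sized 2D landmark lists.

    Illustrative only: disparity is the squared residual after the best
    similarity alignment (no reflection) of `observed` onto `reference`.
    0 means identical shape; values near 1 mean very different shapes.
    """
    if len(reference) != len(observed):
        raise ValueError("landmark counts must match")
    x = _normalize(reference)
    y = _normalize(observed)
    # With unit-norm y, the least-squares complex multiplier (rotation and
    # scale together) is the inner product s = sum(x_k * conj(y_k)).
    s = sum(xk * yk.conjugate() for xk, yk in zip(x, y))
    aligned = [s * yk for yk in y]
    disparity = sum(abs(xk - ak) ** 2 for xk, ak in zip(x, aligned))
    return disparity, aligned
```

A contour that is only shifted, rotated, or rescaled relative to the reference scores close to zero, so the measure responds to tongue shape rather than probe placement or image scale.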
- 12–13 ms median latency on CPU only
- 19,716 clinical inference operations validated
- 14 tongue landmarks at 0.90 confidence
- All streams synchronized via LSL for recording and playback
- Demoed at ASHA Convention 2025 (Auspex Medix exhibitor)
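Synchronized playback depends on matching samples across streams by timestamp. The sketch below shows one way to do that, assuming each stream has already been recorded as a sorted list of timestamps on a shared clock (which is what LSL provides). It does not call the LSL API; the helper names and the `max_gap` tolerance are hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the entry in sorted `timestamps` closest to time t."""
    if not timestamps:
        raise ValueError("empty stream")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    # Prefer the earlier sample on ties.
    return i if (after - t) < (t - before) else i - 1


def align_to_reference(ref_ts, other_ts, max_gap):
    """For each reference timestamp, give the nearest index in the other
    stream, or None when the closest sample is more than max_gap away."""
    matches = []
    for t in ref_ts:
        j = nearest_index(other_ts, t)
        matches.append(j if abs(other_ts[j] - t) <= max_gap else None)
    return matches
```

For example, ultrasound frame times could serve as `ref_ts` and audio-chunk times as `other_ts`, with `max_gap` set to about half a frame period so that dropped samples show up as `None` instead of being silently paired with a distant neighbour.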
sigflow-rt · ONNX · DeepLabCut · MediaPipe · wav2vec2 · LSL