March 7, 2025
The top feature request from developers building with WhisperKit has been on-device speaker diarization, the task of identifying "who spoke when". Responding to popular demand, we built SpeakerKit, the newest addition to the Argmax SDK family of on-device inference frameworks.
Highlights
Benchmarks
We also built SDBench, a Python toolkit for reproducibly benchmarking speaker diarization systems on 13+ widely used datasets. It follows standardized procedures so that systems can be compared apples-to-apples and their tradeoffs understood in fine detail. Due to conference submission restrictions, the code will be open-sourced and the accompanying paper published in April.
Roadmap
Commercial use cases for diarization generally involve diarizing transcripts, i.e. "who spoke what and when". Having attained state-of-the-art standalone diarization quality with SpeakerKit (as measured by Diarization Error Rate, DER), our next focus is to reach the same level of quality for diarized transcripts (as measured by Word Diarization Error Rate, WDER) by optimizing how WhisperKit and SpeakerKit are used together.
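To make the difference between the two metrics concrete, here is a minimal sketch of how each is computed. It is a simplified illustration, not SDBench's or SpeakerKit's implementation: it assumes the DER error components (false alarm, missed speech, speaker confusion, in seconds) have already been measured after optimally mapping hypothesis speakers to reference speakers, and that for WDER the transcript words are already aligned one-to-one with reference words and carry mapped speaker labels. The function names are hypothetical.

```python
from typing import Sequence


def diarization_error_rate(
    false_alarm: float, missed: float, confusion: float, total_ref_speech: float
) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech.

    All durations are in seconds; speaker labels are assumed to be optimally
    mapped to reference speakers before `confusion` is measured.
    """
    if total_ref_speech <= 0:
        raise ValueError("total reference speech must be positive")
    return (false_alarm + missed + confusion) / total_ref_speech


def word_diarization_error_rate(
    ref_speakers: Sequence[str], hyp_speakers: Sequence[str]
) -> float:
    """Simplified WDER: fraction of aligned words attributed to the wrong speaker.

    Assumes one speaker label per word, words already aligned between the
    reference and hypothesis transcripts, and labels already mapped.
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("word sequences must be aligned one-to-one")
    if not ref_speakers:
        raise ValueError("need at least one word")
    wrong = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return wrong / len(ref_speakers)


if __name__ == "__main__":
    # 4 s of errors over 40 s of reference speech -> DER of 0.1
    print(diarization_error_rate(2.0, 1.5, 0.5, 40.0))
    # 1 of 5 words attributed to the wrong speaker -> WDER of 0.2
    print(word_diarization_error_rate(["A", "A", "B", "B", "A"], ["A", "B", "B", "B", "A"]))
```

DER is time-weighted and ignores what was said, while WDER counts words, so a system can score well on one and poorly on the other; this is why diarized transcripts need to be optimized as a joint WhisperKit plus SpeakerKit pipeline.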
SpeakerKit is more than just diarization. A major upcoming feature is speaker identification: Extracting voiceprints for a given speaker and identifying them in novel contexts.
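As a rough illustration of the general idea behind speaker identification, the sketch below treats a voiceprint as a fixed-length embedding vector and identifies a new segment by cosine similarity against enrolled voiceprints. This is an assumption about the common approach, not SpeakerKit's API or method; the function names, the 0.7 acceptance threshold, and the toy 3-dimensional vectors are all hypothetical (real speaker embeddings typically have hundreds of dimensions).

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identify_speaker(
    query: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if no match is close enough."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(query, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    voiceprints = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(identify_speaker([0.9, 0.1, 0.0], voiceprints))  # close to alice
    print(identify_speaker([0.0, 0.0, 1.0], voiceprints))  # unknown speaker
```

The threshold lets the system answer "none of the enrolled speakers" instead of always forcing a match, which matters when a voiceprint is reused in a recording where that speaker may not be present.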
Availability
We appreciate the 100+ applications we received for our Early Access Program (EAP). Due to engineering resource constraints, we were only able to grant access to a fraction of applicants, including MacWhisper and Detail. The EAP ends today, and SpeakerKit Pro joins the Argmax SDK.
Argmax SDK is available with a license subscription for your application starting today!