Quantle: fair and honest presentation coach in your pocket
Great public speakers are made, not born. Practicing a presentation in front of colleagues is common practice and yields a set of subjective judgements about what could be improved. In this paper we describe the design and implementation of a mobile app which estimates the quality of a speaker’s delivery in real time in a fair, repeatable and privacy-preserving way. Quantle estimates the speaker’s pace in terms of the number of syllables, words and clauses, and computes pitch and the duration of pauses. These basic parameters are then used to estimate the talk complexity based on readability scores from the literature, helping the speaker adjust their delivery to the target audience. In contrast to the speech-to-text-based methods used to implement a digital presentation coach, Quantle does all processing locally in real time and works in flight mode. This design has three implications: (1) Quantle does not interfere with the surrounding hardware, (2) it is power-aware, since 95.2% of the energy used by the app on an iPhone 6 is spent operating the built-in microphone and the screen, and (3) audio data and processing results are not shared with a third party, thereby preserving the speaker’s privacy.
We evaluate Quantle on artificial, online and live data. We artificially modify an audio sample by changing the volume, speed, tempo, pitch and noise level to test the robustness of Quantle and its performance limits. We then test Quantle on 1017 TED talks given in English and compare the computed features to those extracted from the available transcripts processed by online text evaluation services. Quantle’s estimates of syllable and word counts are 85.4% and 82.8% accurate, respectively, and pitch is over 90% accurate. We use the outcome of this study to extract typical ranges for each vocal characteristic. We then use Quantle on live data at a social event, and as a tool for speakers to track their delivery when rehearsing a talk. Our results confirm that Quantle is robust to different noise levels, varying distances from the sound source and phone orientation, and achieves performance comparable to speech-to-text methods.
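To illustrate how pace and talk complexity can be derived from the basic counts described above, the following minimal sketch computes per-minute rates and two standard readability scores, Flesch Reading Ease and Flesch-Kincaid Grade Level. It is not Quantle's implementation: the abstract does not state which readability scores are used, and the names SpeechCounts, rate_per_minute, flesch_reading_ease and flesch_kincaid_grade are hypothetical. The clause count is used as a stand-in for the sentence count, which is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeechCounts:
    """Hypothetical container for the basic parameters of one talk segment."""
    syllables: int
    words: int
    clauses: int      # assumption: used in place of the sentence count
    seconds: float    # duration of the analysed audio


def rate_per_minute(count: int, seconds: float) -> float:
    """Pace expressed as units (syllables, words or clauses) per minute."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * count / seconds


def _ratios(c: SpeechCounts) -> tuple[float, float]:
    """Return (words per clause, syllables per word)."""
    if c.words <= 0 or c.clauses <= 0:
        raise ValueError("word and clause counts must be positive")
    return c.words / c.clauses, c.syllables / c.words


def flesch_reading_ease(c: SpeechCounts) -> float:
    """Standard Flesch Reading Ease formula; higher means easier."""
    words_per_clause, syllables_per_word = _ratios(c)
    return 206.835 - 1.015 * words_per_clause - 84.6 * syllables_per_word


def flesch_kincaid_grade(c: SpeechCounts) -> float:
    """Standard Flesch-Kincaid Grade Level formula (US school grade)."""
    words_per_clause, syllables_per_word = _ratios(c)
    return 0.39 * words_per_clause + 11.8 * syllables_per_word - 15.59


# Illustrative numbers only, not measurements from the paper.
sample = SpeechCounts(syllables=300, words=200, clauses=20, seconds=120.0)
words_per_minute = rate_per_minute(sample.words, sample.seconds)
syllables_per_minute = rate_per_minute(sample.syllables, sample.seconds)
reading_ease = flesch_reading_ease(sample)
grade_level = flesch_kincaid_grade(sample)
```

Because both scores depend only on ratios of counts, they can be recomputed continuously as the counts accumulate, without a transcript.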