Why it exists
PiVocal was born from a concrete annoyance: a Bluetooth remote microphone button that works as expected on Android but does nothing useful on a Linux desktop. The project gives that hardware button a practical behavior: press once to start listening, speak, press again, and the recognized text is typed into the focused application.
Architecture
- trigger.py runs as root because it needs access to Linux input events.
- frontend.py runs as the normal desktop user and owns the GUI, microphone, speech recognition and text insertion.
- The two processes communicate through a Unix socket at /tmp/stt_trigger.sock.
- Vosk performs speech recognition locally.
- wtype is used on Wayland and xdotool on X11 for text insertion.
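The socket handoff between the two processes can be sketched as follows. This is a minimal illustration, not PiVocal's actual code: it assumes the frontend binds and listens on the socket while the trigger connects and sends one short message per button press. The source does not say which side listens or what the payload looks like, so serve_once, send_trigger, demo and the "toggle" message are hypothetical names.

```python
import os
import socket
import tempfile
import threading

# Path named in the architecture list; the demo below uses a temporary path instead.
SOCKET_PATH = "/tmp/stt_trigger.sock"


def serve_once(path, on_message, ready):
    """Frontend side (assumed role): accept one connection and hand over its message."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by an earlier run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        ready.set()  # signal that clients may connect now
        conn, _ = server.accept()
        with conn:
            on_message(conn.recv(64).decode("utf-8").strip())


def send_trigger(path, message="toggle"):
    """Trigger side (assumed role): send one short, newline-terminated message."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(message.encode("utf-8") + b"\n")


def demo():
    """Run both sides in one process over a throwaway socket path."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    received = []
    ready = threading.Event()
    listener = threading.Thread(target=serve_once, args=(path, received.append, ready))
    listener.start()
    ready.wait(timeout=5)
    send_trigger(path)
    listener.join(timeout=5)
    return received
```

In a real deployment the two functions would live in separate processes, with the listening side looping over accept() rather than handling a single connection.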
Key features
- Offline speech recognition with Vosk.
- Split privilege model between hardware trigger and desktop frontend.
- No clipboard usage for text insertion.
- Support for Wayland and X11 insertion paths.
- Installer and uninstaller scripts that set up and remove the systemd and autostart integration.
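The two insertion paths listed above could be selected along these lines. This is a sketch under stated assumptions: the source does not describe how PiVocal detects the session type, so the check on the WAYLAND_DISPLAY and XDG_SESSION_TYPE environment variables, and the function names insertion_command and insert_text, are illustrative only.

```python
import os
import subprocess


def insertion_command(text, env=None):
    """Build the command that types `text` into the focused window.

    Assumption: a Wayland session is recognized by WAYLAND_DISPLAY being set
    or XDG_SESSION_TYPE being "wayland"; anything else is treated as X11.
    """
    env = os.environ if env is None else env
    if env.get("WAYLAND_DISPLAY") or env.get("XDG_SESSION_TYPE") == "wayland":
        # wtype types its text arguments; text that starts with "-" may
        # need extra handling that this sketch does not attempt.
        return ["wtype", text]
    # "--" ends option parsing so the text is never read as an option.
    return ["xdotool", "type", "--", text]


def insert_text(text):
    """Type the text with the selected tool; the clipboard is not involved."""
    subprocess.run(insertion_command(text), check=True)
```

Building the command as an argument list, with no shell in between, means recognized speech is passed to the tool verbatim without shell quoting concerns.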
Design decisions
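The split between a privileged trigger process and an unprivileged frontend is the core engineering decision. Only trigger.py, which reads Linux input events, runs as root; the GUI, microphone, speech recognition and text insertion stay in the normal user's desktop session. This keeps low-level input event access separate from the user-facing desktop workflow and limits how much code runs with elevated privileges.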