PiVocal — Ciro Ciampaglia

Why it exists

PiVocal grew out of a concrete annoyance: a Bluetooth remote microphone button that works as expected on Android but does nothing useful on a Linux desktop. The project gives that hardware button a practical job. Press it once to start listening, speak, press it again, and the recognized text is typed into the focused application.

Architecture

trigger.py runs as root because it needs access to Linux input events.
frontend.py runs as the normal desktop user and owns the GUI, microphone, speech recognition and text insertion.
The two processes communicate through a Unix socket at /tmp/stt_trigger.sock.
Vosk performs speech recognition locally.
wtype is used on Wayland and xdotool on X11 for text insertion.
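
The choice between the two insertion tools can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names build_insert_command and type_text are hypothetical, the session detection through XDG_SESSION_TYPE and WAYLAND_DISPLAY is an assumption, and both tools are called in their simplest form with no extra flags.

```python
import os
import subprocess


def build_insert_command(text, env=None):
    """Return the argv list that would type `text` into the focused window."""
    env = os.environ if env is None else env
    # Assumption: a Wayland session is recognized from these two variables.
    on_wayland = (
        env.get("XDG_SESSION_TYPE") == "wayland"
        or bool(env.get("WAYLAND_DISPLAY"))
    )
    if on_wayland:
        return ["wtype", text]          # wtype takes the text as an argument
    return ["xdotool", "type", text]    # xdotool's "type" subcommand on X11


def type_text(text):
    """Run the selected tool; raises CalledProcessError if it fails."""
    subprocess.run(build_insert_command(text), check=True)
```

Because the text is passed as a command argument and typed as synthetic key events, nothing has to go through the clipboard.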

Key features

Offline speech recognition with Vosk.
Split privilege model between hardware trigger and desktop frontend.
Text insertion that types directly into the focused application, without touching the clipboard.
Support for Wayland and X11 insertion paths.
Installer and uninstaller scripts using systemd/autostart integration.

Design decisions

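The split between a privileged trigger process and an unprivileged frontend is the core engineering decision. Only trigger.py, which reads Linux input events, runs as root. The GUI, microphone, speech recognition and text insertion stay in frontend.py under the normal desktop user, so the user-facing desktop workflow never needs elevated privileges.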
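
A minimal sketch of how two such processes could talk over a Unix socket is shown here. Several details are assumptions rather than facts from the project: that the frontend listens and the trigger connects, that the message is a single "toggle" line, and the function names serve_once, send_trigger and demo. Only the socket path comes from the architecture description. The demo uses a temporary path so it can run without root.

```python
import os
import socket
import tempfile
import threading

SOCKET_PATH = "/tmp/stt_trigger.sock"  # path named in the architecture


def serve_once(path, on_message, ready=None):
    """Unprivileged side: accept one connection and hand over its message."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        if ready is not None:
            ready.set()  # signal that clients may connect now
        conn, _ = server.accept()
        with conn:
            on_message(conn.recv(64).decode("utf-8").strip())
    os.unlink(path)


def send_trigger(path, message="toggle"):
    """Privileged side: report a button press (hypothetical message format)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(message.encode("utf-8") + b"\n")


def demo():
    """Run both sides in one process, using a temporary socket path."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    received = []
    ready = threading.Event()
    listener = threading.Thread(
        target=serve_once, args=(path, received.append, ready)
    )
    listener.start()
    ready.wait(timeout=5)
    send_trigger(path)
    listener.join(timeout=5)
    return received
```

In this arrangement the privileged side only ever sends a short fixed message, so no audio, text or GUI state crosses the privilege boundary.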

Links

Source Code