Project · 2026

PiVocal

A local-first speech-to-text overlay for Linux that repurposes the microphone button on a Bluetooth remote into a desktop voice input workflow.

Python · Linux · Vosk

Why it exists

PiVocal was born from a concrete annoyance: a Bluetooth remote microphone button that works as expected on Android but does nothing useful on a Linux desktop. The project gives that hardware button a practical behavior: press once to start listening, speak, press again, and the recognized text is typed into the focused application.

Architecture

  • trigger.py runs as root because it needs read access to the Linux input event devices under /dev/input.
  • frontend.py runs as the normal desktop user and owns the GUI, microphone, speech recognition and text insertion.
  • The two processes communicate through a Unix socket at /tmp/stt_trigger.sock.
  • Vosk performs speech recognition locally.
  • wtype is used on Wayland and xdotool on X11 for text insertion.
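
The socket link between the two processes could look roughly like the sketch below. It assumes the frontend listens and the trigger connects, and that a button press travels as a one-line "toggle" message. Neither detail is confirmed by the project, and the function names are illustrative only.

```python
import os
import socket

SOCKET_PATH = "/tmp/stt_trigger.sock"  # path named in the architecture list


def open_listener(path: str = SOCKET_PATH) -> socket.socket:
    """Frontend side (assumed): bind a Unix stream socket and listen."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


def send_event(event: bytes = b"toggle\n", path: str = SOCKET_PATH) -> None:
    """Trigger side (assumed): connect, send one event, then close."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(event)


def receive_event(server: socket.socket) -> str:
    """Frontend side: accept one connection and return its event name."""
    conn, _ = server.accept()
    with conn:
        return conn.recv(64).decode().strip()
```

In this arrangement the frontend would call open_listener() once at startup and loop on receive_event(), while the trigger calls send_event() for each button press.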

Key features

  • Offline speech recognition with Vosk.
  • Split privilege model between hardware trigger and desktop frontend.
  • No clipboard usage for text insertion.
  • Support for Wayland and X11 insertion paths.
  • Installer and uninstaller scripts using systemd/autostart integration.
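
As an illustration of the two insertion paths, the sketch below picks a typing command from the session environment and runs it directly, so the clipboard is never involved. The detection rule (WAYLAND_DISPLAY or XDG_SESSION_TYPE) and the exact arguments passed to each tool are assumptions, not taken from the project, and the function names are hypothetical.

```python
import os
import subprocess


def insertion_command(text, env=None):
    """Return the argv list that would type `text` into the focused window."""
    env = os.environ if env is None else env
    on_wayland = bool(env.get("WAYLAND_DISPLAY")) or env.get("XDG_SESSION_TYPE") == "wayland"
    if on_wayland:
        # wtype types its positional argument as text (assumed invocation).
        return ["wtype", text]
    # xdotool's "type" command sends the text as synthetic key events.
    return ["xdotool", "type", "--clearmodifiers", text]


def type_text(text):
    """Run the selected tool; raises CalledProcessError if it fails."""
    subprocess.run(insertion_command(text), check=True)
```

Keeping command selection separate from execution makes the choice easy to check without a running display server.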

Design decisions

The split between a privileged trigger process and an unprivileged frontend is the core engineering decision. Only the trigger needs root, for low-level input event access; the GUI, microphone, speech recognition and text insertion all stay in the user-facing desktop process.
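
To make the privileged side concrete, here is a minimal sketch of how a process could detect a button press by reading raw events from an input device node. It decodes the kernel's input_event records with the standard struct module. The key code 582 is a placeholder (the real code depends on the remote), and the device path passed to watch() is hypothetical. The project may well use a different approach, such as an evdev library.

```python
import struct

# struct input_event on 64-bit Linux: two C longs for the timestamp,
# then an unsigned 16-bit type, unsigned 16-bit code, signed 32-bit value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01        # event type for key and button events
KEY_PRESSED = 1      # value: 1 = press, 0 = release, 2 = autorepeat
MIC_KEY_CODE = 582   # placeholder; the actual code depends on the remote


def is_mic_press(raw, key_code=MIC_KEY_CODE):
    """Return True if one raw event record is a press of the given key."""
    _sec, _usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, raw)
    return ev_type == EV_KEY and code == key_code and value == KEY_PRESSED


def watch(device_path, on_press):
    """Read events from a device node and call on_press() for each press.

    Opening the node typically requires elevated privileges.
    """
    with open(device_path, "rb", buffering=0) as dev:
        while True:
            raw = dev.read(EVENT_SIZE)
            if len(raw) < EVENT_SIZE:
                break  # device closed or short read
            if is_mic_press(raw):
                on_press()
```

Only press events count here; release and autorepeat events are ignored, so holding the button does not toggle listening repeatedly.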

Links