Voice AI · Whisper · Deepgram · Gemini · Real-Time · LLM · Hospitality

How We Replaced Hotel Walkie-Talkies With Real-Time Voice AI

Qalab Hassnain Agha·July 5, 2026·11 min read

Walk any hotel back-of-house and you hear it: constant radio chatter. "Housekeeping, 412 needs towels." "Engineering, pool pump again." Every request exists for exactly as long as the audio hangs in the air — no record, no routing, no accountability, and the guest waits while the right person maybe hears it.

QuickComm replaces that with a pipeline that hears the radio, understands it, and routes it — while keeping the radios themselves, because retraining an entire hotel staff off radios is how projects die. The staff kept talking exactly as before; the system listens. Here is how it works and what it took.

The Pipeline: Audio → Text → Intent → Action

Ingest: PCM audio streams from the radio system into the cloud — continuous, real-time, per channel.
Transcribe: streaming speech-to-text via Whisper and Deepgram converts utterances to text within seconds of being spoken.
Understand: a Gemini LLM classifies each utterance by request type, department, room, and urgency at ~88% intent precision, emitting JSON that conforms to a strict schema.
Route: the structured event fires to the right team instantly over WebSockets — dashboard, app notification, and an audit trail that finally exists.
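
The four stages above can be sketched as a chain of asyncio workers joined by queues. This is a minimal illustration rather than QuickComm's actual code: `transcribe` and `classify` are stand-in stubs for the STT and LLM calls, and `RadioEvent` and its field names are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RadioEvent:
    # Hypothetical event shape; field names are illustrative only.
    channel: str
    text: str = ""
    intent: dict = field(default_factory=dict)


async def transcribe(pcm: bytes) -> str:
    # Stub: a real system would stream PCM to an STT engine here.
    return pcm.decode("utf-8")


async def classify(text: str) -> dict:
    # Stub: a real system would call a constrained LLM here.
    department = "housekeeping" if "towels" in text.lower() else "engineering"
    return {"department": department, "urgency": "normal"}


async def stage(inbox, outbox, work):
    # Generic worker: pull an item, transform it, pass it on.
    # A None item is the shutdown signal and is forwarded downstream.
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)
            return
        await outbox.put(await work(item))


async def run_pipeline(utterances):
    audio_q, text_q, event_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()

    async def to_text(item):
        channel, pcm = item
        return RadioEvent(channel=channel, text=await transcribe(pcm))

    async def to_intent(event):
        event.intent = await classify(event.text)
        return event

    workers = [
        asyncio.create_task(stage(audio_q, text_q, to_text)),
        asyncio.create_task(stage(text_q, event_q, to_intent)),
    ]
    for item in utterances:  # Ingest
        await audio_q.put(item)
    await audio_q.put(None)

    routed = []
    while (event := await event_q.get()) is not None:
        routed.append(event)  # Route: a real system would push over WebSockets
    await asyncio.gather(*workers)
    return routed
```

Because each stage has its own queue, a slow stage backs up only its own inbox, which is also what makes it natural to scale the stages independently later.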

Hard Lesson 1: Radio Audio Is Its Own Species

Most STT benchmarks are recorded on good microphones by people speaking in full sentences. Radio traffic is compressed, clipped at both ends by push-to-talk, and spoken in fragments and hotel shorthand over static. Our first-pass accuracy was sobering. Getting to 94%+ took audio preprocessing (normalisation, band filtering), domain vocabulary hints (room-number patterns, department names, local terms), and running dual STT engines: Whisper and Deepgram disagree in usefully different ways, and confidence-weighted selection between them recovers a meaningful slice of errors.
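
One way to picture confidence-weighted selection is the sketch below. It assumes each engine returns a transcript with a confidence in [0, 1] and that per-engine weights have been tuned offline on labelled radio audio. The `Hypothesis` type, the weight values, and the scoring rule are all illustrative assumptions, not the production logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    engine: str        # e.g. "whisper" or "deepgram"
    text: str          # transcript returned by that engine
    confidence: float  # engine-reported score, assumed to lie in [0, 1]


# Hypothetical weights: how far each engine's self-reported confidence is
# trusted on this kind of audio. Real values would come from evaluation data.
ENGINE_WEIGHTS = {"whisper": 1.0, "deepgram": 0.9}


def weighted_score(h: Hypothesis) -> float:
    # Unknown engines get a neutral weight of 1.0.
    return ENGINE_WEIGHTS.get(h.engine, 1.0) * h.confidence


def select(hypotheses: list[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest weighted confidence."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=weighted_score)


best = select([
    Hypothesis("whisper", "Housekeeping, 412 needs towels", 0.80),
    Hypothesis("deepgram", "House keeping for twelve needs towels", 0.85),
])
print(best.engine, "->", best.text)
```

The weights matter because raw confidences from different engines are not calibrated against each other; without them, the engine that is systematically more optimistic would win by default.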

Hard Lesson 2: Constrain the LLM or It Will Improvise

The classification prompt evolved into a contract: a fixed intent taxonomy, mandatory JSON output validated at the boundary, explicit entity slots, and a confidence field the model must populate. Below the confidence threshold, the event routes to a human dispatcher rather than guessing — an unglamorous fallback that is the difference between a tool staff trust and one they turn off. LLM-as-classifier works; LLM-as-freestyle-interpreter does not.
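
A minimal sketch of that boundary check follows, using only the standard library. The intent names, slot names, and the 0.7 threshold are hypothetical placeholders; the article does not publish the real taxonomy or threshold.

```python
import json
from typing import Optional

# Hypothetical taxonomy, slots, and threshold, for illustration only.
INTENTS = {"housekeeping_request", "maintenance_issue", "guest_service", "security"}
REQUIRED_SLOTS = ("intent", "department", "room", "urgency", "confidence")
CONFIDENCE_THRESHOLD = 0.7


def parse_classification(raw: str) -> dict:
    """Validate raw LLM output; raise ValueError if it breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [slot for slot in REQUIRED_SLOTS if slot not in data]
    if missing:
        raise ValueError(f"missing slots: {missing}")
    if data["intent"] not in INTENTS:
        raise ValueError("intent outside the fixed taxonomy")
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return data


def route(raw: str) -> tuple[str, Optional[dict]]:
    """Return (destination, event); fall back to a human on any doubt."""
    try:
        event = parse_classification(raw)
    except ValueError:
        return "human_dispatcher", None   # malformed output never reaches staff
    if event["confidence"] < CONFIDENCE_THRESHOLD:
        return "human_dispatcher", event  # low confidence: do not guess
    return event["department"], event
```

Both failure modes, malformed output and low confidence, land at the same destination, so the dispatcher queue is the single place where the system's uncertainty becomes visible.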

Hard Lesson 3: The Architecture Migration Paid for Itself

V1 was a monolith — correct choice for shipping fast, wrong choice for scaling to many properties. Audio ingestion, transcription, classification, and delivery have wildly different load profiles; scaling the monolith meant scaling all of them to the peak of the hungriest. Splitting into AWS microservices along those load boundaries delivered 3× throughput at near-zero deployment downtime — and dropped infrastructure cost to roughly $3/month per property. Automated anomaly detection on service health metrics then cut critical-incident response time by 70%: at multi-property scale, the system notices its own problems before staff do.
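
The article does not say which anomaly-detection method is used, so the following is only one plausible shape for it: a rolling z-score over a single health metric such as request latency. The class name, window size, and threshold are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag values that sit far from the mean of a recent window of samples."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)  # recent "normal" samples
        self.threshold = threshold          # z-score above which we alert
        self.min_samples = min_samples      # warm-up before any alerting

    def observe(self, value: float) -> bool:
        """Record a sample and return True if it looks anomalous."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # Perfectly flat baseline: any deviation is notable.
                anomalous = value != mu
        # Learn only from normal samples, so a sustained incident
        # does not quietly become the new baseline.
        if not anomalous:
            self.values.append(value)
        return anomalous


detector = RollingAnomalyDetector()
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100.5, 450]
flags = [detector.observe(v) for v in latencies_ms]
print(flags)
```

In this run only the final 450 ms sample is flagged. A production setup would run one detector per service and metric and would likely need seasonality handling, which this sketch leaves out.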

Results

94%+ transcription accuracy on live, noisy radio audio
~88% intent-classification precision via constrained Gemini prompts
~45% faster staff response times — requests reach the right team instantly, with accountability
3× throughput post-migration · ~$3/month per property · 70% faster incident response

Where This Pattern Applies

Hotels were the wedge, but the shape generalises: any operation coordinating over voice — warehouses, hospitals, security teams, restaurants, events — is running on unstructured audio that could be structured, routed, and measured. The full QuickComm case study lives on my AI consulting services page; if your operation runs on radio chatter and you wonder what it would take, that is a conversation I am always glad to have.

Frequently Asked Questions

How accurate is speech-to-text on walkie-talkie radio audio?

Raw radio audio is brutal: compressed, clipped, full of static, spoken in shorthand. Out of the box, general STT models degrade badly on it. With audio preprocessing, domain vocabulary hints, and a dual-engine strategy (Whisper and Deepgram), our production system sustains 94%+ transcription accuracy on live hotel radio traffic.

Why use an LLM for intent classification instead of rules or a classifier?

Staff phrase the same request a hundred ways, in multiple languages, with names and room numbers embedded. Rules engines rot immediately. A constrained LLM (Gemini) with a strict output schema classifies intent at ~88% precision, extracts the entities (room, urgency, department), and degrades gracefully — anything below the confidence bar routes to a human dispatcher.

What does a voice AI system like this cost to run?

Less than intuition suggests, if cost is designed in: streaming architectures avoid storing and reprocessing audio; per-second STT billing rewards short utterances (radio traffic is naturally brief); a small fast LLM handles classification. After migrating from monolith to microservices, infrastructure runs at roughly $3/month per property with 3× the original throughput.

Code, architecture patterns, and recommendations in this article come from real projects but are shared as-is, without warranty — validate them against your own requirements before production use. See the Terms of Use.
