System Architecture · Microservices · Backend Engineering · Production Migration

Monolith to Microservices: How We Achieved 3x Throughput on a Live Production System

Qalab Hassnain Agha · July 8, 2025 · 13 min read

Microservices migrations have a reputation for being expensive, risky, and often unnecessary. That reputation is earned — most microservices migrations are driven by architectural fashion rather than specific engineering pain.

Ours was driven by a specific, measurable problem: our monolithic backend couldn't handle the throughput we needed without scaling the entire application, even when only one component was under load.

The Starting Point: What Was Wrong with the Monolith

QuickComm started as a monolith. That was the right decision at the time — we needed to move fast, validate the product, and avoid operational overhead before we understood the system's actual load characteristics.

By the time we decided to migrate, the monolith had three components with fundamentally different scaling requirements:

Real-time audio processing: CPU-intensive, latency-sensitive, scales with concurrent active conversations — saturated CPU at peak load while everything else was idle
LLM inference pipeline: GPU-bound, throughput-sensitive, can tolerate 1–2 second latency — completely different scaling requirements from audio processing
REST API and dashboard backend: standard web service load, minimal CPU requirements

In the monolith, scaling any one component meant scaling all of them: adding capacity for audio processing at peak also added idle capacity for the API and the LLM pipeline, and we paid for all of it. The second problem was deployment coupling: a change to the LLM pipeline required redeploying the entire application.
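To make the cost of coupled scaling concrete, here is a minimal sketch comparing the two models. The replica counts and unit costs are purely hypothetical and chosen only to illustrate the arithmetic; they are not figures from our system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    replicas_needed: int  # replicas needed at peak (hypothetical)
    unit_cost: float      # cost of one replica of this component (hypothetical)


def monolith_cost(components: list[Component]) -> float:
    """Every monolith replica bundles all components, so the replica count
    is set by the hungriest component and each replica pays for everything."""
    replicas = max(c.replicas_needed for c in components)
    return replicas * sum(c.unit_cost for c in components)


def independent_cost(components: list[Component]) -> float:
    """Each component scales alone and pays only for its own replicas."""
    return sum(c.replicas_needed * c.unit_cost for c in components)


# Illustrative numbers only: audio needs many cheap CPU replicas,
# the LLM pipeline needs a few expensive GPU replicas.
COMPONENTS = [
    Component("audio", replicas_needed=8, unit_cost=1.0),
    Component("llm", replicas_needed=2, unit_cost=4.0),
    Component("api", replicas_needed=2, unit_cost=0.5),
]
```

With these made-up inputs, `monolith_cost(COMPONENTS)` is 44.0 and `independent_cost(COMPONENTS)` is 17.0. The gap comes entirely from the monolith replicating GPU and API capacity eight times to satisfy the audio workload.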

The Migration Strategy: Strangler Fig

The Strangler Fig pattern is the safest approach to monolith decomposition: extract one service at a time, route traffic to the new service incrementally, and keep the monolith running until each extracted component is stable.
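The incremental routing step could be sketched as a small percentage-based router. This is an illustration under assumptions, not our production routing layer: the class name, the backend labels, and the choice to bucket by a hashed user id are all hypothetical.

```python
import hashlib


class StranglerRouter:
    """Sends a configurable share of traffic for a route prefix to an
    extracted service; everything else still goes to the monolith."""

    def __init__(self) -> None:
        # route prefix -> percent of traffic sent to the new service
        self._rollout: dict[str, int] = {}

    def set_rollout(self, prefix: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[prefix] = percent

    def backend_for(self, path: str, user_id: str) -> str:
        for prefix, percent in self._rollout.items():
            if path.startswith(prefix):
                # Hash the user id so a given user always lands on the
                # same side while the percentage is unchanged.
                digest = hashlib.sha256(user_id.encode()).digest()
                bucket = int.from_bytes(digest[:2], "big") % 100
                return "new-service" if bucket < percent else "monolith"
        # Routes that have not been extracted stay on the monolith.
        return "monolith"
```

Raising the percentage in small steps (for example 5, 25, 50, 100) gives a fast feedback loop, and setting it back to 0 is the rollback.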

The alternative — a big bang rewrite — is almost always the wrong choice for live production systems. The risk is too concentrated, the feedback loop is too slow, and the deployment is too large to debug when something goes wrong.

Migration order (determined by isolation and risk):

Phase 1 — REST API (lowest risk): cleanest boundaries, stateless request handling, established infrastructure patterns for later phases
Phase 2 — LLM inference pipeline (medium risk): clear input/output contracts, stateful caching migrated carefully with Redis Streams as the interface
Phase 3 — Audio processing (highest risk): most complex state — active WebSocket connections, in-flight audio buffers, reconnection state machines

What We Used for Inter-Service Communication

Synchronous (REST/HTTP): for request-response interactions where the caller needs an immediate result — dashboard API queries
Asynchronous (Redis Streams): for pipeline stages where decoupling matters more than latency — audio processing publishes transcripts, LLM service consumes them
WebSockets (direct): for real-time dashboard delivery where message queue indirection overhead is unacceptable
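The asynchronous hand-off between audio processing and the LLM service can be sketched as follows. To keep the example runnable without a Redis server, it uses a toy in-memory stand-in that mimics the add, read-new, and acknowledge semantics of the Redis commands XADD, XREADGROUP, and XACK. The class, the function names, and the message fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InMemoryStream:
    """Toy stand-in for one Redis stream with a single consumer group."""
    entries: list[tuple[int, dict]] = field(default_factory=list)
    next_id: int = 1
    delivered_upto: int = 0                         # last id handed to a consumer
    pending: set[int] = field(default_factory=set)  # delivered but not yet acked

    def add(self, fields: dict) -> int:
        """Append an entry (roughly XADD)."""
        entry_id = self.next_id
        self.entries.append((entry_id, fields))
        self.next_id += 1
        return entry_id

    def read_new(self, count: int) -> list[tuple[int, dict]]:
        """Return never-delivered entries (roughly XREADGROUP with '>')."""
        batch = [e for e in self.entries if e[0] > self.delivered_upto][:count]
        for entry_id, _ in batch:
            self.pending.add(entry_id)
            self.delivered_upto = entry_id
        return batch

    def ack(self, entry_id: int) -> None:
        """Mark an entry as processed (roughly XACK)."""
        self.pending.discard(entry_id)


def publish_transcript(stream: InMemoryStream, conversation_id: str, text: str) -> int:
    # The audio side only appends; it never waits for the LLM side.
    return stream.add({"conversation_id": conversation_id, "text": text})


def consume_transcripts(stream: InMemoryStream, classify: Callable[[str], str]) -> list[str]:
    results = []
    for entry_id, fields in stream.read_new(count=10):
        results.append(classify(fields["text"]))
        stream.ack(entry_id)  # acknowledge only after successful processing
    return results
```

The point of the design is that the producer can keep publishing while the consumer is slow or restarting; unacknowledged entries stay pending instead of being lost.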

What Broke (and What We Learned)

Distributed tracing was not optional

In a monolith, a request failure is a single log line. In microservices, a single user request spans 4–5 services, each with its own logs. Without distributed tracing, debugging becomes archaeological. The first production issue after extraction took 4 hours to debug. We implemented Jaeger tracing immediately afterwards, and the next issue took 20 minutes.
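The core idea behind distributed tracing is that one identifier follows the request across every service, so the scattered logs can be joined back together. The sketch below shows only that idea. It is not the Jaeger API, and the header name, the in-memory log, and the service names are hypothetical.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name for this sketch

LOG: list[dict] = []  # stands in for the separate log stores of each service


def log(service: str, trace_id: str, message: str) -> None:
    LOG.append({"service": service, "trace_id": trace_id, "message": message})


def handle(services: list[str], headers: dict[str, str]) -> str:
    """Simulate services[0] handling a request, then calling the rest in order."""
    service, rest = services[0], services[1:]
    # Reuse the caller's trace id if present; otherwise this is the entry
    # point and a new trace starts here.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    log(service, trace_id, "request received")
    if rest:
        # Propagate the id on the outgoing call.
        handle(rest, {TRACE_HEADER: trace_id})
    return trace_id


def logs_for(trace_id: str) -> list[dict]:
    """Reassemble the path of one request across all services."""
    return [entry for entry in LOG if entry["trace_id"] == trace_id]
```

A real tracing system adds spans, timings, and sampling on top of this, but the debugging win starts with being able to ask for every log line belonging to one request.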

Network latency is a real cost

In a monolith, a function call between components takes microseconds. In microservices, the same interaction is an HTTP call taking milliseconds. We had to budget explicitly for network latency in service SLAs: two function calls that each took under a millisecond became two HTTP calls of 5–15ms each.
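The budgeting arithmetic is simple enough to write down. In the sketch below, the 5–15ms per-hop range and the two hops come from the paragraph above; the base processing time and the SLA budget used in the usage note are hypothetical.

```python
def added_latency_ms(hops: int, per_hop_ms: tuple[float, float]) -> tuple[float, float]:
    """Best-case and worst-case latency added by sequential network hops."""
    low, high = per_hop_ms
    return hops * low, hops * high


def fits_budget(base_ms: float, hops: int,
                per_hop_ms: tuple[float, float], budget_ms: float) -> bool:
    """Check the worst case against the SLA, since SLAs are about tail behaviour."""
    _, worst = added_latency_ms(hops, per_hop_ms)
    return base_ms + worst <= budget_ms
```

For two hops at 5–15ms each, `added_latency_ms(2, (5, 15))` gives `(10, 30)`: between 10 and 30 ms of pure network overhead. With an assumed 60 ms of processing and a 100 ms budget the request still fits; with 80 ms of processing it does not.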

Data consistency is harder than you think

The monolith had a single database with transactions. Microservices have multiple databases with eventual consistency. Cases the monolith handled atomically required explicit saga patterns in the distributed system. We found two data consistency bugs in the first month — both in edge cases that only appeared under high load.
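A saga replaces one atomic transaction with a sequence of local steps, each paired with a compensating action that undoes it. The following is a minimal sketch of that control flow, assuming in-process callables in place of real service calls; the `Step` type and `run_saga` are hypothetical names, not code from our system.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order. If one fails, undo the completed steps in
    reverse order and report failure."""
    completed: list[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True
```

This sketch leaves out what makes sagas hard in production: compensations can themselves fail, steps must be idempotent because they may be retried, and other requests can observe the intermediate state. Edge cases like these tend to surface only under high load.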

The Results

Throughput: 3.2x improvement in concurrent conversation capacity at the same hardware cost
Deployment frequency: from weekly full-application deployments to daily service-level deployments — LLM prompt changes deploy in 4 minutes without touching audio processing
Cost efficiency: independent scaling eliminated over-provisioning — total infrastructure cost reduced by 35% at current load
Reliability: individual service failures no longer take down the entire application; when the LLM service had issues, users saw lower classification accuracy, not a blank screen
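The reliability result depends on callers degrading gracefully when a dependency fails. One way to get that behaviour is a fallback path, sketched below. This is an assumption about how such degradation could be built, not a description of our implementation: the keyword-based fallback, the labels, and the function names are all hypothetical.

```python
from typing import Callable


def classify_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, bool]:
    """Return (label, degraded). If the primary classifier fails, use the
    cheaper fallback instead of propagating the error to the user."""
    try:
        return primary(text), False
    except Exception:
        return fallback(text), True


def keyword_fallback(text: str) -> str:
    # Deliberately crude: lower accuracy, but always available.
    return "complaint" if "refund" in text.lower() else "general"
```

Returning the `degraded` flag alongside the label lets the caller record how often the fallback path is taken, which is a useful health signal for the primary service.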

When Microservices Are Not the Answer

A migration like ours only makes sense when all of the following hold; if any one is missing, a well-structured monolith is the better choice:

The components must have genuinely different scaling requirements: if all components scale together, independent scaling provides no benefit
The team needs operational maturity to manage multiple services: distributed tracing, service discovery, inter-service auth, and separate deployment pipelines
The system needs stable service boundaries: early in a product's life, those boundaries shift frequently and microservices amplify the cost of getting them wrong

Final Thoughts

Microservices migrations succeed when they solve specific, measurable engineering problems, and fail when they are pursued as architectural fashion. Know your scaling bottleneck before you decompose.

Use the Strangler Fig pattern. Implement distributed tracing before you need it. The migration is not the goal. Throughput, reliability, and deployment velocity are the goals.

Frequently Asked Questions

What is the Strangler Fig pattern for microservices migration?

The Strangler Fig pattern extracts one service at a time from a monolith, routes traffic to the new service incrementally, and keeps the monolith running until each extracted component is stable in production. Migration order is determined by isolation and risk — extract the lowest-risk component first to establish infrastructure patterns before touching higher-risk ones.

When should you migrate from a monolith to microservices?

Migrate when you have components with genuinely different scaling requirements (some CPU-bound, some GPU-bound, some I/O-bound) and when deployment coupling is limiting your release frequency. The team must also have operational maturity to manage distributed tracing, service discovery, and separate deployment pipelines. A well-structured monolith is almost always the right starting point for new products.

What are the most common hidden costs of a microservices migration?

Network latency (function calls become 5–15ms HTTP calls per hop), distributed data consistency (transactions become saga patterns), and debugging complexity (distributed tracing is mandatory). We also found two data consistency bugs in the first month after migrating our audio processing component — edge cases that the monolith had handled atomically.