Who is Qalab Hassnain Agha?

Qalab Hassnain Agha (QHA) is a CTO and AI Systems Architect based in Islamabad, Pakistan. He leads Quickgen Technologies and QuickComm AE, with 6+ years building production AI systems including LLM pipelines, computer vision, IoT platforms, and cloud-native backends shipped to clients in Australia, UAE, the UK, and Pakistan.

What AI services does Qalab Hassnain Agha offer?

Qalab offers AI Systems Architecture & Consulting, LLM Pipeline and RAG development (GPT-4, Gemini, Claude, Whisper), Computer Vision systems (YOLOv8, OpenCV), Backend development (FastAPI, microservices, AWS/Azure), and IoT platform development (BLE 5.0, ESP32, MQTT).

What is Qalab Hassnain Agha's tech stack?

Primary stack: Python, FastAPI, TensorFlow, Keras, YOLOv8, OpenCV, LLMs (GPT-4, Gemini, Claude), AWS, Azure, Docker, PostgreSQL, Redis, WebSockets. Also works with Next.js, Flutter, .NET Core, and IoT (BLE 5.0, ESP32, MQTT).

Where is Qalab Hassnain Agha based and does he work remotely?

Qalab is based in Islamabad, Pakistan and works remotely with international clients. He has delivered projects for clients in Australia, UAE, the UK, and Pakistan, and is open to remote, hybrid, or relocation opportunities.

How can I hire Qalab Hassnain Agha for an AI project?

You can contact Qalab via email at support@qalabagha.com, book a 30-minute call on Calendly, or reach him on LinkedIn (linkedin.com/in/qalabhassnainagha) and Upwork. He is currently available for new projects and consultations.

RAG · Fine-Tuning · LLM · AI Consulting · Vector Databases

RAG vs Fine-Tuning: What I Tell Clients Who Want "ChatGPT for Their Data"

Qalab Hassnain Agha · July 5, 2026 · 10 min read


Roughly half of the AI consulting inquiries I receive contain the same sentence: "we basically want ChatGPT, but trained on our data." The word "trained" is doing a lot of damage in that sentence, because what the client almost always needs is not training at all.

I have shipped both approaches in production: a hybrid-retrieval RAG system for research papers (PaperIntel) and fine-tuned models where behaviour mattered more than knowledge. Here is the framework I walk clients through, and the failure modes each path hides.

The Core Misunderstanding: Fine-Tuning Is Not Memory

Fine-tuning adjusts a model’s weights on your examples. It is excellent at teaching behaviour — tone, format, domain vocabulary, output structure. It is unreliable at teaching facts. A model fine-tuned on your product docs will confidently mix your pricing from 2024 with a hallucinated feature, and you will have no way to trace where either came from.

Retrieval-augmented generation flips this: the facts live outside the model in a search index, get fetched per question, and are pasted into the prompt as context. The model’s job shrinks from "know everything" to "read these passages and answer" — a job current LLMs are genuinely good at.
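As a minimal sketch of that flow, and not the code behind any production system mentioned here, the example below fetches passages per question and pastes them into the prompt. The word-overlap retriever, the function names, and the sample documents are all hypothetical stand-ins; a real system would use a search index and send the prompt to an LLM.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Toy retriever (assumption): rank passages by word overlap with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Paste the retrieved passages into the prompt as numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the sources below. "
        "Cite the source number for each claim.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical documents, invented purely for illustration.
docs = [
    "The Pro plan costs 49 USD per month.",
    "Support is available on weekdays.",
    "The Pro plan includes API access.",
]
question = "How much does the Pro plan cost?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

Updating a fact here means editing an entry in `docs`; no model weights are involved, which is the property the rest of this article leans on.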

The Decision Framework I Use With Clients

Does the knowledge change weekly or faster? → RAG. Re-indexing a document takes seconds; re-training takes a pipeline.
Do answers need citations — legal, medical, research, support? → RAG. Fine-tuned weights cannot point to a source; retrieval can, passage by passage.
Is the problem tone, format, or style consistency? → Fine-tuning, but try honest prompt engineering first; it is cheaper and often sufficient.
Chasing lower latency or per-token cost with a small model? → Fine-tuning. A small model tuned on task-specific data is the legitimate win here.
Both problems at once? → RAG for facts, then a light fine-tune for behaviour, in that order.
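The questions above can be read as a small decision function. The sketch below is one possible encoding; the `Requirements` fields and the `recommend` function are hypothetical names chosen for illustration, and a real engagement would weigh more than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist mirroring the four questions above."""
    knowledge_changes_weekly: bool = False
    needs_citations: bool = False
    needs_style_consistency: bool = False
    needs_small_fast_model: bool = False


def recommend(req: Requirements) -> list[str]:
    """Return the suggested approaches in the order they should be attempted."""
    steps: list[str] = []
    # Fast-changing knowledge or a citation requirement both point to retrieval.
    if req.knowledge_changes_weekly or req.needs_citations:
        steps.append("RAG")
    # Style problems get prompt engineering before any training run.
    if req.needs_style_consistency:
        steps.append("prompt engineering")
    # Behaviour or latency/cost goals are where fine-tuning earns its place.
    if req.needs_style_consistency or req.needs_small_fast_model:
        steps.append("fine-tuning")
    return steps


print(recommend(Requirements(needs_citations=True, needs_style_consistency=True)))
```

The order of the returned list matters: facts first, behaviour second.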

Why RAG Demos Impress and RAG Systems Disappoint

The naive pipeline (chunk documents, embed, cosine-similarity search, stuff the prompt) demos beautifully and then fails on real questions. When I built PaperIntel, a research assistant answering questions over academic PDFs, closing the gap between demo and dependable took four upgrades:

Hybrid retrieval: dense vectors miss exact terms (part numbers, method names, citations); BM25 keyword search catches them. Fusing both is the single biggest quality jump.
Reranking: retrieve 30 candidates cheaply, then let a cross-encoder pick the best 5. Precision in the prompt beats volume in the prompt.
Query decomposition: real users ask multi-hop questions ("how does X compare to Y on Z?"). Splitting them into sub-queries and retrieving per hop is what makes those answerable.
Citation-aware generation: forcing the model to attribute each claim to a retrieved passage turns "trust me" into "check source 3" — which is what makes users actually adopt the tool.
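For the hybrid retrieval step, one common way to fuse a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant `k = 60`; the text above does not say which fusion method PaperIntel uses, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; each hit adds 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a vector search and a BM25 keyword search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_d", "doc_b", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])

# Keep a generous candidate pool for the reranking stage to narrow down.
candidates = fused[:30]
print(candidates)
```

Documents found by both retrievers rise to the top, while a document found by only one still stays in the candidate pool for the reranker to judge.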

What Each Actually Costs

RAG’s costs are infrastructure: a vector store, an embedding pipeline, and retrieval logic. Fine-tuning’s costs are process: dataset curation (the part everyone underestimates), training runs, evaluation, and repeating all three every time the knowledge shifts. In my experience the RAG stack is boring, predictable spend; the fine-tuning loop is where timelines quietly die.

The Bottom Line

If your sentence contains "our documents," you want RAG. If it contains "our voice" or "our format," you want prompting first and fine-tuning second. If it contains both, build RAG, then tune. And whichever you pick, benchmark on your own questions — not the vendor’s demo set. This decision is exactly what the audit phase of my AI consulting engagements settles; the first call is free.
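Benchmarking on your own questions can start very small. The sketch below measures recall@k for a retriever over a hand-labelled question set; the `fake_retriever`, the questions, and the document IDs are hypothetical placeholders for your own retriever and data.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant doc ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(
    retriever: Callable[[str], list[str]],
    test_set: list[tuple[str, set[str]]],
    k: int = 5,
) -> float:
    """Mean recall@k over (question, relevant doc ids) pairs."""
    scores = [recall_at_k(retriever(q), relevant, k) for q, relevant in test_set]
    return sum(scores) / len(scores)


# Hypothetical labelled questions; in practice, collect these from real users.
test_set = [
    ("What is the refund policy?", {"doc_refunds"}),
    ("What are the API rate limits?", {"doc_limits", "doc_api"}),
]

# Stand-in retriever with canned answers, only to make the example runnable.
canned = {
    "What is the refund policy?": ["doc_refunds", "doc_pricing"],
    "What are the API rate limits?": ["doc_api", "doc_pricing"],
}


def fake_retriever(question: str) -> list[str]:
    return canned[question]


mean_recall = evaluate(fake_retriever, test_set, k=5)
print(f"mean recall@5: {mean_recall:.2f}")
```

Swapping `fake_retriever` for each candidate pipeline gives a like-for-like number on the questions that matter to you.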

Frequently Asked Questions

Is RAG or fine-tuning better for answering questions over company documents?

RAG, almost always. Retrieval-augmented generation fetches the relevant passages at question time, so answers stay current as documents change and every claim can be cited back to its source. Fine-tuning bakes information into weights — it is slow to update, cannot cite sources, and does not reliably memorise facts anyway.

Is fine-tuning cheaper than RAG?

At query time it can be — a fine-tuned small model can undercut a large model plus retrieval. But fine-tuning has upfront costs RAG does not: dataset preparation, training runs, evaluation, and re-training every time knowledge changes. For most document Q&A workloads, RAG on a mid-tier model is the cheaper total system.

Can you combine RAG and fine-tuning?

Yes, and mature systems often do: RAG supplies the facts, while a light fine-tune (or good few-shot prompting) fixes tone, format, and domain vocabulary. Do RAG first — it solves the correctness problem, which is the one that kills projects.

Code, architecture patterns, and recommendations in this article come from real projects but are shared as-is, without warranty — validate them against your own requirements before production use. See the Terms of Use.


Available for Consulting

Let's build something that matters.

I take on a select number of project-based consulting engagements per quarter — from architecture reviews and LLM pipeline audits to full production builds.

AI Systems · Computer Vision · LLM Pipelines · MLOps · IoT & BLE


80+ clients · 14+ production systems · Remote / Islamabad