
From AI-Integrated Systems to AI Platform Architect

What It Actually Takes and What I Built to Find Out

After 20 years in tech, you'd think at some point you get to sit back and say, "I know this stuff."

And then the AI stack explodes overnight and makes half your architecture knowledge feel like ancient history. 😅

The thing about something new showing up is that you have a choice: ignore it, side-step it, wait for it to mature, or take a peek. Just a peek.

The problem with peeking is that it pulls you in. One question leads to another. One repo leads to six. Two weeks later, you have a production-grade agentic AI platform, a full evaluation suite, and a very strong opinion about chunking strategies.

Curiosity is dangerous like that. I highly recommend it.


First, Some Context

I'm not new to AI. Over the past decade, I've shipped production systems that use AI: facial recognition and RFID tracking for a student safety system deployed in 50+ schools, augmented reality with ML Kit that lets users try on jewellery through a mobile app, and predictive analytics pipelines for healthcare operations.

But there's a real difference between integrating AI as a component and architecting the AI layer itself.

  1. Using AI as a component – calling a vision API, embedding a model into a mobile app, wiring a prediction service into a pipeline. AI is one ingredient in a larger system. You're a chef who uses a powerful appliance.

  2. Architecting the AI layer itself – designing the orchestration runtime, the retrieval pipeline, the tool interfaces, the evaluation framework, the observability stack. You're the one who builds the appliance.

Most engineers have done #1. Production AI platforms need #2.

I had deep experience with the first. The modern LLM and agentic stack required the second – and I knew it. So I treated it the way I treat any new infrastructure layer I need to own: build from primitives first, understand the tradeoffs, then use the frameworks with intention.


The Rules I Set

Two weeks. One rule: no claiming, only demonstrating.

  • Build from primitives before using frameworks β€” understand what the abstractions hide

  • Every component production-grade β€” proper error handling, logging, CI/CD, Docker, deployment config

  • Evaluate with real metrics β€” not "it seemed to work in the demo."

  • Everything public on GitHub β€” no hiding behind "I can't share the code."

Two weeks. Six repos. One flagship platform.


What I Built and Why in That Order

I started at the foundation and worked up:

llm-chat-api – baseline chat service with multi-provider abstraction across OpenAI, Anthropic, and Gemini. Before building anything agentic, I needed to understand the differences among providers at the API level and design a clean abstraction layer. Boring? Yes. Essential? Also yes.
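To make the provider-abstraction idea concrete, here's a minimal sketch of the pattern. The class and function names are illustrative, not the repo's actual API, and the vendor calls are stubbed out:

```python
from abc import ABC, abstractmethod

class ChatProvider(ABC):
    """Common interface so callers never depend on a specific vendor SDK."""

    @abstractmethod
    def chat(self, prompt: str) -> str: ...

class OpenAIProvider(ChatProvider):
    def chat(self, prompt: str) -> str:
        # A real implementation would call the OpenAI SDK here.
        return f"[openai] {prompt}"

class AnthropicProvider(ChatProvider):
    def chat(self, prompt: str) -> str:
        # A real implementation would call the Anthropic SDK here.
        return f"[anthropic] {prompt}"

PROVIDERS = {"openai": OpenAIProvider, "anthropic": AnthropicProvider}

def make_provider(name: str) -> ChatProvider:
    """Factory: swapping vendors becomes a config change, not a code change."""
    return PROVIDERS[name]()
```

The payoff is that everything above this layer – agents, RAG, evaluation – talks to `ChatProvider` and never to a vendor SDK directly.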

rag-api – a complete RAG pipeline built entirely from primitives. Document loader, chunker, embedder, retriever – all built manually before touching LlamaIndex. This was the most valuable exercise in the entire two weeks. Building from scratch forces you to understand why chunking strategy matters, what retrieval actually does, and exactly where things break. You don't get that from a framework tutorial.
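As an illustration of those primitives, here's a toy end-to-end pipeline: fixed-size chunking with overlap, a stand-in bag-of-characters embedding where a real system would call an embedding model, and cosine-similarity retrieval. All names and sizes are hypothetical:

```python
import math

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Fixed-size character chunking with overlap: the simplest baseline."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dims: int = 64) -> list[float]:
    """Toy character-frequency embedding, L2-normalized.

    Real pipelines replace this with a model like text-embedding-3-small.
    """
    vec = [0.0] * dims
    for ch in text.lower():
        vec[ord(ch) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Cosine-similarity top-k over chunk embeddings."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]
```

Even at this toy scale, the knobs that matter in production are visible: chunk size and overlap decide what a retrieval hit can contain, and the embedding quality decides whether the right chunk surfaces at all.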

semantic-search-api – embeddings, ChromaDB, hybrid search, health dashboard. A standalone search service you can drop into any system.
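Hybrid search is easier to reason about with a sketch. This toy version blends a lexical term-match score with a precomputed vector-similarity score per document; the weighting scheme and function names are illustrative, not the service's actual code:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (toy lexical score)."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def hybrid_rank(query: str, docs: list[str],
                vector_scores: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Rank documents by a weighted blend of lexical and vector scores.

    alpha=1.0 is pure semantic search; alpha=0.0 is pure keyword search.
    """
    scored = [
        (alpha * vector_scores[d] + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, reverse=True)]
```

The blend matters because embeddings miss exact identifiers (error codes, ticket numbers) that keyword matching catches, and keywords miss paraphrases that embeddings catch.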

ai-service-kit – a shared library with 102 passing tests, 2-level provider fallback, deterministic mock providers (SHA-256 seeding), and cloud logging abstraction for AWS, Azure, GCP, and Datadog. Paired with ai-service-template so every new service starts production-ready on day one. Turns out LLMs don't change the rules of good software engineering. Factory patterns, provider abstraction, mock layers – still very much needed. More on this later.
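The deterministic-mock idea is simple enough to sketch: hashing the prompt with SHA-256 picks a canned reply, so the same prompt always yields the same response and tests are repeatable with no network calls. The fallback wrapper shows the shape of the 2-level pattern. Class names and canned replies here are hypothetical, not the kit's actual API:

```python
import hashlib

class MockProvider:
    """Deterministic fake LLM: identical prompts always get identical replies."""

    CANNED = ["Acknowledged.", "Here is a summary.", "I need more context."]

    def chat(self, prompt: str) -> str:
        # SHA-256 of the prompt gives stable pseudo-randomness across runs.
        digest = hashlib.sha256(prompt.encode()).digest()
        return self.CANNED[digest[0] % len(self.CANNED)]

class FallbackProvider:
    """Try providers in order; return the first successful response."""

    def __init__(self, providers):
        self.providers = providers

    def chat(self, prompt: str) -> str:
        last_error = None
        for p in self.providers:
            try:
                return p.chat(prompt)
            except Exception as e:  # real code would log and classify the error
                last_error = e
        raise RuntimeError("all providers failed") from last_error
```

Determinism is the whole point: a mock that answers randomly makes CI flaky, while a seeded mock lets you assert exact outputs.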

agents-api – a custom "ReAct" multi-agent system built deliberately without LangGraph. Planner → Worker → Reviewer pattern, model routing, semantic caching, guardrails, PII masking, Prometheus metrics. I built this before the flagship to understand the agent loop mechanics before abstracting them.
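Here's the Planner → Worker → Reviewer loop stripped to its skeleton, with hard-coded stand-ins where the real system prompts an LLM at each role. This is the shape of the pattern, not the repo's implementation:

```python
def planner(task: str) -> list[str]:
    """Break the task into steps; a real planner prompts an LLM for this."""
    return [f"research: {task}", f"draft: {task}"]

def worker(step: str) -> str:
    """Execute one step; a real worker would call tools or an LLM."""
    return f"done({step})"

def reviewer(results: list[str]) -> bool:
    """Approve only if every step produced output; otherwise the loop retries."""
    return all(r.startswith("done(") for r in results)

def run_agent(task: str, max_rounds: int = 3) -> list[str]:
    """Minimal plan -> work -> review loop with a bounded retry budget."""
    for _ in range(max_rounds):
        results = [worker(s) for s in planner(task)]
        if reviewer(results):
            return results
    raise RuntimeError("reviewer never approved")
```

The bounded retry budget is the part that bites in production: without it, a reviewer that never approves turns into an infinite (and expensive) loop.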

agentic-ai-platform – the flagship. LangGraph for stateful orchestration, LlamaIndex for RAG, Model Context Protocol (MCP) for tool standardization, LangSmith for observability, and RAGAS for evaluation. An IT Support AI Agent use case with a live Human-in-the-Loop (HITL) approval demo.
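The HITL idea reduces to a small gate: before executing a sensitive tool call, the agent pauses and asks a human. This sketch uses a plain callback where LangGraph would instead interrupt the graph and resume with the human's input; the names are illustrative:

```python
from typing import Callable

def hitl_gate(action: dict, approve: Callable[[dict], bool]) -> dict:
    """Execute an action only after human approval when it's flagged sensitive.

    `approve` stands in for a real approval channel (Slack button, web UI,
    or a LangGraph interrupt that resumes with the human's decision).
    """
    if action.get("sensitive") and not approve(action):
        return {"status": "rejected", "action": action["name"]}
    return {"status": "executed", "action": action["name"]}
```

In an IT support context, the split is intuitive: looking up documentation runs straight through, while resetting a user's password waits for a human click.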

All repos are at github.com/manisundaram. Full portfolio at hellomani.com/page/work.


Meet the Stack

Before I go deep on each component, here's the cast of characters. Each gets their own episode – but it helps to know who's who before the show starts.

If you come from the REST API and cloud architecture world, which is where I come from, these mappings will feel familiar:

| Component | What It Does | Familiar Equivalent |
| --- | --- | --- |
| LLM | The knowledge engine that generates responses | A very smart API endpoint |
| RAG | Gives the LLM access to current, specific data | The DB call your API makes before responding |
| Vector Database | Stores and searches data by meaning, not value | RDS, but you query by similarity |
| Prompt Engineering | How you structure the question matters | Your API request payload |
| AI Agent | Orchestrates tools and decisions autonomously | Backend service making multiple API calls |
| LangGraph | Manages complex agent workflows and state | AWS Step Functions for AI |
| MCP | Standardized interface for agent tools | REST API contracts for AI tools |
| RAGAS | Evaluates whether your AI actually works | NUnit and Selenium for AI |
| LangSmith | Full observability into what your AI is doing | CloudWatch for your AI layer |
| ai-service-kit | Shared library, abstractions, factory patterns | Your internal SDK |

Each of these is a distinct discipline. Some will feel immediately familiar. Some will feel new. All of them matter if you're building AI systems that actually hold up in production.


The Numbers

I evaluated the flagship with RAGAS. Real numbers, not marketing:

| Metric | Score |
| --- | --- |
| Agent Task Completion | 1.00 |
| Tool Call Accuracy | 1.00 |
| RAG Faithfulness | 1.00 |
| Hallucination Rate | 0.10 |
| Overall RAGAS Score | 0.613 |

The agent does what it's supposed to do, uses tools correctly, and doesn't hallucinate against retrieved context. Good.

The overall score of 0.613 is pulled down by context precision (0.067) and context recall (0.20). My retrieval strategy needs work, specifically the chunking approach and query rewriting. I know exactly where the system is weak. That's the point of measuring.

A system that "seems to work" tells you nothing. A score of 0.067 tells you exactly where to focus next.
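For intuition, here are order-agnostic simplifications of the two weak metrics. RAGAS's actual versions are LLM-judged (and context precision is rank-weighted); this is only the shape of the idea:

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunks that are actually relevant.

    A low score means the retriever pads the context with noise.
    """
    if not retrieved:
        return 0.0
    return sum(c in relevant for c in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of the relevant chunks the retriever managed to surface.

    A low score means the answer's supporting evidence never reached the LLM.
    """
    if not relevant:
        return 1.0
    return sum(c in retrieved for c in relevant) / len(relevant)
```

Seen this way, a precision of 0.067 with recall of 0.20 says both things at once: most of what I retrieve is noise, and most of what I need never gets retrieved. Both point back at chunking and query rewriting.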


Why This Matters Beyond the Repos

The honest answer is I built this to close a gap between what I could claim and what I could demonstrate. In this market, that gap matters.

But the process gave me something more valuable than repos. Real opinions about real tradeoffs. LangGraph vs custom loops. LlamaIndex vs primitives. When HITL adds value vs when it adds friction. What RAGAS measures and what it misses.

Those opinions only come from building.

Between ignoring the new thing and peeking at it, I know which one keeps life interesting.


Next up: What is RAG – your LLM's first stop for current information?

