service / 04 ai solutions

AI that earns
its keep.

AI for us means workflows that demonstrably save money, not demos that wow a board meeting. We build retrieval-grounded assistants on the Anthropic Claude API and OpenAI, classification pipelines for unstructured documents, and evaluation harnesses so you can tell whether a model upgrade actually broke production. We measure cost-per-task, latency, and accuracy against a held-out set before we recommend rolling anything out.

— ch. 01 / what you get

What you get

RAG pipelines that cite their sources

Document ingestion (PDFs, Word, scanned receipts via Tesseract), chunking, embeddings via OpenAI or local models, and a retriever that returns source paragraphs alongside every answer. No hallucinated quotes — every assertion links to a source.

Document classification + extraction

Claude or GPT-4o-mini for structured extraction from invoices, KYC documents, contracts. Output is typed JSON, validated against your schema, with a human-in-the-loop queue for low-confidence cases.

Evaluation harnesses

A held-out test set, golden outputs, and an automated harness that scores every prompt change before it ships. We don't push prompt changes to production without seeing the eval delta first.

Cost + latency budgets

Every workflow ships with a cost-per-task target and a p95 latency target. We instrument both in production and alert when either drifts. Most builds end up cheaper than the manual process by 40–80×.

Sensible model routing

Cheap models (Claude Haiku, GPT-4o-mini) for the bulk of traffic, escalation to bigger models only when the cheaper one returns low confidence. Saves 60–90% on inference cost without measurable quality loss.

— ch. 02 / our approach

Our approach

We treat AI as a software engineering problem first and a research problem second. The hard parts are usually data plumbing, evaluation, and integration with the existing workflow — not the model choice. We build the evaluation harness in week one so we can measure progress.

We are deliberately conservative about deploying autonomous agents. We've seen too many demos break down when the model has to chain four tool calls on real-world inputs. We prefer well-scoped assistants that do one thing reliably and hand off to a human when they hit a known edge.

Everything we build runs through the Anthropic SDK or OpenAI SDK directly — no Langchain layer cake. We've found that a thin wrapper around the official SDKs is easier to debug at 2am than a chain of abstractions.

— ch. 03 / pricing & timeline

Pricing & timeline

discovery + eval set · 2 weeks · KES 80,000 · golden test set + cost model
rag assistant v1 · 4–6 weeks · KES 250,000+ · ingestion + retrieval + UI
classification pipeline · 3–5 weeks · KES 200,000+ · per workflow
model ops retainer · ongoing · from KES 50,000/month · evals + prompt tuning

Model inference (Claude, OpenAI) billed at cost — we pass through the actual provider invoice. Typical RAG workflows run at $0.001–0.01 per query.

— ch. 04 / recent example

Recent example

Support triage routera hybrid LLM router that reads inbound customer-support emails, classifies intent, drafts a response, and routes 78% of the queue to auto-send. The remaining 22% goes to a human with a pre-drafted reply. Replaced a 3-person triage team.

$40/month inference cost replacing ~KES 180,000/month in labour
94% intent-classification accuracy on held-out test set
Median response time dropped from 6 hours to 4 minutes

Have a workflow that wants AI?

Send us a description of the manual task and we'll come back with an honest read on whether AI helps, and what the eval would look like.