Sintropia · Technical Reference
The full system design behind every Knowledge AI product we build — how the pieces fit together, why each decision was made, and what it costs to run in production.
— System diagram
This is the complete request flow — from the moment a user types a question to the moment a grounded answer arrives. Each layer has a specific job. Removing any one of them degrades the quality of the output in a measurable way.
— Design decisions
Every component was chosen deliberately — the result of optimizing for low infrastructure cost, high retrieval precision, and production reliability at the scale a small or medium business actually needs.
Pinecone, Weaviate, and Qdrant can cost $70–$500/month and add an external dependency. SQLite with sqlite-vec runs in-process, fits on a $6 VPS, and starts in milliseconds. For knowledge bases under a few million chunks, which covers most small-business use cases, retrieval performance is comparable in practice, and there is no separate database service to deploy or monitor.
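To illustrate what "in-process" means here, the sketch below stores embeddings as BLOBs in a plain SQLite table and runs a brute-force cosine search in Python. It is a stand-in that uses only the standard library, not sqlite-vec itself, which performs the KNN step inside SQLite. The table name, helper names, and three-dimensional toy vectors are assumptions made for the example.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a compact float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): turn a float32 BLOB back into a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store(path=":memory:"):
    """Open (or create) the store. No server process is involved."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
    )
    return db


def add_chunk(db, text, embedding):
    db.execute(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        (text, pack(embedding)),
    )


def knn(db, query, k=3):
    """Brute-force nearest neighbours: score every row, keep the top k."""
    rows = db.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query, unpack(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: hypothetical 3-dimensional embeddings.
db = open_store()
add_chunk(db, "refund policy", [0.9, 0.1, 0.0])
add_chunk(db, "shipping times", [0.1, 0.9, 0.0])
add_chunk(db, "office hours", [0.0, 0.1, 0.9])
top = knn(db, [0.8, 0.2, 0.0], k=2)
```

The whole store is a single file (or `:memory:` here), which is why it fits on a small VPS. In a real deployment the scan in `knn` would be replaced by the vector extension's own query.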
Pure semantic search fails for exact queries: product codes, invoice numbers, names, dates. Pure keyword search fails for conceptual queries such as "what's our refund policy" or "how do we handle late deliveries". Hybrid search covers both. The BM25 keyword results and the KNN vector results are merged with Reciprocal Rank Fusion (RRF), which produces a ranked list that is typically more accurate than either strategy alone.
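A minimal sketch of the fusion step may help. Each document's fused score is the sum of 1 / (k + rank) over every ranked list in which it appears. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the document IDs; the source does not state which constant this system uses.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the keyword (BM25) and vector (KNN) searches.
bm25_hits = ["inv-1042", "faq-refunds", "policy-returns"]
knn_hits = ["policy-returns", "faq-refunds", "faq-shipping"]
merged = rrf_merge([bm25_hits, knn_hits])
```

Documents that appear in both lists rise to the top even when neither search ranked them first, which is the behaviour that makes the fused list more robust than either input.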
A three-branch router decides how each question is answered. Structured data (spreadsheets, tables) goes through SQL for exact answers. Unstructured data (PDFs, text) goes through RAG. Ambiguous queries run both branches and let the LLM decide. Many RAG systems ignore structured data entirely, so answers that depend on exact figures come back wrong or missing.
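The routing idea can be sketched with a simple keyword heuristic. This illustrates the three branches only and is not the router used in production: the cue lists, the `Route` enum, and the `route` function are all hypothetical, and a real router might use a classifier or the LLM itself.

```python
import re
from enum import Enum


class Route(Enum):
    SQL = "sql"    # structured data, exact answers
    RAG = "rag"    # unstructured documents
    BOTH = "both"  # ambiguous: run both and let the LLM decide


# Hypothetical cue lists, chosen only for illustration.
STRUCTURED_CUES = re.compile(
    r"\b(how many|total|sum|average|count|list all)\b", re.IGNORECASE
)
CONCEPTUAL_CUES = re.compile(
    r"\b(policy|why|how do|explain|what is)\b", re.IGNORECASE
)


def route(question):
    """Pick a branch from surface cues; fall back to BOTH when unsure."""
    structured = bool(STRUCTURED_CUES.search(question))
    conceptual = bool(CONCEPTUAL_CUES.search(question))
    if structured and not conceptual:
        return Route.SQL
    if conceptual and not structured:
        return Route.RAG
    return Route.BOTH
```

For example, `route("How many invoices were paid in March?")` selects the SQL branch, `route("What is our refund policy?")` selects RAG, and a question with no clear cue falls through to `Route.BOTH`.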
The system prompt is not configuration — it is the product. It defines the agent's role, tone, citation behavior, what to say when it doesn't know, and when to escalate. A well-engineered system prompt running a small model outperforms a poorly-engineered one running a large model. It is the most critical decision in the stack.
— What it actually costs
State-of-the-art AI should not require enterprise infrastructure budgets.
Build cost (one-time) — this is where Sintropia adds value. The infrastructure is cheap. The engineering — document audit, ingestion pipeline, retrieval calibration, system prompt design, deployment, interface — is a 4–8 week engagement priced in MXN, accessible to businesses of any size.
— Proof of concept
Arthur — the conversational agent running on this site — is a live deployment of this exact architecture. Same ingestion pipeline, same hybrid search, same three-branch router, same streaming SSE endpoint. The only thing that changed is the knowledge base (4 markdown files about Sintropia's services and pricing) and the system prompt (Arthur's role and personality). It was built in under two weeks.
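As a rough illustration of the streaming side, the sketch below formats model tokens as Server-Sent Events frames. The `data:` and `event:` field names and the blank-line terminator come from the SSE wire format. The JSON payload shape, the `done` event name, and the function names are assumptions for the example and not Arthur's actual protocol.

```python
import json


def sse_event(data, event=None):
    """Format one SSE frame: optional event name, data line(s), blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens):
    """Yield one frame per token, then a final (hypothetical) 'done' event."""
    for token in tokens:
        yield sse_event(json.dumps({"token": token}))
    yield sse_event("{}", event="done")


frames = list(stream_answer(["Hola", ", soy", " Arthur"]))
```

In a FastAPI application a generator like `stream_answer` could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, so the browser receives the answer token by token.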
This is what the template delivers. Not a demo — a production system answering real questions, qualifying real leads, and sending real emails. The same system, pointed at your documents, becomes your agent.
FastAPI · sqlite-vec + FTS5 · Gemini Flash · gemini-embedding-2 · BM25 + KNN hybrid search · RRF merge · SSE streaming · Railway volume for SQLite · Resend for emails
4 markdown files: services, pricing, methodology, FAQ — all in Spanish for the Mexican market. System prompt defines Arthur's warm tone, handoff signal, and the rule that HANDOFF:true requires real project details before firing.
Built in under two weeks, from empty repository to live production agent with bilingual support, streaming responses, email capture, semantic cache, rate limiting, and security hardening.
Gemini Flash generation + embedding API + Railway VPS + Resend email. The infrastructure bill for a production AI agent serving a real business is less than a Netflix subscription.
Same template. Your knowledge base. Your agent. In weeks, not quarters — at a price that fits a real business budget.
hello@sintropia.io