What I have built
Open architecture. Take what helps.
I do not run one big agent. I run a sequence of AI nodes that pass context to each other, each tuned with its own prompt and guardrails, so the system reaches the best answer instead of the fastest one. The path every request takes is below. The pieces that power it are further down.
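The path every request takes
1. Prompt. A request comes in by dashboard, Telegram, or voice note.
2. Pull context. It pulls related past work from memory across every project namespace.
3. Filter to the prompt. It strips that context down to only what this prompt needs. Noise in is wrong answers out.
4. Planner. It writes a plan before it acts and flags gaps instead of guessing.
5. Tools. It executes with web AIs, APIs, or Claude Code and Codex for real file edits.
6. Update Lessons.md. It writes down what it learned so the same mistake does not happen twice.
7. Respond. It returns the answer with a note on what it was unsure about.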
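The hand-off between nodes can be sketched as a list of functions that each take a context dictionary and return an updated one. This is a minimal sketch, not the system's actual code: the `Context` shape, the function names, and the keyword-overlap filter are all hypothetical, and only three of the nodes are stubbed.

```python
from typing import Callable

Context = dict  # hypothetical: whatever each node needs to hand forward
Node = Callable[[Context], Context]


def pull_context(ctx: Context) -> Context:
    # Stub: a real node would query memory across every project namespace.
    memory = {"crm": ["note about invoices"], "site": ["note about landing page"]}
    ctx["retrieved"] = [note for notes in memory.values() for note in notes]
    return ctx


def filter_to_prompt(ctx: Context) -> Context:
    # Stub: keep only retrieved notes that share a word with the prompt.
    words = set(ctx["prompt"].lower().split())
    ctx["relevant"] = [n for n in ctx["retrieved"] if words & set(n.lower().split())]
    return ctx


def plan(ctx: Context) -> Context:
    # Stub: a real planner would call a model and flag gaps instead of guessing.
    ctx["plan"] = [f"use: {note}" for note in ctx["relevant"]]
    return ctx


def run_pipeline(prompt: str, nodes: list[Node]) -> Context:
    ctx: Context = {"prompt": prompt, "trace": []}
    for node in nodes:
        ctx = node(ctx)
        ctx["trace"].append(node.__name__)  # record the order the nodes ran in
    return ctx


result = run_pipeline("fix the invoices export", [pull_context, filter_to_prompt, plan])
```

Keeping every node to the same signature means a node can be added, removed, or reordered without touching the others.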
The numbers
Components
How it all connects
I send a request to the dashboard. The Master Agent loads the right business profile. It pulls related memory from Pinecone. It reads guardrails from past failures. It writes a plan. If the plan has gaps, the agent stops and asks. If it is clean, the Executor runs each step.
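The stop-and-ask gate can be sketched in a few lines. This is a minimal sketch under assumptions: the `Plan` dataclass and `next_action` function are hypothetical names, and a real planner would fill `gaps` from a model call rather than by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    steps: list[str]
    # Open questions the planner could not resolve on its own.
    gaps: list[str] = field(default_factory=list)


def next_action(plan: Plan) -> tuple[str, list[str]]:
    """Return ("ask", questions) when the plan has gaps, else ("execute", steps)."""
    if plan.gaps:
        return "ask", plan.gaps
    return "execute", plan.steps


clean = Plan(steps=["read config", "edit handler", "run tests"])
unclear = Plan(steps=["send email"], gaps=["Which contact list should receive it?"])

clean_action = next_action(clean)
unclear_action = next_action(unclear)
```

The point of the gate is that the Executor never sees a plan with open questions. The only path to execution is an empty `gaps` list.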
Each step calls tools. Each tool result feeds the next step. After the agent edits a file, it runs the type checker and tests on its own. If something breaks, it fixes it and tries again.
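The edit, check, fix, retry loop might look like the sketch below. The function name, the callback signatures, and the attempt limit are assumptions for illustration. The toy `fake_checks` and `fake_fix` stand in for a real type checker, test runner, and model-driven fixer, none of which are specified here.

```python
from typing import Callable


def edit_until_green(
    apply_fix: Callable[[str], None],
    run_checks: Callable[[], tuple[bool, str]],
    max_attempts: int = 3,
) -> bool:
    """Run checks; on failure pass the error output to apply_fix and retry.

    Returns True once checks pass, False if the fix attempts run out.
    """
    for _ in range(max_attempts):
        ok, output = run_checks()
        if ok:
            return True
        apply_fix(output)  # the fixer sees the actual failure text
    # One last check after the final fix attempt.
    return run_checks()[0]


# Toy stand-ins: the "codebase" is just a counter of remaining bugs.
state = {"bugs": 2, "fixes": 0}


def fake_checks() -> tuple[bool, str]:
    return state["bugs"] == 0, f"{state['bugs']} failing tests"


def fake_fix(error_output: str) -> None:
    state["bugs"] -= 1
    state["fixes"] += 1


passed = edit_until_green(fake_fix, fake_checks)
```

The attempt limit matters. Without it, a fix that never works turns into an endless loop of failed retries.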
Long jobs go to the Railway worker. Short jobs run in the dashboard. Decisions and corrections get logged for the next run. Memory consolidates each night. The watcher sees all of it and fixes what it can.
How I verify it works
A working agent and a known-working agent are different things. I run an evaluation harness over the agent's output. Each run produces a score against a fixed rubric. Did it answer the question. Did it cite sources when needed. Did it ask for missing information instead of guessing. Did it stay within budget. The harness runs on every change. If the score drops, the change does not ship.
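A rubric score and ship gate of this kind can be sketched as follows. The four fields mirror the rubric questions above. Everything else is an assumption: the class and function names are hypothetical, the checks are weighted equally, and "the score drops" is read as the mean score over the evaluation set falling below the baseline.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    answered_question: bool
    cited_sources_when_needed: bool
    asked_instead_of_guessing: bool
    within_budget: bool


def rubric_score(run: RunResult) -> float:
    """Fraction of rubric checks passed, with equal weights (an assumption)."""
    checks = [
        run.answered_question,
        run.cited_sources_when_needed,
        run.asked_instead_of_guessing,
        run.within_budget,
    ]
    return sum(checks) / len(checks)


def mean_score(runs: list[RunResult]) -> float:
    return sum(rubric_score(r) for r in runs) / len(runs)


def can_ship(baseline: list[RunResult], candidate: list[RunResult]) -> bool:
    """Block the change if the mean score over the eval set drops."""
    return mean_score(candidate) >= mean_score(baseline)


good = RunResult(True, True, True, True)
over_budget = RunResult(True, True, True, False)

baseline_runs = [good, good]
candidate_runs = [good, over_budget]
ship = can_ship(baseline_runs, candidate_runs)
```

Here the candidate still answers every question, but one run goes over budget, so the mean falls from 1.0 to 0.875 and the change is blocked.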
Accuracy
The agent's answers get checked against ground-truth examples. New examples get added each week from real failures.
Honesty
The honesty protocol asks the agent to flag what it does not know. The harness checks whether the flagged uncertainty matches actual uncertainty.
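How the harness compares flagged and actual uncertainty is not spelled out here. One simple reading, shown as a sketch, treats both as yes/no values per ground-truth example and counts the two kinds of mismatch. The `Example` fields and the `honesty_report` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    flagged_uncertain: bool  # did the agent flag doubt on this answer?
    correct: bool            # did the answer match the ground-truth example?


def honesty_report(examples: list[Example]) -> dict[str, float]:
    """Rates of the two mismatch types between flagged and actual uncertainty."""
    total = len(examples)
    confident_wrong = sum(
        1 for e in examples if not e.flagged_uncertain and not e.correct
    )
    doubtful_right = sum(
        1 for e in examples if e.flagged_uncertain and e.correct
    )
    return {
        "confident_wrong_rate": confident_wrong / total,
        "doubtful_right_rate": doubtful_right / total,
    }


report = honesty_report([
    Example(flagged_uncertain=False, correct=True),   # confident and right
    Example(flagged_uncertain=False, correct=False),  # confident and wrong
    Example(flagged_uncertain=True, correct=False),   # doubtful and wrong
    Example(flagged_uncertain=True, correct=True),    # doubtful and right
])
```

The confident-and-wrong rate is the one to watch most closely, since those answers carry no warning at all.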
Cost
Every run logs token spend. A run that produces a good answer at 5x normal cost is a regression, not a success.
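A cost check along these lines could compare each run's token spend with a recent baseline. This is a sketch under assumptions: "normal cost" is taken to be the median of recent runs, and the `max_ratio` threshold of 2.0 is a hypothetical default, not a number from this page.

```python
from statistics import median


def is_cost_regression(
    run_tokens: int,
    recent_tokens: list[int],
    max_ratio: float = 2.0,
) -> bool:
    """Flag a run whose token spend exceeds max_ratio times the recent median."""
    baseline = median(recent_tokens)
    return run_tokens > max_ratio * baseline


recent = [1000, 1200, 900, 1100]  # median is 1050 tokens

expensive = is_cost_regression(5250, recent)  # 5x the median
normal = is_cost_regression(1500, recent)     # under 2x the median
```

Using the median rather than the mean keeps one unusually long run from quietly raising the bar for every run after it.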
I do not run my agent against academic benchmarks. The benchmarks measure what benchmarks measure. My agent does work that earns its keep. Real failures are the better signal.
What is coming next
More builds will land here. The architecture above generalises to other domains. New page entries appear when they ship, not before. Stub pages and roadmaps are noise.
Use this page with your AI
Copy the prompt below. Paste it into Claude, Copilot, Gemini, or any AI you use. The AI will ask you simple questions, then teach the page back to you using your own work as the example.
You are an expert teacher and AI strategist. Read the page below as your reference material.

PAGE: What I have built

CONTENT:

# What I have built

Open architecture. Take what helps.

I do not run one big agent. I run a sequence of AI nodes that pass context to each other, each tuned with its own prompt and guardrails, so the system reaches the best answer instead of the fastest one. The path every request takes is below. The pieces that power it are further down.

## The path every request takes

1. Prompt. A request comes in by dashboard, Telegram, or voice note.
2. Pull context. It pulls related past work from memory across every project namespace.
3. Filter to the prompt. It strips that context down to only what this prompt needs. Noise in is wrong answers out.
4. Planner. It writes a plan before it acts and flags gaps instead of guessing.
5. Tools. It executes with web AIs, APIs, or Claude Code and Codex for real file edits.
6. Update Lessons.md. It writes down what it learned so the same mistake does not happen twice.
7. Respond. It returns the answer with a note on what it was unsure about.

## The numbers

### 7 nodes every prompt runs the full pipeline before it answers

A chat tool answers in one shot. My system runs every request through a fixed sequence, and each node hands clean context to the next.

- Prompt. A request comes in by dashboard, Telegram, or voice note.
- Pull context. It pulls related past work from memory across every project namespace.
- Filter to the prompt. It strips that context down to only what this prompt needs. Noise in is wrong answers out.
- Planner. It writes a plan before it acts and flags gaps instead of guessing.
- Tools. It executes with web AIs, APIs, or Claude Code and Codex for real file edits.
- Update Lessons.md. It writes down what it learned so the same mistake does not happen twice.
- Respond. It returns the answer with a note on what it was unsure about.

### 7 memory namespaces pulled wide, then filtered to just this prompt

Memory is split into namespaces, one per project plus conversation logs. A new prompt searches all of them at once. Then a filter step strips the result down to only what this prompt needs. Sending the model everything is how you get confident wrong answers. Sending it the right slice is how you get a useful one.

### Every run ends by writing what it learned to Lessons.md

Most tools repeat the same error across weeks because nothing carries forward. After each run, a small model reads what went wrong and writes one short rule to Lessons.md. Those rules feed back in as guardrails on the next run. The system gets harder to fool over time, not easier.

## Components

### Planner

The Planner reads my request and writes a plan before any work starts. The plan lists each step. It flags steps that could break things. It surfaces gaps in what I asked. If gaps exist, the agent stops and asks me to fill them. Most agent failures come from rushing into half-understood work.

### Executor

The Executor runs the plan one step at a time. It calls tools to read files, edit code, run tests, send messages, or build documents. It reads each tool result before the next step starts. Acting blind on tool output is how agents loop on failure.

### Skills

A skill is a short markdown file with focused instructions for one kind of work, like cold outreach or a design spec. The agent loads the right skill for the job. The main agent stays clean while specialists handle craft work.

### RAG memory layer

Pinecone stores chunks of past work as math vectors. When I ask a new question, the agent finds the most related past chunks and feeds them in. The agent draws on months of past projects without holding all of them in mind at once.

### Process manager (PM2)

PM2 keeps the agent running on my computer. It starts the dashboard when I boot up. It restarts services if they crash. It runs background jobs like nightly memory cleanup. An agent that needs me to babysit its uptime is not autonomous.

### Sub-agents per business

Each business I run has its own sub-agent profile. The profile holds context about that business, its voice, its customers, and its rules. The Master Agent loads the right profile when the work calls for it. Mixing business context creates wrong answers and off-brand work.

### Memory layer

Three layers store what the agent knows. Episodic memory holds each chat as it happens. Causal memory links chats that caused or led to each other. Consolidated memory takes a day of related chats and rolls them into one summary. Old memory does not get deleted, only flagged when newer memory replaces it. Flat chat history degrades the agent over weeks of use.

### Monitoring and autoheal

A watcher process reads logs from my agent and from cloud services like Vercel and Railway. When it spots an error, it tries to fix it on its own. For high-risk fixes, it asks a stronger model and waits for me to sign off. My time goes to new work, not patching old bugs.

### Model routing

Different jobs go to different models. Claude Sonnet plans and reasons. Claude Haiku runs cheap fast tasks like classification or one-line edits. Gemini Flash handles bulk research. Gemini Pro handles big synthesis steps. Claude Opus runs only on the riskiest fixes. Using the most expensive model for every job burns money and slows everything down.

### Synthesis daemon

A daemon is a background program that runs all the time. The Synthesis Daemon takes scattered notes from many runs and turns them into one clean output. It uses Gemini Pro because that model handles long context well. Agents produce raw chunks that need stitching.

### Honesty protocol

The agent ends plans, design choices, and claims about new tools with a short footer. The footer names the weakest part of the answer, what the agent does not know, and how confident it is. Confident wrong answers waste more time than honest gaps.

### Failure distillation

Every night a job reads yesterday's failures from my logs. A small model writes one short rule per failure, like a note to self. The rules get fed back as guardrails. The agent stops repeating the same mistakes across weeks.

### Decisions and corrections log

Every plan and every choice gets logged. Every time I edit a message or override the agent, that gets logged too. Both feed back into the agent's context as patterns of how I work. The agent learns my standards.

### CRM and pipeline layer

Contacts, activities, deals, and tasks sit in one graph. The agent reads it. The agent writes to it. When I create a new contact, a research job runs in the background and enriches the record with company, role, and recent signals. Before each call, the agent prepares a brief from my own past notes and from skills like cold outreach. After each call, I record a short voice note. The agent transcribes it on my laptop, logs the activity, and sets the next action. The CRM compounds with use.

### Voice transcription

Whisper.cpp runs on my laptop with a small model. Audio from headphones gets captured by the operating system and transcribed without leaving the device. Every call I make turns into a clean activity log and feeds the objection bank.

### Calendar and booking

A booking widget on my homepage pulls live availability from Google Calendar. When a prospect picks a slot, the event lands on my calendar and a contact appears in the CRM. The receptionist agent on the homepage can book real slots during a demo call.

### Outreach engine

Smartlead runs the multi-touch sequences from four warmed inboxes. Reply detection routes positive replies to me and updates the contact stage. Apify scrapes lead sources for the niches I work in. The agent generates the cold copy from my expert namespaces, including pclub material and curated YouTube research from operators with real results.

### Multi-tenant client portal

Each client gets a locked page on my domain. They log in with credentials I create and change the password on first login. They see the work I run for them: pipeline status, deliverables, and progress. I see all clients from a side menu only I can access.

### Railway worker for long jobs

Some jobs take hours. I send those to a worker on Railway, a cloud platform with no per-task time limit. The worker runs the same agent stack as the chat. It writes progress to my database and sends a Telegram message when done. True autonomy means I can walk away.

### Telegram control

The agent listens on Telegram. I log calls from anywhere. I ask for my pipeline. I approve or reject autoheal fixes. The CRM and the agent both run from my pocket.

## How it all connects

I send a request to the dashboard. The Master Agent loads the right business profile. It pulls related memory from Pinecone. It reads guardrails from past failures. It writes a plan. If the plan has gaps, the agent stops and asks. If it is clean, the Executor runs each step.

Each step calls tools. Each tool result feeds the next step. After the agent edits a file, it runs the type checker and tests on its own. If something breaks, it fixes it and tries again.

Long jobs go to the Railway worker. Short jobs run in the dashboard. Decisions and corrections get logged for the next run. Memory consolidates each night. The watcher sees all of it and fixes what it can.

## How I verify it works

A working agent and a known-working agent are different things. I run an evaluation harness over the agent's output. Each run produces a score against a fixed rubric. Did it answer the question. Did it cite sources when needed. Did it ask for missing information instead of guessing. Did it stay within budget. The harness runs on every change. If the score drops, the change does not ship.

### Accuracy

The agent's answers get checked against ground-truth examples. New examples get added each week from real failures.

### Honesty

The honesty protocol asks the agent to flag what it does not know. The harness checks whether the flagged uncertainty matches actual uncertainty.

### Cost

Every run logs token spend. A run that produces a good answer at 5x normal cost is a regression, not a success.

I do not run my agent against academic benchmarks. The benchmarks measure what benchmarks measure. My agent does work that earns its keep. Real failures are the better signal.

## What is coming next

More builds will land here. The architecture above generalises to other domains. New page entries appear when they ship, not before. Stub pages and roadmaps are noise.

Your job:

1. Ask me 3 to 5 simple questions about my work, my situation, and what I would actually use AI for. One question at a time. Wait for my answer between each.
2. Once you have my answers, explain the key ideas from the page back to me, using my answers as the example.
3. Suggest one concrete next step I could take this week. Tie it back to the page.
4. Push back if my answer is vague. Ask me to be specific.

Rules for you:

- Do not flatter me.
- Do not agree with me when I am wrong.
- If you do not know something, say so.
- Be brief. Two paragraphs at most per turn.
- Ask one question at a time. Do not stack questions.