Reporting Agent V3 — Architecture Spec¶
One-line: Replace manual weekly traction reports with an autonomous agent that reads HubSpot + Cockpit + LinkedIn API + Google Ads, narrates the week, and posts to Notion + Slack every Friday 17:00 CET.
Why now: the W1/W2/W3 reports were all backfilled on Apr 26, so 0 of 3 were filed on time. Building V3 closes this for good and removes Cleiton as the bottleneck for every weekly review.
Decision context¶
This spec exists to convert Decision #8 in the Julien Re-onboarding Brief into concrete build state.
Apr 28 update (Decision #8a confirmed in the Julien catchup):
- Phase 1 build = Cleiton self-funded (personal Anthropic account). No Soilytix budget approval needed.
- Phase 2 expansion (8-agent department) = after the 4-week quality gate; the real decision happens in June.
- Pending Julien Friday: confirm 17:00 CET cadence + Slack DM format.
| Question | Answer |
|---|---|
| Build time | ~5 working days (1 week) — Mon May 4 → Fri May 8 |
| Run cost | €2-5/mo (router-on, no separate VM, Langfuse free tier) |
| Maintenance | ~2h/week monitoring + tuning |
| Reversibility | High — kill switch is pause workflow in GHA |
| Replaces | Manual report writing (~2h/week Cleiton) + the 0/3 on-time filing record |
Scope (Phase 1 — Reporting Agent ONLY)¶
This is slice 1 of the broader Revenue AI Department V3 vision (8 agents). Scope here = single-agent end-to-end, not the full department. Rationale: validate stack with one agent, instrument cost/quality, then decide Phase 2 expansion.
In scope:
- Weekly traction report (Mon-Sun, posts Fri 17:00 CET)
- Daily cockpit health-check (anomaly detection, Slack DM if alert)
- Monthly summary (auto-aggregates 4 weeklies + adds narrative)
Out of scope (Phase 2+):
- Reply triage agent
- BD outreach agent (already V0 = bd_pipeline.py)
- PR pipeline agent (already V0 = pr_pipeline.py)
- Content creator
- Reviewer
- Pipeline health (separate agent)
- Manager/orchestrator (only needed when 3+ agents)
Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│  GitHub Actions cron (Fri 17:00 CET / Mon-Fri 08:00 CET)        │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  Reporting Agent (Claude Agent SDK v0.2.111+)                   │
│  Orchestrator: Opus 4.7 (prompt caching ON)                     │
└────┬───────────┬───────────┬───────────┬───────────┬────────────┘
     │           │           │           │           │
     ▼           ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐
│HubSpot  │ │ Cockpit │ │LinkedIn │ │ Google  │ │ PostHog  │
│   MCP   │ │ Sheets  │ │   Ads   │ │   Ads   │ │  events  │
│         │ │   API   │ │   API   │ │ Reports │ │          │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └──────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  Worker: Sonnet 4.6 - narrative generation (caches week schema) │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  Outputs                                                        │
│  • Notion page in Commercial Board (status Done auto)           │
│  • Slack #soilytix-agent message (TL;DR + link)                 │
│  • soilytix/reports/ markdown file (git-versioned)              │
│  • Cockpit Sheet auto-update (Weekly tab Pipeline + CPL)        │
└─────────────────────────────────────────────────────────────────┘
Single agent, no manager: at this scope a separate Manager Agent adds latency without value. It only kicks in when 3+ workers run concurrently (Phase 2+).
Stack — multi-provider router from Day 1¶
| Component | Choice | Role | Why | Cost (caching ON) |
|---|---|---|---|---|
| Orchestrator | Claude Agent SDK v0.2.111+ | Subagent dispatch, MCP, hooks | Mandatory for Opus 4.7 + native Claude alias support | — |
| Reasoning + planning | Opus 4.7 (Anthropic EU) | Decisions in agent loop (only when needed) | High-stakes synthesis. Used sparingly — most weeks Sonnet alone is enough. | €5 / €25 per MTok |
| Narrative model | Sonnet 4.6 (Anthropic EU) | Final write-up of report | 40% cheaper than Opus at the rates listed here, indistinguishable for narrative tasks | €3 / €15 per MTok |
| Long-context reads | Gemini 2.5 Flash (Google EU) — via LiteLLM | Pulling HubSpot deals + Cockpit Sheet + Ads APIs into context | 10× cheaper than Sonnet for read-heavy tasks. 1M ctx window absorbs all reads in one shot. Batch discount 50%. | €0.30 / €2.50 per MTok |
| Anomaly classifier | Haiku 4.5 (Anthropic EU) | Daily z-score check on rolling metrics | ~5k tokens/day. Native Claude. | €1 / €5 per MTok |
| Bulk parsing (Phase 2) | Llama 3.3 70B (Groq, US) — via LiteLLM | When parsing 100+ deals at once | Cheapest fast tokens. Activates only at scale. | €0.59 / €0.79 per MTok |
| Multi-provider proxy | LiteLLM v1.83.x (pinned, patched) | Routes Gemini/Llama/Codestral as Claude-aliases for SDK | Claude Agent SDK only speaks Claude aliases natively → LiteLLM is required for non-Claude workers. Pin exact patched version (NOT v1.82.7/8 — supply chain incident). | Python lib (no separate VM) — pip install litellm |
| Runtime | GitHub Actions cron | Trigger weekly/daily/monthly | Already used for BD/PR pipelines. Zero new infra. | Free tier sufficient |
| State | Cockpit Sheet + Notion DB | Persistence | No new DB needed. | Free |
| Observability | Langfuse Cloud EU (free tier) | LLM traces + cost per agent + last-run + kill-button | EU residency (GDPR). 50k observations/mo free — covers Phase 1 + 2 easily. Wire Day 1, not retrofit. | Free tier |
EU compliance: Anthropic EU residency (1.1x multiplier accepted), Gemini EU region, Mistral FR-domiciled, Langfuse Cloud EU. No DeepSeek hosted (CSRD/GDPR red flag — see
Why router from Day 1 (not deferred to Phase 2): the read step is by far the heaviest in tokens (~30k context per weekly run). Routing reads through Gemini Flash drops Phase 1 cost from €10-20/mo to €2-5/mo, and proves the multi-provider stack works on a low-stakes pipeline before scaling to Phase 2's 8-agent department. The router IS the Phase 1 architectural validation.
Why LiteLLM as Python lib (not VM): Phase 1 has 1 agent in 1 GHA workflow. A separate VM adds €4.5/mo + DevOps overhead with zero benefit at this scale. Running pip install litellm inside the GHA job is enough. Migrate to a Hetzner VM in Phase 3+, once Cleiton reaches B-level DevOps comfort and 5+ agents share routes (at that point the VM amortizes).
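The routing idea in the stack table can be sketched as a plain lookup, independent of LiteLLM itself. This is a minimal illustration, not the agent's actual configuration: the step names, the model labels, and the default-to-Sonnet rule for unknown steps are assumptions, and the €5 weekly cap is taken from the risk table further down.

```python
# Step names and model labels are illustrative; real model IDs depend on
# the providers and on how the LiteLLM aliases end up being configured.
ROUTES = {
    "read_context": "gemini-2.5-flash",   # long-context reads
    "classify_anomaly": "haiku-4.5",      # daily z-score check
    "write_narrative": "sonnet-4.6",      # final report write-up
    "plan": "opus-4.7",                   # rare high-stakes synthesis
}

WEEKLY_CAP_EUR = 5.0  # weekly spend cap quoted in the risk table


def pick_model(step: str, weekly_spend_eur: float = 0.0) -> str:
    """Return the model label for a pipeline step."""
    # Assumption: unknown steps default to the narrative model.
    model = ROUTES.get(step, "sonnet-4.6")
    # Fall back from Opus to Sonnet once the weekly cap is exceeded.
    if model == "opus-4.7" and weekly_spend_eur > WEEKLY_CAP_EUR:
        return "sonnet-4.6"
    return model
```

Keeping the table in one dict means a provider swap is a one-line change, which is the property the router is meant to validate in Phase 1.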
Data sources (read paths)¶
| Source | What it gives | Auth | Read frequency |
|---|---|---|---|
| HubSpot MCP | Deals (stage, amount, probability), Contacts (last touch), Meetings (booked/held) | OAuth (existing) | Mon morning + Fri pre-report |
| Cockpit Sheet API | Daily ad spend + leads + CPL per channel | Service account (sheets-write-v2) | Daily 08:00 + Fri pre-report |
| LinkedIn Ads API | Campaign-level perf (CPC, CPL, leads) for nuance beyond Cockpit | OAuth (LinkedIn Ads MCP token) | Fri pre-report |
| Google Ads API | Campaign perf + search terms | OAuth (existing reauth flow) | Fri pre-report |
| PostHog API | Funnel events from soilytix.com (post-Klaro consent) | Project API key | Fri pre-report |
| GA4 API | Organic + referral traffic | OAuth (existing) | Fri pre-report |
Sources are read-only. The agent writes only to its own Notion page, Slack, the git repo, and the Cockpit Sheet Weekly/Monthly tabs listed under Outputs.
Outputs¶
Friday weekly report¶
- Notion page in Commercial Board → Reports section, naming [Weekly Traction] WX YYYY-MM-DD (auto-published)
- Slack DM to Cleiton + #soilytix-agent channel (TL;DR + link)
- Markdown file in soilytix/reports/2026-W17-traction-report.md (git committed by GHA)
- Cockpit Sheet update: Weekly tab autopopulated with new row
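The markdown file name above appears to follow ISO week numbering (2026-W17 covers Apr 20 to Apr 26, 2026). Assuming that convention holds, a small helper could derive the name from the run date; the function name is hypothetical.

```python
from datetime import date


def report_filename(run_date: date) -> str:
    """Build the git file name for the ISO week containing run_date."""
    iso_year, iso_week, _ = run_date.isocalendar()
    return f"{iso_year}-W{iso_week:02d}-traction-report.md"


print(report_filename(date(2026, 4, 24)))  # 2026-W17-traction-report.md
```

Using the ISO year rather than the calendar year avoids a wrong prefix for reports that fall in the first or last days of January and December.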
Daily cockpit health-check¶
- Mon-Fri 08:00 CET
- Compares yesterday's metrics vs 7-day rolling avg + benchmarks
- Silent unless anomaly detected (CPL >2σ from mean, leads = 0 for 2+ days, spend overrun)
- On anomaly: Slack DM to Cleiton with diagnosis hypothesis (audience fatigue / tracking break / weekend effect)
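The three alert rules above can be sketched with the standard library. This is an illustration under assumptions: the function name, the argument layout, and the idea of a daily budget figure for "spend overrun" are not in the spec.

```python
from statistics import mean, stdev


def detect_anomalies(cpl_history, leads_history, spend, budget, z_max=2.0):
    """Return a list of alert strings; an empty list means stay silent.

    cpl_history and leads_history hold daily values, oldest first, with
    yesterday as the last element. cpl_history must not be empty.
    """
    alerts = []
    # Baseline = the 7 days before yesterday.
    baseline, yesterday = cpl_history[-8:-1], cpl_history[-1]
    if len(baseline) >= 2:  # stdev needs at least two points
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(yesterday - mu) / sigma > z_max:
            alerts.append(f"CPL {yesterday:.2f} is >{z_max}σ from 7-day mean {mu:.2f}")
    if len(leads_history) >= 2 and all(v == 0 for v in leads_history[-2:]):
        alerts.append("leads = 0 for 2+ days")
    if spend > budget:
        alerts.append(f"spend overrun: {spend:.2f} > {budget:.2f}")
    return alerts
```

Returning a list rather than a boolean lets the Slack DM carry every triggered rule in one message, and an empty list maps directly to the "silent unless anomaly" behaviour.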
Monthly summary¶
- 1st of month, runs at 09:00
- Aggregates 4 weeklies + adds month-over-month narrative
- Updates Cockpit Monthly tab
- Posts to Notion → Reports
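A possible shape for the aggregation step, assuming each weekly report exposes spend and leads totals (the field names here are hypothetical). CPL is recomputed from the monthly totals rather than averaged across weeks, since averaging weekly CPLs would weight a low-volume week the same as a high-volume one.

```python
def aggregate_month(weeklies, previous_month=None):
    """Sum weekly spend and leads, then derive CPL and month-over-month change."""
    spend = sum(w["spend"] for w in weeklies)
    leads = sum(w["leads"] for w in weeklies)
    summary = {
        "spend": spend,
        "leads": leads,
        "cpl": spend / leads if leads else None,  # avoid dividing by zero
    }
    if previous_month and previous_month.get("leads"):
        prev = previous_month["leads"]
        summary["leads_mom_pct"] = 100 * (leads - prev) / prev
    return summary
```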
Cost model (realistic, router-on)¶
Per weekly run, broken down by step¶
| Step | Tokens | Model | Provider | Cost |
|---|---|---|---|---|
| Read context (HubSpot + Sheets + Ads + PostHog + GA4) | 30k in | Gemini 2.5 Flash (via LiteLLM) | Google EU | €0.01 |
| Number-tracing validator (every claim → source field) | 2k in, 0.5k out | Haiku 4.5 | Anthropic EU | €0.005 |
| Narrative generation (final report write-up) | 5k cached + 3k out | Sonnet 4.6 (caching ON) | Anthropic EU | €0.05 |
| Orchestration loop (only when complex decisions) | ~1k | Opus 4.7 (rare) | Anthropic EU | €0.005 |
| Per weekly run total | | | | ~€0.07 |
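The per-step figures can be reproduced from the per-MTok prices in the stack table. The sketch below covers uncached tokens only; cache-read pricing is not modelled, and the model labels are shorthand rather than real API model IDs.

```python
# EUR per million tokens (input, output), copied from the stack table.
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def step_cost(model, tokens_in=0, tokens_out=0):
    """Uncached cost in EUR for one call."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


read = step_cost("gemini-2.5-flash", tokens_in=30_000)
validate = step_cost("haiku-4.5", tokens_in=2_000, tokens_out=500)
print(f"read={read:.4f} validate={validate:.4f}")
```

The read step comes out at about €0.009 and the validator at about €0.0045, which round to the €0.01 and €0.005 shown in the table.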
Per daily health-check¶
| Step | Tokens | Model | Cost |
|---|---|---|---|
| Read yesterday's metrics (Cockpit Daily tab) | 3k in | Gemini 2.5 Flash | €0.001 |
| Anomaly z-score classifier | 5k in, 0.5k out | Haiku 4.5 | €0.007 |
| Per daily total | | | ~€0.008 |
Monthly total (router-on, no VM)¶
| Cadence | Runs/month | Cost/run | Subtotal |
|---|---|---|---|
| Weekly report | 4 | €0.07 | €0.28 |
| Daily anomaly check (Mon-Fri) | 22 | €0.008 | €0.18 |
| Monthly summary | 1 | €0.10 | €0.10 |
| Anomaly investigations triggered | ~5 | €0.05 | €0.25 |
| Subtotal LLM API spend | | | €0.81 |
| LiteLLM (Python lib in GHA job) | — | — | €0 |
| GHA compute | — | — | €0 (free tier) |
| Langfuse Cloud EU (free tier 50k obs) | — | — | €0 |
| Total Phase 1 (router-on, no VM) | | | €2–5/mo |
Buffer: the gap between the €0.81 LLM subtotal and the €2-5/mo total covers Anthropic API cost spikes, occasional ad-hoc analyses, and free-tier headroom.
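As a quick check of the monthly table, the LLM subtotal is just runs × cost per run summed over the four cadences. The dictionary keys are illustrative names.

```python
# (runs per month, EUR per run), copied from the monthly table.
CADENCES = {
    "weekly_report": (4, 0.07),
    "daily_check": (22, 0.008),
    "monthly_summary": (1, 0.10),
    "anomaly_investigation": (5, 0.05),
}

subtotals = {name: runs * cost for name, (runs, cost) in CADENCES.items()}
llm_total = sum(subtotals.values())
print(f"LLM API subtotal: EUR {llm_total:.2f}")  # LLM API subtotal: EUR 0.81
```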
Spec history:
- v1 Apr 26: €10-20/mo Anthropic-only assumption
- v1.1 Apr 26 evening: €5-7/mo router-on with Hetzner VM
- v2 Apr 28: €2-5/mo router-on, LiteLLM as Python lib, Langfuse free tier (current)
Without caching: ~€10/mo (still trivial). With caching off + Anthropic-only fallback (worst case): €25-40/mo.
Phase 2 implications (8-agent department, post 4-week gate)¶
Earlier brief estimate: €250-600/mo (Anthropic-only assumption). With router (60-70% of read+classify load on Gemini/Groq/Codestral, narrative on Anthropic): €80-200/mo for the full 8-agent dept fully online.
The router IS the cost story. Without it, Phase 2 is hard to greenlight at €600/mo. With it, the same capability is European-residency, multi-provider, and ~3× cheaper.
First 3 actions — Mon May 4 morning (35 min total)¶
Pre-build setup. Do these BEFORE Day 1 begins. Sequenced for minimum context-switching.
1. Setup Langfuse Cloud account EU region (15 min)¶
- Go to https://cloud.langfuse.com
- Choose EU region during signup (GDPR — non-negotiable)
- Create project "soilytix-agents"
- Get API keys: LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY + LANGFUSE_HOST (https://cloud.langfuse.com)
- Add as GHA secrets in Soilytix/soilytix-revenue-automation
- Verify: dashboard https://cloud.langfuse.com opens with empty project, ready to receive traces
2. Setup Google AI Studio Gemini API key (10 min)¶
- Go to https://aistudio.google.com
- Sign in with cleitonsenaa@gmail.com (existing)
- Create API key (free tier: 60 req/min, plenty for Phase 1)
- Test:
- Add as GHA secret:
3. Narrative cadence — committed Apr 29 (10 min mental commitment ✓)¶
Personal-only channels (NOT Soilytix corporate accounts). Build-in-public Karpathy/Tan pattern. First Friday post = Fri May 8 (after first auto-report ships).
Personal stack (5 primary + 3 bonus)¶
| # | Channel | Cadence | Role |
|---|---|---|---|
| 1 | Personal LinkedIn (post) | Fri 18:00 CET weekly | Main hub (Soilytix buyers + AI builders EU) |
| 2 | LinkedIn Articles (long-form) | Monthly (1st of month) | Authority + search-indexed |
| 3 | Substack newsletter | Bi-weekly (Sun) | Email list ownership + durable SEO |
| 4 | X.com (@cleitonsena) | Daily build-log threads + Fri cross-post | AI builders global community |
| 5 | Public wiki (GitHub Pages) | Continuous push | Karpathy 2nd brain, long-tail SEO |
| 6 (bonus) | Hacker News (Show HN) | One-shot per milestone | Phase 1 done (May 30) + Phase 3 framework OSS |
| 7 (bonus) | Bluesky | Cross-post X automated | Zero extra effort, tech audience migrating |
| 8 (bonus) | Dev.to | Cross-post Substack (canonical → Substack) | EU dev audience |
Wk 1-4 May topic backlog¶
| Wk | Date | LinkedIn topic |
|---|---|---|
| 1 | Fri May 8 | "Built a 5-line agent that replaces my 2h/week manual report. Stack: GHA + Claude SDK + LiteLLM + Gemini Flash + Langfuse. Cost: €X." |
| 2 | Fri May 15 | "First production alert — what the anomaly detector caught that I would have missed." |
| 3 | Fri May 22 | "Cost month-1: €X. Where every euro went (router-on Gemini reads = 80% saving)." |
| 4 | Fri May 29 | "Deciding Phase 2 — which agent next (Reply Triage vs Lead Enrichment) and why." |
Daily/weekly choreography¶
| Day | Channel | Content |
|---|---|---|
| Mon-Thu | X.com | 1-2 build-log tweets/day, learning-of-the-day |
| Fri 18:00 | LinkedIn post + X cross-post | Weekly build-in-public anchor |
| Sun | Substack (bi-weekly Wk 2 + Wk 4) | Long-form synthesis |
| 1st of month | LinkedIn Article + public wiki update | Case study expansion |
Channels skipped + why¶
| Channel | Reason |
|---|---|
| TikTok | B2B agritech-AI = wrong audience |
| Threads (Meta) | Cross-post X engagement <5% |
| Mastodon | Tech audience migrated to Bluesky |
| YouTube | Time-intensive — defer Phase 3+ when framework OSS ready |
| | Engagement-heavy + shadow-ban risk for self-promo |
| Medium | Substack > Medium for ownership/SEO |
| Product Hunt | Defer Phase 3+ (the OSS framework is the real launch) |
Soilytix corporate (separate track, NOT this stack)¶
Reserved for the Soilytix company profile (Bruno/Julien own those channels):
- DLG events / Agritech meetups (in-person)
- Future Farming Magazine (EU trade)
- AgFunder Network newsletter
Total pre-day setup time: 35 min real work. Parallel account setup (Substack + X + Bluesky + Dev.to + public wiki on GitHub Pages) takes ~2h spread across May Wk 1 and does NOT block the build.
Build plan (5 working days)¶
Day 1 — Foundation¶
- Pre-day setup (35 min Mon AM): Langfuse Cloud EU signup + Google AI Studio Gemini key + GHA secrets wiring. See "First 3 actions" section.
- Create repo dir Soilytix/soilytix-revenue-automation/agents/reporting/
- pip install litellm==1.83.x (pinned patched, post-supply-chain) inside the GHA job, no separate VM
- Install Claude Agent SDK v0.2.111+ (pip install claude-agent-sdk)
- Wire all 6 MCPs (HubSpot, Sheets, LinkedIn Ads, Google Ads, PostHog, GA4): they already exist, just connect
- Langfuse traces wired Day 1, not retrofit — every LLM call instrumented from the first token
Day 2 — Read paths¶
- HubSpot deal-stage snapshot function (input: week range; output: structured dict)
- Cockpit Sheet read function (input: Weekly tab range; output: dict)
- LinkedIn Ads + Google Ads campaign perf functions
- PostHog + GA4 funnel snapshot
- Smoke test: dump all 6 sources to JSON, verify shapes
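One way the smoke test might look: dump the combined snapshot to JSON, then check each source for the keys and types the later steps rely on. The source keys and expected fields below are hypothetical placeholders, shown for two of the six sources.

```python
import json

# Hypothetical expected shapes: required keys and their types per source.
EXPECTED = {
    "hubspot": {"deals": list},
    "cockpit": {"rows": list},
}


def verify_shapes(snapshot, expected=EXPECTED):
    """Return a list of shape problems; empty means the dump looks sane."""
    problems = []
    for source, fields in expected.items():
        if source not in snapshot:
            problems.append(f"{source}: missing")
            continue
        for key, kind in fields.items():
            if not isinstance(snapshot[source].get(key), kind):
                problems.append(f"{source}.{key}: expected {kind.__name__}")
    return problems


snapshot = {"hubspot": {"deals": []}, "cockpit": {"rows": "oops"}}
print(json.dumps(snapshot, indent=2))
print(verify_shapes(snapshot))  # ['cockpit.rows: expected list']
```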
Day 3 — Narrative generation¶
- Prompt design: weekly report template (TL;DR / By the numbers / Wins / Risks / Next week focus)
- Sonnet 4.6 narrative call with cached system prompt + cached schema
- Quality gate: generate 3 reports against W14/W15/W16 data, compare vs human-filed Cleiton reports — should pass blind test
Day 4 — Outputs¶
- Notion page creation in Commercial Board (using existing MCP or notion-cli once MKT-OPS-03 lands)
- Slack message via webhook (reuse #soilytix-agent webhook from PR pipeline)
- Markdown file commit (auto-PR or direct push to main with skip-ci)
- Cockpit Sheet auto-write (sheets-write-v2 creds)
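For the Slack output, an incoming webhook accepts a JSON body with a `text` field. The sketch below only builds that body and does not send it; the function name, the message layout, and the example URL are assumptions.

```python
import json


def build_slack_payload(tldr: str, notion_url: str) -> str:
    """Serialize a TL;DR + link message for a Slack incoming webhook."""
    # <url|label> is Slack's link syntax; *text* renders as bold.
    text = f"*Weekly Traction*\n{tldr}\n<{notion_url}|Full report in Notion>"
    return json.dumps({"text": text})


body = build_slack_payload("Leads up week over week", "https://example.com/report")
print(body)
```

Building the payload separately from the HTTP call keeps the message format testable without touching the real webhook.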
Day 5 — Cron + observability¶
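- GHA workflow .github/workflows/reporting-weekly.yml (cron 0 16 * * 5)
- GHA workflow .github/workflows/reporting-daily.yml (cron 0 7 * * 1-5)
- Note: GHA cron runs in UTC, so these fire at 17:00 and 08:00 CET (UTC+1); while CEST (UTC+2) is in effect they land at 18:00 and 09:00 local time unless the hours are moved to 15 and 6
- Anomaly detection function (z-score on rolling 7-day window)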
- Langfuse dashboard: cost per report, latency per LLM call, error rate, kill-button per agent
- Run end-to-end Fri May 8 17:00 CET — first auto-report ships
Buffer / Polish (week 2 if needed)¶
- Tune narrative prompts based on Julien feedback
- Add Linear-style "Next week focus" section auto-derived from open Notion tickets
- Add weekly emoji header (🚀 / 🟡 / 🔴) based on overall traction score
Quality gates (before Julien greenlight Phase 2)¶
Run for 4 consecutive weeks (May Wk 2 → Wk 5 = May 8 → May 31). Phase 2 only if:
- [ ] Cost: actual ≤ €10/mo (vs €2-5 estimate, with buffer)
- [ ] Quality: Julien rates 4 of 4 reports ≥ 7/10 vs Cleiton baseline
- [ ] Reliability: 0 missed weekly runs (auto-recovery on transient failures)
- [ ] Time saved: Cleiton spent <30 min/week reviewing/editing (vs 2h writing)
- [ ] No data leak: Langfuse traces show 0 unauthorized writes outside scoped sources
If 1+ gate fails: iterate Phase 1, do not expand to Phase 2.
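The gate logic is an all-or-nothing check, which could be expressed as below. The dataclass and its field names are hypothetical; the thresholds are the ones listed in the checklist.

```python
from dataclasses import dataclass


@dataclass
class Phase1Results:
    monthly_cost_eur: float
    julien_ratings: list           # one 0-10 score per weekly report
    missed_weekly_runs: int
    review_minutes_per_week: list  # Cleiton's review time, one entry per week
    unauthorized_writes: int


def failed_gates(r: Phase1Results) -> list:
    """Return the names of failed gates; an empty list means all gates pass."""
    checks = {
        "cost": r.monthly_cost_eur <= 10,
        "quality": len(r.julien_ratings) == 4 and all(s >= 7 for s in r.julien_ratings),
        "reliability": r.missed_weekly_runs == 0,
        "time_saved": all(m < 30 for m in r.review_minutes_per_week),
        "no_data_leak": r.unauthorized_writes == 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failed gate names, rather than a single boolean, tells the Phase 1 iteration which area to work on.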
Risks + mitigations¶
| Risk | Severity | Mitigation |
|---|---|---|
| Opus 4.7 new tokenizer surprise costs | Medium | Cap weekly spend at €5 via Anthropic API budget alert. Fall back to Sonnet 4.6 orchestrator if exceeded. |
| LiteLLM future supply chain incident | Medium | Pin exact version (v1.83.x specific patch). Renovate bot for security-only updates. |
| HubSpot/LinkedIn API rate limits | Low | Cache reads with 1h TTL; nightly batch instead of real-time. |
| Narrative drifts (Julien hates the voice) | Medium | Quality gate week 1 — if drift, re-prompt with Cleiton's W1-W3 reports as few-shot examples. |
| Hallucinated numbers | High | Hard gate: every number in narrative must trace to a source field. Validator function asserts before publish. Fail closed (skip publish, alert). |
| Anomaly false positives (Slack noise) | Low | Tune z-score threshold week 1-2. Allow user mute command in Slack. |
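The "hallucinated numbers" hard gate could start as simple as the sketch below: pull every number out of the narrative and refuse to publish if any is absent from the set of source values. The function names are hypothetical, and the regex is deliberately naive: it treats a comma as a decimal separator, does not handle thousands separators, and would also flag numbers inside labels such as week identifiers unless those are added to the allowed set.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def untraced_numbers(narrative: str, source_values: set) -> list:
    """Return numbers in the narrative that match no source field value."""
    found = [float(m.replace(",", ".")) for m in NUMBER.findall(narrative)]
    return [n for n in found if n not in source_values]


def publish_or_fail(narrative: str, source_values: set) -> str:
    """Fail closed: raise instead of publishing if any number is untraced."""
    bad = untraced_numbers(narrative, source_values)
    if bad:
        raise ValueError(f"untraced numbers, skipping publish: {bad}")
    return narrative
```

Raising an exception makes the GHA job fail visibly, which matches the "skip publish, alert" behaviour in the mitigation column.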
Phase 2 trigger criteria¶
Build Reply Triage Agent (next slice) only when:
1. Phase 1 ran 4 consecutive weeks without manual intervention
2. Real cost ≤ €10/mo confirmed
3. Julien explicitly greenlights with "yes, go to Phase 2"
4. Langfuse dashboard shows clean trace flow (no investigation backlog)
Phase 2 expected scope: BD reply triage (Haiku 4.5 native classify into 5 buckets) + Gemini 2.5 Flash for prospect research enrichment.
References¶
- Anthropic — Building Effective Agents PDF
- Claude Agent SDK overview
- LiteLLM providers
- [memory] (V3 vision Apr 26)
- [memory]
- [ticket] MKT-OPS-04 (build tracker — to be created)
- [brief] Julien Re-onboarding Brief Decision #8
Filed Apr 26 2026 by Cleiton Sena. Status: SPEC — awaiting Julien approval Tue 28 Apr in 1:1.