
Reporting Agent V3 — Architecture Spec

One-line: Replace manual weekly traction reports with an autonomous agent that reads HubSpot + Cockpit + LinkedIn API + Google Ads, narrates the week, and posts to Notion + Slack every Friday 17:00 CET.

Why now: the W1/W2/W3 reports were only filed Apr 26 (backfilled) — 0 of 3 shipped in real time. Building V3 closes this gap for good and removes Cleiton as the bottleneck for every weekly review.


Decision context

This spec exists to convert Decision #8 in the Julien Re-onboarding Brief into concrete build state.

Apr 28 update — Decision #8a confirmed (Julien catch-up):

  • Phase 1 build = Cleiton self-funded (personal Anthropic account). No Soilytix budget approval needed.
  • Phase 2 expansion (8-agent department) = post 4-week quality gate; real decision in June.
  • Pending from Julien on Friday: confirm the 17:00 CET cadence + Slack DM format.

| Question | Answer |
|---|---|
| Build time | ~5 working days (1 week) — Mon May 4 → Fri May 8 |
| Run cost | €2–5/mo (router-on, no separate VM, Langfuse free tier) |
| Maintenance | ~2h/week monitoring + tuning |
| Reversibility | High — kill switch is pausing the workflow in GHA |
| Replaces | Manual report writing (~2h/week Cleiton) + 0/3 real-time miss rate |

Scope (Phase 1 — Reporting Agent ONLY)

This is slice 1 of the broader Revenue AI Department V3 vision (8 agents). Scope here = single-agent end-to-end, not the full department. Rationale: validate stack with one agent, instrument cost/quality, then decide Phase 2 expansion.

In scope:

  • Weekly traction report (Mon–Sun, posts Fri 17:00 CET)
  • Daily cockpit health-check (anomaly detection, Slack DM on alert)
  • Monthly summary (auto-aggregates 4 weeklies + adds narrative)

Out of scope (Phase 2+):

  • Reply triage agent
  • BD outreach agent (already at V0 = bd_pipeline.py)
  • PR pipeline agent (already at V0 = pr_pipeline.py)
  • Content creator
  • Reviewer
  • Pipeline health (separate agent)
  • Manager/orchestrator (only needed once 3+ agents run)


Architecture

┌─────────────────────────────────────────────────────────────────┐
│  GitHub Actions cron (Fri 17:00 CET / Mon-Fri 08:00 CET)        │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Reporting Agent (Claude Agent SDK v0.2.111+)                   │
│  Orchestrator: Opus 4.7 (prompt caching ON)                     │
└────┬───────────┬───────────┬───────────┬───────────┬────────────┘
     │           │           │           │           │
     ▼           ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐
│HubSpot  │ │ Cockpit │ │LinkedIn │ │ Google  │ │ PostHog  │
│ MCP     │ │ Sheets  │ │  Ads    │ │  Ads    │ │  events  │
│         │ │  API    │ │  API    │ │ Reports │ │          │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └──────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Worker: Sonnet 4.6 — narrative generation (caches week schema) │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Outputs                                                         │
│  • Notion page in Commercial Board (status Done auto)            │
│  • Slack #soilytix-agent message (TL;DR + link)                  │
│  • soilytix/reports/ markdown file (git-versioned)               │
│  • Cockpit Sheet auto-update (Weekly tab Pipeline + CPL)         │
└─────────────────────────────────────────────────────────────────┘

Single agent, no manager — at this scope an orchestrator adds latency without value. Manager Agent only kicks in when 3+ workers run concurrently (Phase 2+).


Stack — multi-provider router from Day 1

| Component | Choice | Role | Why | Cost (caching ON) |
|---|---|---|---|---|
| Orchestrator | Claude Agent SDK v0.2.111+ | Subagent dispatch, MCP, hooks | Mandatory for Opus 4.7 + native Claude alias support | — |
| Reasoning + planning | Opus 4.7 (Anthropic EU) | Decisions in agent loop (only when needed) | High-stakes synthesis. Used sparingly — most weeks Sonnet alone is enough. | €5 / €25 per MTok |
| Narrative model | Sonnet 4.6 (Anthropic EU) | Final write-up of report | 80% cheaper than Opus, indistinguishable for narrative tasks | €3 / €15 per MTok |
| Long-context reads | Gemini 2.5 Flash (Google EU) — via LiteLLM | Pulling HubSpot deals + Cockpit Sheet + Ads APIs into context | 10× cheaper than Sonnet for read-heavy tasks. 1M ctx window absorbs all reads in one shot. Batch discount 50%. | €0.30 / €2.50 per MTok |
| Anomaly classifier | Haiku 4.5 (Anthropic EU) | Daily z-score check on rolling metrics | ~5k tokens/day. Native Claude. | €1 / €5 per MTok |
| Bulk parsing (Phase 2) | Llama 3.3 70B (Groq, US) — via LiteLLM | When parsing 100+ deals at once | Cheapest fast tokens. Activates only at scale. | €0.59 / €0.79 per MTok |
| Multi-provider proxy | LiteLLM v1.83.x (pinned, patched) | Routes Gemini/Llama/Codestral as Claude aliases for the SDK | Claude Agent SDK only speaks Claude aliases natively → LiteLLM is required for non-Claude workers. Pin the exact patched version (NOT v1.82.7/8 — supply-chain incident). | Python lib, no separate VM: `pip install litellm` |
| Runtime | GitHub Actions cron | Trigger weekly/daily/monthly | Already used for BD/PR pipelines. Zero new infra. | Free tier sufficient |
| State | Cockpit Sheet + Notion DB | Persistence | No new DB needed. | Free |
| Observability | Langfuse Cloud EU (free tier) | LLM traces + cost per agent + last run + kill button | EU residency (GDPR). 50k observations/mo free — covers Phase 1 + 2 easily. Wire on Day 1, not as a retrofit. | Free tier |

EU compliance: Anthropic EU residency (1.1× multiplier accepted), Gemini EU region, Mistral FR-domiciled, Langfuse Cloud EU. No hosted DeepSeek (CSRD/GDPR red flag).

Why router from Day 1 (not deferred to Phase 2): the read step is by far the heaviest in tokens (~30k context per weekly run). Routing reads through Gemini Flash drops Phase 1 cost from €10-20/mo to €2-5/mo, and proves the multi-provider stack works on a low-stakes pipeline before scaling to Phase 2's 8-agent department. The router IS the Phase 1 architectural validation.

Why LiteLLM as Python lib (not VM): Phase 1 has 1 agent in 1 GHA workflow. A separate VM adds €4.5/mo + DevOps overhead with zero benefit at this scale. pip install litellm inside the GHA job is enough. Migrate to Hetzner VM in Phase 3+ when Cleiton evolves B-level DevOps comfort and 5+ agents share routes (then VM amortizes).
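Concretely, the Day-1 routing can be a small in-job table rather than any proxy process. A minimal sketch — the model ID strings and the `pick_model` helper are illustrative placeholders, not confirmed routing names:

```python
# Task→model routing table mirroring the stack table above.
# Model ID strings are illustrative, not verified provider names.
ROUTES = {
    "read":     "gemini/gemini-2.5-flash",     # heavy-context source reads
    "classify": "anthropic/claude-haiku-4-5",  # daily anomaly check
    "narrate":  "anthropic/claude-sonnet-4-6", # final report write-up
    "plan":     "anthropic/claude-opus-4-7",   # rare high-stakes decisions
}

def pick_model(task: str) -> str:
    """Route a pipeline step to the cheapest adequate model.

    Unknown tasks fall back to the cheap reader, never to Opus —
    cost surprises should only come from explicit choices.
    """
    return ROUTES.get(task, ROUTES["read"])
```

Each step in the GHA job would then pass `pick_model("read")` (etc.) as the model argument to the LiteLLM completion call; the provider prefix does the cross-provider translation, so nothing runs outside the job.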


Data sources (read paths)

| Source | What it gives | Auth | Read frequency |
|---|---|---|---|
| HubSpot MCP | Deals (stage, amount, probability), Contacts (last touch), Meetings (booked/held) | OAuth (existing) | Mon morning + Fri pre-report |
| Cockpit Sheet API | Daily ad spend + leads + CPL per channel | Service account (sheets-write-v2) | Daily 08:00 + Fri pre-report |
| LinkedIn Ads API | Campaign-level perf (CPC, CPL, leads) for nuance beyond Cockpit | OAuth (LinkedIn Ads MCP token) | Fri pre-report |
| Google Ads API | Campaign perf + search terms | OAuth (existing reauth flow) | Fri pre-report |
| PostHog API | Funnel events from soilytix.com (post-Klaro consent) | Project API key | Fri pre-report |
| GA4 API | Organic + referral traffic | OAuth (existing) | Fri pre-report |

Read-only. No writes to any source except own Notion page + Slack + git repo.


Outputs

Friday weekly report

  • Notion page in Commercial Board → Reports section, naming [Weekly Traction] WX YYYY-MM-DD (auto-published)
  • Slack DM to Cleiton + #soilytix-agent channel:
    📊 W17 Traction Report ready
    TL;DR: LinkedIn CPL €165 (+106% bench, see action plan)
           Pipeline +€[REDACTED] weighted (active deals)
           3 demos booked (-2 vs W16)
    Full report: <notion-url>
    
  • Markdown file in soilytix/reports/2026-W17-traction-report.md (git committed by GHA)
  • Cockpit Sheet update — Weekly tab autopopulated with new row
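The Slack step is a plain incoming-webhook POST; a sketch of the payload builder (function name and formatting are illustrative, only the message shape above is from the spec):

```python
import json

def slack_payload(week: int, tldr_lines: list, notion_url: str) -> str:
    """Build the Friday Slack message body in incoming-webhook format.

    The webhook itself is reused from the PR pipeline per the spec;
    this only assembles the JSON that gets POSTed to it.
    """
    text = f"📊 W{week} Traction Report ready\n"
    text += "TL;DR: " + "\n       ".join(tldr_lines)
    text += f"\nFull report: {notion_url}"
    return json.dumps({"text": text})
```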

Daily cockpit health-check

  • Mon-Fri 08:00 CET
  • Compares yesterday's metrics vs 7-day rolling avg + benchmarks
  • Silent unless anomaly detected (CPL >2σ from mean, leads = 0 for 2+ days, spend overrun)
  • On anomaly: Slack DM to Cleiton with diagnosis hypothesis (audience fatigue / tracking break / weekend effect)
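The 2σ rule above fits in about ten lines. A sketch — the function name and the flat-history handling are my assumptions:

```python
from statistics import mean, stdev

def cpl_anomaly(history: list, yesterday: float, z_thresh: float = 2.0):
    """Return (is_anomaly, z) for yesterday's CPL vs a 7-day rolling window.

    history: the previous 7 daily CPL values.
    Flat history (σ = 0) treats any deviation as an anomaly, since the
    z-score is undefined there.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return yesterday != mu, 0.0
    z = (yesterday - mu) / sigma
    return abs(z) > z_thresh, z
```

The same helper serves both the CPL check and the spend-overrun check; leads = 0 for 2+ days is a separate trivial counter.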

Monthly summary

  • 1st of month, runs at 09:00
  • Aggregates 4 weeklies + adds month-over-month narrative
  • Updates Cockpit Monthly tab
  • Posts to Notion → Reports

Cost model (realistic, router-on)

Per weekly run, broken down by step

| Step | Tokens | Model | Provider | Cost |
|---|---|---|---|---|
| Read context (HubSpot + Sheets + Ads + PostHog + GA4) | 30k in | Gemini 2.5 Flash (via LiteLLM) | Google EU | €0.01 |
| Number-tracing validator (every claim → source field) | 2k in, 0.5k out | Haiku 4.5 | Anthropic EU | €0.005 |
| Narrative generation (final report write-up) | 5k cached + 3k out | Sonnet 4.6 (caching ON) | Anthropic EU | €0.05 |
| Orchestration loop (only for complex decisions) | ~1k | Opus 4.7 (rare) | Anthropic EU | €0.005 |
| Per weekly run total | | | | ~€0.07 |

Per daily health-check

| Step | Tokens | Model | Cost |
|---|---|---|---|
| Read yesterday's metrics (Cockpit Daily tab) | 3k in | Gemini 2.5 Flash | €0.001 |
| Anomaly z-score classifier | 5k in, 0.5k out | Haiku 4.5 | €0.007 |
| Per daily total | | | ~€0.008 |

Monthly total (router-on, no VM)

| Cadence | Runs/month | Cost/run | Subtotal |
|---|---|---|---|
| Weekly report | 4 | €0.07 | €0.28 |
| Daily anomaly check (Mon–Fri) | 22 | €0.008 | €0.18 |
| Monthly summary | 1 | €0.10 | €0.10 |
| Anomaly investigations triggered | ~5 | €0.05 | €0.25 |
| Subtotal LLM API spend | | | €0.81 |
| LiteLLM (Python lib in GHA job) | | | €0 |
| GHA compute | | | €0 (free tier) |
| Langfuse Cloud EU (free tier, 50k obs) | | | €0 |
| Total Phase 1 budget (router-on, no VM) | | | €2–5/mo |

Buffer covers Anthropic API cost spikes + occasional ad-hoc analyses + free tier headroom margin.
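As a sanity check, the LLM subtotal reproduces from the cadence table (the daily row rounds 22 × €0.008 = €0.176 up to €0.18):

```python
# Monthly LLM spend recomputed from the cadence table (router-on).
runs = {
    "weekly_report":   (4,  0.07),   # 4 Fridays
    "daily_check":     (22, 0.008),  # Mon–Fri working days
    "monthly_summary": (1,  0.10),
    "investigations":  (5,  0.05),   # triggered anomaly deep-dives
}
subtotal_eur = sum(n * cost for n, cost in runs.values())  # ≈ €0.81
```

The €2–5/mo line in the table is the budget ceiling with buffer, several times this computed baseline.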

Spec history:

  • v1 (Apr 26): €10–20/mo, Anthropic-only assumption
  • v1.1 (Apr 26 evening): €5–7/mo, router-on with Hetzner VM
  • v2 (Apr 28, current): €2–5/mo, router-on, LiteLLM as Python lib, Langfuse free tier

Without caching: ~€10/mo (still trivial). With caching off + Anthropic-only fallback (worst case): €25-40/mo.

Phase 2 implications (8-agent department, post 4-week gate)

Earlier brief estimate: €250–600/mo (Anthropic-only assumption). With the router (60–70% of the read+classify load on Gemini/Groq/Codestral, narrative on Anthropic): €80–200/mo for the full 8-agent department fully online.

The router IS the cost story. Without it, Phase 2 is hard to greenlight at €600/mo. With it, the same capability is European-residency, multi-provider, and ~3× cheaper.


First 3 actions — Mon May 4 morning (35 min total)

Pre-build setup. Do these BEFORE Day 1 begins. Sequenced for minimum context-switching.

1. Setup Langfuse Cloud account EU region (15 min)

  • Go to https://cloud.langfuse.com
  • Choose EU region during signup (GDPR — non-negotiable)
  • Create project "soilytix-agents"
  • Get API keys: LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY + LANGFUSE_HOST (https://cloud.langfuse.com)
  • Add as GHA secrets in Soilytix/soilytix-revenue-automation:
    gh secret set LANGFUSE_PUBLIC_KEY --body "pk-lf-..."
    gh secret set LANGFUSE_SECRET_KEY --body "sk-lf-..."
    gh secret set LANGFUSE_HOST --body "https://cloud.langfuse.com"
    
  • Verify: dashboard https://cloud.langfuse.com opens with empty project — ready to receive traces

2. Setup Google AI Studio Gemini API key (10 min)

  • Go to https://aistudio.google.com
  • Sign in cleitonsenaa@gmail.com (existing)
  • Create API key (free tier — 60 req/min, plenty for Phase 1)
  • Test:
    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=AIza..." \
      -H 'Content-Type: application/json' \
      -d '{"contents":[{"parts":[{"text":"hello"}]}]}'
    
  • Add as GHA secret:
    gh secret set GOOGLE_AI_API_KEY --body "AIza..."
    

3. Narrative cadence — committed Apr 29 (10 min mental commitment ✓)

Personal-only channels (NOT Soilytix corporate accounts). Build-in-public Karpathy/Tan pattern. First Friday post = Fri May 8 (after first auto-report ships).

Personal stack (5 primary + 3 bonus)

| # | Channel | Cadence | Role |
|---|---|---|---|
| 1 | Personal LinkedIn (posts) | Fri 18:00 CET weekly | Main hub (Soilytix buyers + AI builders EU) |
| 2 | LinkedIn Articles (long-form) | Monthly (1st of month) | Authority + search-indexed |
| 3 | Substack newsletter | Bi-weekly (Sun) | Email-list ownership + durable SEO |
| 4 | X.com (@cleitonsena) | Daily build-log threads + Fri cross-post | Global AI-builders community |
| 5 | Public wiki (GitHub Pages) | Continuous push | Karpathy-style second brain — long-tail SEO |
| 6 (bonus) | Hacker News (Show HN) | One-shot per milestone | Phase 1 done (May 30) + Phase 3 framework OSS |
| 7 (bonus) | Bluesky | Automated cross-post from X | Zero extra effort; tech audience migrating |
| 8 (bonus) | Dev.to | Cross-post from Substack (canonical → Substack) | EU dev audience |

Wk 1-4 May topic backlog

| Wk | Date | LinkedIn topic |
|---|---|---|
| 1 | Fri May 8 | "Built a 5-line agent that replaces my 2h/week manual report. Stack: GHA + Claude SDK + LiteLLM + Gemini Flash + Langfuse. Cost: €X." |
| 2 | Fri May 15 | "First production alert — what the anomaly detector caught that I would have missed." |
| 3 | Fri May 22 | "Cost month-1: €X. Where every euro went (router-on Gemini reads = 80% saving)." |
| 4 | Fri May 29 | "Deciding Phase 2 — which agent next (Reply Triage vs Lead Enrichment) and why." |

Daily/weekly choreography

| Day | Channel | Content |
|---|---|---|
| Mon–Thu | X.com | 1–2 build-log tweets/day, learning of the day |
| Fri 18:00 | LinkedIn post + X cross-post | Weekly build-in-public anchor |
| Sun | Substack (bi-weekly, Wk 2 + Wk 4) | Long-form synthesis |
| 1st of month | LinkedIn Article + public wiki update | Case-study expansion |

Channels skipped + why

| Channel | Reason |
|---|---|
| TikTok | B2B agritech-AI = wrong audience |
| Threads (Meta) | Cross-posting from X gets <5% engagement |
| Mastodon | Tech audience migrated to Bluesky |
| YouTube | Time-intensive — defer to Phase 3+ when the framework is OSS-ready |
| Reddit | Engagement-heavy + shadow-ban risk for self-promotion |
| Medium | Substack > Medium for ownership/SEO |
| Product Hunt | Defer to Phase 3+ (framework OSS = the real launch) |

Soilytix corporate (separate track, NOT this stack)

Reserved for the Soilytix company profile (Bruno/Julien own those channels):

  • DLG events / agritech meetups (in-person)
  • Future Farming Magazine (EU trade press)
  • AgFunder Network newsletter

Total pre-build setup time: 35 min of real work. Parallel account setup (Substack + X + Bluesky + Dev.to + public wiki on GitHub Pages) takes ~2h spread across May Wk 1 and is NOT a build blocker.


Build plan (5 working days)

Day 1 — Foundation

  • Pre-day setup (35 min Mon AM): Langfuse Cloud EU signup + Google AI Studio Gemini key + GHA secrets wiring. See "First 3 actions" section.
  • Create repo dir Soilytix/soilytix-revenue-automation/agents/reporting/
  • pip install litellm==1.83.x (pinned patched, post-supply-chain) inside the GHA job — no separate VM
  • Install Claude Agent SDK v0.2.111+ (pip install claude-agent-sdk)
  • Wire all 6 MCPs (HubSpot, Sheets, LinkedIn Ads, Google Ads, PostHog, GA4) — already exist, just connect
  • Langfuse traces wired Day 1, not retrofit — every LLM call instrumented from the first token

Day 2 — Read paths

  • HubSpot deal-stage snapshot function (input: week range; output: structured dict)
  • Cockpit Sheet read function (input: Weekly tab range; output: dict)
  • LinkedIn Ads + Google Ads campaign perf functions
  • PostHog + GA4 funnel snapshot
  • Smoke test: dump all 6 sources to JSON, verify shapes
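The Day 2 smoke test can be a pure shape assertion before any LLM sees the data. A sketch with illustrative key names — the real dict shapes get fixed on Day 2:

```python
# Expected top-level keys per source; the key names are hypothetical.
EXPECTED_KEYS = {
    "hubspot":  {"deals", "contacts", "meetings"},
    "cockpit":  {"spend", "leads", "cpl"},
    "linkedin": {"campaigns"},
    "gads":     {"campaigns", "search_terms"},
    "posthog":  {"funnel_events"},
    "ga4":      {"sessions_by_channel"},
}

def smoke_test(snapshots: dict) -> list:
    """Return 'source: missing <key>' errors; an empty list means pass."""
    errors = []
    for source, required in EXPECTED_KEYS.items():
        present = set(snapshots.get(source, {}).keys())
        errors += [f"{source}: missing {k}" for k in sorted(required - present)]
    return errors
```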

Day 3 — Narrative generation

  • Prompt design: weekly report template (TL;DR / By the numbers / Wins / Risks / Next week focus)
  • Sonnet 4.6 narrative call with cached system prompt + cached schema
  • Quality gate: generate 3 reports against W14/W15/W16 data, compare vs human-filed Cleiton reports — should pass blind test

Day 4 — Outputs

  • Notion page creation in Commercial Board (using existing MCP or notion-cli once MKT-OPS-03 lands)
  • Slack message via webhook (reuse #soilytix-agent webhook from PR pipeline)
  • Markdown file commit (auto-PR or direct push to main with skip-ci)
  • Cockpit Sheet auto-write (sheets-write-v2 creds)

Day 5 — Cron + observability

  • GHA workflow .github/workflows/reporting-weekly.yml (cron 0 16 * * 5)
  • GHA workflow .github/workflows/reporting-daily.yml (cron 0 7 * * 1-5)
  • Anomaly detection function (z-score on rolling 7-day window)
  • Langfuse dashboard: cost per report, latency per LLM call, error rate, kill-button per agent
  • Run end-to-end Fri May 8 17:00 CET — first auto-report ships

Buffer / Polish (week 2 if needed)

  • Tune narrative prompts based on Julien feedback
  • Add Linear-style "Next week focus" section auto-derived from open Notion tickets
  • Add weekly emoji header (🚀 / 🟡 / 🔴) based on overall traction score

Quality gates (before Julien greenlight Phase 2)

Run for 4 consecutive weeks (May Wk 2 → Wk 5 = May 8 → May 31). Phase 2 only if:

- [ ] Cost — actual ≤ €10/mo (vs the €2–5 estimate, with buffer)
- [ ] Quality — Julien rates 4 of 4 reports ≥ 7/10 vs the Cleiton baseline
- [ ] Reliability — 0 missed weekly runs (auto-recovery on transient failures)
- [ ] Time saved — Cleiton spends <30 min/week reviewing/editing (vs 2h writing)
- [ ] No data leak — Langfuse traces show 0 unauthorized writes outside scoped sources

If 1+ gate fails: iterate Phase 1, do not expand to Phase 2.


Risks + mitigations

| Risk | Severity | Mitigation |
|---|---|---|
| Opus 4.7 new tokenizer surprise costs | Medium | Cap weekly spend at €5 via Anthropic API budget alert. Fall back to a Sonnet 4.6 orchestrator if exceeded. |
| LiteLLM future supply-chain incident | Medium | Pin the exact version (specific v1.83.x patch). Renovate bot for security-only updates. |
| HubSpot/LinkedIn API rate limits | Low | Cache reads with 1h TTL; nightly batch instead of real-time. |
| Narrative drifts (Julien hates the voice) | Medium | Quality gate in week 1 — on drift, re-prompt with Cleiton's W1–W3 reports as few-shot examples. |
| Hallucinated numbers | High | Hard gate: every number in the narrative must trace to a source field. A validator function asserts this before publish. Fail closed (skip publish, alert). |
| Anomaly false positives (Slack noise) | Low | Tune the z-score threshold in weeks 1–2. Allow a user mute command in Slack. |
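The hallucinated-numbers gate is the one worth sketching. A minimal fail-closed version — the regex and strict verbatim matching are simplifications; a real validator would also normalize thousands separators and derived percentages:

```python
import re

def untraced_numbers(narrative: str, source_values: set) -> list:
    """Return every number in the narrative absent from the source data.

    Non-empty result ⇒ fail closed: skip publish, alert Cleiton.
    Matching is deliberately verbatim — the model must quote source
    figures, not re-derive them.
    """
    found = re.findall(r"\d+(?:[.,]\d+)?", narrative)
    return [n for n in found if n not in source_values]
```

`source_values` would be built on the read path as the set of stringified fields pulled from HubSpot/Cockpit/Ads, so the gate needs no extra API calls.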

Phase 2 trigger criteria

Build the Reply Triage Agent (next slice) only when:

1. Phase 1 has run 4 consecutive weeks without manual intervention
2. Real cost ≤ €10/mo confirmed
3. Julien explicitly greenlights with "yes, go to Phase 2"
4. The Langfuse dashboard shows clean trace flow (no investigation backlog)

Phase 2 expected scope: BD reply triage (Haiku 4.5 native classify into 5 buckets) + Gemini 2.5 Flash for prospect research enrichment.




Filed Apr 26 2026 by Cleiton Sena. Status: SPEC — awaiting Julien approval Tue 28 Apr in 1:1.