AI Operations SOP — Elena Revicheva

What a business owner should take away

Coverage: tutoring (WhatsApp + Telegram), growth & education social (X + Instagram), job hunter + shortlists, marketing co-founder (site + CRM pipeline), CTO-grade orchestration, creative co-founder persona, daily voice briefing—wired together intentionally.
Risk control: watchdog scripts, restart policies, and separation of secrets mean failures tend to be local (one bot), not existential (whole fleet).
GTM leverage: meaningful releases can propagate to LinkedIn, blogs, X, and Instagram without manual copy-paste, governed by commit rules so internal churn never spams customers.

Tracked capabilities · 9 live · 1 roadmap (AILA)

Infrastructure layers · Oracle · AWS · static edge

Automated health cadence

Codebases on the primary compute host

Manual steps in milestone → social pipeline

Section 01

Products & agents

Each row is a shipping capability—what customers or partners touch—with how it runs underneath (Linux services, process supervisor, or serverless). Naming matches the internal resilience matrix without exposing infrastructure coordinates.

CRM pipeline (HubSpot) + prospecting from hiring boards & product launches

X growth automation (stream listening + engagement + alerts)

Morning briefing audio via AWS Lambda + secure CTO data bridge

Executive rhythm: Trello digests to Telegram

Operational invariant: exactly one deployed checkout per GitHub repository—prevents version drift, duplicate secrets, and “which folder is live?” incidents. Pairs like CTO + creative co-founder deliberately share a codebase but run as distinct personalities/interfaces.

#	Agent	Role (business + ops)	Runtime	Status
01	EspaLuz WhatsApp repo wa.me	Channel: Spanish tutoring on WhatsApp—conversation, drills, corrections. Runs as a managed Linux service (`espaluz-whatsapp`) with automated health checks.	systemd	● Live
02	EspaLuz Telegram repo Telegram	Channel: Same tutoring product on Telegram. Two-layer memory: retrieval + pgvector RAG (`espaluz_rag.py`). Service `espaluz-familybot`.	systemd	● Live
03	EspaLuz Influencer repo Telegram	Brand: Instagram publishing on a disciplined schedule; can spotlight real shipping milestones in consumer-friendly copy. Groq captions + Make.com media handoff. Unit `espaluz-influencer`.	systemd	● Live
04	Algom Alpha (@reviceva) repo X	Growth: Always-on X presence (education + narrative); folds major releases into the timeline without sounding like raw developer logs. Stream sampling, engagement runner, and account-activity hooks coordinated with the CTO bot for alerts / follow-back. PM2 workers include `dragontrade-main` and satellite processes.	PM2	● Live
05	VibeJob Hunter repo Telegram	Product: Autonomous job hunt pipeline—evaluation harness, routing, ATS integrations. Shares codebase with the marketing co-founder agent. Worker `vibejobhunter`.	systemd	● Live
06	AI Marketing Co-Founder (CMO AIPA) repo LinkedIn aideazz.xyz	Revenue narrative: LinkedIn cadence, long-form syndication, CRM hygiene—turns engineering momentum into market-facing proof. Claude + connectors for social; Hunter.io enrichment → HubSpot. Paired FastAPI bridge `vibejobhunter-web` exposes an internal health route.	systemd	● Live
07	OpenClaw Vibejob Shortlist repo Telegram	UX: Curated job shortlists delivered inside Telegram. Standalone gateway service `openclaw-gateway`; probed via private health URL on the app host.	systemd	● Live
08	Tech Co-Founder (CTO AIPA) repo Telegram	Control tower: Watches repositories, scores riskier changes, broadcasts milestones to marketing, runs outreach/board workflows. Express orchestrator under PM2 (`cto-aipa`), Oracle Autonomous DB via wallet-based TLS—credentials never live in this HTML.	PM2	● Live
08.1	Sprint Briefing (Sprinter) repo	Founder ritual: Daily audio briefing synthesized from tasks, notes, and captures. AWS Lambda on a schedule; pulls context through the CTO service over HTTPS with shared-secret auth—no database wallet inside Lambda.	Lambda	● Live
09	Creative Co-Founder (Atuona CCF) repo Telegram atuona.xyz	Creative partner: Separate bot persona + public studio site—same reliability envelope as the CTO stack. Single PM2 orchestrator binary; site ships via static edge hosting.	PM2	● Live
10	AILA repo	Roadmap: Long-horizon personal orchestration—documented architecture, not yet a standalone production process. Interim coordination fields live in Oracle until AILA ships.	—	In design
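
Row 08.1 describes the Lambda pulling context from the CTO service over HTTPS with shared-secret auth. A minimal sketch of that handshake follows, assuming a hypothetical route (`/briefing/context`) and a hypothetical header name (`X-Briefing-Secret`); the real ones are not published here.

```python
import hmac
import urllib.request


def build_context_request(base_url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for briefing context.

    The route and header name are illustrative assumptions.
    """
    return urllib.request.Request(
        f"{base_url}/briefing/context",
        headers={"X-Briefing-Secret": secret},
        method="GET",
    )


def is_authorized(presented: str, expected: str) -> bool:
    """Server-side check; constant-time comparison avoids timing leaks."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

The point of the design is that the Lambda holds only one secret string, while the database wallet stays on the host that runs the CTO service.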

Section 02

Reliability & uptime practice

For founders: scheduled probes ask each product whether it still responds. For engineers: one bash driver on the primary Oracle VM runs roughly every five minutes; keep-alive traffic avoids idle reclamation; systemd caps restart storms.

No noisy herd restarts: failed probes recycle only the affected unit. Concrete URLs and scripts remain in the private appendix—not pasted here.

Agent	Health check method	Recovery action
CTO AIPA + Atuona	Orchestrator HTTP OK via localhost probe	pm2 restart cto-aipa
EspaLuz WhatsApp	Tutoring webhook answers OK from localhost	systemctl restart espaluz-whatsapp
VibeJob Hunter + CMO	Marketing bridge health endpoint OK internally	systemctl restart vibejobhunter-web vibejobhunter
OpenClaw Shortlist	HTTP GET gateway loopback → 200	systemctl restart openclaw-gateway
Sprint Briefing	CloudWatch + EventBridge schedule	Lambda retries / DLQ policy
PM2 stacks (e.g. Algom)	cron HTTP + pm2 jlist status online	pm2 restart <app>
All systemd agents	Process liveness via systemctl	systemd restart policy
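
The probe-then-recycle pattern in the table can be sketched as below. This is an illustration, not the production driver (which the text describes as a bash script): the loopback URLs and ports are hypothetical, and only the restart commands are taken from the table.

```python
import subprocess
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    url: str                      # hypothetical loopback health URL
    restart_cmd: tuple[str, ...]  # recovery action from the table


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, timeouts, and refused connections
        return False


def run_watchdog(checks, probe=http_ok, restart=subprocess.run) -> list[str]:
    """Probe every check; restart only the units whose probe failed."""
    recycled = []
    for check in checks:
        if not probe(check.url):
            restart(list(check.restart_cmd), check=False)
            recycled.append(check.name)
    return recycled


CHECKS = [
    Check("cto-aipa", "http://127.0.0.1:3000/health",
          ("pm2", "restart", "cto-aipa")),
    Check("espaluz-whatsapp", "http://127.0.0.1:8000/health",
          ("systemctl", "restart", "espaluz-whatsapp")),
]
```

Calling `run_watchdog(CHECKS)` from a five-minute cron entry would recycle only the failing unit, which is the "no herd restarts" property described above.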

Section 03

Go-to-market automation

When engineering ships something worth talking about, the stack fans it out across LinkedIn, blogs, X, and Instagram—without a human retyping the same story five times.

Quality gate: only commits tagged feat:, launch:, or release: notify the marketing agent. Housekeeping commits (fix:, docs:, chore:, …) stay invisible to customers.
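
The quality gate amounts to a prefix check on the commit subject line. A minimal sketch, assuming the match is case-insensitive and applied to the first line only (both are assumptions; scoped forms such as `feat(api):` would not pass this version):

```python
MILESTONE_PREFIXES = ("feat:", "launch:", "release:")


def is_milestone(commit_message: str) -> bool:
    """True when the commit subject starts with a milestone prefix."""
    lines = commit_message.strip().splitlines()
    subject = lines[0].lower() if lines else ""
    return subject.startswith(MILESTONE_PREFIXES)
```

For example, `is_milestone("feat: add shortlist export")` passes the gate, while `is_milestone("chore: bump deps")` does not.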

GitHub Webhook

Commit detected → CTO AIPA

Push events hit the secured webhook. Groq/Claude review diffs, classify milestones, enqueue pending updates for downstream marketers.
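
One common way to secure a GitHub webhook is to verify the `X-Hub-Signature-256` header, which GitHub computes as an HMAC-SHA256 of the raw request body using the webhook secret. The sketch below shows that check; whether the CTO service uses exactly this mechanism is an assumption.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Compare the received signature header with one computed locally."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so odd header content cannot raise.
    return hmac.compare_digest(expected.encode(), header.encode())
```

The check must run on the raw bytes of the request body, before any JSON parsing, because re-serialized JSON rarely matches the signed payload byte for byte.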

LinkedIn · 20:00 Panama

CMO generates + posts

Claude Sonnet copy → Make.com delivery. Zero manual paste.

Hashnode + dev.to · Async

Blog crosspost

blog_publisher.py fires after LinkedIn: Hashnode essay + dev.to canonical backlink to aideazz.xyz.

X · Every 5th post slot

Algom Alpha tweet

x-tech-updater.js merges milestones in plain language (Haiku / Groq), guarded against duplicate queue states.
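
The slot rule and the duplicate guard can be sketched together. This is a Python illustration of the idea, not the contents of x-tech-updater.js; it assumes slots are counted from 1, that milestones land on slots 5, 10, 15, and so on, and that titles identify a milestone uniquely.

```python
def pick_milestone(slot_index, pending, posted_titles, every=5):
    """Return the milestone for this slot, or None for a regular post.

    slot_index is 1-based; posted_titles is mutated to record the pick.
    """
    if slot_index % every != 0:
        return None
    for item in pending:
        if item["title"] not in posted_titles:
            # Record the pick before posting so a retry cannot re-emit it.
            posted_titles.add(item["title"])
            return item
    return None
```

Returning `None` on a milestone slot with nothing new pending lets the caller fall through to its normal content.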

Instagram · Even days 18:00 Panama

EspaLuz Influencer

Milestone-aware caption + Make.com media pipeline; falls back to standard queue when nothing pending.
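
A sketch of the Instagram slot check and the fallback rule follows. It assumes "even days" means an even day of the month and that the slot covers the whole 18:00 hour; Panama keeps UTC-5 all year, so a fixed offset is enough.

```python
from datetime import datetime, timedelta, timezone

PANAMA = timezone(timedelta(hours=-5))  # no daylight saving in Panama


def is_instagram_slot(now_utc: datetime) -> bool:
    """True on even days of the month during the 18:00 hour, Panama time."""
    local = now_utc.astimezone(PANAMA)
    return local.day % 2 == 0 and local.hour == 18


def next_caption_source(pending_milestones: list, standard_queue: list):
    """Prefer a pending milestone; otherwise fall back to the standard queue."""
    if pending_milestones:
        return pending_milestones.pop(0)
    return standard_queue.pop(0) if standard_queue else None
```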

Section 04

Release discipline

Board-friendly translation: we ship like a product company—predictable processes, isolated secrets, verifiable rollouts—even though agents move faster than most teams.

Rule · 01

One live checkout per codebase

Eliminates “which folder is prod?” debates; paired bots share code intentionally but never duplicate repos.

Rule · 02

Green build, then swap

Pull latest → compile/tests succeed → only then restart supervised processes. Broken artifacts never replace what customers already rely on.
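
The ordering in Rule 02 can be expressed as a short gate: run each build step, stop at the first failure, and restart only when everything succeeded. The step commands shown in the usage note are hypothetical examples.

```python
import subprocess
from typing import Sequence

Command = Sequence[str]


def run(cmd: Command) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def deploy(build_steps, restart_cmd: Command, runner=run) -> bool:
    """Restart the supervised process only after every build step passes."""
    for step in build_steps:
        if not runner(step):
            return False  # running processes are left untouched
    return runner(restart_cmd)
```

A call might look like `deploy([["git", "pull", "--ff-only"], ["npm", "ci"], ["npm", "test"]], ["pm2", "restart", "cto-aipa"])`; if the tests fail, the restart command is never issued.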

Rule · 03

Secrets isolation

Each bot owns its environment file; crypto wallets never touch GitHub; TypeScript strict mode catches sloppy typings before prod.

Rule · 04

PM2 persistence

pm2 startup + pm2 save on every new process; ecosystem files set max_restarts + autorestart.
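
PM2 accepts process definitions as JSON as well as JavaScript, so the two settings named in Rule 04 can be illustrated by generating a small process file. The script path and the restart ceiling of 10 are placeholder assumptions, not the fleet's actual values.

```python
import json


def pm2_app(name: str, script: str, max_restarts: int = 10) -> dict:
    """One PM2 app entry with restart behaviour made explicit."""
    return {
        "name": name,
        "script": script,            # hypothetical entry point
        "autorestart": True,         # bring the process back after a crash
        "max_restarts": max_restarts,  # stop retrying after this many failures
    }


def render_process_file(apps: list) -> str:
    """Serialize app entries in the {"apps": [...]} shape PM2 reads."""
    return json.dumps({"apps": apps}, indent=2)
```

Writing the output to a file such as `process.json`, starting it with `pm2 start process.json`, and then running `pm2 save` would persist the process list across reboots once `pm2 startup` has been configured.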

Rule · 05

No silent failures

Crash handlers log before exit so supervisors show why something died; watchdog cadence targets ~5 minute detection.
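
For the Python services, a crash handler of this kind can be approximated with `sys.excepthook`. The sketch below is illustrative; the logger name `agent` is an assumption, and Node processes would need their own equivalent.

```python
import logging
import sys

log = logging.getLogger("agent")  # hypothetical logger name


def log_uncaught(exc_type, exc_value, exc_tb) -> None:
    """Record the fatal exception so supervisor logs show why we died."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Keep Ctrl-C behaviour unchanged.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return
    log.critical("fatal: unhandled exception",
                 exc_info=(exc_type, exc_value, exc_tb))


def install_crash_handler() -> None:
    """Send logs to stderr and route uncaught exceptions through the logger."""
    logging.basicConfig(level=logging.INFO, stream=sys.stderr)
    sys.excepthook = log_uncaught
```

After the hook returns, the interpreter still exits with a non-zero status, so systemd or PM2 sees the failure and applies its restart policy.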

Rule · 06

Verify after deploy

Health signal green, database connectivity logs clean, one real Telegram interaction—all pass before the incident is closed.

Section 05

Incident response template

For stakeholders: regressions are handled like financial reconciliations—symptoms, compounded causes, fix, proof—so the same automation trap rarely strikes twice.

HubSpot duplicate posting loop

May 10, 2026 — same milestone tweet emitted twice ~6 minutes apart

Symptom

Pending HubSpot milestones resurfaced every x-tech-updater.js cycle.

Root causes

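Three mismatches compounded: (1) legacy rows were flagged posted, while the pending filter checked only posted_x; (2) the mark endpoint matched rows on timestamp, while older rows stored received_at; (3) the backlog had never been backfilled with posted_x.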

Fix applied

The pending GET now excludes rows carrying either flag (posted or posted_x); the mark endpoint tries timestamp → received_at → title; the JS client sends the title so fallback matching can succeed.

Verified by

API snapshot {"ok": true, "pending": [], "total": 0, "held": true} + two full automation cycles without duplication.

Resolution

≈2 hours from detection → patched APIs → verified on live automation cycles. Full narrative retained in the engineering appendix linked below.
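
The two halves of the fix can be sketched against in-memory rows. This is one reading of the write-up, not the patched API itself: it assumes the client sends a single timestamp value that is compared first with `timestamp`, then with `received_at`, and that the title is used only as a last resort.

```python
def pending(rows: list) -> list:
    """Rows still awaiting an X post: neither flag is set."""
    return [r for r in rows if not r.get("posted") and not r.get("posted_x")]


def mark_posted(rows: list, timestamp=None, title=None) -> bool:
    """Mark the first row matched by timestamp, then received_at, then title."""
    matchers = (
        lambda r: timestamp is not None and r.get("timestamp") == timestamp,
        lambda r: timestamp is not None and r.get("received_at") == timestamp,
        lambda r: title is not None and r.get("title") == title,
    )
    for matches in matchers:
        for row in rows:
            if matches(row):
                row["posted_x"] = True
                return True
    return False
```

With both changes in place, a row that was posted under the legacy flag never reappears as pending, and a row that cannot be matched by time can still be closed out by title.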

SOP update rule: materially production-facing incidents earn the same structured write-up internally—so institutional memory compounds instead of resetting.

Section 06

Stack reference

Boring reliability primitives where uptime matters; sharp AI + CRM + social APIs where differentiation matters.

Process Mgmt

PM2 · systemd · AWS Lambda + EventBridge

Databases

Oracle Autonomous DB (walleted TLS, thick mode, multi-table estate) · PostgreSQL + pgvector (1536-d RAG)

CRM & Outreach

HubSpot CRM v3 + v4 associations · Hunter.io · Resend

Social

X API v2 (Account Activity, filtered stream, engagement worker) · Make.com · Telegram Bot API

AI / LLMs

Claude Sonnet / Haiku · Groq Llama 3.3 70B · OpenAI TTS / Whisper · LangChain · LangGraph

Lead Gen

HN Algolia · GitHub REST · Product Hunt GraphQL — ~150–250 net-new companies/month after filtering

Hosting / CDN

OCI Ubuntu VM (VM.Standard.E5.Flex, 12 GB) · AWS Lambda · 4everland IPFS frontends · Cloudflare DNS

Monitoring

Cron health driver · PM2 logs · CloudWatch · curl probes · OCI keep-alive

Content & SEO

Hashnode GraphQL · dev.to · GA4 · Google Search Console · GEO pack (llms.txt, crawler tokens)

Project Mgmt

Trello API (daily + weekly Telegram briefings) · GitHub webhooks across the fleet
