I killed my Spanish learning web app after 97 signups and zero revenue. Three weeks later, I had paying customers for the exact same functionality delivered through WhatsApp. The difference wasn't the AI — it was removing every friction point between a learner and their next conversation.
Here's what actually worked: a two-layer memory system that maintains conversation context without Pinecone or Weaviate, session continuity that survives days between messages, and why my highest-paying user sends 200+ messages daily at $0.0008 per exchange.
Memory Without Vector Stores
Most AI language learning apps treat memory like a technical checkbox: embed conversations, store in Pinecone, retrieve on similarity. I started there too. Monthly cost for 10 active users: $49 for Pinecone, plus embedding compute. Revenue from those users: $0.
EspaLuz runs on a simpler architecture. Layer one: the last 20 messages stored in PostgreSQL with user_id, timestamp, and role. Layer two: a daily summary generated at midnight UTC, compressed to 500 tokens. Total storage per user per month: 18KB. Cost on Oracle Cloud's always-free tier: $0.
The conversation memory works because language learners repeat patterns. They practice the same verb conjugations, revisit the same vocabulary, make the same mistakes. You don't need semantic search across 10,000 historical messages. You need yesterday's correction about "ser vs estar" and last week's restaurant vocabulary.
Here's the actual query that powers memory retrieval:
SELECT message_content, role, created_at
FROM conversations
WHERE user_id = $1
AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC
LIMIT 20;
For context beyond seven days, the system pulls the latest summary:
SELECT summary_content
FROM daily_summaries
WHERE user_id = $1
ORDER BY summary_date DESC
LIMIT 5;
Five summaries = 2,500 tokens of context. Enough to remember that this user struggles with subjunctive mood and learned colors last month. Not enough to blow through context windows or retrieval budgets.
WhatsApp's Natural Persistence
Web apps demand logins. They send reminder emails. They need users to remember URLs, passwords, which tab they left open. My analytics showed the brutal truth: average session length 3.2 minutes, return rate after 24 hours: 7%.
WhatsApp conversations never end. The chat stays there between their mom's messages and their work group. No login. No URL. Just type and continue.
The technical implementation uses Twilio's WhatsApp Business API. The tricky part isn't receiving messages — it's maintaining state across the 24-hour session window that Twilio enforces. Here's how EspaLuz handles it:
1. Each incoming message updates a last_active timestamp
2. If last_active > 24 hours ago, inject a context refresh into the prompt
3. The refresh pulls the last conversation summary plus any corrections marked as "recurring"
The context refresh adds 200 tokens to the first message after a break. Cost: $0.0001. Value: the user doesn't repeat "Hi, I want to practice Spanish" for the 50th time.
Routing Decisions That Cut Costs 70%
My first architecture sent everything to Claude 3.5 Sonnet. Clean, simple, expensive. At $3 per million input tokens and $15 per million output tokens, a chatty user burned through $2/day. Spanish lessons at $29/month don't math at those margins.
Current routing logic:
- Grammar explanations, cultural context → Claude 3.5 Sonnet
- Basic conversation, vocabulary practice → Groq Llama 3.1 70B
- Message classification, intent detection → Groq Llama 3.1 8B
The router itself runs on Llama 3.1 8B, making decisions in <50ms:
def route_message(message_content, conversation_history):
classifier_prompt = f"""
Message: {message_content}
Recent context: {conversation_history[-3:]}
Classify as:
- complex: subjunctive, cultural nuance, multi-sentence explanation needed
- simple: vocabulary, present tense, yes/no, greetings
- meta: questions about the app, billing, features
Output only the classification word.
"""
classification = groq_client.complete(
model="llama-3.1-8b",
prompt=classifier_prompt,
max_tokens=10
)
return {
'complex': 'claude-3.5-sonnet',
'simple': 'llama-3.1-70b',
'meta': 'llama-3.1-8b'
}.get(classification.strip(), 'llama-3.1-70b')
Real usage data from the highest-volume user (237 messages yesterday):
- 71% routed to Llama 70B: $0.08
- 24% routed to Llama 8B: $0.01
- 5% routed to Claude: $0.11
Total cost: $0.20 for a full day of conversation practice. They pay $49/month.
Why Three Customers Beat One Hundred Signups
The web app had 97 signups in two weeks. Conversion rate to paid: 0%. The WhatsApp bot has 11 users. Three pay $29-49/month. The difference comes down to usage patterns that only emerged with real payment incentive alignment.
Free web app users:
- Average 2.3 messages per session
- Tested features, didn't practice Spanish
- Churned the moment they hit any friction
Paying WhatsApp users:
- Average 47 messages per day
- Practice specific scenarios repeatedly
- Report bugs with exact reproduction steps
The highest value feedback came from a $49/month user who sends voice messages. She exposed three critical issues:
1. The bot was correcting accent marks in voice transcriptions, creating confusion about whether she pronounced something wrong or Whisper misheard
2. Response latency spikes during her morning practice (7 AM Panama time) when Oracle Cloud's free tier gets hammered
3. The grammar explanations used Spain Spanish examples while she needed Mexican variants
None of these issues surfaced with free users. They just stopped using the app.
Implementation Details That Matter
The full EspaLuz stack:
- Orchestration: Custom Python scheduler on Oracle Cloud (no Airflow/Dagster)
- Message queue: PostgreSQL LISTEN/NOTIFY (no Redis/RabbitMQ)
- State management: JSON blob in PostgreSQL with conversation state
- Memory: Two-layer system described above
- Models: Groq (Llama 3.1 8B/70B), Anthropic (Claude 3.5 Sonnet)
- Voice: Whisper API for transcription, no TTS yet
- Deployment: Single Oracle Cloud VM, 4 CPU, 24GB RAM (free tier)
The surprising constraint: Twilio rate limits hit before any model API limits. WhatsApp Business allows 1,000 customer-initiated conversations per day. I hit 890 yesterday across all users.
Cost breakdown for 1,000 daily messages:
- Twilio WhatsApp: $8.90
- Model APIs: ~$2.50 (with routing)
- Oracle Cloud: $0 (free tier)
- PostgreSQL storage: $0 (under 1GB)
Revenue from 1,000 daily messages (at current pricing): ~$150/month.
Lessons for Technical Founders
Building AI language learning on WhatsApp taught me three things that web-first builders miss:
1. Memory doesn't need to be perfect. Language learners want continuity, not omniscience. My two-layer system maintains enough context for natural conversation at 1/100th the cost of vector search.
2. Platform constraints create better products. WhatsApp's 24-hour session window forced me to build better context refresh. The 1,600 character limit made responses concise. Voice message support emerged because users already used it.
3. Charging money surfaces real requirements. Free users explore. Paying users practice. One pays your AWS bills.
I'm building the fourth iteration now: group conversations for Spanish practice between learners. Same WhatsApp infrastructure, same memory system, new challenge: maintaining conversation context across multiple speakers without bleeding their individual learning histories together.
The technical approach: partition memory by conversation_id, inject only group-relevant summaries, maintain individual progress tracking separately. Expected launch: when I have five users willing to pay $19/month for group practice.
Because that's another lesson: in the AI agent business, revenue commitment beats feature requests every time.
Frequently Asked Questions
Q: Why not use RAG with local vector stores like ChromaDB to eliminate monthly costs?
A: I tested ChromaDB locally. Memory usage spiked to 2.8GB for 50 users' conversation histories. On Oracle's free tier with 1GB RAM allocated to vectors, I'd cap at ~18 users. My PostgreSQL approach scales to 500+ users on the same infrastructure.
Q: How do you handle conversation context when Groq's Llama models have smaller context windows than Claude?
A: Llama 3.1 70B handles 128K tokens. I use maximum 3K: 20 recent messages (2K) + 5 summaries (1K). The constraint isn't context window — it's response relevance. More context doesn't improve "¿Cómo se dice 'apple'?" answers.
Q: What happens to conversation memory if a user stops paying then resubscribes months later?
A: PostgreSQL soft-deletes after 30 days of inactivity but keeps summaries for 180 days. Resubscribing users get a "welcome back" message with their last three summaries injected. Cost to maintain: $0.0001/user/month.
Q: Why route to multiple models instead of using GPT-4 Mini for everything at $0.15/million tokens?
A: I A/B tested GPT-4 Mini against my routing system. Users rated conversation quality identical (4.3/5 average). But Mini's Spanish explanations were wordier — average response 340 tokens vs 180 for Llama 70B. At scale, those extra tokens cost more than smart routing saves.
Q: How do you prevent memory poisoning if users deliberately inject false information about their learning history?
A: I don't. If someone pays $29/month to gaslight their Spanish tutor bot about their progress, that's their choice. The daily summaries use basic fact extraction, not truth validation. Real users want accurate progress tracking.