New paradigm · April 2026

Your app
should learn
while you work.

Most apps are built once and stay frozen. What if yours got better every time you used it — automatically, without code changes, without retraining a model?

The Limitation

Claude Code, Codex, and other AI harness systems are no longer just tools for developers. They are becoming important for any kind of work that depends on human judgment.

Research, document review, lead qualification, onboarding, reporting, compliance, creative work: any process where a person has to read something, make a decision, and take the next step is a good fit for agentic AI.

Skills make this even more powerful. Skills are sets of instructions that teach an AI how to behave for a specific task or industry. They turn a general AI model into something more practical, consistent, and reliable.

In simple terms, Skills are like procedures or playbooks for AI. They show it how to handle real work, in real companies, at real scale.

But there is still a big problem that none of these tools has fully solved yet. Not everything can be done in a chat window or a terminal. Most processes require a better interface and more structure. The AI is ready. The shell it lives in is holding it back.

What if...?
01

What if you could have Claude Code and a specialised interface at the same time?

Not one or the other. A proper native or web app — with your own UI, your own workflows, your own design — running Claude Code in the background as its brain. The intelligence of the harness, the experience of a real product.

02

What if your app learned from every session without any model changes?

Every time you correct an AI decision, that correction is saved. Next run, the AI reads its own history and uses it as examples. No retraining. No data science team. The app just gets better as you use it.

03

What if the intelligence layer was completely independent from the app?

Update the AI's instructions, rules, and reasoning without touching a line of app code. Ship improvements to how Claude thinks daily — while the app itself stays stable and unchanged.

Introducing

Self-Evolving Apps are web or native applications built on top of Claude Code or Codex — where the AI reasoning layer lives completely outside the app shell, and improves automatically over time.

The app handles the interface, the data, and the user experience. Claude handles the thinking, the matching, the decisions. They communicate through a shared folder — a structured contract that neither side breaks.

Every user correction feeds back as a future example. Every skill update takes effect immediately. The app you ship today is smarter than the one you shipped last week — without a new release.

Definition

Self-Evolving App /self-ih-volv-ing ap/

  • A web or native application whose intelligence layer runs as a Claude Code or Codex subprocess
  • Built on skills — domain-specific instruction sets that can be updated without changing the app
  • Accumulates user corrections as few-shot examples, fed into every subsequent AI run
  • Streams AI thinking and tool usage to the UI in real time
  • Deployable as a web app (Node.js) or native desktop app (Swift/macOS)
  • Passes the independence test: Claude can run the task from a terminal alone
Built for Every Industry

Any process that involves reading, deciding, and acting is a candidate. Here are examples across common business functions — but if your process requires judgment, it fits.

💼
Sales
  • Quotation Generator

    Reads deal context and generates tailored quotes with correct pricing, conditions, and terms.

  • SDR Follow-up Agent

    Reviews CRM activity and drafts personalised follow-up sequences based on deal stage and history.

  • Proposal Builder

    Assembles full sales proposals from a brief — scope, pricing, timeline, and differentiation.

  • Pipeline Health Report

    Analyses deal pipeline data and flags at-risk opportunities with recommended next actions.

⚖️
Legal
  • Contract Generator

    Creates first-draft contracts from a brief — NDA, MSA, SoW — using company-approved templates and language.

  • Contract Review

    Reads incoming contracts, highlights non-standard clauses, flags risk, and proposes redlines.

  • NDA Screening

    Checks NDAs against a policy checklist and returns a pass/flag/reject with reasoning.

  • Compliance Checker

    Reviews internal documents or processes against regulatory requirements and outputs a gap report.

⚙️
Operations
  • Project Review Assistant

    Reads project updates, status logs, and timelines to produce a concise health summary with risk flags.

  • Time Tracking Analyser

    Processes time logs and surfaces utilisation patterns, budget overruns, and team allocation issues.

  • Process Audit Tool

    Maps an existing workflow against a standard operating procedure and identifies gaps or inefficiencies.

  • Resource Allocation Report

    Analyses team capacity and workload data to recommend project staffing adjustments.

🧾
Admin & Finance
  • Invoice Generator

    Converts completed project data into formatted invoices with correct line items, taxes, and payment terms.

  • Reconciliation Agent

    Compares transaction records across systems, flags discrepancies, and produces a reconciliation report.

  • Expense Report Automation

    Reads receipts and categorises expenses against policy, flagging out-of-policy items before submission.

  • Budget vs Actual Analysis

    Pulls financial data and generates a commentary-style variance report ready for leadership review.

🧑‍💼
HR & People
  • CV Screening Assistant

    Reads applications against a job brief and scores each candidate with a structured shortlist rationale.

  • Onboarding Workflow

    Guides new hires through documentation, policy reading, and task completion with AI-assisted Q&A.

  • Performance Review Summariser

    Reads self-assessments, manager notes, and goal data to draft structured review summaries.

  • Job Description Generator

    Turns a role brief into a polished, inclusive job description aligned to company tone and level standards.

📣
Marketing & Support
  • Content Brief Generator

    Takes a topic and audience brief and produces a structured content brief with angle, outline, and key messages.

  • Campaign Performance Analyst

    Reads campaign data and generates a narrative performance report with insights and next-step recommendations.

  • Support Ticket Triage

    Classifies incoming support tickets by urgency, topic, and required skill, then routes or drafts a first response.

  • Knowledge Base Q&A

    Answers customer questions using company documentation, escalating automatically when confidence is low.

Don't see your use case? If your process involves reading information, applying judgment, and producing an output — it can be built as a self-evolving app.

Why Advanced
Harness Models?

Claude Code, Gemini CLI, Codex, and Copilot CLI are the most capable AI harness systems available today. They're not the same product — but they share the same fundamental shift: the AI doesn't just generate text, it acts.

What they all share
  • Advanced reasoning & extended thinking modes
  • MCP & tool-use — connect any API or data source
  • CLI-first design — scriptable, automatable, composable
  • Native code execution in sandboxed environments
  • Subagent orchestration — agents spawning agents
  • Continuous improvement — all developing at incredible speed
Which one should you use?

It really depends on your company policies and existing licenses. The good news: self-evolving apps work with any of them. The harness is swappable — your skills, memory, and architecture stay the same.

Feature Comparison
  • Skills: Claude Code ✅ Markdown skill files · Codex CLI ✅ AGENTS.md + custom commands · Gemini CLI ✅ Agent Skills (.md) · Copilot CLI ✅ Shared with cloud agent & VS Code
  • Subagents: Claude Code ✅ Isolated context, custom prompts & tools · Codex CLI ✅ Roles via config.toml + git worktrees · Gemini CLI ✅ Custom agents in .gemini/agents/ · Copilot CLI ✅ Built-in + custom .agent.md files
  • Parallel Agents: Claude Code ✅ Agent Teams with direct messaging · Codex CLI ✅ Parallel worktrees + Agents SDK · Gemini CLI ⚠️ Experimental · Copilot CLI ✅ /fleet + multiple sessions
  • MCP Support: Claude Code ✅ Native (stdio, SSE) · Codex CLI ✅ stdio + streaming HTTP; can act as MCP server · Gemini CLI ✅ Native (stdio, http, sse) · Copilot CLI ✅ GitHub MCP built-in + custom
  • Headless Run: Claude Code ✅ -p flag · Codex CLI ✅ codex exec (dedicated mode) · Gemini CLI ✅ gemini -p "prompt" · Copilot CLI ✅ -p / --prompt flag
  • Streaming / JSON: Claude Code ✅ JSON + stdout streaming · Codex CLI ✅ JSONL stream + --output-schema · Gemini CLI ✅ --output-format stream-json · Copilot CLI ✅ --output-format=json JSONL
  • Open Source: Claude Code ❌ No · Codex CLI ✅ Apache 2.0 (Rust) · Gemini CLI ✅ Apache 2.0 · Copilot CLI ❌ No
  • Multi-Model: Claude Code ❌ Anthropic only · Codex CLI ⚠️ OpenAI only (+ local Ollama) · Gemini CLI ❌ Google only · Copilot CLI ✅ Anthropic + OpenAI + Google
See full comparison →
How It Works
User Layer

App Shell — Web or Native

The app manages the UI, prepares data files, stores corrections, and renders results. It has no embedded intelligence — it is deliberately dumb. Built with Node.js + Express (web) or Swift + SwiftUI (macOS).

Vanilla JS / SwiftUI · File I/O · Subprocess runner · Corrections store · Token tracking
reads & writes files ↕
Contract Layer

Working Directory — The Briefing Room

The only shared space between the app and Claude. The app prepares it before each run. Claude walks in, reads everything, and leaves a structured answer. Neither side knows about the other's implementation — they only share this folder.

input.json · result.json · CLAUDE.md · references/.env · corrections.json · skill symlinks
cwd = working directory ↕
AI Layer

Claude Code / Codex Subprocess

Spawned by the app per session or kept alive between messages. Reads CLAUDE.md for instructions, credentials from .env, corrections.json for past examples. Streams thinking, tool calls, and text deltas back to the app. Writes structured output to result.json.

stream-json --verbose · AsyncStream events · Tool calls logged · Token + cost tracking
skill is a symlink ↕
Intelligence Layer

Skill — The Updatable Brain

A folder containing SKILL.md (instructions), Python scripts (preprocessing), and reference documents (domain knowledge, matching rules). Symlinked into four locations so Claude finds it from any context. Update the skill — the next run is smarter. The app never changes.

SKILL.md · scripts/*.py · references/*.md · 4× symlinked · Zero-downtime updates
The Learning Loop
01

Claude processes your data

The app passes input files to Claude. Claude reads the skill instructions, loads domain knowledge, and produces a structured result — streamed live to the UI.

02

You review the results

The app parses result.json and shows Claude's decisions in a review interface. Most answers are correct. A few need correction.

03

You correct what's wrong

You change the wrong answer to the right one. The app saves the correction: which signals were present, what Claude thought, and what the correct answer was.

04

Corrections become examples

corrections.json grows. The skill instructs Claude: "Read this file before reasoning. If you see similar signals, use these past answers as authoritative examples."

05

Next run is more accurate

No model retraining. No data science. Just a casebook that grows with every session — and an AI that reads it before every decision.

Zero ML infrastructure required
No fine-tuning, no vector databases, no training pipelines. The self-learning mechanism is a JSON file and a SKILL.md instruction to read it.

Skill updates, zero app releases
Improve the AI's reasoning, rules, and domain knowledge independently — the app stays unchanged while its users see the improvements on the next run.

Accuracy grows with usage
Every correction is an example the AI never forgets. The more the app is used, the less it needs to be corrected. The loop tightens automatically.
Technical Foundation

For builders who want to understand the implementation. Every pattern is production-tested in a real daily-use application.

Read the Docs →
📡

Streaming — NDJSON & AsyncStream

Claude streams events line by line. Web apps read via HTTP chunked fetch. Native apps use Swift AsyncStream<ClaudeEvent> — a typed enum covering thinking, toolUse, textDelta, done, and tokenUsage. Every event is rendered live.
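
The buffering logic behind NDJSON streaming fits in a few lines. A sketch, assuming newline-delimited JSON events; the event type names here are placeholders, not the harness's actual event vocabulary:

```javascript
// NDJSON consumer sketch: the harness emits one JSON object per line, but
// HTTP chunks can split lines arbitrarily, so the reader buffers the
// trailing partial line between chunks.
function makeNdjsonParser(onEvent) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the partial line for the next chunk
    for (const line of lines) {
      if (line.trim()) onEvent(JSON.parse(line));
    }
  };
}

const events = [];
const feed = makeNdjsonParser((e) => events.push(e));

// Simulate two chunks that split one event across the boundary.
feed('{"type":"thinking"}\n{"type":"text_del');
feed('ta","text":"Hello"}\n{"type":"done"}\n');

console.log(events.map((e) => e.type).join(",")); // thinking,text_delta,done
```

In a browser, `feed` would be called with each decoded chunk from a chunked `fetch` response; in Swift, the same buffering lives behind the `AsyncStream`.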

🔐

Credentials — references/.env

All API keys live in references/.env inside the working directory. The app reads and writes this file. Claude reads it directly when calling external APIs. No hardcoded secrets. No app-specific credential stores.

🔗

Skill Symlinks — 4 Locations

Every skill is symlinked into ~/.claude/skills/, ~/.agents/skills/, ~/workdir/.claude/skills/, and ~/workdir/.agents/skills/. All four point to the same real directory. Update once, all contexts update instantly.

📊

Token & Cost Tracking

Every run captures input tokens, output tokens, cache read tokens, and total cost in USD from Claude's result event. Displayed after every session. Users always know what processing costs.

📄

Structured Output Contract

Claude never returns freeform text as primary output. Every run ends with a structured JSON envelope: {"message":"...","results":[...]}. The schema is the only hard coupling between app and skill. Change the skill freely — keep the schema stable.
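
Because the envelope is the only hard coupling, it is worth guarding explicitly on the app side. A minimal sketch of such a schema check, assuming the {"message", "results"} shape described above:

```javascript
// Envelope guard sketch: the app validates result.json against the
// contract so a skill update that breaks the schema fails loudly
// instead of rendering garbage.
function parseEnvelope(raw) {
  const data = JSON.parse(raw);
  if (typeof data.message !== "string" || !Array.isArray(data.results)) {
    throw new Error("result.json does not match the envelope schema");
  }
  return data;
}

const ok = parseEnvelope('{"message":"3 contracts reviewed","results":[{"id":1}]}');
console.log(ok.results.length); // 1

let rejected = false;
try {
  parseEnvelope('{"message":"oops"}'); // missing results[] -> rejected
} catch {
  rejected = true;
}
console.log(rejected); // true
```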

Persistent Session (Native)

Native apps keep one Claude process alive per session via stdin/stdout — eliminating per-message startup delay. Web apps spawn per message. Both patterns supported. Both share the same working directory and skill architecture.

The File Contract
  • APP writes input files + references/.env before each run
  • APP writes CLAUDE.md on every launch — sources the latest skill content
  • APP appends corrections to corrections.json when user overrides AI
  • SKILL reads CLAUDE.md → credentials → corrections → input files
  • SKILL reasons, runs scripts, calls APIs → writes result.json
  • APP reads result.json → parses to domain model → renders UI
  • BOTH agree on result.json schema — the only hard coupling
Two Platforms. One Architecture.
🌐

Web App

Node.js + Express + Vanilla JS
  • Browser UI — HTML, CSS, no framework, no build step
  • Chat-first layout — controls left, AI chat right
  • NDJSON streaming via HTTP chunked fetch
  • New Claude process per message
  • Deployable to Railway, Docker, any server
  • Skills in skills/ dir, symlinked into workdir
  • Session continuity via --session-id / -r flags
  • Best for: team tools, dashboards, quick prototypes
🍎

Native macOS

Swift + SwiftUI + XcodeGen
  • Native SwiftUI — menus, notifications, system integration
  • Typed AsyncStream<ClaudeEvent> enum — no JSON in view layer
  • Persistent process — no startup delay between messages
  • Skills deployed from app bundle via SkillManager
  • CLAUDE.md + AGENTS.md rewritten on every launch
  • Credentials via EnvStore — reads references/.env
  • Screenpipe integration for passive activity capture
  • Best for: power users, daily workflows, offline-first tools
Licensing Models

Not all authentication methods are created equal. Whether you're building for yourself, your team, or external customers — the rules differ significantly across Claude Code, Codex CLI, Gemini CLI, and Copilot CLI. Understanding this upfront saves you from costly architecture mistakes later.

Covers personal use, VPS deployment, CI/CD, centralized servers, and commercial product scenarios — with a full breakdown of what's allowed per auth method and vendor.

View Licensing Details →
Ready to build
a self-evolving app?

We're sharing the architecture, the patterns, and the lessons from building in production. If you're building AI tools that need to go beyond the chat window — let's talk.

Book a Call with Me
Let's talk
Tiberiu Socaci
Founder · Self-Evolving Apps
Architecture consulting for AI apps
Self-evolving app implementation
CLI harness selection & licensing review
Production deployment & skills design
Book a Free Call →
Open Source Skill
Want to build it yourself?

The full headless app creator skill — production-tested patterns, architecture decisions, and ready-to-use code — packaged and ready to drop into your own Claude Code setup.

Read the Docs →
📡
NDJSON Streaming Protocol
Complete backend + frontend streaming pattern. Event types, error recovery, active-run guard.
🧠
Skills & Subagent System
SKILL.md schema, working folder isolation, symlink setup, and agent.md patterns.
🚀
Docker & Railway Deployment
Production-ready Dockerfile, package.json, and environment setup. No build step.
💬
Chat-First UI Patterns
Activity pill, optimistic messages, multi-view SPA, session resume, drag & drop upload.
Frequently Asked Questions
What is a self-evolving app?
A self-evolving app is a web or native application where the AI reasoning layer lives completely outside the app shell. Built on top of Claude Code, Codex CLI, Gemini CLI, or Copilot CLI, the AI improves its own skills and decision-making automatically over time — without any app releases or developer intervention.
Can I build an app on top of Claude Code?
Yes. Claude Code supports headless mode via the -p flag, letting you spawn it as a subprocess from a Node.js server. This enables you to build full web applications, automations, and SaaS tools with Claude as the AI reasoning engine. The same pattern works with Codex CLI (codex exec) and Gemini CLI (gemini -p).
Read the full architecture guide →
Do self-learning apps require machine learning infrastructure?
No. Zero ML infrastructure is required. No fine-tuning, no vector databases, no training pipelines. The self-learning mechanism is a JSON corrections file and a SKILL.md instruction that tells Claude to read it. Every correction becomes a permanent example the AI applies on every future run.
Which is better — Claude Code, Codex CLI, Gemini CLI, or Copilot CLI?
It depends on your company policies and existing licenses. Claude Code has the most mature subagent system. Codex CLI is best for CI/CD and is open source. Gemini CLI has the largest context window (1M tokens) and the most generous free tier. Copilot CLI is the only multi-model tool (Anthropic + OpenAI + Google). Self-evolving apps work with any of them — the harness is swappable.
See the full feature comparison →
What is the difference between a self-evolving app and a traditional AI chatbot?
A traditional AI chatbot sends user messages to an API and returns a response. A self-evolving app goes further: it maintains persistent skills (domain expertise encoded in SKILL.md files), accumulates corrections in a structured format, uses subagents to delegate tasks, executes real actions via MCP tools, and improves its own reasoning rules over time — all without any developer intervention after deployment.
Can I use a Claude subscription to build a product for external customers?
No — for external products, you must use an API key. Anthropic's consumer subscription is restricted to "ordinary, individual usage." For a centralized server serving multiple users or any external commercial product, the Anthropic Commercial Terms require API key authentication. The same rule applies to Gemini CLI and GitHub Copilot CLI. Codex CLI is the only tool where OpenAI actively endorses individual users using their own subscription in third-party tools.
Read the full licensing guide →
Who built this?
Self-Evolving Apps is an AI implementation project founded by Tiberiu Socaci, focused on helping businesses build production-ready AI-powered products. The team specialises in self-evolving app architecture, Claude Code and Codex CLI integrations, headless AI backends, and AI skills design for business processes. The self-evolving apps pattern was developed and battle-tested through real client projects before being documented and published here.
What is a CLI tool — and do my users ever see one?
CLI stands for Command-Line Interface — a text-based way to run software by typing commands into a terminal. Claude Code, Codex CLI, Gemini CLI, and Copilot CLI are all AI tools that run in a terminal. In a self-evolving app, the CLI runs silently in the background, driven by your server. Your users never see a terminal — they use your app's normal interface (buttons, forms, chat). The CLI is the engine; your app is the dashboard.
What kind of apps can you build with this pattern?
Any domain process that currently involves human judgment is a candidate. Real examples include: contract review and analysis tools, customer support agents with persistent domain memory, sales enablement apps that learn company pitch patterns, HR screening assistants, financial report generators, and custom CRM automations. The pattern works for both web apps (Node.js + Express) and native desktop apps (Swift/macOS). If it involves reading, deciding, and responding — it's a fit.
How is this different from calling the Claude API directly?
Calling the Claude API gives you a stateless text response — no persistent memory, no file access, no tool use, no self-improvement. A self-evolving app built on Claude Code adds all of that: persistent skills that encode domain expertise, a corrections loop so the AI learns from mistakes, subagents for parallel complex tasks, MCP tool connections to any API or database, real-time streaming, and session isolation. The API is a raw capability. Claude Code is a full reasoning engine — and the self-evolving pattern is the harness that makes it production-worthy.
Do I need a developer to build a self-evolving app?
Setting up the initial architecture — Node.js server, Express routes, Claude CLI subprocess — requires development capability. But once the shell is built, updating the AI's skills, domain knowledge, and reasoning rules requires no coding at all. We can build the full architecture as a consulting engagement, or provide the downloadable skill and documentation for your team to implement it themselves.
Talk to us about your project →