Recursive Creating Machines

Can we build a general thinking machine that really creates knowledge beyond imitation in the real world?¹

I first sketched primitive versions of the Civilization Tests below as a freshman, before I had even learned machine learning — when I was simply trying to figure out what should count as intelligence. A decade on, the questions have only sharpened, and the framework below is crystallized from my research journey in pursuing them.

Imagine an early human: lacking accumulated civilization-scale knowledge, yet possessing the raw capacity to learn, invent, and transmit. From that near-blank slate, humanity ultimately built everything we have today and laid the foundation for our future. The question is whether we can build a machine with that same property — not the encyclopedia, but the spark.

Three thought experiments make the question precise. I'll call them the Civilization Tests — north-star evaluations of general intelligence, distinct from today's increasingly saturated static benchmarks.

Test I

Civilization Origination

If sent back 1,000,000 years into the pre-human era, stripped of all prior human knowledge, could it invent and build a human-level civilization from scratch?

Test II

Civilization Expansion

If deployed alone to Mars or another barren world, could it construct a functioning civilization from raw materials, expand to neighboring planets, and repeat this process recursively?

Test III

Civilization Continuity

If humanity were to suddenly stop "working" but keep "playing," could it run our society and carry civilization forward?

These are not proposed as near-term benchmarks. They are limit cases — thought experiments that expose what a genuinely general creating system would have to be able to do. By "stripped of prior human knowledge" I mean explicit cultural and scientific content, not architectural priors, sensors, or learning machinery.

They are evaluations of what general intelligence — actual creation, not sophisticated imitation — would look like at civilization scale. Passing any one of them is civilizationally significant.

01 Three Capacities

A machine capable of any of the three tasks above must possess three capacities, in combination.

Completeness — the basis. The machine must command a sufficiently broad and operationally usable representation of what humans already know: not just explicit theories and recorded data, but the tacit procedures, institutional routines, and context-sensitive judgments that constitute the working stock of human knowledge — and the ability to derive or approximate their relevant consequences under real constraints.

Creation — the engine. The machine must extend that boundary. Produce assertions that don't follow from anything humans have established — and that survive whatever criterion the domain requires for acceptance. New axioms, new physical laws, new architectures, new medicines, new designs that weren't derivable from prior knowledge.

Recursion — the structure. The machine must be capable of seeding the next stage: producing — perhaps itself, perhaps a successor — a machine whose basis is what it has created. Not the inevitability of compounding creation, but the capacity for it. The hierarchy, in principle, extends without bound.

Each Test stresses a different capacity, though all three are needed for all three:

Continuity stresses completeness and coordination: keeping civilization running requires covering the operational closure of human institutions, including explicit knowledge, tacit routines, and real-time adaptation. The requirement distributes across many machines but doesn't relax.
Origination stresses creation: bootstrapping from a near-empty knowledge boundary (written $\partial K$ in the formal picture below) recapitulates millennia of layered creations, with no shortcut through inherited knowledge.
Expansion stresses recursion: carrying the creating structure itself to new worlds requires that each instance can seed the next, indefinitely.

Because each Test leans hardest on a different capacity, the same asymmetry locates current agentic systems: it shows which capacities they already have and which they lack.

02 Where Current AI Sits

Two questions frame this landscape. In 1950 Turing¹ asked "Can machines think?" and famously declined to define thinking directly, replacing the question with the imitation game and letting indistinguishability stand in for an answer. A century earlier, Ada Lovelace's 1843 notes on Babbage's Analytical Engine² posed the harder question: can machines create? She argued they could not: "the Analytical Engine has no pretensions whatever to originate anything." Imitation and creation are not the same thing, and the gap between them maps directly onto the distinction, formalized below, between boundary-complete machines (M₁) and creative machines (M₂).

Most AI systems we have today are partial implementations of completeness. A trained language model is, roughly, a finite encoding that approximates consequences of human-recorded knowledge. Asked a known question, it returns a known answer. Asked a novel question whose answer follows from training data, it can sometimes derive it. This is boundary-complete reasoning. It is also, ultimately, imitation: the output, however impressive, was logically latent in what we already knew. (Stipulatively, I reserve creation for boundary extension; by that convention even a spectacular derivation from existing premises remains M₁ rather than M₂.)

This is enormously valuable. A truly boundary-complete machine would not merely retrieve human knowledge but search uniformly over its consequences — every theorem derivable from current axioms, every regularity implicit in existing data. It could imitate any expert at any task whose ground truth is derivable from known premises.

But it would not create.

Two borderline cases sharpen the distinction.

AlphaGo⁴ is not M₂ relative to the formal rules of Go — Go is a finite game whose optimal play is in principle entailed by the rules. But it is M₂ relative to the human Go boundary: Move 37 in Game 2 against Lee Sedol sat outside the inherited professional repertoire, and its value was unrecognized by the human tradition until AlphaGo exposed it. This is relative M₂ — creation against the human knowledge boundary in a closed domain, narrow but real. The capacity does not generalize.

AI Scientist-style systems⁵ are broader in domain — they can in principle write papers in any field, and they automate parts of hypothesis generation, experimentation, and paper production. They are better viewed as candidate M₂ mechanisms than established general M₂ systems: whether their outputs constitute durable boundary extensions, rather than recombinations of known scientific moves, remains an open empirical question.

The central thesis is that all three properties are needed at once: generality, creation, and real-world operation.

03 The Formal Picture

The formalization below treats a knowledge boundary as a logical theory. This is exact for mathematics and only an idealization for empirical science, engineering, and economic activity, where acceptance is probabilistic, experimental, and utility-sensitive. The purpose is not to reduce all knowledge to theorem-proving, but to isolate the structural distinction between derivation from a current boundary and extension beyond it.

Fix a recursively axiomatized theory $T$ rich enough to interpret arithmetic — a knowledge boundary $\partial K$, in the framework's terms. Let $\mathrm{Cn}(\partial K)$ denote its deductive closure. Three capability tiers, by how their output relates to $\mathrm{Cn}(\partial K)$:

$$\begin{aligned} M_0(K) \;\text{(deductive)}\; & : \;\text{derives in } \mathrm{Cn}(K) \text{ for some fixed finite } K \subset \partial K \\[4pt] M_1(\partial K) \;\text{(boundary-complete)}\; & : \;\text{partial procedure that derives any } \varphi \in \mathrm{Cn}(\partial K), \text{ using the whole boundary} \\[4pt] M_2(\partial K,\,A) \;\text{(creative)}\; & : \;\text{proposes } \varphi \text{ s.t. } \partial K \nvdash \varphi \text{ and } \partial K + \varphi \text{ passes } A \end{aligned}$$

These tiers can all run on the same Turing-complete substrate under different programs. They are capability classes, not hardware classes.

A note on M₁. For recursively axiomatized $\partial K$ rich enough to interpret arithmetic, $\mathrm{Cn}(\partial K)$ is recursively enumerable but not decidable. M₁ should therefore be read not as an oracle that decides membership in the closure, but as a capability class that can uniformly search, derive, and enumerate consequences from the whole boundary rather than from any fixed finite fragment. In practice no physical system achieves even idealized M₁; every implementation is a resource-bounded M₀ on a growing fragment, which is the engineering reality the asymptotic idealization abstracts over.
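The "search, not oracle" reading can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than part of the framework: the axiom schema `p0, p1, ...`, the single inference rule, and the names `axioms`, `consequences`, and `search`. The toy theory is decidable, so it only mimics the shape of the procedure: a staged search over ever-larger finite fragments that can answer "yes" or "not yet", but never "no".

```python
from itertools import count, islice
from typing import Iterator, Optional, Set


def axioms() -> Iterator[str]:
    """Hypothetical axiom schema: infinitely many axioms p0, p1, p2, ..."""
    return (f"p{n}" for n in count())


def consequences(fragment: Set[str]) -> Set[str]:
    """Toy inference rule (an assumption): from p{n} and p{n+1}, derive q{n}."""
    derived = set(fragment)
    for sentence in fragment:
        n = int(sentence[1:])
        if f"p{n + 1}" in fragment:
            derived.add(f"q{n}")
    return derived


def search(phi: str, budget: int) -> Optional[bool]:
    """Stage k searches the first k axioms.

    Returns True once phi is derived, or None ("not yet") when the budget
    runs out. It never returns False.
    """
    for k in range(1, budget + 1):
        fragment = set(islice(axioms(), k))
        if phi in consequences(fragment):
            return True
    return None


fixed_fragment = set(islice(axioms(), 3))    # an M0-style fixed K: p0, p1, p2
print("q5" in consequences(fixed_fragment))  # False
print(search("q5", budget=10))               # True
print(search("q5", budget=6))                # None
```

The fixed three-axiom fragment never reaches `q5`, while the staged search does once its fragment has grown far enough. With a finite budget the search is itself just a bounded derivation on a growing fragment, which is the engineering reality described above.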

A clarification worth pausing on. $\mathrm{Cn}(\partial K)$ is the set of statements logically entailed by $\partial K$, not the set of statements expressible in $\partial K$'s vocabulary. These are easy to conflate — a rich vocabulary (a dictionary, a programming language, the symbols of physics) makes vastly many statements expressible without entailing any of them. M₁'s scope is the entailed; M₂'s territory is the expressible-but-not-entailed. "There exists a unified theory beyond GR and QFT" is expressible in current physics vocabulary but not derivable from current physics — which is exactly why finding such a theory would count as creation rather than derivation.
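A toy illustration of the entailed/expressible split, as a minimal sketch: the six-atom vocabulary, the Horn-style rules, and the helper names `rule` and `closure` are all hypothetical, and a finite propositional system like this is far too weak to interpret arithmetic. It is only meant to show that a vocabulary fixes what can be written down, while the boundary fixes what follows.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule is (premises, conclusion); a fact is a rule with no premises.
Rule = Tuple[FrozenSet[str], str]


def rule(conclusion: str, *premises: str) -> Rule:
    return (frozenset(premises), conclusion)


def closure(rules: Iterable[Rule]) -> Set[str]:
    """Forward-chain to a fixed point: the toy analogue of Cn(.)."""
    rules = list(rules)
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical vocabulary: every atom that can be written down.
VOCABULARY: Set[str] = {"a", "b", "c", "d", "e", "f"}

# Hypothetical boundary: two facts and three rules over that vocabulary.
BOUNDARY: List[Rule] = [
    rule("a"),
    rule("b"),
    rule("c", "a", "b"),
    rule("d", "c"),
    rule("e", "d", "f"),  # never fires, because f is not entailed
]

m0_reach = closure(BOUNDARY[:3])    # fixed fragment K reaches a, b, c
m1_reach = closure(BOUNDARY)        # whole boundary reaches a, b, c, d
unentailed = VOCABULARY - m1_reach  # expressible but not entailed: e, f

print(sorted(m0_reach), sorted(m1_reach), sorted(unentailed))
```

In this toy the atoms `e` and `f` are perfectly expressible, yet nothing in the boundary entails them; that leftover set is where a candidate extension would have to come from. Because the boundary here is finite, a large enough fixed fragment would match the whole closure, so the sketch does not exhibit the uniformity gap between M₀ and M₁.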

The two separations

The $M_0/M_1$ distinction is not that an $M_1$ theorem lacks a finite proof basis — every proof uses finitely many premises, so every individual theorem in $\mathrm{Cn}(\partial K)$ is derivable from some finite $K \subset \partial K$. The distinction is uniformity: when $\partial K$ is not captured by any fixed finite basis (as with schema-based theories like PA and, more loosely, open-ended scientific knowledge), no single $M_0(K)$ ranges over the whole boundary. $M_1$ does.
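A worked instance of the uniformity point, given as a hedged sketch. Take $\partial K = \mathrm{PA}$ and let $K$ be any finite set of PA axioms that contains a base fragment strong enough for the second incompleteness theorem to apply (requiring $K \supseteq I\Sigma_1$ is an illustrative assumption). Two classical facts are used: PA proves the consistency of each of its finite subtheories, and no such $K$ proves its own consistency.

$$\mathrm{PA} \vdash \mathrm{Con}(K) \quad\text{and}\quad K \nvdash \mathrm{Con}(K) \quad\Longrightarrow\quad \mathrm{Con}(K) \in \mathrm{Cn}(\mathrm{PA}) \setminus \mathrm{Cn}(K)$$

So each fixed $M_0(K)$ misses some theorem that $M_1(\mathrm{PA})$ eventually reaches, even though that theorem, like every theorem, has a finite proof from some larger finite fragment.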

The $M_1/M_2$ distinction is sharper, and Gödel⁶ makes it formal. For a sufficiently strong, recursively axiomatized, sound theory $T$ satisfying the usual derivability conditions, the second incompleteness theorem gives $T \nvdash \mathrm{Con}(T)$. Since $T$ is sound, $\mathrm{Con}(T)$ is true, so $T + \mathrm{Con}(T)$ is again sound and therefore consistent, and it proves strictly more than $T$. A strictly stronger boundary always exists.

But mere non-entailment is not creation. A uniform-random oracle produces unentailed sentences trivially; a useless-but-consistent axiom is too cheap. The acceptance predicate $A$ does real work — and is, in fact, where the framework's substantive content lives.

The acceptance predicate

$A$ is not a minor filter applied after generation. It is part of the creative act: the criterion that separates noise from knowledge in each domain.

Mathematics: proof in a stronger system, relative consistency, conservativity over prior domains, explanatory or proof-theoretic fruitfulness.
Empirical science: reproducible predictive accuracy, experimental confirmation, causal-intervention success, unification, robustness under distribution shift.
Engineering: function, reliability, manufacturability, safety, cost, scalability.
Economic activity: durable utility-gain under competition.
Civilization-scale systems: sustainability, alignment, social legitimacy, resilience, institutional compatibility, ethical constraint.

$M_2$ is therefore not novelty generation alone. It is novelty generation coupled to validation: the machine proposes extensions outside the current boundary and either produces, discovers, or elicits enough evidence for those extensions to survive the domain's $A$.
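The propose-then-validate shape can be sketched as follows. This is a toy under stated assumptions, not an implementation of any system discussed in this essay: the Horn-style rules, the `OBSERVED` and `REFUTED` sets, and the names `m2_step` and `accept` are hypothetical, and the acceptance predicate is a crude stand-in for the empirical criteria listed above. A candidate counts as "unentailed" here when its conclusion is not already in the closure.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def rule(conclusion: str, *premises: str) -> Rule:
    return (frozenset(premises), conclusion)


def closure(rules: Iterable[Rule]) -> Set[str]:
    """Forward-chain to a fixed point: the toy analogue of Cn(.)."""
    rules = list(rules)
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


def m2_step(
    boundary: List[Rule],
    candidates: Iterable[Rule],
    accept: Callable[[List[Rule]], bool],
) -> Optional[Rule]:
    """Return the first candidate that is both unentailed and accepted."""
    known = closure(boundary)
    for candidate in candidates:
        if candidate[1] in known:
            continue  # conclusion already entailed: derivation, not creation
        if accept(boundary + [candidate]):
            return candidate
    return None


# Hypothetical domain data: what the extended boundary must explain,
# and what experiment has ruled out.
OBSERVED = {"rain", "wet", "slippery"}
REFUTED = {"dry"}


def accept(extended: List[Rule]) -> bool:
    """Toy stand-in for A: explain every observation, assert nothing refuted."""
    reach = closure(extended)
    return OBSERVED <= reach and not (REFUTED & reach)


BOUNDARY = [rule("rain"), rule("wet", "rain")]
CANDIDATES = [
    rule("wet", "rain"),      # already entailed: skipped
    rule("dry"),              # novel but refuted: rejected by A
    rule("slippery", "wet"),  # novel and accepted
]

chosen = m2_step(BOUNDARY, CANDIDATES, accept)
print(chosen)
```

The second candidate shows why non-entailment alone is too cheap: `dry` is new relative to the boundary, but it fails the acceptance check. Only the third candidate is both outside the closure and able to survive `accept`.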

Gödel's role, then, is narrower than it may first appear: for sufficiently strong, sound, recursively axiomatized boundaries, true statements not derivable from the boundary always exist — derivation from a fixed boundary is not the end of knowledge. Gödel does not, however, identify which extensions are useful, empirically meaningful, or admissible under $A$. The engineering content of M₂ — the production of justified extensions in particular domains — is not implied by the existence result; it is the open problem the framework points toward.

FIG. 01. The three capability tiers. M₀ machines reach finite-prior consequences; M₁ ranges uniformly over the boundary's deductive closure; M₂ targets accepted extensions, a sub-region of the larger candidate space that passes the domain's acceptance predicate A. In the formal setting, Gödel guarantees the candidate space is non-empty; A determines which candidates count.

04 Recursion — The Self-Replicating Hierarchy

Now the key move. If an $M_2$ machine produces a new boundary $\partial K^+$, that new boundary inherits the structural properties of the old (recursively enumerable, consistent under $A$, etc.). So $\partial K^+$ can serve as the boundary of a child machine class $\mathcal{S}^+$ — which has its own $M_0, M_1, M_2$ tiers, including its own creative capability.

The same construction iterates:

$$\mathcal{S} \;\longrightarrow\; \mathcal{S}^{+} \;\longrightarrow\; \mathcal{S}^{++} \;\longrightarrow\; \mathcal{S}^{+++} \;\longrightarrow\; \cdots$$

The hierarchy extends along any computable ordinal path — a classical result going back to Turing's 1939 ordinal logics⁷ and refined by Feferman in 1962.⁸
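One standard way to write the purely formal version of this tower, given here as a sketch, is the consistency progression. The indexing by ordinals $\alpha$ is schematic: in Turing's and Feferman's constructions the stages are indexed by notations for computable ordinals, and the resulting theory can depend on the notation chosen.

$$\begin{aligned} T_0 &= T \\ T_{\alpha+1} &= T_\alpha + \mathrm{Con}(T_\alpha) \\ T_\lambda &= \bigcup_{\alpha<\lambda} T_\alpha \quad (\lambda \text{ a limit}) \end{aligned}$$

In this formal reading, $\mathcal{S}$ plays the role of $T_0$ and $\mathcal{S}^{+}$ the role of $T_1$; each successor step is one application of the $M_1/M_2$ separation, with consistency statements standing in for whatever the acceptance predicate $A$ would admit.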

This is the Recursive in Recursive Creating Machines. Each act of creation seeds the next. Each generation stands on what the prior generation produced.

FIG. 02. The recursive tower in motion. Each 𝓢 creates the next: the arrow rises (creation), and a new class appears whose knowledge boundary is what the prior class produced. The pattern is the diagram.

The formal tower above is the abstract pattern, not a literal claim that any single mechanism produces all the kinds of recursion this essay touches. At least four related but distinct recursions are in play: formal recursion ($T \to T + \mathrm{Con}(T) \to \cdots$); knowledge recursion, in which discoveries enable further discoveries; material and institutional recursion, in which systems, tools, infrastructures, and organizations build their successors; and AI-for-AI recursion, in which AI systems improve AI systems. Each has its own acceptance criteria and resource constraints. What the framework asserts is the shared structural shape — a boundary is extended, and the extension becomes the prior for a successor process — not that the formal tower and Mars colonization are the same construction.

This is not abstract. Humanity itself is a running instance. With the brain as substrate, humans have built up an accumulating $\partial K$ across millennia (language, writing, mathematics, science, engineering, computation), each layer resting on what the prior layer made possible. Building AI is one step in that same process: humans ($\mathcal{S}$) producing a successor machine class ($\mathcal{S}^+$) whose basis includes everything we have written down. AlphaGo, AlphaEvolve, and every other contemporary AI system live in that $\mathcal{S}^+$.

The same structural shape shows up at smaller scales — assembly enables C enables Python; stone tools enable metal tools enable silicon ones — though these aren't strictly $\partial K \to \partial K^+$ in the formal sense. They're the same compounding pattern, applied to engineering layers and physical capability rather than logical theories.

A machine that satisfies Origination must do what humanity did across millennia — millions of layered creations, each building on the last. The only way is recursion.

05 What This Looks Like in Practice

The framework is testable in fragments. Each capacity generates a question — Can foundation models host the M₀ substrate? What does M₁ look like in deployment? Where does M₂ first show up? My work has pursued these piece by piece. The sketch below is a map of open questions and partial results; the projects themselves are described elsewhere.

Substrate — can foundation models host M₀?

The substrate question is the long-running symbolism vs. connectionism debate, recast: can neural networks host the structured, compositional reasoning that everything above depends on? Transitional Dictionary Learning⁹ proposes a transitional representation that captures compositionality and shows that it can be learned without supervision, which is evidence that neural networks can host structured reasoning. A second, ongoing line uses Solomonoff induction¹⁰ and algorithmic information theory to look for conserved quantities and invariances of foundation-model capabilities: the "physics laws" of the substrate. APEX works at the agentic-system level, modeling architecture, task, and performance as $P(m \mid a; t)$ with tools from causal inference, and surfacing design laws across a three-tier validation ladder.

Agency — M₁ via foundation-model agents

"An agent is just something that acts. Of course, all computer programs do something, but computer agents are expected to do more: operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals." (Russell & Norvig, AIMA³, §1.1.4)

If the substrate holds, M₁ — boundary-completeness in deployment — is pursued by building systems where foundation models act as agents in this sense. Each clause of the definition maps to a research component: SocioDojo¹¹ handles perceiving the environment with lifelong agents over real-world text and time-series feeds, and serves as the integrating framework for the rest; AAPM¹² handles persistence via lifelong memory, with empirical asset pricing as a clean lens because excess return is a precisely measured proxy for information gain; Apeiron¹³ handles adaptation via agent-as-judge optimization applied to autonomous software synthesis; Analytica¹⁴ handles create-and-pursue-goals through soft-propositional neural-symbolic reasoning.

Discovery — M₂ clues from AI-for-AI

Genuine M₂ at general scale is still aspirational, but the clearest available clues come from AI-for-AI: systems that study and extend the boundary of AI itself, as the most direct demonstration of $\mathcal{S} \to \mathcal{S}^+$. Genesys¹⁵ discovers new language-model architectures, with each discovery becoming the prior for the next round — recursion in architecture space. Another ongoing project extends the pattern to LM inference infrastructure. Across both, the discovery loop is AlphaEvolve / AI-Scientist–style evolutionary search¹⁶, with neurosymbolic representations for grounding and compositionality.

Engineering — three layers

"What I cannot create, I do not understand." (Richard P. Feynman¹⁷)

Three open-source frameworks underpin everything above, each owning a distinct layer of the agentic stack. LLLM is the programming framework for individual agentic systems — packages, runtimes, prompts, tactics. SSSN is the distributed protocol for networked, phylogenetic-level intelligence — multi-generational transmission and decentralized coordination as engineering primitives. AAAX sits above both as the governed kernel and control plane.

06 What's Open

The thesis argues for three candidate structural requirements — completeness, creation, recursion — for general creating machines. What remains genuinely open splits into three categories.

Sufficiency — safety, ethics, and societal impact. Necessary conditions are not sufficient. RCM characterizes what it takes to build a creating machine, but says nothing about whether such machines, if built, would be safe, beneficial, or ethical. AI is reshaping society across every dimension; alignment, control, interpretability, and human flourishing are not formally covered by this thesis. My neural-symbolic work offers some interpretability and grounding affordances — soft propositional structure in Analytica, transitional representations in TDL — but these are not safety guarantees, and the broader sufficiency question is its own program.

Completeness — the Gödelian limit. RCM's guarantee that candidate M₂ extensions are always available comes from Gödel's incompleteness theorem⁶: every sufficiently strong, sound, recursively axiomatized $\partial K$ leaves some true statements underivable. But Gödel cuts both ways. No such system is ever complete; every level of the recursive hierarchy faces the same limit. This is a structural feature, not a bug, but it means the tower is fundamentally open-ended. We never run out of things to create, and we never reach a final description of what is creatable. The framework guarantees the next step exists; it does not guarantee a destination.

Optimality — are foundation models the right substrate? The theory work asks whether foundation models can serve as the M₀ substrate, not whether they are the optimal one. As TDL discusses, neural networks are the de facto choice today because of the Universal Approximation Theorem¹⁸ and established infrastructure — but other paths exist. Energy-Based Models¹⁹ reframe the substrate around energy minimization. The Free Energy Principle²⁰ grounds it in variational inference and biological priors. Yet-to-be-designed hybrid neuro-symbolic architectures may displace both. Which substrate is right, not just which is sufficient, remains open.

07 Why It Matters

We're in a peculiar moment. Current AI systems do remarkable work in $M_1$ territory — drawing on the closure of existing knowledge. Narrow $M_2$ has also begun to appear, with AlphaEvolve's algorithmic discoveries the clearest current example. What's still emerging — and what this essay points toward — is $M_2$ at the breadth the Civilization Tests would demand.

The Civilization Tests are the limit case, but the structural argument has a quieter consequence at every scale below them. Borrow a term: every value-generating activity that survives competition depends on producing alpha — value beyond what is already shared. A discovery, a design, a method, an insight: each is, in the framework's terms, content beyond the relevant deductive closure. By construction, alpha cannot come from purely shared knowledge; if everyone already had it, it wouldn't be alpha.

Much economic value, of course, still comes from execution: reliability, distribution, cost, trust, latency, integration. These are M₁ activities, and they remain valuable. But once routine execution and shared knowledge commoditize — as M₁-capable AI makes them increasingly easy to access — what compounds under competition is what creates. Most of what AI systems will be paid for under those conditions is relative M₂: creation against a competitor's, a market's, or a customer's boundary, even where the output is still M₁ relative to humanity at large.

M₂-style creation is therefore a necessary condition for AI that produces durable epistemic or strategic advantage under competition, once shared M₁ capabilities commoditize. The Civilization Tests are the limit form — what "AI that does meaningful work" looks like when the demand scales all the way up to building civilizations from raw conditions. They are the criterion against which everything else gets calibrated: the north star.

Can we build a general thinking machine that really creates knowledge beyond imitation in the real world — and thus pass the Civilization Tests?
I think yes — we likely already have much of the necessary basis, even if it may be far from sufficient.

"Will robots inherit the earth? Yes, but they will be our children." (Marvin Minsky, 1994²¹)

Written in 2026 at Dartmouth — 70 years after the 1956 Dartmouth Summer Workshop on AI.

LLMs were used as editorial assistants for prose polishing, copyediting, and review of earlier drafts. The framework, arguments, and conclusions are the author's own.

Citation

Please cite this work as:

Cheng, Junyan. "Recursive Creating Machines." junyan.ch (May 2026).
https://junyan.ch/blog/recursive-creating-machines

Or use the BibTeX citation:

@article{cheng2026rcm,
  title   = {Recursive Creating Machines},
  author  = {Cheng, Junyan},
  journal = {junyan.ch},
  year    = {2026},
  month   = {May},
  url     = {https://junyan.ch/blog/recursive-creating-machines}
}

References

1. Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433–460.
2. Lovelace, A. A. (1843). Notes by the Translator. In L. F. Menabrea, Sketch of the Analytical Engine Invented by Charles Babbage. Scientific Memoirs, 3, 666–731.
3. Russell, S. J., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
4. Silver, D., Huang, A., Maddison, C. J., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
5. Lu, Chris, Lu, Cong, Lange, R. T., Foerster, J., Clune, J., & Ha, D. (2026). Towards end-to-end automation of AI research. Nature, 651(8107), 914–919.
6. Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1), 173–198.
7. Turing, A. M. (1939). Systems of logic based on ordinals. Proceedings of the London Mathematical Society, Series 2, 45, 161–228.
8. Feferman, S. (1962). Transfinite recursive progressions of axiomatic theories. The Journal of Symbolic Logic, 27(3), 259–316.
9. Cheng, J., & Chin, P. (2024). Bridging Neural and Symbolic Representations with Transitional Dictionary Learning. ICLR.
10. Solomonoff, R. J. (1964). A formal theory of inductive inference. Parts I & II. Information and Control, 7(1), 1–22; 7(2), 224–254.
11. Cheng, J., & Chin, P. (2024). SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series. ICLR (Spotlight).
12. Cheng, J., & Chin, P. (2025). Empirical Asset Pricing with Large Language Model Agents. ICLR Advances in Financial AI Workshop.
13. Cheng, J., Srivastava, A., Zeng, J., Drinic, M., & Stokes, J. W. (2026). Apeiron: A Scalable LLM-agentic Framework for Autonomous Full-lifecycle Demand-optimized Application Synthesis. ACL Findings.
14. Cheng, J., Richardson, K., & Chin, P. (2026). Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis. ICLR.
15. Cheng, J., Clark, P., & Richardson, K. (2025). Language Modeling by Language Models. NeurIPS (Spotlight).
16. Novikov, A., et al. (2025). AlphaEvolve: A coding agent for scientific and algorithmic discovery. Google DeepMind.
17. Feynman, R. P. (1988). Note on his Caltech office blackboard, reproduced posthumously in The Pleasure of Finding Things Out (1999).
18. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.
19. LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.
20. Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
21. Minsky, M. (1994). Will Robots Inherit the Earth? Scientific American, 271(4), 108–113.