History of AI: From Automata to LLMs

Author: Francesco Papagni
Introduction: The Unbroken Thread
The modern fascination with artificial intelligence often presents it as a new frontier of human ingenuity, a phenomenon born of the digital age. However, a deeper examination reveals that the quest to create artificial life and intelligent machines is a long-standing human aspiration, a recurring theme that has woven its way through millennia of history. The earliest concepts of artificial beings were not rooted in circuitry or code but in the rich tapestries of mythology and philosophy. These ancient narratives, which explored the nature of automata—from the Greek automatos, "self-moving" or "acting of its own will"—grappled with the same core questions that define today’s debates about AI: the nature of consciousness, the dilemma of control, and the moral responsibilities of a creator.
This report traces this unbroken thread from antiquity to the present, demonstrating that the foundational concerns of artificial intelligence are deeply embedded in our collective cultural imagination. The journey begins with mythological creations, delves into the tangible clockwork marvels of the Enlightenment, and chronicles the conceptual leap to programmable computation. It then explores the birth of the AI field and its subsequent periods of disillusionment, and examines Andrey Markov’s foundational work, which provides a crucial, often-overlooked intellectual precedent for modern language models. By contextualizing the current AI boom within this expansive history, it becomes clear that we are not just building new tools; we are continuing a very old conversation about the nature of creation, the limits of control, and the long-standing human dream of a mechanical mind.
Part I: In the Image of the Creator — The Ancient Roots of Artificial Life
Long before the advent of electricity or mechanical engineering, ancient civilizations imagined artificial beings that could move and act on their own, often imbuing them with divine or mystical power. These creations were not merely feats of fantasy but were a means for cultures to explore profound philosophical and ethical dilemmas related to technology and humanity.
The Mythological Automaton: Greek and Roman Ideals
One of the most compelling early visions of a robot is the Greek myth of Talos, the bronze guardian of Crete. Talos’s character aligns remarkably well with a modern definition of a robot: he was self-moving, animated by a life-giving fluid called ichor that flowed through a single vein sealed with a nail in his ankle. His function was specific and pre-programmed: he was tasked with patrolling the island’s shores three times a day, repelling invaders by hurling boulders and heating his body red-hot to crush any who came ashore.
The story of the bronze giant, an unthinking enforcer of a preordained duty, ends not through brute strength but through the cunning of the sorceress Medea. By playing on his vulnerabilities and perhaps his human-like emotions, she removed the nail that sealed in his life-fluid, causing him to bleed to death. This resolution is a powerful allegory for the ancient concern with controlling a creation that is powerful but lacks judgment. It posits that a machine, no matter how physically invulnerable, can be undone by intelligence and strategy.
This theme echoes throughout ancient thought. The deus ex machina—literally “god from the machine”—was a theatrical device used to resolve a plot miraculously. It was an act of divine intervention powered by an unseen mechanism, highlighting the dual nature of these creations as both divine revelations and clever mechanical tricks. Aristotle further explored this duality, using the term thaumata to describe automatic marvels that elicited philosophical reflection. He recognized that these devices, by their seemingly autonomous nature, prompted questions about the principles of life and motion. The philosophical discussions of the time, such as Plato’s metaphor of the human being operated by elastic cords and Erasistratus’s view of man as a pneumatic machine, further illustrate that ancient thinkers saw the human body as a mechanical system, blurring the line between biological and artificial life.
The Golem of Prague: Mysticism and the Uncontrollable Servant
Moving beyond the classical world, the Jewish mystical tradition offers another potent metaphor for artificial life in the legend of the Golem. A Golem is a soulless creature formed from an inanimate material, such as clay or dust, and brought to life not through mechanical engineering but by magical incantations and Hebrew letters, often involving the word emet (“truth”).
The most famous version, the Golem of Prague, tells how the 16th-century Rabbi Judah Loew created a Golem to protect the Jewish ghetto. While initially a diligent and powerful assistant, the Golem eventually ran amok, becoming a danger to its creator and the community it was made to serve. The only way to stop it was for the Rabbi to remove the magical word from its mouth, causing it to revert to dust. This legend, like the myth of Talos, is a timeless parable about the hubris of creation and the risks of a well-intentioned invention becoming a threat due to its lack of a soul, empathy, or moral judgment.
Both the Talos and Golem narratives, despite their different cultural and technical origins, are profound meditations on the same fundamental anxiety: the fear that our creations, no matter how clever or well-intentioned, will become uncontrollable and dangerous. The Golem’s return to dust by the removal of a word and Talos’s defeat by the puncturing of his vein are both metaphors for a “kill switch” that is the creator’s last resort. The enduring relevance of these myths is further demonstrated by the poet Edmund Spenser’s reimagining of Talos, in The Faerie Queene, as a merciless, unbending iron knight named Talus, whose job was to dispense justice. This allegory offers a direct parallel to contemporary debates about the role of AI in judicial systems, predictive policing, and other high-stakes domains, where an inflexible, algorithmic form of “Iron Law” could be imposed without the human capacity for empathy or context.
Part II: The Mechanical Mind — From Clockwork to Computation
As the world moved from mythology to science, the dream of artificial beings evolved from a spiritual or divine concept to a tangible, if limited, reality. This period marks a critical shift from philosophical musings to the practical engineering of machines that mimicked life.
The Enlightenment’s Clockwork Marvels
The Enlightenment era in Europe brought a renewed interest in mechanical automatons, fueled by a philosophical view of living organisms as “complex machines.” One of the undisputed masters of this art was Jacques de Vaucanson, a French inventor who created sophisticated clockwork automatons in the 18th century. His creations, such as “The Flute Player” and the infamous “Defecating Duck,” were designed to arouse awe by seemingly exhibiting the complex, life-like movements of real creatures. The duck, for example, could quack, drink water, eat grain, and even appear to digest and excrete it.
While these were brilliant illusions—the “digestion” was later revealed to be an artifice—they were also crucial intellectual exercises. Vaucanson’s work exemplified the Enlightenment’s focus on scientific wonder and the belief that the mysteries of life could be broken down and replicated through mechanical principles. This era represented a conceptual shift from the divine or magical origins of automata to a purely mechanical one. The creation of such marvels was no longer seen as a god-like act but as a display of superior engineering. The practical applications of this ingenuity were soon realized when Vaucanson, appointed inspector of silk manufacturing, invented an automated loom that used perforated cards to guide the weaving process. This invention, a precursor to Joseph-Marie Jacquard’s later loom, represents a crucial link between mechanical artifice and industrial automation.
The Dawn of the Programmable Machine
While Vaucanson’s machines were marvels of mimicry, the next great leap involved a fundamental shift to computation. In the 19th century, English mathematician Charles Babbage envisioned a machine that would not merely mimic life, but perform logical operations. His initial design, the Difference Engine, was a mechanical calculator intended to automate the production of mathematical tables and eliminate human error. However, his more ambitious project, the Analytical Engine, was a true departure.
The Analytical Engine was the first design for a general-purpose, Turing-complete computer. It featured a “mill” that played the role of a modern central processing unit (CPU), a “store” for memory, and a control system capable of conditional branching and loops. The machine was designed to be programmable using punched cards, a concept Babbage borrowed directly from the Jacquard loom. The design for the Analytical Engine was a profound intellectual blueprint, but it was never completed, owing to a lack of funding and Babbage’s conflicts with his chief engineer. It was not until more than a century later, in 1941, that Konrad Zuse’s Z3, widely regarded as the first working programmable computer, was built.
The conceptual significance of the Analytical Engine was cemented by Ada Lovelace, who worked with Babbage and wrote extensive notes on his designs. Her annotations included an algorithm for calculating Bernoulli numbers using the machine, an achievement widely considered to be the first complete computer program. The Vaucanson–Babbage–Lovelace lineage is the story of how the focus of artificial creation shifted from physical imitation to abstract computation. While the “Defecating Duck” was an illusion of intelligence, the Analytical Engine was the blueprint for a machine capable of true symbolic manipulation, even if the technology to build it did not yet exist. This is the critical juncture where the modern conception of a computer as a universal tool for information processing is born.
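As an illustration of the kind of computation the Analytical Engine was meant to perform, here is a minimal modern sketch in Python of the task Lovelace’s notes addressed: generating Bernoulli numbers from their standard recurrence. It is our re-expression of the mathematics, not a transcription of her Note G procedure, and the function name is ours.

```python
# A modern sketch of the computation Lovelace's notes describe:
# Bernoulli numbers B_0, ..., B_n from the recurrence
#   sum_{j=0}^{m} C(m+1, j) * B_j = 0  for m >= 1, with B_0 = 1.
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    """Return [B_0, B_1, ..., B_n] as exact fractions."""
    B = [Fraction(1)]                               # B_0 = 1
    for m in range(1, n + 1):
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))                    # B_m = -(1/(m+1)) * sum_{j<m} C(m+1, j) B_j
    return B

for i, b in enumerate(bernoulli_numbers(8)):
    print(f"B_{i} = {b}")   # e.g. B_1 = -1/2, B_2 = 1/6, ..., B_8 = -1/30
```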
Part III: The Birth of a Field and the First AI Winter
The 20th century saw the formal birth of artificial intelligence as a dedicated field of study, propelled by grand ambitions and a new wave of thinkers. However, this period of immense optimism would soon give way to profound disillusionment as the promises of early AI research failed to materialize.
The Moment of Conception: Turing and Dartmouth
In his 1950 paper “Computing Machinery and Intelligence,” the mathematician and cryptanalyst Alan Turing proposed his famous “Imitation Game,” later known as the Turing Test, as a way to determine whether a machine could exhibit intelligent behavior indistinguishable from a human’s. Turing’s thought experiment, while often oversimplified today, was nuanced, including a focus on empathy and aesthetic sensitivity as components of intelligence.
The field was formally christened at the 1956 Dartmouth Summer Research Project on Artificial Intelligence, a workshop organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon. McCarthy coined the term “Artificial Intelligence,” defining it as “the science and engineering of making intelligent machines.” The project’s proposal was audacious, based on the conjecture that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” The workshop aimed to find a way to make machines use language, form concepts, and solve problems typically reserved for humans. This was a pivotal moment, establishing a shared identity and a collective agenda for a new generation of researchers.
The Great Disillusionment: The First AI Winter
By the 1970s, the initial excitement had been replaced by a period of stagnation known as the first “AI winter,” characterized by a steep decline in funding and interest. A major catalyst was the 1973 Lighthill Report, commissioned by the UK government to assess the state of AI research. Sir James Lighthill’s report was highly critical of the field, arguing that it had failed to deliver on its promises and was unlikely to make significant progress in the near future. He highlighted the “combinatorial explosion” problem, suggesting that many AI algorithms were only suitable for simplified problems and would fail when faced with real-world complexity. This critique targeted the dominant symbolic AI of the time, which relied on explicitly programmed rules and knowledge bases. An earlier critique, Minsky and Papert’s 1969 book Perceptrons, had demonstrated the limitations of early single-layer neural networks, contributing to the onset of the winter and leading to a temporary halt in neural network research.
Part IV: Deep Dive — The Markovian Precedent for Language
While the mainstream of AI research was grappling with the limitations of symbolic, rule-based systems, a parallel intellectual tradition had been quietly developing for decades, one that would eventually provide the foundation for the most significant AI breakthroughs of the 21st century.
The Probabilistic Thread: Andrey Markov’s Literary Analysis
The intellectual heritage of modern large language models (LLMs) can be traced back to the work of Andrey Markov, a Russian mathematician celebrated for his pioneering work in stochastic processes. In the early 20th century, Markov developed the theory of “Markov chains,” which describe a sequence of events in which the probability of each event depends only on the state reached in the previous step, not on the full history of the sequence.
In a groundbreaking 1913 study, Markov applied his theory to the distribution of vowels and consonants in the first 20,000 letters of Pushkin’s Eugene Onegin, treating the text as a sequence of symbols and analyzing the statistical probabilities of transitions between them. This was the first empirical application of a Markov chain to language—demonstrating that the structure of language could be modeled probabilistically from data, rather than via explicit rules.
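To give a flavour of the tallying Markov performed by hand, the sketch below (ours, not his, and using a short English sample rather than Pushkin’s Russian) estimates first-order transition probabilities between vowels and consonants from a string of text.

```python
# A rough modern illustration of Markov's 1913 analysis: estimate first-order
# transition probabilities between vowels (V) and consonants (C).
from collections import Counter

text = "the quest to create artificial life is a long standing human aspiration"
letters = [c for c in text.lower() if c.isalpha()]
classes = ["V" if c in "aeiou" else "C" for c in letters]

transitions = Counter(zip(classes, classes[1:]))   # counts of V->V, V->C, C->V, C->C
totals = Counter(classes[:-1])                     # how often each class appears as a predecessor

for (prev, nxt), count in sorted(transitions.items()):
    print(f"P({nxt} | {prev}) ≈ {count / totals[prev]:.2f}")
```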
From Stochastic Theory to Natural Language Processing
Markov’s work laid the groundwork for a data-driven approach to language, and its significance was further established by Claude Shannon’s 1948 “A Mathematical Theory of Communication.” Shannon proposed using a Markov model to statistically approximate the structure of English text, foreshadowing the data-centric paradigm of modern NLP and LLMs. The core idea: the “rules” of language can be inferred from data.
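The same statistics can be run in the generative direction. The toy sketch below (our illustration, in the spirit of Shannon’s n-gram approximations rather than a reconstruction of his experiment) collects word-bigram counts from a small sample and then samples a new sequence in which each word depends only on the one before it.

```python
# A toy word-bigram generator in the spirit of Shannon's approximations of
# English: the next word is sampled conditioned only on the current word.
import random
from collections import defaultdict

sample = ("the history of artificial intelligence is the history of an old dream "
          "the dream of a mechanical mind and the history of its limits")

words = sample.split()
bigrams = defaultdict(list)
for prev, nxt in zip(words, words[1:]):
    bigrams[prev].append(nxt)            # observed successors of each word

random.seed(0)
word = "the"
output = [word]
for _ in range(12):
    word = random.choice(bigrams.get(word, words))  # fall back to any word if unseen
    output.append(word)
print(" ".join(output))
```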
The Markovian tradition is the conceptual bridge between 19th-century mathematics and 21st-century LLMs. The failure of symbolic AI was tied to its rigidity and inability to handle combinatorial explosion. Meanwhile, Markov’s statistical approach, later picked up by Shannon, provided a “sub-symbolic” tradition that would eventually dominate. Modern Transformers were designed precisely to overcome the key limitation of simple Markov models—their short memory.
Markov Model vs. Transformer Architecture
| Aspect | Markov Model | Transformer Architecture |
|---|---|---|
| Core Mechanism | Probabilistic transitions based on the preceding k-gram. | Self-attention weighing all tokens in a sequence. |
| Language Representation | Sequence of symbols (letters, words). | Vector embeddings capturing semantic and syntactic information. |
| Key Limitation | “Memoryless” property; weak long-range dependencies. | High computational cost and vast data requirements. |
| Key Strength | Simplicity; efficient for small, specific tasks. | Captures long-range dependencies; strong generalization. |
Part V: The AI Renaissance — Data, Hardware, and a New Paradigm
After the second AI winter of the late 1980s and early 1990s, triggered by disillusionment with expert systems, the field re-emerged gradually through the confluence of refined algorithms, immense datasets, and exponentially growing computational power.
The Thaw and the Resurgence
During the winters, foundational research continued. A key breakthrough was the backpropagation algorithm, popularized by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986, which provided an efficient way to train multi-layer neural networks. By overcoming the limitations of single-layer perceptrons, it laid the groundwork for the field’s eventual thaw.
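As a minimal sketch of the idea, assuming nothing beyond NumPy, the toy network below learns XOR by propagating output errors backward through the chain rule; it illustrates the mechanism rather than the 1986 formulation itself.

```python
# A minimal backpropagation sketch: a two-layer sigmoid network learns XOR
# by gradient descent on squared error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: apply the chain rule layer by layer
    err = out - y                        # dLoss/dout for 0.5*(out - y)^2
    d_out = err * out * (1 - out)        # through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)   # through the hidden sigmoid

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [[0], [1], [1], [0]] (may vary with initialization)
```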
Another pillar was the Neocognitron, a hierarchical neural network proposed by Kunihiko Fukushima in 1979. Inspired by the visual cortex, it introduced feature detection and position invariance, making it a direct precursor to convolutional neural networks (CNNs). By the mid-2000s, the internet and smartphones were generating massive datasets, while graphics processing units (GPUs), originally built for gaming, provided the computational power to train deep networks. Together, these developments set the stage for a new AI “summer.”
The Transformer Revolution: The Architecture Behind LLMs
Early machine learning made major strides in computer vision and speech recognition. The breakthrough for generative AI came with the 2017 paper Attention Is All You Need by Vaswani et al., which introduced the Transformer architecture and its self-attention mechanism. Unlike recurrent neural networks (RNNs), Transformers process entire sequences at once, capturing long-range dependencies efficiently and directly addressing the “memoryless” constraint of Markov models.
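As a minimal sketch of the mechanism, the code below implements scaled dot-product self-attention for a single head with no learned projections (a simplification of the published architecture, which first maps tokens to queries, keys, and values): each token’s new representation is a weighted mixture of every token in the sequence.

```python
# Scaled dot-product self-attention, stripped to its core: one head,
# no learned Q/K/V projections. Every token attends to every other token.
import numpy as np

def self_attention(X):
    """X: (seq_len, d_model) token embeddings -> context-aware representations."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ X                              # each output mixes all tokens

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, 8-dim embeddings
print(self_attention(tokens).shape)                    # (5, 8)
```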
This innovation led to rapid scaling: encoder-only models like BERT and decoder-only GPT models, culminating in the 2022 public launch of ChatGPT and the mainstreaming of LLMs.
Part VI: The Present and the Future — Contextualizing Modern LLMs
Today’s large language models embody centuries-old aspirations and anxieties. Their strengths and weaknesses echo the philosophical dilemmas first explored in ancient myths and later refined in mechanical and computational forms.
The Current Landscape and Its Philosophical Echoes
The current generation (e.g., ChatGPT, Gemini) represents “Narrow AI,” performing specific sophisticated tasks like generating human-like text—distinct from hypothetical “General AI.” These models learn from vast datasets without explicit rules, solving problems via probabilistic pattern learning.
Yet the statistical approach revives enduring ethical questions. The “black box” dilemma—creators’ difficulty in explaining outputs—mirrors the inscrutability of Talos or the Golem. Algorithmic bias echoes the Golem’s unthinking force or Spenser’s iron Talus: models can perpetuate harmful patterns present in training data, lacking human empathy and judgment.
Critical Considerations and the Road Ahead
The ethical challenges of modern AI are technological manifestations of ancient concerns: control, accountability, and transparency. The debate over whether LLMs “understand” or merely simulate language statistically reprises Turing’s philosophical questions. We have made immense technical progress, but fundamental questions about intelligence and consciousness remain open.
Conclusion: The Long-Standing Aspiration
The history of artificial intelligence is not a linear march but a cyclical, often contradictory journey: ambition and winter, myth and mechanism, rules and statistics. The technologies defining the current boom—from backpropagation to Transformers—were often conceived or refined during the winters, and their eventual success validated data-driven approaches over purely symbolic methods.
This report shows AI as a human story of enduring aspiration and deep anxiety. From Talos and the Golem to today’s LLMs, the dilemmas of control, accountability, and soulless creation persist. By understanding these ancient roots, we can better navigate today’s ethical and technical challenges, recognizing that we are not just building the future—we are continuing a very old conversation.
Historical Timeline of AI and Automation: From Myth to Model
Ancient World (c. 700 BCE – 400 CE)
- Myth of Talos: A bronze automaton from Greek mythology, embodying an early vision of a robot.
- Philosophical Pondering: Plato and Aristotle debate life and motion in the context of automata.
- Heron of Alexandria: Early mechanical theater and automata.
Medieval & Renaissance (c. 1300 – 1700)
- Golem Legends: Jewish mystical tales of a soulless creature brought to life from clay.
- Da Vinci’s Designs: Sketches of fanciful mechanical creations, including a robotic knight.
The Mechanical Age (c. 1700 – 1840)
- Vaucanson’s Automatons: Clockwork marvels like “The Flute Player” and “Defecating Duck.”
- Babbage’s Engines: Difference Engine and the programmable Analytical Engine.
- Ada Lovelace: First known computer algorithm for the Analytical Engine.
Early Computing & The Birth of AI (c. 1900 – 1970)
- Andrey Markov: Stochastic processes applied to language.
- Claude Shannon: Markov models approximate English statistically.
- Alan Turing: The “Imitation Game” as a measure of machine intelligence.
- Dartmouth Workshop: McCarthy coins “Artificial Intelligence,” founding the field.
AI Winters & Re-emergence (c. 1970 – 2010)
- Lighthill Report: Funding cuts and the first AI winter.
- Backpropagation: Training deep networks; thawing the second winter.
- Neocognitron: Precursor to modern CNNs.
- Technological Convergence: Large datasets + powerful GPUs.
The LLM Era (c. 2010 – Present)
- AlexNet: Deep learning wins ImageNet; a new AI summer.
- Attention Is All You Need: The Transformer architecture.
- BERT & GPT: Transformer-based models scale; GPT-3 and public release of ChatGPT.