The intellectual foundation for modern AI came together before anyone had used the term "artificial intelligence." In the 1940s, mathematics, logic, and early neuroscience were converging, and the framework that emerged, built largely by mathematician Norbert Wiener, was initially called "cybernetics."
Wiener's core ideas came out of World War II, when he was working on systems to aim anti-aircraft guns at fast-moving bombers. The problem was prediction: human pilots don't move randomly, so past behavior could be modeled statistically to forecast future trajectories. More importantly, he formalized the concept of the "feedback loop," observing that both biological and mechanical systems work by sensing their environment, processing that input, and adjusting behavior accordingly. That architecture is the same "perceive-plan-act" loop that governs modern autonomous AI agents.
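The loop is easy to make concrete. Here is a minimal sketch, assuming a toy proportional controller (the function name, the control rule, and the gain are illustrative, not Wiener's mathematics): the system senses its deviation from a goal, computes a correction, and acts, feeding the result back into the next cycle.

```python
# A minimal feedback loop: perceive the error, plan a correction, act on it.
# The proportional rule and gain value are illustrative assumptions.

def feedback_loop(target, state, gain=0.5, steps=20):
    """Drive `state` toward `target` by repeatedly acting on the sensed error."""
    for _ in range(steps):
        error = target - state      # perceive: measure deviation from the goal
        correction = gain * error   # plan: choose an adjustment
        state += correction         # act: change the system, closing the loop
    return state

print(feedback_loop(target=10.0, state=0.0))  # converges toward 10.0
```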
In 1943, Warren McCulloch and Walter Pitts published the first mathematical model of an artificial neuron, showing that networks of simple binary threshold devices could perform basic logic operations. A single such unit can compute linearly separable functions like AND and OR, but not XOR or XNOR, whose outputs no single linear threshold can separate.
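A sketch of such a unit shows both sides of the claim. The weights and thresholds for AND and OR below are illustrative choices, and the brute-force search over a grid of weights demonstrates that no single unit of this form matches XOR.

```python
# A McCulloch-Pitts-style binary threshold unit: it fires (outputs 1) when
# the weighted sum of its inputs meets a threshold.

def threshold_unit(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# AND: both inputs must fire (weights 1, 1; threshold 2).
assert all(threshold_unit((a, b), (1, 1), 2) == (a & b)
           for a in (0, 1) for b in (0, 1))
# OR: either input suffices (weights 1, 1; threshold 1).
assert all(threshold_unit((a, b), (1, 1), 1) == (a | b)
           for a in (0, 1) for b in (0, 1))

# XOR is not linearly separable: an exhaustive search over a grid of
# weights and thresholds finds no single unit that reproduces it.
grid = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(
    all(threshold_unit((a, b), (w1, w2), t) == (a ^ b)
        for a in (0, 1) for b in (0, 1))
    for w1 in grid for w2 in grid for t in grid
)
print("single unit computing XOR found:", found)  # False
```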
Donald Hebb added something critical in 1949. His principle "neurons that fire together, wire together" brought synaptic plasticity into network models and pointed connectionist research toward learning as the adjustment of connection weights.
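Hebb stated the principle in prose; the usual later formalization is the update Δw = η · x · y, which the toy sketch below assumes (the learning rate and data are illustrative). The weight between two units grows only when both are active at the same time.

```python
# A minimal Hebbian update under the common formalization Δw = η · x · y:
# the connection strengthens when pre- and post-synaptic units co-fire.

def hebbian_update(w, x, y, lr=0.1):
    """Strengthen weight w in proportion to the coincidence of x and y."""
    return w + lr * x * y

w = 0.0
for x, y in [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:
    w = hebbian_update(w, x, y)
print(round(w, 2))  # 0.3: only the three co-firing pairs changed the weight
```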
In 1950, Alan Turing reframed the entire question. His Imitation Game replaced "can machines think?" with a behavioral test: if a machine could convince a human interrogator, through a typed exchange, that it was human, that was sufficient. He also described "learning machines," systems that could alter their own rules through inductive processes, an idea that anticipates, in spirit, what gradient descent does today.
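To make the modern analogue concrete, here is a minimal gradient-descent sketch, a toy one-parameter model rather than anything from Turing's paper: the "rule," a single weight, rewrites itself by following the error gradient over examples.

```python
# Fit y = w * x to toy data by gradient descent on the squared error.
# Data, learning rate, and step count are illustrative assumptions.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # pairs (x, y); true w is 2
w, lr = 0.0, 0.05

for _ in range(100):
    # d/dw of sum((w*x - y)^2) is sum(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= lr * grad  # the rule adjusts itself from experience

print(round(w, 3))  # approaches 2.0
```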
These threads came together at the 1956 Dartmouth Summer Research Project, where the term "Artificial Intelligence" was formally adopted. Frank Rosenblatt introduced the Perceptron in 1957, the first trainable neural network. Then Marvin Minsky and Seymour Papert demolished it in 1969, proving in their book Perceptrons that single-layer networks cannot compute XOR. Funding collapsed, and neural networks were set aside for more than a decade.
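The perceptron's training rule and its limit are both easy to demonstrate. The sketch below assumes the standard error-driven update (data, epoch count, and learning rate are illustrative): it converges on the linearly separable OR function but stalls short of perfection on XOR, the failure Minsky and Papert formalized.

```python
# Rosenblatt-style perceptron learning: on each mistake, move the weights
# toward the misclassified example.

def train_perceptron(samples, epochs=25, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = target - pred            # +1, 0, or -1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    return sum((1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0) == t
               for (x1, x2), t in samples) / len(samples)

OR  = [((a, b), a | b) for a in (0, 1) for b in (0, 1)]
XOR = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)]

for name, data in [("OR", OR), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))  # OR reaches 1.0; XOR never can
```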