We're living through a weird moment. AI is the new "thing." And not in just one industry, but in pretty much every industry. AI has been percolating in the background for years at the periphery of public consciousness, and suddenly, well... ChatGPT set the record for the fastest growing user base. Diffusion models are also having a moment. And as with all things that have their moment, there are those who don't quite understand just what they're looking at. And this might not be the take you're expecting.
When you read think pieces (like this one) online you will often find talk of AI and neural networks falling into one of two camps:
- AI is about to take over the world and solve all its problems. AGI in 6 months!
- AI is corporate BS trying to scam you and steal your hard work.
I'm much more in camp 1 than in camp 2. There is certainly a discussion to be had about artist and worker compensation in relation to neural network training, but I feel that needs to take place within the context of reforming at least our copyright system and, more appropriately, in the context of our entire capitalistic economic system. As things stand, it is my opinion that AI has just as much a right to learn from human work as other humans do.
Another phenomenon I see come up over and over in terms of AGI (artificial general intelligence) is that whenever someone even hints at concepts such as thinking or sentience, you will be inundated with Very Smart People™ who want to remind you that large language models are nothing but stochastic parrots that have no idea what they're saying. You'll also hear a lot about statistics and probabilities and how they're nothing more than "word calculators," as Adam Conover recently described them in an episode of his podcast Factually.
It was, in fact, that episode of Factually that irked me enough to write this blog entry. These thoughts have been swimming around for months as I played with LLMs myself, creating my own AI chatbot to amuse my friends. But listening to this podcast really clarified some of those vague notions.
What is a thought? It is a hard question to answer. I used to think of a "thought" as a structure of words. Or at least a coherent and structured line of images or concepts. But now I conceptualize them as more of a shape. A pathway of activated neural connections in our brain. Like a lightning bolt branching from one structure to another, touching one specialized neuron group after another until a thought is formed and gives way to another thought based on the complex neural groups and circuits and firing mechanisms therein.
Applying this concept to the artificial structure of AI is, I think, where people often fail. How, for example, could a machine think when a chatbot can't choose to act on its own but has to wait for your input before choosing what words statistically come next? We often (wrongly, in my opinion) assume that human brains are somehow specially imbued with a spooky kind of consciousness; touched from the ether and given a mind.
I happen to believe that what we perceive as consciousness really is an emergent property of our brains forming feedback loops. Sensory input drives neural stimulation that propagates from structure to structure in our brain before eventually arriving back at those sensory structures, providing additional input to the stimuli before making the rounds again.
And just because current AI systems are not set up this way doesn't mean that they can't be.
When large language models are trained, they're said to make statistical models of human language. This goes beyond just what words are statistically likely to come after another word because language is a tool of our brains to model not only the world, but our perception of it. Language is the result of intelligence, and statistically modeling language is the same as statistically modeling intelligence. We're looking at the crude result of a projection of intelligence through the camera obscura of a neural network.
A common argument against the idea of artificial intelligence showing even the faintest spark of intelligence is that humans naturally ascribe intelligence to language. They'll point out that we never ascribe sentience to, say, image diffusion models. This is true, but I doubt you'd ascribe sentience to your visual cortex by itself. And language is the tool by which we project our consciousness out of our minds to connect with other minds, whatever form that language takes.
There is something spooky about seeing intelligence reflected in a machine, no doubt. But it is there and it isn't supernatural. It is deeply human.
So once we have statistical models of human language and intelligence, what is next? Replicating the feedback loops and specialized structures of our brain. And we're already seeing efforts at this development of cognitive architectures. For example, AutoGPT is an extremely popular GitHub project that provides a feedback loop allowing large language models to take their own output and reflect upon whether or not it meets the criteria set for it by its operator.
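To make the feedback-loop idea concrete, here is a minimal sketch of a generate, critique, retry cycle. This is not AutoGPT's actual code. The names `reflect_loop`, `toy_generate`, and `toy_critique` are hypothetical, and the two toy functions are stand-ins for calls to a real language model.

```python
from typing import Callable, Tuple


def reflect_loop(
    generate: Callable[[str], str],
    critique: Callable[[str], str],
    goal: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Feed a model's output back into its next prompt until a critic accepts it.

    `generate` and `critique` are assumed stand-ins for language model calls.
    An empty string from `critique` means the operator's criteria are met.
    """
    prompt = goal
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(prompt)
        feedback = critique(draft)
        if feedback == "":
            return draft, round_number
        # Fold the previous attempt and the critique into the next prompt.
        prompt = f"{goal}\nPrevious attempt: {draft}\nFeedback: {feedback}"
    # Give up after max_rounds and return the last draft as-is.
    return draft, max_rounds


def toy_generate(prompt: str) -> str:
    # Toy "model": only improves its answer once it has seen feedback.
    return "hello world!" if "Feedback:" in prompt else "hello world"


def toy_critique(draft: str) -> str:
    # Toy "critic": the only criterion is a trailing exclamation mark.
    return "" if draft.endswith("!") else "End with an exclamation mark."


result, rounds = reflect_loop(toy_generate, toy_critique, "Greet the world.")
print(result, rounds)
```

The point is the shape of the loop rather than the toy functions: the output of one pass becomes part of the input to the next, which is the same kind of circuit I described for the brain above.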
There are also models (like GPT-4) that are being constructed with multi-modal architectures. This allows scenarios where vision-like image data can be processed as language data, giving models impressive abilities like being able to understand visual jokes without specific training.
And then there are multi-model approaches, which will let lower-power hardware achieve results comparable to industrial-scale models. Allowing a language model with crafted prompting to act as a mediator between smaller, specialized models will let even consumer-grade hardware perform multiple tasks currently only available to multi-million- and multi-billion-parameter language models.
Multi-model systems most closely resemble our brains and will have the most far-reaching impact. These are going to be what we eventually recognize as thinking machines. Like the human mind, they will combine multiple special-purpose models that take in sensory input from the physical world and emulate special cognitive behaviors found in humans, tied together by feedback circuits and loops that provide timing and reflective capabilities.
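As a rough sketch of the mediator idea, assume the language model is handed a crafted prompt listing the available specialists and replies with the name of the one to use. Everything here is hypothetical: `make_router_prompt`, `dispatch`, and the toy specialists are illustrations, and `toy_mediator` is a rule-based stand-in for a real model.

```python
from typing import Callable, Dict, List

Specialist = Callable[[str], str]


def make_router_prompt(task: str, names: List[str]) -> str:
    # The "crafted prompt": ask the mediator to name exactly one specialist.
    return (
        f"Pick exactly one tool from {sorted(names)} for this task "
        f"and reply with only its name.\nTask: {task}"
    )


def dispatch(
    task: str,
    mediator: Callable[[str], str],
    specialists: Dict[str, Specialist],
) -> str:
    """Ask the mediator which specialist should handle the task, then run it."""
    choice = mediator(make_router_prompt(task, list(specialists))).strip()
    if choice not in specialists:
        raise ValueError(f"mediator picked an unknown specialist: {choice!r}")
    return specialists[choice](task)


def arithmetic(task: str) -> str:
    # Toy specialist: sums every whole number mentioned in the task.
    return str(sum(int(tok) for tok in task.split() if tok.isdigit()))


def shout(task: str) -> str:
    # Toy specialist: upper-cases the task text.
    return task.upper()


def toy_mediator(prompt: str) -> str:
    # Stand-in for a language model: route on whether the task contains digits.
    task = prompt.split("Task:", 1)[1]
    return "arithmetic" if any(ch.isdigit() for ch in task) else "shout"


SPECIALISTS: Dict[str, Specialist] = {"arithmetic": arithmetic, "shout": shout}

print(dispatch("add 2 and 40", toy_mediator, SPECIALISTS))
print(dispatch("say hi", toy_mediator, SPECIALISTS))
```

In a real system each specialist would be its own small model (vision, speech, code, and so on), and the mediator would only need to be good at reading the request and choosing where to send it.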
We're in for a wild ride.