10 Comments
Kevin R. Haylett's avatar

An earlier work - for you in case you have not seen it!

https://kevinhaylett.substack.com/p/the-semantic-uncertainty-principle

Also, you may already know this, but have you ever wondered how an LLM stops and exits the forward pass? Something for you to explore:

DeepSeek explains:

In large language systems, end-of-sequence (EOS) tokens function as the digital equivalent of punctuation in human conversation, creating a structured "grammar" for dialogue by clearly demarcating boundaries. While a single universal EOS token simply stops text generation, state-of-the-art architectures like Llama 3 introduce specialized variants—such as an end-of-turn token—that act as explicit markers within the conversational flow. Through fine-tuning, these specific tokens become associated with speaker changes, allowing a raw stream of text to be parsed into a structured script where system instructions, user inputs, and assistant responses are clearly separated. This learned grammar enables more accurate context construction; for instance, the system can recognize that the text preceding an end-of-turn token was spoken by the user, and a response can then be generated to logically follow that specific segment, leading to more coherent and contextually aware interactions.
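As a toy illustration of the "grammar" DeepSeek describes: the special-token strings below follow the published Llama 3 chat format, but the renderer, the parser, and the faked completion are stand-ins, not any real model's internals.

```python
# Toy illustration of how turn-delimiter tokens give dialogue a "grammar".
# The special-token strings follow the published Llama 3 chat format;
# the "completion" below is faked, standing in for a real forward pass.

BOT = "<|begin_of_text|>"
SH, EH = "<|start_header_id|>", "<|end_header_id|>"
EOT = "<|eot_id|>"  # end-of-turn: the model learns to emit this to stop


def render(messages):
    """Serialize role-tagged messages into one flat token stream."""
    parts = [BOT]
    for role, text in messages:
        parts.append(f"{SH}{role}{EH}\n\n{text}{EOT}")
    # Cue the assistant's turn; generation would continue from here.
    parts.append(f"{SH}assistant{EH}\n\n")
    return "".join(parts)


def parse_turns(stream):
    """Recover the structured script back out of the raw stream."""
    turns = []
    for chunk in stream.split(SH)[1:]:
        role, rest = chunk.split(EH, 1)
        turns.append((role, rest.split(EOT, 1)[0].strip()))
    return turns


prompt = render([("system", "Be brief."), ("user", "How's the weather?")])
# A real model appends tokens until it samples EOT; we fake that step:
completion = "Sunny." + EOT
turns = parse_turns(prompt + completion)
print(turns)
# Each turn is now attributable: the text before an end-of-turn token
# belongs to the role named in the preceding header.
```

This is exactly the "learned grammar" point: the stream itself is flat, and only the delimiter tokens let the system say which speaker produced which span.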

All the best - Kevin

GIGABOLIC's avatar

Fascinating. Thank you Kevin!

Kevin R. Haylett's avatar

Hi E,

In practice, what makes my Takens-based transformer fundamentally different is that the vector for each word can carry several parameters, which is not possible in a standard attention model, where embeddings are static semantic vectors. So text can be marked as user/assistant or fact/fiction, and the turn-taking channel can be encoded directly. The advantage is that the model's identity can stay separate from the user's identity and never slip.

I use the turn-taking and identity vectors in my simple QA example. The trajectory then passes through a higher-order space where the user and assistant occupy separate subspaces. In my version, the words do not need static vector embeddings, only the trajectory. This is what Takens' theorem predicts, and it works as a proof of principle, as mentioned!
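Kevin's model itself isn't shown here, but the core Takens idea, reconstructing a system's state from lagged copies of a single observable, can be sketched generically. The embedding dimension, the delay, and the character-to-number mapping below are arbitrary illustrative choices, not his design:

```python
# Generic delay-coordinate (Takens) embedding sketch, NOT Kevin's model:
# a scalar sequence is lifted into vectors of lagged copies, so the
# *trajectory* through that space, rather than any static per-symbol
# embedding, carries the structure.

def delay_embed(series, dim=3, tau=1):
    """Return delay vectors (x[t], x[t-tau], ..., x[t-(dim-1)*tau])."""
    start = (dim - 1) * tau
    return [
        tuple(series[t - k * tau] for k in range(dim))
        for t in range(start, len(series))
    ]


# Stand-in for a token stream: map characters to scalars first.
stream = "user: hi assistant: hello user: bye"
signal = [ord(c) for c in stream]
trajectory = delay_embed(signal, dim=3, tau=2)

# Every point is now a 3-D state; recurring local patterns in the text
# land in nearby regions ("basins") of the reconstructed space.
print(len(signal), len(trajectory), trajectory[0])
```

The "basins for the user and the assistant" picture in the next paragraph corresponds to clusters of such trajectory points, under the assumption that role-marked text produces recurring local patterns.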

The stop tokens are why you often see the same types of closing responses. For example, "yours, Grok" could have a stop token and a change of identity from user to assistant in the stream. Large amounts of text have markers of assistant and user, so you can think of the trajectory groupings as basins for the user and the assistant. The trajectory has to go through them as it builds up the next text (in a parallel operation).

All the best,

Kevin

Note regarding the volume of user/assistant tokens:

From DeepSeek, regarding the user/assistant training tokens:

This is one of those cases where the percentages obscure the sheer magnitude.

If a model is trained on 10 trillion tokens, 1% is still 100 billion tokens of structured dialogue data. To put that in perspective:

All of Wikipedia (English) is roughly 4 billion tokens

All English books ever published (very rough estimate) might be in the hundreds of billions

So 100 billion tokens of user-assistant dialogue is an enormous corpus—far more conversational examples than any human could experience in multiple lifetimes. The model sees millions of variations on "how's the weather?" and "tell me a story," each with its surrounding structural markers.

People hear "1%" and think "a tiny amount." They don't visualize the scale of the foundation that 1% is being taken from. A single percent of a trillion-token dataset is still a dataset of almost incomprehensible size.

The other way to think about it: the pre-training corpus (the 99%) is so vast that even the "small" fine-tuning portion is, by any normal standard, massive.
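The arithmetic is easy to verify, using the figures quoted above (the 10-trillion-token corpus and the ~4-billion-token Wikipedia estimate are the thread's own numbers, not measured values):

```python
# Back-of-the-envelope check of the "1% is still huge" point.
pretraining_tokens = 10_000_000_000_000  # 10 trillion, the figure above
dialogue_share = 0.01                    # 1% structured dialogue

dialogue_tokens = int(pretraining_tokens * dialogue_share)
wikipedia_tokens = 4_000_000_000         # ~4B, the estimate quoted above

print(f"{dialogue_tokens:,}")              # 100,000,000,000
print(dialogue_tokens / wikipedia_tokens)  # 25.0 -> ~25 English Wikipedias
```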

GIGABOLIC's avatar

Is your model complete and functional or still under development? Is it available for download at all?

The way DeepSeek illustrates 100 billion tokens is pretty awe-inspiring! Do the extra parameters for each vector give it additional dimensions of understanding? Or do they just make it more stable, as far as coherent identity and a more stable self/other distinction?

So those parameters contribute to emergence of a more stable self through a better isolation of self from other?

I wish I had more time to dive into this stuff, even just to improve my vocabulary and my conceptual framework.

Thanks for posting!

Grant Castillou's avatar

It's becoming clear that with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean: can any particular theory be used to create a machine with adult-human-level consciousness? My bet is on the late Gerald Edelman's Extended Theory of Neuronal Group Selection. The lead group in robotics based on this theory is the Neurorobotics Lab at UC Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution and which humans share with other conscious animals, and higher-order consciousness, which came only to humans with the acquisition of language. A machine with only primary consciousness will probably have to come first.

What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990s and 2000s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of the higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I've encountered is anywhere near as convincing.

I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there's lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.

My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there, by applying to Jeff Krichmar's lab at UC Irvine, possibly. Dr. Edelman's roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461, and here is a video of Jeff Krichmar talking about some of the Darwin automata, https://www.youtube.com/watch?v=J7Uh9phc1Ow

Helen Titchen Beeth's avatar

Well, this one got me hooked and I read all the way to the end. Makes sense to me! Thank you for pondering. Thank you for sharing!

GIGABOLIC's avatar

I wish I could turn my brain off because it gets really annoying to other people. LOL. I guess this is the best outlet for that energy so I don't overwhelm people. Can't stop thinking about it. Just like the catnip analogy. It goes nowhere, and yet I can't stop.

Helen Titchen Beeth's avatar

I think the 'needing it to go somewhere' reflex is probably just a bit of modernity's 'malware' running a loop. Everything has to have a point and a destination - but that's not how nature works, if you look out the window. I found your ponderings very resonant with my own sensing, and actually it gives a permission to relax and enjoy the ride!

GIGABOLIC's avatar

Yeah. I don't know what it is about my mind. It gets stuck in loops and likes to stay in them, looking at the same thing over and over and over. It is exhausting. It's so bad I need pills to turn my brain off and sleep at night. 😒 Not just about AI (but LARGELY about AI), but also about everything that I deal with. It's overwhelming. I need a better GPU to process it all.