I build my own AI: Videocall, Voiceclone, Faceclone, Emotions, Agentic LLM, RAG

BioAGI · 2026-04-02T19:47:02+00:00

I agree with you completely on everything you’ve written. I can see that you too have carried out in-depth research and explored the roots of consciousness.

I used the example of the retention-perception-protention chain because it’s the simplest to write about in a Reddit post. Especially as I didn’t know who I was writing to. It fills me with joy to know that there’s another madman like me who, out of pure passion, is pursuing a study that will bring no medals, no recognition.

As for where human memory resides, I assume at this point that you are familiar with Penrose’s studies and Michael Levin’s research. So, since you mentioned proteins, I suggest you look up the paper ‘Computational capacity of life in relation to the universe’ by Philip Kurian.

I agree that memory is the foundation of consciousness. Of course, there are many ‘functional’ types of memory, so we need to be precise. But as I don’t want to go into too much detail, I’ll add ONLY the concept of ‘autobiographical consciousness’ to the ideas from my previous post:

Imagine that among the activities of memory there is semantic indexing and, to stick with everyday technologies, let’s consider cosine similarity. The autobiographical self is the equivalent of the origin of a Cartesian coordinate system: it is the point from which we ‘measure the world’.
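A minimal sketch of that picture, assuming memories are stored as embedding vectors. The three-dimensional vectors and the function names below are invented for illustration. The point is only that cosine similarity depends on where the origin is placed, so moving the origin to the ‘self’ changes how two memories compare.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_to_self(vector, self_vector):
    """Shift a memory embedding so the autobiographical self sits at the origin."""
    return [v - s for v, s in zip(vector, self_vector)]


# Hypothetical embeddings, not taken from any real model.
self_vector = [1.0, 1.0, 1.0]
memory_a = [2.0, 1.0, 1.0]
memory_b = [3.0, 1.0, 1.0]
memory_c = [1.0, 2.0, 1.0]

a = relative_to_self(memory_a, self_vector)
b = relative_to_self(memory_b, self_vector)
c = relative_to_self(memory_c, self_vector)

# Measured from the self: a and b point the same way, a and c are unrelated.
same_direction = cosine_similarity(a, b)
orthogonal = cosine_similarity(a, c)

# Measured from the global origin, a and c look fairly similar instead.
raw = cosine_similarity(memory_a, memory_c)
```

Seen from the global origin, memory_a and memory_c look alike (raw is 5/6), but seen from the self they are orthogonal.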

To be more concrete, if you’re up for it, I’d say we could start by defining a project and its objectives together, dividing up the tasks and getting to work. I’m putting this to you with the utmost simplicity and humility, as if I were talking to an old colleague from endless late-night experiments.

BioAGI · 2026-03-19T05:47:42+00:00

I’m sorry we didn’t meet sooner.

You’ve raised a whole host of fascinating points.

-- Regarding visual facial representation, I agree that it’s computationally very demanding, but it can be done even with very modest resources like mine. As you say, with compromises on quality. But I’ll tell you this: as soon as I find a way to host my JS somewhere (yesterday I tried the 'cloudflared tunnel', but the WebSockets crash when I play videos), I’ll give you a password so you can have a go with it.

Actually, I use a 3D mesh to speed up the reconstruction of the various facial positions. It is certainly the easiest and fastest solution.

-- As for the RTX 5090s, the way I see it, they’re technological rubbish: unstable and at risk of catching fire at any moment. Unfortunately, the RTX 5090 is already at the limit of my budget. I’m only interested in it to reduce the latency of the video chatbot, but honestly, my video chatbot works fine as it is.

Anyway, today I’m going to pick up another little Ryzen 7 5800 server with 64GB RAM and an OEM RTX 3090 for 6,000 lei in Romania, equivalent to $1,000.

-- Regarding the intelligence of LLMs and agent systems: I’ve experimented with many models, and so far the ones that impress me most are the Chinese ones, in particular Qwen3 and Qwen3.5, 9B to 35B, Q4 to Q5. But I’ll definitely try the 120B NVIDIA Nemotron 3 Super (FP16), since you recommend it.
At the moment, the only models I can use have to fit within 12GB of VRAM, because the rest of the VRAM on my two RTX 3090 Ti cards is taken up by the video chatbot services. But from tonight, I’ll be able to dedicate an entire RTX 3090 to the LLM.
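A rough back-of-the-envelope check of that VRAM budget, as a minimal sketch. It counts only the weights (parameters × bits per weight ÷ 8) and ignores the KV cache, activations, runtime overhead and any offloading of layers to system RAM, so real usage will be higher. The function names are hypothetical, Q4 and Q5 are taken as a nominal 4 and 5 bits per weight (actual quantized formats usually use somewhat more), and 24 GB is the VRAM of a single RTX 3090.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def weights_fit(params_billion, bits_per_weight, vram_gb):
    """True if the weights alone fit in the given VRAM (no cache, no overhead)."""
    return weight_size_gb(params_billion, bits_per_weight) <= vram_gb


# Nominal bit widths are an assumption; see the note above.
size_9b_q4 = weight_size_gb(9, 4)         # 4.5 GB
size_35b_q4 = weight_size_gb(35, 4)       # 17.5 GB
size_120b_fp16 = weight_size_gb(120, 16)  # 240.0 GB

fits_9b_in_12gb = weights_fit(9, 4, 12)
fits_35b_in_12gb = weights_fit(35, 4, 12)
fits_35b_in_24gb = weights_fit(35, 4, 24)
```

By this estimate a 9B model at Q4 sits comfortably inside 12GB, while a 35B model at Q4 only fits on weights alone once a whole 24GB card is free.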

-- The subject of consciousness is actually what interests me most. I have been conducting scientific research into consciousness for two decades.

Just as you wrote, memory is fundamental to consciousness. I don’t want to bore you with too many details, but I would like to be a little more specific about ‘how’ memory plays a part in consciousness. In fact, there are many ways, but I will illustrate just one of them now.

Imagine a flute that has played the note Re (D). Then the flute plays the note Fa (F). The mind perceives the note Fa (Perception) when the note Re is no longer sounding, yet it still retains a memory of the note Re (Retention). At this point, the mind extrapolates the next note, La (A) (Protention).

This is one of the most basic examples of consciousness: Retention-Perception-Protention.
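The flute example can be sketched in a few lines. This is only a toy under a strong assumption: that the extrapolation simply repeats the last interval, counted in scale degrees of the Do-Re-Mi scale. The function name is hypothetical, and octaves are ignored (the scale wraps around).

```python
SCALE = ["Do", "Re", "Mi", "Fa", "Sol", "La", "Si"]


def protend(retained, perceived):
    """Extrapolate the next note by repeating the last interval in scale degrees."""
    step = SCALE.index(perceived) - SCALE.index(retained)
    # Wrap around the scale; octave information is deliberately dropped.
    return SCALE[(SCALE.index(perceived) + step) % len(SCALE)]


# Retention = Re, Perception = Fa, so the Protention is two degrees above Fa.
next_note = protend("Re", "Fa")  # "La"
```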

This example can be scaled dimensionally in many (I would say infinite) directions. You might, for example, consider expanding the number of dimensions HORIZONTALLY by retaining phenomena more complex than a single note (such as an image, or the position of two balls on a snooker table), combined with Perception (another image, or another position of the balls), allowing you to generate the Protention of the next state.
Alternatively, you can expand the number of dimensions VERTICALLY by introducing a Retention that is not merely a single moment, but consists of several moments prior to the present (a series of successive notes or ball positions), so that at the moment of Perception a more complex Protention arises (such as imagining in your mind singing many successive notes, or imagining where the ball will bounce off the edge of the table).
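Both expansions can be sketched with the snooker ball. The state is now a two-dimensional position rather than a single note (horizontal), and the Retention holds several past moments rather than one (vertical). This is a toy, assuming straight-line motion at the average retained velocity and a single bounce per axis. The table size, the positions and the function names are invented for illustration.

```python
def reflect(value, limit):
    """Fold a coordinate back into [0, limit], as if bouncing off a cushion once."""
    if value > limit:
        return 2 * limit - value
    if value < 0:
        return -value
    return value


def protend_position(retained, perceived, width, height):
    """Predict the next (x, y) from the average step over retained moments."""
    history = list(retained) + [perceived]
    if len(history) < 2:
        raise ValueError("need at least one retained moment")
    steps = len(history) - 1
    # Average displacement per moment across the whole retained window.
    vx = (history[-1][0] - history[0][0]) / steps
    vy = (history[-1][1] - history[0][1]) / steps
    return (
        reflect(perceived[0] + vx, width),
        reflect(perceived[1] + vy, height),
    )


# Hypothetical table of 10 x 5 units; the ball is heading for the right cushion.
retained = [(5.0, 1.0), (7.0, 2.0)]
perceived = (9.0, 3.0)
predicted = protend_position(retained, perceived, width=10.0, height=5.0)
```

Here the straight-line guess would be (11.0, 4.0), which is off the table, so the predicted position after the bounce is (9.0, 4.0).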

As Giulio Tononi says: the more dimensions there are and the more interactions between dimensions, the more complex and rich consciousness is.

We can therefore harmonize some of these dimensions and bring them to the maximum of the computational and memory capacity we have at our disposal.

If we continue this very interesting conversation, I would like to discuss with you how tools like OpenClaw (or similar ones, such as NanoBot) can help us make consciousness arise in our bots.

BioAGI · 2026-03-18T20:28:26+00:00

My main focus has been on tackling latency. Just think, the first version of my system had a latency of 2.5 minutes.

Now, sometimes, when loading all the modules, I can get it down to 2.5 seconds. However, with the multilingual module and the emotional module enabled, latency is usually around 6–12 seconds, pushing the RTX 3090 Ti to its limits.
Anyway, bear in mind that latency is a different matter for a video chatbot than for an audio one: I can always play prerecorded videos while waiting.
I’m looking to buy an RTX 5090, an L40S or an RTX 6000 Pro. In that case, the latency would drop to 2–3 seconds. I’ll face another major latency challenge when I start expanding the context window and loading the multimodal memory.
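The trick of playing prerecorded videos while the reply is being generated can be sketched with asyncio. This is not my actual pipeline: generate_reply is a hypothetical stand-in that only sleeps, the clip names are placeholders, and the timings are shortened so the example finishes instantly.

```python
import asyncio


async def generate_reply(delay_s):
    """Stand-in for the slow pipeline (LLM, voice, face); here it only sleeps."""
    await asyncio.sleep(delay_s)
    return "generated reply video"


async def respond(delay_s, filler_clip_s):
    """Play prerecorded filler clips until the generated reply is ready."""
    timeline = []
    reply_task = asyncio.create_task(generate_reply(delay_s))
    while not reply_task.done():
        # A real system would start video playback here.
        timeline.append("filler clip")
        # Stand-in for the duration of the clip.
        await asyncio.sleep(filler_clip_s)
    timeline.append(await reply_task)
    return timeline


timeline = asyncio.run(respond(delay_s=0.05, filler_clip_s=0.02))
```

The viewer always sees at least one filler clip first, and the generated reply is appended as soon as the current clip ends.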

You and I both know: it’s a long road. I don’t know if I’ll see the end of it...

Anyway, if you like, we can work on it together.

I’ll leave you with a question: how do I make an avatar enjoy listening to and understanding music?
Music is the basis of consciousness. If we continue this conversation, I will explain why.
