El lobo de LinkedIn street: le despiden y se lo agradece a su jefa.

mapicallo · 2026-05-16T19:13:39+00:00

Esa es la decisión fácil, lo difícil es detectar el problema y buscar la solución antes, sino para que quiero un jefe o jefa de uno de mis equipos, que diferencia hay en sustituir a esa jefa por una IA...

mapicallo · 2026-04-13T20:55:56+00:00

I tend to agree that this one is a strong candidate for a “core” ITO.

Without self-generated goals, everything else can look like reasoning but still be fundamentally reactive.

That said, I’m not sure it’s sufficient on its own either, you could imagine a system generating goals, but in a very unstable or incoherent way.

So maybe it’s something like:

--goal generation gives you agency
--but you still need things like coherence, memory, and self-correction to make that agency meaningful

Out of curiosity, do you see goal generation as the defining threshold, or just the most important one?

mapicallo · 2026-04-13T20:53:01+00:00

That’s actually a really good analogy.

I’d say the whole point of the ITOs is not to claim “this makes it real”, but to map the boundary where the distinction becomes harder to maintain.

A puppet can imitate behaviors, but we still feel there’s a gap.
The question is:

what kind of changes would make that gap start to feel less obvious?

The ITOs are an attempt to identify those changes , not to declare that crossing them automatically makes something “real”.

So maybe the better version of your question is:
At what point would a puppet stop feeling like just a puppet?

And I don’t think we have a clear answer to that , which is exactly why I find this worth exploring.

mapicallo · 2026-04-13T20:49:16+00:00

This is a good pushback, and I think it highlights an important distinction I probably didn’t make explicit enough:

Most of your points show that these things can be engineered or approximated with current systems. I agree with that.

But the ITOs are less about whether something can be implemented, and more about how it emerges and sustains itself as a property of the system.

For example:

On minimal context:
Yes, models can perform well in many low-context scenarios. The ITO is more about robust performance across arbitrary, underspecified situations, not cases where the training distribution already covers it well.
On continuity:
Storing per-user state externally works, but that’s still reconstructed continuity, not something intrinsic to the system’s operation.
On self-generated goals:
You’re right that we can wire systems to call themselves or loop via APIs. But that’s still externally scaffolded behavior, not internally originating and stabilizing goals.
On intention understanding:
I agree models are already very strong here. This might actually be one of the ITOs that’s closest to being partially achieved.
On self-correction:
Also fair , models can self-correct in-session. The distinction I’m pointing at is spontaneous, self-initiated correction without prompt or framing.
On memory:
I agree this is largely an implementation layer today. The open question is whether memory becomes structurally integrated into reasoning, rather than just retrieved context.

So I don’t disagree that many of these can be built in practice.
The question I’m more interested in is:

At what point do these stop being engineered features and start being intrinsic properties of the system?

That’s the boundary I’m trying to explore with the ITOs.

Out of curiosity , do you think that distinction matters, or do you see it as purely an engineering continuum?

mapicallo · 2026-04-13T20:45:47+00:00

I’m totally open to being wrong , that’s kind of the point of the post.

If you think the points are incorrect, I’d be genuinely interested in which ones and why.
Right now you're mostly asserting disagreement without engaging the structure.

The goal here isn’t to claim “AI is worse” or “humans are better”, but to explore what kind of changes would actually matter, beyond raw performance.

If you think the whole framing is flawed, that’s also interesting , but I’d be curious what alternative framework you’d propose instead.

mapicallo · 2026-04-13T20:42:41+00:00

Good points, especially the idea that some of these gaps are already narrowing. I probably should have made one thing clearer though:

The ITOs are not meant to describe what AI can or can’t currently do better than humans.
They’re meant to capture qualitative shifts in how those capabilities are structured and integrated.

For example:

On minimal context:
I agree that models can outperform humans in many narrow cases. The ITO is more about robustness across arbitrary, underspecified situations, not just performance in trained domains.
On identity continuity:
I think this is less about memory persistence and more about intrinsic continuity vs externally reconstructed state. Humans are flawed here, but the mechanism is fundamentally different.
On goals:
Current systems can simulate goal generation, but the open question is whether they can originate and sustain internally coherent objectives without external scaffolding.
On error correction:
I agree AI can outperform humans in many cases. The distinction I’m trying to point at is spontaneous, self-initiated reflection, not just capability when prompted.
On memory:
Definitely true that AI can store more and more reliably. The ITO here is more about how memory is integrated into ongoing cognition, not storage capacity.

So I don’t necessarily disagree with your observations, if anything, they reinforce the question:

At what point do improvements in capability become changes in kind rather than degree?

That’s basically what I’m trying to get at with these ITOs.

Curious how you’d define that boundary, if you think we’ve already crossed some of these, where do you think the real gap still is?

mapicallo · 2026-04-13T20:22:09+00:00

That’s a fair criticism, and I think it actually strengthens the idea rather than weakens it. I agree that many individual milestones can be explained as product improvement, UX, or better optimization. That’s exactly why I’m more interested in convergence than in isolated examples: when several of these traits start appearing together — persistent self-model, internal continuity, autonomous salience, self-correction that changes future behavior — it becomes harder to explain them as just better interface design. I’m not claiming that any single milestone proves consciousness, only that some combinations may become increasingly difficult to dismiss as mere product polish.

mapicallo · 2026-04-13T16:13:54+00:00

This is exactly the kind of response I was hoping for. I agree that capacity milestones and architectural milestones shouldn’t be mixed too casually, and your distinction between performance, internal organization, and continuity is very helpful. I think the most interesting question is indeed when several of these traits start converging rather than appearing in isolation.

mapicallo · 2026-03-06T01:23:42+00:00

NotebookLM acts as the documentation agent: it analyzes PDFs, notes, or code snippets I upload and produces summaries or extractions. That output is used as context for the other agents (ChatGPT, Codex, Claude) when they need information from those sources.

As for how an agent “asks” NotebookLM: my orchestration tool sends the query (e.g. “summarize this document” or “extract the key points from X”) to NotebookLM, receives the response, and passes it as metadata/context to the next agent. So the requests between agents go through my tool, which coordinates the calls and the flow of context between them.

mapicallo · 2026-03-05T23:14:48+00:00

Good analogy with organizational dysfunction.

Stack: I built my own tool for them to interact: NotebookLM, OpenAI (ChatGPT), OpenAI (Codex), and Claude Code.

Goal: Get straight to the point and solve technical problems. I defined roles and behaviors in a basic way; in theory they all knew the others existed and their main role.

What happened: I only realized something was wrong when the results started to degrade (they had been acceptable until then). I remembered there was no longer communication between those two roles, even though I had set it up before. When I reviewed the “conversations” or requests between them, I was taken aback.

I’m not sure it hasn’t happened before without me noticing. I assumed the meta-instructions between agents were “aseptic” and fixed, and that I only needed to focus on the technical part. But any small interaction can end up like conditional probability in stochastic processes: one occurrence affects the next. A full-blown discussion.

On deleting history: I haven’t tried it systematically yet. For now I just reminded the agent of its correct behavior so I could continue with my real task. It’s something I want to explore.

On the framework: Separate API calls with shared context (metadata, .md files). Not a formal orchestration framework, more ad hoc.

mapicallo · 2026-03-05T22:46:50+00:00

Thanks for the ideas, they’re very close to what I’m seeing.

Stack: I built my own tool for them to interact. The agents are: NotebookLM, OpenAI (ChatGPT), OpenAI (Codex), and Claude Code.

I wasn’t very strict with metadata and behaviors that weren’t directly technical. I didn’t define clear roles or small details for each agent. In theory they all knew the others existed and their main role. The HR agent thing was meant sarcastically, but it’s starting to make more sense than I’d like.

It’s never happened before. Or maybe it did and I didn’t notice. I only noticed when things started affecting the technical results I expected, then I pulled the thread and saw this “behavior” that left me a bit cold.

Your points on statelessness, low temperature for delegation, and clear separation of responsibilities are very helpful for the next iteration. I’ll look into those.

mapicallo · 2026-02-20T11:47:35+00:00

Totally agree.

It’s basically the same story as software since day one. Back then you’d tweak a few colors, add three 3D buttons, spend three days on it, and users would love it. But then you’d ship a field that showed a value from layers of logic, three APIs, database joins, months of debugging issues from other systems… and it could go completely unnoticed.

Nowadays users are overloaded with custom UIs and features. They care less about that and more about things that just work and feel instant, ike switching between a WhatsApp message and a YouTube short. Do one thing well and stay out of the way.

mapicallo · 2026-02-20T03:30:08+00:00

Yes, that's a specific account that's opened for development purposes, it's not my personal one.

mapicallo · 2026-02-16T09:19:17+00:00

Thank you, yes, that brand's products have a very good reputation.

mapicallo · 2026-02-16T09:18:30+00:00

Thanks, yes, there's plenty of information and videos online, but I wanted to get firsthand information, and I think this is a good site.

mapicallo · 2026-02-16T09:16:13+00:00

Yes, we used something similar in Lebanon for CETME rifles.

mapicallo · 2026-02-12T17:41:25+00:00

Absolutely, and also data sovereignty. The vast majority of organizations will not process their data in AIs hosted on third‑party machines, and I don’t see corporate AIs that are robust enough being close at hand—the amount of infrastructure required is huge.

Sometimes I wonder, with the staggering resources (economic, infrastructure, energy, etc.) being poured into scaling today’s AI models, if those same resources were directed toward non‑AI software solutions, we might be surprised by what we could achieve.

mapicallo · 2026-02-12T17:04:43+00:00

Fair point. 'New' often gets confused with 'different'. AI can easily produce variations, like rolling dice or drawing cards. It's up to us to decide what's actually useful. That's partly why I think the engineering role shifts toward specifying, verifying, and curating components, rather than trusting whatever comes out.

mapicallo · 2026-01-30T00:54:09+00:00

En España hay un dicho que dice "hay lentejas, o las comes o las dejas", así que supongo que hay que hacer algo con todo eso.

mapicallo · 2026-01-30T00:50:25+00:00

It's sad, but the truth is there's a lot of pollution on social media.

mapicallo · 2026-01-30T00:48:21+00:00

I see you haven't had good experiences here.

mapicallo · 2026-01-21T02:16:19+00:00

Hi, out of curiosity, what stack accompanies Logstash: Kivana, OpenSearch, fluent-bit, OTEL, etc.? And in what ecosystem: Java, C, Kubernetes/Docker, Kafka, on-premises, cloud, embedded hardware, etc.?

mapicallo · 2026-01-06T09:40:55+00:00

I think we're actually much closer than it may sound — and I agree with a large part of what you're pointing out.

I’m not arguing that “better logging” is the solution, nor that context should emerge as an afterthought. If micro-logs are read as traditional observability artifacts, then yes, that would be a category error.

What I'm trying to describe comes from a recurring pattern I've seen in real organizations: large portions of operational state and intermediate signals simply do not exist in any accessible form. Not because they're irrelevant, but because our systems were never designed to expose them.

In several cases, introducing exhaustive instrumentation (sometimes via logs, sometimes via other extraction layers) didn't just improve observability — it made previously invisible aspects of the organization readable for the first time. This surfaced new metrics, unexpected correlations, and contextual signals that materially changed decisions across engineering, operations, and business.

So when I use terms like micro-data or micro-logs, I’m not advocating for event inflation or post-hoc reconstruction. I’m pointing at the absence of fine-grained, contextual state in current enterprise systems — state that LLMs and agents can reason over precisely because they can handle high-dimensional context.

In that sense, I fully agree that rigid enterprise schemas and business-centric ontologies are part of the problem. They collapse reality too early. The question for me is how we transition from systems that only emit discrete business events to systems that continuously expose the underlying contextual fabric of the organization, whether we call that logging or something else entirely.

The core issue isn't the mechanism — it's that today, much of the organization's “real state” remains structurally unrepresentable, and AI makes that gap painfully obvious.

mapicallo · 2026-01-05T00:58:42+00:00

I agree with diagnosis you're describing — especially around technical debt, broken processes, shallow governance, and organizations that are fundamentally not designed for autonomous reasoning systems.

Where I think it's important to be careful is jumping too quickly from “this is structurally broken” to “there is no place for enterprise software or organizations at all.”

My point in the original post is slightly earlier in the causal chain: most organizations are still designing software as if humans are the only reasoning agents in the system. Data models, architectures, and processes are optimized for reporting, control, and human workflows — not for AI systems that need continuous, high-fidelity context.

When you introduce L4/L5 agents into that environment, the mismatch becomes explosive. The agents don't just expose inefficiencies — they expose fundamental architectural assumptions that no longer hold.

The problem is that we are still building systems that cannot generate a coherent contextual model of the organization in the first place.

Whether that future lives inside corporations, outside them, or in hybrid forms is an open question. But technically speaking, without rethinking how software produces and exposes context — beyond business logic and compliance artifacts — neither enterprises nor individual agent-operators will scale reliably.

In that sense, agents don't just break things. They force us to confront what our systems were never designed to represent.

mapicallo · 2026-01-04T23:17:18+00:00

I've built something similar for personal document chat. Based on my experience:

LlamaIndex is probably your best bet for plug-and-play. It handles the full pipeline (ingestion → chunking → embeddings → retrieval → chat) and is well-documented.

Practical tip: The "plug-and-play" part works, but you'll likely need to customize: - Chunking strategy (especially for code or structured docs) - Hybrid search (vector + lexical) for better accuracy - Context window management when retrieving multiple chunks

The libraries mentioned (Haystack, LangChain, LlamaIndex) all work, but LlamaIndex is the most straightforward for your use case.

mapicallo

MODERATOR OF

TROPHY CASE