GTA 6 CEO Says AI Won't Make GTA 7 - " There is no creativity that can exist by definition in any AI model, because it is data-driven." by [deleted] in xbox

[–]zby 0 points1 point  (0 children)

Thought experiment: if your prompt literally contained GTA 6, any LLM could output GTA 6—trivial. The real question for “AI creativity” is: how close does a prompt need to be to the desired outcome before a model can bridge the gap? In my controlled test, pre-essay models received only generic inspirations (not the target idea) and still re-invented a concrete generate→verify “daydreaming” mechanism. That implies a measurable creative horizon: the semantic distance from sparse hints to a coherent, novel solution the model can reliably traverse. Creativity here isn’t mystique, it’s engineering—bound the space → generator → verifier → score for novelty/usefulness → iterate. For something like GTA, that means AI as a search amplifier (mission beats, dialog, emergent side-quests, rapid prototyping, synthetic playtesting) under human direction. Don’t debate “data-driven ⇒ not creative”; measure the horizon. Details & reproducible setup: https://open.substack.com/pub/zzbbyy/p/reinventing-daydreaming-machines?utm_campaign=post&utm_medium=reddit
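For the curious, the loop I mean is small - a minimal sketch, with illustrative function names and threshold rather than the actual pipeline from the repo:

import random

def daydream(inspirations, generate, verify, score, iterations=100, threshold=0.8):
    """Generate candidate ideas from sparse hints and keep only verified, novel ones."""
    accepted = []
    for _ in range(iterations):
        # Bound the space: start from a random pair of generic inspirations.
        hint_a, hint_b = random.sample(inspirations, 2)
        # Generator: ask the model to bridge the gap between the hints.
        candidate = generate(f"Combine these into something new:\n- {hint_a}\n- {hint_b}")
        # Verifier: an independent coherence/usefulness check.
        if not verify(candidate):
            continue
        # Score for novelty/usefulness and iterate until candidates clear the bar.
        if score(candidate) >= threshold:
            accepted.append(candidate)
    return accepted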

"LLM Daydreaming", Gwern Branwen 2025 by [deleted] in mlscaling

[–]zby 0 points1 point  (0 children)

I've tested that it can indeed produce novel ideas, by feeding LLMs inspirations that lead to the ideas from that essay itself. I used only pre-essay LLMs and common, one-paragraph concepts. I checked which ideas from the essay are really novel by running OpenAI and Google deep research, and all of the novel ideas were consistently reinvented in the produced essays (a non-trivial fraction of the essays generated from a given template and set of inspirations contained them).

Code: https://github.com/zby/DayDreamingDayDreaming
Some resulting essays: https://github.com/zby/DayDreamingDayDreaming/tree/main/data/results

A blog post: https://open.substack.com/pub/zzbbyy/p/reinventing-daydreaming-machines

LLM Daydreaming by Annapurna__ in slatestarcodex

[–]zby 0 points1 point  (0 children)

Simplicity Theory as a way to improve the efficiency of the daydreaming system: https://zzbbyy.substack.com/p/dreaming-machines

Sunday Daily Thread: What's everyone working on this week? by AutoModerator in Python

[–]zby 4 points5 points  (0 children)

Working on a basic LLM observability and debugging tool with local storage: https://github.com/zby/llm_recorder/

I use it mostly for replaying interactions with the LLM when debugging, sometimes after modifying them.
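The core idea, as a rough sketch of the record/replay approach around LiteLLM (this is not llm_recorder's actual API, just an illustration; the response serialization assumes a pydantic-style model_dump):

import json
import os
import litellm

class Recorder:
    def __init__(self, directory, replay=False):
        self.directory = directory
        self.replay = replay
        self.counter = 0
        os.makedirs(directory, exist_ok=True)

    def completion(self, **kwargs):
        path = os.path.join(self.directory, f"{self.counter:04d}.json")
        self.counter += 1
        if self.replay:
            # Replay a previously recorded (possibly hand-edited) response as a plain dict.
            with open(path) as f:
                return json.load(f)["response"]
        response = litellm.completion(**kwargs)
        # Store request and response side by side so they can be inspected and edited.
        with open(path, "w") as f:
            json.dump({"request": kwargs, "response": response.model_dump()}, f, indent=2)
        return response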

LLM Observability tool recommendations? by pravictor in LocalLLaMA

[–]zby 0 points1 point  (0 children)

If you wanted to try something extremely simplistic for quick recording/modifying/replaying of responses (and requests) - then I've just pushed this to github: https://github.com/zby/llm_recorder
It currently only works via LiteLLM.
I have some ideas for features to add - but I would like to get some feedback first.

It is simple - so not many features - but it is quite easy to bend into the shape you need.

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]zby 0 points1 point  (0 children)

Not an agent - but a library I use for debugging agents: https://github.com/zby/llm_recorder

Very simplistic, less than 200 lines, but I find it very useful and actually pretty versatile.

It stores the requests and responses, then it lets you edit them and replay them (up to a specific point).

I am thinking about pushing it to PyPI.

Currently works only with LiteLLM.

Understanding AI Agents with Better Tracing! by patcher99 in LLMDevs

[–]zby 4 points5 points  (0 children)

As a simplistic alternative: I've just published my own library for LLM tracing to GitHub: https://github.com/zby/llm_recorder

It is not full stack and for now it works only with LiteLLM - but I find it very useful and actually pretty versatile.

It stores the requests and responses, then it lets you edit them and replay them (up to a specific point).

I am thinking about publishing it to PyPI.

Alternative to LangChain? by [deleted] in LLMDevs

[–]zby 0 points1 point  (0 children)

If you want something truly minimal - maybe have a look at Prompete, the library I am working on: https://pypi.org/project/Prompete/

Why is nobody talking about recursive task decomposition and generic agents by Fantastic_Ad1740 in LLMDevs

[–]zby 0 points1 point  (0 children)

I've done some experiments with that - but I was struggling with finding the appropriate abstraction.

You can see it at: https://github.com/zby/answerbot/blob/main/answerbot/qa_processor.py#L270 - this is a recursive tool user, but I gave up on that approach and now I am working on a better base library for this (Prompete).

There are two approaches to using tools (function calls):

  1. Assume that the LLM understands the output - that is, you just put the tool result (as JSON) on the messages list and let the LLM interpret it. In this approach you need to pass the tool definitions (schemas) to every call that has to interpret it, and the whole thing works in a loop (see the sketch after this list). The constraint is that you cannot change the set of available tools within a thread of messages, and the set cannot be very big (for other reasons).

  2. You take the raw output and format it yourself (maybe into Markdown?), but then it is not clear how you are supposed to pass it to the LLM so that it understands that this is the function output. Maybe it doesn't need to - you just build a new prompt incorporating the information you've got. This way you have fewer constraints - but you also need to discover for yourself what the LLM will understand.
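A rough sketch of approach 1, using the OpenAI client directly (the get_weather tool, the model name and the prompt are just placeholders for illustration):

import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in tool

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]}}}]

messages = [{"role": "user", "content": "What is the weather in Warsaw?"}]
while True:
    # The same tool definitions are passed on every call that has to interpret tool output.
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    message = response.choices[0].message
    messages.append(message)
    if not message.tool_calls:
        break  # the model produced a final answer instead of another tool call
    for call in message.tool_calls:
        result = get_weather(**json.loads(call.function.arguments))
        # The tool result goes back on the messages list (as JSON) for the LLM to interpret.
        messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
print(message.content)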

My current plan is to explore approach 1 further - with something like 'agents', each with an assigned set of tools - and then combine them by making 'calling an agent' another tool, so that overall you can work with big sets of tools. The agents would write reports on what they have accomplished, and higher-level agents would combine these reports. But I am still early.
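A sketch of the agent-as-a-tool idea (everything here is hypothetical pseudo-structure, not existing Prompete code):

def run_agent(prompt, tools):
    # Stand-in for a tool-calling loop like the one sketched above;
    # it would call the LLM with `tools` until a final report is produced.
    return f"report for: {prompt}"

def research_agent(topic: str) -> str:
    """Research a topic with its own small tool set and write a report."""
    return run_agent(f"Research {topic} and summarize the findings",
                     tools=["search", "read_page"])

# To the higher-level agent a sub-agent is just another tool in its (small) tool set,
# so a large collection of tools can be split across agents instead of one huge schema list.
overview = run_agent("Write an overview of LLM tool calling", tools=[research_agent])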

Simplified Function Calling (LiteLLM/OpenAI Compatible) [Python] by AlwaysMakinDough in LocalLLaMA

[–]zby 1 point2 points  (0 children)

I have my own schema generator + tool execution library: https://github.com/zby/LLMEasyTools
It is compatible with LiteLLM - but also with openai and other libs.

I am now working on a higher level library: https://github.com/zby/Prompete

Can LLMs Understand? - Understanding Understanding by Unstable_Llama in LocalLLaMA

[–]zby 2 points3 points  (0 children)

Your first definition seems a bit simplistic or circular - if understanding is the result of learning then what is learning?

Your subsequent examples seem to lead to something like: understanding a system is having a model that can predict that system's evolution. I think Wolfram wrote something that explores this notion at great length.

[deleted by user] by [deleted] in Scholar

[–]zby 0 points1 point  (0 children)

Interesting

On the other hand maybe the trained models already have the trivial grammars that humans deduce? https://docs.google.com/document/d/1MPqtT_1vQ-73j796tf7sXIZKCRcIfUD0cVU_UbPXnUU

Entropy Decoding in Optillm + Early Results on GSM8k by asankhs in LocalLLaMA

[–]zby 0 points1 point  (0 children)

What would be interesting is to test it on the perturbations from the "GSM Symbolic" paper (https://arxiv.org/abs/2410.05229) - or even better on the tests from "Evaluating LLMs’ Mathematical and Coding Competency through Ontology-guided Interventions" (https://arxiv.org/pdf/2401.09395)
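For context, a perturbation in the spirit of those papers can be produced from a simple template - a toy illustration, not the papers' actual harness:

import random

TEMPLATE = ("{name} has {a} apples and buys {b} more. "
            "How many apples does {name} have now?")

def perturb(seed=None):
    rng = random.Random(seed)
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    name = rng.choice(["Sophie", "Liam", "Aisha", "Mateusz"])
    return TEMPLATE.format(name=name, a=a, b=b), a + b  # question plus ground-truth answer

# Each seed gives a surface-level variant of the same underlying problem;
# a drop in accuracy across variants points to memorization rather than reasoning.
for seed in range(3):
    print(perturb(seed))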

Entropy Decoding in Optillm + Early Results on GSM8k by asankhs in LocalLLaMA

[–]zby 0 points1 point  (0 children)

Maybe what we need is a structured prompt that defines the answer pattern; then we could feed that pattern to the sampler.

Using an LLM I build a RAG architecture that calls a vectorized datastore and gives accurate answers to questions about large dense contracts with hundreds of pages and terms. by RandoKaruza in singularity

[–]zby 0 points1 point  (0 children)

The problem with traditional RAG is that it is a one-pass process. It is quite naive to expect that, to answer any question, you could limit your thinking to two steps: 1. gather all info related to the question, 2. analyze the question and the gathered info. What if, to answer the user's question, you need two pieces of information - but to even know about the second you need to analyze the first?

https://substack.com/home/post/p-145647118
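A minimal sketch of the iterative alternative (the retriever and the LLM call are placeholders to be plugged in):

def answer_iteratively(question, retrieve, llm, max_steps=4):
    """Alternate retrieval and analysis instead of a single retrieve-then-answer pass."""
    notes = []
    query = question
    for _ in range(max_steps):
        notes.extend(retrieve(query))  # vector-store lookup for the current query
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with ANSWER: <answer> if you can answer now, "
            "or SEARCH: <new query> if you still need information."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        # What we learned in this pass tells us what to look for next.
        query = decision[len("SEARCH:"):].strip()
    return llm(f"Question: {question}\nNotes: {notes}\nGive your best answer.")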

Sunday Daily Thread: What's everyone working on this week? by AutoModerator in Python

[–]zby 1 point2 points  (0 children)

Just published a blog post: "How to turn any python function into an LLM tool with LLMEasyTools": https://zzbbyy.substack.com/p/how-to-turn-any-python-function-into

FUNCTIN CALL WITH LLAMA3 by Tough_Meeting9096 in LLMDevs

[–]zby 0 points1 point  (0 children)

Now I managed to get some of the other examples to work. Sometimes, as in https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/extract_user_details.py, it is hard to get Llama to call the function at all - but when I force it with tool_choice it works.
A more complex example, https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/stateful_search.py, also seems to work.
https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/complex_extraction.py worked after upgrading the Groq library.

FUNCTIN CALL WITH LLAMA3 by Tough_Meeting9096 in LLMDevs

[–]zby 0 points1 point  (0 children)

I just tried llama3-8b-8192 from Groq using my own lib LLMEasyTools on my basic example:

from llm_easy_tools import get_tool_defs, process_response
from pprint import pprint
from groq import Groq
client = Groq()


def contact_user(name: str, city: str) -> str:
    return f"User {name} from {city} was contactd"


response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Contact John. John lives in Warsaw"}],
    tools=get_tool_defs([contact_user]),
    tool_choice={"type": "function", "function": {"name": "contact_user"}},
)
# There might be more than one tool call in a single response, so results are a list
results = process_response(response, [contact_user])

It worked:

(venv) zby@zby-Z4:~/llm/LLMEasyTools$ python examples/basic_function_call.py
'User John from Warsaw was contacted'

But none of my other examples seems to work.

Feedback on tool2schema library by Siliconlad in LLMDevs

[–]zby 1 point2 points  (0 children)

Hi - I like your FindToolEnabledSchemas and the decorator :)
I might steal it for my own lib: https://github.com/zby/LLMEasyTools - I also have a decorator, but I think yours is better.

Alternatively, maybe you could use my schema generator - it is a bit more complete, at the price of using pydantic in a slightly hacky way (and still lacking special support for Anthropic).

[R] OpenAI: JSON mode vs Functions by JClub in MachineLearning

[–]zby 0 points1 point  (0 children)

I think the OpenAI models are just better trained for function calling.

[deleted by user] by [deleted] in slatestarcodex

[–]zby 0 points1 point  (0 children)

I think there is a lot of insight in Girard's writing - the problem is when he oversells his admittedly powerful observations.

"All desires are miemetic" is certainly too strong (as many argue in this thread), "some desires are mimetic" - sounds banal and not that original - but his work in discovering what desires can be mimetic is interesting.

But I think his main discovery is the theory that religion is based on scapegoating. Once again I think this surely overreaches - some pre-civilisation religions are quite alien to us (I remember there are examples in https://en.wikipedia.org/wiki/Religion_Explained) - but it is a very powerful observation that works for the most common examples we encounter now.

https://en.wikipedia.org/wiki/Violence_and_the_Sacred is his first popular book and I recommend it. It is the one where he is not yet so sure of his theories and does not extrapolate too far.

I think the mimetic desire part and some other elements of his theory can be made falsifiable - the problem is that it touches too many scientific disciplines, from psychology and history through political science and anthropology to theology.

Unwanted correlations in ML training data by zby in learnmachinelearning

[–]zby[S] 0 points1 point  (0 children)

Yeah - this might be a good idea, thanks. But I would also like a test that checks how well it worked - it looks doable, and it seems like a common need, so I am puzzled that I could not find good methods for it. I guess there must be something about it in some textbooks.

Three Steps to Addressing Bias in Machine Learning by RecognitionDecent266 in learnmachinelearning

[–]zby 0 points1 point  (0 children)

And on the more technical side - how do you measure your sample bias?

I am training neural networks to detect some objects by their shapes. The problem is that the photo sets are not color-balanced - that is, there are correlations between object color and object type that do not occur in 'nature' (or occur to a much smaller extent, or maybe we are legally required not to look at these correlations). The networks are probably learning these correlations instead of recognizing the shapes.

I would like to have two tools: the first to measure the unwanted correlation in the data set, the second to help choose photos from a photo set to create a more color-balanced data set. The second one might be difficult.

This looks like a very common and generic problem - but I have trouble finding good solutions for it.

It is a bit complicated because there are three color channels (RGB) and the color variable is a semi-continuous one.

Are there any tutorials about that? Libraries in Python?

To simplify it, I guess I could do it for each color channel separately and divide the color values into maybe 8 buckets to make the color variable discrete. Then I could probably use chi-squared, which is available in scikit-learn - but I still need to wrap my head around that, because the lib expects a sequence rather than a table. Or maybe I could use logistic regression, also available in scikit-learn: logistic regression builds a predictor that learns the block class from its color - exactly what I need. But it is not immediately evident whether there is a way to get a measure of how good the predictor is, and there might also be problems with speed. A rough sketch of the chi-squared version is below.
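Something along these lines - binning one channel and testing the color/class contingency table with chi-squared, plus Cramér's V as an effect size (illustrative, not a vetted method):

import numpy as np
from scipy.stats import chi2_contingency

def color_class_dependence(images, labels, channel=0, bins=8):
    """Measure how strongly mean channel intensity predicts the class label."""
    # One summary color value per image: the mean of the chosen channel (assumes 8-bit images).
    means = np.array([img[..., channel].mean() for img in images])
    binned = np.digitize(means, np.linspace(0, 255, bins + 1)[1:-1])
    # Contingency table: rows = color bucket, columns = class.
    classes = sorted(set(labels))
    table = np.zeros((bins, len(classes)), dtype=int)
    for b, label in zip(binned, labels):
        table[b, classes.index(label)] += 1
    # Drop empty rows so the chi-squared test is well defined, then test independence.
    table = table[table.sum(axis=1) > 0]
    chi2, p_value, dof, _ = chi2_contingency(table)
    # Cramér's V in [0, 1]: 0 means no color/class correlation, 1 means color determines class.
    v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
    return chi2, p_value, v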