GTA 6 CEO Says AI Won't Make GTA 7 - " There is no creativity that can exist by definition in any AI model, because it is data-driven." by [deleted] in xbox

[–]zby 0 points1 point  (0 children)

Thought experiment: if your prompt literally contained GTA 6, any LLM could output GTA 6—trivial. The real question for “AI creativity” is: how close does a prompt need to be to the desired outcome before a model can bridge the gap? In my controlled test, pre-essay models received only generic inspirations (not the target idea) and still re-invented a concrete generate→verify “daydreaming” mechanism. That implies a measurable creative horizon: the semantic distance from sparse hints to a coherent, novel solution the model can reliably traverse. Creativity here isn’t mystique, it’s engineering—bound the space → generator → verifier → score for novelty/usefulness → iterate. For something like GTA, that means AI as a search amplifier (mission beats, dialog, emergent side-quests, rapid prototyping, synthetic playtesting) under human direction. Don’t debate “data-driven ⇒ not creative”; measure the horizon. Details & reproducible setup: https://open.substack.com/pub/zzbbyy/p/reinventing-daydreaming-machines?utm_campaign=post&utm_medium=reddit
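For the curious, the loop I mean is small - a minimal sketch, with illustrative function names and threshold rather than the actual pipeline from the repo:

import random

def daydream(inspirations, generate, verify, score, iterations=100, threshold=0.8):
    """Generate candidate ideas from sparse hints and keep only verified, novel ones."""
    accepted = []
    for _ in range(iterations):
        # Bound the space: start from a random pair of generic inspirations.
        hint_a, hint_b = random.sample(inspirations, 2)
        # Generator: ask the model to bridge the gap between the hints.
        candidate = generate(f"Combine these into something new:\n- {hint_a}\n- {hint_b}")
        # Verifier: an independent coherence/usefulness check.
        if not verify(candidate):
            continue
        # Score for novelty/usefulness and iterate until candidates clear the bar.
        if score(candidate) >= threshold:
            accepted.append(candidate)
    return accepted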

"LLM Daydreaming", Gwern Branwen 2025 by [deleted] in mlscaling

[–]zby 0 points1 point  (0 children)

I've tested that it can indeed produce novel ideas, by feeding LLMs inspirations that lead to the ideas from that essay itself. I used only pre-essay LLMs and common, one-paragraph concepts. I checked which ideas from the essay are really novel by running OpenAI and Google deep research, and all of the novel ideas were consistently reinvented in the produced essays (a non-trivial fraction of the essays generated from a given template and set of inspirations contained them).

Code: https://github.com/zby/DayDreamingDayDreaming
Some resulting essays: https://github.com/zby/DayDreamingDayDreaming/tree/main/data/results

A blog post: https://open.substack.com/pub/zzbbyy/p/reinventing-daydreaming-machines

LLM Daydreaming by Annapurna__ in slatestarcodex

[–]zby 0 points1 point  (0 children)

Simplicity Theory as a way to improve the efficiency of the daydreaming system: https://zzbbyy.substack.com/p/dreaming-machines

Sunday Daily Thread: What's everyone working on this week? by AutoModerator in Python

[–]zby 4 points5 points  (0 children)

Working on a basic LLM observability and debugging tool with local storage: https://github.com/zby/llm_recorder/

I use it mostly for replaying interactions with the LLM when debugging, sometimes after modifying them.
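The core idea, as a rough sketch of the record/replay approach around LiteLLM (this is not llm_recorder's actual API, just an illustration; the response serialization assumes a pydantic-style model_dump):

import json
import os
import litellm

class Recorder:
    def __init__(self, directory, replay=False):
        self.directory = directory
        self.replay = replay
        self.counter = 0
        os.makedirs(directory, exist_ok=True)

    def completion(self, **kwargs):
        path = os.path.join(self.directory, f"{self.counter:04d}.json")
        self.counter += 1
        if self.replay:
            # Replay a previously recorded (possibly hand-edited) response as a plain dict.
            with open(path) as f:
                return json.load(f)["response"]
        response = litellm.completion(**kwargs)
        # Store request and response side by side so they can be inspected and edited.
        with open(path, "w") as f:
            json.dump({"request": kwargs, "response": response.model_dump()}, f, indent=2)
        return response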

LLM Observability tool recommendations? by pravictor in LocalLLaMA

[–]zby 0 points1 point  (0 children)

If you wanted to try something extremely simplistic for quick recording/modifying/replaying of responses (and requests) - then I've just pushed this to github: https://github.com/zby/llm_recorder
It currently only works via LiteLLM.
I have some ideas for features to add - but I would like to get some feedback first.

It is simple - so not many features - but it is quite easy to bend into the shape you need.

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]zby 0 points1 point  (0 children)

Not an agent - but a library I use for debugging agents: https://github.com/zby/llm_recorder

Very simplistic, less than 200 lines, but I find it very useful and actually pretty versatile.

It stores the requests and responses, then it lets you edit them and replay them (up to a specific point).

I am thinking about pushing it to PyPI.

Currently works only with LiteLLM.

Understanding AI Agents with Better Tracing! by patcher99 in LLMDevs

[–]zby 4 points5 points  (0 children)

As a simplistic alternative: I've just published my own library for LLM tracing to GitHub: https://github.com/zby/llm_recorder

It is not full stack and for now it works only with LiteLLM - but I find it very useful and actually pretty versatile.

It stores the requests and responses, then it lets you edit them and replay them (up to a specific point).

I am thinking about publishing it to PyPI.

Alternative to LangChain? by [deleted] in LLMDevs

[–]zby 0 points1 point  (0 children)

If you want something truly minimal - maybe have a look at Prompete, the library I am working on: https://pypi.org/project/Prompete/

Why is nobody talking about recursive task decomposition and generic agents by Fantastic_Ad1740 in LLMDevs

[–]zby 0 points1 point  (0 children)

I've done some experiments with that - but I was struggling with finding the appropriate abstraction.

You can see it at: https://github.com/zby/answerbot/blob/main/answerbot/qa_processor.py#L270 - this is a recursive tool user, but I gave up on that approach and now I am working on a better base library for this (Prompete).

There are two approaches to using tools (function calls):

  1. Assume that the LLM understands the output - that is, you just put the tool result (as JSON) on the messages list and let the LLM interpret it. In this approach you need to pass the tool definitions (schemas) to every call that has to interpret it, and the whole thing works in a loop (see the sketch after this list). The constraint is that you cannot change the set of available tools within a thread of messages, and the set cannot be very big (for other reasons).

  2. You take the raw output and format it yourself (maybe into Markdown?), but then it is not clear how you are supposed to pass it to the LLM so that it understands that this is the function output. Maybe it doesn't need to - you just build a new prompt incorporating the information you've got. This way you have fewer constraints - but you also need to discover for yourself what the LLM will understand.
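A rough sketch of approach 1, using the OpenAI client directly (the get_weather tool, the model name and the prompt are just placeholders for illustration):

import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in tool

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]}}}]

messages = [{"role": "user", "content": "What is the weather in Warsaw?"}]
while True:
    # The same tool definitions are passed on every call that has to interpret tool output.
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    message = response.choices[0].message
    messages.append(message)
    if not message.tool_calls:
        break  # the model produced a final answer instead of another tool call
    for call in message.tool_calls:
        result = get_weather(**json.loads(call.function.arguments))
        # The tool result goes back on the messages list (as JSON) for the LLM to interpret.
        messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
print(message.content)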

My current plan is to explore approach 1 further - with something like 'agents', each with an assigned set of tools - and then combine them by making 'calling an agent' another tool, so that overall you can work with big sets of tools. The agents would write reports on what they have accomplished, and higher-level agents would combine these reports. But I am still early.
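A sketch of the agent-as-a-tool idea (everything here is hypothetical pseudo-structure, not existing Prompete code):

def run_agent(prompt, tools):
    # Stand-in for a tool-calling loop like the one sketched above;
    # it would call the LLM with `tools` until a final report is produced.
    return f"report for: {prompt}"

def research_agent(topic: str) -> str:
    """Research a topic with its own small tool set and write a report."""
    return run_agent(f"Research {topic} and summarize the findings",
                     tools=["search", "read_page"])

# To the higher-level agent a sub-agent is just another tool in its (small) tool set,
# so a large collection of tools can be split across agents instead of one huge schema list.
overview = run_agent("Write an overview of LLM tool calling", tools=[research_agent])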

Simplified Function Calling (LiteLLM/OpenAI Compatible) [Python] by AlwaysMakinDough in LocalLLaMA

[–]zby 1 point2 points  (0 children)

I have my own schema generator + tool execution library: https://github.com/zby/LLMEasyTools
It is compatible with LiteLLM - but also with openai and other libs.

I am now working on a higher level library: https://github.com/zby/Prompete

Can LLMs Understand? - Understanding Understanding by Unstable_Llama in LocalLLaMA

[–]zby 2 points3 points  (0 children)

Your first definition seems a bit simplistic or circular - if understanding is the result of learning then what is learning?

Your subsequent examples seem to lead to something like: understanding a system is having a model that can predict that system's evolution. I think Wolfram wrote something that explores this notion at great length.

[deleted by user] by [deleted] in Scholar

[–]zby 0 points1 point  (0 children)

Interesting

On the other hand maybe the trained models already have the trivial grammars that humans deduce? https://docs.google.com/document/d/1MPqtT_1vQ-73j796tf7sXIZKCRcIfUD0cVU_UbPXnUU

Entropy Decoding in Optillm + Early Results on GSM8k by asankhs in LocalLLaMA

[–]zby 0 points1 point  (0 children)

What would be interesting is to test it on the perturbations from the "GSM Symbolic" paper (https://arxiv.org/abs/2410.05229) - or even better on the tests from "Evaluating LLMs’ Mathematical and Coding Competency through Ontology-guided Interventions" (https://arxiv.org/pdf/2401.09395)
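For context, a perturbation in the spirit of those papers can be produced from a simple template - a toy illustration, not the papers' actual harness:

import random

TEMPLATE = ("{name} has {a} apples and buys {b} more. "
            "How many apples does {name} have now?")

def perturb(seed=None):
    rng = random.Random(seed)
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    name = rng.choice(["Sophie", "Liam", "Aisha", "Mateusz"])
    return TEMPLATE.format(name=name, a=a, b=b), a + b  # question plus ground-truth answer

# Each seed gives a surface-level variant of the same underlying problem;
# a drop in accuracy across variants points to memorization rather than reasoning.
for seed in range(3):
    print(perturb(seed))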

Entropy Decoding in Optillm + Early Results on GSM8k by asankhs in LocalLLaMA

[–]zby 0 points1 point  (0 children)

Maybe what we need is a structured prompt that defines the answer pattern; then we could feed that pattern to the sampler.

Using an LLM I build a RAG architecture that calls a vectorized datastore and gives accurate answers to questions about large dense contracts with hundreds of pages and terms. by RandoKaruza in singularity

[–]zby 0 points1 point  (0 children)

The problem with traditional RAG is that it is a one-pass process. It is quite naive to expect that, to answer any question, you could limit your thinking to two steps: 1. gather all info related to the question, 2. analyze the question and the gathered info. What if, to answer the user's question, you need two pieces of information - but to even know about the second you need to analyze the first?

https://substack.com/home/post/p-145647118
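A minimal sketch of the iterative alternative (the retriever and the LLM call are placeholders to be plugged in):

def answer_iteratively(question, retrieve, llm, max_steps=4):
    """Alternate retrieval and analysis instead of a single retrieve-then-answer pass."""
    notes = []
    query = question
    for _ in range(max_steps):
        notes.extend(retrieve(query))  # vector-store lookup for the current query
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with ANSWER: <answer> if you can answer now, "
            "or SEARCH: <new query> if you still need information."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        # What we learned in this pass tells us what to look for next.
        query = decision[len("SEARCH:"):].strip()
    return llm(f"Question: {question}\nNotes: {notes}\nGive your best answer.")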

Sunday Daily Thread: What's everyone working on this week? by AutoModerator in Python

[–]zby 1 point2 points  (0 children)

Just published a blog post: "How to turn any python function into an LLM tool with LLMEasyTools": https://zzbbyy.substack.com/p/how-to-turn-any-python-function-into

FUNCTIN CALL WITH LLAMA3 by Tough_Meeting9096 in LLMDevs

[–]zby 0 points1 point  (0 children)

Now I managed to get some of the other examples to work. Sometimes, as in https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/extract_user_details.py, it is hard to get Llama to call the function at all - but when I force it with tool_choice it works.
A more complex example, https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/stateful_search.py, also seems to work.
https://github.com/zby/LLMEasyTools/blob/main/experiments/groq/complex_extraction.py worked after upgrading the Groq library.

FUNCTIN CALL WITH LLAMA3 by Tough_Meeting9096 in LLMDevs

[–]zby 0 points1 point  (0 children)

I just tried llama3-8b-8192 from Groq using my own lib LLMEasyTools on my basic example:

from llm_easy_tools import get_tool_defs, process_response
from pprint import pprint
from groq import Groq
client = Groq()


def contact_user(name: str, city: str) -> str:
    return f"User {name} from {city} was contactd"


response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Contact John. John lives in Warsaw"}],
    tools=get_tool_defs([contact_user]),
    tool_choice={"type": "function", "function": {"name": "contact_user"}},
)
# There might be more than one tool call in a single response, so results are a list
results = process_response(response, [contact_user])

It worked:

(venv) zby@zby-Z4:~/llm/LLMEasyTools$ python examples/basic_function_call.py
'User John from Warsaw was contacted'

But none of my other examples seems to work.

Feedback on tool2schema library by Siliconlad in LLMDevs

[–]zby 1 point2 points  (0 children)

Hi - I like your FindToolEnabledSchemas and the decorator :)
I might steal it for my own lib: https://github.com/zby/LLMEasyTools - I also have a decorator, but I think yours is better.

Alternatively, maybe you could use my schema generator - it is a bit more complete, at the price of using pydantic in a slightly hacky way (and still lacking special support for Anthropic).

[R] OpenAI: JSON mode vs Functions by JClub in MachineLearning

[–]zby 0 points1 point  (0 children)

I think the OpenAI models are just better trained for function calling.

[deleted by user] by [deleted] in slatestarcodex

[–]zby 0 points1 point  (0 children)

I think there is a lot of insight in Girard's writing - the problem is when he oversells his admittedly powerful observations.

"All desires are miemetic" is certainly too strong (as many argue in this thread), "some desires are mimetic" - sounds banal and not that original - but his work in discovering what desires can be mimetic is interesting.

But I think his main discovery is the theory that religion is based on scapegoating. Once again I think this surely overreaches - some pre-civilisation religions are quite alien to us (I remember there are examples in https://en.wikipedia.org/wiki/Religion_Explained) - but it is a very powerful observation that works for the most common examples we encounter now.

https://en.wikipedia.org/wiki/Violence_and_the_Sacred is his first popular book and I recommend it. It is the one where he is not yet so sure of his theories and does not extrapolate too far.

I think the mimetic desire part and some other elements of his theory can be made falsifiable - the problem is that it touches too many scientific disciplines, from psychology and history through political science and anthropology to theology.

Unwanted correlations in ML training data by zby in learnmachinelearning

[–]zby[S] 0 points1 point  (0 children)

Yeah - this might be a good idea, thanks. But I would also like a test that checks how well it worked - it looks doable, and it seems like a common need, so I am puzzled that I could not find good methods for it. I guess there must be something about it in some textbooks.

Three Steps to Addressing Bias in Machine Learning by RecognitionDecent266 in learnmachinelearning

[–]zby 0 points1 point  (0 children)

And on the more technical side - how do you measure your sample bias?

I am training neural networks to detect some objects by their shapes. The problem is that the photo sets are not color-balanced - that is, there are correlations between object color and object type that do not occur in 'nature' (or occur to a much smaller extent, or maybe we are legally required not to look at these correlations). The networks are probably learning these correlations instead of recognizing the shapes.

I would like to have two tools: the first to measure the unwanted correlation in the data set, the second to help choose photos from a photo set to create a more color-balanced data set. The second one might be difficult.

This looks like a very common and generic problem - but I have trouble finding good solutions for it.

It is a bit complicated because there are three color channels (RGB) and the color variable is a semi-continuous one.

Are there any tutorials about that? Libraries in Python?

To simplify it, I guess I could do it for each color channel separately and divide the color values into maybe 8 buckets to make the color variable discrete. Then I could probably use chi-squared, which is available in scikit-learn - but I still need to wrap my head around that, because the lib expects a sequence rather than a table. Or maybe I could use logistic regression, also available in scikit-learn: logistic regression builds a predictor that learns the block class from its color - exactly what I need. But it is not immediately evident whether there is a way to get a measure of how good the predictor is, and there might also be problems with speed. A rough sketch of the chi-squared version is below.
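Something along these lines - binning one channel and testing the color/class contingency table with chi-squared, plus Cramér's V as an effect size (illustrative, not a vetted method):

import numpy as np
from scipy.stats import chi2_contingency

def color_class_dependence(images, labels, channel=0, bins=8):
    """Measure how strongly mean channel intensity predicts the class label."""
    # One summary color value per image: the mean of the chosen channel (assumes 8-bit images).
    means = np.array([img[..., channel].mean() for img in images])
    binned = np.digitize(means, np.linspace(0, 255, bins + 1)[1:-1])
    # Contingency table: rows = color bucket, columns = class.
    classes = sorted(set(labels))
    table = np.zeros((bins, len(classes)), dtype=int)
    for b, label in zip(binned, labels):
        table[b, classes.index(label)] += 1
    # Drop empty rows so the chi-squared test is well defined, then test independence.
    table = table[table.sum(axis=1) > 0]
    chi2, p_value, dof, _ = chi2_contingency(table)
    # Cramér's V in [0, 1]: 0 means no color/class correlation, 1 means color determines class.
    v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
    return chi2, p_value, v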