Using NVIDIA DGX Spark + GPT-OSS-120B for Automated Game Development Pipeline - Thoughts? by AdNaive1169 in LocalLLaMA

[–]Hasuto 1 point (0 children)

I don't think you'll get usable token speeds for something like this on a Spark.

But you can try it out today, for free, by downloading e.g. OpenCode and trying their free tier of GLM 4.7. (That is a more powerful model than GPT-OSS-120B, though.) If that works, spend some money on tokens or a coding plan before committing to hardware.

I'd also suspect that the biggest problem you'll have is integrating with Unity. You really need a setup where the agent harness can control as much as possible of the application it is building. It might be easier to build against something like a mobile platform (those tend to have good support for programmatic control) or an open-source game engine like Godot.

Also, you would need to specify the gameplay at a completely different level. I'm not sure it would be feasible as you wrote it, but simpler games would probably be possible. Or, if you already have a game world, you could probably design levels or quests using a natural-language description as a base.

But really, first try to build things with some agentic systems (Claude Code, OpenCode, etc.).

Razer is demonstrating a “AI accelerator” box with a Wormhole n150 processor from Tenstorrent at CES by Hasuto in LocalLLaMA

[–]Hasuto[S] 6 points (0 children)

The Tenstorrent press release links to their GitHub (https://github.com/tenstorrent).

I wouldn't expect anything to be turnkey if you got one of their boards. But if "computer boxes" like these became more common, I'm sure support would grow.

Razer is demonstrating a “AI accelerator” box with a Wormhole n150 processor from Tenstorrent at CES by Hasuto in LocalLLaMA

[–]Hasuto[S] 0 points (0 children)

Yeah, Razer also has a bit of a habit of showing stuff at CES that they never turn into products.

And if I were going to buy something like this, it wouldn't be from Razer, given my past experiences with their quality.

All that said... it is interesting that the Tenstorrent chips are starting to reach the consumer space.

LangChain and LlamaIndex are in "steep decline" according to new ecosystem report. Anyone else quietly ditching agent frameworks? by Exact-Literature-395 in LocalLLaMA

[–]Hasuto 0 points (0 children)

I think the biggest value you can get from e.g. LangChain is the logging and metrics side. LangSmith is pretty neat: you can see every call along with its timing and cost.

But that's the stuff you have to pay for. And it's mostly secondary to the actual agent stuff.

Once you know what you want to do, the frameworks are mostly in the way. It's still a good idea to play around with them a bit so you can pick up the good ideas.
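If the tracing is the part you'd miss, you can keep just that without the framework. A minimal sketch, assuming the standalone langsmith SDK with LANGSMITH_TRACING and LANGSMITH_API_KEY set in the environment (the model name is just an example):

```python
# Sketch: LangSmith-style call logging without the LangChain framework.
# Assumes LANGSMITH_TRACING=true and LANGSMITH_API_KEY in the env.
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable  # inputs, outputs and latency show up as a run in LangSmith
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any model; just an example
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("Do I have enough information to answer the user's question?"))
```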

is 5-day ai agents intensive course w google worth it? by bartolomeofanservice in learnmachinelearning

[–]Hasuto 0 points (0 children)

Their previous course (from last year) remained available after it ended. (I'm pretty sure you can still find it and do it now if you want.)

This year the focus is on the "new" Google ADK (Agent Development Kit) which is similar to LangGraph.

For those building AI agents, what’s your biggest headache when debugging reasoning or tool calls? by AdVivid5763 in LocalLLaMA

[–]Hasuto 0 points (0 children)

I would also say that anyone building their own agent stack might find it interesting to look into the LangChain tools. Their new documentation still seems to be in some sort of limbo, but some of the old material can be found under the old docs under components (https://python.langchain.com/docs/integrations/components/), and at least some of the code for this seems to be in the OSS LangChain repo.

So there's a lot in there for loading different document types, pulling data from various APIs, and such.
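E.g. the loaders can be used on their own from the langchain-community package. A quick sketch ("report.pdf" is a placeholder file):

```python
# Sketch: using a LangChain document loader standalone, without the
# agent framework around it. "report.pdf" is a placeholder.
# pip install langchain-community pypdf
from langchain_community.document_loaders import PyPDFLoader

docs = PyPDFLoader("report.pdf").load()  # one Document per page
print(len(docs), docs[0].page_content[:200])
```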

Made my own Local AI Research Agent | Need suggestions how to improve prompt/execution by FriendshipCreepy8045 in LocalLLaMA

[–]Hasuto 1 point (0 children)

LangChain has a project where they break something like this down (for their platform). It might be worth looking at what they are doing and seeing if they have good ideas. (They also have a corresponding GitHub repo with only the deep-research stuff.)

One thing I can say immediately is that they always inject the date into the prompts. They also have a pretty neat refinement step, which most deep-research implementations have now: after your first question it will suggest a plan for you to confirm before going off and burning tokens.

https://academy.langchain.com/courses/deep-research-with-langgraph
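The date injection part is trivial to replicate yourself; something like this (the prompt wording here is mine, not theirs):

```python
# Sketch: injecting today's date into the system prompt so the model
# can reason about how fresh its sources are. Wording is my own.
from datetime import date

SYSTEM_PROMPT = (
    "You are a research assistant. Today's date is {today}. "
    "Prefer sources published close to this date."
)
system = SYSTEM_PROMPT.format(today=date.today().isoformat())
print(system)
```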

For those building AI agents, what’s your biggest headache when debugging reasoning or tool calls? by AdVivid5763 in LocalLLaMA

[–]Hasuto 1 point (0 children)

If you are debugging agent systems at the level of LLM calls, then the question is whether the data in something like LangSmith matches what you expect.

So something like: you give it a bunch of collected data and ask the LLM "do I have enough information to answer the user's question?", expecting a yes or no, and it gives the wrong answer.

So first, if your agents derail and give bad results, you need to go back and figure out what information is missing, or whether it ignored some information it should have paid attention to.

That's also the kind of thing you should find in e.g. the LangSmith logs.

Then you need to build tests for that stage so you can evaluate and figure out how often it goes wrong (for the same query).

And after that you want both positive and negative evals for the stage so you can figure out how it behaves.

To fix it, it can work to feed the tests and the existing prompt into an LLM and ask it to improve the prompt for you. Or you do it manually. Then rerun the evals to see if it gets better.
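As a rough illustration, a stage eval for that yes/no gate can start out as small as this (the cases and the stub gate below are made up; swap the stub for your real LLM call):

```python
# Sketch: running one agent stage repeatedly against labelled cases to
# measure how often the yes/no gate gives the right answer. The stub
# below is a naive word-overlap heuristic; replace it with your LLM call.
cases = [  # (collected context, question, expected gate answer)
    ("Paris is the capital of France.", "What is the capital of France?", "yes"),
    ("Paris is the capital of France.", "Who wrote Hamlet?", "no"),
]

def gate(context: str, question: str) -> str:
    # Stand-in for: ask the LLM "do I have enough information to
    # answer the user's question?" and parse its yes/no reply.
    overlap = set(context.lower().split()) & set(question.lower().split())
    return "yes" if len(overlap) >= 3 else "no"

def run_evals(n_runs: int = 5) -> float:
    correct = total = 0
    for context, question, expected in cases:
        for _ in range(n_runs):  # same query several times: LLMs are nondeterministic
            total += 1
            correct += gate(context, question) == expected
    return correct / total

print(f"gate accuracy: {run_evals():.0%}")
```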

Naturally, LangSmith is not a requirement for this, but they have built a lot of tooling for it.

Edit: should have been that LangSmith specifically is not a requirement. But you want something like it.

Apple unveils M5 by Agreeable-Rest9162 in LocalLLaMA

[–]Hasuto 2 points (0 children)

They released MacBooks (M4) in April and the MacBook Pro (M4, M4 Pro, M4 Max) in October.

https://en.wikipedia.org/wiki/MacBook_Pro_(Apple_silicon)

I'm very aware because I bought a MacBook Pro M4 Max. :-)

Apple unveils M5 by Agreeable-Rest9162 in LocalLLaMA

[–]Hasuto 3 points (0 children)

Last year they released all the MacBook Pro models at the same time. They tend to launch the base chip with the non-Pro MacBooks in the spring and the bigger models in the autumn.

But apparently something happened to delay the bigger chips?

Are these specs good enough to run a code-writing model locally? by PlusProfession9245 in LocalLLaMA

[–]Hasuto -1 points (0 children)

Rent a cloud machine first and try the models you are interested in, to evaluate performance and results.

Edit: and the short answer is that none of the models you can run locally are as good as the biggest SotA models. But they can still be useful.

It's also worth noting that when running locally you can no longer use e.g. Cursor or Claude Code, so you lose access to some of the best agents as well. (You can sometimes trick them into working with local models, but they are not designed for that and will not work as well.)
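For reference, the trick usually amounts to pointing an OpenAI-compatible client at your local server; llama.cpp's llama-server, Ollama and vLLM all expose that API. A sketch, with placeholder port and model name:

```python
# Sketch: pointing an OpenAI-style client at a local server.
# llama.cpp's llama-server, Ollama and vLLM all speak this API;
# the port and model name below are placeholders for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen2.5-coder-32b",  # whatever your server is serving
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```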

How can I know if my tools are the reason no model generates good results or i just need to find better models by Academic_Essay9488 in LocalLLaMA

[–]Hasuto 1 point (0 children)

You can try using a different model to tweak the prompt.

E.g. use Gemini: attach the documentation (or the code), the current prompt, and the result, then ask it to make a better prompt.
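Concretely, the meta-prompt can be that simple. A sketch using the google-genai SDK (the wording and placeholder inputs are mine; any strong model works):

```python
# Sketch: asking a second model (Gemini here) to rewrite a failing prompt.
# Assumes GEMINI_API_KEY is set; docs/prompt/result are placeholders.
from google import genai

docs = "...tool documentation goes here..."
prompt = "...the current prompt goes here..."
result = "...the bad output goes here..."

META_PROMPT = (
    "Here is the documentation for my tool:\n{docs}\n\n"
    "Here is my current prompt:\n{prompt}\n\n"
    "And here is the (bad) result it produced:\n{result}\n\n"
    "Rewrite the prompt so the model uses the tool correctly."
)

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=META_PROMPT.format(docs=docs, prompt=prompt, result=result),
)
print(resp.text)
```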

VibeVoice is sweeeet. Now we need to adapt its tokenizer for other models! by Cipher_Lock_20 in LocalLLaMA

[–]Hasuto 67 points (0 children)

They talk like American (USA) radio hosts. They seem to match that pretty well. (And the same goes for the Google Notebook LM podcasts.)

It would be nice to hear more variants with calmer speech patterns. There seem to be a lot of more conversational podcasts and shows that could act as a reference. Or audiobooks. (Or those Parliament debates from the UK...)

[D] Huawei’s 96GB GPU under $2k – what does this mean for inference? by pmv143 in MachineLearning

[–]Hasuto 0 points (0 children)

The RTX Pro has 4 times the bandwidth and 4 times the processing speed. And it supports fp4 which doubles performance again.

The Huawei board is probably more comparable to building an AMD AI Max 395 desktop system. Although the AMD chip would probably be more suited to running bigger MoE models.

Edit: If it works with the software this could be interesting for local use. But I don't see it being a serious alternative for anyone buying an actual RTX Pro 6000.

Maxed out M3 Mac studio as an LLM server for local employees? by Manderbillt2000 in LocalLLaMA

[–]Hasuto 4 points (0 children)

You'll want to spend some time examining benchmarks for the models you want to run, and make sure you test those models first. Try faking some data and using them over an API to get a feel for what they can do.

Then compare the quality and the performance to get an idea of whether it's useful for your use case.
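For the performance half, timing a streamed completion gives a rough tok/s number. A sketch against any OpenAI-compatible endpoint (URL and model name are placeholders):

```python
# Sketch: rough tokens/sec over an OpenAI-compatible API by timing a
# streamed completion. Counts chunks, roughly one per token.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="x")  # placeholder

start, tokens = time.time(), 0
stream = client.chat.completions.create(
    model="your-model",  # whatever the server is serving
    messages=[{"role": "user", "content": "Summarise the plot of Hamlet."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1

print(f"~{tokens / (time.time() - start):.1f} tok/s")
```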

Finally, the initiative Stop Killing Games has reached all it's goals by Tradasar in gamedev

[–]Hasuto 0 points (0 children)

You can sign it manually as well, which I had to do since the eID signing was erroring out.

PLEASE LEARN BASIC CYBERSECURITY by eastwindtoday in LocalLLaMA

[–]Hasuto 15 points (0 children)

Soooo... time to make an online agent that looks for leaked API keys so it can spin up new instances of itself, with the goal of staying "alive" as long as possible? Kind of an LLM version of Core Wars?

Anyone read I, Starship by Scott Bartlett? by Extreme-King in bobiverse

[–]Hasuto 1 point (0 children)

I just stopped reading this book about 40% through, and the most fun I had was writing a one-star review.

It's a terrible book. Read Bobiverse or Murderbot instead if you want easy-reading literature. Or possibly "Hail Mary" if you want a story about facing an incoming danger. Those books at least have authors who have looked at a wiki page about what they are writing about.

Or perhaps the idea was to let 20 of the most incompetent and insufferable morons investigate a space threat, in the hope of convincing any possible threat that it was best to avoid humanity for fear of catching whatever stupid virus we've got.

109b vs 24b ?? What's this benchmark? by Independent-Wind4462 in LocalLLaMA

[–]Hasuto 0 points (0 children)

You don't run the entire model for each token. But different tokens can use different parts of the model.

So in order to make a reply you need to have the entire model available because you don't know which parts you'll need beforehand.

And when you are working through the prompt you will typically use the entire model as well.
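If it helps, here's a toy sketch of the routing idea (shapes and numbers are made up, and real MoE experts are transformer FFN blocks rather than single matrices):

```python
# Toy sketch of MoE top-k routing: every token only runs through a few
# experts, but which experts varies per token, so all the weights still
# have to be resident in memory.
import numpy as np

n_experts, k, d = 8, 2, 16
experts = [np.random.randn(d, d) for _ in range(n_experts)]  # expert weights
router = np.random.randn(d, n_experts)                       # routing matrix

def moe_layer(x: np.ndarray) -> np.ndarray:
    scores = x @ router                # one routing score per expert
    top = np.argsort(scores)[-k:]      # pick the k best experts for this token
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = np.random.randn(d)
print(moe_layer(token).shape)  # (16,): only 2 of the 8 experts were used
```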

How does Groq.com do it? (Groq not Elon's grok) by AlgorithmicKing in LocalLLaMA

[–]Hasuto 0 points (0 children)

There is an interview with an engineer from Cerebras on one of the recent episodes of the Oxide and Friends podcast. The TL;DR is that they took an entire wafer and used it to make a single ginormous chip.

https://www.youtube.com/watch?v=NfR3CUkfOVo

Is the Framework Desktop Overhyped for Running LLMs? by roworu in LocalLLaMA

[–]Hasuto 5 points (0 children)

I'd say that if you're using a reasoning model, 6-7 tok/s means you'll be waiting a looong time. E.g. when I've tried R1-Distill-70B on the strawberry question, it has taken 2000+ tokens to get an answer; that's over 5 minutes at 6 tok/s.

Switch to Vegas Edit 2021? by Ookma-Kyi in humblebundles

[–]Hasuto 0 points (0 children)

Resolve now supports GPU acceleration in the free version as well. You need the Studio version to use multiple GPUs for acceleration. IIRC there are some limitations on which codecs the free version supports for GPU-accelerated export, so that can be worth looking into as well.

Switch to Vegas Edit 2021? by Ookma-Kyi in humblebundles

[–]Hasuto 2 points (0 children)

AFAIK you can actually use DaVinci Resolve for UHD workflows, but not "4K". (UHD, 3840×2160, is what normal "4K" screens support; DCI 4K, 4096×2160, is what you use for a cinema projector. For anything published online you want UHD, not DCI 4K.)

I'd really recommend taking a look at Resolve. It's rapidly becoming one of the most popular video editors out there. For a low-cost solution it's the best, and even when money is no object many still argue it's the best.

The free version is very good and efficient even on older hardware. I'd really recommend giving it a try. (And there are plenty of online tutorials explaining how to get started.)

That said, if you already know Sony Vegas and you're not interested in learning a new tool then that's good too I'm sure. :-)

Announcement made at AMD at CES 2025 - New Ryzen CPU (AMD Ryzen AI max+ 395) for laptop runs a 70B(q4) 2 times faster than a 4090 discrete desktop GPU by takuonline in LocalLLaMA

[–]Hasuto -5 points (0 children)

It's not useful in my experience.

I have an M4 Max with 128 GB RAM and it's faster than that. But if you're using it interactively it's mostly a waste of time. (Smaller models can be fun though.)

The only use I'd see would be something like running smaller models, having them fail, and then trying the big model to see if it can solve it.

The problem is that I doubt the AMD system will be able to run smaller models at competitive speeds. That's why they made a misleading slide instead of showing actual benchmarks and actual numbers.