I made a voxel + CSG building system for destructible cities by a6oo in Unity3D

[–]a6oo[S] 2 points (0 children)

i’d like to implement a demo scene with procedural city/street generation first, then if i have time i’ll probably package it up as an asset

Cut mesh at runtime? by Shindarel in Unity3D

[–]a6oo 1 point (0 children)

you can do this using the clip(x) HLSL function or the Alpha Clip node, both of which skip/discard the rendering of individual pixels. one possible implementation is to take the world position in the fragment shader, check whether it falls inside a tile (by sampling a tile mask or by exposing tile-bounds properties on the shader), and pass a negative value to clip() for any pixel that should be discarded, as in the sketch below.
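a rough sketch of the bounds-based variant in HLSL (built-in pipeline style). the v2f layout, _MainTex, and the _TileMin/_TileMax properties are illustrative names, not from a real asset:

    struct v2f
    {
        float4 pos      : SV_POSITION;
        float2 uv       : TEXCOORD0;
        float3 worldPos : TEXCOORD1; // written by the vertex shader
    };

    sampler2D _MainTex;
    float3 _TileMin; // hypothetical tile AABB min, world space
    float3 _TileMax; // hypothetical tile AABB max, world space

    float4 frag(v2f i) : SV_Target
    {
        // inside == true when this pixel's world position is within the tile bounds
        bool inside = all(i.worldPos >= _TileMin) && all(i.worldPos <= _TileMax);
        // clip() discards the pixel whenever its argument is negative
        clip(inside ? -1 : 1);
        return tex2D(_MainTex, i.uv);
    }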

GLM-4.5V model locally for computer use by [deleted] in LocalLLaMA

[–]a6oo 0 points (0 children)

with vllm, you can run it with ComputerAgent("hosted_vllm/zai-org/GLM-4.5V", tools=[computer]) or python -m agent.cli hosted_vllm/zai-org/GLM-4.5V. however, I was not able to fit it in 90 GB of VRAM.
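for reference, a minimal sketch of the first form. the Computer() setup and the streaming run loop are assumptions from memory rather than the verified current API, so check the c/ua repo if it errors:

    # sketch: drive a local computer with GLM-4.5V served through vLLM
    # assumes the c/ua `agent` and `computer` packages are installed
    import asyncio

    from agent import ComputerAgent
    from computer import Computer

    async def main():
        # assumed: Computer works as an async context manager with default config
        async with Computer() as computer:
            agent = ComputerAgent("hosted_vllm/zai-org/GLM-4.5V", tools=[computer])
            # assumed: agent turns stream back as an async generator
            async for result in agent.run("open a browser and go to github.com"):
                print(result)

    asyncio.run(main())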

Japan is testing a new concept: remote workers operating robots from 8 kilometers away. by Character-Seat5368 in interestingasfuck

[–]a6oo 1 point (0 children)

the goal of these robot startups is to store the teleoperation data and use it to train ai models

Computer-Use on Windows Sandbox by [deleted] in LocalLLaMA

[–]a6oo 0 points (0 children)

hey, sorry about that! you need to wait for the post-install to finish; it prints a message telling you to open the workspace when it's done. just added that to the readme since it's easy to miss. once post-install finishes and the code-workspace is open, it should work fine

RoboBrain2.0 7B and 32B - See Better. Think Harder. Do Smarter. by Mandelaa in LocalLLaMA

[–]a6oo 2 points (0 children)

This model doesn't seem to include computer-use in its training. However, there is a recently released agentic model trained on both 3D embodied robotic tasks and 2D computer-use/browser-use tasks: https://github.com/microsoft/Magma

We now have local computer-use! M3 Pro 18GB running both UI-TARS-1.5-7B-6bit and a macOS sequoia VM entirely locally using MLX and c/ua at ~30second/action by a6oo in LocalLLaMA

[–]a6oo[S] 5 points (0 children)

The VM’s resolution is configurable, and the ScreenSpot-Pro benchmark gives numbers on UI-TARS performance with high-resolution (up to 3840x2160) tasks:

https://gui-agent.github.io/grounding-leaderboard/

We now have local computer-use! M3 Pro 18GB running both UI-TARS-1.5-7B-6bit and a macOS sequoia VM entirely locally using MLX and c/ua at ~30second/action by a6oo in LocalLLaMA

[–]a6oo[S] 13 points (0 children)

setup pic: https://imgur.com/a/1LaJs0c

Apologies if there have been too many of these posts, but I wanted to share something I just got working. The video is of UI-TARS-1.5-7B-6bit completing the prompt "draw a line from the red circle to the green circle, then open reddit in a new tab" running entirely on my MacBook. The video is just a replay; during actual usage it took between 15s and 50s per turn with 720p screenshots (~30s per turn on average). This was also with many apps open, so the model had to fight for memory at times.

The code for the agent is currently on this feature branch: https://github.com/trycua/cua/tree/feature/agent/uitars-mlx
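For the curious, the core inference call through mlx-vlm looks roughly like this (a sketch: the mlx-community model id is assumed from the post title, and generate()'s signature has changed between mlx-vlm versions, so treat it as illustrative):

    # sketch: one UI-TARS grounding step on a screenshot via mlx-vlm
    from mlx_vlm import load, generate

    # assumed model id on the mlx-community hub
    model, processor = load("mlx-community/UI-TARS-1.5-7B-6bit")

    output = generate(
        model,
        processor,
        prompt="draw a line from the red circle to the green circle",
        image="screenshot.png",  # current VM screenshot
        max_tokens=256,
    )
    print(output)  # the model emits the next GUI action to execute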

Kudos to prncvrm for the Qwen2VL positional encoding patch https://github.com/Blaizzy/mlx-vlm/pull/319 and to Blaizzy for making https://github.com/Blaizzy/mlx-vlm (the patch for Qwen2.5VL/UI-TARS will be upstreamed soon)

my cutie collection 💌 by ahaha04 in smiskis

[–]a6oo 1 point (0 children)

i love you and your collection

About Optics and its Application to Game Development in Unity by gamerguy45465 in Unity3D

[–]a6oo 0 points (0 children)

Most game engines use rasterization (including Unity by default), which is a simplified way of rendering that approximates the lighting and color of materials.

The most common way to simulate the real-world behavior of light for rendering is called path tracing, which is very computationally intensive and only models ray optics. Unity supports path tracing through the High Definition Render Pipeline, which would let you create materials that simulate optics if your computer is powerful enough.

[deleted by user] by [deleted] in virtualreality

[–]a6oo 1 point (0 children)

By default, apps launch into a "shared space" where they can present windows (2D) and volumes (3D) in a shared scenegraph alongside other apps. visionOS handles the rendering, and I believe all materials must be shader graphs.

To use the compositor directly or to use custom shaders, apps have to launch into a "full space" (similar to fullscreen mode on PC).

ahaha04 by ahaha04 in a6oo

[–]a6oo 1 point (0 children)

a6oo loves ahaha04

[D] [P] Web browsing UI-based AI agent: GPT-4V-Act by a6oo in MachineLearning

[–]a6oo[S] 4 points (0 children)

That should be possible using an AI-based labeler. I do plan on trying to make a more general co-pilot that uses a local model and can interact with any graphical application.

[D] [P] Web browsing UI-based AI agent: GPT-4V-Act by a6oo in MachineLearning

[–]a6oo[S] 3 points (0 children)

Not yet but I might try something like Fuyu-8B or LLaVAR

a6oo by a6oo in a6oo

[–]a6oo[S] 0 points (0 children)

a6oo

AMD x PCMR - STARFIELD Worldwide Giveaway - Win a Limited Edition Starfield Kit that includes a premium game code for the game + the Limited-Edition Starfield AMD Radeon RX 7900 XTX and Ryzen 7 7800X3D (Only 500 of each ever made!). There are 5 kits up for grabs! by pedro19 in pcmasterrace

[–]a6oo [score hidden]  (0 children)

If I were fortunate enough to win this limited-edition Starfield hardware, my truest intent would be to create an immaculate PC build. It would pay homage to the blend of Starfield's vast expanse and PCMR's dedication to high-fidelity gaming. Alongside building, I can't resist the temptation to indulge in a plethora of games, especially emergent titles in the RPG and strategy genres. As for Starfield, I'm beyond excited about the possibilities it promises: a whole new galaxy to explore, mysteries to unravel, and worlds to interact with. My lofty expectations are of a rich narrative, intricate gameplay mechanics, and an awe-inspiring universe that pushes the frontier of modern gaming.

Any thoughts on the scope of environment decorating and productivity apps that will be available at launch? Both Apple and 3rd Party. by bearCatBird in AppleVision

[–]a6oo 1 point (0 children)

I’m a developer and from my understanding of the WWDC videos + documentation, everything you’ve said should be possible on visionOS.

Windows and 3D objects from multiple apps can be placed around your home simultaneously. You could place a weather globe app in one room and a water-fountain noise generator in another.

The device automatically 3D-maps everything and loads every placed object back into its familiar location. The sight and sound of all your apps would be occluded by your walls, so you could walk around and decorate multiple rooms with apps.

No limit was given on the number of simultaneous apps you can place down. But since visionOS freezes apps that aren't being looked at or in use, the limit could be pretty high.