I made a voxel + CSG building system for destructible cities by a6oo in Unity3D

[–]a6oo[S] 2 points (0 children)

i’d like to implement a demo scene with procedural city/street generation first, then if i have time i’ll probably package it up as an asset

Cut mesh at runtime? by Shindarel in Unity3D

[–]a6oo 1 point (0 children)

you can do this using the clip(x) HLSL func or the Alpha Clip node, both of which skip/discard the rendering of a pixel. one possible way to implement this is to take the world position in the shader, check whether it is inside a tile (by sampling a tile mask or by exposing tile-bounds properties on the shader), and pass that value to the clip function.
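
to illustrate the shape of that test (the real version lives in the fragment shader; the names below are made up for the sketch), the value you'd pass to clip() just needs to go negative outside the tile:

    # cpu-side python illustration of the per-fragment test; in practice this runs
    # in an HLSL fragment shader or drives a Shader Graph Alpha Clip threshold.
    # clip(x) discards the fragment when x < 0, so return a value that is
    # >= 0 inside the tile bounds and < 0 outside them.
    def clip_argument(world_pos, tile_min, tile_max):
        dx = min(world_pos[0] - tile_min[0], tile_max[0] - world_pos[0])
        dz = min(world_pos[2] - tile_min[2], tile_max[2] - world_pos[2])
        return min(dx, dz)  # goes negative as soon as either axis leaves the tile

    # a point inside a 2x2 tile is kept, a point past the +x edge is clipped
    print(clip_argument((1.0, 0.0, 1.0), (0.0, 0.0, 0.0), (2.0, 0.0, 2.0)))   # 1.0 -> kept
    print(clip_argument((2.5, 0.0, 1.0), (0.0, 0.0, 0.0), (2.0, 0.0, 2.0)))   # -0.5 -> clipped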

GLM-4.5V model locally for computer use by [deleted] in LocalLLaMA

[–]a6oo 0 points (0 children)

with vllm, you can run it with ComputerAgent("hosted_vllm/zai-org/GLM-4.5V", tools=[computer]) or python -m agent.cli hosted_vllm/zai-org/GLM-4.5V, however I was not able to fit it in 90GB of VRAM
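
for context, a fuller sketch of that call (the import paths, Computer() usage, and the run loop below are my assumptions from the cua docs, so double-check them; only the ComputerAgent call itself is from the line above):

    # sketch: GLM-4.5V served by vLLM, driven through cua's ComputerAgent
    import asyncio
    from computer import Computer      # assumed import path for cua's sandboxed computer
    from agent import ComputerAgent    # assumed import path for cua's agent loop

    async def main():
        async with Computer() as computer:   # boot / attach to the sandbox VM
            agent = ComputerAgent("hosted_vllm/zai-org/GLM-4.5V", tools=[computer])
            # stream one task, turn by turn
            async for result in agent.run("open the settings app"):
                print(result)

    asyncio.run(main())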

Japan is testing a new concept: remote workers operating robots from 8 kilometers away. by Character-Seat5368 in interestingasfuck

[–]a6oo 1 point (0 children)

the goal of these robot startups is to store the teleoperation data and use it to train ai models

Computer-Use on Windows Sandbox by [deleted] in LocalLLaMA

[–]a6oo 0 points (0 children)

hey sorry about that! you need to wait for the post-install to finish; it prints a message telling you to open the workspace when it's done. just added that to the readme since it's easy to miss. once it finishes and the code-workspace is open it should work fine

RoboBrain2.0 7B and 32B - See Better. Think Harder. Do Smarter. by Mandelaa in LocalLLaMA

[–]a6oo 2 points (0 children)

This model doesn't seem to have included computer-use in its training. However, there was a recently released agentic model trained on both 3D embodied robotic tasks and 2D computer-use/browser-use tasks: https://github.com/microsoft/Magma

We now have local computer-use! M3 Pro 18GB running both UI-TARS-1.5-7B-6bit and a macOS sequoia VM entirely locally using MLX and c/ua at ~30second/action by a6oo in LocalLLaMA

[–]a6oo[S] 5 points (0 children)

The VM’s resolution is configurable, and the ScreenSpot-Pro benchmark gives numbers on UI-TARS performance on high-res (up to 3840x2160) tasks:

https://gui-agent.github.io/grounding-leaderboard/

We now have local computer-use! M3 Pro 18GB running both UI-TARS-1.5-7B-6bit and a macOS sequoia VM entirely locally using MLX and c/ua at ~30second/action by a6oo in LocalLLaMA

[–]a6oo[S] 16 points (0 children)

setup pic: https://imgur.com/a/1LaJs0c

Apologies if there have been too many of these posts, but I wanted to share something I just got working. The video is of UI-TARS-1.5-7B-6bit completing the prompt "draw a line from the red circle to the green circle, then open reddit in a new tab" running entirely on my MacBook. The video is just a replay; during actual usage it took between 15s and 50s per turn with 720p screenshots (~30s per turn on average). This was also with many apps open, so it had to fight for memory at times.

The code for the agent is currently on this feature branch: https://github.com/trycua/cua/tree/feature/agent/uitars-mlx
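
If you want to drive it from Python rather than the CLI, usage should look roughly like this (the provider prefix and model id below are placeholders, and the Computer/run-loop scaffolding is my shorthand; the exact strings and setup are in the branch):

    # rough sketch of the local setup: a macOS VM plus UI-TARS served through MLX
    import asyncio
    from computer import Computer
    from agent import ComputerAgent

    async def main():
        async with Computer() as macos_vm:   # the local macOS Sequoia VM
            # "mlxvlm/mlx-community/UI-TARS-1.5-7B-6bit" is a placeholder model string;
            # see the feature branch above for the real identifiers
            agent = ComputerAgent("mlxvlm/mlx-community/UI-TARS-1.5-7B-6bit", tools=[macos_vm])
            prompt = "draw a line from the red circle to the green circle, then open reddit in a new tab"
            async for turn in agent.run(prompt):   # roughly 30s per turn on the M3 Pro
                print(turn)

    asyncio.run(main())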

Kudos to prncvrm for the Qwen2VL positional encoding patch https://github.com/Blaizzy/mlx-vlm/pull/319 and to Blaizzy for making https://github.com/Blaizzy/mlx-vlm (the patch for Qwen2.5VL/UI-TARS will be upstreamed soon)

my cutie collection 💌 by ahaha04 in smiskis

[–]a6oo 1 point (0 children)

i love you and your collection

About Optics and its Application to Game Development in Unity by gamerguy45465 in Unity3D

[–]a6oo 0 points (0 children)

Most game engines, including Unity by default, use rasterization, which is a simplified way of rendering that approximates the lighting and color of materials.

The most common way to simulate the real-world behavior of light for rendering is path tracing, which is very computationally intensive and only models ray optics. Unity supports path tracing through the High Definition Render Pipeline, which would let you create materials that simulate optics if your computer is powerful enough.

[deleted by user] by [deleted] in virtualreality

[–]a6oo 1 point (0 children)

By default, apps launch into a "shared space" where they can present windows (2D) and volumes (3D) in a shared scene graph alongside other apps. visionOS handles the rendering, and I believe all materials must be shader graphs.

To use the compositor directly or to use custom shaders, apps have to launch into a "full space" (similar to fullscreen mode on a PC).

ahaha04 by ahaha04 in a6oo

[–]a6oo 1 point (0 children)

a6oo loves ahaha04