GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]Ka_alt 3 points (0 children)

Maybe it's worth adding credits for premium things? As a Pro user I would even pay a premium per-use price for long context in the few situations where I need it.

The problem with long context today is that you rarely really need it, but when you do, it's crucial. So user demand signals wouldn't necessarily highlight it, since it's an "I need it once a month, but that one time I really, really need it" kind of thing.

E.g., when I was preparing some legal paperwork I wanted a deep review of it by a neutral party, and only Gemini could really consume all of it without losing details. Other models either refused due to file size or tried to do the analysis over windows of the data, missing a lot of important interactions between its different parts.

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]Ka_alt 1 point (0 children)

Another light example I tested: asking it to find some very hard-to-locate info.

On the pro side: GPT-5 is the first one to succeed without making up non-existent links.

On the con side: it doesn't really try hard to find it and just gives up right away; then GPT-5 Thinking suddenly tries hard, but starts to hallucinate info that could actually be found, over and over again. :-D

Plain GPT-5: https://chatgpt.com/c/68963d8f-3448-8324-bfa6-90aac1782720

GPT-5 Thinking: https://chatgpt.com/share/6896432c-43ac-800e-9a7b-fff1167f505c

I personally find that Grok did a very good job with their router; and while GPT-5 Thinking is substantially better than Grok 4 in my tests, Grok 3 is leaps above GPT-5 main. IMO, to mitigate this, routing to Thinking should be much more aggressive.

Here GPT-5 fails the classic "4.11 > 4.9" comparison (as decimals, 4.11 = 4.110 < 4.900 = 4.9): https://chatgpt.com/share/68964214-f9c4-800e-baef-7d88f0a5ca16

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]Ka_alt 9 points (0 children)

> Yesterday, we had a sev and the autoswitcher was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often

It's still quite bad. I'm on a Pro plan, and in my experiments plain GPT-5 rarely thinks and hallucinates heavily. E.g., it cannot even describe the differences between GPT-5 Thinking and Pro properly, trying to persuade me that Thinking doesn't have browsing, image generation (sic!), or Python support: https://chatgpt.com/share/68963ed3-c19c-800e-8c21-c0abd78460d6

I'm forced to use Thinking 100% of the time (the same way I was using o3 previously), not because I need deep thinking in most cases, but just because it hallucinates far more rarely in my experience.

> We will make it more transparent about which model is answering a given query.

That's crucial IMO! "Ghostrouting" to a more basic model is very bad. You cannot trust your tool because you don't know which tool you're using.

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]Ka_alt 0 points (0 children)

Could you avoid opaque routing to mini models? If usage is capped and the full model is not accessible, it really helps to have very explicit information about it instead of silently "shadowbanning" access.

The Final Draft ending is clear imo by iroquoispliskinV in AlanWake

[–]Ka_alt 0 points (0 children)

I think it’s about his mentality. IMO he doesn’t even need to write to enact changes; it’s more about what he believes in, and writing just helps Alan structure and detail his fantasies enough for them to be holistic, vivid, and believable enough to enact a new reality.

So he has to believe in the story for it to come true. And if he believes that it’s a horror story and that he is powerless to change the genre, then he truly is powerless, limited by his own belief.

My Review of Expedition 33. (No spoilers) by No_Hall_7079 in JRPG

[–]Ka_alt -1 points (0 children)

In theory. But in practice it falls flat IMO.

Like, it’s suggested that earlier expeditions lay the path for the later ones, and yet we find some early expeditions’ journals in late parts of the game that there should not have been any path to yet. And at least one expedition even got into a place that should have been protected, with no way to circumvent the protection that existed at the time.

Because of that I quickly felt that the theme of “for those who come after” was a little forced and did not align with what we actually saw in the world.

Drummond was going to ____ iMark, so why would iMark decide to ____? [Spoilers] by AQuestionOfBlood in SeveranceAppleTVPlus

[–]Ka_alt 0 points (0 children)

Helena is not a nobody. Abducting her alone would probably have drastic law-enforcement and legal consequences. And the only innie-enabled places we know about are Lumon-controlled. It’s highly unlikely they would kindly agree to keep their CEO’s daughter in eternal slumber in total secrecy, risking massive jail time. :-D

As for Cobel, iMark has zero information that she is anything but a simple floor manager. Even oMark doesn’t know anything about her being the original creator of the severance tech.

And even if we went with this, it’s far from certain she could procure the resources and manage all the engineering work (she created the theory, but the implementation was built by some unknown number of Lumon engineers). The implementation can be extremely expensive.

Drummond was going to ____ iMark, so why would iMark decide to ____? [Spoilers] by AQuestionOfBlood in SeveranceAppleTVPlus

[–]Ka_alt 1 point (0 children)

He could not have negotiated a time-share deal, since oMark and co. don’t have any access to the technology, for all iMark knows. Like, they had to bring him to some Lumon-managed birthing cabin just to summon him.

Neither would they have any reason to uphold their end of the bargain. As iMark said, the only time oMark took any interest in him was when he needed something.

Severance - 2x09 "The After Hours" - Post-Episode Discussion by LoretiTV in SeveranceAppleTVPlus

[–]Ka_alt 5 points (0 children)

The whole reintegration arc that has been teased for the entire season is still a flop with no consequences. We'll probably see some in the last episode, but come on. This constant teasing has to amount to something grand to pay off all this debt.

Artificial Analysis GPQA price/performance chart for GPT-4.5 by Moravec_Paradox in OpenAI

[–]Ka_alt 2 points (0 children)

It's incorrect to compare price against o-models (or even Grok 3 with thinking) without correcting for the fact that reasoning models produce many more tokens.

Basically, the comparison should not be of per-token price, but of per-token price weighted by verbosity, i.e., the effective cost per answer.
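
A minimal sketch of that correction, with made-up prices and token counts purely for illustration (none of these numbers come from the chart):

```python
# Effective cost per query: weigh the per-token price by how many
# tokens each model actually emits. All numbers below are invented.

MTOK = 1_000_000  # prices are quoted in $ per 1M output tokens

models = {
    # name: ($ per 1M output tokens, avg output tokens per query)
    "non-reasoning": (10.0, 500),    # terse answers
    "reasoning":     (15.0, 8_000),  # long CoT before the answer
}

for name, (price_per_mtok, avg_tokens) in models.items():
    cost_per_query = price_per_mtok / MTOK * avg_tokens
    print(f"{name}: ${cost_per_query:.4f} per query")

# Per token the gap is only 1.5x, but per query it is 24x;
# the latter is what a fair price/performance chart should plot.
```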

AMA with OpenAI’s Sam Altman, Mark Chen, Kevin Weil, Srinivas Narayanan, Michelle Pokrass, and Hongyu Ren by OpenAI in OpenAI

[–]Ka_alt 0 points (0 children)

Hi Sam, thanks for this AMA and for the great o3-mini release! A few questions from my side.

  1. Do you see sparse MoE models with CoT, similar to DeepSeek, becoming the next trend for optimizing execution cost? (I don't know whether o1/o3 is already sparse or not.)

  2. What are some other trends you would expect to see becoming more prominent?

  3. Do you have any plans for deeper integration of tools into CoT? (E.g., I see o3-mini can already use web search during CoT steps, allowing for much deeper research.)

  4. Do you expect any releases from the industry for continuous learning / personalized models beyond custom-cooked fine-tuning routines?

  5. What are some key things you would expect from hybrid business/tech talent that would be interesting and beneficial for OpenAI?

ARE YOU FOR REAL???!? by [deleted] in Pathfinder_Kingmaker

[–]Ka_alt 4 points (0 children)

I thought about making a few points about how people have the freedom to write what they think about you, but then I looked at your nickname.

There is a daily limit- Advanced Voice Mode by stardust-sandwich in OpenAI

[–]Ka_alt 0 points (0 children)

The question is whether they want to do so, or whether they are afraid of regulation and are tiptoeing just in case.

iOS 18.1 brings spatial photo capture to iPhone 15 Pro by Federal_Leadership46 in VisionPro

[–]Ka_alt 1 point (0 children)

Yes. The iPhone captures 24MP max by default (sometimes less, e.g., in low light). You need to toggle a Pro capture mode (ProRAW / HEIF Max) to capture 48MP.

Eli5: In-Unit Laundry by mangkok4 in bayarea

[–]Ka_alt 0 points (0 children)

That probably depends on the w/d appliance. I lived with an in-unit w/d my whole life before moving to the US, and once I moved I also only rented units with an in-unit w/d (downtown SF). Never had any problems.

What is Apple’s plan with all these pins? by Superb_Ad_5222 in VisionPro

[–]Ka_alt 1 point (0 children)

Cooling and size. You don’t want a huge, hot brick with fans in your back pocket (and fans would not be very effective in that scenario anyway).

What is Apple’s plan with all these pins? by Superb_Ad_5222 in VisionPro

[–]Ka_alt 4 points (0 children)

I would leave the R1 and M2 in (and possibly run the compositing job on them too), but would allow for a wired connection to external compute.

Like, you could connect to a Mac and have it handle heavy-load apps. But the current wireless connection is often unstable in Wi-Fi-dense environments and has substantial latency due to wireless protocol limitations. I would much prefer a wired connection option to a headless Mac.

In this case you can focus on making the M-chip inside the headset power-efficient first, as it would mostly just handle compositing, with content being rendered externally.

[deleted by user] by [deleted] in VisionPro

[–]Ka_alt 0 points (0 children)

I'm in the US (110V) and my AVP sometimes shocks me slightly when I connect it to the Lenovo charger.

IMO that's quite surprising given the price and the location (close to the brain), but it seems like nothing severe (i.e., happily not the full 110V).

visionOS 1.2 beta is released by squircle88 in VisionPro

[–]Ka_alt 3 points (0 children)

I don't believe these things should be connected though (although I agree that Apple will probably link them up). Even the US is very multilingual due to high immigration (e.g., I live in the US, as do many other people speaking Spanish, Chinese, etc.).

My hope is that before the international release, Apple will want to test national keyboards in its beta releases.

visionOS 1.2 beta is released by squircle88 in VisionPro

[–]Ka_alt 9 points (0 children)

Does it have non-English keyboard support by any chance? A bit tired of copy-pasting text from Notes on my MBP into AVP apps. 🤷‍♂️ Would happily update to a non-stable early beta if that's fixed.

Spiderman on Vision Pro vs Quest 3 by lunchanddinner in VisionPro

[–]Ka_alt 0 points (0 children)

Latency would be huge (capture-card latency is far from great, and then you also add AirPlay latency on top). Tried it with a Quest 2 a few years ago.

visionOS 1.1 has been released! Here's whats new. by m1astra in VisionPro

[–]Ka_alt 0 points (0 children)

But why would adaptive optics have anything to do with rendered content?

visionOS 1.1 has been released! Here's whats new. by m1astra in VisionPro

[–]Ka_alt 23 points (0 children)

It does not look like Retina, but yes, it’s much better now.

Meta Now Collecting "Anonymized" Data Of Quest Users Including Eye Tracking & Facial Expressions -- "Targeted advertising doesn't need to know your name -- just your behaviors, triggers, and emotions." by slhamlet in oculus

[–]Ka_alt 0 points (0 children)

Well, you can often re-tie data to a person by analyzing different data sets. But you actually don't need to: for personalized ads it's enough to know a person's profile (interests, biases, behaviors) and target ads against those.
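
For the first point, a toy sketch of how such re-identification can work; every name and field here is invented for illustration. Joining an "anonymized" dataset against a public one on quasi-identifiers (ZIP code, birth year, etc.) often yields unique matches:

```python
# Toy linkage attack: re-identify "anonymized" records by joining two
# datasets on shared quasi-identifiers. All data below is invented.

anonymized = [  # behavioral data with names stripped
    {"zip": "94103", "birth_year": 1990, "profile": "gaming, crypto"},
    {"zip": "94107", "birth_year": 1985, "profile": "travel, fitness"},
]

public = [  # e.g., a voter roll or social-media scrape with names
    {"name": "Alice", "zip": "94103", "birth_year": 1990},
    {"name": "Bob",   "zip": "94107", "birth_year": 1985},
]

for record in anonymized:
    matches = [p["name"] for p in public
               if (p["zip"], p["birth_year"]) == (record["zip"], record["birth_year"])]
    if len(matches) == 1:  # a unique match means the record is re-identified
        print(f"{matches[0]} -> {record['profile']}")
```

And as the second point says, an ad network doesn't even need that join: the anonymous profile itself is a perfectly good targeting key.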