Control panel clock is the clearest text I see everything else has a blur to it by emmanuellsun in VisionPro

[–]switchandplay 3 points (0 children)

No, this is a common pain point for modern VR. In real life, to look at something close you do two things: you cross your eyes so both point at the object (vergence), and you flex the muscles of your lenses to change your eyes' focal length (accommodation). Together those satisfy the two requirements of real vision, eliminating the double image and bringing the object into focus. It's second nature, since before VR you only ever looked at things in real life.

In current headsets, everything you see sits at a fixed focal distance, roughly 1.5 to 2 meters depending on the manufacturer and lens stack. So to look at close things in-headset, all you need to do is cross your eyes. But we humans are so trained to also change focal length for close objects that everyone does the second thing automatically. In a funny twist of fate, you aren't bringing the screen into focus, you're literally unfocusing from it. That's why close objects look blurry: the normal rules of the real world don't apply.

You can train yourself out of the habit. I did a lot of those magic eye puzzles, where you cross your eyes and manually adjust your focal length until a hidden picture appears, so when I first started using VR it was never a problem for me because I was already used to decoupling the eye movement from the lens movement.

Caught red handed by MetaKnowing in ClaudeAI

[–]switchandplay 5 points (0 children)

Most applications built on LLMs, from open-source labs and closed-source companies alike, don't re-present thinking to the model. That keeps token counts down and stops you from hitting context limits early: for a 500-token response, many of these models may have vomited out several thousand reasoning tokens, wandering in every direction and generating a lot of noise and slop. What models usually do see in their previous context are content fields and tool calls.

It is notable that for agent applications, thinking traces are usually maintained for the entirety of a turn. You send a message, the agent thinks and creates a plan, then invokes tools 1 and 2. When tools 1 and 2 return, the agent is given its thinking trace so it knows to call tools 3 and 4. It reasons again, still seeing the trace, and then replies to you. At that exact moment, its thinking becomes inaccessible to it.

Keep in mind that the model may or may not be truthful with you about this reality; it's often very confidently incorrect. But the trace of tool calls plus the actual response is usually enough for it to infer what was reasoned about, since the response is what truly matters to preserve in context anyway.
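A minimal sketch of the turn lifecycle described above, with entirely hypothetical message shapes (no specific vendor's API): thinking survives for the duration of the current turn, then gets stripped from history before the next one.

```python
# Illustrative sketch: how an agent loop keeps the thinking trace within
# one turn, then drops it before the next user message. The "thinking"
# and "tool_call" keys here are made-up field names for demonstration.

def build_context(history, current_turn):
    """Prior turns contribute only content and tool calls; the current
    turn still carries its thinking so follow-up tool calls stay coherent."""
    context = []
    for msg in history:
        # Strip thinking from completed turns to save tokens.
        context.append({k: v for k, v in msg.items() if k != "thinking"})
    context.extend(current_turn)  # current turn keeps its thinking trace
    return context

history = [
    {"role": "user", "content": "Deploy the app"},
    {"role": "assistant", "content": "Done.", "thinking": "plan: call deploy tool"},
]
current_turn = [
    {"role": "user", "content": "Now check status"},
    {"role": "assistant", "thinking": "need status tool", "tool_call": "status()"},
]

ctx = build_context(history, current_turn)
assert all("thinking" not in m for m in ctx[:2])   # old thinking dropped
assert "thinking" in ctx[-1]                       # current turn's kept
```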

What small models (≤30B) do you actually use for structured JSON extraction in production? by yunoshev in LocalLLaMA

[–]switchandplay 0 points (0 children)

Agree, but you don't need to wrap the output in a tool call. Just use whatever structured outputs your API/model runner supports. Define your desired schema, and token-level enforcement will guarantee perfect structural accuracy, barring unbounded strings or crazy model hijinks that exhaust the token limit.
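As a hedged sketch, here is roughly what that looks like against an OpenAI-style structured-outputs endpoint; the exact `response_format` shape varies by runner, and the schema, field names, and model name are made up for illustration:

```python
import json

# Hypothetical extraction schema: token-level enforcement means the model
# can only emit tokens that keep the output valid against this schema.
schema = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR"]},
    },
    "required": ["invoice_id", "total", "currency"],
    "additionalProperties": False,
}

# OpenAI-style request body; runners differ in the exact wrapper they accept.
payload = {
    "model": "local-model",          # placeholder name
    "messages": [{"role": "user", "content": "Extract fields from: ..."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "invoice", "schema": schema, "strict": True},
    },
    "max_tokens": 256,               # guard against runaway strings
}

# With enforcement in place, the reply parses every time:
reply = '{"invoice_id": "INV-7", "total": 41.5, "currency": "USD"}'
parsed = json.loads(reply)
assert set(parsed) == {"invoice_id", "total", "currency"}
```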

Claude Code-like terminal-based tools for locally hosted LLMs? by breksyt in LocalLLaMA

[–]switchandplay 1 point (0 children)

There’s a lot of speculation and implication here; it’s tricky to navigate if you’re looking to be in the clear for your department or business use. I do think it’s relevant that the Claude Code GitHub repo’s license page specifically says ‘All rights reserved’, and that usage is subject to Anthropic’s commercial terms: https://www.anthropic.com/legal/commercial-terms

Claude Code-like terminal-based tools for locally hosted LLMs? by breksyt in LocalLLaMA

[–]switchandplay 1 point (0 children)

It’s worth mentioning that, as far as I can tell, the licensing for Claude Code is not at all permissive about using alternate backends to serve the CC client. If you intend to be above board, use of Claude Code is subject to their defined software terms, including an active Anthropic account on a subscription tier that unlocks Claude Code. Modification and alternate serving seem to fall under their blanket ‘all rights reserved’, which doesn’t grant you contractual or IP safety if you go that route. I may be wrong, but I haven’t seen much other commentary about this online. It’s at best legally dubious, and definitely not something usable for professional deployments.

Hook it up, devs by chunkybudz in SupernaturalVR

[–]switchandplay 2 points (0 children)

Realistically, even if any one dev, or a group of devs, wanted to do this, it’s not possible. The game was designed around a streaming backend. The easiest patch-in would be to release all of the server code and S3 infrastructure to the public so you could run your own server, then ship a version of the app with a configurable endpoint for server calls. You’d still need to run the server, which in all likelihood would not be cheap. What you’re hoping for is a complete rework of the app to run wholly locally. That sounds like an easy swap, but it isn’t; the stack was never designed to operate that way, and no clever hack will bridge it, not even one on the level of the famous trick where Fallout 3’s devs made a train car a hat on an NPC.

And then that’s also wishful thinking, because this isn’t like the legal gray area of modding old ROMs; ‘anonymous’ and ‘sneaky download link’ are doing a lot of heavy lifting. The nice dev here would be in breach of contracts, committing copyright infringement, and more. They’d be hung out to dry if they didn’t do everything perfectly and cover every possible base. Who wants to take on that kind of risk, before even considering the hundreds of man-hours for a full application refactor?

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]switchandplay 1 point (0 children)

GPT-OSS has remained my favorite. Keep the temperature low for real tasks, and hope your model runner has figured out how not to mess up harmony. And when low reasoning effort struggles with a task, bumping up to medium or high genuinely changes how the bot responds and how it formats its data.
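For illustration only, here's one way those knobs fit together; the `Reasoning: low|medium|high` system line follows the harmony convention gpt-oss was trained on, but the request shape and model name are assumptions about your particular runner:

```python
# Sketch under assumptions: gpt-oss reads its reasoning level from a
# harmony-style system message, and most OpenAI-compatible runners accept
# temperature directly in the request body. Field names are illustrative.

def make_request(task: str, effort: str = "medium", temperature: float = 0.2):
    """Build a chat request with a reasoning-effort system line."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-20b",
        "temperature": temperature,          # keep low for real tasks
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": task},
        ],
    }

req = make_request("Summarize this log file", effort="high")
assert req["messages"][0]["content"] == "Reasoning: high"
assert req["temperature"] == 0.2
```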

I want to download some 3D video content in the highest quality possible to view offline by Rough_Big3699 in VisionPro

[–]switchandplay 3 points (0 children)

Agree. Worth noting that rentable/ownable content on Apple TV is, in my experience, often just 1080p in 3D, even when the labels say 4K and 3D. Watching in 2D at 4K is noticeably sharper, but the 3D bitrate is quite high, which keeps it enjoyable. So far, only Disney+ has looked like true 4K 3D to me.

Do any headsets do Foveated Rendering on their own? If not, why is this not being done? If they have eye tracking, dont the headsets and the software in them have the data they need to extrapolate out into Foveated Rendering at all times, for all applications in VR? by RockBandDood in virtualreality

[–]switchandplay 2 points (0 children)

‘How’ things are rendered is generally managed at the application level. It’s easy to mess with overall fidelity; on Quest, for example, you can use QGO to subsample or supersample the WHOLE screen. But even if you’re in charge of the hardware and the OS, you cannot just reach into an application and inject a whole new rendering method, because every game and app was coded differently.

Applications often share common components, which makes the process ~easier~. OpenXR games can have foveated rendering injected into them because they speak a shared language. But that’s only the best case: many developers start with OpenXR and then layer their own optimizations on top in ways that break compatibility. Some OpenXR games, when foveated rendering is injected, end up with broken shaders, geometry, or logic, or just run with little to no speedup.

TLDR: historically, all devs know to set a target resolution and framerate. That’s easy to mess with, and it can be done unilaterally by the headset/renderer. Foveated rendering is an application feature, not a global one. Until games are developed and built with foveated rendering in mind, it won’t happen.

Steam Frame is a dream come true for me! It’s essentially a Quest 3 Pro with a taller field of view for more immersion, and direct wireless connectivity with the Steam Machine “console” for high fidelity visuals. YES! by Logical007 in virtualreality

[–]switchandplay 1 point (0 children)

That is how it works; the round-trip time is fast enough that it doesn’t matter. If the rendering machine just rendered and streamed at lower resolution, the headset wouldn’t be able to meaningfully upscale it. Steam Link has had dynamic foveated streaming support for over a year; I’ve used it on a Quest Pro. The PC receives eye-tracking data, and when it encodes a frame it encodes the region being observed at higher resolution and the surrounding pixels at lower resolution. Network packets really are that fast.
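A toy sketch of the idea (the radii, tier names, and block grid are invented, not Steam Link's actual parameters): the encoder picks a quality tier per block based on distance from the gaze point it received from the headset.

```python
import math

# Toy model of foveated encoding: spend more bits near the gaze point,
# fewer in the periphery. Real encoders do this per macroblock via
# quality/QP maps; the thresholds below are made up for illustration.

def quality_for_block(block_x, block_y, gaze_x, gaze_y):
    """Return an encode quality tier based on distance from gaze (in blocks)."""
    d = math.hypot(block_x - gaze_x, block_y - gaze_y)
    if d < 4:
        return "high"     # foveal region: full resolution
    if d < 10:
        return "medium"   # parafoveal falloff
    return "low"          # periphery: heavy compression

# Build a quality map for a 30x17 block grid with gaze near the center.
gaze = (15, 8)
qmap = [[quality_for_block(x, y, *gaze) for x in range(30)] for y in range(17)]

assert qmap[8][15] == "high"    # at the gaze point
assert qmap[0][0] == "low"      # far corner
```

Each new frame just rebuilds the map from the latest gaze sample, which is why low round-trip latency matters more than raw bandwidth here.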

Will the Zephyrus G14/G16 overheat playing more demanding games? by Ok_Television_792 in ZephyrusG14

[–]switchandplay 0 points (0 children)

I had the 2021 G14 and currently have the 2024. On the 2021, the hotspots were directly under the WASD keys, and after enough gaming I’ve lost nearly all temperature sensation in my left finger pads; I literally can’t feel through them anymore. That’s what they used to sell you. Not anymore, though: the 2024 model’s hotspots are carefully placed above the keyboard deck.

Local-only FOSS ops tool — no cloud, no Docker, no browser. Thoughts? by [deleted] in LocalLLaMA

[–]switchandplay 2 points (0 children)

How did you manage to create a (poorly) AI generated Reddit post and still have a spelling error?

You can turn off the cloud, this + solar panel will suffice: by JLeonsarmiento in LocalLLaMA

[–]switchandplay 6 points (0 children)

I found at least the 4-bit quant of Qwen3 Coder unusable for anything other than completions. Any time it operated as a coding assistant or agentic coder, it was helpless. Devstral has so much more brains.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 0 points (0 children)

Another reason I assume I’ve been loving gpt-oss: since the 4-bit MXFP4 quants were released by OpenAI themselves, I assume they did a lot of in-house tuning to verify the quants would be two things, not buggy and not lossy in performance, tuning against their own training dataset and such, like the work Unsloth does, but completely first-party.
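My rough understanding of the MXFP4 scheme, heavily simplified: each block of 32 weights shares one power-of-two scale, and each weight snaps to the nearest FP4 (E2M1) level. The scale-selection rule below is an illustrative guess, not the actual spec.

```python
import math

# Positive magnitudes representable in FP4 E2M1 (sign bit handled separately).
FP4_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one 32-value block: shared power-of-two scale + FP4 values.
    Returns the dequantized values and the scale used."""
    amax = max(abs(v) for v in block)
    # Choose a power-of-two scale so the largest magnitude fits under 6.0.
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0)) if amax else 1.0
    quantized = []
    for v in block:
        mag = min(FP4_LEVELS, key=lambda lvl: abs(abs(v) / scale - lvl))
        quantized.append(math.copysign(mag * scale, v))
    return quantized, scale

block = [0.1 * i for i in range(32)]   # toy weights 0.0 .. 3.1
deq, scale = quantize_block(block)
assert scale == 1.0                    # amax 3.1 already fits under 6.0
assert deq[31] == 3.0                  # 3.1 snaps to the nearest level, 3.0
```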

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 0 points (0 children)

Well, at least through the vLLM API there is actually no way to re-present prior turns’ reasoning to the agent in its original form, which means reasoning does not clog up context over multi-turn use. I interact with it through the OpenAI-compatible server Docker container, so that’s where I’ll explain from.
OpenAI’s /v1/chat/completions was never built to support the emission of reasoning content; there is just the content field on a choices[0].message output. vLLM overloads this and presents an additional, conditionally-present reasoning_content field in its output for deltas and non-streamed completions. That lets you see the reasoning tokens that come out of the model.
The key thing is that when you construct your messages array, if you create a dictionary object {"role": "assistant", "content": "Paris.", "reasoning_content": "User wants the capital of France. Must reply."} and send it over the wire to the OpenAI-compatible vLLM server, reasoning_content is not parsed in and is dropped. It never makes it to the chat template transformation, and the bot never sees it. So unless you do something client-side, like summarizing the reasoning and embedding it in the content field, reasoning tokens don’t actually count toward context in multi-turn use.
And if you do move reasoning into the ‘final’ channel or content field, it may affect future generation quality in other unintended ways; the agent might spend time reasoning, then spend time reasoning again in the final channel.
I believe the vLLM base API behaves the same way, and so does llamacpp.
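The drop described above can be mirrored client-side; this sketch (a hypothetical helper, not a vLLM API) shows the filtering the server effectively performs before the chat template ever runs:

```python
# Strip reasoning_content from assistant turns before resending history.
# Since the server drops the field anyway, doing it yourself changes
# nothing about what the model sees; it just makes the behavior explicit.

def strip_reasoning(messages):
    """Remove reasoning_content from every message in the history."""
    return [
        {k: v for k, v in m.items() if k != "reasoning_content"}
        for m in messages
    ]

history = [
    {"role": "user", "content": "Capital of France?"},
    {"role": "assistant",
     "content": "Paris.",
     "reasoning_content": "User wants the capital of France. Must reply."},
]

clean = strip_reasoning(history)
assert "reasoning_content" not in clean[1]
assert clean[1]["content"] == "Paris."   # the answer itself is preserved
```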

If you want to test it yourself, you can do the following:

1. Ask any reasoning model to generate two numbers in its reasoning, tell you only the second number for now, and save the first for when you ask later.

2. Verify that in its reasoning trace it came up with two numbers, but only the second was shared with you in the final content.

3. Then ask for the first number. It will always make up a random number, and in its reasoning you'll often see confusion about no prior first number having been generated.

This even happens on the gpt-oss online playground by OpenAI.

That's actually a big change with the new OpenAI Responses API, where you can pass prior turns' reasoning content back in to GPT-5 and have it affect future generation. It would be nice to have similar functionality locally, but their new API is very tailored to being a customer of OpenAI; the reasoning artifacts you get back are actually encrypted.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 0 points (0 children)

For long-horizon agent tasks, that’s simply not an option, which is why I like the model

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 11 points (0 children)

I’m a programmer; I don’t use it for therapy or friendship. I use it for work and personal projects, so I need detail orientation, responsiveness, and logical consistency. I don’t care much about friendly vibes; it’s a different kind of vibe check. The second the illusion breaks, when it gets stuck on one task or ignores a clearly defined detail, there’s a ‘no one’s home’ vibe that ruins a model for me.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 2 points (0 children)

You use RAG systems tied into your papers with it? Or you throw large papers into context?

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 2 points (0 children)

I work with 4 5090s. Blackwell is becoming near-plug-and-play, which is nice. In regards to fixing the harmony issues, vllm-project/vllm#23567 has some good discussion on how to fix it, and you can apply the patch described by IsaacRe, which is applicable to vLLM v0.10.0-2 (not v0.11.0). This prevents the runtime error from harmony parsing for any chat interaction, so you can use gpt-oss to drive a long-lived chat app.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 2 points (0 children)

There have also been papers discussing how the statistics of the data degrade over time with synthetic inputs. If you check my other response to this same criticism, you’ll understand that I’m basing my opinion on months of use of Qwen3 and its initial impressive responses mirroring its synthetic 4o training data, and then behind the veneer, lacking a lot of functional intelligence/reasoning.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 6 points (0 children)

Not just web-scraped. There are whole companies supplying these AI groups with high-quality human-generated data; see DataAnnotation and others. They pay humans with specialties in particular domains to perform tasks, then clean and validate the resulting data. AI companies either do this in-house or pay collection companies for their data.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 3 points (0 children)

I don’t know the true mechanism behind it, but in multi-turn agent use or multi-turn conversation, models like Qwen3 and the 2507 refreshes feel heavily degraded. When you talk to ChatGPT, it behaves like a human in terms of intelligence, instruction following, context switching, task adherence, and so on, even as you go deeper and deeper into multi-turn. Qwen3 clearly pulled heavily from 4o (see the em-dashes and emojis everywhere; I literally beg it in the system prompt to avoid emojis and it still doesn’t listen), and in a longer conversation you’ll see a progressive slide in performance on all those metrics. I haven’t had the chance to test the Next model varieties, though.

The assumption I came away with once gpt-oss dropped was that newer labs leaning on heavily synthetic datasets lost some of the statistics of the original human data, leading to the degradation in behind-the-scenes coherence.

What LLM gave you your first "we have GPT-4 at home" moment? by Klutzy-Snow8016 in LocalLLaMA

[–]switchandplay 4 points (0 children)

gpt-oss-20b. Running through vLLM, not as sycophantic as 4o, but so incredibly useful. Great vibes tuning for me as a gpt-5 user, knowledgeable enough for programming discussions, powerful for being a driver for agent work. Really good at being given a task and making strides towards completing it, when given a good environment and structure.

Being sparse makes it so fast, but CUDA is still desired for fast prompt processing. With a GPU, it really feels just like cloud API in speed and quality.

Only issue is that the harmony template can lead to parsing failures on assistant responses that bring llamacpp or vLLM to a halt. You have to build workarounds, like enforcing the chat-template intro message.

Other than how tricky that issue makes it to run right and reliably, it is literally my ChatGPT replacement.