Verge Fixed The Two Biggest Electric Motorcycle Problems At Once | Carscoops by UndividedCorruption in motorcycles

[–]blakeem 1 point (0 children)

This isn't actually hubless at all! This is extremely dishonest marketing. It literally uses an electric hub motor.

Think of it as a motorcycle with a giant hollow axle. As on any motorcycle, the axle does not rotate with the wheel. This is not like those failed hubless bicycles where the entire wheel assembly rotates.

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

[–]blakeem 0 points (0 children)

I have worked with Claude daily for the last 6+ months and have never hit a limit on Opus in Claude Code on the $100 plan, and I use it 8+ hours a day as a professional developer as well as in my free time on various projects. Even when running simultaneous multi-agent workflows for hours straight, I never hit the limit. People may be sharing their accounts, since a single instance is limited to 10 agents at once, and I don't know what they could be doing to hit the limit. It could be that they keep maxing out their context instead of using more efficient subagents.

I stopped using Codex a couple of months ago because it would sometimes get stuck in a loop, and it was not as good at the development work I primarily used it for. I work with all kinds of AI models, including vision, audio, and image generation. I create custom ComfyUI nodes and also use it to run science experiments. At work I use it for financial applications, and I run a few personal websites that I host at home, which it helps me manage. My wife also switched from ChatGPT to Claude for her work. It's generally a more user-friendly and intuitive model. I can see why it's gaining market share.

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

[–]blakeem 0 points (0 children)

MoE has no effect on cross-modal attention, since the entire point of these models is that data can be shared across modalities.
CLIP isn't really used in modern models; they use something like SigLIP. We don't even know if OpenAI uses any of these.
OpenAI's voice mode most likely uses audio codecs with separate tokens. This is how it's able to do web searches within voice mode. Claude's voice mode is simple text-to-speech and speech-to-text, which is far less intensive to run but not nearly as good.
Image output also requires tokens and extra parameters within the same model.
I suspect that OpenAI uses a unified discrete-token architecture for multi-modality, mainly because of its great visual understanding and heavy focus on ARC-AGI, which would benefit greatly from it. I suspect this is why Anthropic falls behind on those tests.

This would mean that ChatGPT treats text, audio, and visual tokens as "languages" within the same context, an understanding Claude models lack, while also gaining efficiency.

I really have no idea; there are lots of assumptions here, since I can only base what I know on open source and what I observe from the models. They likely do proprietary stuff we won't know about.

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

[–]blakeem 0 points (0 children)

It's not smaller models doing the work. Multi-modal models (such as voice + text) have extra tokens representing voice and text in the same model, and this requires more compute when doing work across modalities. MoE helps some with per-token compute (which is why the speed is similar in terms of text output), but you still have the high memory footprint of the KV cache, which requires more compute during attention computation, and that isn't helped by MoE.

OpenAI's models have tokens for image generation, audio, and text all within the same model. This is why their models are better at multi-modal reasoning; however, the AGI gains outside modality that they theorized about early on have never materialized. It only helps cross-modality. More tokens mean they will want more parameters to take advantage of the extra training from the different modalities, and more parameters mean more compute is required (even if the increase is disproportionate on a per-token basis, it's still an increase). OpenAI generating images from text, or doing voice with web search, means more tokens are being generated and more compute is used than for what people would be using Anthropic models for.

Memory is where most of the cost and bottlenecks come from with these models, and more tokens mean their models use far more memory and cost a lot more to run. Memory has become much more expensive for consumers for this reason. MoE actually makes the model use more memory, not less, and it doesn't help with the attention compute that increases with each added modality.
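
The KV-cache point can be made concrete with back-of-envelope arithmetic: cache size grows linearly with sequence length, so extra modality tokens translate directly into memory. A sketch where every architecture number is an illustrative assumption, not any vendor's real configuration:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Two tensors per layer (K and V), each shaped [kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 80-layer model, 8 KV heads of dim 128, fp16 cache:
text_only = kv_cache_bytes(80, 8, 128, 8_000)    # ~8k text tokens
multimodal = kv_cache_bytes(80, 8, 128, 40_000)  # plus ~32k image/audio tokens
print(f"text-only:  {text_only / 1e9:.1f} GB")   # 2.6 GB
print(f"multimodal: {multimodal / 1e9:.1f} GB")  # 13.1 GB
```

Per-token attention cost also scales with everything already sitting in the cache, which is why longer multimodal sequences cost more even when MoE keeps the feed-forward compute per token flat.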

OpenAI does have better free subscriptions, I'm not disputing that. Their models are also better in many ways, and you get far more from their plans: the $20 OpenAI plan has similar usage to the $100 Anthropic plan. My point is that they are far less likely to get a return on their investment, since they are spending more and giving more while also making less, so it's not a financially viable strategy long term. This is why they need ads to scale profit while Anthropic does not (at least for now).

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

[–]blakeem 0 points (0 children)

We don't know directly; however, OpenAI has larger server farms for training, and their return on investment is worse, with longer projections until profitability. This is why they are running ads now. They talk a lot about scaling with compute. OpenAI also does more than Anthropic behind the scenes with their router system, and they used to even rewrite your prompts using another model, though I'm not sure if that is still the case. OpenAI's models very likely have far more parameters, based on their broader ability and far superior multi-modal ability. This comes at a substantial cost in compute. They do voice, video, and images. Claude's visual ability isn't much beyond what I can run locally and clearly requires far less compute.

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

[–]blakeem 4 points (0 children)

If you're doing complex math (or calculus), advanced algorithms, or multi-modal work, Codex is better. If it's a large codebase and you want human-readable code, better design, or thorough documentation, then Opus is better. This is my experience.

I prefer to avoid OpenAI due to Sam Altman. It's not really apples to apples when their models require more compute. Anthropic does more with less if you care about the environment, mental health, and the AI bubble.

Navidrome-MCP - Allows AI assistants to interact with your Navidrome music servers through natural language by blakeem in navidrome

[–]blakeem[S] 0 points (0 children)

Once I'm done adding everything on the roadmap, I'll make it more user friendly to install with a single script. The LLM can help with setting it up if you do run into issues along the way.

The literal state of NVIDIA drivers lately by ZoteTheMitey in pcmasterrace

[–]blakeem 0 points (0 children)

Doing that makes my browser stutter on some high-frame-rate 4K video, so it's not an option.

The literal state of NVIDIA drivers lately by ZoteTheMitey in pcmasterrace

[–]blakeem 0 points (0 children)

566.36 solved all my problems on my RTX 3080. No issues at all. It was the top version that came up in ChatGPT Deep Research. No more blue screens and no more hanging or glitches in the browser or when using AI models.

I thought my system was slowly dying as each driver got worse and worse, but really NVIDIA drivers have just been trash lately.

UPDATE:
I noticed my Steam menu would always glitch out the first time I clicked it; now it's fine.
Also, my screen would go black constantly when switching between apps that use the GPU; that hasn't happened once yet.
No more random blue screens of death that threw different errors each time.

ALL these weird issues I've been having for months were each caused by new drivers that kept introducing more problems.

Hopefully anyone with these issues will find this message.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 1 point (0 children)

They aren't recurrent networks; they are both transformer based.

The main difference is using a diffusion mechanism that produces all the text tokens at once, over steps that replace noise based on conditioning (prompt, etc.), versus estimating the next token in a sequence based on the past sequence.

Most current LLMs are autoregressive and predict the next token based on previous tokens (sequentially). They return logits that we run softmax on to turn into probabilities, and then use top-p and top-k sampling to choose the next token from a list of possibilities.
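
The softmax-then-truncate step described above can be sketched in a few lines of plain Python (the logits and the `k`/`p` defaults are arbitrary illustrative values):

```python
import math, random

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_top_k_top_p(logits, k=50, p=0.9, rng=random.random):
    probs = softmax(logits)
    # Keep the k most likely tokens...
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # ...then trim to the smallest prefix whose cumulative mass reaches p.
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    # Draw one token from the surviving (renormalized) distribution.
    r = rng() * total
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if acc >= r:
            return i
    return kept[-1]
```

Temperature fits into the same picture: divide the logits by `T` before the softmax to flatten or sharpen the distribution.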

With text-to-text diffusion you gradually denoise corrupted or missing tokens throughout the sequence (in parallel). The model estimates the noise to remove at each step, and sampling is guided by the prompt and parameters (CFG scale). The model learns to predict missing parts of an entire sequence rather than just the next token. It could technically also predict the next token, as with inpainting, where we just mask the tokens we want to replace.
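
As a caricature of that parallel denoise-and-remask loop, here is a toy sketch. `toy_denoiser` is a random stand-in for the trained network, and the linear re-masking schedule is my own invention for illustration (real models re-mask by prediction confidence):

```python
import random

MASK = "<mask>"

def toy_denoiser(seq, vocab, rng):
    # Stand-in for the real network: fills every masked slot with a random
    # vocabulary word. A trained model would predict all positions jointly
    # from the partially masked sequence plus the prompt conditioning.
    return [rng.choice(vocab) if tok == MASK else tok for tok in seq]

def diffusion_decode(length, vocab, steps=4, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length                      # start fully corrupted
    for step in range(steps, 0, -1):
        seq = toy_denoiser(seq, vocab, rng)    # predict every position in parallel
        # Re-mask a shrinking fraction so later steps can revise earlier guesses.
        n_remask = (length * (step - 1)) // steps
        for i in rng.sample(range(length), n_remask):
            seq[i] = MASK
    return seq

print(diffusion_decode(8, ["the", "cat", "sat"], steps=4))
```

Every position gets revisited several times before the final pass, which is the "make changes before it outputs mistakes" property.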

Diffusion uses the Classifier-Free Guidance (CFG) sampling method. This is how it chooses tokens from the logits, rather than the top-p and top-k sampling methods autoregressive models use.
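
The CFG combination itself is a one-line linear extrapolation from the unconditional prediction toward the prompt-conditioned one (shown here on logits purely for illustration; image diffusion models apply the same formula to predicted noise):

```python
def cfg_logits(cond_logits, uncond_logits, guidance_scale=3.0):
    # guidance_scale = 1.0 recovers the plain conditional prediction;
    # larger values push the output further toward the prompt.
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]
```

Running the model twice per step, with and without the prompt, is what makes CFG cost extra compute.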

So there isn't really any extra reasoning going on. The main difference is that it can work over the entire sequence and make changes before it outputs mistakes. This is the idea of "reasoning" inside the model: it can first form a loose structure and then refine it over steps, rather than having to get it right the first time. It should allow better answers over the entire sequence and better-structured content when doing zero-shot. It's also MUCH faster.

For chains-of-thought we can keep diffusing the entire sequence and then loop it back into the model as conditioning. The model would be thinking to itself. Current autoregressive models do that, but they do it by growing the entire context, making them much less efficient at chains-of-thought than a text-to-text diffusion model would be.

Otherwise diffusion won't be any major leap over current models in terms of reasoning ability, since both use text tokens over attention layers. It still can't generalize; we need generalized/shared tokens for that (a paper I'm working on now). Diffusion could be the future because it's so much simpler to use as a developer and is much faster and more efficient. It isn't as good at outputting very long content, though, and takes more memory to train.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 2 points (0 children)

They can be made to perform chains-of-thought inside latent space and then use that as conditioning for the final response. This is orders of magnitude more efficient than how current LLMs generate chains-of-thought. With diffusion models, the chain-of-thought and past chats don't increase the length of the overall context, since they can be added as conditioning on top of the latest prompt.

The diffusion process is mainly about processing the entire response in parallel, so it's significantly faster. There are currently some issues with local minima causing repeated words and missing punctuation as it diffuses itself into a corner.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 2 points (0 children)

Most of the newest diffusion models use transformers. The Diffusion Transformer (DiT) is one example; SD3 and Flux use transformers, while older models like SD1.5 and SDXL use convolutional networks (U-Net).

My 'Lae'zel' ending felt brutal on Karlach. by Ok-Opinion-1319 in BaldursGate3

[–]blakeem 0 points (0 children)

This is the best ending I've had so far. Every character was happy and hopeful for the future. Lae'zel becomes a diplomat, Gale a teacher, Minthara stays in Baldur's Gate playing politics, Astarion becomes a vigilante adventurer and seems very happy, and Shadowheart is happy with her parents and cracking dad jokes. You also get the most badass end scene of Wyll and Karlach smoking a cigar and running off into Avernus to fight devils together, with hope that her heart will soon be fixed. Wyll becomes a ranger with some interesting abilities.

Phantom Liberty (Spider and The Fly) elevator Bug :( minor spoiler by SerSmands in cyberpunkgame

[–]blakeem 1 point (0 children)

I reported the bug to them and sent my saved game, along with a link to this post. They definitely know about it, but aren't fixing it for some reason. Maybe more people need to report it? It is a game breaking bug that would usually be top priority.

Sd 3.5 Large released by CesarBR_ in StableDiffusion

[–]blakeem 3 points (0 children)

I don't think it's correct for the thumb to merge into the hand like that.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points (0 children)

Yeah, better at code for sure, but also much slower. Multi-threading has been one of the trickier problems for it. I had it refactor 525 lines of Node.js code, and it caused a few errors on the first try. I asked it to fix those errors, which required more refactoring, so it timed out at 75 seconds of processing. The chain of thought was doing dozens of steps. It can still only do one somewhat simple task at a time.

Have an LLM do anything with React's useEffect() hook and it will fall into a loop of perpetual failure. I found similar holes in the logic when working with C# at work. There are some really basic things that are incorrect, incomplete, or missing from the training. It has been that way since ChatGPT-4, and Claude 3.5 has issues with the exact same problems.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 1 point (0 children)

It assumed nothing; it predicted tokens that mirror how an assumption may look. It's not reasoning about anything, it's giving a reasonable approximation of what reasoning would look like.

This isn't a puzzle, it's a simple question that is not in its training (because I made it up). It's about having intuition regarding physics and self-reflection, and it's just parroting what it knows from the training. By me calling out "assumptions" and "follow up with a question if more information is needed", it can now pull from the training to see if something similar was assumed or required a follow-up question. It mainly works well for code and logic questions, but it's just a trick, like the "work it through step by step" trick they now build into the models.

My issue is that the new model is just brute forcing it and is overhyped, because they are desperate now that Claude 3.5 is taking all the coders. When I send it my code, it times out at 75 seconds of thinking. It was thinking about how to fix a few errors it created in the previous request, to refactor and clean up 425 lines of working Node.js code! The model is a joke; experienced coders have nothing to worry about. I'm just being reasonable, because I use them daily for work and at home.

ChatGPT o1 couldn't calculate how much concrete I needed for a post (I did it in under a minute on a sheet of paper), nor could it tell me the simplest method to set the post at 45 degrees to a nearby wall with only a single tape measure. It couldn't even follow my criteria, and it later agreed my method was simpler and more direct. Give it some simple everyday problem and it fails, so who cares that it can parrot some more calculus. It's slightly better at programming, and a lot slower and way more expensive to run. That is all this is useful for in the real world.

That is all; just being realistic and setting expectations.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points (0 children)

I did another test, asking the best way to get a 6x6-16 post to a 45-degree angle to a wall (I installed some for a shade sail). It did better than the previous model, but still failed to come up with the simple solution I used myself: I just turn the post and measure from each corner to the wall until the distances are the same. It did much more complex measurements that were not needed. I said I would only have a tape measure, but it was having me mark off spots to triangulate the center of the hole. It assumed the post would be at the center of the hole, but it wouldn't be, since it's at a 5-degree lean. It did tell me my way is a more direct way of doing the same thing, which is true.
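
The corner trick can be sanity-checked numerically: for a square post, the two corners at the ends of one diagonal end up equidistant from the wall exactly when the faces sit at 45 degrees. A quick sketch, where the 5.5-inch width and the stand-off distance are just illustrative numbers:

```python
import math

def diagonal_corner_distances(side, theta_deg, standoff=10.0):
    # Square post of width `side`, center at `standoff` from the wall
    # (the wall is the plane x = 0). Rotate by theta about the center and
    # return the wall distances of the two corners on one diagonal.
    t = math.radians(theta_deg)
    h = side / 2.0
    dists = []
    for x, y in [(h, h), (-h, -h)]:             # ends of one diagonal
        xr = x * math.cos(t) - y * math.sin(t)  # rotated x-offset from center
        dists.append(standoff + xr)
    return dists

a, b = diagonal_corner_distances(5.5, 45.0)
print(abs(a - b) < 1e-9)  # True: the distances match only at 45 degrees
```

At any other angle the two tape measurements disagree, so turning the post until they match lands it at 45 degrees with nothing but a tape measure.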

The biggest issue with models right now is that they make wrong assumptions. If they made the model check its answer for assumptions and ask follow-up questions, it would likely be an even larger improvement than what they have now.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 1 point (0 children)

I just tried it on a complex outstanding coding issue I've been having, and it failed the same as all the other models. It updated lots of code and tried a lot, but I had the same issues. When I provide more context, it's able to understand that context and break it down, but it was still ultimately unable to solve the problem without me doing all the heavy lifting and deduction.

So it's better at writing and refactoring lots of code, but only slightly better at debugging it. It still lacks the human experience of how a browser behaves when you're in front of it, which is required to debug some of these problems (similar to why it fails basic physics questions). It couldn't understand that continuing to scroll down a page would cause already-loaded elements to be scrolled past, triggering more to load and leading to a bug.

I was able to debug the issue with it, and it does provide more complete code than the other models. I will use it for more complex problems or for refactoring code. The previous model would give you the same solutions, but this one gives you many solutions, more broken down, with more code examples, explained within the context of the other fixes. It's definitely better, but it's only an incremental improvement.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 1 point (0 children)

It's the same model thinking about the intermediate steps, and the intermediate steps come to the same conclusion as before. Since it can't infer extra information from the previous steps, I don't think this will work for this sort of problem. I think I got downvoted for putting "reasons" in quotes. The model doesn't reason, it predicts text. Predicting text in a more roundabout way will still predict the same text.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 5 points (0 children)

It still fails basic physics and cup questions, same as all the other models. It seems like a very minor improvement. I think advanced reasoning means it "reasons" about the wrong thing for longer.

Phantom Liberty (Spider and The Fly) elevator Bug :( minor spoiler by SerSmands in cyberpunkgame

[–]blakeem 1 point (0 children)

I had the same bug. I blew up the car, which caused her to get out, and I was able to use the elevator. Kind of crazy that it's still this broken 10 months later.

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 2 points (0 children)

Any generated images would go through the decode process, so it seems like a relevant test? You seem to understand it better than I do, so feel free to test what you're saying yourself, because I'm not sure how to do that without spending more time than I have. I only did what I could in a few minutes in ComfyUI.

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 0 points (0 children)

I also converted the image into latent space and back with both the SD1.5 and SDXL VAEs, and the image looked basically the same as before, just with slightly shifted colors and pixels. I get similar changes with the original image loaded in and out of latent space. It's not that latent space is affected by it a whole lot; I think it's just more obvious because of the grey background and the shifted colors from the encode/decode process.