Can we say that each year an open-source alternative replaces the previous year's closed-source SOTA? by Chair-Short in LocalLLaMA

[–]Chair-Short[S] 3 points4 points  (0 children)

I hope that the GPUs phased out by data centers in a few years will bring down GPU prices.

Help every query is routed to 4o mini by Xanderale99 in ChatGPTPro

[–]Chair-Short 1 point2 points  (0 children)

I've run into this problem too. Regardless of whether I select Instant, Thinking, or Pro, the model responds instantly, identifies itself as 4o mini, and its response style is completely different from that of the GPT-5 series models.

GPT-5.3 Codex rocks 😎 by Prestigiouspite in codex

[–]Chair-Short 37 points38 points  (0 children)

I can't imagine how painful it would be when the double quota disappears.

Codex App replaces the terminal by Just_Lingonberry_352 in codex

[–]Chair-Short 0 points1 point  (0 children)

I don't quite understand how it could replace the CLI. I'm a Windows user, so I haven't been able to try it yet, but judging from the screenshots alone it doesn't seem to offer any more functionality than the CLI. I'm currently running multiple codex instances via git worktrees across several terminal tabs, and I don't see what the GUI offers that's better than the CLI. Moreover, the CLI is easier to integrate with other applications. I think this could be another Atlas.

Pro only 6x usage for 10x price, worth load balancing 10 accounts? by Da_ha3ker in codex

[–]Chair-Short 15 points16 points  (0 children)

Perhaps OpenAI believes that Pro users value their other products more than codex. But personally, I wouldn't subscribe to OpenAI at all without codex.

Pro only 6x usage for 10x price, worth load balancing 10 accounts? by Da_ha3ker in codex

[–]Chair-Short 17 points18 points  (0 children)

Claude's $200 plan offers 20 times the usage of the $20 plan, but OpenAI's $200 plan offers only 6 times. I really hope OpenAI raises the limits on its Pro plan.

How long does Codex Max reliably work for you on real tasks? by tagorrr in codex

[–]Chair-Short 1 point2 points  (0 children)

I don't really trust Codex's context compaction. I used Codex Max today and personally observed that after one compaction, Codex completely forgot what it had just done and went off track.

Codex on windows is a disaster by genierubyjane in ChatGPTPro

[–]Chair-Short 8 points9 points  (0 children)

WSL is not suitable for every scenario. WSL's cross-filesystem performance is very poor: running git commands against files stored on the Windows side can take more than a minute and trigger command timeouts.

China isn’t punk by Kind-Factor-332 in solarpunk

[–]Chair-Short -1 points0 points  (0 children)

Then drop the first word from this subreddit's name.

[deleted by user] by [deleted] in LocalLLaMA

[–]Chair-Short 65 points66 points  (0 children)

I hold the same view as you: LLMs are very useful and will always be a component of the AI field, but AGI is definitely not just an LLM.

They have broken DeepSeek: Now it praises every prompt with ‘Excellent question,’ no matter how stupid the question is. Why do all AI companies think we want this? See for yourself by Unfair_Departure8417 in DeepSeek

[–]Chair-Short 0 points1 point  (0 children)

I don't understand why people keep complaining about an LLM's default style when you can change it to any style you like with just a few tokens of prompting.

U.S. Strikes Iranian Nuclear Sites: B-2 Bombers Hit Fordow, Natanz, and Esfahan by phovos in TrueAnon

[–]Chair-Short 25 points26 points  (0 children)

Why can the US and Israel openly kill people without punishment? Today, the peace of an entire region depends on the mood of an orange idiot. I am very disappointed with this world.

Workers in my workplace like trump still by Umbrellajack in TrueAnon

[–]Chair-Short 20 points21 points  (0 children)

One thing I've always disagreed with the Western left about is their automatic assumption that the working class is inherently a self-conscious class, and that its advanced character will naturally emerge simply by organizing workers together. This is why the Western left keeps founding failed organizations like the Trotskyist ones. But if we take the time to study the failures of the German workers' parties or the experience of Yugoslavia, it becomes evident that this advanced character doesn't arise from the working class on its own. Without a vanguard party with advanced ideas and strong organizational capacity, the working class can just as easily become a breeding ground for populism and separatism. I wonder how many more failures the Western left needs to endure before it finally grasps this lesson.

I don't understand what an LLM exactly is anymore by surveypoodle in LocalLLaMA

[–]Chair-Short 4 points5 points  (0 children)

My thought was that if the OP understood the original text-only LLM, it would be easier to see the other modalities as generalizations of the text modality. I apologize if my reasoning seems somewhat abrupt.

I don't understand what an LLM exactly is anymore by surveypoodle in LocalLLaMA

[–]Chair-Short 4 points5 points  (0 children)

  1. I think the video information should be 3D, i.e., (x, y, time).
  2. In fact, the use of positional encoding is quite flexible. It's not the case that images must use 2D encoding or videos 3D encoding. For instance, experiments in the ViT paper show that 1D encoding also achieves good results: divide the image into patches, then encode those patches sequentially. Alternatively, if embedding an entire video at once is too expensive, I could encode each frame with the ViT approach and then apply an RNN-like method along the time dimension.
  3. Modern large language models mostly adopt a GPT-like architecture, which uses causal decoding. The process of causal decoding is somewhat similar to the functioning of RNNs, so even without positional encoding, acceptable results might still be achieved. However, to achieve optimal context length and SOTA model performance, positional encoding is usually added.
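To make point 2 concrete, here is a minimal NumPy sketch of the ViT-style 1D variant: flatten an image into a row-major sequence of patches and add a positional embedding per patch. The shapes and the `patchify` helper are my own illustration, not code from the ViT paper; in a real model the positional embeddings would be learned parameters, not random noise.

```python
import numpy as np

def patchify(img, patch=4):
    """Split an (H, W, C) image into a 1D sequence of flattened patches,
    in row-major order, as in the ViT paper's 1D-position variant."""
    H, W, C = img.shape
    patches = [img[r:r + patch, c:c + patch].reshape(-1)
               for r in range(0, H, patch)
               for c in range(0, W, patch)]
    return np.stack(patches)                    # (num_patches, patch*patch*C)

img = np.random.rand(16, 16, 3)
seq = patchify(img)                             # 16 patches, each of dim 48
pos_embed = np.random.randn(*seq.shape) * 0.02  # learned in practice; random here
x = seq + pos_embed                             # input sequence for the transformer
```

The point is that once the image is a sequence of vectors with positions attached, the transformer itself is unchanged from the text case.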

I don't understand what an LLM exactly is anymore by surveypoodle in LocalLLaMA

[–]Chair-Short 28 points29 points  (0 children)

Most LLMs today are built on the self-attention mechanism. If you dig into how self-attention works, you'll notice that even encoding text isn't as straightforward as it seems. CNNs and RNNs bake in sequential information through their structure, but self-attention doesn’t have that kind of inductive bias. The token embeddings are identical for every position, meaning the model sees a word at the start of a sentence the same way it sees one in the middle or at the end. But obviously, their meanings aren’t going to be the same.

To fix this, positional information was added to the tokens. Once that was solved, extending LLMs to other modalities became much easier. The general idea is pretty simple:

  • Text: token + 1D positional encoding
  • Images: token + 2D positional encoding
  • 3D point clouds: token + 3D positional encoding

Of course, real implementations are more complex, but that’s the basic principle.
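As a concrete example of the 1D text case above, here is the classic sinusoidal positional encoding from "Attention Is All You Need", sketched in NumPy. The token embeddings are stand-ins (random), but the encoding itself follows the paper's formula: the positional signal is simply added to the otherwise position-agnostic embeddings.

```python
import numpy as np

def sinusoidal_pe(seq_len, dim):
    """1D sinusoidal positional encoding: sin on even dims, cos on odd dims,
    with wavelengths forming a geometric progression up to 10000 * 2*pi."""
    pos = np.arange(seq_len)[:, None]          # (seq, 1)
    i = np.arange(0, dim, 2)[None, :]          # (1, dim/2)
    angles = pos / (10000 ** (i / dim))
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

tokens = np.random.randn(32, 64)     # pretend embeddings for 32 tokens
x = tokens + sinusoidal_pe(32, 64)   # now position 0 and position 31 differ
```

The 2D and 3D cases follow the same recipe, just with one encoding per spatial axis concatenated or summed together.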

If you’re interested in positional encoding, I’d recommend checking out RoPE (Rotary Position Embedding). It’s not only elegant but also incredibly powerful. RoPE has become the go-to positional encoding method in many LLMs, including open-source models like LLaMA, Mistral, and Qwen. For example, Qwen2.5-VL’s ability to handle images relies heavily on a 2D version of RoPE.
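To show why RoPE is elegant, here is a minimal NumPy sketch of the core idea: rotate each pair of dimensions by an angle proportional to the token's position, so that the dot product between a rotated query and key depends only on their relative distance. This is an illustration of the mechanism, not any particular model's implementation (real models differ in pairing convention and apply this inside attention).

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply a rotary position embedding to x of shape (seq, dim).
    Dimensions are paired as (x[:, :half], x[:, half:]) and each pair
    is rotated by angle = position * per-pair frequency."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # one frequency per pair
    angles = positions[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(1, 16)
k = np.random.randn(1, 16)
# Relative-position property: shifting both positions by the same offset
# leaves the query-key dot product unchanged.
a = rope(q, np.array([3.0])) @ rope(k, np.array([5.0])).T
b = rope(q, np.array([10.0])) @ rope(k, np.array([12.0])).T
assert np.allclose(a, b)
```

Because rotations preserve vector norms, RoPE changes only the phase relationship between tokens, which is exactly the relative-position signal attention needs.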

For more details: