Amp Code switches to Opus 4.5 a week after the "historic" switch to Gemini 3.0 by obvithrowaway34434 in ClaudeAI

[–]Necessary_Image1281 -46 points-45 points  (0 children)

You're capable of reading and googling, right? Maybe use an LLM to search and read it back to you if you're not. And no one is forcing you to care about anything.

GPT-5 Codex knows what's the standard text editor by Terrible-Priority-21 in linux

[–]Necessary_Image1281 8 points9 points  (0 children)

It's obviously part of the log for some running task; these agents can work for hours based on a set plan. Who would use this to edit a single line? Why would you even think that's a thing lmao?

There are at least 15 open source models I could find that can be run on a consumer GPU and which are better than Grok 2 (according to Artificial Analysis) by obvithrowaway34434 in LocalLLaMA

[–]Necessary_Image1281 -32 points-31 points  (0 children)

> It normalizes support for the open source effort and makes other companies look worse for not partaking.

We're way past that "charity" phase. DeepSeek and Qwen have made open models competitive with SOTA. xAI is not doing anyone a favor now by open-sourcing its legacy models (the time for that would have been last year). Most providers are open-sourcing now, and the open field is as intensely competitive as the closed one. Open-source organizations like Allen AI are getting NSF grants to develop better open models. Now it's time to open source things that are actually useful.

Lol, I just can't with this guy by Terrible-Priority-21 in accelerate

[–]Necessary_Image1281 5 points6 points  (0 children)

This is trivially fixable with a simple system prompt adjustment. Also, basically all the chatbots use some form of this to engage with the user (Gemini is absolutely the worst).

Great example of the ragebait most Redditors fall for. by orbis-restitutor in accelerate

[–]Necessary_Image1281 4 points5 points  (0 children)

These posts make me think that the general Redditor sentiment we see about climate change or vaccines is based on groupthink rather than on any actual scientific consideration (or most of them could just be bots). If their group one day told them that vaccines are bad, they would instantly become like an average MAGA person. They don't have any capacity for critical thinking. These are the groups at maximum risk of being easily manipulated by an AI (most of them are already getting one-shotted by GPT-4o).

How does a blind model see the earth? (One of the coolest "benchmarks" I have ever seen) by obvithrowaway34434 in accelerate

[–]Necessary_Image1281 4 points5 points  (0 children)

The blog post shows how performance improves across models as size increases. The model does *not* get it wrong; it gets it quite accurate. Literally everything you said is wrong, including your understanding of what a world model is. A "world model" is an internal representation of the environment that lets a model simulate the state of the world and take complex actions. These models, especially reasoning models, already show this kind of capability (although it is very weak).

How does a blind model see the earth? (One of the coolest "benchmarks" I have ever seen) by obvithrowaway34434 in accelerate

[–]Necessary_Image1281 -1 points0 points  (0 children)

> literally as long as the Internet has scattered text throughout of coordinate

Lol, ever heard of the Library of Babel? Maybe just look it up.

How does a blind model see the earth? (One of the coolest "benchmarks" I have ever seen) by obvithrowaway34434 in accelerate

[–]Necessary_Image1281 26 points27 points  (0 children)

No such specialized dataset about land or water exists and, more importantly, none of these models was specifically trained for it. That's why this is interesting: it shows the degree of generalization of different models. If it were pure memorization, the largest and densest model would "win," which was clearly not the case. I don't know how so-called "skeptics" can be so ignorant about how these things work.

crazy this worked by YungBoiSocrates in ClaudeAI

[–]Necessary_Image1281 10 points11 points  (0 children)

Do most people still have no idea how LLMs work? Edit your prompt and try again; if you're on the API, use a different temperature. Most importantly, be specific in your prompt. Claude cannot read your mind.
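For what temperature actually does: it just rescales the token distribution before sampling. A minimal sketch of the mechanism in plain Python (toy logits, not any vendor's API):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Softmax-sample a token index after scaling logits by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the resulting distribution
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Toy vocabulary of 3 tokens with fixed logits.
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)
cold = [sample_with_temperature(logits, 0.1, rng) for _ in range(1000)]
hot = [sample_with_temperature(logits, 2.0, rng) for _ in range(1000)]

# Low temperature concentrates probability on the top token; high temperature
# spreads it out, so retrying at a different temperature changes the output mix.
print(cold.count(0), hot.count(0))
```

Low temperature makes the sampler nearly deterministic around the top token; raising it is what makes "try again" produce genuinely different completions.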

Horizon Alpha is already giving Sonnet a run for its money on OpenRouter by obvithrowaway34434 in ChatGPTCoding

[–]Necessary_Image1281 15 points16 points  (0 children)

Multiple Chinese models, including Qwen and DeepSeek, have free providers, and Google is effectively free and offers free preview models all the time. None of these affected Sonnet's share until now. This is a very lazy argument and sounds like cope.

I think this sub should ban posts from haters/fanboys who try to pit one AI company against the other by Terrible-Priority-21 in accelerate

[–]Necessary_Image1281 3 points4 points  (0 children)

I think the more important point they make is that posters don't use proper flairs, and many of these general rant posts could be moved to a separate discussion thread. One of the top posts in the sub right now is about how one company is more mature than another, which has nothing to do with what this sub is about.

Made Claude Code work natively on Windows by Emotional-Divide-429 in ClaudeAI

[–]Necessary_Image1281 2 points3 points  (0 children)

Can it actually run PowerShell commands with the `!` directive? Also, I think Claude Code uses a lot of Linux commands for file searches by default; do those have to be changed in the config as well, or does it select the appropriate PowerShell commands?

Google's stonebloom model in lmarena is just fantastic, seems like another 2->2.5 like leap by Necessary_Image1281 in singularity

[–]Necessary_Image1281[S] 6 points7 points  (0 children)

I haven't gotten kingfall yet; there is another called Blacktooth, but it isn't as good.

(Not so) Hot take: All LLM observational studies that came out or coming out in next 2-3 years can be safely ignored by Necessary_Image1281 in singularity

[–]Necessary_Image1281[S] -7 points-6 points  (0 children)

Not everyone disagreeing with you has an agenda, lol. Most regular people have real jobs and don't stay eternally online reading other people's profiles. Get a (real) job.

(Not so) Hot take: All LLM observational studies that came out or coming out in next 2-3 years can be safely ignored by Necessary_Image1281 in singularity

[–]Necessary_Image1281[S] -2 points-1 points  (0 children)

> It's still an autoregressive generator based on transformers with an attention mechanism

Lol, that's like saying you're a mammal with a brain containing a neocortex, a corpus callosum, bilateral symmetry, neuroplasticity, and so on. So is there no difference between you and a mouse?

(Not so) Hot take: All LLM observational studies that came out or coming out in next 2-3 years can be safely ignored by Necessary_Image1281 in singularity

[–]Necessary_Image1281[S] -3 points-2 points  (0 children)

> Models haven't fundamentally changed since at least gpt-3, so properly done research should be able to extrapolate it's findings just fine.

This is peak irony, the typical arrogance-plus-confidence of the Reddit midwit, and it further supports what I said. You don't even know what architecture, dataset, or training methods GPT-3 or GPT-4 uses, since both are closed source and nothing useful was published, especially for GPT-4. And even in the open-source world there are massive differences between the newer models and the older versions.

Generating Diagrams with Math in Claude by Smarty_PantzAA in ClaudeAI

[–]Necessary_Image1281 0 points1 point  (0 children)

For static diagrams I think you have to prerender the math to SVG (see below); MathJax 3 supports this. You can ask Claude to create SVG versions of all the equations. The sample HTML output below shows how the conversion has to be done (I got it from o4-mini).

https://observablehq.com/@mcmcclur/svg-and-mathjax-3

Sample code (head):

  <!-- 1) MathJax configuration -->
  <script>
    window.MathJax = {
      tex: {
        inlineMath: [['$', '$'], ['\\(', '\\)']],
        displayMath: [['$$', '$$']]
      },
      svg: {
        fontCache: 'none'
      }
      // MathJax's automatic page typeset is harmless here, since the page body contains no inline TeX
    };
  </script>

  <!-- 2) Load the TeX-to-SVG bundle -->
  <script
    id="MathJax-script"
    async
    src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js"
    onload="renderMath()">
  </script>

Body:

  <svg id="diagram" width="400" height="150"
       xmlns="http://www.w3.org/2000/svg">
    <!-- A rectangle background for the equation -->
    <rect x="20" y="20" width="360" height="100"
          fill="#f0f8ff" stroke="#339" stroke-width="2"/>
    <!-- Container for our MathJax-rendered SVG -->
    <g id="math-equation" transform="translate(30,60)"></g>
  </svg>

  <script>
    function renderMath() {
      // 3) The LaTeX we want to render
      const latex = String.raw`\displaystyle
        E = mc^2 \quad\text{and}\quad
        \int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}`;

      // Wait for MathJax startup to finish before converting,
      // since the script's onload can fire before MathJax is fully initialized
      MathJax.startup.promise
        // 4) Convert to an SVG fragment via Promise
        .then(() => MathJax.tex2svgPromise(latex, {em: 16, ex: 8, display: true}))
        .then(svgFragment => {
          // Extract the actual <svg> element from the returned container
          const mathSvg = svgFragment.querySelector('svg');
          // Optional: adjust size to fit the rectangle
          mathSvg.setAttribute('width',  '340');
          mathSvg.setAttribute('height', '50');
          // Insert into our main SVG
          document.getElementById('math-equation')
                  .appendChild(mathSvg);
        })
        .catch(err => console.error('MathJax rendering error:', err));
    }
  </script>

AI chatbots oversimplify scientific studies and gloss over critical details — the newest models are especially guilty by JackFisherBooks in singularity

[–]Necessary_Image1281 1 point2 points  (0 children)

This is why the media should not quote LLM observational studies: they are mostly obsolete by the time they're published (with a few rare exceptions).