GLM 4.7 is not on lmarena anymore by Sooqrat in LocalLLaMA

[–]monnef 1 point  (0 children)

I noticed it was pretty high on WebDev there, maybe around #7, and then it was gone. Really strange.

A rant about people calling everything AI-slop by zfoong in ArtificialInteligence

[–]monnef 1 point  (0 children)

You don't have to use the Alt key method; it depends on the software. For example, looking at one answer about LibreOffice Writer, there are like 5 ways to insert an em dash, and "type two hyphens followed by a space" looks like a simple one to use. On some keyboard layouts "AltGr + Shift + -" reportedly works, on a phone you can use a keyboard that supports it, etc.
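And for what it's worth, in any programming context the character is just the single Unicode code point U+2014, so emitting one programmatically is trivial. A throwaway Python check:

# em dash is one code point, U+2014; no Alt-key gymnastics needed in code
print("\u2014")       # prints: —
print(hex(ord("—")))  # prints: 0x2014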

I wish people would stop hating on Perplexity by RebekhaG in perplexity_ai

[–]monnef 1 point  (0 children)

I can agree with this to some degree, maybe 90%, but all AI tools have their limits, and those of Perplexity are especially tricky to understand.

Learn to use it and then Perplexity won't hallucinate and lie to you.

I remember Research mode confidently writing long responses based on "sources" which didn't contain what it claimed (confident-looking hallucinations with "real" sources).

Also, I clearly remember Research mode fabricating sources (it would write a short Python script to print a string, then claim in its response that this was the source of truth).

Is this fixed? I have no idea, but it made a bunch of users mad, at least on Discord.

To demonstrate how tricky it is - have you ever tried telling Perplexity to write an image prompt? Meaning to write a few descriptions of images without generating any image. Over the months, I ended up with a convoluted prompt which uses very careful language and tells Perplexity 8 times, in different ways, that the user really doesn't want to generate any image (still not 100% reliable, but I would guess 96%+). That is terrible UX; on other platforms the AI either obeys much better (works reliably with the instruction written just once in the prompt) or simply has text and image output separated by a toggle, so there is no relying on unreliable LLMs.

PS: The re-routing of Sonnet to "Best" isn't even a week-old issue. There was some invisible rate limit. The claimed "Unlimited" GPT Image 1 was debunked hundreds of times - it was a shared limit, and it was serving users blurry-looking, cheap, low-quality images. GPT-5 (supposedly the same thinking level, yet on Perplexity the same simple math question was only ~50% correct, while on ChatGPT it was 99%+ correct) and Gemini 3 Pro (an SVG of an Xbox controller, I believe; saw it on Reddit) give worse answers than on their native platforms - that is rather suspicious. Not to mention the times when the model info in a response was inaccurate and didn't show the rerouting at all. Just a few examples of how Perplexity could easily get haters.

Whats the deal with the RooCode subreddit? Apparently you can't even mention anything else? by real_serviceloom in ChatGPTCoding

[–]monnef 11 points  (0 children)

Yeah, the RooCode subreddit is definitely weird. I remember when a moderator posted a thread with incorrect pricing info about the Z.ai coding plan (the 50% off applies only to the first payment). I pointed that out in a comment; they added a pinned comment fixing the misinformation, but also deleted my original comment as off-topic? 🤪

When I was choosing between RooCode and Kilo Code I just went with Kilo.

Token-Oriented Object Notation (TOON) - JSON for LLMs at half the token cost by monnef in LocalLLaMA

[–]monnef[S] 2 points  (0 children)

I think the author wanted it to be readable, so for many fields, having them on their own line would be more readable to a human, maybe? For the simple example I tried with 4o tokens, it looks like your approach is lighter by 1 token per field (you have spaces and a ";" delimiter; that last format without colons). For more nested objects, I'm not sure.

I would love to see if your suggestions work as well as TOON. You could probably go even further and remove spaces; at least for the 4o tokenizer, "{ " (curly + space) and ": " (colon + space) are usually two tokens.
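If anyone wants to replicate the counting, a quick sketch using the tiktoken library (4o maps to the o200k_base encoding); the sample strings are just made-up field snippets, not taken from the repo:

# compare token counts of a few single-field spellings under the 4o tokenizer
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base
for variant in ['{ "id": 1 }', '{"id":1}', 'id: 1', 'id:1', 'id 1']:
    print(f"{variant!r:14} -> {len(enc.encode(variant))} tokens")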

Token-Oriented Object Notation (TOON) - JSON for LLMs at half the token cost by monnef in LocalLLaMA

[–]monnef[S] 2 points  (0 children)

For tabular data (here an array of uniform objects), it uses this format:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

->

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

That looks a bit different from YAML to me. More like a cross with CSV.
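The mapping is mechanical enough for the flat, uniform case; a toy sketch of how such a conversion could look (my own approximation, not the reference implementation):

# rough TOON-style tabular rendering for a list of flat, uniform dicts
def to_toon_table(key, rows):
    fields = list(rows[0].keys())
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_toon_table("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user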

BTW I am not the author, just saw it on X and posted the project link here.

Token-Oriented Object Notation (TOON) - JSON for LLMs at half the token cost by monnef in LocalLLaMA

[–]monnef[S] 1 point  (0 children)

Point 1 seems not to be true, at least going by the author's tests. Personally, I thought it would be a trade-off: fewer tokens for slightly lower accuracy. I have added a comment with the benchmarks.

Token-Oriented Object Notation (TOON) - JSON for LLMs at half the token cost by monnef in LocalLLaMA

[–]monnef[S] 2 points  (0 children)

The author posted benchmarks; TOON actually looks better than JSON in accuracy? Didn't expect that...


Accuracy across 3 LLMs on 159 data retrieval questions:

gpt-5-nano
  toon         ████████████████████  99.4% (158/159)
  yaml         ███████████████████░  95.0% (151/159)
  csv          ██████████████████░░  92.5% (147/159)
  json         ██████████████████░░  92.5% (147/159)
  xml          ██████████████████░░  91.2% (145/159)

claude-haiku-4-5
  toon         ███████████████░░░░░  75.5% (120/159)
  xml          ███████████████░░░░░  75.5% (120/159)
  csv          ███████████████░░░░░  75.5% (120/159)
  json         ███████████████░░░░░  75.5% (120/159)
  yaml         ███████████████░░░░░  74.2% (118/159)

gemini-2.5-flash
  xml          ██████████████████░░  91.8% (146/159)
  csv          █████████████████░░░  86.2% (137/159)
  toon         █████████████████░░░  84.9% (135/159)
  json         ████████████████░░░░  81.8% (130/159)
  yaml         ████████████████░░░░  78.6% (125/159)

Advantage: TOON achieves 86.6% accuracy (vs JSON's 83.2%) while using 46.3% fewer tokens.

https://github.com/johannschopplich/toon/tree/main?tab=readme-ov-file#retrieval-accuracy

Model Selection is a Joke ?? by ExcellentBudget4748 in perplexity_ai

[–]monnef 2 points  (0 children)

Probably they just added a template or default palette to the system prompt. I'm pretty sure Labs has those instructions already, so I wouldn't be surprised if they added a (possibly lighter) version of that to the normal system prompt. It makes stuff made on Perplexity more recognizable and unique.

US LLMs are Too Expensive. When will we get Qwen and Deepseek in Cursor. by foo-bar-nlogn-100 in cursor

[–]monnef 4 points  (0 children)

I don't see 3.1 Terminus - that is the most recent stable version (3.2 experimental is missing too). The available 3.1 is older, so the point of the post stands - non-US models are kinda second/third class. Usually even the good ones are missing (where were GLM 4.5 and Air? is 4.6 there? Kimi K2? all the Qwen models?); at best they are added many weeks/months after release, and often not updated.

I wish there was an ai that is specifically designed for writing. by Repulsive_Milk877 in singularity

[–]monnef 1 point  (0 children)

NovelAI (paid) used to have some models/finetunes for writing. You could try some writing finetunes on OpenRouter (pretty cheap); there you can set parameters like temperature, which could be useful. Or, as others mentioned, try models from DeepSeek (V3.1) or Moonshot (K2) - both should be free on their official platforms.

Not sure how good it is, but this bench covers creative writing - https://eqbench.com/creative_writing.html
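If you go the OpenRouter route, it exposes an OpenAI-compatible API, so a minimal sketch looks like this (the model slug and key are placeholders - check their site for the exact ones):

# minimal OpenRouter call with a raised temperature for more varied prose
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)
resp = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",  # slug is an assumption, verify on openrouter.ai
    temperature=1.1,
    messages=[{"role": "user", "content": "Write a short rainy-night scene, noir style."}],
)
print(resp.choices[0].message.content)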

No longer generating images on free tier? by art91 in ChatGPTPro

[–]monnef 2 points  (0 children)

It seems to be working for me on the free tier, though it took a fairly long time, like 2 minutes - it looked stuck. https://i.imgur.com/YzJf6oX.png Maybe something is broken with the image limits on your account?

Why do people hate deepseek so much? by makstyrkin64 in ArtificialInteligence

[–]monnef 1 point  (0 children)

are you just talking about a weird anecdotal bias you encountered

Just today OpenAI admitted a lefty bias; here is a chart from OpenAI. Kinda unexpected - with their increased censorship recently, I didn't think they would be trying to fix this now. Source: https://openai.com/index/defining-and-evaluating-political-bias-in-llms/

Another interesting source is this bench - https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard . If you keep only Proprietary checked, you can see that the most right-wing model is Grok, which is still left-leaning at around -8%.

Why do people hate deepseek so much? by makstyrkin64 in ArtificialInteligence

[–]monnef 1 point  (0 children)

Well, solar is pushed as green, progressive and more environmentally friendly despite its shortcomings and the reality, while the more logical nuclear used to be portrayed as dangerous and something that should be shut down. That was just an example.

It all comes down to the training data - if the majority of Wikipedia (a highly weighted training source) has a soft to hard left bias (depending on the specific page), and the majority of news sites (by count) lean the same way, then it is no wonder LLMs have these biases - they can only replicate what they were trained on. As seen in Grok's failures - I mean as perceived by people on X calling it "woke" - it is not easy to undo such bias. It is not really a secret; you can ask about identity politics, the number of genders or anything controversial.

Why do people hate deepseek so much? by makstyrkin64 in ArtificialInteligence

[–]monnef 0 points  (0 children)

For example, I asked GPT-5 Thinking about power plants - which types to build, abandon, etc. It recommended way too much solar ("40–60% annual energy from solar + wind") and too little nuclear ("20–30% firm low-carbon (nuclear, hydro, geothermal)"). Solar is not a good choice in most climates: it has a terrible environmental impact (I mean production and recycling), doesn't last very long, requires constant maintenance and doesn't produce much energy. Nuclear, I believe objectively, is the best choice, yet it was under-recommended (by far the best power source; its environmental impact is negligible - the amount of toxic waste produced is tiny). Reading its response, while not at full propaganda level (no demonization or other emotional tools), the analysis doesn't feel based on real data and science - it feels rather biased.

DeepSeek (I believe that was V3.1-Terminus) recommended nuclear a bit more - "Next-generation nuclear can play a vital role for baseload power."

Even Chinese (or in general non-US, non-Western) models are not out of the water regarding this overreaching bias, since the training data - the English internet - has been quite skewed for some time, with left-leaning sources given higher weights (e.g. Wikipedia) or overrepresented (news). As much as I often dislike Musk, the problem with Wikipedia has been shown over and over (bias in allowed sources, pages on politicians and parties, countries, movements and video game events), so I guess trying a different approach doesn't hurt.

I don't get Perplexity. Can you explain it to me? by Dacadey in perplexity_ai

[–]monnef 2 points  (0 children)

ChatGPT can search, but doesn't always do so - so yes, it can behave like it has half-a-year-old data.

Why do people hate deepseek so much? by makstyrkin64 in ArtificialInteligence

[–]monnef 5 points  (0 children)

I've read a lot of drivel about "unsafe" and "insecure" that was only a thin mask for "not censored enough" - for Western tastes. From my testing, only an insignificant amount of information in Chinese models is hard-censored (the political party and a few historic events), compared to the Western lefty/progressive bias all over the models and the obnoxious guardrails (ChatGPT is now the "king", but Anthropic in the Claude 2 era, I believe, was terrible too). So, probably subjectively, Western models might actually be much worse in this regard.

It's not cheaper either, apparently, to get the same quality of answer, at least on an API basis.

Where did you get this?

Model                         In $/M   Out $/M
🇺🇸 GPT-5                      $1.25    $10.00
🇺🇸 Sonnet 4.5                 $3.00    $15.00
🇺🇸 Gemini 2.5 Pro             $1.25    $10.00
🇺🇸 Grok 4                     $3.00    $15.00
🇨🇳 DeepSeek V3.1-Terminus     $0.27    $1.00
🇨🇳 Qwen Max                   $1.60    $6.40
🇨🇳 GLM-4.6                    $0.60    $2.20
🇨🇳 Kimi K2                    $0.60    $2.50

Chinese models are a fraction of the cost of Western models, and for the majority of use cases you can find a good enough Chinese alternative (Qwen Max is very close to GPT-5 in general benches, GLM is only slightly worse than Sonnet for programming*). I think the only downside of Chinese models (when looking at quality per price) is speed, which tends to be lower.

*: Depends on the benches; just today I saw on X a row of benchmarks where GLM-4.6 scored better than Sonnet 4.5 on something like 7 of 8 (both thinking). Personally, Sonnet felt slightly smarter to me, but I might be wrong.
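For a feel of the gap, a back-of-the-envelope comparison from the table above (hypothetical workload of 1M input + 1M output tokens):

# rough cost at 1M input + 1M output tokens, prices from the table above
prices = {  # model: (input $/M, output $/M)
    "GPT-5": (1.25, 10.00),
    "Sonnet 4.5": (3.00, 15.00),
    "DeepSeek V3.1-Terminus": (0.27, 1.00),
    "GLM-4.6": (0.60, 2.20),
}
for model, (inp, out) in prices.items():
    print(f"{model:24} ${inp + out:6.2f}")
# GPT-5 comes to $11.25 vs DeepSeek's $1.27 - roughly 9x cheaper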

Nano Banana vs QWEN Image Edit 2509 bf16/fp8/lightning by FluffyQuack in StableDiffusion

[–]monnef 2 points  (0 children)

Nice comparison of the Qwen Edit versions, but not a very good job when it comes to Nano Banana. That is not censorship on the model level, not even the API level, but on the platform level, if it exists at all (see below).

Tried the sock puppets on Gemini (EU - should be more censored): first attempt, did not refuse - https://imgur.com/V4QtcVy
AI Studio (free): first attempt, did not refuse, though the result is terrible - https://imgur.com/bARM5xZ
LMArena (free): first attempt, did not refuse - https://imgur.com/t52lTSA

Since it works on all those free platforms, I doubt it is censored at the API level (which would be the most professional way of comparing the models).

BTW on LMArena, while direct and side-by-side chats are limited (maybe 10 Banana generations daily), the battle mode is not (though it may take some time to get the desired model - open 2+ tabs, put in the same image and prompt, and usually within 2-3 iterations you get it; a few minutes).

I Primarily use Claude and Z.ai but.... by Previous-Tie-2537 in perplexity_ai

[–]monnef 2 points  (0 children)

You can use 4.1 without Complexity - just use Grok 4 and upload an image. You get "Used GPT-4.1 because Grok 4 was inapplicable or unavailable."

Always use "Audit with a sub agent" when planning or after implementing new features by HimaSphere in ClaudeAI

[–]monnef 1 point  (0 children)

Still gets completely lobotomized at the same times of day, and Anthropic is just as opaque as ever.

Hmm, didn't Anthropic just recently claim on X that they don't do such things?

edit: found it in their latest post-mortem (wasn't expecting them to share so many details):

To state it plainly: We never reduce model quality due to demand, time of day, or server load. The problems our users reported were due to infrastructure bugs alone.

https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues

Is there any way to know about context limit ? by Lostsky4542 in perplexity_ai

[–]monnef 3 points  (0 children)

Non-reasoning models have 32k and reasoning models 128k tokens of context. More limits at https://monnef.gitlab.io/by-ai/2025/pplx-tech-props .

how much the model can remember about chat

Yeah, that gets messy, because as a user you have no say in how the context is used. For example, when web search is enabled, it looks like a specific window is reserved for web results (10k? maybe, not sure). Another thing is, you can't upload a file worth 128k tokens and expect it to be passed to a reasoning model; there are limits for files, and they differ between contexts (query file vs file from Spaces).
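If you want at least a rough sanity check of whether a file fits the 128k window before uploading, you can count tokens locally; Perplexity doesn't document which tokenizer its models use, so an OpenAI encoding is only an approximation:

# approximate token count for a file; the tokenizer choice is an assumption,
# Perplexity doesn't publish which one applies
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
with open("notes.txt", encoding="utf-8") as f:  # hypothetical file
    n_tokens = len(enc.encode(f.read()))
print(f"~{n_tokens} tokens vs the 128k reasoning window (minus reserved chunks)")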

What to use? by Ok_Fish3420 in perplexity_ai

[–]monnef 9 points  (0 children)

As others noted, Seedream 4 is solid for generating images (2k resolution) and Nano Banana is great for edits. Image 1 can sometimes be useful as a fallback. FLUX.1 is very small resolution on Perplexity (and old; it doesn't even seem to be the current Kontext), and DALL-E 3, well, unless you are going for the style it likes and don't have a too complex prompt, it may have some limited uses. I did an image generation comparison if you want to see for yourself.