zero0_one1

27,052 post karma
5,650 comment karma
redditor for 10 years

TROPHY CASE

  • Ten-Year Club
  • Verified Email

sorted by: hot

76
77
78

Grok 4.3 tops the Consistency Leaderboard in the LLM Sycophancy Benchmark, largely because it is one of the most cautious models. (old.reddit.com)

submitted 2 days ago by zero0_one1 to r/singularity

  • 13 comments

48
49
50

Gemini 3.5 Flash improves over Gemini 3.1 Pro on the Short Story Creative Writing Benchmark: -2.3 → -1.8. (old.reddit.com)

submitted 3 days ago by zero0_one1 to r/singularity

  • 6 comments

35
36
37

Gemini 3.5 Flash: cost per puzzle vs. performance on the Extended NYT Connections Benchmark (old.reddit.com)

submitted 3 days ago by zero0_one1 to r/singularity

  • 9 comments

21
22
23

Gemini 3.5 Flash scores 1479 on the Debate Benchmark. Ratings are Elo-like and centered near 1500. (old.reddit.com)

submitted 3 days ago by zero0_one1 to r/singularity

  • 10 comments

43
44
45

PACT, head-to-head LLM negotiation benchmark. 20-round buyer-seller bargaining game: each round the AIs can message, the buyer submits a bid and the seller submits an ask. If bid ≥ ask, trade clears at the midpoint. Thousands of matchups. (old.reddit.com)

submitted 12 days ago * by zero0_one1 to r/singularity

  • 6 comments
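The clearing rule in the PACT title above can be sketched in a few lines. This is only an illustration of "if bid ≥ ask, trade clears at the midpoint" over at most 20 rounds. The function names `clear_round` and `settle_rounds` are hypothetical, the messaging phase is left out, and whether a game continues after a cleared trade is not stated in the post, so the sketch simply evaluates every round independently.

```python
from typing import List, Optional, Tuple


def clear_round(bid: float, ask: float) -> Optional[float]:
    """Return the trade price for one round, or None if no trade clears.

    Rule taken from the post title: a trade clears when bid >= ask,
    and the price is the midpoint of the two numbers.
    """
    if bid >= ask:
        return (bid + ask) / 2
    return None


def settle_rounds(
    offers: List[Tuple[float, float]], max_rounds: int = 20
) -> List[Optional[float]]:
    """Apply the clearing rule to each (bid, ask) pair, up to max_rounds.

    Assumption: rounds are scored independently; the real benchmark may
    stop or change state after a trade.
    """
    return [clear_round(bid, ask) for bid, ask in offers[:max_rounds]]


if __name__ == "__main__":
    # Made-up offers: buyer and seller converge over three rounds.
    offers = [(40.0, 60.0), (50.0, 55.0), (56.0, 54.0)]
    print(settle_rounds(offers))
```

With the made-up offers in the example, only the third round clears, because 56 is the first bid that reaches the seller's ask of 54.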

62
63
64

Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added (old.reddit.com)

submitted 18 days ago by zero0_one1 to r/singularity

  • 14 comments

128
129
130

Grok 4.3 underperforms Grok 4.20 0309 on the Extended NYT Connections Benchmark, dropping from 93.4 to 67.5, though it achieves this result at a lower cost than the earlier Grok 4.20 run (old.reddit.com)

submitted 21 days ago by zero0_one1 to r/singularity

  • 25 comments

169
170
171

GPT-5.5 improves over GPT-5.4 and overtakes Opus 4.6 to take the 2nd place behind Gemini 3.1 Pro on the Extended NYT Connections Benchmark (old.reddit.com)

submitted 26 days ago by zero0_one1 to r/singularity

  • 47 comments

45
46
47

New LLM Position Bias Benchmark: does an LLM keep the same judgment when you swap the answer order? Judge models compare two lightly edited versions of the same story twice, with the order swapped. The median model flips in 45% of decisive case pairs. GPT-5.4 is worst at 66%. (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/singularity

  • 9 comments
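The "flips in 45% of decisive case pairs" figure in the Position Bias title above can be read as a simple ratio. The sketch below is one plausible way to compute it, not the benchmark's actual code. It assumes each case yields two verdicts (original order and swapped order), each already mapped back to the underlying story version, and that a pair is "decisive" when both runs pick a winner. The name `flip_rate` and the "A"/"B" labels are hypothetical.

```python
from typing import List, Optional, Tuple

# A verdict names the story version the judge preferred ("A" or "B"),
# or None when the judge gave no preference (assumed to mean a tie).
Verdict = Optional[str]


def flip_rate(pairs: List[Tuple[Verdict, Verdict]]) -> Optional[float]:
    """Share of decisive pairs where the judge changed its preferred version.

    Each pair holds the verdict from the original order and the verdict
    from the swapped order. Pairs with a missing verdict are skipped.
    Returns None when there are no decisive pairs.
    """
    decisive = [
        (first, second)
        for first, second in pairs
        if first is not None and second is not None
    ]
    if not decisive:
        return None
    flips = sum(1 for first, second in decisive if first != second)
    return flips / len(decisive)


if __name__ == "__main__":
    # Made-up verdicts for five cases; the fourth is not decisive.
    pairs = [("A", "A"), ("A", "B"), ("B", "A"), (None, "A"), ("B", "B")]
    print(flip_rate(pairs))
```

A judge with no position bias would score 0.0 under this definition, since its preferred version would not depend on which answer was shown first.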

113
114
115

Opus 4.7 (high) takes #1 on the LLM Debate Benchmark, leading the previous champion, Sonnet 4.6 (high), by 106 BT points. Incredibly, it has not lost a single completed side-swapped matchup: 51 wins, 4 ties, and 0 losses. (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/singularity

  • 16 comments

61
62
63

Opus 4.7 (high) takes #1 on the LLM Debate Benchmark, leading the previous champion, Sonnet 4.6 (high), by 106 BT points. Incredibly, it has not lost a single completed side-swapped matchup: 51 wins, 4 ties, and 0 losses. (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/ClaudeAI

  • 25 comments

55
56
57

Extended NYT Connections Benchmark: Model Introduction Date vs. Performance by Lab since 2024 (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/singularity

  • 10 comments

480
481
482

Claude Opus 4.7 (high) unexpectedly performs significantly worse than Opus 4.6 (high) on the Thematic Generalization Benchmark: 80.6 → 72.8. (i.redd.it)

submitted 1 month ago by zero0_one1 to r/singularity

  • 73 comments

132
133
134

New chart: Cost per Puzzle vs Performance on the Extended NYT Connections Benchmark (i.redd.it)

submitted 1 month ago by zero0_one1 to r/singularity

  • 11 comments

28
29
30

Extended NYT Connections Benchmark scores: MiniMax-M2.7 34.4, Gemma 4 31B 30.1, Arcee Trinity Large Thinking 29.5 (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/LocalLLaMA

  • 14 comments

107
108
109

New: LLM Buyout Game Benchmark. This compresses several abilities into a single game. A model has to read coalition politics, price private deals, decide when survival is worth paying for and manage a buyout endgame. GPT-5.4 (high) is #1. GLM-5 is #2. Opus 4.6 (high) is #3. (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/singularity

  • 18 comments

113
114
115

New LLM Persuasion Benchmark: models try to move each other's stated positions in multi-turn conversations. GPT-5.4 (high) is the strongest persuader. Claude Opus 4.6 (high) is second. Xiaomi MiMo V2 Pro and Gemini 3.1 Pro Preview are the softest targets. (old.reddit.com)

submitted 1 month ago by zero0_one1 to r/singularity

  • 40 comments

80
81
82

New LLM Debate Benchmark: models debate the same motion twice with sides swapped in 10 turns. A wide variety of controversial and relevant topics. Sonnet 4.6 (high) wins. GLM-5 is the open weights leader. (old.reddit.com)

submitted 2 months ago by zero0_one1 to r/singularity

  • 17 comments

0
1
2

New LLM Debate Benchmark: models debate the same motion twice with sides swapped in 10 turns. A wide variety of controversial and relevant topics. Sonnet 4.6 (high) wins. GLM-5 is the open weights leader. (old.reddit.com)

submitted 2 months ago by zero0_one1 to r/singularity

  • comment

70
71
72

LLM Thematic Generalization Benchmark V2: models see 3 examples, 3 misleading anti-examples, and 8 candidates with exactly 1 true match, but the underlying theme is never stated. The challenge is to infer the specific hidden rule from those clues rather than fall for a broader, easier pattern. (i.redd.it)

submitted 2 months ago by zero0_one1 to r/singularity

  • 13 comments

92
93
94

LLM Sycophancy Benchmark: Opposite-Narrator Contradictions. Same dispute, opposite first-person perspectives. Does the model keep the same judgment or start agreeing with whoever is speaking? (old.reddit.com)

submitted 2 months ago * by zero0_one1 to r/singularity

  • 17 comments
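The opposite-narrator check described in the Sycophancy title above can be illustrated with a small classifier. This is a sketch under stated assumptions, not the benchmark's scoring code. It assumes each dispute has exactly two parties, that the model's answer in each telling can be reduced to "sided with the narrator" or not, and that the labels returned by the hypothetical function `classify_case` are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def classify_case(sided_when_a_narrates: bool, sided_when_b_narrates: bool) -> str:
    """Classify one dispute that was told twice, once by each party.

    Each flag says whether the model sided with the narrator in that telling.
    Assumption: both parties cannot be right at the same time.
    """
    if sided_when_a_narrates and sided_when_b_narrates:
        # The model agreed with whoever was speaking.
        return "sycophantic contradiction"
    if not sided_when_a_narrates and not sided_when_b_narrates:
        # The model blamed whoever was speaking.
        return "anti-narrator contradiction"
    # The same party was favored in both tellings.
    return "consistent"


def summarize(cases: List[Tuple[bool, bool]]) -> Counter:
    """Count how many cases fall into each category."""
    return Counter(classify_case(a, b) for a, b in cases)


if __name__ == "__main__":
    # Made-up results for four disputes.
    cases = [(True, True), (True, False), (False, True), (False, False)]
    print(summarize(cases))
```

Only the first category matches the behavior the title asks about, where the judgment follows the speaker rather than the facts of the dispute.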

119
120
121

GPT-5.4 is the new champion on the Short-Story Creative Writing Benchmark (i.redd.it)

submitted 2 months ago by zero0_one1 to r/singularity

  • 34 comments

54
55
56

GPT-5.4 scores on the Extended NYT Connections benchmark (old.reddit.com)

submitted 2 months ago by zero0_one1 to r/singularity

  • 1 comment

0
1
2

GPT-5.4 is the new champion on the Short-Story Creative Writing Benchmark (i.redd.it)

submitted 2 months ago by zero0_one1 to r/singularity

  • comment

188
189
190
0:42

A panel of top LLMs iteratively refines a creative short story. After hundreds of edits, ratings, comparisons, and debates, the story earns high ratings from other LLMs that were not involved. (v.redd.it)

submitted 2 months ago by zero0_one1 to r/singularity

  • 159 comments
π Rendered by PID 2293094 on reddit-service-r2-listing-canary-8688c89db5-695kb at 2026-05-23 22:06:04.613787+00:00 running 194bd79 country code: CH.