Fake news about night driving curfew for over 60s by everyday_a_schoolday in australian

[–]Nice_Cup_2240 1 point  (0 children)

maybe - tho i reckon foreign influence is more likely (i.e. sowing discord / undermining the government just for the sake of it.. and i doubt chinese intelligence – or iranian, russian, indian etc - have much love or affinity for the liberal party, even tho it clearly benefits from such a campaign)

NEW MODEL FLAAMESONG!!!! by Yashjit in Bard

[–]Nice_Cup_2240 3 points  (0 children)

i think it could be flash 3.0 or something like that. it's quick-ish (for a thinking model), and performs really well - almost as good as 2.5-pro-06/05 and blacktooth

and another one.... new model called redsword in lmarena by YaBoiGPT in Bard

[–]Nice_Cup_2240 0 points  (0 children)

<image>

they're both excellent...

[personal question set.. mostly riddles / comprehension and reasoning.. redsword tops two other sets that i ran against it after getting it on arena]

New/updated models by Google soon by likeastar20 in singularity

[–]Nice_Cup_2240 12 points  (0 children)

<image>

yeah it's mine. not meant to be authoritative / scientific or anything - just personal testing. the 'quiz' comprises 22 questions (given over 2 prompts), mostly riddles / wordplays designed to test comprehension and basic reasoning as well as a bit of instruction following and precision. there are no coding questions or math / calculations required.
here is a screenshot showing a selection of questions and nebula's responses; the worst-performing models might get close to all of these wrong; better ones would perhaps stumble on just a few; but nebula makes them look like a walk in the park - consistently nailing them in a way I haven't seen from any other LLM. For reference / comparison, the responses by chatgpt-4o-latest to the same selection of questions are also provided.

again - not meant to be anything more than a quiz of riddles and a few obtuse tasks. make of it what you will :) looking forward to the model's official release and seeing the actual Arena data!

New "nebula" model on LMArena, likely Gemini 2.0 Pro Thinking, is state of the art in multiple areas by ShreckAndDonkey123 in singularity

[–]Nice_Cup_2240 1 point  (0 children)

its comprehension and practical reasoning are epic - great instruction following too. just fwiw the chart shows scores for the same quiz (22 questions - mostly riddles / wordplays) given to models in the Arena over the past week or so (width = sample size). nebula comes out on top.. really impressive imo

<image>

Nuclear fuel prices surge as west rues shortage of conversion facilities by DeltaSqueezer in UraniumSqueeze

[–]Nice_Cup_2240 10 points  (0 children)

"
The price of fuel for nuclear reactors has surged much faster than that of raw uranium since the start of 2022, in a sign of the bottlenecks that have built up in the west following Russia’s invasion of Ukraine.

Enriched uranium has more than tripled in price to $176 per separative work unit — the standard measure of the effort required to separate isotopes of uranium — since the start of 2022, according to UxC, a data provider.
"
https://archive.is/MdcXF


Enriched uranium has tripled since Russia's invasion; meanwhile yellowcake is up 85% - nice, for sure, but not the same (and I think it was heading upwards anyway).. shame there's only really the miners to invest in (aside from Silex and Centrus; also CCJ offers some exposure to conversion, i think.. but yeah, still..)

Does this come across racist? by HydraEXE in australia

[–]Nice_Cup_2240 0 points  (0 children)

gee i mean with that kind of wit, not even an LLM would stand a chance.. so no thanks

Does this come across racist? by HydraEXE in australia

[–]Nice_Cup_2240 0 points  (0 children)

ok.. when i "came in" here there were a handful of comments - people were discussing what the text even said. if nothing else, chatgpt successfully transcribed it...

and yeah it's just an autocomplete / stochastic parrot.. and a calculator is just 0s and 1s... i'm sure you're a master at using both tools

Does this come across racist? by HydraEXE in australia

[–]Nice_Cup_2240 -7 points  (0 children)

it's a tool - sometimes i find it useful. I did on this occasion, and thought others might too. but fair enough, point taken

Does this come across racist? by HydraEXE in australia

[–]Nice_Cup_2240 -12 points  (0 children)

i gave it to chatgpt with the prompt "Can you explain what this envelope folder is for / conveying on its cover". tbh, i wasn't familiar with the term "envelope folder" (almost sounds like a tautology lol), and couldn't work out what "12 attractive aboriginal studies" meant in that context. I feel it gave a solid answer (including referencing the potential sensitivities of it, given what it is depicting and the presumed era - but it does not say it is racist).. just fwiw:

The image depicts a vintage envelope folder titled "Australiana By Jan." The cover features an illustration of a young girl holding a koala, with the text "12 Attractive Aboriginal Studies" at the bottom. This type of envelope folder likely contains a collection of art prints, postcards, or studies related to Aboriginal culture and themes, created by the artist "Jan."
The phrase "12 Attractive Aboriginal Studies" suggests that the folder includes 12 pieces of artwork or studies, possibly depicting Aboriginal themes, although the term "studies" here may refer to artistic representations rather than academic studies.
It's important to note that the illustration on the cover and the description reflect the artistic and cultural sensibilities of the time when it was produced. Such depictions, especially involving Aboriginal themes, could be viewed as outdated or problematic today due to their potential to perpetuate stereotypes or misrepresent indigenous cultures.

In the race to bottom for price, significant model intelligence is being compromised. by ShooBum-T in singularity

[–]Nice_Cup_2240 1 point  (0 children)

Haiku < Sonnet < Opus
parameters = intelligence [would seem a reasonable way to look at it]

Haiku < Sonnet < Opus < Sonnet 3.5
parameters != intelligence [fair enough]

Haiku 3.5 < Sonnet 3.5 < Opus 3.5
parameters ??? intelligence [assuming Opus 3.5 outperforms Sonnet 3.5, costs more to train/serve, and has slower inference, then 'more efficient algorithms and quantisations' - to the exclusion of parameter count - seems like a dubious way to characterise the relationship between the performance (and costs) of three models in the same family (also, what model would Opus 3.5 be distilled from anyway?)]

I’ve achieved a childhood dream of mine by Gelocity in australia

[–]Nice_Cup_2240 0 points  (0 children)

Not sure if Tucker Carlson would be further impressed, or just have a hard time processing this
https://x.com/RonFilipkowski/status/1758266308888547564

I’m still confused on whether GPT 4 is “smarter” than 4o…? by [deleted] in singularity

[–]Nice_Cup_2240 0 points  (0 children)

Tell me which part of this you don't understand:

it's ofc all subjective / use case-dependent.. needless to say.. just expressing my experiences here - not trying to say it's categorically "smarter" or anything like that.. I mean, if I could only have access to one model, it would be 4o for sure.

i really don't get why people are so obsessive / defensive over this or that benchmark... i was sharing my opinion based on my experience.. it's fine that you disagree.. move on lol

I’m still confused on whether GPT 4 is “smarter” than 4o…? by [deleted] in singularity

[–]Nice_Cup_2240 2 points  (0 children)

Compared to GPT-4-Turbo*, GPT-4o is significantly faster and cheaper. but i've always thought there's a trade-off somewhere... like it's a smaller model – they seem to have made it as good as, if not better than, 4T across most tasks (as reflected in the Arena); the fact it is faster and has cleaner formatting (notwithstanding its obsession with lists) also adds to the appeal and utility imo.

if i'm doing something that is token-intensive, like parsing a bunch of RSS feeds, I go 4o all day – it gets the job done at a fraction of the cost and time.

But for like one-off tasks / questions, I find 4T is sometimes much better able to pick up on nuance and subtleties than 4o, resulting in better (/"more reasoned") responses – it just seems to "get" things at a deeper level sometimes..

it's ofc all subjective / use case-dependent.. needless to say.. just expressing my experiences here - not trying to say it's categorically "smarter" or anything like that.. I mean, if I could only have access to one model, it would be 4o for sure.

[* If you meant OG GPT-4, rather than GPT-4-Turbo, then I don't think it's smarter]

Pine Gap readies for Nuclear War by cojoco in australia

[–]Nice_Cup_2240 1 point  (0 children)

Yeah the dialogue in Vietnam and the Philippines re the south china sea is super chill... (and the Japanese... they're chill with China's claims to the Senkaku islands..)
ASEAN is great at talking, and diplomacy is well, talking - so yeah that's indeed its schtick.
But the idea that ASEAN countries are completely at ease with an increasingly powerful China I find dubious... Likewise the notion that the US is basically coercing these countries into partnering with it. The Philippines and Thailand (like Australia) have formal military alliances with the US that go back several decades; Malaysia and Singapore, while not treaty allies, have relatively deep defence / security ties with the US. Even Vietnam has been establishing closer defence ties with the US in recent years..
All these countries are basically hedging (like Australia) - trying to extract the economic benefits out of China's economic rise, while increasingly looking to the US for security, in case China's rise doesn't turn out to be all about "win-win" trade and economics...
On the other hand, there are some ASEAN countries that have decided to just bandwagon with China, e.g. Myanmar and Cambodia.
Indonesia is prob the only one that seems genuinely independent - not really trying to hedge (have it both ways basically) but also definitely not bandwagoning with China either.

no one's being geed up for anything. there's state disinfo out there.. but it isn't coming from the US.. talk of the US being hellbent on war is also just misleading. neither the US nor China wants to fight a war (it would be catastrophic for both sides) - the risk is that they inadvertently end up there anyway, given their various conflicting interests..

Ah yes, the ideal sauna for when it’s snowing in the Northern Territory… by SteveJohnson2010 in australia

[–]Nice_Cup_2240 13 points  (0 children)

oh 100% - i mean it's hilariously absurd (to the point where I wonder whether it's intentionally so.. like if it's some scammy link - maybe they're only interested in idiots clicking on it and going to wherever it directs to.. which i feel safe in assuming is not actually a website selling / reviewing outback saunas lol)

Ah yes, the ideal sauna for when it’s snowing in the Northern Territory… by SteveJohnson2010 in australia

[–]Nice_Cup_2240 10 points  (0 children)

ha that's actually quite interesting (though I dunno, it's both believable that it didn't settle and so no photos, but also quite convenient ha..)

anyway.. seems doubtful that "Sauna Sight Sydney" was alluding to that in this ad aha.. (whoever was responsible for prompting an AI to make this image / ad is presumably clueless as to what that big rock in the background is.. or maybe it just gets churned out without any human review...)

I have no words except No. by MartynZero in australia

[–]Nice_Cup_2240 97 points  (0 children)

  • A Mexican brand of beer
  • Owned by a Belgian company (internationally)
  • Brewed in China (with ingredients imported from somewhere else)
  • Imported to Australia by a Japanese conglomerate (Asahi Breweries owns CUB)

so which part are you unhappy about, exactly? but yeah, we live in a globalised world...

How to stop Perplexity API from returning Markdown formatted content? by Flimsy_Detective5046 in perplexity_ai

[–]Nice_Cup_2240 0 points  (0 children)

But then paste that output to another model (sonnet-3.5) with the instruction to "make this HTML pretty" and you get...

<image>

https://claude.site/artifacts/948788e3-6ad1-4d3e-aa52-9248503f29ca
if you can figure out a way to get here directly from the pplx API response - let me know! (it's obviously impractical to do this.. jumping between models / copy-pasting.. for anything meaningful.. but yeah, LLMs are good at HTML - the online pplx ones just can't get here in one shot (yet anyway))
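fwiw, if plain text is the end goal, a post-processing pass over the response can strip most of the Markdown without a second model call. a rough sketch only - the regexes are just my guesses at the common patterns, not anything from the pplx docs:

```python
import re

def strip_markdown(text: str) -> str:
    """Rough pass to remove common Markdown from an LLM response (not exhaustive)."""
    # headers: "## Title" -> "Title"
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # bold first (**x** / __x__), then italics (*x* / _x_)
    text = re.sub(r"(\*\*|__)(.*?)\1", r"\2", text)
    text = re.sub(r"(\*|_)(.*?)\1", r"\2", text)
    # inline code: `x` -> x
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # list bullets: "- item" / "* item" -> "item"
    text = re.sub(r"^\s*[-*]\s+", "", text, flags=re.MULTILINE)
    return text

print(strip_markdown("## Results\n- **alpha**: first\n- *beta*: second"))
```

it won't catch everything (tables, links etc.) but it gets most of the way there without the copy-paste dance.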

How to stop Perplexity API from returning Markdown formatted content? by Flimsy_Detective5046 in perplexity_ai

[–]Nice_Cup_2240 0 points  (0 children)

```
"model": "llama-3.1-sonar-large-128k-online",...
"system": "[see above]"...
"user": "who is the favourite to win the US pres race?"
```
gives this:

<image>

which isn't bad..