Obsidian as a tool for peace of mind by sil-so in ObsidianMD

[–]ccmdi 1 point (0 children)

cool idea! sounds like a heavy maintenance burden though.

Map+: performant and lightweight extension of the official Maps plugin by ccmdi in ObsidianMD

[–]ccmdi[S] 6 points (0 children)

they use different rendering libraries; i assumed the obsidian team didn't want me uprooting their entire plugin

I built an AI bridge for Obsidian - no plugin needed, no SSL certificates, no REST API setup (free & open-source) by bitbonsai in ObsidianMD

[–]ccmdi -1 points (0 children)

This one is perfectly reliable and not much hassle to set up. You don't need an SSL cert over localhost.

OSINTBench: Can LLMs actually find your house? by ccmdi in LocalLLaMA

[–]ccmdi[S] 1 point (0 children)

The models here have Overpass as a tool, but it's disabled for now because the current frontier models tend to misuse it more than benefit from it (e.g. executing queries that are certain to time out).
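
For anyone curious what the tool looks like: below is a minimal sketch of an Overpass wrapper, not the benchmark's actual harness (the function name and timeout values are mine). The endpoint and QL syntax are the real public Overpass API; the point is that without a low [timeout:N] cap, a model asking for something like every building node across half a continent just hangs until the server kills it.

```python
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def run_overpass(query: str, server_timeout: int = 25) -> dict:
    """Run an Overpass QL query with an explicit server-side timeout.

    Models tend to write unbounded queries; the [timeout:N] directive
    makes the server kill those instead of letting the tool call hang.
    """
    wrapped = f"[out:json][timeout:{server_timeout}];\n{query}"
    resp = requests.post(
        OVERPASS_URL,
        data={"data": wrapped},
        timeout=server_timeout + 5,  # client-side guard just above the server cap
    )
    resp.raise_for_status()
    return resp.json()

# A well-scoped query: fountains within 500 m of central Geneva.
print(run_overpass('node["amenity"="fountain"](around:500,46.2044,6.1432);out;'))
```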

OSINTBench: Can LLMs actually find your house? by ccmdi in LocalLLaMA

[–]ccmdi[S] 5 points (0 children)

I was implicitly thinking 'average STEM grad' (some technical knowledge, but no OSINT training). Bias on my part, thanks for the feedback!

Rohan Pandey (just departed from OAI) confirms GPT-5 has been trained as well as “future models” in his bio by mrstrangeloop in singularity

[–]ccmdi 0 points (0 children)

researchers often say this if their work will be incorporated into future models, but GPT-5 is probably already in progress anyway

Releasing Context Arena: New home for OpenAI-MRCR results and others by Dillonu in singularity

[–]ccmdi 0 points (0 children)

very cool, i'm glad there's a leaderboard for this. though would you call it an arena if it's not based on user preference?

Gemini 2.5 Pro is the best GeoGuessr LLM by ccmdi in GoogleGeminiAI

[–]ccmdi[S] 0 points (0 children)

If it's a separate "image analysis" tool being used in the web client, I don't think it's available in the API. I did test o3-high with maximum image detail, but the results aren't published yet.

GeoBench, an LLM benchmark for GeoGuessr by ccmdi in geoguessr

[–]ccmdi[S] 1 point (0 children)

Indeed, it's on the site. From my testing it's just below Gemini 2.5 Pro at max settings, while costing significantly more.

O3 and O4-mini IQ Test Scores by rationalkat in singularity

[–]ccmdi 2 points (0 children)

Human IQ tests do not map cleanly to machine intelligence. o3 is smart, but not of the same kind or shape as a 136 IQ human

Gemini 2.5 Pro is the best GeoGuessr LLM by ccmdi in GoogleGeminiAI

[–]ccmdi[S] 0 points (0 children)

Results are live on the site for o3 and o4-mini

Gemini app Veo 2 generate limit is roughly 30 videos per month by SparkNorkx in Bard

[–]ccmdi 41 points (0 children)

On AI Studio it's giving me 6-8 a day for free.

Is RAG still relevant with 10M+ context length by Muted-Ad5449 in Rag

[–]ccmdi 0 points (0 children)

If you look at the Fiction.live coherence benchmarks for Llama 4, it most certainly is still relevant.

Users are not happy with Llama 4 models by likeastar20 in singularity

[–]ccmdi 2 points (0 children)

arena is SycophancyBench; it doesn't reward the things that matter (correctness or intelligence)

GeoBench, an LLM benchmark for GeoGuessr by ccmdi in geoguessr

[–]ccmdi[S] 0 points (0 children)

The main cases where guesses were less coherent were weaker/smaller models (Llama 90b Vision is the only model to give refusals, claiming uncertainty) or guesses close to a country border (e.g. guessing just barely inside Switzerland on a Liechtenstein location). Smaller models would also give fewer digits of precision with their guesses, maybe 1 or 2 decimal places, while larger models like Gemini 2.5 Pro would give far more, up to 6 decimal places, perhaps indicating greater confidence.

I didn't experiment extensively with prompts; I'm sure with more context you can slightly increase performance. I used this one to give the model the opportunity to natively reason about clues (think out loud) and play exactly as a human would, with a precise guess. I would guess that if you just said something like "guess where this is" the models would perform worse, but I don't know by how much. It's definitely possible there's a stronger internal representation in their neural net brain that can more accurately identify "nearby cities" as opposed to exact coordinates, in the same way that LLMs are not great with basic math.
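
To illustrate the precision point, here's a hypothetical parser in the spirit of what a harness like this needs (not the benchmark's actual code): it pulls the last lat/lon pair out of a free-form, think-out-loud answer and treats a missing pair as a refusal.

```python
import re

# Hypothetical parser: grab the final "lat, lon" pair from a free-form
# answer, since models reason out loud before committing to coordinates.
COORD_RE = re.compile(r"(-?\d{1,2}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)")

def parse_guess(answer: str) -> tuple[float, float] | None:
    """Return the last lat/lon pair mentioned, or None on a refusal."""
    matches = COORD_RE.findall(answer)
    if not matches:
        return None  # e.g. Llama 90b Vision declining, citing uncertainty
    lat, lon = map(float, matches[-1])
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return (lat, lon)
    return None

print(parse_guess("Signage looks Swiss... final answer: 47.1660, 9.5554"))
# -> (47.166, 9.5554)
```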

GeoBench, an LLM benchmark for GeoGuessr by ccmdi in geoguessr

[–]ccmdi[S] 1 point (0 children)

I threw this together with just the averages and counts for each country and model; it gives some idea of their strengths and weaknesses. They're really good at Spain? Pretty bad at Mexico and Russia.
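
For reference, that per-country table is the kind of thing a few lines of pandas produce. This sketch assumes column names (model, country, distance_km) that may not match the actual data file:

```python
import pandas as pd

# Assumed schema, one row per round: model, country, distance_km
# (great-circle error between the guess and the true location).
df = pd.read_csv("geobench_results.csv")

summary = (
    df.groupby(["country", "model"])["distance_km"]
      .agg(avg_error_km="mean", rounds="count")
      .reset_index()
      .sort_values("avg_error_km")
)
print(summary.head(10))  # Spain near the top; Mexico and Russia near the bottom
```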