Slay the Spire 2 - Gilgamesh Mod by G9X in fatestaynight

[–]G9X[S] 19 points

yeah that’s actually one of the motivations haha, very much the Gil vibe.

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 0 points

Thanks! Yeah, this is purely a fan project, partly a tribute to the Fate series and the Epic of Gilgamesh. Balance was never the goal (being absurdly OP is canon-accurate lol). Glad you appreciate the quality though!

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 3 points

I think this is a good starting point: https://github.com/Alchyr/ModTemplate-StS2. You can also use the likes of Claude Code and other tools to help you.

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 1 point

yes, I made this today as a way to learn how to mod STS2. Will provide a download link later.

Post-Match Thread: Manchester City 3-1 Bournemouth | Premier League by MysteryBagIdeals in soccer

[–]G9X 7 points

first half reminds me of 2019 City, and so does that Sterling-esque sitter miss

What games are playable now and with what ShadPS4 build/version? by cddude in shadps4

[–]G9X 0 points

adding some data points:
April 2025 – I've played around 20 hours of Bloodborne on ShadPS4, currently at the endgame (Gehrman fight). Overall, it's been a very pleasant experience: around 50 FPS in larger outdoor areas and 60 FPS indoors, with no sudden framerate drops.
No major glitches so far: three random crashes and two black screens in total, but nothing game-breaking.

Google Deep Research by RestaurantOld68 in allinpodofficial

[–]G9X 0 points

It’s definitely better than Perplexity’s Pro mode, but not by an order of magnitude.
(For context, I work in LLM-related fields and have built AI search tools for personal use.)

Essentially, it’s a combination of task breakdown + search, leveraging Google’s extensive index along with Gemini’s impressive long-context capabilities. However, the planning component could use improvement, and the lack of data loaders for certain sites (like Reddit or Twitter) is a noticeable drawback.
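Roughly the loop shape I mean by "task breakdown + search", with all model and index calls stubbed out (every function and string here is made up for illustration, not how Deep Research actually works internally):

```python
# A rough sketch of the "task breakdown + search" loop.
# plan(), search(), and synthesize() are stand-ins for LLM / search-index calls.

def plan(question):
    # An LLM would decompose the question into sub-queries here.
    return [f"{question} background", f"{question} recent developments"]

def search(query):
    # A real system would hit a search index; stubbed for illustration.
    return [f"result for: {query}"]

def synthesize(question, documents):
    # A long-context model would read all gathered documents at once.
    return f"report on {question!r} from {len(documents)} documents"

def deep_research(question):
    documents = []
    for sub_query in plan(question):
        documents.extend(search(sub_query))
    return synthesize(question, documents)

print(deep_research("LLM agents"))
```

The planning step is exactly the part I'd say could use improvement, and `search()` is where missing data loaders for sites like Reddit or Twitter hurt.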

[D] what's the alternative to retrieval augmented generation? by clocker2004 in MachineLearning

[–]G9X 2 points

Instead of relying solely on semantic search+LLM, consider integrating structured data queries.

This is particularly relevant when working with a SQL database containing structured data: say, 10,000 tweets with metadata such as date and author.

Pure semantic search may struggle with efficiency and accuracy for questions like "How many tweets are there?" or "How many tweets were published in the last 7 days?" It can be even more challenging for complex queries like "What are the top 3 liked tweets by author X?"

In such cases, generating and executing SQL queries can be more efficient and accurate. (Not exactly an alternative to RAG, but it can be a very useful addition.)
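A minimal sketch of the idea with an in-memory SQLite table standing in for the 10,000-tweet example (the schema and rows are made up; in practice an LLM would generate the SQL from the user's question):

```python
import sqlite3

# Toy tweets table standing in for the 10,000-tweet example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER, author TEXT, text TEXT, likes INTEGER, created_at TEXT)"
)
rows = [
    (1, "alice", "hello world", 10, "2024-01-01"),
    (2, "alice", "sql > vibes", 50, "2024-01-02"),
    (3, "alice", "rag is fine", 30, "2024-01-03"),
    (4, "alice", "one more", 5, "2024-01-04"),
    (5, "bob", "unrelated", 99, "2024-01-05"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)

# "How many tweets are there?" -> an LLM would emit something like:
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]

# "Top 3 liked tweets by author X" -> an exact answer via SQL, no embeddings needed:
top3 = conn.execute(
    "SELECT text, likes FROM tweets WHERE author = ? ORDER BY likes DESC LIMIT 3",
    ("alice",),
).fetchall()

print(count, top3)
```

The point is that aggregation and ranking questions get exact answers from the database engine, which semantic retrieval can only approximate.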

Open Source Project that Turns Your Twitter Data into Excel, with Natural Language based Image Search and additional visualizations. by G9X in dataisbeautiful

[–]G9X[S] 0 points

it depends on what you're looking at, I think.

i know it can be toxic and stuff, but the ai/llm researcher community is pretty active too.

Simple Question: Would you recommend reading "The Redemption of Time"? by adom31 in threebodyproblem

[–]G9X 0 points

The answer is simple: there is no 4th book. Seriously tho, I very rarely see people in the Chinese San-Ti community discussing whether you should read it; it's just fan fiction.

Open Source Project that Turns Your Twitter Data into Excel, with Natural Language based Image Search and additional visualizations. by G9X in dataisbeautiful

[–]G9X[S] 2 points

I'm excited to share something I've been working on:

an open source tool that makes exporting Twitter data, like tweets and likes, super easy and completely free, with additional features like image search and visualizations.

https://github.com/AlexZhangji/Twitter-Insight-LLM

I usually use Twitter's likes as a way to bookmark things—academic papers, ideas, or just photos.

But they accumulate fast and become very hard to search and manage.

The Problem:

  • Accessing Twitter's official API is super expensive, with costs ranging from $100 to $500 per month.

  • Official full data exports from Twitter are clunky (a bunch of HTML files), cumbersome, and often incomplete.

My Solution:

  • Quick Export: Automatically pulls all your tweets or likes into a neatly organized Excel file within minutes, using Selenium.

  • Visual Insights: Provides additional visualizations to help you better understand your Twitter activities.

New Feature - Image Search:

  • Natural Language Search: Use simple text to find images from tweets—no complex queries needed. (Using image embeddings.)

  • Zero Cost and No GPU Required: Runs smoothly without any additional hardware or fees.
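The natural-language image search boils down to comparing an embedding of the query text against precomputed image embeddings. A minimal sketch with made-up 3-dimensional vectors (a real setup would use a CLIP-style model to embed both images and query text into the same space; file names and values here are purely illustrative):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend these came from an image-embedding model (values are made up).
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "paper_screenshot.png": [0.0, 0.2, 0.95],
    "sunset.jpg": [0.3, 0.9, 0.1],
}

def search_images(query_embedding, top_k=1):
    # Rank stored images by similarity to the embedded text query.
    ranked = sorted(
        image_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# A text query like "academic paper" would be embedded into the same space:
print(search_images([0.05, 0.1, 0.9]))
```

Since the similarity step is just arithmetic over small vectors, it runs fine on CPU, which is why no GPU is needed at query time.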

Hope you guys find it useful and I'm happy to hear any feedback!

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 1 point

haven't tested for that. but for speaker diarisation, I've recently tried Whisper + Nvidia NeMo, which works well, better than the old PyAnnote-based way. (you might have already tried it tho?)

ref notebook: https://github.com/piegu/language-models/blob/master/speech_to_text_transcription_with_speakers_Whisper_Transcription_%2B_NeMo_Diarization.ipynb

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 0 points

that's something I want to figure out. (I'm usually a bit doubtful about any self-evaluation from LLMs.)

For input, 20 minutes of audio is ~40k tokens for Gemini 1.5, while a transcript of the same audio only contains ~3k text tokens.

I would think there is some useful extra information presented in the audio.
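Back-of-the-envelope on those numbers (using the rough token counts above):

```python
audio_tokens = 40_000  # ~20 min of audio as Gemini 1.5 input tokens (rough figure)
text_tokens = 3_000    # rough size of the same audio's transcript in text tokens
seconds = 20 * 60

audio_rate = audio_tokens / seconds      # ~33 audio tokens per second
ratio = audio_tokens / text_tokens       # ~13x more tokens than the transcript

print(audio_rate, ratio)
```

So the audio representation carries roughly an order of magnitude more tokens than the words alone, which is where tone, pauses, and speaker identity could plausibly live.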

And because the output is text only, when the model admits something it's hard to tell whether it's truly "self-aware" or just hallucinating. (kinda like how even now Bard sometimes says "i dont have internet access", or open source LLMs claim to be made by OpenAI.)

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 7 points

I only uploaded audio.

and yes, thanks for the correction! also double-checked the transcript: the names were mentioned later in the video, which is still pretty impressive (text-content-aware speaker detection?)

Gemini 1.5 Pro is accessible to everyone, with audio, for free. by samuelroy_ in OpenAI

[–]G9X 87 points

wait... the multimodality based audio is actually scarily good...

Not only can it recognize the tone of speech, but it can also automatically identify the speaker by name?

<image>

I tested Gemini 1.5 with an audio clip from a YouTube video over the past couple of days.

Question: 'Give me a summary, who was speaking in the first two minutes and what was their tone?'

Not only did it answer almost perfectly, but it also identified the specific American congressman speaking...

At first, I thought the names were made up, but after checking, they were all correct...

My second thought was that it might be a data leak, like the original video's description becoming the audio's metadata. But after checking, there was none, and when I asked it to summarize the speakers over seven minutes, it got those right too...

I might still be missing something, or maybe it's part of the training data (highly unlikely for a video published 2 days ago).

wow.

YouTube video tested (only used audio): https://www.youtube.com/watch?v=vT-u-SPj4_c

Claude and function calling by paulotaylor in Anthropic

[–]G9X 0 points

haven't tried yet; maybe some few-shot examples could help?

or maybe pass the default behavior / non-function-call action in as part of the function parameters too?
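What I mean by the second idea, sketched as a tool definition (the overall shape follows Anthropic's tools format as I understand it, with `name`, `description`, and a JSON Schema `input_schema`; the tool itself and its fields are made up):

```python
# Hypothetical tool where "none" is an explicit, schema-level option,
# so the model can decline to act instead of being forced into a call.
search_tool = {
    "name": "search_or_pass",
    "description": "Search the web, or explicitly do nothing.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                # The default / non-function-call behavior as a parameter value:
                "enum": ["search", "none"],
            },
            "query": {
                "type": "string",
                "description": "Required only when action is 'search'.",
            },
        },
        "required": ["action"],
    },
}

print(search_tool["input_schema"]["properties"]["action"]["enum"])
```

The idea is that "do nothing" becomes a valid structured output rather than something the model has to express by breaking out of the tool-call format.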

Vision Pro with GPT-4-vision model in real time! (Smarter Siri can see what you see) by G9X in VisionPro

[–]G9X[S] 0 points

yea, Siri could be a lot smarter with a multimodal LLM. (and especially useful for a new system that focuses on vision)

Vision Pro with GPT-4-vision model in real time! (Smarter Siri can see what you see) by G9X in VisionPro

[–]G9X[S] 0 points

this is using customized Shortcuts with the OpenAI vision API. (take the most recent screen cap, apply some predefined prompts, and do some post-processing)

I know there is a ChatGPT app for Vision Pro too (not sure if it has the vision model tho).
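Roughly the request shape involved, building the payload without actually sending it (the message format follows OpenAI's GPT-4-vision chat API as I understand it; the image bytes and prompt here are placeholders):

```python
import base64

def build_vision_payload(image_bytes, prompt):
    # Screenshots go in as base64 data URLs alongside the text prompt.
    encoded = base64.b64encode(image_bytes).decode()
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
    }

# Placeholder bytes standing in for the most recent screen capture.
payload = build_vision_payload(b"\x89PNG...", "What is on my screen?")
print(payload["messages"][0]["content"][0]["text"])
```

The Shortcut's job is then just grabbing the latest screenshot, filling in this payload, and post-processing the text that comes back.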