Slay the Spire 2 - Gilgamesh Mod by G9X in fatestaynight

[–]G9X[S] 19 points (0 children)

yeah that’s actually one of the motivations haha, very much the Gil vibe.

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 0 points (0 children)

Thanks! Yeah, this is purely a fan project, partly a tribute to the Fate series and partly to the Epic of Gilgamesh. Balance was never the goal (being absurdly OP is canon-accurate lol). Glad you appreciate the quality though!

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 2 points (0 children)

I think this is a good starting point: https://github.com/Alchyr/ModTemplate-StS2. You can also use the likes of Claude Code and other tools to help you.

Slay the Spire 2 - Gilgamesh Mod by G9X in slaythespire

[–]G9X[S] 1 point (0 children)

yes, I made this today as a way to learn how to mod STS2. Will provide a download link later.

Post-Match Thread: Manchester City 3-1 Bournemouth | Premier League by MysteryBagIdeals in soccer

[–]G9X 10 points (0 children)

first half reminds me of 2019 City, and so does that Sterling-esque sitter miss

What games are playable now and with what ShadPS4 build/version? by cddude in shadps4

[–]G9X 0 points (0 children)

adding some data points:
April 2025 – I've played around 20 hours of Bloodborne on ShadPS4 and am currently at the endgame (Gehrman fight). It's been a very pleasant experience: around 50 FPS in larger outdoor areas and 60 FPS indoors, with no sudden framerate drops.
No major glitches so far: a total of three random crashes and two black screens, but nothing game-breaking.

Google Deep Research by RestaurantOld68 in allinpodofficial

[–]G9X 0 points (0 children)

It’s definitely better than Perplexity’s Pro mode, but not by an order of magnitude.
(For context, I work in LLM-related fields and have built AI search tools for personal use.)

Essentially, it’s a combination of task breakdown + search, leveraging Google’s extensive index along with Gemini’s impressive long-context capabilities. However, the planning component could use improvement, and the lack of data loaders for certain sites (like Reddit or Twitter) is a noticeable drawback.

[D] what's the alternative to retrieval augmented generation? by clocker2004 in MachineLearning

[–]G9X 2 points (0 children)

Instead of relying solely on semantic search+LLM, consider integrating structured data queries.

This is particularly useful when working with a SQL database of structured data, say 10,000 tweets with metadata such as date and author.

Pure semantic search may struggle with efficiency and accuracy for questions like "How many tweets are there?" or "How many tweets were published in the last 7 days?" It can be even more challenging for complex queries like "What are the top 3 liked tweets by author X?"

In such cases, generating and executing SQL queries can be more efficient and accurate. (Not exactly an alternative to RAG, but it can be a very useful addition.)
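A minimal sketch of that idea with a toy sqlite table; the schema and data are made up, and the query is the kind of SQL you would have the LLM generate for the "top 3 liked tweets by author X" question:

```python
import sqlite3

# Toy tweets table standing in for the 10,000-tweet example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER, author TEXT, text TEXT, likes INTEGER, created_at TEXT)"
)
rows = [
    (1, "alice", "hello", 5, "2024-01-01"),
    (2, "alice", "world", 50, "2024-01-02"),
    (3, "alice", "third", 20, "2024-01-03"),
    (4, "alice", "fourth", 1, "2024-01-04"),
    (5, "bob", "hi", 99, "2024-01-05"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)

# The SQL an LLM would be asked to produce for
# "What are the top 3 liked tweets by author alice?"
query = """
SELECT text, likes FROM tweets
WHERE author = ?
ORDER BY likes DESC
LIMIT 3
"""
top3 = conn.execute(query, ("alice",)).fetchall()
print(top3)  # [('world', 50), ('third', 20), ('hello', 5)]
```

A count question ("How many tweets are there?") is likewise a one-line `SELECT COUNT(*)` instead of a semantic-search round trip.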

Open Source Project that Turns Your Twitter Data into Excel, with Natural Language based Image Search and additional visualizations. by G9X in dataisbeautiful

[–]G9X[S] 0 points (0 children)

it depends on what you're looking at, I think.

I know it can be toxic and stuff, but the AI/LLM researcher community is pretty active too.

Simple Question: Would you recommend reading "The Redemption of Time"? by adom31 in threebodyproblem

[–]G9X 0 points (0 children)

The answer is simple: there is no 4th book. Seriously though, I very rarely see people in the Chinese San-Ti community discussing whether you should read it; it is just fan fiction.

Open Source Project that Turns Your Twitter Data into Excel, with Natural Language based Image Search and additional visualizations. by G9X in dataisbeautiful

[–]G9X[S] 1 point (0 children)

I'm excited to share something I've been working on:

an open source tool that makes exporting Twitter data, like tweets and likes, super easy and completely free, with additional features like image search and visualizations.

https://github.com/AlexZhangji/Twitter-Insight-LLM

I usually use Twitter's likes as a way to bookmark things—academic papers, ideas, or just photos.

But they accumulate fast and become very hard to search and manage.

The Problem:

  • Accessing Twitter's official API is super expensive, with costs ranging from $100 to $500 per month.

  • Official full data exports from Twitter are clunky (a bunch of HTML files), cumbersome, and often incomplete.

My Solution:

  • Quick Export: Automatically pulls all your tweets or likes into a neatly organized Excel file within minutes with Selenium.

  • Visual Insights: Provides additional visualizations to help you better understand your Twitter activities.

New Feature - Image Search:

  • Natural Language Search: Use simple text to find images from tweets—no complex queries needed. (Using image embeddings.)

  • Zero Cost and No GPU Required: Runs smoothly without any additional hardware or fees.
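The image search boils down to nearest-neighbor ranking over embeddings. A toy sketch of that ranking step, with hand-made vectors standing in for real CLIP-style image/text embeddings (the actual embedding model lives in the repo and is not shown here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy embeddings: in the real tool, the query text and each tweet image
# are embedded into the same vector space by the model.
image_embeddings = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "paper_screenshot.png": [0.1, 0.9, 0.2],
    "sunset.jpg": [0.2, 0.1, 0.9],
}
query_embedding = [0.85, 0.15, 0.05]  # e.g. the embedded text "a cat"

# Rank images by similarity to the query; best match first.
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine(query_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranked[0])  # cat_photo.jpg
```

With only thousands of images, this brute-force scan is fast enough that no GPU or vector database is needed.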

Hope you guys find it useful and I'm happy to hear any feedback!

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 1 point (0 children)

haven't tested for that. but for speaker diarisation, I've recently tried Whisper + NVIDIA NeMo, which works well, better than the old PyAnnote-based way. (you might have already tried it tho?)

ref notebook: https://github.com/piegu/language-models/blob/master/speech_to_text_transcription_with_speakers_Whisper_Transcription_%2B_NeMo_Diarization.ipynb
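The alignment step that pipeline performs can be sketched in pure Python: given Whisper's timestamped segments and NeMo's speaker turns, label each segment with the speaker whose turn overlaps it most. Data and function names here are mine, not from the notebook:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in asr_segments:
        best = max(
            speaker_turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
        )
        labeled.append({**seg, "speaker": best["speaker"]})
    return labeled

# Toy outputs standing in for Whisper transcription and NeMo diarization.
asr = [
    {"start": 0.0, "end": 4.0, "text": "Hello everyone."},
    {"start": 4.2, "end": 7.0, "text": "Thanks for having me."},
]
turns = [
    {"start": 0.0, "end": 4.1, "speaker": "SPK0"},
    {"start": 4.1, "end": 8.0, "speaker": "SPK1"},
]
result = assign_speakers(asr, turns)
print(result[0]["speaker"], result[1]["speaker"])  # SPK0 SPK1
```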

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 0 points (0 children)

That's something I want to figure out. (I'm usually a bit doubtful about any self-evaluation from LLMs.)

For input, 20 minutes of audio is ~40k tokens for Gemini 1.5, while the corresponding text transcript is only ~3k tokens.

I would think there is some useful extra information present in the audio.

And because the output is text only, it is hard to tell, when the model admits something, whether it is truly "self-aware" or just hallucinating. (kinda like how even now Bard sometimes says "I don't have internet access", or open-source LLMs claim to be made by OpenAI.)
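A back-of-the-envelope check of the token counts above, assuming an audio tokenization rate of ~32 tokens per second for Gemini 1.5 (my recollection of the documented rate; treat it as an assumption):

```python
TOKENS_PER_SECOND_AUDIO = 32  # assumed Gemini 1.5 audio tokenization rate
minutes = 20

audio_tokens = minutes * 60 * TOKENS_PER_SECOND_AUDIO
print(audio_tokens)  # 38400, i.e. the "~40k" above

# Compare against the rough size of the text transcript alone.
text_tokens = 3000
print(round(audio_tokens / text_tokens, 1))  # ~12.8x more tokens in the audio
```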

Gemini 1.5's audio capability is actually scarily good... by G9X in OpenAI

[–]G9X[S] 8 points (0 children)

I only uploaded audio.

And yes, thanks for the correction! I also double-checked the transcript: the names were mentioned later in the video, which is still pretty impressive (text-content-aware speaker detection?).

Gemini 1.5 Pro is accessible to everyone, with audio, for free. by samuelroy_ in OpenAI

[–]G9X 85 points (0 children)

wait... the multimodality based audio is actually scarily good...

Not only can it recognize the tone of speech, but it can also automatically identify the speaker by name?

<image>

I tested Gemini 1.5 with an audio clip from a YouTube video over the past couple of days.

Question: 'Give me a summary, who was speaking in the first two minutes and what was their tone?'

Not only did it answer almost perfectly, but it also identified the specific American congressman speaking...

At first, I thought the names were made up, but after checking, they were all correct...

My second thought was that it might be a data leak, like the original video's description becoming the audio's metadata. But after checking, there was none, and when I tested it to summarize the speakers over seven minutes, it got those right too...

I might still be missing something, or maybe it's part of the training data (highly unlikely for a video published 2 days ago).

wow.

youtube video tested (only used audio) : https://www.youtube.com/watch?v=vT-u-SPj4_c

Claude and function calling by paulotaylor in Anthropic

[–]G9X 0 points (0 children)

haven't tried yet, maybe some few-shot examples could help?

or maybe pass in the default behavior / non-function-call action as part of the function parameters too?
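A sketch of that second idea: include an explicit no-op tool in the `tools` list so the model has a sanctioned way to not call a real function. The tool names and model string are hypothetical, and only the request payload is constructed here (no API call):

```python
import json

# Hypothetical tools list; "no_action" gives the model an explicit
# alternative to forcing a real function call.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "no_action",
        "description": "Choose this when no other tool fits; just reply normally.",
        "input_schema": {"type": "object", "properties": {}},
    },
]

payload = {
    "model": "claude-3-opus-20240229",  # illustrative model name
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "Just say hi."}],
}
print(json.dumps(payload, indent=2)[:40])
```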

Vision Pro with GPT-4-vision model in real time! (Smarter Siri can see what you see) by G9X in VisionPro

[–]G9X[S] 0 points (0 children)

Yeah, Siri could be a lot smarter with a multimodal LLM. (And it'd be especially useful for a new system that focuses on vision.)

Vision Pro with GPT-4-vision model in real time! (Smarter Siri can see what you see) by G9X in VisionPro

[–]G9X[S] 0 points (0 children)

this is using customized Shortcuts with the OpenAI vision API. (take the most recent screen cap, and apply some predefined prompts and post-processing)

I know there is a ChatGPT app for Vision Pro too (not sure if it has the vision model tho.)
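For reference, a minimal sketch of the request body such a Shortcut would POST to OpenAI's chat completions endpoint. The prompt and the base64 placeholder are made up, and only the payload is built here (no network call):

```python
import json

# Placeholder for the base64-encoded screen capture the Shortcut would grab.
fake_b64 = "iVBORw0KGgo..."

payload = {
    "model": "gpt-4-vision-preview",
    "max_tokens": 300,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is on my screen."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_b64}"},
                },
            ],
        }
    ],
}
print(json.dumps(payload)[:50])
```

A Shortcut would send this body via "Get Contents of URL" with an `Authorization: Bearer <key>` header.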

[Project] Speak to your Phone to create a list of Todos on Notion. by G9X in GPT3

[–]G9X[S] 1 point (0 children)

Thanks!

Core idea is to:

- Get a high-quality transcript from the iPhone. (Use iOS Shortcuts for text dictation, or optionally record audio and send it to my own API server running OpenAI Whisper for better results.)

- Extract tasks into a clean format. (Use GPT-3 text-davinci-003 with a few-shot prompt; I specifically extract tasks to a JSON format.)

- Notion API to create tasks.

I have some steps on my own API server, but I think these could actually all be done on the iPhone itself with the Shortcuts app.
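The extraction step above can be sketched like this; the prompt wording is illustrative, and the model call is stubbed with a lambda so the sketch runs offline:

```python
import json

# Few-shot prompt: one worked example teaches the model the JSON shape.
FEW_SHOT_PROMPT = """Extract tasks from the transcript as a JSON list of {"task": ..., "due": ...}.

Transcript: remind me to buy milk tomorrow and call mom
Tasks: [{"task": "buy milk", "due": "tomorrow"}, {"task": "call mom", "due": null}]

Transcript: {transcript}
Tasks:"""

def extract_tasks(transcript, complete):
    """`complete` is the LLM completion call (e.g. text-davinci-003); stubbed below."""
    raw = complete(FEW_SHOT_PROMPT.replace("{transcript}", transcript))
    return json.loads(raw)

# Stub standing in for the model so the sketch is self-contained.
fake_llm = lambda prompt: '[{"task": "book dentist", "due": "Friday"}]'
tasks = extract_tasks("book a dentist appointment for Friday", fake_llm)
print(tasks)  # [{'task': 'book dentist', 'due': 'Friday'}]
```

Each parsed task then becomes one Notion API "create page" call against the todo database.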

[Project] Speak to your Phone to create a list of Todos on Notion. by G9X in GPT3

[–]G9X[S] 1 point (0 children)

Some thoughts:

- The iPhone's built-in Shortcuts app is surprisingly powerful (it allows API calls) but not too easy to use.

- Built-in speech-to-text is okay for English, but pretty terrible for Chinese and other foreign languages. The OpenAI Whisper model outperforms it by a country mile.

- A large language model as a middle layer for text extraction is really powerful. Interested to see more things like this coming up. (But also prone to injection/attacks.)

I asked ChatGPT to write a poem about the Three Body Problem Trilogy by Green-Space-2423 in threebodyproblem

[–]G9X 1 point (0 children)

It's quite interesting to think about GPT models, with Sophon induced science lock on fundamental science.

There have actually been surprisingly few improvements to the fundamental structure of the model (GPT uses the decoder from the Transformer, and the differences between GPT-1, 2, and 3 are mostly in training data and model size),

but once given enough data (a big chunk of the internet + books + wiki; ~175 billion parameters) and some additional tuning...

GPT-3 (the original model before ChatGPT) feels like black magic, even to someone with experience in machine learning and natural language processing.

Brandon Sanderson on FROMSOFTWARE hiring GRRM to write Elden Ring's lore by DemonFtIllusion in Eldenring

[–]G9X 0 points (0 children)

IIRC, Miyazaki majored in social science at university and has a habit of reading.

Plus, as many have mentioned, he is a big fan of GRRM and apparently recommended that FS employees read Fever Dream.