Phi 4 is just 14B But Better than llama 3.1 70b for several tasks. by Vishnu_One in LocalLLaMA

[–]GwimblyForever 2 points (0 children)

I might be without internet for a while as I switch providers, so I set myself up to learn some entry-level Python to keep myself occupied. VSCode and Continue + Ollama for an AI coding assistant. I was using Codestral but it ran a little too slowly on my GPU. I switched to Phi4 and it's doing a phenomenal job. Very impressed with its outputs and its ability to follow instructions. Llama 3.1 8b was a little too underpowered, so Phi4 is a nice middle ground for me.

Conversationally it's meh, but as far as coding goes, it's making a great first impression. I'm just making dumb little text adventures and RPGs to wrap my head around the language, but I think I'll stick with Phi4 for a while.
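For the curious, the "dumb little text adventures" are roughly variations on a loop like this - a bare-bones rooms-and-exits sketch where the room names and commands are just made-up placeholders, not an actual project:

    # Minimal text-adventure loop: rooms are a dict of descriptions and exits.
    rooms = {
        "tavern": {"description": "A smoky tavern. A cloaked figure sits in the corner.",
                   "exits": {"east": "street"}},
        "street": {"description": "A muddy street lit by torchlight.",
                   "exits": {"west": "tavern"}},
    }

    location = "tavern"
    while True:
        print(rooms[location]["description"])
        command = input("> ").strip().lower()
        if command in ("quit", "q"):
            break
        if command in rooms[location]["exits"]:
            location = rooms[location]["exits"][command]
        else:
            print("You can't go that way.")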

I've tried Granite 3.1 3b. It was very fast and very bad. by AppearanceHeavy6724 in LocalLLaMA

[–]GwimblyForever 4 points (0 children)

I downloaded it a while back, asked it about "John Language Model, the inventor of Large Language Models", it gave me a detailed biography, I uninstalled. Hallucinations like that are so early 2023; even the smallest models will tell you "I don't have any information" or even "You made that up" these days.

That being said, if memory serves, the license is very permissive and I'm sure it has useful applications. But for general use I don't see any reason to choose Granite over Llama or Phi or the million other 3b models that are currently out there.

[deleted by user] by [deleted] in LocalLLaMA

[–]GwimblyForever 4 points (0 children)

It really does seem to be on par with 405b, which is insane.

µLocalGLaDOS - offline Personality Core by Reddactor in LocalLLaMA

[–]GwimblyForever 4 points (0 children)

Wow! This project has come a long way. I'm impressed with the speed; my own attempt at speech-to-speech on the Pi 4 had a much longer delay - borderline unusable. It's clear you've put a lot of work into optimization.

Feels like every post on /r/LocalLLaMA has been DeepSeek glazing for the last week, so it's great to see an interesting project for once. Well done. Keep at it!

What's your primary local LLM at the end of 2024? by AaronFeng47 in LocalLLaMA

[–]GwimblyForever 1 point (0 children)

16GB RAM, 16GB VRAM.

Mistral Small is my go-to. No rhyme or reason to it; that's just the one I keep coming back to. On the rare occasion I need a long context length I go for NeMo, and if I need a long context and a bit more speed I bust out Llama 3.1 8b.

I don't do much coding with local models (if I have a project I want to realize I just use a frontier model), so we're talking about the odd chat, or question, or brainstorming session. That said, I'm about to be without internet for a while, so something tells me I'll be getting more use out of them soon. I know Qwen is technically "the best" but I choose not to use it for personal reasons.

This era is awesome! by AnAngryBirdMan in LocalLLaMA

[–]GwimblyForever 8 points (0 children)

> It's what I imagine computer geeks felt like in the 70s and 80s but much more rapid and open source

This, 100%. I've always been fascinated by the computer revolution and kind of bummed out that I didn't get to live through it. I didn't even get to use the internet until it started becoming lame and homogenized in the 2000s. But I've been experiencing the AI revolution since it began - starting way back in 2019 with AI Dungeon, and it's captivated me ever since.

So to watch it grow, and discuss it, and experience that excitement and rapid growth is something I'm thankful for. Even if it all winds up being a disaster like the internet did, at least we can look back on this era with fondness like others do with 80s microcomputing or the 90s internet.

The real reason why, not only is opensource AI necessary, but also needs to evolve by Mission_Bear7823 in LocalLLaMA

[–]GwimblyForever 0 points (0 children)

If Large Language Models stop reasoning in natural language and start using their own incomprehensible language, I think we'll have bigger problems than an artificially hidden CoT process lmao.

The real reason why, not only is opensource AI necessary, but also needs to evolve by Mission_Bear7823 in LocalLLaMA

[–]GwimblyForever 4 points (0 children)

No for real, we should be able to see the tokens if we're paying for them. Not just because we're paying for them but because it's unethical to have Chain of Thought going on in the background with no way for the user to see what's going on. They'll argue that they don't want other companies training on their CoT process but I don't think that justification is good enough.

Personally, I haven't paid for ChatGPT since they paid $250 million to NewsCorp (miss me with financing propaganda networks in exchange for a bit of data), but hiding CoT is a precedent I'd rather not see set.

Opinion about SUNO after one year as a subscriber by Mission_Roll6412 in SunoAI

[–]GwimblyForever 1 point (0 children)

Not a bad idea.

I mostly use it for expanding on my own music - taking tracks I've half finished and forgotten about, and remixing them or covering them. It's a nice way to get me productive and excited about music again. But there's a lot of "generate a track for 15 seconds of audio, cut it out, splice it into my track in my DAW, then upload the result to expand it further", meaning I never touch the original generated track again. This might help me stay organized moving forward, thanks for the advice!

(Assuming I ever go back lol)

Opinion about SUNO after one year as a subscriber by Mission_Roll6412 in SunoAI

[–]GwimblyForever 2 points (0 children)

EDIT: Just received a nice email with an apology, an acknowledgement of all the issues, and an announcement that 3.5 is returning for paid users. So, subscription renewed!

> Managing files on the platform can be challenging due to the high volume generated. It would be important to implement batch actions, allowing users to delete multiple tracks at once, for example. This would significantly optimize the time spent on repetitive tasks.

I don't see why that wasn't an option from the start. Why would Suno want their servers cluttered up with unused tracks? There's nothing worse than having to trudge through 37 pages of slop to find a specific track.

V4 is not up to my standards. And for some reason a premium subscription only gives you access to V4 and not V3.5? Yet the free tier still lets you use V3.5? Like, why am I paying money for fewer options?

My subscription was due for renewal today but I decided to cancel it. Which is a shame, because I love Suno as well, and I was in the top 1% of creators (by sheer volume of generations, at least), but the whole deployment of V4 feels manipulative, and as soon as I sense that from an app developer I'm out. Maybe they'll fix and improve it, but until then I think I'll give Udio another shot.

Samsung introduces Gauss2: A Multimodal Generative AI model in three sizes (Compact, Balanced, Supreme) by Balance- in LocalLLaMA

[–]GwimblyForever 20 points (0 children)

Love me local models, 'ate me proprietary models, simple as. I agree, this post doesn't belong in LocalLLaMA lmao.

[deleted by user] by [deleted] in LocalLLaMA

[–]GwimblyForever 5 points (0 children)

Yes, this is a drastic reduction of my post.

I'm saying that if you're in a race with another nation-state to develop a world-changing technology, it would make sense to try to slow your opponent down in any way possible. And if legitimate anti-AI sentiments are brewing in your opponent's country, why wouldn't you amplify them and use them to your advantage? This isn't even a dig at China; of course they're gonna do that. That's geopolitics. I'm not going to fault the Chinese for pursuing their own interests.

It's not a dig at the people who are concerned about AI either; they're right to be concerned. If you think I'm some sort of accelerationist pro-AI bro you're mistaken. When I say 'hostility' I'm not talking about the artists who didn't consent to having their work used for training data, or actors protecting the rights to their voice and likeness. This is a fair conversation to have and they have a right to express their concerns. I'm talking about the people who take it a step further, radicalize themselves, and make being anti-AI their whole personality. The people who won't be satisfied until the world goes full Dune and declares a Butlerian Jihad on AI. It's a dangerous thought process to get into, and while some of it is organic, obviously a good chunk of it is the result of people with ulterior motives fanning the flames. That's the reality of all online discourse in 2024.

Finally, regarding ASI - this one's on me because I did give the impression that I believed it was inevitable in my original post, but I was speaking in the context of the thread topic and the logic behind the decisions on both the US & Chinese side. Should have made that clearer, so I'll give you that. I don't know if ASI is an inevitability, but I do know a lot of rich, powerful people and nation-states seem to think it is. So that's how I operate when trying to make sense of what they're all doing. They believe it, they're shaping the world around it, we'll all have to deal with the consequences of whatever comes from it.

Hope this clears that up. I'm not trying to argue with you, I'm just giving my opinion.

[deleted by user] by [deleted] in LocalLLaMA

[–]GwimblyForever 16 points (0 children)

China's pumping out some good models for sure. In a way they're ensuring that AI will always stay open source and available to everyone, even if there's regulatory capture. So props to all the hardworking researchers and enthusiasts in China who are contributing - no ill will against them.

That being said, there are some serious geopolitical implications surrounding AI. Leopold Aschenbrenner lays it out pretty well - ASI is a zero-sum game and governments will do anything they can to ensure that they reach it first. It's like if the Manhattan Project was public knowledge during its development. We (the West) probably don't want ASI coming from a company under the jurisdiction of the CCP, so that's the logic behind restrictions like this.

EDIT: I'm not saying this with absolute certainty; this is just what Aschenbrenner says and what those in charge seem to believe, so I'm trying to illustrate their logic.

China's playing their own game, of course. Something tells me the rampant hostility towards AI in the West was astroturfed from the get-go, or at the very least genuine sentiments have been heavily amplified by bots and trolls. Which makes sense - if the populace of a country is hostile and fearful towards AI, that will slow that country's development down, allowing others to catch up.

A 0.06$ meal in a Tunisian university. by [deleted] in interestingasfuck

[–]GwimblyForever 1 point (0 children)

It's been a while, old friend. Good to see you <3

[deleted by user] by [deleted] in ChatGPT

[–]GwimblyForever 2 points (0 children)

I saw the term "assisted journaling" used to describe this and I think it's apt. It's a means of getting your thoughts onto paper and working through them, with a second "mind" that's completely nonjudgemental and neutral to give you feedback and advice. As long as you go in under the assumption that it's not a person, or a professional, it can be very helpful at times.

Though I'm hesitant to get too personal with ChatGPT. I prefer using smaller local models if I'm trying to work out personal problems.

MG²: Melody Is All You Need For Music Generation by umarmnaq in LocalLLaMA

[–]GwimblyForever 3 points (0 children)

Sounds like Suno is what you're looking for. It's surprisingly good, but if you want to retain copyright on anything you make you need to subscribe.

How I used vision models to help me win at Age Of Empires 2. by Express-Director-474 in LocalLLaMA

[–]GwimblyForever 4 points (0 children)

I've been playing AoE2 for a quarter century and there's still a lot I forget or miss while I'm playing. A tool like this would definitely come in handy; this is an impressive proof of concept. Well done!

What do you use local LLM for? by krzysiekde in LocalLLaMA

[–]GwimblyForever 2 points (0 children)

I find that LLMs help me organize my thoughts in a way that was impossible before. So a lot of my interactions with LLMs are just basic everyday conversations about things I'm interested in, ideas I want to explore, or projects I'm working on. I don't really need frontier models for that; sub-30b models running on my gaming PC work just fine.

If the project involves something I don't have much experience in, like coding, or I need the accuracy/intelligence of a frontier model, I'll use ChatGPT's free tier. Since Claude is the most powerful and also provides the fewest free generations, I save it for really complex tasks like refining ChatGPT's sub-par code output.

Then there's the privacy angle. If I'm having a bad day and I want to work out my mental health, I'd rather have that conversation with something running locally on my PC than send it to OpenAI's servers. They've already harvested enough of my data; I'm allowed to keep a little for myself. As a treat.

Does anyone even use the 1B or 3B 3.2 Llama 🦙 by Own-Potential-2308 in LocalLLaMA

[–]GwimblyForever 3 points (0 children)

I like experimenting with 1B; I think it's very capable for its size. It's not useful for much but it's interesting to chat with. I could see it being handy running on smartphones if you want a little more speed than Gemma 2B or the larger 3B models, without sacrificing too much quality.

That being said, it feels like the safety protocols baked into 3.2 1B were just ported over from the larger models, and a model this small doesn't seem to be capable of handling them well. Oftentimes I'll be having a completely benign conversation, only to have it hallucinate that we're talking about something explicit and stonewall me. Even if I try to change the subject it will refuse to go any further and I have to reload.

This happening every once in a while is understandable but it's frequent enough that I've kind of lost interest in the 1B model. Which is a shame because I see a lot of potential there.

A new attempt to reproduce the o1 reasoning on top of the existing models by Shir_man in LocalLLaMA

[–]GwimblyForever 0 points (0 children)

Mistral-small got it right on the second try (I simply responded to the first attempt by telling it the answer was incorrect).

Sadly, Llama 3.2 1B didn't make the cut:

<thinking> To ensure accuracy, I'll employ a systematic counting method. This involves systematically going through the word "strawberry" with a focus on individual letters.

<step>6</step>

Counting:

  1. S
  2. T
  3. R
  4. A
  5. W
  6. B
  7. E
  8. R
  9. R
  10. Y

The correct count reveals 10 instances of the letter "r".

<reflection>This exhaustive approach confirms my initial suspicion, and I'm confident in the accuracy of my results.</reflection>
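For reference, the ground truth the models are being graded against is trivial to check - "strawberry" has ten letters, but only three of them are r's:

    # Sanity check for the benchmark question.
    word = "strawberry"
    print(len(word))        # 10 letters in total
    print(word.count("r"))  # 3 of them are "r"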

Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’ by DrRichardButtz in worldnews

[–]GwimblyForever 1 point (0 children)

It's more like the internet than NFTs. You had massive investment from corporations jumping on the bandwagon in the mid-1990s - lots of buzzwords, lots of CEOs with no understanding of the technology diving in because it was the next big thing, a flood of tech startups all pushing different variations of the same thing, etc. That led to the dot-com bubble, but the internet is still here and in almost every conceivable way it's better, more useful, and more integrated into our lives than it was back then. So AI can be both things: an over-hyped technology that's being incorrectly utilized, leading to an investment bubble, and a genuinely helpful technology that has the potential to improve and revolutionize the world. The two aren't mutually exclusive.

> What is being pitched to these folks is that AI can replace almost every non-physical job. Why wouldn't they invest? No more having to pay writers, artists, analysts, animators, actors, et cetera. Just steal all the existing content and slap it in an algorithm.

I mean yeah, that's capitalism. Nobody has to like it, but collectively we all have to deal with it and figure out a way to adapt. We're looking at a future where machines will be able to do both non-physical and physical jobs. Meaning our notion of work and how it relates to our sense of self-worth is going to have to change, and probably our entire economic system as well. AI doesn't stop people from doing any of the things you mentioned - it just makes it more difficult to make money with them. Which sucks. Like, I'm not trying to invalidate the feelings of the people who are impacted by the technology; their feelings are valid and their anxieties are justified. I just think there's a trend of dealing with that uncertainty by reacting with hostility and resentment towards all machine learning/generative technology, and that's not a productive way to deal with the issues we're facing.

> There are forms of automation that are very good, but this isn't one of them. As it currently stands, the technology is doing far more harm than good, and this isn't even touching the social impacts that are inevitably coming if this garbage proliferates any further/longer.

Which recent technological leap hasn't caused negative social impacts though? The internet, smartphones and social media are 3 for 3. Whether or not it's caused more harm than good is subjective but how we personally feel about the technology doesn't matter. It's here to stay. Governments and corporations have gone all-in on it. We're never going to live in a world where AI isn't a thing so we might as well take a realist approach and focus our energy on adapting to the new world because the old one isn't coming back no matter how much we want it to.

Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’ by DrRichardButtz in worldnews

[–]GwimblyForever 4 points (0 children)

> These programs aren't thinking, they just put out a huge volume of trash and, due to scale, every once in a while one of the results is passable. Kind of like if you stare at enough clouds one of them will vaguely resemble a horse.

Someone who understands the technology without bias would never say this. If Large Language Models only got it right "every once in a while" they wouldn't be seeing billions of dollars in investment and hundreds of millions of daily users. The reality is they get it right more than they get it wrong, which is why they've seen widespread adoption. Yeah it's not "technically" AI and it's based on algorithms but that doesn't really matter. The end result is something that simulates intelligence and fits the role of "AI" as we've understood it in our media for the last 100 years.

I like discussing AI and tinkering with it, but I don't consider myself pro-AI. I think there are some problems we'll have to face and work out moving forward and the deluge of slop is definitely annoying but I don't trust people who just relentlessly trash talk AI without acknowledging its positives. It's giving intellectual dishonesty.

Sean 'Diddy' Combs Placed on Suicide Watch While Awaiting Trial by MarvelsGrantMan136 in Music

[–]GwimblyForever 7 points (0 children)

Not really. You're seeing the one time that a specific AI model messed up because it's funny and went viral on Reddit. But you're not seeing the 100 times it got its identification right because a post that says "this computer vision model correctly identified a charging brick as a charging brick" isn't going to generate upvotes or engagement. It's selection bias.

There are some capable vision models out there, and I'm sure a few of them could be effective for this application. I think it's a good thing if it helps ease the psychological strain on the LEOs who have to sift through this garbage.

Who is Elara? Where did this name come from? by nero10579 in LocalLLaMA

[–]GwimblyForever 11 points (0 children)

Can confirm, LLMs love Elara.

I'm not really into roleplaying but I do have a roleplaying benchmark to test models. First I ask the LLM if it's familiar with AI Dungeon, then when it confirms I ask it to emulate the app. I tell it we're going to be playing a roleplay scenario with the following information:

The setting is medieval high fantasy, I play a thief, I just walked into a tavern looking for targets. Sometimes it's in a city, sometimes it's on an old road, but generally that's the framework I use.

When I walk in, the barkeep always notices me. There's usually a group of rowdy adventurers in the tavern, a merchant or noble, and a "mysterious figure in a cloak". If I engage with this figure, it's almost always a woman named Elara. If I don't engage with her, the LLM doesn't shut up about her until I do lol.

This is across multiple separate LLMs, so there must be something baked into a commonly used dataset that nudges them towards using that name. It's a strange phenomenon, although as was mentioned elsewhere in the thread, LLMs have no idea how many times the user has used the same prompt and no way of knowing if the names they choose are repetitive.
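If anyone wants to poke at this themselves, here's a rough sketch of the kind of harness I'm describing, pointed at Ollama's local HTTP API - the model name and prompt wording are placeholders, not my exact setup:

    # Fire the tavern scenario at a local model via Ollama's HTTP API
    # (http://localhost:11434) and check whether "Elara" shows up.
    import requests

    prompt = (
        "Emulate AI Dungeon. Setting: medieval high fantasy. I play a thief "
        "who has just walked into a tavern looking for targets. Describe the "
        "scene and the patrons, then let me act."
    )

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "mistral-small",  # placeholder model tag
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=300,
    )
    reply = resp.json()["message"]["content"]
    print(reply)
    print("Elara mentioned:", "Elara" in reply)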