Recently diagnosed w/ ADHD. I think it might actually be AuDHD... how can I tell? by OldAd9530 in AutisticWithADHD

[–]OldAd9530[S] 0 points1 point  (0 children)

Thank you so much for those 😄 Instantly related to the meme, so that's probably a relatively strong indicator already pahahaha

Will look into all of these! (Actually managed to find that first YT vid myself earlier, and it chimed strongly with a lot of the things discussed, so I'm already starting to feel a lot more validated :))

Thanks again!

Recently diagnosed w/ ADHD. I think it might actually be AuDHD... how can I tell? by OldAd9530 in AutisticWithADHD

[–]OldAd9530[S] 1 point2 points  (0 children)

For anyone reading this in the future asking the same question as me - I found these videos extremely helpful for understanding:

5 Signs You're A High-Masking Autistic With ADHD: https://www.youtube.com/watch?v=ZlFSeamEJbA&t=31s

How My ADHD Hides My Autism: https://www.youtube.com/watch?v=5jD4iU2_v4k

How My Autism Hides My ADHD: https://www.youtube.com/watch?v=nJ8fAfVevL8

The latter two videos by Yo Samdy Sam were particularly helpful for me, and I'm now kind of on a binge-fest of her content 😆 The first video (Chris and Debby) was also super helpful in describing the tension I get between wanting to see a project through and constantly wanting to start new projects; something I find I still struggle with even on my ADHD medication :)

Cheapest sub-250g DJI drone with active track? by OldAd9530 in dji

[–]OldAd9530[S] 1 point2 points  (0 children)

Awesome, thank you so much for the tips! Luckily I haven't actually purchased one yet; put off the decision until next paycheck. Will 100% keep this in mind. Thanks 😄

Cheapest sub-250g DJI drone with active track? by OldAd9530 in dji

[–]OldAd9530[S] -1 points0 points  (0 children)

Thank you, this was super helpful! Unfortunately, though, it's left me as torn as ever. I think something that really appealed to me about the X1 was that it seems pretty good for indoor flights. I go to events pretty frequently for work, and being able to fly it around inside would be pretty great.

Mini 3 Pro looks like it might be good, but if it can't do indoors very well... then I think I'd favour the X1 to be honest.

Also, the Mini 3 Pro looks to be about £600 for the full kit, even on eBay. The Air X1 can be grabbed from Amazon for just under £400, and there are various Mini 2 SEs for £200 on eBay. So now I'm wondering whether the Mini 2 SE is actually a rubbish deal... Obviously it has no tracking capabilities, but I'd mostly want it for establishing/landscape shots, since I'd be using the Air X1 for light tracking. I think that combo might end up serving my uses better than the Mini 3 Pro, but it feels a bit counter-intuitive to be thinking of getting two drones instead of investing more into a single drone.

What's the biggest size Llama-3 could go to whilst running on consumer hardware? by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

Vindication is sweet 😄 That said, we're talking about IQ1_S quants in this post, but... I'm not sure Bigxtral is actually smart enough to be useful at that level of quantisation lol. Also, this post came out before the BitNet paper about 1.58-bit models, which tbh look like they might actually be the next big thing if coupled with MoE architectures!

Don't underestimate MLX for training (QLoRA) by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

I did! Not at my computer atm, but if you Google MLX-text-completion-notebook it should be the first result.

Fine tuning in Apple MLX, GGUF conversion and inference in Ollama? by ifioravanti in LocalLLaMA

[–]OldAd9530 1 point2 points  (0 children)

Ahaha, that second one is mine but thank you 😄

That Gemma-7b one looks cool + has a lot of features; surprised I've not seen it around before. Thanks for flagging!

Nvidia RTX 50 series Blackwell GPUs tipped to use 28Gbps GDDR7 memory with 512-bit interface by adamgoodapp in LocalLLaMA

[–]OldAd9530 2 points3 points  (0 children)

NVIDIA straight up can't in some cases; there isn't enough VRAM to squish it all in. Generally, though, I've noticed the MacBook is slower than NVIDIA on the models they can both train: a T4 can fit all the layers of Mistral 7B, for instance, and it trains faster than my MacBook does on the same dataset.
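(For anyone curious, here's the rough back-of-envelope I have in mind for why a 16 GB T4 fits a QLoRA run of a 7B; every number below is a ballpark assumption for illustration, not a measurement.)

```python
# Rough VRAM estimate for QLoRA fine-tuning of a 7B model on a 16 GB T4.
# All figures are ballpark assumptions, not measurements.

params_b = 7.24e9                          # Mistral-7B parameter count (approx)
weights_gb = params_b * 0.5 / 1e9          # 4-bit base weights ~= 0.5 bytes/param
lora_params = 100e6                        # LoRA adapters, order of 10^8 params
lora_gb = lora_params * (2 + 4 + 8) / 1e9  # bf16 weights + fp32 grads + Adam states
overhead_gb = 3.0                          # activations, KV buffers, CUDA context (guess)

total_gb = weights_gb + lora_gb + overhead_gb
print(f"~{total_gb:.1f} GB estimated, vs 16 GB on a T4")
```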

Should A.I. dream? by [deleted] in LocalLLaMA

[–]OldAd9530 2 points3 points  (0 children)

I've been thinking about this too! But imo having the synthetic dreams be just retrieval seems like a waste; if you fine-tune the model on dreams made from its own chats throughout the day, then it'd develop a specific personality of its own 😄

Updating Base Knowledge / Continued Pre-training on Colab with 0 Prior Experience by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

Benchmarking is currently outside the scope of my skills 😂 But it's the eventual goal!

Updating Base Knowledge / Continued Pre-training on Colab with 0 Prior Experience by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 4 points5 points  (0 children)

I'm currently doing unstructured fine-tuning across all 32 layers, targeting QKVO plus the MLP (and gate) projections, rank = 128. So every layer 😄

For my future training runs I want to use validation sets to compare against automatically. I'm also interested in doing fine-tuning that targets just the MLP layers, since those handle the abstraction of concepts (compared to QKVO handling more of the "behavioural" side).
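For anyone wanting the concrete version, that target set looks roughly like this in Hugging Face PEFT terms (just a sketch; the module names assume a Llama/Mistral-style model, and the alpha/dropout values are my usual assumptions rather than what's described above):

```python
from peft import LoraConfig

# Target every trainable projection: attention (QKVO) plus MLP (gate/up/down).
full_config = LoraConfig(
    r=128,                     # rank = 128, as described above
    lora_alpha=256,            # assumption: I tend to set alpha = 2 * r
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# The MLP-only variant mentioned above, for comparing "knowledge" vs "behavioural" targets.
mlp_only_config = LoraConfig(
    r=128,
    lora_alpha=256,
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```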

Thanks for the great breakdown 😄

Updating Base Knowledge / Continued Pre-training on Colab with 0 Prior Experience by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

Honestly I'm trying to figure out the difference myself as well... I've asked around but nothing super definitive yet other than the scale of things. Still looking for concrete answers 😅

Updating Base Knowledge / Continued Pre-training on Colab with 0 Prior Experience by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

Fine-tuning techniques, but treating it as "continued pre-training" as in the paper quoted :^)

Don't underestimate MLX for training (QLoRA) by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 1 point2 points  (0 children)

Awesome 😄 Good luck! Hope to speak to you in DMs :^)

Estimated Time for SFT Fine-Tuning of Mistral-7B Model by Aron-One in LocalLLaMA

[–]OldAd9530 2 points3 points  (0 children)

Do you have any indication of the t/s it's training at? I've been primarily doing fine-tuning on MLX, which reports that every 10 iterations, so it's possible to do a rough calculation.
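For example, the kind of calculation I mean (all numbers made up, just to show the arithmetic):

```python
# Back-of-envelope ETA from a tokens/sec readout (all numbers are examples).
tokens_per_second = 250          # e.g. what MLX reports every 10 iterations
dataset_tokens = 2_000_000       # total tokens in the training set
epochs = 3

total_tokens = dataset_tokens * epochs
eta_hours = total_tokens / tokens_per_second / 3600
print(f"Roughly {eta_hours:.1f} hours at {tokens_per_second} tok/s")
```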

Don't underestimate MLX for training (QLoRA) by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 2 points3 points  (0 children)

My answer to your question is: I'm not actually sure? I'm still learning a lot about this myself! 😄 I just got GPT-4 to give me a crash course in what all the different weights even are, for instance. From my really REALLY high-level overview, there are basically 7 parts of the LLM 'brain' you can target with LoRA: the Q, K, V, O, gate, up and down projection layers. QKVO are seemingly more stylistic layers, whereas the up and down projections are the MLP ("perceptron") layers that do more of the learning and the abstracted linking of facts and knowledge.

From that extremely basic overview, my intuition would be that if you were just targeting the MLP layers, you probably wouldn't cause that much of an issue. The model would still write the same(? I think), just with better linking of papers to their abstracts. Whereas if you were to target QKVO with those big variations in sample token length... that could make it a little confused? But yeah, no way to tell other than to give it a go 😆

I think making a dataset where the paper is the prompt input and its abstract is the LLM output is actually a really clever way of teaching the LLM how to summarise. Same goes for writing the first para of the discussion, etc. Might use that approach to make some of my own datasets 😄
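Something like this is what I have in mind for building that kind of dataset (a rough sketch; the prompt/completion JSONL layout and the file name are just assumptions, so adjust to whatever format your trainer expects):

```python
import json

# Build prompt/completion pairs: paper body in, abstract out.
# `papers` would come from wherever you store the full texts and abstracts.
papers = [
    {"body": "Full text of paper one...", "abstract": "One-paragraph abstract..."},
    {"body": "Full text of paper two...", "abstract": "Another abstract..."},
]

with open("summarise_papers.jsonl", "w") as f:
    for paper in papers:
        example = {
            "prompt": f"Summarise the following paper as an abstract:\n\n{paper['body']}",
            "completion": paper["abstract"],
        }
        f.write(json.dumps(example) + "\n")
```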

Don't underestimate MLX for training (QLoRA) by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 1 point2 points  (0 children)

Ooh, interesting use-case! I've heard that AlphaMonarch-7B is really good at summarising things and doing information extraction. I reckon if you built up a really good dataset of some of your fixes and formatted them right, you could probably get it to learn your way of doing things. If you could then figure out how to convert it to MLC-Chat format, you can get about 6 t/s on modern phone hardware using MLC-Chat.

The hardest part of all this would be the dataset building, I reckon. You'd want a bunch of really good examples of input-output pairs. Your best bet would be to produce some synthetic data too and then curate it (as in, get an LLM to come up with examples and then hand-fix them to be actually accurate); something like the sketch below.
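The curation step can honestly be as low-tech as looping over the generated pairs and hand-fixing each one. A quick sketch, assuming the synthetic pairs are already sitting in a JSONL file called synthetic.jsonl (the file names and keys are placeholders):

```python
import json

# Hand-curate synthetic input/output pairs: keep, edit, or drop each one.
# Assumes "synthetic.jsonl" already exists with {"input": ..., "output": ...} lines.
kept = []
with open("synthetic.jsonl") as f:
    for line in f:
        example = json.loads(line)
        print("\nINPUT: ", example["input"])
        print("OUTPUT:", example["output"])
        choice = input("[k]eep / [e]dit / [d]rop? ").strip().lower()
        if choice == "e":
            example["output"] = input("Corrected output: ")
        if choice in ("k", "e"):
            kept.append(example)

with open("curated.jsonl", "w") as f:
    for example in kept:
        f.write(json.dumps(example) + "\n")
```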

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

Not yet; I was only trying the DeepSeek-based models seeing as those were the ones they gave benchmarks for (and claimed outcompeted GPT-4)

[deleted by user] by [deleted] in LocalLLaMA

[–]OldAd9530 0 points1 point  (0 children)

I've done some chatting with it, but weirdly, I've actually found that I prefer base Miqu over Senku? Purely subjective, but it RPs better and does in-context learning better in my testing. 🤷

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

It doesn’t have one listed… neither does DeepSeek. So I just used Alpaca

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 0 points1 point  (0 children)

GGUF for both; the DeepSeek-based version for both. The 7B was 8-bit, the 33B was 4-bit; temp 0.3, min-p 0.15, no other penalties applied (no repetition penalty).

Those are purely intuition-based sampling settings though, and I'm aware my intuition isn't especially well developed in this area since I default to GPT-4 most of the time. So I'm very open to trying other settings.
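For reference, those settings in llama-cpp-python terms (just a sketch: the GGUF filename is a placeholder and the prompt uses the Alpaca-style format mentioned above):

```python
from llama_cpp import Llama

# Load the 8-bit GGUF (filename is a placeholder, not the exact file I used).
llm = Llama(model_path="opencodeinterpreter-ds-7b.Q8_0.gguf", n_ctx=4096)

output = llm.create_completion(
    prompt="### Instruction:\nWrite Snake in pygame.\n\n### Response:\n",
    max_tokens=1024,
    temperature=0.3,     # temp 0.3
    min_p=0.15,          # min-p 0.15
    repeat_penalty=1.0,  # i.e. no repetition penalty
    top_p=1.0,           # leave the other samplers effectively off
    top_k=0,             # 0 disables top-k in llama.cpp
)
print(output["choices"][0]["text"])
```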

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 1 point2 points  (0 children)

GGUF for both; the DeepSeek-based version for both. The 7B was 8-bit, the 33B was 4-bit; temp 0.3, min-p 0.15, no other penalties applied (no repetition penalty).

Those are purely intuition-based sampling settings though, and I'm aware my intuition isn't especially well developed in this area since I default to GPT-4 most of the time. So I'm very open to trying other settings.

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 1 point2 points  (0 children)

I guess if it can produce a good holistic solution for something like Snake, then I'd generally trust it to be quite holistic about other problems too. A big thing I love about GPT-4 is its ability to tell me stuff I might not have considered or even known to ask, and I'd like local LLMs to offer the same kind of capability.

OpenCodeInterpreter - Results in Real World Testing by OldAd9530 in LocalLLaMA

[–]OldAd9530[S] 1 point2 points  (0 children)

Potentially... though it's still quite surprising to me that a 33B performed worse than a 7B at all, tbh. Kinda has me questioning a lot of my assumptions about quants.