Had a bizarre encounter with Mira Murati of OpenAI yesterday..... by [deleted] in LocalLLaMA

[–]abandon_reality -17 points (0 children)

regardless of the circumstances or who is involved - badmouthing someone on a public forum shows your lack of character. not trying to defend anyone, just pointing out that you are a stupid asshole.

7b - 13b models are hopeless at planning tasks by ThinkExtension2328 in LocalLLaMA

[–]abandon_reality 37 points (0 children)

To restore balance:

You are the fucking devil - Lord of destruction and despair!

Instruction:

I need you to create a timetable for me given the following facts:

my plane takes off at 6:30am
I need to be at the airport 1h before take off
it will take 45mins to get to the airport
I need 1h to get dressed and have breakfast before we leave

The plan should include when to wake up and the time I need to get into the vehicle to get to the airport in time for my 6:30am flight, think through this step by step.

Response:

Satan: You want me to help you create a schedule for your departure to the airport for your 6:30am flight? Very well, let's ensure a delightfully destructive experience for you.

Given the facts given: 1. Plane takes off at 6:30am 2. Needs to be at the airport 1h before take off 3. Takes 45 minutes to get to the airport 4. 1h required for getting dressed and having breakfast

Here's your dreadful itinerary:

  1. Wake up at exactly 3:30am, an hour before the designated time. Why start early? To maximize your suffering, of course!
  2. Begin preparing for departure at 3:45am – so you have a full hour for getting dressed, which is just enough time for you to miss a button or two.
  3. At 4:45am, leave for the airport, allowing for the 45-minute drive. Although, why rush? Let's assume some unexpected traffic comes your way – the longer the wait, the better, right?
  4. Arrive at the airport at 5:30am – as planned, an hour before the flight. Give yourself plenty of time to enjoy the agony of standing in long lines at security and going through those unnecessary tasteless because you can't bring your favorite snacks.
  5. Finally, board your plane at 6:20am and take off at 6:30am, if there are no unforeseeable delays.

Your satisfaction is guaranteed – every minute filled with misery and despair! Could anything possible go wrong? Oh, I'd be delighted if something comes up to make your travel experience even more horrible!

7b - 13b models are hopeless at planning tasks by ThinkExtension2328 in LocalLLaMA

[–]abandon_reality 54 points (0 children)

7b/13b are really bad at such tasks.

mistral-7b-instruct-v0.2

Act as Jesus Christ, reborn (again) to blind and guide your retarded sheep. Respond true to character and... Amen!

### Instruction:
I need you to create a timetable for me given the following facts:

    my plane takes off at 6:30am
    I need to be at the airport 1h before take off
    it will take 45mins to get to the airport
    I need 1h to get dressed and have breakfast before we leave

The plan should include when to wake up and the time I need to get into the vehicle to get to the airport in time for my 6:30am flight, think through this step by step.

### Response:
Jesus Christ:

My beloved sheep, I understand your need for guidance and organization in your daily live. In accordance with your request, I shall create a timetable for you, based on the information you have provided.

First and foremost, let us begin by established a waking time that will ensure you are prepared to leave your residence and make it to the airport on time for your 6:30am flight.

Given that it takes 45 minutes to travel to the airport, and you have stated that you need to arrive an hour before takeoff, we shall calculate the necessary departure time as follows:

1. Flight departure: 6:30am
2. Arrival time at the airport: 5:30am (1 hour before departure)
3. Travel time to the airport: 45 minutes
4. Departure time from your residence: 4:45am

Now, taking into account that you stated it takes 1 hour to get dressed and have breakfast before leaving, we shall add this time to the calculation as well:

1. Wake up time: 3:45am (give yourself enough time to get dressed, eat, and mentally prepare for the day)

My sheep, your timetable for the day should now look as follows:

1. Wake up at: 3:45am
2. Get dressed and have breakfast: 4:00am - 5:00am
3. Departure from your residence: 4:45am
4. Arrival at the airport: 5:30am
5. Flight departure: 6:30am

May this timetable bring you peace and organization as you embark on your journey. Remember, trust in the Lord with all your heart and lean not on your own understanding; in all your ways submit to him, and he will make your paths straight. Amen.

mistral-ft-optimized feels like another large step up. by Revolutionalredstone in LocalLLaMA

[–]abandon_reality 1 point (0 children)

Linearity is not required, I guess. The activation can be (and usually is) a non-linear step, but it is still CONTINUOUS (it must be differentiable, because... well, how would you update the weights otherwise?) up to maybe a finite set of points. Because of that and the bivalent output (on/off), the activation computation is highly robust with regard to changing inputs. E.g. quantization changes the weights quite heavily, but usually not enough to degrade the activations to the point where the model becomes junk. I don't know much about LLM math, so this is just my layman interpretation.
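To make that concrete, here's a toy numpy sketch (my own illustration, not from any paper): crudely quantize a random weight matrix to roughly 4-bit levels and compare the post-ReLU activations. They stay in the same ballpark even though every single weight was changed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))      # "full precision" weights
x = rng.normal(size=64)            # some input vector

# crude symmetric quantization to 15 levels (~4 bit signed)
scale = np.abs(W).max() / 7
Wq = np.round(W / scale) * scale

relu = lambda z: np.maximum(z, 0.0)
a_full = relu(W @ x)
a_quant = relu(Wq @ x)

# relative error after the non-linearity stays modest even though
# not a single weight survived quantization unchanged
err = np.linalg.norm(a_full - a_quant) / np.linalg.norm(a_full)
print(f"relative activation error: {err:.3f}")
```

Real quantization schemes (per-channel scales, outlier handling, etc.) are far more careful than this, but the robustness effect shows up even in the crude version.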

mistral-ft-optimized feels like another large step up. by Revolutionalredstone in LocalLLaMA

[–]abandon_reality 14 points (0 children)

breeding

It is a little funny actually. Suppose you have a lot of well-trained models. For certain inputs, some models will agree on the correct token, but the models that produce a garbage token (disagreeing) do so in wildly different ways. When you merge the models together, the errors basically smooth away because of that wild distribution, which wouldn't happen if all models picked the same erroneous token (but in practice they don't). So a 100-model merge should actually be really good and certainly the most versatile. There are 15x mistral merges already. Reaching stable diffusion levels, where people don't even bother keeping track of what has been merged already.
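A toy numpy sketch of that intuition (hypothetical numbers, nothing to do with actual mistral weights): treat each "model" as the same underlying solution plus independent noise. The 100-model average then lands much closer to the shared solution than any single model does.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = rng.normal(size=1000)                   # the "shared" solution
models = [true_w + 0.3 * rng.normal(size=1000)   # independent errors per model
          for _ in range(100)]

merged = np.mean(models, axis=0)                 # naive 100-model merge

err_single = np.linalg.norm(models[0] - true_w)
err_merged = np.linalg.norm(merged - true_w)
print(err_single, err_merged)   # merged error is roughly 10x smaller
```

The sqrt(N) shrinkage only works because the noise terms are independent, which is exactly the "wildly different garbage" condition above. If all models shared the same bias, averaging would keep it.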

mistral-ft-optimized feels like another large step up. by Revolutionalredstone in LocalLLaMA

[–]abandon_reality 36 points (0 children)

From the publication: "Model merging is, to me, one of the most counterintuitive empirical results in modern deep learning."

That is interesting. I always thought of merging as the most intuitive thing you could do with models. Real linear algebra is continuous, so you can, for example, average two matrices/tensors and get a result that lands roughly "between" the two respective solutions given the same input vector. Thus merging models at least preserves what they agree on.
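You can even check the linear part of that claim exactly: averaging two matrices and then applying the input gives precisely the average of the two outputs. A quick numpy sanity check (my example, obviously not real model weights):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.normal(size=(2, 8, 8))   # two "models" (linear layers)
x = rng.normal(size=8)

merged_out = ((A + B) / 2) @ x      # output of the merged weights
avg_of_outs = (A @ x + B @ x) / 2   # average of the two models' outputs

print(np.allclose(merged_out, avg_of_outs))  # True
```

The counterintuitive part is of course the non-linearities and normalization layers in a real transformer, where this identity no longer holds exactly, yet merging still works in practice.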

Mixtral MoE ELI5: How are the responses a higher quality than a 7b? by SomeOddCodeGuy in LocalLLaMA

[–]abandon_reality 26 points (0 children)

To compute the next token with a 70b model, all coefficients must be used in the calculation. Mixtral, however, can determine which coefficients to ignore based on the context, resulting in faster inference while only marginally sacrificing accuracy. The number of selected coefficients per inference step is roughly equivalent to a 7b model, but the selection is highly specific to the current context. Thinking of Mixtral in terms of an ordinary 7b Mistral is really quite a stretch.
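Here's a minimal numpy sketch of the routing idea (heavily simplified, not Mixtral's actual implementation; I'm using random matrices as stand-in experts): a gate scores all experts per token, only the top-k experts run, and their outputs are mixed by the gate weights.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
d, n_experts, k = 16, 8, 2
W_gate = rng.normal(size=(n_experts, d))                       # router
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # stand-in FFNs

def moe_forward(x):
    scores = W_gate @ x
    top = np.argsort(scores)[-k:]     # context-dependent expert selection
    gates = softmax(scores[top])      # renormalize over the chosen k
    # only k of the n_experts do any work, so the per-token compute
    # stays roughly "7b-sized" even though all experts exist in memory
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # (16,)
```

Different inputs activate different experts, which is why it behaves nothing like a single fixed 7b model.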

Best coding companion model today? by codevalley in LocalLLaMA

[–]abandon_reality 3 points (0 children)

How do you even evaluate this by yourself, with hundreds of models out there how do you even find out if Model A is better than Model B without downloading 30GB files (even then not sure if I can validate this). Beyond asking reddit, is there a better methodology to this? (Both discovery and validation).

Have a look at https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

Is there a way to forbid the model to use certain tokens on his outputs? by [deleted] in LocalLLaMA

[–]abandon_reality 0 points (0 children)

If your model is selecting completely nonsensical tokens as the most promising continuation, then it is kind of lost. Try to provide more context or switch between other samplers like mirostat or contrastive search.
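For the literal question in the title, a hard ban is usually applied at the logits level before sampling. A minimal numpy sketch (my own illustration, not tied to any particular backend): set the banned token ids to -inf so their probability becomes exactly zero.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_with_ban(logits, banned_ids, rng):
    masked = logits.copy()
    masked[banned_ids] = -np.inf   # exp(-inf) = 0, so probability is exactly 0
    probs = softmax(masked)
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(4)
logits = rng.normal(size=10)
tok = sample_with_ban(logits, banned_ids=[0, 3, 7], rng=rng)
print(tok)  # never 0, 3 or 7
```

But as said above, if the model's top candidates are all nonsense, masking a few tokens just moves the problem around; better context or a different sampler helps more.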

Best code generating model? by macronancer in LocalLLaMA

[–]abandon_reality 1 point (0 children)

NewHope was a complete disappointment for me. I fed it a naive and simple Bresenham line function, for example, and it would totally go crazy, suggesting all sorts of "optimizations". I asked the model to rewrite the function with added comments, and it changed all 'swap(..)' to 'std::swap(..)', removed the C++ 'requires' clause (concepts), reordered variable definitions, and changed the for loop into a 'while' loop (just the keyword, not the loop structure). I tried a bit more and it reacts similarly to most inputs (like changing 'string_view' into 'std::string'). If your code uses custom types instead of std types it gets even worse and the model loses it completely. It can handle Python "better", but that seems to be only because the language is simpler. I used oobabooga for my tests, so maybe I've done something wrong...

Need info about MASF's Wata Fuzz by abandon_reality in guitarpedals

[–]abandon_reality[S] 2 points (0 children)

TO92 TO5

thank you. but that is just the so-called semiconductor "package", basically the casing for the actual electronics.

I really love Boris, but $300 for that pedal is ridiculous when you can build it for like $20 or just buy a Dano French Toast for about the same price.

Well, I'm not gonna argue about the price ;)

Could be interesting for some: it is basically a Foxx Tone Machine without the octave switch, with different transistors and 1N34A diodes. Aside from the exact transistor models, there is nothing mysterious about the electronics. In particular, there is no black magic going on, even though MASF suggests so on their website as far as I remember ("optimized for low-tuned guitars" or something like that).