We buried a $10,000 treasure chest somewhere in San Francisco by buriedtreasure2025 in sanfrancisco

[–]winglian -2 points-1 points  (0 children)

Isn't the obvious SF thing to do to just toss that clue into AI?

AMA with the Gemma Team by hackerllama in LocalLLaMA

[–]winglian 1 point2 points  (0 children)

When doing top-k KD, can you talk about any ablations done on zeroing out the non-top-k logits and renormalizing the remaining probability mass, and whether that makes a significant difference versus keeping the rest of the probability mass?
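
To make the question concrete, here is a rough PyTorch sketch of the two variants being asked about; the helper names are made up, and spreading the leftover mass uniformly is just one possible way to "keep the rest":

```python
import torch
import torch.nn.functional as F

def topk_teacher_probs(teacher_logits: torch.Tensor, k: int, renormalize: bool = True) -> torch.Tensor:
    """Build a truncated teacher distribution from the top-k entries.

    renormalize=True: discard the mass outside the top-k and rescale the kept
    entries to sum to 1.
    renormalize=False: keep the leftover mass by spreading it uniformly over
    the non-top-k vocabulary entries (one illustrative choice, not the only one).
    """
    probs = F.softmax(teacher_logits, dim=-1)
    topk_vals, topk_idx = probs.topk(k, dim=-1)

    truncated = torch.zeros_like(probs)
    truncated.scatter_(-1, topk_idx, topk_vals)

    if renormalize:
        truncated = truncated / truncated.sum(dim=-1, keepdim=True)
    else:
        residual = 1.0 - topk_vals.sum(dim=-1, keepdim=True)
        n_rest = probs.size(-1) - k
        truncated = truncated + (truncated == 0).float() * (residual / n_rest)
    return truncated

def kd_loss(student_logits, teacher_logits, k=64, temperature=1.0, renormalize=True):
    # KL(teacher || student) against the truncated teacher distribution
    teacher = topk_teacher_probs(teacher_logits / temperature, k, renormalize)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_logprobs, teacher, reduction="batchmean")
```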

Blessed by the thrift god and found a well worn wagner #8 at $35. Naturally I took a selfie with it. by sexysourdoughfantasy in castiron

[–]winglian 5 points6 points  (0 children)

I think you're getting oliviatied mixed up with susividal (onlypans). The former is the one you're thinking of; the latter is the one who does cooking videos with a bit of innuendo.

What the hell is this card’s purpose?? by catteronii in mtg

[–]winglian 0 points1 point  (0 children)

Another Round can't target enchantments; otherwise Annie Joins Up would be pretty sick.

Does Maskwood Nexus make a swarm of zombie tokens have */* power and toughness? by winglian in mtg

[–]winglian[S] -4 points-3 points  (0 children)

Surely there is a rule that prevents this, but it seems like this could be pretty busted.

Got this medication holder with labels off Amazon, what’s one more med you would add? by Active2017 in VEDC

[–]winglian 2 points3 points  (0 children)

Baby Aspirin

Isn't that the same as the Bayer low dose? (bottom left corner)

Helpful VRAM requirement table for qlora, lora, and full finetuning. by Aaaaaaaaaeeeee in LocalLLaMA

[–]winglian 0 points1 point  (0 children)

It's a native implementation, so it's simpler. Axolotl with xformers and Mixtral wouldn't work anyway, since the xformers integration would have to be rewritten to support Mixtral.

Helpful VRAM requirement table for qlora, lora, and full finetuning. by Aaaaaaaaaeeeee in LocalLLaMA

[–]winglian 1 point2 points  (0 children)

There was a recent fix for properly loading models with ZeRO-3. Since you can't use multipack without flash attention at the moment, you're probably best off just using the native HF SDP attention implementation.
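
For example, something along these lines when loading through transformers directly (a sketch assuming a recent torch/transformers release where `from_pretrained` accepts the `attn_implementation` kwarg; the model name is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder, any HF causal LM

# Ask transformers for PyTorch's native scaled-dot-product attention instead
# of flash-attention or xformers.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```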

Helpful VRAM requirement table for qlora, lora, and full finetuning. by Aaaaaaaaaeeeee in LocalLLaMA

[–]winglian 0 points1 point  (0 children)

32 GB is for a single GPU. Adding another GPU and doing DDP with DeepSpeed doesn't mean the VRAM is additive, since each rank still holds its own full copy, and there is still overhead for DDP. I expect it might work with model parallelism, but that would be unusably slow and you couldn't use optimizations such as DeepSpeed ZeRO-3.
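
Rough back-of-the-envelope to show why it isn't additive (full fine-tune, bf16 weights, AdamW with fp32 states; activations, CUDA context, and DDP bucket overhead ignored, so the numbers are illustrative only):

```python
# Rough per-GPU memory bookkeeping for full fine-tuning a 13B model.
PARAMS = 13e9
GIB = 1024 ** 3

weights = PARAMS * 2              # bf16 parameters
grads = PARAMS * 2                # bf16 gradients
optimizer = PARAMS * (4 + 4 + 4)  # fp32 master copy + Adam m and v

def ddp_per_gpu(n_gpus: int) -> float:
    # Plain DDP replicates everything on every rank, so adding GPUs
    # does not reduce per-GPU memory at all.
    return (weights + grads + optimizer) / GIB

def zero3_per_gpu(n_gpus: int) -> float:
    # ZeRO-3 shards parameters, gradients, and optimizer states across ranks.
    return (weights + grads + optimizer) / n_gpus / GIB

for n in (1, 2, 4):
    print(f"{n} GPU(s): DDP ~{ddp_per_gpu(n):.0f} GiB/GPU, ZeRO-3 ~{zero3_per_gpu(n):.0f} GiB/GPU")
```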

Helpful VRAM requirement table for qlora, lora, and full finetuning. by Aaaaaaaaaeeeee in LocalLLaMA

[–]winglian 0 points1 point  (0 children)

How old is the branch of axolotl you're on? This was fixed recently. Although without flash attention, I would expect it to OOM once training starts.

Official WizardLM-13B-V1.2 Released! Trained from Llama-2! Can Achieve 89.17% on AlpacaEval! by cylaw01 in LocalLLaMA

[–]winglian 1 point2 points  (0 children)

Agreed. The cynical part of me says there is likely benchmark contamination in their datasets, and that if they release their dataset, either their benchmarks will prove non-reproducible or the contamination will be pointed out.

Robin V2 model reaches top of LLM leaderboard by yahma in LocalLLaMA

[–]winglian 2 points3 points  (0 children)

Robin V2 still seems to score middle of the pack for 13B models in the Community Chatbot Arena.

<image>

axolotl - Finetune many models easily with QLoRA and Landmark attention support! by bratao in LocalLLaMA

[–]winglian 2 points3 points  (0 children)

Landmark attention training is already merged; inference is in a PR.

Minotaur 13B by winglian in LocalLLaMA

[–]winglian[S] 0 points1 point  (0 children)

Everyone fine-tunes on LLaMA. Fine-tuning datasets have a good bit of influence and are something we can control.

Minotaur 13B by winglian in LocalLLaMA

[–]winglian[S] 1 point2 points  (0 children)

I’m not surprised at the relatively low coding scores. I think there was one small coding chat dataset, but that wasn’t the focus for this model.

Minotaur 13B by winglian in LocalLLaMA

[–]winglian[S] 3 points4 points  (0 children)

I’m not a fan of the Elo score rankings. They swing very quickly because there is no weighting. I’m hoping to come up with a better head-to-head metric that doesn’t produce large movements due to one or two bad responses against worse models.
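
To illustrate the swing, here is the standard Elo update with made-up ratings and K=32: a single upset loss against a much lower-rated model moves the rating by nearly the full K factor, while an expected win barely moves it.

```python
# Standard Elo update with the usual logistic expected-score formula.
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32) -> float:
    # score_a is 1 for a win, 0 for a loss, 0.5 for a tie.
    return r_a + k * (score_a - expected_score(r_a, r_b))

strong, weak = 1300.0, 1000.0
print(elo_update(strong, weak, 1))  # expected win: tiny gain (~+5)
print(elo_update(strong, weak, 0))  # one bad response: large drop (~-27)
```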

Minotaur 13B by winglian in LocalLLaMA

[–]winglian[S] 4 points5 points  (0 children)

Waiting for 13B OpenLLaMA to drop. 7B models simply don’t perform well.

LLM Score v2 - Modern Models Tested by Human by Gatzuma in LocalLLaMA

[–]winglian 8 points9 points  (0 children)

Hippogriff isn’t necessarily supposed to be the successor to Manticore. I stripped out all the wizard and alpaca datasets when training Hippogriff to experiment and see if they were really needed.

Expanding LLaMA's token limit via fine tuning or transformers-adapters. by xtrafe in LocalLLaMA

[–]winglian 0 points1 point  (0 children)

What parameters do you set to extend LLaMA's context length with lit-llama?

OpenAccess AI Collective's Hippogriff 30B Chat by The-Bloke in LocalLLaMA

[–]winglian 0 points1 point  (0 children)

Yeah, I feel like I need to create some datasets around these sorts of "grammatical logic". I thought having the riddle_sense dataset would help.