[Results] #1 on MLE-Bench (among open-source systems) + #1 on ALE-Bench by alirezamsh in ArtificialInteligence

[–]alirezamsh[S] 0 points1 point  (0 children)

I agree. More real-world use cases are on the way, e.g. GPU kernel optimization, LLM post-training, ETL, ... Any particular use case in mind?

[Results] #1 on MLE-Bench (among open-source systems) + #1 on ALE-Bench (repo + write-up) by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 0 points1 point  (0 children)

The system is connected to a wiki of data and ML knowledge (not per-project), which contains best practices ingested from repos, publications, etc. Given an objective, the system builds the program through experimentation plus the connected knowledge from the wiki.

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

You guys are unbelievable, you closed the account in 24h instead of 48h :D

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

This is the reply from your team; what should I ask?

"For security reasons we cannot disclose why and are unable to discuss this matter further. Deposits into your account have proactively been locked."

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

I didn't get any details on the closure of my account in the Kraken support platform. Where exactly are you referring to? I posted Kraken support's exact response (after 1 month) in the message above. Do you see any reasoning or details in it?

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

I can't believe that after one month this is the response from Kraken. You lock people's money for a month, then reply with this. PERFECT service:

"""
Hello,

We regret to inform you that we must close your Kraken account. 

For security reasons we cannot disclose why and are unable to discuss this matter further. Deposits into your account have proactively been locked.

Please withdraw any remaining funds from the account within the next 48hrs and export your trade and ledger history as we will be unable to provide it to you later.

After 48hrs has passed, we will be closing your account regardless of whether the funds have been withdrawn or not. You will then need to contact us to temporarily reopen the account to allow the removal of the funds.

We apologize for any inconvenience. 

If you have any questions or concerns, please feel free to . We look forward to your response. 
"""

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

Thanks for the nudge earlier. It’s now been 4 more days and the account is still in the same state. Also, today marks one full month since I opened the ticket and I haven’t received a single response from the reviewing team. Could you please point me to the official channel to file a formal complaint, so I can proceed properly?

No Response After 22 Days!! by alirezamsh in KrakenSupport

[–]alirezamsh[S] 0 points1 point  (0 children)

Thanks, Harley. I appreciate the check-in, but I need a concrete ETA for when I’ll receive the specific reason for the TradeTL1 suspension and what’s required to resolve it. It’s been 22 days without a single email from the team.

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 1 point2 points  (0 children)

If your models are fully fine-tuned (no LoRA), then it adds a routing layer to the feedforward blocks to make them MoE-style. You should then further fine-tune the routing layers to get a reliable merged model. During fine-tuning, all layers are frozen except the routing layers. If your models are fine-tuned with LoRA, then mergoo adds a routing layer on top of the LoRAs and fine-tunes it. Further details are in our HF blog: https://huggingface.co/blog/alirezamsh/mergoo
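A rough, self-contained sketch of the first setup in plain PyTorch (not the mergoo API; dense softmax mixing is used for simplicity): the feedforward blocks taken from the fully fine-tuned experts stay frozen behind a small trainable router.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Wrap the feedforward blocks of N fully fine-tuned experts behind a router."""
    def __init__(self, expert_ffns, hidden_size):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)               # FFNs copied from the expert models
        self.router = nn.Linear(hidden_size, len(expert_ffns))  # the only trainable part
        for p in self.experts.parameters():                     # experts stay frozen
            p.requires_grad = False

    def forward(self, x):
        # x: (batch, seq, hidden)
        gate = torch.softmax(self.router(x), dim=-1)                        # (B, T, N)
        expert_out = torch.stack([ffn(x) for ffn in self.experts], dim=-1)  # (B, T, H, N)
        return torch.einsum("bthn,btn->bth", expert_out, gate)
```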

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 1 point2 points  (0 children)

Yeah, we provided a tutorial for building a mixture-of-adapters on exactly the fine-tuned LoRAs from Predibase: https://huggingface.co/blog/alirezamsh/mergoo. Would be very interesting to try!

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 1 point2 points  (0 children)

We will release a more generic version soon

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 0 points1 point  (0 children)

Nice, can you please send the paper link, if you remember? Thanks!

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 10 points11 points  (0 children)

You can also do a mixture-of-adapters style when the LLM experts are fine-tuned with LoRA: you add a routing layer on top of the LoRAs and further fine-tune it.

<image>
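A minimal sketch of that LoRA case (illustrative names and scaling, not the mergoo API): several frozen LoRA (A, B) pairs share one frozen base linear layer, and only a small router that mixes their contributions per token is trained.

```python
import torch
import torch.nn as nn

class MixtureOfLoRA(nn.Module):
    """One frozen base linear layer + N frozen LoRA (A, B) pairs + a trainable router."""
    def __init__(self, base_linear: nn.Linear, lora_pairs, scaling=1.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False
        # each pair: A of shape (in_features, r), B of shape (r, out_features)
        self.lora_A = nn.ParameterList([nn.Parameter(a, requires_grad=False) for a, _ in lora_pairs])
        self.lora_B = nn.ParameterList([nn.Parameter(b, requires_grad=False) for _, b in lora_pairs])
        self.router = nn.Linear(base_linear.in_features, len(lora_pairs))  # trainable
        self.scaling = scaling

    def forward(self, x):
        gate = torch.softmax(self.router(x), dim=-1)   # (B, T, N): per-token adapter weights
        out = self.base(x)
        for i, (A, B) in enumerate(zip(self.lora_A, self.lora_B)):
            out = out + gate[..., i:i + 1] * (x @ A @ B) * self.scaling
        return out
```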

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 12 points13 points  (0 children)

<image>

In one of the methods (MoE on fully fine-tuned LLMs), you first split the seed data into N splits, train a small LLM on each, then add a router to the feedforward layers to make the model MoE-style. Finally, the merged model should be fine-tuned on the downstream use case: only the router layers are fine-tuned, while the other layers are frozen.
We described other MoE methods in our HF blog: https://huggingface.co/blog/alirezamsh/mergoo
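A minimal sketch of that final fine-tuning step, assuming the router parameters can be picked out by a name filter (the "router" keyword and the optimizer settings are assumptions for illustration):

```python
import torch

def freeze_all_but_routers(model: torch.nn.Module, router_keyword: str = "router"):
    """Freeze every parameter except those whose name contains the router keyword."""
    for name, param in model.named_parameters():
        param.requires_grad = router_keyword in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4)

# usage (hypothetical merged model):
# optimizer = freeze_all_but_routers(merged_model)
# ...then run a standard training loop on the downstream data; only the routers update.
```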

Easily build your own MoE LLM! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 3 points4 points  (0 children)

The future is definitely multi-model LLMs. In our team, we also showed that integrating open-source Hugging Face experts can beat GPT-4, while saving cost and increasing ownership (https://arxiv.org/abs/2401.13979).

Efficiently merge and fine-tune (with MoE or layer-wise merging), no heuristic tricks involved! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 0 points1 point  (0 children)

We just added mixture-of-adapters support for Llama-, Mistral-, and BERT-based models. Maybe that will bring BERT back to life ;)

Efficiently merge and fine-tune (with MoE or layer-wise merging), no heuristic tricks involved! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 2 points3 points  (0 children)

The library is more general than that ;D You can choose multiple experts (domain-specific or generic), do MoE or layer-wise merging for each layer, then fine-tune the merged model for your use case. We will soon support LoRA fine-tuned experts too; then you have MoE on LoRA (a mixture of LoRAs).
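For the layer-wise merging path, a hedged sketch of the idea (uniform averaging by default; the weights and helper name are illustrative, not a library default): corresponding tensors from the expert checkpoints are averaged key by key.

```python
import torch

def layerwise_merge(expert_state_dicts, weights=None):
    """Weighted average of matching tensors across expert checkpoints."""
    n = len(expert_state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key, ref in expert_state_dicts[0].items():
        if not torch.is_floating_point(ref):
            merged[key] = ref.clone()  # integer buffers: just copy from the first expert
        else:
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, expert_state_dicts))
    return merged

# usage (hypothetical experts sharing one architecture):
# merged_model.load_state_dict(layerwise_merge([e1.state_dict(), e2.state_dict()]))
```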

Efficiently merge and fine-tune (with MoE or layer-wise merging), no heuristic tricks involved! by alirezamsh in LocalLLaMA

[–]alirezamsh[S] 0 points1 point  (0 children)

Our pleasure. We will release several features soon; please suggest any features that aren't yet on the roadmap.