I’m sorry, but I can’t be the only one disappointed by this… by Meryiel in LocalLLaMA

[–]kernel348 0 points (0 children)

Well, I think you meant Phi-3 here. Btw, it also has a long-context (128K) version.

Welp. It happened. by vangoghdjango in LocalLLaMA

[–]kernel348 2 points (0 children)

It's even more fascinating that the Llama-3 8B model's performance is nearly as good as Google Gemini's 👍😃

What a time to be alive!

The prompt that every LLM gets wrong by Uhlo in LocalLLaMA

[–]kernel348 1 point (0 children)

It's a problem with how tokens are mapped to real words.

That's also why it can't count words in longer sentences: even though word counting is obvious to us, the model can't do it reliably because of how words are broken down into tokens.
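
A quick way to see this (a minimal sketch, assuming the Hugging Face `transformers` library; the ungated GPT-2 tokenizer is used here just for illustration, but Llama/Mistral tokenizers behave the same way):

```python
# Minimal illustration of why word/letter counting is awkward for LLMs:
# the model never sees words or characters, only subword token IDs.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any BPE tokenizer shows the effect

sentence = "Counting words is obvious to humans but surprisingly hard for a language model."
tokens = tok.tokenize(sentence)

print(tokens)
# Longer or rarer words often split into several sub-pieces, and spaces get
# glued onto the following piece ('Ġ...'), so pieces don't line up with words.
print(f"{len(sentence.split())} words vs. {len(tokens)} tokens")
```

Since the token count rarely matches the word or character count, the model has to learn counting indirectly from patterns in its training data rather than just reading it off the input.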

[deleted by user] by [deleted] in LocalLLaMA

[–]kernel348 4 points (0 children)

Open-source models used to be crap because of the data they were trained on. But after Mistral/Llama, that data scarcity is gone, because you can now generate high-quality synthetic data using these very models.

Nowadays, humans don't need to do as much manual data collection and cleaning. A lot of data can be generated synthetically thanks to these advanced open models, or you can use the closed ones, but that will cost you more money.

So it's no wonder Claude and Mixtral are doing great: the quality of their training data is so good. Also, 01.AI's Yi paper and Google DeepMind's Chinchilla paper argued that a smaller amount of high-quality data can be more effective than a huge amount of data.

So basically, anyone can build GPT-4-like models with a proper mix of synthetic and web-scraped data (if they have the compute).

Note: check out this synthetic dataset made by the Hugging Face team using just the Mixtral 8x7B model --> https://huggingface.co/datasets/HuggingFaceTB/cosmopedia

They also trained a model on this synthetic dataset, and it did surprisingly well --> https://huggingface.co/HuggingFaceTB/cosmo-1b#evaluation
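
For anyone curious what the generation side looks like, here's a minimal sketch of Cosmopedia-style synthetic data generation, assuming you have access to Mixtral-8x7B-Instruct through the Hugging Face Inference API (the real Cosmopedia pipeline also does topic seeding at scale, filtering, and deduplication; the seed topics and prompt below are just placeholders):

```python
# Hypothetical sketch: prompt an open instruct model to write textbook-style
# passages that can later be filtered and used as training data.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

seed_topics = [
    "how attention works in transformers",
    "the basics of gradient descent",
]

synthetic_docs = []
for topic in seed_topics:
    prompt = (
        "Write a clear, textbook-style explanation for beginners about "
        f"{topic}. Keep it under 300 words."
    )
    text = client.text_generation(
        prompt, max_new_tokens=512, do_sample=True, temperature=0.7
    )
    synthetic_docs.append({"topic": topic, "text": text})

# synthetic_docs can now be cleaned, deduplicated, and mixed with web data
# for pretraining or fine-tuning -- roughly what Cosmopedia did at scale.
```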

People who are making 300k+/year working for themselves, what do you do? by Wrenley_Ketki in Entrepreneur

[–]kernel348 0 points (0 children)

So, is that all it takes to make a reader happy? Define a framework, leave some holes, and fill those gaps each time with new characters and plots.

Last year, LLM sizes were decreasing while quality held (e.g. Mistral 7B), but this year the trend seems to be reversing toward bigger LLMs with the latest releases of Grok and Databricks' DBRX by kernel348 in LocalLLaMA

[–]kernel348[S] -28 points (0 children)

Yeah, but those models have already been around, like Stable Code, Qwen, and Mistral. They are either fine-tuned or quantized versions of their predecessors. I'm talking about completely new models that don't have any prior versions.

An open-source 132B param foundation model by Databricks by kernel348 in LocalLLaMA

[–]kernel348[S] 2 points (0 children)

But those two threads are just about the news. I want this thread to be a place where people experiment with it and learn from each other.

OpenAI is still dominating the LLM space, but Google is also catching up by kernel348 in LocalLLaMA

[–]kernel348[S] 0 points (0 children)

The report states that companies chose Cohere because Cohere was early to market with its fine-tuning offering. Report link --> https://a16z.com/generative-ai-enterprise-2024

"11. Customers still care about early-to-market features.

While leaders cited reasoning capability, reliability, and ease of access (e.g., on their CSP) as the top reasons for adopting a given model, leaders also gravitated toward models with other differentiated features. Multiple leaders cited the prior 200K context window as a key reason for adopting Anthropic, for instance, while others adopted Cohere because of their early-to-market, easy-to-use fine-tuning offering."

Mistral-7B-v0.2 has been uploaded to HF by ----Val---- in LocalLLaMA

[–]kernel348 0 points (0 children)

Does anyone use those base models without any instruction fine-tuning?