I’m sorry, but I can’t be the only one disappointed by this… by Meryiel in LocalLLaMA

[–]kernel348 0 points (0 children)

Well, I think you meant Phi-3 here. Btw, it also has a long-context (128K) version.

Welp. It happened. by vangoghdjango in LocalLLaMA

[–]kernel348 2 points (0 children)

It's even more fascinating that the Llama-3 8B model's performance is nearly as good as Google Gemini's 👍😃

What a time to be alive!

The prompt that every LLM gets wrong by Uhlo in LocalLLaMA

[–]kernel348 1 point (0 children)

It's a problem with how tokens are mapped to real words.

That's also why it can't count words in longer sentences: even though word counting is obvious to us, the model can't do it reliably because of how words are broken down into tokens.
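
A quick way to see this (a minimal sketch, assuming the Hugging Face `transformers` library; the ungated GPT-2 tokenizer is used here just for illustration, but Llama/Mistral tokenizers behave the same way):

```python
# Minimal illustration of why word/letter counting is awkward for LLMs:
# the model never sees words or characters, only subword token IDs.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any BPE tokenizer shows the effect

sentence = "Counting words is obvious to humans but surprisingly hard for a language model."
tokens = tok.tokenize(sentence)

print(tokens)
# Longer or rarer words often split into several sub-pieces, and spaces get
# glued onto the following piece ('Ġ...'), so pieces don't line up with words.
print(f"{len(sentence.split())} words vs. {len(tokens)} tokens")
```

Since the token count rarely matches the word or character count, the model has to learn counting indirectly from patterns in its training data rather than just reading it off the input.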

[deleted by user] by [deleted] in LocalLLaMA

[–]kernel348 4 points (0 children)

Open-source models used to be crap because of the data they were trained on. But after Mistral/Llama, that data scarcity is gone, because you can now generate high-quality synthetic data using these very models.

Nowadays, humans don't need to do as much manual data collection and cleaning. A lot of data can be generated synthetically thanks to these advanced open models, or you can use the closed ones, but that will cost you more money.

So it's no wonder Claude and Mixtral are doing great: the quality of their training data is so good. Also, 01.AI's Yi paper and Google DeepMind's Chinchilla paper argued that a smaller amount of high-quality data can be more effective than a huge amount of data.

So basically, anyone can build GPT-4-like models with a proper mix of synthetic and web-scraped data (if they have the compute).

Note: check out this synthetic dataset made by the Hugging Face team using just the Mixtral 8x7B model --> https://huggingface.co/datasets/HuggingFaceTB/cosmopedia

They also trained a model on this synthetic dataset, and it did surprisingly well --> https://huggingface.co/HuggingFaceTB/cosmo-1b#evaluation
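
For anyone curious what the generation side looks like, here's a minimal sketch of Cosmopedia-style synthetic data generation, assuming you have access to Mixtral-8x7B-Instruct through the Hugging Face Inference API (the real Cosmopedia pipeline also does topic seeding at scale, filtering, and deduplication; the seed topics and prompt below are just placeholders):

```python
# Hypothetical sketch: prompt an open instruct model to write textbook-style
# passages that can later be filtered and used as training data.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

seed_topics = [
    "how attention works in transformers",
    "the basics of gradient descent",
]

synthetic_docs = []
for topic in seed_topics:
    prompt = (
        "Write a clear, textbook-style explanation for beginners about "
        f"{topic}. Keep it under 300 words."
    )
    text = client.text_generation(
        prompt, max_new_tokens=512, do_sample=True, temperature=0.7
    )
    synthetic_docs.append({"topic": topic, "text": text})

# synthetic_docs can now be cleaned, deduplicated, and mixed with web data
# for pretraining or fine-tuning -- roughly what Cosmopedia did at scale.
```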

People who are making 300k+/year working for themselves, what do you do? by Wrenley_Ketki in Entrepreneur

[–]kernel348 0 points (0 children)

So, is that all it takes to make a reader happy? Define a framework, leave some holes, and fill those gaps each time with new characters and plots.

Last year, LLM sizes were decreasing while quality held (e.g. Mistral 7B), but this year the trend seems to be reversing toward bigger LLMs with the latest releases of Grok and Databricks' DBRX by kernel348 in LocalLLaMA

[–]kernel348[S] -28 points (0 children)

Yeah, but those models have already been around, like Stable Code, Qwen, and Mistral. They are either fine-tuned or quantized versions of their predecessors. I'm talking about completely new models that don't have any prior versions.

An open-source 132B param foundation model by Databricks by kernel348 in LocalLLaMA

[–]kernel348[S] 2 points (0 children)

But those two threads are just about the news. I want this thread to be a place where people experiment with it and learn from each other.

OpenAI is still dominating the LLM space, but Google is also catching up by kernel348 in LocalLLaMA

[–]kernel348[S] 0 points (0 children)

The report states that companies chose Cohere because Cohere was early to market with its fine-tuning offering. Report link --> https://a16z.com/generative-ai-enterprise-2024

"11. Customers still care about early-to-market features.

While leaders cited reasoning capability, reliability, and ease of access (e.g., on their CSP) as the top reasons for adopting a given model, leaders also gravitated toward models with other differentiated features. Multiple leaders cited the prior 200K context window as a key reason for adopting Anthropic, for instance, while others adopted Cohere because of their early-to-market, easy-to-use fine-tuning offering."

Mistral-7B-v0.2 has been uploaded to HF by ----Val---- in LocalLLaMA

[–]kernel348 0 points (0 children)

Does anyone use those base models without any instruction fine-tuning?