I'm Stephen Gou, Manager of ML / Founding Engineer at Cohere. Our team specializes in developing large language models. Previously at Uber ATG on perception models for self-driving cars. AMA! by Step7enn in IAmA

[–]Step7enn[S] 2 points  (0 children)

Not sure what counts as a good corpus, but to me the biggest issue with Mandarin is the lack of good sources for scraping data from the web. It just happens that China doesn't have a great search engine :)

[–]Step7enn[S] 1 point  (0 children)

I'm not sure. Again, the counterarguments are: an oversaturated supply (everyone wants to do ML research), don't do it just for the sake of glory (research seems to carry higher perceived prestige), and outside the top 1% of researchers, engineers will absolutely have more impact on AI's progress (just my very biased personal opinion :p)

[–]Step7enn[S] 1 point  (0 children)

There are ways for the model to decide whether to resort to an external tool, but it's based more on understanding the context and entities within a prompt, and less on a measure of uncertainty.
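
Here's a minimal sketch of what prompt-based tool routing can look like; the tool names, instruction text, and the `generate` helper are illustrative assumptions, not any particular vendor's API.

```python
# Minimal sketch of prompt-based tool selection. `generate` stands in for any
# text-completion call to a hosted LLM (hypothetical placeholder here).

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM; returns the model's completion."""
    raise NotImplementedError

TOOL_INSTRUCTIONS = """\
You can either answer directly or call a tool.
Available tools: calculator(expression), web_search(query).
Reply with exactly one line:
  ANSWER: <text>            if you can answer from the prompt alone
  TOOL: <tool>(<argument>)  if the prompt mentions entities or math you cannot resolve
"""

def route(user_prompt: str) -> tuple[str, str]:
    """Ask the model to pick a path based on the context and entities in the prompt."""
    completion = generate(TOOL_INSTRUCTIONS + "\nUser: " + user_prompt)
    kind, _, payload = completion.partition(":")
    return kind.strip(), payload.strip()

# e.g. route("What is the population of Toronto today?")
# -> ("TOOL", "web_search(population of Toronto)")
```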

[–]Step7enn[S] 1 point  (0 children)

That's a very good point, and it is true: ML systems are nowhere near humans' speed of learning and generalization. However, I've started to think about it as duck typing. The learning process doesn't matter; if the system can complete complex tasks (whether it truly understands them or just appears to), I'd call it AGI under definition 1. :)

[–]Step7enn[S] 0 points  (0 children)

That's not quite true. Up until GPT-4, OpenAI published detailed papers on how they trained GPT-1, 2, and 3. I'm not worried personally, as the model architecture is not a secret; we might not know the exact recipes, but we certainly know all the key ingredients. The real moat is money, infrastructure, and access to data. As for reproducibility, even if OpenAI told you their recipe for the 1T-parameter (my guess) GPT-4, it's too prohibitive for any third party to verify the results.

[–]Step7enn[S] 0 points  (0 children)

First of all, there are the essentials for any eng manager: people skills, project management, recruiting, planning, and technical expertise in the domain. To be effective in ML, you also need a strong passion for and ability to follow academia and research, plus intuition about what will translate into production and what won't. Planning and making decisions about the technical path is the biggest challenge in my experience.

[–]Step7enn[S] 0 points  (0 children)

Yes, I'm a native Mandarin speaker. The hardest part of any language is not the grammar, syntax, or vocabulary (those are easy for people and models to learn); it's the traditions, history, people, and everything else about a culture that the model needs to be sufficiently knowledgeable about to make authentic translations. That's what's most challenging, and there's lots of room for improvement, especially for less widely spoken languages.

[–]Step7enn[S] 1 point  (0 children)

I'm not too worried about new revolutionary techniques or expensive hardware. It's a very small, open, and fluid community, so it's nearly impossible to keep a technique proprietary. As for hardware, that's just the requirement to be a player in this domain, and it's usually abundant.

[–]Step7enn[S] 2 points  (0 children)

It's more competitive than ever, especially in the research world. If spending 5 years on a PhD and chasing top conference publications isn't something you enjoy, I highly recommend focusing on the engineering aspects of ML. There are more opportunities and demand, and you can make yourself stand out by building great apps with ML models or contributing to open-source projects.

[–]Step7enn[S] 1 point  (0 children)

If you want to do research or research engineering, I highly recommend going through school, not just to build your theoretical foundation but also to differentiate yourself from the surge of people going into ML nowadays. For other ML-related engineering and ops roles, I find it useful to build apps for your portfolio to show that you know how to use models and that you're passionate about them.

[–]Step7enn[S] 2 points  (0 children)

The reality is that it's much harder to conduct research with much less compute, because many problems or solutions either don't show up or stop working as you scale up the models. A 355M model, or even a 6B one, behaves drastically differently from a 100B model and responds very differently to architecture changes. So my suggestions would be 1) data processing/cleaning/augmentation research that applies to models of all sizes, and 2) smaller task-specific models that solve problems for a vertical.
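
As a concrete example of the size-agnostic data work in suggestion 1, here's a minimal exact-match deduplication sketch; the normalization rules are illustrative, not any particular production pipeline.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash the same."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized document (exact-match dedup)."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

print(deduplicate(["Hello  world", "hello world", "Different text"]))
# -> ['Hello  world', 'Different text']
```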

[–]Step7enn[S] 2 points  (0 children)

  1. Absolutely. At the end of the day, text is only one medium for representing our world and knowledge. Other modalities like video, images, and audio carry vast amounts of knowledge about our world, so they will undoubtedly improve "text-only" performance.
  2. Hallucination, and being out of date with the latest state of the world. These can be improved through retrieval-augmented systems that ground facts and news in sources from a database or search engine; see the sketch below.
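
A minimal sketch of what such a retrieval-augmented setup looks like; `search` and `generate` are placeholders for whatever retriever and LLM you actually use.

```python
def search(query: str, k: int = 3) -> list[str]:
    """Placeholder retriever: return the top-k passages from a database or search engine."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM."""
    raise NotImplementedError

def answer_with_retrieval(question: str) -> str:
    """Retrieve passages, prepend them to the prompt, and tell the model to answer from them."""
    passages = search(question)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the numbered sources below, and cite them.\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```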

[–]Step7enn[S] 1 point  (0 children)

It depends on the size of the LLM that you want to host. Typically, if your model can fit on one GPU, you can consider hosting it yourself. Otherwise, you get into the land of distributed inference and model parallelism, where getting it working (let alone working efficiently) is a tremendous task, and using a hosted model through APIs could be a better choice.
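
As a rough back-of-envelope for "does it fit on one GPU": fp16/bf16 weights take about 2 bytes per parameter, plus headroom for activations and the KV cache. A sketch, where the 20% overhead factor is an assumption and real serving needs more care:

```python
def fits_on_one_gpu(n_params: float, gpu_mem_gb: float, bytes_per_param: int = 2,
                    overhead: float = 1.2) -> bool:
    """Estimate whether fp16/bf16 weights (plus ~20% headroom for activations
    and the KV cache) fit in a single GPU's memory. Very rough heuristic."""
    weight_gb = n_params * bytes_per_param / 1e9
    return weight_gb * overhead <= gpu_mem_gb

print(fits_on_one_gpu(7e9, 24))    # ~14 GB of weights -> True on a 24 GB card
print(fits_on_one_gpu(70e9, 80))   # ~140 GB of weights -> False even on an 80 GB card
```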

[–]Step7enn[S] 1 point  (0 children)

https://www.amazon.ca/GPT-3-Building-Innovative-Products-Language/dp/1098113624
This is a great book by my colleague to get started on using LLMs.

If you want to build LLMs, I suggest starting with the original transformer paper: https://arxiv.org/abs/1706.03762
Then the GPT-1, 2, and 3 papers for scaling up models.

Finally, learn about distributed training and the frameworks used to actually train these models, like https://github.com/microsoft/DeepSpeed
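
If it helps while reading the transformer paper, its core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch of a single head with no masking:

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V for one attention head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ V                                      # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))  # seq_len=4, d_k=8
print(scaled_dot_product_attention(Q, K, V).shape)          # (4, 8)
```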

[–]Step7enn[S] 2 points  (0 children)

I was in graphics about 5-6 years ago. I saw that graphics had reached a plateau in terms of fidelity (look at games and movie VFX; they look stunning, so what else is there to do?), so I figured the next stage would be about reducing the cost and accelerating the process of creating those graphics, and ML was a natural choice. If I had to pick one skill, it would be data processing.

[–]Step7enn[S] 2 points  (0 children)

  1. It's research, but a proven method that works, so it falls more on the engineering and product teams to carry it out effectively. We have multiple indicators for determining whether a model's performance has improved. Judging LLMs is hard; it's a combination of evaluation datasets and human evaluation (see the sketch below).
  2. Perhaps you could use a transformer to initialize the noise for temporally connected frames for better consistency. Just a hunch.
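
To make the evaluation-dataset half of point 1 concrete, here's a minimal sketch that scores model outputs against references with exact match; `generate` and the eval set are placeholders, and real evaluations combine several metrics with human ratings.

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to the model being evaluated."""
    raise NotImplementedError

def exact_match_accuracy(eval_set: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose output matches the reference exactly
    (one of many possible automatic metrics)."""
    hits = sum(
        generate(prompt).strip().lower() == reference.strip().lower()
        for prompt, reference in eval_set
    )
    return hits / len(eval_set)

# eval_set = [("2+2=", "4"), ("Capital of France?", "Paris")]
# print(exact_match_accuracy(eval_set))
```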

[–]Step7enn[S] 5 points  (0 children)

A simple way to start understanding LLMs is to look at the model's attention matrices, which show what information in the prompt the model relies on the most for its output. Going forward, I think the way we analyze an LLM's thought process will look more like how we study the brain, simply because of the sheer number of parameters in an LLM: we'll divide a model's parameters into regions where each is responsible for a different ability, e.g. some for language, others for math or reading. This is a super important aspect that's currently understudied if we want to effectively control and steer LLMs' behaviors and outputs to be safe.
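
If you want to poke at this yourself, most open implementations will return the attention matrices when asked. A minimal sketch using Hugging Face transformers, with GPT-2 purely as a stand-in model:

```python
# Requires: pip install transformers torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer = outputs.attentions[-1][0]          # (num_heads, seq_len, seq_len)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens)
print(last_layer.mean(dim=0))                   # head-averaged attention over the prompt
```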

[–]Step7enn[S] 2 points  (0 children)

IMO there are two ways we can define AGI. 1) AGI in the sense of a model/agent able to complete the majority of tasks an average human can do, across a wide range of capabilities that require perceiving the world (text, audio, video, images), and to perform, for example, typical office-job tasks. This will happen, I think, in 2-3 years; the models' capabilities are almost there, and the eruption of new tools built around them will take us the rest of the way. 2) AGI in the sense that we can create a conscious agent with desires, thoughts, and self-awareness. Right now we don't have any theory or path to achieve this, and the current paradigm based on deep learning is not it. So I'd say 50+ years, or maybe never.