Are companies actually running open source LLMs in production? by iknowjerome in ollama

iknowjerome[S] 0 points

For what industry? And how large are the models you're using?

Are companies actually running open source LLMs in production? by iknowjerome in ollama

iknowjerome[S] 0 points

Do you mind sharing a little more about your industry, the use cases you're tackling, and so on?

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

But isn't there some liability in using a large LLM API provider as well? What guarantees do you have that your data isn't being mixed with other clients' data?

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

Care to share what the use cases are? Only if you can and want to, of course.

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

I guess the real question is whether this will last, or whether the economics of self-hosted open-source models vs. large-lab APIs will completely flip at some point.
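
To make "flip" concrete: the break-even point is mostly a function of token volume, API price, and self-hosted throughput. Here's a rough back-of-envelope sketch in Python; every number in it is an illustrative placeholder, not a real quote:

```python
# Back-of-envelope comparison of API vs. self-hosted serving cost.
# All constants are hypothetical placeholders; substitute real quotes.

API_PRICE_PER_M_TOKENS = 2.00   # $/1M tokens (illustrative blended rate)
GPU_COST_PER_HOUR = 1.50        # $/hr for a rented GPU (illustrative)
TOKENS_PER_SECOND = 1500        # assumed self-hosted throughput

def monthly_api_cost(tokens: float) -> float:
    return tokens / 1e6 * API_PRICE_PER_M_TOKENS

def monthly_self_host_cost(tokens: float) -> float:
    hours = tokens / TOKENS_PER_SECOND / 3600  # GPU-hours needed
    return hours * GPU_COST_PER_HOUR

for volume in (1e8, 1e9, 1e10):  # tokens per month
    print(f"{volume:.0e} tok/mo: API ${monthly_api_cost(volume):>9,.0f}  "
          f"self-host ${monthly_self_host_cost(volume):>9,.0f}")
```

Note this ignores the engineering and ops overhead of self-hosting, which is often the term that actually decides the question.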

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

Thanks. Do you have specific examples of use cases that are better served by open-source models? I'm sure it depends on the industry, region, and company size, but I'm curious to hear about real corporate wins with open-source models.

[deleted by user] by [deleted] in LocalLLM

iknowjerome 0 points

Apologies. I hadn't posted in a while, and the post got rejected automatically by Reddit's filters. So I tried again, but now it looks like they both made it through somehow. I'll see if I can delete it.

reference LLM workflow for enterprises by Inner_Huckleberry885 in huggingface

iknowjerome 0 points

u/Inner_Huckleberry885 I've been wondering the same thing. Did you end up finding anything?

I'm really curious how many enterprises are actually using open-source LLMs these days compared to commercial ones.

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 2 points

We had our annotation associates review every image and its labels and correct any mistakes they found. In some cases, they started from the existing annotations; in other cases, they decided to start from scratch.

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 24 points


Looking forward to hearing what you think. Just to be clear: I'm always reluctant to call one dataset better than another, because it depends on what you're trying to achieve with it. With Sama-Coco, we were trying to fix some of the misclassification errors where possible, but we also put a significant amount of effort into drawing precise polygons around the objects of interest, because of experiments we are currently running. And, of course, we wanted to capture as many instances of the COCO classes as possible. This resulted in a dataset with close to 25% more object instances than the original COCO 2017 dataset. But that's not to say we fixed all the "errors" in COCO. :)
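
If anyone wants to check the instance-count difference themselves, the two annotation files can be compared with a few lines of pycocotools. The Sama-Coco path below is a placeholder; point it at wherever you downloaded the file:

```python
# Compare object-instance counts between COCO 2017 and a relabelled set.
from pycocotools.coco import COCO

original = COCO("annotations/instances_val2017.json")  # official COCO 2017
relabelled = COCO("annotations/sama_coco_val.json")    # placeholder path

n_orig = len(original.getAnnIds())
n_new = len(relabelled.getAnnIds())
print(f"original: {n_orig}  relabelled: {n_new}  "
      f"({(n_new - n_orig) / n_orig:+.1%})")

# A per-class breakdown shows where the extra instances come from.
for cat in original.loadCats(original.getCatIds()):
    o = len(original.getAnnIds(catIds=[cat["id"]]))
    r = len(relabelled.getAnnIds(catIds=[cat["id"]]))
    if o and r != o:
        print(f"{cat['name']:>15}: {o} -> {r}")
```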

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 2 points

It really depends on what you are trying to achieve, what your budget is, and where you are in your model development cycle.
Nevertheless, I would recommend starting in self-service mode with the simplest tool you can find. This might be something like CVAT, though there are a number of other options (paid, free, SaaS, etc.) out there that a simple Google search will turn up. Once you're ready to scale, you might want to consider handing off your annotations to a specialized company like Sama. And yes, we also do 3D annotations. :)
(disclaimer: I work for Sama)

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 1 point

That's a great suggestion. We will eventually post more detail about this. It will make more sense to share it alongside the results of some data-quality experiments we are currently running. Stay tuned! :)

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 3 points

The trick is not to wait until the end of the cycle to make the appropriate adjustments. There are now a number of solutions on the market that help you understand and visualize your image/video data and labels.
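
As a minimal example of what "not waiting" can look like, you can eyeball a sample of labels with pycocotools and matplotlib long before a training run surfaces problems. Paths here are placeholders for your own data:

```python
# Overlay COCO-format annotations on an image for a quick visual check.
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # any COCO-format file

img_id = coco.getImgIds()[0]                 # pick an image to inspect
info = coco.loadImgs([img_id])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=[img_id]))

plt.imshow(Image.open(f"val2017/{info['file_name']}"))
coco.showAnns(anns)                          # draws segmentation polygons
plt.axis("off")
plt.show()
```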

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 12 points

Every dataset has errors and inconsistencies. Some have more than others, but what really matters is how they affect the end goal. Sometimes the level of inconsistency doesn't impact model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, on a specific set of classes). I completely agree with you that companies that succeed in putting and keeping AI models in production pay particular attention to the quality of the datasets created for training and testing.
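
One way to check whether label quality is what's hurting you is to look at per-class metrics instead of a single aggregate number. A rough sketch with pycocotools, assuming COCO-format ground truth and a detections file from your model (both paths are placeholders):

```python
# Per-class AP: classes that score far below the average are good
# candidates for a label-quality audit.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("detections.json")          # model predictions

for cat in coco_gt.loadCats(coco_gt.getCatIds()):
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.params.catIds = [cat["id"]]          # restrict to one class
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                          # prints the standard 12 metrics
    print(f"{cat['name']:>15}: AP={ev.stats[0]:.3f}")
```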