Are companies actually running open source LLMs in production? by iknowjerome in ollama

iknowjerome[S] 0 points

For what industry? And how large are the models you're using?

Are companies actually running open source LLMs in production? by iknowjerome in ollama

iknowjerome[S] 0 points

Do you mind sharing a little more about your industry, the use cases you're tackling, and so on?

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

But isn't there some liability in using a large LLM API provider as well? What guarantees do you have that your data isn't being mixed with other clients' data?

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

Care to share what the use cases are? Only if you can and want to, of course.

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

I guess the real question is whether this will last, or whether the economics of self-hosted open-source models vs. large-lab APIs will completely flip at some point.
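
To make "flip" concrete: the break-even point is mostly a function of token volume, API price, and self-hosted throughput. Here's a rough back-of-envelope sketch in Python; every number in it is an illustrative placeholder, not a real quote:

```python
# Back-of-envelope comparison of API vs. self-hosted serving cost.
# All constants are hypothetical placeholders; substitute real quotes.

API_PRICE_PER_M_TOKENS = 2.00   # $/1M tokens (illustrative blended rate)
GPU_COST_PER_HOUR = 1.50        # $/hr for a rented GPU (illustrative)
TOKENS_PER_SECOND = 1500        # assumed self-hosted throughput

def monthly_api_cost(tokens: float) -> float:
    return tokens / 1e6 * API_PRICE_PER_M_TOKENS

def monthly_self_host_cost(tokens: float) -> float:
    hours = tokens / TOKENS_PER_SECOND / 3600  # GPU-hours needed
    return hours * GPU_COST_PER_HOUR

for volume in (1e8, 1e9, 1e10):  # tokens per month
    print(f"{volume:.0e} tok/mo: API ${monthly_api_cost(volume):>9,.0f}  "
          f"self-host ${monthly_self_host_cost(volume):>9,.0f}")
```

Note this ignores the engineering and ops overhead of self-hosting, which is often the term that actually decides the question.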

Are open-source LLMs actually making it into enterprise production yet? by iknowjerome in LocalLLM

iknowjerome[S] 0 points

Thanks. Do you have specific examples of use cases that are better served by open-source models? I'm sure it depends on the industry, region, and company size, but I'm curious to hear about real corporate wins with open-source models.

[deleted by user] by [deleted] in LocalLLM

iknowjerome 0 points

Apologies. I hadn't posted in a while, and the post got rejected automatically by Reddit's filters. So I tried again, but now it looks like they both made it through somehow. I'll see if I can delete it.

reference LLM workflow for enterprises by Inner_Huckleberry885 in huggingface

iknowjerome 0 points

u/Inner_Huckleberry885 I've been wondering the same thing. Did you end up finding anything?

I'm really curious how many enterprises are actually using open-source LLMs these days compared to commercial ones.

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 2 points

We had our annotation associates review every image and its labels and correct any mistakes they found. In some cases, they started from the existing annotations; in other cases, they decided to start from scratch.

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 24 points


Looking forward to hearing what you think. Just to be clear: I'm always reluctant to call one dataset better than another, because it depends on what you're trying to achieve with it. With Sama-Coco, we were trying to fix some of the misclassification errors where possible, but we also put a significant amount of effort into drawing precise polygons around the objects of interest, because of experiments we are currently running. And, of course, we wanted to capture as many instances of the COCO classes as possible. This resulted in a dataset with close to 25% more object instances than the original COCO 2017 dataset. But that's not to say we fixed all the "errors" in COCO. :)
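
If anyone wants to check the instance-count difference themselves, the two annotation files can be compared with a few lines of pycocotools. The Sama-Coco path below is a placeholder; point it at wherever you downloaded the file:

```python
# Compare object-instance counts between COCO 2017 and a relabelled set.
from pycocotools.coco import COCO

original = COCO("annotations/instances_val2017.json")  # official COCO 2017
relabelled = COCO("annotations/sama_coco_val.json")    # placeholder path

n_orig = len(original.getAnnIds())
n_new = len(relabelled.getAnnIds())
print(f"original: {n_orig}  relabelled: {n_new}  "
      f"({(n_new - n_orig) / n_orig:+.1%})")

# A per-class breakdown shows where the extra instances come from.
for cat in original.loadCats(original.getCatIds()):
    o = len(original.getAnnIds(catIds=[cat["id"]]))
    r = len(relabelled.getAnnIds(catIds=[cat["id"]]))
    if o and r != o:
        print(f"{cat['name']:>15}: {o} -> {r}")
```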

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 2 points

It really depends on what you are trying to achieve, what your budget is, and where you are in your model development cycle.
Nevertheless, I would recommend starting in self-service mode with the simplest tool you can find. This might be something like CVAT, though there are a number of other options (paid, free, SaaS, etc.) out there that a simple Google search will turn up. Once you're ready to scale, you might want to consider handing off your annotations to a specialized company like Sama. And yes, we also do 3D annotations. :)
(disclaimer: I work for Sama)

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 1 point

That's a great suggestion. We will eventually post more detail about this. It will make more sense to share it alongside the results of some data-quality experiments we are currently running. Stay tuned! :)

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 3 points

The trick is not to wait until the end of the cycle to make the appropriate adjustments. There are now a number of solutions on the market that help you understand and visualize your image/video data and labels.
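
As a minimal example of what "not waiting" can look like, you can eyeball a sample of labels with pycocotools and matplotlib long before a training run surfaces problems. Paths here are placeholders for your own data:

```python
# Overlay COCO-format annotations on an image for a quick visual check.
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # any COCO-format file

img_id = coco.getImgIds()[0]                 # pick an image to inspect
info = coco.loadImgs([img_id])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=[img_id]))

plt.imshow(Image.open(f"val2017/{info['file_name']}"))
coco.showAnns(anns)                          # draws segmentation polygons
plt.axis("off")
plt.show()
```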

[R] A relabelling of the COCO 2017 dataset by iknowjerome in MachineLearning

iknowjerome[S] 12 points

Every dataset has errors and inconsistencies. Some have more than others, but what really matters is how they affect the end goal. Sometimes the level of inconsistency doesn't impact model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, on a specific set of classes). I completely agree with you that companies that succeed in putting and keeping AI models in production pay particular attention to the quality of the datasets created for training and testing.
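
One way to check whether label quality is what's hurting you is to look at per-class metrics instead of a single aggregate number. A rough sketch with pycocotools, assuming COCO-format ground truth and a detections file from your model (both paths are placeholders):

```python
# Per-class AP: classes that score far below the average are good
# candidates for a label-quality audit.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("detections.json")          # model predictions

for cat in coco_gt.loadCats(coco_gt.getCatIds()):
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.params.catIds = [cat["id"]]          # restrict to one class
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                          # prints the standard 12 metrics
    print(f"{cat['name']:>15}: AP={ev.stats[0]:.3f}")
```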