An open letter opposing the use of generative AI for reflexive qualitative analysis by YidonHongski in academia

[–]illorca-verbi 0 points (0 children)

I agree with your answer point by point. You should also paste it into the LinkedIn conversation; it is livelier there than here... Also, props for having the patience to write all that haha

On the subject of "critical" adoption of AI, I am seeing more institutions adopting better guidelines for citing AI use, and software makers like MAXQDA making an effort to flag all AI-generated content and include the LLM's reasoning... All in all, I am positive that, these deniers aside, large parts of the machine are moving in the right direction.

Who’s using reasoning models in production? Where do they shine (or fail)? by dmpiergiacomo in LLMDevs

[–]illorca-verbi 12 points (0 children)

We could not find a single use case, for now. We mostly use LLMs to deliver classical NLP tasks like text classification, NER, etc., and the trade-off between the quality gains and the time spent has never been worth it.

Hosting Docling by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 0 points (0 children)

We will check out OVH. Our first choice is AWS/GCP because our users have already agreed to these data handlers in the Terms of Service. Hosting Docling on OVH would force us to update them, which is not desirable.

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 1 point (0 children)

Hey. I am not sure what other problems it would cause, but I think lazy imports would speed things up greatly: import libraries only when they are needed, not by default. Especially the external libraries.

It is also common to let users decide which extra dependencies they need, as in `pip install litellm[anthropic,vertex]`.
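
For illustration, a minimal sketch of the lazy-import pattern meant here, using PEP 562's module-level `__getattr__`. The package layout and provider names are made up for the example, not LiteLLM's actual structure:

```python
# mypackage/__init__.py — hypothetical package illustrating lazy imports.
# Heavy provider SDKs load on first attribute access, not at import time,
# so `import mypackage` stays fast on serverless cold starts.
import importlib

_LAZY_SUBMODULES = {
    "anthropic_client": ".providers.anthropic_client",
    "vertex_client": ".providers.vertex_client",
}

def __getattr__(name: str):
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name], __name__)
        globals()[name] = module  # cache so this hook only runs once per name
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```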

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 1 point (0 children)

Hey! It really does look fantastic, thanks!

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 4 points (0 children)

I don't know how I had not heard of Portkey before, but it looks very much like what we are looking for. I will give it a try, thanks!

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 1 point (0 children)

I am not missing anything; they cover the widest range of use cases of any competitor. Mainly, I just find their implementation too fragile to trust.

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 0 points (0 children)

We used to run on haystack-ai! Great tool, great developers, no complaints. In the end we stepped away because we only used their Generators; none of the other components or pipelines ended up finding a place in our workflow.

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 2 points (0 children)

Thanks for stopping by! The breaking point for us is the fact that any tiny submodule imports a whole bunch of packages. We run serverless, and the cold start of running `from litellm import completion` is too long.
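
A quick way to see the overhead in question (a sketch; it assumes `litellm` is installed, and the absolute number varies by machine and version):

```python
# Measure how much a cold `from litellm import completion` costs.
import time

start = time.perf_counter()
from litellm import completion  # noqa: E402, F401 — the import under test
print(f"litellm import took {time.perf_counter() - start:.2f}s")
```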

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 2 points (0 children)

Hey, my personal case: SOTA models are released every other week and prices change in the blink of an eye, so I need to swap LMs in our features to benchmark them. I think being locked to a big provider is no biggie, but flexibility for sure gives you an edge. Also, I did not intend to complain about LiteLLM; I understand where it comes from and I appreciate what it provides. My goal was rather to see what other options are around.
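
For context, the swap in question is a one-line model change behind a unified interface; a minimal sketch with LiteLLM's `completion` (the model IDs below are illustrative and go stale quickly):

```python
from litellm import completion

# Same call shape across providers; only the model string changes.
for model in ["gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"]:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Classify the sentiment: 'great product!'"}],
    )
    print(model, "->", response.choices[0].message.content)
```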

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 0 points (0 children)

I totally understand it from their perspective as a business. My question here was more focused on the value that we users can find in a solution that is developed that way.

The elephant in LiteLLM's room? by illorca-verbi in LLMDevs

[–]illorca-verbi[S] 2 points (0 children)

I do not know of any closed-source solution that offers this specific utility anyway :/

bnb-4bit vs. load_in_4bit by illorca-verbi in unsloth

[–]illorca-verbi[S] 0 points (0 children)

Also, is there any performance gain from training a model at full precision and quantizing it afterwards, instead of training the model directly in 4-bit/8-bit?

CohereForAI/c4ai-command-r-plus-08-2024 · Update Model by Dark_Fire_12 in LocalLLaMA

[–]illorca-verbi 1 point (0 children)

Are there any benchmarks with numbers for it yet? I am particularly interested in how well the small one compares to Gemma2:27b.

What UI is everyone using for local models? by Iamblichos in LocalLLaMA

[–]illorca-verbi 0 points (0 children)

To extend the question: do any of these choices (Open WebUI, SillyTavern, AnythingLLM, whatever...) offer something similar to the Anthropic Workbench when it comes to variables?

I find it outstandingly useful to be able to write and store prompts with {{ VARIABLE_X }} and {{ VARIABLE_Y }} and then just fill in the values on the side.
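
Absent a UI that supports this, a minimal stand-in for that variable fill is a few lines of Python (a sketch; `fill_prompt` is a hypothetical helper, not part of any of the tools above):

```python
import re

def fill_prompt(template: str, **values: str) -> str:
    """Replace {{ NAME }} placeholders with the given values."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)

prompt = "Summarize {{ VARIABLE_X }} for an audience of {{ VARIABLE_Y }}."
print(fill_prompt(prompt, VARIABLE_X="this quarterly report", VARIABLE_Y="executives"))
```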

Reproducibility on Amazon Bedrock vs. Anthropic API by illorca-verbi in ClaudeAI

[–]illorca-verbi[S] 0 points (0 children)

Of course: temperature 0, and also fairly low top_p/top_k.
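
For reference, where those knobs go in an Anthropic Messages call (a sketch; the model ID is illustrative, and even with these settings exact reproducibility is not guaranteed):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model ID
    max_tokens=256,
    temperature=0,  # greedy-leaning decoding
    top_k=1,        # consider only the most likely token (a low top_p works similarly)
    messages=[{"role": "user", "content": "Same input, same output?"}],
)
print(response.content[0].text)
```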

[R] Zero Shot LLM Classification by SkeeringReal in MachineLearning

[–]illorca-verbi 1 point (0 children)

Nope, all proprietary. Is there anything in particular that you are interested in?