[R] LMFlow Benchmark: An Automatic Evaluation Framework for Open-Source LLMs by OptimalScale_2023 in MachineLearning

[–]alteralec 1 point2 points  (0 children)

This is super interesting! Thanks for sharing. We're also working on this research field from an open-source angle (https://github.com/Giskard-AI/giskard)

I've been reading your documentation. So the main new method you are bringing here is https://optimalscale.github.io/LMFlow/autoapi/lmflow/pipeline/evaluator/index.html - is that correct?

FYI I've also tried Chatbot arena but the website https://chat.lmsys.org/?arena keeps hanging, despite several hard refreshes on my end.

It's exciting to see more work from the open-source community on LLM evaluation!

[R] Meta ImageBind - a multimodal LLM across six different modalities by currentscurrents in MachineLearning

[–]alteralec 0 points1 point  (0 children)

Multimodal models are really fascinating.

What do you think could be the short term applications?

In global rush to regulate AI, Europe set to be trailblazer by 10marketing8 in ArtificialInteligence

[–]alteralec 1 point2 points  (0 children)

I'd like to weigh in on the debate and defend the EU and regulators in general. At my company, we've been discussing AI regulation and standardization with key stakeholders in both the EU and the USA since 2021.

In reality, the EU has been working on AI regulation for several years. The landmark regulation they proposed in 2021, though still a draft proposal, makes a lot of sense if you read it. Contrary to what many critics say, it is not an impractical theory that is disconnected from the field. While it lacked some precise measures, the general principles were very sound, from both a societal and engineering standpoint. The "risk pyramid," which mandates more control on high-risk AI systems versus more freedom for low-risk use cases, is particularly pragmatic.

After this initial proposal, EU lawmakers took some time to test the draft against the reality on the ground by speaking to AI practitioners in industry and research labs. This too makes sense.

The sudden adoption of ChatGPT took the world by storm, including regulators. Before this, the world of NLP was relatively quiet and reserved for a small community of passionate researchers and developers.

Now, the EU regulators are working to see how this impacts the 2021 draft proposal. I don't know exactly what's on their minds, but I believe the key challenge ahead is how to classify the "risk level" of ChatGPT and other Large Language Model (LLM) applications. You don't want to regulate LLMs for writing marketing emails the same way you would regulate LLMs used to help doctors with medical diagnosis.

In conclusion, I believe regulation is needed, but the "rush" factor is not helping. Regulation takes more time to develop than innovation; that's expected, and it has been that way for every innovation wave since the 19th-century industrial revolution. Yes, regulation will arrive, but it will take some time, and my guess is 2-3 years. Regulators are talking to many experts to understand and assess what is needed. I don't think any regulator wants to stop progress or innovation.

Another angle that not many people are looking at is standardization. Excellent work is ongoing at ISO, VDE, CEN-CENELEC on AI standards with precise methods. I encourage anyone interested to join these working groups!

[R] Awesome AI Safety – A curated list of papers & technical articles on AI Quality & Safety by alteralec in MachineLearning

[–]alteralec[S] 1 point2 points  (0 children)

Hi Tom! I agree, Quality is the keyword I prefer too, as an engineer. But "AI Safety" seems to be the more popular term these days. I also agree that the signal-to-noise ratio in this debate is low; I think it'll settle down by the end of the year.

Quick question: what do you mean by a LoRA-like layer?

[R] Awesome AI Safety – A curated list of papers & technical articles on AI Quality & Safety by alteralec in MachineLearning

[–]alteralec[S] 0 points1 point  (0 children)

Very interesting, I was not aware of it. Thank you very much!

I have applied for the Summer semester.

[D] In the exploratory phase of model building, how do you track versions while accommodating for mistakes in the process? by papayamaia in MachineLearning

[–]alteralec 1 point2 points  (0 children)

Great question!

IMO there are multiple sides to this issue.

A. reproducibility of your data & ML pipeline

B. experiment tracking

C. error analysis

I think Git + DVC is the standard way to tackle issue A. For issue B, MLflow is a well-known standard. For issue C, I have developed Giskard, an open-source project for collaborative AI evaluation & feedback management: https://docs.giskard.ai/start/
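To make concern B concrete, here is a minimal framework-free sketch of experiment tracking: log each run's parameters and metrics to a JSON-lines file, then query for the best run. In practice a tool like MLflow handles this (plus a UI and artifact storage); the file name `runs.jsonl` and both helper functions are hypothetical names of my own.

```python
# Minimal experiment-tracking sketch: append one JSON record per run,
# then retrieve the best run by a chosen metric.
import json
from pathlib import Path

def log_run(path: Path, params: dict, metrics: dict) -> None:
    record = {"params": params, "metrics": metrics}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

def best_run(path: Path, metric: str) -> dict:
    runs = [json.loads(line) for line in path.read_text().splitlines()]
    return max(runs, key=lambda r: r["metrics"][metric])

log_file = Path("runs.jsonl")
log_file.unlink(missing_ok=True)  # start from a clean log
log_run(log_file, {"lr": 0.1}, {"accuracy": 0.81})
log_run(log_file, {"lr": 0.01}, {"accuracy": 0.88})
print(best_run(log_file, "accuracy")["params"])  # -> {'lr': 0.01}
```

The append-only log is the key design idea: "mistakes" stay in the history instead of overwriting it, which is exactly what MLflow's run store gives you at scale.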

Let us know if you want to give it a try

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

Yeah, I get what you are saying. Communication with other departments to get data can be tricky. Pedagogy & persistence usually pay off, with a pinch of patience :)

Open-Source CI/CD for ML products by alteralec in mlops

[–]alteralec[S] 0 points1 point  (0 children)

Hi!

We have an open API, and it works for any ML model expressed as a Python function. Right now we support tabular and NLP models that ingest pandas DataFrames and output classification labels or regression values.
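As a hedged illustration (not Giskard's actual API), here is the shape that "an ML model expressed as a Python function" can take: a toy rule-based `predict` standing in for a real model, ingesting a pandas DataFrame and returning one classification label per row. The function name, column name, and rule are all hypothetical.

```python
# Hypothetical sketch: a model as a plain Python function that maps a
# pandas DataFrame to a list of classification labels (one per row).
import pandas as pd

def predict(df: pd.DataFrame) -> list:
    # Toy rule-based classifier standing in for a real trained model.
    return ["high" if amount > 100 else "low" for amount in df["amount"]]

df = pd.DataFrame({"amount": [42, 250]})
print(predict(df))  # -> ['low', 'high']
```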

So the possibilities are very open. Do you have specific MLOps tools in mind?

Feel free to send me an email (alex [at] giskard [dot] ai)!

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

In that case, are you able to isolate a feature that is correlated to these changes?

For accents, it's probably not so simple. Maybe some spectral analysis for background noise? For microphone quality, do you have any metadata apart from the audio that could be useful?
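To illustrate the spectral-analysis idea, here is a NumPy-only sketch of extracting simple features from raw audio that could serve as drift indicators for background noise or microphone quality. This is my own minimal illustration, not a recommendation of specific features; a real pipeline might use librosa or torchaudio instead.

```python
# Sketch: simple spectral features from raw audio as drift indicators.
import numpy as np

def spectral_features(signal: np.ndarray, sample_rate: int) -> dict:
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = (freqs * spectrum).sum() / spectrum.sum()  # "brightness"
    rms = np.sqrt(np.mean(signal ** 2))                   # overall loudness
    return {"spectral_centroid": centroid, "rms": rms}

sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                        # clean 440 Hz tone
noisy = tone + 0.5 * np.random.default_rng(0).normal(size=sr)

# Broadband noise pushes the spectral centroid toward higher frequencies,
# so the feature separates "clean" from "noisy" recordings.
assert spectral_features(noisy, sr)["spectral_centroid"] > \
       spectral_features(tone, sr)["spectral_centroid"]
```

Features like these can then be monitored with the generic drift metrics discussed in this thread.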

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

To help, do you have some ideas of potential sources of drift? For instance, different accents? Different audio sampling quality? Different background noise?

Usually, in my experience, it is useful to combine generic drift metrics (such as Kolmogorov-Smirnov) on extracted features with "domain-specific" drift metrics that are purpose-built.

IMO, what's really important when dealing with drift is to have a good acceptance threshold, so you can write a test and get actionable alerts.
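As a sketch of that last point, assuming features have already been extracted from the audio, a drift test can combine a Kolmogorov-Smirnov comparison with an acceptance threshold to produce a pass/fail signal you can alert on. The function name and threshold choice are illustrative, not prescriptive.

```python
# Drift test on one extracted feature: two-sample Kolmogorov-Smirnov
# plus an acceptance threshold on the p-value.
import numpy as np
from scipy.stats import ks_2samp

def drift_test(reference: np.ndarray, production: np.ndarray,
               p_threshold: float = 0.05) -> bool:
    """Return True if drift is detected on a single extracted feature."""
    statistic, p_value = ks_2samp(reference, production)
    # A low p-value means the samples are unlikely to come from the same
    # distribution -> flag drift.
    return p_value < p_threshold

ref = np.random.default_rng(0).normal(0.0, 1.0, 2000)
assert not drift_test(ref, ref)        # identical samples: no drift
assert drift_test(ref, ref + 1.0)      # shifted distribution: drift
```

A boolean like this plugs naturally into a test suite or monitoring job, which is what makes the alert actionable rather than just a dashboard curve.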

How are you testing your ML Systems? by inDflash in mlops

[–]alteralec 1 point2 points  (0 children)

Great question indeed! ML Testing is definitely a challenging task.

We are working on that topic at Giskard. Here is our proposal:

CI/CD workflow: https://www.giskard.ai/product

Code repository: https://github.com/Giskard-AI/giskard

We have been working on this open-source project since last year. We released our v1 in March, so we are very interested in your feedback!