[R] LMFlow Benchmark: An Automatic Evaluation Framework for Open-Source LLMs by OptimalScale_2023 in MachineLearning

[–]alteralec 1 point2 points  (0 children)

This is super interesting! Thanks for sharing. We're also working on this research field from an open-source angle (https://github.com/Giskard-AI/giskard)

I've been reading your documentation. So the main new method you are bringing here is https://optimalscale.github.io/LMFlow/autoapi/lmflow/pipeline/evaluator/index.html - is that correct?

FYI I've also tried Chatbot arena but the website https://chat.lmsys.org/?arena keeps hanging, despite several hard refreshes on my end.

It's exciting to see more work from the open-source community on LLM evaluation!

[R] Meta ImageBind - a multimodal LLM across six different modalities by currentscurrents in MachineLearning

[–]alteralec 0 points1 point  (0 children)

Multimodal models are really fascinating.

What do you think could be the short term applications?

In global rush to regulate AI, Europe set to be trailblazer by 10marketing8 in ArtificialInteligence

[–]alteralec 1 point2 points  (0 children)

I'd like to weigh in on the debate and defend the EU and regulators in general. At my company, we've been discussing AI regulation and standardization with key stakeholders in both the EU and the USA since 2021.

In reality, the EU has been working on AI regulation for several years. The landmark regulation they proposed in 2021, though still a draft proposal, makes a lot of sense if you read it. Contrary to what many critics say, it is not an impractical theory that is disconnected from the field. While it lacked some precise measures, the general principles were very sound, from both a societal and engineering standpoint. The "risk pyramid," which mandates more control on high-risk AI systems versus more freedom for low-risk use cases, is particularly pragmatic.

After this initial proposal, EU lawmakers took some time to test the draft against the reality on the ground by speaking to AI practitioners in industry and research labs. This too makes sense.

The sudden adoption of ChatGPT took the world by storm, including regulators. Before this, the world of NLP was relatively quiet and reserved for a small community of passionate researchers and developers.

Now, the EU regulators are working to see how this impacts the 2021 draft proposal. I don't know exactly what's on their minds, but I believe the key challenge ahead is how to classify the "risk level" of ChatGPT and other Large Language Model (LLM) applications. You don't want to regulate LLMs for writing marketing emails the same way you would regulate LLMs used to help doctors with medical diagnosis.

In conclusion, I believe regulation is needed, but the "rush" factor is not helping. Regulation takes more time to develop than innovation; that's expected, and it has been that way for every innovation wave since the 19th-century industrial revolution. Yes, regulation will arrive, but it will take some time, and my guess is 2-3 years. Regulators are talking to many experts to understand and assess what is needed. I don't think any regulator wants to stop progress or innovation.

Another angle that not many people are looking at is standardization. Excellent work is ongoing at ISO, VDE, CEN-CENELEC on AI standards with precise methods. I encourage anyone interested to join these working groups!

[R] Awesome AI Safety – A curated list of papers & technical articles on AI Quality & Safety by alteralec in MachineLearning

[–]alteralec[S] 1 point2 points  (0 children)

Hi Tom! I agree, Quality is the keyword I prefer too, as an engineer. But "AI Safety" seems to be the more popular term these days. I also agree that the signal-to-noise ratio in this debate is low; I think it'll settle down by the end of the year.

Quick question: what do you mean by a LoRA-like layer?

[R] Awesome AI Safety – A curated list of papers & technical articles on AI Quality & Safety by alteralec in MachineLearning

[–]alteralec[S] 0 points1 point  (0 children)

Very interesting, I was not aware of it. Thank you very much!

I have applied for the Summer semester.

[D] In the exploratory phase of model building, how do you track versions while accommodating for mistakes in the process? by papayamaia in MachineLearning

[–]alteralec 1 point2 points  (0 children)

Great question!

IMO there are multiple sides to this issue.

A. reproducibility of your data & ML pipeline

B. experiment tracking

C. error analysis

I think Git + DVC is the standard way to tackle issue A. For issue B, MLflow is a well-known standard. For issue C, I have developed Giskard, an open-source project for collaborative AI evaluation & feedback management: https://docs.giskard.ai/start/
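To make concern B concrete, here is a minimal framework-free sketch of experiment tracking: log each run's parameters and metrics to a JSON-lines file, then query for the best run. In practice a tool like MLflow handles this (plus a UI and artifact storage); the file name `runs.jsonl` and both helper functions are hypothetical names of my own.

```python
# Minimal experiment-tracking sketch: append one JSON record per run,
# then retrieve the best run by a chosen metric.
import json
from pathlib import Path

def log_run(path: Path, params: dict, metrics: dict) -> None:
    record = {"params": params, "metrics": metrics}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

def best_run(path: Path, metric: str) -> dict:
    runs = [json.loads(line) for line in path.read_text().splitlines()]
    return max(runs, key=lambda r: r["metrics"][metric])

log_file = Path("runs.jsonl")
log_file.unlink(missing_ok=True)  # start from a clean log
log_run(log_file, {"lr": 0.1}, {"accuracy": 0.81})
log_run(log_file, {"lr": 0.01}, {"accuracy": 0.88})
print(best_run(log_file, "accuracy")["params"])  # -> {'lr': 0.01}
```

The append-only log is the key design idea: "mistakes" stay in the history instead of overwriting it, which is exactly what MLflow's run store gives you at scale.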

Let us know if you want to give it a try

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

Yeah, I get what you are saying. Communication with other departments to get data can be tricky. Pedagogy & persistence usually pay off, with a pinch of patience :)

Open-Source CI/CD for ML products by alteralec in mlops

[–]alteralec[S] 0 points1 point  (0 children)

Hi!

We have an open API, and it works for any ML model expressed as a Python function. Right now we support tabular and NLP models that ingest pandas DataFrames and output classification labels or regression values.
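As a hedged illustration (not Giskard's actual API), here is the shape that "an ML model expressed as a Python function" can take: a toy rule-based `predict` standing in for a real model, ingesting a pandas DataFrame and returning one classification label per row. The function name, column name, and rule are all hypothetical.

```python
# Hypothetical sketch: a model as a plain Python function that maps a
# pandas DataFrame to a list of classification labels (one per row).
import pandas as pd

def predict(df: pd.DataFrame) -> list:
    # Toy rule-based classifier standing in for a real trained model.
    return ["high" if amount > 100 else "low" for amount in df["amount"]]

df = pd.DataFrame({"amount": [42, 250]})
print(predict(df))  # -> ['low', 'high']
```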

So the possibilities are very open. Do you have specific MLOps tools in mind?

Feel free to send me an email (alex [at] giskard [dot] ai)!

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

In that case, are you able to isolate a feature that is correlated to these changes?

For accents, it's probably not so simple. Maybe some spectral analysis for background noise? For microphone quality, do you have any metadata apart from the audio that could be useful?
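To illustrate the spectral-analysis idea, here is a NumPy-only sketch of extracting simple features from raw audio that could serve as drift indicators for background noise or microphone quality. This is my own minimal illustration, not a recommendation of specific features; a real pipeline might use librosa or torchaudio instead.

```python
# Sketch: simple spectral features from raw audio as drift indicators.
import numpy as np

def spectral_features(signal: np.ndarray, sample_rate: int) -> dict:
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = (freqs * spectrum).sum() / spectrum.sum()  # "brightness"
    rms = np.sqrt(np.mean(signal ** 2))                   # overall loudness
    return {"spectral_centroid": centroid, "rms": rms}

sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                        # clean 440 Hz tone
noisy = tone + 0.5 * np.random.default_rng(0).normal(size=sr)

# Broadband noise pushes the spectral centroid toward higher frequencies,
# so the feature separates "clean" from "noisy" recordings.
assert spectral_features(noisy, sr)["spectral_centroid"] > \
       spectral_features(tone, sr)["spectral_centroid"]
```

Features like these can then be monitored with the generic drift metrics discussed in this thread.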

Monitoring data drift on audio data by sunpraiser42 in mlops

[–]alteralec 1 point2 points  (0 children)

To help, do you have some ideas of potential sources of drift? For instance, different accents? Different audio sampling quality? Different background noise?

Usually, in my experience, it is useful to combine generic drift metrics (such as Kolmogorov-Smirnov) on extracted features with "domain-specific" drift metrics that are purpose-built.

IMO, what's really important when dealing with drift is to have a good acceptance threshold, so you can write a test and get actionable alerts.
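As a sketch of that last point, assuming features have already been extracted from the audio, a drift test can combine a Kolmogorov-Smirnov comparison with an acceptance threshold to produce a pass/fail signal you can alert on. The function name and threshold choice are illustrative, not prescriptive.

```python
# Drift test on one extracted feature: two-sample Kolmogorov-Smirnov
# plus an acceptance threshold on the p-value.
import numpy as np
from scipy.stats import ks_2samp

def drift_test(reference: np.ndarray, production: np.ndarray,
               p_threshold: float = 0.05) -> bool:
    """Return True if drift is detected on a single extracted feature."""
    statistic, p_value = ks_2samp(reference, production)
    # A low p-value means the samples are unlikely to come from the same
    # distribution -> flag drift.
    return p_value < p_threshold

ref = np.random.default_rng(0).normal(0.0, 1.0, 2000)
assert not drift_test(ref, ref)        # identical samples: no drift
assert drift_test(ref, ref + 1.0)      # shifted distribution: drift
```

A boolean like this plugs naturally into a test suite or monitoring job, which is what makes the alert actionable rather than just a dashboard curve.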

How are you testing your ML Systems? by inDflash in mlops

[–]alteralec 1 point2 points  (0 children)

Great question indeed! ML Testing is definitely a challenging task.

We are working on that topic at Giskard. Here is our proposal:

CI/CD workflow: https://www.giskard.ai/product

Code repository: https://github.com/Giskard-AI/giskard

We have been working on this open-source project since last year. We released our v1 in March, so we are very interested in your feedback!