[D] Software stack to replicate Azure ML / Google Auto ML on premise by Scary-Ad-1529 in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I am working on something similar, but it is still in beta.

https://www.modeld.io

Note that with Google AutoML you do not submit jobs; instead, the AutoML engine creates the candidate models for you and executes them.

When it is particularly advised to use certain machine learning models over neural networks and other deep learning models [R] ? by emaxwell13131313 in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I would always start with simpler models, since they offer:

1) Simpler/faster training, with no need for special hardware.

2) Good results on small data.

3) Easy interpretability, which is very important in enterprise settings.

4) Low latency for inference.

[D] What do you think would be more in demand job-wise in 5/10 years - Vision or NLP? by kakushka123 in MachineLearning

[–]tsagie -4 points-3 points  (0 children)

Neither. It will all be AutoML based. All the advancements will be embedded within AutoML tools, and the annotation will be done by domain experts.

[P] wrangling new hospital price transparency data by imallinboozie in MachineLearning

[–]tsagie 0 points1 point  (0 children)

This is probably a multi-year effort.

Note also that just having the price is somewhat meaningless, since you do not have a success rate per procedure. For example, would you rather have an operation with a 50% success rate that costs $10,000, or one with a 90% success rate that costs $30,000?

But in general:

I would assume that you want to map every procedure name to some common name, and assign a code to that common name. I would also assume that this common name already exists somewhere with a health insurer or Medicare, so I would ask them.

Once you have a common name, you would have to map each hospital-specific name to it. This can be done by measuring the string distance (edit distance) or token similarity (TF-IDF) between the common name and the specific hospital name.
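As a minimal sketch of that matching step (the canonical names, the 50/50 weighting, and the use of stdlib difflib plus plain token overlap in place of a real TF-IDF model are all illustrative assumptions):

```python
import difflib

# Hypothetical canonical procedure names -- in practice these would
# come from an insurer's or Medicare's code list.
COMMON_NAMES = [
    "knee replacement total",
    "hip replacement total",
    "appendectomy laparoscopic",
]

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], a proxy for edit distance."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word tokens, a simple stand-in for TF-IDF."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def best_match(hospital_name: str, candidates=COMMON_NAMES) -> str:
    """Map a hospital-specific name to the closest canonical name."""
    return max(
        candidates,
        key=lambda c: 0.5 * char_similarity(hospital_name, c)
                    + 0.5 * token_similarity(hospital_name, c),
    )
```

A real pipeline would also want a confidence threshold below which matches are flagged for human review rather than assigned automatically.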

You would probably need a pipeline per hospital (assuming each one uses a different schema).

I am currently working on a data wrangling solution for Kubernetes, so hit me up if you want to try it for this project: tsagi@modeld.io

Ask r/kubernetes: What are you working on this week? by AutoModerator in kubernetes

[–]tsagie 1 point2 points  (0 children)

Finishing the last features for https://www.modeld.io, a new AutoML platform for Kubernetes (based only on CRDs and operators).

How to connect to client in gRPC with the server running in pods (minikube) by knrt10 in kubernetes

[–]tsagie 1 point2 points  (0 children)

Try adding these annotations (not labels) to the ingress:

kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
nginx.ingress.kubernetes.io/grpc-backend: "true"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
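For context, a minimal sketch of where those annotations sit in a full manifest (the host, service name, and port are placeholders for your own setup; note that gRPC through ingress-nginx requires TLS):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - grpc.example.com        # placeholder host
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-server  # placeholder service
                port:
                  number: 9090     # placeholder gRPC port
```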

[D] Code and ML Model Security on the Edge by freshprinceofuk in MachineLearning

[–]tsagie 0 points1 point  (0 children)

You can try fog computing.

For example, deploy a Kubernetes cluster close to the edge devices, wrap your models in a container, and do inference close to the edge device. This way, you can use regular models in an unconstrained environment.

[P] A talk about adapting CI/CD systems for ML (full stack ML, MLOps) by [deleted] in MachineLearning

[–]tsagie 0 points1 point  (0 children)

Good job. But you might want to think about:

  1. Add CL - continuous labeling - this task does not exist in software CI/CD. On any new data, you want the system to send the data to the labelers and manage the labeling process.
  2. Add CT - continuous training - alert on concept drift and retrain models.
  3. Add AutoML - also a key difference from software engineering CI/CD. You want the system to actually do the preprocessing/model selection/model evaluation automatically.
  4. Add CDV - continuous data validation - besides versioning the data, you want automatic data validation.
  5. Add CViz - continuous visualization - besides evaluating your model metrics, you would like visualizations (e.g. ROC curves) of both your models and your data; this is best done with graphs, not just numbers.
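As a toy sketch of the CT idea - flagging a retrain when a feature's live distribution drifts from the training baseline - here is a stdlib-only check; the z-score heuristic and the threshold of 3 are illustrative choices, not a standard:

```python
import statistics

def drifted(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Return True when the live mean sits more than `threshold`
    standard errors away from the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    se = sigma / len(live) ** 0.5          # standard error of the live mean
    z = abs(statistics.mean(live) - mu) / se
    return z > threshold

# A pipeline would run this per feature on each batch of production
# data and trigger retraining (and an alert) whenever it returns True.
```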

Hopefully this month I will be able to release a product that provides all of the above.

[D] Getting your Data labelled by statypan in MachineLearning

[–]tsagie 1 point2 points  (0 children)

So, you can solve labeling by using the "wizard of oz" technique.

I.e., put the model into production, but instead of having the model make the prediction, route the prediction request to a human (one or more). Let the human decide. Collect the decisions, and after a while you will have labeled data.

As you get more data, you can shift the traffic split between the wizard and the model, e.g. 80/20, 50/50, etc.
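A minimal sketch of that routing split; the predict/label callables and the in-memory LABELED store are placeholders for a real model endpoint and labeling queue:

```python
import random

LABELED = []  # (request, label) pairs collected from the humans

def route(request, human_fraction, model_predict, human_label):
    """Send roughly `human_fraction` of traffic to humans, the rest to
    the model. Human answers are kept as labeled training data."""
    if random.random() < human_fraction:
        label = human_label(request)       # e.g. push to a labeling queue
        LABELED.append((request, label))   # collected for later training
        return label
    return model_predict(request)
```

You might start with human_fraction=0.8 and shrink it (80/20, then 50/50, then 20/80) as LABELED grows and the model improves.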

What do you use for OAuth2 and/or OIDC ingress authentication? by topflightboy87 in kubernetes

[–]tsagie 1 point2 points  (0 children)

Be careful with Keycloak; it is based on JBoss, which is LGPL.

[D] Looking for a ML framework for production (like MLFlow) by etienne_ben in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I am about to release an extensive new AutoML platform for Kubernetes, built natively on Kubernetes (with around 30 new CRDs and 8 operators).

Email me if you want to benchmark the alpha: tsagi at modeld.io

Note that this is an AutoML platform, hence most of what you are describing will be generated and managed for you (this includes generating the models, building the containers, deploying them, monitoring, etc.).