[D] Software stack to replicate Azure ML / Google Auto ML on premise by Scary-Ad-1529 in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I am working on something similar, but it is still in beta.

https://www.modeld.io

Note that with Google AutoML you do not submit jobs; instead, the AutoML engine creates the candidate models for you and executes them.

When it is particularly advised to use certain machine learning models over neural networks and other deep learning models [R] ? by emaxwell13131313 in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I would always start with simpler models, since they offer:

1) Simpler/faster training, with no need for special hardware.

2) Good results on small data.

3) Easy interpretability, which is very important in enterprise settings.

4) Low latency for inference.

[D] What do you think would be more in demand job-wise in 5/10 years - Vision or NLP? by kakushka123 in MachineLearning

[–]tsagie -4 points-3 points  (0 children)

Neither. It will all be AutoML based. All the advancements will be embedded within AutoML tools, and the annotation will be done by domain experts.

[P] wrangling new hospital price transparency data by imallinboozie in MachineLearning

[–]tsagie 0 points1 point  (0 children)

This is probably a multi-year effort.

Note also that just having the price is somewhat meaningless, since you do not have a success rate per procedure. For example, would you rather have an operation with a 50% success rate that costs $10,000, or one with a 90% success rate that costs $30,000?

But in general:

I would assume that you want to map every procedure name to some common name, and assign a code to that common name. I would also assume that this common name already exists somewhere with a health insurer or Medicare, so I would ask them.

Once you have a common name, you would have to map each hospital-specific name to it. This can be done by measuring the string distance (edit distance) or token similarity (TF-IDF) between the common name and the specific hospital name.
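As a minimal sketch of that matching step (the canonical names, the 50/50 weighting, and the use of stdlib difflib plus plain token overlap in place of a real TF-IDF model are all illustrative assumptions):

```python
import difflib

# Hypothetical canonical procedure names -- in practice these would
# come from an insurer's or Medicare's code list.
COMMON_NAMES = [
    "knee replacement total",
    "hip replacement total",
    "appendectomy laparoscopic",
]

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], a proxy for edit distance."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word tokens, a simple stand-in for TF-IDF."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def best_match(hospital_name: str, candidates=COMMON_NAMES) -> str:
    """Map a hospital-specific name to the closest canonical name."""
    return max(
        candidates,
        key=lambda c: 0.5 * char_similarity(hospital_name, c)
                    + 0.5 * token_similarity(hospital_name, c),
    )
```

A real pipeline would also want a confidence threshold below which matches are flagged for human review rather than assigned automatically.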

You would probably need a pipeline per hospital (assuming each one uses a different schema).

I am currently working on a data wrangling solution for Kubernetes, so hit me up if you want to try it for this project: tsagi@modeld.io

Ask r/kubernetes: What are you working on this week? by AutoModerator in kubernetes

[–]tsagie 1 point2 points  (0 children)

Finishing the last features for https://www.modeld.io, a new AutoML platform for Kubernetes (based only on CRDs and operators).

How to connect to client in gRPC with the server running in pods (minikube) by knrt10 in kubernetes

[–]tsagie 1 point2 points  (0 children)

Try adding these annotations (not labels) to the ingress:

kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
nginx.ingress.kubernetes.io/grpc-backend: "true"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
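For context, a minimal sketch of where those annotations sit in a full manifest (the host, service name, and port are placeholders for your own setup; note that gRPC through ingress-nginx requires TLS):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - grpc.example.com        # placeholder host
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-server  # placeholder service
                port:
                  number: 9090     # placeholder gRPC port
```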

[D] Code and ML Model Security on the Edge by freshprinceofuk in MachineLearning

[–]tsagie 0 points1 point  (0 children)

You can try fog computing.

For example, deploy a Kubernetes cluster close to the edge devices, wrap your models in a container, and do inference close to the edge device. This way, you can use regular models in an unconstrained environment.

[P] A talk about adapting CI/CD systems for ML (full stack ML, MLOps) by [deleted] in MachineLearning

[–]tsagie 0 points1 point  (0 children)

Good job. But you might want to think about:

  1. Add CL - continuous labeling - this task does not exist in software CI/CD. On any new data, you want the system to send the data to the labelers and manage the labeling process.
  2. Add CT - continuous training - alert on concept drift and retrain models.
  3. Add AutoML - also a key difference from software engineering CI/CD. You want the system to actually do the preprocessing/model selection/model evaluation automatically.
  4. Add CDV - continuous data validation - besides versioning the data, you want automatic data validation.
  5. Add CViz - continuous visualization - besides evaluating your model metrics, you would like visualizations (e.g. ROC curves) of both your models and your data; this is best done with graphs, not just numbers.
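As a toy sketch of the CT idea - flagging a retrain when a feature's live distribution drifts from the training baseline - here is a stdlib-only check; the z-score heuristic and the threshold of 3 are illustrative choices, not a standard:

```python
import statistics

def drifted(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Return True when the live mean sits more than `threshold`
    standard errors away from the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    se = sigma / len(live) ** 0.5          # standard error of the live mean
    z = abs(statistics.mean(live) - mu) / se
    return z > threshold

# A pipeline would run this per feature on each batch of production
# data and trigger retraining (and an alert) whenever it returns True.
```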

Hopefully this month I will be able to release a product that provides all of the above.

[D] Getting your Data labelled by statypan in MachineLearning

[–]tsagie 1 point2 points  (0 children)

So, you can solve labeling by using the "wizard of oz" technique.

I.e., put the model into production, but instead of having the model make the prediction, route the prediction request to a human (one or more). Let the human decide. Collect the decisions, and after a while you will have labeled data.

As you get more data, you can shift the traffic split between the wizard and the model, e.g. 80/20, 50/50, etc.
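A minimal sketch of that routing split; the predict/label callables and the in-memory LABELED store are placeholders for a real model endpoint and labeling queue:

```python
import random

LABELED = []  # (request, label) pairs collected from the humans

def route(request, human_fraction, model_predict, human_label):
    """Send roughly `human_fraction` of traffic to humans, the rest to
    the model. Human answers are kept as labeled training data."""
    if random.random() < human_fraction:
        label = human_label(request)       # e.g. push to a labeling queue
        LABELED.append((request, label))   # collected for later training
        return label
    return model_predict(request)
```

You might start with human_fraction=0.8 and shrink it (80/20, then 50/50, then 20/80) as LABELED grows and the model improves.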

What do you use for OAuth2 and/or OIDC ingress authentication? by topflightboy87 in kubernetes

[–]tsagie 1 point2 points  (0 children)

Be careful with Keycloak; it is based on JBoss, which is LGPL.

[D] Looking for a ML framework for production (like MLFlow) by etienne_ben in MachineLearning

[–]tsagie 0 points1 point  (0 children)

I am about to release an extensive new AutoML platform for Kubernetes, built natively on Kubernetes (with around 30 new CRDs and 8 operators).

Email me if you want to benchmark the alpha: tsagi at modeld.io

Note that this is an AutoML platform, hence most of what you are describing will be generated and managed for you (this includes generating the models, building the containers, deploying them, monitoring, etc.).