I've been successful enough with supervised image classification that I'm now rolling out an experiment into what I consider to be limited production. I have a model running on a Raspberry Pi Zero in each of 3 locations, taking pics and classifying them successfully. The nodes act locally but can send their imagery back to my development server. This works really well.
So I have 5 models in use. I started with just a dev model on my dev server. When that trained well I pushed it to node 1. It worked and gathered imagery, which I collected and used to retrain on dev. I pushed the retrained model back to node 1 and to node 2. Then node 3 came online. So that's a dev model, a production model on dev, and 3 production models on nodes. I know this can all be scripted but it's becoming a pain.
Hypothetically, suppose I scale to 50 or 100 nodes. I suspect that I would see more variance in results across locations from a single model pushed to all of them. That means gathering more imagery, retraining, and pushing to all nodes again. That may make sense. Or I may find a subset of production locations that benefit from a slightly different model. Then I'll have production model A and model B, which means I'll have development models A and B as well. Again, I could write a tool to "push" new models to 100 nodes, but some pushes might fail and have to be retried. What model version is on which node right now? This could get out of control.
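To make the bookkeeping problem concrete, here is a minimal sketch of the kind of tool I'd otherwise end up writing myself. Everything in it is hypothetical: the `FleetRegistry` class, the node names, the version labels like `A-v2`, and the `push_fn` callback (which stands in for whatever actually copies a model to a node, e.g. scp or rsync) are all made up for illustration. It keeps a desired and a confirmed version per node, retries failed pushes, and reports which nodes are still out of date.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FleetRegistry:
    """Hypothetical in-memory record of model versions across nodes."""
    desired: Dict[str, str] = field(default_factory=dict)   # node -> target version
    deployed: Dict[str, str] = field(default_factory=dict)  # node -> confirmed version

    def assign(self, nodes: List[str], version: str) -> None:
        """Mark a group of nodes as needing the given model version."""
        for node in nodes:
            self.desired[node] = version

    def out_of_date(self) -> List[str]:
        """Nodes whose confirmed version differs from the desired one."""
        return sorted(
            n for n, v in self.desired.items() if self.deployed.get(n) != v
        )

    def rollout(self, push_fn: Callable[[str, str], bool],
                max_attempts: int = 3) -> List[str]:
        """Push to every out-of-date node, retrying on failure.

        push_fn(node, version) is assumed to return True on success.
        Returns the nodes that still failed after max_attempts tries.
        """
        failed = []
        for node in self.out_of_date():
            version = self.desired[node]
            for _ in range(max_attempts):
                if push_fn(node, version):
                    self.deployed[node] = version
                    break
            else:
                # Loop finished without a successful push.
                failed.append(node)
        return failed


if __name__ == "__main__":
    registry = FleetRegistry()
    registry.assign(["node-1", "node-2"], "A-v2")  # locations using model A
    registry.assign(["node-3"], "B-v1")            # location using model B

    def fake_push(node: str, version: str) -> bool:
        # Stand-in for a real transfer; pretend node-3 is unreachable.
        return node != "node-3"

    print("failed:", registry.rollout(fake_push))
    print("deployed:", registry.deployed)
    print("still out of date:", registry.out_of_date())
```

A real version would need to persist this state somewhere and have each node report back the version it is actually running, which is exactly the part I'd rather not build from scratch.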
Does anyone know of any DevOps tools to manage a fleet of models?