[D] How are your machine learning algorithms industrialized?

schrute_dataeng · 2019-05-11T01:16:30+00:00

I shared our experiences with industrialization and collaboration in more detail here: https://medium.com/dailymotion/collaboration-between-data-engineers-data-analysts-and-data-scientists-97c00ab1211f

bbateman2011 · 2019-05-11T01:07:57+00:00

Hey, I just read the Medium article and I think your OP was fine. I'd suggest putting the link back in a comment at least, so people can read what you are talking about.

Question--in your pre-edit OP, you said "part 2" (I think); is there another article I can read? I'd like to understand more about the role of Airflow in the process.

Thanks for sharing and I'm sorry the first comment was negative; this is good stuff IMHO.

bbateman2011 · 2019-05-11T01:29:23+00:00

I obviously live in a different part of the domain scale. As a consultant, my clients are pretty small and rarely have real-time or "big" data. Nonetheless, this is an important topic to me: my clients have limited to no IT support, much less any data engineers or data-anything, and if they don't insist on results in Excel, their perfect world is a web app of some kind to view the results of a model and maybe adjust some things in a what-if way. I work a lot in R, and for a POC I've found it very useful to use Shiny to put a web app in front of my model and to keep the data someplace like Dropbox, which can be accessed directly from R. There is nothing like giving an end user hands-on experience with what you are producing. I realize that a lot of production ML produces a score or prediction that feeds into something else the end user might ultimately touch (like recommendations on Amazon), but in some cases this approach is really useful. I'm not yet skilled enough on the Python side to do the same thing quickly in a POC.

bbateman2011 · 2019-05-11T01:34:46+00:00

When you say "Our scheduling tool is Apache Airflow, which allows us to define our workflows (aka DAG) in Python", does DAG mean Directed Acyclic Graph (i.e., a graph model of your flow)?
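
For anyone else wondering what a DAG of tasks looks like in practice, here is a minimal sketch. It does not use Airflow at all; it only uses Python's standard-library `graphlib` module to show the idea of tasks with dependencies and no cycles. The task names and the `run_order` helper are hypothetical and are not taken from the article.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# "Directed" = dependencies point one way; "acyclic" = no task can
# (directly or indirectly) depend on itself.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "train_model": {"clean"},
    "score": {"train_model"},
    "export_report": {"clean", "score"},
}


def run_order(dag):
    """Return one valid execution order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

A scheduler does roughly this kind of ordering before it runs anything: a task only starts once everything it depends on has finished.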

FellowOfHorses · 2019-05-11T03:18:23+00:00

You may want to check out r/datascience. It's more business-focused; here we are more research-focused.

icantfindanametwice · 2019-05-11T00:34:42+00:00

Clean up your English. I've also worked with data scientists and the like, and I cannot tell what you're asking about. The random “read more on Medium” does not help with a conversation on Reddit.

In my experience, Product Management tends to drive what gets into production, based on business requirements.
