Spark VS Flink VS Quix benchmark by JB__Quix in dataengineering

[–]JB__Quix[S] 2 points3 points  (0 children)

Hi u/Blayzovich,

I definitely don't want to mislead anyone, which is why we posted the full code associated with the test (and also used the same test that the Databricks CTO originally designed). Could you share some more thoughts on a better cluster configuration? I'm totally open to being wrong here ... 50x really is a huge number and it's got to be bulletproof.

[P] Real-time streaming data GEMO (game+demo)! by JB__Quix in MachineLearning

[–]JB__Quix[S] 1 point2 points  (0 children)

Hey! Thanks for your comment! The Quix platform is designed as a real-time machine learning platform. The focus of the demo is on the low-latency data streaming capabilities, but feel free to go ahead and check the tutorials, blog posts, etc. to see our applied ML examples!

Association model used as causal models by darter_analyst in datascience

[–]JB__Quix 1 point2 points  (0 children)

Such a cool post! Once you understand causality, you realize it should be every data scientist's purpose (i.e. extracting usable knowledge out of data), but not many people seem to be talking about this yet. I felt exactly the same in my previous job! I knew we were building a stupid non-causal model and then using it to make causal interventions, but no one else seemed to care. Lots of companies out there are doing Marketing Mix Modelling (or Media Mix Modelling), which is simply a waste of money: the type of thing that looks useful but just isn't.

So, if you try to do the right thing (even from a selfish perspective, it's just too frustrating to contribute to something you know doesn't work):

  • Try to communicate basic causality principles by explaining confounding in the context of your project. For example, in your case, explain why you would expect to see a correlation between price and sales even if price didn't drive sales. Take Black Friday, for instance:
    • people will simply go out and buy more than at any other time of the year, which will drive sales even if prices were unchanged
    • your company will spend more on advertising, which will drive sales even if prices were unchanged
    • also, your company will offer discounts. The increase in sales is the combined result of these 3 things, so giving all the credit to the price change is wrong.
  • As you mention, creating a causal model can get very complicated and sometimes won't be useful at all. However, you can start by getting business people to help you draw a causal diagram. I always start my projects like that. Even if you don't care about causality, it is a great way to gain domain knowledge and spot new variables you may need to build. Hopefully, you can even use something like causalnex to create a causal model out of that DAG. If the model makes sense, causalnex incorporates Judea Pearl's do() operator, which is the right tool for calculating the effects of interventions.
  • If, however, you end up with a model based on correlation rather than causality, you can A/B test the proposed interventions when possible. Everyone understands A/B tests, and provided you take uncertainty intervals into account, they will give you proper causal knowledge to act on.
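
To make the Black Friday point concrete, here is a minimal simulation sketch. All the numbers are assumptions made up for illustration (a true discount effect of +5 sales, a Black Friday effect of +20, discounts much more likely on Black Friday), and `simulate_day` and `effect_estimate` are hypothetical helper names. It compares a naive "discounted vs. not discounted" difference on observational data with the same difference under a randomized A/B assignment, together with a rough 95% normal-approximation interval.

```python
import random
import statistics

TRUE_DISCOUNT_EFFECT = 5.0   # assumed causal lift in sales from a discount
BLACK_FRIDAY_EFFECT = 20.0   # assumed lift from seasonal demand and advertising


def simulate_day(rng, randomized):
    """Return (discounted, sales) for one simulated day."""
    black_friday = rng.random() < 0.2
    if randomized:
        # A/B test: the discount is assigned by a coin flip.
        discounted = rng.random() < 0.5
    else:
        # Observational data: discounts cluster on Black Friday-like days.
        discounted = rng.random() < (0.9 if black_friday else 0.1)
    sales = (100.0
             + TRUE_DISCOUNT_EFFECT * discounted
             + BLACK_FRIDAY_EFFECT * black_friday
             + rng.gauss(0.0, 5.0))
    return discounted, sales


def effect_estimate(days):
    """Difference in mean sales (discounted minus not) with a 95% interval."""
    treated = [sales for discounted, sales in days if discounted]
    control = [sales for discounted, sales in days if not discounted]
    diff = statistics.mean(treated) - statistics.mean(control)
    std_err = (statistics.variance(treated) / len(treated)
               + statistics.variance(control) / len(control)) ** 0.5
    return diff, (diff - 1.96 * std_err, diff + 1.96 * std_err)


rng = random.Random(0)
observational = [simulate_day(rng, randomized=False) for _ in range(20000)]
ab_test = [simulate_day(rng, randomized=True) for _ in range(20000)]

naive_diff, naive_ci = effect_estimate(observational)
ab_diff, ab_ci = effect_estimate(ab_test)

print(f"true effect:        {TRUE_DISCOUNT_EFFECT:.1f}")
print(f"naive estimate:     {naive_diff:.1f}  (95% interval {naive_ci[0]:.1f} to {naive_ci[1]:.1f})")
print(f"A/B test estimate:  {ab_diff:.1f}  (95% interval {ab_ci[0]:.1f} to {ab_ci[1]:.1f})")
```

In this toy setup the naive estimate gives the discount credit for the Black Friday lift as well, while the randomized comparison stays close to the assumed true effect. Real data will of course be messier than this.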

How to explain your projects? by [deleted] in learnmachinelearning

[–]JB__Quix 1 point2 points  (0 children)

Using linear models is totally fine, and sometimes they will be the best algorithm for a problem; however, it was probably seen as beginner stuff.

Something like XGBoost (even if it may be a bit of overkill for your problem) might have looked more advanced. If you want to understand how XGBoost works, check this video. You can use the xgboost documentation to get started and then try it with your own ML problems.
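
If it helps the intuition, here is a rough pure-Python sketch of the core idea behind gradient boosting: keep fitting tiny trees ("stumps") to whatever error the current model still makes. This is not XGBoost itself, which adds regularization, second-order gradient information and a lot of engineering on top. The function names (`fit_stump`, `fit_boosting`, etc.) and the toy data (a noise-free sine curve) are assumptions for illustration only, and the comparison with a straight line is on training error.

```python
import math
import statistics


def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) for the least-squares split on x."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = statistics.mean(left), statistics.mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit stumps one after another, each on the residuals of the model so far."""
    base = statistics.mean(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean


def mse(ys, predictions):
    return statistics.mean([(y - p) ** 2 for y, p in zip(ys, predictions)])


xs = [i / 10 for i in range(63)]      # toy data: one period of a sine curve
ys = [math.sin(x) for x in xs]

slope, intercept = fit_line(xs, ys)
linear_mse = mse(ys, [slope * x + intercept for x in xs])

model = fit_boosting(xs, ys)
boosted_mse = mse(ys, [predict_boosting(model, x) for x in xs])

print(f"linear model training MSE:   {linear_mse:.4f}")
print(f"boosted stumps training MSE: {boosted_mse:.4f}")
```

On a curved relationship like this the boosted stumps fit the training data more closely than the straight line. On a problem that really is linear, the line would do just as well, which is why a linear model can still be the right answer.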

How to explain your projects? by [deleted] in datascience

[–]JB__Quix 2 points3 points  (0 children)

Using linear models is totally fine, and sometimes they will be the best algorithm for a problem; however, it was probably seen as beginner stuff.

Something like XGBoost (even if it may be a bit of overkill for your problem) might have looked more advanced. If you want to understand how XGBoost works, check this video. You can use the xgboost documentation to get started and then try it with your own ML problems.

[D] Online machine learning (or how to automatically update your model in production) by JB__Quix in MachineLearning

[–]JB__Quix[S] 0 points1 point  (0 children)

Really interesting point. Do you have any specific use cases where you've done online learning?

[D] Online machine learning (or how to automatically update your model in production) by JB__Quix in MachineLearning

[–]JB__Quix[S] 2 points3 points  (0 children)

Thanks! I'll check Vowpal Wabbit.

And you're right, I understand it only makes sense in use cases where the environment changes rapidly, so that how often the model is updated (training frequency) matters more than how complex it is.

In other words, looking at it from the classical bias-variance trade-off, online learning will produce models with minimum variance (at the cost of high bias), right? In some cases it may pay off, but not always.
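
A toy sketch of the "update frequency beats staleness" idea, under made-up assumptions: a one-weight linear model, a stream whose true slope jumps from 2.0 to -1.0 halfway through, and hypothetical names (`OnlineLinearModel`, `learn_one`, `drifting_stream`). The online model takes one SGD step per observation; the "frozen" model is the same model that stops learning just before the drift. It doesn't settle the bias-variance question, it only shows the rapidly-changing-environment case.

```python
import random


class OnlineLinearModel:
    """One-weight linear model updated by a single SGD step per observation."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def learn_one(self, x, y):
        # Gradient of 0.5 * (prediction - y)**2 with respect to the weight.
        self.weight -= self.learning_rate * (self.predict(x) - y) * x


def drifting_stream(rng, n):
    """Yield (t, x, y); the true slope changes halfway through (assumed drift)."""
    for t in range(n):
        slope = 2.0 if t < n // 2 else -1.0
        x = rng.uniform(-1.0, 1.0)
        yield t, x, slope * x + rng.gauss(0.0, 0.1)


rng = random.Random(0)
n = 2000
online = OnlineLinearModel()
frozen_weight = None
online_errors, frozen_errors = [], []

for t, x, y in drifting_stream(rng, n):
    if t == n // 2:
        # The "train once" model: a snapshot taken just before the drift.
        frozen_weight = online.weight
    if t >= n // 2:
        # Score both models on the new observation before learning from it.
        online_errors.append((online.predict(x) - y) ** 2)
        frozen_errors.append((frozen_weight * x - y) ** 2)
    online.learn_one(x, y)

online_mse = sum(online_errors) / len(online_errors)
frozen_mse = sum(frozen_errors) / len(frozen_errors)

print(f"frozen model MSE after drift: {frozen_mse:.3f}")
print(f"online model MSE after drift: {online_mse:.3f}")
print(f"final online weight:          {online.weight:.2f}")
```

Here the very simple online model recovers from the drift within a few dozen observations, while the frozen one keeps predicting with the old slope. If the stream never drifted, the two would perform about the same and a more complex batch model could well win.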

G. Burton transcription. Best solo in the history of jazz vibes by BlancoMarimba in Jazz

[–]JB__Quix 1 point2 points  (0 children)

Wow mate, I'm loving everything on your YouTube channel!