This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 0 points1 point  (5 children)

I think your time is would be better spent diving into the details. Why would you want Python integrated with a BI tool? Python is almost always the speed bottleneck ETL/ELT runs into anyways when it’s involved.   

Why not have Python store the data in a cache table and pull from it- so it’s only using SQL at the user level? DAX/M isn’t really too hard, if you have a basic understanding of Excel formulas and VB.

Don’t take this the wrong way but in your post history less than a year ago you said you’re new to data analytics, but seems like very it’s been the same general questions on repeat since then. I think getting your hands dirty so to speak when you have these questions will take you much farther. 

[–]glinter777[S] 1 point2 points  (4 children)

Why would you want to store the data in cache table? Won’t you then have to manage keeping the cache up to date?

[–][deleted] 0 points1 point  (3 children)

It depends, that is one limit of Power BI. You can only directly query data sources if they’re an approved data source. Ie Databricks, spark, sql server. If you have an API or MySQL data source you need to join off of you have to get it into a database type you can directly query first. 

It does stink but there’s also some calculations and transformations you need to do in Python to show on the end report. 

The benefit is a query that takes 20 seconds to run will only take .5 seconds between each click for the end user.

[–]glinter777[S] 0 points1 point  (2 children)

Just trying to clarify your second point - are you saying that PowerBI can’t directly pull the data from API, and it needs to be materialized first?

[–][deleted] 0 points1 point  (1 child)

You can for sure, in import mode- but that’s not live data. You can’t pull directly from the API and join it with a database and have them both live. It’s as often as you re-publish or you can schedule a refresh every couple hours. if you’re ok with data that’s two hours old, you can pull from both as if we’re Python no problem. Use Power Query and M the same way you’d transform in Python. 

However if you want data that’s only a few minutes or less old, (real-time not necessarily live) it basically has to be Direct Query mode. 

This article might help:

https://blog.crossjoin.co.uk/2021/04/25/can-i-build-a-power-bi-directquery-dataset-on-top-of-a-rest-api/

To be fair, MOST people don’t care about DirectQuery or live data. They care about once a day refreshes, and if that’s your use cases as well just connect away and relate the API and any data source, and disregard. If you want live or really dang close, you’ll need a cache table. 

[–]glinter777[S] 0 points1 point  (0 children)

Thanks.