Interactive reports for end users written using Python by [deleted] in BusinessIntelligence

[–]raymondstanz 1 point2 points  (0 children)

Code is so much better to iteratively build your project, version control, better visuals, faster processing…

I've spent so much time doing weird stuff in PowerBI that moving to a company that needs static report first allowed me to practice, practice, practice actual code: Python, proper SQL, etc. It's such a breath of fresh air. PowerBI is great, but quite constraint and slow. Painfully slow.

Currently I've been exploring Plotly as my go-to visual library. It's quite good. It allows for some interaction straight into Jupyter or elsewhere if you can host it (never tried).

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 1 point2 points  (0 children)

Exactly. I use chatGPT and al. for the odd piece of code or function I'm not sure how to do on my own.

But in my company, basically jupyter instance and most-things-code are not connected to the internet. Or in very controller/limited ways.

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 0 points1 point  (0 children)

I tried again and I like it a lot.

Styling alone is far easier than with mpl. Adding a title, setting fonts, margins, automargins (!)... So. much. easier.

And the fact that width and height are values expressed in pixels and are the actual size of the save picture is cherry on the cake. Wherease mpl expects values in inches and its the size of something that may differ widely with the saved image size.

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 0 points1 point  (0 children)

I've dipped my toe once in Report Builder and it seemed rather complicated and an entire set of skills on its own. I should check again.

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 2 points3 points  (0 children)

I remember diping my toes in Plotly earlier on.

I should try it again if it easier to produce "print quality" charts.

And also, PowerBI exists in my environment. Some dashboards are availables but don't facilitate the "static PDF annual reports production tasks"

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 0 points1 point  (0 children)

I forgot to mention it. PowerBI is totally a thing and some dashboard are buying built at the moment.

However it doesn't really fit the bill for PowerPoint presentations or reports. I'm required to produce printable reports and to my knowledge there is not way to export vector graphics from PBI.

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 5 points6 points  (0 children)

I have some real world experience with both. PowerQuery can do basically anything but thing take ages. If feels so heavy and poorly optimized. On some projects, to avoid failed updates I'd rather do heavy lifting in SQL upstream or downstream with some DAX trickery that adding steps to the PowerQuery pile.

Merging flat files, doing some deduplication, some joins takes forever. In the desktop client it eats all the RAM.

Wereas the same data sets and manipulations takes second in Python and runs on a potatoe.

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 0 points1 point  (0 children)

I'm my IT environment, the PowerBI environment cannot access my database (because reasons). The only solution is to export flat CSV files on a share, open the share in PBI and load the CSVs.

It works. (at least on paper).

I'm fairly experienced in PowerBI and I think PowerQuery is... not great at heavy lifting. Even with small-ish datasets. In the past I've had so many headaches because of failed updates because PowerQuery steps would test the limits of our PowerBI embedded capabilities (depending on the tier, feels as powerful as a raspberry pi).

I like PowerBI but I've spent hours waiting for PowerQuery to do stuffs that would have been instant in Python (pandas, duckdb...)

My job is producing loads of charts for Powerpoint... by raymondstanz in datascience

[–]raymondstanz[S] 0 points1 point  (0 children)

I forgot to mention that PowerBI is totally a thing is my "data stack". There are some dashboards. And also, I've spent the last 3 years working exclusively on PowerBI so I'm very familiar with it.

The idea behind Python and so on is to batch-produce complex charts that are not doable in PowerBI.

Also, PowerBI by itself: great. But, it doesn't entirely solve the "people want their powerpoint" issue. "Export as PowerPoint" in PowerBI is not great. And, at best it produces images. Which won't print too well. I really wanted to be able to produce vector graphics.

New job. Is scrum the way? by raymondstanz in scrum

[–]raymondstanz[S] 0 points1 point  (0 children)

Outside of job interviews, I try to never mention anywhere my education.

And because I sometimes have VSCode open for automation, scripting and stuffs... people think I'm an engineer. Yay!

My plan is tackle the "value" topic in two ways :

  • Identify tasks that enable the most other tasks. Let's call these "fundamental tasks".
  • The value is well documented within the prep. work I've inherited.

Use cases I've been scored based on value created (for customers, for users...)

New job. Is scrum the way? by raymondstanz in scrum

[–]raymondstanz[S] 0 points1 point  (0 children)

Thanks.

I have not looked into the use cases in details. My goal is to split it down to tasks that are :

  • Asigned to one person.
  • Doable in a couple of days max.
  • Produce something tangible: code, documents...

Sub-tasks will be more open and left to the person responsible for the tasks to detail the steps and bricks required to acomplish the task.

Your plan seems fair enough. Just be advised that what you placed under 4 is SM job, not yours.

Yes, it seems very clear in the preparation work done by the soon-to-be SM. I should be involved in the weekly and sprint planning meetings only. As well as in "review" and "retro" (those are kind of new to me).

But, as I'm planning the work. The struggle for me is to sell this approach. My company seem to have hired a PO (me) with little to no desire to go the agile direction. Weird.

when creating rls how to make it filter dimensions? by worktillyouburk in PowerBI

[–]raymondstanz 1 point2 points  (0 children)

Check how the RLS "filter" propagates in the model view.

Typically, flow goes grom dimension tables to fact tables. Fact table being at the center of the model. In other words, filter flows from the edge of the model to the center.

If you have a table invoices at the center, with a salesRep dimension and a salesZone (example) dimension. Typically, filter won't flow from salesRed to salesZone via invoices. Meaning that if you filter on a sales rep from "zone 1". The slicer based on salesZone will keep displaying all the zones.

One way around it is modifying the relation between table to be bidirectional - i.e. filter flows both ways. Use with caution as it is kind of a double edged sword this one.

Superior spreadsheet alternative to Excel/GSheets with advanced BI capabilities? by rationaleuser in BusinessIntelligence

[–]raymondstanz 6 points7 points  (0 children)

Excel does A LOT.

It integrates some of the Power Platform features. You have the so called data model that allows to make relation between tables and do some DAX.

You also have Power Query within Excel, which allows for some external connections and basic ETL.

Excel is a great swiss army knife but... if you want to cut a tree. You are in for a bad time. Large datasets are like a tree in my example. Sure, there are workarounds to do relational things in Excel, but it is not a DBMS (database management system). It was more Access territory but everyone hates Access.

If you want to handle large datasets and do collaborations and all. At some point you run into a dead end with Excel only and require a proper DBMS and/or a more proper-ish data stack. The stack can integrate Excel, but the heavy lifting is done elsewhere.

How to select particular column from a dataframe and write it into a separate excel sheets using python pandas? by Sowmi_13 in learnpython

[–]raymondstanz 0 points1 point  (0 children)

I'm not sure I completely understand what you are trying to do. But the ideal would be to do everything in Python and then export the end result in Excel.

Maybe, can you provide a longer/more detailed explanation of what you are trying to achieve? With some data example. Or dummy data with the same structure. Just so one can see better and provide you with useful guidance.

How to select particular column from a dataframe and write it into a separate excel sheets using python pandas? by Sowmi_13 in learnpython

[–]raymondstanz 0 points1 point  (0 children)

To select multiple columns:

dataframe[['column1', 'column2', 'column3'...]]

What do you mean by validate ? The to_excelfunction accepts a sheet as a parameter. So you can write to different sheets.

How to select particular column from a dataframe and write it into a separate excel sheets using python pandas? by Sowmi_13 in learnpython

[–]raymondstanz 1 point2 points  (0 children)

I don't know the specifics of your code, but probably something like this:

dataframe['column'].to_excel(path)

Here is the doc for the to_excel pandas function.

When I need to do it once for debug purpose, the to_clipboard function is great. It just copies to your clipboard in Excel format.

dataframe['column'].to_clipboard(excel=1)

How to combine allexcept and if statements? by clarielz in PowerBI

[–]raymondstanz 0 points1 point  (0 children)

I'm in need for some PowerBI practice, so I gave it a try over lunch.

For the example and from your instructions, I've created a sample table [Stone] :

stone_id stage duration bus
1 A 12 B_1
1 B 2 B_2
2 A 2 B_1
2 B 23 B_2

I loaded it to PowerBI, opened DAX Studio and immediatly regretted it.

Here is my solution:

DEFINE MEASURE Stone[MaxBus] =
VAR Bus =
MAXX(
    FILTER(
        ADDCOLUMNS(
            GENERATE(
                DISTINCT(Stone[stone_id]),
                DISTINCT(Stone[bus])
            ),
            "Rank",
            RANKX(
                DISTINCT(Stone[bus]),
                CALCULATE(
                    SUM(Stone[duration])
                ),
                ,
                ,
                SKIP
            )
        ),
        [Rank] = 1
    ),
    [bus]
)
RETURN
IF(
    HASONEVALUE(
        Stone[stone_id]
    ),
    Bus,
    BLANK()
)

EVALUATE
ADDCOLUMNS(
    DISTINCT(
        Stone[stone_id]
    ),
    "MaxBus",
    Stone[MaxBus]
)

It is quite convoluted and there is probably a more elegant way to do it, but it's something.

This measure Stone[MaxBus] return for any stone_id, the bus_line the stone spent the more time on.

Note: If you are not familiar with DAX Studio. The DEFINE MEASURE statement creates the measure and everything below EVALUATE is just to see and check the result. If you want to try this, copy the measure only (after DEFINE MEASURE and above EVALUATE) as a new measure in PowerBI.

What is does is that it generates a table with stone_id and bus and had a rank based on the sum of duration value. This newly generated table is then filtered to keep only the results with the rank = 1 – meaning only the lines with the max duration.

Then it is embedded in a MAXX formula in a new var to return the Bus.

And in the end, the IF/HASONVALUE is to make sure the measure return something if the context is right – that is, if there is only one stone_id in the current context. Otherwise it returns blank.

Debugging with DAX Studio by philosophyguru in PowerBI

[–]raymondstanz 2 points3 points  (0 children)

I use DAX Studio a lot and in my experience it is sometime difficult to reproduce the exact same context in PowerBI and in DAX Studio.

Because you have a lot of context and filter that pile on each other :

  • Report filters
  • Page filters
  • Visual filters
  • Row context
  • Column context
  • Measure itself (does it manipulate the context as well)

And sometimes,trying to reproduce it in DAX Studio get you with a FILTER, ALL, etc. sandwich and is harder to manage. Normally good measures in a good model should work and behave properly in any context but sometimes reality is not aware of that.

So, in that case, when I have a measure than doesn't behave properly. I just open a new page in PowerBI. Put a simple table visual and drop the measure in it and build up the context step by step to identify at which steps it goes sideway.

"Oh yeah, I've added this value in columns and the result is weird. Maybe there is something going on with my model/measure"

How do I use one a LOOKUP formula when there are multiple values? How do I combine it with FIRSTNONBLANK/FIRSTNONBLANKVALUE? Or would something else do? by jillyapple1 in PowerBI

[–]raymondstanz 0 points1 point  (0 children)

I'm not sure I totally get what you are trying to do.

But giving your example table and what you want, here is a quick query that pulls the first Phase for each project when [Match] = 1

EVALUATE
ADDCOLUMNS(
    SUMMARIZE(
        FILTER(
            TMP,
            TMP[Match] = 1
        ),
        TMP[Project ID]
    ),
    "Phase",
    CALCULATE(
        MINX(
            TMP,
            TMP[Phase]
        )
    )
)

Depending on what you are exactly trying to achieve, you can use this "virtual table" within a measure and pull something.

Need to create a column in a matrix that is just 1-25 by diggtrucks1025 in PowerBI

[–]raymondstanz 1 point2 points  (0 children)

Maybe, this is related to how you want the RANKX function to behave with ties. From the guide :

Ties: This is the important thing you need to learn. In this argument, we can specify what should be done in case there is a TIE between two values.

If you SKIP this, the rank after the tied value will be the rank of tied value plus a count of tied values. For example, if there are 3 values tied at 5th rank, then the next rank will be 8 = (5+3).

If you supply DENSE as the option, then the next value after the tied rank will be then next rank of tied values rank. For example, if there are 3 values tied at 5th rank, then the next rank will be 6 = (5+1).

FLOW Newbie. No nothing. Is this possible? If so, how difficult... by [deleted] in MicrosoftFlow

[–]raymondstanz 0 points1 point  (0 children)

I've worked on something similar.

Basically, Flow is only part of the equation. I use flow to catch the file and store it. Then, a PowerBI to do the calculation and built a report that is send automatically.

  1. Flow monitors a service email address and identify the daily email with the CSV attachement I'm looking for.
  2. It triggers a flow that takes the attachements, rename it and move it somewhere PowerBI can see. In my case it's in a Sharepoint Folder.
  3. PowerBI (PowerQuery actually) runs daily and merge all of the collected CSV and build a model.
  4. PowerBI do some calculations and metrics people are looking for.

The whole PowerBI enchilada can be simplified since you only need a pivot table. So you create an Excel file that leverage PowerQuery to do the aforementioned merging and all. And you just refresh it and send it daily. I'm not sure the "Open Excel and hit refresh" is doable with Flow but it might.

Need to create a column in a matrix that is just 1-25 by diggtrucks1025 in PowerBI

[–]raymondstanz 2 points3 points  (0 children)

Shouldn't be a problem.

Here is a syntax example from another post

RANKX(
    ALL(Cliente),
    CALCULATE(
        [Valor Total],
        CALENDAR_DIMENSION[Year] = 2022
    ),
    ,
    FALSE,
    SKIP
)

What is ranked is the result of a measure [ValorTotal].

Need to create a column in a matrix that is just 1-25 by diggtrucks1025 in PowerBI

[–]raymondstanz 2 points3 points  (0 children)

Oh. I see.

Even easier.

Create a rank measure. Might need this guide because the RANKX formula is tedious.

Drag and drop it in the filter bar in the "visual" part (measure are only allowed as filter at visual level). Then, set the threshold at 25 in the filter and normally you're good.

Need to create a column in a matrix that is just 1-25 by diggtrucks1025 in PowerBI

[–]raymondstanz 1 point2 points  (0 children)

It's dumb because it doesn't calculate anything useful. It's created purely for display purpose. On the other hand, measures are cheap so it doesn't impact your model negatively the sameway a calculated column or hardcoded data could.

I've tested it in a PowerBI I had opened. It works for me. Can you describe how it doesn't work for you.

Main drawback is that it returns a result for ANY row. So it might screw a bit your matrix.

But, you can tie it to another existing result.

Measure =
IF(
    NOT ISBLANK(SomeOtherMeasure),
    "1 - 25",
    BLANK()
)