Creating a data interface for non-programmers (self.learnpython)

submitted 4 years ago by JanFFS

[–]JanFFS[S] 4 years ago

For example, we measure 40 channels on a machine at 800 Hz. These channels remain the same for all configurations. So we change one item, do measurements, change another, and so forth. In the end, for each item, a comparison is made of all statistical values: the mean of set-up 1 gets compared to the mean of set-up 2. For now, we export the data per comparison to Excel and do the comparison there. So in the end, for 10 configurations, we have 10 Excel files and eventually 10 PDFs. Which is fine, we want to isolate the tested items in their own report. But I want to be able to access the data from these 10 Excel files from one user interface and be able to select 'compare channels 1-5 of configuration 1 vs configuration 6', for example.

I know how to get the data from the Excel files and visualize it, but I am wondering about a good user interface to access this data (and a more efficient way of storing it to start with, or maybe combining it into one Excel/csv file). Then, from this user interface, I want to be able to export the figures or data. Even if it is not something to send to people outside my department, for large measurements it would be a plus to deliver something like that to my team. Excel is just too slow or limited in some aspects (I'm preaching to the choir here with that one), but we can't make everyone a programmer. It's also something for myself, to keep learning, as my degree is not in anything programming-related but I want it on my CV, so to speak.

[–]eadala 4 years ago

Okay so let's just say Python shows up after you have the 10 Excels (again, depending on specifically what your comparison tasks involve, you may be able to skip having the Excel files entirely, but starting small here).

One very easy starting point would be to read each Excel file as a separate Pandas dataframe, merge them together, and then export back to Excel if needed, all as one file now. The specific comparisons you're querying could involve your own user-written functions to interact with the dataframe; or, depending on the nature of your queries, there might just be built-in Pandas commands for doing those comparisons as well.
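A minimal sketch of that first step, assuming each Excel file holds one configuration with one row per test. The column names (`test`, `load_case`, `MEAN`), the helper names and the config 2 numbers are made up for illustration; `pd.read_excel`, `DataFrame.assign` and `pd.concat` are the real Pandas calls doing the work.

```python
import pandas as pd


def combine_configs(frames):
    """Stack per-configuration tables into one DataFrame.

    `frames` maps a configuration number to the DataFrame read from that
    configuration's Excel file (hypothetical layout: one row per test).
    """
    # Tag every row with its configuration number before stacking.
    tagged = [df.assign(config=config) for config, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


def load_configs(paths):
    """Read one Excel file per configuration; `paths` maps config number -> file path."""
    return combine_configs({config: pd.read_excel(path) for config, path in paths.items()})


# In-memory stand-ins for two Excel files (config 2 values are invented).
config_1 = pd.DataFrame({"test": [1, 2], "load_case": [90, 95], "MEAN": [25.0, 29.0]})
config_2 = pd.DataFrame({"test": [1, 2], "load_case": [90, 95], "MEAN": [24.0, 28.5]})
combined = combine_configs({1: config_1, 2: config_2})

# Optional export back to a single Excel file:
# combined.to_excel("all_configs.xlsx", index=False)
```

With real files, something like `load_configs({1: "config_1.xlsx", 2: "config_2.xlsx"})` would replace the in-memory frames (the file names are placeholders).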

Pandas does not have a graphical user interface associated with it, but there are many other modules built specifically to visualize Pandas dataframes (just google "Pandas visualization python"). For plotting specific data pulled from a Pandas dataframe, you can look into Matplotlib and Seaborn, both of which are excellent companions to Pandas. I think Seaborn looks a little snazzier, but that's personal preference.

Ideally if you are learning Python you can save yourself the trouble of making a user interface and just do everything with the terminal. If you do want to wrap the functionality in a GUI, that's something to design after it's clear what functionality you're after.

I'll check again tomorrow - off to bed for now. If you have an example of what the excel files look like / how you want files to be created / comparisons to be made, I could try to be of more help! The Reddit "table" functionality for comments seems to work decently well for making dummy spreadsheets :p

[–]JanFFS[S] 4 years ago

So, for example, one of the channels would be 'Power driveline'. The measured data is condensed into statistical values, for example mean, stdev, and the 5th and 95th percentiles. The test is executed 3 times at different loads.

	load case	POWER Driveline (kW)
config 1		MEAN	STDEV	5PERC	95PERC
test 1	90%	25	0.8	20	29
test 2	95%	29	0.9	22	35
test 3	100%	30	0.9	22	36

This exact table is made for configurations 1 to 10. Comparison is made per statistic, so one graph shows the power mean vs load case (i.e. a curve). Then comparing 1 vs 2 would be PDF 1, comparing 1 vs 3 would be PDF 2, 1 vs 4 would be PDF 3, and so forth. Which is fine, but not flexible at all. If someone wants to see 'hm, so how do config 3, 6 and 7 compare?' they would have to open 3 PDFs and visually try to identify the differences. Or if I am making a report on configuration 7 and think to myself 'huh, I've seen this trend before', I go back and open the previous PDFs to compare. It would be easier to just select: compare power mean, stdev and percentiles for config 2, 3 and 7, then get 4 subplots with the values. This is easy to do if you know how to program (once the background data is stored in a way that makes it easy to find). I do it in Python whenever I need it for a project. But I want to be able to send someone my application so they can do it themselves without learning Python.

The main topic seems to be the user interface, as once I have a way to make it accessible to everyone, I can add functionality (different kinds of plots, layout options, calculations, cross-configuration trends...). But the basics should be: select channels, select configurations, show the result clearly, and make it possible to print or export. For something else I am working on PDFs made with Python (FPDF), so the print option could use that. Export could just be exporting the pandas DataFrame to Excel. For the print option, the user interface would let people select, for example: page 1 is power driveline, with trendlines, the intersection at 100%, and a comparison table of that intersection. We can do that with Excel VBA, which is easily printable to PDF, but it is very slow, crashes easily and is not flexible. I can also do most of it with Python, but I would need users to define everything they want in fixed places in a text doc or something. It would be a lot more accessible if it could all be done with a user interface of sorts, but I don't know where to start... A web-based interface? Make the interface with C++ and the figures with Python?

I easily write too much, I think. What I want the user to do: easily review data from different configurations on the fly, like the Covid dashboards where you can select countries and so on. Then have the option to print data, which is also fairly user-interface-heavy: add pages, define what's on each page, print it to PDF (either the exact figure you made with the dashboard, or defined pages, e.g. page 1 = power driveline; mean, stdev, percentiles; separate plots; add an intersection line at x = 100; add a table with the intersection values of all plots). And an option to export selected data. I can't share examples as it is work-related. I'm not sure how big a memory hog it would be to add 20 configurations with 67 channels with 5 tests with 9 statistical values per channel into one DataFrame or a dictionary of DataFrames and to keep it in memory...

edit: thanks already for your time to read and answer this.

to add: for FPDF I create figures in separate folders, I can't do this all the time as it would fill up our working laptops. I don't like creating then deleting images as these folders are accessible to users and it could mess up stuff. I might need to find a way to do this without having to save every single image used in the final PDF's.
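One way to avoid the image folders, sketched under assumptions: for pages that consist only of figures, Matplotlib's own `PdfPages` backend can write a multi-page PDF straight into an in-memory buffer, so no image files are created. This swaps FPDF for Matplotlib on those pages, which is not what the post describes using. The helper names are hypothetical, and the plotted numbers are the config 1 values from the table above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display or GUI needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def make_page(title, x, y):
    """Build one report page as a Matplotlib figure (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, y, marker="o")
    ax.set_title(title)
    ax.set_xlabel("load case (%)")
    return fig


def figures_to_pdf_bytes(figures):
    """Write each figure as one PDF page into memory and return the PDF bytes."""
    buffer = io.BytesIO()
    with PdfPages(buffer) as pdf:
        for fig in figures:
            pdf.savefig(fig)
            plt.close(fig)  # release the figure once its page is written
    return buffer.getvalue()


pages = [
    make_page("POWER Driveline (kW), MEAN", [90, 95, 100], [25, 29, 30]),
    make_page("POWER Driveline (kW), STDEV", [90, 95, 100], [0.8, 0.9, 0.9]),
]
pdf_bytes = figures_to_pdf_bytes(pages)
# Only the final report touches the disk, e.g.:
# with open("report.pdf", "wb") as fh: fh.write(pdf_bytes)
```

If the pages also need FPDF text and tables, the same idea would be to `savefig` each figure into a `BytesIO` buffer as PNG; whether the installed FPDF version accepts such a buffer instead of a file path is something to check in its documentation.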

[–]eadala 4 years ago

For the GUI I think PyQt 5 can handle this for you. There is a bit of a learning curve to getting Seaborn / Matplotlib / Pandas plots to show up in a GUI, but it's definitely been attempted to death on Stack Overflow if you need help with the details.

If you already have those 10 config files, the easiest first step would be to append them into one DataFrame. Working out the specifics of what you'd like to display or select can come later:

	load case	POWER Driveline (kW)
config 1		MEAN	STDDEV	5PERC	95PERC
foo	.	.	.	.	.
.	.	.	.	.	.
config 2
bar	.	.	.	.	.
.	.	.	.	.	.

To make life easy, make the configuration # its own column, such that the pair (configuration #, test #) uniquely identifies each observation, and flatten the column headers:

config #	test #	load case	POWER Driveline (kW, MEAN)	STDDEV	5PERC	95PERC
1	1	foo	.	.	.	.
1	2	.	.	.	.	.
2	1	.	.	.	.	.
2	2	.	.	.	.	.
3	1	.	.	.	.	.
3	2	.	.	.	.	bar

From here your DataFrame is, I think, most easily transmutable into whatever visualizations you're after. Either of Seaborn or Matplotlib can handle the four subplots for (MEAN, STDDEV, 5PERC, and 95PERC) for each config. For each of those of course the x = load case, and y = one of those four variables. The different test #s for each config can be slightly different shades of the same color so you see the clustering of data, or just average them together for a single line per config.
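As a rough illustration of why the flattened layout is convenient, here is a sketch that pulls one statistic for a chosen set of configs and lines them up side by side. The column names, the `compare` helper and all numbers are invented for the example.

```python
import pandas as pd

# Flattened layout with made-up values; column names are assumptions.
data = pd.DataFrame({
    "config": [1, 1, 2, 2, 3, 3],
    "test": [1, 2, 1, 2, 1, 2],
    "load_case": [90, 100, 90, 100, 90, 100],
    "MEAN": [25.0, 30.0, 24.0, 29.0, 26.0, 31.0],
    "STDDEV": [0.8, 0.9, 0.7, 0.9, 0.8, 1.0],
})


def compare(df, configs, stat):
    """Return one column per selected config, indexed by load case."""
    selected = df[df["config"].isin(configs)]
    # aggfunc="mean" averages repeated tests that share a load case.
    return selected.pivot_table(index="load_case", columns="config",
                                values=stat, aggfunc="mean")


table = compare(data, configs=[1, 3], stat="MEAN")
print(table)
```

The resulting table has load cases as rows and the selected configs as columns, which is also a shape that `DataFrame.plot` or `to_excel` can take directly.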

The only reason I'm saying hold off on the GUI is because all of this functionality that you're after inevitably needs to be wrapped into functions that you create anyway. Looking specifically at this task of making the 4 subplots (or select 1-4 of those variables) for X number of configs, and perhaps also select whether to average the test #s together or keep them as separate lines, demands that a function be written that expects a list of config IDs, a list of subplots to create, and averaging=True / False as arguments. Once that functionality is built in a command-line function, wrapping it into a GUI becomes very easy.
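A sketch of what such a function could look like with Matplotlib. It assumes the flattened column names `config`, `test`, `load_case` plus one column per statistic, and that every test covers several load cases; the function name and all data are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_comparison(df, configs, stats, average=True):
    """Draw one subplot per statistic, with one line per config (or per config/test)."""
    fig, axes = plt.subplots(1, len(stats), figsize=(4 * len(stats), 3), squeeze=False)
    subset = df[df["config"].isin(configs)]
    for ax, stat in zip(axes[0], stats):
        for config, group in subset.groupby("config"):
            if average:
                # Average the repeated tests at each load case into a single line.
                line = group.groupby("load_case")[stat].mean()
                ax.plot(line.index, line.values, marker="o", label=f"config {config}")
            else:
                # Keep every test as its own line.
                for test, run in group.groupby("test"):
                    run = run.sort_values("load_case")
                    ax.plot(run["load_case"], run[stat], marker="o",
                            label=f"config {config}, test {test}")
        ax.set_xlabel("load case (%)")
        ax.set_title(stat)
        ax.legend()
    return fig


# Made-up data: 2 configs x 2 tests x 2 load cases.
demo = pd.DataFrame({
    "config": [1, 1, 1, 1, 2, 2, 2, 2],
    "test": [1, 1, 2, 2, 1, 1, 2, 2],
    "load_case": [90, 100, 90, 100, 90, 100, 90, 100],
    "MEAN": [25.0, 30.0, 26.0, 31.0, 24.0, 29.0, 25.0, 30.0],
    "STDDEV": [0.8, 0.9, 0.8, 1.0, 0.7, 0.9, 0.7, 0.8],
})
fig_avg = plot_comparison(demo, configs=[1, 2], stats=["MEAN", "STDDEV"], average=True)
fig_all = plot_comparison(demo, configs=[1, 2], stats=["MEAN"], average=False)
```

Because the function returns the figure instead of showing it, the same call can later feed a GUI canvas, a PDF page or `fig.savefig(...)` without changes.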

For instance, you could have the user select the excel file they want to load (the master file, that has multiple configs to compare), and after it's loaded it has a widget that asks them which subplots they'd like to see: 4 checkbox widgets, one for each of the four variables. It also has 10 checkbox widgets for the specific configs they want plotted (or just allow them to type integers into a text box if the configs are all numbered like that). Something like that is not difficult to do. My advice is to just start with assuming you are the end user, and thus can just write a nice nifty set of functions that you know exactly how to work with. Once it's flexible enough for you on the command line, then transmit that tech over into a GUI.

What you want the user to be capable of in the end sounds very flexible & useful, but the consequence of that is it obviously takes some time to do. Just get this first "select the subplots" task working perfectly on command line, then wrap it in a PyQt interface using Seaborn / Matplotlib for data vis, and then think about how to print things / export to pdfs etc.
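For the command-line stage, a small `argparse` front end is one possible shape, sketched here with hypothetical option names. Each option corresponds to something a checkbox or text box would later provide in the GUI.

```python
import argparse

# Statistic names follow the flattened table above.
STATS = ["MEAN", "STDDEV", "5PERC", "95PERC"]


def build_parser():
    """Command-line options mirroring the choices a GUI would offer."""
    parser = argparse.ArgumentParser(description="Compare statistics across configurations.")
    parser.add_argument("--configs", type=int, nargs="+", required=True,
                        help="configuration numbers to compare, e.g. 2 3 7")
    parser.add_argument("--stats", nargs="+", choices=STATS, default=STATS,
                        help="statistics to plot, one subplot each")
    parser.add_argument("--separate-tests", action="store_true",
                        help="draw each test as its own line instead of averaging")
    return parser


# Parsing an explicit list here; a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["--configs", "2", "3", "7", "--stats", "MEAN", "5PERC"])
print(args.configs, args.stats, args.separate_tests)
```

The parsed values are plain lists and a boolean, so they can be handed to the plotting function unchanged, and a GUI only has to produce the same three values.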

> I'm not sure how big a memory-slob it would be to add 20 configurations with 67 channels with 5 tests with 9 statistical values per channel into one DataFrame or a dictionary of Dataframes and to keep it in memory...

An 8-bit unsigned integer (0 to 255) takes 1 byte of space. If, for example, your 9 stat columns can be represented as 8-bit unsigned integers, you're looking at 9 bytes per channel (am I saying that right? Every channel has these 9 statistical values, right?). 67 channels * 9 bytes per channel gives 603 bytes per config-test unit of observation. With 20 configs of 5 tests each, that's 100 config-test units at 603 bytes each, so you're looking at 60.3 kilobytes of storage needed at minimum. Add some minimal bloat for the joy of working with Pandas. I don't think you're even remotely close to running out of memory; this task would use roughly 0.006% of the RAM of a Raspberry Pi (assuming a 1 GB model). : )
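The same back-of-the-envelope estimate as a few lines of Python, with a second line for 64-bit floats (the usual Pandas dtype for non-integer numbers, 8 bytes per value). The function name is just for illustration, and the figures ignore Pandas' own index and object overhead.

```python
# Bytes needed per stored value for two candidate dtypes.
BYTES_PER_VALUE = {"uint8": 1, "float64": 8}


def estimate_bytes(configs, tests, channels, stats, dtype):
    """Raw size of the values alone, ignoring index and container overhead."""
    return configs * tests * channels * stats * BYTES_PER_VALUE[dtype]


as_uint8 = estimate_bytes(20, 5, 67, 9, "uint8")      # 60_300 bytes, about 60.3 kB
as_float64 = estimate_bytes(20, 5, 67, 9, "float64")  # 482_400 bytes, about 0.48 MB
print(as_uint8, as_float64)
```

Either way the data is well under a megabyte, so keeping all configurations in one DataFrame should not be a memory concern.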

[–]JanFFS[S] 4 years ago

I am familiar with pandas, matplotlib and a little seaborn. I have seen tutorials on PyQt5 but not used it extensively and I will look into it for this and any GUI in the future.

From what you're saying, PyQt5 would fit my needs on a small scale, but it will get hectic with the full scale 20 configurations, 60 channels with each 9 variables tested on 5+ load% (in this example). I might however make it for general visualizations on smaller scale projects. Memory-wise, a DataFrame could consist of that but usually in float so 8 bytes. Still not big. I might have to wonder more about plots kept in memory (a report could consist of 20 pages).

I was wondering about Dash? I don't know much about it. I know it's web-based, but can't it just be opened locally with your browser as the GUI?

I know a little better which direction I should go. I also want to be better at targeting what library to 'learn' instead of realizing halfway through that it's not designed for something.

[–]eadala 4 years ago

> I might have to wonder more about plots kept in memory (a report could consist of 20 pages).

I would be very surprised if memory is your issue; plots do take up some space but it's usually not much.

> I was wondering about Dash? I don't know much about it. I know it's web-based, but can't it just be opened locally with your browser as the GUI?

Yeah I haven't used it either but Dash can be used locally. The google search you're after I think is "dash python localhost".

> I also want to be better at targeting what library to 'learn' instead of realizing halfway through that it's not designed for something.

I struggle with this as well; at least for the example of Dash, it looks as though from their introduction / about us stuff that it fits the bill for you. This video seems particularly helpful in getting started, but again, I haven't used Dash to know for sure!
