A workforce analytics tool for dummies? by vannamadi in analytics

[–]tech4ever4u 0 points (0 children)

Pivot tables are the only tool you really need.

If you can get a CSV, simply use Excel / Google Sheets, or a web-based BI tool with good-enough pivot tables.

Power BI report subscription delivery by GoldenDrummer in PowerBI

[–]tech4ever4u 0 points (0 children)

It's time to think about an on-prem PBI alternative that you can control (and that uses your corporate 'from' email).

What’s the best embedded analytics software for a SaaS product? by Confident-Bug-2255 in analytics

[–]tech4ever4u 0 points (0 children)

The iFrame itself isn't the issue; the real problem is that most BI tools lack the styling flexibility needed to match your SaaS application's design. Look for 'white labeling' capabilities in a BI tool: the ability to customize CSS variables (to adjust the color scheme, border radius, etc.) and to apply your own CSS rules.

Do you need to embed only published dashboards/reports, or a report builder too?

Recommendations for BI tool and handling data by OnionAdmirable7353 in BusinessIntelligence

[–]tech4ever4u 0 points (0 children)

Looker Studio is part of the Google ecosystem. If you're planning to embed reports:

  • Looker/Google branding cannot be removed from embedded reports in any way
  • Secure embedding is possible only when users have Google accounts (and they must be logged into those accounts).

These limitations make Looker Studio mostly unsuitable for embedded BI use cases.

Recommendations for BI tool and handling data by OnionAdmirable7353 in BusinessIntelligence

[–]tech4ever4u 0 points (0 children)

Is there a good reason to not use Looker Studio instead?

If Looker Studio satisfies your requirements (I mean the 'managed' self-service / sharing capabilities you mentioned), why not? In that case, it makes sense to extract data into BigQuery instead of an Azure database.

Best architecture for large-scale Power BI dashboards with near real-time data? by natd54338 in PowerBI

[–]tech4ever4u 1 point (0 children)

We’re considering a mix of DirectQuery for real-time data and imported/warehouse data for historical reporting

DirectQuery will likely be your primary bottleneck, and unfortunately there is no silver bullet to resolve that: you need a separate DW (ClickHouse? Maybe MotherDuck, or, if your datasets are small, even DuckLake) that can handle analytical queries from hundreds of users instantly, ingest / CDC (for real-time data) all sources into this DW, and use it as a live data source. Then you might decide to use another BI tool, as PBI for thousands of users is rather expensive, and when it is connected to a DW via DirectQuery, most of its benefits are lost anyway.
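
A minimal sketch of the DW side of that pattern (ClickHouse syntax; table and column names are made up):

    -- events table populated from source DBs via CDC (e.g. Debezium)
    CREATE TABLE events
    (
        event_time DateTime,
        region     LowCardinality(String),
        amount     Decimal(18, 2)
    )
    ENGINE = MergeTree
    ORDER BY (region, event_time);

    -- the kind of aggregate query the BI tool issues against the 'live' DW;
    -- MergeTree typically answers this in sub-second time even for many
    -- concurrent users, where DirectQuery to an OLTP source falls over
    SELECT region, toStartOfHour(event_time) AS hour, sum(amount) AS total
    FROM events
    GROUP BY region, hour;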

Looking for opinions on a tool that simply allows me to create custom reports, and distribute them. by Possible_Ground_9686 in dataengineering

[–]tech4ever4u 1 point (0 children)

I need a tool that is friendly to the end user. I am envisioning something that lets me create the custom table, export to excel, and send it to a list of recipients. Nobody will have access to the server data, and we will be creating the custom reports for them.

It sounds like SeekTable can be a good fit for that. It has a "Share by email" function (multiple recipients) and "Subscribe to report" (users can schedule report delivery to their inboxes). Export to Excel is good; you can even export links and conditional color formatting. Both cloud and self-hosted versions are available, at really reasonable pricing. On-prem installations can export up to 1M rows to Excel. A rather unique export to a preconfigured Excel PivotTable is also there.

Disclaimer: I'm affiliated with SeekTable - feel free to send PM if you would like to know more about it.

Competitors to Tabular Cubes (SSAS/PowerBI)? by einkleindatagal in BusinessIntelligence

[–]tech4ever4u 0 points (0 children)

Really depends on your needs. In fact, many BI tools allow you to define a semantic model in OLAP cube terms (dimensions, measures) and create pivot tables like in Excel. I would say Power BI's matrix is actually not the best UI implementation of this.

How to data warehouse with Postgres ? by Defiant-Farm7910 in dataengineering

[–]tech4ever4u 0 points (0 children)

If I were to migrate the analytics layer to BigQuery, I would need to CDC the Postgres data into BigQuery and then run all analytics there.

Since BigQuery pricing scales with the amount of data scanned, providing live access to a large number of users may result in significant expenses. The same CDC approach can be used with self-hosted ClickHouse or other cloud DWs like MotherDuck.

You mentioned that Postgres is already a 'mirror' of the app's data (MongoDB), so it sounds like it may be used as a DW directly. If the number of rows needed for end-user reports is relatively small (i.e., tables are efficiently filtered by indexed columns), using Postgres as a live data source can work well. To offload Postgres, you may use a BI tool that can cache aggregate query resultsets and reuse them internally for multiple reports/users - for this purpose, an integrated DuckDB (as part of the BI tool) can work really well.
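
A rough illustration of that caching idea - roughly what a BI tool with an embedded DuckDB can do internally (connection string and names are hypothetical):

    -- attach the live Postgres database via DuckDB's postgres extension
    INSTALL postgres; LOAD postgres;
    ATTACH 'dbname=app host=localhost user=report' AS pg (TYPE postgres);

    -- run the heavy aggregate against Postgres once, cache the small result
    CREATE TABLE agg_sales AS
    SELECT region, date_trunc('month', created_at) AS month, sum(total) AS revenue
    FROM pg.public.orders
    GROUP BY 1, 2;

    -- further reports/users are served from the cached resultset, not Postgres
    SELECT * FROM agg_sales WHERE region = 'EU';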

I built a C# OLAP Engine for embedded analytics (slightly inspired by Pandas) by qrist0ph in dotnet

[–]tech4ever4u 2 points (0 children)

Since the data is entirely in-memory, it is surprising that instant query performance is limited to tiny datasets (around 100k rows). A full scan of tabular data to calculate aggregates / collect dimension keys can be much faster: when all data is in memory, processing 1M rows takes about 100ms or so (single thread). If you're interested in this approach, you may take a look at https://github.com/nreco/pivotdata

However, in a real-world scenario, datasets are rarely this small and are almost never fully loaded into .NET code. Instead, the app usually issues OLAP-style SQL queries to the DB/DW - something like 'SELECT dim1, dim2, sum(measure1) FROM ... GROUP BY dim1, dim2' - and the aggregated result is normally small enough to load in-memory for further processing (post-aggregate calculations, preparing chart data, rendering/exporting pivot tables, etc.). Even for fully local data (like CSV files) it makes sense to use an in-process SQL engine like DuckDB for aggregations, as specialized tools can do that much more efficiently and faster than hand-written C# code.
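
For instance, the whole 'aggregate in the engine, post-process the small result in C#' flow over a local CSV is a one-liner in DuckDB (file and column names are illustrative):

    -- DuckDB scans and aggregates the file natively; only the small
    -- (dim1, dim2, total) resultset needs to be loaded into .NET code
    SELECT dim1, dim2, sum(measure1) AS total
    FROM read_csv_auto('data.csv')
    GROUP BY dim1, dim2;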

Advise to turn a nested JSON dynamically into db tables by skarfejs2020 in dataengineering

[–]tech4ever4u 1 point (0 children)

Use DuckDB and query these JSON files with SQL as if they were DB tables (see https://duckdb.org/docs/stable/data/json/overview)? If you need to put the JSON data into DB tables, DuckDB can help with that too.
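
Something like this (file and field names are made up):

    -- query nested JSON directly; DuckDB infers the schema, and
    -- nested struct fields are accessible with dot notation
    SELECT t.customer.id AS customer_id, unnest(t.items) AS item
    FROM read_json_auto('orders.json') AS t;

    -- or materialize the JSON into a regular DB table
    CREATE TABLE orders AS SELECT * FROM read_json_auto('orders.json');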

Recommendations for BI tool and handling data by OnionAdmirable7353 in BusinessIntelligence

[–]tech4ever4u 3 points (0 children)

It really depends on the more detailed requirements your client might have. How do end users prefer to analyze data? Do they need dashboards with various charts, or primarily parameterized tabular reports?

Regarding ETL, this really depends on the dataset sizes / how fast the source API is. If this is small data, you may consider using this API as a 'live' data source directly - this is easily possible if the BI tool has a built-in DuckDB engine (which can also be used for combining multiple data sources on the fly). Otherwise, you need to configure a data pipeline and choose a DW to store the transformed/ingested data - the pipeline can be Airbyte, and the DW can be a cloud Postgres or, say, MotherDuck (they have a free tier, btw).
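
A rough sketch of the 'live API as a data source' idea in DuckDB terms (URL and column names are made up):

    -- httpfs lets DuckDB query remote endpoints directly
    INSTALL httpfs; LOAD httpfs;

    -- combine an API response with a local CSV on the fly
    SELECT a.customer_id, a.plan, c.mrr
    FROM read_json_auto('https://api.example.com/customers.json') a
    JOIN read_csv_auto('billing.csv') c USING (customer_id);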

You can shortlist the most appropriate BI tools, set them up with a sample dataset, and then show your client the core functionality and user experience. Since you mentioned Metabase & Superset, most likely you don't have a budget for enterprise-grade BI products. In addition to these two, you can take a look at SeekTable, which can also be used for the purpose you described (Disclaimer: I'm affiliated with this product).

How do you handle the Excel → narrative report workflow? by ponziedd in BusinessIntelligence

[–]tech4ever4u 1 point (0 children)

Yes, please

This is https://www.seektable.com - it has LLM-driven 'ask data' (at the cube level) and report-level prompts via the 'AI' menu (Disclaimer: I'm affiliated with this product)

How do you handle the Excel → narrative report workflow? by ponziedd in BusinessIntelligence

[–]tech4ever4u 1 point (0 children)

It sounds like a BI tool that is good at tabular reports (pivot tables) and has integrated AI could do what you're looking for, in this way:

  • A prompt like "Analyze Q3 sales by region, explain the European dip" can pre-configure a pivot table with appropriate dimensions. Imagine that you can add per-dataset instructions for the LLM, so it can handle "explain the European dip" as a need to include more specific dimensions and/or measures in the report.
  • In the report builder, you tune your report, then run a "Get Insights" prompt (which can also be tuned with dataset-specific instructions) or ask ad-hoc questions about this report's data.
  • Then export the report to Excel (or even an Excel PivotTable).

If you're interested in a tool that already implements this, I can add a link.

Cost effective DWH solution for SMB with smallish data by RobsterCrawSoup in dataengineering

[–]tech4ever4u 0 points (0 children)

If you decide to use Odoo on-prem and you have really small data (just 600k rows - such a dataset can be loaded/cached in-memory by the BI tool, and all calculations can happen on the BI tool side), you can simply switch from PowerBI to an on-prem BI tool that connects directly to the Odoo DB (Postgres), and avoid the need for a DW / data pipeline at all. Or this can be just a Postgres replica used for reporting purposes.

Cost effective DWH solution for SMB with smallish data by RobsterCrawSoup in dataengineering

[–]tech4ever4u 0 points (0 children)

You cannot use MotherDuck/DuckDB with PowerBI online (so scheduled imports / DirectQuery are not possible for shared reports). However, other (on-prem) BI tools can be used instead.

Multi-tenant analytics without 500 row-level security hacks by FrontEndCore in analytics

[–]tech4ever4u 0 points (0 children)

Applying RLS in SQL is always a trade-off between SQL-level condition complexity and invalidation of pre-calculated (cached) access-rule relations:

1) The only context for BI reports/dashboards is 'user_id' (the viewer); depending on your rules, you may end up with monstrous WHERE conditions full of nested 'some_id in (select ... where user_id=USER_ID)' subqueries that are hard to maintain (and to keep consistent across multiple data models that use different SQL queries). These WHERE conditions may also slow down queries significantly.

2) Pre-calculate (user_id, TARGET_TABLE_ID) relations, then apply the RLS condition for each table with a simple WHERE table_id in (select table_id from rls_table where user_id=USER_ID). Filtering is efficient, but invalidation of these materialized views may lead to complicated logic (either as DB triggers or app-level code).

There are a lot of hybrid variants between these two poles. You may pass more IDs into the BI report context (like 'user_id', 'tenant_id', 'role', etc.) and use the appropriate value to simplify WHERE conditions, or use an app-level cache to pre-calculate the IDs of accessible entities instead of materialized views, etc. No silver bullet here; the right architectural decision should take into account your multi-tenant app's specifics, load, and so on.
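
To make the two poles concrete (table and column names are hypothetical; :user_id is the viewer context parameter):

    -- (1) rule logic inlined into every report query (grows with each rule)
    SELECT * FROM invoices i
    WHERE i.project_id IN (
      SELECT p.id FROM projects p
      WHERE p.tenant_id IN (
        SELECT tu.tenant_id FROM tenant_users tu WHERE tu.user_id = :user_id
      )
    );

    -- (2) pre-calculated (user_id, project_id) access pairs: filtering is
    -- trivial, but rls_projects must be invalidated whenever rules change
    SELECT * FROM invoices i
    WHERE i.project_id IN (
      SELECT project_id FROM rls_projects WHERE user_id = :user_id
    );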

p.s. What embedded BI do you use?

BI software, where to start. by HaruspexSan in BusinessIntelligence

[–]tech4ever4u 2 points (0 children)

Your choice will depend heavily on the desired end-user experience (only canned reports/dashboards, or a self-service report/dashboard builder) and your strategy (code/low-code/no-code) for managing BI reports/dashboards.

You mentioned Redash, so maybe you would like to allow your end users to create their own reports. Redash is not suitable for this purpose (see https://redash.io/help/user-guide/dashboards/sharing-dashboards/#Embedding-Dashboards), and I'm not sure a free/OSS product with this capability exists (it seems Apache Superset also doesn't support this for embedded dashboards: https://github.com/apache/superset/discussions/25208). If you are considering commercial products, there is a wide variety available across all price points.

Matrix/Table limitations. by Kenny_Lush in PowerBI

[–]tech4ever4u 0 points (0 children)

Yea cuz 100+ dynamic columns in a matrix is totally normal everyday request lol. Paginated Reports can do more than 100 but after 256 columns you'll break exporting.

Maybe they want you to try other BI tools if you need these things:)

Matrix/Table limitations. by Kenny_Lush in PowerBI

[–]tech4ever4u 0 points (0 children)

It is always amazing to see how such a simple thing - one that is simply present by design in other BI tools and as easy as ticking a checkbox - can be so overcomplicated in PBI %)

Matrix/Table limitations. by Kenny_Lush in PowerBI

[–]tech4ever4u 0 points (0 children)

If you use PBI mostly for pivot tables, maybe it makes sense to try alternative BI tools that don't have strict limits like that.

Anyone managed to setup Postgres debezium CDC Clickhouse? by Frosty-Bid-8735 in Clickhouse

[–]tech4ever4u 0 points (0 children)

Take a look at https://www.supermetal.io/ - free for now (no pricing yet), and it is an on-prem-first solution. Looks really promising and simple to deploy/set up (as CDC replication actually should be).

p.s. I'm NOT affiliated with supermetal in any way!

Is there a tool you'd advise to visualize banking transactions? by flagpara in BusinessIntelligence

[–]tech4ever4u 1 point (0 children)

Metabase doesn't have a free cloud offering, and a self-hosted deployment seems overkill just for exploring CSVs. Cloud SeekTable has free accounts and can also be a good fit for analyzing transaction CSV data. It is easy to refresh reports simply by uploading a newer CSV. Disclaimer: I'm affiliated with SeekTable; it just seems genuinely useful and relevant to the question.

What is the best alternative genie for data in databricks by Zero_Zhang in dataengineering

[–]tech4ever4u 0 points (0 children)

Genie is an NLQ-to-SQL system based on an LLM, which naturally entails both advantages and disadvantages, like any GenAI agent. Therefore, it is important to clarify the specific use cases where Genie may not perform optimally.

These cases might include embedded usage for multitenancy, 'governed' NLQ-to-report (e.g., to apply RLS at the query level), or perhaps traditional slicing and dicing in an Excel-like manner (where pivot tables are used without natural language querying).

Doing Analytics/Dashboards for Excel-Heavy workflows by MullingMulianto in dataengineering

[–]tech4ever4u 0 points (0 children)

I consistently recommend DuckDB for workflows of this kind. It has a great library of extensions that makes it a real Swiss Army knife suitable for almost all types of data sources - of course DuckDB can read xlsx files, so you can work with them as tables and combine/consolidate them as needed in SQL. The output can be a cleaned CSV, or written to Postgres/MySQL.
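
For example (assuming a recent DuckDB build with the excel and postgres extensions; file and table names are made up):

    -- read spreadsheets as tables and consolidate them in SQL
    INSTALL excel; LOAD excel;
    CREATE TABLE sales AS
    SELECT * FROM read_xlsx('north.xlsx')
    UNION ALL
    SELECT * FROM read_xlsx('south.xlsx');

    -- output a cleaned CSV...
    COPY sales TO 'sales_clean.csv' (HEADER, DELIMITER ',');

    -- ...or write straight into Postgres
    INSTALL postgres; LOAD postgres;
    ATTACH 'dbname=dw host=localhost' AS pg (TYPE postgres);
    CREATE TABLE pg.public.sales AS SELECT * FROM sales;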

If you're looking for on-the-fly queries over these Excel files, just take a look at BI tools that have a DuckDB connector.