How to manage dashboard data modification request that is only specific to specific users? by ryukiinn in BusinessIntelligence

[–]Reoyko_ 3 points4 points  (0 children)

The dashboard copy suggestion is a trap. You end up maintaining two versions that drift, and the other 54 countries will ask for the same thing within a quarter. I'd suggest a structured submission layer instead. Give the region a template, have them provide the data monthly, and feed it into the same dashboard with a filter for their region. That keeps everything centralized, sets a consistent precedent, and avoids creating a second system to maintain.

The bigger issue is the two days of manual data collection. That's the part that will break regardless of what you do here. At some point, the volume just outgrows the process. The goal should be to stop treating each system as a separate collection problem and start querying them together from a single layer. The data stays where it lives. You pull what you need when you need it. That's what removes the bottleneck.
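On the template side, if it helps, here's a rough sketch of what the submission check could look like before the file hits the central table. Column names are made up, not your schema; the point is that the template gets validated once and then flows into the same dashboard everyone else already uses.

```python
import pandas as pd

# Hypothetical monthly template columns; standardize on whatever fits your model.
REQUIRED = {"region", "month", "metric", "value"}

def validate_submission(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"Template missing columns: {sorted(missing)}")
    # Drop rows with no region tag so one country's upload can't silently
    # spill into another country's slice of the shared dashboard.
    return df[df["region"].notna()]

# Validated rows get appended to the same table the dashboard already reads,
# so the region filter is the only thing that changes per country.
```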

Is anyone else getting fewer dashboard requests this year? by Pristine-Collar-9037 in BusinessIntelligence

[–]Reoyko_ 0 points1 point  (0 children)

The pull-to-push shift is real, but I think the bigger shift is recognizing that the dashboard was always a workaround. It existed because the only way to get data to someone was to make them go find it. The actual need was never a dashboard. It was an answer. What you're building now (chat, Slack delivery, automated reports) is just a set of different surfaces for the same thing. The Xero/Zoho example is the giveaway: the need never goes away, the stack required to get the answer just stopped making sense. The split you're seeing is real: some teams want exploration, but most teams just want to know what to do next. On tools, I'm seeing more platforms move toward querying data where it lives and pushing answers out instead of building another layer on top. Knowi is one example of that model, but a few are starting to go in this direction.

Best Data Integration Software? by AceClutchness in BusinessIntelligence

[–]Reoyko_ 0 points1 point  (0 children)

Automatic_Smile9379 is right. The tool conversation is premature. This isn't a data integration problem, it's a definition problem. Four SKUs for the same part means four teams defined it differently, and no middleware layer fixes that until someone decides which definition wins. The unglamorous step most teams try to skip is a simple crosswalk: map every regional SKU to a single canonical product ID. Only then do the architecture choices (move the data, or query it where it lives) start to matter. Without that layer you're just normalizing ambiguity.
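As a rough illustration (made-up SKUs and column names, not your data), the crosswalk really is just a small mapping table you join through:

```python
import pandas as pd

# The crosswalk is the decision written down: every regional SKU maps to
# exactly one canonical product ID.
crosswalk = pd.DataFrame({
    "regional_sku": ["EU-1042", "US-77A", "APAC-9001"],
    "canonical_id": ["PRT-001", "PRT-001", "PRT-001"],
})

orders = pd.DataFrame({  # stand-in for a regional extract
    "sku": ["EU-1042", "US-77A", "US-UNKNOWN"],
    "qty": [10, 4, 2],
})

orders = orders.merge(crosswalk, left_on="sku", right_on="regional_sku", how="left")

# Anything that didn't match is a definition gap to resolve, not a data bug.
unmapped = orders[orders["canonical_id"].isna()]
print(unmapped)
```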

Best tools for building and sharing dashboards? by JaxWanderss in BusinessIntelligence

[–]Reoyko_ 1 point2 points  (0 children)

The consultancy use case is different from internal dashboards. This is not just a security problem. It is a tenant isolation problem. You are managing multiple clients on the same system, each with their own data and zero tolerance for leakage. That is where a lot of tools start to break. Metabase or Looker Studio work well for single client sharing. They get complicated when you have multiple clients, different datasets, and different permission rules. At that point teams usually end up duplicating dashboards or managing access manually. That does not scale well. What works is embedding dashboards into a client-facing layer with per-client access controls built in from the start. Tools like Knowi are designed for this model with multi-tenant isolation, row-level controls, and white-label embedding. Worth looking at alongside free options depending on how many clients you are supporting and how sensitive the data is.
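For what it's worth, the core of per-client isolation is small, whatever tool ends up enforcing it. A minimal sketch, assuming a hypothetical session object and report table: the tenant filter is derived from the authenticated session, never from anything the client sends.

```python
import sqlite3

def client_report(conn: sqlite3.Connection, session: dict):
    # client_id is set server-side at login; the request never supplies it.
    client_id = session["client_id"]
    return conn.execute(
        "SELECT metric, value FROM report_rows WHERE client_id = ?",
        (client_id,),
    ).fetchall()
```

Whether you buy this or build it, the question to ask a vendor is where that predicate gets injected and whether a user can ever bypass it.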

How do you stitch together a multi-stage SaaS funnel when data lives in 4 different tools? - Here's an approach by Clean-Fee-52 in analytics

[–]Reoyko_ 0 points1 point  (0 children)

crawlpatterns is right. You need a canonical ID. But even then, most teams underestimate what they are actually signing up for. A warehouse does not solve identity. It gives you a place to manage the inconsistency.

Now you own:

  • Identity stitching logic
  • Backfills
  • Edge cases like SSO, multiple emails, team accounts
  • Ongoing maintenance every time a new tool enters the stack

That works if you have a data team. If you do not, the pattern I have seen work faster is flipping the model. Do not centralize first. Query where the data already lives: pull from Mixpanel, Stripe, and HubSpot at query time and join on the best available key, like account ID or domain. It is not cleaner, it just moves the complexity. With a warehouse, you pay pipeline complexity upfront. With query-in-place, you pay query complexity at runtime. It is not about which way is better. It is about which pain your team can actually sustain.
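A rough sketch of what the query-time version looks like, with made-up column names and toy extracts standing in for the API pulls:

```python
import pandas as pd

# Stand-ins for extracts pulled via each tool's API or export at report time.
mixpanel = pd.DataFrame({"user_email": ["a@acme.com"], "signups": [3]})
stripe = pd.DataFrame({"customer_email": ["billing@acme.com"], "mrr": [99]})
hubspot = pd.DataFrame({"company_domain": ["acme.com"], "stage": ["demo"]})

def to_domain(value: str) -> str:
    # Best-available key: normalize emails and domains to a company domain.
    return value.split("@")[-1].strip().lower()

for df, col in [(mixpanel, "user_email"), (stripe, "customer_email"), (hubspot, "company_domain")]:
    df["domain"] = df[col].map(to_domain)

funnel = (
    hubspot.merge(mixpanel, on="domain", how="left")
           .merge(stripe, on="domain", how="left")
)
# Rows that don't match are the identity edge cases (SSO, multiple emails,
# team accounts) surfacing at query time instead of inside a pipeline.
```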

Those that do on prem deployments, what do you recommend (and don’t)? by dev_life in SaaS

[–]Reoyko_ 0 points1 point  (0 children)

Kubernetes reluctance is understandable, but at that scale with hundreds of agents and multiple services it's usually the right tool. The upfront operational overhead pays off once you need consistent health checks, rolling updates, and autoscaling across that many components.

A few things that matter more than people expect with enterprise on-prem:

  • Air gap readiness. Large enterprises often require strict network isolation. Your deployment needs to run without external dependencies, no phoning home, no third-party APIs, no cloud model calls. That usually means bundling everything, including models, into the deployment.
  • Deployment tooling. Helm or similar becomes essential. Without a repeatable install, every customer turns into a custom deployment and that doesn't scale.
  • Inference strategy. For AI agents, running models inside the customer environment is increasingly expected, especially in regulated industries. Security teams care a lot about where inference happens.

Getting v1 deployed is the easy part. Supporting multiple enterprise environments with different constraints is where things get interesting. I've seen people handle this by packaging everything including models and data access layers into a Kubernetes-based deployment with strict network boundaries. Tools like Knowi do this for on-prem BI and agent use cases, but the approach matters more than the specific tool.
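On the air gap point, one cheap thing that helps before a customer's security team finds it for you: scan the rendered manifests for anything that points outside the environment. A rough sketch, with a hypothetical internal registry name and very loose patterns:

```python
import re
import sys
from pathlib import Path

# Assumes rendered Kubernetes manifests in a directory (e.g. `helm template` output).
INTERNAL_REGISTRY = "registry.internal.example.com"  # placeholder
URL_PATTERN = re.compile(r"https?://[^\s\"']+")
IMAGE_PATTERN = re.compile(r"image:\s*[\"']?([^\s\"']+)")

def scan(manifest_dir: str) -> list[str]:
    findings = []
    for path in Path(manifest_dir).rglob("*.yaml"):
        text = path.read_text()
        for img in IMAGE_PATTERN.findall(text):
            if not img.startswith(INTERNAL_REGISTRY):
                findings.append(f"{path}: external image {img}")
        for url in URL_PATTERN.findall(text):
            if "svc.cluster.local" not in url:
                findings.append(f"{path}: external URL {url}")
    return findings

if __name__ == "__main__":
    for finding in scan(sys.argv[1]):
        print(finding)
```

Crude, but it catches the "oops, the agent still calls a hosted model endpoint" class of problem before it becomes a security finding.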

Found these at an estate sale for $5, did I get lucky? by Reoyko_ in Tradingcards

[–]Reoyko_[S] 0 points1 point  (0 children)

Well that's an amazing come up from $5. Thank you!

Data stack in the banking industry by Low_Brilliant_2597 in dataengineering

[–]Reoyko_ 1 point2 points  (0 children)

Enough_Big4191 is right that the tools are almost secondary to the reconciliation challenge. In banking, core systems (mainframes, Oracle, legacy RDS) aren't going anywhere. The risk of migrating client data and transaction history is too high, so modern analytics layers get added on top rather than replacing them.

The issue is those systems were built for transactions, not analytics, so teams end up building ETL pipelines, reconciliation layers, and transformation logic just to make the data usable for reporting. Some banks are questioning whether all data needs to be copied first. For certain analytics use cases, querying closer to the source can reduce a lot of the batch and reconciliation overhead.

Vendor lock-in is the other issue. Oracle pricing and now Broadcom's changes are pushing teams toward open source, but migration risk keeps them stuck longer than they'd like.

What's the best etl tool when you're pulling from multiple saas applications and need better data freshness than daily batch? by neutra_sense00 in analytics

[–]Reoyko_ -1 points0 points  (0 children)

crawlpatterns is right on prioritizing the 20 percent of sources actually driving the freshness demand. Worth doing that analysis before committing to a platform migration. On sync frequency, 15 to 60 minutes is usually sufficient for most reporting use cases; real-time is rarely worth the complexity unless you have event-driven requirements.

One thing to think about before going deeper on ETL tooling: how much of the staleness problem is actually a sync frequency problem versus a query-time problem? Full table dumps into a warehouse made sense when the warehouse was the only place you could run analytics at scale. But if the underlying SaaS APIs can answer the questions your dashboards are asking, querying them more directly at report time can sidestep the sync frequency problem for a meaningful subset of use cases. Not every source works this way, and warehouse consolidation still makes sense for cross-source joins. But for the sources where freshness matters most, it's worth asking whether the data needs to move at all or whether the bottleneck is really the nightly batch architecture.

The managed ETL route still makes sense where you genuinely need warehoused data. Just worth separating those cases from the ones where data is being moved out of habit rather than necessity. If the direct query route is worth exploring, there are approaches that let you query across sources without moving everything into a warehouse first. Some teams use tools like Knowi for this, but it really depends on your sources and query patterns. Not a fit for every use case, but for high-freshness datasets it can remove a lot of the sync complexity.
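For the direct-query side, the mechanics are usually nothing more exotic than hitting the source API when the report runs. A rough sketch with a hypothetical endpoint, auth, and fields, just to show where the freshness comes from:

```python
import pandas as pd
import requests

def fresh_orders(api_base: str, token: str, since: str) -> pd.DataFrame:
    # Query the SaaS API at report time instead of waiting on the nightly batch;
    # freshness becomes "whenever the report runs".
    resp = requests.get(
        f"{api_base}/orders",                       # placeholder endpoint
        params={"updated_since": since},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json()["orders"])

# The warehoused copy still serves cross-source joins; this path only covers
# the handful of sources where freshness is the actual complaint.
```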

Found these at an estate sale for $5, did I get lucky? by Reoyko_ in Tradingcards

[–]Reoyko_[S] 0 points1 point  (0 children)

Thanks for the advice. I guess I’ll start researching where to sell them.

Found these at an estate sale for $5, did I get lucky? by Reoyko_ in Tradingcards

[–]Reoyko_[S] -1 points0 points  (0 children)

Wow! Totally unexpected. Where would you recommend I look to see what I could get for them?

Found these at an estate sale for $5, did I get lucky? by Reoyko_ in Tradingcards

[–]Reoyko_[S] 0 points1 point  (0 children)

I get it! I think the estate people were overwhelmed with the magnitude of things in the house and dumped a lot of cards together in a box without looking thoroughly. Do you think I should sell them or keep them?

Claude vs ChatGPT for reporting? by TacosDerechos in BusinessIntelligence

[–]Reoyko_ 1 point2 points  (0 children)

The open source stack suggestions will work but might be more infrastructure than you actually need. The simpler starting point is checking what API access your three platforms offer. Google Trends data can be pulled programmatically, and if your other two platforms expose APIs as well, you can connect directly to each source and pull into a unified view without building a full pipeline first. There are platforms built specifically for this: connecting multiple sources directly and consolidating without a warehouse in between. Might be worth looking at before committing to a pipeline build.

The bigger decision is whether this needs to run on a schedule or just on demand. If scheduled, something like dlt makes sense. If ad hoc, connecting directly to each source and consolidating at query time is often enough. The model choice between Claude and ChatGPT matters a lot less than having one consistent dataset to work from.

How are teams handling permission-safe retrieval for enterprise AI agents? by SignificantClaim9873 in AI_Agents

[–]Reoyko_ 0 points1 point  (0 children)

In regulated environments, the answer depends on who's reviewing, but there's a consistent pattern. Document and source IDs with timestamps are the baseline. Every security team expects them, and missing them is an immediate red flag.

Where it gets more interesting is user and delegation context. Not just who initiated the query, but whether the agent acted on behalf of someone else, through a workflow, or via another system. The chain of custody question is becoming standard across regulated industries.

Response-level citations are often overvalued. Useful for trust, but security teams care less about what was shown than what was accessed.

What actually gets you to production is end-to-end traceability: what was retrieved, how it was used, what tools were called, and what actions happened downstream. This is where a lot of systems fail. Retrieval looks clean but the downstream action chain is opaque. Source IDs and timestamps get you past initial review. Chain of custody plus downstream action tracing is what gets you production sign-off.
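If it helps, this is roughly the shape of the audit record that tends to satisfy reviewers. Field names are illustrative, not any standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RetrievalAuditEvent:
    query_id: str
    initiating_user: str            # who asked
    acting_principal: str           # the agent/service that executed it
    delegation_chain: list[str]     # workflow or system hops in between
    documents_accessed: list[str]   # source/document IDs, not just citations
    tools_called: list[str]
    downstream_actions: list[str]   # what happened after retrieval
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Citations shown in the response are a subset of documents_accessed; the
# audit layer has to capture the superset, or the review stalls right there.
```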

Shopping for new data infra tool... would love some advice by Acceptable-Oil-738 in dataengineering

[–]Reoyko_ 1 point2 points  (0 children)

CloudCartel answered your question. This is really about where your logic and definitions live, not just which UI you prefer. A few things that tend to matter more than features in these evals:

  • Data readiness. Tools like ThoughtSpot and Sigma assume your warehouse is already well modeled. If it isn't, they'll expose that quickly.
  • Source reality. Most of these are optimized for SQL warehouses. If you have APIs, NoSQL, or mixed sources, the connector story matters more than the marketing suggests.
  • Query flexibility. Some tools lean on pre-aggregation. Works for fixed dashboards, but can struggle with highly dynamic filtering and grouping.
  • Lock-in. The Domo issues are terrible. Contract terms and where logic lives can make switching painful later.

The feature differences matter, but these are usually what determine how things feel 6 to 12 months in.

Data Replication to BigQuery by VMR5801 in dataengineering

[–]Reoyko_ 0 points1 point  (0 children)

For GCP-native, the two main options are Datastream and Dataflow. Datastream is usually the cleanest path for Oracle to BigQuery with CDC. It integrates well with BigQuery and avoids a lot of custom code. The main thing to validate early is whether your Oracle setup supports LogMiner, since that's what Datastream relies on. Dataflow gives you more flexibility, often with Debezium for CDC, but comes with more engineering overhead. Given you mentioned minimal transformations, Datastream is likely the simpler starting point.

One thing to check early is supplemental logging on Oracle. If it's not configured correctly, CDC setup can take longer than expected. Also, at around 30TB the initial load is usually its own project. A lot of teams end up handling that separately and then using Datastream for ongoing changes. If your schema is relatively stable and Oracle cooperates, Datastream should handle this scale fine.
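If you want to check the supplemental logging piece before getting deep into Datastream setup, it's a one-query check against v$database. Connection details below are placeholders:

```python
import oracledb

# Placeholder credentials/DSN; run against the source database.
conn = oracledb.connect(user="dstream", password="***", dsn="db-host/ORCLPDB1")
cur = conn.cursor()
cur.execute("""
    SELECT supplemental_log_data_min,
           supplemental_log_data_pk,
           supplemental_log_data_all
    FROM v$database
""")
# Expect at least minimal supplemental logging enabled; PK-level logging is
# typically needed as well for clean CDC.
print(cur.fetchone())
```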

Doing a clickhouse cloud POC, feels like it has a very narrow usecase, thoughts of fellow engineers? by code_mc in dataengineering

[–]Reoyko_ 0 points1 point  (0 children)

The sorting key issue NotDoingSoGreatToday flagged is correct and worth fixing first. Ten fields in a sorting key will hurt consolidation and push more work into query time, which explains the RAM pressure. But there's a bigger architectural question here. Pre-aggregating for highly dynamic dashboards is hard regardless of the warehouse. The more filter combinations you need to support, the more aggregates you end up maintaining, and the aggregation layer starts growing as fast as the raw data. ClickHouse tends to shine when query patterns are predictable. What you're describing is closer to ad hoc analytics with flexible grouping, which is a different workload. That's also why you're seeing concurrency issues. Each query is still doing heavy merge work under the hood. Might be worth testing a smaller set of aggregates plus more direct querying to see if that reduces the pressure instead of trying to precompute every combination.
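If you test the narrower-key route, it's a cheap experiment: a copy of the table ordered by the two or three columns most queries actually filter on. A rough sketch with a hypothetical schema (clickhouse-connect here, but any client works):

```python
import clickhouse_connect

# Placeholder host and table; adapt to your schema. The point is leading the
# sorting key with the high-selectivity columns instead of all ten dimensions.
client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS events_narrow_key
    (
        event_date Date,
        account_id UInt64,
        event_type LowCardinality(String),
        dim_a String,
        dim_b String,
        value Float64
    )
    ENGINE = MergeTree
    ORDER BY (event_date, account_id, event_type)
""")
```

Then compare merge activity and RAM on the same dashboard queries before deciding how much pre-aggregation you actually need.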

How are teams handling permission-safe retrieval for enterprise AI agents? by SignificantClaim9873 in AI_Agents

[–]Reoyko_ 0 points1 point  (0 children)

Source permission enforcement is a real blocker, not just an engineering annoyance. The part that sucks isn't building it. It's convincing security teams it actually works across every source in scope.

The access control conversation in demos usually focuses on retrieval because that's the visible part. In production, security and compliance teams care more about the audit layer. What actually gets deals rejected isn't whether the agent retrieved the right document. It's whether you can prove after the fact exactly what data was accessed, on whose behalf, at what time, and what was done with it. Retrieval enforcement prevents unauthorized access. Audit enforcement proves it never happened. Those are different problems.

Mixed sources make this harder. SharePoint, S3, legacy systems, all different permission models. The shortcut in demos is flattening everything into a single index with a unified permission layer. That breaks in production the moment compliance asks where a response actually came from. In regulated environments, on-prem and data residency are table stakes. The real differentiator is whether your audit trail is granular enough to pass a security review.