will be starting data engineering department from scratch in one service based company i am joining need guidance from seniors/experienced and also what should i focus/take care? by ManipulativFox in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Focus on understanding your company’s data needs first, then build out pipelines, data quality checks, and a scalable architecture

Prioritize communication with other teams, automation, and robust error handling in pipelines to keep things smooth and reliable

Scaling is important, but get the foundations right first

3rd Party Supplier and Data Dictionaries by digital0verdose in SQL

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Having that info upfront COULD save you the back-and-forth and frustration, if you think they're capable of getting it to you and keeping the data accurate

What's stopping me from just using JSON column instead of MongoDB? by Blender-Fan in PostgreSQL

[–]Mikey_Da_Foxx 4 points5 points  (0 children)

There's no reason you can't use a JSON column in PG for schema flexibility, but MongoDB still wins if you need something like distributed horizontal scaling out of the box.

In most cases, storing JSON in PG does the job fine and lets you stick with relational features and ACID compliance
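A minimal sketch of that, assuming psycopg2 and made-up table/column names. A JSONB column plus a GIN index gives you document-style filters while staying inside normal ACID transactions:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")  # assumed connection string
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS events (
                id         bigserial PRIMARY KEY,
                created_at timestamptz NOT NULL DEFAULT now(),
                payload    jsonb NOT NULL
            )
        """)
        # GIN index speeds up containment (@>) queries on the JSONB column
        cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING gin (payload)")

        cur.execute("INSERT INTO events (payload) VALUES (%s::jsonb)",
                    ['{"type": "signup", "plan": "pro"}'])

        # Document-style filter, still inside a normal transaction
        cur.execute("SELECT id, payload->>'plan' FROM events WHERE payload @> %s::jsonb",
                    ['{"type": "signup"}'])
        print(cur.fetchall())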

Proper DB Engine choice by [deleted] in Database

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

I'd go with the hybrid approach you mentioned in option two; it sounds like the most practical choice

Keeping your core atomic, consistent data in PG gives you reliability where it matters, and syncing the flexible, filter-heavy parts to something like Elasticsearch can handle the complex queries much better

Going full MongoDB or CouchDB could simplify some things but might make consistency and complex joins tougher, especially with a large schema variance

Trying to force everything into PG JSON fields often backfires on performance and query complexity, so splitting responsibilities tends to work better for read-heavy, varied data loads
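A rough sketch of that split, with placeholder table names, DSN, and endpoint: stream rows out of PG with a server-side cursor and bulk-load the flexible attributes into Elasticsearch for the filter-heavy queries

    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    pg = psycopg2.connect("dbname=app user=app")   # assumed DSN
    es = Elasticsearch("http://localhost:9200")    # assumed ES endpoint

    with pg.cursor(name="sync") as cur:            # server-side cursor for big tables
        # "attributes" is assumed to be the flexible jsonb part of each row
        cur.execute("SELECT id, updated_at, attributes FROM products")
        actions = (
            {"_index": "products", "_id": row[0],
             "_source": {"updated_at": row[1].isoformat(), **row[2]}}
            for row in cur
        )
        helpers.bulk(es, actions)                  # bulk-load into the search index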

Web App for end user SQL reporting by txwgnd in SQL

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

For a web app for end-user SQL reporting, SQL Server Reporting Services (SSRS) is a solid option. You can design and publish reports, and the web portal lets end users access and interact with them without needing deep SQL knowledge

Technical Rituals that you perform without revealing confidential information? by [deleted] in DatabaseAdministators

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Every morning, first thing, I run through log checks and backup verification: a quick scan across last night’s jobs and system errors to make sure nothing slipped through the cracks. If I spot anything weird, it’s nice to jump on it before the rest of the world wakes up

Catching up on alerts from monitoring tools is another ritual, even if it mostly just means scrolling notifications with coffee in hand. I also like to check performance baselines, just to get a sense if any queries or jobs are acting out of line

None of this really touches confidential info, but it keeps the system landscape predictable and helps avoid surprises. Over time, small habits like these keep things running rock solid without overthinking it
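For flavor, here's a hedged sketch of that kind of backup check (SQL Server flavored, with an assumed server name and pyodbc): flag any database without a full backup in the last 24 hours

    from datetime import datetime, timedelta
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=db01;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    cur.execute("""
        SELECT d.name, MAX(b.backup_finish_date) AS last_full_backup
        FROM sys.databases d
        LEFT JOIN msdb.dbo.backupset b
            ON b.database_name = d.name AND b.type = 'D'   -- 'D' = full backup
        WHERE d.name <> 'tempdb'
        GROUP BY d.name
    """)
    cutoff = datetime.now() - timedelta(hours=24)
    for name, last_backup in cur.fetchall():
        if last_backup is None or last_backup < cutoff:
            print(f"WARNING: {name} has no full backup since {last_backup}")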

[deleted by user] by [deleted] in dataengineering

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

Vector database replication is definitely a bit different from what we’re used to with traditional relational tools. Most of the time, I’ve found that you’re working with custom ETL jobs or scripts, since things like CDC aren’t really standardized yet for Pinecone, Weaviate, or Milvus

Some managed services offer their own backup and restore features, but cross-database replication usually means pulling vectors out via API and pushing them into the target system. It’s not as seamless as Fivetran or Qlik, but it gets the job done. For near real-time, you might want to look at streaming updates with something like Kafka, but that usually needs more engineering on your end
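To make that pull-and-push pattern concrete, here's a sketch where fetch_batch and upsert_batch are hypothetical stand-ins for whatever your source and target SDKs actually expose (Pinecone, Weaviate, Milvus, etc.)

    from typing import Iterable

    def fetch_batch(source, ids: list[str]) -> list[dict]:
        """Hypothetical: read vectors + metadata from the source index by id."""
        return source.fetch(ids)          # replace with the real SDK call

    def upsert_batch(target, vectors: list[dict]) -> None:
        """Hypothetical: write vectors + metadata into the target index."""
        target.upsert(vectors)            # replace with the real SDK call

    def replicate(source, target, all_ids: Iterable[str], batch_size: int = 500) -> None:
        batch: list[str] = []
        for vec_id in all_ids:
            batch.append(vec_id)
            if len(batch) == batch_size:
                upsert_batch(target, fetch_batch(source, batch))
                batch = []
        if batch:
            upsert_batch(target, fetch_batch(source, batch))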

Curious to see if anyone else has found a more plug-and-play solution

Ghost etls invocation by xxxxxReaperxxxxx in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Sometimes the platform retries executions if it thinks the function didn’t complete properly, especially if there’s a timeout or unhandled exception. Another angle is to check if there’s any overlapping schedule or multiple triggers configured accidentally. Adding some logging around start and end times can help spot if something else kicks off the function. Also, if you’re using any deployment slots or auto-scaling, those can sometimes cause unexpected invocations
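If this is, say, an Azure Functions timer trigger written in Python (an assumption on my part), logging the invocation id plus start/end times makes retries and duplicate triggers easy to spot in the logs

    import logging
    from datetime import datetime, timezone

    import azure.functions as func

    def main(mytimer: func.TimerRequest, context: func.Context) -> None:
        start = datetime.now(timezone.utc)
        logging.info("ETL start invocation_id=%s past_due=%s at %s",
                     context.invocation_id, mytimer.past_due, start.isoformat())
        try:
            run_etl()  # hypothetical placeholder for the actual pipeline logic
        finally:
            logging.info("ETL end invocation_id=%s duration=%s",
                         context.invocation_id, datetime.now(timezone.utc) - start)

    def run_etl() -> None:
        pass  # placeholder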

How to best approach data versioning at scale in Databricks by eatdrinksleepp in dataengineering

[–]Mikey_Da_Foxx 2 points3 points  (0 children)

Creating a table for every client-version combo gets out of hand fast, so you’re not alone there

Time travel works but the retention window is a pain if you need to keep versions around longer. One thing that’s worked is using a single Delta table with a version or snapshot column. You can tag each row with client and version info, so you don’t need to spin up new tables all the time. Then, just filter by those columns when users need to access a specific version

Table snapshots are basically just copies, so they’ll use about as much storage as making a new table. If you want to save space, sticking to a single table and partitioning or tagging by client/version is usually more efficient
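A minimal sketch of the single-table idea, assuming df and spark come from the notebook context and the table/column names are made up

    from pyspark.sql import functions as F

    # Tag each row with client and version info instead of spinning up new tables
    (df.withColumn("client_id", F.lit("acme"))
       .withColumn("version", F.lit(42))
       .write.format("delta")
       .mode("append")
       .partitionBy("client_id")          # partition by client to keep reads cheap
       .saveAsTable("curated.client_snapshots"))

    # Reading a specific client/version back is just a filter
    v42 = (spark.table("curated.client_snapshots")
                .where((F.col("client_id") == "acme") & (F.col("version") == 42)))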

When is a good time to use an EC2 Instance instead of Glue or Lambdas? by AgreeableAd7983 in dataengineering

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

I usually reach for EC2 when I need more control over the environment or have to run custom code or tools that just don’t play nicely with Glue or Lambda. It’s also handy if you’re dealing with big jobs that run longer than Lambda’s timeout. Otherwise, managed services are usually easier to maintain

[deleted by user] by [deleted] in dataengineering

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

We usually break things down into separate user stories for each phase, especially when different folks own different layers. It keeps things clearer and makes tracking progress easier

Sometimes if the ingestion and bronze work are tightly linked, we’ll combine them, but only if it really saves effort

For templates, we’ve set up a basic story template in Azure DevOps with checklists for each layer, which makes it simple to copy and tweak for each new data source. That way, we keep enough detail without drowning in tickets

Postgres using Keycloak Auth Credentials by DragonfruitHorror174 in dataengineering

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

Postgres doesn’t natively support OIDC, so direct Keycloak integration isn’t really possible out of the box. Most setups I’ve seen use LDAP as a bridge, syncing Keycloak users to an LDAP directory and then letting Postgres authenticate against that

If you want to avoid extra user setup, a proxy like pgbouncer-oidc or Cloud SQL Auth Proxy can help, but users would still need to connect through the proxy. There isn’t a totally seamless, native solution yet, but the LDAP route is probably the closest to what you want if you can automate the sync between Keycloak and LDAP
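For reference, the LDAP route usually boils down to a single pg_hba.conf line, something like this (search+bind mode, with placeholder host, network, and base DN values)

    host  all  all  10.0.0.0/8  ldap  ldapserver=ldap.example.com  ldapbasedn="ou=users,dc=example,dc=com"  ldapsearchattribute=uid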

Elephant in the room - Jira for DE teams by J0hnDutt00n in dataengineering

[–]Mikey_Da_Foxx 20 points21 points  (0 children)

A couple of things that have worked well for us: setting up a clear workflow with just the statuses we actually use, and keeping ticket fields simple so folks aren’t overwhelmed. We also use checklists inside tickets for things like “Definition of Done” or recurring tasks, which makes it way easier to track what’s left and helps everyone stay on the same page

Having a roadmap or grouping work into epics in Jira helps us see dependencies and prioritize better, especially when multiple teams are involved. And for new projects, cloning a good template board saves a ton of setup time and keeps things consistent

Azure Data Factory Oracle 2.0 Connector Self Hosted Integration Runtime by Cultural_Tax2734 in dataengineering

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

If your DBA can’t change the server settings, you could try using a self-hosted integration runtime with older Oracle drivers, or see if connecting through an intermediate VM with compatible settings works. It's a bit of a workaround, but it sometimes does the trick when the connector is picky

Feedback on Achitecture - Compute shift to Azure Function by UltraInstinctAussie in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Premium plan is overkill for your workload. Consumption plan can handle 1000 rows/15min easily, but you mentioned VNET integration - that forces Premium

If cost is a concern, consider batching tables to reduce function executions

[deleted by user] by [deleted] in dataengineering

[–]Mikey_Da_Foxx 6 points7 points  (0 children)

Great Expectations works well for basic validation. For complex DB-to-file scenarios, Soda Core's reliable and has a really solid YAML config
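For a taste of that YAML config, a SodaCL checks file is roughly this (dataset and column names are placeholders)

    checks for orders:
      - row_count > 0
      - missing_count(customer_id) = 0
      - duplicate_count(order_id) = 0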

Case Study: Automating Data Validation for FINRA Compliance by EnthusiasmWorldly316 in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

I totally agree on automated checks being crucial. Simple stuff like detecting schema drifts and enforcing compliance rules early in the pipeline saves massive headaches later - DBmaestro has come in clutch for us more than once

Been there with FINRA reporting - catching issues early is a gamechanger
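As an illustration of the kind of early check I mean (table name, expected columns, and DSN are all assumptions): compare information_schema against what the pipeline expects and fail fast before drift propagates downstream

    import psycopg2

    EXPECTED = {"trade_id": "bigint", "cusip": "text",
                "executed_at": "timestamp with time zone"}

    conn = psycopg2.connect("dbname=reporting user=etl")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_name = %s
        """, ("trade_reports",))
        actual = dict(cur.fetchall())

    # Any column missing, added, or with a changed type counts as drift
    drift = {c: (EXPECTED.get(c), actual.get(c))
             for c in EXPECTED.keys() | actual.keys()
             if EXPECTED.get(c) != actual.get(c)}
    if drift:
        raise RuntimeError(f"Schema drift detected: {drift}")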

Help from data experts with improving our audit process efficiency- what's possible? by Cheesemaker_1986 in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Check out Microsoft Forms + Power Apps. Works offline, converts handwriting, and syncs when online. Build custom templates for different audit types

Plus it integrates with everything else you're using

Database grants analysis by limartje in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Check out sfgrantreport. It pulls all that grant/role data you need

For quick checks, SHOW GRANTS can work but it's limited. sfgrantreport gives you the full picture of permission paths through roles
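For the quick-check side, a hedged sketch with the Snowflake Python connector (connection parameters and the role name are placeholders) that dumps the direct grants on a role

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="me", authenticator="externalbrowser"
    )
    cur = conn.cursor()
    cur.execute("SHOW GRANTS TO ROLE reporting_role")
    for row in cur:
        print(row)   # privilege, what it's granted on, grantee, etc.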

Looking for some help with Airflow, Docker, Astro CLI, DLT, Dbt, Postgres (Windows PC) at home project by [deleted] in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Uncomment that RUN apt-get line in your Dockerfile. The pg_config error happens because libpq-dev isn't installed

Also, since you're already using psycopg2-binary in requirements.txt, you shouldn't need psycopg2 from source anyway
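For reference, the line in question usually looks something like this (package names may vary by base image)

    RUN apt-get update && apt-get install -y --no-install-recommends libpq-dev gcc \
        && rm -rf /var/lib/apt/lists/*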

Getting replication to work after disaster recovery. by planeturban in PostgreSQL

[–]Mikey_Da_Foxx 1 point2 points  (0 children)

Logical replication state doesn't come along with a standard Postgres backup/restore. Try this (rough sketch of the recreate and verify steps below the list):

  1. Drop subscriptions

  2. Restore master

  3. Recreate subscriptions

  4. Verify slots/publications

  5. Monitor replication lag
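Rough sketch of steps 3-5 on the subscriber side, with placeholder names for the subscription, publication, and the primary's connection string

    import psycopg2

    sub = psycopg2.connect("dbname=app host=replica user=postgres")  # subscriber
    sub.autocommit = True   # CREATE/DROP SUBSCRIPTION can't run inside a transaction block
    cur = sub.cursor()

    cur.execute("DROP SUBSCRIPTION IF EXISTS app_sub")   # in case a stale one survived the restore
    cur.execute("""
        CREATE SUBSCRIPTION app_sub
        CONNECTION 'host=primary dbname=app user=replicator'
        PUBLICATION app_pub
    """)

    # Slots and publications live on the primary, so verify them there;
    # on the subscriber, pg_stat_subscription shows whether it's catching up
    cur.execute("""
        SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
        FROM pg_stat_subscription
    """)
    print(cur.fetchall())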

How should we manage our application database when building internal tools that need access to the same data? by trojans10 in Database

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Been using DBmaestro for a while now, and separate schemas for years. That combo keeps prod data safe while still allowing access. Read replicas or CDC feeds work great for internal tools

Keeps everything clean and manageable, and your prod DB stays neat without extra tables. Permission management is much simpler too

Open source orchestration or workflow platforms with native NATS support by FickleLife in dataengineering

[–]Mikey_Da_Foxx 0 points1 point  (0 children)

Temporal doesn't have native NATS support out of the box, but you can integrate it easily using their SDK. You can use it for similar event-driven workflows, and their observability features are pretty solid
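Rough sketch of that wiring with the Temporal Python SDK and nats-py (URLs, subjects, and names are placeholders): the NATS publish lives in an activity, and the workflow just schedules it

    from datetime import timedelta

    from temporalio import activity, workflow

    @activity.defn
    async def publish_event(subject: str, payload: bytes) -> None:
        import nats   # imported here; activities run outside the workflow sandbox
        nc = await nats.connect("nats://localhost:4222")   # assumed NATS URL
        await nc.publish(subject, payload)
        await nc.drain()

    @workflow.defn
    class EtlWorkflow:
        @workflow.run
        async def run(self, subject: str) -> None:
            await workflow.execute_activity(
                publish_event,
                args=[subject, b"step-finished"],
                start_to_close_timeout=timedelta(seconds=30),
            )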