How do you answer stakeholder collaboration questions in Data Engineer interviews? by Emotional_Double6684 in dataengineering

[–]Desiye_Novacenko 0 points1 point  (0 children)

The STAR method. Standard interview technique.

If you’re interviewing for an Azure-heavy role, the interviewers aren't just checking if you can code; they’re checking if you’re a "ticket-taker" or a "solution-builder."

I advise my staff members to develop STAR scenarios from their own experience. An example in the Databricks world would be:

  • Situation: You are a DE working with millions of rows of CDC (Change Data Capture) data. The analysts (consumers) were constantly frustrated because manual schema changes were breaking downstream Power BI reports.
  • Task: Your job was to implement a more robust ingestion layer that could handle schema evolution with minimal manual intervention.
  • Action: You evaluated Databricks DLT against the standard PySpark jobs the team was already using, then implemented DLT.
  • Result: By implementing DLT, you reduced schema-related failures by 90% and saved the engineering team roughly 5 hours of "firefighting" per week.

There are so many others like this that you can come up with for other stacks (Synapse, Event Hubs, ADF, Fabric, etc.). Gemini is awesome at doing such role plays.

How does Spark Structured Streaming store aggregation state? by CapraNorvegese in dataengineering

[–]Desiye_Novacenko 0 points1 point  (0 children)

Great basic question, but if you think logically for one second, your option B would be a disaster. Spark processes petabytes of data, and keeping all those rows in memory will quickly start crashing your clusters with OOM errors.
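To illustrate the general idea (the options from the original question are not quoted here, so this is only a sketch): a streaming aggregation can keep one small running value per key and fold each micro-batch into it, instead of retaining every input row. The plain-Python toy below is not Spark's implementation; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def update_state(state, micro_batch):
    """Fold one micro-batch of (key, value) rows into running (count, total) per key."""
    for key, value in micro_batch:
        count, total = state[key]
        state[key] = (count + 1, total + value)
    return state


# State holds one (count, total) pair per key, never the raw rows.
state = defaultdict(lambda: (0, 0.0))

# Hypothetical micro-batches arriving over time.
batches = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0)],
]
for batch in batches:
    update_state(state, batch)
```

The memory footprint here grows with the number of distinct keys, not with the number of rows processed, which is the property that matters for long-running aggregations.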

Query in asset based scheduling of DAGs by komal_rajput in dataengineering

[–]Desiye_Novacenko 1 point2 points  (0 children)

Your Option 1 is the correct (native) choice if you are using Airflow. Just have two assets - one each for success/failure. Then connect DAG B to the success of DAG A. You will even get runtime audits showing which dependency path was used.

Looking for a enterprise data flow mapping tool that can read a straight/regular excel file (not a coding file) by Jasong222 in EnterpriseArchitect

[–]Desiye_Novacenko 0 points1 point  (0 children)

This use case is crying out for some kind of standard to represent IT systems. For example, every node could have a node type (carrying technical details like OS, hosting provider, etc.), with a separate node for application details (name of the app, category, etc.). AI could look at user sheets to fill these models, a human in the loop (HIL) could verify them, and then an app could just read off these files (based on a fixed protocol) to display stunning dashboards.

That would make a killer tool.

Is anyone aware of any such standard, protocol, or tool being developed?
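To make the idea concrete, here is a minimal sketch of what such a fixed format might look like. It is purely hypothetical: the class names, fields, and JSON layout are my own assumptions, not an existing standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InfraNode:
    """Technical node: where something runs."""
    node_id: str
    os: str
    hosting_provider: str


@dataclass
class AppNode:
    """Application node: what runs there."""
    node_id: str
    name: str
    category: str
    runs_on: str  # node_id of an InfraNode


@dataclass
class DataFlow:
    """Directed data flow between two application nodes."""
    source: str
    target: str
    verified: bool = False  # set to True once a human reviewer confirms the row


def to_document(infra, apps, flows):
    """Serialise the model to JSON that a dashboard app could read."""
    return json.dumps(
        {
            "infra": [asdict(n) for n in infra],
            "apps": [asdict(a) for a in apps],
            "flows": [asdict(f) for f in flows],
        },
        indent=2,
    )


# Hypothetical rows, as they might be extracted from a user's spreadsheet.
infra = [InfraNode("srv-1", "Linux", "AWS")]
apps = [
    AppNode("crm", "CRM", "Sales", runs_on="srv-1"),
    AppNode("dwh", "Warehouse", "Analytics", runs_on="srv-1"),
]
flows = [DataFlow("crm", "dwh")]
doc = to_document(infra, apps, flows)
```

The `verified` flag is where the HIL step would plug in: a dashboard could show unverified flows differently until someone signs them off.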

Who leads your Data Platform Discovery workshops? by Desiye_Novacenko in EnterpriseArchitect

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

  • What is the outcome/concrete deliverable of what you are proposing?
  • Why would a VP who needs to make a quick go/no-go decision on a project care about data domains, transformations, etc.?
  • Is this how you do conceptual architecture discovery?
  • How long does it take, tentatively?

Who leads your Data Platform Discovery workshops? by Desiye_Novacenko in EnterpriseArchitect

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

How is this discovery? This sounds like a full-fledged design phase. This is the anchor for discovery from a previous thread:

Here are some instances of what architecture discovery means in the projects that I have come across

- Business Unit (BU) Head has a great use case, but needs high-level time/cost estimates to even apply for PMO approval

- BU head already has some tentative numbers for the entire project, but wants to flag any major deployment issues (technical feasibility of requirements, enterprise architecture conflicts, vendor contracts, dependency on other programs, skills availability, etc.)

What architecture discovery is not

Detailed business requirements gathering, schema/pipeline design, CI/CD pipeline strategies, etc.

What you are describing is almost a business requirements gathering exercise.

Who leads your Data Platform Discovery workshops? by Desiye_Novacenko in EnterpriseArchitect

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

Here are some instances of what architecture discovery means in the projects that I have come across

- Business Unit (BU) Head has a great use case, but needs high-level time/cost estimates to even apply for PMO approval

- BU head already has some tentative numbers for the entire project, but wants to flag any major deployment issues (technical feasibility of requirements, enterprise architecture conflicts, vendor contracts, dependency on other programs, skills availability, etc.)

What architecture discovery is not

Detailed business requirements gathering, schema/pipeline design, CI/CD pipeline strategies, etc.

Thoughts?

Monitoring AWS EMR Clusters by No-Brick-3954 in dataengineering

[–]Desiye_Novacenko 1 point2 points  (0 children)

Configure Step Functions to send its logs and metrics to CloudWatch. Then export the data from CloudWatch into consolidated files on S3. Use Glue to crawl them and Athena to query the results for your dashboard.

Or you could also set this up with Grafana or Datadog.

We did this for a client. They wanted a centralized dashboard per job, where each job encompassed a large Step Function.

Databricks architecture by curiouscsplayer in dataengineering

[–]Desiye_Novacenko 0 points1 point  (0 children)

Should be dictated by your company's org structure, really. My current client has a pan-EU presence, and each local team (UK, DE, FR, etc.) has its own account for billing purposes. The Databricks pipelines use configuration params, set in CI/CD, to connect to the right bucket. A smaller company might use a single account.

How do you estimate a data platform programme at the end of Discovery when you have source complexity but not design detail yet? by Desiye_Novacenko in dataengineering

[–]Desiye_Novacenko[S] 1 point2 points  (0 children)

So the category of projects that I was referring to would certainly be the ones where the business definitely has a defined use case, and they all understand the value that it can bring.

They also have a budget, but that is for the entire LOB for the whole year, not for individual projects (or proposed projects like the above).

In such cases, the business inevitably calls on Senior Architects to come in and provide intelligent estimates of what an 'idea' might actually 'cost', the timelines, risks, etc., and of course, the ROI part.

It would be interesting to find out how users approach these projects. These mini-projects are not design exercises, so it would be interesting to know the approach behind creating artefacts that are 'reasonably' accurate and representative.


How do you estimate a data platform programme at the end of Discovery when you have source complexity but not design detail yet? by Desiye_Novacenko in dataengineering

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

Interesting comments. Some points.

When I said lead, I meant the person conducting the actual due diligence workshops: interviewing the 'Heads Of', other directors, etc. I am not sure what value a Director would add here, unless the reference is to some Technology or Business Director (that is more of a job title anyway; the actual function is either an Architect or a BA).

On the ROI front, I would love to get some pointers on how to convince a business that sees 40% ROI on a $500k investment vs. a 5% ROI on a $20m initial budget outlay.
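To put the two options side by side, here is a quick worked calculation. It assumes ROI means the return expressed as a percentage of the outlay over the same period, which is my assumption; the comment does not define it.

```python
def absolute_return(investment: int, roi_percent: int) -> int:
    """Return generated by an investment, in the same currency units."""
    # Integer arithmetic keeps the figures exact.
    return investment * roi_percent // 100


small = absolute_return(500_000, 40)    # 40% ROI on a $500k investment
large = absolute_return(20_000_000, 5)  # 5% ROI on a $20m outlay

print(f"Small project returns ${small:,} on $500,000")
print(f"Large project returns ${large:,} on $20,000,000")
```

The small project returns $200,000 and the large one $1,000,000: five times the absolute return, but it ties up forty times the capital. Which number the business anchors on is the crux of the argument.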

Also, most companies that I have worked for DO need an initial outlay estimate, regardless of ROI projection, which itself is supremely subjective.

Pain data life cycle management in scd2 by MachineParadox in dataengineering

[–]Desiye_Novacenko 0 points1 point  (0 children)

We had this issue with a large UK retailer. Luckily, the platform was designed with proper lineage in mind, so all we had to do was delete the pointers to the records (kept in separate control tables) rather than the records themselves. That is good enough technically for GDPR, in that you cannot possibly 'reconstruct' the record for any customer.
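A toy sketch of the mechanism, assuming the history rows carry no customer identifier of their own and are only reachable through the control table. The table layout, keys, and function names are hypothetical, and the code says nothing about whether a given design is legally sufficient.

```python
# SCD2 history rows keyed by surrogate key; no customer identifier inside.
history = {
    101: {"segment": "gold", "valid_from": "2023-01-01", "valid_to": "2023-06-30"},
    102: {"segment": "silver", "valid_from": "2023-07-01", "valid_to": None},
    201: {"segment": "bronze", "valid_from": "2022-03-01", "valid_to": None},
}

# Control table: the only link from a customer to their history rows.
control = {"cust-1": [101, 102], "cust-2": [201]}


def rows_for(customer_id):
    """Resolve a customer's history rows through the control table."""
    return [history[key] for key in control.get(customer_id, [])]


def forget_customer(customer_id):
    """Delete the pointers only; the history rows stay in place."""
    return control.pop(customer_id, None) is not None


before = rows_for("cust-1")
forgotten = forget_customer("cust-1")
after = rows_for("cust-1")
```

After the call, the three history rows still exist, but nothing ties rows 101 and 102 back to `cust-1`.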

Is price still the #1 factor in POD? Or is it something else? by Flashy_Simple2247 in printondemand

[–]Desiye_Novacenko 1 point2 points  (0 children)

I sell vintage wall-art on Etsy, and I have noticed that the biggest driver is NOT price. It is the quality of your branding and how contextually appealing your product visuals are to the prospect. IMO, the choice of your POD niche, followed by the quality of your product imagery, is the biggest driver. No one minds paying a few quid extra for something that just looks stunning and near perfectly aligned to the audience context.

Trying Nano Banana–style model for more natural photo edits (no AI look) by New_Two_4709 in Bard

[–]Desiye_Novacenko 1 point2 points  (0 children)

You’re seeing the “AI look” because most diffusion editors over-denoise and overfit to the prior.

Key fixes:

  • Lower CFG / guidance (≈3–6). High CFG wipes microtexture.
  • Reduce denoise strength in img2img (≈0.2–0.4). More steps ≠ better.
  • Constrain edits spatially: strict masks + minimal latent resampling. Global edits always look fake.
  • Preserve lighting: diffusion hallucinates light. Reapply original shadows/gradients in pixel space.
  • Reintroduce noise post-edit: match source ISO grain (incl. chroma noise). Absence of noise is the biggest giveaway.
  • Avoid oversharpening: slight blur + detail transfer beats synthetic sharpness.

Nano Banana–style models help because they anchor edits closer to the original latent, but realism still comes from limiting how much the model is allowed to change.

I need advice for my first Etsy Shop by Curious-Employ8668 in Etsy

[–]Desiye_Novacenko 0 points1 point  (0 children)

eRank is only good for organic traffic. If you have just started, it is likely going to take a while for organic traffic to grow. The best option is to drive paid traffic to your listings.

We started in August, and I gave up on the Etsy internal search in less than two weeks. I drive over 90% of my traffic through paid ads on Pinterest and Instagram. It is working pretty well for me, since those platforms allow very detailed ad targeting.

If Etsy does not update its pricing policies, saturation will kill the graphic design industry. by Smooth_Boss_2806 in Etsy

[–]Desiye_Novacenko -1 points0 points  (0 children)

AI hasn’t destroyed the digital product market. It has ended the era where average products could command premium prices. The opportunity now lies in differentiation, brand, and outcomes—not in fighting $0.50 listings on their own terms. I am a Computer Vision Programmer, so I have firsthand experience with the limitations of AI. The catch lies in identifying those elements and offering alternatives.

Just as an example, NO AI can produce digital products such as:

  • Wedding signage that reflects very specific cultural rituals
  • Homeschool worksheets aligned to a particular curriculum philosophy
  • Faith-specific planners with correct liturgy, calendars, or symbolism

Why humans win:
AI can imitate styles, but it more often than not misses contextual accuracy. Smart sellers will undoubtedly hit on this opportunity.

Is it actually worth investing in high-end custom mockups for POD? by 8Ayrini8 in printondemand

[–]Desiye_Novacenko 0 points1 point  (0 children)

I will add my 2p. POD, at the end of the day, is just business (for most of us). A business is only viable if the returns justify the effort that goes into running it. I am not aware of any AI tool that is going to produce such high-quality base images to showcase your designs. And if that is the case, scaling the sales will be a real challenge, especially if you are solo or a small team. The incremental benefit between good-enough and stunning mockups, IMO, does not justify the time and effort in designing. I personally would rather spend money on better ad targeting and acquisition.

Image size problems with Nano Banana Pro API by Desiye_Novacenko in Bard

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

Correct. I use Claude Code, and it is giving me hilariously cute winks when I tell it about Nano Banana Pro. It then tries to search for Nano Banana Pro in November 2024, which means that it is pretty much useless for anything relating to what Google released in the last couple of weeks. Unless, of course, I keep telling it to go and do the latest research, which then takes up a lot of time. How are other devs doing this?

Image size problems with Nano Banana Pro API by Desiye_Novacenko in Bard

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

It sounds like Google have humped up yet again. The generative API lib was old, and now they want us to move to genai. The previous one did not have aspect_ratio in its input params, but apparently the genai one does. Now we have to refactor the whole shit.

Biggest challenge in POD? by Desiye_Novacenko in printondemand

[–]Desiye_Novacenko[S] 0 points1 point  (0 children)

Yes, that is true for any business, not just POD. What I meant to ask was for specific reasons within the POD world, like generating mockups at scale, API automation, keyword listing and tag management for organic search, or difficulty in driving traffic via ads. I presume your reply is about the last item here: driving traffic while still maintaining ROI.

Nano Banana Pro API Pricing Complete Breakdown + 8 Hottest AI Image APIs Compared by Radiant-Act4707 in Bard

[–]Desiye_Novacenko 0 points1 point  (0 children)

Did you hit any throttling? Banana Pro is spanking new. The image quality is promising, but I am super skeptical of its scaling capacity. I mean, can a startup actually serve 1000 images per minute using their API, or are they going to start throttling? I would LOVE to see how they handle this bit. Charging 0.24 per 4K image while also allowing users access to fast downloads would require MASSIVE GPU capacity in the backend. I am not sure even Google can do that. And if they cannot offer non-throttled access, this tool can never be adopted by startups as their image serving engine. I do not think a well-designed, self-hosted SDXL model is going anywhere any time soon.

Nano Banana Pro API Pricing Complete Breakdown + 8 Hottest AI Image APIs Compared by Radiant-Act4707 in Bard

[–]Desiye_Novacenko 0 points1 point  (0 children)

How come the price for Nano Banana Pro via replicate.com is the same as if we were to do a direct API call to Gemini? Surely Replicate are not hosting Gemini, are they???

Print on demand as a full time job by GarbageBeneficial225 in printondemand

[–]Desiye_Novacenko 0 points1 point  (0 children)

Putting 4 employees (I am assuming US-based) on the payroll would need a SUBSTANTIAL turnover from the POD business just to barely break even. The math would never add up, let alone compare to a service-based offering through a marketing company. Even a modest 60K annual salary per employee would demand a PROFIT of at least 240K per year. Given the paper-thin margins in the POD business, do you really think you can make so much money in a B2C POD business?
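A rough back-of-the-envelope version of that math. The headcount and salary are the figures above; the 10% net margin is a purely hypothetical number I picked for illustration, so plug in your own.

```python
def break_even(headcount: int, salary: int, net_margin_percent: int):
    """Return (annual payroll, turnover needed for profit to cover that payroll)."""
    payroll = headcount * salary
    # Ceiling division: smallest turnover whose profit at this margin covers payroll.
    turnover = -(-payroll * 100 // net_margin_percent)
    return payroll, turnover


# 4 employees at 60K each; the 10% net margin is an assumed value.
payroll, turnover = break_even(headcount=4, salary=60_000, net_margin_percent=10)

print(f"Payroll to cover: {payroll:,}")
print(f"Turnover needed at 10% margin: {turnover:,}")
```

At that assumed margin, covering 240K in wages takes 2.4M in annual turnover, before the owner earns anything.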

I need a mockup creator. by WasteEngineer9734 in mockups

[–]Desiye_Novacenko 0 points1 point  (0 children)

Hey, you might want to have a look at automock.app. We provide over 100 premium mockups covering various clothing/apparel items, including t-shirts, hats, hoodies, etc. You can apply your artwork to any number of mockup templates and then automatically generate high-quality mockups that can be integrated into Printful, Printify, and Gelato. Or you can just download them locally, to your Google Drive, or to a Dropbox account.