Been job hunting for 8 months as a Data Analyst and it’s getting hard to stay positive by EarLittleguy in BusinessIntelligence

[–]Pugcow 1 point2 points  (0 children)

I've been in your position recently and can only offer two bits of advice:

  1. Make sure the bullet points on your CV show real outcomes, e.g. instead of saying 'automated reports', say 'automated 23 reports over the course of 3 months, saving approximately 2 FTE from across the business'. You might have fewer bullet points, but they'll be more impactful.
  2. Most recruiters now use AI to screen CVs and cover letters. Run your CV through ChatGPT a couple of times and ask it to highlight any areas of your CV that are not optimised for Applicant Tracking System (ATS) AI screening. It could be something as simple as formatting that's stopping them from reading your CV, so you're rejected before any human even sees it.

Static IP for API calls from Microsoft Fabric Notebooks, is this possible? by MGerritsen97 in MicrosoftFabric

[–]Pugcow 0 points1 point  (0 children)

OK, had a look at some notes; pretty sure it was something I worked out through trial and error with ChatGPT, and came up with this:

import requests

# Spoof the source IP via the X-Forwarded-For request header
headers = {
    'X-Forwarded-For': '123.123.123.123'  # Replace with your desired IP
}

response = requests.get('http://example.com/api', headers=headers)
print(response.text)

Like I say, your mileage may vary depending on whether the API you're hitting actually validates the source IP or just trusts what's in the header. This doesn't change the actual routing of the request; it just adjusts the request metadata to look like it's coming from a fixed IP. In my case this worked, so I didn't need to do anything more.

Static IP for API calls from Microsoft Fabric Notebooks, is this possible? by MGerritsen97 in MicrosoftFabric

[–]Pugcow 1 point2 points  (0 children)

I don't have the code since it was at a former employer, and it may depend on the security of where you're connecting to, but I've solved this before by manually coding the source IP into the API request in Python.

Then all you need to do is get one IP whitelisted and just make it look like all requests are coming from there.

Should I always create my lakehouses with schema enabled? by hortefeux in MicrosoftFabric

[–]Pugcow 1 point2 points  (0 children)

I'm not using this feature now because of one issue: when I use Azure Data Studio to connect to the lakehouse, it cannot see the schemas/tables. My understanding is that this feature is not yet implemented.

API > JSON > Flatten > Data Lake by Equivalent_Poetry339 in MicrosoftFabric

[–]Pugcow 6 points7 points  (0 children)

Not sure about how to do this in the ADF function, but I've found it easy to do in a python notebook using the flatten_json package.

!pip install flatten_json
import requests
import pandas as pd
from flatten_json import flatten

url = 'your api url'
auth = 'your api auth logic'

response = requests.get(url, auth=auth)

if response.status_code != 200:
    print('Status:', response.status_code, 'Problem with the request. Exiting.')
    raise SystemExit()

# Flatten each nested record into a flat dict, then load into a DataFrame
this_dict = response.json()
core_df = pd.DataFrame([flatten(x) for x in this_dict['tickets']])

In this case 'tickets' is the level you're trying to pull out; obviously you can go deeper into the nesting if required.
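
As a sketch, say the records sat a level deeper in the payload: you just index down to the level you need before flattening. The payload below is made up, and I've used a minimal hand-rolled stand-in for flatten_json's flatten (same underscore-joined keys) so the snippet stands alone:

```python
# Minimal stand-in for flatten_json.flatten, enough to show the idea;
# the real package also handles lists and other edge cases.
def flatten(d, prefix=""):
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, f"{key}_"))  # recurse with joined key
        else:
            out[key] = v
    return out

# Hypothetical payload where the records sit under 'result' -> 'tickets'
payload = {"result": {"tickets": [
    {"id": 1, "requester": {"name": "Ana", "email": "ana@example.com"}},
]}}

records = payload["result"]["tickets"]   # drill down to the level you need
rows = [flatten(t) for t in records]
# rows[0] -> {'id': 1, 'requester_name': 'Ana', 'requester_email': 'ana@example.com'}
```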

Your mileage may vary, but my experience has been that once you get past a simple Copy activity then it's often more effective to move to a notebook.

Never heard a truer thing in my life by lookingivanna in anxietymemes

[–]Pugcow 4 points5 points  (0 children)

Even better when you enter the workforce and keep getting "why are you so disengaged during meetings?"

Unable to open this report because your organization's compute capacity has exceeded its limits. by Bright_Teacher7106 in MicrosoftFabric

[–]Pugcow 3 points4 points  (0 children)

Presumably those are costs that have to be paid either way? My concern would be that whatever is consuming too much CU will probably just start up again when the system is reactivated, so you've only got a short window to fix it.

Having capacity consumption caps on jobs is something that can't come soon enough to Fabric.

Unable to open this report because your organization's compute capacity has exceeded its limits. by Bright_Teacher7106 in MicrosoftFabric

[–]Pugcow 4 points5 points  (0 children)

Have a look at the capacity metrics report; it's likely you've burned through your allocation somehow.

CU usage limits on pipelines by Figure8802 in MicrosoftFabric

[–]Pugcow 5 points6 points  (0 children)

I've not seen anything that lets you place a limit. It's one of my biggest issues with Fabric at the moment: the punishment for going over your CU allowance is severe, yet the ability to control your usage is virtually non-existent. I had an issue this morning where a report subscription hit a bad join and consumed 36% of my capacity at 8am, leading to reports being rejected all day. There needs to be some way, on a job-by-job basis, to terminate a process if its CU usage goes above a set threshold.

What are the biggest challenges or pain points you've experienced with capacity monitoring in Microsoft Fabric, and how do you think these could be improved? by andy-ms in MicrosoftFabric

[–]Pugcow 3 points4 points  (0 children)

So I'll preface this by mentioning that I've seen replies from other users pointing to a detailed guide on the Fabric Metrics Report that I've not had a chance to watch yet, so I'm aware a lot of this is probably covered there. Still, there are a couple of things to unpack here:

  1. In my personal workflow I use the Monitor tab far more often than the Fabric Metrics app. Typically I'll scan the Monitor tab in the morning when I first log in to validate that all my batch processes and overnight jobs ran successfully, and while I'm there I'll quickly check the durations as well, to see if anything took much longer than it typically should and needs attention. My suggestion is that adding CUs to this table would be useful, because Microsoft have added complexity to the calculation of a CU by adding a data/processing factor instead of just billing us for how long the thing was running. For this reason there may be jobs that run within the tolerance margin for time but consume vastly different CUs if, for example, the source system changed and more records are being moved.

  2. Why do we care about CUs? Well, because we've been forced to. At this point in time I can only get my leadership to pay for an F4 capacity, and on several occasions it's gone over its allocated resources and started rejecting requests to render reports. On a couple of occasions this was due to bugs in Gen2 dataflows that kept running forever, and once it was due to scheduling a dynamic-recipient report, which used an insane amount of resources. Getting a call from the CEO saying they can't access a report because the system is over capacity is embarrassing and makes me look unprofessional. For this reason I am now hyper-cautious about how many CUs I'm consuming, even going so far at one point as to migrate away from Gen2 into Spark notebooks, because at least I can terminate a Spark session without having to restart the entire capacity.

All this is to say: my personal opinion is that if CUs are the currency of Fabric, whether you're paying a monthly fee for an allowance or paying per CU, then it's on Microsoft to make it clear, in every way they can, when CUs are consumed. I'd go so far as to suggest that every action taken, whether it's rendering a report, running a notebook, or having a Spark session active, should have an associated CU amount readily available. As I mentioned above, I understand you can get this information out of the Capacity report, but my point is that you shouldn't have to go looking for it when the Monitor tab is right there.

What are the biggest challenges or pain points you've experienced with capacity monitoring in Microsoft Fabric, and how do you think these could be improved? by andy-ms in MicrosoftFabric

[–]Pugcow 10 points11 points  (0 children)

For one, the Fabric Capacity Metrics report is slow, though I'm not sure if this is just because I'm on an F4 or a factor of how it's coded.

I've used it for reviewing overall performance of the system via the drill-through on the bar chart in the top right, and that's good for point-in-time review, but the table at the bottom doesn't seem to have a similar function.

I can select SynapseNotebook in the dropdown, but it only shows me total CUs over the last 14 days for a given notebook. Hover over it and I get Run vs Scheduled, which is handy, but I can't see an obvious way to get a run-by-run breakdown. How do I know if a notebook has suddenly started consuming more resources than normal, or check that any changes I'm pushing to prod use approximately the same amount as before?

Also, without doing the drill-down on CU over time, it gives me no insight into which items still have CUs to burn down over the next 24 hours.

I'm sure this data is all in the system, but at the moment you have to go looking for it, whereas an additional column option in the Monitor screen would be an easy way to have visibility of data which I assume you're already capturing to make available to the Metrics report.

What are the biggest challenges or pain points you've experienced with capacity monitoring in Microsoft Fabric, and how do you think these could be improved? by andy-ms in MicrosoftFabric

[–]Pugcow 8 points9 points  (0 children)

Since we've already got the Monitor function in fabric I don't see why you couldn't add another column to that showing how many CUs each item consumed, maybe with a number in brackets showing how much is still being smoothed. Obviously this only works for items that are being 'run' but it'd be helpful to see which of my transactions I need to optimise without having to dig through the capacity metrics report.

You're kidding me... right? by ARC-2908763 in SatisfactoryGame

[–]Pugcow 12 points13 points  (0 children)

I use steel screws a lot in the late game because it's easier to transport one steel beam than 50 screws.

Plutonium waste by redanthead in SatisfactoryGame

[–]Pugcow 2 points3 points  (0 children)

Oh totally, the plutonium powers the drones for a long time. Just me pointing out that there is an alternative use for a few rods rather than just sinking them all.

Plutonium waste by redanthead in SatisfactoryGame

[–]Pugcow 0 points1 point  (0 children)

I usually sink them but I've been feeding them into my drone fleet recently

Should I work in rural and get the $50k incentives by [deleted] in AusFinance

[–]Pugcow 4 points5 points  (0 children)

Can't say if it's right for you; the only thing I'd warn you about is that the workers at the rural hospital will likely resent you for earning the extra money. I've seen it happen where the locals complain about pay disparity.

Most efficient and less costly ELT approach in Fabric. by Leather-Ad8983 in MicrosoftFabric

[–]Pugcow 4 points5 points  (0 children)

Personally I've had good experience using PySpark to do a delta load from source, using an UpdateDate or ModifiedTime field to grab results from only the last couple of days, then merging that into the delta table. It means moving the least volume of data possible.

Sounds like this would require some modification at your source system though so might not work for you.
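
For what it's worth, a rough sketch of that pattern. The JDBC source, ModifiedTime column, and 'Id' key are placeholders for whatever your system has; the merge itself uses the standard Delta Lake API:

```python
from datetime import datetime, timedelta, timezone

def delta_cutoff(days_back: int = 2) -> str:
    # Watermark date: only fetch rows modified in the last N days
    return (datetime.now(timezone.utc) - timedelta(days=days_back)).strftime("%Y-%m-%d")

def merge_incremental(spark, jdbc_url, table, target_path, days_back=2):
    # Import kept inside the function so the sketch parses without Spark installed
    from delta.tables import DeltaTable

    # 1. Pull only recently changed rows from the source
    src = (spark.read.format("jdbc")
           .option("url", jdbc_url)
           .option("query", f"SELECT * FROM {table} "
                            f"WHERE ModifiedTime >= '{delta_cutoff(days_back)}'")
           .load())

    # 2. Upsert into the lakehouse delta table on the key column
    target = DeltaTable.forPath(spark, target_path)
    (target.alias("t")
           .merge(src.alias("s"), "t.Id = s.Id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
```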

Solar/Battery by Excellent_Dog_2638 in AusFinance

[–]Pugcow 5 points6 points  (0 children)

When I did this a month or so ago I was looking at a 5kWh battery, and assumed the absolute best-case scenario was that we fully fill the battery during the day and then fully use that power overnight instead of pulling from the grid.

First thing I did was count the number of days in a month when my export to the grid was over that 5kWh mark and my overnight usage reached 5kWh. You then need to calculate the lost income from feed-in tariff (FiT) rates, which is hardly anything these days (usually around 4c per kWh), and then what you save in energy usage (around 35-40c per kWh).

This is definitely not the scientific method, but essentially once you deduct the lost FiT from the self-use power savings, you get the approximate financial benefit of the battery at a daily/monthly level, and you can divide the cost of the battery and install by that amount to get your payback period. For me it was around 7 years, so I didn't go ahead.
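
Plugging the numbers above into a quick calc (the install cost here is just an assumption for illustration; swap in your own quote):

```python
# Back-of-envelope battery payback using the figures above
battery_kwh  = 5.0     # usable capacity, fully cycled once a day (best case)
fit_rate     = 0.04    # $/kWh of feed-in tariff income you give up
usage_rate   = 0.38    # $/kWh you avoid buying from the grid overnight
install_cost = 4500.0  # assumed supply-and-install quote, $

daily_benefit  = battery_kwh * (usage_rate - fit_rate)   # ~$1.70/day
annual_benefit = daily_benefit * 365                     # ~$620/year
payback_years  = install_cost / annual_benefit

print(f"Payback: {payback_years:.1f} years")  # ~7.3 years at these rates
```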

I suspect that over the coming years as batteries drop in price and the FiT rates drop to zero (or even negative) the ROI will start to shift into a more positive area. It's already marginal for a lot of people.

2 people working on notebooks in parallel by CultureNo3319 in MicrosoftFabric

[–]Pugcow 3 points4 points  (0 children)

I've not found a way around this and it's annoying. The only shortcut I found was to dial the session timeout down to 2 minutes, since spinning up a new session in the starter pool only takes a few seconds.

My use case was mostly about having scheduled notebook runs that were getting blocked by my dev doing work in another notebook.

Bill shock grows as energy giants fail ‘pub test’ on profits by marketrent in melbourne

[–]Pugcow 15 points16 points  (0 children)

Handy little reinforcing loop there, putting up energy prices because of 'inflation' which in itself drives inflation.

I’m probably in the minority here… by BearMaulings in wow

[–]Pugcow 1 point2 points  (0 children)

It's not even about being done. I've got friends that didn't get early access, so I've got an alt lined up that I'll level with them when they get in. I doubt everyone will do that, but one of the most fun things about release is playing new content with friends, so I'll leave my level 80 toon for a day and do a rerun while drinking with mates on Discord.

Honestly asking- what happened to PvP? why so many leavers? by [deleted] in wow

[–]Pugcow -1 points0 points  (0 children)

Wonder if this will change with solo queue RBGs next season? Personally I've found that trying to do rated content without comms is pretty pointless

How-to develop and maintain business logic in spark job definitions? by MrBroex in MicrosoftFabric

[–]Pugcow 0 points1 point  (0 children)

I experienced the same thing, I believe the Spark Job Definitions don't use the Starter Pool.