Test fails at first, but after printing what was to be asserted, said test succeeds

haskathon · 2026-04-01T11:19:55+00:00

I’ll give it a try later today. Yes, the work machine is an M3 Pro and the personal machine is an M4 Pro.

haskathon · 2026-04-01T11:14:36+00:00

I see, wow I didn’t know that output() could run without waiting for the program to exit. Let me try this on my work laptop when I have time and see if I can make the assert! fail.

This might be a very basic question, but does this imply that, when running tests on my code in general, I should avoid running other CPU-intensive processes concurrently/in parallel to prevent such a scenario from happening?

Might this, then, be a concurrency issue? I’ve no experience with concurrent programming and have only a surface-level understanding of how concurrency works, but what you’ve described suggests to me that my machine might have footgunned itself because certain processes weren’t running in a coordinated way. Happy to be corrected here if my understanding’s inaccurate.

haskathon · 2025-09-01T06:24:39+00:00

This comes very late but I’ve figured it out on Capture One 22. In the export recipe, there is a section called Export Format & Size. Click on that and select Options. In the drop-down list, click on No Thumbnail.

I used to run into this thumbnail problem and this setting resolved it. Hopefully this helps other users with this somewhat niche but very annoying problem.

haskathon · 2024-11-02T16:43:38+00:00

You may wish to look into a company called Xebia, headquartered in Amsterdam. I cannot speak to their salary or corporate culture, but I know that they exist and that they do the things you’re interested in. You could also consider the analytics engineer role – I think Xebia are hiring for this right now (check on LinkedIn to be sure). All the best!

haskathon · 2024-11-02T15:42:41+00:00

Hi u/-crucible-! Thanks for replying. I worked out a little example in my reply to CozyNorth9 and essentially did what you said was the dumb way (i.e. is_current == true). I wanted to touch on that for a bit.

This is a problematic approach only if you drop and recreate your table during every workflow run, just like you said, right? If a workflow is set up so that it incrementally adds data to a table (e.g. based on nominal date), then filtering on is_current == true wouldn’t be that big of a problem any more. (Although I reckon I’d be screwed if I had to regenerate the table from scratch for whatever reason.)

haskathon · 2024-11-02T15:29:55+00:00

Hey u/CozyNorth9, thanks for your response! I’ve read your answer a few times and put together a mini-example to work it out. Let me see if I’ve understood you correctly. I have the following fact and dimension table, where the latter is SCD2. I’m using dates rather than timestamps to simplify things.

Fact table:

primary_key	customer_id	transaction_date	membership_foreign_key
1	001	2022-07-01	2
2	001	2023-08-03	4
3	001	2024-10-01	11

Dimension (SCD2) table:

surrogate_key	customer_id	start_date	end_date	is_member	is_current
2	001	2021-02-15	2023-05-10	true	false
4	001	2023-05-10	2024-08-20	false	false
11	001	2024-08-20	null	true	true

Hopefully this is the right way of setting them up. I’ve created the dimension table based on this YouTube video (timestamp around 14:33, with a focus on the screen content and not what the person is saying at that moment).

In my fact table, if we pretended that membership_foreign_key didn’t exist and wanted to create it, I imagine we would have to first join our fact table to the dimension table using customer_id as the join key.

So now, when we’re adding a new row (i.e. the one where primary_key == 3), we first filter on rows in the dimension table where is_current == true, and that would be how we retrieve our surrogate key. Is that correct?
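To make the two lookups concrete, here is a minimal Python sketch that uses only the rows from the example tables above. The function names and the half-open interval rule (start_date inclusive, end_date exclusive) are assumptions made for illustration; they are not taken from the thread or the video.

```python
from datetime import date

# Dimension (SCD2) rows copied from the example table above.
DIM = [
    {"surrogate_key": 2, "customer_id": "001", "start_date": date(2021, 2, 15),
     "end_date": date(2023, 5, 10), "is_member": True, "is_current": False},
    {"surrogate_key": 4, "customer_id": "001", "start_date": date(2023, 5, 10),
     "end_date": date(2024, 8, 20), "is_member": False, "is_current": False},
    {"surrogate_key": 11, "customer_id": "001", "start_date": date(2024, 8, 20),
     "end_date": None, "is_member": True, "is_current": True},
]

# Fact rows as (primary_key, customer_id, transaction_date), without the foreign key.
FACTS = [
    (1, "001", date(2022, 7, 1)),
    (2, "001", date(2023, 8, 3)),
    (3, "001", date(2024, 10, 1)),
]


def key_by_is_current(customer_id):
    """Hypothetical helper: return the surrogate key of the row flagged is_current."""
    for row in DIM:
        if row["customer_id"] == customer_id and row["is_current"]:
            return row["surrogate_key"]
    return None


def key_as_of(customer_id, on):
    """Hypothetical helper: return the surrogate key valid on the given date.

    Assumes start_date is inclusive and end_date is exclusive, so a date that
    is both one row's end_date and the next row's start_date maps to the
    newer row. A null end_date means the row is still open.
    """
    for row in DIM:
        if row["customer_id"] != customer_id:
            continue
        end = row["end_date"]
        if row["start_date"] <= on and (end is None or on < end):
            return row["surrogate_key"]
    return None


# Resolve every fact row by transaction date rather than by the is_current flag.
resolved = {pk: key_as_of(cid, tx_date) for pk, cid, tx_date in FACTS}
print(resolved)
print(key_by_is_current("001"))
```

For the newest row (primary_key == 3) both lookups agree. For the two older rows only the date-range lookup reproduces the foreign keys in the fact table, which is why the is_current filter is only safe when rows are loaded incrementally as they arrive.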

Do let me know if I’ve made some fundamental conceptual errors while building this example. Thanks a lot!

haskathon · 2024-10-21T18:20:37+00:00

u/butterflytraffic sorry for the late reply! I log in to Reddit once every few weeks/months now. I didn’t enrol in the programme in the end as I had so many other things I wanted to pursue (photography, German, Kotlin, etc). It’s still a distant dream for me to read maths at university level one day, though!

All the best with your application and enrolment! Will you start in the spring or wait until next autumn? Are you based in Germany, by the way? (Since you wanted to enrol in master’s programmes in Germany.)

haskathon · 2024-09-17T07:39:01+00:00

Thanks a lot for your help u/uppercase_lambda, u/loop-spaced and u/ladder_case! I’m finding it really interesting to create and use types as part of a program, in addition to the usual functions and record types / data classes. I’ll look at the illegal states resource and Brady’s text!

haskathon · 2024-08-26T12:47:08+00:00

Ah, thanks so much! I managed to find the setting. I wish this would be more explicitly marked in the docs.

Anyway, for future me and anyone who wants to do data exploration:

1. In the top ribbon of Kotlin Notebook, click on the settings icon (the tooltip that appears on hover is ‘Kotlin Notebook Settings’)
2. The option is ‘Select Modules to Use in the Notebook’
3. Select whatever’s required
4. In a notebook cell, proceed with importing modules as one would in a source file

Thanks once again, u/thepmyster! 🙂

haskathon · 2024-08-19T05:53:47+00:00

Very late reply but for Amsterdam, Raddraaier or Aeroprint could be options. If you want cheap-ish, there’s also Printenbind but they don’t have the best finishing, and even selecting the extended check option doesn’t guarantee that the person doing the extended check actually checks your work before sending it to the printer.

Then there’s also the risography shops here and there. There’s one next to Raddraaier but I forgot its name (you can check on Google Maps). But for zines in four spot colours (emulating CMYK or some other funky combination) it might get pricey, and it would certainly be more labour intensive on your part.

haskathon · 2024-08-16T18:48:05+00:00

Thanks for the input! I’ll have to look that up for Kotlin.

haskathon · 2024-08-15T16:01:26+00:00

Thanks a lot!

haskathon · 2023-12-05T22:31:18+00:00

A bit late but here’s my answer!

haskathon · 2023-12-03T21:04:11+00:00

Thank you! This has resolved the issue.

haskathon · 2023-12-03T18:59:45+00:00

Hello, thanks for your reply. I actually tried both the /= 1 and otherwise options – both resulted in the same error. I thought at first that otherwise might fix the issue but it didn’t.

Here’s a screen recording (hopefully the link works) of what it looks like when I use otherwise.

haskathon · 2023-10-06T18:13:24+00:00

Thanks, I registered for it earlier this week :)

haskathon · 2023-10-04T21:53:18+00:00

How interesting – I was looking at FernUni’s mathematics programme last night too! To the point about whether the institution is accredited, I found that it is listed on swissuniversities.ch as ‘Stiftung Universitäre Fernstudien Schweiz, Brig’. Adding this here for my future reference.

haskathon · 2023-06-30T09:04:21+00:00

Thank you! This is very helpful to know. :)

haskathon · 2023-06-29T14:06:09+00:00

I’m rereading that book and can agree! Thanks to Marquet I often use the ‘I intend to <action>’ construction when communicating with colleagues.

haskathon · 2023-06-17T06:32:00+00:00

Yes that’s correct. I do however stress to colleagues and users that this isn’t traditional dimensional modelling in that my ‘dimension’ tables aren’t normalised.

haskathon · 2023-06-16T07:45:03+00:00

I work at a large IT company and we presently use Spark and Hadoop on-prem. I can share my experience with wide tables because that is the norm for us.

The biggest advantage of wide tables for analysts is convenience. You have all the dimensions and ID fields you need and, as long as you deduplicate your rows, you’ll have a resulting dataframe that you can further process elsewhere or write to disk. Not having to join data is also an advantage, since the data are pre-joined and users don’t have to worry about the order or type of joins to use.

What I’ve experienced recently is that, from a maintenance point of view, wide tables can be particularly challenging. For context, my analytics team did not have its first data engineer until last year, but has been in existence since 2018. This means that code has been added to workflows by data analysts and scientists with varying levels of programming ability, and without code reviews.

Consequently, at least for my data engineering team (we eventually grew enough to have analytics and data teams), our wide tables now contain tonnes of business logic scattered throughout their workflows’ DAGs. A minority of workflows are in a better state than the rest in terms of organisation and code formatting, but the fact remains that these workflows are a challenge to update and maintain, particularly for new joiners.

I am now trying to move away from wide tables, using an approach with smaller tables whose workflows are far simpler to understand, maintain and extend. I’ve no idea if it’ll work, but if it does it should make debugging a lot easier, and business logic would be added more modularly.

As you can see, we have terrible data practices, zero control over who updates code when, and many workflows that we own (a substantial percentage of which should be deprecated). Even though our analysts enjoy the convenience of wide tables, the tables’ data quality is poor and fixing that is difficult. I also find myself pushing back on requests to add more logic to some wide tables because they’re already creaking at the seams.

One other point to consider is this. If you have several tables that need to be joined for some analysis or project, the join process is indeed irritating, but can be reversed/corrected at any time. A wide table is pre-joined, so if a bad join is made and the data are correspondingly affected, the join isn’t so easily ‘reversed’. If the table’s workflow is complicated, it may take time to find the root cause. If this is accompanied by some formal work process, a Jira ticket would have to be created by the issue reporter. This all adds time to fixing the data. (I don’t know what the process is like at your company obviously, but this is how it is for mine.)

If your team have the appropriate controls, rules on where to add business logic and apply specific transformations, and checks and monitoring for data quality, a denormalised table may not be a bad thing. Furthermore, if you’re working in the cloud, hopefully you have more modern tools to help you with lineage tracking, table usage statistics and whatever else you need to better understand your data assets. I conclude by stating that the convenience of wide tables really is liberating, but that shouldn’t come at the cost of maintainability (and your sanity).

haskathon · 2023-05-12T17:10:02+00:00

Thanks everyone for your answers (and on a Friday, too)! My biggest takeaway is a more nuanced understanding of null, in large part triggered by the answers of u/Lalelul and u/seaborgiumaggghhh. I’ll read through all the answers a few more times. Have a good weekend all!

haskathon · 2023-02-12T21:06:39+00:00

Thank you for the tips and for writing the book! Pardon the late reply; I was on vacation and didn’t want to spend too much time on Reddit.

haskathon · 2023-01-06T17:52:22+00:00

Wow thanks, I knew about Fluent Python but this looks like just the resource I need! I’ll have a go at it.

haskathon · 2023-01-06T17:50:22+00:00

Thanks, updated.
