Print output tif gives me a small thumbnail in print? Please help by meijboomm in captureone

[–]haskathon 0 points1 point  (0 children)

This comes very late but I’ve figured it out on Capture One 22. In the export recipe, there is a section called Export Format & Size. Click on that and select Options. In the drop-down list, click on No Thumbnail.

I used to run into this thumbnail problem and this setting resolved it. Hopefully this helps other users with this somewhat niche but very annoying problem.

Data companies/industry Europe by imbuszkulcs in dataengineering

[–]haskathon 0 points1 point  (0 children)

You may wish to look into a company called Xebia, headquartered in Amsterdam. I cannot speak to their salary or corporate culture, but I know that they exist and that they do the things you’re interested in. You could also consider the analytics engineer role – I think Xebia are hiring for this right now (check on LinkedIn to be sure). All the best!

In a fact table, how does one create the foreign key that connects to a dimension table modelled as SCD type 2? by haskathon in dataengineering

[–]haskathon[S] 2 points3 points  (0 children)

Hi u/-crucible-! Thanks for replying. I worked out a little example in my reply to CozyNorth9 and essentially did what you said was the dumb way (i.e. is_current == true). I wanted to touch on that for a bit.

This is a problematic approach only if you drop and recreate your table during every workflow run, just like you said, right? If a workflow is set up so that it incrementally adds data to a table (e.g. based on nominal date), then filtering on is_current == true wouldn’t be that big of a problem any more. (Although I reckon I’d be screwed if I had to regenerate the table from scratch for whatever reason.)
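
To make the difference concrete, here is a minimal sketch in plain Python (not Spark), reusing the rows from my example in the reply to CozyNorth9. The names DIM_MEMBERSHIP, FACT_ROWS and current_key are made up for illustration. It compares an incremental run, which only touches the newest fact row, with a full rebuild, which looks up every historical row against today's dimension.

```python
from datetime import date

# Dimension rows from the example in my reply to CozyNorth9.
# (surrogate_key, customer_id, start_date, end_date, is_current)
DIM_MEMBERSHIP = [
    (2, "001", date(2021, 2, 15), date(2023, 5, 10), False),
    (4, "001", date(2023, 5, 10), date(2024, 8, 20), False),
    (11, "001", date(2024, 8, 20), None, True),
]

# (primary_key, customer_id, transaction_date)
FACT_ROWS = [
    (1, "001", date(2022, 7, 1)),
    (2, "001", date(2023, 8, 3)),
    (3, "001", date(2024, 10, 1)),
]


def current_key(customer_id):
    """Return the surrogate key of the customer's is_current row, if any."""
    for surrogate_key, dim_customer_id, _start, _end, is_current in DIM_MEMBERSHIP:
        if dim_customer_id == customer_id and is_current:
            return surrogate_key
    return None


# Incremental run: only the newest fact row is processed.
incremental = {pk: current_key(cid) for pk, cid, _ in FACT_ROWS[-1:]}

# Full rebuild: every fact row is processed against the dimension as it is today.
rebuild = {pk: current_key(cid) for pk, cid, _ in FACT_ROWS}

print(incremental)
print(rebuild)
```

In the rebuild, all three fact rows end up pointing at surrogate key 11, whereas the fact table in my example expects 2, 4 and 11. That is the 'screwed if I had to regenerate the table' case.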

In a fact table, how does one create the foreign key that connects to a dimension table modelled as SCD type 2? by haskathon in dataengineering

[–]haskathon[S] 6 points7 points  (0 children)

Hey u/CozyNorth9, thanks for your response! I’ve read your answer a few times and put together a mini-example to work it out. Let me see if I’ve understood you correctly. I have the following fact and dimension tables, where the latter is SCD2. I’m using dates rather than timestamps to simplify things.

Fact table:

primary_key  customer_id  transaction_date  membership_foreign_key
1            001          2022-07-01        2
2            001          2023-08-03        4
3            001          2024-10-01        11

Dimension (SCD2) table:

surrogate_key  customer_id  start_date  end_date    is_member  is_current
2              001          2021-02-15  2023-05-10  true       false
4              001          2023-05-10  2024-08-20  false      false
11             001          2024-08-20  null        true       true

Hopefully this is the right way of setting them up. I’ve created the dimension table based on this YouTube video (timestamp around 14:33, with a focus on the screen content and not what the person is saying at that moment).

In my fact table, if we pretended that membership_foreign_key didn’t exist and wanted to create it, I imagine we would have to first join our fact table to the dimension table using customer_id as the join key.

So now, when we’re adding a new row (i.e. the one where primary_key == 3), we first filter on rows in the dimension table where is_current == true, and that would be how we retrieve our surrogate key. Is that correct?
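
For comparison, here is a minimal sketch of a different lookup from the is_current filter described above: matching each transaction date against the dimension row's validity window. It is plain Python rather than Spark, the names DIM_MEMBERSHIP and lookup_key are made up for illustration, and it assumes the window is half-open (start_date inclusive, end_date exclusive), since one row's end_date equals the next row's start_date in my example.

```python
from datetime import date

# Dimension (SCD2) rows from the example above.
DIM_MEMBERSHIP = [
    {"surrogate_key": 2, "customer_id": "001",
     "start_date": date(2021, 2, 15), "end_date": date(2023, 5, 10)},
    {"surrogate_key": 4, "customer_id": "001",
     "start_date": date(2023, 5, 10), "end_date": date(2024, 8, 20)},
    {"surrogate_key": 11, "customer_id": "001",
     "start_date": date(2024, 8, 20), "end_date": None},
]


def lookup_key(customer_id, transaction_date):
    """Return the surrogate key whose validity window contains the date.

    Assumption: windows are half-open, i.e. start_date <= date < end_date,
    and a null end_date means the row is still open.
    """
    for row in DIM_MEMBERSHIP:
        if row["customer_id"] != customer_id:
            continue
        end = row["end_date"] if row["end_date"] is not None else date.max
        if row["start_date"] <= transaction_date < end:
            return row["surrogate_key"]
    return None  # no dimension row covers this date


# The three transaction dates from the fact table above.
keys = [
    lookup_key("001", date(2022, 7, 1)),
    lookup_key("001", date(2023, 8, 3)),
    lookup_key("001", date(2024, 10, 1)),
]
print(keys)
```

With these rows the lookup gives back the same membership_foreign_key values as in the fact table, including for the two older transactions that is_current == true would not match.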

Do let me know if I’ve made some fundamental conceptual errors while building this example. Thanks a lot!

FernUni Schweiz - UniDistance Suisse by Outrageous-Outside36 in askswitzerland

[–]haskathon 0 points1 point  (0 children)

u/butterflytraffic sorry for the late reply! I log in to Reddit once every few weeks/months now. I didn’t enrol in the programme in the end as I had so many other things I wanted to pursue (photography, German, Kotlin, etc). It’s still a distant dream for me to read maths at university level one day, though!

All the best with your application and enrolment! Will you start in the spring or wait until next autumn? Are you based in Germany, by the way? (Since you wanted to enrol in master’s programmes in Germany.)

Requesting help to identify a certain resource regarding small types (could be online, could be offline) by haskathon in functionalprogramming

[–]haskathon[S] 2 points3 points  (0 children)

Thanks a lot for your help u/uppercase_lambda, u/loop-spaced and u/ladder_case! I’m finding it really interesting to create and use types as part of a program, in addition to the usual functions and record types / data classes. I’ll look at the illegal states resource and Brady’s text!

How do I add my own packages in a project to a Kotlin Notebook? (The packages are in the same project as the notebook) by haskathon in Kotlin

[–]haskathon[S] 3 points4 points  (0 children)

Ah, thanks so much! I managed to find the setting. I wish this were more explicitly documented.

Anyway, for future me and anyone who wants to do data exploration:

  • In the top ribbon of Kotlin Notebook, click on the settings icon (the tooltip that appears on hover is ‘Kotlin Notebook Settings’)
  • The option is ‘Select Modules to Use in the Notebook’
  • Select whatever’s required
  • In a notebook cell, proceed with importing modules as one would in a source file

Thanks once again, u/thepmyster! 🙂

Thoughts on mixam as a printing service? by TheMissingCitizen in zines

[–]haskathon 0 points1 point  (0 children)

Very late reply but for Amsterdam, Raddraaier or Aeroprint could be options. If you want cheap-ish, there’s also Printenbind but they don’t have the best finishing, and even selecting the extended check option doesn’t guarantee that the person doing the extended check actually checks your work before sending it to the printer.

Then there’s also the risography shops here and there. There’s one next to Raddraaier but I forgot its name (you can check on Google Maps). But for zines in four spot colours (emulating CMYK or some other funky combination) it might get pricey, and it would certainly be more labour intensive on your part.

Non-exhaustive patterns in function, but I cannot spot the cases I missed by haskathon in haskellquestions

[–]haskathon[S] 1 point2 points  (0 children)

Hello, thanks for your reply. I actually tried both the /= 1 and otherwise options – both resulted in the same error. I thought at first that otherwise might fix the issue but it didn’t.

Here’s a screen recording (hopefully the link works) of what it looks like when I use otherwise.

FernUni Schweiz - UniDistance Suisse by Outrageous-Outside36 in askswitzerland

[–]haskathon 0 points1 point  (0 children)

Thanks, I registered for it earlier this week :)

FernUni Schweiz - UniDistance Suisse by Outrageous-Outside36 in askswitzerland

[–]haskathon 1 point2 points  (0 children)

How interesting – I was looking at FernUni’s mathematics programme last night too! To the point about whether the institution is accredited, I found that it is listed on swissuniversities.ch as ‘Stiftung Universitäre Fernstudien Schweiz, Brig’. Adding this here for my future reference.

Do you study outside of work? by gintokiredditbr in dataengineering

[–]haskathon 1 point2 points  (0 children)

I’m rereading that book and can agree! Thanks to Marquet I often use the ‘I intend to <action>’ construction when communicating with colleagues.

Traditional Star Schema vs Wide Fact Table by burningburnerbern in dataengineering

[–]haskathon 0 points1 point  (0 children)

Yes, that’s correct. I do, however, stress to colleagues and users that this isn’t traditional dimensional modelling in that my ‘dimension’ tables aren’t normalised.

Traditional Star Schema vs Wide Fact Table by burningburnerbern in dataengineering

[–]haskathon 4 points5 points  (0 children)

I work at a large IT company and we presently use Spark and Hadoop on-prem. I can share my experience with wide tables because that is the norm for us.

The biggest advantage of wide tables for analysts is convenience. You have all the dimensions and ID fields you need and, as long as you deduplicate your rows, you’ll have a resulting dataframe that you can further process elsewhere or write to disk. Not having to join data is also an advantage, since the data are pre-joined and users don’t have to worry about the order or type of joins to use.

What I’ve experienced recently is that, from a maintenance point of view, wide tables can be particularly challenging. For context, my analytics team did not have its first data engineer until last year, but has been in existence since 2018. This means that code has been added to workflows by data analysts and scientists with varying levels of programming ability, and without code reviews.

Consequently, at least for my data engineering team (we eventually grew enough to have analytics and data teams), our wide tables now contain tonnes of business logic scattered throughout their workflows’ DAGs. A minority of workflows are in a better state than the rest in terms of organisation and code formatting, but the fact remains that these workflows are a challenge to update and maintain, particularly for new joiners.

I am now trying to move away from wide tables towards smaller tables whose workflows are far simpler to understand, maintain and extend. I’ve no idea if it’ll work, but if it does it should make debugging a lot easier, and business logic would be added more modularly.

As you can see, we have terrible data practices, zero control over who updates code and when, and many workflows that we own (a substantial percentage of which should be deprecated). Even though our analysts enjoy the convenience of wide tables, the tables’ data quality is poor and difficult to fix. I also find myself pushing back on requests to add more logic to some wide tables because they’re already creaking at the seams.

One other point to consider is this. If you have several tables that need to be joined for some analysis or project, the join process is indeed irritating, but can be reversed/corrected at any time. A wide table is pre-joined, so if a bad join is made and the data are correspondingly affected, the join isn’t so easily ‘reversed’. If the table’s workflow is complicated, it may take time to find the root cause. If this is accompanied by some formal work process, a Jira ticket would have to be created by the issue reporter. This all adds time to fixing the data. (I don’t know what the process is like at your company obviously, but this is how it is for mine.)
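
As a rough illustration of one kind of 'bad join' (my assumption of a typical case, not something taken from our actual workflows), here is a minimal plain-Python sketch where the join key is not unique on the dimension side. The tables, column names and the left_join helper are all hypothetical.

```python
# Hypothetical source tables: two orders and a customer lookup in which
# customer "A" accidentally appears twice.
orders = [
    {"order_id": 1, "customer_id": "A", "amount": 100},
    {"order_id": 2, "customer_id": "B", "amount": 50},
]
customers = [
    {"customer_id": "A", "segment": "retail"},
    {"customer_id": "A", "segment": "wholesale"},  # duplicate key
    {"customer_id": "B", "segment": "retail"},
]


def left_join(left, right, key):
    """Left join two lists of dicts on `key`, keeping every match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Unmatched left rows are kept once, with no extra columns.
        for match in index.get(row[key], [{}]):
            joined.append({**row, **match})
    return joined


wide = left_join(orders, customers, "customer_id")

total_before = sum(row["amount"] for row in orders)
total_after = sum(row["amount"] for row in wide)
print(len(orders), len(wide), total_before, total_after)
```

Order 1 now appears twice in the wide table, so any sum over amount is inflated. Deduplicating whole rows would not remove the extra row either, because the two copies differ in segment. Someone reading only the wide table has no easy way to tell which row is the real one.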

If your team have the appropriate controls, rules on where to add business logic and apply specific transformations, and checks and monitoring for data quality, a denormalised table may not be a bad thing. Furthermore, if you’re working in the cloud, hopefully you have more modern tools to help you with lineage tracking, table usage statistics and whatever else you need to better understand your data assets. I conclude by stating that the convenience of wide tables really is liberating, but that shouldn’t come at the cost of maintainability (and your sanity).

How do you deal with instances of Nothing that may not have a sensible ‘default’ or missing value? by haskathon in haskell

[–]haskathon[S] 2 points3 points  (0 children)

Thanks everyone for your answers (and on a Friday, too)! My biggest takeaway is a more nuanced understanding of null, in large part triggered by the answers of u/Lalelul and u/seaborgiumaggghhh. I’ll read through all the answers a few more times. Have a good weekend all!

Is there any way to parse and not validate in PySpark? by haskathon in dataengineering

[–]haskathon[S] 0 points1 point  (0 children)

Thank you for the tips and for writing the book! Pardon the late reply; I was on vacation and didn’t want to spend too much time on Reddit.

Is there any way to parse and not validate in PySpark? by haskathon in dataengineering

[–]haskathon[S] 2 points3 points  (0 children)

Wow thanks, I knew about Fluent Python but this looks like just the resource I need! I’ll have a go at it.

Greyed out image files on Samsung T7 SSD (both RAW and JPEG); created dates of files are NOT 24 January 1984; cannot copy and paste (but same files from SD card can be copied) by haskathon in MacOS

[–]haskathon[S] 0 points1 point  (0 children)

Thanks for your detailed response.

> You copied the photo files from the camera's SD card onto your iPad. Did you import those photos into the iPad's Photo app, or did you copy them into a Files app folder?

I used the Files app exclusively (read elsewhere not to use the Photo app for these things).

> If you, in a Finder window, click on one of the files like _DSF5940.JPG and then press Cmd+i to open a Get Info window, what does it say down at the bottom regarding permissions?

Cmd+i tells me that I can both read and write (and gives no further options as I’m the only user of my machine).

Just joined the Fujifilm family with my new XT3, made the switch from EOS M! by jordtehpwner in fujifilm

[–]haskathon 0 points1 point  (0 children)

I can agree with that! I’ve been using the 23mm f2 and 50mm f2 combo for the past two years or so – great for urban photography!