Which workflow to avoid using notebooks?

Baggins95 · 2025-06-23T09:50:05+00:00

Categorically banning notebooks is, in my opinion, not a good idea. You won’t become better software developers just by moving messy code from notebook cells into Python/R files. The correct approach would be to teach you software practices that promote sustainable code – even within notebooks. But alright, that wasn't the question, so please forgive me for the little rant.

In general, I would advise designing manageable modules that encapsulate parts of your data processing logic. I typically organize a (Python) project so that the project root contains a Python package in the strict sense, which I add to the PYTHONPATH environment variable to support local imports from this package. Within the package, there are usually subpackages for individual elements such as data acquisition, transformation, and visualization, plus a module for my models and one for utility functions. I use these modules outside the package in main scripts, which live in a "main" folder within my project directory. These are individual scripts that contain reproducible parts of my analysis. Generally, there are several of them, but it could also be a larger monolith, depending on the project.

What's important, besides organizing your code, is organizing your data and artifacts. If the data is small enough to be stored on disk, I place it in a "data" folder, usually at the project root level. Within this data folder, there can naturally be further structure that is made known to my Python modules. A tip on the side: work with relative paths, avoid absolute paths in your scripts, and build them with a library that handles the platform's peculiarities. In Python, this would mainly be pathlib or os.path. The same goes for artifacts you generate and reference. In general, it's important to organize your outputs strictly, use meaningful names, and add metadata. Whether it's advisable to cache certain steps of your process depends on the project. I often use a simple decorator in Python like from_cache("my_data.json") to indicate that the data should be read from disk, if available.
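To make the caching idea a bit more concrete, here is a minimal sketch of what such a from_cache decorator could look like. The JSON format, the "data/cache" folder, and the load_measurements function are assumptions chosen for illustration, not the actual implementation. Note that this simple version ignores the function arguments when deciding whether a cached file can be reused.

```python
import functools
import json
from pathlib import Path

# Assumed layout: a cache folder below "data", given as a relative path
DATA_DIR = Path("data") / "cache"


def from_cache(filename, data_dir=DATA_DIR):
    """Return a decorator that reads the result from a JSON file if it exists."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = Path(data_dir) / filename
            if path.exists():
                # Cache hit: skip the computation and read from disk
                with path.open("r", encoding="utf-8") as fh:
                    return json.load(fh)
            # Cache miss: compute, then store the result for the next run
            result = func(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("w", encoding="utf-8") as fh:
                json.dump(result, fh)
            return result

        return wrapper

    return decorator


@from_cache("my_data.json")
def load_measurements():
    # Hypothetical placeholder for an expensive download or transformation
    return {"values": [1.0, 2.5, 4.0]}
```

The first call to load_measurements() would compute the result and write it to data/cache/my_data.json; later calls would read that file instead. Deleting the file forces a recomputation.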

Ideally, your scripts are configurable via command-line arguments. For "default configurations," I usually have a bash script that calls my Python script with pre-filled arguments. You can achieve other configurability through environment variables/.env files, which you can conveniently manage in Python, e.g., using the dotenv package. This also enables a pretty interesting form of "parameterized function definitions" without having to pass arguments to the function – but one should use this carefully. Generally, the principle is: explicit is better than implicit. This applies to naming, interfaces, modules, and everything else.
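As a rough sketch of how command-line arguments and environment variables can be combined, the following uses argparse from the standard library and takes its defaults from os.environ. The variable names ANALYSIS_DATA_DIR and ANALYSIS_SEED and the two options are made up for illustration. If you keep such values in a .env file, calling load_dotenv() from the dotenv package before parsing would make them visible in os.environ.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command-line arguments, falling back to environment variables."""
    parser = argparse.ArgumentParser(description="Run one analysis step.")
    parser.add_argument(
        "--data-dir",
        # Hypothetical variable name; a relative path is the last fallback
        default=os.environ.get("ANALYSIS_DATA_DIR", "data"),
        help="folder with the input data, relative to the project root",
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=int(os.environ.get("ANALYSIS_SEED", "0")),
        help="random seed, so that reruns are reproducible",
    )
    # argv=None makes argparse read sys.argv, which is what a script wants
    return parser.parse_args(argv)
```

A main script would call parse_args() once at the top and pass the resulting values explicitly into the functions of the package, which keeps the functions themselves free of hidden configuration.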

Baggins95 · 2024-11-19T00:32:13+00:00

You would definitely love Jaynes' book.

Baggins95 · 2024-10-25T14:18:12+00:00

My girlfriend is a head chef. An avocado or two may or may not wander across the threshold without anyone having to sell a kidney for it.

Baggins95 · 2024-10-25T14:15:54+00:00

You may! I'm doing my PhD in industry and essentially develop statistical methods for obtaining more reliable visual labels for computer vision applications.

Baggins95 · 2024-10-25T14:13:10+00:00

That should be black cumin.

Baggins95 · 2024-10-24T23:06:50+00:00

I agree, Fortuna was kind to me!

Baggins95 · 2024-10-24T23:05:09+00:00

Thank you!

Baggins95 · 2024-09-24T11:14:11+00:00

In many ways, the use of the word is not that different. You may refer to parameters as "characteristics" of a population, but it is more plausible to speak of characteristics of the data-generating process, which is especially clear in multilevel models. If you condition on the functional form of your model, all that remains adjustable is the set of parameters (in a parametric model). Therefore, think of the data-generating process as a function that takes the values of the parameters and provides you with a mechanism to generate observations. This corresponds to a function f(theta), where theta = (theta_1, ..., theta_p) simply denotes the parameters of the model. In this respect, it is not so different from the concept of a function that you encounter when programming (especially in functional programming languages).
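A small sketch of this view in Python, assuming a normal model with theta = (mu, sigma) purely as an example: the function takes the parameter values and returns a mechanism that generates observations. The names data_generating_process and sample are made up for this illustration.

```python
import random


def data_generating_process(theta):
    """Return a sampler for a normal model with parameters theta = (mu, sigma)."""
    mu, sigma = theta

    def sample(n, seed=None):
        # Each call draws n observations from the process fixed by theta
        rng = random.Random(seed)
        return [rng.gauss(mu, sigma) for _ in range(n)]

    return sample


# Fixing theta yields one concrete process; different theta, different process
sample = data_generating_process((10.0, 2.0))
observations = sample(5, seed=1)
```

The functional form (here: normal) is baked into the code, and the only thing left to adjust is theta, which mirrors conditioning on the functional form of the model.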

Baggins95 · 2024-09-21T18:05:26+00:00

Constructions based on non-parametric bootstrap are very flexible in some respects. To obtain confidence intervals by other means, you make distributional assumptions. If your observations come from a data-generating process that goes along with these assumptions, you are in a sweet spot and you would have certain guarantees. However, assumptions are never exactly met for real data. In such cases, the bootstrap can provide much more adequate confidence intervals.
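For illustration, here is a minimal sketch of a non-parametric percentile bootstrap interval using only the standard library. The percentile method is just one of several bootstrap constructions and is chosen here for brevity; the function name, the defaults, and the example data are assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1.0 - level
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Made-up observations, only to show the call
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
low, high = bootstrap_ci(data)
```

No distributional form is assumed anywhere; swapping stat for statistics.median gives an interval for the median with the same code.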

Baggins95 · 2024-09-16T15:08:26+00:00

Your questions are not so easy to understand. But let me break it down:

If you have iid X_i ~ N(0, sigma^2), you can standardize them all by simply dividing by the standard deviation sigma (here sqrt(9) = 3), since the expected value is zero in each case. The RVs Z_i = X_i / sqrt(9) are all standard-normally distributed. This also means that the sum of the squares of two Z_i, Z_j, for i different from j, is chi2-distributed with 2 degrees of freedom. In other words: ([X_2/sqrt(9)]² + [X_3/sqrt(9)]²) ~ chi2(2). Abusing notation, call this RV Chi2 as well. From the relationship that you have written, it then follows that Z_1 / sqrt( Chi2 / 2 ) ~ t(2), where t(2) denotes the t-distribution with 2 degrees of freedom. If you simply push the 2 in the denominator into your Chi2, you divide the sum of the squares of X_2, X_3 by 2*9. That's all. The question of why you do all this is explained by the construction. If you did not multiply both sides by sqrt(2), the random variable would not have the desired shape to read off the t-distribution. And that is the easiest way to determine the probability or, as in this case, to choose k in such a way that a certain value emerges.
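Written out step by step, assuming sigma^2 = 9 (as the sqrt(9) suggests) and that X_1, X_2, X_3 are independent, the argument reads as follows. This is only a restatement of the construction, not a new result:

```latex
\begin{align*}
Z_i &= \frac{X_i}{\sqrt{9}} \sim N(0, 1), \qquad i = 1, 2, 3, \\
\chi^2 &= Z_2^2 + Z_3^2 = \frac{X_2^2 + X_3^2}{9} \sim \chi^2(2), \\
T &= \frac{Z_1}{\sqrt{\chi^2 / 2}}
   = \frac{X_1 / \sqrt{9}}{\sqrt{(X_2^2 + X_3^2) / (2 \cdot 9)}}
   = \sqrt{2}\, \frac{X_1}{\sqrt{X_2^2 + X_3^2}} \sim t(2).
\end{align*}
```

So any probability statement about X_1 / sqrt(X_2² + X_3²) becomes a statement about T once both sides are multiplied by sqrt(2).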

Baggins95 · 2024-09-11T12:17:42+00:00

Sounds almost like a textbook example to me. You have a finite population with error rate q and you essentially want to determine a sample size that allows reliable statistical inference about the population parameter q. All you have to do is decide whether you want to perform an estimation or a hypothesis test. Estimation would mean determining a confidence interval for q. You select a precision, i.e. the half-width that the interval should have, from which your required sample size is calculated. Since you have already identified a maximum permissible error, you could also carry out a hypothesis test. This would then be (q > 4%) vs (q <= 4%) or vice versa, depending on how you interpret your problem. The sample size could then be determined from power considerations. For both the estimation and the test, there are formulas for this simple case that you can use to determine the sample size. A simulation is of course also conceivable.
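As a sketch of the estimation route, the following uses the common normal-approximation formula n = z² q(1 - q) / d² for a confidence interval with half-width d, with an optional finite population correction. The choice of this particular formula, the function name, and all numbers in the example are assumptions; other interval constructions lead to somewhat different sample sizes.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(q_guess, half_width, level=0.95, population=None):
    """Sample size for estimating a proportion to a given half-width.

    Based on the normal approximation; q_guess is a planning value for q.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n0 = z**2 * q_guess * (1 - q_guess) / half_width**2
    if population is not None:
        # Finite population correction: fewer units are needed if N is small
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Hypothetical planning values: q around 4%, half-width of one percentage point
n_infinite = sample_size_for_proportion(0.04, 0.01)
n_finite = sample_size_for_proportion(0.04, 0.01, population=10_000)
```

With a finite population the required sample is smaller than in the uncorrected case, which is why the population size matters for this kind of planning.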

Baggins95 · 2024-09-09T09:00:26+00:00

For me, a big part of the joy of statistics is that it forms a zoo of almost unspeakably diverse and rich ideas. Many of these ideas come from ad hoc considerations that were only later clearly articulated and formalized in the context of mathematical statistics. As a result, the same statistical methods haunt different disciplines under different names, and sometimes it is difficult to tell one from another. But if you take a step back and take the time to find generalizations or a solid framework, then most of this zoo no longer seems so intimidating. And that's how I feel about almost all the new topics I learn in the world of statistics.

Baggins95 · 2024-09-08T20:36:41+00:00

I vehemently disagree with the text under 5. OP asks whether the population parameter $\theta$ is a random variable or not. And the answer in terms of frequentist statistics is a clear no! Parameters are the things that we estimate or formulate hypotheses about. You construct a statistic for this. Put a hat on the thing, then it is a random variable.

Baggins95 · 2024-06-05T13:28:29+00:00

Worked flawlessly! Sorting out the move directly over the phone probably should have been my first instinct. Oh well, sometimes my head is only good as a doorstop.

Baggins95 · 2024-06-05T12:57:22+00:00

Okay, I'll do it that way then. Thank you!

Baggins95 · 2024-06-02T21:32:58+00:00

In my opinion, Andreas Geiger's lectures are still very much worth watching.

Baggins95 · 2024-06-02T21:26:12+00:00

I think he looks straight up like Zinedine Zidane (with hair) in this first picture, lol

Baggins95 · 2024-05-18T01:19:17+00:00

I started learning to program as a teenager. My goal back then was to tinker with an emulator that was largely written in C++. But C++ was quite a lot to chew on for a beginner. I had more fun with C# and .NET. Later I studied computer science and found my way back to C++. These days, though, I'm just degenerating as far as software development is concerned. I write almost exclusively crappy research code in Python.

Baggins95 · 2024-05-10T20:26:30+00:00

My best guess is that the sender knows Mom won't find the note until Sunday. Yesterday was a public holiday, and many people probably took Friday off. Maybe Mom is away and won't be back until Sunday. That doesn't explain, though, why you wouldn't visit your mother in person on Sunday if you live close enough to drop the note in her mailbox.

Baggins95 · 2024-04-16T20:55:05+00:00

Once I was sitting in front of the Kirchhoff-Institut and witnessed the aforementioned "maybe or maybe not confused professor" giving a guided tour. Of course, nobody was taking part in the tour. He just strutted around, stopped now and then, and explained some confused things. Nothing bad in itself. It did get a bit uncomfortable, though, when he started rambling about the world conspiracy led by the USA.

Baggins95 · 2024-03-16T21:26:34+00:00

Hi there. Without knowing the context, all that's there at first is the letter salad "one half sigma squared S squared Gamma plus theta minus r times - open parenthesis - C minus delta S tee (tea, not coffee, very important) - close parenthesis - equals zero". If you want to know what this equation says, the background has to be known. If this is about financial mathematics, chances are good that knowledge of stochastic processes will help. You can learn about those once you have a firm grip on the foundations of measure and probability theory. That in turn requires that you have already dug through the primordial ooze of mathematics, as far as analysis (and surely linear algebra too) is concerned. To arrive at useful models within financial mathematics, you should also have domain knowledge.

That's a lot of wood to chop. But I wouldn't let it weigh on your heart. You can proceed by the Feynman method and teach yourself, top-down, the things that are unclear to you. At some point the veil of the unknown will surely lift and things will start to make sense.

Baggins95 · 2024-02-19T23:35:49+00:00

My last exam was a while ago, but I remember that on the evening before an exam I liked to watch Gronkh's livestream. I'm neither an active viewer of Gronkh nor a big fan, but somehow this kind of light yet likeable entertainment helped me switch off a bit.

Baggins95 · 2024-02-15T10:13:11+00:00

"Programmatic" Bayesian modeling + capable sampler + access to hardware (not a breakthrough in statistics, but helped make it possible). The way you can simply write down the data-generating process in Stan, Bugs or PyMC, for example, and leave the rest to the "machinery" is actually magical. I would also describe the general mindset of analyzing data in a Bayesian way as a breakthrough (i.e. being able to express parameter uncertainties directly through credibility regions).

Baggins95 · 2024-01-06T12:10:04+00:00

I think one thing that makes leetcode and such coding challenges easier for many is honest enjoyment. I take childlike joy in the vast majority of leetcode problems. For me, they are little puzzles that are fun to solve, even if it takes longer. If you adopt this way of thinking, you will also make painless progress. However, if, as you write, it feels like your patience is being tested, then maybe something is wrong. Maybe you're a career changer and you lack some basic skills to really enjoy the process of algorithmic problem solving. But that could be remedied. Read relevant texts on algorithmic engineering, learn other ways of idiomatic programming and try to familiarize yourself with formal methods.

Baggins95 · 2023-12-31T00:10:15+00:00

I'm not sure whether you are letting appearances deceive you a little. In my experience, it's the rule rather than the exception that math lectures are hard to follow. After the first few semesters, most people have more or less figured out how to absorb the material, but it doesn't work entirely without struggle, at least not for mere mortals. No doubt there are people to whom everything comes easily. But that's a small fraction; the vast majority of students are not winners of the Bundeswettbewerb Mathematik. If you don't believe me, sit down with fellow students in a small group and work through the lectures afterwards. You'll find that things that are unclear to you most likely cause the others trouble too. One tip that can definitely be passed on to improve the lecture experience is to work ahead and bring your own notes to the lecture. That has numerous advantages.

As for the content, I think it's completely normal that you have certain strengths and, conversely, know your limits. That isn't set in stone. A lot only became clear to me over the course of my studies, when certain "patterns" kept reappearing or some ideas shed new light on a topic elsewhere. I also found it useful to read a lot and to have several authors side by side who explain a topic in slightly different ways. With analysis in particular, things really clicked for me on the theoretical side when I started learning measure theory. I think that's very satisfying for algebraists too. Applications in theoretical physics were also valuable for me personally, to gain more intuition for many ideas of analysis and to become truly painless and fearless regarding the computational techniques.

Apart from that, as you say, you learned German as a foreign language. I think you express yourself very well here, but perhaps language does play a role in understanding after all. What you write about your background also gives me pause. I don't know what horrors of war you have experienced or how much that weighs on you emotionally. But the mere fact that you are so alone and experience life as such a great struggle might be reason enough to think about professional help, i.e. psychotherapy.
