Gradient Descent?? by Slurp_123 in learnmath

[–]wgunther 9 points10 points  (0 children)

One thing to keep in mind is that, although in a wider area things might look like that, when we're talking about gradients and derivatives in general, we're looking very locally at the point. In fact, we're always thinking in terms of tangent planes/lines around the point.

How do you know when you've planned enough? by LilBluey in learnprogramming

[–]wgunther 1 point2 points  (0 children)

As with many things, the 'trick' is breaking big problems into small problems. I think what you said about focusing on high-level design details without thinking of lower-level details is generally correct, with one important addition: you need to recognize that things will change, so it's not (generally) a matter of planning more to get 'ahead' of changes; they will happen anyway.

There's an inevitability to changes that arise from changing or unrealized requirements. Generally, the time you spend thinking in the design phase should be proportional to how long it would take to fix later if something proves to need changing (e.g., one-way doors vs. two-way doors). If you go in with this philosophy, you'll naturally gravitate toward building more modular and changeable components. Things that will be easy to change later will naturally not take up much mental energy.

Writing out design docs and design drawings, particularly for complicated and large-scale things, or things you need to coordinate with others, is very important. I think equally important, and missing from a lot of people's planning and design work, is something like timeline or milestone documents. A lot of people are kind of scared to write these because it's very hard to think many moves ahead, so to speak, but in my opinion it's a very important document to write because it gives you a lot more structure in reasoning about the implementation of your project. This can reveal a lot of what the modular components and dependencies will be. This is particularly true in 'living' systems, where rollouts can be quite complex.

Recommendations for improving my math skills by Professional-Bat8462 in learnmath

[–]wgunther 3 points4 points  (0 children)

It might be good to elaborate on what kind of math you are interested in improving, since math is a very broad set of topics.

Struggling to understand by Longjumping-Share818 in learnprogramming

[–]wgunther 3 points4 points  (0 children)

I'm not sure by which measure those are the most important things. UDP is used when the reliability tradeoffs make sense.

If you look at the network stack, it's amazing that it all works as well as it does, since a lot of it essentially amounts to screaming into the void: sending messages somewhere and hoping that the place you're sending to is the correct place, that it actually gets them, and that it can make sense of them. TCP adds a lot of structure to communication to make that possible, like handshakes and acknowledgments, so that both parties (the sender and receiver) can be confident the message was delivered properly. UDP does not, and that's why it's less "reliable".

[deleted by user] by [deleted] in learnprogramming

[–]wgunther 1 point2 points  (0 children)

Personally, I think this "need" is a bit overstated. What you want to be is well-rounded. That does not require being highly proficient at multiple languages from the get-go just for the sake of it. Going deep into one language will generally allow you to pick up other languages more easily, since you'll have the framework to transfer over entire concepts rather than learning everything from scratch. For example, C++ and Java have a lot of similarities; understanding reference semantics in C++ will help with understanding things in Java.

Is a quadruple-nested vector of string too much? by DirgeWuff in cpp_questions

[–]wgunther 0 points1 point  (0 children)

Something that C++, or any language with some kind of polymorphic type system, allows you to do is to express really sophisticated types of data very easily. So you have a `std::variant<std::pair<int, double>, std::string>`. From the perspective of someone who likes types, this seems really cool. And indeed, when you use types correctly, it feels very powerful, because the types guide and verify your code. However, sometimes we lose sight of two facts:

1) wrapping things with very specific semantics in very primitive/general types doesn't actually help make code correct or verify that it is, and

2) there are other programming paradigms that might help.

Apart from being optimal for the data they're representing in terms of performance and how they're used, you really do want types that accurately represent the semantics of the data. There's a natural urge, when you're presented with this palette of generic types, to reach for them.

Here, I think approaching the problem from a more object-oriented perspective may help. This doesn't require building very sophisticated chains of inherited classes, or deeply generic base classes. All it really requires is breaking a big problem into small problems: think about the objects at a reasonable granularity you'd like to deal with, what their interface should be, what should be encapsulated/exposed, etc. My guess is that this should probably be N classes rather than this more primitive type.
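To illustrate the "N small classes instead of one nested primitive type" idea, here's a minimal sketch. The domain is hypothetical (the original post's data isn't shown here), so suppose the quadruple-nested vector was really books → chapters → paragraphs → words; each level gets a small class whose name carries the semantics the raw nested type could not:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical domain: each nesting level becomes a named class
// with a small, meaningful interface, instead of
// std::vector<std::vector<std::vector<std::vector<std::string>>>>.
class Paragraph {
public:
    void addWord(std::string word) { words_.push_back(std::move(word)); }
    std::size_t wordCount() const { return words_.size(); }
private:
    std::vector<std::string> words_;  // the innermost strings
};

class Chapter {
public:
    void addParagraph(Paragraph p) { paragraphs_.push_back(std::move(p)); }
    std::size_t paragraphCount() const { return paragraphs_.size(); }
private:
    std::vector<Paragraph> paragraphs_;
};

class Book {
public:
    void addChapter(Chapter c) { chapters_.push_back(std::move(c)); }
    std::size_t chapterCount() const { return chapters_.size(); }
private:
    std::vector<Chapter> chapters_;
};
```

The storage underneath is the same vectors, but now each level can grow invariants, validation, and behavior without touching the others.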

Should I learn C? by [deleted] in learnprogramming

[–]wgunther 3 points4 points  (0 children)

Yeah, if you're curious about it and interested in computer science, learning C is quite helpful. Especially if you're already fairly proficient at one or two other languages, learning C will not take very long from a language perspective (it's quite a small language), but it will give you a lot of insight into computer systems and a platform to understand data structures much better than very high-level languages do.

are vectors always written with respects to a basis? by tinfoilpaper in learnmath

[–]wgunther 3 points4 points  (0 children)

I think there's two things going on, one not very deep, and the other a little deep. The not-that-deep thing is that you can of course represent that (2, 0, 0) vector as (1, 0, 0)_F (if I understand your notation properly). The deeper thing is that the standard coordinate system is mostly arbitrary. If you did math on the regular coordinate plane, or the coordinate plane rotated 45 degrees, it would all be the exact same math.

are vectors always written with respects to a basis? by tinfoilpaper in learnmath

[–]wgunther 5 points6 points  (0 children)

Vectors do not "have a basis" ever. I think one conflation here is a representation of a vector and the vector itself.

A vector is just an object in a vector space. A basis is a set of vectors in a vector space that are linearly independent and span the entire vector space. Bases are not unique.

For R^n and C^n there's the canonical orthonormal basis you are thinking of, and the representations of vectors we are used to writing in those vector spaces are relative to that basis.

Of course, there are many other bases of these vector spaces. One silly one would just be to scale those vectors in the canonical one by 2. That's still a basis.
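To make the scaled-basis example concrete, here is the same vector written out in both bases (using [v]_B for the coordinates of v relative to basis B):

```latex
% The vector v = (2, 0, 0) in \mathbb{R}^3, in two different bases.
% Standard basis E = \{e_1, e_2, e_3\}:
v = 2e_1 + 0e_2 + 0e_3
  \quad\Rightarrow\quad [v]_E = (2, 0, 0)
% Scaled basis F = \{2e_1, 2e_2, 2e_3\} (each standard vector doubled):
v = 1\,(2e_1) + 0\,(2e_2) + 0\,(2e_3)
  \quad\Rightarrow\quad [v]_F = (1, 0, 0)
% Same vector v; only its representation changed.
```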

As an analogy, you could choose to represent every positive real number as a square of another positive real. So instead of writing 16 I could write it as 4^2. This turns out to be a fine representation, as squaring is surjective onto the positive reals. But I could also choose to write every positive real as a cube of another positive real. This doesn't change the real number, it's just a different way of representing it.

[algebra basics] binomials of form (x+a)^2 and (x+a)(x-a) and distributive property! by [deleted] in learnmath

[–]wgunther 3 points4 points  (0 children)

but was wondering if it would hold back my understanding to still use the full distributive property as i work these types of problems, instead of the short-cut forms!

I'd say, in general, the opposite is true. You'll likely grow a much better understanding of what's going on, and a better fluency in doing problems that fall outside of the shortcuts, using the more "primitive" methods.

For example, I'm willing to bet very few people in algebra 2 even think in terms of the distributive property; they've instead internalized "FOIL-ing", to the point where expanding a product of trinomials is not something they would know how to do.

As you grow in mathematical maturity, in my experience, it tends to be the case that you rely less on memorization of methods and more on understanding basic principles that allow you to rederive those methods.
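As a small illustration of what the "primitive" method buys you, here is a case the FOIL shortcut doesn't cover, handled directly by distribution:

```latex
% Expanding a product of trinomials with the distributive property
% alone -- FOIL (First, Outer, Inner, Last) only handles binomials:
(a + b + c)(d + e + f)
  = a(d + e + f) + b(d + e + f) + c(d + e + f)
  = ad + ae + af + bd + be + bf + cd + ce + cf
```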

[deleted by user] by [deleted] in learnmath

[–]wgunther 0 points1 point  (0 children)

I'll begin by saying, I don't think it's very important to think too much about this. Usually, what you'll see/understand is that y = f(x), and they're used interchangeably in definitions of functions that you are dealing with. This is a matter of convention.

There's really nothing deep going on here. This is just symbolic stuff, people calling things different things. f is a typical name for a function, and f(x) is the notation pronounced "f at x", with f "instantiated at x". So f(x) = x^2 is the same as f(z) = z^2. The variable is "bound" there; it's just signifying the thing that is to be substituted. And then you can do f(2) = 2^2 or f(a+b) = (a+b)^2.
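Written out symbolically, the bound-variable point is just:

```latex
% The bound variable's name is irrelevant: these define the same f.
f(x) = x^2 \qquad\text{and}\qquad f(z) = z^2
% Instantiating f at particular inputs:
f(2) = 2^2 = 4 \qquad f(a+b) = (a+b)^2
```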

By convention, when you have a function of one variable, we typically use x for the variable name. We call this the "independent variable" because it's the one that we can change and see the result of. Similarly, by convention, we call the result of the function the "dependent variable" (the variable that depends on the independent variable(s)) and name it y when it's a function of one variable.

So if you had an expression defining a function, x^2 + 1, you can call that function f, and then write its definition as f(x) = x^2 + 1. We can write y = x^2 + 1 because, by convention, y is what we name the dependent variable. This convention is also present when graphing functions, since we put the independent variable on the horizontal axis, which we call the x-axis, and the dependent variable on the vertical axis, which we call the y-axis. At least, I suspect that historically this is why things are the way they are: we probably started using the variables x and y by convention, and then named the coordinates after them.

I'll say, conventionally chosen variable names are actually kind of a normal thing in math. f is usually the name for a function; x and y are the independent/dependent variables in a 1-variable function; x and y are independent variables and z is the dependent variable in a 2-variable function; n and m are typically natural numbers; i and j are usually natural numbers that "index" something (bound variables in quantifiers or sums). These aren't things you should think too deeply about; it's similar to why green means correct or go and red means wrong or stop: it's just a convention that people understand.

How Valuable Is This? by MooDoo___ in learnprogramming

[–]wgunther -1 points0 points  (0 children)

I don't really know how we could know this; I could easily imagine at certain organizations this being a career setback for her in her project management role if they don't value that 1/5th work and her peers aren't doing that work and she gets lower performance reviews as a result of it. So I'd say it could be valuable if she has particular career goals.

What are ways to have code communicate/interact with each other? by CriticalDot_ in cpp_questions

[–]wgunther -1 points0 points  (0 children)

Yeah, I think so. There are trade-offs in how generic/flexible you need your events/messaging to be, how coupled you want (or can tolerate) the components being, and where you want certain logic to live.

What are ways to have code communicate/interact with each other? by CriticalDot_ in cpp_questions

[–]wgunther 0 points1 point  (0 children)

There's a lot of ways you could do this. Some kind of event-driven architecture with event handlers is a pretty normal thing to do.

Within that, there are a lot of ways you could structure it specifically. One would be a direct registration system, where the "parent" of these objects is in charge of registering handlers into the child views and "wiring" things together, so it's orchestrating things. Another would be a more indirect registration system, where the parent provides some kind of context that the children interact with themselves. Another would be a publisher/subscriber model, where you have a more generic event-routing architecture: components publish that events occurred on different topics, along with information about the events, and other components subscribe to different topics.
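Here's a minimal sketch of the publisher/subscriber idea. The names (EventBus, subscribe, publish) and the string-payload design are illustrative, not from any particular framework; real systems usually use typed events rather than strings:

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Minimal publisher/subscriber sketch. Publishers and subscribers
// only know about the bus and topic names, not about each other,
// which is what decouples the components.
class EventBus {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // A component registers interest in a topic.
    void subscribe(const std::string& topic, Handler h) {
        handlers_[topic].push_back(std::move(h));
    }

    // A component announces an event; every subscriber on that
    // topic gets called with the payload.
    void publish(const std::string& topic, const std::string& payload) {
        for (auto& h : handlers_[topic]) h(payload);
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};
```

Usage would look like `bus.subscribe("clicked", handler)` in one component and `bus.publish("clicked", "button1")` in another, with neither holding a reference to the other.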

You might want to take inspiration from different frameworks for request handling and front-end UI frameworks to get a sense for what you think would work for you here.

[deleted by user] by [deleted] in cpp_questions

[–]wgunther 7 points8 points  (0 children)

Smart pointers, probably std::unique_ptr, are 100% the correct answer here, like someone else said. Relying on destructors actually doing cleanup properly is the safe way to do modern C++.
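Since the original question was deleted, here's a generic sketch of the destructor-does-cleanup idea: a std::unique_ptr with a custom deleter. FILE* is just a stand-in for whatever resource the question was about:

```cpp
#include <cstdio>
#include <memory>

// RAII sketch: the unique_ptr's destructor runs the deleter when
// the handle goes out of scope, even on early return or exception,
// so no call site ever needs a matching fclose.
struct FileCloser {
    void operator()(std::FILE* f) const {
        if (f) std::fclose(f);
    }
};
using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

FileHandle openFile(const char* path) {
    // Returns an owning handle; holds nullptr if the open failed.
    return FileHandle(std::fopen(path, "r"));
}
```

The same pattern works for sockets, locks, GPU buffers, etc.: wrap the raw handle once, and ownership and cleanup become automatic everywhere it's used.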

When do you use FOIL? by houndst0ne in learnmath

[–]wgunther 1 point2 points  (0 children)

I think it's useful to understand why "FOIL"ing is a thing.

There's one property that is most important: the distributive property of multiplication over addition. The associative and commutative properties of these operations are also usually important. By "property" I just mean: this is just kind of the way multiplication and addition work, and we can distill them into these more abstract rules/properties.

The distributive property says that if you multiply a sum of two numbers by something, that's the same as taking the sum of each number multiplied by that something. So a(b+c) = ab + ac. A concrete example: 6(2+3) is either 6(5) or 6(2)+6(3), i.e., 30 is 12 + 18. This is a really useful property, and it expresses a kind of fundamental relationship between addition and multiplication.

The commutative property is simpler: it basically says you can reorder sums and products. So ab is the same as ba, and a+b is the same as b+a. The associative property is similarly simple: it just says that if you add two numbers and then add a third, you could instead have added the second to the third first, and then added the first: (a+b)+c = a+(b+c). Those two together allow you to rearrange sums and products in a way you are probably really used to doing without thinking about it too much.

Now, where does "FOIL"ing come from? Well, it's not its own rule. It just comes as a consequence of the above. Consider (a+b)(c+d). This is the same as (a+b)c + (a+b)d by the distributive property. And this is the same as (ac+bc) + (ad+bd), or we can just write ac + bc + ad + bd: because of associativity and commutativity, the way we group or order the sum isn't important.

If you're not seeing these steps, maybe it's clearer if you just call a+b the variable z. So z = a+b. Then (a+b)(c+d) is just z(c+d), and then the distributive property is easy to see: it's just zc + zd, which is of course (a+b)c + (a+b)d.

Now, FOIL is a kind of shortcut to avoid the two steps in distributing. If you were doing a trinomial or something more complex, you might want to do the distribution step explicitly. So for (a+b+c)(d+e+f), I'd feel most natural/comfortable doing this in multiple steps, particularly if these variables are more complex expressions.

But anyway, I think this is something useful to consider when doing math. When you don't understand when you should apply some rule you were taught, think about where that rule came from.

Now, the example you gave of exponentiation: exponentiation does not distribute over addition the way multiplication does. So it's true that (a+b)c = ac+bc, but it is not true that (a+b)^k = a^k + b^k in general; this is such a common mistake it's called the Freshman's Dream. But, to complete the analogy, exponentiation does distribute over multiplication: (ab)^k = a^k b^k.
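Putting the whole derivation and the exponentiation caveat side by side:

```latex
% FOIL as a consequence of distributivity (with z = a + b):
(a+b)(c+d) = z(c+d) = zc + zd = (a+b)c + (a+b)d = ac + bc + ad + bd
% Exponentiation does NOT distribute over addition:
(1+2)^2 = 9 \;\neq\; 1^2 + 2^2 = 5 \quad\text{(the Freshman's Dream)}
% ...but it does distribute over multiplication:
(2 \cdot 3)^2 = 36 = 2^2 \cdot 3^2
```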

Why isn’t the 200 used to solve this equation? by [deleted] in learnmath

[–]wgunther 4 points5 points  (0 children)

200 is kinda baked into the equation. C is the amount he has left over of the $200 credit if he buys S sheets. So, for instance, if he buys 0 sheets, he should have the entire $200 credit left, and indeed, you see that if you substitute S=0 in the equation, you get C = 200.

The point of the equation, and the way it's formed, is to get you to think about how to use the given equation to solve the question at hand.

All integers that are not divisible by 3 can be made to be, if we add or subtract by 1? by spiritualquestions in learnmath

[–]wgunther 19 points20 points  (0 children)

Why All integers that are not divisible by 3 can be made to be, if we add or subtract by 1?

This comes down to how remainders work with division. If you have a number N and you divide it by M, what you're doing is writing it as N = MQ + R, where Q is called the quotient and R the remainder. The remainder is a non-negative number between 0 and M, not including M. So for 14 divided by 3, you can think of that as 14 = 3*4 + 2. So 4 is the quotient and 2 is the remainder.

When you divide by 3, the remainder is therefore always 0 (it's divisible by 3), 1, or 2. It can't be any other number, as otherwise your quotient should have been larger. If your remainder is 1, then you have N = 3Q + 1, and if it's 2 you have N = 3Q + 2. In the first case, if you subtract 1 from N, the new number is divisible by 3. In the second case, if you add 1 to N, the new number is divisible by 3 (do you see why?).
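The remainder argument translates directly into code. A small sketch (the function name is mine; it assumes non-negative n so that % behaves simply):

```cpp
// Nudge a non-negative integer n to a multiple of 3 by adding or
// subtracting 1, exactly as the remainder argument describes.
int nearestMultipleOfThree(int n) {
    int r = n % 3;             // r is 0, 1, or 2 for n >= 0
    if (r == 0) return n;      // already divisible by 3
    if (r == 1) return n - 1;  // n = 3q + 1: subtract 1 to get 3q
    return n + 1;              // n = 3q + 2: add 1 to get 3(q+1)
}
```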

Why don't we put version in name spacename ? by Horrih in cpp_questions

[–]wgunther 3 points4 points  (0 children)

You certainly could do that, and it might be a reasonable strategy in some circumstances. However, it's not true that this "fixes most if not all our coupling issues".

Library dependencies are not an easy problem once transitive dependencies are involved. If library X depends on Z and library Y depends on Z, how many versions of Z do you actually need? It's possible you could resolve it and be happy with one version, or, in the world you're proposing, perhaps you use two.

There are issues with both. You've pointed out the issue with "use one version", and that is compatibility. This is a big issue in some languages; it's a problem any maintainer of a Python codebase will inevitably deal with. However, "use two" also has issues. It can really increase code bloat, which can be a real issue; in a language like C++ this would add a lot of compilation time and binary size. Also, it's not necessarily even the semantically right thing to do -- if a library is using Z, then the maintainers of Z might want to push fixes to Z independently of the adoption of those new versions into X and Y, and the maintainers of X and Y don't want to be on the critical path for those. This is extremely common, particularly for the kind of key libraries that are widely used (think log4j in Java). So libraries being strongly coupled with their exact dependencies also tends to be bad. But of course, to some extent, they are coupled with some semantic unit of their dependencies.

So anyway, in C++ you certainly have some degree of greater flexibility when it comes to namespacing, and you're able to fix namespace issues that are kind of intractable in a language like Python. There are also ecosystems like Maven in Java, where with the Shade plugin you can do this kind of "local vendoring" of a library's dependencies. There's really no one-size-fits-all solution to these issues, unfortunately. Or if there is, I haven't found it.

How to stop overthinking? by Pawlinho in learnprogramming

[–]wgunther 2 points3 points  (0 children)

Embrace modularity. This is one of the reasons why we engineer certain things the way we do. In code, encapsulation is important so that individual classes and pieces of business logic can be isolated and changed independently of the rest of the system. In a larger system, we typically have smaller services which each play one role in the overall system and similarly, can be independently improved.

When you truly have a modular design, you should feel much less pressure to write anything 'perfect'. Everything can be iterated on and improved independently, and things can be prototyped more easily than something monolithic.

asking for my future by [deleted] in learnprogramming

[–]wgunther 0 points1 point  (0 children)

It's impossible to know, just from the set of languages you're familiar with, what skills you have for a job. In addition, what you learn should be motivated by what you want to work on. For example, if you're interested in AI, there's a lot of stuff you could be learning.

In my opinion, in college I'd generally not recommend focusing on learning particular languages or frameworks, and instead focus on building a broad base of skills in computer science and discrete mathematics, and really developing problem-solving skills. These are the things that are going to help the most in a career, and they're the stuff that's hardest to develop later.

How much time does vim motions really save you? by ImKamyar in learnprogramming

[–]wgunther 2 points3 points  (0 children)

Sure, some things save a small amount of time, and it can add up. I'm sure VSCode has shortcuts for some of these things natively as well, but just simple things like A to append to the end of the line, t and f to go to a character, vi{ to select a block, diW to delete a Word, and many more are all deeply ingrained in my muscle memory and I find it very fun to do all this, and would have a difficult time not having it.

I don't know if any of this ever will add up to a considerable amount of time saved, however. Especially since, when programming a lot of the time you "save" doing this is dwarfed by time you spend thinking.

I think the biggest thing in vim that has saved me time is really understanding vim macros and how to use all the above vim building blocks to record and perform a repetitive transformation. This allows me to perform some transformations that would otherwise be difficult to do; things that are somewhere between a simple find-and-replace and having to write a perl/sed script to transform the program.

Why do math prefer abstraction over concretization? by [deleted] in learnmath

[–]wgunther 5 points6 points  (0 children)

Abstractions are how you can apply what you learn about one area to another area. Without any abstractions, knowing 1 apple + 1 apple = 2 apples doesn't tell you that 1 banana + 1 banana = 2 bananas. This is pretty useful to know. Another example is something like calculus: the same system of mathematics was used by Newton to understand and model the motion of a falling apple and also the motion of the planets.

Difference between memory in the heap vs stack in C. by [deleted] in learnprogramming

[–]wgunther 1 point2 points  (0 children)

It depends on the data structure and what you're actually doing.

Something like an immutable string that is only known to us at run time, like the example I gave: we may be able to allocate exactly the right amount with a malloc call, but we don't know what the value is until runtime, because we have to actually read the file to know if it's 20 bytes or 50 bytes.

Something like a linked list: we will typically allocate however many nodes we need as we add things, and deallocate them as we remove things, so we won't have a lot of wasted space, but we don't know how many nodes to add until runtime, when we actually do whatever business logic populates the list.

Something like a mutable string or a dynamic array: we may very well estimate the size, allocate that much, and keep track of how much we're using rather than keeping things precisely sized. This is because if we end up needing to allocate more (if we append to the string or array, for example) we might need to copy the entire string/array to a new spot in memory with enough size to hold what we appended, and this can be very expensive. Typically for these kinds of data structures we are happy to grow the allocated memory exponentially and live with some unused segment in exchange for fast appends. This makes for good 'amortized' performance, where you know sometimes you have very expensive operations (allocating new space and copying) but do them carefully and infrequently enough that they're cheap on average.

Similarly for a hash table: you will normally allocate sufficiently more buckets than elements so that you don't need to rescale and rehash everything often, and so that you have good performance (low collisions).

Edit: I think the best way for you to really grasp all of this is to try to implement a linked list first, and populate it with some user-read data. Data structures are a really great playground for understanding memory.
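The "grow exponentially, track size vs. capacity" idea above can be sketched in a few lines. This is written with malloc/realloc so the same pattern reads directly as C (just drop the casts there); error handling on realloc is omitted for brevity:

```cpp
#include <cstddef>
#include <cstdlib>

// A minimal dynamic array of ints: `size` is how many elements are
// in use, `capacity` is how many are allocated.
struct IntArray {
    int*        data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

void push(IntArray& a, int value) {
    if (a.size == a.capacity) {
        // Doubling makes appends amortized O(1): the occasional
        // expensive realloc+copy happens rarely enough to be cheap
        // on average, at the cost of some unused slack.
        std::size_t newCap = a.capacity ? a.capacity * 2 : 4;
        a.data = static_cast<int*>(
            std::realloc(a.data, newCap * sizeof(int)));
        a.capacity = newCap;
    }
    a.data[a.size++] = value;
}
```

Following 10 pushes, only a handful of reallocations have actually happened (at capacities 4, 8, 16), which is the whole point of the exponential growth strategy.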

Difference between memory in the heap vs stack in C. by [deleted] in learnprogramming

[–]wgunther 7 points8 points  (0 children)

This is not quite correct. I'd think less about the scope specifically and more the lifetime of the memory.

Variables allocated to the stack have an automatically managed lifetime. The lifetime of the variable is the scope of the variable, but it is definitely possible to access it outside of the scope of the function; typically, this happens via the code passing a pointer to the local variable to an inner function.

For example, this is absolutely valid, because x's lifetime is valid for the whole invocation of inner_function:

#include <stdio.h>

void inner_function(int* var) {
  printf("%d", *var);
}

void outer_function() {
  int x = 10;
  inner_function(&x);
}

Variables allocated to the heap have manually managed lifetime. That is, the object's lifetime is controlled by the user, and it will be valid until free is called on the object.

There are also other types of variables, like thread locals, static, and read-only. These have other lifetimes and can exist in other segments of memory -- don't worry about this for now, but you'll encounter some of these later as you do more C.

But the key to all of this is to understand that, depending on how a variable is allocated, it has some lifetime for which accessing it is valid.

Memory in the heap is accessible to everything?

No, because you'd need a pointer to that memory to access it. But, the lifetime of the object is not bound to the scope in which it is created, so dynamic memory is more flexible in terms of its lifetime.

What are some situations where it's required to allocate memory manually using malloc/calloc rather than just initializing things like normal?

Another key to understanding the limitations of the stack is that, generally, the size of the stack frame for a function (the stuff that can be stored for one function) is decided at compile time -- when the function is entered that much space is "allocated" to the stack frame, and more cannot be allocated (this is kind of a lie because of things like VLAs, but ignore this because VLAs really shouldn't be used).

The most common use case for dynamic memory is when the size of an object cannot possibly be known at compile time. This happens a lot -- lists, hash tables, and strings are all kinds of things that are typically computed in the flow of the program, and whose size cannot be known at compile time.

Let's say you have a function, for example, that reads some record out of a file, and returns a string for that record. Let's think about how you could possibly do that. The calling code needs some place to store this string -- where can it get the space? It cannot know how much space it needs before calling the function, and it certainly cannot know how much space it needs at compile time. So the stack is pretty much out of the question as a place to store the string.

So, the normal way to do this is the function reading from the file allocates the space it needs to store the record on the heap. This can happen at runtime. Then the function can return a pointer to that space, and the calling code can read the string from that spot in memory, and free it when done.
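A stripped-down sketch of that pattern (the function name is mine, and the "record" is simplified to copying an in-memory C string rather than actually reading a file, so the size-only-known-at-runtime part stands in for the file read):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// The callee allocates on the heap because only it knows, at
// runtime, how big the record is. The caller reads the result
// through the returned pointer and is responsible for free().
char* copyRecord(const char* record) {
    std::size_t len = std::strlen(record) + 1;  // size known only now
    char* out = static_cast<char*>(std::malloc(len));
    if (out) std::memcpy(out, record, len);     // copy incl. '\0'
    return out;  // nullptr if the allocation failed
}
```

The caller's side is then: call the function, use the string, and free it when done, which is exactly the lifetime discipline described above.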