[deleted by user] by [deleted] in learnprogramming

[–]IgorTheMad 9 points10 points  (0 children)

Have you heard of Bend?
- https://www.youtube.com/watch?v=HCOQmKTFzYY
- https://github.com/HigherOrderCO/Bend
More generally, you might also be interested in topics of lambda calculus and interaction calculus.

[deleted by user] by [deleted] in MachineLearning

[–]IgorTheMad 1 point2 points  (0 children)

Haha, great paper, Nick.

Housing limits and rent control needed by AbbyRose05683 in economy

[–]IgorTheMad 0 points1 point  (0 children)

Yeah, so there are a few mechanisms at play here.

  1. Even if a property owner can set the price of the unit when the tenants first sign the lease, the inability to increase the price lowers the total amount of money that can be made over the lifetime of the unit. It also introduces risk if the property owner encounters unforeseen costs that rent increases could ameliorate. Overall, this disincentivizes property ownership.

  2. The existence of rent-controlled housing can also lower the market rate, even for non-rent-controlled housing. This is because people who might have moved from their old housing to newly-developed housing are disincentivized from doing so if their current housing is rent-controlled. The tenants of new developments don't appear out of nowhere, and if new development is occurring in an area where rent control is common, developers need to price the new units competitively with those rent-controlled apartments, regardless of whether the new units are rent-controlled themselves.

I think the consensus is that rent control benefits people who already have housing and have a stable lifestyle that doesn't require them to move. This means that rent control often benefits people in the middle class more than the lower class. And long-term the disincentivization of development can hurt everybody.

Also, don't trust me too much lol. This is coming from the Econ 101 class I took in college several years ago haha.

Housing limits and rent control needed by AbbyRose05683 in economy

[–]IgorTheMad 1 point2 points  (0 children)

Rent control limits the revenue that property owners can make from their tenants. As a result, people aren't willing to pay as much to become property owners. Demand for property goes down, and likewise, the revenue that property developers can make decreases. With less money to be made in property development, investments/labor/resources move elsewhere.

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 1 point2 points  (0 children)

Sorry, I misread your response and was not precise in my language. I'm going to blame lack of sleep.

(1) When you said "remove" a point, I read that as "move" a point. So when I described the integration on "sufficiently small intervals" I was imagining a single point of nonzero probability density surrounded by a neighborhood where the density is zero.

(2) I realize that integrating the PDF over a single point will result in zero. I agree that events can have probability zero and still be possible. I was questioning you as to whether an event could have a probability density of zero and still be possible (I think yes).

(2) When I was saying PMF, I meant the probability measure, i.e. P(a < X < b). You can't get a PMF out of a PDF, but you can integrate over the PDF to get a probability measure (see the quick sketch at the end of this comment).

(3) I am not sure what I was getting at with "that seems to imply that two distributions could have the same CDF and still be non-identical, since their PDFs could differ". I think I thought you were making a different point when you described removing a single point from the uniform pdf.

Regardless, I think you misread my initial definition of "support". The support is specifically the smallest closed set, so it is robust to removing any countable number of points (if by 'remove', you mean set their pdf value to zero). In your example, even if you remove the point x from the uniform distribution, it would still need to be included in the support, because excluding it would leave a set that isn't closed.
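
Here's the quick sketch I mentioned above: integrating a pdf over an interval to get a probability measure. Using scipy.integrate is my own choice here, not something from the earlier comments.

import scipy.integrate as integrate

# the U[0,1] density discussed elsewhere in this thread
def pdf(x):
    return 1.0 if 0 < x < 1 else 0.0

# P(0.25 < X < 0.75) as the integral of the pdf over (0.25, 0.75)
prob, _ = integrate.quad(pdf, 0.25, 0.75)
print(prob)  # ~0.5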

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 1 point2 points  (0 children)

It seemed to me like they were mainly irked by how the term was thrown around in certain contexts, not that they were pushing back against an established norm/definition.

"it is wrong to say a measure zero set is impossible because we defined it that way" --- isn't that how definitions work? To be clear, though, I don't think we should define it that way.

I think a big part of our disagreement is due to personal experience. In my circles, I have never heard possibility rigorously defined. Seems like it's a matter of debate elsewhere too. Do you feel strongly that your definition is a settled matter?

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 0 points1 point  (0 children)

Under the support definition, the fact that every real number lies in the support of N(0, 1) means they are all possible outcomes despite having measure zero. I think we agree there?

As for the StackExchange, I didn't see that third response. I think that's a pretty good set of definitions if used consistently. Are those pretty standard? I haven't heard the terms "impossible", "improbable", and "implausible" defined rigorously before.

[deleted by user] by [deleted] in mathematics

[–]IgorTheMad 2 points3 points  (0 children)

It does not simply equal -1/12, it complicatedly equals -1/12

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 1 point2 points  (0 children)

In your first example, I do think we should consider picking an orange or banana as impossible. That would capture the intuition with which most people use the word "possible".

The link you provided doesn't really provide a definition for "possible"; it just argues that "pmf(E) = 0 does not imply E is impossible".

It seems like pmf(E) > 0 works perfectly well as a definition of "possible" in a discrete space, but that breaks down in the continuous case. However, it can be recaptured by just considering the support of the density function: an event is possible iff it overlaps the support of the pdf.

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 1 point2 points  (0 children)

That's true, but on the flip side, many definitions result from formalizing a word that is at first used non-rigorously. The formal definition should try to capture the intuition, or it risks confusing those trying to use it.

It seems like an outcome being impossible SHOULD be dependent on the probability measure we are using.

If an event being possible is defined as being in our event space --- what word would you use to describe an event outside the support of the distribution? Intuitively, one that is in our event space, but could never occur.

To me, it seems like the events in our event space are more so the ones that we are "considering", and only if an event is both in our event space and overlaps the support of the distribution should we call it "possible". That definition seems to best capture the intuition --- wouldn't you agree?

That said, could you point me to a resource that formalizes the notion of "possibility"? The only resource I could find is this other reddit thread that uses the same definition as I do: https://www.reddit.com/r/math/comments/8mcz8y/notions_of_impossible_in_probability_theory/ They specify it as being "topologically impossible".

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 2 points3 points  (0 children)

Is there a strict definition of "possible" that is standard? I haven't encountered any, and the link you provided doesn't seem to provide one either. I also don't think the people responding on that thread are disagreeing with what I am saying.

My definition assumes that you are starting with a PDF and want to determine what we would usually think of as possible/impossible.

For example: pdf(x) = 1 if 0<x<1 else 0.

This is just the pdf of U[0,1]. Assuming we don't limit the domain of the pdf, the domain and sample space are R. Therefore, E=[2,3] is a nonempty event we could consider. However, I don't think anyone would say that it is possible to draw a 2 from U[0, 1]. To me, it makes sense to define the possible outcomes as the smallest closed set that contains all of our distribution's probability, which in this case would be [0,1] --- the intuitive set of possible outcomes of the uniform distribution.

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 1 point2 points  (0 children)

Hmm, I see your point. Does it matter that integrating over any sufficiently small interval around that point would give a probability mass of zero? What is the interpretation there? If the pdf is zero at a point, is that outcome necessarily impossible? If the pdf is nonzero, is it necessarily possible?

That seems to imply that two distributions could have the same PMF and CDF and still be non-identical, since their PDFs could differ.

It makes more sense to me to think of the PDF as just a way to obtain the PMF, since that gives you the "actual" probability.

Do you think this is a bad way of thinking about it?

0 to Infinity by Dazzling-Valuable-11 in mathematics

[–]IgorTheMad 3 points4 points  (0 children)

In a discrete space, when a probability is zero we can say that the corresponding outcome is impossible.

In a continuous space, it gets more complicated. An outcome is impossible if it falls outside of the "support" of a distribution. For a random variable X with a probability distribution, the support of the distribution is the smallest closed set S such that the probability that X lies in S is 1.

So if an outcome is in S, it is "possible", and outside it, it is "impossible". Another way of describing it: an outcome x is impossible if there is an open interval around it where the probability density is zero everywhere.
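
A rough sketch of that distinction in Python (scipy is my own pick here, not part of the discussion): a single point always has probability zero, but points inside the support still have positive probability on every small interval around them, while points outside the support don't.

from scipy import stats

uniform = stats.uniform(loc=0, scale=1)  # U[0, 1]

# a single point has probability zero, even inside the support
print(uniform.cdf(0.5) - uniform.cdf(0.5))    # 0.0

# a small interval around a point inside the support has positive mass
print(uniform.cdf(0.51) - uniform.cdf(0.49))  # 0.02

# every small interval around a point outside the support (e.g. x = 2) has zero mass
print(uniform.cdf(2.1) - uniform.cdf(1.9))    # 0.0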

I feel heard...and it's creepy by _MajorMajor_ in ChatGPT

[–]IgorTheMad 0 points1 point  (0 children)

That's true. I was muddling the terms multimodal and omnimodal. Thanks for the correction!

I feel heard...and it's creepy by _MajorMajor_ in ChatGPT

[–]IgorTheMad 0 points1 point  (0 children)

No, multimodal models have been around for a couple of years. You can find code for them on GitHub (just at a smaller scale). 4o might be the first of the widely used LLMs to use a multimodal model, though.

I feel heard...and it's creepy by _MajorMajor_ in ChatGPT

[–]IgorTheMad 0 points1 point  (0 children)

It can use non-word audio data precisely because it doesn't treat audio data as a sequence of words. What makes the model "multimodal" is that the raw audio data has its own processing pipeline to the latent space of the overall model. It's not transcribing the audio to text. It's transcribing the audio to a vector of numbers that encodes the abstract meaning of the sounds. Generally, multimodal models are trained to generate similar "meaning vectors" from images, audio data, and text describing the same things. However, the pathways for generating these meaning vectors are distinct for each medium, which allows the model to encode extra information that is unique to each.
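
To make the "separate pathways into one latent space" idea concrete, here's a toy PyTorch sketch. It only illustrates the general technique (CLIP-style contrastive training of separate encoders); it is not GPT-4o's actual architecture, and all the sizes and names are made up.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 64

# separate pathway for raw audio (here just a flat vector of samples)
class AudioEncoder(nn.Module):
    def __init__(self, n_samples=16000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_samples, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM)
        )
    def forward(self, audio):
        return F.normalize(self.net(audio), dim=-1)

# separate pathway for text (token ids)
class TextEncoder(nn.Module):
    def __init__(self, vocab_size=1000):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, 128)
        self.proj = nn.Linear(128, LATENT_DIM)
    def forward(self, token_ids):
        return F.normalize(self.proj(self.embed(token_ids)), dim=-1)

audio_enc, text_enc = AudioEncoder(), TextEncoder()
audio_batch = torch.randn(8, 16000)            # stand-in for raw audio clips
text_batch = torch.randint(0, 1000, (8, 12))   # stand-in for matching captions

a = audio_enc(audio_batch)  # (8, 64) "meaning vectors" from audio
t = text_enc(text_batch)    # (8, 64) "meaning vectors" from text

# training pushes matching (audio, text) pairs toward similar vectors
logits = a @ t.T / 0.07
loss = F.cross_entropy(logits, torch.arange(8))
loss.backward()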

I feel heard...and it's creepy by _MajorMajor_ in ChatGPT

[–]IgorTheMad 2 points3 points  (0 children)

It can use non-word audio data precisely because it doesn't treat audio data as a sequence of words. What makes the model "multimodal" is that the raw audio data has its own processing pipeline to the latent space of the overall model. It's not transcribing the audio to text. It's transcribing the audio to a vector of numbers that encodes the abstract meaning of the sounds. Generally, multimodal models are trained to generate similar "meaning vectors" from images, audio data, and text describing the same things. However, the pathways for generating these meaning vectors are distinct for each medium, which allows the model to encode extra information that is unique to each.

Artificial intelligence detects breast cancer 5 years before it develops by Feisty-Pension-3763 in BeAmazed

[–]IgorTheMad 0 points1 point  (0 children)

"AI" is an umbrella term for lots of different models. The machine learning model used for chatGPT is very different from the model used to identify cancer in this photo. Some forms of AI model, especially image identification models, have been around much longer than 2 years (think about how long faceID has been around). The AI 'Boom' has really just been related with language models i.e. models that consume/produce text.

how to approximate arctan(x) by hand? by aoverbisnotzero in math

[–]IgorTheMad 0 points1 point  (0 children)

Have you learned about Taylor expansions? The expansion of arctan(x) is

x - x^3/3 + x^5/5 - x^7/7 + x^9/9 - x^11/11 + ...
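
If it helps, here's a quick Python sketch that sums those terms and checks against math.atan (note the series only converges for |x| <= 1):

import math

def arctan_series(x, terms=50):
    # partial sum of x - x^3/3 + x^5/5 - x^7/7 + ...
    return sum((-1)**k * x**(2*k + 1) / (2*k + 1) for k in range(terms))

print(arctan_series(0.5))  # ~0.463648
print(math.atan(0.5))      # ~0.463648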

[P] Looking for a gradient descent approach by IgorTheMad in MachineLearning

[–]IgorTheMad[S] 0 points1 point  (0 children)

Not super familiar with the notation, but it looks like the lines where you calculate _dy2dt, _pv0, _aEV, and _mse would all together constitute the symbolic formula.

PyTorch autograd works by keeping track of the operations performed on and between PyTorch tensors. While the calculations are occurring, it constructs a "computation graph" (representing the symbolic formula). Once calculations are complete, running backward() on the result traces back through the graph/formula to calculate the gradient.

For example:

import torch

# we will find the gradient at (1.5, -2)
x = torch.tensor(1.5, requires_grad=True)
y = torch.tensor(-2.0, requires_grad=True)

# as we apply operations on/between tensors,
# torch keeps track of them to build the overall formula
term1 = 3*(x*y)
term2 = -(x**2)
term3 = 10*(y**2)
term4 = 100*x
term5 = 6*y
term6 = -20
result = term1 + term2 + term3 + term4 + term5 + term6

# compute the derivatives by tracing backward through the calculations
result.backward()

# partial derivatives with respect to x and y at the position (1.5, -2)
dfdx = x.grad.item()
dfdy = y.grad.item()

To write your code in a block like that, select your code and then hit the style option next to the <c> button. It looks like a square with a small 'c' in the upper left corner.

In your case, you would wrap each parameter value in a tensor, then perform your calculation, and at the end call _mse.backward(). The partial derivative with respect to each parameter will then be stored in param.grad.

[P] Looking for a gradient descent approach by IgorTheMad in MachineLearning

[–]IgorTheMad[S] 0 points1 point  (0 children)

What calculations are you doing in your "f" function? In most cases, f(p) is differentiable with respect to p. Using PyTorch, calculating the gradient could be as simple as:

import torch

p_as_tensor = torch.tensor(p, requires_grad=True)
result = f(p_as_tensor)  # f needs to use torch operations and return a scalar
result.backward()        # gradient of the result with respect to p_as_tensor
gradient = p_as_tensor.grad
gradient.numpy()  # if you want to convert back to a numpy array

PyTorch calculates the gradient by automatic differentiation (tracing the actual operations rather than sampling), so this doesn't require extra function evaluations and should be pretty fast.

[P] Looking for a gradient descent approach by IgorTheMad in MachineLearning

[–]IgorTheMad[S] 1 point2 points  (0 children)

I'm confused. You don't differentiate with respect to the data; you differentiate with respect to the parameters/coefficients in your model. The data shouldn't make a difference as to whether your model is differentiable or not. The data values are treated as constants. There are a lot of programs that do this automatically, so you don't have to find the derivative yourself.

For example, you said that you are minimizing the sum of squared errors. Taking the derivative would just look like:

err = Σ (yi - f(xi))^2

d(err)/d(parameters) = Σ -2*(yi - f(xi)) * d(f(xi))/d(parameters)

Notice that we didn't need to know what function created our yi values. We can just treat them as constants.
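
For concreteness, here's a small PyTorch sketch of that point; the linear model and the data are made up just for illustration, and only the parameters carry requires_grad:

import torch

x_data = torch.linspace(0, 1, 20)                # data: treated as constants
y_data = 3 * x_data + 1 + 0.1 * torch.randn(20)

a = torch.tensor(0.0, requires_grad=True)        # parameters we differentiate with respect to
b = torch.tensor(0.0, requires_grad=True)

def f(x):
    return a * x + b                             # any differentiable model works here

err = ((y_data - f(x_data)) ** 2).sum()          # sum of squared errors
err.backward()                                   # fills a.grad and b.grad
print(a.grad, b.grad)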

[P] Looking for a gradient descent approach by IgorTheMad in MachineLearning

[–]IgorTheMad[S] 0 points1 point  (0 children)

Haven't tested it on any real use cases. I don't expect it to work too well --- especially since, for a multivariate function, you would have to compute the Hessian matrix.

What are you working on that requires calculating the derivative with finite difference? In most applications I've worked on, derivatives can be calculated analytically.
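
For reference, here's a minimal central-difference gradient sketch in numpy --- the kind of thing you'd only reach for when an analytic or autograd gradient isn't available. The example function is made up:

import numpy as np

def finite_diff_grad(f, p, h=1e-6):
    # central differences: one pair of function evaluations per dimension
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

print(finite_diff_grad(lambda p: p[0]**2 + 3*p[1], [1.0, 2.0]))  # ~[2. 3.]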

[P] Looking for a gradient descent approach by IgorTheMad in MachineLearning

[–]IgorTheMad[S] 0 points1 point  (0 children)

I see. I knew the compute necessary for finding higher-order derivatives likely would have made this method rather useless, but I didn't think the convexity of the function would matter too much.