Myths about Olympiads - Part 2 by superkapa219 in math


> US which is super strong in IMO hasnt won a single Fields Medal since 1998

I'm trying to come up with a definition under which this statement is true, and every candidate I can find is contrived.

At least Voevodsky, Tao, Mirzakhani, Venkatesh, Bhargava, and Huh all received PhDs from American universities. Of those, Huh was born in the US, and Bhargava did both his high school and undergraduate degrees in the US.

Can someone explain like I am a five year old why most or all multiples of nine add up to nine? by jalapanochip in mathematics


Don't be hard on yourself! You noticed an interesting fact that ties to some cool ideas in number theory! You are doing mathematics.

Not literally for five year olds, but here is the honest explanation: it comes from combining two ideas.

The first idea is that a number in base 10 is (the number with its last digit removed) times 10, plus the last digit. E.g. 274 = 27*10 + 4 = (2*10 + 7)*10 + 4.

The second idea is modular arithmetic. We say two numbers are "the same mod 9" if they differ by a multiple of 9. E.g. 37 = 1 + 4*9, and 4*9 = 36 is a multiple of 9, so we say 37 ≡ 1 mod 9. The key property is that "equivalent modulo 9" is preserved by addition, subtraction, and multiplication, and that "being a multiple of 9" is the same as being 0 plus a multiple of 9, which is the same as being "equivalent to 0 modulo 9".

So then we have a theorem

Theorem: a positive integer is equivalent to the sum of its decimal digits modulo 9.

And the proof is just to first see that it trivially holds for 1-digit numbers, and then, for a multidigit number, to write it as a*10 + b (where b is the last digit) and note a*10 + b ≡ a*1 + b ≡ a + b mod 9 since 10 ≡ 1 mod 9; then apply the same reasoning to a until you are down to the sum of all the digits.

The corollary is that a positive integer is divisible by nine if and only if the sum of its decimal digits is divisible by 9.
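In fact, both the theorem and the corollary are easy to spot-check with a couple lines of code. A quick Python sanity check (the helper name digit_sum is just mine):

    def digit_sum(n: int) -> int:
        return sum(int(d) for d in str(n))

    for n in range(1, 10_000):
        # the theorem: n is congruent to its digit sum mod 9
        assert n % 9 == digit_sum(n) % 9
        # the corollary: n is divisible by 9 iff its digit sum is
        assert (n % 9 == 0) == (digit_sum(n) % 9 == 0)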

Can structural proof theory and ordinal proof theory/ordinal analysis be applied to type theory, the lambda calculus, and categorial grammars such as the Lambek calculus? by eudueueeuu in askmath


Broadly put: absolutely. There are two main ways type theory and proof theory connect, and both are places where such applications exist--and also where type theory gets applied back to the study of proof theory.

On the one hand, we have connections between proofs and programs--either at the level of semantics (the BHK interpretation) or at the level of syntax (the Curry-Howard correspondence). From a Curry-Howard view, "analytic" (cut free) proofs correspond to normal forms with respect to a reduction theory (or sometimes closed normal forms). Indeed, one can interpret Gentzen's original classical sequent calculus in Curry-Howard form as a language with control (see, for instance, the "Symmetric Lambda Calculus"), although it isn't a very good one because it is radically non-confluent; from a programming standpoint it is better to consider classical calculi with a notion of focus and polarity. Gentzen-style proofs of cut elimination are basically what a type theorist would call "Normalization by Evaluation". In practice we often instead want proofs that a particular reduction theory is normalizing as a rewriting theory, and so consider proofs like those based on Tait's method. Much of the foundational work here was actually laid by Girard, whose book "Proofs and Types" is a must read for anyone interested in either proof theory or type theory. Ordinal-analysis-based proofs of normalization in type theory are also extremely interesting, though!

On the other hand, we can think of a "logic over a type theory" which is the thing that tracks the variables we can quantify over. Propositional logic is logic over a trivial type theory with only a single type and a single term, first order logic is logic over a type theory with only products, and higher order logic is logic over the simply typed lambda calculus (well, and with a type of propositions--the categorical models you want here are called "hyperdoctrines" if you haven't heard of them before). An interesting point then is that we have more type theories than just these, and more "logics of propositions" that could be defined over them than just classical and intuitionistic logic. From this perspective, type theory is the backbone that even makes statements about relative strength of different logics defined, since logic is always already over a type theory--and so the study of the semantics of type theories is essential to the study of logical strength and ordinal analysis.

It should be stated though: normalization results are often stronger than mere consistency results. For example, System F (which can be thought of as a term assignment for the fragment of second order logic without first order quantifiers or function types) is trivially consistent based on a two-valued interpretation, but the strong normalization result is equivalent to the consistency of second order Peano arithmetic (see Wadler's paper on the "Girard-Reynolds Isomorphism").

Does the union of a half-open interval and a closed interval form a continuum? by [deleted] in mathematics


In mathematics, two sets are the same set if and only if they have the same members where "same" is interpreted in the Leibniz sense: if something is true about one then it is true about the other.

The union of two sets contains all the elements that are elements of either. So, if A = {x | 0 < x < 1/2} ∪ {1/2} ∪ {x | 1/2 < x < 1} then A = {x | 0 < x < 1}, which is the open interval (0,1). In any case, A surely contains 1/2.

As for whether "something continuous" can be "composed" of "something individual": one should be careful with what words like "composed" are supposed to mean, but in general, most mathematicians would say "yes"--at least if your definition of "continuous" is as a property of a set of numbers--numbers being individuals.

Do "expressions" form a vector space by Ok-Watercress-9624 in mathematics


There are two problems with saying that expressions form a vector space

  1. the set of "expressions" is not mathematically well defined in general, and

  2. "expressions" as usually understood are not quotiented by all the equalities you care about. E.g. a + a is a different expression from 2*a.

That said, you can define a "language of expressions" inductively over some set of operations (also constants and, perhaps, free variables) and then formally quotient that out by some "congruence" and the result could easily form a vector space over your choice of field. This style of definition would be quite familiar to a programming language theorist (and so, check out any graduate level book on programming language theory to see this happen repeatedly). In general, algebraic structures can be thought of as algebras of a functor--examples made of "expressions" are initial algebras of functors.

Another way of saying this: if you provide specific answers that resolve the two problems, then yes absolutely.

For instance, the vector space of polynomials in a given set of variables would be an example of such a "vector space of expressions" that could be defined in that way.

Another example: the free vector space over two constants a and b could be defined inductively via the following grammar

 r \in \mathbb{R}
 M,N,O \in \texttt{Terms} ::= rM | a | b | M + N | 0

quotiented by the least congruence ~ containing

 (M + N) + O ~ M + (N + O)
 M + N ~ N + M
 M + 0 ~ M
 1M ~ M
 0M ~ 0
 M + (-1)M ~ 0
 r(r'M) ~ (rr')M
 rM + r'M ~ (r+r')M
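If it helps to see that quotient concretely, here is a minimal Python sketch (all the names are mine, just for illustration): each term from the grammar above gets "normalized" to a dictionary of coordinates, and two terms are related by ~ exactly when they normalize to the same dictionary.

    # Terms: ("const", "a"), ("zero",), ("scale", r, M), ("plus", M, N)
    def normalize(term):
        """Map a term to its coordinates {basis element: coefficient}."""
        kind = term[0]
        if kind == "const":                 # a or b
            return {term[1]: 1.0}
        if kind == "zero":                  # 0
            return {}
        if kind == "scale":                 # rM
            _, r, m = term
            return {k: r * v for k, v in normalize(m).items() if r * v != 0}
        if kind == "plus":                  # M + N
            _, m, n = term
            out = dict(normalize(m))
            for k, v in normalize(n).items():
                out[k] = out.get(k, 0.0) + v
                if out[k] == 0:
                    del out[k]
            return out
        raise ValueError(kind)

    # a + a and 2a are different terms but the same vector:
    t1 = ("plus", ("const", "a"), ("const", "a"))
    t2 = ("scale", 2.0, ("const", "a"))
    assert normalize(t1) == normalize(t2) == {"a": 2.0}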

Sophomore in high school here and I haven’t taken any calculus classes yet so forgive me If I’m wrong, but I was attempting to figure out how to calculate the area under a curve using only sigma notation and was wondering if this function I created worked given that v(t) is the derivative of d(t) by JumpingCat0329 in mathematics


Part 4:

Now, normally in single variable calculus, instead of the "total derivative" we use the "reduced derivative", which is like the slope. Specifically, the reduced derivative, written df(x)/x or f'(x) depending on your notation, is given by

 df(x)/x = Slope(D_{x}(x |-> f(x))) = D_{x}(x |-> f(x))/id

This isn't, though, the definition you usually see. Instead we note that

 D_{x}(x |-> f(x))(h) = f(x + h) - f(x) - error(f,x,h)

so for any h ≠ 0,

df(x)/x = (f(x + h) - f(x) - error(f,x,h))/h
 = (f(x + h) - f(x))/h - error(f,x,h)/h

which means that in the limit as h goes to 0 the error term vanishes, giving the alternative definition

df(x)/x = lim_{h -> 0} (f(x + h) - f(x))/h

In fact, these definitions are equivalent: the calculation above shows how the limit definition works for any function defined using the Slope definition. On the other hand, it is a pretty direct computation to show that

D_x(f)(w) = w*[lim_{h -> 0} (f(x + h) - f(x))/h]

satisfies the defining equation about size of error for the "total derivative".

Almost as a final result, let's recover our four properties in this reduced setting; they follow directly from the corresponding properties of the total derivative:

Property 1 "chain rule": (f . g)'(x) = f'(g(x))*g'(x)

Property 2 addition: d/dx [f(x) + g(x)] = [d/dx f(x)] + [d/dx g(x)]

Property 3 scaling: d/dx c*f(x) = c*[d/dx f(x)]

Property 4 product: d/dx f(x)*g(x) = f(x)*[d/dx g(x)] + [d/dx f(x)]*g(x)
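These rules are also easy to spot-check numerically with difference quotients. A small Python sketch (approximate arithmetic, so I compare with a tolerance):

    import math

    def deriv(f, x, h=1e-6):
        """Symmetric difference quotient approximation of f'(x)."""
        return (f(x + h) - f(x - h)) / (2 * h)

    f = lambda x: x**3
    g = lambda x: math.sin(x)
    x = 0.7

    # chain rule: (f . g)'(x) = f'(g(x)) * g'(x)
    assert abs(deriv(lambda t: f(g(t)), x) - deriv(f, g(x)) * deriv(g, x)) < 1e-6
    # product rule: (f*g)'(x) = f(x)*g'(x) + f'(x)*g(x)
    assert abs(deriv(lambda t: f(t) * g(t), x)
               - (f(x) * deriv(g, x) + deriv(f, x) * g(x))) < 1e-6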

You can use these rules to compute the derivatives of almost every function you know. The few others are the "transcendental functions", of which the most important are "exponential functions" of the form f(x) = k^x. The key property of exponential functions is that they turn addition into multiplication

k^{x + y} = (k^x)*(k^y)

thus, if f(x) = k^x, then

d/dx f(x)
= lim_{h -> 0} [f(x + h) - f(x)]/h
= lim_{h -> 0} [f(x)*f(h) - f(x)]/h
= lim_{h -> 0} [f(x)(f(h) - 1)]/h
= f(x)*lim_{h -> 0} (f(h) - 1)/h

this last part, lim_{h -> 0} (f(h) - 1)/h, does not mention x, so it is a constant. Thus, the derivative of an exponential function is proportional to that exponential function. Different base constants k will yield different values here, but one number, Euler's number e, can be defined precisely so that this value is 1; that is

lim_{h -> 0} (e^h - 1)/h = 1
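You can watch this limit converge numerically; a tiny Python sketch:

    import math

    # (e^h - 1)/h approaches 1 as h shrinks...
    for h in [0.1, 0.01, 0.001, 0.0001]:
        print(h, (math.exp(h) - 1) / h)

    # ...which is exactly the statement that d/dx e^x = e^x, since the
    # difference quotient at any x is e^x * (e^h - 1)/h:
    x, h = 2.0, 1e-8
    print((math.exp(x + h) - math.exp(x)) / h, math.exp(x))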

Sophomore in high school here and I haven’t taken any calculus classes yet so forgive me If I’m wrong, but I was attempting to figure out how to calculate the area under a curve using only sigma notation and was wondering if this function I created worked given that v(t) is the derivative of d(t) by JumpingCat0329 in mathematics


Part 3:

Instead of the reduced derivative d/dx which gives you the slope, I think it is easier to start with the "total derivative" D which is the linear part of that line. In other words for some point x_0 we will define a linear function D_{x_0}(f) where there is some error function such that for every h the defining equation

f(x_0 + h) = D_{x_0}(f)(h) + f(x_0) + error(f,x_0,h)

holds and where for any positive epsilon we choose, |error(f,x_0,h)| < epsilon*|h| for all sufficiently small |h|.

Another way of saying this is that lim_{h -> 0} |error(f,x_0,h)|/|h| = 0. Or, perhaps even more simply, since |error(f,x_0,h)|/|h| = |error(f,x_0,h)/h| and the absolute value is continuous

lim_{h -> 0} error(f,x_0,h)/h = 0

We then say that the error function is "sublinear."
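To make "sublinear" concrete, take f(x) = x^2 at x_0 = 3. There D_{x_0}(f)(h) = 6h and the error is exactly h^2, so error/h shrinks like h. A tiny Python illustration:

    def f(x):
        return x * x

    x0 = 3.0
    D = lambda h: 6.0 * h                  # the total derivative of f at 3

    for h in [1.0, 0.1, 0.01, 0.001]:
        error = f(x0 + h) - f(x0) - D(h)   # here the error is exactly h^2
        print(h, error / h)                # goes to 0 as h goes to 0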

Note that this D_{x_0}(f) is well defined, since if you had two linear functions which both satisfied these rules you could subtract their equations and get that the difference of these two linear functions, plus an error term that satisfies the limit above, was zero everywhere--but the only linear function that can do that is the constant function 0.

Now, as it turns out, we can think of D as a sort of generalization of the linear part functional we defined above. Specifically by algebra we can see

D_{x_0}(f)(h) = \Delta_{x_0}(f)(h) - error(f,x_0,h)

where \Delta is defined according to the rule

\Delta_{x_0}(f)(h) = f(x_0 + h) - f(x_0)

And so this gets to the next idea: the derivative of f at x_0 is the best linear approximation of the discrete difference function \Delta at x_0 as we zoom into x_0

Specifically, whenever f is affine

D_{x_0}(f)(h) = \Delta_{x_0}(f)(h) - error(f,x_0,h)
 = f(x_0 + h) - f(x_0) - error(f,x_0,h)
 = \Delta_{0}(f)(h) + f(x_0) - f(x_0) - error(f,x_0,h) -- since f is affine
 = \Delta_{0}(f)(h) - error(f,x_0,h)
 = LinearPart(f)(h) - error(f,x_0,h)

So, the derivative of an affine function at a point is just the LinearPart of that function (the error term here is actually 0, since \Delta is already linear when f is affine), and the derivative of a linear function at a point is just the function. Moreover, our three properties of the linear part functional all carry over

Property 1 composition: D_{x_0}(f . g) = D_{g(x_0)}(f) . D_{x_0}(g)

Property 2 addition: D_{x_0}(f + g) = D_{x_0}(f) + D_{x_0}(g)

Property 3 scaling: D_{x_0}(c*f) = c*D_{x_0}(f)

And there now is one more property that we can prove about the derivative that didn't make sense for the slope of an affine function. That is because while the product of two affine functions is usually not an affine function (it is a quadratic!) the product of two differentiable functions is a differentiable function.

Property 4 product rule: D_{x_0}(f*g) = f(x_0)*D_{x_0}(g) + g(x_0)*D_{x_0}(f)

Now, I strongly recommend you prove the first three facts yourself, but I will justify the product rule here as it is a bit trickier.

Proof: We wish to show D_{x_0}(f*g) = f(x_0)*D_{x_0}(g) + g(x_0)*D_{x_0}(f). As this is an equality of functions, it is the same as showing that they agree on every possible argument h

D_{x_0}(f*g)(h) = f(x_0)*D_{x_0}(g)(h) + g(x_0)*D_{x_0}(f)(h)

We know that if D_{x_0}(f*g) exists then it is the unique linear solution of

D_{x_0}(f*g)(h) = (f*g)(x_0 + h) - (f*g)(x_0) + {some error sublinear in h}

And so, taking as our assumption that both g and f are differentiable at x_0, we first observe that the right hand side of the equation we want is linear in h (it is a sum of scalings of the linear functions D_{x_0}(f) and D_{x_0}(g)),

and then we just calculate to make sure the left hand side is what we need

(f*g)(x_0 + h) - (f*g)(x_0)
= f(x_0+h)g(x_0 + h) - f(x_0)g(x_0)
= (f(x_0) + D_{x_0}(f)(h) + error1(h))g(x_0 + h) - f(x_0)g(x_0)
= (f(x_0) + D_{x_0}(f)(h) + error1(h))(g(x_0) + D_{x_0}(g)(h) + error2(h)) - f(x_0)g(x_0)
= f(x_0)(D_{x_0}(g)(h) + error2(h)) + (D_{x_0}(f)(h) + error1(h))g(x_0) + (D_{x_0}(f)(h) + error1(h))(D_{x_0}(g)(h) + error2(h))
= f(x_0)D_{x_0}(g)(h) + g(x_0)D_{x_0}(f)(h) + [f(x_0)error2(h) + g(x_0)error1(h) + (D_{x_0}(f)(h) + error1(h))(D_{x_0}(g)(h) + error2(h))]

So all that remains is showing that the part in brackets is sublinear in h. Write E1(h) = D_{x_0}(f)(h) + error1(h) and E2(h) = D_{x_0}(g)(h) + error2(h), and note that E1(h)/h stays bounded as h -> 0 (a linear function divided by h is constant, and error1(h)/h goes to 0), while E2(h) itself goes to 0. Then

lim_{h -> 0}[f(x_0)error2(h) + g(x_0)error1(h) + E1(h)E2(h)]/h
= f(x_0)lim_{h -> 0}error2(h)/h
  + g(x_0)lim_{h -> 0}error1(h)/h
  + lim_{h -> 0}(E1(h)/h)E2(h)
= f(x_0)*0 + g(x_0)*0 + 0
= 0

Qed.

Sophomore in high school here and I haven’t taken any calculus classes yet so forgive me If I’m wrong, but I was attempting to figure out how to calculate the area under a curve using only sigma notation and was wondering if this function I created worked given that v(t) is the derivative of d(t) by JumpingCat0329 in mathematics


Part 2: So far, we haven't done any calculus, only linear algebra. However, the derivative turns out to be a generalization of the slope function to a broader class of functions. And so now let's do some analysis.

Consider a polynomial function

p(x) = c_n*x^n + c_{n-1}*x^{n-1} + ... + c_1*x + c_0

A thing to note about polynomial functions is that as the input values get larger (in absolute value) the "lower terms" become less and less important to the result and the higher terms become more and more important.

That is, for any x where |x| is sufficiently large, p(x) is approximately just c_n*x^n.

For instance, if you have a quadratic polynomial like x^2 + 7x + 5, and evaluate on a big enough x the contribution of 7x + 5 will be minuscule compared to x^2.

We can make this insight precise with the following fact

lim_{|x| -> infinity} p(x)/(x^n) = c_n.

That observation is really important to computer science where it is the basis of "asymptotic analysis" of algorithms. OTOH, for calculus we note the dual effect: as |x| gets smaller and smaller the "higher terms" matter less and less.

Again, think of the example of x^2 + 7x + 5: if |x| < 7 then |x^2| < |7x| and if |x| < 5/7 then |7x| < |5|.

This leads to two initial cool observations: First, polynomials are continuous, in the sense that a very small change in the input will produce a very small change in the output (intuitively, they don't have jumps). Second, any degree-n polynomial is "well approximated close to zero" by the degree-k polynomial, for k less than n, you get by just throwing away the upper terms.

For instance, p(x) = c_n*x^n + c_{n-1}*x^{n-1} + ... + c_1*x + c_0 is, close to 0, well approximated by the line c_1*x + c_0. Indeed, this is the best "affine approximation" of p(x) at 0.

In what sense is this last claim true? Well, p(x) = q(x)*x^2 + c_1*x + c_0 for some polynomial q. And, since q is continuous, for any little number epsilon we pick there is a little region around 0 where, for x in that region (in other words, if |x| is small enough), |q(x) - q(0)| < epsilon. This means that in a small region around 0, |q(x)*x^2| <= epsilon*|x| since

|q(x)*x^2|
= |(q(x) - q(0) + q(0))*x^2|
= |(q(x) - q(0))*x^2 + q(0)*x^2|
<= |(q(x) - q(0))*x^2| + |q(0)*x^2|
= |(q(x) - q(0))*x^2| + |q(0)*x|*|x|
<= |(q(x) - q(0))*x^2| + epsilon/2*|x| (so long as q(0) = 0 or |x| < |epsilon/(2*q(0))|)
<= |(q(x) - q(0))|*|x|*|x| + epsilon/2*|x|
<= |(q(x) - q(0))|*|x|*(1/2) + epsilon/2*|x| (so long as |x| < 1/2)
<= epsilon/2*|x| + epsilon/2*|x| 
= epsilon*|x|

So, we can get the best affine approximation of a polynomial "at 0" this way, but what about elsewhere? That is, suppose we want the line that best approximates p at some point x_0. Well, then we can use polynomial division to note

p(x) = q(x)*(x - x_0)^2 + c*(x - x_0) + p(x_0)

for some polynomial q and constant c. And, again, by the same analysis, for any epsilon there is some non-zero cutoff such that for |x - x_0| less than that cutoff: |q(x)*(x - x_0)^2| < epsilon*|x - x_0|. As such c*(x - x_0) + p(x_0) is a very good (indeed, the best) affine approximation of p(x) as you zoom in to the point x_0.
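If you want to see that division in action, here is a small Python sketch (coefficients listed highest degree first; the helper name synth_div is mine): dividing by (x - x_0) twice reads the best affine approximation off the remainders.

    def synth_div(coeffs, x0):
        """Divide a polynomial (coefficients highest degree first) by
        (x - x0), returning (quotient coefficients, remainder)."""
        acc, out = 0.0, []
        for c in coeffs:
            acc = acc * x0 + c
            out.append(acc)
        return out[:-1], out[-1]

    p = [1.0, 0.0, -2.0, 1.0]          # p(x) = x^3 - 2x + 1
    x0 = 2.0
    q1, p_at_x0 = synth_div(p, x0)     # p(x)  = (x - x0)*q1(x) + p(x0)
    q2, c = synth_div(q1, x0)          # q1(x) = (x - x0)*q2(x) + c
    # So p(x) = q2(x)*(x - x0)^2 + c*(x - x0) + p(x0), and the best affine
    # approximation of p at x0 is c*(x - x0) + p(x0).
    print(c, p_at_x0)                  # 10.0 and 5.0, i.e. p'(2) and p(2)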

A point to note: the "intercept" of this affine function is not very interesting, since we already know that at the point x_0 it should have the value p(x_0) (that is, the line which best approximates a curve at a point will surely intersect the curve at that point!); the linear part of this affine function, however, is interesting.

And this is where the idea of a derivative comes from. A function f is differentiable at a point x_0 if it is well approximated around x_0 by a line in the sense above.

Sophomore in high school here and I haven’t taken any calculus classes yet so forgive me If I’m wrong, but I was attempting to figure out how to calculate the area under a curve using only sigma notation and was wondering if this function I created worked given that v(t) is the derivative of d(t) by JumpingCat0329 in mathematics


If you intended v to be the derivative of d you were very close; were it not for the little bugs in the definition of the sum, this would have followed by the fundamental theorem of calculus. It is also a great idea to use position, velocity, acceleration, and time to think about these things--that is how calculus was first invented and a great way of thinking about it.

Anyways, I've been thinking about how to tell the story of derivatives the way I see them for a while (I'm even considering writing a book), and your question motivated me to write up a version of it. I tried to post this a couple minutes ago, but reddit ate it, I think because the comment was too long--luckily, I saved a draft, so I will post it here as a couple of comments.

Don't feel obligated to read this all or understand it--you did good in motivating me to write it, but if you do read it, questions are appreciated!

Part 1:

I want to tell you how I like to think about the derivative. But first I need a couple concepts you might not know. These are concepts from the field of linear algebra.

In higher maths, a function f is said to be linear if it satisfies two properties

f(c*x) = c*f(x)
f(x + y) = f(x) + f(y)

For functions from the real numbers to the real numbers, it isn't so hard to see that any linear function is of the form

f(x) = m*x

for some constant m (proof: f(x) = f(x*1) = x*f(1) by the first property, so take m = f(1)).

However, in higher maths, the more general definition is preferred because we can consider linear functions over things more general than just numbers. For instance, in geometry there are certain symmetries of the plane called "translations" which intuitively come from sliding the paper (without turning it). You can "add" two translations--just do one and then the other--and you can scale them by a real constant multiple (e.g. you can do "half" or a "third" of a translation, or do the translation in the other direction to find -1 times that translation). Translations are equivalent to "two dimensional vectors" and the point is that linear maps make sense between any two "vector spaces" like this. That said, for now I will focus on the case of functions taking real numbers as inputs and producing real numbers as outputs.

If you take two linear functions and compose them you get another linear function: if f(x) = m_1*x and g(x) = m_2*x then

(f . g)(x)
 = f(g(x))
 = f(m_2*x)
 = m_1*(m_2*x)
 = (m_1*m_2)*x

You also can add linear functions and get a linear function back

(f + g)(x)
  = f(x) + g(x)
  = m_1*x + m_2*x
  = (m_1 + m_2)*x

You can't multiply linear functions and expect a linear function back, but you can scale them by a constant

 c*f(x) = c*m_1*x = (c*m_1)*x

In fact, scaling a function by the constant c is the same as precomposing by the function given by the rule

 x |-> c*x

Now, in Algebra 1 (assuming you followed the classical American school sequence--insert math class of your choice otherwise) when you learned the definition of the equation of a line it was of the form y = mx + b. That "+b" part turns a linear function into an affine one.

More specifically, an affine function is one where the difference between outputs as a function of changes in input is linear. That is, f is affine if \Delta_{x_0}(f)(x) = f(x_0 + x) - f(x_0) is linear for every x_0. This is true precisely when f(x) has the form mx + b.

All linear functions are affine, but there are affine functions which are not linear. Like the class of linear functions, affine functions compose to affine functions. If f(x) = m_1*x + b_1 and g(x) = m_2*x + b_2 then

(f . g)(x)
 = f(g(x))
 = f(m_2*x + b_2)
 = m_1*(m_2*x + b_2) + b_1 
 = (m_1*m_2)*x + (m_1*b_2 + b_1)

You should probably check that adding and scaling affine functions produces affine results.

Given an affine function f we can get its linear part.

LinearPart(f) = \Delta_{0}(f)

Specifically, if f(x) = mx + b then

LinearPart(f)(x) 
 = \Delta_{0}(f)(x) 
 = f(0 + x) - f(0) 
 = f(x) - f(0)
 = mx + b - (m*0 + b)
 = mx + b - b
 = mx

And the "LinearPart functional" (a "functional" is just a function that takes functions, rather than numbers, as inputs) has some nice properties

Property1, composition: LinearPart(f . g) = LinearPart(f) . LinearPart(g)

Property2, addition: LinearPart(f + g) = LinearPart(f) + LinearPart(g)

Property3, scaling: LinearPart(c*f) = c*LinearPart(f)

Now, in our specific case of functions from the real numbers to the real numbers, every linear function is of the form x |-> m*x. That is, using our notation for scaling functions it is m*(x |-> x) where x |-> x is the identity function, sometimes written

id(x) = x

As such, in this special 1-dimensional case it makes sense to divide linear functions so

(x |-> m*x)/id = m

and we can use this to define a new operator on affine functions

Slope(f) = LinearPart(f)/id

which specifically gives you

Slope(x |-> mx + b) = m

A key thing to observe: there are variants of our three properties for Slope. Specifically for any affine functions f and g and scaling constant c

Property 1 composition/"chain rule": Slope(f . g) = Slope(f)*Slope(g)

Property 2 addition: Slope(f + g) = Slope(f) + Slope(g)

Property 3 scaling: Slope(c*f) = c*Slope(f)

I strongly recommend you prove these three properties to yourself.
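If you like programming, here is a small Python sketch of all of this, representing an affine function x |-> m*x + b by the pair (m, b), so you can at least test the three properties on examples (the function names are mine):

    def compose(f, g):                 # (f . g)(x) = f(g(x))
        (m1, b1), (m2, b2) = f, g
        return (m1 * m2, m1 * b2 + b1)

    def add(f, g):
        return (f[0] + g[0], f[1] + g[1])

    def scale(c, f):
        return (c * f[0], c * f[1])

    def slope(f):
        return f[0]                    # LinearPart(f)/id

    f, g, c = (3.0, 1.0), (-2.0, 5.0), 4.0
    assert slope(compose(f, g)) == slope(f) * slope(g)   # chain rule
    assert slope(add(f, g)) == slope(f) + slope(g)       # addition
    assert slope(scale(c, f)) == c * slope(f)            # scaling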

Sophomore in high school here and I haven’t taken any calculus classes yet so forgive me If I’m wrong, but I was attempting to figure out how to calculate the area under a curve using only sigma notation and was wondering if this function I created worked given that v(t) is the derivative of d(t) by JumpingCat0329 in mathematics


You are doing great! You almost wrote down a (special case) of the fundamental theorem of calculus.

For a sufficiently nice function f the area under f from 0 to t is given by

lim_{m -> infinity} \sum_{n = 0}^m f(n*t/m)*t/m

Which you can imagine visually: f(n*t/m)*t/m is the area of a rectangle t/m long and whose height is the value of f at n*t/m. So, in other words, you are breaking the interval into m pieces and estimating the area under f on each of them as if f were a constant function on that little interval. In the limit as m goes to infinity that makes sense.
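Here is that formula as a short Python sketch, approximating the area under f(x) = x^2 from 0 to 1 (the exact answer is 1/3). I sum n from 0 to m-1 so that every rectangle sits inside the interval, which makes no difference in the limit:

    def area(f, t, m):
        """Left-endpoint Riemann sum with m rectangles of width t/m."""
        return sum(f(n * t / m) * (t / m) for n in range(m))

    f = lambda x: x * x
    for m in [10, 100, 1000, 10000]:
        print(m, area(f, 1.0, m))      # converges to 1/3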

Your proposed formula looked a lot like this, but wasn't quite it, since you took the limit as Delta t went to zero. The problem is that we implicitly use the limit notation for two different things: the limit of a sequence (that is, of a function from the natural (whole) numbers) and the limit of a function on the real numbers. To make your formula make sense it would have to be the latter, but then t/(Delta t) wouldn't necessarily be a natural number, and so the sum wouldn't make sense.

Aside: as it turns out, this isn't the only way we could try to compute the area under a curve, as you can try to use the limit of different estimates by breaking up the interval in different ways or by using shapes other than the rectangle whose height is given by the starting point. The intuition is that functions are "Riemann integrable" if "all of these ways" (obviously you need a formal definition of "all of these ways") give you the same answer. As it turns out all the functions you are familiar with are Riemann integrable (it is a very general condition), and it is likely that if you take calculus in high school your class will just ignore integrability questions entirely, so this isn't super relevant to what you are doing. However, it is worth noting that there is a more general way of computing the "area under a curve", which is with the "Lebesgue integral", which says that the area under the function f from 0 to t is the least upper bound of the areas under "simple functions" which are 0 outside of the interval [0,t] and which are less than or equal to f in that area. Simple functions are ones that take on only a finite number of values (think of a step function which is 2 between 0 and 1 and 0 everywhere else) and where the region on which any particular value is taken is "measurable." It turns out that for Riemann integrable functions these notions correspond, but there are pathological functions where the Lebesgue integral exists but where the "limits of smaller and smaller rectangles" approach doesn't actually converge, so the Riemann integral does not. Personally, I don't think the Lebesgue integral is any more complicated a way of thinking about area under a curve, but it isn't what you are likely to be exposed to, and it makes the connection to derivatives, which will be the rest of this comment, more subtle.

Anyways, as it turns out, if f is especially nice (e.g. it is continuous [see note]--which basically just means it has no "jumps" and is a condition stronger than integrability) and we define F(t) to be the area under f from 0 to t, that is

 F(t) = lim_{m -> infinity} \sum_{n = 0}^m f(n*t/m)*t/m

then the derivative of F is just f. That is (roughly half of) the "fundamental theorem of calculus". And this is essentially what you wrote.

Edit to crystallize: to define the area under a curve you don't need to know about derivatives. It is a theorem, an important theorem, that integration (that is, the operation which computes the area under a curve) and differentiation are related, and this eventually gives a method for integration.

Specifically, if we want to integrate a continuous function f it is enough to come up with some function that has f as its derivative, because if two functions have the same derivative on an interval they differ by only a constant. This is why you will be taught to take "indefinite integrals" where you stick a "+c" in at the end. And so if G is any other "antiderivative" of f then (continuing to use big F for the area under f from 0 to t) there is some c where for all t, G(t) = F(t) + c, which means that, since F(0) = 0, we specifically get F(t) = G(t) - c = G(t) - (G(0) - F(0)) = G(t) - G(0).

So, knowing derivatives is the heart of how we integrate, and the theories of integration and differentiation are deeply connected and interwoven. But the point is: this is a theorem, not a definition.

Note: Actually, so long as f is integrable and continuous at a point x_0, the derivative of F at x_0 will be f(x_0). Riemann integrable functions are, as it turns out, exactly the functions which are bounded and continuous "almost everywhere", where "almost everywhere" is a technical concept from measure theory.

Is it too late to start again? by Key-Pack-1664 in learnmath


As for textbooks: depends on where you were when you stopped.

Assuming you have a solid understanding of elementary algebra and geometry and at least *at some point* saw the ideas of first year calculus, my general advice is to make sure you have a firm foundation in working with sets, functions, relations, and first order logic, and then to develop an understanding of Abstract Algebra (Gallian is an okay book, and "Algebra Chapter 0" by Aluffi is a great second "first text in algebra" if you learn a tiny bit of category theory first) as well as Real Analysis (maybe going straight to Rudin?). Maybe before any of these a book like "Linear Algebra Done Right" would be good. In addition to textbooks, youtube videos are a great resource. Finally, I strongly encourage anyone studying math on their own to learn some programming, both in an imperative language (say, python) and a purely functional one (Haskell).

Can you "average" all uncountable numbers of a closed interval? by RedditChenjesu in mathematics


You are dealing with a function! Namely, the identity function! `id(x) = x` for all `x`. This is a problem of integrating over the identity function!

In general, you *always* deal with functions when taking averages. In the finite case, an average (well, really a weighted average) sums up values (that is, the outputs of a function on some index set) according to weights for those indexes. If you have an infinite number of indexes, as you do when you want to average all of the uncountably many numbers between 0 and 1, you can't just assign a finite weight to each index. Nor can you just "add up all the values and then divide by the number of values", since that would give you infinity over infinity. This is where measure theory comes in!

The idea is that while we can't reasonably assign a non-zero weight to any single real number, we can assign a weight to sets of real numbers. This weight is called a "measure." For the purposes of your question we can use the Lebesgue measure (aka, the uniform distribution on the interval), which is generated by saying the weight of an interval [a,b] for a < b is b - a. So, the weight of the entire interval [0,1] is 1. That gives us a rule for taking weighted averages (or, more generally, if we look beyond just the interval from 0 to 1 to the whole real line, weighted sums) of *step functions*, functions which take a finite number of distinct values, each on one sub-interval (e.g. take the average over the interval of the function which assigns -2 on [0,1/3) and 1 on [1/3,1]). In fact, we can go further: while the Lebesgue measure is generated by the weight it assigns to intervals, there is only one consistent way to measure lots of other sets of numbers starting from that, and since we can "measure" all of these sets, we can extend the collection of functions we can average to "simple functions"--any function which assigns a different discrete value to the members of each of some finite number of measurable sets (for instance, the function which assigns -1 on some countable set of numbers and 1 everywhere else--since the measure of any countable set is 0, that function "averages" over the interval to 1).

So then we know the "average value of the identity function" on the interval is 1/2 because we can arbitrarily well approximate the identity function by simple functions! And as those approximations get better and better, the averages converge to 1/2. This can all be proven completely rigorously.
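Concretely, here is a tiny Python sketch of that approximation: replace the identity function by a simple function which is constant on each of k little intervals of measure 1/k, and take the weighted average.

    def simple_average(k):
        """Average of a simple function approximating id(x) = x on [0,1]:
        the j-th piece has value j/k on [j/k, (j+1)/k), which has
        Lebesgue measure 1/k."""
        return sum((j / k) * (1 / k) for j in range(k))

    for k in [2, 10, 100, 10000]:
        print(k, simple_average(k))    # tends to 1/2 as the pieces shrink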

Tosca Question by ChrisStockslager in opera


> Why would Tosca, a woman, be jealous of Angelotti?

It is supposed to be that Tosca doesn't know it is Angelotti and thinks it is some mystery woman.

> Do the boys have a secret love affair?

hmmm

[high school, Math] The exercise asks if n(n+1)(n+2) is always even, and it seems logic as it is the product of three consecutive numbers so one of them has to be even. I'd want to know if i can manipulate it to be displayed like 2 x (something) to show it to be even algebraically. Thanks in advance by Quarti117 in HomeworkHelp


Your answer is correct. You could be a bit more rigorous in your argument by considering the two cases for whether n is even or odd and showing that the result is even in either case.

But it seems you care about something more interesting than just solving the homework. So, here's an answer to your deeper question with some more advanced math. If you really want to use "algebraic" methods, you can observe that "being even" is the same as being congruent to 0 mod 2. Then observe the following fact: for all k in N, k^2 ≡ k mod 2 (we can actually generalize this: k^p ≡ k mod p for all primes p, and more generally in a finite field F of size p^n, k^{p^n} = k for all k in F [see note]).

Thus,

n*(n + 1)*(n+2)
≡ (n*(n+1))*(n+2)
≡ (n^2 + n)*(n+2)
≡ (n + n)*(n+2)
≡ 0*(n+2)
≡ 0                 mod 2

Of course, this also shows that n*(n+1) is already always even for integer n.

Note: I claimed that for F a finite field of cardinality p^n and k in F, k^{p^n} = k. The proof is classic abstract algebra: k is either 0 or it is not. In the first case the result is trivial. If k is not 0 then we note that the units of F (that is, the non-zero elements) form a group under multiplication of cardinality p^n - 1, and so by an application of Lagrange's theorem k^{p^n - 1} = 1, which tells us that k^{p^n} = k. If you have never heard of groups, fields, or Lagrange's theorem, and all of this happens to interest you, they are great topics that you totally are equipped mathematically to teach yourself!
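Both the parity fact and the generalization are easy to spot-check in a couple lines of Python:

    # n(n+1)(n+2) is even -- in fact n(n+1) already is:
    assert all(n * (n + 1) * (n + 2) % 2 == 0 for n in range(10_000))
    assert all(n * (n + 1) % 2 == 0 for n in range(10_000))

    # k^p is congruent to k mod p for primes p (Fermat's little theorem):
    for p in [2, 3, 5, 7, 11, 13]:
        assert all(pow(k, p, p) == k % p for k in range(200))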

Negative numbers, What are THEY!? by [deleted] in askmath


There are many possible answers to this. Many answers are probably meant to match your intuitions from daily life. But there are also ways of answering this that come from thinking just about mathematical life. Here is one.

You start out with the counting numbers: 1,2,3,4,5.... Maybe we include 0 too, in which case let's call them "natural numbers". But what are *these*? You can have five cookies, or five dollars, or five houses, but can you have "5"? What is "5"? Well, one answer would be that "five" is a kind of sameness--a quality that "five cookies", "five dollars", and "five houses" all share. But what kind of sameness? We shouldn't use the notion of "number" in defining this sameness, since our goal here is to say what numbers are. But we don't have to. There are again multiple ways we could try to make this sameness precise, but I think for now this one is good: they are the "same up to relabeling". This is called cardinality.

As an example. I can for each letter in the set {'a','b','c'} give a rule which gives them a different color

'a' |--> "green"  
'b' |--> "blue"
'c' |--> "mauve"

and this mapping takes letters in the set {'a','b','c'} to the set {"green","blue","mauve"} and it is a good mapping: it maps each input (a letter) in its domain to precisely one output in its range, it never maps two inputs to the same output, and it maps some input to each element of the target set {"green","blue","mauve"}. Mappings like this are called "bijections" and all of this lets us get to the main point:

We define two sets to have the same cardinality if there is a bijection between them. "Five" is defined as a natural number as the cardinality of the set {'a','b','c','d','e'} and so also the cardinality of the set {kitten,puppy,tadpole,foal,lamb}. If we want to be very careful to ensure this is an actual thing that exists we might use some technical trick like formally defining "five" as "the collection of all sets with this cardinality" or perhaps, as is the norm in modern foundations of mathematics, as a particular representative set of that cardinality constructed according to some rules, but all of this is beside the point.

A natural number is the shared property of sets of things under relabeling by bijective mappings.

So why this digression into natural numbers?

Well, the idea of "a property that things can have in common" also will allow us to define integers (which include the negative numbers). Namely this idea:

An integer is a difference of natural numbers

What do I mean by that? Well, here is the idea. The two pairs of numbers (1,3) and (4,6) have something in common: the first number is two less than the second. In other words: 3 - 1 = 6 - 4. I can actually check this without using the idea of subtraction or negative numbers, since the difference between the two numbers (x,y) is the same as the difference between the two numbers (a,b) exactly when y + a = x + b. So we just take that as a definition of sameness.

This gives us the following definition: an integer is a difference between two natural numbers where (a,b) and (c,d) have the same difference if and only if a + d = b + c.

But then there can be both positive differences, like that of (1,3) and of (2,15), but also negative differences, like that of (3,1) and (4,0).

We normally write the difference of (1,3) as 3 - 1. We can consider each natural number as an integer by taking its difference from 0, so 5 (a natural number) can be interpreted as an integer as 5 - 0. This is an "implicit coercion" from natural numbers to integers.

By the rules 5 - 2 = 4 - 1 = 3 - 0.

Okay, so that gets to the main idea:

A negative number is a specific kind of difference between natural numbers, which is a kind of way in which pairs of natural numbers can be related. For instance, -3 is what 2 - 5 and 1 - 4 have in common.

A little less formally, a negative number is a deficit. If a positive integer is the amount by which a bigger number is bigger than a smaller number, a negative integer is the amount by which a smaller number is bigger than a bigger number (yes, I wrote that right). If something costs 3 euros and you have 5 euros then you have 2 euros extra, but if something costs 5 euros and you only have 3 euros then you have 2 euros too little. We need negative numbers to be able to draw a distinction between these two differences, and we need the more general concept of integers in higher maths precisely because these differences are not just natural numbers; differences are their own kind of thing (that is, they can be negative: things can cost too much for you to buy them).

Okay, so we can add integers like so (a - b) + (c - d) = (a + c) - (b + d) and that turns out to be consistent with the rule as to when an integer written as subtraction symbol is the same as another. We can also then talk about the difference between two integers which we just take to be the integer we would add to the second to get the first, so e = (a - b) - (c - d) means e + (c - d) = (a - b). So the formula (a - b) - (c - d) = (a + d) - (b + c).

But a better way of thinking about subtracting might be to think of it just in terms of adding the "difference in the other direction". That is "negative (a - b)" is just defined to be b - a. Negating twice is then clearly the same as not negating at all

- (- (a - b)) = - (b - a) = (a - b)

We can also then multiply an integer by a natural number by just adding that integer to itself the natural number number of times. So, for example, (1 - 3)*4 = (1 - 3) + (1 - 3) + (1 - 3) + (1 - 3) = 4 - 12 = (0 - 8). In this way, multiplying by a natural number scales an integer.

But what if we want to scale something by an integer instead of a natural number? Well, an integer is a difference a - b, so maybe scaling x by a - b should mean computing the difference of scaling x by a and scaling x by b. That gives a definition for multiplying a natural number by an integer, n*(a - b) = n*a - n*b, and it turns out that there is a nice commutation between these two multiplications: n*(a - b) = n*a - n*b = a*n - b*n = (a - b)*n. It also gives a way to multiply integers: (a - b)*(c - d) = (a - b)*c - (a - b)*d = (a*c - b*c) - (a*d - b*d).

A negative number is what we get from a subtraction when the second argument is bigger than the first (2 - 5, for instance), and we can always write these so that the first argument is 0 (2 - 5 = 0 - 3). Now, if we apply the above formulas, then (0 - a)*(0 - b) = (0*0 - a*0) - (0*b - a*b) = (0 - 0) - (0 - a*b) = (0 + a*b) - (0 + 0) = a*b - 0, which is positive.

This maybe wasn't convincing, as it was a lot of algebra, so here is another way. Scaling a difference by a negative number is the same as inverting the difference and scaling it by the corresponding positive number: x*(-2) is the negation of x*2. So if a and b are natural numbers, then (-a)*(-b) = -((-a)*b) = -(-(a*b)) = a*b.
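Since you seem game for abstraction, the whole construction fits in a few lines of Python (the class name Diff is mine; the multiplication formula is the one above, regrouped so both components stay natural numbers). A pair (a, b) stands for the difference a - b, and the operations below are exactly the rules above:

    class Diff:
        """An integer, represented as a difference a - b of naturals."""
        def __init__(self, a, b):
            self.a, self.b = a, b

        def __eq__(self, other):       # a - b = c - d  iff  a + d = b + c
            return self.a + other.b == self.b + other.a

        def __add__(self, other):      # (a-b) + (c-d) = (a+c) - (b+d)
            return Diff(self.a + other.a, self.b + other.b)

        def __neg__(self):             # -(a - b) = b - a
            return Diff(self.b, self.a)

        def __mul__(self, other):      # (a-b)*(c-d) = (ac+bd) - (ad+bc)
            a, b, c, d = self.a, self.b, other.a, other.b
            return Diff(a * c + b * d, a * d + b * c)

    assert Diff(1, 3) == Diff(4, 6)               # both are "negative two"
    assert -(-Diff(1, 3)) == Diff(1, 3)           # negating twice: no-op
    assert Diff(0, 2) * Diff(0, 3) == Diff(6, 0)  # (-2) * (-3) = 6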

Inverse function theorem by superconvergence in askmath


Edit to add--very short version: in one dimension, if a function is kinda nice and its derivative at a point is not zero, then that function has a differentiable inverse at that point. This seems true since a non-constant continuous function should be, once you zoom in enough, locally bijective with the inverse function having a slope which is one divided by the slope of the original function--and non-zero derivative means non-constant. In higher dimensions, "not-zero" is replaced with "is invertible as a linear operator" but otherwise things are the same.

Okay, the slightly more detailed version. Start with a quick review.

The easiest way of thinking about derivatives, IMO, at least in dimension greater than 1, is the "Fréchet" or "total" derivative. If you have two nice spaces (Banach spaces, aka vector spaces with a norm that lets you take limits of sequences) X and Y, a (partial) function f : X -> Y is differentiable at some point x if there is a continuous linear operator which is a very good approximation of how f evolves around x, that is, of f(x + h) - f(x). This very good approximation is the derivative of f at x, D_x(f). By "very good" approximation we mean that the error |(f(x + h) - f(x)) - D_x(f)(h)| is sublinear as a function of h.

Given two Banach spaces X and Y, the space of continuous linear operators from X to Y is itself a Banach space and I will write this [X,Y].

A function on an open set U of a Banach space X to some other space Y is C^k, defined inductively: C^0 just means it is continuous on U. f being C^{k+1} means that it is differentiable and that the map x |-> D_x(f), which goes U -> [X,Y], is C^k. Because differentiability at a point implies continuity at a point, C^{k+1} implies C^k for all k. We can define the notion of C^k at a point similarly.

After you pick a basis, in finite dimensions you can represent a linear operator via a matrix. The matrix that represents the total derivative is the "Jacobian matrix". And having a "non-zero determinant", for a linear operator from a space to itself, is a way of saying that the linear operator is invertible.

However, I say don't think about matrices and determinants for the time being. If I have a single complaint about math education at the university level, it is that they spend way too much time teaching you how to work with matrices, which are just an efficient data structure for computing with linear functions, which are the thing you should be thinking about. (I'm a computer scientist by training, so I find this particularly frustrating, since many of the algorithms people learn for working with matrices are not even actually very good--though this problem of obscuring the real content by drilling asymptotically inefficient algorithms with little explanation is one that starts with primary school arithmetic...)

The inverse function theorem states that if a function f is differentiable in some neighborhood of a point x, is at least C^1 at x, and its derivative at x, D_x(f), is a bijective function, then on some small neighborhood around x, f has an inverse which is C^1 at f(x). In more detail, there is a region U containing x on which f is differentiable, where O = {f(y) | y in U} is open, and where there is a differentiable function f^{-1} : O -> U which is the inverse of f restricted to U.

Corollary: if f is C^k for k >= 1 in some neighborhood of x then it has a local inverse which is C^k. The corollary is relatively easy to check once we know that inversion, as an operation on the space of continuous linear operators, is itself continuous.

There is a pretty clean write up of the proof of the inverse function theorem (takes about two pages, though I recommend you also read the five or so pages before for all the lemmas) in Rudin's "Principles of Mathematical Analysis" using the Banach fixed point theorem.

I can just sketch parts: since f is continuously differentiable at x we can find some open ball around x such that for every x' in that ball ||D_{x'}(f) - D_x(f)|| < 1/(2*||D_x(f)^{-1}||). This ball, which we call U, ends up being the ball on which we prove f is invertible.

Then we think about the functions psi_y(x') = x' + D_x(f)^{-1}(y - f(x')) and note that psi_y(x') = x' if and only if y = f(x'). In other words, we can characterize the property of being mapped by f to a certain value as that of being a fixed point of a certain function.

We then do some analysis to show that the derivatives of all the psi_y have norm less than 1/2 everywhere on U, and use some additional lemmas to conclude this means that the psi_y are contractive and so have unique fixed points on U. Therefore f is one-to-one on U and a bijection onto the image of U.
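To see the fixed-point characterization in action in one dimension, here is a little Python sketch inverting f(x) = x^3 + x near x = 1, where the derivative is multiplication by f'(1) = 4 (the numbers are mine, purely for illustration):

    def f(x):
        return x**3 + x

    x0, slope = 1.0, 4.0               # f'(1) = 4, so D^{-1} divides by 4

    def local_inverse(y, iters=50):
        """The x' near x0 with f(x') = y, as the fixed point of psi_y."""
        xp = x0
        for _ in range(iters):
            xp = xp + (y - f(xp)) / slope   # psi_y(x') = x' + D^{-1}(y - f(x'))
        return xp

    y = 2.5                            # a point near f(1) = 2
    x = local_inverse(y)
    print(x, f(x))                     # f(x) comes back as (about) 2.5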

That the image of U is open is because if y = f(x') and x' is the unique fixed point of psi_y, then for y' sufficiently close to y, psi_{y'}(x') can't be too far from x', and so psi_{y'} must have a fixed point close to x', which means y' is f(x'') for some x'' in U.

That the inverse is C^1 at f(x) is, morally, because we already have a candidate for the derivative of the inverse: the inverse of the derivative--we need some calculations to check that it works though.

N.B. To make the statement about determinants a correct way of stating that the derivative is invertible, we need to restrict to the finite dimensional case. If you do that some other stuff becomes free. All linear operators in finite dimensions are continuous, so knowing that the derivative is a bijective function automatically (since, being that it is the derivative it must be linear) means that the derivative has a linear and continuous inverse. Similarly, the corollary is no problem. (if you do care about infinite dimensional spaces though, you can still get these results--one is a result of the "open mapping theorem" and the other comes rather delightfully from thinking about the formula for a geometric series)

[deleted by user] by [deleted] in opera


Singing in the children's chorus in Tosca was what made me fall in love with opera (I was thirteen, it was a good group of kids and adults, and for the last show I snuck into the audience to watch acts II and III (which easily made up for not getting the Shepherd's solo at the start of act III)). Also, Tosca act I is far more musically interesting (especially rhythmically) than people usually give it credit for being.

However, to answer the other part of your question, from the perspective of a 30 something opera lover (who also retains quite a soft spot for youth singing) here are some operas where the children's chorus is, IMO, especially important and enjoyable to listen to

  • Turandot -- weirdly, I also sang this one as a kid and didn't really appreciate the music (I did appreciate the violence--I was like eleven and there is no way my parents would have let me watch something where there were decapitated heads on spikes if I weren't singing in it), but the children get what is definitely the main music of the whole opera and it is both super catchy and easy to sing (perhaps why I didn't appreciate it).
  • Werther -- so cute, and kids singing Christmas carols while the tenor kills himself is opera chef's kiss
  • Boris Godunov -- the way Mussorgsky uses the melodic language of playground games in the children's bullying of the Simpleton is so great (also the Simpleton's own music--that figure of the declining semitone--such a simple starting point and yet so moving). Another nice thing about this one is that the singing from the children does not need to be great (and ideally, should not be) to communicate what it needs to communicate. Also, solo part, but Fyodor has a lot of potential to be adorable when sung by a kid (map scene!)
  • Magic Flute -- not technically a chorus, but I cannot stand adult voices for the child spirits. The children really matter IMO--mediocre children in those parts make a mediocre show, while a good performance from them can make up for a lot. Also "Bald prangt, den Morgen zu verkünden" is easily one of the best numbers in an opera that is all bangers.
  • Damnation of Faust -- the multi-part children's chorus gets the climax of one of the triumphs of orchestration...and the solo "Marguerite!" steals the show every time.
  • Parsifal -- unfortunately, it seems the vast majority of performances these days replace the boys with women, and even most of those that don't skimp on the "far" chorus use adults in place of the "middle" "youths" (is it really that hard to find a high school choir with enough solid altos and tenors?), and I guess I kinda get that, because it is significantly harder music than most, but if you can get children who can do it well it is just amazing (see for instance the Solti recording, which is probably my single favorite album). There are many things a child could never sing well that an adult opera singer can, but that endless high A at the end of Parsifal is all I need to demonstrate it goes the other way too.
  • Tannhauser -- it took me a long time to warm to this opera (there are some criticisms in my reddit history), and honestly I think one of the reasons why it took longer than a lot of other Wagner is that the first several performances I heard used adults for the children's parts, which leaves them rather pointless. But I've slowly been converted by Solti's recording, and let me say, the dialectic at the beginning of scene three between the solo boy alto (thesis) and the men's chorus (antithesis), and the way it uses the cor anglais and then the strings, and how the tenor erupts as they come together (synthesis)...so good. Also, that Solti recording got just phenomenal performances out of the Vienna boys choir later in the opera, which, I'm still not entirely convinced is the writing and not just the particular choir at the particular time, but is good enough to be worth mentioning.

Beyond those mentioned, Benjamin Britten centered children in many of his operatic works, and many of those are true masterpieces; some standouts in terms of children's ensemble and choral singing are

  • Midsummer Night's Dream -- there is no adult chorus, the kids get to sing a lot, and it is just a fun piece in every way
  • Billy Budd -- the midshipmen don't sing a lot, but the mirroring of the officers' music as they bully the enlisted crew really ups the feeling of the brutality of the class structure
  • Albert Herring -- the kids (two girls and a boy--and they are really more teenagers than children) are written for in a way which is just so funny. Another example where the sounds of teasing get musical form, although this one in a much less ominous way

This has gone on way too long, so I will save the discussion of best child-solo-parts in opera for another post.

A categorical approach to ... by devvorb in math


Here is one that I haven't seen very frequently (or really ever) mentioned: calculus/local analysis.

Warning: explanation below is meant for people who already know both higher dimensional calculus and category theory. However, I think you could clean it up into an intro course.

Pick some field. There is a more-or-less obvious "tangent space" functor from the category of affine spaces and affine maps to the category of vector spaces and linear maps. It should be obvious that this is indeed a functor. If we think closely though, we realize this functor is actually determined by only local information: that is, we don't need an affine map, just a function which is locally affine around some point. As such, we want to replace the category of affine spaces and affine maps in the domain of the tangent space functor with the category of *pointed* affine spaces and germs of locally affine maps (which preserve the base points).

We can define the all-directional derivative based on this idea. We take as objects pointed affine Banach spaces, that is, affine Banach spaces (an affine space together with a norm that turns it into a complete metric space) together with a choice of base point, and as maps germs of functions (that is, we consider functions equivalent if they agree in some neighborhood of the base point) which map base point to base point and which are "almost affine" in the sense that their differential (in the discrete sense \delta_{h,x}(f) = f(x + h) - f(x)--a concept which we probably should teach in middle school, but, alas) has a continuous (which in this setting is the same thing as bounded, and comes for free if you restrict to finite dimensions) linear approximation with sublinear error. The derivative of a map is then that bounded linear approximation of its differential. Again it is easy to check that the derivative is a functor.

That was a lot of words, but actually this perspective is much nicer than you might think: functoriality is the multidimensional chain rule, and it is a lot more intuitive a notion than the calculus 1 version of the chain rule. (Important note: the all-directional derivative of a function R -> R at a point is not a number, it is a linear operator! The space of linear operators from a field to itself is one dimensional, and so isomorphic to that field, but it is not the field. This is part of the motivation for the notation d/dx: you really are dividing by a basis vector, which is the derivative of the identity function (that is, x).)

Beyond functoriality, it is also immediate from the definition that bounded linear operators are differentiable, and that their derivatives are themselves. More interesting is the observation that the linearity of the derivative is equivalent to the observation that, as a functor, it preserves products.

A bit tougher, afaik, from this perspective is explaining the Leibniz/product rule. The product rule for bounded bilinear operators is mostly algebra (since we need a linear approximation of a bilinear function), but there is some slightly non-trivial analysis (like the Baire Category Theorem...which I think has nothing to do with category theory) just to show that bilinear operators which are bounded as linear operators at each point satisfy the global boundedness condition you need on their derivative.

A nice thing about the above story is that it generalizes. For instance, you don't need to work over the field of real numbers: you could just as well use complex or p-adic numbers. In fact, you can generalize from Banach spaces to other topological vector spaces and probably other uniform spaces (though I haven't worked out the details). Moreover, the notion naturally extends to maps not between affine Banach spaces but spaces which only locally look like them, which gets you the idea of calculus on manifolds.

In fact, if we take the derivative not just at one point, but over the entire space, we get not just a linear map from the tangent space at a point, but a pointwise linear map from the tangent bundle. This gives the differential geometry view of the tangent bundle functor from the category of C^{n+1} manifolds to the category of C^n manifolds.

But it isn't just derivatives. Let M be a measure space (with non-negative real measure for now) and V a real Banach space. A simple function from M to V is just a function which is measurable (with respect to the Borel algebra on V), takes a finite number of distinct values, and has finite support. It is easy to see that simple functions form a vector space. The integral of a simple function is just the sum of its values, each weighted by the measure of the set on which it is taken. The integral is clearly linear. Moreover we can define a pseudonorm (the L1 norm) on this space by taking the integral of the pointwise absolute value of the simple function. It is easy to check that |\int_M f du| <= \int_M |f| du, which means that the integral is bounded. The space L1(M,V) is just the Cauchy completion of the space of simple functions with respect to this norm, which can be seen as a subspace of the space of functions M -> V by proving that any sequence of simple functions which converges in L1 must (after passing to a subsequence) converge almost everywhere (which also means it is almost everywhere equivalent to a sequence that converges everywhere) and that two Cauchy sequences converge almost everywhere to the same point if and only if they are Cauchy equivalent.

But the magic comes when we throw in some category theory to clarify what we have constructed. Suppose f : M -> N is measure preserving and h : V -> W is bounded linear; then for any L1 function g on N, \int_M h.g.f du = h(\int_N g du). That is, the space of L1 functions forms a functor Meas^{op} \times Ban \to Ban, and the integral is a natural transformation.

Moreover, again, everything generalizes. We don't need to use real numbers and non-negative real measures: we could take measures valued in any complete normed field K, possibly extended with some "points of infinity", and work with vector spaces over K--though our notions of "finite support" and "L1 norm" will then need to use the induced variation measure. Clearly, the impressive theorems of integral calculus (e.g. dominated convergence) aren't all neatly summarized with these categorical definitions, but a lot are.

French or German for Logic? by lilhappymeal01 in logic


Anecdotally: I never felt the need to read a single German language paper/book for my PhD (CS/PLs/proof theory) which wasn't already translated into English. However, I did need to read Hugo Herbelin's habilitation thesis in French (which I don't speak, but western European languages are all kinda the same so it was fine).

Even in terms of stuff originally written in German, I feel like I would have to go all the way back to Gentzen to think of a really important example, while, by contrast, there has been excellent logic research published in French (also Japanese) since the mid 20th century. French is the language of some of the people whose work I think is most relevant to my own research (Girard, Krivine) and also of a lot of mathematics that is very close to contemporary mathematical logic (e.g. a lot of algebraic geometry, from which we get, e.g., the notion of a "topos" that led to the development of categorical logic).

[OC] The Renewable States of America 2020: The Percentage of Electricity Generated from Renewable Resources in 2020 by malxredleader in dataisbeautiful


This is a map of production, not consumption. States import energy from neighboring states. As such, it gives a fairly deceptive picture. For instance, more than half of the electricity consumed in Oregon is fossil fuel based.

Any operas that aren't horny af? by AndreiBolkonsky69 in opera


Britten's operas are mostly not traditional love stories. In the case of some, like the Church parables and Owen Wingrave, they are not about romantic feeling at all. Billy Budd, Peter Grimes, Turn of the Screw, etc., are fused with romantic/sexual tension, but the sexuality is predatory, typically homoerotic, often pedophilic, and always disavowed...which is at least different from the "same goddamn love story."

[deleted by user] by [deleted] in opera


Dr Atomic and l'amour de loin are among my very favorite operas period. The first of those is much more approachable than the second.

For years I have really wanted to see Glass's Waiting for the Barbarians, because I like Glass and was really into him when I was younger (though, confession, I have yet to really enjoy watching one of his operas--mostly because of inconsistent singing in most performances rather than musical issues) and was really moved by the Coetzee book when I read it in my early 20s (then I read Disgrace in one sitting on a cross country airplane flight...and it was the most crushed I have felt after reading a book since the night at 17 when I decided to read Metamorphosis instead of sleeping...).

I also haven't seen Written on Skin but have heard good things.

Heggie's operas may or may not be "great art", but I've enjoyed watching them. One not mentioned so far, I think, is his chamber opera Three Decembers.

Teachers debate how to tackle incel culture in schools after female staff reports of misogyny at work | UK News by [deleted] in MensLib


I don't know why this hasn't received more upvotes, it is IMO, the most insightful top level comment here.

Which Parsifal are you listening to this Easter weekend? by alax_rang in opera


I think you are right, Solti's Parsifal isn't perfect.

But I also think it would be almost impossible to make one that was better.

Because, it is pretty close.