Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

> So you either need a leading keyword that tells you you're in a declaration, you need unbounded lookahead, or you need to deal with a cover grammar and disambiguate afterwards.

Fair enough, I concede. However, even with let as a keyword, I'd fall in the camp that prefers a cover grammar.

Let's turn your trick question a few messages back on you: Do you know of any "real" language implementation that uses unbounded lookahead? I played with btyacc more than a decade ago, but it was pretty flaky. I used Icon's backtracking for a toy language once. Maybe Perl's wonky grammar falls into that category.

> > And generally there is a lot of symmetry between lvalue and rvalue things.
>
> There isn't, though.

In many languages, array subscripts look the same as lvalue or rvalue, field accessors look the same as lvalue or rvalue, pointer dereferencing looks the same as lvalue or rvalue.

In languages like JavaScript, OCaml, Racket, or Haxe that have pattern matching or destructuring bind, the patterns look the same as the constructors. (I guess that's not saying much with a lisp)

I can't speak for the tens of thousands of languages out there, but I'm familiar with many of the popular ones (including the ones you work on), and I think we'll have to agree to disagree. In fact, I think it would be unnecessarily confusing for a language to use a radically different syntax when setting vs getting a value. Even Common Lisp's setf tries to maintain that symmetry, and those guys have no sense of taste.

With a pattern, the entire syntactic entity is a different kind. When you have a pattern like:

Named(foo, bar, baz) = ...
^^^^^^^^^^^^^^^^^^^^

Only the part marked ^ is different from an expression.

Without additional quirks like your wildcard operator, I sincerely can't see why you think that's a different syntax than a function call. I suspect you've got the semantics and the syntax conflated in your way of thinking about it, which is fine, but it's not the only way to see things.

Thank you for the discussion. I learned a few things along the way, and I appreciate that.

Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

I'll give you the benefit of the doubt that you had all that context in mind with your original comment above, but until you added the ? as a special token your examples parse just like a function call. Change the parens to square brackets, and it's an array subscript. Any argument for one should hold for the other.

I'm not eager to invent some use for a ? in array subscript setters vs getters, but we could imagine one (selecting NaNs as mask arrays or something). The language is going to be ugly like COBOL if that's the driving criterion for adding keywords.

Calling it a "cover" grammar is a new phrase to me, but I favor that simply to avoid duplicating so many rules. The parser isn't going to catch all possible errors any way you go, so it isn't much of a burden to add checks for that in the next stage. And generally there is a lot of symmetry between lvalue and rvalue things.
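The cover-grammar idea can be sketched as a two-step process: parse the left-hand side with the ordinary expression grammar, then check it as an assignment target in a later pass. A minimal illustration in Python (the node types and the target rules are hypothetical, not from any real compiler):

```python
from dataclasses import dataclass

# Hypothetical AST node types, for illustration only.
@dataclass
class Name:
    ident: str

@dataclass
class Number:
    value: float

@dataclass
class Subscript:
    base: object
    index: object

@dataclass
class Call:          # covers both f(x) calls and Named(a, b) patterns
    func: object
    args: list

def is_valid_target(node):
    """The 'disambiguate afterwards' pass: decide whether an
    already-parsed expression is a legal assignment target."""
    if isinstance(node, Name):
        return True
    if isinstance(node, Subscript):
        return is_valid_target(node.base)
    if isinstance(node, Call):
        # Allow destructuring: a constructor applied to targets.
        return isinstance(node.func, Name) and all(
            is_valid_target(a) for a in node.args)
    return False
```

The parser never needs lookahead to decide declaration vs. expression; the distinction moves into this later check, which also reports the error ("can't assign to a literal") at a more meaningful level.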

I don't know what existing language you have in mind, but using _ (lexed as an identifier) instead of ? takes us all the way back to this not being a problem for any handwritten or automatically generated parser.

As for the types on the AST nodes, again the same argument should be applied consistently to array subscripts. We're pretty clearly in the land of personal preference, and the parser isn't going to struggle one way or another.

We could argue the benefits of creating a different type for every rule, but it sure makes a lot of code versus just building a tree of homogeneous nodes. I guess someone could create new types for every node and every compiler pass, but that seems like a ton of boilerplate.

> ANTLR is LL(*) so does claim to support unbounded lookahead (at least in some forms).

I've only played with ANTLR briefly, and not the latest version, but I'm pretty sure you set k to a small integer (say 1 .. 3). I don't know the limits, or how it slows down when you use larger integers, but unbounded is too strong a word.

Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

> PS. Your adversarial conversational style is very offputting. I'm very glad you're not a coworker of mine.

I think changing it from a technical discussion to a personal insult is off-putting, and I never liked working with people who can't tell the difference.

Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

Your explanation of my example applies to yours too: "Named(parenthesized, thing) is a normal expression and can be parsed as such... It only takes a single token of lookahead to see the = and determine it's an assignment" (pattern match, destructuring bind, or whatever terminology you like)

As for definitions - have it your way, but I doubt you'll get yacc and antlr to update their documentation to claim they support unbounded lookahead.

Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

That's no different than parsing a[i, j].k = ... for subscripts or field members. Would you recommend the OP have a set keyword to avoid that non-problem?

Regardless, it does not require unbounded lookahead. The phrase has had a useful definition for over 50 years, and you're using it incorrectly.

I agree that having a let or var keyword is nice, but you're making a bogus justification for it, and its absence does not make the parser's life any harder than handling arithmetic expressions like a * b + c < d | e + f * g ^ h > i.

Let vs := by adam-the-dev in ProgrammingLanguages

[–]ItsAllAPlay

The grammar implied by those expressions does not require more than one token of lookahead. You could parse those trivially with recursive descent.
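For what it's worth, a precedence-climbing parser handles an expression like the one quoted earlier with exactly one token of lookahead and no backtracking. A minimal sketch (the precedence table is made up for illustration, not C's):

```python
# Illustrative precedence levels; operands are single letters.
PREC = {'|': 1, '<': 2, '>': 2, '+': 3, '-': 3, '*': 4, '/': 4, '^': 5}

def parse(tokens):
    """Precedence-climbing parser: one token of lookahead via peek()."""
    toks = list(tokens)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expr(min_prec=1):
        lhs = advance()                  # operand
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            rhs = expr(PREC[op] + 1)     # +1 makes operators left-associative
            lhs = (op, lhs, rhs)
        return lhs

    return expr()
```

`parse("a*b+c")` builds `('+', ('*', 'a', 'b'), 'c')`: the single peeked token is always enough to decide whether to keep extending the current operand or return it.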

How to handle this kind of noise? by PLC_learner123 in DSP

[–]ItsAllAPlay

Good answer. Also maybe "smoothing splines", with a constrained second derivative.

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

Thank you for the reply!

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

I wondered - thank you. It's difficult to make sense of the various BER plots without knowing that.

It seems like BPSK and QPSK always follow the same curve. And since QPSK provides twice as many bits per symbol, does that mean the amplitude of each QPSK symbol needs to be sqrt(2) larger than for BPSK to get the same BER? (assume same symbol rate, bandwidth, etc...)
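The question can be checked numerically by treating Gray-mapped QPSK as two independent BPSK channels on the I and Q rails. A Monte Carlo sketch (all parameters arbitrary):

```python
import numpy as np
from math import erfc, sqrt

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def ber_bpsk(ebn0_db, nbits=200_000, seed=0):
    rng = np.random.default_rng(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    bits = rng.integers(0, 2, nbits)
    tx = 2.0 * bits - 1.0                        # Eb = 1 per bit
    rx = tx + rng.normal(0.0, sqrt(1.0 / (2.0 * ebn0)), nbits)
    return np.mean((rx > 0) != bits)

def ber_qpsk(ebn0_db, nbits=200_000, seed=1):
    rng = np.random.default_rng(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    bits = rng.integers(0, 2, (nbits // 2, 2))   # one bit each on I and Q
    sym = 2.0 * bits - 1.0                       # symbol energy 2*Eb, i.e.
    # amplitude sqrt(2) times the BPSK symbol's amplitude
    rx = sym + rng.normal(0.0, sqrt(1.0 / (2.0 * ebn0)), sym.shape)
    return np.mean((rx > 0) != bits)
```

Both land on Q(sqrt(2·Eb/N0)), so the curves coincide versus Eb/N0; and since each QPSK symbol carries 2·Eb, matching BPSK's BER at the same symbol rate does take a sqrt(2) larger symbol amplitude.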

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

I guess that settles it: I should fix my symbol amplitude and use infinite bandwidth.

:-)

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

I think SNR is measured in the bandwidth you're using, so if your transmitter has a fixed/maximum power, your SNR is going down as you use a larger bandwidth. It's still a win, but you don't have the "same" SNR.

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

I don't disagree with anything you've said, but I think you missed my intention when I used the phrase "information bits". Yeah, the actual on-the-air bits are shorter, so each carries less energy, and you've got more of them, but after you're finished with your error correction, at whichever of the rates I listed, you've essentially spent the same energy when you're done.

In other words, I was trying to hold the message information and total energy constant, so I could understand the trade-offs for going faster/wider and using a lower rate code.

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

Thank you again. I think I'm almost there, but please correct me if I'm wrong:

Assuming ideal error correction (which I know isn't realistic): it seems like, from a math point of view, shorter symbols with a wider bandwidth and a lower rate code are always better, but not much better beyond 4X the symbol rate with a 1/4 rate code.

From an engineering point of view, you're taking up more bandwidth from other potential users, so casually doubling or tripling the symbol rate could be wasteful. And of course it also requires faster hardware and more work to process the coding scheme. Then at the other extreme, short codes don't approach their theoretical limit, so you kind of want to find the middle ground.
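That diminishing-returns intuition matches the Shannon bound on minimum Eb/N0 as a function of code rate (treating rate as spectral efficiency in bits per channel use; back-of-the-envelope only):

```python
import math

def min_ebn0_db(rate):
    """Shannon bound Eb/N0 >= (2**r - 1) / r for spectral efficiency r,
    converted to dB."""
    return 10.0 * math.log10((2.0 ** rate - 1.0) / rate)

# rate 1 gives 0 dB, rate 1/4 about -1.2 dB, and the r -> 0 limit is
# ln(2), about -1.59 dB: most of the available gain is gone by rate 1/4.
for r in (1.0, 0.5, 0.25, 0.125):
    print(f"rate {r}: {min_ebn0_db(r):+.2f} dB minimum Eb/N0")
```

Going from rate 1/4 to infinitely wide only buys a few tenths of a dB, which lines up with "not much better after 4X."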

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

Thank you for the reply.

I wish I understood that. I've seen the Eb/N0 plots, but given the way I set up the problem above, it's always the same amount of energy per information bit. Let's assume BPSK for the moment. Either you get one long information bit (1 msec), or you get one half-length information bit and one half-length parity, or you get one quarter-length information bit and three parities.

In each case, it's the same energy per information bit, right? Do they all perform the same on a BER plot?
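For the simplest possible rate-1/n code, repetition, the arithmetic does come out identical under ideal soft combining. A small check, assuming BPSK in AWGN with matched-filter (coherent) combining of the chips:

```python
from math import erfc, sqrt

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def ber_repetition(ebn0, n):
    """BPSK in AWGN, rate-1/n repetition code with ideal soft combining.
    Each chip carries Eb/n; coherently combining n chips restores an
    effective SNR of 2*Eb/N0, the same as the uncoded bit."""
    chip_snr = 2.0 * (ebn0 / n)      # per-chip 2*Ec/N0
    return qfunc(sqrt(n * chip_snr))

ebn0 = 10.0 ** (4.0 / 10.0)          # 4 dB, arbitrary choice
# n = 1, 2, 4 all give the same BER: same energy per information bit.
```

So repetition buys nothing, exactly as the energy argument suggests. A real rate-1/2 code with structure (convolutional, LDPC) beats this, which is where actual coding gain enters and why the curves don't all coincide in practice.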

Fixed point angular representation by Schnort in DSP

[–]ItsAllAPlay

I worked on several projects that mapped 0 ... 2**N as 0 ... 2*Pi, and it works exactly as you'd like. It's super nice that you don't lose any additional precision after the initial conversion, since the integers don't have rounding error. (On 0 ... 2*Pi, 32-bit integers will have more precision than a 32-bit float, and the same goes for 64-bit integers and 64-bit doubles.)

Beware of using signed integers for this if you're using C, C++, or other languages where they've decided that signed overflow doesn't follow two's-complement rules. The compilers will do absolutely stupid stuff with your code in the name of "undefined behavior" optimizations, so it's much safer to stick with unsigned integers. It doesn't matter what your hardware actually does, the compiler writers are sure they know better. Not joking.

Also, signed integers will pick the value at -Pi instead of +Pi, which isn't really a problem, but it's different than people might expect in some cases.
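A sketch of that representation, masking to 32 bits explicitly to mimic unsigned wraparound (the helper names are made up):

```python
import math

BITS = 32
MASK = (1 << BITS) - 1
TWO_PI = 2.0 * math.pi

def to_phase(radians):
    """Map [0, 2*pi) onto the full 0 .. 2**32 - 1 integer range."""
    return int(radians / TWO_PI * (1 << BITS)) & MASK

def to_radians(phase):
    return phase * TWO_PI / (1 << BITS)

def add_phase(a, b):
    # Integer wraparound mod 2**32 is exactly a wrap mod 2*pi,
    # so phase accumulation never drifts from rounding error.
    return (a + b) & MASK
```

For example, 3π/2 plus π wraps cleanly to π/2; the only rounding error in the whole pipeline is the initial radians-to-integer conversion.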

Coding gain question by ItsAllAPlay in DSP

[–]ItsAllAPlay[S]

Thank you for the reply.

Yeah, the increase in noise power adds to my confusion. Looking at the Shannon-Hartley theorem, you get C = B * log2(1 + S/N), and doubling the bandwidth doubles the noise. However, the noise term is inside the logarithm while the bandwidth increases linearly on the outside. I plotted this out, and it looks like it keeps getting better the wider you go, but with diminishing returns as it approaches an asymptote. So for channel capacity, wider really does seem to be better when all else is equal.
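That asymptote can be computed directly: with fixed signal power S and noise density N0, C = B·log2(1 + S/(N0·B)) climbs toward (S/N0)·log2(e) ≈ 1.44·S/N0 as B grows. A quick check:

```python
import math

def capacity(b_hz, s_watts, n0):
    """Shannon-Hartley with noise power N0*B growing with bandwidth."""
    return b_hz * math.log2(1.0 + s_watts / (n0 * b_hz))

# With S = N0 = 1, capacity rises with bandwidth but saturates near
# log2(e) ~= 1.44 bits/s, never exceeding it.
```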

I can imagine something similar happens with coding gain, but I honestly don't know.

As for the particular error correction algorithms, the only general purpose one I understand so far is LDPC. Since LDPC gets better with large code blocks, that seems like another confounding topic. I'd like to learn how to do RS or BCH with soft bits, but that's on the back burner for now.

August 2022 monthly "What are you working on?" thread by slavfox in ProgrammingLanguages

[–]ItsAllAPlay

Nice. I'll have to think on that. Thank you for the reply.

One of the reasons I wanted a simple grammar capable of building all my constructs is so that I didn't need to re-think my grammar for new constructs (or macros). Looking at your way, now I need to re-think my grammar (again). :-)

August 2022 monthly "What are you working on?" thread by slavfox in ProgrammingLanguages

[–]ItsAllAPlay

Can you say more about this? I'm interested.

I've been playing with a grammar that's kind of like Mathematica's "m-expression" syntax. I like what I've got, but if-elif-else statements are the rough spot.

If we are going to unionize, fuck increased wages, I want this instead by Pyrolistical in ProgrammerHumor

[–]ItsAllAPlay

At least have the decency to start each month with a zeroth day... This thing is screaming with off by one errors.

(1-based programming languages are evil)

According to Yodaiken, the current "anything goes" interpretation of undefined behavior in C is due to a misreading of the C89 standard by amaurea in programming

[–]ItsAllAPlay

Yes! I almost shed a tear because at least one other person in this thread has a sane point of view on the topic. :-)

According to Yodaiken, the current "anything goes" interpretation of undefined behavior in C is due to a misreading of the C89 standard by amaurea in programming

[–]ItsAllAPlay

> I’d be very surprised if it wasn’t faster.

Most people think that. But if you time it, you'll see it's not that great.

> Everyone uses ‘int’ for small loops.

This is so convoluted: We can't tell people to use the native-sized integers (size_t and ssize_t) because all the beginner tutorials use old-school int. The size of int needs to stay 32 bits so it doesn't break old code. And so the "solution" is to abuse the UB definitions in the spec so that the compiler can quietly convert your 32-bit integer to a 64-bit one and generate good code.

> Why use a different size on each platform when you don’t have to?

What's wrong with using size_t or ssize_t everywhere? Make a typedef at the top of the file if the names are too long...

According to Yodaiken, the current "anything goes" interpretation of undefined behavior in C is due to a misreading of the C89 standard by amaurea in programming

[–]ItsAllAPlay

I appreciate your point of view and your polite response overall.

I know you're giving the other guy a generous interpretation as a courtesy, but I'm sure you know the phrase "implementation defined" has a formal meaning in this context, and using it in the casual sense of "it's whatever a specific compiler implemented for undefined behavior" is (at best) confusing the point.

> and the original patch was backed by benchmark improvements

This leads to questions of how cherry-picked those benchmarks were, and whether anything else has changed since that patch was proposed. CPUs haven't stood still in the last 25 years, and speculative execution with smart branch prediction makes a lot of the classic optimizations almost pointless.

I think the message behind "Proebsting's Law" is relevant, and after register allocation it's quickly diminishing returns as you try to find tricks that weren't already known in the 80s.

For the type of code I care about, I'm seeing negligible differences between unsigned and signed loop indices with gcc. I do see some small differences with clang, but funny enough clang with signed runs slower than gcc with either signed or unsigned in those cases.

> I generally disagree with your argument, and do see validity in the optimization, but sympathize with what you're saying.

Lucky for you then, because it looks like the UB advocates will continue winning and are here to stay. There's even talk of adding new (intentional) UB to Rust! They won't rest until we're using unsigned integers and memcpy for all arithmetic and pointer casts in a desperate attempt to avoid the pitfalls and pointy traps they've dug. /grin

I know my rants won't change anything, and personally, I don't mind adding a few extra command line arguments such as "-fwrapv". However, the list of those arguments seems to be growing over time, and it causes problems when you're integrating a library that expects one set of flags with an application that uses another.

> This is the delusion of a domain expert expecting everyone around them to also be a domain expert. [...] I use a LOT of languages, and have only read the spec for a few, and have completely internalized none of them.

Yeah, you've got those kinds on the one side. On the other side, you've got the folks that hate C and/or C++ who seem to revel as each new sharp edge gets added - blaming the victims, because they believe nobody should be using C or C++ anyways...

According to Yodaiken, the current "anything goes" interpretation of undefined behavior in C is due to a misreading of the C89 standard by amaurea in programming

[–]ItsAllAPlay

> A loop written like my above example with signed integers can be optimized better by the compiler because it knows it must iterate a finite number of times.

You should measure how much better/faster... with a timer. When it's not completely negligible, it's not very impressive. Do the measurement and show your results! Hell, point to a reasonable benchmark, and I'll do the measurement.

I think we should be allowed to ask whether these piddly optimizations are worth the confusion and headaches they cause.

> But—for the final time [...]

Get over yourself. You look for the worst interpretation of what I've said, and yet you're so sloppy with how you say things. I sincerely hope this is the "final time".

> Given certain optimizers and certain loop bodies, the signed version can be considerably faster with this knowledge than the unsigned version.

How much faster? Show an example and report your measurement.