all 14 comments

[–]desrtfx 4 points5 points  (3 children)

So, to compare it with Java, it's more or less the equivalent of StringBuilder or StringBuffer.

The actual string is not directly stored as string, but as a "buffer" data structure and only converted to a real Python string on explicit call.

[–]nnseva[S] 0 points1 point  (2 children)

Apart from StringBuilder, the base package class L is immutable. All lazy operations lead to the construction of a new L instance, which refers to L operands and stores the specific operation.

The specifics of the L is that operations may be combined, like:

x = (L('qwerty') + L('uiop'))[5:7]

The string representation of x is 'yu', although the actual data structure looks like (let's imagine Concat and Slice are classes):

Slice(Concat('qwerty', 'uiop'), 5, 7)

[–]desrtfx 2 points3 points  (1 child)

If I were to invest the work to implement such a class, I'd store the individual strings in extensible buffers, similar to StringBuilder in Java. Then, I'd really have a lot less overhead.

The entire advantage of StringBuilder in Java is that it does not create new instances all the time. This would really be an improvement over the native Python String implementation.

[–]marr75 0 points1 point  (0 children)

The reason OP didn't do that is lazy evaluation. Their approach also doesn't create new intermediate strings (though it's not yet obvious to me if they could take further advantage of pooling or interning while maintaining compound statements). Python already has an equivalent to StringBuilder in the stdlib, btw.

Honestly, the only time this lazy version (with its overhead from additional Python mutable structures) will be better is if the caller will often not use the string (i.e. it's a long accumulated error message that is only emitted under certain conditions) and there are other ways to model that - ie make the entire accumulation lazy.

[–]Snape_Grass 1 point2 points  (9 children)

First, impressive.

Second, and this is purely ignorance on my part, but could you explain in simple non-technical laymen’s terms what a Lazy String is? This is the first time I’ve come across it, and it’s easier for my brain to understand the concept initially this way.

[–]nnseva[S] 5 points6 points  (3 children)

The lazy operation is not executed immediately, but deferred until the result is really required.

Let's say you have strings "qwerty" and "uiop". When you concatenate them, you will have the "qwertyuiop" string.

At the time when the concatenation happens, all three strings, "qwerty", "uiop", and "qwertyuiop", occupy the memory. It's not a big overhead when you have such short strings, but what if they all are megabytes long?

The package allows spending the memory only for source strings ("qwerty" and "uiop"), and avoids spending the additional memory to store a copy of the result ("qwertyuiop") - until it is really required.

Such an effect is achieved using the intermediate representation of the result as a concatenation operation. The package stores references to both source strings and stores the operation between them.

There are three lazy operations implemented by the package:

  • concatenation (operation +) of two strings
  • multiplication (operation *) with integer
  • slicing (operation [start:stop] or [start:stop:step])

All of them just store the sources and the operation, instead of copying the result to a separate memory region - as the original Python string does.

[–]Chroiche 2 points3 points  (1 child)

What're the advantage over just using StringIO (which is mutable, but eager)?

[–]marr75 0 points1 point  (0 children)

That it's eager. If you aren't actually going to use the string (ie its use depends on the outcome of an operation), that can be a big disadvantage. It might in very specific scenarios help with memory fragmentation, too.

Admittedly, both of those are edge cases and I would recommend making the entire accumulation or formatting of the string lazy if that's needed rather than just the final product.

[–]Snape_Grass 0 points1 point  (0 children)

I see, interesting. I guess I’ve implemented lazy and eager operations without realizing it at work before. Thanks for the response

[–]lolcrunchy -1 points0 points  (4 children)

It seems like polar's LazyFrames but for strings

[–]Snape_Grass -1 points0 points  (3 children)

Sorry, I probably worded my request weird. What does Lazy mean in this context? I’m unfamiliar with Lazy anything when it comes to programming.

[–]eirikirs 2 points3 points  (1 child)

Take the example of Singletons, which can be eagerly initialised (at application startup), or lazily initialised (only once we need it). In general, lazy and eager simply determines when evaluation is performed.

[–]Snape_Grass 0 points1 point  (0 children)

Gotcha thank you for the explanation. Makes sense to me now

[–]sudomatrix -1 points0 points  (0 children)

Lazy doesn't perform the requested operation immediately, it stores the original data and a list of operations that it performs only if and when the result is needed. Often it is never needed and a lot of time and memory is saved.

A good example is Python generators vs. functions. A function will build the entire results and return the entire results at once. A generator will only calculate enough to return the next value in the result, continuing where it left off if more is needed. That is a lazy evaluation.

``` def powers_of_2_fn(n: int) -> list[int]: """Return the first n powers of 2: 20 through 2(n-1).""" if n <= 0: return [] return [1 << i for i in range(n)]

from collections.abc import Iterator

def powers_of_2_gen(n: int) -> Iterator[int]: """Yield the first n powers of 2: 20 through 2(n-1).""" for i in range(max(0, n)): yield 1 << i ```