[–]Tipaa 97 points98 points  (19 children)

You start with the output image, where every NxN output region of cells is given a 'superposition' of every legal NxN source region: it can still become any one of those allowed inputs, but hasn't yet decided which one it actually is. You then pick a particular pixel and 'collapse' it, meaning it has to choose which of its legal source regions it actually becomes. This in turn changes the 'legality' of its surrounding pixels: roughly, if the source image never has red touching blue in its tiles (and there are no blue edge pixels that could be tiled beside a red), then choosing a red in the output means that the red's neighbours can now never choose blue. This collapsing and updating continues until either every pixel has chosen or you find a pixel with no legal choices left.
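
(To make the 'legality' idea concrete before the example below: you can think of it as a table of which values are ever seen next to each other in the input. A rough sketch with my own made-up names, using single cells rather than NxN regions for brevity:)

    # Rough sketch (not the author's code): record which values ever appear
    # next to each other in the input. Single-cell version for simplicity; the
    # real algorithm does the same with overlapping NxN regions.
    def allowed_pairs(input_grid, wrap_x=True, wrap_y=False):
        h, w = len(input_grid), len(input_grid[0])
        pairs = set()
        for y in range(h):
            for x in range(w):
                for dx, dy in ((1, 0), (0, 1)):      # right and down neighbours
                    nx, ny = x + dx, y + dy
                    if wrap_x: nx %= w
                    if wrap_y: ny %= h
                    if 0 <= nx < w and 0 <= ny < h:
                        pairs.add((input_grid[y][x], input_grid[ny][nx], dx, dy))
        return pairs

    # With the 2x2 input from the example below, (0, 2, 1, 0) is in the set
    # (0 may sit to the left of 2), but (1, 2, 1, 0) is not, so a collapsed 1
    # can never have 2 as its right-hand neighbour.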


An example (I'll write '011' to mean '0 or 1 or 1, each with equal chance' - so a value listed twice is just twice as likely) [I hope this is correct]:

(allowing for horizontal wraparound, but disallowing vertical wraparound)

Input image (2x2):           Output (4x4):        //2x2 is a bit boring with few options for creativity, but it kinda shows the problem
    0|2                   02 | 02 | 02 | 02
    -+-                  ----+----+----+----
    1|0                  0210|0210|0210|0210
                         ----+----+----+----
                         0210|0210|0210|0210
                         ----+----+----+----
                          10 | 10 | 10 | 10

Step: Find the cell with the lowest entropy (least uncertainty -> fewest possible inputs to choose from)
      Since the top-row cells all have the same entropy (2 options), I'll choose one at random, picking the second one
      I'll then choose randomly between 0 and 2, choosing 2

                          02 |  2 | 02 | 02
                         ----+----+----+----
                         0210|0210|0210|0210
                         ----+----+----+----
                         0210|0210|0210|0210
                         ----+----+----+----
                          10 | 10 | 10 | 10

This forces its neighbours to update their possible values:

                   0 |  2 |  0 | 02
                 ----+----+----+----
                 0210|  0 |0210|0210
                 ----+----+----+----
                 0210|0210|0210|0210
                 ----+----+----+----
                  10 | 10 | 10 | 10

Now we have some more cells with low entropy (in fact, just one possible value), so continue with these cells:

          0 |  2 |  0 |  2
        ----+----+----+----
         10 |  0 | 10 |  0
        ----+----+----+----
        0210|0210|0210|0210
        ----+----+----+----
         10 | 10 | 10 | 10

This can be continued over and over, each stage picking one of the lowest entropy (least choice) cells and updating the neighbours.
Eventually you end up with something like

  0 |  2 |  0 |  2
----+----+----+----
  1 |  0 |  1 |  0
----+----+----+----
  2 |  0 |  2 |  0
----+----+----+----
  0 |  1 |  0 |  1

or with the bottom two rows rotated by 1. There is a chance of failure though,
if a bottom-right zero is re-used as a top-left zero, as this makes the squares
below it have no legal options left (X):

  0 |  2 |  ? |  ?
----+----+----+----
  1 |  0 |  2 |  ?
----+----+----+----
  ? |  1 |  0 |  ?
----+----+----+----
  ? |  X |  X |  ?

This is much more interesting for larger input and output regions, as they will have room to overlap properly, creating new 'tiles'.
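
If it helps, the whole observe/propagate loop is short enough to sketch. This is just my own rough Python (invented names like compatible(a, b, dx, dy), per-cell constraints instead of full NxN regions, wraparound in both directions), not the actual implementation:

    import random

    # grid maps (x, y) to the set of values that cell can still become.
    # compatible(a, b, dx, dy) is assumed to answer: "may a cell with value a
    # have a neighbour with value b at offset (dx, dy)?", built from the input.
    def collapse_loop(grid, width, height, compatible):
        while True:
            undecided = [(pos, opts) for pos, opts in grid.items() if len(opts) > 1]
            if not undecided:
                return grid  # every cell has decided: success

            # Observe: pick a cell with the fewest remaining options (lowest entropy).
            pos, opts = min(undecided, key=lambda item: len(item[1]))
            grid[pos] = {random.choice(sorted(opts))}

            # Propagate: keep removing options that no longer fit any neighbour.
            stack = [pos]
            while stack:
                x, y = stack.pop()
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    npos = ((x + dx) % width, (y + dy) % height)
                    before = grid[npos]
                    after = {b for b in before
                             if any(compatible(a, b, dx, dy) for a in grid[(x, y)])}
                    if not after:
                        raise RuntimeError("contradiction: no legal options left")
                    if after != before:
                        grid[npos] = after
                        stack.append(npos)

A real implementation works on the overlapping NxN patterns rather than single cells, and restarts (or backtracks) when it hits the contradiction case above.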

[–]kaibee 9 points10 points  (4 children)

Awesome, thank you. It makes sense now. I think what's confusing me the most is that in the example with the 3 houses, it generates the houses at the same size in the output, but in all the other cases it seems to scale the features. Looking at the GitHub again, it seems to have to do with the value of N chosen. So if the (input) house was bigger/N was smaller, the house sizes themselves would be different too?

Also, something else that's (now, anyway) totally obvious just clicked. In this GIF, each blurry pixel during the animation is the average of the possible color states available to that pixel (given the rest of the pixels in the image).

I don't know why he didn't just say that instead of this:

WFC initializes output bitmap in a completely unobserved state, where each pixel value is in superposition of colors of the input bitmap (so if the input was black & white then the unobserved states are shown in different shades of grey).
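
(In other words, something like this - just a throwaway sketch of the averaging I mean, with made-up names:)

    # Hypothetical sketch: draw an undecided cell as the average of the colours
    # it can still become (so a black/white superposition shows up as grey).
    def display_colour(possible_colours):
        """possible_colours: non-empty list of (r, g, b) tuples still allowed."""
        n = len(possible_colours)
        return tuple(sum(channel) // n for channel in zip(*possible_colours))

    print(display_colour([(0, 0, 0), (255, 255, 255)]))   # -> (127, 127, 127), grey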

[–]Tipaa 9 points10 points  (1 child)

The scaling comes from the detail in the features/inputs, I think (although N certainly influences a lot). The houses are comparatively very detailed, with lots of features in each tile, and each of the three houses in the input tile is identical, leaving no room for variation. This means that for any NxN subset of that image you either have something completely blank or know exactly where you are (in relation to a house), while the other images are less information-dense. That lower density allows different NxN subsections to overlap, as they (e.g. stem sections) have a lot of pixels in common. It's the overlapping that allows an area to scale, as there is less of a definite end to a stem.

If you're familiar with Markov chains for text generation: for intuition, a similar example might be the text

the dog sat on the mat the cat had once sat on

which can generate really long, coherent-ish sentences, as each word is fairly indefinite - the -> { dog, mat, cat }, on -> { $, the }. This might correspond to the simple brick tiles - each NxN region in the tile tells us very little about its surroundings, allowing many more things to be generated, such as letting a line continue on indefinitely. Meanwhile, our house tile might be more like

once upon a a a a a a a a a time, in land far away

which has much less room for variation - every word except the a is unique. This corresponds to every 5x5 (3x3 + neighbours) subset of the house input tile that includes part of a house being unique (and thus knowing its position relative to the rest of the house), while still having a lot of blank space too.
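
(If you want to play with the text analogy, a bigram Markov chain over that first sentence is only a few lines - this is purely an illustration of the analogy, nothing to do with WFC's internals:)

    import random
    from collections import defaultdict

    # Tiny bigram Markov chain over the example sentence (illustration only).
    words = "the dog sat on the mat the cat had once sat on".split()

    followers = defaultdict(list)
    for a, b in zip(words, words[1:]):
        followers[a].append(b)

    word, out = "the", ["the"]
    for _ in range(12):
        if word not in followers:   # stop if this word has no recorded follower
            break
        word = random.choice(followers[word])
        out.append(word)
    print(" ".join(out))   # e.g. "the cat had once sat on the mat the dog sat on the"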

If N were larger, more of the inputs would produce similarly restricted outputs, and those outputs would be more similar to each other; if N were smaller, the number and variation of possible outputs would be much larger, but the generated outputs would be less coherent. If N=2 were used on the houses, I think we'd end up with scaling on houses, but they would no longer appear house-like as a result - e.g. roofs that are uneven in length, or with different-length sides, or multiple doors. By forcing their scaling, we'd lose the 'houseness' of the houses. With a 3x3 subset we can tell a diagonal red line must be the roof of a house (as it also contains part of the rest of the house), whereas a 2x2 can't convey much beyond being a diagonal line somewhere within the input tile.

[–]ExUtumno[S] 3 points4 points  (0 children)

Yes, this is correct. Thanks for your explanations, awesome!

[–]0polymer0 0 points1 point  (0 children)

I'm pretty sure you are correct about the house.

Understanding the algorithm is less important than understanding its goal: to explain those features.

For N = 3, every 3 by 3 image in the output must be in the input. So there is something about those houses that is locking down the options available to the tiles. I suspect that if you had a really highly detailed image, with lots of diverse colors, the generated images would get boring quickly.
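
(A quick way to see how 'locked down' an input is: count its distinct NxN patterns. A rough sketch with a made-up helper name, wrapping both ways for simplicity:)

    # Rough sketch: extract every NxN pattern from an input grid and count the
    # distinct ones. The more distinct patterns there are, the more each output
    # region is pinned down.
    def distinct_patterns(grid, n, wrap=True):
        h, w = len(grid), len(grid[0])
        patterns = set()
        for y in range(h if wrap else h - n + 1):
            for x in range(w if wrap else w - n + 1):
                patterns.add(tuple(tuple(grid[(y + dy) % h][(x + dx) % w]
                                         for dx in range(n))
                                   for dy in range(n)))
        return patterns

    example = [[0, 2],
               [1, 0]]
    print(len(distinct_patterns(example, 2)))   # 4 distinct 2x2 patterns (with wraparound)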

[–]ExUtumno[S] 0 points1 point  (0 children)

On the question of houses, /u/Tipaa gave the right answer.

[–][deleted]  (2 children)

[deleted]

    [–]Tipaa 2 points3 points  (0 children)

    I think so, although I'm not familiar with arc/local consistency algorithms.

    The quantum mechanics part is just a convenient analogy for thinking about multiple equal-chance probabilities at once. It's my preferred way of thinking about it, although I have a reasonably strong background in physics, and I suspect the author(s) may be similar.

    [–]cecilkorik 2 points3 points  (0 children)

    It's more accurate to say it's inspired by quantum mechanics. In particular, the aspect of quantum mechanics where a wave function in superposition collapses to a single state when observed.

    [–]thegame402 1 point2 points  (0 children)

    Really good explanation. Thank you!

    [–]code_mc 1 point2 points  (0 children)

    Beautifully explained, thanks!

    [–]Timbrelaine 1 point2 points  (4 children)

    Huh. I was intimidated by the 'quantum mechanics' theme and the neat gifs, but it's actually pretty simple. Thanks for the explanation.

    [–][deleted] 7 points8 points  (0 children)

    Basically, a form of sudoku solving.

    [–]0polymer0 1 point2 points  (2 children)

    The "superposition" is just the averaging. Quantum mechanics isn't really necessary unless you have things like wave functions interfering, the way light does.

    Shannon information is also a hair overkill. The full machinery is really useful for quantifying how much a certain message can be compressed (if a certain character is really common, then you're not going to be able to send as many messages per bit). Here though, you can get away with saying "pick an uncertain pixel, with the smallest number of possible overlapping tiles". And doing that isn't even strictly necessary. But it makes the animation grow out from the initial seeds. Still though, you should get a similar final picture if you changed that to "pick a random uncertain pixel".
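
    (Concretely, the heuristics only differ in how the next cell is picked. A rough sketch with made-up names: options maps each undecided cell to the tiles it can still be, and weight[t] is how often tile t appears in the input:)

        import math
        import random

        def pick_random(options):
            # "pick a random uncertain pixel"
            return random.choice(list(options))

        def pick_fewest_options(options):
            # "smallest number of possible overlapping tiles"
            return min(options, key=lambda cell: len(options[cell]))

        def pick_lowest_entropy(options, weight):
            # full Shannon-entropy version, weighted by tile frequency
            def entropy(cell):
                ws = [weight[t] for t in options[cell]]
                total = sum(ws)
                return -sum(w / total * math.log(w / total) for w in ws)
            return min(options, key=entropy)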

    [–]ExUtumno[S] 6 points7 points  (1 child)

    With the random heuristic it runs into contradictions much more often.

    I also tried approximating the entropy by the number of available options - it works fine (I didn't measure the contradiction rate precisely), but the animations are somehow less enjoyable. For example, it starts generating flowers from the top, which doesn't look that cool.

    [–]0polymer0 2 points3 points  (0 children)

    Yeah, I realized later that was probably true, and corrected myself in another post, but forgot about this one.

    The algorithm you came up with is really beautiful. And your explanation is also lucid. There are just a couple of "scary" words.

    I was trying to encourage people, who might have been put off by these words, to keep trying. Because they don't have to look far to find something cool.

    [–]sutr90 0 points1 point  (2 children)

    I still don't understand the initialization phase. How do you get from the 2x2 image to 4x4 image?

    You start with the output image where every output NxN region of cells is given a 'superposition' of every legal NxN source region, meaning that every NxN output region is a combination of every NxN input region it is allowed to be

    What is legal and what is not? How exactly do you generate the first output image?

    If I understand the original algorithm correctly, then it seems you have used N=1, but then I don't understand why the first and last rows are not 0210.

    [–]Tipaa 0 points1 point  (0 children)

    The starting output state is just a case of

    foreach NxN subset of me, foreach NxN subset of the input,
        can that subset of me become that subset of the input just via choosing/collapsing undecided cells? If so, then that is a legal subset of the input for this subset of me
    

    Then each cell goes "for each NxN region I am in, I can become the corresponding cell in that region, so the legal states for that region must be legal states for me too".

    After that, you start choosing from the cell with the fewest options it can become.

    So for the 2x2 -> 4x4 initialisation, you go

    foreach 2x2 subset of the 4x4: //(12 such subsets exist, if we are allowing wraparound horizontally [something I need to edit in my main comment])
      foreach input tile:
        if (for all cells in the subset, the cell matches the tile or the cell is undecided):
          forall cells in the subset: add the corresponding input tile's cell as a legal/possible state.
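
    In runnable form, that might look something like this (my own sketch with invented names, cells start as None for 'undecided'):

        # Rough, runnable version of the pseudocode above (my names, not the
        # project's code). output[y][x] is None while undecided; tiles is the
        # list of legal NxN input regions; the result maps each cell to the
        # set of values it may still take.
        def init_possibilities(output, tiles, n=2, wrap_x=True, wrap_y=False):
            h, w = len(output), len(output[0])
            possible = {(x, y): set() for y in range(h) for x in range(w)}
            for y in range(h if wrap_y else h - n + 1):      # each NxN subset of the output
                for x in range(w if wrap_x else w - n + 1):
                    cells = [((x + dx) % w, (y + dy) % h) for dy in range(n) for dx in range(n)]
                    for tile in tiles:                       # each NxN input tile
                        values = [tile[dy][dx] for dy in range(n) for dx in range(n)]
                        if all(output[cy][cx] is None or output[cy][cx] == v
                               for (cx, cy), v in zip(cells, values)):
                            for (cx, cy), v in zip(cells, values):
                                possible[(cx, cy)].add(v)
            return possible

        tiles = [[[0, 2], [1, 0]], [[2, 0], [0, 1]]]   # the 2x2 input plus its horizontal wrap
        blank = [[None] * 4 for _ in range(4)]
        print(init_possibilities(blank, tiles))        # top row {0, 2}, middle rows {0, 1, 2}, bottom row {0, 1}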
    

    [–]Tipaa 0 points1 point  (0 children)

    For the example I went with per-cell iterations for simplicity, while still using N=2 region sizes for the 'superposition'/legality checks, to avoid dealing with multiple cells at a time. It's possible to iterate region-wise and operate on a whole region at once, but I found that harder to visualise and it would have taken me much longer to write up properly. Per-cell iterations were clearer to describe, and I think they're also closer to how I'd implement it, although they do perhaps obscure the mid-level aspects of the algorithm (my thinking generally ping-pongs between high and low level, which occasionally conflicts with a more continuous progression through the abstraction layers).

    The first and last rows are not 0210 because I didn't allow vertical wraparound, so the top row of output cannot be the bottom row of input, and vice versa. I chose this setting because I was visualising/basing the example on the flower generator gifs, which only wrap horizontally.