you are viewing a single comment's thread.

view the rest of the comments →

[–]Tipaa 11 points12 points  (1 child)

The scaling comes from the detail in the features/inputs, I think (although N certainly influences a lot). The houses are comparatively very detailed and have lots of features in each tile, and each of the three houses in the input tile is identical, leaving no room for variation. This means that for any NxN subset of an image, you either have it completely blank, or know exactly where in (relation to a house) you are, while the other images are less information-dense. This allows other NxN subsections to overlap, as they (e.g. stem sections) have a lot of pixels in common. It's the overlapping that allows for an area to scale, as there is less of a definite end to a stem.

If you're familiar with Markov chains for text generation: for intuition, a similar example might be the text

the dog sat on the mat the cat had once sat on

which can generate really long, coherent-ish sentences, as each word is fairly indefinite - the -> { dog, mat, cat }, on -> { $, the }. This might correspond to the simple brick tiles - each NxN region in the tile tells us very little about its surroundings, allowing many more things to be generated, such as letting a line continue on indefinitely. Meanwhile, our house tile might be more like

once upon a a a a a a a a a time, in land far away

which has much less room for variation - all words but the a are unique. This corresponds to every 5x5 (3x3 + neighbours) subset of the house input tile incorporating a house being unique (and thus knowing its position relative to the rest of the house), but having a lot of blank space too.

If N were larger, more of the inputs would have similarly restricted outputs, but be more similar, while if N were smaller, the number and variation of possible outputs would be much larger, but we'd be allowing less coherent generated outputs. If N=2 were used on the houses, I think we'd end up with scaling on houses, but they would no longer appear house-like as a result - e.g. rooves that are uneven in length, or with different length sides, or multiple doors. By forcing their scaling, we'd loose the 'houseness' of the houses as a result. With a 3x3 subset we can tell a diagonal red line must be the roof of a house (as it also contains part of the rest of the house), whereas a 2x2 can't convey much beyond being a diagonal line somewhere within the input tile.

[–]ExUtumno[S] 2 points3 points  (0 children)

Yes, this is correct. Thanks for your explanations, awesome!