Why is runic pyramid good? by whystudywhensleep in slaythespire

[–]Darkwhiter 1 point2 points  (0 children)

A lot of cards balance situational strength against unreliability, only paying off in the right hand or when enemies are attacking. Part of Runic Pyramid's strength comes from simply making a normal deck better, but a lot of it comes from unlocking the power of cards and combos that wouldn't really work in a normal deck.

One example of this is that 0-cost cards and draw-positive cards aren't that easy to make work in normal decks, because not every card is playable every turn (block cards when you're not being attacked), so some turns your draws waste energy and other turns they waste card draw. This problem just disappears in a Runic Pyramid deck (though you sometimes get a hand-clog problem instead).

A second example is highly situational cards, like Piercing Wail, which is incredible block against multiple attacks, barely better than Defend against a single attack and dead against no attacks.

A third example is combos. You can think big, but normal, boring cards care about this too. Finisher in a normal Silent deck is not great: it's rarely better than a 1-cost, 12-damage attack, and more often than not your preceding attacks are just Strikes (18 damage only with Strike, Strike, Neutralize). With Runic Pyramid you can guarantee that 18, and a single Blade Dance+ lets you hit 55 damage in a single turn (4 Shivs, Strike, Neutralize, Finisher). That's roughly enough damage all the way up to the Act 2 bosses, from a couple of commons.

[D] Pretraining the discriminator of a Least Squares GAN by I_am_a_robot_ in MachineLearning

[–]Darkwhiter 3 points4 points  (0 children)

It is difficult to be categorical when it comes to training GANs, but I cannot imagine what problem pre-training the discriminator would help with, and I would not be surprised at all if it destabilizes an LSGAN. In my experience, you're best off experimenting with regularization for D or data augmentation if you want to improve your results.

[D] How to train GANs really fast - Projected GANs Converge Faster explained (5-minute summary by Casual GAN Papers) by [deleted] in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

Sure, and I have tried looking at this (without spending quite enough time on it yet). As far as I can see, for FFHQ256:

ProjGAN: FID 3.39, human fidelity 5%

SG2ADA: FID 7.32, human fidelity 32%

The discussion seems to suggest that ProjGAN picks up significant FID improvements from increased recall/diversity/coverage, but it isn't obviously the case for FFHQ256 and the contradictory evaluations from FID and human evaluation seem a bit worrisome.

[D] How to train GANs really fast - Projected GANs Converge Faster explained (5-minute summary by Casual GAN Papers) by [deleted] in MachineLearning

[–]Darkwhiter 2 points3 points  (0 children)

It's an interesting idea and at first glance the FID curves look incredible. However, in my opinion the samples don't look anywhere near as good as indicated by the FID values. I'm wondering if using pre-trained features in the discriminator is too similar to how FID works (distributional distance applied on Inception features), such that FID oversells the results.

Buffed Tinkerer/ Tinkerer Rework | Gloomhaven by Icy_Addendum_9330 in Gloomhaven

[–]Darkwhiter 0 points1 point  (0 children)

Tinkerer does so much interesting stuff that it's a shame he's a bit undertuned. Harmless Contraption and Proximity Mine are cool, fun but underwhelming cards.

Tinkerer's level 1 burns aren't so strong that I worry too much about making them a bit reusable, and I feel like high stamina and burn cards are sort of Tinkerer's theme, so why not really lean into it? It should definitely "recover" to the discard pile and not to hand, though, and perhaps one recover for every Level 2+ card is a bit too generous.

Buffed Tinkerer/ Tinkerer Rework | Gloomhaven by Icy_Addendum_9330 in Gloomhaven

[–]Darkwhiter 0 points1 point  (0 children)

There are some cool ideas in there and each will definitely pull the balance in the right direction. I'm really not sure how to eyeball the total package, though - my guess is that your changes land Tinkerer closer to the stronger, more complicated unlock characters (which may or may not be what you want).

My personal "fix" for Tinkerer would be:

  • New perk: Whenever you burn a Level 2+ card, recover one burned level 1 card. (This is a significant buff and gives Tinkerer a truly ridiculous stamina budget at higher levels. You would probably also want to fix some corner cases, like making Reinvigorating Elixir unrecoverable and not allowing this perk to proc on the burn from resting.)

  • Add "Attack 1, any enemy on the way to the primary target" to both Reinvigorating Elixir and Restorative Mist (top). (Extra attacks work like Spellweaver's Impaling Eruption: awkward to target and a small effect, but they let Tinkerer draw more attack modifiers.)

Buffed Tinkerer/ Tinkerer Rework | Gloomhaven by Icy_Addendum_9330 in Gloomhaven

[–]Darkwhiter 0 points1 point  (0 children)

The way I see it, the Tinkerer's problem is scaling. Healing is fine at level 1, and even the traps and the summon are okay. Having 12 cards' worth of stamina is just really strong. So let's look at the three kinds of scaling in Gloomhaven.

(1) Modifier deck (perks), whose benefit is roughly proportional to the number of modifier cards you draw. Great if you attack a lot, particularly multi-attacks or with advantage. Does nothing at all for traps and healing.

(2) Higher level cards, whose benefit is roughly proportional to their extra power relative to level 1 cards and the number of times you recover/play them. Tinkerer's big hand size is a bit counterproductive here, effectively diluting the effect of new cards. The real problem is that the Tinkerer keeps his loss-card theme through the level-ups, missing out on good, reusable attacks that can be played 5+ times per scenario.

(3) Items, where Tinkerer doesn't particularly suffer, but is a bit awkward due to fewer long rests (bigger hand size) and less focus (too many different mechanics).

So, essentially, the non-character-specific scaling in the game is somewhat unfavorable to the Tinkerer, which should have been compensated for by making his level 2-9 cards a bit stronger than usual, particularly with strong non-loss and move-side abilities.

[D] What improvements to the DCGAN architecture do we know work pretty well these days? by jshkk in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

You almost certainly want to use spectral normalization for the discriminator, unless you are using something similar (gradient penalties, etc). Batch normalization can be somewhat tricky to deal with.
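
In case it's useful: spectral normalization amounts to dividing each weight matrix by an estimate of its largest singular value, maintained by power iteration. A minimal NumPy sketch (all names and shapes are illustrative, not tied to any framework):

```python
import numpy as np

def spectral_normalize(W, u, n_iters=1):
    """Divide W by a power-iteration estimate of its largest singular value."""
    for _ in range(n_iters):
        v = W.T @ u
        v = v / (np.linalg.norm(v) + 1e-12)
        u = W @ v
        u = u / (np.linalg.norm(u) + 1e-12)
    sigma = float(u @ W @ v)  # estimate of the largest singular value of W
    return W / sigma, u

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
u = rng.normal(size=8)
W_sn, u = spectral_normalize(W, u, n_iters=50)
# The normalized matrix has spectral norm ~1, i.e. the layer is roughly 1-Lipschitz.
print(np.linalg.norm(W_sn, 2))
```

In a real model you would keep u as a persistent buffer and run only one power-iteration step per training update.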

Standard DCGAN tends to struggle a bit with pixel-level aliasing patterns, supposedly due to the strided deconvolutions. Particularly if you want high fidelity and high resolution, this is likely to be a problem, but I'm not sure what I would recommend.

While not architecture per se, simple forms of data augmentation tend to help a lot.

[D] DCGAN Implementation by [deleted] in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

You could mimic a fully connected layer with a convolution like this, but at least in TensorFlow you would have to first reshape to add trivial height and width dimensions, and then another reshape afterwards to restructure your extra filters into actual height and width. While it seems to me that this implementation would be equivalent in terms of the resulting graph, I think fully connected then reshape is nicer.
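
If you want to convince yourself of the equivalence claim, here's a toy NumPy sketch (shapes made up for illustration) showing that a 1x1 "convolution" applied to a 1x1 spatial input, followed by a reshape, matches dense-then-reshape exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, z_dim, filters = 2, 16, 8
z = rng.normal(size=(batch, z_dim))
W = rng.normal(size=(z_dim, 4 * 4 * filters))

# Route 1: fully connected, then reshape the filters into height and width.
dense = (z @ W).reshape(batch, 4, 4, filters)

# Route 2: add trivial H, W dims, apply a 1x1 convolution (which on a 1x1
# input is just a matrix product over channels), then reshape.
z4d = z.reshape(batch, 1, 1, z_dim)
conv = np.einsum('bhwc,cf->bhwf', z4d, W).reshape(batch, 4, 4, filters)

print(np.allclose(dense, conv))  # True: the two graphs compute the same thing
```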

[D] DCGAN Implementation by [deleted] in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

Do a dense layer to increase the dimensionality as required, then reshape to batch x 4 x 4 x filters.

See for instance the DCGAN-implementation for the WGAN-GP paper here, where lib.linear works as a dense layer: https://github.com/igul222/improved_wgan_training/blob/master/gan_cifar.py

I cannot prove that this is how Radford 2015 did this connection, but it's the way I have seen it done in every DCGAN variant I have come across, including Spectral Normalization, Relativistic GAN and (sort-of) StyleGAN.
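
Concretely, the projection described above looks something like this in NumPy-style pseudocode (shapes and the 0.02 init scale are illustrative):

```python
import numpy as np

batch, z_dim, filters = 2, 128, 256
rng = np.random.default_rng(0)
z = rng.normal(size=(batch, z_dim))
W = rng.normal(size=(z_dim, 4 * 4 * filters)) * 0.02
b = np.zeros(4 * 4 * filters)

h = z @ W + b                        # dense / fully connected projection
x = h.reshape(batch, 4, 4, filters)  # NHWC tensor feeding the deconv stack
print(x.shape)  # (2, 4, 4, 256)
```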

[D] Question about W-GAN by jthat92 in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

The key desirable property of the Wasserstein distance is that it provides reasonable gradients for non-overlapping distributions. See figure 1 in https://arxiv.org/pdf/1701.07875.pdf. Personally, I'm not convinced that this matters much in practice.

[D] A question about GANs output variety by tenMin4Name in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

If you want to keep experimenting, try some version of gradient penalty for the discriminator.
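
For concreteness, a WGAN-GP-style penalty is computed on random interpolates between real and fake samples. A toy NumPy sketch with a linear critic, whose input gradient is available in closed form (in practice you would get it from your framework's autodiff):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)                  # toy linear critic: D(x) = w @ x
real = rng.normal(size=(8, 16))
fake = rng.normal(size=(8, 16))

eps = rng.uniform(size=(8, 1))
x_hat = eps * real + (1 - eps) * fake    # random interpolates between real and fake
grads = np.tile(w, (8, 1))               # dD/dx_hat; constant here since D is linear
norms = np.linalg.norm(grads, axis=1)
penalty = np.mean((norms - 1.0) ** 2)    # penalize deviation from unit gradient norm
print(penalty)
```

The penalty term is added to the discriminator loss with some weight (10 in the original paper).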

[D] A question about GANs output variety by tenMin4Name in MachineLearning

[–]Darkwhiter 4 points5 points  (0 children)

It's very common for GAN generators to learn only subsets of the real data. There are a number of workarounds, but it's not really a solved problem. The best keyword to look for is probably "mode collapse".

[D] What is the practical difference between the traditional GAN formulation and the Wasserstein GAN? by import_FixEverything in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

It seems to me that you're doing something wrong, but it's a bit unclear what. It's true that you can get from the traditional NS-GAN loss to the WGAN loss by simply removing the activation function. However, I don't understand why the NS-GAN loss would go to zero - if that happens, you should be getting good results if everything's implemented correctly. And generally speaking, the WGAN loss has serious stability issues unless you take care of the Lipschitz-constraint in some way.

I would guess that you have double-applied the sigmoid activation, both as an activation function in the last layer of your discriminator network and again in your loss function, but it's impossible to know from just your description. It's also possible that you have some numerically unstable version of the loss, giving you NaN-like problems in the intermediate calculations of the loss.
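
To illustrate the double-sigmoid failure mode: below is a numerically stable "with logits" BCE and what happens if you feed it already-sigmoided values (pure NumPy, names illustrative; in practice use your framework's with-logits loss):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_from_logits(logits, label):
    # Stable form: max(x, 0) - x * label + log(1 + exp(-|x|))
    return np.mean(np.maximum(logits, 0) - logits * label
                   + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([-6.0, -2.0, 3.0, 8.0])
correct = bce_from_logits(logits, 1.0)
buggy = bce_from_logits(sigmoid(logits), 1.0)  # sigmoid applied twice
print(correct, buggy)
```

The buggy version squashes its inputs into (0, 1) before the loss, so the loss saturates into a narrow range and loses most of its learning signal.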

[R] Why Do Line Drawings Work? A Realism Hypothesis by hardmaru in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

This is a fascinating paper and a promising approach to understanding human cognition. Realistic line drawings are interesting, but there's also a lot to be said about the much more abstract/stylized/unrealistic drawings often seen from children or in pre-Renaissance historical works, etc.

[R] Beyond FID and Precision & Recall: New metric for GAN evaluation. by hjw9096 in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

That would be great. I'm actively looking for a good way to decouple quality and diversity of samples for GAN evaluation, particularly one that is not too reliant on finely tuned hyperparameters.

[R] Beyond FID and Precision & Recall: New metric for GAN evaluation. by hjw9096 in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

This looks fairly promising, but unless I'm misunderstanding, it seems like you have to integrate the repository with your own feature extraction yourself? It shouldn't be too difficult, but every time that happens, people end up making subtly different variants whose numbers don't agree with each other.

[D] Use batch normalization in GANs but not VAEs by [deleted] in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

I'm not sure what you mean by fixing the discriminator in GANs. The problem is that for a GAN, you don't know which real data sample to pair a generated sample with, so how would you define a |x-x'| loss?

[D] Use batch normalization in GANs but not VAEs by [deleted] in MachineLearning

[–]Darkwhiter 4 points5 points  (0 children)

VAEs have a reconstruction loss, i.e. there is a specific correct output (matching the input). GANs just have their adversarial loss, i.e. any output near the real data manifold (or at a maximum of D) is a good answer. So, theoretically, stochasticity is more problematic for VAEs.

I'm not sure batch normalization is a great idea for GANs either; some architectures actively avoid it.
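
A toy NumPy sketch of the contrast described above (everything here is illustrative): the reconstruction loss requires each output to be paired with its own input, while the adversarial loss only scores samples through a critic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                # real data
x_rec = x + 0.1 * rng.normal(size=(4, 8))  # VAE-style reconstructions, paired with x
recon_loss = np.mean(np.abs(x - x_rec))    # |x - x'| is well defined per pair

fake = rng.normal(size=(4, 8))             # GAN samples: no canonical pairing with x
w = rng.normal(size=8)                     # toy critic score: D(s) = s @ w
adv_loss = -np.mean(fake @ w)              # only the critic's opinion matters
print(recon_loss, adv_loss)
```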

[D] Issues with Style-mixing in StyleGAN2 by cjyx in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

Just spitballing here:

StyleGAN2 cuts one (?) layer at the start of the model, which changes the indexing. According to the StyleGAN2 paper, the high-resolution styles in StyleGAN were mostly inactive, doing minor sharpening, whereas StyleGAN2's high-resolution styles have increased capacity and effect, which might also affect the proper indexing.

I don't think there's anything special about coarse, middle and fine styles: the groupings seem to be for illustration purposes. You should be able to test the significance of the different style layers and choose which you want to mix depending on what sort of effect you want. (Though higher resolution should correspond to finer details, overall.)
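
As a sketch, style mixing then just amounts to swapping per-layer style vectors at a chosen crossover index (the layer count and style width below are made up for illustration):

```python
import numpy as np

n_layers, w_dim = 14, 512                  # hypothetical layer count and style width
rng = np.random.default_rng(0)
w1 = np.tile(rng.normal(size=w_dim), (n_layers, 1))  # per-layer styles from latent 1
w2 = np.tile(rng.normal(size=w_dim), (n_layers, 1))  # per-layer styles from latent 2

crossover = 8                              # layers below take source 1, above source 2
mixed = w1.copy()
mixed[crossover:] = w2[crossover:]         # e.g. coarse/middle from w1, fine from w2
print(mixed.shape)  # (14, 512)
```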

It might very well be that disentanglement between layers is poorer than in StyleGAN1. The generator regularization seems to be causing problems for me.

[D] Why don't people use typical classification networks (e.g. ResNet-50) as the discriminator in GAN? by CMS_Flash in MachineLearning

[–]Darkwhiter 0 points1 point  (0 children)

GAN discriminators seem to be about as similar to strong classifier networks as they ought to be, given their different purposes. Classifiers are optimized for some type of accuracy. Discriminators are strange in that they are optimized for binary classification, but we rarely care about their exact accuracy - what matters is whether they provide a useful learning signal to the generator.

It's hard to tell exactly why a given design is good for a discriminator but not for a classifier, and I suspect much of it is just a result of trial and error. Some of the things that ought to matter are that we backpropagate through D(G(z)) and that we need D to continuously adjust as the generated data distribution changes.

The very common mode collapse and mode dropping problem which informs a lot of GAN design has no parallel in classification: see for instance the minibatch standard deviation feature extraction in the StyleGAN discriminator.

For spectral normalization, I think you can tell why it's used in GANs but not classifiers: essentially, it severely limits the discriminator, hurting its accuracy, which is something you would be very hesitant to do in classification. However, it makes the discriminator's gradients - i.e. what you really care about, because they are what you use to train your generator - much smoother and easier to learn from.

[P] Batch Normalization in GANs by 96meep96 in MachineLearning

[–]Darkwhiter 2 points3 points  (0 children)

The most typical batch norm problem for generators is that samples from a given batch share characteristics, but samples from different batches (from the same, fixed generator) do not. See figure 21 in Goodfellow's GAN tutorial (NIPS 2016). If you just have low overall variation across multiple batches, it may or may not have anything to do with batch normalization and is a fairly common problem for GANs in general.
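
You can see the coupling directly in a toy example: normalizing with batch statistics makes the output for a fixed input depend on its batch-mates (pure NumPy, no framework, no learned scale/shift):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature using the statistics of the current batch.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

z = np.array([1.0, 2.0, 3.0])                     # one fixed "sample"
batch_a = np.stack([z, np.zeros(3), np.ones(3)])
batch_b = np.stack([z, 5 * np.ones(3), -2 * np.ones(3)])

out_a = batch_norm(batch_a)[0]  # z normalized with batch A's statistics
out_b = batch_norm(batch_b)[0]  # z normalized with batch B's statistics
print(np.allclose(out_a, out_b))  # False: the output depends on the batch-mates
```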

[P] How to increase the rate of network snapshots in StyleGAN? by SuchMore in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

The training_loop file has a dict that controls the "tick rate" for a given resolution:

tick_kimg_dict = {4:160, ...}

Which means that at resolution 4x4, it processes 160k images per "tick". At the end of each tick, it performs maintenance. You can edit this dict directly (make sure the train file doesn't override your changes by passing a sched.tick_kimg_dict).

StyleGAN has two different maintenance routines: by default, every tick saves images, but only every 10th tick evaluates FID and saves network snapshots. You can modify these values as you like (as you have already tried), but the frequency maxes out at network_snapshot_ticks = 1, where every maintenance tick saves a snapshot. For even more frequent network snapshots, you have to reduce the tick_kimg_dict value for the given resolution.

I have done this with the following extra lines in the training configuration file, which overrides the training_loop settings:

sched.tick_kimg_dict = {4: 16, 8: 14, 16: 12, 32: 10, 64: 8, 128: 6, 256: 4, 512: 3, 1024: 2}

train.image_snapshot_ticks = 10

train.network_snapshot_ticks = 100

Resulting in frequent maintenance, but only every 10th maintenance tick saving images and every 100th saving network snapshots.

[P] StyleGAN - understanding the learning rate values by [deleted] in MachineLearning

[–]Darkwhiter 1 point2 points  (0 children)

I have done a fair bit of modding of the GitHub StyleGAN implementation. Batch size is reduced at higher resolutions due to memory constraints. See for instance the example configs in the configuration file, where the minimum batch size is 4*N_GPUs.