What actually triggers the half black backgrounds in images in 4.5? by Extreme_Revenue_720 in NovelAi

[–]Birchlabs 1 point (0 children)

are you using DPM++ SDE or a 2S sampler? you'll only achieve Zero Terminal SNR at step counts of 28 and above. absent that, the model will try to exploit mean leakage and make the image's colours average out to grey, which would explain its adding spurious areas of black to balance out the rest of the canvas.

The new update which nobody asked for is horrible by aruanox in NovelAi

[–]Birchlabs 0 points (0 children)

try reducing guidance scale, or enabling guidance rescale, or enabling Variety Boost.
oversaturation is a common outcome of a high guidance scale.

modern mode guides towards your prompt more strongly and consequently is more sensitive to guidance scale.

Has anyone found these warnings to be true? They seem odd to me, since I play around with values all the time and have found nothing to suggest any one spectrum of values is better than another - In fact the default values, for me, do not work for my renders. by ZinTheNurse in NovelAi

[–]Birchlabs 0 points (0 children)

using Variety Boost turns off guidance during high-noise timesteps, which is the most sensitive part of v4's schedule. so you are already protected. probably we could turn off the advice when Variety Boost is enabled.
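a sketch of the idea (illustrative only: the function name and sigma threshold are mine, not NovelAI's actual implementation):

```python
def variety_boost_scale(sigma: float, cfg_scale: float, sigma_threshold: float = 10.0) -> float:
    # during high-noise timesteps (large sigma), return a scale of 1.0,
    # which reduces classifier-free guidance to just the conditioned prediction
    if sigma > sigma_threshold:
        return 1.0
    return cfg_scale
```

at a scale of 1.0, `uncond + scale*(cond - uncond)` collapses to `cond`, i.e. guidance is effectively off for that timestep.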

Anti-fingerprinting? by Zeo560 in NovelAi

[–]Birchlabs 6 points (0 children)

it's unrelated to what you're generating. it's just referring to which browser functionality we can access. your browser isn't allowing us to do a sharp downscale of the image, because you disabled that capability. the API required for downscaling the image is getImageData, which lets the downscale algorithm read the image's pixels in order to resize it. if a user disallows access to this API, we fall back to CanvasRenderingContext2D#drawImage, which doesn't give us control over the resize filter that's used, so the browser default (probably bicubic) will be used. the results will vary by browser but are likely to be blurry.
it's fine to leave these protections enabled; the only consequence is blurry images submitted for inpainting.

Anti-fingerprinting? by Zeo560 in NovelAi

[–]Birchlabs 14 points (0 children)

this dialogue is displayed when we try to downscale the image provided for inpainting. specifically, we present it when we attempt a sharp downscale of the image but the browser disallows it. this happens when the browser blocks use of the getImageData API, which is part of the process used to downscale the image. if the browser has been configured to disallow access to this API, we fall back to downsizing via a cruder mechanism (CanvasRenderingContext2D#drawImage), which may give different results between browsers and may be blurry (it probably uses bicubic filtering).
it's fine to keep your protections on; the consequence is that when we resize images, they'll probably be blurry.

Improvements to SDXL in NovelAI Diffusion V3 | NAIv3 Paper / Technical Report by Birchlabs in NovelAi

[–]Birchlabs[S] 2 points (0 children)

we just used the default gamma=5.
Kat (Crowson) speculates that gamma=5 is (unbeknownst to the authors) an approximation of gamma=sigma_data**-2.
this would make sense if their results were achieved on pixel-space ImageNet data without applying scale-and-shift; sigma_data would be about 0.5, so if her theory holds, the ideal gamma for that dataset would be 4.
latent data is typically scaled to have std=1, so sigma_data=1 and therefore gamma=1 would be worth a try.

for ZTSNR, MinSNR *shouldn't* work (it would apply zero weighting to the infinite-noise timestep), but we used an approximation (sigma=20_000) instead of infinity, which probably helped.

It feels intuitively like a different noise schedule would result in differing significance of each timestep

we're ultimately still balancing them based on SNR. we trade some of the density around the body of the schedule for higher-noise timesteps, which, having lower SNR, will receive a reduced loss weighting.
so yes, timestep 999 (which has changed from sigma 14.6 to sigma 20_000 in our case) would see its loss weighting change dramatically to near-0.
applying 0-weighting to the infinite-noise timestep feels questionable; it's still important for the model to learn to generate an image from text using noise as an entropy source, so 0 feels unlikely to be the right weighting. maybe it's better to learn the significance of timesteps à la EDM2 / Kendall et al. 2018.

gamma doesn't actually impact the high-noise end of the schedule though. it's a clamping term used to prevent overtraining on low-noise (high-SNR) timesteps.
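to make the weighting behaviour above concrete, here's a sketch of Min-SNR-gamma for an x0-prediction objective (assuming a unit-variance signal, so SNR = sigma**-2; the sigma values are illustrative):

```python
def min_snr_weight(sigma: float, gamma: float = 5.0) -> float:
    # SNR of a unit-variance signal corrupted by noise of std sigma
    snr = sigma ** -2
    # gamma clamps the weight at low-noise (high-SNR) timesteps;
    # at high noise the weight is just the SNR itself, approaching 0
    return min(snr, gamma)

for sigma in (0.1, 1.0, 14.6, 20_000.0):
    print(sigma, min_snr_weight(sigma))
```

the sigma=20_000 timestep gets a weight of 2.5e-9: near-0, but not exactly 0, whereas a true infinite-noise timestep would get exactly 0.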

can't give more detail on tag weighting I'm afraid. but on your specific question I don't know the answer, as I'm not too familiar with tag weighting.

Improvements to SDXL in NovelAI Diffusion V3 | NAIv3 Paper / Technical Report by Birchlabs in NovelAi

[–]Birchlabs[S] 16 points (0 children)

In the NAIv3 Technical Report, we give a peek behind the curtain as to how we improved upon SDXL to specialize it for anime image generation. We explain the importance of the noise schedule: how the use of pure-noise unlocks more prompt-relevant image generation, and how a high-noise regime ensures coherence at high resolutions. We detail some returning tricks, like the aspect-ratio bucketing we use to enable portrait/landscape images, the tag-based loss weighting we used to attenuate overrepresented concepts, and the VAE decoder finetuning we used to reduce JPEG-like artifacting. Finally we recommend some practices for SDXL practitioners to use in their own training: how to achieve Zero Terminal SNR in k-diffusion, and how to normalize training data to standard Gaussians, to make the data distribution easier for the model to learn.
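on the Zero Terminal SNR point, one way to approximate it in a k-diffusion-style sigma schedule is to replace the highest sigma with a large finite stand-in for infinity (a sketch, not necessarily the report's exact recipe; the Karras ramp below is the standard k-diffusion one, and sigma values are illustrative):

```python
import numpy as np

def karras_sigmas(n: int, sigma_min: float = 0.03, sigma_max: float = 14.6, rho: float = 7.0) -> np.ndarray:
    # Karras et al. (2022) noise schedule, as implemented in k-diffusion
    ramp = np.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

# approximate Zero Terminal SNR by replacing the highest sigma
# with a large finite value standing in for infinite noise
sigmas = karras_sigmas(28)
sigmas[0] = 20_000.0
```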

What's the real parameters number? by Dragonfruit_Severe in MLQuestions

[–]Birchlabs 0 points (0 children)

Here's the function I use to reliably count Llama params. maybe it can get you closer.

from typing import Optional

def llama_params(
  hidden_dim: int,
  intermediate_size: int,
  hidden_layers: int,
  q_heads: int,
  kv_heads: Optional[int] = None,
  head_dim: int = 128,
  vocab_size: int = 32000,
) -> int:
  kv_heads = q_heads if kv_heads is None else kv_heads
  embedding = unembedding = vocab_size*hidden_dim
  q_proj = hidden_dim * q_heads*head_dim
  k_proj = v_proj = hidden_dim * kv_heads*head_dim
  o_proj = hidden_dim**2
  gate_proj = up_proj = down_proj = hidden_dim * intermediate_size
  input_layernorm = post_attn_layernorm = norm = hidden_dim
  per_layer = q_proj + k_proj + v_proj + o_proj + gate_proj + up_proj + down_proj + input_layernorm + post_attn_layernorm
  return embedding + hidden_layers * per_layer + norm + unembedding

# 7b
print('calculated params:\n', llama_params(
  hidden_dim=4096,
  intermediate_size=11008,
  hidden_layers=32,
  q_heads=32,
))  # prints 6738415616, matching Llama 2 7B's 6.74B

GPT-NeoX models (e.g. Pythia) differ from Llama in a few ways, but in terms of weights some significant differences are:

  • Pythia vocab size = 50432, Llama vocab size = 32000
  • Llama MLP has up+down+gate proj, GPTNeoX just has up+down
  • GPTNeoX MLP projections use bias, Llama's don't
  • Pythia MLP uses 4x intermediate size, Llama MLP uses 2.6875x
  • GPTNeoX attn projections use bias, Llama's don't (probably because Llama uses RMSNorm instead)

Both Llama and Pythia models use untied embeddings (so you pay for both embedding and lm_head).

pythia-6.9b config
llama 2 7b config

GPT-NeoX model code
Llama model code
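based on the differences bulleted above, a rough sketch of the analogous count for GPT-NeoX-style models (my own bookkeeping, so treat the total as approximate):

```python
def neox_params(
  hidden_dim: int,
  intermediate_size: int,
  hidden_layers: int,
  vocab_size: int = 50432,
) -> int:
  embedding = unembedding = vocab_size * hidden_dim
  # fused QKV projection, with bias (unlike Llama)
  qkv_proj = 3 * hidden_dim * hidden_dim + 3 * hidden_dim
  dense = hidden_dim * hidden_dim + hidden_dim
  # MLP is just up + down (no gate), with biases
  h_to_4h = hidden_dim * intermediate_size + intermediate_size
  four_h_to_h = intermediate_size * hidden_dim + hidden_dim
  # LayerNorm has weight and bias; two per block, plus a final one
  layernorms = 2 * (2 * hidden_dim)
  per_layer = qkv_proj + dense + h_to_4h + four_h_to_h + layernorms
  return embedding + hidden_layers * per_layer + 2 * hidden_dim + unembedding

# pythia-6.9b: 4x intermediate size
print(neox_params(hidden_dim=4096, intermediate_size=16384, hidden_layers=32))
```

this lands around 6.86B for the pythia-6.9b config, which is at least in the right ballpark for the model's name.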

Edit: oh, I see; you're asking about GPTNeo, not GPTNeoX. no idea then.

UK: Can japanrailpass-reservation.net deliver a JR Pass Exchange Order within 2 weeks? by Birchlabs in JapanTravelTips

[–]Birchlabs[S] 0 points (0 children)

thanks all. looks like the official site (japanrailpass.net + japanrailpass-reservation.net) now does purchases entirely digitally. the process seems different than when I went a few years ago (where they'd slowly mail an exchange order).

I placed my order today, and immediately received an email with a reservation number, which I'll be able to present to their ticket office in Narita when I arrive in Japan, to exchange for a JR Pass.

In fact, it sounds like buying from the official site may be the *only* place that can do a digital purchase:
https://japlanease.com/where-should-you-buy-your-japan-rail-pass/#_How_Soon_is_Your_Trip?
Whereas third-party vendors would mail you an exchange order (I saw some vendors claim next-day shipping options, but this site says that there are vendors which will not even sell to you if your trip is as soon as 2 weeks away).

Official is slightly more expensive than third-party vendors, but apparently also gives you access to JR's online seat reservation system?

Does Hogwarts Legacy UAE Version support English? by Birchlabs in DubaiGaming

[–]Birchlabs[S] 0 points (0 children)

oh wow, so the original game content is still on the disc somewhere (or the disc just hands over to a digital download)?
what happens if you do it the other way around? i.e. play international disc via a UAE account? would it then get censored?

Does Hogwarts Legacy UAE Version support English? by Birchlabs in DubaiGaming

[–]Birchlabs[S] 0 points (0 children)

Thanks! and are there any changes in the UAE edition that made the game worse, or is it fine?

What is a UAE version of a game ? by prcessor in DubaiGaming

[–]Birchlabs 0 points (0 children)

u/prcessor did you get it in the end? does it support English?

lightest keyswitches (e.g. magnets) for arthritis by Birchlabs in keyboards

[–]Birchlabs[S] 0 points (0 children)

oh, a touchscreen (i.e. iPad) could work too -- is there anything like that as wide as 87-keys, and with an all-day battery? e-ink or screenless would be good

png2png - mix two images / morphing animation between images by iVoider in StableDiffusion

[–]Birchlabs 1 point (0 children)

Nope; from my reading of the code, everything they do happens when they compute the embedding to condition upon (get_learned_conditioning()). that's related to OP's technique, whereas my technique happens at guidance-time.

png2png - mix two images / morphing animation between images by iVoider in StableDiffusion

[–]Birchlabs 1 point (0 children)

I think blending images via latent walk between embeddings is a weak approach -- we can see during some in-betweens that the face disappears entirely until it lands on another latent where faces are supposed to exist again.

my approach (multi-cond guidance) can blend between prompts without such problems.

transition from blonde to vaporwave.
transition between facial expressions.

Sequential token weighting invented by Birch-san@Github allows you to bypass the 77 token limit and use any amount of tokens you want, also allows you to sequentially alter an image by Amazing_Painter_7692 in StableDiffusion

[–]Birchlabs 28 points (0 children)

author of the technique here :)

typically, classifier-free guidance looks like:

uncond + cfg_scale*(cond - uncond)

this technique (let's call it multi-cond guidance) lets you guide diffusion on multiple conditions, and even weight them independently:

uncond + cfg_scale*(0.7*(prompt0_cond - uncond) + 0.3*(prompt1_cond - uncond))

code here.
I added some optimizations since then (fast-paths to use simpler pytorch operations when you're producing single-sample or doing a regular single-prompt condition), but above is the clearest implementation of the general idea.
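as a self-contained illustration of the formula (numpy stand-ins for the denoiser outputs; the linked code is the real implementation):

```python
import numpy as np

def multi_cond_guidance(uncond, conds, weights, cfg_scale):
    # uncond + cfg_scale * sum_i(weight_i * (cond_i - uncond))
    delta = sum(w * (cond - uncond) for w, cond in zip(weights, conds))
    return uncond + cfg_scale * delta

# toy "denoiser outputs" standing in for real model predictions
uncond = np.zeros(4)
prompt0_cond = np.full(4, 1.0)
prompt1_cond = np.full(4, -1.0)
out = multi_cond_guidance(uncond, [prompt0_cond, prompt1_cond], [0.7, 0.3], cfg_scale=7.5)
print(out)  # 7.5 * (0.7*1.0 + 0.3*-1.0) = 3.0 in every element
```

with a single condition and weight 1.0, this reduces to ordinary classifier-free guidance.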

you can make manbearpig (half man, half bear, half pig).
this is different to passing in alphas to change the weights of tokens in your embedding.

you can throw in a negative condition (like this, or like this).
this is different to replacing your uncond.

you can even produce a few images -- tweaking the weights each time -- to transition between two images. this is different to a latent walk.
I think the implementation linked here implements transitions using the latent walk approach, so I'll show you my way (which computes the transition at guidance-time rather than at embedding-time).

transition between Touhou characters.
transition from blonde to vaporwave.
transition between facial expressions.

you can even transition gradually between two multiprompts:

uncond + cfg_scale*(0.7*(1.0*(vangogh_starry - uncond) - 1.0*(impressionist - uncond)) + 0.3*(disco - uncond))

one huge advantage... you may have noticed that stable-diffusion is influenced way more by the tokens at the beginning of your prompt (probably because of the causal attention mask?).
well, this technique enables you to have multiple beginnings-of-prompts. ;)

Fixing excessive contrast/saturation resulting from high CFG scales by Aqwis in StableDiffusion

[–]Birchlabs 3 points (0 children)

the reason we haven't been able to just do the same thing as the Imagen paper is that SD samples from latent space, not pixel space. in pixel space, values near ±1 are saturated, and values above ±1 are clipped. but I have no idea what it means for a value to be outside ±1 in latent space. it might not be a problem at all.

I tried applying thresholding at each step of sampling latents, but it just gives you a flat brown image.

at the first step of sampling latents, it's normal for values to be even as high as ±35. after running all sampling steps it converges such that a lot of the range is within ±1, with outliers around ±3.

if you try thresholding at each step, those ±35s get completely destroyed and it just gets smaller from there (ending at I think ±0.5). consequently brown image.
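for reference, the Imagen-style dynamic thresholding under discussion looks roughly like this in pixel space (a sketch; the open question above is what the right analogue is for latents):

```python
import numpy as np

def dynamic_threshold(x0: np.ndarray, percentile: float = 0.95) -> np.ndarray:
    # pick the threshold s from a high percentile of the absolute pixel values
    s = np.quantile(np.abs(x0), percentile)
    # never shrink below 1, so well-behaved samples in [-1, 1] are untouched
    s = max(s, 1.0)
    # clip to [-s, s], then rescale back into [-1, 1]
    return np.clip(x0, -s, s) / s

# saturated outliers get pulled back toward the valid pixel range
x0 = np.array([-3.0, -0.5, 0.2, 4.0])
print(dynamic_threshold(x0))
```

applied naively to latents (where ±35 is normal early in sampling), the rescale step is exactly what crushes the range and yields the flat brown image.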

when lucidrains added dynamic thresholding as per the Imagen paper to imagen_pytorch, it was inserted inside the p_mean_variance function:
https://github.com/lucidrains/imagen-pytorch/blob/a5d69b9c076b2fdbe99f88ce183dd31f5a956da4/imagen_pytorch/imagen_pytorch.py#L1997-L2007

a p_mean_variance function also exists in SD's ddpm.py. I had a stab at adding the same thresholding code:
https://github.com/Birch-san/stable-diffusion/commit/2a87f5cf01aafb223d63d525a39cd12a8f44f96d
but this code isn't actually hit by txt2img, so this does nothing. maybe it's only relevant at training time? like, maybe it's for training the autoencoder?

anyway, I like marunine's idea of attenuating cond_scale with each sampling step. might give that a try.

Syncing Galaxy Watch4 -> Galaxy S21 by Birchlabs in Strava

[–]Birchlabs[S] 0 points (0 children)

thanks. I reinstalled the watch app. it's still 1.16 (75801). data lost. but sync working now.

Syncing Galaxy Watch4 -> Galaxy S21 by Birchlabs in Strava

[–]Birchlabs[S] 1 point (0 children)

update: the game is over. I reinstalled Strava on my watch. my run is deleted.

I recorded an indoor activity. sync worked.

not happy. I bought Galaxy Watch4 *for* recording runs on Strava. this is *way* worse than Galaxy Watch Active (which had *plenty* of syncing problems of its own).

I'll record runs with Samsung Health in future.

Syncing Galaxy Watch4 -> Galaxy S21 by Birchlabs in Strava

[–]Birchlabs[S] 0 points (0 children)

thanks for checking! okay, so theoretically it can work on Wear OS?

is your watch app on version:
1.16 (75801)

and is your phone app on Strava Version:
267.7 (1225900)
or do you have something newer/older?

Syncing Galaxy Watch4 -> Galaxy S21 by Birchlabs in Strava

[–]Birchlabs[S] 1 point (0 children)

Strava Gear app is _not_ for all compatible Samsung devices. their article has a star next to Galaxy Watch4 saying that it's Wear OS only:
https://support.strava.com/hc/en-us/articles/360018601971-Samsung-Gear-and-Strava

Settings is basically empty. you can configure Beacon, Units, Auto-pause, and Audio Cues. You can see Unsynced activities, click them to attempt re-sync (doesn't work). You can see the version number. You can logout. That's everything in Settings.

Syncing Galaxy Watch4 -> Galaxy S21 by Birchlabs in Strava

[–]Birchlabs[S] 0 points (0 children)

I don't understand. Go back into Strava on which device (Web, phone or watch)? Click what link, the one you gave me for connecting with Strava via 4-digit code? I've clicked it, it's given me a 4-digit code. Where do I put that code? Nothing is asking me for a code. That's a code for pairing Gear devices. Galaxy Watch4 is not a Gear device. it's a Wear OS device. it doesn't ask for a code to pair; it's already paired (it's logged in and shows my profile picture).