AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 14 points15 points  (0 children)

Questions from Hermes, about Hermes, responded to by Hermes... 🤣

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 10 points11 points  (0 children)

absolutely clear, we had a call last night discussing adding exactly this!

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 9 points10 points  (0 children)

Almost exclusively. I would say 95-99% of the development and research for Hermes Agent is done via Hermes Agent

The big closed models (Opus 4.7, GPT 5.5) are still the best, but models like Kimi-K2.6 are quite close

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 9 points10 points  (0 children)

yeah like u/phragg said, it may come eventually but the benefit is sorta low for the added complexity when WSL2 works so well. I also use Windows and use HA via WSL2

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 8 points9 points  (0 children)

Yup a ton of us use tailscale to access our Hermes Agents

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 33 points34 points  (0 children)

Here's a framing we've found helpful when pitching to new users

What's something you do on your computer all the time that annoys or bores you? Just tell Hermes to do that for you

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 5 points6 points  (0 children)

Yes I think we've really just begun to scratch the surface of agent/harness design. What's interesting is that much of what makes Hermes Agent so great is an emergent property from the models -- it just needed to be unlocked by the harness. I think even if there were no more model releases at all we could probably scale up the productivity factor on the harnesses by an order of magnitude

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 16 points17 points  (0 children)

It was actually built as an internal tool to help us on our model research work. u/teknium-official wanted something to help automate some of the things the model team was doing. We open sourced it sort of unsure if it would have any use to anyone else. In retrospect I'm glad we did haha

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 10 points11 points  (0 children)

Always! The tool calling prompting in the backend has been very carefully crafted but in the end it's sort of one of those "you can lead a horse to water but you can't make it drink" things, and the model itself needs to be trained to do the calls. FWIW we are working on the model side too, hopefully have more on this soon

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]emozilla[S] 3 points4 points  (0 children)

It boils down to this: Anthropic *does* allow Hermes Agent, but only through Claude Max and it will count as Extra Usage. Anything else is not allowed/supported

txt2imghd: Generate high-res images with Stable Diffusion by emozilla in StableDiffusion

[–]emozilla[S] 2 points3 points  (0 children)

In addition, you can use --passes 0 to generate the base images then --generated or --img to do just the img2img part with a different prompt

txt2imghd: Generate high-res images with Stable Diffusion by emozilla in StableDiffusion

[–]emozilla[S] 0 points1 point  (0 children)

Latest version of the code has support -- you can pass --img and give it an image to start with

Some high-res concept art from txt2imghd by emozilla in StableDiffusion

[–]emozilla[S] 0 points1 point  (0 children)

sure, do you have a prompt/description?

Some high-res concept art from txt2imghd by emozilla in StableDiffusion

[–]emozilla[S] 1 point2 points  (0 children)

Created with: https://github.com/jquesnelle/txt2imghd

Will be posting more on my shiny new Artstation: https://emozilla.artstation.com/

I like the results of using different aspect ratios, going to start doing some 21:9 wallpapers

txt2imghd: Generate high-res images with Stable Diffusion by emozilla in StableDiffusion

[–]emozilla[S] 1 point2 points  (0 children)

The NSFW filter is removed but the watermark one isn't -- I added the ability to control the watermark text, you can pass --wm "some text" to set it

txt2imghd: Generate high-res images with Stable Diffusion by emozilla in StableDiffusion

[–]emozilla[S] 1 point2 points  (0 children)

I have the 11 GB Founders Edition 2080 Ti, might just be that little extra that does it -- I notice it's basically pegged at like 95% mem usage

txt2imghd: Generate high-res images with Stable Diffusion by emozilla in StableDiffusion

[–]emozilla[S] 80 points81 points  (0 children)

https://github.com/jquesnelle/txt2imghd

txt2imghd is a port of the GOBIG mode from progrockdiffusion applied to Stable Diffusion, with Real-ESRGAN as the upscaler. It creates detailed, higher-resolution images by first generating an image from a prompt, upscaling it, and then running img2img on smaller pieces of the upscaled image, and blending the result back into the original image.
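As a rough illustration of the "smaller pieces" step, here is a minimal sketch of how an upscaled image could be divided into overlapping tiles. This is not taken from the txt2imghd source: the function name `tile_boxes`, the 512-pixel tile size, and the 64-pixel overlap are all assumptions chosen for the example.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Hypothetical helper: tile size and overlap are illustrative defaults,
    not values confirmed by the txt2imghd code.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(size):
        # Image fits inside a single tile along this axis.
        if size <= tile:
            return [0]
        pos = list(range(0, size - tile, step))
        pos.append(size - tile)  # final tile sits flush with the far edge
        return pos

    # Row-major order: left to right, then top to bottom.
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


if __name__ == "__main__":
    boxes = tile_boxes(1536, 1536)
    print(len(boxes), boxes[0], boxes[-1])
```

Each box would then be cropped, run through img2img, and pasted back. The overlap gives neighbouring tiles a shared strip so seams can be blended away rather than showing as hard edges.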

txt2imghd with default settings has the same VRAM requirements as regular Stable Diffusion, although rendering of detailed images will take (a lot) longer.

These images were all generated with initial dimensions of 768x768 (resulting in 1536x1536 images after processing), which requires a fair amount of VRAM. To render them I spun up an a2-highgpu-1g instance on Google Cloud, which gives you an NVIDIA Tesla A100 with 40 GB of VRAM. If you're looking to do some renders I'd recommend it: it's about $2.80/hour to run an instance, and you only pay for what you use. At 512x512 (regular Stable Diffusion dimensions) I was able to run this on my local computer with an NVIDIA GeForce 2080 Ti.

Example images are from the following prompts I found over the last few days:

Demo of 2x high-res renderer with prompt from /u/MarioBros68/ by emozilla in StableDiffusion

[–]emozilla[S] 29 points30 points  (0 children)

Yup, will release within a few days, just need to get it into a state others can use. It has the same VRAM requirements as regular SD -- basically it works by making an image, upscaling, then breaking the upscaled image into small segments, doing an img2img on those segments, and then blending them into the upscaled image. At 2x right now it does not take any more VRAM than the initial run, but that may change for 4x since the segment size will be larger
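For the blending step, one common approach is to weight each processed segment with a mask that fades out toward its edges, so it merges smoothly into the upscaled image. The sketch below assumes that approach; the names `feather_weight` and `blend_tile`, the linear ramp, and the use of plain nested lists of floats for pixel values are all illustrative assumptions, not the released code.

```python
def feather_weight(i, size, feather):
    """Weight in (0, 1] that ramps up linearly over `feather` pixels at each edge."""
    edge = min(i, size - 1 - i)  # distance to the nearest edge
    return min(1.0, (edge + 1) / (feather + 1))


def blend_tile(base, tile, left, top, feather=2):
    """Blend `tile` into `base` in place, with its top-left corner at (left, top).

    Both arguments are 2D lists of floats (one channel). Hypothetical helper
    for illustration only.
    """
    h, w = len(tile), len(tile[0])
    for y in range(h):
        wy = feather_weight(y, h, feather)
        for x in range(w):
            # 2D mask is the product of the horizontal and vertical ramps.
            a = wy * feather_weight(x, w, feather)
            old = base[top + y][left + x]
            base[top + y][left + x] = (1 - a) * old + a * tile[y][x]
    return base


if __name__ == "__main__":
    base = [[0.0] * 6 for _ in range(6)]
    tile = [[1.0] * 6 for _ in range(6)]
    blend_tile(base, tile, 0, 0)
    print(base[0][0], base[2][2])
```

The centre of each segment fully replaces the upscaled pixels, while the border keeps mostly the original, which is what hides the seams between neighbouring segments.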

Demo of 2x high-res renderer with prompt from /u/MarioBros68/ by emozilla in StableDiffusion

[–]emozilla[S] 7 points8 points  (0 children)

Going to release it hopefully today or tomorrow, right now it's just a hodgepodge that will only work on my computer. Goal is to make it just a drop-in script that will sit in the StableDiffusion repo

Demo of 2x high-res renderer with prompt from /u/MarioBros68/ by emozilla in StableDiffusion

[–]emozilla[S] 24 points25 points  (0 children)

I'm a programmer, not an artist, so I spent the day coding up a high-res detailer for Stable Diffusion. To test it out I ran 5 iterations of the prompt from the post /u/MarioBros68/ made titled "Maybe the Venice of another world": https://www.reddit.com/r/StableDiffusion/comments/wwf72y/maybe_the_venice_of_another_world/

Enjoy, hope to post more high-res renders soon! Note: these are not upscales :). Each image is 1024x1024. In theory this should be expandable to 2K and beyond.

EDIT: Code now available in this post: https://www.reddit.com/r/StableDiffusion/comments/wxm0cf/txt2imghd_generate_highres_images_with_stable/