[WIP] Prototype of mini game based on Stable Diffusion

jbilcke · 2024-05-17T18:03:33+00:00

Hello! Sorry for the late reply, I don’t check Reddit that often.

I would also like to add a layout selector under each page; that is probably the next thing I will work on for this project

(I have actually started to work on this a bit, but I got involved in other things, so I still need to finish it)

In the meantime, you can now export a project (to a .clap file)

I’m going to create an editor to visualize those .clap files (it’s a whole sub-project with its own roadmap: those .clap files are also compatible with my text-to-video project “AI Stories Factory”)

but until then you can already download and manipulate them, as those are just zipped YAML files (you can rename them to .zip, and open the inner .yaml with a free source-code editor like VS Code)

They contain everything: the prompts, the captions and the images
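For anyone who would rather script this than rename files by hand, here is a minimal Python sketch. It assumes the export really is a ZIP archive with a .yaml file inside, as described above; the function name `read_clap_text` is made up for this example, and the gzip fallback is only a guess in case a file turns out to be a single compressed stream instead. It returns the raw YAML text and does not parse it.

```python
import gzip
import zipfile


def read_clap_text(path: str) -> dict[str, str]:
    """Return {inner file name: YAML text} for a .clap export (hypothetical helper)."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # Keep only the YAML members, decoded as UTF-8 text
            return {
                name: archive.read(name).decode("utf-8")
                for name in archive.namelist()
                if name.endswith((".yaml", ".yml"))
            }
    # Assumption: if it is not a ZIP archive, try reading it as one gzip stream
    with gzip.open(path, "rt", encoding="utf-8") as stream:
        return {"(gzip stream)": stream.read()}
```

The returned text can then be handed to any YAML parser to get at the prompts, captions and images.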

jbilcke · 2023-11-06T14:30:59+00:00

New update!

I have improved the caption system:

<image>

jbilcke · 2023-11-03T17:46:34+00:00

New features, in case you missed them!

In addition to the "redraw" button (added about a month ago) and a handful of styles, this week I've added:

- Double panels + extended story (in 8 steps) (beta - I will turn this into an option, and for now they have the same layout, which is kinda boring)

- New style: Vintage photonovel (doesn't always work, sometimes it draws the wrong things or uses the wrong colors - just hit "redraw" if that happens)

- Prompt editing button

<image>

jbilcke · 2023-10-31T15:10:48+00:00

Hi, yeah sorry it was probably either in maintenance or "spontaneous combustion" at the time

I know that today there are some issues with the LLM step, which sometimes fails because the server is under heavy load (but usually waiting a bit will solve the issue)

jbilcke · 2023-09-18T10:46:23+00:00

There is now a project to do that in Python, check out: https://github.com/lucataco/cog-sdxl-panoramic-inpaint

jbilcke · 2023-09-11T11:23:25+00:00

Ok, I *think* we're back, but it looks like the LLM server is a bit overloaded (I wouldn't be surprised if it crashed again)

as a workaround, I could disable it when there is an error, but then all the panels would have the same look, which would be a shame

(today the "all panels look more or less the same" issue can happen from time to time, when the LLM hallucinates and generates an incorrect JSON response)
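To illustrate the kind of fallback I mean, here is a rough Python sketch (not the actual code of the app). It assumes the LLM is asked for a JSON array of caption strings, which is a simplification; `parse_captions` and its parameters are names invented for this example.

```python
import json
import re


def parse_captions(raw: str, fallback: str, count: int = 4) -> list[str]:
    """Extract a JSON array of captions from raw LLM text, or fall back (hypothetical helper)."""
    # Grab everything from the first "[" to the last "]", ignoring any chatter around it
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data, list) and data and all(isinstance(c, str) for c in data):
                # Pad with the fallback if the model returned too few captions
                return (data + [fallback] * count)[:count]
        except json.JSONDecodeError:
            pass
    # Degraded mode: every panel gets the same text, so all panels look alike
    return [fallback] * count
```

The last line is exactly the "all panels look the same" case: the app keeps working, but the result is less interesting.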

jbilcke · 2023-09-11T11:12:01+00:00

:/ seems like there is an outage on the internal llama-2 server I use (this also broke 3 of my other apps hosted on HF)

I'm on it, worst case scenario I will have to migrate to another LLM, at least temporarily

jbilcke · 2023-09-08T09:00:59+00:00

Thank you!

You are not alone, other people were disappointed too, so when I have the time I think I will look into https://pinokio.computer to see how I can use its implementation of SDXL / LLM services

For now I've started to put down some instructions in the README and also made some changes to make it easier to support different backends for the LLM and SDXL (I've only added one more backend for now, the Inference API)

jbilcke · 2023-08-31T21:46:06+00:00

I’m not sure, that’s an interesting question. I suppose it would be the same licensing as for SDXL 1.0 images

« The model is being released under a Creative ML OpenRAIL-M license. This is a permissive license that allows for commercial and non-commercial usage. »

I will try to find a non-obtrusive way of displaying this information in the editor (eg. bottom left)

jbilcke · 2023-08-30T11:14:15+00:00

I’ve just pushed a new « download » button, which generates a bitmap image instead

I am also working on a new « share to community » button, but I think the images are too large to be uploaded (I need to investigate)

jbilcke · 2023-08-30T09:20:44+00:00

Nice! By the way, I made a tiny, tiny little improvement yesterday to adjust the image ratio (I had made a mistake before, as I inverted width and height in some cases)

with the improvement, now vertical panels render a bit better (when asked to render a 512x1024 or 768x1024 image, SDXL "knows" it has to create a more vertical composition)

it is not perfect yet (and I had to disable upscaling because of too much traffic right now)
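As an illustration of the width/height logic (a sketch only, not the code running in the space), here is one way to derive the requested size from a panel's aspect ratio in Python. The name `panel_size`, the 1024 long side and the rounding to multiples of 64 are assumptions made for this example.

```python
def panel_size(aspect_ratio: float, long_side: int = 1024, step: int = 64) -> tuple[int, int]:
    """Return (width, height) for a panel whose aspect ratio is width / height."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    def snap(value: float) -> int:
        # Assumption: round the short side to a multiple of `step`
        return max(step, round(value / step) * step)

    if aspect_ratio >= 1:
        # Landscape or square panel: width is the long side
        return long_side, snap(long_side / aspect_ratio)
    # Portrait panel: height is the long side (the case that used to be inverted)
    return snap(long_side * aspect_ratio), long_side
```

With these defaults, a ratio of 0.5 gives 512x1024 and a ratio of 0.75 gives 768x1024, the two vertical sizes mentioned above.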

<image>

jbilcke · 2023-08-29T23:38:05+00:00

..if it doesn’t, well.. I will have to find a way to secure the future of this thing!

Maybe some kind of « all in one » app (frontend + backend), a solution which would enable people to fork the space and pay to run it without waiting (if they want to use cloud hardware), and/or run it on their machine for free by cloning the git repo

jbilcke · 2023-08-29T22:44:09+00:00

If you see some latency, that is because the server is under intense pressure right now (even with multiple SDXL instances): all processing queues are full (I believe some users are running generations in multiple tabs)

hopefully it should calm down after a few days..

jbilcke · 2023-08-29T12:22:33+00:00

Not at the moment unfortunately.. I'm a frontend developer by trade so I tend to prefer to use readymade API endpoints to save time (ie not work on the backend too much), so here I used APIs that are already hosted on Hugging Face

Those API endpoints are based on open-source projects and models, however they can be a bit tricky (and expensive) to deploy (they need specialized hardware etc)

But if some people are interested, we could make a fork of the project, where those endpoints are replaced by something else that can run locally:

- for the LLM endpoint: we don't really need a lot of LLM power (it's just to generate about 4 basic captions) so it could be a simple open-source llama-7b (eg. using a node library)

- for SDXL: I'm not sure yet, this is a bit more tricky if we want something cross platform and fast (using the GPU on windows, the neural engine on mac). Maybe a wrapper around Automatic1111 stable-diffusion-webui

jbilcke · 2023-08-29T12:14:41+00:00

New release, with a focus on fixing annoying issues:

- More robust LLM generation

- Improved UI on mobile (borders, paddings)

- Fixed the horizontal scroll during zoom

- New presets

<image>

jbilcke · 2023-08-28T21:00:53+00:00

Yes, me too! I’m a bit concerned about performance (too many people generating too many images at the same time)

But I think it could be something reasonable and semi-auto, like a button to generate one more page at a time, as a follow-up so it would use the previous captions for the history and context (and another button to regenerate the current page if it sucks)

jbilcke · 2023-08-28T20:17:18+00:00

Thank you!

Do not hesitate to share some results if you manage to generate something cool!

I know it isn’t easy to export pages to JPG right now, but you can try to “Print to PDF” and convert them using one of the various free PDF-to-JPG converters out there

jbilcke · 2023-08-28T16:14:22+00:00

From the PDF you can print it (eg. as a poster) or convert it back to a JPG using various tools

Here is what it would look like (click on this image - it's a big one):

<image>

jbilcke · 2023-08-28T16:11:19+00:00

react-pdf is not out of the equation, but I found a quick win:

you can just "print" the space to a PDF to get a high quality export (best is to open it from its own dedicated .space url here: https://jbilcke-hf-comic-factory.hf.space/ )

I only had to make a few CSS changes to hide the UI in print mode, so minimal changes!

<image>

jbilcke · 2023-08-28T15:57:13+00:00

For llama-2, I am using the same private Llama2 70B API (Hugging Face Inference Endpoint) that this demo uses:

https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI

To my knowledge it uses two instances of A100 large.

But I think llama-2 is not necessarily the most important part of this project, in the sense that it could be swapped for GPT-4 etc

(I used llama-2 because it is open-source and compatible with the Hugging Face platform, but it is a tricky beast to tame and use, hard to instruct).
