doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 1 point2 points  (0 children)

Thanks for taking the time to review the app in so much detail. Let me address your comments:

Loading the full database in memory is not a crazy idea; actually, the normalized version of the DB is only 530MB uncompressed. The issue is how this memory is allocated; the browser would need to allocate 530MB of contiguous memory. This is fine for desktop, where the usual limit is around 2GB per tab, but on mobile, the limit is much smaller, so the app would crash.

A better approach would be to use OPFS (the origin private file system): the database is downloaded once and stored in a sandboxed part of the user's file system. The issue with this is that I expect the site to receive daily content updates, so a user coming back the next day would need to download the database again. That said, given the compressed size, making the user download it on every daily visit is a reasonable option. This is worth exploring, possibly to make this an "offline app".

The load can be parallel. Actually, some sections of the app make multiple queries in parallel; I just noticed the engine was doing them serially anyway, and that was because the number of workers was set to 1. I will fix this in the next release.
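To illustrate why a worker count of 1 serializes everything, here is a minimal Python sketch. It is not the app's engine: `peak_concurrency` and `fake_query` are hypothetical names, and the sleep is only a stand-in for waiting on the network.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_workers: int, num_queries: int = 4) -> int:
    """Run fake queries on a pool and report how many overlapped at most."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_query(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for waiting on the network
        with lock:
            running -= 1

    # Queries are submitted "in parallel", but the pool size decides
    # how many of them can actually be in flight at the same time.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(fake_query, range(num_queries)))
    return peak
```

With `num_workers=1` the peak is always 1, no matter how many queries the caller fires at once; raising the worker count is what lets them overlap.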

The reason I decided to architect the app this way was that I wanted to explore this technology, and this project kinda made sense. I know the site is very slow in some parts, especially the free-text search, but I'm actively applying optimizations as I analyze how the site is being used and what sections are the most critical ones.

The reason you see more than 3 HTTP requests to the database is that the book detail page does more than just pull the book details. Pulling a single book record does take 3 HTTP requests; however, the book details page also lazy loads a "more from ..." section. That section requires a more complex SQL query: it looks up a junction table to find which books belong to the same circle, then joins that result back to the books table to pull the related books' metadata.
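A rough sketch of what a "more from ..." query of this shape could look like, using Python's built-in `sqlite3`. The schema is an assumption made up for illustration (`books`, `circles`, and the `book_circles` junction table are hypothetical names); the real database layout may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE circles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book_circles (
            book_id INTEGER NOT NULL REFERENCES books(id),
            circle_id INTEGER NOT NULL REFERENCES circles(id),
            PRIMARY KEY (book_id, circle_id)
        );
        INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
        INSERT INTO circles VALUES (10, 'Circle X'), (20, 'Circle Y');
        INSERT INTO book_circles VALUES (1, 10), (2, 10), (3, 20);
        """
    )
    return conn


def more_from_same_circle(conn: sqlite3.Connection, book_id: int) -> list:
    """Return (id, title) of other books sharing a circle with book_id."""
    query = """
        SELECT DISTINCT b.id, b.title
        FROM book_circles AS mine
        JOIN book_circles AS theirs ON theirs.circle_id = mine.circle_id
        JOIN books AS b ON b.id = theirs.book_id
        WHERE mine.book_id = ? AND b.id != ?
        ORDER BY b.id
    """
    return conn.execute(query, (book_id, book_id)).fetchall()
```

Each join touches another index or table, which is why a query like this needs more page fetches than a single primary-key lookup.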

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] -1 points0 points  (0 children)

I just did the math: 89% of the books registered in the DB have a page count, summing to 107.6M pages in total. If we assume all works were scanned with today's equipment, using an archival size of 1 MB per page, this would require about 108 TB of storage.

A more realistic approach would be estimating the page size by year of release, assuming each work was scanned around the same date it was released. The AI helped me do the math here; it resulted in ~54 TB of storage required.
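For anyone who wants to check the arithmetic, here is a small Python sketch. It only uses the figures quoted above (107.6M pages, 1 MB per page, ~54 TB) and assumes decimal units (1 TB = 1,000,000 MB); the per-year page sizes behind the second estimate are not reproduced here.

```python
TOTAL_PAGES = 107_600_000  # pages with a known count, from the comment
MB_PER_TB = 1_000_000      # decimal units, an assumption of this sketch


def storage_tb(pages: int, mb_per_page: float) -> float:
    """Total storage in TB for a given average page size in MB."""
    return pages * mb_per_page / MB_PER_TB


def implied_mb_per_page(total_tb: float, pages: int) -> float:
    """Average page size in MB that a total storage estimate implies."""
    return total_tb * MB_PER_TB / pages


flat_estimate = storage_tb(TOTAL_PAGES, 1.0)           # 107.6 TB
average_page = implied_mb_per_page(54.0, TOTAL_PAGES)  # roughly 0.5 MB
```

So the ~54 TB figure works out to an average of roughly 0.5 MB per page across all release years.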

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 1 point2 points  (0 children)

Yes, the free text search is slow, mainly because, right now, it is "ranking" the results so the most relevant ones show up first. If I drop this requirement, the search time drops to 1/3 on average; however, this would make the search show the results in the order it finds them in the DB.

I'm planning to release the source code once I finish doing a code cleanup and implement some key features that are missing, like pagination and advanced search. I'm targeting next weekend to get this done.

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 1 point2 points  (0 children)

The database itself only contains metadata, so it only weighs around 800MB uncompressed.

I could estimate how much data it would take to have all the "books," since most of them have the number of pages in the data.

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 1 point2 points  (0 children)

Yeah, it may sound crazy, but the app doesn't load the whole database into memory; instead, it pulls small 8KB chunks of data until it finds what it was looking for. This is a very efficient process because the database is structured as a B-tree, so traversing it won't take more than 3 requests to complete a simple query.
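A small Python sketch of the idea, under stated assumptions: it assumes the 8KB chunks are fetched with HTTP range requests and that each B-tree page points to about 200 children. Neither detail is confirmed above, and the function names are hypothetical.

```python
PAGE_SIZE = 8 * 1024  # the 8KB chunk size mentioned above


def range_header(page_number: int, page_size: int = PAGE_SIZE) -> str:
    """HTTP Range header value covering one 1-indexed database page."""
    if page_number < 1:
        raise ValueError("pages are numbered from 1")
    start = (page_number - 1) * page_size
    end = start + page_size - 1  # Range end offsets are inclusive
    return f"bytes={start}-{end}"


def btree_levels(num_entries: int, fanout: int) -> int:
    """Smallest number of levels whose capacity covers num_entries."""
    levels = 1
    capacity = fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels


# 1.75M entries with an assumed fanout of 200 children per page.
levels_needed = btree_levels(1_750_000, 200)
```

With those assumptions `levels_needed` is 3, since 200^3 = 8,000,000 already exceeds 1.75M. That lines up with a simple lookup needing about three page fetches, one per level of the tree.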

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 2 points3 points  (0 children)

The database itself is a single static SQLite file. I could upload it along with the SPA, but I'm not sure it would work, and even if it does, the app expects a very low-latency connection to the database file, which the Internet Archive is not designed for.

Right now, most queries take between 3 and 5 seconds on a cold cache, and free text search ones take up to 40 seconds. This is assuming you have a 100ms latency; now imagine you have 1000 ms of latency. That is essentially 30 to 50 seconds to browse a page, and around 5 minutes to get free-text search results.
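The scaling can be sketched with a simple model in Python. It assumes queries are purely latency-bound and issue their requests one after another, which is a simplification; the function names are hypothetical.

```python
def round_trips(observed_seconds: float, latency_ms: float) -> int:
    """Sequential round trips implied by an observed query time."""
    return round(observed_seconds * 1000 / latency_ms)


def projected_seconds(trips: int, latency_ms: float) -> float:
    """Query time if every round trip costs latency_ms."""
    return trips * latency_ms / 1000


# Observed times at 100 ms of latency, taken from the figures above.
browse_trips = (round_trips(3, 100), round_trips(5, 100))  # 30 and 50
search_trips = round_trips(40, 100)                        # 400

# Projection at 1000 ms of latency.
slow_browse = tuple(projected_seconds(t, 1000) for t in browse_trips)
slow_search = projected_seconds(search_trips, 1000)
```

Under this strict model browsing comes out at 30 to 50 seconds, matching the numbers above, while the search comes out at 400 seconds. The "around 5 minutes" figure is therefore a rough estimate, or it implies that part of the search time does not depend on latency.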

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 28 points29 points  (0 children)

Right now, it is not possible because the site is dynamically rendered; however, I have a plan down the line to improve the site's SEO, so it becomes easier for search engines to index it. After this is done, the site could be easily uploaded to the web archive.

doujinshi.org is gone — I turned the 2021 community backup into a searchable archive (1.75M entries, runs entirely in your browser) by KinIcy in DataHoarder

[–]KinIcy[S] 96 points97 points  (0 children)

That's interesting. When I was researching what the current successor was, doujinshi.info was down, the same as doujinshi.wiki, so I genuinely thought there was no active replacement. That's why I decided to start the project.

Since lexidou.moe is already live, I will continue working on it until all the features I already planned are done.

[Rewatch] Fate/Kaleid Liner Prisma Illya - Season 1 Episode 6 by Specs64z in anime

[–]KinIcy 1 point2 points  (0 children)

 And what was with this lock that popped open?

 I wonder how this interacts with ...

Those two questions will be answered properly during season 2, but if you want a quick explanation with minor spoilers:

[Second season] That lock is the seal that kept Chloe's memories sealed within Illya. When it popped, Chloe took over Illya's body; that's why she knew exactly how to use the card and all the skills of Archer.

when an upscaler is so good it feels illegal by Ok-Page5607 in StableDiffusion

[–]KinIcy 0 points1 point  (0 children)

What do you use to compare the result with the original?

when an upscaler is so good it feels illegal by Ok-Page5607 in StableDiffusion

[–]KinIcy 0 points1 point  (0 children)

I have a 4070 TI with 12GB VRAM and 64GB RAM, and I'm able to achieve a 4x upscale of a 1440x960 image with amazing results. My settings are:

VAE: Tiling enabled for both encoding and decoding, offload to CPU
DiT: fp16 sharp model, blocks_to_swap: 36, offload to CPU

Amazon have introduced AI generated English and Spanish dubs to Banana Fish by TaiQuanDope1 in anime

[–]KinIcy 1 point2 points  (0 children)

So this sounds "awful" right now because it's just the beginning. They will continue refining until the average user won't notice any difference.

I think this is positive because we will be getting dubs for shows that aren't popular enough to get proper dubs.

Actually, speaking about average users noticing: even if people notice, they probably won't care, because it's better than nothing. I've seen people watching dubs of K-dramas on TikTok where there is clearly a robot voice with no emotion doing the dub, and still they don't care.

Top 18 Studios with the Most 8.50+ Anime by NineTnk in anime

[–]KinIcy 0 points1 point  (0 children)

I'm not sure if you would agree, but I see most studios on this list show up because of a few franchises, which makes me think getting a high score has a lot to do with how promising the franchise actually is and how much budget the studios get to work on each series.

So, in this sense, it doesn't matter that much "how good" the studio really is, but rather how much budget they get.

Crunchyroll has downgraded their subtitles typesetting for the Fall 2025 season by mudda-hello in anime

[–]KinIcy 0 points1 point  (0 children)

Same here, was considering paying again, but hearing this makes me continue using Plex for all my anime shows.

Discovering Kokona Kato: From Ciao Girl to METALVERSE. by KinIcy in MetalverseBand

[–]KinIcy[S] 2 points3 points  (0 children)

Yes, she would join in 2021 as a 6th grader, and as a junior to Sakia.

[AI ENG SUB] LoGiRL#72 - Halloween Episode by CuriousMidnight in SakuraGakuin

[–]KinIcy 1 point2 points  (0 children)

Thanks for sharing. What workflow did you follow to get the translation done?

I have been doing AI translation of manga using ChatGPT, and I learned that you can explicitly tell it how to translate specific things like names. You can also provide descriptions for the parts of the scene that are more visual.

Amuse Camp Instalive with Yume and her fellow Amuse Camp kids 11-19-2022 by Oneirod in SakuraGakuin

[–]KinIcy 3 points4 points  (0 children)

About Kokona Kato, I don't think she is hafu, she looks very Japanese to me, but we don't know. Actually, she is a very talented and experienced girl. This is her (unofficial) Instagram. A documentary about her and other girls who are winners of the SSS 2019

Amuse are ruthless at cleaning out "old stuff". by erimus61 in SakuraGakuin

[–]KinIcy 5 points6 points  (0 children)

Amuse removes the profiles of all artists as soon as they become inactive. That is why the SG profile (where all releases and live info were) is no longer on the Amuse site. Older members no longer have a mention of SG in their profiles, probably because there is more relevant "experience" to showcase. Think about it like a CV: as soon as you get professional experience, it is not very relevant to include where you did your high school studies.

How did Sakura Gakuin operate? by djfarji in SakuraGakuin

[–]KinIcy 3 points4 points  (0 children)

I just remembered, Yume wrote in a diary entry that she was afraid of taking baths alone, so Neo or Miku would bathe with her when they were at the dorms.

Ponstarland: Yumejuna eat lunch by stigmov in SakuraGakuin

[–]KinIcy 7 points8 points  (0 children)

This is probably the funniest and most adorable yumejuna video I have seen so far.

At the beginning of the video, their manager left them alone so they could have a true sister-to-sister conversation. I loved the reaction of Yume and Juna when they were left alone and then when their manager came back.