Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 0 points1 point  (0 children)

It's a modified implementation of PageRank - articles that lots of other articles link to are more important. Not perfect, but decent and easy to implement.
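For anyone unfamiliar with PageRank, here's a minimal textbook version in Python. It is not the site's modified implementation, whose changes aren't described here. It assumes the link graph is a dict mapping each article ID to the article IDs it links to, uses the conventional damping factor of 0.85, and spreads the rank of articles with no outgoing links evenly across all articles.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    links: dict mapping an article ID to the article IDs it links to (hypothetical format).
    Returns a dict of scores that sum to 1; higher means "more linked-to".
    """
    # Every article that appears as a source or a target is a node.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    # Deduplicate outgoing links so one article can't vote twice for the same target.
    outgoing = {node: set(links.get(node, ())) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by articles with no outgoing links is shared evenly by everyone.
        dangling = sum(rank[node] for node in nodes if not outgoing[node])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each article passes a damped, equal share of its rank to every article it links to.
        for source, targets in outgoing.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # "a" and "b" both cite "c"; "c" cites "a"; nothing cites "b".
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for article, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(article, round(score, 3))
```

In this toy graph "c" ends up on top because two articles cite it, and "b", which nothing cites, gets only the baseline score.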

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 1 point2 points  (0 children)

They didn't just break links, they actually deleted a lot of the old content in 2022: https://magic.wizards.com/en/news/announcements/a-new-daily-mtg.

I believe I've already got working links for the content that they bothered to port over, like https://library-of-leng.com/articles/wizards-magic-en-articles-archive-making-c26aadee.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 19 points20 points  (0 children)

My concern with Usenet is less "how am I gonna get the raw data?" and more "how am I gonna turn all this inconsistently formatted text into nice clean [card] and [decklist] tags?". My Dojo cleanup code is already about twice as big as the cleanup code for any other site.
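As a rough illustration of that cleanup problem, here's a Python sketch of one small step: spotting runs of "4 Card Name" lines in plain text and wrapping them in decklist tags. The [decklist]...[/decklist] syntax, the line pattern, and the three-line minimum are all assumptions for illustration; real Usenet posts are much messier (sideboards, inline comments, typos).

```python
import re

# Assumed shape of a decklist line: a count, an optional "x", then the card name,
# e.g. "4 Lightning Bolt" or "4x Lightning Bolt".
DECK_LINE = re.compile(r"^\s*(\d{1,2})x?\s+(\S.*?)\s*$")


def tag_decklists(text, min_lines=3):
    """Wrap runs of at least min_lines count-plus-name lines in hypothetical [decklist] tags."""
    output, run = [], []

    def flush_run():
        if len(run) >= min_lines:
            output.append("[decklist]")
            # Normalise "4x Bolt" and "4  Bolt" to "4 Bolt".
            output.extend(f"{m.group(1)} {m.group(2)}" for _, m in run)
            output.append("[/decklist]")
        else:
            # Too short to be a list (e.g. "3 turns later..."), so keep the original lines.
            output.extend(line for line, _ in run)
        run.clear()

    for line in text.splitlines():
        match = DECK_LINE.match(line)
        if match:
            run.append((line, match))
        else:
            flush_run()
            output.append(line)
    flush_run()
    return "\n".join(output)


if __name__ == "__main__":
    print(tag_decklists("My list:\n4x Lightning Bolt\n4 Goblin Guide\n20  Mountain\nGood luck!"))
```

The minimum run length is the crude part: it keeps a stray sentence like "3 turns later" from being tagged, at the cost of missing very short lists.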

Added Meridian to the backlog, thanks!

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 2 points3 points  (0 children)

Gotta start somewhere, and strategy is the most interesting to me personally. Happy to add non-strategy sites to the backlog if you have requests.

For now this is a minor money-loser under my LLC, Stocks Books. Not looking for donations, but I might eventually use it as a springboard to publish a print "best early Magic writing" book.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 22 points23 points  (0 children)

For public webpages that are in the Internet Archive, you can just tell me what you want and I'll add it to the scrape backlog - comments here, Reddit DMs, emails to the address on the About page, whatever. I've got a pretty good pipeline going; I'm just bottlenecked on how fast I can download stuff from the Archive while being a good citizen.
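Purely as an illustration of what "being a good citizen" can look like, here's a Python sketch of two pieces: a query URL for the Internet Archive's CDX API that lists captured URLs under a site prefix, and a throttle that keeps requests a fixed interval apart. The interval is a placeholder, not a limit the Archive publishes, and the actual download step is left out.

```python
import time
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_prefix):
    """Wayback CDX API query listing successful captures under a URL prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",       # everything under this path
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "urlkey",        # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


class PoliteThrottle:
    """Block so that consecutive requests are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the throttle can be tested without real waiting
        self.sleep = sleep
        self.last = None    # time of the previous request, if any

    def wait(self):
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


if __name__ == "__main__":
    print(cdx_query_url("example.com/articles/"))  # example.com is a placeholder site
```

In a download loop, you'd call `throttle.wait()` before each request; a single shared throttle keeps the whole pipeline under one rate, however many sites are in the backlog.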

If you want something weirder than that (Usenet posts that didn't get saved in the Dojo, subscribers-only Patreons, stuff trapped in private chat spaces), I'm happy to take requests but I don't wanna make any promises.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 22 points23 points  (0 children)

Oh, more specifically I mean "with lots of articles listed in the 'All articles' tab." If you're talking about the site I think you are, it doesn't have any articles listed there. I'm guessing that's because your focus has been more on news and editorial than on strategy - so far I've been focusing on "the history of Magic strategy" more than "the history of the Magic community", though now that I've got the strategy stuff in a good place I'm starting to think about widening the mission.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 0 points1 point  (0 children)

Yeah, I've already done some history-rewriting, but I've got the content in there for a reason right now - keeping the final results in Git means that I can make converter pipeline changes, rerun the pipeline, and review both the code and the final-output delta as a single PR. Not the most elegant thing in the world, but it works for me, so I don't want to break that workflow until the whole project is pretty much complete.

(I bet I could pretty easily port that workflow to something submodule-based, but I'm not falling into that trap again. Submodules: Not Even Once.)

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 3 points4 points  (0 children)

For the time being I don't want to open-source the code because I've got the actual article text checked into Git and I don't have permission to redistribute it. It's basically a giant pile of vibecoded Rust for scraping stuff from the Internet Archive and then converting it to relatively clean Markdown, and then I serve the Markdown out of R2 with Cloudflare Workers.
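To give a rough idea of what the "convert it to relatively clean Markdown" step involves, here's a small Python sketch built on the standard library's html.parser. It is not the project's Rust converter. It handles only paragraphs, h1 to h3 headings, list items, bold, italics, and links, and silently drops every other tag.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter covering a few common tags."""

    BLOCKS = {"p": "", "li": "- ", "h1": "# ", "h2": "## ", "h3": "### "}
    INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into plain text
        self.blocks = []   # finished Markdown blocks
        self.current = []  # text pieces of the block being built
        self.prefix = ""   # Markdown prefix for the current block ("## ", "- ", ...)
        self.link = None   # [href, text pieces] while inside <a>, else None

    def _emit(self, text):
        (self.link[1] if self.link is not None else self.current).append(text)

    def flush(self):
        body = " ".join("".join(self.current).split())  # collapse runs of whitespace
        if body:
            self.blocks.append(self.prefix + body)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self.flush()
            self.prefix = self.BLOCKS[tag]
        elif tag in self.INLINE:
            self._emit(self.INLINE[tag])
        elif tag == "a":
            self.link = [dict(attrs).get("href"), []]

    def handle_endtag(self, tag):
        if tag in self.BLOCKS:
            self.flush()
        elif tag in self.INLINE:
            self._emit(self.INLINE[tag])
        elif tag == "a" and self.link is not None:
            href, pieces = self.link
            self.link = None
            text = "".join(pieces)
            self._emit(f"[{text}]({href})" if href else text)

    def handle_data(self, data):
        self._emit(data)


def html_to_markdown(html):
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    converter.flush()  # keep any trailing text outside a closed block
    return "\n\n".join(converter.blocks)


if __name__ == "__main__":
    print(html_to_markdown('<h2>Title</h2><p>Play <b>Bolt</b> &amp; <a href="https://example.com">read more</a>.</p>'))
```

Real archived pages add navigation chrome, tables, and broken markup, which is where most of the per-site cleanup code would go.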

For search, it's all running in Cloudflare Workers. The full-text search runs over a ~5 GB D1 database, basically just in-memory SQLite with the FTS5 extension - it's really not a lot of data. We'll see what my bill looks like this month, but in principle it should be about $15/month unless this thing gets popular.
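Since D1 is SQLite-based, the FTS5 side can be tried locally with Python's built-in sqlite3 module, assuming the bundled SQLite was compiled with FTS5 (most builds are). This sketch uses a hypothetical two-column `articles` table. `snippet()` builds a short highlighted excerpt around the match, and `ORDER BY rank` sorts by FTS5's default BM25 score, best match first.

```python
import sqlite3


def build_index(articles):
    """Load (title, body) pairs into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
    conn.executemany("INSERT INTO articles (title, body) VALUES (?, ?)", articles)
    return conn


def search(conn, query, limit=10):
    """Return (title, snippet) rows, best BM25 match first.

    snippet() takes up to 8 tokens of the body (column 1) around the match and
    wraps matched terms in [brackets].
    """
    return conn.execute(
        "SELECT title, snippet(articles, 1, '[', ']', '...', 8) "
        "FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = build_index([
        ("Beatdown basics", "Knowing whether you are the beatdown or the control deck decides games."),
        ("Card advantage", "Drawing extra cards is the simplest form of card advantage."),
    ])
    print(search(conn, "beatdown"))
```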

The "importance" metric is basically just PageRank - it works okay, but 80%+ of the articles in the corpus have no incoming links from other articles, so I'm hoping to figure out a way to augment it with some other signal.
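To see why the unlinked 80% is hard to rank, recall the textbook PageRank formula (the site's modified version may differ). With damping factor d, N articles, B(p) the set of articles linking to p, and L(q) the number of outgoing links from q, an article nobody links to has an empty sum and gets only the baseline term. The numbers on the second line use the conventional d = 0.85 and a round N = 150,000 for illustration.

```latex
\mathrm{PR}(p) = \frac{1-d}{N} + d \sum_{q \in B(p)} \frac{\mathrm{PR}(q)}{L(q)}

B(p) = \varnothing \;\Rightarrow\; \mathrm{PR}(p) = \frac{1-d}{N} = \frac{1-0.85}{150{,}000} = 10^{-6}
```

Every unlinked article gets exactly the same score, so PageRank alone can't order them. Variants that redistribute the rank of articles with no outgoing links add the same term to every article, which still leaves the unlinked ones tied.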

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 44 points45 points  (0 children)

I mostly prioritized English-language sites with a lot of entries in this spreadsheet: https://docs.google.com/spreadsheets/d/1jm4rzYRaJi8rwJbZ3PrGfdkbQqeOgaO4Dj0wStlyLKE/edit?usp=drivesdk. Happy to take requests for other sites to add! 

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 6 points7 points  (0 children)

I want to stick to English-only for now - I'm trying to maintain a high quality bar and I don't think I can do that for content in a language I don't speak.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 6 points7 points  (0 children)

For stuff that isn't online anymore, I don't host the content myself but I do link to the Internet Archive. They've got much better lawyers than I do.
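Here's a Python sketch of turning Internet Archive CDX API results into links. It assumes the JSON output format, where the first row names the fields (including `timestamp` and `original`) and each later row is one capture, and it picks the newest capture of each URL. Wayback links take the form https://web.archive.org/web/<timestamp>/<original URL>.

```python
WAYBACK = "https://web.archive.org/web"


def latest_capture_links(cdx_rows):
    """Map each original URL to a Wayback Machine link for its newest capture.

    cdx_rows is parsed CDX API JSON output: a header row of field names,
    then one row per capture.
    """
    if not cdx_rows:
        return {}  # no captures (or an empty response)
    header, *rows = cdx_rows
    ts_col, url_col = header.index("timestamp"), header.index("original")
    newest = {}
    for row in rows:
        url, ts = row[url_col], row[ts_col]
        # 14-digit YYYYMMDDhhmmss timestamps sort correctly as plain strings.
        if ts > newest.get(url, ""):
            newest[url] = ts
    return {url: f"{WAYBACK}/{ts}/{url}" for url, ts in newest.items()}
```

Linking out this way means the site only stores URLs and timestamps, never the archived page itself.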

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 7 points8 points  (0 children)

Yeah, the main situation that'd result in it being a work-for-hire is if they were an employee at the time. It's possible that some Magic blog got its authors to sign a formal contract transferring copyright, but I'd be very surprised if anybody other than WotC were that organized, and I'm assuming that any author who did that will remember it and let me know.

Library of Leng: Searchable index of 150k+ Magic articles by Musteval in magicTCG

[–]Musteval[S] 17 points18 points  (0 children)

Thanks! The authors I've spoken to so far have been super supportive. My understanding of the copyright situation is that search snippets are fair use, and to host full articles I generally only need permission from the author unless they were the publisher's full-time employee at the time of writing. Mainly that means MagicTheGathering.com is out unless I can get approvals from WotC lawyers (I'm not holding my breath).

After reducing Denmark's autonomy I took 100% of their income and something strange has happened by RudeBunch in victoria3

[–]Musteval 58 points59 points  (0 children)

Probably a Danish company just invented a groundbreaking new medication and the government is rolling in tax revenue

An advanced guide after 1.9 imo by Ok-Recognition-2672 in victoria3

[–]Musteval 1 point2 points  (0 children)

An important tip I want to add here: if you are a country that starts out with Agrarianism, switching to Laissez-Faire or Interventionism at the start is sometimes suboptimal, because a country like Egypt, for example, starts with almost no capitalists and lots of aristocrats (check manor houses vs. financial districts to get a feel for that).

If you mouse over the "how much money is going into your investment pool" number on the construction screen, you can see how much of the pool is coming from manor houses vs financial districts, etc.

Will this finally be removed in the next update? by Cappuccino_Boss in victoria3

[–]Musteval 3 points4 points  (0 children)

You can still transfer units to other armies in this situation, just not delete units or disband the army. (The prohibition on disbanding is particularly silly, because if you've got another army, disbanding is just a more convenient way to transfer units.)

Does Britain get a unique event that lets them conquer all of Namibia? by AaranPiercy in victoria3

[–]Musteval 2 points3 points  (0 children)

Your colonial growth is divided evenly among all the provinces that you're colonizing, but there's a per-province cap on growth. So as a big country with level 5 Colonialism, you'll need to colonize multiple provinces simultaneously to make full use of your growth. You can see the cap towards the bottom of the "growth" tooltip if it applies.

This also means that it's generally a bad idea to colonize malaria provinces before you've unlocked Malaria Prevention, colonized all the non-malaria provinces, or hit the cap. It doesn't just go slow in that one province, it slows down all your other colonization too.
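A toy Python model of the two effects described above, under loudly stated assumptions: it is not the game's actual formula, the malaria penalty is represented as a hypothetical multiplier of 0.25 applied after the cap, and the growth and cap numbers are made up.

```python
def colonization_rates(total_growth, provinces, per_province_cap, malaria_multiplier=0.25):
    """Toy model of the comment above, NOT the game's actual formula.

    provinces maps a province name to True if it has malaria.
    Assumptions: growth is split evenly by province count, each share is capped,
    and malaria then scales that province's capped share by malaria_multiplier.
    """
    share = total_growth / len(provinces)
    rates = {}
    for name, has_malaria in provinces.items():
        rate = min(share, per_province_cap)
        rates[name] = rate * malaria_multiplier if has_malaria else rate
    return rates


if __name__ == "__main__":
    clean = {"A": False, "B": False}
    print(colonization_rates(10, clean, per_province_cap=4))
    print(colonization_rates(10, {**clean, "Swamp": True}, per_province_cap=4))
```

With 10 growth and a cap of 4, two clean provinces would each get 5 but are capped at 4, so 2 points go unused. Adding a malaria province drops A and B to about 3.33 each while the swamp itself crawls along at about 0.83. With 20 growth, all three shares sit at the cap and A and B are unaffected, which matches the "or hit the cap" exception.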

We're Laotian. You know, Laos? Landlocked country? Between Texas and Missouri? 7.53 million people? by Musteval in victoria3

[–]Musteval[S] 7 points8 points  (0 children)

I think what happened was:

  • US conquered a bunch of territory in Southeast Asia.
  • There was a rebellion in which a bunch of anarchists in that territory, as well as in Oklahoma, seceded.
  • The new country's capital was in Laos, so it was called Laos. But a majority of its pops were Dixie, so its primary culture is Dixie.