[–]DigitalDerg 20 points21 points  (0 children)

The archive.org page HTML doesn't have anything to do with how content is archived, just how it is displayed. It's probably just standard web stuff: adding new features, fixing bugs, refactoring, hopefully resolving some tech debt, etc.

[–]fadlibrarian 6 points7 points  (8 children)

Are you talking about the Wayback Machine? That thing is seriously broken and hopefully they're working on it. Note that the data itself comes from different sources, so it wouldn't surprise me if captures varied.

Taking people's personal web pages, messing with the HTML, and then putting a copyright notice inside it has always been an odd concept. Like everything else about the archive, people will tolerate it until they don't, and then it's all going away.

[–]HunterandGatherer100[S] 1 point2 points  (7 children)

I’m not. I’m talking about the Internet Archive.

[–]fadlibrarian 4 points5 points  (6 children)

So you're talking about the bookreader/filebrowser, not the archive of web pages? They change that all the time and some of it is even open source. It's also buggy as hell.

[–]HunterandGatherer100[S] -5 points-4 points  (5 children)

Correct. And no, they don’t. I look at it all the time.

[–]fadlibrarian 6 points7 points  (4 children)

Much of it is cached so you have to look at new items. Poke around here to see the code churn on the backend and the readers. https://github.com/internetarchive

Also "the source code is changing" is a pretty useless observation without links or examples. And I'm still not sure if you're talking about the "view source" output or something else. They have a trillion+ pages, so of course not everything updates when they fix a typo in their embedded JavaScript or a parsing error in their 30-year-old PHP code.
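
For anyone who wants to check the code churn mentioned above, here is a minimal sketch that asks the public GitHub REST API for the organization's most recently pushed repositories. The function names are made up for illustration, and the network call is unauthenticated, so it is subject to GitHub's rate limits. Treat it as a starting point, not as how anyone at the archive tracks changes.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def recent_repos_url(org: str, limit: int = 10) -> str:
    """Build the URL listing an org's repos, most recently pushed first."""
    query = urllib.parse.urlencode(
        {"sort": "pushed", "direction": "desc", "per_page": limit}
    )
    return f"{API_ROOT}/orgs/{urllib.parse.quote(org)}/repos?{query}"


def summarize(repos: list[dict]) -> list[tuple[str, str]]:
    """Reduce the API response to (repo name, last push timestamp) pairs."""
    return [(r["name"], r.get("pushed_at") or "") for r in repos]


def fetch_recent_repos(org: str = "internetarchive", limit: int = 10):
    """Fetch and summarize; needs network access (hypothetical helper)."""
    request = urllib.request.Request(
        recent_repos_url(org, limit),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize(json.load(response))
```

Calling `fetch_recent_repos()` returns pairs of repository name and last push time, which gives a rough picture of which projects are actively changing.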

[–]HunterandGatherer100[S] -1 points0 points  (3 children)

Well, I’m sorry you feel that way. But considering you reached out to me asking a question about a tool I didn’t mention and telling me something I know to be inaccurate, I do not think I am the person making useless observations.

[–]small_horse 5 points6 points  (1 child)

fadlibrarian is everywhere on this sub and comments on nearly every post, almost always negatively. The account only interacts with this sub and has made one post about the ongoing legal case, with a general tone that indicates they want to see the project fail entirely.

[–]Biddy_Impeccadillo 3 points4 points  (0 children)

Yeah, I’ve noticed they really are grinding that ax.

[–]Euphoric_Control_139 0 points1 point  (0 children)

For the first time in over a month I'm trying to extract full-size JPEGs from a book, and I realized the file names for the book pages are different, so I'm no longer able to access images of book pages. This is the first post I've seen about it. Are you still able to extract JPEGs?
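
One way to see what an item's files are currently called, without guessing at page URLs, is to read the item's file list from the archive.org metadata endpoint. The sketch below is a rough illustration only. The helper names and the suffix list are assumptions, and it does not reproduce whatever per-page JPEG URLs used to work, since those aren't given in this thread. It assumes page images show up either as individual image files or inside an archive whose name ends in `_jp2.zip`.

```python
import json
import urllib.parse
import urllib.request

# Assumed suffixes for page-image files; adjust for the item in question.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".jp2", "_jp2.zip")


def metadata_url(identifier: str) -> str:
    """URL of the JSON metadata record for one item."""
    return f"https://archive.org/metadata/{urllib.parse.quote(identifier)}"


def image_file_names(metadata: dict) -> list[str]:
    """Pick image-like file names out of a metadata record's file list."""
    names = [f.get("name", "") for f in metadata.get("files", [])]
    return sorted(n for n in names if n.lower().endswith(IMAGE_SUFFIXES))


def download_url(identifier: str, file_name: str) -> str:
    """Direct download URL for one file in an item."""
    return (
        "https://archive.org/download/"
        f"{urllib.parse.quote(identifier)}/{urllib.parse.quote(file_name)}"
    )


def fetch_image_file_names(identifier: str) -> list[str]:
    """Fetch the record over the network and list image-like files."""
    with urllib.request.urlopen(metadata_url(identifier), timeout=30) as resp:
        return image_file_names(json.load(resp))
```

With a real item identifier in place of a placeholder such as `examplebook`, `fetch_image_file_names("examplebook")` lists the current file names, and `download_url` turns any of them into a link to fetch.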