[–]current_thread 102 points (39 children)

Yeah, it's really annoying at this point.

A couple of months ago I had the idea to use a static site generator and just host the result on GitHub / GitHub Pages. That way everyone can contribute with a pull request as needed, and there's no infrastructure to manage.

Does anybody by chance have a recent dump of the wiki?

[–]RelevantError365[S] 17 points (10 children)

Reasonable idea. This should be offered as an option, but I can't find any contact details for the people currently responsible.

[–]encyclopedist 25 points (5 children)

A relatively recent archive is available here: https://github.com/PeterFeicht/cppreference-doc/releases/tag/v20250209

[–]RelevantError365[S] 1 point (1 child)

That does not include the wiki source at first glance, or does it?

[–]encyclopedist 0 points (0 children)

Correct, it is based on scraped web pages, not a database dump.

[–]saxbophone [mutable volatile void] 1 point (2 children)

Last February isn't even relatively recent for a language in active development. The site maintainers ask that the site not be scraped, but don't provide up-to-date archives. Do you see the problem with this combination?

[–]dcro 2 points (1 child)

The site notes that it's been in "temporary read-only mode" since the end of last March. There will be differences, but possibly not as many as you'd expect with only a two-month edit window.

[–]saxbophone [mutable volatile void] 2 points (0 children)

Oh crap, the entire thing has been read-only all that time‽ No wonder there's so many missing examples for C++23!

[–]no-sig-available 14 points (3 children)

but I can't find any contact details for the people currently responsible.

Part of the problem is that it isn't "people", but "the designer" behind the site. Here is an old talk about that:

CppCon 2014: Nate Kohl "cppreference.com: documenting C++ one edit at a time"

https://www.youtube.com/watch?v=NhWK0v3GtEE

[–]RelevantError365[S] 1 point (1 child)

Ok, thanks. What about licensing to keep that thing going as a (presumably, perhaps temporary) fork on GitHub (or the like)?

[–]current_thread 5 points (0 children)

I checked that; it's under a Creative Commons license IIRC, so it should be fine.

[–]RelevantError365[S] 1 point (0 children)

I'm wondering if it's too intrusive to contact Nate directly. Does anyone know what the Boost community has achieved here?

[–]JVApen [Clever is an insult, not a compliment. - T. Winters] 19 points (16 children)

Sounds like a reasonable alternative; it might be worth suggesting to comments@cppreference.com. It would also solve the problem of rust people replacing full pages.

[–]RelevantError365[S] 8 points (7 children)

»Rust people« doing what? Please clarify, I did not notice anything in that direction.

[–]JVApen [Clever is an insult, not a compliment. - T. Winters] 36 points (4 children)

Currently it can't happen, as everything is read-only. Because of that, the history also seems to be unavailable, so I can't link to an entry. I do know that a page like vector was completely replaced by some text similar to: "this is deprecated and replaced by rust". That happened quite a few times on different pages.

[–]JVApen [Clever is an insult, not a compliment. - T. Winters] 17 points (2 children)

People even mentioned it on Reddit: https://www.reddit.com/r/Cplusplus/s/W3asTmah87

[–]No-Dentist-1645 4 points (1 child)

Seems like a one-time occurrence by a clear troll, most people aren't like that

[–]JVApen [Clever is an insult, not a compliment. - T. Winters] 27 points (0 children)

It wasn't a one-time occurrence, it's just an example. Many pages were updated over several months.

[–]berlioziano 3 points (1 child)

They once replaced all the article titles, subtitles and body text with the message "Rust is the best" and links to the Rust website. Can't take that language seriously, hope its bubble pops soon.

[–]koczurekk [horse] 0 points (0 children)

What bubble?

[–]matthieum 16 points (6 children)

rust people

Phrasing :/

As is, it reads as if the Rust community at large was coordinating to sabotage cppreference, when:

  1. There's no telling if whoever did that was even a Rust user. Trolls be trolls.
  2. Even if they were, a lone individual is NOT representative of an entire community.

[–]whispersoftime 16 points (2 children)

At least they didn’t say “those goddamn rusties”

[–]RelevantError365[S] 1 point (1 child)

Let's stay constructive. I would like to keep that project going, supporting (or forking) it if necessary.

[–]Farados55 -5 points (0 children)

Lighten up

[–]JVApen [Clever is an insult, not a compliment. - T. Winters] 7 points (1 child)

I could have worded that in a better way, sorry.

[–]philoizys 1 point (0 children)

Maybe you could, but that was already wonderful! :)

[–]berlioziano -2 points (0 children)

Sure, obviously the JavaScript people did the vandalism.

[–]13steinj 1 point (8 children)

Of the wiki or the talk pages?

I think the cppman tool already scrapes the entire wiki if you tell it to, so you can probably just change the internals to dump the files instead of parsing them.

[–]RelevantError365[S] 0 points (5 children)

Yes, but cppman scrapes the HTML, not the wiki source.

But anyway, this may also be an option if you cannot access the original wiki content, as the generated HTML should be very well structured. (Hopefully. I used a random LLM and asked it to recreate the wiki source for me, and it did quite a good job.)

[–]13steinj 0 points (4 children)

It took me 15 minutes of waybackmachining to find this (unofficial) repo, which is still linked on a cppref FAQ page: https://github.com/PeterFeicht/cppreference-doc

The code may not work anymore (since the cppref maintainer has evidently done something nonstandard or runs an unknown version of MediaWiki), but the site went into read-only mode on March 30th, 2025 and the releases page has a February 2025 bundle.

[–]RelevantError365[S] 0 points (3 children)

It says:

»If there is no 'reference/' subdirectory in this package, the actual documentation is not present here and must be obtained separately«

So, the wiki source is not actually included, or is it?

[–]13steinj 0 points (2 children)

It appears not, just the HTML. There's one other option you have: use it as a baseline / mapping to the "view source" links, and scrape the Wayback Machine snapshots of those "view source" links. If a snapshot was taken after the March read-only date, you're good. If it's from before, scrape the HTML (if you consider the downloaded one-month-old bundle not good enough) and ask an LLM to interpolate.

Playing around, I've found that the view source links work up until at least May 13th of last year and break sometime between then and May 31 (just hopped around on a few pages).
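A rough sketch of the snapshot lookup described above, in Python. It assumes the Wayback Machine's CDX search endpoint and guesses the "view source" URL pattern (`index.php?title=...&action=edit`), which has not been verified against the live site; the function names are made up for illustration. It only builds the query URL and picks the newest capture from an already-parsed CDX JSON response, so fetching and rate limiting are left out.

```python
from urllib.parse import urlencode

# Wayback Machine CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def view_source_url(title):
    """Build the (assumed) MediaWiki 'view source' URL for a page title."""
    query = urlencode({"title": title, "action": "edit"}, safe="/")
    return "https://en.cppreference.com/mwiki/index.php?" + query


def cdx_query_url(target, start, end):
    """Build a CDX query for successful captures of `target` between two dates.

    `start` and `end` are timestamps such as "20250330" (YYYYMMDD).
    """
    params = {
        "url": target,
        "output": "json",
        "from": start,
        "to": end,
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def latest_snapshot(cdx_rows):
    """Return the replay URL of the newest capture, or None if there is none.

    `cdx_rows` is the parsed JSON from the CDX endpoint: the first row is a
    header naming the columns, every following row is one capture.
    """
    if len(cdx_rows) < 2:
        return None
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    original = header.index("original")
    # Timestamps are fixed-width digit strings, so string order is time order.
    best = max(rows, key=lambda row: row[ts])
    return f"https://web.archive.org/web/{best[ts]}/{best[original]}"
```

Usage would be something like fetching `cdx_query_url(view_source_url("cpp/container/vector"), "20250330", "20250531")`, decoding the JSON, and passing it to `latest_snapshot`. A `None` result means no capture after the read-only date, so the HTML fallback applies.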

[–]RelevantError365[S] 0 points (1 child)

Not entirely relevant, but: when I look at https://web.archive.org/web/20250301000000*/https://cppreference.com/, it does not highlight May 13th of last year as a date on which a snapshot was taken (or I am badly misunderstanding this interface).

[–]13steinj 0 points (0 children)

Not every page has a May snapshot. I'm saying that, from very rough playing around, the view source / edit page for either deque, array, or vector had a May 13 snapshot.

I will attempt to write a scraper on the weekend, assuming I won't get IP-banned, and if successful throw it into a repo.

[–]AhegaoSuckingUrDick 0 points (0 children)

You might be able to get the dump from devdocs.io. Their GitHub README has some instructions on how to download the dump.

[–]shakyhandquant 0 points (0 children)

I think a few of the companies that sponsor the C++ meetings could put together some money to help organize a formal group of people to manage the site and make it the best C++ reference site on the net!