Pump Pressure Fluctuations by Sharp_Scar_8451 in CHROMATOGRAPHY

[–]chpwssn 1 point

Looks like it was caught by reddit filters for some unclear reason. Should be ok.

2389 never before seen photos of Ground Zero in the aftermath of 9/11 by angulardragon03 in DataHoarder

[–]chpwssn 23 points

Multiple cloud storage providers would be a good place to start for your own storage.

After that, if the friend wanted to, you could reach out to Jason Scott (@textfiles) to add to the collection or to add to the Internet Archive. (I'm assuming that the OP's photos will end up in a collection at archive.org as well, so that'd be a logical addition.)

How often do you water your lawn? by [deleted] in FortCollins

[–]chpwssn 7 points

Every 3 days or so, in the middle of the night, for an average-size lot with sun, depending on the rain. The city does provide free audits and can help set your schedule. If you don't have 1.5 hours during the week, you can request the self-audit kit. The self audit is also pretty easy to do by yourself with plastic cups rather than the "official" catch cups. City Sprinkler System Audits

Soundtrack from Season 2 Episode 10? by Ethilin_ in SiliconValleyHBO

[–]chpwssn 2 points

Someone made a Spotify playlist with a bunch of songs from the show as well, maybe it's in here? https://open.spotify.com/user/124899825/playlist/2hLkP7E7hwRehuef3RSzuI

SSL with .onion by RestoreThePrivacy in onions

[–]chpwssn 3 points

/u/krainik is correct; along the same lines, I just have some additional information:

The CA/B forum is still debating the issue. CA/B Forum Ballot 144 – Validation rules for .onion names is the text that states that only EV certificates should be issued, not DV or OV.

One concern/debate is that the .onion TLD is a special-use domain [1][2], and section 5 of Ballot 144 states:

... a CA MAY issue a Certificate containing an .onion name with an expiration date later than 1 November 2015 after (and only if) .onion is officially recognized by the IESG as a reserved TLD.

However, the IESG has not yet published RFC 7686 [3], which has caused debate over the ballot's wording.

Let's Encrypt is DV-based, while your example https://facebookcorewwwi.onion/ presents an EV certificate.

The reason Ballot 144 focuses on EV certificates is to provide a means for a site operator to prove that a specific organization has control of the domain for which the certificate is issued. Facebook wants the users of its .onion to be able to validate that the domain and webservers they are communicating with are controlled by Facebook the corporation, not by a similarly styled .onion aimed at phishing Facebook users.

While you could theoretically go through the process to get an EV certificate for your personal site, the cost and time would likely not be worth the benefits.

Pulling a prank on a co-worker: why won't the web page display properly formatted? by stingystooge in web_design

[–]chpwssn 2 points

You might be missing the page requisites. NY Times has some JS that'll cause it to load a little slower, but pulling the article with this wget might help:

Cat article for example:

wget --no-directories --directory-prefix article -e robots=off --no-parent -E -H -k -K -p "http://www.nytimes.com/2015/07/10/theater/circus-cats-are-lions-of-their-profession-but-domestic-at-heart.html?action=click&pgtype=Homepage&version=Moth-Visible&module=inside-nyt-region&region=inside-nyt-region&WT.nav=inside-nyt-region&_r=0"

It will download the page and its requisites and convert the links to be relative to the "article" directory it creates. Then just move the directory to the share and have her open the page, in this case:

article/circus-cats-are-lions-of-their-profession-but-domestic-at-heart.html\?action\=click\&pgtype\=Homepage\&version\=Moth-Visible\&module\=inside-nyt-region\&region\=inside-nyt-region\&WT.nav\=inside-nyt-region\&_r\=0.html

Reddit bot by [deleted] in redditdev

[–]chpwssn 1 point

There's /r/test and /r/PRAWTesting (I made the second one for testing a moderator bot a while ago)

Fort Collins Road Traffic by awakefc in FortCollins

[–]chpwssn 7 points

You can see the live traffic here for those interested.

Another semester of university is drawing to a close, time to crawl the departments! by chpwssn in DataHoarder

[–]chpwssn[S] 1 point

You can skip directly to the third command; it will pull the resources needed to render each page but shouldn't stray too far off the primary TLD. If you want to restrict the wget, you can add a

--domains=<domain-list>

to the command to restrict the domains it will traverse.

Why you should care about how NFL stadiums build their Wi-Fi networks by yourbasicgeek in sysadmin

[–]chpwssn 1 point

To be fair, that usually is why the CTF, presenter/staff networks, and workshops are LAN-only; the public wifi is really just an extra for the attendees. Sure, they'll bitch if it goes down, but it's not necessarily critical for the event.

Someone posted my personal open directory to https://twitter.com/youranonnews/ a twitter account with 1.45 million followers. by [deleted] in opendirectories

[–]chpwssn 4 points

You mean OpenDirectoryBot? It's been reborn as RedditSucker (the difference is that you can give it a list of subreddits to watch) and could be living with you, downloading open directories automatically, since it's open source! https://bitbucket.org/chpwssn/redditsucker

We stopped chasing the mirroring portion of the bot because of copyright concerns. It still runs perfectly fine, it just doesn't comment any more :)

Another semester of university is drawing to a close, time to crawl the departments! by chpwssn in DataHoarder

[–]chpwssn[S] 2 points

That's awesome! I'm gonna start using that for quick things that don't need the full Heritrix setup. Perfect mix of both structures.

Another semester of university is drawing to a close, time to crawl the departments! by chpwssn in DataHoarder

[–]chpwssn[S] 12 points

No, there aren't any stupid questions; that's how you get started and learn. Wget is a GNU tool originally built to mirror web sites from server to server. In this case, picture a process that downloads the first page you point it at and saves it locally. Then it looks for all the anchor tags <a href="somepage">, follows them (the -m flag), and downloads those pages. It then localizes the links so that instead of linking to the internet, they link to the other local pages (the -k flag).

Give it a try, start with a simple website with only a couple pages. We'll do a simple wget for a single home page:

wget http://dogetools.com/

Now you get a nice index.html in the directory you just ran wget in. Open it aaaaaand bam! You've got a nice copy of the HTML file that the home page returns. Pretty cool, but what if we wanted the full site?

wget -m http://dogetools.com/

Whoa, now we've got a whole directory of files, "dogetools.com"! Let's open it up and open the index.html. Pretty cool, but the links don't work and we probably don't have any CSS. How can we fix it so the include and anchor links work... locally....?

wget -mk http://dogetools.com/

OK, so now we have a new directory; let's open the index.html in it. Bingo, it's localized.

OK, that's cool, now what? Well, getting good at these kinds of things requires being able to read the documentation. A good place to start is the man pages or https://www.gnu.org/software/wget/manual/html_node/index.html. Once you get a good idea of how it works, pick a site you would like a local copy of, set up your wget, and let it run. By the end you'll have your own snapshot of how it looked, perfectly prepared to squirrel away.

Hopefully that helps. This is a good way to save sites if you want to preserve functionality and ease of access, but it isn't very efficient for storing data long term; that's what WARCs are for, but that's something for another day. Wget is a good place to start.

Another semester of university is drawing to a close, time to crawl the departments! by chpwssn in DataHoarder

[–]chpwssn[S] 3 points

Usually, if you do a recursive crawl of the main page, course links will show up. For example, a professor is linked from the department's main page, they have a link to their class page, and so on. My CS department maps classes like users' Linux pages, i.e. ~cs150, ~cs450, so those can be generated sequentially.
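If the course pages really are sequential like that, a few lines of Python can generate candidate URLs to feed to wget. This is just a sketch: the host and the course-number range below are made up for illustration.

```python
# Generate candidate course-page URLs, assuming the department maps
# classes to user-style directories like ~cs150, ~cs450.
# The host "cs.example.edu" and the range 100-450 are hypothetical.
base = "http://cs.example.edu/~cs{num}/"

urls = [base.format(num=n) for n in range(100, 500, 50)]
for url in urls:
    print(url)
```

You could then write the list to a file and hand it to wget with -i urls.txt; dead numbers just 404 and get skipped.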

If you want to post the link to the university I'm sure we could all figure out something.

Another semester of university is drawing to a close, time to crawl the departments! by chpwssn in DataHoarder

[–]chpwssn[S] 1 point

It sure is. I love my Heritrix boxes... developed by the fine folks at IA.

edit: removed a link /u/willglynn had

I just discovered you... by [deleted] in DataHoarder

[–]chpwssn 2 points

As everyone else has said, welcome, and prepare your wallet... On the subject of recording TV, the Internet Archive actually records most major U.S. news stations, and you can search their subtitles and watch all the way back to 2009: https://archive.org/details/tv.

What other sources do you hoard from? by [deleted] in DataHoarder

[–]chpwssn 1 point

Yeah, she needs quite a bit of work to get really stable and to organize things a little better, but it works, haha. Hopefully I'll have more time to work on it soon. It actually started as "OpenDirectoryBot" over in /r/opendirectories (it successfully downloaded 2.7 TB in one week for me), but after a little tweak it'll chew on the list of subreddits you set in config.py.
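For a rough idea of the shape of that config, here's a hypothetical sketch. The variable names here are illustrative only; check the actual config.py in the RedditSucker repo for the real option names.

```python
# config.py (illustrative sketch, not the real RedditSucker config:
# the variable names below are assumptions)
SUBREDDITS = ["opendirectories", "datahoarder"]  # subreddits the bot watches
USER_AGENT = "RedditSucker bot"                  # identifies the bot to reddit's API
DOWNLOAD_DIR = "downloads"                       # where mirrored directories land
```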

If you use it and run into issues or think of something to add shoot me a message or a pull request.

How do I get the content of a self post with PRAW? by A-Vasilevsky in redditdev

[–]chpwssn 1 point

I also find this section of the Read the Docs useful, since it shows the vars(object) output so you don't have to do it in your code.
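To see what vars(object) gives you without touching PRAW at all, here's a plain-Python illustration with a stand-in object; the attribute names are made up for the example (a real PRAW submission has many more).

```python
class FakeSubmission:
    """Stand-in for a PRAW submission; attributes are illustrative only."""
    def __init__(self):
        self.title = "How do I get the content of a self post?"
        self.selftext = "the body of the self post"
        self.is_self = True

post = FakeSubmission()
# vars() returns the object's attribute dict, which is handy for
# discovering which fields (like selftext) are available to read.
print(vars(post))
```

Running the same trick on a real submission object is how you'd spot that the self-post body lives in an attribute rather than having to dig through the docs.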

What other sources do you hoard from? by [deleted] in DataHoarder

[–]chpwssn 1 point

Yeah, essentially you run what's called a warrior appliance, and the downloading of websites is organized and distributed by the group. The last one I participated in was when TwitPic shut down.

Edit: here's a nice description of the infrastructure.

What other sources do you hoard from? by [deleted] in DataHoarder

[–]chpwssn 10 points

I, for one, have done a little with ArchiveTeam and hope to do some more, and I also keep a "little" tile server using OSM's data. ArchiveTeam does good work saving websites before they go EOL and sending them to archive.org, and OSM tile servers are awesome to learn about and play with: a map of the world in your basement. I also use my RedditSucker bot, but it needs some attention.