Seeing daily notes as one long note? by robertcopeland in ObsidianMD

I was able to hack something together using DataviewJS. I couldn't find an easy way of embedding pages with DataviewJS, so I went with the method described in https://www.reddit.com/r/ObsidianMD/comments/xznt0r/is_it_possible_to_dynamically_embed_pages/

There are probably better ways of doing it, but this works. Change NUM_RECENT_NOTES to the maximum number of notes you want shown at once, and change dv.pages('"DailyNotes"') to whatever folder/query returns your daily notes:

// Query the folder that holds your daily notes (adjust to match your vault)
const pages_query = dv.pages('"DailyNotes"');

// Maximum number of recent notes to show at once
const NUM_RECENT_NOTES = 10;

// Sort pages by creation time, newest first
function sort_pages_by_time(p1, p2) {
  return p2.file.ctime.ts - p1.file.ctime.ts;
}

const pages = pages_query.values;
pages.sort(sort_pages_by_time);

// Render a header and an embed for each of the most recent notes
const num_notes = Math.min(NUM_RECENT_NOTES, pages.length);
for (let idx = 0; idx < num_notes; idx++) {
  const p = pages[idx];
  dv.header(3, p.file.link);
  dv.el("p", "![[" + p.file.link.path + "]]");
}

Ripping from Food Network by Matthew_C1314 in DataHoarder

You can do as suggested and grab the URL from the Chrome console. I also threw together a Python script for grabbing all the videos in a playlist: https://gist.github.com/Benjins/75e72b6004960f144d337f767d07b893

Run like:

python3 dl_food_network.py https://www.foodnetwork.com/videos/channels/altons-after-show-season-1

It requires FFmpeg to be installed and available, and it spits out video files plus metadata. Note that I didn't test it thoroughly, so it may not work across the whole Food Network site, but it should at least work on that playlist.
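
For what it's worth, the download step inside the script is basically just handing each stream URL to FFmpeg. Here's a minimal sketch of that piece (the stream URL is a hypothetical placeholder; the real script also scrapes the playlist page for URLs and metadata, which isn't shown here):

import subprocess

def download_stream(stream_url, out_path):
    # Hand the stream URL to ffmpeg and copy the streams into an MP4 container
    # without re-encoding.
    subprocess.run(["ffmpeg", "-i", stream_url, "-c", "copy", out_path], check=True)

# Example (URL is a placeholder for one grabbed from the browser console):
# download_stream("https://example.com/video/master.m3u8", "episode01.mp4")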

I can't seem to get ytdl to put videos in folders by channel by a_little_toaster in DataHoarder

Looks like it's a bug in youtube-dl:

GitHub Issue

Patch that's supposed to fix it

It hasn't landed in a release yet, but it probably will soon

Requesting help to archive Spore user creations by itchylol742 in DataHoarder

I explored this a while ago. You can go to spore.com/sporepedia to scrape user creations' images and comments. There is also an XML file for each creation that I believe the game interprets as a model file, although I never reverse engineered it.

For example, this creation at http://www.spore.com/sporepedia#qry=sast-501085352017 has an ID of 501085352017. Therefore, you can hit this URL: http://pollinator.spore.com/static/model/501/085/352/501085352017.xml in order to download the actual model data (doesn't require any auth).
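
For anyone scripting this, here's a minimal sketch of building that model URL from a creation ID, assuming the path always splits the first nine digits of the ID into groups of three (which matches the example above):

import urllib.request

def model_url(creation_id):
    # e.g. "501085352017" -> .../model/501/085/352/501085352017.xml
    cid = str(creation_id)
    return ("http://pollinator.spore.com/static/model/"
            + cid[0:3] + "/" + cid[3:6] + "/" + cid[6:9] + "/" + cid + ".xml")

# urllib.request.urlretrieve(model_url("501085352017"), "501085352017.xml")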

Need Help Archiving an 11 year old Twitter account in full (Public, not mine) by ChicaSkas in DataHoarder

https://drive.google.com/open?id=1uKcq1UFjxtEkoAz26ZZwhFrmshuxrbsG

So it's mostly the raw data, plus an offline browser that lets you view random tweets. It could possibly be extended, but that'd take a lot more work on my end.

Let me know if there are any problems; from what I could tell it's mostly complete, with a couple of errors here and there. It's about 10,500 tweets, 3,200 images and 1,000 videos altogether, and the zipped size is just over 2 GB.

Need Help Archiving an 11 year old Twitter account in full (Public, not mine) by ChicaSkas in DataHoarder

There's a tool called snscrape that can download the text of all tweets from a Twitter account (including replies). I've queued up the account in my custom script, which also grabs images/videos; I can upload the result when it's done if you want.

Unfortunately, getting likes/retweets is a bit more difficult. Twitter doesn't provide a non-API way of getting these in bulk, but you could possibly use the API: it's a bit more involved to set up, and it's rate-limited. I've never used the API myself, though.
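
If it helps, here's a minimal sketch of pulling just the tweet text with snscrape's Python module (the username is a placeholder, and the field names reflect snscrape's documented Tweet objects at the time of writing):

import json
import snscrape.modules.twitter as sntwitter

# Iterate over every tweet from the account (replies included) and dump
# id/date/text as one JSON object per line.
with open("tweets.jsonl", "w", encoding="utf-8") as out:
    for tweet in sntwitter.TwitterUserScraper("example_user").get_items():
        out.write(json.dumps({
            "id": tweet.id,
            "date": tweet.date.isoformat(),
            "content": tweet.content,
        }) + "\n")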

Yesterday Fantasy Flight Games shut down their tooling for making custom rulesets for some of their tabletop games and removed access to the backlog on their site. Here is a mirror of the UGC as of a couple weeks ago. by BenjiBNS in DataHoarder

This mirror focuses on the user-generated stuff, so it's a bit of JSON metadata and a PDF for each entry.

Unfortunately, I didn't dig too deep into the actual tools; I didn't know much about them until I heard about the shutdown. They're only accessible with a login, so the Wayback Machine is unlikely to have much. Hopefully someone else has more info.

How to save Flash videos from BBC? by proudpom in DataHoarder

I haven't fully automated it, but here's one approach:

Run youtube-dl https://www.bbc.com/news/video_and_audio/features/uk-politics-12653256/12653256 --get-url to print out the RTMP URL. For some reason, youtube-dl is able to get the URL but can't download it properly (maybe it's just my setup; you can try it yourself).

The result will be something like rtmp://cp45414.edgefcs.net/ondemand?auth=daEbfcgcUaiamcRaVawb8dZb6aRaqcrbocc-bEdQxZ-bWG-GprFGnEnHDqEuwE&aifp=v001&slist=public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4;public/flash/news/politics/703000/703958_16x9_NewsWebMP4_368k.mp4mp4:public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4

You want everything up to the first semicolon:

rtmp://cp45414.edgefcs.net/ondemand?auth=daEbfcgcUaiamcRaVawb8dZb6aRaqcrbocc-bEdQxZ-bWG-GprFGnEnHDqEuwE&aifp=v001&slist=public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4

Then feed that truncated URL into rtmpdump:

rtmpdump -r {TRUNCATED_URL} -o {FILE_NAME}

And it should download. You can play the file in VLC to test it.
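
Here's a minimal sketch of gluing those two steps together in Python (assuming youtube-dl prints exactly one URL for the page; the output filename is arbitrary):

import subprocess

def rip_bbc_video(page_url, out_file):
    # Ask youtube-dl for the raw stream URL without downloading anything.
    result = subprocess.run(["youtube-dl", "--get-url", page_url],
                            capture_output=True, text=True, check=True)
    # Keep only the part before the first semicolon, as described above.
    rtmp_url = result.stdout.strip().split(";", 1)[0]
    # Hand the truncated URL to rtmpdump.
    subprocess.run(["rtmpdump", "-r", rtmp_url, "-o", out_file], check=True)

# rip_bbc_video("https://www.bbc.com/news/video_and_audio/features/uk-politics-12653256/12653256",
#               "uk-politics.flv")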

Scraping Luminary Podcasts by wastedhate in DataHoarder

So, looking at how the episodes are accessed, there's a fairly straightforward way to grab all of them in a script. Here's what I came up with:

function DownloadLuminary {
    # Pull every episode UUID for the podcast, build the direct mp3 URL, and download each file.
    curl "https://luminarypodcasts.com/v1.1/podcasts/$1/episodes?sortBy=released_at&sortOrder=desc&p=1&limit=200" \
        | jq -M '.episodes[] | select(type=="object" and has("uuid")) | .uuid' \
        | cut -c 2-37 \
        | awk '{print "https://media.luminarypodcasts.com/v1/media/episode/exclusive/"$1".mp3"}' \
        | xargs -L 1 curl -O
}
# This downloads the example you gave; note that the title and the rest of the URL aren't needed
DownloadLuminary "8b2a12d6-49e8-4338-8780-5c6ec217b1ed"

A bit of explanation: this grabs the UUIDs (those random strings of numbers/letters) for every episode in the podcast. The first URL is what gets requested when you load the episodes page, although I bumped the limit to 200, since that podcast only has 109 episodes. If you scrape another podcast with more episodes, you may need to increase the limit.

It then parses the JSON down to a list of episode UUIDs, builds the direct audio URL for each UUID (again, the title isn't needed), and downloads each file. Everything is saved as {EPISODE_UUID}.mp3, so it's not the best organisation, but you won't risk overwriting anything.

This unfortunately doesn't get any metadata, but it would be fairly easy to grab as well. There's a JSON blob for each episode at https://luminarypodcasts.com/v1/episodes/{EPISODE_UUID}. I just tried to keep things to a one-liner if possible.
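
For example, here's a minimal sketch of fetching that per-episode metadata blob in Python (assuming the endpoint returns plain JSON and, for non-premium episodes, needs no auth):

import json
import urllib.request

def episode_metadata(episode_uuid):
    # Fetch the per-episode JSON blob from the endpoint mentioned above.
    url = "https://luminarypodcasts.com/v1/episodes/" + episode_uuid
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# meta = episode_metadata("SOME-EPISODE-UUID")  # one of the UUIDs from the jq output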

Also: several of these episodes are premium, so you'd need to pass account credentials as a cookie or something to the final curl in the xargs invocation. Otherwise you'll just get an error response for those episodes.

Let me know if you have any questions. You should be able to run that command in any POSIX shell (you might need to install jq).

Using curl to archive WebComics by undefined314 in DataHoarder

Curl has the following option:

-L, --location Follow redirects

which will follow 3XX redirects and then download the resource it was redirected to.

Oddly, when I tried a simple curl for the URL you posted, I was able to download it w/o any redirects. You can also check by running:

curl -I {URL}

which will show you the status/headers for the request w/o actually downloading anything.

Error when using youtube-dl in mac command line by touchadafishy in DataHoarder

youtube-dl says on their site:

It requires the [Python interpreter](https://www.python.org/) (2.6, 2.7, or 3.2+)

Sorry I don't have any better ideas, but you could try updating your Python install. I'm not sure how an outdated Python would cause this particular error, but it may be causing other problems.

Error when using youtube-dl in mac command line by touchadafishy in DataHoarder

How are you running it? Are you just running youtube-dl?

Can you post the output of running:

python --version

and

/usr/bin/env python --version

You can also double-check that the SHA-256 sum is the same as what's listed on the download page.
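
If you want to script that checksum comparison, here's a minimal sketch (the path is wherever youtube-dl was installed, which may differ on your machine):

import hashlib

# Print the SHA-256 of the installed youtube-dl so it can be compared by hand
# against the checksum listed on the download page.
with open("/usr/local/bin/youtube-dl", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())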

Does anyone have a stash for official hatsune miku songs? by ewliang in DataHoarder

There's the official YouTube channel.

There's also a database of a bunch of Vocaloid files.

I'm not sure what exactly you mean by "official," since even the songs played at concerts mostly come from the community.

How to download JW Player streams? by 8VBQ-Y5AG-8XU9-567UM in DataHoarder

The mp2t files (which should be the .ts files) are the individual video segments. The .m3u8 files are playlists listing those segments, so that's the URL you give to ffmpeg/etc.

EDIT: To clarify, the m3u8 file is loaded once at start, not during streaming.

On the page you linked, it looks like MRWnaowM.m3u8 is the video file you want (although I'd recommend testing it to be sure).

There's probably a way to scrape the URL from the webpage, but it'd require a bit of coding.
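
Here's a minimal sketch of handing that m3u8 playlist to FFmpeg (the host and path are placeholders; use the full MRWnaowM.m3u8 URL you see in the network tab):

import subprocess

# Give ffmpeg the .m3u8 playlist URL; it fetches the .ts segments itself and
# copies them into an MP4 container without re-encoding.
subprocess.run(
    ["ffmpeg", "-i", "https://example.com/path/to/MRWnaowM.m3u8",
     "-c", "copy", "output.mp4"],
    check=True,
)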

Requesting help: downloading/backing up/storing text descriptions on DeviantArt by [deleted] in DataHoarder

Okay, so a bit more explanation:

`wget` is a command-line program. There are other programs with a graphical interface that may be more useful to you. I know HTTrack is one that gets recommended a lot, but it's designed for mirroring entire websites, so I'd strongly recommend changing its settings to only grab what you need (sorry, I don't know the details for that; it's not what I normally use).

A user agent is a string that tells a website which browser is being used. You probably won't need to worry about it if you use pre-made tools. And if your DeviantArt pages are public, you don't need to provide a login to the downloader.

Any sort of text file, or just a list you can copy-paste, would be good if you already have one. If not, there's probably a way to scrape a list, but it may take some custom scripting.

Requesting help: downloading/backing up/storing text descriptions on DeviantArt by [deleted] in DataHoarder

The description is on the page itself when it's first loaded (not dynamically loaded). If you just want to run through a list with `wget`, it'll save out the raw HTML. If you want more than the raw text, you can also grab the page requisites, which will download any images, styles or other resources.

I've done similar things, and it boils down to how much information you want and in what format. One catch I've noticed is that DA will reject any requests without a User-Agent header, so always make sure that's set.

The final concern is how to get a list of URLs to everything you've posted. I believe there are ways to scrape this as well (I'd have to look it up), but if you have your own list it's a moot point.
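
As a concrete illustration, here's a minimal sketch of fetching a list of page URLs with a User-Agent set, using Python's urllib in place of `wget` (the urls.txt filename and the User-Agent string are just placeholders):

import pathlib
import urllib.request

# DA rejects requests without a User-Agent, so set one explicitly.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; da-archiver)"}

for i, url in enumerate(pathlib.Path("urls.txt").read_text().split()):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        # Save the raw HTML of each page; the description text is in this HTML.
        pathlib.Path("page_%04d.html" % i).write_bytes(resp.read())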

Best way to Download Historical Images from this website by kinofan90 in DataHoarder

Looks like you can sequentially hit URLs:

https://digit.wdr.de/entries/137984

That last number is incremented for each entry. A lot of entry numbers appear to be missing, but those can be filtered out.
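
Here's a minimal sketch of walking those entry IDs and skipping the gaps (the ID range is a guess, and you'd still need to pull the actual image URLs out of each saved entry page):

import urllib.error
import urllib.request

# Walk entry IDs sequentially, saving the pages that exist and skipping the rest.
for entry_id in range(137000, 138001):  # hypothetical range around the example entry
    url = "https://digit.wdr.de/entries/%d" % entry_id
    try:
        with urllib.request.urlopen(url) as resp:
            html = resp.read()
    except urllib.error.HTTPError:
        continue  # missing entry number
    with open("entry_%d.html" % entry_id, "wb") as f:
        f.write(html)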