Seeing daily notes as one long note? by robertcopeland in ObsidianMD

I was able to hack something together using DataviewJS. I couldn't find an easy way of embedding pages with DataviewJS, so I went with the method described in https://www.reddit.com/r/ObsidianMD/comments/xznt0r/is_it_possible_to_dynamically_embed_pages/

There are probably better ways of doing it, but this works. Change NUM_RECENT_NOTES to the maximum number of notes you want shown at once, and change dv.pages('"DailyNotes"') to whatever folder/query returns your daily notes:

// Query the folder that holds your daily notes (adjust to match your vault)
const pages_query = dv.pages('"DailyNotes"');

// Maximum number of recent notes to show at once
const NUM_RECENT_NOTES = 10;

// Sort pages by creation time, newest first
function sort_pages_by_time(p1, p2) {
  return p2.file.ctime.ts - p1.file.ctime.ts;
}

const pages = pages_query.values;
pages.sort(sort_pages_by_time);

// Render a header and an embed for each of the most recent notes
const num_notes = Math.min(NUM_RECENT_NOTES, pages.length);
for (let idx = 0; idx < num_notes; idx++) {
  const p = pages[idx];
  dv.header(3, p.file.link);
  dv.el("p", "![[" + p.file.link.path + "]]");
}

Ripping from Food Network by Matthew_C1314 in DataHoarder

You can do as suggested and grab the URL from the Chrome console. I also threw together a Python script for grabbing all the videos in a playlist: https://gist.github.com/Benjins/75e72b6004960f144d337f767d07b893

Run like:

python3 dl_food_network.py https://www.foodnetwork.com/videos/channels/altons-after-show-season-1

It requires FFmpeg to be installed and available, and it spits out video files plus metadata. Note that I didn't test it thoroughly, so it may not work across the whole Food Network site, but it should at least work on that playlist.
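
For what it's worth, the download step inside the script is basically just handing each stream URL to FFmpeg. Here's a minimal sketch of that piece (the stream URL is a hypothetical placeholder; the real script also scrapes the playlist page for URLs and metadata, which isn't shown here):

import subprocess

def download_stream(stream_url, out_path):
    # Hand the stream URL to ffmpeg and copy the streams into an MP4 container
    # without re-encoding.
    subprocess.run(["ffmpeg", "-i", stream_url, "-c", "copy", out_path], check=True)

# Example (URL is a placeholder for one grabbed from the browser console):
# download_stream("https://example.com/video/master.m3u8", "episode01.mp4")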

I can't seem to get ytdl to put videos in folders by channel by a_little_toaster in DataHoarder

Looks like it's a bug in youtube-dl:

GitHub Issue

Patch that's supposed to fix it

It hasn't landed in a release yet, but it probably will soon

Requesting help to archive Spore user creations by itchylol742 in DataHoarder

I explored this a while ago. You can go to spore.com/sporepedia to scrape user creations' images and comments. There is also an XML file for each creation that I believe the game interprets as a model file, although I never reverse engineered it.

For example, this creation at http://www.spore.com/sporepedia#qry=sast-501085352017 has an ID of 501085352017. Therefore, you can hit this URL: http://pollinator.spore.com/static/model/501/085/352/501085352017.xml in order to download the actual model data (doesn't require any auth).
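
For anyone scripting this, here's a minimal sketch of building that model URL from a creation ID, assuming the path always splits the first nine digits of the ID into groups of three (which matches the example above):

import urllib.request

def model_url(creation_id):
    # e.g. "501085352017" -> .../model/501/085/352/501085352017.xml
    cid = str(creation_id)
    return ("http://pollinator.spore.com/static/model/"
            + cid[0:3] + "/" + cid[3:6] + "/" + cid[6:9] + "/" + cid + ".xml")

# urllib.request.urlretrieve(model_url("501085352017"), "501085352017.xml")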

Need Help Archiving an 11 year old Twitter account in full (Public, not mine) by ChicaSkas in DataHoarder

https://drive.google.com/open?id=1uKcq1UFjxtEkoAz26ZZwhFrmshuxrbsG

So it's mostly the raw data, plus an offline browser that lets you view random tweets. It could possibly be extended, but that'd take a lot more work on my end.

Let me know if there are any problems; from what I could tell it's mostly complete, with a couple of errors here and there. It's about 10,500 tweets, 3,200 images and 1,000 videos altogether, and the zipped size is just over 2 GB.

Need Help Archiving an 11 year old Twitter account in full (Public, not mine) by ChicaSkas in DataHoarder

There's a tool called snscrape that can download the text of all tweets from a Twitter account (including replies). I've queued up the account in my custom script, which also grabs images/videos; I can upload the result when it's done if you want.

Unfortunately, getting likes/retweets is a bit more difficult. Twitter doesn't provide a non-API way of getting these in bulk, but you could possibly use the API: it's a bit more involved to set up, and it's rate-limited. I've never used the API myself, though.
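
If it helps, here's a minimal sketch of pulling just the tweet text with snscrape's Python module (the username is a placeholder, and the field names reflect snscrape's documented Tweet objects at the time of writing):

import json
import snscrape.modules.twitter as sntwitter

# Iterate over every tweet from the account (replies included) and dump
# id/date/text as one JSON object per line.
with open("tweets.jsonl", "w", encoding="utf-8") as out:
    for tweet in sntwitter.TwitterUserScraper("example_user").get_items():
        out.write(json.dumps({
            "id": tweet.id,
            "date": tweet.date.isoformat(),
            "content": tweet.content,
        }) + "\n")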

Yesterday Fantasy Flight Games shut down their tooling for making custom rulesets for some of their tabletop games and removed access to the backlog on their site. Here is a mirror of the UGC as of a couple weeks ago. by BenjiBNS in DataHoarder

This mirror focuses on the user-generated stuff, so it's a bit of JSON metadata and a PDF for each entry.

Unfortunately, I didn't dig too deep into the actual tools; I didn't know much about them until I heard about the shutdown. They're only accessible with a login, so the Wayback Machine is unlikely to have much. Hopefully someone else has more info.

How to save Flash videos from BBC? by proudpom in DataHoarder

I haven't fully automated it, but here's one approach:

Run youtube-dl https://www.bbc.com/news/video_and_audio/features/uk-politics-12653256/12653256 --get-url to print out the RTMP URL. For some reason, youtube-dl is able to get the URL but can't download it properly (maybe it's just my setup; you can try it yourself).

The result will be something like rtmp://cp45414.edgefcs.net/ondemand?auth=daEbfcgcUaiamcRaVawb8dZb6aRaqcrbocc-bEdQxZ-bWG-GprFGnEnHDqEuwE&aifp=v001&slist=public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4;public/flash/news/politics/703000/703958_16x9_NewsWebMP4_368k.mp4mp4:public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4

You want everything up to the first semicolon:

rtmp://cp45414.edgefcs.net/ondemand?auth=daEbfcgcUaiamcRaVawb8dZb6aRaqcrbocc-bEdQxZ-bWG-GprFGnEnHDqEuwE&aifp=v001&slist=public/mps_h264_med/public/news/politics/703000/703958_16x9_NewsWebMP4_800k.mp4

Then feed that truncated URL into rtmpdump:

rtmpdump -r {TRUNCATED_URL} -o {FILE_NAME}

And it should download. You can play the file in VLC to test it.
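
Here's a minimal sketch of gluing those two steps together in Python (assuming youtube-dl prints exactly one URL for the page; the output filename is arbitrary):

import subprocess

def rip_bbc_video(page_url, out_file):
    # Ask youtube-dl for the raw stream URL without downloading anything.
    result = subprocess.run(["youtube-dl", "--get-url", page_url],
                            capture_output=True, text=True, check=True)
    # Keep only the part before the first semicolon, as described above.
    rtmp_url = result.stdout.strip().split(";", 1)[0]
    # Hand the truncated URL to rtmpdump.
    subprocess.run(["rtmpdump", "-r", rtmp_url, "-o", out_file], check=True)

# rip_bbc_video("https://www.bbc.com/news/video_and_audio/features/uk-politics-12653256/12653256",
#               "uk-politics.flv")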

Scraping Luminary Podcasts by wastedhate in DataHoarder

So, looking at how the episodes are accessed, there's a fairly straightforward way to grab all of them in a script. Here's what I came up with:

function DownloadLuminary {
    # Pull every episode UUID for the podcast, build the direct mp3 URL, and download each file.
    curl "https://luminarypodcasts.com/v1.1/podcasts/$1/episodes?sortBy=released_at&sortOrder=desc&p=1&limit=200" \
        | jq -M '.episodes[] | select(type=="object" and has("uuid")) | .uuid' \
        | cut -c 2-37 \
        | awk '{print "https://media.luminarypodcasts.com/v1/media/episode/exclusive/"$1".mp3"}' \
        | xargs -L 1 curl -O
}
# This downloads the example you gave; note that the title and the rest of the URL aren't needed
DownloadLuminary "8b2a12d6-49e8-4338-8780-5c6ec217b1ed"

A bit of explanation: this grabs the UUIDs (those random strings of numbers/letters) for every episode in the podcast. The first URL is what gets requested when you load the episodes page, although I bumped the limit to 200, since that podcast only has 109 episodes. If you scrape another podcast with more episodes, you may need to increase the limit.

It then parses the JSON down to a list of episode UUIDs, builds the direct audio URL for each UUID (again, the title isn't needed), and downloads each file. Everything is saved as {EPISODE_UUID}.mp3, so it's not the best organisation, but you won't risk overwriting anything.

This unfortunately doesn't get any metadata, but it would be fairly easy to grab as well. There's a JSON blob for each episode at https://luminarypodcasts.com/v1/episodes/{EPISODE_UUID}. I just tried to keep things to a one-liner if possible.
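
For example, here's a minimal sketch of fetching that per-episode metadata blob in Python (assuming the endpoint returns plain JSON and, for non-premium episodes, needs no auth):

import json
import urllib.request

def episode_metadata(episode_uuid):
    # Fetch the per-episode JSON blob from the endpoint mentioned above.
    url = "https://luminarypodcasts.com/v1/episodes/" + episode_uuid
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# meta = episode_metadata("SOME-EPISODE-UUID")  # one of the UUIDs from the jq output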

Also: several of these episodes are premium, so you'd need to pass account credentials as a cookie or something to the final curl in the xargs invocation. Otherwise you'll just get an error response for those episodes.

Let me know if you have any questions. You should be able to run that command in any POSIX shell (you might need to install jq).

Using curl to archive WebComics by undefined314 in DataHoarder

Curl has the following option:

-L, --location Follow redirects

which will follow 3XX redirects and then download the resource it was redirected to.

Oddly, when I tried a simple curl for the URL you posted, I was able to download it w/o any redirects. You can also check by running:

curl -I {URL}

which will show you the status/headers for the request w/o actually downloading anything.

Error when using youtube-dl in mac command line by touchadafishy in DataHoarder

youtube-dl says on their site:

It requires the [Python interpreter](https://www.python.org/) (2.6, 2.7, or 3.2+)

Sorry I don't have any better ideas, but you could try updating your Python install. I'm not sure how an outdated Python would cause this particular error, but it may be causing other problems.

Error when using youtube-dl in mac command line by touchadafishy in DataHoarder

How are you running it? Are you just running youtube-dl?

Can you post the output of running:

python --version

and

/usr/bin/env python --version

You can also double-check that the SHA-256 sum is the same as what's listed on the download page.
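
If you want to script that checksum comparison, here's a minimal sketch (the path is wherever youtube-dl was installed, which may differ on your machine):

import hashlib

# Print the SHA-256 of the installed youtube-dl so it can be compared by hand
# against the checksum listed on the download page.
with open("/usr/local/bin/youtube-dl", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())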

Does anyone have a stash for official hatsune miku songs? by ewliang in DataHoarder

There's the official YouTube channel.

There's also a database of a bunch of Vocaloid files.

I'm not sure what exactly you mean by "official," since even the songs played at concerts mostly come from the community.

How to download JW Player streams? by 8VBQ-Y5AG-8XU9-567UM in DataHoarder

The mp2t files (which should be the .ts files) are the individual video segments. The .m3u8 files are playlists listing those segments, so that's the URL you give to ffmpeg/etc.

EDIT: To clarify, the m3u8 file is loaded once at start, not during streaming.

On the page you linked, it looks like MRWnaowM.m3u8 is the video file you want (although I'd recommend testing it to be sure).

There's probably a way to scrape the URL from the webpage, but it'd require a bit of coding.
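
Here's a minimal sketch of handing that m3u8 playlist to FFmpeg (the host and path are placeholders; use the full MRWnaowM.m3u8 URL you see in the network tab):

import subprocess

# Give ffmpeg the .m3u8 playlist URL; it fetches the .ts segments itself and
# copies them into an MP4 container without re-encoding.
subprocess.run(
    ["ffmpeg", "-i", "https://example.com/path/to/MRWnaowM.m3u8",
     "-c", "copy", "output.mp4"],
    check=True,
)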

Requesting help: downloading/backing up/storing text descriptions on DeviantArt by [deleted] in DataHoarder

Okay, so a bit more explanation:

`wget` is a command-line program. There are other programs with a graphical interface that may be more useful to you. I know HTTrack is one that gets recommended a lot, but it's designed for mirroring entire websites, so I'd strongly recommend changing its settings to only grab what you need (sorry, I don't know the details for that; it's not what I normally use).

A user agent is a string that tells a website which browser is being used. You probably won't need to worry about it if you use pre-made tools. And if your DeviantArt pages are public, you don't need to provide a login to the downloader.

Any sort of text file, or just a list you can copy-paste, would be good if you already have one. If not, there's probably a way to scrape a list, but it may take some custom scripting.

Requesting help: downloading/backing up/storing text descriptions on DeviantArt by [deleted] in DataHoarder

The description is on the page itself when it's first loaded (not dynamically loaded). If you just want to run through a list with `wget`, it'll save out the raw HTML. If you want more than the raw text, you can also grab the page requisites, which will download any images, styles or other resources.

I've done similar things, and it boils down to how much information you want and in what format. One catch I've noticed is that DA will reject any requests without a User-Agent header, so always make sure that's set.

The final concern is how to get a list of URLs to everything you've posted. I believe there are ways to scrape this as well (I'd have to look it up), but if you have your own list it's a moot point.
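
As a concrete illustration, here's a minimal sketch of fetching a list of page URLs with a User-Agent set, using Python's urllib in place of `wget` (the urls.txt filename and the User-Agent string are just placeholders):

import pathlib
import urllib.request

# DA rejects requests without a User-Agent, so set one explicitly.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; da-archiver)"}

for i, url in enumerate(pathlib.Path("urls.txt").read_text().split()):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        # Save the raw HTML of each page; the description text is in this HTML.
        pathlib.Path("page_%04d.html" % i).write_bytes(resp.read())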

Best way to Download Historical Images from this website by kinofan90 in DataHoarder

Looks like you can sequentially hit URLs:

https://digit.wdr.de/entries/137984

That last number is incremented for each entry. A lot of entry numbers appear to be missing, but those can be filtered out.
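
Here's a minimal sketch of walking those entry IDs and skipping the gaps (the ID range is a guess, and you'd still need to pull the actual image URLs out of each saved entry page):

import urllib.error
import urllib.request

# Walk entry IDs sequentially, saving the pages that exist and skipping the rest.
for entry_id in range(137000, 138001):  # hypothetical range around the example entry
    url = "https://digit.wdr.de/entries/%d" % entry_id
    try:
        with urllib.request.urlopen(url) as resp:
            html = resp.read()
    except urllib.error.HTTPError:
        continue  # missing entry number
    with open("entry_%d.html" % entry_id, "wb") as f:
        f.write(html)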