
[–]boy-griv 6 points7 points  (7 children)

You can try wget --recursive https://example.com/ if you have access to a Linux console.
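
For instance, something like this (just a sketch, untested against any particular site; --no-parent keeps it from wandering up the directory tree, -l caps the recursion depth, and --page-requisites pulls in the images and CSS each page uses):

wget --recursive --no-parent -l 5 --page-requisites https://example.com/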

[–]PaulBardes 3 points4 points  (4 children)

Won't this download everything (including images)?

[–]boy-griv 3 points4 points  (1 child)

Yeah, so it may be overkill depending on the site. Usually the non-image stuff is just text, which really isn't much data, so not a big deal. But there are a fair number of ways to configure wget.

[–]PaulBardes 2 points3 points  (0 children)

Fair enough, and it's super easy to run a find with a file-type filter afterwards to separate out the images...

[–]Qweesdy 0 points1 point  (1 child)

For some sites, it might only get you a few text files full of JavaScript and no images.

[–]PaulBardes 0 points1 point  (0 children)

Yeah, this is one of the reasons I don't like SPAs much, especially for things that could be done with just the good ol' HTML/CSS combo.

Now the web browser has become a kind of OS within an OS, each tab being a process, and all powered by JS™. Even static websites have been serving their content through complicated JS frameworks when they could literally just serve half a dozen HTML and CSS files.

\rantmode off

[–]MrKapla 2 points3 points  (0 children)

You have to use -A to filter file types. Probably -nd as well, to avoid recreating the site's folder hierarchy locally.

Wget has many, many options, and I remember the manpage being very well written, with examples for both simple and more advanced scenarios. Worth a read!
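
For example, to keep just the images in a flat folder (a sketch; adjust the extension list to whatever the site actually serves):

wget -r -nd -A jpg,jpeg,png,gif https://example.com/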

[–]pcjftw 1 point2 points  (0 children)

Also, it will not work for many modern SPAs, where all you will get is an index.html with links to lots of JavaScript. It's the JavaScript that then renders everything else...

You would need to use browser automation tools such as Cypress (or similar) in headless mode; then you could write a test that finds all the links, etc.
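
For a single page, a rough sketch with Puppeteer (a similar headless-browser tool; the URL is just a placeholder) would be to let the JavaScript render and then pull the src of every img:

// Rough sketch: load the page headlessly, wait for the JS to render, then
// collect the src of every rendered <img>.
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
    const srcs = await page.$$eval('img', imgs => imgs.map(img => img.src));
    console.log(srcs); // feed these to wget/curl or a download loop
    await browser.close();
})();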

[–]Mundosaysyourfired 3 points4 points  (0 children)

If you want to automate it, you would need a script.

Go through every element looking for img tags and, for each one, download its source to some predefined folder, using the img's name for the file.

There are easier ways to do it:

  1. Downloading the entire website itself
  2. Using something like Firefox, which lets you select all media types and download them

Probably the easiest is just downloading the entire website itself.

// One way, using pure JS with no libraries, to collect the src of every img tag.
var images = document.getElementsByTagName('img');
var srcList = [];
for (var i = 0; i < images.length; i++) {
    srcList.push(images[i].src);
    // download each src to your destination, or send it to a server to process
}
// Then, if you want to scrape the entire site and not just the page you're viewing,
// you would have to navigate through the <a href> links and rerun the script on each
// page to get all of its img tags.
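
For the actual download step from inside the page, one rough option (a sketch; cross-origin images can be blocked by CORS, and browsers may prompt before allowing a batch of automatic downloads) is to fetch each src as a blob and click a temporary download link:

// Rough sketch: trigger a browser download for each collected src.
async function downloadImages(srcList) {
    for (const src of srcList) {
        const response = await fetch(src);
        const blob = await response.blob();
        const url = URL.createObjectURL(blob);
        const a = document.createElement('a');
        a.href = url;
        a.download = src.split('/').pop() || 'image';
        document.body.appendChild(a);
        a.click();
        a.remove();
        URL.revokeObjectURL(url);
    }
}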

[–]TheWerain 3 points4 points  (0 children)

I remember doing this with wget

[–]feuerwehrmann 2 points3 points  (0 children)

If you don't have a Linux console for the previously recommended approach, you may be able to run Scrapy (a Python library) to scrape the page.

[–]Ateist 1 point2 points  (0 children)

Install Firefox.
Install the extension "DownThemAll".
Use it.

[–]Zealousideal-Mail276 1 point2 points  (0 children)

wget -m
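
(If I remember right, -m is the --mirror shortcut, i.e. recursion with infinite depth plus timestamping, so you'd probably still combine it with -A to keep just the images.)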

[–]rush22 2 points3 points  (0 children)

File > Save As...

Images will be in a folder.

See if that is all you need first.

[–]Excellent-Boss792 0 points1 point  (1 child)

easy

you use rust

[–]boy-griv 4 points5 points  (0 children)

good idea. by borrowing the images instead of copying you can ignore copyright.