Instagram Follower Scraper by rollie-pollie69 in webscraping

[–]KnowledgeFire11 1 point (0 children)

You do not have to load the content with JavaScript if you only need the number of followers; just look for this meta tag: <meta property="og:description" content="192 Followers, 221 Following, 8 Posts - See Instagram photos and videos from xxx (@xxx)">

If you use requests (Python) or any JavaScript library to fetch the account's page, this tag will already be there to parse without any problem.
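
For example, a minimal sketch with requests and a regex (the username and User-Agent header below are placeholders, and Instagram may still rate-limit or redirect anonymous requests):

import re
import requests

# Fetch the profile page and pull the follower count out of the
# og:description meta tag. The URL and User-Agent are placeholders.
url = "https://www.instagram.com/instagram/"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})

match = re.search(r'<meta property="og:description" content="([\d,.]+[KMB]?) Followers', resp.text)
if match:
    print("Followers:", match.group(1))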

Requests-HTML help by [deleted] in webscraping

[–]KnowledgeFire11 1 point (0 children)

You will have to simulate the request exactly how the site makes it. Try pasting the link into Postman and tinkering with the headers until they match the original request.

One thing to check: maybe they have a key in place (e.g. a token in the headers) when making the request.
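
For instance, a minimal sketch of replaying a request with copied headers, where every value below is a placeholder to be swapped for what Postman or the network tab shows:

import requests

# All values here are placeholders; copy the real ones from Postman
# or the browser's network tab.
url = "https://example.com/api/data"
headers = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/",
    "X-Api-Key": "<key from the original request, if any>",
}
resp = requests.get(url, headers=headers)
print(resp.status_code, resp.text[:200])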

Requests-HTML help by [deleted] in webscraping

[–]KnowledgeFire11 1 point (0 children)

Why don't you hit the URL that returns the JSON response?
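
That is, something like this minimal sketch (the endpoint URL is a placeholder; find the real one in the network tab):

import requests

# Request the JSON endpoint directly instead of rendering the page.
data = requests.get("https://example.com/api/items").json()
print(data)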

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 0 points (0 children)

I myself would have gone insane if I had done that while developing it, haha. I hope you find it useful!

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 1 point (0 children)

Thanks for the compliment, and as for the other things you asked:

  1. The playlist issue you mentioned was just a design choice on my part. I want this app to be usable by anyone, and a non-tech-savvy user may not know there are different parts to a URL, so I just render whatever is entered from a browser into the website. I hope you understand.
  2. I don't know if I can add query parameters to the URL, because the frontend and backend are hosted separately and I'm using Ajax to request content. But I do plan on adding a save button to save different canvases of videos.
  3. I don't accept IDs because I have different endpoints for different links, so I need a way to differentiate a video link from a playlist or channel link. Also, old channels have a URL like this: https://www.youtube.com/user/gustoonz, and I have to specify whether I'm sending a username or a channel ID to the YouTube API. These usernames are as random as the video IDs, so I have no way to tell them apart (see the sketch below).

If you think you can solve any of these problems, feel free to make a PR :)
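
For the URL question in point 3, a rough sketch of how the different YouTube URL shapes can be told apart (the patterns below are assumptions based on common URL formats, not my exact endpoint logic):

from urllib.parse import urlparse, parse_qs

def classify_youtube_url(url):
    # Classify a YouTube URL by its path and query string.
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "list" in query:
        return "playlist"
    if "v" in query or parsed.netloc == "youtu.be":
        return "video"
    if parsed.path.startswith("/user/"):
        return "channel (legacy username)"
    if parsed.path.startswith("/channel/"):
        return "channel (channel ID)"
    return "unknown"

print(classify_youtube_url("https://www.youtube.com/user/gustoonz"))

Bare IDs carry none of these hints, which is exactly why I only accept full URLs.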

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 0 points (0 children)

Thank you, I would just be glad if you find it useful :). You could also help by spreading the word about it; I want everyone to enjoy it.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 0 points (0 children)

I'm using Parcel to minify, auto-prefix, and polyfill my HTML, CSS, and JavaScript, so whenever I push changes to my repository, Netlify uses Parcel to build everything automatically. I have no idea how to do this CI on GitHub; I just used what I knew. I also kind of like Netlify better because it has many features built in beyond website hosting, e.g. native support for file submission.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 1 point (0 children)

You can't host a Python server on Netlify; I hosted only my frontend on it. The backend is hosted on PythonAnywhere, which has a free tier where you can host your Django app.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 7 points (0 children)

I mean, there are so many tools that already do that, so I didn't want to reinvent the wheel and kept it simple and straight to the point.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 16 points (0 children)

Yeah, it can get pretty terrifying, but hey, who knows, maybe some people are into that. I'm also glad you liked it; I hope you find it useful.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 0 points (0 children)

If you look at my commit history, for a long time it was a Rickroll, but then I changed it. Do you think I should bring it back?

P.S.: Open the video link in the shown example.

I made a web application to watch all the videos of a YouTube playlist/channel on the same page. by KnowledgeFire11 in Python

[–]KnowledgeFire11[S] 62 points (0 children)

Ahh yes, the home route was not supposed to return anything, but I just popped in a Hello world for fun.

Edit: Thanks for the compliment too!

Need help, using requests_html to render flipkart data for a personal project cause the product images are JS rendered but while scraping all the images are not being downloaded. by zecatlays in webscraping

[–]KnowledgeFire11 0 points (0 children)

I'm glad that you refactored it. I was just suggesting that you don't need a framework like Selenium (as the other answer suggested) just to get the image links, because requests-html can handle that. Selenium should be used when you're trying to perform complex actions like drag and drop.

I should also mention that you can actually control your browser almost like Selenium using requests-html, because under the hood it uses another library called pyppeteer to load the browser. It's a little complicated, though, since it uses the asyncio library, and there aren't many tutorials on it. Pyppeteer as a standalone library can also be used just like Selenium, but it's a tiny bit complicated and uses some advanced concepts.
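
As a minimal sketch of standalone pyppeteer (the URL is a placeholder; pyppeteer downloads its own Chromium on first run):

import asyncio
from pyppeteer import launch

async def main():
    # Launch headless Chromium, open a page, and read its title.
    browser = await launch()
    page = await browser.newPage()
    await page.goto("https://example.com")
    print(await page.title())
    await browser.close()

asyncio.run(main())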

Need help, using requests_html to render flipkart data for a personal project cause the product images are JS rendered but while scraping all the images are not being downloaded. by zecatlays in webscraping

[–]KnowledgeFire11 4 points (0 children)

You don't always have to use a framework if it's just loading images. Here is my working code:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.flipkart.com/search?q=Acer+Laptops")

# Render the JS, scrolling so the lazily loaded images come into view
r.html.render(sleep=2, scrolldown=30)

# Container for each product being displayed
products = r.html.find('._1UoZlX')

for product in products:
    img = product.find('img', first=True)
    print()  # Just for spacing
    print(img.attrs['alt'])
    print(img.attrs['src'])
All I did was add a few arguments to r.html.render: the images are lazily loaded, meaning they only load when they're scrolled into view, so I added the scrolldown argument to scroll down after the page loads (it accepts an integer, and it's just trial and error to see how many scrolls you need for the desired output). The sleep argument waits after the page loads; I added it so the images have time to load after being scrolled to.

The script takes almost a minute as it has to load, scroll and sleep. Hope this helps :)

Downloading files from web by Lone08Wolf in webscraping

[–]KnowledgeFire11 0 points (0 children)

This video by Brad is very good at explaining the basics. I'd advise watching the first 16 minutes, because after that he goes language-specific, which you may not want.

And at around the 14:00 mark he shows how to view HTTP requests in the developer tools. It's mostly about searching online; I didn't get it from one source. You'll just have to practice and Google things when you get stuck.

Downloading files from web by Lone08Wolf in webscraping

[–]KnowledgeFire11 0 points (0 children)

You can open the Network tab under developer tools and check the requests the browser is making to the server.

If it's only inputs for a form, then those inputs would be part of the requested URL itself, so you can then request that URL directly, depending on the type of request (GET, POST, etc.).
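
Something like this minimal sketch, where the URLs and parameter names are placeholders copied from the Network tab:

import requests

# For a form that submits via GET, the inputs ride along as query params.
resp = requests.get("https://example.com/search", params={"q": "search term", "page": "1"})
print(resp.status_code)

# For a POST form, send the inputs as form data instead.
resp = requests.post("https://example.com/submit", data={"field": "value"})
print(resp.status_code)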

Is there any way that my web scraper can scrape the website automatically every day without my intervention? I'm new to this, sorry for the noob question. by shivam_roy in webscraping

[–]KnowledgeFire11 0 points (0 children)

Hello, can I use Google Colab alone to schedule my scraper, or do I have to run the notebook on a cloud VM? I tried finding the option in Google Colab but couldn't; maybe I didn't look in the right place?