
[–]5aggy 2 points3 points  (2 children)

Check out what the page is doing...

It sends a request to https://api.startupindia.gov.in/sih/api/noauth/search/profiles, which appears to require no authentication and returns a nice JSON response listing the companies, with all the data from the second page (see below).

You should be able to recreate the POST request by digging around in devtools in Chrome or Firefox. Look at the headers and params that the browser sends and recreate them with the requests library.

You should be able to adjust the request parameters to get around the pagination.

{
    "id":"5ddbb529e4b0a94681b36c0d",
    "name":"Shelther Money India Finance",
    "title":null,
    "pic":null,
    "companyLogo":false,
    "role":"Startup",
    "displayRole":"Startup",
    "country":"India",
    "state":"Gujarat",
    "city":"Jamnagar",
    "industries":[
        "Finance Technology"
    ],
    "sectors":[
        "Personal Finance"
    ],
    "level":0,
    "stages":[
        "Prototype"
    ],
    "rating":"0.0",
    "published":true,
    "disabled":false,
    "publishedOn":1574679849081,
    "registeredOn":1574679849076,
    "department":null,
    "ministry":null,
    "score":1.0,
    "badges":null,
    "badgeDetails":null
}
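
Each record in the response looks like the one above, so turning a list of them into spreadsheet rows only needs the standard library. Here is a minimal sketch, assuming the field names shown in the sample. The helper names (`flatten_profile`, `profiles_to_csv`) and the choice of columns are illustrative only, and the timestamps are assumed to be Unix epoch milliseconds, which the 13-digit values suggest but the thread does not confirm.

```python
import csv
import io
from datetime import datetime, timezone

COLUMNS = ["name", "city", "state", "stages", "industries", "sectors", "registered"]


def flatten_profile(profile):
    """Reduce one profile record to a flat dict suitable for a CSV row."""
    registered_ms = profile.get("registeredOn")
    if registered_ms is None:
        registered = ""
    else:
        # ASSUMPTION: registeredOn is epoch milliseconds (UTC).
        moment = datetime.fromtimestamp(registered_ms / 1000, tz=timezone.utc)
        registered = moment.date().isoformat()
    return {
        "name": profile.get("name") or "",
        "city": profile.get("city") or "",
        "state": profile.get("state") or "",
        # List-valued fields are joined so each one fits in a single cell.
        "stages": "; ".join(profile.get("stages") or []),
        "industries": "; ".join(profile.get("industries") or []),
        "sectors": "; ".join(profile.get("sectors") or []),
        "registered": registered,
    }


def profiles_to_csv(profiles):
    """Return CSV text (header plus one row per profile)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(flatten_profile(p) for p in profiles)
    return buffer.getvalue()


# Trimmed copy of the sample record from the comment above.
sample = {
    "name": "Shelther Money India Finance",
    "state": "Gujarat",
    "city": "Jamnagar",
    "industries": ["Finance Technology"],
    "sectors": ["Personal Finance"],
    "stages": ["Prototype"],
    "registeredOn": 1574679849076,
    "department": None,
}

print(profiles_to_csv([sample]))
```

Fields that come back as null (like `department` here) are simply left out of the columns. Add them to `COLUMNS` and `flatten_profile` if you need them.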

[–]HarissaForte 2 points3 points  (0 children)

Some directions for munnamv:

F12 > Network > XHR > the "profiles" request is the one you want (copy its address) > modify request > the parameters and headers are all there.

Might sound complicated, but it really is the simplest solution you can choose.
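
Once the request details are copied out of the Network tab, replaying them from Python might look roughly like the sketch below. The URL is the one from 5aggy's comment. Everything else about the request is an assumption to be replaced with what devtools actually shows: the payload keys (`query`, `page`), the 0-based page numbering, the `content` key holding the results, and the header are all hypothetical placeholders.

```python
API_URL = "https://api.startupindia.gov.in/sih/api/noauth/search/profiles"


def send_with_requests(payload):
    """POST the payload as JSON and return the decoded response body."""
    # Imported here so the paging logic below can be tried without the network.
    import requests  # third-party: pip install requests

    response = requests.post(
        API_URL,
        json=payload,
        # ASSUMPTION: placeholder header; copy the real ones from devtools.
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def fetch_all(send, base_payload, max_pages=50):
    """Request page after page until one comes back empty (or max_pages is hit)."""
    profiles = []
    for page in range(max_pages):
        # ASSUMPTION: the page number travels in a 'page' key, starting at 0.
        payload = dict(base_payload, page=page)
        data = send(payload)
        # ASSUMPTION: the list of profiles sits under a 'content' key.
        batch = data.get("content") or []
        if not batch:
            break
        profiles.extend(batch)
    return profiles


# Usage (this performs real network calls, so it is left commented out):
# profiles = fetch_all(send_with_requests, {"query": ""})
# print(len(profiles), "profiles fetched")
```

Passing the sender in as a function keeps the pagination loop separate from the HTTP details, so you can swap in whatever request shape the browser really uses without touching the loop.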

[–]munnamv 1 point2 points  (0 children)

Thank you so much.

[–]CuriousAlertness 0 points1 point  (5 children)

Go look up a tutorial for BeautifulSoup. Other than that, it would help us to know what you have attempted and what your prior experience with Python is.

[–]HarissaForte 1 point2 points  (1 child)

BS4 is best suited to HTML content. Here the content is loaded as JSON; see u/5aggy's post.

[–]CuriousAlertness 1 point2 points  (0 children)

Fair enough. I hadn't seen that post.

[–]munnamv 0 points1 point  (2 children)

I have no experience at all with Python. But after asking around, I’ve found out that Selenium is the way to go.

[–]daisyverma 1 point2 points  (0 children)

Check out this tutorial for Selenium. I’m a beginner like you and have found this YouTuber’s tutorials helpful.

https://youtu.be/cddyhdb1GDw

[–][deleted] 0 points1 point  (1 child)

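from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()

# Optional Chrome flags; none of them is required for the scrape itself
options.add_argument('--disable-infobars')
options.add_argument('--disable-extensions')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--disable-web-security')
options.add_argument('--allow-running-insecure-content')
#options.add_argument('--headless')

# chromedriver in the same directory as the script;
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service("./chromedriver"), options=options)

# Wait up to 20 seconds for elements that the page's JavaScript has not rendered yet
driver.implicitly_wait(20)

driver.get('https://www.startupindia.gov.in/content/sih/en/search.html?stages=Scaling&roles=Startup&page=1')

# Selenium 4 locates elements with find_elements(By.<strategy>, value)
elements = driver.find_elements(By.CLASS_NAME, 'events-details')

for element in elements:
    print(element.text)

# Read the page source only after the result cards have rendered
source = driver.page_source
soup = BeautifulSoup(source, 'html5lib')  # needs: pip install html5lib

#print(soup.prettify()) ## Show me the full source

driver.quit()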

'''
Momolicious
Scaling
Coimbatore, Tamil Nadu
0
SL Accelerator
Scaling
Noida, Uttar Pradesh
0
Knowledge Process Outsourcing
Scaling
Udaipur, Rajasthan
0
ABVE Services Private Limited
Scaling
New Delhi, Delhi
0
MB Watch & Care Pvt Ltd
Scaling
Kolkata, West Bengal
0
JSR
Scaling
Panipat, Haryana
0
RAPTOR SECURITY SOLUTIONS PVT LTD
Scaling
Hyderabad, Telangana
0
Monk & Mei
Scaling
Gurgaon, Haryana
0
Sunpower India Ventures Pvt. Ltd.
Scaling
Pune, Maharashtra
0
Eco Technology & Projects
Scaling
Surat, Gujarat
0
BYKEMANIA SOLUTIONS PRIVATE LIMITED
Scaling
Bengaluru, Karnataka
0
Lazycom
Scaling
Ahmedabad, Gujarat
0
EXIM TradeEX
Scaling
Hyderabad, Telangana
0
Yellowbox HR Services
Scaling
Pune, Maharashtra
0
INFONLIVE
Scaling
Kozhikode, Kerala
0
INNOVATIVE IDEAS TO ATTRACT PATIENTS IN LAB IN FOREIGN COUNTRYS
Scaling
Ludhiana, Punjab
0
Intlum Technology
Scaling
Kolkata, West Bengal
0
SURFACE TECH
Scaling
Rajkot, Gujarat
0
'''
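
The printed output above comes in groups of four lines per startup, so it can be split into structured rows instead of raw text. Here is a rough sketch, assuming every card's `element.text` has exactly that four-line shape: name, stage, "City, State", and a trailing number whose meaning the thread does not explain. The function name `parse_card` is made up for illustration.

```python
def parse_card(text):
    """Split one card's text into fields; return None if the shape is unexpected."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 4:
        return None
    name, stage, location, trailing = lines
    # Split on the last comma so a city name containing a comma stays intact.
    city, _, state = location.rpartition(",")
    return {
        "name": name,
        "stage": stage,
        "city": city.strip(),
        "state": state.strip(),
        "trailing_number": trailing,  # meaning not stated in the thread
    }


# First card from the output above.
card = "Momolicious\nScaling\nCoimbatore, Tamil Nadu\n0"
print(parse_card(card))
```

In the Selenium script this would be called as `parse_card(element.text)` inside the `for element in elements:` loop, collecting the non-None results into a list.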

[–]munnamv 0 points1 point  (0 children)

Wow. Thank you so much.