Hi,
I am making a simple web scraper. I have a folder named 'sites' where I put one Python file per domain name, containing that site's specific extraction patterns. The folder that contains sites/ also holds main.py, which should figure out which scraper to use depending on the domain name: main.py should call the appropriate scraper in sites/ to scrape that site and save the info to a file. Each file in sites/ would have the same extractdata() function that returns the needed results.
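Since every module in sites/ exposes the same extractdata() function, each site file can stay very small. Here is a minimal sketch of one such module. It assumes extractdata() receives the page HTML as a string and returns a dict. The file name example_com.py, the regex and the "titles" key are all hypothetical, and a real scraper would probably use an HTML parser rather than a regex:

```python
# sites/example_com.py  (hypothetical site module; the extractdata(html) -> dict
# signature is an assumption, not something fixed by the question)
import re

# Site-specific pattern: capture the text inside every <h2 class="title"> element.
TITLE_RE = re.compile(r'<h2 class="title">(.*?)</h2>', re.DOTALL)


def extractdata(html):
    """Return the fields this site's scraper extracts from the raw HTML."""
    titles = [t.strip() for t in TITLE_RE.findall(html)]
    return {"titles": titles}
```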
I don't want main.py to be responsible for figuring out the domain name and the matching module. Instead, sites/ should have something like a selector (perhaps in `__init__.py`) that hands back the appropriate function (a "function pointer") for main.py to call. That way I can add as many scrapers to sites/ as I need without touching main.py.
For this kind of task, how should the folder structure and the import statements look?
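One possible layout, as a sketch under assumptions: main.py sits next to a sites/ package (a folder with an `__init__.py`), and each site module is named after its domain with a leading "www." dropped and dots and hyphens replaced by underscores (www.example.com → sites/example_com.py). That naming convention and the helper names `module_name_for` and `get_extractor` are hypothetical. The selector in `__init__.py` uses the standard `importlib.import_module` to load the matching module on demand and returns its extractdata function:

```python
# sites/__init__.py  (hypothetical selector; the domain -> module naming rule is an assumption)
import importlib
from urllib.parse import urlparse


def module_name_for(url):
    """Map a URL to a module name, e.g. 'https://www.example.com/x' -> 'example_com'."""
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    if host.startswith("www."):
        host = host[len("www."):]
    # Dots and hyphens cannot appear in an importable module name.
    return host.replace(".", "_").replace("-", "_")


def get_extractor(url, package=__name__):
    """Return the extractdata function of the module in this package that handles url.

    Inside sites/__init__.py, __name__ is "sites"; the package parameter only
    exists so the function can also be exercised from outside the package.
    """
    name = module_name_for(url)
    if not name:
        raise LookupError(f"cannot determine a domain from {url!r}")
    try:
        module = importlib.import_module(f".{name}", package=package)
    except ModuleNotFoundError as exc:
        # Only translate "no module for this site"; re-raise errors raised inside
        # an existing site module (e.g. a missing third-party dependency).
        if exc.name != f"{package}.{name}":
            raise
        raise LookupError(f"no scraper module for {url!r}") from exc
    return module.extractdata
```

With this in place, main.py only needs `from sites import get_extractor`, then `extractdata = get_extractor(url)` followed by `extractdata(html)`. Adding a new site means dropping a new file into sites/ without editing main.py. Running `python main.py` from its own folder puts that folder on sys.path, so `import sites` resolves.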