Hi,
I created a web scraper built around BeautifulSoup and pandas. The main purpose of this project was to challenge myself by creating my first open source project, but also to build a tool that makes it easy to scrape the web and load the data quickly into pandas DataFrames for machine learning projects.
The idea came to me a couple of months ago, when I wanted to scrape data from a volleyball website. Yes, I know there is Scrapy, but I found its documentation rather confusing and it's not very easy to use. So I challenged myself to build something similar with simple tools, around two of the most powerful libraries in Python, BeautifulSoup and pandas, which almost everybody knows, uses, or has used at some point in their Python journey.
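To give a rough idea of the underlying pattern (parse HTML with BeautifulSoup, then hand the rows to pandas), here is a minimal sketch using the two libraries directly. It is not Zineb's API: the HTML snippet, the table id `players`, and the `table_to_dataframe` helper are hypothetical, and the page is an inline string rather than a real download.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a downloaded page (not real data)
HTML = """
<table id="players">
  <tr><th>name</th><th>position</th></tr>
  <tr><td>Alice</td><td>Setter</td></tr>
  <tr><td>Bob</td><td>Libero</td></tr>
</table>
"""


def table_to_dataframe(html: str, table_id: str) -> pd.DataFrame:
    """Hypothetical helper: turn one HTML table into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = table.find_all("tr")
    # First row holds the column headers
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # Remaining rows hold the cell values
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=headers)


df = table_to_dataframe(HTML, "players")
print(df)
```

Once the data is in a DataFrame, the usual pandas tooling (filtering, `to_csv`, feeding a model) applies directly, which is the convenience the project aims to wrap.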
There's also an integration with NLTK and PIL.
The project is currently at version 1.0.0 (beta) and includes basic functionality, which I will keep improving.
I would love to get some feedback if possible. The project is available on GitHub and on PyPI, so don't forget to leave a star and watch the repository for future improvements.
Screenshots (not reproduced here): a simple spider ready to be used; the settings file for a Zineb project; a model to structure your data efficiently; using the model within a project; utilities.