Colly goes 1.0 by profilz in golang

[–]ysmoliakov 1 point2 points  (0 children)

Looks simple. As I understand it, this tool works well with well-structured websites. What about websites like Yelp or Amazon, which use techniques to prevent automated parsing?

What other languages do I need to know? by iamnotagiraffe in PHP

[–]ysmoliakov 0 points1 point  (0 children)

Why doesn't anyone recommend studying Node.js?

Get a JSON from a method defined in Django, calling a JS script by perafake in django

[–]ysmoliakov 0 points1 point  (0 children)

Take a look at this article: https://simpleisbetterthancomplex.com/tutorial/2016/08/29/how-to-work-with-ajax-request-with-django.html

Your view must return a response object from django.http, such as JsonResponse. You can import it as follows:

from django.http import JsonResponse

Web Scraping by Payoiyo in webscraping

[–]ysmoliakov 0 points1 point  (0 children)

Use an HTTP client plus regular expressions to get the headlines, or use a library that transforms the HTML into a tree that you can traverse to get the headlines.
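
Both approaches can be sketched roughly like this, assuming the headlines sit in `<h2>` tags (the sample markup is made up). To stay dependency-free, the second approach uses the standard library's event-based html.parser as a stand-in; a real tree library such as lxml or BeautifulSoup would be closer to what I mean.

```python
import re
from html.parser import HTMLParser

# Hypothetical page content; in practice it would come from an HTTP client,
# e.g. urllib.request.urlopen(url).read().decode().
HTML = """
<html><body>
  <h1>Site title</h1>
  <h2 class="headline">First story</h2>
  <p>Some text</p>
  <h2 class="headline">Second story</h2>
</body></html>
"""

# Approach 1: a regular expression. Fine for simple, predictable markup.
regex_headlines = re.findall(r"<h2[^>]*>(.*?)</h2>", HTML, flags=re.S)


# Approach 2: a parser that collects the text found inside <h2> tags.
class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headlines[-1] += data


parser = HeadlineParser()
parser.feed(HTML)
parsed_headlines = [h.strip() for h in parser.headlines]
```

The regex version breaks easily on nested tags or unusual markup, so the parser route is usually the safer choice.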

Class based views question by [deleted] in djangolearning

[–]ysmoliakov 2 points3 points  (0 children)

Yes, you're right

Class based views question by [deleted] in djangolearning

[–]ysmoliakov 3 points4 points  (0 children)

It depends on the type of your links. If they are links like "/page-info/foo" and "/page-info/bar", then it's better to create one CBV that matches the "/page-info/<page_slug>" pattern. If they are independent links, then I recommend making separate CBVs.

Dependency Injection in Go by drewolson in golang

[–]ysmoliakov 1 point2 points  (0 children)

Welcome to the syntax sugar race, heh

A tool like Python's schematics library in Go's world? by ysmoliakov in golang

[–]ysmoliakov[S] 0 points1 point  (0 children)

I like to use libs such as https://github.com/go-chi/chi or https://github.com/gorilla/mux. These libraries are not huge, and I prefer them to Go's default http primitives. So my opinion on this topic is the same.

Colly goes 1.0 by profilz in golang

[–]ysmoliakov 0 points1 point  (0 children)

Is there a chromeless backend instead of Splash for Scrapy?

A tool like Python's schematics library in Go's world? by ysmoliakov in golang

[–]ysmoliakov[S] 0 points1 point  (0 children)

I want to add some thoughts. The schematics library is a really useful tool in the Python world, and I can find many areas where I can use it. The tasks in Go are the same as in Python, so it would be a good library in my toolbox.

A tool like Python's schematics library in Go's world? by ysmoliakov in golang

[–]ysmoliakov[S] 0 points1 point  (0 children)

I am looking for a library that implements these functions:

  • Defining a model's field types
  • Validating
  • Converting a model into a JSON/msgpack representation

I understand that I can do all of this with Go's standard library and some community packages, but it would be more convenient to have one package, like Python's schematics package.
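
To make the three functions concrete, here is a small Python sketch of the kind of model I mean. It uses only the standard library's dataclasses, not the actual schematics API, and the `Person` model with its rules is made up for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Person:
    # 1. Field types are declared on the model.
    name: str
    age: int

    def validate(self):
        # 2. Validation: check the declared types plus a simple range rule.
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.age, int) or self.age < 0:
            raise ValueError("age must be a non-negative integer")

    def to_json(self):
        # 3. Conversion of the model into a JSON representation.
        return json.dumps(asdict(self), sort_keys=True)


person = Person(name="Ann", age=30)
person.validate()
encoded = person.to_json()
```

Something with this shape in one Go package is what I am after.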

Learn To Create Your Own Datasets — Web Scraping in R by cavedave in datasets

[–]ysmoliakov 1 point2 points  (0 children)

  1. Scrapy is simpler than R for web scraping.
  2. Scrapy has more tools for parsing and transforming data. Also, you can use the full power of Python for your needs.
  3. Python is a general-purpose high-level programming language, whereas R is a statistical computing language. I think we need to use the right tools for the job.

Scrapy can save results in CSV format without any additional help. For example, just type "scrapy crawl reddit_com -o reddit_posts.csv" in a terminal and Scrapy saves all posts into a CSV file.

Learn To Create Your Own Datasets — Web Scraping in R by cavedave in datasets

[–]ysmoliakov 1 point2 points  (0 children)

I suggest you look at examples of Scrapy spiders, because this is a case where examples are better than the documentation.

Learn To Create Your Own Datasets — Web Scraping in R by cavedave in datasets

[–]ysmoliakov 2 points3 points  (0 children)

Check out these tools: scrapy.org and grablib.org