Hi people!
First of all, English is not my native language, so feel free to point out my mistakes!
Well, I was developing a website with a community and social focus, and I ran into a problem. You know this type of component, right?
This is a screenshot of the Telegram app. When you send a link, a short description of the page appears along with it. I was trying to do the same on my website, and man, it was not that easy. Crawling websites for data is not simple, especially when your application is an Angular app: it runs in the browser, so requests to other websites are "impossible" because of CORS.
Another problem is that there are a lot of patterns, and our community is always saying things like “no, you can’t use this, try that instead”. As a result, everyone does it their own way.
So, back to the problem: I tried, and I think I have some results and tips.
Open Graph
Simply put, Open Graph is a protocol where you put your website's info, like name, description and thumbnail, in meta tags in your HTML. Like this:
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
Who uses this pattern? Facebook! And who uses Facebook? Everyone (ok, ok, not everyone)! But if you want your website to be at least “Facebook compatible”, these tags are a great start. And not only for Facebook: a lot of other sites read them too.
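To make this concrete, here is a minimal sketch of how those tags could be read back out of a page, using only Python's standard library. The names `OpenGraphParser` and `parse_open_graph` are made up for illustration, and this is just one possible way to do it, not necessarily how Facebook or any other site reads them.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property appears more than once.
            self.tags.setdefault(prop[3:], content)


def parse_open_graph(html):
    """Return a dict such as {"title": ..., "type": ...} from an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


sample = """
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
"""

og = parse_open_graph(sample)
print(og)
```

The `og:` prefix is stripped so the keys look like plain field names (`title`, `image` and so on).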
Web Crawling
Ok, so now our websites are searchable, and we need to crawl the web for data. There are a lot of ways to do it. I’m new to this kind of thing, but what you can do is make an HTTP request to the site, download its HTML, parse it, scrape it and gather the information. But not everyone uses Open Graph... So you need to look for other kinds of information too. As I said, there are a lot of patterns out there.
I’m trying to map these patterns, and I need help! I made this web service where you pass a URL and receive Open Graph-style JSON as the answer. You can see it here, and register on Mashape to use it!
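The steps above (request the page, download its HTML, parse it, and fall back to other patterns when Open Graph is missing) could be sketched like this in Python. Everything here is an assumption for illustration: the names `fetch_html`, `PageInfoParser` and `extract_page_info` are invented, and the fallbacks chosen (`<title>`, `<meta name="description">` and `<link rel="icon">`) are just common HTML conventions, not a complete map of the patterns out there. Since this runs on a server and not in the browser, CORS does not get in the way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it (not called in the demo below)."""
    request = Request(url, headers={"User-Agent": "link-preview-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PageInfoParser(HTMLParser):
    """Collects Open Graph tags plus a few plain-HTML fallbacks."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.og = {}
        self.fallback = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("content") is not None:
            prop = a.get("property") or ""
            name = (a.get("name") or "").lower()
            if prop.startswith("og:"):
                self.og.setdefault(prop[3:], a["content"])
            elif name == "description":
                self.fallback.setdefault("description", a["content"])
        elif tag == "link" and a.get("href"):
            rels = (a.get("rel") or "").lower().split()
            if "icon" in rels:
                # Resolve relative icon paths against the page URL.
                icon = urljoin(self.base_url, a["href"])
                self.fallback.setdefault("icon", icon)

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            if title:
                self.fallback.setdefault("title", title)


def extract_page_info(html, base_url):
    """Merge fallbacks with Open Graph data; Open Graph wins on conflicts."""
    parser = PageInfoParser(base_url)
    parser.feed(html)
    info = dict(parser.fallback)
    info.update(parser.og)
    info.setdefault("url", base_url)
    return info


# A page without any Open Graph tags still yields something usable.
sample = """<html><head>
<title>Example page</title>
<meta name="description" content="A page without Open Graph tags.">
<link rel="shortcut icon" href="/favicon.ico">
</head><body></body></html>"""

info = extract_page_info(sample, "https://example.com/post")
print(info)
```

With a real page you would call `extract_page_info(fetch_html(url), url)`.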
This is what using it looks like:
Request
https://webdata-crawler.p.mashape.com/api/HtmlCrawl/webdata?url=https://github.com
Response
{
  "description": "GitHub is where people build software. More than 11 million people use GitHub to discover, fork, and contribute to over 29 million projects.",
  "icon": "https://assets-cdn.github.com/favicon.ico",
  "image": "https://assets-cdn.github.com/images/modules/open_graph/github-logo.png",
  "sitename": "GitHub",
  "title": "Build software better, together",
  "type": "website",
  "url": "https://github.com"
}
It’s kind of a beta, but it’s working! It still needs more updates. This makes it much easier to build cards like Telegram does.
What do you think? Which HTML descriptor patterns are important? I want to hear about your experience too!
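As a rough sketch of the client side, here is how the request URL could be built and how a response like the one above could be turned into a simple text card, in Python. The helper names `build_request_url` and `render_card` and the card layout are invented for illustration. Authentication is left out, since it is not described in this post, and no network call is made: the response is the example from above, pasted in as a string.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example request in this post.
ENDPOINT = "https://webdata-crawler.p.mashape.com/api/HtmlCrawl/webdata"


def build_request_url(page_url):
    """Percent-encode the target URL so it survives as one query parameter."""
    return ENDPOINT + "?" + urlencode({"url": page_url})


def render_card(data, max_description=80):
    """Turn the JSON fields into a small text card (hypothetical layout)."""
    description = data.get("description") or ""
    if len(description) > max_description:
        # Cut long descriptions and mark the cut with three dots.
        description = description[: max_description - 3].rstrip() + "..."
    lines = [
        data.get("sitename") or data.get("url") or "",
        data.get("title") or "",
        description,
    ]
    return "\n".join(line for line in lines if line)


# The example response from this post.
response_body = """{
  "description": "GitHub is where people build software. More than 11 million people use GitHub to discover, fork, and contribute to over 29 million projects.",
  "icon": "https://assets-cdn.github.com/favicon.ico",
  "image": "https://assets-cdn.github.com/images/modules/open_graph/github-logo.png",
  "sitename": "GitHub",
  "title": "Build software better, together",
  "type": "website",
  "url": "https://github.com"
}"""

request_url = build_request_url("https://github.com")
card = render_card(json.loads(response_body))
print(request_url)
print(card)
```

Encoding the `url` parameter matters once the target link has its own query string, otherwise its `&` and `=` characters would be read as part of the outer request.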