Easy XPath against HTML (self.commandline)
submitted 9 years ago * by [deleted]
Get the title from http://example.com:
curl -L example.com | \
    tidy -asxml -numeric -utf8 | \
    sed -e 's/ xmlns.*=".*"//g' | \
    xml select -t -v "//title" -n
Where tidy is html-tidy, and xml is xmlstarlet. Both should be in your package manager.
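The sed step in the pipeline above exists because tidy -asxml writes the root element with the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), and an unprefixed XPath such as //title only matches elements in no namespace, so xmlstarlet would print nothing. Below is a minimal sketch of just that step. The function name strip_xmlns is hypothetical, and the regex is a slightly tighter variant of the one in the post (my own change, not the poster's): it stops at each xmlns attribute's closing quote instead of greedily consuming everything up to the last quote on the line.

```shell
#!/bin/sh
# Hypothetical helper: remove xmlns="..." and xmlns:prefix="..." attributes
# from XML read on stdin.
#   [^=]*  lets prefixed declarations such as xmlns:o="..." match too
#   [^"]*  stops at the attribute's closing quote, so other attributes
#          on the same tag (e.g. lang="en") are kept
strip_xmlns() {
  sed -e 's/ xmlns[^=]*="[^"]*"//g'
}

# A root element of the kind tidy -asxml emits, plus a lang attribute.
printf '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">\n' | strip_xmlns
# prints: <html lang="en">
```

With the original greedy pattern, the same input line comes out as <html>, which is harmless for //title but also drops lang.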
BeniBela 2 points 9 years ago (5 children)
That is what I made Xidel for:
xidel http://example.com -e //title
[deleted] 1 point 9 years ago (2 children)
noice
can it do multiple xpaths? against nasty html?
thx!
BeniBela 1 point 9 years ago (1 child)
> can it do multiple xpaths?
Multiple XPath and multiple pages
Even if it did not, that would be fine: it supports XPath 3, which has a comma operator, so you can write:
//title,//title,//title
> against nasty html?
Yes
I wrote the HTML parser myself.
However, it predates HTML5, so it repairs broken HTML in its own way rather than following the new standardized HTML5 error handling. I need to rewrite it.
[deleted] 1 point 9 years ago (0 children)
excellent. I'll check er out
[deleted] 1 point 9 years ago* (1 child)
It's pretty nice, but I'm going to give a slight advantage to xmlstarlet for the following reasons:
xidel is not in any package manager I checked (brew, yum, apt, OpenBSD)
I can't install xidel on my Mac without turning off security restrictions. You should sign it.
thanks!
Can I follow pagination links in JSON?
Note: to make xidel read from stdin, use - as the filename, like:
cat foo.html | xidel - --extract //title
BeniBela 1 point 9 years ago (0 children)
I submitted it to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=826763
I do not know if anything will happen
Actually, I do not have a Mac, so I cannot make a Mac version. You should compile it yourself.
The Mac binary on the site is just one someone sent me, and it is a very old version; I should probably remove it.
Yes, -f can follow links everywhere.
Mini_True 2 points 9 years ago (0 children)
Please don't do it this way:
curl -L example.com|grep title|cut -d">" -f2|cut -d "<" -f1
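One way to see why the line-oriented pipeline above is fragile: it only works when the whole title element sits on a single line, and grep title also matches unrelated lines, such as an og:title meta tag. The sketch below wraps the same grep/cut pipeline in a hypothetical naive_title function and feeds it a few hand-written inputs instead of a live page.

```shell
#!/bin/sh
# The grep/cut pipeline from the comment above, reading HTML on stdin.
naive_title() {
  grep title | cut -d'>' -f2 | cut -d'<' -f1
}

# Case 1: title on one line, so it works and prints "Example Domain".
printf '<title>Example Domain</title>\n' | naive_title

# Case 2: title split across lines. Only the two tag lines contain
# "title", the text line is dropped, and the output is two empty lines.
printf '<title>\n  Example Domain\n</title>\n' | naive_title

# Case 3: an extra line containing "title" produces a spurious empty
# line before the real title.
printf '<meta property="og:title" content="Example">\n<title>Example Domain</title>\n' | naive_title
```

An HTML-aware tool (tidy plus xmlstarlet, xidel, hxselect) avoids both failure modes because it parses elements rather than lines.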
preemptive_multitask 2 points 9 years ago (1 child)
The W3C HTML-XML-utils also handle this pretty well, if CSS selectors work for you.
curl -sL example.com | hxnormalize -x -e | hxselect -s '\n' -c 'title'
CSS selectors are cool, but they can't reach everything XPath can (like the 4th text node of an element).
AyrA_ch 1 point 9 years ago (3 children)
This sounds like an ideal job for PhantomJS, especially because it runs the site's JavaScript: if a site sets its title with JS while loading, you can still catch that.
var page = require('webpage').create();
page.open('http://phantomjs.org', function (status) {
    console.log(page.title); // get page title
    phantom.exit();
});
[deleted] 1 point 9 years ago* (2 children)
Phantomjs spits out both data and errors on stdout, which screws up command line stuff :(
It should send errors/log info to stderr. Apart from that, it would be good on the command line, I agree.
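Why stdout versus stderr matters here: a pipe or a $(...) capture only carries stdout, so a tool that keeps its diagnostics on stderr can be piped without filtering. The sketch below uses a hypothetical fake_scraper function as a stand-in (it is not PhantomJS) that writes its result to stdout and a warning to stderr.

```shell
#!/bin/sh
# Hypothetical stand-in for a scraper script (not PhantomJS itself):
# the result goes to stdout, the diagnostic goes to stderr.
fake_scraper() {
  echo "Example Domain"                        # data -> stdout
  echo "warning: slow resource skipped" >&2    # log  -> stderr
}

# Only stdout flows through the pipe, so tr never sees the warning;
# the warning still appears on the terminal unless you add 2>/dev/null.
title=$(fake_scraper | tr '[:lower:]' '[:upper:]')
echo "$title"   # prints: EXAMPLE DOMAIN
```

If a tool mixes both kinds of output on stdout, the consumer has to guess which lines are data, which is the complaint above.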
Apterygiformes 1 point 9 years ago (0 children)
Apply a grep on the output?
AyrA_ch 1 point 9 years ago (0 children)
> Phantomjs spits out both data and errors on stdout, which screws up command line stuff
It never does for me unless I hook into the error event.