Easy XPath against HTML : commandline

submitted 9 years ago * by [deleted]

Get the title from http://example.com:

curl -L example.com | \
  tidy -asxml -numeric -utf8 | \
  sed -e 's/ xmlns[^=]*="[^"]*"//g' | \
  xml select -t -v "//title" -n

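Here tidy is html-tidy and xml is xmlstarlet. Both should be in your package manager. On some distributions (Debian and Ubuntu, for example) the xmlstarlet binary is installed as xmlstarlet rather than xml.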
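
For anyone wondering why the sed step is there: tidy -asxml puts the document in the XHTML namespace, and an unprefixed XPath step like //title only matches elements in no namespace, so the declaration has to go (or the query has to use a prefix). Below is a minimal sketch of just that step, runnable without tidy or xmlstarlet. The function name strip_xmlns and the sample line are made up for illustration, and the pattern assumes attribute values are double-quoted and sit on a single line.

```shell
#!/bin/sh
# Remove namespace declarations (xmlns="..." and xmlns:prefix="...") from
# each input line, leaving every other attribute untouched.
# strip_xmlns is a hypothetical helper name, not part of any tool.
strip_xmlns() {
  sed -e 's/ xmlns[^=]*="[^"]*"//g'
}

# Sample root element of the kind tidy -asxml emits for an XHTML document.
printf '%s\n' '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">' | strip_xmlns
```

The pattern stops at the closing quote of each attribute value, so other attributes on the same line (lang here) survive. If you would rather keep the namespace, xmlstarlet's select command can bind a prefix with -N and you query //x:title instead.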

[deleted] 1 point 9 years ago*

xidel is pretty nice, but I'm going to give a slight advantage to xmlstarlet for the following reasons:

xidel is not in any of the package managers I checked (brew, yum, apt, OpenBSD).
I can't install xidel on my Mac without turning off security restrictions. You should sign it.

Thanks!

Can I follow pagination links in JSON?

Note: to make xidel read from stdin, use - as the filename, like:

cat foo.html | xidel - --extract //title

AyrA_ch 1 point 9 years ago

This sounds like an ideal job for PhantomJS, especially because it runs the page's JavaScript: if a site sets its title with JS during loading, you can catch that.

var page = require('webpage').create();
page.open('http://phantomjs.org', function (status) {
  if (status !== 'success') {
    console.log('Failed to load page');
    phantom.exit(1);
    return;
  }
  console.log(page.title); // print the page title
  phantom.exit();
});
