A compelling alternative to Hpricot
Posted by Jeremy Voorhis Tue, 10 Apr 2007 15:52:00 GMT
After re-reading Nat Pryce’s Scrapheap Challenge writeup, I tried to see how easily I could answer a simple question with only lynx, grep and friends. It turns out to be even simpler than I suspected. For example, the following tells me that my average blog post receives 2.2 comments.
# iterate through pages 1 to 17
page=1; until [ "$page" -gt 17 ] ; do
  # dump each page as text, keep the "N comments" lines,
  # strip the [link] markers, turn "no" into 0, keep the count
  lynx -dump "http://www.jvoorhis.com/articles/page/$page" | \
  egrep "([[:digit:]]+|no) comments?" | \
  sed -e "s/\[.*\]//g" -e "s/no/0/g" | \
  awk '{print $1}' >> comments.txt
  page=$(( page + 1 ))
done
avg comments.txt

“How could anything be better than Hpricot?!!”
When I read the blog title, I was a little scared, because I’ve grown to like Hpricot a lot. The thought of Hpricot being marginalized made me feel almost… threatened. Irrational? Perhaps. But I am only human.
With that said, I must say that that is some fine shell scripting you have there. It also looks very Ruby-esque.
Thanks! But it’s not as Ruby-esque as I would like, since until, while and for were the only iterators I had to play with.
Although this is a trivial example, I think the following ideas let me bang out a quick implementation:
lynx is a browser, but I wanted a tool that would download a document, parse it into semi-structured text, and get out of the way.
These are all good ideas that can be put to work while composing throw-away solutions for prototyping or problem solving.
Jeremy, what shell did you use?
I second the “Ruby-esque” comment :) I did a double take lol!
I use zsh, but the above should run just fine in anything sh-compatible.
xargs is my secret iterating weapon.
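As a rough sketch of how xargs could stand in for the until loop in the post: generate the page numbers, then let xargs substitute each one into a command. The function name page_urls is hypothetical, seq is assumed to be installed (it is not part of POSIX), and the URL pattern is the one from the post.

```shell
# page_urls N: print the article-listing URL for pages 1..N, one per line.
# Hypothetical helper; the URL pattern comes from the script in the post.
page_urls() {
  # xargs -I{} runs echo once per input line, replacing {} with the page number
  seq 1 "$1" | xargs -I{} echo "http://www.jvoorhis.com/articles/page/{}"
}

# Possible use (needs lynx and network access): one lynx run per URL.
# page_urls 17 | xargs -n 1 lynx -dump
```

Keeping the URL generation separate from the fetch makes it easy to check the list before hitting the network.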
I see, “avg” isn’t a builtin in zsh, is it? Mine on OS X doesn’t seem to have it.
@amr
I wondered if anyone would catch that ;)
avg is this thing that I wrote a while ago that reads lists of numbers and, well, averages them. This is all:
That is not the only utility of its kind I have. I am surprised there aren’t more command line tools for statistical processing.
I couldn’t find avg, either, so I decided to implement it as a shell function:
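The function itself is not shown here. One way such a function might look, as a minimal sketch: it assumes one number per line, reads the named files or standard input, and leans on awk for the arithmetic. The awk approach and the one-decimal output format are assumptions, not necessarily what either avg mentioned in this thread does.

```shell
# avg [FILE...]: print the mean of the first field of each non-blank line.
# Illustrative stand-in only; not the original avg utility.
avg() {
  awk '
    NF { sum += $1; n++ }            # skip blank lines, accumulate the rest
    END {
      if (n) printf "%.1f\n", sum / n
      else exit 1                    # no numbers read: fail without printing
    }
  ' "$@"
}
```

With this defined, the last line of the script in the post, avg comments.txt, works unchanged, and the function can also sit at the end of a pipeline.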
Heh :) Actually, just that very day I was doing a one-liner in ksh & friends on HP-UX to sum up the disk sizes on one of our boxes, and I had to use awk variables to sum the sizes up in the pipeline. I checked out bc quickly but couldn’t find anything that would sum/avg a stream of numbers in a pipeline.
I’d really love to see a bc -avg -i capability (or something like that); please point me to one if someone knows it. I looked up and down that man page but was too hurried to futz around too much.
Spoke too soon! I think this would do the average thingy:
expr `cat comments.txt | (tr ’\n’ +; echo 0) | bc` / `cat comments.txt | wc -l `
expr `cat input.txt | (tr '\n' +; echo 0) | bc` / `cat input.txt | wc -l `
I think my prev comment was eaten by textile.
Beware of Randal Schwartz. He’s prone to handing out Useless Use of Cat Awards.