I recently rewrote my girlfriend’s personal site to list recent LiveJournal entries, recent photos on Flickr and such.
LiveJournal offers various data feeds, but only if you’re a paying customer. These feeds seem to include the “cut” post body text – that is, any LJ cuts are shown as such; messages are not shown in full. This is what I want.
But again, this is just for paying customers. Also, you just get a blob of HTML and would need to parse it to do things like change the date format. Furthermore, I wanted to show the number of comments on each entry, to make the page feel more dynamic.
I ended up retrieving most of the data from an RSS feed, and then parsing the things it does not provide – the cut post body text and comment counts – from HTML using Hpricot.
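As a small illustration of why the feed is nicer to work with than an HTML blob, here is a sketch of the two conversions involved: reformatting a feed date and pulling a comment count out of link text. The helper names and the sample strings are made up for this example; only `Time.parse` and the `scan(/\d+/)` trick come from the class itself.

```ruby
require "time"

# Hypothetical helper: turn an RSS pubDate string into a custom date format.
def format_pub_date(pub_date, format = "%Y-%m-%d %H:%M")
  Time.parse(pub_date).utc.strftime(format)
end

# Hypothetical helper: extract the first number from text such as
# "12 comments". Text without any digits yields 0, because nil.to_i is 0.
def comment_count(text)
  text.scan(/\d+/).first.to_i
end

puts format_pub_date("Sat, 05 May 2007 18:30:00 GMT")
puts comment_count("12 comments")
```

Once the date is a `Time` object, any format is one `strftime` call away, which is the part that is awkward to do against pre-rendered HTML.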
The page is Ruby CGI running on Dreamhost.
The LiveJournal class
This class is sadly under-featured for its name, but it does show recent entries.
It caches stuff and sends an informative user agent to comply with LiveJournal’s rules for bots. Be sure to change that user agent.
The constants in the code should properly be arguments passed into the class, but I couldn’t be bothered.
# Make Time.now do Central European Time
ENV['TZ'] = 'CET'

%w{rubygems hpricot open-uri rexml/document time}.each { |lib| require lib }

class LiveJournal

  USER_AGENT = "http://mysite.example.com; me@mysite.example.com"
  STALE_IN_MINUTES = 3
  CACHE_FILE = "lj-cache.txt"

  def initialize(username)
    @username = username
  end

  def recent_entries(options = {:max => 2})
    age = if File.exist?(CACHE_FILE)
      (Time.now - File.mtime(CACHE_FILE)) / 60
    else
      STALE_IN_MINUTES
    end
    retrieve(options) unless age < STALE_IN_MINUTES
    entries = Marshal.load(File.read(CACHE_FILE))
    if entries.size < options[:max]
      retrieve(options)
    else
      entries
    end
  end

private

  def blog() "http://#{@username}.livejournal.com/" end
  def feed() "#{blog}data/rss" end

  # By the rules in http://www.livejournal.com/bots/.
  def friendly_open(url)
    open(url, {"User-Agent" => USER_AGENT})
  end

  def retrieve(options)
    # Get the abbreviated (LJ-cut) bodies
    doc = Hpricot(friendly_open(blog))
    bodies = doc.search(%{//table[@class="entrybox"]/tr/td/table/tr[2]/td}).map { |entry| entry.inner_html.strip }
    comment_counts = doc.search(%{//table[@class="entrybox"]//td[@class="comments"][1]}).map { |td| td.inner_text.scan(/\d+/).first.to_i }

    # Get metadata and create post representations
    posts = []
    xml = REXML::Document.new(friendly_open(feed).read)
    xml.root.elements.to_a("channel/item").each_with_index do |item, index|
      break if index == bodies.size or options[:max] == index
      posts << {
        :link          => item.elements["link"].text,
        :title         => (item.elements["title"].text rescue nil),
        :date          => Time.parse(item.elements["pubDate"].text),
        :body          => bodies[index],
        :comment_link  => (item.elements["comments"].text rescue nil),
        :comment_count => comment_counts[index]
      }
    end

    # Write to cache
    File.open(CACHE_FILE, 'w') { |f| f.print Marshal.dump(posts) }
    posts
  end

end
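For anyone who can be bothered, here is a rough sketch of how the constants could become constructor arguments instead. The class name `ConfigurableLiveJournal`, the `DEFAULTS` hash and the `stale?` helper are all hypothetical and not part of the code above; the sketch only covers configuration and the cache-age check, not the fetching.

```ruby
# Hypothetical variant: settings are passed in rather than hard-coded.
class ConfigurableLiveJournal
  DEFAULTS = {
    :user_agent       => "http://mysite.example.com; me@mysite.example.com",
    :stale_in_minutes => 3,
    :cache_file       => "lj-cache.txt"
  }

  attr_reader :username, :user_agent, :stale_in_minutes, :cache_file

  def initialize(username, options = {})
    settings = DEFAULTS.merge(options)
    @username         = username
    @user_agent       = settings[:user_agent]
    @stale_in_minutes = settings[:stale_in_minutes]
    @cache_file       = settings[:cache_file]
  end

  # True when the cache file is missing or at least stale_in_minutes old,
  # which is the same rule recent_entries applies before calling retrieve.
  def stale?(now = Time.now)
    return true unless File.exist?(cache_file)
    (now - File.mtime(cache_file)) / 60 >= stale_in_minutes
  end
end
```

Usage would then look something like `ConfigurableLiveJournal.new("someuser", :stale_in_minutes => 10)`, with anything left out falling back to the defaults.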
Different LiveJournal styles have different markup, so you’ll likely have to change the XPath expressions in the code to fit.
Also note that since it pulls the cut body text and the comment count from your LiveJournal front page, it can only provide that info for those entries that are displayed there. If you want more, you’d have to make it handle pagination.
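A pagination loop could look roughly like the sketch below. It assumes that older entries are reachable through a `?skip=N` query parameter on the journal URL and that every page shows the same number of entries; check both against your own journal. The method name `collect_paginated` is made up, and the actual fetching and parsing is left to the block so the loop stays independent of Hpricot.

```ruby
# Hypothetical helper: collect up to `max` items across several pages.
# The block receives a page URL and must return the array of items found
# on that page; an empty array ends the loop.
def collect_paginated(base_url, max, per_page)
  items = []
  skip = 0
  while items.size < max
    # Assumption: "?skip=N" skips the N most recent entries.
    url = skip.zero? ? base_url : "#{base_url}?skip=#{skip}"
    page_items = yield(url)
    break if page_items.empty?
    items.concat(page_items)
    skip += per_page
  end
  items.first(max)
end
```

In the class above, the block would open the URL with `friendly_open` and run the same XPath searches on each page, so bodies and comment counts keep lining up with the feed items.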
Displaying the data
I use Rails date helpers to show relative days. I suppose requiring ActionController and ActionView is overkill, but hey: