Blogger, HTML scraping, jQuery and YQL

For a recent project I wanted to display some sports results on my local football club's website. The association publishes the results weekly on its own website, and we wanted to post them on ours as well. Typing them out each week would be a waste of time, so what can we do?

The first solution I thought of was scraping the results from the association's HTML and redisplaying them. That is better than using an iframe, as the results were not in a format that would suit our site. It would be fairly straightforward using ASP.NET on the server, but this website is a Blogger site with limited server-side functionality.

Earlier I had come across Yahoo Query Language, or YQL, and I thought it might be possible to use YQL with jQuery on Blogger to do the HTML scraping.

The Yahoo Developer Network explains that YQL, the Yahoo! Query Language, is an expressive SQL-like language that lets you query, filter, and join data across Web services. Yahoo also provides a web-based console that allows you to build and test your YQL statements. I was pleasantly surprised at how easy it was to use. There is plenty of documentation and examples to get you on track.



Using YQL you can retrieve an HTML page from a website, extract a section of the HTML using an XPath query, and supply the result as either JSON or XML. I started with one of the examples in the documentation.

select * from html where url="http://www.example.com/results.html" and xpath="//table[@class=\'results\']"

I selected the JSON format as I wanted to use the resulting data with jQuery and write out just the bits I wanted. In the YQL console you can test your query and view its results. From the console you can also copy the YQL URL; I have replaced the domain in the example.

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.example.com%2Fresults.html%22%20and%20xpath%3D%22%2F%2Ftable%5B%40class%3D%5C'results%5C'%5D%22&format=json&callback=?
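
That URL is just the query run through standard URL encoding, with the output format appended. As a minimal sketch, the same string can be assembled in JavaScript, which is handy if the page address or the XPath changes. The helper names buildQuery and buildYqlUrl are my own for illustration and are not part of YQL or jQuery. The trailing callback=? is a jQuery convention: $.getJSON swaps the ? for a generated function name and makes a JSONP request.

```javascript
// Hypothetical helper: compose the YQL statement used in this post.
function buildQuery(pageUrl, xpath) {
    return 'select * from html where url="' + pageUrl +
           '" and xpath="' + xpath + '"';
}

// Hypothetical helper: turn a YQL statement into a request URL.
// format is "json" or "xml"; callback=? is left for jQuery to fill in.
function buildYqlUrl(query, format) {
    var base = "http://query.yahooapis.com/v1/public/yql";
    return base + "?q=" + encodeURIComponent(query) +
           "&format=" + format + "&callback=?";
}

// The backslashes before the single quotes are kept so that the
// result matches the query shown earlier in the post.
var query = buildQuery("http://www.example.com/results.html",
                       "//table[@class=\\'results\\']");
var url = buildYqlUrl(query, "json");
console.log(url);
```

encodeURIComponent leaves the * and the single quotes alone, which is why they also appear unencoded in the URL copied from the console.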

The URL returns JSON that represents the table of results I was selecting in my query. I can use jQuery to obtain the JSON and write it to my page.

<!-- Assumes jQuery is already loaded on the page. -->
<div class="results">
</div>
<script type="text/javascript">
// Callback for $.getJSON: rebuilds the scraped table inside div.results.
function renderResults(data){
    // results is null when the XPath matched nothing.
    if(!data.query.results || !data.query.results.table){
        return;
    }
    var $table=$("<table width='80%'/>");
    var table;
    var i, j;
    if(data.query.results.table.length){
        // Several tables matched; use the first one.
        table=data.query.results.table[0];
    }
    else{
        table=data.query.results.table;
    }

    for (i=0;i<table.tr.length;i++)
    {
        var tr=table.tr[i];
        var $tr=$("<tr class='ResultRow'/>");
        if(tr.th){
            // Header row: give the TEAM column most of the width.
            for (j=0;j<tr.th.length;j++){
                if(tr.th[j].p=="TEAM"){
                    $tr.append("<th class='ResultCell' width='60%'>"+tr.th[j].p+"</th>");
                }
                else{
                    $tr.append("<th class='ResultCell'>"+tr.th[j].p+"</th>");
                }
            }
        }
        else{
            // Data row: a cell holds plain text (p), a link (a), or nothing.
            for (j=0;j<tr.td.length;j++){
                if(tr.td[j].p){
                    $tr.append("<td class='ResultCell'>"+tr.td[j].p+"</td>");
                }
                else if(tr.td[j].a){
                    $tr.append("<td class='ResultCell'>"+tr.td[j].a.content+"</td>");
                }
                else{
                    $tr.append("<td class='ResultCell'/>");
                }
            }
        }
        $tr.appendTo($table);
    }
    $table.appendTo('.results');
}

var url = "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.example.com%2Fresults.html%22%20and%20xpath%3D%22%2F%2Ftable%5B%40class%3D%5C'results%5C'%5D%22&format=json&callback=?";
$.getJSON(url,renderResults);
</script>
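
The part of that script that reads the JSON can be kept separate from the part that writes HTML, which makes it easier to try out without a browser. The following is only a sketch under some assumptions: the function names are mine, the sample object is made up to mirror the shape the script above expects (table, tr, th/td, p, a.content), and asArray covers the case where a lone row or cell arrives as a single object rather than an array, which I have not confirmed for every response.

```javascript
// Return the first matched table from a YQL JSON response, or null.
function firstTable(data) {
    var results = data && data.query && data.query.results;
    if (!results || !results.table) {
        return null; // the XPath matched nothing
    }
    // Several matches arrive as an array, a single match as an object.
    return Array.isArray(results.table) ? results.table[0] : results.table;
}

// Wrap a lone object in an array so callers can always loop over it.
function asArray(value) {
    if (value === undefined || value === null) { return []; }
    return Array.isArray(value) ? value : [value];
}

// Text of one cell: a paragraph body, a link's text, or an empty string.
function cellText(cell) {
    if (cell.p) { return cell.p; }
    if (cell.a) { return cell.a.content; }
    return "";
}

// Convert a table into rows of plain strings, header row included.
function tableToRows(table) {
    return asArray(table.tr).map(function (tr) {
        var cells = tr.th ? tr.th : tr.td;
        return asArray(cells).map(cellText);
    });
}

// Made-up sample data, shaped like the response handled above.
var sample = { query: { results: { table: { tr: [
    { th: [{ p: "TEAM" }, { p: "SCORE" }] },
    { td: [{ a: { content: "Rovers" } }, { p: "2" }] },
    { td: [{ p: "United" }, {}] }
] } } } };

console.log(tableToRows(firstTable(sample)));
```

With the rows reduced to plain strings, the jQuery side only has to loop over them and append cells, and the empty-result case is handled in one place.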

The result is a nice table of results which is up to date without any additional typing. This example is really only the tip of the iceberg, as they say; I look forward to finding out about some of the more advanced functionality.






Comments
Sunday May 16 2010 04:58 p.m.
http://sca-ap1.sca.ae/scamw/en/main.jsp Can you let me know how to use your technique for the above example, and how do I use the data in my blog?