Converting HTML Table to CSV

I recently had a table of information on an HTML page that I wanted to load into Excel. Sometimes it is possible to simply copy and paste the information into Excel, but in my case the information was color coded rather than entered as text. I needed to convert the values encoded in the attributes of the <table>’s <td> elements into appropriate data values.

HTML Page

Here’s a screenshot of the page I was working with. Each color value represents a different data point. Green is good, and a cell with a red border has additional meaning.
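To make the color coding concrete, here is a minimal sketch of how a cell's styling could be translated into a text value. Everything specific in it is an assumption for illustration: that the colors live in an inline `style` attribute, the color names other than green, the output labels, and the hypothetical `CellDecoder` class itself. The real page may encode its colors differently.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellDecoder
{
    // Hypothetical mapping: the real page's colors and meanings may differ.
    public static string Decode(string style)
    {
        if (string.IsNullOrWhiteSpace(style))
            return "";

        // Pull the color out of a declaration such as "background-color: green".
        Match background = Regex.Match(
            style, @"background-color\s*:\s*([^;]+)", RegexOptions.IgnoreCase);
        string color = background.Success
            ? background.Groups[1].Value.Trim().ToLowerInvariant()
            : "";

        string value;
        switch (color)
        {
            case "green": value = "Good"; break;
            case "yellow": value = "Warning"; break;
            case "red": value = "Bad"; break;
            default: value = "Unknown"; break;
        }

        // A red border carries extra meaning, so mark it with a trailing asterisk.
        bool redBorder = Regex.IsMatch(
            style, @"border[^:;]*:[^;]*\bred\b", RegexOptions.IgnoreCase);
        return redBorder ? value + "*" : value;
    }
}
```

With the Html Agility Pack, the style string for a cell can be read with `node.GetAttributeValue("style", "")` and passed to `CellDecoder.Decode`.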

HTML Data

Here’s a snapshot of the HTML data for the page. The root node I’m interested in is the studentTable table entry.

The Code

I used the Html Agility Pack to do the heavy lifting of parsing the HTML. The code is a one-time-use tool, so it’s a little rough around the edges. The core of the parsing logic can be seen here:


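    using System;
    using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

    var doc = new HtmlDocument();
    doc.Load(fileName);

    // SelectSingleNode returns null when no table matches, so fail early with a clear message.
    HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@id='studentTable']");
    if (table == null)
        throw new InvalidOperationException("No table with id 'studentTable' in " + fileName);

    foreach (HtmlNode tr in table.Descendants("tr"))
    {
        foreach (HtmlNode node in tr.ChildNodes)
        {
            // Child nodes include whitespace text nodes; only <td> cells carry data.
            if (node.Name != "td")
                continue;

            string nextLine = node.InnerText;
            ...
        }
    }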
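
Once each cell has been turned into a string, the values for a row still need to be joined into a CSV line. The sketch below shows one way to do that safely. It is not taken from the linked repository, and the `CsvSketch` class with its `Escape` and `ToLine` methods is a hypothetical helper. It follows the usual CSV convention of quoting any field that contains a comma, a double quote, or a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvSketch
{
    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, as is conventional for CSV.
    public static string Escape(string field)
    {
        if (field == null)
            return "";

        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
    }

    // Join the cells of one table row into a single CSV line.
    public static string ToLine(IEnumerable<string> cells)
    {
        return string.Join(",", cells.Select(c => Escape(c)));
    }
}
```

For example, `CsvSketch.ToLine(new[] { "Smith, Jane", "Good" })` returns `"Smith, Jane",Good`, so a name containing a comma still lands in a single Excel column.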

Summary

That’s it! Not crazy exciting, but a fun project nonetheless.

The code can be found here: https://github.com/marksl/html-table-to-csv-converter
