Search billions of records on Ancestry.com
   
Brett Payne's Genealogy Pages
by Brett Payne, of Tauranga, New Zealand

TAB2HTML

A simple method for transforming tabular or columnar data into web page format

by
Brett Payne,  152 Gluepot Road, Oropi  R.D.3, Tauranga 3021, New Zealand
bpayne@xtra.co.nz

Summary

A quick, simple method is presented for converting tables into text files, aligned in columns, using commonly available word processing and spreadsheet software.  The text files may then be employed in a variety of formats, including import into web pages as HTML.  The procedure employs some of the various file formats used and recognised by Microsoft Word and Excel software.  It provides an easy way to sort alphanumeric and numeric data in spreadsheet format, and then display the output in an easily viewable text format.  Inserting this text between <PRE> tags in HTML reduces the file size by 80% over that produced by the more commonly used <TAB> format.  This results not only in a more efficient use of web space, but also faster download times for users.

Introduction

Tabular data saved as tab-delimited text files can be viewed directly by most browsers.  However, the column widths are predetermined, and the original alignment of the text is therefore usually destroyed.

There are several proprietary and shareware programmes which will convert tabular data into web pages.  Most of these employ the standard <TAB> format used by HTML for tables.  If the file is viewed in the form of a spreadsheet, using a programme such as Microsoft Excel, the same effect can be produced by using the /Save As HTML/ feature.  However, HTML tables have some disadvantages.  While it is possible to manipulate various parameters to encourage or prevent text wrap-around, column widths can be more difficult to control.  In addition, the use of standard tables in HTML is accompanied by substantial increases in file-size.

This article presents a simple solution for transforming a tab-delimited text file to an ordinary text file that contains the right number of spaces to replace each tab symbol while maintaining column alignment.  There are two alternative methods that can be used, either directly from the tab-delimited text file to ordinary text using Microsoft Word, or via a spreadsheet programme, such as Microsoft Excel.  The latter option is often useful, as it will permit sorting and other manipulation of the data prior to output.


Method 1 - from MS Word

MS Word is first used to view the tab-delimited text file.  It is necessary to ensure that the tabs are arranged at suitable intervals.  In other words, care should be taken that none of the text overflows the formatted column widths.  With wider tables it may be necessary to change the page width under the /Page Setup/ option accordingly.

The file is saved as /Text with Layout/.  Using MS Word 97 or 2000 the options for file type appear as pull down options at the bottom of the /File/Save As/ dialog box.  If you use a different version of MS Word, or indeed one of the other popular word processing packages, then it shouldn’t be too difficult to find a similar option which works in the same manner.
Once saved, this will produce a file with the suffix *.ans, which is then viewed using a simple text editor, such as MS Notebook or MS Wordpad.  It is not advisable to use MS Word, as any inadvertent formatting will introduce additional, unwanted, characters in the text.  The column layout is preserved, with the relevant number of spaces inserted for the tabs to maintain the vertical alignment of text.  The suffix of the file may be edited to *.txt or *.html to suit requirements, or the block of text can be highlighted and pasted, using CTRL-C and CTRL-V, into an HTML template.


Method 2 - via MS Excel

Data stored in spreadsheet-format, such as MS Excel, can be transformed just as easily.  The spreadsheet is saved as a space-delimited text (or print) file, which has a *.prn suffix.  Using MS Excel, choose /File/Save As/ and select the /Formatted Text (Space Delimited)/ option from the file type dialog box.  To make sure that none of the entries in the columns are truncated, it is a good idea to make the width of the columns slightly larger than the widest entry that occurs in that column.  A little experimentation will enable the output to be tailored to suit individual requirements.  The resulting file is then manipulated in exactly the same way as was done with the *.ans file in Method One.


MS Word=>MS Excel=>TXT

An MS Word document containing columns separated by tabs, or a tab-delimited text file, may also be converted into spreadsheet format, for the purpose of manipulation or sorting of the data contained.  The file is first saved from within MS Word as a *.txt file, which is then imported into MS Excel.  From within the latter programme, click /File/Open/ and select /File Type/Text/ from the first dialog box.  The next dialog box provides a prompt for the selection of either /Delimited/ or /Fixed Width/.  Choose /Delimited/, if it's not already selected, and click the /Next/ button. The next dialog box should show the TAB delimiter already ticked - if not, then tick it, making sure you de-select all of the other options, click /Next/ and then /Finish/.  The tab-delimited text file may then be imported into MS Excel as a spreadsheet.  Column widths and other spreadsheet formatting parameters can then be changed to suit individual requirements.  The data may also be sorted or otherwise manipulated prior to saving it, as an interim measure, in spreadsheet format.  Method Two described above can then be used to convert the file to text.


TXT=>HTML

The intricacies of HTML and web page creation are beyond the scope of this article, but it is important to note the significant advantages gained by using the <PRE> tags in HTML.  When the block of text created using one of the abovementioned methods is inserted within the HTML document, using a simple text editor, it must be placed within the tags <PRE> and </PRE>.  This ensures that all characters, including spaces, are fixed-width, thus maintaining the column integrity.  It is also far more efficient than the space-wasting “&amp” used by HTML editors for spaces.  Formatting of text, with the insertion of links and the use of other HTML tags, can still be done within the block.  One should take care, however, not to inadvertently delete or insert any spaces.  While the editing is being done it is often useful to preview the changes in another window using a browser such as MS Internet Explorer or Netscape Navigator.  Each time changes are saved in the text editor, the effects may be checked by switching to the browser window and pressing the /Refresh/ button.


Conclusion

The latest word processing and spreadsheet programmes such as Microsoft Word and Excel include options and “wizards” which will convert tabular data into web page format with a minimum of fuss.  There are, however, two major drawbacks with using these methods to create HTML tables with large amounts of numeric and/or alphanumeric data.  An enormous quantity of often complex HTML code is produced, increasing the size of files dramatically.  In addition, the format of the tables is not particularly easy to customise without the use of web page editors such as MS FrontPage.

The methods described above employ commonly available text editing, word processing, and spreadsheet programmes.  The procedure is carried out using several simple steps, each of which can be customised according to required needs and output.  The resulting format maintains the alignment of the columns of text, while limiting the size of the file.  The following examples of file size corresponding to each step of the process have been generated using a small, simple test file.  (The links may be clicked to view the appropriate test files.)
 

Tab-delimited Text File
Microsoft Excel Spreadsheet
Web Page created using MS Excel /Save as HTML/ facility
Microsoft Word Document
Web Page created using MS Word /Save as HTML/ facility
Aligned Text File
Web Page created using TAB2HTML method
*.txt
*.xls
*.html
*.doc
*.html
*.txt/*.ans/*.prn
*.html
0.7 kb
18.0 kb
8.7 kb
19.5 kb
12.4 kb
1.2 kb
1.5 kb

All six of the above files contain exactly the same original data, merely displayed in different formats.  The web page produced using the MS Word /Save as HTML/ facility failed because the columns of text were no longer aligned.  Modern internet browsers generally include a facility to view *.txt (and *.xls or *.doc) files.  However, as clicking on the link to the Tab-delimited Text File above demonstrated, the results are sometimes less than satisfactory.  Using the TAB2HTML technique produces an HTML file twice the size of the original tab-delimited text file, but results in a space-saving of more than 80% over that generated with the MS Excel /Save as HTML/ method.

In other words, if tabular data is presented on the web using the methods described, nearly six times as much data can be fitted in the same space as that taken up by HTML files produced by the standard facility in MS Excel.  If the same data is stored as an ordinary MS Word or Excel file, it can use twice as much space again.

This has significant implications for those who transcribe and index genealogical data such as parish registers, census data or cemetery records, and wish to publish this information on the internet.  After entry into a simple text or spreadsheet file using "off the shelf" software, the data can be readily compiled, manipulated, sorted or edited in a variety of ways before transfer to an HTML file, without the need for great expertise or expensive database programmes.  The data may also be shared between users without difficulty because only widely recognised file formats are employed.

This article first appeared in the Society of Indexers Genealogical Group (SIGG) Newsletter No. 33, Summer 2003, ISSN 1364-2901.

GenDesk offers a new research tool for genealogists - the payment-based QueryBoard.  Benefit from the wide knowledge base and experience of a growing team of international researchers.  GenDesk also hosts a free genealogy chat room, where you can share your experiences or get research hints from experienced and friendly amateur researchers. 

Return to Brett Payne's South Derbyshire Genealogy Pages
© 2003-2004 Brett Payne All Rights Reserved