Results of a test of converters between Microsoft Word and HTML

Last update 29 Sep 1996 by Jacob Palme <jpalme@dsv.su.se>.

Table of contents

Abstract

Software tested under MacOS: RTFtoHTML version 3.0.1, Microsoft Internet Assistant converters version 2.0, MacLinkPlus version 8.1.

For conversion from Word to HTML, Microsoft Internet Assistant was more automatic and more manipulative, while RTFtoHTML was better at handling tables of contents and footnotes and is the better choice if you prefer a less manipulative conversion tool.

For conversion from HTML to Word, Microsoft Internet Assistant was better if you want to produce an HTML document and handle it alternately in Word and HTML formats, while DataViz MacLinkPlus was better if you want to make a one-time conversion of an arbitrary HTML document into a Word document.


Back to table of contents

Summary

Introduction

This document reports tests of tools for conversion between Microsoft Word documents and HTML documents on a Macintosh. None of the tools was perfect; better versions of these products will surely appear in the future.

From Word to HTML

For conversions from Word to HTML, two tools were tested: RTFtoHTML version 3.0.1 (RTH) and Microsoft Internet Assistant version 2.0 (MIA).

General comment: MIA uses more advanced HTML features, sometimes in rather odd ways. This can make the displayed HTML document look more like the printed Word document, but it may be a problem if you want to improve the HTML document manually afterwards. For example, MIA uses FONT SIZE attributes to try to get the same font size in HTML as in the Word document, and FACE attributes to get the same font.
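If the FONT markup gets in the way of later hand-editing, it can be removed in a clean-up pass. The following is only a minimal sketch in Python, not part of any of the tested tools: the function name `strip_font_tags` and the sample markup are made up for illustration, and the pattern assumes that no ">" character occurs inside an attribute value.

```python
import re

# Matches opening <FONT ...> tags (with any attributes) and closing </FONT>
# tags, case-insensitively. Assumption: attribute values contain no ">".
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html: str) -> str:
    """Remove FONT start and end tags while keeping the enclosed content."""
    return FONT_TAG.sub("", html)


# Hypothetical sample in the style described above (not actual MIA output).
sample = '<P><FONT SIZE=2 FACE="Times">Some <B>bold</B> text</FONT></P>'
print(strip_font_tags(sample))  # prints: <P>Some <B>bold</B> text</P>
```

Only the FONT tags themselves are dropped; all other markup and text pass through unchanged, so the result can still be compared with the converter output.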

Advantages of RTH compared with MIA: automatic generation of a table of contents, good handling of footnotes, better handling of manual lists, faster execution.

Advantages of MIA compared with RTH: more automatic translation of images, correct handling of centered text (RTH had a bug here), better handling of Word 6 style automatic lists.

Note: RTH also supports conversion to HTML from word processor formats other than Microsoft Word.



From HTML to Word

For conversions from HTML to Word, two tools were tested: MacLinkPlus translators from DataViz version 8.1 (MLP) and Microsoft Internet Assistant version 2.0 (MIA).

General comment: Neither tool was perfect. Both require some manual editing of the resulting Word document to give it an acceptable format. For example, Web browsers lay out tables automatically, while Word has only a semiautomatic autofit command, which does not work as well as the built-in table layout in Web browsers. Because of this, tables have to be autoformatted when they are translated from HTML to Word, and neither of the tested translators did this well.

Advantages of MLP: Better handling of headings, fonts, bold text, forms, etc.

Advantages of MIA: Slightly better handling of tables, though not perfect.

MLP was clearly much better than MIA for conversion of arbitrary HTML documents.



Use as an HTML document editor

MIA, however, has a different goal than MLP. The goal of MLP is to translate HTML documents into Word documents which, when printed, look as much as possible like the original HTML document. The goal of MIA is to turn Word into an HTML editor, i.e. to allow you to use Word to produce HTML documents which you can print both as Word and as HTML documents, and where you can edit the HTML and save it again. Thus MIA translates form elements, for example, into elements that can be edited with Microsoft Internet Assistant and saved again as modified form elements, while MLP translates form elements into graphics depicting the form element in printable format.

Thus, if you start editing an HTML document using MIA and the HTML template, you get documents which you can edit and print with Word and save as HTML.



Conclusions

None of the tools is yet good enough to allow a document to be moved automatically back and forth between Word and HTML. All of them can be used with more or less manual editing of the results. For conversion from Word to HTML, neither of the tested translators is clearly best. Microsoft Internet Assistant is somewhat more automatic, but a big advantage of RTFtoHTML is its automatic creation of a table of contents and its intelligent handling of footnotes. For conversion from HTML to Word, MacLinkPlus was clearly better than Microsoft Internet Assistant. Microsoft Internet Assistant was, however, better for converting HTML into Word documents which you can edit in Word and then save again as revised HTML documents.



Introduction

Restrictions

All tests were performed on a Macintosh. They may not apply to versions on other platforms.

Software tested:

RTFtoHTML version 3.0.1

Microsoft Internet Assistant converters version 2.0.

MacLinkPlus version 8.1

Testing tools:

Netscape 3.0

A Kinder, Gentler Validator, including weblint, at URL http://ugweb.cs.ualberta.ca/~gerald/validate/

Manual inspection of generated HTML code

Scoring table:

5 Very good

4 Good

3 Acceptable

2 Questionable

1 Bad



Comparison of translation programs from Word to HTML

| Function | Microsoft Word Internet Assistant 2.0: report | Score | RTFtoHTML 3.0.1: report | Score |
|---|---|---|---|---|
| Price | Free | 5 | Shareware US $ 29 | 5 |
| Swedish national characters | OK | 5 | OK | 5 |
| Creates <!DOCTYPE> declaration | No | 3 | No | 3 |
| Must save in RTF format before conversion | No | 5 | Yes | 4 |
| Graphics files generated | Yes, in GIF format | 5 | Yes, in PICT format, must be manually converted to GIF | 4 |
| Handling of headings | Headings 1-3 correct, Headings 4-6 simulated | 4 | All headings correct | 5 |
| Preformatted text | Sets FONT SIZE=2 to allow longer lines | 5 | Correct | 4 |
| Centered header | Correct | 5 | No, instead text which was not to be centered became centered | 1 |
| Automatic generation of table of contents | No | 3 | Yes | 5 |
| Blockquote | Sets FONT SIZE=2 | 3 | OK | 5 |
| Horizontal ruler | OK | 5 | OK | 5 |
| Microsoft Word Frames | Not handled | 3 | Not handled | 3 |
| Tables with and without borders | OK, uses WIDTH to get neater printout | 5 | OK | 4 |
| Manual numbered lists | No | 4 | Yes | 5 |
| Word 6 type automatic numbered lists | Yes | 5 | No | 3 |
| Manual bullet lists | No | 4 | Yes | 5 |
| Menu, Glossary, Directory styles | Yes | 5 | Yes | 5 |
| HTML syntax validation | Uses many non-standard but widely supported tags. Should not be any problem. | 4 | Uses some non-standard but widely supported tags. Odd mixing of tags in some cases (example: </b><table></b>). Probably no big problem in real usage. | 3 |
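Tag mixing of the kind noted in the HTML syntax validation row (`</b><table></b>`) can be found mechanically as well as by manual inspection. The following is a minimal sketch using Python's standard `html.parser` module; it is not the validator used in these tests, and the names `NestingChecker` and `find_nesting_problems` are made up for illustration. It is deliberately strict: it also reports end tags that close over elements whose end tags HTML allows you to omit (such as LI or P), so its output needs human judgment.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Collects end tags that do not match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.problems = []   # descriptions of mismatched end tags

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            innermost = self.open_tags[-1] if self.open_tags else "none"
            self.problems.append(
                f"</{tag}> found, innermost open element: {innermost}"
            )


def find_nesting_problems(html):
    """Return a list of mismatched end tags found in the given markup."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# The fragment quoted in the table: both </b> end tags are reported.
for problem in find_nesting_problems("</b><table></b>"):
    print(problem)
```

The parser lowercases tag names, so `<B>` and `</b>` are treated as the same element.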


Comparison of translation programs from HTML to Word

Note: The comparison below was based on HTML documents which were not originally produced using Microsoft Word Internet Assistant. Microsoft Word Internet Assistant was much better at translating back to Word those HTML documents that it had originally produced itself using the Word HTML template.

| Function | Microsoft Word Internet Assistant 2.0: report | Score | MacLinkPlus 8.1: report | Score |
|---|---|---|---|---|
| Price | Free | 5 | US $ 69.99 | |
| Headings | Incorrect | 2 | Good | 5 |
| Horizontal ruler | No | 2 | Good | 5 |
| Pictures | Good | 5 | Works for pictures in URLs to be retrieved from the net, not for relative file URLs | 4 |
| Tables | Minor imperfections | 3 | Minor imperfections | 3 |
| Table borders of different thickness | Too much | 3 | Sometimes | 3 |
| HTML forms | Incomplete rendering | 2 | Good | 5 |
| Preformatted text | Incorrectly coded as Times font | 1 | Correctly coded as Courier font | 5 |
| Merged cells in tables | No | 2 | No | 2 |
| Borders in tables | Yes | 5 | Sometimes | 3 |
| Centered text in tables | Yes | 5 | No | 3 |
| Bold text (<b>) | No | 2 | Yes | 5 |
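As a rough cross-check of the HTML-to-Word comparison above, the scores can be added up per tool. The sketch below is an illustration only: the report itself gives no totals, and an unweighted sum that skips the one row without a score for both tools (price) is an assumption made here, not the author's method. The names `SCORES` and `totals` are made up for the example.

```python
# Scores copied from the HTML-to-Word comparison table, as (MIA, MLP).
# None marks the one cell for which the table gives no score (MLP price).
SCORES = {
    "Price": (5, None),
    "Headings": (2, 5),
    "Horizontal ruler": (2, 5),
    "Pictures": (5, 4),
    "Tables": (3, 3),
    "Table borders of different thickness": (3, 3),
    "HTML forms": (2, 5),
    "Preformatted text": (1, 5),
    "Merged cells in tables": (2, 2),
    "Borders in tables": (5, 3),
    "Centered text in tables": (5, 3),
    "Bold text": (2, 5),
}


def totals(scores):
    """Sum each tool's scores over the rows where both tools were scored."""
    rated = [pair for pair in scores.values() if None not in pair]
    mia_total = sum(mia for mia, _ in rated)
    mlp_total = sum(mlp for _, mlp in rated)
    return mia_total, mlp_total, len(rated)


mia_total, mlp_total, rows = totals(SCORES)
print(f"MIA {mia_total}, MLP {mlp_total} over {rows} rows")
# prints: MIA 32, MLP 43 over 11 rows
```

Under this simple tally MacLinkPlus comes out ahead, which agrees with the conclusion drawn in the text for one-time conversion of arbitrary HTML documents; a different weighting of the rows could of course shift the numbers.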



Personal grumblings

RTFtoHTML has problems with long file names.

Microsoft Word Internet Assistant has problems if you have renamed the Microsoft Word menus (which I have done, to make room for a Font menu so that PopChar works with Word).



Test files

Most of the test files I used can be found in BINHEX format at URL http://dsv.su.se/jpalme/reports/HTML-Word-translation.hqx.

This document was converted from Word to HTML using RTFtoHTML version 3.0.1, plus some manual improvements of the generated HTML markup.