OK, OK - Microsoft Word still takes the cake. For one page of HTML that we had to convert from a Word document at work, we got a 300k file. 300k. No graphics.
If you view source on a Word-created HTML file, you will see the most insane amount of garbage code ever. I'm sorry, but XML is just not needed to display text.