Comparing Two Versions of a Word Document

Has this ever happened to you? You prepare a document in Microsoft Word and send it out for review, but when the reviewed copies come back you find that one of the reviewers made changes without change tracking turned on. So, how do you find out what changes were made? It turns out there are three ways to tackle this. (Another scenario is that you yourself create multiple versions of the same document over time, but then forget which is which. I can’t tell you how many times I’ve done that.)

The first method is to use Microsoft Word itself. It has a feature that allows you to compare two versions of the same document, and merge them into a third version. Whatever differences it finds between the first two become tracked changes in the third. At that point, you can review the changes to accept or reject them, just as if the reviewer had turned on track changes in the first place. Now, I must admit that I don’t personally use this feature very often. But, as I recall, it can be tricky to make sure that Word compares the files in the correct order. If you get it backwards, then additions show up as deletions and vice versa. (Note: My experience is based on Office 2000. The process may have improved since then.) In Office 2000, the instructions for doing this are under the heading of “Compare two copies of a document”. Paste that phrase into the “What would you like to do?” box in the help system, and it should come right up.

The second method is to use an external compare tool, such as WinMerge. (See our previous tips about downloading, installing and using WinMerge.) WinMerge, like most third-party compare tools, can only work with ASCII files. So, before you can compare two Word documents, they both have to be converted to ASCII:

  1. Open the first Word document (in Word).
  2. Select File | Save-As (F12).
  3. Change the Save as Type to “Text Only with Line Breaks (*.txt)”.
  4. Make sure that the filename is now different from the original. (This usually happens automatically, because the extension changes from .doc to .txt.)
  5. Click Save.
  6. When Word warns you that you’ll be losing the formatting, click OK.
  7. Repeat steps 1 through 6 with the second document.
  8. Now that you have ASCII text versions of the two Word documents, you can compare them with WinMerge or another compare tool.
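
If you have a command-line diff tool handy (GNU diff is available for CygWin, for example), step 8 can be sketched as a small shell function. This is a minimal sketch, not a replacement for WinMerge’s side-by-side view; the function name compare_versions and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare two plain-text exports of a Word document.
# compare_versions is a hypothetical helper name, not a standard command.
compare_versions() {
    original="$1"   # the text export of the version you sent out
    revised="$2"    # the text export of the copy that came back
    # -u prints a unified diff: lines starting with "-" appear only in the
    # original, lines starting with "+" appear only in the revised copy.
    # Order matters here too: swap the arguments and additions show up
    # as deletions.
    diff -u "$original" "$revised"
}

# Usage (hypothetical file names):
#   compare_versions report.txt report-reviewed.txt
```

Like Word’s own compare feature, the result depends on which file you name first, so always put the original first.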

Yeah, yeah, I know! I hear the cynics in the audience groaning. But, bear with me for a minute.

What I just described is actually more work than the first method. It also has a second downside: it only finds differences in content, because the formatting is completely lost in the conversion. I readily admit that none of the methods described here is better than having change tracking turned on in the first place.

One situation where this second method turns out to be superior to the first is when a document is sent out to many different reviewers, each of whom makes only a small number of changes. Once each revised document is converted to ASCII, it becomes trivial to compare them against each other, mixing and matching. For example, if two of the reviewers make virtually the same change, you could compare those two versions directly against each other to decide which wording you like better. The downside is that accepting a change is no longer a matter of clicking the Accept button in Word’s change-tracking review mode. You’ll have to copy and paste the changes back into the original Word document yourself. But again, when there are only a handful of small changes scattered around, that may not be a big deal.
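
For the many-reviewers case, a shell loop can give a quick overview before you open anything in WinMerge: diff each review copy against the original and count the lines that differ. This is a minimal sketch assuming every copy has already been converted to text; summarize_reviews and the file names are hypothetical, and the count is only a rough guide (a one-line edit counts as two lines, one removed and one added).

```shell
#!/bin/sh
# Sketch: see at a glance how much each reviewer changed.
# summarize_reviews is a hypothetical helper; all files must already be
# plain-text exports of the Word documents.
summarize_reviews() {
    original="$1"
    shift
    for review in "$@"; do
        # In diff's normal output format, "<" marks a line from the original
        # and ">" marks a line from the review copy, so counting them gives
        # a rough measure of how much this reviewer touched.
        # grep -c still prints 0 when nothing matches; "|| true" only
        # absorbs its non-zero exit status.
        changed=$(diff "$original" "$review" | grep -c '^[<>]' || true)
        printf '%s: %s\n' "$review" "$changed"
    done
}
```

For example, `summarize_reviews original.txt review1.txt review2.txt review3.txt` prints one line per reviewer; you could then compare two of the reviews directly with `diff review1.txt review2.txt`.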

Tip #1: In the scenario described above, where there are multiple review copies to be assimilated, you could take advantage of WinMerge’s ability to quickly copy differences from one ASCII file to another. That way, you’d build one “master” ASCII file that combines all of the changes from all of the review copies. Having every change together in a single ASCII file can make it easier to copy and paste the changes back into the Word document.
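
If you would rather not do the copying by hand, GNU diff3 (part of the same diffutils package as diff) can fold two reviewers’ edits into one master file automatically. This swaps in a different tool from the WinMerge approach in the tip, and it is only a minimal sketch: merge_reviews and the file names are hypothetical, and all three inputs must be text exports.

```shell
#!/bin/sh
# Sketch: build a "master" text file from two review copies.
# merge_reviews is a hypothetical helper built on GNU diff3.
merge_reviews() {
    original="$1"   # the text you sent out for review
    review_a="$2"   # first reviewer's copy
    review_b="$3"   # second reviewer's copy
    # diff3 -m MINE OLDER YOURS starts from MINE (review_a) and applies the
    # changes that lead from OLDER (original) to YOURS (review_b).
    # Edits to different parts of the text merge cleanly. Where both
    # reviewers touched the same lines, diff3 brackets the spot with
    # <<<<<<< and >>>>>>> markers and exits with status 1.
    diff3 -m "$review_a" "$original" "$review_b"
}

# Usage (hypothetical file names):
#   merge_reviews original.txt review1.txt review2.txt > master.txt
```

To fold in a third reviewer, run it again with master.txt in place of review1.txt.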

I promised you a third method. What if, in the second method, we could reduce the first six steps to a single step? Well, this is easily done if you have CygWin installed on your machine. (See our previous tips about installing and running CygWin.) CygWin offers a variation of the UNIX CAT command, called CATDOC. (Cat is short for concatenate.) The regular CAT command takes the contents of a file and sends it to another file, or to the console, or wherever. (Passing it two or more files at once is what makes the concatenation happen.) So, as you might imagine, CATDOC does the same thing, the difference being that whenever it sees that a source file is a Microsoft Word document, it automatically filters out all of the formatting, so that only plain ASCII is sent to the output. For example, either of these commands converts review1.doc to review1.txt:

catdoc review1.doc > review1.txt
catdoc -xw review1.doc > review1.txt

CATDOC has a number of optional commandline switches. The -w switch is the most interesting. That one determines whether or not paragraphs are broken into separate lines (i.e. word wrapped). The default is to wrap. Adding -w makes them unwrapped (each paragraph is one long line). In other words, specifying -w is the equivalent of using Save-As in Word with a file type of “Text Only”, and leaving out the -w is the equivalent of using Save-As in Word with a file type of “Text Only with Line Breaks”.

The -x switch tells CATDOC that whenever it encounters an unknown character it should render it using “\xNNN” notation. Otherwise, it will replace the unknown character with just a question mark (?).
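
Putting the pieces together, the one-step conversion and the comparison can be wrapped in a single shell function. This is a minimal sketch that assumes catdoc and diff are both on your PATH (for example, inside a CygWin shell) and that the inputs end in .doc; doc_diff and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert two Word documents with catdoc, then diff the text.
# doc_diff is a hypothetical helper; it assumes catdoc and diff are
# installed and that the inputs end in .doc.
doc_diff() {
    original_doc="$1"
    revised_doc="$2"
    # Keep the text versions next to the originals (report.doc -> report.txt)
    # so they can also be opened in WinMerge afterwards.
    original_txt="${original_doc%.doc}.txt"
    revised_txt="${revised_doc%.doc}.txt"
    # -w puts each paragraph on one line, so an edit shows up as one changed
    # paragraph instead of a run of re-wrapped lines.
    catdoc -w "$original_doc" > "$original_txt" || return 2
    catdoc -w "$revised_doc" > "$revised_txt" || return 2
    diff -u "$original_txt" "$revised_txt"
}

# Usage (hypothetical file names):
#   doc_diff report.doc report-reviewed.doc
```

Returning 2 on a failed conversion mirrors diff’s own convention, where 0 means identical, 1 means different, and 2 means trouble.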

Tip #2: The output from CATDOC (without -w) and the output from a Word Save-As using “Text Only with Line Breaks” are close, but not identical. For one thing, they break lines at different lengths. For another, Word renders paragraph numbers and bullets, while CATDOC does not. So, if you will be mixing and matching, specify -w when using CATDOC and choose “Text Only” when using Word Save-As; that way, at least, word wrapping won’t be an issue.
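
Another way around the wrapping problem, swapped in here rather than taken from the tip, is to compare word by word: put every word on its own line first, and line-length differences disappear entirely. A minimal sketch using the standard tr and diff tools; words and word_diff are hypothetical helper names. It does nothing about Word’s paragraph numbers and bullets, which will still show up as differences.

```shell
#!/bin/sh
# Sketch: compare two text files word by word, ignoring where lines wrap.
# words and word_diff are hypothetical helper names.
words() {
    # Turn every run of whitespace (spaces, tabs, newlines) into a single
    # newline, so the file becomes one word per line.
    tr -s '[:space:]' '\n' < "$1"
}

word_diff() {
    a=$(mktemp)
    b=$(mktemp)
    words "$1" > "$a"
    words "$2" > "$b"
    diff "$a" "$b"   # exit status 0 = same words, 1 = words differ
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `word_diff saved-from-word.txt from-catdoc.txt` prints nothing when the only differences are in where the lines break.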

Tip #3: Actually, this is just the “coming attractions” for another tip article. There is a way to automatically invoke CATDOC against all of the Word files in a given folder — to convert them all in one fell swoop — but that’ll be the subject of an entirely different article. So, be sure to set your aggregator to point to the CodeJacked RSS feed (http://www.codejacked.com/feed/), and keep your eyes peeled.
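
In the meantime, here is one possible approach, as a minimal sketch: loop over the .doc files in a folder and run catdoc on each. It assumes catdoc is on your PATH; convert_folder and the folder name are hypothetical.

```shell
#!/bin/sh
# Sketch: run catdoc over every .doc file in a folder.
# convert_folder is a hypothetical helper; it assumes catdoc is on the PATH.
convert_folder() {
    dir="${1:-.}"   # folder to process; defaults to the current one
    for doc in "$dir"/*.doc; do
        # If the folder has no .doc files, the pattern stays unexpanded;
        # skip it instead of passing a literal "*.doc" to catdoc.
        [ -e "$doc" ] || continue
        # -w: one line per paragraph, matching Word's "Text Only" export.
        catdoc -w "$doc" > "${doc%.doc}.txt" || echo "failed: $doc" >&2
    done
}

# Usage (hypothetical folder name):
#   convert_folder reviews
```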

Comments

  1. Sweet tip! I’ve used compare in Word and WinMerge has saved the day numerous times, but I’ve never gotten into CygWin despite reading about it many times.

    WinMerge is invaluable for tracking down small differences in any kind of document: HTML files, registry files, logs, etc.

  2. Beyond Compare is what I use (with plain text) and it does a wonderful job.

  3. I think it can be done with Vim, in Windows too.

  4. You could also try using Workshare’s DeltaView product or DeltaView PE.
    http://www.workshare.com/products/wsdeltaview/workshare_deltaview_pe.aspx. There’s a trial version available for download.

  5. I use a difference tool for text files, but I really don’t see much use for one with Word documents, where half of what I’m checking in the tracked changes is formatting changes.

    It’s really easy to remember how to use the compare documents feature in MS Word, and it seems a little condescending to write it off so quickly in method #1 above.

    1) Open the original document
    2) Do the “compare documents” and choose the one that has been changed.
    3) The resulting document shows the changes FROM the original TO the changed copy. Save this off as XX.trackchanges.doc and you’ll still have both your original and the one that was edited by someone else. Then review and accept/reject changes as you need.

  6. You forgot the fourth option:

    1. Print out both documents
    2. Arrange the printed pages so that page 1 from doc 1 is on top of page 1 from doc 2, etc.
    3. Hold the pair of pages up to the window
    4. Look for differences.

    Hey, this was the only option in the old days ;)

  7. Hi,
    I have a Word doc that was sent as an attachment, and a figure in it has been changed. How do I delayer the document to reveal the original figure?

    Thanks guys

  8. I need clarification. What do you mean by “delayer?” Do you mean that change-tracking is on and you want to see what changed? If so, you could try doing a change review, reject that particular change, but then immediately close the document without saving changes (i.e. the rejection).

  9. Hi Craig,
    I understand that whenever a document is modified the earlier version remains hidden under the new text, hence each time a document is modified the file size gets slightly larger. I also understand that it is possible to strip these layers back to see the earlier versions, which is presumably why most organisations send out documents in .rtf or .pdf and not .doc format.
    Thanks and regards
    Tony

  10. 5th way: tell the reviewer to turn on tracking and then make his changes again.

    It’s the only way they’ll learn!

  11. CompareIt is a great tool for comparing MS Word and Excel files.

  12. ‘Diff Doc’ from Softinterface can compare any file type to any file type. Furthermore, it can compare any textual content from any two places you can copy and paste from.

    Check it out at:

    http://www.softinterface.com/MD/Document-Comparison-Software.htm

  13. Boy I really wish you would expand on method #1 because I can’t solve the “word is comparing in the wrong order” problem with Word 2003. I have 2 documents, A and B. B is mostly made up of A with a bunch of words deleted.

    When I use the Tools->”Compare and Merge Documents” option, it ALWAYS shows me the differences from the point of view of document A (that is to say, all the edited words are shown as additions, not strikethroughs). I have loaded B and then compared to A, and loaded A and compared to B and the results are precisely the same.

    I suspect this is a bug, or a very poor design choice by MS, but I would like to know for sure if there is a way to make this work correctly.

    The answer does not seem to be in the docs or on the internet, and “upgrade to 2007” is not an option.

  14. Thanks for the tip! Method 1 worked for me. In MS Word I searched Help for “compare documents.” I was comparing 2 contracts that had similar boilerplates, and it was like a third document was produced by MS Word with Track Changes turned on. A good solution for me because I’m on a Mac. Thanks also for mentioning that it’s important which document to open/start with first.

  15. You are excellent! Bravo for all this!
    You are concise, easy to understand and very very practical.
    Thanks for all the tips…
    For me, Method 2 was helpful! I have a mountain of information from the past 8 years of research stored on CDs and DVDs, and I was lost among the old, new, newer, newest and neweeeeeeeeeest versions of my thesis, presentations and so on. This method helped me a lot.
    Thanks again and good luck to all of you.

  16. I do medical transcription work in India for American-based transcription companies.

    I want to compare two documents:
    1. What I typed for the dictating doctors (document 1).
    2. What the doctors corrected (document 2).

    I would like to see the corrections, deletions and additions marked in my document 1. Please help me.

  17. Thanks so much for the tip!

  18. You could also use OpenOffice to compare two MS word document:
    1. Open one document (it has to be a .doc document; otherwise OO does not seem to let you compare).
    2. go under Edit -> Compare Document and select the other document file
    and now use the “Accept and Reject Changes” navigator.

    This way you can see both the text and the formatting.

    florin

  19. If you want to compare two .docx files programmatically from Java, you can do this with docx4j (available under ASLv2).

    See the sample at http://dev.plutext.org/trac/docx4j/browser/trunk/docx4j/src/main/java/org/docx4j/samples/CompareDocuments.java

    .. Jason

  20. I am searching for software that can compare two Word documents and give me the number of errors. If it can also calculate accuracy, even better. Please let me know about this. Thank you.

  21. There is a new webservice to do this http://www.comparemydocs.com

    If I understand it correctly, it can compare multiple versions, not just two, and lets you merge and continue to work with them.

  22. You might want to have a look at tools such as ECMerge, which contain a built-in DOC-to-text converter. You don’t have to do any manipulation; just open the two documents for comparison and it will compare the text!

  23. You say “WinMerge, like most third-party compare tools, can only work with ASCII files”. This is not the case (at least not now). WinMerge is distributed with plugins for several file formats, including MS Word. There is also a plugin for Word 2007 format called xdocdiff.

  24. I agree it works. But when I compare two documents that have complex formatting, such as tables, it shows the wrong result.
    Is there any way to improve Word’s compare feature?
    I am using Word 2003 with service pack 3.

    Please advise

  25. With Word 2007 there is an option called “Compare”

    Review > Compare > Compare 2 versions of one document (legal blackline)

    you choose the original and new versions, and you get a summary of the changes.


© 2006-2007 Maxim Software Corp.  All rights reserved.