Hi Dave,
I have done something similar using a combination of MS Word and MS Excel. It's a little wonky, but it avoids any heavy coding or new software. I compared the texts sentence by sentence rather than line by line, but you could probably adapt this to work line by line if you needed to.
Here are the steps I used:
1. Copy and paste manuscript 1 into a new document in MS Word. I used "Keep Text Only" as the paste setting, since I didn't need to compare formatting.
2. In MS Word, I used Find and Replace to double each paragraph break, then turn each sentence into a separate paragraph. To double the paragraph breaks:
Find: ^p
Replace: ^p^p
Replace all
(I did this part so that I would be able to see where new paragraphs actually start in the Excel comparison)
To turn each sentence into a separate paragraph:
Find: . (including the space after the period)
Replace: . ^p
Replace all
If your manuscript has any tabs in it, you will also want to get rid of them at this point, since they will mess up the Excel steps. To do this:
Find: ^t
Replace: (leave completely blank)
Replace all
If your manuscript has any tables in it, you will also want to remove them from this version, since they will mess up the Excel steps.
3. Copy and paste the reformatted manuscript 1 into a new Excel spreadsheet in column A. Each sentence will be in its own cell in a single column. There will be a blank cell between paragraphs.
4. Repeat steps 1 and 2 with manuscript 2. Then copy and paste manuscript 2 into the same Excel spreadsheet in column B, next to manuscript 1.
5. In column C, you are going to create a formula that tells you whether the text in columns A and B is an exact match. To do this, select cell C1 (row 1, column C), then enter:
=IF(EXACT(A1, B1), "Same", "Different")
After entering this formula in row 1, click the bottom right corner of this cell and drag to the bottom of the two manuscript columns so that every row compares the two texts. You can now see how manuscript 1 and manuscript 2 compare sentence by sentence in column C.
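For anyone who would rather script it, the steps above can be roughly sketched in Python. This is only an illustration under stated assumptions, not the procedure that was actually used: the function names `to_rows` and `compare_rows` are made up for this example, the sentence split imitates the Find and Replace on ". " (so it will trip on abbreviations in the same way), and each manuscript is assumed to be plain text with one paragraph per line.

```python
from itertools import zip_longest


def to_rows(text):
    """Imitate the Word steps: one sentence per row, blank row after each paragraph.

    Hypothetical helper; assumes one paragraph per line of plain text.
    """
    rows = []
    for paragraph in text.split("\n"):
        # Drop tabs (as in the ^t step) and surrounding whitespace.
        paragraph = paragraph.replace("\t", "").strip()
        if not paragraph:
            continue
        # Break after every ". " like the Find and Replace step.
        parts = paragraph.split(". ")
        sentences = [p + "." for p in parts[:-1]] + [parts[-1]]
        rows.extend(s for s in sentences if s)
        rows.append("")  # blank row marks the paragraph break
    return rows


def compare_rows(rows_a, rows_b):
    """Imitate column C: case-sensitive exact match, row by row."""
    return [
        (a, b, "Same" if a == b else "Different")
        for a, b in zip_longest(rows_a, rows_b, fillvalue="")
    ]


if __name__ == "__main__":
    manuscript_1 = "The cat sat. It purred.\nNew paragraph here."
    manuscript_2 = "The cat sat. It slept.\nNew paragraph here."
    for a, b, status in compare_rows(to_rows(manuscript_1), to_rows(manuscript_2)):
        print(f"{status:10} | {a} | {b}")
```

Like the spreadsheet, this compares strictly by position, so a sentence added or removed early on shifts everything after it.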
Some other things that can make this easier:
To make a quick visual comparison of which sentences are the same or different, you can also use Conditional Formatting to make column C change color based on whether it says "Same" or "Different." If you'd like some detailed instructions on how to do this, LMK.
If whole sentences have been added or removed, the rows after that point will no longer line up. You may need to shift some cells up or down to make them line up again.
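The manual realignment could also be automated. The sketch below is one possible approach, not part of the Word and Excel method described here: it uses `difflib.SequenceMatcher` from the Python standard library to find runs of matching sentences and pads the gaps with blank cells. The function name `align_rows` is hypothetical, and the inputs are assumed to be lists with one sentence per item.

```python
import difflib


def align_rows(rows_a, rows_b):
    """Pad two sentence lists with blanks so matching sentences share a row.

    Hypothetical helper; returns (sentence_a, sentence_b, status) tuples.
    """
    aligned = []
    matcher = difflib.SequenceMatcher(None, rows_a, rows_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = rows_a[i1:i2]
        right = rows_b[j1:j2]
        # Make both sides the same height by adding blank cells.
        height = max(len(left), len(right))
        left += [""] * (height - len(left))
        right += [""] * (height - len(right))
        status = "Same" if tag == "equal" else "Different"
        aligned.extend((a, b, status) for a, b in zip(left, right))
    return aligned


if __name__ == "__main__":
    version_1 = ["One.", "Two.", "Three."]
    version_2 = ["One.", "Inserted.", "Two.", "Three."]
    for a, b, status in align_rows(version_1, version_2):
        print(f"{status:10} | {a} | {b}")
```

With this alignment, an inserted sentence shows up as a single "Different" row with a blank on one side, and the rows after it still match.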
Hope this helps.
Best wishes,
Nora Weston, MSLS
Access & Reference Services Librarian (Contractor)
NIEHS Library - Your Partner in Research
[log in to unmask]
984.287.3603
Pronouns: she/her/hers
-----Original Message-----
From: Code for Libraries [log in to unmask] On Behalf Of David Erlandson
Sent: Tuesday, June 28, 2022 8:51 AM
To: [log in to unmask]
Subject: [EXTERNAL] [CODE4LIB] Variora/Differences in Manuscripts
Hi all,
I have a colleague who is looking to track changes in the text of a manuscript that has 4 revisions. Apparently there are pretty major changes to the content and it would be great to identify them.
I was thinking through tools I'm familiar with (generally line by line comparisons) but that would seem to have the pitfall of an early large revision throwing off the comparison for the rest of the text. Another silly thought was to start up a local wiki instance and overlay each version; use the built in compare tools... Has anyone worked on a project like this? Or are there any tools built and ready to go? Any guidance would be appreciated.
Thanks,
Dave
_________________________________________________________________________
David Erlandson | Metadata Analyst | Rice University - Fondren Library
Email: [log in to unmask] | Voice: 713.348.3727 | Fax: 713.348.5862