On Jun 28, 2022, at 8:51 AM, David Erlandson <[log in to unmask]> wrote:

> I have a colleague who is looking to track changes in the text of a manuscript that has 4 revisions. Apparently there are pretty major changes to the content and it would be great to identify them.
> 
> I was thinking through tools I'm familiar with (generally line-by-line comparisons) but that would seem to have the pitfall of an early large revision throwing off the comparison for the rest of the text. Another silly thought was to start up a local wiki instance and overlay each version; use the built-in compare tools... Has anyone worked on a project like this? Or are there any tools built and ready to go? Any guidance would be appreciated.


If I understand the question correctly, then I believe you need to do what is sometimes called "collocation", and I used a JavaScript library to accomplish a similar task. The library is called TRAViz [1].

More specifically, I had two sets of files, and each set was a translation of the Psalms: one translated in 1610 and the other translated in 1700. [2] I wanted to see how the translations were similar and different. The files in each set were similarly named. I then wrote a Python script that loops through the translations and outputs an HTML file. [3] The HTML file is highly structured, calls TRAViz, and outputs a visualization illustrating where the two translations differ and converge. You can temporarily see the results of these labors online, but be forewarned: TRAViz is doing a lot of work against many paragraphs, so rendering is slow. [4]
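
For anyone who wants to try the general shape of this without TRAViz, here is a minimal sketch. It is not the psalms2html.py script linked below, and it swaps TRAViz for Python's standard difflib module, so the output is a plain inline diff rather than a variant graph. The function names, the "*.txt" file pattern, and the directory layout are all assumptions made for illustration. It compares word by word instead of line by line, so a large early revision shows up as one insertion and the comparison picks up again afterwards.

```python
import difflib
import html
from pathlib import Path


def word_diff_html(old_text, new_text):
    """Return an HTML fragment marking words removed from old_text and added in new_text."""
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False keeps frequent words from being ignored in long texts
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(old_words[i1:i2])))
            continue
        if i2 > i1:  # words present only in the old version
            parts.append("<del>" + html.escape(" ".join(old_words[i1:i2])) + "</del>")
        if j2 > j1:  # words present only in the new version
            parts.append("<ins>" + html.escape(" ".join(new_words[j1:j2])) + "</ins>")
    return " ".join(parts)


def compare_directories(dir_a, dir_b, out_dir):
    """Write one HTML file per file name found in both directories; return the names written.

    Assumes (hypothetically) that both directories hold UTF-8 *.txt files
    and that matching versions share the same file name.
    """
    dir_a, dir_b, out_dir = Path(dir_a), Path(dir_b), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path_a in sorted(dir_a.glob("*.txt")):
        path_b = dir_b / path_a.name
        if not path_b.exists():
            continue  # no counterpart with the same name, so nothing to compare
        body = word_diff_html(
            path_a.read_text(encoding="utf-8"),
            path_b.read_text(encoding="utf-8"),
        )
        title = html.escape(path_a.stem)
        page = f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"
        out_name = path_a.stem + ".html"
        (out_dir / out_name).write_text(page, encoding="utf-8")
        written.append(out_name)
    return written
```

For a manuscript with four revisions, one could call compare_directories() on each consecutive pair of revisions (1 and 2, 2 and 3, 3 and 4) and read the resulting pages in order.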

HTH

[1] TRAViz - http://www.traviz.vizcovery.org
[2] Psalms - http://dh.crc.nd.edu/tmp/collocations/psalms/
[3] Python script - http://dh.crc.nd.edu/tmp/collocations/bin/psalms2html.py
[4] results - http://dh.crc.nd.edu/tmp/collocations/html/

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

574/631-8604
https://cds.library.nd.edu