I believe Turnitin and SafeAssign both compare the text of submissions
against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
I am not certain if they compare submissions against each other.
However, if you're looking for something along the lines of what Dre
suggests, you could use ssdeep, which is an implementation of a piecewise
hashing algorithm. The issue with that is that you would have to assume
that all students are using the same file format, since ssdeep hashes the
raw bytes of each file rather than its extracted text.
You could also use something like Tika to extract the text content from
all the submissions, and then compare them against each other.
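
As a rough sketch of that second approach: assuming the text has already
been extracted (by Tika or otherwise) into plain strings, something like the
Python below would score every pair of submissions using the standard
library's difflib. The function name, the 0.5 threshold, and the sample
data are all made up for illustration, and the extraction step is not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_similar_pairs(texts, threshold=0.5):
    """Return (score, name_a, name_b) for each pair scoring >= threshold.

    `texts` maps a submission name to its already-extracted plain text.
    The threshold is an arbitrary starting point, not a recommended value.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # Compare word sequences rather than characters, so differences in
        # whitespace and line wrapping do not affect the score.
        words_a = text_a.lower().split()
        words_b = text_b.lower().split()
        score = SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
        if score >= threshold:
            flagged.append((score, name_a, name_b))
    # Highest-scoring pairs first, since those need the closest look.
    return sorted(flagged, reverse=True)


# Hypothetical sample data standing in for extracted submission text.
submissions = {
    "alice.txt": "The court held that the contract was void for lack of consideration.",
    "bob.txt": "The court held that the contract was void for lack of any consideration.",
    "carol.txt": "Negligence requires duty, breach, causation, and damages.",
}

for score, name_a, name_b in rank_similar_pairs(submissions):
    print(f"{score:.2f}  {name_a}  {name_b}")
```

For 100 submissions that is 4,950 pairs, which should be manageable for
assignment-length documents. A high score only flags a pair for a closer
look; it is not proof of copying.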
Mark A. Matienzo <[log in to unmask]>
Director of Technology, Digital Public Library of America
On Fri, Jan 23, 2015 at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote:
> My first thought was something like programmatically doing a pairwise diff
> of the files, 5500 times. I was surprised I couldn't find a utility that
> just does this.
> But I did find something called diffuse, which allows you to graphically
> compare any number of text files in a diff-like fashion. This would
> probably at least be able to tell you which files need closer scrutiny.
> I think you'd presumably have to be able to extract the text from each
> file; I doubt it would work on raw Word docs or PDFs, so that might be a
> It seems like the realm of source control has a lot of software designed to
> help with this problem, so there might be other similar things out there.
> But probably not anything designed to natively handle print-ready files.
>  http://diffuse.sourceforge.net/about.html
> On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose <[log in to unmask]> wrote:
> > Can anyone recommend a plagiarism checking software besides Turnitin and
> > SafeAssign? I need to compare about 100 student assignments against each
> > other to make sure they don't copy each other's assignments.
> > Thanks.
> > Judy K. Meirose
> > Systems Librarian
> > Florida Coastal School of Law
> > 8787 Baypine Rd
> > Jacksonville, FL
> > (904)680-7603