Just thought I'd pop my head in:
Turnitin does compare submissions against previous submissions (both at your own institution and at others), unless the submitter chooses not to include them in the repository for future checks.
Electronic Resources Librarian
The Wallace Center
Rochester Institute of Technology
[log in to unmask]
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Mark A. Matienzo
Sent: Friday, January 23, 2015 9:45 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Plagiarism checker
I believe Turnitin and SafeAssign both compare the text of submissions against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
I am not certain if they compare submissions against each other.
However, if you're looking for something along the lines of what Dre suggests, you could use ssdeep, which is an implementation of a piecewise hashing algorithm. The issue with that is that you would have to assume that all students are using the same file format.
You could also use something like Tika to extract the text content from all the submissions, and then compare them against each other.
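As a rough sketch of that last idea, assuming the text has already been extracted from each submission (by Tika or anything else), the pairwise comparison itself can be done with Python's standard library. The function name `similar_pairs`, the 0.8 threshold, and the sample file names are made up for illustration; the threshold in particular would need tuning against real assignments.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(texts, threshold=0.8):
    """Return (name_a, name_b, ratio) for every pair at or above threshold.

    `texts` maps a submission name to its already-extracted plain text.
    The threshold is an arbitrary starting point, not a recommended value.
    """
    flagged = []
    # combinations() yields each unordered pair exactly once.
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # Compare word sequences rather than characters: faster and less noisy.
        matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 3)))
    # Most similar pairs first, so they can be reviewed by hand.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data standing in for extracted submission text.
    submissions = {
        "alice.txt": "The court held that the contract was void for lack of consideration.",
        "bob.txt": "The court held that the contract was void for lack of any consideration.",
        "carol.txt": "Negligence requires duty, breach, causation, and damages.",
    }
    for name_a, name_b, ratio in similar_pairs(submissions):
        print(f"{name_a} vs {name_b}: {ratio}")
```

For 100 submissions this is 4,950 comparisons, which should be manageable for essay-length text. A high ratio only flags a pair for a closer look; it does not prove copying.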
Mark A. Matienzo <[log in to unmask]>
Director of Technology, Digital Public Library of America
On Fri, Jan 23, 2015 at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote:
> My first thought was something like programmatically doing a pairwise
> diff of the files, 4,950 times. I was surprised I couldn't find a
> utility that just does this.
> But I did find something called diffuse, which allows you to
> graphically compare any number of text files in a diff-like fashion.
> This would probably at least be able to tell you which files need closer scrutiny.
> I think you'd presumably have to be able to extract the text from each
> file; I doubt it would work on raw Word docs or PDFs, so that might be
> a stopper.
> It seems like the realm of source control has a lot of software
> designed to help with this problem, so there might be other similar things out there.
> But probably not anything designed to natively handle print-ready files.
>  http://diffuse.sourceforge.net/about.html
> On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose <[log in to unmask]> wrote:
> > Can anyone recommend a plagiarism checking software besides Turnitin
> > and SafeAssign? I need to compare about 100 student assignments
> > against each other to make sure they don't copy each other's assignments.
> > Thanks.
> > Judy K. Meirose
> > Systems Librarian
> > Florida Coastal School of Law
> > 8787 Baypine Rd
> > Jacksonville, FL
> > (904)680-7603