Detect Similar Text Content
This script uses an embedded copy of the ssdeep tool, originally written by Jesse Kornblum (https://ssdeep-project.github.io/ssdeep/index.html), to identify similarities in the textual content of tagged items or of the entries selected in the current view. The items do not have to be of the same type.
The script was primarily designed to assist with the identification of plagiarized content and/or forged documents.
The user must nominate a base output folder (typically the current case's export folder), in which the script will create a new sub-folder for the current analysis.
The script will extract the transcript data from the target items into a sub-folder of the analysis folder and then have ssdeep analyze it using the -d -r -c command-line parameters (directory mode, recursive, CSV output).
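The ssdeep invocation described above can be sketched in Python as follows. This is only an illustration, not the script's own code: the function names, the paths, and the choice to capture ssdeep's standard output into a file are all assumptions. Only the -d, -r and -c switches come from the description.

```python
import subprocess
from pathlib import Path


def build_ssdeep_command(ssdeep_exe, transcripts_dir):
    """Return the argument list for a directory-mode ssdeep comparison.

    Both parameters are hypothetical names used for this sketch.
    """
    return [
        str(ssdeep_exe),
        "-d",  # directory mode: compare the files against each other
        "-r",  # recurse into sub-folders
        "-c",  # print the results in CSV format
        str(transcripts_dir),
    ]


def run_ssdeep(ssdeep_exe, transcripts_dir, results_csv):
    """Run ssdeep and store whatever it prints in results_csv.

    Assumption: the CSV results are taken from standard output.
    """
    command = build_ssdeep_command(ssdeep_exe, transcripts_dir)
    with open(results_csv, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
    return Path(results_csv)
```

Keeping the command construction separate from the call that runs it makes it possible to check the switches without having the ssdeep executable available.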
Each transcript is written to a text file named using the primary-device GUID, device GUID, and item GUID of the source item.
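A naming scheme of this kind might look like the sketch below. The underscore separator, the .txt extension and both function names are assumptions made for illustration; the description only states that the three GUIDs are used in the file name.

```python
def transcript_file_name(primary_device_guid, device_guid, item_guid):
    """Build a transcript file name from the three GUIDs.

    Assumption: the GUIDs are joined with underscores and given a .txt
    extension. The real script may format the name differently.
    """
    return "_".join((primary_device_guid, device_guid, item_guid)) + ".txt"


def parse_transcript_file_name(name):
    """Recover the three GUIDs from a name built by transcript_file_name."""
    stem = name.rsplit(".", 1)[0]  # drop the extension
    primary_device_guid, device_guid, item_guid = stem.split("_")
    return primary_device_guid, device_guid, item_guid
```

Encoding all three GUIDs in the name is what allows each ssdeep result to be traced back to the source item without a separate lookup table.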
The ssdeep tool will write its results to a CSV-file in the analysis folder. The script will read this file and bookmark the results.
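Reading the results back could be sketched as follows. This assumes each CSV row holds two file paths followed by a match score from 0 to 100, and it skips any row that does not fit that shape. The function name and the minimum_score filter are hypothetical and are not described as features of the script.

```python
import csv
import io


def read_matches(csv_text, minimum_score=0):
    """Return (first_file, second_file, score) tuples from ssdeep CSV text.

    Assumption: each row is "file1","file2",score. Rows with any other
    shape are ignored rather than treated as errors.
    """
    matches = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3 or not row[2].strip().isdigit():
            continue
        first_file, second_file, score = row[0], row[1], int(row[2])
        if score >= minimum_score:
            matches.append((first_file, second_file, score))
    return matches


sample = '"a.txt","b.txt",97\n"a.txt","c.txt",40\n'
print(read_matches(sample, minimum_score=50))
```

Each returned tuple identifies a pair of transcripts, which is the information needed to bookmark the two corresponding items together.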
The analysis folder and the bookmark folder will both be named 'Text Content Analysis' followed by a timestamp representing when the analysis was performed.
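The folder naming can be sketched as below. The timestamp format is an assumption, chosen because it avoids characters that are invalid in Windows folder names. The description gives only the 'Text Content Analysis' prefix, and the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def analysis_folder_name(when):
    """Return 'Text Content Analysis' plus a timestamp.

    Assumption: the timestamp uses the form YYYY-MM-DD HH-MM-SS.
    """
    return "Text Content Analysis " + when.strftime("%Y-%m-%d %H-%M-%S")


def create_analysis_folder(base_output_folder, when=None):
    """Create the analysis sub-folder under the nominated base folder."""
    when = when or datetime.now()
    folder = Path(base_output_folder) / analysis_folder_name(when)
    folder.mkdir(parents=True, exist_ok=False)  # fail rather than reuse a folder
    return folder
```

Using the same name for the analysis folder and the bookmark folder lets the examiner match a set of bookmarks to the files that produced it.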
Use of tags is advised to help the examiner distinguish between original files and those being analyzed for similar content; ssdeep has no way of knowing which is which.
The script will extract a copy of the ssdeep executable into the same folder when first run. If this fails, the examiner can place a copy of the executable into that folder manually.
Feedback will be provided via the console.
This script was developed for use in EnCase training.