About a year ago I began playing around with MD5/SHA1 digests to check whether files were being copied correctly between drives. Over many tens of thousands of files I found that they weren't: there were always a few files that Windows reported as copied OK, but which generated a different digest. For those unaware, a digest is a signature string computed from a file's contents. If even one bit is different in a copy, the digest will not match.
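As a minimal illustration of that property (not part of the scripts themselves, which rely on ExactFile), the sketch below hashes two in-memory byte arrays that differ by a single bit. It assumes PowerShell 3 or later and uses the .NET SHA1 class; the helper name Get-Sha1Hex is made up for this example.

```powershell
# Compute the SHA-1 digest of a byte array and return it as lowercase hex.
# Get-Sha1Hex is a hypothetical helper name used only in this example.
function Get-Sha1Hex([byte[]]$Bytes) {
    $sha1 = [System.Security.Cryptography.SHA1]::Create()
    try {
        $hash = $sha1.ComputeHash($Bytes)
    } finally {
        $sha1.Dispose()
    }
    # BitConverter yields "AA-BB-..."; strip the dashes and lowercase the result.
    return ([System.BitConverter]::ToString($hash) -replace '-', '').ToLowerInvariant()
}

$original = [System.Text.Encoding]::ASCII.GetBytes('The quick brown fox jumps over the lazy dog')
$copy     = [byte[]]$original.Clone()
$copy[0]  = $copy[0] -bxor 1      # flip the lowest bit of the first byte

$digestOriginal = Get-Sha1Hex $original
$digestCopy     = Get-Sha1Hex $copy

"original: $digestOriginal"
"copy:     $digestCopy"
"match:    $($digestOriginal -eq $digestCopy)"
```

The two digests share no obvious resemblance even though the inputs differ by one bit, which is what makes a digest useful for spotting a bad copy.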
I have developed two Windows PowerShell scripts, which I'm happy to share here on the proviso that they are unsupported and you use them at your own risk. I don't have time to publicly maintain them, or even to document them very well, but as they are working for me I think others might like to know about them.
The basic process I use is:
1. Work with files on master copy drive.
2. Generate SHA1 digest for all files on a directory-by-directory basis. This results in a single checksum file per directory.
3. Sync from the master copy drive to a backup copy drive using SyncBack.
4. Compare the SHA1 digest for all files on the backup copy drive.
5. Recopy and repeat step 4 as necessary.
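The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in the built-in Get-FileHash cmdlet (PowerShell 4 or later) for ExactFile, uses a made-up manifest format and file name (manifest.sha1) rather than checksums.exf, and stands in for the sync tool with Copy-Item. The function names are hypothetical.

```powershell
# Step 2: write one manifest per directory, "<sha1-hex>  <file name>" per file.
function New-Sha1Manifest([string]$Directory, [string]$ManifestName = 'manifest.sha1') {
    $lines = @(Get-ChildItem -LiteralPath $Directory -File |
        Where-Object { $_.Name -ne $ManifestName } |
        Sort-Object Name |
        ForEach-Object {
            $hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA1).Hash.ToLowerInvariant()
            "$hash  $($_.Name)"
        })
    Set-Content -LiteralPath (Join-Path $Directory $ManifestName) -Value $lines
}

# Step 4: re-hash the files in a copied directory; return names that are missing or differ.
function Test-Sha1Manifest([string]$Directory, [string]$ManifestName = 'manifest.sha1') {
    $failed = @()
    foreach ($line in (Get-Content -LiteralPath (Join-Path $Directory $ManifestName))) {
        $expected, $name = $line -split '  ', 2
        $file = Join-Path $Directory $name
        if (-not (Test-Path -LiteralPath $file)) { $failed += $name; continue }
        $actual = (Get-FileHash -LiteralPath $file -Algorithm SHA1).Hash.ToLowerInvariant()
        if ($actual -ne $expected) { $failed += $name }
    }
    return $failed
}

# Demo in a throwaway temp folder with hypothetical 'master' and 'backup' directories.
$root   = Join-Path ([System.IO.Path]::GetTempPath()) ("sha1demo-" + [guid]::NewGuid())
$master = Join-Path $root 'master'
$backup = Join-Path $root 'backup'
New-Item -ItemType Directory -Path $master, $backup | Out-Null
Set-Content -LiteralPath (Join-Path $master 'a.txt') -Value 'alpha'
Set-Content -LiteralPath (Join-Path $master 'b.txt') -Value 'bravo'

New-Sha1Manifest $master                                              # step 2
Copy-Item -Path (Join-Path $master '*') -Destination $backup          # step 3 (stand-in for the sync)
Set-Content -LiteralPath (Join-Path $backup 'b.txt') -Value 'bravo'   # simulate a bad copy
$failed = @(Test-Sha1Manifest $backup)                                # step 4
"Files needing a recopy: $($failed -join ', ')"                       # step 5 would recopy these

Remove-Item -LiteralPath $root -Recurse -Force
```

Because the manifest is copied along with the data, the backup drive can be checked on its own without the master drive attached.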
The scripts use the command-line version of ExactFile (free download) to generate the digests. Modify each script to point to its install location. You will need to give your PC permission to run the scripts under PowerShell using the Set-ExecutionPolicy RemoteSigned command. I figure if you care enough to run these scripts you'll be able to work it out.
Note: By their very nature the scripts will create (and sometimes delete) files on your drives, though only those relevant to the process.
To run a script in Windows, right-click it and choose "Run with PowerShell", or run it from within PowerShell by typing its path, as in ./[path]/hashcheck.ps1
Both scripts are smart and do their best to avoid repeating work: the creation script only recreates a checksum file if the directory has changed. Run hashcreate.ps1 to create the checksum files and hashcheck.ps1 to check them. The check script creates two files in the directory you specify for checking: checksums-ok.txt and checksums-fail.txt. Across 1TB you don't want to be re-checking directories that have already passed muster, so directories listed in checksums-ok.txt are skipped. To re-check everything, simply delete the checksums-ok.txt file. In checksums-fail.txt you'll find a list of all failed files. Once they are corrected, re-run the check script and the whole directory will be rechecked.
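The "only if changed" idea can be pictured as a small predicate. The sketch below is a deliberately simplified stand-in, not the logic of hashcreate.ps1 itself: it only asks whether the checksum file is missing or older than some file beside it, and it assumes PowerShell 3 or later. The function name Test-ChecksumStale is hypothetical.

```powershell
# Returns $true when the directory's checksum file is missing or older than a file next to it.
function Test-ChecksumStale([string]$Directory, [string]$ChecksumName = 'checksums.exf') {
    $checksum = Join-Path $Directory $ChecksumName
    if (-not (Test-Path -LiteralPath $checksum)) { return $true }   # never hashed

    $checksumTime = (Get-Item -LiteralPath $checksum).LastWriteTime
    $newerFiles = @(Get-ChildItem -LiteralPath $Directory -File |
        Where-Object { $_.Name -ne $ChecksumName -and $_.LastWriteTime -gt $checksumTime })
    return ($newerFiles.Count -gt 0)   # something was written after the checksum file
}

# Demo in a throwaway temp folder; timestamps are set explicitly so the result is repeatable.
$dir  = Join-Path ([System.IO.Path]::GetTempPath()) ("staledemo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$data = Join-Path $dir 'data.bin'
$exf  = Join-Path $dir 'checksums.exf'

Set-Content -LiteralPath $data -Value 'payload'
$before = Test-ChecksumStale $dir                       # no checksum file yet

Set-Content -LiteralPath $exf -Value '; placeholder'    # stand-in for a real checksum file
(Get-Item -LiteralPath $data).LastWriteTime = (Get-Date).AddMinutes(-10)
$fresh = Test-ChecksumStale $dir                        # checksum file is the newest file

(Get-Item -LiteralPath $data).LastWriteTime = (Get-Date).AddMinutes(10)
$afterEdit = Test-ChecksumStale $dir                    # data file is now newer

"before hashing: $before; after hashing: $fresh; after an edit: $afterEdit"
Remove-Item -LiteralPath $dir -Recurse -Force
```

A timestamp test like this is cheap but cannot see a file that was added with an old date, which is one reason a real script may also compare the directory's own modification time and the number of entries in the checksum file.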
## Creation script
- Copy this code into a text editor and save as hashcreate.ps1
- Enter one root directory when prompted (as written, this script does not split its input into several directories). Checksums will be created for the directory you specify and all below it
- If run from the PowerShell command line, the -clean parameter will force a delete of all existing checksums.exf files
THIS CODE IS PROVIDED WITHOUT WARRANTY. USE AT YOUR OWN RISK.

```powershell
param ([switch]$clean)

# Hash every file in $path with ExactFile, writing checksums.exf into that directory.
function runExf($path)
{
    $timeStart = Get-Date
    [double]$sizeMB = 0
    $contents = Get-ChildItem -LiteralPath $path | where {!$_.PSIsContainer}
    if( ($contents | Measure-Object).count -gt 0 ) {
        $sizeMB = ($contents | Measure-Object -property length -sum).sum / 1MB
        $s = "; Hashing {0:N0} files totalling {1:N3} MB" -f ($contents | Measure-Object).count, $sizeMB
        Write-Host $s
    }
    Write-Host "; @ $timeStart"
    $exfOutput = "checksums.exf"

    ## Adjust the next lines to point to your install of exf.exe
    $exe = "path to\exf.exe"
    if (-not (Test-Path -LiteralPath $exe)) {
        Write-Host -f white -b red "ExactFile cannot be found at $exe. Please check script."
        exit
    }

    $catchit = & $exe -sha1 -otf ("$exfOutput") -d "$path" *.*
    $catchit

    # "suc+essfully" accepts both "sucessfully" and "successfully" in the summary line
    if( $catchit -match "; [0-9]+ files suc+essfully hashed") {
        # Display the duration.
        $duration = (Get-Date)-$timeStart
        $s = "; {0:D2}:{1:D2}:{2:D2} at {3:N3} MB per minute`n" -f $duration.Hours, $duration.Minutes, $duration.Seconds, ($sizeMB/($duration.TotalSeconds/60))
        Write-Host $s
        return
    }
    if( $catchit -notmatch "; [\*]{3} No Files Found") {
        # Something has gone wrong with the hashing. Exit.
        Write-Host -f white -b red "An error has occurred. Exiting. Please correct and re-run."
        exit
    }
}

# Decide whether a directory needs (re)hashing and call runExf if it does.
function hashDir($path, [switch]$isRoot)
{
    if( $isRoot ) {
        $statusPath = $path
    } else {
        $statusPath = $path.Replace($startDir,"")
    }

    # Take the most recently written file and keep it only if it is checksums.exf
    $files = Get-ChildItem -LiteralPath $path | where {!$_.PSIsContainer} | sort lastwritetime -descending | Select-Object -first 1 | where {$_.name -eq "checksums.exf"}
    if (-not $files) {
        # checksums.exf is missing from the directory, or is not the latest file in it
        Write-Host -f red "`n[??]" $statusPath
        runExf -path $path
    } else {
        if ($files.lastwritetime -lt $files.directory.lastwritetime ) {
            # Something has modified the directory, even though checksums.exf is the latest file.
            # Could be a new file added, which has an older date.
            Write-Host -f red "`n[++]" $statusPath
            runExf -path $path
        } else {
            # checksums.exf is the latest file, and there are no directory modifications.
            # Still have to check that the lines in the file match the files in the directory.
            # Perhaps hashing was aborted due to halting or a system crash.
            $linecount = (Get-Content -LiteralPath $files.fullname | where {$_ -match "^[a-f0-9]{40}\s\?SHA1"} | Measure-Object).count
            # Subtract 1 so that checksums.exf itself is not counted
            $filecount = (Get-ChildItem -LiteralPath $path -force | where {!$_.PSIsContainer} | Measure-Object).count - 1
            if ($linecount -ne $filecount) {
                Write-Host -f red "`n[ne]" $statusPath
                runExf -path $path
            } else {
                Write-Host -f green "[ok]" $statusPath
            }
        }
    }
}

$startDir = Read-Host "Please enter a root (starting) directory"
if($startDir.length -eq 0) {
    $startDir = (pwd).path
    Write-Host "Defaulting root directory to $startDir"
}
$timeStart = Get-Date

if($clean) {
    Write-Host "Removing all existing checksums.exf files from directory and subdirectories"
    Get-ChildItem $startDir -include checksums.exf -Recurse | Remove-Item
    exit
}

if(Test-Path -LiteralPath $startDir) {
    Write-Host "Checking root directory"
    hashDir -path $startDir -isRoot
    Write-Host "`nGathering sub-directories. Please wait."
    $subdirs = @(Get-ChildItem $startDir -Recurse | where {$_.PSIsContainer} | sort fullname)
    Write-Host $subdirs.count "sub-directories found under $startDir`n"
    foreach ($subdir in $subdirs) {
        hashDir -path $subdir.fullname
    }
    $duration = (Get-Date)-$timeStart
    $status = "`n`nProcess complete; {0:D2}:{1:D2}:{2:D2} (duration)" -f $duration.Hours, $duration.Minutes, $duration.Seconds
    Write-Host $status
} else {
    Write-Host -f red "'$startDir' does not exist. Nothing to do."
}

# The ReadKey functionality is only supported at the console (not in the ISE)
if (!$psISE)
{
    Write-Host "Process complete"
    Write-Host -NoNewLine "Press any key to continue ..."
    $null = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
    Write-Host ""
}

# Store for quick clean
# get-childitem -include checksums-ok.txt -recurse | remove-item
# get-childitem -include checksums-fail.txt -recurse | remove-item
```
## Checking script
- Copy this code into a text editor and save as hashcheck.ps1
- You can specify multiple directories when prompted by separating each with a space (which also means a directory path containing spaces cannot be entered this way). Checksums will be checked for the directory you specify and all below it. Any failures will be listed in checksums-fail.txt
THIS CODE IS PROVIDED WITHOUT WARRANTY. USE AT YOUR OWN RISK.

```powershell
# Verify the files in $dir against its checksums.exf and return ExactFile's output.
function runExf($dir)
{
    $timeStart = Get-Date
    [double]$sizeMB = 0
    # Files in this directory, excluding the checksum file itself
    $contents = Get-ChildItem -LiteralPath $dir | where {!$_.PSIsContainer} | where {$_.name -ne "checksums.exf"}
    if( ($contents | Measure-Object).count -gt 0 ) {
        $sizeMB = ($contents | Measure-Object -property length -sum).sum / 1MB
        if( ($contents | Measure-Object).count -eq 1 ) {
            $s = "; Checking 1 file totalling {0:N3} MB" -f $sizeMB
        } else {
            $s = "; Checking {0:N0} files totalling {1:N3} MB" -f ($contents | Measure-Object).count, $sizeMB
        }
        Write-Host $s
        Write-Host $timeStart
    }

    ## Adjust the next lines to point to your install of exf.exe
    $exe = "path to\exf.exe"
    if (-not (Test-Path -LiteralPath $exe)) {
        Write-Host -f white -b red "ExactFile cannot be found at $exe. Please check script."
        exit
    }

    $catchit = & $exe -c (Join-Path -path $dir "checksums.exf")
    # Emitting $catchit makes it the function's return value (Write-Host output is not captured)
    $catchit

    $duration = (Get-Date)-$timeStart
    $s = "; {0:D2}:{1:D2}:{2:D2} at {3:N3} MB per minute" -f $duration.Hours, $duration.Minutes, $duration.Seconds, ($sizeMB/($duration.TotalSeconds/60))
    Write-Host $s
}

# Check one directory and record the result in checksums-ok.txt or checksums-fail.txt.
function hashDir($path)
{
    # -LiteralPath stops a [ or ] in a directory name being treated as a wildcard pattern
    if (Test-Path -LiteralPath (Join-Path -path $path "checksums.exf")) {
        Write-Host "`n[in]" $path
        $res = runExf -dir $path
        if (($res -match "^No errors") -or ($res -match "^No files found")) {
            $path | Out-File -filepath $outputOK -append
            Write-Host -f green "[ok]" $path
        } else {
            ("[in] " + $path) | Out-File -filepath $outputFAIL -append
            $res | Out-File -filepath $outputFAIL -append
            "" | Out-File -filepath $outputFAIL -append
            foreach ($r in $res) {Write-Host -f red $r}
            Write-Host ""
        }
    } else {
        # No checksums.exf in this directory, so there is nothing to check against
        ("[??] " + $path) | Out-File -filepath $outputFAIL -append
        "" | Out-File -filepath $outputFAIL -append
        Write-Host -f red "`n[??]" $path
    }
}

# Print the contents of checksums-fail.txt, if any failures were recorded.
function listFails() {
    if (Test-Path $outputFAIL) {
        Write-Host -f red "`nListing errors directory by directory`n"
        foreach($fail in (Get-Content $outputFAIL)) {
            Write-Host -f red $fail
        }
    }
}

$failsExist = $false
$startDirs = Read-Host "Please enter the root (starting) directories"
foreach ($startDir in $startDirs.split(" ")) {
    if($startDir.length -eq 0) {
        $startDir = (pwd).path
        Write-Host "Defaulting root directory to $startDir"
    }
    $outputOK = Join-Path -path $startDir "checksums-ok.txt"
    $outputFAIL = Join-Path -path $startDir "checksums-fail.txt"
    # Start each run with a fresh failure list
    if( Test-Path $outputFAIL ) {
        Remove-Item $outputFAIL
    }
    $timeStart = Get-Date

    if(Test-Path -LiteralPath $startDir) {
        # Directories already listed in checksums-ok.txt are skipped
        if( Test-Path $outputOK) {
            $okfiles = Get-Content $outputOK
        } else {
            $okfiles = $null
        }
        if(!($okfiles | where {$_ -eq $startDir})) {
            hashDir -path $startDir
        }
        Write-Host "`nGathering sub-directories. Please wait."
        $subdirs = @(Get-ChildItem $startDir -Recurse | where {$_.PSIsContainer} | sort fullname)
        Write-Host $subdirs.length "sub-directories found under $startDir"
        foreach ($subdir in $subdirs) {
            if( $okfiles | where {$_ -eq $subdir.fullname })
            {
                Write-Host -f green "[ok]" $subdir.fullname
            } else {
                hashDir -path $subdir.fullname
            }
        }
        listFails
        $duration = (Get-Date)-$timeStart
        $status = "`n`nProcess complete; {0:D2}:{1:D2}:{2:D2} (duration)" -f $duration.Hours, $duration.Minutes, $duration.Seconds
        Write-Host $status
    } else {
        Write-Host -f red "'$startDir' does not exist. Nothing to do."
    }
}

# The ReadKey functionality is only supported at the console (not in the ISE)
if (!$psISE)
{
    Write-Host "Process complete"
    Write-Host -NoNewLine "Press any key to continue ..."
    $null = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
    Write-Host ""
}
```