The DAM Forum
Topic: Scripts to verify media copied on Windows
David C. Buchan (Quantum Gardener)
« on: January 03, 2012, 11:23:33 PM »

About a year ago I began playing around with MD5/SHA1 digests to check whether files were being copied correctly between drives. Over many tens of thousands of files I found they weren't always: there were invariably a few files that Windows said were OK but which produced a different digest on the copy. For those unaware, a digest is a signature string computed from a file's contents. If even one bit is different in a copy, the digest will not match.
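If you just want to see the idea in action, newer versions of PowerShell (4.0 and later) ship a Get-FileHash cmdlet; something along these lines compares one original against its copy. The paths here are placeholders, and my scripts below use ExactFile rather than Get-FileHash.

Code:
# Minimal illustration of digest comparison (not part of the scripts below)
$orig = Get-FileHash -Algorithm SHA1 "E:\Master\IMG_0001.CR2"
$copy = Get-FileHash -Algorithm SHA1 "F:\Backup\IMG_0001.CR2"
if ($orig.Hash -eq $copy.Hash) { "Copy verified" } else { "Digest mismatch!" }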

I have developed two Windows PowerShell scripts which I'm happy to share here on the proviso that they are unsupported and you use them at your own risk. I don't have time to publicly maintain them, nor even to document them very well, but since they are working for me I think others might like to know about them.

The basic process I use is:

1. Work with files on the master copy drive.
2. Generate a SHA1 digest for all files on a directory-by-directory basis. This results in a single checksum file per directory.
3. Sync from the master copy drive to a backup copy drive using SyncBack.
4. Compare the SHA1 digests for all files on the backup copy drive.
5. Recopy and repeat step 4 as necessary (a command-level sketch of the whole cycle follows).
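Roughly, one pass of that cycle looks like the following from a PowerShell prompt. The drive letters and prompts are only examples, and SyncBack itself is a GUI application, so step 3 isn't scripted here.

Code:
# Step 2: create/refresh a checksums.exf file in each new or changed directory on the master
.\hashcreate.ps1            # enter E:\Photos (for example) when prompted

# Step 3: mirror the master to the backup drive with SyncBack (run its profile manually)

# Step 4: verify the backup against the checksums.exf files copied across with it
.\hashcheck.ps1             # enter F:\Photos (for example) when prompted

# Step 5: fix anything listed in F:\Photos\checksums-fail.txt, then repeat step 4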

The scripts use the command-line version of ExactFile (a free download) to generate the digests. Modify each script to point to its install location. You will also need to give your PC permission to run the scripts under PowerShell using the Set-ExecutionPolicy RemoteSigned command. I figure if you care enough to run these scripts you'll be able to work that out. Note: by their very nature they will create (and sometimes delete) files on your drives, though only those relevant to the process.
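For reference, the execution-policy change is a one-off command run from an (elevated) PowerShell prompt:

Code:
# Allow locally written, unsigned scripts to run; downloaded scripts must still be signed
Set-ExecutionPolicy RemoteSigned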

In Windows Explorer, right-click the script and choose "Run with PowerShell", or run it from within PowerShell by path, as in .\[path]\hashcheck.ps1

Both scripts are smart and do their best to only recreate the checksum file if a directory has changed. Run hashcreate.ps1 to create the files and hashcheck.ps1 to check them. The second script creates two files in the directory you specify for checking: checksums-ok.txt and checksums-fail.txt. Across 1 TB you don't want to be checking directories that have already passed muster. To re-check them anyway, simply delete the checksums-ok.txt file (an example follows). In checksums-fail.txt you'll find a list of all failed files. Once corrected, re-run the check script and the whole directory will be re-checked.
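For example, to force everything under a backup root to be re-verified on the next run (the path is an example; the equivalent one-liners are also noted as comments at the end of the creation script):

Code:
# Delete the "already passed" marker so hashcheck.ps1 re-checks every directory under this root
Get-ChildItem F:\Photos -Include checksums-ok.txt -Recurse | Remove-Item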

## Creation script
  • Copy this code into a text editor and save it as hashcreate.ps1.
  • You can specify multiple directories when prompted by separating each with a space. Checksums will be created for the directory you specify and all below it.
  • If run from the PowerShell command line, the -clean parameter will force a delete of all existing checksums.exf files.
THIS CODE IS PROVIDED WITHOUT WARRANTY. USE AT YOUR OWN RISK.

Code:
param ([switch]$clean)

function runExf($path)
{   
    $timeStart = Get-Date
    $contents = Get-ChildItem -LiteralPath $path | where {!$_.PSIsContainer}
    if( ($contents | Measure-Object).count -gt 0 ) {
        [double]$sizeMB = ($contents | Measure-Object -property length -sum).sum / 1MB
        $s = "; Hashing {0:N0} files totalling {1:N3} MB" -f ($contents | Measure-Object).count, $sizeMB
        Write-Host $s
    }
    write-host "; @ $timestart"
    $exfOutput = "checksums.exf"
   
    ## Adjust the next lines to point to your install of exf.exe
    $exe = "path to\exf.exe"
    if ((test-path($exe)) -eq $False) {
        Write-Host -f white -b red "ExactFile cannot be found at $exe. Please check script."
        exit
    }
    $catchit = &$exe -sha1 -otf ("$exfOutput") -d "$path" *.*
    $catchit
    if( $catchit -match "; [0-9]+ files sucessfully hashed") {
        # Display the duration.
        $duration = (Get-Date)-$timeStart
        $s = "; {0:D2}:{1:D2}:{2:D2} at {3:N3} MB per minute`n" -f $duration.Hours, $duration.Minutes, $duration.Seconds, ($sizeMB/($duration.TotalSeconds/60))
        Write-Host $s
        Return
    }
    if( $catchit -notmatch "; [\*]{3} No Files Found") {
        # Something has gone wrong with the hashing. Exit.
        Write-Host -f white -b red "An error has occurred. Exiting. Please correct and re-run."
        Exit
    }
}

function hashDir($path, [switch]$isRoot)
{
    if( $isRoot ) {
        $statusPath = $path
    } else {
        $statusPath = $path.Replace($startDir,"")
    }
    # Pick up checksums.exf only if it is the most recently written file in the directory
    $files = Get-ChildItem -LiteralPath $path | where {!$_.PSIsContainer} | sort lastwritetime -descending | Select-Object -first 1 | where {$_.name -eq "checksums.exf"}
    if (-not $files) {
        # checksums.exf is missing from the directory (or is not the newest file)
        Write-Host -f red "`n[??]" $statusPath
        runExf($path)
    } else {
        if ($files.lastwritetime -lt $files.directory.lastwritetime ) {
            # Something in the directory has been modified even though checksums.exf is the latest file.
            # Could be a new file added which carries an older date.
            Write-Host -f red "`n[++]" $statusPath
            runExf($path)
        } else {
            # checksums.exf is the latest file and there are no directory modifications.
            # Still check that the line count in the file matches the file count in the directory;
            # hashing may have been aborted by a halt or system crash.
            $linecount = (Get-Content -LiteralPath $files.fullname | where {$_ -match "^[a-f0-9]{40}\s\?SHA1"} | Measure-Object).count
            $filecount = (Get-ChildItem -LiteralPath $path -force | where {!$_.PSIsContainer} | Measure-Object).count - 1
            if ($linecount -ne $filecount) {
                Write-Host -f red "`[ne]" $statusPath
                runExf($path)
            } else {
                Write-Host -f green "[ok]" $statusPath
            }
        }
    }
}

$startDir = read-host "Please enter a root (starting) directory"

if($startDir.length -eq 0) {
    $startDir = (pwd).path
    Write-Host "Defaulting root directory to $startDir"
}


$timeStart = get-date

if($clean) {
    Write-Host "Removing all existing checksums.exf files from directory and subdirectories"
    Get-ChildItem $startDir -include checksums.exf -Recurse | Remove-Item
    Exit
}

if(Test-Path -LiteralPath $startDir) {
    Write-Host "Checking root directory"
    hashDir -path $startDir -isRoot
    Write-Host "`nGathering sub-directories. Please wait."
    $subdirs = @(Get-ChildItem $startDir -Recurse | where {$_.PSIsContainer} | Sort fullname)
    Write-Host $subdirs.count "sub-directories found under $startDir`n"
    foreach ($subdir in $subdirs) {
        hashDir -path $subdir.fullname
    }

    $duration = (Get-Date)-$timeStart
    $status = "`n`nProcess complete; {0:D2}:{1:D2}:{2:D2} (duration)" -f $duration.Hours, $duration.Minutes, $duration.Seconds
    Write-Host $status
} else {
    Write-Host -f red "'$startDir' does not exist. Nothing to do."
}

# The ReadKey functionality is only supported at the console (not in the ISE)

if (!$psISE)
{
    Write-Host "Process complete"
    Write-Host -NoNewLine "Press any key to continue ..."
    $null = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
    Write-Host ""
}

# Store for quick clean
# get-childitem -include checksums-ok.txt -recurse | remove-item
# get-childitem -include checksums-fail.txt -recurse | remove-item
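Once the path to exf.exe near the top of runExf has been edited, a typical run of the creation script looks something like this (the drive letter is only an example; note that -clean only deletes the existing checksums.exf files and then exits, so run the script again afterwards to regenerate them):

Code:
# Normal run: creates or refreshes checksums.exf in each new or changed directory
.\hashcreate.ps1            # enter E:\Photos when prompted

# Start from scratch: wipe all existing checksums.exf files under the root, then re-run
.\hashcreate.ps1 -clean     # enter E:\Photos when prompted
.\hashcreate.ps1            # enter E:\Photos again to rebuild the digests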

## Checking script
  • Copy this code into a text editor and save it as hashcheck.ps1.
  • You can specify multiple directories when prompted by separating each with a space. Checksums will be checked for the directory you specify and all below it. Any failures will be listed in checksums-fail.txt.

THIS CODE IS PROVIDED WITHOUT WARRANTY. USE AT YOUR OWN RISK.

Code:
function runExf($dir)
{
    $timeStart = Get-Date
    # Measure the files to be checked (excluding the digest file itself)
    $contents = Get-ChildItem -LiteralPath $dir | where {!$_.PSIsContainer} | where {$_.name -ne "checksums.exf"}
    if( ($contents | Measure-Object).count -gt 0 ) {
        [double]$sizeMB = ($contents | Measure-Object -property length -sum).sum / 1MB
        if( ($contents | Measure-Object).count -eq 1 ) {
            $s = "; Checking 1 file totalling {0:N3} MB" -f $sizeMB
        } else {
            $s = "; Checking {0:N0} files totalling {1:N3} MB" -f ($contents | Measure-Object).count, $sizeMB
        }
        Write-Host $s
        Write-Host $timeStart
    }
    ## Adjust the next lines to point to your install of exf.exe
    $exe = "path to\exf.exe"
    if ((test-path($exe)) -eq $False) {
        Write-Host -f white -b red "ExactFile cannot be found at $exe. Please check script."
        exit
    }
    $catchIt = &$exe -c (Join-Path -path $dir "checksums.exf")
    $catchit
    $duration = (Get-Date)-$timeStart
    $s = "; {0:D2}:{1:D2}:{2:D2} at {3:N3} MB per minute" -f $duration.Hours, $duration.Minutes, $duration.Seconds, ($sizeMB/($duration.TotalSeconds/60)) 
    Write-Host $s
}

function hashDir($path)
{
    # A [ or ] in the path will trigger wildcard matching in Test-Path. We need to escape them.
    # Expand this pattern for any other characters which trip up your directories.
    $litpath = $path.replace("`[", "```[")
    $litpath = $litpath.replace("`]", "```]")

    if (Test-Path -path $litpath\checksums.exf) {
        Write-Host "`n[in]" $path
        $res = runExf( $path )
        if (($res -match "^No errors") -or ($res -match "^No files found")) {
            # Everything verified (or nothing to verify) - record the directory as passed
            $path | out-file -filepath $outputOK -append
            Write-Host -f green "[ok]" $path
        } else {
            # One or more files failed - record the directory and the ExactFile output
            ("[in] " + $path) | out-file -filepath $outputFAIL -append
            $res | out-file -filepath $outputFAIL -append
            "" | out-file -filepath $outputFAIL -append
            foreach ($r in $res) {Write-Host -f red $r}
            Write-Host ""
        }
    } else {
        # No checksums.exf in this directory, so nothing can be verified here
        ("[??] " + $path) | out-file -filepath $outputFAIL -append
        "" | out-file -filepath $outputFAIL -append
        Write-Host -f red "`n[??]" $path
    }
}

function listFails() {
    if (Test-Path $outputFAIL) {
        Write-Host -f red "`nListing errors directory by directory`n"
        foreach($fail in (Get-Content $outputFAIL)) {
            Write-Host -f red $fail
        }
    }
}

$failsExist = $false

$startDirs = read-host "Please enter the root (starting) directories"

foreach ($startDir in $startDirs.split(" ")) {
    if($startDir.length -eq 0) {
        $startDir = (pwd).path
        Write-Host "Defaulting root directory to $startDir"
    }

    $outputOK = join-path -path $startdir "checksums-ok.txt"
    $outputFAIL = join-path -path $startdir "checksums-fail.txt"
    if( Test-Path $outputFAIL ) {
        Remove-Item $outputFAIL
    }

    $timeStart = get-date

    if(Test-Path -LiteralPath $startDir) {
        if( Test-Path $outputOK) {
            $okfiles = Get-Content $outputOK
        } else {
            $okfiles = $null
        }
        if(!($okfiles | where {$_ -eq $startDir})) {
            hashDir -path $startDir
        }
        Write-Host "`nGathering sub-directories. Please wait."
        $subdirs = @(Get-ChildItem $startDir -Recurse | where {$_.PSIsContainer} | Sort fullname)
        Write-Host $subdirs.length "sub-directories found under $startDir"
        foreach ($subdir in $subdirs) {
            if( $okfiles | where {$_ -eq $subdir.fullname })
            {
                Write-Host -f green "[ok]" $subdir.fullname
            } else {
                hashDir -path $subdir.fullname
            }
        }
        listFails

        $duration = (Get-Date)-$timeStart
        $status = "`n`nProcess complete; {0:D2}:{1:D2}:{2:D2} (duration)" -f $duration.Hours, $duration.Minutes, $duration.Seconds
        Write-Host $status
    } else {
        Write-Host -f red "'$startdir' does not exist. Nothing to do."
    }
}

# The ReadKey functionality is only supported at the console (not in the ISE)
if (!$psISE)
{
Write-Host "Process complete"
    Write-Host -NoNewLine "Press any key to continue ..."
    $null = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
    Write-Host ""
}
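
As with the creation script, edit the path to exf.exe first. Checking two backup roots in one run might look like this (paths are examples only):

Code:
.\hashcheck.ps1
# At the prompt, enter the roots separated by a space, e.g.:
#   F:\Photos G:\Photos
# Each root ends up with a checksums-ok.txt listing the directories that passed,
# and a checksums-fail.txt only if something failed or could not be checked.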