function Check-Header
{
    param($path)

    # Hexadecimal signatures for expected files, stored as strings so they
    # can be compared with the hex string built below
    $pdf    = '25504446';
    $TIFF_1 = '492049';   # 3-byte signature; as written it cannot equal a 4-byte header string
    $TIFF_2 = '49492A00';
    $TIFF_3 = '4D4D002A';
    $TIFF_4 = '4D4D002B';

    # Get content of each file (up to 4 bytes) for analysis
    # (-Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead)
    ([Byte[]] $fileheader = Get-Content -Path $path -TotalCount 4 -Encoding Byte) |
    ForEach-Object {
        if(("{0:X}" -f $_).length -eq 1)
        {
            # Pad single-digit values so every byte yields two hex characters
            $HeaderAsHexString += "0{0:X}" -f $_
        }
        else
        {
            $HeaderAsHexString += "{0:X}" -f $_
        }
    }

    # Validate file header
    @($pdf, $tiff_1, $tiff_2, $tiff_3, $tiff_4) -contains $HeaderAsHexString
}
This function does a few things:
- Takes a file path argument
- Declares five known signatures (these are the headers we want files to have)
- Reads the first 4 bytes of the file into a [Byte[]] array
- Passes this byte array to a simple if/else statement that converts each byte to a two-character hexadecimal string
- Checks the array of known good signatures to see if any of them match the converted file signature
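The conversion and comparison steps can also be sketched in a more compact form. This is only an illustration, not the function used above: ConvertTo-HexString is a hypothetical helper name, and it relies on the 'X2' format specifier to do the zero-padding that the if/else handles by hand. The example bytes are the four characters "%PDF".

```powershell
# Hypothetical helper (not part of the original function): hex-encode a byte array.
function ConvertTo-HexString {
    param([Byte[]] $Bytes)
    # 'X2' formats each byte as exactly two uppercase hex digits, so 0x0A becomes "0A"
    -join ($Bytes | ForEach-Object { $_.ToString('X2') })
}

# Signatures as strings, matching the values declared in Check-Header
$knownSignatures = @('25504446', '492049', '49492A00', '4D4D002A', '4D4D002B')

# Example input (an assumption for illustration): the bytes "%PDF" that open a PDF file
$header = ConvertTo-HexString ([Byte[]] (0x25, 0x50, 0x44, 0x46))

# True when the header matches one of the known signatures
$knownSignatures -contains $header
```

Storing the signatures as strings keeps the comparison a plain string match against the generated header.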
If the -contains operator finds that one of the known signatures matches our header, the function returns true; otherwise it returns false. On a directory of 1024 files this took just over 3.9 seconds on my test server. If I can get a straight run, I anticipate my 3.8 million file collection will take just a shade more than 4 hours. I will be doing some other manipulation, so it will be considerably slower, but cases like this just go to show there is no alternative to a good automated solution.
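For anyone who wants to check the estimate, the arithmetic is a straight linear extrapolation from the sample run. The sketch below uses only the figures quoted above; the variable names are mine, and it assumes the per-file cost stays constant as the collection grows, which may not hold on real storage.

```powershell
# Figures quoted in the post
$sampleFiles   = 1024
$sampleSeconds = 3.9
$totalFiles    = 3800000

# Assumption: per-file cost stays constant across the whole collection
$secondsPerFile = $sampleSeconds / $sampleFiles
$estimatedHours = ($secondsPerFile * $totalFiles) / 3600

# Round to two decimal places for display
[math]::Round($estimatedHours, 2)
```

That works out to roughly 4.02 hours, which is where the "shade more than 4 hours" comes from.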