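    function Check-Header
    {
        param($path)

        # Hexadecimal signatures for expected files, stored as strings so they
        # can be compared with the hex string built from the file header
        $pdf    = '25504446'
        $tiff_1 = '492049'    # 3-byte signature: shorter than the 8-character string built from a 4-byte read
        $tiff_2 = '49492A00'
        $tiff_3 = '4D4D002A'
        $tiff_4 = '4D4D002B'

        # Start from an empty string so a variable of the same name in a parent scope cannot leak in
        $HeaderAsHexString = ''

        # Get content of each file (up to 4 bytes) for analysis
        # (-Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead)
        ([Byte[]] $fileheader = Get-Content -Path $path -TotalCount 4 -Encoding Byte) |
        ForEach-Object {
            if (("{0:X}" -f $_).Length -eq 1)
            {
                # Single hex digit: pad with a leading zero
                $HeaderAsHexString += "0{0:X}" -f $_
            }
            else
            {
                $HeaderAsHexString += "{0:X}" -f $_
            }
        }

        # Validate file header
        @($pdf, $tiff_1, $tiff_2, $tiff_3, $tiff_4) -contains $HeaderAsHexString
    }

This function does a few things: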
- Takes a file path argument
- Declares five known signatures (these are the headers we want files to have)
- Reads the first 4 bytes of the file into a [Byte[]] array
- Pipes this byte array through a simple if/else statement that converts each byte to a two-digit hexadecimal string
- Checks whether the array of known good signatures contains the converted file signature
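
The zero-padding that the if/else step performs can also be expressed with the `X2` format specifier, which always emits two hex digits per byte. The following is only a sketch of that idea: the helper name `ConvertTo-HexString` and its `-Bytes` parameter are hypothetical and are not part of the function above.

```powershell
# Hypothetical helper (not in the original function): converts a byte array
# to an uppercase hex string, two digits per byte.
function ConvertTo-HexString
{
    param([Byte[]] $Bytes)

    # "X2" pads each byte to two hex digits, so 0x0A becomes "0A" rather than "A"
    ($Bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ''
}

# The first four bytes of a PDF file are "%PDF"
ConvertTo-HexString -Bytes ([Byte[]] (0x25, 0x50, 0x44, 0x46))   # 25504446
```

Either approach produces the same string; the `X2` form simply removes the need to check the length of each converted byte.
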
If the -contains operator finds that one of the known signatures matches our header string, the function returns true. If it does not find a match, it returns false. On a directory of 1024 files this took just over 3.9 seconds on my test server. If I can get a straight run, I anticipate my 3.8 million file collection will take just a shade more than 4 hours. I will be doing some other manipulation, so it will be considerably slower, but cases like this go to show there is no substitute for a good automated solution.
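
The four-hour figure is a straight linear extrapolation from the test run. A rough sketch of the arithmetic, using only the numbers quoted above and assuming the per-file cost stays constant at scale (the variable names are illustrative):

```powershell
# Figures from the test run described above
$testFiles   = 1024
$testSeconds = 3.9
$totalFiles  = 3800000

# Linear extrapolation: assumes per-file cost stays constant at scale
$secondsPerFile = $testSeconds / $testFiles
$estimatedHours = ($secondsPerFile * $totalFiles) / 3600

[math]::Round($estimatedHours, 2)   # about 4.02
```

That works out to a little under 4 milliseconds per file, so any extra per-file work added to the loop will move the total noticeably.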