I don't want to go through this at too granular a level, so I will hit the highlights. The comments should provide a pretty good road map.

```powershell
# Clear console in GUI
Clear-Host;

# Set root path
$dir = 'C:MickeyMouseNinjaOfCheese';

# Get list of child folders
Get-ChildItem -Path $dir |
# Iterate over result set
ForEach-Object {
    # Iterate over files in each sub folder
    Get-ChildItem -Path $_.FullName |
    # Exclude specific files irrelevant to the discussion
    Where-Object {$_.Name -ne 'kungfu.txt'} |
    # Process each file
    ForEach-Object -Begin {
        # Instantiate new hashtable
        $fileheaders = @{};
        # Report status to host ($_ here is still the folder from the outer ForEach-Object)
        Write-Output "Processing $($_.FullName)"
    } `
    -Process {
        # Clear string buffer from previous iteration
        $string = $null
        # Read the first four bytes of the file as binary data
        # (-Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead)
        Get-Content -Path $_.FullName -Encoding Byte -TotalCount 4 | % {
            # Convert each byte to two hex digits and append to buffer
            $string += [Convert]::ToString($_, 16).PadLeft(2, '0')
        }
        # Add file path (as key) and header (as string)
        $fileheaders.Add($_.FullName, $string)
    } `
    -End {
        # Enumerate collection
        $fileheaders.GetEnumerator() |
        # Group by file signature (i.e., each hashtable member's value)
        Group-Object Value |
        # Output a table displaying the count of each grouped file signature and the signature
        Format-Table -AutoSize -Property Count, Name
    }
}
```

When the script runs, it:
- Gets a list of folders and passes that collection to a ForEach
- The ForEach gets the files in each folder and passes them to the pipeline
- These get passed through a Where to eliminate any kungfu.txt files, as I don't need to know what their file type is
- Each file is then passed through a ForEach, which uses the three sections:
  - Begin: to set up a new hashtable for the folder and output the status to the host
  - Process: to clear the buffer ($string), read the first 4 bytes of each file as binary, and convert them to a hex string stored in the buffer. Finally, it adds the file's full name (as a unique key) and the signature to the hashtable.
  - End: to process the hashtable, group the collection by hashtable value (the file signature), and output the results to the Format-Table cmdlet. This autosizes, to display long paths, and shows the count and name (that is, the file signature).
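
The Process step can also be pulled out into a small function, which is handy for testing it on a single file. This is only a sketch, not part of the script above: the name `Get-FileSignature` and its parameters are made up for illustration, and it reads the bytes through `System.IO.File` rather than `Get-Content`, so it does not depend on whether your PowerShell version expects `-Encoding Byte` or `-AsByteStream`.

```powershell
# Hypothetical helper (not from the original script): returns the first
# $Count bytes of a file as a lowercase hex string, e.g. "49492a00".
function Get-FileSignature {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 4
    )

    # .NET resolves relative paths against the process directory, not the
    # PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $buffer = New-Object byte[] $Count
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Read returns how many bytes were actually read (fewer for short files)
        $read = $stream.Read($buffer, 0, $Count)
    }
    finally {
        $stream.Dispose()
    }

    $hex = ''
    for ($i = 0; $i -lt $read; $i++) {
        # 'x2' formats a byte as two lowercase hex digits, zero-padded
        $hex += $buffer[$i].ToString('x2')
    }
    return $hex
}

# Usage: write five known bytes to a temp file and read back the first four
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($tmp, [byte[]](0x49, 0x49, 0x2a, 0x00, 0x08))
Get-FileSignature -Path $tmp   # 49492a00
Remove-Item -LiteralPath $tmp
```

Files shorter than four bytes simply give a shorter string here, where the original script would do the same because `Get-Content` only emits the bytes that exist.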
When I run this script on a few folders, I see this output:

```
Processing C:MickeyMouseNinjaOfCheese001

Count Name
----- ----
  210 49492a00
    1 5b496465
    2 ffd8ffe0

Processing C:MickeyMouseNinjaOfCheese002

Count Name
----- ----
  212 49492a00
    2 224d3a5c
    3 0d0a4d3a
    2 6364202e
    1 5b496465

Processing C:MickeyMouseNinjaOfCheese003

Count Name
----- ----
  209 49492a00
    1 5b496465
    1 424d6e0e
    1 424d4e13
    1 424d0e10
```

With a little more work, I can see the breakdown of file types by analyzing my signature database, but that is another story.
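
To give a rough idea of what that lookup might look like, here is a minimal sketch. The table below is not my signature database; it is a tiny hypothetical stand-in holding a few well-known magic numbers (TIFF, JPEG, BMP), and the names `$knownSignatures` and `Resolve-Signature` are invented for the example. It matches on prefixes because some formats fix fewer than four bytes: a BMP only guarantees the leading `424d` ("BM"), and the next bytes are the start of the file size, which is why three different `424d....` signatures show up in folder 003.

```powershell
# Hypothetical lookup table: hex prefix -> file type.
# Ordered so that prefixes are checked in the order they are listed.
$knownSignatures = [ordered]@{
    '49492a00' = 'TIFF (little-endian)'
    '4d4d002a' = 'TIFF (big-endian)'
    'ffd8ff'   = 'JPEG'
    '424d'     = 'BMP'
}

# Hypothetical helper: returns the type for the first matching prefix,
# or 'unknown' when nothing in the table matches.
function Resolve-Signature {
    param([Parameter(Mandatory)] [string] $Signature)

    foreach ($prefix in $knownSignatures.Keys) {
        if ($Signature.StartsWith($prefix)) {
            return $knownSignatures[$prefix]
        }
    }
    return 'unknown'
}

# Usage: label a few of the signatures from the output above
'49492a00', 'ffd8ffe0', '424d6e0e', '424d4e13', '5b496465' |
    ForEach-Object { '{0}  {1}' -f $_, (Resolve-Signature -Signature $_) }
```

Signatures such as `5b496465` come back as unknown with this table. Those bytes decode to the ASCII text `[Ide`, so they are most likely plain text files, which have no magic number to match on.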