On occasion I need to track down duplicate entries in a file. Without going through a bunch of mechanics, I found this approach useful, and, most importantly, easy. First, we will create a dummy array and store the contents in a temp file:
# Create temp file with dummy data including duplicate lines
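The snippet for this step did not survive in the text above, so here is a minimal sketch rather than the original code. It assumes the `$tempfile` variable used in the later steps, and the dummy values are chosen to match the grouped output shown further down (four 1s, three 2s, two 3s), plus two unique lines added for illustration:

```powershell
# Assumed reconstruction; the variable $dummy and the exact values are illustrative
$tempfile = [System.IO.Path]::GetTempFileName()   # creates an empty temp file and returns its path
$dummy = 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5          # 4 and 5 are unique lines
$dummy | Set-Content -Path $tempfile              # writes one value per line
```

`[System.IO.Path]::GetTempFileName()` is used here because it is available in PowerShell v3.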
Next, we get the data into an array. Interestingly, Get-Content does this for you without any extra work:
# Get file contents into an array
$filecontents = Get-Content -Path $tempfile
Once we have an array, which is verifiable by using this command:
$filecontents.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array
we can pipe it to Group-Object (or the group alias) and filter the result with Where-Object (or the where alias) to keep only the groups with more than 1 entry. In essence, each remaining group is a set of identical lines (or array entries) that appears more than once:
# Find duplicates
$filecontents |
Group |
Where {$_.count -gt 1}
When this gets run, it shows results:
Count Name                      Group
----- ----                      -----
    4 1                         {1, 1, 1, 1}
    3 2                         {2, 2, 2}
    2 3                         {3, 3}
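If only the duplicated values are needed, not the full group objects, the same pipeline can be extended with Select-Object. This is a small sketch, not part of the original steps. The inline `$filecontents` array is stand-in data so the snippet runs on its own, and `$duplicateValues` is a name introduced here for illustration:

```powershell
# Stand-in data (assumed); in the steps above this array comes from Get-Content
$filecontents = '1', '1', '1', '1', '2', '2', '2', '3', '3', '4', '5'

# Keep groups with more than one member, then emit just the group key (the line text)
$duplicateValues = $filecontents |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

$duplicateValues
```

With this stand-in data, `$duplicateValues` holds the strings 1, 2 and 3.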
To finalize this sample, remove the temp file:
# Clean up
Remove-Item -Path $tempfile
Such a simple example may seem artificial. I am working on a way to reference the actual lines where duplicates appear, which may "break" the simple Group cmdlet usage shown above. Still, if you are in a hurry, these steps can save you with minimal effort.
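One possible way to get at the line numbers, offered as a sketch under assumptions and not as the approach the post has in mind, is to pair each line with its 1-based position before grouping, then group on the text only. The property names `LineNumber`, `Text` and `LineNumbers`, the variable names, and the sample data are all hypothetical:

```powershell
# Illustrative data written to a temp file (assumed, not from the post)
$tempfile = [System.IO.Path]::GetTempFileName()
'a', 'b', 'a', 'c', 'b', 'a' | Set-Content -Path $tempfile

# @() forces an array even if the file has a single line
$filecontents = @(Get-Content -Path $tempfile)

# Pair every line with its 1-based line number
$numbered = for ($i = 0; $i -lt $filecontents.Count; $i++) {
    [pscustomobject]@{ LineNumber = $i + 1; Text = $filecontents[$i] }
}

# Group on the text only, keep duplicates, and collect their line numbers
$duplicates = $numbered |
    Group-Object -Property Text |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        [pscustomobject]@{
            Text        = $_.Name
            Count       = $_.Count
            LineNumbers = @($_.Group | ForEach-Object { $_.LineNumber })
        }
    }

$duplicates

# Clean up
Remove-Item -Path $tempfile
```

Grouping on a wrapper object keeps the Group and Where pattern intact, at the cost of one extra pass over the array.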