- http://franckrichard.blogspot.com/2010/08/powershell-get-encoding-file-type.html
- Chad Miller's script (referenced above) - http://poshcode.org/2059
- and Lee Holmes' variant - http://poshcode.org/2153
To give myself something to work with, I decided to explore the standard encodings available with most cmdlets. There are a few that you should be familiar with:
- ASCII
- Big Endian Unicode
- Default
- OEM
- Unicode
- UTF-32
- UTF-7
- UTF-8
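As a point of reference, several of these names correspond to static properties on the .NET System.Text.Encoding class, so the bytes an encoding produces can also be inspected in memory without writing a file. The following is only a minimal sketch: Get-EncodedByteString is a hypothetical helper name introduced here for illustration, and it assumes that a file written with an encoding consists of that encoding's preamble (from GetPreamble()) followed by the encoded text. Default, OEM and UTF-7 are left out because their results depend on the system code page or the .NET version.

```powershell
# Hypothetical helper (not part of any referenced script): returns the bytes an
# encoding would produce for a string, i.e. its preamble (byte order mark, if
# any) followed by the encoded text, as space-separated decimal values.
function Get-EncodedByteString {
    param(
        [System.Text.Encoding] $Encoding,
        [string] $Text
    )

    # GetPreamble() returns an empty array for encodings without a BOM (e.g. ASCII).
    $bytes = @($Encoding.GetPreamble()) + @($Encoding.GetBytes($Text))
    $bytes -join ' '
}

# "Test" followed by CR LF, the same content Out-File writes for -InputObject Test.
foreach ($name in 'ASCII', 'BigEndianUnicode', 'Unicode', 'UTF32', 'UTF8') {
    $encoding = [System.Text.Encoding]::$name
    "{0}: {1}" -f $name, (Get-EncodedByteString -Encoding $encoding -Text "Test`r`n")
}
```

Because this works purely on strings and byte arrays, it avoids the Get-Content -Encoding byte parameter, which newer PowerShell versions replace with -AsByteStream.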
In testing the output for these, I used this approach:
"unicode","utf7","utf8","utf32","ascii","bigendianunicode","default","oem" | sort | % {
    Out-File -FilePath "C:\data\Documents\Powershell\Projects\Encoding\test$_.txt" -InputObject Test -Encoding $_
    $bytearray = Get-Content -Path "C:\data\Documents\Powershell\Projects\Encoding\test$($_).txt" -Encoding byte
    "$($_): $($bytearray -join ' ')"
}
which yielded this output:
ascii: 84 101 115 116 13 10
bigendianunicode: 254 255 0 84 0 101 0 115 0 116 0 13 0 10
default: 84 101 115 116 13 10
oem: 84 101 115 116 13 10
unicode: 255 254 84 0 101 0 115 0 116 0 13 0 10 0
utf32: 255 254 0 0 84 0 0 0 101 0 0 0 115 0 0 0 116 0 0 0 13 0 0 0 10 0 0 0
utf7: 84 101 115 116 13 10
utf8: 239 187 191 84 101 115 116 13 10
As you can see, there are similarities between them, but when working with encodings it is important to know which bytes are "expected" (the byte order mark at the start of the file) and which are purely data. I highlighted the "common" characters in red so it was obvious what the control was in each case. Alternatively, here is the same thing in hex:
"unicode","utf7","utf8","utf32","ascii","bigendianunicode","default","oem" | sort | % {
    Out-File -FilePath "C:\data\Documents\Powershell\Projects\Encoding\test$_.txt" -InputObject Test -Encoding $_
    $bytearray = Get-Content -Path "C:\data\Documents\Powershell\Projects\Encoding\test$($_).txt" -Encoding byte
    "$($_): {0}" -f (($bytearray | % { [Convert]::ToString($_,16).PadLeft(2,"0") }) -join " ")
}
ascii: 54 65 73 74 0d 0a
bigendianunicode: fe ff 00 54 00 65 00 73 00 74 00 0d 00 0a
default: 54 65 73 74 0d 0a
oem: 54 65 73 74 0d 0a
unicode: ff fe 54 00 65 00 73 00 74 00 0d 00 0a 00
utf32: ff fe 00 00 54 00 00 00 65 00 00 00 73 00 00 00 74 00 00 00 0d 00 00 00 0a 00 00 00
utf7: 54 65 73 74 0d 0a
utf8: ef bb bf 54 65 73 74 0d 0a
It is clear you need to be careful when you are dealing with unknown file formats. I will more than likely use Lee's function, as it covers some non-standard encodings:
function Get-FileEncoding
{
##############################################################################
##
## Get-FileEncoding
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################
<#
.SYNOPSIS
Gets the encoding of a file
.EXAMPLE
Get-FileEncoding.ps1 .\UnicodeScript.ps1
BodyName : unicodeFFFE
EncodingName : Unicode (Big-Endian)
HeaderName : unicodeFFFE
WebName : unicodeFFFE
WindowsCodePage : 1200
IsBrowserDisplay : False
IsBrowserSave : False
IsMailNewsDisplay : False
IsMailNewsSave : False
IsSingleByte : False
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 1201
#>
param(
## The path of the file to get the encoding of.
$Path
)
Set-StrictMode -Version Latest
## The hashtable used to store our mapping of encoding bytes to their
## name. For example, "255-254 = Unicode"
$encodings = @{}
## Find all of the encodings understood by the .NET Framework. For each,
## determine the bytes at the start of the file (the preamble) that the .NET
## Framework uses to identify that encoding.
$encodingMembers = [System.Text.Encoding] |
Get-Member -Static -MemberType Property
$encodingMembers | Foreach-Object {
$encodingBytes = [System.Text.Encoding]::($_.Name).GetPreamble() -join '-'
$encodings[$encodingBytes] = $_.Name
}
## Find out the lengths of all of the preambles.
$encodingLengths = $encodings.Keys | Where-Object { $_ } |