UTF-8 vs UTF-16
In this article, I cover the key points about what UTF is and the differences between UTF-8 and UTF-16.
What is UTF
UTF stands for Unicode Transformation Format. It is a family of standards for encoding the Unicode character set into binary. UTF was developed to give users a standardized means of encoding characters while using minimal space.
UTF-8
– UTF-8 is a variable-width encoding that can represent every character in the Unicode character set.
– UTF-8 was designed for backward compatibility with ASCII.
– UTF-8 is a byte-oriented format, so it has no byte-order issues on byte-oriented networks or in files.
– UTF-8 uses a minimum of 1 byte per character.
– UTF-8 also recovers better from errors that corrupt part of a file or stream, since a decoder can resynchronize at the next uncorrupted character boundary.
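The points above can be demonstrated with a small Python sketch (the sample characters are my own choices; any characters from those Unicode ranges would behave the same way):

```python
# UTF-8 is variable-width: each sample below encodes to a
# different number of bytes (1, 2, 3, and 4 respectively).
samples = ["A", "é", "€", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Backward compatibility with ASCII: a pure-ASCII string's
# UTF-8 bytes are identical to its ASCII bytes.
assert "hello".encode("utf-8") == "hello".encode("ascii")
```

Note how the ASCII letter still takes just one byte, while the emoji (a code point above U+FFFF) takes four.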
UTF-16
– UTF-16 is a character encoding for Unicode capable of encoding all 1,112,064 valid numbers (called code points) in the Unicode code space from 0 to 0x10FFFF.
– UTF-16 is not byte-oriented and needs to establish a byte order (for example, via a byte order mark) in order to work with byte-oriented networks.
– UTF-16 uses a minimum of 2 bytes per character.
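A short Python sketch illustrates the byte-order issue and the 2-byte minimum (the exact BOM bytes emitted by the plain "utf-16" codec depend on the platform's native byte order, so only the two possible BOMs are checked):

```python
# The same character produces different bytes depending on byte order:
# little-endian vs big-endian.
print("A".encode("utf-16-le").hex())  # 4100
print("A".encode("utf-16-be").hex())  # 0041

# The plain "utf-16" codec prepends a byte order mark (BOM) so a
# decoder can tell which order was used.
bom = "A".encode("utf-16")[:2]
assert bom in (b"\xff\xfe", b"\xfe\xff")

# Even ASCII takes 2 bytes in UTF-16; code points above U+FFFF
# need a surrogate pair, i.e. 4 bytes.
print(len("A".encode("utf-16-le")))   # 2
print(len("😀".encode("utf-16-le")))  # 4
```

This is why UTF-16 data exchanged between systems usually carries a BOM or is labeled explicitly as UTF-16LE or UTF-16BE.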
Thanks,
Morgan
Software Developer