
FSS-UTF, UTF-2, UTF-8, and UTF-16 - Unicode
Jul 6, 2001 · The Unicode Standard, Version 2.0 description of UTF-8 includes the text, "Each code value (non-surrogates) is represented in UTF-8 by 1, 2, or 3 bytes, depending on the code value. Pairs of surrogates take 4 bytes." So at the 1996 point of simultaneous publication of UTF-8 and UTF-16 in both 10646 and the Unicode Standard, D800..DFFF were no ...
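The "pairs of surrogates take 4 bytes" remark can be checked with a minimal sketch. It assumes the standard UTF-16 surrogate arithmetic; the helper name `to_surrogate_pair` is hypothetical and not taken from the quoted text.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    v = cp - 0x10000                 # 20-bit offset into the supplementary range
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")               # D83D DE00
print(len(chr(0x1F600).encode("utf-8")))     # 4 bytes in UTF-8
```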
UTR#17: Unicode Character Encoding Model
Nov 11, 2022 · The UCS-2 encoding form, which is associated with ISO/IEC 10646 and can only express the subset of characters in the BMP, is a fixed-width encoding form. In contrast, UTF-16 uses either one or two code units and is able to cover the entire codespace of Unicode. UTF-8 provides a good example.
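As a rough illustration of the fixed-width versus variable-width distinction, the sketch below tests whether a character lies in the BMP (and so would be expressible in UCS-2) and counts the 16-bit code units UTF-16 needs for it. The function names are hypothetical, chosen for this example only.

```python
def fits_in_ucs2(ch: str) -> bool:
    """True if the single character lies in the BMP (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 uses for one character."""
    # "utf-16-be" emits no byte order mark, so the byte count is exact.
    return len(ch.encode("utf-16-be")) // 2


for ch in ("A", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", fits_in_ucs2(ch), utf16_code_units(ch))
```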
UAX #19: UTF-32 - Unicode
Mar 27, 2002 · UTF-32 defines an encoding form for Unicode for representing Unicode code points with single 32-bit code units. With the addition of UTF-32, the Unicode Standard now has three sanctioned encoding forms: UTF-8, UTF-16, and UTF-32. These use 8-bit, 16-bit, and 32-bit code units, respectively.
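A small sketch can make the three code-unit sizes concrete by counting how many 8-bit, 16-bit, and 32-bit units the same text needs. The helper `count_code_units` is an illustrative name, not part of any standard API.

```python
def count_code_units(text: str) -> dict[str, int]:
    """Count code units per encoding form for the given text."""
    # The "-le" codec variants are used so that no byte order mark is counted.
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


# Four code points: U+0061, U+00E9, U+20AC, U+1F600
print(count_code_units("a\u00e9\u20ac\U0001F600"))
# {'UTF-8': 10, 'UTF-16': 5, 'UTF-32': 4}
```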
UTR#17: Character Encoding Model - Unicode
Nov 23, 1999 · Unicode 1.1 has either the UCS-2 (default) or UTF-8 encoding form; ISO/IEC 10646, depending on the declared implementation levels, may have UCS-2, UCS-4, UTF-16, or UTF-8. Note that Shift-JIS is not an encoding form: it is discussed in the next section.
Glossary - Unicode
A multibyte encoding for text that represents each Unicode character with 1 to 4 bytes, and which is backward-compatible with ASCII. UTF-8 is the predominant form of Unicode in web pages. More technically: (1) The UTF-8 encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 8,” defined in Annex D of ISO/IEC 10646:2003 ...
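The "1 to 4 bytes" and ASCII-compatibility claims can be sketched as follows. The range boundaries are the standard UTF-8 ones; `utf8_length` is a hypothetical helper, and it does not reject surrogate code points, which a strict encoder would.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses for one code point."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        return 1      # ASCII range, encoded as the same single byte
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    if cp <= 0x10FFFF:
        return 4
    raise ValueError("beyond the Unicode codespace")


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", utf8_length(cp))

# Backward compatibility: pure ASCII text has identical bytes in both encodings.
print("plain text".encode("utf-8") == "plain text".encode("ascii"))
```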
Unicode Standard
About the Unicode® Standard Characters for the World. The Unicode Standard is the universal character encoding designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world.
Unicode Mail List Archive: RE: utf-8 to ucs-2
Jun 19, 2002 · What you want to do is turn the UCS-2 characters in your String into a UTF-8 byte[] and then that byte[] back into a string. What you need is a transformation that won't move any of the bytes around. There is an encoding that maps 0->FF linearly from Unicode.
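The byte-preserving transformation described here can be sketched in Python rather than Java. The snippet does not name the encoding; this example assumes it is ISO-8859-1 (Latin-1), which maps bytes 0x00..0xFF to U+0000..U+00FF one-to-one. Both function names are hypothetical.

```python
def utf8_bytes_as_string(text: str) -> str:
    """Encode text as UTF-8, then view each byte as one character."""
    # Assumption: Latin-1 is the linear 0x00..0xFF mapping referred to above.
    return text.encode("utf-8").decode("latin-1")


def string_back_to_text(byte_string: str) -> str:
    """Reverse the transformation: characters to bytes, then decode UTF-8."""
    return byte_string.encode("latin-1").decode("utf-8")


packed = utf8_bytes_as_string("\u00e9")           # U+00E9 is C3 A9 in UTF-8
print([f"{ord(c):02X}" for c in packed])          # ['C3', 'A9']
print(string_back_to_text(packed) == "\u00e9")    # True
```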
Re: FSS-UTF, UTF-2, UTF-8, and UTF-16 - unicode.org
The Hangul mess took place with Unicode 2.0, not 2.1. And this is a red herring anyway when we are talking about UTF-8. As stated before, UTF-8 has never changed even though the Unicode beneath it has changed: * by moving the Hangul block in version 2.0 * by creating the UTF-16 mechanism to support surrogates in 1993 (not 2001)
What is Unicode?
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Unicode Mail List Archive: Re: RE: UTF-2
Subject: Re: UTF-2. At 10:23 -0800 1997-12-11, Misha Wolf wrote: >UTF-8 was originally called FSS-UTF and then, for a while, UTF-2. The "2" >indicated "the second UTF" (there was also a UTF-1). Some vendors, e.g. >Oracle, kept the UTF-2 name in their documentation for a long time. Even if it is wrong it might be prudent to register it as an ...