VBScript Characters

Two aspects of VBScript characters matter: the character set that defines all valid characters, and the character code used to store a character when writing scripts, processing data, and interacting with the user.

VBScript Character Code

In general, Visual Basic uses Unicode to store and manipulate strings. Unicode is a character set whose code points need up to 21 bits each (Visual Basic stores them as 16-bit units). However, some other programs, such as 16-bit object libraries, use ANSI (American National Standards Institute) or DBCS (Double-Byte Character Set) to store and manipulate strings. Passing strings between these environments can therefore expose the differences between Unicode and ANSI/DBCS.
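The difference between the two storage schemes shows up most plainly in byte counts. The following is a minimal sketch in Python rather than VBScript; the sample string and the choice of code page 1252 as the ANSI code page are assumptions made for illustration only.

```python
# Compare the storage size of the same text in an ANSI code page
# and in 16-bit Unicode (UTF-16, little-endian, no BOM).
text = "Caf\u00e9"  # hypothetical sample: "Cafe" with an acute accent on the e

ansi_bytes = text.encode("cp1252")        # one byte per character
unicode_bytes = text.encode("utf-16-le")  # two bytes per character here

print(len(text))               # number of characters
print(len(ansi_bytes))         # number of bytes in ANSI
print(len(unicode_bytes))      # number of bytes in UTF-16
print(unicode_bytes.hex(" "))  # 43 00 61 00 66 00 e9 00
```

A byte-oriented routine that assumes one byte per character would see only half of the Unicode string, which is the kind of mismatch the paragraph above warns about.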
In addition, Visual Basic's own handling of the byte data of a string can cause problems in bytewise operations, for example with the Chr, ChrB, Asc, AscB, LeftB, MidB, RightB, and InStrB functions.

Character Set

Although the ANSI character sets can handle most Western European languages with 256 characters, using individual 8-bit character sets, some languages such as Chinese (Traditional and Simplified), Japanese, and Korean (Hangeul) require more than 256 characters. DBCS and Unicode have become the most popular character sets for representing text strings.

ASCII

ASCII (American Standard Code for Information Interchange) uses 7 bits to represent each of its 128 characters, which include control characters and printable characters. ASCII is the most important character set because it is usually the common part of other widely used character sets. In other words, the character codes of the ASCII characters are usually a subset of those character sets.

Windows ANSI

Although ANSI stands for American National Standards Institute, ANSI is also a generic term used by Microsoft Windows for a character code standard. The ANSI standard uses a single byte (8 bits) to represent each of 256 characters. Windows ANSI character codes can therefore be divided into two parts: the lower 128 codes are identical to ASCII, and the upper 128 codes are assigned to various international character sets. A code page number identifies each individual character set in Windows.
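As an illustration of how the code page decides the meaning of the upper 128 codes, the sketch below decodes one byte value under three Windows code pages. It uses Python's built-in codecs as stand-ins for the Windows code pages; the byte value 0xE9 and the three code pages chosen are assumptions for the example.

```python
# The same upper-half byte maps to a different character in each code page,
# while the lower 128 codes always match ASCII.
byte = bytes([0xE9])
code_pages = ("cp1252", "cp1251", "cp1253")  # Western European, Cyrillic, Greek

decoded = {cp: byte.decode(cp) for cp in code_pages}
for cp, ch in decoded.items():
    print(cp, f"U+{ord(ch):04X}")

lower_half = bytes(range(128))
same_as_ascii = all(
    lower_half.decode(cp) == lower_half.decode("ascii") for cp in code_pages
)
print(same_as_ascii)
```

The loop reports U+00E9 for code page 1252, U+0439 for 1251, and U+03B9 for 1253, so text written under one code page is misread under another unless the code page travels with the data.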
In other words, a Windows ANSI character set is a kind of SBCS (Single-Byte Character Set).

DBCS

DBCS stands for Double-Byte Character Set. Although an SBCS is adequate for English, East Asian languages typically require far more than 256 characters. A 2-byte value is a common solution to this problem. Unlike an SBCS, a DBCS is actually a multi-byte encoding, a mix of 8-bit and 16-bit characters. In general, the 8-bit characters in a DBCS are restricted to the ASCII character set. In other words, only the lower 128 codes of a single byte (0-127) are used as single-byte characters. The upper 128 codes are reserved and function as the lead byte of a 16-bit character. In DBCS data streams used on PCs, the lead byte of a 16-bit character is restricted to the upper 128 codes of a single byte, while the trail byte can come from either the lower or the upper part. As with ANSI, a DBCS code page number identifies each individual DBCS character set for a given language. However, each DBCS code page has its own predefined lead-byte and trail-byte ranges.
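The lead-byte and trail-byte scheme can be sketched as a small splitter. This is a simplified Python model, not a full DBCS parser: the helper name `split_dbcs` is hypothetical, and it assumes every byte of 0x80 or above is a lead byte, which ignores the code-page-specific ranges mentioned above.

```python
def split_dbcs(data):
    """Split DBCS bytes into characters (simplified: any byte >= 0x80 is a lead byte)."""
    chars = []
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x80 else 1  # a lead byte pulls in one trail byte
        chars.append(data[i:i + width])
        i += width
    return chars

# "A", the ideograph U+4E2D, then "B" in code page 936 (Simplified Chinese).
gbk = "A\u4e2dB".encode("cp936")
print(split_dbcs(gbk))  # [b'A', b'\xd6\xd0', b'B']

# Katakana U+30BD in code page 932 (Japanese): the trail byte is in the lower half.
lead, trail = "\u30bd".encode("cp932")
print(hex(lead), hex(trail))  # 0x83 0x5c
```

The second case shows why a DBCS stream must be read from the start: the trail byte 0x5C is also the ASCII backslash, so scanning from the middle of the stream can misidentify it.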
Besides their sheer number, ideographic characters also do not fit the existing character form. A form called full-width, or wide, is introduced to hold 2-byte characters, primarily ideographs. The form holding 1-byte characters is sometimes called half-width. Under this arrangement, a full-width character may have a half-width variant, and a half-width character may also have a full-width variant.
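Width variants can be inspected through the Unicode character database. The sketch below uses Python's standard `unicodedata` module; the four sample characters are assumptions chosen to show one pair of Latin variants and one pair of katakana variants.

```python
import unicodedata

# Latin A, fullwidth Latin A, katakana A, halfwidth katakana A
samples = ["A", "\uFF21", "\u30A2", "\uFF71"]

results = []
for ch in samples:
    width = unicodedata.east_asian_width(ch)     # 'Na', 'F', 'W', or 'H' here
    folded = unicodedata.normalize("NFKC", ch)   # folds a width variant to its base form
    results.append((width, folded))
    print(f"U+{ord(ch):04X} width={width} NFKC=U+{ord(folded):04X}")
```

Compatibility normalization (NFKC) maps the fullwidth Latin letter back to the ordinary letter and the halfwidth katakana to the ordinary katakana, which is one way to compare text regardless of width form.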
Unicode

Although the first versions of Unicode (1991-1995) used a 16-bit character-encoding scheme (U+0000-U+FFFF), which is now the Basic Multilingual Plane, the encoding space was extended to a 21-bit range (U+0000-U+10FFFF) starting with Unicode 2.0 (July 1996). Unicode gives every character its own numeric code point, aiming to cover all commonly used characters of the different languages within the available code points. Besides the basic plane, more planes are defined, but not all possible code points are assigned. Unlike schemes that rely on a code page identifier, Unicode is language-independent, and code points are assigned by agreement so that all languages can share common characters regardless of how they are drawn. In general, the Unicode codespace is divided into planes of 65,536 code points (the range of a 2-byte value), and each plane is subdivided into blocks according to assignment.
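Since each plane covers 65,536 code points, the plane number is simply the code point with its low 16 bits dropped. A minimal Python sketch, in which the helper name `plane_of` is hypothetical:

```python
def plane_of(code_point):
    """Return the Unicode plane (0-16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode codespace")
    return code_point >> 16  # drop the 16 bits that index within a plane

for cp in (0x0041, 0x4E2D, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```

Plane 0 is the Basic Multilingual Plane, and the highest code point, U+10FFFF, falls in plane 16.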
Because of the expansion of the code point space, the encoded Unicode character set cannot be manipulated in a computer as directly as other encoded character sets such as ASCII or DBCS. A Unicode value serves only as a unique code point for each assigned character, so that each character can be referred to by a simple number. To avoid ambiguity in data processing, Unicode code points should be encoded. Some common encoding forms are UTF-8, which uses one to four 8-bit bytes; UTF-16, which uses one or two 16-bit code units; and UTF-32, which uses a single 32-bit code unit.

In Unicode, the character at code point U+FEFF is defined as the byte order mark (BOM), while its byte-reversed counterpart is a noncharacter (U+FFFE) in the UTF-16 encoding form, or lies outside the code space (0xFFFE0000) in UTF-32. The BOM specifies the order of the bytes in a code unit. A code unit stored with the most significant byte (MSB) first is called big-endian, while a code unit stored with the least significant byte (LSB) first is called little-endian. Therefore, if all Unicode code points in a data stream lie within the Basic Multilingual Plane, the code points can be stored as they are, with the byte order mark at the beginning of the data stream, without causing any ambiguity. However, whenever a data stream contains a code point outside the Basic Multilingual Plane (greater than U+FFFF), the code points must be encoded in one of the encoding forms.

On Windows, Unicode is used by the Component Object Model (COM) on all 32-bit versions of Windows, serves as the basis for OLE and ActiveX technologies, and is fully supported by Windows NT.

UTF-8

To make the Unicode coded character set practical for representing and manipulating information in a computer, the coded character set must be mapped to an unambiguous form that software can recognize.
UTF-8 is one of the encoding forms commonly used with Unicode. The UTF-8 encoding form uses one to four 8-bit bytes to represent a Unicode code point according to a set of standard rules. Since the layout of each byte sequence can be determined from the bytes themselves, a BOM is usually not needed to identify the byte order of the code units. The rules for encoding a Unicode code point in UTF-8 can be summarized as follows:

U+0000-U+007F: 0xxxxxxx
U+0080-U+07FF: 110xxxxx 10xxxxxx
U+0800-U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U+10000-U+10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
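The UTF-8 rules can be sketched as a small encoder. This is an illustrative Python version, not a replacement for a library codec; the function name `utf8_encode` is hypothetical, and the result is checked against Python's built-in encoder.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:     # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:    # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')}")
```

The first byte alone tells a decoder how many continuation bytes follow, which is why no byte order information is needed.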
Example of UTF-8 with BOM
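A minimal Python sketch of such a byte stream, assuming Python's `utf-8-sig` codec, which writes the BOM when encoding and strips it when decoding. The two-character sample text is an assumption for illustration.

```python
import codecs

text = "A\u4e2d"                   # "A" followed by the ideograph U+4E2D
data = text.encode("utf-8-sig")    # the BOM is written first, then the text

print(data.hex(" "))               # ef bb bf 41 e4 b8 ad
print(data.startswith(codecs.BOM_UTF8))
print(data.decode("utf-8-sig") == text)
```

The three bytes EF BB BF are simply U+FEFF encoded in UTF-8. They mark the stream as UTF-8 rather than indicating a byte order.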
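UTF-16

UTF-16 is another encoding form for Unicode. The UTF-16 encoding form uses one or two 16-bit code units to represent a Unicode code point according to a set of standard rules. Unlike UTF-8, a BOM at the head of the data stream is usually used to identify the byte order of the code units. The rules for encoding a Unicode code point in UTF-16 can be summarized as follows:

U+0000-U+D7FF and U+E000-U+FFFF: one 16-bit code unit equal to the code point
U+10000-U+10FFFF: two 16-bit code units (a surrogate pair); subtract 0x10000 to get a 20-bit value, then add its upper 10 bits to 0xD800 for the first unit and its lower 10 bits to 0xDC00 for the second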
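The UTF-16 rules, including the surrogate pair calculation, can be sketched in Python as follows. The function name `utf16_units` is hypothetical, and the result is checked against Python's built-in big-endian encoder.

```python
def utf16_units(cp):
    """Return the list of 16-bit code units that encode one code point in UTF-16."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x10000:
        return [cp]                   # BMP: the code unit equals the code point
    value = cp - 0x10000              # 20-bit value
    high = 0xD800 | (value >> 10)     # upper 10 bits -> high surrogate
    low = 0xDC00 | (value & 0x3FF)    # lower 10 bits -> low surrogate
    return [high, low]

for cp in (0x0041, 0x4E2D, 0x1F600):
    units = utf16_units(cp)
    as_bytes = b"".join(u.to_bytes(2, "big") for u in units)
    assert as_bytes == chr(cp).encode("utf-16-be")  # agrees with the built-in codec
    print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

For U+1F600 the sketch yields the pair D83D DE00, while the two BMP code points map to a single unit each.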
Example of UTF-16 with BOM (big endian/little endian)
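A minimal Python sketch of the two byte orders, assuming the BOM constants from Python's standard `codecs` module. The sample text is an assumption for illustration.

```python
import codecs

text = "A\u4e2d"  # "A" followed by the ideograph U+4E2D

big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")     # most significant byte first
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # least significant byte first

print(big.hex(" "))     # fe ff 00 41 4e 2d
print(little.hex(" "))  # ff fe 41 00 2d 4e

# The generic "utf-16" decoder reads the BOM to pick the byte order.
print(big.decode("utf-16") == little.decode("utf-16") == text)
```

Without the BOM, the bytes 4E 2D and 2D 4E are both valid code units, so a reader has no reliable way to tell the two streams apart.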
UTF-32

UTF-32 is another encoding form for Unicode. The UTF-32 encoding form uses a single 32-bit code unit to represent a Unicode code point from the 21-bit code point space. Unlike UTF-8, a BOM at the head of the data stream is usually used to identify the byte order of the code units. The rule for encoding a Unicode code point in UTF-32 can be summarized as follows:

U+0000-U+10FFFF: one 32-bit code unit equal to the code point, with the upper 11 bits set to zero
Example of UTF-32 with BOM (big endian/little endian)
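A minimal Python sketch of the two byte orders in UTF-32, again assuming the BOM constants from Python's standard `codecs` module. The sample text, which includes one code point beyond the Basic Multilingual Plane, is an assumption for illustration.

```python
import codecs

text = "A\U0001F600"  # "A" followed by U+1F600, a code point outside the BMP

big = codecs.BOM_UTF32_BE + text.encode("utf-32-be")
little = codecs.BOM_UTF32_LE + text.encode("utf-32-le")

print(big.hex(" "))     # 00 00 fe ff 00 00 00 41 00 01 f6 00
print(little.hex(" "))  # ff fe 00 00 41 00 00 00 00 f6 01 00

# Every code point occupies exactly four bytes, so the length is predictable.
print(len(text.encode("utf-32-be")) == 4 * len(text))
```

Each code point is stored as its own value padded to 32 bits, so no surrogate calculation is needed. The cost is four bytes even for ASCII characters.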
VBScript Character Set

Although VBScript can manipulate characters of different character sets, the characters used in VBScript scripting are restricted to the characters described below.
The ASCII control characters Chr(8) (backspace), Chr(9) (horizontal tab), Chr(10) (line feed), and Chr(13) (carriage return) are supported by Microsoft Windows. They have no graphical representation but may affect how text is displayed, depending on the application. Chr(0) is the null character, whose value is 0. However, Chr(11) (vertical tab) and Chr(12) (form feed) are not useful in Microsoft Windows, for example for forming the layout of a VBScript script.

The printable characters Chr(32)-Chr(126) are supported and used in VBScript scripting, but Chr(127) (delete) is not supported. The alphabetic and numeric characters play a key role in VBScript. The special symbol characters in the Visual Basic character set also serve various functions in VBScript, for example organizing the script and defining tasks without specifying an operation to be performed.

The extended part, Chr(128)-Chr(255), consists of ANSI characters that depend on the code page set on the local computer system. These characters are supported by Microsoft Windows but not in VBScript scripting.

Examples

Examples of Chr function
ASP VBScript Command:

HTML Web Page Embedded Output:

©sideway ID: 180400011 Last Updated: 4/11/2018 Revision: 0
Copyright © 2000-2024 Sideway . All rights reserved Disclaimers last modified on 06 September 2019