UTF-8 Encoding Debugging Chart
Here is an encoding problem chart that helps debug common UTF-8 character encoding problems. The chart can help with these three typical problem scenarios:
- Encoding Problem 1: Treating UTF-8 Bytes as Windows-1252 or ISO-8859-1
- Encoding Problem 2: Incorrect Double Mis-Conversion
- Encoding Problem 3: ISO-8859-1 vs Windows-1252
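The three scenarios can be reproduced in a few lines of Python. This is a minimal sketch, not taken from the original page: the sample character and the variable names are arbitrary, and Problem 2 is read here as "the garbled text is encoded to UTF-8 and misread a second time", which is an assumption because this page only names that scenario.

```python
text = "é"                            # U+00E9, an arbitrary sample character
utf8_bytes = text.encode("utf-8")     # b'\xc3\xa9'

# Problem 1: UTF-8 bytes are interpreted as Windows-1252.
once = utf8_bytes.decode("cp1252")    # 'Ã©'

# Problem 2 (assumed reading): the already garbled text is encoded
# to UTF-8 again and misread as Windows-1252 a second time.
twice = once.encode("utf-8").decode("cp1252")   # 'ÃƒÂ©'

# Problem 3: the two legacy encodings differ in the range 0x80-0x9F.
# Windows-1252 assigns printable characters there; ISO-8859-1 has
# C1 control characters.
as_cp1252 = b"\x80".decode("cp1252")       # '€'
as_latin1 = b"\x80".decode("iso-8859-1")   # '\x80' (a control character)
```

Each extra round of misreading makes the sequence longer, so the length of the garbled sequence is a hint as to how many times the conversion went wrong.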
Debugging Chart Mapping Windows-1252 Characters to UTF-8 Bytes to Latin-1 Characters
The following chart shows the Windows-1252 characters from 128 to 255 (hex 80 to FF). For each character, it lists the Unicode code point and the hex values of the bytes in its UTF-8 encoding. Those UTF-8 bytes are also displayed as if they were Windows-1252 characters (the Actual column). Use this chart to debug problems where a sequence of two or three Latin characters appears where only one character was expected. If the sequence you see matches a sequence in the Actual column, and the Expected value in the same row is the character you expected to see, then the problem is caused by UTF-8 bytes being interpreted as Windows-1252 (or ISO 8859-1) bytes. See Encoding Problem: Treating UTF-8 Bytes as Windows-1252 or ISO-8859-1 for a more detailed explanation.
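The matching procedure described above can also be automated. The sketch below is illustrative only: `repair_mojibake` is a hypothetical helper name, and it assumes the text was garbled exactly once by the UTF-8-as-Windows-1252 misreading. It reverses that step and reports failure when the input does not fit the pattern.

```python
from typing import Optional


def repair_mojibake(garbled: str) -> Optional[str]:
    """Undo one round of 'UTF-8 bytes read as Windows-1252'.

    Returns the repaired text, or None when the input cannot have
    been produced by that misreading.
    """
    try:
        # Map the characters back to the bytes they were decoded from,
        # then decode those bytes as the UTF-8 they really were.
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


print(repair_mojibake("Ã©"))    # é
print(repair_mojibake("â‚¬"))   # €
print(repair_mojibake("é"))     # None: a lone 0xE9 byte is not valid UTF-8
```

Pure ASCII text round-trips unchanged, so a non-None result is only evidence of this problem when the output differs from the input. Python's `cp1252` codec also rejects the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), so sequences that involve them come back as None here.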
Unicode Code Point | Windows-1252 Byte | Expected Character | Actual Characters | UTF-8 Bytes | Unicode Code Point | Windows-1252 Byte | Expected Character | Actual Characters | UTF-8 Bytes
---|---|---|---|---|---|---|---|---|---
U+20AC | 0x80 | € | â‚¬ | %E2 %82 %AC | U+00C0 | 0xC0 | À | Ã€ | %C3 %80
(undefined) | 0x81 | | | | U+00C1 | 0xC1 | Á | Ã | %C3 %81
U+201A | 0x82 | ‚ | â€š | %E2 %80 %9A | U+00C2 | 0xC2 | Â | Ã‚ | %C3 %82
U+0192 | 0x83 | ƒ | Æ’ | %C6 %92 | U+00C3 | 0xC3 | Ã | Ãƒ | %C3 %83
U+201E | 0x84 | „ | â€ž | %E2 %80 %9E | U+00C4 | 0xC4 | Ä | Ã„ | %C3 %84
U+2026 | 0x85 | … | â€¦ | %E2 %80 %A6 | U+00C5 | 0xC5 | Å | Ã… | %C3 %85
U+2020 | 0x86 | † | â€ | %E2 %80 %A0 | U+00C6 | 0xC6 | Æ | Ã† | %C3 %86
U+2021 | 0x87 | ‡ | â€¡ | %E2 %80 %A1 | U+00C7 | 0xC7 | Ç | Ã‡ | %C3 %87
U+02C6 | 0x88 | ˆ | Ë† | %CB %86 | U+00C8 | 0xC8 | È | Ãˆ | %C3 %88
U+2030 | 0x89 | ‰ | â€° | %E2 %80 %B0 | U+00C9 | 0xC9 | É | Ã‰ | %C3 %89
U+0160 | 0x8A | Š | Å | %C5 %A0 | U+00CA | 0xCA | Ê | ÃŠ | %C3 %8A
U+2039 | 0x8B | ‹ | â€¹ | %E2 %80 %B9 | U+00CB | 0xCB | Ë | Ã‹ | %C3 %8B
U+0152 | 0x8C | Œ | Å’ | %C5 %92 | U+00CC | 0xCC | Ì | ÃŒ | %C3 %8C
(undefined) | 0x8D | | | | U+00CD | 0xCD | Í | Ã | %C3 %8D
U+017D | 0x8E | Ž | Å½ | %C5 %BD | U+00CE | 0xCE | Î | ÃŽ | %C3 %8E
(undefined) | 0x8F | | | | U+00CF | 0xCF | Ï | Ã | %C3 %8F
(undefined) | 0x90 | | | | U+00D0 | 0xD0 | Ð | Ã | %C3 %90
U+2018 | 0x91 | ‘ | â€˜ | %E2 %80 %98 | U+00D1 | 0xD1 | Ñ | Ã‘ | %C3 %91
U+2019 | 0x92 | ’ | â€™ | %E2 %80 %99 | U+00D2 | 0xD2 | Ò | Ã’ | %C3 %92
U+201C | 0x93 | “ | â€œ | %E2 %80 %9C | U+00D3 | 0xD3 | Ó | Ã“ | %C3 %93
U+201D | 0x94 | ” | â€ | %E2 %80 %9D | U+00D4 | 0xD4 | Ô | Ã” | %C3 %94
U+2022 | 0x95 | • | â€¢ | %E2 %80 %A2 | U+00D5 | 0xD5 | Õ | Ã• | %C3 %95
U+2013 | 0x96 | – | â€“ | %E2 %80 %93 | U+00D6 | 0xD6 | Ö | Ã– | %C3 %96
U+2014 | 0x97 | — | â€” | %E2 %80 %94 | U+00D7 | 0xD7 | × | Ã— | %C3 %97
U+02DC | 0x98 | ˜ | Ëœ | %CB %9C | U+00D8 | 0xD8 | Ø | Ã˜ | %C3 %98
U+2122 | 0x99 | ™ | â„¢ | %E2 %84 %A2 | U+00D9 | 0xD9 | Ù | Ã™ | %C3 %99
U+0161 | 0x9A | š | Å¡ | %C5 %A1 | U+00DA | 0xDA | Ú | Ãš | %C3 %9A
U+203A | 0x9B | › | â€º | %E2 %80 %BA | U+00DB | 0xDB | Û | Ã› | %C3 %9B
U+0153 | 0x9C | œ | Å“ | %C5 %93 | U+00DC | 0xDC | Ü | Ãœ | %C3 %9C
(undefined) | 0x9D | | | | U+00DD | 0xDD | Ý | Ã | %C3 %9D
U+017E | 0x9E | ž | Å¾ | %C5 %BE | U+00DE | 0xDE | Þ | Ãž | %C3 %9E
U+0178 | 0x9F | Ÿ | Å¸ | %C5 %B8 | U+00DF | 0xDF | ß | ÃŸ | %C3 %9F
U+00A0 | 0xA0 | | Â | %C2 %A0 | U+00E0 | 0xE0 | à | Ã | %C3 %A0
U+00A1 | 0xA1 | ¡ | Â¡ | %C2 %A1 | U+00E1 | 0xE1 | á | Ã¡ | %C3 %A1
U+00A2 | 0xA2 | ¢ | Â¢ | %C2 %A2 | U+00E2 | 0xE2 | â | Ã¢ | %C3 %A2
U+00A3 | 0xA3 | £ | Â£ | %C2 %A3 | U+00E3 | 0xE3 | ã | Ã£ | %C3 %A3
U+00A4 | 0xA4 | ¤ | Â¤ | %C2 %A4 | U+00E4 | 0xE4 | ä | Ã¤ | %C3 %A4
U+00A5 | 0xA5 | ¥ | Â¥ | %C2 %A5 | U+00E5 | 0xE5 | å | Ã¥ | %C3 %A5
U+00A6 | 0xA6 | ¦ | Â¦ | %C2 %A6 | U+00E6 | 0xE6 | æ | Ã¦ | %C3 %A6
U+00A7 | 0xA7 | § | Â§ | %C2 %A7 | U+00E7 | 0xE7 | ç | Ã§ | %C3 %A7
U+00A8 | 0xA8 | ¨ | Â¨ | %C2 %A8 | U+00E8 | 0xE8 | è | Ã¨ | %C3 %A8
U+00A9 | 0xA9 | © | Â© | %C2 %A9 | U+00E9 | 0xE9 | é | Ã© | %C3 %A9
U+00AA | 0xAA | ª | Âª | %C2 %AA | U+00EA | 0xEA | ê | Ãª | %C3 %AA
U+00AB | 0xAB | « | Â« | %C2 %AB | U+00EB | 0xEB | ë | Ã« | %C3 %AB
U+00AC | 0xAC | ¬ | Â¬ | %C2 %AC | U+00EC | 0xEC | ì | Ã¬ | %C3 %AC
U+00AD | 0xAD | | Â | %C2 %AD | U+00ED | 0xED | í | Ã | %C3 %AD
U+00AE | 0xAE | ® | Â® | %C2 %AE | U+00EE | 0xEE | î | Ã® | %C3 %AE
U+00AF | 0xAF | ¯ | Â¯ | %C2 %AF | U+00EF | 0xEF | ï | Ã¯ | %C3 %AF
U+00B0 | 0xB0 | ° | Â° | %C2 %B0 | U+00F0 | 0xF0 | ð | Ã° | %C3 %B0
U+00B1 | 0xB1 | ± | Â± | %C2 %B1 | U+00F1 | 0xF1 | ñ | Ã± | %C3 %B1
U+00B2 | 0xB2 | ² | Â² | %C2 %B2 | U+00F2 | 0xF2 | ò | Ã² | %C3 %B2
U+00B3 | 0xB3 | ³ | Â³ | %C2 %B3 | U+00F3 | 0xF3 | ó | Ã³ | %C3 %B3
U+00B4 | 0xB4 | ´ | Â´ | %C2 %B4 | U+00F4 | 0xF4 | ô | Ã´ | %C3 %B4
U+00B5 | 0xB5 | µ | Âµ | %C2 %B5 | U+00F5 | 0xF5 | õ | Ãµ | %C3 %B5
U+00B6 | 0xB6 | ¶ | Â¶ | %C2 %B6 | U+00F6 | 0xF6 | ö | Ã¶ | %C3 %B6
U+00B7 | 0xB7 | · | Â· | %C2 %B7 | U+00F7 | 0xF7 | ÷ | Ã· | %C3 %B7
U+00B8 | 0xB8 | ¸ | Â¸ | %C2 %B8 | U+00F8 | 0xF8 | ø | Ã¸ | %C3 %B8
U+00B9 | 0xB9 | ¹ | Â¹ | %C2 %B9 | U+00F9 | 0xF9 | ù | Ã¹ | %C3 %B9
U+00BA | 0xBA | º | Âº | %C2 %BA | U+00FA | 0xFA | ú | Ãº | %C3 %BA
U+00BB | 0xBB | » | Â» | %C2 %BB | U+00FB | 0xFB | û | Ã» | %C3 %BB
U+00BC | 0xBC | ¼ | Â¼ | %C2 %BC | U+00FC | 0xFC | ü | Ã¼ | %C3 %BC
U+00BD | 0xBD | ½ | Â½ | %C2 %BD | U+00FD | 0xFD | ý | Ã½ | %C3 %BD
U+00BE | 0xBE | ¾ | Â¾ | %C2 %BE | U+00FE | 0xFE | þ | Ã¾ | %C3 %BE
U+00BF | 0xBF | ¿ | Â¿ | %C2 %BF | U+00FF | 0xFF | ÿ | Ã¿ | %C3 %BF
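Any row of the chart can be recomputed rather than looked up, which is handy for checking a value or extending the idea to another legacy encoding. The following is a sketch under stated assumptions: `chart_row` is a hypothetical helper, not part of the original page, and it uses Python's `cp1252` codec, which treats the five unassigned bytes as errors.

```python
from typing import Optional, Tuple


def chart_row(byte_value: int) -> Optional[Tuple[str, str, str, str, str]]:
    """Compute one chart entry for a Windows-1252 byte (0x80-0xFF).

    Returns (code point, byte, expected, actual, UTF-8 bytes), or
    None when the byte is unassigned in Windows-1252.
    """
    try:
        expected = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned

    utf8 = expected.encode("utf-8")
    # Misread the UTF-8 bytes as Windows-1252; unassigned bytes show
    # up as U+FFFD instead of raising an error.
    actual = utf8.decode("cp1252", errors="replace")
    return (
        f"U+{ord(expected):04X}",
        f"0x{byte_value:02X}",
        expected,
        actual,
        " ".join(f"%{b:02X}" for b in utf8),
    )


print(chart_row(0x80))  # ('U+20AC', '0x80', '€', 'â‚¬', '%E2 %82 %AC')
print(chart_row(0x81))  # None
```

Rows whose second UTF-8 byte is unassigned in Windows-1252, such as 0xC1, end with U+FFFD in this sketch, whereas the chart shows only the leading character.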
Copyright © 2011 Tex Texin. All rights reserved.
Source: http://i18nqa.com/debug/utf8-debug.html