Messy code issue
Reason
Encode
Decode
Lack of a font library
Analyzing the symptoms
- Caused by encoding
On an English Windows (Ewin), create a text file, type “你好”, and save it. When you reopen the file, you see “??”.
Reason:
Windows saves the file as ANSI by default, and on an English-locale system the ANSI code page is a single-byte Western one that is close to ISO-8859-1 (Windows-1252; code page 437 is the OEM code page used by the console). It cannot represent Chinese characters, so each one is encoded as the substitute byte 3F, and “你好” is stored as “3F3F”. 3F is “?”.
Solution:
No decoding can bring the right characters back, because the original bytes were never written. Choose the right encoding when saving a double-byte character document: GB2312 or UTF-8 for Simplified Chinese, BIG5 or UTF-8 for Traditional Chinese. For Chinese users, changing the system locale to Chinese is also a good idea.
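The lossy encoding described above can be reproduced in Java. This is a minimal sketch, not something from the original post: the class name `LossyEncodeDemo` and the `hex` helper are made up for illustration, and ISO-8859-1 stands in for the English ANSI code page.

```java
import java.nio.charset.StandardCharsets;

public class LossyEncodeDemo {
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "你好" written with Unicode escapes, so the encoding of this
        // source file cannot interfere with the demonstration.
        String text = "\u4F60\u597D";

        // ISO-8859-1 has no Chinese characters: each one becomes 0x3F ('?').
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 can represent them: three bytes per character here.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(hex(latin1)); // 3F 3F
        System.out.println(hex(utf8));   // E4 BD A0 E5 A5 BD
        // Decoding the lossy bytes yields "??" whatever single-byte charset is used.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

Once the bytes are 3F 3F, nothing in the file says which characters were typed, which is why only a different encoding at save time helps.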
- Caused by decoding
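On a Chinese Windows (Cwin), create a text file containing “你好” and copy it to Ewin. Open it there and the text is garbled.
Reason:
Cwin saved the file as ANSI, which on that system means GB2312 (code page 936). After the copy, Notepad on Ewin decodes the same bytes as ISO-8859-1.
Solution:
Select the right decoding, here GB2312, when opening the file.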
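Unlike the encoding case, the bytes here are intact and only the interpretation is wrong. The sketch below illustrates that under two assumptions: the running JDK ships the GBK charset (full JDK builds normally do), and ISO-8859-1 stands in for the English ANSI code page. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongDecodeDemo {
    // Assumption: GBK (code page 936, a superset of GB2312) is available in this JDK.
    public static final Charset GBK = Charset.forName("GBK");

    // Decodes the same bytes with whichever charset the caller picks.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"

        // What the Chinese system wrote to disk: bytes C4 E3 BA C3.
        byte[] saved = text.getBytes(GBK);

        // What the English system shows: four Latin-1 characters.
        String wrong = decode(saved, StandardCharsets.ISO_8859_1);
        // What the right decoding shows.
        String right = decode(saved, GBK);

        System.out.println(wrong.length());     // 4
        System.out.println(right.equals(text)); // true

        // ISO-8859-1 maps every byte to a character, so the wrong decoding
        // is reversible: re-encode, then decode with the right charset.
        String repaired = new String(wrong.getBytes(StandardCharsets.ISO_8859_1), GBK);
        System.out.println(repaired.equals(text)); // true
    }
}
```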
- Caused by the application
Open uedit32.exe (Chinese version) and its text is garbled.
Reason: Windows uses Unicode if the application supports Unicode; otherwise it falls back to ANSI, meaning the standard encoding defined for the configured country or region.
Solution: In Regional and Language Options, set both “Standards and formats” and the language for non-Unicode programs to Simplified Chinese. The system will then decode ANSI text with the Chinese code page.
- Caused by a missing font
Open a file and see square symbols.
Reason: The byte sequence is decoded to code points, and each code point is looked up in the font library to find its character, which is then drawn on the screen as a dot matrix. If the character is not found, a square is drawn in its place.
Solution: Install a font that covers the characters.
Think in coding
I/O operations: reading is decoding (bytes -> characters), while writing is encoding (characters -> bytes).
Here is the java I/O interface:
When we use Writer and FileOutputStream:
- String.getBytes.
String.getBytes(): Encodes this String into a sequence of bytes using the platform’s default charset(Charset.defaultCharset(), which is decided by system attribute file.encoding), storing the result into a new byte array.
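Note: if you do not set the JVM’s file.encoding, the default depends on the environment that starts the JVM. Started from cmd, it follows the system’s regional language setting, while Eclipse lets you set this property itself. (From JDK 18 on, the default charset is UTF-8 unless file.encoding says otherwise.)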
List[1]. String.getBytes() producing garbled text
public static void main(String[] args) {
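The listing above is cut off in the source, so the following is not the author's code. It is a small sketch of the same idea, with hypothetical names (`GetBytesDemo`, `encodeWithDefault`, `encodeUtf8`): the no-argument `getBytes()` follows the platform default, while the overload taking a `Charset` always gives the same bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {
    // Result depends on Charset.defaultCharset(), i.e. on the environment.
    public static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Result is the same on every machine.
    public static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"

        System.out.println("default charset: " + Charset.defaultCharset());
        // These bytes vary with the default charset printed above.
        System.out.println("default bytes:   " + Arrays.toString(encodeWithDefault(text)));
        // These bytes are always the six UTF-8 bytes of the two characters.
        System.out.println("UTF-8 bytes:     " + Arrays.toString(encodeUtf8(text)));
    }
}
```

Running it with different `-Dfile.encoding=...` values is one way to see the first byte array change while the second stays fixed; how far that flag is honored depends on the JDK version.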
List[2]. OutputStreamWriter with an explicit charset
private static void writeErrorWithCharSet(String a_error) {
To avoid garbled text when calling the I/O APIs, prefer the overloads that take an explicit charset argument.
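As a rough illustration of that advice, the sketch below writes text through `OutputStreamWriter` and reads it back through `InputStreamReader`, both with the charset passed in explicitly. The `roundTrip` helper and the temporary file are assumptions made for the example and do not come from the truncated listings.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriterCharsetDemo {
    // Hypothetical helper: writes text to a temp file with writeCs,
    // reads it back with readCs, and returns what the reader saw.
    public static String roundTrip(String text, Charset writeCs, Charset readCs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                // Writing is encoding: characters -> bytes.
                try (Writer w = new OutputStreamWriter(new FileOutputStream(tmp.toFile()), writeCs)) {
                    w.write(text);
                }
                // Reading is decoding: bytes -> characters.
                StringBuilder sb = new StringBuilder();
                try (Reader r = new InputStreamReader(new FileInputStream(tmp.toFile()), readCs)) {
                    int c;
                    while ((c = r.read()) != -1) {
                        sb.append((char) c);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        System.out.println(roundTrip(text, utf8, utf8).equals(text));   // true
        System.out.println(roundTrip(text, utf8, latin1).equals(text)); // false
        System.out.println(roundTrip(text, latin1, latin1));            // ??
    }
}
```

The three calls line up with the cases discussed earlier: matching charsets, a decoding mismatch, and an encoding that cannot represent the text.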
Web Application
Reason:
The browser does not follow the URI encoding standard, the server’s encoding and decoding are not configured, or the developer made a mistake.
- GET method: non-ASCII characters are URL-encoded.
domain:port/contextPath/servletPath/pathInfo?queryString
How pathInfo and queryString are decoded depends on the server. Tomcat configures this in server.xml; the charset used to decode the pathInfo part is defined on the connector’s
To avoid an unwanted encoding, use only ASCII in the URL, or URL-encode the value first.
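To make the URL-encoding step concrete, here is a minimal sketch using `java.net.URLEncoder` and `java.net.URLDecoder`. The wrapper class `UrlEncodeDemo` and its two helpers are hypothetical, and decoding with ISO-8859-1 is only a stand-in for a server whose URI charset does not match the client's.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodeDemo {
    // Percent-encodes a value using the named charset.
    public static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reverses the percent-encoding, interpreting the bytes with the named charset.
    public static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"

        // The client side: only ASCII goes on the wire.
        String query = encode(text, "UTF-8");
        System.out.println(query); // %E4%BD%A0%E5%A5%BD

        // A server decoding with the same charset recovers the text.
        System.out.println(decode(query, "UTF-8").equals(text)); // true

        // A server decoding with a different charset gets six unrelated characters.
        System.out.println(decode(query, "ISO-8859-1").length()); // 6
    }
}
```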
- POST method: the browser checks the page’s contentType (“text/html;charset=utf-8”) and encodes the form with that charset.
<%@ page language="java" contentType="text/html; charset=GB18030" pageEncoding="UTF-8"%>
pageEncoding is the encoding in which the JSP file itself is saved.
List[3]. POST request that calls setContentType
protected void doPost(HttpServletRequest request, HttpServletResponse
JSP that submits the request with POST
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
- Browser display: Chrome uses the contentType and charset from the JSP, while Firefox uses its text encoding setting.
- For JSP (HTML): the JSP is saved using pageEncoding; if that is not specified, charset is used; if there is no charset either, the default is ISO-8859-1. The charset is responsible for telling the browser how to decode the web page.
- For dynamic content: the server uses HttpServletResponse.setContentType to set the HTTP header’s contentType.
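The role of the charset in contentType, and the ISO-8859-1 fallback mentioned above, can be sketched without a servlet container. The code below is a simplified stand-in, not the servlet API and not a full header parser: `charsetOf` and `decodeBody` are hypothetical helpers that pick a charset from a contentType string and decode a body with it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharsetDemo {
    // Simplified parsing: looks for a "charset=" parameter and falls back
    // to ISO-8859-1 when none is present.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes the body bytes with whatever charset the contentType announces.
    public static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"
        byte[] body = text.getBytes(StandardCharsets.UTF_8);

        // Charset announced: the receiver decodes correctly.
        System.out.println(decodeBody(body, "text/html;charset=utf-8").equals(text)); // true
        // No charset announced: the ISO-8859-1 fallback garbles the text.
        System.out.println(decodeBody(body, "text/html").equals(text)); // false
    }
}
```

In a real servlet the same decision is made by the container, which is why declaring the charset before reading or writing the body matters.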
File name garbled when downloading
Reason: HTTP headers only support ASCII, so any other character is encoded as 3F (?).
Solution: Call URLEncoder.encode(filename, charset) first, then put the result in the header.
List[4].
protected void doGet(HttpServletRequest request, HttpServletResponse
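The listing above is truncated in the source, so what follows is only a sketch of the header value such a doGet might build, without the servlet API. The `contentDisposition` helper is hypothetical. It also replaces `+` with `%20`, because `URLEncoder` encodes spaces as `+`, which would otherwise show up literally in the downloaded file name; that step is an addition not mentioned in the post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DownloadHeaderDemo {
    // Builds an ASCII-only Content-Disposition value for a file name.
    public static String contentDisposition(String filename) {
        try {
            // URLEncoder targets form data, so a space becomes '+'.
            // A literal '+' in the name is already encoded as %2B,
            // which makes the replacement below safe.
            String encoded = URLEncoder.encode(filename, "UTF-8").replace("+", "%20");
            return "attachment; filename=\"" + encoded + "\"";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "你好 report.txt" with the Chinese part written as Unicode escapes.
        String header = contentDisposition("\u4F60\u597D report.txt");
        System.out.println(header);
        // attachment; filename="%E4%BD%A0%E5%A5%BD%20report.txt"
    }
}
```

In a servlet, a value like this would typically be passed to the response's Content-Disposition header; how each browser decodes the percent-escapes is outside what this sketch shows.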
Database operation
Bridge: Unicode
Encoding settings involved: the server database, the client system, and the client environment variables.
Create the database using UTF-8; SQL NCHAR can also solve multi-language issues.
Deep in analyzing the web request
- Post title:Messy code issue
- Post author:ReZero
- Create time:2018-01-23 16:55:51
- Post link:https://rezeros.github.io/2018/01/23/messy-code/
- Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stating additionally.