Sunday, August 31, 2008

Unicode and HTML entities in JavaScript


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<script type="text/javascript" charset="utf-8">
var three = "叁";
// Single-character conversion; the same approach also works character by character on longer strings
var code = three.charCodeAt(0).toString(16); // decimal to hexadecimal
document.write(code);
document.write("<br/>");
document.write("&#x" + code + ";");
var decimal = parseInt(code, 16); // hexadecimal to decimal
document.write("<br/>");
document.write(decimal);
document.write("<br/>");
var san = String.fromCharCode(decimal);
document.write(san);
document.write("<br/>");
document.write("<br/>");

// JavaScript supports Unicode escape sequences in string literals
var unicode = "\u53c1"; // different from string "\\u" + "53c1"
document.write("<br/>");
document.write(unicode);
document.write("<br/>");
document.write("<br/>");

// Multi-character conversion using escape()/unescape(); not recommended
var PRC = "中华人民共和国";
var entityChar = PRC.replace(/[^\u0000-\u00FF]/g, function($0){
return escape($0).replace(/(?:%u)(\w{4})/gi, "&#x$1;")
});
document.write("<br/>");
document.write(entityChar); // "中华人民共和国"
document.write("<br/>");
var origin = unescape(entityChar.replace(/&#x([0-9a-f]{4});/gi, '%u$1')); // only removes the ";" that closes an entity
document.write(origin);
</script>
</body>
</html>
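
The escape()/unescape() round trip in the page above is marked as not recommended, and it can be avoided. Here is a minimal sketch that builds the same kind of &#x...; entities directly from charCodeAt() and String.fromCharCode(). The helper names toEntities and fromEntities are hypothetical and do not appear in the original page.

```javascript
// Hypothetical helpers (names are illustrative, not from the original page).

// Replace every UTF-16 code unit above U+00FF with a hexadecimal entity.
function toEntities(text) {
  return text.replace(/[^\u0000-\u00FF]/g, function (ch) {
    return "&#x" + ch.charCodeAt(0).toString(16) + ";";
  });
}

// Turn hexadecimal entities back into characters.
function fromEntities(text) {
  return text.replace(/&#x([0-9a-f]+);/gi, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var encoded = toEntities("中华人民共和国");
console.log(encoded);               // the entity form of the string
console.log(fromEntities(encoded)); // the original string again
```

Like the original, this works on 16-bit code units, so it is only suitable for characters in the Basic Multilingual Plane. Characters beyond it are stored as surrogate pairs and would need separate handling.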

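Unicode escape rule:
A \uXXXX escape represents each character (more precisely, each 16-bit UTF-16 code unit) with four hexadecimal digits. The rule: take the high 8 bits and the low 8 bits of the character (char) separately and convert each to hexadecimal;
if either hexadecimal string is shorter than 2 digits, pad it with a leading 0. Then concatenate the high and low strings and prepend "\u".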
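
The high-byte/low-byte rule can be sketched in a few lines. This is an illustration only, and the helper name toUnicodeEscape is hypothetical. Note that the zero is padded in front of a short hexadecimal string, so that 0x41 becomes "0041" rather than "4100".

```javascript
// Hypothetical helper (name is illustrative): build "\uXXXX" escapes
// from the high 8 bits and low 8 bits of each UTF-16 code unit.
function toUnicodeEscape(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);          // UTF-16 code unit, 0..65535
    var high = (unit >> 8).toString(16);    // high 8 bits as hex
    var low = (unit & 0xff).toString(16);   // low 8 bits as hex
    if (high.length < 2) high = "0" + high; // pad with a leading zero
    if (low.length < 2) low = "0" + low;
    out += "\\u" + high + low;
  }
  return out;
}

console.log(toUnicodeEscape("叁")); // \u53c1
console.log(toUnicodeEscape("A"));  // \u0041
```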
