Full-width/Half-width Conversion

I saw a user on a BBS asking how to convert full-width lowercase letters into full-width uppercase ones (the post no longer exists 一皿一!!)

Problem Analysis

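To handle this, the full-width/half-width conversion problem has to be solved first. As far as I can tell there are two ways to do it: one computes the offset between a character's full-width and half-width codes and shifts the character by that amount; the other builds a half-width/full-width mapping table directly. After that, converting the full-width letters to half-width, upper-casing them, and converting them back to full-width achieves the goal.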
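For comparison, PHP's mbstring extension already ships a width converter, mb_convert_kana(). The sketch below runs the same three steps with it (to half-width, upper-case, back to full-width). fullWidthToUpper is a hypothetical name, and the option letters follow the PHP manual: a/A convert letters, digits and most ASCII symbols (the manual lists ", ', \ and ~ as excluded), s/S convert the space. It assumes UTF-8 input; note that strtoupper() only ignores the locale from PHP 8.2 on.

```php
<?php
/**
 * Hypothetical helper: upper-case the letters of a full-width string
 * by going through half-width, using mb_convert_kana().
 */
function fullWidthToUpper(string $str): string
{
    // "a": full-width letters/digits/most symbols -> half-width; "s": U+3000 -> U+0020
    $half = mb_convert_kana($str, 'as', 'UTF-8');
    // At this point only ASCII letters need changing
    $upper = strtoupper($half);
    // "A": half-width letters/digits/most symbols -> full-width; "S": U+0020 -> U+3000
    return mb_convert_kana($upper, 'AS', 'UTF-8');
}

echo fullWidthToUpper('ｉ　ａｍ　ｈａｌｆｗｉｄｔｈ　ｗｏｒｄｓ'), "\n";
```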

Solutions

Code Offset Method

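Looking at Wikipedia's half-width/full-width table for the ASCII characters, it appears that every character except the space keeps the same relative position. (In Unicode terms, ！ through ～ are U+FF01 to U+FF5E, exactly 0xFEE0 above ! through ~ at U+0021 to U+007E, while the full-width space is U+3000.)
So it should be enough to compute the difference between a character's full-width and half-width codes and shift each character by that amount.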
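A minimal sketch of that offset idea, assuming PHP 7.2+ for mb_ord()/mb_chr(): working on Unicode code points rather than on UTF-8 bytes (as the shiftSpace() function below does), every character from ! to ~ is shifted by the same 0xFEE0, and only the space needs a special case. toFullWidthChar and toHalfWidthChar are hypothetical helper names; characters outside both ranges are returned unchanged.

```php
<?php
// Offset between a full-width form (U+FF01..U+FF5E) and its ASCII counterpart (U+0021..U+007E)
const FULLWIDTH_OFFSET = 0xFEE0;

function toFullWidthChar(string $char): string
{
    $cp = mb_ord($char, 'UTF-8');
    if ($cp === 0x20) {
        return mb_chr(0x3000, 'UTF-8'); // ASCII space -> ideographic space
    }
    if ($cp >= 0x21 && $cp <= 0x7E) {
        return mb_chr($cp + FULLWIDTH_OFFSET, 'UTF-8');
    }
    return $char; // anything else is left untouched
}

function toHalfWidthChar(string $char): string
{
    $cp = mb_ord($char, 'UTF-8');
    if ($cp === 0x3000) {
        return ' '; // ideographic space -> ASCII space
    }
    if ($cp >= 0xFF01 && $cp <= 0xFF5E) {
        return mb_chr($cp - FULLWIDTH_OFFSET, 'UTF-8');
    }
    return $char;
}

echo toFullWidthChar('a'), toFullWidthChar('_'), toFullWidthChar('`'), "\n"; // ａ＿｀
echo toHalfWidthChar('ｚ'), toHalfWidthChar('　'), toHalfWidthChar('！'), "\n"; // z !
```

Because the full-width block is contiguous in code-point space, this variant needs no further correction between ＿ and ｀.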

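Before starting, though, I printed the full-width range with the code below as a sanity check. It takes the integer value of the UTF-8 bytes of ！ and prints the next 256 values.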

// Print 256 consecutive byte values starting at the UTF-8 encoding of the full-width ！
$hexdec = hexdec(bin2hex('！'));
for ($i = $hexdec; $i < ($hexdec + 256); $i++) {
    echo hex2bin(dechex($i)) . "<br />\n";
}

Note: if your PHP version is older than 5.4.0, you have to provide hex2bin yourself, as follows:

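// Only define the fallback when the built-in (PHP 5.4+) is missing,
// otherwise PHP stops with "Cannot redeclare hex2bin()"
if (!function_exists('hex2bin')) {
    function hex2bin($data) {
        $binary = '';
        for ($i = 0; $i < strlen($data); $i += 2) {
            // Turn each pair of hex digits into one byte
            $binary .= pack('C', hexdec(substr($data, $i, 2)));
        }
        return $binary;
    }
}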

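The output shows that the order does not match the Wikipedia table: between the full-width ＿ and ｀ there are 192 garbage characters. They are not real characters. The loop increments the integer value of the UTF-8 bytes rather than the Unicode code point; ＿ is EF BC BF and ｀ is EF BD 80, and every value in between has a last byte outside 0x80 to 0xBF, so it is not a valid UTF-8 sequence.
In other words, every half-width character after _ (that is, ` through ~) needs an extra 192 added to its code before it lands on the matching full-width character.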

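/**
 * Automatic full-width/half-width conversion of an ASCII character (code offset method)
 *
 * @author LIAO SAN-KAI
 *
 * @param string $char  the ASCII character to convert
 * @param string $width width mode: half|full|auto (half-width|full-width|automatic)
 * @return string the corresponding converted character
 */
function shiftSpace($char = '', $width = 'auto') {
    // Integer value of the character's UTF-8 bytes
    $charHex = hexdec(bin2hex($char));
    // Full-width: the ideographic space (U+3000) or ！ through ～
    $isFull = ($char === '　' || ($charHex >= hexdec(bin2hex('！')) && $charHex <= hexdec(bin2hex('～'))));
    // Half-width: the ASCII space or ! through ~
    $isHalf = ($char === ' ' || ($charHex >= hexdec(bin2hex('!')) && $charHex <= hexdec(bin2hex('~'))));
    // Anything else (CJK text, line breaks, ...) is returned untouched
    if (!$isFull && !$isHalf) {
        return $char;
    }
    $charWidth = $isFull ? 'full' : 'half';
    // If the character is already in the requested width, return it as is
    if ($charWidth === $width) {
        return $char;
    }
    // The space pair does not follow the offset rule, so map it directly
    if ($char === '　') {
        return ' ';
    } elseif ($char === ' ') {
        return '　';
    }
    // Offset between a full-width character and its ASCII counterpart
    $diff = abs(hexdec(bin2hex('！')) - hexdec(bin2hex('!')));
    // Extra correction for half-width characters after "_" (192)
    $fix = abs(hexdec(bin2hex('＿')) - hexdec(bin2hex('｀'))) - 1;
    // Full-width <-> half-width conversion
    if ($charWidth === 'full') {
        $charHex = $charHex - (($charHex > hexdec(bin2hex('＿'))) ? $diff + $fix : $diff);
    } else {
        $charHex = $charHex + (($charHex > hexdec(bin2hex('_'))) ? $diff + $fix : $diff);
    }
    return hex2bin(dechex($charHex));
}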
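The 192 in $fix can be checked directly. The sketch below (assuming the mbstring extension for mb_check_encoding()) prints the UTF-8 bytes of ＿ and ｀, computes the gap the same way $fix does, and confirms that none of the skipped values is valid UTF-8:

```php
<?php
// UTF-8 bytes of the two full-width characters on either side of the gap
$underscore = bin2hex('＿'); // U+FF3F
$backtick   = bin2hex('｀'); // U+FF40
echo $underscore, ' ', $backtick, "\n"; // efbcbf efbd80

// Integer values strictly between them, computed like $fix
$gap = hexdec($backtick) - hexdec($underscore) - 1;
echo $gap, "\n"; // 192

// Count how many of those values are NOT valid UTF-8
$invalid = 0;
for ($v = hexdec($underscore) + 1; $v < hexdec($backtick); $v++) {
    $bytes = hex2bin(dechex($v));
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        $invalid++;
    }
}
echo $invalid, "\n"; // 192 (the last byte is never in 0x80..0xBF)
```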

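Note that I deliberately gave shiftSpace() a second parameter: a character that is already in the requested width (half or full) is returned unchanged, while auto flips every character to the other width. This is what makes it safe to run over a whole string, as in the following example: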

function strShiftSpace($string = '', $width = 'auto') {
    $str = '';
    $length = mb_strlen($string, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        // Take one UTF-8 character at a time and convert it
        $char = mb_substr($string, $i, 1, 'UTF-8');
        $str .= shiftSpace($char, $width);
    }
    return $str;
}
echo strShiftSpace('Hello Ｗｏｒｌｄ', 'full'), "\n"; // prints Ｈｅｌｌｏ　Ｗｏｒｌｄ
echo strShiftSpace('Hello Ｗｏｒｌｄ', 'half'), "\n"; // prints Hello World
echo strShiftSpace('Hello Ｗｏｒｌｄ', 'auto'), "\n"; // prints Ｈｅｌｌｏ　World
echo strShiftSpace('Hello Ｗｏｒｌｄ'), "\n";         // prints Ｈｅｌｌｏ　World

Lookup Table Method

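Compared with that, the lookup table method is much simpler: once the mapping table is typed out by hand, it is ready to use. In theory it should also run faster than the code offset method, since each character costs one array lookup instead of several bin2hex()/hexdec() calls. The catch is that the number of characters has to stay manageable; otherwise you will be typing until you drop.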
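If typing the table becomes the bottleneck, it can also be generated from the 0xFEE0 offset between the two blocks. A sketch, assuming PHP 7.2+ for mb_chr(); buildWidthTable is a hypothetical helper. For whole strings, strtr() with an array argument replaces complete byte sequences, so it can apply the table without splitting the string into characters first:

```php
<?php
// Build the full-width => half-width table from code points instead of by hand
function buildWidthTable(): array
{
    $table = ['　' => ' ']; // U+3000 ideographic space => U+0020
    for ($cp = 0x21; $cp <= 0x7E; $cp++) {
        $table[mb_chr($cp + 0xFEE0, 'UTF-8')] = chr($cp);
    }
    return $table;
}

$toHalf = buildWidthTable();
$toFull = array_flip($toHalf);

// strtr() matches each multi-byte key as a unit; other characters pass through
echo strtr('Ｈｅｌｌｏ　Ｗｏｒｌｄ，中文不變', $toHalf), "\n"; // Hello World,中文不變
echo strtr('Hello World', $toFull), "\n"; // Ｈｅｌｌｏ　Ｗｏｒｌｄ
```

ASCII bytes never occur inside a multi-byte UTF-8 sequence, so the reversed table cannot match part of a CJK character.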

/**
 * Automatic full-width/half-width conversion of an ASCII character (hand-made lookup table)
 *
 * @author LIAO SAN-KAI
 *
 * @param string $char  the ASCII character to convert
 * @param string $width width mode: half|full|auto (half-width|full-width|automatic)
 * @return string the corresponding converted character
 */
function shiftSpaceTable($char = '', $width = 'auto') {
    // Hand-made mapping table: full-width => half-width
    $charTable = array(
        "　" => " ",
        "！" => "!",
        "＂" => "\"",
        "＃" => "#",
        "＄" => "$",
        "％" => "%",
        "＆" => "&",
        "＇" => "'",
        "（" => "(",
        "）" => ")",
        "＊" => "*",
        "＋" => "+",
        "，" => ",",
        "－" => "-",
        "．" => ".",
        "／" => "/",
        "０" => "0",
        "１" => "1",
        "２" => "2",
        "３" => "3",
        "４" => "4",
        "５" => "5",
        "６" => "6",
        "７" => "7",
        "８" => "8",
        "９" => "9",
        "：" => ":",
        "；" => ";",
        "＜" => "<",
        "＝" => "=",
        "＞" => ">",
        "？" => "?",
        "＠" => "@",
        "Ａ" => "A",
        "Ｂ" => "B",
        "Ｃ" => "C",
        "Ｄ" => "D",
        "Ｅ" => "E",
        "Ｆ" => "F",
        "Ｇ" => "G",
        "Ｈ" => "H",
        "Ｉ" => "I",
        "Ｊ" => "J",
        "Ｋ" => "K",
        "Ｌ" => "L",
        "Ｍ" => "M",
        "Ｎ" => "N",
        "Ｏ" => "O",
        "Ｐ" => "P",
        "Ｑ" => "Q",
        "Ｒ" => "R",
        "Ｓ" => "S",
        "Ｔ" => "T",
        "Ｕ" => "U",
        "Ｖ" => "V",
        "Ｗ" => "W",
        "Ｘ" => "X",
        "Ｙ" => "Y",
        "Ｚ" => "Z",
        "［" => "[",
        "＼" => "\\",
        "］" => "]",
        "＾" => "^",
        "＿" => "_",
        "｀" => "`",
        "ａ" => "a",
        "ｂ" => "b",
        "ｃ" => "c",
        "ｄ" => "d",
        "ｅ" => "e",
        "ｆ" => "f",
        "ｇ" => "g",
        "ｈ" => "h",
        "ｉ" => "i",
        "ｊ" => "j",
        "ｋ" => "k",
        "ｌ" => "l",
        "ｍ" => "m",
        "ｎ" => "n",
        "ｏ" => "o",
        "ｐ" => "p",
        "ｑ" => "q",
        "ｒ" => "r",
        "ｓ" => "s",
        "ｔ" => "t",
        "ｕ" => "u",
        "ｖ" => "v",
        "ｗ" => "w",
        "ｘ" => "x",
        "ｙ" => "y",
        "ｚ" => "z",
        "｛" => "{",
        "｜" => "|",
        "｝" => "}",
        "～" => "~",
    );
    // Reversed table (half-width => full-width), built only once
    static $flipTable = null;
    if ($flipTable === null) {
        $flipTable = array_flip($charTable);
    }
    // A character that appears as a key is full-width; anything else counts as half-width
    $charWidth = array_key_exists($char, $charTable) ? 'full' : 'half';
    // If the character is already in the requested width, return it as is
    if ($charWidth == $width) {
        return $char;
    }
    // Half-width input is looked up in the reversed table
    $table = ($charWidth == 'half') ? $flipTable : $charTable;
    // Characters in neither table (CJK text, line breaks, ...) pass through unchanged
    return isset($table[$char]) ? $table[$char] : $char;
}

I tried to reproduce user tas72732002's actual situation, and this time I solved it with the lookup table function instead.

// Input: full-width text whose letters should be upper-cased
$inputstr = 'Ｉ　ａｍ　ｈａｌｆｗｉｄｔｈ　ｗｏｒｄｓ';
$fullstr = '';
$length = mb_strlen($inputstr, 'UTF-8');
for ($i = 0; $i < $length; $i++) {
    $char = mb_substr($inputstr, $i, 1, 'UTF-8');
    // First convert to half-width
    $halfchar = shiftSpaceTable($char, 'half');
    // Convert to uppercase
    $upperchar = strtoupper($halfchar);
    // Then convert back to full-width
    $fullstr .= shiftSpaceTable($upperchar, 'full');
}
echo $fullstr; // prints Ｉ　ＡＭ　ＨＡＬＦＷＩＤＴＨ　ＷＯＲＤＳ
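As a cross-check for this particular case: mbstring's case mapping follows Unicode, which also pairs the full-width letters ａ to ｚ (U+FF41 to U+FF5A) with Ａ to Ｚ (U+FF21 to U+FF3A), so mb_strtoupper() should give the same result without the round trip through half-width. A minimal sketch:

```php
<?php
$input = 'Ｉ　ａｍ　ｈａｌｆｗｉｄｔｈ　ｗｏｒｄｓ';
// Unicode case mapping covers full-width Latin letters directly;
// the ideographic space has no case and is left alone
$upper = mb_strtoupper($input, 'UTF-8');
echo $upper, "\n"; // Ｉ　ＡＭ　ＨＡＬＦＷＩＤＴＨ　ＷＯＲＤＳ
```

The width conversion functions above remain useful when the goal is to change width rather than case.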
