Friday, September 24, 2010

[HOWTO] Convert Simplified Chinese to Traditional Chinese and vice-versa (繁轉簡)

I am working on a program that needs a database pre-populated with some data. I scraped the data from the internet, but it is in Simplified Chinese and I need it in Traditional Chinese. The conversion can be done with a single command:

$ iconv simplified_chinese_input.txt -f utf8 -t gb2312 | iconv -f gb2312 -t big5 | iconv -f big5 -t utf8 -o traditional_chinese_output.txt


This actually involves three steps:
1. First, convert the text file from UTF-8 to GB2312 (Simplified Chinese).
2. Then, convert the GB2312 (Simplified Chinese) text to Big5 (Traditional Chinese).
3. Finally, convert the Big5 text back to UTF-8.
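
To see what each stage produces, the three steps can also be run one at a time with intermediate files. This is only a sketch: the file names are made up for illustration, and the sample text deliberately uses characters that exist in both GB2312 and Big5, so it passes through every stage.

```shell
# Hypothetical sample file; both characters exist in GB2312 and in Big5.
printf '中文\n' > sample_utf8.txt

# Step 1: UTF-8 -> GB2312
iconv -f UTF-8 -t GB2312 sample_utf8.txt > step1_gb2312.txt

# Step 2: GB2312 -> Big5
iconv -f GB2312 -t BIG5 step1_gb2312.txt > step2_big5.txt

# Step 3: Big5 -> UTF-8
iconv -f BIG5 -t UTF-8 step2_big5.txt > step3_utf8.txt
```

The two intermediate files are 5 bytes each (two double-byte characters plus the newline), while the UTF-8 files are 7 bytes, which is a quick way to confirm that the encoding really changed at each step.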

You can also reverse the order to convert Traditional Chinese to Simplified Chinese:
$ iconv traditional_chinese_input.txt -f utf8 -t big5 | iconv -f big5 -t gb2312 | iconv -f gb2312 -t utf8 -o simplified_chinese_output.txt


If you need to do this often, you can save it as a shell script named S2T.sh with the following content:
#!/bin/sh
# Usage: ./S2T.sh input.txt output.txt
iconv "$1" -f utf8 -t gb2312 | iconv -f gb2312 -t big5 | iconv -f big5 -t utf8 -o "$2"


Then make it executable:
$ chmod u+x S2T.sh


Finally, run it with:
$ ./S2T.sh input.txt output.txt
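
Each stage only changes the encoding, so a character has to exist in both GB2312 and Big5 to make it through the whole pipeline. The sketch below is one way to check a file before converting it. The function name `survives_pipeline` and the sample file names are hypothetical, and it assumes an iconv that exits with a non-zero status when it meets a character it cannot encode (GNU iconv does).

```shell
# Hypothetical helper: succeeds only if every character in the UTF-8 file
# given as $1 can be encoded in both GB2312 and Big5.
survives_pipeline() {
    iconv -f UTF-8 -t GB2312 "$1" > /dev/null 2>&1 &&
    iconv -f UTF-8 -t BIG5 "$1" > /dev/null 2>&1
}

printf '中文\n' > shared_sample.txt      # characters present in both sets
printf '汉语\n' > simplified_sample.txt  # simplified-only forms

for f in shared_sample.txt simplified_sample.txt; do
    if survives_pipeline "$f"; then
        echo "$f: ok"
    else
        echo "$f: contains characters the pipeline cannot carry"
    fi
done
```

Running a check like this first makes it obvious when a file contains text that would otherwise be lost or stop the conversion partway through.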


Hope it helps.

1 comment:

  1. Using iconv to do the job is error-prone; for example, "非常强烈" will become "非常烈".
