Edwinux: [HOWTO] Convert Simplified Chinese to Traditional Chinese and vice-versa (繁轉簡)

Friday, September 24, 2010

I am writing a program that needs a database pre-populated with some data. I grabbed the data from the internet, but it is in Simplified Chinese and I need it in Traditional Chinese. The conversion can be done with a single command:

$ iconv -f UTF-8 -t GB2312 simplified_chinese_input.txt | iconv -f GB2312 -t BIG5 | iconv -f BIG5 -t UTF-8 -o traditional_chinese_output.txt

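This actually involves three steps:
1. Convert the text file from UTF-8 to GB2312 (Simplified Chinese).
2. Convert the GB2312 (Simplified Chinese) text to Big5 (Traditional Chinese).
3. Convert the Big5 text back to UTF-8.

Note that iconv converts between encodings character by character. A Simplified character that is not present in Big5 is not mapped to its Traditional form and may be rejected or dropped.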
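
The three steps can also be run one at a time, which makes it easier to notice when one stage fails (in a pipeline, the shell by default reports only the exit status of the last command). Here is a minimal sketch of that idea. The function name s2t_stepwise and the temporary file names are made up for illustration, and it uses shell redirection instead of -o, which may not be available in every iconv implementation.

```shell
#!/bin/sh
# Sketch: run the three conversions separately so that a failure in any
# stage stops the chain. s2t_stepwise and the step*.{gb2312,big5} names
# are illustrative only.
s2t_stepwise() {
    src=$1
    dst=$2
    tmpdir=$(mktemp -d) || return 1

    # Step 1: UTF-8  -> GB2312 (Simplified Chinese encoding)
    # Step 2: GB2312 -> Big5   (Traditional Chinese encoding)
    # Step 3: Big5   -> UTF-8
    iconv -f UTF-8 -t GB2312 "$src" > "$tmpdir/step1.gb2312" &&
    iconv -f GB2312 -t BIG5 "$tmpdir/step1.gb2312" > "$tmpdir/step2.big5" &&
    iconv -f BIG5 -t UTF-8 "$tmpdir/step2.big5" > "$dst"
    status=$?

    # Clean up the intermediate files and report how the chain ended.
    rm -rf "$tmpdir"
    return "$status"
}
```

Once the function is defined in the current shell, it can be called like this:

$ s2t_stepwise simplified_chinese_input.txt traditional_chinese_output.txt || echo "conversion failed"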

In fact, you can reverse the order to convert Traditional Chinese to Simplified Chinese:

$ iconv -f UTF-8 -t BIG5 traditional_chinese_input.txt | iconv -f BIG5 -t GB2312 | iconv -f GB2312 -t UTF-8 -o simplified_chinese_output.txt
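
Since the two directions differ only in the order of the two legacy encodings, they can share one helper. The following is just a sketch: the function name zh_convert and the direction keywords s2t and t2s are my own choices for illustration, and the output is written with shell redirection rather than -o.

```shell
#!/bin/sh
# Sketch: one helper for both directions. zh_convert and the keywords
# s2t / t2s are illustrative names only.
zh_convert() {
    direction=$1
    src=$2
    dst=$3

    case $direction in
        s2t) first=GB2312; second=BIG5 ;;    # Simplified  -> Traditional
        t2s) first=BIG5;   second=GB2312 ;;  # Traditional -> Simplified
        *)   echo "usage: zh_convert s2t|t2s INPUT OUTPUT" >&2
             return 2 ;;
    esac

    # Same three-stage chain; only the order of the two encodings changes.
    iconv -f UTF-8 -t "$first" "$src" |
        iconv -f "$first" -t "$second" |
        iconv -f "$second" -t UTF-8 > "$dst"
}
```

For example:

$ zh_convert t2s traditional_chinese_input.txt simplified_chinese_output.txt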

If you need to do this many times, you can save it as a shell script named S2T.sh with the following content:

#!/bin/sh
# Usage: ./S2T.sh INPUT OUTPUT (both are UTF-8 text files)
iconv -f UTF-8 -t GB2312 "$1" | iconv -f GB2312 -t BIG5 | iconv -f BIG5 -t UTF-8 -o "$2"

Then, set it to be executable

$ chmod u+x S2T.sh

Finally, use it with

$ ./S2T.sh input.txt output.txt
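
If "many times" means a whole directory of files, the same chain can be put in a loop. This is only a sketch under a few assumptions: the function name s2t_dir is hypothetical, the inputs are assumed to be UTF-8 files ending in .txt, and each output keeps the input's file name inside a separate output directory.

```shell
#!/bin/sh
# Sketch: convert every .txt file in one directory and write the results,
# under the same file names, into another. s2t_dir is an illustrative name.
s2t_dir() {
    srcdir=$1
    dstdir=$2
    mkdir -p "$dstdir" || return 1

    for f in "$srcdir"/*.txt; do
        # If nothing matches, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        iconv -f UTF-8 -t GB2312 "$f" |
            iconv -f GB2312 -t BIG5 |
            iconv -f BIG5 -t UTF-8 > "$dstdir/$(basename "$f")"
    done
}
```

For example:

$ s2t_dir simplified/ traditional/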

Hope it helps.

1 comment:

Wang Xuancong, August 13, 2014 at 10:19 AM
Using iconv to do the job is error-prone. For example, "非常强烈" will become "非常烈".