Chinese input methods for Emacs

Sunday, 8th February, 2009

From the llaisdy blog archives.

This all feels a bit antiquated now that I’m working on a Mac (I use the Mac input methods instead of the Emacs input methods), but it’s useful whenever I have to work on a Windows machine. The notes below are not complete and I’d appreciate any comments to help fill in the gaps.

Contents

Introduction
Methods overview
Methods in detail
Conclusion
References

Introduction

Emacs provides 25 input methods for Chinese. Although each input method has its own describe-input-method page, these pages can be rather terse. There is also no overview or comparison of the different input methods, nor have I been able to find one on the web.

Here I have gathered together the information I’ve been able to find. I’d be pleased to hear about any errors I’ve made, and where I can find further information to correct my omissions. I’ll keep this page up-to-date.

I’m learning Mandarin Chinese, I’m interested in simplified script, and for the moment I find a pinyin-based approach to the written language easiest. For my own current requirements, chinese-tonepy is fitting the bill, but I’m interested in learning a structural input method (i.e., not based on pronunciation). See the Conclusion for further discussion.

Methods overview

Table: Chinese input methods provided by Emacs

Method name	Method name
chinese-4corner	chinese-array30
chinese-b5-quick	chinese-b5-tsangchi
chinese-ccdospy
chinese-cns-quick	chinese-cns-tsangchi
chinese-ctlau	chinese-ctlaub
chinese-ecdict	chinese-etzy
chinese-punct	chinese-punct-b5
chinese-py	chinese-py-b5
chinese-py-punct	chinese-py-punct-b5
chinese-qj	chinese-qj-b5
chinese-sisheng	chinese-sw
chinese-tonepy	chinese-tonepy-punct
chinese-ziranma	chinese-zozy

The basic idea common to all these input methods is that you type in a key sequence, bringing up a menu of options, from which you choose the character you want.

For example, in chinese-tonepy, typing in ‘hao3’ brings up this menu in the minibuffer window:

[Screenshot: the chinese-tonepy menu of options for ‘hao3’ in the minibuffer]

If you want hao3 = ‘good’, choose option 2 (by typing 2 or clicking on the menu). Character 好 appears at point.

At any point in any of the input methods you can press <tab> for a full tree-list of the options available to you from that point, e.g.:

[Screenshot: the full tree-list of options shown after pressing <tab>]

punct

Some input methods have versions with or without -punct (see the table above). The -punct versions support proper Chinese punctuation characters. However, (a) chinese-punct works, but chinese-py-punct (and possibly others) didn’t seem to when I first tried it (as David points out in the comments below, you have to type ‘v’ before the punctuation); (b) the versions without -punct use ASCII punctuation, which meets my needs for the moment.

In the discussion below I ignore the -punct versions.

b5

Some input methods have versions with or without -b5. The -b5 input methods generate characters in the Big Five character set. Big Five is a Taiwanese character set for traditional characters. Other input methods support GB2312, which is a character set from the People’s Republic of China, for simplified characters. For example:

Input Method	Input	Output
chinese-py-b5	guo2/1 yu3/2	國語
chinese-tonepy	guo2/1 yu3/7	国语
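
The difference between the two character sets can be checked outside Emacs. Here is a minimal Python sketch, assuming Python's built-in "big5" and "gb2312" codecs are a fair stand-in for the character sets the input methods target; the helper name can_encode is my own and not part of Emacs.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Traditional 國語 fits in Big Five; simplified 国语 fits in GB2312.
print(can_encode("國語", "big5"))    # traditional text, Taiwanese charset
print(can_encode("国语", "gb2312"))  # simplified text, PRC charset
# Crossing them over fails, because each set lacks the other script's forms.
print(can_encode("国语", "big5"))
print(can_encode("國語", "gb2312"))
```

The first two checks succeed and the last two fail, which is why a -b5 method and its GB2312 counterpart give different characters for the same pinyin.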

For the full skinny on character sets see Lunde (1999).

I presume that, apart from chinese-b5-quick and chinese-b5-tsangchi, the -b5 input methods work the same way as the non-b5 methods, but just output Big Five (i.e., traditional script) instead of GB2312 (i.e., simplified script). Consequently, in the following discussion I ignore the -b5 versions (apart from chinese-b5-quick and chinese-b5-tsangchi).

Methods in detail

For each method I’ll give the character set (if not GB2312), and, as a simple illustration of use, how to generate 你好吗 ( nǐ hǎo ma = are you well?).

chinese-4corner

The describe-input-method page has no description!

The Wikipedia provides a description [http://en.wikipedia.org/wiki/Four_corner_method]. You use four digits to describe the four corners of a character (top left, top right, bottom left, bottom right).

Quoting the Wikipedia article:

Digit	Meaning
1	a horizontal stroke,
2	a vertical or diagonal stroke,
3	a dot stroke,
4	two strokes in a cross shape,
5	three or more strokes in which one stroke intersects all others,
6	a box-shape,
7	where a stroke turns a corner,
8	the shape of the Chinese character 八 and its inverted form, and
9	the shape of the Chinese character 小 and its inverted form.
0	where there is either nothing in a corner, the part in a corner is already represented by a previous corner, or where a corner has a dot stroke followed by a horizontal stroke
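
To make the digit table easier to play with, here is a small Python sketch that spells out a four-digit code corner by corner. The shape wording is abbreviated from the table above; the function name and the strict four-digit check are my own choices, and the sketch does not handle the extra fifth digit that appears in some codes.

```python
# Digit meanings, abbreviated from the table quoted above.
CORNER_SHAPES = {
    "0": "nothing, or already covered by an earlier corner",
    "1": "horizontal stroke",
    "2": "vertical or diagonal stroke",
    "3": "dot stroke",
    "4": "two strokes in a cross shape",
    "5": "three or more strokes, one intersecting all others",
    "6": "box shape",
    "7": "stroke turning a corner",
    "8": "shape of 八 or its inverted form",
    "9": "shape of 小 or its inverted form",
}

# Order of the corners as given in the description above.
CORNER_NAMES = ("top left", "top right", "bottom left", "bottom right")


def describe_four_corner(code):
    """Pair each digit of a four-digit code with its corner and shape."""
    if len(code) != 4 or any(d not in CORNER_SHAPES for d in code):
        raise ValueError("expected exactly four digits 0-9")
    return [(corner, CORNER_SHAPES[d]) for corner, d in zip(CORNER_NAMES, code)]


for corner, shape in describe_four_corner("2729"):
    print(f"{corner}: {shape}")
```

This only explains a code you already have; working out the code from an unfamiliar character is the hard part.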

Usage example:

Char	Digits	Interpretation
你	2729/2	Diagonal; Corner; Vertical; 小; ‘2’?
好	47447/1	Cross; Corner; Cross; Cross?; option 1
吗	???	???

To be honest, I cheated here: the Wiktionary gives four corner codes for Chinese characters (e.g., http://en.wiktionary.org/wiki/好).

chinese-array30

Outputs Big Five.

Some docs:

http://www.array.com.tw/ (in Chinese)
http://openvanilla.org/ (partly in Chinese)
scim-array: Array 30 Input Method Engine for SCIM (in English).

Seems to be quite popular, and the MS Windows Chinese input method seems to use it. However, I can’t make head or tail of it. Also, it outputs Big Five.

chinese-b5-tsangchi

This is a Taiwanese method based on geometrical decomposition of characters. See: http://en.wikipedia.org/wiki/Cangjie_method.

Char	Input	Interpretation
你	onf	O = 人 (LHS); N = hook (top of RHS); F = 火 (bottom of RHS)
好	vnd	V = 女 (LHS); N = hook (top of RHS); D = 木 wood (?)
嗎	rsqf	R = 口 (LHS); S = 尸 (L & top of RHS); Q = 手 (next top RHS); F = 火 (bottom of RHS)

Again, I got these from the Wiktionary, but I can understand how they were made up. Notice that ‘ma’ is the traditional 嗎 and not the simplified 吗.

chinese-b5-quick

This looks like it should be a ‘quick’ version of b5-tsangchi. Indeed it takes fewer keystrokes to get to an end character – but I can’t find the end character I want. In other words, I don’t know how it uses the Tsangchi system.

chinese-ccdospy

From the description:

This input method works almost the same way as chinese-py.  The
difference is that you type a single key for these Pinyin spelling.
    Pinyin:  zh  en  eng ang ch  an  ao  ai  ong sh  ing  yu(ü)
    keyseq:   a   f   g   h   i   j   k   l   s   u   y   v

Char	Input	Interpretation
你	ni7	7th option
好	hk6	6th "
吗	ma9	9th "
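
The abbreviation table can be turned into a rough converter from ordinary pinyin to ccdospy keys. This Python sketch only knows the twelve abbreviations quoted in the description; treating zh/ch/sh as initials and the rest as finals matched at the end of the syllable is my assumption, as is handling both "yu" and "ü" as v. How the real input method treats syllables outside these cases I have not checked.

```python
# Abbreviations from the description: two-letter initials ...
INITIAL_KEYS = {"zh": "a", "ch": "i", "sh": "u"}
# ... and finals (the "yu"/"ü" entries are my reading of "yu(ü)").
FINAL_KEYS = {
    "en": "f", "eng": "g", "ang": "h", "an": "j", "ao": "k",
    "ai": "l", "ong": "s", "ing": "y", "yu": "v", "ü": "v",
}


def to_ccdospy(syllable):
    """Abbreviate one toneless pinyin syllable using the table above."""
    initial, rest = "", syllable
    for digraph, key in INITIAL_KEYS.items():
        if syllable.startswith(digraph):
            initial, rest = key, syllable[len(digraph):]
            break
    # Try longer finals first so that "ang" wins over "an".
    for final in sorted(FINAL_KEYS, key=len, reverse=True):
        if rest.endswith(final):
            rest = rest[: -len(final)] + FINAL_KEYS[final]
            break
    return initial + rest


for syllable in ("ni", "hao", "ma", "zhang"):
    print(syllable, "->", to_ccdospy(syllable))
```

For the three syllables in the table this gives ni, hk and ma, matching the key sequences shown (before the option number).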

chinese-cns-tsangchi

Probably the same as chinese-b5-tsangchi, but outputting a CNS (Chinese National Standards) character set instead of Big Five (possibly CNS 11643-1992). My Emacs hasn’t got the fonts for it.

chinese-cns-quick

See chinese-cns-tsangchi and chinese-b5-quick.

chinese-ctlau

An input method based on Sidney Lau’s Romanisation system. (a) it’s for Cantonese; (b) it generates Big5; (c) I can’t get it to work.

chinese-ctlaub

See chinese-ctlau.

chinese-ecdict

Pretty impressive. You type in the English word and the Chinese (Big5) word appears! Wow!

Char	Input	Interpretation
香蕉	banana	banana
冰箱	refrigerator	refrigerator
你	you	you
良好的	good	good

Impressive but:

notice that ‘good’ doesn’t actually give us hao3 (好), it gives us liang2 hao3 de (良好的). 良好 still means ‘good’ but 的 is a connector making this adjective ready to add on to a noun.
I don’t know how I would get a grammatical particle like 吗 (ma).
This will be handy for emergencies but I think I’ll keep a dictionary around too.

chinese-etzy

A Zhuyin input method with Big5 output.

chinese-py

Type in pinyin, without tones.

Char	Input	Interpretation
你	ni1	1st menu option
好	hao	guessed correctly
吗	ma2	2nd option

chinese-qj

Possibly another Zhuyin input method. The description has virtually no information.

chinese-sisheng

This will be quite useful. This input method does not generate Hanzi, but it transforms pinyin with tone numbers into ‘proper’ pinyin with diacritics.

Char	Input	Interpretation
nǐ	ni3	你
hǎo	hao3	好
ma	ma	吗
nǚ	nv3	女
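
The same transformation is easy to sketch outside Emacs. The Python below is not how chinese-sisheng is implemented; it is a minimal illustration that assumes the usual tone-mark placement rule (a or e takes the mark, then the o of "ou", otherwise the last vowel) and, like the table above, reads v as ü. The function name is my own.

```python
import unicodedata

# Combining marks for tones 1-4; tone 5 (neutral) and no number get no mark.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}


def numbered_to_diacritic(syllable):
    """Convert e.g. 'hao3' to 'hǎo'. Handles one syllable at a time."""
    syllable = syllable.replace("v", "ü")
    tone = ""
    if syllable and syllable[-1] in "12345":
        syllable, tone = syllable[:-1], syllable[-1]
    mark = TONE_MARKS.get(tone)
    if mark is None:
        return syllable
    # Decide which vowel carries the mark.
    if "a" in syllable:
        index = syllable.index("a")
    elif "e" in syllable:
        index = syllable.index("e")
    elif "ou" in syllable:
        index = syllable.index("o")
    else:
        vowels = [i for i, ch in enumerate(syllable) if ch in "aeiouü"]
        if not vowels:
            return syllable
        index = vowels[-1]
    marked = syllable[: index + 1] + mark + syllable[index + 1:]
    # NFC folds letter + combining mark into a single precomposed character.
    return unicodedata.normalize("NFC", marked)


print(" ".join(numbered_to_diacritic(s) for s in ("ni3", "hao3", "ma", "nv3")))
```

Using combining marks plus NFC normalisation avoids a hand-written table of every accented vowel.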

chinese-sw

Type in a pair of radicals. Quite clever and fast (and not pronunciation-based), easy to pick up. Usefulness will depend on coverage.

Char	Input	Interpretation
你	kv4	亻+ 小 + 4th option
[好]	???	can’t find it
吗	fd9	口 + 刀 + 9th option

chinese-tonepy

Type in pinyin, with tones.

Char	Input	Interpretation
你	ni3
好	hao32	2nd menu option
吗	ma5
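
A key sequence such as hao32 therefore has three parts: the pinyin letters, the tone digit, and an optional menu choice. The Python sketch below just splits a sequence into those parts as I understand them from the table; the function name is hypothetical, and allowing ":" in the letters (for spellings like "nu:") is an assumption based on the comments below rather than on the tonepy documentation.

```python
import re

# letters (plus ":"), one tone digit 1-5, then an optional menu option digit
KEY_SEQUENCE = re.compile(r"([a-z:]+)([1-5])([0-9]?)")


def parse_tonepy_keys(keys):
    """Split e.g. 'hao32' into ('hao', 3, 2); the option may be None."""
    match = KEY_SEQUENCE.fullmatch(keys)
    if match is None:
        raise ValueError(f"not a tone-pinyin key sequence: {keys!r}")
    syllable, tone, option = match.groups()
    return syllable, int(tone), int(option) if option else None


for keys in ("ni3", "hao32", "ma5"):
    print(keys, "->", parse_tonepy_keys(keys))
```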

n.b.: chinese-py-b5 is the same as this method, but outputs Big Five i.e., traditional characters. A useful alternative to chinese-tonepy for when I want to write Trad.

chinese-ziranma

A pinyin-based input method where the pinyin initials and finals are mapped onto the qwerty keyboard (see http://eyegene.ophthy.med.umich.edu/unicode/KeyboardLayouts/ZiRanMaPinYinKeyboard.html).

The basic idea is a four-stroke pattern: initial, final, tone, quote (‘). This may not be that impressive for, say, 好 (hke’), but the pattern extends to cover words, not just characters. That means that two-, three- or four- character words can be input with the same four-stroke pattern. The example from the description is 北京电视台 (bjdt) (Běijīng diànshìtái; Beijing TV station), which is impressive for four keystrokes.

Char	Input	Interpretation
你	n	guessed correctly
好	hke’
吗	mae’5	5th option

chinese-zozy

A Big5 Zhuyin input method.

Conclusion

chinese-tonepy is looking the best for now for most uses (see also chinese-ccdospy, chinese-ziranma).

chinese-sisheng will be useful for writing pinyin.

I think Zhuyin is probably in my future if/when I start learning Chinese seriously, but unfortunately both of the Emacs Zhuyin methods output Big5 (i.e. traditional script). How much is Zhuyin used in the People’s Republic?

It would be useful to know a method not based on pronunciation. I might not always know how to say what I want to write, e.g. "How do you say X?" Emacs’ structural input methods generating GB characters are:

chinese-4corner
chinese-qj (maybe)
chinese-sw

I’ll explore these, and update this page as I find out more.

References

Lunde, K. (1999). CJKV Information Processing. Sebastopol, CA.: O’Reilly.

http://en.wikipedia.org/wiki/Chinese_input_methods_for_computers

Posted by llaisdy
Filed in 中文
Tags: emacs, 中文


12 Responses to “Chinese input methods for Emacs”

Max Says:

Wednesday, 23rd December, 2009 at 6:12 pm
Thanks for this thorough survey of available input methods. My personal favorite is ZiRanMa, because it works so quickly when you are familiar with pinyin and when you are comfortable with a qwerty keyboard.

llaisdy Says:

Wednesday, 23rd December, 2009 at 7:08 pm
Dear Max

Thanks for your comment!

I like the sound of ZiRanMa, but my Chinese is nowhere near good enough yet for it to stand out. Next year!

llaisdy Says:

Wednesday, 23rd December, 2009 at 7:42 pm
Come to think of it, ZiRanMa does offer more than the Mac input methods.

Max Says:

Wednesday, 23rd December, 2009 at 8:23 pm
About ZiRanMa: It took me a few days to remember the mapping of vowel-endings on a qwerty keyboard (I had to print out the diagram and leave it next to the keyboard for a while). But once I knew them, it became very fast to type. I’m expecting an even higher speed gain once I’m familiar with the mapping for compound words.

I’m curious about the four-corner method. It sounds intriguing but I have yet to take the time to try it out.

David Says:

Sunday, 4th December, 2011 at 2:02 pm
Very useful overview, thx.

The punct versions chinese-tonepy-punct and chinese-py-punct are working fine for me though. But one has to remember to enter “v” before entering the punctuation.

- llaisdy Says:
  
  Sunday, 4th December, 2011 at 2:35 pm
  Aha! Thanks for the tip! Yes, it works for me too now.
  
  Thanks for your comment.
  
  Do you have documentation for any of these input methods?
  
  - David Says:
    
    Sunday, 4th December, 2011 at 2:43 pm
    Probably not more than you. It’s mentioned in the 2nd paragraph after M-x describe-input-method. I’m using Emacs 24 though, maybe it wasn’t mentioned back in 2009?
  - llaisdy Says:
    
    Sunday, 4th December, 2011 at 2:50 pm
    That’ll be it, thanks:).
Michael Hayes Says:

Thursday, 26th November, 2015 at 7:59 am
Dear Ivan,
a very useful comparison. Thanks for sharing it with the public. I have a question about the chinese-py-b5 IM: how can I access the ü character needed for 女 for example? Some other Chinese IMs e.g. Microsoft’s, accept v for ü but chinese-py-b5 doesn’t. I can’t work it out. Must I switch to something like chinese-py to get the v for ü facility?

- llaisdy Says:
  
  Thursday, 26th November, 2015 at 8:51 am
  Dear Michael
  
  Thanks very much for your comment! I’m glad this page is still useful.
  
  In chinese-py-b5, “u:” seems to work for “ü”; so, typing “nu:3” brings up “女”.
  
  Best wishes
  
  Ivan
  
  - Michael Hayes Says:
    
    Thursday, 26th November, 2015 at 8:02 pm
    🙂 You’re a star. Thank you so much. I must have been staring at the : option and didn’t realise what it was for.
  - llaisdy Says:
    
    Friday, 27th November, 2015 at 11:35 am
    Pleasure 🙂
