Chinese input methods for Emacs
Sunday, 8th February, 2009
From the llaisdy blog archives.
This all feels a bit antiquated now I’m working on a Mac (I use the Mac input methods instead of the emacs input methods), but it’s useful whenever I have to work on a Windows machine. The notes below are not complete and I’d appreciate any comments to help fill in the gaps.
Introduction
Emacs provides 25 input methods for Chinese. Although each input method has its own describe-input-method page, these pages can be rather terse. There is also no overview or comparison of the different input methods, and I have not been able to find one on the web.
Here I have gathered together the information I’ve been able to find. I’d be pleased to hear about any errors I’ve made, and where I can find further information to correct my omissions. I’ll keep this page up-to-date.
I’m learning Mandarin Chinese, I’m interested in simplified script, and for the moment I find a pinyin-based approach to the written language easiest. For my own current requirements, chinese-tonepy is fitting the bill, but I’m interested in learning a structural input method (i.e., not based on pronunciation). See the Conclusion for further discussion.
Methods overview
Table: Chinese input methods provided by Emacs
Method name | Method name |
---|---|
chinese-4corner | chinese-array30 |
chinese-b5-quick | chinese-b5-tsangchi |
chinese-ccdospy | |
chinese-cns-quick | chinese-cns-tsangchi |
chinese-ctlau | chinese-ctlaub |
chinese-ecdict | chinese-etzy |
chinese-punct | chinese-punct-b5 |
chinese-py | chinese-py-b5 |
chinese-py-punct | chinese-py-punct-b5 |
chinese-qj | chinese-qj-b5 |
chinese-sisheng | chinese-sw |
chinese-tonepy | chinese-tonepy-punct |
chinese-ziranma | chinese-zozy |
The basic idea common to all these input methods is that you type in a key sequence, bringing up a menu of options, from which you choose the character you want.
For example, in chinese-tonepy, typing in ‘hao3’ brings up a numbered menu of candidate characters in the minibuffer window.
If you want hao3 = ‘good’, choose option 2 (by typing 2 or clicking on the menu). Character 好 appears at point.
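The select-by-number step can be sketched as a simple lookup. This is only a toy model: the table holds just the selections recorded elsewhere in this article, not Emacs's real candidate lists, and the names `KNOWN_SELECTIONS` and `pick` are hypothetical.

```python
# Toy model of "type a key sequence, then pick a numbered menu option".
# Only selections recorded in this article are listed; real menus hold
# many more candidates, and numbering may differ between Emacs versions.
KNOWN_SELECTIONS = {
    ("chinese-tonepy", "hao3"): {2: "好"},
    ("chinese-py", "ni"): {1: "你"},
    ("chinese-py", "ma"): {2: "吗"},
}


def pick(method, keys, option):
    """Return the character for a menu option, or None if not recorded here."""
    menu = KNOWN_SELECTIONS.get((method, keys), {})
    return menu.get(option)


print(pick("chinese-tonepy", "hao3", 2))  # 好
```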
At any point in any of the input methods you can press <tab> for a full tree-list of the options available to you from that point.
punct
Some input methods have versions with or without -punct (see the table above). The -punct versions support proper Chinese punctuation characters. However, (a) although chinese-punct works, chinese-py-punct (& poss. others) doesn’t seem to; (b) versions without -punct use ascii punctuation, which meets my needs for the moment.
In the discussion below I ignore the -punct versions.
b5
Some input methods have versions with or without -b5. The -b5 input methods generate characters in the Big Five character set. Big Five is a Taiwanese character set for traditional characters. Other input methods support GB2312, which is a character set from the People’s Republic of China, for simplified characters. For example:
Input Method | Input | Output |
---|---|---|
chinese-py-b5 | guo2/1 yu3/2 | 國 語 |
chinese-tonepy | guo2/1 yu3/7 | 国 语 |
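The split between the two character sets can be checked outside Emacs. A minimal Python sketch, assuming Python's built-in `gb2312` and `big5` codecs are a fair stand-in for the character sets these input methods target; `encodable` is a hypothetical helper name.

```python
def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Simplified 语 versus traditional 語 (the 'yu3' of the table above).
for word in ("语", "語"):
    for codec in ("gb2312", "big5"):
        print(word, codec, encodable(word, codec))
```

The simplified form should encode only as GB2312 and the traditional form only as Big Five, which is why the choice of input method also fixes the script you end up with.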
For the full skinny on character sets see Lunde (1999).
I presume that, apart from chinese-b5-quick and chinese-b5-tsangchi, the -b5 input methods work the same way as the non-b5 methods, but just output Big Five (i.e., traditional script) instead of GB2312 (i.e., simplified script). Consequently, in the following discussion I ignore the -b5 versions (apart from chinese-b5-quick and chinese-b5-tsangchi).
Methods in detail
For each method I’ll give the character set (if not GB2312), and, as a simple illustration of use, how to generate 你好吗 ( nǐ hǎo ma = are you well?).
chinese-4corner
The describe-input-method page gives no description!
The Wikipedia provides a description [http://en.wikipedia.org/wiki/Four_corner_method]. You use four digits to describe the four corners of a character (top left, top right, bottom left, bottom right).
Quoting the Wikipedia article:
Digit | Meaning |
---|---|
1 | a horizontal stroke, |
2 | a vertical or diagonal stroke, |
3 | a dot stroke, |
4 | two strokes in a cross shape, |
5 | three or more strokes in which one stroke intersects all others, |
6 | a box-shape, |
7 | where a stroke turns a corner, |
8 | the shape of the Chinese character 八 and its inverted form, and |
9 | the shape of the Chinese character 小 and its inverted form. |
0 | where there is either nothing in a corner, the part in a corner is already represented by a previous corner, or where a corner has a dot stroke followed by a horizontal stroke |
Usage example:
Char | Digits | Interpretation |
---|---|---|
你 | 2729/2 | Diagonal; Corner; Vertical; 小; ‘2’? |
好 | 47447/1 | Cross; Corner; Cross; Cross?; option 1 |
吗 | ??? | ??? |
To be honest, I cheated here: the Wiktionary gives four corner codes for Chinese characters (e.g., http://en.wiktionary.org/wiki/好).
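To make the digit table easier to apply, here is a rough Python sketch that spells out a four-corner code corner by corner. It assumes the corner order given above (top left, top right, bottom left, bottom right) and ignores any fifth digit or menu option; `DIGIT_MEANING` and `describe` are hypothetical names.

```python
# Digit meanings, abbreviated from the table quoted from Wikipedia above.
DIGIT_MEANING = {
    "0": "nothing, already represented, or dot over a horizontal stroke",
    "1": "horizontal stroke",
    "2": "vertical or diagonal stroke",
    "3": "dot stroke",
    "4": "two strokes in a cross shape",
    "5": "three or more strokes, one intersecting all others",
    "6": "box shape",
    "7": "stroke turning a corner",
    "8": "shape of 八 or its inverted form",
    "9": "shape of 小 or its inverted form",
}
CORNERS = ("top left", "top right", "bottom left", "bottom right")


def describe(code):
    """Pair each of the first four digits with its corner and meaning."""
    digits = code[:4]
    if len(digits) < 4 or any(d not in DIGIT_MEANING for d in digits):
        raise ValueError(f"need at least four digits, got {code!r}")
    return [(corner, DIGIT_MEANING[d]) for corner, d in zip(CORNERS, digits)]


for corner, meaning in describe("2729"):  # the code for 你 used above
    print(f"{corner}: {meaning}")
```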
chinese-array30
Outputs Big Five.
Some docs:
- http://www.array.com.tw/ (in Chinese)
- http://openvanilla.org/ (partly in Chinese)
- scim-array: Array 30 Input Method Engine for SCIM (in English).
Seems to be quite popular, and the MS Windows Chinese input method seems to use it. However, I can’t make head or tail of it. Also, it outputs Big Five.
chinese-b5-tsangchi
This is a Taiwanese method based on geometrical decomposition of characters. See: http://en.wikipedia.org/wiki/Cangjie_method.
Char | Input | Interpretation |
---|---|---|
你 | onf | O = 人 (LHS); N = hook (top of RHS); F = 火 (bottom of RHS) |
好 | vnd | V = 女 (LHS); N = hook (top of RHS); D = 木 wood (?) |
嗎 | rsqf | R = 口 (LHS); S = 尸 (L & top of RHS); Q = 手 (next top RHS); F = 火 (bottom of RHS) |
Again, I got these from the Wiktionary, but I can understand how they were made up. Notice that ‘ma’ is the traditional 嗎 and not the simplified 吗.
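The interpretations in the table can be reproduced with a small decoder. This sketch uses the standard Cangjie key-to-radical layout, which goes beyond the handful of letters discussed above; I am assuming that chinese-b5-tsangchi follows the same layout, as the three examples suggest. `CANGJIE_KEYS` and `decode` are hypothetical names.

```python
# Standard Cangjie layout: each letter key stands for one basic radical.
# Assumption: Emacs's chinese-b5-tsangchi uses this same layout.
CANGJIE_KEYS = dict(zip(
    "abcdefghijklmnopqrstuvwxy",
    "日月金木水火土竹戈十大中一弓人心手口尸廿山女田難卜",
))


def decode(keys):
    """Translate a Cangjie key sequence into its sequence of radicals."""
    return "".join(CANGJIE_KEYS[k] for k in keys.lower())


for keys in ("onf", "vnd", "rsqf"):
    print(keys, decode(keys))
```

The ‘hook’ on the N key is the radical 弓 in this layout, and D is indeed 木.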
chinese-b5-quick
This looks like it should be a ‘quick’ version of b5-tsangchi. Indeed it takes fewer keystrokes to get to an end character – but I can’t find the end character I want. In other words, I don’t know how it uses the Tsangchi system.
chinese-ccdospy
From the description:
This input method works almost the same way as chinese-py. The difference is that you type a single key for these Pinyin spelling.
Pinyin | Keyseq |
---|---|
zh | a |
en | f |
eng | g |
ang | h |
ch | i |
an | j |
ao | k |
ai | l |
ong | s |
sh | u |
ing | y |
yu(ü) | v |
Usage example:
Char | Input | Interpretation |
---|---|---|
你 | ni7 | 7th option |
好 | hk6 | h + k (= ao); 6th option |
吗 | ma9 | 9th option |
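A rough sketch of how the abbreviations expand back into pinyin. One loud assumption: I treat a, i and u as abbreviations only in first position (zh, ch, sh) and the remaining keys as abbreviations only in second position. The description does not say this outright; it is inferred from the examples above, where the i in ‘ni’ stays an i. The names `INITIAL_KEYS`, `FINAL_KEYS` and `expand` are hypothetical.

```python
# Single-key abbreviations from the chinese-ccdospy description.
# Assumption: a/i/u abbreviate initials; the other keys abbreviate finals.
INITIAL_KEYS = {"a": "zh", "i": "ch", "u": "sh"}
FINAL_KEYS = {
    "f": "en", "g": "eng", "h": "ang", "j": "an", "k": "ao",
    "l": "ai", "s": "ong", "y": "ing", "v": "ü",
}


def expand(keys):
    """Expand a ccdospy key sequence (without menu digit) into pinyin."""
    if not keys:
        raise ValueError("empty key sequence")
    first, rest = keys[0], keys[1:]
    initial = INITIAL_KEYS.get(first, first)
    final = FINAL_KEYS.get(rest, rest)  # only a single key is an abbreviation
    return initial + final


for keys in ("ni", "hk", "ma"):
    print(keys, "->", expand(keys))
```

Syllables with no initial (e.g. a bare ‘an’) are not handled here, because I don't know how the input method disambiguates them.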
chinese-cns-tsangchi
Probably the same as chinese-b5-tsangchi, but outputting a CNS (Chinese National Standards) character set instead of Big Five (possibly CNS 11643-1992). My Emacs hasn’t got the fonts for it.
chinese-cns-quick
See chinese-cns-tsangchi and chinese-b5-quick.
chinese-ctlau
An input method based on Sidney Lau’s Romanisation system. (a) it’s for Cantonese; (b) it generates Big5; (c) I can’t get it to work.
chinese-ctlaub
See chinese-ctlau.
chinese-ecdict
Pretty impressive. You type in the English word and the Chinese (Big5) word appears! Wow!
Char | Input | Interpretation |
---|---|---|
香蕉 | banana | banana |
冰箱 | refrigerator | refrigerator |
你 | you | you |
良好的 | good | good |
Impressive but:
- notice that ‘good’ doesn’t actually give us hao3 (好), it gives us liang2 hao3 de (良好的). 良好 still means ‘good’ but 的 is a connector making this adjective ready to add on to a noun.
- I don’t know how I would get a grammatical particle like 吗 (ma).
- This will be handy for emergencies but I think I’ll keep a dictionary around too.
chinese-etzy
A Zhuyin input method with Big5 output.
chinese-py
Type in pinyin, without tones.
Char | Input | Interpretation |
---|---|---|
你 | ni1 | 1st menu option |
好 | hao | guessed correctly |
吗 | ma2 | 2nd option |
chinese-qj
Possibly another Zhuyin input method. The description has virtually no information.
chinese-sisheng
This will be quite useful. This input method does not generate Hanzi, but it transforms pinyin with tone numbers into ‘proper’ pinyin with diacritics.
Char | Input | Interpretation |
---|---|---|
nǐ | ni3 | 你 |
hǎo | hao3 | 好 |
ma | ma | 吗 |
nǚ | nv3 | 女 |
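The same transformation is easy to sketch outside Emacs. The code below applies the usual pinyin rule for placing the tone mark (a or e takes it; in ‘ou’ the o takes it; otherwise the last vowel does). It is not a description of how chinese-sisheng is implemented, and `to_diacritics` is a hypothetical name. Besides ‘v’, it also accepts ‘u:’ for ü.

```python
# Precomposed tone-marked vowels, indexed by tone 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def to_diacritics(syllable):
    """Convert one numbered pinyin syllable, e.g. 'hao3', to 'hǎo'."""
    s = syllable.lower().replace("u:", "ü").replace("v", "ü")
    tone = 0
    if s and s[-1] in "12345":
        tone = int(s[-1])
        s = s[:-1]
    if tone in (0, 5):
        return s  # neutral tone carries no mark
    # Placement rule: a or e first, then the o of "ou", else the last vowel.
    if "a" in s:
        target = s.index("a")
    elif "e" in s:
        target = s.index("e")
    elif "ou" in s:
        target = s.index("o")
    else:
        vowels = [i for i, ch in enumerate(s) if ch in TONE_MARKS]
        if not vowels:
            return s
        target = vowels[-1]
    marked = TONE_MARKS[s[target]][tone - 1]
    return s[:target] + marked + s[target + 1:]


print(" ".join(to_diacritics(x) for x in ("ni3", "hao3", "ma")))  # nǐ hǎo ma
```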
chinese-sw
Type in a pair of radicals. Quite clever and fast (and not pronunciation-based), easy to pick up. Usefulness will depend on coverage.
Char | Input | Interpretation |
---|---|---|
你 | kv4 | 亻+ 小 + 4th option |
[好] | ??? | can’t find it |
吗 | fd9 | 口 + 刀 + 9th option |
chinese-tonepy
Type in pinyin, with tones.
Char | Input | Interpretation |
---|---|---|
你 | ni3 | |
好 | hao32 | 2nd menu option |
吗 | ma5 | |
n.b.: chinese-py-b5 is the same as this method, but outputs Big Five i.e., traditional characters. A useful alternative to chinese-tonepy for when I want to write Trad.
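The key sequences in the chinese-tonepy table above pack up to three things together: the pinyin letters, a tone digit, and an optional menu choice. A small sketch of how to pull them apart, assuming tones are written 1-5 and the menu choice is a single digit; `KEY_PATTERN` and `parse` are hypothetical names.

```python
import re

# letters (':' allowed, as in "u:" for ü), tone 1-5, optional menu digit
KEY_PATTERN = re.compile(r"([a-z:]+)([1-5])([0-9])?")


def parse(keys):
    """Split e.g. 'hao32' into ('hao', 3, 2); the option may be None."""
    match = KEY_PATTERN.fullmatch(keys)
    if match is None:
        raise ValueError(f"not a tonepy-style key sequence: {keys!r}")
    syllable, tone, option = match.groups()
    return syllable, int(tone), int(option) if option else None


for keys in ("ni3", "hao32", "ma5"):
    print(keys, parse(keys))
```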
chinese-ziranma
A pinyin-based input method where the pinyin initials and finals are mapped onto the qwerty keyboard (see http://eyegene.ophthy.med.umich.edu/unicode/KeyboardLayouts/ZiRanMaPinYinKeyboard.html).
The basic idea is a four-keystroke pattern: initial, final, tone, quote (‘). This may not be that impressive for, say, 好 (hke’), but the pattern extends to cover words, not just characters. That means that two-, three- or four-character words can be input with the same four-keystroke pattern. The example from the description is 北京电视台 (bjdt) (Běijīng diànshìtái; Beijing TV station), which is impressive for four keystrokes.
Char | Input | Interpretation |
---|---|---|
你 | n | guessed correctly |
好 | hke’ | |
吗 | mae’5 | 5th option |
chinese-zozy
A Big5 Zhuyin input method.
Conclusion
chinese-tonepy is looking the best for now for most uses (see also chinese-ccdospy, chinese-ziranma).
chinese-sisheng will be useful for writing pinyin.
I think Zhuyin is probably in my future if/when I start learning Chinese seriously, but unfortunately both of the Emacs Zhuyin input methods output Big5 (i.e., traditional script). How much is Zhuyin used in the People’s Republic?
It would be useful to know a method not based on pronunciation. I might not always know how to say what I want to write, e.g., "How do you say X?" Emacs’ structural input methods generating GB characters are:
- chinese-4corner
- chinese-qj (maybe)
- chinese-sw
I’ll explore these, and update this page as I find out more.
References
Lunde, K. (1999). CJKV Information Processing. Sebastopol, CA.: O’Reilly.
http://en.wikipedia.org/wiki/Chinese_input_methods_for_computers
Wednesday, 23rd December, 2009 at 6:12 pm
Thanks for this thorough survey of available input methods. My personal favorite is ZiRanMa, because it works so quickly when you are familiar with pinyin and when you are comfortable with a qwerty keyboard.
Wednesday, 23rd December, 2009 at 7:08 pm
Dear Max
Thanks for your comment!
I like the sound of ZiRanMa, but my Chinese is nowhere near good enough yet for it to stand out. Next year!
Wednesday, 23rd December, 2009 at 7:42 pm
Come to think of it, ZiRanMa does offer more than the Mac input methods.
Wednesday, 23rd December, 2009 at 8:23 pm
About ZiRanMa: It took me a few days to remember the mapping of vowel-endings on a qwerty keyboard (I had to print out the diagram and leave it next to the keyboard for a while). But once I knew them, it became very fast to type. I’m expecting an even higher speed gain once I’m familiar with the mapping for compound words.
I’m curious about the four-corner method. It sounds intriguing but I have yet to take the time to try it out.
Sunday, 4th December, 2011 at 2:02 pm
Very useful overview, thx.
The punct versions chinese-tonepy-punct and chinese-py-punct are working fine for me though. But one has to remember to enter “v” before entering the punctuation.
Sunday, 4th December, 2011 at 2:35 pm
Aha! Thanks for the tip! Yes, it works for me too now.
Thanks for your comment.
Do you have documentation for any of these input methods?
Sunday, 4th December, 2011 at 2:43 pm
Probably not more than you. It’s mentioned in the 2nd paragraph after M-x describe-input-method. I’m using Emacs 24 though, maybe it wasn’t mentioned back in 2009?
Sunday, 4th December, 2011 at 2:50 pm
That’ll be it, thanks:).
Thursday, 26th November, 2015 at 7:59 am
Dear Ivan,
a very useful comparison. Thanks for sharing it with the public. I have a question about the chinese-py-b5 IM: how can I access the ü character needed for 女 for example? Some other Chinese IMs e.g. Microsoft’s, accept v for ü but chinese-py-b5 doesn’t. I can’t work it out. Must I switch to something like chinese-py to get the v for ü facility?
Thursday, 26th November, 2015 at 8:51 am
Dear Michael
Thanks very much for your comment! I’m glad this page is still useful.
In chinese-py-b5, “u:” seems to work for “ü”; so, typing “nu:3” brings up “女”.
Best wishes
Ivan
Thursday, 26th November, 2015 at 8:02 pm
🙂 You’re a star. Thank you so much. I must have been staring at the : option and didn’t realise what it was for.
Friday, 27th November, 2015 at 11:35 am
Pleasure 🙂