Zhi was greatly encouraged. His solitary work ran parallel to these larger efforts, most of which still could not do away with the bulky keyboard. Breaking characters down into components worked well enough for character retrieval indexes and typewriter keyboard designs, but it did not translate directly into a procedure that a computer could carry out.
Zhi recalled the advantage of shape-based methods, in which a part of the character helps to identify the whole character directly. To build this principle into his coding scheme, he decided to index each character by its components (the simpler characters within each ideogram), using the first letter of each component's pinyin spelling.
The idea took another two years to flesh out. On average, a character can be divided into two to four components, drawn from a total of 300 to 400 components. Most characters can be split into two halves, either vertically or horizontally, or into other geometric arrangements. This yields a two- to four-letter code for each character, so a character requires at most four keystrokes on a conventional English keyboard. By comparison, the average English word is about 4.8 letters long. Zhi thus made the alphabet work more efficiently for individual ideograms than it does for English. The system also neatly addressed dialect differences and homophones. Because the code takes only the first letter of each component's pinyin, not the full pronunciation of the character, most regional variations in pronunciation do not matter. The four-letter codes work like acronyms for the parts of a character. In effect, Zhi used the alphabet to spell characters by their components rather than to spell out words.
He arranged the components of each character in the order in which they are written by hand. Coding by components supplies context and important clues, reducing the risk of ambiguity and duplicate codes. The chance that the same components (or even components starting with the same letters) appear in exactly the same order in two different characters is low.
Zhi's method of indexing Chinese characters alphabetically made it easier for humans to input Chinese, provided they knew how to write the language, and it created a more systematic human-machine interface. For example, in his system the character for "road," 路 (lu), which takes 13 strokes to write by hand, can be broken down into just four components: 口 (kou), 止 (zhi), 攵 (pu), and 口 (kou). Taking the first letter of each component gives the character code KZPK. Or take the character 吴 (wu), a common surname, which can be quickly split into two components, 口 (kou) and 天 (tian), to yield the character code KT.
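The two examples above can be sketched in a few lines of Python. This is only an illustration of the first-letter principle, not a reconstruction of Zhi's actual implementation: the table name COMPONENT_PINYIN and the function component_code are hypothetical, and the table holds only the two characters discussed in the text rather than the full inventory of 300 to 400 components.

```python
# Hypothetical lookup table (an assumption for illustration): each character
# maps to the pinyin of its components, listed in handwriting order.
# Only the two examples from the text are included.
COMPONENT_PINYIN = {
    "路": ["kou", "zhi", "pu", "kou"],  # "road" (lu)
    "吴": ["kou", "tian"],              # the surname Wu
}


def component_code(char: str) -> str:
    """Build a code from the first letter of each component's pinyin."""
    components = COMPONENT_PINYIN[char]  # raises KeyError for unknown characters
    return "".join(pinyin[0].upper() for pinyin in components)


print(component_code("路"))  # KZPK
print(component_code("吴"))  # KT
```

The code depends only on how a character is written, not on how it is pronounced, which is why the reading of the whole character (lu, wu) never appears in the result.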
Once mediated by Chinese characters in this way, alphabetic spelling is no longer phonetic but semantic: each letter stands for a character rather than a sound. The same indexing method can be extended to represent groups of characters. Take "socialism," for example, or shehui zhuyi: 社会主义. The phrase can be encoded as the four-letter sequence SHZY by taking the first letter of each of its four characters. Or consider another oft-quoted phrase, the seven characters that make up "People's Republic of China," or zhonghua renmin gongheguo: 中华人民共和国. It can be entered simply as ZHRMGHG.
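The phrase-level shorthand follows the same first-letter rule, applied to whole characters instead of components. A minimal sketch, assuming the pinyin syllable of each character is already known (the function name phrase_code is hypothetical and is not part of Zhi's system):

```python
from typing import Sequence


def phrase_code(syllables: Sequence[str]) -> str:
    """Join the first letter of each character's pinyin syllable."""
    return "".join(syllable[0].upper() for syllable in syllables)


# "Socialism": shehui zhuyi, four characters
print(phrase_code(["she", "hui", "zhu", "yi"]))  # SHZY

# "People's Republic of China": zhonghua renmin gongheguo, seven characters
print(phrase_code(["zhong", "hua", "ren", "min", "gong", "he", "guo"]))  # ZHRMGHG
```

Each letter here stands for one character of the phrase, so the code length equals the number of characters.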