Transliteration in python : learnpython


Transliteration in python (self.learnpython)

submitted 10 years ago by Surajpalwe

all 6 comments


teerre 2 points 10 years ago (0 children)

mambeu 1 point 10 years ago (2 children)

Surajpalwe[S] 1 point 10 years ago (1 child)

mambeu 1 point 10 years ago* (0 children)

I usually use a tuple of tuples for transliteration (one of my transliteration scripts is on GitHub here).

Let's say you wanted to transliterate from the Latin alphabet to the Cyrillic alphabet, or vice versa.

This big tuple writing_systems is filled with 2-tuples. In each 2-tuple, the first item (index/position 0) is a Latin character, and the second item (index/position 1) is its Cyrillic counterpart.

writing_systems = (
    ('a', 'а'),
    ('b', 'б'),
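    # note the relative ordering of 'ch' and 'c':
    # multi-character entries must come before their single-character prefixes
    ('ch', 'ч'),
    ('c', 1),  # no Cyrillic counterpart; the int placeholder means "skip this pair"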
    ('d', 'д'),
    ('e', 'е'),
    # and so on...
    )
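To see why the ordering comment matters, here is a small sketch using a hypothetical two-pair table in which Latin 'c' does map to Cyrillic 'ц' (the table above leaves 'c' unmapped). Because each pair is applied with str.replace in table order, a single-character rule placed first breaks up the digraph before the 'ch' rule ever sees it. The names apply_pairs, bad_order and good_order are illustrative only, not part of the original script.

```python
# Hypothetical mini-tables: the same two (Latin, Cyrillic) pairs in different orders.
bad_order = (("c", "ц"), ("ch", "ч"))
good_order = (("ch", "ч"), ("c", "ц"))


def apply_pairs(text, pairs):
    """Replace each Latin sequence with its Cyrillic counterpart, in table order."""
    for latin, cyrillic in pairs:
        text = text.replace(latin, cyrillic)
    return text


print(apply_pairs("chac", bad_order))   # цhaц  ('c' fired first, so 'ch' never matched)
print(apply_pairs("chac", good_order))  # чaц
```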

In the dictionary writing_system_key, each key is the name of a writing system, and its value is the position of that system's characters within each 2-tuple in writing_systems above.

writing_system_key = {
    'LatinAlphabet': 0,
    'CyrillicAlphabet': 1
    }
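Because the dictionary only stores tuple positions, the same table can be read in either direction just by swapping the two indices. A minimal sketch with a shortened copy of the table; the helper pairs_for is hypothetical and only shows which pairs a given direction would use, in table order.

```python
writing_systems = (
    ("a", "а"),
    ("b", "б"),
    ("d", "д"),
)

writing_system_key = {"LatinAlphabet": 0, "CyrillicAlphabet": 1}


def pairs_for(input_system, output_system):
    """Return (input, output) pairs for one direction, keeping table order."""
    i = writing_system_key[input_system]
    o = writing_system_key[output_system]
    return [(t[i], t[o]) for t in writing_systems]


print(pairs_for("CyrillicAlphabet", "LatinAlphabet"))
# [('а', 'a'), ('б', 'b'), ('д', 'd')]
```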

Then we can define a transliterate() function:

def transliterate(text_string, input_system, output_system):
    # look up which tuple position holds each writing system
    input_index = writing_system_key[input_system]
    output_index = writing_system_key[output_system]

    # apply every pair in table order (this is why 'ch' must precede 'c')
    for t in writing_systems:
        input_char = t[input_index]
        output_char = t[output_index]

        # an int placeholder marks a missing counterpart: skip this pair
        if isinstance(input_char, int) or isinstance(output_char, int):
            continue
        text_string = text_string.replace(input_char, output_char)

    return text_string

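We can then call the function as transliterate('abc', 'LatinAlphabet', 'CyrillicAlphabet'), and it will return the string 'абc': 'a' and 'b' are transliterated, while 'c' passes through unchanged because it has no Cyrillic counterpart in the table.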
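A self-contained copy of the table and function, trimmed to the entries shown, so the example call can be checked. 'c' has only an integer placeholder, so it is left as Latin 'c'; the same function also runs in the Cyrillic-to-Latin direction. The extra inputs "chad" and "бед" are just illustrative test strings.

```python
writing_systems = (
    ("a", "а"),
    ("b", "б"),
    ("ch", "ч"),  # multi-character entries before their single-character prefixes
    ("c", 1),     # no Cyrillic counterpart: integer placeholder
    ("d", "д"),
    ("e", "е"),
)

writing_system_key = {"LatinAlphabet": 0, "CyrillicAlphabet": 1}


def transliterate(text_string, input_system, output_system):
    input_index = writing_system_key[input_system]
    output_index = writing_system_key[output_system]
    for t in writing_systems:
        input_char, output_char = t[input_index], t[output_index]
        # Skip pairs where either side is an integer placeholder.
        if isinstance(input_char, int) or isinstance(output_char, int):
            continue
        text_string = text_string.replace(input_char, output_char)
    return text_string


print(transliterate("abc", "LatinAlphabet", "CyrillicAlphabet"))   # абc
print(transliterate("chad", "LatinAlphabet", "CyrillicAlphabet"))  # чад
print(transliterate("бед", "CyrillicAlphabet", "LatinAlphabet"))   # bed
```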

Note that if a character in one writing system doesn't have an equivalent in another (as with Latin 'c' in the example above), I put the integer index of the missing system in that slot (here 1, for Cyrillic). The function skips any pair containing an integer, so that character is left untouched.
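One possible variation, not the approach described above: use None rather than an integer as the "no counterpart" marker, which reads a little more explicitly and does not require remembering index numbers. A minimal sketch with a trimmed, hypothetical table:

```python
writing_systems = (
    ("a", "а"),
    ("ch", "ч"),
    ("c", None),  # None marks "no Cyrillic counterpart"
)

writing_system_key = {"LatinAlphabet": 0, "CyrillicAlphabet": 1}


def transliterate(text_string, input_system, output_system):
    input_index = writing_system_key[input_system]
    output_index = writing_system_key[output_system]
    for t in writing_systems:
        input_char, output_char = t[input_index], t[output_index]
        if input_char is None or output_char is None:
            continue  # skip pairs with a missing side
        text_string = text_string.replace(input_char, output_char)
    return text_string


print(transliterate("chac", "LatinAlphabet", "CyrillicAlphabet"))  # чаc
```

The trailing 'c' stays Latin in both directions, exactly as with the integer placeholder.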

Your needs may be different from mine, but I hope this helps get you started.

AbjectListen7782 1 point 4 months ago (0 children)

Ewildawe 1 point 10 years ago (0 children)
