> Localizing Python: some hacking to make Python accept words from more natural languages
ignite
post Jan 5 2007, 05:21 PM
Post #1





The idea for this topic came from my children. Some time ago I started teaching them programming, and I chose Python as the base language. The next step was finding an attractive subject for program examples and exercises, so I wrote a "robot" program. There is nothing new in it: it is yet another Logo-style game.
(See attachments)
The "robot" is actually a lorry that moves across square cells of concrete or ground and can't pass through walls. It understands commands like "forward", "backward" and so on.
So far, so good.
We started with a program like this:

CODE
#!/usr/bin/python
# -*- coding: utf-8 -*-
from robot import Robot
robot=Robot('z4-4.maz')

while robot.clear_forward():
  robot.forward(1)
robot.right()
while robot.ground():
  robot.backward(1)
robot.forward(1)
while robot.ground():
  robot.plant()
  robot.forward(1)
robot.forward(1)


But, as you have already noticed, I am not a native English speaker. When I talk about this program with my children, I am forced to switch continuously between Ukrainian and English, back and forth. So the question arose: can Python understand Ukrainian? Or, more generally, any language other than English?

I divide the overall task into three parts of different size and effort:
1) make Python allow international symbols in identifiers: names of variables, classes, and functions.
2) translate the most commonly used keywords and functions.
3) full Python localization.

Part One.
As everybody knows, characters are represented inside computers as one or more bytes. Historically, the first widespread character set was the ASCII coding standard. It defines the first 128 of the 256 possible values of one byte (0-127). Later, other 8-bit coding systems were introduced, such as latin1, koi8, win1252, etc. And finally Unicode was invented, most usefully in its UTF-8 form. UTF-8 differs from the other Unicode encodings in one important feature: it is compatible with ASCII for values 0-127, and every other character is encoded using only byte values from the 128-255 range, so no clash can occur with ASCII signs, digits, and such.
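The UTF-8 property described above can be checked directly. The following is a small sketch in present-day Python (not the 2.5 build discussed in this post); the sample words and the helper name utf8_bytes are only illustrative.

```python
# Compare the byte values UTF-8 produces for ASCII and non-ASCII text.
samples = ["robot", "робот", "über"]


def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of integer byte values."""
    return list(text.encode("utf-8"))


for word in samples:
    print(word, [hex(b) for b in utf8_bytes(word)])

# ASCII characters keep their one-byte ASCII values (all below 0x80).
ascii_ok = all(b < 0x80 for b in utf8_bytes("robot"))

# Every byte of a non-ASCII character is 0x80 or above, so it can
# never be mistaken for an ASCII letter, digit, or punctuation sign.
non_ascii_ok = all(b >= 0x80 for b in utf8_bytes("робот"))
```

This is why a byte-oriented tokenizer can treat "any byte above 0x7F" as part of a name without disturbing the ASCII syntax around it.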
Let's start.
Download the latest Python 2.5 source:
wget -c http://www.python.org/ftp/python/2.5/Python-2.5.tar.bz2
Extract the archive:
tar -xjf Python-2.5.tar.bz2
cd Python-2.5

Python checks for allowed symbols in the tokenizer module, Parser/tokenizer.c. Open it with your favorite editor and locate the first call to the isalpha() function. It looks like this:
CODE
        /* Identifier (most frequent token!) */
        if (isalpha(c) || c == '_') {

All we need to do is allow byte values above 127 (hex 7F), so edit the line to get something like:
CODE
        /* Identifier (most frequent token!) */
        if (isalpha(c) || c == '_' || c > 0x7F) {

The next spot is about 20 lines further down. Search for the isalnum() function; the lines look like this:
CODE
                while (isalnum(c) || c == '_') {
                        c = tok_nextc(tok);
                }

Make exactly the same edit:
CODE
                while (isalnum(c) || c == '_' || c > 0x7F) {
                        c = tok_nextc(tok);
                }

That's all!
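To see what the two patched conditions accept, here is a rough Python model of them. It is only a sketch, not the tokenizer itself: the function names are made up for illustration, and it assumes isalpha() and isalnum() run in the C locale, where they accept ASCII letters and digits only.

```python
def is_ident_start(c):
    """Model of the patched test: isalpha(c) || c == '_' || c > 0x7F.

    c is a single byte value (0-255).
    """
    is_ascii_letter = (0x41 <= c <= 0x5A) or (0x61 <= c <= 0x7A)
    return is_ascii_letter or c == 0x5F or c > 0x7F


def is_ident_char(c):
    """Model of the patched loop test: isalnum(c) || c == '_' || c > 0x7F."""
    return is_ident_start(c) or (0x30 <= c <= 0x39)


def scan_identifier(source, pos=0):
    """Return the identifier starting at byte offset pos, or None."""
    data = source.encode("utf-8")
    if pos >= len(data) or not is_ident_start(data[pos]):
        return None
    end = pos + 1
    while end < len(data) and is_ident_char(data[end]):
        end += 1
    return data[pos:end].decode("utf-8")


print(scan_identifier("робот.вперед(1)"))  # робот
print(scan_identifier("1abc"))             # None
```

Note that the test is deliberately crude: it accepts every byte above 0x7F, so in this model a string such as "a÷b" also scans as a single identifier, even though "÷" is not a letter.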
Just do the usual things: configure, make, install:
CODE
./configure
make
make install


Run /usr/local/bin/python and you'll get a Python interpreter that allows umlauts, accents, and Cyrillic letters in identifiers.
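As an illustration of the kind of source the patched interpreter is meant to accept, here is a short sketch with Ukrainian identifiers. The names are hypothetical and chosen only for the example. It is written so that it also runs on an unmodified Python 3, which accepts non-ASCII identifiers natively (PEP 3131); on the patched 2.5 build the file is assumed to be saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical names: "площа" = area, "ширина" = width, "висота" = height,
# "результат" = result.


def площа(ширина, висота):
    """Return the area of a rectangle."""
    return ширина * висота


результат = площа(3, 4)
print(результат)  # 12
```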


Part Two.

... to be continued: hacking the Python grammar to translate keywords ("def", "while", "if") and built-in functions ("range")...
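Until that part is written, note that built-in functions, unlike keywords, are ordinary names and can be given translated aliases without touching the grammar. The sketch below shows that simpler approach. It is not the grammar hack announced above, the Ukrainian names are only illustrative, and it assumes an interpreter that accepts non-ASCII identifiers (the patched build from Part One, or Python 3).

```python
# Hypothetical Ukrainian aliases for a few built-ins.
діапазон = range    # "range"
довжина = len       # "length"

# The aliases behave exactly like the originals.
квадрати = [n * n for n in діапазон(1, 4)]
print(квадрати)            # [1, 4, 9]
print(довжина(квадрати))   # 3
```

Keywords such as "def", "while", and "if" cannot be renamed this way, because they are fixed in the grammar rather than looked up as names.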



This post has been edited by ignite: Jan 5 2007, 06:45 PM
Attached File(s)
Attached File  z1.png ( 73.1k ) Number of downloads: 10
 
Lewisthemusician
post Jan 5 2007, 11:12 PM
Post #2





Cool. I am thinking about teaching myself Python, but I want to get the languages I already know perfect before starting a new one.
I'll give it a week before I start on more :D
The thing is, though, some code won't carry over to other languages. For example, in AS (ActionScript) you have code like this:

on (press) { gotoAndStop(2); }

And some languages don't have a standalone word for "on", so the "on" would end up as part of another word.