The business of going from Original Script Persian to transliteration is much trickier than what we did, which was to go from Romanized Urdu BACK to Original Script Urdu. Unfortunately I haven’t tried going the other way, but it seems like it would require an ALA-Romanized Persian dictionary to make it work. Names might be easier, since there’s a lot of Persian in original script in the LC authority files, and since names are often repeated you could get a lot of use out of a modest-sized dataset. I don’t know any rules of Persian orthography, but if there were any (like “i” before “e” except after “c” …) it would THEORETICALLY be possible to leverage those.
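To make the dictionary idea concrete, here is a minimal Python sketch of a lookup table built from paired original-script and Romanized forms. Everything in it is an assumption for illustration: the names `PAIRS`, `build_lexicon` and `romanize` are hypothetical, the sample pairs are hand-typed rather than drawn from any authority file, and the Romanized forms should be checked against the LC table before use.

```python
from collections import Counter, defaultdict

# Hypothetical sample pairs (original script, Romanized form). In practice
# these would be harvested from records that carry both forms; the values
# here are illustrative only.
PAIRS = [
    ("کتاب", "kitāb"),
    ("پدر", "pidar"),
    ("کتاب", "kitāb"),
]

def build_lexicon(pairs):
    """Map each original-script token to its most frequent Romanized form."""
    counts = defaultdict(Counter)
    for script, roman in pairs:
        counts[script][roman] += 1
    return {script: c.most_common(1)[0][0] for script, c in counts.items()}

def romanize(text, lexicon):
    """Romanize whitespace-separated tokens found in the lexicon.

    Tokens not in the lexicon are kept in square brackets and returned
    separately so that a Persian speaker can review them.
    """
    out, unknown = [], []
    for token in text.split():
        if token in lexicon:
            out.append(lexicon[token])
        else:
            out.append(f"[{token}]")
            unknown.append(token)
    return " ".join(out), unknown
```

Because repeated names collapse to one entry, even a modest set of pairs covers many occurrences; anything the table has not seen is flagged rather than guessed.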
Joel Hahn did a nice macro of Hebrew for OCLC (which has similar vocalization issues) but my Hebrew cataloger tells me that the vowels still have to be tweaked. Since I know even less about Hebrew than I do about Persian, I don’t know if there’s any part of his methodology you could repurpose for Persian.
Sorry I can’t be of more help with this issue.
From: Han, Yan [mailto:[log in to unmask]]
Sent: Wednesday, April 17, 2013 8:14 PM
To: Jacobs, Jane W; Code for Libraries ([log in to unmask]); [log in to unmask]
Cc: Seyede Pouye Khoshkhoosani
Subject: RE: : Persian Romanization table
Hello, All and Jane
First, I would like to thank Jane Jacobs at Queens Library for providing me with the Urdu Romanization table.
As we are working on creating Persian/Pashto transliteration software, my Persian language expert has the following question:
"Following our conversation about transliterating Persian into Roman letters, I ran into a big problem: since short vowels are not written above or below the letters in Persian, how can a machine read a Persian word? For example, take the word “پدر”; to the machine this word is PDR, because it cannot see the vowels. There is no rule for the short vowels in the Persian language, so the machine cannot tell whether the first letter is “pi”, “pa” or “po”. Is there any way to overcome this obstacle?"
It seems to me that we are missing a critical piece of information here (something like a dictionary). Without it, there is no way to get a good transliteration from the computer. We will have to have a Persian speaker check and correct the computer's transliteration.
Any suggestions?
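The ambiguity described above can be shown in a short Python sketch. It assumes a tiny, partial consonant map (`CONSONANTS`, hypothetical and covering only the letters of the example word) and three short vowels a, i, u; it is an illustration of the problem, not a transliteration tool.

```python
from itertools import product

# Partial, illustrative consonant map: only the letters needed for the example.
CONSONANTS = {"پ": "p", "د": "d", "ر": "r"}

# Assumed short-vowel choices; "" means no vowel follows the consonant.
SHORT_VOWELS = ["", "a", "i", "u"]

def skeleton(word):
    """Return the consonant skeleton a machine sees, e.g. ['p', 'd', 'r']."""
    return [CONSONANTS[ch] for ch in word]

def candidates(word):
    """List every reading obtained by trying each short-vowel choice
    after every consonant except the last."""
    cons = skeleton(word)
    slots = len(cons) - 1
    results = []
    for vowels in product(SHORT_VOWELS, repeat=slots):
        parts = [c + v for c, v in zip(cons, vowels)] + [cons[-1]]
        results.append("".join(parts))
    return results
```

For a three-letter word this yields 4 × 4 = 16 candidate readings, and nothing in the script itself selects one of them, which is why a dictionary or a human reviewer is needed.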
From: Jacobs, Jane W [mailto:[log in to unmask]]
Sent: Wednesday, January 23, 2013 6:28 AM
To: Han, Yan
Subject: RE: : Persian Romanization table
As per my message to the listserv, here are the config files for Urdu. If you do a Persian config file, I'd love to get it and, if possible, add it to the MARC::Detrans site.
Let me know if you want to follow this road.
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Han, Yan
Sent: Tuesday, January 22, 2013 5:31 PM
To: [log in to unmask]
Subject: [CODE4LIB] : Persian Romanization table
I have a project dealing with Persian materials. I have already used the Google Translate API to translate. Now I am looking for an API to transliterate/Romanize (NOT translate) Persian to English (not English to Persian). In other words, I have Persian in and English out.
There is a Romanization table (Persian romanization table - Library of Congress: http://www.loc.gov/catdir/cpso/romanization/persian.pdf).
For example, کتاب should output as Kitāb.
My finding is that existing tools mostly do the opposite:
1. Google Transliterate: you enter English, output Persian (Input Bookmark , output ??????? , Input ??????? , output ??????? )
2. OCLC language: the same as Google Transliterate.
3. http://mylanguages.org/persian_romanization.php : works, but no API.
Does anyone know whether such an API exists?