As explained in the last paragraph of 
http://www.loc.gov/catdir/cpso/romanization/persian.pdf :

"In romanizing Persian, the Library of Congress has found it necessary 
to consult dictionaries as an appendage to the romanization tables, 
primarily for the purpose of supplying vowels. For Persian, the 
principal dictionary consulted is:

M. Muʼīn. Farhang-i Fārsī-i mutavassit."

That is, any algorithm for romanizing Persian would need not only to map 
Persian letters to roman ones but also to look up each word in a 
digital form of this dictionary in order to know which vowels to insert. 
The digital dictionary doesn't actually need to be transliterated; 
that is, instead of doing this:

original ==> transliterated without vowels ==> transliterated with 
vowels ("romanized")

you can instead do this:

original ==> Persian letters with vowels ==> transliterated with vowels 
("romanized")

which would allow your dictionary to use the original form as the input.
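
To make the second pipeline concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the names LETTERS, VOCALIZED, and romanize are hypothetical, the letter map is only a tiny subset of the LC table (with alif simplified to a long vowel), and the two dictionary entries stand in for a real digitized dictionary.

```python
# Tiny illustrative subset of a letter map; NOT the full LC table.
LETTERS = {
    "\u067E": "p",  # peh
    "\u0628": "b",  # beh
    "\u062A": "t",  # teh
    "\u062F": "d",  # dal
    "\u0631": "r",  # reh
    "\u06A9": "k",  # keheh (Persian kaf)
    "\u0627": "ā",  # alif, simplified here to the long vowel only
    "\u064E": "a",  # fatha (short vowel mark)
    "\u0650": "i",  # kasra (short vowel mark)
    "\u064F": "u",  # damma (short vowel mark)
}

# Hypothetical stand-in for the digitized dictionary:
# unvocalized spelling -> the same word with short vowel marks added.
VOCALIZED = {
    "\u067E\u062F\u0631": "\u067E\u0650\u062F\u064E\u0631",        # p-d-r
    "\u06A9\u062A\u0627\u0628": "\u06A9\u0650\u062A\u0627\u0628",  # k-t-alif-b
}


def romanize(word):
    """Return (romanized_text, found_in_dictionary) for one Persian word."""
    vocalized = VOCALIZED.get(word)
    found = vocalized is not None
    # Fall back to the bare consonants when the dictionary has no entry.
    source = vocalized if found else word
    # Characters missing from the toy letter map come out as "?".
    return "".join(LETTERS.get(ch, "?") for ch in source), found


print(romanize("\u067E\u062F\u0631"))        # ('pidar', True)
print(romanize("\u06A9\u062A\u0627\u0628"))  # ('kitāb', True)
print(romanize("\u0628\u0631"))              # ('br', False)
```

The second value in the result tells you whether the vowels came from the dictionary; capitalization is not handled at all here.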

As Jane indicates, Persian and Hebrew both often omit vowels in the 
original, yet they are always supplied in romanization.  Since 
dictionary lookups are not always perfect (especially with proper 
names), a human will likely have to tweak the vowels.  The 
transliteration table also discusses when to capitalize the words in the 
romanized form: something else that will be quite difficult to code.
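
One way to organize that human step, sketched in Python under stated assumptions: the CANDIDATES table and the triage function are hypothetical, and the entries are only illustrative. A word is accepted automatically when the dictionary offers exactly one reading; words with no entry or with several possible readings go to a reviewer.

```python
# Hypothetical lookup table: unvocalized word -> possible romanizations.
CANDIDATES = {
    "\u06A9\u062A\u0627\u0628": ["kitāb"],      # one reading
    "\u06A9\u0631\u0645": ["karam", "kirm"],    # same spelling, two readings
}


def triage(words, candidates=CANDIDATES):
    """Split words into auto-romanized pairs and a queue for human review."""
    auto, review = [], []
    for word in words:
        options = candidates.get(word, [])
        if len(options) == 1:
            # Unambiguous dictionary hit: accept the single reading.
            auto.append((word, options[0]))
        else:
            # Missing (empty list) or ambiguous (several readings).
            review.append((word, options))
    return auto, review
```

The review queue keeps the candidate readings alongside each word, so the reviewer picks or corrects rather than starting from scratch.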

In short, you will probably need to have a Persian-speaking librarian 
review the transliterated output of your code to correct errors.

--Kevin

On 4/18/13 10:37 AM, Jacobs, Jane W wrote:
> Hi Yan,
>
>
>
> The business of going from Original Script Persian to transliteration
> is much trickier than what we did, which was to go from Romanized
> Urdu BACK to Original Script Urdu.  Unfortunately I haven't tried
> going the other way, but it seems like it would require an
> ALA-Romanized Persian dictionary to make it work.  Names might be
> easier, since there's a lot of Persian in original script in the LC
> authority files, and since names are often repeated you could get a
> lot of use out of a modest-sized dataset.  I don't know any rules of
> Persian orthography, but if there were any (like "i" before "e"
> except after "c"...) it would THEORETICALLY be possible to leverage
> those.
>
>
>
> Joel Hahn did a nice macro of Hebrew for OCLC (which has similar
> vocalization issues) but my Hebrew cataloger tells me that the vowels
> still have to be tweaked.  Since I know even less about Hebrew than I
> do about Persian, I don't know if there's any part of his methodology
> you could repurpose for Persian.
>
>
>
> Sorry I can't be of more help with this issue.
>
> JJ
>
>
>
> -----Original Message-----
> From: Han, Yan [mailto:[log in to unmask]]
> Sent: Wednesday, April 17, 2013 8:14 PM
> To: Jacobs, Jane W; Code for Libraries ([log in to unmask]); [log in to unmask]
> Cc: Seyede Pouye Khoshkhoosani
> Subject: RE: : Persian Romanization table
>
>
>
> Hello, All and Jane
>
> First, I would like to thank Jane Jacobs at Queens Library for
> providing me with the Urdu Romanization table.
>
> As we are working on creating Persian/Pashto transliteration software,
> my Persian language expert has the following question:
>
> "Following our conversation about transliterating Persian into
> Roman letters, I have run into a big problem: since the short vowels
> are not written above or below the letters in Persian, how can a
> machine read a Persian word? For example, take the word 'پدر'; to the
> machine this word is PDR, because it cannot read the vowels. There is
> no rule for the short vowels in the Persian language, so the machine
> cannot tell whether the first letter is read as 'pi', 'pa' or 'po'.
> Is there any way to overcome this obstacle?"
>
> It seems to me that we are missing a critical piece of information
> here (something like a dictionary). Without it, there is no way to get
> a good transliteration from the computer. We will need a Persian
> speaker to check and correct the computer's transliteration.
>
> Any suggestions ?
>
> Thanks,
>
> Yan
>
>
>
>
>
> -----Original Message-----
>
> From: Jacobs, Jane W [mailto:[log in to unmask]]
>
> Sent: Wednesday, January 23, 2013 6:28 AM
>
> To: Han, Yan
>
> Subject: RE: : Persian Romanization table
>
>
>
> Hi Yan,
>
>
>
> As per my message to the listserv, here are the config files for
> Urdu.  If you do a Persian config file, I'd love to get it and if
> possible add it to the MARC::Detrans site.
>
>
>
> Let me know if you want to follow this road.
>
> JJ
>
>
>
> -----Original Message-----
>
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf
> Of Han, Yan
>
> Sent: Tuesday, January 22, 2013 5:31 PM
>
> To: [log in to unmask]
>
> Subject: [CODE4LIB] : Persian Romanization table
>
>
>
> Hello, All,
>
> I have a project dealing with Persian materials. I have already used
> the Google Translate API to translate. Now I am looking for an API to
> transliterate/Romanize (NOT translate) Persian into Roman script (not
> the other way around). In other words, I have Persian in original
> script in, and Romanized Persian out.
>
> There is a Romanization table (Persian romanization table - Library
> of Congress: http://www.loc.gov/catdir/cpso/romanization/persian.pdf).
>
>
>
>
> For example:
>
> کتاب  should output as  Kitāb
>
> My finding is that existing tools only do the opposite:
>
>
>
> 1.      Google Transliterate: you enter English, output Persian
> (Input  Bookmark , output  ???????  , Input  ???????  , output
> ???????  )
>
>
>
> 2.      OCLC language: the same as Google Transliterate.
>
>
>
> 3.      http://mylanguages.org/persian_romanization.php  : works, but
> no API.
>
>
>
> Does anyone know whether such an API exists?
>
>
>
> Thanks much,
>
>
>
> Yan