Seeking English to Arabic/Farsi/Pashto/Urdu transcription help

Assalam u Alaikum / Senga Yai,

My name is Joel. I consider myself a journeyman jack-of-all-trades:
linguist (Arabic and Hebrew), application programmer (Python) and web page
programmer (JavaScript and PHP).

I have recently realized what I consider one of my life’s ambitions of creating a
functioning English to Arabic Transcription Veracity Verification (MULTVV) application/
web page at the following URL:

enartrans (dot) com (forward slash) arabictranstest (dot) php

Transcription is the formal term for spelling a word from one language in
the script of another, chiefly according to how it sounds.

In a nutshell, or in other words (no pun intended), I take the user’s
(your) English input name, proper noun or term and first put it through
several English to Arabic “transcription engines”. Each engine outputs
either one transcription (variation) or a few possible or probable valid
alternative variations. I then run each Arabic variation through a search
engine.

From the URL count returned, or “number of hits”, I categorize and quantify
which of the several transcription variations is the most recognized or
accepted by the Arabic-speaking world community, and to what extent (order
of magnitude).
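
To make the ranking step concrete, here is a minimal Python sketch of the
idea, not my production code. The hit_count function is a hypothetical
stand-in for whatever search engine query is actually used, and the counts
in FAKE_HITS are made-up placeholders for illustration only.

```python
import math
from typing import Callable, Dict, List, Tuple


def rank_variants(
    variants: List[str],
    hit_count: Callable[[str], int],
) -> List[Tuple[str, int, int]]:
    """Return (variant, hits, order_of_magnitude), most hits first."""
    ranked = []
    for variant in variants:
        hits = hit_count(variant)
        # Order of magnitude: 2,500,000 hits -> 6; zero hits -> 0.
        magnitude = int(math.log10(hits)) if hits > 0 else 0
        ranked.append((variant, hits, magnitude))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Made-up counts, NOT real search results; a real run would query an engine.
FAKE_HITS: Dict[str, int] = {"كراتشي": 2_500_000, "كاراتشاي": 1_200}


def fake_hit_count(term: str) -> int:
    """Hypothetical stand-in for a search engine lookup."""
    return FAKE_HITS.get(term, 0)


ranking = rank_variants(list(FAKE_HITS), fake_hit_count)
for variant, hits, magnitude in ranking:
    print(variant, hits, "10^%d" % magnitude)
```

Passing the lookup in as a function keeps the ranking logic independent of
any particular search engine.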

From the transcription categorization coupled with an inline regular Arabic
word dictionary, a very powerful native language search engine utility can
be realized. The greater the accuracy of the transcription classification
and the regular word dictionary, the more effective the combination is as
a multilingual search engine utility like no other.

While my language app/web page is an entity in its own right, it is also
an inherently powerful adjunct to the relatively new Google Cross Language
Information Retrieval Language tool or “CLIR” where Google specifically
prompts the user for alternative transcriptions or regular word translations
if the one it (Google) has come up with does not suit his/her needs.

Currently my page is “tooled” only for Arabic. My objective is ultimately
to do the same for all languages represented by web pages on the
Internet.

Building on my established “base” Arabic transcription programming
infrastructure, I want to expand my application into Farsi/Dari, Urdu and
Pashto, and I need some help with very fine clarifications and information
to give you a better transcription result. Shukran Jiddan/Tashakkur for any
help you can provide with references or web pages containing already
established English to native Farsi/Dari, Urdu and/or Pashto
transcriptions. I would welcome and appreciate more Arabic examples as
well. The more examples the better! This is the crux of my post here.

Without getting into too many specifics and technicalities at this juncture,
my first inclination is to try to find existing transcription engines for a
given target language and filter out transcriptions which blatantly have no
phonetic correspondence to the English input term, using a phonetic
transcription “post-processor filter” of my own (design) … no sense in
reinventing the wheel.
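
As a rough illustration of what such a post-processor filter could look
like (a sketch under my own assumptions, not the actual filter), the Python
below reduces both the English term and each Arabic candidate to a crude
consonant skeleton and keeps only candidates whose skeleton is reasonably
similar. The letter table is deliberately incomplete, and the 0.5 threshold
is an arbitrary value chosen for the example.

```python
from difflib import SequenceMatcher
from typing import List

# Rough, incomplete letter-to-sound table (an assumption for illustration).
ARABIC_CONSONANTS = {
    "ب": "b", "ت": "t", "ج": "j", "د": "d", "ر": "r", "ز": "z",
    "س": "s", "ش": "sh", "ف": "f", "ق": "k", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "ح": "h",
}


def arabic_skeleton(word: str) -> str:
    # Long vowels and any letter missing from the table are simply dropped.
    return "".join(ARABIC_CONSONANTS.get(ch, "") for ch in word)


def english_skeleton(word: str) -> str:
    # Lowercase, keep letters only, drop the five plain vowels.
    return "".join(
        ch for ch in word.lower() if ch.isalpha() and ch not in "aeiou"
    )


def plausible(english: str, candidate: str, threshold: float = 0.5) -> bool:
    """True if the two consonant skeletons are similar enough."""
    matcher = SequenceMatcher(
        None, english_skeleton(english), arabic_skeleton(candidate)
    )
    return matcher.ratio() >= threshold


def filter_candidates(
    english: str, candidates: List[str], threshold: float = 0.5
) -> List[str]:
    return [c for c in candidates if plausible(english, c, threshold)]


# The last candidate is an unrelated name and should be filtered out.
kept = filter_candidates("Karachi", ["كراتشي", "كاراتشاي", "محمد"])
print(kept)
```

A real filter would need a fuller sound table and some handling of English
digraphs such as "ch" and "sh", but the principle of comparing skeletons
stays the same.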

Arabic seems to have far more advanced or developed transcription engines
than other languages. Thus, my plan is to tweak the output of the Arabic
transcription engine(s) for phonetically similar languages such as
Dari/Farsi, Urdu, Pashto etc.; otherwise I will build my own transcription
engines from scratch, which I foresee doing once I address, say, Hindi (way
down the road).

For instance, when I “Google map” Karachi (Pakistan) at the following URL,
Google displays the Urdu representation, which contains the character “Che”,
which is not present in the Arabic character set.

nationsonline (dot) org (forward slash) onworld (forward slash) map (forward slash) google_map_Karachi (dot) htm

كراچى

Notice the contrast with the two corresponding Arabic transcription
variations of Karachi from my web page:

كراتشي

كاراتشاي

(Hopefully the preceding Urdu and Arabic are coming to you in human
readable Arabic characters, i.e. HTML entities, and not being converted to
some kind of encoding representation. Perhaps some of you are familiar
with some of the encoding issues I’ve encountered in your own
Semitic/Indo-European language work.)
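
For anyone fighting the same encoding issues, here is a small Python sketch
(an illustration only, not what my page does) that turns Arabic-script text
into numeric HTML entities and back, using only the standard library.

```python
import html


def to_entities(text: str) -> str:
    # Replace every non-ASCII character with a numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "كراچى"
encoded = to_entities(original)   # pure ASCII, safe to paste anywhere
decoded = html.unescape(encoded)  # back to readable characters
print(encoded)
print(decoded)
```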

From the “basic Arabic infrastructure” I envision “mapping” or
“translating” to the Urdu Che, using specialized logic (i.e. another
filter), from the corresponding Arabic characters that would equate to it.
I trust you can see what I’m trying to accomplish here.
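
A minimal sketch of such a mapping filter in Python follows. The rule table
is hypothetical and contains only the one rule discussed here: the Arabic
sequence Ta + Shin, which both of my Arabic variations of Karachi use for
the "ch" sound, is rewritten as the single letter Che.

```python
# Hypothetical rewrite rules: (Arabic spelling, Urdu letter).
ARABIC_TO_URDU_RULES = [
    ("تش", "چ"),  # Ta + Shin, used in Arabic for "ch" -> Che
]


def arabic_to_urdu(word: str) -> str:
    """Apply each rewrite rule in order to an Arabic transcription."""
    for arabic_seq, urdu_letter in ARABIC_TO_URDU_RULES:
        word = word.replace(arabic_seq, urdu_letter)
    return word


urdu_guess = arabic_to_urdu("كراتشي")
print(urdu_guess)
```

The result still ends in the Arabic form of the final letter, so it is not
yet identical to the spelling Google displays. Further rules, which is
exactly where I need your help, would be needed for differences like that.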

===================================================

Here are the basic instructions:

Bring up:

enartrans (dot) com (forward slash) arabictranstest (dot) php

(On some versions and/or settings of IE the result matrix may disappear.
If so, try using only IE Version 7.0 and above, or change the screen
resolution.

There is never any problem in this regard using Firefox or Opera. However,
neither Firefox nor Opera permits clicking the button and having the
Arabic transcription go right to your memory/clipboard. Only IE
permits this.)

To enter a new name or proper noun on my web page that
is not in the database (i.e. not in the drop-down autocomplete
selection):

Start typing/inputting the name, proper noun or term in
English in the name field on the left side of the web page.

If you see it already in the database i.e. in the autocomplete
selection, select it via mouse.

If you have a term that is not in the database, just press [Enter]
when you have your desired word spelled to your satisfaction.

===================================================