Frequently asked questions about the Sanskrit engine

Sarasvati

How to interact with the Reader

Click on to select segment, click on to rule out segment.

Click on a colored rectangle to get its morphological information.

Why is the green menu bar inconveniently placed?

The green menu bar is supposed to stick to the bottom of the page. If it appears in the middle of the page, sometimes hiding essential dialog boxes and buttons, the reason is that your browser is not XHTML compliant. Normally the sliding bar allows all the text to be pulled and made visible. If you encounter difficulties with Safari, Chrome or Firefox, please report the precise problem together with the exact versions of the browser and OS used.

Is the interface compliant with the Unicode standard?

Yes. The Sanskrit printed by the system is encoded as strings of Unicode, in two flavors: romanisations, rendered as roman script with diacritic marks, such as vaiśeṣikaḥ, and devanāgarī representations, rendered with proper ligatures if your browser has appropriate fonts loaded, such as वैशेषिकः.

The input windows accept Unicode input under UTF-8 encoding, either in devanāgarī or in Indological romanization script. They also accept transliterated encoding, where diacritics are special characters such as quotation signs and periods, long vowels are noted by duplication, and the whole phoneme stream is represented as an ASCII string, such as vaize.sika.h.
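As a rough illustration of the three input styles just described, the following Python sketch guesses which one a string uses. The function name `guess_input_style` and the labels it returns are hypothetical and not part of the Sanskrit engine. It relies only on two facts: the Devanagari Unicode block spans U+0900 to U+097F, and the transliterated form is pure ASCII.

```python
def guess_input_style(text: str) -> str:
    """Guess the input style of a string (hypothetical helper, not the engine's code)."""
    # Any character from the Devanagari block (U+0900..U+097F) means devanāgarī input.
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "devanagari"
    # A pure ASCII string is taken to be a transliteration such as vaize.sika.h.
    if text.isascii():
        return "ascii-transliteration"
    # Otherwise assume roman script with diacritics, such as vaiśeṣikaḥ.
    return "roman-diacritics"


for sample in ("वैशेषिकः", "vaiśeṣikaḥ", "vaize.sika.h"):
    print(sample, "->", guess_input_style(sample))
```

In the actual interface the user states the input style explicitly (for example with the Deva button), so a guess of this kind is only useful as a sanity check.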

What is the default transliteration convention?

The default roman transliteration scheme stems from Velthuis' devnag TeX input conventions. The left table gives you the transliteration convention for each Sanskrit letter, the right table shows its rendering with the usual diacritic conventions:

a aa i ii u uu .r .rr .l e ai o au      a ā i ī u ū ṛ ṝ ḷ e ai o au
.m .h                                   ṃ ḥ
k kh g gh f                             k kh g gh ṅ
c ch j jh ~n                            c ch j jh ñ
.t .th .d .dh .n                        ṭ ṭh ḍ ḍh ṇ
t th d dh n                             t th d dh n
p ph b bh m                             p ph b bh m
y r l v z .s s h                        y r l v ś ṣ s h

Alternatively, you may write "s instead of z and "n instead of f. Note that upper case letters are reserved for initial capitals in proper nouns. Thus A is NOT equivalent to aa (however, read the next section for alternative schemes). Such proper nouns are accepted only in the "Sanskrit made easy" search engine. See proper names below.
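The convention above can be sketched as a small longest-match converter from the default transliteration to roman script with diacritics. This is only an illustration under stated assumptions: the name `velthuis_to_roman` is hypothetical, the mapping is copied from the table and the two alternative spellings above, and proper-noun capitals are not handled.

```python
import unicodedata

# Letters that differ between the two tables; every other letter maps to itself.
# Includes the alternatives "s (for z) and "n (for f) mentioned in the text.
VH_TO_ROMAN = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    "f": "ṅ", '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    "z": "ś", '"s': "ś", ".s": "ṣ",
}
# Try longer tokens first, so that ".rr" wins over ".r" and "aa" over "a".
TOKENS = sorted(VH_TO_ROMAN, key=len, reverse=True)


def velthuis_to_roman(text: str) -> str:
    """Convert a lower-case Velthuis string to roman script with diacritics."""
    out = []
    i = 0
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                out.append(VH_TO_ROMAN[tok])
                i += len(tok)
                break
        else:
            # Plain letters (k, t, a, ...) are copied unchanged.
            out.append(text[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(velthuis_to_roman("vaize.sika.h"))   # vaiśeṣikaḥ
print(velthuis_to_roman('vai"se.sika.h'))  # vaiśeṣikaḥ
```

Aspirates need no entry of their own: .th is read as .t followed by h, which yields ṭh.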

Other known transliteration schemes

The Sanskrit platform allows the optional use of three other transliteration schemes, the so-called Kyoto-Harvard scheme KH usual for Western indologists, the WX scheme used at University of Hyderabad and other Indian sites, and the SLP1 scheme of the Sanskrit Library. Thus for instance the word tacchrutvā may be input as tacchrutvaa in the default Velthuis scheme, as tacchrutvA in the Kyoto-Harvard scheme, as wacCruwvA in the WX scheme, or as tacCrutvA in the SLP1 scheme. Similarly vaiśeṣikaḥ may be input as vaize.sika.h in the default Velthuis scheme, as vaizeSikaH in the Kyoto-Harvard scheme, as vESeRikaH in the WX scheme, or as vESezikaH in the SLP1 scheme. A convenient table compares the various encodings. In order to choose a transliteration scheme other than the default, click on the corresponding radio button in the interface pages.
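To make the correspondence between the four schemes concrete, here is a minimal Python sketch that rewrites a Velthuis string into KH, WX or SLP1. The function name `from_velthuis` is hypothetical, and the mapping rows are an assumption based on the commonly published descriptions of these schemes; they should be checked against the site's own comparison table before being relied upon.

```python
# One cell per phoneme: Velthuis, Kyoto-Harvard, WX, SLP1 (assumed mappings).
ROWS = """
a a a a | aa A A A | i i i i | ii I I I | u u u u | uu U U U
.r R q f | .rr RR Q F | .l lR L x
e e e e | ai ai E E | o o o o | au au O O
.m M M M | .h H H H
k k k k | kh kh K K | g g g g | gh gh G G | f G f N
c c c c | ch ch C C | j j j j | jh jh J J | ~n J F Y
.t T t w | .th Th T W | .d D d q | .dh Dh D Q | .n N N R
t t w t | th th W T | d d x d | dh dh X D | n n n n
p p p p | ph ph P P | b b b b | bh bh B B | m m m m
y y y y | r r r r | l l l l | v v v v
z z S S | .s S R z | s s s s | h h h h
"""
TABLE = [cell.split() for line in ROWS.strip().splitlines() for cell in line.split("|")]
COLUMN = {"KH": 1, "WX": 2, "SLP1": 3}
BY_VELTHUIS = {row[0]: row for row in TABLE}
# Longest keys first, so that ".th" is read before ".t" and "ai" before "a".
KEYS = sorted(BY_VELTHUIS, key=len, reverse=True)


def from_velthuis(text: str, scheme: str) -> str:
    """Rewrite a Velthuis string in the given scheme ("KH", "WX" or "SLP1")."""
    col = COLUMN[scheme]
    out = []
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(BY_VELTHUIS[key][col])
                i += len(key)
                break
        else:
            out.append(text[i])  # pass through anything not in the table
            i += 1
    return "".join(out)


for word in ("tacchrutvaa", "vaize.sika.h"):
    print(word, {scheme: from_velthuis(word, scheme) for scheme in COLUMN})
```

With these rows the sketch reproduces the two examples given above.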

devanāgarī input

It is now possible to enter the Unicode representation of devanāgarī in UTF-8 encoding. Just press the Deva button associated with the input window. Try for instance searching for: देवनागरी or for devanāgarī.

How come the devanāgarī rendering I see is wrong?

This question is complex. It depends on your browser and its proper parameterization, on the rendering engine of the operating system and its windowing library running on your machine, and on the fonts which are actually available in its library. In particular, it is the rendering engine that compiles the ligatures into the actual glyphs displayed on your screen. Thus this site may be seen more or less correctly according to your local configuration. Do not blame me if you get garbage on Explorer or Netscape. I personally use Safari on MacOSX, and the rendering is generally good. I occasionally check on Firefox and Chrome for interoperability, and I try to ensure that my Web pages are compliant with the W3C specification for HTML5. I give some indications about proper fonts to load on the entry page of the site.

How should I present stems to the grammarian?

The grammatical engine has two interfaces, one for the declension of nouns, adjectives, pronouns and numbers, the other for the conjugation of verbs. When it is activated from links in the dictionary, the results are displayed in Roman script with diacritics. When it is activated directly (from the Grammar link in the green menu at the bottom of the page), it gives you the option to display in devanāgarī. Note that for conjugations the first person is listed first in Roman, consistently with Western tradition, whereas in devanāgarī the prathama puruṣa is listed first, consistently with Indian tradition.

The parameters of the declension engine are the stem and the gender. The stem must be presented in the same style as in the dictionary. For instance, a present participle should be presented with its weak stem (e.g. tudat), and not with its strong stem (tudant). When the stem does not correspond to a dictionary entry, the declension table is given with the input stem marked with a question mark, indicating that this word is not known from the lexicon, and thus the displayed forms may not be trusted. For some combinations of stems and genders an error message may be displayed. When the entry does exist, the answer provides its link to the dictionary. In this case, however, the forms returned are not obtained by table lookup, but are recomputed on the fly.

The gender parameter may be Mas for masculine, Fem for feminine, Neu for neuter, or the special parameter Any, which is reserved for deictic pronouns (such as aham, tvad, ātman) and numbers (eka, dva, ...), as well as kati.

The parameters of the conjugation engine are the stem and the present class. Here as in the declension engine, an attempt is made to guess a possible homonymy index from the parameters. Thus if you enter the stem vid and press the button marked 2, you will get the forms of root vid_1 (such as vetti), whereas if you press the button marked 6 you will get the forms of root vid_2 (such as vindati). If you want the forms of vid_1 of class 6 (such as vidati) you have to enter vid#1 explicitly.
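The interplay between an explicit homonymy index and the guessed one can be pictured with the following Python sketch. Everything in it is hypothetical: the function `resolve_root` and the lookup table are illustrations holding only the vid example from the text, whereas the real engine consults its lexicon.

```python
from typing import Optional, Tuple

# Guessed homonymy index per (stem, present class).
# Assumption: contains only the example given in the text.
GUESSED_INDEX = {("vid", 2): 1, ("vid", 6): 2}


def resolve_root(entry: str, present_class: int) -> Tuple[str, Optional[int]]:
    """Return (stem, homonymy index); an explicit stem#index overrides the guess."""
    stem, sep, index = entry.partition("#")
    if sep:
        return stem, int(index)
    return stem, GUESSED_INDEX.get((stem, present_class))


print(resolve_root("vid", 2))    # ('vid', 1)
print(resolve_root("vid", 6))    # ('vid', 2)
print(resolve_root("vid#1", 6))  # ('vid', 1)
```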

The forms displayed are the forms of the present system in the indicated class, followed by all forms of other systems known to the machine, in the primary conjugation as well as in the derived conjugations. Not all seven forms of aorist are given for every root, only the ones that correspond to a paradigm explicitly listed (from 1 to 7) in the lexicon. In the present version, the only passive forms are those of the present system (plus the passive aorist 3rd person forms such as agāmi, listed in the middle/reflexive voice table).

The participles section lists a collection of participial stems in the 3 genders. Their gender marks are mouse-sensitive links, which call the nominal declension tool to display the corresponding adjectival stems. Thus a strongly generative root may lead to literally thousands of forms.

Not all returned forms are attested; we have been rather generous in permitting both active and middle/reflexive voices for many roots.

All the forms obtainable from the dictionary entries are listed in morphological bases, available both in XML and in pdf formats. We remark that in these tables the conversion of final r or s to visarga has not been effected. This is essential for the correct analysis by the segmenter of constituents such as punarapi, obtainable by external sandhi from the forms punar and api. Thus the visarga-ended forms displayed by the grammatical engines, consistently with the Indian tradition, actually correspond to an information loss at the time of display.
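The information loss mentioned here can be illustrated with a small Python sketch working on Velthuis strings. The function `display_form` is a hypothetical illustration of the display step, not the engine's actual code.

```python
def display_form(form: str) -> str:
    """Replace a final r or s by visarga (.h), as done at display time."""
    # ".r", ".rr" and ".s" are the letters ṛ, ṝ and ṣ, not a final r or s.
    if form.endswith((".r", ".rr", ".s")):
        return form
    if form.endswith(("r", "s")):
        return form[:-1] + ".h"
    return form


print(display_form("punar"))  # puna.h
print(display_form("atas"))   # ata.h
```

Both results end in a.h, so the displayed form no longer tells whether the stored form ended in r or in s. The segmenter needs that distinction to recognize punarapi as punar followed by api.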

How should I present sentences to the reader/parser?

Sanskrit input is presented to the parser in two possible forms.
The first form is the saṃhitāpāṭha continuous representation. All words are linked together via sandhi. A hiatus between vowels is noted by _ (underscore), and elision of an initial a is noted by ' (apostrophe) for avagraha.
It is also possible to represent the input in padapāṭha separated form, where individual words, represented in terminal sandhi, are separated by spaces.

The Sanskrit Reader gives you a choice between the two input conventions by the control button Sandhied (saṃhitāpāṭha) or Unsandhied (padapāṭha). The default is Sandhied. In this mode, spaces are allowed too, to help the segmenter by indicating word boundaries. Then, the successive chunks of input are not assumed to be in terminal sandhi form. Typically, a terminal o will be reverted to aḥ if the first letter of the right chunk would turn it into o by sandhi. Similarly an anusvāra may occur instead of an m.

In both modes, spaces may appear only at the frontier of full words, i.e. you cannot break a compound with spaces, and you cannot put a space between a preverb and a verb form.

Example of input (in VH transliteration):
dvitiiyakak.syaayaa.mdvebaaliketrayobaalakaazcapa.thanti
The same with a few spaces:
dvitiiyakak.syaayaa.m dvebaaliketrayo baalakaaz ca pa.thanti (in Sandhied mode) or dvitiiyakak.syaayaam dvebaaliketraya.h baalakaa.h ca pa.thanti (in Unsandhied mode). Note that we now have significantly fewer solutions (6 instead of 13). Fortunately, the parser is smart enough to propose only one solution in both cases.

A phrase obtained by sandhi of atas and atas may be represented as ato'ta.h in both modes, as ato 'ta.h or ato ata.h in Sandhied mode, but only as ata.h ata.h in Unsandhied mode.
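For readers unfamiliar with the notation, the following Python sketch builds the continuous Sandhied form from two Unsandhied words for one family of cases only: a left word ending in short a plus visarga (a.h). It applies three textbook sandhi rules and writes the result with the apostrophe and underscore conventions described earlier. The function `join_final_ah` is a hypothetical illustration. It is not the engine's segmenter, which works in the opposite direction and covers all of sandhi.

```python
# Velthuis prefixes of vowels and of voiced consonants ("g" also covers "gh", etc.).
VOWEL_STARTS = ("a", "i", "u", "e", "o", ".r", ".l")
VOICED_STARTS = ("g", "j", ".d", "d", "b", "f", "~n", ".n", "n", "m",
                 "y", "r", "l", "v", "h")


def join_final_ah(left: str, right: str) -> str:
    """Join left (ending in a.h) and right into continuous Sandhied notation."""
    if not left.endswith("a.h") or left.endswith("aa.h"):
        raise ValueError("this sketch only handles words ending in short a + visarga")
    base = left[:-3]
    # a.h + short a  ->  o, and the initial a is elided (avagraha, written ')
    if right.startswith("a") and not right.startswith(("aa", "ai", "au")):
        return base + "o'" + right[1:]
    # a.h + any other vowel  ->  a, with a hiatus (written _)
    if right.startswith(VOWEL_STARTS):
        return base + "a_" + right
    # a.h + voiced consonant  ->  o
    if right.startswith(VOICED_STARTS):
        return base + "o" + right
    raise NotImplementedError("voiceless initials are outside this sketch")


print(join_final_ah("ata.h", "ata.h"))        # ato'ta.h
print(join_final_ah("traya.h", "baalakaa.h")) # trayobaalakaa.h
```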

Adding spaces may help the parser by giving segmentation hints. It also helps to focus on the difficulty when a chunk is not analysable. The difficulty may be yours, such as a spelling or transliteration mistake. It may also be due to the limited coverage of our lexicon.

In rare cases, it is not possible to split the input after the final o of forms such as vocatives of nouns in u.

Most hiatus situations may now be handled in Sandhied form with a space rather than an explicit underscore. For instance consider ti.s.thanbaalaka upaadhyaayasyapraznaanaamuttaraa.nikathayati, equivalent to ti.s.thanbaalaka_upaadhyaayasyapraznaanaamuttaraa.nikathayati. Here the padapāṭha input would be ti.s.thanbaalaka.h upaadhyaayasyapraznaanaamuttaraa.nikathayati, and actually this last input will lead to fewer potential solutions being considered, since the form ti.s.thanbaalake will not be considered as a potential segment. Similarly anya upavi.s.taa.h will correctly interpret the first component as anye.
The only situation where an explicit underscore is still required is in VH transliteration, for words such as tita_u and pra_uga.

The fully unsandhied padapāṭha form, where each chunk is a pada (inflected form of a word), may be recognized by specifying the Text category as Word rather than Sentence, using the corresponding radio button. In this mode, every chunk must be a single word. The best way to tag a single word is to input its form in Word Text category (either in Sandhied or Unsandhied mode, since the two differ only when there are several chunks). Thus, if you input pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h you get its unique decomposition as a 10-component compound, whereas you would get 16 possible segmentations in Sentence mode.

What is the meaning of the Strength argument to the parser/reader?

Our reader comes in two versions. The Complete version uses all possible participial forms of all roots and will recognize the so-called nan compounds (nouns or adjectives using the privative prefix a/an). The Simplified version recognizes such forms only when they are explicitly lexicalized in the Sanskrit Heritage dictionary. Furthermore, only the Complete version allows vocatives. The default version is Complete. The Simplified version may be used for teaching beginners on simple sentences without vocatives, since it is much more precise when it works than the Complete one, which may badly overgenerate. Once the basic competence on using the interactive interface is acquired, the user should switch to the Complete mode.

The precise grammar used to recognize sentences in the Simplified version may be visualized as a local automaton graph: Simple. Compare with the Complete one.

Why does the system not recognize certain participial forms known to the stemmer?

Our Simplified reader/parser does not use all the participles generated by the grammatical engine, and recognized as such by the stemmer, but only the ones that are explicitly listed in the lexicon. If you want the full generative power, use the Complete mode.

What is the use of the contextual topic argument?

It is now possible to specify a contextual topic argument as an ellipsed nominative. This indication is given to the parser by a radio button, allowing a choice between a topic of a given gender and no topic (the default value). By specifying a masculine topic (here corresponding to Rāma), one may successfully parse sentences with an ellipsed topic. For instance, the phrase "jayati" parses as a finite verbal form, rather than as the locative of the present participle "jayat".
This feature is still experimental.

Can I cut from the system's output and paste into the system's input windows?

You can now input देवनागरी script in Unicode representation (UTF-8 encoding), provided you press the Deva button in the corresponding input window. It is also possible to input romanisation script with diacritics, such as devanāgarī, by pressing the Roma button.

Is the Sanskrit engine available as standalone software?

Yes, please visit the Reference manual.

Where is the list of abbreviations defined?

The list of abbreviations, of the Heritage dictionary as well as the grammatical engine, is given at the end of the introduction of its book form. For the convenience of the Web site users, the list of abbreviations is available here as a standalone pdf document.

What is the policy regarding the rendering of proper names?

Proper names are indicated in the dictionary with an initial capital letter. Proper names are currently allowed for input in the "Sanskrit made easy" interface only. When the initial is a long vowel, it should be doubled, like AAtreya for Ātreya.

Proper names are in general presented in the nominative case, except that a possible visarga is dropped. Thus we write Agni and not Agniḥ, and similarly Śiva, Viṣṇu and Lakṣmī; also Tvaṣṭā, Mātā, Brahmā (the Creator God, distinct from the impersonal all-pervasive brahman principle). Also Atharvā, Mātariśvā, Nandī, Aṅgirā, Viśvakarmā, Kṛtavarmā, etc. But also Aṃśumān, Aryamān, Ketumān, Garutmān and Hanumān. Similarly Iravān, Kakṣīvān, Jāmbavān, Vivasvān, Śaradvān, Satyavān and Himavān.

What is the meaning of Piic?

Piic is a lexical class; it is to participles (Part) what Iic is to nouns (Noun). That is, it stores the raw stems of participles, usable as first component of a nominal compound. Participles are a special case of first-level nominal derivatives from roots, called kṛdantāḥ in the traditional terminology.

What is the meaning of Ifc?

Ifc is a lexical class of nominal affixes that may only occur as second component of a nominal compound. More generally, the phases of the lexical analyser are explained as states of a finite automaton describing the Sanskrit morphology. Look at its (simplified) state transition graph here.
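The idea of lexical phases as automaton states can be pictured with a toy acceptor in Python. The transition table below is an assumption derived only from the definitions of Iic, Piic and Ifc given in this FAQ (Iic and Piic stems open a compound, Ifc can only close one). The site's actual transition graph has many more phases, and the state names here are hypothetical.

```python
# Toy transition table: (state, phase) -> next state.
# Assumption: a compound is a run of Iic/Piic stems closed by a Noun, Part or Ifc.
TRANSITIONS = {
    ("start", "Iic"): "in_compound",
    ("start", "Piic"): "in_compound",
    ("start", "Noun"): "accept",
    ("start", "Part"): "accept",
    ("in_compound", "Iic"): "in_compound",
    ("in_compound", "Piic"): "in_compound",
    ("in_compound", "Noun"): "accept",
    ("in_compound", "Part"): "accept",
    ("in_compound", "Ifc"): "accept",  # Ifc never occurs on its own
}


def accepts(phases):
    """Return True if the sequence of phases forms one complete nominal word."""
    state = "start"
    for phase in phases:
        state = TRANSITIONS.get((state, phase))
        if state is None:
            return False
    return state == "accept"


print(accepts(["Iic", "Iic", "Noun"]))  # True
print(accepts(["Piic", "Ifc"]))         # True
print(accepts(["Ifc"]))                 # False
```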

Who is using this site?

This site is used first of all by the researchers of the joint research team in Sanskrit Computational Linguistics, at the Department of Sanskrit Studies, University of Hyderabad, at the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, and at the Sanskrit Library. Many other users are attested worldwide. Some of them are kind enough to send us their appreciation; please visit our Golden book.

Objective Caml
© Gérard Huet 1994-2023
