Santali is characterised by a split into at least a northern and a southern dialect sphere, which differ in their phoneme inventories (Southern Santali has six phonemic vowels, in contrast with eight or nine in Northern Santali), in lexical items, and, to a certain degree, in morphology. Santali is recognised by linguists as being phonologically conservative within the Munda branch. Unlike many Munda languages whose vowel systems were restructured and reduced to five vowels, such as Mundari, Ho, and Kharia, Santali retains a larger system of eight phonemic cardinal vowels, which is very unusual in the South Asian linguistic area.[7][8] The language also uses vowel harmony processes in morphology and expressives, similar to Ho and Mundari.[9] Morphosyntactically, Santali, together with Sora, is considered less restructured than other Munda languages, having undergone less influence from Indo-Aryan and Dravidian languages.[10] Clause structure is topic-prominent by default.[11]
The Santals call themselves hɔɽ (lit. 'man') and their language hɔɽ rɔɽ ("language of the Santals"). It is also referred to as mãjhi bhasa ("language of the Majhis"), and the Santals, when asked about their caste, sometimes call themselves maɲjhi or mãjhi ("village headman", "chief").[12] In North Bengal, the language is known as jaŋli or pahaɽia. In Bihar it is called parsi ("foreign"). The name Santal, in turn, was derived from Sāmanta-pāla ('dwellers of the frontiers') and was used by Bengalis to refer to the Santals. L.O. Skrefsrud assumed that Santal was derived from Sãot, the name of a place in the Midnapore region of West Bengal where the Santals were supposed to have settled in remote antiquity.[13] In Nepal, the Santali language is known as Satar.[14]
Santali remained non-literary until the mid-1800s, when European interest in the languages of India led to the first efforts to document it. The language was initially recorded using the Latin alphabet, then Bengali, Devanagari, and Odia, by European-American anthropologists, folklorists, and missionaries such as Jeremiah Phillips, A. R. Campbell, Lars Skrefsrud, and Paul Bodding.[16] Their work resulted in Santali dictionaries, collections of folk tales, and studies of the language’s morphology, syntax, and phonetics. By the late 19th century, several Santal intellectuals had begun to use several writing systems to compose books, stories, and poems in their language. The first Santali weekly magazine in the Latin alphabet, the Pera Ho̠ṛ, was established in 1922, followed by the Marshal Tabon (1946), the Bihar-run Devanagari Ho̠ṛ So̠mbad (1947), the Bengali Pachim Bangla (1956), and the Jug Siriro̠l (since 1971) in Latin. There are two Bangladesh-based Santali monthly magazines, Aboak’ kurumu Tureak’ Kurai and GoDet’, both written in Bengali script and published from Rangpur and Dhaka, respectively.[16]
In 1922, Sadhu Ramchand Murmu from Jhargram district of West Bengal attempted to create a Santali script called Monj Dander Ank, but it did not gain popularity. Later, in 1925, Raghunath Murmu from Mayurbhanj district of Odisha developed the Ol Chiki script, which was first publicised in 1939 and eventually became widely adopted.[17][18] The Ol Chiki script is now considered the official script for Santali literature and language across West Bengal, Odisha, and Jharkhand.[19][20] However, users in Bangladesh use the Bengali script instead.
Santali was included in the Eighth Schedule to the Constitution of India for official recognition as a scheduled language in 2003 through the 92nd Amendment Act, granting it the right to be used in government communication, education, and competitive examinations.[21] In December 2013, the UGC, the higher education regulatory body of India, introduced Santali as a subject in the National Eligibility Test (NET), enabling its use for lectureship and as a medium of instruction in colleges and universities.[22]
Santali is one of India's 22 scheduled languages.[6] It is also recognised as an additional official language of the states of Jharkhand and West Bengal.[29][30]
According to observation by Ghosh, "In the lexicon SS (Southern Santali) and NS (Northern Santali) are somewhat different, initiated by borrowing from the neighbouring languages. The local borrowings in the two dialects are so high that sometimes one appears to be unintelligible to the other. In certain cases the usage is also different."[16]
Santali has 21 consonants, not counting the 10 aspirated stops, which occur primarily, but not exclusively, in Indo-Aryan loanwords and are given in parentheses in the table below.[35]
In native words, the opposition between voiceless and voiced stops is neutralised in word-final position. A typical Munda feature is that word-final stops are "checked", i.e. glottalised and unreleased.
Bodding (1929) noted that at the hiatus between an open syllable and a following syllable that starts with a vowel, if both vowels are of the same height, an approximant is inserted: /w/ between two low vowels, and /j/ between mid-high and high vowels.
The Southern Santali dialect (Singhbhum) features a smaller inventory of six vowels /a, i, e, o, u, ə/.[36][37]
There are numerous diphthongs and triphthongs. Larger vowel sequences can be found, e.g. kɔeaeae, meaning 'he will ask for him', with six consecutive vowels.[38]
Note that in the level diphthongs /ea, ia, io, iu, oa, ua/, semivowels /w, j/ are usually inserted in between and dissolve the diphthong into two syllables when realised.[39]
Santali prosody exhibits iambic patterns: stress falls on the second syllable in most disyllabic words, except in loanwords from Hindi, Bihari, Bengali and Assamese. In trisyllabic words, a process called V2 deletion actively drops the second vowel, turning the underlying trisyllable into a disyllable consisting of two heavy syllables. Even so, stress consistently falls on the second syllable, e.g. hapaɽam ('ancestor') → hapˈɽám.[40][10]
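The V2 deletion process described above can be illustrated with a minimal sketch. It assumes a simplified transcription in which every vowel letter counts as one syllable nucleus (so diphthongs and nasalisation marks are not handled), and it ignores stress marking. The function name `v2_deletion` and the vowel set are illustrative choices, not part of any published analysis.

```python
# Simplified vowel inventory used only for this illustration.
VOWELS = set("aeiouəɛɔ")

def v2_deletion(word: str) -> str:
    """Drop the second vowel of a word that has exactly three vowels.

    Words with any other number of vowels are returned unchanged.
    """
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) != 3:
        return word
    second = vowel_positions[1]
    return word[:second] + word[second + 1:]

print(v2_deletion("hapaɽam"))  # hapɽam
```

The output matches the segmental shape of the example in the text; stress placement would have to be modelled separately.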
As in all Kherwarian languages, vowel harmony in Santali is a morphologically triggered process.[41] In morphology and word formation, Santali uses a vowel harmony system based on vowel height. There are certain restrictions on a vowel-harmonic sequence:[9]
1). /e/ and /o/ never co-occur with /u/ in the same stress unit (a word together with its affixes and enclitics).
2). /ɛ/ and /ɔ/ never co-occur with /e/ and /o/. Thus, some suffixes and enclitics may have two variants, such as the instrumental suffix -tɛ, whose vowel is raised to /e/, giving [-te]. Note that this only occurs with weak (harmonic) syllables and suffixes, not with others. Further examples: ɛɽɛ=e → [ɛɽɛ=jɛ] (lie=3), ɛgɛr ("to scold"), gɔʈɛn ("part"), mɛrɔm ("goat"), ɛhɔp ("to begin").
3). Syllables with /i/ and /u/ only co-occur with /ə/, but not /a/, e.g. busək ("to give birth"), bidə ("to dismiss"), əgu ("to bring").
4). Only /a/ can co-occur with /e o ɛ ɔ/, while /ə/ cannot, e.g. boŋga ("evil spirit"), sadɔm ("horse"), hako ("fish"), mare ("ancient").
5). /e/ may alternate with /i/ if the preceding syllable ends in /u/ or /ə/.
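Restrictions 1 to 4 above are co-occurrence constraints, so they can be checked mechanically. The following is a rough sketch that treats them as categorical and applies them to a whole word rather than to a stress unit, which is an assumption of this illustration; real data may contain exceptions. The function name `harmony_violations` is hypothetical. Restriction 5 describes an alternation rather than a co-occurrence ban and is not modelled.

```python
import unicodedata

# Simplified vowel inventory used only for this illustration.
VOWELS = set("aeiouəɛɔ")

def harmony_violations(word: str) -> list[str]:
    """List which of restrictions 1-4 a word's vowel set would break."""
    # Decompose so that nasalised vowels count as their base vowel.
    decomposed = unicodedata.normalize("NFD", word)
    present = {ch for ch in decomposed if ch in VOWELS}
    problems = []
    if present & {"e", "o"} and "u" in present:
        problems.append("1: /e o/ with /u/")
    if present & {"ɛ", "ɔ"} and present & {"e", "o"}:
        problems.append("2: /ɛ ɔ/ with /e o/")
    if present & {"i", "u"} and "a" in present:
        problems.append("3: /i u/ with /a/")
    if "ə" in present and present & {"e", "o", "ɛ", "ɔ"}:
        problems.append("4: /ə/ with /e o ɛ ɔ/")
    return problems

# Every example word cited in the list passes the check.
for w in ["ɛgɛr", "gɔʈɛn", "mɛrɔm", "busək", "bidə", "əgu", "boŋga", "sadɔm", "hako", "mare"]:
    assert harmony_violations(w) == []
```

A made-up string such as "kute" (not a Santali word) would be flagged under restriction 1, which shows how the check reports a violation.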
Santali, like all Munda languages, is a suffixing agglutinating language. Whether Santali and related languages such as Mundari and Kherwarian lects have recognizable parts of speech (verbs, nouns, adjectives, etc.) remains a subject of intense linguistic debate. Traditional grammatical descriptions often treat lexemes that take case marking in a syntactic unit as parts of the nominal system, and those that take TAM/Person/Number marking as verbal. However, deeper analyses by Neukom (2001), Hengeveld & Rijkhoff (2005), Peterson (2005), and Rau (2013) suggest that Santali is in fact a flexible language; that is, its lexemes are inherently underspecified for lexical category and can function in referential ("noun"), predicative ("verb"), or attributive ("modifier") roles. Evans & Osada (2005) and Croft (2005), by contrast, argue that the Kherwarian languages do possess defined, albeit fluid, word classes. Currently, the Oxford Handbook of Word Classes (2023) rates Santali as a Type I Flexible language.[42]
Santali has possessive suffixes which are only used with kinship terms: 1st person -ɲ, 2nd person -m, 3rd person -t. The suffixes do not distinguish possessor number.[46]
True gender distinction marking on nominals and verbs (as in Sanskrit, Hindi, and other Indo-Aryan and Dravidian languages) does not exist in Santali. Native peripheral markers such as the genitive, locative markers, and nominalizers can be used to distinguish between animate and inanimate noun classes. For lexicalized gender distinction, there are several ways to mark the contrast between female and male:
- Morphologically marked modifiers borrowed from Indo-Aryan, such as -i for feminine and -a for masculine, are found in certain lexemes:[48]
kuɽa ("boy") – kuɽi ("girl")
bhola ("dog") – bholi ("bitch")
mama ("maternal uncle") – məni ("maternal aunt")
caɖra ("bald man") – cəɖri ("bald woman")
bheɖa ("ram") – bheɖi ("sheep")
- Sex-based gender lexemes. These words are inherently gendered and cannot be inflected for gender, unlike the words listed above.[49]
dʒãwaj "husband" – bəhu "wife"
bɔeha "brother" – misɛra "sister"
ənɖiə "ox" – gəi "cow"
kaɖa "male buffalo" – bitkil "female buffalo"
- Compounded sex-based gender. The head noun is compounded with a gender-denoting modifying word. Masculine compounds take ənɖiə, sanɖi, pɛ̄ʈhar, or kuɖu, and feminine ones take ɛŋga, bətʃhi, or pəʈhi.
The demonstratives distinguish three degrees of deixis (proximate, distal, remote) and simple ('this', 'that', etc.) and particular ('just this', 'just that') forms.[52]
The numerals are used with numeral classifiers. Distributive numerals are formed by reduplicating the first consonant and vowel, e.g. babar 'two each'.
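The distributive pattern just described (copy the first consonant and vowel) can be sketched as follows. This is a minimal illustration that only handles stems beginning with one consonant letter followed by one vowel letter; how vowel-initial numerals behave is not stated in the text, so the sketch refuses them rather than guessing. The function name is hypothetical.

```python
# Simplified vowel inventory used only for this illustration.
VOWELS = set("aeiouəɛɔ")

def distributive(numeral: str) -> str:
    """Prefix a copy of the initial consonant + vowel to the numeral."""
    if len(numeral) < 2 or numeral[0] in VOWELS or numeral[1] not in VOWELS:
        raise ValueError("sketch only covers consonant+vowel-initial stems")
    return numeral[:2] + numeral

print(distributive("bar"))  # babar
```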
Numbers basically follow a base-10 pattern. Numbers from 11 to 19 are formed by addition, gel ('10') followed by the single-digit number (1 through 9). Multiples of ten are formed by multiplication: the single-digit number (2 through 9) is followed by gel ('10'). Some numbers are part of a base-20 number system. 20 can be bar gel or isi.
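The additive and multiplicative patterns above can be written out as a small sketch. Only gel ('10') and bar ('2', as in bar gel '20') are taken from the text; forms for the other digits are deliberately left for the caller to supply, and the base-20 alternative isi is not modelled. The names `compose_numeral` and `DIGITS` are illustrative.

```python
TEN = "gel"
# Only the digit form attested in the text; others must be supplied by the caller.
DIGITS = {2: "bar"}

def compose_numeral(n: int, digits: dict[int, str] = DIGITS) -> str:
    """Build 1-19 and the multiples of ten up to 90 on the base-10 pattern."""
    def digit(d: int) -> str:
        if d not in digits:
            raise ValueError(f"no form supplied for digit {d}")
        return digits[d]

    if 1 <= n <= 9:
        return digit(n)
    if n == 10:
        return TEN
    if 11 <= n <= 19:
        # Addition: ten followed by the single digit.
        return f"{TEN} {digit(n - 10)}"
    if 20 <= n <= 90 and n % 10 == 0:
        # Multiplication: the single digit followed by ten.
        return f"{digit(n // 10)} {TEN}"
    raise ValueError("number outside the patterns covered by this sketch")

print(compose_numeral(12))  # gel bar
print(compose_numeral(20))  # bar gel
```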
To derive new nominals, the stems of lexical verbs, adjectives, and other nouns can employ many different methods, including affixation, reduplication, and compounding.
Suffixation: Two nominalising suffixes, -itʃˀ for the animate and -akˀ for the inanimate noun class, are used to form referential nominals.[54]
Verbs → nouns: jɔm ('eat') > jɔmakˀ ('food')
Adjectives → nouns: nɔtɛ ('this side') > nɔtɛn ('belonging to this side') > nɔtɛnakˀ ('thing of this side') / nɔtɛnitʃˀ ('one of this side')
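The choice between the two nominalising suffixes can be shown as a short sketch. It simply concatenates the suffix to the stem, as in the examples above, and does not model any vowel harmony or other adjustment that might apply to other stems; the function name `nominalise` is hypothetical.

```python
ANIMATE_SUFFIX = "itʃˀ"
INANIMATE_SUFFIX = "akˀ"

def nominalise(stem: str, animate: bool) -> str:
    """Attach the animate or inanimate nominalising suffix to a stem."""
    return stem + (ANIMATE_SUFFIX if animate else INANIMATE_SUFFIX)

print(nominalise("jɔm", animate=False))    # jɔmakˀ
print(nominalise("nɔtɛn", animate=True))   # nɔtɛnitʃˀ
```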
Infixation is the most productive derivation method in Santali. The infixes -tV-, -nV-, -mV-, -ɽV-, and -pV- are often inserted into nouns, verbs, and adjectives to derive new words.[55]
ɛhɔp ('begin') > ɛtɔhɔp ('beginning')
rakap ('rise', 'ascend') > ranakap ('development')
Prefixation in North Munda has been reduced to a very few restricted exceptions.[56]
Verbs in Santali inflect for tense, aspect and mood, voice, and the person and number of the subject and sometimes of the object.[57] However, defining parts of speech in traditional linguistic terms, such as "verbs" and "nouns", in Jharkhandi Munda languages more generally (including most Kherwarian varieties and Kharia) is a highly controversial issue, since the evidence for discrete lexical categories like nouns, verbs, and adjectives is often extremely weak or even virtually absent, at least at the basic lexical level.[58] From this perspective, it may be nearly unfeasible to apply the conventional parts-of-speech framework to North Munda. A single element with apparently nominal semantics (which may be metonymic in nature) may function as the predicate base in one sentence (typically in clause-final position), while appearing elsewhere as an argument in the same phonological and morphological form, with zero derivation. In fact, predicates and their complements may be primarily defined by syntactic configurations rather than by inherent lexical categories. For further theoretical and empirical discussion of word classes in Mundari, see Evans & Osada (2005), Peterson (2005), Hengeveld & Rijkhoff (2005), and Croft (2005); for Kharia, see Peterson (2013).
Similarly, Santali has been described as a language with a regular degree of lexical flexibility.[59] Neukom (2001) posits that "nouns" do not exist in Santali, but that there are instead "flexible lexemes" that can function either as arguments (i.e. in a referential role) or as predicates within phrasal units, with no profound categorical distinction between these uses.[60] In everyday speech, Santali flexibility may show even more idiosyncrasies than those documented for Mundari. Rau (2013) provides attested examples showing that, within accepted usage, even proper names, which cross-linguistically are often treated as purely referential expressions denoting inherent properties, may frequently occur as predicates in Santali without eliciting objections.[61] For instance, the sentence unkin-dɔ Kaɽa ar Guja-wa-kin-a 'Their names were Kara and Guja' (lit. "they were Kara-and-Guja-ed") uses the second proper name directly as an active applicative predicate, while the first name precedes the conjunctive element, producing a distributive interpretation of the predication.[62]
Neukom (2001) further notes that almost any type of lexeme, including nominals, interrogatives, and indefinites, can function predicatively, but only in combination with either a light verb copula (kan "COP.IPFV" or tahɛ̃kan "COP.IMPREF") or an applicative suffix -a/-wa (often glossed as "for/to someone") plus the indicative/finite suffix. Together, these elements act as a compositional verbalising operator, yielding a structure that behaves like a nominal sentence.[63][64] Rau (2013) also notes that there are examples of zero-copula constructions.[65] A commonly cited property of lexically flexible languages is the absence or reduced productivity of lexical derivational mechanisms. While Ghosh (2008) shows that Santali does indeed possess a productive derivational system (see Derivation under Morphology), the extent to which derived forms participate in systematic, corpus-wide lexical flexibility in Santali has not yet been firmly established. For discussion of the flexibility of Southern Santali, see Dash (2025).
The Santali TAM system is very complicated. Tense-aspect and voice categories always fuse into an interlocked system consisting of a series of verbal subtemplates, so it is impossible to single out a morpheme that marks one TAM category on its own. TAM paradigms interact intricately with active and middle voice: active TAMs are unmarked, transitive, volitional, and outwardly directed, and are mostly employed in polyvalent predicates; middle TAMs are intransitive, self-directed, and avolitional, and are mostly found in monovalent predicates. There are two subtemplates, for the imperfective and the perfective. The two recognisable tense categories are non-past and past, and the past is further divided into two tenses: anterior and aorist. The imperative/prohibitive does not have any markers but possesses its own unique verbal templates.[66]
Applicative voice in Santali is expressed by adding the applicative marker -a- to four tenses (Future, Imperfective, Past 1, Perfect), with an additional and rare Past 2 tense in the case of inanimate objects. The active set serves polyvalent predicates, while the middle set marks monovalent ones.
Transitive verbs may form agreements with non-arguments/outside/indirect objects. To denote inalienable possession of the indirect object concerned, the prefix -t- is attached to the applicative forms of the pronouns; otherwise possession is marked in the noun phrase and functions as an attribute.
In specific contexts nowadays, Santali speakers increasingly use the pronominal duals as honorifics in a generalised sense, to show respect to addressees such as senior, highly regarded, or unfamiliar persons.[67][68]
Two verbs, mena ("to be") and hena ("to have"), have irregular templates. The subject pronominal marker, instead of being an enclitic form, appears as a suffix in the slot where the object marker would normally be placed.[69] All constructions involving these two verbs are conjugated in the middle voice to express existence, possession, and location.[64]
Two CLF chicken children be-MID.PRES-3PL.SUBJ-1PL.POSS-FIN
'We have two chicks.'
Santali mena seems to stem from a small number of originally middle, intransitive predicate bases that have an inverted pronominalized pattern. Some other inherently intransitive, low-agency, and non-volitional verbs such as rɛnɛtʃ ("be hungry") may display irregular behaviour similar to that of mena.
In Santali, as in the Kherwarian languages generally, the pronominal subject markers are mobile clitics that may encompass the whole clause. In most cases, except for the stems mena and hena mentioned above, the pronominal subject clitics have two placements: (1) attached to the word preceding the verb stem, or (2) encliticised to the final position of the verbal complex:
According to MacPhail (1957), (1) occurs more frequently than (2).[71]
In complex predicates, where the predicate consists of more than one lexeme, as in the glossed example below, the subject clitic follows indexation pattern (2), not pattern (1) as expected:[72]
we(PL.EXCL) TOP stupid ignorant foolish Bhuya IPFV.COP-IND=1PL.EXCL
'We are foolish, stupid, witless Bhuyas.'
The placement of the subject clitic can also distinguish the type of nominal sentence (sentence with a copula). In a predicational sentence, where the subject is referential and the complement is non-referential, the host of the clitic is the subject.[73]
Indexing of arguments in Santali is closely intertwined with the animacy of the arguments. The animate/inanimate distinction is not marked on nouns at all, but is conveyed through morphosyntax, such as the genitive and locative cases and verbal agreement. That is, if an argument of the verb does not belong to the animate noun class, the verb will not index that argument. Inanimate entities such as flowers, trees, rice, books, and food, and objects that cannot move by themselves, like vehicles (e.g. motorbike, car, aeroplane), are never indexed by the verb. However, there are some notable exceptions: inanimate objects that are significant ('sun', 'moon', 'star') or culturally important ('doll') are considered animate in Santali:
Likewise, 'Government' is also considered a single body of animate entities and is marked with the third person singular. Even mushrooms, a pricking thorn, puff-balls, and earwax are perceived as animate and are indexed by pronominal markers as such, showing the unpredictability of the Santali animacy-based indexation system.[75]
In negative formations, the negation particle may show indexation of an inanimate subject, while other Kherwarian languages suppress it.
As described by Ghosh (2008), there are no specific markers for the imperative series. However, in the affirmative imperative, the indicative/finite marker -a is replaced by second person markers. In the negative imperative, the verb (TAM/person syntagma) takes -a, while the imperative subject marker moves to the enclitic position after the negative particle, right before the verb (see Negation).
Any finite predicate takes -a, except in the imperative and in subordinate clauses. This suffix marks the predicate as indicative (real, default, narrative) in mood.[76]
There are two causative markers: a- and -otʃo. -otʃo attaches to every type of verb stem, while a- is restricted to two transitive verbs, jɔm ('eat') and ɲu ('drink').[77]
While the causative and the permissive share the same suffix -otʃo, the permissive differs in that an applicative marker is combined with the causative morpheme, shifting the person concerned from the accusative to the dative position.
The infix -pV- gives transitive and ditransitive verb roots a reciprocal meaning, but with many verbs it also conveys that the action is done together by two participants.[78]
The benefactive for transitive and ditransitive stems is -ka in the Northern Santali dialect and -ka-k in Southern Santali. In Southern Santali, if the object is animate, the final -k is replaced by pronominal clitics. All benefactive stems are conjugated with active TAM markers.[78]
Transitive roots, transitive-intransitive roots, and causative stems take -ok to derive passive stems. With transitive-intransitive roots, it denotes the prominence of transitivity. Attaching it to transitive verbs may also create reflexivity.[79]
ɲɛl ('see') > ɲɛlok ('be seen') (passive)
ranotʃo ('cause to medicate') > ranotʃok ('be caused to medicate') (causative > passive)
mak ('cut') > makok ('cut oneself') (reflexive)
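The three forms above suggest a simple surface pattern for -ok, sketched below. The rule that a stem-final o merges with the suffix vowel is inferred only from ranotʃo > ranotʃok and is an assumption of this illustration, not a documented rule; stems ending in other vowels are not covered by the examples and are handled here by plain concatenation. The function name is hypothetical.

```python
def passive_stem(stem: str) -> str:
    """Derive an -ok stem, merging a stem-final o with the suffix vowel."""
    if stem.endswith("o"):
        # Assumed merger, inferred from ranotʃo > ranotʃok.
        return stem + "k"
    return stem + "ok"

for base in ["ɲɛl", "ranotʃo", "mak"]:
    print(base, ">", passive_stem(base))
```

Whether the resulting stem reads as passive or reflexive depends on the verb, as the glosses above show, so the sketch only produces the form.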
The intransitive applicative TAM set is also interpreted as expressing reflexivity and used to emphasise the action directed toward the subject themselves.
In everyday speech, nominal roots can be found functioning as verbs with appropriate inflection. The verbalisation of nominals extends to interrogatives and indefinites. Adjectives that are derived from nominals can also take inflection as well as person indexation. It is said that virtually every entity-denoting lexeme is capable of functioning in the predicative role in Santali.[64]
In example (1), the "verbalized" predicate structure of the lexeme ɔdʒɔn bears the same semantics as the free lexeme itself, with an additional applicative ('to give', dative) sense. Sentence (2), with a middle TAM suffix, shows similar semantic regularity, producing an inchoative meaning, 'to become X' (where X is entity-, state-, or property-denoting). Sentence (3) exemplifies an active TAM-suffixed predicate using a "noun-like" lexeme ʈuər ("orphan") as the semantic base, which brings about a subtle shift to a causative sense, 'to make X / make someone be X', but the semantics is still mostly uniform (orphan–motherless).[81]
Similar "verbalization/recategorization" via zero derivation can occur in English (e.g. gun–gunned "get shot by gunfire", ice–iced "become ice", empty–emptied "become empty, make something empty").[82] However, English has both idiosyncratic verbalization (unpredictable semantic outcome) and compositional verbalization (predictable semantic outcome),[83] while Santali verbalization displays extreme regularity and predictability: the derived predicates correspond directly in meaning to their nominal counterparts, with very few idiosyncrasies.[61][84]
that.far.INAN tree this.INAN tree-PL-ABL TOP big-FOC-IND
'That tree is bigger than this tree.'
The existence of an independent adjective class in Santali is invalidated by sentences (5), (6), (7), since these adjective-like lexemes can occur in predicate position, take TAM/Person/Number and semantically/syntactically behave like the aforementioned examples (1), (2), (3).[56]
Furthermore, mimetic sounds, such as ãã (animal groan) (8), complex units, such as the postpositional phrase kombɽo tuluj "with thieves" (9), and even proper names (10) can function as the semantic bases of predicates. The examples below provide a compelling argument against the analysis of flexibility as a lexical derivational process by Evans & Osada (2005).[72] This perspective on "verbalization" supports the implication that flexibility, rather than being a linguistic anomaly, is in fact the nature of the language itself.[85][a]
uni buɖhi-ren hɔpɔn-tɛtˀ koɽa-wakˀ ɲutum=dɔ Turtə-wa-e-a
that old.woman-GEN.AN son-3.POSS boy-NMLZ.INAN name=TOP Turta-APPL-3SG.OBJ-IND
'The old woman's son's name was Turta.' (lit. 'That woman's son's name was Turta-ed')
In the case of proper names, when an active applicative suffix is applied, it expresses that x is caused to be the individual named N, which translates into 'being called N'. In the nonpast active form, the construction ascribes to the subject the (temporal) property of being the individual named N.[87]
Two or more verbs and modifiers can combine together to derive a compound verb. Normally they are combinations of two transitive verbs or two intransitive verbs and limited numbers of transitive+intransitive and intransitive+transitive combinations.[80]
Complex predicates are pervasive in Munda clause structure. Simple verbs like 'go', 'become', 'finish', 'come', and 'try' are often employed as auxiliary verbs (v2 in South Asian linguistics) to add or reinforce modality, aktionsart, and orientation in the predicate. In Santali, there are univerbated auxiliary constructions marking many functions. In the example below, the verb gɔt ("pluck") is often used as an auxiliary verb to denote telicity, that is, a quick, sudden, or intense action.[88] Santali AVCs exhibit a split-doubled pattern: the lexical verb may index the object argument, and the auxiliary verb may index the subject argument.[89]
'You guys suddenly caught sight of him/her' or 'You guys saw him/her off/said good-bye to him/her.'
Some auxiliary constructions may exhibit the behaviour of compound verbs. The two most commonly used auxiliary verbs in Santali are daɽe ("can") and lega ("try"). The first is often combined with an active applicative suffix, while the latter is mostly found with the middle TAMs.[90]
There are three particles in Santali used to express negation: baŋ, ɔhɔ and alo. baŋ and ba (its shortened form) are the negatives for interrogative and declarative sentences; ɔhɔ is the emphatic negative of declarative sentences; alo is the prohibitive negative in the imperative. These negation particles take the subject marker away from the verb.[91]
In existential/locative copular formations, negation is different in present tense and past tense. Below is the chart of negative, non-past, fully finite existential/locative copula paradigm:[92]
Negated, non-past, fully finite copular structure (forms given as singular / dual / plural)
1st person exclusive: bən-ug-iɲ=a / ban-uʔ-liɲ=a / ban-uʔ-le=a
1st person inclusive: (no singular) / ban-uʔ-laŋ=a / ban-uʔ-bon=a
2nd person: ban-uʔ-m=a / ban-uʔ-ben=a / ban-uʔ-pe=a
3rd person animate: ban-ug-itʃˀ=a / ban-uʔ-kin=a / ban-uʔ-ko=wa
3rd person inanimate: ban-uʔ=wa
In negative past copular constructions, the negation particle ban encodes the subject, and the past tense is indicated by the separate copula taheken.[93]
Expressives can arguably be justified as an independent lexical category in Santali. Echo-word formation can be constructed by three processes: (1) generating a masdar in an identical form; (2) augmenting a consonant in the repeated element; (3) vowel mutation. Sometimes masdars co-occur with vowel mutation. Expressives can express highly detailed semantics and cannot behave like nominals or predicates.[90]
(1) masdars. These expressives are formed by simply reduplicating the first element.
ahal ahal "distressed"
atrɔm atrɔm "incompletely"
baɖgɔˀt baɖgɔˀt "rough"
datʃaŋ datʃaŋ "ubiquitous"
halaˀt halaˀt "slightly"
kãˀtʃ kãˀtʃ "whine as a dog"
adʒaˀk adʒaˀk "clamour for"
baɖgaˀk baɖgaˀk "sharp painful sensation"
tʃəɖuˀk tʃəɖuˀk "noise of pumping into water"
gab gab "sink deeply"
dʒeleˀp dʒeleˀp "flashing"
məkur məkur "sound of crunching"
(2) (∅VX CVX) masdars with augmenting a consonant
əbuˀk tʃəbuˀk "here and there"
abɛ tabɛ "just at the time of"
adha padha "unfinished"
əɖəi bəɖəi "arrogant"
albaʈ salbaʈ "contradictory"
(3) (∅V1X CV2X) with vowel mutation
adha padhə "half"
agaɽ bigəɽ "topsy turvy"
əhir kuhir "fix the eyes upon"
ə̃iʈhə̃ dʒithə̃ "leavings of food"
əril kuril "stare as smoke nips the eyes"
The initial and medial consonants of the first element may be alternated in masdars.[94]
kadar kapar "rubbish"
hadraˀk gasraˀk "stumblingly"
The Santals categorize expressives as a form of "twisted speech" (benta katha), a discourse mode characterized by profound metaphorical depth.[95] These items occupy a central role in Santali daily communication and cultural life. Expressives are especially prevalent in performance traditions, including music, storytelling, folktales, and poetry, with an extensive presence in oral performance genres.[96]
Example: noa əɖi maraŋ bir (that very big forest) "very big forest"
Santali person indexation clearly follows nominative-accusative alignment: the subject pronominal clitic agrees with the person/number of the nominative argument; the object pronominal infix agrees with the person/number of the accusative argument.[64] But there is no marking on NPs whatsoever to show their relation:
Thus, word order may be used to determine which constituent among the non-verbal elements is the subject argument and which is the accusative/object argument. Usually, the unmarked word order is SOV. However, Santali word order is highly influenced by context, discourse, and pragmatics. If the S/A is considered less topical than the O/P, the word order is reduced to OV. The sentence is reduced further if no argument is deemed topical.[11] One could then argue that the pronominal clitics representing the arguments should perhaps be considered the arguments themselves.
The default word order of an intransitive, monovalent sentence is SV, though it can be reduced if the subject is not a matter of topic or focus.
used in conditional sentences to introduce the apodosis, in which the protasis is supposed not to have been realized, and therefore, the apodosis would not have occurred.
unkin dɔ din-ge əɖi kurumuʈu=kin kəmi-ja mɛnkhan tʃheka-katɛ=je mitˀ din uni hɔɽ-rɛn orakˀ boŋga dɔ bɔhɔkˀ latʃˀ haso ɲam-ke-d-e-a
They.DU TOP day-FOC very diligently=3DU.SUBJ work-IND but how-CONV=3SG.SUBJ one day s/he man-GEN house goddess TOP head stomach pain get-ACT.AOR-TR-3SG.OBJ-IND
'They two work very hard, but one day for unknown reason the man's wife was affected by pain in stomach and head.' (They → the man's wife)
In subordinate clauses, the converb katɛ, ablative khɔn, place marker ʈhɛn, temporal khan, and purposive jɛmon are available to link the subordinate clauses with the narrative clauses.[101]
The indefinite pronouns jãhã ("any") and jãhãe ("anyone") are used to link relative clauses. The choice between them depends primarily on the semantics and animacy of the referent.[102]
any tree-LOC=2SG.SUBJ climb.AUGM.PASS.MID-MID.ANT-IND that-LOC one CLF honey comb be-MID.PRES-IND
'There is a honey-comb in the tree which you climbed.'
Pronouns, interrogative pronouns, and the correlative particles jodi ("if"), tahle ("then"), tobe ("then"), dʒɛmɔn ("as"), and tɛmɔn ("so") are used to form correlatives in both the main and attributive clauses.[103]
'Chando fulfilled my wish as I had asked.' (~ As I had said, so Chando fulfilled my wish)
Combining uses of indefinite pronouns with demonstratives/locatives likejãhã:ona,jãhãe:uni/onko, andjãhã:on-rɛ likewise can also be considered correlative conjunctions.[103]
In daily conversations, Santali speakers generally employ high percentages of words of native Austroasiatic/Munda/Santali origins, compared to other Munda languages such asKharia andJuang. The loan strata, mostly borrowed fromHindi (eg.rəskə "joy" < Hindirasika,haʈ "market" < Hindihāʈ,kagodʒ/kagotʃ "paper" <Persiankāgaz via Indo-Aryan,...) and regional languagesSadri (Eg.kuʈəm/kutɨsi "hammer" < Sadrikuʈasi),Khortha,Angika,Maithili,Assamese,Bengali (Eg.rəs "heap" < Banglaraʃi,bhəgnə "nephew" < Banglabhagina/bhagna),Nepali,Oriya and even English may account for almost 20% of the lexemes of daily needs. Younger generation who have opportunities to engage in higher education tend to be more accustomed with lexical influence from neighbouring languages as well as English.[104] A good number of words seem to be derived from earlier stages of Indo-Aryan (eitherVedic Sanskrit dating from 1,500-1,000 BCE,Classical Sanskrit ~500 BCE, orMiddle Indo-Aryan) are also found, such asdatlom "sickle" < Vedic/Classical Sanskritd̪at̪ra-m "sickle-SG.N.NOM/ACC" (cf.Palid̪at̪t̪a "sickle",Bengali দাd̪a "blade"). Santali also is the source of borrowings by several regional Indo-Aryan languages, namelySadri,Khortha andKurmali. Eg. Khorthagidʌr "child" < Santaligidrə, Kurmalinisʈai "exactly, truly" < Santaliniʈsahi, et cetera.[105]
A limited number of words are shared between Kuṛux and Santali, e.g. Kuṛux kʰotā "nest" and Santali tukə "nest", Kux. ura "beetle" and Sat. uru "beetle", Kux. busū "straw" and Sat. busupˀ "straw", but they are difficult to analyze because their cognates also appear across other Munda and Indo-Aryan languages.[106] A very few lexical items appear to be shared between Munda and Tibeto-Burman and likely represent the remaining traces of earlier contact between the two groups.[107] Examples include Tshobdun snəm "oil" and Santali sunum "oil", Limbu pɛːr "to fly" and Sat. apir "to fly", and Lepcha pok "to throw" and Sat. tapaʔ "to throw".
As for the Austroasiatic lexicon, most Santali terms share their origins with words in other Austroasiatic languages, including words with aspirated phonemes. For example:
^ Note that flexibility is mostly a North Munda/Jharkhandi phenomenon. By comparison, South Munda languages such as Remo, Sora, and Gorum exhibit much less flexibility than North Munda and Kharia.[86] For instance, modifiers (i.e. "adjectives") cannot take TAM/person marking and must (in some languages, optionally) be accompanied by copula verbs in predicational sentences: 1). Remo
^ Kobayashi, Masato; Osada, Toshiki; Murmu, Ganesh (2003). "Report on a Preliminary Survey of the Dialects of Kherwarian Languages". Journal of Asian and African Studies. 66: 331–364.
^ "Santali". The Department of Linguistics, Max Planck Institute (Leipzig, Germany). 2001. Archived from the original on 1 December 2017. Retrieved 27 November 2017.
^ Choksi, Nishaant (2021). "Structure, Ideology, Distribution: The Dual as Honorific in Santali". Linguistic Anthropology. 31 (3): 382–395. doi:10.1111/jola.12343.
^ Sidwell, Paul (2024). "500 Proto Austroasiatic Etyma: Version 1.0". Journal of the Southeast Asian Linguistics Society. 17 (1): i–xxxiii. hdl:10524/52519.
Hansda, Kali Charan (2015). Fundamental of Santhal Language. Sambalpur.
Hembram, P. C. (2002). Santali, a natural language. New Delhi: U. Hembram.
Newberry, J. (2000). North Munda dialects: Mundari, Santali, Bhumia. Victoria, B.C.: J. Newberry. ISBN 0-921599-68-4.
Mitra, P. C. (1988). Santali, the base of world languages. Calcutta: Firma KLM.
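Zograf, G. A. (1960/1990). Языки Южной Азии [Languages of South Asia]. Moscow: Nauka (1st ed., 1960).
Lekomtsev, Yu. K. (1968). Некоторые характерные черты сантальского предложения [Some characteristic features of the Santali sentence]. In Языки Индии, Пакистана, Непала и Цейлона: материалы научной конференции [Languages of India, Pakistan, Nepal and Ceylon: proceedings of a scientific conference]. Moscow: Nauka, 311–321.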
Maspero, Henri (1952). Les langues mounda. In Meillet A., Cohen M. (dir.), Les langues du monde. Paris: CNRS.
Neukom, Lukas (2001). Santali. München: LINCOM Europa.
Pinnow, Heinz-Jürgen (1966). A comparative study of the verb in the Munda languages. In Zide, Norman H. (ed.), Studies in comparative Austroasiatic linguistics. London, The Hague, Paris: Mouton, 96–193.
Sakuntala De (2011). Santali: a linguistic study. Memoir (Anthropological Survey of India). Kolkata: Anthropological Survey of India, Govt. of India.
Vermeer, Hans J. (1969). Untersuchungen zum Bau zentral-süd-asiatischer Sprachen (ein Beitrag zur Sprachbundfrage). Heidelberg: J. Groos.
2006-d. Santali. In E. K. Brown (ed.) Encyclopedia of Languages and Linguistics. Oxford: Elsevier Press.
Peterson, John M. (2005). "There's a grain of truth in every "myth", or, Why the discussion of lexical classes in Mundari isn't quite over yet". Linguistic Typology. 9 (3): 391–405.
Hengeveld, Kees; Rijkhoff, Jan (2005). "Mundari as a Flexible Language". Linguistic Typology. 9 (3): 406–431.
Croft, William (2005). "Word classes, parts of speech, and syntactic argumentation". Linguistic Typology. 9 (3): 431–441.
Osada, Toshiki (1996). "Notes on the Proto-Kherwarian vowel system". Indo-Iranian Journal. 39: 245–258. doi:10.1007/BF00161864.
Anderson, Gregory D. S. (2007). The Munda verb: typological perspectives. Trends in linguistics. Vol. 174. Berlin: Mouton de Gruyter. ISBN 978-3-11-018965-0.
Peterson, John M. (2013). "Parts of speech in Kharia: a formal account". In Rijkhoff, Jan; Lier, Eva Helena van (eds.). Flexible word classes: typological studies of underspecified parts of speech (1 ed.). Oxford: Oxford University Press. pp. 131–168. doi:10.1093/acprof:oso/9780199668441.003.0005. ISBN 978-0-19-966844-1.
Rau, Felix (2013). "Proper names, predicates, and the parts-of-speech system of Santali". In Rijkhoff, Jan; Lier, Eva Helena van (eds.). Flexible word classes: typological studies of underspecified parts of speech (1 ed.). Oxford: Oxford University Press. pp. 169–184. doi:10.1093/acprof:oso/9780199668441.003.0006. ISBN 978-0-19-966844-1.
Anderson, Gregory D. S. (2014). "Overview of the Munda languages". In Jenny, Mathias; Sidwell, Paul (eds.). The Handbook of Austroasiatic Languages. Leiden: Brill. pp. 364–414. doi:10.1163/9789004283572_006. ISBN 978-90-04-28295-7.
Anderson, Gregory D. S.; Jora, Bikram (2023). "A Typology of Grammatical, Local/Directional and Instrumental Markers in Kherwarian Languages". In Ring, Hiram; Sidwell, Paul (eds.). Papers from the Eighth International Conference on Austroasiatic Linguistics. JSEALS Special Publication No. 11. University of Hawai'i Press. pp. 1–14.
Anderson, Gregory D. S. (2020). "Proto-Munda Prosody, Morphotactics and Morphosyntax in South Asian and Austroasiatic Contexts". In Jenny, Mathias; Sidwell, Paul; Alves, Mark (eds.). Austroasiatic Syntax in Areal and Diachronic Perspective. Brill. pp. 157–197. doi:10.1163/9789004425606_008. ISBN 978-90-04-42560-6.
Anderson, Gregory D. S.; Jora, Bikram (2020). "Proto-Kherwarian Negation, TAM and Person-Indexing Interdependencies". In Jenny, Mathias; Sidwell, Paul; Alves, Mark (eds.). Austroasiatic Syntax in Areal and Diachronic Perspective. Brill. pp. 236–257. doi:10.1163/9789004425606_008.
Dilip, Mayuri; Kumar, Rajesh; V. Subbārāo, Kārumūri; Rao, G. Maheshwar; Everaert, Martin (2020). "Relative Clauses in Santali: A Matching Analysis Approach". In Jenny, Mathias; Sidwell, Paul; Alves, Mark (eds.). Austroasiatic Syntax in Areal and Diachronic Perspective. Brill. pp. 258–283. doi:10.1163/9789004425606_011.
Kisku, Sarada Prasad; Murmu, Ganesh; Choksi, Nishaant (2020). "Expressives in the Santali Poetry of Sadhu Ramchand Murmu". In Badenoch, Nathan; Choksi, Nishaant (eds.). Expressives in the South Asian Linguistic Area. Brill. pp. 223–236. doi:10.1163/9789004439153_011. ISBN 978-90-04-43915-3.
Subbarao, K. V.; Everaert, Martin (2021). "Agreement Reversal in Munda Languages: An Interplay of Functional/Thematic and Syntactic Criteria". In Mohan, Shailendra (ed.). Advances in Munda Linguistics. Cambridge Scholars Publishing. pp. 108–130. ISBN 978-1527570474.
Kobayashi, Masato (2021). "The Past Suffixes of Hill Korwa". In Mohan, Shailendra (ed.). Advances in Munda Linguistics. Cambridge Scholars Publishing. pp. 142–150. ISBN 978-1527570474.
Macphail, R. M. (1964). An Introduction to Santali, Parts I & II. Benagaria: The Santali Literature Board, Santali Christian Council.
Minegishi, M., & Murmu, G. (2001). Santali basic lexicon with grammatical notes. Tōkyō: Institute for the Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies. ISBN 4-87297-791-2.