
Wiktionary:Beer parlour/2007/September

From Wiktionary, the free dictionary

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

"Third-person" vs. "Third person"

Perhaps this is a silly question, but things like Category:Third person singular forms should be Category:Third-person singular forms (and subcategories, and similar categories for other people), shouldn't they? Our articles indicate third-person is the adjective form, while the third person is the noun, which makes sense, so the hyphenated form makes more sense in this context, no? Dmcdevit·t 06:05, 1 September 2007 (UTC)[reply]

I agree with the opinion that's probably intended, which is that the form "third-person" is for attributive use and the form "third person" is for other uses. (I don't think it's ever truly an adjective, and if it were, I think I'd apply the same difference-splitting rule.) So, I'm all for moving them. —Ruakh_TALK07:07, 1 September 2007 (UTC)[reply]

At the same time, I would also like to move Category:English third person singular forms to Category:Third-person singular forms (English) (or possibly something longer). When we're finally able to derive language from the language code, that's the format that I believe would be most supported. Stephen doesn't like the two- and three-letter codes because they're easy to mix up, and putting the language at front is ambiguous. Opinions? DAVilla 19:38, 1 September 2007 (UTC)[reply]

Why does putting the language at the front look ambiguous? I think the parenthetical just looks bad, and even "in English" at the end would be better. Also I do very much prefer putting the tense in the category, too. Take a look at some of the categories {{es-verbform}} spits out at User talk:Dmcdevit/Test. Dmcdevit·t 20:37, 1 September 2007 (UTC)[reply]

Variations such as "in English" are completely acceptable.DAVilla 04:40, 2 September 2007 (UTC)[reply]

Since the basic level of subdivision on Wiktionary is the language, it makes most sense to have "English" or "en:" at the front of any category specific to English. --EncycloPetey 01:45, 2 September 2007 (UTC)[reply]

To address both of you, of course it isn't a problem for parts of speech, but it doesn't generalize to the topical categories, where "English given names" can mean the given names used in England, and "Russian mountains" are the mountains of Russia, not the names of all mountains in Russian. Do "English:Given names" and "Russian:Mountains" stand out that much to you? It would be much clearer to say both "of Russia" and "in Russian". The two- and three-letter codes are not ambiguous because they associate with language specifically, but these are somewhat obscure. If that standard ever changes, it would be best to have the POS categories already match it. DAVilla 04:40, 2 September 2007 (UTC)[reply]

We've settled that issue, please don't exhume its corpse again. --EncycloPetey 04:53, 2 September 2007 (UTC)[reply]

I'm not suggesting that we change the topical categories. I'm saying that we should have a naming standard for parts of speech that is consistent with one for topical categories if the latter should ever change.DAVilla 05:04, 2 September 2007 (UTC)[reply]

It's a different topic entirely, though. If we ever change the names in the future, we will change all the categories that need changing at once, but there is no need to think about that here where we are just hyphenating the titles.Dmcdevit·t 05:19, 2 September 2007 (UTC)[reply]

It's not just about the hyphenation. The form-of templates themselves are being completely rewritten, as discussed above. DAVilla 05:24, 2 September 2007 (UTC)[reply]

In any case, this is the list of categories that need to be changed:

Could it be done with some sort of a bot?Dmcdevit·t 05:19, 2 September 2007 (UTC)[reply]

The categories are correct the way they are. Try searching Google Books for "third person singular". The form without the hyphen is overwhelmingly preferred. --Ptcamn 09:41, 3 September 2007 (UTC)[reply]

Not a fair statement, since "third person singular" is a noun phrase in its own right. The issue here is whether that should be hyphenated when it's used to modify another noun. --EncycloPetey 00:21, 5 September 2007 (UTC)[reply]

Surely that would be "third-person-singular" then?

But anyway, my point still stands:

1994: Merriam-Webster's Dictionary of English Usage
Although the lack of a common-gender third person singular pronoun has received much attention in recent years from those concerned with women's issues, [...]
1999: Mary E. Coffman Crocker, Schaum's Outline of French Grammar
Some verbs use the third person singular form of the present tense rather than the infinitive as the future stem.
2001: Ludovica Serratrice, "The emergence of verbal morphology and the lead-lag pattern issue in bilingual acquisition", in Trends in Bilingual Acquisition
Although at 2;3.17, 2;3.7, and 2;4.14 the third person singular present indicative inflection is used productively, it is not until 2;5.6 that the first contrasts emerge.
2002: Joanne Scheibman, Point of View and Grammar: Structural patterns of subjectivity in American English conversation
Third person singular subjects by tense and verb type
2002: Lieselotte Anderwald, Negation in Non-Standard British English: Gaps, Regularizations, and Asymmetries
It is also interesting to note that although there are areas where in't does not occur with a non-third person singular pronoun, although it occurs with a third person singular pronoun, the reverse is never the case.

--Ptcamn 09:28, 5 September 2007 (UTC)[reply]

sum of parts

If a phrase's meaning could be derived from the meanings of its components except that the phrase is used in a more general context than one or more of its parts, it is considered idiomatic. Examples: one part is obsolete whereas the phrase is merely dated; one part is dated whereas the phrase is not; one part is used only in a mathematics context whereas the phrase is not so restricted; one part is marked (chiefly British) whereas the phrase is not so marked.

What do you all think of the above passage (which is not taken from anywhere; I just wrote it)? —msh210℠ 06:36, 2 September 2007 (UTC)[reply]

(The other way around, where neither part is restricted but the whole is used in mathematical contexts, would already pass.) I like it, although I would add that it has to be clearly so, since we don't have criteria for obsolete and dated. Maybe archaic and dated, or obsolete and contemporary, would make a better case. Do you have any concrete examples? Ultimately it would depend on whether the community found those acceptable for your reasons.DAVilla 12:35, 3 September 2007 (UTC)[reply]

Concrete examples:

lo and behold: lo is marked (archaic) whereas the phrase is not, so is considered idiomatic.
Probably pleased as Punch: Punch (now redlinked) is a non-entry or, perhaps, (UK) whereas pleased as Punch is, I think, not just (UK), so is idiomatic.

—msh210℠18:14, 5 September 2007 (UTC)[reply]

I find it difficult to read meaning from the passage as written. --EncycloPetey 00:19, 5 September 2007 (UTC)[reply]

See examples above.—msh210℠18:14, 5 September 2007 (UTC)[reply]

FWIW, I don't recall ever having seen the "P" capitalized, before now, in pleased as punch. --Connel MacKenzie 19:00, 5 September 2007 (UTC)[reply]

I guess I'm not seeing the purpose behind the question. You've asked "What do you think?" without any context. It may be true that entries meeting the stated condition are idiomatic, but so what? And it doesn't mean that all idiomatic forms will meet the condition either. --EncycloPetey 03:49, 6 September 2007 (UTC)[reply]

The "What do you think?" meant two things. One, do you think that this accurately represents what is obviously part of the community's opinion on set phrases already? Two, if it's not obviously part of the community's opinion, then is it part of your opinion? —msh210℠ 17:03, 6 September 2007 (UTC)[reply]

I'm not sure. If a phrase would be completely sum-of-parts to someone who knew what all the parts meant, and the only issue is that one part's rarity makes it likely that someone wouldn't know what it meant, well then, it seems like having an entry for that part should suffice: you look up the part and the phrase makes sense. On the other hand, if you look up the part and see it marked(dated), you would naturally assume that the book uses dated terms, even if that's not actually true and the book is simply using a stock expression that contains an otherwise-dated term. —Ruakh_TALK04:12, 6 September 2007 (UTC)[reply]

Could that not be explained in usage note at the said dated term’s entry? –Something like: “Usage of this word by itself has declined considerably since [DATE]; however, it still sees frequent use in the set phrase.”?† Raifʻhār Doremítzwr 11:30, 6 September 2007 (UTC)[reply]

Circular definitions

Do you know of any good way to mark circular definitions, or pairs of definitions relying on each other? See for example walkway and passageway. Circeus 21:22, 2 September 2007 (UTC)[reply]

We don't have that as of yet. The best thing that I've found is to try and rewrite the circular definitions so that one of them has a distinct definition of its own. If you find any that you don't feel comfortable rewriting, feel free to leave a note on my talk page and I'll try and tackle it. --Neskaya ^talk 23:41, 2 September 2007 (UTC)[reply]

Or bring it up in the tearoom. If it were completely circular that's a more serious flaw, but this one wasn't too bad. You have to consider that people might know what one is and not the other, and by saying "X or Y-Z" you'd be helping the X crowd who find the Y-Z explanation cumbersome. DAVilla 12:40, 3 September 2007 (UTC)[reply]

Every single one of our definitions uses words that are defined elsewhere using different words that are defined elsewhere . . . . . In the end, ALL of our definitions are circular, it's just that some circles are bigger than others.SemperBlotto 13:29, 3 September 2007 (UTC)[reply]

Well, that's true to an extent: what you tend to have in dictionaries is term A is defined using term B, which is defined using term C, etc, and by the time you get to terms X, Y and Z (say) the terms are so basic that defining Z inevitably uses X and Y. I'm talking about words like of and the here (the latter being traditionally defined as "the definite article", and so defined in terms of itself). This cannot be avoided without recourse to a metalanguage, which English does not have. However, that is no reason for words near the beginning of the chain to be part of loops; no word should be defined as part of a chain of synonyms that ends up being a loop (A means B, which means C, which means D, which means A).

As for tracking these down, this would require a sophisticated bot or other software, and would be a mammoth task even then (in algorithmic terms, multiple traversals of an enormous graph). —Paul G 16:17, 3 September 2007 (UTC)[reply]

I was thinking of a template or category that could be used to track them, but maybe it's just me. Circeus 16:44, 3 September 2007 (UTC)[reply]

What we need is something like the old "Oracle of Bacon" that counted the "distance" between a celebrity and Kevin Bacon via movie connections. We could do the same thing with two entries in the same language, but looking at the direction both ways. If the directed distance between two articles is 1 in both directions, then it needs a serious looking at. Of course, such a tool would be complicated by the synonyms sections, where we want that kind of "circular" linking. --EncycloPetey 00:18, 5 September 2007 (UTC)[reply]

There are (or at least were) at least two "six degrees of Wikipedia" tools that find the distance between Wikipedia articles (see w:Wikipedia:Six degrees of Wikipedia#External links). I imagine that one of these could be modified to work with Wiktionary. It could (probably) be limited to definitions by only reading lines starting with #. A tool that output a list of articles that were less than, say, 3 degrees apart would help find circular definitions - there would be false positives though, e.g. "A meaning B; XYZ" and "B meaning A; XYZ". Thryduulf 11:24, 5 September 2007 (UTC)[reply]

Another thought on this is that perhaps it could list the longest and shortest chains between the definitions. IIRC the tools give up finding chains after following 10 links (to avoid situations where A is defined as B, B is defined as C, and C is defined as B) but if it is that long the chances are it won't be a problem with circular definitions. Thryduulf 02:35, 6 September 2007 (UTC)[reply]
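
For illustration only, the approach sketched in this thread (read only lines starting with #, treat links as directed edges, flag pairs at distance 1 in both directions, and give up after 10 links) might look roughly like the following Python. The entry texts in SAMPLE are invented for the example and are not the actual Wiktionary definitions; the function names are hypothetical and this is not any existing tool.

```python
import re
from collections import deque

# Matches the target of a wikilink such as [[walkway]] or [[walkway|path]].
LINK = re.compile(r"\[\[([^\]|#]+)")

def definition_links(wikitext):
    """Collect link targets from definition lines (lines starting with '#')."""
    links = set()
    for line in wikitext.splitlines():
        # Skip example and quotation lines, which start with '#:' or '#*'.
        if line.startswith("#") and not line.startswith(("#:", "#*")):
            links.update(target.strip() for target in LINK.findall(line))
    return links

def build_graph(entries):
    """Directed graph: entry -> entries linked from its definitions."""
    return {
        title: {t for t in definition_links(text) if t in entries and t != title}
        for title, text in entries.items()
    }

def mutual_pairs(graph):
    """Pairs whose directed distance is 1 in both directions."""
    return sorted((a, b) for a in graph for b in graph[a] if a < b and a in graph[b])

def shortest_cycle_length(graph, start, max_depth=10):
    """Breadth-first search for the shortest loop back to start (None if none found)."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:  # give up after max_depth links
            continue
        for nxt in graph[node]:
            if nxt == start:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

# Invented sample entries (assumption: not real Wiktionary text).
SAMPLE = {
    "walkway": "# A clearly defined path; a [[passageway]].",
    "passageway": "# A covered [[walkway]].\n#: ''He ran down the [[alpha]].''",
    "alpha": "# See [[beta]].",
    "beta": "# See [[gamma]].",
    "gamma": "# See [[alpha]].",
}

graph = build_graph(SAMPLE)
print(mutual_pairs(graph))
print(shortest_cycle_length(graph, "alpha"))
```

A real tool would also need to exclude synonym sections and handle the false positives mentioned above; this sketch ignores both.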

Template:blend

What's going on with Template:blend? This no longer seems to add terms to their correct alphabetical position in Category:Portmanteaus but bundles most terms in an unordered list at the end. —Paul G 16:08, 3 September 2007 (UTC)[reply]

Now fixed, though it might take a while to clear the edit queue. (It was sorting entries by {{lcfirst:{{{PAGENAME}}}}}, which is simply {{{PAGENAME}}}. Now it will sort entries by {{lcfirst:{{PAGENAME}}}}, which is whatever the name of the entry is, except with a lowercase letter first.) —Ruakh_TALK 16:47, 3 September 2007 (UTC)[reply]

Thank you. Muppet and Socceroo (which both have an initial capital) are still coming out under "M" and "S" rather than "m" and "s" respectively. —Paul G 16:10, 6 September 2007 (UTC)[reply]

Muppet isn't using {{blend}}; it is directly included in Category:Portmanteaus. Socceroo was missorted because it was explicitly included in Category:Portmanteaus with no sort key, overriding the inclusion done by {{blend}}. I've removed [[Category:Portmanteaus]] from Socceroo and given Muppet a sort key manually. Mike Dillon 16:17, 6 September 2007 (UTC)[reply]
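
As a rough illustration of the fix described in this thread, the Python below mimics the two sort keys. It assumes that the triple-braced {{{PAGENAME}}} was left as literal text (as the explanation above implies), so every entry received the same key, and it assumes a plain code-point ordering; the category software's real collation may differ. The function lcfirst here is a stand-in for the parser function, and the titles other than Muppet and Socceroo are picked only for the example.

```python
def lcfirst(text):
    """Lowercase only the first character, like the lcfirst parser function."""
    return text[:1].lower() + text[1:]

# Illustrative titles (assumption: not a listing of the actual category).
titles = ["Muppet", "brunch", "Socceroo", "motel"]

# Before the fix: the key was built from the literal text "{{{PAGENAME}}}",
# so every entry ended up with the same, unhelpful sort key.
broken_keys = {title: lcfirst("{{{PAGENAME}}}") for title in titles}

# After the fix: the key is the entry name with a lowercase first letter.
fixed_keys = {title: lcfirst(title) for title in titles}

# With no sort key at all, capitalized titles sort apart from lowercase ones.
by_codepoint = sorted(titles)
by_fixed_key = sorted(titles, key=lcfirst)

print(broken_keys)
print(by_codepoint)
print(by_fixed_key)
```

This also shows why Muppet and Socceroo stayed under "M" and "S" while they were categorized directly without a sort key.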

Blocks too excessive

Blocks are aplenty here; they are excessive in length compared to similar situations on other foundation wikis, and another difference is that they (usually, from what I've seen) come without any warning. This isn't very encouraging to the "openness" that this wiki and the foundation are built upon, and discourages new users who are testing something out from ever becoming valuable members of this community.

But I couldn't say it better than Anthere (who, I have just learned, is the chair of the WM Board of Trustees), who said the following (which I am copy/pasting from User talk:Cynewulf#Excessive block):

I personally find a block of one week for a rather stupid testing, with NO previous warning comment on the talk page of the person, very much against our spirit of openness. Yes, a block wears off, yes, they will live, but their experience with Wiktionary will be unpleasant. Imagine that the editor was a young guy, maybe just playing/experimenting in a rather stupid way. What will he do afterwards ? I am 100% sure that when he is unblocked, if still interested, he will transform himself in a bad editor, a troll, a bugger, a generally nasty person. Whereas, if you just let him a small gentle comment, saying "please, do not do that, you like the project ? Do not damage it on purpose", I am pretty sure there is a significant chance that the editor might turn into a good guy. Never neglect the force of being nice to people. Remember wikilove.

I really didn't mean to come here to be a pain in the ass, I just stopped by, clicked the recent changes, and saw what in my opinion were blocks that didn't need to be done, where a simple "hello, welcome, hope you can contribute" would have a greater effect than blocks.

As for "Stupidity" in the block reason list, well, I hope you'll change that too. ZJH 18:38, 4 September 2007 (UTC)[reply]

FWIW I don't see the need for "general stupidity" there either -- everything I come across is either "personal attack" or otherwise talking about specific people, "nonsense/spam", or "deleting info".

For block lengths, what do we want to accomplish? Make everybody happy? I'd be happy if vandalism stopped. Do we want to change people? I don't know how to do that. Do we want to have an effect on people? A fifteen minute block isn't going to have any effect, and neither is an infinite one. A three-day block will hurt an addict, but what about somebody who doesn't care? Should we be thinking about this in terms of hurting people? (probably not) Will {{test}} on their talk page suddenly make people nice? Call me a pessimist, but I doubt it -- meanwhile we'll have more stuff from them to clean up. Bah, all this hullaballoo makes me wonder why I bother with vandal patrol. Every time I revert "poop" out of something somebody gets on my case. Well, enjoy your poop. Cynewulf 19:00, 4 September 2007 (UTC)[reply]

Case study:[1] (warning, obscene content). I haven't blocked this one yet. What action would create "significant chance that the editor might turn into a good guy"? To me, this looks like a 13-year-old kid going to the page of one of his "friend"'s names and talking trash. The IP belongs to a K-12 school in Florida. We can wait until he grows up, right?Cynewulf 19:07, 4 September 2007 (UTC)[reply]

I tend to lean toward warnings and assuming good faith, but in this case, I support Cynewulf's choice to block the anonymous editor for a week. He or she was clearly not trying to help this project and a one-week block is an effective way to minimize subsequent garbage. Several factors make vandal patrol here more challenging than elsewhere (e.g., Wiktionary has orders of magnitude more entries than other WikiMedia projects and its entries are in hundreds of languages, which complicates distinguishing vandalism from testing from useful edits).Rod(A. Smith)19:21, 4 September 2007 (UTC)[reply]

There is a significant principle that each wiki project is independent. Wikipedia administrators are not given administrator privileges here, for many reasons. The culture conducive to building a usable dictionary is quite different from the "culture" over on Wikipedia. From Wiktionary's inception, it has had a much higher per-user vandalism level than any other WMF project. In theory, a dictionary definition, being brief, is a perfect outlet for a short "GOSH IS GAY" type entry. Wikipedia's permissiveness may be changing that by attracting more vandalism onto itself. Their "culture" is so troll-friendly now, little productive encyclopedic work is done anymore. What is it these days; 60% of Wikipedia traffic is entertainment related? The hard stance that Wiktionary has had (since long before I was an administrator here,) has helped to nip that in the bud. Yes, we get "normal" vandalism levels now, but I think Anthere has an enormous hill to climb with her argument, that over-permissiveness would somehow be beneficial to this project. Her disregard for the separate cultures of each member project is very disconcerting...it is as if she thinks all projects are Wikipedia? --Connel MacKenzie 19:27, 4 September 2007 (UTC)[reply]

I don't know anything about Wikipedia's vandalism level, but their administrator count is 1309, compared to 52 total here (and only what, three who do large amounts of vandal patrol?), and 55 total (27 active) on Wikinews (whence ZJH). We don't have the manpower for much red tape. Cynewulf 19:38, 4 September 2007 (UTC)[reply]

No, I do not think that all projects are Wikipedia. And I also know that even within one project, the languages do not all share the same needs, habits and rules. I still think that openness is one of the few values we have in common. Correct me if I am wrong on this. The day Wiktionary decides to restrict editing to those only between 25 to 35 and under real name, then all projects will have to discuss if we want to stick together or travel each in its direction. I also think that attitudes such as trust and discussion are also among the few values we have in common. Again, if Wiktionary decides one day that only those with a university diploma and having shown their identity card are allowed to edit, then again, we'll have to discuss whether Wiktionary wants to go its own way. Last, I think that we share as a value the fact that the entire community is allowed to participate in the discussion over rules. Seeing Cynewulf's comment, I am suddenly wondering if rules are open to discussion with the community or only restricted to administrators' input. Do I exaggerate? Perhaps a bit. But the issue is not about disregarding separate cultures of each project. The issue is that we share some common values and each of us implements these values the way we see best. So, *my* comment will be that I do not find it a good idea to block someone for one week, for the first offense, without at least one warning. But I can live with that. What I find much more problematic is that when someone makes a comment to express a disagreement, he is answered "Interfering with administrators carrying out vandalism cleanup tasks is not recommended". Again, I may be wrong, but I like to think that on our projects (all of them), administrators are not using intimidation as part of their tactics to scare away newbies. Anthere

Anthere, I love you, what you do for WMF and your crazy idealism, but we aren't talking about a potential good contributor here; we're talking about a known-bad contributor, and a subsequent (unwarranted) accusation from User:ZJH. User:Cynewulf suggested the specific complaint be redirected from his talk page, to here, the centralized discussion area. While his tone may have been far from perfect, I think it is unfair to harp on one sentence of it. Better, instead perhaps, to harp on the ridiculous tone that started it, from User:ZJH. Would I have immediately gotten defensive in the same situation? Wouldn't you?

The Wiktionary WT:VOTE process is very much open to all Wiktionary contributors, (even if you would block me based on my age.) I think you have overstated your assumptions about some perceived lack of openness. --Connel MacKenzie 22:14, 4 September 2007 (UTC)[reply]

Nothing ridiculous about the tone I used when suggesting the block be lowered, which was wholly based on my opinion (ahh, linking words for people when they have no idea what they mean, that's what I like about Wiktionary ;) BTW, I suggest you look up the definition of community as well. ZJH 14:55, 5 September 2007 (UTC)[reply]

Your misplaced, defensive, sarcastic insults do not help here. Note that you are now on the defensive; lashing out. A very human reaction, don't you think? I said tone...since you obviously don't know what I mean, I'll be clear: the manner in which speech or writing is expressed. You initiated an adverse conversation with an accusation...sorry, but that is pretty dumb. You set the tone for the conversation by doing so. To pretend that you maintained some semblance of civility is nonsense. Couching your statements in weasel words does not change the tone; what you said was still an accusation. Soon after, you called in the posse to bail you out, once you were in over your head. Do you consider yourself part of this community? You seem to have an enormous amount of unjustified bravado. Offhand, I'd guess you are being disruptive. Upping the ante all the way to the board? For a justified, reasonable one-week block? Get a grip! --Connel MacKenzie 18:50, 5 September 2007 (UTC)[reply]

I think admins here would be more willing to issue warnings rather than blocks if there were a good way to keep an eye on a user. Currently, if I don't block an editor for vandalism that another admin probablywould block him/her for, I feel somewhat responsible for any later vandalism by that editor (since I could have prevented it by blocking). If I had some sort of user-watchlist that let me watch contributions by editors I've sent warnings to, I'd be less concerned: I visit the site frequently enough that I'd see fairly quickly if a given non-block needed to be turned into a block. (I imagine that other admins would at least try out this possibility, and if you're right that a short note can end vandalism positively, then we'd get to see it for ourselves, and our blocking policies would lighten up quickly. Of course, if you're wrong, it probably wouldn't end up changing anything at all.) —Ruakh_TALK20:33, 4 September 2007 (UTC)[reply]

Actually, I like Ruakh's suggestion about monitoring specific contributors, but for a different reason. In my case, I am more focused on my area of expertise, which is Asian languages in general (Mandarin in particular). I know of several users that add new Chinese words on occasion, but usually get the formatting wrong (formatting issues can be very confusing to an inexperienced contributor). If I could put something in my watch list that would allow me to know when that person made a new Chinese entry, I could then verify that the entry is properly formatted, plus I could make sure that I agree with the English definition. Currently, the only way to do this is to either frequently click on one of the recent changes links (wading through a bunch of non-Chinese edits), or click on the specific contributor's user contributions link. Neither of these two is a very elegant solution. --A-cai 21:32, 4 September 2007 (UTC)[reply]

I can envision a toolserver task that floods stalker's watchlists with the stalkee's edits. It might meet some objections, though. --Connel MacKenzie 22:14, 4 September 2007 (UTC)[reply]

What would be better perhaps is a bot that sat on (say) the rc IRC channel and provided rss feeds on a per language basis. I have a bot watching that I later filter for language entries, but the filtering is done on my box at home, not on the fly by the bot. (Although I have also wished for an rc user-specific feed.)ArielGlenn 22:46, 4 September 2007 (UTC)[reply]

I'm not sure I'd like that solution, because then pages would suddenly appear on my watchlist without my knowing why. (I don't remember the username of every person I warn, and anyway, by the time I visit my watchlist the page might have been re-edited by a different user.) It would be better than nothing, though. (It might be good to restrict it to administrators, so as to ensure that no one creates an automated account that, say, reverts all of a certain user's edits. Though I guess someone could already do that by tracking the user's contributions page … has no one done this before? It seems like a great way to really piss off a user; you'd think some troll would have done it by now.) —Ruakh_TALK 00:53, 5 September 2007 (UTC)[reply]

FWIW, I agree broadly with Anthere's comments, and think that for various reasons our community here has become excessively closed and unwelcoming. This is not limited to blocks, but affects almost all aspects of the project. But I think the pragmatic concerns raised above deserve serious consideration as well. In understanding the differences between Wikipedia and Wiktionary culture, a look at the respectiveSpecial:Statistics pages is instructive (well, I found it instructive, anyway). Consider that on ENWP, there are roughly 1520 pages per administrator (and this even with ENWP's astronomically high standards for RfA); even given that many admin accounts like mine are inactive, this allows for a fairly high level of personal monitoring and attention. In contrast, here on ENWT we have roughly 12,500 pages per administrator; even if we were all highly active, monitoring that many pages would be a serious challenge. Now, to some extent this is offset by the fact that the overall rate of editing is fairly low, so that most changes can be patrolled through RC... but even so, once problematic edits have slipped through, they are likely to sit unnoticed for a very long time. This raises the stakes in vandal-fighting, and certainly contributes to the community's current crustiness. But it also raises the stakes in editor education and socialization, which is where I think we really need to focus. --Visviva 03:13, 5 September 2007 (UTC)[reply]

...which means we could use more admins, which means that we have to be more open to newcomers, and even make some personal compromises on users who clearly wish to benefit the project, though their views may differ from our own. How many users like Thecurran have stumbled into an old debate they were completely unaware of? How many well-intentioned users would be put off entirely by a block, or even by an unexplained revert to an early edit, for but one minor flaw? DAVilla 03:52, 5 September 2007 (UTC)[reply]

Out of curiosity, what percentage of blocks are given to users with accounts, as opposed to anonymous users? --EncycloPetey 04:39, 5 September 2007 (UTC)[reply]

Sry, I've no idea how that could even be measured. Do you mean short-term username blocks (as opposed to infamous user's sockpuppets?) --Connel MacKenzie 05:01, 5 September 2007 (UTC)[reply]

Running some quick greps over the last 5,000 blocks and unblocks (which go back to mid-March) … it looks like 4,212 (84.24%) are blocks of anons, 35 (0.7%) are unblocks of anons, 704 (14.08%) are blocks of accounts, 37 (0.74%) are unblocks of accounts, and 12 (0.24%) are unblocks of numbers. (I don't know what that last one means, but there are 12 log entries of the form "01:30, 29 August 2007 Connel MacKenzie (talk • contribs) unblocked #22584" at various times and with various numbers, and one with Versageek instead of Connel. Note that there are no instances of anyone blocking a number.) —Ruakh_TALK 05:45, 5 September 2007 (UTC)[reply]

Those numbers are autoblock IDs. The software logs autoblocks publicly so they can be unblocked, but obviously conceals the user's IP with a unique ID. Dmcdevit·t 06:15, 5 September 2007 (UTC)[reply]

(re: Ruakh's stats) - Those numbers are skewed a bit by ~220 sleeper accounts belonging to one of our long-term vandals. I re-blocked these in early April. They had been blocked before for a time period which was less than "indef" and he had started reusing them for page-move vandalism. --Versageek 06:28, 5 September 2007 (UTC)[reply]

Of the username blocks, any way of telling how many were of persistent sockpuppeteers (i.e. indef blocks)? 10% still seems far too high. --Connel MacKenzie 06:38, 5 September 2007 (UTC)[reply]
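
As a quick check of the arithmetic in Ruakh's tally, the counts quoted above can be re-added in a few lines of Python. The category labels are paraphrased from the thread. The last figure subtracts the roughly 220 sleeper accounts Versageek mentions, on the assumption (not confirmed anywhere in the thread) that all of them fall inside the 5,000-entry sample.

```python
# Counts as quoted in the thread; labels are paraphrased.
counts = {
    "blocks of anons": 4212,
    "unblocks of anons": 35,
    "blocks of accounts": 704,
    "unblocks of accounts": 37,
    "unblocks of autoblock IDs": 12,
}

total = sum(counts.values())

# Share of each category as a percentage of all log entries.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Assumption: all ~220 re-blocked sleeper accounts are within this sample.
sleeper_accounts = 220
adjusted_account_share = round(
    100 * (counts["blocks of accounts"] - sleeper_accounts) / total, 2
)

print(total)
for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"account blocks excluding sleepers: {adjusted_account_share}%")
```

The five categories do add up to the 5,000 log entries, and the quoted percentages match the counts.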

I think it is fair enough that admins sometimes get smacked for blocking out of hand. Cynewulf was right to block the user in question, but I agree that a week was quite long for a first offence and the problem really was that when he was asked about it, he didn't react ever so well. Admins have to be accountable which means they should be able to account for their actions politely.Widsith 09:01, 5 September 2007 (UTC)[reply]

Preparation of Fundraiser 2007

Hi, this is just a first introduction message to tell you: there is more to come. I am dealing with the Project Management of the Fundraiser 2007 and therefore will search for contacts of wikimedians who can help us to do our tasks on all projects. I am actually also building the structure for the fundraiser on Meta. We will need people who help to design buttons, translate texts of buttons, documents, sitenotices etc. Should you feel you want to co-operate please let me know. You can reach me on my meta user page or by e-mail at scretella (at) wikimedia (dot) org. If you wish to notify us that you would like to co-operate on translations, it would be nice if you used e-mail and copied the e-mail to me and Aphaia (aphaia (at) gmail (dot) com). Thank you for your attention and I hope to meet you soon! Cheers :-) -- 4 September 2007 Sabine

Thank you Sabine! Is there a timeline for it? I don't think meta: will need much assistance for English translations. :-) --Connel MacKenzie 23:34, 4 September 2007 (UTC)[reply]

FYI, on IRC I was told this is planned for October 22nd through December 22nd. So, only a couple weeks left to get translations in, on meta. --Connel MacKenzie 15:06, 9 September 2007 (UTC)[reply]

OT note: writing your email as user (at) something (dot) com doesn't help: you think the spammers haven't figured that one out? Send people to your user page where they can use the "email this user" link ;-) Robert Ullmann 14:45, 21 September 2007 (UTC)[reply]

West Frisian

The ISO 639-1 code "fy" is the collective code for the Frisian languages, so {{fy}} appropriately renders as "Frisian". Within that collection, there are some more specific ISO 639-3 codes, e.g. "frr" (North Frisian), "frs" (Saterland Frisian), and "fry" (West Frisian). Unfortunately, the ISO 639-2 code for the Frisian languages is also "fry", so {{fry}} also renders as "Frisian", leaving no English Wiktionary code for "West Frisian". Fortunately, no entries seem to use {{fry}} yet, so I'd like to update it to say "West Frisian" and require editors to use the ISO 639-1 code "fy" for the collective Frisian languages. Any objections? Rod (A. Smith) 17:40, 6 September 2007 (UTC)[reply]

Actually, fy is defined specifically as West Frisian, not Frisian in general.[2] --Ptcamn 22:38, 6 September 2007 (UTC)[reply]

I see. “Previous usage of code has been for Western Frisian, although language name was "Frisian"”. That's where the confusion arose. In that case, I'll just update {{fy}} and {{fry}} to reflect the more accurate name. Thanks, Ptcamn. Rod (A. Smith) 22:50, 6 September 2007 (UTC)[reply]

It seems we should also deprecate Category:Frisian language and its subcategories with "Frisian" in their titles in favor of Category:West Frisian language and the like, and change all "==Frisian==" headers to "==West Frisian==". Rod (A. Smith) 22:56, 6 September 2007 (UTC)[reply]

Er, correction. According to w:Frisian language, “ISO 639-1 code fy and ISO 639-2 code fry were assigned to the collective Frisian languages, but are as of 2006 used only for West Frisian.” Assuming we don't care for categories or 2nd level headings for the collective Frisian languages as a unit, I'll convert all plain occurrences of “Frisian” to “West Frisian”. Rod (A. Smith) 23:13, 6 September 2007 (UTC)[reply]
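The outcome of this thread can be pictured as a small lookup table. The following is only a minimal Python sketch, assuming a hypothetical `LANGUAGE_NAMES` dictionary and `language_name` helper; it is not how the {{fy}} and {{fry}} templates are actually implemented, and it lists only the codes whose names were settled above.

```python
# Hypothetical lookup table reflecting the conclusion of the thread:
# both "fy" (ISO 639-1) and "fry" (ISO 639-2) are to render as "West Frisian".
LANGUAGE_NAMES = {
    "fy": "West Frisian",
    "fry": "West Frisian",
    "frr": "North Frisian",
}


def language_name(code: str) -> str:
    """Return the display name for a language code, or raise KeyError."""
    return LANGUAGE_NAMES[code.strip().lower()]


if __name__ == "__main__":
    for code in ("fy", "fry", "frr"):
        print(code, "->", language_name(code))
```

Keeping both codes pointed at the same name mirrors the decision to stop treating "Frisian" as a collective heading.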

Choosing the “primary entry” for idiomatic phrases

Connel and I are disagreeing over which of these two forms, son of the manse or child of the manse, should house the “primary entry” for this idiom. Connel favours son of the manse because it is the most common form, and I favour child of the manse because it is epicene, is the most general, and because its plural form, children of the manse, receives more Google Book Search hits than sons of the manse does (260:240). More detail can be found here. Which is it to be? † Raifʻhār Doremítzwr 21:07, 6 September 2007 (UTC)[reply]

Note: 1,334 b.g.c. hits for son of the manse. --Connel MacKenzie 21:34, 6 September 2007 (UTC)[reply]

(after edit conflict that nearly crashed my PC) (for the purposes of this comment: "A" is the most common singular form, "As" is its plural; "Bs" is the most common plural form, "B" is its singular; "C" is a less common singular form, "Cs" is its plural)

I'd say the primary entry should be at the singular form with the most widespread recent (i.e. last 5-10 years) usage ("A"). If the most common plural form ("Bs") is not the most common singular form's plural ("As"), then the singular of the most common plural form ("B") should be a soft redirect to the most common singular form ("A"). There should be usage notes at "A", "As" and "Bs". - i.e:

A: main entry with usage note

As: standard "plural of" entry with usage note

B: soft redirect to "A"

Bs: standard "plural of" entry with usage note

C: hard redirect to "A"

Cs: hard redirect to "Bs"

If any of this is hard to understand (likely) then let me know and I'll try again! Thryduulf 21:44, 6 September 2007 (UTC)[reply]
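The A/As/B/Bs/C/Cs scheme above can be sketched as a short function. This is only an illustration in Python: the function name `plan_entries` and its arguments are hypothetical, and it takes raw hit counts as a stand-in for "most widespread recent usage", which is an assumption rather than part of the proposal.

```python
def plan_entries(singular_hits, plural_hits, plural_of):
    """Assign each form a role under the proposed scheme.

    singular_hits: {singular form: usage count}
    plural_hits:   {plural form: usage count}
    plural_of:     {singular form: its plural form}
    """
    a = max(singular_hits, key=singular_hits.get)   # "A": most common singular
    bs = max(plural_hits, key=plural_hits.get)      # "Bs": most common plural
    singular_of = {p: s for s, p in plural_of.items()}
    b = singular_of[bs]                             # "B": singular of "Bs"

    plan = {}
    for singular, plural in plural_of.items():
        if singular == a:
            plan[singular] = "main entry with usage note"
            plan[plural] = "plural-of entry with usage note"
        elif singular == b:
            plan[singular] = f"soft redirect to {a}"
            plan[plural] = "plural-of entry with usage note"
        else:
            plan[singular] = f"hard redirect to {a}"
            plan[plural] = f"hard redirect to {bs}"
    return plan


if __name__ == "__main__":
    # Illustrative counts only; "A", "B", "C" are the placeholders used above.
    plan = plan_entries(
        {"A": 10, "B": 5, "C": 1},
        {"As": 3, "Bs": 7, "Cs": 1},
        {"A": "As", "B": "Bs", "C": "Cs"},
    )
    for form, role in plan.items():
        print(f"{form}: {role}")
```

When the most common plural already belongs to the most common singular, "A" and "B" coincide and every other form simply becomes a hard redirect.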

I like this, simple and straightforward. But heretofore we have commonly been using "one's" and similar locutions in primary entries, e.g. feel one's oats, which saves a lot of grief. That precedent would also support the use of child of the manse as the primary. --Visviva 23:24, 6 September 2007 (UTC)[reply]

We have been using "one's" as a reflexive pronoun placeholder in entry titles, although "one's own" may be better. The non-reflexive version has been "someone's" and "somebody's". Rod (A. Smith) 00:18, 7 September 2007 (UTC)[reply]

Quite right. I guess my point was that we have been using a gender- and person-neutral form ("one's X" or "someone's X") rather than whichever form happens to be most common ("his X", "my X"). This seems obviously to be the right choice where pronouns are concerned; it is less clear whether the same logic applies in cases like the current one (leading to a preference for "child" over "son", perhaps "sibling" over "brother/sister", etc.) --Visviva 01:18, 7 September 2007 (UTC)[reply]

Yes, Visviva, that logic, as well as the principle that words should be defined from the general to the specific, are the two cruces of my reasoning.

Also, a minor correction if I may: Connel hereinbefore provided a link to the hits page yielded by searching for son of the manse on the Google Book Search engine. He forgot to enclose the phrase in quotation marks, which meant that the engine searched for every instance of son + manse (as of and the are excluded from such searches), rather than the set phrase son of the manse. The correct statistics are:

“son of the manse” = 623;
“child of the manse” = 129;
“sons of the manse” = 240; and,
“children of the manse” = 260.

† Raifʻhār Doremítzwr 16:24, 7 September 2007 (UTC)[reply]
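For anyone wanting to compare the four quoted-phrase figures listed above, a short Python sketch follows. It only does arithmetic on those numbers; the variable names are arbitrary.

```python
# Quoted-phrase Google Book Search counts as reported above.
hits = {
    "son of the manse": 623,
    "child of the manse": 129,
    "sons of the manse": 240,
    "children of the manse": 260,
}

total = sum(hits.values())
most_common = max(hits, key=hits.get)
others = total - hits[most_common]

if __name__ == "__main__":
    for phrase, count in sorted(hits.items(), key=lambda kv: -kv[1]):
        print(f"{phrase}: {count} ({count / total:.1%})")
    print(f"most common form: {most_common}")
    print(f"all other variants combined: {others}")
```

On these figures the most common form has 623 hits against 629 for the other three variants combined, so it is the single largest form but just short of an outright majority.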

Well it's easy enough to explain son of the manse in terms of child, but if the main definition were at son, how would you explain child in terms of it? It seems the only other option would be to have two main entries, and the question becomes whether son should be a complete entry or rely on child. In these cases I prefer a little bit of explanation to avoid link-chasing, while keeping the linked term dominant with a fuller explanation. DAVilla 03:52, 8 September 2007 (UTC)[reply]

Yes, that is the “general → specific” rationale. Could this issue be solved by expanding the definitions for son of the manse and daughter of the manse to “A specifically male child of the manse; that is, a diligent and industrious man or boy” and “A specifically female child of the manse; that is, a diligent and industrious woman or girl”, respectively? Is that acceptable to you, Connel? † Raifʻhār Doremítzwr 12:05, 8 September 2007 (UTC)[reply]

I think that would be an excellent solution. Cheers! bd2412 T 01:29, 10 September 2007 (UTC)[reply]

Not quite. Rather, at first blush, for this individual example, that almost seems to fit. But it is still a pretty wild diversion from common practice: the practice of putting the main idiom entry at the primary entry form. The term son of the manse doesn't quite have more available citations than all the other variants combined, but it clearly (very clearly) is the most common form. Wikilawyering reasons to slightly expand some of the soft-links doesn't address the underlying problem that brought this conversation here, to wit: the creation of specifically "incorrect" idiom forms while redirecting the "correct" form to that incorrect entry. The form "children of the manse" seems to be a cute reworking of the existing idiom, almost jocular. It is very misleading to our readers to suggest that "children of the manse" should be preferred in their writings over the understood idiom "son of the manse." The entire topic is difficult, as "son of the manse" is itself such a rare idiom. But that fact makes it more important that we identify the idiom that is likely to be understood, while giving little or no emphasis to the incorrect form. Simply following the existing practice of using the most common idiom form as the primary idiom entry satisfies all these concerns.

Looking more closely into this curious UK idiom, specifically at w:Manse and its references, it seems to have a very strong monastery intended meaning. This reinforces the notion that "daughter of the manse" and "child of the manse" can be construed as intentionally humorous, jocular, sarcastic or satirical only. You, hailing from that side of the pond, should be proposing this; not me, an American. (Oh wait, our entry for "monastery" seems to include its antonym "convent" in its definition - as a subtype! - now? WTF?) --Connel MacKenzie 16:03, 16 September 2007 (UTC)[reply]

Furthermore, at WT:CFI#Idiomatic phrases it says "Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic form, based on the following principles:" followed by WT:CFI#Pronouns, where it clarifies: "Prefer the generic personal pronoun, one or one’s. Thus, feel one’s oats is preferable to feel his oats. Use of other personal pronouns, especially in the singular, should be avoided except where they are essential to the meaning."

To me, it seems pretty clear that "sons", "child", "children", "daughter" and "daughters" are all unquestionably "minor variants." Furthermore, it also seems clear (now) that "son" is, in fact, essential to the meaning. --Connel MacKenzie 07:09, 20 September 2007 (UTC)[reply]

Any active bureaucrat?

Is there any active bureaucrat who could have a look at Wiktionary:Votes/bt-2007-08/User:VolkovBot for bot status, please? --Volkov 19:14, 9 September 2007 (UTC)[reply]

Russian slang

I just tripped over a bit of silliness. Stephen doesn't like category "ru:Slang", thinks it should be "Russian slang", and I agree: it is Russian words that are slang, not Russian words about slang. Just like English nouns or French verb forms or Min Nan idioms. It is a grammatical attribute, not a topic.

The silliness was this edit, in which he changed the middle "a" in "slang" to a non-Latin letter to keep it from being categorized. There are other entries where he moved (slang) to the end of the definition line to keep AF from converting it to the context tag. They all need to be fixed (the present contents of Category:Russian slang).

But then I think we should change the category class in {{slang}} so it will generate the language name forms. There are some languages that already have categories.

Similar issue with "Vulgarities", and there are several more that ought to be looked at. Robert Ullmann 14:02, 11 September 2007 (UTC)[reply]

The same argument would probably apply to everything under Category:Lexicons. None of them are topics. I'm not sure I'd call them grammatical attributes either; from what I can see they mostly pertain to issues of speech register or pragmatics. Mike Dillon 15:11, 11 September 2007 (UTC)[reply]

Seems reasonable. The naming convention should maybe be that categories concerned strictly with the non-contextual definition of a word (i.e. semantic attributes) use the language code, while all other attributes (syntactic, pragmatic, discursive) use the language name. Except for etymology. And except for all those mysterious categories that conflate discursive and semantic properties (is Category:Anatomy for the professional terminology of anatomists, or anything related to a body part?) Should Wiktionary:Categorization perhaps document these conventions? --Visviva 04:42, 12 September 2007 (UTC)[reply]

That makes sense. What do you suggest for Category:Archaic, Category:Nonstandard, and so on (where the words are themselves archaic or nonstandard, not about archaic or nonstandard)? Should they be renamed to something like Category:English archaisms, Category:English nonstandard usages, and so on? (This has the awkwardness of making us seek a good noun equivalent for each adjectival descriptor, but we can always fall back on adding "usages".) --Ruakh TALK 16:04, 12 September 2007 (UTC)[reply]

I think that Category:Archaic should be for archaic words. Archaic Russian words would be placed under Category:ru:Archaic. Check out Category:zh-tw:Archaic (the words in this category are archaic Chinese words written in Traditional Chinese script). Words about archaic would not get a special category since I doubt we could come up with more than half a dozen words about the topic archaic. If I'm wrong about that, we could always start a new category called Category:words related to archaic. --A-cai 17:07, 12 September 2007 (UTC)[reply]

Re: "Words about archaic would not get a special category since I doubt we could come up with more than a half a dozen words about the topic archaic.": Oh, certainly. This isn't about making room for other categories that better deserve these names, but rather about using more consistent names as it is. To me it seems like "this word is archaic" has more in common, in terms of the meaning of the category, with "this word is plural" than with "this word pertains to horses". --Ruakh TALK 17:11, 12 September 2007 (UTC)[reply]

Translation into lemma only

Conversation moved to Wiktionary talk:Translations/Translation into lemma only to consolidate recent translation-related discussion. See Wiktionary talk:Translations. Rod (A. Smith) 22:18, 3 November 2007 (UTC)[reply]

Multiple context tags

What do we do in the case that a word exists in all regions with the same meaning, but needs a context tag specific to one region's peculiarity? I'm wondering if there is a general standard for this, but, in particular, if a word is formal only in one region, {{UK|formal}} produces "(UK, formal)", which to me implies that the word is British-only, and formal as well. I thought of the form "(UK:formal)", but I'm not sure if that's much clearer in saying that the word is used elsewhere and that the context tag applies only to the one region; it's still not obvious that it's not UK-only. "(Formal in the UK)", or is there a better way of doing it? Dmcdevit·t 02:46, 13 September 2007 (UTC)[reply]

I'd say the term has two distinct senses. A formal sense in the UK and a standard sense elsewhere. So, create two definitions. Rod (A. Smith) 03:34, 13 September 2007 (UTC)[reply]

What about when it is formal in Spain, and standard elsewhere? Spanish speakers can probably see where I am going with this. Do we want to break all second-person plural verb forms in all moods and tenses into two senses, one for Spain where it is formal-only, and one for the rest, where it is the standard form? Dmcdevit·t 03:59, 13 September 2007 (UTC)[reply]

That's probably not necessary since those are not lemma entries. Non-lemma entries probably just need definitions in terms of the grammar relationship to lemma entry (“(grammatical) third person plural form of...”), not in terms of any semantics. Rod (A. Smith) 04:09, 13 September 2007 (UTC)[reply]

Er, the lemma form for verb forms is an infinitive, which is most certainly standard. Since it is only this inflected form that has the variable meaning, we would be omitting necessary information by excluding the context tag on these. Dmcdevit·t 04:28, 13 September 2007 (UTC)[reply]

Of course the lemma form for Spanish verbs is the infinitive. I was discussing whether the definitions for the non-lemma third-person plural forms of every Spanish verb should be split according to how different countries treat the pronoun ustedes. The formality is a property of the pronoun, not a property of the verb inflection, so there is no need to indicate the regional formality differences anywhere but in ustedes and in an appendix. Rod (A. Smith) 06:40, 13 September 2007 (UTC)[reply]

Since the pronoun isn't the lemma, the only alternative is to inexplicably banish the formality and region of the word to an appendix, and not any of the other grammatical information. In any case, this is getting a bit off-topic, but marking these is already the current convention. I'd just like to come up with a better convention for these. Dmcdevit·t 07:35, 13 September 2007 (UTC)[reply]

Since this will apply to every such second-person form, my solution is to have a Usage notes section containing a template with a short message stating the situation and linking to an Appendix. --EncycloPetey 15:52, 14 September 2007 (UTC)[reply]

I've been thinking about the same thing; perhaps the best solution would be to create {{formal in UK}} and a matching Category:UK formalisms (or whatever); likewise {{UK slang}} and Category:UK slang (aha! I see that exists already)... we certainly have enough material to justify many of these intersections already, and this project is really just getting started. --Visviva 03:39, 13 September 2007 (UTC)[reply]

I guess I misunderstood the nature of the concern... I don't have any opinion on cases like that discussed above (where regional formality is a global property of certain forms), but in other cases I don't see why we shouldn't put that information at the end of a definition rather than the beginning. That is, if a particular sense is widespread but is considered formal/informal only in one region, it should not be presented as modifying the entire definition. --Visviva 06:31, 14 September 2007 (UTC)[reply]

The way to do it is to use {{context}}, so you can have {{context|formal in the|UK}}, which gives: (formal in the, UK). We just need DAVilla or someone to tweak the template so that the editor can tell the template not to insert the default comma. --EncycloPetey 15:52, 14 September 2007 (UTC)[reply]

The “_” argument does the trick. E.g.: {{context|formal in the|_|UK}}, which produces “(formal in the UK)”. Rod (A. Smith) 17:26, 14 September 2007 (UTC)[reply]
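The comma behaviour described above can be modelled in a few lines. This is only a Python sketch of the observable output, assuming a hypothetical `render_context` function; the real {{context}} template is written in wikitext and its internals are not shown in this discussion.

```python
def render_context(*labels: str) -> str:
    """Join context labels with ", ", except where a "_" argument
    asks for a plain space instead of the default comma."""
    out = ""
    suppress_comma = False
    for label in labels:
        if label == "_":
            # "_" is not printed; it only changes the next separator.
            suppress_comma = True
            continue
        if out:
            out += " " if suppress_comma else ", "
        out += label
        suppress_comma = False
    return f"({out})"


if __name__ == "__main__":
    print(render_context("UK", "formal"))
    print(render_context("formal in the", "UK"))
    print(render_context("formal in the", "_", "UK"))
```

The design point is that "_" is a separator hint rather than a label, so it never appears in the rendered tag.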

Wiktionary:About languages

At Wiktionary talk:About Greek, attention was drawn to the fact that this type of page would more suitably reside in "Help:". I tend to agree, although many references agree with the current locations: Wiktionary:Language considerations, Wiktionary:About Persian, Wiktionary:About Latin etc. Any move would need to be coordinated with others.

Question: should "About Greek" be moved, and where? —Saltmarsh 14:47, 13 September 2007 (UTC)[reply]

The way I understand it, some of the content (i.e. "typing in Greek") can and probably should go in "help", but everything that has to do with how Wiktionary deals in wording or formatting the specificities of a language should stay at "about X". In the case of Persian, "transliteration" would stay where it is (if it is ever adopted). Circeus 15:16, 13 September 2007 (UTC)[reply]

My understanding of these pages was that they were supposed to provide policy clarification. That is to say, only listing special concerns about a language that supersede the regular rules in WT:ELE. This provides a quick summary of what is acceptable, both for newcomers interested in entering terms, and sysops (like me) who may not be familiar with a language, but see something that looks wrong with a particular entry.

I admit that some of the other "About" pages have also diverged from that purpose, quite a bit. I guess my question is twofold: #1) do we want these "help-ish" guides in the about pages? #2) Where do we want to keep the brief summary of "the language concerns that override regular WT:ELE rules"?

TIA. --Connel MacKenzie 19:00, 13 September 2007 (UTC)[reply]

Good points. Another factor to keep in mind is that we have two audiences to consider: editors and readers. To me, the Wiktionary namespace seems intended for editors, while the Help namespace seems intended for readers. Rod (A. Smith) 19:08, 13 September 2007 (UTC)[reply]

Hmm. A quick glance at the Help namespace shows I'm wrong about its purpose. Rod (A. Smith) 19:09, 13 September 2007 (UTC)[reply]

The namespace you're looking for is Appendix:, and that's where information about the language should be placed if it is directed at readers. --EncycloPetey 15:44, 14 September 2007 (UTC)[reply]

Most of the information in the "About" pages is directed towards editors rather than readers. So what does go in the Help namespace, anyways? ArielGlenn 19:23, 15 September 2007 (UTC)[reply]

Not sure. I seldom see it used, so there's no pattern to see what it's supposed to be used for. I was editing here for almost a year before I even knew it existed. I assume that it's for technical information. --EncycloPetey 18:23, 16 September 2007 (UTC)[reply]

References

I have used the <ref> html tag (if it is html - it isn't in my 2002 manual) in the article χούφτα - it creates a footnote type reference which gives detail to the reader. It is commonly used in Wikipedia. Whereas the references in gobbet make it difficult to see what was sourced where. Is my departure deprecated? (For technical reasons, but not because it creates more work :).) Should I stop doing it? --Saltmarsh 14:44, 14 September 2007 (UTC)[reply]

I've wondered about this too, Saltmarsh. As Wiktionary entries grow more complete and complex, it seems to me to be increasingly desirable to be able to tie specific parts of an entry to their specific sources (as, for example, in ambulance chaser). However, it strikes me as awkward and unsightly for both footnotes and the more common sorts of references to other dictionaries to be combined under a single header (as in however). --WikiPedant 14:57, 14 September 2007 (UTC)[reply]

I’ve been using in-line references too. I think they’re vital for showing exactly which information is sourced and whence, be it an etymology, a pronunciation, an irregular inflexion or conjugation, a context tag, or whatever. In the case of however, I’d move the nine bulleted references to a “Dictionary notes” section whereunder they’d be more suitable. † Raifʻhār Doremítzwr 15:05, 14 September 2007 (UTC)[reply]

Take a look at how I just subdivided the references in however and see what you think. I think subheaders under "References" is the best way to go, since moving the bulleted references that cite other dictionaries to a whole new section would be inconsistent with the style of a gazillion other WT entries. --WikiPedant 15:26, 14 September 2007 (UTC)[reply]

Please, forgive my bluntness, but the list of references for however is utterly useless. There is no indication at all as to what information came from any of the listed sources or how the sources were used as a reference. The point of a reference is to (1) give credit to the source of information used, and (2) bolster statements of fact by making it possible for someone to verify the research. A bulleted list of web sites at the end of an entry fulfills neither of these purposes. Subdividing it does nothing to help. --EncycloPetey 15:42, 14 September 2007 (UTC)[reply]

Agreed. However, rather than using actual subheaders, use code like ;Dictionaries and ;Notes instead; the result is virtually identical, but has the added benefit of getting rid of the “edit the section” buttons on the right (which are for some reason bigger for the sub-sub-sub-…-headers) and will probably stop Autoformat going nuts with its rfc-invalid header templates. † Raifʻhār Doremítzwr 15:38, 14 September 2007 (UTC)[reply]

Yes, Doremítzwr, the semicolon command clearly works much better. I disagree with EncycloPetey that the bulleted links to other dictionaries are useless, though. They are just what the doctor ordered for users like me who frequently want to get a quick take on other defns to compare/contrast with what they read on the WT page (which strikes me as consistent with Petey's point (2) above). One other technical matter -- I notice, Doremítzwr, that when you edited however you changed "References" to L4. This conforms to WT:format, but those examples show separate "References" headers under each meaning (which I'm not so sure I've ever actually seen in Wiktionary). On the other hand, the template provided when the user hits, say, the "Noun" button on the create entry screen shows an L3 "References" header (which makes more sense to me, if the "References" section comes at the end of the entire entry). Are both header levels acceptable, depending on the placement of the section? --WikiPedant 16:13, 14 September 2007 (UTC)[reply]

I think I should clarify that the bulleted list is useless as references. They are fine for use as External Links, but they are not useful as References. --EncycloPetey 00:57, 15 September 2007 (UTC)[reply]

I think Cite.php (the ref-references system) is very useful in certain sections: etymology (as in your example), usage notes, and perhaps pronunciation. These are areas where our usual principle of verification from use is difficult or impossible to apply. However, in re the discussion above, I think we need to continue aggressively questioning the practice of "referencing" whole entries to third-party dictionaries. Copyright concerns aside, it's just sloppy practice; our entries should stand or fall on their own merits. --Visviva 15:58, 14 September 2007 (UTC)[reply]

I'm very much in theoretical support of <ref>, but Cite.php has the serious bug that multiple uses of <references> don't work properly (see User:Ruakh/Cite for a demonstration), so we can't have a separate "References" section for each language. If we can get that bug fixed, then <ref> is the way to go. --Ruakh TALK 22:53, 14 September 2007 (UTC)[reply]

In the meantime, with only one <references/> tag available, the refs would have to be at the end of the page. With multilanguage pages this would necessitate an L2 Reference header; is this possible/advisable? Since the superscript number and related up-arrow allow easy movement between text and reference, perhaps the normal reader would not be bothered where the references were? --Saltmarsh 05:42, 15 September 2007 (UTC)[reply]

Has the bug been reported to the technical people? If so, do they have any idea when it’ll be sorted? If sourcing information for multiple languages under one references section, then yes, sticking them at the end under an L2 header seems like the logically best option. Will Autoformat et alia be OK with that though? In the meantime, is developing <ref1 name="">, <ref1>, <references1/>, </ref1>… commands possible / workable / useful? † Raifʻhār Doremítzwr 15:28, 15 September 2007 (UTC)[reply]

Yes. It's bugzilla:6271. I'll add a comment about how this is useful for Wiktionary because of the multi-language issue. Feel free to vote for it and it might get some love. Mike Dillon 20:51, 15 September 2007 (UTC)[reply]

P.S. The proposed functionality would look something like this in the wikitext:

    == English ==
    ...
    ...<ref name="XXX" group="en">....</ref>...
    <references group="en"/>
    ...
    ----
    == German ==
    ...
    ...<ref name="XXX" group="de">....</ref>...
    <references group="de"/>

The only part that would be slightly onerous would be that the editor would have to manually put group="XX" onto every <ref> and <references> tag; there is no way to get the software to do it automatically based on our section conventions. Mike Dillon 21:03, 15 September 2007 (UTC)[reply]
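To make the grouping idea concrete, here is a toy Python model of how grouped footnotes might be collected and emitted per language section. It is a sketch under stated assumptions: the class `GroupedRefs` and its methods are hypothetical, and it does not reproduce how Cite.php is actually written.

```python
from collections import defaultdict


class GroupedRefs:
    """Toy model: each group keeps its own numbered list of footnotes."""

    def __init__(self):
        self._pending = defaultdict(list)

    def ref(self, text: str, group: str = "") -> int:
        """Record a footnote in a group and return its number within that group."""
        self._pending[group].append(text)
        return len(self._pending[group])

    def references(self, group: str = "") -> list:
        """Emit and clear the footnotes collected so far for one group."""
        notes = self._pending.pop(group, [])
        return [f"{number}. {text}" for number, text in enumerate(notes, 1)]


if __name__ == "__main__":
    page = GroupedRefs()
    page.ref("English etymology source", group="en")
    page.ref("German etymology source", group="de")
    page.ref("English pronunciation source", group="en")
    print(page.references(group="en"))
    print(page.references(group="de"))
```

Each language section would then list only its own footnotes, numbered from 1, at the cost of tagging every footnote with its group by hand.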

Sounds good! --Saltmarsh^Talk 05:59, 16 September 2007 (UTC)[reply]

They don't need to be grouped; it would suffice if the <references/> tag would clear the list when it generates them, so the next section starts anew. It doesn't need to know anything about our section conventions. Reading the bugzilla shows other intended uses though, and we could certainly use the groups if that was provided. As Mike points out though, this is more work that we don't really need. Robert Ullmann 14:38, 21 September 2007 (UTC)[reply]

That's a good point and it would probably need a separate bugzilla. I'd expect that having the tag clear by default isn't a slam dunk with the MediaWiki developers, but I could see a clear="true" flag being accepted, or even a setting in LocalSettings.php. I can't think of any reasons to use multiple <references> tags without having it clear out, but the possibility of unintended regressions probably means that a separate attribute or a flag in the settings would be easier to get into the code. Mike Dillon 00:43, 22 September 2007 (UTC)[reply]

Wait, I just thought of a complication. Some state would need to be maintained to avoid generating invalid duplicate ids for the forward reference links and backlinks. Mike Dillon 00:44, 22 September 2007 (UTC)[reply]
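The complication can be illustrated with a small Python sketch: the visible footnote numbers restart after each references list, while a page-wide counter keeps the link anchors unique. The class name `ClearingRefs` and the `note-N` anchor format are assumptions made for the illustration, not the ids that MediaWiki actually generates.

```python
import itertools


class ClearingRefs:
    """Toy model of a references list that clears itself when emitted."""

    def __init__(self):
        self._pending = []
        # Page-wide counter: never reset, so anchors stay unique
        # even though the displayed numbers start over.
        self._anchor_ids = itertools.count(1)

    def ref(self, text: str):
        """Record a footnote; return (displayed number, unique anchor)."""
        anchor = f"note-{next(self._anchor_ids)}"
        self._pending.append((anchor, text))
        return len(self._pending), anchor

    def references(self):
        """Emit the pending footnotes and clear the list for the next section."""
        notes, self._pending = self._pending, []
        return notes


if __name__ == "__main__":
    page = ClearingRefs()
    print(page.ref("English source"))   # first section
    print(page.references())
    print(page.ref("German source"))    # next section: number restarts, anchor does not
    print(page.references())
```

Separating the displayed number from the anchor id is the piece of state the comment above refers to.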

Hanja entries

For anyone interested, a conversation about hanja entries has been moved from User talk:Connel MacKenzie#about Hanja to Wiktionary talk:About Korean#Hanja entries. Rod (A. Smith) 19:09, 14 September 2007 (UTC)[reply]

Entries for letters of the Latin (Roman) alphabet

Letters of non-Latin (non-Roman) alphabets have their own entries, e.g. the Cyrillic letters а and б, but incredibly, letters of the Latin (Roman) alphabet do not. The entries at a and b, for example, have definitions for various words and symbols with the spellings a and b, but nothing for the letters themselves. Is that omission by choice or oversight? Rod (A. Smith) 23:15, 14 September 2007 (UTC)[reply]

I think all our alphabet listings have suffered from the general lack of consistency. Once a year or so, someone volunteers to plow through them all, giving up when the problems become intractable, or too many people complain. (E.g. Arabic Alphabet.) I would like to see some consistency in how we handle these. Listing them all as symbols (===Symbol===) in a "Translingual" section seems like the most comprehensive, universal approach. The individual definition lines there can describe what language (or language families) use those characters. (Right?) --Connel MacKenzie 01:02, 15 September 2007 (UTC)[reply]

Agreed. Though, can we have a ===Character=== header instead? "Symbol" is the Unicode term for a certain subset of the characters (distinguished from letters, marks, numbers, punctuation, separators, and control characters), so it seems a bit awkward to use it more broadly. --Ruakh TALK 03:19, 15 September 2007 (UTC)[reply]

To me, a character is simply a minimal written unit of text (so it describes digits, letters, whitespace, symbols, ideograms, etc.) but a symbol must actually symbolize something specific, so is technically not accurate for letters themselves. From a brief conversation I had in IRC, though, I get the impression that others' definitions don't necessarily distinguish between the terms. Whatever we use, it should be understandable to readers and should be described in Appendix:Glossary. Rod (A. Smith) 07:44, 15 September 2007 (UTC)[reply]

Please see what I've done with the entry for the letter a. I used the heading "===Character===" as an example. The "etymology" describes briefly the roots of the character's shape. The definition line explains that this particular character is a lower-case letter, as opposed to other characters that may be digits, symbols, ideograms, etc. I also show some of the more common or well known derivations, using a link to the appendix for further exploration. Comments? Rod (A. Smith) 21:56, 15 September 2007 (UTC)[reply]

I think that this looks good - I raise a couple of points: (1) Do the "===Character===" and the "===Abbreviation===" have different Etymologies? (2) With lists of chars, e.g. (à,á,â,ā,ä,å), "commas as separators" may easily become confused with "commas as modifiers/diacritics", as in the list a, a, α', ,α, ά - where ,α is confusing - should the separators be omitted? --Saltmarsh^Talk 06:37, 16 September 2007 (UTC)[reply]

Good suggestion, Saltmarsh. It looks much cleaner without the commas. Since the entry and the list items are single characters, nobody will be confused.

As for etymologies, though, there really is more than one etymology. Well, "etymology" in the Wiktionary sense of the word (just as "part of speech" includes "proverb" here). The origin of the letter is a different letter, but the origin of the abbreviations is the words they abbreviate. Is there a better way to present the origin of the letter? Rod (A. Smith) 09:58, 16 September 2007 (UTC)[reply]

I like that "===Character===" accurately describes the smallest units of language that we describe here.

===Symbol=== is wrong because strict definitions of symbol, e.g. that used by Unicode, exclude meaningless letters, while looser definitions include entire words.
===Letter=== is wrong because letter excludes punctuation and non-alphabetic scripts.
===Grapheme=== is wrong because we don't want to describe any particular font, but the abstract, underlying concept.

Anyway, the possibility of adopting the header "===Character===" brings to light some other potential types of entries that are smaller than a word:

Digraphs, e.g. Spanish ch and ll, and trigraphs for that matter. Note: these are different from ligatures, which are technically characters.
Morse code sequences, e.g. •- (dot-dash, a.k.a. di-dah, “A”).

Should we have a unique POS header for these entries that are smaller than words/morphemes but larger than characters? Rod (A. Smith) 22:53, 16 September 2007 (UTC)

As I said before, there is significant inconsistency in the existing entries. BUT, that doesn't mean anything new is needed. Currently, ===Symbol=== and ===Letter=== are used fairly extensively. Since letters are symbols themselves, it makes sense to me to use the general purpose heading "Symbol" for this class of entry. But, if you were to disregard that concern, the sensible solution would be to use "Symbol" and/or "Letter" as appropriate...just as WT:POS suggests. But that, I think, would be a mistake. Describing items in more detail on definition lines is better than randomly adding (rarely used) headings.

To address the notion of Unicode: how they define "Symbol" is fine for them, but not adequate for our purposes. For comparison, just as CGEL calls all nouns "noun phrases", we don't use others' specialized terminology. Instead, as needed, we have our own specialized terminology (e.g. all "noun phrases" here are called "===Noun==="s.) So, anyone quoting the CGEL, saying that nouns don't exist (or some other thing that might be appropriate only in a CGEL context,) is bound to have trouble here, where our concerns dictate the opposite nomenclature. Likewise, even though the kind people of the Unicode consortium are very intelligent, their goal is not to define "all words in all languages." So adopting their terminology, suitable to their goal, is not helpful.

For the ~20 "good" headings, the ~25 "acceptable" headings, ~20 explicitly "deprecated" headings, the ~100 automatically "corrected" heading errors, we still have some 1,174 third-level headings in use, total. (In the main namespace only, not counting 4th level POS headings, etc.) I don't think people realize how useless it is to have unusable data like that lying around. Being so wild causes all such entries to simply be excluded from all derivative works (e.g. http://www.panimages.org/, http://ninjawords.com/, yawiktionary, etc.) It also causes those entries here to be mis-categorized, mis-corrected, misplaced, miscounted and discounted. That is to say, there is a very desperate need to consolidate the existing headings, not ever to encourage more. From where I sit, anyone proposing new headings is subtly seeking to destroy the usefulness of en.wiktionary.org, or en.wiktionary.org itself. But, perhaps I've been staring at code too long. (Certainly Stephen thinks I have.) But I'm not even talking about parsing Wiktionary entries into a fine-detail level; I'm only talking about the highest level structure...even that isn't contained!

To say the situation is frustrating doesn't even begin to describe it. Anyone desiring additional headings should first build a time machine and go back a few years, to propose the heading in 2003/2004. For now, our main concern should be eliminating the 1,000+ invalid headings, and setting policy to ensure they don't resurface.

--Connel MacKenzie 02:43, 17 September 2007 (UTC)

Thanks for that insight into where you're coming from when you get angry about proposals for new standard headers. I think your position is misguided, though: each additional standard header probably means the use of a dozen fewer headers, because suddenly there's a usable standard header to replace a bunch of nonstandard headers with. (If a word doesn't even remotely fall under any of our existing headers, then editors will use a non-standard header for it, and resist any attempt to replace it with a useless-but-standard header; and worse, different editors will use different non-standard headers for the same kind of word. If we cover such classes of word with a few additional headers, suddenly a lot of that variety gets handled quite easily.) --Ruakh_TALK 04:56, 17 September 2007 (UTC)

I'm not speaking theoretically (as your postulation surely is.) My experience here on en.wiktionary shows the exact opposite to be true; for each "standard" heading added, at least a dozen variants (even more tangential), and many dozens more typos of those, start being used. The majority of those, standard or tangential, are soon abandoned. --Connel MacKenzie 05:07, 17 September 2007 (UTC)

Hmm, I somehow missed the fact that WT:POS includes "===Letter===". I also didn't realize that adding a new standard header was so complicated (and I still don't fully grasp the extent of the work that must be involved), but I'm relieved to learn that "===Letter===" is already valid. Fortunately, that solves my Spanish digraph problem, since ch and ll are letters but are not characters. So, I will use "===Letter===" for Latin (Roman) alphabet entries. Since "===Symbol===" is the heading we need to use for entries that are not letters but are smaller than morphemes, please define our unique sense of symbol in Appendix:Glossary and update WT:POS to link there. (It currently links to our dictionary entry for symbol.) Rod (A. Smith) 05:52, 17 September 2007 (UTC)

I only get 520 distinct headers in error, with 16142 instances, including ~7000 "X form" headers. There are 834031 L3 headers in the wikt NS:0; this is not a big error rate (1.1%), although we should do better. And a few of those are needed POS headers. Robert Ullmann 14:26, 21 September 2007 (UTC)
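The arithmetic behind that percentage can be checked with a short sketch. The three counts are the ones quoted in the comment above; reading the 1.1% as the rate with the ~7000 "X form" headers left out is an assumption on my part, since the comment does not say how the figure was derived.

```python
# Counts quoted in the comment above (September 2007).
total_l3_headers = 834_031   # all L3 headers in NS:0
error_instances = 16_142     # instances of headers "in error"
x_form_instances = 7_000     # approximate count of "X form" headers

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Rate counting every flagged instance.
rate_all = percent(error_instances, total_l3_headers)

# Rate if the "X form" headers are left out (assumed reading of the 1.1% figure).
rate_without_x_form = percent(error_instances - x_form_instances, total_l3_headers)

print(f"all flagged instances: {rate_all:.1f}%")
print(f"excluding 'X form' headers: {rate_without_x_form:.1f}%")
```

Under that reading the quoted 1.1% matches the second figure; counting every flagged instance gives a rate a little under 2%.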

Script templates.

I'd like to write a bot that creates {{Deva}} (for Devanagari), {{Ethi}} (for Ethiopic, i.e. Ge'ez), and so on for all the ISO 15924 four-letter script names (listed here), except for Zxxx, Zyyy, Zzzz, and (obviously) any that already have templates. This bot wouldn't be particularly intelligent; it would just go through the list of four-letter codes, check if the template exists already, and if not, set it to (for example) <includeonly><span >{{{1}}}</span></includeonly><noinclude>This template may be used to enclose text in the Devanagari (Nagari) script; it may be called directly, as <tt>{{Deva|<var>text</var>}}</tt>, or may be passed via the <tt>sc</tt> parameter to templates that support that parameter, such as {{temp|t}} or {{temp|term}}. Note that text in a non-Latin script should ordinarily be accompanied by a romanization. [[Category:Script templates|Deva]]</noinclude> (Later, these templates might be modified by people familiar with each script, setting good default fonts, linking to relevant language-considerations pages, mentioning similar-but-distinct templates -- like how we have a {{KUchar}} separate from {{Arab}} and a {{polytonic}} separate from {{Grek}} -- redirecting to better templates, etc.) I will check and patrol every single one of the bot's contributions during this process, and will fix any major mistakes manually, so that's not a concern; but before doing it, I'd like to make sure that other editors agree this is something that should be done. (By the way, if I do this, it will be as Rukhabot.) --Ruakh_TALK 00:47, 15 September 2007 (UTC)
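The bot's selection logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the shortened documentation text and the small sample of script codes are all hypothetical, and no wiki is actually read or written. A real bot would fetch the full ISO 15924 list and query the site for existing templates.

```python
# Codes the proposal says to skip outright.
EXCLUDED = {"Zxxx", "Zyyy", "Zzzz"}

def template_body(code, script_name):
    """Build placeholder wikitext for one script template (abbreviated, hypothetical text)."""
    return (
        "<includeonly><span >{{{1}}}</span></includeonly>"
        "<noinclude>This template may be used to enclose text in the "
        f"{script_name} script. [[Category:Script templates|{code}]]</noinclude>"
    )

def plan_templates(scripts, existing):
    """Return {code: wikitext} for every code that is neither excluded nor already present.

    scripts  -- dict mapping four-letter script code to a script name
    existing -- set of template names that already exist
    """
    plan = {}
    for code, name in scripts.items():
        if code in EXCLUDED or code in existing:
            continue  # leave excluded codes and existing templates alone
        plan[code] = template_body(code, name)
    return plan

# Hypothetical sample input; not the real code list.
scripts = {
    "Deva": "Devanagari (Nagari)",
    "Ethi": "Ethiopic",
    "Grek": "Greek",
    "Zyyy": "undetermined",
}
existing = {"Grek"}

plan = plan_templates(scripts, existing)
print(sorted(plan))
```

Keeping the planning step separate from any page-saving step would make it easy to review the full list of proposed creations before anything is written, which fits the stated intention to patrol every contribution.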

The recent discussions on WT:GP about existing language templates show a strong desire to have these prefixed in a meaningful way. For example, I think {{l-Deva}} would be better. --Connel MacKenzie 00:52, 15 September 2007 (UTC)

I don't have a strong opinion either way about whether to prefix the script templates, but it would be nice to have them all available. I would suggest, though, having the bot write that brief documentation to the talk page instead, surrounded by a "=Documentation=" line and a "=Discussion=" line. Rod (A. Smith) 00:56, 15 September 2007 (UTC)

Good call, will do. --Ruakh_TALK 19:53, 15 September 2007 (UTC)

Hmm. The ones we currently have don't have any sort of prefix; seeing as they'd all be in one category anyway (Category:Script templates), do you object to my creating these now for consistency, and moving them later if we decide on a different naming scheme? (If you do object, I'll hold off until there's a clear consensus on how exactly they should be named.) --Ruakh_TALK 19:53, 15 September 2007 (UTC)

Please use a coherent prefix (AKA a pseudo-namespace within the template namespace) for these. These collectively represent a fairly specialized use; having them together (not just categorized) means that erroneous additions can be caught and corrected. Having them spread across the (crowded) template namespace means nonstandard additions will likely go unnoticed indefinitely. --Connel MacKenzie 05:14, 17 September 2007 (UTC)

Thanks for your reply. I'll just wait, then. I actually tend to agree with you that we should have a prefix for these, but since other people have expressed disagreement, and the existing ones don't have a prefix, and anyway I don't want to be in charge of deciding what prefix to use, I'd rather just not worry about it until we seem to have consensus on whether and what prefix to use. --Ruakh_TALK 05:37, 17 September 2007 (UTC)

On my screen modern Greek script shows up correctly without filtering through a template. (1) Why do we need to use it? (2) Should it be used for all occurrences of mGreek? The output looks more attractive, but the labour entailed is considerable. I feel that the answers need to be given for simple souls like myself AND should be detailed under the Documentation for each template, for those who follow. (With apologies if this has been done elsewhere.) --Saltmarsh^Talk 12:44, 15 September 2007 (UTC)

I had a longish conversation with User:Rodasmith on IRC earlier about the usage of {{Grek}} (or rather, how I don't ever use it), before he added the documentation. I don't think there was any particular use in mind... (Rod please correct me if that's wrong), just making the template available in case someone *does* want it for something. ArielGlenn 19:29, 15 September 2007 (UTC)

I don't think anyone expects that all users will make use of script templates; but if they all exist, then bots can start adding them appropriately. (Some things might require more intelligence than a bot, but I think a fairly simple bot could handle the great majority of cases, while leaving alone cases that it can't.) At any rate, I'm not willing for my bot to add documentation of how the templates should be used, because I don't know that we have an answer for that yet, and indeed, I don't know that we'll have the same answer for all scripts; rather, at this point it will just add documentation on how they can be used. (After all, template talk-pages don't exactly constitute policy pages, anyway; it's up to real policy pages to tell people what templates to use.) --Ruakh_TALK 19:53, 15 September 2007 (UTC)

Correct. Most modern browsers display Greek fonts just fine without any particular script selection, so there is no requirement to use {{Grek}}. I have modified the documentation to clarify that. Rod (A. Smith) 21:35, 15 September 2007 (UTC)

I absolutely agree - there is no point in making life more complicated with unnecessary templates. Can {{Grek}} be phased out or at least marked as deprecated? --Saltmarsh^Talk 09:40, 16 September 2007 (UTC)

We should not make Greek the only script without a template. λόγος looks significantly different from (and more legible than) λόγος on my browser. Does it look better or worse on yours? In any event, the presence of {{Grek}} shouldn't make anyone's life more complicated. Each script defined by ISO 15924 really does need its own script template. Eventually, those templates will be applied consistently, allowing us to provide the best possible fonts for displaying each word. Readers will even be able to choose their favorite font for each script. {{Grek}} is marked as optional. Rod (A. Smith) 09:52, 16 September 2007 (UTC)

Sorry Rod, I ain't trying to be awkward :). I agree the Grek λόγος looks better than the bog standard one, but how can a template be applied automatically when λόγος is the same in Greeks Ancient and modern [λόγος : λόγος]? Context will not be sufficient. Also, we shall need to think through where, when there is output through a template, the font will be applied. And, as Connel mentions above, we should rename the templates methodically: f-el, f-grc etc. You say that {{Grek}} is marked as optional, but surely if we use it, we should apply it to all new input. --Saltmarsh^Talk 14:38, 16 September 2007 (UTC)

The font specification is just "temporary" (with a timeline as long as browser versions and OS versions come and go, so the fonts will be here for quite a while); the end-point to get to is to apply the XHTML tag "Grek" and let the browser do what it will. The names should not be given a fixed prefix; they are designed to work with the language codes. (You think we are the only ones using this stuff??!!) Modern Greek should be {{Grek}}, Ancient Greek (now polytonic) should be {{grc-Grek}}, which is the standard IETF/ISO/XHTML tag. This isn't complicated; and all the coding is done for us. Robert Ullmann 16:02, 16 September 2007 (UTC)

None of those standards organizations or standards specify "{{" nor "}}" for these. Since we are using their standard names for our purposes, it only makes sense to use an organizational prefix. I think "s-" would be fine, as would "p-". Neither {{s-}} nor {{p-}} is used now, nor is either likely to be needed for anything else. So, {{s-Grek}} & {{s-grc-Grek}} would be a lot less ambiguous in this en.wiktionary.org context. --Connel MacKenzie 05:18, 17 September 2007 (UTC)
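The two naming schemes under discussion differ only in whether a fixed prefix is put in front of the language-plus-script tag. A minimal sketch, with a hypothetical helper name and the "s-" prefix taken from the suggestion above:

```python
def script_template_name(script, lang=None, prefix=""):
    """Compose a template name from an ISO 15924 script code,
    an optional language code, and an optional organizational prefix."""
    tag = f"{lang}-{script}" if lang else script
    return f"{prefix}{tag}"

# Unprefixed scheme (names match the language/script tags directly).
print(script_template_name("Grek"))                # Grek
print(script_template_name("Grek", lang="grc"))    # grc-Grek

# Prefixed scheme (pseudo-namespace inside the template namespace).
print(script_template_name("Grek", prefix="s-"))              # s-Grek
print(script_template_name("Grek", lang="grc", prefix="s-"))  # s-grc-Grek
```

Either way the name can be derived mechanically from the codes, so the choice affects how templates are grouped and found rather than how they are generated.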

A few points/queries: (1) The italic form of the font from {{Grek}} is not so good (cf. φίλιος with φίλιος); could it be enlarged (φίλιος)? (2) I am not sure that I understand the optional nature of this, which results in Greek words appearing in a mixture of fonts; most users will think that there is significance in this variation. (3) Are we stuck with the name "Grek", which departs from the 2-letter language codes used elsewhere, making it harder to recall those used infrequently? --Saltmarsh^Talk 10:14, 23 September 2007 (UTC)

I agree that we should identify the variations and seek to standardize them. So that I can better understand the problem, can you point out a circumstance where we put Greek terms in italics? Just one example of an entry with italic Greek text should suffice. (I ask because there may be a technical problem, causing you to see italics where I see non-italics.) Regarding the template name, please notice that the two-letter codes are for languages, but this is a script template, so it should have the four-letter, initial upper case, ISO script code. Does that make sense? Rod (A. Smith) 20:25, 23 September 2007 (UTC)

gaol#Etymology was where I came face-to-face with it when I was completing a request for Greek script. --Saltmarsh^Talk 05:45, 24 September 2007 (UTC)

Very helpful. Thank you for pointing that example out. The easy answer is just to remove the italics. Coincidentally, though, I also am proposing {{term}} as a means of formatting such things. So, the standard form for your example and the form using the draft {{term}} follow:

German ''[[geil]]'', “wanton”; Greek {{Grek|[[φίλιος]]}}, “friendly”.

German geil, “wanton”; Greek φίλιος, “friendly”.

German {{term|geil||wanton|lang=de}}; Greek {{term|sc=Grek|φίλιος|tr=fílios||friendly}}.

German geil (“wanton”); Greek φίλιος (fílios).

How do those examples appear for you (both the wikitext and the rendered format)? Rod (A. Smith) 06:06, 24 September 2007 (UTC)

Excellent - both the appearance and means of achieving it, thanks. --Saltmarsh^Talk 04:17, 25 September 2007 (UTC)

Am I right in assuming that the template is intended for use in all occurrences of Greek characters? --Saltmarsh^Talk 06:33, 25 September 2007 (UTC)

Braille letters, digits, and other symbols

I just noticed Braille letters A (⠁) and Z (⠵) had entries. They weren't in a standard style, so I cleaned them up a bit and created categories for Braille letters and digits. Any changes to make before continuing? Rod (A. Smith) 00:49, 15 September 2007 (UTC)

Those both look fine to me. Nicely done. --Connel MacKenzie 00:55, 15 September 2007 (UTC)

Categories, semantic and contextual

Hi all,

OK, this has been bugging me for a long time, so I'm just going to get it off my chest.

Much of our current topical structure is problematic. Consider Category:Anatomy, which contains both a large number of common terms for body parts like arm and head, as well as technical anatomy terms like caudal. This is an impressive jumble, but it is difficult to see its value, either to the end user or to us. I would like to propose the following general approach:

That categories pertaining to the technical terminology of a specific field be at [[Category:<language> <field> terminology]], thus for example Category:English anatomy terminology (or Category:Anatomy terms, if preferred). Membership in a technical lexicon is a property of the word, not of the referent.
1. That most categories generated by {{context}} and its ten thousand children should be handled in this way, since they deal with the contextual use of the word rather than with its semantic meaning (see recent discussion of Category:Vulgarities).
2. That all contextual-use categories be grouped under Category:Lexicons, possibly with the addition of trunk categories like Category:Technical terminology.
That categories pertaining to context-independent meaning (i.e., to the referent) be kept in a clear hierarchical relationship based on meronymy and hyponymy (bodies:body parts:arm).
1. That we acknowledge the direct linkage of semantic and POS affiliation; for example, Category:Body parts is by nature a direct or indirect descendant of Category:Nouns.
2. That we give serious consideration to the lessons of WordNet in defining fundamental categories for each POS. Miller & Fellbaum's work identified 26 fundamental noun groups and 14 fundamental verb groups; we needn't rely on this specifically, but it's an obvious point of reference (at least for nouns and verbs).
That categories with overlapping contextual and semantic affiliations should be permitted only where there is a genuine intersection; thus Category:Proteins is properly contained in a biochemical terminology category, in addition to a semantic category such as Category:Substances.
That the existing topical categories be maintained primarily as user-convenient access points to the relevant terminological and semantic categories and appendices.

The above may not be the best solution -- I'm just tossing it out there because it's been on my mind -- but I don't think that our current topical structure can ever yield satisfactory benefits. This basically stems from the fact that words don't have topics; in this respect they differ from websites, encyclopedia articles, books, and the like. Topical ontologies are marvelous on WP and DMOZ and the town library, but can't perform adequately in a lexicographic setting. --Visviva 14:08, 15 September 2007 (UTC)

Personally, I think this sort of thing is better handled with an Index: or Appendix:. That does not mean that I am opposed to the idea of refining our categories; rather, I am going to advise care in whatever we choose to do. Consider that the current topical categories are set up (as often as possible) to match the context at the head of the definition line, and not for any other reason. Thus, a word used in discussing anatomy, or a sense used in an anatomical context, will have (anatomy) at the head of that definition line and the context template used to insert that text will simultaneously categorize the word so that all such words with that context will appear in a single category. That doesn't mean that we can't add additional categories to an entry, but we shouldn't try to completely divorce such entries from the primary category either. As an example, Category:Astronomy has subcategories of Category:Constellations and Category:Stars. The words in these categories have (astronomy) as their context, but use a separate template that categorizes them in the appropriate subcategory of Astronomy. In short, whatever topical separations we might make should be carefully structured to maintain ties to the category of the parent context. --EncycloPetey 15:45, 15 September 2007 (UTC)

Yes, and I definitely don't want to undo any of the good work that has been done so far. My preferred approach is to maintain the existing topical structure -- which has definite advantages in terms of user-friendliness and cross-project compatibility -- but split most of these into semantic and contextual categories with their own hierarchies; so we would have Category:English astronomy terminology alongside Category:Constellations et al. (most constellation names obviously not being technical astronomical terms, although I suppose many star names are). Ideally I'd say in any case where {{context}} is appropriate, the category should be lexiconic rather than topical -- that is, it is appropriate to label an entry (astronomy) only if it is actually a technical astronomical term.

I had been thinking about using Wikisaurus: or Appendix: space for this, but these are basically properties of individual word senses -- as our pervasive use of {{context}} shows -- so categorization seems like the ideal method. This doesn't need to be any sort of massive enterprise; my ideal approach would be, after some discussion here, to do a proof-of-concept on some small chunk of the topical tree ... then to return to the community for discussion and an update to Wiktionary:Categorization (which is overdue for an update anyway) ... and then to work on gradually applying these principles to the topical tree as a whole. Since this doesn't involve uprooting the existing tree -- more like planting some rigorous saplings alongside it -- it can be done gradually without undue disruption. --Visviva 04:32, 16 September 2007 (UTC)

But we shouldn't have any topical categories that identify the words as specifically "English". Topical categories are English by default, and when they are not English they are prefixed with the appropriate ISO code. --EncycloPetey 18:21, 16 September 2007 (UTC)

That's my understanding as well... Any categories related to what words mean, such as Category:Stars and Category:Vehicles, viz. semantic categories, should be English by default (at least that's our practice, and I don't seek to change it). On the other hand, our practice for categories based on usage has been somewhat confused, but tending toward the same [language] [category description] convention found in POS categories. So I would have assumed that a lexicon category for technical terminology in field X should be at "[language] field X terminology." That was -- I thought -- the point of the recent flap over Category:Vulgarities and others. I have no real objection to "Category:Field X terminology" with language prefixes for non-English categories, but I didn't think that was our preferred convention.
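The two naming conventions being contrasted here can be put side by side in a small sketch. The helper names are hypothetical, and the colon used to attach the ISO code is an assumption; the discussion says only that non-English topical categories are "prefixed with the appropriate ISO code".

```python
def topical_category(topic, lang_code="en"):
    """Topical/semantic convention: English by default, ISO code prefix otherwise.
    The ':' separator is an assumption for illustration."""
    if lang_code == "en":
        return f"Category:{topic}"
    return f"Category:{lang_code}:{topic}"

def lexicon_category(language_name, field):
    """Usage-based convention: '[language] [field] terminology', as in POS categories."""
    return f"Category:{language_name} {field} terminology"

print(topical_category("Stars"))                 # Category:Stars
print(topical_category("Stars", "es"))           # Category:es:Stars
print(lexicon_category("English", "astronomy"))  # Category:English astronomy terminology
```

Seen this way, the disagreement is about which of the two functions a technical-terminology category should go through, not about the categories' contents.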

I don't really understand *topical* categories as such (although again I don't propose to do away with them). Words have senses and usage characteristics, but they don't have topics. There is no real limit to the words one can use to discuss the topic of astronomy; on the other hand, the words or senses one can use to name astronomical objects, or which are largely unique to learned astronomical discourse, are relatively limited, and therefore suitable as a basis for categorization. --Visviva 04:24, 17 September 2007 (UTC)

But words have senses that are only used in a certain topical context. If I say Jupiter in a conversation about astronomy, I probably mean the planet. If I say Jupiter in a conversation about Ancient Greece, I probably mean the deity. Additionally, topical categories serve the purpose of allowing users to find words they don't know on the basis of their context. So, if I'm looking for a particular astronomical term that I heard once but can't remember, I can look through a list at Category:Astronomy. These categories also provide a useful reference for specialists looking to learn technical jargon for a particular field in another language, such as a doctor who plans to work in a relief hospital in another country and who wants to learn basic vocabulary in order to communicate with the patients. --EncycloPetey 15:37, 17 September 2007 (UTC)

Apart from the flap about English being a default, I think this is a good proposal. It gets at the heart of an issue that has been left open to interpretation without answering any questions solidly. When a category is not a context, such as time, it is clear that it is not appropriate as a context label for any term. When a category is largely unheard of, such as combinatorics, a context label on the definition line is probably appropriate for every term in the category. The real question is what a context label means. Was Marcus Manilius talking about the gods or the planets when he looked up at the heavens? Is the cow that jumped over the man in the moon astronomical or astrological? Do people who believe in Martians have a different definition for "Mars" than astronomers, or are they also doing astronomy?

To say that there is a scientific meaning for a term does not mean that the term has to be used in that or any specific context. I would go so far as to say that, for linguistic purposes, the accepted scientific meaning is more often incorrect. Since its inception, further back than we can even trace the roots of our own language, a second has been measured, whether precisely or not, as one part in 60 of a minute, which is one part in 60 of an hour. For the last two thousand years, an hour has been considered one part in 24 of the period of a day, which as the most easily quantized astronomical event has existed in concept since prehistoric times. The scientific definition of a second, on the other hand, has changed four times in the last century alone. So what did Leybourn and Morden mean by "The Length of a Pendulum for Seconds" in 1702? A short, indeterminate amount of time? The duration of 9,192,631,770 periods of radiation corresponding to the transition between two hyperfine levels of caesium-133?

The distinction made in this proposal between a category as topical, such as ballparks for baseball, and technical, such as RBI's, is one aspect of resolving this issue. These distinctions can be very wide or they can be very narrow. Preschoolers learn what a triangle is, but the geometrical shape has a very precise mathematical definition that applies equally well to hyperbolic space. Are they fundamentally different concepts? I would say yes since the distinction is made at circle. On the other hand, imaginary numbers have only a single idiomatic meaning. But are precalculus students doing complex analysis? For that matter, are physicists? We should really look at how we topically categorize terms in the first place, and decide if and how the split for jargon would be necessary. DAVilla 17:48, 17 September 2007 (UTC)

Wiktionary:Votes/pl-2007-09/Placenames stopgap

The page Wiktionary:Criteria for inclusion currently claims, erroneously, that we exclude all place names that aren't used attributively. I've created a proposed vote at Wiktionary:Votes/pl-2007-09/Placenames stopgap, which would correct that claim. (It's a "stopgap" in that it doesn't replace this claim with a precise description of our criteria for place names, because we don't yet have such a description, and is therefore only a temporary measure to keep Wiktionary:Criteria for inclusion accurate-if-incomplete.) The vote hasn't started yet; please take a look, and let me know if there's anything you object to. --Ruakh_TALK 06:23, 16 September 2007 (UTC)

That is an erroneous assumption. You think WT:CFI is wrong. Yet your example France obviously is used attributively. (French fries, anyone?) I very strongly object to someone pushing their POV by starting a vote out of the blue, completely removed from reality. That guideline currently is fairly well understood (with the exception perhaps of "attributively" - which you don't address) and is actually followed, by and large. I cannot AGF about someone's motives for pushing "notability" criteria, with absolutely no previous discussion. The previous similar attempts at this type of "Wikipedia notability" nonsense have died for numerous tangible reasons. Please see What Wiktionary is not and read it - it is perhaps the oldest of our "Guideline" pages. --Connel MacKenzie 08:10, 16 September 2007 (UTC)

Come off it. There is absolutely no reason to assume anything but good faith here. This is not out of the blue. Everyone knows we've discussed this issue at length, and Ruakh has even set the tentative open date a week out to give us more time to discuss and refine the stopgap vote. (Besides, the word "France" is not in "french fries".) Rod (A. Smith) 09:21, 16 September 2007 (UTC)

That is so false, it isn't funny. Proposed votes are supposed to be reflections or solidifications of what the community currently agrees to; not violent vehicles of imposing one POV in direct conflict with existing practice. Seeing the success his friend had with sneaking past another invalid vote, he decided to bypass normal process intentionally here. Starting the vote on a new approach to a controversial topic with no previous discussion? Sorry, no. There is no possible way to assume good faith. Starting this vote was simply a malicious, disruptive act. The fact that I previously changed the default to "one week" for newly created votes says nothing in his defense. --Connel MacKenzie 15:24, 16 September 2007 (UTC)

What? O.K., first of all, I know you remember that in past discussions I was quite willing for these non-CFI-meeting entries to simply be moved to appendices; you objected. (By which I mean, many people objected, and you were one of them.) So obviously it's not my POV that CFI need to be extended in this regard; I'm simply trying to codify a POV that seems to have consensus and that has already been applied in spite of the CFI as written. (The funny thing is, back then you accused me of POINT-pushing for supporting these entries' deletion; now you accuse me of POV-pushing for trying to fix CFI to allow these entries. Which is it? I guess to you it's more important to be accusing me of something than to have any basis for the accusation?) Regarding your last sentence: that doesn't make sense. I wasn't sure how long to wait before starting to vote, saw that the default was six days, and decided that worked fine. If I were trying to push this vote through quickly, I could have simply started it immediately; it's not like you cast some sort of magical enchantment that would have prevented me from bypassing your default had I wanted to. --Ruakh_TALK 15:44, 16 September 2007 (UTC)

My "accusation" is that you are starting a vote with no discussion. Of course the general topic has been discussed at length; no acceptable modification has yet been found! That is, perhaps, the only thing that is clear from all the discussions. So you, instigating a vote for yet-another-wording of an already refuted concept isn't worthy of criticism? That does not follow. Not sure what the magical barb is all about; Rod was giving you enormously undue credit for the one-week {{premature}} phase...that was all I was refuting. But it seems you wish to take credit for that, too, anyway? Sheesh. --Connel MacKenzie 16:23, 16 September 2007 (UTC)

Ah, I see. I think you and I feel roughly the same way: we need some discussion before we start the vote, in order to make sure that the vote does in fact reflect a consensus (rather than simply splitting people into sides and determining which side is larger). However, whereas I feel that it makes sense to create exact text for the vote, so we have something concrete to discuss, you feel that everything I do is automatically wrong, and that I do it in bad faith. --Ruakh_TALK 16:36, 16 September 2007 (UTC)

And all the previous iterations for proper noun proposals, where such text was supplied here on WT:BP? For highly controversial votes, the only time that stunt was pulled before, was when your friend did it for possessives. (No, wait, also for "brand-names 2" - the other obviously invalid proposal.) While I admit my opinion of your personal credibility has taken an astounding nose-dive, I was critiquing your actions for this vote only; you did it wrong, and you know better. That's not an assumption of bad faith, it is a clear observation of a fundamental fact. There was no reason for you to be so deceptive. --Connel MacKenzie 18:47, 16 September 2007 (UTC)[reply]

I didn't think this was a controversial vote at all, much less "highly controversial"; my intent was for it to reflect what people already seem to agree on, leaving for later discussion things that people don't agree on. There's nothing deceptive here; you're just being either insane or malicious (I'm not sure which). Since we're being so frank, my opinion of you changes significantly over time. Sometimes you seem to be a perfectly sane human being with the desire to make Wiktionary better; other times it's fairly obvious that you're a troll with the sole goal of pissing off people who want to actually contribute here. (For the past week or so you've been in the latter category, and I'm kind of just waiting till you become sane again.) —Ruakh_TALK21:02, 16 September 2007 (UTC)[reply]

How could a vote about a controversial topic not be controversial? Look at your actions, from the perspective of any person other than yourself and it is clear that you intentionally sidestepped the discussion, just to push the specific change you wanted. That, good sir, is either deceptive or malicious. I suppose I shouldn't rule out other possibilities, like insanity on your part. That would explain your recent actions with the bizarre (undiscussed, out-of-the-blue) headings in the main namespace. Or the bot stuff. Or the template stuff. Taken collectively, it seems to me that you are the one acting in an unbalanced manner. I cannot imagine any way that you could rationalize any of those, let alone all of them in succession. Are you enjoying trading barbs? I've re-written this three times now, to slow down the obvious escalation of name-calling. But you seem to be on a defensive, manic swing? Is it just a love of wikidrama? --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]

Connel, have you noticed that you are the only person who assumes that there was deception or malice here? Rod (A. Smith) 07:20, 17 September 2007 (UTC)[reply]

No, but I do see numerous others expressing similar concerns more diplomatically. --Connel MacKenzie 08:15, 17 September 2007 (UTC)[reply]

Actually I don't see the need for a stopgap if we can just come up with some reasonable criteria and put it to a vote. Connel, why don't you open a vote on that proposal you made before on celestial objects? bd2412 T 19:08, 16 September 2007 (UTC)[reply]

Because it kept getting shot down. My reading of Ruakh's and Dmcdevit's concerns on that (very long) thread, was that there is no possible way to satisfy all (nor even enough of the community's concerns) to make it a feasible proposal. When all was said and done, it was still a proposal that would inflict encyclopedic notability for English entries here on en.wiktionary.org. I don't see how to rectify that. The complaint is quite genuine: we have a CFI that is based on a word's use in language. No matter how you look at it, creating exceptions for that, based on "notability", is unacceptable to some of the interested parties. And that is a serious shortcoming, that I've come to appreciate and agree with. --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]

If someone does that in the near future, I'll withdraw my proposed vote. Until that happens, though, I have to assume that the months of stagnation indicate that we do not have consensus about what exactly the CFI should allow in the way of place names; but we do have consensus that the CFI are broken, because some place names are sufficiently important (for some value of "important") that they warrant inclusion despite a lack of attributive use. If Connel votes against this just to be a dick, that's his right, but it doesn't seem that even he actually disagrees with the proposed change, and I don't see that it will cause the vote not to pass. —Ruakh_TALK21:02, 16 September 2007 (UTC)[reply]

If I support your proposal, you'll stop calling me names? We can't have that - the Earth's orbit might be put in jeopardy. But thanks for wearing your heart on your sleeve.

Your proposal is to add "with exceptions being made for place names that are of particular importance." Yes, I object to that, for all the same reasons that the "celestial objects" proposal was shot down. That is, "particular importance" is not lexical importance. (Important to who, anyhow?) But more to the point, I'll vote against it on principle - it was started out of the blue, with no discussion. (Your assertion that any of the "place names" proposals is not controversial is inexplicable, given the many kilobytes you yourself have posted on the subtopics. They are all undeniably controversial.)

I'm not sure at all what you mean, when you say CFI is broken. When I say it is broken, I mean that #1) it allows for unedited (and non-spell checked) Usenet postings, #2) the one-year date range is far too small and #3) the three citations minimum is far too low. Sadly, I don't see a workable compromise; requiring twenty (20) or more book citations from reputable publishers spanning ten (10) years would put too great a burden on the volunteers that cite entries for WT:RFV already. And that would exclude legitimate, specialized jargon. --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]

Your claim that "'particular importance' is not lexical importance" is deceptive, as I already made clear (below) that I do intend for it to mean lexical importance, and I welcome any change to the proposed wording that would make that more clear.

Your claim that the vote was "started out of the blue, with no discussion" would be deceptive — the vote hasn't been started yet, period, and this right here? This is discussion — except that deception requires some sort of belief, or at least hope, that you might successfully deceive someone. No one here is stupid enough to buy into this claim, and you know it. Making obviously false statements that no one will believe doesn't make you deceptive; it makes you a troll.

When I say the CFI are broken, I mean a host of different things. One is the same as one of yours: that we can't increase the requisite number of citations, even though it's pretty clear that three is just too low, because currently the CFI are entirely dependent on people actually typing up each citation. Another is similar to one of yours: that they give equal weight to a Usenet posting as to an actual book, except in the special case that the book is a "well-known work" (which counts triple); I think it's clear that a citation from a book is much more valuable, and much more meaningful, than a citation from Usenet. And one is completely separate from yours: that the CFI as stated exclude France#English, even though we have consensus that the English proper noun France is important enough (for some value of "important") for it to be kept. (And, there are a bunch of other things besides. These are the three major ones, though.)

—Ruakh_TALK15:30, 17 September 2007 (UTC)[reply]

Sorry, but no. You created the vote (as you repeated in the very top of this section) indeed, with no prior discussion. Yes, that is "out of the blue." You've used this discussion as a vehicle for a slew of personal attacks. Your clarification below reverses the meaning of the proposal; if the wording of the proposal had said "place names, where the name itself is of particular importance..." it would be one thing, but it never did. Even that wording has far too many subjective holes though. You act as if there is no cause for complaint, when you start/create a vote, yet point-by-point acknowledge the validity of each critique. Nice. Glad to have provided you with an outlet for more name calling. How anyone might think that behavior of yours is anything but trolling, is hard to guess. --Connel MacKenzie 16:37, 17 September 2007 (UTC)[reply]

I'm sorry, but you've failed to give any reason why it might be wrong to create a page for a proposed vote, link to it here, and let it be discussed and improved — and possibly canceled — before it starts. I gather that you think it's wrong, and that it deeply bothers you for some reason. Fair enough, I won't do it again; but I don't see how you can have expected me to know that. Certainly other editors have participated in this discussion, and as far as I can tell none of them minds that a page already exists with a proposed version of the vote. I started this discussion here, and it's you who launched into personal attacks and assumptions of bad faith. (I'll grant that I shouldn't have stooped to your level, though.) Yes, I'm proposing that precise-but-erroneous wording be replaced with accurate-but-subjective wording; I consider it more important that our policy pages be accurate than that they be objective, and it didn't occur to me that other people might feel differently. (That's my fault; I failed to see the obvious analogy of main-namespace pages, where you prefer a standard structure with inaccurate information over accurate information with a nonstandard structure. I'm sorry; your point of view on this is just so strange and foreign to me, that I have difficulty taking it into account. I'll try harder.) —Ruakh_TALK17:01, 17 September 2007 (UTC)[reply]

Oh really? Let me see if I understand your position, then. You are saying that votes should be started with no prior discussion, using only the default one week "rewording" period to determine if those new votes (which some will say should never have been created in the first place) should be withdrawn? Furthermore, you think that sort of disruption (for controversial topics especially) should be encouraged? I could accomplish quite a bit of policy reformation, if I stooped to that level. But WT:VOTE would be quite overwhelmed. Your analogy to the main namespace is cute; but again, I don't see the point of having completely unusable useless data clogging up searches...GIGO. The fact that so many of the bizarre deviations cause secondary problems seems to mean nothing to you at all; that, I do think is more than just strange. Your continued misrepresentation ("standard structure with inaccurate information"? No.) shows that you still wish to troll here. Good going. One mistake of mine amidst a thousand corrections is license for you to harp on and on? Yet you enter batches of intentionally useless material in the wrong place, then grouse for weeks when I suggest it should be corrected? Yes, my opinion of you has now managed to drop even lower. --Connel MacKenzie 17:27, 17 September 2007 (UTC)[reply]

Re: "my opinion of you has now managed to drop even lower": I'm starting to think that's something I should be happy about. :-) —Ruakh_TALK17:32, 17 September 2007 (UTC)[reply]

Well then, have a great day. Funny, that actually trying to respond to the rational-sounding points you make and actually pointing out where you went wrong, results in more snipes from you. What you did, starting the vote, was wrong. Attacking me because you made an error and I pointed it out, is understandable. You have my pity. --Connel MacKenzie 19:17, 17 September 2007 (UTC)[reply]

I don't see this as an issue of bad faith either, and even if it were, I don't see where bringing that into the discussion would help. That's what the whole spirit of AGF is. No matter what our first instincts are, if we train ourselves to react as if we thought something was done in good faith, we're likely to get the more cooperative response, so it is almost never useful not to do so. Dmcdevit·t 10:19, 16 September 2007 (UTC)[reply]

I just noticed this after I had posted to the proposal's talk page. To be clear, so it doesn't look like I'm just being contrarian: I don't have any love for our current placenames CFI either; I think it is wrong, actually, and (*cue sound of distant universes popping out of existence*) I think we should be allowing certain non-attributive placenames. This is the wrong way to do it, though, and it is largely the same problem as all three other votes proposed so far, just in fewer words. It seems every time people try to solve the problem it comes out as a definition of notability, not word usage. Certainly that is not current practice as the proposal implies, though. I'm crossposting the posts from Wiktionary talk:Votes/pl-2007-09/Placenames stopgap below, if no one minds. Dmcdevit·t 10:19, 16 September 2007 (UTC)[reply]

Importance as a criterion

I really believe that importance (of a place or a person) is a good criterion for Wikipedia, but not here. The criterion was linguistic. I agree that it must be changed, but the new one should be linguistic too (e.g. Confucius, White House, Le Havre and France should be accepted because they are words (or can be considered as words), George Washington or Washington Street should not be accepted, because they cannot be considered as words). Of course, this should be refined, but you can see the idea. Lmaltier 07:13, 16 September 2007 (UTC)[reply]

I agree. Importance has nothing to do with it, and we should not be entering into debates on notability (although, note that all cities, towns, villages, etc. with any population on any census records are considered notable on Wikipedia. This suggests that importance can be defined rather broadly). The problem is that a word is a word if it has usage in its particular language, not if the thing it represents is important. Is "frtyu" an important place name? It might qualify, since that's the word that I just coined for "France," a notable place. This is an absurd example, but I would say that adding many other "important" place names would be similar coinages. As I've said before, I believe that "Melekeok" is not a word that can be fairly said to have entered the English vocabulary. This does not mean that the capital of a sovereign nation is not an important place, but if in all of JSTOR, there is not a single reference to it without parenthetical explanatory context (i.e., what you would do when introducing a foreign word that is not translatable), I would say the word is not important. In terms of a dictionary, trying to construct a criterion for inclusion on the basis of the real-world importance of the concept a word refers to, and not the word itself, is necessarily arbitrary, and encyclopedic. Dmcdevit·t 08:00, 16 September 2007 (UTC)[reply]

Hmm. In a draft y'all didn't see, I originally wrote "particularly important place names", but then I realized that that could sound like "{{particularly important} place} names", i.e. names of particularly important places (an encyclopedic criterion, not a linguistic one); so, I rephrased in a way that I thought was unambiguous: "place names that are of particular importance", where I thought it was clear that the names were what had to be important. Judging from your comments, however, it seems my rephrasing was insufficient: it still sounds, or risks sounding, like it's talking about encyclopedic notability. Note: this paragraph was edited 17:01, 17 September 2007 (UTC) to fix a major typo; specifically, to replace "place names" with "places" in one spot.

I guess what I'm trying to say is, I completely agree, and that's what I was trying to say to begin with. Please help me out by proposing a phrasing that doesn't have this problem. :-)

—Ruakh_TALK15:25, 16 September 2007 (UTC)[reply]

I understand the urge to want to include "important" names (how could we not have France in a dictionary?) but I am also uneasy about getting there via a requirement that comes down to a form of notability. Having said that, I do not have a good alternative proposal. Explicitly allowing certain classes of place names seemed like a good way around some of this issue but past that I hope someone else can see their way out of the thicket. I don't think I can support this proposal as it stands. ArielGlenn 15:38, 16 September 2007 (UTC)[reply]

"Particular lexical importance"? Or "place names which most members of the community, as of today, regard as lexically important"? The first would rule out encyclopedism (I think), the second would describe our actual practice. Ultimately our concept of "importance" can only be suitably defined through a test (or tests) for lexical importance, such as those that have been proposed for brand names, phrases, et al.... however, you are probably sensible to leave such tests out of this stopgap vote. --Visviva 15:45, 16 September 2007 (UTC)[reply]

Generally speaking, place names of particular lexical importance are names of important places, because they are used more often... But is importance important? All verbs are accepted, not only important ones. Even names of small places are interesting (if they are words), for their etymology, their pronunciation, their gentilic, etc. Also note that etymological dictionaries specialized in place names do exist. Lmaltier 19:52, 16 September 2007 (UTC)[reply]

If we weren't already swimming in namespaces, a "Gazetteer:" namespace would have much to recommend it. --Visviva 03:52, 17 September 2007 (UTC)[reply]

Aren't gazetteers encyclopedic? I was referring to books written by toponymists. Toponymy is an important part of onomastics, which is an important part of lexicology. Lmaltier 05:48, 18 September 2007 (UTC)[reply]

I've changed it to "place names that are particularly important words", which I think means the same thing as "place names of particular lexical importance" but without using the word "lexical", which sounds too technical to me. That said, if you want to change it to "place names of particular lexical importance" or "place names which most members of the community, as of today, regard as lexically important", or anything else though, go ahead; I'm by no means sold on any specific wording, and do not by any means consider myself to "own" that vote page. —Ruakh_TALK17:13, 17 September 2007 (UTC)[reply]

Personally, I would remove important altogether, and change it to place names which most members of the community, as of today, consider as includable (not as names, but as words). Would not this wording be acceptable to everybody? Lmaltier 05:48, 18 September 2007 (UTC)[reply]

To be honest, I'm not a huge fan. I prefer your wording to our current inaccurate text, but I don't really like the "not as names, but as words" part, because I consider a name to be a kind of word (albeit a very special kind). Also, it seems strange to me for a policy page to use the phrase "as of today". If you don't think we should say "important" — and I'm starting to agree with that view, as it's becoming clear that for many people "important" means "having encyclopedic importance" — then I'd prefer something like "place names that appear to have entered the language". —Ruakh_TALK06:45, 18 September 2007 (UTC)[reply]

Some names are words, some are not words (nor terms), this is my point. Confucius is a word, not George Washington, Champs-Elysées is a word, not avenue des Champs-Elysées, SNCF is an acceptable term, not Société Nationale des Chemins de fer Français. Of course, only terms actually used in a given language can be included for this language, this is the general rule. About as of today, I agree with you, but I just copied your second proposal. Lmaltier 16:38, 18 September 2007 (UTC)[reply]

The word important should not enter into the discussion, because it is far too subjective. Again, not all languages provide indicators such as spaces and upper case letters. Asian languages often lack such indicators. This is why it is more important that obscure people and place names be included. Here is an example from Romance of the Three Kingdoms/Chapter 2:

於是長沙賊區星作亂
Yúshì, Chángshā zéi Ōu Xīng zuòluàn
After that, a bandit from Changsha named Ou Xing began to wreak havoc

Note that the indicators (spaces and upper case letters) in the pinyin and English are not present in the original text. A person who is familiar with China would probably already know that Changsha is the capital of Hunan province. However, very few people would know that 區星 is the name of a person, and that 區 should be read as Ōu (it is normally read as qū). --A-cai 21:52, 18 September 2007 (UTC)[reply]

set similes

I see we already have mad as a hatter, but calling it just an "adjective" seems somehow missing something. There are loads of these – neat as ninepence, clean as a whistle, happy as Larry – and so on. Should we have a category for them, and what are they called anyway? I can't think of anything better than "set similes", which sounds horrible. Widsith 07:30, 16 September 2007 (UTC)[reply]

Some possibilities: famous similes, infamous similes, very common similes, widely-known simile, {{idiomatic simile}}. I'm not sure it merits a separate category...the "catsect" (category intersection) tool should be able to show {{idiom}} & {{simile}} intersections. --Connel MacKenzie 07:42, 16 September 2007 (UTC)[reply]

It seems like it's obvious from the form of the expression that it comes from a simile, so I think just labeling it {{idiom}} should suffice. (And I suspect that neat as ninepence and happy as Larry also warrant {{UK}}; at least, this Midwesterner isn't familiar with them.) I do think it would be nice to have a Category:English similes; it doesn't seem that the "set" part needs to be included in the name, because it's a given that we only include set similes. —Ruakh_TALK06:53, 18 September 2007 (UTC)[reply]

Script templates

Hi everyone. I've been so busy with other things lately that I've not noticed that script templates have been edited. I need someone to explain to me why the script templates have been redirected, altered, etc. For example, the URchar template now redirects to the Arab template. I have not been able to find any conversation regarding this (except the little mention of this on the Grease pit last month). I understand that to many English speakers Arabic and Urdu might be the same thing - however they're not. So, someone please explain this. I've edited thousands of entries using these templates. It makes me mad that no one has contacted users that use these templates very frequently to discuss this with them. --Dijan 16:42, 16 September 2007 (UTC)[reply]

It was discussed (rather shortly) in the WT:GP, and DAVilla immediately went and started re-arranging them to match the ISO script name codes (e.g. {{Arab}} for Arabic script). However he did a number of things badly, in way too much of a hurry. For example replacing {URchar} with a redirect to {Arab}. Which is wrong, Nastaliq should be {{ur-Arab}}. (The standard IETF/XHTML tag.) I will fix it. Note that Urdu written in Devanagari would use {{Deva}}, unless a script variant is needed there? I don't think so. Robert Ullmann 17:08, 16 September 2007 (UTC)[reply]

Thank you Robert! Yes, {{ur-Arab}} would be more accurate. No, a script variant for Devanagari would not be necessary. {{Deva}} would be just fine. Also, this is not just about Arabic script, but also about Cyrillic and others. For example, present {{Cyrl}} template seems to be designed specifically for Russian - it uses fonts designed to support Russian Cyrillic. Anyway, in the end, all we've done is just renamed (standardized) the templates according to script name, correct? Thank you so much for replying! Makes life a lot easier. :) I know I do not participate in many conversations here, but I would appreciate if people contact me or Stephen (who also uses them very frequently) when it comes to these templates. Thanks, again. :) --Dijan 17:21, 16 September 2007 (UTC)[reply]

Thank you, Robert, for resolving that (and catching other mistakes like mismatched CJKV). I had specifically named Nasta`līq script after making the transition as not mapping 1-to-1, but didn't know how to follow through.

How many of these variants are we going to need? Don't a number of them map equivalently (ur, ps, fa)? I can't imagine how their use would be regulated. Couldn't we pass the language code to the script template and have it do something special if needed? For instance, if we always wrapped the language, then it would be possible to boldface Latin but italicize English. On the other hand, it doesn't take a script template to do that. DAVilla 18:59, 17 September 2007 (UTC)[reply]

The thing is that the specific template for the language (ur-Arab) can then add that to the HTML: lang="ur" xml:lang="ur-Arab" (think I've got that right), and then a browser can do user specified font selection. We don't need a lot of these; Arabic and CJKV are the only serious cases. The set of tags used is maintained by IANA. (And with several previous systems, is very complicated ;-). We just need to worry about the cases we find we need. Robert Ullmann 13:08, 21 September 2007 (UTC)[reply]

Wouldn't we always want to wrap the HTML with this?

<span xml:lang="{{#if:{{{lang|}}}|{{{lang}}}-Script" lang="{{{lang}}}|Script}}">

DAVilla 18:35, 21 September 2007 (UTC)[reply]
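For readers who don't read ParserFunctions fluently, the branching in that one-line span can be sketched in Python. This is only an illustration of the logic as written above: the function names `script_span_attrs` and `wrap` are hypothetical, and nothing here is an existing template or MediaWiki API.

```python
from html import escape


def script_span_attrs(script, lang=""):
    """Mirror the #if: branch of the span above.

    With a language code: xml:lang="<lang>-<script>" lang="<lang>".
    Without one: only xml:lang="<script>".
    """
    if lang:
        return f'xml:lang="{escape(lang)}-{escape(script)}" lang="{escape(lang)}"'
    return f'xml:lang="{escape(script)}"'


def wrap(text, script, lang=""):
    """Wrap text in a span carrying the attributes computed above (hypothetical helper)."""
    return f"<span {script_span_attrs(script, lang)}>{escape(text)}</span>"


print(wrap("اردو", "Arab", "ur"))
print(wrap("اردو", "Arab"))
```

The point of the design is that a single script template could serve both cases, emitting the language-specific tag only when a caller supplies a language code.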

Spanish grammar tags

Some healthy discussion about the appropriate grammar tags to use for Spanish entries has been taking place in various places, some on wiki and some elsewhere. In order to gather the appropriate input, let's continue the discussions at Wiktionary talk:About Spanish#Third-person verb form definitions and Wiktionary talk:About Spanish#Present participle grammar tags. Rod (A. Smith) 01:41, 19 September 2007 (UTC)[reply]

download wiktionary

Is it possible to download the English Wiktionary database in a basic SQL or txt format? — This unsigned comment was added by 87.65.86.44 (talk) at 00:34, 20 September 2007 (UTC).[reply]

Have a look at http://download.wikimedia.org/enwiktionary/. I doubt anything there qualifies as "basic", though. Mike Dillon 03:33, 20 September 2007 (UTC)[reply]

For various technical reasons I don't pretend to understand, the major database dumps are no longer available in SQL format. You have to download them as (compressed) XML and convert to SQL using the utility of your choice. --Visviva 07:10, 20 September 2007 (UTC)[reply]

The main concern, as I understand it, is that deleted entries are not removed from the SQL database, only flagged as deleted (so they can be restored if needed.) AFAIK, the "deleted" items are only actually removed periodically, whenever the WMF cluster runs out of space. --Connel MacKenzie 07:27, 20 September 2007 (UTC)[reply]

Don't forget to mention the existence of Special:Export. Mutante 14:47, 23 September 2007 (UTC)[reply]
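As a rough illustration of the XML-to-SQL conversion mentioned in this thread, here is a minimal Python sketch that streams pages out of a MediaWiki-style XML export and loads them into SQLite. The two-page sample, the `page` table layout, and the function name `load_dump` are assumptions made for the example; a real dump is compressed (one could pass `bz2.open(path, "rb")` as the stream) and contains many more fields than title and text.

```python
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

# Tiny stand-in for an export file; real dumps declare an XML namespace,
# which local() below strips so the same code handles both.
SAMPLE = b"""<mediawiki>
  <page><title>foo</title><revision><text>==English==</text></revision></page>
  <page><title>bar</title><revision><text>==English==
A bar.</text></revision></page>
</mediawiki>"""


def local(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def load_dump(stream, conn):
    """Stream <page> elements into a two-column table; returns the page count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page (title TEXT PRIMARY KEY, text TEXT)"
    )
    title, text, count = None, "", 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            conn.execute("INSERT OR REPLACE INTO page VALUES (?, ?)", (title, text))
            count += 1
            title, text = None, ""
            elem.clear()  # free the finished page so memory stays bounded
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
print(load_dump(BytesIO(SAMPLE), conn))
```

Streaming with `iterparse` rather than parsing the whole file matters here because a full dump is far too large to hold in memory at once.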

Template talk:audio

I just wanted to bring awareness of the new audio player. Please continue discussion on the talk page for Template:audio. --Steinninn 01:03, 21 September 2007 (UTC)[reply]

Yes, if you follow the [file] link, the audio file can be played using the Java media player on commons. --Connel MacKenzie 06:11, 21 September 2007 (UTC)[reply]

I agree with Connel's objections--especially the ugly blocky button. Using such a player doesn't make sense for the tiny pronunciation files we have. --EncycloPetey 12:51, 25 September 2007 (UTC)[reply]

CPT?

Just checking...I seem to recall we agreed a long time ago that having an entry for each CPT code was acceptable. That is still the case, right? Assuming that is so, is CPT 13160 an acceptable format? --Connel MacKenzie 06:13, 21 September 2007 (UTC)[reply]

http://www.ama-assn.org/ama/pub/category/3657.html Seems to be a problem. --Connel MacKenzie 06:58, 21 September 2007 (UTC)[reply]

Copyright issues aside -- although those alone are probably sufficient to scotch any program of inclusion --, this doesn't strike me as something that plays to our strengths as an open dictionary. There's just not that much that can be said linguistically about a single numeric code. --Visviva 07:24, 24 September 2007 (UTC)[reply]

Are standard templates too naked?

The Wiktionary:English_entry_templates typically have just two headings listed, but Wiktionary:Entry_layout_explained#Additional_headings lists so much more in the way of information that might be included. If the idea of the template is to standardize the layout, content and structure, it would seem to be more advantageous to have the standard template be all-inclusive. One would naturally instruct users to trim bits that were unneeded, but it might give cause for users to give greater thought to the content of new additions. I'm a newbie, so insight I'm missing would be most welcome. -Iggynelix 22:26, 21 September 2007 (UTC)[reply]

I've tried an all-inclusive headings version, but entries just aren't started that way. DAVilla 13:39, 23 September 2007 (UTC)[reply]

I've never really understood why these aren't set up as standard substable templates, so that I could just type in

{{subst:new en noun|etym=[[foo]] + [[bar]]|definition=A bar with Foovian characteristics}}

or some such thing, and generate the properly-formatted entry in one swoop... Such an approach would allow ParserFunctions to be used, so that the "Etymology" section (or whatever) would be generated only if the user specified a value for it. But there is probably a reason why this has been avoided... --Visviva 07:54, 24 September 2007 (UTC)[reply]
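The conditional-section idea can be sketched outside wiki syntax. The Python below is only an analogy for what a subst'ed template with ParserFunctions would do: the function name `new_en_noun` echoes the hypothetical template call above, and the exact headings and the `{{en-noun}}` headword line are assumed for illustration rather than taken from any existing template.

```python
def new_en_noun(definition, etym=""):
    """Build entry wikitext, emitting the Etymology section only when etym is given."""
    lines = ["==English==", ""]
    if etym:
        # Equivalent in spirit to wrapping the section in {{#if:{{{etym|}}}|...}}
        lines += ["===Etymology===", etym, ""]
    lines += ["===Noun===", "{{en-noun}}", "", f"# {definition}"]
    return "\n".join(lines)


print(new_en_noun("A bar with Foovian characteristics", etym="[[foo]] + [[bar]]"))
```

Calling it without `etym` yields an entry with no empty Etymology heading, which is the behaviour the comment above is asking for.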

You mean like {{new en noun bot}}? Etymology would be a nice addition there. --Connel MacKenzie 08:08, 24 September 2007 (UTC)[reply]

Hey, that's a handy template. Actually I was thinking of something further along that road, along the lines of User:Visviva/new en noun, using subst'ed ParserFunctions to add or remove sections. (I'm still having some issues with whitespace handling in that draft.) --Visviva 08:56, 24 September 2007 (UTC)[reply]

Yes, all the "new" prefixed templates are supposed to have " bot" suffixed equivalents. If you figure out how to do equal signs within the templates...your thing might work. (The syntax would get pretty cumbersome with one #if: per line, but at least it might work.) --Connel MacKenzie 09:01, 24 September 2007 (UTC)[reply]

Noting lemma forms in WT:ELE

Conversation moved to Wiktionary talk:Translations/Noting lemma forms in WT:ELE for easy reference from here or Wiktionary talk:Translations. Please continue the conversation in either location. Rod (A. Smith) 22:25, 3 November 2007 (UTC)[reply]

Future of the dictionary

Hi guys, did you see this talk? Seems to me this is exactly where we are heading… H. (talk) 08:18, 25 September 2007 (UTC)[reply]

Yes, it's been posted before but from YouTube. --EncycloPetey 12:47, 25 September 2007 (UTC)[reply]

At Wiktionary talk:Main Page#Etmyology [sic] a while back. Excellent stuff...I should have thought of mentioning it here. --Connel MacKenzie 06:40, 26 September 2007 (UTC)[reply]

RFT entries

I think it's wonderful that we have these {{rft}} tags and that the entries show up on WT:TR but I feel that as long as something remains on the rft list: the reason for that status should probably remain in the TR, a link to the archived TR topic should be available, or at least some clear information about how to find the relevant discussions should appear on the entry's history or discussion pages. Once again, I admit I could be overlooking something. The specific entry I'm concerned with is Anglosphere & I raised this in WT:TR#Anglosphere. Thecurran 20:16, 25 September 2007 (UTC)[reply]

Indeed, once the discussion has closed/been archived the tag should be removed. I suspect that the problem is that TR discussions don't have the same character as deletion or verification discussions: they aren't "closed" but simply shuffled off into nospace after a decent interval, and many discussions don't pertain to a specific entry, so tag-removal isn't an obvious part of the archiving process. Cleaning up leftover tags should be a bottable task, though -- the bot could scan Special:Whatlinkshere/Template:rft and remove the template from those articles that don't have an incoming link from TR. --Visviva 04:57, 26 September 2007 (UTC)[reply]
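The core of such a cleanup bot is small. The Python sketch below assumes the two page lists have already been fetched somehow (pages transcluding the tag, and pages linked from the Tea room); the function names and the regular expression are hypothetical, and no real bot framework or MediaWiki API call is shown.

```python
import re

# Matches {{rft}} or {{rft|...}} plus one trailing newline; other template
# names that merely start with "rft" are deliberately not matched.
RFT = re.compile(r"\{\{\s*rft\s*(\|[^{}]*)?\}\}\n?", re.IGNORECASE)


def stale_rft_pages(tagged, linked_from_tr):
    """Pages that carry the tag but have no incoming link from the Tea room."""
    return sorted(set(tagged) - set(linked_from_tr))


def strip_rft(wikitext):
    """Remove the tag from a page's wikitext, leaving the rest untouched."""
    return RFT.sub("", wikitext)


print(stale_rft_pages(["Anglosphere", "foo"], ["foo"]))
print(strip_rft("{{rft}}\n==English==\n"))
```

A real run would presumably also want a dry-run mode and an edit summary pointing to the archived discussion, but those depend on whichever bot framework is used.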

Well, {{rft}} is quite recent, compared to how long the tea room has been around. But yes, the tea room discussions pertain to single specific entries only about half the time. The ones that do should be archived to the page's talk page. But, as Visviva points out, there isn't usually a "this is closed/answered/resolved" concept for the tea room. Perhaps any conversation inactive for a month or two should just move to the respective talk page (if there is one) or a tea room archive page, otherwise. It is now about 1/4 MB, so better archiving should start to be considered for it. --Connel MacKenzie 06:49, 26 September 2007 (UTC)[reply]

Proposal to expand word associations

I don't understand why Wiktionary limits itself to information that can be found in many other dictionaries without taking advantage of the virtually unlimited room that a computerized database offers. As it stands, each entry presents brief and often uncomprehensive definitions, a smattering of examples to show usage, and a list of synonyms. Wikisaurus also takes the "list" approach, requiring users to click on each item to see its definition on a separate page.

I would like to see Wiktionary and Wikisaurus combined, so that I can find all information about a word in one place. But more important, I would like to see a more expansive approach to incorporating related words having a variety of relationships to the primary term, along with a phrase describing the relationship in each case. Entries and related words would be grouped logically according to their meaning. The result would be a readable text in which both poets and technicians could easily find and grasp meanings that revolve around the primary term. And more editors would be inclined to contribute if they could add to a freeform text rather than just the constricted, formal approach of the current version.

This approach would undoubtedly make for many lengthy entries, but why not? Imagine an entry for "ball" that cites not only the various types of ball but also the sports that use a ball and the various plays in which a ball figures ("address" in golf, "alley" in bowling, "at bat" in baseball), with a phrase defining the connection ("To address a golf ball is to take one's stance and adjust the club preparatory to hitting it").

Here is an example of such an entry, in which words based on "abject" are grouped according to their meaning rather than their part of speech. Each of the related words (italicized here) would contain a link to the article containing its full definition. Obviously, the format could be improved. Any believers? Fbarw 23:23, 25 September 2007 (UTC)[reply]

abject 1 a (1) [adj] : cast down in spirit; without spirit or pride; cringing, groveling, servile, subservient

abjective [adj] : tending to make abject
abjectly [adv] : in an abject manner

One that is servilely abject is fawning. Slavish may connote abjectness. Subservient implies compliance and obedience, perhaps abject. An abject servility or obsequiousness is servilism. To cower is to shrink away or cringe, usually in abject fear of something menacing. One may crawl by advancing abjectly. Grovel implies a crawling or wriggling close to the ground, as in abject fear. An abject parasite or toady is a lickspittle. One may adulate someone by admiring or being devoted abjectly to that person. Humble may suggest an abject attitude and demeanor. Modest is without any implication of abjectness. Mister, used in direct address and not followed by the given name or surname of the man addressed, sometimes expresses abject deference (as of a beggar).

abject 1 a (2) [trans verb] obs : to cast out; reject

abject 1 a (3) [noun] : one cast off; outcast
abjection 1 [noun] : a casting out or off; rejection
abject 1 a (4) [trans verb] obs : to cast off

abjection 2 [noun] : the discharge or casting (as of the spores of certain fungi)

abject 1 b [adj] : unrelieved by any sign of independence, courage, or originality; showing utter resignation; hopeless, helpless, supine

Someone craven is characterized by abject defeatism. Someone puling is of an abject nature. Pusillanimous connotes abjectness. Recreant implies abject lack of resistance. Supine suggests lethargic abjectness. To lie down is to submit abjectly to defeat, disappointment or insult. Superstition may be an irrational abject attitude of mind toward the supernatural, nature or God. Something done poorly is done abjectly [arch].

abject 2 a [adj] : sunk to or existing in a low state or condition; underfoot

abjection 3 [noun] : a low or downcast state; degradation, humiliation

Dirt and ruin [arch] are abject states.

abject 2 b [trans verb] obs : to cast down; abase

abjection 4 [noun] : the act of making abject; humbling

To reduce someone to abject poverty is to pauperize that person.

And how will people with limited English make use of this? What will happen when a single page contains information for several words spelled differently, and must cope with several languages as well? The approach you've described looks more like a print dictionary than what we have here. Look at an entry such as parrot to see a well-filled out English entry. Look at ser to see how we accommodate the fact that the spelling "s-e-r" occurs in multiple languages. --EncycloPetey 06:48, 26 September 2007 (UTC)[reply]

Actually, it looks similar to one of the proposed Wikisaurus formats. As I recall, that style didn't have any opposition, but it also had no one willing to go to that great an effort, either. (The difference was that the proposed/abandoned format covered only "Wikisaurus:abject" in the above example, but would have separated "Wikisaurus:abjection" onto a separate thesaurus page.) --Connel MacKenzie 07:02, 26 September 2007 (UTC)[reply]

This corresponds somewhat to #Future of the dictionary above, where the speaker talks about "clickiness" and the similarity between outdated print dictionaries and online dictionaries that have not built to their potential. It also relates to a previous request to highlight a particular definition on a page. This is useful information, but is there any way to reference it without having to duplicate it everywhere? Can we transclude {{abject&def=ae923fc42llc47bf}}? DAVilla 16:37, 27 September 2007 (UTC)[reply]

OmegaWiki uses WikiData to implement the relational structure that is required for such an endeavor. Without WikiData or some similar extension, it's pretty difficult to support the complex transclusion structure that would be required under out-of-the-box MediaWiki. Rod (A. Smith) 23:27, 27 September 2007 (UTC)[reply]

Some replies to the above comments:

For EncycloPetey: People with limited English (and aren't we all limited?) will type "abject" in the search box and be directed to this page. However, even with a phrase showing how the entry term is linked to the associated word ("Slavish may connote abjectness"), the user may still want to click over to the "slavish" entry to see what other meanings it may have, as the entries for italicized words do not necessarily convey the full definition of those words. An element of this approach is that the description for each associated word is tailored to include only the part of its meaning that ties it to the main entry.

Another element is that words based on the entry term ("abjective", "abjectly", "abjection") appear on the same page with their full definition(s). If a secondary term relates to only one (or several) of the meanings of the primary term, it is placed in close proximity to that definition ("abjection", meaning a casting out or off, follows "abject", to cast out). "Abjection" would not have a separate page, and a search for it would lead to this page. Although my example does not cover this, the full entries for all synonyms of "abject" ("cringing", "groveling", "servile", "subservient") could also be included on the same page, allowing the user to compare the usage of each. The idea is to provide a database in which the definitions and associations for each term are classified by their meaning, yet each term is accessed simply by typing it into a search box.

Connel MacKenzie correctly notes that this approach would require "great ... effort". You bet! I have completed an entry for "ability" (including "able", "capable" and "capacity", among others) that weighs in at 2.8 megabytes (without images). But what are a few thousand megabytes to the Wiki community? Yes, it would look a lot like a print dictionary, but is that a crime? People who like to read dictionaries would love this one.

DAVilla is concerned about duplication. Yes, there would be a lot of it, but not having to click about so much would make such a database much friendlier for users.

For A. Smith: With each of the associated-word descriptions hand-tailored for each main entry, there would be less room for transclusions. I'm afraid this approach relies a great deal on human intervention.

I have not yet worked out how multiple languages would be dealt with, but this doesn't seem like an insurmountable problem; the "ser" example cited by EncycloPetey should work well with my proposal.

EncycloPetey calls attention to the "parrot" entry in Wikisaurus. First, the layout of that page is neat and appealing, no doubt. My sample needs better formatting, since my original document in Word does not transfer well into the Wiki editing format. All of the non-definitional material, such as etymology and pronunciation, should of course be retained. But because my approach classifies by meaning rather than parts of speech, the noun and verb meanings of "parrot" relating to repetition would be together, separate from the "bird" meanings. Also, the adjective meaning "of, resembling, or of the nature of a parrot" would be included, followed by derived terms such as "parrot fever" and "parrotfish". Most importantly, all of the various (significant) types of parrot would be listed, organized according to their scientific classification, together with full definitions (in this case brief descriptions of appearance, habitat and idiosyncrasies). Distinguishing features would also be included separately, such as "cere", a soft swollen mass, often feathered in parrots, through which the nostrils open at the base of the upper mandible. All of this might duplicate some information in Wikipedia, but the emphasis would be on individual words rather than essay-length discourses.

Thanks for your comments. Can anyone direct me to a project that might find this proposal useful? Fbarw 21:53, 3 October 2007 (UTC)[reply]

I can't answer your question, but it seems to me that your approach would benefit greatly from standard formats for long entries that hid information 'under' buttons so that an initial screen showed what people most commonly needed and showed the complete range of content for the entry. I don't know what combination of show-hide buttons (size?), lead, table of contents, and format rules would do the job, but there ought to be a way. DCDuring 14:49, 9 October 2007 (UTC)[reply]

Bot vote: Interwicket

Please see Wiktionary:Votes/bt-2007-09/User:Interwicket. A new 'bot that is much more efficient than the bot code designed for the 'pedias. Make sure you look at User:Interwicket/code if you are at all interested in how it works at present; understand that it is a work in progress. (You might like to look at User:AutoFormat/code too, although it is not relevant to this.)

We haven't had a working iwiki bot since July; User:VolkovBot was going to run, but only ran on the 15th, with a few edits on the 20th. We need a better bot with less overhead; we'll see how it goes. Robert Ullmann 23:36, 25 September 2007 (UTC)[reply]

Important notes: (see WT:GP for more info) this is far bigger than anticipated; it turns out that bots running interwiki.py on Special:Newpages or whatever have missed something like 1/4 of a million iwikis. Once they miss it, it is not recovered. I am trying to run code to do some of them; but it takes a while: today (last 24 hours) I have had 3 threads running at/from "d", "m", and "s"; none has made it out of its start letter.

A couple of people have commented on the name, please do. Do note that because of the number of edits, renaming is not a good idea (large overhead, even though User:UllmannBot has been doing a lot); but creating a new name would be fine. Robert Ullmann 23:09, 30 September 2007 (UTC)[reply]

Wiktionary is not a usage guide

I'm surprised there are not more issues related more directly to the dictionary/grammar/style and usage guide distinctions in Wiktionary:What Wiktionary is not.

See this diff on enormity. Just like any proper dictionary, it is not our place to occult senses that exist, but happen to be disputed, sometimes hotly so. Although it is part of our duty to note where disputed usage exists (hence the recent creation of {{proscribed}}), we should not make factual statements about errors of meaning when semantic change is real and acknowledged (although likely denounced) by authorities.

All this to say, what do you think of adding a "Wiktionary is not a usage or style guide" entry to What Wiktionary is not? Circeus 18:54, 26 September 2007 (UTC)[reply]

Writing the text for that will be very difficult, since we do want information about usage here. When use of a word stamps the user with a particular regionalism, social class, level of profanity, or educational level, we would like the user to know about that. However, we don't exclude words just because they are vulgar, stigmatized as incorrect, or are likely to incite anger. --EncycloPetey 03:33, 27 September 2007 (UTC)[reply]

My point has to do with dispute over fluctuation in usage, especially definitions (mostly stuff like what is found on w:List of English words with disputed usage). We should not say "definition x is wrong", but rather "many usage writers strongly feel that usage x is semantically/grammatically wrong", or "usage x is the object of disputes amongst usage writers". And we should certainly not eliminate a disputed definition altogether, as was done in enormity. That would be what we Frenchies call an énormité. Hey, Petey! I didn't know you hung around here too. Circeus 05:25, 27 September 2007 (UTC)[reply]

Indeed, that is the approach we've always taken. Actually, that's not entirely true; previously, disputed items were deleted. The tags and usage notes are an essential part of making Wiktionary live up to "all words in all languages" - without them, we'd be back to mass-deletions. --Connel MacKenzie 20:15, 15 October 2007 (UTC)[reply]

In other words, we must have a neutral point of view on our subject (language). Lmaltier 05:38, 27 September 2007 (UTC)[reply]

Actually, that's an excellent way to put it. *Snickers* I should have thought about it myself. Circeus 05:49, 27 September 2007 (UTC)[reply]

There is a policy under discussion: Wiktionary:Neutral point of view. For those interested, there is also an adaptation of the general Wikimedia policy to the French Wiktionary (an adaptation was needed because this policy was written with Wikipedia in mind, but the principles are sound): fr:Wiktionnaire:Neutralité de point de vue. Lmaltier 20:46, 27 September 2007 (UTC)[reply]

Thanks for linking to fr:Wiktionnaire:Neutralité de point de vue; it's an interesting read, and IMHO significantly better than Wiktionary:Neutral point of view. That said, neither one addresses my biggest question about applying NPOV here, which is how we ought to handle cases where different POVs imply different structures for an article. If different sources disagree about whether two uses of a word are etymologically related, how do we handle that? Similarly, if different sources disagree about the appropriate POS for a word, how do we handle that? We don't have any mechanism for ambiguously structured entries. (Perhaps both Wiktionnaire's document and ours avoid this question because no one has a satisfactory answer yet. ;-) —Ruakh_TALK 22:17, 27 September 2007 (UTC)[reply]

In the first case, there seems to be a need for separate etymology sections. Separating what could have been grouped cannot be wrong, if there is a reason to do so. In the second case, why not include both, again with appropriate comments? If the divergence is only about the name, the most usual name should be chosen. Lmaltier 16:46, 28 September 2007 (UTC)[reply]

Traditionally, disputed senses have gone to WT:RFV (I see no major flaws with continuing that.) Questions about etymology or references have typically gone to WT:TR. (That too, still seems tenable.) For multiple etymology issues, it really hasn't been a particular point of contention yet. As User:Lmaltier indicates, splitting into multiple etymologies when justified (despite how distasteful it sometimes may be) has been the general tactic used to resolve disputes. I agree it seems likely to become an issue as Wiktionary grows. It might be wise to advocate public domain references over copyright-protected sources, in general. --Connel MacKenzie 20:15, 15 October 2007 (UTC)[reply]

Rendering Afroasiatic scripts

I know that I can't read Hebrew well with my current w:Firefox setup, because all the nequdot are displayed separately from their letters. This really hurts dagesh usage. This may be why I can't read the following properly. I want to know how to get a better view like I had in w:MSIE.

I just noticed that in both Allah and Hezbollah, I see each letter displayed separately, but normally when I type الله (Allah) in Arabic word processors, the single Unicode glyph U+FDF2, ﷲ, is displayed; even in bismi l-lāh. Now, has my browser got it wrong, my understanding got it wrong, or have the writers got it wrong?

— This unsigned comment was added by Thecurran (talk • contribs) at 19:01, 26 September 2007 (UTC).[reply]

In both Allah and Hezbollah the glyphs are displayed. However, they are a little bit modified here due to the Arabic fonts used by the {{Arab}} template. --Dijan 20:19, 26 September 2007 (UTC)[reply]

As for the Hebrew, it turned out that both Windows itself and the standard Windows fonts up to Windows 2000 all had a wrong implementation of Hebrew script. They depended on the opposite order of dagesh and nikud from the one specified in the Unicode standard. When the MediaWiki software implemented Unicode normalization, all the Hebrew that users had entered in the order that worked with the broken Windows was corrected, but the side effect was that everybody still using a broken version of Windows now saw broken Hebrew text. Windows XP has fixed rendering software and fixed Hebrew fonts. As a bonus it also renders old Hebrew text that was not normalized correctly. No upgrade was made available for older versions of Windows.

The developers have discussed adding an option to reverse this part of Unicode normalization for users without Windows XP but with so much other stuff to do it's still waiting. —Hippietrail 23:13, 26 September 2007 (UTC)[reply]
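The reordering discussed above can be observed with Python's standard `unicodedata` module. This is only a minimal sketch: the sample letter and points are chosen for illustration, and the dagesh-first input order is an assumption about how affected text was typed.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ
PATAH = "\u05B7"   # HEBREW POINT PATAH (a vowel point)

# Assumed input order: letter, then dagesh, then vowel point.
typed = BET + DAGESH + PATAH

# Canonical ordering sorts combining marks by combining class,
# so the vowel point (lower class) ends up before the dagesh.
normalized = unicodedata.normalize("NFC", typed)


def code_points(text):
    """List the code points of a string in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


print("typed:     ", code_points(typed))
print("normalized:", code_points(normalized))
print("combining classes (patah, dagesh):",
      unicodedata.combining(PATAH), unicodedata.combining(DAGESH))
```

Software that expects the dagesh first would therefore see the two marks in an unexpected order after normalization, which fits the symptoms reported in this thread.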

I use Firefox and XP. Under "Control Panel" > "Date, Time, Language, and Regional Options" > "Regional and Language Options" > "Languages" > "Supplemental language support", if I have "Install files for complex script and right-to-left languages (including Thai)" unchecked, then I get what you describe, with nikud coming after the letter, as though there were a non-breaking space there. If I have it checked, then it works properly. I make no promises for you, but when I pointed this out in the relevant Bugzilla entry, other users commented that it solved the problem for them as well; you should try it. (Fair warning, though: you may, depending on your setup, need your XP installation CD in order to install these files. Also, the bug has been fixed in the trunk, so if you're keeping up with Firefox updates it shouldn't be terribly long before the problem fixes itself for you anyway.) —Ruakh_TALK 01:46, 27 September 2007 (UTC)[reply]

Quoting quotations.

Sometimes, as in all three quotations at who shot John, we obtain a quotation from a secondary source. In some cases, including the aforementioned, the secondary source has the entire quotation, in which case all is well; we identify the actual date and source of the quotation, add "quoted in", and identify where we got the quotation from. But in other cases, such as at Necronomicon, the secondary source has only part of the quotation, so the above approach is unsatisfactory: ideally, we'd like to include some of the context from the secondary source, and obviously we can't attribute that content to the original source. Nonetheless, the word itself was used by the original source, and the secondary source is only quoting it. I don't see a good way to handle this; does anyone have any thoughts? —Ruakh_TALK 19:27, 27 September 2007 (UTC)[reply]

For RFV purposes, you are suggesting it is OK to reference the quotation in another secondary source? I think that is a bad idea. But for inclusion in the actual entry, there is no way to justify reusing the same quotation, as one found in another secondary source. --Connel MacKenzie 21:14, 27 September 2007 (UTC)[reply]

Wait, did you visit the articles I linked to? It's not like I'm talking about taking quotations from other dictionaries; I really don't see what the problem is. (In particular, the phrase "reusing the same quotation" doesn't seem to apply.) —Ruakh_TALK 22:06, 27 September 2007 (UTC)[reply]

No, I hadn't. It was not at all clear you were talking about spoken quotations that have been transcribed and printed in a regular book. Generally, "secondary sources" can mean that, but it is far enough outside the normal manner in which we use that term here on Wiktionary that I was misled. It is also weird that you'd choose troublesome quotations, when so many better ones are easily available. FWIW, when I asked in April, the definition given didn't make sense. Being well before the current atmosphere on RFV, such questions used to be dealt with less formally...i.e. just the link to Google above, plus the rewrite, would probably have been sufficient. Anyhow, thank you for the citations. I see no need to encourage the "quoted in" style variant you used there. It would not be wise to prohibit it, though. --Connel MacKenzie 02:02, 28 September 2007 (UTC)[reply]

Usually I give priority to the newest and oldest relevant and vaguely-useful hits on b.g.c., and if they're not terribly great cites, then the third cite I add will be the clearest and most useful I can find. In some cases, however, I include one cite because I think it's the oldest, and then I find an older one (or I simply realize that my first wasn't as old as I thought — b.g.c. misdated it — but I've already typed it up and see no reason to throw it out), such that I end up choosing all three cites for their date or apparent date rather than for their relative usefulness. If you'd like to add additional, better cites, then by all means please do so. In particular, who shot John seems to have quite a range of meanings, and accordingly it would be nice to have a more representative sample of quotes. —Ruakh_TALK 02:31, 28 September 2007 (UTC)[reply]

I think the approach in who shot John is appropriate: identify the original speaker as closely as possible, and then identify the work in which they are quoted (although FTR I think that listing an "author" for a set of hearing transcripts is a bit misleading). --Visviva 01:34, 28 September 2007 (UTC)[reply]

My two cents: It is certainly true that we should prefer the original source of a quote in most cases. I would rather cite a quote attributed to Shakespeare directly from the play or sonnet in question rather than through a secondary source. However, there are times when citing the secondary source is not only legitimate, but preferred. In cases where someone is quoting an oration, we cannot always rely on having an accurately spelled and punctuated original. The secondary source, while quoting someone, is primary in the sense that it presents a printed version. Newspaper articles are a case in point, where a journalist reports what someone said. It is certainly possible that what the person is alleged to have said was misquoted, but as a dictionary organized by spellings of words, we are interested in print citations. In sum, there are times when it is perfectly appropriate or even desirable to cite a quotation through a secondary source. As Ruakh has noted, sometimes the secondary source provides additional context not present in the original, and can result in a new shade or meaning. Such quotations are not rendered invalid by virtue of being secondary. --EncycloPetey 04:55, 28 September 2007 (UTC)[reply]

I hate having to cite these because I don't know any other approach to doing them. DAVilla 14:45, 5 October 2007 (UTC)[reply]

Place names: toward a functional compromise?

OK, so I'm sorting out the March 2007 RFDs, and I come to this pocket of place name articles. These are reasonably well-formatted entries which have gotten a lot of attention from various editors; it would be a shame to delete them outright. On the other hand, the current wording of WT:CFI unambiguously bars the vast majority of place names, and the only acknowledged exceptions to that wording involve "too-prominent-to-exclude" cases like France. Support for loosening these criteria is far from unanimous, and no actual revision to the CFI is currently in prospect. To complicate the situation, Appendix:Place names is quixotically structured to simply link to entries in the mainspace, meaning that it will always be either perversely incomplete or perversely filled with redlinks to entries that can never be created.

For today, I've been moving the entries in question to Appendix:Gazetteer, because of the structural incompatibility, but I think there is a better solution: Restructure Appendix:Place names and sub-appendices to point to subpages of Appendix:Place names as a matter of course. When an otherwise adequate placename entry is found to fail CFI, move it to Appendix:Place names/Foo and link appropriately. (So for example the entry for Abakan would be at Appendix:Place names/Abakan and linked from Appendix:Place names in Russia.) For placenames which currently meet CFI, create Appendix:Place names/Foo as a redirect. (That way, editors can be sure that if a place name has an entry somewhere, it can be reached through the appendix).

Basically, I'm not proposing any changes in what we currently exclude and include, just grasping for a solution that all parties can live with. --Visviva 06:36, 28 September 2007 (UTC)[reply]

No objection from me. I've said before that appendices are a fine place for placenames [3], and others have argued for that as well. We can make them searchable, and prominently link them, and so on. The problem with that solution at the moment though, is that while it is already perfectly within policy and acceptable, the ambiguity caused by people that support no restrictions on placenames, or something similar, means that such within-policy actions like moving a placename to an appendix is bound to be controversial; as RfD nominations simply following our CFI have been in the past. Of course, I might be wrong, and if so, I'll be the first to help with the appendicizing of appropriate articles. Dmcdevit·t 07:08, 28 September 2007 (UTC)[reply]

I agree 110% with Dmcdevit's comment. I do not, however, agree with the proposal that "For placenames which currently meet CFI, [we] create Appendix:Place names/Foo as a redirect"; for Appendix:Place names/Foo to be useful, it would need to be able to include non-CFI-meeting senses of CFI-meeting placenames. —Ruakh_TALK 12:17, 28 September 2007 (UTC)[reply]

That's a good point... I guess a soft redirect of some kind would be the best solution (not sure exactly how to format it, though). --Visviva 12:53, 28 September 2007 (UTC)[reply]

Sure. I think all that was meant by that comment was that we wouldn't omit CFI-meeting placenames from the Appendix articles, since then it might lead to them actually becoming less visible due to inconsistency in finding them. If redirects are conflicting with the non-CFI placenames, then we'd simply replace the redirect in the Appendix namespace with an article with all placenames. But if there are only CFI-meeting placenames, then the redirect might be the simple solution, or we could duplicate the content. But that's probably not the most crucial part of the proposal. Dmcdevit·t 12:57, 28 September 2007 (UTC)[reply]

Direction on form-of templates

Could those of you who have been testing different styles of form-of templates comment on them? Have we decided we want to prefix them with a language code, such as {{fr-conjugation}}, or is passing a language parameter like {{plural of|lang=fr}} still on the table? DAVilla 18:34, 29 September 2007 (UTC)[reply]

I'm still convinced that we need language-specific templates for most purposes. —Ruakh_TALK 19:09, 29 September 2007 (UTC)[reply]

To clarify: In the specific case of plural nouns, I think a non-language-specific {{plural of}} is probably the way to go. Indeed, we can have a more general nominal-inflection template that should work for many languages' noun and adjective needs, taking a mandatory argument for the lemma and a mandatory lang(uage code), plus optional arguments g(ender — m/f/n/c/____), n(umber — s/p/dual/trial/___), and c(ase), and perhaps value (positive/comparative/superlative). Even here, some languages will need their own templates (such as Celtic languages, with their initial consonant mutations, and Semitic languages, with their noun states). But for verbs, prepositions, and so on, I don't think we can find an approach that will work for many different languages. I'm not opposed to trying, though. —Ruakh_TALK 19:57, 29 September 2007 (UTC)[reply]
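To make the argument list above concrete, here is a minimal sketch of how such a general nominal-inflection template might assemble its definition line. The function name, output wording, and sample values are hypothetical; only the parameter names (lemma, lang, g, n, c, value) come from the comment above.

```python
def describe_form(lemma, lang, g=None, n=None, c=None, value=None):
    """Build a definition-line gloss from the arguments listed above.

    lemma and lang are mandatory; g(ender), n(umber), c(ase) and value
    are optional and simply skipped when absent.
    """
    if not lemma or not lang:
        raise ValueError("lemma and lang are mandatory")
    parts = [p for p in (value, c, g, n) if p]
    label = " ".join(parts) if parts else "inflected"
    return f"[{lang}] {label} form of {lemma}"


# Hypothetical usage: a genitive neuter plural of a German noun.
print(describe_form("Haus", "de", g="n", n="p", c="genitive"))
```

Languages with features outside this parameter set (mutations, noun states) would not fit such a function without extra arguments, which is the limitation the comment already notes.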

I could be happy either way, though I prefer a more generic template that passes a language parameter, so that we don't have to keep creating new language-prefixed templates; I'd rather have fixed base templates and fixed lang parameters than lots of individual language-specific templates. --EncycloPetey 19:11, 29 September 2007 (UTC)[reply]

I've been thinking about how to do form-of for Japanese, and I've come to the conclusion that it would be difficult to fit Japanese morphology into the English/Indo-European mold. For instance, depending on how it's done you either have "貸して" the joining-with-other-verbs form of 貸す, or "貸し" the particular-stem-for-certain-endings form of 貸す. (There are particular names for these, which I can never keep straight.) It would be nice to have a unified template, but I don't think it's feasible. Cynewulf 20:25, 29 September 2007 (UTC)[reply]

I see no problem there. If both deserve an entry (I have no clue), then templates should be created for both these particular terms. This might be particular to Japanese, but that is irrelevant. H. (talk) 17:19, 1 October 2007 (UTC)[reply]

Err.. oh, you mean if both should have an entry. Of course Japanese conjunctives, past tenses, etc. deserve an entry just as much as English plurals. The reason I mention two forms is that the English language books I've read describe "kashite" as a verb-conjunctive form, but apparently in Japanese schools the verb form is "kashi" only, and the "te" is something unrelated that just happens to appear in all cases when joining with verbs. The question here is whether we should create language-specific templates {{ja-te form of verb}} or general ones "conjunctive form of". I'm showing that in the latter case, I'm not aware of any other language that would use the "general" template. Similar languages can share templates, but Japanese isn't very similar. Cynewulf 17:06, 2 October 2007 (UTC)[reply]

What would you see as going on the definition line, specifically? The choice is between {{ja-conjugation|言葉|form=te}} or some abbreviation of that on the one hand, and on the other something like {{te of|言葉|lang=ja|pos=v}} or whatever would be correct in this case, {{conj of|...|言葉|lang=ja}} if you prefer that (and if it is specific enough). I consider form-of templates as specific as {{ja-te form of verb|言葉}} to be a nightmare, and if you doubt that take a look at all the ones for Finnish. DAVilla 20:29, 2 October 2007 (UTC)[reply]

Hmm, yeah, having to update ten thousand templates would get old after the forty-seventh. Unifying all Japanese forms in ja-conjugation sounds nice. I don't think doing "{{te of}}" would be a good idea -- if we do it like that, then pick an English name for it. I don't really have the perfect answers here. My main goals are maintainability for contributors and describing the form correctly and precisely to users. One thing that bothers me is using the general {{past of}} to describe past tenses, but using something like "ja-te form" for -te forms. So, I guess it's either find names for all the weird forms and create things like "conjunctive form of", or go with ja-conj (or ja-form, or ja-formdef, or some nice name) for all of Japanese. (For reference, you can get an idea of the forms by looking at 話す for verbs and 白い for adjectives.) My personal feeling here is that we can make general-purpose templates like {{plural of}} that get reused over most languages, but significantly different languages would get their own single-language all-forms template. Cynewulf 21:03, 2 October 2007 (UTC)

Well, you asked what I envision going on the definition line -- I hacked together {{ja-form of}} and put it on 言った as a prototype. See Appendix:Japanese verb inflections to get an idea of how different Japanese is. There are also similar inflections for i-adjectives, and very irregular ones for the copula だ/です, and other irregular things like ます. Now could everybody please tell me what's wrong with this implementation? Don't worry, I won't start using it everywhere right away. Cynewulf 17:52, 7 October 2007 (UTC)

I like the way you have laid it out in Wiktionary:Form-of templates. Often enough, related languages have the same categories. In that case it is not very useful to have different templates for them; a lang parameter suffices (see e.g. marche for French and Spanish). And if we have a rigid naming scheme as suggested there, there should be no problem. It might be necessary to decide on an order for the possible terms, e.g. 1) person 2) number 3) tense 4) mood 5) whatever else for verbs, but maybe a different order for nouns, etc.

Of course, it can happen that some template is used exclusively in one language, but that doesn't require us to prefix it with the language code. H. (talk) 17:19, 1 October 2007 (UTC)
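As an illustration only, the fixed ordering suggested above (person, number, tense, mood) could be sketched as follows. The function name `form_of_template_name` and the resulting template names are hypothetical and do not correspond to any existing Wiktionary template; the sketch only shows how a rigid order gives the same name no matter how the features are supplied.

```python
# Hypothetical ordering taken from the suggestion above: person, number, tense, mood.
FEATURE_ORDER = ("person", "number", "tense", "mood")


def form_of_template_name(features):
    """Join the supplied grammatical features in the fixed order, then append 'of'.

    `features` maps a feature kind (e.g. "tense") to its value (e.g. "present").
    Feature kinds outside FEATURE_ORDER are rejected rather than silently dropped.
    """
    unknown = set(features) - set(FEATURE_ORDER)
    if unknown:
        raise ValueError(f"features with no defined position: {sorted(unknown)}")
    parts = [features[kind] for kind in FEATURE_ORDER if kind in features]
    return " ".join(parts + ["of"])


# The order of the keys in the input does not affect the resulting name.
name = form_of_template_name(
    {"tense": "present", "person": "third-person", "number": "singular"}
)
print(name)
```

A separate order for nouns, as mentioned above, would simply be a second tuple chosen by part of speech.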

Quotation template

See User talk:Connel MacKenzie#Format question and User talk:Doremítzwr#Quotation template. Conversation on Wiktionary talk:Quotations#Quotation template please. :) Best regards 22:04, 29 September 2007 (UTC)

You've provided links to several places, but I'm still not sure what your question or comment is. --EncycloPetey 22:08, 29 September 2007 (UTC)

Now my question is, could you have a look at the different sizes and colors of the quotation marks here, please? :) Please comment on them. Technicalities have been solved. Best regards 20:31, 7 October 2007 (UTC)

They are blue at the moment, but it's still possible to compare the two colors on a test page. Best regards 12:22, 8 October 2007 (UTC)

Transliterating Greek vowels

I have posted a couple of questions at Wiktionary talk:About Greek/Transliteration about how to transliterate Greek vowels. In summary, how should ώ be transliterated? Some options, ranging from easiest to type to least ambiguous, are o, ó, ō, and ṓ. Rod (A. Smith) 19:35, 30 September 2007 (UTC)

I would say clarity beats ease of typing, so ṓ would get my vote as the standard. However, I've always preferred to use ẃ in my own notes. --Visviva 12:03, 1 October 2007 (UTC)

As a very minor issue, can the acute accent be made more central on the unitalicised characters? (Or is this an irrelevant concern, being as all transcriptions will be italicised anyway?) † Raifʻhār Doremítzwr 12:24, 1 October 2007 (UTC)

Unicode defines the codepoint for ṓ, which it calls "Small letter o with macron and acute". Your browser, operating system, and installed fonts then determine how to render that character. In my environment, it looks pretty much like I'd expect it. In your environment, does the mark appear farther to the left or farther to the right than you'd prefer? Rod (A. Smith) 16:19, 1 October 2007 (UTC)
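For anyone who wants to check what their own system is being asked to render, here is a minimal sketch using Python's standard `unicodedata` module. It prints the formal Unicode name of the precomposed character and the base letter plus combining marks it decomposes into. It says nothing about how a given font positions the accents, which is the part that varies between environments.

```python
import unicodedata

# The precomposed character discussed above (U+1E53).
char = "\u1e53"
print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# Canonical decomposition (NFD): base letter followed by its combining marks.
decomposed = unicodedata.normalize("NFD", char)
for c in decomposed:
    print(f"  U+{ord(c):04X} {unicodedata.name(c)}")

# Recomposing (NFC) gives back the single precomposed codepoint.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == char)
```

If the precomposed and decomposed sequences look different on screen, the difference comes from the font's handling of stacked diacritics rather than from the text itself.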

Much further to the left. Imagine if the left-hand sides of both diacritics were connected by a hinge, then you get an idea of how it looks at my end. † Raifʻhār Doremítzwr 20:49, 1 October 2007 (UTC)

Asian classifiers and measure words

Hi A-cai. I recently came home from Vietnam and I've been reading up a bit on the Vietnamese language and brought some of it here. I've noticed that the headings and categories etc for the various Asian languages using classifiers or measure words are not consistent. I'm on IRC #wiktionary if you are available to chat. —Hippietrail 11:52, 23 September 2007 (UTC)

I'm not sure that it is an issue that we have tackled as a group. A while ago, I added a mw variable to Template:cmn-noun. You can check out the "What links here" from the template page, to see some examples of where I've used it in the past. Of course, I'm open to suggestions for modifications if you have an idea for how to make improvements. --A-cai 12:00, 23 September 2007 (UTC)

Yes I noticed the measure word on some han nouns and thought it was a great idea. Sadly I don't know enough about Vietnamese to know which classifier to put with many nouns though.

What I was thinking about was Category:Classifiers which now contains some of these terms for Vietnamese and Thai and space for Khmer ones as well. But there are no corresponding categories for Chinese, Japanese, or Korean. Is it that the latter three languages use "measure words" and that those are not the same as "classifiers"? Or should we choose one term to use for POS headings and categories for all languages? If not, it would be a good idea to set up Category:Measure words for those three languages to match the classifier categories. —Hippietrail 12:10, 23 September 2007 (UTC)

I was just taking a look at the Wikipedia articles (classifier and measure word). It seems as though a measure word is one type of classifier. BTW, 自行车 is an example of Template:cmn-noun with the mw variable. I'll have to read the two Wikipedia articles more thoroughly. For now, I can say that the Mandarin term which describes the mw variable in 自行车 (辆) is called 量词 (lit. "measure word") in Mandarin. --A-cai 12:21, 23 September 2007 (UTC)

In Japanese we've been using Counter. They are also sometimes "count words". (And in English, "singulatives", like head, e.g. of cattle. ;-) We should settle on something in common. As you note, "classifier" is broader than "counter", so we may need both. Robert Ullmann 16:32, 24 September 2007 (UTC)

Based on how things are organized in Wikipedia, you were correct to use counter for Japanese (see: Japanese counter word). The word in Japanese is 助数詞, which literally means help with counting word. However, the Mandarin term is 量词, which means measure word (hence Chinese measure word). It seems as though the term counter word is a synonym for measure word, but I'm not sure if there are any subtle differences between the two (I can't think of any off the top of my head). One difference (from Japanese counter word):

The problem is partially solved for the numbers from one to ten by using the traditional numbers (see below) which can be used to quantify some nouns by themselves. For example, "four apples" is ringo yonko (リンゴ四個) where ko (個) is the counter, but can also be expressed using the traditional numeral four as ringo yottsu (リンゴ四つ). These traditional numerals cannot be used to count all nouns however; some, including people and animals, require the proper counter.

In Mandarin, you have to use a measure word even for numbers below ten. --A-cai 22:51, 24 September 2007 (UTC)

So have we come to a decision? Should I create Category:Japanese Counter words or Category:Classifiers? And what for Chinese and Korean? Along with the categories we need a POS section for each such term in each of these languages. Or should I bring the discussion to the Beer parlour now? —Hippietrail 13:00, 30 September 2007 (UTC)

It sounds like we could use Classifier as the overarching POS section header, then place an in-line parenthetical clarification at the head of definitions, just as we use (interrogative) and (personal) under the header of Pronoun. Does that work? --EncycloPetey 15:10, 30 September 2007 (UTC)

Sounds reasonable to me. However, I agree with Hippietrail that we should probably raise the issue at Beer Parlour, so that others can weigh in. --A-cai 22:02, 30 September 2007 (UTC)

Would one of you Asian-language experts like to bring the topic to the Beer parlour please? I think you'd do a better job than me. —Hippietrail 09:25, 1 October 2007 (UTC)

In Korean, these have traditionally been considered a form of noun (specifically "dependent" or "bound" nouns, 의존명사), rather than a distinct part of speech. But I assume this is not the case for all other languages, so a global Classifier heading seems reasonable. --Visviva 11:55, 1 October 2007 (UTC)

Unifying the header as "classifier" or anything else is fine with me. "yottsu" and the other native Japanese numerals are just a weird exception; they act as counters without being counters themselves. Cynewulf 16:52, 2 October 2007 (UTC)

It sounds like we've reached a consensus so I wonder if some of you would like to create an entry or POS section for one or two each of Chinese, Japanese, and Korean classifiers and post the links to them here. I'll set up some blank categories. Please look at some words in the existing Thai and Vietnamese classifier categories for some existing examples. Please also feel free to comment on anything necessary to unify formats that will work across all languages.

Does anybody know if Burmese and Tibetan also have such word classes, or any other major languages from this part of the world? What about Tagalog or Indonesian or Balinese?

As for "yottsu" I think Korean and Vietnamese also have alternate Sino words for some numbers that are only used in certain situations. But I could be wrong (-: —Hippietrail 02:33, 3 October 2007 (UTC)

I would say we need a heading Classifier for Thai etc; but for Japanese etc it should be just Counter, which we already use, and we already have Category:Japanese counters; these are not the same thing. (English also has a singulative, but we treat it differently.) The Chinese languages should also use "Counter" (not "measure word" which is Chinese-English translationese, like "number one Chinese food" ;-). Robert Ullmann 02:43, 3 October 2007 (UTC)

With respect to the English measure word, it is unclear whether it came from Chinese or the Chinese term came from it. It could be the latter, since the Wikipedia article for measure word seems fairly detailed, and touches on languages besides Chinese such as Russian and Bengali. --A-cai 11:40, 3 October 2007 (UTC)

I did a lot of Googling last night and it seems that in the field of linguistics "classifier" is the general term used to cover these types of words in all the mentioned languages. There are some great reference books. Classifiers: A Typology of Noun Categorization Devices by Alexandra Y. Aikhenvald and The World Atlas of Language Structures both stand out, and portions can be read on Google Books. It should be pretty well known by now here on Wiktionary that the POS categories used in linguistics and those used by dictionaries are not the same. My personal view would be to follow the established terminology as used by each language that has a dictionary tradition. How do big bilingual English dictionaries of Chinese, Japanese, and Korean treat these words and what terms do they use? On Wiktionary we seem to stick to dictionary-style POS in headings in articles but we have many categories with much more of a linguistics perspective. This generally adds detail and is a good thing. Given this, it seems that even if we choose to go with diverse terms as POS headings we might still go with a single term or a hierarchy of terms in our category structure. —Hippietrail 21:04, 3 October 2007 (UTC)

I checked several Chinese/English dictionaries. The term classifier was used as one of the translations for 量词 in several of them (→ISBN, →ISBN). These dictionaries also included other translations such as measure word and numerary adjunct. Unfortunately, most Chinese/English dictionaries use Chinese characters to indicate part of speech/category etc. For example, in The Pinyin Chinese-English Dictionary:

本 běn ... ⑩ <量> ［用于书籍、簿册等］: 两~书 two books/这部电影有十二~。This is a twelve-reel film.

<量> indicates that 本 is a 量词 (classifier; measure word). The stuff in the brackets (［用于书籍、簿册等］) is what we would call Usage notes here at Wiktionary. In English: "used for books, periodicals etc." The other two after the colon are example sentences. --A-cai 22:33, 3 October 2007 (UTC)