
Talk:String (computer science)

From Wikipedia, the free encyclopedia
This is the talk page for discussing improvements to the String (computer science) article.
This is not a forum for general discussion of the article's subject.
This article is rated Start-class on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:
WikiProject Computer science (High-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the project's importance scale.

String Buffer was nominated for deletion. The discussion was closed on 04 June 2013 with a consensus to merge. Its contents were merged into String (computer science). The original page is now a redirect to this page. For the contribution history and old versions of the redirected article, please see its history; for its talk page, see here.


Other related topics


Anyone want to tackle - string (small version of rope), string (general chain of various things), string (music), etc. - not just the computer version.

Also, string theory of cosmology/physics.

Computing theory


The first paragraph seems too "busy" to me. What about replacing it with something like this?

A string (or string of characters) is a data type used in most programming languages to represent text, and is the focus of this article.
The computing term string is also used in a broader sense to group a sequence of entities; for example, tokens in a language grammar, or a sequence of states in automata. See the theory of computation.

This is a lot better. Also, I think the usage in computing theory could be expanded in its own paragraph: one starts with a finite alphabet, then considers all finite sequences consisting of letters from that alphabet (including the empty string) and defines concatenation of strings. The set of strings with concatenation is then a monoid.
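The monoid structure described here can be spot-checked on a few sample strings. A minimal Python sketch, assuming the two-letter alphabet {a, b} and using Python's built-in str concatenation as the monoid operation; the helper name all_strings is made up for illustration, and checking a finite sample is of course not a proof:

```python
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len (hypothetical helper)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# All strings over {a, b} of length at most 2, starting with the empty string.
sample = list(all_strings("ab", 2))
empty = ""

# Identity: the empty string is neutral on both sides.
identity_holds = all(empty + s == s == s + empty for s in sample)

# Associativity: (x + y) + z == x + (y + z) for every triple in the sample.
assoc_holds = all((x + y) + z == x + (y + z)
                  for x in sample for y in sample for z in sample)
```

Closure is implicit: concatenating two finite strings over the alphabet always gives another finite string over the same alphabet.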


I think I wrote most of the current paragraph and I agree your rewrite is better. Just do it! --drj


Ok, I'll move my text to the main article. I won't try to expand the second paragraph; I'm inclined to leave that to the computation article, or to whoever can concisely expand it without detracting from the rest of the page. --loh


I won't try to expand the second paragraph; I'm inclined to leave that to the computation article, or to whoever can concisely expand it without detracting from the rest of the page. --Hornlo

Other meanings


I think something definitely needs to be added about the other meanings of string. Ukulele is already linked to this page, which is somewhat confusing. Although, I'm not sure how much content can actually be provided for the other meanings. Maybe this article should be moved to String (computer science) or something, and this page be turned into a disambiguation page. B4hand

I propose renaming this article to Character string (computer science). Comments? -Bevo 17:03, 13 Jun 2004 (UTC)

I oppose. A string doesn't need to be literal string in general. --Taku 18:44, Sep 25, 2004 (UTC)

Lexicographical order


The lexicographical order on Σ* is not a well-ordering (for example, what is the least element in a*b?), but only a total order. fudo 13:53, 2 August 2005 (UTC)

The least element of a*b is b. The lexicographical order on Σ* is indeed a well-ordering. --Pexatus 00:22, 19 March 2006 (UTC)
No. There is no least element of a*b if you use the alphabetical ordering $a < b$. Assume there is some least element m; this means that $m = a^{k}b$ for some non-negative integer k. Note that $n = a^{k+1}b$ is also in a*b. n agrees with m in the first k positions, but the (k + 1)th position of n is less than the (k + 1)th position of m, so $n < m$, which contradicts the assumption that there exists a least element of a*b. Therefore, there is no least element of a*b. --Bkkbrad 16:28, 13 September 2006 (UTC)
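The descending chain in this argument can be spot-checked mechanically. The sketch below assumes Python's built-in string comparison, which is lexicographic by code point and therefore agrees with the ordering a < b; it only tests finitely many values of k, so it illustrates the chain rather than proving the claim.

```python
def word(k):
    """Return the element a^k b of the language a*b."""
    return "a" * k + "b"

# Each step a^(k+1) b < a^k b: the two words agree on the first k letters,
# then 'a' (from the longer word) is compared with 'b' (from the shorter one).
descending = all(word(k + 1) < word(k) for k in range(50))

# The minimum of any finite prefix of the chain is therefore its last word.
smallest_of_first_five = min(word(k) for k in range(5))
```

Under shortlex order (shorter strings first, ties broken alphabetically) the least element of a*b would indeed be b, which may be the ordering the earlier reply had in mind.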

strings, not characters


There are lots of references to this page throughout Wikipedia for a "string" that is not a set of characters; a string of bits or bytes, for example. I think it is important that the article be cleaned up to make it clear that "character strings" are the most common uses of the type, but the term might apply to vectors of data not representable by a string in a particular language. I've made a couple of edits in this direction, but I think some more effort needs to go into the issue. --Mikeblas 22:39, 29 January 2006 (UTC)

Just don't forget that a vector has a fixed length, per definition (as an element in an N-dimensional space), while strings often have (chronologically) variable length. 83.255.35.89 (talk) 11:23, 4 March 2011 (UTC)
But (unfortunately?) the word "vector" in computer science is most often used to refer to variable-sized storage (see std::vector), while the word "array" is used to refer to the fixed-sized objects you are talking about. (In linear algebra libraries the word "vector" is used for fixed-sized objects, but these also demonstrate unusual (for programming) properties such as addition doing a component-wise action rather than an append operation.) Spitzak (talk) 17:09, 4 March 2011 (UTC)
Yes, I know, it's a disaster. The designers of the C++ STL must have been rather ignorant, or why would you create that kind of mess otherwise, with concepts turned upside down? They could have called it dynamic array, string of type, or whatever; the well-defined term vector was really the worst possible choice and the worst kind of hijacking of words. If they actually knew what they did, I guess they must have been inspired by the arrogant "C-syntax" (B→C→C++→Java etc), which, when spread to the world of web languages some 15 years ago, caused the equality symbol to suddenly lose its meaning in large circles, a symbol that has been established in both mathematics and everyday use for hundreds of years. Too many young (or uneducated) people are now using == and != instead of = and ≠ in any everyday context (mathematics next...?) and they would interpret the equation a = b as a definition/mutation/initialization of a...
I can't see why Wikipedia shouldn't do what it can to clarify backgrounds like this. I believe it is crucial to illustrate "misunderstandings" and unnecessary discrepancies in terminology among branches of science, so more people can see that there are other conventions than the most vulgar ones that one may want to adhere to. It does not conflict with the goal of describing actual usage and terminology or with "following the sources", as there are many kinds of sources, and plenty of room for elaborations on Wikipedia.
(As a side note, while "addition" may mean several things (as you wrote), only algebraic superposition should use the + sign really; concatenation may use &, &&, ::, |, concat, or whatnot, at least in my world ;) Regards 83.255.32.149 (talk) 04:37, 5 March 2011 (UTC)
I agree, but you probably ought not to take it too far. "Character string" is usually implied, and the article shouldn't give the impression that "string" on its own is incorrect. For the most part, I think your edits were fine, although perhaps you could revert the edits to the string oriented languages section. More effort doesn't need to go into the issue, as that would just confuse the article. --StuartBrady 22:59, 29 January 2006 (UTC)
I was confused by this too. "String" always refers to a string of characters. Vectors of other things are lists, arrays, vectors, ... I'm curious as to what language it is where the word "string" is used in reference to lists of objects. Richard W.M. Jones 09:04, 1 May 2006 (UTC)
Yes. And keep in mind that WP article titles generally reflect the most common usage for a given term (unless it needs its own disambiguation page). In computer science, "string" most commonly refers to a string of characters. Other not-so-common meanings (e.g., bitstring) can be linked to in the "See also" section. --Loadmaster 18:35, 1 May 2007 (UTC)
My impression is that "string" virtually always means "string of characters". The only counter-example I know is that C++ std::string is based on a template and can be made to use any object. However I am almost certain this was done only to support bytes and "wchar" (16 bits, often mislabeled "Unicode"). If it was not for "wchar" then they probably would not have made it a template. Since wchar is intended to store characters (or UTF-16) then the string is still a "string of characters". I would be interested if anybody has any real examples of usage of a std::basic_string template with any object for any purpose other than storing something that would be considered "characters". Spitzak (talk) 17:19, 4 March 2011 (UTC)

Null and NUL


We should decide on just one spelling and capitalization of null. I vote for two L's. 24.186.138.188 02:03, 2 May 2006 (UTC)

"Null" generally refers to the "null character" (or a "null pointer"). "NUL" (all caps) is the mnemonic name (ASCII, EBCDIC, Unicode, etc.) of the "null character" code. --Loadmaster 18:35, 1 May 2007 (UTC)

Origin of the Term ?


Anyone know the history behind using the term "string" to mean a sequence of characters? I mean it's not like a series of characters looks much like a ball of string... I assume its origins are as a mathematical term, but it would be interesting to know how it came to such common use in computing.

Presumably it comes from the rather obvious expression "a string of characters" (as in "there go some characters stringing by"), equivalent to "a string of pearls" or "a sequence of characters" or other similar phrases. --Loadmaster (talk) 16:27, 9 February 2008 (UTC)
I heard that it originated because in the old days of physical type-setting, the type was held together in groups by literal string (rope). I don't have any references for this, though, so I can't back it up. Showeropera (talk) 20:48, 14 December 2017 (UTC)

Trying to stop misuse of character encodings


The edits I keep trying are to stop a whole army of uninformed but well-meaning programmers who think strlen() should parse and count the Unicode code points in a string. This is a totally useless definition and causes no end of grief when systems do this. For some reason otherwise intelligent programmers turn into complete morons when presented with UTF-8, and this is seriously damaging any ability to do internationalization. If anybody can think of a wording that says these answers are different and that the fixed-size one is "better", it would help a lot. The reverted-to wording implies that the number of characters is the more important attribute, which is wrong. Spitzak (talk) 04:10, 29 May 2009 (UTC)
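The two counts being argued about can be shown side by side. A small Python sketch, assuming UTF-8 as the encoding; the sample text is made up, and Python's len() on bytes stands in for what C's strlen() would report on the same encoded data:

```python
# Made-up sample; escapes are used so the two accented letters are
# unambiguously single precomposed code points (U+00EF and U+00E9).
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

code_points = len(text)     # number of Unicode code points
code_units = len(encoded)   # number of bytes (UTF-8 code units)

# Each of the two accented letters takes two bytes in UTF-8,
# so the byte count exceeds the code point count by two.
extra_bytes = code_units - code_points
```

The byte count is the one needed for buffer sizes and offsets into the encoded data; the code point count requires decoding the whole string first.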

StringBuilder .Net type


As well as the standard String type, .Net has a StringBuilder type. I think this is implemented as a linked list, but I'm not too sure. This would be worth adding to the Implementation section. N4m3 (talk) 09:35, 28 May 2011 (UTC)

Only finite strings?


I noticed this statement:

Although formal strings can have an arbitrary (but finite) length, the length of strings in real languages is often constrained to an artificial maximum. [emphasis mine]

I would like a citation on that. What about languages with lazy evaluation like Clojure and Haskell?

    cycle "Is this finite? "
    "Is this finite? Is this finite? Is this finite? Is this finite? ..."
    let shouting = 'a' : shouting in putStr shouting
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa..."

BiT (talk) 03:00, 8 December 2011 (UTC)
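The same idea can be sketched outside Haskell. The Python below uses itertools.cycle and a generator as stand-ins for lazy strings; this is an analogy under that assumption, since Python's own str type is always finite, and the generator name simply mirrors the Haskell binding above.

```python
from itertools import cycle, islice

def shouting():
    """Endless stream of 'a', analogous to `shouting = 'a' : shouting`."""
    while True:
        yield "a"

# Only a finite prefix of either stream can ever be materialised.
first_32 = "".join(islice(cycle("Is this finite? "), 32))
first_8 = "".join(islice(shouting(), 8))
```

Both streams are unbounded, but every value actually observed is a finite prefix, which is one way to reconcile lazy "infinite strings" with the formal definition.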

Reverse string


In the "Formal theory" section, it might be useful to mention that:

A string s = ab, composed of zero or more characters (here, 'a' and 'b') of the alphabet, is said to be the reverse of string t if t = ba. For example, if Σ = {0,1} the string 0011001 is the reverse of 1001100. The empty string and all strings of length 1 are reverses of themselves. A string that is the reverse of itself is also called a palindrome.

The problem is, I don't know for certain whether the terms "reverse string" or "string reversal" are correct or not. FWIW, practically all programming languages/libraries that provide this operation call it "reverse". Does anyone know what the proper term should be? (Inverse? Inversion? Opposite? Transpose?) --Loadmaster (talk) 17:05, 9 November 2012 (UTC)

Reverse would be the correct terminology, but as currently formulated your statement would either not be general enough (only working for two-character strings) or, worse, likely be misinterpreted as stating that WORLDHELLO and LOWORLDHEL would be reverses of HELLOWORLD (instead of DLROWOLLEH). --Ruud 23:09, 9 November 2012 (UTC)
How about this:
A string s = abc, composed of zero or more characters of the alphabet (here, 'a', 'b', and 'c'), is said to be the reverse of string t if t = cba. For example, if Σ = {0,1} the string 0011001 is the reverse of 1001100. A string that is the reverse of itself is also called a palindrome, which includes the empty string and all strings of length 1.
I don't see the confusion; it states that we're talking about a string of zero or more characters of the alphabet, and the example of 0011001 should make it additionally clear that we're talking about the ordering of the characters of the string, not substrings of the string. If you think this is still confusing, we could instead use HELLOWORLD and DLROWOLLEH, but this requires a larger symbol alphabet ({D,E,H,L,O,R,W}), which complicates the description somewhat. --Loadmaster (talk) 18:02, 13 November 2012 (UTC)
The problem is that you're giving a very specific example but state it in such a form that it, at least at first reading, appears to be a very general definition. You're either using too much formal machinery for what is an informal statement or, conversely, making a statement that is not precise enough to be a formal definition. This is how one of my formal language textbooks defines a reverse:

The reverse of a string is obtained by writing the symbols in reverse order; if w is a string as shown above, then its reverse $w^R$ is

$w^R = a_n \ldots a_2 a_1$.
Where they explained "above" that a, b, c, ... denote elements from the alphabet Σ and u, v, w, ... strings over that alphabet. --Ruud 18:45, 13 November 2012 (UTC)
I went ahead and added a "Reversal" subsection to the article, with (hopefully) simplified language. --Loadmaster (talk) 22:25, 15 November 2012 (UTC)
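For comparison with the textbook definition quoted in this thread, a short Python sketch of reversal and the palindrome test; the function names are illustrative only, and slicing with a step of -1 is used as the reversal operation.

```python
def reverse(w: str) -> str:
    """Return w^R, the symbols of w written in reverse order."""
    return w[::-1]

def is_palindrome(w: str) -> bool:
    """A string is a palindrome when it equals its own reverse."""
    return w == reverse(w)

# Reversal acts on individual symbols, not on substrings:
hello_reversed = reverse("HELLOWORLD")

# It also flips the order of concatenation: (uv)^R = v^R u^R.
u, v = "HELLO", "WORLD"
flips_concatenation = reverse(u + v) == reverse(v) + reverse(u)
```

The last line is the property that rules out readings such as WORLDHELLO: swapping the two halves is not enough, each half must itself be reversed.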

External links modified


Hello fellow Wikipedians,

I have just modified one external link on String (computer science). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. --cyberbot II (Talk to my owner: Online) 08:39, 4 April 2016 (UTC)

Discussion to move "String" to "String (disambiguation)"


In order to make way for moving Draft:String to article space to take the place as the primary topic, I've posted a proposal at Talk:String#Requested move 16 January 2017 to move the disambiguation page currently at "String" to "String (disambiguation)". Your input would be helpful to establish a common consensus on whether or not this move, or something else, should be done. I look forward to your thoughts on the matter. The Transhumanist 22:50, 16 January 2017 (UTC)

String length


In section String datatypes/Representations/Null-terminated the IBM 1401 word-mark terminated string is discussed.

Somewhat similar, "data processing" machines like the IBM 1401 used a special word mark bit to delimit strings at the left, where the operation would start at the right. This bit had to be clear in all other parts of the string. This meant that, while the IBM 1401 had a seven-bit word, almost no-one ever thought to use this as a feature, and override the assignment of the seventh bit to (for example) handle ASCII codes.

That seventh-bit idea could not have been implemented. The word mark bit is implemented in hardware. The MCW (Move Characters to Word mark) instruction, for instance, moved variable-length fields, terminating on the word mark. Numeric and alpha fields were treated no differently. The Honeywell H200, H1200, H3200 and H4200 all had MCW instructions. Arithmetic operations also used word mark field demarcation. The Honeywell computers had 8-bit memory with 6 data bits, a word mark bit and an item mark bit. Steamerandy (talk) 17:26, 24 April 2017 (UTC)

It does sound like there are 1 or 2 extra bits per character. Are you saying there was no way for a program to read or write these extra bits? Or that the implementation was somehow different from having extra bits per character (perhaps it was a table of locations with the bit "set" and thus you were restricted in how many times it was turned on)? I think it is obvious that instructions designed to use these bits to end strings won't work, but that is not an explanation as to why this extra storage was not taken advantage of. It is also surprising that they would in effect reserve 1/4 of their memory for such a limited use, when you consider how incredibly expensive the memory was at that time. Spitzak (talk) 17:36, 24 April 2017 (UTC)

Another length prefixed representation


Siemens PLCs use a form of length-prefixed string representation with 2 length information bytes (see Siemens Docs "Working with Strings in S7-SCL"). Maximum reserved memory is 256 bytes with a maximum of 254 bytes of actual text, where one byte denotes the allocated/reserved range for the string (the maximum count of characters allowed to be represented) and the other byte denotes the actual, currently valid length of the string. Maybe this could be added as a length-prefixed representation variant? --Ckonnerth (talk) 17:18, 15 December 2017 (UTC)
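A rough Python sketch of the layout described in this comment. It assumes the header is the maximum-length byte followed by the current-length byte and that unused space is zero-filled; the byte order, the padding, and the function names pack_s7_string and unpack_s7_string are assumptions for illustration and should be checked against the Siemens documentation.

```python
MAX_TEXT = 254  # largest text length mentioned above (256 bytes minus 2 header bytes)

def pack_s7_string(text: bytes, max_len: int = MAX_TEXT) -> bytes:
    """Pack text as [max_len][cur_len][text, zero-padded to max_len bytes]."""
    if not 0 <= max_len <= MAX_TEXT:
        raise ValueError("max_len must be between 0 and 254")
    if len(text) > max_len:
        raise ValueError("text longer than the reserved length")
    return bytes([max_len, len(text)]) + text.ljust(max_len, b"\x00")

def unpack_s7_string(raw: bytes) -> bytes:
    """Return only the currently valid characters, ignoring the padding."""
    max_len, cur_len = raw[0], raw[1]
    if cur_len > max_len:
        raise ValueError("current length exceeds reserved length")
    return raw[2:2 + cur_len]

# Hypothetical example: a 10-character field currently holding 4 characters.
raw = pack_s7_string(b"PUMP", max_len=10)
```

Compared with a single length byte, the second byte lets a reader distinguish the reserved capacity from the text that is currently valid without scanning for a terminator.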

DNA??


Wondering why there's a bio-related image on this article's page, I don't see how it depicts what strings actually are in computer science. (Preceding unsigned comment added by 67.165.80.152 (talk) 05:19, 30 March 2020 (UTC))

I've changed the image to diagram a string. Though I haven't figured out how to make it the page image yet. TripleShortOfACycle (talk - contribs) - (she/her/hers) 14:19, 31 January 2021 (UTC)
Now the page thumbnail works! It displays a diagram of a string when links to this page are hovered over. TripleShortOfACycle (talk - contribs) - (she/her/hers) 14:30, 31 January 2021 (UTC)

Distinct, unambiguous symbols


As far as I know, it is also required that each string can be uniquely decomposed into its symbols. For example, if the alphabet itself consists of strings (as in Free_monoid#Free_generators_and_rank, or in the lead of Alphabet (formal languages), with Σ = {"0", "00"}), its symbols are distinct and unambiguous (as are the members of each mathematical set), but nevertheless, a string may be composed in different ways. I guess "unambiguous" is supposed to express the requirement of unique decomposition, but I'm not sure it is precise enough. The decomposition must be unambiguous, rather than just the symbols. -Jochen Burghardt (talk) 18:03, 13 May 2024 (UTC)
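The ambiguity can be made concrete by counting decompositions. A small Python sketch, assuming the alphabet Σ = {"0", "00"} from the example; the function name count_decompositions is illustrative.

```python
from functools import lru_cache

def count_decompositions(s: str, symbols: frozenset) -> int:
    """Count the ways to write s as a concatenation of elements of `symbols`."""
    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        if i == len(s):
            return 1  # the empty remainder has exactly one decomposition
        # Try every non-empty symbol that matches at position i.
        return sum(ways(i + len(sym)) for sym in symbols
                   if sym and s.startswith(sym, i))
    return ways(0)

ambiguous = frozenset({"0", "00"})
binary = frozenset({"0", "1"})
```

With the ambiguous alphabet, "000" can be read as 0·0·0, 0·00 or 00·0, whereas over {"0", "1"} every string has exactly one decomposition.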

Traditionally?


WRT "In computer programming, a string is traditionally a sequence of characters..." What does 'traditionally' imply? What does string mean in a non-traditional sense? How is traditionality relevant? IMO it is a sequence of chars (period). Stevebroshar (talk) 14:03, 20 December 2024 (UTC)

String is not a data type


WRT "A string is generally considered as a data type"

Can't argue that string is a type of data, but string is not a data type. Maybe that's a subtle difference to some, but there's an important difference. String is a higher-level concept than data type as it pertains to programming. Many programming contexts (i.e. languages) have a string data type (or multiple). But there's a significant difference between string data and a type for string data.

To illustrate the difference between string data and data type, consider C. It has no string type. The most commonly used data type for string data is char*; pointer to char. That is not a string type, yet it is used for string data. Note that char* can be used for non-string data; a pointer to a single char storage, for example. FWIW, the data structure is called null-terminated string or c-string.
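To illustrate the data structure, as opposed to a data type, here is a Python sketch that treats a plain byte buffer as holding null-terminated strings. Python is used only as a stand-in for C memory, the integer offset plays the role of a char* pointer, and c_strlen is a hypothetical helper that mimics what strlen does.

```python
def c_strlen(buf: bytes, start: int = 0) -> int:
    """Count bytes from `start` up to, but not including, the first NUL byte."""
    end = buf.index(0, start)   # raises ValueError if no terminator is present
    return end - start

# One buffer, two terminated strings stored back to back.
memory = b"hello\x00world\x00"

whole = c_strlen(memory)         # "pointer" to the first byte
second = c_strlen(memory, 6)     # "pointer" to the start of "world"
tail = c_strlen(memory, 3)       # "pointer" into the middle of "hello"
```

Nothing in the buffer's type says where a string starts or ends; that information lives entirely in the terminator convention, which is the distinction being drawn above.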

What is this article about? Is it about the concept of string in general (string data)? Or about particular data types in particular languages and contexts? I assume the intention is both. But the two should not be conflated. It should say that a string is a sequence of characters and that many languages define a type for string data. It should not say that string is a data type.

TBO this article provides little value and should be deleted, but I'm sure folks don't like that idea. But, if it's going to exist, it shouldn't misrepresent the world. Stevebroshar (talk) 13:19, 10 May 2025 (UTC)

I think you have a point here, and I tried to fix the lead accordingly. -Jochen Burghardt (talk) 16:18, 11 May 2025 (UTC)
There are other languages than C. In some of them, strings are a native data type. You can make a similar argument about arrays. Citing a WP article to show that strings aren't a data type might carry more weight if that article didn't include them: data type#String and text types. Andy Dingley (talk) 16:45, 11 May 2025 (UTC)
Retrieved from "https://en.wikipedia.org/w/index.php?title=Talk:String_(computer_science)&oldid=1289911742"