

The Craft of Text Editing

--or--
Emacs for the Modern World

-by-
Craig A. Finseth

This web site contains the full text of the book "The Craft of Text Editing." That book was published in 1991 by Springer-Verlag & Co. By arrangement between the author and the publisher, the book version is now out of print and all rights have been returned to the author. Note that there may be some slight differences in typographic corrections between this version and the printed one.

If you wish to cite this work, please use the following URL:

http://www.finseth.com/craft

This book is also available in print form. It has ISBN 978-1-4116-8297-9 (10-digit: 1-4116-8297-1). It is available from Lulu and from Amazon.com:

Lulu storefront (PDF version available here)
Amazon page

If you should notice typos or formatting problems, please let me know. I am not, however, planning on revising or updating the book anytime soon. Typo corrections and minor changes will continue to be made indefinitely.

Here is an .epub version that is pretty reasonable. It was created using calibre and appears to display just fine on the calibre viewer, iBooks, and a Kobo e-reader. Note that it does not pass strict epub verification, nor does it display well using Stanza (of course, I haven't found anything that displays well using Stanza...).

Here is a .mobi version, again created using calibre.

Here is a .pdf version that contains better bookmarking than the one on Lulu.

Here is a gzip'd .tar file that contains the complete work.

Here is a .gz file that contains a PostScript version of the complete work. This version is frozen as of June 2000 and will not reflect any corrections made after that time. Thanks to Fekete Krisztian for the conversion.

Here is a .gz file that contains a tar file of a LaTeX version of the complete work. This version is frozen as of June 2000 and will not reflect any corrections made after that time. Thanks to Fekete Krisztian for the conversion.

Quick Contents:

Preface
Introduction: What Is Text Editing All About?
One: Users
Two: User Interface Hardware
Three: Implementation Languages
Four: Editing Models
Five: File Formats
Six: The Internal Sub-Editor
Seven: Redisplay
Eight: User-Oriented Commands: The Command Loop
Nine: Command Set Design
Ten: Emacs-Type Editors
Epilogue
Appendix A: A Five-Minute Introduction to C
Appendix B: Emacs Implementations
Appendix C: The Emacs Command Set
Appendix D: The TECO Command Set
Appendix E: ASCII Chart
Bibliography
Book Index

Credits

The chapter quotes comprise the verse "Jabberwocky" by Lewis Carroll, from the work Through the Looking Glass.

Trademarks

Annex is a registered trademark of Xylogics.

CP/M is a registered trademark of Digital Research.

DEC, Tops-20, VT52, VT100, VT200 and VAX/VMS are registered trademarks of Digital Equipment Corp.

FinalWord and MINCE are registered trademarks of Mark of the Unicorn.

IBM and IBM PC are registered trademarks of IBM Corp.

Apple ][ is a registered trademark of Apple Computer, Inc.

Macintosh is a trademark licensed to Apple Computer, Inc.

MS/DOS is a registered trademark of Microsoft Corp.

TTY is a registered trademark of Teletype Corp.

UNIX is a registered trademark of AT&T.

Preface

	Questions to Probe Your Understanding
	Acknowledgements

Introduction: What Is Text Editing All About?

	1 The Basic Get_Line

		1.1 Version One
		1.2 Version Two
		1.3 Version Three
		1.4 Version Four

	2 The Forest
	Questions to Probe Your Understanding

One: Users

	1.1 User Categories

		1.1.1 Amount of Experience
		1.1.2 Type of Experience

	1.2 "Religion"
	1.3 User Goals
	1.4 Physiological Constraints
	1.5 Applying These Physiological Constraints
	1.6 Users Who Have Handicaps
	Questions to Probe Your Understanding

Two: User Interface Hardware

	2.1 Display Types

		2.1.1 TTY and Glass TTY
		2.1.2 Basic Displays
		2.1.3 Advanced Displays
		2.1.4 "Memory Mapped" Displays
		2.1.5 Graphics Displays

	2.2 Keyboards

		2.2.1 Special Function Keys
		2.2.2 Extra Shift Keys
		2.2.3 Key Placement
		2.2.4 Example Keyboards

	2.3 Graphical Input

		2.3.1 Touch Sensitive Display
		2.3.2 Tablet
		2.3.3 Mouse
		2.3.4 Trackball
		2.3.5 Joystick
		2.3.6 A Different Mouse
		2.3.7 Other Devices
		2.3.8 Conclusion

	2.4 Communications Path Issues

		2.4.1 Speed and Character Format
		2.4.2 Flow Control
		2.4.3 Echo Negotiation
		2.4.4 Fancy Modems

	Questions to Probe Your Understanding

Three: Implementation Languages

	3.1 General Considerations

		3.1.1 Availability and Implementation Quality
		3.1.2 Text Handling Power
		3.1.3 Support for Extensibility
		3.1.4 Large Project Support
		3.1.5 Efficiency

	3.2 Specific Language Notes

		3.2.1 TECO
		3.2.2 Lisp
		3.2.3 C
		3.2.4 PL/1
		3.2.5 Other Systems Languages
		3.2.6 Fortran
		3.2.7 Pascal
		3.2.8 Basic
		3.2.9 Ada
		3.2.10 Sine
		3.2.11 Custom Editor Languages

	Questions to Probe Your Understanding

Four: Editing Models

	4.1 One-Dimensional Array of Bytes
	4.2 Two-Dimensional Array of Bytes
	4.3 List of Lines
	4.4 Paged Models
	4.5 Objects
	4.6 Dealing with Real Text
	Questions to Probe Your Understanding

Five: File Formats

	5.1 Text Files

		5.1.1 Line Boundaries
		5.1.2 Line Contents
		5.1.3 End of File

	5.2 Binary Files
	5.3 Structured Files
	5.4 Where to Store the "Extra" Information

		5.4.1 In-Band
		5.4.2 Out-of-Band
		5.4.3 Conclusion

	5.5 The Additional Information

		5.5.1 Fonts, Sizes, Attributes
		5.5.2 Line, Paragraph, Page, and Other Formats
		5.5.3 Non-Text Objects

	5.6 Internationalization
	Questions to Probe Your Understanding

Six: The Internal Sub-Editor

	6.1 Basic Concepts and Definitions
	6.2 Internal Data Structures
	6.3 Procedure Interface Definitions
	6.4 Characteristics of Implementation Methods

		6.4.1 No Management
		6.4.2 Extra Space at the End
		6.4.3 Buffer Gap

			6.4.3.1 Multiple Gaps and Why They Don't Work
			6.4.3.2 The Hidden Second Gap

	6.5 Implementation Method Overview
	6.6 Buffer Gap
	6.7 Linked Line
	6.8 Paged Buffer Gap
	6.9 Other Methods
	6.10 Method Comparisons

		6.10.1 Storage
		6.10.2 Crash Recovery
		6.10.3 Efficiency of Editing
		6.10.4 Efficiency of Buffer/File I/O
		6.10.5 Efficiency of Searching
		6.10.6 Multiple Buffers
		6.10.7 Paged Virtual Memory
		6.10.8 Conclusions

	6.11 Editing Extremely Large Files
	6.12 Difference Files
	Questions to Probe Your Understanding

Seven: Redisplay

	7.1 Constraints
	7.2 Procedure Interface Definitions

		7.2.1 Editor Procedures
		7.2.2 Display Independent Procedures

	7.3 Considerations

		7.3.1 Status Line
		7.3.2 End of the Buffer
		7.3.3 Horizontal Scrolling
		7.3.4 Line Wrap
		7.3.5 Word Wrap
		7.3.6 Tabs
		7.3.7 Control Characters
		7.3.8 Proportionally Spaced Text
		7.3.9 Attributes, Fonts, and Scripts
		7.3.10 Breaking Out Between Lines
		7.3.11 Multiple Windows

	7.4 Redisplay Itself

		7.4.1 The Framer
		7.4.2 The Basic Algorithm
		7.4.3 Sub-Editor Interaction
		7.4.4 The Advanced Algorithm
		7.4.5 Redisplay for Memory-Mapped Displays

	Questions to Probe Your Understanding

Eight: User-Oriented Commands: The Command Loop

	8.1 The Core Loop: Read, Evaluate, Print

		8.1.1 The Evaluate Procedure
		8.1.2 Move by a Character
		8.1.3 Insert a Character
		8.1.4 Second-Level Dispatch
		8.1.5 Accept an Argument
		8.1.6 Philosophy
		8.1.7 A Minimalist Command Set Design

	8.2 Errors

		8.2.1 Internal Errors
		8.2.2 External Errors
		8.2.3 Exiting

	8.3 Arguments

		8.3.1 Numeric (Prefix) Arguments
		8.3.2 String (Suffix) Arguments
		8.3.3 Positional Arguments
		8.3.4 Selection Arguments

	8.4 Rebinding

		8.4.1 Rebinding Keys
		8.4.2 Rebinding Functions

	8.5 Modes

		8.5.1 Modes and Dynamic Rebinding
		8.5.2 Implementing Modes

	8.6 Changing Your Mind

		8.6.1 Command Set Design
		8.6.2 Kill Ring
		8.6.3 Undo
		8.6.4 An Undo Heresy
		8.6.5 Redo

	8.7 Macros

		8.7.1 Again
		8.7.2 Keystroke Recording
		8.7.3 Macro Languages
		8.7.4 Redisplay Interaction

	Questions to Probe Your Understanding

Nine: Command Set Design

	9.1 Responsiveness
	9.2 Consistency
	9.3 Permissiveness
	9.4 Progress
	9.5 Simplicity
	9.6 Uniformity
	9.7 Extensibility
	9.8 Modes
	9.9 Use of Language
	9.10 Guideline Summary

		9.10.1 Overall
		9.10.2 Modes
		9.10.3 Use of Language

	9.11 Structure Editors
	9.12 Programming Assistance
	9.13 Command Behavior

		9.13.1 Does Down Move the Point or the Text?
		9.13.2 Scrolling vs. Paging
		9.13.3 Page Breaks
		9.13.4 How Many Ways Can You Move by a Word?

			9.13.4.1 Moving by Words
			9.13.4.2 Deleting by Words

		9.13.5 Where Do Sentences and Paragraphs End?
		9.13.6 How to Search
		9.13.7 Commands to Handle Typos

			9.13.7.1 Capitalization Commands
			9.13.7.2 Twiddling

	Questions to Probe Your Understanding

Ten: Emacs-Type Editors

	10.1 "What Do You Mean, 'Emacs-type?' "
	10.2 The Command Set
	10.3 The Extended Environment
	10.4 Extensibility
	Questions to Probe Your Understanding

Epilogue

	Questions to Probe Your Understanding

Appendix A: A Five-Minute Introduction to C

	A.1 Case Conventions
	A.2 Data Types and Declarations
	A.3 Constants
	A.4 Pre-defined Constants
	A.5 Procedure Structure
	A.6 Statements
	A.7 Operators
	A.8 Standard Library Functions Used in This Book
	A.9 Non-Standard Library Functions Used in This Book

Appendix B: Emacs Implementations

Appendix C: The Emacs Command Set

	C.1 Notation
	C.2 Default GNU-Emacs Command List

		C.2.1 Base Commands
		C.2.2 Help Commands
		C.2.3 Control-X (^X) Commands
		C.2.4 Control-X 4 Commands
		C.2.5 Meta (^[) Commands

	C.3 The Author's Command Set

Appendix D: The TECO Command Set

	D.1 General notation:
	D.2 Commands
	D.3 E-Commands (most file commands are here)
	D.4 F-Commands
	D.5 Special Q-registers, names are of the form "..x"
	D.6 FS Variables

Appendix E: ASCII Chart

Bibliography

	1 Current
	2 Thesis

		2.1 Emacs-Type Editors

			2.1.1 ITS EMACS
			2.1.2 Lisp Machine Zwei
			2.1.3 Multics Emacs
			2.1.4 MagicSix TVMacs
			2.1.5 Other Emacs-Type Text Editors

		2.2 Non-Emacs Display Editors
		2.3 Structure Editors
		2.4 Other Editors

Book Index


Preface

This just in (18 Sep 2008). I was just apprised of this document:

http://history.dcs.ed.ac.uk/archive/apps/Whitfield-Thesis/thesis.html

It is a thesis by C. H. Whitfield of the University of Edinburgh, published in 1972. It puts forth many of the ideas that later appeared in my thesis. Due to the lack of the Internet at the time (:-), I was unaware of this thesis when I wrote mine. - Craig Finseth

Just over eleven years ago I was faced with selecting a topic for my thesis. At the time, I was a student at the Massachusetts Institute of Technology and was working on my bachelor's degree in Computer Science and Engineering. One of the degree requirements was a thesis, and you can't have a thesis without a topic.

During my four years at M.I.T., a new type of text editor had come into being and into widespread use. This type of text editor was called "Emacs," and it was a major step forward in many ways. Implementations of this type of editor were appearing on many computer systems. Some people even used an implementation as the basis for their theses. I took a different tack. The idea that I settled on for my thesis was a description of the technology that underlies all text editors, but with a special emphasis on Emacs-type editors. The thesis was written and published as a technical memo (Finseth 1980).

* * *

Ten years later, I was reading the USENET newsgroup comp.editors, one of the many facets of that worldwide electronic bulletin board. A discussion thread had started up in which both sides of the discussion were citing my thesis as the authority in the field. Further inquiries (not by me: I was just reading along) showed that no one in that group was aware of any other document that described general text-editing technology.

My thesis was ten years old: it predated most personal computers and workstations. There had even been a chapter in an early draft that attempted to prove that it was not possible to implement an Emacs-type text editor on a small computer. (I invented a way, threw out the chapter, and with some friends started a software company to market such an editor. Oh well.) It was clearly time for a complete rewrite, and that rewrite is what you are reading now.

If you don't have a copy of my thesis (or the Technical Memo: the two have identical content), you won't miss anything. This book has all of the information from the earlier document, updated. It also has a whole lot more. Every part has been completely rewritten and expanded, and major sections have been added.

As with my thesis, this book is written in an informal, almost chatty style. It is addressed directly to "you," who are assumed to care about how text editors are implemented. Be warned, however, that it also contains opinions about the "right" and "wrong" way of doing things, and that these opinions hold that many of the current directions and trends are -- shall we say? -- not the "right" way. Do not accept everything said here as the gospel truth; instead, understand why I say what I say and make your own informed judgment.

This book is addressed to anyone who implements large software systems or who wants to know the considerations that go into such systems. It focuses on text editors. Although not required, an understanding of programming will be helpful.

Questions to Probe Your Understanding

Each chapter ends with a set of questions and problems designed to probe your understanding of the material that was just presented. And, true to the Socratic method, some of these questions also introduce new material. The questions range in difficulty from very easy to quite difficult, and each question is labeled to help you gauge how much effort is required. Just as with most programming issues, most questions have no single correct answer.

Acknowledgements

I would like to thank those people who helped me in various ways:

Owen "Ted" Anderson
Joe Austin
Jeff Brown
Bernard Greenberg
Brian Hess
Mike Kazar
Richard Kovalcik
Scott Layson
Jason Linhart
David Moon
Robert Nathaniel
Lee Parks
Jeffrey Schiller
Richard Stallman
Seth Steinberg
Peter Steinmetz
Liba Svobodova
Daniel Weinreb

Plus, of course, all of those people whom I have left out. Special thanks to my wife Ann and daughter Kari, who put up with my typing away all the time.

Craig A. Finseth
St. Paul, Minnesota
February 1991


Introduction: What Is Text Editing All About?

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

In its most general form, text editing is the process of taking some input, changing it, and producing some output. Ideally, the desired changes would be made immediately and with no effort required beyond the mere thought of the change. Unfortunately, the ideal case is not yet achievable. We are thus consigned to using tools such as computers to effect our desired changes.

Computers have physical limitations. These limitations include the nature of user-interface devices; CPU performance; memory constraints, both physical and virtual; and disk capacity and transfer speed. Computer programs that perform text editing must operate within these limitations. This book examines those limitations, explores tradeoffs among them and the algorithms that implement specific tradeoffs, and provides general guidance to anyone who wants to understand how to implement a text editor or how to perform editing in general.

I do not present the complete source code for an editor, nor is the source code available on disk (at least from me: see Appendix B). For that matter, you won't even see a completely worked-out algorithm. Rather, this book teaches the craft of text editing so that you can understand how to construct your own editor.

The first chapters discuss external constraints: human mental processes, file formats, and interface devices. Later chapters describe memory management, redisplay algorithms, and command set structure in detail. The last chapter explores the Emacs type of editor, which will also be used whenever a reference to a specific editor is required.

This range of topics is quite broad, and it is easy to lose sight of the forest with all of those trees. The remainder of this introduction will sketch the outlines of the forest by examining an editor in miniature: a get-line-of-input routine. We will start with a basic version of the routine, then make it more elaborate in a series of steps. By the end, you will see where the complexity of a text editor comes from.

The program examples are written in the ANSI version of the C language. Appendix A provides a brief introduction to the C language and explains all of the features used in examples.

1 The Basic Get_Line

The Get_Line routine accepts these inputs:

a prompt string
a buffer to accept the input; this buffer must be at least two characters long
an indication of the buffer length

and produces these outputs:

a success/fail status
if the status is "success," the input is stored in the supplied buffer; the end of the input is marked with the NUL (^@, 0 decimal) character
if the status is "fail," the buffer may have been modified but will not contain valid input

The editing performed by this routine is on the input buffer. This first version assumes that you are creating a new item from scratch each time.

1.1 Version One

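#include <ctype.h>   /* isprint */
#include <stdio.h>   /* printf */

/* FLAG, TRUE, FALSE, NUL, KEYENTER, KeyGet(), and Beep(): see Appendix A */

FLAG Get_Line(char *prompt, char *buffer, int len)
{
    char *cptr = buffer;   /* next free position in buffer */
    int key;

    if (len < 2) return(FALSE);   /* safety check */
    printf("%s: ", prompt);
    for (;;) {
        key = KeyGet();
        if (isprint(key)) {
            if (cptr - buffer >= len - 1) Beep();   /* keep room for the NUL */
            else {
                *cptr++ = key;
                printf("%c", key);
            }
        }
        else if (key == KEYENTER) {
            *cptr = NUL;
            printf("\n");
            return(TRUE);
        }
        else
            Beep();
    }
}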

Version One accepts input until the user presses the Enter key. If a keystroke would overflow the input buffer, that character is discarded and the program sounds an error beep. Once the Enter key has been pressed, the program appends a NUL character to terminate the string and returns True. Non-printing characters other than Enter also cause the program to sound an error beep. Simple, straightforward, and useless, as there is no way for the user to correct any typing mistakes.
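Because Version One depends on the keyboard and terminal routines from Appendix A, it is awkward to try out by hand. The following sketch runs the same logic against a scripted key source instead of a real keyboard. The definitions of FLAG, TRUE, FALSE, NUL, KEYENTER, KeyGet, and Beep here are hypothetical stand-ins, not the book's; in particular, the sketch assumes that Enter sends Carriage Return.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Appendix A definitions, so that the
   routine can run without a real keyboard or terminal. */
typedef int FLAG;
#define TRUE  1
#define FALSE 0
#define NUL   '\0'
#define KEYENTER '\r'             /* assumption: Enter sends Carriage Return */

static const char *script = "";   /* keystrokes to "type" */
static int beeps = 0;             /* number of error beeps sounded */

/* Return the next scripted key; pretend Enter once the script runs out. */
static int KeyGet(void)
{
    return *script ? (unsigned char)*script++ : KEYENTER;
}

static void Beep(void)
{
    beeps++;
}

/* Version One, with the same logic as in the text. */
FLAG Get_Line(char *prompt, char *buffer, int len)
{
    char *cptr = buffer;
    int key;

    if (len < 2) return FALSE;
    printf("%s: ", prompt);
    for (;;) {
        key = KeyGet();
        if (isprint(key)) {
            if (cptr - buffer >= len - 1) Beep();   /* no room left */
            else {
                *cptr++ = key;
                printf("%c", key);
            }
        }
        else if (key == KEYENTER) {
            *cptr = NUL;
            printf("\n");
            return TRUE;
        }
        else Beep();                                /* e.g., Tab */
    }
}
```

With a three-byte buffer and the keystrokes h, e, y, Enter, the routine stores "he" and beeps once for the y: only the overflowing character is lost, not the whole response.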

1.2 Version Two

Here is Version Two. It adds editing:

#include <ctype.h>   /* isprint */
#include <stdio.h>   /* printf */

FLAG Get_Line(char *prompt, char *buffer, int len)
{
    char *cptr = buffer;   /* next free position in buffer */
    int key;

    if (len < 2) return(FALSE);   /* safety check */
    printf("%s: ", prompt);
    for (;;) {
        key = KeyGet();
        if (isprint(key)) {
            if (cptr - buffer >= len - 1) Beep();   /* keep room for the NUL */
            else {
                *cptr++ = key;
                printf("%c", key);
            }
        }
        else {
            switch (key) {
            case KEYBACK:          /* erase the previous character */
                if (cptr > buffer) {
                    cptr--;
                    printf("\b \b");
                }
                break;
            case KEYENTER:
                *cptr = NUL;
                printf("\n");
                return(TRUE);
                /*break;*/
            default:
                Beep();
                break;
            }
        }
    }
}

Version Two starts developing problems that can no longer be swept under the rug.

Version One glossed over exactly what is meant by the Enter key. That's sort of okay. Most keyboards have only one key labeled "Enter" or "Return" or something similar. It almost always sends a Carriage Return character. The program can compare against just that character and almost always operate "correctly," i.e., as the user expects. However, most keyboards have at least two keys for erasing: Back Space and Delete. Some people and computer systems use one of these. Other people and computer systems use the other. (We will ignore any extra "erase" or "delete character" keys that you might find. For now.) The program can handle this problem in several ways:

accept only one or the other
accept both
if the operating system supports some sort of "terminal parameter configuration," ask the operating system what character to use
provide a configuration option in your program to let the user set his or her preferred character; the option will most likely default to the operating system configuration setting if one is available

If you picked the first option, just over half of your users will be upset with you. The second option is much better: almost all users will like you, and this part of your program need not be operating system specific at all. (I often select this option when writing small programs that should have a minimum of operating system-dependent code.) The third option is a fine solution. Most users will like you, and you are building on other work (i.e., the operating system) instead of reinventing the wheel.

If you picked the fourth option, you have already learned what an Emacs-type editor is about. Implicit in this option is the recognition that users should be able to control their environment as much as possible. Yes, it is more work to write such programs and, yes, it sometimes overlaps the existing operating system, but it can be well worth the effort.
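Options two through four can be combined, as in this sketch. It assumes ASCII key codes (Back Space is 8, Delete is 127) and, for option three, a POSIX system, where the terminal driver's erase character can be read with tcgetattr() from the c_cc[VERASE] slot. The names user_erase_key, OsEraseKey, EraseKeyMatches, and IsEraseKey are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <termios.h>
#include <unistd.h>

#define KEY_BS  0x08   /* ASCII Back Space (^H) */
#define KEY_DEL 0x7F   /* ASCII Delete (^?) */

/* Option four: the user's own choice; -1 means "no preference". */
static int user_erase_key = -1;

/* Option three: ask a POSIX terminal driver for its erase character.
   Returns -1 if stdin is not a terminal or no erase character is set. */
static int OsEraseKey(void)
{
    struct termios t;

    if (!isatty(STDIN_FILENO) || tcgetattr(STDIN_FILENO, &t) != 0)
        return -1;
#ifdef _POSIX_VDISABLE
    if (t.c_cc[VERASE] == _POSIX_VDISABLE)
        return -1;
#endif
    return t.c_cc[VERASE];
}

/* With no configured key, accept both BS and DEL (option two);
   otherwise accept only the configured key. */
static bool EraseKeyMatches(int key, int erase)
{
    if (erase >= 0)
        return key == erase;
    return key == KEY_BS || key == KEY_DEL;
}

/* The user's choice, else the OS setting, else accept both keys. */
static bool IsEraseKey(int key)
{
    int erase = user_erase_key;

    if (erase < 0)
        erase = OsEraseKey();
    return EraseKeyMatches(key, erase);
}
```

The order of precedence mirrors the text: the user's preference wins, the operating system setting is the default, and accepting both keys is the fallback that needs no operating system support at all.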

Another problem appears in the statement:

printf("\b \b");

This statement is a crude attempt at erasing a character. As it turns out, there are fairly strong conventions regarding how printing characters and newlines are handled by operating systems and output devices. These characters all move the cursor to the right or to the start of the next line. However, when you want the cursor to back up in any way or you wish to control it in any other way, you are on your own: there are no industry-wide conventions for specifying these operations. And, with no conventions to rely upon, your program has to implement a method of coping with the range of output devices.
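One way to cope is to describe each output device in a small table and build the erase sequence from that description rather than hard-coding "\b \b". In this sketch the Terminal structure, its two entries, and the function names are hypothetical; the "vt100" entry assumes that Back Space moves the cursor left and that ESC [ K erases to the end of the line, which is how VT100-compatible terminals behave.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one kind of output device. A real program
   would read these from a terminal database or a configuration file. */
struct Terminal {
    const char *name;
    const char *cursor_left;    /* move the cursor one column left */
    const char *clear_to_eol;   /* erase to end of line, or NULL if none */
};

static const struct Terminal terminals[] = {
    { "vt100", "\b", "\033[K" },   /* ESC [ K: erase to end of line */
    { "dumb",  "\b", NULL },       /* assumed: can back up, cannot clear */
};

static const struct Terminal *FindTerminal(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof terminals / sizeof terminals[0]; i++)
        if (strcmp(terminals[i].name, name) == 0)
            return &terminals[i];
    return NULL;
}

/* Store in out the bytes that visually erase the character to the left
   of the cursor. Returns the length of the sequence, as snprintf does. */
static size_t EraseSequence(const struct Terminal *t, char *out, size_t outlen)
{
    int n;

    if (t->clear_to_eol != NULL)   /* back up, then clear the rest of the line */
        n = snprintf(out, outlen, "%s%s", t->cursor_left, t->clear_to_eol);
    else                           /* the "\b \b" trick: back up, blank, back up */
        n = snprintf(out, outlen, "%s %s", t->cursor_left, t->cursor_left);
    return n < 0 ? 0 : (size_t)n;
}
```

Clearing to the end of the line is equivalent to erasing one character only when the cursor is at the end of the response, as it always is in Version Two.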

1.3 Version Three

Version Three assumes that the input buffer contains some text. This text is used for the response if the user just presses Enter (i.e., the text is the default value):

#include <ctype.h>   /* isprint */
#include <stdio.h>   /* printf */

FLAG Get_Line(char *prompt, char *buffer, int len)
{
    char *cptr = buffer;
    FLAG waskey = FALSE;    /* TRUE once the user has started typing */
    int key;

    if (len < 2) return(FALSE);   /* safety check */
    for (;;) {
        /* redraw the prompt and the current response after every key */
        ToStartOfLine();
        ClearLine();
        printf("%s: %s", prompt, buffer);
        key = KeyGet();
        if (isprint(key)) {
            if (!waskey) {          /* first key: discard the default */
                *buffer = NUL;
                waskey = TRUE;
            }
            if (cptr - buffer >= len - 1) Beep();
            else {
                *cptr++ = key;
                *cptr = NUL;
            }
        }
        else {
            switch (key) {
            case KEYBACK:
                if (!waskey) {
                    *buffer = NUL;
                    waskey = TRUE;
                }
                if (cptr > buffer) {
                    --cptr;
                    *cptr = NUL;
                    printf("\b \b");
                }
                break;
            case KEYENTER:
                printf("\n");
                return(TRUE);
                /*break;*/
            default:
                Beep();
                break;
            }
        }
    }
}

Version Three returns the supplied response if the user just presses the Enter key. Otherwise, the supplied response is erased completely the first time a printing key or Back Space is pressed. The only other changes worth noting are that the prompt has been moved inside the loop, so that the whole line is redrawn after every keystroke, and that two terminal interface routines have been added: ToStartOfLine moves the "cursor" to the beginning of the line, and ClearLine clears the line.
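For a concrete picture of these terminal interface routines, here is a sketch of ToStartOfLine and ClearLine, plus the PositionCursor routine that Version Four uses, for a VT100-compatible (ANSI) terminal. Two assumptions apply: the escape sequences are correct only for such terminals, and each routine takes an output stream argument (the book's versions take none) so that the output can be captured and examined.

```c
#include <stdio.h>

/* Assumption: output goes to a VT100-compatible (ANSI) terminal. */

/* Carriage Return moves the cursor to column 0 of the current line. */
static void ToStartOfLine(FILE *out)
{
    fputs("\r", out);
}

/* ESC [ K (Erase in Line) clears from the cursor to the end of the line. */
static void ClearLine(FILE *out)
{
    fputs("\033[K", out);
}

/* Move to zero-based column col: go to column 0, then move right.
   A VT100 treats ESC [ 0 C like ESC [ 1 C, so column 0 sends nothing extra. */
static void PositionCursor(FILE *out, int col)
{
    fputs("\r", out);
    if (col > 0)
        fprintf(out, "\033[%dC", col);
}
```

Calling ToStartOfLine followed by ClearLine erases the whole line, which is exactly what Version Three needs before it redraws the prompt and response.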

1.4 Version Four

This version adds a number of features:

commands to move the cursor left and right
insert/replace editing
a command to delete the character to the right of the cursor
commands to move to the beginning and end of the response
a command to clear the response
a command to clear the changes and restore the default
a way to insert arbitrary characters, including command characters, into the response
a cancel key
a redisplay key

This version of the routine also has a slight change to the interface: the addition of a separate default value parameter.

#include <ctype.h>    /* isprint */
#include <stdio.h>    /* printf */
#include <string.h>   /* strcpy, strlen, memmove */

/* The default-value parameter is named dflt because "default" is a C keyword. */
FLAG Get_Line(char *prompt, char *buffer, int len, char *dflt)
{
    char *cptr = buffer;       /* cursor position within buffer */
    FLAG isinsert = TRUE;      /* TRUE: insert mode; FALSE: replace mode */
    FLAG waskey = FALSE;       /* FALSE until the user starts editing the default */
    int key;

    if (len < 2) return(FALSE);   /* safety check */
    strcpy(buffer, dflt);         /* note: does not check that dflt fits */
    for (;;) {
        ToStartOfLine();
        ClearLine();
        printf("%s: %s", prompt, buffer);
        PositionCursor(strlen(prompt) + 2 + (cptr - buffer));
        key = KeyGet();
        if (isprint(key)) {
            if (!waskey) {          /* first edit: discard the default */
                cptr = buffer;
                *cptr = NUL;
                waskey = TRUE;
            }
            if (isinsert) {
                if ((int)strlen(buffer) >= len - 1) Beep();
                else {
                    /* move rest of line (and its NUL) right, then insert */
                    memmove(cptr + 1, cptr, strlen(cptr) + 1);
                    *cptr++ = key;
                }
            }
            else {
                if (*cptr == NUL) {
                    /* end of input, so append to buffer */
                    if ((int)strlen(buffer) >= len - 1) Beep();
                    else {
                        *cptr++ = key;
                        *cptr = NUL;
                    }
                }
                else *cptr++ = key;   /* replace */
            }
        }
        else {
            switch (key) {
            case KEYBACK:
                if (!waskey) {
                    cptr = buffer;
                    *cptr = NUL;
                    waskey = TRUE;
                }
                if (cptr > buffer) {
                    /* close up the gap left by the deleted character */
                    memmove(cptr - 1, cptr, strlen(cptr) + 1);
                    cptr--;
                }
                break;
            case KEYDEL:       /* delete the following char */
                if (cptr < buffer + strlen(buffer))
                    memmove(cptr, cptr + 1, strlen(cptr + 1) + 1);
                else
                    Beep();
                waskey = TRUE;
                break;
            case KEYENTER:
                printf("\n");
                return(TRUE);
                /*break;*/
            case KEYLEFT:
                if (cptr > buffer) cptr--;
                waskey = TRUE;
                break;
            case KEYRIGHT:
                if (cptr < buffer + strlen(buffer)) cptr++;
                waskey = TRUE;
                break;
            case KEYSTART:     /* move to start of response */
                cptr = buffer;
                waskey = TRUE;
                break;
            case KEYEND:       /* move to end of response */
                cptr = buffer + strlen(buffer);
                waskey = TRUE;
                break;
            case KEYQUOTE:     /* insert the next character, even if it is a control char */
                if (!waskey) {
                    cptr = buffer;
                    *cptr = NUL;
                    waskey = TRUE;
                }
                key = KeyGet();
                if (isinsert) {
                    if ((int)strlen(buffer) >= len - 1) Beep();
                    else {
                        /* move rest of line and insert */
                        memmove(cptr + 1, cptr, strlen(cptr) + 1);
                        *cptr++ = key;
                    }
                }
                else {
                    if (*cptr == NUL) {
                        /* end of input, so append */
                        if ((int)strlen(buffer) >= len - 1) Beep();
                        else {
                            *cptr++ = key;
                            *cptr = NUL;
                        }
                    }
                    else *cptr++ = key;   /* replace */
                }
                break;
            case KEYCLEAR:     /* erase response */
                cptr = buffer;
                *cptr = NUL;
                waskey = TRUE;
                break;
            case KEYDEFAULT:   /* restore default response */
                strcpy(buffer, dflt);
                cptr = buffer;
                waskey = FALSE;
                break;
            case KEYCANCEL:    /* abort out of editing */
                return(FALSE);
                /*break;*/
            case KEYREDISPLAY: /* redisplay the prompt and resp */
                break;         /* the top of the loop redraws */
            case KEYINSERT:    /* set insert mode */
                isinsert = TRUE;
                break;
            case KEYREPLACE:   /* set replace mode */
                isinsert = FALSE;
                break;
            default:
                Beep();
                break;
            }
        }
    }
}
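The buffer manipulation at the heart of Version Four can be pulled out into small helpers, which makes the insert, replace, and delete cases easier to check in isolation. This is a sketch; the helper names InsertAt, ReplaceAt, and DeleteAt and the FLAG stand-ins are hypothetical. Back Space corresponds to DeleteAt(cptr - 1) followed by cptr--, and the Delete key to DeleteAt(cptr).

```c
#include <string.h>

typedef int FLAG;   /* hypothetical stand-ins for the Appendix A names */
#define TRUE  1
#define FALSE 0

/* Insert key at cptr, shifting the rest of the response (and its NUL)
   one place right. Fails if the len-byte buffer has no room left.
   Callers guarantee len >= 2, as Get_Line's safety check does. */
static FLAG InsertAt(char *buffer, int len, char *cptr, int key)
{
    if (strlen(buffer) >= (size_t)(len - 1))
        return FALSE;
    memmove(cptr + 1, cptr, strlen(cptr) + 1);
    *cptr = (char)key;
    return TRUE;
}

/* Overwrite the character at cptr; at the end of the response, append. */
static FLAG ReplaceAt(char *buffer, int len, char *cptr, int key)
{
    if (*cptr == '\0')
        return InsertAt(buffer, len, cptr, key);
    *cptr = (char)key;
    return TRUE;
}

/* Remove the character at cptr, closing up the gap. */
static FLAG DeleteAt(char *cptr)
{
    if (*cptr == '\0')
        return FALSE;
    memmove(cptr, cptr + 1, strlen(cptr + 1) + 1);
    return TRUE;
}
```

Because memmove handles overlapping source and destination, each helper moves the tail of the string, including its terminating NUL, in a single call.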

Version Four does all that was claimed for it, but not as well as one would like. In particular:

it did not check to ensure that the default response fits within the buffer
there was no way for the user to determine whether the program is in insert or replace mode except by typing a character and finding out what happens
it assumes that all characters are the same width when displayed
it did not address the question of what the commands are, nor how the user is supposed to remember them all
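The first two shortcomings are straightforward to address. The sketch below, whose function names CopyDefault and FormatPrompt are hypothetical, refuses a default that does not fit (truncating it would be an equally defensible choice) and shows the current mode in the prompt.

```c
#include <stdio.h>
#include <string.h>

typedef int FLAG;   /* hypothetical stand-ins for the Appendix A names */
#define TRUE  1
#define FALSE 0

/* Copy dflt into the len-byte buffer only if it fits, NUL included.
   Otherwise leave an empty response and report failure. */
static FLAG CopyDefault(char *buffer, int len, const char *dflt)
{
    size_t need = strlen(dflt) + 1;

    if (len < 1)
        return FALSE;
    if (need > (size_t)len) {
        buffer[0] = '\0';
        return FALSE;
    }
    memcpy(buffer, dflt, need);
    return TRUE;
}

/* Build the prompt text with a mode indicator, e.g. "Name [Ins]: ". */
static int FormatPrompt(char *out, size_t outlen, const char *prompt, FLAG isinsert)
{
    return snprintf(out, outlen, "%s [%s]: ", prompt, isinsert ? "Ins" : "Rep");
}
```

If the prompt carries a mode indicator, the cursor column passed to PositionCursor must be computed from the length of the formatted prompt rather than from strlen(prompt) + 2.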

2 The Forest

The examples presented in this chapter bumped into these problems:

What characteristics of the display and keyboard affect text editing?
How should the program cope with presenting output on different displays?
What view of the text should be presented to the user?
How should the text be managed so that large amounts of text can be edited efficiently?
How should display updating occur so that editing changes are efficiently presented to the user?
How should the command set be designed? What should the meanings of the various commands be?
How should the program be designed so that the user can change how it operates?

These and other questions will be addressed in the remainder of this book.

Questions to Probe Your Understanding

Modify the latest version of Get_Line to accept only numeric responses. What sort of error messages should be given? (Easy)

Modify the latest version of Get_Line to accept only responses from a list that is passed in as a parameter. What sort of error messages should be given? (Easy)

What are two good formats for such a list? (Easy for those familiar with C, Medium otherwise)

What is the appropriate degree of control (key definitions, enable/disable features, etc.) that the calling program should have over the input editing? (Medium)


One: Users

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!

The saying goes: "Business would be great if it weren't for customers." Well, programming would be easy if it weren't for users. In the simple case, there would be exactly one user for your program (yourself), and you would use it only once. Most programs, however, are used many times by many people. You must take those users into account when designing your program.

This chapter will only review those aspects of users that are most relevant to text editing: full discussions of users and design can and do fill many books in themselves, some of which are listed in the Bibliography. This chapter (and this book) does not address the question of non-people users.

1.1 User Categories

Each user can be placed in a category. Each category is described in terms of the amount and the type of experience. It is important to understand users: each user creates a model of how a new program works based on his or her experience with other programs, combined with the "hints" that your program's user interface gives to him or her. It is up to you either to match your program's behavior to your users' model(s) or to give them enough information so that they generate a model that is well-matched to your program.

1.1.1 Amount of Experience

The amount of experience that a user has is a point on a continuous scale. All users start with no experience and accumulate experience as they learn. Although the scale is continuous, I have divided it into five regions in order to simplify discussion. Also, this list is not intended as a self-rating scale: most of you who are reading this book will be programmers.

Neophyte users barely know what a computer is. They lack understanding of such "basic" terms as "file" and "file name" (the concepts behind these terms are actually quite sophisticated). This lack of understanding does not mean that they are unintelligent people, only that they have never had a reason to learn these concepts. If you are designing a program for this type of user, you may feel both blessed and cursed: cursed because it can be so difficult, and blessed because this area of program design has such a pressing need for good designs.

Experience from the field of artificial intelligence can shed more light on this issue. AI researchers found it (comparatively) easy to write programs that can handle advanced mathematics such as freshman calculus. However, as the researchers pushed on to handle such easy (to most people) areas as filling in coloring books, the programming problems got harder and harder. Some of this difficulty is due to the fact that the task of teaching college-level courses is well understood (especially by college professors), but teaching coloring is not. For example, how many textbooks have you seen on "how to color"? More to the point, computers have been designed to process information in a certain way, one that is mathematically elegant but not necessarily related to how people's minds work. As people write programs for more and more "basic" tasks, this difference becomes increasingly apparent.

Many programs have been (mis-)designed for neophyte users. They often offer a few simple commands, yet leave intact such difficult concepts as that of a "file." They solve the wrong problem, rather like travelling to a place where a foreign language is spoken and trying to communicate by speaking your native language slowly and distinctly. As a program designer, you must understand the thought structure of your users and design programs that match that structure. The blessing comes from designing programs that are very different from "conventional" programs and that are well-matched to their users.

Novice users have used a computer before, perhaps for text editing, word processing, spreadsheet, or database applications. In any event, novice users have some familiarity with the idea of typing things into a box and seeing a response that somehow reflects their typing. They understand how a shift key works, that a lowercase letter 'l' is not the same as a digit '1', and so forth. They even have some understanding of the idea of "context": that keys do different things at different times. Users with this amount of experience are able to operate almost any program that has a good design and a decent manual.

Basic users are like novice users, only more so. They understand such programming concepts as thread of control, variables, and statements like "A = A + 1" (in fact, many people call such users "programmers"). These users can operate any program, even one with a poor design. Given source code to the program, they are able to customize and extend it, albeit in what might be an awkward fashion.

Power users know one or more application programs thoroughly. They understand not only how to use those programs fully, but can often go beyond the bounds of what the original designers intended. They may write large programs, often in the form of application macros, but do not design these programs. These users understand the fine points of the programs that they use.

Programmer-level users understand the theory of programming. When writing a large program, they design the program before implementing it. They generalize, applying their experience and their knowledge of one program to guess how another program will operate.

1.1.2 Type of Experience

The amount-of-experience scale is one-dimensional: people start at the beginning and proceed along the scale as they gain experience. The type-of-experience scale is more like a collection of baseball cards. A user can collect the experience types (cards) in any order, and two people with the same number of experience types (cards) may have no experience types (cards) in common.

The experience types do not necessarily carry over: experience gained on one type of system may or may not prove useful on another. Actually, experience gained on one system may make it more difficult to learn another. And, if users grow to like one type of system, they may then dislike another one, thus making any experience transfer problematic.

These experience types can have a major effect on the design of your programs, as it is usually important for new programs to appear and operate in a manner similar to existing programs. Thus, the (possibly bad) designs of those existing programs may have to be carried into the design of your program.

1.2 "Religion"

This section might also be titled "religious preference." In the computer field, "religion" is a technical term that refers to the usually irrational and extreme preference for one program, style, or method over another. Although you cannot really do anything about this phenomenon, you can keep it in mind when analyzing comments on your design.

It has been observed that people often "get religion" over the first application (for example, a word processor) that they use. I have lost count of the people who have tried to convince me that the program that they just discovered (i.e., the first one they used) is the best one in the world. This form of "religion" is normal and derives from two facts: (1) the move from manual to automated methods (e.g., from typewriters to word processors) involves a major increase in capabilities, since even the simplest word processor provides vastly more capabilities than does a typewriter, and (2) new users do not have the experience to realize that not all programs (e.g., word processors) are equal. This form of religion usually fades away over time as new users gain experience.

In a hauntingly close parallel to the "second system effect" (Brooks 1982), the "second program users" are the ones to watch out for. These people started using one program, then gave that program up in favor of a second one. The problem is that they think that since the second program is better than the first one (which it usually is), it must therefore be better than all the rest.

There is nothing in particular that you can do about users who feel religious about a program: rational arguments are generally ignored. You can, however, be aware that such users exist, and recognize when you are dealing with one.

1.3 User Goals

Knowing your user's experience is essential, but a program design must also incorporate knowledge of what task or tasks the user is trying to accomplish. For text editors, he or she might want to create:

some jottings, format not important;
some jottings, format important;
some jottings within a structure (e.g., outliners);
something with a specialized format (e.g., a business letter or a poem);
a short narrative document (e.g., a school paper or short story);
a longer document (e.g., a long paper or book);
a long document with complex formatting (e.g., a mathematics textbook);
a computer program;
data for a program; or
something else.

The frequency of doing these tasks can range from occasional to continuous. Different tasks can be performed by the same user with different frequencies.

The style of doing these tasks can also vary. One person may do all of one task, then start on the next. Another person may be frequently switching among two or more tasks.

1.4 Physiological Constraints

Users are people. There are limits to what people can do. These limits must be considered when designing a program.

Hands have a limited reach. The very act of reaching for one key draws a hand away from other keys. Thus, commands that you expect to follow one another should be assigned with that constraint in mind. Function keys are often difficult to find and awkward to press. While there are almost always two shift keys, most keyboards have only one control (or equivalent) key and may have only one of other types of shift keys. Thus, it is difficult to press some shifted keys (such as control-P) with just one hand.

Non-keyboard devices such as mice draw a hand far away from the keyboard, and you don't in general know whether it is the left or right hand that is drawn away. A sequence such as control-mouse button may be very difficult for some (i.e., left-handed) users to type.

Eyes can focus on a limited area of high resolution surrounded by a large area of lower resolution. However, areas of strong contrast such as reverse video are still visible in low-resolution areas. Blinking items are not only visible, but will draw the eye to them. "Status" displays should therefore change as quietly as possible so as not to draw the eye away from the text under edit. For example, it may make sense to place such status areas on the top part of the display if insert/delete line operations cause visible motion of the bottom part.

The mind (or brain), however, places the greatest constraints on editor design. It is only capable of processing a few thoughts ("instructions") per second. In order for users to be productive, it is important that these thoughts be directed as much as possible to useful editing operations. There are several things to consider regarding these thoughts.

First, mental effort (thought) is required to translate between the display representation of the text being edited and the user's internal representation. The WYSIWYG ("what you see is what you get") principle reduces this effort by reducing the amount of thought required. Note that in general WYSIWYG does not mean "fancy output on a graphics display." Rather, it means "it is what it appears to be, no more and no less."

Second, the mind has expectations: it sees (and in general senses) what it expects to see. In extreme cases, if something totally unexpected happens, it can take many seconds for the mind even to recognize that there is an unexpected image, in addition to the time required to process the image and make a decision. Thus, it is important for the program to anticipate what the mind will expect to see and to arrange the display accordingly.

Third, it takes mental effort to handle special cases. For example, if the delete operation deletes everything except for newlines, it takes effort to remember that difference and to monitor each command that is being given to ensure that it conforms to the restriction.

Fourth, it takes mental effort to plan ahead. The design of the editor should make it easy for the user to change his or her mind.

Last, it takes mental effort to track modes. (Chapter 9 goes into modes in detail.) Each time a new mode is introduced, it takes mental effort to track the state of the mode, and switching modes takes additional effort.

The mind's short-term memory can hold from five to seven "chunks" of information (Norman 1990). These chunks are organized in a cache-like form. When the chunk cache fills up, chunks must be stored in "main memory," a process that takes time. Considering that some of these chunks are used to remember what is being edited, why the editing is being done, and other such context, it becomes clear that the editor should be designed to use as few of these "chunks" as possible.

The mind is poor at thinking numerically. It is much easier to think in terms of "put that there" than "put object 12856 at location 83456." These last two points mean that the computer should do as much remembering as possible for the user.

1.5 Applying These Physiological Constraints

Let us examine how these principles apply to a particular user: me. I select myself as the example for the simple reason that I understand how my mind works better than I understand anyone else's.

First, I almost always work with plain ASCII files. Hence, I can take advantage of WYSIWYG on even a simple ASCII terminal.

Second, the program/computer combination that I use can (mostly) keep up with my typing in real time.

Third, the Emacs command set that I use is very regular, so my mind need only keep track of a few special cases.

Fourth, the basic paradigm behind the Emacs command set is "move to desired position, make desired change." This paradigm applies even in the case where I made a mistake, as I simply add the mistake to the list of changes to be made and continue to apply the paradigm. I never have to change mental gears. The penalty for making a mistake is thus minimized.

Fifth, the program minimizes what I need to remember: the text being edited is there to be seen, exactly as is, and there are very few state variables to track. In addition, the Emacs command set is defined mainly in terms of objects (character, word, sentence, etc.) and has a convenient way of saying "some," "a lot," "a whole lot," and "a huge amount." (Various aspects of the Emacs command set are discussed in later chapters.)

Going beyond these principles, I have used the Emacs command set so long (thirteen years) that I quip that most of my editing is performed by my spinal cord and not my brain. Although this quip is not true, since the spinal cord can only handle purely reflex actions, we will look closely at how my mind functions when editing text. The mind of any other experienced user should operate in a similar fashion.

As I write this text, part of my mind is articulating the point that I am trying to make, while another part is expanding those words into their component characters. Call these parts the "source process." Another part of my mind is translating those characters into finger motions. Call this part the "keystroke process." Other parts of my mind are reading the text as it appears on the screen, turning it back into words, and matching these words against the original word stream. Call this the "feedback process."

These three processes work in any sort of writing: using a computer, typewriter, or pen. All people who write use them. However, if the resulting text is to have few errors, one of two things must have happened: either the user made very few mistakes (thus minimizing the number of errors to be corrected) or the user must have written slowly, giving the feedback loop enough time to recognize an error before too much time has elapsed and the error becomes difficult to correct (such as an omitted character on the previous line or page).

With the advent of computers, and their ability to make seamless corrections, a third option appeared: a new, fast feedback loop. This loop operates by giving the keystroke process the ability to recognize that it made a mistake. This extra ability is not useful without seamless editing, as it takes a long time to use the eraser or correction tape. However, with (a lot of) practice, a fourth process can be "running": the "editing process."

The editing process takes the feedback from the keystroke process and inserts editing commands into the character stream created by the source process. Here is an example of how this editing might operate to correct an error when writing the text "the quick red fox."

The source process generates the appropriate character string.
An error occurs. What is actually typed on the keyboard is the string "teh".
The keystroke process recognizes the error just after the "h" was typed.
The editing process takes time to run. In this time, let us suppose that the characters " quick" (that is, a space followed by "quick") were placed in the "output queue." It is reasonable to suppose that all of the characters for each word will be placed on the queue in one operation.
The editing process then places its own string of characters on the output queue. This string will correct the error. For the Emacs command set, the sequence might be "^[b^B^T^E". This sequence means: move back a word ("^[b"), move back one more character ("^B"), interchange the two switched characters ("^T": swaps "eh" to "he"), and return to the end of the line ("^E").
The rest of the line is then processed as usual, with the characters " red fox" placed on the queue in two chunks and eventually typed correctly. At some later time, the feedback process confirms that the phrase was typed correctly.
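To make the sequence of events concrete, here is a minimal Python sketch that replays the output queue from the steps above against a one-line buffer. It is not how any real Emacs is implemented: the `run` and `backward_word` helpers and the command spellings are hypothetical, and the sketch assumes, as the text describes, that ^T exchanges the two characters just before the cursor.

```python
# Minimal sketch: replay the output queue from the "teh quick" example.
# Hypothetical helpers, not a real editor. Assumptions: a one-line buffer,
# and ^T swaps the two characters just before point, as the text describes.

def backward_word(text: str, point: int) -> int:
    """ESC b: skip back over non-word characters, then over one word."""
    while point > 0 and not text[point - 1].isalnum():
        point -= 1
    while point > 0 and text[point - 1].isalnum():
        point -= 1
    return point


def run(queue: list[str]) -> str:
    """Apply each queued item: commands move or edit, other text is inserted."""
    text, point = "", 0
    for item in queue:
        if item == "ESC b":
            point = backward_word(text, point)
        elif item == "^B":
            point = max(point - 1, 0)
        elif item == "^T":
            if point >= 2:  # swap the two characters before point
                text = (text[:point - 2] + text[point - 1]
                        + text[point - 2] + text[point:])
        elif item == "^E":
            point = len(text)
        else:
            text = text[:point] + item + text[point:]
            point += len(item)
    return text


# The typo, the word already queued, the correction, then the rest.
queue = ["teh", " quick", "ESC b", "^B", "^T", "^E", " red", " fox."]
print(run(queue))  # the quick red fox.
```

Note that the correction never requires counting characters in the queue: "back a word, back one more" locates the typo by its position relative to the words around it.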

(Other users may have variations on this process. For example, they may always delete all of any word with an error and retype the word.) With the extra fast feedback loop, the fingers kept typing at full speed all the time. Granted, an extra five characters were typed, but consider what would happen without the extra loop. It could well be that the entire phrase would have been typed before the error was noticed. The source process would have already started on the next phrase. When the feedback process noticed the error, the smooth typing of characters would stop while the user's mind determined exactly which corrections were required and how to perform them. The mind would then have to start the pipeline going again. The stopping, correcting, and starting again takes several seconds. A fifty word-per-minute typist is typing about five characters per second, so the five-character Emacs correction string would take about one second to type. There is thus a direct saving of some seconds and an indirect saving due to not having interrupted the smooth flow of thinking.

Note that the design of the command set played an important part in making this loop usable. For example, if no "go backward word" operation were available, the editing process would have to compute how many characters were in the "output buffer," an operation that is quite time-consuming (quick: how many letters in "brown"?) and poorly matched to how the mind works.

Some recent industry trends illustrate how some "user friendly" designs clash with this editing process. Consider a typical, modern window system. In some ways, it acts to frustrate an experienced user. For example, when a user closes a modified file, the computer may put up a dialog box that says "Discard changes? Yes, No, Cancel" (or words to the same effect). This prompt will be displayed in a beautiful dialog box, neatly centered on the screen. Each response will have its own button. Unfortunately, even if the user is expecting the dialog box, he or she may have to wait for the system to catch up, for these reasons:

The operating system does not know about the dialog box until the program has informed it of the box. Hence, if the mouse button is pressed too early, the button-press event will be sent to the main window and not the dialog box.
While the user knows that the dialog box will appear in the center of the screen, it is in general too difficult to predict precisely where the (dialog box) button will show up to be able to "mouse ahead." Hence, the mouse button cannot be pressed until the box is drawn.

For these reasons, an experienced user's editing process may be interrupted. These interruptions no doubt contribute to the feeling of sluggishness that many experienced users still feel when using such systems. The challenge is to design your program so that experienced users can use it productively. The steps that you can take to facilitate this use include:

keep dialog box choices consistent
provide keyboard responses for all choices
provide for type-ahead
be prepared to handle mis-directed events

In general, the goal is for an experienced user to be able to accurately predict which responses will be required, and to reliably supply those responses in advance of the prompts. In this way, experienced users can continue to do their work without being slowed down by the system.

1.6 Users Who Have Handicaps

When someone has a significantly reduced ability to do something, that person is considered to be handicapped in that area. The reduced ability might be physical, such as reduced hand motion or poor eyesight, or it might be mental, such as a reduced ability to remember things.

While the number of people who have severe handicaps in many areas is small, a large number of users have at least limited handicaps in a few areas. As it is important for programs to accommodate as wide a range of users as possible, programs must accommodate users with handicaps.

It is also important to keep in mind that those users who have severe and/or multiple handicaps can benefit greatly from the use of computers.

Sometimes, even users without a handicap benefit from designs intended to aid users with handicaps. For example, adding a wheelchair ramp to an old building also allows other people to roll heavy objects up the ramp instead of having to use stairs.

The main design principles to follow to take into account users with handicaps are:

Reduce mental complexity: have the user deal with only one object or concept at a time.
Reduce visual complexity: keep displays clean and to the point and avoid clutter.
Reduce manual complexity: allow the user to do everything with just one finger (this does not mean to force the user to do everything with one finger). Keep commands simple. Allow shortcuts and shorthand where applicable.
Provide for customization. If you are lucky, the operating system will do this for you.

It is not surprising that these are also good design rules for users without handicaps.

Questions to Probe Your Understanding

(Some of these questions refer to marketing decisions. A designer must also take into account those people who are not yet users. Remember that purchasers are "users" too.)

Consider the case where the higher you go in an organization, the less computer experience people have. Assume that product purchase decisions are made at a higher level than the product user. How does this inversion affect product design? Product marketing? (Medium)

Many product reviews include "feature checklists" or "scoreboards." These checklists in general include all features found in all related products. What are the pros and cons of these checklists for manufacturers? For users? (Medium)

I have observed that, all other things being equal, people will buy the more expensive of two application programs. Why? (Easy)

Productivity falls off as computer response time increases. However, the fall-off is not linear, but happens in a series of thresholds, where slight increases in response time cause large drops in productivity. Why do these thresholds exist? What information do you need about human physiology in order to calculate where the thresholds are? (Hard)

How would you design a program to best be used by someone with dyslexia? What about the entire computer system? It is okay to be extreme and to make it less usable by other people. (Medium)


Two: User Interface Hardware

Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

User interface hardware is the collection of devices you use when interacting with the computer. The currently available user interface hardware usually consists of a display screen for output and a keyboard and perhaps a mouse or other graphical input device for input. This chapter will first discuss the output side: the screen. It will then discuss the input side: the keyboard. Finally, it will discuss the communications paths that tie the two parts together.

2.1 Display Types

In the old days (i.e., the early 1980s), almost all displays were part of character-based terminals. Differences in capabilities among the terminals were often crucial. These differences play an important part in the types of redisplay schemes that are workable (redisplay is discussed in Chapter 7). Thus, it is worth reviewing the old display types.

2.1.1 TTY and Glass TTY

A TTY is the canonical printing terminal. Printing terminals have the property that what is once written can never be unwritten. A glass TTY is the same as a TTY except that it uses a screen instead of paper. It has no random cursor positioning, no way of backing up, and no way of changing what was displayed. It is quieter than a printing terminal, though.

When a text editor is used on one of these displays, it usually maintains a very small window (e.g., one line) and either echoes only newly typed text or else constantly redisplays (i.e., reprints) that small window. Once a user is familiar with a display editor, however, it is possible, in a crunch, to edit from a terminal of this type, but this is not generally a pleasant way to work.

Although one would hope that this type of display was gone for good, it does crop up from time to time in poorly implemented window schemes. Some window schemes offer window interfaces that resemble printing terminals all too well.

You may encounter one other type of "write only" scheme: a Unix-style output stream. As an editor writer, you may want to check for this and either:

alter your output accordingly, or
leave your output unaltered.

You may want to alter your output if you feel that the user wants to create some sort of "audit trail" type file. On the other hand, you would not want to alter your output if the user is attempting to diagnose problems by recording the data that is sent to the display.

2.1.2 Basic Displays

A basic display has, as a bare minimum, some sort of cursor positioning. It will generally also have "clear to end of line" (put blanks on the screen from the current cursor position to the end of the line that the cursor is on) and "clear to end of screen" (ditto, but to the end of the screen) functions. These functions can be simulated, if necessary, by sending spaces and newlines. A typical basic terminal is (was) the DEC VT52.

Such displays are quite usable at higher speeds (for example, over a 9600 bps connection), but usability deteriorates rapidly as the speed decreases. It requires patience to use basic displays over a 1200 bps connection, and a dedication bordering on insanity to use them at 300 bps.
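The arithmetic behind this deterioration is easy to sketch. The Python below assumes a 24 by 80 character screen and 10 bits per character on the wire (one start bit, eight data bits, one stop bit); both figures are illustrative assumptions, not properties of any particular terminal.

```python
# Assumptions for illustration: a 24 x 80 screen and 10 bits per
# character on an asynchronous serial line (start + 8 data + stop).
ROWS, COLS = 24, 80
BITS_PER_CHAR = 10


def repaint_seconds(bps: int, chars: int = ROWS * COLS) -> float:
    """Seconds needed to send `chars` characters at `bps` bits per second."""
    return chars * BITS_PER_CHAR / bps


for bps in (9600, 1200, 300):
    print(f"{bps:>5} bps: {repaint_seconds(bps):5.1f} s per full screen")
```

Under these assumptions a full repaint takes about 2 seconds at 9600 bps, 16 seconds at 1200 bps, and 64 seconds at 300 bps, which suggests why a display that must resend large parts of the screen becomes painful at low speeds.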

2.1.3 Advanced Displays

Advanced displays have all of the features of the basic displays, along with editing features such as "insert" and "delete line and/or character." These features can significantly reduce the amount of data sent to the display for common operations. A typical advanced (circa 1980) terminal is the DEC VT100. Most terminals currently manufactured are at least as powerful as this one.

There is a subtle difference among some of the advanced terminals. An "insert line" operation adds one or more blank lines at the cursor: the lines that "drop off" the bottom of the screen are lost. A "delete line" operation deletes one or more lines at the cursor: blank lines are inserted at the bottom. A "scroll window" operation (move lines x through y up/down n lines) affects only the specified lines: the other ones remain stationary.

The "scroll window" operation is more pleasing to see than the others when there is some stationary text being displayed at the bottom of the screen. With "insert/delete line," the appropriate number of lines must be deleted and then inserted; the text at the bottom thus moves within the display's memory. Such jumps are often visible to the user. With "scroll window," the whole thing is performed as one operation and the lines at the bottom do not jump.
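A toy model makes the difference visible. In this Python sketch the screen is a list of strings, one per row, and the three operations follow the descriptions above; the function names are hypothetical, not any terminal's actual commands. The example scrolls the top three lines up by one while a status line occupies the bottom row.

```python
# Toy screen model: a list of strings, one per row. Function names are
# hypothetical; behavior follows the descriptions in the text.
BLANK = ""


def delete_line(screen: list[str], row: int) -> list[str]:
    """Delete the line at `row`; a blank line appears at the bottom."""
    return screen[:row] + screen[row + 1:] + [BLANK]


def insert_line(screen: list[str], row: int) -> list[str]:
    """Insert a blank line at `row`; the bottom line drops off."""
    return screen[:row] + [BLANK] + screen[row:-1]


def scroll_window(screen: list[str], top: int, bottom: int, n: int) -> list[str]:
    """Move lines top..bottom (inclusive) up by n; other lines stay put."""
    region = screen[top:bottom + 1]
    region = region[n:] + [BLANK] * n
    return screen[:top] + region + screen[bottom + 1:]


screen = ["line 1", "line 2", "line 3", "STATUS"]

# insert/delete line: two operations, with a visible intermediate state.
step1 = delete_line(screen, 0)
print(step1)  # ['line 2', 'line 3', 'STATUS', '']  (status moved up)
step2 = insert_line(step1, 2)
print(step2)  # ['line 2', 'line 3', '', 'STATUS']

# scroll window: one operation; the status line never moves.
print(scroll_window(screen, 0, 2, 1))  # same result as step2
```

Both approaches end in the same state, but the intermediate `step1` has the status line displaced by one row: that is the jump the user may see.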

2.1.4 "Memory Mapped" Displays

This designation covers a wide range of displays. Their common characteristic is that display memory can be read or written at near-bus speeds. The display is usually built into the computer that is running the text editor. Many personal computers and workstations follow this design. But be warned: some computers have very fast display hardware, but the software that is used to interact with the display is very slow. It is probably better for a redisplay scheme to consider such displays to be "advanced" or even "basic." Examples of such slow software are the ROM BIOS calls on the IBM PC and the manufacturer-supplied display software on Sun workstations. In both cases, third-party drivers operate many times faster than the manufacturer-supplied ones.

The use of such fast displays has several implications for the redisplay process. First, many of the advanced features are typically not available. However, it may be possible to emulate the missing features quickly enough that their absence is rarely significant. Second, it may be possible to use the display memory as the only copy of the data on the screen. (This optimization is discussed in Chapter 7.) Third, if reading from the screen does not cause flicker but writing does, the incremental redisplay process can read the screen, compare the buffer against it, and change it only where necessary. Finally, if you can write to the screen without flicker, the redisplay process merely boils down to copying the buffer onto the screen, as copying is generally faster than comparing.
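The third case, reading the screen and writing only what differs, might look like the following Python sketch. It assumes the screen is a list of character rows, each as wide as the matching desired row; on real hardware, `screen` would be the display memory itself. The `redisplay` name is hypothetical.

```python
# Sketch: incremental redisplay against readable display memory.
# Assumption: `screen` stands in for display memory as a list of
# mutable character rows, each as wide as the matching desired row.

def redisplay(screen: list[list[str]], desired: list[str]) -> int:
    """Make `screen` match `desired`, writing only the differing cells.
    Returns the number of cells written."""
    writes = 0
    for row, want in zip(screen, desired):
        for col, ch in enumerate(want):
            if row[col] != ch:  # reading is cheap and does not flicker
                row[col] = ch   # write only where the screen differs
                writes += 1
    return writes


screen = [list("hello     "), list("world     ")]
print(redisplay(screen, ["hello     ", "wordl     "]))  # 2
```

When writing never flickers, the last case in the text applies instead: simply copying every row is usually faster than this compare-then-write loop.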

2.1.5 Graphics Displays

Most personal computer and workstation displays are actually bitmap-oriented graphics displays. Software is used to make them appear to display characters. With a graphics display (and the appropriate software), a program can not only display text, but display text using proportional spacing (where different letters take up different amounts of space), take advantage of different sizes, styles, and display fonts, and even incorporate graphical elements.

2.2 Keyboards

This section presents a review of salient keyboard features. Although most of us won't ever get the chance to design a keyboard, we all purchase keyboards, and, more importantly, we design programs with existing keyboards in mind.

The keyboard is the main way of telling the computer what to do. In some cases, it is the only way of doing so. Many thousands of characters will be entered in the course of a normal working session. Someone who types for a living (such as a typist, writer, or computer programmer) can easily type ten million characters each year.

The keyboard should thus be tailored for the ease of typing characters. While this statement might seem trite, a large number of keyboards on the market (i.e., most) are pretty poor for entering characters. Below is a discussion of the various keyboard features and why they are or are not desirable.

N-KEY ROLL-OVER is a highly desirable feature. Having it means that you don't have to let go of one key before striking the next. The codes for the keys that you did strike will be sent out only once and in the proper order. (The n means that this roll-over operation will occur even if every key on the keyboard has been pressed before the first one is released.) The basic premise behind n-key roll-over is that you will not hit the same key twice in a row. Instead, you will hit a different key first, and the reach for that key will naturally pull your finger off the initial one. N-key roll-over loosens the timing requirements regarding exactly when your finger has to come off the first key. Thus, typing errors are reduced. Note that n-key roll-over is of no help in typing double letters. Note also that shift keys are handled specially and are not subject to roll-over.

Some keyboards implement "2-key roll-over/n-key lockout." This means that only the first two keys of a continuous sequence will be sent and the rest ignored (until all keys are released). This "feature" is actually a way of turning the statement "we don't offer n-key roll-over" into the positive-sounding statement "we offer 2-key ..."
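The difference between the two schemes can be sketched by feeding the same key events to both. This Python model is a simplification under stated assumptions: events are (action, key) pairs, shift keys and auto-repeat are ignored, and the function names are hypothetical.

```python
# Simplified model: events are ("down", key) or ("up", key) pairs.
# Shift keys and auto-repeat are ignored; function names are hypothetical.

def n_key_rollover(events: list[tuple[str, str]]) -> str:
    """Send each key once, in the order it was pressed."""
    return "".join(key for action, key in events if action == "down")


def two_key_rollover(events: list[tuple[str, str]]) -> str:
    """Send only the first two keys pressed while any key is held;
    ignore the rest until every key has been released."""
    sent, held, count = [], set(), 0
    for action, key in events:
        if action == "down":
            held.add(key)
            if count < 2:
                sent.append(key)
            count += 1
        else:
            held.discard(key)
            if not held:
                count = 0  # all keys up: the lockout ends
    return "".join(sent)


# A fast typist rolls "the": each key goes down before the last comes up.
rolled = [("down", "t"), ("down", "h"), ("down", "e"),
          ("up", "t"), ("up", "h"), ("up", "e")]
print(n_key_rollover(rolled))    # the
print(two_key_rollover(rolled))  # th
```

When a fast typist's strokes overlap, the lockout keyboard silently drops the "e"; when each key is released before the next is pressed, both schemes agree.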

AUTO-REPEAT means that if a key is pressed and held down, the code for that key is sent repeatedly. It is a very desirable feature. It can cause problems (say, if you put something down on the keyboard), but such problems are worth living with. Older terminals sometimes followed typewriter design in that only certain keys would repeat (such as space, 'x', and dash). Repeating just these few keys is not useful. Other terminals repeat the printing characters but not the control characters. This is also not useful. As we will see later, it is the control characters that we are most likely to want to repeat.

There are three parameters associated with auto-repeat: the initial delay to the first repeat, the rate at which a key will repeat, and the acceleration of the repeat. Ideally, the user should be able to set these parameters. If they cannot be set, the values selected by the manufacturer become an additional consideration.
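To show how the three parameters interact, here is a Python sketch that lists when repeated codes would be sent while a key is held. Times are integer milliseconds to keep the arithmetic exact. The acceleration model (the interval shrinks by a fixed step after each repeat, down to a floor) is an assumption for illustration; real keyboards may accelerate differently, or not at all.

```python
# Sketch of auto-repeat timing. The acceleration model (fixed shrink per
# repeat, with a floor) is an illustrative assumption, not a standard.

def repeat_times(hold_ms: int, delay_ms: int, interval_ms: int,
                 step_ms: int = 0, min_interval_ms: int = 10) -> list[int]:
    """Milliseconds after the press at which repeated codes are sent
    while a key is held down for hold_ms.

    delay_ms:    initial delay to the first repeat
    interval_ms: initial time between repeats (1000 / rate)
    step_ms:     acceleration: interval shrinks by this much per repeat
    """
    times, t = [], delay_ms
    while t <= hold_ms:
        times.append(t)
        t += interval_ms
        interval_ms = max(interval_ms - step_ms, min_interval_ms)
    return times


print(repeat_times(1000, 500, 100))                   # [500, 600, 700, 800, 900, 1000]
print(len(repeat_times(1000, 500, 100, step_ms=20)))  # 26
```

With no acceleration, a one-second hold yields six repeats; with the assumed acceleration, the same hold yields 26, which is why a setting that suits one user may send another user flying past the intended position.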

"TYPEABILITY" (I trust that the English language has not sunk to the point where this is considered to be a valid word) is the single most critical feature. It is simply the ability to type the useful characters without moving your fingers from the standard touch-typing position (the "asdf" and "jkl;" keys). As more and more people who use (computer) keyboards are touch typists and can thus type reasonably fast, they should not be slowed down by having to move their hands out of the basic position. It can take one or two seconds to locate and type an out-of-the-way key. The row above the digits is out of the way, as are numeric keypads and cursor control keys. One second is from three to ten characters of time (at 30-100 words per minute). Thus, it takes less time in general to type a four- or five-character command from the basic keyboard than to type one "special" key.
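The figure of three to ten characters per second follows from assuming about six characters per word (five letters plus a space). This small Python sketch makes that assumption explicit and computes what an out-of-the-way key costs in forgone keystrokes; `CHARS_PER_WORD` and the helper names are illustrative.

```python
CHARS_PER_WORD = 6  # assumption: five letters plus a space


def chars_per_second(wpm: float) -> float:
    """Typing speed in characters per second."""
    return wpm * CHARS_PER_WORD / 60


def reach_cost(wpm: float, reach_seconds: float) -> float:
    """Characters that could have been typed while reaching for a key."""
    return chars_per_second(wpm) * reach_seconds


for wpm in (30, 50, 100):
    print(f"{wpm} wpm = {chars_per_second(wpm):.0f} chars/s; "
          f"a 2 s reach costs {reach_cost(wpm, 2):.0f} characters")
```

At 50 words per minute, a one- to two-second reach costs five to ten characters of typing time, more than a four- or five-character command typed from the home row.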

Because of the desire for typeability, it is worth at least considering doing away with such keys as Shift Lock or Caps Lock. They are rarely, if ever, used, and the keyboard space that they occupy is in high demand. (Yes, I realize that my anti-uppercase bias is showing here.)

Keyboard manufacturers have done other things that reduce typeability. Two examples are illustrative. First, the timing on the shift keys can be blown. The result is that when "Foo" is desired, "FOo," "fOo," and "foo" are as likely to result. The other example is having a small "sweet spot" on each key. Missing this "sweet spot" will cause both the desired and the adjoining key to fire, or neither. Thus, striking "i" could cause either "io" or nothing to be sent.

PACKAGING, or physical keyboard design, is also very important. Sharp edges near the keyboard or too tightly packed keys can cause errors and fatigue. Can the keyboard be positioned so as to be comfortable? Is there a palm ledge (this may be either good or bad)? Does the keyboard meet "ergonomic" standards? (In my experience, "ergonomic" standards equate to "hard to use.")

2.2.1 Special Function Keys

Keyboard manufacturers seem to have decided that a plethora of special keys is more useful than adding shift keys. Thus, you can get keyboards with Insert Line or "cursor up" or (gasp) PF1 (if not LF1, F1, and RF1). These keys, when pressed, will either do the function that they name, do something totally random, or send a (usually pre-defined and unchangeable) sequence of characters to the program.

With the advent of windowing systems, manufacturers have realized that the keyboard/display combination simply does not have the information required to properly perform the function locally. They have also decided that random operations don't sell devices well. This is actually a change from the terminals made a few years ago.

That leaves us with character sequences. Ideally, the sequences would be programmable. Thus, the editor could save the current set of programmed sequences (if any), load a set that would not interfere with any editing commands, then restore the user's settings upon exit. However, this is the real world and it is often the case that the sequences are not programmable.

Given this, the keys may or may not be useful. For example, the "cursor up" key might send Escape 'E'. You may wish this particular sequence to perform a "move to end of sentence" operation (I do). Thus, pressing the "cursor up" key will move you to the end of the sentence!

Okay, you say, I won't use Escape 'E' to move to the end of the sentence. You then look up all of the sequences that may be sent by function keys and design your command set around them. All is well and good until you try to use a different keyboard. Your new keyboard will in general use different sequences than the old one. The sequences may even conflict: for example, the "cursor down" key on the new terminal might send Escape 'E'.

We got into this situation for two reasons: a major one and a minor one. The minor one is easy to deal with. All that we have to do is tell the editor which keyboard we are using and have the editor perform any required adjustments. On UNIX systems, for example, the required information can be found in the /etc/termcap or terminfo facilities.
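One way to picture "tell the editor which keyboard we are using" is a per-terminal table. The table maps the bytes each function key sends to an editor command. The C code below is a minimal sketch under stated assumptions:

- The table, the names `key_table` and `lookup_key`, and the command names are all hypothetical.
- A real editor would read the sequences from termcap or terminfo rather than hard-code them.
- The VT52 and VT100 cursor-key sequences are the standard ones for those terminals. The VT100 entries assume its normal, not application, cursor-key mode.

```c
#include <stddef.h>
#include <string.h>

/* One function key: the bytes the terminal sends and the editor
   command it should run (command names are hypothetical). */
struct key_binding {
    const char *sequence;
    const char *command;
};

struct terminal_keys {
    const char *term_name;
    const struct key_binding *keys;
    size_t count;
};

static const struct key_binding vt52_keys[] = {
    { "\033A", "previous-line" },
    { "\033B", "next-line" },
    { "\033C", "forward-char" },
    { "\033D", "backward-char" },
};

/* VT100 cursor keys in normal (not application) mode. */
static const struct key_binding vt100_keys[] = {
    { "\033[A", "previous-line" },
    { "\033[B", "next-line" },
    { "\033[C", "forward-char" },
    { "\033[D", "backward-char" },
};

static const struct terminal_keys key_table[] = {
    { "vt52",  vt52_keys,  sizeof vt52_keys  / sizeof vt52_keys[0]  },
    { "vt100", vt100_keys, sizeof vt100_keys / sizeof vt100_keys[0] },
};

/* Return the command bound to SEQ on terminal TERM, or NULL if the
   terminal is unknown or SEQ is not a function key on it. */
const char *lookup_key(const char *term, const char *seq)
{
    for (size_t t = 0; t < sizeof key_table / sizeof key_table[0]; t++) {
        if (strcmp(key_table[t].term_name, term) != 0)
            continue;
        for (size_t k = 0; k < key_table[t].count; k++)
            if (strcmp(key_table[t].keys[k].sequence, seq) == 0)
                return key_table[t].keys[k].command;
        return NULL;
    }
    return NULL;
}
```

On a VT52, the bytes Escape 'A' are the "cursor up" key. On a VT100 they are just Escape followed by an ordinary 'A'. This is the same kind of conflict the text describes.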

The major reason why we are in this situation is that the program cannot tell when we are pressing a function key and when we are typing the same sequence of characters explicitly. After all, there are only 128 or 256 possible characters, and they must be shared by regular keys and function keys.

Some systems that support directly attached terminals use timing information to make this determination. If a string of characters comes in with no delays between them, they assume (usually correctly) that it is a single function-key press. This timing approach does not work if the terminal (or other computer) is coming in via a network, as the network can delay and regroup the characters arbitrarily.
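The timing heuristic can be sketched as follows. The sketch assumes each input byte arrives tagged with a millisecond timestamp. The 10 msec threshold and all of the names are hypothetical; the threshold was chosen only because no typist produces keystrokes that close together. The sketch also looks only at bytes already received. A real implementation would probably also pause briefly after the last byte before deciding.

```c
#include <stddef.h>

/* A received byte and the time (in milliseconds) it arrived. */
struct key_event {
    unsigned char byte;
    long time_ms;
};

/* Hypothetical threshold: bytes closer together than this are taken
   to come from a single function-key press. */
#define FKEY_GAP_MS 10

/* Starting at events[start], return how many events form one
   function-key sequence.  Only a run beginning with Escape (27) is a
   candidate; anything else is treated as one separately typed key.
   Returns 0 if START is past the end of the input. */
size_t function_key_length(const struct key_event *events, size_t n,
                           size_t start)
{
    if (start >= n)
        return 0;
    if (events[start].byte != 27)
        return 1;
    size_t i = start + 1;
    while (i < n && events[i].time_ms - events[i - 1].time_ms < FKEY_GAP_MS)
        i++;
    return i - start;
}
```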

The problem could best be solved by standardizing the character sequences sent by function keys so as to (1) have a single, obscure prefix (say, Escape, control-_) and (2) have a consistent syntax so that all devices can easily determine when the sequence is over. Command set designers would just have to live with the hole in the command set, but that would be a small price to pay.
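To make the proposal concrete, here is a minimal C parser for one hypothetical encoding that meets both requirements:

- the prefix Escape, control-_ (31);
- then one or more decimal digits giving the key number;
- then '~' to end the sequence.

The encoding and the names are illustrative assumptions, not an existing standard. Because the syntax is fixed, the parser can always tell whether it has a complete key, needs more bytes, or is looking at ordinary input.

```c
#include <stddef.h>

/* Hypothetical encoding: ESC, control-_, decimal key number, '~'. */
enum fkey_status { FKEY_NOT_FKEY, FKEY_INCOMPLETE, FKEY_COMPLETE };

/* Examine the LEN bytes in BUF.  On FKEY_COMPLETE, store the key
   number and the number of bytes the sequence used. */
enum fkey_status parse_fkey(const unsigned char *buf, size_t len,
                            int *key_number, size_t *consumed)
{
    static const unsigned char prefix[2] = { 27, 31 }; /* ESC, control-_ */
    size_t i;

    for (i = 0; i < 2; i++) {
        if (i >= len)
            return FKEY_INCOMPLETE;     /* prefix so far matches */
        if (buf[i] != prefix[i])
            return FKEY_NOT_FKEY;       /* ordinary input */
    }

    int value = 0;
    size_t digits = 0;
    for (; i < len; i++) {
        if (buf[i] >= '0' && buf[i] <= '9' && digits < 5) {
            value = value * 10 + (buf[i] - '0');  /* at most 5 digits */
            digits++;
        } else if (buf[i] == '~' && digits > 0) {
            *key_number = value;
            *consumed = i + 1;
            return FKEY_COMPLETE;
        } else {
            return FKEY_NOT_FKEY;       /* breaks the fixed syntax */
        }
    }
    return FKEY_INCOMPLETE;             /* wait for more bytes */
}
```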

Aside from the problems of compatibility with whatever software is being run, the placement of the function keys is also a problem. As was mentioned before, keys that are off to one side take a long time to hit. Thus, typing is slowed down considerably. The keys are best used for infrequently used functions or functions where the extra time is not a significant factor (e.g., Help).

There is yet one more problem. Additional keys are not free and so the number of them that you'll want to pay for is limited. However, it is desirable to have the ability to specify a large number of functions (i.e., have a large number of codes that can be specified by the user). The number of function keys required grows linearly with the number of codes.

2.2.2 Extra Shift Keys

The other way to increase the number of codes available to the user is to provide extra shift keys. Shift keys are keys that modify the actions of the other keys. Shift and Control are the two most common examples of such keys. The IBM PC has an Alt key, the Apple Macintosh has its "cloverleaf" key, and some terminals have a Meta key option.

As an example, a Meta key would set the top (value 128 decimal) bit of the character that is typed. Thus, while typing shift-A would send the code for uppercase A (65 decimal), meta-shift-A (often abbreviated as simply meta-A or ~A) would send the code 128 + 65 or 193 decimal. A user can thus specify 256 codes instead of the usual 128 from a full ASCII keyboard.
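The Meta bit arithmetic is simple enough to show directly. This is a sketch that assumes 8-bit character codes; the function names are hypothetical.

```c
/* Meta sets the top bit (value 128) of an 8-bit character code. */
#define META_BIT 0x80

unsigned char add_meta(unsigned char c)   { return (unsigned char)(c | META_BIT); }
int has_meta(unsigned char c)             { return (c & META_BIT) != 0; }
unsigned char strip_meta(unsigned char c) { return (unsigned char)(c & ~META_BIT); }
```

For example, `add_meta('A')` gives 193, and `strip_meta(193)` gives back 65, the code for 'A'.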

The number of possible codes grows exponentially with the number of extra shift keys. Thus, 512-, 1024-, and even 2048-code keyboards (with 2, 3, or 4 extra shift keys) are conceivable. You will have to use system-dependent techniques to take advantage of this extra information.
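The sketch below assumes each extra shift key contributes one more bit above the seven ASCII bits. The actual layout is system-dependent, as noted. Under that assumption, it shows the encoding and the resulting code counts. The names are hypothetical. The function-key count is included for comparison with the linear growth described earlier.

```c
/* SHIFTS is a bit mask of the extra shift keys held down: the first
   extra key adds bit 7 (128), the second bit 8 (256), and so on. */
unsigned encode_key(unsigned char ascii, unsigned shifts)
{
    return (ascii & 0x7Fu) | (shifts << 7);
}

/* Codes available with EXTRA extra shift keys: 128 * 2^EXTRA. */
unsigned long codes_with_shift_keys(unsigned extra)
{
    return 128UL << extra;
}

/* Codes available with KEYS function keys: one more code per key. */
unsigned long codes_with_function_keys(unsigned keys)
{
    return 128UL + keys;
}
```

Four extra shift keys yield 2048 codes. Four function keys yield only 132.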

Finding room on the basic keyboard for these extra shift keys is not easy. That is one reason why the removal of the Shift Lock key was suggested earlier. These keys must be on the basic keyboard in order to preserve touch-typeability.

2.2.3 Key Placement

A computer is not a typewriter. There are things that you do with a computer that simply do not apply to typewriters. Hence, a computer keyboard should have more keys than a typewriter, and yet these keys must be conveniently placed.

Several computer manufacturers have achieved good keyboard designs. Unfortunately, most of them have retired their good designs in favor of poor ones. (See the next section for examples.) Here are some of my criteria for good key placement:

Basic QWERTY keyboard (Dvorak keyboards are discussed later)
top row has: Escape, 1/!, 2/@, 3/#, 4/$, 5/%, 6/^, 7/&, 8/*, 9/(, 0/), -/_, =/+
second row has: Tab, QWERTYUIOP, [/{, ]/}
third row has: Control, ASDFGHJKL, ;/:, '/"
fourth row has: Shift, ZXCVBNM, ,/<, ./>, //?, Shift
extra shift keys (Alt, Meta, etc.) should be immediately below the shift keys
the Back Space and/or Delete keys should be on the upper right, as close in as possible
the keys `/~ and \/| should fit into the right somewhere
the Return (Enter, etc.) key should be on the right, as close in as possible, in the third or second and third rows
the Break key should be on the far upper right

These are the positions that have come to be accepted as standard for computer keyboards. However, some manufacturers have gotten scared that their computers might actually resemble computers. Thus, necessary keys such as Escape and Control get moved out to the far reaches of the keyboard, and "<" and ">" characters get moved from their convenient, traditional positions above "," and "." to who knows where.

Dvorak keyboards are an underground fad. Their proponents swear by them and claim significant performance improvements (i.e., you can type faster on them). As the story goes, the standard QWERTY layout was designed to slow typing on the early typewriters in order to keep the mechanism from jamming. And, since jamming is no longer a consideration, one can (and Dvorak did) design a layout that is "better." Regardless of the truth of the story (and I believe it to be true), all keyboard layouts can take advantage of the improvements in technology. For example, modern keyboards are actually a grid of switches. These switches are scanned electronically. Their travel, feel, and other characteristics can be adjusted as desired. They have been adjusted so that both key travel and effort are much reduced from old, manual typewriters. Hence, hand and finger motions are reduced overall and the benefits to be gained from switching layouts are thereby reduced.

Considering that there are hundreds of millions of existing keyboards that use the QWERTY layout, and that there are billions of people trained to use it, it becomes clear that only an enormous gain in productivity (e.g., greater than 100%) would be able to justify a switch to another layout. And while there are a number of isolated success stories, not even the proponents of Dvorak layouts offer any controlled studies that show the requisite gains (Norman 1990). Hence, these keyboards are not being adopted on a large scale.

2.2.4 Example Keyboards

This section will briefly review a number of widely available keyboards. The keyboards reviewed are the ones actually named: the review does not transfer to "clones." The comments are, of course, my personal opinions.

DEC VT100 terminal: The keyboard layout is excellent. The feel is clunky. The control keys don't repeat.

DEC VT200 terminal: The keyboard layout is poor (badly placed Escape key and "<" and ">" keys). The feel is pretty good.

IBM PC 83-key keyboard: This is the one sold with the original IBM PC. Its layout is almost excellent (the "\"/"|" key placement is a little weird). It makes a clacking sound which I happen to like although many people do not. The feel is excellent. If only they didn't try to "improve" it with...

IBM PC 101-key keyboard: This is the only one that you can get from IBM now, and it is enough in itself to keep me from buying a new IBM PC. The Escape and control keys are very poorly placed. The feel is excellent.

Apple Macintosh original "slab" keyboard: The layout isn't too bad if you consider that Apple intended this machine to be its own universe and did not try to incorporate outside software. On the whole, however, it suffers from not having quite enough keys (especially Escape), so that terminal emulator programs are awkward to use. The feel is fair.

Apple Macintosh "Standard" keyboard: Perfect layout, enough keys, great feel.

Apple Macintosh "Enhanced" keyboard: This keyboard is for people who like the IBM PC 101-key keyboard. Enough said.

Sun Microsystems SPARCstation keyboard: Excellent layout, poor feel, too many function keys.

2.3 Graphical Input

Another way of interacting with a computer is by means of a graphical input device. The advantage of a graphical input device is that it can reduce the number of commands needed. Such a device is used for pointing at sections of the screen. It is possible to specify items (i.e., "operate on that") without having to specify the numerical address of the location or a command string to move there.

When a graphical input device is used, the screen is treated as one menu with the device pointing to one entry. A cursor is used to provide feedback to the user about which menu "item" is currently selected. There are usually one or more flags that can be specified conveniently from the device. These flags provide control information and are analogous to shift keys.

The basic way to use these devices is to track the position implied by the graphical input device with the cursor. When a signal is given, the action implied by the current position is performed. The screen is logically broken up into two or more sections. One section has the text that is being edited. Moving the cursor here provides a convenient way to move the point around; typing a character could cause it to be inserted wherever the cursor is. Other portions of the screen can specify menus of possible actions to select from. Graphical input is thus a very sophisticated way of specifying a position as an argument to a function.
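The screen-as-menu idea reduces to a hit test. When the signal is given, find which section contains the cursor, then perform that section's action. Here is a minimal sketch in character-cell coordinates; the region layout and action names are hypothetical.

```c
#include <stddef.h>

/* A rectangular screen section and what a signal there means. */
struct region {
    int top, left, bottom, right;   /* inclusive character cells */
    const char *action;             /* e.g. "move-point" or a menu item */
};

/* Return the region containing (ROW, COL), or NULL if none does.
   Earlier regions win if regions overlap. */
const struct region *hit_test(const struct region *regions, size_t n,
                              int row, int col)
{
    for (size_t i = 0; i < n; i++) {
        const struct region *r = &regions[i];
        if (row >= r->top && row <= r->bottom &&
            col >= r->left && col <= r->right)
            return r;
    }
    return NULL;
}
```

For a signal in the text section, the editor would go on to convert the row and column into a buffer position for the point.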

The following sections discuss the advantages and disadvantages of a variety of graphical input devices. Bear in mind that the comments are generalizations: there are exceptions to each of the advantages and disadvantages mentioned.

2.3.1 Touch Sensitive Display

A Touch Sensitive Display (TSD) is just what it sounds like. The screen is covered with a special transparent material (or a grid of LEDs and receptors or other devices) that you touch with your finger: the absolute (x,y) coordinates of where you touched are then reported. The only available flag is the "touch/no touch" flag. (Actually, experimental pressure-sensitive displays exist that report all three positions and three pressure axes.) The well-engineered touch-sensitive displays are quite pleasant to use for low-usage applications. For high-usage purposes such as text editing, it is tiresome to keep raising your hand to the screen, and your finger tends to cover the most interesting part of the display (i.e., the part that you are about to edit).

2.3.2 Tablet

A tablet is a special surface that reports the position of the input device as an (x,y) coordinate. The input device can be a "puck" (a small box) or a special pen. At least one flag ("touch/no touch") is always available: some pucks have four, sixteen, or even more extra flags. Tablets are very handy for converting paper documents such as maps into computer form. They are less useful for text editing, as they tend to be large and therefore require a long reach and a lot of uncluttered desk space.

2.3.3 Mouse

A mouse is a small box on wheels (or, in some cases, on felt pads over a special pad). As you move it around on the floor, desk, books, a leg, or most anything else, it reports the relative movement of the mouse (i.e., "I was just moved n units up and m units left"). It can have several flags (buttons), although the correct number is one, as having extra buttons means that program designers will try to put extra functions on them. And, while the functions themselves are not a problem (I do advocate extra shift keys, after all), the presence of these functions usually implies a poor program design. Fortunately, if the mouse has extra buttons, the software can easily correct this "defect" just by making them all do the same thing.
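Because a mouse reports only relative movement, the editor (or display) must keep a running absolute position and keep it on the screen. The sketch below works in character cells. It assumes the reports have already been scaled from mouse units to cells, and its names are hypothetical. It also shows the "make them all do the same thing" treatment of extra buttons.

```c
/* Cursor position kept by the editor, in character cells. */
struct cursor {
    int row, col;
};

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Apply one relative report: DOWN rows down and RIGHT columns right
   (negative values mean up or left).  The absolute position is the
   running sum, kept within a ROWS by COLS screen. */
void mouse_moved(struct cursor *c, int down, int right, int rows, int cols)
{
    c->row = clamp(c->row + down, 0, rows - 1);
    c->col = clamp(c->col + right, 0, cols - 1);
}

/* Treat every button alike: BUTTONS is a bit mask of the buttons
   currently held, and any of them counts as "the" button. */
int button_down(unsigned buttons)
{
    return buttons != 0;
}
```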

2.3.4 Trackball

A trackball is an upside-down ("dead") mouse. Instead of moving the wheels by moving the box, you spin the slightly larger wheel directly.

2.3.5 Joystick

A joystick is a small stick mounted on a couple of potentiometers. Joysticks typically can report either absolute position, first derivative (relative movement), or second derivative (acceleration). As the stick is moved only over a small distance, it is difficult to construct one with good resolution and that avoids "stickiness" and "jumpiness." It is generally not as nice to use as a mouse or trackball. Flags are simulated by regular keyboard keys.

2.3.6 A Different Mouse

Finally, an imaginary but useful device should be considered. That device is a foot-operated mouse (perhaps called a "rat"?). Using your feet rather than your hand to operate the mouse solves one of the most nagging problems of any of these devices, which is that your hands must leave the keyboard with the usual, aforementioned results. Of course, this device makes it harder to edit with your feet up on your desk...

2.3.7 Other Devices

New types of input devices appear all the time. Thus, no listing of such devices can ever remain complete. An example of a recent such device is "pen" input. The points to remember are that each device should be judged on its own strengths and weaknesses and that the devices should be judged on how they help your users: not whether the devices are "neat" or "new."

2.3.8 Conclusion

These devices all assume a reasonably high-bandwidth connection to the computer (say, 2400 bps or faster). If you have a slow-speed connection, the cursor tracking must be performed in the local display device, which must somehow be programmed with the knowledge of when to report events and what to do with the cursor (sometimes the cursor changes shape as it crosses from one part of the screen to another). In this way, it is possible to supply the necessary immediate feedback. A slow-speed connection would be quite satisfactory for communicating the significant events, but probably not satisfactory for the screen refresh that would follow, say, the selection of a menu.
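The bandwidth argument is easy to check with a little arithmetic. This sketch assumes the common asynchronous format of 10 bits per character on the wire: 1 start bit, 8 data bits, no parity, and 1 stop bit. The function name is hypothetical.

```c
/* Seconds to send N_CHARS over an asynchronous serial line running
   at BPS bits per second, at 10 bits per character on the wire. */
double serial_seconds(long n_chars, long bps)
{
    return (double)n_chars * 10.0 / (double)bps;
}
```

Refreshing a full 24-by-80 screen (1920 characters) at 2400 bps takes 1920 * 10 / 2400 = 8 seconds. A hypothetical 6-byte event report takes only 25 msec. That difference is why a slow line can carry the events but not the refresh.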

2.4 Communications Path Issues

This section covers a number of miscellaneous issues concerning the communications path between the computer and the display/keyboard device.

2.4.1 Speed and Character Format

It almost goes without saying that the faster the communications path, the better. Consider it said.

It also almost goes without saying that a full-duplex communications path is necessary. Fortunately, we are long past the days when users were forced to wait until the computer let them type. Except on automated teller machines.

If the communications are over an asynchronous serial path, character format is an issue. The considerations are:

Seven or eight data bits? Pick eight if possible. In this way, you at least have the potential to use extra key codes. The choice of eight data bits also lets you use international character sets.
Even, odd, or no parity? Pick one to go with the data bits, as this field has no effect on the data that the editor sees. The combinations that you will tend to find all over the place are seven data bits and even parity (older systems) and eight data bits and no parity (newer systems).
Number of stop bits? One, unless your system wants a differentnumber.
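On POSIX systems the character format is set through the termios interface. Here is a minimal sketch that selects eight data bits, no parity, and one stop bit in a `struct termios`. The function name `set_8n1` is hypothetical; the flags are standard POSIX. The function changes only the structure. The caller would fetch the structure with `tcgetattr()` and apply it with `tcsetattr()`.

```c
#include <termios.h>

/* Eight data bits, no parity, one stop bit ("8N1"). */
void set_8n1(struct termios *t)
{
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8;
    t->c_iflag &= ~(tcflag_t)ISTRIP;   /* don't strip input to 7 bits */
}
```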

Operating system designers make the quite valid and reasonable assumption that they should be doing some processing of the input characters. Fortunately, they usually also offer the ability to turn such processing off. A text editor should follow these steps:

On entry to the editor:

Record the current processing parameters.
Turn off all character processing.

On exit:

Restore the saved processing parameters.
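On a POSIX system these steps map onto `tcgetattr()` and `tcsetattr()`. The sketch below does three things:

- records the settings on entry;
- turns off line editing, echo, signal characters, XON/XOFF, and output processing;
- restores the saved settings on exit.

The function names are hypothetical. Exactly which flags an editor clears is a design choice. With `ISIG` off, for example, the editor itself must handle the interrupt and suspend characters. With `OPOST` off, it must send its own carriage returns.

```c
#include <termios.h>
#include <unistd.h>

static struct termios saved_modes;   /* settings in effect on entry */
static int modes_saved = 0;

/* Turn off the usual processing in T.  Reads then return after
   each byte. */
void make_raw(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO | ISIG | IEXTEN);
    t->c_iflag &= ~(tcflag_t)(IXON | ICRNL | BRKINT | INPCK | ISTRIP);
    t->c_oflag &= ~(tcflag_t)OPOST;
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}

/* On entry: record the current parameters, then turn processing off.
   Returns 0 on success, -1 if FD is not a terminal or the call fails. */
int editor_enter(int fd)
{
    struct termios raw;
    if (!isatty(fd) || tcgetattr(fd, &saved_modes) != 0)
        return -1;
    modes_saved = 1;
    raw = saved_modes;
    make_raw(&raw);
    return tcsetattr(fd, TCSAFLUSH, &raw);
}

/* On exit: restore the saved parameters once pending output drains. */
int editor_exit(int fd)
{
    if (!modes_saved)
        return 0;
    return tcsetattr(fd, TCSADRAIN, &saved_modes);
}
```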

In this way, the text editor has complete control over what happens with the input characters. This places an extra burden on you as the writer of the editor, as you must replace the operating system handlers with versions of your own that mimic the existing functions. On the other hand, your versions will probably differ from the operating system versions in a number of crucial ways. For example, if the operating system lets you suspend your process (e.g., under a Unix that supports job control), you need to restore the terminal and input processing parameters before you turn control back to the operating system. When resumed, you need to return the settings back to those used by the editor (noting any changes such as a new window size) and probably refresh the display. If you hadn't replaced the normal handlers, the user would find yours to be a very unfriendly program to use.

2.4.2 Flow Control

The faster the communications path, the less time the display has to process each character. As the speed of the communications path is increased, a point will be reached when the display can no longer keep up in real time. This is the point at which flow control is required. There are three methods currently in use to implement flow control.

The first is in-band control. Two characters are reserved for flow control purposes, typically the control-S and the control-Q characters. The first is used to mean "hold on, I can't keep up and my buffer is almost full." The second means "okay, I've caught up and you can proceed." This method works for the most part, but has the annoying property of using up two valuable control characters. Reserving any control characters causes problems for some programs. For example, there exist some communications protocols that use all 256 characters and allow no characters to be reserved.
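Seen from the sending side, in-band flow control is a two-state machine. Here is a minimal sketch. It assumes the usual control-S (XOFF, 19) and control-Q (XON, 17) characters, and its names are hypothetical.

```c
#define XOFF 19   /* control-S: "hold on" */
#define XON  17   /* control-Q: "proceed" */

struct flow_state {
    int paused;
};

/* Process one byte arriving from the receiver.  Returns 1 if the byte
   was consumed as flow control, 0 if it is ordinary data. */
int flow_control_byte(struct flow_state *s, unsigned char c)
{
    if (c == XOFF) { s->paused = 1; return 1; }
    if (c == XON)  { s->paused = 0; return 1; }
    return 0;
}

/* The sender may transmit only while not paused. */
int may_send(const struct flow_state *s)
{
    return !s->paused;
}
```

This code swallows bytes 17 and 19. The program receiving the rest of the data can therefore never see them as commands, which is the cost of in-band control.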

The second method is out-of-band control. This method uses a variety of mechanisms, none of which interfere with sending data. Examples of such methods are hardware "handshake" lines and network protocol mechanisms. This method is clearly superior to in-band.

The final method is flow control avoidance. This method takes advantage of the facts that displays take different amounts of time to process different characters and that some characters (called padding characters) take very little time to process. The program sends the data as a mix of useful characters and padding characters. The specific mix is computed so that the average time required to process each character is less than the time taken to send a character over the communications path and that the terminal's input buffer does not overflow.

For example, let's say that we have these (fairly typical) figures:

communications path speed is one character per msec
time to process a printing character is .6 msec
time to process a line feed character is 17 msec
time to process a pad character is .1 msec

If we were just sending full lines of text to the display, we would send 81 characters (80 printing characters and a line feed) in 81 msec. These 81 characters would take 80 * .6 msec + 17 msec = 65 msec to process. Hence, no padding would be required.

On the other hand, if we were just sending single-character lines of text to the display, we would send 2 characters in 2 msec. These 2 characters would take 1 * .6 msec + 17 msec = 17.6 msec to process. Padding would be required as the 17.6 msec processing time is greater than the 2 msec transmission time. As it turns out, 18 padding characters will be sufficient (1 * .6 msec + 17 msec + 18 * .1 msec = 19.4 msec, which is less than the 20 msec of transmission time). It is not difficult to calculate the correct number of padding characters required, given the character mix and the communications path speed.
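The padding calculation generalizes directly. Require the processing time of n characters plus P pads to be no more than their transmission time:

work + P * pad <= (n + P) * send

Solving for P gives the excess work divided by (send - pad), rounded up.

Here is a minimal sketch using the example figures, in microseconds to avoid rounding error. The names are hypothetical. Every character other than line feed is assumed to be a printing character. Only the average-rate condition is checked; the terminal's input buffer is not modeled.

```c
#include <stddef.h>

/* Example figures from the text, in microseconds. */
#define SEND_US      1000   /* transmit one character            */
#define PRINT_US      600   /* process a printing character      */
#define LINEFEED_US 17000   /* process a line feed               */
#define PAD_US        100   /* process a padding character       */

static long process_us(char c)
{
    return c == '\n' ? LINEFEED_US : PRINT_US;
}

/* Smallest P with  work + P * PAD_US <= (n + P) * SEND_US. */
long pads_needed(const char *text, size_t n)
{
    long work = 0;
    for (size_t i = 0; i < n; i++)
        work += process_us(text[i]);
    long excess = work - (long)n * SEND_US;
    if (excess <= 0)
        return 0;                          /* line keeps up on its own */
    long gain = SEND_US - PAD_US;          /* time each pad buys back  */
    return (excess + gain - 1) / gain;     /* divide, rounding up      */
}
```

For the single-character line "a\n", `pads_needed` returns 18, matching the figure worked out above. For a full 80-character line it returns 0.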

This third method is the preferred method for text editors, as it works on any communications path (i.e., even those with no out-of-band flow control) and it allows full use of all input characters. If used over a network, it has the disadvantage of creating a modest additional amount of network traffic.

The ideal method would be for the editor to determine whether out-of-band flow control is used along the entire communications path. If such control is in use, no padding characters need to be sent. Unfortunately, it is usually not possible to reliably determine the type of flow control in use.

2.4.3 Echo Negotiation

Echo negotiation was devised for the Multics computer system. It is a protocol for use by computer networks which can cut down on response time by reducing communications overhead. It is potentially useful in an environment where the user's terminal is at one node and the computer which is running the text editor is at another. In such an environment, it can take a long time to send a character back and forth, and yet it takes little more time to send many characters.

Echo negotiation requires that it be easy to describe exactly what is to be done with each character to a communications processor/terminal combination and that the combination be capable of doing enough of the editing to make it worthwhile.

Typically, echo negotiation can only be used when the editing point ("cursor") is at the end of a line. The text editor sends a list of approved characters to the terminal or other nearby processor. As long as the user types only those characters and does not reach the end of a screen line (thus necessitating a wrap), the terminal can safely echo the input characters to the display and hold onto the input text. When any non-approved character is typed (or the line fills up), the terminal reports all of the held input characters and the reason why the input was sent on (i.e., non-approved character or line wrap) to the text editor. The editor then processes the input data and the cycle repeats.
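The terminal's side of the cycle can be sketched as a small function. This is a minimal illustration of the protocol as described here, not the Multics or LEAP implementation. The names and the 256-character hold buffer are assumptions, as is treating a character typed into the last column as "line full".

```c
#include <stddef.h>
#include <string.h>

/* Result of handling one typed character at the terminal. */
enum echo_reason { ECHO_HELD, ECHO_UNAPPROVED, ECHO_LINE_FULL };

struct echo_state {
    const char *approved;  /* characters the editor said may be echoed  */
    int column;            /* cursor column; the point is at end of line */
    int width;             /* screen line width                          */
    char held[256];        /* echoed characters not yet sent to editor   */
    size_t n_held;
};

/* ECHO_HELD: C was echoed locally and held.  Any other result: the
   terminal must now send held[0..n_held) to the editor (followed by C
   if C was unapproved) along with the reason, then set n_held to 0. */
enum echo_reason echo_typed(struct echo_state *s, char c)
{
    if (c == '\0' || strchr(s->approved, c) == NULL)
        return ECHO_UNAPPROVED;
    /* A real terminal would display C here. */
    s->held[s->n_held++] = c;
    s->column++;
    if (s->column >= s->width || s->n_held == sizeof s->held)
        return ECHO_LINE_FULL;   /* the next character would need a wrap */
    return ECHO_HELD;
}
```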

The Xylogics Annex terminal server incorporates an advanced version of echo negotiation called the LEAP Protocol. It incorporates all of the above design.

Both standard echo negotiation and the LEAP protocol suffer from the same problem. This problem is severe enough to call into question the desirability of using them at all: Echo negotiation is only potentially useful when the terminal is separate from the computer that is running the text editor and when the computer is overloaded. The principle behind echo negotiation is that waking up the text editor process for each character is inefficient. In extreme cases, the wake-up may take so long that input echoing is significantly delayed. The fix that echo negotiation offers is to perform the updates in batches, thus waking up the text editor process fewer times and thereby reducing overhead.

The problem with the fix is inherent in its own success. With no echo negotiation, input is echoed slowly but evenly (user typing is in general much slower than process-switching times) and the text-editing process tends to stay in memory. With echo negotiation, input is echoed quickly until a non-approved character is typed, then a (comparatively) long pause is encountered while the text-editing process must be woken up and possibly even swapped in (we are talking about a situation where resources are tight, after all). Even though the average per-character processing time might be lower, the variance in per-character times is much larger with echo negotiation. It is usually the case that the variance is so high that the system as a whole becomes unpleasant if not impossible to use. In one extreme test that I performed, I found the variance of times to be so great that editing was all but impossible: until you stopped typing for many seconds, you could never tell whether the computer had processed all of your input and hence couldn't safely continue typing (editing commands -- not new text to be inserted). In conclusion, echo negotiation is not a good feature to include.

2.4.4 Fancy Modems

High-speed modems (9600 bps and higher) are starting to become quite common. The main problem with them is that the advertising for them is focused around file-transfer protocols and dumping large quantities of text through them. The manufacturers add a variety of compression techniques to improve their modems' throughput in these areas.

However, text editing is interactive. Low response time is more important than high throughput. This is where the compression schemes implemented by the modems can cause problems. The simple solution is to turn off all such compression. Do not forget to turn off control-S / control-Q flow control while you're at it.

Questions to Probe Your Understanding

Devise at least three different ways of encoding cursor positioning coordinates. Which is the most extensible? (Easy)

Why can character-oriented displays handle blinking text more easily than graphics displays? Does it matter? (Easy)

If you could change one physical attribute (e.g., size, phosphor) of the display that you use most, what would it be? (Easy)

Give an example of an application that can make effective use of function keys. (Easy)

Devise an efficient, extensible encoding scheme for function keys. (Easy)

Some keyboards (such as that used by the IBM PC and compatible computers) assign a priority to shift keys and only pay attention to the highest priority key pressed. For example, pressing both Control and Shift gives the same code as does just pressing Control. Is this better or worse than giving a different code to the combination key presses? Why? (Easy)

How does the amount of buffering affect the need for padding? Does it matter where in the system additional buffering is placed? (Medium)

A fourth way to handle flow control used to be common practice but is no longer. It is called "ETX / ACK" after the codes for the characters that were used to implement it. In this method, the sender sends a block of text followed by an ETX character. It then waits for the receiver to return an ACK character. Why has this scheme dropped from favor? How does it interact with terminals on computer networks? (Medium)


Three: Implementation Languages

He took his vorpal sword in hand:
Long time the manxome foe he sought --

The choice of implementation language has a major effect on the design of a text editor. In some environments, only one language is available. In such environments, you do the best that you can and your editor may end up different from what it would be if the ideal language were available. However, most environments offer at least two languages. You thus have a choice, and this chapter offers guidance in making that choice. Of course, this may be a choice between Scylla and Charybdis...

3.1 General Considerations

The general considerations in selecting a language to use for implementing a text editor are:

availability and implementation quality
text handling power
support for extensibility
support for large projects
efficiency

Each of these considerations will be explored in detail.

3.1.1 Availability and Implementation Quality

You can only use those languages that are supported on the system that your text editor is first implemented upon. Nonetheless, you should be thinking about the second, third, and later systems that your text editor will be ported to, and which languages all of those systems support in common.

In addition to the mere presence of a language processor on a system, you should take into consideration the quality of implementation of such systems. An implementation's speed of operation, quality of diagnostics, quality of code produced, and other such factors can make a large difference in the usability of the language on a particular system.

3.1.2 Text Handling Power

It may appear redundant to say that a text editor must handle text, but consider a spreadsheet program: most of its work is in handling control flow, figuring redisplay, and setting up to execute commands. Only a small fraction of its time is spent in the floating point instructions that most users think of as the program's "real work."

At any given moment, a text editor -- or most any other similar interactive program -- is mainly doing all of the following:

waiting for user input
parsing that input
setting up to execute the commands
executing the commands
determining the effect of those commands on the screen
updating the screen

Most of these operations involve processing text in some way or other. Text editors differ from other applications only in that the "executing the commands" item also involves manipulating text.

It is important to note that "text handling" does not necessarily mean "string handling." In many cases, the language's native string operations are not sufficient, and you must write your own string primitives. For example:

Fortran does not support strings with dynamically varying lengths.
C does not support strings that contain the NUL (0 decimal) character.
Many implementations of Pascal do not support arbitrarily long strings (the leading byte count is often only 8 or 16 bits wide).
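A counted string of the kind you would write for yourself in C keeps its length separately from its bytes. It can therefore hold NUL characters and grow without a fixed limit. Here is a minimal sketch. The type and function names are hypothetical, and overflow of the size arithmetic is not checked.

```c
#include <stdlib.h>
#include <string.h>

/* A counted string: the length is stored separately, so the bytes
   may include NUL and there is no fixed maximum size. */
struct text {
    char *data;
    size_t len;
    size_t cap;
};

/* Append N bytes from BYTES (which may include NULs).  Returns 0 on
   success, -1 if memory could not be obtained. */
int text_append(struct text *t, const char *bytes, size_t n)
{
    if (n == 0)
        return 0;
    if (t->len + n > t->cap) {
        size_t new_cap = t->cap ? t->cap : 16;
        while (new_cap < t->len + n)
            new_cap *= 2;                  /* grow geometrically */
        char *p = realloc(t->data, new_cap);
        if (p == NULL)
            return -1;                     /* old contents still valid */
        t->data = p;
        t->cap = new_cap;
    }
    memcpy(t->data + t->len, bytes, n);
    t->len += n;
    return 0;
}

void text_free(struct text *t)
{
    free(t->data);
    t->data = NULL;
    t->len = t->cap = 0;
}
```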

3.1.3 Support for Extensibility

If they do nothing else, text editors change. The language should make it easy to make and maintain changes. In some cases, the source code must be changed and the editor recompiled. However, it is very desirable to allow users to change some of the editor to suit their tastes. The language should offer such support.

This support can take many forms:

late binding of names to procedures through indirect calls, dynamic linking, or other techniques
retaining and using the symbol table information at run time so that the user can think of changes in terms of names, not addresses
internal error and consistency checking under program control so that users can be protected from their mistakes
the ability to add code to the executing editor
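Late binding of names to procedures can be sketched in C with a table of names and function pointers that is consulted at call time. Rebinding a name then changes the behavior of every later call, without recompiling the callers. The names, the fixed table size, and the two sample commands are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Every command takes a repeat count. */
typedef void (*command_fn)(int count);

struct binding {
    const char *name;   /* not copied: must outlive the table */
    command_fn fn;
};

#define MAX_BINDINGS 64
static struct binding bindings[MAX_BINDINGS];
static size_t n_bindings;

/* Hypothetical editor state and two sample commands. */
static long point;
static void forward_char(int count)  { point += count; }
static void backward_char(int count) { point -= count; }

/* Bind NAME to FN, replacing any existing binding.  Returns 0 on
   success or -1 if the table is full. */
int bind_command(const char *name, command_fn fn)
{
    for (size_t i = 0; i < n_bindings; i++) {
        if (strcmp(bindings[i].name, name) == 0) {
            bindings[i].fn = fn;
            return 0;
        }
    }
    if (n_bindings == MAX_BINDINGS)
        return -1;
    bindings[n_bindings].name = name;
    bindings[n_bindings].fn = fn;
    n_bindings++;
    return 0;
}

/* Look NAME up now and run whatever it is currently bound to.
   Returns 0 if a procedure was called, -1 if NAME is unbound. */
int call_command(const char *name, int count)
{
    for (size_t i = 0; i < n_bindings; i++) {
        if (strcmp(bindings[i].name, name) == 0) {
            bindings[i].fn(count);
            return 0;
        }
    }
    return -1;
}
```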

Not all languages offer these features. You will have to simulate the missing features when using those languages that lack them.

3.1.4 Large Project Support

Text editors are apt to grow quite large. All of the techniques useful for any large project are useful here. Examples of these techniques are:

division of the program into separate modules
division of the program into separate files
separate compilation
a way to organize the global name space
a way to keep objects out of the global name space
automatic verification of procedure call/declaration compatibility
conditional compilation
compilation constants
a way of constructing "data abstractions" that package procedures and private state information
a way of dynamically allocating memory

Add your own favorites to the list.

3.1.5 Efficiency

Programs spend most of their time doing simple operations such as:

A = B
A = B +/- 1
A = B +/- C

No other expressions occur often enough to matter (Knuth 1971). Thus, the language should support these common operations well. Control structure implementations -- in particular, a procedure call -- should be kept efficient. Most languages do all right in this respect: the main thing is to ensure that they keep simple things simple.

3.2 Specific Language Notes

This section briefly examines a number of popular and/or interesting language choices. It is important to keep in mind that at some level, all languages are equivalent: anything that you can do in one, you can do in any other, given sufficient CPU time, memory, and programmer elbow grease. However, each language is intended to make solving one type of problem easy, and in most cases that type of problem is not text editing.

3.2.1 TECO

TECO (Text Editor and COrrector) was developed at the Massachusetts Institute of Technology. It was one of the first text editors ever written. It grew over the years, gaining both popularity and features. During one of its more stable periods, Digital Equipment Corporation took a "snapshot" of its commands and produced (subset) versions for all of DEC's computers.

But TECO kept growing. Along the way, it turned into a Turing-complete programming language. Several sets of editor macros were developed and used. Sometime around 1975, Richard Stallman organized these Editor MACroS into the first Emacs-type text editor.

TECO is clearly a language capable of supporting text editing. However, unless you have a DECSYSTEM 20 computer to run it on, you're out of luck: M.I.T.'s version of TECO is written in assembly language and only runs on such systems.

The TECO command set is described in Appendix D. There are two reasons why it is not a good choice as an implementation language:

As has been mentioned, its only full implementation (M.I.T.'s) is on the PDP-10/DEC 20 series of computers. Implementations on other machines involve answering the question of what you write the TECO in -- the very question that this chapter discusses.
It is the only language less readable than APL. A listing of a TECO program has a more than passing resemblance to transmission line noise. Writing and maintaining TECO programs is a definite problem.

3.2.2 Lisp

Lisp -- especially Common Lisp -- is an excellent choice. It is readily extensible, as even compiled Lisp code usually has provisions for evaluating new expressions. It thus provides an interpretive language that can be readily used to write even complex editing macros. Modern implementations usually have excellent string support. The language has features such as macros and packages that support large projects well, and Lisp programs are fairly readable (if you don't mind lots of parentheses (like these (and these))). Compiled Lisp code is usually as efficient as that of any other language.

Its view of memory management makes it well suited to the linked line form of buffer management (described in Chapter 6).

3.2.3 C

C was designed by people who wrote operating systems and utilities. Since text editors are among those utilities, it is not surprising that C would be a good choice.

C supports extensibility as well as any other compiled language, and better than most. For example, it provides the ability to call procedures through a pointer.
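As a minimal sketch of what calling procedures through a pointer buys an editor, here is a command dispatch table: an array, indexed by key code, of pointers to command functions. The names (struct editor, Command, bind_key, dispatch) and the ^F/^B bindings are illustrative assumptions, not part of any particular editor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical editor state, reduced to a point position for illustration. */
struct editor {
    long point;
};

/* Every command shares one signature, so any of them can sit in the table. */
typedef void (*Command)(struct editor *ed);

static void forward_char(struct editor *ed)  { ed->point++; }
static void backward_char(struct editor *ed) { if (ed->point > 0) ed->point--; }
static void undefined_key(struct editor *ed) { (void)ed; /* a real editor would beep */ }

#define NUM_KEYS 256
static Command key_table[NUM_KEYS];

/* Give every slot a default so that dispatch never calls a null pointer.
   The ^F/^B bindings follow Emacs convention and are only an example. */
static void init_keys(void)
{
    for (size_t i = 0; i < NUM_KEYS; i++)
        key_table[i] = undefined_key;
    key_table[0x06] = forward_char;   /* ^F */
    key_table[0x02] = backward_char;  /* ^B */
}

/* Rebinding a key at run time is nothing more than a pointer assignment. */
static void bind_key(unsigned char key, Command cmd)
{
    assert(cmd != NULL);
    key_table[key] = cmd;
}

/* Look up the key and call whatever command is bound to it. */
static void dispatch(struct editor *ed, unsigned char key)
{
    key_table[key](ed);
}
```

Because every command has the same signature, extension code can install a new command, or rebind an existing one, without the dispatcher knowing anything about it.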

C lacks a built-in string type, but this lack is not a hindrance, as you would probably need to re-implement strings anyway. There is a strong tradition in C of creating new data types, so the requirement is well supported.
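For example, a minimal counted-string type might look like the following. The struct and function names are hypothetical; the point is that storing the length explicitly lets a string hold any byte, including NUL, which C's null-terminated strings cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A hypothetical counted string: the length is stored explicitly, so the
   text may contain any byte (including NUL) and its length is known
   without scanning. */
struct cstring {
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
    char *data;
};

/* Append n bytes, doubling the allocation as needed. Returns 0 on success,
   -1 if memory could not be obtained (the string is then unchanged). */
static int cstring_append(struct cstring *s, const char *bytes, size_t n)
{
    assert(s != NULL);
    if (n == 0)
        return 0;
    if (s->len + n > s->cap) {
        size_t newcap = s->cap ? s->cap : 16;
        while (newcap < s->len + n)
            newcap *= 2;
        char *p = realloc(s->data, newcap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, bytes, n);
    s->len += n;
    return 0;
}

/* Release the storage and return the string to the empty state. */
static void cstring_free(struct cstring *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```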

C supports many of the features needed for large projects. In addition, as the language was designed by its users, and only came into widespread use after it was stable, there is a large existing base of compatible implementations. Due to this heritage, you don't need "improvements" in the language in order to get useful work done.

C's basic data types are focused around characters, integers, and pointers. These are exactly the core data types needed by text editors. C allows the ready manipulation of complicated data structures and yet remains generally readable.

C++ is a variant of C that provides much improved support for object-oriented programming. It, too, is a good choice.

3.2.4 PL/1

PL/1 is another example of a "systems language." Thus, most of the comments regarding C also apply to PL/1. However, its main failing is a lack of multiple implementations: the only vendor that seriously supports it is IBM Corp.

3.2.5 Other Systems Languages

There are a number of other systems languages (e.g., Modula). However, like PL/1, they have only limited availability. Many were designed as research projects. None of them even distantly approach C in the number of implementations or trained programmers.

3.2.6 Fortran

Well, some people think that it's a great language for writing astronomy programs. I have even written a text editor in it. Not by choice.

3.2.7 Pascal

Many people consider this language to be a good alternative (read "better") to C. It is worth reviewing Pascal's history: it was originally intended as a language for presenting (relatively small) algorithms in an academic setting. It was also targeted at introductory programming courses. For those purposes it is an excellent choice.

However, the standard language is not targeted towards developing large projects and does not provide the features that make developing a large project practical. On the other hand, each Pascal vendor has supplied those features. Unfortunately, the vendors have in general chosen different ways to provide them, leading to incompatible implementations that make porting code difficult.

3.2.8 Basic

Basic has Pascal's problems, only more so: the core version is not even standardized (by the industry: there is an ANSI standard, which is honored in the breach). Implementations range from "Tiny Basic," which can be run in only a few kilobytes of memory, to "True Basic," as defined by Kemeny & Kurtz (Kemeny 1985), which offers all the advanced features that you could want and all but omits line numbers. But "True Basic" bears little resemblance to what most programmers think of as the Basic language.

3.2.9 Ada

Ada was designed as a language to support embedded, real-time systems. It has many features that allow compilers to validate code and use external information to produce small, reliable object modules. However, these features do not mesh well with the need for extensibility (for example, there is rarely a need to reprogram an altimeter while in flight). Further, the general computing environment that is home to most text editors is simply outside the scope of what Ada is intended for. Nevertheless, it should be seriously examined as a choice if the text editor is to execute in an embedded, real-time system.

3.2.10 Sine

Sine (Anderson 1979) was a Lisp-like language tailored for text applications. Its only implementation to date is on Interdata 7/32 (or Perkin-Elmer 3200) minicomputers running the MagicSix operating system developed at M.I.T.'s Architecture Machine Group. It is interesting because it was designed specifically for implementing editors, and so is an example of an "ideal" implementation language.

Sine is implemented in two parts. Sine source code is first assembled into a compact object format, and this object code is then interpreted. The interpreter allows function rebinding and other such niceties. In addition, it implements such things as memory management and screen redisplay. Thus, the resulting editor is nicely structured, with "irrelevant" details hidden away. This mention of Sine leads nicely into...

3.2.11 Custom Editor Languages

No traditional language (except perhaps Common Lisp) offers complete support for text editing. The solution, used by virtually every implementation of Emacs-type text editors as well as by many implementations of other editors, is the creation of a custom editor language.

An existing language -- very often C -- is selected. This language is used to write an interpreter for the custom editor language. The interpreter manages memory, handles display refresh, and in general provides all of the necessary utility functions. The editor language is then used to write the logic of all the user-visible commands.
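To make the division of labor concrete, here is a deliberately tiny sketch of the interpreter half, assuming a made-up bytecode in which each user-visible command has already been compiled to a sequence of opcodes. The opcode set, the fixed-size buffer, and the names run and struct buffer are all hypothetical; a real editor language would add variables, control flow, and procedure calls.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcodes: the C interpreter supplies these primitives, and
   user-visible commands are written as sequences of them. */
enum opcode { OP_END, OP_INSERT, OP_FORWARD, OP_BACKWARD, OP_DELETE };

#define BUF_MAX 256

struct buffer {
    char text[BUF_MAX];
    size_t len;
    size_t point;   /* lies between characters: 0..len */
};

/* Execute one compiled command. OP_INSERT takes the next byte as its
   operand; the program is assumed to be well formed in that respect.
   Returns 0 on success, -1 on an unknown opcode or a full buffer. */
static int run(struct buffer *b, const unsigned char *code)
{
    for (size_t pc = 0;; pc++) {
        switch (code[pc]) {
        case OP_END:
            return 0;
        case OP_INSERT:
            if (b->len == BUF_MAX)
                return -1;
            memmove(b->text + b->point + 1, b->text + b->point, b->len - b->point);
            b->text[b->point++] = (char)code[++pc];
            b->len++;
            break;
        case OP_FORWARD:
            if (b->point < b->len)
                b->point++;
            break;
        case OP_BACKWARD:
            if (b->point > 0)
                b->point--;
            break;
        case OP_DELETE:   /* delete the character after the point */
            if (b->point < b->len) {
                memmove(b->text + b->point, b->text + b->point + 1,
                        b->len - b->point - 1);
                b->len--;
            }
            break;
        default:
            return -1;
        }
    }
}
```

With this arrangement, adding a command means writing a new opcode sequence; only a new primitive requires a change to the C code.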

As the editor language is implemented using an interpreter, the command set is readily extensible. Also, because the editor language is designed around text editing, it can offer excellent text-handling power.

The division of the programming tasks into two components provides an excellent base for supporting large projects. And, since the interpreter is usually implemented in a language such as C, it can be quite efficient.

For these reasons, custom editor languages are the preferred method for implementing text editors.

Questions to Probe Your Understanding

What is a good way of implementing a command dispatch table in C? Fortran? Pascal? Ada? (Easy)

Why is a string-oriented language such as SNOBOL not a good choice? (Easy)

How much compilation is appropriate for the custom editor language (none, just interpret the text; tokenization; full)? (Medium)

Following on the previous question, how would an opcode-oriented interpreter compare to a threaded-code interpreter? (Medium)

Four: Editing Models

So rested he by the Tumtum tree,
And stood a while in thought.

An editing model is the view of the file that the editor presents to the user. This chapter describes several editing models. You can build other models by varying and combining them.

The following discussions review the models themselves, not the commands available to the user. You should assume that essentially the same commands are available in all models.

4.1 One-Dimensional Array of Bytes

The most general form of a data file is a one-dimensional array of bytes. The one-dimensional editing model presents this form of a data file directly to the user. In it, the bytes of the file are displayed uninterpreted for the user to see. The basic editing operations are "insert bytes" and "delete bytes."

This model is very pure, but it is a little difficult for most users to deal with. Text editors that appear to use this model actually use a slightly modified form of the model where some characters -- in particular, the tab and newline characters -- are interpreted during the display process. Thus, text files appear as a series of lines.
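The display-time interpretation of tab and newline can be sketched as a function that maps a byte offset in the one-dimensional buffer to the screen row and column where it appears. This sketch assumes LF as the newline character, tab stops every 8 columns, and one column for every other byte; those conventions and the names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed conventions: LF ends a line, tab stops every TAB_WIDTH columns. */
#define TAB_WIDTH 8

struct screen_pos {
    size_t row;
    size_t col;
};

/* Where the character at byte offset `offset` appears on the screen when
   the one-dimensional buffer `buf` is displayed from its beginning. */
static struct screen_pos display_position(const char *buf, size_t offset)
{
    struct screen_pos pos = {0, 0};
    assert(buf != NULL || offset == 0);
    for (size_t i = 0; i < offset; i++) {
        if (buf[i] == '\n') {
            pos.row++;                  /* newline starts a new screen line */
            pos.col = 0;
        } else if (buf[i] == '\t') {
            pos.col += TAB_WIDTH - pos.col % TAB_WIDTH;  /* next tab stop */
        } else {
            pos.col++;                  /* assume every other byte is one column */
        }
    }
    return pos;
}
```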

In this model, line breaks may or may not require special handling. Whether they do depends on how they are represented. Various representations are described in detail in the next chapter.

This model supports both insertion and replacement editing equally well. Replacement editing is probably best implemented as a hybrid scheme that automatically switches to insert mode to prevent replacing a line break.
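A minimal sketch of that hybrid, assuming a simple fixed-size array buffer with the point stored as an offset (all names hypothetical): a typed character replaces the one after the point, except when that character is a newline or the point is at the end of the buffer, in which case the character is inserted.

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 256

/* Hypothetical fixed-size buffer; the point lies between characters and is
   stored as an offset from 0 to len. */
struct buffer {
    char text[BUF_MAX];
    size_t len;
    size_t point;
};

/* Replacement-mode self-insert. The character after the point is replaced,
   unless it is a newline or the point is at the end of the buffer; then the
   character is inserted instead, so line breaks are never overwritten.
   Returns 0 on success, -1 if an insertion is needed but the buffer is full. */
static int replace_char(struct buffer *b, char c)
{
    assert(b->point <= b->len);
    if (b->point < b->len && b->text[b->point] != '\n') {
        b->text[b->point++] = c;                  /* ordinary replacement */
        return 0;
    }
    if (b->len == BUF_MAX)
        return -1;
    memmove(b->text + b->point + 1, b->text + b->point, b->len - b->point);
    b->text[b->point++] = c;                      /* fall back to insertion */
    b->len++;
    return 0;
}
```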

4.2 Two-Dimensional Array of Bytes

This model is the basic two-dimensional form. Instead of editing a line, the user is editing in a quarter-plane, with the origin usually in the upper-left corner. Conceptually, the user can move freely in the two-dimensional quadrant. In practice, the editor usually only stores the non-blank portions, as storing an infinite quadrant's worth of data can be prohibitively expensive. Some systems may impose fixed upper bounds on the width or length of the quadrant.
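One way to avoid storing the whole quadrant, sketched under assumptions: each row keeps only the prefix that has actually been written, every position outside that prefix reads as a blank, and a fixed MAX_ROWS stands in for an imposed upper bound. The names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical quarter-plane, origin at upper left. Each row stores only
   the prefix that has been written; every other position reads as blank.
   MAX_ROWS stands in for a fixed upper bound some systems impose. */
#define MAX_ROWS 1000

struct row {
    char *chars;
    size_t len;
};

struct plane {
    struct row rows[MAX_ROWS];
};

/* Any position outside the stored prefix of a row is blank. */
static char plane_get(const struct plane *p, size_t r, size_t c)
{
    assert(p != NULL);
    if (r >= MAX_ROWS || c >= p->rows[r].len)
        return ' ';
    return p->rows[r].chars[c];
}

/* Store ch at (r, c), replacing whatever was there. Moving past the end of
   a row just extends its stored prefix with blanks. Returns 0 on success. */
static int plane_put(struct plane *p, size_t r, size_t c, char ch)
{
    assert(p != NULL);
    if (r >= MAX_ROWS)
        return -1;
    struct row *row = &p->rows[r];
    if (c >= row->len) {
        char *grown = realloc(row->chars, c + 1);
        if (grown == NULL)
            return -1;
        memset(grown + row->len, ' ', c + 1 - row->len);
        row->chars = grown;
        row->len = c + 1;
    }
    row->chars[c] = ch;
    return 0;
}
```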

Line breaks are implicit in the editing model itself. Hence, implementations usually provide explicit commands to split and join lines.

Both insertion and replacement editing are possible, although the model lends itself to replacement editing in a natural manner. Editors that use this model often have explicit commands to insert (and delete) both rows and characters within a row.

While the pure form of this model arranges the text into a rectangle, most implementations actually impart a left-to-right, then top-to-bottom (or one of the other seven combinations) bias. This bias affects all of the editing operations. For example, implementations often offer many commands for editing within a line, but only a few commands for editing entire lines.

4.3 List of Lines

This model is halfway between the first two. It consists of a one-dimensional array of lines. Each line is then a one-dimensional array of bytes. From the user's viewpoint, this model differs from the two-dimensional model in that text exists only where it has been entered. If the user wants to extend a line to the right, he or she must go into insert mode and type space characters.

In the two-dimensional model, on the other hand, the quadrant is assumed to be filled with blanks. Hence, there is no concept of extending the line to the right, as the line is assumed to extend infinitely far. To add text to the right, the user simply moves to the desired position.

Implementations that use this model usually make a very sharp distinction between editing within a line and editing lines. For example, lines may have a maximum length, or cut and paste operations may only operate on line boundaries.

4.4 Paged Models

It was once popular to divide the text into a series of pages. Editing was performed within a page, and explicit commands were required to move to another page or to re-paginate the text. Any of the models could be used for editing within a page. This division was thought to be natural: not coincidentally, it just "happened" to make it easier to write editors on systems that had very limited amounts of memory.

Most modern editors show page breaks as a "framework" that "floats" over the underlying text. This framework can be placed over any of the other underlying editing models.

4.5 Objects

Editing is a very general concept: there is no reason to limit the basic objects being edited to characters (or bytes) and lines. It may make sense in some cases to provide ways of editing such objects as words, sentences, paragraphs, sections, chapters, and other "natural" objects as explicit objects. Most editors provide commands to manipulate these objects without having them affect the fundamental editing model.

Other objects cannot be readily simulated. Examples of these objects are links to other documents, "opaque" objects included from other objects (e.g., bitmaps), and graphical objects (lines, boxes, circles, etc.).

In addition, text can be viewed in more than two dimensions. For example, multiple files can be "stacked" into a third dimension, multiple versions of a single file can be combined into a time-like dimension, or portions of a file can be viewed and manipulated as a tree or list structure. The possibilities are endless.

4.6 Dealing with Real Text

The models just listed are more or less pure forms. Each model has its advantages and disadvantages because text has a more complex structure than is represented by any of the models.

On the one hand, text is composed of a hierarchy of lexical units:

characters
words
phrases
sentences
paragraphs
subsections
sections
chapters
documents

These units reflect the meaning of the text. When the user is thinking in terms of meaning, the editor should provide an editing model -- and commands -- that reflect these units. Since text is read sequentially, the one-dimensional model is well matched to this mode.

On the other hand, the printed page is composed of:

characters, which are arranged into
words, which are arranged into
lines, which are arranged into
pages, which are arranged into
documents

These units reflect the layout of the text. When the user is thinking in terms of appearance, the editor should provide an editing model -- and commands -- that reflect these units. As a page is a two-dimensional object, the two-dimensional model fits this mode well.

Many "simple" editors and word processors support this mode of thinking. This mode is attractive for new users. After all, isn't the whole purpose of a word processor to put characters on a page? So doesn't it follow that users should be thinking in terms of placing each character on the page, one after the other? If this is taken to an extreme, the user is forced to make every placement decision, a situation that doesn't leave the user with much time or energy left to decide what to write.

While layout is important, it does not directly relate to the meaning of the text. And while meaning is important, the user sees the text in a particular layout, so layout-oriented editing is also important. The challenge, then, is to design an editing model -- and an editor -- that allows the user to select the most appropriate features of each model with minimal effort. Such an editor can take advantage of the best of both models while avoiding their disadvantages.

Questions to Probe Your Understanding

Explore the ramifications of a two-dimensional editing model where the origin is in the center of the document instead of the upper-left corner. What additional commands might be required? What operations (if any) does such a model make easier? Harder? (Easy)

Provide an algorithm for transforming between the one-dimensional and two-dimensional models. (Medium)

What is a good way to support proportionally spaced text in the pure two-dimensional array of bytes model? (Hard)

What problems are encountered when trying to support more than one model at the same time? (Easy) What is a good solution to these problems? (Hard)

Five: File Formats

And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,

This chapter surveys the range of file formats that a text editor might encounter.

5.1 Text Files

Each operating system has a standard way of storing text files. Text editors must be able to edit these standard system text files. From the user's point of view, such files consist of a series of reasonable-length lines of "reasonable" characters.

5.1.1 Line Boundaries

From the program's point of view, system text files consist of a sequence of characters, divided into lines in a variety of ways. The most popular methods are described below.

Card and Print Images: These files are a series of lines, all exactly the same length (typically 80, 132, or 133 characters long). They may also include another form of line divisor (e.g., 80 characters, then a CR/LF sequence). These files will mostly be found on older systems.

Newline Character: Marker bytes are used to signal the end of one line and the start of another. Popular choices are:

Line Feed: Used by UNIX systems.
Carriage Return: Used by Apple computers and some DEC computers.
Carriage Return/Line Feed combination: Used by CP/M and MS/DOS computers. The two-character sequence can be awkward to use when editing. You can usually get away with dropping the CR (but only when it appears as part of a CR/LF sequence), using the LF as a newline character, and putting the CR back when the file is written out. When editing an existing file, record whether you found CRs to remove, so that you don't put extra CRs in when writing out binary files.
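The CR/LF approach just described might be sketched as a pair of conversion functions. They are hypothetical helpers: the reader drops only CRs that immediately precede an LF and reports whether it found any, and the writer puts CRs back only if the reader did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Read side: convert CR/LF pairs to LF in place. Bare CRs are kept, since
   only a CR that is part of a CR/LF pair is dropped. Returns the new length
   and reports through *found_crlf whether any pair was seen. */
static size_t crlf_to_lf(char *buf, size_t len, bool *found_crlf)
{
    size_t out = 0;
    assert(found_crlf != NULL);
    *found_crlf = false;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            *found_crlf = true;
            continue;                 /* drop the CR; the LF is copied next */
        }
        buf[out++] = buf[i];
    }
    return out;
}

/* Write side: put a CR before each LF only if the file had CR/LF pairs
   when it was read. `out` must hold up to 2 * len bytes. Returns bytes
   written. */
static size_t lf_to_crlf(const char *buf, size_t len, bool add_cr, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (add_cr && buf[i] == '\n')
            out[n++] = '\r';
        out[n++] = buf[i];
    }
    return n;
}
```

Note that a file mixing bare LFs with CR/LF pairs would be written back with every LF turned into CR/LF; that is the "usually get away with" trade-off.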

Character Count: Some systems use an initial count of characters (typically the count is one or two bytes long), followed by that many characters. There may or may not be padding between lines in order to align their start on a word boundary.
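Reading such a file can be sketched as follows. The record layout is an assumption chosen for illustration (a two-byte little-endian count, with one pad byte when needed so that each record starts at an even offset); real systems differ in count size, byte order, and padding rules, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a count-prefixed file image to newline-terminated lines.
   Assumed layout (illustrative only): a 2-byte little-endian count, that
   many characters, then one pad byte if needed so that the next record
   starts at an even offset. `out` must hold at least `len` bytes, which
   suffices because each record emits count + 1 bytes but occupies at
   least count + 2. Returns bytes written, or (size_t)-1 on a truncated
   record. */
static size_t counted_to_newlines(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);
    while (i + 2 <= len) {
        size_t count = in[i] | ((size_t)in[i + 1] << 8);
        i += 2;
        if (count > len - i)
            return (size_t)-1;            /* record runs past end of file */
        memcpy(out + n, in + i, count);
        n += count;
        i += count;
        out[n++] = '\n';
        if (i % 2 != 0)
            i++;                          /* skip the alignment pad byte */
    }
    return n;
}
```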

Record Markers: Some operating systems store one line per record, and store the record markers "out of band." In this case, you must read and write one line at a time, and record the line break information somehow. (If the operating system lets you read multiple lines at once, it must have some method of indicating what the line boundaries are, which leads us back to one of the earlier methods.)

5.1.2 Line Contents

Some systems place restrictions on the contents of each line. The most frequently encountered restrictions are:

Long Lines: Some systems have no limit on the length of a line. Others place a fixed limit. Typical limits are 80, 127 or 128, 255 or 256, 511 or 512, 32,767 or 32,768, and 65,535 characters. If a program attempts to write lines that exceed the system limit, some systems return an error, others split the line, and still others silently truncate the line.

Short Lines: Most systems support zero-length (empty) lines quite well. However, some systems do not allow such lines, while others allow them in theory but not in practice. For example, the system-supplied text editor may not allow the entry of empty lines. Because of this limitation, no files will be created that have such lines. Hence, the code to handle such lines may not be tested well, and some programs may not behave properly when such lines are encountered.

Partial Last Line: This problem can only occur in systems that use a newline character. As with short lines, the system-supplied text editor may not allow the entry of partial last lines (i.e., lines missing the final newline character), and some programs may not behave properly when such lines are encountered.

Non-Printing Characters: All systems generally allow all printing characters and the Space character to appear in text files. These characters have codes that range from 32 to 126 decimal in ASCII, or the equivalent characters in EBCDIC. Difficulties arise in how programs handle other characters. For example, are Tab characters treated as one character or as the appropriate number of spaces, and if the latter, what is the appropriate width? Limitations on non-printing characters usually fall into the following groups:

Tab
Form Feed
Back Space
Carriage Return (this is a "bare CR")
other control characters (in the range 0 to 31 decimal and 127 decimal)
meta characters (128 to 255 decimal)

Typically, systems that allow a given group allow all preceding groups. Given that the characters are allowed, the next question is how the character should be displayed. Typical methods are:

Just send the character as is without translation.
Expand into caret notation (see Appendix E).
Expand into octal or hexadecimal notation.
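A sketch of the last two methods for a single byte, assuming ASCII: printing characters pass through, control characters use caret notation (^A for 1, ^? for 127), and meta characters use three-digit octal after a backslash. Tab and newline are assumed to be handled by the layout code instead, and the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Render one byte for display into out, which must hold at least 5 bytes.
   Assumes ASCII. Tab and newline are assumed to be handled by the layout
   code before this is called, since they affect position, not glyphs. */
static void render_byte(unsigned char c, char *out)
{
    assert(out != NULL);
    if (c >= 32 && c < 127) {            /* printing: as is */
        out[0] = (char)c;
        out[1] = '\0';
    } else if (c < 32 || c == 127) {     /* control: caret notation */
        out[0] = '^';
        out[1] = (char)(c ^ 0x40);       /* 0 -> '@', 1 -> 'A', 127 -> '?' */
        out[2] = '\0';
    } else {                             /* meta (128-255): backslash octal */
        out[0] = '\\';
        out[1] = (char)('0' + (c >> 6));
        out[2] = (char)('0' + ((c >> 3) & 7));
        out[3] = (char)('0' + (c & 7));
        out[4] = '\0';
    }
}
```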

5.1.3 End of File

Most systems record the exact file length and make this information available to the program. However, there are two special cases to be considered:

CP/M systems only record the file length to the next multiple of 128 bytes. By convention, a control-Z (^Z) character is used to mark the end of the file. Data after the first ^Z character is ignored. Note that if the file ends exactly on a 128-byte boundary, some programs do not add the trailing ^Z character. Some programs filled the entire remainder of the block with the ^Z character; other programs relied on this convention and only removed trailing ^Z characters.

MS/DOS systems started off following the CP/M convention but later changed to omit the ^Z character. The safest algorithm to use on these systems is:

If you are editing an existing file, record whether the file originally ended with ^Z. When the new version is written, add a ^Z if the file had one.
Otherwise, do not add a ^Z.

As always, the user should have a way of selecting both methods.
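The ^Z rules might be sketched like this; the helper names are hypothetical. The reader treats the first ^Z as end of file, as CP/M does, and remembers whether one was present; the writer adds a single ^Z back only in that case. A real editor would also let the user override either decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CTRL_Z 0x1A

/* Read side: treat the first ^Z as the end of the file (the CP/M convention)
   and remember whether one was present. Returns the length of the text. */
static size_t strip_ctrl_z(const char *buf, size_t len, bool *had_ctrl_z)
{
    assert(had_ctrl_z != NULL);
    const char *z = memchr(buf, CTRL_Z, len);
    *had_ctrl_z = (z != NULL);
    return z != NULL ? (size_t)(z - buf) : len;
}

/* Write side: append a single ^Z only if the original file had one.
   `buf` must have room for len + 1 bytes. Returns the length to write. */
static size_t restore_ctrl_z(char *buf, size_t len, bool had_ctrl_z)
{
    if (had_ctrl_z)
        buf[len++] = CTRL_Z;
    return len;
}
```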

5.2 Binary Files

From a text editor's point of view, a binary file is any file that is not a text file. These files are free of the restrictions found in text files:

Files may not be divided into lines at all.
Lines may be any length.
Lines may contain any character.

As a general rule, it is a nice feature to be able to edit a binaryfile. The rules to be followed are these:

You should be able to read a file in and write the identical file back out.
It should be possible to move to and usefully view any portion of the file.
It should be possible to insert any character.
It should be possible to precisely control any deletions (e.g., "delete the following three characters").

5.3 Structured Files

If your editor will only encounter standard system text files and binary files, you can skip the rest of this chapter, which describes considerations for designing file formats that hold information in addition to pure ASCII text.

Basic text files use 94 printing characters and a Space character. They also need some way to indicate line breaks. Often, users will want to include the Bell, Back Space, Tab, and Form Feed characters in their text files. Thus, a total of 99 characters are reserved for representing themselves. This leaves 29 codes (with 7-bit characters) or 157 codes (with 8-bit characters) available for other uses.

If all computer manufacturers used only the ASCII character set, the analysis could stop here. However, IBM Corp., Apple Computer Corp., Hewlett-Packard, and other vendors all support "extended" character sets that make use of many of these other codes. (Not to worry, though, the world is still safe: all of the manufacturers support different extended character sets.) What were previously unused codes are now in use to the extent that your users wish to be able to make use of the extended characters.

(Actually, as this book is being written, many vendors are jointly developing a 16-bit character set that is intended to encompass most characters and glyphs in use, although it does not do a complete job on Chinese, Japanese, and Korean.)

5.4 Where to Store the "Extra" Information

Whether or not "extended" character sets are supported, it is likely that you will want to store more information than can fit into the unused character codes. This leads us to the basic choice that will affect many other aspects of the implementation: is the extra information stored in-band or out-of-band?

5.4.1 In-Band

Storing information in-band means that some of the character codes are used to signal the presence of this additional information. Once the presence of this information is indicated, all character codes can potentially be used to represent the information.

The use of these character codes for non-character purposes has two ramifications. First, those codes are not available for representing characters. Second, if those characters are present, redisplay must know how to display them (and their associated information), and the user commands must know how to process them.

Depending upon the purpose of the extra information and your users' expectations, it may be appropriate to allow this in-band information to be visible to the user, at least in some display modes. Further, it may be appropriate to allow the user to edit the information directly. On the other hand, the best choice might be to hide this information from the user and allow only indirect manipulation.

Note that programs should be able to parse in-band information in either direction (i.e., both when working forwards through the buffer and when working backwards). It is also important in this representation that it be reasonably easy to determine how to display a file when starting from an arbitrary point in the middle of the file. In particular, the program shouldn't have to examine the entire previous contents of the file in order to figure out how to display something.
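As an illustration of a representation that meets both requirements, consider a hypothetical scheme in which the byte 0x01 is reserved and every attribute change is the fixed three-byte sequence 0x01, attribute, 0x01 (the attribute byte is assumed never to be 0x01). Because each sequence is bracketed at both ends, cursor motion can skip it in either direction, and the attribute in effect at a position is found by scanning back only to the nearest marker. Positions passed in are assumed always to lie outside markers; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-band scheme: byte MARK is reserved, and an attribute
   change is the fixed three-byte sequence MARK, attr, MARK (attr != MARK).
   All positions passed in are assumed to lie outside any marker. */
#define MARK 0x01

/* Move forward over one text character, first skipping any markers. */
static size_t next_char(const unsigned char *buf, size_t len, size_t pos)
{
    assert(pos <= len);
    while (pos + 3 <= len && buf[pos] == MARK)
        pos += 3;
    return pos < len ? pos + 1 : pos;
}

/* Move backward over one text character, first skipping any markers. */
static size_t prev_char(const unsigned char *buf, size_t pos)
{
    while (pos >= 3 && buf[pos - 1] == MARK)
        pos -= 3;
    return pos > 0 ? pos - 1 : pos;
}

/* Attribute in effect at pos: scan back only as far as the nearest
   marker, never over the whole file. Returns 0 (plain) if there is none. */
static unsigned char attr_at(const unsigned char *buf, size_t pos)
{
    for (; pos >= 3; pos--)
        if (buf[pos - 1] == MARK)     /* first MARK seen is a closing one */
            return buf[pos - 2];
    return 0;
}
```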

5.4.2 Out-of-Band

Storing information out-of-band means that none of the character codes are used for any special purpose. Rather, the additional information is stored somewhere else and is tied back to the text by means of pointers and offsets.
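A minimal sketch of the pointers-and-offsets idea, assuming attributes are kept as non-empty runs of byte offsets (struct run and its functions are hypothetical). The main cost is that every edit must adjust the offsets so that they continue to refer to the same text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical out-of-band record: the text itself stays plain, and each
   run says "bytes [start, end) have attribute attr". Runs are assumed
   to be non-empty. */
struct run {
    size_t start;
    size_t end;
    unsigned attr;
};

/* After n bytes are inserted at offset pos, shift the offsets so that every
   run still covers the same text. Text inserted strictly inside a run joins
   that run; text inserted at a run's start or end stays outside it. */
static void runs_insert(struct run *runs, size_t count, size_t pos, size_t n)
{
    assert(runs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (runs[i].start >= pos)
            runs[i].start += n;
        if (runs[i].end > pos)
            runs[i].end += n;
    }
}

/* Attribute of the byte at offset pos, or 0 if no run covers it. */
static unsigned run_attr_at(const struct run *runs, size_t count, size_t pos)
{
    for (size_t i = 0; i < count; i++)
        if (runs[i].start <= pos && pos < runs[i].end)
            return runs[i].attr;
    return 0;
}
```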

The disadvantage of choosing the out-of-band method is that you must find some place to put the information. While a file is being edited, the information can (and probably should) be stored in special-purpose structures within the editor. However, when the file is stored, the additional information must be put somewhere. This place can be a separate file or a separate part of the same file (either a different file "fork," or at the file's beginning or end).

5.4.3 Conclusion

There may be enough additional information that manipulating it can itself require significant overhead. The techniques described in the next chapter can apply to the additional information as well as to the text itself.

There is no preferred choice: both in-band and out-of-band have their good points and their bad. The choice must be made on a case-by-case basis.

Actually, they are almost two ends of a continuous scale. The difference between them could also be considered like this:

in-band data is parsed at each use
out-of-band data is parsed at file load

5.5 The Additional Information

This section describes some of the categories of additional information that you may wish to store in files. These categories are illustrative examples only: you will probably want to store other types of information or the same types in different ways.

5.5.1 Fonts, Sizes, Attributes

A font describes the shapes of the characters. Size information describes how large a character should be. Attributes are variations on a font such as boldface, italics, or underscoring. Together, these are used in word processors to provide character formatting.

These three share common qualities such as the ability to change at a character boundary, and the ability to change one without changing the others. The representation that you select needs to take these qualities into account.

5.5.2 Line, Paragraph, Page, and Other Formats

This information determines such things as line margins, justification types, tab stops, page headings and footings, page length, and so forth. This information has a major effect on the redisplay code described in Chapter 7.

5.5.3 Non-Text Objects

These can be arbitrary non-text objects such as graphical bitmaps or objects, spreadsheets, database excerpts, or other information used by non-editor applications. The editor needs to know such things as:

how to display them
how much space they occupy
how to invoke the application that defines them
how to obtain current or updated versions of them

5.6 Internationalization

This section lists some of the U.S. and English language biases that might be encountered in text files. Techniques for removing these biases from the program are outside the scope of this book. By their nature, these biases are hard to sort out: my apologies if I have missed some.

Except for this section, this book contains U.S. and English language biases. However, the programming and design techniques described in the rest of this book are applied in pretty much the same way in non-U.S. and non-English language editors.

The first bias is the character set used to represent information. There are many different international character sets and, while they tend to incorporate the U.S. ASCII character set (presented in Appendix E), they all differ in the other characters.

The second bias is the character size (i.e., the number of bits required to represent the number of distinct characters that can be stored in the document). If you are limiting your users to ASCII, 7-bit characters are sufficient. However, international character sets may require 8, 16, or even 32 bits per character. In the case of the larger character sizes, it may make sense to store most characters as 8-bit codes and to have multiple-byte characters for the others. So long as your implementation handles them consistently and can interchange data with the other programs on your system, the exact representation does not matter.

The third bias is the language direction. English uses left-to-right, then top-to-bottom. Other languages use different patterns. You must also properly handle cases where you mix languages (say, English and Arabic).

The fourth bias is the general conventions for handling such things as character case (some languages do not have English's upper/lowercase distinction), characters changing representation depending upon their position within a word (contextual forms), and so forth.

The fifth bias is in handling numbers. For example, in the U.S., numbers are written as "1,000.5". In much of Europe, they are written as "1.000,5". In addition, languages differ in the order in which digits are entered (left to right vs. right to left) and the placement of the most significant digit.

The sixth bias is in handling dates: day-month-year, month-day-year, and year-month-day are all popular, as are differing punctuation characters between them.

The seventh bias is in handling calendars. Gregorian and Julian are both in use and quite similar, but there are lunar and other calendars also in use.

The eighth bias is how punctuation characters are handled. For example, in Spanish, questions are introduced with an inverted question mark ("¿") and terminated with "?".

The last bias is how hyphenation is handled. In English, it is often difficult or impossible to determine how a word should be hyphenated. In Portuguese, for example, it is very easy to determine how to hyphenate a word, and handling hyphenation properly is considered mandatory.

Questions to Probe Your Understanding

How visible should the representation of line boundaries in standard system text files be to the user? (Easy)

Why is the ability to edit binary files useful? (Easy)

Is it reasonable to require that font, size, and attribute definitions always be properly nested? (Medium: note that the program can automatically make non-nested change requests into nested ones)

Define a representation for fonts, sizes, and attributes. (Medium)

Define a good representation for fonts, sizes, and attributes. (Hard)

Identify a bias that I missed. (Easy for non-U.S. readers, probably Hard for U.S. readers)

Six: The Internal Sub-Editor

Came whiffling through the tulgey wood,
And burbled as it came!

There are many ways to decompose the implementation of a text editor into smaller pieces. This book analyzes one particular decomposition, into three parts: a sub-editor to manage the text being edited, redisplay, and the user-oriented commands. (There are no other pieces: when these have been assembled, the editor is complete.) This particular decomposition was chosen for two reasons. First, it is a natural one with relatively simple interfaces between the parts. Second, it has been chosen for many different implementations: it is thus known to be a decomposition that works well. This chapter covers the internal sub-editor. The following chapters describe the other parts.

The purpose of the internal sub-editor is to hide all of the details of how the text is stored from the redisplay and the user-oriented commands. This chapter will begin by presenting some basic concepts and definitions. It will then list internal data needs and describe a procedural interface. Finally, it will present a number of ways to implement the actual sub-editor and discuss the trade-offs between them.

6.1 Basic Concepts and Definitions

A buffer is the basic unit of text being edited. It can be any size, from zero characters to the largest item that can be manipulated on the computer system. This limit on size is usually set by such factors as address space, amount of real and/or virtual memory, and mass storage capacity. A buffer can exist by itself, or it can be associated with at most one file. When associated with a file, the buffer is a copy of the contents of the file at a specific time. A file, on the other hand, can be associated with any number of buffers, each one being a copy of that file's contents at the same or at different times.

Awrite operation replaces the contents of a file with thecontents of a buffer. Thus, the two have identical contents for atleast one moment. Aread operation replaces the buffer withthe contents of a file. Again, the two have identical contents for atleast one moment. Aninsert operation adds the contents of thefile to the buffer: the two will not have identical contents unlessthe buffer was empty just before the insertion.

The buffer interface presented here follows the "one-dimensional array of characters" model as described in Chapter 5. As will be seen, however, the implementation need not follow that model. When stored as text in the buffer, line breaks will be represented by a single character, called newline.

At any one time there is one and only one special position within the buffer. This position is called the point. All operations that change the contents of the buffer occur at this position. There are also operations whose sole purpose is to move the point. The point can exist only between two characters: it is never on a character. On those displays that can only position a cursor on a character, the convention is to place the point at the left edge of the cursor.

The start of the buffer (which corresponds to the first location in the file) is considered to be before, or backward from, the point. The end of the buffer is considered to be after, or forward from, the point.

From time to time, it is useful to be able to remember positions within a buffer. A mark is an object that can remember a position. There can be any number of marks within a buffer, and more than one mark can remember the same position. As with the point, a mark is always located between two characters.

When there is exactly one mark, the range of characters between the point and the mark is called the region. It does not matter which order the point and mark are in.

Here are two examples of how marks are used:

A mark may remember a specific location for future reference. For example, a command might paginate a file. In this case, a mark would remember where the point was when the command was invoked. Thus, the point could be moved during the re-pagination and returned to its initial starting place.
A mark can serve as bounds for iteration. For example, the "fill paragraph" command might place a mark at its starting place, move to the end of the paragraph and place a mark there, then move to the beginning of the paragraph. It then performs a "fill region" operation, filling from the point to the location of the second mark.

There are two types of marks. They differ only in how they behave when an insertion is made at the location of the mark. Normal marks move with the insertion. Thus, the newly inserted character will be just before the mark. Fixed marks remain in place: the newly inserted character will be just after the mark. An example of the difference is the case where a command is to identify the characters that are inserted. The command merely needs to create both a fixed and a normal mark at the same place. After the insertion, the two marks will bracket the new characters.
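To make the difference concrete, here is a minimal sketch of how an implementation might update its marks after an insertion. For illustration only, it assumes that a location is a plain integer character offset and that the marks are kept in an array. The simple_mark type and the marks_adjust_for_insert function are hypothetical names and are not part of the interface defined later in this chapter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified mark: POS is a character offset from the
   start of the buffer, so 0 is before the first character. */
struct simple_mark {
    int pos;
    bool is_fixed;
};

/* Update N marks after LEN characters were inserted at offset AT.
   Marks after AT always move right by LEN. A mark exactly at AT moves
   only if it is a normal mark, so the new text ends up before it; a
   fixed mark stays where it is, so the new text ends up after it. */
void marks_adjust_for_insert(struct simple_mark *marks, size_t n,
                             int at, int len)
{
    for (size_t i = 0; i < n; i++) {
        if (marks[i].pos > at ||
            (marks[i].pos == at && !marks[i].is_fixed)) {
            marks[i].pos += len;
        }
    }
}
```

Suppose a fixed mark and a normal mark both sit at offset 3 and four characters are inserted there. Afterwards the fixed mark is still at 3 and the normal mark is at 7, so together they bracket the new text.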

A mode is a set of alterations to the user-oriented command set. For example, "C mode" might alter the definitions of the word-, sentence-, and paragraph-oriented commands to apply to tokens, language statements, and block structure. Modes are described in more detail in Chapter 8.

Finally, the term character denotes the basic unit of change within a buffer. While characters can be any size, they are most often eight bits long. In such a case, the term byte may be used interchangeably with character.

6.2 Internal Data Structures

This section discusses the sub-editor's data structures. All of the sub-editor's state information is defined in this chapter. Thus, if your implementation retains this information across invocations, you can offer the user the ability to resume editing where he or she left off, reducing the amount of work required to edit a file.

The other place where state information is kept is the screen manager, which is described in the following chapter. The screen manager is that part of the software which knows what was displayed on the user's screen. If that knowledge is not retained across invocations, the information displayed on the user's screen may change when the user exits and re-enters the editor. If that knowledge is retained, the information can be recreated exactly.

All of the specifics of the data structures listed here are examples only. You will undoubtedly wish to change some or all of them.

The world contains all of the buffers in use by the editor. It is a circular list of buffer descriptors and a variable that indicates which is the current buffer. In C syntax:

    struct {
        struct buffer *buffer_chain;
        struct buffer *current_buffer;
    } world;

Each buffer descriptor has this internal information:

    struct buffer {
        struct buffer *next_chain_entry;
        char buffer_name[BUFFERNAMEMAX];
        location point;
        int cur_line;
        int num_chars;
        int num_lines;
        struct mark *mark_list;
        struct storage *contents;
        char file_name[FILENAMEMAX];
        time file_time;
        FLAG is_modified;
        struct mode *mode_list;
    };

Next_chain_entry is the mechanism used for implementing the circular list of buffers. The list is circular because there is no preferred (or "origin") buffer and it should be possible to get to any buffer with equal ease.

Buffer_name is a character string that allows the user to refer to the buffer.

Point is the current location where editing operations are taking place. It is defined in terms of a private data type, since different implementations will use different representations. As it turns out, no code outside of the sub-editor ever needs to be aware of the representation of this data type.

Cur_line is optional. If implemented, it provides a high-speed way to track the current line number.

Num_chars is optional. If implemented, it provides a high-speed way to track the total number of characters in the buffer (its length).

Num_lines is optional. If implemented, it provides a high-speed way to track the total number of lines in the buffer.

Mark_list is the list of marks defined for this buffer. The mark structure is defined later.

Contents indicates the actual buffer contents. As with the location data type, its specifics will vary with the implementation.

File_name is the name of the file associated with the buffer, or the empty string if there is no associated file.

File_time is the last time at which the contents of the file and buffer were identical (i.e., the time of the last read or write). On multi-process systems, this value can be used to determine whether the contents of the file have been changed by another process, and thus whether the copy being edited is in synchronization with the actual file.

Is_modified indicates whether the buffer has been modified since it was last written out or read in.

Mode_list is the list of modes in effect for the buffer. The mode structure is defined after the mark structure.

    struct mark {
        struct mark *next_mark;
        mark_name name;
        location where_it_is;
        FLAG is_fixed;
    };

This structure is a linked list and is repeated for every mark. The chain is not circular. It is probably a good idea to keep the list sorted in the order that the marks appear in the buffer.

Next_mark is a pointer to the next mark in the chain. A NULL pointer indicates the end of the chain.

Name is the name of the mark. This name is returned by the mark creation routine and provides a way for the user to refer to specific marks. If your implementation permits, you can just return a pointer to the mark structure instead of making up names.

Where_it_is is the mark's location.

Is_fixed indicates whether the mark is a fixed mark.

    struct mode {
        struct mode *next_mode;
        char *mode_name;
        status (*add_proc)();
    };

This structure is a linked list and is repeated for every mode that is in effect for the current buffer. The chain is not circular. While modes should be defined in such a way that it does not matter in what order they are invoked, it is probably not possible to meet this requirement in actual practice. Thus, this list must be kept sorted in invocation order. Modes are discussed in more detail in Chapter 8.

Next_mode is a pointer to the next mode in the chain. A NULL pointer indicates the end of the chain.

Mode_name, if non-NULL, is the name added to the list of names of modes in effect. This list is ordinarily displayed somewhere on the screen. Note that there should be a mechanism for defining modes that do not have displayed names.

Add_proc is a pointer to a procedure to execute whenever the command set for this buffer needs to be created or re-created. The procedure should make all required modifications to the global command tables and return a success/fail status.

6.3 Procedure Interface Definitions

This section defines the interface provided by the sub-editor. The procedures will be described in terms of their logical function only, leaving out specific implementation details. An example of such a detail is a method of determining whether the operation succeeded. (The undefined type "status" will be used to indicate places where status information is especially desirable.) All data types mentioned (e.g., string) are intended to be generic, and no specific implementations are assumed.

One question that is important to your implementation but not addressed in this definition is whether the caller or the callee allocates the data structures. This chapter will assume that the callee allocates all data.

The names are selected for their mnemonic value. Actual implementations may be forced to change them to conform to local limits. In addition, you may wish to add a unique prefix (such as "Buf" or "SE") to them all to prevent name conflicts.

    status World_Init(void);
    status World_Fini(void);
    status World_Save(char *file_name);
    status World_Load(char *file_name);

World_Init is the basic set-up-housekeeping call. It is called once, upon editor invocation. It should perform all required one-time initialization operations. No other sub-editor procedure except for World_Fini can be legally called unless World_Init returns a successful status. After this call, one empty buffer exists (perhaps called "scratch" or something similar).

World_Fini terminates all sub-editor state information. Once it has been called, World_Init must be called again before other sub-editor calls can be legally made.

World_Save saves all editor state information in the specified file.

World_Load loads all editor state information from the specified file. These two routines implement state-saving across editor invocations. The possibility of retaining multiple saved environments is interesting but, while it has been implemented, it is not a feature that receives much use. Perhaps it is too difficult for users to keep track of multiple editing environments, or perhaps users prefer to be able to switch among tasks without having to perform a save and load.

If you are creating a "stripped down" editor, the World_Save and World_Load routines would not do anything. They can be put in as stubs if there is a reasonable possibility that the editor will be embellished later.

    status Buffer_Create(char *buffer_name);
    status Buffer_Clear(char *buffer_name);
    status Buffer_Delete(char *buffer_name);
    status Buffer_Set_Current(char *buffer_name);
    char *Buffer_Set_Next(void);
    status Buffer_Set_Name(char *buffer_name);
    char *Buffer_Get_Name(void);

These routines all manipulate buffer objects. Their definitions assume that character string names are used to specify buffer objects. As with marks, pointers to buffer structures can also be used if your implementation permits.

As with many other questions, it is an implementation choice whether the sub-editor retains a "current buffer" or whether a buffer is explicitly provided to all remaining sub-editor calls. This definition chooses the former. Buffer changes are performed only (comparatively) rarely, so it is probably helpful to the sub-editor to be able to cache the information relating to the current buffer.

Note that most of these calls are not useful for single-buffer implementations, as might be found in very resource-limited environments (e.g., a toaster). Such implementations should only include the calls if there is a reasonable chance of expanding to a multiple-buffer editor in the future.

Buffer_Create takes a name and creates an empty buffer with that name. Note that no two buffers may have the same name.

Buffer_Clear removes all characters and marks from the specified buffer.

Buffer_Delete deletes the specified buffer. If the specified buffer is the current one, the next buffer in the chain becomes the current one. If no buffers are left, the initial "scratch" buffer is automatically re-created.

Buffer_Set_Current sets the current buffer to the one specified.

Buffer_Set_Next sets the current buffer to the next one in the chain and returns the name of the new buffer. This mechanism allows for iterating through all buffers looking for one which meets an arbitrary test.
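A sketch of that iteration shows why the circular chain is convenient: the loop simply stops when it arrives back where it started. The sketch works directly on a cut-down buffer descriptor rather than going through Buffer_Set_Next. The mini_buffer type and the find_buffer and buffer_is_modified functions are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical cut-down buffer descriptor holding only the fields
   needed to walk the circular chain. */
struct mini_buffer {
    struct mini_buffer *next_chain_entry;
    const char *buffer_name;
    int is_modified;
};

/* Visit every buffer in the circular chain exactly once, starting at
   CURRENT, and return the first one for which TEST returns nonzero.
   Returns NULL if no buffer passes or the chain is empty. */
struct mini_buffer *find_buffer(struct mini_buffer *current,
                                int (*test)(const struct mini_buffer *))
{
    struct mini_buffer *b = current;

    if (b == NULL) return NULL;
    do {
        if (test(b)) return b;
        b = b->next_chain_entry;
    } while (b != current);
    return NULL;
}

/* Example test: does the buffer have unsaved changes? */
int buffer_is_modified(const struct mini_buffer *b)
{
    return b->is_modified;
}
```

A caller that uses Buffer_Set_Next instead would follow the same pattern. It would remember the name of the starting buffer and stop when Buffer_Set_Next returns that name again.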

Buffer_Set_Name changes the name of the current buffer to that specified.

Buffer_Get_Name returns the name of the current buffer.

    status Point_Set(location loc);
    status Point_Move(int count);
    location Point_Get(void);
    int Point_Get_Line(void);
    location Buffer_Start(void);
    location Buffer_End(void);

Point_Set sets the point to the specified location.

Point_Move moves the point forward (if count is positive) or backward (if negative) by abs(count) characters.

Point_Get returns the location of the point.

Point_Get_Line returns the number of the line that the point is on. Note that while characters are numbered starting from zero, lines are numbered starting from one.

Buffer_Start returns the location of the start of the buffer.

Buffer_End returns the location of the end of the buffer.

    int Compare_Locations(location loc1, location loc2);
    int Location_To_Count(location loc);
    location Count_To_Location(int count);

These are miscellaneous utility routines.

Compare_Locations returns 1 if location loc1 is after loc2, 0 if they are the same location, or -1 if loc1 is before loc2.
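The interface deliberately keeps location opaque. If an implementation did happen to represent locations as plain character offsets, the comparison would reduce to a sign computation. compare_offsets is a hypothetical name used only for this sketch.

```c
/* Hypothetical Compare_Locations for an implementation whose locations
   are plain character offsets: 1 if LOC1 is after LOC2, 0 if they are
   equal, -1 if LOC1 is before LOC2. Avoids the overflow that computing
   loc1 - loc2 could cause. */
int compare_offsets(int loc1, int loc2)
{
    return (loc1 > loc2) - (loc1 < loc2);
}
```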

Location_To_Count accepts a location and returns the number of characters between the location and the beginning of the buffer. The point's percentage position can be computed by:

    ((float)Location_To_Count(Point_Get()) * 100.) / ((float)Get_Num_Chars())
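The expression above divides by zero when the buffer is empty. A minimal sketch of a guarded version takes the two counts as plain integers. percent_position is a hypothetical helper, and reporting an empty buffer as 0% is an assumed convention.

```c
/* Percentage of the way through a buffer of NUM_CHARS characters at
   which character offset OFFSET lies. An empty buffer is reported as
   0% instead of dividing by zero. */
double percent_position(int offset, int num_chars)
{
    if (num_chars <= 0)
        return 0.0;
    return (double)offset * 100.0 / (double)num_chars;
}
```

It would be called as percent_position(Location_To_Count(Point_Get()), Get_Num_Chars()).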

Count_To_Location accepts a non-negative count and converts it to the corresponding location. You can set the point to a position specified by an absolute character count by:

    Point_Set(Count_To_Location(count));

    status Mark_Create(mark_name *name, FLAG is_fixed);
    void Mark_Delete(mark_name name);
    status Mark_To_Point(mark_name name);
    status Point_To_Mark(mark_name name);
    location Mark_Get(mark_name name);
    status Mark_Set(mark_name name, location loc);
    FLAG Is_Point_At_Mark(mark_name name);
    FLAG Is_Point_Before_Mark(mark_name name);
    FLAG Is_Point_After_Mark(mark_name name);
    status Swap_Point_And_Mark(mark_name name);

These routines manage marks. They allow for creating both normal and fixed marks, deleting marks, and otherwise manipulating them. Except when creating them, there is no difference in how these routines are used with normal marks and with fixed marks (although the marks' behavior will differ).

Mark_Create creates a new mark of the specified type and returns its name. The new mark is positioned at the point.

Mark_Delete deletes the specified mark.

Mark_To_Point sets the location of the specified mark to the point.

Point_To_Mark sets the point to the location of the specified mark.

Mark_Get returns the location of the mark. This routine is not actually used all that much, as the location value can change whenever any sub-editor call is made.

Mark_Set moves the specified mark to the specified location.

Is_Point_At_Mark returns True if the point is at the specified mark.

Is_Point_Before_Mark returns True if the point is before the specified mark.

Is_Point_After_Mark returns True if the point is after the specified mark.

Swap_Point_And_Mark swaps the locations of the point and the specified mark.

With these definitions, the basic way of doing something over a region would be like this:

    status Do_Something_Over_Region(mark_name name)
    {
        FLAG was_before = Is_Point_Before_Mark(name);
        mark_name saved;
        status stat = OK;

        /* ensure that the point is before the mark */
        if (!was_before) Swap_Point_And_Mark(name);

        /* remember where we started (Mark_Create also needs the
           mark type; a normal mark is requested here) */
        if (Mark_Create(&saved, FALSE) != OK) {
            if (!was_before) Swap_Point_And_Mark(name);
            return(NOT_OK);
        }

        /* loop until you get to the mark */
        for ( ; !Is_Point_At_Mark(name); Point_Move(1)) {
            if (<do something> != OK) {
                stat = NOT_OK;
                break;
            }
        }

        /* all done, put the point back */
        Point_To_Mark(saved);
        Mark_Delete(saved);

        /* put the point and mark back where they started */
        if (!was_before) Swap_Point_And_Mark(name);
        return(stat);
    }

The way that this procedure records the initial positions of the point and mark (a flag and a saved mark) is a little confusing. Unfortunately, it is less confusing than the alternative of creating two saved marks.

    char Get_Char(void);
    void Get_String(char *string, int count);
    int Get_Num_Chars(void);
    int Get_Num_Lines(void);

These routines return buffer-related information.

Get_Char returns the character after the point. Its results are undefined if the point is at the end of the buffer.

Get_String returns up to count characters starting from the point. It will return fewer than count characters if the end of the buffer is encountered.

Get_Num_Chars returns the number of characters in the buffer (i.e., the length of the buffer).

Get_Num_Lines returns the number of lines in the buffer. It is undefined whether or not an incomplete last line is counted.

    void Get_File_Name(char *file_name, int size);
    status Set_File_Name(char *file_name);
    status Buffer_Write(void);
    status Buffer_Read(void);
    status Buffer_Insert(char *file_name);
    FLAG Is_File_Changed(void);
    void Set_Modified(FLAG is_modified);
    FLAG Get_Modified(void);

These routines provide file-related operations.

Get_File_Name returns the file name that is currently associated with the current buffer. Size is the size of the buffer allocated for the returned file name.

Set_File_Name sets the file name for the current buffer.

Buffer_Write writes the buffer to the currently named file, making any required conversions between the internal and external representations. The modified flag is cleared and the file time is updated to the current time.

Buffer_Read clears the buffer and reads the currently named file into the buffer, making any required conversions between the external and internal representations. The modified flag is cleared and the file time is updated to the current time.

Buffer_Insert inserts the contents of the specified file into the buffer at the point, making any required conversions between the external and internal representations. The modified flag is set if the file was not empty.

Is_File_Changed returns True if the file has been changed since it was last read or written.

Set_Modified sets the state of the modified flag to the supplied value. It is most often used manually to clear the modification flag when the user is sure that any changes should be discarded. This flag is set by any insertion, deletion, or other change to the buffer.

Get_Modified returns the modification flag.

    status Mode_Append(char *mode_name, status (*add_proc)(), FLAG is_front);
    status Mode_Delete(char *mode_name);
    status Mode_Invoke(void);

These routines manage the multiple mode capability.

Mode_Append adds a mode with the supplied name and add procedure to the mode list. If is_front is True, the new mode is added to the front of the mode list. Otherwise, it is added at the end.

Mode_Delete removes the named mode from the mode list.

Mode_Invoke invokes the "add" procedures on the mode list to create a command set.

    void Insert_Char(char c);
    void Insert_String(char *string);
    void Replace_Char(char c);
    void Replace_String(char *string);
    status Delete(int count);
    status Delete_Region(mark_name name);
    status Copy_Region(char *buffer_name, mark_name name);

These routines manipulate the buffer. All of them set the modification flag.

Insert_Char inserts one character at the point. The point is placed after the inserted character.

Insert_String inserts a string of characters at the point. The point is placed after the string.

Replace_Char replaces the character after the point with another. This routine is logically equivalent to:

    Insert_Char(c);
    Delete(1);

but is potentially more efficient. If the point is at the end of the buffer, the routine simply does an insert.

Replace_String replaces a string as if Replace_Char were called on each of its characters.

Delete removes the specified number of characters from the buffer. The characters are removed after the point if count is positive, or before the point if count is negative. If the specified count extends beyond the start or end of the buffer, the excess is ignored.
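Here is a minimal sketch of Delete, including the rule that any excess count is ignored. It assumes a buffer stored as one flat character array with an integer point. That is only one of the representations discussed later in this chapter, and flat_delete is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat-array model of Delete: TEXT holds *LEN characters
   and *POINT is an offset from 0 to *LEN. COUNT > 0 deletes after the
   point, COUNT < 0 deletes before it; any part of the request that
   runs past either end of the buffer is ignored. Returns the number
   of characters actually removed. */
int flat_delete(char *text, int *len, int *point, int count)
{
    int start, end;

    if (count >= 0) {
        start = *point;
        end = (count > *len - *point) ? *len : *point + count;
    } else {
        start = (count < -*point) ? 0 : *point + count;
        end = *point;
    }
    /* close the hole by sliding the tail of the buffer left */
    memmove(text + start, text + end, (size_t)(*len - end));
    *len -= end - start;
    *point = start;   /* the point ends up where the deleted text began */
    return end - start;
}
```

The range checks are written so that a very large count cannot overflow. The point moves only for a backward deletion, because it must follow the characters that used to be after the deleted ones.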

Delete_Region removes all characters between the point and the mark.

Copy_Region copies all characters between the point and the mark to the specified buffer, inserting them at that buffer's point. The basic Emacs Wipe Region command is actually implemented as:

    Copy_Region(kill_buffer, mark);
    Delete_Region(mark);

This example also shows that even though an implementation presents only a single buffer to the user, a multiple buffer implementation may actually be required.

    status Search_Forward(char *string);
    status Search_Backward(char *string);
    FLAG Is_A_Match(char *string);
    status Find_First_In_Forward(char *string);
    status Find_First_In_Backward(char *string);
    status Find_First_Not_In_Forward(char *string);
    status Find_First_Not_In_Backward(char *string);

These routines handle searching and matching strings. It is possible to implement these routines in terms of routines that have already been defined. Because of their repetitive nature, however, it helps performance if they are built into the sub-editor. (Actually, the same can be said of several of the other routines that have been defined, such as Insert_String.)

Search_Forward searches forward for the first occurrence of string after the point and, if found, leaves the point at the end of the found string. Successive searches will thus locate successive instances of the string. If not found, the point is not moved. Types of searches are discussed in Chapter 9.

Search_Backward works like Search_Forward, except that the search proceeds backward and the point is placed at the start of the found string (i.e., the end closest to the start of the buffer).

Is_A_Match returns True if the string matches the contents of the buffer starting at the point. In other words, it returns True if Search_Forward would move the point strlen(string) characters forward.
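Here is a sketch of Is_A_Match and of a naive Search_Forward over a flat character array. The flat array and the two function names are assumptions for illustration. The sketch returns the new point, or -1 on failure, instead of a status, and it uses a simple position-by-position scan rather than any of the faster search techniques.

```c
#include <string.h>

/* Hypothetical flat-buffer Is_A_Match: nonzero if STRING occurs in
   TEXT (LEN characters) starting exactly at offset POINT. */
int is_a_match(const char *text, int len, int point, const char *string)
{
    size_t n = strlen(string);

    return (size_t)(len - point) >= n &&
           memcmp(text + point, string, n) == 0;
}

/* Hypothetical flat-buffer Search_Forward: returns the offset just past
   the first occurrence of STRING that starts at or after POINT, or -1
   if there is none (the caller then leaves the point where it was). */
int search_forward(const char *text, int len, int point, const char *string)
{
    for (int p = point; p <= len; p++) {
        if (is_a_match(text, len, p, string))
            return p + (int)strlen(string);
    }
    return -1;
}
```

Feeding each result back in as the next starting point finds successive occurrences. On "abcabc", searching for "bc" from 0 gives 3, and searching again from 3 gives 6.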

Find_First_In_Forward searches the buffer starting from the point for the first occurrence of any character in the supplied string. Thus, Find_First_In_Forward("0123456789") would leave the point before the first digit found after the point. Unlike the Search_* routines, this routine leaves the point at the end of the buffer if no characters in the string are found. A typical use of the Find_* routines is this sequence, which skips over the first number after the point:

    Find_First_In_Forward("0123456789");
    Find_First_Not_In_Forward("0123456789");

Find_First_In_Backward works in the obvious way.

Find_First_Not_In_Forward searches for the first occurrence of any character not in the supplied string. Thus, Find_First_Not_In_Forward("0123456789") would leave the point before the first non-digit found after the point. Unlike the Search_* routines, this routine leaves the point at the end of the buffer if every character from the point onward is in the string.

Find_First_Not_In_Backward works in the obvious way.

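Here are some examples of using the Find_* routines:

To move to the start of the next line (Find_First_In_Forward alone stops just before the newline, at the end of the current line):

    Find_First_In_Forward(NEWLINE);
    Point_Move(1);

To move forward to the end of the next word (or of the current word, if the point is inside one):

    Find_First_In_Forward(word_chars);
    Find_First_Not_In_Forward(word_chars);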
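The two forward routines can be sketched over a flat character array. The array representation and the lower-case function names are assumptions for illustration. Each function returns the new point instead of moving a real one.

```c
#include <string.h>

/* Hypothetical flat-buffer versions of Find_First_In_Forward and
   Find_First_Not_In_Forward. TEXT holds LEN characters; each function
   returns the new point: the offset of the first character at or after
   POINT that is (or is not) in SET, or LEN if there is none. */
int find_first_in_forward(const char *text, int len, int point,
                          const char *set)
{
    size_t n = strlen(set);

    /* memchr rather than strchr, so that a NUL byte in the buffer is
       not mistaken for the terminator of SET */
    while (point < len && memchr(set, text[point], n) == NULL)
        point++;
    return point;
}

int find_first_not_in_forward(const char *text, int len, int point,
                              const char *set)
{
    size_t n = strlen(set);

    while (point < len && memchr(set, text[point], n) != NULL)
        point++;
    return point;
}
```

Calling them in sequence with the digits as the set reproduces the skip-a-number example. On "ab 123 cd", starting from offset 0, the first call stops at 3 and the second at 6.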

    int Get_Column(void);
    void Set_Column(int column, FLAG round);

Get_Column returns the zero-origin column that the point is in, after taking into account tab stops, variable-width characters, and other special cases, but not taking into account the screen width. (After all, the width of the display that the user happens to be using should not affect the actions of an editing command.)

Set_Column moves the point to the desired column, stopping at the end of a line if the line is not long enough. If the specified column cannot be reached exactly (due to tab stops or other special cases), it uses the round flag. If the flag is set, the point is "rounded" to the nearest available column position. If the flag is clear, the point is moved to the next highest available column position.
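Here is a minimal sketch of both routines over a flat character array. It assumes that lines end in '\n', that tab stops fall every tab_width columns, and that every other character is one column wide. All of the names are hypothetical. When the target column falls exactly halfway across a tab, this sketch resolves the tie toward the higher column, which is an arbitrary choice.

```c
/* Offset of the start of the line containing OFFSET. */
static int line_start(const char *text, int offset)
{
    while (offset > 0 && text[offset - 1] != '\n')
        offset--;
    return offset;
}

/* Column reached after the character at POS, starting from column COL:
   a tab advances to the next tab stop, anything else by one. */
static int next_column(const char *text, int pos, int col, int tab_width)
{
    return text[pos] == '\t' ? (col / tab_width + 1) * tab_width : col + 1;
}

/* Zero-origin column of OFFSET (a Get_Column sketch). */
int get_column(const char *text, int offset, int tab_width)
{
    int col = 0;

    for (int pos = line_start(text, offset); pos < offset; pos++)
        col = next_column(text, pos, col, tab_width);
    return col;
}

/* Offset for COLUMN on the line containing OFFSET, stopping at the end
   of the line (a Set_Column sketch). If COLUMN falls inside a tab,
   ROUND_FLAG nonzero picks the nearer side and zero picks the side
   with the next higher column. */
int set_column(const char *text, int len, int offset, int column,
               int tab_width, int round_flag)
{
    int pos = line_start(text, offset);
    int col = 0;

    while (pos < len && text[pos] != '\n' && col < column) {
        int next = next_column(text, pos, col, tab_width);

        if (next > column)   /* COLUMN lies inside this character */
            return (round_flag && column - col < next - column) ? pos : pos + 1;
        col = next;
        pos++;
    }
    return pos;
}
```

With a tab width of 8, asking for column 3 on a line that begins with a tab gives offset 0 when rounding, since column 0 is closer than column 8. Without rounding it gives offset 1.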

6.4 Characteristics of Implementation Methods

This section describes how implementation methods may be characterized, and then describes three of those methods in detail. All of those methods are assumed to store the buffers in the equivalent of main memory. Depending upon the physical characteristics of the computer, "main memory" can be actual memory, virtual memory, or storage that is readily mappable into virtual memory. A later section describes methods for dealing with files that do not fit into main memory.

The implementation methods discussed here use two-level "divide and conquer" strategies. The first level divides the buffer into pieces. The size ranges for each piece are:

one character
a small number of characters (e.g., 16 to 64)
a line
a large number of characters (e.g., 512 to 4,096)
the entire buffer

The pieces can be kept in an array, a linked list, or some other structure. The second level describes how the pieces are managed:

no management
extra space at the end
buffer gap

Each of these second-level techniques will be described in detail.

6.4.1 No Management

In this technique, the piece is allocated exactly enough memory to hold it. The length of the piece is the only "overhead" information. A deletion is done by allocating a new piece of the desired (smaller) size, then copying the non-deleted portions from the old piece to the new one. An insertion is done by allocating a new piece of the desired (larger) size, then copying the old piece to the new one, inserting the new characters on the way. In code:

    #include <stdlib.h>
    #include <string.h>

    struct piece {
        int length;
        char data[1];   /* length characters */
    };

    /* delete LEN characters starting from START */
    struct piece *Delete_From_Piece(struct piece *pptr, int start, int len)
    {
        struct piece *newptr;
        int newlen = pptr->length - len;

        /* allocate new piece */
        newptr = (struct piece *)malloc(sizeof(struct piece) + newlen - 1);
        if (newptr == NULL) return(NULL);

        /* copy non-deleted parts */
        memmove(&newptr->data[0], &pptr->data[0], start);
        memmove(&newptr->data[start], &pptr->data[start + len],
            pptr->length - (start + len));
        newptr->length = newlen;
        free(pptr);
        return(newptr);
    }

    /* insert LEN characters starting at START */
    struct piece *Insert_Into_Piece(struct piece *pptr, int start,
        int len, char *chrs)
    {
        struct piece *newptr;
        int newlen = pptr->length + len;

        /* allocate new piece */
        newptr = (struct piece *)malloc(sizeof(struct piece) + newlen - 1);
        if (newptr == NULL) return(NULL);

        /* copy existing parts, leaving a hole of LEN characters at START */
        memmove(&newptr->data[0], &pptr->data[0], start);
        memmove(&newptr->data[start + len], &pptr->data[start],
            pptr->length - start);
        newptr->length = newlen;

        /* copy new part into the hole */
        memmove(&newptr->data[start], chrs, len);
        free(pptr);
        return(newptr);
    }

6.4.2 Extra Space at the End

In this technique, the piece is allocated enough memory to contain it, and possibly additional memory as well. The length of the piece and the amount of the piece currently in use are kept as overhead information. A deletion never requires a re-allocation. An insertion will require a re-allocation only when the free space is used up. As the bulk of insertions are one character at a time (i.e., as the user types), insertions will only require a re-allocation at relatively infrequent intervals. In code:

    #include <stdlib.h>
    #include <string.h>

    #define min(a, b) ((a) < (b) ? (a) : (b))

    /* assumed helper: rounds N up to the allocation unit */
    int Round_Up_To_Block_Size(int n);

    struct piece {
        int length;     /* characters allocated */
        int used;       /* characters currently in use */
        char data[1];   /* length characters */
    };

    /* delete LEN characters starting from START */
    struct piece *Delete_From_Piece(struct piece *pptr, int start, int len)
    {
        memmove(&pptr->data[start], &pptr->data[start + len],
            pptr->used - (start + len));
        pptr->used -= len;
        return(pptr);
    }

    /* insert LEN characters starting at START */
    struct piece *Insert_Into_Piece(struct piece *pptr, int start,
        int len, char *chrs)
    {
        struct piece *newptr;
        int newlen;
        int amt = min(pptr->length - pptr->used, len);

        /* do as much as will fit: shift the tail right by AMT */
        memmove(&pptr->data[start + amt], &pptr->data[start],
            pptr->used - start);
        memmove(&pptr->data[start], chrs, amt);
        pptr->used += amt;
        len -= amt;
        if (len <= 0) return(pptr);   /* done */

        /* the piece is now full; the rest goes after what was inserted */
        start += amt;
        chrs += amt;
        newlen = Round_Up_To_Block_Size(pptr->length + len);

        /* allocate new piece */
        newptr = (struct piece *)malloc(sizeof(struct piece) + newlen - 1);
        if (newptr == NULL) return(NULL);

        /* construct new contents */
        memmove(&newptr->data[0], &pptr->data[0], start);
        memmove(&newptr->data[start], chrs, len);
        memmove(&newptr->data[start + len], &pptr->data[start],
            pptr->used - start);
        newptr->length = newlen;
        newptr->used = pptr->used + len;
        free(pptr);
        return(newptr);
    }

When this version of the delete routine is compared to the "no management" version, it is simpler and will run faster. The insert routine is more complex, but most of the complexity will be executed only rarely. The path most often followed is again simpler and faster than before.

This technique has an additional benefit. In the "no management" version, memory is allocated in character-size units ranging from one character to the entire piece. In the "extra space" technique, memory is allocated in (typically) sixteen-byte chunks. Typical allocation units will range from eight bytes to the piece-size limit (if any), in steps of sixteen bytes. (Actually, you probably never want to allocate a piece that is not a multiple of sixteen bytes.) In any event, the dynamic range of the size of allocated units will be much smaller than in the "no management" technique. Thus, memory management will consume less overhead, and less memory will be lost to allocation fragmentation.
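The insertion code for the "extra space" technique calls Round_Up_To_Block_Size without defining it. Here is one possible definition, assuming the sixteen-byte allocation unit mentioned above. BLOCK_SIZE is a hypothetical constant.

```c
/* Assumed allocation unit; the text suggests sixteen bytes. */
#define BLOCK_SIZE 16

/* Round N up to the next multiple of BLOCK_SIZE (0 stays 0). */
int Round_Up_To_Block_Size(int n)
{
    return ((n + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}
```

Keeping BLOCK_SIZE a power of two would also allow the division and multiplication to be replaced by masking, but the arithmetic form works for any unit.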

6.4.3 Buffer Gap

The buffer gap technique stores the text as two contiguous sequences of characters with a (possibly null) gap between them. Changes are made to the buffer by first moving the gap to the location to be changed and then inserting or deleting characters by changing pointers. It thus uses memory efficiently, as the gap can be kept small and so a very high percentage of memory can be devoted to actually storing text. The overhead information includes the length of the piece, the location of the start of the gap, and the location of the end of the gap.

Here is an example buffer which contains the word "Minneapolis".

    0   1   2   3   4   5   6   7   8                       9  10  11
    | M | i | n | n | e | a | p | o |   |   |   |   |   | l | i | s |
    -----------------------------------------------------------------
    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
      30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
                    P               GS                  GE

In this example, the buffer is 11 characters long and it containsno spaces. The blanks between the "o" and the "l"show where the gap is and do not indicate that the memory has spacesstored in it. The point is between the "n" and the"e" at location 4 and is labeled with a "P" in thebottom line (legal values for the point are the numbers from zero tothe length of the buffer, 11 in this case). There are also threedifferent sets of numbers (coordinate systems) for referring to thecontents of the buffer.

First is the user coordinate system. It is displayed above thebuffer. The values for it run from 0 to the length of the buffer(11). As you will note, the gap is "invisible" in thissystem. The coordinates label the positions between the characters andnot the characters themselves. Thought of in this way, the arithmeticis easy. Thought of as labeling the characters, the arithmeticbecomes fraught with special cases and ripe for fencepost errors.

Second is the gap coordinate system. It is displayed immediatelyunder the line. The values for it run from 0 to the amount of storagethat is available and it, too, labels the positions between thecharacters. The internal arithmetic of the buffer manager is done inthis coordinate system. The start of the gap (labeled "GS"in the bottom line) is at position 8 and the end of the gap (labeled"GE") is at position 13.

Conversion from the user coordinate system to the gap coordinate system is quite easy. If a location (in the user coordinate system) is before the start of the gap, the values are the same. If a location is after the start of the gap (not the end of the gap!), its corresponding location in the gap coordinate system is (GapEnd - GapStart) + the location in the user coordinate system. It is a good idea to isolate this calculation either in a macro or a subroutine in order to enhance readability. Most routines (e.g., the search routines) will then use the user coordinate system even though those routines are essentially internal.
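As a minimal sketch of isolating the conversion in one subroutine, here it is in C. The struct gap_buffer record and its gap_start and gap_end fields are hypothetical names chosen for illustration; both fields hold gap coordinates.

```c
#include <stddef.h>

/* Hypothetical buffer record: both fields are gap coordinates. */
struct gap_buffer {
	size_t gap_start;	/* GS: position where the gap begins */
	size_t gap_end;		/* GE: position where the gap ends */
};

/* Convert a user coordinate to a gap coordinate.  Locations before
   the start of the gap are unchanged; all others are shifted past
   the gap by its size, GapEnd - GapStart. */
size_t convert_user_to_gap(const struct gap_buffer *b, size_t user)
{
	if (user < b->gap_start)
		return user;
	return user + (b->gap_end - b->gap_start);
}
```

With the Minneapolis example (GapStart = 8, GapEnd = 13), user location 4 converts to 4 and user location 9 (between the "l" and the "i") converts to 14. Location 8 itself, which sits exactly at the gap, converts to 13 in this sketch; that choice makes the converted value index the "l" that follows it.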

The third coordinate system is the storage coordinate system. It is the bottom row of numbers in the diagram. It is the means whereby the underlying memory locations are referenced. It is labeled from X to X + (the amount of memory that is available) - 1. The origin (the value of X) was chosen here to be 30 to help distinguish between the various coordinate systems. Its absolute value makes no difference. Note that it labels the memory locations themselves and so caution must be taken to avoid fencepost errors.

This technique has a very low overhead for examining the buffer. The user coordinate location is first converted to the gap coordinate system. The memory location is then looked up and its contents returned. Essentially, one comparison and a few additions are required. The purpose of the conversion is to make the gap invisible. Note that the contents of the buffer are not moved.

However, there is further overhead associated with inserting or deleting, since the gap may have to be moved so that it is at the point. There are three cases:

The gap is at the point already. No motion is necessary.
The gap is before the point. The gap must be moved to the point. The characters after the gap but before the point must be moved before the insertion or deletion can take place. The quantity ConvertUserToGap(point) - GapEnd characters must be moved. This quantity is numerically point - GapStart.
The gap is after the point. The gap must be moved to the point. The characters after the point but before the gap must be moved before the insertion or deletion can take place. The quantity GapStart - ConvertUserToGap(point) characters must be moved. This quantity is numerically GapStart - point.
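The three cases translate directly into C. This is a sketch under assumptions: a hypothetical struct gap_buffer whose text field points at the storage, with memmove (which is safe for overlapping regions) doing the shuffling. In each moving case exactly the number of characters given above is moved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer record; gap_start and gap_end are gap
   coordinates and also indexes into text[]. */
struct gap_buffer {
	char *text;
	size_t gap_start;
	size_t gap_end;
};

/* Move the gap so that it starts at user location point. */
void move_gap(struct gap_buffer *b, size_t point)
{
	size_t n;

	if (point == b->gap_start)		/* gap already at the point */
		return;
	if (point > b->gap_start) {		/* gap is before the point */
		n = point - b->gap_start;
		/* slide the n characters after the gap down to its start */
		memmove(b->text + b->gap_start, b->text + b->gap_end, n);
		b->gap_start += n;
		b->gap_end += n;
	} else {				/* gap is after the point */
		n = b->gap_start - point;
		/* slide the n characters before the gap up to its end */
		memmove(b->text + b->gap_end - n, b->text + point, n);
		b->gap_start -= n;
		b->gap_end -= n;
	}
}
```

Note that a move never changes the size of the gap, only its position.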

After the gap has been moved to the point, insertions or deletions are performed by moving the GapStart pointer (or the GapEnd pointer; it makes no difference which). A deletion of the character before the point is a decrementing of the GapStart pointer. An insertion is an incrementing of the GapStart pointer followed by placing the inserted character in the memory location that was just incremented over.
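Once the gap is at the point, each edit is a single pointer update. Here is a minimal sketch in C with a hypothetical struct gap_buffer; the function names are illustrative, and a full gap is simply reported rather than handled by growing the buffer. Deleting forward would increment gap_end instead.

```c
#include <stddef.h>

/* Hypothetical buffer record; indexes are gap coordinates. */
struct gap_buffer {
	char *text;
	size_t gap_start;
	size_t gap_end;
};

/* Insert c at the point, assuming the gap has already been moved
   there.  Returns 0 if the gap is full (a real editor would grow
   the buffer at this point). */
int insert_char(struct gap_buffer *b, char c)
{
	if (b->gap_start == b->gap_end)
		return 0;
	b->text[b->gap_start++] = c;	/* store, then advance GapStart */
	return 1;
}

/* Delete the character before the point by decrementing GapStart;
   the character stays in memory but is now inside the gap.
   Returns 0 at the start of the buffer. */
int delete_char_before(struct gap_buffer *b)
{
	if (b->gap_start == 0)
		return 0;
	b->gap_start--;
	return 1;
}
```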

Note that after the first insertion or deletion the gap is already in the correct place. Thus, the insertions or deletions that follow can take place without moving the gap. Further, the point can be moved away and back again with no motion of the gap taking place. Thus, the gap is only moved when an insertion or deletion is about to take place and the last modification was at a different location.

This scheme has a penalty associated with it. The gap does not move very often, but potentially very large amounts of text may have to be shuffled. If a modification is made at the end of a buffer and then one is made at the beginning, the entire contents of the buffer must be moved. (Note, on the other hand, that if a modification is made at the end of a buffer, the beginning is examined, and another modification is made at the end, then no motion takes place.) The key question that must be asked when considering this scheme is, when a modification is about to be made, how far has the point moved since the last modification?

How far can the point be moved before the shuffling delay becomes noticeable? Assume that an interval of 1/10 second is noticeable and that the editor is running on a dedicated system. Assume 250-nanosecond, 16-bit wide memory. Assume also that ten memory cycles are required for every two bytes moved (load, store, and eight overhead cycles for instructions). Then, 80,000 bytes can be moved with a just noticeable delay.
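Spelled out, the arithmetic behind the 80,000-byte figure is:

$$
\frac{10 \times 250\ \text{ns}}{2\ \text{bytes}} = 1.25\ \mu\text{s per byte},
\qquad
\frac{0.1\ \text{s}}{1.25\ \mu\text{s per byte}} = 80{,}000\ \text{bytes}
$$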

Because of the locality principle and because most files that are edited are less than 80,000 bytes in size, it seems reasonable to conclude that the average distance moved will be less than 80,000 bytes and so the shuffling delay will not be noticeable. Note that the size of the gap does not affect how long the shuffling will take, so nothing is lost by making the gap as large as possible.

6.4.3.1 Multiple Gaps and Why They Don't Work

Assume that we were still uncomfortable with the shuffling delay and a possible fix was put forth. This fix would have, say, ten different gaps spread throughout the buffer. What would the effects be? The idea behind this discussion is to help one understand the buffer gap system by seeing how changes to the scheme cause it to fail.

First, the conversion from the user to the gap coordinate system would be more complex and take longer. Thus, some ground would be lost. However, this is a small loss on every memory reference in order to smooth out some large bumps, so it might still be a reasonable thing to do.

Second, the average amount of shuffling will go down, but not by anywhere near a factor of ten. Because of the locality principle, most shuffling occurs over short distances and so cutting out the "long shots" will not have a large effect.

Third, unless the writer is very careful, the gaps will tend to lump together into a smaller number of "larger" gaps. In other words, two or more gaps will meet with the GapEnd pointer for one gap the same as the GapStart pointer for the next gap. There is just as much overhead in referencing them, but the average amount of shuffling will increase (or, more precisely, not be decreased).

On the whole, the extra complexity does not seem to return proportional benefits and so this scheme is not used.

6.4.3.2 The Hidden Second Gap

On some computers, for example the two-dimensional memory system used by Multics, a second gap at the end of the buffer is provided with almost no extra overhead. The key to this gain is that the buffer is not stored in a fixed-size place. Rather, the size of the memory (or address space, to be more precise) that is holding the buffer can also increase.

The extra overhead is a check to see whether a modification is taking place at the end of the buffer. If so, the modification can be made directly with no motion of the gap.

The second gap has a greater effect than one might think because a disproportionately high percentage of modifications take place at the end of the buffer. This distortion is due to the fact that most documents, programs, etc., are written from beginning to end and so the new text is inserted (or changed) at the end of the buffer.

The increased overhead due to the second gap method is low because the check for the end of the buffer is already there (in the system hardware). There is no problem of the gaps coalescing because one of them is pegged into place. The gains are not all that great, but neither are the costs and so it should be used where supported by the operating system.

6.5 Implementation Method Overview

We started by not internally managing the pieces at all. We then added some slack space at the end of each piece. We moved to the buffer gap technique, which allowed that slack space (now called a gap) to move within the piece. Finally, we reviewed an optimization of the buffer gap technique that in some cases added the slack space at the end again.

This table presents a summary of the implementation methods. A "-" means that the combination makes no sense. An "=" means that the combination tends to be very inefficient to implement. A "*" indicates those combinations that are plausible.

size of separately managed piece	no management	extra space at the end	buffer gap
character	*	-	-
16-64 chars	*	*	*
line	=	*[2]	*
1/2-4K chars	=	=	*[3]
buffer	=	=	*[1]

Combination [1] is the standard buffer gap management method. Combination [2] is the "linked line" management method. Combination [3] is the "paged buffer gap" management method. The following sections will describe each of these methods. Later sections will compare and contrast them.

6.6 Buffer Gap

This method was first used by TECO. This method treats the entire buffer as a single object. The buffer gap technique is used to handle insertions and deletions. It is simple and straightforward, easy to implement and easy to debug. As the text is contiguous, the buffer can be transferred to or from a file with just one or two system calls. It also translates easily to the modern world of workstations with large virtual address spaces.

6.7 Linked Line

This method came into common use for Emacs-type editors when they began to be implemented on top of Lisp environments. In this method, the buffer is stored as a doubly linked list of lines. The following information might be stored for each line:

```c
struct line {
	struct line *next;
	struct line *previous;
	struct piece *theline;
	int version;			/* optional */
	struct marks *mark_lists;	/* optional */
};
```

The next and previous fields implement the doubly-linked list. They point to the following and preceding lines, respectively.

The theline field is managed using one of the management techniques described in the earlier characteristics of implementations section. Typically, the "leave space at the end" technique is used.

The version field is optional. If implemented, it is for use by the redisplay code and will be discussed in Chapter 7.

The mark_lists field is optional. If implemented, it records which marks are located on this line.

A buffer location in this method is typically represented as a (line pointer, offset) pair. It follows from this representation that marks are always associated with a line (think about it). Marks can thus be efficiently implemented by a per-line mark list. By doing so, less time is required to update the marks after insertions or deletions because only those that are on the affected line can possibly be changed.
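With a per-line mark list, the update after an edit is a short walk over one list. The sketch below is in C and assumes a hypothetical layout for the struct marks named above: a singly linked list holding only the offset part of each (line pointer, offset) location. It also assumes the policy that a mark sitting exactly at an insertion stays in front of the new text; editors differ on this point.

```c
#include <stddef.h>

/* Hypothetical per-line mark record: the line pointer is implied by
   which line's list the mark is on, so only the offset is stored. */
struct marks {
	struct marks *next;
	size_t offset;
};

/* After inserting count characters at offset pos on a line, fix up
   that line's marks.  Marks on other lines cannot be affected.
   Assumed policy: a mark exactly at pos stays before the new text. */
void marks_after_insert(struct marks *list, size_t pos, size_t count)
{
	for (; list != NULL; list = list->next)
		if (list->offset > pos)
			list->offset += count;
}

/* After deleting count characters starting at pos, marks inside the
   deleted range collapse to pos and later marks shift left. */
void marks_after_delete(struct marks *list, size_t pos, size_t count)
{
	for (; list != NULL; list = list->next) {
		if (list->offset >= pos + count)
			list->offset -= count;
		else if (list->offset > pos)
			list->offset = pos;
	}
}
```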

The operation of this method is straightforward. New lines, when created, are simply spliced into the list at the appropriate place. Note that no characters are stored to indicate line breaks. If the new line is inserted into the middle of an existing line, some movement of the text on the end of the old line to the newly allocated line is all that is required.
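Splitting a line when a newline is inserted into it might look like this in C. This is a simplified sketch, not the structure shown earlier: instead of a struct piece, each line's text is kept directly in a malloc'd array with a hypothetical len field, and the old line keeps its now-unused tail storage as extra space at the end.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified line record; no newline character is stored. */
struct line {
	struct line *next;
	struct line *previous;
	char *text;
	size_t len;
};

/* Break line l at offset off (0 <= off <= l->len), as when a newline
   is inserted there.  The tail moves to a new line that is spliced
   in after l.  Returns the new line, or NULL if memory runs out. */
struct line *split_line(struct line *l, size_t off)
{
	size_t tail = l->len - off;
	struct line *n = malloc(sizeof *n);

	if (n == NULL)
		return NULL;
	n->text = malloc(tail > 0 ? tail : 1);
	if (n->text == NULL) {
		free(n);
		return NULL;
	}
	memcpy(n->text, l->text + off, tail);	/* move the end of the line */
	n->len = tail;
	l->len = off;				/* old line just gets shorter */

	/* splice n into the doubly linked list after l */
	n->previous = l;
	n->next = l->next;
	if (l->next != NULL)
		l->next->previous = n;
	l->next = n;
	return n;
}
```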

The line itself is typically stored by the "extra space at end" technique. However, the buffer gap technique could also be used. Regardless of the technique used, it is important to ensure that no limits are placed on the length of a line.

If your implementation includes automatic word wrap, do not split lines on "soft" newlines, because the overhead of shuffling the line allocations while the user types will be large. Instead, use a buffer gap technique within a line and only split lines on hard newlines (i.e., paragraphs).

6.8 Paged Buffer Gap

In this method, the buffer is divided into "pages" of one to two Kilobytes each while the file is read in. Each page is then managed with the buffer gap technique. The pages are organized into an array or linked list. This method has two points in its favor:

Since each page is small, the gap need never be moved very far.
Since all pages are the same size, memory management is kept simple.
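With the pages kept in an array, finding a buffer location means walking the page lengths. A minimal C sketch, assuming a hypothetical struct page that records how many characters each page currently holds (not counting its gap); within the chosen page, the ordinary buffer gap user-to-gap conversion then applies.

```c
#include <stddef.h>

/* Hypothetical page record: only the text count is needed here. */
struct page {
	size_t used;	/* characters stored in this page, gap excluded */
};

/* Map a user location to (page index, offset within that page).  A
   location on a page boundary goes to the earlier page, so text can
   be appended to it.  Returns npages if pos is past the end. */
size_t find_page(const struct page *pages, size_t npages,
		 size_t pos, size_t *offset)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (pos <= pages[i].used) {
			*offset = pos;
			return i;
		}
		pos -= pages[i].used;
	}
	return npages;
}
```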

These points are very important in resource-limited environments. For example, this method was used in the Mince text editor that initially ran on CP/M systems with 48 Kilobytes of memory and small floppy disks. That editor implemented a complete paged, virtual memory environment for its buffers. (The implementation included all of the then-current optimizations found in virtual memory operating systems.) The size of the buffer was limited only by the amount of available disk space.

Since main memory was so limited (in some systems, only three or four pages could be kept in memory at once), the excess pages were swapped to disk. A descendant of the Mince editor called The Final Word used the disk storage to even greater advantage: the control information was written to disk as well, thus allowing the complete editing state to be saved between invocations as well as being recoverable in the event of a system crash.

6.9 Other Methods

The other methods involve tracking small chunks of characters or even individual characters. While they are in principle doable, their small object size serves to increase the amount of memory and CPU overhead, unfortunately without offering any compensating advantages. Thus, they remain largely unused.

6.10 Method Comparisons

This section compares the three main methods in a variety of ways.

6.10.1 Storage

These comparisons are on a per-buffer basis. They also assume eight-bit characters. Our sample buffer will consist of 150 lines of 60 characters each, or 9,000 characters.

A buffer gap implementation requires a fixed-size header (say, eight bytes) plus one byte per character of text. Total size is 9,008 bytes.

A linked line implementation requires a fixed-size header (say, eight bytes), plus a fixed-size header per line (say, twelve bytes), plus one byte per character of text, plus on the average eight bytes of fragmentation per line. Total size is 8 + 150 * 12 + 9,000 - 150 (don't store newline characters) + 8 * 150 = 11,858 bytes.

A paged buffer gap implementation requires a fixed-size header (say, eight bytes), plus a fixed-size header per page (say, twelve bytes), plus the pages (say, two Kilobytes each). Holding 9,000 characters takes five 2,048-byte pages, so the total size is 8 + 12 * 5 + 2,048 * 5 = 10,308 bytes.
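The three totals can be reproduced in a few lines of C. The constants are the sizes assumed in the text (eight-byte buffer headers, twelve-byte line and page headers, eight bytes of fragmentation per line, 2,048-byte pages); the function names are hypothetical.

```c
/* Sizes assumed in the text for the 150-line, 60-character example. */
enum { LINES = 150, LINE_LEN = 60, HEADER = 8, LINE_HEADER = 12,
       FRAGMENT = 8, PAGE_HEADER = 12, PAGE_SIZE = 2048 };

long buffer_gap_size(void)
{
	return HEADER + (long)LINES * LINE_LEN;
}

long linked_line_size(void)
{
	/* newlines are not stored, so each line holds LINE_LEN - 1 bytes */
	return HEADER + (long)LINES * (LINE_HEADER + LINE_LEN - 1 + FRAGMENT);
}

long paged_size(void)
{
	long text = (long)LINES * LINE_LEN;
	long pages = (text + PAGE_SIZE - 1) / PAGE_SIZE;	/* rounds up to 5 */
	return HEADER + pages * (PAGE_HEADER + PAGE_SIZE);
}
```

The functions return 9,008, 11,858, and 10,308 bytes respectively, matching the figures above.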

The linked line method pays a large storage price because of its relatively high per-line overhead. In this example, the per-line overhead was about 33 percent.

The paged buffer gap method pays a large price in this example because of the mismatch between the page size and the buffer. If memory is tight, a smaller page size can be selected. However, the extra overhead is paid only once per buffer, since it occurs only at the end.

6.10.2 Crash Recovery

These comparisons assume that a recovery program is examining the core image of an edit session that was interrupted.

In the buffer gap method, crash recovery is relatively easy and fail-safe. In general, the start and end of the buffer can be found if a marker is left around the buffer (say, a string of sixteen strange (value 255) bytes) and the buffer is everything between them. The gap can be recovered and manually deleted by the user or, if it too is filled with a special marker, can be automatically deleted.
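A recovery program might locate the buffer like this. The sketch assumes that the editor surrounded its storage with runs of sixteen bytes of value 255 (MARK_LEN in the code), that the text itself never contains such a run, and that the gap is not filled with 255 bytes; any gap is returned as part of the region, to be removed by the user or by a second pass.

```c
#include <stddef.h>

/* Assumed marker: MARK_LEN bytes of value 255 on each side. */
#define MARK_LEN 16

/* Scan [p, end) for a run of MARK_LEN 255-bytes and return a pointer
   to the first byte after the run, or NULL if there is none. */
static const unsigned char *find_marker(const unsigned char *p,
					const unsigned char *end)
{
	size_t run = 0;

	for (; p < end; p++) {
		run = (*p == 0xFF) ? run + 1 : 0;
		if (run == MARK_LEN)
			return p + 1;
	}
	return NULL;
}

/* The buffer is everything between the first marker and the next
   one.  Returns 1 and sets *start and *len on success, else 0. */
int recover_buffer(const unsigned char *image, size_t size,
		   const unsigned char **start, size_t *len)
{
	const unsigned char *end = image + size;
	const unsigned char *s = find_marker(image, end);
	const unsigned char *e;

	if (s == NULL)
		return 0;
	e = find_marker(s, end);
	if (e == NULL)
		return 0;
	*start = s;
	*len = (size_t)((e - MARK_LEN) - s);
	return 1;
}
```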

In the linked line method, crash recovery is harder. Recovery is greatly aided by erasing freed memory. Basically, you perform the recovery by picking a block at random and examining it. If it can be parsed into a line header (i.e., the pointer values, etc., are reasonable), continue (a careful selection of header formats will help). Otherwise, pick a different block. You can then follow the next and previous pointers and parse them. If this works three or four times in a row, you can be confident that you have a handle on the contents. If a header doesn't parse, it is because it is either a part of a line (either pick again at random or go back one chunk and try again) or a header that was being modified (in which case you are blocked from continuing down that end of the chain). In the latter case, go in the other direction as far as possible. You now have one half of the buffer. Repeat the random guess, but don't pick from memory that you have already identified as part of the buffer. You should get the other half of the buffer. Leave it to the user to put the two halves together again. If the freed blocks are not erased, the chance of finding a valid-looking header that points to erroneous data is very high.

In the paged buffer gap method, crash recovery is easier than with linked line, but harder than with buffer gap. As with buffer gap, marker bytes can help you locate the buffer pages, and the gap can be recovered either manually or automatically. The pages are strung together just like lines were: it is just that there are fewer pages to work with.

6.10.3 Efficiency of Editing

These comparisons examine the typical types of effort required to insert a character or line.

operation	insert character	insert line	maximum motion
buffer gap	move gap, pointer update	same as insert character	buffer
linked line	move/scroll line, pointer update	allocate header, line splice into line	none
paged buffer gap	if full, split page; move gap, pointer update	same as insert character	page

As might be expected, the buffer gap scheme is the most efficient, although you will occasionally encounter a comparatively long pause. The linked line scheme adds lots of overhead to the simple operations and cuts out the occasional comparatively long pauses. But you will often be hit badly if, for example, you insert in the middle of a 4,000-character line. The paged buffer gap method removes the pauses at the price of a moderate increase in complexity.

6.10.4 Efficiency of Buffer/File I/O

This section compares the way that the methods handle buffer/file I/O.

The buffer gap method is extremely efficient. Reading a file into a buffer consists of these operations:

Determine the file's length.
Allocate enough memory to hold the file, plus some extra for growth.
Read the file in.

On some systems, even this can actually be improved. For example, on many systems you can just map the file into the address space of your process. No actual data motion takes place until you modify one of the pages. At that time, the page is copied and the modifications are written to the new copy (this is sometimes called "copy-on-write"). Writing the file out can take two calls (one to cover the text in front of the gap and the other to cover the text after the gap).
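Here is a sketch of both directions in standard C. It assumes a hypothetical struct gap_buffer whose size field counts text plus gap, and a file opened in binary mode so that ftell reports its length in bytes. It follows the plain read path rather than memory mapping, which is system-specific.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical buffer record; indexes are gap coordinates. */
struct gap_buffer {
	char *text;
	size_t size;		/* total storage: text plus gap */
	size_t gap_start;
	size_t gap_end;
};

/* Determine the file's length, allocate that much plus `extra` bytes
   for growth, and read the file in.  The gap starts out at the end
   of the text.  Returns 0 on success, -1 on failure. */
int read_buffer(FILE *f, struct gap_buffer *b, size_t extra)
{
	long len;

	if (fseek(f, 0L, SEEK_END) != 0 || (len = ftell(f)) < 0)
		return -1;
	rewind(f);
	b->size = (size_t)len + extra;
	b->text = malloc(b->size > 0 ? b->size : 1);
	if (b->text == NULL)
		return -1;
	if (fread(b->text, 1, (size_t)len, f) != (size_t)len) {
		free(b->text);
		return -1;
	}
	b->gap_start = (size_t)len;
	b->gap_end = b->size;
	return 0;
}

/* Write the buffer with two calls: the text in front of the gap,
   then the text after it. */
int write_buffer(FILE *f, const struct gap_buffer *b)
{
	size_t after = b->size - b->gap_end;

	if (fwrite(b->text, 1, b->gap_start, f) != b->gap_start)
		return -1;
	if (fwrite(b->text + b->gap_end, 1, after, f) != after)
		return -1;
	return 0;
}
```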

The linked line method has a very obvious but poor algorithm to read the file in. This code fragment illustrates the algorithm:

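```c
/* Fragment: needs <stdio.h> and <string.h>; fd is a FILE *, buf is a
   char array, and Allocate_Line/Build_Line are the editor's routines. */
if ((fd = fopen(filename, "r")) == NULL) {
	/* ...error... */
	}
while (fgets(buf, sizeof(buf), fd) != NULL) {
	if (!Allocate_Line(strlen(buf))) {
		/* ...error... */
		}
	Build_Line(buf);
	}
fclose(fd);
```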

This algorithm has a system call (in the fgets) and allocation for every line in the buffer. An improved algorithm would read the whole file into memory (or at least read in large chunks), then allocate lines out of that memory. This improvement at least reduces switching between the system and program contexts. (A sufficiently good fgets implementation would effectively do this. Unfortunately, the libraries that come with many C compilers are not sufficiently good...)
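One way to realize the improved algorithm, sketched in C: once the whole file has been read into one block with a single large read, the lines are found in memory with memchr, and a single allocation holds every line record. The struct line_ref record, which points into the block rather than copying text, is a hypothetical simplification; a real linked line editor would go on to give each line its own growable storage.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical line record pointing into the block holding the file. */
struct line_ref {
	const char *text;	/* start of the line inside the block */
	size_t len;		/* length, not counting the newline */
};

/* Split a block holding the whole file into lines.  Returns a malloc'd
   array and stores the count in *nlines, or returns NULL if memory
   runs out.  A last line without a trailing newline still counts. */
struct line_ref *split_lines(const char *block, size_t size, size_t *nlines)
{
	const char *p, *nl, *end = block + size;
	size_t count = 0, i = 0;
	struct line_ref *lines;

	/* First pass: count the lines so one allocation holds them all. */
	for (p = block; p < end; p++)
		if (*p == '\n')
			count++;
	if (size > 0 && block[size - 1] != '\n')
		count++;
	lines = malloc((count > 0 ? count : 1) * sizeof *lines);
	if (lines == NULL)
		return NULL;

	/* Second pass: record where each line starts and how long it is. */
	for (p = block; p < end; i++) {
		nl = memchr(p, '\n', (size_t)(end - p));
		if (nl == NULL)
			nl = end;
		lines[i].text = p;
		lines[i].len = (size_t)(nl - p);
		p = (nl < end) ? nl + 1 : end;
	}
	*nlines = count;
	return lines;
}
```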

The paged buffer gap method could be just as efficient at reading as the buffer gap method. It would operate by reading the entire file in as a block, then dividing that block up without moving any data. The first insert on each page will cause a page split, though. Writing would have a worst case number of system calls equal to twice the number of pages in use.

6.10.5 Efficiency of Searching

This section compares the way that the three methods handle searching.

If your implementation is such that the search time dominates the setup time, all three methods are equivalent. In the case where the setup time dominates the search time, the methods do perform differently, and that is the case that will be examined.

This comparison assumes that the search routines are "built in" to the buffer management code for performance reasons. While they could call Get_Char for every character, doing so would probably not be very efficient. Given equivalent implementations of the actual search code, the main difference among the methods is the number of times that the inner search loop is called. In other words, it is the number of distinct pieces that must be searched.

The buffer gap method calls the inner loop twice: once for the text in front of the gap and once for the text after the gap. While an implementation could move the gap so that the search routine only needs to be called once, doing so goes against the reason for having the gap in the first place. Keep in mind that searching happens a lot. For example, two searches are done whenever the "forward word" command is given. The whole point of the buffer gap method is to avoid moving the gap until it is necessary. An even worse way to go astray is to move the gap as you search. This replaces one large, efficient gap move with many smaller ones. We have already observed that even fairly large gap moves are not very noticeable, so "optimizing" them out is not a wise move. In conclusion, invoking the search loop twice is quite efficient.
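Here is a sketch of a forward search that runs its inner loop (memchr in this case) at most twice and never touches the gap. To keep it short it looks for a single character; a multi-character pattern can straddle the gap and would need extra handling at the boundary. The struct gap_buffer layout is hypothetical, with size counting text plus gap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer record; indexes are gap coordinates. */
struct gap_buffer {
	const char *text;
	size_t size;		/* total storage: text plus gap */
	size_t gap_start;
	size_t gap_end;
};

/* Return the user location of the first c at or after user location
   pos, or (size_t)-1 if there is none.  The gap is not moved. */
size_t search_forward(const struct gap_buffer *b, size_t pos, char c)
{
	size_t gap = b->gap_end - b->gap_start;
	const char *hit;

	if (pos < b->gap_start) {
		/* first piece: the text in front of the gap */
		hit = memchr(b->text + pos, c, b->gap_start - pos);
		if (hit != NULL)
			return (size_t)(hit - b->text);
		pos = b->gap_start;
	}
	/* second piece: the text after the gap (pos + gap converts the
	   user location to a gap coordinate) */
	if (pos + gap < b->size) {
		hit = memchr(b->text + pos + gap, c, b->size - (pos + gap));
		if (hit != NULL)
			return (size_t)(hit - b->text) - gap;
	}
	return (size_t)-1;
}
```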

The linked line method invokes the inner search loop once for every line in the buffer. In our earlier example, this means that it would be invoked 150 times. What's worse, the linked line method does not store newline characters. Rather, they are implied by the line structure. Hence, whenever a newline character is in the search string, that character must be handled in a special manner. While some optimizations can (and should) be made (for example, searching for "x<newline>" means that you only have to look at the last character of each line), the code complexity required to make these optimizations adds its own performance penalty.

The paged buffer gap method lies somewhere between the other two. In the earlier example, the search routine would have to be invoked ten times (twice for each of the five pages). This is not enough to incur a significant performance penalty, but it is one more reason not to use this method if buffer gap will work.

6.10.6 Multiple Buffers

This section compares the way that the methods handle multiple buffers.

The buffer gap method offers no choice: the buffers must follow one another in memory (where else could they be?). This arrangement becomes bad when the total size of all buffers becomes large enough that an objectionable pause occurs when switching buffers. The arrangement can be improved by leaving extra "gaps" between buffers.

The linked line method has two choices. First, all lines can be allocated out of a common pool. Thus, over time, the buffers tend to "intertwine" (i.e., the lines of one buffer are mixed in with the lines from other buffers in physical memory). This choice tends to maximize the density of text and thus makes the most efficient use of memory. (See also the discussion in the next section about paged environments.) The other choice is to allocate memory among buffers, then allocate all lines for a buffer from within each buffer's allocation. There must, of course, be some way to change the buffer allocations.

The paged buffer gap method has the same two choices as linked line.

6.10.7 Paged Virtual Memory

This section compares the way that the methods perform in paged, virtual memory environments. It concentrates on the effects that occur when main memory is full and paging is going on ("tight memory").

Some operations, such as searching the entire buffer, require that the buffer be accessed sequentially. In situations where the entire buffer does not fit into memory, no management method can avoid some page swapping. That type of situation is not analyzed here.

The buffer gap method generally works well in this environment. Its highly compact format allows for accessing large portions of the buffer with only a few pages in memory. Its sequential organization also implies that it has a very good locality of reference and so the nearby pages are heavily referenced and likely to be around.

Its major problem is, as usual, the worst case situation of a large gap movement. In tight memory situations, moving the whole buffer implies that all of the buffer's pages must be swapped in and, most likely, swapped out again. Overall, the buffer gap method does as well as can be expected. The nearby portions of the buffer will tend to be in memory because of locality of reference, but distant portions may in general have to be paged in.

The linked line method has many disadvantages and no real advantages in tight memory situations. First, if an intertwining multiple buffer scheme is used, the effective page size is reduced by a factor that, over time, tends to approach the number of buffers. This reduction is due to the random nature of the buffer memory allocations, the fact that many lines tend to fit into one virtual memory page, and the consequence that over time a virtual memory page tends to hold lines from as many buffers as possible. Thus, when a given buffer is in use, the storage used by the other buffers on the same pages is consuming memory as well.

Even separating the buffer memory does not resolve this problem. When buffers get large, different parts of the same buffer may act in the same fashion as the separate buffers to decrease the effective page size. In extreme cases, a desired "target" line may be in memory, but in the process of following the linked list, the target line may be swapped out!

In addition, this method does not pack data as tightly as the others. An earlier example showed that the overhead for the linked line method is about 33 percent. Thus, the page size is effectively reduced by 25 percent: with one byte of overhead for every three bytes of text, only three quarters of each page holds text.

When all factors are combined, a typical linked line system is probably reducing its effective page size by about 50 percent. The result is that, for example, if a computer has one Megabyte of memory in 512 two-Kilobyte pages, the linked line method would effectively treat this as 512 Kilobytes of memory (512 one-Kilobyte pages).

The paged buffer gap method is essentially the buffer gap method, modified to improve performance in tight memory situations. It removes the lengthy gap moves and consequently lowers the probability of thrashing. When designing such a system, the buffer page size should be equal to the virtual memory page size or a multiple of it. Thus, even in the tightest memory situations, an insert or delete of one character will affect at most two pages of buffer memory.

6.10.8 Conclusions

Use the buffer gap method if at all possible.

Only use the linked line method if you are implementing in an environment that likes to manipulate lists of small objects; for example, Lisp environments.

Only use the paged buffer gap method if resources are tight.

6.11 Editing Extremely Large Files

This section examines techniques for editing extremely large files. The first type of extremely large file consists of files that are so large that reasonable assumptions based on current workstation and mainframe architectures are no longer valid. Given the current generation of computing hardware, this starts happening around 512 Megabytes (ten years ago, I had set this number at 64 Megabytes). At that size, even simple operations such as a string search can take several minutes to run on a fast processor with the whole file in memory.

Although there are one or two interesting hacks you can do to stay alive, life is simply not bearable when trying to edit such a large unstructured file. The alternative, which large database implementers have known about for years, is to structure the file. This alternative is palatable because an unstructured editor can still be used to edit the pieces of a structured file. The other reason why this limitation is not bothersome is that there aren't all that many such files to edit. (For example, the largest file on a computer system that I often use is only about 33 Megabytes.) The vast majority of files are much smaller. Gigantic files call for special tools.

The other type of "extremely large" file is encountered on resource-limited systems. In these cases, files that would otherwise be handled easily can now cause the system to bog down. For example, in the first generation of microprocessor systems, an extremely large file might have been 100 Kilobytes. There are several ways of dealing with such files.

One way is to divide the file into chunks, each of which fits into memory. You read in the first chunk, edit it, write it out, read in the second chunk, and continue until you are done. While arbitrary editing can be done within a chunk, in general you cannot back up to a previous chunk without finishing the file and starting over. This method was used by the original TECO editor.

Another way is to use a three-file system. As the user moves down in the file, the "from" file is read and a "to" file is written. When the user wants to back up, the "to" file is read and a "holding" file is written (the chunks will appear in reverse order in this file). When the user moves forward again, the "holding" file is read (backwards) until exhausted, then the "from" file is read from again.
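The bookkeeping of the three-file system can be modeled in C with each file as an array and each chunk as an integer id. This is a toy model under stated assumptions (at most MAX_CHUNKS chunks, no real I/O, hypothetical names), meant only to show which file is read and which is written on each move; "to" and "holding" behave as stacks.

```c
#include <stddef.h>

#define MAX_CHUNKS 64	/* assumed upper bound on chunks in the file */

struct three_files {
	const int *from;		/* original file, read forward */
	size_t from_len, from_pos;
	int to[MAX_CHUNKS];		/* written as the user moves down */
	size_t to_len;
	int hold[MAX_CHUNKS];		/* "holding" file, chunks reversed */
	size_t hold_len;
	int current;			/* chunk in memory being edited */
};

/* Move down one chunk: write the current chunk to "to", then read the
   next one from "holding" (backwards) until it is exhausted, else from
   "from".  Returns 0 at the end of the file. */
int forward(struct three_files *t)
{
	int next;

	if (t->hold_len > 0)
		next = t->hold[--t->hold_len];
	else if (t->from_pos < t->from_len)
		next = t->from[t->from_pos++];
	else
		return 0;
	t->to[t->to_len++] = t->current;
	t->current = next;
	return 1;
}

/* Back up one chunk: write the current chunk to "holding" and read
   back the last chunk written to "to".  Returns 0 at the start. */
int backward(struct three_files *t)
{
	if (t->to_len == 0)
		return 0;
	t->hold[t->hold_len++] = t->current;
	t->current = t->to[--t->to_len];
	return 1;
}
```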

The best method to use if main memory is tight is paged buffer gap. If disk storage is also tight, serial chunking is best.

6.12 Difference Files

There is another type of buffer management that has been used to good advantage in several cases. It is called the difference file method. It works best when recording relatively few changes, and those changes are small when compared to the size of the buffer. In this method, the buffer is not kept in memory at all. Instead, only a list of differences between the buffer and the "original" file is kept. When information is being retrieved from the buffer, it is read from the file as needed and the differences applied.
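Retrieving a character under the difference file method can be sketched as follows. The struct diff record is hypothetical: each entry says that, at a given offset in the original file, some characters were deleted and some text inserted, and the list is assumed to be sorted and non-overlapping. An in-memory string stands in for the file on disk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical difference record, relative to the original file. */
struct diff {
	size_t orig_pos;	/* where in the original the change is */
	size_t del;		/* original characters removed there */
	const char *ins;	/* text inserted in their place */
};

/* Return the character at user location pos, reading the original
   and applying the differences on the fly, or -1 past the end. */
int diff_char_at(const char *orig, size_t orig_len,
		 const struct diff *d, size_t nd, size_t pos)
{
	size_t o = 0;	/* current position in the original */
	size_t i;

	for (i = 0; i < nd; i++) {
		size_t keep = d[i].orig_pos - o;	/* unchanged run */
		size_t n = strlen(d[i].ins);

		if (pos < keep)
			return (unsigned char)orig[o + pos];
		pos -= keep;
		if (pos < n)
			return (unsigned char)d[i].ins[pos];
		pos -= n;
		o = d[i].orig_pos + d[i].del;	/* skip deleted text */
	}
	if (o + pos < orig_len)
		return (unsigned char)orig[o + pos];
	return -1;
}
```

For example, with the original "Minneapolis" and a single difference {0, 5, "Indian"}, the characters read back spell "Indianapolis".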

This method has much promise. For example, in many cases, a file will be read into a buffer, looked at by the user, and the buffer deleted. In this example, the difference file method essentially acts as a file viewer. This is particularly encouraging when you realize that the larger a file is, the less likely it is to be dramatically changed.

On the other hand, this method does not scale well. For example, I have been editing this chapter continuously for several hours. As it turns out, the current version bears little resemblance to the original. The best description of the changes is "throw out everything and insert X," where "X" is the entire chapter. I would expect most "reasonable" descriptions of this chapter to wind up being several times as large as the chapter itself. Hence, you now have to address the question of how to edit the description of the differences. Let's see, we can use buffer gap, linked line, or paged buffer gap...

In addition, this method does not work well in tight memory situations. As I write this chapter, it occupies about 60 Kilobytes of the roughly 100 Kilobytes of free RAM disk space on my laptop computer. I simply don't have the room to store what amounts to both the "old" and "new" versions at once.

But, one might argue, you don't have to store the "old" version as it appears on disk. Well, that's true, but the disk is sitting on the table by my side, not in the floppy drive. So how can it be read?

In conclusion, this method works well when one is essentially viewing files. However, it breaks down badly as changes to the file accumulate. It can easily end up taking several times as much memory to track all the changes as to simply store the modified version. Finally, the changes can easily become so large that either a "real" buffer management method must be implemented to simply track the changes or "snapshot" files must be created so that changes can be tracked from a new base. Hence, why bother with the extra overhead?

Questions to Probe Your Understanding

Rectangular regions include only those characters between the point and mark that are also in columns between those of the point and the mark. Define a set of interface procedures to handle rectangular regions. (Easy)

Come up with a situation where it would be a good idea to implement the buffer as a linked list of characters. (Medium)

The first buffer gap editor was TECO, which was also among the first text editors ever written. It was written in the early 1960s. Explain why many people spent the next fifteen years reinventing hard-to-use, limited-functionality line editors. (Medium, but if you succeed I would like to hear your explanation)

Devise a buffer management scheme better than buffer gap. (Hard, but if you succeed, you can probably get a Ph.D. thesis out of it)


Seven: Redisplay

One, two! One, two! And through and through
The vorpal blade went snicker-snack!

The previous chapter described a way of dividing the implementation into parts and covered one of those parts, the internal sub-editor. This chapter describes the redisplay part.

This chapter will start by discussing the general constraints that affect redisplay. It will then describe the external interface and some of the internal interfaces used by redisplay (the "procedure interface" definitions). It goes on to discuss many of the considerations that affect implementations of the algorithms. Finally, it describes the redisplay algorithms.

7.1 Constraints

Redisplay, or incremental redisplay, to give it its full name, is that part of the implementation that is responsible for ensuring that all changes to the buffer are promptly reflected on the user's display. As is evident from the definition, there are two parts to redisplay's job.

The first part is to ensure that all changes are indeed tracked. In the absence of the second part, this part would be quite easy.

The second part of the job is to ensure that the changes are made "promptly." In this context, "promptly" means that the amount of clock time required to make the updates visible is minimized. Clock time is the combination of transmission time, CPU time, and disk access time that is perceived by the user as the delay from when a command has been entered to when the display has been updated.

In general, the buffer's contents will change only a small amount during any one command. The screen will thus only have to be changed by a small amount in order to reflect the changed buffer contents. Hence, the algorithms concentrate on incrementally redisplaying the buffer; the entire process is thus referred to as incremental redisplay. Fortunately, it turns out that in cases where the buffer is changed drastically, the increment-oriented approach to redisplay works quite well, so there is no need for multiple algorithms.

This discussion of incremental redisplay assumes a model of the system where the editing is done on a main processor which communicates with a display. If the main processor also drives the display directly, the bandwidth of the CPU-to-display communications channel can be very high. However, the considerations remain unchanged: only the relative weights change. Incremental redisplay is an optimization among CPU time, display-processing time, and communications-channel time, with a few memory considerations thrown in.

The first major constraint is the speed of the communications channel. Typical speeds that are available are 300, 1200, 2400, and 9600 bps. Memory-mapped and other built-in displays run at bus speeds: their communications speed is essentially infinite.

A typical video display has a 24 x 80 character screen. At 300 bps, it takes about three seconds to reprint a line and over a minute to refresh the whole screen. At 1200 bps, less than one second is required to reprint a line and about sixteen seconds to refresh the screen. At 9600 bps, it will take one to two seconds to refresh the screen. The speed of the communication line thus greatly affects the amount of optimization that is desired. At 300 bps, the user may notice even one extra transmitted character, while at 9600 bps, reprinting entire lines does not take an appreciable amount of time. One dimension of the optimization is thus clear: the importance of optimizing the number of characters sent increases in proportion to the slowness of the communication line.
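The figures above can be checked with a little arithmetic. The sketch below assumes asynchronous serial framing of 10 bits per character (one start bit, eight data bits, one stop bit), which the text does not state; `send_seconds` and `print_timings` are hypothetical helper names.

```c
#include <stdio.h>

/* Assumption: 10 bits on the wire per character (1 start, 8 data,
   1 stop), so a line running at `bps` carries bps / 10 chars/second. */
#define BITS_PER_CHAR 10.0

/* Seconds needed to transmit `chars` characters at `bps`. */
double send_seconds(long chars, long bps)
{
    return chars * BITS_PER_CHAR / (double)bps;
}

/* Print the time to reprint one line and to refresh a rows x cols screen. */
void print_timings(int rows, int cols, long bps)
{
    printf("%5ld bps: line %.2f s, screen %.1f s\n", bps,
           send_seconds(cols, bps),
           send_seconds((long)rows * cols, bps));
}
```

Under these assumptions, `print_timings(24, 80, 300)` reports about 2.67 seconds per line and 64 seconds per screen, while at 9600 bps a full screen takes about 2 seconds.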

The second major constraint is the speed of the display device. It takes time for the display to handle each command, and this time can affect the choice of commands sent to the display. For example, if a line ending in:

whale

was changed to:

narwhale

the redisplay code could elect to position the cursor just before the "w", insert three characters, then send "nar". Alternatively, it could position to the same place and just send "narwhale". The latter would be more efficient unless the display could accept and perform the "insert three characters" command in less than five character times (the cost of resending "whale"), which is possible. As has been mentioned before, some memory-mapped displays actually process commands very slowly. As the effective display speed is so low, it is important to use good redisplay algorithms on those displays.
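A sketch of that comparison, assuming every cost is expressed in character times and that cursor positioning costs the same for both choices (so it drops out). `insert_cmd_cost` stands for a hypothetical per-display figure of the kind Screen_Timings is meant to supply; the function names are also hypothetical.

```c
#include <stdbool.h>

/* All costs are in "character times": the time to send one character.
   Cursor positioning is common to both strategies and is omitted. */

/* Strategy A: send an insert-characters command, then the k new characters. */
double cost_insert(double insert_cmd_cost, int k)
{
    return insert_cmd_cost + k;
}

/* Strategy B: retype the k new characters plus the t old characters
   that had to shift to the right. */
double cost_rewrite(int k, int t)
{
    return (double)k + t;
}

/* True when the display's insert command is the cheaper choice. */
bool prefer_insert(double insert_cmd_cost, int k, int t)
{
    return cost_insert(insert_cmd_cost, k) < cost_rewrite(k, t);
}
```

For "whale" becoming "narwhale", `k` is 3 and `t` is 5, so `prefer_insert` is true only when the insert command costs less than five character times, matching the reasoning above.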

User interface considerations also affect which command sequences should be sent. For example, while it might be acceptable from a pure clock-time point of view to reprint an entire line, users do not like to see text which has not changed in the buffer "change" by being reprinted. The flickering that is generated by the reprinting process attracts the user's attention to that text, which is undesirable as the text has not, after all, changed. Thus, avoiding extraneous flickering and movement of text is good. The amount of perceived flicker will vary from display to display, being highly dependent upon such factors as display command set, display speed, internal display data structures, timing, and phosphor.

The third major constraint is the CPU speed. On some computers, computing an optimal redisplay sequence takes longer than is saved by the optimizations ("optimal" considering only the communications channel and display). On those machines, the correct optimization is to send the "less-than-optimal" sequence.

CPU time must be spent in order to perform any optimizations. If the CPU time spent exceeds some small amount of clock time, the user will perceive response to be sluggish. It is therefore desirable to minimize the CPU time spent on optimizing the redisplay. However, the communications channel speed also makes a difference. If the line is slow, extra CPU time can and should be spent: at 300 bps, transmitting one character takes about 30 msec., so it is worthwhile to spend up to 30 msec. of CPU time to avoid sending it. However, at higher speeds it is generally not practical to heavily optimize the number of characters sent, as it can easily take longer to compute the optimizations than to transmit the extra data. This relaxation of the optimization is subject to the user interface constraint outlined above.

The fourth constraint is the memory size. For example, one technique stores a copy of the entire screen, character by character. This technique works quite well in general. However, where memory is tight this technique may not be feasible.

7.2 Procedure Interface Definitions

This section describes two interfaces. The first one is the (external) interface that redisplay presents to the rest of the editor implementation (i.e., the sub-editor and the user-oriented commands). The second interface is the internal interface used by the redisplay to isolate the display-specific portions of its internal code.

In this section, the term display refers to the hardware operated by the user. A display has a keyboard, screen, and perhaps a graphical input device. The screen is the part of the display that shows output. A window is a logical screen. One window can occupy the entire screen, or more than one window can share the screen, perhaps even overlapping. A window can never be larger than the screen.

7.2.1 Editor Procedures

status Window_Init(char *display);
status Window_Fini(void);
status Window_Save(FILE *fptr);
status Window_Load(FILE *fptr);

Window_Init is the basic set-up-housekeeping call. It is called once, upon editor invocation. It should perform all required one-time initialization, including keyboard and screen initialization. No other editor interface call except for Window_Fini can be legally called unless Window_Init returns a successful status. The parameter indicates the display type. Presumably, if the parameter is null, the routine will determine a default display.

Window_Fini terminates all state information. Once it has been called, Window_Init must be called again before other editor interface calls can be legally made.

Window_Save saves the current redisplay state information in the file opened on the specified file pointer. Presumably, World_Save opened the file, then called this routine.

Window_Load loads all redisplay state information from the file opened on the specified file pointer.

If you are creating a "stripped down" editor, then these routines would not do anything. They can be put in as stubs if there is a reasonable possibility that the editor will be embellished later.

void Redisplay(void);
void Recenter(void);
void Refresh_Screen(void);

Redisplay performs one incremental redisplay. If it runs to completion, the screen will accurately reflect the buffer. However, this routine also checks for type-ahead. If the user types (or uses the graphical input device) before the redisplay has finished, this routine will notice that event and abort the redisplay in a safe manner. Presumably, the user will stop typing ahead at some point, and redisplay can then complete.

Recenter operates as does Redisplay, except that it first adjusts the window so that the line containing the point is at the center of the window (technically, at the preferred percentage). This procedure is needed because there is typically a user-level command to perform this operation.

Refresh_Screen operates as does Recenter, except that it assumes that the screen has been corrupted. Thus, this routine ensures that the screen is correct no matter what else may have happened.

void Set_Pref_Pct(int percent);
int Get_Point_Row(void);
int Get_Point_Col(void);

Set_Pref_Pct sets the preferred percentage. After a Recenter or other similar operation, the point will be on a line approximately this percentage of the way through the window. A good default value is about forty percent. With this default on a 22-line window, the point will be on line 9.
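One way to turn the preferred percentage into a row, as a minimal sketch: it assumes zero-origin rows and rounds to the nearest row, which gives row 9 for the 22-line, forty-percent example. `preferred_row` is a hypothetical name.

```c
/* Zero-origin window row on which the point should land after
   Recenter, given a window of `rows` lines and the preferred
   percentage. Rounds to the nearest row and clamps to the window. */
int preferred_row(int rows, int percent)
{
    int row;

    if (rows <= 0)
        return 0;
    row = (rows * percent + 50) / 100;     /* round to nearest */
    if (row < 0)
        row = 0;
    if (row > rows - 1)
        row = rows - 1;
    return row;
}
```

Recenter would then pick the window's top line by moving back that many display lines from the line containing the point.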

Get_Point_Row returns the number of the row within the window that the point is at.

Get_Point_Col returns the number of the column within the window that the point is at. This may not be the same as the column returned by the sub-editor's Get_Column, as that routine does not take into account line wrap.
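A minimal sketch of the extra step that line wrap adds when mapping a logical column to a window position. It assumes the logical column already accounts for tabs and multi-character representations, that every one of the `width` columns holds text (none is reserved for a continuation mark), and that the point may sit just past the last character, so a line exactly one window wide takes two rows. The names are hypothetical.

```c
/* Window position of a logical column when lines wrap at `width`. */
struct screen_pos {
    int row;   /* window row */
    int col;   /* window column */
};

/* line_start_row is the window row on which the buffer line begins. */
struct screen_pos wrap_position(int line_start_row, int logical_col, int width)
{
    struct screen_pos p;

    p.row = line_start_row + logical_col / width;
    p.col = logical_col % width;
    return p;
}

/* Window rows needed by a line `len` columns long, counting the
   position just past the last character, where the point may rest. */
int rows_for_line(int len, int width)
{
    return len / width + 1;
}
```

Other conventions (for example, not reserving a cell past a full-width line) are equally valid; what matters is that this mapping and the redisplay code agree.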

window_data Window_Create(window_data wind);
status Window_Destroy(window_data wind);
status Window_Grow(window_data wind, int amt);
int Get_Window_Top_Line(window_data wind);
int Get_Window_Bot_Line(window_data wind);
location Get_Window_Top(window_data wind);
location Get_Window_Bot(window_data wind);

These routines are used to manipulate multiple windows, if you choose to offer that feature. The definitions provided here only allow for horizontal windows (i.e., all windows occupy the full width of the display). Vertical windows and overlapping windows are not covered, although they are often implemented.

Window_Create creates a new window. It operates by splitting the supplied window in two. Both windows initially show the same data. It returns a window descriptor for the second window.

Window_Destroy destroys the supplied window. The window above this one expands to occupy the vacant screen space. Note that Window_Destroy(Window_Create(wind)) results in no change.

Window_Grow grows the specified window by the specified number of lines. The window is grown by moving the top line up.

Get_Window_Top_Line returns the screen line that contains the top line of the specified window.

Get_Window_Bot_Line returns the screen line one after the bottom of the specified window. This is the same value as Get_Window_Top_Line of the next lower window.
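To make the relationships among these routines concrete, here is a sketch of one possible representation of horizontal windows: an array of top screen lines, where each window's bottom is the next window's top. The `window_set` structure and the `win_*` functions are hypothetical stand-ins for the window_data routines above. Windows are identified by index, status lines are ignored, and when the top window is destroyed the window below absorbs its lines (a case the text does not cover).

```c
#include <stdbool.h>

#define MAX_WINDOWS 16

/* Window i occupies screen lines top[i] up to (not including) the top
   of window i + 1, or screen_rows for the last window. */
struct window_set {
    int screen_rows;
    int count;
    int top[MAX_WINDOWS];
};

int win_top_line(const struct window_set *ws, int w)
{
    return ws->top[w];
}

/* One past the bottom line: the top line of the next lower window. */
int win_bot_line(const struct window_set *ws, int w)
{
    return w + 1 < ws->count ? ws->top[w + 1] : ws->screen_rows;
}

/* Split window w in two; returns the new (lower) window, or -1. */
int win_create(struct window_set *ws, int w)
{
    int height = win_bot_line(ws, w) - win_top_line(ws, w);
    int i;

    if (ws->count >= MAX_WINDOWS || height < 2)
        return -1;
    for (i = ws->count; i > w + 1; i--)    /* make room at w + 1 */
        ws->top[i] = ws->top[i - 1];
    ws->top[w + 1] = ws->top[w] + height / 2;
    ws->count++;
    return w + 1;
}

/* Remove window w; the window above absorbs its lines. */
bool win_destroy(struct window_set *ws, int w)
{
    int i;

    if (ws->count <= 1 || w < 0 || w >= ws->count)
        return false;
    if (w == 0)                            /* no window above: grow the one below */
        ws->top[1] = ws->top[0];
    for (i = w; i < ws->count - 1; i++)
        ws->top[i] = ws->top[i + 1];
    ws->count--;
    return true;
}

/* Grow window w by moving its top line up amt lines (taken from the
   window above). Every window keeps at least one line. */
bool win_grow(struct window_set *ws, int w, int amt)
{
    int new_top;

    if (w <= 0 || w >= ws->count)
        return false;
    new_top = ws->top[w] - amt;
    if (new_top <= ws->top[w - 1] || new_top >= win_bot_line(ws, w))
        return false;
    ws->top[w] = new_top;
    return true;
}
```

Splitting a window and then destroying the new one restores the original layout, as the identity Window_Destroy(Window_Create(wind)) requires.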

Get_Window_Top returns the location in the buffer just before the character that is at the top-left corner of the window. Several of the user commands use this information.

Get_Window_Bot returns the location in the buffer just after the character that is at the bottom-right corner of the window (or as close as you can get to it). This is the same location that would be returned by Get_Window_Top if the window were exactly one window below its current position. Several of the user commands use this information.

7.2.2 Display Independent Procedures

This section describes the display functions used by redisplay. The routines that implement these functions are not part of redisplay itself. A full discussion of this topic is beyond the scope of this book (but is covered in Linhart 1980). In essence, the problem is that every display manufacturer has decided on a different set of features and offers different ways of accessing those features. To solve the problem, a set of routines is needed which can isolate these differences, as well as a way of selecting among different sets of such routines as the display changes.

Although not recommended, it is possible to cover a lot of displays by assuming that the display accepts the ANSI escape sequences (i.e., that it behaves like a DEC VT100). Most modern displays accept these sequences. However, many older displays do not. In addition, not all displays take the same amount of time to process a given command. Thus, there is still per-display information to consider.
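For illustration, a sketch of four screen routines for a display that does accept the ANSI sequences. The escape sequences are the standard ANSI/VT100 ones (cursor position `ESC [ row ; col H`, erase to end of line `ESC [ K`, erase to end of screen `ESC [ J`, erase display `ESC [ 2 J`). The function names are hypothetical, output goes into a caller-supplied buffer so that it can be inspected, and no attempt is made to pick a cheaper relative cursor motion.

```c
#include <stdio.h>

/* Rows and columns are zero-origin here; ANSI counts from one,
   hence the + 1. Each routine returns the sequence length. */
int ansi_set_cursor(char *buf, size_t size, int row, int col)
{
    return snprintf(buf, size, "\033[%d;%dH", row + 1, col + 1);
}

/* Clear from the cursor to the end of the line; the cursor stays put. */
int ansi_cleol(char *buf, size_t size)
{
    return snprintf(buf, size, "\033[K");
}

/* Clear from the cursor to the end of the screen. */
int ansi_cleos(char *buf, size_t size)
{
    return snprintf(buf, size, "\033[J");
}

/* Clear the whole screen, then home the cursor to row 0, column 0
   (erasing the display does not itself move the cursor). */
int ansi_clear_screen(char *buf, size_t size)
{
    return snprintf(buf, size, "\033[2J\033[H");
}
```

Even with a fixed command set like this, the per-display timing information mentioned above would still be needed to choose among alternatives.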

There is one piece of existing technology to mention, and that is exemplified by the curses package available on many UNIX systems (similar packages are available under other names for other systems). It not only provides display-independent functions, albeit with a somewhat different interface, but it also performs the redisplay for you! However, as the purpose of this chapter is to explain how redisplay works, that package will receive no further mention.

The following set of procedures will allow display-independent operations for most displays. The procedure interfaces isolate the operations that are used by redisplay.

status Key_Init(char *display);
status Key_Fini(void);
char Key_Get(void);
FLAG Key_IsInput(void);
private Key_FunctionKeys(void);

Key_Init and Key_Fini operate in the by now familiar fashion. They will be called by the Window_Init and Window_Fini routines. In particular, though, these routines make sure that all processing of input characters is turned off (i.e., the keyboard is set for "raw" input) and all configuration information is loaded.

Key_Get waits for a key to be pressed and returns it. Keys that send multiple characters (e.g., function keys) are returned one character at a time.

Key_IsInput returns True if input is available or False if it is not. It is used, for example, by Redisplay to determine whether to abort.
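On a POSIX system, one plausible way to implement Key_IsInput is to poll the keyboard's file descriptor with `select()` and a zero timeout. This is a sketch under that assumption: it returns `bool` where the interface says FLAG, and `key_is_input` is a hypothetical helper. Other systems need a different mechanism.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* True if at least one byte can be read from fd without blocking.
   The zero timeout makes select() poll instead of wait. */
bool key_is_input(int fd)
{
    fd_set readable;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    return select(fd + 1, &readable, NULL, NULL, &zero) > 0;
}

/* The interface routine, assuming the keyboard is standard input. */
bool Key_IsInput(void)
{
    return key_is_input(STDIN_FILENO);
}
```

Because the call never blocks, Redisplay can afford to make it between lines of output.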

Key_FunctionKeys returns information about the function keys available on this keyboard. This information includes key placement, key labeling, and the codes returned by the keys. The information is returned in an implementation-defined manner (i.e., you get to invent your own representation).

status Screen_Init(char *screen);
status Screen_Fini(void);
int Screen_Rows(void);
int Screen_Columns(void);
private Screen_Attributes(void);

Screen_Init and Screen_Fini operate in the by now familiar fashion. They, too, will be called by the Window_Init and Window_Fini routines. In particular, though, these routines make sure that all processing of output characters is turned off (i.e., the screen is set for "raw" output) and all configuration information is loaded.

Screen_Rows returns the number of rows in the screen. In this case, a row is the granularity of screen output. On a graphics screen, a row would be one pixel.

Screen_Columns returns the number of columns in the screen. In this case, a column is the granularity of screen output. On a graphics screen, a column would be one pixel.

Screen_Attributes returns information about the attributes (e.g., boldface, reverse video, blinking, etc.) that the screen supports. The information is returned in an implementation-defined manner (i.e., you get to invent your own representation again).

void Set_Cursor(int row, int column);
void Set_Row(int row);
void Set_Column(int column);
void Set_Attr(private attributes);
int Get_Row(void);
int Get_Column(void);
private Get_Attr(void);
void Put_Char(char c);
void Put_String(char *str);
void Beep(void);

Set_Cursor sets the cursor to the specified row and column. It is assumed that the optimal (i.e., least-cost) command sequence will be selected.

Set_Row sets the cursor to the specified row, without affecting the column. Instead of a separate routine, this could be multiplexed onto Set_Cursor, say by one of the following:

Set_Cursor(row, -1);
Set_Cursor(row, Get_Column());

Set_Column sets the cursor to the specified column without affecting the row. The functionality provided by this routine could also be multiplexed onto Set_Cursor.
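A sketch of the multiplexing idea, assuming the first form above (a negative coordinate means "leave unchanged") and a screen module that remembers the cursor position itself. The state variables are hypothetical, and the actual command output is left as a comment.

```c
/* Cursor position remembered by the screen module (hypothetical),
   so Get_Row and Get_Column need not query the display. */
static int cur_row, cur_col;

/* Move the cursor. A negative argument leaves that coordinate
   unchanged, which lets Set_Row and Set_Column share this routine. */
void Set_Cursor(int row, int column)
{
    if (row >= 0)
        cur_row = row;
    if (column >= 0)
        cur_col = column;
    /* A real version would choose and send the cheapest command
       sequence for the move here. */
}

int Get_Row(void)    { return cur_row; }
int Get_Column(void) { return cur_col; }

void Set_Row(int row)       { Set_Cursor(row, -1); }
void Set_Column(int column) { Set_Cursor(-1, column); }
```

The second form, Set_Cursor(row, Get_Column()), needs no special argument value but costs an extra call per use.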

Set_Attr sets the current attributes to those specified.

Get_Row returns the row that the cursor is on.

Get_Column returns the column that the cursor is on.

Get_Attr returns the current attributes.

Put_Char outputs the supplied character to the screen, updating the cursor position. The character is always displayed: it is never part or all of a command sequence.

Put_String outputs the supplied string and leaves the cursor after the string. It otherwise works as per Put_Char. These strings are always displayed, even if they appear to contain screen commands. Commands may be sent to the screen only by means of the supplied procedures.

Beep rings the screen's bell or flashes the screen.

void CLEOL(void);
void Clear_Line(void);
void CLEOS(void);
void Clear_Screen(void);

CLEOL sends the command sequence that optimally clears from the current cursor position to the end of the current line. The cursor does not move.

Clear_Line sends the command sequence that optimally clears the entire current line, and leaves the cursor in column 0 (you are assuming a zero origin on all of these numbers, aren't you?).

CLEOS sends the command sequence that optimally clears from the current cursor position to the end of the screen. Lines after the current one are completely cleared. The cursor does not move.

Clear_Screen sends the command sequence that optimally clears the entire screen, and leaves the cursor at the upper-left corner (row 0, column 0).

void Insert_String(char *str);
void Delete_Chars(int count);
void Insert_Lines(int count);
void Delete_Lines(int count);
void Scroll_Lines(int from, int to, int count);

These commands are available on advanced displays only (by the definition of an advanced display from Chapter 2), and each is assumed to send the optimal command sequences to effect its purpose.

Insert_String takes a string, determines the optimal command sequence required to insert it, and inserts it starting at the current cursor location. Line wrap is not performed: excess characters are dropped off the right edge of the screen. This routine could have been defined to accept a count instead of a string and to insert blanks. However, it is easier to optimize command sequences by having the string to be inserted available.

Delete_Chars accepts a count and deletes that many columns. Line wrap is not performed. Blank columns are inserted from the right edge of the screen.

Insert_Lines accepts a count and inserts that number of blank lines, starting with the line that the cursor is on (thus, you can insert lines at the very top of the screen).

Delete_Lines accepts a count and deletes that number of lines, starting with the line that the cursor is on (thus, the line at the very top of the screen can be deleted). Blank lines are scrolled in from the bottom.

Scroll_Lines accepts a from line, a to line, and a count. The lines starting with the from line, and up to but not including the to line, are scrolled by count lines (positive scrolls the lines up and negative scrolls the lines down).
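If redisplay keeps a copy of the screen (the memory-hungry technique mentioned under the constraints), it must apply each screen command to its copy as well. A sketch of that for Scroll_Lines, assuming a fixed 24 x 80 character array; `screen` and `shadow_scroll` are hypothetical names.

```c
#include <string.h>

#define ROWS 24
#define COLS 80

/* Hypothetical in-memory copy of what the screen currently shows. */
static char screen[ROWS][COLS];

/* Apply Scroll_Lines(from, to, count) to the copy: lines from..to-1
   move up by count lines (down if count is negative), and the lines
   vacated inside that region are filled with blanks. */
void shadow_scroll(int from, int to, int count)
{
    int n = to - from;                          /* lines in the region */
    int shift = count < 0 ? -count : count;

    if (from < 0 || to > ROWS || n <= 0 || shift == 0)
        return;
    if (shift > n)
        shift = n;                              /* everything scrolls out */
    if (count > 0) {                            /* up: top lines fall off */
        if (n > shift)
            memmove(&screen[from], &screen[from + shift],
                    (size_t)(n - shift) * sizeof screen[0]);
        memset(&screen[to - shift], ' ', (size_t)shift * sizeof screen[0]);
    } else {                                    /* down: bottom lines fall off */
        if (n > shift)
            memmove(&screen[from + shift], &screen[from],
                    (size_t)(n - shift) * sizeof screen[0]);
        memset(&screen[from], ' ', (size_t)shift * sizeof screen[0]);
    }
}
```

Keeping the copy in step this way lets later comparisons against the desired contents find only the lines that really differ.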

private Screen_Timings(private goal);

Screen_Timings accepts a description of a goal, and returns timing information on the various choices of screen routines that could be used to achieve that result. The description takes into account the current screen status. The information is to help the redisplay code select the best screen routine, not to help the screen routines optimize their own performance (such optimizations are assumed to be done anyway). As with the other private data types, the information is returned in an implementation-defined manner (i.e., you get to invent your own representation). Note that the two private data types in this procedure's definition (that of the goal parameter and that of the return value) refer to different data types with different representations.

7.3 Considerations

This section describes various considerations that go into the redisplay algorithm. In other words, these are the ways in which the algorithm gets complicated. While none of these is particularly difficult to implement in itself, collectively they would clutter the redisplay algorithms presented later. Hence, you should keep these topics in mind when reviewing the algorithms.

The topics in this section are only vaguely related to each other and are in no particular order.

7.3.1 Status Line

In general, each buffer will have some lines of status information. In addition, there may be general editor status information. Finally, there may be lines of separators between windows. (One hopes that on "small" screens (i.e., those with fewer than, say, fifty lines), the numbers for these are "one," "none," and "none: use the buffer status line as a window separator" in order to devote as many lines as possible to showing the text being edited.)

In any event, this "framework" information must be retained and displayed. The user-oriented command routines and redisplay must work together to provide this infrastructure.

Here are some sample types of per-buffer status information:

the file name
the buffer name (may be the same as the file name)
the buffer status: unmodified, modified, read-only
the current modes
the point position in characters and/or buffer length
the point position as a percentage
the location of the top of the window as a percentage (or "top", "bot", or "all" as appropriate)
the point column
the current attribute
the current line and number of lines

Of course, any one editor implementation will only show some of this information at a time. This list is not definitive.
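As one example of formatting a status field, here is a sketch of the window-position indicator from the list above. The order of the tests (checking "all" before "top" and "bot") and the function name are assumptions; `top` and `bot` are character offsets of the kind Get_Window_Top and Get_Window_Bot return.

```c
#include <stdio.h>

/* Fill buf with the window-position indicator: "all" if the whole
   buffer is visible, "top" if the window starts at the beginning,
   "bot" if it reaches the end, and otherwise the percentage of the
   buffer lying above the window. length is the buffer size. */
void window_position_label(char *buf, size_t size,
                           long top, long bot, long length)
{
    if (top <= 0 && bot >= length)
        snprintf(buf, size, "all");
    else if (top <= 0)
        snprintf(buf, size, "top");
    else if (bot >= length)
        snprintf(buf, size, "bot");
    else
        snprintf(buf, size, "%ld%%", top * 100 / length);
}
```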

Here are some sample types of editor status information:

the name and version number of the editor
copyright information
the current date and time
the current system "load average," or other system information

Again, any one implementation may only show some of this information, and this list is not definitive.

7.3.2 End of the Buffer

There are two cases that must be handled.

First, if the entire buffer fits in the window, you will run out of buffer before you run out of window. The caveat here is to ensure that this case is properly detected and that the entire buffer is shown, with the start of the buffer at the upper-left corner.

Second, if the entire buffer does not fit in the window but the end of the buffer appears, the end should be close to, but not at, the bottom of the window.

Those portions of the window that follow the end of the buffer can be left blank or marked in some fashion. As a rule, Emacs-type editors leave that part of the window blank.

7.3.3 Horizontal Scrolling

A window has a finite width. Some lines will not fit within that width. There are two popular ways of handling such a situation: horizontal scrolling and line wrap. Ideally, your editor should offer the user a choice between them. This section will describe the first.

When performing horizontal scrolling, a line longer than the window width will spill off the edge: the part of the line that does not fit thus will not be visible to the user. As the user types, the text being displayed will adjust so that the text around the point is always visible. In addition, the user should be provided with commands to move the window left or right (with a few characters of overlap). Finally, the status line should contain indicators that show whether text is currently lost off of either the left or the right side (use separate indicators).
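A minimal sketch of keeping the point visible when scrolling horizontally. It assumes the window moves in steps of the window width minus the overlap; `left` is the first line column shown, and all names are hypothetical.

```c
#include <stdbool.h>

/* Return the new first visible column so that point_col is inside a
   window of `width` columns, moving in steps of width - overlap. */
int adjust_left(int left, int point_col, int width, int overlap)
{
    int step = width - overlap;

    if (step < 1)
        step = 1;
    while (point_col < left)              /* point is off the left edge */
        left -= step;
    while (point_col >= left + width)     /* point is off the right edge */
        left += step;
    return left < 0 ? 0 : left;
}

/* Status-line indicators: is text hidden off either edge? */
bool lost_left(int left)
{
    return left > 0;
}

bool lost_right(int left, int width, int line_len)
{
    return line_len > left + width;
}
```

With an 80-column window and 10 columns of overlap, moving the point to column 85 shifts the window to start at column 70; the two predicates drive the separate status-line indicators.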

7.3.4 Line Wrap

When performing line wrap, the window never moves left or right at all. Instead, the text that would have been clipped off of the right edge of the window is wrapped to the next line. If the line is sufficiently long, it may wrap two or more times.

In this type of display, no window motion commands are required. In addition, the status indicators are also not required, although you may wish to mark the wrapped lines.

Line wrap introduces a new problem that must be handled properly: that of single lines that, when wrapped, occupy the entire window. Although rare, such lines do show up from time to time in non-text files.

When horizontal scrolling and line wrap are compared, neither comes out a clear winner and both offer valuable features, hence the assertion that your implementation should support both line wrap and horizontal scrolling.

The advantages to horizontal scrolling are that it is easy to implement and can be processed quickly.

One of the disadvantages is that it requires a fast display. Consider the case when the user has a 160-column line displayed in an 80-column window. On the average, the window will have to be shifted twice per line of typing. Another disadvantage is that clipped text appears to have been deleted. It can be rather disconcerting to the user to have this text vanish and reappear.

One of the advantages of line wrap is that all of the text is always visible. In addition, when editing a very long line, the entire window shows the immediate context. In contrast, when editing the end of a long line when using horizontal scrolling, most or all of the remainder of the window will be blank, having been scrolled off the left edge.

The main disadvantages to line wrap are the additional complexity in the redisplay required to handle the line wrap, the very poor presentation when lines are only slightly wider than the window, and the disconcerting multi-line "shifting" that occurs when a user is inserting or deleting near the start of a wrapped line.

7.3.5 Word Wrap

Once you have line wrap, the next logical step is to break the line on a word boundary instead of a character boundary. You then offer "word wrap," a feature found in almost every word processor available today. Typically, a word processor will store each paragraph of text as a single line and simply perform word wrap upon it. Ruler lines are used to adjust the margins and change the type of justification.

This is a very nice feature to offer. It does have some pitfalls for the unwary implementer, however:

Your redisplay now has to handle look-ahead.
You are going to have to decide where to break the lines (white space only, include dashes, include other punctuation?).
Your users are going to want ruler lines, and so you must provide all of that infrastructure.
You will have to track where the word wrap actually occurs because the user thinks (and hence the line-manipulating commands operate) in terms of the lines as displayed.

If you do implement word wrap, you may as well go the whole way and support flushing right, centering, and justification of text during display.
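A sketch of the core of word wrap, assuming the simplest break rule (blanks only) and a greedy fill. The `starts[]` array is one way to record where the wraps actually occur, which the line-manipulating commands need. `word_wrap` is a hypothetical name, and justification and ruler lines are not handled.

```c
#include <stddef.h>

/* Greedy word wrap that breaks only at blanks. Records in starts[]
   the offset at which each display line begins and returns the number
   of display lines. A word wider than the window is split at the
   window edge. */
int word_wrap(const char *text, size_t len, int width,
              size_t *starts, int max_lines)
{
    size_t start = 0;
    int lines = 0;

    while (lines < max_lines) {
        size_t end, brk;

        starts[lines++] = start;
        if (len - start <= (size_t)width)
            break;                        /* the rest fits on this line */
        end = start + (size_t)width;      /* first character that does not fit */
        brk = end;
        while (brk > start && text[brk] != ' ')
            brk--;                        /* look back for a blank */
        if (brk == start) {
            start = end;                  /* no blank: split the word */
        } else {
            start = brk;
            while (start < len && text[start] == ' ')
                start++;                  /* blanks at the break are not shown */
            if (start >= len)
                break;                    /* only trailing blanks remained */
        }
    }
    return lines;
}
```

For "the quick brown fox" in a 10-column window, this yields two display lines starting at offsets 0 and 10.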

7.3.6 Tabs

Tab characters can be handled in two ways. The first way is to not handle them at all. Instead, convert them to spaces upon entry. In this case, the redisplay code never sees those characters and hence doesn't need to deal with them.

The second (and by far the most common) method is to treat the tab as a "cursor control command" that in effect says "think of me as n blanks, where n is the number of units to the next tab stop." Thus, when the redisplay code encounters a tab, it computes n, then pretends that it is displaying n consecutive blanks (or a single blank of width n). The value n can be computed in one of three ways.

First, tab stops can be set every c columns (or characters). In a zero-origin numbering system, tabs set every c columns are set at columns 0, c, 2*c, 3*c, ... For example, when c is 8, tabs are in columns 0, 8, 16, 24, ... The C language expression to compute n is:

n = c - x % c;

where x is the current column.
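Applied across a line, the expression gives the display column of any position. A sketch, assuming every character other than a tab is one column wide (control characters are not treated specially here); `display_column` is a hypothetical name.

```c
/* Display column reached after the first len characters of s, with
   tab stops every c columns (zero-origin). Each tab advances by
   n = c - x % c; every other character is assumed one column wide. */
int display_column(const char *s, int len, int c)
{
    int x = 0;
    int i;

    for (i = 0; i < len; i++) {
        if (s[i] == '\t')
            x += c - x % c;
        else
            x++;
    }
    return x;
}
```

Note that a tab encountered exactly on a stop still advances a full c columns, since x % c is then 0.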

The second way to set tab stops is to allow them to be set at arbitrary column positions. This way is often used in ruler lines in simple word processors. In this case, you must decide on a representation such as a bit array or an array of the columns where tabs are set.
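With the array-of-columns representation, finding the width of a tab is a search for the first stop beyond the current column. This sketch assumes a sorted array and treats a tab past the last stop as a single blank, which is a policy choice rather than anything the text specifies; `tab_width` is a hypothetical name.

```c
/* Columns a tab at column x occupies, given nstops tab-stop columns
   sorted in increasing order. Past the last stop, a tab counts as one
   blank (an assumed policy). */
int tab_width(const int *stops, int nstops, int x)
{
    int i;

    for (i = 0; i < nstops; i++)
        if (stops[i] > x)
            return stops[i] - x;
    return 1;
}
```

A linear search is fine for a ruler line's handful of stops; a bit array would instead be scanned forward from column x + 1.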

The third way to set tab stops is to allow them to be set at arbitrary positions, where the positions are measured in units such as inches, millimeters, etc. This way is most useful on graphics screens.

So far, only "traditional" tabs have been described. These might be termed "left" tabs because the left edge of the text is placed at the tab stop. Other types of tabs have become popular (again) with the advent of word processors:

Right tabs adjust the position of the text to the left of the tab stop so that its right edge is at the tab stop. Typically, all text back to the previous tab stop or the start of line is adjusted.
Decimal tabs search for a decimal point (comma in Europe) and place that character at the tab stop. Again, typically, all text back to the previous tab stop or the start of line is adjusted. These tabs act as right tabs if the text does not contain a decimal point.
Centering tabs center the preceding text between the current and previous tab stop.

Again, other variations are possible. Note that only the (traditional) left tabs can be implemented without some sort of look-ahead.
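The look-ahead requirement is easy to see in a sketch of decimal-tab placement: the starting column cannot be known until the whole field has been examined. The function below is hypothetical; it takes the separator as a parameter so that either a period or a comma can be used, and falls back to right-tab behavior when no separator is found.

```c
#include <string.h>

/* Starting column for a field aligned by a decimal tab at column
   `stop`: the separator lands on the stop. Without a separator the
   field is right-aligned so that it ends just before the stop. */
int decimal_tab_start(const char *field, int stop, char point)
{
    const char *dp = strchr(field, point);

    if (dp == NULL)
        return stop - (int)strlen(field);   /* act as a right tab */
    return stop - (int)(dp - field);
}
```

Right-aligned "123" and decimal-aligned "12.5" at the same stop line up their digits, because the missing separator would have fallen exactly on the stop.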

7.3.7 Control Characters

Control characters are those that are not printing characters, a space, a newline, or a tab. ("Printing characters" means just that: if your system supports "extended" or "enhanced" character sets, then those characters may not count as control characters.) In addition, a word processor may store some information "in band." That information would be either interpreted or skipped on redisplay.

However, even in a word processor or on a system with an extended character set, there should be a way to view (and edit) a "pure" binary file. In order to view such a file, there must be a standard representation for non-printing characters.

One representation is to show such characters in octal("\###") or hexadecimal ("\x##").

However, the most common representation, and possibly the most useful one, is to show such characters in caret notation (for a complete list of the caret notation, see Appendix E). The easiest way to define this notation is with a code excerpt:

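#define NL  '\n'            /* character names assumed by this excerpt */
#define TAB '\t'
#define SP  ' '

void Put_Char(char c);      /* screen routine from Section 7.2.2 */

void Caret(char c)
{
    if (c == NL) {
        /* ...handle newlines... */
        return;
    }
    if (c == TAB) {
        /* ...handle tabs... */
        return;
    }
    if (c & 0x80) {          /* high bit set: show a '~' prefix */
        Put_Char('~');
        c &= 0x7f;
    }
    if (c < SP || c > '~') { /* control character or DEL */
        Put_Char('^');
        c ^= '@';            /* e.g., 0x01 -> 'A', 0x7f -> '?' */
    }
    Put_Char(c);
}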

When handling these multiple-character characters, your implementation must be consistent. For example, be sure that your cursor-positioning code takes the extra characters into account. Your implementation must also properly handle the case where such a character spans a line boundary. It doesn't matter which choice is made here (i.e., the choice is between splitting the character at the boundary and moving the whole character to the next line), only that your implementation handle it consistently and correctly.
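One way to keep cursor positioning consistent with the Caret routine is to compute a character's width with the same tests it uses. A sketch; `caret_width` is a hypothetical helper, and newline and tab, which Caret handles separately, are excluded.

```c
/* Screen columns Caret uses for character code c (newline and tab
   excluded): one for the character, plus one for a '~' prefix when
   the high bit is set, plus one for a '^' prefix on control codes. */
int caret_width(unsigned char c)
{
    int w = 1;

    if (c & 0x80) {           /* '~' prefix */
        w++;
        c &= 0x7f;
    }
    if (c < ' ' || c > '~')   /* '^' prefix */
        w++;
    return w;
}
```

Code that lays out a line can then compare `x + caret_width(c)` against the window width to decide whether to split the character or move it whole to the next line.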

7.3.8 Proportionally Spaced Text

Once you have tabs and control characters down, displaying text in a proportionally spaced font is not too difficult. The main variation is that you no longer assume that all printing characters are the same width. Instead, you look up the width of each one as you display it. Actually, you can even support kerning by looking up each consecutive pair of characters to decide how to handle them.

The main "gotcha" in supporting proportionally spaced text is that one character no longer always exactly overwrites another on the screen. Thus, if you change an "m" to an "i", you have to figure out what to do with the extra width. Fortunately, most displays that handle proportionally spaced text (mainly graphics displays) offer a high-performance primitive to scroll a region of the screen.

7.3.9 Attributes, Fonts, and Scripts

The next level of generality is the support of attributes, fonts, and scripts. Attributes include such modifiers as boldface, italics, underscoring, and superscripting. Fonts include the different typefaces such as Times Roman and Helvetica. Scripts include language families such as European and Japanese.

With these, your support can be as complex as you wish. Especially when it comes to scripts, your time and energy are going to give out long before you can provide support for all languages.

However, each one is fairly simple to handle. The first step is to store the attribute, font, and script information somewhere (see Chapter 5). The second step is to interpret that information.

7.3.10 Breaking Out Between Lines

As was mentioned in the procedure interface definitions, the redisplay process does not have to run to completion before editing resumes. Instead, it can get to a convenient spot and check for any user input. If input has arrived, the redisplay can be aborted ("broken out of") and the input processed. It is important to keep in mind that the purpose of redisplay is to provide feedback to the user. If the user has already typed something, there is no immediate need for the feedback. Hence, redisplay can be broken out of and then restarted after the user's input has been processed.

In order to keep the amount of state information to a minimum, it may make sense not to abort instantly, but instead to finish the current chunk of redisplay (say, a line) before checking for input. At the minimum, you must keep track of how far along you had proceeded, so that you don't wind up redisplaying text that is already up to date.

The presence of between-line breakout can affect how your redisplay is done. For example, if resources are tight, it may make sense to start by redisplaying the line that the point is on, then to go on to the other lines as you have time. In that way, the information that is most important to the user is the first to get updated.

Lest there be any doubt: between-line breakout is a very important feature. It should be left out only of the very simplest implementations, or of those implementations that can complete even the most complex redisplay in under 100 msec.
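The bookkeeping for between-line breakout can be small. In the sketch below, all names are hypothetical. The state holds only the next row to consider and whether the point's row has been done. The point's row is handed out first, and input is checked between lines through hooks the caller supplies. The sketch assumes the point stays on the same row between calls; if the user's input moves it, a real implementation would restart the pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Progress through one redisplay pass, kept between interrupted calls. */
struct redisplay_state {
    int next_row;      /* next row to consider, top to bottom */
    bool point_done;   /* has the point's row been redisplayed yet? */
};

/* Return the next window row to redisplay, or -1 when the pass is
   complete. The point's row always comes first. */
int Next_Row_To_Redisplay(struct redisplay_state *st, int nrows, int point_row)
{
    assert(point_row >= 0 && point_row < nrows);
    if (!st->point_done) {
        st->point_done = true;
        return point_row;
    }
    while (st->next_row < nrows) {
        int row = st->next_row++;
        if (row != point_row)          /* the point's row was done first */
            return row;
    }
    return -1;
}

/* Hypothetical hooks: update one window line; ask whether input waits. */
typedef void (*line_fn)(int row);
typedef bool (*input_fn)(void);

/* Redisplay line by line, breaking out between lines if input arrives.
   Returns true if the window is up to date. Otherwise, call it again
   after the input has been processed; it continues where it stopped. */
bool Redisplay_Some(struct redisplay_state *st, int nrows, int point_row,
                    line_fn redisplay_line, input_fn input_pending)
{
    int row;

    while (!input_pending()) {
        row = Next_Row_To_Redisplay(st, nrows, point_row);
        if (row < 0) {
            st->next_row = 0;          /* reset for the next full pass */
            st->point_done = false;
            return true;
        }
        redisplay_line(row);
    }
    return false;
}
```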

7.3.11 Multiple Windows

Supporting multiple windows implies that the screen is divided into sections, with each section showing a possibly different buffer or part of a buffer. There are several ways that multiple windows can be supported:

Don't support them. Instead, rely on the (presumed) ability of the operating system to run multiple instances of the editor. This is not desirable because the different instances may not be able to communicate quickly with each other. For example, you may not be able to "cut" from one window and "paste" into another.
Support horizontal windows only. Horizontal windows occupy the entire width of the screen. This is a good and popular choice. It is not too difficult to implement, yet it provides a large chunk of the required functionality.
Support both vertical and horizontal windows (tiled). (Better)
Support arbitrary overlapped windows. (Best, and supported by many windowing packages)

The main thing to keep in mind when implementing multiple windows is that, when two or more windows show the same text, changes made in one should be immediately reflected in the others.

If you do support multiple windows, you can implement status and prompt lines as buffers in themselves and simply fit them in as additional windows to be displayed. In this way, you no longer have to consider them as special cases.

7.4 Redisplay Itself

The basic role of redisplay is to ensure that all changes to the sub-editor are promptly reflected on the screen. Implementers use two major approaches to performing redisplay.

        -----------------
        | user commands |
        -----------------
           /         \
          /           \
         v             v
    ---------------------------
    | sub-editor || redisplay |
    ---------------------------
           First Approach

The first approach is for the routines that are invoked by the user to tell the redisplay code exactly what they did (e.g., "I deleted 5 characters from here"). This approach is not a very clean one, and it is prone to error. The same information must be given twice (once to the sub-editor and once to redisplay), so an implementation must handle the situation where the two sets of instructions are not consistent (e.g., the application tells the sub-editor to delete a line but tells redisplay to insert a line). This is an especially important consideration because we would like to encourage novice users to write their own commands. The extra effort of getting the redisplay correct might discourage such efforts.

        -----------------
        | user commands |
        -----------------
           /
          /
         v
    --------------            -------------
    | sub-editor |<---------->| redisplay |
    --------------            -------------
           Second Approach

The second, and preferred, approach is to have the redisplay code communicate with the sub-editor to track the changes. This approach also has two methods of operation.

The first method (which might be called "sub-editor-driven") is to have the sub-editor calls communicate directly with the redisplay. For example, Insert_Char would make a call to redisplay saying, "I inserted this character at this place." The second method (which might be called "redisplay-driven") is to have the redisplay operate on its own and ask the sub-editor for information.

The sub-editor-driven method appears to be simple to implement, but upon closer examination it turns out to be quite complex. This complexity arises for several reasons.

First, the desirable operations for a sub-editor to offer (as shown in the sub-editor procedure interface definitions) do not match well to the available operations on displays. Hence, the redisplay code will have to perform this conversion. An example would be deleting a line. The code to perform the delete might be:

    void Delete_Line(void)
    {
        mark_name beg;

        Find_First_In_Backward(NEWLINE);    /* to start of line */
        if (Mark_Create(&beg) != OK) return;
        Find_First_In_Forward(NEWLINE);     /* to end of line */
        Point_Move(1);                      /* skip over newline */
        Copy_Region(kills, beg);            /* save in kill buffer */
        Delete_Region(beg);                 /* gone */
        Mark_Delete(beg);
    }

The sub-editor operation is "delete a region," and the region just happens to contain a line. Somebody has to examine the region to determine that it contains a line and that a "delete line" call to the display might be the correct one to use.

Second, the redisplay code will have to filter out the sub-editor operations (and subsequent directives) that happen outside the window.

Third, not every change made in the buffer implies a change in the display. For example, if the buffer contains the text:

    Here is a line.
    Here is a line.
    Here is a line.
    Here is a line.

and the first line is deleted, the following lines do not in fact change. That particular case may be rare, but the following happens fairly often:

    Here is line 1.
    Here is line 2.
    Here is line 3.
    Here is line 4.

In this case, the "Here is line " strings should not be redisplayed.

Fourth, the change might be no change at all. For example, the "lower case region" command applied to the text:

Most people believe the Unicorn to be a mythical animal.

might have in its inner loop:

    Replace_Char(tolower(Get_Char()));
    Point_Move(1);

This would have the effect of telling redisplay 56 times that a character had changed, when in fact only two of those characters were changed. One might argue that the Replace_Char routine could check whether the new character was in fact different before informing redisplay. However:

You haven't gained anything, just changed who is doing the checking.
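The inner loop could just as well have been written as follows, in which case the sub-editor sees a deletion and an insertion and cannot tell that the character is unchanged: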

    c = Get_Char();
    Delete(1);
    Insert_Char(tolower(c));
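The "56 notifications, two real changes" arithmetic can be checked directly. `count_real_changes` below is a hypothetical helper, not part of the sub-editor interface. It counts how many characters a lower-casing pass actually changes. It also reports how many characters it visited, which is how many "character changed" messages a naive sub-editor-driven design would send.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return how many characters of 'text' lower-casing really changes;
   *notifications receives the number of characters visited. */
size_t count_real_changes(const char *text, size_t *notifications)
{
    size_t changed = 0, visited = 0;

    assert(text != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        int c = (unsigned char)*p;   /* tolower needs a non-negative value */
        visited++;
        if (tolower(c) != c)
            changed++;
    }
    if (notifications != NULL)
        *notifications = visited;
    return changed;
}
```

For the Unicorn sentence, only the "M" and the "U" change, out of 56 characters visited.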

The most telling reason for not using the sub-editor-driven method, however, is more fundamental. The sub-editor's responsibility is to handle the buffer, not redisplay. It is the redisplay's responsibility to handle the redisplay function.

The algorithms presented in the remainder of this chapter illustrate the basic techniques. They do not handle all possible error cases, nor do they handle many of the options listed above, such as variable-width characters, line wrap, between-line breakout, and others.

7.4.1 The Framer

The framer is that part of the redisplay code that decides what part of the buffer will appear in the window. The redisplay code maintains two marks, one at the top of the window and the other at the bottom. The algorithm is fairly simple. Here it is:

    int num_lines_window;   /* the number of lines in the window */
    int point_pct;          /* the preferred percentage (0 to 99) */

    void Framer(void)
    {
        mark_name saved;
        location new_start_loc;
        int cnt;

        /* remember where we started */
        if (Mark_Create(&saved) != OK) {
            Fatal("can't create mark for redisplay");
        }

        Find_First_In_Backward(NEWLINE);

        /* count at most one window's worth of lines */
        for (cnt = 0; cnt < num_lines_window; cnt++) {
            /* stop at the start of the window */
            if (Is_Point_At_Mark(top_of_window)) break;

            /* stop at the start of the buffer; the old window top was
               not seen, so the window must start here */
            if (Compare_Locations(Buffer_Start, Point_Get()) >= 0) {
                Mark_Set(top_of_window, Point_Get());
                break;
            }

            /* record where a fresh screen would start,
               just in case we need it */
            if (cnt == point_pct * num_lines_window / 100)
                new_start_loc = Point_Get();

            Point_Move(-1);
            Find_First_In_Backward(NEWLINE);
        }

        /* has the window moved? */
        if (cnt >= num_lines_window)
            Mark_Set(top_of_window, new_start_loc);

        Point_To_Mark(saved);
        Mark_Delete(saved);
    }

In essence, the algorithm followed by this routine is: "so long as the point would still wind up in the window, leave the start of window unchanged. If the point would not wind up in the window, place it at the preferred percentage."

This version of the algorithm assumes that a buffer line will always occupy exactly one window line and that all buffer lines are the same height.
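The same decision is easier to see when lines can be addressed by number. The sketch below assumes a hypothetical line-indexed buffer. The sub-editor interface does not provide one, which is why Framer walks backward with marks. Like Framer, it assumes one window line per buffer line. `Frame_Top` is a hypothetical name.

```c
#include <assert.h>

/* Return the buffer line that should be at the top of the window.
   Lines are numbered from 0. point_line holds the point, top_line is
   the current window top, nlines is the window height, and point_pct is
   where the point should land (percent of the window height) if the
   window has to move. */
int Frame_Top(int point_line, int top_line, int nlines, int point_pct)
{
    assert(nlines > 0 && point_pct >= 0 && point_pct < 100);

    /* point still in the window: leave the start of window unchanged */
    if (point_line >= top_line && point_line < top_line + nlines)
        return top_line;

    /* otherwise place the point at the preferred percentage,
       but never start the window before the start of the buffer */
    int new_top = point_line - point_pct * nlines / 100;
    return new_top < 0 ? 0 : new_top;
}
```

For a 20-line window and a preferred percentage of 50, moving the point from inside the window to line 30 puts line 20 at the top, so the point lands on the tenth window row.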

7.4.2 The Basic Algorithm

The basic redisplay algorithm is as follows:

    #include <string.h>     /* memset */

    int num_lines_window;   /* the number of lines in the window */
    int num_chars_window;   /* the number of characters in the
                               window (its width) */
    char window[MAX_ROWS][MAX_COLS];    /* window contents */

    void Redisplay(void)
    {
        mark_name saved;
        int row;
        int col;
        int i;
        int point_row = 0;
        int point_col = 0;
        char c;

        /* remember where we started */
        if (Mark_Create(&saved) != OK) {
            Fatal("can't create mark for redisplay");
        }

        Framer();
        Point_To_Mark(top_of_window);

        /* loop over the whole window */
        for (row = 0; row < num_lines_window; row++) {
            for (col = 0; col < num_chars_window; col++) {
                /* save the coordinates of the point so that we
                   can put the cursor there later */
                if (Is_Point_At_Mark(saved)) {
                    point_row = row;
                    point_col = col;
                }
                c = Get_Char();
                if (c == NL) {      /* at a newline? */
                    /* check whether the rest of the window line is
                       blank; if it is not, clear it */
                    for (i = col; i < num_chars_window; i++) {
                        if (window[row][i] != SP) {
                            Set_Cursor(row, i);
                            CLEOL();
                            memset(&window[row][i], SP,
                                   num_chars_window - i);
                            break;
                        }
                    }
                    Point_Move(1);  /* step over the newline */
                    break;          /* on to the next window line */
                }
                /* no newline, so has there been a change in the
                   sub-editor? */
                if (window[row][col] != c) {
                    Set_Cursor(row, col);
                    Put_Char(c);
                    window[row][col] = c;
                }
                Point_Move(1);
            }
        }

        /* clean up */
        Mark_To_Point(bottom_of_window);
        Set_Cursor(point_row, point_col);
        Point_To_Mark(saved);
        Mark_Delete(saved);
    }

The preceding code shows your basic, garden-variety redisplay algorithm. It will work on any screen that supports cursor positioning (the CLEOL call can be simulated by sending Space characters). It will work quite well on communications channels running at 4800 bps or over. Its only memory requirement is an array large enough to hold the window (typically 1,920 characters for a 24-line by 80-column screen). There are no special redisplay "hooks" in the sub-editor management code.

This algorithm is sufficient (and nearly optimal) in those cases where CPU and memory are plentiful and the screen does not perform insert/delete line or character operations. If memory is tight, the algorithm can be modified to only retain a complete copy of the current line. If you must be prepared to emulate the CLEOL operation, it may be worthwhile to record the last non-blank column in each screen line. Doing so minimizes the number of Space characters that must be sent.
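Here is a sketch of the Space-sending emulation. It assumes a hypothetical per-row `last_nonblank` array kept alongside the window copy. The Put_Char path of Redisplay would have to raise `last_nonblank[row]` whenever it writes a non-blank character beyond it. Only columns that can actually be showing something are overwritten.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ROWS 24
#define MAX_COLS 80

/* The window copy, plus (hypothetical) one past the last non-blank
   column of each row; 0 means the row is entirely blank. */
static char window[MAX_ROWS][MAX_COLS];
static int last_nonblank[MAX_ROWS];

/* Emulate CLEOL on a screen that lacks it. The cursor is assumed to be
   at (row, col) already. Returns the number of Spaces sent. */
int Emulated_CLEOL(FILE *out, int row, int col)
{
    int sent = 0;

    assert(row >= 0 && row < MAX_ROWS && col >= 0 && col <= MAX_COLS);
    for (int i = col; i < last_nonblank[row]; i++) {
        putc(' ', out);
        window[row][i] = ' ';
        sent++;
    }
    if (last_nonblank[row] > col)
        last_nonblank[row] = col;  /* upper bound: earlier columns may be blank */
    return sent;
}
```

Clearing from column 4 of a row whose text ends at column 10 costs 6 Spaces rather than the 76 a blind emulation would send on an 80-column screen.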

7.4.3 Sub-Editor Interaction

The basic algorithm can be sped up tremendously if some redisplay-specific hooks are placed into the sub-editor. There are a number of different ways that the hooks can be introduced. All of these methods track the changes made to the buffer in one way or another.

The first way is to keep a separate modification flag that tells whether any changes were made to the buffer since the last redisplay. If no changes were made, then redisplay will consist of either a simple cursor motion or a complete screen regeneration.

The second way, and much more useful, is to keep the modification flag on a per-window-line basis. A general interface to accomplish this that works with all sub-editor implementation schemes is to define a third type of mark, called a window mark. This mark has a flag associated with it. There is one window mark for each line in the window. Just after a redisplay has been completed, all the flags for all window marks are clear. Each time the sub-editor changes any of the buffers' contents, it sets the flag on the window mark that is located before and closest to the change. The redisplay code can examine the flags. Only window lines that have their corresponding window marks set need to be examined closely during redisplay.
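Here is a sketch of both sides of the window-mark interface. For illustration only, it assumes mark positions are plain character offsets kept sorted, with one mark per window line. `Note_Change` and `Clear_Changes` are hypothetical names. `Note_Change` flags the mark located before and closest to the change. It treats a change above the first window line as affecting no displayed line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A window mark: a buffer position plus the "changed since the last
   redisplay" flag. */
struct window_mark {
    size_t pos;
    bool changed;
};

/* Sub-editor side: called for every change at offset 'where'. */
void Note_Change(struct window_mark marks[], size_t nmarks, size_t where)
{
    size_t i;

    assert(marks != NULL || nmarks == 0);
    for (i = nmarks; i > 0; i--) {          /* scan from the bottom up */
        if (marks[i - 1].pos <= where) {
            marks[i - 1].changed = true;
            return;
        }
    }
    /* the change lies above the first window line: nothing flagged */
}

/* Redisplay side: called once the window has been brought up to date. */
void Clear_Changes(struct window_mark marks[], size_t nmarks)
{
    for (size_t i = 0; i < nmarks; i++)
        marks[i].changed = false;
}
```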

Note that window marks need not be located at the start of a buffer line. If lines are being wrapped, one will be at each wrap point. If horizontal scrolling is being performed, one may be at the start of the buffer line and another at the right edge of the window. In this way, changes made to the right of the window won't cause the redisplay code to examine unchanged text.

This interaction is easy to define and implement in the sub-editor. It is inexpensive to implement, as the marks have to be examined for updating anyway. It is also highly effective at reducing CPU overhead, as most commands change only a single line. And, although redisplay has to examine the flags for every line, most of the time only one or two will show changes.

A third way is to associate a unique identifier with each window mark instead of a flag. This identifier would be changed by the sub-editor whenever the associated text changes (i.e., instead of setting a flag, it changes the unique identifier). Typically, the identifier will be a 32-bit integer. Whenever an identifier is required, the current value of the integer is used and the integer is incremented.

The only problem that can arise with using unique identifiers is if a unique identifier is not in fact unique. This problem can arise if all 2^32 unique identifiers are consumed before all lines in the window have changed.
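Here is a sketch of the counter, with one added assumption: the value 0 is reserved to mean "no identifier yet," so the counter skips it when it wraps around. `New_Line_Id` is a hypothetical name. The wrap itself is exactly the failure case just described.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention: 0 means "no identifier assigned yet". */
static uint32_t next_id = 1;

/* Hand out the current value of the counter and increment it. */
uint32_t New_Line_Id(void)
{
    uint32_t id = next_id++;     /* unsigned arithmetic wraps to 0 */
    if (next_id == 0)
        next_id = 1;             /* skip the reserved value */
    assert(id != 0);
    return id;
}
```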

Some sub-editors that use the linked-line scheme use the addresses of line structures as the unique identifiers. While doing so is space efficient, the sub-editor must ensure that if a line is freed, the address is not re-used until all windows have been completely redisplayed.

Finally, there is one more flag that can help redisplay a great deal. This flag is only useful if the point is located at the end of a buffer line. The flag would say whether any buffer modification other than "insert one or more characters" has been performed. If it has not, all that redisplay needs to do is to output those characters. As this situation is very common, it can save a significant amount of computation.

7.4.4 The Advanced Algorithm

The advanced redisplay algorithm has two improvements over the basic algorithm. First, it provides a way of efficiently taking advantage of the insert/delete line and character functions that are supplied with many screens. Second, it provides a low-CPU-overhead way of performing a redisplay on basic displays.

The basic idea used by this algorithm is to assign a unique identifier to each window line. (See the preceding section.) When the redisplay encounters a modified line (the unique identifiers don't match), it performs a pattern match on the unique identifiers for the remainder of the window. It then uses the information derived from that match to determine the best sequence of insert/delete line commands to issue to the screen.

In more detail, this algorithm loops over the window lines, checking each saved unique identifier against the current identifier returned by the sub-editor. If they match, no work needs to be done and the algorithm proceeds to the next line. If they don't match, it can be for one of three reasons.

The first reason could be that an additional line or lines were inserted between the two window lines. This condition is detected by comparing the window-line unique identifier against the rest of the unique identifiers returned by the sub-editor and finding a match. (Remember that the window-line unique identifiers are the unique identifiers returned by the sub-editor one redisplay iteration ago.) The insertion case is where we once had lines:

    A
    B

and now have:

    A
    C
    B

We determine how many lines are in "C" (because we know how far down we had to go to find a match) and tell the screen to insert that many lines. (If there is information after this window on the screen, you will first have to delete that many lines from the end of the window.)

The second reason could be that a line (or lines) was deleted. This condition is detected by comparing the unique identifier returned by the sub-editor for the next line against the unique identifiers of the rest of the window lines and finding a match. The deletion case is where we once had lines:

    A
    B
    C

and now have:

    A
    C

We determine how many lines are in "B" (because we know how far down we had to go to find a match) and tell the screen to delete that many lines. (If there is information after this window on the screen, you will eventually have to insert that many lines at the end of the window.)

The third reason could be that the line was changed. This condition is detected by comparing the unique identifiers of the following window lines against the unique identifiers returned by the sub-editor. This case is either where we once had lines:

    A
    B
    C

and now have:

    A
    D
    C

or:

    A
    D
    E

In other words, neither the insertion condition nor the deletion condition was met. Knowing now that a line has been changed, the next step is to determine exactly how the line has changed.
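The three-way test can be sketched as follows. The sketch assumes, as a hypothetical layout, that the saved window-line identifiers and the sub-editor's current identifiers are available as two arrays of the same length. `Classify_Line` applies the insertion test first, then the deletion test. It reports a change if neither finds a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum line_change { LINE_SAME, LINE_INSERTED, LINE_DELETED, LINE_CHANGED };

/* old_ids: identifiers saved for each window line at the last redisplay.
   new_ids: identifiers the sub-editor reports now (n entries each).
   Classify window line i. For insertions and deletions, *count receives
   the number of lines to insert or delete on the screen. */
enum line_change Classify_Line(const uint32_t old_ids[],
                               const uint32_t new_ids[],
                               size_t n, size_t i, size_t *count)
{
    size_t k;

    assert(i < n && count != NULL);
    *count = 0;
    if (old_ids[i] == new_ids[i])
        return LINE_SAME;
    /* insertion: the old line i turns up further down in the new lines */
    for (k = i + 1; k < n; k++)
        if (new_ids[k] == old_ids[i]) {
            *count = k - i;
            return LINE_INSERTED;
        }
    /* deletion: the new line i turns up further down in the old lines */
    for (k = i + 1; k < n; k++)
        if (old_ids[k] == new_ids[i]) {
            *count = k - i;
            return LINE_DELETED;
        }
    *count = 1;
    return LINE_CHANGED;        /* neither condition was met */
}
```

Writing A, B, C, D as the identifiers 1, 2, 3, 4, the A B / A C B case reports one inserted line at window line 1. A B C / A C reports one deleted line, and A B C / A D C reports a changed line.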

The algorithm starts by comparing the buffer line against the window line and determining how many leading characters are in common. (If the whole line is common, no changes need to be made to the screen and the algorithm stops.) For example, if the window line is:

abcdef

and the buffer line is:

abcxef

the two have three characters in common from the start.

The next step is to repeat the comparison, but to work backward starting from the end. The example strings have two characters in common from the end.

The third step is to compare the line lengths. If the two lines are the same length, only the changed part in the middle needs to be updated on the screen. In the example strings, the lengths are the same (six). This optimization can be done even on a basic display.

If the two lines are not the same length (for example, the buffer line is "abcxyzef"), the characters in the window line that are replaced by characters in the buffer line can be rewritten (in the example, the "x" replaces the "d"). Then the requisite number of characters can either be inserted or deleted and the remainder of the changes written (insert two characters, "yz"). If there is no common text at the end of the line and the buffer line is shorter than the window line, a CLEOL call can be used instead of deleting characters.
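The three steps can be sketched as follows. `Compare_Lines` and the `line_edit` structure are hypothetical names. The common suffix is never allowed to overlap the common prefix, so that lines such as "aa" and "aaa" do not have their shared characters counted twice.

```c
#include <assert.h>
#include <string.h>

/* Result of comparing the window line with the buffer line. */
struct line_edit {
    size_t prefix;      /* leading characters in common */
    size_t suffix;      /* trailing characters in common */
    size_t overwrite;   /* characters to rewrite in place */
    long   adjust;      /* > 0: characters to insert; < 0: to delete */
};

struct line_edit Compare_Lines(const char *win, const char *buf)
{
    struct line_edit e = {0, 0, 0, 0};

    assert(win != NULL && buf != NULL);
    size_t wl = strlen(win), bl = strlen(buf);
    size_t shorter = wl < bl ? wl : bl;

    /* step one: characters in common from the start */
    while (e.prefix < shorter && win[e.prefix] == buf[e.prefix])
        e.prefix++;
    /* step two: characters in common from the end, without
       overlapping the common prefix */
    while (e.suffix < shorter - e.prefix &&
           win[wl - 1 - e.suffix] == buf[bl - 1 - e.suffix])
        e.suffix++;
    /* step three: compare the lengths of the differing middles */
    size_t wmid = wl - e.prefix - e.suffix;
    size_t bmid = bl - e.prefix - e.suffix;
    e.overwrite = wmid < bmid ? wmid : bmid;
    e.adjust = (long)bmid - (long)wmid;
    return e;
}
```

For "abcdef" against "abcxyzef", this reports a prefix of 3 and a suffix of 2. One character is overwritten (the "x" replaces the "d"), and 2 characters ("yz") are inserted.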

Line wrap can pose a problem. The window and buffer lines may have no end text in common, and yet an insert or delete character operation might be the appropriate one. For example, consider the case where the window width is six characters, the window line is "abcdef", and the buffer line is "abcxdef". Here, the buffer line will ultimately become two window lines, "abcxde" and "f". This case is detected by having no common portion at the end and noticing that the line wraps. A more complicated matching process can detect the situation, and appropriate action can be taken.

This entire section considered only the (admittedly very common) case where line and character insertions and deletions were only made in one place. It is very reasonable and appropriate to use more general pattern-matching techniques to properly optimize multiple insertions and deletions (Miller 1987).

7.4.5 Redisplay for Memory-Mapped Displays

Redisplay for memory-mapped displays boils down to one of three cases. Each case is relatively simple.

First is the case where both reading from and writing to the screen cause flicker. The solution is to use the basic redisplay algorithm.

Second is the case where reading does not cause flicker but writing does. The solution is to use the basic redisplay scheme, but change it to use the actual window memory for storing the window array.

In the third case, neither reading nor writing causes flicker. On each redisplay cycle, merely copy the buffer text into window memory, not forgetting to process newlines, etc., as needed.
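Here is a sketch of the third case. It assumes the memory-mapped display can be treated as a plain two-dimensional character array; the tiny `ROWS` and `COLS` values are for illustration only. `Copy_To_Screen` is a hypothetical name. Newlines end a row, and the rest of the row is filled with spaces. Over-long lines simply continue on the next row. Tabs and control characters would need the processing described earlier in this chapter.

```c
#include <assert.h>
#include <string.h>

#define ROWS 4      /* hypothetical tiny screen */
#define COLS 8

/* 'screen' stands in for the memory-mapped display. */
void Copy_To_Screen(char screen[ROWS][COLS], const char *text, size_t len)
{
    size_t i = 0;

    assert(text != NULL || len == 0);
    for (int row = 0; row < ROWS; row++) {
        int col = 0;
        /* copy up to the end of the buffer line or of the row */
        while (i < len && text[i] != '\n' && col < COLS)
            screen[row][col++] = text[i++];
        if (i < len && text[i] == '\n')
            i++;                                    /* consume the newline */
        if (col < COLS)                             /* blank the rest */
            memset(&screen[row][col], ' ', (size_t)(COLS - col));
    }
}
```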

Questions to Probe Your Understanding

Define a set of editor procedures to handle vertical windows. (Easy) Extend that set to handle overlapping windows. (Medium)

Implement the procedures that you just defined. (Hard)

Define representations for the private data types mentioned here (function keys, attributes, command times). (One is Easy; all together are Medium)

Identify the places where left-to-right, top-to-bottom biases are built into the interface definitions. English and European languages have this bias. (Easy)

Rework the interface definitions to remove this bias and to be able to handle all eight (yes, eight) combinations of directions. (Medium)

Outlining is popular these days. "Outlining" is the ability to selectively skip over parts of the text during redisplay. For example, one level might only display the chapter and section titles. Another level might include all titles and the first sentence of each paragraph. Identify how adding outlining would affect redisplay. (Medium)

Identify how the editor's redisplay algorithm changes when it makes use of the UNIX curses library. (Hard)

How does the presence of ligatures and contextual forms used by non-Roman languages affect cursor motion? (Medium)

What modifications to the redisplay algorithm are required to handle ligatures or contextual forms used by non-Roman languages? (Hard)


Eight: User-Oriented Commands: The Command Loop

He left it dead, and with its head
He went galumphing back.

The previous two chapters described a way of dividing an implementation into parts and covered the internal sub-editor and redisplay. This chapter describes the last part: the user-oriented commands. This last part is what gives the editor its "feel." It determines the overall command structure (the syntax) and what each of the commands does (the semantics).

Command structure is a large enough topic to be divided into two chapters. This chapter describes how to implement the command structure. The next chapter covers command set design issues. It thus describes what commands should be implemented.

8.1 The Core Loop: Read, Evaluate, Print

The command loop is built around a basic core. This core reads in commands, evaluates (or executes) them, and prints the results.

Reading commands is the process of accepting user input and determining what operations the user wishes to perform.

Evaluating commands is the process of carrying out the user's wishes. In general, this is done by executing a series of sub-editor calls.

Printing the results is the redisplay.

The core loop looks like this:

    char c;

    while (1) {
        c = Key_Get();
        if (Evaluate(c)) break;
        Redisplay();
    }

This loop accepts user input (a single character), evaluates it, exits (if the user has requested to quit the editor, causing Evaluate to return True), and invokes Redisplay. This, like all program examples in this chapter, is a simplified version of just one of the many ways you can implement these functions. They are meant as examples, not as limits.

8.1.1 The Evaluate Procedure

    FLAG Evaluate(char c)
    {
        FLAG is_exit = FALSE;
        FLAG is_arg = FALSE;
        int arg = 1;

        while (!(*commands[c])(&is_arg, &arg, &is_exit, c)) {
            c = Key_Get();
            Redisplay();
        }
        return(is_exit);
    }

This is the core of the Evaluate routine. The is_exit flag records whether the command is one to exit the editor. The is_arg flag records whether the user has specified a repeat-count argument. The arg variable records the repeat-count argument.

This routine, and the editor implementation as a whole, is built around a set of command dispatch tables. Each table is an array of pointers to procedures, indexed by command characters. Thus, the element:

commands['a']

would specify the procedure to handle the command designated by the "a" character. These procedures all have the same interface. This interface is:

    FLAG Command_Procedure(FLAG *is_argptr, int *argptr,
                           FLAG *is_exitptr, char c);

The first three arguments are pointers to the three state variables. They are pointers instead of the values so that the command procedures can alter their values. The fourth argument is the character that is used to invoke the procedure. The procedure returns True if the command has completed, or False if the command is incomplete.

The reasons for selecting this interface will be made clear through four sample command procedures: "move by character," "insert character," "second-level dispatch," and "accept an argument."

8.1.2 Move by a Character

This procedure moves forward by arg characters if arg is positive, or backward by -arg characters if arg is negative. It looks like this:

    FLAG Move_by_Character(FLAG *is_argptr, int *argptr,
                           FLAG *is_exitptr, char c)
    {
        Point_Move(*argptr);
        return(TRUE);
    }

8.1.3 Insert a Character

This procedure inserts arg copies of the character used to invoke it. If arg is negative, its absolute value is used. It looks like this:

    FLAG Insert_A_Character(FLAG *is_argptr, int *argptr,
                            FLAG *is_exitptr, char c)
    {
        int arg = *argptr;

        if (arg < 0) arg = -arg;
        while (arg-- > 0) Insert_Char(c);
        return(TRUE);
    }

8.1.4 Second-Level Dispatch

This procedure doesn't implement a command itself. Rather, it accepts a second character and uses that to select a command from a second dispatch table. In this case, the "^X" character will be used as the dispatch.

    FLAG Ctrl_X_Dispatch(FLAG *is_argptr, int *argptr,
                         FLAG *is_exitptr, char c)
    {
        /* read the second character of the command into c */
        c = Delayed_Display(CTRL_X_DELAY, "^X ");
        return((*ctrl_x_commands[c])(is_argptr, argptr, is_exitptr, c));
    }

The Delayed_Display routine waits for a character and returns it. If more than a specified amount of time passes with no input, the prompt string is displayed.

Note that this routine passes the arguments and exit status to and from the second-level command routine.

8.1.5 Accept an Argument

Again, this procedure doesn't implement a command itself. Rather, it performs one step of accepting a numeric argument. For the purposes of this example, we will assume that all digit characters specify an argument and do not insert themselves.

    FLAG Argument(FLAG *is_argptr, int *argptr,
                  FLAG *is_exitptr, char c)
    {
        if (!*is_argptr) {      /* no arg yet */
            *is_argptr = TRUE;
            *argptr = 0;
        }
        *argptr = *argptr * 10 + c - '0';
        return(FALSE);
    }

This routine is the first one that does not completely execute the command. Rather, it modifies the state information that is passed to the command procedure itself.

(Note: this routine does not implement the Emacs "universal argument" command, but is a simplified version for the purposes of this example only. It actually performs vi-style argument handling.)
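To see the pieces working together, here is a self-contained toy version of the core loop, Evaluate, and the command procedures above. Everything outside those names is hypothetical. A small in-memory buffer stands in for the sub-editor, and keystrokes come from a string. ^B (backward character) and ^Q (exit) are example bindings, and redisplay is omitted. As in the text, digits accumulate an argument and do not insert themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef bool FLAG;
#define CTRL(ch) ((ch) & 0x1F)           /* CTRL('F') is ^F */

/* Toy sub-editor: a fixed-size buffer and a point (an assumption). */
static char buffer[256];
static int buf_len, point;

static void Insert_Char(char c)
{
    assert(buf_len < (int)sizeof buffer - 1);
    memmove(&buffer[point + 1], &buffer[point], (size_t)(buf_len - point));
    buffer[point++] = c;
    buffer[++buf_len] = '\0';
}

static void Point_Move(int n)
{
    point += n;
    if (point < 0) point = 0;             /* clamp at the buffer ends */
    if (point > buf_len) point = buf_len;
}

/* Keys come from a string; at its end, pretend the user typed ^Q. */
static const char *keys;
static char Key_Get(void) { return *keys != '\0' ? *keys++ : CTRL('Q'); }

typedef FLAG (*command)(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c);

static FLAG Insert_A_Character(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
{
    int arg = *argptr < 0 ? -*argptr : *argptr;
    (void)is_argptr; (void)is_exitptr;
    while (arg-- > 0) Insert_Char(c);
    return true;
}

static FLAG Move_by_Character(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
{
    (void)is_argptr; (void)is_exitptr; (void)c;
    Point_Move(*argptr);
    return true;
}

static FLAG Move_Backward(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
{
    (void)is_argptr; (void)is_exitptr; (void)c;
    Point_Move(-*argptr);
    return true;
}

static FLAG Argument(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
{
    (void)is_exitptr;
    if (!*is_argptr) { *is_argptr = true; *argptr = 0; }   /* no arg yet */
    *argptr = *argptr * 10 + (c - '0');
    return false;                 /* incomplete: wait for the real command */
}

static FLAG Exit_Editor(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
{
    (void)is_argptr; (void)argptr; (void)c;
    *is_exitptr = true;
    return true;
}

static command commands[128];

static FLAG Evaluate(char c)
{
    FLAG is_exit = false, is_arg = false;
    int arg = 1;
    while (!commands[c & 0x7F](&is_arg, &arg, &is_exit, c))
        c = Key_Get();            /* Redisplay() would also go here */
    return is_exit;
}

/* Bind the keys, run the core loop over 'input', return the buffer. */
const char *Run(const char *input)
{
    int i;
    for (i = 0; i < 128; i++) commands[i] = Insert_A_Character;  /* self insert */
    for (i = '0'; i <= '9'; i++) commands[i] = Argument;
    commands[CTRL('F')] = Move_by_Character;
    commands[CTRL('B')] = Move_Backward;
    commands[CTRL('Q')] = Exit_Editor;

    buf_len = point = 0;
    buffer[0] = '\0';
    keys = input;
    while (!Evaluate(Key_Get()))
        ;                         /* Redisplay() would go here */
    return buffer;
}
```

For example, `Run("3x")` yields "xxx". The '3' is handled by Argument, which returns False, so Evaluate reads the 'x' and passes it the accumulated argument of 3.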

8.1.6 Philosophy

The loop as described puts few (theoretical) restrictions on the command syntax. Each character, in its raw form, is mapped to a procedure, which is in turn evaluated. State information is passed to and from this procedure, which can either update the state information, perform an operation, or both. Arbitrary syntax and semantics can be implemented with this base.

In theory, a syntax of commands being words (e.g., "delete," "move," etc.) could be implemented in this structure in one of two ways. One is to have a large number of dispatch tables (and thus implement a symbol-state table architecture). The other is to have a procedure that parses the syntax of the command via conditional statements. If you really want to do one of these, you will want to invent your own, different, internal structure.

8.1.7 A Minimalist Command Set Design

Consider the thought that every character that is typed at the keyboard causes a procedure to be executed. The first conclusion that results is that it is silly to type "insert x" or anything like that when you want "x" to be inserted. As this is a very common operation, it makes more sense to bind the key "x" to an "InsertX" function (or, more probably, the Insert_A_Character procedure just defined).

This architecture binds all of the straight, printing ASCII characters to commands that insert the character. The remaining things that can be entered from most keyboards are the control characters, the delete key, and the break key. These could be bound to functions that implement a complex syntax, but why bother? It is not too difficult for users to learn even a large number of key bindings, so let us bind the control keys directly to useful functions. For example, ^F could be "move forward a character," ^D could be "delete the following character," and so forth. Note that the "break" key does not have an ASCII value and is therefore difficult to use without writing operating system-specific code.

Thirty-three functions (the 32 control characters plus the Delete character) are not enough for even the commonly used functions. Thus, some of the keys should be bound to functions that temporarily rebind the dispatch table. For each of these rebinding functions, 128 new functions are made available (there is no reason for the printing characters in those second-level tables to be bound to "self insert").

Thus, even though we began with a structure for the command loop that did not impose any constraints on the syntax of commands (and thus was as general as possible), we arrived at a specific syntax for commands. This syntax is to bind the printing characters to "self insert," bind the control characters to a mixture of useful functions and second-level dispatch tables, and to have three or four alternate dispatch tables (enough to supply many hundreds of commands). Thus, commands are rarely more than two keystrokes long. The price that is paid for this brevity is a possibly longer time learning to use the editor effectively.

Note that most of the increased time spent learning the editor is due not to the brevity of the commands, but to the fact that there are more commands to learn. Compare a "conventional" editor of some other command set design (e.g., insert/replace modes or command lines) with an equivalent subset of this "minimalist" editor. Learning times will probably be comparable when the same number of commands from each are covered (assuming sensible command assignments in both cases).

8.2 Errors

There are two main types of errors: internal and external. Internal errors are those that occur in the editor itself. Examples are a subscript being out of range and division by zero. External errors are those that are caused by the user. An example is an attempt to delete past the end of the buffer. There are also "non-error" errors, such as a normal exit condition. Errors can be detected both from within the editor and from outside the editor (for example, by the operating system).

8.2.1 Internal Errors

Internal errors will be considered first. These errors cause an immediate exit to the operating system with no questions asked and no delays tolerated. They will be internally generated by such things as arithmetic overflows and bad subscripts. (While the editor might catch and process some of these, it will not in general process them all. This section only discusses the non-processed ones.) These errors are unpredictable, and the state of the editor should remain intact.

The user should also be able to signal such an error to abort out of the editor. He or she might want to do this signaling because of a problem with the editor itself (e.g., an infinite loop) or because he or she wants to do something else (e.g., suspend this process and do another task). This signaling is usually done with the help of the operating system. In any case, the precise state of the editor should be retained so that it can be resumed exactly where it left off. Most operating systems have some facility for doing this; they differ principally in the freedom of action that they allow before losing the state. This freedom ranges from nothing to doing arbitrarily many other things.

At the user's discretion, the editor should be restartable either from exactly where it left off or at a safe restart point. This point is ordinarily a portion of the editor that recovers the buffers and other current state information and then resumes the command loop. Note that in many implementations, the editor must perform actions both when the process is suspended and when it resumes. These actions must handle saving and restoring the state, restoring and saving the display modes, and taking note of any changes in the environment, such as a window resizing.

8.2.2 External Errors

External errors are principally user errors. The action ordinarily taken is the display of an error message and a return to command level. The implementation of this level of recovery is built into the procedures that implement the commands.

There is a variation of external errors that are generated manually by the user. Typically, these involve backing out of an undesired state (e.g., an unwanted invocation of a dispatch table rebinding, or an argument that the user wants to abort). The ^G character has often been used for this purpose. In this case, the procedures will know that this character has been typed and will implement the back-out protocol.

8.2.3 Exiting

Finally, provisions to exit the editor must be made. These provisions often take the form of a flag variable such as the is_exit variable described earlier.

Note that various other uses might be multiplexed onto this flag, signifying varying levels of "exiting." For example, one level could be used by buffer switching in order to rebind the dispatch tables (see the section on modes later in this chapter). Alternatively, the different functions could use multiple flag variables.

Ordinary exiting involves several types of processing. The editor might ask the user what to do with buffers that have been modified but not written out. If, as is ordinarily assumed, the state of the editor is preserved across invocations, the state must be saved. If not, the editor must make sure that all memory is deallocated. Finally, the user's environment should be restored as it was found. This implies such varied things as cleaning up the stack, closing files, deallocating unneeded storage, and resetting terminal parameters.

8.3 Arguments

Arguments are specified by the user to modify the behavior of a function. The Emacs argument mechanism will be described as an example of three diverse ways in which arguments are obtained.

There are three standard argument types. First are numeric (prefix) arguments. These are invoked by a string of functions (which are in turn invoked by characters typed before the "actual" command character) and are an example of using the key/function binding to implement a more complicated syntax. Next are string (suffix) arguments. When obtaining a string argument, the editor is invoked recursively on an argument buffer, and upon return from the recursive invocation the contents of that buffer are given to the requesting procedure. Last are positional arguments. These are the internal variables of the editor.

8.3.1 Numeric (Prefix) Arguments

Prefix arguments are entered before the command whose behavior they modify; thus, their syntax does not depend upon the command. The interpretation of prefix arguments can vary from command to command. Emacs-type editors limit these arguments to numeric values.

Ordinarily, commands will have an internal variable available to them named something like arg, and it will have a value of one. Prefix arguments allow the user to change that value to any other positive or negative integer. It is useful to provide a mechanism for command procedures to determine whether an argument has been given at all. This mechanism allows the procedures to handle the default case, where no argument is supplied, differently from the case where an argument is supplied.

Each command uses arguments for different, but related, purposes.

The first purpose is to specify a repeat count for a command. Thus, specifying an argument of "12" to the "forward character" command would cause the command to move forward 12 characters.

The second purpose is to tell a command to use a specific value. For example, it doesn't make sense to say "move to the end of the buffer" 12 times. Instead, that command might interpret its argument as a line number and move to the specified line of the buffer. In this case, the "default" value would be the (end of the) last line.
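The two purposes can be sketched in C as follows. This is a minimal illustration, not the book's code: the Buffer structure (the whole text in one string, plus the point) and the simplified command signatures are assumptions made for brevity, and End_Of_Buffer is a hypothetical name for the "move to the end of the buffer" command. Each command receives is_arg alongside arg so that it can tell "no argument" from an explicit argument of one.

```c
#include <string.h>

typedef int FLAG;           /* assumed: FLAG is a boolean type */

/* Hypothetical buffer: the whole text in one string plus the point. */
typedef struct {
    const char *text;
    int point;              /* offset of the point, 0..strlen(text) */
} Buffer;

/* Repeat count: move forward arg characters (backward if arg is
   negative), stopping at either end of the buffer. */
void Forward_Char(Buffer *b, FLAG is_arg, int arg)
{
    int len = (int)strlen(b->text);

    if (!is_arg)
        arg = 1;            /* default when no argument was given */
    b->point += arg;
    if (b->point < 0)
        b->point = 0;
    if (b->point > len)
        b->point = len;
}

/* Specific value: with no argument, go to the end of the buffer;
   with an argument, go to the start of line number arg (1-based). */
void End_Of_Buffer(Buffer *b, FLAG is_arg, int arg)
{
    int len = (int)strlen(b->text);
    int line = 1;
    int i;

    if (!is_arg) {
        b->point = len;
        return;
    }
    for (i = 0; i < len && line < arg; i++)
        if (b->text[i] == '\n')
            line++;
    b->point = i;
}
```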

An Emacs-type text editor uses the ^U character as the "universal argument" function. It can be used in either of two ways. ^U command means to supply an argument of "4" to command. Adding another ^U means to multiply the current argument by four. Thus, ^U ^U ^U command means to supply an argument of 64 to the command. The factor of 4 was selected because 5 is too large (1, 5, 25, 125 goes up too fast) and, while 3 might have better spacing (1, 3, 9, 27, 81, 243), the powers of 4 are known by all people who are likely to be around computers. In addition, on a 24 x 80 display, 64 is about the number of characters per line and 16 is 2/3 of the screen height.

The other use is to specify a value exactly. ^U number command means to supply an argument of number to the command. For example, ^U - 1 4 7 command means to supply an argument of -147 to the command. The ^U in this case serves as an "escape" to logically rebind the digit and "-" keys. If you want to supply an argument to the commands normally invoked by the digit and '-' characters, you use the quote command, located on ^Q.
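A sketch of how the ^U prefix might be accumulated one key at a time before the command key arrives. It assumes that keys arrive as ASCII codes (^U is 21) and uses a hypothetical ArgState structure. It does not cover every case: for example, a ^U typed after digits is treated here as the command key.

```c
typedef int FLAG;           /* assumed: FLAG is a boolean type */

#define CTRL_U 21           /* ASCII code for ^U */

/* Hypothetical state for a prefix argument being typed. */
typedef struct {
    FLAG is_arg;            /* has any argument been given? */
    int arg;                /* the value; 1 when no argument is given */
    FLAG in_arg;            /* after ^U, digits and '-' belong to the argument */
    FLAG digits;            /* has a digit been typed yet? */
    int sign;               /* +1, or -1 after a leading '-' */
} ArgState;

/* Call before reading each new command. */
void Arg_Reset(ArgState *s)
{
    s->is_arg = 0;
    s->arg = 1;
    s->in_arg = 0;
    s->digits = 0;
    s->sign = 1;
}

/* Offer one key to the argument parser.  Returns 1 if the key was
   consumed as part of the argument, 0 if it is the command key; in
   that case s->arg holds the final value. */
int Arg_Key(ArgState *s, int key)
{
    if (key == CTRL_U && !s->digits) {
        s->arg *= 4;                    /* ^U = 4, ^U ^U = 16, ... */
        s->is_arg = s->in_arg = 1;
        return 1;
    }
    if (s->in_arg && key == '-' && !s->digits && s->sign == 1) {
        s->sign = -1;
        return 1;
    }
    if (s->in_arg && key >= '0' && key <= '9') {
        /* the first digit replaces the power of 4 */
        s->arg = s->digits ? s->arg * 10 + (key - '0') : key - '0';
        s->digits = 1;
        return 1;
    }
    if (s->sign < 0 && !s->digits)      /* "^U -" alone means -1 */
        s->arg = 1;
    s->arg *= s->sign;
    return 0;
}
```

The command loop would offer each key to Arg_Key; the first key it rejects is dispatched as the command, which then reads arg and is_arg from the state. A digit typed without a preceding ^U is rejected, so it still reaches its normal (self-inserting) binding.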

On some terminals, there are two sets of numeric keys. One set is across the top row and always sends the ASCII code for the corresponding digit character. Another set may form a numeric pad, and its keys can be configured to send either the ASCII codes for the digit characters or different codes. In this case, these "other numbers" can be bound directly to functions that set up the implied arguments, and the initial ^U is not needed.

8.3.2 String (Suffix) Arguments

Numeric arguments are made available in the same way to all commands. Suffix arguments, however, must be explicitly requested by the commands that use them. A command may also request multiple suffix arguments. Most suffix arguments are for strings, not numbers.

The program notifies the user of the string argument by displaying a prompt. This prompt indicates the type of argument that is requested. The user responds by entering the value up through and including a terminating character. The command then proceeds to execute, using the value in whatever way is appropriate.

The following points should be taken into consideration regarding string arguments.

First, the prompt should clearly state what is being asked for: for example, "Name of the file to be read."

Second, the key or key sequence used to terminate the string should be able to vary and should be indicated in the prompt: for example, "Name of the file to be read (Return): " or "String to search for (ESC): ". There should also be a way to cleanly abort out of the prompt and its requesting command. This should be the same key that is used for the "abort" command (e.g., ^G).

Third, in order to facilitate the abort process, the command should first ask for all user input, and only then perform any actions. This organization means that any abort will result in no effect rather than leave inconsistent state information.

Each command that requests a string argument should provide a default value for the prompt. This value should be used if the user enters a null response. The value should be the program's best guess about what value the user would most likely want to enter. If no other guess is available, the last value entered should be used.

Here are some examples of string arguments:

Search string: Ask for a string and search for the next occurrence of it in the buffer. If the user enters a null string, use the same string that was last entered.
Write file: Ask for a string and, using it as a file name, write the contents of the buffer to the specified file. If the user enters a null string, use the current file name associated with the buffer.
Change buffer: Ask for a string and switch to the buffer whose name is the user's response. If the user enters a null string, use the buffer that was last the current one (i.e., the one that the user was in before the one that the user is in now). Note that this default may not be the one whose name was last entered.
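The default rules in these examples reduce to one small decision: use the response if it is non-empty, otherwise the command's best guess, otherwise the value entered last time. A minimal sketch, assuming a hypothetical per-prompt PromptHistory record. For "write file" the best guess would be the buffer's file name; "search string" passes no guess and so falls back to the last string entered.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-prompt memory of the last value entered. */
typedef struct {
    char last[256];
} PromptHistory;

/* Decide what value a command should use.  A non-empty response is
   used as typed and remembered.  An empty (null) response selects the
   command's best guess if it has one, else the last value entered.
   The result is copied into out, which holds outsize bytes. */
const char *Prompt_Value(PromptHistory *h, const char *response,
                         const char *best_guess,
                         char *out, size_t outsize)
{
    const char *chosen;

    if (response[0] != '\0') {
        chosen = response;
        snprintf(h->last, sizeof h->last, "%s", response);
    } else if (best_guess != NULL && best_guess[0] != '\0') {
        chosen = best_guess;
    } else {
        chosen = h->last;
    }
    snprintf(out, outsize, "%s", chosen);
    return out;
}
```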

Here are some example prompts:

Name of the file to write to (Return, default /home/fin/test):
String to search forward for (ESC):
Name of the buffer to switch to (^M, default chapter5):

While these examples were requesting a character string, this need not always be the case. For example, to enter numeric values, the requesting procedure merely has to convert the read-in character string to a numeric value. An example of such a command would be a "go to line" command.

One way to implement the routine that accepts string arguments is to use a variation of the Get_Line routine defined in the Introduction. However, a better way to implement this routine is to create an argument buffer in a new window, display a prompt, and call the editor recursively with that as the current buffer. By following this scheme, the full power of the editor is available to correct typing mistakes or otherwise make the entry process easier. It has the additional advantage of not creating a new "mode": the user is free to continue editing while responding to the prompt.

Further, the full power of the editor can be brought to bear on a problem. For example, suppose that someone sends you a mail message that says "the answer is in file X." While reading the mail message, you give the "find file" command. This command prompts you to enter a file name. You switch buffers (from the prompt buffer to the mail message buffer), copy the file name, switch back, paste it into the prompt, then type the prompt terminator. Voilà! A fully integrated, modeless environment.

Finally, the prompts need not be "lifeless" and "passive." A passive prompt just accumulates the input until complete, then passes it back as a block. It has no interaction. A "lively" and "active" prompt offers interaction with the user. For example:

Searching: The search can be incremental, with the search proceeding as the user types.
File names: The program can offer file name completion, where the user can enter a prefix, press a key, and the program fills in as much of the file name as possible. A different key might display a list of all file names that match what has already been typed.
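File name completion, as described in the second item, amounts to extending the typed prefix to the longest string shared by every matching name. A sketch, assuming that the candidate names have already been read from the directory into an array (reading the directory is omitted); Complete is a hypothetical name. The returned match count lets the caller decide whether to show the list of matches instead.

```c
#include <string.h>

/* Extend prefix to the longest string shared by every name that starts
   with it.  Writes the result to out (outsize bytes) and returns the
   number of matching names.  With no matches, out is just the prefix. */
int Complete(const char *prefix, const char *const names[], int n,
             char *out, size_t outsize)
{
    size_t plen = strlen(prefix), common = 0;
    const char *first = NULL;
    int i, matches = 0;

    for (i = 0; i < n; i++) {
        if (strncmp(names[i], prefix, plen) != 0)
            continue;                       /* not a match */
        if (first == NULL) {
            first = names[i];
            common = strlen(first);         /* start with the whole name */
        } else {
            size_t k = plen;

            while (k < common && names[i][k] == first[k])
                k++;
            common = k;                     /* shrink to the shared part */
        }
        matches++;
    }
    if (first == NULL) {                    /* nothing matched */
        first = prefix;
        common = plen;
    }
    if (common >= outsize)
        common = outsize - 1;
    memcpy(out, first, common);
    out[common] = '\0';
    return matches;
}
```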

8.3.3 Positional Arguments

Positional arguments are not directly specifiable by the user. They are the editor's internal state variables. Such variables include both those required by the editor (e.g., the length of the buffer, the locations of the point and the mark, etc.) and those that have a specialized purpose (e.g., the current value of the right-hand margin, the tab spacing, etc.).

Often these values are used in unusual ways. For example, the horizontal position (column) of the point can often be a more pleasant way of specifying a value than entering a number. The user can indicate that "this is where I want the right margin to be" instead of having to count characters to get a number.

A specialized positional argument is the region. This is the range of text delimited by the point and the mark. By convention, it does not matter whether the point or the mark is placed earlier in the buffer.
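Because the point and the mark may be in either order, commands that operate on the region usually begin by normalizing them. A small sketch, assuming that positions are character offsets into a string that holds the buffer; Get_Region and Region_Copy are hypothetical names.

```c
#include <string.h>

/* The region, with its ends put in buffer order. */
typedef struct {
    int start;              /* the earlier of point and mark */
    int end;                /* the later of point and mark */
} Region;

Region Get_Region(int point, int mark)
{
    Region r;

    r.start = point < mark ? point : mark;
    r.end = point < mark ? mark : point;
    return r;
}

/* Copy the region's text into out (outsize bytes), e.g., for a kill
   command.  Returns the number of characters copied. */
size_t Region_Copy(const char *text, int point, int mark,
                   char *out, size_t outsize)
{
    Region r = Get_Region(point, mark);
    size_t n = (size_t)(r.end - r.start);

    if (n >= outsize)
        n = outsize - 1;
    memcpy(out, text + r.start, n);
    out[n] = '\0';
    return n;
}
```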

8.3.4 Selection Arguments

The use of graphical input devices opens up new ways of issuing commands and specifying arguments. For example, the cursor can be moved by a graphical input device as well as by the more traditional point-motion commands. In addition, a region can be specified by a "click and drag" operation (or whatever sequence is used by the operating system).

8.4 Rebinding

Binding is the act of connecting a name and a meaning; rebinding is the act of changing the binding. In the case of editors, there are two different levels at which binding (and rebinding) can occur.

The first is at the key level. Binding in this case means attaching an operation to a key. These bindings are often implemented by means of a dispatch table.

The second is at the function level. Binding in this case means attaching a procedure to an operation. Again, these bindings are often implemented by means of a dispatch table.

For example, the alphabetic keys may be bound to the "insert" operation. This operation, in turn, can be bound to a variety of procedures:

The basic "insert a character" procedure.
The basic procedure, but one that saves a copy of the buffer every so often.
The basic procedure, but one that performs word wrap (by inserting hard newlines, not in redisplay). This is often called something like "fill mode."
A different basic procedure, say one that performs replacement (overwrite) instead of insertion.

Implementations can perform at one of two levels of rebinding: static and dynamic. Static rebinding is when the new procedure is known about at the time that the editor is invoked. All implementations can perform this level of rebinding. Dynamic rebinding is possible when the new procedure can be defined after the editor is invoked. Unless otherwise stated, this discussion assumes dynamic rebinding.

To a first approximation, editors that are written in compiled languages (e.g., C and Pascal) can only perform static rebinding, and editors that are written in interpreted languages (e.g., Lisp) can also perform dynamic rebinding. Dynamic linking, however, allows compiled editors to include new procedures at run time, and so this distinction is not always a proper one to make. Dynamic bindings are also possible when a compiled language is used to implement an interpreted language, which in turn implements at least the user command portion of the editor.

8.4.1 Rebinding Keys

The process of key rebinding is relatively simple and is done essentially the same way in all implementations. A set of dispatch tables is used to map keys (represented by their ASCII values) to their respective functions.

In languages such as C and Lisp, the table can contain pointers to the procedures themselves. In less powerful languages such as Fortran and Pascal, the dispatch table branches to a different part of the same routine that contains the table. There, the procedure call is made. In languages that supply it, a case statement can be used instead of the n-way branch.

All of these command procedures have the same formal parameters, and so they can all be invoked with the same calling sequence. Thus, the C and Lisp direct invocations can work properly. Note also that simple commands do not have to have a separate procedure assigned to them; the code to execute them can be placed in-line in place of a call (where the case-statement equivalent is used). Making this substitution loses some potential flexibility.
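A sketch of a C dispatch table of procedure pointers. The command signature is the one used by the Undo and Redo procedures later in this chapter; the FLAG typedef, the toy append-only buffer, and the names Insert_Char, Exit_Editor, Not_Bound, Bind_Key, and Dispatch are assumptions for illustration. Rebinding a key is just storing a different pointer in the table.

```c
typedef int FLAG;           /* assumed: FLAG is a boolean type */
#define TRUE 1
#define FALSE 0

/* Every command takes the same formal parameters, so one calling
   sequence can invoke any of them through a pointer. */
typedef FLAG (*Command)(FLAG *is_argptr, int *argptr, FLAG *is_exitptr,
                        char c);

/* Toy, append-only buffer standing in for the real one. */
static char buffer[256];
static int point = 0;

static FLAG Insert_Char(FLAG *is_argptr, int *argptr, FLAG *is_exitptr,
                        char c)
{
    int n;

    (void)is_argptr;
    (void)is_exitptr;
    for (n = 0; n < *argptr && point < (int)sizeof buffer - 1; n++)
        buffer[point++] = c;            /* arg copies of the character */
    buffer[point] = '\0';
    return TRUE;
}

static FLAG Exit_Editor(FLAG *is_argptr, int *argptr, FLAG *is_exitptr,
                        char c)
{
    (void)is_argptr;
    (void)argptr;
    (void)c;
    *is_exitptr = TRUE;                 /* the command loop checks this */
    return TRUE;
}

static FLAG Not_Bound(FLAG *is_argptr, int *argptr, FLAG *is_exitptr,
                      char c)
{
    (void)is_argptr;
    (void)argptr;
    (void)is_exitptr;
    (void)c;
    return FALSE;                       /* the caller reports the error */
}

/* The dispatch table, indexed by ASCII value. */
static Command dispatch[128];

void Init_Table(void)
{
    int k;

    for (k = 0; k < 128; k++)
        dispatch[k] = (k >= ' ' && k < 127) ? Insert_Char : Not_Bound;
}

/* Rebinding a key is just storing a different pointer. */
void Bind_Key(int key, Command cmd)
{
    dispatch[key & 127] = cmd;
}

/* One step of the command loop: a single indirect call. */
FLAG Dispatch(int key, FLAG *is_exitptr)
{
    FLAG is_arg = FALSE;
    int arg = 1;

    return dispatch[key & 127](&is_arg, &arg, is_exitptr, (char)key);
}
```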

8.4.2 Rebinding Functions

Dynamic rebinding is ordinarily a language-supplied feature, and so it will not be discussed in depth. Two comments will, however, be made on how to simulate it.

If the underlying operating system has dynamic linking (e.g., Multics, OS/2, and some new UNIX systems), a procedure may be rebound at run time. Dynamic linking is a way of linking procedures together in which the actual link is not made until the procedure is about to be executed. At that time, the procedure is located in the file system and brought into memory. The link may either be left alone, in which case the next call will have the procedure re-located (a relatively expensive process), or it may be snapped. Snapping a link is the process of converting the general call instruction (which is kept in a special, writeable part of the program) into a call instruction to the appropriate address. If a link is snapped, it must be explicitly unsnapped before any rebinding is done.

If the operating system does not support dynamic linking, you might choose to simulate it manually. Such a process is complex, and some thought will have to be given to the desirability of rebinding functions. The process is tantamount to explicit overlaying.

This all has a straightforward bearing on rebinding functions. Rebinding a function involves changing the definition of the procedure that is invoked by referencing it. What has been discussed are ways of changing such a procedure definition. Note that if the code to execute a function is inserted in-line in the basic editor, it cannot be rebound by any of these methods.

If dynamic linking is not available and is not feasible to simulate, there is still one way out. This way will only provide static rebinding. Instead of just using one dispatch table that indicates a procedure to be called directly, use two. Use the first table to map from keys to the operation to be performed (e.g., ^F is mapped to "moving forward one character") and the second table to map from the operation to be performed to a procedure that will perform it (e.g., "moving forward one character" is mapped to the Forward_Char procedure).
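The two-table scheme might look like this in C. The operation codes, the counters, and the procedure names other than Forward_Char are hypothetical, and the one-argument procedure signature is simplified for the sketch. Because every key bound to the "insert" operation goes through one slot of the second table, changing that slot switches all of those keys from insertion to overwrite at once. As noted above, this gives only static rebinding: every procedure must already be compiled into the editor.

```c
typedef int FLAG;           /* assumed: FLAG is a boolean type */
typedef FLAG (*Procedure)(int arg);

/* Hypothetical operation codes. */
enum Operation { OP_NONE, OP_FORWARD_CHAR, OP_INSERT, OP_COUNT };

/* Toy state so that the procedures have something to change. */
static int point = 0, inserted = 0, overwritten = 0;

static FLAG No_Op(int arg)          { (void)arg; return 0; }
static FLAG Forward_Char(int arg)   { point += arg; return 1; }
static FLAG Insert_Proc(int arg)    { inserted += arg; return 1; }
static FLAG Overwrite_Proc(int arg) { overwritten += arg; return 1; }

/* First table: key -> operation.  Second table: operation -> procedure. */
static enum Operation key_table[128];
static Procedure op_table[OP_COUNT];

void Init_Tables(void)
{
    int k;

    for (k = 0; k < 128; k++)
        key_table[k] = (k >= ' ' && k < 127) ? OP_INSERT : OP_NONE;
    key_table[6] = OP_FORWARD_CHAR;             /* ^F */
    op_table[OP_NONE] = No_Op;
    op_table[OP_FORWARD_CHAR] = Forward_Char;
    op_table[OP_INSERT] = Insert_Proc;
}

/* Look up the operation for the key, then the procedure for it. */
FLAG Execute_Key(int key, int arg)
{
    return op_table[key_table[key & 127]](arg);
}
```

Entering an overwrite mode, for instance, could then be the single assignment op_table[OP_INSERT] = Overwrite_Proc.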

8.5 Modes

A mode is a collection of command rebindings. Modes can be invoked implicitly, explicitly, or automatically.

An implicitly invoked mode is one that is not visible to the user. Implicit modes are used to support large, infrequently used commands. For example, suppose that you had an editor command that played the game Adventure. You probably wouldn't want the code for that command to be occupying resources whenever you were using your editor for editing. However, you might still want to make your "adventure" command available at all times. In this case, you would use an implicit mode. The "adventure" command would then take these steps:

Load the modules that implement the command.
Rebind the key that invoked the "adventure" command to run the new code.
Run the code the first time.

From now on, whenever the user gives the "adventure" command, the editor will directly execute that code.
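The three steps can be sketched with a stub that rebinds itself. Load_Adventure_Module is a hypothetical stand-in for whatever loading mechanism is available (dynamic linking, say); here it simply returns a pointer to code that is already linked in, and the counters only record what happened.

```c
typedef int FLAG;           /* assumed: FLAG is a boolean type */
typedef FLAG (*Command)(int arg);

static Command dispatch[128];
static int loads = 0, plays = 0;    /* counters for the demonstration */

/* The large, rarely used command.  In a real editor this code would
   live in a separately loaded module. */
static FLAG Adventure(int arg)
{
    (void)arg;
    plays++;
    return 1;
}

/* Hypothetical loader: stands in for dynamic linking or overlaying. */
static Command Load_Adventure_Module(void)
{
    loads++;
    return Adventure;
}

/* The stub that is bound to the key until the command is first used. */
static FLAG Adventure_Stub(int arg)
{
    Command real = Load_Adventure_Module();     /* 1. load the modules */
    int k;

    for (k = 0; k < 128; k++)                   /* 2. rebind the key(s) */
        if (dispatch[k] == Adventure_Stub)
            dispatch[k] = real;
    return real(arg);                           /* 3. run it the first time */
}
```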

An explicitly invoked mode is one that the user asks to use. Examples of such modes are "auto fill" mode, "auto save" mode, and alternate command sets. The common element is that the user gives a command, knowing that that command itself has no function other than to persistently alter the key bindings.

An automatically invoked mode is one that the implementation determines is appropriate to invoke, based on a command given by the user that "appeared" to do something else.

One example of an automatically invoked mode is a language mode (for example, a "C" mode). This mode will automatically be invoked whenever the user edits a C source file (by convention, one whose name ends in ".c" or ".h"). Such a mode might do the following:

Rebind the internal variable that identifies which characters are legal in tokens (i.e., variable names) to also include the "_" character, which can occur within C names. This change would make the Forward_Word function treat a C variable name as a word.
Similarly rebind the sentence and paragraph operations to operate on statements and language blocks.
Rebind the ";" key to be an "electric" semicolon so that typing a ";" to finish one statement would cause the editor to determine and insert the appropriate indentation.
Similarly rebind the Tab, Return, and Line Feed keys.
Replace the "fill" or "reformat" paragraph command with one that "prettyprints" the current language block.

And so forth. Another example of an automatically invoked mode is the specialized mode that the editor places you in when executing such commands as "help," "read mail," and "view a directory." In these commands, the user is effectively placed in a specialized application that shares as much as possible with the regular editor commands, but does have its own extra commands. For example, in the "view a directory" application, the "d" key might delete a file, the "r" key might rename a file, and so forth. However, the "buffer" or "window" switch command should still be available so that the user can perform other editing while the special application is active.
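Returning to the first change in the C mode list, the "legal in tokens" variable can be a table indexed by character that a mode rebinds. A sketch with hypothetical names (word_chars, Text_Mode_Words, C_Mode_Words) and a simplified Forward_Word that takes the text and the point and returns the new point:

```c
#include <ctype.h>
#include <string.h>

/* Which characters are legal in tokens; a mode may rebind this table. */
static unsigned char word_chars[256];

/* Default: letters and digits only. */
void Text_Mode_Words(void)
{
    int c;

    for (c = 0; c < 256; c++)
        word_chars[c] = isalnum(c) ? 1 : 0;
}

/* C mode: also allow '_', which can occur within C names. */
void C_Mode_Words(void)
{
    Text_Mode_Words();
    word_chars['_'] = 1;
}

/* Move over any non-word characters, then over one word.
   Returns the new point. */
int Forward_Word(const char *text, int point)
{
    int len = (int)strlen(text);

    while (point < len && !word_chars[(unsigned char)text[point]])
        point++;
    while (point < len && word_chars[(unsigned char)text[point]])
        point++;
    return point;
}
```

With the default table, Forward_Word stops at each underscore in a name like is_exit_ptr; after C_Mode_Words it moves over the whole name in one step.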

8.5.1 Modes and Dynamic Rebinding

The function rebindings that are commonly done by an editor are known in advance, and so they can be done by any implementation (see the preceding section for a discussion of the difficulties involved in function rebinding). Fully dynamic rebinding (the new definition of the procedure is not known until run time) is desirable for several reasons:

Debugging is greatly eased if the trial-and-error cycle time is reduced by not having to compile and link the whole editor each time. Instead, only one function has to be recompiled and linked.
Space savings are achieved if unneeded modes and autoloaded single functions are not brought into memory until called.
If the editor is implemented in an interpreted language, users can develop their own functions relatively easily. Such "sideline" development is advantageous because it allows many people to develop useful programs. Thus, the editor can be specialized in many more ways than any reasonable support group could ever implement on its own. Implementation in an interpreted language also encourages tailoring the editor to a user's own taste, enhancing his or her productivity.

8.5.2 Implementing Modes

Modes are defined on a per-buffer basis, and so an implementation must provide for changing these bindings as the current buffer is switched. The general technique for doing this is to have a set of default bindings for the editor, a set of current bindings for each buffer, and a set of procedures that can be invoked to change the former into the latter. When a buffer switch is made, the new buffer's current bindings are used to dispatch all commands.

Whenever a change to the mode list is made (especially one that removes a mode), the editor must initialize the current bindings to the default bindings, then invoke each mode procedure in turn to make its changes.
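A sketch of this rebuild-from-defaults technique in C. The Buffer structure, the mode procedures, and the command names are hypothetical; each mode procedure is simply a function that edits a bindings array. Removing a mode does not try to reverse that mode's changes: the bindings are rebuilt from the defaults, and the remaining modes are reapplied in order.

```c
typedef int FLAG;           /* assumed: FLAG is a boolean type */
typedef FLAG (*Command)(int arg);
typedef void (*ModeProc)(Command *bindings);

#define NKEYS 128
#define MAXMODES 8

/* Hypothetical commands; their bodies do not matter here. */
static FLAG Insert(int a)        { (void)a; return 1; }
static FLAG Fill_Insert(int a)   { (void)a; return 1; }
static FLAG Electric_Semi(int a) { (void)a; return 1; }

static Command default_bindings[NKEYS];

typedef struct {
    ModeProc modes[MAXMODES];   /* modes in the order they were enabled */
    int nmodes;
    Command bindings[NKEYS];    /* this buffer's current bindings */
} Buffer;

/* Mode procedures: each changes only the keys it cares about. */
static void Fill_Mode(Command *b) { b[' '] = Fill_Insert; }
static void C_Mode(Command *b)    { b[';'] = Electric_Semi; }

void Init_Defaults(void)
{
    int k;

    for (k = ' '; k < 127; k++)
        default_bindings[k] = Insert;
}

/* Start from the defaults, then let each mode make its changes. */
void Rebuild_Bindings(Buffer *buf)
{
    int k, m;

    for (k = 0; k < NKEYS; k++)
        buf->bindings[k] = default_bindings[k];
    for (m = 0; m < buf->nmodes; m++)
        buf->modes[m](buf->bindings);
}

void Add_Mode(Buffer *buf, ModeProc mode)
{
    if (buf->nmodes < MAXMODES)
        buf->modes[buf->nmodes++] = mode;
    Rebuild_Bindings(buf);
}

void Remove_Mode(Buffer *buf, ModeProc mode)
{
    int m, j = 0;

    for (m = 0; m < buf->nmodes; m++)
        if (buf->modes[m] != mode)
            buf->modes[j++] = buf->modes[m];
    buf->nmodes = j;
    Rebuild_Bindings(buf);
}
```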

8.6 Changing Your Mind

This section discusses the methods used to help users who want to change their minds about an editing command.

8.6.1 Command Set Design

By far the most effective step that you can take is to design the command set to minimize both state variables and changes of perspective. Such proper design is far more effective than any other tool. However, as these topics are covered in the next chapter and in Chapter 1, they won't be discussed here.

8.6.2 Kill Ring

The most basic way in which a user can change his or her mind is to delete something, then say "oops, I didn't want to delete that." After all, if the user inserted extra text, deleting it is easy and straightforward. However, retyping accidentally deleted text is in general neither easy nor straightforward. Hence, an important feature to provide is the ability to save that text for the user. This feature can be added as a single-level save (referred to as the kill buffer) or a multiple-level save (the kill ring).

As an extra benefit, once the deleted text is saved, that feature can be used for "cut and paste" operations. In addition, the saved text can be tied into the system "clipboard" or similar facility.

An Emacs-type editor records multiple text deletions in a "kill ring" (for historical reasons, commands that save the deleted text were called "kill" commands). Some small but fixed number of the successive deletions are stored together. The "yank" or "paste" command retrieves the last such deletion (inserting it at the point). Alternate "yank" commands are available that cycle through the kill ring, re-deleting the last-yanked text and replacing it with the next item. (A ring is better than a single kill buffer because it can store multiple deletions. It is superior to a stack of separate buffers because of the ease with which various "undeletions" can be tried out.)
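A sketch of the ring itself, assuming a RING_SIZE of 8 (the text says only "some small but fixed number") and hypothetical names. Yank returns the newest entry; Yank_Next steps to successively older ones and wraps back to the newest, which is what makes it a ring. Deleting the previously yanked text before inserting the next entry is left to the calling command.

```c
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 8         /* "small but fixed"; the value is assumed */

/* Must start zero-initialized (all entries NULL). */
typedef struct {
    char *entries[RING_SIZE];   /* saved deletions; NULL if unused */
    int newest;                 /* slot holding the most recent kill */
    int count;                  /* number of slots in use */
    int yank_offset;            /* 0 = newest, 1 = next older, ... */
} KillRing;

/* Save a new deletion.  When the ring is full, the oldest entry's
   slot is reused. */
void Kill_New(KillRing *r, const char *text)
{
    char *copy = malloc(strlen(text) + 1);

    if (copy == NULL)
        return;                 /* out of memory: keep the old ring */
    strcpy(copy, text);
    r->newest = (r->newest + 1) % RING_SIZE;
    free(r->entries[r->newest]);
    r->entries[r->newest] = copy;
    if (r->count < RING_SIZE)
        r->count++;
}

static const char *Entry(const KillRing *r, int offset)
{
    return r->entries[(r->newest - offset + RING_SIZE) % RING_SIZE];
}

/* "Yank": the most recent deletion, or NULL if there is none. */
const char *Yank(KillRing *r)
{
    if (r->count == 0)
        return NULL;
    r->yank_offset = 0;
    return Entry(r, 0);
}

/* Alternate yank: the next older deletion, wrapping to the newest. */
const char *Yank_Next(KillRing *r)
{
    if (r->count == 0)
        return NULL;
    r->yank_offset = (r->yank_offset + 1) % r->count;
    return Entry(r, r->yank_offset);
}
```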

It is easy to implement such a kill ring. A buffer is designated as the one to hold the deleted text, and the commands that perform the deletion simply copy the text that they are about to delete to that buffer before performing the actual deletion. There are two fine points to the implementation.

First, successive deletion commands should add to the current deletion, not create a new one. Thus, the Emacs commands:

^U ^[ d

which delete the next four words, should have exactly the same effect as:

^[ d ^[ d ^[ d ^[ d

i.e., four successive "delete word" commands. In both cases, all four words should be part of the same deletion.

Second, your implementation must take care to delete properly. Deleting following items (characters, words, sentences, etc.) should add to the end of the deleted text. Deleting previous items should add to the beginning of the deleted text. Not coincidentally, all deletion commands in the command set of Emacs-type editors are of the form "delete following..." or "delete previous..." with the exception of the "delete current line" command. This command must take the text from the point to the end of the line and add it to the end of the deleted text, and take the text from the point to the beginning of the line and add it to the start.
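Both fine points can be sketched together. In this hypothetical Kill structure, the command loop calls Kill_Break after any command that did not delete text, so the next deletion starts fresh (this sketch discards the old text; a real editor would push it onto the kill ring). The "delete current line" command would call Kill_Save twice: with the text after the point and KILL_FORWARD, then with the text before it and KILL_BACKWARD.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { KILL_FORWARD, KILL_BACKWARD } KillDir;

/* Hypothetical record of the deletion currently being built. */
typedef struct {
    char *text;             /* the current deletion, or NULL */
    int last_was_kill;      /* did the previous command delete text? */
} Kill;

/* Save deleted text.  If the previous command also deleted text, join
   this deletion to it: text deleted forward goes on the end, text
   deleted backward goes on the front.  Otherwise start a new deletion.
   Returns 0 if out of memory. */
int Kill_Save(Kill *k, const char *deleted, KillDir dir)
{
    int join = k->last_was_kill && k->text != NULL;
    size_t old = join ? strlen(k->text) : 0;
    char *s = malloc(old + strlen(deleted) + 1);

    if (s == NULL)
        return 0;
    if (dir == KILL_FORWARD) {
        strcpy(s, join ? k->text : "");
        strcat(s, deleted);
    } else {
        strcpy(s, deleted);
        if (join)
            strcat(s, k->text);
    }
    free(k->text);
    k->text = s;
    k->last_was_kill = 1;
    return 1;
}

/* Called by the command loop after any command that did not delete. */
void Kill_Break(Kill *k)
{
    k->last_was_kill = 0;
}
```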

8.6.3 Undo

The "kill ring" approach requires explicit support from the user commands and provides considerable power, yet there are many ways in which it does not make it easy for a user to change his or her mind. An "undo" facility is the most general way that you can help a user who changes his or her mind.

In principle, an undo facility provides a mechanism for reversing changes made by the user. These changes can be as simple as moving the point or inserting or deleting a character, or as complex as a file write or global replace.

Note, however, that you may still want to provide the kill ring, as it offers both one intuitive (if limited) type of undo and operations that undo cannot perform. For example, you cannot implement "cut and paste" with undo, except for the limited case where you only want to paste the text exactly where you cut it from!

Unlike the kill ring, which requires explicit support in the user commands, the best place to provide support for undo is in the sub-editor interface. Note that this is in the interface, not the sub-editor itself, although in general the undo facility will work closely with the sub-editor.

The support works like this: each sub-editor procedure that makes any change to a state variable or the buffer first makes a note of what is to be changed, then records the pre-change value, then finally makes the change. For example, the Point_Set routine would record "the point is about to change," and the old (current) location of the point. It would then change the point's location. Note that some procedures (such as the ones to delete a block of text) must record arbitrarily large amounts of state information.

This type of recording allows you to back up as far as you like. Since, in essence, each change to the state information is recorded, all earlier states are recoverable by reversing each state change (in reverse order, of course).
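A sketch of such recording in the sub-editor interface, assuming a toy sub-editor (one fixed-size text array and a point) and hypothetical record types. Point_Set, Insert_Char, and Delete_Char each append a record holding the pre-change value before making the change; Undo_Changes reverses records newest first. When the log is full, the oldest record is discarded, so what remains is still consistent. (Shifting the array is the simplest way to show this; a circular buffer would avoid the copying.)

```c
#include <string.h>

#define MAXTEXT 256
#define MAXUNDO 512

typedef enum { U_POINT, U_INSERT, U_DELETE } UndoType;

/* One state change: what changed and its pre-change value. */
typedef struct {
    UndoType type;
    int pos;        /* old point, or where a character was inserted/deleted */
    char c;         /* the deleted character (U_DELETE only) */
} UndoRec;

/* Toy sub-editor state. */
static char text[MAXTEXT];
static int len = 0, point = 0;

static UndoRec undo_log[MAXUNDO];
static int nundo = 0;

static void Record(UndoType type, int pos, char c)
{
    if (nundo == MAXUNDO) {     /* full: the oldest record drops off */
        memmove(undo_log, undo_log + 1, (MAXUNDO - 1) * sizeof undo_log[0]);
        nundo--;
    }
    undo_log[nundo].type = type;
    undo_log[nundo].pos = pos;
    undo_log[nundo].c = c;
    nundo++;
}

/* Raw changes, used both by the interface and by undo. */
static void Raw_Insert(int pos, char c)
{
    memmove(text + pos + 1, text + pos, (size_t)(len - pos));
    text[pos] = c;
    text[++len] = '\0';
}

static void Raw_Delete(int pos)
{
    memmove(text + pos, text + pos + 1, (size_t)(len - pos - 1));
    text[--len] = '\0';
}

/* The interface: record the change and the old value, then change. */
void Point_Set(int p)
{
    Record(U_POINT, point, 0);
    point = p;
}

void Insert_Char(char c)        /* insert before the point */
{
    if (len >= MAXTEXT - 1)
        return;
    Record(U_INSERT, point, 0);
    Raw_Insert(point++, c);
}

void Delete_Char(void)          /* delete the character after the point */
{
    if (point >= len)
        return;
    Record(U_DELETE, point, text[point]);
    Raw_Delete(point);
}

/* Reverse the n most recent changes, newest first. */
void Undo_Changes(int n)
{
    while (n-- > 0 && nundo > 0) {
        UndoRec *u = &undo_log[--nundo];

        switch (u->type) {
        case U_POINT:  point = u->pos; break;
        case U_INSERT: Raw_Delete(u->pos); point = u->pos; break;
        case U_DELETE: Raw_Insert(u->pos, u->c); point = u->pos; break;
        }
    }
}
```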

There is more to implementing undo than just recording state changes, but the additional items are more icing than cake.

First, most undo commands operate on a user-command basis, not on a sub-editor-call basis. Thus, you must also record when a new user command is given, and each undo then undoes consecutive changes until it reaches such a command marker. In this way, even a complex global replace can be undone. Note that in general you will want repeated undo commands to undo successively earlier commands.

Second, it probably makes sense to undo an entire consecutive set of newly typed characters as a single command.

Third, the resources available to retain the undo information may be limited. This design minimizes that problem, as the "excess" (oldest) undo state can simply drop off the end. The part that is retained will still be consistent.

Fourth, the operating system may not support the undoing of file-level operations or other commands. In some cases, you can simulate such undoing (say, by making a backup file), but in general you will have to live with some limits to undo. (Undoing a print operation after the printing is complete is quite difficult.)

Fifth, state information is kept in places other than in the sub-editor. These other places must also be incorporated into the undo facility.
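The first two points can be sketched with marker entries in the undo log. In this hypothetical version each change is represented only by an integer, and the command loop calls Begin_Command before every user command except undo itself (otherwise an undo would find a fresh marker and undo nothing). Consecutive self-inserting commands share one marker, so a run of typed characters is undone as a unit.

```c
#include <stddef.h>

#define MAXLOG 1024
#define BOUNDARY (-1)       /* command marker; change records are >= 0 */

/* Hypothetical log: each change is represented by an integer id. */
static int undo_log[MAXLOG];
static int nlog = 0;
static int last_was_insert = 0;

/* Called by the command loop before each user command, except undo.
   A run of self-inserting commands shares one marker. */
void Begin_Command(int is_self_insert)
{
    if (!(is_self_insert && last_was_insert) && nlog < MAXLOG)
        undo_log[nlog++] = BOUNDARY;
    last_was_insert = is_self_insert;
}

/* Called by the sub-editor interface for every state change.
   Overflow handling is omitted here; a real log would discard the
   oldest entries instead. */
void Record_Change(int id)
{
    if (nlog < MAXLOG)
        undo_log[nlog++] = id;
}

/* Undo one user command: reverse changes, newest first, back to the
   nearest marker, then remove the marker.  reverse may be NULL.
   Returns the number of changes reversed. */
int Undo_Last_Command(void (*reverse)(int id))
{
    int n = 0;

    while (nlog > 0 && undo_log[nlog - 1] != BOUNDARY) {
        int id = undo_log[--nlog];

        if (reverse != NULL)
            reverse(id);
        n++;
    }
    if (nlog > 0)
        nlog--;                 /* remove the marker itself */
    last_was_insert = 0;        /* typing after an undo starts a new run */
    return n;
}
```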

8.6.4 An Undo Heresy

Is undo a nice feature to offer? Yes. Is it vital to an editor? Probably not. Will adding it make a poorly designed editor into a good one? No. Will it make such an editor acceptable? Maybe. As was said earlier, the best way to help the user is with a good command set design: it will minimize the need or desire for an undo.

While undo is a general-purpose facility that has good applications, it is not clear that a text editor is one of them. The "good design" approach (using the Emacs command set as an example) and the undo approach will now be compared in their approach to moving around in text and deleting text.

Moving around in text is simply solving the problem "I am at X and I want to be at Y." The good design solution involves translating this difference into a sequence of commands to move the point from X to Y. If a mistake is made in the process of implementing the solution, the problem is merely restated to "I am at X' and I want to be at Y" and it is re-solved. The undo solution differs by detecting the error (i.e., deviation from the intended solution), saying "undo" to put you back on the original path, and proceeding. Ordinarily this difference between the two solutions is not very great.

If the user has accidentally moved a large distance (say, to the start of the buffer), it becomes a little more difficult for the user to recover his or her earlier position. Emacs-type editors resolve this issue by having the large-movement commands set the mark to where you were. Thus, an "interchange point and mark" command will recover from the error. Keep in mind that almost all of the time, the user does not care where the mark happens to be.

The two approaches are all but identical in the text deletion case. On the one hand, accidentally deleted text is recovered with a "yank" command, and on the other hand, with an "undo" command.

In conclusion, undo is a nice extra feature, but it is no substitute for a good design.

8.6.5 Redo

Redo is the mechanism for undoing an undo. Conceptually, the record of undos is:

1. most-recent command changes
2. next-most-recent command changes
3. next-next-most-recent command changes
...

The first invocation of the "undo" command undoes #1. The next invocation undoes #2, and so forth. Redo redoes the most recent undo, with repeated "redo" commands moving back up the undo stack. Let's look at an implementation:

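    FLAG Undo(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
    {
        /* undo_ptr is NULL unless an undo is in progress; otherwise
           it points to the undo information for the command that
           was most recently undone */
        if (undo_ptr == NULL)
            undo_ptr = last_command_undo;   /* start at the latest command */
        else if (Previous(undo_ptr) != NULL)
            undo_ptr = Previous(undo_ptr);  /* step back one command */
        else {
            Error("No more undo information");
            return(TRUE);
        }
        if (undo_ptr == NULL)
            Error("Nothing to undo");
        else
            Undo_Command(undo_ptr);
        return(TRUE);
    }

    FLAG Redo(FLAG *is_argptr, int *argptr, FLAG *is_exitptr, char c)
    {
        if (undo_ptr == NULL)
            Error("No undo to redo");
        else {
            Redo_Command(undo_ptr);         /* redo the most recent undo */
            undo_ptr = Next(undo_ptr);      /* NULL once all undos are redone */
        }
        return(TRUE);
    }

and, in the main command loop:

    if (last_cmd != Undo && last_cmd != Redo) undo_ptr = NULL;

The Undo procedure checks whether it has been called "recently." If not, it starts at the latest command; if so, it steps back to the previous command. It then undoes that command, leaving the pointer on that command's undo information.

The Redo procedure checks whether Undo has been called "recently." If not, it can't run, as there is no undo to redo. Otherwise, it redoes the command that was most recently undone and sets the pointer to point to the undo information for the next command.

The main command loop determines whether the Undo or Redo procedures have been called "recently." Basically, if the last command wasn't undo or redo, it resets the undo pointer to null.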

This implementation allows arbitrary undoing and redoing in any combination, so long as the commands are given sequentially. Bear in mind that with the "change in state" recording of undo information, it is only legal to apply those changes in the correct order.

This implementation ignores the arguments to the undo and redo commands. Feel free to assign reasonable interpretations to the arguments in your implementations.

8.7 Macros

In a sense, macros allow a user to give commands by specifying them implicitly instead of explicitly. Macros fall into three general levels.

8.7.1 Again

The "again" facility allows a user to say "do what I just did again." For it to be most useful (i.e., easier to type than retyping the command), it should be assigned to a short key sequence (i.e., one shifted key).

The "repeat count" argument to the "again" command should be used instead of the previous "repeat count" argument to the command being repeated.

8.7.2 Keystroke Recording

This facility allows the user to say "start recording," then give a series of commands (observing their effects as they are typed), then say "stop recording." Later, the entire sequence of keystrokes can be replayed with a "play recording" command. A "repeat count" argument to the "play recording" command will cause the recording to be replayed the specified number of times. Note that commands within the recording can have repeat count arguments of their own.
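One way to support this, sketched under assumptions: the command loop calls Record_Key for every key it reads, and playback appends the recorded keys to a pending-input buffer that the command loop reads before the keyboard. Because playback goes through the same input path, argument prefixes inside the recording work unchanged. The Recorder structure and function names are hypothetical, and the sketch assumes that "stop recording" is a single key, which Stop_Recording drops from the recording.

```c
#define MAXKEYS 1024

/* Hypothetical keystroke recorder. */
typedef struct {
    int keys[MAXKEYS];
    int nkeys;
    int recording;      /* between "start recording" and "stop recording" */
} Recorder;

void Start_Recording(Recorder *r)
{
    r->nkeys = 0;
    r->recording = 1;
}

/* The command loop calls this for every key it reads, before
   dispatching it. */
void Record_Key(Recorder *r, int key)
{
    if (r->recording && r->nkeys < MAXKEYS)
        r->keys[r->nkeys++] = key;
}

/* The key that invoked this command was recorded just before it ran;
   drop it so that the recording holds only the commands in between. */
void Stop_Recording(Recorder *r)
{
    if (r->recording && r->nkeys > 0)
        r->nkeys--;
    r->recording = 0;
}

/* Queue the recording count times in the pending-input buffer that the
   command loop reads before the keyboard.  Returns the number of keys
   queued (limited by room). */
int Play_Recording(const Recorder *r, int count, int *pending, int room)
{
    int n = 0, i;

    while (count-- > 0)
        for (i = 0; i < r->nkeys && n < room; i++)
            pending[n++] = r->keys[i];
    return n;
}
```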

8.7.3 Macro Languages

Finally, the editor should provide the user full access to the editor's macro language (if any). This language will in general provide a full programming language, thus allowing the user to specify an arbitrary set of editing operations as well as a way of naming these procedures for later invocation and key rebinding.

8.7.4 Redisplay Interaction

The introduction of keystroke recording and macro languages only serves to underscore the separation between the redisplay code and the rest of the editor described in the previous chapter. The playback of recorded keystrokes will almost certainly complete with no intervening redisplay. Thus, if any of the code based its actions on the current window contents, it would almost certainly execute incorrectly.

Questions to Probe Your Understanding

Expand the main command loop to include other types of input such as mouse operations. (Easy)

Generalize the main command loop and related code to use a general event-driven mechanism. (Medium)

Write routines to move by a word and to delete a sentence, but be sure to consider all of the punctuation and white space aspects of the problem. (Medium)

What is a good memory management scheme to use for holding the undo information? (Medium)

What fundamental support is required in the editor to best implement the keystroke recorder? (Easy)


Nine: Command Set Design

"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!

Previous chapters have discussed the external constraints on implementations and the division of the editor into more manageable pieces. You now understand how to build an editor. But which editor should you build? This chapter discusses various issues involved with designing the editor's command set.

Your editor implementation will (you hope) be used by many people. Regardless of the expertise of these people, there are some design principles that should be followed. A good design incorporates these qualities:

responsiveness
consistency
permissiveness
progress
simplicity
uniformity
extensibility

Each of these qualities will be discussed in turn.

9.1 Responsiveness

Responsiveness means that each action taken by the user is handled and confirmed immediately. In other words, the application responds well to the user. This good response allows skilled users to work fast.

A different name for responsiveness is visible effect, which is to say that every action taken by the user has a visible effect on the display. This effect might be simple cursor motion, a change in the text being shown, or a message. Even if only an internal state variable is changed, that state variable should have an indicator on the display. (Ringing the display's bell is considered to be a "visible" effect for the purposes of this section.) By following this principle, the user is never in doubt whether the application is keeping up with his or her typing or has received a command: there is consequently never a need to issue a command because of doubt or uncertainty.

On some displays, the desire for a visible effect can conflict with other design goals such as minimizing display flicker. One solution to this problem is to delay the display of visible effect until the user stops typing for a few seconds. In this way, as long as a user is typing, he or she is presumed to know what he or she is doing and so the feedback is less important. However, if the user should stop typing, the application would quickly show the current state.

An important ramification of responsiveness is good error-checking. User input should be checked as soon as logically possible after it has been entered. If the input passes the check, some sort of confirmation is given (e.g., a message or beep). If the input fails the check, an error indicator is displayed. As a general rule, no incorrect input should be accepted, unless it is infeasible to perform the check. Different checks will naturally be performed at different times. For example, if the user is being asked to enter a number in a specified range, the individual characters can be checked as they are typed to ensure that they are digits. The number as a whole cannot be range-checked until the user has indicated that it is complete (say by pressing the Return key).
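The number-entry example can be sketched in C as a per-keystroke check, assuming a hypothetical prompt structure: each character is validated as it is typed, while the range check waits for Return. `MAX_DIGITS` is an added guard (an assumption, not from the text) that keeps the value from overflowing.

```c
/* Hypothetical prompt state for reading a number in [min, max]. */
struct number_prompt {
    long min, max;
    long value;          /* digits accepted so far, as a number */
    int digits;
};

enum key_result { KEY_ACCEPTED, KEY_REJECTED, INPUT_DONE, INPUT_OUT_OF_RANGE };

#define MAX_DIGITS 9     /* keeps value well inside the range of a long */

/* Called once per keystroke: characters are checked as they are typed,
   but the range can only be checked once Return ends the number. */
static enum key_result number_prompt_key(struct number_prompt *p, int key)
{
    if (key == '\r' || key == '\n') {
        if (p->digits == 0 || p->value < p->min || p->value > p->max)
            return INPUT_OUT_OF_RANGE;   /* show an error, keep prompting */
        return INPUT_DONE;
    }
    if (key < '0' || key > '9' || p->digits >= MAX_DIGITS)
        return KEY_REJECTED;             /* beep; the key is not inserted */
    p->value = p->value * 10 + (key - '0');
    p->digits++;
    return KEY_ACCEPTED;                 /* echo the digit as confirmation */
}
```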

9.2 Consistency

Consistency means that all parts of the application work the same way. This topic is covered in more detail in the section on "modes."

9.3 Permissiveness

Permissiveness means that the user is in control of the application and not vice versa. While this might sound tautological at first, it is a principle that is often honored only in the breach. Think of the applications that you have used that lead you through a step-by-step process, with only limited choices at each step. Often, these applications do not allow you to review or change earlier decisions short of aborting the whole application and starting over from scratch.

Writing a permissive application is both easier and harder than writing a non-permissive version of the same one. It is harder because the implementation has to be able to handle any request at any time. (An "event - message - object" design model makes it easy to handle such unpredictable requests.) If some process is multi-step, the application must have interlocks to prohibit processing the steps out of order (and of course these interlocks should have their state variables displayed). It is easier because you as a designer do not have to anticipate all possible paths through the application and decide in advance which ones are reasonable.

9.4 Progress

Progress means that each command should meet some part of the user's goals. An example of a command that does not make progress would be a command to "show the current line." This command does not contribute to making any of the changes that the user (presumably) desires: it just wastes the user's time and effort. It is the elimination of such commands that makes screen-oriented text editors such an improvement over the older ones.

In general, there should be no commands for the user to give that merely tell the application to do something that it has enough information to figure out on its own. For example, if the user moves to the end of the buffer, the application should display that part of the buffer. (Actually, as stated this principle is a bit harsh. From time to time, the application will not be able to guess correctly and it is acceptable to have a way for the user to take control. For example, sometimes the user wants to position the window to include two areas of particular interest. The application in general cannot detect this case. But if that happens often, there is a design problem.)

9.5 Simplicity

Simplicity goes under the sobriquet of "keep simple things simple." Complicated things can be complicated (or simple if that works out), but simple things should never be complicated.

This principle means that the basic editing operations (insert, delete, move) should be as conceptually simple as possible. For example, inserting a character is a conceptually simple operation. The simplest way of expressing that operation is to just type that character. Having input/edit (or "input/overtype" or "insert/replace") modes is an example of making a simple thing complicated. With the input/edit mode, inserting a character becomes "am I in insert mode? No. Then type the 'go into insert mode' command, type the character, and maybe type the 'leave insert mode' command." Hardly simple.

This principle is closely related to efficiency. It is natural to think that the command set that requires the fewest user operations is the best one to use. Unfortunately, that natural thought does not remain valid when taken to extremes. On an extreme basis, the set of editing operations could be Huffman encoded into a command set. While the resulting command set would be optimally efficient, it would probably not be usable. For example, the command to insert the string "the " might be ^X 7. On the other hand, simple things tend to be efficient if for no other reason than that they don't have the baggage of being complicated. Ideally, the most-often used commands should be the shortest.

9.6 Uniformity

Uniformity also goes under the names "regularity," "predictability," and "orthogonality." Basically, a command set is uniform if, when a user knows some part of it, he or she can predict the unknown parts. Another way of looking at it is that the command set fits into a pattern.

This principle is important in that the user is freed from learning each command separately. Instead, the user learns some of the commands and a set of rules for generating the rest. For example, here are the basic Emacs character commands:

^F	move forward character
^B	move backward character
^D	delete the following character
^H	delete the preceding character

(Keep in mind that ^H is also the Back Space key.) Here are the word commands:

^[ F	move forward word
^[ B	move backward word
^[ D	delete the following word
^[ ^H	delete the preceding word

A user learns these commands by learning this basic command set:

F	move forward ...
B	move backward ...
D	delete the following ...
^H	delete the preceding ...

and these rules:

control-shifted means "character"
^[-prefixed means "word"

It is rare that a command set will ever be completely uniform. However, it is important to take advantage of uniformity where possible.
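To make the "basic set plus rules" idea concrete, this C sketch (all names hypothetical) generates the eight bindings in the tables above from the four operation keys and the two rules, rather than listing them one by one.

```c
#include <stdio.h>
#include <string.h>

enum unit { UNIT_CHARACTER, UNIT_WORD };

/* The basic command set: one key per operation. */
static const char *const op_keys[] = { "F", "B", "D", "^H" };
static const char *const op_names[] = {
    "move forward", "move backward",
    "delete the following", "delete the preceding",
};
static const char *const unit_names[] = { "character", "word" };

/* Apply the rules: control-shift means "character", ^[ prefix means "word". */
static void binding(int op, enum unit u, char *buf, size_t size)
{
    const char *key = op_keys[op];
    if (u == UNIT_WORD)
        snprintf(buf, size, "^[ %s", key);
    else if (key[0] == '^')
        snprintf(buf, size, "%s", key);   /* ^H is already a control key */
    else
        snprintf(buf, size, "^%s", key);
}

/* Print the whole table generated from four operations and two rules. */
static void print_table(void)
{
    char buf[16];
    for (int u = UNIT_CHARACTER; u <= UNIT_WORD; u++)
        for (int op = 0; op < 4; op++) {
            binding(op, (enum unit)u, buf, sizeof buf);
            printf("%-6s %s %s\n", buf, op_names[op], unit_names[u]);
        }
}
```

Adding a third unit (say, sentences) would then mean adding one rule, not four new commands to memorize.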

9.7 Extensibility

Extensibility means the ability to accommodate changes. This principle has a number of aspects. First, changes can be accommodated by designing in "holes" or "gaps" where users can install commands of their own. Second, the command set can be made rich ("large") as the larger number of commands provides more places for commands to be placed. Third, a uniform command set helps extensibility. For example, if the command set has a set of "sentence" operations (move by, delete by), these can be converted as a set into "statement" operations for use in programming languages.

9.8 Modes

This chapter uses the word "mode" in a different way from the command-set-oriented use of "mode" from earlier chapters. It is unfortunate that the same word is used by the industry in different ways.

What are "modes," and why should you care about them? Simply put, a design has a mode when an end user has to do an "unnecessary" action in order to do the desired action. You should care about modes because having modes can make a program harder to use. As a programmer, you are used to modes and deal with them constantly without conscious thought. Your end users, however, may be confused by the presence of modes. Their thinking might go something like "I pressed the 'f' key (at command level) and it showed me an 'f', so why does pressing the 'f' key here move me forward by a screen?"

The definition just presented is a little abstract, so an example is in order. A piano is a device that has (almost) no modes. If you want a piano to make a sound, you just press a key. Each key is independent: it produces the same note regardless of which other keys were pressed before.

A piano really does have modes. You select the modes with the foot pedals. It is easy to learn to overlap the key presses with the foot pedal ups and downs, so the modes, although present, do not interfere with playing the piano. Moreover, the modes have a great deal of overlap with the basic keys. In a manner of speaking, the piano's "command set" is quite orthogonal.

A typical scientific calculator has many modes. An example of such a mode is the degrees/radians selection. If you are in degrees mode and want to compute the sine of an angle in radians, you must first switch to radians mode and then compute the sine. Switching to radians is not part of your calculation. Rather, it is something that you have to do in order to perform your calculation. Hence, the calculator has a mode. A similar -- but less useful -- mode in a text editor would be an insert/overwrite (or replace) mode.

These examples are at the extremes of the range. The piano is an almost modeless device, while a calculator has many modes.

What does all this have to do with designing an editor's command set? These things:

you should have as few modes as possible;
modes should be aligned with the activity;
if you must have a mode, make it visible.

Adding keys (or a mouse) can reduce the number of modes. For example, having both "sine in degrees" and "sine in radians" keys would eliminate the degrees/radians mode. However, there is an upper limit to the number of keys that you can put on a user input device, be it a calculator or a keyboard.

Rearranging the modes so that they coincide with natural breaks in the activity also reduces confusion. For example, it is not too unreasonable to have a "text editor" mode and a "spreadsheet" mode, where the two modes correspond to completely different applications. Switching modes is less confusing because the end user is mentally changing gears at the same time. On the other hand, with modern computers' ability to rapidly switch between different application programs, it can be very valuable to have the different applications present the same interface to the end user: doing anything else is often not in the end users' best interests. Of course, if you did provide additional functions within the text editor, the differences in the modes would be minimized.

Returning to the example of an insert/edit mode for an editor, we can see how that mode is not aligned with a change in activity. To the user, "editing" is a single activity that comprises both deletion and insertion. Changing between insert and edit modes is thus a mode change with no accompanying activity change.

Some text editors offer an insert/replace mode that determines how newly typed text affects the existing text. In insert mode, newly typed text is inserted. In replace mode, each newly typed character usually replaces an existing character. However, in many cases users do not want to replace characters: they want to replace words, sentences, or other higher-level objects. In these cases, simple replacement is not sufficient since it is unlikely that the new text is exactly the same length as the old. The correct effect can be readily achieved when insert mode is combined with an operation that defines a region or selection that identifies the old text.

Modes should be made visible. For example, in older calculators, the degrees/radians mode was hidden. New calculators have an indicator on the display that shows the current mode. Although the mode is still there, the indicator reminds the operator that the mode exists and can even guide the operator's next input.

The best way to make the modes visible is to show all state information on the display. In this way, it is possible for a knowledgeable end user to predict accurately the effect of the next command by examining the current display. Of course, not all end users are "knowledgeable," but how could a non-knowledgeable user ever succeed if a knowledgeable one cannot?

Note how the reasons for making modes visible dovetail with the reasons mentioned earlier for commands having visible effects.

9.9 Use of Language

DO NOT PRODUCE OUTPUT IN ALL UPPER CASE. UPPER CASE IS MORE DIFFICULT TO READ THAN lower case. Proper capitalization and punctuation are also important. Would you rather read:

ENTER CITY STATE AND ZIP CODE:

or:

Enter city, state, and zip code:

Use full words and phrases: do not abbreviate. Displays are large enough and output fast enough so that abbreviations are no longer required.

Prompts should have the form "verb object" (at least in English). A prompt of:

Username:

doesn't tell the user what to do. However, a prompt of:

Enter your username:

or even:

Please enter your username:

is reasonably unambiguous. In the first case, a user is apt to feel confused and unsure of what to do next. (What about a username? Should he or she go get one? Whose username?) In the second case, that confusion vanishes.

Error messages should state whether the operation was performed, why something went wrong, and what to do instead. Instead of:

File write error.

or (gasp!):

SIO-FI-ERR-12

this:

The file was not written because the disk was full. Clear space on the existing disk and try again or write the file to a different disk.

Longer, but much better.

The application should do what computers do best: arithmetic, checking, recording. Users should do what they do best: direct the application to solve a problem. Don't make the user count things or keep track of what was done in the past.

Be generous in what you accept. If both Delete and Back Space are used for erasing a character, accept both. Unless there is a good reason otherwise, don't distinguish between upper- and lower-case input. In general, if there is a possible way to unambiguously determine what the user wants, accept it.
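A small C sketch of being generous, assuming ASCII key codes (Back Space is ^H, 0x08; Delete is 0x7F): both keys map to the same erase action, and a yes/no answer is accepted in any case and as any unambiguous prefix. The function names are illustrative.

```c
#include <ctype.h>
#include <stdbool.h>

#define KEY_BACKSPACE 0x08   /* ^H */
#define KEY_DELETE    0x7F

enum action { ACT_ERASE, ACT_INSERT };

/* Accept both Delete and Back Space as "erase the preceding character". */
static enum action classify_key(int key)
{
    return (key == KEY_BACKSPACE || key == KEY_DELETE) ? ACT_ERASE : ACT_INSERT;
}

/* True if s is a non-empty prefix of word, ignoring case. */
static bool is_prefix_ignoring_case(const char *s, const char *word)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++, word++)
        if (*word == '\0' || tolower((unsigned char)*s) != *word)
            return false;
    return true;
}

/* "y", "Yes", "YES", "n", "No", ... : 1 for yes, 0 for no, -1 to ask again. */
static int parse_yes_no(const char *answer)
{
    if (is_prefix_ignoring_case(answer, "yes"))
        return 1;
    if (is_prefix_ignoring_case(answer, "no"))
        return 0;
    return -1;
}
```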

9.10 Guideline Summary

This section presents a brief summary of the guidelines already presented.

9.10.1 Overall

Responsiveness: reflect all input immediately to the display.
Consistency: the same input should always have the same result.
Permissiveness: the user controls the application.
Progress: the user input should achieve the user's goal, not be for the application's benefit.
Simplicity: keep simple things simple.
Uniformity: make the commands easy to learn and to predict.
Extensibility: plan for growth and change.

9.10.2 Modes

Avoid modes where possible.
Where modes cannot be avoided, align mode changes with activity changes.
Remind the user what mode(s) are in effect.

9.10.3 Use of Language

Use mixed case, proper capitalization, and punctuation.
Use full words and phrases: do not abbreviate.
Prompts should tell the user what to do.
Use full error messages.
Have the program do what computers do best.
Be generous in what you accept.

9.11 Structure Editors

One idea that keeps recurring is that of a "structure editor." In general, a structure editor limits editing to valid transformations on the object being edited. Such editors are often used as programming language editors. In those cases, there may be a command to "insert an 'if' statement." The user then sees something like this:

if <#> then <> else <>

(This example does not use the C language. In general, people don't build structure editors for the C language.) The point is positioned at the "#" character and the user is then allowed to make syntactically valid transformations to continue programming. These editors are often found as the subject of research papers. For reasons that will be described, it is fortunate that they are not often found anywhere else.

Of course, the user is in trouble if, for example, he or she decides to negate the condition and make the "else" clause into the "then" clause. If the user is lucky, there will be an editor command to do this operation. If not, the user may have to "cut" the else part and "paste" it into the then part. Or worse, the user may be forced to delete the else part and retype it in the then part.

It may well be possible to create a structure editor that is also a good editor design. However, in all my research I have never seen one. There are two reasons why creating such an editor is difficult:

First, while the structure-oriented operations may be well suited to the process of writing a program, they are not well suited to the process of editing one. The distinction is a subtle but important one. The examples (usually shown in the papers) all show how easy it is to write programs this way. After all, it is a nice typing aid to be able to insert many characters of language statement with a short command. However, most of the work involved in programming is in editing programs that are already written. Editing operations are often ugly and involve intermediate states that are not valid language syntax. It is just in those areas that the structure-oriented operations start getting in the way.

Second, there is no carryover from one part of the editing task to another. Sure, it may be easier to write the program, but the task of editing the text strings and comments in the program has not been addressed by the programming-language editing commands. The user still needs a full-feature editor to handle the string constants, the comments, and other documentation that are an integral part of any programming project. By adding the structure editor, either a completely separate editor or a complicated new mode has been introduced and consistency has been lost. (I will completely skip over the question of how to handle the programmer who is editing more than one programming language. I will point out that I have worked on projects where I have been editing programs written in more than five languages at the same time.)

Note that the arguments just presented address the concept of a structure editor, not any one editor in particular.

9.12 Programming Assistance

Even if the structure editor approach is not the best, there are still techniques that can be used to help write and edit programs. If you like, it could be said that these techniques are adapted from structure editors. However, the origins of these techniques are lost in the mists of history and no one knows which was developed first: structure editors or the adaptations to general editors.

Typing aids: The first technique is to have commands that serve as typing aids. These aids would insert statements or statement parts by typing just a few characters (presumably fewer than the statements or parts themselves!). In this way, users gain the "express typing" benefits of structure editors.

Language modes: Further, a language mode can tailor the effects of commands to suit the characteristics of the language. For example, the commands that move by or manipulate words would be adjusted to use language tokens; those that use sentences would be adjusted to use language statements; and those that use paragraphs would be adjusted to use a statement block or procedure. In addition, commands that perform indentation can also be modified to handle statement nesting.

However, these alterations change how commands work and so some predictability is lost. In addition, while the alterations would presumably only be made for buffers that hold programs, these programs include comments. Hence, the commands need to "know" whether they are operating in a comment and thus whether to use the altered behavior. Even so, not all users are happy with such changes. Hence, there should be a way for users to turn them off. (I, for example, prefer to disable all language modes when editing.)

Syntax checking: Structure editors offer good syntax checking. Most even prevent you from creating a syntactically incorrect program. While possible to implement, syntax checking is not clearly appropriate for an editor, given that a better alternative may be available. This better alternative is for users to be able to invoke the compiler from within the editor, and to have the editor be able to parse the compiler's error and warning messages.

Simple syntax checking is a feature that looks useful on the surface, but turns out not to be very useful in practice, as good programmers tend to make relatively few syntax errors. Syntax checking can catch errors like:

ovid Foo(){return (a -+ b;}

This example has a misspelled keyword, a missing close parenthesis, and an illegal operator combination. Syntax checking can catch the last two of these, but not the first: at the syntax level, there is no way to tell whether "ovid" is a misspelled keyword or a programmer-defined type.

Semantic checking can catch the "ovid" problem, as well as missing declarations, mismatched types, and other such problems. With programs spread across multiple files, there is simply no way that an editor would be able to assemble the information required to perform the correct analysis. It is up to the language compiler to perform that function.

Compiler invocation: And so we bring up the best way for an editor to help in program development. That is for the user to quickly and easily be able to invoke the compiler and work with the results. The commands might be "compile this file," "move to the place indicated by the next error message," and so forth. The lesser features such as syntax checking would only be used on systems where invoking the compiler from within the editor is expensive (in user time, not computer power).
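A sketch in C of the parsing half of "move to the place indicated by the next error message." It assumes the compiler reports errors in the common "file:line: message" form (GCC-style, optionally with a column after the line); other compilers use other formats, and a file name containing a colon (such as a DOS drive letter) would defeat this simple parser. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse one message of the assumed form "file:line: text" (or
   "file:line:column: text"), storing the file name and line number. */
static bool parse_error_line(const char *msg, char *file, size_t file_size,
                             long *line)
{
    const char *colon = strchr(msg, ':');
    if (colon == NULL || colon == msg || !isdigit((unsigned char)colon[1]))
        return false;                    /* not a "file:line" message */
    size_t len = (size_t)(colon - msg);
    if (len >= file_size)
        return false;                    /* name too long for caller's buffer */
    char *end;
    long n = strtol(colon + 1, &end, 10);
    if (*end != ':' || n <= 0)
        return false;
    memcpy(file, msg, len);
    file[len] = '\0';
    *line = n;
    return true;
}

/* "Next error": scan the captured compiler output from *pos onward and
   report the next message that names a file and line. */
static bool next_error(const char *const *output, size_t count, size_t *pos,
                       char *file, size_t file_size, long *line)
{
    while (*pos < count)
        if (parse_error_line(output[(*pos)++], file, file_size, line))
            return true;
    return false;
}
```

Lines that are not error messages (such as build-tool chatter) are simply skipped, so the editor can step through the output with repeated "next error" commands.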

9.13 Command Behavior

This section describes some of the considerations involved with designing some of the commands. As with other parts of the book, the purpose of this section is not to say merely "do it this way," but to review why different approaches should be considered.

9.13.1 Does Down Move the Point or the Text?

Let's say that the point is somewhere in the middle of a large buffer, large enough that neither the top nor the bottom is on the display. The user gives the "move down a line" command (say, by pressing the down-arrow key). What happens?

First, the point (and cursor) could both move down one line. The user is thinking "I want to move down" and lo, the point moves down.

A variant on this choice is to move the point down one line, but to move the text of the buffer up in the window. This variant has the unfortunate property that the cursor is always kept in the center of the window.

There is another choice. The cursor could stay in the same place, and the text of the buffer move down. In effect, this moves the point up one line.

A moment's thought shows that both choices are indeed valid interpretations of "down." In fact, both have been implemented many times. All modern editors now use the first interpretation. In fact, you might be wondering why anyone would select the second interpretation.

The descriptions just given do not reveal why the two interpretations arose. For that information, we have to step from the world of text editing into the world of computer graphics.

Consider the following picture:

[Figure: a user looking at a display that shows part of a buffer ("The quick red fox jumps over the lazy ..."); labels: user, display, buffer.]

In this view, the user sees the text on a display. Now let us redraw this picture more abstractly:

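[Figure: a user looking through a window onto the buffer ("The quick red fox jumps over the lazy ..."), seeing only the parts of the lines that fall inside the window, such as "ed f" and "umps"; labels: user, window, buffer.]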

In this view, the user is "looking through" a window onto the text. The user sees only that part of the text that can be seen through the window. Now the question "when the user gives the 'move down one line' command, does the command move the window or the text?" makes sense.

9.13.2 Scrolling vs. Paging

Closely related to the previous point is whether to scroll or page the screen. Again, there are two choices, and again, we are considering the case where the user is giving "move down one line" commands. In all cases, the point is moving down one line at a time. The question relates to how the cursor moves on the screen.

First, the cursor can move down one line at a time until it reaches the bottom of the screen (possibly with a line or so of "guard zone"). Once there, the whole screen moves up one "page" and the cursor is re-centered.

Second, the cursor could stay in the same place on the screen and the text could move up by one line.

The second method offers the advantage that the maximum amount of surrounding text is always visible. However, it offers the much more severe disadvantage that "just moving around" appears to be constantly changing the text. That is quite distracting to users. It also ties what the user sees to where the user is making changes. Thus, if the display has a 24-line screen and the preferred row is line 10, the user is out of luck if he or she wants to make changes while looking at text that is more than 10 lines before the point or 14 lines after it.

A third method would be to have the cursor move down to a guard zone, then scroll the screen instead of paging it. This method offers better continuity than does just paging. It is especially nice if you repeatedly give the "move down line" command. Personally, I find it exasperating because your context gradually reduces to the size of the guard zone, then stays there. With paging, the context is automatically restored to about one-half of the window. When using editors that select this method, I have to give the "recenter window" command much more often than I like to.
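The three policies can be compared side by side with a small C function that decides which buffer line to show at the top of the window after the point moves. Window height, guard-zone size, and the preferred row are parameters; the function and enum names are hypothetical, and clamping at the end of the buffer is omitted for brevity.

```c
enum scroll_policy { POLICY_PAGE, POLICY_FIXED_ROW, POLICY_SCROLL };

/* Return the buffer line to display at the top of the window after the
   point moves to point_line. Lines and rows number from 0. */
static long new_window_top(enum scroll_policy policy, long top, long point_line,
                           int height, int guard, int row)
{
    long first_ok = top + guard;               /* first row past the top guard zone */
    long last_ok = top + height - 1 - guard;   /* last row before the bottom guard zone */
    long result;

    switch (policy) {
    case POLICY_FIXED_ROW:     /* cursor stays on its row; the text moves */
        result = point_line - row;
        break;
    case POLICY_SCROLL:        /* move just far enough to leave the guard zone */
        if (point_line > last_ok)
            result = top + (point_line - last_ok);
        else if (point_line < first_ok)
            result = top - (first_ok - point_line);
        else
            result = top;
        break;
    default:                   /* POLICY_PAGE: re-center on reaching the guard zone */
        if (point_line > last_ok || point_line < first_ok)
            result = point_line - height / 2;
        else
            result = top;
        break;
    }
    return result < 0 ? 0 : result;   /* never above the start of the buffer */
}
```

With a 24-line window, a one-line guard zone, and the window starting at line 100, moving the point to line 123 pages the window to start at 111 (the cursor lands mid-window), scrolls it by just one line to 101, or, with a fixed preferred row of 10, moves it to 113.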

9.13.3 Page Breaks

This is more of an issue with word processors than text editors, but it is worth mentioning anyway. The problem arises when the program is displaying the buffer in its "printed form," including page breaks. It is very tempting for the implementer to always position the page break at the top of the display. However, it is important that users be able to see the text just before and after the page break at the same time.

9.13.4 How Many Ways Can You Move by a Word?

When writing commands that operate on words, the first question that arises is "what is a word?" We need a definition that is both possible and easy to implement. We will approach a definition in a series of refinements.

The first step is to consider all of the characters between white space to be one word. With this definition, the sequence:

This is a very-strange test sentence, isn't it?

would be considered to be eight words. While a good start, it is not sufficient, as the following sequence would be considered only two words:

here--a phrase

But this sequence would be considered four words:

here -- a phrase

Thus, the next step is to define a word as a string of letters and digits. With this definition, the three examples would be considered to have ten, three, and three words, respectively. This definition has the advantage that for something to be considered a word, there should be something "word-like" about it (i.e., the characters "--" are not considered to be a word). In addition, the presence or absence of extra spaces around the "--" does not change the number of words: a good sign that we are on the right track.
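Both definitions so far can be checked against the three examples with a short C sketch that counts runs of "word" characters; only the character test changes between them. The function names are illustrative.

```c
#include <ctype.h>
#include <stdbool.h>

/* First definition: anything that is not white space is part of a word. */
static bool not_space(int c) { return !isspace(c); }

/* Refined definition: only letters and digits are part of a word. */
static bool letter_or_digit(int c) { return isalnum(c) != 0; }

/* Count maximal runs of word characters under the given definition. */
static int count_words(const char *s, bool (*is_word)(int c))
{
    int words = 0;
    bool in_word = false;
    for (; *s != '\0'; s++) {
        bool w = is_word((unsigned char)*s);
        if (w && !in_word)
            words++;           /* a new word starts here */
        in_word = w;
    }
    return words;
}
```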

Our refinement can stop right here and be considered acceptable. There are some changes that we can make, but these changes are not uniformly considered improvements.

The first change is to add some characters to the "word" characters in language modes. For example, when writing C programs, the underscore character ("_") is legal within a token. By adding this character to the letters and digits, a "move by word" command will now properly move by tokens.

The second change is to add other characters, such as dash ("-") and quote ("'"). But these are added in a special way: they must be surrounded by letters or digits in order to be considered as part of a word. This change allows the "very-strange" and "isn't" parts of the sequence to be considered as single words.
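The two optional changes might be combined into one character test, sketched here in C under the assumption that a flag says whether a C language mode is active. A dash or quote counts as a word character only when a letter or digit sits on each side of it. The names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Is the character at s[i] part of a word? */
static bool is_word_at(const char *s, size_t len, size_t i, bool c_mode)
{
    unsigned char c = (unsigned char)s[i];
    if (isalnum(c))
        return true;
    if (c_mode && c == '_')          /* language mode: '_' is legal in C tokens */
        return true;
    if (c == '-' || c == '\'')       /* only when surrounded by letters or digits */
        return i > 0 && i + 1 < len
            && isalnum((unsigned char)s[i - 1])
            && isalnum((unsigned char)s[i + 1]);
    return false;
}

static int count_words_refined(const char *s, bool c_mode)
{
    size_t len = strlen(s);
    int words = 0;
    bool in_word = false;
    for (size_t i = 0; i < len; i++) {
        bool w = is_word_at(s, len, i, c_mode);
        if (w && !in_word)
            words++;
        in_word = w;
    }
    return words;
}
```

Under this rule "very-strange" and "isn't" are single words and "--" still separates words; note that "very-long-hyphenated-phrase" also becomes a single word.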

However, suppose that you had a very-long-hyphenated-phrase. It probably makes sense to consider this phrase as four separate words. In particular, it is better to err on the side of dividing one "word" into two rather than combining two "words" into one. For example, in our very-long-hyphenated-phrase, it would be difficult to change the "hyphenated" to "dashed" if word motion commands considered the whole thing as one word.

9.13.4.1 Moving by Words

As with many of the other topics here, there are two popular ways of moving forward by words. Oddly, though, there is only one popular way of moving backwards.

One way to move forward is to move to the end of the word. For example, if we had the text:

one two three four
     ^

and the cursor was at the "w" (which means that the point is between the "t" and the "w"), and a "move forward word" command were given, this move would leave the point here:

one two three four
       ^

i.e., just before the white space after the word. The other method would move the point to the start of the next word:

one two three four
        ^

When moving backwards, both methods leave the point at the start of the word:

one two three four
    ^

The difference may become slightly more clear if we look at the code that might be used to implement these commands. Assuming that the constant WORDSTRING contains the characters that comprise a word, the code to move backward a word is this:

Find_First_In_Backward(WORDSTRING);      /* skip back over non-word characters */
Find_First_Not_In_Backward(WORDSTRING);  /* skip back to the start of the word */

The code to move forward to the end of the word is the same code, with "Backward" changed to "Forward":

Find_First_In_Forward(WORDSTRING);Find_First_Not_In_Forward(WORDSTRING);

Finally, the code to move forward to the start of the next word isthis:

Find_First_In_Forward(WORDSTRING);Find_First_Not_In_Forward(WORDSTRING);Find_First_In_Forward(WORDSTRING);
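You can try out the three excerpts with hypothetical stand-ins for Find_First_In_Forward and its relatives. This sketch makes two assumptions that do not come from the book's buffer interface. First, a buffer is a character array plus a point index. Second, WORDSTRING holds only the ASCII letters and digits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal buffer: the point sits between text[point - 1]
   and text[point]. */
struct buf {
    const char *text;
    size_t len;
    size_t point;
};

static const char WORDSTRING[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

static int in_set(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Advance until the character after the point is (want_in = 1) or is not
   (want_in = 0) in set: Find_First_In_Forward / Find_First_Not_In_Forward. */
static void find_forward(struct buf *b, const char *set, int want_in)
{
    while (b->point < b->len && in_set(b->text[b->point], set) != want_in)
        b->point++;
}

/* Back up until the character before the point is (or is not) in set. */
static void find_backward(struct buf *b, const char *set, int want_in)
{
    while (b->point > 0 && in_set(b->text[b->point - 1], set) != want_in)
        b->point--;
}

void backward_word(struct buf *b)
{
    find_backward(b, WORDSTRING, 1);
    find_backward(b, WORDSTRING, 0);
}

void forward_word_end(struct buf *b)        /* the first method */
{
    find_forward(b, WORDSTRING, 1);
    find_forward(b, WORDSTRING, 0);
}

void forward_word_start(struct buf *b)      /* the second method */
{
    find_forward(b, WORDSTRING, 1);
    find_forward(b, WORDSTRING, 0);
    find_forward(b, WORDSTRING, 1);
}
```

Start with the point at the "w" of "one two three four", which is index 5. From there, forward_word_end leaves the point at 7, forward_word_start leaves it at 8, and backward_word leaves it at 4. These positions match the diagrams earlier in this section.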

The first method, to leave the point at the end of the current word, has the property that it is symmetric with respect to backward motion. The second method, to leave the point at the start of the following word, lacks that symmetry. On the other hand, it makes it easier to move to the beginning of the following word.

The choice is up to you. I strongly prefer the first method.

9.13.4.2 Deleting by Words

The next question is: "Okay, so I have decided what happens when I move by words. What should happen when I delete by words?"

The first answer, which works very well, is simply that you should delete whatever the corresponding move command would move over. Thus, if the motion command were to move to the start of the next word, the deletion command should delete that same text.

However, consider this case:

This is some text.
             ^
--------------------
And, three lines later, more text.

The "move forward word" command would move to just before the "A". However, is it really desirable that the "delete forward word" command delete the lines in between, including the row of dashes? Well, yes, if you want to be consistent (and predictable). This is one of the reasons why this style of word motion is not the best one to use.

The second answer would be to be "intelligent." This view is used by Apple Computer (see Apple 1987). In it, you would change exactly what was deleted based on circumstances. For example, with the text:

Here is some text.
        ^

a "delete forward word" command would delete the word "some" and the following space, thus leaving this:

Here is text.
        ^

instead of this:

Here is  text.
        ^

This definition has the advantage that deleting a word deletes the "supporting structure" for the word as well, and thus leaves the text as if the word had never been there. It also means that "sloppy" users don't leave stray extra spaces around. It has the disadvantage that if you want to replace one word with a new one -- a very common operation -- you now have to re-type the space. This definition also loses predictability in that only white space is so treated: commas, periods, and other punctuation marks are not. So, to replace "old" with "new" in the following text:

This is an old word.
           ^

you give the "delete forward word" command, then type "new ", but to do the same replacement with the text:

This is an old, word.
           ^

you give the "delete forward word" command, then simply type "new". Now explain that to a user quickly and painlessly. (Note that the Apple Human Interface Guidelines do not provide for word-level operations other than selection.)
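Here is one way to sketch the "intelligent" variant in C. It assumes a modifiable NUL-terminated string and treats letters and digits as word characters. This sketch also swallows every following space or tab, rather than exactly one; that choice is an assumption, not something the guidelines specify.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical "intelligent" delete-forward-word: delete from the point to
   the end of the next word, then any blanks (spaces or tabs) after it.
   Punctuation after the word is deliberately left alone. */
void smart_delete_forward_word(char *text, size_t point)
{
    size_t len = strlen(text);
    size_t end = point;

    if (point > len)
        return;
    while (end < len && !isalnum((unsigned char)text[end]))  /* reach a word */
        end++;
    while (end < len && isalnum((unsigned char)text[end]))   /* skip the word */
        end++;
    while (end < len && (text[end] == ' ' || text[end] == '\t'))
        end++;                                  /* the supporting white space */
    memmove(text + point, text + end, len - end + 1);  /* + 1 moves the NUL */
}
```

At the "s" of "Here is some text.", the function yields "Here is text.". At the "o" of "This is an old, word.", it leaves the comma in place and yields "This is an , word.". This is exactly the inconsistency described above.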

In all fairness to Apple -- and I believe that their guidelines are excellent -- their guidelines are built around a keyboard/mouse combination for user input. This book assumes that only a keyboard is available. Changing such a basic assumption will result in changing some of its conclusions: that is why you should make your choices based on a full understanding of the options and assumptions that apply in your situation.

These examples have all operated on full words. The Apple guidelines do not say how to handle deleting parts of words, because they only support whole words as objects. However, you are free to invent your own semantics for handling partial words in an "intelligent" manner.

9.13.5 Where Do Sentences and Paragraphs End?

I will start with the cheery statement that there is no way to correctly determine the ends of all possible valid English sentences by analyzing syntax alone. Why not?

Consider these text sequences:

This is a sentence.
The value 3.14159 is close to the value of pi.
The value 3. is close to the value of pi.
Dr. Martin is a medical doctor.
I hate typing long words and prefer to abbrev. them.

We all know that a period is used to end a sentence. The second example shows that periods can occur within a sentence. The third shows that periods can end a token and yet not end a sentence. The fourth shows that the token can be a word. The last shows that you can't just work off a list of known abbreviations. When considering these examples, remember that we are applying semantic knowledge to the statements, something that is beyond the ability of most computer programs. From the program's point of view, the sequences might as well be:

Xxxx xx x xxxxxxxx.
Xxx xxxxx 0.00000 xx xxxxx xx xxx xxxxx xx xx.
Xxx xxxxx 0. xx xxxxx xx xxx xxxxx xx xx.
Xx. Xxxxxx xx x xxxxxxx xxxxxx.
X xxxx xxxxxx xxxx xxxxx xxx xxxxxx xx xxxxxx. xxxx.

All that said, what is a way out? The definition that I have found works best, worked out by trial and error and refined over a period of years, is this:

Find_First_In_Forward(".?!:");
Point_Move(1);
Find_First_Not_In_Forward("\"')]}");
if (xiswhite(Get_Char())) {
	/* you are at a sentence end */
}

This definition looks for any of the sentence-ending characters (colons are considered to end sentences here). It then skips over any of the characters that tend to follow sentence ends. Finally, it checks for white space.

This definition has the unfortunate property that the last three examples are considered to be two sentences each. On the other hand, it has the advantage that it works fairly well on non-contrived examples. And, as with words, it is better to count one sentence as two than to treat two sentences as one.

Note that only one white-space character (any character of Space, Tab, newline, or whatever else you wish to include) is required to end a sentence. Depending upon which style manual you follow, either one or two spaces should be included after the end of a sentence. However, even if your style manual requires two spaces, your users may not use that manual or may simply forget to type the extra space. Hence, you should not penalize them for the omission.
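You can check the rule against the five example sentences with a small, self-contained C sketch. It follows the same steps as the excerpt above, but it works on a plain string instead of a buffer. Treating newline as white space follows the note above. Treating the end of the text as a sentence end is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if c is one of the characters in set (NUL never counts). */
static int in_set(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Hypothetical helper: unlike the book's xiswhite (space or tab), this
   also accepts newline. */
static int is_sentence_white(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Counts sentence ends: a character from ".?!:", then any closers from
   "\"')]}", then white space (or, by assumption, the end of the text). */
size_t count_sentence_ends(const char *text)
{
    size_t len = strlen(text);
    size_t i = 0, count = 0;

    while (i < len) {
        while (i < len && !in_set(text[i], ".?!:"))    /* Find_First_In_Forward */
            i++;
        if (i >= len)
            break;
        i++;                                           /* Point_Move(1) */
        while (i < len && in_set(text[i], "\"')]}"))   /* skip the closers */
            i++;
        if (i >= len || is_sentence_white(text[i]))
            count++;
    }
    return count;
}
```

The function reports 1, 1, 2, 2, and 2 sentence ends for the five examples. The period in "3.14159" is not counted, but the periods after "3", "Dr", and "abbrev" are counted.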

I know of one implementation that makes life difficult for its users. Its end-of-sentence definition requires two spaces. Yet, when you refill a paragraph, it removes the second space. Thus, you can move by sentences until you first refill the paragraph, after which the entire paragraph is treated as one sentence...

Paragraph ends are much easier than sentence ends. If you are using word wrap, each paragraph will be one line long. Thus, a newline character will end a paragraph. Otherwise, a newline followed by any white-space character (Space, Tab, newline, etc.) will mark the end of the paragraph.

If you are fortunate (or unfortunate, depending on your outlook) enough to use a text formatter, you will want to include your formatter-command characters as paragraph-break characters. For example, if you are using the nroff formatter, you will want a paragraph break here:

This is some text.
.LP
This is some more text.

So just look for a newline followed by either white space or a dot.
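Here is a minimal C sketch of the paragraph-end test. It assumes the text is a character array and that i indexes a candidate newline. The word_wrap flag and the handling of a newline at the very end of the text are assumptions made for illustration.

```c
#include <stddef.h>

/* Returns nonzero if a paragraph ends at the newline text[i]. With word
   wrap, every newline ends a paragraph; otherwise the newline must be
   followed by white space or, for nroff-style input, a dot. */
int is_paragraph_end(const char *text, size_t len, size_t i, int word_wrap)
{
    char next;

    if (i >= len || text[i] != '\n')
        return 0;
    if (word_wrap)
        return 1;
    if (i + 1 >= len)
        return 1;       /* assumption: the end of the text ends a paragraph */
    next = text[i + 1];
    return next == ' ' || next == '\t' || next == '\n' || next == '.';
}
```

In the nroff example, the newline before ".LP" ends a paragraph. The newline after ".LP" does not end one unless word wrap is in effect.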

Moving and deleting by sentences and paragraphs involves all of the same problems as moving and deleting by words. See the earlier discussion.

9.13.6 How to Search

This topic offers lots of choices, all of them good. The question is more one of how much work you want to put into your implementation than of which is the correct approach.

The first choice is between buffered searching and incremental searching. Buffered searching means that your implementation prompts for a search string, waits for the user to enter the complete string, then performs the search. It works and is easy to implement, but it is not as good as incremental search.

Incremental search means that the implementation searches as the user types. Here is an example:

user types	what is done


^S	start incremental search
a	find the first "a"
b	(1) find the first "ab"
c	find the first "abc"
^H	erase the "c" from the string; go back to where you were after (1)
d	find the first "abd"
^S	(2) search again for "abd": you are now at the second one
^S	search again; you are now at the third one
^H	get rid of the last ^S; go back to where you were after (2)

You get the idea. The commands available can be as powerful as you like. This is clearly a much nicer way to search than buffered searching. Just as clearly, it is more work to implement.
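One common way to get the ^H behavior in the table is to record the search state after every keystroke and to pop that state on ^H. The sketch below is a hypothetical C illustration of that idea, not a description of any particular editor. It has three limitations: it uses strstr for the actual matching, it caps the history at an arbitrary ISEARCH_MAX entries, and it does not wrap around at the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

#define ISEARCH_MAX 64   /* hypothetical limit on remembered keystrokes */

/* The state after each keystroke, so that ^H can return to the previous one. */
struct isearch {
    const char *text;               /* the buffer being searched */
    char pattern[ISEARCH_MAX + 1];  /* the current search string */
    size_t plen[ISEARCH_MAX + 1];   /* search-string length after each step */
    size_t pos[ISEARCH_MAX + 1];    /* match position after each step */
    int failed[ISEARCH_MAX + 1];    /* nonzero if that step found nothing */
    int depth;                      /* index of the current step */
};

void isearch_start(struct isearch *s, const char *text)
{
    s->text = text;
    s->pattern[0] = '\0';
    s->plen[0] = 0;
    s->pos[0] = 0;
    s->failed[0] = 0;
    s->depth = 0;
}

/* Record a new step: search for the first plen characters of the pattern,
   starting at offset from. On failure the match position stays put. */
static void push_search(struct isearch *s, size_t plen, size_t from)
{
    size_t prev = s->pos[s->depth];
    const char *hit = NULL;

    s->pattern[plen] = '\0';
    if (from <= strlen(s->text))
        hit = strstr(s->text + from, s->pattern);
    s->depth++;
    s->plen[s->depth] = plen;
    s->failed[s->depth] = (hit == NULL);
    s->pos[s->depth] = hit != NULL ? (size_t)(hit - s->text) : prev;
}

/* A printing character: extend the string and search from the current match. */
int isearch_char(struct isearch *s, char c)
{
    size_t n = s->plen[s->depth];

    if (s->depth >= ISEARCH_MAX)
        return -1;
    s->pattern[n] = c;
    push_search(s, n + 1, s->pos[s->depth]);
    return s->failed[s->depth] ? -1 : 0;
}

/* ^S: search again for the same string, just past the current match. */
int isearch_again(struct isearch *s)
{
    size_t n = s->plen[s->depth];

    if (s->depth >= ISEARCH_MAX || n == 0)
        return -1;
    push_search(s, n, s->pos[s->depth] + 1);
    return s->failed[s->depth] ? -1 : 0;
}

/* ^H: undo the last keystroke, whether it was a character or a ^S. */
void isearch_rubout(struct isearch *s)
{
    if (s->depth > 0) {
        s->depth--;
        s->pattern[s->plen[s->depth]] = '\0';
    }
}
```

Replay the table against the text "a ab abc abd abd abd". The match moves to 0, 2, and 5. It then goes back to 2, moves on to 9, 13, and 17, and finally goes back to 13.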

The next question is what the search string should be. The simplest case is that the editor searches for the string exactly as typed. Thus, this string:

Some text.

would match only the string "Some text." in the buffer. While simple, it is not necessarily useful.

The first alternative way to search would be to simply ignore upper and lower case. Thus, the string would also match "some text." and "SOME TEXT." and "SoMe TeXt."

Another way to search would be to have lower-case characters in the search string match either upper or lower case in the buffer, but an upper-case character in the search string match only upper-case characters in the buffer. The search string would then match "SOME TEXT." and "SoMe TeXt." but not "some text." This way is more useful than one might think, because you can enter "ROM" in order to find "ROM" but not the "rom" in "from", yet you can still find both "the" and "The" with one search string.
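The character-matching rule is small enough to show directly. This C sketch assumes single-byte characters and a naive left-to-right scan. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Lower-case search characters match either case; upper-case and
   non-letter search characters must match exactly. */
static int chars_match(char pat, char buf)
{
    int p = (unsigned char)pat;
    int b = (unsigned char)buf;

    if (islower(p))
        return tolower(b) == p;
    return p == b;
}

/* Returns the index of the first match of pat in text, or -1 if none. */
long smart_case_find(const char *text, const char *pat)
{
    size_t i, j;

    if (pat[0] == '\0')
        return 0;
    for (i = 0; text[i] != '\0'; i++) {
        for (j = 0; pat[j] != '\0' && text[i + j] != '\0'; j++)
            if (!chars_match(pat[j], text[i + j]))
                break;
        if (pat[j] == '\0')
            return (long)i;
    }
    return -1;
}
```

With this rule, "ROM" skips the "rom" in "from", and "the" still finds "The".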

Another search variation is whole-word match. Thus, one could search for "the" without finding "then". In addition, it would be handy to be able to allow varying amounts of white space to match. Thus, our "Some text." string would match

Some
text.

and

Some      text.
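Here is a hypothetical C sketch of both variations. Any run of white space in the search string matches any non-empty run of spaces, tabs, or newlines in the text. An optional whole-word flag rejects matches that touch a letter or digit. The brute-force scan was chosen for clarity, not for speed.

```c
#include <ctype.h>
#include <stddef.h>

static int is_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Tries to match pat at text + i; returns the length of the matched text,
   or -1 on failure. */
static long match_at(const char *text, size_t i, const char *pat)
{
    size_t t = i, p = 0;

    while (pat[p] != '\0') {
        if (is_ws(pat[p])) {
            if (!is_ws(text[t]))
                return -1;
            while (is_ws(pat[p]))    /* one run in the pattern ...     */
                p++;
            while (is_ws(text[t]))   /* ... matches one run in the text */
                t++;
        } else {
            if (text[t] != pat[p])
                return -1;
            t++;
            p++;
        }
    }
    return (long)(t - i);
}

/* Returns the index of the first match, or -1. With whole_word set, the
   match must not be preceded or followed by a letter or digit. */
long flexible_find(const char *text, const char *pat, int whole_word)
{
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        long n = match_at(text, i, pat);
        if (n < 0)
            continue;
        if (whole_word &&
            ((i > 0 && isalnum((unsigned char)text[i - 1])) ||
             isalnum((unsigned char)text[i + (size_t)n])))
            continue;
        return (long)i;
    }
    return -1;
}
```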

You can get as complex as you like with all of this, up to wildcards and UNIX-style regular expressions. Just don't forget to include a way to quote any special characters so that the user can search for them exactly.

9.13.7 Commands to Handle Typos

There are two very common forms of typographic errors for which special commands can be helpful.

9.13.7.1 Capitalization Commands

One typographic error is the incorrect upper/lower case of a character, part of a word, or a word. Two different forms of commands can be defined to handle this task.

The first form operates as a "move forward word" command, but forces all characters moved over into UPPER, lower, or Capitalized case (the latter means the first character in upper case and all others in lower case). Note that three separate commands are required.

The second form is a "rotate case" command. The definition that I use acts differently depending upon whether you are within a word or between words. In the former case, it affects all characters between the point and the end of the word. In the latter case, it affects the entire previous word. In either case, it examines the current state of the word and rotates it among word -> Word -> WORD -> word, etc. The point is not moved. This definition turns out to be very handy.
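Here is a minimal C sketch of the rotation step. It assumes that the caller has already used the word-motion code to find the range to change (from the point to the end of the word, or the previous word). The text above does not say how to treat mixed case such as "wOrD"; this sketch assumes it rotates to "word".

```c
#include <ctype.h>
#include <stddef.h>

/* Rotates text[start..end) through word -> Word -> WORD -> word.
   Only characters are rewritten; the point is not moved. */
void rotate_case(char *text, size_t start, size_t end)
{
    size_t i, first = end;   /* first: index of the first letter, if any */
    int lower = 0, upper = 0;

    for (i = start; i < end; i++) {
        unsigned char c = (unsigned char)text[i];
        if (isalpha(c) && first == end)
            first = i;
        if (islower(c))
            lower++;
        else if (isupper(c))
            upper++;
    }
    if (first == end)
        return;                               /* no letters: nothing to do */

    if (upper == 0) {                         /* word -> Word */
        text[first] = (char)toupper((unsigned char)text[first]);
    } else if (lower > 0 && upper == 1 && isupper((unsigned char)text[first])) {
        for (i = start; i < end; i++)         /* Word -> WORD */
            text[i] = (char)toupper((unsigned char)text[i]);
    } else {
        for (i = start; i < end; i++)         /* WORD (or mixed) -> word */
            text[i] = (char)tolower((unsigned char)text[i]);
    }
}
```

A one-letter word is both "Word" and "WORD". Because "Word" is recognized only when a lower-case letter follows the capital, a one-letter word simply alternates between "a" and "A" instead of getting stuck.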

9.13.7.2 Twiddling

Another typographic error is interchanging two characters, for example, typing "teh" instead of "the". There are three forms of the command to fix this.

The first form interchanges the two preceding characters. Thus:

abcd
  ^

becomes

bacd
  ^

The second form interchanges the two surrounding characters. Thus our original becomes:

acbd
  ^

Neither of the first two forms moves the point. The third form moves the point, and our original becomes:

acbd
   ^

The advantage of this form is that repeated executions of the command serve to "drag" a character along.
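The three forms differ only in which pair of characters is swapped and in whether the point moves. Here is a C sketch that assumes a modifiable NUL-terminated string and a point index. The function names are hypothetical, and each function returns -1 when there are not enough characters to swap.

```c
#include <string.h>

static void swap_chars(char *text, size_t i, size_t j)
{
    char tmp = text[i];
    text[i] = text[j];
    text[j] = tmp;
}

/* Form 1: interchange the two characters before the point. */
int twiddle_preceding(char *text, size_t point)
{
    if (point < 2)
        return -1;
    swap_chars(text, point - 2, point - 1);
    return 0;
}

/* Form 2: interchange the characters on either side of the point. */
int twiddle_surrounding(char *text, size_t point)
{
    if (point < 1 || point >= strlen(text))
        return -1;
    swap_chars(text, point - 1, point);
    return 0;
}

/* Form 3: like form 2, then move the point forward one character, so that
   repeating the command drags a character along. */
int twiddle_and_move(char *text, size_t *point)
{
    if (twiddle_surrounding(text, *point) != 0)
        return -1;
    (*point)++;
    return 0;
}
```

Starting from "abcd" with the point at 2, repeated use of the third form produces "acbd" and then "acdb", dragging the "b" to the end.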

Questions to Probe Your Understanding

Consider how your favorite editor's command set and implementation meet the design guidelines. (Medium)

Define a command that you would like to see in an editor. (Easy)

What are some different ways to handle moving down a line (hint: consider how to handle variable-width characters)? (Easy)

How do the word, sentence, and paragraph problems change when languages other than English are considered? (Medium)

Define a complete command set. (Hard)

Define a good structure-editor command set. (Hard)


Ten: Emacs-Type Editors

"O frabjous day! Callooh! Callay!"
He chortled in his joy.

With a thesis subtitle of "A Cookbook for an Emacs" and this book's subtitle of "Emacs for the Modern World," it is a safe bet that I would get around to discussing Emacs-type editors at some point. And so I have.

First, I will admit to being biased. I have used Emacs-type editors exclusively for going on fifteen years now (except for a brief stint in durance vile on a Macintosh). In addition, I have implemented or worked on implementations of a half-dozen Emacs-type editors. This chapter will describe what it is about an Emacs-type editor that makes it special.

10.1 "What Do You Mean, 'Emacs-type'?"

This chapter discusses "Emacs-type" editors and not a particular "Emacs" editor because, by its very nature, there isn't one definitive "Emacs" editor. The closest that you can come to such a thing is either the original Emacs-on-TECO or GNU-Emacs.

10.2 The Command Set

The specifics of the Emacs command set will always be shifting and changing. Even details on how the basic commands operate will change. However, the broad outlines of the basic commands are pretty constant, which is to say that an editor that does not implement them is not considered to be an Emacs-type editor. Appendix C lists one Emacs command set and a set of changes to it.

By and large, the Emacs command set follows the design guidelines presented in the previous chapter. Those places where it falls short are usually due to physical constraints (there are only a limited number of keys available), design compromises (to achieve A, one must trade off B), or the ever-present "historical reasons" (it was a good idea at the time and, while we know better now, it is too ingrained to change). All things considered, the Emacs command set meets the design guidelines better than any other editor that I am aware of.

Some editors never get beyond providing the command set. These editors are not extensible and do not provide an extended environment. These limitations may be due to the implementation environment or merely the amount of effort that is available to devote to the project. There is a place for such editors (they are listed in Appendix B as "command set"). After all, an implementer should be able to decide to create a small editor without feeling obligated to spend years on the project. Once that decision has been made, it is better to use the Emacs command set than any other (at least in my opinion).

10.3 The Extended Environment

Emacs-type editors have been and continue to be used for many things besides editing text. Here are some examples.

Emulation: If you or another user likes a different editor's command set for some reason, you can just emulate it in an Emacs-type editor. Thus, you can still have all of the power of an Emacs and your favorite command set as well.

Electronic mail: A text editor can be the primary interface to a mail system. Messages can be composed by editing a buffer and can be sent with a command. Mail can be read and managed by reading it into a buffer and having commands to perform such operations as "move to the next message" and "summarize all messages." Having the full power of an editor available makes it easy to un-delete an accidentally deleted message or to copy part of the text of a message into one's reply. In addition, you need only minimal additional learning in order to use the system very effectively.

Command shell: A text editor can be the primary interface to the operating system. Command lines can be edited with the full power of the editor before being evaluated. The past record of interaction can be kept and parts of it examined or re-used in new command lines. If the operating system does not have support for advanced terminals, a display editor can offer its interface for use by other programs. Other programs would then take advantage of the terminal independence of the editor. Alternatively, other programs could insert their output into a buffer and the editor would become an entire terminal-management system. This function has been somewhat superseded by window systems. But again, why learn two systems when you only need to learn one?

Compilation: A text editor can work closely with a compiler to speed turnaround when developing software.

Debugging: A text editor can be used by a debugger. Multiple buffers and multiple windows can be used to examine (perhaps multiple) source files, interact with the debugger, and see the output/input of the program as it runs. In addition, a debugger might use a window or two to do such things as constantly show the values of selected variables.

File interface: A text editor can be an interface to a complicated file. For example, an indexed sequential file can be updated by providing editor commands to read and write entries (adding or deleting them can be managed as well). The full power of the editor is available for editing the contents of the entry.

File system interface: A text editor can provide a smooth interface to the file system. A directory can be read by the editor and "edited" by the user. Files can be deleted or otherwise changed in a smooth manner by merely moving to the file name and giving a command (e.g., "delete").

Binary files: A text editor can be used to examine and -- when absolutely necessary -- modify binary files. It can thus replace various patching programs.

Again, all of these functions are currently performed by one or more Emacs-type editors. The main advantage to building them on top of the editor is that, even in the absence of such features, most users spend the bulk of their time using the text editor (or word processor, etc.). By extending that environment, only minimal learning is required to use those features. Users are thus free to get work done instead of having to spend their time reading manuals. This extended environment is one of the hallmarks of an Emacs-type editor.

10.4 Extensibility

The final hallmark of an Emacs-type editor is extensibility. Emacs was born because of the extensibility of its predecessor, TECO. TECO was extensible, and many of its users took advantage of that extensibility to write their own commands or change the existing ones. Eventually, one person (Richard Stallman) took a large number of those extension packages and created a "standard" package of Editor MACroS: EMACS. With extensibility in its heritage and even in its name, an Emacs-type editor is expected to be extensible.

Extensibility means, quite simply, that end users can change any of the features of the editor. There should be no feature that a sufficiently dedicated end user cannot change. This implies that the full source code of the implementation is available to all end users (and that is why GNU-Emacs is distributed under its "CopyLeft" policy).

Now, not every end user will want to recode redisplay. However, the principle remains. Most end users will only want to tweak a few parameters and maybe "fix" a command or two. That's great, and you should encourage such changes by making it easy to make them.

Now that the importance of extensibility has been explained, it is easy to see why the Emacs command set cannot be standardized: each user will want to change it. In a way, the Emacs command set can be compared to a ball of mud. You can add more to it, or take some away, and you will still have a ball of mud. (Actually, this property is true of most extensible systems.) That is also why any comparison between an implementation of an Emacs-type editor and any other editor is pointless: the Emacs-type editor can be changed (and may already have been) to overcome any shortcoming.

Questions to Probe Your Understanding

Identify those ways in which the Emacs command set does not meet the design guidelines listed in the previous chapter. (Easy to Medium)

Identify other editors or similar programs that offer extended environments. (Easy)

Identify other editors or similar programs that are extensible. (Easy)

What other programs would you like to be extensible that aren't? (Easy)


Epilogue

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

And so concludes the verse "Jabberwocky" by Lewis Carroll. A piece of nonsense verse embedded in a nonsense work, it resembles the "real world" about as much as a computer program does, which is perhaps why Lewis Carroll is so popular among computer programmers.

And, before you ask, the chapter structure of this book was worked out well before the fit of the verse was noticed. Right.

Questions to Probe Your Understanding

What parts of "Jabberwocky" fit particularly well with the chapters that they lead off? Particularly poorly? (Easy -- or is it?)


Appendix A: A Five-Minute Introduction to C

This appendix provides a brief introduction to the parts of the C programming language used in the examples presented in this book. It is not a reference manual, nor does it describe all of C. Rather, it assumes that you are familiar with other programming languages in general and just need a nudge or two to follow the examples presented in this book. If you want to learn more about the C language, consult ANSI (1990) or Kernighan (1978). If you are interested in C, you should also look into the C++ language.

The best way to think of the C programming language is as providing the programmer with a mechanism for allocating and naming memory, a control structure, and an extension mechanism. Many people consider it barely a high-level language. Perhaps for that very reason, it is ideally suited for systems and utilities such as text editors.

The declaration mechanism provides the programmer a way to allocate and name memory. Data types are oriented around what is best suited to the hardware.

The language statements provide a control structure mechanism. All of the usual control structures are available. In addition, a full suite of arithmetic and bit operators is available, again oriented around what is best suited to the hardware.

The procedure definition and call mechanism provides the extension mechanism. Many "standard" C operations such as string copy and input/output are implemented in terms of this mechanism.

Comments are enclosed in "/* ... */".

A.1 Case Conventions

The names in this book follow the convention that UPPERCASE names are pre-defined constants, MixedCase names are procedures, and lowercase names are variables. Again, these are conventions, not requirements.

A.2 Data Types and Declarations

All variables and procedures used are declared. Declarations are of the following form:

type variable;

The language supports the following data types:

char	The variable holds one character. Characters are typically 8 bits wide. They may be either signed (range -128 to +127) or unsigned (range 0 to 255) at the implementation's discretion.
int	The variable holds an integer of a size convenient to the hardware. This size is typically 16 or 32 bits.
float	The variable holds a floating-point number.
FILE *	The variable holds a file pointer (a handle for a file opened with fopen).
struct name { <list of declarations> };	The declarations are combined into a single, larger data type named name.
type (*name)();	The variable name holds the address of a procedure that returns a value of type type.
void	No value. Used to indicate that a procedure does not return anything or accepts no arguments.

A declaration of the form

type *

means that the variable holds the address of an object of the specified type. The variable is called a pointer to the specified type. A declaration of the form

type name[constant]

means that the variable holds an array of objects of the specified data type. The array is constant objects long, with elements numbered 0 through constant - 1. The form

type name[constant1][constant2]

is used for a two-dimensional array.
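Here are the declaration forms above, written out as file-scope C declarations. The names are hypothetical. The procedure-pointer example uses a full prototype, int (*op)(int), rather than the older empty-parentheses form shown in the list.

```c
#include <stdio.h>

char initial;            /* one character */
int count;               /* an integer of a size convenient to the hardware */
float ratio;             /* a floating-point number */
FILE *input;             /* a pointer to an open file */
char *cursor;            /* a pointer to char */
char line[80];           /* an array of 80 chars: line[0] through line[79] */
int grid[24][80];        /* a two-dimensional array: 24 rows of 80 ints */

struct position {        /* a structure type named "position" */
    int row;
    int column;
};
struct position here;    /* a variable of that structure type */

static int twice(int n)
{
    return 2 * n;
}

int (*op)(int) = twice;  /* holds the address of a procedure returning int */
```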

The following data types are not part of the language, but represent types used in the examples. An implementation would define these in terms of existing C data types.

FLAG	The variable holds a True or False value. (In C, a zero value is considered to be False and any non-zero value is considered to be True.)
status	The variable holds a success or failure status value. This value may include warning or error information.
location	The variable holds a value that represents a point or mark location within a buffer.
time	The variable holds a value that represents a date and time.
private	You get to define this.

A.3 Constants

Integers are written as themselves (e.g., "56" means the value fifty-six).

Hexadecimal constants are written in the form "0x##", where the ##s are hexadecimal digits.

Character strings are enclosed in double quotes (" "). A NUL terminator (a byte of decimal value 0) is automatically appended to the string by the compiler. Character strings are considered to have the type "array of char."

Character constants are enclosed in single quotes (' '). They are automatically converted to integers whose value is that of the character. For example:

"a"	is an array of two characters, consisting of the characters 'a' and NUL (values 97 and 0 decimal, assuming ASCII).
'a'	is an integer whose value is 97, assuming ASCII.

while:

"abc"	is an array of four characters, consisting of the characters 'a', 'b', 'c', and NUL (values 97, 98, 99, and 0 decimal, assuming ASCII).

The value of the form 'abc' is implementation-defined (some compilers might treat it as an integer whose value is 97 * 65536 + 98 * 256 + 99, but don't count on it).

The character '\b' refers to the ASCII Back Space character(8 decimal).
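You can check these rules with a few declarations. The numeric values assume an ASCII execution character set, and the variable names are made up for the example.

```c
static const char a_string[] = "a";      /* 'a' followed by NUL: 2 chars */
static const char abc_string[] = "abc";  /* 'a', 'b', 'c', NUL: 4 chars */
static const int a_char = 'a';           /* an int: 97 on ASCII systems */
static const int backspace = '\b';       /* Back Space: 8 on ASCII systems */
```

The compiler sizes each array for you. sizeof a_string is 2 and sizeof abc_string is 4, because each count includes the NUL terminator.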

Note: the straight-quote "" / '' syntax is used in all code excerpts. However, the normal English typographical conventions for quotation marks are followed in the body of the text.

A.4 Pre-defined Constants

NEWLINE	The character string that represents a system-specific newline, written "\n".
NL	The character that represents a newline, 10 decimal.
SP	The space character, 32 decimal.
TAB	The horizontal tab character, 9 decimal.
NUL	The nul character, 0 decimal. Character strings are terminated by this character.
NULL	The null pointer: no data object can be at this address.
BUFFERNAMEMAX	The size of the longest possible buffer name plus 1 for the trailing NUL. Possibly 33.
FILENAMEMAX	The size of the longest possible file name plus 1 for the trailing NUL. Typically 1,025.

A.5 Procedure Structure

Procedures have the following structure:

type Name(<arguments>)
{
<local variables>
<statements>
}

The procedure is named Name and returns data of type type (type can be a structure or pointer as well as a basic type). The argument list contains a list of declarations, or the keyword void if the procedure takes no arguments. The local variables are then declared (and may be initialized at each procedure invocation). Last are the procedure statements.

A.6 Statements

The statements are the usual ones. A semicolon (";") terminates a statement. Comments start with "/*" and end with "*/". Statements can be grouped with "{" and "}" characters, so the sequence

{
<statement 1>
<statement 2>
...
<statement n>
}

is equivalent to one statement. White space and columns are not significant.

if (condition) then-statement
if (condition) then-statement else else-statement
for (initializer; end-test; increment) statements	execute the initializer, then the end test; as long as the end test is True, repeat the statements, the increment, and the end test
break;	exit the innermost loop or switch immediately
continue;	skip the rest of the loop body, but don't exit the loop
while (end-test) statements	repeat the end test and statements as long as the end test is True
for (;;) statements	repeat the statements forever: a break or return statement is used to exit the loop
switch (expression) {
case LABEL1: statements break;
case LABEL2: statements break;
...
default: statements break;
}	execute the statements after the label whose value matches the expression, or the statements after "default" if present and no label matches
return(expression);	return a value from a procedure
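Here is a small, made-up procedure that exercises several of these statements. Note in particular that the break inside the switch leaves only the switch, while the break on a newline leaves the for loop.

```c
/* Counts vowels, digits, and other non-space characters in s, stopping at
   the first newline. Returns the index where scanning stopped. */
int classify(const char *s, int *vowels, int *digits, int *others)
{
    int i;

    *vowels = *digits = *others = 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\n')
            break;                  /* leave the for loop immediately */
        if (s[i] == ' ')
            continue;               /* skip the rest of this iteration */
        switch (s[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            (*vowels)++;
            break;                  /* leaves the switch, not the loop */
        default:
            if (s[i] >= '0' && s[i] <= '9')
                (*digits)++;
            else
                (*others)++;
            break;
        }
    }
    return(i);
}

/* The length of a string, computed with a while loop. */
int length(const char *s)
{
    int n = 0;

    while (s[n] != '\0')            /* repeat as long as the test is True */
        n++;
    return(n);
}
```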

A.7 Operators

The (possibly unusual) language operators are these:

=	assignment; not test for equality
==	test for equality; not assignment
!=	test for not equal
!	logical negation: !FALSE becomes TRUE

a+b	returns the sum of a and b
a-b	returns the difference of a and b
a*b	returns the product of a and b
a/b	returns a divided by b
a%b	returns a modulo b
a&b	returns the bitwise and of a and b
a|b	returns the bitwise or of a and b
a^b	returns the bitwise exclusive or of a and b

The construct "a @= b", where "@" is any of the operators "+" through "^", does the same as "a = a @ b", except that "a" is only evaluated once.

Operators that return True / False results return 1 for True and 0 for False. However, when a True / False value is required (say, in an if-condition), any non-zero value means True and zero means False.

&v	returns the address of v
s.m	selects member m of the structure s (s is of type "struct")
p->m	selects member m of the structure pointed to by p (p is of type "struct *")
++v	increment the value in v and return the new value
v++	increment the value in v and return the pre-increment value
--v	decrement the value in v and return the new value
v--	decrement the value in v and return the pre-decrement value

The construct (type)expression (called a "cast") converts the value returned by the expression to the specified type.
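Here are a few of the less familiar operators in action. The helper names are hypothetical. The point of compound_assign_demo is that a[next_index()] += 5 calls next_index only once, whereas a[next_index()] = a[next_index()] + 5 would call it twice.

```c
static int calls = 0;

static int next_index(void)
{
    return calls++;              /* post-increment: returns the old value */
}

/* "a @= b" evaluates a only once. */
int compound_assign_demo(void)
{
    int a[2] = { 10, 20 };

    a[next_index()] += 5;        /* next_index() runs once; a[0] becomes 15 */
    return a[0];
}

/* Pre- versus post-increment. */
int increment_demo(int *pre, int *post)
{
    int v = 5, w = 5;

    *pre = ++v;                  /* v becomes 6; the expression's value is 6 */
    *post = w++;                 /* w becomes 6; the expression's value is 5 */
    return v + w;                /* 12 */
}

/* Comparisons yield exactly 1 or 0; a cast converts a value's type. */
int truth_and_cast_demo(void)
{
    int t = (3 > 2);             /* 1 */
    int f = (3 < 2);             /* 0 */
    int truncated = (int)3.75;   /* converting to int truncates: 3 */

    return t * 100 + f * 10 + truncated;   /* 103 */
}
```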

A.8 Standard Library Functions Used in This Book

fclose(<fileptr>) closes a file opened earlier with fopen.

fgets(<buffer>, <length>, <fileptr>) reads one line (at most <length> - 1 characters) from a file opened earlier with fopen.

fopen(<name>, <mode>) opens a file in the given mode. A <mode> of "r" means "read-only."

free(<ptr>) frees memory previously allocated by malloc.

isprint(<key>) returns True if <key> is a printing character or False if not.

malloc(<size>) allocates a block of memory at least <size> characters long.

memmove(<to>, <from>, <length>) moves <length> characters from <from> to <to>, working properly if the areas overlap.

memset(<start>, <char>, <length>) sets <length> characters starting from <start> to the character <char>.

printf(<format>) or printf(<format>, <arg>) copies the characters from the format string to the screen "as is" until a '%' character is encountered. The sequence "%s" means to take the next argument and send it as a string to the screen. The sequence "%c" means to take the next argument and send it as a single character to the screen. (The routine does a lot more, but the examples in this book don't use the extra functionality.)

strcpy(<to>, <from>) copies the <from> string to the <to> string.

strlen(<string>) returns the number of characters in the string, not counting the terminating NUL.

A.9 Non-Standard Library Functions Used in This Book

Fatal(<message>) handles a fatal error.

xiswhite(<c>) returns True if the supplied character is a white-space character (space or tab) or False if not.

xstrcpy(<to>, <from>) works like the C strcpy routine to copy one string to another, but is defined to work properly if the strings overlap.
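The two non-standard routines are simple enough to sketch. These are plausible definitions that match the descriptions above; they are not the book's own code. xstrcpy relies on memmove, which is defined to work on overlapping areas.

```c
#include <string.h>

/* Space or tab only, as described above; newline is not included. */
int xiswhite(int c)
{
    return c == ' ' || c == '\t';
}

/* Like strcpy, but safe when the two strings overlap. */
char *xstrcpy(char *to, const char *from)
{
    memmove(to, from, strlen(from) + 1);   /* + 1 copies the trailing NUL */
    return to;
}
```

For example, xstrcpy(s, s + 2) on "abcdef" leaves "cdef". The C standard does not define the behavior of strcpy for overlapping strings like these.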


Appendix B: Emacs Implementations

This appendix has not been converted. The current Emacs Implementations list is maintained online. Its URL for citation purposes is:

http://www.finseth.com/emacs.html


Appendix C: The Emacs Command Set

First, there is no such thing as an "official" Emacs command set. One of the main reasons why Emacs-type text editors exist is that each user is free to change the commands to suit his or her own tastes. Another reason is that there are many different implementations of Emacs editors and each will have a slightly different command set. You should consult the documentation that comes with your implementation (or the documentation that you write for your own implementation) for specifics.

That said, this appendix will list most of the default command set that comes with GNU-Emacs. It will also list my own alterations to these defaults.

Emacs-type text editors implement the "one-dimensional array of bytes" editing model. Line breaks are represented as single Newline characters, regardless of the representation used by the operating system.

C.1 Notation

The first Emacs took advantage of a particular keyboard's features. This keyboard allowed you to type all of the normal characters. In addition, both Control and Meta keys were available. These keys acted as full Shift keys, allowing the user to apply Control- and/or Meta-shifts to any key. For example, one could type:

notation	key combinations

5	5
%	Shift-5
C-5	Control-5
C-%	Control-Shift-5
M-5	Meta-5
M-%	Meta-Shift-5
M-C-5	Meta-Control-5
M-C-%	Meta-Control-Shift-5

The last two could also be written C-M-5 and C-M-%. Most current keyboards can only send the usual ASCII characters. That limitation removes the possibility of typing the multiply shifted characters and hence the need for the unusual notation. The command sets presented here will thus use the familiar caret notation, which is listed in Appendix E. There are still some systems that use extended keyboards like these. The documentation for those systems will use the original notation.

The Escape key (usually labeled "Esc"), which sends the escape character, is used as a prefix to make up for the missing Meta shift key. By convention, those keyboards that have an extra shift key (perhaps labeled Meta or Alt) can specify Meta commands by setting bit 2^7 in characters sent from the keyboard.
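A minimal sketch of how an implementation might turn raw keyboard bytes into Meta commands under both conventions just described. The decoder next_key, its struct key type, and the choice to treat a trailing lone ESC as a plain Escape are illustrative assumptions, not how any particular Emacs does it.

```c
#include <stddef.h>

#define ESC      0x1B   /* ^[ */
#define META_BIT 0x80   /* bit 2^7 */

/* A decoded keystroke: the 7-bit character plus a Meta flag. */
struct key {
    int meta;            /* nonzero if Meta was applied */
    unsigned char ch;    /* 7-bit ASCII character */
};

/* Hypothetical decoder: reads one command key starting at input[*pos].
   Either a byte with bit 2^7 set or an ESC prefix marks Meta.
   Returns 0 when the input is exhausted, 1 otherwise. */
int next_key(const unsigned char *input, size_t len, size_t *pos,
             struct key *out)
{
    unsigned char c;

    if (*pos >= len)
        return 0;
    c = input[(*pos)++];
    if (c & META_BIT) {                   /* keyboard with a Meta shift */
        out->meta = 1;
        out->ch = c & 0x7F;
    } else if (c == ESC && *pos < len) {  /* ESC prefix stands in for Meta */
        out->meta = 1;
        out->ch = input[(*pos)++] & 0x7F;
    } else {                              /* ordinary character */
        out->meta = 0;
        out->ch = c;
    }
    return 1;
}
```

Because ESC ESC decodes to Meta-ESC, it lands on the "^[ ^[" entry of the Meta command list, just as typing the two Escapes would.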

Emacs commands are, by convention, always upper/lower case-independent. Thus, ^X B and ^X b both refer to the same command. The only exception is the "self-insert" command, where B inserts a "B" and b inserts a "b". Only the upper-case versions will be listed.
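The case-independence rule amounts to folding the command character before looking it up, while leaving self-inserting characters alone. A small sketch; the function names command_key and same_command are hypothetical.

```c
#include <ctype.h>

/* Hypothetical dispatch step: fold a command key to upper case so that
   ^X B and ^X b select the same command.  Self-inserting characters must
   not go through this function, so B still inserts "B" and b inserts "b". */
int command_key(int c)
{
    return toupper((unsigned char)c);
}

/* Returns nonzero if two keys name the same command. */
int same_command(int a, int b)
{
    return command_key(a) == command_key(b);
}
```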

The terms "S-exp" and "defun," while familiarto Lisp programmers, are probably not familiar to others."S-exp" refers to a parenthesized list and "defun"refers to a function definition. Most Emacs language modes remapthese Lisp-specific commands to comparable commands for otherprogramming languages.

C.2 Default GNU-Emacs Command List

C.2.1 Base Commands

^@

Place the mark at the point.

^A

Move to the beginning of the current line.

^B

Move backward one character.

^C

Prefix for mode-specific commands.

^D

Delete the following character.

^E

Move to the end of the current line.

^F

Move forward one character.

^G

Abort execution and return to the edit loop.

^H

Help.

^I

Tab to indentation in a language-specific manner.

^J

Insert a line break and indent in a language-specific manner.

^K

Delete the text to the end of the current line; if at the end of the line, delete the line break. With an argument, delete that many complete lines.

^L

Rebuild the display from scratch, centering the point.

^M

Insert a line break, leaving the point after the break.

^N

Move down one line, staying in as nearly the same column as possible. If at the end of the buffer, grow the buffer.

^O

Insert a line break, leaving the point before the break.

^P

Move up one line, staying in as nearly the same column as possible.

^Q

Quote: Insert the following character as typed.

^R

Reverse search (see ^S).

^S

Search incrementally for a string after the point. While searching, typed characters act as follows:

^R	Search for the previous occurrence.
^S	Search for the next (2nd, 3rd...) occurrence.
^?	"Untypes" the last character (including ^R, ^S, etc.)
^G	Abort search: return to starting place.
^[	Complete the search.
^Q	Quote a search command.
ctrl-XX	Complete the search and execute the XX command.
other	Add the character to the search string.

^T

Interchange the characters on each side of the point, leaving the point after the second one.

^U

Universal argument. There are two forms:

^U ^U ... <cmd>	does <cmd> 4, 16, 64, 256, ... times, depending upon the number of ^Us (each ^U multiplies the count by 4).
^U <integer> <cmd>	does <cmd> <integer> times (e.g., ^U 3 5 ^F means to do ^F 35 times).

^V

Move the bottom of the current screen to the top of the screen: i.e., move down one screen.

^W

Delete the text between the point and the mark ("cut").

^X

Prefix for the ^X commands listed below.

^Y

Copy the top item from the kill ring to the point; place the mark at the beginning of the block and the point at the end ("paste").

^Z

Suspend the program's execution and return to whatever invoked the editor.

^[

Prefix for the Meta commands listed below.

^\

Undefined.

^]

Aborts a recursive edit.

^^

Undefined.

^_

Undo.

SP ... ~

Insert themselves.

^?

Delete the preceding character.

BS

Same as ^H, also BACK SPACE.

TAB

Same as ^I.

LF

Same as ^J, also LINE FEED.

CR

Same as ^M, also CARRIAGE RETURN or RETURN.

ESC

Same as ^[, also ESCAPE.

DEL

Same as ^?, also DELETE or RUBOUT (obsolete).
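The two forms of the ^U universal argument listed above can be sketched as a small parser over the keys typed before a command. This is an illustration only: the function name universal_argument is hypothetical, and it handles just the two documented forms (repeated ^Us, or a single ^U followed by digits).

```c
#include <ctype.h>

#define CTRL_U 0x15   /* ^U */

/* Hypothetical parser for the universal argument.  keys[] holds the
   keystrokes typed before the command.  Each ^U multiplies the count by
   4; digits typed after the ^Us replace that count with an explicit
   number.  Returns the repeat count and stores in *used the number of
   keys consumed, so keys[*used] is the command itself. */
long universal_argument(const char *keys, int *used)
{
    long count = 1, number = 0;
    int have_digits = 0, i = 0;

    while (keys[i] == CTRL_U) {          /* ^U ^U ... : 4, 16, 64, ... */
        count *= 4;
        i++;
    }
    if (i > 0) {                         /* ^U <integer> */
        while (isdigit((unsigned char)keys[i])) {
            number = number * 10 + (keys[i] - '0');
            have_digits = 1;
            i++;
        }
    }
    *used = i;
    return have_digits ? number : count;
}
```

For the example in the table, the keys "^U 3 5" yield 35 and leave ^F as the command to repeat.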

C.2.2 Help Commands

^H ^C	Describe the copying policy.
^H ^D	Describe the distribution policy.
^H ^H	Help on help.
^H ^N	View Emacs news.
^H ^W	Describe the (lack of) warranty.
^H ?	Help on help.
^H A	Apropos (which commands deal with ...?).
^H B	Describe the current key bindings.
^H C	Briefly describe a key.
^H D	Describe a function.
^H F	Describe a function.
^H I	Invoke the "info" subsystem.
^H K	Describe a key.
^H L	Display the last 100 characters typed ("lossage").
^H M	Describe a mode.
^H N	View Emacs news.
^H S	Describe syntax.
^H T	Tutorial.
^H V	Describe a variable.
^H W	Where is ...?

C.2.3 Control-X (^X) Commands

^X ^@	Flush mouse queue.
^X ^A	Add mode abbreviation.
^X ^B	Display a list of all buffers and associated information.
^X ^C	Exit editor.
^X ^D	Display the current directory.
^X ^E	Evaluate the last S-exp.
^X ^F	Ask for the name of a file and read it into a new buffer whose name is derived from the file name.
^X ^G	Cancel ^X prefix.
^X ^H	Remove mode abbreviation.
^X ^I	Indent rigidly.
^X ^L	Convert the region to lower case.
^X ^N	Set the goal column.
^X ^O	Delete the blank lines around the point.
^X ^P	Place the point and the mark around the current page.
^X ^Q	Toggle the "read only" marker.
^X ^R	As ^X ^F, but mark the file "read only."
^X ^S	Save the current buffer if it has been modified.
^X ^T	Transpose lines.
^X ^U	Convert the region to upper case.
^X ^V	Find alternate file.
^X ^W	Ask for the name of a file and write the buffer to that file.
^X ^X	Exchange the point and mark.
^X ^[	Repeat a complex command.
^X $	Set selective display.
^X '	Expand an abbreviation.
^X (	Start collecting a keyboard macro.
^X )	End collecting a keyboard macro.
^X +	Add global abbreviation.
^X -	Remove global abbreviation.
^X .	Set the fill prefix.
^X /	Point to register.
^X 0	Delete window.
^X 1	Delete other windows.
^X 2	Split window vertically (one above the other).
^X 3	Split window vertically (one above the other), but stay in the first.
^X 4	Prefix for the Control-X 4 commands listed below.
^X 5	Split window horizontally (one beside the other).
^X ;	Set the comment column.
^X <	Scroll the window left.
^X =	Display the current cursor position.
^X >	Scroll the window right.
^X A	Append the region to a buffer.
^X B	Switch to a buffer.
^X D	Edit directory ("DIRED").
^X E	Invoke the last keyboard macro.
^X F	Set the fill column to the horizontal position.
^X G	Insert a register.
^X H	Place the point and the mark around the entire buffer.
^X I	Insert a file.
^X J	Register to point.
^X K	Kill a buffer.
^X L	Count the number of lines in the page.
^X M	Send electronic mail.
^X N	Narrow the editing bounds to the region.
^X O	Switch to the other window.
^X P	Narrow the editing bounds to the page.
^X Q	Keyboard macro query.
^X R	Copy a rectangle to a register.
^X S	Save some buffers.
^X U	Advertised undo.
^X W	Widen the editing bounds.
^X X	Copy to a register.
^X [	Move backward one page.
^X ]	Move forward one page.
^X ^	Grow the current window.
^X `	Move to the location implied by the next error message.
^X {	Shrink a window horizontally.
^X }	Grow a window horizontally.
^X ^?	Delete to the beginning of the current sentence.

C.2.4 Control-X 4 Commands

^X 4 ^F	Find file other window.
^X 4 .	Find tag other window.
^X 4 A	Add change log entry other window.
^X 4 B	Switch buffer other window.
^X 4 D	DIRED other window.
^X 4 F	Find file other window.
^X 4 M	Mail other window.

C.2.5 Meta (^[) Commands

^[ ^@

Place the point and the mark around the S-exp.

^[ ^A

Move to the beginning of the current defun.

^[ ^B

Move backward one S-exp.

^[ ^C

Exit recursive edit.

^[ ^D

Move down one level of list structure.

^[ ^E

Move to the end of the current defun.

^[ ^F

Move forward one S-exp.

^[ ^G

Format ("grind") the current S-exp.

^[ ^H

Place the point and the mark around the current defun.

^[ ^I

Complete a Lisp symbol.

^[ ^J

Indent a new comment line.

^[ ^K

Delete the following S-exp.

^[ ^N

Move forward one list.

^[ ^O

Split line: move the rest of this line vertically down.

^[ ^P

Move backward one list.

^[ ^Q

Indent an S-exp.

^[ ^S

Incremental search forward using regular expressions.

^[ ^T

Transpose the adjoining S-exps.

^[ ^V

Scroll the other window.

^[ ^W

Append the next delete to the top item on the kill ring.

^[ ^X

Evaluate a defun.

^[ ^[

Evaluate an expression.

^[ ^\

Indent the region.

^[ SP

Insert exactly one space.

^[ !

Ask for and execute a shell command.

^[ $

Invoke the spelling checker on a word.

^[ %

Query replace: ask for an old string and a new string. At each occurrence of the old string, display it and ask for a response:

^L	Redisplay the screen.
^R	Invoke the editor recursively.
^W	Delete the old string and invoke the editor recursively.
^[	Exit.
SP	Replace the old with the new and continue.
,	Replace and wait for confirmation.
.	Replace and exit.
!	Replace the rest without asking.
^	Return to previous old string (jump to mark).
^?	Don't replace but continue.

^[ '

Set abbreviation prefix mark.

^[ (

Insert paired parentheses.

^[ )

Move past the closing parenthesis.

^[ ,

Tags loop continue.

^[ -

Make the argument negative.

^[ .

Find a tag.

^[ /

Abbreviation expand.

^[ 0 .. 9

Use digits as argument (similar to ^U).

^[ ;

Indent for comment.

^[ <

Move to the beginning of the current buffer.

^[ =

Count the lines within the region.

^[ >

Move to the end of the current buffer.

^[ @

Place the point and mark around the current word.

^[ A

Move to the beginning of the current sentence.

^[ B

Move backward one word.

^[ C

Capitalize the following word.

^[ D

Delete the following word.

^[ E

Move to the end of the current sentence.

^[ F

Move forward one word.

^[ G

Fill text in the region.

^[ H

Place the point and the mark around the current paragraph.

^[ I

Tab to tab stop.

^[ J

Indent new comment line.

^[ K

Delete the remainder of the current sentence.

^[ L

Convert the following word to lower case.

^[ M

Move back to indentation.

^[ Q

Fill the current paragraph. ^U ^[ Q means to justify the paragraph.

^[ R

Move to window line.

^[ T

Transpose the adjoining words.

^[ U

Convert the following word to upper case.

^[ V

Move the top of the current screen to the bottom of the screen: i.e., move up one screen.

^[ W

Copy the region to the kill buffer.

^[ X

Execute extended command.

^[ Y

After ^Y: Delete the just-yanked text and yank the previously killed text.

^[ Z

Zap to character.

^[ [

Move to the beginning of the current paragraph.

^[ \

Delete the spaces and tabs around the point.

^[ ]

Move to the end of the current paragraph.

^[ ^

Delete the indentation on the current line.

^[ |

Execute a shell command on the region ("pipe the region through a shell command").

^[ ~

Clear the buffer modified flag.

^[ ^?

Delete the previous word.
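The ^[ % query-replace dialogue listed above can be modeled on a plain string to show how the responses steer the loop. This is a simplified sketch, not GNU-Emacs's implementation: the function query_replace and its answers parameter (which stands in for the user's keystrokes) are hypothetical, and only SP, ^?, !, ., and ^[ are handled; any other response simply exits.

```c
#include <stddef.h>
#include <string.h>

#define DEL 0x7F   /* ^? */

/* Hypothetical model of the ^[ % dialogue on a NUL-terminated buffer of
   capacity cap.  answers[] supplies the response at each occurrence:
     SP  replace and continue     ^?  don't replace, but continue
     !   replace the rest without asking
     .   replace and exit
   ^[ (ESC), any other character, or running out of answers exits.
   Returns the number of replacements made. */
int query_replace(char *text, size_t cap, const char *old_str,
                  const char *new_str, const char *answers)
{
    size_t olen = strlen(old_str), nlen = strlen(new_str);
    int count = 0, ask = 1;
    char *p = text;

    if (olen == 0)
        return 0;
    while ((p = strstr(p, old_str)) != NULL) {
        char a = ask ? *answers : ' ';
        if (ask && a != '\0')
            answers++;
        if (a == DEL) {                      /* skip this occurrence */
            p += olen;
            continue;
        }
        if (a != ' ' && a != '!' && a != '.')
            break;                           /* ESC, unknown, or no answer */
        if (strlen(text) - olen + nlen + 1 > cap)
            break;                           /* no room for the new text */
        memmove(p + nlen, p + olen, strlen(p + olen) + 1);
        memcpy(p, new_str, nlen);
        p += nlen;                           /* never rescan replaced text */
        count++;
        if (a == '!')
            ask = 0;                         /* replace the rest silently */
        else if (a == '.')
            break;
    }
    return count;
}
```

Answering SP, ^?, SP to three occurrences of "cat" replaces the first and third and leaves the middle one untouched.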

C.3 The Author's Command Set

This section lists the changes the author makes to the just-presented default command set.

^C	Rotate case of word: foo -> Foo -> FOO -> foo ...
^H	Delete the previous character.
^I	Just insert a TAB: no special indentation.
^J	Insert a line break and indent the new line the same as the current one.
^K	Delete lines as usual, but don't treat an argument specially.
^M	Just insert a line break: no special actions.
^N	Move to the next line: don't extend the buffer.
^T	Always transpose the two preceding characters.
^V	A slightly different move down screen.
^Z	Move up screen.
^\	Delete all white space (indentation) on the current line.
^]	Invoke keyboard macro.
^_	Suspend the program's execution and return to whatever invoked the editor.
^?	Delete the previous character with no special actions.

^X ^E	Execute one shell command.
^X ^I	Insert a file.
^X ^M	Invoke a subshell.
^X ^N	Null (to prevent typing by accident).
^X ^P	Null (to prevent typing by accident).
^X ^R	Re-read file.
^X ^T	Return to top-level editing (exit all recursive editors).
^X \	Get rid of all "^H_" strings in the buffer (used to make UNIX man pages more readable after doing "^X ^E mantitle").
^X C	Invoke compiler.
^X ^H	Help.
^X N	Null (to prevent typing by accident).
^X P	Null (to prevent typing by accident).
^X R	Read electronic mail.

^[ ^H	Delete previous word.
^[ ^R	Query replace.
^[ ^[	Null (to prevent typing by accident).
^[ SP	Set the mark.
^[ <	Move to the beginning of the buffer. With an argument, move to the specified line number.
^[ =	Display the number of lines in the buffer and the number of the line that the point is on.
^[ >	Move to the end of the buffer. With an argument, move to the specified line number.
^[ G	Ask for a line number and go to the specified line.
^[ I	Null (to prevent typing by accident).
^[ J	Null (to prevent typing by accident).
^[ M	Null (to prevent typing by accident).
^[ N	Null (to prevent typing by accident).
^[ O	Null (to prevent typing by accident).
^[ P	Null (to prevent typing by accident).
^[ R	Replace string (as ^[ ^R, but don't ask).
^[ S	Center the current line.
^[ Z	Null (to prevent typing by accident).
^[ \	Delete all surrounding spaces, tabs, and line breaks and re-insert one space.
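The author's ^C binding ("Rotate case of word") cycles through three states. A sketch of the state test, assuming the word has already been located and is passed in with its length; the function name rotate_case and the decision to treat any irregular mix of cases as lower case are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of "rotate case of word":
   foo -> Foo -> FOO -> foo.  A word with no lower-case letters is
   treated as FOO; an initial capital with no other capitals as Foo;
   anything else (including mixes such as fOo) as foo. */
void rotate_case(char *word, size_t len)
{
    int has_lower = 0, upper_after_first = 0;
    size_t i;

    if (len == 0)
        return;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)word[i];
        if (islower(c))
            has_lower = 1;
        else if (i > 0 && isupper(c))
            upper_after_first = 1;
    }

    if (!has_lower) {                              /* FOO -> foo */
        for (i = 0; i < len; i++)
            word[i] = (char)tolower((unsigned char)word[i]);
    } else if (isupper((unsigned char)word[0]) && !upper_after_first) {
        for (i = 0; i < len; i++)                  /* Foo -> FOO */
            word[i] = (char)toupper((unsigned char)word[i]);
    } else {                                       /* foo -> Foo */
        word[0] = (char)toupper((unsigned char)word[0]);
        for (i = 1; i < len; i++)
            word[i] = (char)tolower((unsigned char)word[i]);
    }
}
```

Checking for "no lower-case letters" first keeps one-letter words from getting stuck, since "A" is both capitalized and all upper case: it rotates back to "a".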

Back to Contents.

Appendix D: The TECO Command Set

This appendix presents a summary of the MIT TECO editor's command set. Should you actually find an ITS or TOPS-20 system and wish to run TECO on it, this appendix will be useful but will not completely replace the full manual.

TECO implements the "one-dimensional array of bytes" editing model. Line breaks are stored as single newline characters. Large files are divided into pages. Pages within a file must be edited in order (i.e., all editing on the first page is done, then the second page is read in and edited, etc.). The only way to go backwards is to finish editing the file and to start over. On the other hand, on most systems, only very large files (over 64 Kilobytes) are split into pages.

Commands are single or double characters. Upper/lower case is ignored. The basic forms of commands are "C", "nC" and "n,mC". The first form executes command "C" with the default arguments, the second supplies one argument of "n", and the third supplies two arguments of "n" and "m".

Some commands take string arguments. The string consists of all characters after the command up to the terminator character. This character, called altmode, is the Escape character, and is written as "$" (for historical reasons, of course).

Commands are accumulated into a command string until a double altmode terminator is entered, at which time the commands are executed. One of the altmodes may serve to delimit a string. For example:

5C$$

moves forward five characters, and the command string:

Ithen$$

inserts the string "then", and the string:

5CIthen$8R$$

moves forward five characters, then inserts the string "then", then moves backward eight characters.
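To make the command-string examples concrete, here is a toy interpreter for just the three commands they use: nC, nR, and Istring$. It is a sketch under heavy assumptions, not a model of real TECO: it accepts "$" in place of the Escape character (as the text writes it), skips any altmode that is not ending a string argument, and the names struct teco and teco_execute are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ALTMODE '$'   /* the text writes altmode (Escape) as "$" */

/* A toy buffer: the text (not NUL-terminated) and the point. */
struct teco {
    char text[256];
    size_t len;
    size_t point;
};

/* Hypothetical mini-interpreter for three commands only:
   nC (forward n), nR (backward n), and Istring$ (insert at the point).
   n defaults to 1 and may carry a leading minus sign.  Returns 0 on
   success, or -1 for an unknown command, a move past either end of the
   buffer, or a full buffer. */
int teco_execute(struct teco *t, const char *cmd)
{
    while (*cmd != '\0') {
        long n = 1;
        int negative = 0;

        if (*cmd == ALTMODE) {           /* stray altmode: skip it */
            cmd++;
            continue;
        }
        if (*cmd == '-') {
            negative = 1;
            cmd++;
        }
        if (isdigit((unsigned char)*cmd)) {
            n = 0;
            while (isdigit((unsigned char)*cmd))
                n = n * 10 + (*cmd++ - '0');
        }
        if (negative)
            n = -n;

        switch (toupper((unsigned char)*cmd)) {  /* case is ignored */
        case 'R':
            n = -n;                      /* nR is the same as -nC */
            /* fall through */
        case 'C':
            if (n < 0 ? (size_t)(-n) > t->point
                      : (size_t)n > t->len - t->point)
                return -1;
            t->point = (size_t)((long)t->point + n);
            cmd++;
            break;
        case 'I': {
            const char *start = cmd + 1;
            const char *end = strchr(start, ALTMODE);
            size_t k = end ? (size_t)(end - start) : strlen(start);

            if (t->len + k > sizeof t->text)
                return -1;
            memmove(t->text + t->point + k, t->text + t->point,
                    t->len - t->point);
            memcpy(t->text + t->point, start, k);
            t->len += k;
            t->point += k;
            cmd = start + k;
            if (*cmd == ALTMODE)
                cmd++;                   /* this altmode ends the string */
            break;
        }
        default:
            return -1;
        }
    }
    return 0;
}
```

Running "5CIthen$8R$$" on a buffer holding "abcdefghij" leaves "abcdethenfghij" with the point after the "a", matching the description of that command string.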

TECO also supports the concept of Q-registers. These are variables that can hold either arbitrarily large strings of text or numbers. Each Q-register can hold either a string or a number and may change back and forth as desired. However, it may only hold one or the other at any given time. Q-registers are named by single characters. When holding strings, Q-registers act just like buffers: you can switch to, insert into, delete from, move around in, display, and search in them. Q-register names are either

one alphanumeric character preceded by zero, one, or two periods (names with two periods are reserved for system variables),
a "variable name" of the form $name$ (these are dollar signs, not altmodes),
a subscripting expression such as :Q(index),
a * (for certain commands, it causes them to return their datainstead of storing it),
an expression in parentheses (for certain commands), or
up to 3 periods followed by a ^R or ^^ and any ASCII character.

TECO was first written for PDP-10s running ITS (the Incompatible Timesharing System), and its command set incorporates some knowledge of that system's file name syntax. Briefly, an ITS file name has these four parts:

DEV:DIR;PRTONE PRTTWO

Each part can be up to 6 characters long and is stored in one 36-bit word. Within that word, each character is squeezed into a 6-bit character set, so lower-case characters are folded into their upper-case equivalents. The operating system and applications programs maintain a default value for each of the four parts. For example, if a user specifies a name of "foo", the default device, directory, and second part of the file name are automatically filled in. Multiple versions of a file are kept by setting the second part of the file name to a number. Successive versions are maintained by incrementing the number. A second part of "<" refers to the earliest version around. A second part of ">" refers to the latest version around (for reading) or one past the latest version around (for writing).
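The six-characters-per-word packing described above is the PDP-10 SIXBIT code, in which each printable character from space through underscore is stored as its ASCII value minus 32. A sketch that keeps the 36-bit word in the low bits of a uint64_t; the function names, the choice to store unrepresentable characters as spaces, and the silent truncation after six characters are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdint.h>

/* Packs up to six characters of one file-name part into a 36-bit word
   (the low 36 bits of a uint64_t).  Each character is folded to upper
   case and stored as its SIXBIT code (ASCII minus 32).  The first
   character occupies the most significant six bits; unused positions
   hold 0, the SIXBIT code for space.  Extra characters are ignored. */
uint64_t sixbit_pack(const char *s)
{
    uint64_t word = 0;
    int i;

    for (i = 0; i < 6; i++) {
        unsigned char c = *s ? (unsigned char)toupper((unsigned char)*s++)
                             : ' ';
        if (c < 32 || c > 95)
            c = ' ';                 /* not representable: store a space */
        word = (word << 6) | (uint64_t)(c - 32);
    }
    return word;
}

/* Unpacks a SIXBIT word into a NUL-terminated string, dropping trailing
   spaces.  out must hold at least 7 characters. */
void sixbit_unpack(uint64_t word, char *out)
{
    int i, end = 0;

    for (i = 0; i < 6; i++) {
        out[i] = (char)(((word >> (6 * (5 - i))) & 077) + 32);
        if (out[i] != ' ')
            end = i + 1;
    }
    out[end] = '\0';
}
```

The fold to upper case is why "foo" and "FOO" name the same file: both pack to the same word.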

D.1 General notation:

^X	Denotes the specified control character (see Appendix E for a listing of all ASCII characters).
$	Denotes the altmode character unless otherwise specified.
|	Denotes a choice (either the form on the left or the form on the right is acceptable).
m | n | arg	Denote integer arguments.
cmd	An arbitrary command string.
dir	Denotes a directory.
file	Denotes a file name.
k	Denotes either "m,n" or "n": either a text range of characters m through n or n successive lines.
string	Denotes a string argument.
: | @	Modify the operation of certain commands.

D.2 Commands

n^@

Argument: if n > 0, same as ".,.+n". If n <= 0, the same as ".+n,."

m,n^@

Returns the value n - m.

m,n:^@

Returns the value n,m.

Logical xor operator.

Used for cleaning up after failed searches.

When typed from the console, it terminates the command string and starts execution.

^Fstring$

Inserts its string argument, after deleting the last string found or inserted. Same as "FKDIstring$".

^G

Executes immediately. Erases the command string as typed so far. Also aborts the current command if one is executing.

^H

Inserts a Back Space character.

^I

Inserts a Tab character.

Flushes any pending values.

^Kstring$

Executes string as a system command.

Flushes any pending values.

n^N

Sets FS LINES$ to n.

:^N

Toggle the FS TTMODE$ flag.

:n^N

Does n^N:^N.

^Ofilename$

Bigprints filename on the device open for output.

^Pcmd0$cmd1$cmd2$

ASCII sort command. Assuming the point is at the start of a record, cmd0 should move the point to the start of the key, cmd1 should move the point to the end of the key, and cmd2 should move the point to the end of the record (= start of the next record). If FS ^P CASE $ is non-zero, then this command ignores case.

^Q

Quotes the next character.

^R

Invokes Real time (Emacs-like) mode.

n^S

If n > 0, sleeps for n 30ths of a second. Otherwise, sleeps until system uptime is >= -n.

n:^S

Sleeps for at most n 30ths of a second, returning immediately when input becomes available.

Displays the current directory in a user supplied manner. Executes immediately if it is the first character in a command string.

^V

Pops the next position off of the ring buffer of positions. Successive ^Vs move through the 8-item ring.

:^V

Returns the value on the top of the ring.

n^V

Pushes n onto the ring buffer unless n is the same as the value on the top of the ring.

n:^V

Pushes n onto the ring buffer.

Returns to the top level.

^X

Its value is m, the first argument to m,nMq. It is only valid inside macros.

^Y

Its value is n, the second argument to m,nMq. It is only valid inside macros.

^Z

Suspends the editing process.

$ (^[)

Terminates a text argument; two successive altmodes terminate a command string.

^\

Exits from the innermost macro invocation.

^]x

Specially processes character x.

^]^X

Reads and returns the string argument which follows kMq. It is only valid inside macros.

^^x

Returns the ASCII value of "x".

Same as "+", except that just a space is not an argument.

!label!

Defines a label or a comment. It is a comment if no command attempts to go there.

arg"x then-cmd '

Conditional. It checks the arg according to condition x. Executes the command string then-cmd if the condition is true. Arg is discarded. Conditionals:

B	is arg the ASCII value of a delimiter (..D)?
C	is arg the ASCII value of a non-delimiter?
D	is arg the ASCII value of a digit?
E	is arg == 0?
G	is arg > 0?
L	is arg < 0?
N	is arg != 0?
U	is arg the ASCII value of an upper-case character?
:x	Reverses the meaning of condition x

arg"x then-cmd '"# else-cmd'

Conditional with else.

#

Logical or operator.

%q

Increments the numeric contents of Q-register q and returns the result.

&

Logical and operator.

'

Terminates a conditional.

( | )

Specify precedence for argument operators.

*

Multiplication operator (no precedence).

+

Addition operator.

,

Separates numeric arguments.

-

Subtraction operator.

.

(by itself) Specifies the position (number) of the point. See also the special .. Q-register names listed below.

/

Division operator (no precedence).

0-9, .

Digits. XXX is interpreted in base FS IBASE$ (usually 10); XXX. is interpreted in base FS I.BASE$ (usually 8).

:

Modifies the action of certain commands.

n;

Does nothing if n < 0. Otherwise, it passes control to the character after the next >. In other words, it is used to terminate iteration. If n is null, it uses the value of the last search.

Like ;, but reverses the condition.

@@;

Like ;, but exit if arg == 0.

:@@;

Like ;, but exit if arg != 0.

n<cmd>

Does command cmd n times, or indefinitely if n is null.

:<cmd>

Begins errset. If an error occurs inside <>, execution will resume after the >.

k=

Types k.

k:=

Types k, omits CR/LF.

k@=

Types k in the echo area.

k@:=

Types k in the echo area, omitting CR/LF.

If first character after an error message, displays the last several command characters. Otherwise, enters trace mode.

Leaves trace mode.

@

Modifies the action of certain commands.

A

Appends the next page of the input file to the buffer.

n:A

Appends the next n lines (up to a page marker) from the input file to the buffer.

Appends all of the input file to the buffer and closes the input file.

Returns the ASCII value of the character m characters to the right of the point.

Returns the ASCII value of the character at the point.

B

Argument: equivalent to 0 (i.e., the beginning of the buffer). However, its value is modified by the virtual buffer boundaries.

nC

Moves forward n characters.

n:C

Same as moving, but returns -1 if the move succeeds or 0 if the move would fail.

mD

Deletes forward m characters.

-mD

Deletes backward m characters (there is no equivalent to R).

E...

See E-Commands listed below.

F...

See F-Commands listed below.

Gq

Inserts the contents of Q-register q into the buffer. If q has a number, its string representation is inserted. FS INSLEN$ is set to the length of the inserted text.

m,nGq

Inserts the range m to n from Q-register q.

:Gq

Returns a copy of the string in Q-register q.

n:Gq

Returns the value of the character at position n in Q-register q.

H

Argument: wHole buffer: equivalent to "B,Z".

Istring$

Inserts the string at the point.

@Ixstringx

Inserts the string delimited by the "x" characters at the point (lets you insert a string that contains an altmode).

nI

Inserts the character with ASCII value n.

m,nI

Inserts m copies of character n.

:Iq

Inserts the Q-register q into the buffer.

n:Iq

Inserts the character with ASCII value n into Q-register q.

m,n:Iq

Inserts m copies of character n into Q-register q.

:Iqstring$

Inserts the string into Q-register q, replacing any prior contents.

@n:Iq

Inserts the character whose ASCII value is n into Q-register q, replacing any prior contents.

@m,n:Iq

Same as n:Iq, but inserts m copies.

@:Iqxstringx

Same as :Iqstring$, except that string is delimited by the characters x.

nJ

Sets the point to the specified position (BJ or just J moves to the start; ZJ moves to the end).

n:J

Does the set and returns -1 if successful and 0 if not.

m,nK

Kills (deletes and saves the deleted text) the characters in the range; moves the point there.

nK

Kills what L would move over.

n:K

Kills what :L would move over.

Like K, but only LFs preceded by CRs are recognized.

mL

Moves to the start of the mth line after the point.

0L

Moves to the start of the current line.

m,nL

Same as m+n-.J, used by some other commands.

m:L

Moves to the end of the m-1th line.

0:L

Moves to the end of the previous line.

Like L, but only LFs preceded by CRs are recognized.

m,nMqstring$

Executes the contents of Q-register q as TECO commands. If the Q-register contains a number, it executes the corresponding ^R mode command.

Tail-recursive form of M. Like M then ^\, but the current function is removed from the stack before the new one is called.

Fools the called macro into thinking it was called from ^R mode.

nNstring$

Same as nSstring$, but it does P and continues the search if the end of the buffer is reached.

Olabel$

Goes to the specified label. Generates an error if the label is not found.

:Olabel$

Returns if the label is not found.

@Olabel$

Allows the label to be abbreviated.

nP

Writes out the buffer and a ^L (page mark), kills the buffer, and reads one page from the input file. All of this is repeated n times.

m,nP

Writes out the specified range of the buffer, but does not kill it or append input.

nPW

Writes out the buffer and a ^L (page mark), no killing or reading. All of this is repeated n times.

As P, except that the low-order bit of each word written should be preserved and not cleared. Used for writing binary files.

Qq

Returns the value in Q-register q as a number. If the Q-register holds text, this returns the pointer to that text.

nR

Moves backward n characters (same as -nC).

n:R

Same as moving, but returns -1 if the move succeeds or 0 if the move would fail.

nSstring$

Searches forward for the nth occurrence of the string and places the point after the string. If the argument is null, the last non-null argument is used. Special characters in search:

^B	Matches any delimiter (see ..D).
^O	Divides string into alternate patterns. Thus, Sfoo^Obar$ will find the first of "foo" or "bar".
^Qx	Quotes x.
^X	Matches any character.
^Nx	Matches any character except for x, where x is any character.
^N^B	Matches any non-delimiter.
^N^X	Matches nothing.

Note that Sfoo^O$ will always succeed and will move the point over the next three characters if and only if they are "foo". -2-(:Sfoo^O$) will do that and return non-zero if they were "foo".

-nSstring$

Same as nSstring$, but searches backwards and leaves the point before the string.

n@Sxstringx

Same as nSstring$, but the string is delimited by the character specified by the "x"s. If the argument is null, searches for the null string.

n:Sstring$

Same as nSstring$, but returns the value -1 if successful or 0 if not. If ^O is used within the string, -n is returned if the nth subpart is found.

kT

Types out the text in the range (n lines or m,n characters).

Types out in the echo area.

nUq

Inserts number n into Q-register q, returns no value.

m,nUq

Inserts number n into Q-register q, returns m.

Displays what T would type. Puts "/\" where point is and does "--MORE--" processing.

Does standard buffer redisplay.

kVW

Does V, then waits for one character and returns its ASCII code as a value.

W

Flushes the current value except when part of PW or VW.

kXq

Inserts text range k into Q-register q, replacing any prior contents.

k@Xq

Same as kXq, but appends to q.

Y

Kills the buffer and reads one page from the input file into the buffer.

Kills the buffer and reads the rest of the input file into the buffer.

Z

Argument: specifies the length of the buffer in characters.

[q

Pushes the text or number from Q-register q onto the Q-register push down list.

\

Moves past a number and returns its value.

n\

Inserts a printed representation of the number n (in base ..E).

m,n\

Is like n\, but pads with spaces to m columns.

Returns the representation of n as a string instead of inserting it.

]q

Pops the text or number from the Q-register push down list into Q-register q.

Replaced by @.

n_string$

Same as nSstring$, but it does Y if the end of the buffer is reached.

Erases the last character of command string.
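The ring of positions manipulated by ^V, :^V, n^V, and n:^V is a small circular buffer. A sketch of those four operations; the struct and function names are hypothetical, returning -1 from an empty ring is an arbitrary choice, and the assumption that ^V rotates the old top to the bottom (so successive ^Vs cycle through all saved positions) is an interpretation of "Successive ^Vs move through the 8-item ring."

```c
#define RING_SIZE 8

/* Hypothetical model of the 8-item ring of positions used by ^V. */
struct pos_ring {
    long item[RING_SIZE];
    int top;      /* index of the current top of the ring */
    int count;    /* number of valid entries, at most RING_SIZE */
};

/* n:^V -- push n; when the ring is full the oldest entry is lost. */
void ring_push(struct pos_ring *r, long n)
{
    r->top = (r->top + 1) % RING_SIZE;
    r->item[r->top] = n;
    if (r->count < RING_SIZE)
        r->count++;
}

/* n^V -- push n unless it equals the value already on top. */
void ring_push_unique(struct pos_ring *r, long n)
{
    if (r->count == 0 || r->item[r->top] != n)
        ring_push(r, n);
}

/* :^V -- return the top value without changing the ring (-1 if empty). */
long ring_top(const struct pos_ring *r)
{
    return r->count > 0 ? r->item[r->top] : -1;
}

/* ^V -- return the top value and rotate it to the bottom of the ring,
   so that successive calls cycle through every saved position. */
long ring_pop(struct pos_ring *r)
{
    long v;

    if (r->count == 0)
        return -1;
    v = r->item[r->top];
    /* The slot just below the current bottom receives the old top.
       When the ring is full, that slot is the top's own slot. */
    r->item[(r->top - r->count + RING_SIZE) % RING_SIZE] = v;
    r->top = (r->top + RING_SIZE - 1) % RING_SIZE;
    return v;
}
```

The difference between n^V and n:^V shows up when the same position is saved twice in a row: only n:^V records the duplicate.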

D.3 E-Commands (most file commands are here)

E^Udir$	Displays the specified directory in a user-defined manner.

E?file$	Tries to open file. Returns 0 if successful or an error code if not.

EC	Closes the input file.

EDfile$	Deletes the specified file.
:ED	Deletes the currently open file.

EEfile$	Same as the sequence infinityPEFfile$EC.

EFfile$	Closes the output file and changes its name to file.

EG	Inserts various information into the buffer on successive lines: the date as YYMMDD, the time as HHMMSS, the current username, the default file names, the real names of the files open for input and output, the date in text form, a 3-digit value (day of week, day of week of 1st of this year, leap year status), and the phase of the moon. There are better ways of getting most of this information.

EI	Opens a file "_TECO_ OUTPUT" for writing on the default device.
:EI	Same as EI, but uses the current default file name.
@EI	Same as EI, but sets the default device to DSK:

EJfile$	Restores the complete state from the file, which must have been saved with @EJ.
@EJfile$	Saves TECO's complete state to the file.

EL	Types out a listing of the default directory.

EM	Inserts a listing of the default directory into the buffer.

ENold$new$	Renames file old to file new.

EPfile$	Does ERfile$, then bigprints the file name twice on the device open for writing.

EQfrom$to$	Creates a link from the file "from" to the file "to".

ERfile$	Opens a file for input.

ETfile$	Sets the default file name to the specified file name.

EWdir$	Same as EI, but with the specified directory.
:EWdir file$	Same as EW, but with the specified file name.

EYdir$	Types out a listing of the specified directory.

EZdir$	Inserts a listing of the specified directory into the buffer.

E[	Push the input channel.

E\	Push the output channel.

E]	Pop the input channel.

E^	Pop the output channel.

E_old$new$	Copies file old to file new.
:E_old$new$	Copies file old to file new, preserving the old file's date.

D.4 F-Commands

m,nF^@

Returns m and n in numerical order, such that the new m will be > n.

nF^@

Returns 2 arguments that specify the range from the point to the location n lines away.

F^A

Runs every character in the buffer through a dispatch table.

nF^Bstring$

Searches in string for the character whose ASCII value is n.

@F^Bstring$

Searches the buffer forward for a character not in string.

-@F^Bstring$

Searches the buffer backward for a character not in string.

m,n@F^Bstring$

Searches the buffer in the range for a character not in string.

F^Estring$

Overwrites the next length-of-string characters with string. Same as deleting and inserting, but the gap does not need to move.

F^K

Reads a string argument from within a macro.

m,nF^Sq

Searches Q-register q for a word that contains n, starting at m.

F^X

Its value is k, all arguments to m,nMq. It is only valid inside macros.

F^Y

Its value is the number of arguments to m,nMq. It is only valid inside macros.

string:F^^

Determines whether string is a short Q-register name.

argF"x

Same as regular conditional, but passes the arg to the then or else command string.

F$ (dollar)

Returns FS CASE$ and inserts in the buffer the case shift and lock characters, if any. If FS CASE$ is non-zero, all characters are converted to uppercase (if > 0) or lower case (if < 0) on input. The case-shift character causes the next character to be read in the other case. The case-lock character temporarily complements the preferred case. On output, if FS CASE$ is odd, characters in the non-standard case will be preceded by case-shifts. If even, no translation is done.

nF$string$

Sets FS CASE$ to n and sets the case shift and lock characters to the first two characters in string.

F(

Is like (, except that F( returns its arguments, making it easy to use a value twice without using a Q-register.

F)

Is like ), except that F) returns its arguments exactly, discarding the data saved by (.

F*string$

Reads and discards a string argument.

F+

Clears the screen.

F6string$

Returns string with the first six characters packed into a word (this TECO runs on a 36-bit machine).

nF6

Expands n into an ASCII string and inserts it into the buffer.

n:F6

Expands n into a string.
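The packing done by F6 and the expansion done by nF6 and n:F6 can be sketched as follows. This assumes F6 uses the PDP-10 SIXBIT code, in which each character is stored as its ASCII code minus 32 in six bits, so that exactly six characters fill a 36-bit word; the function names are hypothetical.

```python
# Hypothetical sketch: assumes F6 uses PDP-10 SIXBIT, where each character
# is stored as (ASCII code - 32) in six bits and six of them fill a 36-bit word.

def sixbit_pack(s: str) -> int:
    """Pack the first six characters of s into a left-justified 36-bit word."""
    word = 0
    for ch in s[:6].upper().ljust(6):  # SIXBIT has no lower case; pad with blanks
        code = ord(ch) - 32
        if not 0 <= code < 64:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def sixbit_unpack(word: int) -> str:
    """Expand a 36-bit word back into characters, dropping trailing blanks."""
    chars = [chr(((word >> shift) & 0o77) + 32) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()
```

Round-tripping "teco" through both functions yields "TECO": lower case is lost because SIXBIT covers only codes 32 through 95.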

F;tag$

Throws to the tag. This is a "long jump."

F<!tag!cmds>

Catches a throw and executes the commands.

:F<!tag!cmds>

Is an errset and a catch at the same time.

F=qstring$

Compares the Q-register q to string. Returns 0 if ==, positive if q is > string, or negative if q is < string. If value is not zero, the value's absolute value is 1 + location of the difference.

@F=qxstringx

Compares the Q-register q to string delimited by x.

m,nF=string$

Compares the buffer in the range m to n to string.

m,n@F=xstringx

Compares the buffer in the range m to n to string delimited by x.
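To illustrate the value returned by the F= forms, the following Python sketch (hypothetical name f_compare) compares two strings character by character. For example, comparing "TECO" with "TECA" gives 4: the strings first differ at index 3, and "O" sorts after "A". How TECO treats one string being a prefix of the other is not stated above; the sketch assumes the shorter string sorts first.

```python
# Hypothetical model of the value returned by F=; the behaviour when one
# string is a prefix of the other is an assumption (the shorter sorts first).

def f_compare(q: str, s: str) -> int:
    """Return 0 if equal, else +/-(1 + index of the first difference)."""
    for i, (a, b) in enumerate(zip(q, s)):
        if a != b:
            return i + 1 if a > b else -(i + 1)
    if len(q) == len(s):
        return 0
    i = min(len(q), len(s))  # the difference is where the shorter one ends
    return i + 1 if len(q) > len(s) else -(i + 1)
```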

F?

Same as 30F?

0F?

Same as 30F?

nF?

Mbox control. The argument is a bit mask:

bit 2^0	close the gap
bit 2^1	run garbage collect
bit 2^2	clear the jump cache
bit 2^3	flush unused core
bit 2^4	close the gap if it is >5000
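Because the argument is a bit mask, the default 30F? requests every action except bit 2^0: 30 = 2^1 + 2^2 + 2^3 + 2^4. The short Python sketch below decodes an argument into the actions listed in the table; the helper name is hypothetical.

```python
# Bit assignments copied from the F? table; decode_f_query is a hypothetical helper.
F_QUERY_BITS = {
    1 << 0: "close the gap",
    1 << 1: "run garbage collect",
    1 << 2: "clear the jump cache",
    1 << 3: "flush unused core",
    1 << 4: "close the gap if it is >5000",
}


def decode_f_query(n: int) -> list[str]:
    """List the actions requested by nF?; 0F? is treated as 30F?."""
    if n == 0:
        n = 30
    return [action for bit, action in F_QUERY_BITS.items() if n & bit]
```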

m,nFA

Justifies text within the range.

nFA

Justifies n lines of text.

@FA

Fills without justification.
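The difference between filling (@FA) and justifying (FA) can be sketched in Python as below. This is a minimal illustration, not TECO's actual algorithm: it assumes a greedy fill (as many words per line as fit), treats width as the line size held in FS ADLINE$, leaves the last line of the paragraph ragged, and gives any leftover spaces to the leftmost gaps. The function name fill_lines is hypothetical.

```python
# Hypothetical sketch of filling (@FA) and justifying (FA) a paragraph.
# Assumes greedy filling; TECO's real algorithm may differ.

def fill_lines(text: str, width: int, justify: bool = True) -> list[str]:
    words = text.split()
    lines: list[list[str]] = []
    current: list[str] = []
    for word in words:
        # Start a new line if adding this word would exceed the width.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    result = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        if not justify or is_last or len(line) == 1:
            result.append(" ".join(line))  # fill only: single spaces
            continue
        gaps = len(line) - 1
        spaces = width - sum(len(w) for w in line)
        base, extra = divmod(spaces, gaps)
        # Spread the padding, giving the leftmost gaps one extra space each.
        padded = [w + " " * (base + (1 if j < extra else 0))
                  for j, w in enumerate(line[:-1])]
        result.append("".join(padded) + line[-1])
    return result
```

With a width of 10, "the quick brown fox jumps" justifies to "the  quick", "brown  fox", and "jumps"; with justify set to False the first two lines keep single spaces.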

kFBstring$

Same as Sstring$ in the domain defined by k. If k is of the form m,n and m > n, search backwards. ":" and "@" modifiers work.

kFC

Converts text range k to lower case.

k@FC

Converts text range k to upper case.

n:FC

Returns the upper-case version of the character whose ASCII value is n.

nFD

Returns the range ".,x", where "x" is the position just after the nth level down in parentheses after the point.

-nFD

Goes backward.

FE

Inserts a list of all TECO error messages into the buffer.

nFE

Inserts only the message whose error code is n.

@FEstring$

Returns the code of the error whose message is string.

FG

Processes an error.

@FG

Process an error and throw away type ahead.

FI

Reads one character and returns its ASCII value.

:FI

As FI, but don't flush the character (it will be re-used).

@FI

As FI, but returns the value in the 9-bit TV character set.

@:FI

As @FI, but don't flush the character (it will be re-used).

FJ

Inserts the command line used to invoke TECO into the buffer.

FK

Returns -FS INSLEN$, i.e., the negative of the length of the last string inserted or found by a search or FW. FK is always < 0 except after a backward search or FW.

nFL

Returns the range ".,x", where "x" is the position just after the nth list after the point.

-nFL

Goes backward.

n@FL

As nFL, but does S-expressions.

nFLD

Same as nFLK.

nFLK

Kills what nFL implies.

nFLL

Does the move implied by the nFL.

nFLR

Same as nFLL.

nFLXq

Combines nFL with Xq.

m,nFM

Attempts to move the point so that the cursor will appear at column n, m lines below where you started.

FN

Is the same as "[..N:I..N". It is needed to eliminate the possibility of a ^G within the string.

FOqname$

Performs a binary search of a table of fixed-length entries. It is intended for symbol tables. Q-register q contains the table and "name" is what should be searched for. The first word of the table contains the number of words for each entry in the table.
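A binary search over such a table can be sketched in Python as below. The layout is partly assumed: word 0 holds the entry size (as stated above), the entries follow it, each entry's first word is its key (for example, a name packed into one word), and the entries are sorted in ascending order of key. What FO actually returns is not described here; the hypothetical fo_search returns the word offset of the matching entry, or -1.

```python
# Hypothetical sketch of an FO-style lookup; layout and return value are assumptions.

def fo_search(table: list[int], key: int) -> int:
    """Return the word offset of the entry whose first word equals key, or -1."""
    size = table[0]                     # words per entry
    count = (len(table) - 1) // size    # number of complete entries
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        offset = 1 + mid * size         # skip the header word
        entry_key = table[offset]
        if entry_key == key:
            return offset
        if entry_key < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

For the table [2, 10, 100, 20, 200, 30, 300] (two words per entry), searching for 20 returns offset 3, whose second word is 200.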

objectFP

Returns a number describing object:

-4	A number, none of the below.
-3	A number that could be in pure string space.
-2	A number that could be in impure string space.
-1	A dead buffer.
0	A living buffer.
1	A Q-vector.
100	A pure string.
101	An impure string.

FQq

Its value is the number of characters in Q-register q or -1 if the Q-register holds a number.

FR

Updates the display.

FSname$

Returns the value of the specified variable (listed below).

FTstring$

Types its string argument.

:FTstring$

Types its string argument at the top of the screen.

@FTstring$

Types its string argument in the echo area.

@:FTstring$

Types its string argument in the echo area but only if no input is available.

nFU

Returns the range ".,x", where "x" is the position just after the nth level up in parentheses after the point.

-nFU

Goes backward.

FVstring$

Displays its string argument.

:FVstring$

Displays its string argument, then clears the rest of the screen.

nFW

Returns the range ".,x", where "x" is the position just after the nth word after the point.

-nFW

Goes backward.

n:FW

As nFW, but only does n-1 words.

n@FW

As nFW, but does Lisp atoms and not words.

nFWD

Same as nFWK.

nFWK

Kills what nFW implies.

nFWL

Does the move implied by the nFW.

nFWR

Same as nFWL.

nFWXq

Combines nFW with Xq.

kFXq

Same as X and K combined: kXqkK.

FY

Inserts all that remains of the input file before the point.

nFY

Inserts at most n characters.

FZfile string$

Creates and starts a non-exec fork.

FZ$

Resumes the inferior fork.

F[flag$

Pushes the value of FS flag on the Q-register PDL.

nF[flag$

Pushes and sets the flag to the new value.

nF[^R CMACRO$

Pushes the definition of the character whose number is n.

m,nF[^R CMACRO$

Pushes and sets.

F_

Mostly the same as _, but keeps working regardless of the setting of FS _DISABLE$.

F]flag$

Pops the value of FS flag from the Q-register PDL.

nF]^R CMACRO$

Pops the definition of the character whose number is n.

F~

Like F=, but both strings are compared as if converted to upper case.

D.5 Special Q-registers, names are of the form "..x"

..0	^P puts its three arguments into these.
..1
..2
..A	Holds the string to represent the cursor (default is "/\").
..B	Holds the macro to display the user buffer.
..D	Holds the delimiter dispatch table, which tells several commands (FW, FL, "B, "C and search ^B) how to treat ASCII characters.
..E	Holds the output radix for = and \.
..F	Holds the ^R secretary macro. Can be used for auto save.
..G	Holds the user-specified directory display macro.
..H	Is the "suppress-display" flag.
..I	Holds the value of . at the start of the command.
..J	Holds user-specified label for --MORE-- processing.
..K	Holds deleted text.
..L	Executes when TECO first starts.
..N	Holds a macro to be executed when another macro exits.
..O	The current buffer.
..P	Holds the user-defined error-handler macro.
..Q	Holds the symbol table used to define TECO variables.
..Z	Safety backup copy of ..O.

D.6 FS Variables

Names can be up to six characters long. Spaces in names are ignored. Only as much of a name as is needed to make it unique must be given, although programs should include the entire name. Saying FSname$ returns the value of the flag. Saying nFSname$ or m,nFSname$ sets the value. If a flag can be set and you want to use the flag as the second operand of an arithmetic operator (e.g., .+FSname$C), enclose the FS in parentheses (.+(FSname$)C); otherwise, the value to its left is taken as an argument and sets the flag.

These names can never include control characters. The "^" in some of the names is a leading caret. However, the combination usually relates to the implied control character.
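The abbreviation rule can be sketched in Python as a unique-prefix lookup that ignores spaces. Only a few names from the list below are included. The sketch assumes that case does not matter and that a complete name wins over longer names it is a prefix of (as ERR is of ERROR); neither point is stated above, and the function names are hypothetical.

```python
# Hypothetical sketch of FS-name abbreviation; only a few names are listed.
FS_NAMES = ["BACK ARGS", "BACK DEPTH", "BACK PC", "BACKTRACE",
            "ERR", "ERR THROW", "ERRFLAG", "ERROR"]


def squeeze(name: str) -> str:
    """Spaces are ignored; comparison is assumed to be case-insensitive."""
    return name.replace(" ", "").upper()


def resolve_fs_name(abbrev: str, names: list[str] = FS_NAMES) -> str:
    key = squeeze(abbrev)
    candidates = [n for n in names if squeeze(n).startswith(key)]
    exact = [n for n in candidates if squeeze(n) == key]
    if exact:  # assumed: a complete name wins over longer names it prefixes
        return exact[0]
    if len(candidates) != 1:
        raise KeyError(f"{abbrev!r} is ambiguous or unknown: {candidates}")
    return candidates[0]
```

Here "back p" resolves to BACK PC and "ERRO" to ERROR, while "BACK" alone is ambiguous and raises KeyError.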

% BOTTOM

Size of the bottom margin as a percentage of the number of lines being displayed.

% CENTER

Where TECO should prefer to put the cursor.

% END

Size of the area at the bottom of the screen, such that TECO should never choose to put the cursor there.

% OPLSP

(Read only) Non-zero if the input is coming from a Lisp job.

% TOP

Size of the top margin (analogous to %BOTTOM).

% TOCID

(Read only) Non-zero if the terminal can insert and delete characters.

% TOFCI

(Read only) Non-zero if the terminal can generate 9-bit characters.

% TOHDX

(Read only) Non-zero if the terminal is half-duplex.

% TOLID

(Read only) Non-zero if the terminal can insert and delete lines.

% TOLWR

(Read only) Non-zero if the terminal can generate lower case characters.

% TOMOR

(Read only) Non-zero if the user wants --MORE-- processing.

% TOOVR

(Read only) Non-zero if the terminal can overprint.

% TOROL

(Read only) Non-zero if the user has selected scroll mode.

% TOSAI

(Read only) Non-zero if the terminal can print the SAIL character set.

*RSET

Initially 0. If set to non-zero, trace information is not cleared automatically.

.CLRMOD

Normally -1. If negative, screen is normally cleared automatically. If 0, automatic screen clears are not done (used for debugging). If positive, the screen is never cleared.

.KILMOD

Normally -1. If 0, FS BKILL$ doesn't actually do the kill.

.TYI BACK

Backs up the pointer FS .TYI PT$ by one step. After backing up n steps, you can use FS .TYI NXT$ to re-get those n input characters.

.TYI NXT

Extracts one character from the ring buffer of past input characters.

.TYI PT

Pointer into the ring buffer that contains the last 60 input characters.

:EJ PAGE

Is the number of the lowest page used by :EJ'd shared pure files.

ADLINE

Is the line size used by the FA command.

ALTCOUNT

Is the number of $$s that TECO has seen at interrupt level.

BACK ARGS

(Read only) Returns the arguments to a macro in a different stack frame (i.e., one of the macros that was called that eventually called you). Returns 0, 1, or 2 values in the same ways that F^X does. If the argument to this is 0 or positive, it returns the arguments for the specified frame number (0 is outermost). If negative, returns the arguments for the relative frame number (-1 is your caller).

BACK DEPTH

(Read only) Returns the number of stack frames, not counting you.

BACK PC

Returns the PC of the stack frame that is specified in the same way as FS BACK ARGS$. m,nFS BACK PC$ sets the PC to m.

BACK QP PTR

(Read only) Specifies where a ^\ will return to. Arguments are as for FS BACK ARGS$.

BACK RETURN

(Write only) Returns from the specified stack frame. Arguments are as for FS BACK ARGS$. -1 FS BACK RETURN$ is equivalent to ^\.

BACK STRING

(Read only) Returns a pointer to the string or buffer being executed. Arguments are as for FS BACK ARGS$.

BACKTRACE

Returns a copy of the program being run by the stack frame. Arguments are as for FS BACK ARGS$.

BBIND

Is useless by itself, but F[B BIND$ and F]B BIND$ are useful for pushing to and popping from a temporary buffer.

BCONS

Returns a new buffer n characters long (as in n FS BCONS$). It is initially filled with 0's (NULs).

BCREATE

Is like FS BCONS$ U..O. In other words, the buffer is selected instead of returned.

BKILL

Kills the specified buffer.

BOTHCASE

Initially 0. If == 0, case is significant during searches. If > 0, case is ignored. If < 0, case of special characters (@[\]^_ and `{|}~^?) is also ignored.
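The three BOTHCASE settings can be sketched in Python by folding characters before comparing them. In ASCII, each special character listed above sits exactly 32 below its partner (` and @, { and [, | and \, } and ], ~ and ^, ^? and _), just as lower-case letters sit 32 above upper-case ones. The helper names are hypothetical, and this is not how TECO's search is actually implemented.

```python
# Hypothetical sketch of FS BOTHCASE$ matching by folding characters.

def fold_char(ch: str, bothcase: int) -> str:
    code = ord(ch)
    if bothcase > 0 and 0x61 <= code <= 0x7A:  # a-z only
        return chr(code - 0x20)
    if bothcase < 0 and 0x60 <= code <= 0x7F:  # a-z plus `{|}~ and DEL
        return chr(code - 0x20)
    return ch


def bothcase_find(buffer: str, target: str, bothcase: int = 0) -> int:
    """Index of the first match of target in buffer, or -1."""
    folded_buffer = "".join(fold_char(c, bothcase) for c in buffer)
    folded_target = "".join(fold_char(c, bothcase) for c in target)
    return folded_buffer.find(folded_target)
```

Searching "a{b}" for "A[B]" fails with a setting of 1 but succeeds at index 0 with a setting of -1.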

BOUNDARIES

Reads or sets the virtual buffer boundaries.

BS NO LF

If non-zero, suppresses the LF that follows any backward motion or rubbing out in ^R mode on printing terminals.

CASE

Like F$, but neither gets nor sets the case-shift or case-lock characters.

CLK INTERVAL

Is the interval between real-time clock ticks in units of 1/60 second. Only active during user input.

CLK MACRO

Is the real-time interrupt handler macro. If the macro types anything out, it must not leave ..H set.

CTL MTA

If negative, it suppresses the ^R mode definitions for all control-meta characters. This makes it easy to edit TECO commands.

DATA SWITCHES

(Read only) The contents of the PDP-10 console switches.

DATE

(Read only) The current date and time as a number in file-date format. It can be fed to FS FD CONVERT$ or FS IF CDATE$.

D DEVICE

Is the default device name.

DD FAST

(Read only) Is non-zero if the current device is fast (i.e., local disk).

D FILE

Is the default file name.

D FN1

Is the default file name first part.

D FN2

Is the default file name second part.

D FORCE

Setting this to non-zero forces a complete redisplay of everything except the mode line. It is used for putting up temporary displays.

D SNAME

Is the default sname.

D VERSION

Is the default version number, a reflection of FS D FN2$. If the latter is numeric, reading this value returns the corresponding number. If it is ">" or "<", this value is 0 or -2, respectively. If it is not numeric, this returns -1. If FS D FN2$ is numeric, setting this value sets the file name. Otherwise, the setting is ignored.
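The value read from FS D VERSION$ can be sketched in Python as a function of the second file name. The function name is hypothetical, and the sketch assumes numeric names are written in decimal.

```python
# Hypothetical sketch of the value read from FS D VERSION$.

def d_version(fn2: str) -> int:
    if fn2 == ">":
        return 0
    if fn2 == "<":
        return -2
    if fn2 and all(c in "0123456789" for c in fn2):
        return int(fn2)  # numeric second name: assumed decimal
    return -1            # any other (non-numeric) second name
```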

D WAIT

When set to non-zero, causes the display to pause slightly between lines of output.

ECHO ACTIVE

When set to non-zero, indicates that output has been written to the echo area, so the echo area needs to be cleaned up.

ECHO CHAR

When a ^R mode character is being executed, this value holds the character that caused the invocation.

ECHO DISPLAY

(Write only) As for FS ECHO OUT$, but outputs in display mode.

ECHO ERRORS

When set to non-zero, error messages are printed in the echo area.

ECHO FLUSH

When set to non-zero, automatic clearing of the echo area in ^R mode is enabled.

ECHO LINES

The number of lines at the screen bottom that can be used for command echoing.

ECHO OUT

(Write only) Used for outputs to the echo area. If it has a numeric argument, the argument is the ASCII code of a character to echo. With a string argument, the string is echoed.

ERR

Same as FS ERROR$ if read. If written to, creates an error with the specified error code.

ERRFLAG

When negative, signals to redisplay that the first -n lines of the display contain an error message and should not be overwritten.

ERROR

The error code of the most recent error.

ERR THROW

(Write only) Return to the innermost error catcher.

EXIT

(Write only) Does a .break 16.

FDCONVERT

With a numeric argument, converts it from an ITS file date to a string of the form "dd/mm/yy hh:mm:ss" and inserts the string into the buffer. The form n:FS FDCONVERT$ returns the string. With no argument, reads the string from the buffer and converts it to numeric form.

FILE PAD

Is the character used to pad the last word of files. Usually 3 (^C).

FLUSHED

Is set to non-zero if a --MORE-- has been flushed, and thus output is being discarded.

FNAM SYNTAX

Controls what happens when only one file name is present. If this is 0, the file name is used as part two. If > 0, the file name is used as part one. If < 0, the file name is used as part one and ">" is used for part two.

GAP LENGTH

(Read only) The length of the gap.

GAP LOCATION

(Read only) The buffer position of the gap.

HEIGHT

(Read only) The number of lines on the screen.

HELP CHAR

Contains the character used for the help character. Normally, ^_. If set to -1, help is not recognized (e.g., useful for ^Q).

HELP MAC

Is the macro to execute when the help character is typed.

H POSITION

(Read only) Returns the column that the point is in.

HSNAME

The user's home directory.

I&D CHR

When set to non-zero, TECO tries to use the terminal's insert and delete character functions.

I&D LINE

When set to non-zero, TECO tries to use the terminal's insert and delete line functions.

IBASE

The input radix for numbers not ended by ".". Initially 8 + 2.

I.BASE

The input radix for numbers ended by ".". Initially 8.
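With the initial settings of the two radices, "17" reads as seventeen, while "17." is read in base 8 and gives fifteen. A minimal Python sketch, assuming a number token has already been isolated from the command string (the helper name is hypothetical):

```python
# Hypothetical sketch of the FS IBASE$ / FS I.BASE$ rule.

def parse_number(token: str, ibase: int = 10, i_dot_base: int = 8) -> int:
    if token.endswith("."):
        return int(token[:-1], i_dot_base)  # ended by ".": use FS I.BASE$
    return int(token, ibase)                # otherwise: use FS IBASE$
```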

IF ACCESS

(Write only) Sets the access pointer for the input file.

IF CDATE

The creation date for the input file.

IF DEVICE

(Read only) The device for the input file.

IF DUMP

The dumped bit for the input file.

I FILE

(Read only) The name of the input file.

IF FN1

(Read only) The first name of the input file.

IF FN2

(Read only) The second name of the input file.

IF LENGTH

(Read only) The length of the input file.

IF REAP

The reap bit for the input file.

IF SNAME

(Read only) The sname of the input file.

IF VERSION

(Read only) The version number of the input file or FS IF FN2$.

IMAGE OUT

Outputs its argument in super-image mode (no translations at all).

IN COUNT

Is an old name for FS TYI COUNT$.

INSLEN

Is the length of the last string inserted with "I", "G", or "\", or found with "S" or "FW". It will be negative after a backward search.

JNAME

(Read only) Returns the jobname.

JRN EXECUTE

(Write only) Opens a journal file for playing back. The form :FS JRN EXECUTE$ closes the file. The default file names are used.

JRN IN

(Read only) Is non-zero when a journal file is being replayed.

JRN INHIBIT

When set to non-zero, input is taken from the terminal even though a journal file is being replayed. This is how FS JRN MACRO$ can work.

JRN INTERVAL

Specifies how often a journal file being recorded is updated on disk. The interval is in units of commands.

JRN MACRO

This macro is called when a journal file is being replayed and TECO encounters a colon or ^G in the file. The character is passed as an argument. In the case of a ^G, the macro should execute a ^R and then quit by doing -1 FS QUIT$. In the case of a colon, this macro should read more characters from the file by doing FS JRN READ$ and acting upon them.

JRN OPEN

(Write only) Opens a journal file for writing (recording). The default file names are used. The form :FS JRN OPEN$ closes the file.

JRN OUT

(Read only) Is non-zero when a journal file is being recorded.

JRN READ

(Read only) Reads a character from the journal file being replayed. If there is no such file, it returns a random value.

JRN WRITE

(Write only) Outputs its argument, either a character or a string, to the journal file being written. If there is no such file, it does nothing.

LAST PAGE

(Read only) Set to -1 when a file is opened and set to 0 when the last character has been read.

LINES

Is the number of lines used by a standard buffer redisplay. 0 means to use the whole screen.

LISPT

When set to non-zero, it means that text is supposed to be passed between TECO and its superior.

LISTEN

Returns non-zero if input is available to be read by FI. If it is given an argument and no input is available, the argument is typed out.

MACHINE

(Read only) Returns the name of the machine that TECO is running on.

MODE CHANGE

When set to non-zero, the FS MODE MACRO$ needs to be run eventually.

MODE MACRO

The macro to update Q-register ..J and the mode line.

MODIFIED

When set to non-zero, the buffer has been changed since last read or written.

MP DISPLAY

(Write only) Outputs text to the main program display.

MSNAME

The name of the working directory.

NOOP ALTMODE

When set to a negative number, an altmode is considered a no-op. When set to 0, an altmode is considered an error. When set to a positive number, altmode ends execution as ^_ does.

NOQUIT

Gives the user control over ^G.

OF ACCESS

(Write only) Sets the access pointer for the output file.

OF CDATE

The creation date for the output file.

O FILE

(Read only) The name of the output file.

OF LENGTH

(Read only) The length of the output file.

OF VERSION

(Read only) The version number of the output file or FS OF FN2$.

OLD FLUSHED

Saves the value of FS FLUSHED$ when that is set to zero upon returning to ^R.

OLD MODE

Is the last Q-register ..J actually displayed.

OSPEED

The terminal's output speed in baud or 0 if the speed is not known.

OUTPUT

When set to non-zero, suppresses output to the EW'd file.

PAGENUM

The number of form feeds read from the input file.

PJATTY

Set to a negative value whenever TECO detects that the terminal has been taken away. This negative value means that a complete redisplay must be done.

PROMPT

The ASCII value of the prompt character.

PUSHPT

(Write only) Same as n^V.

QP HOME

Returns a string that says where the Q-register PDL (Push Down List == stack) slot n was pushed from. The form :FS QP HOME$ returns a pointer to the Q-register. The form n@FS QP HOME$ converts the pointer returned by :FS QP HOME$ into the string form.

QP PTR

The Q-register PDL pointer.

QP SLOT

Reads the specified PDL slot.

QP UNWIND

(Write only) Like FS QP PTR$ but pops slots back into the Q-registers they came from.

QUIT

When set to a negative value, execution will quit at the next opportunity.

Q VECTOR

Returns an n character long newly-consed up Q-register vector.

RANDOM

Reads or sets the random number generator seed.

READ ONLY

When set to non-zero, attempts to modify the buffer become errors.

REAL ADDRESS

Returns the value of the machine address of the start of the buffer.

REFRESH

When set to non-zero, this macro is executed whenever TECO really clears the whole screen. It is executed after the screen has been cleared.

REREAD

When set to non-negative, the 9-bit TV code will be read by the next invocation of FI.

RGETTY

0 if printing terminal, or contains the tctyp word of a display terminal.

RUB CRLF

When set to non-zero, both characters of a CR/LF pair are erased together.

RUB MACRO

The macro called by ^R mode when it wants to do a ^? or ^D.

RUNTIME

(Read only) TECO's runtime in milliseconds.

SAIL

When set to non-zero, the terminal is assumed to support the SAIL character set.

S ERROR

When set to non-zero, a failing search within an iteration or a ^P sort will generate an error.

SHOW MODE

When set to non-zero, FR will type on the mode line on a printing terminal. Has no effect on displays.

S HPOS

Is the horizontal position of the point when everything is taken into account, but assuming an infinitely wide line.

S STRING

Is the default search string.

STEP MACRO

When set to non-zero and numeric, TECO displays the buffer and waits at the start of every line in a program. When set to non-zero and a string, TECO executes this macro at the beginning of every line in a program. Macros that start with W are never stepped.

STEP DEPTH

When set to -1, stepping occurs always. Otherwise, it is the number of the stack level at which to cut off stepping.

SUPERIOR

Is the macro invoked when superiors want to put text into TECO.

S VALUE

Is the value returned by last search command.

TOP LINE

The number of the first line of the screen that TECO should use.

TRACE

When set to non-zero, TECO is in trace mode. See ?.

TRUNCATE

If negative, long lines should be truncated. If 0 or positive, long lines are wrapped to the next line.

TTMODE

When set to non-zero, tells TECO that normal buffer display should display on printing terminals.

TTY INIT

(Re)initializes TECO's TTY information.

TTY MACRO

Performs user-specified TTY initialization.

TTYOPT

(Read only) The TTYOPT word for the terminal. Use the %TOxxx values instead.

TTYSMT

(Read only) The TTYSMT word for the terminal.

TYI BEG

The value of FS TYI COUNT$ the last time through the main ^R command loop.

TYI COUNT

The number of characters read so far.

TYI SINK

When set to non-zero, is a macro that is executed every time a character is actually read from the terminal.

TYI SOURCE

When set to non-zero, it is a macro that is called to obtain "terminal input."

TYO HASH

Returns the hash code of screen line n. Doing -1,n FS TYO HASH$ forces line n to be redisplayed.

TYO HPOS

(Read only) Holds the horizontal position at which type out will next appear.

TYO VPOS

(Read only) Holds the vertical position at which type out will next appear.

TYPEOUT

Tells where type out will next appear. If -1, the next type out will appear at the top of the screen. Otherwise, type out will appear just after the last type out.

U HSNAME

Determines a user's hsname.

UINDEX

(Read only) The user index of the TECO job.

U MAIL FILE

The complete file name of the user's mail file.

UNAME

(Read only) The user name of the TECO job.

UPTIME

(Read only) Returns the time that the system has been up in units of 1/30 second.

UREAD

(Read only) Is -1 if an input file is open, otherwise it is zero.

UWRITE

(Read only) Is -1 if an output file is open, otherwise it is zero.

VAR MACRO

When set to non-zero, a macro can be run whenever a variable is set.

V B

Is the distance between the real beginning of the buffer and the virtual beginning.

VERBOSE

When set to non-zero, TECO prints long error messages. Otherwise, TECO prints only short messages and ^X must be typed to see the long version.

VERSION

(Read only) The TECO version number.

V Z

Is the distance between the real end of the buffer and the virtual end.

WIDTH

Width of the terminal in characters.

WINDOW

The position of the first character in the display window, relative to the virtual beginning of the buffer.

WORD

Gets or sets words in the current buffer.

XJNAME

(Read only) Returns the xjname of the TECO job.

X MODIFIED

Just like FS MODIFIED$, only it doesn't affect the display of the modified flag in the mode line. Thus, the user can track whether changes were made by intervening commands.

X PROMPT

Printed and zeroed with each printing terminal prompt.

XUNAME

(Read only) Returns the xuname of the TECO job.

Y DISABLE

When set to 0, the Y command is legal. When set to 1, the Y command is always illegal. When set to -1, the Y command is treated as @Y.

Z

(Read only) The number of characters in the buffer.

^H PRINT

When set to negative, a ^H on output will backspace and overprint. Otherwise, ^H will type as a ^ and H.

^I DISABLE

When set to 0, ^I is an insert command. When set to 1, ^I is illegal. When set to -1, ^I is a no-op.

^L INSERT

When set to non-zero, form feeds read from files always go into the buffer and P and PW never output anything except what is in the buffer.

^M PRINT

When set to negative, a ^M on output will output as a CR/LF. Otherwise, ^M will type as a ^ and M.

^P CASE

When set to non-zero, ^P ignores case.

^R ARG

Is the explicit numeric argument and is 0 (not 1!) if no argument was entered.

^R ARGP

Describes the argument given to the ^R command:

bit 2^0	set if any argument was specified
bit 2^1	set if a number was typed
bit 2^2	set if the argument is negative

^R CCOL

The comment column.

^R CMACRO

Converts the ASCII value n to a form required for ^R command dispatch.

^R DISPLAY

When set to non-zero, this macro is executed whenever ^R is about to do a non-trivial redisplay.

^R EXIT

(Write only) Exits from the innermost ^R invocation.

^R ECHO

When set to 1, the characters read in by ^R should not be echoed. When set to 0, they should be echoed only on printing terminals. When set to -1, they should be echoed on all terminals.

^R EC SD

When set to non-zero, this macro is executed whenever a space command is typed. Used for auto-filling and such.

^R ENTERED

When set to non-zero, this macro is executed whenever ^R is invoked.

^R EXPT

Is the ^U count for the next ^R mode command.

^R H MIN

(Read only) Is the leftmost horizontal position requiring redisplay.

^R HPOS

The current horizontal position of the cursor.

^R INDIRECT

Given a 9-bit character, follows the alias definitions to find what it is equivalent to.

^R INHIBIT

When set to non-zero, ^R will not update the display. When set back to zero, ^R will catch up.

^R INIT

Returns the initial definition of the character whose ASCII value is n.

^R INSERT

Inserts its argument.

^R LAST

Holds the most recent character read by any ^R.

^R LEAVE

When set to non-zero, this macro is executed whenever ^R returns.

^R MARK

Records the position of the mark.

^R MAX

The maximum number of characters of insertions or deletions printed out by ^R on a printing terminal before it switches to printing a description of the change. Default is 50.

^R MCNT

The secretary mode counter.

^R MDLY

The secretary mode limit value.

^R MODE

(Read only) Non-zero while in ^R mode.

^R MORE

When positive, --MORE-- is used for the ^R mode line instead of --TOP--, --BOT--, and --nn%--. This is used when in an environment where Space means "show me the next screenful." When negative, no --XXX-- is displayed.

^R NORMAL

When set to non-zero, this macro is executed for all normally self-inserting characters.

^R PAREN

When set to non-zero, this macro is executed for every ")" character.

^R PREVIOUS

Holds the previous (second most recent) command.

^R REPLACE

When set to non-zero, ^R runs in "replace" mode instead of "insert" mode.

^R RUBOUT

The internal ^R rubout routine.

^R SCAN

When set to non-zero and a printing terminal is in use, displays characters that are being moved past.

^R STAR

When set to non-zero, a star appears in the mode line if the buffer has been modified.

^R SUPPRESS

When set to 0 or positive, built-in ^R mode commands are suppressed and characters insert.

^R THROW

Returns control to the innermost invocation of ^R.

^R UNSUPP

When set to -1, one character will be unsuppressed.

^R V MIN

(Read only) Is the topmost line requiring redisplay.

^R VPOS

The current vertical position of the cursor.

_ DISABLE

When 0, _ is "search and yank." When 1, _ is illegal. When -1, _ is treated like -.


Appendix E: ASCII Chart

This character set is as specified in ANS standards, and is known as the ASCII character set. It has been extended to include Meta characters (characters with their top, or eighth, bit turned on).
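The Graphic column of the chart follows a simple rule: control characters (0 through 31) are written as ^ followed by the character 64 higher, DEL as ^?, and a Meta character as ~ followed by the graphic for its low seven bits. The Python sketch below reproduces that rule and the numeric columns. The helper names are hypothetical, and the treatment of printable Meta characters (for example, ~A for 193) is an assumption, since this part of the chart stops at 157.

```python
# Hypothetical helpers reproducing the columns of this chart for codes 0-255.

def graphic(code: int) -> str:
    if not 0 <= code <= 255:
        raise ValueError("code must be 0-255")
    prefix = "~" if code >= 128 else ""  # Meta: eighth bit turned on
    base = code & 0x7F
    if base < 32:
        return prefix + "^" + chr(base + 64)  # control characters: ^@ .. ^_
    if base == 127:
        return prefix + "^?"  # DEL
    return prefix + chr(base)


def chart_row(code: int) -> str:
    """Decimal (with trailing '.'), octal, hex, and graphic, tab-separated."""
    return f"{code}.\t{code:03o}\t{code:02X}\t{graphic(code)}"
```

For example, chart_row(27) produces the same first four columns as the ESC row below.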

Decimal	Octal	Hex	Graphic	Name (Meaning)

0.	000	00	^@	NUL (used for padding)
1.	001	01	^A	SOH (start of header)
2.	002	02	^B	STX (start of text)
3.	003	03	^C	ETX (end of text)
4.	004	04	^D	EOT (end of transmission)
5.	005	05	^E	ENQ (enquiry)
6.	006	06	^F	ACK (acknowledge)
7.	007	07	^G	BEL (bell or alarm)
8.	010	08	^H	BS (backspace)
9.	011	09	^I	HT, TAB (horizontal tab)
10.	012	0A	^J	LF (line feed)
11.	013	0B	^K	VT (vertical tab)
12.	014	0C	^L	FF (form feed, new page)
13.	015	0D	^M	CR (carriage return)
14.	016	0E	^N	SO (shift out)
15.	017	0F	^O	SI (shift in)

16.	020	10	^P	DLE (data link escape)
17.	021	11	^Q	DC1, XON (device control 1)
18.	022	12	^R	DC2 (device control 2)
19.	023	13	^S	DC3, XOFF (device control 3)
20.	024	14	^T	DC4 (device control 4)
21.	025	15	^U	NAK (negative acknowledge)
22.	026	16	^V	SYN (synchronous idle)
23.	027	17	^W	ETB (end of transmission block)
24.	030	18	^X	CAN (cancel)
25.	031	19	^Y	EM (end of medium)
26.	032	1A	^Z	SUB (substitute)
27.	033	1B	^[	ESC (escape, alter mode, SEL)
28.	034	1C	^\	FS (file separator)
29.	035	1D	^]	GS (group separator)
30.	036	1E	^^	RS (record separator)
31.	037	1F	^_	US (unit separator)

32.	040	20		space or blank
33.	041	21	!	exclamation mark
34.	042	22	"	double quote
35.	043	23	#	number sign (hash mark)
36.	044	24	$	dollar sign
37.	045	25	%	percent sign
38.	046	26	&	ampersand sign
39.	047	27	'	single quote (apostrophe)
40.	050	28	(	left parenthesis
41.	051	29	)	right parenthesis
42.	052	2A	*	asterisk (star)
43.	053	2B	+	plus sign
44.	054	2C	,	comma
45.	055	2D	-	minus sign (dash)
46.	056	2E	.	period (decimal point, dot)
47.	057	2F	/	(right) slash

48.	060	30	0	numeral zero
49.	061	31	1	numeral one
50.	062	32	2	numeral two
51.	063	33	3	numeral three
52.	064	34	4	numeral four
53.	065	35	5	numeral five
54.	066	36	6	numeral six
55.	067	37	7	numeral seven
56.	070	38	8	numeral eight
57.	071	39	9	numeral nine
58.	072	3A	:	colon
59.	073	3B	;	semi-colon
60.	074	3C	<	less-than sign
61.	075	3D	=	equal sign
62.	076	3E	>	greater-than sign
63.	077	3F	?	question mark

64.	100	40	@	at sign
65.	101	41	A	upper-case letter ALPHA
66.	102	42	B	upper-case letter BRAVO
67.	103	43	C	upper-case letter CHARLIE
68.	104	44	D	upper-case letter DELTA
69.	105	45	E	upper-case letter ECHO
70.	106	46	F	upper-case letter FOXTROT
71.	107	47	G	upper-case letter GOLF
72.	110	48	H	upper-case letter HOTEL
73.	111	49	I	upper-case letter INDIA
74.	112	4A	J	upper-case letter JERICHO
75.	113	4B	K	upper-case letter KAPPA
76.	114	4C	L	upper-case letter LIMA
77.	115	4D	M	upper-case letter MIKE
78.	116	4E	N	upper-case letter NOVEMBER
79.	117	4F	O	upper-case letter OSCAR

80.	120	50	P	upper-case letter PAPPA
81.	121	51	Q	upper-case letter QUEBEC
82.	122	52	R	upper-case letter ROMEO
83.	123	53	S	upper-case letter SIERRA
84.	124	54	T	upper-case letter TANGO
85.	125	55	U	upper-case letter UNICORN
86.	126	56	V	upper-case letter VICTOR
87.	127	57	W	upper-case letter WHISKEY
88.	130	58	X	upper-case letter XRAY
89.	131	59	Y	upper-case letter YANKEE
90.	132	5A	Z	upper-case letter ZEBRA
91.	133	5B	[	left square bracket
92.	134	5C	\	left slash (backslash)
93.	135	5D	]	right square bracket
94.	136	5E	^	uparrow (caret)
95.	137	5F	_	underscore

96.	140	60	`	(single) back quote (grave accent)
97.	141	61	a	lower-case letter alpha
98.	142	62	b	lower-case letter bravo
99.	143	63	c	lower-case letter charlie
100.	144	64	d	lower-case letter delta
101.	145	65	e	lower-case letter echo
102.	146	66	f	lower-case letter foxtrot
103.	147	67	g	lower-case letter golf
104.	150	68	h	lower-case letter hotel
105.	151	69	i	lower-case letter india
106.	152	6A	j	lower-case letter jericho
107.	153	6B	k	lower-case letter kappa
108.	154	6C	l	lower-case letter lima
109.	155	6D	m	lower-case letter mike
110.	156	6E	n	lower-case letter november
111.	157	6F	o	lower-case letter oscar

112.	160	70	p	lower-case letter pappa
113.	161	71	q	lower-case letter quebec
114.	162	72	r	lower-case letter romeo
115.	163	73	s	lower-case letter sierra
116.	164	74	t	lower-case letter tango
117.	165	75	u	lower-case letter unicorn
118.	166	76	v	lower-case letter victor
119.	167	77	w	lower-case letter whiskey
120.	170	78	x	lower-case letter xray
121.	171	79	y	lower-case letter yankee
122.	172	7A	z	lower-case letter zebra
123.	173	7B	{	left curly brace
124.	174	7C	|	vertical bar
125.	175	7D	}	right curly brace
126.	176	7E	~	tilde
127.	177	7F	^?	DEL (delete, rub out)

128.	200	80	~^@	Meta NUL (used for padding)
129.	201	81	~^A	Meta SOH (start of header)
130.	202	82	~^B	Meta STX (start of text)
131.	203	83	~^C	Meta ETX (end of text)
132.	204	84	~^D	Meta EOT (end of transmission)
133.	205	85	~^E	Meta ENQ (enquiry)
134.	206	86	~^F	Meta ACK (acknowledge)
135.	207	87	~^G	Meta BEL (bell or alarm)
136.	210	88	~^H	Meta BS (backspace)
137.	211	89	~^I	Meta HT, TAB (horizontal tab)
138.	212	8A	~^J	Meta LF (line feed)
139.	213	8B	~^K	Meta VT (vertical tab)
140.	214	8C	~^L	Meta FF (form feed, new page)
141.	215	8D	~^M	Meta CR (carriage return)
142.	216	8E	~^N	Meta SO (shift out)
143.	217	8F	~^O	Meta SI (shift in)

144.	220	90	~^P	Meta DLE (data link escape)
145.	221	91	~^Q	Meta DC1, XON (device control 1)
146.	222	92	~^R	Meta DC2 (device control 2)
147.	223	93	~^S	Meta DC3, XOFF (device control 3)
148.	224	94	~^T	Meta DC4 (device control 4)
149.	225	95	~^U	Meta NAK (negative acknowledge)
150.	226	96	~^V	Meta SYN (synchronous idle)
151.	227	97	~^W	Meta ETB (end transmission block)
152.	230	98	~^X	Meta CAN (cancel)
153.	231	99	~^Y	Meta EM (end of medium)
154.	232	9A	~^Z	Meta SUB (substitute)
155.	233	9B	~^[	Meta ESC (escape, alter mode, SEL)
156.	234	9C	~^\	Meta FS (file separator)
157.	235	9D	~^]	Meta GS (group separator)
158.	236	9E	~^^	Meta RS (record separator)
159.	237	9F	~^_	Meta US (unit separator)

160.	240	A0	~	Meta space
161.	241	A1	~!	Meta exclamation mark
162.	242	A2	~"	Meta double quote
163.	243	A3	~#	Meta number sign (hash mark)
164.	244	A4	~$	Meta dollar sign
165.	245	A5	~%	Meta percent sign
166.	246	A6	~&	Meta ampersand sign
167.	247	A7	~'	Meta single quote (apostrophe)
168.	250	A8	~(	Meta left parenthesis
169.	251	A9	~)	Meta right parenthesis
170.	252	AA	~*	Meta asterisk (star)
171.	253	AB	~+	Meta plus sign
172.	254	AC	~,	Meta comma
173.	255	AD	~-	Meta minus sign (dash)
174.	256	AE	~.	Meta period (decimal point, dot)
175.	257	AF	~/	Meta (right) slash

176.	260	B0	~0	Meta numeral zero
177.	261	B1	~1	Meta numeral one
178.	262	B2	~2	Meta numeral two
179.	263	B3	~3	Meta numeral three
180.	264	B4	~4	Meta numeral four
181.	265	B5	~5	Meta numeral five
182.	266	B6	~6	Meta numeral six
183.	267	B7	~7	Meta numeral seven
184.	270	B8	~8	Meta numeral eight
185.	271	B9	~9	Meta numeral nine
186.	272	BA	~:	Meta colon
187.	273	BB	~;	Meta semi-colon
188.	274	BC	~<	Meta less-than sign
189.	275	BD	~=	Meta equal sign
190.	276	BE	~>	Meta greater-than sign
191.	277	BF	~?	Meta question mark

192.	300	C0	~@	Meta atsign
193.	301	C1	~A	Meta upper-case letter ALPHA
194.	302	C2	~B	Meta upper-case letter BRAVO
195.	303	C3	~C	Meta upper-case letter CHARLIE
196.	304	C4	~D	Meta upper-case letter DELTA
197.	305	C5	~E	Meta upper-case letter ECHO
198.	306	C6	~F	Meta upper-case letter FOXTROT
199.	307	C7	~G	Meta upper-case letter GOLF
200.	310	C8	~H	Meta upper-case letter HOTEL
201.	311	C9	~I	Meta upper-case letter INDIA
202.	312	CA	~J	Meta upper-case letter JERICHO
203.	313	CB	~K	Meta upper-case letter KAPPA
204.	314	CC	~L	Meta upper-case letter LIMA
205.	315	CD	~M	Meta upper-case letter MIKE
206.	316	CE	~N	Meta upper-case letter NOVEMBER
207.	317	CF	~O	Meta upper-case letter OSCAR

208.	320	D0	~P	Meta upper-case letter PAPPA
209.	321	D1	~Q	Meta upper-case letter QUEBEC
210.	322	D2	~R	Meta upper-case letter ROMEO
211.	323	D3	~S	Meta upper-case letter SIERRA
212.	324	D4	~T	Meta upper-case letter TANGO
213.	325	D5	~U	Meta upper-case letter UNICORN
214.	326	D6	~V	Meta upper-case letter VICTOR
215.	327	D7	~W	Meta upper-case letter WHISKEY
216.	330	D8	~X	Meta upper-case letter XRAY
217.	331	D9	~Y	Meta upper-case letter YANKEE
218.	332	DA	~Z	Meta upper-case letter ZEBRA
219.	333	DB	~[	Meta left square bracket
220.	334	DC	~\	Meta left slash (backslash)
221.	335	DD	~]	Meta right square bracket
222.	336	DE	~^	Meta uparrow (caret)
223.	337	DF	~_	Meta underscore

224.	340	E0	~`	Meta (single) back quote (grave accent)
225.	341	E1	~a	Meta lower-case letter alpha
226.	342	E2	~b	Meta lower-case letter bravo
227.	343	E3	~c	Meta lower-case letter charlie
228.	344	E4	~d	Meta lower-case letter delta
229.	345	E5	~e	Meta lower-case letter echo
230.	346	E6	~f	Meta lower-case letter foxtrot
231.	347	E7	~g	Meta lower-case letter golf
232.	350	E8	~h	Meta lower-case letter hotel
233.	351	E9	~i	Meta lower-case letter india
234.	352	EA	~j	Meta lower-case letter jericho
235.	353	EB	~k	Meta lower-case letter kappa
236.	354	EC	~l	Meta lower-case letter lima
237.	355	ED	~m	Meta lower-case letter mike
238.	356	EE	~n	Meta lower-case letter november
239.	357	EF	~o	Meta lower-case letter oscar

240.	360	F0	~p	Meta lower-case letter pappa
241.	361	F1	~q	Meta lower-case letter quebec
242.	362	F2	~r	Meta lower-case letter romeo
243.	363	F3	~s	Meta lower-case letter sierra
244.	364	F4	~t	Meta lower-case letter tango
245.	365	F5	~u	Meta lower-case letter unicorn
246.	366	F6	~v	Meta lower-case letter victor
247.	367	F7	~w	Meta lower-case letter whiskey
248.	370	F8	~x	Meta lower-case letter xray
249.	371	F9	~y	Meta lower-case letter yankee
250.	372	FA	~z	Meta lower-case letter zebra
251.	373	FB	~{	Meta left curly brace
252.	374	FC	~|	Meta vertical bar
253.	375	FD	~}	Meta right curly brace
254.	376	FE	~~	Meta tilde
255.	377	FF	~^?	Meta DEL (delete, rub out)

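In this notation ^ introduces a control character and ~ introduces a Meta character, so a literal ^ or ~ can be misread as the start of another character's name. These forms can be used to prevent that ambiguity: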

Decimal	Octal	Hex	Graphic	can be printed as...

94.	136	5E	^	^=
126.	176	7E	~	^~
222.	336	DE	~^	~^=
254.	376	FE	~~	~^~

List of all 94 basic characters:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Bibliography

This bibliography is in two parts. The first part is the list of publications used in the preparation of this book. The second part is the annotated bibliography from the thesis. Documents that are marked with "*" are especially valuable or interesting.

1 Current

American National Standards Institute (1990) X3J11 Programming Language C. New York: ANSI.

American National Standards Institute (1983) ANSI/MIL-STD-1815A-1983 Reference Manual for the Ada Programming Language. New York: Springer-Verlag. ISBN 0-387-90887-0.

*Apple Computer Corp. (1987) Human Interface Guidelines: the Apple Desktop Interface. Reading, Massachusetts: Addison-Wesley. ISBN 0-201-17753-6.

ibid. (1989) Release 7.0 Macintosh Script Management System (unreleased preliminary). Cupertino, California: Apple Computer Corp.

*Brooks, Frederick P. (1982) The Mythical Man-Month. Reading, Massachusetts: Addison-Wesley. ISBN 0-201-00650-2.

*Caplan, Ralph (1982) By Design: Why There Are No Locks on the Bathroom Doors in the Hotel Louis XIV and Other Object Lessons. New York: McGraw-Hill. ISBN 0-07-009777-1.

Carroll, Lewis (1865) Alice's Adventures in Wonderland. London: Macmillan and Co.

ibid. (1871) Through the Looking-Glass and What Alice Found There. London: Macmillan and Co.

Crowley, Terrence; Forsdick, Harry; Landau, Matt; Travers, Virginia (1987) The Diamond Multimedia Editor. USENIX Proceedings, Summer 1987.

Finseth, Craig A. (June 1980) Theory and Practise of Text Editing -- or -- A Cookbook for an Emacs. Cambridge, Massachusetts: M.I.T. Laboratory for Computer Science. Technical Memo TM-165.

Hammer, Michael; Ilson, Richard; Anderson, Timothy; Gilbert, Edward J.; Good, Michael; Niamir, Bahram; Rosenstein, Larry; Schoichet, Sandor (February 1981) Etude: An Integrated Document Processing System. Cambridge, Massachusetts: M.I.T. Laboratory for Computer Science. Office Automation Group Memo OAM-028.

Ilson, Richard; Good, Michael (March 1981) Etude: An Interactive Editor and Formatter. Cambridge, Massachusetts: M.I.T. Laboratory for Computer Science. Office Automation Group Memo OAM-029.

Jensen, Kathleen & Wirth, Niklaus (1974) Pascal User Manual and Report. New York: Springer-Verlag. ISBN 0-387-90144-2.

Kemeny, John G. & Kurtz, Thomas E. (1985) Back to Basic. Reading, Massachusetts: Addison-Wesley. ISBN 0-201-13433-0.

Kernighan, Brian W. & Ritchie, Dennis M. (1978) The C Programming Language. Englewood Cliffs, New Jersey: Prentice-Hall. ISBN 0-13-110163-3.

*Knuth, Donald E. (1971) An Empirical Study of Fortran Programs. Software Practice and Experience, vol. 1, April/May, p 105-133.

Miller, Webb (1987) A Software Tools Sampler. Englewood Cliffs, New Jersey: Prentice-Hall. ISBN 0-13-822305-X.

Myers, Eugene W. (December 1986) A Simple Row-Replacement Method. Tucson, Arizona: Department of Computer Science, University of Arizona. Technical Report TR 86-28.

*Norman, Donald A. (1990) The Design of Everyday Things. New York: Doubleday. ISBN 0-385-26774-6.

Oman, Paul W. & Cook, Curtis R. (1990) Typographic Style is More than Cosmetic. Communications of the ACM, vol. 33 #5, May, p 506.

Phelps, Hermann (1982) The Craft of Log Building. (Translation of Holzbaukunst: der Blockbau.) Ottawa, Ontario: Lee Valley Tools. ISBN 0-9691019-2-9 (bound), 0-9691019-1-0 (pbk).

Qiao, Jinan; Qiao, Yizheng; Qiao, Sanzheng (1990) Six-Digit Coding Method. Communications of the ACM, vol. 33 #5, May, p 491.

Quarterman, John S. (1989) The Matrix: Computer Networks and Conferencing Systems Worldwide. Bedford, Massachusetts: Digital Press. ISBN 1-55558-033-5.

Reid, Brian K. & Walker, Janet H. (1980) Scribe Introductory User's Manual. Pittsburgh, Pennsylvania: Unilogic Ltd.

Stallman, Richard (1987) GNU Emacs Manual. Cambridge, Massachusetts: Free Software Foundation. Sixth edition, version 18.

Tayli, Murat & Al-Salamah, Abdulla I. (1990) Building Bilingual Microcomputer Systems. Communications of the ACM, vol. 33 #5, May, p 495.

Thorell, L.G. & Smith, W.J. (1990) Using Computer Color Effectively, An Illustrated Reference. Englewood Cliffs, New Jersey: Prentice-Hall. ISBN 0-13-939878-3.

*Tufte, Edward R. (1990) Envisioning Information. Cheshire, Connecticut: Graphics Press.

ibid. (1983) The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

The USENET newsgroups comp.editors, comp.emacs, and gnu.emacs carry editor-related material.

2 Thesis

This bibliography includes many different types of documents. Some of the documents are user manuals for various editors. Others describe the implementation of specific editors. Still others discuss language tradeoffs or input/output system interfaces.

They are grouped by the type of editor that they refer to. Each entry is annotated to help place it in perspective.

2.1 Emacs-Type Editors

There are four principal implementations of Emacs-type editors, and there are enough documents to justify their separate listing.

2.1.1 ITS EMACS

Ciccarelli, Eugene (1978) An Introduction to the Emacs Editor. Cambridge, Massachusetts: MIT Artificial Intelligence Laboratory, MIT AI Lab Memo #447, January 1978. -- A primer on the editor's user interface.

*Stallman, Richard M. (1979) Emacs: The Extensible, Customizable, Self-Documenting, Display Editor. Cambridge, Massachusetts: MIT Artificial Intelligence Laboratory, AI Memo #519, June 1979. -- Provides arguments for the Emacs philosophy.

ibid. (1978) Structured Editing with a Lisp. Computing Surveys, vol. 10 #4, December, p 505. -- This is a response to the Sandewall paper (referenced later).

On-line Documentation:

MIT-AI: .TECO.; TECORD > -- A more detailed command list for TECO
MIT-AI: .TECO.; TECO PRIMER -- A primer for TECO
MIT-AI: EMACS; EMACS CHART -- A four-page command list for Emacs
MIT-AI: EMACS; EMACS GUIDE -- A detailed user interface manual
MIT-AI: EMACS; EMACS ORDER -- A more detailed command list for Emacs

2.1.2 Lisp Machine Zwei

*Weinreb, Daniel L. & Moon, David (January 1979) The Lisp Machine Manual. Cambridge, Massachusetts: MIT Artificial Intelligence Laboratory. -- The user interface for Zwei.

ibid. (January 1979) A Real-Time Display-Oriented Editor for the Lisp Machine. Cambridge, Massachusetts: S.B. Thesis, MIT Electrical Engineering and Computer Science Department. -- How Zwei works internally.

2.1.3 Multics Emacs

Greenberg, Bernard S. (in publication in 1980) Emacs Extension Writer's Guide. Honeywell Information Systems, Inc., order #CJ52. -- How to write extensions.

ibid. (December 1979) Emacs Text Editor User's Guide. Honeywell Information Systems, Inc., order #CH27. -- The user interface.

*ibid. (March 1980) Multics Emacs: An Experiment in Computer Interaction. Honeywell Information Systems, Proceedings, Fourth Annual Honeywell Software Conference. -- A summary of MEPAP (referenced below; also MIT-AI: BSG; NMEPAP >).

ibid. (April 1978) Real-Time Editing on Multics. Cambridge, Massachusetts: Honeywell Information Systems, Inc., Multics Technical Bulletin #373.

ibid., On-Line Documentation:

MIT-AI: BSG; LMEPAP > -- Why Lisp was chosen for the implementation language
*MIT-AI: BSG; MEPAP > -- A detailed history of Emacs in general and the Multics implementation in specific. Very valuable.
MIT-AI: BSG; R4V > -- A proposal for a terminal independent video terminal support package.
MIT-AI: BSG; TTYWIN > -- A look at the good and bad features of video terminals.

2.1.4 MagicSix TVMacs

*Anderson, Owen Ted (January 1979) The Design and Implementation of a Display-Oriented Editor Writing System. Cambridge, Massachusetts: S.B. Thesis, MIT Physics Department. -- How TVMacs works internally. It concentrates on describing not the editor itself but rather the implementation language: SINE.

Linhart, Jason T. (June 1980) Dynamic Multi-Window Terminal Management for the MagicSix Operating System. Cambridge, Massachusetts: S.B. Thesis, MIT Electrical Engineering and Computer Science Department. -- A video terminal management system. Contains many useful comments on terminal independence and redisplay problems.

2.1.5 Other Emacs-Type Text Editors

This section covers editors which have the same general user interface as an Emacs (e.g., screen-oriented, similar key bindings) but are not extensible or otherwise fall noticeably short of the Emacs philosophy.

Finseth, Craig A. (August 1979) VINE Primer. Dallas, Texas: Texas Instruments, Inc., Central Research Laboratories, Systems and Information Sciences Laboratory. -- User interface manual for the complete novice.

Schiller, Jeffrey I. (June 1979) TORES: the Text ORiented Editing System. Cambridge, Massachusetts: revised from S.B. Thesis, MIT Electrical Engineering and Computer Science Department.

On-Line Documentation:

Kazar, Mike. User manual for FINE, running at Carnegie-Mellon University. At CMU-10A: fine.{mss prt}[s200mk50]

2.2 Non-Emacs Display Editors

Bilofsky, Walter (December 1977) The CRT Text Editor NED -- Introduction and Reference Manual. Rand Corporation, R-2176-ARPA.

Irons, E. T. & Djorup, F. M. (1972) A CRT Editing System. Communications of the ACM, vol. 15 #1, January, p 16.

Joy, William (April 1979) Ex Reference Manual, Version 2.0. Berkeley, California: Computer Science Division, Dept of Electrical Engineering and Computer Science, University of California at Berkeley.

ibid. (April 1979) An Introduction to Display Editing With vi. Berkeley, California: Computer Science Division, Dept of Electrical Engineering and Computer Science, University of California at Berkeley.

Kanerva, Pentti (1973) TVGUID: a User's Guide to TEC/DATAMEDIA TV-Edit. Palo Alto, California: Stanford University, Institute for Mathematical Studies in the Social Sciences.

Kelly, Jeanne (July 1977) A Guide to NED: a New On-Line Computer Editor. The Rand Corporation, R-2000-ARPA.

Kernighan, Brian W. (1978) A Tutorial Introduction to the ED Text Editor. Murray Hill, New Jersey: Bell Laboratories Technical Report.

MacLeod, I. A. (November 1977) Design and Implementation of a Display-Oriented Text Editor. Software Practice and Experience, vol. 7 #6, November, p 771.

Weiner, P., et al. (April 1973) The Yale Editor "E": a CRT-Based Editing System. Yale Computer Science Research Report 19.

Seybold, Patricia B. (October 1978) TYMSHARE's AUGMENT -- Heralding a New Era. The Seybold Report on Word Processing, vol. 1 #9. ISSN: 0160-9572, Seybold Publications, Inc., Box 644, Media, Pennsylvania 19063.

On-Line Documentation:

SAIL: E.ALS[UP,DOC] -- User manual again. Stanford University.

2.3 Structure Editors

Ackland, Gillian M., et al. (?) UCSD Pascal Version 1.5 (Reference Manual). San Diego, California: Institute for Information Systems, University of California at San Diego.

Donzeau-Gouge, V.; Huet, G.; Kahn, G.; Lang, B.; & Levy, J.J. (April 1975) A Structure Oriented Program Editor: a First Step Towards Computer Assisted Programming. Paris: IRIA, Res. Rep. 114.

Teitelbaum, R. T. (?) The Cornell Program Synthesizer: a Microcomputer Implementation of PL/CS. Ithaca, New York: Department of Computer Science, Cornell University, Technical Report TR 79-370.

2.4 Other Editors

Benjamin, Arthur J. (August 1972) An Extensible Editor for a Small Machine With Disk Storage. Communications of the ACM, vol. 15 #8, p 742. -- Talks about an editor for the IBM 1130 written in Fortran. Not extensible at all.

Bourne, S. R. (January 1971) A Design for a Text Editor. Software Practice and Experience, vol. 1, p 73. -- User manual.

Cecil, Moll & Rinde (March 1977) TRIX AC: a Set of General Purpose Text Editing Commands. Lawrence Livermore Laboratory UCID 30040.

Deutsch, L. Peter & Lampson, Butler W. (1967) An On-line Editor. Communications of the ACM, vol. 10 #12, December, p 793. -- QED user manual.

Fraser, Christopher W. (1979) A Compact, Portable CRT-Based Editor. Software Practice and Experience, vol. 9 #2, February, p 121. -- Front end to a line editor.

ibid. (1980) A Generalized Text Editor. Communications of the ACM, vol. 23 #3, March, p 154. -- Applying text editors to non-text objects.

Hansen, W. J. (June 1971) Creation of Hierarchic Text With a Computer Display. Palo Alto, California: Ph.D. Thesis, Stanford University.

Kai, Joyce Moore (July 1974) A Text Editor Design. Urbana, Illinois: Department of Computer Science, University of Illinois at Urbana-Champaign. -- Describes both the internals and the externals of the editor. However, the design is a poor one.

Kernighan, Brian W. & Plauger, P. J. (1976) Software Tools. Reading, Massachusetts: Addison-Wesley. -- This book has a chapter which leads you by the hand in implementing a simple line editor in RatFor.

*Roberts, Teresa L. (November 1979) Evaluation of Computer Text Editors. Systems Sciences Laboratory, Xerox PARC. -- A comparative evaluation of four text editors. Quite well done. Unfortunately, it does not include Emacs (it uses DEC TECO instead).

Sandewall, Erik (1978) Programming in the Interactive Environment: the Lisp Experience. Computing Surveys, vol. 10 #1, March, p 35. -- Talks about the editor for InterLisp.

Sneeringer, James (1978) User-Interface Design for Text Editing: a Case Study. Software Practice and Experience, vol. 8, p 543. -- User manual and a discussion of user interface concepts.

Teitelman, Warren (October 1978) InterLisp Reference Manual. Palo Alto, California: Xerox Palo Alto Research Center. -- How to use the InterLisp (non-display) structure editor.

van Dam, Andries & Rice, David E. (1971) On-line Text Editing: a Survey. Computing Surveys, vol #3, September, p 93. -- Contains a general introduction to the problems of text editing. Outdated technology, however.

Book Index

This is the index to the book and the numbers, of course, reflect the page numbers in the book. How quaint.

$	180
/etc/termcap	26

Ada	41
add_proc	58
advanced
algorithm	102
display	22, 89
after the point	55
again	123
allocation	68, 78
altmode	180
amount of experience	11
Annex	34
ANSI	87
APL	39
Apple	26, 28, 47, 139
approaches to redisplay	96
Argument	108
arguments	112
ASCII	202
asynchronous communications	31
attributes	52, 95
auto-repeat	24
availability	36

Back Space	4
backward from the point	55
Basic	41
basic
display	22
redisplay algorithm	100
users	12
Beep	88
before the point	55
between	55
biases	52
binary files	49, 147
binding	109, 115
breaking out of redisplay	95
buffer	54, 56
gap	68, 72
management	65, 72
BUFFERNAMEMAX	152
buffer_chain	56
Buffer_Clear	59
Buffer_Create	59
Buffer_Delete	59
Buffer_End	60
Buffer_Get_Name	59
Buffer_Insert	62
buffer_name	57
Buffer_Read	62
Buffer_Set_Current	59
Buffer_Set_Name	59
Buffer_Set_Next	59
Buffer_Start	60
Buffer_Write	62
button press	18
byte	56

C	39, 150
capitalization	143
Caps Lock	25
card images	47
cards, baseball	13
caret notation	49, 94, 202
categories of users	11
center tabs	93
changing your mind	119
character
definition	56
format	31
set	48
chunking	80
Clear_Line	88
Clear_Screen	88
CLEOL	88
CLEOS	88
clipboard	120
command
set design	125
shell	146
user-oriented	106
Command_Procedure	107
communications path	31
Compare_Locations	60
compilation	146
compiler	134
completion	115
considerations	36, 83, 90
consistency	126
contents, of line	48, 57
Control	172
control characters	93
constraints
physiological	14
redisplay	82
Copy_Region	63
core loop	106
counts	48
Count_To_Location	60
CP/M	47, 49, 73
crash recovery	74, 75
Ctrl_X_Dispatch	108
current_buffer	56
curses	87
cursor, left edge of	55
cur_line	57
custom editor languages	41
customers	11

data structures	56
debugging	146
DEC	47
decimal tabs	93
decomposition	54
defun	173
delay	70
Delayed_Display	108
Delete	4, 63
delete line	22
Delete_Chars	89
Delete_Lines	89
Delete_Region	63
deleting words	138
design	119, 125
dialog box	18
difference files	80
dispatch	107
display	21, 84
display independent procedures	86
Dvorak keyboard	27
dynamic linking	116

echo negotiation	33
editor procedures	84
efficiency	38
of editing	75
of input/output	76
of searching	77
electronic mail	146
Emacs	39
Emacs-type	4, 16, 41, 109, 122, 128, 145, 156, 172
empty lines	48
emulation	146
end
of file	49
of buffer	55, 91
error
checking	126
handling	110
messages	131
ETX/ACK	35
Evaluate	107
exiting	111
experience
amount of	11
type of	13
extended character sets	50
extensibility	37, 128, 147
external errors	111
extra shift keys	26
extra space	67
extremely large files	79
eyes	15

Fatal	155
fclose	155
fgets	155
file
formats	47
interface	147
name	57
FILENAMEMAX	152
file_time	57
FinalWord	74
Find_First_In_Backward	64
Find_First_In_Forward	64
Find_First_Not_In_Backward	64
Find_First_Not_In_Forward	64
fixed marks	55
FLAG	151
flow control	32
fonts	52, 95
fopen	155
forest	9
format, character	31
formats	52
Fortran	40
forward from the point	55
fragmentation	68, 78
framer	99
free	155
function keys	25

gap	68, 72
Get_Attr	88
Get_Char	62
Get_Column	65, 88
Get_File_Name	62
Get_Line	2
Get_Modified	62
Get_Num_Chars	62
Get_Num_Lines	62
Get_Point_Col	85
Get_Point_Row	85
Get_Row	88
Get_String	62
Get_Window_Bot	86
Get_Window_Bot_Line	86
Get_Window_Top	86
Get_Window_Top_Line	86
glass TTY	21
GNU-Emacs	147, 173
goals, user	14
graphical input	18, 29
graphics display	23
guidelines	18, 19, 131

handicaps	19
hands	14
hardware	21
hidden second gap	71
horizontal scrolling	91

IBM PC	23, 26, 28, 35
image, card and print	47
implementation
languages	36
methods	65, 71
implementations	156
in-band	32, 50
incremental	115
redisplay	82
search	142
input/output	76, 87
insert	6, 54
insert line	22
Insert_A_Character	08
Insert_Char	63
Insert_Lines	89
Insert_String	63, 89
interface	147
internal
editor	54
errors	110
internationalization	52
interrupting redisplay	95
isprint	155
Is_A_Match	64
Is_File_Changed	62
is_fixed	57
is_modified	57
Is_Point_After_Mark	60
Is_Point_At_Mark	60
Is_Point_Before_Mark	60
ITS	180

Jabberwocky	149
job control	32
joystick	30

kerning	95
keyboards	23
keyboard procedures	87
key placement	27
keystroke recording	124
Key_Fini	87
Key_Function_Keys	87
Key_Get	87
Key_Init	87
Key_Is_Input	87
kill	119

languages, implementation	36
language	130
lap-top computer	81
large
files	79
project support	38
layout of text	45
LEAP	34
left tabs	93
line
boundaries	47
contents	48
wrap	91
linked line	72
Lisp	39, 72, 173
list of lines	44
location	151
Location_To_Count	60
long lines	48
loop	106

Macintosh	see Apple
macros	123
mail	146
malloc	155
management	65, 72
mark	55, 57
marker
bytes	75
record	48
Mark_Create	60
Mark_Delete	60
Mark_Get	60
mark_list	57
Mark_Set	60
Mark_To_Point	60
meaning of text	45
memmove	155
memory	74
memory management	65, 72
memory-mapped display	23, 105
memset	155
messages	130
meta	27, 172, 202
methods	65, 71
mind	15
Mince	73
model
editing	43
user's	11
modems	34
modes	56, 58, 114, 117, 129, 134
Mode_Append	63
Mode_Delete	63
Mode_Invoke	63
mode_list	57
modification flag	101
Modula	40
mouse	30
mouse ahead	18
Move_By_Character	107
moving	64, 135
MS/DOS	47, 49
Multics	33, 71
multiple
buffers	77
gaps	70
windows	96

n-key rollover	24
name	57
neophyte users	11
NEWLINE	152
newline	47, 55
next_chain_entry	57
next_mark	57
next_mode	58
NL	152
node_name	58
no management	66
non-printing characters	48
non-text files	see binary files
normal marks	55
novice users	12
NUL	152
NULL	152
numeric arguments	112
num_chars	57
num_lines	57

object models	45
objects	52
one-dimensional array	43, 55
out-of-band	32, 51
output	87

packaging, keyboard	25
padding	32
page breaks	136
paged
buffer gap	72, 73
model	44
virtual memory	78
paging	136
paragraphs	140
parsing	51
partial lines	48
Pascal	40
pen	31
permissiveness	126
philosophy	109, 125
physiological constraints	14
piano	129
PL/1	40
placement, key	27
point	55, 57
Point_Get	60
Point_Get_Line	60
Point_Move	60
Point_Set	60
Point_To_Mark	60
positional arguments	115
power users	12
prefix arguments	112
print images	47
printf	155
private	151
procedures	58-65, 84-89
programmer-level users	13
progress	127
prompts	113, 130
proportionally spaced text	94
Put_Char	88
Put_String	88

quality	36
quote	113
QWERTY	27

rat	30
raw	87
read	54
real text	45
rebinding	115
Recenter	85
record markers	48
recording	124
recovery	74, 75
Redisplay	85
redisplay	51, 82
algorithms	99
Redo	123
redo	122
Refresh_Screen	85
region	55, 61, 115
regular expressions	143
"religion"	13
repeat	24, 112
replace	6
Replace_Char	63
Replace_String	63
responsiveness	125
right tabs	93
rollover	24
ruler lines	92

S-exp	173
screen
definition	84
procedures	87
Screen_Attributes	87
Screen_Columns	87
Screen_Fini	87
Screen_Init	87
Screen_Rows	87
Screen_Timings	89
scripts	95
scroll window	22
scrolling	136
Scroll_Lines	89
searching	77, 141
Search_Backward	64
Search_Forward	64
second
gap	71
level dispatch	108
system effect	14
selection arguments	115
sentences	140
serial
chunking	80
communications	31
Set_Attr	88
Set_Column	65, 88
Set_File_Name	62
Set_Modified	62
Set_Pref_Pct	85
Set_Row	88
shell	146
shift keys	26
Shift Lock	25
short lines	48
simplicity	127
Sine	41
SNOBOL	42
SP	152
special function keys	25
speed	31, 83
spinal cord	16
standard system text files	47
start of the buffer	55
state save	56, 58
status	151
status line	90
storage	74
strcpy	155
string arguments	113
strlen	155
structure editors	132
structured files	50
sub-editor	54, 101
suffix arguments	113
Sun workstations	23, 29
suspend process	32
Swap_Point_And_Mark	60
system text files	47

TAB	152
tablet	30
tabs	93
TECO	39, 72, 80, 81, 145, 180
terminfo	26
text
files	45, 47
handling	37
structure of	45
three-file system	80
time	151
toaster	59
top-level	106
Tops-20	180
touch sensitive display	29
trackball	30
trees	9
TTY	21
twiddling	143
two-dimensional array	43
"typeability"	24
type of experience	13
typing aids	133
typos	143

Undo	122
undo	120
UNICODE	50
unicorn	98
uniformity	128
unique identifier	101
universal argument	112
UNIX	47
Unix stream	22
up/down	135
upper-case	130
user
categories	11
goals	14
user-oriented commands	106

vi	109
virtual memory	78
visible effect	125
VT100	22, 28, 87
VT200	28
VT52	22

whale	83
where_it_is	57
whirlpool	36
window	84, 135
window mark	101
Window_Create	86
Window_Destroy	86
Window_Fini	84, 87
Window_Grow	86
Window_Init	84, 87
Window_Load	85
Window_Save	85
words	65, 136
word wrap	92
world	56
World_Fini	58
World_Init	58
World_Load	58
World_Save	58
World_Save	85
wrap	91
write	54
WYSIWYG	15

xiswhite	155
XON/XOFF	32
xstrcpy	155
Xylogics	34

zero-length lines	48

^	94, 202

I am Craig A. Finseth.

Quick Contents:

Credits

Trademarks

Table of Contents

Preface

Questions to Probe Your Understanding

Acknowledgements

Introduction: What Is Text Editing All About?

1 The Basic Get_Line

1.1 Version One

1.2 Version Two

1.3 Version Three

1.4 Version Four

2 The Forest

Questions to Probe Your Understanding

One: Users

1.1 User Categories

1.1.1 Amount of Experience

1.1.2 Type of Experience

1.2 "Religion"

1.3 User Goals

1.4 Physiological Constraints

1.5 Applying These Physiological Constraints

1.6 Users Who Have Handicaps

Questions to Probe Your Understanding

Two: User Interface Hardware

2.1 Display Types

2.1.1 TTY and Glass TTY

2.1.2 Basic Displays

2.1.3 Advanced Displays

2.1.4 "Memory Mapped" Displays

2.1.5 Graphics Displays

2.2 Keyboards

2.2.1 Special Function Keys

2.2.2 Extra Shift Keys

2.2.3 Key Placement

2.2.4 Example Keyboards

2.3 Graphical Input

2.3.1 Touch Sensitive Display

2.3.2 Tablet

2.3.3 Mouse

2.3.4 Trackball

2.3.5 Joystick

2.3.6 A Different Mouse

2.3.7 Other Devices

2.3.8 Conclusion

2.4 Communications Path Issues

2.4.1 Speed and Character Format

2.4.2 Flow Control

2.4.3 Echo Negotiation

2.4.4 Fancy Modems

Questions to Probe Your Understanding

Three: Implementation Languages

3.1 General Considerations

3.1.1 Availability and Implementation Quality

3.1.2 Text Handling Power

3.1.3 Support for Extensibility

3.1.4 Large Project Support

3.1.5 Efficiency

3.2 Specific Language Notes

3.2.1 TECO

3.2.2 Lisp

3.2.3 C

3.2.4 PL/1

3.2.5 Other Systems Languages

3.2.6 Fortran

3.2.7 Pascal

3.2.8 Basic

3.2.9 Ada

3.2.10 Sine

3.2.11 Custom Editor Languages

Questions to Probe Your Understanding

Four: Editing Models

4.1 One-Dimensional Array of Bytes

4.2 Two-Dimensional Array of Bytes

4.3 List of Lines
