
Pipeline (Unix)

From Wikipedia, the free encyclopedia

Mechanism for inter-process communication using message passing

This article is about the original implementation for shells. For software pipelines in general, see Pipeline (software).

A pipeline of three program processes run on a text terminal

In Unix-like computer operating systems, a pipeline is a mechanism for inter-process communication using message passing. A pipeline is a set of processes chained together by their standard streams, so that the output text of each process (stdout) is passed directly as input (stdin) to the next one. The second process is started while the first process is still executing, and they are executed concurrently.

The concept of pipelines was championed by Douglas McIlroy at Unix's ancestral home of Bell Labs, during the development of Unix, shaping its toolbox philosophy. It is named by analogy to a physical pipeline. A key feature of these pipelines is their "hiding of internals".^[10] This in turn allows for more clarity and simplicity in the system.

The pipes in the pipeline are anonymous pipes (as opposed to named pipes), where data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed. The standard shell syntax for anonymous pipes is to list multiple commands, separated by vertical bars ("pipes" in common Unix verbiage).

History

The pipeline concept was invented by Douglas McIlroy^[1] and first described in the man pages of Version 3 Unix.^[2]^[3] McIlroy noticed that much of the time command shells passed the output file from one program as input to another. He championed the concept at Bell Labs, Unix's ancestral home, during the development of Unix, and it shaped Unix's toolbox philosophy.^[4]^[5]

His ideas were implemented in 1973 when ("in one feverish night", wrote McIlroy) Ken Thompson added the pipe() system call and pipes to the shell and several utilities in Version 3 Unix. "The next day", McIlroy continued, "saw an unforgettable orgy of one-liners as everybody joined in the excitement of plumbing." McIlroy also credits Thompson with the | notation, which greatly simplified the description of pipe syntax in Version 4.^[6]^[2]

Although developed independently, Unix pipes are related to, and were preceded by, the 'communication files' developed by Ken Lochner^[7] in the 1960s for the Dartmouth Time-Sharing System.^[8]

Other operating systems

Main article: Pipeline (software)

This feature of Unix was borrowed by other operating systems, such as MS-DOS and the CMS Pipelines package on VM/CMS and MVS, and eventually came to be designated the pipes and filters design pattern of software engineering.

Further concept development

In Tony Hoare's communicating sequential processes (CSP), McIlroy's pipes are developed further.^[9]

Implementation

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: for example, a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer, and the receiving program reads from the buffer whenever it is ready for more data. If the buffer fills up, the sending program is stopped (blocked) until the receiver removes at least some data from the buffer. In Linux, the size of the buffer is 16 pages, equivalent to 65,536 bytes (64 KiB) on most systems.^[11] An open source third-party filter called bfr is available to provide larger buffers if required.
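As a rough illustration, here is a sketch that assumes bash on Linux and uses a hypothetical slow_count function as the receiver. It pairs a fast sender with a deliberately slow reader. seq 1 20000 writes about 106 KiB, which is more than the 64 KiB default buffer, so seq is blocked whenever the buffer fills. Every line still arrives.

```shell
# Hypothetical slow receiver: bash's read builtin takes one byte at a time
# from a pipe, so this loop consumes data far more slowly than seq produces it.
slow_count() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# seq writes about 106 KiB; whenever the pipe buffer is full, the kernel
# blocks seq until slow_count has read enough to make room again.
received=$(seq 1 20000 | slow_count)
echo "lines received: $received"
```

Nothing is dropped: the sender simply waits, which is the blocking behavior described above.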

Network pipes

Tools like netcat and socat can connect pipes to TCP/IP sockets.

Pipelines in command line interfaces

All widely used Unix shells have a special syntax construct for the creation of pipelines. In each case, the commands are written in sequence, separated by the ASCII vertical bar character | (which, for this reason, is often called the "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).

The pipeline uses anonymous pipes. For anonymous pipes, data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed. This differs from named pipes, which exist as files in the file system: messages are passed to or from the pipe through that file name, and the pipe remains after the processes are completed. The standard shell syntax for anonymous pipes is to list multiple commands, separated by vertical bars ("pipes" in common Unix verbiage):

command1 | command2 | command3

For example, to list files in the current directory (ls), retain only the lines of ls output containing the string "key" (grep), and view the result in a scrolling page (less), a user types the following into the command line of a terminal:

ls -l | grep key | less

The command ls -l is executed as a process, the output (stdout) of which is piped to the input (stdin) of the process for grep key; and likewise for the process for less. Each process takes input from the previous process and produces output for the next process via standard streams. Each | tells the shell to connect the standard output of the command on the left to the standard input of the command on the right by an inter-process communication mechanism called an (anonymous) pipe, implemented in the operating system. Pipes are unidirectional; data flows through the pipeline from left to right.
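The same three-stage shape can also be run non-interactively. This sketch assumes bash or any POSIX shell, and the scratch directory and its file names are hypothetical. Plain ls stands in for ls -l so that owner and group names cannot cause accidental matches. wc -l replaces the interactive pager less so that the result can be captured.

```shell
# Hypothetical scratch directory with known file names.
dir=$(mktemp -d)
touch "$dir/keyboard.txt" "$dir/monkey.png" "$dir/notes.md"

# ls writes one name per line into the first pipe, grep keeps the lines
# containing "key", and wc -l counts what arrives through the second pipe.
matches=$(ls "$dir" | grep key | wc -l)
echo "files matching: $matches"    # keyboard.txt and monkey.png match
rm -r "$dir"
```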

Example

Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL. An explanation of what it does follows.

curl 'https://en.wikipedia.org/wiki/Pipeline_(Unix)' |
sed 's/[^a-zA-Z ]/ /g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - <(sort /usr/share/dict/words) |
less

curl obtains the HTML contents of a web page (wget could be used instead on some systems).
sed replaces every character of the page's content that is not a space or a letter with a space. (Newlines are preserved.)
tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
comm finds lines in common between two files; -23 suppresses the lines unique to the second file and the lines common to both, leaving only those found only in the first file named. The - in place of a filename causes comm to use its standard input (from the pipeline in this case). sort /usr/share/dict/words sorts the contents of the words file alphabetically, as comm expects, and <( ... ) makes that output available to comm as a file name (via process substitution, typically a /dev/fd path or a named pipe). The result is a list of words (lines) that are not found in /usr/share/dict/words.
less allows the user to page through the results.
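The filtering stages can be checked without network access or a system dictionary. In this sketch, the input sentence and the small word list are hypothetical stand-ins for the downloaded page and /usr/share/dict/words. The sed, tr, grep, sort and comm stages are the ones above. The final less is dropped so that the output can be captured. Process substitution (<( ... )) assumes bash or a similar shell.

```shell
# Hypothetical input text with two misspellings ("pipline", "progam").
text='The pipline connects programs; each progam reads its input.'

# Hypothetical word list standing in for /usr/share/dict/words.
dict=$(mktemp)
printf '%s\n' the connects programs each reads its input pipeline > "$dict"

misspelled=$(printf '%s\n' "$text" |
    sed 's/[^a-zA-Z ]/ /g' |      # non-letters become spaces
    tr 'A-Z ' 'a-z\n' |           # lowercase, one word per line
    grep '[a-z]' |                # drop empty lines
    sort -u |                     # sort and remove duplicates
    comm -23 - <(sort "$dict"))   # keep words absent from the list
rm -f "$dict"
printf '%s\n' "$misspelled"       # pipline, then progam
```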

Error stream

By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behavior. In the csh shell, for instance, using |& instead of | signifies that the standard error stream should also be merged with the standard output and fed to the next process. The Bash shell can also merge standard error with |& since version 4.0,^[12] or with 2>&1 placed before the |, and it can redirect standard error to a different file instead.
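A small bash sketch makes the difference visible. The function both is hypothetical and writes one line to each stream. wc -l then counts how many lines actually travelled through the pipe. The |& form assumes bash 4.0 or later.

```shell
# Hypothetical command that writes one line to each stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Default: only stdout enters the pipe; "to stderr" goes to the terminal.
plain=$(both | wc -l)            # 1 line
# 2>&1 written before the | duplicates stderr onto the pipe as well.
merged=$(both 2>&1 | wc -l)      # 2 lines
# Bash 4.0+ shorthand for 2>&1 |
short=$(both |& wc -l)           # 2 lines
echo "$plain $merged $short"
```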

Pipemill

In the most commonly used simple pipelines, the shell connects a series of sub-processes via pipes and executes external commands within each sub-process. Thus the shell itself does no direct processing of the data flowing through the pipeline.

However, it's possible for the shell to perform processing directly, using a so-called mill or pipemill (since a while command is used to "mill" over the results from the initial command). This construct generally looks something like:

command | while read -r var1 var2 ...; do
    # process each line, using variables as parsed into var1, var2, etc.
    # (note that this may be a subshell: var1, var2 etc. will not be available
    # after the while loop terminates; some shells, such as zsh and newer
    # versions of Korn shell, run only the commands to the left of the pipe
    # operator in a subshell, so the loop's variables remain available)
done
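Here is a concrete instance of the template. It is a sketch that assumes bash and a hypothetical two-column input, and it prints each name whose second field exceeds 9. Only the loop's output is captured, not its variables, so the subshell caveat in the comments does not matter here.

```shell
# Hypothetical input: one "name size" pair per line.
big=$(printf '%s\n' 'alpha 10' 'beta 25' 'gamma 7' |
    while read -r name size; do
        # read splits each line on whitespace: name gets the first field,
        # size the rest of the line
        if [ "$size" -gt 9 ]; then
            echo "$name"
        fi
    done)
printf '%s\n' "$big"    # alpha, then beta
```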

Such a pipemill may not perform as intended if the body of the loop includes commands, such as cat and ssh, that read from stdin:^[13] on the loop's first iteration, such a program (let's call it the drain) will read the remaining output from command, and the loop will then terminate (with results depending on the specifics of the drain). There are two ways to avoid this behavior. First, some drains support an option to disable reading from stdin (e.g. ssh -n). Alternatively, if the drain does not need to read any input from stdin to do something useful, it can be given < /dev/null as input.
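The drain problem and the < /dev/null remedy can be reproduced with a bash sketch. Here cat > /dev/null stands in for a drain such as ssh, and the input lines a, b and c are hypothetical.

```shell
# cat > /dev/null stands in for any drain, e.g. ssh without -n.
drained=$(printf '%s\n' a b c | while read -r item; do
    echo "item: $item"
    cat > /dev/null          # consumes b and c from the pipe
done)

# Same loop, but the drain's stdin comes from /dev/null.
fixed=$(printf '%s\n' a b c | while read -r item; do
    echo "item: $item"
    cat < /dev/null > /dev/null
done)

printf '%s\n' "$drained"     # item: a
printf '%s\n' "$fixed"       # item: a, item: b, item: c
```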

As all components of a pipeline run in parallel, a shell typically runs each of them, including the while loop, in a forked subprocess (a subshell), making it impossible to propagate variable changes to the outside shell environment. To remedy this issue, the pipemill can instead be fed from a here document containing a command substitution, which waits for the pipeline to finish running before milling through the contents. Alternatively, a named pipe or a process substitution can be used for parallel execution. GNU bash also has a lastpipe option to disable forking for the last pipe component when job control is not active.^[14]
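The following sketch assumes bash 4.2 or later, run as a non-interactive script so that job control is off. It counts three hypothetical input lines in a loop that updates a shell variable, first with a plain pipemill and then with each workaround mentioned above.

```shell
# Plain pipemill: bash runs the loop in a subshell, so the update is lost.
count=0
printf '%s\n' x y z | while read -r line; do count=$((count + 1)); done
plain=$count                      # still 0

# Here document containing a command substitution: printf finishes first,
# and the loop runs in the current shell.
count=0
while read -r line; do count=$((count + 1)); done <<EOF
$(printf '%s\n' x y z)
EOF
heredoc=$count                    # 3

# Process substitution: printf runs concurrently, the loop stays in this shell.
count=0
while read -r line; do count=$((count + 1)); done < <(printf '%s\n' x y z)
procsub=$count                    # 3

# bash 4.2+: lastpipe runs the last pipeline element in the current shell
# (only while job control is off, as in a non-interactive script).
shopt -s lastpipe
count=0
printf '%s\n' x y z | while read -r line; do count=$((count + 1)); done
with_lastpipe=$count              # 3

echo "$plain $heredoc $procsub $with_lastpipe"
```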

Creating pipelines programmatically

Pipelines can be created under program control. The Unix pipe() system call asks the operating system to construct a new anonymous pipe object. This results in two new open file descriptors in the process: the read-only end of the pipe and the write-only end. The pipe ends appear to be normal, anonymous file descriptors, except that they cannot seek (lseek() fails with ESPIPE).
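On Linux, the descriptors created for a pipeline can be inspected through /proc/self/fd. This sketch is Linux-specific and assumes /proc is mounted. readlink shows that the standard input of a pipeline stage is descriptor 0 like any other, but that it refers to an anonymous pipe object rather than to a file name. The inode number in brackets varies from run to run.

```shell
# Linux-specific: /proc/self/fd/N shows what descriptor N of the
# reading process (readlink itself) refers to.
in_pipe=$(true | readlink /proc/self/fd/0)      # e.g. pipe:[123456]
from_file=$(readlink /proc/self/fd/0 < /dev/null)
echo "$in_pipe"
echo "$from_file"                                # /dev/null
```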

To avoid deadlock and exploit parallelism, the Unix process with one or more new pipes will then, generally, call fork() to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create new threads and use the pipe to communicate between them.

Named pipes may also be created using mkfifo() or mknod() and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.
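Here is a sketch of the tee combination. It assumes bash and the POSIX mkfifo utility, the command-line counterpart of mkfifo(). One stream from seq is split so that two consumers run concurrently: wc -l reads through a named pipe and tail reads through an ordinary anonymous pipe. The scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch directory holding the FIFO and two result files.
work=$(mktemp -d)
mkfifo "$work/copy"

# Background consumer: counts the lines that arrive through the named pipe.
# Opening the FIFO for reading blocks until tee opens it for writing.
wc -l < "$work/copy" > "$work/count" &

# tee sends every line both into the FIFO and down the anonymous pipe to tail.
seq 3 7 | tee "$work/copy" | tail -n 1 > "$work/last"
wait    # let the background wc finish

count=$(tr -d ' ' < "$work/count")   # 5 lines went through the FIFO
last=$(cat "$work/last")             # 7 was the final line
rm -r "$work"
echo "$count $last"
```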

Popular culture

The robot in the icon for Apple's Automator, which also uses a pipeline concept to chain repetitive commands together, holds a pipe in homage to the original Unix concept.

In other languages

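Pipelines can be used in C++. The C++20 ranges library overloads operator| (the piping operator) to allow LINQ-style chaining of operations with the std::ranges namespace. The range adaptor objects in std::views can be invoked either through operator() or by piping a range into them with operator|.^[15] The std::ranges::to conversion used below to collect the results into a container was added later, in C++23.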

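#include <ranges>
#include <vector>

using std::vector;
using std::ranges::to;          // C++23
using std::views::filter;
using std::views::transform;

int main() {
    vector<int> numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // Pipeline: keep the even numbers, double them, and collect the results
    // into a new vector
    vector<int> result = numbers
        | filter([](int n) -> bool { return n % 2 == 0; })
        | transform([](int n) -> int { return n * 2; })
        | to<vector>();
    // result now holds {4, 8, 12, 16, 20}
}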