D E S C R I P T I O N
METHOD AND MEANS FOR TIME ZERO BACKUP COPYING OF DATA
Field of the Invention
This invention relates to maintaining continued availability of datasets in external storage to accessing computer systems (CPU). More particularly, it relates to backup copying of records in external storage concurrent with a dramatically shortened suspension of CPU application execution occasioned by said copying.
Description of Related Art
A data processing system must be prepared to recover, not only from corruptions of stored data due to noise bursts, software bugs, media defects, and write path errors, but also from global events such as CPU power failure. The most common technique to ensure continued availability of data is to make one or more copies of CPU datasets and put them in a safe place. This "backup" process occurs within contexts of storage systems of increasing function.
Applications have executed on CPUs in either a batch (streamed) or interactive (transactional) mode. In batch mode, usually one application at a time executes without interruption. Interactive mode is characterized by an interrupt-driven multiplicity of applications or transactions.
Backup policies are policies of scheduling. They have a space and a time dimension exemplified by a range of datasets and by frequency of occurrence. A FULL backup imports copying the entire range whether updated or not. An INCREMENTAL backup copies only that portion of the dataset that has changed since the last backup (either full or incremental). The backup copy represents a consistent view of the data as of the time the copy or snap-shot was made.
The higher the backup frequency, the closer the backup copy mirrors the current copy of the data. Considering the large volumes of data, backing up is not a trivial maintenance operation. Thus, the opportunity cost of backing up can be high on a large multiprocessing, multiprogramming facility relative to other processing.
The Backup Window and Effect Upon Batch and Transactional Processing
When a CPU backs up data in a streamed or batch mode system, every process, task, or application waits. By this it is meant that processes supporting streamed or batch mode operations are suspended for the duration of the copying. The coined term for this event is "backup window". In contrast to batch mode, log based or transaction management applications are processed in the interactive mode. They practically eliminate the "backup window" by concurrently updating an on-line dataset and logging the change. However, the latter is a form of backup copying whose consistency is "fuzzy". That is, it is not a snapshot of the state of a dataset/database at a single point in time. Rather, a log is an event file requiring further processing against said database. The co-pending Wang et al application USSN: 07/385,647, filed July 25, 1989, entitled "A Computer Based Method For Dataset Copying Using an Incremental Backup Policy" (IBM Ref. SA9-89-043), illustrates backup in a batch mode system using a modified incremental policy. A modified incremental policy copies only new data or data updates since the last backup. Significantly, applications are suspended during the copying.
As mentioned above, to establish a prior point of consistency in a log based system, it is necessary to "repeat history" by replaying the log from the last checkpoint over the datasets or database of interest. The distinction between batch mode and log based backup is that the batch mode backup copy is consistent and speaks as of the time of its last recordation, whereas the log and database require further processing in the event of a fault in order to exhibit point in time consistency.
Gawlick et al, US Pat. 4,507,751, "Method and Apparatus for Logging Journal Data Using a Write Ahead Dataset", issued 3/26/1985, exemplifies a transaction management system where all transactions are recorded on a log on a write-ahead-dataset basis. In this patent, a unit of work is first recorded on the backup medium (log) and then written to its external storage address.
Sidefile Generation in Performing DASD Media Maintenance
The copending application, Anglin et al, "Method and Apparatus for Executing Critical Disk Access Commands", USSN 07/524,206, filed May 16, 1990, (IBM Ref. SA9-90-012), teaches performing media maintenance on selective portions of a tracked cyclically operable magnetic media concurrent with active access to other portions of the DASD media. Anglin's method requires the phased movement of customer data from a target track to an alternate track, diversion of all concurrent access requests to the alternate track or tracks, and completion of maintenance and copyback from the alternate to the target track.
Requests and interrupts occurring prior to executing the track-to-track customer data movement result in the process restarting. Otherwise, requests and interrupts occurring during execution of the data movement encounter a DEVICE BUSY state. This causes re-queuing of the requests.
Summary of the Invention
It is an object of this invention to devise a method and means for consistent backup copying of records to external storage, and that such copying be concurrent with a drastically shortened suspension of CPU application execution occasioned by said copying.
It is a related object to devise a backup copying method and means susceptible of supporting full, incremental, or mixed backup scheduling policies.
The above objects are satisfied by a method and means that rely upon mapping data to be copied onto the backup copy medium atomically and using sidefiles for buffering any data subset affected by a concurrent update. This allows updates to be concurrently written through to external storage while preserving both the consistency and copyback order. The method of the invention is implemented by backup copying designated datasets in a uniquely identified session. Each session includes session registration and initialization and concurrent copying of the state of the designated datasets as of a predetermined time (t0) while writing through all updates after t0 to the external store. The method includes the steps of (1) writing sidefiles of the affected uncopied portion of the dataset, (2) updating the original data in place on said external store, and (3) copying the sidefiles to the medium in backup copy order.
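The following minimal sketch illustrates how steps (1) through (3) interact; all names and structures are hypothetical illustrations, not the claimed implementation. An update to a still-uncopied track first sidefiles the original image, then completes in place; the backup writer later drains the sidefile in backup copy order.

```python
from dataclasses import dataclass, field

@dataclass
class T0Session:
    """Illustrative state of one registered t0 copy session."""
    session_id: str
    uncopied: set = field(default_factory=set)    # tracks still owed to the backup copy
    sidefile: dict = field(default_factory=dict)  # t0 images of tracks updated before being copied

    def write_through(self, track, new_image, store):
        """Steps (1) then (2): sidefile the affected uncopied track, then update in place."""
        if track in self.uncopied:
            self.sidefile[track] = store[track]   # preserve the t0 image first
            self.uncopied.discard(track)
        store[track] = new_image                  # the application update completes

    def copy_track(self, track, store):
        """Step (3): return the t0 image of a track in backup copy order."""
        if track in self.sidefile:
            return self.sidefile.pop(track)       # track was updated; use the sidefile image
        self.uncopied.discard(track)
        return store[track]                       # track unchanged; read it from the store
```

In such a sketch, a session would be registered with `uncopied` set to the tracks of the designated datasets at t0, all writes would be routed through `write_through`, and `copy_track` would be called for each track in backup copy order.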
Advantageously, the integrity of the copied dataset is maintained while the period of process suspension is nearly eliminated. Also, unlike the aforementioned Anglin reference, the method and means of this invention are directed to a new use of sidefile generation. That is, the difference resides in generating sidefiles of the uncopied portion of a dataset, where the sidefile use facilitates both backing up datasets in ordinary copy order and overlapping of backing up and updating.
Brief Description of the Drawing
Fig. 1 exhibits a typical multi-processing multi-programming environment according to the prior art where executing processes and applications randomly or sequentially access data from external storage.
Fig. 2 shows a timeline depiction of the backup window among batch or streaming processes according to the prior art.

Fig. 3 depicts the near elimination of the backup window as a consequence of the method and means of the invention.
Fig. 4 sets forth a conceptual flow of the t0 backup copy method of the invention.
Figs. 5 and 6 represent the control flow at the external storage control and the CPU operating system levels respectively.
Description of the Preferred Embodiment
Illustrative CPU Environment for Executing the Method of the Invention
The invention can be conveniently practiced in a configuration in which each CPU in a system may be of an IBM/360 or 370 architected CPU type having as an example an IBM MVS operating system. An IBM/360 architected CPU is fully described in Amdahl et al, USP 3,400,371, "Data Processing System", issued on September 3, 1968. A configuration involving CPU's sharing access to external storage is set forth in Luiz et al, USP 4,207,609, "Path Independent Device Reservation and Reconnection in a Multi-CPU and Shared Device Access System", issued June 10, 1980.
An MVS operating system is also described in IBM publication GC28-1150, "MVS/Extended Architecture System Programming Library: System Macros and Facilities", Volume 1. Details of standard MVS or other operating system services such as local lock management, sub-system invocation by interrupt or monitor, and the posting and waiting of tasks are omitted. These OS services are believed well appreciated by those skilled in the art.
Path to Data, Batch and Interactive Modes, and Backup Copying
Referring now to figure 1, there is depicted a multiprocessing, multiprogramming system according to the prior art. Such systems include a plurality of processors (1,3) accessing external storage (21,23,25,27,29) over redundant channel demand/response interfaces (5,7,9). As described in Luiz et al, a CPU process establishes a path to externally stored data in an IBM System 370 and the like through an MVS or other operating system by invoking a START I/O, transferring control to a channel subsystem which reserves a path to the data over which transfers are made. Typically, applications have data dependences and may briefly suspend operations until a fetch or update is completed. During the transfer, the path is locked until the transfer is completed.
Referring now to figure 2, there is shown a timeline depiction of the backup window among batch or streaming processes according to the prior art. That is, at a time just prior to backup, applications are suspended or shut down. The suspension persists until the backup process is completed. Backup termination signifies completion and commitment. By completion it is meant that all the data that was to have been copied was in fact read from the source. By commitment it is meant that all the data to be copied was in fact written to the output media.

Separating Logical Completion from Physical Completion
Referring now to figure 3, there is depicted the near elimination of the backup window as a consequence of the method and means of the invention. Once the backup method of the invention (t0 copy) process starts, the data (as far as the copy is concerned) is "frozen" at that point in time. At that point in time, the copy is said to be "Logically Complete". The committed state, or "Physically Complete" state, will not occur until later.
At the "Logically Complete" point in time, the data is completely usable again by the applications. The time from when the tO backup is issued and the data being available again is in the low sub-second range. In other words, the total application data outage (backup window) can be measured in milliseconds.
Abnormal Termination
If the t0 backup process abnormally terminates between the point of logical completion and the point of physical completion, then the backup copy is useless and the process needs to be restarted. In this respect the method and means of the invention are vulnerable in a manner similar to the prior art. That is, all backup must be rerun. One limitation is that the time criticality of the snapshot is lost.
Conceptual Aspects
Referring now to figures 4 and 5, there is set forth a conceptual flow of the method of the invention. It should be noted that each backup session is assigned a unique session identification (ID) and comprises an initialization and a backup processing component. While multiple backup sessions may be run concurrently, each session ID, and hence each "snapshot", is unique.
Each CPU includes an operating system having a storage manager component. Typically, an IBM System 370 type CPU running under the MVS operating system would include a storage manager of the Data Facility Data Set Services (DFDSS) type as described in Ferro et al, U.S. Pat. 4,855,907, issued Aug. 8, 1989, "Method for Moving VSAM Base Clusters While Maintaining Alternate Indices into the Cluster". DFDSS is also described in the IBM publication GC26-4388, "Data Facility Data Set Services: User's Guide", dated <mm/dd/yyyy>.
Data is logically organized into records and datasets. The real address of the data in external storage is in terms of DASD volumes, tracks, and cylinders. The virtual address of the same is couched in terms of base addresses plus offsets and/or extents.
A record may be of the count-key-data format. It may occupy one or more units of real storage. A dataset, as a logical collection of multiple records, may be stored on contiguous units of real storage or may be dispersed. It follows that if backup proceeds at the dataset level, it is necessary to perform multiple sorts to form inverted indices into real storage.
For purposes of this invention, backup processing is managed at two levels, namely, at the CPU OS resource manager level (fig. 1 - 1, 3) and at the storage control unit level (fig. 1 - 21, 23).

Initialization
Referring again to figures 4 and 5, the initialization process comprises three broad steps responsive to a resource manager (e.g., DFDSS) receiving a request to copy or backup particular data. These steps include sorting datasets, building one or more bit maps, and signalling logical completion to an invoking process at the CPU. The listed or identified datasets are sorted according to the access path elements down to DASD track granularity. Next, bit maps are constructed which correlate the data set and the access path insofar as any one of them is included or excluded from a given copy session. Lastly, the resource manager signals logical completion, meaning that updates will be processed against the dataset after only a short delay.
More particularly, the resource manager for storage (DFDSS) receives a request to copy or back up data. Normally, this request is in the form of a list of data sets or a filtered list of data sets. DFDSS maps the request into a list of physical extents by DASD storage volume and by storage control unit (SCU). Next, DFDSS registers the request with each participating SCU. At this point, the session ID is determined and the session is established.
It should be appreciated that DFDSS initializes the session with each SCU by passing all the extents being copied for each volume on each SCU. Each SCU will then build a bitmap for each volume participating in the session. This bitmap will indicate which tracks are part of the t0 copy session. Control is returned to DFDSS. This is the "Logically Complete" point at which the data is again available for use. DFDSS notifies the operating system component, such as a scheduler in system managed storage, accordingly.
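As a rough sketch of the per-volume bitmap the SCU builds (structures and track granularity assumed here purely for illustration), each bit simply records whether the corresponding track is part of the t0 copy session:

```python
def build_volume_bitmap(tracks_per_volume, extents):
    """extents: (first_track, last_track) pairs registered by DFDSS for this volume."""
    bitmap = [0] * tracks_per_volume              # 0 = track not part of the t0 copy session
    for first, last in extents:
        for track in range(first, last + 1):
            bitmap[track] = 1                     # 1 = track is protected by the session
    return bitmap

# Example: a 20-track volume with two extents registered for the session.
bitmap = build_volume_bitmap(20, [(2, 5), (11, 13)])
assert bitmap[4] == 1 and bitmap[9] == 0
```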
Backup Processing
Following initialization, DFDSS begins reading the tracks requested. While the t0 copy session is active, each SCU monitors all updates. If an update is received, the SCU executes a predetermined algorithm which takes the update into account.
If the update is for a volume NOT in the t0 session, then the update completes normally. On the other hand, if the update is on a volume that is part of the session, then the bitmap is checked to see if that track is protected. If the bit is off (assume this imports a binary 0), it indicates the track is not currently in the copy session and the update completes normally. Significantly, if the track is protected (bit is on), it indicates the track is part of the copy session and it has not as yet been read by DFDSS. In this case, the SCU:
(1) Holds the update.
(2) Stages the track from the device into a separate cache partition (this track contains the data as it existed at the point in time the t0 backup process started).
(3) Allows the update to continue.
(4) If any tracks are contained in a separate cache partition, DFDSS promptly reads those tracks to minimize the effect on normal cache operations.

Referring again to figure 4, the steps of the method are depicted. In this figure, the updates to tracks 4 and 7 cause the unchanged tracks to be staged into the separate cache partition prior to the update completing. DFDSS subsequently reads the tracks from the separate cache partition. Tracks read by DFDSS that are not yet ready to be merged onto the output media are temporarily stored in a host sidefile.
Attention processing is used to ensure that separate cache partitions do not consume inordinate amounts of cache. When an attention is surfaced to the host, the operating system notifies a DFDSS task which then empties the separate cache partition.
In figure 4, random application updates of the data copied by the t0 copy process occur at "A". The original images of these tracks are copied into the separate cache partition. DFDSS reads unchanged tracks from the DASD device at "B". If any tracks have been changed after the t0 process started, they are not returned to DFDSS. When tracks are moved at "C" into the separate cache partition as a result of updates, a threshold attention interrupt is surfaced to the host. These interrupts are serviced by the operating system. The operating system issues the appropriate command to the SCU to obtain the reason for the interrupt. If the interrupt is for a specific t0 process, that indication is passed on to DFDSS.
Once DFDSS receives the interrupt, it begins emptying the tracks that had accumulated in the separate cache partition. Any tracks read that are not yet ready to be placed onto the output media are considered "out of sequence" and are stored temporarily in a host-memory sidefile.
As a last measure, data read directly from the DASD device and data stored in the host sidefile are ultimately merged onto the output media in the proper sequence at step "D".
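A compact sketch of the merge at step "D" follows; the helper names are placeholders, not the actual DFDSS interfaces. Tracks whose t0 images were drained into the host sidefile are emitted from memory, all other tracks are read directly from DASD, and the output media always receives tracks in backup copy order.

```python
def merge_to_output(track_order, host_sidefile, read_from_dasd, write_output):
    """Emit every track of the session in proper backup copy sequence."""
    for track in track_order:
        if track in host_sidefile:                # original image was captured before an update
            image = host_sidefile.pop(track)
        else:                                     # track never updated; still intact on DASD
            image = read_from_dasd(track)
        write_output(track, image)
```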
Illustrative Example
Referring again to figures 4 and 5, assume that a process invoking the t0 process desires to backup copy the datasets stored on 100 predetermined DASD tracks. If none of those tracks are changed during the copy process, DFDSS could simply read tracks 1-100 and place them on the output media. In order to permit concurrent updating of the external store while backup copying, it must also be assumed that data stored on one or more of the predetermined tracks has a reasonable expectation of being altered.
Assume that the process has already begun and that DFDSS has already copied tracks 1-20; it has yet to copy tracks 21-100. If an application or process tries to change track 7, that change would be allowed to complete "as usual" since track 7 has already been copied. If, however, an attempt were made to change track 44, that change could not complete "as usual" since track 44 has not yet been copied. It is necessary to ensure that track 44 is preserved in its original state for the copy. So, prior to updating the uncopied track, a temporary copy of track 44 is retained in a sidefile before the change is allowed to complete. This temporary copy of track 44 is located in a separate cache partition for subsequent retrieval by DFDSS. DFDSS retrieves this track and, at the proper time, places track 44 on the output media. The backup process thus causes DFDSS to obtain data stored on the predetermined tracks from two sources:
(1) Tracks read directly from DASD. These are tracks that have not been changed (by an application) after the t0 copy process began.
(2) Tracks read from the cache partition. These are the original images of tracks that have been changed after the t0 process began.
Since one objective is to minimize the impact on normal cache operations, as soon as tracks are read into the separate cache partition, they are available to be read by DFDSS.
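The example above can be reduced to a small simulation. The names and the use of host dictionaries are illustrative only; in the invention the staging is performed by the SCU into its separate cache partition.

```python
store = {t: f"data-{t}" for t in range(1, 101)}   # the 100 DASD tracks as of t0
uncopied = set(range(21, 101))                    # tracks 1-20 are already on the output media
cache_partition = {}                              # stands in for the separate cache partition

def update(track, new_image):
    """Application update arriving while the t0 copy session is active."""
    if track in uncopied:                         # e.g. track 44: protected, not yet copied
        cache_partition[track] = store[track]     # stage the t0 image before the change
        uncopied.discard(track)
    store[track] = new_image                      # then allow the update to complete

update(7, "changed-7")      # already copied; completes "as usual"
update(44, "changed-44")    # t0 image of track 44 preserved for DFDSS

assert cache_partition[44] == "data-44"           # the backup still sees the original track 44
assert store[44] == "changed-44"                  # the application sees its update
assert 7 not in cache_partition                   # no staging needed for an already copied track
```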
Detailed Logic Flow of Backup Processing
Referring now to figures 5 and 6, there are shown several flow diagrams. Fig. 5 covers initialization and SCU backup processing, while figure 6 depicts CPU OS processing of sidefiles (asynchronous processing) and CPU OS management of copy session data transfers (synchronous processing) from the SCU to the output medium. These presentations are supplemented in this section by more detailed flow of control listings for purposes of completeness. These listings are a many-to-one mapping to the flow diagrams depicted in figures 5 and 6.
Initialization Flow Listing
The initialization process starts with the CPU operating system (OS) receiving a request to backup or copy some amount of data. This request is processed according to the following logic (a brief sketch follows the listing):
1. BUILD LIST OF DATA SETS TO BE BACKED UP.
2. SORT LIST OF DATA SETS BY THE DASD VOLUMES THAT THEY RESIDE ON.
3. FIND OUT WHICH VOLUMES BELONG TO WHICH SCUs.
4. NOTIFY EACH SCU IN THE SESSION AND ESTABLISH A SESSION ID UNIQUE ACROSS ALL SCUs.
5. FOR EACH VOLUME ON EACH SCU, NOTIFY WHICH TRACKS ARE PART OF THE T0 COPY SESSION.
A. THE SCU THEN BUILDS A BIT MAP FOR EACH VOLUME IN THE SESSION
B. IN THE BIT MAP, A "0" INDICATES THAT TRACK IS NOT PART OF THE TO COPY SESSION. A "1" INDICATES THAT CORRESPONDING TRACK IS PART OF THE TO COPY SESSION.
6. CPU OS RETURNS AN INDICATION TO THE INVOKING PROCESS THAT THE "LOGICAL COMPLETE" POINT HAS BEEN REACHED AND THAT THE APPLICATION IS FREE TO USE THE DATA AGAIN.
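The host-side steps 1 through 6 could be sketched as follows; the mappings `volume_of` and `scu_of` and the callback `register_with_scu` are assumed helpers, and the SCU-side bitmap build of step 5 is sketched earlier.

```python
def initialize_t0_session(datasets, volume_of, scu_of, register_with_scu):
    # Steps 1-2: build the list of data sets and sort it by the DASD volumes they reside on.
    by_volume = {}
    for dataset in sorted(datasets, key=volume_of):
        by_volume.setdefault(volume_of(dataset), []).append(dataset)

    # Step 3: find out which volumes belong to which SCUs.
    by_scu = {}
    for volume in by_volume:
        by_scu.setdefault(scu_of(volume), []).append(volume)

    # Steps 4-5: establish a session ID unique across all SCUs and notify each SCU
    # which volumes (and hence which tracks) are part of the t0 copy session.
    session_id = "T0-SESSION-1"                   # illustrative; any unique identifier will do
    for scu, volumes in by_scu.items():
        register_with_scu(scu, session_id, {v: by_volume[v] for v in volumes})

    # Step 6: "logical complete" -- the invoking process may use the data again.
    return session_id
```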
SCU Flow Listing
This phase includes two processes being performed simultaneously, one by the SCU and one by the CPU Operating System. A sketch of the SCU update path follows the listing.

SCU

1. FOR EVERY UPDATE THAT OCCURS, A CHECK IS MADE TO SEE IF THAT UPDATE IS FOR A VOLUME THAT CURRENTLY HAS A T0 COPY SESSION.
2. IF THE ANSWER TO #1 IS NO, THE UPDATE COMPLETES NORMALLY.
3. IF THE ANSWER TO #1 IS YES, A CHECK IS MADE AGAINST THE CORRESPONDING BITMAP TO SEE IF THE UPDATE IS TO A TRACK THAT IS PART OF THE T0 COPY SESSION.
4. IF THE ANSWER TO #3 IS NO, THE UPDATE COMPLETES NORMALLY.
5. IF THE ANSWER TO #3 IS YES, THE FOLLOWING STEPS TAKE PLACE:
A. THE UPDATE IS TEMPORARILY HELD
B. THE TRACK THAT IS ABOUT TO BE UPDATED IS COPIED INTO A SIDEFILE AREA IN THE SCU CACHE.
C. THE UPDATE IS ALLOWED TO COMPLETE
D. THE BITMAP ENTRY FOR THAT TRACK IS TURNED OFF, INDICATING THAT THE TRACK IS NO LONGER PART OF THE T0 COPY SESSION. FUTURE UPDATES, THEREFORE, ARE NOT IMPACTED.
E. CHECK IF THE NUMBER OF TRACKS CURRENTLY CONTAINED IN THE SIDEFILE EXCEEDS A PREDEFINED THRESHOLD
(1) IF IT DOES NOT EXCEED THE THRESHOLD, CONTINUE
(2) IF IT DOES EXCEED THE THRESHOLD, SURFACE AN ATTENTION TO THE CPU OS INDICATING THAT THE SIDEFILE MUST BE READ (EMPTIED) IMMEDIATELY.
6. ANY READS (FROM DASD) WHICH OCCUR ONLY FROM THE T0 COPY PROCESS IN THE CPU OS RESULT IN THE FOLLOWING STEPS BEING TAKEN:
A. THE DATA TRACKS REQUESTED ARE TRANSFERRED TO THE CPU OS T0 COPY PROCESS.
B. THE CORRESPONDING BIT IN THE BITMAP IS TURNED OFF, INDICATING THAT THE TRACK IS NO LONGER PART OF THE T0 COPY SESSION AS FAR AS THE SCU IS CONCERNED.
7. WHEN ALL THE BITS IN ALL THE BITMAPS IN A SCU (BELONGING TO A SINGLE SESSION) ARE TURNED OFF, THAT SESSION HAS ESSENTIALLY COMPLETED FOR THAT SCU.
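A condensed sketch of the SCU update path (steps 1 through 5 above) is given below. The dictionaries and callbacks stand in for controller state and are assumptions for illustration, not the controller's actual microcode interface.

```python
def scu_handle_update(volume, track, new_image, sessions, cache_sidefile,
                      dasd, threshold, raise_attention):
    session = sessions.get(volume)                     # step 1: does the volume have a t0 session?
    if session is None or not session["bitmap"][track]:
        dasd[volume][track] = new_image                # steps 2 and 4: the update completes normally
        return
    # Step 5: the track is protected and has not yet been read by DFDSS.
    cache_sidefile.append((volume, track, dasd[volume][track]))  # 5A/5B: hold, stage the t0 image
    dasd[volume][track] = new_image                    # 5C: the update is allowed to complete
    session["bitmap"][track] = 0                       # 5D: track leaves the session; later updates unaffected
    if len(cache_sidefile) > threshold:                # 5E: sidefile above the predefined threshold?
        raise_attention(session["id"])                 # surface an attention so the host empties it
```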
CPU OS Flow Listing
The CPU OS flow consists of an asynchronous process and a synchronous process; a brief sketch follows each listing.
ASYNCHRONOUS PROCESS
1. LISTEN FOR AN ATTENTION (any "signal" sent from an SCU to the CPU OS indicative of the occurrence of a predefined event).
2. WHEN AN ATTENTION OCCURS ON AN SCU, START READING DATA FROM THE SCU SIDEFILE UNTIL THAT SIDEFILE IS EMPTY.
3. EACH TRACK READ FROM THE SIDEFILE IS AN "OUT OF SEQUENCE" TRACK AND IS STORED IN A HOST WORKFILE UNTIL IT IS READY TO BE PUT ONTO THE OUTPUT MEDIUM.
4. GOTO #1
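The asynchronous host task might be sketched as below; `wait_for_attention` and `read_sidefile` are placeholder names for the operating-system and channel services involved, not actual MVS interfaces.

```python
def asynchronous_process(wait_for_attention, read_sidefile, host_workfile):
    while True:
        scu = wait_for_attention()                          # step 1: listen for an attention
        if scu is None:                                     # illustrative stop condition at session end
            break
        for volume, track, image in read_sidefile(scu):     # step 2: empty the SCU sidefile
            host_workfile[(volume, track)] = image          # step 3: hold "out of sequence" tracks
        # step 4: loop back and listen again
```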
SYNCHRONOUS PROCESS
Recall that the t0 copy process starts reading the data tracks in a designated order.
1. THE T0 COPY PROCESS DETERMINES WHICH TRACKS ARE TO BE READ IN A SINGLE I/O REQUEST.
2. THE HOST WORKFILE IS QUERIED TO SEE IF ANY OF THE TRACKS TO BE READ ARE ALREADY IN THE WORKFILE
A. IF THE ANSWER TO #2 IS NO, THE TRACK IS STILL ASSUMED TO EXIST ON THE DASD DEVICE IN AN UNCHANGED STATE
B. IF THE ANSWER TO #2 IS YES, THE READ COMMAND IS ALTERED SO AS TO AVOID READING A TRACK ALREADY READ. THAT IS, THE TRACK HAD BEEN PREVIOUSLY UPDATED AND THE ORIGINAL TRACK WAS STAGED INTO THE SIDEFILE AND SUBSEQUENTLY MOVED TO THE HOST WORKFILE.
3. THE SESSION READ IS ISSUED FOR SOME NUMBER OF TRACKS
A. IF THE SCU INDICATES A SESSION READ WAS ATTEMPTED ON A TRACK NOT CURRENTLY IN THE SESSION, THE CPU OS ASSUMES THE TRACK RESIDES IN THE SCU SIDEFILE OR THE HOST WORKFILE AND THE TRACK IS RECOVERED FROM THERE.
4. DATA OBTAINED FROM #3 IS WRITTEN ONTO THE OUTPUT MEDIUM AFTER BEING MERGED WITH ANY DATA TRACKS OBTAINED FROM STEP #2B
5. IF THERE ARE MORE TRACKS TO READ, GOTO #1
6. ELSE, WHEN ALL TRACKS HAVE BEEN READ AND WRITTEN TO THE OUTPUT MEDIUM:
A. TERMINATE THE SESSION WITH ALL PARTICIPATING SCUS
B. RETURN A "PHYSICAL COMPLETE" SIGNAL TO THE INVOKING PROCESS. THIS INDICATES THAT THE DATA TO BE BACKED UP HAS IN FACT BEEN WRITTEN TO THE OUTPUT MEDIUM
Extensions
Although the invention has been described within the context of an IBM MVS operating system, it may likewise be practiced within any commercially available general purpose operating system such as VM, OS/2, and the like. Also, although DFDSS has been identified as an illustrative external storage resource manager, the invention is operable with any equivalent manager without undue experimentation by the ordinarily skilled artisan.
These and other extensions of the invention may be made without departing from the spirit and scope thereof as recited in the appended claims.