The Linux Journalling API

Overview

Details

The journalling layer is easy to use. You first need to create a journal_t data structure. There are two calls for this, depending on how you decide to allocate the physical media on which the journal resides. The jbd2_journal_init_inode() call is for journals stored in filesystem inodes, and the jbd2_journal_init_dev() call is for journals stored on a raw device (in a contiguous range of blocks). A journal_t is a typedef for struct journal_s, and both calls return a pointer to it, so when you are finally finished make sure you call jbd2_journal_destroy() on it to free up any used kernel memory.

Once you have got your journal_t object you need to ‘mount’ or load the journal file. The journalling layer expects the space for the journal to have already been allocated and initialized properly by the userspace tools. When loading the journal you must call jbd2_journal_load() to process the journal contents. If the client file system detects that the journal contents do not need to be processed (or need not even have valid contents), it may call jbd2_journal_wipe() to clear the journal contents before calling jbd2_journal_load().

Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if it detects any outstanding transactions in the journal, and similarly jbd2_journal_load() will call jbd2_journal_recover() if necessary. I would advise reading ext4_load_journal() in fs/ext4/super.c for examples of this stage.

Now you can go ahead and start modifying the underlying filesystem. Almost.

You still need to actually journal your filesystem changes; this is done by wrapping them into transactions. Additionally, you need to wrap the modification of each of the buffers with calls to the journal layer, so it knows what modifications you are actually making. To do this use jbd2_journal_start(), which returns a transaction handle.

jbd2_journal_start() and its counterpart jbd2_journal_stop(), which indicates the end of a transaction, are nestable calls, so you can reenter a transaction if necessary, but remember you must call jbd2_journal_stop() the same number of times as jbd2_journal_start() before the transaction is completed (or, more accurately, leaves the update phase). Ext4/VFS makes use of this feature to simplify handling of inode dirtying, quota support, etc.
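The nesting rule can be illustrated with a small userspace model. This is only a sketch: the nest_* names are hypothetical and none of this is JBD2 code. It assumes that a nested start simply bumps a reference count on the task's existing handle (loosely modelled on the h_ref member described later) and that only the outermost stop ends the update.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; NOT the kernel API. All names are hypothetical. */
struct nest_handle {
    int ref;                            /* nesting depth, loosely like h_ref */
};

struct nest_task {
    struct nest_handle *current_handle; /* at most one open handle per task */
    struct nest_handle storage;         /* backing memory for that handle */
    int updates_finished;               /* times the update phase was left */
};

/* Start or re-enter a transaction: nested calls share one handle. */
struct nest_handle *nest_start(struct nest_task *t)
{
    if (t->current_handle) {
        t->current_handle->ref++;
        return t->current_handle;
    }
    t->storage.ref = 1;
    t->current_handle = &t->storage;
    return t->current_handle;
}

/* Stop once per start; returns 1 only for the outermost stop. */
int nest_stop(struct nest_task *t)
{
    struct nest_handle *h = t->current_handle;

    assert(h != NULL && h->ref > 0);
    if (--h->ref > 0)
        return 0;                       /* still nested, nothing completes */
    t->current_handle = NULL;
    t->updates_finished++;
    return 1;
}
```

In this model a helper that calls nest_start() and nest_stop() internally can be invoked from inside an already open update without ending it early, which is the property the text attributes to the real calls.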

Inside each transaction you need to wrap the modifications to the individual buffers (blocks). Before you start to modify a buffer you need to call jbd2_journal_get_create_access() / jbd2_journal_get_write_access() / jbd2_journal_get_undo_access() as appropriate; this allows the journalling layer to copy the unmodified data if it needs to. After all, the buffer may be part of a previously uncommitted transaction. At this point you are at last ready to modify a buffer, and once you have done so you need to call jbd2_journal_dirty_metadata(). Or, if you have asked for access to a buffer that you now know no longer needs to be pushed back to the device, you can call jbd2_journal_forget() in much the same way as you might have used bforget() in the past.
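Why access must be requested before the modification can be seen in a toy userspace model. The sketch below uses hypothetical bufm_* names and is not the kernel implementation. It assumes a buffer that an older, still committing transaction depends on: asking for write access first saves the old contents, and marking the buffer dirty is refused if access was never requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUFM_BLOCK_SIZE 16

/* Toy userspace buffer; hypothetical names, not struct buffer_head. */
struct bufm_buffer {
    char data[BUFM_BLOCK_SIZE];
    char frozen[BUFM_BLOCK_SIZE]; /* copy kept for an in-flight commit */
    bool in_committing_txn;       /* an older transaction still needs it */
    bool has_frozen_copy;
    bool write_access;            /* caller announced intent to modify */
    bool dirty_in_running_txn;
};

/* Announce a modification; preserve old contents if a commit needs them. */
void bufm_get_write_access(struct bufm_buffer *b)
{
    if (b->in_committing_txn && !b->has_frozen_copy) {
        memcpy(b->frozen, b->data, BUFM_BLOCK_SIZE);
        b->has_frozen_copy = true;
    }
    /* Invariant of this model: a frozen copy exists only for a commit. */
    assert(!b->has_frozen_copy || b->in_committing_txn);
    b->write_access = true;
}

/* Mark the buffer as modified by the running transaction. */
int bufm_dirty_metadata(struct bufm_buffer *b)
{
    if (!b->write_access)
        return -1;  /* protocol violation: access was never requested */
    b->dirty_in_running_txn = true;
    return 0;
}
```

The order in the model mirrors the text: request access, change the data, then mark it dirty. If the access step were skipped, the older transaction in this model would have no unmodified copy left to write out.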

A jbd2_journal_flush() may be called at any time to commit and checkpoint all your transactions.

Then at umount time, in your put_super(), you can call jbd2_journal_destroy() to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a deadlock. The first thing to note is that each task can only have a single outstanding transaction at any one time; remember that nothing commits until the outermost jbd2_journal_stop(). This means you must complete the transaction at the end of each file/inode/address etc. operation you perform, so that the journalling system isn’t re-entered on another journal. This matters because transactions can’t be nested or batched across differing journals, and a filesystem other than yours (say ext4) may be modified in a later syscall.

The second case to bear in mind is that jbd2_journal_start() can block if there isn’t enough space in the journal for your transaction (based on the passed nblocks param). When it blocks it merely(!) needs to wait for transactions to complete and be committed from other tasks, so essentially we are waiting for jbd2_journal_stop(). So to avoid deadlocks you must treat jbd2_journal_start() / jbd2_journal_stop() as if they were semaphores and include them in your semaphore ordering rules. Note that jbd2_journal_extend() has similar blocking behaviour to jbd2_journal_start(), so you can deadlock here just as easily as on jbd2_journal_start().

Try to reserve the right number of blocks the first time. ;-) This will be the maximum number of blocks you are going to touch in this transaction. I advise having a look at ext4_jbd.h, at least, to see the basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation strategy. Why? Because, if you do a delete, you need to ensure you haven’t reused any of the freed blocks until the transaction freeing these blocks commits. If you reused these blocks and a crash happens, there is no way to restore the contents of the reallocated blocks as of the end of the last fully committed transaction. One simple way of doing this is to mark blocks as free in the internal in-memory block allocation structures only after the transaction freeing them commits. Ext4 uses a journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call a callback function when the transaction is finally committed to disk, so that you can do some of your own management. You ask the journalling layer to call the callback by simply setting the journal->j_commit_callback function pointer, and that function is called after each transaction commit.
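The "free only after commit" strategy can be sketched as a toy userspace allocator. All dfree_* names are hypothetical, and this is neither ext4 nor JBD2 code. dfree_commit_callback() stands in for whatever function a filesystem would install as its commit callback, and the fixed-size arrays are an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

#define DFREE_NBLOCKS 8
#define DFREE_MAX_PENDING 8

/* Toy userspace allocator; hypothetical names. */
struct dfree_alloc {
    bool in_use[DFREE_NBLOCKS];      /* in-memory allocation map */
    int pending[DFREE_MAX_PENDING];  /* freed by the running transaction */
    int npending;
};

/* Allocate the lowest free block, or return -1 if none is free. */
int dfree_alloc_block(struct dfree_alloc *a)
{
    for (int i = 0; i < DFREE_NBLOCKS; i++) {
        if (!a->in_use[i]) {
            a->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Free inside a transaction: queue the block but keep it marked in use. */
int dfree_free_block(struct dfree_alloc *a, int blk)
{
    if (blk < 0 || blk >= DFREE_NBLOCKS || !a->in_use[blk] ||
        a->npending == DFREE_MAX_PENDING)
        return -1;
    a->pending[a->npending++] = blk;
    return 0;
}

/* Stand-in for a commit callback: now the blocks may really be reused. */
void dfree_commit_callback(struct dfree_alloc *a)
{
    assert(a->npending >= 0 && a->npending <= DFREE_MAX_PENDING);
    for (int i = 0; i < a->npending; i++)
        a->in_use[a->pending[i]] = false;
    a->npending = 0;
}
```

Between dfree_free_block() and dfree_commit_callback() the allocator keeps handing out other blocks, so a crash before the commit cannot leave a reallocated block with contents the old transaction state still refers to.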

JBD2 also provides a way to block all transaction updates via jbd2_journal_lock_updates() / jbd2_journal_unlock_updates(). Ext4 uses this when it wants a window with a clean and stable fs for a moment. E.g.

jbd2_journal_lock_updates()   // stop new stuff happening..
jbd2_journal_flush()          // checkpoint everything.
..do stuff on stable fs
jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DOS attacks with this should be obvious, if you allow unprivileged userspace to trigger codepaths containing these calls.

Fast commits

JBD2 also allows you to perform file-system specific delta commits known as fast commits. In order to use fast commits, you will need to set the following callbacks that perform the corresponding work:

journal->j_fc_cleanup_callback: Cleanup function called after every full commit and fast commit.

journal->j_fc_replay_callback: Replay function called for replay of fast commit blocks.

The file system is free to perform fast commits as and when it wants, as long as it gets permission from JBD2 to do so by calling the function jbd2_fc_begin_commit(). Once a fast commit is done, the client file system should tell JBD2 about it by calling jbd2_fc_end_commit(). If the file system wants JBD2 to perform a full commit immediately after stopping the fast commit, it can do so by calling jbd2_fc_end_commit_fallback(). This is useful if the fast commit operation fails for some reason and the only way to guarantee consistency is for JBD2 to perform the full traditional commit.

JBD2 provides helper functions to manage fast commit buffers. The file system can use jbd2_fc_get_buf() and jbd2_fc_wait_bufs() to allocate fast commit buffers and to wait on their IO completion.

Currently, only Ext4 implements fast commits. For details of its implementation of fast commits, please refer to the top level comments in fs/ext4/fast_commit.c.

Summary

Using the journal is a matter of wrapping the different context changes, being each mount, each modification (transaction) and each changed buffer, to tell the journalling layer about them.

Data Types

The journalling layer uses typedefs to ‘hide’ the concrete definitions of the structures used. As a client of the JBD2 layer you can just rely on using the pointer as a magic cookie of some sort. Obviously the hiding is not enforced as this is ‘C’.

Structures

type handle_t: The handle_t type represents a single atomic update being performed by some process.

Description

All filesystem modifications made by the process go through this handle. Recursive operations (such as quota operations) are gathered into a single update.

The buffer credits field is used to account for journaled buffers being modified by the running process. To ensure that there is enough log space for all outstanding operations, we need to limit the number of outstanding buffers possible at any time. When the operation completes, any buffer credits not used are credited back to the transaction, so that at all times we know how many buffers the outstanding updates on a transaction might possibly touch.

This is an opaque datatype.
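The credit bookkeeping described for handle_t can be modelled in a few lines of userspace C. This is a hedged sketch with hypothetical cred_* names, not the JBD2 data structures. It assumes a single per-transaction counter that holds an upper bound on the buffers the transaction may end up containing.

```c
#include <assert.h>

/* Toy userspace model of buffer-credit accounting; hypothetical names. */
struct cred_transaction {
    int outstanding_credits;  /* upper bound on buffers in the transaction */
    int max_credits;          /* cap on credits for one transaction */
};

struct cred_handle {
    struct cred_transaction *txn;
    int credits;              /* buffers this handle may still dirty */
};

/* Reserve nblocks credits; returns 0 on success, -1 if they do not fit. */
int cred_handle_open(struct cred_handle *h, struct cred_transaction *txn,
                     int nblocks)
{
    if (nblocks < 0 || txn->outstanding_credits + nblocks > txn->max_credits)
        return -1;
    txn->outstanding_credits += nblocks;
    h->txn = txn;
    h->credits = nblocks;
    return 0;
}

/* Dirtying a new buffer consumes one credit of the handle. */
int cred_handle_dirty(struct cred_handle *h)
{
    if (h->credits == 0)
        return -1;
    h->credits--;
    return 0;
}

/* Closing the handle hands unused credits back to the transaction. */
void cred_handle_close(struct cred_handle *h)
{
    assert(h->txn->outstanding_credits >= h->credits);
    h->txn->outstanding_credits -= h->credits;
    h->credits = 0;
}
```

A handle that reserves 10 credits but dirties only 3 buffers leaves the transaction's bound at 3 after it closes, so later handles can use the space that was reserved but never needed.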

type journal_t: The journal_t maintains all of the journaling state information for a single filesystem.

Description

journal_t is linked to from the fs superblock structure.

We use the journal_t to keep track of all outstanding transaction activity on the filesystem, and to manage the state of the log writing process.

This is an opaque datatype.

struct jbd2_inode: The jbd2_inode type is the structure linking inodes in ordered mode present in a transaction so that we can sync them during commit.

Definition:

struct jbd2_inode {
    transaction_t *i_transaction;
    transaction_t *i_next_transaction;
    struct list_head i_list;
    struct inode *i_vfs_inode;
    unsigned long i_flags;
    loff_t i_dirty_start;
    loff_t i_dirty_end;
};

Members

i_transaction: Which transaction does this inode belong to? Either the running transaction or the committing one. [j_list_lock]
i_next_transaction: Pointer to the running transaction modifying inode’s data in case there is already a committing transaction touching it. [j_list_lock]
i_list: List of inodes in the i_transaction [j_list_lock]
i_vfs_inode: VFS inode this inode belongs to [constant for lifetime of structure]
i_flags: Flags of inode [j_list_lock]
i_dirty_start: Offset in bytes where the dirty range for this inode starts. [j_list_lock]
i_dirty_end: Inclusive offset in bytes where the dirty range for this inode ends. [j_list_lock]

struct jbd2_journal_handle: The jbd2_journal_handle type is the concrete type associated with handle_t.

Definition:

struct jbd2_journal_handle {
    union {
        transaction_t *h_transaction;
        journal_t *h_journal;
    };
    handle_t *h_rsv_handle;
    int h_total_credits;
    int h_revoke_credits;
    int h_revoke_credits_requested;
    int h_ref;
    int h_err;
    unsigned int h_sync: 1;
    unsigned int h_reserved: 1;
    unsigned int h_aborted: 1;
    unsigned int h_type: 8;
    unsigned int h_line_no: 16;
    unsigned long h_start_jiffies;
    unsigned int h_requested_credits;
    unsigned int saved_alloc_context;
};

Members

{unnamed_union}: anonymous
h_transaction: Which compound transaction is this update a part of?
h_journal: Which journal the handle belongs to; used iff h_reserved is set.
h_rsv_handle: Handle reserved for finishing the logical operation.
h_total_credits: Number of remaining buffers we are allowed to add to the journal. These are dirty buffers and revoke descriptor blocks.
h_revoke_credits: Number of remaining revoke records available for the handle.
h_revoke_credits_requested: Holds h_revoke_credits after the handle is started.
h_ref: Reference count on this handle.
h_err: Field for caller’s use to track errors through large fs operations.
h_sync: Flag for sync-on-close.
h_reserved: Flag for handle for reserved credits.
h_aborted: Flag indicating fatal error on handle.
h_type: For handle statistics.
h_line_no: For handle statistics.
h_start_jiffies: Handle start time.
h_requested_credits: Holds h_total_credits after the handle is started.
saved_alloc_context: Saved context while transaction is open.

struct journal_s: The journal_s type is the concrete type associated with journal_t.

Definition:

struct journal_s {
    unsigned long j_flags;
    int j_errno;
    struct mutex j_abort_mutex;
    struct buffer_head *j_sb_buffer;
    journal_superblock_t *j_superblock;
    rwlock_t j_state_lock;
    int j_barrier_count;
    struct mutex j_barrier;
    transaction_t *j_running_transaction;
    transaction_t *j_committing_transaction;
    transaction_t *j_checkpoint_transactions;
    wait_queue_head_t j_wait_transaction_locked;
    wait_queue_head_t j_wait_done_commit;
    wait_queue_head_t j_wait_commit;
    wait_queue_head_t j_wait_updates;
    wait_queue_head_t j_wait_reserved;
    wait_queue_head_t j_fc_wait;
    struct mutex j_checkpoint_mutex;
    struct buffer_head *j_chkpt_bhs[JBD2_NR_BATCH];
    struct shrinker *j_shrinker;
    struct percpu_counter j_checkpoint_jh_count;
    transaction_t *j_shrink_transaction;
    unsigned long j_head;
    unsigned long j_tail;
    unsigned long j_free;
    unsigned long j_first;
    unsigned long j_last;
    unsigned long j_fc_first;
    unsigned long j_fc_off;
    unsigned long j_fc_last;
    struct block_device *j_dev;
    int j_blocksize;
    unsigned long long j_blk_offset;
    char j_devname[BDEVNAME_SIZE+24];
    struct block_device *j_fs_dev;
    errseq_t j_fs_dev_wb_err;
    unsigned int j_total_len;
    atomic_t j_reserved_credits;
    spinlock_t j_list_lock;
    struct inode *j_inode;
    tid_t j_tail_sequence;
    tid_t j_transaction_sequence;
    tid_t j_commit_sequence;
    tid_t j_commit_request;
    __u8 j_uuid[16];
    struct task_struct *j_task;
    int j_max_transaction_buffers;
    int j_revoke_records_per_block;
    int j_transaction_overhead_buffers;
    unsigned long j_commit_interval;
    struct timer_list j_commit_timer;
    spinlock_t j_revoke_lock;
    struct jbd2_revoke_table_s *j_revoke;
    struct jbd2_revoke_table_s *j_revoke_table[2];
    struct buffer_head **j_wbuf;
    struct buffer_head **j_fc_wbuf;
    int j_wbufsize;
    int j_fc_wbufsize;
    pid_t j_last_sync_writer;
    u64 j_average_commit_time;
    u32 j_min_batch_time;
    u32 j_max_batch_time;
    void (*j_commit_callback)(journal_t *, transaction_t *);
    int (*j_submit_inode_data_buffers) (struct jbd2_inode *);
    int (*j_finish_inode_data_buffers) (struct jbd2_inode *);
    spinlock_t j_history_lock;
    struct proc_dir_entry *j_proc_entry;
    struct transaction_stats_s j_stats;
    unsigned int j_failed_commit;
    void *j_private;
    __u32 j_csum_seed;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map j_trans_commit_map;
#endif
    struct lock_class_key jbd2_trans_commit_key;
    void (*j_fc_cleanup_callback)(struct journal_s *journal, int full, tid_t tid);
    int (*j_fc_replay_callback)(struct journal_s *journal, struct buffer_head *bh, enum passtype pass, int off, tid_t expected_commit_id);
    int (*j_bmap)(struct journal_s *journal, sector_t *block);
};

Members

j_flags: General journaling state flags [j_state_lock, no lock for quick racy checks]
j_errno: Is there an outstanding uncleared error on the journal (from a prior abort)? [j_state_lock]
j_abort_mutex: Lock the whole aborting procedure.
j_sb_buffer: The first part of the superblock buffer.
j_superblock: The second part of the superblock buffer.
j_state_lock: Protect the various scalars in the journal.
j_barrier_count: Number of processes waiting to create a barrier lock [j_state_lock, no lock for quick racy checks]
j_barrier: The barrier lock itself.
j_running_transaction: Transactions: The current running transaction... [j_state_lock, no lock for quick racy checks] [caller holding open handle]
j_committing_transaction: the transaction we are pushing to disk [j_state_lock] [caller holding open handle]
j_checkpoint_transactions: ... and a linked circular list of all transactions waiting for checkpointing. [j_list_lock]
j_wait_transaction_locked: Wait queue for waiting for a locked transaction to start committing, or for a barrier lock to be released.
j_wait_done_commit: Wait queue for waiting for commit to complete.
j_wait_commit: Wait queue to trigger commit.
j_wait_updates: Wait queue to wait for updates to complete.
j_wait_reserved: Wait queue to wait for reserved buffer credits to drop.
j_fc_wait: Wait queue to wait for completion of async fast commits.
j_checkpoint_mutex: Mutex for locking against concurrent checkpoints.
j_chkpt_bhs: List of buffer heads used by the checkpoint routine. This was moved from jbd2_log_do_checkpoint() to reduce stack usage. Access to this array is controlled by the j_checkpoint_mutex. [j_checkpoint_mutex]
j_shrinker: Journal head shrinker, reclaim buffer’s journal head which has been written back.
j_checkpoint_jh_count: Number of journal buffers on the checkpoint list. [j_list_lock]
j_shrink_transaction: Records the next transaction to shrink on the checkpoint list. [j_list_lock]
j_head: Journal head: identifies the first unused block in the journal. [j_state_lock]
j_tail: Journal tail: identifies the oldest still-used block in the journal. [j_state_lock]
j_free: Journal free: how many free blocks are there in the journal? [j_state_lock]
j_first: The block number of the first usable block in the journal. [j_state_lock]
j_last: The block number one beyond the last usable block in the journal. [j_state_lock]
j_fc_first: The block number of the first fast commit block in the journal. [j_state_lock]
j_fc_off: Number of fast commit blocks currently allocated. Accessed only during fast commit. Currently only one process can do a fast commit, so this field is not protected by any lock.
j_fc_last: The block number one beyond the last fast commit block in the journal. [j_state_lock]
j_dev: Device where we store the journal.
j_blocksize: Block size for the location where we store the journal.
j_blk_offset: Starting block offset into the device where we store the journal.
j_devname: Journal device name.
j_fs_dev: Device which holds the client fs. For internal journal this will be equal to j_dev.
j_fs_dev_wb_err: Records the errseq of the client fs’s backing block device.
j_total_len: Total maximum capacity of the journal region on disk.
j_reserved_credits: Number of buffers reserved from the running transaction.
j_list_lock: Protects the buffer lists and internal buffer state.
j_inode: Optional inode where we store the journal. If present, all journal block numbers are mapped into this inode via bmap().
j_tail_sequence: Sequence number of the oldest transaction in the log [j_state_lock]
j_transaction_sequence: Sequence number of the next transaction to grant [j_state_lock]
j_commit_sequence: Sequence number of the most recently committed transaction [j_state_lock, no lock for quick racy checks]
j_commit_request: Sequence number of the most recent transaction wanting commit [j_state_lock, no lock for quick racy checks]
j_uuid: Journal uuid: identifies the object (filesystem, LVM volume etc) backed by this journal. This will eventually be replaced by an array of uuids, allowing us to index multiple devices within a single journal and to perform atomic updates across them.
j_task: Pointer to the current commit thread for this journal.
j_max_transaction_buffers: Maximum number of metadata buffers to allow in a single compound commit transaction.
j_revoke_records_per_block: Number of revoke records that fit in one descriptor block.
j_transaction_overhead_buffers: Number of blocks each transaction needs for its own bookkeeping
j_commit_interval: What is the maximum transaction lifetime before we begin a commit?
j_commit_timer: The timer used to wakeup the commit thread.
j_revoke_lock: Protect the revoke table.
j_revoke: The revoke table - maintains the list of revoked blocks in the current transaction.
j_revoke_table: Alternate revoke tables for j_revoke.
j_wbuf: Array of bhs for jbd2_journal_commit_transaction.
j_fc_wbuf: Array of fast commit bhs for fast commit. Accessed only during a fast commit. Currently only one process can do a fast commit, so this field is not protected by any lock.
j_wbufsize: Size of j_wbuf array.
j_fc_wbufsize: Size of j_fc_wbuf array.
j_last_sync_writer: The pid of the last person to run a synchronous operation through the journal.
j_average_commit_time: The average amount of time in nanoseconds it takes to commit a transaction to disk. [j_state_lock]
j_min_batch_time: Minimum time that we should wait for additional filesystem operations to get batched into a synchronous handle in microseconds.
j_max_batch_time: Maximum time that we should wait for additional filesystem operations to get batched into a synchronous handle in microseconds.
j_commit_callback: This function is called when a transaction is closed.
j_submit_inode_data_buffers: This function is called for all inodes associated with the committing transaction marked with JI_WRITE_DATA flag before we start to write out the transaction to the journal.
j_finish_inode_data_buffers: This function is called for all inodes associated with the committing transaction marked with JI_WAIT_DATA flag after we have written the transaction to the journal but before we write out the commit block.
j_history_lock: Protect the transactions statistics history.
j_proc_entry: procfs entry for the jbd statistics directory.
j_stats: Overall statistics.
j_failed_commit: Failed journal commit ID.
j_private: An opaque pointer to fs-private information. ext3 puts its superblock pointer here.
j_csum_seed: Precomputed journal UUID checksum for seeding other checksums.
j_trans_commit_map: Lockdep entity to track transaction commit dependencies. Handles hold this “lock” for read; when we wait for commit, we acquire the “lock” for writing. This matches the properties of jbd2 journalling where the running transaction has to wait for all handles to be dropped to commit that transaction and also acquiring a handle may require transaction commit to finish.
jbd2_trans_commit_key: “struct lock_class_key” for j_trans_commit_map
j_fc_cleanup_callback: Clean-up after fast commit or full commit. JBD2 calls this function after every commit operation.
j_fc_replay_callback: File-system specific function that performs replay of a fast commit. JBD2 calls this function for each fast commit block found in the journal. This function should return JBD2_FC_REPLAY_CONTINUE to indicate that the block was processed correctly and more fast commit replay should continue. Return value of JBD2_FC_REPLAY_STOP indicates the end of replay (no more blocks remaining). A negative return value indicates error.
j_bmap: Bmap function that should be used instead of the generic VFS bmap function.

Functions

The functions here are split into two groups: those that affect a journal as a whole, and those which are used to manage transactions.

Journal Level

int jbd2_journal_force_commit_nested(journal_t *journal): Force and wait upon a commit if the calling process is not within a transaction.

Parameters

journal_t *journal: journal to force

Returns true if progress was made.

Description

This is used for forcing out undo-protected data which contains bitmaps, when the fs is running out of space.

int jbd2_journal_force_commit(journal_t *journal): force any uncommitted transactions

Parameters

journal_t *journal: journal to force

Description

The caller wants an unconditional commit. We can only force the running transaction if we don’t have an active handle; otherwise, we will deadlock.

journal_t *jbd2_journal_init_dev(struct block_device *bdev, struct block_device *fs_dev, unsigned long long start, int len, int blocksize): creates and initialises a journal structure

Parameters

struct block_device *bdev: Block device on which to create the journal
struct block_device *fs_dev: Device which holds the journalled filesystem for this journal.
unsigned long long start: Block number of the start of the journal.
int len: Length of the journal in blocks.
int blocksize: blocksize of journalling device

Return

a newly created journal_t *

Description

jbd2_journal_init_dev creates a journal which maps a fixed contiguous range of blocks on an arbitrary block device.

journal_t *jbd2_journal_init_inode(struct inode *inode): creates a journal which maps to an inode.

Parameters

struct inode *inode: An inode to create the journal in

Description

jbd2_journal_init_inode creates a journal which maps an on-disk inode as the journal. The inode must exist already, must support bmap() and must have all data blocks preallocated.

void jbd2_journal_update_sb_errno(journal_t *journal): Update error in the journal.

Parameters

journal_t *journal: The journal to update.

Description

Update a journal’s errno. Write the updated superblock to disk, waiting for IO to complete.

int jbd2_journal_load(journal_t *journal): Read journal from disk.

Parameters

journal_t *journal: Journal to act on.

Description

Given a journal_t structure which tells us which disk blocks contain a journal, read the journal from disk to initialise the in-memory structures.

int jbd2_journal_destroy(journal_t *journal): Release a journal_t structure.

Parameters

journal_t *journal: Journal to act on.

Description

Release a journal_t structure once it is no longer in use by the journaled object. Return <0 if we couldn’t clean up the journal.

int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat, unsigned long ro, unsigned long incompat): Check if features specified are used.

Parameters

journal_t *journal: Journal to check.
unsigned long compat: bitmask of compatible features
unsigned long ro: bitmask of features that force read-only mount
unsigned long incompat: bitmask of incompatible features

Description

Check whether the journal uses all of a given set of features. Return true (non-zero) if it does.

int jbd2_journal_check_available_features(journal_t *journal, unsigned long compat, unsigned long ro, unsigned long incompat): Check feature set in journalling layer

Parameters

journal_t *journal: Journal to check.
unsigned long compat: bitmask of compatible features
unsigned long ro: bitmask of features that force read-only mount
unsigned long incompat: bitmask of incompatible features

Description

Check whether the journaling code supports the use of all of a given set of features on this journal. Return true (non-zero) if it can.

int jbd2_journal_set_features(journal_t *journal, unsigned long compat, unsigned long ro, unsigned long incompat): Mark a given journal feature in the superblock

Parameters

journal_t *journal: Journal to act on.
unsigned long compat: bitmask of compatible features
unsigned long ro: bitmask of features that force read-only mount
unsigned long incompat: bitmask of incompatible features

Description

Mark a given journal feature as present on the superblock. Returns true if the requested features could be set.
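The three feature calls share one idea: a request succeeds only if every requested bit is covered in all three masks. The following userspace sketch uses hypothetical feat_* names and arbitrary bit values. It is an illustration of the mask logic only and makes no claim about how JBD2 stores or names its feature flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the three feature words; hypothetical names. */
struct feat_set {
    unsigned long compat;
    unsigned long ro_compat;
    unsigned long incompat;
};

/* True only if every requested bit is present in 'have'. */
bool feat_covers(const struct feat_set *have, unsigned long compat,
                 unsigned long ro, unsigned long incompat)
{
    return (have->compat & compat) == compat &&
           (have->ro_compat & ro) == ro &&
           (have->incompat & incompat) == incompat;
}

/*
 * Set the requested bits in 'used' if the implementation described by
 * 'known' supports all of them; otherwise change nothing.
 */
bool feat_set_features(struct feat_set *used, const struct feat_set *known,
                       unsigned long compat, unsigned long ro,
                       unsigned long incompat)
{
    if (feat_covers(used, compat, ro, incompat))
        return true;            /* already set, nothing to do */
    if (!feat_covers(known, compat, ro, incompat))
        return false;           /* some bit is not supported */
    used->compat |= compat;
    used->ro_compat |= ro;
    used->incompat |= incompat;
    assert(feat_covers(used, compat, ro, incompat));
    return true;
}
```

Here feat_covers() applied to the journal's own masks plays the role of the "used" check, and applied to the implementation's masks it plays the role of the "available" check. An empty request is trivially covered in both cases.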

int jbd2_journal_flush(journal_t *journal, unsigned int flags): Flush journal

Parameters

journal_t *journal: Journal to act on.
unsigned int flags: optional operation on the journal blocks after the flush (see below)

Description

Flush all data for a given journal to disk and empty the journal. Filesystems can use this when remounting readonly to ensure that recovery does not need to happen on remount. Optionally, a discard or zeroout can be issued on the journal blocks after flushing.

flags:
    JBD2_JOURNAL_FLUSH_DISCARD: issues discards for the journal blocks
    JBD2_JOURNAL_FLUSH_ZEROOUT: issues zeroouts for the journal blocks

int jbd2_journal_wipe(journal_t *journal, int write): Wipe journal contents

Parameters

journal_t *journal: Journal to act on.
int write: flag (see below)

Description

Wipe out all of the contents of a journal, safely. This will produce a warning if the journal contains any valid recovery information. Must be called between jbd2_journal_init_*() and jbd2_journal_load().

If ‘write’ is non-zero, then we wipe out the journal on disk; otherwise we merely suppress recovery.

void jbd2_journal_abort(journal_t *journal, int errno): Shutdown the journal immediately.

Parameters

journal_t *journal: the journal to shutdown.
int errno: an error number to record in the journal indicating the reason for the shutdown.

Description

Perform a complete, immediate shutdown of the ENTIRE journal (not of a single transaction). This operation cannot be undone without closing and reopening the journal.

The jbd2_journal_abort function is intended to support higher level error recovery mechanisms such as the ext2/ext3 remount-readonly error mode.

Journal abort has very specific semantics. Any existing dirty, unjournaled buffers in the main filesystem will still be written to disk by bdflush, but the journaling mechanism will be suspended immediately and no further transaction commits will be honoured.

Any dirty, journaled buffers will be written back to disk without hitting the journal. Atomicity cannot be guaranteed on an aborted filesystem, but we _do_ attempt to leave as much data as possible behind for fsck to use for cleanup.

Any attempt to get a new transaction handle on a journal which is in ABORT state will just result in an -EROFS error return. A jbd2_journal_stop on an existing handle will return -EIO if we have entered abort state during the update.

Recursive transactions are not disturbed by journal abort until the final jbd2_journal_stop, which will receive the -EIO error.

Finally, the jbd2_journal_abort call allows the caller to supply an errno which will be recorded (if possible) in the journal superblock. This allows a client to record failure conditions in the middle of a transaction without having to complete the transaction to record the failure to disk. ext3_error, for example, now uses this functionality.

int jbd2_journal_errno(journal_t *journal): returns the journal’s error state.

Parameters

journal_t *journal: journal to examine.

Description

This is the errno number set with jbd2_journal_abort() the last time the journal was mounted. If the journal was stopped without calling abort, this will be 0.

If the journal has been aborted during this mount, -EROFS will be returned.

int jbd2_journal_clear_err(journal_t *journal): clears the journal’s error state

Parameters

journal_t *journal: journal to act on.

Description

An error must be cleared or acked to take a FS out of readonly mode.

void jbd2_journal_ack_err(journal_t *journal): Ack journal err.

Parameters

journal_t *journal: journal to act on.

Description

An error must be cleared or acked to take a FS out of readonly mode.

int jbd2_journal_recover(journal_t *journal): recovers an on-disk journal

Parameters

journal_t *journal: the journal to recover

Description

The primary function for recovering the log contents when mounting a journaled device.

Recovery is done in three passes. In the first pass, we look for the end of the log. In the second, we assemble the list of revoke blocks. In the third and final pass, we replay any un-revoked blocks in the log.
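The three passes can be sketched over a toy in-memory log. Everything here is a simplified userspace model with hypothetical rcv_* names. It assumes records are stored in order, that transaction IDs start at 1, and that a revoke record suppresses copies of its block logged by the same or an earlier transaction. The on-disk format and the exact revoke rule of JBD2 are not reproduced.

```c
#include <assert.h>

#define RCV_DISK_BLOCKS 4

enum rcv_type { RCV_DATA, RCV_REVOKE, RCV_COMMIT };

/* One toy log record; hypothetical layout. */
struct rcv_rec {
    enum rcv_type type;
    unsigned int tid;   /* transaction the record belongs to (>= 1) */
    int blocknr;        /* target block (unused for RCV_COMMIT) */
    int value;          /* new contents (RCV_DATA only) */
};

/* Replays 'log' onto 'disk'; returns the number of blocks replayed. */
int rcv_recover(const struct rcv_rec *log, int n, int disk[RCV_DISK_BLOCKS])
{
    unsigned int revoked_at[RCV_DISK_BLOCKS] = {0}; /* 0 = never revoked */
    int end = 0, replayed = 0;

    /* Pass 1 (scan): the usable log ends after the last commit record. */
    for (int i = 0; i < n; i++)
        if (log[i].type == RCV_COMMIT)
            end = i + 1;

    /* Pass 2 (revoke): remember the newest revoke for each block. */
    for (int i = 0; i < end; i++) {
        if (log[i].type != RCV_REVOKE)
            continue;
        assert(log[i].blocknr >= 0 && log[i].blocknr < RCV_DISK_BLOCKS);
        if (log[i].tid > revoked_at[log[i].blocknr])
            revoked_at[log[i].blocknr] = log[i].tid;
    }

    /* Pass 3 (replay): write back data not covered by a revoke. */
    for (int i = 0; i < end; i++) {
        if (log[i].type != RCV_DATA)
            continue;
        assert(log[i].blocknr >= 0 && log[i].blocknr < RCV_DISK_BLOCKS);
        if (revoked_at[log[i].blocknr] >= log[i].tid)
            continue;           /* revoked: must not be replayed */
        disk[log[i].blocknr] = log[i].value;
        replayed++;
    }
    return replayed;
}
```

Records after the last commit record belong to a transaction that never finished, so the model ignores them entirely. That is the reason the end of the log has to be found before anything is replayed.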

int jbd2_journal_skip_recovery(journal_t *journal): Start journal and wipe existing records

Parameters

journal_t *journal: journal to startup

Description

Locate any valid recovery information from the journal and set up the journal structures in memory to ignore it (presumably because the caller has evidence that it is out of date). This function doesn’t appear to be exported.

We perform one pass over the journal to allow us to tell the user how much recovery information is being erased, and to let us initialise the journal transaction sequence numbers to the next unused ID.

Transaction Level¶

handle_t *jbd2_journal_start(journal_t *journal, int nblocks)¶: Obtain a new handle.

Parameters

journal_t *journal: Journal to start transaction on.
int nblocks: number of block buffers we might modify

Description

We make sure that the transaction can guarantee at least nblocks of modified buffers in the log. We block until the log can guarantee that much space. Additionally, if rsv_blocks > 0, we also create another handle with rsv_blocks reserved blocks in the journal. This handle is stored in h_rsv_handle. It is not attached to any particular transaction and thus doesn’t block transaction commit. If the caller uses this reserved handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop() on the parent handle will dispose of the reserved one. A reserved handle has to be converted to a normal handle using jbd2_journal_start_reserved() before it can be used.

Return a pointer to a newly allocated handle, or an ERR_PTR() value on failure.
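Because failure is reported through an ERR_PTR() value rather than NULL, callers test the result with IS_ERR() and decode it with PTR_ERR(). The userspace sketch below re-creates simplified stand-ins for those kernel helpers so the convention can be tried outside the kernel. `toy_journal_start()`, `struct toy_handle`, `TOY_MAX_CREDITS`, and the choice of `-ENOSPC` are hypothetical and only illustrate the calling pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical handle and credit limit, for illustration only. */
struct toy_handle { int credits; };
#define TOY_MAX_CREDITS 64

/* Hypothetical stand-in for a start call that returns ERR_PTR() on failure. */
static struct toy_handle *toy_journal_start(int nblocks)
{
    struct toy_handle *h;

    if (nblocks > TOY_MAX_CREDITS)
        return ERR_PTR(-ENOSPC);    /* assumed error for an oversized request */
    h = malloc(sizeof(*h));
    if (!h)
        return ERR_PTR(-ENOMEM);
    h->credits = nblocks;
    return h;
}

/* Caller side: check with IS_ERR(), never compare against NULL. */
static int toy_try_start(int nblocks)
{
    struct toy_handle *h = toy_journal_start(nblocks);

    if (IS_ERR(h))
        return (int)PTR_ERR(h);
    assert(h != NULL);              /* a non-error result is a real handle */
    free(h);
    return 0;
}
```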

int jbd2_journal_start_reserved(handle_t *handle, unsigned int type, unsigned int line_no)¶: start reserved handle

Parameters

handle_t *handle: handle to start
unsigned int type: for handle statistics
unsigned int line_no: for handle statistics

Description

Start a handle that has been previously reserved with jbd2_journal_reserve(). This attaches handle to the running transaction (or creates one if there’s no transaction running). Unlike jbd2_journal_start(), this function cannot block on journal commit, checkpointing, or similar stuff. It can block on memory allocation or a frozen journal, though.

Return 0 on success, non-zero on error; the handle is freed in that case.

int jbd2_journal_extend(handle_t *handle, int nblocks, int revoke_records)¶: extend buffer credits.

Parameters

handle_t *handle: handle to ‘extend’
int nblocks: nr blocks to try to extend by.
int revoke_records: number of revoke records to try to extend by.

Description

Some transactions, such as large extends and truncates, can be done atomically all at once or in several stages. The operation requests a credit for a number of buffer modifications in advance, but can extend its credit if it needs more.

jbd2_journal_extend tries to give the running handle more buffer credits. It does not guarantee that allocation; this is best-effort only. The calling process MUST be able to deal cleanly with a failure to extend here.

Return 0 on success, non-zero on failure.

A return code < 0 implies an error. A return code > 0 implies normal transaction-full status.

int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, gfp_t gfp_mask)¶: restart a handle.

Parameters

handle_t *handle: handle to restart
int nblocks: nr credits requested
int revoke_records: number of revoke record credits requested
gfp_t gfp_mask: memory allocation flags (for start_this_handle)

Description

Restart a handle for a multi-transaction filesystem operation.

If the jbd2_journal_extend() call above fails to grant new buffer credits to a running handle, a call to jbd2_journal_restart will commit the handle’s transaction so far and reattach the handle to a new transaction capable of guaranteeing the requested number of credits. We preserve the reserved handle if there’s any attached to the passed-in handle.
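The extend-then-restart fallback can be modelled in userspace as follows. This is a toy sketch, not jbd2 code: `struct toy_txn`, `struct toy_handle`, and the `toy_` functions are hypothetical, and the "commit" performed by the restart is reduced to moving the handle onto a fresh transaction. The return convention of `toy_extend()` mirrors the one described for jbd2_journal_extend(): 0 on success, a positive value when the transaction is full, a negative value on error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy types; the real transaction_t and handle_t differ. */
struct toy_txn { int id; int free_credits; };
struct toy_handle { struct toy_txn *txn; int credits; };

/* Best-effort extend: 0 on success, 1 if the transaction is full, < 0 on error. */
static int toy_extend(struct toy_handle *h, int nblocks)
{
    if (!h || !h->txn || nblocks < 0)
        return -EINVAL;
    if (h->txn->free_credits < nblocks)
        return 1;                   /* normal "transaction full" status */
    h->txn->free_credits -= nblocks;
    h->credits += nblocks;
    return 0;
}

/* Restart: leave the old transaction and attach to one that has room. */
static int toy_restart(struct toy_handle *h, struct toy_txn *next, int nblocks)
{
    if (!h || !next || nblocks < 0)
        return -EINVAL;
    if (next->free_credits < nblocks)
        return -ENOSPC;
    next->free_credits -= nblocks;
    h->txn = next;                  /* work done so far stays with the old txn */
    h->credits = nblocks;
    return 0;
}

/* Caller pattern: try to extend first, restart only if the txn is full. */
static int toy_ensure_credits(struct toy_handle *h, struct toy_txn *next,
                              int needed)
{
    int err;

    assert(h != NULL && h->txn != NULL);
    if (h->credits >= needed)
        return 0;
    err = toy_extend(h, needed - h->credits);
    if (err <= 0)
        return err;                 /* success (0) or a real error (< 0) */
    return toy_restart(h, next, needed);
}
```

A caller that restarts must be prepared for its earlier modifications to be committed separately from the later ones, so the operation has to be consistent at every restart point.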

void jbd2_journal_lock_updates(journal_t *journal)¶: establish a transaction barrier.

Parameters

journal_t *journal: Journal to establish a barrier on.

Description

This locks out any further updates from being started, and blocks until all existing updates have completed, returning only once the journal is in a quiescent state with no updates running.

The journal lock should not be held on entry.

void jbd2_journal_unlock_updates(journal_t *journal)¶: release barrier

Parameters

journal_t *journal: Journal to release the barrier on.

Description

Release a transaction barrier obtained with jbd2_journal_lock_updates().

Should be called without the journal lock held.

int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh)¶: notify intent to modify a buffer for metadata (not data) update.

Parameters

handle_t *handle: transaction to add buffer modifications to
struct buffer_head *bh: bh to be used for metadata writes

Return

error code or 0 on success.

Description

In full data journalling mode the buffer may be of type BJ_AsyncData, because we’re write()ing a buffer which is also part of a shared mapping.

int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)¶: notify intent to use newly created bh

Parameters

handle_t *handle: transaction to add the new buffer to
struct buffer_head *bh: new buffer.

Description

Call this if you create a new bh.

int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)¶: Notify intent to modify metadata with non-rewindable consequences

Parameters

handle_t *handle: transaction
struct buffer_head *bh: buffer to undo

Description

Sometimes there is a need to distinguish between metadata which has been committed to disk and that which has not. The ext3fs code uses this for freeing and allocating space: we have to make sure that we do not reuse freed space until the deallocation has been committed, since if we overwrote that space we would make the delete un-rewindable in case of a crash.

To deal with that, jbd2_journal_get_undo_access requests write access to a buffer for parts of non-rewindable operations such as delete operations on the bitmaps. The journaling code must keep a copy of the buffer’s contents prior to the undo_access call until such time as we know that the buffer has definitely been committed to disk.

We never need to know which transaction the committed data is part of; buffers touched here are guaranteed to be dirtied later and so will be committed to a new transaction in due course, at which point we can discard the old committed data pointer.

Returns error number or 0 on success.
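The reason for keeping the pre-modification copy can be shown with a toy block bitmap. This sketch is an assumption-laden model, not jbd2 or ext3 code: `struct toy_bitmap` and the `toy_` functions are hypothetical, a 32-bit word stands in for a bitmap buffer, and `toy_commit()` stands in for the point at which the journal knows the buffer has reached disk. The allocator treats a block as free only if it is free in both the live bitmap and the saved copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy bitmap: bit set = block in use. */
struct toy_bitmap {
    uint32_t live;          /* current in-memory state */
    uint32_t committed;     /* copy taken at undo-access time */
    bool has_committed;     /* true while the copy must be honoured */
};

/* Save the contents as they were before the first modification. */
static void toy_get_undo_access(struct toy_bitmap *bm)
{
    if (!bm->has_committed) {
        bm->committed = bm->live;
        bm->has_committed = true;
    }
}

/* Free a block in the live bitmap only; the saved copy still shows it busy. */
static void toy_free_block(struct toy_bitmap *bm, unsigned int nr)
{
    assert(nr < 32);
    toy_get_undo_access(bm);
    bm->live &= ~(UINT32_C(1) << nr);
}

/* Allocate the lowest block that is free in BOTH views; -1 if none. */
static int toy_alloc_block(struct toy_bitmap *bm)
{
    uint32_t busy;

    toy_get_undo_access(bm);
    busy = bm->live | bm->committed;
    for (unsigned int nr = 0; nr < 32; nr++) {
        if (!(busy & (UINT32_C(1) << nr))) {
            bm->live |= UINT32_C(1) << nr;
            return (int)nr;
        }
    }
    return -1;
}

/* Once the change is known to be on disk, the saved copy can be dropped. */
static void toy_commit(struct toy_bitmap *bm)
{
    bm->has_committed = false;
}
```

With blocks 0 to 2 in use, freeing block 1 and allocating straight away yields block 3 in this model; block 1 becomes allocatable only after `toy_commit()`.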

void jbd2_journal_set_triggers(struct buffer_head *bh, struct jbd2_buffer_trigger_type *type)¶: Add triggers for commit writeout

Parameters

struct buffer_head *bh: buffer to trigger on
struct jbd2_buffer_trigger_type *type: struct jbd2_buffer_trigger_type containing the trigger(s).

Description

Set any triggers on this journal_head. This is always safe, because triggers for a committing buffer will be saved off, and triggers for a running transaction will match the buffer in that transaction.

Call with NULL to clear the triggers.

int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)¶: mark a buffer as containing dirty metadata

Parameters

handle_t *handle: transaction to add buffer to.
struct buffer_head *bh: buffer to mark

Description

Mark dirty metadata which needs to be journaled as part of the current transaction.

The buffer must have previously had jbd2_journal_get_write_access() called so that it has a valid journal_head attached to the buffer head.

The buffer is placed on the transaction’s metadata list and is marked as belonging to the transaction.

Returns error number or 0 on success.

Special care needs to be taken if the buffer already belongs to the current committing transaction (in which case we should have frozen data present for that commit). In that case, we don’t relink the buffer: that only gets done when the old transaction finally completes its commit.
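The required ordering (get write access, modify the buffer, then mark it dirty) can be sketched with a toy buffer that tracks whether each step has happened. Everything here is hypothetical: `struct toy_bh`, the `toy_` functions, and the `-EINVAL` returned for an out-of-order call are illustrative choices, not the behaviour of the real buffer_head or jbd2 functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy buffer; the real struct buffer_head is far richer. */
struct toy_bh {
    bool has_journal_head;  /* set once write access has been granted */
    bool on_metadata_list;  /* set once the buffer is marked dirty */
    int data;               /* stands in for the block contents */
};

/* Step 1: declare intent to modify; attaches the toy "journal head". */
static int toy_get_write_access(struct toy_bh *bh)
{
    assert(bh != NULL);
    bh->has_journal_head = true;
    return 0;
}

/* Step 3: file the modified buffer; refuses buffers that skipped step 1. */
static int toy_dirty_metadata(struct toy_bh *bh)
{
    assert(bh != NULL);
    if (!bh->has_journal_head)
        return -EINVAL;     /* arbitrary error value chosen for this sketch */
    bh->on_metadata_list = true;
    return 0;
}

/* The full sequence a caller follows for one metadata update. */
static int toy_update_metadata(struct toy_bh *bh, int new_value)
{
    int err = toy_get_write_access(bh);

    if (err)
        return err;
    bh->data = new_value;   /* step 2: modify only after access is granted */
    return toy_dirty_metadata(bh);
}
```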

int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)¶: bforget() for potentially-journaled buffers.

Parameters

handle_t *handle: transaction handle
struct buffer_head *bh: bh to ‘forget’

Description

We can only do the bforget if there are no commits pending against the buffer. If the buffer is dirty in the current running transaction we can safely unlink it.

bh may not be a journalled buffer at all - it may be a non-JBD buffer which came off the hashtable. Check for this.

Decrements bh->b_count by one.

Allow this call even if the handle has aborted; it may be part of the caller’s cleanup after an abort.

int jbd2_journal_stop(handle_t *handle)¶: complete a transaction

Parameters

handle_t *handle: transaction to complete.

Description

All done for a particular handle.

There is not much action needed here. We just return any remaining buffer credits to the transaction and remove the handle. The only complication is that we need to start a commit operation if the filesystem is marked for synchronous update.

jbd2_journal_stop itself will not usually return an error, but it may do so in unusual circumstances. In particular, expect it to return -EIO if a jbd2_journal_abort has been executed since the transaction began.

bool jbd2_journal_try_to_free_buffers(journal_t *journal, struct folio *folio)¶: try to free page buffers.

Parameters

journal_t *journal: journal for operation
struct folio *folio: Folio to detach data from.

Description

For all the buffers on this page, if they are fully written out ordered data, move them onto BUF_CLEAN so try_to_free_buffers() can reap them.

This function returns non-zero if we wish try_to_free_buffers() to be called. We do this if the page is releasable by try_to_free_buffers(). We also do it if the page has locked or dirty buffers and the caller wants us to perform sync or async writeout.

This complicates JBD locking somewhat. We aren’t protected by the BKL here. We wish to remove the buffer from its committing or running transaction’s ->t_datalist via __jbd2_journal_unfile_buffer.

This may change the value of transaction_t->t_datalist, so anyone who looks at t_datalist needs to lock against this function.

Even worse, someone may be doing a jbd2_journal_dirty_data on this buffer. So we need to lock against that. jbd2_journal_dirty_data() will come out of the lock with the buffer dirty, which makes it ineligible for release here.

Who else is affected by this? hmm... Really the only contender is do_get_write_access() - it could be looking at the buffer while journal_try_to_free_buffer() is changing its state. But that cannot happen because we never reallocate freed data as metadata while the data is part of a transaction. Yes?

Return false on failure, true on success.

int jbd2_journal_invalidate_folio(journal_t *journal, struct folio *folio, size_t offset, size_t length)¶

Parameters

journal_t *journal: journal to use for flush...
struct folio *folio: folio to flush
size_t offset: start of the range to invalidate
size_t length: length of the range to invalidate

Description

Reap page buffers containing data in the specified range in the page. Can return -EBUSY if buffers are part of the committing transaction and the page is straddling i_size. The caller then has to wait for the current commit and try again.
