On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
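To make the failure mode concrete, here is a toy model (a sketch, not PostgreSQL's actual code) of why a retried fsync() can lie and how the data_sync_retry setting changes what a checkpoint does. `SimFile`, `sim_fsync`, `Outcome` and `checkpoint_sync` are hypothetical names invented for this illustration. The model assumes a kernel that reports a write-back error once and then clears it, and that may or may not discard the dirty pages, which is the behavior the message describes for some operating systems.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class SimFile:
    """Hypothetical model of one data file's state in a kernel page cache."""
    drops_on_error: bool         # kernel discards dirty pages after failed write-back
    dirty: bool = True           # data written but not yet on stable storage
    on_disk: bool = False        # data durably stored
    error_pending: bool = False  # write-back error not yet reported to the caller


def sim_fsync(f: SimFile, device_fails: bool) -> bool:
    """Simulated fsync(); returns True on success, False on failure."""
    if f.dirty:
        if device_fails:
            f.error_pending = True
            if f.drops_on_error:
                f.dirty = False  # dirty pages thrown away: only the WAL still has the data
        else:
            f.dirty = False
            f.on_disk = True
    if f.error_pending:
        f.error_pending = False  # the error is reported only once
        return False
    return True


class Outcome(Enum):
    OK = "ok"
    ERROR = "error"  # report the failure and let a later checkpoint retry
    PANIC = "panic"  # crash, then recover from the WAL


def checkpoint_sync(f: SimFile, device_fails: bool,
                    data_sync_retry: bool = False) -> Outcome:
    """Decide what a checkpoint does after trying to sync one data file."""
    if sim_fsync(f, device_fails):
        return Outcome.OK
    # With data_sync_retry off (the default), never trust a later fsync().
    return Outcome.ERROR if data_sync_retry else Outcome.PANIC


# Retrying on a kernel that drops dirty data: the second checkpoint "succeeds".
lossy = SimFile(drops_on_error=True)
print(checkpoint_sync(lossy, device_fails=True, data_sync_retry=True))   # Outcome.ERROR
print(checkpoint_sync(lossy, device_fails=False, data_sync_retry=True))  # Outcome.OK
print(lossy.on_disk)  # False: the data was silently lost
```

On a kernel that keeps the dirty pages (`drops_on_error=False`), the same retry sequence ends with `on_disk` set to True, which is the case data_sync_retry = on is meant for; with the default setting, the first failure yields `Outcome.PANIC` and recovery replays the WAL instead of trusting a later fsync().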