sync_file_range - sync a file segment with disk (None of these operations writes out the file’s metadata)

1 minute read

背景

sync_file_range 可以将文件的部分范围作为目标,将对应范围内的脏页刷回磁盘,而不是整个文件的范围。

好处是,当我们对大文件进行了修改时,如果修改了大量的数据块,我们最后fsync的时候,可能会很慢。即使fdatasync,也是有问题的,例如这个大文件的长度在我们的修改过程中发生了变化,那么fdatasync将同时写metadata,而对于文件系统来说,单个文件系统的写metadata 是串行的,这势必导致影响其他用户操作metadata(如创建文件)。

sync_file_range是绝对不会写metadata的,所以用它非常合适,每次对文件做了小范围的修改时,立即调用sync_file_range,把对应的脏数据刷到磁盘,那么在结束对文件的修改后,再调用fdatasync (flush dirty data page), fsync(flush dirty data+metadata page)都是很块的。

sync_file_range的几个flag, 注意SYNC_FILE_RANGE_WRITE是异步的,所以如果你要达到以上目的话,那么最好不要使用异步模式,或者至少在调用fdatasync和fsync前,使用SYNC_FILE_RANGE_WAIT_BEFORE SYNC_FILE_RANGE_WRITE
       SYNC_FILE_RANGE_WAIT_AFTER做一次全文件范围的sync_file_range。从而保证在调用fdatasync或fsync前,该文件的dirty page已经全部刷到磁盘了。  
  
       SYNC_FILE_RANGE_WAIT_BEFORE  
              Wait upon write-out of all pages in the specified range that  
              have already been submitted to the device driver for write-out  
              before performing any write.  
  
       SYNC_FILE_RANGE_WRITE  
              Initiate write-out of all dirty pages in the specified range  
              which are not presently submitted write-out.  Note that even  
              this may block if you attempt to write more than request queue  
              size.  
  
       SYNC_FILE_RANGE_WAIT_AFTER  
              Wait upon write-out of all pages in the range after performing  
              any write.  
  
       Useful combinations of the flags bits are:  
  
       SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE  
              Ensures that all pages in the specified range which were dirty  
              when sync_file_range() was called are placed under write-out.  
              This is a start-write-for-data-integrity operation.  
  
       SYNC_FILE_RANGE_WRITE  
              Start write-out of all dirty pages in the specified range  
              which are not presently under write-out.  This is an  
              asynchronous flush-to-disk operation.  This is not suitable  
              for data integrity operations.  
  
       SYNC_FILE_RANGE_WAIT_BEFORE (or SYNC_FILE_RANGE_WAIT_AFTER)  
              Wait for completion of write-out of all pages in the specified  
              range.  This can be used after an earlier  
              SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE operation  
              to wait for completion of that operation, and obtain its  
              result.  
  
       SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |  
       SYNC_FILE_RANGE_WAIT_AFTER  
              This is a write-for-data-integrity operation that will ensure  
              that all pages in the specified range which were dirty when  
              sync_file_range() was called are committed to disk.  

参考

1. http://man7.org/linux/man-pages/man2/sync_file_range.2.html

Flag Counter

digoal’s 大量PostgreSQL文章入口