

Re: [LUG] Ensuring data is on disk


On Sat, 10 Jul 2010, Simon Waters wrote:

> Curiosity driven question.
>
> If I write data to my external hard disk, how can I ensure that a read
> to verify that data isn't using a cache?
>
> Suddenly I realise that SCSI tape devices simply read the data after
> writing it by putting the read head after the write head (if they had
> two) thus ensuring your data was on tape in one operation.
>
> The OS caches can be cleared by remounting the device, and there are
> other options in later kernels. But I can't find clear documentation for
> the behaviour of hard disk caches in such situations.

The supposedly safe way to make sure all data is physically on disk is to call fsync(2) after a write, or to pass the O_SYNC flag to open(2). These are supposed to block until the hardware reports that the data is actually on the disk. However, they do not necessarily guarantee that the directory entry for the file has reached the disk, so an open/fsync on the directory holding the file is also recommended...
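A minimal sketch of the fsync approach, written in Python only for brevity (os.open, os.write and os.fsync are thin wrappers over the same system calls). The helper name durable_write is hypothetical, and it assumes a POSIX system such as Linux, where a directory can be opened read-only and passed to fsync():

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then fsync both the file and its parent directory.

    Hypothetical helper for illustration; assumes a POSIX system (e.g. Linux)
    where a directory can be opened read-only and passed to fsync().
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file's data and inode as written.
        # (Opening with os.O_SYNC instead would make each os.write block this way.)
        os.fsync(fd)
    finally:
        os.close(fd)

    # fsync() on the file does not necessarily cover the directory entry
    # that names it, so sync the containing directory as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

All this buys you is that the hardware has *reported* completion; it says nothing about where a subsequent read will be served from.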

However, that doesn't remove the data from the filesystem/block cache (or even from the disk's own cache), so reading it back immediately, even after fsync(), is not going to guarantee that the data comes off the disk platters. The same goes for the sync command, which just calls sync(2).
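On Linux, one way to at least get the OS cache out of the way for a single file is posix_fadvise(2) with POSIX_FADV_DONTNEED, which asks the kernel to drop the file's clean cached pages so the next read goes to the device. This is a different technique from remounting, and only a sketch: the advice is a hint, it skips dirty pages (hence the fsync first), and it does nothing about the drive's own cache. The helper name read_back_uncached is hypothetical:

```python
import os


def read_back_uncached(path: str) -> bytes:
    """Read a file after asking the kernel to drop its cached pages.

    Hypothetical helper; Linux-specific behaviour. The advice is only a hint,
    and the drive's own cache may still satisfy the read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # POSIX_FADV_DONTNEED only discards clean pages, so flush dirty ones first.
        # (Linux accepts fsync() on a read-only descriptor.)
        os.fsync(fd)
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            # offset 0, len 0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```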

I suspect unmounting it, power cycling it (if possible, e.g. from a script) and re-mounting it would be the only way to force the issue...

The crux is the disk's own cache. I recall some papers written a few years ago about issues there: doing an fsync/sync, etc. and then immediately killing the power will not always guarantee that the data is on the platters, even though the disk has told the host OS that it is. This is due to manufacturers competing ever harder to eke every last ounce of perceived performance out of their disks.

What would I do if I had to be absolutely paranoid? Hm. Write the data to disk, unmount it, then loop allocating and touching memory 4K at a time to effectively flush the file/block cache, release it all, then read the data back in from the disk. That's a bit tedious though!
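The memory-pressure step might look roughly like this. The helper apply_memory_pressure and its limit_bytes parameter are hypothetical: the limit has to approach total RAM before file pages are pushed out, and overshooting invites the OOM killer. On kernels that provide /proc/sys/vm/drop_caches, writing 1 to it as root drops clean page-cache pages directly, which is much less tedious. Neither approach touches the drive's own cache.

```python
def apply_memory_pressure(limit_bytes: int, chunk: int = 4096) -> int:
    """Allocate and touch memory `chunk` bytes at a time up to limit_bytes,
    then release it all. Returns the number of bytes allocated.

    Hypothetical helper; pick limit_bytes carefully (somewhat below total RAM),
    otherwise the OOM killer may step in before MemoryError is raised.
    """
    blocks = []
    allocated = 0
    try:
        while allocated + chunk <= limit_bytes:
            block = bytearray(chunk)
            block[-1] = 1  # write to the block so it must be backed by real memory
            blocks.append(block)
            allocated += chunk
    except MemoryError:
        pass  # ran out before reaching the limit; release what we have
    finally:
        blocks.clear()  # drop all references so the memory can be reclaimed
    return allocated
```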

On most of my boxes I keep a backup on a separate partition which is normally mounted read-only. Nightly, I remount it read-write with mount -o remount,rw, rsync the data into it, then remount it read-only. I'd assume that at that point all the data has been flushed out to the partition, but it is very probably still in the buffer cache. (It's not a proper backup, but a staging post before I copy the data off the server to another one, and a handy get-out-of-jail-free card for accidental file deletions; disk space is cheap.)
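The nightly routine could be scripted along these lines. nightly_snapshot, and the src and mountpoint values you would pass it, are hypothetical placeholders; the commands need root. The try/finally makes sure the partition goes back to read-only even if rsync fails, and the injectable run parameter lets the command sequence be checked without touching real filesystems:

```python
import subprocess
from typing import Callable


def nightly_snapshot(src: str, mountpoint: str,
                     run: Callable[..., object] = subprocess.run) -> None:
    """Remount the backup partition rw, rsync into it, then remount it ro.

    Hypothetical helper; src and mountpoint are placeholders, and the mount
    and rsync commands need root.
    """
    run(["mount", "-o", "remount,rw", mountpoint], check=True)
    try:
        # Trailing slashes make rsync copy the contents of src into mountpoint.
        run(["rsync", "-a", src.rstrip("/") + "/", mountpoint.rstrip("/") + "/"],
            check=True)
    finally:
        # Always return to read-only, even if rsync failed. This is the step
        # relied on to get the data written out; it may still be in the cache.
        run(["mount", "-o", "remount,ro", mountpoint], check=True)
```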


The Mailing List for the Devon & Cornwall LUG
FAQ: http://www.dcglug.org.uk/listfaq