Re: [LUG] Ensuring data is on disk

 

On Sat, 10 Jul 2010, Simon Waters wrote:

> Curiosity driven question.
>
> If I write data to my external hard disk, how can I ensure that a read
> to verify that data isn't using a cache?
>
> Suddenly I realise that SCSI tape devices simply read the data after
> writing it by putting the read head after the write head (if they had
> two) thus ensuring your data was on tape in one operation.
>
> The OS caches can be cleared by remounting the device, and there are
> other options in later kernels. But I can't find clear documentation for
> the behaviour of hard disk caches in such situations.

The supposedly safe way to make sure all data is physically on disk is to
use the fsync(2) system call after a write, or to pass the O_SYNC flag to
open(2). These are supposed to block until the hardware reports that the
data is actually on the disk. However, they do not guarantee that the
directory entry for a newly created file has reached the disk, so an
open/fsync on the directory holding the file is also suggested.

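A minimal Python sketch of the write, fsync, then fsync-the-directory pattern, assuming a POSIX system. os.open, os.write and os.fsync are thin wrappers over open(2), write(2) and fsync(2); the helper name durable_write and its use_o_sync switch are hypothetical. At best this makes the kernel hand the data to the drive and wait for the drive to say it has it.

```python
import os

def durable_write(path: str, data: bytes, use_o_sync: bool = False) -> None:
    """Hypothetical helper: write data to path and ask for it to reach the disk.

    With use_o_sync=True the file is opened with O_SYNC, so each write()
    should block until the data has been handed to the device; otherwise
    an explicit fsync() on the file is used after writing.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_o_sync:
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        if not use_o_sync:
            os.fsync(fd)  # flush the file's data and its inode metadata
    finally:
        os.close(fd)

    # fsync on the file does not cover the directory entry that names it,
    # so also fsync the directory holding the file.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

For example, durable_write("/mnt/external/backup.tar", payload) returns only after both fsync calls have completed. Whether the drive has really committed the bytes to the platters is still up to the drive.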
However, that probably doesn't evict the data from the filesystem/block
cache (or even from the disk's own cache). So immediately reading it back,
even after fsync(), is not going to guarantee that the data comes off the
disk platters. The same goes for the sync command, which just calls sync(2).

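One of the "other options in later kernels" is to drop a single file's pages from the page cache before re-reading it. The sketch below swaps in posix_fadvise(2) with POSIX_FADV_DONTNEED for that; it is not something proposed in the thread. The helper name read_after_evicting is hypothetical. On Linux the call discards only clean cached pages, hence the fsync first, and it is merely advice, so the kernel may keep some pages. It does nothing about the drive's own cache. On systems without posix_fadvise (e.g. macOS) the eviction step is skipped.

```python
import os

def read_after_evicting(path: str) -> bytes:
    """Hypothetical helper: fsync path, ask the kernel to drop its cached
    pages, then read the file back from the block device."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # make dirty pages clean so the kernel is able to drop them
        if hasattr(os, "posix_fadvise"):
            # offset 0, length 0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

Comparing the returned bytes with what was written gives a read that most likely came from the disk rather than the page cache, though possibly still from the disk's own cache.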
I suspect unmounting it, power cycling it (if that's possible, e.g. from a
script) and re-mounting it would be the only way to force the issue.

The crux is the disk's own cache. I recall some papers written a few years
ago about issues there: doing an fsync/sync etc. and then immediately
killing the power will not always guarantee that the data is on the
platters, even though the disk has told the host OS that it is. This is
due to manufacturers competing ever harder to eke out every last ounce of
perceived performance from their disks.

What would I do if I had to be absolutely paranoid? Hmm. Write the data to
disk, unmount it, loop allocating and touching memory 4K at a time to
effectively flush the file/block cache, release it all, then read the data
back in from the disk. That's a bit tedious, though!

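The paranoid procedure can be sketched roughly as follows. This is a minimal Python sketch that assumes a 4096-byte page size; the names apply_memory_pressure and file_matches are hypothetical. The first function allocates and touches memory 4K at a time and then releases it all, in the hope that the kernel evicts cached file pages to satisfy the allocations. The second re-reads the file and compares a SHA-256 digest with the data that was written.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, matching "4K at a time"

def apply_memory_pressure(total_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Allocate and touch total_bytes of memory in page_size chunks, then free it.

    Returns the number of chunks allocated.
    """
    chunks = []
    for _ in range(total_bytes // page_size):
        chunk = bytearray(page_size)  # zero-filled, so the memory is written to
        chunk[0] = 1                  # explicit store, to make the "touch" obvious
        chunks.append(chunk)
    count = len(chunks)
    chunks.clear()  # release it all
    return count

def file_matches(path: str, expected: bytes, block_size: int = 1 << 20) -> bool:
    """Re-read path in blocks and compare its SHA-256 digest with expected's."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.digest() == hashlib.sha256(expected).digest()
```

For the pressure to evict much, total_bytes has to approach the machine's physical RAM, which can push other processes into swap or wake the OOM killer. The re-read with file_matches belongs after the filesystem has been mounted again.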
On most of my boxes I keep a backup on a separate partition which I
normally keep read-only. Each night I remount it read-write (mount -o
remount,rw), rsync the data into it, then remount it read-only. I'd assume
that at that point all data has been flushed out to the partition, but
it's still very probably in the buffer cache. (It's not a proper backup,
but a staging post before I copy the data off the server to another one,
and a handy get-out-of-jail-free card for accidental file deletions. Disk
space is cheap.)

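The nightly remount/rsync/remount cycle might be scripted along these lines. This is a hypothetical Python helper, nightly_backup, with made-up paths in the usage note. The rsync option -a (archive mode) is an assumption, since the exact options used aren't stated. Actually running it needs root. With dry_run=True it only returns the commands it would run. The try/finally ensures the partition is put back to read-only even if rsync fails.

```python
import subprocess
from typing import List

def nightly_backup(src: str, mountpoint: str, dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: remount rw, rsync src into mountpoint, remount ro.

    Returns the list of commands that were (or, with dry_run, would be) run.
    """
    remount_rw = ["mount", "-o", "remount,rw", mountpoint]
    copy = ["rsync", "-a", src, mountpoint]  # assumed flags: archive mode only
    remount_ro = ["mount", "-o", "remount,ro", mountpoint]
    plan = [remount_rw, copy, remount_ro]
    if dry_run:
        return plan
    subprocess.run(remount_rw, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        # Always put the partition back to read-only, even if rsync failed.
        subprocess.run(remount_ro, check=True)
    return plan
```

For example, nightly_backup("/srv/data/", "/backup", dry_run=False) could be run from root's crontab, assuming /backup is the separate backup partition.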
Gordon

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq