D&C GLug - Home Page

Re: [LUG] Ensuring data is on disk - 2

 

On Wed, 14 Jul 2010, Simon Waters wrote:

The key tools are "smartctl" and "hdparm", but neither of which are
sufficiently fresh on Debian Lenny to display the relevant attributes to
me (if my drive has them).

Hm. I've been using smartctl for ... well, as long as I've known about it. Lenny's version of smartctl is 5.38 and the one on the project's website is 5.39, so I don't think it's that old... The one thing I wish it did was keep the drive database in a file rather than compiled into the program - that might make updating it a little easier.

And I have to say, I've never found hdparm useful for anything other than the raw benchmark facility (-tT) in recent years - although it was handy in the days when drive DMA was off by default and you wanted to tune multi-sector reads. (However, it might still be handy for that for all I know - especially on very old IDE drives...) Maybe I'll re-read the man page to see what it can do now ;-)

Although both report their own inadequacies
to describe the features of my disk drives - smartctl says one drive has
SMART but can't talk to it, and the other drive has a good SMART health
(despite me having another window open which has counted 1948 bad
blocks and is only 77% finished scanning the disk), and hdparm displays
some unknown attributes for the first mentioned disk (including vendor
extensions), and the other disk isn't recognised at all (apparently it
is a SAMSUNG  HD642JI, 640GB).

At that size, I guess it's relatively modern - seems odd that it's not recognised. However, I don't think I've ever bought or used a Samsung drive!

I'm also not convinced the 'health' status tells you much - I have a pair of drives that show good 'health':

  # smartctl -d ata -H /dev/sda
  === START OF READ SMART DATA SECTION ===
  SMART overall-health self-assessment test result: PASSED

Yet this drive shows bad sectors during tests:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      9816         7855446

And another 20 lines of similar )-:

However, there are no reallocated blocks:

  # smartctl -a -d ata /dev/sda | fgrep Reallocated_Event_Count
  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

So what does that tell me )-:
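
One possible reading, as far as I understand the usual attribute semantics: a sector that fails to read is normally only counted as 'pending' and is not reallocated until something writes to it, so the reallocation counters can stay at zero while the tests fail. A minimal sketch for pulling individual raw values out of the attribute table follows. The helper name smart_raw is made up for illustration, and attribute names such as Current_Pending_Sector and Offline_Uncorrectable vary between vendors, so treat them as assumptions.

```shell
# smart_raw NAME: read a `smartctl -A` style attribute table on stdin and
# print the RAW_VALUE column (10th field) for the attribute called NAME.
# Hypothetical helper, not part of smartmontools.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example use (needs root and a real drive; attribute names are assumptions):
#   smartctl -A -d ata /dev/sda | smart_raw Current_Pending_Sector
#   smartctl -A -d ata /dev/sda | smart_raw Offline_Uncorrectable
```

A non-zero pending or uncorrectable count alongside a zero reallocation count would fit the failing self-tests shown above.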

I sometimes wonder if the drive manufacturers know what they're doing.... I have a set of drives with firmware bugs that cause some of the SMART data to be read incorrectly. At least the makers (WDC in this case) acknowledged it. These drives report their temperatures as about 15C hotter than they really are, so if they can get the temperature readings wrong, what else can they get wrong?

  # hddtemp /dev/sd[a-e]
  /dev/sda: WDC WD2500KS-00MJB0: 52 C
  /dev/sdb: WDC WD2500KS-00MJB0: 52 C
  /dev/sdc: WDC WD2500KS-00MJB0: 52 C
  /dev/sdd: WDC WD2500KS-00MJB0: 52 C
  /dev/sde: WDC WD2500KS-00MJB0: 46 C


In the mean time, as suggested there are several ways to clear the OS
cache (unmount or drop_caches), and one is left relying on luck (or
power cycling) to miss the cache in the disk (mine are 8MB and 16MB so
when writing big backups luck is likely on your side....), at which
point just reading the data should be sufficient to ensure it is safely
on disk, since only a bad block should stop the process but verifying
the data is probably wise.

There's only so much you can cache though, so at some point you must start to read back off the drive - benchmarks rely on that - bonnie will insist on using a file size of 2x memory size for example....
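
To size a test file the same way, the 2x-memory figure can be worked out from /proc/meminfo. This is only a rough sketch: twice_ram_mb is a made-up helper name, and I'm assuming MemTotal is reported in kB, as it is on the Linux kernels I've seen.

```shell
# twice_ram_mb [MEMINFO]: print twice the MemTotal figure, in MB, read from
# a meminfo-style file (defaults to /proc/meminfo). Hypothetical helper.
twice_ram_mb() {
    awk '$1 == "MemTotal:" { print int(2 * $2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example use (writes a file of that size, so check free space first):
#   dd if=/dev/zero of=/mnt/dummy.file bs=1M count="$(twice_ram_mb)"
```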

So if we write to disk, flush memory cache:

  sync
  echo 3 > /proc/sys/vm/drop_caches

I'd probably unmount and re-mount it too, if practical.

Then tell the disk to flush its cache.... If it's a whole drive (and not just a partition), then maybe we can power it down and up again - either with hdparm or by writing the right runes to /proc/scsi.. but even powered down, I bet the controller is still active...

Hmmmm....

hdparm -f and -F both claim to flush caches (-F the drive's write cache), but neither makes any mention of a read cache.
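
Putting the flush steps discussed so far in one place, as a sketch only. The function names and the DRY_RUN switch are made up; `sysctl -w vm.drop_caches=3` is used here in place of the echo into /proc, which should be equivalent. By default the commands are only printed, since running them for real needs root and an actual device.

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default),
# otherwise execute it. Hypothetical helper.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# flush_caches DEVICE: flush dirty pages, drop the OS caches, then ask
# the drive to flush its own write cache. Nothing here touches any
# read cache on the drive.
flush_caches() {
    dev=$1
    run sync
    run sysctl -w vm.drop_caches=3
    run hdparm -F "$dev"
}

# Example use:
#   flush_caches /dev/sda              # prints the three commands
#   DRY_RUN=0 flush_caches /dev/sda    # runs them (as root)
```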

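I wonder whether, as part of the backup, you could write a file which is 2x (or more) the size of the disk buffer - so 64MB or 128MB, or whatever it is... Then read this back first before reading the 'real' data, so that whatever the drive still holds in its cache is the dummy file rather than the backup.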

So..

  # Do the backup.
  dd if=/dev/zero of=/mnt/dummy.file bs=1M count=256
  umount /mnt   # if practical
  sync
  echo 3 > /proc/sys/vm/drop_caches
  mount /mnt    # if practical
  dd if=/mnt/dummy.file of=/dev/null bs=1M

  # Then compare backup with original...

Seems a bit crude though...
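
For the 'compare backup with original' step, one way to do it is to checksum both trees and compare the results. A minimal sketch, assuming GNU find and sha256sum are available; make_manifest and verify_backup are made-up names and the paths in the example are placeholders. It only proves anything if it runs after the caches have been dropped, otherwise the reads may never reach the disk.

```shell
# make_manifest DIR: print a sorted list of "checksum  ./relative/path"
# lines for every regular file under DIR. Hypothetical helper.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# verify_backup ORIGINAL BACKUP: succeed only if both directories exist
# and contain the same files with the same checksums.
verify_backup() {
    [ -d "$1" ] && [ -d "$2" ] || return 2
    [ "$(make_manifest "$1")" = "$(make_manifest "$2")" ]
}

# Example use (placeholder paths):
#   verify_backup /home /mnt/backup/home && echo "backup matches"
```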

I do a weekly 'resync' on all my RAID arrays - it doesn't verify that the data is what was written, but it will verify that every sector can be read and that the RAID parity (R5/R6) is correct, or that the mirrors (R1) agree with each other. mdadm can do it automagically, or you can do it manually:

  echo 'check' > /sys/block/md1/md/sync_action

etc.

The Lenny package only does this once a month though, so I change the crontab.
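
Once a check has finished, the result can be read back from sysfs. A sketch, assuming the md driver's usual sync_action and mismatch_cnt files; md_check_ok is a made-up name, and md1 is just the array from the example above.

```shell
# md_check_ok [MD_DIR]: report the state of an md array's last check and
# succeed only if the array is idle and no mismatches were counted.
# MD_DIR defaults to /sys/block/md1/md. Hypothetical helper.
md_check_ok() {
    md_dir=${1:-/sys/block/md1/md}
    action=$(cat "$md_dir/sync_action")
    mismatches=$(cat "$md_dir/mismatch_cnt")
    echo "sync_action=$action mismatch_cnt=$mismatches"
    [ "$action" = "idle" ] && [ "$mismatches" -eq 0 ]
}

# Example use:
#   md_check_ok /sys/block/md1/md || echo "md1 needs a look"
```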

All this leads to the answer to a question Ben (I think) muttered
recently about who uses tape these days.... Well suddenly it looks a lot
simpler than disks.

Harder to manage though - especially in remote data centres - although "remote hands" or a big "jukebox" changer can help (but potentially expensive). Tape is also a PITA to get backups from in cases of accidental deletion. That's when I started to do a 'backup' to the same server before dumping to other storage, once disks got cheaper than tapes. (DLT tape at the time, then remote servers to disk - the local server would have a few days of data held on it via rsync on a separate partition.)

Also one area where I think Microsoft Windows may be doing a better job
at exposing the relevant interface.

More probably it's the drive manufacturers writing their diagnostics and drivers for MS rather than for anything else.

Anyone with Mac experience know how they do it?

Gordon

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq