Re: [LUG] Folding@home 'errors'

 

On Wed, 2010-03-17 at 07:41 +0000, tom wrote:  
> And I'm not making myself clear.
> Its a one core steam driven machine. With F@h running the machine always 
> uses 100% cpu. So cpuburn/prime etc dont work the machine any harder.

I don't think we're in disagreement here! Anyway...

On Wed, 2010-03-17 at 08:58 +0000, tom wrote:
> Its (f@h) a big FP maths problem. It does provide some info but 'just 
> gone wrong- shouldn't be here' is about all it can realistically manage.
> I run a lot of spice sims which dont fail unexpectedly and are almost 
> pure number crunching. Its just f@h fails on this machine - and not the 
> others...
> If anyone knows of an FPU checker - ie it knows what the result should 
> be before asking?
> Tom te tom te tom

Do you *know* there aren't occasional small errors in your SPICE sims?
Do you have any way of telling that your machine never writes
corrupted or wrong data to disk? An unexpected failure is not the only
symptom of a problem, and it is by no means guaranteed to happen if your
CPU hiccups.
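
One rough way to look for silent corruption is to run the same
deterministic job twice and compare checksums of the two output files.
This is only a sketch: the function names and the file names in the usage
note are made up for illustration, and it assumes your simulator writes
byte-identical output for identical input (no timestamps in the file, for
example).

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def outputs_match(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, outputs_match("run1.out", "run2.out") on two runs of the same
netlist. A mismatch doesn't tell you which run is wrong, only that
something differed between them.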

As previously stated, Prime95's torture test does a lot of floating
point calculations on Mersenne numbers and compares each result with the
known answer, so it reports an error when it gets one wrong rather than
just halting. The longer you run tools like this without an error, the
more sure you can be of system stability.

More details on Prime95 here:
http://www.mersenne.org/
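
To illustrate the "knows what the result should be before asking" idea,
here is a toy sketch in Python. It is nowhere near as demanding as
Prime95 and is not how Prime95 works internally; all the names are made
up for the example. The first part runs the Lucas-Lehmer test on small
exponents whose answers are known. The second part does floating point
multiplies and adds whose exact result is known in advance: the sum of
squares 1..n equals n(n+1)(2n+1)/6, and for n = 100000 every intermediate
value is an integer below 2**53, so a double holds it exactly.

```python
# Known answers: is 2**p - 1 prime for these odd prime exponents p?
KNOWN = {3: True, 5: True, 7: True, 11: False, 13: True,
         17: True, 19: True, 23: False, 29: False, 31: True}


def lucas_lehmer(p):
    """Lucas-Lehmer test: True if 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def fp_known_answer_check(n=100000):
    """Sum i*i in floating point and compare with the exact closed form."""
    acc = 0.0
    for i in range(1, n + 1):
        x = float(i)
        acc += x * x  # one FP multiply and one FP add per step
    expected = n * (n + 1) * (2 * n + 1) // 6  # exact integer arithmetic
    return acc == float(expected)


def soak(rounds=100):
    """Repeat both checks and count how many rounds gave a wrong answer."""
    failures = 0
    for _ in range(rounds):
        ll_ok = all(lucas_lehmer(p) == want for p, want in KNOWN.items())
        if not (ll_ok and fp_known_answer_check()):
            failures += 1
    return failures
```

Calling soak() should return 0 on a healthy machine. A non-zero count
means some round produced a wrong answer; a zero count from something this
light proves very little, which is why the real tools run for hours.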

At the end of the day it comes down to your definition of a stable
system and what you're happy with. Some CPUs can run Prime95 for 24+
hours before getting an error; whether that system is stable or not is
a matter of opinion. By then you're probably beyond the point where the
manufacturer would bin the chip as faulty if it were new, or it may be
that with another tool the problem would show up sooner.

All I would say is that if you keep getting errors from F@H and don't
find a fix, then it wouldn't be ethical or useful to keep running it on
that machine.

Best wishes

Dan


-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html