Re: [LUG] Folding@home 'errors'

 

Gordon Henderson wrote:
On Tue, 16 Mar 2010, tom wrote:

I run folding@home on 4 machines here and one quite often finishes early:
" Simulation instability has been encountered. The run has entered a
[23:12:46]   state from which no further progress can be made.
[23:12:46] This may be the correct result of the simulation, however if you
[23:12:46]   often see other project units terminating early like this
[23:12:46] too, you may wish to check the stability of your computer (issues
[23:12:46]   such as high temperature, overclocking, etc.)."
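
Since the message only matters if it shows up often, one way to put a number on "quite often" is to tally it across a client log. The following is a minimal sketch, assuming log lines look like the excerpt above (an optional "[HH:MM:SS]" stamp followed by text); the function name find_instability is made up for illustration and the log path is whatever you pass on the command line.

```python
import re
import sys

# Marker text copied from the log excerpt above.
MARKER = "Simulation instability has been encountered"
# Assumed line prefix, based on the excerpt: "[HH:MM:SS] ".
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*")


def find_instability(lines):
    """Return one entry per marker line: its timestamp, or None if unstamped."""
    hits = []
    for line in lines:
        if MARKER in line:
            match = STAMP.match(line)
            hits.append(match.group(1) if match else None)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_instability.py <path-to-log>
    with open(sys.argv[1], errors="replace") as fh:
        found = find_instability(fh)
    print(len(found), "early terminations at:", found)
```

Running this against the logs from all four machines would show whether one box really stands out, or whether the others see the same message occasionally too.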

There is no overclocking, the CPU is ~29°C and memtest can run for days without finding a problem...
Any clues/tips?
Once upon a time I worked in the R&D department of an old British 
supercomputer company... I mainly wrote test & diagnostics, and 
low-level driver code - worked with the hardware & chip designers, did 
some design & integration, system building, etc, etc...
And even then, I could get a system to run all my diagnostics for days 
on end in & out of the burn-in ovens, then it would fail miserably 
when subjected to application code )-:
And even more years ago - I looked after a PDP11 running Unix v6 - 
every quarter we'd get the DEC engineer in as part of the maintenance 
contract - he'd hoover the core memory, etc... run all his 
diagnostics, but I remember him saying that running Unix on them was a 
much better test than any of his diagnostics ever were!
So you need to think bigger than just memtest - have you tried 
cpuburn? However that's just a set of CPU tests. There is a user-land 
memory tester too - it's 'memtester' under Debian. Potentially not as 
thorough as memtest86+, but you can run it in conjunction with other 
things.
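
To illustrate the idea of exercising memory while the CPU is busy at the same time, here is a rough sketch in Python. It is not a replacement for memtester or cpuburn and will catch far less (caches and the interpreter get in the way); it only shows the shape of a combined test. The names burn, pattern_pass and soak, and the default sizes, are made up for this example.

```python
import multiprocessing as mp


def burn(stop):
    """Keep one core busy with floating-point work until told to stop."""
    x = 1.0001
    while not stop.is_set():
        x = (x * x) % 1e6 + 1.0001


def pattern_pass(buf, pattern):
    """Fill buf with one byte value, read it back, return the mismatch count."""
    n = len(buf)
    buf[:] = bytes([pattern]) * n
    return n - buf.count(bytes([pattern]))


def soak(size_mb=16, passes=4, workers=2):
    """Run pattern passes over a buffer while `workers` processes load the CPU."""
    stop = mp.Event()
    procs = [mp.Process(target=burn, args=(stop,)) for _ in range(workers)]
    for p in procs:
        p.start()
    errors = 0
    try:
        buf = bytearray(size_mb * 1024 * 1024)
        for _ in range(passes):
            # Alternating bit patterns, in the spirit of classic memory tests.
            for pattern in (0x00, 0xFF, 0x55, 0xAA):
                errors += pattern_pass(buf, pattern)
    finally:
        stop.set()
        for p in procs:
            p.join()
    return errors


if __name__ == "__main__":
    print("mismatches:", soak())
```

Any non-zero mismatch count on a machine that passes memtest on its own would point at a problem that only shows up under combined load, which is the situation described above.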
I'll have a look at cpuburn. What baffles me, though, is that I'm using that particular PC all the time and no other programs have any noticeable faults, which I would expect to see at the same time (other than Firefox/Flash issues). I have no special GPU or anything, so F@H is just hammering the CPU/RAM and the odd bit of disk, as would any other app. Nothing crops up in the logs anywhere. I'm probably just looking for an excuse to get one of those 4-core AMDs, but the only thing that fails is F@H, and until I can get something else to fall over I can't even convince myself it's a valid expense!
Tom te tom te tom

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html