Processes stall for extended periods; tasks were often reported as 'blocked for more than 120 seconds'. For a number of reasons I initially suspected issues with the LSI RAID driver.<br />
<br />
Other symptoms included:<br />
<br />
+ Occasional high CPU usage by kswapd, even with no swap used on the system.<br />
+ Occasional high CPU usage by khugepaged.<br />
+ Frequent high system CPU usage for no apparent reason.<br />
<br />
Several threads I found, together with my own diagnosis, pointed to issues in transparent hugepage support (aka memory defrag).<br />
<br />
Disabling this support stopped all occurrences of the problem, and the systems have been stable for a week now. It is worth noting that Red Hat 6 appears to have this functionality disabled by default, whereas CentOS has it enabled.<br />
<br />
To disable:<br />
<br />
<br />
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag<br />
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
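
The two commands above can be wrapped in a small script that also shows the value before and after the change, which makes it easier to confirm the setting took effect. This is only a sketch: the function name `disable_thp_defrag` and the `THP_DIR` override are hypothetical helpers introduced for illustration, and the default path is the one used in this report. Writes to sysfs do not survive a reboot, so the script would need to be re-run at boot (for example from `/etc/rc.local`) and must be run as root.

```shell
#!/bin/sh
# Sketch only: disable transparent hugepage defrag and report the change.
# THP_DIR is a hypothetical override; the default is the path from this report.
THP_DIR="${THP_DIR:-/sys/kernel/mm/redhat_transparent_hugepage}"

# disable_thp_defrag DIR
# Writes "no" to DIR/khugepaged/defrag and "never" to DIR/defrag,
# skipping any file that is missing or not writable.
disable_thp_defrag() {
    dir="$1"
    for pair in "khugepaged/defrag:no" "defrag:never"; do
        file="$dir/${pair%%:*}"   # part before the colon: relative file path
        value="${pair##*:}"       # part after the colon: value to write
        if [ -w "$file" ]; then
            echo "before: $file = $(cat "$file")"
            echo "$value" > "$file"
            echo "after:  $file = $(cat "$file")"
        else
            echo "skipped: $file (missing or not writable)"
        fi
    done
}

disable_thp_defrag "$THP_DIR"
```

On a system where the directory does not exist, the script only prints the "skipped" lines and changes nothing.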
0005716: Random 'stalls' occurring often up to several minutes - broken transparent hugepage support