hung_task_timeout_secs( topic related to linux kernel hangup )

ABOUT hung_task_timeout_secs

 
if a task(process) is hung then hung_task_timeout_secs value
 decides if the hung task needs no reboot or reboot after 
n seconds

LINUX KERNEL RELATED PARAMETER
$cat /proc/sys/kernel/hung_task_timeout_secs
120
$

$echo 0 | sudo tee –append /proc/sys/kernel/ hung_task_timeout_secs
0
$sudo cat /proc/sys/kernel/hung_task_timeout_secs
0
$

When a task in D state did not get scheduled for more than
this value report a warning.This file shows up if CONFIG_
DETECT_HUNG_TASK is enabled.

0: means infinite timeout - no checking done. Possible values
to set are in range {0..LONG_MAX/HZ}.

PARAMETER RELATED

TEST-MAIL1 ~ #dmesg

“echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
rm D ffff88107f472c40 0 16705 22512 0x00000000
ffff881014693810 0000000000000086 ffff881000000000 ffff88102013b040
0000000000012c40 ffff880471855fd8 0000000000012c40 ffff880471854010
ffff880471855fd8 0000000000012c40 ffff881017ff8e40 0000000100000000
Call Trace:
[<ffffffff8148d45d>] ? schedule_timeout+0x1ed/0x2d0
[<ffffffffa0b7d1ea>] ? dlmlock+0x8a/0xda0 [ocfs2_dlm]
[<ffffffff8148ce5c>] ? wait_for_common+0x12c/0x1a0
[<ffffffff81052230>] ? try_to_wake_up+0x280/0x280
[<ffffffffa0a3b9c0>] ? __ocfs2_cluster_lock+0x1f0/0x780 [ocfs2]
[<ffffffff8148ce80>] ? wait_for_common+0x150/0x1a0
[<ffffffffa0a9c6bc>] ? ocfs2_buffer_cached+0x8c/0x180 [ocfs2]
[<ffffffffa0a40bc6>] ? ocfs2_inode_lock_full_nested+0x126/0x540 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5ba1a>] ? ocfs2_prepare_orphan_dir+0x4a/0x290 [ocfs2]
[<ffffffffa0a5e621>] ? ocfs2_unlink+0x6e1/0xbb0 [ocfs2]
[<ffffffff811bcfea>] ? may_link+0xda/0x170
[<ffffffff81141c8e>] ? vfs_unlink+0x9e/0x100
[<ffffffff81145881>] ? do_unlinkat+0x1a1/0x1d0
[<ffffffff81147b00>] ? vfs_readdir+0xa0/0xe0
[<ffffffff8116fedb>] ? fsnotify_find_inode_mark+0x2b/0x40
[<ffffffff81170c24>] ? dnotify_flush+0x54/0x110
[<ffffffff81133eec>] ? filp_close+0x5c/0x90
[<ffffffff81496912>] ? system_call_fastpath+0x16/0x1b

CLASSROOM


While  waiting for  read()  or write()  to/from  a file  
descriptor return, the process  will be put in aspecial  
kind of sleep, known as "D"  or "Disk Sleep". This  is 
special, because  the process can not  be killed  or 
interrupted  while in  such a  state.  A process waiting 
for  a return from  ioctl()
would also  be put to  sleep in this manner.

RELATED SOURCE CODE EXPOSURE

/*
* Ok, the task did not get scheduled for more than 2 minutes,
* complain:
*/
if (sysctl_hung_task_warnings) {
if (sysctl_hung_task_warnings > 0)
sysctl_hung_task_warnings–;
pr_err(“INFO: task %s:%d blocked for more than %ld seconds.\n”,
t->comm, t->pid, timeout);
pr_err(” %s %s %.*s%s\n”,
print_tainted(), init_utsname()->release,
(int)strcspn(init_utsname()->version, ” “),
init_utsname()->version,
LINUX_PACKAGE_ID);
pr_err(“\”echo 0 > /proc/sys/kernel/hung_task_timeout_secs\””
” disables this message.\n”);
sched_show_task(t);
hung_task_show_lock = true;
}
[/c]

[c light=”true”]
/*
* Process updating of timeout sysctl
*/
int proc_dohung_task_timeout_secs(struct ctl_table * styletable, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos)
{
int ret;

ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);

if (ret || !write)
goto out;

wake_up_process(watchdog_task);

out:
return ret;
}
[/c]
SOURCE CODE TAKEN FROM OFFICIAL LINUX KERNEL

RELATED FROM RESEARCH PAPER

Kernel  data collection  tools. Several  monitoring  facilities are
provided by  the Linux  kernel, which have  been exploited  in this
work. In  particular, we use  KProbes which inserts  breakpoints in
arbitrary binary code locations in charge of triggering user-defined
handler  functions. Handlers  can  be used  to collect  information
about internal kernel  variables; subsequently, kernel execution is
restored. Kdump is a tool  for failure data collection based on the
execution of  a secondary kernel,  namely capture kernel,  which is
preliminarily  loaded  into  a  reserved memory  region.  When  the
primary kernel fails, the capture  kernel is executed; then, it can
collect failure  data by reading  the main memory  state.  Built-in
hang  detection mechanisms. Several  hang detection  mechanisms are
available in the Linux OS,  which can be enabled by recompiling the
kernel.  In particular, the  following facilities  can be  used for
hang  detection: Soft  lockup detection,  i.e., the  kernel detects
whether a  "canary" task  is not scheduled  within a  timeout; Hard
lockup detection, i.e.,  if any CPU in the  system does not handles
local    timer    interrupt   for    longer    than   a    timeout;
Sleep-inside-spinlock   checking,  i.e.,  assertions   that  verify
whether there are spinlocks  that have been acquired before calling
a  sleeping function  (i.e., a  function during  which  the current
thread may block and be preempted by the scheduler); Checks on lock
API  usage, that  is: missing  lock initialization,  release  of an
already freed lock, release of a  lock by a thread or CPU different
from the lock holder, lock data structure corruption.

 

source : http://tinyurl.com/7pt5j9a

Assessment and Improvement of Hang Detection in the Linux Operating System
2009 28th IEEE International Symposium on Reliable Distributed System

LINKS
https://access.redhat.com/solutions/60572
https://www.linuxquestions.org/questions/linux-software-2/kernel-panic-echo-0-proc-sys-kernel-4175629199/
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
https://stackoverflow.com/questions/84882/sudo-echo-something-etc-privilegedfile-doesnt-work-is-there-an-alterna
https://www.tldp.org/LDP/tlk/kernel/processes.html
https://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/
http://stackoverflow.com/questions/1475683/linux-process-states

hung_task_timeout_secs( topic related to linux kernel hangup )

Leave a comment

Cancel reply