
Kernel Threads – Part 1 : Kthread Introduction and its api workflow

How kernel threads are created

There are already ample resources on the web and in books explaining the apis used to create kernel threads. But just for the sake of mentioning them explicitly, the following are the apis through which kernel threads are spawned and scheduled.

kthread_create(threadfn, data, namefmt, arg...)
kthread_run(threadfn, data, namefmt, ...)

But most of us are oblivious to what happens behind the scenes when a kthread is created.

One might wonder how a kernel thread is created, how its thread function ends up being executed, who manages kernel threads, and who decides when to run them. Well, it's simple. The process management subsystem (PMS) of the OS, it is!

So here I will try to explain how kthreads are created and how the scheduler picks these threads up and executes them. The explanation will also cover some basics and help avoid some common mistakes that kernel beginners like me make while using kernel threads.

kthread_run() internally calls kthread_create() and then wakes up the newly created thread. kthread_create() is itself a macro that expands to kthread_create_on_node():

#define kthread_create(threadfn, data, namefmt, arg...) \
        kthread_create_on_node(threadfn, data, -1, namefmt, ##arg)

kthread_create_on_node() builds a request describing the thread to be created and appends it to a global linked list of pending requests (kthread_create_list):

    struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
                                               void *data, int node,
                                               const char namefmt[],
                                               ...)
    {
            /* ... */
            struct kthread_create_info *create = kmalloc(sizeof(*create),
                                                         GFP_KERNEL);

            if (!create)
                    return ERR_PTR(-ENOMEM);
            create->threadfn = threadfn;
            create->data = data;
            create->node = node;
            create->done = &done;

            spin_lock(&kthread_create_lock);
            list_add_tail(&create->list, &kthread_create_list);
            spin_unlock(&kthread_create_lock);

and then wakes up a kernel thread, kthreadd_task:

            wake_up_process(kthreadd_task);

Some background: kthreadd_task is created during system initialization, right after the init process, which is why its PID is 2.

[root@kashish ~]# ps aux | head -5
root         1  0.0  0.1   2828  1116 ?        Ss   Sep06   0:06 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Sep06   0:00 [kthreadd]    <<----
root         3  0.0  0.0      0     0 ?        S    Sep06   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Sep06   0:01 [ksoftirqd/0]

See rest_init() in init/main.c, which creates this thread with kthreadd() as its thread function.

Now kthreadd() dequeues each request from the linked list and calls create_kthread() for it:

    int kthreadd(void *unused)
    {
            struct task_struct *tsk = current;

            /* Setup a clean context for our children to inherit. */
            set_task_comm(tsk, "kthreadd");
            ignore_signals(tsk);   /* why ignore signals of current (the parent)? see next blog */
            /* ... */

            for (;;) {
                    /* ... */
                    spin_lock(&kthread_create_lock);
                    while (!list_empty(&kthread_create_list)) {
                            struct kthread_create_info *create;

                            create = list_entry(kthread_create_list.next,
                                                struct kthread_create_info, list);
                            list_del_init(&create->list);
                            spin_unlock(&kthread_create_lock);

                            create_kthread(create);

                            spin_lock(&kthread_create_lock);
                    }
                    spin_unlock(&kthread_create_lock);
            }
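The request-list handoff described above can be modelled in plain userspace C. This is only a rough sketch under simplifying assumptions: there is no locking, no sleeping or wakeup, and the "worker" calls the thread function directly instead of forking a new task. All names in it (create_request, request_queue, queue_request, drain_requests, record_id) are hypothetical and are not kernel apis.

```c
#include <stddef.h>

/* Userspace model of the kthread_create_list idea; hypothetical names,
 * no locking and no sleeping. */
struct create_request {
        int (*threadfn)(void *data);    /* function the new thread should run */
        void *data;                     /* argument passed to threadfn */
        struct create_request *next;
};

struct request_queue {
        struct create_request *head;
        struct create_request *tail;
};

/* Producer side: append a request at the tail, like list_add_tail(). */
void queue_request(struct request_queue *q, struct create_request *req)
{
        req->next = NULL;
        if (q->tail)
                q->tail->next = req;
        else
                q->head = req;
        q->tail = req;
}

/* Consumer side: pop requests in FIFO order and service each one.
 * The real kthreadd creates a new task per request; this model simply
 * calls threadfn. Returns the number of requests serviced. */
int drain_requests(struct request_queue *q)
{
        int serviced = 0;

        while (q->head) {
                struct create_request *req = q->head;

                q->head = req->next;
                if (!q->head)
                        q->tail = NULL;
                req->threadfn(req->data);
                serviced++;
        }
        return serviced;
}

/* Example thread function for the model: records the order in which
 * requests were serviced. */
int serviced_ids[8];
int serviced_count;

int record_id(void *data)
{
        if (serviced_count < 8)
                serviced_ids[serviced_count++] = *(int *)data;
        return 0;
}
```

Queueing two requests and then draining the queue services them in the order they were queued, which is the same first-in, first-out behaviour the kernel gets from list_add_tail() on one side and taking kthread_create_list.next on the other.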

create_kthread() calls kernel_thread() to create the new kthread, similar to how kthreadd_task itself was created.

    static void create_kthread(struct kthread_create_info *create)
    {
            int pid;

            /* ... */
            /* We want our own signal handler (we take no signals by default). */
            pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);

kthread() is another internal function which adds some more magic; we will get to it in a moment. Most of the work of actually creating the new kernel task is done by kernel_thread(), which is a thin wrapper around do_fork():

    pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
    {
            return do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
                    (unsigned long)arg, NULL, NULL);
    }

Once do_fork() has created the new task, that task starts executing kthread(). kthread() in turn calls the thread function of the newly created kthread.

    175 static int kthread(void *_create)
    176 {
    177         /* Copy data: it's on kthread's stack */
    178         struct kthread_create_info *create = _create;
    179         int (*threadfn)(void *data) = create->threadfn;
    180         void *data = create->data;
    205         if (!test_bit(KTHREAD_SHOULD_STOP, &self.flags)) {
    206                 __kthread_parkme(&self);
    207                 ret = threadfn(data);
    208         }
    209         /* we can't just return, we must preserve "self" on stack */
    210         do_exit(ret);
    211 }

Note: at line 210, do_exit() is called once the thread function returns. I will highlight its significance in the next section.

Significance of kthread_stop()

I decided to keep a separate section for this api because most of the time we miss what it actually does and how it is linked to the execution of the thread function.

In the previous section, I explained what happens when the thread function returns: kthread() calls do_exit().

Quoting from the kernel source:

threadfn() can either call do_exit() directly if it is a
standalone thread for which no one will call kthread_stop(), or
return when 'kthread_should_stop()' is true (which means
kthread_stop() has been called)
This should be, in general, kept in mind before using kthread_stop().

kthread_stop() can only be used on a kthread that still exists, that is, one whose thread function has not already returned or called do_exit() on its own. Otherwise, the system may crash.

The above statement will make sense after understanding what kthread_stop() does:

    int kthread_stop(struct task_struct *k)
    {
            struct kthread *kthread;
            int ret;

            trace_sched_kthread_stop(k);

            get_task_struct(k);
            kthread = to_live_kthread(k);
            if (kthread) {
                    set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
                    __kthread_unpark(k, kthread);
                    wake_up_process(k);
                    wait_for_completion(&kthread->exited);
            }
            ret = k->exit_code;
            put_task_struct(k);

            trace_sched_kthread_stop_ret(ret);
            return ret;
    }

The function sets KTHREAD_SHOULD_STOP, wakes up the thread (in case it is sleeping), waits for it to exit,
and returns its exit code, i.e. the value returned by the thread function.

The KTHREAD_SHOULD_STOP flag is checked by kthread_should_stop(). The thread function can loop on it, checking the flag at every iteration.
When kthread_stop() is called, the thread function sees that a request to stop has arrived and can return cleanly.

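static struct task_struct *k1;  /* set by kthread_run() in module init */

//thread func of k1
int k1_func(void *arg)
{
        while (!kthread_should_stop()) {
                /* business logic goes here */

                /* assumed: sleep ~1s per iteration so the loop doesn't spin */
                schedule_timeout_interruptible(HZ);
        }
        return 0;       /* becomes the return value of kthread_stop() */
}

void __exit cleanup(void)
{
        printk("cleaning module\n");
        if (k1)
                kthread_stop(k1);
}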
If the thread function has already returned and kthread() has called do_exit() on it, then a call to kthread_stop() might crash the system, because the task_struct may already have been freed.
Therefore the caller must ensure, before calling kthread_stop(), that the task_struct can’t go away.
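The stop protocol itself (set a flag, let the thread notice it, wait for the thread to exit, collect its return value) can be tried out in userspace. The following is only an analogy built on pthreads and C11 atomics, not kernel code: struct worker, worker_start(), worker_stop() and worker_should_stop() are hypothetical names standing in for the kthread, kthread_run(), kthread_stop() and kthread_should_stop(), and the exit code 42 is arbitrary.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace analogy with hypothetical names; not kernel code. */
struct worker {
        pthread_t thread;
        atomic_int should_stop;         /* plays the role of KTHREAD_SHOULD_STOP */
        atomic_long iterations;         /* work counter, for illustration only */
};

/* Analogue of kthread_should_stop(). */
int worker_should_stop(struct worker *w)
{
        return atomic_load(&w->should_stop);
}

/* Thread function: loop until asked to stop, then return an exit code. */
void *worker_fn(void *arg)
{
        struct worker *w = arg;

        while (!worker_should_stop(w)) {
                atomic_fetch_add(&w->iterations, 1);    /* "business logic" */
                sched_yield();
        }
        return (void *)(intptr_t)42;    /* arbitrary exit code */
}

/* Analogue of kthread_run(): returns 0 on success. */
int worker_start(struct worker *w)
{
        atomic_init(&w->should_stop, 0);
        atomic_init(&w->iterations, 0);
        return pthread_create(&w->thread, NULL, worker_fn, w);
}

/* Analogue of kthread_stop(): set the flag, wait for the thread to
 * exit, and hand back the value its thread function returned. */
int worker_stop(struct worker *w)
{
        void *ret = NULL;

        atomic_store(&w->should_stop, 1);
        pthread_join(w->thread, &ret);
        return (int)(intptr_t)ret;
}
```

Calling worker_start() and then worker_stop() on the same struct worker returns 42 here. The same hazard carries over too: joining a pthread that has already been joined or detached is undefined behaviour, much like calling kthread_stop() on a kthread whose task_struct is gone.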

Uff… Too much information. Have to take rest now. Will try to explain signal handling for kthreads in next section.


Demystifying Linux Kernel Timers

Some basics

jiffies – counts the timer interrupts (ticks) since boot; one jiffy is the time between two successive ticks.
Incremented at every timer interrupt.
It is a global variable in the kernel, declared volatile so as to avoid reading a stale
value from memory.
Defined in linux/jiffies.h

HZ – represents the clock interrupt frequency.
It's a macro set at kernel compile time, and its value depends on the architecture.
Defined in linux/param.h

 Interval after which jiffies increments = 1000/HZ milliseconds

If HZ=100, the frequency of the timer interrupt is 100, i.e.
there are 100 interrupts in 1 sec. That means “jiffies” is incremented
every 10ms, so its value grows by 100 every second.

System uptime = jiffies/HZ seconds (approximately, since the kernel does not start jiffies at 0)

To convert seconds to jiffies, jiffies = seconds * HZ.
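The arithmetic above can be checked with a few lines of userspace C. This is just a sketch of the formulas: in the kernel HZ is a compile-time macro and jiffies a global counter, whereas here both are ordinary parameters, and the function names (tick_interval_ms, seconds_to_ticks, ticks_to_seconds) are made up for the example.

```c
#include <assert.h>

/* Userspace model only: HZ and the tick count are plain parameters here. */

/* Milliseconds between two timer ticks for a given tick rate. */
unsigned long tick_interval_ms(unsigned long hz)
{
        assert(hz > 0 && hz <= 1000);
        return 1000 / hz;
}

/* Number of ticks that elapse in the given number of seconds. */
unsigned long seconds_to_ticks(unsigned long seconds, unsigned long hz)
{
        return seconds * hz;
}

/* Whole seconds covered by the given number of ticks. */
unsigned long ticks_to_seconds(unsigned long ticks, unsigned long hz)
{
        assert(hz > 0);
        return ticks / hz;
}
```

With HZ=100, tick_interval_ms(100) gives 10 and seconds_to_ticks(1, 100) gives 100, matching the numbers worked out above. In real kernel code, prefer the existing helpers such as msecs_to_jiffies() over hand-rolled arithmetic.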

Using timers

Linux provides the timer subsystem to delay the execution of work and
run it asynchronously at a later time. A timer may well fire after
the process which scheduled it has already exited.
Timers are a low-level kernel facility on which apis like schedule_timeout() are built.

Timer data structure

struct timer_list {
        struct list_head entry;           /* entry in linked list of timers */
        unsigned long expires;            /* expiration value, in jiffies */
        void (*function)(unsigned long);  /* the timer handler function */
        unsigned long data;               /* lone argument to the handler */
        struct tvec_t_base_s *base;       /* internal timer field, do not touch */
};

What not-to-do while using timers and some of its weird properties

1. Don’t pass auto (stack) variables as timer arguments; the timer may fire after the function that armed it has returned.
2. The timer function is called in interrupt context. Hence we cannot use any api which sleeps,
like kmalloc() with GFP_KERNEL, wait_event_*, semaphores, etc.
3. Timers have the nasty property of binding themselves to the CPU on which they are created.
4. There is no direct api to make periodic timers. mod_timer() has to be called from the timer function each time
to re-arm the timer and make it periodic.
5. For the same reason as point 2, schedule_timeout() and msleep() can’t be used in the timer function.
6. The “current” macro has no relevance in the timer function, as the timer function
executes in timer (interrupt) context, not on behalf of any process.

Useful Timer APIs

1. init_timer(timer)
add_timer(timer) / add_timer_on(timer, CPU)
They are used together generally.

2. setup_timer(timer, callback_fn, cb_data)
It internally calls init_timer()

3. del_timer(timer)
Deactivates the timer without waiting for the timer callback to complete if it is already running.
It is safe to call del_timer() inside the timer callback function.
This api works for both active and inactive (initialized but not pending) timers.
If the timer was active, it returns 1, else it returns 0.

4. del_timer_sync(timer)
Deactivates the timer and waits (spins) until the timer callback completes, if it is already running.
This api cannot be called from within the timer callback function, as that would lead to a deadlock.

del_singleshot_timer_sync() is the same as del_timer_sync().

When del_timer() returns, it guarantees only that the timer is no longer active (that is,
that it will not be executed in the future). On a multiprocessing machine, however, the
timer handler might already be executing on another processor. To deactivate the timer
and wait until a potentially executing handler for the timer exits, use del_timer_sync().
Unlike del_timer(), del_timer_sync() cannot be used from interrupt context.

5. try_to_del_timer_sync(timer)
Tries to deactivate the timer without spinning.
It returns -1 and leaves the timer alone if the timer callback is currently running on a CPU; otherwise it behaves like del_timer().

6. mod_timer(timer, expires)
It updates the “expires” field of the timer and (re)schedules it to fire at the new time.
It is safe to call mod_timer() on a deleted (inactive) timer. In that
case, it will reactivate the timer.

7. mod_timer_pending(timer, expires)
It modifies the “expires” field only if the timer is already active (pending); it never reactivates an inactive timer. See the pending_only argument of the underlying __mod_timer().

8. mod_timer_pinned(timer, expires)
It should be used along with add_timer_on(), to keep the timer pinned to a particular CPU.

9. set_timer_slack(timer, slack_hz)
It loosens the “expires” field of the timer. By using this API, the timer subsystem will
schedule the actual timer somewhere between the time mod_timer asks for, and that time plus the slack.

Other Useful links

1. Do not use in_atomic()

2. Something on in_interrupt()

3. About try_to_del_timer_sync()

4. About mod_timer_pinned()


/*
 * Key Takeaways:
 * 1. How to use init_timer() and add_timer()
 * 2. How/where to use add_timer_on() and mod_timer_pinned()
 * 3. How to get CPU id (smp_processor_id())
 */

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>         /* for kmalloc(), kfree() */
#include <linux/spinlock.h>     /* for spinlock apis */
#include <linux/timer.h>        /* for timers apis */
#include <linux/delay.h>        /* for msleep() */
#include <linux/hardirq.h>      /* for in_interrupt(), in_atomic() , moved to preempt_mask.h, 3.0 onwards */

MODULE_DESCRIPTION("Usage of init_timer(), add_timer(), add_timer_on() and mod_timer_pinned()");

struct timer_list timer;
struct timer_list timer2;
unsigned long *data;
DEFINE_SPINLOCK(data_lock);     /* protects *data, which both timer callbacks touch */

void timer_func(unsigned long data)
{
        unsigned long *d = (unsigned long *)data;

        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        printk("%s : data = %lu...\n",__FUNCTION__, *d);
        if (in_interrupt())
                printk("%s : In interrupt context...\n", __FUNCTION__);

        if (in_atomic())
                printk("%s : In atomic context...\n", __FUNCTION__);

        if ((*d) != 0) {
                //decrementing "data" faster
                spin_lock(&data_lock);
                if (*d)
                        (*d)--;
                spin_unlock(&data_lock);
                //re-arm: this is what makes the timer periodic
                mod_timer(&timer, jiffies + msecs_to_jiffies(500));
        }
}

void timer2_func(unsigned long data)
{
        unsigned long *d = (unsigned long *)data;

        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        printk("%s : data = %lu***\n",__FUNCTION__, *d);

        if (in_interrupt())
                printk("%s : In interrupt context***\n", __FUNCTION__);

        if (in_atomic())
                printk("%s : In atomic context***\n", __FUNCTION__);

        if ((*d) != 0) {
                spin_lock(&data_lock);
                if (*d)
                        (*d)--;
                spin_unlock(&data_lock);
                //scheduling it again and pinning to same CPU
                mod_timer_pinned(&timer2, jiffies + msecs_to_jiffies(1000));
        }
}

int __init start_module(void)
{
        data = kmalloc(sizeof(unsigned long), GFP_KERNEL);
        if (!data)
                return -ENOMEM;
        *data = 100;

        //create timer 1
        setup_timer(&timer, timer_func, (unsigned long)data);
        //set expiry time
        mod_timer(&timer, jiffies + msecs_to_jiffies(1000));

        //create timer 2
        init_timer(&timer2);
        timer2.function = timer2_func;
        timer2.expires = jiffies + msecs_to_jiffies(1000);
        timer2.data = (unsigned long)data;              //type-casting pointer to unsigned long
        //now schedule timer on specified CPU
        add_timer_on(&timer2, 1); //on CPU 1, assumes CPU 1 exists and is online

        return 0;
}

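void __exit stop_module(void)
{
        int i = 0;

        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        //give timer 1 up to 5 seconds to run down by itself
        while (timer_pending(&timer)) {
                printk("timer 1 is pending\n");
                if (i == 5)
                        break;
                i++;
                msleep(1000);
        }
        printk("deleting timer 1\n");
        del_timer_sync(&timer);         //also waits for a running callback

        i = 0;
        while (timer_pending(&timer2)) {
                printk("timer 2 is pending\n");
                if (i == 5)
                        break;
                i++;
                msleep(1000);
        }
        printk("deleting timer 2\n");
        del_timer_sync(&timer2);

        kfree(data);                    //safe now: no callback can touch it
        printk("exiting from module\n");
}

module_init(start_module);
module_exit(stop_module);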



Fixing yum install issues

Sometimes, on a freshly installed linux machine, we come across failures like the following while trying to install packages.

The yum utility doesn’t work due to repository issues.

[root@kashish ~]# yum install lsscsi
Loaded plugins: presto, refresh-packagekit
Error: Cannot retrieve repository metadata (repomd.xml) for repository: fedora.


To resolve the above error cleanly, do as follows:

  • Go to /etc/yum.repos.d

[root@kashish ~]# cd /etc/yum.repos.d

  • In all the *.repo files, change https to http in the mirrorlist URLs

Now run the following commands to clean the yum cache and rebuild the rpm database

[root@kashish ~]# yum clean all
[root@kashish ~]# rm -f /var/lib/rpm/__db.*
[root@kashish ~]# rpm --rebuilddb

There is a very useful YUM cheat sheet compiled by Red Hat. To learn more about the yum
command, check it out.