Month: September 2015

Kernel Threads – Part 1 : Kthread Introduction and its api workflow

How kernel threads are created

There are already ample resources on the web and in books explaining the APIs used to create kernel threads. But just for the sake of mentioning them explicitly, these are the APIs through which kernel threads are spawned and scheduled:

kthread_create(threadfn, data, namefmt, arg...)
kthread_run(threadfn, data, namefmt, ...)

But most of us are oblivious to what happens behind the scenes when a kthread is created.

One might wonder how a kernel thread is created and how the kernel ends up executing the thread function. Who manages the kernel threads, and who decides when to execute them? Well, it's simple. Process management subsystem (PMS) of the OS, it is!

So here I will try to explain how kthreads are created and how the scheduler picks these threads and executes them. The explanation will also cover some basics and help avoid some common mistakes that kernel beginners like me make while using kernel threads.

kthread_run() internally calls kthread_create() and wakes the newly created thread. kthread_create() is itself a macro that expands to kthread_create_on_node():

#define kthread_create(threadfn, data, namefmt, arg...) \
        kthread_create_on_node(threadfn, data, -1, namefmt, ##arg)

The function creates a new thread-creation request and inserts it into a linked list of requests.


struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
                                           void *data, int node,
                                           const char namefmt[],
                                           ...)
{
..
        struct kthread_create_info *create = kmalloc(sizeof(*create),
                                                     GFP_KERNEL);

        if (!create)
                return ERR_PTR(-ENOMEM);
        create->threadfn = threadfn;
        create->data = data;
        create->node = node;
        create->done = &done;

        spin_lock(&kthread_create_lock);
        list_add_tail(&create->list, &kthread_create_list);
        spin_unlock(&kthread_create_lock);

and wakes the kernel thread “kthreadd_task”:

        wake_up_process(kthreadd_task);
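The listing above returns ERR_PTR(-ENOMEM) when the allocation fails, which means a failed kthread_create() does not hand back NULL. A common beginner mistake is to check the result against NULL only. Below is a minimal userspace sketch of the error-pointer idea, assuming simplified stand-ins (err_ptr(), is_err(), ptr_err() and fake_kthread_create() are my own illustrative names, not the kernel's definitions).

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095  /* errors are encoded in the top 4095 addresses */

/* Encode a negative errno value as a pointer (userspace model of ERR_PTR). */
static inline void *err_ptr(long error)
{
        return (void *)(intptr_t)error;
}

/* True when the pointer value falls in the reserved error range. */
static inline int is_err(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static inline long ptr_err(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for kthread_create(): fails when told to. */
static void *fake_kthread_create(int simulate_oom)
{
        static int dummy_task;

        if (simulate_oom)
                return err_ptr(-ENOMEM);
        return &dummy_task;
}

/* Returns 0 on success or a negative errno, the way a module init would. */
static int start_thread(int simulate_oom)
{
        void *task = fake_kthread_create(simulate_oom);

        if (is_err(task)) {
                fprintf(stderr, "thread creation failed: %ld\n", ptr_err(task));
                return (int)ptr_err(task);
        }
        return 0;
}
```

Note that is_err(NULL) is false in this model, so a NULL check alone would miss the failure case. In a real module the equivalent check would use IS_ERR() and PTR_ERR() on the value returned by kthread_create() or kthread_run().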

Some background: kthreadd_task is created during system initialization, just after the init process.

Its pid is 2.


[root@kashish ~]# ps aux | head -5
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2828  1116 ?        Ss   Sep06   0:06 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Sep06   0:00 [kthreadd]    <<----
root         3  0.0  0.0      0     0 ?        S    Sep06   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Sep06   0:01 [ksoftirqd/0]

Check the function rest_init() in init/main.c. It spawns kthreadd_task, which executes the thread function kthreadd().

Now kthreadd() dequeues each request from the linked list and calls create_kthread():


int kthreadd(void *unused)
{
        struct task_struct *tsk = current;

        /* Setup a clean context for our children to inherit. */
        set_task_comm(tsk, "kthreadd");
        ignore_signals(tsk);   /* why ignore signals of current (the parent)? see next blog */
..
        for (;;) {
..
                spin_lock(&kthread_create_lock);
                while (!list_empty(&kthread_create_list)) {
                        struct kthread_create_info *create;

                        create = list_entry(kthread_create_list.next,
                                            struct kthread_create_info, list);
                        list_del_init(&create->list);
                        spin_unlock(&kthread_create_lock);

                        create_kthread(create);

                        spin_lock(&kthread_create_lock);
                }
                spin_unlock(&kthread_create_lock);
        }
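To make the producer/consumer shape of this loop easier to see, here is a rough single-threaded userspace model of it. It is only a sketch under my own assumptions: there is no locking and no real thread creation, and all names (create_request, request_queue, queue_create_request(), drain_requests(), record_run()) are made up for illustration.

```c
#include <stdlib.h>

/* One pending "please create a thread" request (models kthread_create_info). */
struct create_request {
        int (*threadfn)(void *data);
        void *data;
        struct create_request *next;
};

/* Singly linked FIFO standing in for kthread_create_list. */
struct request_queue {
        struct create_request *head;
        struct create_request *tail;
};

/* Caller side: append a request at the tail, as list_add_tail() does. */
static int queue_create_request(struct request_queue *q,
                                int (*threadfn)(void *), void *data)
{
        struct create_request *req = malloc(sizeof(*req));

        if (!req)
                return -1;
        req->threadfn = threadfn;
        req->data = data;
        req->next = NULL;
        if (q->tail)
                q->tail->next = req;
        else
                q->head = req;
        q->tail = req;
        return 0;
}

/*
 * Daemon side: pop requests in arrival order and service each one.
 * The real kthreadd() spawns a new task per request; this model simply
 * calls threadfn() directly. Returns the number of requests handled.
 */
static int drain_requests(struct request_queue *q)
{
        int handled = 0;

        while (q->head) {
                struct create_request *req = q->head;

                q->head = req->next;
                if (!q->head)
                        q->tail = NULL;
                req->threadfn(req->data);
                free(req);
                handled++;
        }
        return handled;
}

/* Example thread function: records the order in which requests ran. */
static int run_log[8];
static int run_count;

static int record_run(void *data)
{
        if (run_count < 8)
                run_log[run_count++] = *(int *)data;
        return 0;
}
```

Queueing two requests and then calling drain_requests() runs them in the order they were queued, which is the same first-in, first-out behaviour the kernel gets from list_add_tail() plus taking kthread_create_list.next.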

create_kthread() calls kernel_thread() to create the new kthread, similar to how kthreadd_task was created.

static void create_kthread(struct kthread_create_info *create)
{
        int pid;
..
        /* We want our own signal handler (we take no signals by default). */
        pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
..

kthread(), passed here as the entry point, is another internal function which adds more magic; we will come to it shortly. Most of the work of creating the kernel process is done by kernel_thread(), a thin wrapper around do_fork():

pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
        return do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
                (unsigned long)arg, NULL, NULL);
}

Once do_fork() has created the new thread, that thread starts executing kthread(). kthread() in turn calls the thread function of the newly created kthread.


static int kthread(void *_create)
{
        /* Copy data: it's on kthread's stack */
        struct kthread_create_info *create = _create;
        int (*threadfn)(void *data) = create->threadfn;
        void *data = create->data;
...
        if (!test_bit(KTHREAD_SHOULD_STOP, &self.flags)) {
                __kthread_parkme(&self);
                ret = threadfn(data);
        }
        /* we can't just return, we must preserve "self" on stack */
        do_exit(ret);
}

Note: kthread() calls do_exit() as soon as the thread function returns. I will highlight its significance in the next section.
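One detail of kthread() that is easy to miss: the thread function runs only if nobody asked the thread to stop before it got going. Here is a small userspace model of that tail end of kthread(). It is a sketch, not the kernel code: kthread_model, kthread_entry() and sample_threadfn() are hypothetical names, and the -EINTR default is my recollection of the elided part of the listing, so treat it as an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal model of the per-thread state that kthread() keeps in "self". */
struct kthread_model {
        bool should_stop;        /* models the KTHREAD_SHOULD_STOP bit */
        int exit_code;           /* what do_exit(ret) would record */
        bool exited;
};

/*
 * Models the tail of kthread(): run threadfn() only if the stop flag
 * was not already set, then "exit" with the result.
 */
static void kthread_entry(struct kthread_model *self,
                          int (*threadfn)(void *data), void *data)
{
        int ret = -EINTR;        /* assumed default when threadfn never runs */

        if (!self->should_stop)
                ret = threadfn(data);
        self->exit_code = ret;   /* stands in for do_exit(ret) */
        self->exited = true;
}

/* Example thread function: counts its calls and returns a fixed code. */
static int calls;

static int sample_threadfn(void *data)
{
        (void)data;
        calls++;
        return 7;
}
```

With should_stop clear, the model records sample_threadfn()'s return value as the exit code. With should_stop set beforehand, sample_threadfn() is never called and the exit code stays at -EINTR.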

Significance of kthread_stop()

I decided to keep a separate section for this API because most of the time we miss what it does and how it is linked with the execution of the thread function.

In the previous section, I explained what happens when the thread function returns: kthread() calls do_exit().

Quoting from the kernel –

threadfn() can either call do_exit() directly if it is a
standalone thread for which no one will call kthread_stop(), or
return when 'kthread_should_stop()' is true (which means
kthread_stop() has been called)
This should be, in general, kept in mind before using kthread_stop().

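kthread_stop() should only be called on a kthread that has not already exited on its own, i.e. one whose thread function keeps running until kthread_should_stop() becomes true. Otherwise the task_struct passed to kthread_stop() may already be gone, and the system can crash.

The above statement will make sense after understanding what kthread_stop() does: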


int kthread_stop(struct task_struct *k)
{
        struct kthread *kthread;
        int ret;

        trace_sched_kthread_stop(k);

        get_task_struct(k);
        kthread = to_live_kthread(k);
        if (kthread) {
                set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
                __kthread_unpark(k, kthread);
                wake_up_process(k);
                wait_for_completion(&kthread->exited);
        }
        ret = k->exit_code;
        put_task_struct(k);

        trace_sched_kthread_stop_ret(ret);
        return ret;
}

The function sets KTHREAD_SHOULD_STOP, wakes up the thread (in case it is sleeping), waits for it to exit, and returns its exit code.

The KTHREAD_SHOULD_STOP flag is checked by the function kthread_should_stop(). The thread function can loop on this check so that it tests the flag at every iteration.
When kthread_stop() is called, the thread function learns that a request to stop the thread has arrived, leaves its loop, and returns safely.

static struct task_struct *k1;

//thread func of k1
int k1_func(void *arg)
{
        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                schedule_timeout(msecs_to_jiffies(1000));
                /* business logic goes here */
                set_current_state(TASK_INTERRUPTIBLE);
        }
        set_current_state(TASK_RUNNING);
        return 0;
}

void __exit cleanup(void)
{
        printk(KERN_INFO "cleaning module\n");
        /* k1 may be NULL or an ERR_PTR if the thread was never created */
        if (!IS_ERR_OR_NULL(k1))
                kthread_stop(k1);
}

If the thread function has already returned and kthread() has called do_exit(), a later call to kthread_stop() might crash the system.
Therefore, before calling kthread_stop(), the caller must ensure that the task_struct can’t go away.
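What "can't go away" means in practice is reference counting: whoever still needs the task_struct holds a reference, so the object survives even after the thread itself has exited. The following userspace toy shows the idea. It is a sketch under simplifying assumptions (a plain non-atomic counter, no real threads), and task_model, task_get(), task_put() and task_exit() are illustrative names modelled loosely on get_task_struct()/put_task_struct().

```c
#include <stdlib.h>

/* Toy stand-in for task_struct: only the reference count matters here. */
struct task_model {
        int usage;       /* number of owners still holding the object */
        int exit_code;
};

static int tasks_freed;  /* bumped whenever a task_model is released */

static struct task_model *task_create(void)
{
        struct task_model *t = calloc(1, sizeof(*t));

        if (t)
                t->usage = 1;    /* reference owned by the running thread */
        return t;
}

/* Models get_task_struct(): the caller now co-owns the object. */
static void task_get(struct task_model *t)
{
        t->usage++;
}

/* Models put_task_struct(): frees the object when the last owner lets go. */
static void task_put(struct task_model *t)
{
        if (--t->usage == 0) {
                free(t);
                tasks_freed++;
        }
}

/* Models the thread exiting: record the code, drop the thread's own reference. */
static void task_exit(struct task_model *t, int code)
{
        t->exit_code = code;
        task_put(t);
}
```

If a caller does task_get() before the thread exits, it can still read exit_code after task_exit(), and the memory is released only on the caller's own task_put(). Without that extra reference, task_exit() would free the object and any later access would touch freed memory, which is the crash scenario described above.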

Uff… Too much information. Have to take rest now. Will try to explain signal handling for kthreads in next section.

How modprobe works

What is modprobe and how it works

It is a program that intelligently loads and removes modules from the system.
modprobe looks in the module directory “/lib/modules/$(KERNEL_RELEASE)/” for the modules and loads them according to the rules defined in the directory /etc/modprobe.d.
In earlier distros, modprobe.conf was used instead.

While loading a specific module, modprobe also loads all the modules that the given module depends on.
The list of modules and their dependencies is stored in the modules.dep file.
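The dependency handling boils down to "load what I depend on first, then load me". A rough sketch of that ordering is below, assuming a tiny, already-parsed table instead of a real modules.dep file. The struct layout and the function names are hypothetical, and this is not how modprobe is actually implemented.

```c
#include <string.h>

#define MAX_DEPS 4

/* One row of a hypothetical, already-parsed modules.dep table. */
struct module_entry {
        const char *name;
        const char *deps[MAX_DEPS];   /* prerequisites, NULL-padded */
        int loaded;
};

static struct module_entry *find_module(struct module_entry *table, int n,
                                        const char *name)
{
        for (int i = 0; i < n; i++)
                if (strcmp(table[i].name, name) == 0)
                        return &table[i];
        return NULL;
}

/*
 * Load `name` after loading everything it depends on (depth-first).
 * Each newly loaded module is appended to `order`; `count` is the number
 * of entries already in `order`. Returns the updated count, or -1 if a
 * module is missing from the table. A real tool would also have to
 * guard against dependency cycles.
 */
static int load_with_deps(struct module_entry *table, int n, const char *name,
                          const char **order, int count)
{
        struct module_entry *m = find_module(table, n, name);

        if (!m)
                return -1;
        if (m->loaded)
                return count;
        for (int i = 0; i < MAX_DEPS && m->deps[i]; i++) {
                count = load_with_deps(table, n, m->deps[i], order, count);
                if (count < 0)
                        return -1;
        }
        m->loaded = 1;
        order[count++] = m->name;
        return count;
}
```

For a table where send_signal lists signal_handling as its only dependency, asking for send_signal yields the order signal_handling, send_signal. Asking again loads nothing new because both entries are already marked as loaded.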

One can check the list of loaded modules using “lsmod” command

What is depmod

depmod is a program that generates modules.dep and the map files.
modules.dep is modprobe's database. It tells modprobe which modules a given module depends on while that module is being loaded.
If we have written some new modules, we need to run the following command to add them to the database (modules.dep file):
$ depmod -a
Generally depmod is executed internally when we install the modules into the “/lib/modules/$(KERNEL_RELEASE)/” directory by issuing “make modules_install”.

Example

In the previous post, I explained the kmod subsystem and the significance of request_module(). The following output shows how the modules from that post are loaded and how one can use modprobe effectively.


[root@kashish kmod_subsystem_usage]# make
make -C /lib/modules/2.6.33.3-85.fc13.i686.PAE/build M=/root/git_client/reps/kmod_subsystem_usage modules
make[1]: Entering directory `/usr/src/kernels/2.6.33.3-85.fc13.i686.PAE'
  CC [M]  /root/git_client/reps/kmod_subsystem_usage/send_signal.o
  CC [M]  /root/git_client/reps/kmod_subsystem_usage/signal_handling.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /root/git_client/reps/kmod_subsystem_usage/send_signal.mod.o
  LD [M]  /root/git_client/reps/kmod_subsystem_usage/send_signal.ko
  CC      /root/git_client/reps/kmod_subsystem_usage/signal_handling.mod.o
  LD [M]  /root/git_client/reps/kmod_subsystem_usage/signal_handling.ko
make[1]: Leaving directory `/usr/src/kernels/2.6.33.3-85.fc13.i686.PAE'

The following command will install the modules to /lib/modules/$(KREL). It will also invoke depmod


[root@kashish kmod_subsystem_usage]# make modules_install
make -C /lib/modules/2.6.33.3-85.fc13.i686.PAE/build INSTALL_MOD_DIR=extra M=/root/git_client/reps/kmod_subsystem_usage modules_install
make[1]: Entering directory `/usr/src/kernels/2.6.33.3-85.fc13.i686.PAE'
INSTALL /root/git_client/reps/kmod_subsystem_usage/send_signal.ko
INSTALL /root/git_client/reps/kmod_subsystem_usage/signal_handling.ko
DEPMOD  2.6.33.3-85.fc13.i686.PAE                                        <=======
make[1]: Leaving directory `/usr/src/kernels/2.6.33.3-85.fc13.i686.PAE'                  

One can check whether modules are copied to following directory successfully or not


[root@kashish kmod_subsystem_usage]# ls /lib/modules/2.6.33.3-85.fc13.i686.PAE/extra/
send_signal.ko  signal_handling.ko

Now load the module using modprobe


[root@kashish kmod_subsystem_usage]# modprobe signal_handling

signal_handling.ko loads send_signal.ko as well, using request_module().
Module send_signal uses, and therefore depends on, module signal_handling.


[root@kashish kmod_subsystem_usage]# lsmod | head -n 3
Module                  Size  Used by
send_signal             1561  0
signal_handling         1429  1 send_signal


[root@kashish kmod_subsystem_usage]# modinfo signal_handling.ko
filename:       signal_handling.ko
license:        GPL
author:         @Kashish_Bhatia
description:    dynamically loading module through kmod subsystem and receiving signal from another module
srcversion:     594684C2DBE8A44CCC1D9A6
depends:
vermagic:       2.6.33.3-85.fc13.i686.PAE SMP mod_unload 686
[root@kashish kmod_subsystem_usage]# modinfo send_signal.ko
filename:       send_signal.ko
license:        GPL
author:         @Kashish_Bhatia
description:    sending signal from 1 kthread to another and using default signal handler
srcversion:     049F7612A1256598DA9F4FD
depends:        signal_handling
vermagic:       2.6.33.3-85.fc13.i686.PAE SMP mod_unload 686

Hence the signal_handling module cannot be removed before send_signal.


[root@kashish kmod_subsystem_usage]# modprobe -r signal_handling
FATAL: Module signal_handling is in use.

When removing send_signal, modprobe can also remove the module it depends on (signal_handling), once that module is no longer in use.


[root@kashish kmod_subsystem_usage]# modprobe -r send_signal

Demystifying request_module()

Introduction

request_module() is a function used in kernel modules to load another kernel module on the fly.

It is defined in linux/kmod.h.

#define request_module(mod...) __request_module(true, mod)

The arguments of request_module() are a printf-style format string and its values,
from which the module name is built (for example, request_module("char-major-%d", major)).
They are not module parameters.

request_module() internally calls the userspace program modprobe to
load the specified module.
With the code shown in this post, it returns 0 when modprobe ran and exited
successfully, a negative errno when the kernel could not run modprobe, and
modprobe's non-zero exit status when modprobe itself failed.
Even a return value of 0 doesn’t guarantee that the module is still loaded and
usable afterwards, so the caller should check that whatever it requested is
actually available.

How request_module() works and its workflow

request_module() calls __request_module().

When called with wait=true, __request_module() waits until modprobe returns. The function
first checks whether the global variable “modprobe_path” is set. This is to handle cases
where request_module() is called at an early boot stage, when modprobe is
not yet available.
“modprobe_path” can be read and set via /proc/sys/kernel/modprobe:

[root@kashish kmod_subsystem_usage]# cat /proc/sys/kernel/modprobe
/sbin/modprobe

Then __request_module() formats the module name (the format string plus its arguments) into the “module_name” variable.

va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
va_end(args);
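The same formatting step can be tried out in userspace. Below is a minimal sketch of it, assuming a buffer size of 56 bytes (NAME_LEN is my own placeholder for MODULE_NAME_LEN) and a hypothetical helper name, format_module_name(). The truncation check mirrors what I remember the kernel doing, so treat that part as an assumption too.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Assumed buffer size for illustration; the kernel uses MODULE_NAME_LEN. */
#define NAME_LEN 56

/*
 * Build a module name from a printf-style format into buf, which must
 * hold at least NAME_LEN bytes.
 * Returns 0 on success or -ENAMETOOLONG if the name did not fit.
 */
static int format_module_name(char *buf, const char *fmt, ...)
{
        va_list args;
        int ret;

        va_start(args, fmt);
        ret = vsnprintf(buf, NAME_LEN, fmt, args);
        va_end(args);

        /* vsnprintf() reports the length it would have needed. */
        if (ret < 0 || ret >= NAME_LEN)
                return -ENAMETOOLONG;
        return 0;
}
```

Calling format_module_name(name, "char-major-%d", 10) fills name with "char-major-10" and returns 0, while a format that expands to 56 characters or more is rejected with -ENAMETOOLONG instead of being silently truncated.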

and finally calls

ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
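The wait argument in this call is a small set of bit flags, which is why the code can OR in UMH_KILLABLE and later test for UMH_WAIT_PROC with a bitwise AND. A tiny sketch of that flag arithmetic follows. The numeric values are what I recall from linux/kmod.h of that era and should be treated as assumptions, and the two helper functions are hypothetical.

```c
/* Assumed flag values; check linux/kmod.h of your kernel before relying on them. */
#define UMH_NO_WAIT    0   /* don't wait at all */
#define UMH_WAIT_EXEC  1   /* wait for the exec, but not for the process */
#define UMH_WAIT_PROC  2   /* wait for the process to complete */
#define UMH_KILLABLE   4   /* the wait can be interrupted by a fatal signal */

/* Flags that end up in call_usermodehelper_exec() for a given wait choice. */
static int modprobe_wait_flags(int wait)
{
        return (wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC) | UMH_KILLABLE;
}

/* The test used later to decide whether a waiting parent thread is needed. */
static int waits_for_exit(int flags)
{
        return (flags & UMH_WAIT_PROC) != 0;
}
```

With these values, wait=true gives UMH_WAIT_PROC | UMH_KILLABLE and waits_for_exit() is true, while wait=false gives UMH_WAIT_EXEC | UMH_KILLABLE and waits_for_exit() is false.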

call_modprobe() prepares the command and calls call_usermodehelper_exec():

 
static int call_modprobe(char *module_name, int wait)
{
        struct subprocess_info *info;
        static char *envp[] = {
                "HOME=/",
                "TERM=linux",
                "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
                NULL
        };

        char **argv = kmalloc(sizeof(char *[5]), GFP_KERNEL);
        if (!argv)
                goto out;

        module_name = kstrdup(module_name, GFP_KERNEL);
        if (!module_name)
                goto free_argv;

        argv[0] = modprobe_path;
        argv[1] = "-q";
        argv[2] = "--";
        argv[3] = module_name;  /* check free_modprobe_argv() */
        argv[4] = NULL;

        info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL,
                                         NULL, free_modprobe_argv, NULL);
        if (!info)
                goto free_module_name;

        return call_usermodehelper_exec(info, wait | UMH_KILLABLE);

call_usermodehelper_setup() is always called before call_usermodehelper_exec(), to initialize
a helper process structure which contains the details of the command to be issued in userspace.
call_usermodehelper_setup() is an exported function.

The user space program is invoked using workqueues. The subprocess_info{} contains a
work structure to which the work function __call_usermodehelper() is attached.
The work is then enqueued in the workqueue by call_usermodehelper_exec().
The workqueue itself is created at boot time by usermodehelper_init().

void __init usermodehelper_init(void)
{
        khelper_wq = create_singlethread_workqueue("khelper");
        BUG_ON(!khelper_wq);
}

struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
                char **envp, gfp_t gfp_mask,
                int (*init)(struct subprocess_info *info, struct cred *new),
                void (*cleanup)(struct subprocess_info *info),
                void *data)
{
        struct subprocess_info *sub_info;
        sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
        if (!sub_info)
                goto out;

        INIT_WORK(&sub_info->work, __call_usermodehelper);
        sub_info->path = path;
        sub_info->argv = argv;
        sub_info->envp = envp;

        sub_info->cleanup = cleanup;
        sub_info->init = init;
        sub_info->data = data;
  out:
        return sub_info;
}
EXPORT_SYMBOL(call_usermodehelper_setup);

All the magic of calling a user program is done by call_usermodehelper_exec()

 
int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
{
...
        /*
         * Set the completion pointer only if there is a waiter.
         * This makes it possible to use umh_complete to free
         * the data structure in case of UMH_NO_WAIT.
         */
        sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
        sub_info->wait = wait;

        queue_work(khelper_wq, &sub_info->work);
...
        retval = sub_info->retval;
out:
        call_usermodehelper_freeinfo(sub_info);
unlock:
        helper_unlock();
        return retval;
}

__call_usermodehelper() is run by the khelper workqueue thread, and it spawns a kthread whose
entry function depends on whether __request_module() was called with or without wait.


/* This is run by khelper thread  */
static void __call_usermodehelper(struct work_struct *work)
{
        struct subprocess_info *sub_info =
                container_of(work, struct subprocess_info, work);
        pid_t pid;

        if (sub_info->wait & UMH_WAIT_PROC)
                pid = kernel_thread(wait_for_helper, sub_info,
                                    CLONE_FS | CLONE_FILES | SIGCHLD);
        else
                pid = kernel_thread(____call_usermodehelper, sub_info,
                                    SIGCHLD);

        if (pid < 0) {
                sub_info->retval = pid;
                umh_complete(sub_info);
        }
}
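The container_of() call at the top of this function is what lets a work function, which only receives a pointer to the embedded work member, get back to the structure that contains it. Here is a small userspace sketch of that trick. The macro is a simplified version without the kernel's type checking, and work_model, helper_info and helper_work_fn() are hypothetical stand-ins for work_struct, subprocess_info and __call_usermodehelper().

```c
#include <stddef.h>

/* Simplified userspace container_of(): step back from member to owner. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for work_struct. */
struct work_model {
        void (*func)(struct work_model *work);
};

/* Stand-in for subprocess_info; field names are illustrative. */
struct helper_info {
        const char *path;
        int retval;
        struct work_model work;   /* embedded member, not a pointer */
};

/* Work function: gets only the embedded member, recovers the owner. */
static void helper_work_fn(struct work_model *work)
{
        struct helper_info *info = container_of(work, struct helper_info, work);

        info->retval = 42;   /* arbitrary marker showing we reached the owner */
}
```

Invoking info.work.func(&info.work) on a helper_info whose work.func is helper_work_fn sets info.retval to 42, even though the function was handed nothing but the address of the work member.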

kernel_thread() is an internal function of the kernel thread subsystem which ultimately
creates the kernel thread. When we call kthread_create() from a kernel module, this is the
function that is eventually called.

If __request_module() is called with wait=true, a kthread is created to execute
wait_for_helper().

wait_for_helper(), the parent kthread, in turn creates a new child
kthread to execute ____call_usermodehelper() and waits until the child exits.

If __request_module() is called with wait=false, a kthread is created directly to
invoke ____call_usermodehelper().

Finally, ____call_usermodehelper() calls do_execve() to execute the usermode program.
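The parent-waits-for-child arrangement has a familiar userspace cousin: fork(), execve() and waitpid(). The sketch below is only an analogy under that assumption, not what the kernel does internally, and run_helper_and_wait() is a name I made up. It does show how the exit status of the executed program travels back to the waiting parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `path` with `argv`/`envp` in a child process and wait for it.
 * Returns the child's exit status (127 if execve() itself failed), or
 * -1 if fork()/waitpid() failed or the child was killed by a signal.
 */
static int run_helper_and_wait(const char *path, char *const argv[],
                               char *const envp[])
{
        int status;
        pid_t pid = fork();

        if (pid < 0)
                return -1;
        if (pid == 0) {
                /* Child: replace this process image with the helper. */
                execve(path, argv, envp);
                _exit(127);   /* only reached if execve() failed */
        }
        /* Parent: block until the child has finished. */
        if (waitpid(pid, &status, 0) < 0)
                return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments -c "exit 3" makes the function return 3. The environment array can be the same kind of minimal HOME/PATH list that call_modprobe() passes.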

Pre-requisites of using request_module()

In order to make request_module() work, the Makefile must do some magic to copy the module object files into “/lib/modules/$(KREL)/extra/” or “/lib/modules/$(KREL)/kernel/”.

“/lib/modules/$(KREL)/kernel/” generally contains the kernel modules that are installed during kernel compilation.
These modules are copied to “/lib/modules/$(KREL)/kernel/” when we execute
“make modules_install” during kernel compilation.

Similarly, external modules are copied to “/lib/modules/$(KREL)/extra/” by default when we execute the “modules_install” target of the module’s Makefile.
To copy the .ko’s to some other subdirectory, we can use “INSTALL_MOD_DIR” in the Makefile.

The .ko’s need to be present in the above directories because modprobe searches for modules inside “/lib/modules/$(KREL)/extra/” or “/lib/modules/$(KREL)/kernel/”.

KDIR – Path to the kernel source tree
KREL/KERNELRELEASE – Version of the kernel, i.e. ‘uname -r’ output
INSTALL_MOD_DIR – subdirectory of “/lib/modules/$(KERNEL_RELEASE)/” to use instead of the default “extra”
INSTALL_MOD_PATH – a prefix prepended to the installation path. If INSTALL_MOD_PATH=/my_kernel
then kernel modules will be copied to “/my_kernel/lib/modules/$(KERNEL_RELEASE)/kernel/”

If the Makefile doesn’t copy the .ko’s to the correct location, the call to request_module()
might fail.

EXPORT_SYMBOL() and its history

EXPORT_SYMBOL() is a macro that exports a symbol so that other loadable modules can use it.

In 2.4-era kernels, the inter_module_* functions were also used to share variables across modules.
All inter_module_* APIs were later deprecated and then removed from the Linux kernel.

void inter_module_register(const char *string, struct module *module,
const void *data);
void inter_module_unregister(const char *string);
const void *inter_module_get(const char *string);
const void *inter_module_get_request(const char *string, const char *module);
void inter_module_put(const char *string);

Example

signal_handling.ko will spawn a thread and load another module.
The other module(send_signal.ko) will send signal to the module (signal_handling.ko)
which loaded it.



signal_handling.c


#include <linux/init.h>
#include <linux/module.h>       /* Specifically, a module */
#include <linux/kernel.h>       /* We're doing kernel work */
#include <linux/kthread.h>
#include <linux/delay.h>        /* for msleep() */
#include <asm/siginfo.h>        /* for signals */
#include <linux/signal.h>
#include <linux/kmod.h>         /* for request_module() */

MODULE_DESCRIPTION("dynamically loading module through kmod subsystem and "
                   "receiving signal from another module");

struct task_struct *k1;
struct task_struct *saved_task;

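//Global vars
int val = 0;
int flag = 0;                   // a flag is set when val reached 10

//pid of thread k1; exported so that send_signal.ko can signal it
pid_t kthread_pid;
EXPORT_SYMBOL(kthread_pid);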

//Thread func for k1
int inc_func(void *arg)
{
        int ret;
        saved_task = current;
        printk("%s : pid of current thread %u\n", current->comm, current->pid);
        kthread_pid = current->pid;

        msleep(1000);
        //load module "send_signal.ko"
        ret = request_module("send_signal");
        if (ret != 0)
                printk("%s : unable to load module, Status=%d\n", current->comm, ret);

        allow_signal(SIGUSR1);

        set_current_state(TASK_INTERRUPTIBLE);

        while (!kthread_should_stop()) {
                int ret;
                ret = schedule_timeout(msecs_to_jiffies(1000));

                if (signal_pending(current)) {
                        printk("%s : signal is pending****\n", current->comm);
                        flush_signals(current);
                }

                if (val <=10 && !flag) {
                        val++;
                        printk("%s : current value is %d\n", current->comm, val);
                        if (val == 10) {
                                printk("%s : setting flag\n", current->comm);
                                flag = 1;
                        }
                }
                set_current_state(TASK_INTERRUPTIBLE);
        }
        set_current_state(TASK_RUNNING);
        printk("%s : out of while loop, exiting \"%s\"\n", current->comm, __FUNCTION__);
        return 0;
}

int __init startup(void)
{
        k1 = kthread_run(inc_func, NULL, "K1");
        if (IS_ERR(k1))         /* kthread_run() returns ERR_PTR on failure */
                return PTR_ERR(k1);
        return 0;
}

void __exit cleanup(void)
{
        printk("cleaning module \n");
        kthread_stop(k1);
        printk("stopped kernel thread 1 \n");
}

module_init(startup);
module_exit(cleanup);
MODULE_LICENSE("GPL");



send_signal.c


#include <linux/init.h>
#include <linux/module.h>       /* Specifically, a module */
#include <linux/kernel.h>       /* We're doing kernel work */
#include <linux/kthread.h>
#include <linux/delay.h>        /* for msleep() */
#include <asm/siginfo.h>        /* for signals */
#include <linux/signal.h>
#include <linux/kmod.h>

MODULE_DESCRIPTION("sending signal from 1 kthread to another"
                   " and using default signal handler");

//Follow me on twitter
MODULE_AUTHOR("@Kashish_Bhatia");

struct task_struct *k2;
struct task_struct *saved_task;

struct siginfo data_for_handler;

//signal descriptor
struct sigaction sig;

//Global vars
int val = 0;

int signal_receive_cnt = 0;     //max # of times signal should be serviced

extern pid_t kthread_pid;
struct task_struct *target_kthread;
struct pid *pid_struct;

//Thread func for k2
int send_signal_to_task(void *arg)
{
        int ret;
        while (!kthread_should_stop()) {
                msleep_interruptible(2000);
                if (kthread_pid && signal_receive_cnt != 2) {
                        printk("%s : pid of kthread to which signal is sent = %u\n", current->comm, kthread_pid);
                        //TODO: siginfo is passed to default sighandler.
                        //Can't check the info, as i dont have access to default
                        //signal handler.
                        data_for_handler.si_code = SI_KERNEL;
                        data_for_handler.si_int = val;
                        printk("%s : sending signal to process with pid %u...\n", current->comm, kthread_pid);

                        //convert pid to task_struct
                        pid_struct = find_get_pid(kthread_pid);
                        rcu_read_lock();
                        target_kthread = pid_task(pid_struct, PIDTYPE_PID);
                        if (!target_kthread) {
                                //target thread is already gone
                                rcu_read_unlock();
                                put_pid(pid_struct);
                                continue;
                        }

                        //send signal
                        ret = send_sig_info(SIGUSR1, &data_for_handler, target_kthread);
                        rcu_read_unlock();
                        put_pid(pid_struct);    //drop the ref taken by find_get_pid()
                        if (ret) {
                                printk("%s : Error sending signal to thread with pid:%d, status %d\n",
                                        current->comm, kthread_pid, ret);
                                //return ret;
                        } else {
                                signal_receive_cnt++;
                        }
                }
        }
        printk("%s : exiting \"%s\"\n", current->comm, __FUNCTION__);
        return 0;
}

int __init startup(void)
{
        printk(KERN_ALERT "send_signal module is starting\n");
        k2 = kthread_run(send_signal_to_task, NULL, "K2");
        if (IS_ERR(k2))         /* kthread_run() returns ERR_PTR on failure */
                return PTR_ERR(k2);
        return 0;
}

void __exit cleanup(void)
{
        printk("cleaning module \n");
        kthread_stop(k2);
        printk("stopped kernel thread 2 \n");
}

module_init(startup);
module_exit(cleanup);
MODULE_LICENSE("GPL");



Makefile


obj-m += send_signal.o
obj-m += signal_handling.o

INSTALLED_PATH=/lib/modules/$(shell uname -r)/extra/
clean-files := $(INSTALLED_PATH)/*.ko

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
modules_install:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build INSTALL_MOD_DIR=extra M=$(PWD) modules_install
clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean