
How modprobe works

What is modprobe and how it works

modprobe is a program that intelligently loads and removes kernel modules.
It looks for modules in the module directory “/lib/modules/$(KERNEL_RELEASE)/” and loads them according to the rules defined in the directory /etc/modprobe.d.
Earlier distros used modprobe.conf instead.

When loading a specific module, modprobe also loads all the modules that the given module depends on.
Each module and the modules it depends on are listed in the modules.dep file.
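
To make the role of modules.dep concrete, here is a minimal userspace C sketch that counts the dependencies listed on one line of the file. It assumes the usual line layout written by depmod (the module path, a colon, then the space-separated paths of the modules it depends on). The function name count_module_deps() is made up for this illustration and is not part of modprobe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the dependencies listed on one modules.dep line of the form
 * "path/to/module.ko: path/to/dep1.ko path/to/dep2.ko".
 * Returns -1 if the line has no ':' separator.
 */
int count_module_deps(const char *line)
{
    const char *p;
    int count = 0;
    int in_word = 0;

    assert(line != NULL);
    p = strchr(line, ':');
    if (p == NULL)
        return -1;

    /* Every run of non-blank characters after the colon is one dependency. */
    for (p++; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}
```

For a line such as "extra/send_signal.ko: extra/signal_handling.ko" (the paths are illustrative), the function reports one dependency, which is the module modprobe would load first.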

One can check the list of loaded modules using the “lsmod” command.

What is depmod

depmod is a program that generates modules.dep and the map files.
modules.dep is the database used by modprobe. It tells modprobe which modules a specific module depends on while that module is being loaded.
If we have written some new modules, we need to run the following command to add them to the database (the modules.dep file):
$ depmod -a
Generally, depmod is executed internally when we install modules in the “/lib/modules/$(KERNEL_RELEASE)/” directory by issuing “make modules_install”.


In the previous post, I explained the kmod subsystem and the significance of request_module(). The following output shows how the modules from that post are loaded and how one can use modprobe effectively.

[root@kashish kmod_subsystem_usage]# make
make -C /lib/modules/ M=/root/git_client/reps/kmod_subsystem_usage modules
make[1]: Entering directory `/usr/src/kernels/'
  CC [M]  /root/git_client/reps/kmod_subsystem_usage/send_signal.o
  CC [M]  /root/git_client/reps/kmod_subsystem_usage/signal_handling.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /root/git_client/reps/kmod_subsystem_usage/send_signal.mod.o
  LD [M]  /root/git_client/reps/kmod_subsystem_usage/send_signal.ko
  CC      /root/git_client/reps/kmod_subsystem_usage/signal_handling.mod.o
  LD [M]  /root/git_client/reps/kmod_subsystem_usage/signal_handling.ko
make[1]: Leaving directory `/usr/src/kernels/'

The following command installs the modules under /lib/modules/$(KREL) (in the extra/ subdirectory, because INSTALL_MOD_DIR=extra is passed). It also invokes depmod.

[root@kashish kmod_subsystem_usage]# make modules_install
make -C /lib/modules/ INSTALL_MOD_DIR=extra M=/root/git_client/reps/kmod_subsystem_usage modules_install
make[1]: Entering directory `/usr/src/kernels/'
INSTALL /root/git_client/reps/kmod_subsystem_usage/send_signal.ko
INSTALL /root/git_client/reps/kmod_subsystem_usage/signal_handling.ko
DEPMOD                                        <=======
make[1]: Leaving directory `/usr/src/kernels/'                  

One can check whether the modules were copied to the following directory successfully.

[root@kashish kmod_subsystem_usage]# ls /lib/modules/
send_signal.ko  signal_handling.ko

Now load the module using modprobe

[root@kashish kmod_subsystem_usage]# modprobe signal_handling

signal_handling.ko also loads send_signal.ko using request_module().
The send_signal module uses (depends on) the signal_handling module, as the “Used by” column shows.

[root@kashish kmod_subsystem_usage]# lsmod | head -n 3
Module                  Size  Used by
send_signal             1561  0
signal_handling         1429  1 send_signal

[root@kashish kmod_subsystem_usage]# modinfo signal_handling.ko
filename:       signal_handling.ko
license:        GPL
author:         @Kashish_Bhatia
description:    dynamically loading module through kmod subsystem and receiving signal from another module
srcversion:     594684C2DBE8A44CCC1D9A6
vermagic: SMP mod_unload 686
[root@kashish kmod_subsystem_usage]# modinfo send_signal.ko
filename:       send_signal.ko
license:        GPL
author:         @Kashish_Bhatia
description:    sending signal from 1 kthread to another and using default signal handler
srcversion:     049F7612A1256598DA9F4FD
depends:        signal_handling
vermagic: SMP mod_unload 686

Hence the signal_handling module cannot be removed before send_signal.

[root@kashish kmod_subsystem_usage]# modprobe -r signal_handling
FATAL: Module signal_handling is in use.

When removing a module, modprobe -r can also remove the modules it depends on (here signal_handling), provided they are no longer in use.

[root@kashish kmod_subsystem_usage]# modprobe -r send_signal

Demystifying request_module()


request_module() is a function used in kernel modules to load another kernel module on the fly.

It is defined in linux/kmod.h.

#define request_module(mod...) __request_module(true, mod)

The argument is a printf-style format string, so the module name can be built from
parameters at run time, for example request_module("char-major-%d-%d", major, minor).

request_module() internally calls the userspace program modprobe to
load the specified module and waits for it to finish.
It returns 0 on success, a negative errno value if modprobe could not be started,
or a positive value taken from modprobe's exit status if modprobe failed.
A return value of 0 still doesn’t guarantee that the module is usable: the module
may have been loaded and then unloaded again because of an error of its own.
Callers should therefore check that the service they asked for is actually available.
The variant request_module_nowait() only starts modprobe and does not wait for
it to finish.

How request_module() works and its workflow

request_module() calls __request_module().

__request_module() waits until modprobe returns (when it is called with wait=true). The function
first checks whether the global variable “modprobe_path” is set. This covers the cases
where request_module() is called at an early boot stage, when modprobe is
not yet available.
“modprobe_path” can be read and set via /proc/sys/kernel/modprobe

[root@kashish kmod_subsystem_usage]# cat /proc/sys/kernel/modprobe

Then __request_module() formats the module name (the format string plus its arguments) into the “module_name” buffer.

va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);

and finally calls

ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
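
The name-formatting step can be tried out in userspace. The following is only a sketch of the idea, not the kernel code: format_module_name() is a hypothetical helper, and the buffer length is passed in by the caller instead of using the kernel's MODULE_NAME_LEN. Like the kernel (which returns -ENAMETOOLONG in that case), it refuses names that do not fit in the buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Expand a printf-style format into a fixed-size buffer.
 * Returns 0 on success, -1 if formatting failed or the name was truncated.
 */
int format_module_name(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;
    int ret;

    assert(buf != NULL && fmt != NULL && len > 0);
    va_start(args, fmt);
    ret = vsnprintf(buf, len, fmt, args);
    va_end(args);

    /* vsnprintf() reports the length it wanted to write, so ret >= len means truncation. */
    if (ret < 0 || (size_t)ret >= len)
        return -1;
    return 0;
}
```

Checking the return value of vsnprintf() against the buffer size is what catches an over-long name before it is handed to modprobe.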

call_modprobe() prepares the command and calls call_usermodehelper_exec()

static int call_modprobe(char *module_name, int wait)
{
        struct subprocess_info *info;
        static char *envp[] = {
                "HOME=/",
                "TERM=linux",
                "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
                NULL
        };

        char **argv = kmalloc(sizeof(char *[5]), GFP_KERNEL);
        if (!argv)
                goto out;

        module_name = kstrdup(module_name, GFP_KERNEL);
        if (!module_name)
                goto free_argv;

        argv[0] = modprobe_path;
        argv[1] = "-q";
        argv[2] = "--";
        argv[3] = module_name;  /* check free_modprobe_argv() */
        argv[4] = NULL;

        info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL,
                                         NULL, free_modprobe_argv, NULL);
        if (!info)
                goto free_module_name;

        return call_usermodehelper_exec(info, wait | UMH_KILLABLE);

free_module_name:
        kfree(module_name);
free_argv:
        kfree(argv);
out:
        return -ENOMEM;
}
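
The argument vector built above is an ordinary NULL-terminated argv, the same shape a userspace program would pass to execve(). Here is a small userspace sketch of that step, assuming malloc() in place of kmalloc()/kstrdup(). The names build_modprobe_args() and free_modprobe_args() are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a NULL-terminated argv equivalent to "modprobe -q -- <name>".
 * The caller owns the result and releases it with free_modprobe_args().
 * Returns NULL if memory cannot be allocated.
 */
char **build_modprobe_args(const char *modprobe_path, const char *module_name)
{
    char **argv;
    char *name_copy;
    size_t n;

    assert(modprobe_path != NULL && module_name != NULL);
    argv = malloc(5 * sizeof(*argv));
    if (argv == NULL)
        return NULL;

    /* Private copy of the name, like kstrdup() in the kernel version. */
    n = strlen(module_name) + 1;
    name_copy = malloc(n);
    if (name_copy == NULL) {
        free(argv);
        return NULL;
    }
    memcpy(name_copy, module_name, n);

    argv[0] = (char *)modprobe_path;
    argv[1] = "-q";
    argv[2] = "--";         /* end of options: the name is never parsed as a flag */
    argv[3] = name_copy;
    argv[4] = NULL;
    return argv;
}

/* Release the copied name and the vector itself. */
void free_modprobe_args(char **argv)
{
    if (argv != NULL) {
        free(argv[3]);
        free(argv);
    }
}
```

The "--" entry matters: it marks the end of options, so a module name that happens to start with "-" is still treated as a name.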

call_usermodehelper_setup() is always called before call_usermodehelper_exec(). It initializes
a helper process structure (struct subprocess_info) that holds the details of the command to be run in userspace.
call_usermodehelper_setup() is an exported function.

The userspace program is invoked through a workqueue. struct subprocess_info contains a
work structure whose work function is __call_usermodehelper().
call_usermodehelper_exec() enqueues this work on the workqueue.
The workqueue is created once during boot by usermodehelper_init(); the kmod subsystem is built into the kernel rather than loaded as a module.

void __init usermodehelper_init(void)
{
        khelper_wq = create_singlethread_workqueue("khelper");
        BUG_ON(!khelper_wq);
}

struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
                char **envp, gfp_t gfp_mask,
                int (*init)(struct subprocess_info *info, struct cred *new),
                void (*cleanup)(struct subprocess_info *info),
                void *data)
{
        struct subprocess_info *sub_info;
        sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
        if (!sub_info)
                goto out;

        INIT_WORK(&sub_info->work, __call_usermodehelper);      /* <======= */
        sub_info->path = path;
        sub_info->argv = argv;
        sub_info->envp = envp;

        sub_info->cleanup = cleanup;
        sub_info->init = init;
        sub_info->data = data;
  out:
        return sub_info;
}

All the magic of calling a user program is done by call_usermodehelper_exec()

int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
{
        DECLARE_COMPLETION_ONSTACK(done);
        int retval = 0;
        /*
         * Set the completion pointer only if there is a waiter.
         * This makes it possible to use umh_complete to free
         * the data structure in case of UMH_NO_WAIT.
         */
        sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
        sub_info->wait = wait;

        queue_work(khelper_wq, &sub_info->work);        /* <======= */
        /* ... wait for the helper, depending on "wait" ... */
        retval = sub_info->retval;
        return retval;
}

__call_usermodehelper() runs in the khelper workqueue thread and spawns a kernel thread; which function
that thread executes depends on whether __request_module() was called with or without wait.

/* This is run by khelper thread  */
static void __call_usermodehelper(struct work_struct *work)
{
        struct subprocess_info *sub_info =
                container_of(work, struct subprocess_info, work);
        pid_t pid;

        if (sub_info->wait & UMH_WAIT_PROC)
                pid = kernel_thread(wait_for_helper, sub_info,
                                    CLONE_FS | CLONE_FILES | SIGCHLD);
        else
                pid = kernel_thread(____call_usermodehelper, sub_info,
                                    CLONE_VFORK | SIGCHLD);

        if (pid < 0) {
                sub_info->retval = pid;
                /* ... */
        }
}
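
The work function only receives a pointer to the embedded work structure, and container_of() is what turns it back into the enclosing subprocess_info. A minimal userspace sketch of that trick follows. The structures work_item and helper_info are simplified stand-ins invented for the example, and the macro is a reduced form of the kernel's container_of() without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel structures, reduced to a few fields. */
struct work_item {
    int pending;
};

struct helper_info {
    const char *path;
    struct work_item work;      /* embedded, like sub_info->work */
    int wait;
};

/* Given only the embedded work pointer, recover the enclosing structure. */
struct helper_info *helper_from_work(struct work_item *work)
{
    assert(work != NULL);
    return container_of(work, struct helper_info, work);
}
```

Because the work structure is embedded rather than pointed to, subtracting its offset from its address always lands on the start of the enclosing structure, so no extra back-pointer is needed.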

kernel_thread() is an internal function of the kernel thread subsystem that ultimately
creates the kernel thread. When we call kthread_create() from a kernel module, this is the
function that ends up being called internally.

When __request_module() is called with wait=true, a kernel thread is created to execute
wait_for_helper(). wait_for_helper(), acting as the parent, creates a new child
kernel thread to execute ____call_usermodehelper() and waits until the child exits.

When __request_module() is called with wait=false, a kernel thread is created directly to
invoke ____call_usermodehelper().

Finally, ____call_usermodehelper() calls do_execve() to execute the usermode program.

Pre-requisites of using request_module()

For request_module() to work, the Makefile must copy the module object files into “/lib/modules/$(KREL)/extra/” or “/lib/modules/$(KREL)/kernel/”.

“/lib/modules/$(KREL)/kernel/” generally contains the kernel modules that are installed during kernel compilation.
They are copied there when we execute
“make modules_install” during kernel compilation.

Similarly, external modules are copied to “/lib/modules/$(KREL)/extra/” by default when we execute the “modules_install” target of the module’s Makefile.
To copy the .ko files to some other subdirectory, we can set “INSTALL_MOD_DIR” in the Makefile.

The .ko files need to be present in these directories because modprobe searches for modules under “/lib/modules/$(KREL)/” (such as the “extra/” and “kernel/” subdirectories), using the modules.dep file generated by depmod.

KDIR – path to the kernel source tree
KREL/KERNELRELEASE – version of the kernel, i.e. the output of ‘uname -r’
INSTALL_MOD_DIR – subdirectory of “/lib/modules/$(KERNEL_RELEASE)/” to install into, replacing the default “extra/”
INSTALL_MOD_PATH – a prefix prepended to the installation path. If INSTALL_MOD_PATH=/my_kernel
then kernel modules will be copied to “/my_kernel/lib/modules/$(KERNEL_RELEASE)/kernel/”

If the Makefile doesn’t copy the .ko files to the correct location, the call to request_module()
might fail.
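
How these variables combine into the final install directory can be shown with a few lines of userspace C. This is only an illustration of the path layout described above: module_install_dir() is a made-up helper, and the “extra” fallback mirrors the default used by the build output in this post (other kernel versions may use a different default).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compose the directory that "make modules_install" copies modules into:
 *   <INSTALL_MOD_PATH>/lib/modules/<KREL>/<INSTALL_MOD_DIR>
 * Pass "" as mod_path when INSTALL_MOD_PATH is unset; a NULL mod_dir
 * falls back to "extra".
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int module_install_dir(char *buf, size_t len, const char *mod_path,
                       const char *krel, const char *mod_dir)
{
    int n;

    assert(buf != NULL && mod_path != NULL && krel != NULL);
    if (mod_dir == NULL)
        mod_dir = "extra";

    n = snprintf(buf, len, "%s/lib/modules/%s/%s", mod_path, krel, mod_dir);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

With an empty prefix and no INSTALL_MOD_DIR, the result is the “/lib/modules/$(KREL)/extra” directory that modprobe (through modules.dep) will later search.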

EXPORT_SYMBOL() and its history

EXPORT_SYMBOL() is a macro that exports a symbol (a function or a variable) so that other loadable modules can use it.

In older kernels (the 2.4 series), the inter_module_* functions were also used to share data across modules.
All inter_module_* APIs have since been deprecated and removed from the Linux kernel.

void inter_module_register(const char *string, struct module *module,
const void *data);
void inter_module_unregister(const char *string);
const void *inter_module_get(const char *string);
const void *inter_module_get_request(const char *string, const char *module);
void inter_module_put(const char *string);


signal_handling.ko spawns a kernel thread and loads another module.
The other module (send_signal.ko) sends a signal to the thread of the module (signal_handling.ko)
that loaded it.


#include <linux/init.h>
#include <linux/module.h>       /* Specifically, a module */
#include <linux/kernel.h>       /* We're doing kernel work */
#include <linux/kthread.h>
#include <linux/delay.h>        /* for msleep() */
#include <asm/siginfo.h>        /* for signals */
#include <linux/signal.h>
#include <linux/kmod.h>         /* for request_module() */

MODULE_LICENSE("GPL");
MODULE_AUTHOR("@Kashish_Bhatia");
MODULE_DESCRIPTION("dynamically loading module through kmod subsystem and "
                   "receiving signal from another module");

struct task_struct *k1;
struct task_struct *saved_task;

//pid of thread K1, exported so that send_signal.ko can signal it
pid_t kthread_pid;
EXPORT_SYMBOL(kthread_pid);

//Global vars
int val = 0;
int flag = 0;                   // a flag is set when val reached 10

//Thread func for k1
int inc_func(void *arg)
{
        int ret;
        saved_task = current;
        printk("%s : pid of current thread %u\n", current->comm, current->pid);
        kthread_pid = current->pid;

        //assumed (missing in the original listing): kernel threads ignore
        //signals unless they explicitly allow them
        allow_signal(SIGUSR1);

        //load module "send_signal.ko"
        ret = request_module("send_signal");
        if (ret != 0)
                printk("%s : unable to load module, Status=%d\n", current->comm, ret);

        while (!kthread_should_stop()) {
                ret = schedule_timeout(msecs_to_jiffies(1000));

                if (signal_pending(current)) {
                        printk("%s : signal is pending****\n", current->comm);
                        //assumed (missing in the original listing): clear the
                        //pending signal so that it is reported only once
                        flush_signals(current);
                }

                if (val <= 10 && !flag) {
                        printk("%s : current value is %d\n", current->comm, val);
                        if (val == 10) {
                                printk("%s : setting flag\n", current->comm);
                                flag = 1;
                        }
                        val++;  //assumed: the increment is missing in the original listing
                }
        }
        printk("%s : out of while loop, exiting \"%s\"\n", current->comm, __FUNCTION__);
        return 0;
}

int __init startup(void)
{
        k1 = kthread_run(inc_func, NULL, "K1");
        return 0;
}

void __exit cleanup(void)
{
        printk("cleaning module \n");
        kthread_stop(k1);
        printk("stopped kernel thread 1 \n");
}

module_init(startup);
module_exit(cleanup);



#include <linux/init.h>
#include <linux/module.h>       /* Specifically, a module */
#include <linux/kernel.h>       /* We're doing kernel work */
#include <linux/kthread.h>
#include <linux/delay.h>        /* for msleep() */
#include <asm/siginfo.h>        /* for signals */
#include <linux/signal.h>
#include <linux/kmod.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("@Kashish_Bhatia");
MODULE_DESCRIPTION("sending signal from 1 kthread to another"
                   " and using default signal handler");

struct task_struct *k2;

struct siginfo data_for_handler;

//signal descriptor
struct sigaction sig;

//Global vars
int val = 0;

int signal_receive_cnt = 0;     //max # of times signal should be serviced

extern pid_t kthread_pid;       //exported by signal_handling.ko
struct task_struct *target_kthread;
struct pid *pid_struct;

//Thread func for k2
int send_signal_to_task(void *arg)
{
        int ret;
        while (!kthread_should_stop()) {
                if (kthread_pid && signal_receive_cnt != 2) {
                        printk("%s : pid of kthread to which signal is sent = %u\n", current->comm, kthread_pid);
                        //TODO: siginfo is passed to default sighandler.
                        //Can't check the info, as i dont have access to default
                        //signal handler.
                        data_for_handler.si_code = SI_KERNEL;
                        data_for_handler.si_int = val;
                        printk("%s : sending signal to process with pid %u...\n", current->comm, kthread_pid);

                        //convert pid to task_struct
                        pid_struct = find_get_pid(kthread_pid);
                        target_kthread = pid_task(pid_struct, PIDTYPE_PID);

                        //send signal
                        ret = send_sig_info(SIGUSR1, &data_for_handler, target_kthread);
                        if (ret) {
                                //saved_task is never set in this module, so report the pid instead
                                printk("%s : Error sending signal to thread with pid:%d, status %d\n",
                                        current->comm, kthread_pid, ret);
                                //return ret;
                        } else {
                                //assumed: the body of this branch is missing in the original listing
                                signal_receive_cnt++;
                        }
                }
                //assumed: pause between iterations (linux/delay.h is included for msleep())
                msleep(1000);
        }
        printk("%s : exiting \"%s\"\n", current->comm, __FUNCTION__);
        return 0;
}

int __init startup(void)
{
        printk(KERN_ALERT "send_signal module is starting\n");
        k2 = kthread_run(send_signal_to_task, NULL, "K2");
        return 0;
}

void __exit cleanup(void)
{
        printk("cleaning module \n");
        kthread_stop(k2);
        printk("stopped kernel thread 2 \n");
}

module_init(startup);
module_exit(cleanup);



obj-m += send_signal.o
obj-m += signal_handling.o

INSTALLED_PATH=/lib/modules/$(shell uname -r)/extra/
clean-files := $(INSTALLED_PATH)/*.ko

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

modules_install:
	make -C /lib/modules/$(shell uname -r)/build INSTALL_MOD_DIR=extra M=$(PWD) modules_install

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Demystifying Linux Kernel Timers

Some basics

jiffies – a counter of timer interrupts (ticks) since boot; one jiffy is the time between two successive clock interrupts.
It is incremented at every timer interrupt.
It is a global variable in the kernel, declared volatile to avoid reading a stale
value from memory.
Defined in linux/jiffies.h

HZ – represents the clock interrupt frequency.
It's a macro set at kernel compile time; its value depends on the architecture and the kernel configuration.
Defined in linux/param.h

 Interval after which jiffies increments = 1000/HZ milliseconds

If HZ=100, the frequency of the timer interrupt is 100. In other words,
there will be 100 interrupts in 1 sec. That means “jiffies” is incremented
every 10 ms, so its value increases by 100 every second.

System uptime = jiffies/HZ seconds (approximately, since jiffies does not necessarily start counting from 0)

To convert seconds to jiffies, jiffies = seconds * HZ.
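
The arithmetic above is easy to check in plain userspace C. The helpers below are a sketch with invented names (tick_interval_ms(), seconds_to_ticks(), ticks_to_seconds()); they take HZ as a parameter instead of using the kernel macro, and they ignore counter wraparound.

```c
#include <assert.h>

/* Milliseconds between two timer ticks for a given tick rate (HZ). */
unsigned long tick_interval_ms(unsigned long hz)
{
    assert(hz > 0 && hz <= 1000);
    return 1000UL / hz;
}

/* Convert whole seconds to ticks: jiffies = seconds * HZ. */
unsigned long seconds_to_ticks(unsigned long seconds, unsigned long hz)
{
    assert(hz > 0);
    return seconds * hz;
}

/* Convert a tick count back to whole seconds: seconds = jiffies / HZ. */
unsigned long ticks_to_seconds(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);
    return ticks / hz;
}
```

With HZ=100, tick_interval_ms() gives 10 and seconds_to_ticks(1, 100) gives 100, matching the example above. In kernel code the ready-made helpers such as msecs_to_jiffies() should be preferred over hand-written arithmetic.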

Using timers

Linux provides the timer subsystem to delay the execution of a piece of work and
run it asynchronously. A timer may fire after
the process that scheduled it has already exited.
Timers are an underlying kernel facility used by APIs such as schedule_timeout().

Timer data structure

struct timer_list {
        struct list_head entry;                 /* entry in linked list of timers */
        unsigned long expires;                  /* expiration value, in jiffies */
        void (*function)(unsigned long);        /* the timer handler function */
        unsigned long data;                     /* lone argument to the handler */
        struct tvec_t_base_s *base;             /* internal timer field, do not touch */
};

What not to do while using timers, and some of their odd properties

1. Don’t pass automatic (stack) variables as timer arguments; they may be gone by the time the timer fires.
2. The timer function is called in interrupt context. Hence it cannot use any API that may sleep,
like kmalloc() with GFP_KERNEL, wait_event_*, semaphores, etc.
3. Timers have the nasty property of binding themselves to the CPU on which they are created.
4. There is no direct API for periodic timers. mod_timer() has to be called again each time the timer fires
to make it periodic.
5. schedule_timeout() and msleep() can’t be used in the timer function, since both sleep.
6. The “current” macro has no relevance in the timer function, as the timer function
executes in timer (interrupt) context, not on behalf of the process that armed the timer.
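
Point 4 is easier to see with a toy model. The following userspace sketch is not the kernel timer API: sim_timer, sim_mod_timer() and sim_run_timer() are hypothetical names, and time is just a counter that the caller advances. It shows that a timer is one-shot (it is disarmed before its handler runs) and becomes periodic only because the handler re-arms it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a one-shot timer (not the kernel API). */
struct sim_timer {
    unsigned long expires;              /* tick at which the timer fires */
    void (*function)(unsigned long);    /* handler */
    unsigned long data;                 /* argument passed to the handler */
    int pending;                        /* 1 while the timer is armed */
};

/* Arm or re-arm the timer; returns 1 if it was already pending, else 0. */
int sim_mod_timer(struct sim_timer *t, unsigned long expires)
{
    int was_pending;

    assert(t != NULL);
    was_pending = t->pending;
    t->expires = expires;
    t->pending = 1;
    return was_pending;
}

/* Fire the timer if it is due at tick "now"; it is disarmed before the handler runs. */
int sim_run_timer(struct sim_timer *t, unsigned long now)
{
    assert(t != NULL && t->function != NULL);
    if (!t->pending || now < t->expires)
        return 0;
    t->pending = 0;
    t->function(t->data);
    return 1;
}

/* State for an example periodic handler. */
struct periodic_state {
    struct sim_timer timer;
    unsigned long now;      /* current tick, advanced by the caller */
    unsigned int fired;     /* how many times the handler ran */
};

/* Counts a firing and re-arms the timer 5 ticks later, making it periodic. */
void periodic_handler(unsigned long data)
{
    struct periodic_state *s = (struct periodic_state *)data;

    s->fired++;
    sim_mod_timer(&s->timer, s->now + 5);
}
```

If periodic_handler() did not call sim_mod_timer(), the timer would fire exactly once. The pointer-to-unsigned-long cast mirrors how the data argument is passed to kernel timer handlers of this era; the model compares tick values directly and does not handle wraparound.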

Useful Timer APIs

1. init_timer(timer)
add_timer(timer) / add_timer_on(timer, CPU)
They are generally used together: init_timer() initializes the timer, and add_timer() or add_timer_on() activates it.

2. setup_timer(timer, callback_fn, cb_data)
It internally calls init_timer() and also sets the callback function and its argument.

3. del_timer(timer)
Deactivates the timer without waiting for the timer callback to complete if it is already running.
It is safe to call del_timer() inside the timer callback function.
This API works for both active and inactive (initialized but not pending) timers.
It returns 1 if the timer was active, else 0.

4. del_timer_sync(timer)
Deactivates the timer and waits (spins) until the timer callback completes if it is already running.
This API cannot be called from within the timer callback function, as that would lead to a deadlock.

del_singleshot_timer_sync() is the same as del_timer_sync().

When del_timer() returns, it guarantees only that the timer is no longer active (that is,
that it will not be executed in the future). On a multiprocessing machine, however, the
timer handler might already be executing on another processor. To deactivate the timer
and wait until a potentially executing handler for the timer exits, use del_timer_sync().
Unlike del_timer(), del_timer_sync() cannot be used from interrupt context.

5. try_to_del_timer_sync(timer)
Tries to deactivate the timer without waiting. If the timer callback is currently running on
another CPU, it leaves the timer alone and returns -1; otherwise it deactivates the timer like del_timer().

6. mod_timer(timer, expires)
It modifies the “expires” field of the timer and schedules it to fire at the new time.
It is safe to call mod_timer() on a deleted (inactive) timer. In that
case, it reactivates the timer.

7. mod_timer_pending(timer, expires)
It modifies the “expires” field only for timers that are already pending; unlike mod_timer(), it does not activate an inactive timer.

8. mod_timer_pinned(timer, expires)
It should be used along with add_timer_on(), to keep the timer pinned to a particular CPU.

9. set_timer_slack(timer, slack_hz)
It loosens the accuracy required of the “expires” field. With this API, the timer subsystem may
schedule the actual expiry anywhere between the time mod_timer() asks for and that time plus the slack.



/*
 * Key Takeaways:
 * 1. How to use init_timer() and add_timer()
 * 2. How/where to use add_timer_on() and mod_timer_pinned()
 * 3. How to get CPU id (smp_processor_id())
 */

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>         /* for kmalloc(), kfree() */
#include <linux/spinlock.h>
#include <linux/timer.h>        /* for timers apis */
#include <linux/delay.h>        /* for msleep() */
#include <linux/hardirq.h>      /* for in_interrupt(), in_atomic() , moved to preempt_mask.h, 3.0 onwards */

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Usage of init_timer(), add_timer(), add_timer_on() and mod_timer_pinned()");

struct timer_list timer;
struct timer_list timer2;
unsigned long *data;
DEFINE_SPINLOCK(data_lock);

void timer_func(unsigned long data)
{
        unsigned long *d = (unsigned long *)data;
        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        printk("%s : data = %lu...\n",__FUNCTION__, *d);
        if (in_interrupt())
                printk("%s : In interrupt context...\n", __FUNCTION__);

        if (in_atomic())
                printk("%s : In atomic context...\n", __FUNCTION__);

        //assumed: the locking and the decrement are missing in the original listing
        spin_lock(&data_lock);
        if ((*d) != 0) {
                //decrementing "data" faster
                (*d)--;
                mod_timer(&timer, jiffies + msecs_to_jiffies(500));
        }
        spin_unlock(&data_lock);
}

void timer2_func(unsigned long data)
{
        unsigned long *d = (unsigned long *)data;
        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        printk("%s : data = %lu***\n",__FUNCTION__, *d);

        if (in_interrupt())
                printk("%s : In interrupt context***\n", __FUNCTION__);

        if (in_atomic())
                printk("%s : In atomic context***\n", __FUNCTION__);

        //assumed: the locking and the decrement are missing in the original listing
        spin_lock(&data_lock);
        if ((*d) != 0) {
                (*d)--;
                //scheduling it again and pinning to same CPU
                mod_timer_pinned(&timer2, jiffies + msecs_to_jiffies(1000));
        }
        spin_unlock(&data_lock);
}

int __init start_module(void)
{
        data = kmalloc(sizeof(unsigned long), GFP_KERNEL);
        if (!data)
                return -ENOMEM;
        *data = 100;

        //create timer 1
        setup_timer(&timer, timer_func, (unsigned long)data);
        //set expiry time
        mod_timer(&timer, jiffies + msecs_to_jiffies(1000));

        //create timer 2
        init_timer(&timer2);
        timer2.function = timer2_func;
        timer2.expires = jiffies + msecs_to_jiffies(1000);
        timer2.data = (unsigned long)data;              //type-casting pointer to unsigned long
        //now schedule timer on specified CPU
        add_timer_on(&timer2, 1); //on CPU 1

        return 0;
}

void __exit stop_module(void)
{
        int i = 0;
        printk("%s() is running on CPU-%u\n", __FUNCTION__, smp_processor_id());
        //The loop bodies and the delete calls are incomplete in the original
        //listing; this version waits briefly and then always deletes the timers.
        while (timer_pending(&timer)) {
                printk("timer 1 is pending\n");
                if (i == 5)
                        break;
                msleep(100);
                i++;
        }
        printk("deleting timer 1\n");
        del_timer_sync(&timer);

        i = 0;
        while (timer_pending(&timer2)) {
                printk("timer 2 is pending\n");
                if (i == 5)
                        break;
                msleep(100);
                i++;
        }
        printk("deleting timer 2\n");
        del_timer_sync(&timer2);

        kfree(data);
        printk("exiting from module\n");
}

module_init(start_module);
module_exit(stop_module);