Tuesday, February 23, 2016

The beauty of FTRACE

The following kernel configuration options need to be enabled for Ftrace:
CONFIG_FUNCTION_TRACER
CONFIG_FUNCTION_GRAPH_TRACER
CONFIG_STACK_TRACER
CONFIG_DYNAMIC_FTRACE

Ftrace debugfs path:
[~]# cd /sys/kernel/debug/tracing
[tracing]#

Stack Tracing:
The stack tracer checks the size of the stack at every function call. If it is greater than the last recorded maximum, it records the stack trace and updates the maximum with the new size. To see the current maximum, look at the stack_max_size file.

[tracing]# echo 1 > /proc/sys/kernel/stack_tracer_enabled
[tracing]# cat stack_max_size
2928
[tracing]# cat stack_trace


List of available tracers:
[tracing]# cat available_tracers 
function_graph function sched_switch nop

Setting current_tracer:
[tracing]# echo function > current_tracer
[tracing]# cat current_tracer
function

Setting trace buffer size:
[tracing]# echo 50 > buffer_size_kb

Adding a module's functions to the ftrace filter:
[tracing]# echo ':mod:amdgpu' > set_ftrace_filter

Because '>' is used here, any existing entries in the filter are replaced, and all functions available in the amdgpu module are added for tracing.

Appending another module's functions to the ftrace filter:
[tracing]# echo ':mod:ttm' >> set_ftrace_filter

Note that '>>' is used here; it appends the ttm module's functions to the existing filter list.

Adding a set of functions that start with a specific name for tracing:
[tracing]# echo 'sched*' > set_ftrace_filter
[tracing]# echo 'schedule:traceoff' >> set_ftrace_filter

All functions whose names start with 'sched' are added for tracing; the second command installs a 'traceoff' filter command that disables tracing when schedule() is called.

Restricting tracing to a specific PID ($$ is the PID of the current shell):
[tracing]# echo $$ > set_ftrace_pid

Viewing the function graph for a particular function:
[tracing]# echo kfree > set_graph_function
[tracing]# echo function_graph > current_tracer
[tracing]# cat trace

This displays the function call flow for kfree().

   
Removing unwanted functions that contain a specific name:
[tracing]# echo '!*lock*' >> set_ftrace_filter


The '!' symbol will remove functions listed in the filter file. As shown above, the '!' works with wildcards, but could also be used with a single function. Since '!' has special meaning in bash it must be wrapped with single quotes or bash will try to execute what follows it. Also note the '>>' is used. If you make the mistake of using '>' you will end up with no functions in the filter file.


References:
http://lwn.net/Articles/365835/  - ftrace part1
https://lwn.net/Articles/366796/  - ftrace part2
https://lwn.net/Articles/370423/  - ftrace secrets

Sunday, February 21, 2016

Spin-lock usage with respect to Process, Bottom Half and Top Half Context

For kernels compiled without CONFIG_SMP and without CONFIG_PREEMPT, spinlocks do not exist at all: when no one else can run at the same time, there is no reason to have a lock.

If the kernel is compiled without CONFIG_SMP, but CONFIG_PREEMPT is set, then spinlocks simply disable preemption, which is sufficient to prevent any races.

Linux guarantees that the same interrupt handler will not be re-entered.


spin_lock(lock):
=>  Acquire the spin lock

=> Under certain circumstances, it is not necessary to disable local interrupts. For example, most filesystems only access their data structures from process context and acquire their spinlocks by calling spin_lock(lock).

=> If another tasklet/timer wants to share data with your tasklet or timer, you will both need to use spin_lock() and spin_unlock() calls. spin_lock_bh() is unnecessary here, as you are already in a tasklet and no other bottom half will run at the same time on the same CPU.
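
A minimal sketch of this case, assuming a hypothetical counter shared by two tasklets (all names below are illustrative, not from any real driver):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(counter_lock);   /* hypothetical lock */
static unsigned long event_count;       /* hypothetical shared data */

/* Both functions run as tasklets (bottom-half context). On one CPU a
 * tasklet is never preempted by another bottom half, so the lock only
 * guards against the other tasklet running on a different CPU; plain
 * spin_lock() is enough. */
static void tasklet_a_fn(unsigned long data)
{
        spin_lock(&counter_lock);
        event_count++;
        spin_unlock(&counter_lock);
}

static void tasklet_b_fn(unsigned long data)
{
        spin_lock(&counter_lock);
        event_count += 2;
        spin_unlock(&counter_lock);
}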


spin_lock_irq(lock):
=> Disable interrupts on the local CPU
=> Acquire the spin lock

=> If code in process context is holding a spinlock and an interrupt handler on the same CPU attempts to acquire the same spinlock, it will spin forever. For this reason, the process-context side should take the lock with spin_lock_irq(), which disables local interrupts first.

=> Data shared between hard interrupt context and a softirq, tasklet, or process context needs to be protected with spin_lock_irq() in the non-interrupt path.
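
A sketch of the process-context side of this rule, assuming a hypothetical request queue that the device's interrupt handler also manipulates (names are illustrative):

#include <linux/spinlock.h>
#include <linux/list.h>

struct req {                            /* hypothetical request type */
        struct list_head node;
};

static DEFINE_SPINLOCK(queue_lock);
static LIST_HEAD(req_queue);

/* Process context: interrupts must be disabled before taking the lock,
 * otherwise the IRQ handler could fire on this CPU and spin forever on
 * queue_lock while we hold it. */
static void submit_req(struct req *r)
{
        spin_lock_irq(&queue_lock);
        list_add_tail(&r->node, &req_queue);
        spin_unlock_irq(&queue_lock);   /* lock dropped, IRQs re-enabled */
}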
 

spin_lock_irqsave(lock, flags):
=> Save the current interrupt state into flags
=> Disable interrupts on the local CPU
=> Acquire the spin lock

=> Use this locking technique when sharing data between two hard IRQ handlers (interrupt contexts).

=> The same code can be used inside a hard IRQ handler (where interrupts are already off) and in a softirq (where disabling IRQs is required).
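
A sketch of the irqsave variant; because the previous interrupt state is saved and restored, the same helper can be called from process context and from a hard IRQ handler (names are illustrative):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(state_lock);
static int device_state;                /* hypothetical shared state */

static void set_device_state(int new_state)
{
        unsigned long flags;

        /* Save the current IRQ state, disable IRQs locally, take the lock. */
        spin_lock_irqsave(&state_lock, flags);
        device_state = new_state;
        /* Release the lock and restore IRQs to whatever they were before. */
        spin_unlock_irqrestore(&state_lock, flags);
}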


spin_lock_bh(lock):
=> Disable softirqs (bottom halves) on the local CPU
=> acquire the spin lock

=> If a data structure is accessed only from process and bottom-half context, spin_lock_bh() can be used instead. This optimisation allows interrupts to come in while the spinlock is held, but doesn't allow bottom halves to run on exit from the interrupt routine; they will be deferred until the spin_unlock_bh().

=> If a tasklet or timer shares data with process context, only the process-context side needs spin_lock_bh()/spin_unlock_bh(); the tasklet or timer itself can use plain spin_lock()/spin_unlock(), since it cannot be preempted by process context or by another bottom half on the same CPU.
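
A sketch of this split, assuming a hypothetical statistics counter updated from a tasklet and read from process context:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(stats_lock);
static unsigned long rx_packets;        /* hypothetical shared counter */

/* Process context: keep bottom halves off this CPU while reading. */
static unsigned long read_rx_packets(void)
{
        unsigned long val;

        spin_lock_bh(&stats_lock);
        val = rx_packets;
        spin_unlock_bh(&stats_lock);
        return val;
}

/* Tasklet (bottom half): plain spin_lock() is enough here. */
static void rx_tasklet_fn(unsigned long data)
{
        spin_lock(&stats_lock);
        rx_packets++;
        spin_unlock(&stats_lock);
}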
 
 
Locking between instances of the same softirq sharing data:
The same softirq can run on other CPUs, so you can use a per-CPU array for better performance. If you're going so far as to use a softirq, you probably care enough about scalable performance to justify the extra complexity. You'll need to use spin_lock() and spin_unlock() for any data shared between CPUs.
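
A sketch of that split, assuming a hypothetical softirq action that keeps a lock-free per-CPU hit counter and only takes the lock for a genuinely shared total:

#include <linux/spinlock.h>
#include <linux/percpu.h>
#include <linux/interrupt.h>

static DEFINE_PER_CPU(unsigned long, local_hits);  /* per-CPU: no lock */
static DEFINE_SPINLOCK(total_lock);
static unsigned long global_total;                 /* shared across CPUs */

static void stats_softirq_action(struct softirq_action *h)
{
        this_cpu_inc(local_hits);       /* fast path, no locking needed */

        spin_lock(&total_lock);         /* shared data still needs the lock */
        global_total++;
        spin_unlock(&total_lock);
}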


Locking Between Hard IRQ and Softirqs/Tasklets:
If a hardware irq handler shares data with a softirq, you have two concerns. Firstly, the softirq processing can be interrupted by a hardware interrupt, and secondly, the critical region could be entered by a hardware interrupt on another CPU. This is where spin_lock_irq() is used. It is defined to disable interrupts on that cpu, then grab the lock. spin_unlock_irq() does the reverse.

The irq handler does not need to use spin_lock_irq(), because the softirq cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception would be if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that from interrupting us.
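
A sketch of both sides, assuming a hypothetical device whose IRQ handler hands work to a softirq through a shared list:

#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/list.h>

static DEFINE_SPINLOCK(work_lock);
static LIST_HEAD(work_list);            /* hypothetical shared list */

/* Hard IRQ handler: the softirq cannot run on this CPU while we run,
 * so plain spin_lock() is enough (and slightly faster). */
static irqreturn_t dev_irq_handler(int irq, void *dev_id)
{
        spin_lock(&work_lock);
        /* ... queue new work on work_list ... */
        spin_unlock(&work_lock);
        return IRQ_HANDLED;
}

/* Softirq side: disable local IRQs so the handler above cannot
 * interrupt us on this CPU while we hold the lock. */
static void dev_softirq_action(struct softirq_action *h)
{
        spin_lock_irq(&work_lock);
        /* ... drain work_list ... */
        spin_unlock_irq(&work_lock);
}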



Saturday, February 20, 2016

Deadlock Vs Livelock Vs Starvation

Deadlock: A situation in which two or more processes are unable to proceed because each is waiting for one of the others to do something.

For example, consider two processes, P1 and P2, and two resources, R1 and R2. Suppose that each process needs access to both resources to perform part of its function. Then it is possible to have the following situation: the OS assigns R1 to P2, and R2 to P1. Each process is waiting for one of the two resources. Neither will release the resource that it already owns until it has acquired the other resource and performed the function requiring both resources. The two processes are deadlocked.
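
A user-space sketch of that scenario in C with POSIX threads; p1/p2 stand in for P1/P2 and the mutexes r1/r2 for R1/R2 (all names hypothetical):

#include <pthread.h>

static pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;

static void *p1(void *arg)
{
        pthread_mutex_lock(&r2);        /* P1 is given R2 */
        pthread_mutex_lock(&r1);        /* ...then blocks waiting for R1 */
        pthread_mutex_unlock(&r1);
        pthread_mutex_unlock(&r2);
        return NULL;
}

static void *p2(void *arg)
{
        pthread_mutex_lock(&r1);        /* P2 is given R1 */
        pthread_mutex_lock(&r2);        /* ...then blocks waiting for R2 */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
}

If both threads grab their first mutex before the other releases, each blocks forever waiting for the other: a deadlock. Taking the two locks in the same order in both threads removes the cycle and hence the deadlock.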

Livelock: A situation in which two or more processes continuously change their states in response to changes in the other process(es) without doing any useful work.

For example, consider two processes, each waiting in a non-blocking manner for a resource the other holds. When each learns it cannot continue, it releases the resource it holds, sleeps for some time, reacquires its original resource, and again tries to acquire the resource held by the other process; the cycle then repeats. Since both processes keep changing state without making progress, this is a livelock.
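
The same retry-and-back-off pattern, sketched with POSIX threads and a non-blocking trylock (names hypothetical):

#include <pthread.h>
#include <unistd.h>

/* Each caller holds 'mine', tries 'other' without blocking, and on
 * failure releases, sleeps, and retries. Two threads doing this against
 * each other can loop forever without progressing: a livelock. */
static void try_to_use_both(pthread_mutex_t *mine, pthread_mutex_t *other)
{
        for (;;) {
                pthread_mutex_lock(mine);
                if (pthread_mutex_trylock(other) == 0) {
                        /* Got both resources: do the work, then stop. */
                        pthread_mutex_unlock(other);
                        pthread_mutex_unlock(mine);
                        return;
                }
                /* Other resource is busy: give up ours and retry later. */
                pthread_mutex_unlock(mine);
                usleep(1000);
        }
}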

Starvation: A situation in which a runnable process is overlooked indefinitely by the scheduler; although it is able to proceed, it is never chosen.

For example, consider three processes (P1, P2, P3), each of which requires periodic access to resource R. Consider the situation in which P1 is in possession of the resource, and both P2 and P3 are delayed, waiting for that resource. When P1 exits its critical section, either P2 or P3 should be allowed access to R. Assume that the OS grants access to P3 and that P1 again requires access before P3 completes its critical section. If the OS grants access to P1 after P3 has finished, and subsequently alternately grants access to P1 and P3, then P2 may be denied access to the resource indefinitely, even though there is no deadlock situation.