Documentation/locking/lglock.txt - linux - Git at Google

 lglock - local/global locks for mostly local access patterns
 ------------------------------------------------------------

 Origin: Nick Piggin's VFS scalability series introduced during
 	2.6.35++ [1] [2]
 Location: kernel/locking/lglock.c
 	include/linux/lglock.h
 Users: currently only the VFS and stop_machine related code

 Design Goal:
 ------------

 Improve scalability of globally used large data sets that are
 distributed over all CPUs as per_cpu elements.

 To manage global data structures that are partitioned over all CPUs
 as per_cpu elements but can be mostly handled by CPU local actions
 lglock will be used where the majority of accesses are cpu local
 reading and occasional cpu local writing with very infrequent
 global write access.


 * deal with things locally whenever possible
 	- very fast access to the local per_cpu data
 	- reasonably fast access to specific per_cpu data on a different
 	  CPU
 * while making global action possible when needed
 	- by expensive access to all CPUs locks - effectively
 	  resulting in a globally visible critical section.

 Design:
 -------

 Basically it is an array of per_cpu spinlocks with the
 lg_local_lock/unlock accessing the local CPUs lock object and the
 lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object
 the lg_local_lock has to disable preemption as migration protection so
 that the reference to the local CPUs lock does not go out of scope.
 Due to the lg_local_lock/unlock only touching cpu-local resources it
 is fast. Taking the local lock on a different CPU will be more
 expensive but still relatively cheap.

 One can relax the migration constraints by acquiring the current
 CPUs lock with lg_local_lock_cpu, remember the cpu, and release that
 lock at the end of the critical section even if migrated. This should
 give most of the performance benefits without inhibiting migration
 though needs careful considerations for nesting of lglocks and
 consideration of deadlocks with lg_global_lock.

 The lg_global_lock/unlock locks all underlying spinlocks of all
 possible CPUs (including those off-line). The preemption disable/enable
 are needed in the non-RT kernels to prevent deadlocks like:

                      on cpu 1

               task A          task B
          lg_global_lock
            got cpu 0 lock
                  <<<< preempt <<<<
                          lg_local_lock_cpu for cpu 0
                            spin on cpu 0 lock

 On -RT this deadlock scenario is resolved by the arch_spin_locks in the
 lglocks being replaced by rt_mutexes which resolve the above deadlock
 by boosting the lock-holder.


 Implementation:
 ---------------

 The initial lglock implementation from Nick Piggin used some complex
 macros to generate the lglock/brlock in lglock.h - they were later
 turned into a set of functions by Andi Kleen [7]. The change to functions
 was motivated by the presence of multiple lock users and also by them
 being easier to maintain than the generating macros. This change to
 functions is also the basis to eliminated the restriction of not
 being initializeable in kernel modules (the remaining problem is that
 locks are not explicitly initialized - see lockdep-design.txt)

 Declaration and initialization:
 -------------------------------

   #include <linux/lglock.h>

   DEFINE_LGLOCK(name)
   or:
   DEFINE_STATIC_LGLOCK(name);

   lg_lock_init(&name, "lockdep_name_string");

   on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note
   also that as of 3.18-rc6 all declaration in use are of the _STATIC_
   variant (and it seems that the non-static was never in use).
   lg_lock_init is initializing the lockdep map only.

 Usage:
 ------

 From the locking semantics it is a spinlock. It could be called a
 locality aware spinlock. lg_local_* behaves like a per_cpu
 spinlock and lg_global_* like a global spinlock.
 No surprises in the API.

   lg_local_lock(*lglock);
      access to protected per_cpu object on this CPU
   lg_local_unlock(*lglock);

   lg_local_lock_cpu(*lglock, cpu);
      access to protected per_cpu object on other CPU cpu
   lg_local_unlock_cpu(*lglock, cpu);

   lg_global_lock(*lglock);
      access all protected per_cpu objects on all CPUs
   lg_global_unlock(*lglock);

   There are no _trylock variants of the lglocks.

 Note that the lg_global_lock/unlock has to iterate over all possible
 CPUs rather than the actually present CPUs or a CPU could go off-line
 with a held lock [4] and that makes it very expensive. A discussion on
 these issues can be found at [5]

 Constraints:
 ------------

   * currently the declaration of lglocks in kernel modules is not
     possible, though this should be doable with little change.
   * lglocks are not recursive.
   * suitable for code that can do most operations on the CPU local
     data and will very rarely need the global lock
   * lg_global_lock/unlock is *very* expensive and does not scale
   * on UP systems all lg_* primitives are simply spinlocks
   * in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
     does not change the tasks state while sleeping [6].
   * in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
     is downgraded to a migrate_disable/enable, the other
     preempt_disable/enable are downgraded to barriers [6].
     The deadlock noted for non-RT above is resolved due to rt_mutexes
     boosting the lock-holder in this case which arch_spin_locks do
     not do.

 lglocks were designed for very specific problems in the VFS and probably
 only are the right answer in these corner cases. Any new user that looks
 at lglocks probably wants to look at the seqlock and RCU alternatives as
 her first choice. There are also efforts to resolve the RCU issues that
 currently prevent using RCU in place of view remaining lglocks.

 Note on brlock history:
 -----------------------

 The 'Big Reader' read-write spinlocks were originally introduced by
 Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
 later were introduced by the VFS scalability patch set in 2.6 series
 again as the "big reader lock" brlock [2] variant of lglock which has
 been replaced by seqlock primitives or by RCU based primitives in the
 3.13 kernel series as was suggested in [3] in 2003. The brlock was
 entirely removed in the 3.13 kernel series.

 Link: 1 http://lkml.org/lkml/2010/8/2/81
 Link: 2 http://lwn.net/Articles/401738/
 Link: 3 http://lkml.org/lkml/2003/3/9/205
 Link: 4 https://lkml.org/lkml/2011/8/24/185
 Link: 5 http://lkml.org/lkml/2011/12/18/189
 Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
         patch series - lglocks-rt.patch.patch
 Link: 7 http://lkml.org/lkml/2012/3/5/26
	lglock - local/global locks for mostly local access patterns
	------------------------------------------------------------

	Origin: Nick Piggin's VFS scalability series introduced during
	2.6.35++ [1] [2]
	Location: kernel/locking/lglock.c
	include/linux/lglock.h
	Users: currently only the VFS and stop_machine related code

	Design Goal:
	------------

	Improve scalability of globally used large data sets that are
	distributed over all CPUs as per_cpu elements.

	To manage global data structures that are partitioned over all CPUs
	as per_cpu elements but can be mostly handled by CPU local actions
	lglock will be used where the majority of accesses are cpu local
	reading and occasional cpu local writing with very infrequent
	global write access.


	* deal with things locally whenever possible
	- very fast access to the local per_cpu data
	- reasonably fast access to specific per_cpu data on a different
	CPU
	* while making global action possible when needed
	- by expensive access to all CPUs locks - effectively
	resulting in a globally visible critical section.

	Design:
	-------

	Basically it is an array of per_cpu spinlocks with the
	lg_local_lock/unlock accessing the local CPUs lock object and the
	lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object
	the lg_local_lock has to disable preemption as migration protection so
	that the reference to the local CPUs lock does not go out of scope.
	Due to the lg_local_lock/unlock only touching cpu-local resources it
	is fast. Taking the local lock on a different CPU will be more
	expensive but still relatively cheap.

	One can relax the migration constraints by acquiring the current
	CPUs lock with lg_local_lock_cpu, remember the cpu, and release that
	lock at the end of the critical section even if migrated. This should
	give most of the performance benefits without inhibiting migration
	though needs careful considerations for nesting of lglocks and
	consideration of deadlocks with lg_global_lock.

	The lg_global_lock/unlock locks all underlying spinlocks of all
	possible CPUs (including those off-line). The preemption disable/enable
	are needed in the non-RT kernels to prevent deadlocks like:

	on cpu 1

	task A task B
	lg_global_lock
	got cpu 0 lock
	<<<< preempt <<<<
	lg_local_lock_cpu for cpu 0
	spin on cpu 0 lock

	On -RT this deadlock scenario is resolved by the arch_spin_locks in the
	lglocks being replaced by rt_mutexes which resolve the above deadlock
	by boosting the lock-holder.


	Implementation:
	---------------

	The initial lglock implementation from Nick Piggin used some complex
	macros to generate the lglock/brlock in lglock.h - they were later
	turned into a set of functions by Andi Kleen [7]. The change to functions
	was motivated by the presence of multiple lock users and also by them
	being easier to maintain than the generating macros. This change to
	functions is also the basis to eliminated the restriction of not
	being initializeable in kernel modules (the remaining problem is that
	locks are not explicitly initialized - see lockdep-design.txt)

	Declaration and initialization:
	-------------------------------

	#include <linux/lglock.h>

	DEFINE_LGLOCK(name)
	or:
	DEFINE_STATIC_LGLOCK(name);

	lg_lock_init(&name, "lockdep_name_string");

	on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note
	also that as of 3.18-rc6 all declaration in use are of the _STATIC_
	variant (and it seems that the non-static was never in use).
	lg_lock_init is initializing the lockdep map only.

	Usage:
	------

	From the locking semantics it is a spinlock. It could be called a
	locality aware spinlock. lg_local_* behaves like a per_cpu
	spinlock and lg_global_* like a global spinlock.
	No surprises in the API.

	lg_local_lock(*lglock);
	access to protected per_cpu object on this CPU
	lg_local_unlock(*lglock);

	lg_local_lock_cpu(*lglock, cpu);
	access to protected per_cpu object on other CPU cpu
	lg_local_unlock_cpu(*lglock, cpu);

	lg_global_lock(*lglock);
	access all protected per_cpu objects on all CPUs
	lg_global_unlock(*lglock);

	There are no _trylock variants of the lglocks.

	Note that the lg_global_lock/unlock has to iterate over all possible
	CPUs rather than the actually present CPUs or a CPU could go off-line
	with a held lock [4] and that makes it very expensive. A discussion on
	these issues can be found at [5]

	Constraints:
	------------

	* currently the declaration of lglocks in kernel modules is not
	possible, though this should be doable with little change.
	* lglocks are not recursive.
	* suitable for code that can do most operations on the CPU local
	data and will very rarely need the global lock
	* lg_global_lock/unlock is very expensive and does not scale
	* on UP systems all lg_* primitives are simply spinlocks
	* in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
	does not change the tasks state while sleeping [6].
	* in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
	is downgraded to a migrate_disable/enable, the other
	preempt_disable/enable are downgraded to barriers [6].
	The deadlock noted for non-RT above is resolved due to rt_mutexes
	boosting the lock-holder in this case which arch_spin_locks do
	not do.

	lglocks were designed for very specific problems in the VFS and probably
	only are the right answer in these corner cases. Any new user that looks
	at lglocks probably wants to look at the seqlock and RCU alternatives as
	her first choice. There are also efforts to resolve the RCU issues that
	currently prevent using RCU in place of view remaining lglocks.

	Note on brlock history:
	-----------------------

	The 'Big Reader' read-write spinlocks were originally introduced by
	Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
	later were introduced by the VFS scalability patch set in 2.6 series
	again as the "big reader lock" brlock [2] variant of lglock which has
	been replaced by seqlock primitives or by RCU based primitives in the
	3.13 kernel series as was suggested in [3] in 2003. The brlock was
	entirely removed in the 3.13 kernel series.

	Link: 1 http://lkml.org/lkml/2010/8/2/81
	Link: 2 http://lwn.net/Articles/401738/
	Link: 3 http://lkml.org/lkml/2003/3/9/205
	Link: 4 https://lkml.org/lkml/2011/8/24/185
	Link: 5 http://lkml.org/lkml/2011/12/18/189
	Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
	patch series - lglocks-rt.patch.patch
	Link: 7 http://lkml.org/lkml/2012/3/5/26