Documentation/trace/rv/monitor_synthesis.rst - linux - Git at Google

 Runtime Verification Monitor Synthesis
 ======================================

 The starting point for the application of runtime verification (RV) techniques
 is the *specification* or *modeling* of the desired (or undesired) behavior
 of the system under scrutiny.

 The formal representation needs to be then *synthesized* into a *monitor*
 that can then be used in the analysis of the trace of the system. The
 *monitor* connects to the system via an *instrumentation* that converts
 the events from the *system* to the events of the *specification*.


 In Linux terms, the runtime verification monitors are encapsulated inside
 the *RV monitor* abstraction. The RV monitor includes a set of instances
 of the monitor (per-cpu monitor, per-task monitor, and so on), the helper
 functions that glue the monitor to the system reference model, and the
 trace output as a reaction to event parsing and exceptions, as depicted
 below::

  Linux  +----- RV Monitor ----------------------------------+ Formal
   Realm |                                                   |  Realm
   +-------------------+     +----------------+     +-----------------+
   |   Linux kernel    |     |     Monitor    |     |     Reference   |
   |     Tracing       |  -> |   Instance(s)  | <-  |       Model     |
   | (instrumentation) |     | (verification) |     | (specification) |
   +-------------------+     +----------------+     +-----------------+
          |                          |                       |
          |                          V                       |
          |                     +----------+                 |
          |                     | Reaction |                 |
          |                     +--+--+--+-+                 |
          |                        |  |  |                   |
          |                        |  |  +-> trace output ?  |
          +------------------------|--|----------------------+
                                   |  +----> panic ?
                                   +-------> <user-specified>

 RV monitor synthesis
 --------------------

 The synthesis of a specification into the Linux *RV monitor* abstraction is
 automated by the rvgen tool and the header file containing common code for
 creating monitors. The header files are:

   * rv/da_monitor.h for deterministic automaton monitor.
   * rv/ltl_monitor.h for linear temporal logic monitor.

 rvgen
 -----

 The rvgen utility converts a specification into the C presentation and creating
 the skeleton of a kernel monitor in C.

 For example, it is possible to transform the wip.dot model present in
 [1] into a per-cpu monitor with the following command::

   $ rvgen monitor -c da -s wip.dot -t per_cpu

 This will create a directory named wip/ with the following files:

 - wip.h: the wip model in C
 - wip.c: the RV monitor

 The wip.c file contains the monitor declaration and the starting point for
 the system instrumentation.

 Similarly, a linear temporal logic monitor can be generated with the following
 command::

   $ rvgen monitor -c ltl -s pagefault.ltl -t per_task

 This generates pagefault/ directory with:

 - pagefault.h: The Buchi automaton (the non-deterministic state machine to
   verify the specification)
 - pagefault.c: The skeleton for the RV monitor

 Monitor header files
 --------------------

 The header files:

 - `rv/da_monitor.h` for deterministic automaton monitor
 - `rv/ltl_monitor` for linear temporal logic monitor

 include common macros and static functions for implementing *Monitor
 Instance(s)*.

 The benefits of having all common functionalities in a single header file are
 3-fold:

   - Reduce the code duplication;
   - Facilitate the bug fix/improvement;
   - Avoid the case of developers changing the core of the monitor code to
     manipulate the model in a (let's say) non-standard way.

 rv/da_monitor.h
 +++++++++++++++

 This initial implementation presents three different types of monitor instances:

 - ``#define DECLARE_DA_MON_GLOBAL(name, type)``
 - ``#define DECLARE_DA_MON_PER_CPU(name, type)``
 - ``#define DECLARE_DA_MON_PER_TASK(name, type)``

 The first declares the functions for a global deterministic automata monitor,
 the second for monitors with per-cpu instances, and the third with per-task
 instances.

 In all cases, the 'name' argument is a string that identifies the monitor, and
 the 'type' argument is the data type used by rvgen on the representation of
 the model in C.

 For example, the wip model with two states and three events can be
 stored in an 'unsigned char' type. Considering that the preemption control
 is a per-cpu behavior, the monitor declaration in the 'wip.c' file is::

   DECLARE_DA_MON_PER_CPU(wip, unsigned char);

 The monitor is executed by sending events to be processed via the functions
 presented below::

   da_handle_event_$(MONITOR_NAME)($(event from event enum));
   da_handle_start_event_$(MONITOR_NAME)($(event from event enum));
   da_handle_start_run_event_$(MONITOR_NAME)($(event from event enum));

 The function ``da_handle_event_$(MONITOR_NAME)()`` is the regular case where
 the event will be processed if the monitor is processing events.

 When a monitor is enabled, it is placed in the initial state of the automata.
 However, the monitor does not know if the system is in the *initial state*.

 The ``da_handle_start_event_$(MONITOR_NAME)()`` function is used to notify the
 monitor that the system is returning to the initial state, so the monitor can
 start monitoring the next event.

 The ``da_handle_start_run_event_$(MONITOR_NAME)()`` function is used to notify
 the monitor that the system is known to be in the initial state, so the
 monitor can start monitoring and monitor the current event.

 Using the wip model as example, the events "preempt_disable" and
 "sched_waking" should be sent to monitor, respectively, via [2]::

   da_handle_event_wip(preempt_disable_wip);
   da_handle_event_wip(sched_waking_wip);

 While the event "preempt_enabled" will use::

   da_handle_start_event_wip(preempt_enable_wip);

 To notify the monitor that the system will be returning to the initial state,
 so the system and the monitor should be in sync.

 rv/ltl_monitor.h
 ++++++++++++++++
 This file must be combined with the $(MODEL_NAME).h file (generated by `rvgen`)
 to be complete. For example, for the `pagefault` monitor, the `pagefault.c`
 source file must include::

   #include "pagefault.h"
   #include <rv/ltl_monitor.h>

 (the skeleton monitor file generated by `rvgen` already does this).

 `$(MODEL_NAME).h` (`pagefault.h` in the above example) includes the
 implementation of the Buchi automaton - a non-deterministic state machine that
 verifies the LTL specification. While `rv/ltl_monitor.h` includes the common
 helper functions to interact with the Buchi automaton and to implement an RV
 monitor. An important definition in `$(MODEL_NAME).h` is::

   enum ltl_atom {
       LTL_$(FIRST_ATOMIC_PROPOSITION),
       LTL_$(SECOND_ATOMIC_PROPOSITION),
       ...
       LTL_NUM_ATOM
   };

 which is the list of atomic propositions present in the LTL specification
 (prefixed with "LTL\_" to avoid name collision). This `enum` is passed to the
 functions interacting with the Buchi automaton.

 While generating code, `rvgen` cannot understand the meaning of the atomic
 propositions. Thus, that task is left for manual work. The recommended pratice
 is adding tracepoints to places where the atomic propositions change; and in the
 tracepoints' handlers: the Buchi automaton is executed using::

   void ltl_atom_update(struct task_struct *task, enum ltl_atom atom, bool value)

 which tells the Buchi automaton that the atomic proposition `atom` is now
 `value`. The Buchi automaton checks whether the LTL specification is still
 satisfied, and invokes the monitor's error tracepoint and the reactor if
 violation is detected.

 Tracepoints and `ltl_atom_update()` should be used whenever possible. However,
 it is sometimes not the most convenient. For some atomic propositions which are
 changed in multiple places in the kernel, it is cumbersome to trace all those
 places. Furthermore, it may not be important that the atomic propositions are
 updated at precise times. For example, considering the following linear temporal
 logic::

   RULE = always (RT imply not PAGEFAULT)

 This LTL states that a real-time task does not raise page faults. For this
 specification, it is not important when `RT` changes, as long as it has the
 correct value when `PAGEFAULT` is true.  Motivated by this case, another
 function is introduced::

   void ltl_atom_fetch(struct task_struct *task, struct ltl_monitor *mon)

 This function is called whenever the Buchi automaton is triggered. Therefore, it
 can be manually implemented to "fetch" `RT`::

   void ltl_atom_fetch(struct task_struct *task, struct ltl_monitor *mon)
   {
       ltl_atom_set(mon, LTL_RT, rt_task(task));
   }

 Effectively, whenever `PAGEFAULT` is updated with a call to `ltl_atom_update()`,
 `RT` is also fetched. Thus, the LTL specification can be verified without
 tracing `RT` everywhere.

 For atomic propositions which act like events, they usually need to be set (or
 cleared) and then immediately cleared (or set). A convenient function is
 provided::

   void ltl_atom_pulse(struct task_struct *task, enum ltl_atom atom, bool value)

 which is equivalent to::

   ltl_atom_update(task, atom, value);
   ltl_atom_update(task, atom, !value);

 To initialize the atomic propositions, the following function must be
 implemented::

   ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation)

 This function is called for all running tasks when the monitor is enabled. It is
 also called for new tasks created after the enabling the monitor. It should
 initialize as many atomic propositions as possible, for example::

   void ltl_atom_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation)
   {
       ltl_atom_set(mon, LTL_RT, rt_task(task));
       if (task_creation)
           ltl_atom_set(mon, LTL_PAGEFAULT, false);
   }

 Atomic propositions not initialized by `ltl_atom_init()` will stay in the
 unknown state until relevant tracepoints are hit, which can take some time. As
 monitoring for a task cannot be done until all atomic propositions is known for
 the task, the monitor may need some time to start validating tasks which have
 been running before the monitor is enabled. Therefore, it is recommended to
 start the tasks of interest after enabling the monitor.

 Final remarks
 -------------

 With the monitor synthesis in place using the header files and
 rvgen, the developer's work should be limited to the instrumentation
 of the system, increasing the confidence in the overall approach.

 [1] For details about deterministic automata format and the translation
 from one representation to another, see::

   Documentation/trace/rv/deterministic_automata.rst

 [2] rvgen appends the monitor's name suffix to the events enums to
 avoid conflicting variables when exporting the global vmlinux.h
 use by BPF programs.
	Runtime Verification Monitor Synthesis
	======================================

	The starting point for the application of runtime verification (RV) techniques
	is the specification or modeling of the desired (or undesired) behavior
	of the system under scrutiny.

	The formal representation needs to be then synthesized into a monitor
	that can then be used in the analysis of the trace of the system. The
	monitor connects to the system via an instrumentation that converts
	the events from the system to the events of the specification.


	In Linux terms, the runtime verification monitors are encapsulated inside
	the RV monitor abstraction. The RV monitor includes a set of instances
	of the monitor (per-cpu monitor, per-task monitor, and so on), the helper
	functions that glue the monitor to the system reference model, and the
	trace output as a reaction to event parsing and exceptions, as depicted
	below::

	Linux +----- RV Monitor ----------------------------------+ Formal
	Realm \| \| Realm
	+-------------------+ +----------------+ +-----------------+
	\| Linux kernel \| \| Monitor \| \| Reference \|
	\| Tracing \| -> \| Instance(s) \| <- \| Model \|
	\| (instrumentation) \| \| (verification) \| \| (specification) \|
	+-------------------+ +----------------+ +-----------------+
	\| \| \|
	\| V \|
	\| +----------+ \|
	\| \| Reaction \| \|
	\| +--+--+--+-+ \|
	\| \| \| \| \|
	\| \| \| +-> trace output ? \|
	+------------------------\|--\|----------------------+
	\| +----> panic ?
	+-------> <user-specified>

	RV monitor synthesis
	--------------------

	The synthesis of a specification into the Linux RV monitor abstraction is
	automated by the rvgen tool and the header file containing common code for
	creating monitors. The header files are:

	* rv/da_monitor.h for deterministic automaton monitor.
	* rv/ltl_monitor.h for linear temporal logic monitor.

	rvgen
	-----

	The rvgen utility converts a specification into the C presentation and creating
	the skeleton of a kernel monitor in C.

	For example, it is possible to transform the wip.dot model present in
	[1] into a per-cpu monitor with the following command::

	$ rvgen monitor -c da -s wip.dot -t per_cpu

	This will create a directory named wip/ with the following files:

	- wip.h: the wip model in C
	- wip.c: the RV monitor

	The wip.c file contains the monitor declaration and the starting point for
	the system instrumentation.

	Similarly, a linear temporal logic monitor can be generated with the following
	command::

	$ rvgen monitor -c ltl -s pagefault.ltl -t per_task

	This generates pagefault/ directory with:

	- pagefault.h: The Buchi automaton (the non-deterministic state machine to
	verify the specification)
	- pagefault.c: The skeleton for the RV monitor

	Monitor header files
	--------------------

	The header files:

	- `rv/da_monitor.h` for deterministic automaton monitor
	- `rv/ltl_monitor` for linear temporal logic monitor

	include common macros and static functions for implementing *Monitor
	Instance(s)*.

	The benefits of having all common functionalities in a single header file are
	3-fold:

	- Reduce the code duplication;
	- Facilitate the bug fix/improvement;
	- Avoid the case of developers changing the core of the monitor code to
	manipulate the model in a (let's say) non-standard way.

	rv/da_monitor.h
	+++++++++++++++

	This initial implementation presents three different types of monitor instances:

	- ``#define DECLARE_DA_MON_GLOBAL(name, type)``
	- ``#define DECLARE_DA_MON_PER_CPU(name, type)``
	- ``#define DECLARE_DA_MON_PER_TASK(name, type)``

	The first declares the functions for a global deterministic automata monitor,
	the second for monitors with per-cpu instances, and the third with per-task
	instances.

	In all cases, the 'name' argument is a string that identifies the monitor, and
	the 'type' argument is the data type used by rvgen on the representation of
	the model in C.

	For example, the wip model with two states and three events can be
	stored in an 'unsigned char' type. Considering that the preemption control
	is a per-cpu behavior, the monitor declaration in the 'wip.c' file is::

	DECLARE_DA_MON_PER_CPU(wip, unsigned char);

	The monitor is executed by sending events to be processed via the functions
	presented below::

	da_handle_event_$(MONITOR_NAME)($(event from event enum));
	da_handle_start_event_$(MONITOR_NAME)($(event from event enum));
	da_handle_start_run_event_$(MONITOR_NAME)($(event from event enum));

	The function ``da_handle_event_$(MONITOR_NAME)()`` is the regular case where
	the event will be processed if the monitor is processing events.

	When a monitor is enabled, it is placed in the initial state of the automata.
	However, the monitor does not know if the system is in the initial state.

	The ``da_handle_start_event_$(MONITOR_NAME)()`` function is used to notify the
	monitor that the system is returning to the initial state, so the monitor can
	start monitoring the next event.

	The ``da_handle_start_run_event_$(MONITOR_NAME)()`` function is used to notify
	the monitor that the system is known to be in the initial state, so the
	monitor can start monitoring and monitor the current event.

	Using the wip model as example, the events "preempt_disable" and
	"sched_waking" should be sent to monitor, respectively, via [2]::

	da_handle_event_wip(preempt_disable_wip);
	da_handle_event_wip(sched_waking_wip);

	While the event "preempt_enabled" will use::

	da_handle_start_event_wip(preempt_enable_wip);

	To notify the monitor that the system will be returning to the initial state,
	so the system and the monitor should be in sync.

	rv/ltl_monitor.h
	++++++++++++++++
	This file must be combined with the $(MODEL_NAME).h file (generated by `rvgen`)
	to be complete. For example, for the `pagefault` monitor, the `pagefault.c`
	source file must include::

	#include "pagefault.h"
	#include <rv/ltl_monitor.h>

	(the skeleton monitor file generated by `rvgen` already does this).

	`$(MODEL_NAME).h` (`pagefault.h` in the above example) includes the
	implementation of the Buchi automaton - a non-deterministic state machine that
	verifies the LTL specification. While `rv/ltl_monitor.h` includes the common
	helper functions to interact with the Buchi automaton and to implement an RV
	monitor. An important definition in `$(MODEL_NAME).h` is::

	enum ltl_atom {
	LTL_$(FIRST_ATOMIC_PROPOSITION),
	LTL_$(SECOND_ATOMIC_PROPOSITION),
	...
	LTL_NUM_ATOM
	};

	which is the list of atomic propositions present in the LTL specification
	(prefixed with "LTL\_" to avoid name collision). This `enum` is passed to the
	functions interacting with the Buchi automaton.

	While generating code, `rvgen` cannot understand the meaning of the atomic
	propositions. Thus, that task is left for manual work. The recommended pratice
	is adding tracepoints to places where the atomic propositions change; and in the
	tracepoints' handlers: the Buchi automaton is executed using::

	void ltl_atom_update(struct task_struct *task, enum ltl_atom atom, bool value)

	which tells the Buchi automaton that the atomic proposition `atom` is now
	`value`. The Buchi automaton checks whether the LTL specification is still
	satisfied, and invokes the monitor's error tracepoint and the reactor if
	violation is detected.

	Tracepoints and `ltl_atom_update()` should be used whenever possible. However,
	it is sometimes not the most convenient. For some atomic propositions which are
	changed in multiple places in the kernel, it is cumbersome to trace all those
	places. Furthermore, it may not be important that the atomic propositions are
	updated at precise times. For example, considering the following linear temporal
	logic::

	RULE = always (RT imply not PAGEFAULT)

	This LTL states that a real-time task does not raise page faults. For this
	specification, it is not important when `RT` changes, as long as it has the
	correct value when `PAGEFAULT` is true. Motivated by this case, another
	function is introduced::

	void ltl_atom_fetch(struct task_struct task, struct ltl_monitor mon)

	This function is called whenever the Buchi automaton is triggered. Therefore, it
	can be manually implemented to "fetch" `RT`::

	void ltl_atom_fetch(struct task_struct task, struct ltl_monitor mon)
	{
	ltl_atom_set(mon, LTL_RT, rt_task(task));
	}

	Effectively, whenever `PAGEFAULT` is updated with a call to `ltl_atom_update()`,
	`RT` is also fetched. Thus, the LTL specification can be verified without
	tracing `RT` everywhere.

	For atomic propositions which act like events, they usually need to be set (or
	cleared) and then immediately cleared (or set). A convenient function is
	provided::

	void ltl_atom_pulse(struct task_struct *task, enum ltl_atom atom, bool value)

	which is equivalent to::

	ltl_atom_update(task, atom, value);
	ltl_atom_update(task, atom, !value);

	To initialize the atomic propositions, the following function must be
	implemented::

	ltl_atoms_init(struct task_struct task, struct ltl_monitor mon, bool task_creation)

	This function is called for all running tasks when the monitor is enabled. It is
	also called for new tasks created after the enabling the monitor. It should
	initialize as many atomic propositions as possible, for example::

	void ltl_atom_init(struct task_struct task, struct ltl_monitor mon, bool task_creation)
	{
	ltl_atom_set(mon, LTL_RT, rt_task(task));
	if (task_creation)
	ltl_atom_set(mon, LTL_PAGEFAULT, false);
	}

	Atomic propositions not initialized by `ltl_atom_init()` will stay in the
	unknown state until relevant tracepoints are hit, which can take some time. As
	monitoring for a task cannot be done until all atomic propositions is known for
	the task, the monitor may need some time to start validating tasks which have
	been running before the monitor is enabled. Therefore, it is recommended to
	start the tasks of interest after enabling the monitor.

	Final remarks
	-------------

	With the monitor synthesis in place using the header files and
	rvgen, the developer's work should be limited to the instrumentation
	of the system, increasing the confidence in the overall approach.

	[1] For details about deterministic automata format and the translation
	from one representation to another, see::

	Documentation/trace/rv/deterministic_automata.rst

	[2] rvgen appends the monitor's name suffix to the events enums to
	avoid conflicting variables when exporting the global vmlinux.h
	use by BPF programs.