Latest update Sep. 6, 2019

This is an account for the most significant parts of my research. I have left out some supervised work where my contribution has been minor.

This is what I did for my PhD. The systolic array is a concept invented by H.T. Kung shortly before 1980. It was born out of two observations: (1) the laws of nature would eventually limit the speed of single processors, leaving parallelisation as the only way to increase hardware performance, and (2) the cost of communication between parallel processors would depend strongly on the physical distance between them.

Systolic arrays are thus highly regular, special-purpose processor arrays where the simple processors are connected to their nearest neighbours only. This gives rise to various mesh-connected structures, not unlike modern mesh-connected many-core processors. Each processor is specialised to perform some simple operation, like the SAXPY (multiply-and-add) operation. The basic systolic array is synchronous: at each clock cycle, each processor reads its inputs, performs its operation, and writes the result(s) to its outputs. Thus the term "systolic": data is "pumped" through the array, in analogy with the heart pumping blood through the body. There is no control except the timing and location of the appearance of the input data. Systolic arrays are therefore well suited to highly predictable and regular parallel algorithms. Such algorithms are found especially within linear algebra, but systolic algorithms have also been proposed for tasks like signal processing, sorting, and computation of transitive closure.

How can systolic algorithms be systematically designed? And how can a systolic array be designed that implements a particular systolic algorithm? An approach to this is to consider the problem as a DAG scheduling problem, where the nodes are computations and the edges represent data dependencies where some output from a computation is used as input to another. Each node will then be assigned a time when it is executed, and a location (processor) where this takes place: these assignments are called space-time mappings. The assignment is done subject to causality constraints (data must be produced before being used), communication constraints (communicated data must have enough time to propagate through the network from its production to its use(s)), and possible other constraints. The non-causal constraints can either arise from an existing systolic array, or provide design constraints on a systolic array to be synthesised: in the latter case, an array supporting the space-time mapping can be obtained through a projection along the time axis.

An explicit representation of DAG and space-time mapping would be prohibitively large for the kinds of massively parallel computations typically considered for systolic arrays. Fortunately these algorithms and mappings tend to have a regular structure, which makes it possible to have compact symbolic representations. Very common is to describe the steps in the computation by some recursion equation where the recursion argument is an integer vector ranging over an index set that is represented by a polyhedron. As the dimensionality of the index set often is low, like 2 or 3, these polyhedra will provide a very compact representation.

Also the space-time formed by time and processor coordinates will typically be an integer vector space in two or three dimensions. If we restrict the space-time mappings to linear mappings from index sets to space-time, then the design problem amounts to finding a matrix with a low number of elements that represents a mapping satisfying the constraints. This is typically a tractable problem, and one can also search for optimal solutions with respect to some optimality criterion, like total execution time or processor utilization.
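As an illustrative sketch (the recurrence, its dependencies, and the candidate mapping below are made up for the example), checking a linear space-time mapping against the causality and communication constraints amounts to a simple test over the dependence edges:

```python
# Index set for a small matrix-vector product recurrence
# y(i,j) = y(i,j-1) + a(i,j) * x(j), with the partial sums travelling
# along j and x travelling along i (dependencies are illustrative).
N = 4
index_set = [(i, j) for i in range(N) for j in range(N)]
deps = ([((i, j - 1), (i, j)) for (i, j) in index_set if j > 0]
        + [((i - 1, j), (i, j)) for (i, j) in index_set if i > 0])

# A candidate linear space-time mapping: time t = i + j, processor p = i.
def t(v): return v[0] + v[1]
def p(v): return v[0]

def valid_mapping(t, p, deps):
    """Causality: data must be produced strictly before it is used.
    Communication: nearest-neighbour data must have time to propagate."""
    for src, dst in deps:
        delay = t(dst) - t(src)
        if delay < 1 or abs(p(dst) - p(src)) > delay:
            return False
    return True
```

Projecting the index set along the time axis of a valid mapping then yields the processor array; here `p` maps the 4×4 index set onto 4 processors.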

My contributions included a formalization of the approach, giving a formal semantics to computations as DAGs where the nodes are substitutions and the edges represent data dependencies defined by variables being assigned by some substitution and used by others. Each substitution can be seen as a single-assignment (corresponding to a "let" declaration in a functional language). I also treated the problem of finding a suitable space-time mapping as a constraint solving problem, formulating various constraints as arithmetic constraints on the coefficients of the mapping. Finding a time-optimal systolic array then turns into solving a set of ILP problems. I also considered extensions to "almost-systolic" processor arrays defined by non-linear space-time mappings, including efficient designs for FFT, and I investigated another extension of the approach with more detailed constraints, allowing good utilization of processing elements with pipelined functional units.

- Björn Lisper. *Synthesis of Synchronous Systems by Static Scheduling in Space-Time*, vol. 362 of Lecture Notes in Comput. Sci., Springer-Verlag, Heidelberg, May 1989. (Ph.D. thesis)
- Björn Lisper. Synthesis and equivalence of concurrent systems. *Theoret. Comput. Sci.*, 58:183-199, 1988.
- Björn Lisper. Synthesis of time-optimal systolic arrays with cells with inner structure. *J. Parallel Distrib. Comput.*, 10(2):182-187, Oct. 1990.
- Björn Lisper. Linear programming methods for minimizing execution time of indexed computations. In *Proc. Int. Workshop on Compilers for Parallel Computers*, pages 131-142, Dec. 1990.
- Björn Lisper. Preconditioning index set transformations for time-optimal affine scheduling. *Algorithmica*, 15:193-203, 1996.

As a spinoff to my work formalizing the computation model for systolic algorithms, I developed a single-assignment semantics for imperative programs. In this semantics, each execution of an ordinary assignment generates a new single-assignment. The result of a program execution is a set of single-assignments (i.e., a DAG where edges represent data dependencies). This kind of semantics could be used to support parallelization, and I showed that the semantics is consistent with the traditional Floyd-Hoare semantics for a simple While language. There are also some relations to the SSA form used by some optimizing compilers; however, my semantics generates single-assignments in a dynamic, possibly data-dependent manner, whereas the SSA form is static.
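A minimal sketch of the idea (the program and the naming scheme are my own for illustration): execute a small imperative program, generating one fresh single-assignment per executed assignment, with the used versions making the data-dependency DAG explicit:

```python
def run_single_assignment(n):
    """Run s := 0; i := 0; while i < n: { s := s + i; i := i + 1 },
    generating a fresh single-assignment variable x_k for each
    executed assignment to x."""
    version = {"s": 0, "i": 0}
    env = {}
    trace = []  # (new single-assignment variable, value, used versions)

    def assign(var, value, used):
        version[var] += 1
        trace.append((f"{var}_{version[var]}", value, used))
        env[var] = value

    assign("s", 0, [])
    assign("i", 0, [])
    while env["i"] < n:
        # the 'used' lists name the versions read, i.e. the DAG edges
        assign("s", env["s"] + env["i"],
               [f"s_{version['s']}", f"i_{version['i']}"])
        assign("i", env["i"] + 1, [f"i_{version['i']}"])
    return env["s"], trace
```

Note that the number of generated single-assignments depends on the input `n`, illustrating the dynamic, data-dependent character of this semantics as opposed to the static SSA form.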

In some unpublished work (a technical report) I also considered how to go in the other direction: to synthesize an imperative straight-line program, with concurrent assignments, from a space-time mapping of a static set of single-assignments. The synthesis includes the selection of imperative program variables and the allocation of single-assignment variables, representing values, to program variables representing memory locations. This is a kind of static memory allocation, similar to register allocation. In this work I proved that the synthesized imperative program is semantically consistent with the set of single-assignments, and I also showed how to select program variables and assign single-assignment variables to them in a way that minimizes the memory requirements.

- Björn Lisper. Single assignment semantics for imperative programs. In E. Odijk, M. Rem, and J.-C. Syre, editors, *Proc. PARLE'89 vol. II: Parallel Languages*, pages 321-334, Berlin, June 1989. Volume 366 of *Lecture Notes in Comput. Sci.*, Springer-Verlag.
- Björn Lisper. *Formal derivation of concurrent assignments from scheduled single assignments*. Res. Rep. R91:15, Swedish Institute of Computer Science, Oct. 1991.

Unfold-fold program transformations can be used for many optimizations of functional programs. My interest was sparked by the possibility of using such transformations to simplify the control structure of a recursive program, yielding a program that is easier to implement in hardware. I thus first considered a technique called "total unfolding", where a recursive program with partially given input data is unfolded until termination, yielding a big static expression. This static expression can be seen as a DAG similar to the single-assignment DAGs that I had considered earlier. This allows the use of succinct, parameterized recursive function definitions that, given enough information about the arguments, can be unfolded into a static expression amenable to an efficient hardware implementation. To support this I proved a theorem relating termination of the symbolic unfolding to termination of executions of the original recursive program.
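For instance (a toy sketch of the idea, representing symbolic expressions as strings), totally unfolding a power function with the exponent statically known but the base left symbolic yields a static expression:

```python
def unfold_pow(x_sym, n):
    """Totally unfold pow(x, n) = 1 if n == 0 else x * pow(x, n - 1),
    with n known and x symbolic, into a static expression."""
    if n == 0:
        return "1"
    # unfold one recursion step; recursion terminates since n is known
    return f"({x_sym} * {unfold_pow(x_sym, n - 1)})"
```

The resulting expression, e.g. `unfold_pow("x", 3)`, contains no control structure and could be mapped directly to a combinational circuit.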

Later I became interested in the correctness of unfold-fold transformations. Basically all previous results in this area concern deterministic programs. I considered recursive, possibly nondeterministic programs where I formalized the programming model using rewrite systems (the "Recursive Applicative Program Scheme" approach). In order to cover languages with variable binding constructs, like the lambda calculus, I considered higher order rewrite systems, and to capture possibly infinite computations I based the semantics on infinitary term rewriting. In this setting I was able to prove a theorem that gives some conditions guaranteeing that an unfold-fold transformation will produce a semantically equivalent, possibly nondeterministic, recursive program.

- Björn Lisper. Total unfolding: Theory and applications. *J. Functional Programming*, 4(4):479-498, Oct. 1994.
- Björn Lisper. Computing in unpredictable environments: Semantics, reduction strategies, and program transformations. *Theoret. Comput. Sci.*, 190(1):61-85, Jan. 1998.
- Björn Lisper. Infinite unfolding and transformations of nondeterministic programs. *Fundamenta Informaticae*, 66(4):415-439, Apr. 2005.

Systolic array synthesis from high-level recursive algorithm specifications, which I studied for my PhD, works only for a quite limited class of algorithms. A natural next step was to consider more general high-level parallel programming models. The data parallel model soon caught my interest: it expresses parallelism implicitly, through a number of primitives that work in parallel over data structures, rather than explicitly through some kind of parallel threads or processes. If these primitives are designed correctly then the resulting data parallel language will be free from race conditions and deadlocks. Another interesting property is that the judicious use of data parallel primitives can yield very succinct and clear code, similar to the use of higher-order functions like map and fold in functional programming (and, indeed, the data parallel primitives are closely related to these).

My interest quickly turned to the latter aspect. Existing data-parallel
and array languages often had flaws, or undue restrictions, that hampered
their use for writing high-level parallel algorithm specifications. Could we
design a simple, generic, yet expressive data parallel model from first
principles? Data parallel data structures are often indexed, such as
arrays. My idea was to consider such structures as partial functions from
some index set, and define data parallel operations as operations on
functions. This led to the data field model, where the data structures, or
data fields, are pairs of functions and bounds. The functions map from some
index type (often integers, or tuples of integers), and the bounds are
representations of index sets enclosing the domains of the data fields. For
instance ordinary arrays can be seen as data fields where the function maps
from integers, and the bound consists of a pair of integers representing an
interval. However, data fields are not restricted to regular arrays. Instead
the data field model requires index types and bounds to have some particular
properties: these properties are needed for making the data parallel
operations on data fields well-defined. They can be seen as an abstract
interface, making the data field model highly generic. The data field model
also includes a variable-binding construct called "forall-abstraction",
*forall x -> e*, which parallels lambda-abstraction for functions but
defines a data field rather than a function.
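A sketch of the model (the class and function names are my own for illustration, and the bound is simplified to a one-dimensional integer interval): a data field as a pair of a function and a bound, with forall-abstraction and an elementwise operation whose result bound is the intersection of the argument bounds:

```python
class DataField:
    """A data field: a function paired with a bound (here an integer
    interval (lo, hi)) enclosing its domain."""
    def __init__(self, fun, bound):
        self.fun, self.bound = fun, bound

    def values(self):
        lo, hi = self.bound
        return [self.fun(i) for i in range(lo, hi + 1)]

def forall(bound, body):
    """forall x -> e: like lambda-abstraction, but defines a data
    field rather than a function."""
    return DataField(body, bound)

def zip_add(a, b):
    """Elementwise addition of two data fields; the result bound is
    the intersection of the argument bounds."""
    lo = max(a.bound[0], b.bound[0])
    hi = min(a.bound[1], b.bound[1])
    return DataField(lambda i: a.fun(i) + b.fun(i), (lo, hi))
```

An ordinary dense array then corresponds to a data field whose bound is exactly its index interval, while sparse or more irregular structures can use other bound representations behind the same interface.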

The functional nature of data fields makes them suitable for inclusion in functional languages. Another advantage is that pure functional languages disallow functions with side-effects, an absence that the data field model requires. To demonstrate this we made a proof-of-concept implementation of a dialect of Haskell, "Data Field Haskell", where the nhc13 compiler for Haskell 1.3 was extended with an instance of data fields consisting of sparse and dense multi-dimensional arrays. The implementation also included forall-abstraction, with rules for how to derive the bounds of the resulting data field.

- Björn Lisper, and Per Hammarlund. On the relation between functional and data parallel programming languages. In *Proc. Sixth Conference on Functional Programming Languages and Computer Architecture*, pages 210-222. ACM Press, June 1993.
- Björn Lisper, and Jean-François Collard. Extent analysis of data fields. In B. Le Charlier, editor, *Proc. International Symposium on Static Analysis (SAS)*, vol. 864 of Lecture Notes in Comput. Sci., pages 208-222, Namur, Sept. 1994. Springer-Verlag.
- Björn Lisper. Data fields. In *Proc. International Workshop on Generic Programming*, Marstrand, Sweden, June 1998.
- Claes Thornberg, and Björn Lisper. Elemental function overloading in explicitly typed languages. In M. Mohnen and P. Koopman, editors, *Proc. 12th International Workshop on Implementation of Functional Languages*, pages 31-46, Aachen, Germany, Sept. 2000.
- Jonas Holmerin, and Björn Lisper. Data Field Haskell. In G. Hutton, editor, *Proc. Fourth Haskell Workshop*, pages 106-117, Montréal, Canada, Sept. 2000.

The Worst-Case Execution Time, or WCET, of a program is the longest time it
takes to run the program without interruptions, or other disturbances from
the environment. WCET analysis aims to estimate the WCET. Very often the
WCET estimate is required to be *safe*, which means that it will never
underestimate the real WCET. Provably safe WCET analyses are invariably
static analyses formulated in some formal framework, like abstract
interpretation.

In order to obtain a reasonably scalable safe analysis, the analysis is often broken down into three phases:

- A low-level, or microarchitectural analysis of the timing effects of the hardware including possible influences from caches, pipelines, etc. This analysis is typically used to produce safe, local WCET estimates for basic blocks or similar.
- A high-level, or flow analysis, aiming to produce upper loop iteration bounds, infeasible path information, etc. This information is most often in the form of linear arithmetic constraints on the execution counts for basic blocks.
- A calculation phase where the information from the first two phases is combined to produce a safe WCET estimate. The state-of-practice is to use an Integer Linear Programming (ILP) solver for this purpose (the so-called "IPET" approach).
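The calculation phase can be sketched as follows (a toy CFG with made-up block costs; real tools hand the objective and flow constraints to an ILP solver, while this sketch simply enumerates the bounded execution counts):

```python
# Block WCETs from the (hypothetical) low-level analysis, in cycles,
# for a program of the shape: entry; loop body (bounded); exit.
costs = {"entry": 5, "body": 20, "exit": 3}

def ipet_wcet(loop_bound):
    """IPET: maximize sum(cost_b * x_b) subject to the flow constraints
    x_entry = x_exit = 1 and x_body <= loop_bound (from flow analysis)."""
    best = 0
    for x_body in range(loop_bound + 1):
        counts = {"entry": 1, "body": x_body, "exit": 1}
        best = max(best, sum(costs[b] * counts[b] for b in costs))
    return best
```

Infeasible-path information from the flow analysis enters the same way, as further linear constraints that shrink the feasible set of execution counts.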

When I took up my position at MDH, I switched to the research area of WCET analysis. Unlike my previous research this was more of a team effort, with team members including Andreas Ermedahl, Jan Gustafsson, Christer Sandberg, Linus Källberg, Stefan Bygde, and Andreas Gustavsson. We have mainly targeted the flow analysis, which has often been overlooked, and we have developed a number of approaches to automated flow analysis, mainly based on abstract interpretation:

- Jan Gustafsson's Abstract Execution (AE), which is similar to symbolic execution but instead of using a constraint solver executes abstract states according to the abstract transfer functions of the program statements. The method uses a kind of instrumentation to successively gather information about possible program flows, like upper and lower loop iteration bounds and complex infeasible path constraints, during the AE. AE can be seen as a very context-sensitive value analysis that avoids the use of widening: this is good for the precision, but introduces the risk of possible non-termination which then has to be dealt with, e.g., through timeouts.
- A fast but less precise method for loop bounds estimation, which relies on the fact that for a terminating loop the same memory state cannot be reached twice in the same program point within the loop. Thus a unique memory state must be reached for each loop iteration, implying that the maximal number of loop iterations must be bounded by the maximal number of different memory states. This number can be safely estimated using a value analysis by conventional abstract interpretation, followed by a count of the elements in the set of memory states that is abstracted by the abstract state.
- A method for parametric WCET analysis. This was the topic of Stefan Bygde's PhD, starting off from some of my ideas. WCET analysis usually computes a single number, whereas parametric WCET analysis computes a formula for the WCET bound as a function of some unknown inputs to the program. This is useful for programs where the execution time depends strongly on some inputs: a conventional WCET analysis must then still produce a number bounding all possible execution times, which can lead to large overestimations. Our method uses a conventional microarchitectural analysis, and a symbolic flow analysis that builds on the "quick-and-dirty" approach above using Halbwachs' polyhedral abstract domain for the value analysis. Since this domain is relational, it can compute polyhedra whose linear constraints contain parameters. Symbolic counting techniques exist that derive a symbolic formula for the number of integer points in the polyhedron. This produces a symbolic bound for the number of memory states, which then also bounds the number of loop iterations. As for the symbolic WCET bound calculation, the original idea was to use Feautrier's Parametric Integer Programming (PIP). While possible in principle, we soon found that in practice this method had very poor scalability. To remedy this Stefan Bygde developed a more approximate and less complex calculation method, which in most cases gives acceptable precision.
- A method for WCET analysis of thread-parallel programs with shared memory and locks. This analysis was developed by Andreas Gustavsson for his PhD. The method is a generalization of Abstract Execution to handle the concurrent execution of several threads, including possible race conditions, deadlocks, etc. A major part of the work was to develop a formalization of the algorithm, including a proof of soundness. The work also included a prototype implementation for a small model language with parallel threads, shared memory, and locks.
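The loop bounds method in the second item above can be sketched like this (a toy illustration where a single loop counter is abstracted by an integer interval, and the state count is the size of the interval's concretization):

```python
def interval_join(a, b):
    """Least upper bound of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def loop_bound(init, step, guard_hi):
    """Bound the iterations of: for (i = init; i < guard_hi; i += step)
    by counting the memory states possible at the loop head. Safe,
    but the interval abstraction may overestimate."""
    iv = (init, init)          # abstract value of i at the loop head
    while True:
        lo, hi = iv
        g_lo, g_hi = lo, min(hi, guard_hi - 1)   # filter by the loop guard
        if g_lo > g_hi:
            break                                # loop body unreachable
        nxt = interval_join(iv, (g_lo + step, g_hi + step))
        if nxt == iv:
            break                                # fixpoint reached
        iv = nxt
    # count the states satisfying the guard: a bound on the iterations
    lo, hi = iv[0], min(iv[1], guard_hi - 1)
    return max(hi - lo + 1, 0)
```

For a stride-2 loop the bound is a safe overestimate, since the interval domain cannot express that the counter stays even.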

In addition to flow analysis, we have also developed some methods for approximate hybrid analysis that combine timing measurements with static analysis techniques. Such methods are not safe in general, but can sometimes still be useful in less safety-critical applications. Both our methods aim to replace the microarchitectural analysis with timing measurements from which simple timing models can be built:

- The first method uses timing measurements to estimate the WCETs of the basic blocks for some given program, for a subsequent IPET calculation of the WCET estimate. Rather than a fine-grain instrumentation to directly measure the execution times on basic block level, which can be very difficult, the method uses end-to-end timing measurements from which local WCET estimates for the basic blocks are derived by a form of model identification. Instead of using linear regression, which is sure to underestimate some local WCETs, the model identification uses linear programming: although still not safe, this reduces the risk of underestimation.
- The second method derives simple, approximate timing models for some given combination of compiler, and target hardware. These timing models can be used for an approximate WCET analysis on source code level at early stages of program development. The method works by identifying costs for the instructions in a virtual machine: first, a number of "training programs" are compiled and run on the target system, and the execution times are recorded. Then these programs are translated to the abstract machine instead, and run with the same inputs while recording how many times each virtual machine instruction is executed. If the training programs are selected well then timing costs for these instructions can be derived using linear regression or some other optimization technique, like simulated annealing. A static WCET analysis, approximating the WCET for any program being compiled and run on the target, can now be done by translating the program to the abstract machine and performing the WCET analysis for this platform, using the timing costs for the virtual machine instructions.
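The model identification step of the first method can be sketched as follows (the measurements are made up; a real implementation would use an LP solver rather than this grid search, but the constraint structure is the same):

```python
import itertools

# End-to-end measurements: (execution counts per basic block, time).
measurements = [
    ({"A": 1, "B": 0}, 10),
    ({"A": 0, "B": 1}, 7),
    ({"A": 1, "B": 1}, 15),
]

def identify(grid=range(0, 21)):
    """Find per-block costs such that the model never underestimates
    any measured end-to-end time, minimizing total overestimation."""
    best, best_over = None, None
    for cA, cB in itertools.product(grid, repeat=2):
        model = lambda x: cA * x["A"] + cB * x["B"]
        if all(model(x) >= t for x, t in measurements):
            over = sum(model(x) - t for x, t in measurements)
            if best is None or over < best_over:
                best, best_over = (cA, cB), over
    return best
```

Plain least-squares regression on the same data could place some block costs below measured times; the inequality constraints are what reduce the risk of underestimation.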

Much of our work in WCET analysis is implemented in our tool SWEET (SWEdish Execution Time tool). SWEET is a tool mainly for flow analysis, and it implements both Gustafsson's AE and the more approximate method. It has a rich "flow fact" language for expressing program flow constraints. The computed flow facts can be context-sensitive, and they can express a multitude of flow constraints such as upper and lower loop iteration bounds, infeasible path constraints, etc. SWEET also implements several other supporting analyses, such as dataflow analysis (Reaching Definitions), static backwards program slicing, and a conventional value analysis by abstract interpretation. It also computes call graphs and control flow graphs, and it can compute approximate (possibly unsafe) WCET estimates using the simple timing models mentioned above.

SWEET analyses the "ALF" format. ALF is an intermediate format designed to represent both source code (such as C) and low-level code. Programs can be analysed by translating them into ALF and mapping the analysis results back to the original program. A translator for C exists, as well as a more experimental translator for PowerPC binaries.

A spinoff of our WCET analysis research is the "Mälardalen Benchmarks", a benchmark suite that has become widely used for experimental evaluations in WCET analysis research as well as in more general embedded systems research. This is a collection of simple C programs that we gathered, organised, documented, and made easily available on the web. We still keep the benchmarks online, although we don't actively maintain the benchmark suite anymore. A successor to our benchmark suite is the TACLeBench benchmark suite, which is still actively maintained.

- Jan Gustafsson, Andreas Ermedahl, Christer Sandberg, and Björn Lisper. Automatic derivation of loop bounds and infeasible paths for WCET analysis using abstract execution. In *Proc. 27th IEEE Real-Time Systems Symposium (RTSS'06)*, Dec. 2006.
- Christer Sandberg, Andreas Ermedahl, Jan Gustafsson, and Björn Lisper. Faster WCET flow analysis by program slicing. In *Proc. ACM SIGPLAN Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'06)*, pages 103-112, June 2006.
- Andreas Ermedahl, Christer Sandberg, Jan Gustafsson, Stefan Bygde, and Björn Lisper. Loop bound analysis based on a combination of program slicing, abstract interpretation, and invariant analysis. In C. Rochange, editor, *Proc. 7th International Workshop on Worst-Case Execution Time Analysis (WCET'2007)*, Pisa, Italy, July 2007.
- Björn Lisper. Fully automatic, parametric worst-case execution time analysis. In J. Gustafsson, editor, *Proc. Third International Workshop on Worst-Case Execution Time (WCET) Analysis*, pages 77-80, Porto, July 2003.
- Stefan Bygde, Andreas Ermedahl, and Björn Lisper. An efficient algorithm for parametric WCET calculation. *Journal of Systems Architecture*, 57:614-624, 2011.
- Björn Lisper, and Marcelo Santos. Model identification for WCET analysis. In *Proc. 15th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'09)*, pages 55-64, San Francisco, CA, Apr. 2009.
- Andreas Gustavsson, Jan Gustafsson, and Björn Lisper. Timing analysis of parallel software using abstract execution. In K. L. McMillan and X. Rival, editors, *Proc. 15th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI'14)*, volume 8318 of Lecture Notes in Computer Science, pages 79-97, San Diego, CA, Jan. 2014. Springer.
- Peter Altenbernd, Jan Gustafsson, Björn Lisper, and Friedhelm Stappert. Early execution time-estimation through automatically generated timing models. *Real-Time Systems*, 52(6):731-760, Nov. 2016.
- Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper. The Mälardalen WCET benchmarks - past, present and future. In B. Lisper, editor, *Proc. 10th International Workshop on Worst-Case Execution Time Analysis (WCET'2010)*, pages 137-147, Brussels, Belgium, July 2010. OCG.
- Björn Lisper. SWEET - a tool for WCET flow analysis (extended abstract). In T. Margaria and B. Steffen, editors, *Proc. 6th International Symposium on Leveraging Applications of Formal Methods (ISOLA'14)*, volume 8803 of Lecture Notes in Computer Science, pages 482-485, Corfu, Greece, Oct. 2014. Springer-Verlag.

This work was a spinoff from our work regarding parametric WCET analysis. An important requirement on numeric abstract domains, when used for analysis of safety-critical programs, is that the analysis is sound also in the presence of numbers of bounded size. For instance, wrap-arounds must be handled correctly. For relational domains, like the polyhedral domain used in our symbolic flow analysis, this is non-trivial. Our contribution was to improve the precision significantly, by some quite simple means, for a polyhedral analysis by Simon and King that is sound for numbers of bounded size.

- Stefan Bygde, Björn Lisper, and Niklas Holsti. Improved precision in polyhedral analysis with wrapping. *Science of Computer Programming*, 133:74-87, Jan. 2017.

Related to WCET analysis, my PhD student Xavier Vera (main driver), Jingling Xue, and I investigated how to increase the predictability of the data cache contents for real-time tasks without sacrificing too much performance. The idea was to use dynamic cache locking, keeping the caches unlocked for code sections with single-path code and input-independent memory accesses, and otherwise locking the cache. With this discipline the cache contents will be known at all stages of the computation, which, for instance, simplifies WCET analysis a lot. We also considered optimizations such as preloading the cache, cache partitioning to make it possible to use the technique for preemptively scheduled tasks, and compiler optimizations like loop tiling to reduce the memory footprint in systems with cache partitioning. Overall the experimental results were very good, with only small degradations in performance.

- Xavier Vera, Jingling Xue, and Björn Lisper. Data cache locking for tight timing calculations. *ACM Trans. on Embedded Computing Sys.*, 7(1):1-38, Dec. 2007.

Related to the above, in collaboration with Raimund Kirner and Peter Puschner from TU Vienna, we developed a simple analysis to safely identify code parts that are single-path, i.e., code where the conditions that control the execution of the code are independent of input data. The analysis is formulated in the framework of abstract interpretation, and we provide a proof of soundness.

- Jan Gustafsson, Björn Lisper, Raimund Kirner, and Peter Puschner. Code analysis for temporal predictability. *Real-Time Systems*, 32:253-277, 2006.

Originally my interest in program slicing was to use it to speed up our WCET flow analysis, by excluding the analysis of code that surely cannot influence the program flow. This led to an interest in the technique itself, including joint work with my PhD student Husni Khanfar regarding demand-driven approaches to slicing, and joint theoretical work with my colleague Abu Naser Masud on the correctness of inter-procedural slicing.
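The core of backward slicing can be sketched on a straight-line program (a toy model of my own: each statement is a pair of the variable it defines and the set of variables it uses, the criterion is a variable at the end, and control flow and procedures are omitted):

```python
def backward_slice(stmts, criterion):
    """Return the indices of the statements that the value of
    `criterion` at the end of the program may depend on."""
    relevant = {criterion}   # variables still relevant, walking backwards
    in_slice = []
    for idx in range(len(stmts) - 1, -1, -1):
        lhs, uses = stmts[idx]
        if lhs in relevant:
            relevant.discard(lhs)    # this definition kills the variable
            relevant |= uses         # ...and makes its operands relevant
            in_slice.append(idx)
    return sorted(in_slice)

program = [
    ("a", set()),      # a := input()
    ("b", {"a"}),      # b := a + 1
    ("c", set()),      # c := 0        (irrelevant to d)
    ("d", {"b"}),      # d := b * 2
]
```

In WCET flow analysis the criterion is the set of variables controlling branches and loop exits, so everything outside the slice can be ignored by the analysis.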

- Björn Lisper, Abu Naser Masud, and Husni Khanfar. Static backward demand-driven slicing. In *Proc. ACM SIGPLAN-SIGACT Symposium on Partial Evaluation and Program Manipulation (PEPM'15)*, pages 115-126. ACM, Jan. 2015.
- Husni Khanfar, Björn Lisper, and Abu Naser Masud. Static backward program slicing for safety critical systems. In J. A. de la Puente and T. Vardanega, editors, *Proc. 20th International Conference on Reliable Software Technologies - Ada-Europe 2015*, volume 9111 of Lecture Notes in Computer Science, pages 50-65, Madrid, Spain, June 2015. Springer.

This work is still under development. I am interested in how slicing can be used to make software testing more efficient by enabling a better selection of test cases, or to reduce the search space in search-based testing. I also collaborate with Mehrdad Saadatmand and Pasqualina Potena at RISE/SICS, and Birgitta Lindström at HiS, on the use of slicing to reduce the number of equivalent mutants in mutation testing.

- Björn Lisper, Birgitta Lindström, Pasqualina Potena, Mehrdad Saadatmand, and Markus Bohlin. Targeted mutation: Efficient mutation analysis for testing non-functional properties. In *Proc. ICST Workshop on Testing Extra-Functional Properties and Quality Characteristics of Software Systems*. IEEE, Mar. 2017. (Position paper)

TADL2 is a language for expressing timing constraints. It has been developed in the automotive domain, and an early version provides the basis for the Autosar Timing Extensions. TADL2 is based on events, which basically are increasing sequences of times (called event occurrences). TADL2 constraints can concern single events, defining what makes an event periodic, sporadic, etc., or involve several events and express timing relations between them. An example is a delay constraint that says "for every occurrence of event s, an occurrence of event r must appear within a certain time". In all, TADL2 specifies 17 constraints of these kinds. These constraints can be used to specify various requirements on the timing.
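For example (a sketch in my own notation, not TADL2 syntax), a delay constraint over two events given as finite sequences of occurrence times can be checked like this:

```python
def satisfies_delay(stimulus, response, lower, upper):
    """Every occurrence of the stimulus event must be matched by an
    occurrence of the response event within [lower, upper] time units."""
    return all(any(lower <= t_r - t_s <= upper for t_r in response)
               for t_s in stimulus)
```

A periodicity constraint would, in the same spirit, require consecutive occurrence times to differ by the period, within some jitter window.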

My first contribution was to define a formal semantics for TADL2, together with Johan Nordlander. The semantics is based on a logic that basically is a first-order logic with relational constraints on arithmetical expressions, extended with sets of times and some operations on these. Each TADL2 constraint is given a precise definition in this logic.

I then considered whether some combinations of TADL2 constraints could be verified automatically. In the automotive domain it is not unusual to implement time-critical software through periodically triggered real-time tasks, which naturally gives rise to periodic events describing the timing of the actions of these tasks. A type of formula that then becomes interesting is "if events e, e', ... are periodic, will they then also fulfil some relational constraint R(e, e', ...)?" I came up with a transformation on the formulas specifying this implication, which replaces the event occurrences in the constraint R with arithmetic variables ranging over the time windows specified by the respective periodicity constraints. In many cases this transformation results in a Presburger formula. I then proved that the transformation is applicable and correct for a majority of the TADL2 constraints. As Presburger logic is decidable, this gives a method to automatically find out whether a set of periodic events satisfies other timing constraints. Interestingly, there are methods for solving Presburger formulas that are symbolic in that they can return a formula with free variables, rather than just deciding whether a closed formula is true or false. This makes it possible, for instance, to leave certain parameters open in a TADL2 constraint and derive a formula expressing for which values of the parameters the constraint is satisfied. I made some experiments with the iscc Presburger calculator, which uses such a method, showing that the method really works, at least for moderately small examples.

- Björn Lisper and Johan Nordlander. A simple and flexible timing constraint logic. In T. Margaria and B. Steffen, editors, *Proc. 5th International Symposium on Leveraging Applications of Formal Methods (ISOLA'12)*, Lecture Notes in Comput. Sci., Heraklion, Crete, Oct. 2012.
- Björn Lisper. Verifying event-based timing constraints by translation into Presburger formulae. In L. Petrucci, C. Seceleanu, and A. Cavalcanti, editors, *Proc. Joint 22nd International Workshop on Formal Methods for Industrial Critical Systems and 17th International Workshop on Automated Verification of Critical Systems (FMICS-AVoCS'17)*, pages 19-33, Torino, Sept. 2017. Springer International Publishing.

Parallel software may suffer from race conditions. Such data races can cause bugs that are extremely hard to find and reproduce, since their appearance may depend on the timing in the system in complex ways. Therefore, analyses that can pinpoint possible data races and warn the developer are of great interest. A first step in this kind of analysis is a so-called "may happen in parallel" (MHP) analysis, which determines which activities in the system may possibly take place at the same time.

In cooperation with Ericsson and Evidente East, Abu Naser Masud (main driver) and I have developed an MHP analysis for an actor-based parallel task graph model that is used for writing parallel software for telecom applications. The analysis identifies tasks that may run in parallel, and which thus may give rise to race conditions. Experiments show that the analysis scales to industrial software with millions of lines of code.
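
The core idea of an MHP analysis over a task graph can be sketched as follows. This toy formulation, with invented task names, simply computes the transitive closure of the precedence edges and reports unordered task pairs; it is not the published algorithm, which handles a richer actor-based model and is engineered to scale:

```python
# Toy MHP analysis: tasks not ordered by the happens-before relation
# (the transitive closure of the precedence edges) may run in parallel.

from itertools import combinations

def mhp_pairs(tasks, edges):
    """Return the pairs of tasks unordered by the closure of `edges`."""
    reach = {t: {t} for t in tasks}
    changed = True
    while changed:                      # naive fixpoint for transitive closure
        changed = False
        for a, b in edges:              # a happens before b
            new = reach[b] - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return {(a, b) for a, b in combinations(sorted(tasks), 2)
            if b not in reach[a] and a not in reach[b]}

tasks = {"t1", "t2", "t3", "t4"}
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")]
print(mhp_pairs(tasks, edges))          # {('t2', 't3')}: the only unordered pair
```

Every MHP pair is then a candidate for the subsequent race check: only tasks that may overlap in time can race on shared data.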

- Abu Naser Masud, Björn Lisper, and Federico Ciccozzi. Automatic inference of task parallelism in task-graph-based actor models. *IEEE Access*, 99:1-27, Dec. 2018.

PLEX is a programming language that was used to program the AXE telephone exchanges from Ericsson. PLEX is event-driven: signals can spark off new jobs to execute, and its code and data are organized in a kind of primitive objects called blocks. PLEX is also sequential, with jobs being executed non-preemptively, one after another.

There is still a large aftermarket for the AXE exchange. Ericsson has therefore been interested in parallelizing the software in order to leverage modern parallel multi-core architectures. Indeed, the tasks carried out by PLEX programs are typically inherently very parallel, and one would therefore like to execute PLEX jobs in parallel. However, unconstrained parallel execution of PLEX jobs may trigger data races that will not appear in sequential execution. Ericsson therefore sponsored a PhD student, Johan Lindhult, to work with me on safe parallelization of PLEX programs.

At that time Ericsson worked with a prototype parallel central processor for AXE that used a conservative form of parallel job execution, basically treating blocks as critical sections and forbidding any intra-block parallel execution. This however gave insufficient parallelization, and we investigated how it could be improved. In order to have firm formal underpinnings, and to document the quite nonstandard PLEX language, we first developed two formal semantics for PLEX: one modeling the sequential version, and one modeling PLEX executing on the parallel prototype system. We then considered a simple memory access analysis that could detect cases where code within the same block could still be safely executed in parallel without data races. Johan performed a case study on some production code, documented in his Licentiate thesis, which showed that in some cases the analysis could help improve the parallelization significantly.
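
The kind of check such a memory access analysis enables can be sketched as follows. Two jobs can safely run in parallel if neither writes a variable that the other reads or writes (essentially Bernstein's conditions). The job names and access sets below are invented for illustration; the real analysis derives the access sets from PLEX code:

```python
# Sketch of a race-freedom check between two jobs, each represented as
# a pair (reads, writes) of sets of accessed variables. The jobs can
# run in parallel iff each job's write set is disjoint from the other
# job's reads and writes.

def can_parallelize(job_a, job_b):
    """True iff the two jobs' accesses cannot race."""
    reads_a, writes_a = job_a
    reads_b, writes_b = job_b
    return not (writes_a & (reads_b | writes_b)) and not (writes_b & reads_a)

job1 = ({"x", "y"}, {"z"})   # reads x, y; writes z
job2 = ({"x"}, {"w"})        # reads x; writes w
job3 = ({"z"}, {"x"})        # reads z; writes x

print(can_parallelize(job1, job2))   # True: no conflicting accesses
print(can_parallelize(job1, job3))   # False: job1 writes z, which job3 reads
```

Within a block, every pair of jobs passing this check can be released from the block-level critical section, which is where the extra parallelism in the case study came from.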

As a by-product of the formalization we discovered a case where the semantics for the parallel PLEX implementation differed from the sequential semantics, indicating a bug in the parallel implementation. This bug was later confirmed by the Ericsson engineers.

- Johan Lindhult and Björn Lisper. Two formal semantics for PLEX. In *Proc. 3rd APPSEM II Workshop (APPSEM05)*, Frauenchiemsee, Germany, Sept. 2005.
- Johan Lindhult and Björn Lisper. Sequential PLEX, and its potential for parallel execution. In *Proc. 13th International Workshop on Compilers for Parallel Computers (CPC 2007)*, Lisbon, Portugal, July 2007.

bjorn.lisper#mdh.se