MPS-DF
project overview

MPS-DF is the data-flow support in the MPS language workbench. It supports the definition and efficient execution of data-flow analyses. Users of MPS-DF first define data-flow builders for the analyzed language. These builders contribute subgraphs to the data-flow graph (DFG), an intermediate program representation encoding the data-flow of the analyzed program. MPS-DF then supports defining data-flow analyses, on the DFG, which compute some data-flow specific knowledge (e.g. which variables are initialized) about the program. These data-flow analyses are static program analyses, which derive the knowl- edge without actually running the analyzed program. Finally, existing MPS components, such as program validators, transformations or refactorings, make use of this knowledge.

MPS-DF has two important characteristics: extensibility and variable precision.

Software Requirements:

  • MPS 3.3.5
  • Java 1.8
  • mbeddr for the inter-procedural analyses and the case studies

1 - Highlights

The following video also highlights the various aspects of the project:

2 - Getting Started

At the time of the ASE submission only the intra-procedural part of MPS-DF is integrated into MPS. The inter-procedural part is part of the mbeddr project and it is being integrated into core MPS during the summer. The plan is that the upcoming major release of MPS (version 3.4) will already contain all MPS-DF components.

You can download the corresponding MPS version from here and you may find the MPS user guide interesting as well.

You can obtain the inter-procedural part along with the case studies from the mbeddr repository. Simply check out the feature/mps-df branch which also contains the example projects used in the paper and in the above video for demonstration. The relevant MPS modules for the inter-procedural data-flow analyses are the com.mbeddr.mpsutil.dataflow language and the com.mbeddr.mpsutil.dataflow.runtime solution from the mpsutil project in mbeddr.

3 - Case studies

We have developed several data-flow analyses for C in mbeddr and for Java in MPS itself. The analyses implementations are readily available once you obtained the above mentioned software components. In order to find these analysis implementations you simply need to press CTRL (or CMD on Mac) + N in MPS and type in the analysis name. MPS will automatically jump to the declaration of the analyses. The name of the analyses are given below in parentheses. In the following we give details about four analyses.

Points-to Analysis for C (PointsTo)

Synopsis: derives the possible targets of a pointer typed variable at a program point (DFG node).

Lattice element: Map<node<Var>, Set<node<Var>>> which represents a mapping from variables to set of variables.

Analysis direction: forward, because the points-to relationship is something that happened in the past and we carry that over to later nodes in the DFG.

Must/May property: must analysis, because we want this analysis to be a sound analysis, claiming that a variable points to another only if this relationship holds on all execution paths that lead to a program point.

Initialized Variables Analysis for C (InitializedVariables)

Synopsis: derives which variables are definitely initialized at a program point (DFG node).

Lattice element: Set<node<Var>> which represents the set of initialized variables.

Analysis direction: forward, because the initialized property is something that happened in the past and we carry that over to later nodes in the DFG.

Must/May property: must analysis, because we want this analysis to be a sound analysis, claiming the initialized property about a variable only when it is guaranteed to have been initialized on all execution paths that lead to a program point.

Custom instructions: The analysis uses only one custom instruction, defInit(node<Var> target), which represents a definitely initialized variable. The presence of the custom instruction in the DFG overrides the analysis result by considering the target variable initialized.

Other: The analysis uses the results of the points-to analysis to know which pointer typed variables point to which other variables. This is necessary, because whenever an assignment of the form *a = 10; happens, we do not directly initialize the variable a, but those variables which a may point to.

Liveness Analysis for C (Liveness)

Synopsis: the analysis derives which variables are live at a program point (DFG node), where live means that the current value of the variable may be read during the remaining execution of the program.

Lattice element: Set<node<Var>> which represents the set of live variables.

Analysis direction: backward, because the current value of a variable may be read in the future, thus we need to propagate the information from the back to the front.

Must/May property: may analysis, because we consider a variable live if it is read on any of the succeeding execution paths. Here, we deliberately compute and upper approximation of the data-flow knowledge at runtime.

Null Analysis for Java (Nullable)

Synopsis: the analysis derives the nullable state of a variable or an expression. This information is used to prevent null pointer dereferencing in Java code.

Lattice element: Map<node<>, NullableState> which represents the nullable state of an expression or a variable. The NullableState itself is also a lattice which consists of the following elements:

The lattice itself is depicted on the left in the following figure. We define a merge function which can be used to combine two lattice elements together. The implementation of the merge function is given with the table on the right in the following figure. The introduction and handling of NOTINIT may seem strange at first, but we emphasise that uninitialized read errors are caught by another analysis. Forgetting about NOTINITs is also accelerated by merge, because it picks the other element whenever we combine a NOTINIT with something else.

Analysis direction:forward, because the analysis result depends on previous assignments (or the absence of them) which happened in the past.

Must/May property:must analysis, because we want this analysis to be a sound analysis, claiming that an expression/variable is definitely null/not null only if this fact holds on all execution paths that lead to a program point.

Custom instructions:The analysis uses three custom instructions: nullable, notNull and null. All of these custom instructions have one parameter of type node<> which represents an expression or a variable.