PHP String Analyzer

PHP String Analyzer is a static program analyzer that approximates the string output of a PHP program with a context-free grammar. The analyzer can be used to check properties of a PHP program. For example, it can validate the Web pages that a PHP program generates dynamically.

You can find the basic principle of the analysis in the following paper. The XML validation algorithms used in the analyzer are described in the following paper. The preliminary experiments on XHTML validation with the analyzer are reported in the following paper. Note that the current version of the analyzer does not use the validation algorithm described in the paper. The basic idea of the analyzer comes from Java String Analyzer, a string analyzer for Java based on regular languages.

What is the PHP String Analyzer

PHP String Analyzer approximates the string output of a program as a context-free grammar. The analyzer takes two inputs: a PHP program and an input specification.
Let us consider the following program.
<?php
for ($i = 0; $i < $n; $i++)
  $x = "0".$x."1";
echo $x;
?>
For the analysis, we need to specify the initial values of the global variables in the program. The specification is given as follows:
$x : /abc|xyz/
$n : int
The specification /abc|xyz/ is a regular expression representing the set of strings {abc, xyz}. The type 'int' is specified for the variable $n. The analyzer is executed as follows:
% phpsa -ispec example0.ispec -simplify example0.php 
where the option -simplify tells the analyzer to try to simplify the resulting context-free grammar (CFG).
We then obtain the following context-free grammar as an approximation of the program's string output.
({$268$, $293$},                                 // variables
 {$268$ -> $293$|0$268$1,$293$ -> abc|xyz},      // productions
 $268$)                                          // start symbol
This grammar represents the set of strings { 0^n abc 1^n | n >= 0 } U { 0^n xyz 1^n | n >= 0 }, as we expect.
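
To make the claim about the grammar concrete, here is a minimal Python sketch that enumerates the strings derivable from the productions shown above, up to a bounded derivation depth, and checks that each one has the form 0^n abc 1^n or 0^n xyz 1^n. It is not part of the analyzer and does not reflect how phpsa works internally. The names PRODUCTIONS, derive, and php_output are hypothetical and introduced only for this illustration, and the depth bound of 6 is an arbitrary choice.

```python
import re

# Productions copied from the analyzer output above.
# Nonterminals are written without the surrounding $...$ delimiters.
PRODUCTIONS = {
    "268": [["293"], ["0", "268", "1"]],
    "293": [["abc"], ["xyz"]],
}
START = "268"


def derive(symbol, depth):
    """Return the terminal strings derivable from `symbol` when
    nonterminals may be expanded along a chain of at most `depth` steps."""
    if symbol not in PRODUCTIONS:
        return {symbol}              # terminal: stands for itself
    if depth == 0:
        return set()                 # expansion budget exhausted
    results = set()
    for rhs in PRODUCTIONS[symbol]:
        partials = {""}
        for part in rhs:
            expansions = derive(part, depth - 1)
            partials = {p + e for p in partials for e in expansions}
        results |= partials
    return results


def php_output(x, n):
    """Mimic the loop of the example program for a concrete $x and $n."""
    for _ in range(n):
        x = "0" + x + "1"
    return x


language = derive(START, 6)

# Every enumerated string must be 0^n (abc|xyz) 1^n with matching counts.
shape = re.compile(r"(0*)(abc|xyz)(1*)")
for s in sorted(language, key=len):
    m = shape.fullmatch(s)
    assert m and len(m.group(1)) == len(m.group(3))
    print(s)

# Concrete runs of the example program fall inside the enumerated set.
assert php_output("abc", 3) in language
assert php_output("xyz", 0) in language
```

The bound only limits how many strings are listed. The grammar itself describes the infinite set, so a larger depth simply yields longer strings of the same shape.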

Authors

Related projects and works