SMT Theories

A key feature of SMT is that the entire problem is parameterized by the choice of a theory \(T\). This is important because it means that SMT is an algorithmic framework, rather than a fixed algorithm. Thus, if a particular problem cannot easily be encoded in any existing theory supported by SMT solvers, one option is to add support for a new theory which is better suited to the problem. In fact, this is exactly the process by which many of the theories supported by modern SMT solvers were added.

Theories can be used alone or in arbitrary combinations. Besides the theory, other parameters related to the syntax of formulas include whether or not to enable quantifiers and whether to disallow or limit the use of certain theory operations. In the SMT-LIB standard, and in solvers that support it, these pa- rameters are configured by specifying a logic. A logic identifies the theory (or theories) being used and optionally imposes syntactic restrictions on the allowed formulas. Users can provide the SMT solver with a predefined logic name (like QF_LIA, QF_BV and UF seen earlier) to specify which logic is to be used. By default (i.e., if no logic name is provided), SMT solvers typically enable all the theories they support and allow all operations. This is equivalent to using the special logic name ALL. However, solvers are often tuned with specific heuristics for specific logics. Thus, it is advisable to provide the solver with the most specific logic name possible. In this section, we discuss the most common theories and logics supported by SMT solvers, with examples of each.

Core Theory and Uninterpreted Symbols

The SMT-LIB standard defines a core theory which consists of a core signature with a fixed interpretation that is always present, regardless of which other theories are being used. The core theory defines the Boolean sort Bool (BoolSort() in Python), the Boolean theory constants true and false (BoolVal(True) and BoolVal(False) in Python), and the operators =, not, and, or, xor, and => (==, Not, And, Or, Xor, and Implies in Python), all with the usual meanings. The equality symbol = is polymorphic: it can be applied to two terms of the same sort, for any predefined or user-declared sort. There are also two more polymorphic operators that require a bit more explanation. The distinct (Distinct) operator takes two or more arguments of the same sort and returns true exactly when all the arguments have pairwise distinct values. The ite (If) operator takes three arguments, the first of which must be of Boolean sort. The other two arguments can have any sort as long as it is the same for both. The meaning of the ite operator is the second argument when the first argument is true, and the third argument otherwise.

The simplest logic that builds on the core theory is QF_UF, short for “quantifier-free uninterpreted functions.” This logic disallows quantifiers and does not define any new symbols beyond those in the core theory. However, it allows the user to extend the signature with new sorts and symbols. The SMT solver is allowed to interpret these symbols in any way it chooses. This is why they are referred to as uninterpreted: the solver does not impose any restrictions on the interpretation (besides the declared arity and rank). The following example illustrates the use of uninterpreted symbols as well as the And and Distinct operators.

Example 4. Let \(f\) be a unary function from \(U\) to \(U\), for some set \(U\). Check that, whatever the meaning of \(f\), if \(f(f(f(x)))=x\) and \(f(f(f(f(f(x)))))=x\), then \(f(x)=x\).

Solutions for this example are shown below.

examples/ffff.smt2

(set-logic QF_UF)
(declare-sort U 0)

(declare-fun f (U) U)
(declare-const x U) 

(assert (and (= (f (f (f x))) x) (= (f (f (f (f (f x))))) x)))
(assert (distinct (f x) x))

(check-sat)

examples/ffff.py

from cvc5.pythonic import *
U = DeclareSort("U")
f = Function("f", U, U)
x = Const("x", U)

s = SolverFor('QF_UF')
s.add(And((f(f(f(x))) == x), (f(f(f(f(f(x))))) == x)))
s.add(Distinct(f(x), x))  # negation of f(x) = x
print(s.check())

We can derive \(f(x) = x\) from the first assertion by performing a series of substitutions, and thus the problem is unsatisfiable.

examples/ffff.out.smt2

1unsat

examples/ffff.out.py

1unsat

Now, we present a simple example that illustrates the ite operator. It also shows that in Python, we can use != instead of Distinct to assert that two terms are distinct.

Example 5. Suppose we know that \(x\) is either equal to \(y\) or \(z\), depending on the value of the Boolean \(b\). Suppose we further know that \(w\) is equal to one of \(y\) or \(z\). Does it follow that \(x\) equals \(w\)?

Solutions are shown below.

examples/xyzwite.smt2

(set-logic QF_UF)
(set-option :produce-models true)

(declare-sort U 0)

(declare-const b Bool)
(declare-const x U) 
(declare-const y U) 
(declare-const z U) 
(declare-const w U) 

(assert (= x (ite b y z)))
(assert (or (= w y) (= w z)))
(assert (distinct x w))
  
(check-sat)
(get-model)

examples/xyzwite.py

from cvc5.pythonic import *
U = DeclareSort("U")
b = Const("b", BoolSort())
x, y, z, w = Consts("x y z w", U)

s = SolverFor('QF_UF')
s.add(x == (If(b, y, z)))
s.add(Or((w == y), (w == z)))
s.add(x != w)

if s.check() == sat:
  m = s.model()
  print("\n".join([str(k) + " : " + str(m[k]) for k in m]))

The SMT-LIB and Python outputs when using cvc5 are as follows.

examples/xyzwite.out.smt2

sat
(
; cardinality of U is 2
; rep: (as @U_0 U)
; rep: (as @U_1 U)
(define-fun b () Bool true)
(define-fun x () U (as @U_1 U))
(define-fun y () U (as @U_1 U))
(define-fun z () U (as @U_0 U))
(define-fun w () U (as @U_0 U))
)

examples/xyzwite.out.py

   b : True
   w : (as @U_0 U)
   x : (as @U_1 U)
   y : (as @U_1 U)
   z : (as @U_0 U)

The result tells us that it does not follow that \(x=w\). The model gives us a counterexample to that claim. Because the sort U is uninterpreted, the model returned by cvc5 must choose an interpretation for it. Here, cvc5 tells us that it is interpreting U as a set with two elements, named @U_0 and @U_1. The model then specifies that x and y have one value and z and w have the other, so x is not equal to w.

Exercise 3. Modify Example 4 to make it satisfiable and Example 5 to make it unsatisfiable.

Solution to Exercise 3

Arithmetic

Though there are many tools available for arithmetic reasoning, SMT solvers are unique in their ability to reason efficiently about arbitrary Boolean combinations of arithmetic constraints, as well as to combine arithmetic reasoning with reasoning about other theories. It is important to note that SMT solvers reason precisely about both integer and real arithmetic. That is, they use arbitrary-precision arithmetic as opposed to machine integer or floating-point approximations. This means that SMT solvers are not susceptible to the numerical errors that can arise, for instance, when using floating-point arithmetic to approximate real arithmetic. It also means that for problems whose complexity lies mainly in the arithmetic reasoning, as opposed to Boolean reasoning, SMT solvers are typically slower than tools that use floating-point approximations. The underlying algorithms for arithmetic reasoning in SMT solvers are based on standard techniques that have been adapted to the SMT context, such as the Simplex algorithm [R20] and Cylindrical Algebraic Decomposition [R2].

There are a large number of logics to choose from within the arithmetic umbrella, with reasoning over reals generally more efficient than reasoning over integers, and reasoning over less expressive formulas generally more efficient than reasoning over more expressive ones. We briefly discuss the various logics here.

Difference Logic

In difference logic, every arithmetic constraint must be of the form \(x - y \bowtie c\) or \(x \bowtie c\), where \(\bowtie\ \in\{=, <, >, \le, \ge\}\), and \(c\) is a numeric theory constant. If \(x\) and \(y\) range over integers, we call it integer difference logic, and if they range over reals, we call it real difference logic. Efficient algorithms exist for both [R17], [R36]. The names of these logics are QF_IDL and QF_RDL, respectively. One application for difference logic is job shop scheduling [R41].

Example 6. Suppose we have 3 jobs to complete on 2 machines. Job 1 requires machine 1 for 10 minutes and then machine 2 for 5 minutes. Job 2 requires machine 2 for 20 minutes and then machine 1 for 5 minutes. And Job 3 requires machine 1 for 5 minutes and then machine 2 for 5 minutes. Can all jobs be completed in 30 minutes?

To solve the problem, we create integer variables for the start times of each task within each job. We assert that the start times are non-negative, each task within each job doesn’t start until the previous task finishes, and tasks on each machine don’t overlap. Finally, we check that each task finishes on time.[1]

examples/diff.smt2

(set-logic QF_IDL)
(set-option :produce-models true)

(declare-const j11 Int)
(declare-const j12 Int) 
(declare-const j21 Int)
(declare-const j22 Int)
(declare-const j31 Int)
(declare-const j32 Int)

(assert (and (>= j11 0) (>= j12 0) (>= j21 0) (>= j22 0) (>= j31 0) (>= j32 0)))
(assert (and (>= (- j12 j11) 10) (>= (- j22 j21) 20) (>= (- j32 j31) 5)))
(assert (and (or (>= (- j22 j11) 10) (>= (- j11 j22) 5))
             (or (>= (- j31 j11) 10) (>= (- j11 j31) 5))
             (or (>= (- j31 j22) 5) (>= (- j22 j31) 5))))
(assert (and (or (>= (- j21 j12) 5) (>= (- j12 j21) 20))
             (or (>= (- j32 j12) 5) (>= (- j12 j32) 5))
             (or (>= (- j32 j21) 20) (>= (- j21 j32) 5))))
(assert (and (<= j12 25) (<= j22 25) (<= j32 25)))
  
(check-sat)
(get-model)

examples/diff.py

from cvc5.pythonic import *

j11,j12,j21,j22,j31,j32 = Ints("j11 j12 j21 j22 j31 j32")

s = SolverFor('QF_IDL')
s.add(And([x >= 0 for x in [j11, j12, j21, j22, j31, j32]]))
s.add(And(j12 - j11 >= 10, j22 - j21 >= 20, j32 - j31 >= 5))
s.add(And(Or(j22 - j11 >= 10, j11 - j22 >= 5), 
          Or(j31 - j11 >= 10, j11 - j31 >= 5), 
          Or(j31 - j22 >= 5, j22 - j31 >= 5)))
s.add(And(Or(j21 - j12 >= 5, j12 - j21 >= 20), 
          Or(j32 - j12 >= 5, j12 - j32 >= 5), 
          Or(j32 - j21 >= 20, j21 - j32 >= 5)))
s.add(And(j12 <= 25, j22 <= 25, j32 <= 25))

print(s.model() if s.check() == sat else "unsat")

The output is as follows.

examples/diff.out.smt2

sat
(
(define-fun j11 () Int 5)
(define-fun j12 () Int 20)
(define-fun j21 () Int 0)
(define-fun j22 () Int 20)
(define-fun j31 () Int 15)
(define-fun j32 () Int 25)
)

examples/diff.out.py

[j11 = 5, j12 = 20, j21 = 0, j22 = 20, j31 = 15, j32 = 25]

Exercise 4. What is the minimum amount of time that it will take to complete all of the jobs in Example 6? What is the minimum if we change the requirements so that Job 2 only requires machine 2 for 15, and all other requirements remain the same?

Solution to Exercise 4

Linear Arithmetic

The logic of linear arithmetic allows arithmetic constraints to have any form that is equivalent to \(\sum c_i x_i + b \bowtie 0\), where \(b,c_i\) are numeric theory constants and \(\bowtie\ \in\{=,<,>,\le,\ge\}\). As before, there are both integer and real variants, QF_LIA and QF_LRA, respectively. One can also mix the two with QF_LIRA. Note that, according to the SMT-LIB standard, when using QF_LIRA, integers and reals should not be mixed in the same linear sum, but most solvers (including cvc5 and z3) are more permissive and do allow mixed terms. Example 1 is a good example of a simple QF_LIA problem.

Exercise 5. Repeat Exercise 1, but change the logic to QF_LRA, change the types of the variables from Int to Real, and append .0 to each numeric constant. Now, what output does the solver give?

Solution to Exercise 5

Nonlinear arithmetic

Moving up the expressiveness hierarchy, we next have logics for quantifier-free nonlinear arithmetic. In these logics, arbitrary polynomials are allowed in constraints. The logic QF_NRA is for nonlinear arithmetic over the reals, which is decidable but with doubly exponential complexity [R2]. On the other hand, the same logic over integers, QF_NIA, is undecidable. cvc5 implements a decision procedure for QF_NRA based on a combination of heuristic pruning and cylindrical algebraic coverings [R29]. cvc5 and other tools implement incomplete heuristic procedures for QF_NIA.

Example 7. Find a solution for \(x^2y + yz + 2xyz + 4xy + 8xz + 16 = 0\).

examples/qfnra.smt2

(set-logic QF_NRA)
(set-option :produce-models true)

(declare-const x Real)
(declare-const y Real) 
(declare-const z Real)

(assert (= (+ (* x x y) (* y z) (* 2.0 x y z) (* 4.0 x y) (* 8.0 x z) 16.0) 0.0))
(check-sat)
(get-model)

examples/qfnra.py

from cvc5.pythonic import *
x, y, z = Reals("x y z")
s = SolverFor('QF_NRA')
s.add(x*x*y + y*z + 2*x*y*z + 4*x*y + 8*x*z + 16 == 0)
print(s.model() if s.check() == sat else "unsat")

The output is as follows.

examples/qfnra.out.smt2

sat
(
(define-fun x () Real (- 2.0))
(define-fun y () Real (- 7.0))
(define-fun z () Real (/ (- 44) 5))
)

examples/qfnra.out.py

[x = -2, y = -7, z = -44/5]

Arrays

Consider the following Python function which swaps two elements in a dictionary.

def swap(a,i,j):
  tmp = a[i]
  a[i] = a[j]
  a[j] = tmp

If a[i] and a[j] happen to be equal, the dictionary a is unchanged by the function. To prove this fact, we could try modeling dictionaries as uninterpreted functions. However, asserting that two functions are equal is not allowed in first-order logic. Alternatively, we could use a quantifier to assert that two functions return the same output when given the same input, for any input. However, we would like to avoid quantifiers when possible, as their use puts us in an undecidable logic.

Fortunately, the SMT-LIB standard includes a theory of arrays [R30], which can help in this situation. The theory is perhaps more accurately viewed as a theory of mutable maps and is parameterized by two sorts, one for the index (corresponding to the key type of the dictionary) and one for the elements (values in the dictionary). For example, the SMT-LIB sort (Array Int Real) represents arrays indexed by integers and containing reals. Note that SMT arrays are always total, in the sense that they have an element for every value in the index sort. In particular, an array indexed by Int is conceptually infinite.

The theory has two operators: select, which takes an array and an index and returns the element at that index, and store, which takes an array \(a\), an index \(i\), and an element \(e\), and returns a new array that is the result of updating \(a\) with the element \(e\) at index \(i\).

Typically, the theory of arrays is used in combination with other theories that make sense for the index and element sorts. For example, the logic QF_DT allows quantifier-free formulas with variables that range over integers and arrays of integers. The simplest logic with arrays is QF_AX, in which all the sorts must be uninterpreted.

In the example below, we encode the above problem using the array theory.

Example 8. For the Python program above, show that, for arbitrary index and element sorts, if a[i] and a[j] are equal, then so are a and swap(a,i,j).

examples/arrays.smt2

(set-logic QF_AX)
(declare-sort I 0)
(declare-sort E 0)
(declare-fun i () I)
(declare-fun j () I)
(declare-fun tmp () E)
(declare-fun a_in () (Array I E))
(declare-fun a_out () (Array I E))

(assert (= tmp (select a_in i)))
(assert (= a_out (store (store a_in i (select a_in j)) j tmp)))
(assert (= (select a_in i) (select a_in j)))
(assert (distinct a_in a_out))

(check-sat)

examples/arrays.py

from cvc5.pythonic import *
I = DeclareSort("I")
E = DeclareSort("E")
i, j = Consts("i j", I)
tmp = Const("tmp", E)
array = ArraySort(I, E)
a_in, a_out = Consts("a_in, a_out", array)

s = SolverFor('QF_AX')

s.add(tmp == a_in[i])
s.add(a_out == (Store(Store(a_in, i, a_in[j]), 
    j, tmp)))
s.add(a_in[i] == a_in[j])
s.add(a_in != a_out)

print(s.check())

The output is as follows.

examples/arrays.out.smt2

1unsat

examples/arrays.out.py

1unsat

Exercise 6. Another property of swap that we can prove is that if a[i] and a[j] are distinct, then swap would change a. Modify the solution for Example 8 to prove this property.

Solution to Exercise 6

Bit-vectors

Consider a simple implementation (written in a C-like syntax) for computing the absolute value of a 32-bit integer: \(abs(x) := x < 0 \;?\; {-x} : x\). Instead of branching on \(x < 0\), it is possible to compute the absolute value of \(x\) with three or four branch-free operations [R28] as follows. Let \(\mathit{xrs}\) be an abbreviation for the arithmetic right shift (\(\mathop{>\kern-.3em>}_s\)) of \(x\) by \(31\) bits. Note that the result of this operation is either \(0\) or \(-1\) (all bits set to 1), depending on the most significant bit (MSB) of \(x\): if the MSB of \(x\) is 0, \(\mathit{xrs}\) is 0; otherwise, \(\mathit{xrs}\) is -1. Three branchless alternatives for computing the absolute value of \(x\) are as follows.

\(\mathit{abs}_1(x) := (x \oplus \mathit{xrs}) - \mathit{xrs}\)
\(\mathit{abs}_2(x) := (x + \mathit{xrs}) \oplus \mathit{xrs}\)
\(\mathit{abs}_3(x) := x - ((x \mathop{<\kern-.3em<} 1) \mathrel{\&} \mathit{xrs})\)

These branchless versions of \(\mathit{abs}(x)\) make use of the 32-bit versions of the bit-wise operations exclusive or (\(\oplus\)), bit-wise and (\(\mathrel{\&}\)), logical shift left (\(\mathop{<\kern-.3em<}\)), and arithmetic shift right (\(\mathop{>\kern-.3em>}_s\)).

We can use an SMT solver to prove whether the branchless versions are equivalent to the original implementation. Note that integers, as discussed in Arithmetic, are not a good fit, as it is difficult to model the bitwise operators using the arithmetic operators. However, the SMT-LIB standard includes a theory of fixed-size bit-vectors, which defines the bit-precise semantics of fixed-size machine integers. The name for the quantifier-free logic containing just this theory is QF_BV. Using this logic, we can easily check the equivalence of the absolute value computations.

Example 9. Show that the first branchless alternative \(\mathit{abs}_1\) is equivalent to \(\mathit{abs}\).

examples/branchless_abs.smt2

(set-logic QF_BV)

(declare-const x (_ BitVec 32))

; orignal abs(x) version
(define-fun abs ((x (_ BitVec 32))) (_ BitVec 32) (ite (bvslt x (_ bv0 32)) (bvneg x) x))

; encode branchless abs(x)
(define-fun xrs () (_ BitVec 32) (bvashr x (_ bv31 32)))
(define-fun abs1 ((x (_ BitVec 32))) (_ BitVec 32) (bvsub (bvxor x xrs) xrs))

(assert (distinct (abs x) (abs1 x)))  ; check if abs() and abs1() are equivalent

(check-sat)

examples/branchless_abs.py

from cvc5.pythonic import *
x = Const("x", BitVecSort(32))
xrs = x >> 31
s = SolverFor('QF_BV')
s.add(If(x < 0, -x, x) != (x ^ xrs) - xrs)      # prove abs() == abs1()
print(s.model() if s.check() == sat else "unsat")

The output is as follows.

examples/branchless_abs.out.smt2

1unsat

examples/branchless_abs.out.py

1unsat

Exercise 7. Show that the second and third branchless alternatives \(\mathit{abs}_2\) and \(\mathit{abs}_3\) are equivalent to \(\mathit{abs}\).

Solution to Exercise 7

Datatypes

Built into the SMT-LIB language is a mechanism for defining (algebraic) datatypes. Datatypes are highly useful in applications for reasoning about data structures like records, lists, and trees [R7]. The quantifier-free logic name is QF_DT.

Example 10. Model a binary tree containing integer data. Find trees \(x\) and \(y\) such that \((i)\) the left subtree of \(x\) is the same as the right subtree of \(y\) and \((ii)\) the data stored in \(x\) is greater than \(100\).

Note that we need both datatypes and integer arithmetic for this example. cvc5 supports the logic name QF_DTLIA, but z3 does not. Fortunately, we can always use ALL for the logic if a more specific logic is not available.

examples/datatypes.smt2

(set-logic QF_DTLIA)
(set-option :produce-models true)

(declare-datatypes ((Tree 0)) (((node (data Int) (left Tree) (right Tree)) (nil))))

(declare-fun x () Tree)
(declare-fun y () Tree)

(assert ((_ is node) x))
(assert ((_ is node) y))

(assert (= (left x) (right y)))
(assert (> (data x) 100))

(check-sat)
(get-model)

examples/datatypes.py

from cvc5.pythonic import *
decl = Datatype("tree")
decl.declare("node", ("data", IntSort()), ("left", decl), ("right", decl))
decl.declare("nil")
Tree = decl.create()

x, y = Consts("x y", Tree)

s = SolverFor('ALL')
s.add(Tree.is_node(x))
s.add(Tree.is_node(y))
s.add(Tree.left(x) == Tree.right(y))
s.add(Tree.data(x) > 100)

print(s.model() if s.check() == sat else "unsat")
    

The output from cvc5 is:

examples/datatypes.out.smt2

sat
(
(define-fun x () Tree (node 101 nil (node 0 nil nil)))
(define-fun y () Tree (node 0 (node 0 nil (node 0 nil nil)) nil))
)

examples/datatypes.out.py

[x = node(101, nil, node(0, nil, nil)), y = node(0, node(0, nil, node(0, nil, nil)), nil)]

Exercise 8. Show that a tree cannot be equal to its own left subtree.

Solution to Exercise 8

Floating-Point Arithmetic

The most common representation of real numbers in hardware and software is the binary floating-point number representation system as defined by the IEEE Standard 754-2019 for Floating-Point Arithmetic [R27]. Floating-point numbers are encoded as a triple of bit-vectors: the fractional part (the significand), the exponent (a power of 10 by which the significand is multiplied), and a sign bit. This representation is of limited range and precision, and thus, the domain of floating-point numbers is finite. It also includes special values for representing errors as not-a-number and for plus and minus infinity. In SMT-LIB, the IEEE-754 standard is formalized as the theory of floating-point arithmetic [R11]. The quantifier-free logic name is QF_FP.

Example 11. The SMT-LIB standard supports a fused multiplication and addition operator fp.fma. Given three single precision floating-point numbers \(a\), \(b\), and \(c\), show that the floating-point fused multiplication and addition of \(a\), \(b\), and \(c\) is different from first multiplying \(a\) and \(b\) and then adding \(c\).

examples/fp.smt2

(set-logic QF_FP)
(set-option :produce-models true)
(declare-const a Float32)
(declare-const b Float32)
(declare-const c Float32)
(declare-const rm RoundingMode)
(assert (distinct (fp.fma rm a b c) (fp.add rm (fp.mul rm a b) c)))
(check-sat)
(get-model)
(get-value ((fp.fma rm a b c)))
(get-value ((fp.add rm (fp.mul rm a b) c)))

examples/fp.py

from cvc5.pythonic import *

a, b, c = FPs("a b c", Float32())
rm = Const("rm", RNE().sort())
s = SolverFor('QF_FP')
s.add(Distinct(fpFMA(rm, a, b, c), fpAdd(rm, fpMul(rm, a, b),c)))
result = s.check()
m = s.model()
print(m)
print(f'fpFMA(rm, a, b, c) = {m.eval(fpFMA(rm, a, b, c))}')
print(f'fpAdd(rm, fpMul(rm, a, b),c) = {m.eval(fpAdd(rm, fpMul(rm, a, b),c))}')

The output from cvc5 is as follows.

examples/fp.out.smt2

sat
(
(define-fun a () (_ FloatingPoint 8 24) (fp #b1 #b01111110 #b01010101010101010101011))
(define-fun b () (_ FloatingPoint 8 24) (fp #b1 #b01111110 #b11111111111111111111111))
(define-fun c () (_ FloatingPoint 8 24) (fp #b1 #b01111110 #b11111111111111111111111))
(define-fun rm () RoundingMode roundTowardPositive)
)
(((fp.fma rm a b c) (fp #b1 #b01111101 #b01010101010101010101001)))
(((fp.add rm (fp.mul rm a b) c) (fp #b1 #b01111101 #b01010101010101010101000)))

examples/fp.out.py

[a = -1.3333333730697632*(2**-1), b = -1.9999998807907104*(2**-1), c = -1.9999998807907104*(2**-1), rm = RTP()]
fpFMA(rm, a, b, c) = -1.333333134651184*(2**-2)
fpAdd(rm, fpMul(rm, a, b),c) = -1.3333330154418945*(2**-2)

Exercise 9. Modify the solution to Example 11 to show that floating-point addition is not associative, i.e., \(a + (b + c) \not= (a + b) + c\).

Solution to Exercise 9

Strings

It is often necessary to reason about string data when reasoning about programs. Reasoning about bit-vector representations of strings has the disadvantage that it requires fixing the string length up front. Also, the theory of bit-vectors does not include many of the utility functions for strings that exist in string libraries in programming languages. The SMT-LIB theory of strings provides support for variable-length strings and a large set of string operations. The quantifier-free logic name is QF_S. Typically, though, we use QF_SLIA since we need arithmetic to reason about string lengths.

Example 12. Given two strings, x1 and x2, each consisting of no more than two characters, is it possible to build the string "abbaabb" using only 3 string concatenations (where each concatenation may use any previous result including x1 and x2)?

We can solve this problem by building a circuit of string concatenations and using nondeterministic choice to pick the inputs for each concatenation.

examples/strings.smt2

(set-logic QF_SLIA)
(set-option :produce-models true)

(declare-const p1 Bool)
(declare-const p2 Bool)
(declare-const p3 Bool)
(declare-const p4 Bool)
(declare-const p5 Bool)
(declare-const p6 Bool)
(declare-const p7 Bool)
(declare-const p8 Bool)
(declare-const p9 Bool)
(declare-const p10 Bool)
(declare-const p11 Bool)
(declare-const p12 Bool)

(declare-const x1 String)
(declare-const x2 String)
(declare-const x3 String)
(declare-const x4 String)
(declare-const x5 String)

(declare-const i1 String)
(declare-const i2 String)
(declare-const i3 String)
(declare-const i4 String)
(declare-const i5 String)
(declare-const i6 String)

(define-const result String "abbaabb")

(assert (and (<= (str.len x1) 2) (<= (str.len x2) 2)))

(assert (= i1 (ite p1 x1 x2)))
(assert (= i2 (ite p2 x1 x2)))
(assert (= x3 (str.++ i1 i2)))

(assert (= i3 (ite p3 x1 (ite p4 x2 x3))))
(assert (= i4 (ite p5 x1 (ite p6 x2 x3))))
(assert (= x4 (str.++ i3 i4)))

(assert (= i5 (ite p7 x1 (ite p8 x2 (ite p9 x3 x4)))))
(assert (= i6 (ite p10 x1 (ite p11 x2 (ite p12 x3 x4)))))
(assert (= x5 (str.++ i5 i6)))

(assert (= x5 result))

(check-sat)
(get-model)

examples/strings.py

from cvc5.pythonic import *
p, x, i = {}, {}, {}
for k in range(1, 13): p[k] = Bool("p" + str(k))
for k in range(1, 6): x[k] = String("x" + str(k))
for k in range(1,7): i[k] = String("i" + str(k))

result = StringVal("abbaabb")

s = SolverFor('QF_SLIA')
s.add(And(Length(x[1]) <= 2, Length(x[2]) <= 2))

s.add(i[1] == If(p[1], x[1], x[2]))
s.add(i[2] == If(p[2], x[1], x[2]))
s.add(x[3] == Concat(i[1], i[2]))

s.add(i[3] == If(p[3], x[1], If(p[4], x[2], x[3])))
s.add(i[4] == If(p[5], x[1], If(p[6], x[2], x[3])))
s.add(x[4] == Concat(i[3], i[4]))

s.add(i[5] == If(p[7], x[1], If(p[8], x[2], If(p[9], x[3], x[4]))))
s.add(i[6] == If(p[10], x[1], If(p[11], x[2], If(p[12], x[3], x[4]))))
s.add(x[5] == Concat(i[5], i[6]))

s.add(x[5] == result)

print(s.model() if s.check() == sat else "unsat")

The output is as follows:

examples/strings.out.smt2

sat
(
(define-fun p1 () Bool true)
(define-fun p2 () Bool false)
(define-fun p3 () Bool true)
(define-fun p4 () Bool true)
(define-fun p5 () Bool false)
(define-fun p6 () Bool false)
(define-fun p7 () Bool false)
(define-fun p8 () Bool false)
(define-fun p9 () Bool true)
(define-fun p10 () Bool false)
(define-fun p11 () Bool false)
(define-fun p12 () Bool false)
(define-fun x1 () String "a")
(define-fun x2 () String "bb")
(define-fun x3 () String "abb")
(define-fun x4 () String "aabb")
(define-fun x5 () String "abbaabb")
(define-fun i1 () String "a")
(define-fun i2 () String "bb")
(define-fun i3 () String "a")
(define-fun i4 () String "abb")
(define-fun i5 () String "abb")
(define-fun i6 () String "aabb")
)

examples/strings.out.py

[i1 = "a", i2 = "bb", i3 = "a", i4 = "abb", i5 = "abb", i6 = "aabb", p1 = True, p10 = False, p11 = False, p12 = False, p2 = False, p3 = True, p4 = True, p5 = False, p6 = False, p7 = False, p8 = False, p9 = True, x1 = "a", x2 = "bb", x3 = "abb", x4 = "aabb", x5 = "abbaabb"]

Exercise 10. Use SMT to determine how many concatenations are needed to get "abbaabb" if x1 and x2 are both restricted to have a length of 1.

Solution to Exercise 10

Quantifiers

We saw an example of quantified formulas in Example 3. Quantifiers can be enabled in SMT solvers by dropping QF from the logic name. However, enabling quantifiers typically increases the complexity of the decision problem significantly. In fact, solving UF problems is equivalent to solving the decision problem for first-order logic, Hilbert’s original Entscheidungsproblem, which is undecidable. And although LIA, LRA, and NRA are decidable, the decision procedures are expensive. For these reasons, SMT solvers mostly handle quantifiers by attempting to find quantifier instantiations that, together with the other quantifier-free assertions, are unsatisfiable. For problems that are expected to be unsatisfiable, this approach can be quite effective. Moreover, by using different instantiation techniques and effort levels, a wide variety of problems can be solved.

cvc5 supports several techniques for handling quantified formulas, which can vary based on the logic. By default, cvc5 limits its effort so that it usually returns quickly with an answer of either unsat or unknown. For logics that include uninterpreted functions, it uses a combination of E-matching [R31] and conflict-based instantiation [R40]. In case the user wants to invest more effort, these techniques can be supplemented with techniques such as enumerative instantiation [R38] (option enum-inst). For logics that admit quantifier elimination (e.g., quantified linear arithmetic or bit-vectors), it uses counterexample-guided quantifier instantiation [R34], [R39], which is a complete procedure for these logics.

By default, cvc5 will generally not attempt to determine that an input with quantified formulas is satisfiable. However, more advanced techniques can be used to answer sat in the presence of quantified formulas, including finite model finding [R37] (option finite-model-find), model-based quantifier instantiation [R23] (option mbqi), and syntax-guided quantifier instantiation [R35] (option sygus-inst).

In general, to set options that are not on by default, we can use the setOption solver method in Python, as shown below.

from cvc5.pythonic import *
s = SolverFor('UF')
s.setOption('enum-inst', True)
s.setOption('finite-model-find', True)
s.setOption('mbqi', True)
s.setOption('sygus-inst', True)

Non-standard Theories

cvc5 and z3 support several theories that are not (yet) part of the SMT-LIB standard. We discuss a few of them briefly here, focusing on those supported by cvc5. More documentation about non-standard theories, including reference tables describing the supported operators can be found on the cvc5 website.

Sequences

The theory of sequences brings together features of the theories of arrays and strings. Similar to arrays, sequences are parameterized by the sort of their elements. So we can declare a sequence of integers, a sequence of bit-vectors, and so on. Like strings, sequences have a variable but finite length and can be concatenated together. The sequence theory is enabled whenever the string theory is enabled (e.g., by using the logic name QF_S or QF_SLIA. Note that z3 also supports a theory of sequences that is mostly (but not entirely) compatible with the cvc5 version.

Example 13. Let x be a sequence of integers. Find a value for x such that the first and last elements sum to 9, and if we concatenate x with itself, then \((3,4,5)\) appears as a subsequence.

examples/sequences.smt2

(set-logic QF_SLIA)
(set-option :produce-models true)

(declare-const x (Seq Int))
(declare-const y (Seq Int))
(declare-const z (Seq Int))

(assert (> (seq.len x) 0))
(assert (= (+ (seq.nth x 0) (seq.nth x (- (seq.len x) 1))) 9))

(assert (= y (seq.++ x x)))
(assert (= z (seq.++ (seq.unit 3) (seq.unit 4) (seq.unit 5))))
(assert (seq.contains y z))

(check-sat)
(get-model)

examples/sequences.py

from cvc5.pythonic import *
x, y, z = Consts("x y z", SeqSort(IntSort()))

s = SolverFor('QF_SLIA')

s.add(Length(x) > 0)
s.add(x[0] + x[Length(x) - 1] == 9)
s.add(y == Concat(x,x))
s.add(z == Concat(Unit(IntVal(3)), Unit(IntVal(4)), Unit(IntVal(5))))
s.add(Contains(y, z))

print(s.model() if s.check() == sat else "unsat")

The output is as follows.

examples/sequences.out.smt2

sat
(
(define-fun x () (Seq Int) (seq.++ (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5)))
(define-fun y () (Seq Int) (seq.++ (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5) (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5)))
(define-fun z () (Seq Int) (seq.++ (seq.unit 3) (seq.unit 4) (seq.unit 5)))
)

examples/sequences.out.py

[x = (seq.++ (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5))(), y = (seq.++ (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5) (seq.unit 4) (seq.unit 3) (seq.unit 4) (seq.unit 5))(), z = (seq.++ (seq.unit 3) (seq.unit 4) (seq.unit 5))()]

Exercise 11. Show that it’s not possible to have sequences x, y, and z such that x is a proper prefix of y, y is a proper prefix of z, and z is a proper prefix of x.

Solution to Exercise 11

Finite Fields

cvc5 can reason about constraints over finite fields of order \(p\), where \(p\) is any prime. It relies on the fact that a field of order \(p\) is isomorphic to the integers modulo \(p\). The quantifier-free logic name for finite fields is QF_FF. At the time of writing, this theory is not supported by other SMT solvers.

Example 14. In a finite field of order 13, find two elements such that their sum and product are both equal to the multiplicative identity in the field.

Running this example requires a GPL build of cvc5, as explained in the Introduction.

examples/fields.smt2

(set-logic QF_FF)
(set-option :produce-models true)

(define-sort F () (_ FiniteField 13))  ; F is an alias of sort (_ FiniteField 13)

(declare-const x F)
(declare-const y F)

(assert (= (ff.add x y) (as ff1 F)))
(assert (= (ff.mul x y) (as ff1 F)))

(check-sat)
(get-model)

examples/fields.py

from cvc5.pythonic import *
x, y = FiniteFieldElems("x y", 13)
s = SolverFor("QF_FF")
s.add(x + y == 1)
s.add(x * y == 1)
print(s.model() if s.check() == sat else "unsat")

The output is as follows.

examples/fields.out.smt2

sat
(
(define-fun x () (_ FiniteField 13) #f4m13)
(define-fun y () (_ FiniteField 13) #f10m13)
)

examples/fields.out.py

[x = 4, y = -3]

Exercise 12. In a finite field of order 13, find an element such that if you square it twice you get the multiplicative identity.

Solution to Exercise 12

Finite Sets

cvc5 has support for the theory of finite sets. This theory supports basic set operations like membership, union, and intersection, as well as constraints on a set’s cardinality. The quantifier-free logic name is QF_FS. At the time of writing, this theory is not supported by other SMT solvers.

Example 15. Verify that union distributes over intersection.

examples/sets.smt2

(set-logic QF_FS)

(declare-sort U 0)

(declare-const A (Set U))
(declare-const B (Set U))
(declare-const C (Set U))

(assert (not (= (set.union A (set.inter B C)) (set.inter (set.union A B) (set.union A C)))))

(check-sat)

examples/sets.py

from cvc5.pythonic import *
S = DeclareSort("S")
A, B, C = [Set(i, S) for i in ["A","B","C"]]
s = SolverFor('QF_FS')
s.add(Not((A | (B & C)) == ((A | B) & (A | C))))
print(s.check())

As, expected, the result is unsat

examples/sets.out.smt2

1unsat

examples/sets.out.py

1unsat

Exercise 13. Does set difference distribute over intersection? If not, find a counterexample.

Solution to Exercise 13

Combinations of Theories

So far, we have mostly seen examples of how to pose queries that involve a single theory. Part of the appeal of SMT solvers is their ability to mix reasoning about different theories. This can be done in a natural way. Any well-sorted formula is allowed, and all sort constructors can take any other sort as an argument.

One slight complication is the question of how to specify the logic name. It is always safe to use ALL as the logic name, though as mentioned above, it may be more efficient to give a more precise logic name. When mixing theories, cvc5 allows any logic name that follows the following rules. First, the logic name must start with the prefix QF_ if the intent is to limit reasoning to quantifier-free formulas. The rest of the logic name can include any of the following components: \((i)\) A for arrays; \((ii)\) UF for uninterpreted functions; \((iii)\) BV for bit-vectors; (iv) FP for floating-point numbers; \((v)\) DT for datatypes; (vi) S for strings and sequences; \((vii)\) either IDL, RDL, LIA, LRA, LIRA, NIA, NRA, or NIRA for arithmetic; \((viii)\) FF for finite fields; and \((ix)\) FS for finite sets. Thus, for example, QF_AUFDTBVLRA allows formulas that are quantifier-free and mix arrays, uninterpreted functions, datatypes, bit-vectors, and linear real arithmetic. Example 10, Example 12, and Example 13 illustrate combinations of theories.