Reachability, transitive closure, graphs

More on directed graphs, cycles, and reachability

We'll continue with the basic, directed graph model we started with last time:

sig Node {
    edges : set Node
}
run { #Node = 3}

When I run this, I get the same instance in the evaluator.

Recall that edges is a relation that has an element for every edge. We can think of this as the set of all 1-hop paths.

The instance we get first is:

[N1] -> [N0] -> [N3], (N1 also has a self loop)

The following results are all from the Evaluator in the Alloy Analyzer.

If we ask for the join of the edges relation with itself:

edges.edges

We get:

The set of 2-hop paths ({N1->N0, N1->N1, N1->N2} ).

edges.edges.edges

We get:

The set of 3-hop paths ({N1->N0, N1->N1, N1->N2} ).

iden (A built-in relation keyword in Alloy)

We get:

A relation from every atom to itself ({N0->N0, N1->N1, N2->N2}).

edges - iden

We get:

The graph with self-loops removed cycles({N0->N2, N1->N0} ).

Ruling out self-loops

We could add this as a constraint. As an exercise, we wrote on the whiteboard several different, equivalent statements for ruling out self loops. Each line here is redundant/equivalent with every other line:

pred noSelfLoopsPredicate {
  -- no N1->N1 etc.
  all n: Node | n->n not in edges
  all n: Node | n not in n.edges
  no n: Node | n in n.edges
}

pred noSelfLoopsRelational {
  no (iden & edges)
  edges = edges - edges
}

We can use Alloy to check that these are the same (up to some small scope):

assert same {
  noSelfLoopsPredicate iff noSelfLoopsRelational
}
check same for 10 Node

Let's continue on with the idea of removing loops.

We can specify the set of 2-hop paths that do not include self-loops as:

(edges - iden).(edges - iden)

We get:

2-hop paths not including self-loops: {N1->N2}

Joins

We discussed in more detail how to think about the . join relational operator.

We can conceptualize joins are taking two database tables, finding all the rows were the columns in the middle of the join are the same, and keeping just those rows with the "matched" columns removed.

Said another way, a join a.b on two binary relations, a and b:

Looks at the second atom, a2, of every pair a1->a2in a.
- Finds every pair b1->b2 in b where the first atom b1 matches a2.
- Adds a1->b2 to the output relation.

Arity, which joins are well-defined

The number of columns in a relation is called its arity (e.g., in Alloy, a binary relation has arity 2, a set has arity 1).

Since joins remove the "matched" column, the join of a relation with arity m and a relation with arity n has arity m + n - 2.

If we try to join a set with itself:

Node.Node

We get a type error. This is because it would result in a zero column table: sets have arity 1, and 1 + 1 - 2 = 0. Zero row tables are fine (they represent the empty set), zero column tables not defined.

Going back to path lengths: write down what expression I should write to get the relation for 4-hop paths (from start to end) within this graph.

edges.edges.edges.edges

That's getting tedious! To remove loops of length n, we're having to write the join of the relation with itself n-1 many times.

What we are trying to use, though, has a specific name. It's related to the property of transitivity that we've seen a few types.

Transitive closure

Alloy has a built-in operator that conceptually represents repeated joins of a relation and itself.

The transitive closure of a binary relation is the smallest relation that encompasses the relation and satisfies transitivity. It's conceptually the same as repeatedly joining the relation with itself until the relation stops growing. In Alloy, the ^ operator gives the transitive closure of a relation. That is, for a binary relation r, ^r is the smallest relation that satisfies transitivity and contains r.

We can think of finding the transitive closure by taking any 2-hop path in the graph, and drawing a new edge between the start and end if the edge was not already there. We repeat this process until no more edges can be added this way.

This is on example of a running an algorithm to a fixed point. A fixed point, in computer science, is informally the point at which applying a loop iteration within an algorithm no longer changes the output.

The transitive closure if the fixed-point of taking the join of the relation with itself.

Reachability of a specific node

Now, we can use the transitive closure to think about all possible paths in a graph:

^edges

is the set of all paths in a graph.

The transitive closure is useful to express the “reachability” within a graph.

Let's imagine that nodes are servers, and CSLinux is one specific server.

How could I model this in Alloy?

(Think, then click!)

We want a special Node type to be a CSLinux server, so I will use extends. We want there to be just one, so we'll use one.

one sig CSLinux extends Node {}

We can now express the set of nodes reachable from CSLinux:

(Think, then click!)

CSLinux.^edges

And the set of nodes CSLinux is reachable from:

(Think, then click!)

^edges.CSLinux

Going back to our example without CSLinux.

If we wanted to eliminate all loops from our graph in this example, we could then use:

no ^edges & iden

Which constraints the instance such that no node is “reachable” from itself.

This is equivalent to:

no n: Node | n in n.^edges

Undirected graphs

How can we model an undirected graph in Alloy?

Relations have order, but we can include a symmetric constraint to represent that the order does not matter (a pair is either in the relation edges or not, regardless of order).

fact undirectedPredicate {
  all x, y: Node | x->y in edges implies y->x in edges
}

// same as: edges = ~edge
fact undirectedRelational {
  edges = ~edge
}

Undirected graphs are useful for many systems programming tasks (server connections, etc).

A downside to this model of undirected graphs:

Can we still use ^edges to talk about reachability?

(Think, then click!)

No, because every "edge" is essentially a small loop!

Solving the "is there a cycle" problem is harder for undirected graphs, and this is part of what we will touch on in Workshop this week for the next assignment.

Our solution will rely on some properties of graphs vs trees.

What is the definition of an undirected tree?

(Think, then click!)

Is it just a graph with no cycles?

No, it also needs to be connected.

An undirected tree is an acyclic and connected undirected graph.

Graphs as Trees

An undirected graph G is a tree if and only if it satisfies any of the following 5 conditions:

G is connected, but removing any edge disconnects it.
G is acyclic, but adding any edge between nodes that is not currently in the G creates a cycle.
Exactly one path connects each pair of nodes in G.
G has #Node-1 edges and no cycles (avoid in Alloy).
G has #Node-1 edges and is connected (avoid in Alloy).

We'll avoid the final two definitions in Alloy, since integer operations and counting in Alloy are much more expensive than expressing the other properties.