Reachability, transitive closure, graphs
More on directed graphs, cycles, and reachability
We'll continue with the basic, directed graph model we started with last time:
sig Node {
edges : set Node
}
run { #Node = 3}
When I run this, I get the same instance in the evaluator.
Recall that edges
is a relation that has an element for every edge. We can think of this as the set of all 1-hop paths.
The instance we get first is:
[N1] -> [N0] -> [N3], (N1 also has a self loop)
The following results are all from the Evaluator in the Alloy Analyzer.
If we ask for the join of the edges
relation with itself:
edges.edges
We get:
The set of 2-hop paths ({N1->N0, N1->N1, N1->N2}
).
edges.edges.edges
We get:
The set of 3-hop paths ({N1->N0, N1->N1, N1->N2}
).
iden
(A built-in relation keyword in Alloy)
We get:
A relation from every atom to itself ({N0->N0, N1->N1, N2->N2}
).
edges - iden
We get:
The graph with self-loops removed cycles({N0->N2, N1->N0}
).
Ruling out self-loops
We could add this as a constraint. As an exercise, we wrote on the whiteboard several different, equivalent statements for ruling out self loops. Each line here is redundant/equivalent with every other line:
pred noSelfLoopsPredicate {
-- no N1->N1 etc.
all n: Node | n->n not in edges
all n: Node | n not in n.edges
no n: Node | n in n.edges
}
pred noSelfLoopsRelational {
no (iden & edges)
edges = edges - edges
}
We can use Alloy to check that these are the same (up to some small scope):
assert same {
noSelfLoopsPredicate iff noSelfLoopsRelational
}
check same for 10 Node
Let's continue on with the idea of removing loops.
We can specify the set of 2-hop paths that do not include self-loops as:
(edges - iden).(edges - iden)
We get:
2-hop paths not including self-loops: {N1->N2}
Joins
We discussed in more detail how to think about the .
join relational operator.
We can conceptualize joins are taking two database tables, finding all the rows were the columns in the middle of the join are the same, and keeping just those rows with the "matched" columns removed.
Said another way, a join a.b
on two binary relations, a
and b
:
- Looks at the second atom,
a2
, of every pair a1->a2
in a
.
- Finds every pair
b1->b2
in b
where the first atom b1
matches a2
.
- Adds
a1->b2
to the output relation.
Arity, which joins are well-defined
The number of columns in a relation is called its arity (e.g., in Alloy, a binary relation has arity 2, a set has arity 1).
Since joins remove the "matched" column, the join of a relation with arity m
and a relation with arity n
has arity m + n - 2
.
If we try to join a set with itself:
Node.Node
We get a type error. This is because it would result in a zero column table: sets have arity 1, and 1 + 1 - 2 = 0. Zero row tables are fine (they represent the empty set), zero column tables not defined.
Going back to path lengths: write down what expression I should write to get the relation for 4-hop paths (from start to end) within this graph.
edges.edges.edges.edges
That's getting tedious! To remove loops of length n
, we're having to write the join of the relation with itself n-1
many times.
What we are trying to use, though, has a specific name. It's related to the property of transitivity that we've seen a few types.
Transitive closure
Alloy has a built-in operator that conceptually represents repeated joins of a relation and itself.
The transitive closure of a binary relation is the smallest relation that encompasses the relation and satisfies transitivity. It's conceptually the same as repeatedly joining the relation with itself until the relation stops growing. In Alloy, the ^
operator gives the transitive closure of a relation. That is, for a binary relation r
, ^r
is the smallest relation that satisfies transitivity and contains r
.
We can think of finding the transitive closure by taking any 2-hop path in the graph, and drawing a new edge between the start and end if the edge was not already there. We repeat this process until no more edges can be added this way.
This is on example of a running an algorithm to a fixed point. A fixed point, in computer science, is informally the point at which applying a loop iteration within an algorithm no longer changes the output.
The transitive closure if the fixed-point of taking the join of the relation with itself.
Reachability of a specific node
Now, we can use the transitive closure to think about all possible paths in a graph:
^edges
is the set of all paths in a graph.
The transitive closure is useful to express the “reachability” within a graph.
Let's imagine that nodes are servers, and CSLinux
is one specific server.
How could I model this in Alloy?
(Think, then click!)
We want a special Node type to be a CSLinux server, so I will use extends
. We want there to be just one, so we'll use one
.
one sig CSLinux extends Node {}
We can now express the set of nodes reachable from CSLinux
:
(Think, then click!)
CSLinux.^edges
And the set of nodes CSLinux
is reachable from:
(Think, then click!)
^edges.CSLinux
Going back to our example without CSLinux
.
If we wanted to eliminate all loops from our graph in this example, we could then use:
no ^edges & iden
Which constraints the instance such that no node is “reachable” from itself.
This is equivalent to:
no n: Node | n in n.^edges
Undirected graphs
How can we model an undirected graph in Alloy?
Relations have order, but we can include a symmetric constraint to represent that the order does not matter (a pair is either in the relation edges
or not, regardless of order).
fact undirectedPredicate {
all x, y: Node | x->y in edges implies y->x in edges
}
fact undirectedRelational {
edges = ~edge
}
Undirected graphs are useful for many systems programming tasks (server connections, etc).
A downside to this model of undirected graphs:
Can we still use ^edges
to talk about reachability?
(Think, then click!)
No, because every "edge" is essentially a small loop!
Solving the "is there a cycle" problem is harder for undirected graphs, and this is part of what we will touch on in Workshop this week for the next assignment.
Our solution will rely on some properties of graphs vs trees.
What is the definition of an undirected tree?
(Think, then click!)
Is it just a graph with no cycles?
No, it also needs to be connected.
An undirected tree is an acyclic and connected undirected graph.
Graphs as Trees
An undirected graph G
is a tree if and only if it satisfies any of the following 5 conditions:
G
is connected, but removing any edge disconnects it.
G
is acyclic, but adding any edge between nodes that is not currently in the G
creates a cycle.
- Exactly one path connects each pair of nodes in
G
.
G
has #Node-1
edges and no cycles (avoid in Alloy).
G
has #Node-1
edges and is connected (avoid in Alloy).
We'll avoid the final two definitions in Alloy, since integer operations and counting in Alloy are much more expensive than expressing the other properties.
Reachability, transitive closure, graphs
More on directed graphs, cycles, and reachability
We'll continue with the basic, directed graph model we started with last time:
When I run this, I get the same instance in the evaluator.
Recall that
edges
is a relation that has an element for every edge. We can think of this as the set of all 1-hop paths.The instance we get first is:
[N1] -> [N0] -> [N3], (N1 also has a self loop)
The following results are all from the Evaluator in the Alloy Analyzer.
If we ask for the join of the
edges
relation with itself:edges.edges
We get:
The set of 2-hop paths (
{N1->N0, N1->N1, N1->N2}
).edges.edges.edges
We get:
The set of 3-hop paths (
{N1->N0, N1->N1, N1->N2}
).iden
(A built-in relation keyword in Alloy)We get:
A relation from every atom to itself (
{N0->N0, N1->N1, N2->N2}
).edges - iden
We get:
The graph with self-loops removed cycles(
{N0->N2, N1->N0}
).Ruling out self-loops
We could add this as a constraint. As an exercise, we wrote on the whiteboard several different, equivalent statements for ruling out self loops. Each line here is redundant/equivalent with every other line:
We can use Alloy to check that these are the same (up to some small scope):
Let's continue on with the idea of removing loops.
We can specify the set of 2-hop paths that do not include self-loops as:
(edges - iden).(edges - iden)
We get:
2-hop paths not including self-loops:
{N1->N2}
Joins
We discussed in more detail how to think about the
.
join relational operator.We can conceptualize joins are taking two database tables, finding all the rows were the columns in the middle of the join are the same, and keeping just those rows with the "matched" columns removed.
Said another way, a join
a.b
on two binary relations,a
andb
:a2
, of every paira1->a2
ina
.b1->b2
inb
where the first atomb1
matchesa2
.a1->b2
to the output relation.Arity, which joins are well-defined
The number of columns in a relation is called its arity (e.g., in Alloy, a binary relation has arity 2, a set has arity 1).
Since joins remove the "matched" column, the join of a relation with arity
m
and a relation with arityn
has aritym + n - 2
.If we try to join a set with itself:
Node.Node
We get a type error. This is because it would result in a zero column table: sets have arity 1, and 1 + 1 - 2 = 0. Zero row tables are fine (they represent the empty set), zero column tables not defined.
Going back to path lengths: write down what expression I should write to get the relation for 4-hop paths (from start to end) within this graph.
edges.edges.edges.edges
That's getting tedious! To remove loops of length
n
, we're having to write the join of the relation with itselfn-1
many times.What we are trying to use, though, has a specific name. It's related to the property of transitivity that we've seen a few types.
Transitive closure
Alloy has a built-in operator that conceptually represents repeated joins of a relation and itself.
The transitive closure of a binary relation is the smallest relation that encompasses the relation and satisfies transitivity. It's conceptually the same as repeatedly joining the relation with itself until the relation stops growing. In Alloy, the
^
operator gives the transitive closure of a relation. That is, for a binary relationr
,^r
is the smallest relation that satisfies transitivity and containsr
.We can think of finding the transitive closure by taking any 2-hop path in the graph, and drawing a new edge between the start and end if the edge was not already there. We repeat this process until no more edges can be added this way.
This is on example of a running an algorithm to a fixed point. A fixed point, in computer science, is informally the point at which applying a loop iteration within an algorithm no longer changes the output.
The transitive closure if the fixed-point of taking the join of the relation with itself.
Reachability of a specific node
Now, we can use the transitive closure to think about all possible paths in a graph:
^edges
is the set of all paths in a graph.
The transitive closure is useful to express the “reachability” within a graph.
Let's imagine that nodes are servers, and
CSLinux
is one specific server.How could I model this in Alloy?
(Think, then click!)
We want a special Node type to be a CSLinux server, so I will use
extends
. We want there to be just one, so we'll useone
.We can now express the set of nodes reachable from
CSLinux
:(Think, then click!)
And the set of nodes
CSLinux
is reachable from:(Think, then click!)
Going back to our example without
CSLinux
.If we wanted to eliminate all loops from our graph in this example, we could then use:
Which constraints the instance such that no node is “reachable” from itself.
This is equivalent to:
Undirected graphs
How can we model an undirected graph in Alloy?
Relations have order, but we can include a symmetric constraint to represent that the order does not matter (a pair is either in the relation
edges
or not, regardless of order).Undirected graphs are useful for many systems programming tasks (server connections, etc).
A downside to this model of undirected graphs:
Can we still use
^edges
to talk about reachability?(Think, then click!)
No, because every "edge" is essentially a small loop!
Solving the "is there a cycle" problem is harder for undirected graphs, and this is part of what we will touch on in Workshop this week for the next assignment.
Our solution will rely on some properties of graphs vs trees.
What is the definition of an undirected tree?
(Think, then click!)
Is it just a graph with no cycles?
No, it also needs to be connected.
An undirected tree is an acyclic and connected undirected graph.
Graphs as Trees
An undirected graph
G
is a tree if and only if it satisfies any of the following 5 conditions:G
is connected, but removing any edge disconnects it.G
is acyclic, but adding any edge between nodes that is not currently in theG
creates a cycle.G
.G
has#Node-1
edges and no cycles (avoid in Alloy).G
has#Node-1
edges and is connected (avoid in Alloy).We'll avoid the final two definitions in Alloy, since integer operations and counting in Alloy are much more expensive than expressing the other properties.