BIO/CS 471 – Algorithms for bioinformatics
Graph Theoretic
Concepts and Algorithms
for Bioinformatics
What is a “graph”?
- Formally: A finite graph G(V, E) is a pair (V, E), where V is a finite set and E is a binary relation on V.
- Recall: A relation R between two sets X and Y is a subset of X × Y.
- For each pair of distinct vertices, that pair is either in the set E or not in the set E.
- The elements of the set V are called vertices (or nodes) and those of set E are called edges.
- Undirected graph: The edges are unordered pairs of V (i.e., the binary relation is symmetric).
- Ex: undirected G(V,E); V = {a,b,c}, E = {{a,b}, {b,c}}
- Directed graph (digraph): The edges are ordered pairs of V (i.e., the binary relation is not necessarily symmetric).
- Ex: digraph G(V,E); V = {a,b,c}, E = {(a,b), (b,c)}
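A minimal Python sketch of the definition, using the two example graphs above: an undirected edge is stored as a frozenset (an unordered pair) and a directed edge as a tuple (an ordered pair). The helper names `as_relation` and `is_symmetric` are illustrative assumptions, not something defined on the slides.

```python
# A graph as a pair (V, E), using the two example graphs above.
V = {"a", "b", "c"}

# Undirected: an edge is an unordered pair, so frozenset({"a", "b"}) == frozenset({"b", "a"}).
E_undirected = {frozenset({"a", "b"}), frozenset({"b", "c"})}

# Directed: an edge is an ordered pair (start, end).
E_directed = {("a", "b"), ("b", "c")}


def as_relation(edges):
    """Return the binary relation on V (a set of ordered pairs) that the edges describe."""
    pairs = set()
    for e in edges:
        if isinstance(e, frozenset):
            ends = tuple(e)
            u, v = ends if len(ends) == 2 else (ends[0], ends[0])  # a loop {u} relates u to u
            pairs |= {(u, v), (v, u)}  # an unordered pair relates its ends both ways
        else:
            pairs.add(e)
    return pairs


def is_symmetric(relation):
    return all((v, u) in relation for (u, v) in relation)


R_undirected = as_relation(E_undirected)
R_directed = as_relation(E_directed)
assert all(u in V and v in V for u, v in R_undirected | R_directed)  # both relations are on V
print(is_symmetric(R_undirected))  # True
print(is_symmetric(R_directed))    # False
```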
Graphs in bioinformatics
• Sequences
  - DNA, proteins, etc.
• Chemical compounds
• Metabolic pathways
• Phylogenetic trees
“Travel” in graphs
- path: no vertex can be repeated (example path: a-b-c-d-e)
- trail: no edge can be repeated (example trail: a-b-c-d-e-b-d)
- walk: no restriction (example walk: a-b-d-a-b-c)
- closed: the starting vertex is also the ending vertex
- length: the number of edges in the path, trail, or walk
- circuit: a closed trail (ex: a-b-c-d-b-e-d-a)
- cycle: a closed path (ex: a-b-c-d-a)
[Figure: example graph on vertices a, b, c, d, e]
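These definitions can be checked mechanically. The sketch below classifies a vertex sequence such as `a-b-c-d-e`. The slide's figure does not survive, so the edge set `EDGES` (ab, bc, cd, de, be, bd, da) is an assumption inferred from the example sequences, and the function name `classify` is hypothetical.

```python
# Assumed edge set of the example graph, inferred from the example walks above.
EDGES = {frozenset(p) for p in ["ab", "bc", "cd", "de", "be", "bd", "da"]}


def classify(seq):
    """Classify a vertex sequence like "a-b-c-d-e" as path, trail, walk, cycle, or circuit."""
    vs = seq.split("-")
    steps = [frozenset((u, v)) for u, v in zip(vs, vs[1:])]
    if not all(s in EDGES for s in steps):
        return "not a walk"  # some consecutive pair is not joined by an edge
    closed = len(vs) > 1 and vs[0] == vs[-1]
    edges_distinct = len(set(steps)) == len(steps)
    # In a closed sequence the start vertex reappears at the end; ignore that one repeat.
    inner = vs[:-1] if closed else vs
    vertices_distinct = len(set(inner)) == len(inner)
    if vertices_distinct and edges_distinct:
        kind = "cycle" if closed else "path"
    elif edges_distinct:
        kind = "circuit" if closed else "trail"
    else:
        kind = "closed walk" if closed else "walk"
    return f"{kind} (length {len(steps)})"  # length = number of edges used


for example in ["a-b-c-d-e", "a-b-c-d-e-b-d", "a-b-d-a-b-c", "a-b-c-d-b-e-d-a", "a-b-c-d-a"]:
    print(example, "->", classify(example))
```

Under the assumed edge set, the five examples are reported as a path (length 4), trail (length 6), walk (length 5), circuit (length 7), and cycle (length 4), matching the labels in the list.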
Types of graphs
- simple graph: an undirected graph with no loops or multiple edges between the same two vertices
- multi-graph: any graph that is not simple
- connected graph: every pair of vertices is joined by a path
- disconnected graph: at least one pair of vertices is not joined by a path
- complete graph: every pair of distinct vertices is adjacent
- K_n: the complete graph with n vertices
[Figures: a simple graph on vertices a to e; K5; a disconnected graph with two components]
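Connectivity can be tested by running a breadth-first search from each unvisited vertex and counting the components found; completeness only requires checking every pair of distinct vertices. This sketch assumes an undirected graph given as a list of vertex pairs, and the disconnected edge list `two_parts` is a made-up example, not the graph drawn on the slide.

```python
from collections import deque
from itertools import combinations


def components(vertices, edges):
    """Return the connected components of an undirected graph via breadth-first search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def is_connected(vertices, edges):
    return len(components(vertices, edges)) == 1


def is_complete(vertices, edges):
    """True if every pair of distinct vertices is adjacent (the graph is K_n)."""
    E = {frozenset(e) for e in edges}
    return all(frozenset(p) in E for p in combinations(vertices, 2))


V = "abcde"
K5 = list(combinations(V, 2))                     # all 10 vertex pairs
two_parts = [("a", "b"), ("b", "c"), ("d", "e")]  # hypothetical disconnected graph
print(is_complete(V, K5), is_connected(V, K5))    # True True
print(len(components(V, two_parts)))              # 2
```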
Digraph definitions
• for digraphs only…
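• Every edge has a head (starting point) and a tail (ending point). Caution: many graph theory texts use the reverse naming (tail = starting point, head = ending point, matching the arrowhead).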
• Walks, trails, and paths can only use
edges in the appropriate direction
• In a DAG (directed acyclic graph), every path connects a predecessor/ancestor (the vertex at the head of the path) to a successor/descendant (the vertex at the tail of the path).
• parent: direct ancestor (one hop)
• child: direct descendant (one hop)
• A descendant vertex is reachable from any of its ancestor vertices
[Figure: a directed graph on vertices a, b, c, d, u, v, w, x, y, z]
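Ancestor and descendant relations can be computed by following edges forward from a vertex. Acyclicity itself can be checked with Kahn's topological-sort method, a standard technique the slide does not describe. The five-edge DAG below is hypothetical, since the edges of the pictured digraph are not recoverable, and the linear scans in `children`/`parents` favor brevity over speed.

```python
from collections import deque

# Hypothetical DAG; each edge is an ordered pair (start, end).
VERTICES = "abcde"
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def children(v, edges):
    """Direct descendants: one hop along an edge out of v."""
    return [w for u, w in edges if u == v]


def parents(v, edges):
    """Direct ancestors: one hop back along an edge into v."""
    return [u for u, w in edges if w == v]


def descendants(v, edges):
    """Every vertex reachable from v by following edges in their direction (iterative DFS)."""
    found, stack = set(), [v]
    while stack:
        u = stack.pop()
        for w in children(u, edges):
            if w not in found:
                found.add(w)
                stack.append(w)
    return found


def is_acyclic(vertices, edges):
    """Kahn's algorithm: repeatedly remove vertices that have no incoming edges."""
    indeg = {v: 0 for v in vertices}
    for _, w in edges:
        indeg[w] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for w in children(u, edges):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(vertices)  # leftover vertices lie on, or downstream of, a cycle


print(sorted(descendants("a", EDGES)))             # ['b', 'c', 'd', 'e']
print(parents("d", EDGES))                         # ['b', 'c']
print(is_acyclic(VERTICES, EDGES))                 # True
print(is_acyclic(VERTICES, EDGES + [("e", "a")]))  # False
```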
Computer representation
- undirected graphs: usually represented as digraphs with two directed edges per “actual” undirected edge.
- adjacency matrix: a |V| × |V| array where each cell (i, j) contains the weight of the edge between v_i and v_j (or 0 for no edge)
- adjacency list: a |V| array where each cell i contains a list of all vertices adjacent to v_i
- incidence matrix: a |V| × |E| array where each cell (i, j) contains a weight (or a defined constant HEAD for unweighted graphs) if vertex i is the head of edge j, or a constant TAIL if vertex i is the tail of edge j
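Example: weighted digraph with edges a→c (8), a→d (4), c→b (6), d→c (2), d→b (10)

adjacency matrix (row = starting vertex, column = ending vertex)
     a    b    c    d
a    0    0    8    4
b    0    0    0    0
c    0    6    0    0
d    0   10    2    0

adjacency list
a: c (8), d (4)
b: (none)
c: b (6)
d: c (2), b (10)

incidence matrix (columns = edges; weight in the head row, t = TAIL, - = not incident)
     a→c  a→d  c→b  d→c  d→b
a     8    4    -    -    -
b     -    -    t    -    t
c     t    -    6    t    -
d     -    t    -    2   10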
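All three representations of the example digraph can be built directly from an edge list. This is a sketch under stated assumptions: the `(start, end, weight)` list is read off the adjacency list above, `TAIL = "t"` mirrors the slide's marker, and `None` stands for an empty incidence cell.

```python
# Weighted digraph from the example: (start, end, weight).
V = ["a", "b", "c", "d"]
E = [("a", "c", 8), ("a", "d", 4), ("c", "b", 6), ("d", "c", 2), ("d", "b", 10)]
idx = {v: i for i, v in enumerate(V)}

# Adjacency matrix: |V| x |V|; cell [i][j] holds the weight of edge v_i -> v_j, or 0.
matrix = [[0] * len(V) for _ in V]
for u, v, w in E:
    matrix[idx[u]][idx[v]] = w

# Adjacency list: one list of (neighbor, weight) pairs per vertex.
adj = {v: [] for v in V}
for u, v, w in E:
    adj[u].append((v, w))

# Incidence matrix: |V| x |E|. Following the slide's convention, the starting vertex
# (called the head here) stores the weight and the ending vertex stores TAIL.
TAIL = "t"
incidence = [[None] * len(E) for _ in V]
for j, (u, v, w) in enumerate(E):
    incidence[idx[u]][j] = w
    incidence[idx[v]][j] = TAIL

for v in V:
    print(v, matrix[idx[v]], adj[v], incidence[idx[v]])
```

The matrix always occupies |V|² cells, while the adjacency list grows with |V| + |E|, which is why adjacency lists are the usual choice for sparse graphs.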
Subgraphs
• G’(V’, E’) is a subgraph of G(V, E) if V’ ⊆ V, E’ ⊆ E, and every edge in E’ has both endpoints in V’.
• induced subgraph: a subgraph that contains every edge in E whose endpoints both lie in the selected vertex set V’
[Figures: G(V,E); the subgraph G’({a,c,d},{{c,d}}); the induced subgraph of G with V’ = {b,c,d,e}]
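A sketch of both definitions for an undirected graph whose edges are frozensets. The edge set of G on the slide is not recoverable, so `E` below is hypothetical; it is chosen so that G’({a,c,d},{{c,d}}) is a subgraph but not an induced subgraph, because the edge {a,c} is left out of E’.

```python
# Hypothetical G(V, E); undirected edges stored as frozensets.
V = set("abcde")
E = {frozenset(p) for p in ["ab", "bc", "cd", "de", "be", "ac"]}


def is_subgraph(sub_v, sub_e, v, e):
    """V' ⊆ V, E' ⊆ E, and every edge of E' has both endpoints in V'."""
    return sub_v <= v and sub_e <= e and all(x <= sub_v for x in sub_e)


def induced_subgraph(keep, e):
    """Keep the selected vertices and every edge of E whose endpoints both lie among them."""
    keep = set(keep)
    return keep, {x for x in e if x <= keep}


G1_v, G1_e = {"a", "c", "d"}, {frozenset("cd")}
print(is_subgraph(G1_v, G1_e, V, E))              # True
print(induced_subgraph(G1_v, E) == (G1_v, G1_e))  # False: {a,c} is also an edge of G
print(len(induced_subgraph("bcde", E)[1]))        # 4 (edges bc, cd, de, be)
```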
Complement of a graph
• The complement of a graph G(V, E) is a graph with the same vertex set in which two distinct vertices are adjacent if and only if they are not adjacent in G(V, E)
[Figure: a graph G on vertices a, b, c, d, e and its complement]
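Because the complement keeps the vertex set and flips adjacency for every pair of distinct vertices, its edge set is simply all vertex pairs minus E. A minimal sketch, assuming a hypothetical path graph a-b-c-d-e as G:

```python
from itertools import combinations


def complement(vertices, edges):
    """Return the edge set of the complement: all pairs of distinct vertices not in E."""
    E = {frozenset(e) for e in edges}
    all_pairs = {frozenset(p) for p in combinations(vertices, 2)}
    return all_pairs - E


V = "abcde"
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]  # hypothetical G
comp = complement(V, path)
print(len(comp))                                            # 6 = 10 possible pairs - 4 edges
print(complement(V, comp) == {frozenset(e) for e in path})  # True: complementing twice gives G back
```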
Dijkstra’s Algorithm
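• D(x) = current estimate of the distance from s to x (initially D(s) = 0 and D(x) = ∞ for every other vertex; all edge weights must be non-negative)
1. Select the unprocessed vertex closest to s, according to the current estimates (call it c), and mark it as processed
2. Recompute the estimate for every other unprocessed vertex, x, as the MINIMUM of:
- The current estimate D(x), or
- The distance from s to c, plus the distance from c to x: D(c) + W(c, x), where W(c, x) = ∞ if there is no edge from c to x
3. Repeat steps 1 and 2 until every vertex has been processed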
Dijkstra’s Algorithm Example
            A    B    C    D    E
Initial     0    ∞    ∞    ∞    ∞
Process A   0   10    3   20    ∞
Process C   0    5    3   20   18
Process B   0    5    3   10   18
Process D   0    5    3   10   18
Process E   0    5    3   10   18
[Figure: example graph on vertices A, B, C, D, E with edge weights 10, 5, 20, 3, 2, 15, 11]
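The steps above can be sketched in Python with a binary heap (`heapq`) replacing the linear scan for "select the closest vertex"; stale heap entries are skipped instead of being updated in place. The figure does not say which edge carries which weight, so the undirected edge placement below is an assumption, reconstructed so that it reproduces the table.

```python
import heapq


def dijkstra(graph, s):
    """Shortest distances from s; graph maps vertex -> list of (neighbor, weight >= 0)."""
    D = {v: float("inf") for v in graph}
    D[s] = 0
    done = set()
    heap = [(0, s)]
    while heap:
        d, c = heapq.heappop(heap)  # step 1: closest unprocessed vertex
        if c in done:
            continue                # stale entry left over from an earlier estimate
        done.add(c)
        for x, w in graph[c]:       # step 2: D(x) = min(D(x), D(c) + W(c, x))
            if d + w < D[x]:
                D[x] = d + w
                heapq.heappush(heap, (D[x], x))
    return D


# Assumed edge placement for the example, treated as undirected.
pairs = [("A", "B", 10), ("A", "C", 3), ("A", "D", 20), ("C", "B", 2),
         ("C", "E", 15), ("B", "D", 5), ("D", "E", 11)]
graph = {v: [] for v in "ABCDE"}
for u, v, w in pairs:
    graph[u].append((v, w))
    graph[v].append((u, w))

print(dijkstra(graph, "A"))  # {'A': 0, 'B': 5, 'C': 3, 'D': 10, 'E': 18}
```

The processing order is A, C, B, D, E, as in the table, and the final distances match its last row.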
Famous problems: Maximal clique
- clique: a complete subgraph
- maximal clique: a clique not contained in any larger clique (the largest clique in the graph is called a maximum clique)
- Vertex cover: a subset of vertices such that each edge in E has at least one end-point in the subset
- clique cover: a division of the vertex set into subsets, not necessarily disjoint, each of which induces a clique
- clique partition: a disjoint clique cover
Maximal cliques: {1,2,3}, {1,3,4}
Vertex cover: {1,3}
Clique cover: { {1,2,3}, {1,3,4} }
Clique partition: { {1,2,3}, {4} }
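Maximal cliques can be enumerated with the Bron-Kerbosch algorithm, a standard method the slide does not name; the basic version without pivoting is sketched below. The edge list is inferred from the example (vertices 1 to 4, where 2 and 4 are the only non-adjacent pair) and should be read as an assumption.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Report every maximal clique that extends R using vertices from P, excluding X."""
    if not P and not X:
        out.append(R)  # R cannot be extended, so it is maximal
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}  # v has been fully explored, so later branches
        X = X | {v}  # must not report cliques containing it again


def is_vertex_cover(S, edges):
    """Every edge needs at least one endpoint in S."""
    return all(u in S or v in S for u, v in edges)


edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4)]  # inferred from the example
adj = {v: set() for v in range(1, 5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(sorted(c) for c in cliques))  # [[1, 2, 3], [1, 3, 4]]
print(is_vertex_cover({1, 3}, edges))      # True
```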
Famous problems: Coloring
- vertex coloring: labeling the vertices such that no edge in E has two endpoints with the same label
- chromatic number: the smallest number of labels for a coloring of a graph
- What is the chromatic number of this graph?
- Would you believe that this problem (in general) is intractable? Computing the chromatic number is NP-hard, and even deciding whether a graph can be colored with 3 labels is NP-complete.
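A brute-force sketch makes the difficulty concrete: it tries k = 1, 2, 3, and so on, and for each k checks all kⁿ labelings of the n vertices, so the running time grows exponentially. The graph pictured on this slide is not recoverable, so the example reuses the 4-vertex graph inferred from the clique slide, as an assumption.

```python
from itertools import product


def is_proper(coloring, edges):
    """A vertex coloring is proper if no edge joins two vertices with the same label."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Smallest k with a proper k-coloring, found by exhaustive search (exponential time)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for labels in product(range(k), repeat=len(vertices)):
            if is_proper(dict(zip(vertices, labels)), edges):
                return k
    return 0  # only reached for a graph with no vertices


edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4)]
print(chromatic_number([1, 2, 3, 4], edges))  # 3: the triangle {1,2,3} needs three labels
```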