Release .10
Introduction:
In the world of data analysis, graphs are essential data structures that professionals rely on to efficiently organize and understand complex information. This post explores the professional application of graphs as data structures, highlighting their role in simplifying data analysis and problem-solving.
Graphs:
Graphs can be used to represent many interesting things in the real world. Flights from cities to cities, rods connecting various town and cities.
Once we have a good representation of the map, then we use a standard graph algorithms to solve many interesting problems of real life.
Definition and Terminologies:
A Graph is represented by G where G = (V, E), where V is a finite set of points called Vertices and E is a finite set of Edges.
Each edge is a tuple (u, v) where u, v β V. There can be a third component weight to the tuple.
Weight is cost to go from one vertex to another.
Edge in a graph can be of two types:
- If the edges of graph are one way it is called Directed graph or Digraph.
- The graph whose edges are two ways are called Undirected graph or just graph.
A Path is a sequence of edges between two vertices. The length of a path is defined as the sum of the weight of all the edges in the path.
Two vertices u and v are adjacent if there is an edge whose endpoints are u and v.
In the below graph:
V = { V1, V2, V3, V4, V5, V6, V7, V8, V9 }
In-Degree and Out-Degree:
The in-degree of a vertex v, Doneted by indeg(v) is the number of incoming edges to the vertex v.
The out-degree of a vertex v, Doneted by outdeg(v) is the number of outgoing edges of a vertex v.
The degree of a vertex v, Denoted by deg(v) is the total number of edges whose one endpoint is v.
deg(v) = Indeg (v) + outdeg (v)
In the above graph
deg(V4)=3, indeg(V4)=2 and outdeg(V4)=1
A Cycle is a path that starts and ends at the same vertex and include at least one vertex.
An edge is a Self-Loop if two if its two endpoints coincide. This is a form of a cycle.
A vertex v is Reachable from vertex u or βu reaches vβ if there is a path from u to v. In an undirected graph if v is reachable from u then u is reachable from v. However, in a directed graph it is possible that u reaches v but there is no path from v to u.
A graph is Connected if for any two vertices there is a path between them.
A Forest is a graph without cycles.
A Sub-Graph of a graph G is a graph whose vertices and edges are a subset of the vertices and edges of G.
A Spanning Sub-Graph of G is a graph that connects all the vertices of G.
A Tree is a acyclic connected graph.
A Spanning tree of a graph is a spanning sub-graph, which is also a tree that means, a connected graph, which connects all the vertices of graph and that, does not have a cycle.
Representation:
- Adjacency Matrix: One of the ways to represent a graph is to use two-dimensional matrix.
Each combination of row and column represent a vertex in the graph. The value stored at the location row v and column w is the edge
from vertex v to vertex w.
The nodes that are connected by an edge are called adjacent nodes.
This matrix is used to store adjacent relation so it is called the Adjacency Matrix.
Adjancency List:
Another efficient way of storing a graph is this.
In adjacency list of pointers to a linked list node.
Each pointer corresponds to vertices in a graph. Each pointer will then point to the vertices, which are connected to it and store this as a list.
The adjacency list helps us to represent a sparse graph.
An adjacency list representation also allows us to find all the vertices that are directly connected to any vertices by just one link list scan.
In all our programs, we are going to use the adjacency list to store the graph.
Graph Traversals:
The Depth first search ( DFS ) and Breadth first seach ( BFS ) are the 2 algorithms used to traverse a graph.
Traversal here has the same definition that you've read so far.
- BFS: In this we start with a node and start exploring it's connected nodes. The same process is repeated with all the connecting nodes until all the nodes are visited.
In BFS algorithm, a graph is traversed in layer-by-layer fashion. The graph is traversed closer to the starting point. The queue is used to implement BFS.
Remember that there can be multiple BFS results for a graph.
- DFS: In this we start with a node and start exploring it's connected nodes keeping on suspending the exploration of previous nodes.
The DFS algorithm we start from starting point and go into depth of graph until we reach a dead end and then move up to parent node (Backtrack).
In DFS, we use stack to get the next vertex to start a search. Alternatively, we can use recursion (system stack) to do the same.
Minimum Spanning Trees ( MST ):
A Spanning Tree of a graph G is a tree that contains all the edges of the Graph G.
A Minimum Spanning Tree is a tree whose sum of length/weight of edges is minimum as possible.
For example, if you want to setup communication between a set of cities, then you may want to use the least amount of wire as possible. MST can be used to find the network path and wire cost estimate.
Prim's Algorithm:
Primβs algorithm grows a single tree T, one edge at a time, until it becomes a spanning tree.
We initialize T with zero edges. U with single node. Where T is spanning tree edges set and U is spanning tree vertex set.
At each step, Primβs algorithm adds the smallest value edge with one endpoint in U and other not in us.
Since each edge adds one new vertex to U, after n β 1 additions, U contain all the vertices of the spanning tree and T becomes a spanning tree.
Dijkstraβs algorithm:
Dijkstra's algorithm is like finding the shortest way to get from one place to all other places on a map. You start at one location and keep moving to nearby locations, choosing the shortest path each time.
Given a weighted connected graph G, find shortest paths from the source vertex s to each of the other vertices. Dijkstraβs algorithm is similar to prims algorithm. It maintains a set of nodes for which shortest
path is known.
The algorithm starts by keeping track of the distance of each node and its parents.
All the distance is set to infinite in the beginning as we do not know the actual path to the nodes and parents of all the vertices are
set to null. All the vertices are added to a priority queue.
At each step algorithm takes one vertex from the priority queue (which will be the source vertex in the beginning).
Then update the distance array corresponding to all the adjacent vertices. When the queue is empty, then we will have the distance and parent array fully populated.
With this post, we conclude our series on the professional use of graphs as data structures.
In the final release, I'll provide a concise summary of the key insights and essential information you need to excel in this field.
I hope this post was informative and helpful.
If you have any questions, please feel free to leave a comment below.
Happy Coding ππ»!
Thank You
Top comments (0)