Graphviz Diagram Creation: DOT Language Syntax and Practical Examples
Graphviz Overview
Graphviz is an automated graph layout engine capable of generating various output formats including PNG, PDF, and SVG. The tool uses the DOT language to define graphs, nodes, and edges with customizable properties.
Table of Contents
- DOT language fundamentals
- Common graphical attributes
- Practical examples
- VSCode preview integration
- Additional resources
DOT Language Structure
Graphviz builds diagrams from three core components: graphs, nodes, and edges. Each element accepts properties that control its visual representation.
The following BNF-style notation describes the DOT language grammar:
graph      : [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list  : [ stmt [ ';' ] stmt_list ]
stmt       : node_stmt | edge_stmt | attr_stmt | ID '=' ID | subgraph
attr_stmt  : (graph | node | edge) attr_list
attr_list  : '[' [ a_list ] ']' [ attr_list ]
a_list     : ID '=' ID [ (';' | ',') ] [ a_list ]
edge_stmt  : (node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS    : edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt  : node_id [ attr_list ]
node_id    : ID [ port ]
port       : ':' ID [ ':' compass_pt ] | ':' compass_pt
subgraph   : [ subgraph [ ID ] ] '{' stmt_list '}'
compass_pt : (n | ne | e | se | s | sw | w | nw | c | _)
An ID is a string used for graph, node, and attribute names and for attribute values. It takes one of four forms:
- An alphanumeric string of letters ([a-zA-Z\200-\377]), underscores, and digits that does not begin with a digit
- A numeral: [-]?(.[0-9]+ | [0-9]+(.[0-9]*)?)
- A double-quoted string ("..."), which may contain escaped quotes (\")
- An HTML string (<...>)
In the grammar, edgeop is -> in directed graphs and -- in undirected graphs.
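A small graph that uses each ID form; the graph and node names here are made up for illustration and do not come from the original text:

```dot
digraph id_forms {
    // alphanumeric ID: letters, digits, and underscores, not starting with a digit
    item_2
    // numeral ID
    3.14
    // double-quoted ID: may contain spaces and escaped quotes
    "say \"hi\""
    // HTML string used as a label value; note the outer < > delimiters
    html_node [label = <<B>bold</B> and plain text>]
    item_2 -> 3.14 -> "say \"hi\"" -> html_node
}
```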
DOT Language Keywords
- strict: Prevents duplicate parallel edges in the graph
- graph: Declares an undirected graph
- digraph: Declares a directed graph
- node: Applies default attributes to all subsequent nodes
- edge: Applies default attributes to all subsequent edges
- subgraph: Groups elements for organization or layout control
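A minimal sketch that exercises each keyword; the node names are hypothetical. With strict, the second a -> b statement should be merged into the existing edge rather than drawn as a parallel one:

```dot
strict digraph keywords_demo {
    // 'node' and 'edge' set defaults for everything declared after them
    node [shape = box]
    edge [color = gray]
    a -> b
    // duplicate edge: merged into the first a -> b because the graph is strict
    a -> b
    // anonymous 'subgraph' grouping b and c on the same rank
    subgraph { rank = same; b; c }
    a -> c
}
```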
Key Structural Observations
- Every graph body is enclosed in braces following the graph or digraph keyword
- The statement list can be empty or contain multiple statements; semicolons are optional
- Statement types include:
- node declarations
- edge declarations
- subgraph declarations
- attribute lists
- ID assignment statements
- Attribute lists are enclosed in square brackets and may be empty
- Individual attributes use the key = value form, separated by optional commas or semicolons
- An edge statement can chain multiple nodes or subgraphs and takes a single optional attribute list that applies to every edge in the chain
- Node IDs can carry a port (node:port:compass) so that edges attach to a specific point on the node
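The observations above can be seen together in one short graph. This is an illustrative sketch with hypothetical node names, one statement per grammar rule:

```dot
digraph observations {
    // ID '=' ID statement: sets a graph attribute
    rankdir = LR
    // attr_stmt: default attributes for nodes declared afterwards; the semicolon is optional
    node [shape = box];
    // node_stmt: attributes separated by a comma or by plain whitespace
    start [label = "Start", color = blue fontsize = 12]
    // edge_stmt: one attribute list styles every edge in the chain
    start -> parse -> finish [color = gray]
    // a port on the node ID: the edge leaves 'parse' from its south side
    parse:s -> log
    // a subgraph as the edge head: creates finish -> x and finish -> y
    finish -> { x y }
}
```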
Node Ports and Compass Directions
Nodes can define connection points using compass directions:
digraph connections {
element0 [shape = record, label = "<top> header|<middle> content|<bottom> footer", height=.5]
element1 [shape = box label="target"]
element0:top:n -> element1:n [label = "n"]
element0:top:ne -> element1:ne [label = "ne"]
element0:top:e -> element1:e [label = "e"]
element0:top:se -> element1:se [label = "se"]
element0:top:s -> element1:s [label = "s"]
element0:top:sw -> element1:sw [label = "sw"]
element0:top:w -> element1:w [label = "w"]
element0:top:nw -> element1:nw [label = "nw"]
element0:top:c -> element1:c [label = "c"]
element0:top:_ -> element1:_ [label = "_"]
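// record fields cannot be styled individually: this statement styles the whole element0 node
element0:middle [style = filled, color = lightblue]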
}
Direction | Meaning
- | -
n | north
ne | northeast
e | east
se | southeast
s | south
sw | southwest
w | west
nw | northwest
c | center
_ | any available position
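Compass points also work on ordinary, non-record nodes. The sketch below (hypothetical node names) shows that the same attachment can be written either with the node:compass syntax or with the tailport and headport edge attributes:

```dot
digraph compass_plain {
    rankdir = LR
    a -> b                                // dot chooses the attachment points
    a:s -> c:n                            // leave a from its south side, enter c from its north side
    a -> d [tailport = s, headport = n]   // same attachment as above, written as edge attributes
}
```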
Graphical Attributes
Attributes can be set at three levels: on the graph (including its subgraphs and clusters), on nodes, and on edges.
Global Attribute Defaults
Setting default attributes eliminates repetitive per-element configuration:
digraph demo {
rankdir = LR
node [shape=box color=blue]
edge [color=red]
item_a
item_b [color=lightblue]
item_a -> item_b
item_b -> item_a [color=green]
}
All elements defined after the defaults inherit those properties unless overridden.
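Because defaults apply only to elements declared after them, declaration order matters. A minimal sketch, with hypothetical node names, of how defaults are scoped by order and by subgraph:

```dot
digraph default_scope {
    early                    // declared before any default: keeps the built-in ellipse
    node [shape = box]
    late                     // declared after the default: drawn as a box
    subgraph scoped {
        node [color = red]   // a default set inside a subgraph stays inside it
        inner                // red box
    }
    outer                    // box, but not red
    early -> late -> inner -> outer
}
```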
Record-Based Node Structures
Complex internal structures can be rendered using record shapes:
digraph hierarchical {
node [shape =record, charset = "UTF-8" fontname = "Microsoft YaHei", fontsize = 14]
country [label = " Country | { Province | { City | { District | { Landmark | { Pavilion | Bridge } | Wetland} | Yuhang } | Ningbo}}"]
pool_struct [
color = "cornflowerblue"
label = "pool_struct | {
{d | {
*last |
*end |
*next |
failed
}}|
*max |
*current |
*chain |
*cleanup |
*log
}"
]
}
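The record label syntax used above reduces to three rules: | separates fields, { } flips the layout direction of the fields inside, and <name> gives a field a port that edges can target. Characters such as {, }, |, < and > must be backslash-escaped to appear literally. A small sketch with hypothetical names:

```dot
digraph record_syntax {
    node [shape = record]
    // top-level fields run horizontally (TB layout); the braced pair is stacked vertically
    parent [label = "<l> left | { top | <m> bottom } | <r> right"]
    parent:l -> child_a
    parent:m -> child_b
    parent:r -> child_c
}
```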
Frequently Used Attributes
- charset: Character encoding of the input; a graph-level attribute whose default is "UTF-8"
- fontname: Font family; use "Microsoft YaHei" for CJK characters to avoid rendering issues
- fontcolor: Text color
- fontsize: Text size
- fillcolor: Background fill for nodes and clusters
- size: Maximum drawing width and height, in inches
- label: Text label displayed on elements
- margin: For the graph, the canvas margin in inches; for nodes, the space around the label
- pad: Extra space, in inches, added around the minimal area needed to draw the graph
- style: Element styling (e.g., filled for solid backgrounds)
- rankdir: Graph orientation: "TB" (top-bottom, the default), "LR" (left-right), "BT" (bottom-top), "RL" (right-left)
- ranksep: Minimum separation between adjacent ranks, in inches (vertical in a TB layout)
- ratio: Aspect ratio (height/width) of the drawing
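Several of these attributes combined in one graph; the values and node names are arbitrary choices for illustration:

```dot
digraph attribute_tour {
    rankdir = LR          // left-to-right layout
    ranksep = 0.8         // inches between ranks (horizontal here because of LR)
    pad = 0.3             // extra inches around the drawing
    size = "4,3"          // maximum drawing size in inches (width,height)
    label = "attribute tour"
    node [fontname = "Helvetica", fontsize = 12, fontcolor = navy,
          style = filled, fillcolor = lightyellow]
    start [label = "Start"]
    done  [label = "Done"]
    start -> done [label = "go"]
}
```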
Node Properties
Default node properties are shape = ellipse, width = 0.75, and height = 0.5, and the node's ID serves as its display label.
Common node attributes:
- shape: Element shape (box, ellipse, record, etc.)
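- width/height: Node dimensions in inches; they act as minimums unless fixedsize = true, in which case they are the exact size
- fixedsize: When false (the default), the node grows as needed to fit its label
- rank: Set on a subgraph to constrain the rank (the row, in a TB layout) of its nodes:
- same: All nodes share the same rank
- min: All nodes are on the minimum rank
- source: All nodes are on the minimum rank, and only nodes in source or min subgraphs may share that rank
- max: All nodes are on the maximum rank
- sink: All nodes are on the maximum rank, and only nodes in sink or max subgraphs may share that rank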
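A sketch contrasting fixedsize with the default growth behavior and applying two rank constraints; all names are hypothetical:

```dot
digraph node_properties {
    exact [shape = box, width = 1.2, height = 0.4, fixedsize = true, label = "fixed"]
    grows [shape = box, width = 1.2, height = 0.4,
           label = "fixedsize is false, so this box widens to fit"]
    root -> exact -> deep
    root -> grows
    // 'grows' would normally sit one rank below 'root'; this pulls it down to the rank of 'deep'
    { rank = same; grows; deep }
    // 'legend' is forced onto the minimum (top) rank
    { rank = min; legend }
}
```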
Edge Properties
Directed graphs use -> while undirected graphs use --. Multiple connections can be chained with a single attribute list:
digraph {
rankdir = LR
source -> middle -> dest[color=green]
}
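For comparison, the undirected form uses graph instead of digraph and -- instead of ->; the node names are hypothetical:

```dot
graph undirected_chain {
    // the attribute list applies to both edges, a -- b and b -- c
    a -- b -- c [color = green]
    // equivalent to writing each edge separately
    d -- e [color = blue]
    e -- f [color = blue]
}
```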
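The splines graph attribute changes how edges are routed; this example draws orthogonal edges. Relevant edge attributes are listed after it: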
digraph {
rankdir = LR
splines = ortho
a -> b -> c -> d -> f [color = green]
e -> f -> b -> d [color = blue]
b -> e -> h[color = red]
}
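- len: Preferred edge length in inches (used by neato and fdp; dot ignores it)
- weight: Influences edge straightness; higher values produce shorter, straighter edges
- lhead: Logical head of the edge: the name of a cluster at whose boundary the edge is clipped (requires compound = true)
- ltail: Logical tail of the edge: the name of a cluster at whose boundary the edge starts (requires compound = true)
- headlabel: Label near the arrowhead
- taillabel: Label near the tail
- splines: Edge routing style, set on the graph:
- none: No edges drawn
- true/spline: Splines routed around nodes
- false/line: Straight line segments
- polyline: Polylines (straight segments with bends) routed around nodes
- curved: Arc curves
- ortho: Axis-aligned segments (horizontal and vertical only)
- dir: Which ends get arrowheads: forward (the default for directed graphs), back, both, or none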
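A sketch combining several of these edge attributes; the node names and weight value are hypothetical:

```dot
digraph edge_attributes {
    splines = polyline                   // graph attribute: route edges as polylines
    client -> server [dir = both, taillabel = "request", headlabel = "reply"]
    server -> db [weight = 8]            // heavier edge: kept shorter and straighter
    server -> cache [dir = none, style = dashed]
    cache -> db [dir = back]             // arrowhead drawn at the tail (cache) end
}
```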
Subgraph and Cluster Usage
A subgraph is drawn with a boundary only when its name starts with cluster. Setting compound = true additionally lets lhead and ltail clip edges at cluster boundaries:
digraph G {
compound = true
ranksep = 1
node [shape = record]
subgraph cluster_hardware {
label = "hardware"
color = lightblue
CPU Memory
}
subgraph cluster_kernel {
label = "kernel"
color = green
Init IPC
}
subgraph cluster_libc {
label = "libc"
color = yellow
glibc
}
CPU -> Init [lhead = cluster_kernel ltail = cluster_hardware]
IPC -> glibc [lhead = cluster_libc ltail = cluster_kernel]
}
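To see the effect of the cluster prefix directly, the sketch below (hypothetical names) nests one filled cluster inside another and places a third group in an ordinary subgraph, whose label and boundary are not drawn:

```dot
digraph clusters_vs_subgraphs {
    subgraph cluster_outer {
        label = "cluster_outer"
        style = filled
        fillcolor = lightgrey
        subgraph cluster_inner {      // clusters can nest
            label = "cluster_inner"
            style = filled
            fillcolor = white
            x
        }
        y
    }
    subgraph plain_group {            // no 'cluster' prefix: no box and no label are drawn
        label = "never shown"
        z
    }
    x -> y -> z
}
```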
Practical Examples
TCP/IP State Transition Diagram
Two approaches demonstrate different layout strategies:
digraph tcp_states {
compound=true
fontsize=10
margin="0,0"
ranksep = .75
nodesep = .65
node [shape=Mrecord fontname="Inconsolata, Consolas", fontsize=12, penwidth=0.5]
edge [fontname="Inconsolata, Consolas", fontsize=10, arrowhead=normal]
"State Machine" [shape = "plaintext", fontsize = 16]
"CLOSED" -> "LISTEN" [style = bold, label = "Passive open\nSend: <none>"];
"LISTEN" -> "SYN_RECV" [style = bold, label = "Recv: SYN\nSend: SYN,ACK"]
"SYN_RECV" -> "ESTABLISHED" [style = bold, label = "Recv: ACK\nSend: <none>", weight = 20]
"ESTABLISHED" -> "CLOSE_WAIT" [style = bold, label = "Recv: FIN\nSend: ACK", weight = 20]
subgraph cluster_passive_close {
style = dotted
margin = 10
passive_close [shape = plaintext, label = "Passive Close", fontsize = 14]
"CLOSE_WAIT" -> "LAST_ACK" [style = bold, label = "App: close\nSend: FIN", weight = 10]
}
"LAST_ACK" -> "CLOSED" [style = bold, label = "Recv: ACK\nSend: <none>"]
"CLOSED" -> "SYN_SENT" [style = dashed, label = "App: active open\nSend: SYN"];
"SYN_SENT" -> "ESTABLISHED" [style = dashed, label = "Recv: SYN,ACK\nSend: ACK", weight = 25]
"SYN_SENT" -> "SYN_RECV" [style = dotted, label = "Recv: SYN\nSend: SYN,ACK\nSimultaneous open"]
"ESTABLISHED" -> "FIN_WAIT_1" [style = dashed, label = "App: close\nSend: FIN", weight = 20]
subgraph cluster_active_close {
style = dotted
margin = 10
active_close [shape = plaintext, label = "Active Close", fontsize = 14]
"FIN_WAIT_1" -> "FIN_WAIT_2" [style = dashed, label = "Recv: ACK\nSend: <none>"]
"FIN_WAIT_2" -> "TIME_WAIT" [style = dashed, label = "Recv: FIN\nSend: ACK"]
"FIN_WAIT_1" -> "CLOSING" [style = dotted, label = "Recv: FIN\nSend: ACK"]
"FIN_WAIT_1" -> "TIME_WAIT" [style = dotted, label = "Recv: FIN,ACK\nSend: ACK"]
"CLOSING" -> "TIME_WAIT" [style = dotted, label = "Recv: ACK\nSend: <none>"]
}
"TIME_WAIT" -> "CLOSED" [style = dashed, label = "2MSL timeout"]
}
A more refined version with better alignment:
digraph tcp_refined {
compound=true
margin="0,0"
ranksep = .75
nodesep = 1
pad = .5
node [shape=Mrecord, charset = "UTF-8" fontname="Microsoft YaHei", fontsize=14]
edge [charset = "UTF-8" fontname="Microsoft YaHei", fontsize=11, arrowhead = normal]
CLOSED -> LISTEN [style = dashed, label = "Passive open\nSend: <none>", weight = 100];
"State Machine" [shape = "plaintext", fontsize = 16]
{
rank = same
SYN_RCVD SYN_SENT
anchor_1 [shape = point, width = 0]
SYN_SENT -> anchor_1 [style = dotted, label = "App close or timeout"]
SYN_RCVD -> SYN_SENT [style = dotted, dir = back, headlabel = "Recv: SYN\nSend: SYN,ACK\nSimultaneous"]
}
LISTEN -> SYN_RCVD [style = dashed, headlabel = "Recv: SYN\nSend: SYN,ACK"]
SYN_RCVD -> LISTEN [style = dotted, headlabel = "Recv: RST"]
CLOSED:e -> SYN_SENT [style = bold, label = "Active open\nSend: SYN"]
{
rank = same
ESTABLISHED CLOSE_WAIT
ESTABLISHED -> CLOSE_WAIT [style = dashed, label = "Recv: FIN\nSend: ACK"]
}
SYN_RCVD -> ESTABLISHED [style = dashed, label = "Recv: ACK\nSend: <none>", weight = 9]
SYN_SENT -> ESTABLISHED [style = bold, label = "Recv: SYN,ACK\nSend: ACK", weight = 10]
{
rank = same
FIN_WAIT_1
CLOSING
LAST_ACK
anchor_2 [shape = point, width = 0]
FIN_WAIT_1 -> CLOSING [style = dotted, label = "Recv: FIN\nSend: ACK"]
LAST_ACK -> anchor_2 [style = dashed, label = "Recv: ACK\nSend: <none>"]
}
CLOSE_WAIT -> LAST_ACK [style = dashed, label = "App: close\nSend: FIN", weight = 10]
{
rank = same
FIN_WAIT_2 TIME_WAIT
anchor_3 [shape = point, width = 0]
TIME_WAIT -> anchor_3 [style = bold, label = "2MSL timeout"]
}
ESTABLISHED -> FIN_WAIT_1 [style = bold, label = "App: close\nSend: FIN"]
FIN_WAIT_1 -> FIN_WAIT_2 [style = bold, headlabel = "Recv: ACK\nSend: <none>", weight = 15]
FIN_WAIT_2 -> TIME_WAIT [style = bold, label = "Recv: FIN\nSend: ACK", weight = 10]
CLOSING -> TIME_WAIT [style = dotted, label = "Recv: ACK\nSend: <none>", weight = 15]
FIN_WAIT_1 -> TIME_WAIT [style = dotted, label = "Recv: FIN,ACK\nSend: ACK"]
anchor_3 -> anchor_2 [arrowhead = none, style = dotted, weight = 10]
anchor_2 -> anchor_1 [arrowhead = none, style = dotted]
anchor_1 -> CLOSED [style = dotted]
}
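The improved version uses rank = same to align related states horizontally, weight to keep the main paths short and straight, and zero-width point nodes (anchor_1 to anchor_3) as waypoints that route the closing transitions back to CLOSED. Note that these rank constraints may prevent wrapping some of the grouped states in cluster boundaries; this version drops the clusters used in the first one.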
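The same three techniques in isolation, as a minimal sketch with hypothetical node names and weight values:

```dot
digraph weight_and_rank {
    // a heavy weight keeps the spine a -> b -> c short and straight
    a -> b [weight = 10]
    b -> c [weight = 10]
    // the side path keeps the default weight of 1, so it bends instead of the spine
    a -> side -> c
    // put 'side' next to 'b' in the same row
    { rank = same; b; side }
    // a zero-width point node acts as an invisible waypoint on the return path
    waypoint [shape = point, width = 0]
    c -> waypoint [arrowhead = none]
    waypoint -> a [style = dotted]
}
```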
Epoll Internal Data Structures
digraph epoll_diagram {
compound=true
margin="0,0"
ranksep = .75
nodesep = 1
pad = .5
rankdir = LR
node [shape=record, charset = "UTF-8" fontname="Microsoft YaHei", fontsize=14]
edge [style = dashed, charset = "UTF-8" fontname="Microsoft YaHei", fontsize=11]
epoll [shape = plaintext, label = "Epoll Structures and Relationships"]
eventpoll [
color = cornflowerblue,
label = "<eventpoll> struct \n eventpoll |
<lock> spinlock_t lock; |
<mutex> struct mutex mtx; |
<wq> wait_queue_head_t wq; |
<poll_wait> wait_queue_head_t poll_wait; |
<rdllist> struct list_head rdllist; |
<ovflist> struct epitem *ovflist; |
<rbr> struct rb_root_cached rbr; |
<ws> struct wakeup_source *ws; |
<user> struct user_struct *user; |
<file> struct file *file; |
<visited> int visited; |
<visited_list_link> struct list_head visited_list_link;"
]
epitem [
color = sienna,
label = "<epitem> struct \n epitem |
<rb>struct rb_node rbn;\nstruct rcu_head rcu; |
<rdllink> struct list_head rdllink; |
<next> struct epitem *next; |
<ffd> struct epoll_filefd ffd; |
<nwait> int nwait; |
<pwqlist> struct list_head pwqlist; |
<ep> struct eventpoll *ep; |
<fllink> struct list_head fllink; |
<ws> struct wakeup_source __rcu *ws; |
<event> struct epoll_event event;"
]
epitem2 [
color = sienna,
label = "<epitem> struct \n epitem |
<rb>struct rb_node rbn;\nstruct rcu_head rcu; |
<rdllink> struct list_head rdllink; |
<next> struct epitem *next; |
<ep> struct eventpoll *ep; |
··· |
··· "
]
eppoll_entry [
color = darkviolet,
label = "<entry> struct \n eppoll_entry |
<llink> struct list_head llink; |
<base> struct epitem *base; |
<wait> wait_queue_entry_t wait; |
<whead> wait_queue_head_t *whead;"
]
epitem:ep -> eventpoll:se [color = sienna]
epitem2:ep -> eventpoll:se [color = sienna]
eventpoll:ovflist -> epitem:next -> epitem2:next [color = cornflowerblue]
eventpoll:rdllist -> epitem:rdllink -> epitem2:rdllink [dir = both]
eppoll_entry:llink -> epitem:pwqlist [color = darkviolet]
eppoll_entry:base -> epitem:nw [color = darkviolet]
}
Outstanding Items
- Add cluster boundaries around the active close sequence in the TCP/IP diagram
- Add TCP/IP sequence diagram elements
VSCode Preview Integration
- Download Graphviz from the official website
- Install the "Graphviz Preview" extension in VSCode
- Configure the dot executable path in settings.json:
"graphvizPreview.dotPath": "path\\to\\graphviz\\bin\\dot.exe"
- Create a new .dot file and use the preview button in the editor toolbar
Note: Automated layout works well for most cases, but manual positioning may be preferable for diagrams requiring precise element placement.
References
- Graphviz Official Documentation