Graph Algorithms in Practice
π At a Glance
| Aspect | Details |
|---|---|
| Time to Read | 35 minutes |
| Prerequisites | Parts 11-14 (Graph algorithms) |
| Key Concepts | Social Networks, PageRank, Community Detection, Recommendations |
| Difficulty | βββ (Intermediate-Advanced) |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β GRAPH ALGORITHMS IN THE REAL WORLD β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β SOCIAL NETWORKS SEARCH & RANKING β β βββββββββββββββββββ βββββββββββββββββββ β β β U1 βββ U2 β β PageRank β β β β β β² β β β βββββ β β β β β β² β β β ββ A ββ 0.35 β β β β U3 βββ U4 β β βββββ β β β β β β β β β β β β Friend-of-friendβ β βββββ βββββ β β β β recommendations β β β B β β C β β β β βββββββββββββββββββ β βββββ βββββ β β β βββββββββββββββββββ β β β β COMMUNITY DETECTION RECOMMENDATIONS β β βββββββββββββββββββ βββββββββββββββββββ β β β βββββββ βββββββ β β User β Item β β β β β β β β β β β β β β β β β A βββ B β β β Similar users β β β β β β β β β β bought this β β β β βββββββ βββββββ β β β β β β Clusters β β Collaborative β β β βββββββββββββββββββ β filtering β β β βββββββββββββββββββ β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π― What You'll Learn
After completing this article, you will be able to:
- Analyze social networks: Calculate centrality measures, find influencers
- Implement PageRank: Understand and code the algorithm behind Google
- Detect communities: Use modularity-based algorithms
- Build recommendation systems: Graph-based collaborative filtering
- Use graph databases: Basic Neo4j patterns
π₯ Production Story: The Friend Recommendation Engine
A social gaming platform grew to 50 million users. Their friend recommendation system was struggling - suggestions were irrelevant, and user engagement was dropping.
The Problem
β Original system (2018): - Simple: "People who played same game" - Result: Recommending strangers with one common game - Click-through rate: 2.1% - Friend request acceptance: 0.8%
The team realized they needed to understand the social graph structure.
The Investigation
JAVA(17 lines)CodeLoading syntax highlighter...
The Solution
They implemented a multi-signal graph-based approach:
JAVA(54 lines)CodeLoading syntax highlighter...
The Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Click-through rate | 2.1% | 8.7% | +314% |
| Friend request acceptance | 0.8% | 4.2% | +425% |
| Daily active friendships formed | 12K | 89K | +642% |
| User retention (30-day) | 34% | 47% | +38% |
π§ Mental Model: The Social Graph
Think of a social network as a city with different neighborhoods:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β SOCIAL GRAPH AS A CITY β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β CENTRALITY = IMPORTANCE β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β β β Degree: How many direct connections (popular person) β β β β π€βπ€βπ€ β β β β ββπ€ β β β β β β β β Betweenness: How many paths go through you (broker) β β β β π€ββ[π€]ββπ€ (bridge between groups) β β β β π€ π€ β β β β β β β β Closeness: How quickly you can reach everyone β β β β (central location) β β β β β β β β PageRank: Who points to you matters β β β β (endorsed by important people) β β β β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β COMMUNITIES = NEIGHBORHOODS β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β β β βββββββββββ βββββββββββ βββββββββββ β β β β β Gaming ββββββββ Bridge ββββββββ Tech β β β β β β Friends β β Person β β Friends β β β β β β π€π€π€ β β π€ β β π€π€π€ β β β β β βββββββββββ βββββββββββ βββββββββββ β β β β β β β β Dense connections within, sparse between β β β β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π¬ Deep Dive: Centrality Measures
Degree Centrality
The simplest measure - just count connections:
JAVA(34 lines)CodeLoading syntax highlighter...
Betweenness Centrality
Measures how often a node appears on shortest paths:
JAVA(90 lines)CodeLoading syntax highlighter...
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β BETWEENNESS CENTRALITY β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β A βββ B βββ C β β β β β β D βββ[E]βββ F E has high betweenness β β β β (bridge between left and right) β β G βββ H βββ I β β β β Shortest paths through E: β β AβF, AβI, BβD, BβG, CβD, CβG, etc. β β β β Betweenness(E) = Ξ£ (paths through E / total paths) β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Closeness Centrality
How close a node is to all others:
JAVA(48 lines)CodeLoading syntax highlighter...
π¬ Deep Dive: PageRank Algorithm
The Intuition
PageRank models a "random surfer" browsing the web:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β PAGERANK INTUITION β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Random Surfer Model: β β 1. Start on a random page β β 2. With probability d (0.85), follow a random link β β 3. With probability (1-d), jump to random page β β 4. PageRank = long-term probability of being on each page β β β β Example: β β βββββββββββββββββββ β β β β β β βΌ β β β A βββββ B βββββ C β β β β β β β β β βΌ β β β βββββββ D βββββββββ β β β β Page C: Only incoming from B β β Page A: Incoming from C (sole outlink), D β β A likely has higher PageRank than C β β β β Formula: β β PR(A) = (1-d)/N + d Γ Ξ£ PR(v)/out(v) β β for all v linking to A β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Implementation
JAVA(99 lines)CodeLoading syntax highlighter...
Matrix Formulation
PageRank can be computed using matrix operations:
JAVA(45 lines)CodeLoading syntax highlighter...
Personalized PageRank
For recommendations, bias the random jump toward specific nodes:
JAVA(35 lines)CodeLoading syntax highlighter...
π¬ Deep Dive: Community Detection
Modularity
Modularity measures the quality of a community partition:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β MODULARITY CONCEPT β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Q = (1/2m) Γ Ξ£ [A_ij - (k_i Γ k_j)/(2m)] Γ Ξ΄(c_i, c_j) β β β β Where: β β - A_ij: adjacency matrix (1 if edge exists) β β - k_i: degree of node i β β - m: total number of edges β β - c_i: community of node i β β - Ξ΄: 1 if same community, 0 otherwise β β β β Intuition: Q > 0 means more edges within communities β β than expected by random chance β β β β Good partition: Q β 0.3 to 0.7 β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Louvain Algorithm
Fast and effective community detection:
JAVA(127 lines)CodeLoading syntax highlighter...
Label Propagation
Simpler algorithm, often faster:
JAVA(86 lines)CodeLoading syntax highlighter...
π¬ Deep Dive: Graph-Based Recommendations
Collaborative Filtering with Graphs
JAVA(95 lines)CodeLoading syntax highlighter...
Random Walk Recommendations
JAVA(56 lines)CodeLoading syntax highlighter...
π¬ Deep Dive: Graph Databases (Neo4j Patterns)
Basic Cypher Queries
CYPHER(30 lines)CodeLoading syntax highlighter...
Java Integration with Neo4j
JAVA(51 lines)CodeLoading syntax highlighter...
β οΈ Common Mistakes
Mistake 1: PageRank Dangling Nodes
JAVA(16 lines)CodeLoading syntax highlighter...
Mistake 2: Community Detection Convergence
JAVA(15 lines)CodeLoading syntax highlighter...
Mistake 3: Jaccard Similarity Division by Zero
JAVA(6 lines)CodeLoading syntax highlighter...
π Debug This: PageRank Convergence Issue
This PageRank implementation doesn't converge properly. Can you find the bug?
JAVA(39 lines)CodeLoading syntax highlighter...
π Click to reveal the bug
rank[v] instead of rank[u] in the contribution calculation.JAVA(5 lines)CodeLoading syntax highlighter...
PR(v) = (1-d)/n + d Γ Ξ£ PR(u)/out(u) for all u linking to vrank[v] makes the update self-referential and prevents proper propagation of rank through the graph.π» Exercises
Exercise 1: Influence Maximization ββ
Find k nodes that, if they adopt a product, maximize spread through the network (greedy approximation).
JAVA(6 lines)CodeLoading syntax highlighter...
Exercise 2: Link Prediction ββ
Predict which edges are likely to form in a social network.
JAVA(9 lines)CodeLoading syntax highlighter...
Exercise 3: Community Evolution βββ
Track how communities change over time given snapshots of the network.
JAVA(7 lines)CodeLoading syntax highlighter...
Exercise 4: Personalized Recommendations βββ
Implement Personalized PageRank for item recommendations.
JAVA(7 lines)CodeLoading syntax highlighter...
Exercise 5: Network Robustness ββββ
Analyze how the network's connectivity degrades as nodes fail.
JAVA(7 lines)CodeLoading syntax highlighter...
π€ Interview Questions
Q1: "How would you find the most influential person in a social network?"
JAVA(4 lines)CodeLoading syntax highlighter...
JAVA(3 lines)CodeLoading syntax highlighter...
JAVA(3 lines)CodeLoading syntax highlighter...
Q2: "Design a 'People You May Know' feature."
JAVA(30 lines)CodeLoading syntax highlighter...
- Privacy: Don't reveal why someone is recommended
- Diversity: Don't recommend only from one circle
- Freshness: Boost recently active users
Q3: "Explain how PageRank handles the 'spider trap' problem."
Spider trap: Normal web: A β B β C A β B β C β β β β β βββββββ external links
PR(A) = (1-d)/N + d Γ Ξ£ PR(v)/out(v) β β Random jump Link following (15% chance) (85% chance)
With probability 15%, the surfer teleports to a random page, escaping the trap. This ensures:
- All pages have minimum rank (1-d)/N
- Rank flows out of traps through random jumps
- Algorithm converges to unique solution
Q4: "How do you detect fake accounts in a social network?"
JAVA(40 lines)CodeLoading syntax highlighter...
Q5: "How would you implement real-time recommendations at scale?"
Architecture: βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 1. Offline Pipeline (daily/hourly) β β - Compute PageRank, communities β β - Build similarity indices β β - Store in Redis/Cassandra β β β β 2. Near-real-time (minutes) β β - Stream processing (Kafka + Flink) β β - Update user activity vectors β β - Incremental graph updates β β β β 3. Real-time (milliseconds) β β - Pre-computed candidates from Redis β β - Light re-ranking with fresh signals β β - A/B test different algorithms β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
JAVA(23 lines)CodeLoading syntax highlighter...
π Quick Reference
Algorithm Complexity
| Algorithm | Time | Space | Use Case |
|---|---|---|---|
| Degree Centrality | O(V) | O(1) | Find popular nodes |
| Betweenness | O(VE) | O(V) | Find brokers |
| PageRank | O(EΓiter) | O(V) | Rank importance |
| Louvain | O(EΓiter) | O(V) | Community detection |
| Label Propagation | O(EΓiter) | O(V) | Fast communities |
Key Formulas
Degree Centrality: C_D(v) = degree(v) / (n-1) Closeness Centrality: C_C(v) = (n-1) / Ξ£ d(v,u) Betweenness Centrality: C_B(v) = Ξ£ Ο_st(v) / Ο_st PageRank: PR(v) = (1-d)/n + d Γ Ξ£ PR(u)/out(u) Modularity: Q = (1/2m) Γ Ξ£ [A_ij - k_iΓk_j/2m] Γ Ξ΄(c_i,c_j) Jaccard Similarity: J(A,B) = |A β© B| / |A βͺ B|
Decision Guide
Need to find important nodes? βββ Most connected β Degree centrality βββ Information brokers β Betweenness centrality βββ Endorsed by important β PageRank βββ Quick access to all β Closeness centrality Need to find groups? βββ High quality β Louvain (modularity optimization) βββ Fast, large graphs β Label Propagation βββ Overlapping communities β BigCLAM, DEMON Need recommendations? βββ User-based β Find similar users βββ Item-based β Find similar items βββ Graph-based β Random walks, PPR
π What's Next?
- Optimal substructure and overlapping subproblems
- Memoization vs tabulation
- State definition techniques
- Classic problems: Fibonacci, climbing stairs
π Review Schedule
| Day | Focus | Time |
|---|---|---|
| 0 | Full read + exercises | 90 min |
| 1 | PageRank implementation | 15 min |
| 3 | Community detection code | 20 min |
| 7 | Build simple recommender | 25 min |
| 14 | Debug exercise + centrality | 15 min |
| 30 | Interview questions | 10 min |