The complete DSA reference in C++. Arrays, linked lists, stacks, queues, trees, heaps, graphs, hashing, sorting, searching, dynamic programming, greedy, divide and conquer, backtracking, bit manipulation, segment trees, tries — every data structure implemented from scratch, every algorithm traced with real examples, every pattern you need to solve any problem.
Big-O describes how an algorithm's time or space grows as input size n grows. It measures the worst-case upper bound, ignoring constants and lower-order terms. It tells you whether your solution will pass within time limits before you write a single line of code.
| Notation | Name | n=10 | n=1000 | n=10⁶ | Typical Example |
|---|---|---|---|---|---|
| O(1) | Constant | 1 | 1 | 1 | Array index, hash lookup |
| O(log n) | Logarithmic | 3 | 10 | 20 | Binary search, balanced BST |
| O(n) | Linear | 10 | 1,000 | 10⁶ | Linear scan, single loop |
| O(n log n) | Linearithmic | 33 | 10,000 | 2×10⁷ | Merge sort, heap sort |
| O(n²) | Quadratic | 100 | 10⁶ | 10¹² | Bubble sort, nested loops |
| O(2ⁿ) | Exponential | 1,024 | astronomical | astronomical | Naive recursion, subset enumeration |
| O(n!) | Factorial | 3.6M | astronomical | astronomical | Brute-force permutations |
A modern computer executes roughly 10⁸ simple operations per second. A 1-second time limit means your algorithm must do ≤ 10⁸ operations. For n = 10⁵: O(n²) = 10¹⁰ — TLE. O(n log n) = 1.7×10⁶ — fast. For n = 10⁶: O(n) = 10⁶ — ok. O(n log n) = 2×10⁷ — ok. Use this mental model before coding.
O(2n) = O(n). O(100) = O(1). Constants don't matter for growth rate. A loop that does 3 things per iteration is still O(n).
O(n² + n) = O(n²). O(n + log n) = O(n). The dominant term wins as n grows large.
Two loops: one over n, one over m → O(n + m), not O(n²). Nested over both → O(n × m).
vector::push_back is O(1) amortized. Individual pushes may be O(n) (realloc), but averaged over n pushes the total is O(n) → O(1) each.
| Structure | Access | Search | Insert | Delete | Space |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Dynamic Array (vector) | O(1) | O(n) | O(1)★ | O(n) | O(n) |
| Singly Linked List | O(n) | O(n) | O(1)† | O(1)† | O(n) |
| Stack / Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table (unordered) | — | O(1)★ | O(1)★ | O(1)★ | O(n) |
| Binary Search Tree | O(log n)★ | O(log n)★ | O(log n)★ | O(log n)★ | O(n) |
| AVL / Red-Black Tree | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| Binary Heap | O(1)‡ | O(n) | O(log n) | O(log n) | O(n) |
| Trie | O(k) | O(k) | O(k) | O(k) | O(n·k) |
| Segment Tree | — | O(log n) | O(log n) | O(log n) | O(n) |
★ = average case | † = with known position | ‡ = top (min/max) element only; arbitrary access is O(n) | k = key/word length
Use two pointers moving through an array to reduce O(n²) brute force to O(n). Works on sorted arrays or when one pointer chases another (sliding window variant).
Two Sum II — find pair that sums to target in SORTED array.
Container with Most Water — maximize trapped water between two walls.
Remove duplicates from sorted array in-place.
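A minimal sketch of the converging-pointers pattern on Two Sum II (the name `twoSumSorted` is illustrative, not from a library):

```cpp
#include <vector>
#include <utility>

// Two Sum II sketch: return 0-based indices (i, j) with a[i] + a[j] == target,
// or (-1, -1) if no pair exists. Assumes a is sorted ascending.
std::pair<int, int> twoSumSorted(const std::vector<int>& a, int target) {
    int left = 0, right = (int)a.size() - 1;
    while (left < right) {
        long long sum = (long long)a[left] + a[right];
        if (sum == target) return {left, right};
        if (sum < target) ++left;   // need a bigger sum: advance the small end
        else --right;               // need a smaller sum: retreat the big end
    }
    return {-1, -1};
}
```

Each step eliminates one element permanently, so the scan is O(n) instead of the O(n²) pair enumeration.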
Maintain a window [left, right] over the array. Expand right to add elements, shrink left to remove. Converts O(n²) brute force to O(n).
Longest substring without repeating characters.
Minimum window substring — find smallest window in s containing all chars of t.
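A minimal sliding-window sketch on the longest-substring-without-repeats problem (`longestUnique` is an illustrative name; assumes single-byte characters):

```cpp
#include <string>
#include <array>
#include <algorithm>

// Window [left, right] always contains distinct characters. Expand right;
// on a duplicate, shrink from the left until the window is valid again.
int longestUnique(const std::string& s) {
    std::array<int, 256> count{};  // occurrences of each byte inside the window
    int best = 0, left = 0;
    for (int right = 0; right < (int)s.size(); ++right) {
        ++count[(unsigned char)s[right]];
        while (count[(unsigned char)s[right]] > 1)  // duplicate: shrink window
            --count[(unsigned char)s[left++]];
        best = std::max(best, right - left + 1);
    }
    return best;
}
```

Both pointers only move forward, so the total work is O(n) even though there is a nested loop.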
Range sum query — answer Q range-sum queries in O(1) each after O(n) preprocessing.
Subarray with given XOR — count subarrays with XOR equal to k.
An array provides O(1) random access, but inserting or deleting an element in the middle requires shifting all subsequent elements — an O(n) operation. A Linked List solves this by storing data in scattered memory blocks (nodes) connected by pointers. You lose O(1) random access (no indexing), but gain O(1) insertions and deletions anywhere in the list (if you have the pointer). You reach for a Linked List when you need constant-time splicing, or when implementing dynamic queues/stacks where you only care about the ends.
LRU Cache — implement get/put in O(1) using doubly linked list + hash map.
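A minimal sketch of the core linked-list advantage, O(1) splicing at a known node (`insertAfter` is an illustrative helper, not a library call):

```cpp
// Singly linked node: data plus a pointer to the next node.
struct Node {
    int val;
    Node* next;
    Node(int v) : val(v), next(nullptr) {}
};

// Insert a new node holding v directly after pos — O(1), no shifting,
// because only two pointers change.
Node* insertAfter(Node* pos, int v) {
    Node* n = new Node(v);
    n->next = pos->next;
    pos->next = n;
    return n;
}
```

Compare with a vector, where the same middle insertion shifts every later element.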
Why use a Stack instead of just a vector? Because restrictions create invariants. By forbidding random access, a Stack (Last-In, First-Out) mathematically forces you to process the most recently seen incomplete element before any older ones. You reach for a stack when you need to match history: parsing parentheses, undo/redo features, or finding the "next greater element" where recent smaller elements are shadowed by larger ones.
Valid Parentheses — check balanced brackets.
Next Greater Element — for each element find the first greater to its right.
Largest Rectangle in Histogram — classic monotonic stack.
Sliding window maximum — maximum in every window of size k in O(n).
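A minimal monotonic-stack sketch on Next Greater Element (`nextGreater` is an illustrative name):

```cpp
#include <vector>
#include <stack>

// For each element, the first greater element to its right (-1 if none).
// The stack holds indices whose answer is still unknown; their values are
// non-increasing from bottom to top.
std::vector<int> nextGreater(const std::vector<int>& a) {
    std::vector<int> res(a.size(), -1);
    std::stack<int> st;  // pending indices
    for (int i = 0; i < (int)a.size(); ++i) {
        while (!st.empty() && a[st.top()] < a[i]) {
            res[st.top()] = a[i];  // a[i] is the answer for every smaller pending element
            st.pop();
        }
        st.push(i);
    }
    return res;
}
```

Every index is pushed and popped at most once, so the whole pass is O(n).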
Choose → Explore → Un-choose. At each step, make a choice, recurse, then undo the choice to try the next option. Base case: reached a solution or dead end. Prune: skip choices that can't lead to a valid solution.
Subsets — generate all subsets of a set.
Permutations — generate all permutations of an array.
N-Queens — place N queens on N×N board so none attack each other.
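The choose → explore → un-choose loop, sketched on the Subsets problem (helper names are illustrative):

```cpp
#include <vector>

// At index i, either skip nums[i] or take it, recurse, then undo the take.
void dfs(const std::vector<int>& nums, int i,
         std::vector<int>& cur, std::vector<std::vector<int>>& out) {
    if (i == (int)nums.size()) { out.push_back(cur); return; }  // base case
    dfs(nums, i + 1, cur, out);      // choice 1: skip nums[i]
    cur.push_back(nums[i]);          // choice 2: take nums[i]
    dfs(nums, i + 1, cur, out);
    cur.pop_back();                  // un-choose before returning
}

std::vector<std::vector<int>> subsets(const std::vector<int>& nums) {
    std::vector<std::vector<int>> out;
    std::vector<int> cur;
    dfs(nums, 0, cur, out);
    return out;
}
```

Two choices per element give 2ⁿ leaves, one per subset.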
| Algorithm | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| Counting Sort | O(n+k) | O(n+k) | O(n+k) | O(k) | Yes |
| Radix Sort | O(nk) | O(nk) | O(nk) | O(n+k) | Yes |
Binary search is not just for sorted arrays. Any problem where the answer space is monotone (if x works, x-1 also works, or vice versa) can be binary-searched. This is one of the most powerful techniques in competitive programming.
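A minimal binary-search-on-answer sketch, assuming the predicate flips from false to true exactly once over [lo, hi] (the name `firstFeasible` is illustrative):

```cpp
#include <functional>

// Smallest x in [lo, hi] with feasible(x) == true, given that feasible is
// monotone (false ... false true ... true). O(log(range)) predicate calls.
long long firstFeasible(long long lo, long long hi,
                        const std::function<bool(long long)>& feasible) {
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (feasible(mid)) hi = mid;   // mid works: answer is mid or smaller
        else lo = mid + 1;             // mid fails: answer is strictly larger
    }
    return lo;
}
```

Example use: the smallest x with x² ≥ 17 is `firstFeasible(0, 100, [](long long x){ return x * x >= 17; })`, which returns 5.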
Group Anagrams — group strings that are anagrams of each other.
Subarray sum equals K — count subarrays with given sum.
Longest consecutive sequence — find length of longest sequence of consecutive integers.
What problem does a Heap solve that a sorted array cannot? A sorted array gives you the maximum in O(1), but inserting a new element is O(n). An unsorted array gives you O(1) insertion, but finding the maximum is O(n). A Heap is the perfect middle ground: it maintains just enough order to give you the maximum in O(1), while allowing insertions and deletions in O(log n). You reach for a Heap when you need to repeatedly process the "best" or "most urgent" item in a dynamic dataset.
Kth Largest Element — find kth largest without full sort.
Find Median from Data Stream — median of all elements seen so far.
Merge K Sorted Lists — merge k sorted linked lists in O(n log k).
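A minimal heap sketch on Kth Largest Element (`kthLargest` is an illustrative name):

```cpp
#include <vector>
#include <queue>

// Keep only the k largest values seen so far in a min-heap; the heap top is
// then the kth largest. O(n log k) time, O(k) space — no full sort needed.
int kthLargest(const std::vector<int>& a, int k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : a) {
        heap.push(x);
        if ((int)heap.size() > k) heap.pop();  // evict the smallest of the k+1
    }
    return heap.top();
}
```

The same "bounded min-heap" shape also solves merge-k-sorted-lists and top-k-frequent.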
Adjacency List (vector<vector<int>>) is used for 95% of problems. It uses O(V + E) memory and is optimal for sparse graphs. Adjacency Matrix (vector<vector<bool>>) uses O(V²) memory, which causes Memory Limit Exceeded for V > 10,000. Only use a matrix for dense graphs or when you need O(1) edge existence lookups (like Floyd-Warshall).
BFS searches layer by layer. Reach for BFS when asked for the shortest path in an unweighted graph, or when modeling "moves" in a grid puzzle. DFS plunges to the bottom of a path before backtracking. Reach for DFS when investigating connectivity, counting components, finding cycles, or doing backtracking where you need to track the current path state.
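A minimal BFS sketch for shortest paths in an unweighted graph (adjacency-list representation, as above):

```cpp
#include <vector>
#include <queue>

// Distances (in edges) from src; unreachable nodes stay at -1.
// Because BFS explores layer by layer, the first visit to a node
// is guaranteed to be along a shortest path.
std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int src) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (dist[v] == -1) {        // not yet visited
                dist[v] = dist[u] + 1;
                q.push(v);
            }
    }
    return dist;
}
```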
Dijkstra works because of the Greedy Choice Property on non-negative weights. When the priority queue pops the unvisited node u with the smallest current distance, that distance is guaranteed to be optimal. Why? Because any other path to u would have to go through some other unvisited node v. But since u has the smallest distance in the queue, the path to v is already longer than the path to u, and adding non-negative edges can only make it worse. The moment a node is popped, it is settled permanently.
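A minimal Dijkstra sketch with the lazy-deletion priority queue described above (edges as (to, weight) pairs):

```cpp
#include <vector>
#include <queue>
#include <limits>
#include <utility>

// Single-source shortest paths with non-negative weights.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    using State = std::pair<long long, int>;  // (distance, node)
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;   // stale entry: u was already settled
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {  // found a shorter path to v
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```

The `d > dist[u]` check is the standard trick for skipping outdated queue entries instead of implementing decrease-key.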
Why do Kruskal's and Prim's greedy approaches work? Imagine dividing all nodes into two arbitrary sets (a cut). To connect these two sets, you must pick at least one edge crossing the cut. To minimize the total weight, you should obviously pick the lightest crossing edge. Both algorithms just apply this logic repeatedly: Prim's grows a single component by picking the lightest edge leaving it, while Kruskal's considers the globally lightest edge and adds it if it connects two disconnected components (preventing cycles).
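A minimal Kruskal sketch using a union-find with path compression (edge format (w, u, v) so sorting orders by weight; names are illustrative):

```cpp
#include <vector>
#include <algorithm>
#include <numeric>
#include <tuple>

// Union-find: find with path compression; unite returns false on a cycle.
struct DSU {
    std::vector<int> parent;
    DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;  // edge would close a cycle: reject it
        parent[a] = b;
        return true;
    }
};

// Total MST weight: take the globally lightest edge whenever it connects
// two previously disconnected components.
long long kruskal(int n, std::vector<std::tuple<int, int, int>> edges) {
    std::sort(edges.begin(), edges.end());  // lightest edge first
    DSU dsu(n);
    long long total = 0;
    for (auto [w, u, v] : edges)
        if (dsu.unite(u, v)) total += w;
    return total;
}
```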
Dynamic Programming is essentially optimized recursion. In naive recursion (like fib(5) = fib(4) + fib(3)), you recompute fib(3) multiple times because subproblems overlap. The recursion tree grows exponentially (O(2ⁿ)). DP stores the result of fib(3) the first time it is computed, trimming the exponential tree down to a linear path. Every DP is solved by answering 3 questions: 1) What is the State? 2) What is the Transition? 3) What is the Base Case?
Recursive function + cache (array or map). Natural to write. Only computes strictly needed states. The recursion trace: solve(i) → check cache → compute if needed → save and return.
Iterative nested loops filling a table from the base cases up to the answer. No recursion overhead. Allows for space optimization (e.g., dropping older rows from the table).
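Both styles, sketched on Fibonacci — top-down with a cache, and bottom-up with the space optimization of keeping only the last two values:

```cpp
#include <vector>

// Top-down memoization: O(n) instead of O(2ⁿ). cache must be sized n+1
// and initialized to -1 ("not computed yet").
long long fibMemo(int n, std::vector<long long>& cache) {
    if (n <= 1) return n;                    // base case
    if (cache[n] != -1) return cache[n];     // already computed: reuse
    return cache[n] = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
}

// Bottom-up tabulation with O(1) space: drop all but the last two states.
long long fibTab(int n) {
    if (n <= 1) return n;
    long long prev = 0, cur = 1;
    for (int i = 2; i <= n; ++i) {
        long long next = prev + cur;  // transition: f(i) = f(i-1) + f(i-2)
        prev = cur;
        cur = next;
    }
    return cur;
}
```

State: the index i. Transition: f(i) = f(i-1) + f(i-2). Base cases: f(0) = 0, f(1) = 1.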
A bitmask is just an integer used compactly as an array of booleans. In TSP, mask = 5 (binary 101) means we have currently visited cities 0 and 2. The DP state dp[mask][i] answers: "What is the minimum cost to visit exactly the set of cities represented by mask, ending our path at city i?" Transitions simply try adding an unvisited city v to the mask: new_mask = mask | (1 << v).
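A compact sketch of that TSP DP, assuming a small complete cost matrix and a tour that starts and ends at city 0:

```cpp
#include <vector>
#include <algorithm>

// dp[mask][i] = cheapest cost to visit exactly the cities in mask, ending at i.
// Returns the cheapest full tour 0 → ... → 0. Feasible for n up to ~20.
int tsp(const std::vector<std::vector<int>>& cost) {
    int n = (int)cost.size();
    const int INF = 1e9;
    std::vector<std::vector<int>> dp(1 << n, std::vector<int>(n, INF));
    dp[1][0] = 0;  // only city 0 visited, standing at city 0
    for (int mask = 1; mask < (1 << n); ++mask)
        for (int i = 0; i < n; ++i) {
            if (dp[mask][i] == INF || !(mask & (1 << i))) continue;
            for (int v = 0; v < n; ++v)
                if (!(mask & (1 << v))) {          // v not yet visited
                    int nm = mask | (1 << v);      // extend the visited set
                    dp[nm][v] = std::min(dp[nm][v], dp[mask][i] + cost[i][v]);
                }
        }
    int best = INF;
    for (int i = 1; i < n; ++i)                    // close the tour back to 0
        best = std::min(best, dp[(1 << n) - 1][i] + cost[i][0]);
    return best;
}
```

There are 2ⁿ·n states and O(n) transitions each, so the total is O(2ⁿ·n²) — exponential, but a vast improvement over the O(n!) brute force.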
A greedy algorithm works when the problem has greedy choice property (locally optimal choice leads to globally optimal) and optimal substructure (optimal solution contains optimal solutions to subproblems). Proving correctness requires showing no better choice exists at each step.
Activity Selection — maximize number of non-overlapping intervals.
Jump Game II — minimum jumps to reach end of array.
Fractional Knapsack — maximize value, items are divisible.
Gas Station — can you complete a circular route?
A Segment Tree solves a very specific problem: you have an array, and you need to perform thousands of Range Queries (what is the sum/min/max from index L to R?) interspersed with thousands of Point Updates (change index i to value v). A standard array gives O(1) updates but O(n) range queries. A prefix sum array gives O(1) range queries but O(n) updates. A Segment Tree gives O(log n) for both.
The core idea is recursively halving the array into contiguous segments and precalculating the answer for each segment in a binary tree. A node's value is the combination (sum, min, max) of its two children. To update an element, you only touch the O(log n) nodes on the path above its leaf. To query a range, you combine at most 2·log(n) precalculated segments, rather than iterating through n individual elements.
If you need to add +5 to an entire range [L, R], doing (R-L) point updates takes O(N) per query. Lazy Propagation fixes this by recording the +5 at the highest possible nodes in the tree that fully cover the update range, stopping early. We mark the node as "lazy" (it has pending updates for its children). The next time any operation travels down that path, it "pushes" the lazy value down to its immediate children before proceeding. This keeps range updates and range queries at strictly O(log n).
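A minimal iterative segment tree for range sum with point updates (no lazy propagation; queries use half-open ranges [l, r)):

```cpp
#include <vector>

// Leaves live at tree[n..2n-1]; internal node i combines children 2i and 2i+1.
struct SegTree {
    int n;
    std::vector<long long> tree;
    SegTree(const std::vector<int>& a)
        : n((int)a.size()), tree(2 * a.size(), 0) {
        for (int i = 0; i < n; ++i) tree[n + i] = a[i];
        for (int i = n - 1; i >= 1; --i) tree[i] = tree[2 * i] + tree[2 * i + 1];
    }
    void update(int pos, long long value) {  // set a[pos] = value, O(log n)
        pos += n;
        tree[pos] = value;
        while (pos > 1) {                    // rebuild the path to the root
            pos >>= 1;
            tree[pos] = tree[2 * pos] + tree[2 * pos + 1];
        }
    }
    long long query(int l, int r) {          // sum of [l, r), O(log n)
        long long res = 0;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) res += tree[l++];     // l is a right child: take it now
            if (r & 1) res += tree[--r];     // r is past a left child: take it
        }
        return res;
    }
};
```

Swapping `+` for `min` or `max` (and adjusting the identity value) gives range-min/max trees with the same structure.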
Single Number — find the one element that appears once (all others appear twice).
Count set bits in all numbers from 1 to n.
Generate all subsets using bitmask.
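Two of these patterns sketched in code: XOR cancellation for Single Number, and bitmask subset enumeration:

```cpp
#include <vector>

// a ^ a = 0 and a ^ 0 = a, so XOR-ing everything cancels the pairs and
// leaves only the element that appears once.
int singleNumber(const std::vector<int>& a) {
    int acc = 0;
    for (int x : a) acc ^= x;
    return acc;
}

// Each of the 2ⁿ masks selects a subset: bit i set means "take nums[i]".
std::vector<std::vector<int>> allSubsets(const std::vector<int>& nums) {
    int n = (int)nums.size();
    std::vector<std::vector<int>> out;
    for (int mask = 0; mask < (1 << n); ++mask) {
        std::vector<int> cur;
        for (int i = 0; i < n; ++i)
            if (mask & (1 << i)) cur.push_back(nums[i]);
        out.push_back(cur);
    }
    return out;
}
```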
| Problem Pattern | Go-To Algorithm | Complexity |
|---|---|---|
| Subarray with sum/XOR = k | Prefix sum/XOR + hash map | O(n) |
| Longest substring / window | Sliding window | O(n) |
| Two values summing to target | Two pointers (sorted) or hash set | O(n) |
| k-th largest/smallest | Min-heap of size k or QuickSelect | O(n log k) / O(n) avg |
| Merge k sorted arrays | Min-heap / priority_queue | O(n log k) |
| Top k frequent elements | Bucket sort or heap | O(n) |
| Path in graph, shortest | BFS (unweighted) / Dijkstra (weighted) | O(V+E) / O((V+E) log V) |
| Connected components | DFS/BFS or Union-Find | O(V+E) |
| Minimum spanning tree | Kruskal (sparse) / Prim (dense) | O(E log E) |
| Cycle in directed graph | DFS with 3-color or Kahn's topo sort | O(V+E) |
| Scheduling / interval | Sort by end time, greedy | O(n log n) |
| Optimal substructure | Dynamic programming | Varies |
| Enumerate all subsets | Backtracking or bitmask | O(2^n) |
| Range sum query | Prefix sum (static) / BIT / SegTree | O(1) / O(log n) |
| Range min/max query | Sparse Table (static) / SegTree | O(1) / O(log n) |
| String matching | KMP or Rabin-Karp | O(n+m) |
| Prefix / autocomplete | Trie | O(k) per op |
| Max XOR pair | XOR Trie | O(n log maxVal) |
| Count inversions | Merge sort or BIT | O(n log n) |
| Binary search on answer | BS + feasibility check function | O(n log(range)) |