Networking/
a field manual
Five modules, one mantra. Move from packets on a wire to VPCs in the cloud, with a debugging arsenal in between. Built by a Cloud Engineer at AWS, for engineers who already read kubernetes/kubernetes for fun.
How the internet actually works.
Strip away the abstractions. An IP address is a mailing address, DNS is the phonebook, HTTP is the language two machines agree to speak, and a request is what travels through all of it. Build this mental model first — everything else hangs off it.
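One way to make this model concrete is to take a URL apart and label each piece with the system that handles it. This is a minimal sketch using Python's standard urllib.parse. The `anatomy` helper and the default-port map are illustrative assumptions, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports a client assumes when the URL doesn't spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def anatomy(url: str) -> dict:
    """Label each part of a URL with what deals with it (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # the language both machines agree to speak
        "name": parts.hostname,    # what DNS turns into an IP address
        "port": parts.port or DEFAULT_PORTS[parts.scheme],  # which listener on that host
        "path": parts.path or "/",  # what the application is being asked for
    }


print(anatomy("https://api.example.com/v1/users"))
# {'protocol': 'https', 'name': 'api.example.com', 'port': 443, 'path': '/v1/users'}
```

Each later hop consumes one of these fields. The DNS lookup uses the name, the TCP connect uses the resulting IP and port, and the HTTP request carries the path.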
Ports, protocols, packets & routing.
The middle layer. TCP guarantees order and delivery; UDP trades guarantees for speed. DNS resolution is recursive (root → TLD → authoritative) and cached, per TTL, at nearly every hop. Routing is just “next hop?” asked over and over.
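The "next hop?" question is answered by longest-prefix match. Among all routes whose CIDR contains the destination, the most specific one wins. This is a minimal sketch with Python's ipaddress module, and the route table and hop names are hypothetical. On a Linux host, ip route get <ip> asks the kernel the same question.

```python
import ipaddress

# Hypothetical route table: (destination CIDR, next hop).
ROUTES = [
    ("0.0.0.0/0", "igw"),      # default route: everything not matched below
    ("10.0.0.0/16", "local"),  # the local network
    ("10.0.5.0/24", "vpn"),    # a more specific slice sent somewhere else
]


def next_hop(dst: str, routes=ROUTES):
    """Return the next hop for dst using longest-prefix match, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for cidr, hop in routes:
        net = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix (most specific).
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


for dst in ("10.0.5.7", "10.0.9.1", "8.8.8.8"):
    print(dst, "->", next_hop(dst))
# 10.0.5.7 -> vpn
# 10.0.9.1 -> local
# 8.8.8.8 -> igw
```

Every router along a path repeats this lookup against its own table, which is why a single wrong route anywhere can send traffic somewhere unexpected.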
| Layer | What it does | Owns in packet | You debug with |
|---|---|---|---|
| L2 / Link | Moves frames within a single physical/virtual network segment | MAC address | arp, ip link |
| L3 / Network | Routes packets between networks | IP address, TTL | ip route, traceroute, ping |
| L4 / Transport | End-to-end conversations (TCP/UDP) | Port, sequence, flags | ss, netstat, tcpdump |
| L7 / Application | HTTP, DNS, gRPC, the stuff humans care about | Headers, methods, body | curl, dig, nslookup |
| | TCP | UDP |
|---|---|---|
| Delivery | Guaranteed, ordered, retried | Fire-and-forget, no order |
| Setup | 3-way handshake (SYN → SYN-ACK → ACK) | None — just send |
| Overhead | Higher (state, ACKs, retransmits) | Lower (almost none) |
| Use when | HTTP, SSH, databases — correctness matters | DNS query, VoIP, gaming, metrics — speed matters |
| Failure mode | Slowness, retransmits, RST | Silent loss, you don’t know |
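The Failure mode row is easy to reproduce on loopback. This is a minimal sketch that assumes a Linux host. A TCP connect to a port with no listener is refused with an RST right away. A UDP datagram to the same port is reported as "sent" and simply vanishes. The helper names are hypothetical.

```python
import socket


def unused_port() -> int:
    """Ask the OS for a free loopback port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def tcp_probe(port: int) -> str:
    # TCP fails loudly: the receiving kernel answers the SYN with an RST.
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            return "connected"
    except ConnectionRefusedError:
        return "refused (RST)"


def udp_probe(port: int) -> int:
    # UDP has no handshake and no ACK. sendto() returns the bytes handed to the
    # kernel, not bytes delivered, so the sender never learns nobody was listening.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(b"ping", ("127.0.0.1", port))


port = unused_port()
print("tcp:", tcp_probe(port))
print("udp bytes sent:", udp_probe(port))
```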
| Port | Service |
|---|---|
| 22 | SSH |
| 53 | DNS |
| 80 | HTTP |
| 443 | HTTPS |
| 3306 | MySQL |
| 5432 | Postgres |
| 6379 | Redis |
| 6443 | kube-apiserver |
| 2379-2380 | etcd |
The pieces in front of your app.
In production almost nothing talks to your container directly. Traffic lands on a load balancer, may pass through a reverse proxy, gets translated by NAT, and only then reaches your service. Each of these is a place a request can go wrong.
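The NAT step is the least visible of these. A toy source-NAT table shows the idea. Many private (IP, port) pairs share one public IP, and return traffic only gets back in if an outbound packet created a mapping first. This is a deliberate simplification: real NAT devices track the full connection (protocol, source, destination), reuse ports, and expire idle entries. `SourceNat` is a hypothetical class, and 203.0.113.10 is a documentation address.

```python
import itertools


class SourceNat:
    """Toy SNAT table (hypothetical): many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # naive allocator: never reuses ports
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet, reusing an existing mapping if any."""
        key = (src_ip, src_port)
        if key not in self._out:
            public_port = next(self._ports)
            self._out[key] = public_port
            self._back[public_port] = key
        return self.public_ip, self._out[key]

    def inbound(self, dst_port: int):
        """Map return traffic back to the private host; None means no mapping, so drop it."""
        return self._back.get(dst_port)


nat = SourceNat("203.0.113.10")
print(nat.outbound("10.0.1.5", 40000))  # ('203.0.113.10', 1024)
print(nat.outbound("10.0.1.6", 40000))  # ('203.0.113.10', 1025): same private port, new public port
print(nat.inbound(1025))                # ('10.0.1.6', 40000)
print(nat.inbound(2000))                # None: unsolicited inbound traffic has nowhere to go
```

The last line is why a private subnet behind NAT can reach the internet, yet the internet cannot open connections into it.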
| Component | What it does | Lives at layer | Real-world example |
|---|---|---|---|
| Load Balancer (L4) | Spreads TCP/UDP connections across backends; doesn’t see HTTP | L4 | AWS NLB, HAProxy (TCP mode) |
| Load Balancer (L7) | Reads HTTP — routes by host/path, terminates TLS, rewrites headers | L7 | AWS ALB, NGINX, Envoy, Traefik |
| Reverse Proxy | Sits in front of an app, can cache, compress, auth, route | L7 | NGINX, Caddy, Envoy |
| NAT | Rewrites src/dst IP & port so private nets can reach the internet | L3/L4 | AWS NAT Gateway, your home router |
| Ingress Controller | K8s pattern that programs an L7 LB from Ingress / Gateway resources | L7 | AWS LBC, NGINX Ingress, Istio |
| Service Mesh | Sidecar proxies handle mTLS, retries, traffic-split between services | L4 – L7 | Istio, Linkerd, App Mesh |
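The L4/L7 split in the table comes down to what the balancer can see when it picks a backend. In this minimal sketch, the L4 picker only gets the connection's address and port and hashes them. The L7 picker can match on Host and path because it has parsed HTTP. The backend IPs, hostnames, and service names are hypothetical. The first-match rule order is only similar in spirit to prioritized listener rules on a real L7 balancer.

```python
import zlib

BACKENDS = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]  # hypothetical targets


def l4_pick(src_ip: str, src_port: int, backends=BACKENDS) -> str:
    """L4: only addresses and ports are visible, so hash the connection onto a backend."""
    key = f"{src_ip}:{src_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


# Hypothetical L7 rules, checked in order: (host, path prefix, target service).
L7_RULES = [
    ("api.example.com", "/v1/", "api-svc"),
    ("www.example.com", "/", "web-svc"),
]


def l7_pick(host: str, path: str, rules=L7_RULES):
    """L7: HTTP is parsed, so route on Host header and path. None means no rule matched."""
    for rule_host, prefix, target in rules:
        if host == rule_host and path.startswith(prefix):
            return target
    return None


print(l4_pick("198.51.100.7", 51514))           # same connection tuple -> same backend every time
print(l7_pick("api.example.com", "/v1/users"))  # api-svc
```

The hash keeps one connection pinned to one backend, but it cannot send /v1/ to a different service than /. That decision needs the L7 view.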
With target-type: ip, the ALB sends traffic directly to pod IPs (routable in the VPC via the VPC CNI), skipping the kube-proxy/Service hop. With target-type: instance, the ALB hits a NodePort and lets kube-proxy do the second hop, which may land on a pod on another node. Knowing which mode you’re in changes how you debug latency.
If you can trace traffic, you can fix it.
Reading about networking gets you understanding. Running these commands against a broken system gets you a paycheck.
| Tool | The thing that trips people up |
|---|---|
| curl -v | Use --resolve host:443:1.2.3.4 to bypass DNS; it isolates whether DNS or the backend is broken. |
| traceroute / mtr | * * * isn't always failure; many hops just refuse to reply to ICMP/UDP probes. |
| ss / netstat | Reach for ss: faster, modern, same flags. Keep netstat for legacy boxes. ss -tlnp ends the argument. |
| dig / nslookup | dig is the surgical tool; nslookup is what's already installed everywhere. dig svc-name.namespace.svc.cluster.local from inside a pod isolates CoreDNS issues. |
| ip route | ip route get <ip> answers “why does my traffic leave the wrong interface?” in one line. |
| nc -zv | Pair a successful nc -zv with ss -tlnp on the other side and you've proved L3+L4 from both ends. |

ping → dig → nc -zv → curl -v → traceroute/mtr → tcpdump. Each step rules out a layer.
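Part of the ladder can be scripted. This minimal sketch uses only Python's standard library and covers the dig, nc -zv and curl -v rungs. It resolves the name, opens a TCP connection, then sends an HTTP request, and stops at the first layer that fails. Connecting to the resolved IP while sending the original name in the Host header is the same trick curl --resolve plays. The `ladder` function is hypothetical, speaks plain HTTP only (no TLS), and does not cover ping, traceroute/mtr, or tcpdump.

```python
import http.client
import socket


def ladder(host: str, port: int, path: str = "/"):
    """Walk DNS -> TCP -> HTTP; return (layer, detail) for the first failure, or ("ok", ...)."""
    # Rung 1 (dig): can the name be resolved at all?
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return "dns", f"cannot resolve {host}: {exc}"
    ip = infos[0][4][0]

    # Rung 2 (nc -zv): does anything accept a TCP connection on ip:port?
    try:
        with socket.create_connection((ip, port), timeout=3):
            pass
    except OSError as exc:
        return "tcp", f"cannot connect to {ip}:{port}: {exc}"

    # Rung 3 (curl -v): does the service answer HTTP? Dial the IP but keep the original Host.
    try:
        conn = http.client.HTTPConnection(ip, port, timeout=3)
        conn.request("GET", path, headers={"Host": host})
        status = conn.getresponse().status
        conn.close()
    except (OSError, http.client.HTTPException) as exc:
        return "http", f"{ip}:{port} did not answer HTTP: {exc}"
    return "ok", f"{ip}:{port} answered HTTP {status}"


print(ladder("localhost", 8080))
```

The layer name in the result tells you where to pick the manual ladder back up.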
How traffic flows inside the cloud.
A VPC is just a software-defined data center. Once you’ve internalized the VPC → subnet → route table → SG chain, every cloud network feels the same. Without it, every cloud network feels like a riddle.
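The subnet → route table link is the one that decides "public". This minimal sketch assumes a subnet is modelled as a plain dict holding a name and its IPv4 route table (destination → target ID). The function never looks at the name, only at where the default route points. The igw- and nat- prefixes mirror AWS resource ID prefixes. The dict shape, the IDs and the "isolated" label are hypothetical.

```python
def classify_subnet(subnet: dict) -> str:
    """Classify a subnet by its IPv4 default route. The subnet's name is deliberately ignored."""
    default = subnet["routes"].get("0.0.0.0/0")
    if default is None:
        return "isolated"  # no default route: only VPC-local (and any specific) routes
    if default.startswith("igw-"):
        return "public"    # two-way internet via an internet gateway
    return "private"       # default route goes elsewhere, e.g. a NAT gateway


subnets = [
    {"name": "public-a", "routes": {"10.0.0.0/16": "local"}},
    {"name": "app-a", "routes": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc"}},
    {"name": "db-a", "routes": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0def"}},
]
for s in subnets:
    print(s["name"], "->", classify_subnet(s))
# public-a -> isolated
# app-a -> public
# db-a -> private
```

An instance in a public subnet still needs a public IP to be reachable from the internet. This sketch only checks the routing half.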
| Concept | Mental model | Common gotcha |
|---|---|---|
| VPC | A virtual data-center with its own private IP range (CIDR). | CIDR overlap kills VPC peering and TGW attachments. Plan early. |
| Subnet | A slice of the VPC, tied to one AZ. “Public” just means the route table points to an IGW. | A subnet isn’t public because of its name — it’s public because of its route table. |
| Route Table | Per-subnet rule: “for destination X, send to gateway Y.” | Missing 0.0.0.0/0 → nat-gw in a private subnet = mysterious egress timeouts. |
| Security Group | Stateful firewall at the ENI level. Default deny inbound, default allow outbound. | SGs are allow-only. There’s no deny rule. Use NACLs for explicit denies. |
| NACL | Stateless ACL at the subnet boundary. Numbered rules, allow + deny. | Stateless — you must allow ephemeral return ports (1024-65535) on outbound responses. |
| IGW vs NAT GW | IGW = two-way internet. NAT GW = outbound-only for private subnets. | NAT GW is per-AZ — one per AZ if you want HA and no cross-AZ data charges. |
| VPC Endpoint | Private tunnel from your VPC to an AWS service (S3, ECR, STS) without internet. | Without endpoints, every pull from ECR goes out the NAT GW. Look at your bill. |
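The NACL row is the one that bites in practice. Because NACLs are stateless, the response is judged by its own rule set. This minimal sketch assumes rules match on port only, whereas real NACL rules also match protocol and CIDR. Rules are checked in ascending number order, the first match wins, and anything unmatched falls through to the final deny. The `Rule` class, ports, and rule numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int     # lower numbers are evaluated first
    action: str     # "allow" or "deny"
    port_from: int
    port_to: int


def evaluate(rules, port: int) -> str:
    """Stateless check: first matching rule by number wins; otherwise the catch-all deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


inbound = [Rule(100, "allow", 443, 443)]
outbound_broken = [Rule(100, "allow", 443, 443)]
outbound_fixed = outbound_broken + [Rule(110, "allow", 1024, 65535)]  # ephemeral return ports

client_port = 51514  # the client's ephemeral source port, i.e. where the response must go
print(evaluate(inbound, 443))                  # allow: the request gets in
print(evaluate(outbound_broken, client_port))  # deny: the response is dropped
print(evaluate(outbound_fixed, client_port))   # allow
```

A security group would never reach the outbound check for the response, because it tracks the connection and lets return traffic through automatically. That is the whole stateful vs stateless difference in the table.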