Protocols
Part 2
1/28/2009
Oh Yeah ...
- the game
- coax
- immunity to electrical interference when contained completely inside a grounded conductor
- fiber
- L for long wavelength, usually 1310nm (usu just infrared),
- E for extra-long wavelength 1550nm (sometimes even this squeaks into "near" ir but it's usu 750-100nm)
- X for fiber channel aka 8B/10B,
- R for dark fiber,
- W for WAN encoding (SONET compatible)
- 1000BaseLH for long haul, 10GBaseLX4 for 4 wavelengths over WDM
Oh Yeah ...
- Copper Categories
- Category 1 and 2 are telephone, Category 3 is 4-conductor 10BaseT only
- Category 5 / Cat5 is defined in EIA/TIA 568B ("regular" and crossover) for wiring order, pair twisting, install requirements like cable bends and length, and signal characteristics for attenuation and crosstalk
- Category 5e (enhanced for higher data rate), Category 6 (Gigabit)
- What about Category 4? Token ring!
Questions?
Beyond Ethernet
- American T-carrier system
- PCM, pulse code modulation
- TDM, time-division multiplexing
- a single call uses one Digital Signal level 0, DS-0
- 64 kbps, 8-bit channel
- 24 DS-0 lines multiplex into DS-1
- 1.544 Mbps
- 23 voice channels plus 1 for signaling overhead when one DS-0 is reserved for it (as in ISDN PRI)
- 8-bit sample from each channel, then framing bit for 193 bits per frame, one frame every 125 µs (signal bandwidth is 1.536 Mbps)
- 4 DS-1 multiplex into DS-2
- 7 DS-2 multiplex into DS-3
- 44.736 Mbps for a T3
- voice only is a DS-3
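The T-carrier numbers above can be checked with a little arithmetic. This is a minimal sketch; the constant names are just illustrative, and the 8000 samples per second follows from the 125 µs frame time.

```python
# Arithmetic behind the T-carrier rates quoted above.
SAMPLES_PER_SECOND = 8000      # one frame every 125 microseconds
BITS_PER_SAMPLE = 8            # 8-bit PCM sample
CHANNELS_PER_DS1 = 24
FRAMING_BITS = 1

ds0_bps = SAMPLES_PER_SECOND * BITS_PER_SAMPLE                         # one voice channel
bits_per_frame = CHANNELS_PER_DS1 * BITS_PER_SAMPLE + FRAMING_BITS     # 24 samples + framing bit
ds1_bps = bits_per_frame * SAMPLES_PER_SECOND                          # line rate
ds1_payload_bps = CHANNELS_PER_DS1 * ds0_bps                           # rate without the framing bit
ds0_per_ds3 = 24 * 4 * 7                                               # DS-0s in a DS-3

print(f"DS-0: {ds0_bps} bps")
print(f"DS-1 frame: {bits_per_frame} bits")
print(f"DS-1: {ds1_bps} bps line rate, {ds1_payload_bps} bps payload")
print(f"DS-3 carries {ds0_per_ds3} DS-0 channels")
```

The DS-3 rate of 44.736 Mbps is higher than 28 x 1.544 Mbps because the higher levels add their own framing and stuffing bits.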
Beyond Ethernet
Synchronous Digital Hierarchy (SDH) and SONET
- the two line up at matching rates but are not otherwise compatible (that's the problem with pre-standards)
- OC-1 base rate is 51.84 Mbps
- STS frame has 9 rows of 90 octets
- first 3 octets per row are overhead (27 octets per frame): 9 for section, 18 for line
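The OC-1 base rate falls out of the frame size. A minimal sketch, assuming the standard 8000 frames per second (one frame every 125 µs, the same timing as the T-carrier system) and the usual split of the overhead columns into 3 rows of section overhead and 6 rows of line overhead:

```python
# STS-1 frame arithmetic; names are illustrative.
ROWS = 9
OCTETS_PER_ROW = 90
FRAMES_PER_SECOND = 8000       # assumption: one frame every 125 microseconds
OVERHEAD_OCTETS_PER_ROW = 3

sts1_bps = ROWS * OCTETS_PER_ROW * 8 * FRAMES_PER_SECOND
overhead_octets = ROWS * OVERHEAD_OCTETS_PER_ROW
section_octets = 3 * OVERHEAD_OCTETS_PER_ROW   # first 3 rows
line_octets = 6 * OVERHEAD_OCTETS_PER_ROW      # remaining 6 rows

print(f"STS-1 / OC-1: {sts1_bps / 1e6} Mbps")
print(f"overhead: {overhead_octets} octets = {section_octets} section + {line_octets} line")
```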
Beyond Ethernet
- What you need to know in this class is Ethernet!
98% of all networks are Ethernet
ARP (Review)
- local delivery on LAN needs MAC address
- ARP maps Layer 3 address to Layer 2 address, usually IP to MAC.
- When a station hears an ARP for itself, it stores the sender's MAC in its ARP table because it assumes communication may follow.
- gratuitous ARP - ARP for self to make sure IP address is available, usually when configuring NIC at boot
- proxy ARP - a router answers on behalf of one of its networks; promiscuous ARP, ARP hack
ARP Packet (Review)
it's an Ethernet frame with an ARP payload/message
ARP Packet (Review)
- Ethernet header
- 6 octets - Ethernet destination address, all 1s, broadcast (reply is unicast)
- 6 octets - Ethernet source address
- 2 octets - frame type, 0x0806 for ARP
- ARP request or reply
- 2 octets - hardware type, 1 for Ethernet
- 2 octets - protocol type, 0x0800 for IP (the same value IP uses as its Ethernet frame type)
- 1 octet - hardware address size (6 octets)
- 1 octet - protocol address size (4 octets)
- 2 octets - op field
- 1: ARP request
- 2: ARP reply
- 3: RARP request
- 4: RARP reply
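The field list above can be packed into bytes like this. It is a sketch only: the function name and the example addresses are made up, and the four address fields after the op field (sender MAC, sender IP, target MAC, target IP) are the standard ARP layout rather than something listed on the slide.

```python
import socket
import struct

def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request (hypothetical helper)."""
    # Ethernet header: broadcast destination, our source MAC, frame type 0x0806
    ethernet = struct.pack("!6s6sH", b"\xff" * 6, src_mac, 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IP
        6,                            # hardware address size
        4,                            # protocol address size
        1,                            # op: ARP request
        src_mac,                      # sender hardware address
        socket.inet_aton(src_ip),     # sender protocol address
        b"\x00" * 6,                  # target hardware address (unknown, that's the question)
        socket.inet_aton(target_ip),  # target protocol address
    )
    return ethernet + arp

frame = build_arp_request(bytes.fromhex("020000000001"), "152.2.145.34", "152.2.145.1")
print(len(frame), frame.hex())
```

This only builds the bytes. Actually sending them needs a raw socket and usually root privileges.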
Transmission Process: Before
- you must know your: IP address, netmask, and gateway (at minimum)
- what else would be nice?
DNS (by numeric IP) so you don't need to know destination's numeric IP address
- what does that flowchart assume?
assumes DNS is local, although it could be behind the default router
Pre-Transmission Process: Local
| | source node | packet description | destination node |
| 1 | me.dept.university.edu | ARP broadcast for MAC address of DNS server at known numeric IP address | L2 broadcast (to all) |
| 2 | dns.university.edu | ARP unicast reply | me.dept.university.edu |
| 3 | me.dept.university.edu | DNS unicast request for numeric IP address of neat-stuff.university.edu | dns.university.edu |
| 4 | dns.university.edu | DNS unicast reply with numeric IP address of neat-stuff.university.edu | me.dept.university.edu |
| 5 | me.dept.university.edu | ARP broadcast for MAC address to the IP address for neat-stuff.university.edu | L2 broadcast (all) |
| 6 | neat-stuff.university.edu | ARP unicast reply | me.dept.university.edu |
| now local-area data transmission between two hosts can begin |
Pre-Transmission Process: Wide Area
| | source node | packet description | destination node |
| 1 | me.dept.university.edu | ARP broadcast for MAC address of DNS server at known numeric IP address | L2 broadcast (all) |
| 2 | dns.university.edu | ARP unicast reply | me.dept.university.edu |
| 3 | me.dept.university.edu | DNS unicast request for numeric IP address for neat-stuff.company.com | dns.university.edu |
| 4 | dns.university.edu | DNS unicast reply with numeric IP address of neat-stuff.company.com | me.dept.university.edu |
| 5 | me.dept.university.edu | ARP broadcast for MAC address of my-default-router.university.edu | L2 broadcast (all) |
| 6 | my-default-router.university.edu | ARP unicast reply | me.dept.university.edu |
| now wide-area data transmission through my-default-router can begin |
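The only difference between the two tables is step 5: whose MAC address do you ARP for? A minimal sketch of that decision, assuming a hypothetical helper name and an illustrative /24 netmask:

```python
import ipaddress

def arp_target(my_ip: str, netmask: str, gateway: str, dest_ip: str) -> str:
    """Return the IP address to ARP for before sending to dest_ip."""
    # strict=False lets us pass a host address and get its network back
    network = ipaddress.ip_network(f"{my_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dest_ip) in network:
        return dest_ip      # local: deliver straight to the destination
    return gateway          # wide area: hand the packet to the default router

print(arp_target("152.2.145.34", "255.255.255.0", "152.2.145.1", "152.2.145.83"))
print(arp_target("152.2.145.34", "255.255.255.0", "152.2.145.1", "8.8.8.8"))
```

This is why you must know your IP address, netmask, and gateway before you can transmit anything.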
ICMP
ICMP echo request gets ICMP echo reply
traceroute
tracert on Windows
- three packets for each iteration to detect path instability
path instability makes TCP really upset
send 3 packets with TTL of 1, first router sends back ICMP error that it had to drop that packet
send 3 packets with TTL of 2, second router sends back ICMP error
and so on, until you get to specified destination
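The TTL trick can be sketched as a simulation. Nothing here touches the network; the path and host names are invented, and a real traceroute needs raw sockets.

```python
def forward(path, ttl):
    """Return (responder, message) for one probe sent with the given TTL."""
    for router in path[:-1]:       # every router decrements TTL before forwarding
        ttl -= 1
        if ttl == 0:
            return router, "time exceeded (type 11, code 0)"
    return path[-1], "destination reached"

def traceroute(path, probes=3):
    """Send `probes` packets per TTL, raising TTL until the destination answers."""
    hops = []
    for ttl in range(1, len(path) + 1):
        replies = [forward(path, ttl) for _ in range(probes)]
        hops.append((ttl, replies))
        if replies[0][1] == "destination reached":
            break
    return hops

# Hypothetical two-router path to the destination
for ttl, replies in traceroute(["router-1", "router-2", "neat-stuff"]):
    print(ttl, replies)
```

On a stable path all three replies for a given TTL come from the same router; different responders at the same TTL are the sign of path instability.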
ICMP
- ICMP is contained in IP packets
so usual Ethernet and IP headers, ICMP is the "IP payload," the message in the IP envelope
- 8 bits - type
- 8 bits - code
- 16 bits - checksum
- ICMP message body (optional; contents depend on type and code)
- ICMP error messages contain the IP header and the first 8 octets of the IP payload of the packet that caused the error.
- ICMP errors are never generated for ICMP errors, or for Layer 2 broadcast, IP broadcast, IP multicast, or otherwise any address that does not define a single host.
- ICMP errors are not sent for any packet fragment other than the first.
- These rules for ICMP errors prevent broadcast storms that needlessly fill available bandwidth, either from errors generating more errors or from ICMP errors returned to a broadcast address.
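Here is how the type, code, and checksum fields fit together for a ping. This is a sketch: the function names are illustrative, and the identifier and sequence fields are the standard body of an echo message rather than something listed above.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by IP, ICMP, UDP, and TCP."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP type 8, code 0, with the checksum computed over header plus payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum field zeroed first
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
print(packet.hex())
```

A receiver verifies the packet by running the same checksum over the whole message, checksum field included; a good packet comes out as 0.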
Common ICMP Types and Codes
| type | code | description | query/error |
| 0 | 0 | echo reply (ping) | query |
| 4 | 0 | source quench (basic flow control) | error |
| 5 | 0 | network redirect (redirect type) | error |
| 5 | 1 | host redirect | error |
| 5 | 2 | redirect for ToS and network | error |
| 5 | 3 | redirect for ToS and host | error |
| 8 | 0 | echo request (ping) | query |
| 11 | 0 | TTL of 0 during transit (time exceeded type) | error |
| 11 | 1 | TTL of 0 during reassembly | error |
| 30 | 0 | traceroute packet successfully forwarded (traceroute type for future use in RFC 1393) | query |
| 30 | 1 | traceroute packet discarded - no route | query |
All About ICMP
just make sure you understand and can use ping and traceroute!
wikipedia
| type | code | description | query/error |
| 0 | 0 | echo reply (ping) | query |
| 3 | 0 | network unreachable (destination unreachable type) | error |
| 3 | 1 | host unreachable | error |
| 3 | 2 | protocol unreachable | error |
| 3 | 3 | port unreachable | error |
| 3 | 4 | fragmentation needed but don't-fragment bit set | error |
| 3 | 5 | source route failed | error |
| 3 | 6 | destination network unknown | error |
| 3 | 7 | destination host unknown | error |
| 3 | 8 | source host isolated (obsolete) | error |
| 3 | 9 | destination network administratively prohibited | error |
| 3 | 10 | destination host administratively prohibited | error |
| 3 | 11 | network unreachable for ToS | error |
| 3 | 12 | host unreachable for ToS | error |
| 3 | 13 | communication administratively prohibited by filtering | error |
| 3 | 14 | host precedence violation | error |
| 3 | 15 | precedence cutoff in effect | error |
| 4 | 0 | source quench (basic flow control) | error |
| 5 | 0 | network redirect (redirect type) | error |
| 5 | 1 | host redirect | error |
| 5 | 2 | redirect for ToS and network | error |
| 5 | 3 | redirect for ToS and host | error |
| 6 | 0 | alternate host address | query |
| 8 | 0 | echo request (ping) | query |
| 9 | 0 | router advertisement | query |
| 10 | 0 | router solicitation | query |
| 11 | 0 | TTL of 0 during transit (time exceeded type) | error |
| 11 | 1 | TTL of 0 during reassembly | error |
| 12 | 0 | bad IP header (catch-all error parameter problem type) | error |
| 12 | 1 | missing required option | error |
| 12 | 2 | bad length | error |
| 13 | 0 | timestamp request | query |
| 14 | 0 | timestamp reply | query |
| 15 | 0 | information request (obsolete) | query |
| 16 | 0 | information reply (obsolete) | query |
| 17 | 0 | address mask request | query |
| 18 | 0 | address mask reply | query |
| 19 | - | reserved for security | reserved |
| 20-29 | - | reserved for robustness experiment | reserved |
| 30 | 0 | traceroute packet successfully forwarded (traceroute type for future use in RFC 1393) | query |
| 30 | 1 | traceroute packet discarded - no route | query |
| 31 | - | datagram conversion error type for next version IP called IPv7 in RFC 1475 | error |
| 32 | - | mobile host redirect type | error |
| 33 | - | IPv6 Where-Are-You type | query |
| 34 | - | IPv6 Here-I-Am type | query |
| 35 | - | mobile registration request type | query |
| 36 | - | mobile registration reply type | query |
Ports
- Layer 4 is the Transport Layer
where you talk about connections (sockets and flows) and services (like reliability)
- The Layer 4 port is a unique application identifier.
- (Layer 3):(Layer 4) is a socket.
- (Station A Layer 3):(Station A Layer 4)::(Station B Layer 3):(Station B Layer 4) is a flow.
- look up well-known port numbers at IANA
they assign these numbers!
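Sockets and flows are easy to see on loopback. A minimal sketch, assuming the loopback interface is available; binding to port 0 asks the operating system to pick any free port.

```python
import socket

# Server side: listen on loopback, port chosen by the OS
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Client side: connect to whatever (address, port) the server got
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
conn, _ = server.accept()

# Each end is a socket, (Layer 3):(Layer 4); the pair of sockets is the flow
client_flow = (client.getsockname(), client.getpeername())
server_flow = (conn.getsockname(), conn.getpeername())
print("client sees:", client_flow)
print("server sees:", server_flow)

conn.close()
client.close()
server.close()
```

Both ends report the same two sockets, just in opposite order.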
Layer 4 Tools
- nmap: which L4 ports on a target are open and reachable across the entire path?
nmap -sU -p 69 172.29.158.80
- netstat: routing table and open ports
- routing table:
netstat -nr
- open ports:
netstat -an
- active ("listening") ports:
netstat -ln
hope@mjollnir$ netstat -nr
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ------ ---------
152.2.145.0 152.2.145.34 U 1 1881 hme0
224.0.0.0 152.2.145.34 U 1 0 hme0
default 152.2.145.1 UG 1 4686
127.0.0.1 127.0.0.1 UH 2 758 lo0
hope@mjollnir$ netstat -an
UDP: IPv4
Local Address Remote Address State
-------------------- -------------------- -------
*.111 Idle
*.32771 Idle
*.514 Idle
*.177 Idle
*.14001 Idle
*.14008 Idle
*.7001 Idle
*.161 Idle
*.34468 Idle
*.34469 Idle
*.34486 Idle
hope@jonilaptop$ netstat -ln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:902 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 :::2220 :::* LISTEN
tcp 0 0 :::5308 :::* LISTEN
udp 0 0 0.0.0.0:33720 0.0.0.0:*
udp 0 0 0.0.0.0:69 0.0.0.0:*
udp 0 0 0.0.0.0:631 0.0.0.0:*
udp 0 0 152.2.145.83:123 0.0.0.0:*
udp 0 0 127.0.0.1:123 0.0.0.0:*
udp 0 0 0.0.0.0:123 0.0.0.0:*
udp 0 0 :::123 :::*
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 7862 /tmp/.font-unix/fs7100
unix 2 [ ACC ] STREAM LISTENING 11799 /tmp/mapping-hope
unix 2 [ ACC ] STREAM LISTENING 10654 /tmp/.gdm_socket
unix 2 [ ACC ] STREAM LISTENING 10707 /tmp/.X11-unix/X0
unix 2 [ ACC ] STREAM LISTENING 11088 /tmp/ssh-OUhgwY4951/agent.4951
unix 2 [ ACC ] STREAM LISTENING 11201 /tmp/orbit-hope/linc-13c8-0-5c5bcd5b900ce
unix 2 [ ACC ] STREAM LISTENING 11210 /tmp/orbit-hope/linc-1357-0-6470ed98aeeb1
unix 2 [ ACC ] STREAM LISTENING 11387 /tmp/.ICE-unix/4951
unix 2 [ ACC ] STREAM LISTENING 11397 /tmp/keyring-t9PiQZ/socket
unix 2 [ ACC ] STREAM LISTENING 11407 /tmp/orbit-hope/linc-13cd-0-397813bad8642
unix 2 [ ACC ] STREAM LISTENING 11429 /tmp/orbit-hope/linc-13cf-0-7b1251a8177c
unix 2 [ ACC ] STREAM LISTENING 75919 /tmp/orbit-hope/linc-33da-0-4dfc566421112
unix 2 [ ACC ] STREAM LISTENING 11589 /tmp/orbit-hope/linc-13ec-0-b6f99663b0dd
unix 2 [ ACC ] STREAM LISTENING 11625 /tmp/orbit-hope/linc-1400-0-227f500721dbb
unix 2 [ ACC ] STREAM LISTENING 11650 /tmp/orbit-hope/linc-13fc-0-227f50076cbb0
unix 2 [ ACC ] STREAM LISTENING 11677 /tmp/orbit-hope/linc-13fe-0-54e8a5df113cd
unix 2 [ ACC ] STREAM LISTENING 11721 /tmp/orbit-hope/linc-1404-0-54e8a5dfafc18
unix 2 [ ACC ] STREAM LISTENING 11759 /tmp/orbit-hope/linc-140e-0-5d0b05321efea
unix 2 [ ACC ] STREAM LISTENING 11821 /tmp/orbit-hope/linc-1408-0-681f708c4b2d7
unix 2 [ ACC ] STREAM LISTENING 11845 /tmp/orbit-hope/linc-1418-0-598f9ee8e1721
unix 2 [ ACC ] STREAM LISTENING 11873 /tmp/orbit-hope/linc-141a-0-7d829a85c7fbf
unix 2 [ ACC ] STREAM LISTENING 11900 /tmp/orbit-hope/linc-141c-0-204d0d9054f7e
unix 2 [ ACC ] STREAM LISTENING 11950 /tmp/orbit-hope/linc-1421-0-4e5836c38dc2
unix 2 [ ACC ] STREAM LISTENING 11981 /tmp/orbit-hope/linc-1423-0-4e5836ca4f33
unix 2 [ ACC ] STREAM LISTENING 7398 /var/run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 7965 /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 7792 /dev/gpmctl
unix 2 [ ACC ] STREAM LISTENING 11176 @/tmp/dbus-UuQXseCgln
unix 2 [ ACC ] STREAM LISTENING 11490 @/tmp/fam-hope-
UDP
User Datagram Protocol
one level of service (for Layer 4) is not to offer any services at all; this is UDP
- UDP is pure connectionless networking! at its best/worst! (best effort)
- +: simple, lightweight with small 8 octet header that minimizes L4 overhead
- -: not reliable, no services, not very configurable
UDP Packet
- 16 bit fields each for source and destination ports, length, and checksum
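The whole UDP header is four 16-bit fields, so it packs in one line. A sketch with illustrative function names; the checksum is left at 0, which in IPv4 means "not computed".

```python
import struct

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-octet UDP header to a payload."""
    length = 8 + len(payload)                 # length covers header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0: not computed
    return header + payload

def parse_udp_header(datagram: bytes):
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack("!HHHH", datagram[:8])

datagram = build_udp_datagram(33720, 69, b"hello")
print(parse_udp_header(datagram))
```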
TCP
Transmission Control Protocol
lost (or late) packets will be re-transmitted, thanks to ACKs
- if it ain't UDP ... it's probably TCP
TCP and UDP are sorta opposites: reliable vs no services, options vs none, complex vs simple
- +: reliable, even connection-oriented, many configuration options
- -: overhead (some TCP packets are all header! 20 bytes > 8 bytes!), complex to program, difficult to optimize
TCP Packet: the TCP Header
- 16 bits each for source and destination ports
- 32 bits each for sequence number and acknowledgement number (what sequence number is expected next from the other end of the conversation)
- sequence number runs from 0 to 2^32-1, advances by the number of bytes sent, and wraps
TCP Packet: the TCP Header
- 4 bits for header length, as measured in 32-bit words
- 20 bytes minimum header length with no options
- 60 bytes maximum header length
- 6 bits - reserved
- 6 bits for flags
- URGent pointer is valid
- ACKnowledgement number is valid
- retransmissions based on ACKs
- receiver should PuSH this packet to app asap
- receiver should ReSeT this connection
- receiver should SYNchronize sequence numbers to establish a connection
- sender has FINished data transmission
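The header fields so far can be packed and unpacked like this. It is a sketch: the function names are illustrative, the checksum is left at 0, and the window, checksum, and urgent pointer fields that fill out the 20 bytes are the standard rest of the header rather than something listed above.

```python
import struct

# Flag bits, lowest bit first
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def build_tcp_header(src_port, dst_port, seq, ack, flags, window=65535):
    """Build a 20-byte TCP header with no options (checksum left at 0)."""
    header_words = 5                          # 5 x 32-bit words = 20 bytes
    flag_bits = 0
    for name in flags:
        flag_bits |= FLAGS[name]
    # 4 bits header length, 6 bits reserved (zero), 6 bits flags
    offset_and_flags = (header_words << 12) | flag_bits
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       offset_and_flags, window, 0, 0)

def parse_flags(header: bytes):
    """Return (header length in bytes, list of flag names that are set)."""
    (offset_and_flags,) = struct.unpack("!H", header[12:14])
    header_len = (offset_and_flags >> 12) * 4
    return header_len, [name for name, bit in FLAGS.items() if offset_and_flags & bit]

syn = build_tcp_header(40000, 80, seq=1000, ack=0, flags=["SYN"])
print(len(syn), parse_flags(syn))
```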
TCP Header Options
Typical TCP Options
| kind | field length (bytes) | field | option meaning |
| 0 | 1 | kind=0 | end of option list, from RFC 793 |
| 1 | 1 | kind=1 | no operation (NOP), from RFC 793 |
| 2 | 1 | kind=2 | maximum segment size (MSS), from RFC 793 |
| 2 | 1 | length=4 | ibid |
| 2 | 2 | MSS | ibid |
| 3 | 1 | kind=3 | window scale factor, from RFC 1323 |
| 3 | 1 | length=3 | ibid |
| 3 | 1 | shift count | ibid |
| 8 | 1 | kind=8 | timestamp, from RFC 1323 |
| 8 | 1 | length=10 | ibid |
| 8 | 4 | timestamp | ibid |
| 8 | 4 | timestamp echo reply | ibid |
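Each option in the table is a kind octet, a length octet, and then the value. A minimal sketch with made-up helper names; the padding step is there because the header length is counted in 32-bit words.

```python
import struct

def mss_option(mss: int) -> bytes:
    return struct.pack("!BBH", 2, 4, mss)             # kind=2, length=4, 2-octet MSS

def window_scale_option(shift: int) -> bytes:
    return struct.pack("!BBB", 3, 3, shift)           # kind=3, length=3, shift count

def timestamp_option(value: int, echo_reply: int) -> bytes:
    return struct.pack("!BBII", 8, 10, value, echo_reply)  # kind=8, length=10

def pad_options(options: bytes) -> bytes:
    """Pad with zero octets (kind 0, end of option list) to a 32-bit boundary."""
    while len(options) % 4:
        options += b"\x00"
    return options

options = pad_options(mss_option(1460) + window_scale_option(7) + timestamp_option(1, 0))
print(len(options), options.hex())
print("header length:", 20 + len(options), "bytes")
```

With 40 bytes of option space at most, the header tops out at the 60-byte maximum.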
Interesting TCP Flags: SYN
- SYN
- ISN, initial sequence number
- maximum segment size, MSS
- similar to MTU, to avoid the dangers of fragmentation
- 536 default; BSD wants multiple of 512 so 1024 also common
- 1460 octets is optimal for Ethernet: 1500 Ethernet max - 20 bytes IP header - 20 bytes TCP header (or 1420 to be safe)
find an NPAD server for TCP tuning suggestions (could improve your bulk throughput by 1000x, really!)
Interesting TCP Flags: ACK
- ACK
- to reduce TCP-header-only packets in the conversation, use delayed ACKs, usually 200 ms, must be under 500 ms
- immediate ACK required if a packet arrives out of order!
- at least every second incoming full-size packet should get a timely ACK
- Nagle algorithm
TCP Start
3-way handshake
you can see these states, SYN_SENT, SYN_RCVD (shown as SYN_RECV on Linux), ESTABLISHED, with the netstat command
TCP Finish
you can see these states, ESTABLISHED, FIN_WAIT_1, CLOSE_WAIT, FIN_WAIT_2, LAST_ACK, TIME_WAIT, CLOSED, with the netstat command
- first two steps are a half close
- generally four steps since TCP is full duplex, for an orderly release
- ending with a RST is an abortive release
- half-open: one socket open, the other end's socket closed
you can see these states with the netstat command
TCP Finish
- maximum segment lifetime (30 seconds, 1 minute, 2 minutes common)
- must wait 2MSL before closing socket (on active close side)
- may retransmit passive close ACK or active close FIN
- other packets discarded
TCP Traffic Control
- flow control optimizes traffic for the two end points
- congestion control optimizes traffic for the network
TCP Flow Control
- each ACK contains a window advertisement of how much more data can be sent, generally the amount of free buffer
- a window is the amount of data (counted in bytes) in transit without an ACK (yet)
- ACKs have a timer, and must show up before curfew
TCP Flow Control
- sliding window to maximize throughput
- advertised window
- TCP can send as many packets as the receiver's window allows before it must wait for an ACK
- ideal window capacity is (bandwidth [bits/sec]) * (RTT [sec]) * [1 byte / 8 bits]
- bandwidth delay product
- TCP gets inefficient and unstable for high values
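Plugging numbers into the bandwidth delay product shows why window size matters. A sketch with illustrative function names and an example link of 100 Mbps with a 70 ms RTT; the 65535-byte figure is the largest window the 16-bit window field can advertise without the window scale option.

```python
def ideal_window_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth delay product: bits/sec * sec, converted to bytes."""
    return bandwidth_bps * rtt_ms // (8 * 1000)

def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Best case when only one window can be in flight per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms

print(ideal_window_bytes(100_000_000, 70), "bytes needed to fill the pipe")
print(max_throughput_bps(65535, 70), "bps at most with an unscaled 65535-byte window")
```

On this example link an unscaled window fills well under a tenth of the pipe.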
TCP Congestion Control
- congestion window: the sender may have in flight no more than the smaller of the congestion window and flow control's advertised window
- congestion avoidance: assume packet loss (either the packet or its ACK dropped) due to high network traffic
- multiplicative decrease: halve the congestion window
- additive increase: add one segment to the congestion window per round trip (a fraction of a segment for each received ACK)
- AIMD: additive increase, multiplicative decrease
TCP Congestion Control
- slow start adds one segment to the congestion window for every received ACK, which doubles it every round trip (exponential), until it reaches the slow-start threshold or the advertised window
exponential behavior isn't slow!
- enter congestion avoidance for 3 duplicate ACKs
"triple duplicate ACK" is the hallmark of a dropped packet; why?
... because an immediate ACK is required upon receipt of an out-of-order packet
- Reno: congestion window / 2, classic AIMD
- Tahoe: 1 MSS and slow start
- congestion window resets to 1 MSS on an ACK timeout
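The window behavior can be sketched as a per-round-trip simulation. This is a simplification under stated assumptions: the window is counted in whole MSS units, each step is one round trip with either all ACKs received or one loss event, the slow-start threshold (ssthresh) is set to half the window on any loss, and the function name and event strings are invented.

```python
def next_cwnd(cwnd, ssthresh, event, variant="reno"):
    """Return (cwnd, ssthresh) in MSS units after one round trip."""
    if event == "ack":
        if cwnd < ssthresh:
            return min(cwnd * 2, ssthresh), ssthresh   # slow start: double per round trip
        return cwnd + 1, ssthresh                      # congestion avoidance: additive increase
    ssthresh = max(cwnd // 2, 2)                       # multiplicative decrease on any loss
    if event == "triple_dup_ack" and variant == "reno":
        return ssthresh, ssthresh                      # Reno: continue from half the window
    return 1, ssthresh                                 # Tahoe, or any timeout: back to 1 MSS

cwnd, ssthresh = 1, 8
for event in ["ack"] * 5 + ["triple_dup_ack", "ack", "timeout", "ack", "ack"]:
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
    print(f"{event:15} cwnd={cwnd:2} ssthresh={ssthresh}")
```

Pass variant="tahoe" to see the window drop to 1 MSS on a triple duplicate ACK instead of halving.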
Coming soon
- IPv6
- Advance notice of light reading! Browse Zytrax to supplement class coverage of IPv6. Once you can handle the addressing, you're most of the way to understanding IPv6.
Questions?