When DNS problems are subtle, basic lookup commands can look normal while clients still fail. This is the sequence I use.
1) Check authoritative vs recursive answers
- Query authoritative servers directly.
- Query the recursive resolver used by clients.
- Compare record values, TTL, and response flags.
A mismatch here often explains "works for me" reports across network segments.
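The comparison in this step can be sketched as a small pure function. This is a minimal illustration, not a full tool: the `Answer` structure and `compare_answers` name are hypothetical, and it assumes you have already copied the record values, TTL, and flags out of your lookup tool's output for each server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One parsed DNS response (hypothetical structure, filled in by hand or by a parser)."""
    values: frozenset  # record data, e.g. {"192.0.2.10"}
    ttl: int           # TTL in seconds as returned by the server
    flags: frozenset   # header flags, e.g. {"qr", "aa"}


def compare_answers(authoritative: Answer, recursive: Answer) -> list:
    """Return human-readable findings; an empty list means nothing suspicious."""
    findings = []
    if authoritative.values != recursive.values:
        findings.append(
            f"record mismatch: authoritative={sorted(authoritative.values)} "
            f"recursive={sorted(recursive.values)}"
        )
    # A cache counts the TTL down, so a recursive TTL above the authoritative
    # one suggests an older record set or a resolver-side TTL override.
    if recursive.ttl > authoritative.ttl:
        findings.append(
            f"recursive TTL {recursive.ttl}s exceeds authoritative TTL {authoritative.ttl}s"
        )
    # An authoritative server is expected to set the AA flag for its own zone.
    if "aa" not in authoritative.flags:
        findings.append("authoritative response lacks the aa flag")
    return findings
```

A lower TTL on the recursive side is treated as normal here, since a cached record is expected to count down toward zero.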
2) Check from at least two client locations
- one host inside the primary network
- one host from an external or alternate network
If answers differ, treat it as a resolver path issue before touching app config.
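One way to make differing answers visible is to group vantage points by the record set they received. The sketch below assumes you have collected the answers by hand into a dictionary; the function name and the location labels are illustrative only.

```python
from collections import defaultdict


def group_by_answer(observations):
    """Group vantage points by the set of records each one received.

    observations maps a location label to the list of record values seen there.
    More than one group in the result means the locations disagree.
    """
    groups = defaultdict(list)
    for location, values in observations.items():
        groups[frozenset(values)].append(location)
    return dict(groups)


observations = {
    "internal-host": ["192.0.2.10"],
    "external-host": ["192.0.2.99"],
    "vpn-host": ["192.0.2.10"],
}
for values, locations in group_by_answer(observations).items():
    print(sorted(values), "seen from", locations)
```

If the result has a single group, the resolver path is probably not the source of the difference between clients.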
3) Validate the complete resolution path
- confirm search domains are not rewriting short names
- confirm no stale hosts file entries exist
- confirm CNAME chain resolves to the expected final A/AAAA target
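Two of these checks are easy to reason about with a small model. The sketch below approximates how a stub resolver with a search list and an `ndots` threshold (as in `resolv.conf`) orders the names it tries, and walks a CNAME chain over a hand-built record table. Both function names and the `records` layout are assumptions for illustration; real resolvers differ in details, so treat the output as a guide to what to test, not as ground truth.

```python
def candidate_names(name, search_domains, ndots=1):
    """List the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: the search list is tried before the bare name.
    return expanded + [absolute]


def follow_cname(name, records, max_hops=8):
    """Follow CNAMEs in `records` and return (chain, final_data).

    records maps a name to a (record_type, data) pair, for example
    ("CNAME", "edge.example.net.") or ("A", ["192.0.2.10"]).
    """
    chain = [name]
    for _ in range(max_hops):
        current = chain[-1]
        if current not in records:
            raise LookupError(f"no record for {current}")
        rtype, data = records[current]
        if rtype != "CNAME":
            return chain, data
        if data in chain:
            raise ValueError(f"CNAME loop at {data}")
        chain.append(data)
    raise ValueError(f"CNAME chain longer than {max_hops} hops")
```

For example, `candidate_names("db", ["corp.example"])` puts `db.corp.example.` ahead of `db.`, which is how a short name can silently resolve to a different host than the one intended.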
4) Correlate with edge logs
If DNS looks right, check reverse proxy access/error logs for host header mismatches, wrong upstream selection, or connection failures. DNS and proxy issues often overlap during migrations.
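As a rough illustration of the host header check, the sketch below counts requests whose host is not one the proxy is expected to serve. The `key=value` log format and the `host` field name are assumptions made for this example; adapt the parsing to whatever your proxy actually writes.

```python
from collections import Counter


def unexpected_hosts(log_lines, expected_hosts):
    """Count requests whose host field is not in expected_hosts.

    Assumes a hypothetical access log made of space-separated key=value pairs,
    e.g. "host=app.example.com upstream=10.0.0.5 status=200".
    """
    counts = Counter()
    for line in log_lines:
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        host = fields.get("host")
        if host is not None and host.lower() not in expected_hosts:
            counts[host.lower()] += 1
    return counts
```

A burst of requests for an old hostname right after a migration usually points to clients still holding a cached record or a stale configuration, not to a proxy fault.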
5) Document TTL assumptions
Every DNS change note should include:
- old TTL
- new TTL
- exact change timestamp
- expected full propagation window (at least the old TTL, counted from the change timestamp)
Without that, teams declare incidents too early or keep waiting after propagation should already be complete.
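The propagation window in such a note can be derived from the old TTL and the change timestamp. A minimal sketch, assuming resolvers honor the published TTL: a resolver that cached the old record just before the change keeps it for up to the old TTL. The function names are illustrative, and resolvers that clamp or extend TTLs will not fit this model.

```python
from datetime import datetime, timedelta, timezone


def propagation_deadline(changed_at, old_ttl_seconds):
    """Latest time a TTL-honoring cache could still serve the old record."""
    return changed_at + timedelta(seconds=old_ttl_seconds)


def propagation_status(now, changed_at, old_ttl_seconds):
    """Say whether stale answers are still expected at time `now`."""
    if now < propagation_deadline(changed_at, old_ttl_seconds):
        return "still propagating"
    return "should be complete"


# Example values only: a change made at 12:00 UTC with an old TTL of one hour.
changed_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(propagation_deadline(changed_at, 3600).isoformat())
```

Note that the new TTL does not shorten this window; it only governs how long the new record is cached once a resolver has fetched it.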