Jan
08

URL & HTML Encoding: The Quick Reference You’ll Actually Use

A practical guide to when and how to encode URLs and HTML safely, avoid double-encoding, and stop broken links or XSS issues.

Audience: Frontend/backend engineers, QA, support
Goal: Apply the right encoding in the right context—no broken links, no XSS surprises.

Why encoding matters

Browsers and servers interpret certain characters specially. Encoding prevents misinterpretation: it keeps URLs intact, shows HTML literally, and stops injection bugs. The trick is knowing which encoding to use for which context.

Three common contexts

  1. URL parameters (percent-encoding): Encode user input before placing it in a query string or path segment. Spaces → %20 (or + in form-encoding), & → %26.
  2. HTML text/attributes (HTML entities): Encode <, >, &, quotes to prevent markup injection and XSS.
  3. Transport-safe email bodies/headers (Quoted-Printable, Base64): Used in MIME emails; separate from HTML/URL encoding.
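
To make the three contexts concrete, here is a minimal Python sketch that passes one sample string through a standard-library encoder for each context. The sample string and variable names are made up for illustration.

```python
import html
import quopri
from urllib.parse import quote

sample = "Zoë & Co <sales>"   # illustrative input, not from any real system

# 1. URL context: percent-encode the UTF-8 bytes of the value.
for_url = quote(sample, safe="")      # 'Zo%C3%AB%20%26%20Co%20%3Csales%3E'

# 2. HTML context: replace markup-significant characters with entities.
for_html = html.escape(sample)        # 'Zoë &amp; Co &lt;sales&gt;'

# 3. Email (MIME) context: Quoted-Printable works on bytes;
#    non-ASCII bytes are written as =XX.
for_email = quopri.encodestring(sample.encode("utf-8"))
```

The same input produces three different outputs, and none of them is interchangeable with the others.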

URL encoding essentials

  • Follow RFC 3986: encode reserved characters when they appear as data, not separators.
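  • Common pitfalls: + means a space only in form-encoded query strings, while %20 works everywhere; %25 is an encoded literal %, so %25 followed by two hex digits (such as %2520) usually signals double-encoding; don’t encode :/?#[]@ when they act as URL separators.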
  • International domains: Punycode for non-ASCII hostnames.
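
The points above can be sketched with Python's urllib.parse module. This is one possible approach, not the only one; the host example.com, the path, and the variable names are assumptions chosen for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

user_query = "fish & chips 50%"   # illustrative user input

# Path segment: safe="" so a "/" inside the data is encoded as data.
path_segment = quote("a/b c", safe="")        # 'a%2Fb%20c'

# Query value, RFC 3986 style: a space becomes %20.
as_percent = quote(user_query, safe="")       # 'fish%20%26%20chips%2050%25'

# Query value, form style: a space becomes +.
as_form = quote_plus(user_query)              # 'fish+%26+chips+50%25'

# urlencode builds a whole query string (form style by default).
query_string = urlencode({"q": user_query, "page": 2})
# 'q=fish+%26+chips+50%25&page=2'

# The separators (://, /, ?) are written literally; only the data is encoded.
url = f"https://example.com/search/{path_segment}?{query_string}"

# Decoding must match the style: only the form decoder treats + as a space.
plus_kept = unquote("a+b")            # 'a+b'
plus_as_space = unquote_plus("a+b")   # 'a b'
```

Choosing quote or quote_plus depends on what the receiving server expects; mixing the two styles is a common source of stray + characters.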

HTML entity encoding essentials

  • Always encode <, >, &, ", and ' when injecting into HTML text or quoted attribute values; entity encoding alone does not protect unquoted attributes.
  • Named vs numeric entities: named are readable (&copy;), numeric are universal (&#169; / &#xA9;).
  • Decode only when you intend to render the markup; decoding untrusted input can reintroduce XSS.
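
As a minimal sketch of these rules, Python's standard html module covers the five characters listed above. The input string and the surrounding markup are hypothetical.

```python
import html

user_input = "<script>alert(\"hi\")</script> & 'quotes'"   # hypothetical untrusted input

# quote=True (the default) also encodes " and ', which matters inside attributes.
safe_text = html.escape(user_input, quote=True)
# '&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; &#x27;quotes&#x27;'

# The same encoded value is usable in text and in a quoted attribute.
as_text = f"<p>{safe_text}</p>"
as_attribute = f'<input value="{safe_text}">'

# Named and numeric entities decode to the same character.
copyright_forms = {html.unescape(e) for e in ("&copy;", "&#169;", "&#xA9;")}
```

Note that html.escape writes the apostrophe as a numeric entity (&#x27;), which is one case where the numeric form is the more portable choice.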

Common mistakes and fixes

  • Double-encoding: %2520 or &amp;lt; means you encoded twice. Decode once and re-encode properly.
  • Encoding the whole URL when only params need it: Encode parameter values; don’t mangle :// unless embedding a URL inside another URL.
  • Using HTML encoding for URLs (or vice versa): Keep contexts separate; do URL encoding for URLs, HTML entities for HTML text/attributes.
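
A short Python sketch of how double-encoding arises and how it might be spotted. The helper looks_double_encoded is hypothetical and only a heuristic: a value that legitimately contains the literal text "%20" also encodes to "%2520", so a match is a hint, not proof.

```python
import html
import re
from urllib.parse import quote, unquote

once = quote("a b", safe="")      # 'a%20b'
twice = quote(once, safe="")      # 'a%2520b'  (the % itself became %25)

# One decode only undoes one layer.
one_layer = unquote(twice)        # 'a%20b'

# The HTML equivalent: escaping already-escaped text.
html_once = html.escape("<")          # '&lt;'
html_twice = html.escape(html_once)   # '&amp;lt;'

# Hypothetical heuristic: %25 followed by two hex digits.
_DOUBLE = re.compile(r"%25[0-9A-Fa-f]{2}")

def looks_double_encoded(value: str) -> bool:
    return bool(_DOUBLE.search(value))

# Embedding a URL inside another URL is the case where :// should be encoded.
inner = "https://example.com/a?b=1&c=2"
outer = "https://example.com/redirect?next=" + quote(inner, safe="")
```

The durable fix is usually to find the code path that encodes twice, rather than decoding repeatedly at the point of use.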

Quick recipes

  • Add a user query to a search URL: Percent-encode the query value only.
  • Display user input inside HTML: HTML-encode the whole string; do not decode before display.
  • Show raw HTML in docs: Encode <>&"'; optionally encode all non-ASCII for portability.
  • International domain links: Convert host to Punycode; keep path/query percent-encoded UTF-8.
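
The international-domain recipe could look like this in Python. It is a sketch under assumptions: the hostname and path are invented, and Python's built-in idna codec implements the older IDNA 2003 rules, so other tools may convert some names differently.

```python
from urllib.parse import quote

host = "bücher.example"     # hypothetical internationalized hostname
path = "/café/menü"         # hypothetical path with non-ASCII characters

# Hostname: convert to Punycode (the ASCII "xn--" form).
ascii_host = host.encode("idna").decode("ascii")    # 'xn--bcher-kva.example'

# Path: percent-encode the UTF-8 bytes, keeping "/" as a separator.
ascii_path = quote(path, safe="/")                  # '/caf%C3%A9/men%C3%BC'

link = f"https://{ascii_host}{ascii_path}"

# Converting the host back for display.
display_host = ascii_host.encode("ascii").decode("idna")   # 'bücher.example'
```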

Where your tools fit

  • URL Encoder / Decoder: Percent-encode/decode parameters and full URLs; handle + vs %20.
  • HTML Entity Encode / Decode: Safely display or restore HTML/text with entities.
  • Punycode ↔ Unicode: Handle internationalized domain names.
  • Quoted-Printable Encode/Decode: For email/MIME contexts (not a substitute for URL/HTML encoding).

Safety reminders

  • Never decode untrusted input and then render it without re-encoding for the correct context.
  • Log raw and encoded forms separately if debugging; avoid logging sensitive payloads.
  • Document which contexts your app expects (URL, HTML, JSON) to prevent accidental misuse.
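
The first reminder, sketched in Python: a value is decoded from its URL form, then encoded again for the HTML context it is rendered into. The parameter value and the markup are hypothetical.

```python
import html
from urllib.parse import unquote

raw_param = "%3Cb%3Ehi%3C%2Fb%3E"    # hypothetical query value as received

# Step 1: decode the transport encoding (URL) to get the actual data.
decoded = unquote(raw_param)          # '<b>hi</b>'

# Step 2: encode for the output context (HTML) before rendering.
rendered = f"<p>You searched for: {html.escape(decoded)}</p>"
# '<p>You searched for: &lt;b&gt;hi&lt;/b&gt;</p>'
```

Skipping step 2 would place the decoded markup straight into the page, which is exactly the case the reminder warns about.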

Bottom line

Use URL encoding for URLs, HTML entities for HTML, Punycode for IDNs, and avoid double-encoding. Keep contexts straight, and you’ll dodge most breakage and XSS issues.
