Jan 08

URL & HTML Encoding: The Quick Reference You’ll Actually Use

A practical guide to encoding URLs and HTML safely: when to encode, how to avoid double-encoding, and how to prevent broken links and XSS issues.

Audience: Frontend/backend engineers, QA, support
Goal: Apply the right encoding in the right context—no broken links, no XSS surprises.

Why encoding matters

Browsers and servers interpret certain characters specially. Encoding prevents misinterpretation: it keeps URLs intact, shows HTML literally, and stops injection bugs. The trick is knowing which encoding to use for which context.

Three common contexts

URL parameters (percent-encoding): Encode user input before placing it in a query string or path segment. Spaces → %20 (or + in form-encoding), & → %26.
HTML text/attributes (HTML entities): Encode <, >, &, quotes to prevent markup injection and XSS.
Transport-safe email bodies/headers (Quoted-Printable, Base64): Used in MIME emails; separate from HTML/URL encoding.

URL encoding essentials

Follow RFC 3986: encode reserved characters when they appear as data, not separators.
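Common pitfalls: + means a space only in form-encoded query strings (use %20 elsewhere); %25 followed by two hex digits (for example %2520) usually indicates double-encoding, because %25 is the encoding of a literal %; don’t encode :/?#[]@ when they are URL separators.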
International domains: Convert non-ASCII hostnames to Punycode (the xn-- form); percent-encoding does not apply to the host.
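
The points above can be sketched with Python's standard urllib.parse module. This is a minimal illustration, not the only way to do it; the sample value and the variable names are made up for the example.

```python
from urllib.parse import quote, quote_plus, urlencode

# Hypothetical user input containing a separator (&), a slash, and a literal %.
value = "cats & dogs/100%"

# quote() percent-encodes a single component; safe="" also encodes "/".
path_safe = quote(value, safe="")

# quote_plus() follows form-encoding rules, so spaces become "+".
form_safe = quote_plus(value)

# urlencode() builds a whole query string and encodes each value for you.
query = urlencode({"q": value, "page": 2})

print(path_safe)  # cats%20%26%20dogs%2F100%25
print(form_safe)  # cats+%26+dogs%2F100%25
print(query)      # q=cats+%26+dogs%2F100%25&page=2
```

Note that the only difference between the first two results is how the space is written, which is the + vs %20 pitfall in practice.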

HTML entity encoding essentials

Always encode: <, >, &, ", ' when injecting into HTML text or attributes.
Named vs numeric entities: named are readable (&copy;), numeric are universal (&#169; / &#xA9;).
Decode only when you intend to render the markup; decoding untrusted input can reintroduce XSS.
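
As a rough sketch of these rules, Python's standard html module escapes the five characters listed above when quote=True. The sample input is invented for illustration.

```python
import html

# Hypothetical untrusted input with markup, an ampersand, and both quote styles.
user_input = '<img src=x onerror="alert(1)"> Tom & \'Jerry\''

# quote=True also escapes " and ', which matters inside attribute values.
escaped = html.escape(user_input, quote=True)
print(escaped)
# &lt;img src=x onerror=&quot;alert(1)&quot;&gt; Tom &amp; &#x27;Jerry&#x27;

# Named, decimal, and hexadecimal entities all decode to the same character.
copyright_forms = [html.unescape(e) for e in ("&copy;", "&#169;", "&#xA9;")]
print(copyright_forms)  # ['©', '©', '©']
```

The escaped string can be placed in HTML text or in a quoted attribute value; it is not a substitute for URL encoding inside href or src values.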

Common mistakes and fixes

Double-encoding: %2520 or &amp;lt; means you encoded twice. Decode once and re-encode properly.
Encoding the whole URL when only params need it: Encode parameter values; don’t mangle :// unless embedding a URL inside another URL.
Using HTML encoding for URLs (or vice versa): Keep contexts separate; do URL encoding for URLs, HTML entities for HTML text/attributes.
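
The double-encoding case can be reproduced and checked with a few lines of Python. The helper looks_double_encoded is a hypothetical name and only a heuristic: a value whose real content is the text "%20" would also be flagged.

```python
import html
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw)    # correct: encoded a single time
twice = quote(once)  # bug: the "%" from the first pass is encoded again

def looks_double_encoded(s: str) -> bool:
    """Heuristic: one decode changes the string and a second decode changes it again."""
    first = unquote(s)
    return first != s and unquote(first) != first

# Repair as described above: decode once to get back the singly encoded form.
repaired = unquote(twice)

# The HTML equivalent: escaping an already escaped string turns "&lt;" into "&amp;lt;".
html_twice = html.escape(html.escape("<b>"))
html_repaired = html.unescape(html_twice)

print(once, twice, repaired)      # hello%20world hello%2520world hello%20world
print(html_twice, html_repaired)  # &amp;lt;b&amp;gt; &lt;b&gt;
```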

Quick recipes

Add a user query to a search URL: Percent-encode the query value only.
Display user input inside HTML: HTML-encode the whole string; do not decode before display.
Show raw HTML in docs: Encode <>&"'; optionally encode all non-ASCII for portability.
International domain links: Convert host to Punycode; keep path/query percent-encoded UTF-8.
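
Two of these recipes, sketched in Python under some assumptions: search_url and ascii_link are hypothetical helper names, example.com and bücher.example are placeholder hosts, and the built-in idna codec implements the older IDNA 2003 rules, so a dedicated library may be preferable for newer domain rules.

```python
from urllib.parse import quote, urlencode

def search_url(base: str, query: str) -> str:
    """Append a user query to a search URL, encoding only the parameter value."""
    return f"{base}?{urlencode({'q': query})}"

def ascii_link(host: str, path: str) -> str:
    """Convert the host to Punycode and percent-encode the path as UTF-8."""
    ascii_host = host.encode("idna").decode("ascii")
    return f"https://{ascii_host}{quote(path)}"

print(search_url("https://example.com/search", "C++ & C#"))
# https://example.com/search?q=C%2B%2B+%26+C%23

print(ascii_link("bücher.example", "/käse"))
# https://xn--bcher-kva.example/k%C3%A4se
```

In both helpers the scheme and separators are written literally and only the data parts are encoded, which keeps :// and ? intact.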

Where your tools fit

URL Encoder / Decoder: Percent-encode/decode parameters and full URLs; handle + vs %20.
HTML Entity Encode / Decode: Safely display or restore HTML/text with entities.
Punycode ↔ Unicode: Handle internationalized domain names.
Quoted-Printable Encode/Decode: For email/MIME contexts (not a substitute for URL/HTML encoding).
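
To illustrate why Quoted-Printable is not interchangeable with URL encoding, here is a small comparison using Python's standard quopri module. The sample text is invented; the point is only that the two schemes produce different escapes for the same bytes.

```python
import quopri
from urllib.parse import quote

text = "Café = 100%"

# Quoted-Printable (MIME): bytes are escaped as =XX, and "=" itself must be escaped.
qp = quopri.encodestring(text.encode("utf-8"))
round_trip = quopri.decodestring(qp).decode("utf-8")

# Percent-encoding (URLs): bytes are escaped as %XX, and "%" itself must be escaped.
url_form = quote(text, safe="")

print(qp)
print(url_form)  # Caf%C3%A9%20%3D%20100%25
```

A Quoted-Printable string pasted into a URL, or a percent-encoded string pasted into an email body, will not be decoded by the receiving side.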

Safety reminders

Never decode untrusted input and then render it without re-encoding for the correct context.
Log raw and encoded forms separately if debugging; avoid logging sensitive payloads.
Document which contexts your app expects (URL, HTML, JSON) to prevent accidental misuse.
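
The first reminder can be sketched as follows: a value is decoded from the URL context and then re-encoded for the HTML context before rendering. The function name render_search_heading, the q parameter, and the example.com URL are assumptions made for the example.

```python
import html
from urllib.parse import parse_qs, urlsplit

def render_search_heading(url: str) -> str:
    """Read the q parameter from a URL and render it safely inside HTML."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("q", [""])[0]  # percent-decoded, but still untrusted
    # Re-encode for the new context (HTML text) before it reaches the page.
    return f"<h1>Results for {html.escape(term)}</h1>"

heading = render_search_heading(
    "https://example.com/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E"
)
print(heading)
# <h1>Results for &lt;script&gt;alert(1)&lt;/script&gt;</h1>
```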

Bottom line

Use URL encoding for URLs, HTML entities for HTML, Punycode for IDNs, and avoid double-encoding. Keep contexts straight, and you’ll dodge most breakage and XSS issues.
