Jan 08
URL & HTML Encoding: The Quick Reference You’ll Actually Use
A practical guide to when and how to encode URLs and HTML safely, avoid double-encoding, and stop broken links or XSS issues.
Audience: Frontend/backend engineers, QA, support
Goal: Apply the right encoding in the right context—no broken links, no XSS surprises.
Why encoding matters
Browsers and servers interpret certain characters specially. Encoding prevents misinterpretation: it keeps URLs intact, shows HTML literally, and stops injection bugs. The trick is knowing which encoding to use for which context.
Three common contexts
- URL parameters (percent-encoding): Encode user input before placing it in a query string or path segment. Spaces → %20 (or + in form encoding), & → %26.
- HTML text/attributes (HTML entities): Encode <, >, &, and quotes to prevent markup injection and XSS.
- Transport-safe email bodies/headers (Quoted-Printable, Base64): Used in MIME emails; separate from HTML/URL encoding.
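For a side-by-side view, here is a minimal Python sketch (standard library only) that encodes one made-up string for each of the three contexts. It only shows how the outputs differ. In a real app, each value is encoded once, at the point where it enters its context.

```python
import base64
import html
import quopri
from urllib.parse import quote

sample = 'Café & "friends" <3'  # made-up input with reserved and non-ASCII chars

# URL parameter value: percent-encode the UTF-8 bytes; safe="" also encodes "/".
as_url_value = quote(sample, safe="")

# HTML text or attribute: escape &, <, > and both quote characters.
as_html = html.escape(sample, quote=True)

# MIME body: Quoted-Printable or Base64 over the UTF-8 bytes.
as_qp = quopri.encodestring(sample.encode("utf-8")).decode("ascii")
as_b64 = base64.b64encode(sample.encode("utf-8")).decode("ascii")

for label, value in [("URL", as_url_value), ("HTML", as_html),
                     ("QP", as_qp), ("Base64", as_b64)]:
    print(f"{label:7}{value}")
```

Each output is only safe in its own context. The percent-encoded string, for example, is not escaped for HTML.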
URL encoding essentials
- Follow RFC 3986: encode reserved characters when they appear as data, not separators.
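- Common pitfalls: + vs %20 (in form-encoded query strings + means a space, so a literal + must be sent as %2B); %25 followed by two hex digits, as in %2520, usually signals double-encoding; don't encode : / ? # [ ] @ when they act as URL separators.
- International domains: Use Punycode (IDNA) for non-ASCII hostnames.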
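Here is a minimal sketch of these rules using Python's urllib.parse, with a made-up value and the placeholder host bücher.example. Python's built-in "idna" codec follows the older IDNA 2003 rules. The third-party idna package is the usual choice when IDNA 2008 behavior matters.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

value = "C++ & more"  # made-up parameter value

# Path segment or generic value: space -> %20, literal "+" -> %2B.
path_value = quote(value, safe="")           # C%2B%2B%20%26%20more

# Form encoding (application/x-www-form-urlencoded): space -> "+",
# so a literal "+" must go out as %2B or the server reads it as a space.
form_value = quote_plus(value)               # C%2B%2B+%26+more
query = urlencode({"q": value, "page": 2})   # q=C%2B%2B+%26+more&page=2

# The classic bug: an unencoded "+" decodes to a space.
misread = unquote_plus("C++")                # "C" followed by two spaces

# Non-ASCII host names use IDNA/Punycode, not percent-encoding.
ascii_host = "bücher.example".encode("idna").decode("ascii")  # xn--bcher-kva.example

print(path_value, form_value, query, repr(misread), ascii_host, sep="\n")
```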
HTML entity encoding essentials
- Always encode: <, >, &, ", ' when injecting into HTML text or attributes.
- Named vs numeric entities: Named entities are readable (&copy;); numeric ones work everywhere (&#169; / &#xA9;).
- Decode only when you intend to render the markup; decoding untrusted input can reintroduce XSS.
Common mistakes and fixes
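- Double-encoding: %2520 or &amp;lt; means you encoded twice. Decode once to recover the singly encoded value, then fix the code path so each value is encoded exactly once, at output time.
- Encoding the whole URL when only params need it: Encode parameter values; don't mangle :// unless you are embedding a complete URL as a value inside another URL.
- Using HTML encoding for URLs (or vice versa): Keep contexts separate; use URL encoding for URLs and HTML entities for HTML text/attributes. When one context is nested in another, such as a URL inside an href attribute, apply both in order: percent-encode the URL's values, then HTML-encode the finished URL.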
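Both mistakes are easy to reproduce with Python's standard library. In the sketch below, looks_double_percent_encoded is a hypothetical helper and only a heuristic: a legitimately encoded % followed by hex-like text would also match. The example.com URL is a placeholder.

```python
import html
import re
from urllib.parse import quote, urlencode

# Double-encoding: a second pass re-encodes the "%" or "&" from the first.
once = quote("a b", safe="")                 # a%20b
twice = quote(once, safe="")                 # a%2520b
html_twice = html.escape(html.escape("<"))   # &amp;lt;

def looks_double_percent_encoded(s: str) -> bool:
    """Hypothetical heuristic: "%25" plus two hex digits hints at double-encoding."""
    return re.search(r"%25[0-9A-Fa-f]{2}", s) is not None

# Nested contexts: a URL inside an href needs both encodings, in this order.
# 1) percent-encode the parameter values, 2) HTML-escape the finished URL.
url = "https://example.com/search?" + urlencode({"q": "cats & dogs", "lang": "en"})
link = f'<a href="{html.escape(url)}">search</a>'
print(link)  # <a href="https://example.com/search?q=cats+%26+dogs&amp;lang=en">search</a>
```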
Quick recipes
- Add a user query to a search URL: Percent-encode the query value only.
- Display user input inside HTML: HTML-encode the whole string; do not decode before display.
- Show raw HTML in docs: Encode < > & " '; optionally encode all non-ASCII for portability.
- International domain links: Convert host to Punycode; keep path/query percent-encoded UTF-8.
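The recipes above, sketched as small Python functions. The function names are hypothetical, and the hosts are placeholders. idn_link assumes a URL with no userinfo or port. Treat this as an illustration of the order of operations rather than a hardened URL builder.

```python
import html
from urllib.parse import urlencode, urlsplit, urlunsplit

def search_url(base: str, user_query: str) -> str:
    """Add a user query to a search URL: percent-encode only the value."""
    return f"{base}?{urlencode({'q': user_query})}"

def show_user_text(text: str) -> str:
    """Display user input inside HTML: encode the whole string, never decode first."""
    return f"<p>{html.escape(text)}</p>"

def show_raw_html(snippet: str) -> str:
    """Show raw HTML in docs: escape markup, then turn non-ASCII into &#NNN; references."""
    return html.escape(snippet).encode("ascii", "xmlcharrefreplace").decode("ascii")

def idn_link(url: str) -> str:
    """International domain link: Punycode the host; leave path/query as they are.

    Simplification for this sketch: assumes no userinfo or port in the URL."""
    parts = urlsplit(url)
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    return urlunsplit(parts._replace(netloc=ascii_host))

print(search_url("https://example.com/search", "rock & roll"))
print(show_user_text("<b>hi</b>"))
print(show_raw_html('<p class="x">©</p>'))
print(idn_link("https://bücher.example/katalog?q=%C3%BCber"))
```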
Where your tools fit
- URL Encoder / Decoder: Percent-encode/decode parameters and full URLs; handle + vs %20.
- HTML Entity Encode / Decode: Safely display or restore HTML/text with entities.
- Punycode ↔ Unicode: Handle internationalized domain names.
- Quoted-Printable Encode/Decode: For email/MIME contexts (not a substitute for URL/HTML encoding).
Safety reminders
- Never decode untrusted input and then render it without re-encoding for the correct context.
- Log raw and encoded forms separately if debugging; avoid logging sensitive payloads.
- Document which contexts your app expects (URL, HTML, JSON) to prevent accidental misuse.
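Here is the first reminder in code, as a minimal Python sketch with a made-up query string. parse_qs percent-decodes the value and turns it back into live markup, so the value has to be HTML-encoded again before it goes into a page.

```python
import html
from urllib.parse import parse_qs

# A query string as it might arrive from a client (made-up example).
raw_query = "name=%3Cscript%3Ealert(1)%3C%2Fscript%3E"

# parse_qs percent-decodes values, so this is now a real <script> tag.
name = parse_qs(raw_query)["name"][0]

# Wrong: rendering decoded input directly.
unsafe_html = f"<p>Hello, {name}</p>"

# Right: re-encode for the context being written into (HTML text here).
safe_html = f"<p>Hello, {html.escape(name)}</p>"
print(safe_html)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```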
Bottom line
Use URL encoding for URLs, HTML entities for HTML, Punycode for IDNs, and avoid double-encoding. Keep contexts straight, and you’ll dodge most breakage and XSS issues.