URL Encoding: A Complete Guide to Percent Encoding
URL encoding is one of those things that seems obvious until it isn't — until you have a query parameter with an ampersand in it, or you're trying to pass a URL inside another URL, or you're wondering why your API is receiving "hello+world" instead of "hello world". This guide covers percent encoding from first principles through every practical scenario, including Python and PHP, form encoding, double-encoding pitfalls, IDN, and Punycode.
Why URLs Need Encoding
The URL specification (RFC 3986) defines URLs as sequences of ASCII characters. URLs can only contain characters from a specific allowed set. Every other character must be encoded.
This isn't arbitrary — URLs are transmitted in HTTP headers, logged in server access logs, stored in databases, displayed in browsers, and emailed as text. They have to survive all of these contexts reliably. A space character in a URL is ambiguous — is it part of the URL, or is it the end of the URL? Different parsers made different choices historically, leading to broken links and security vulnerabilities.
Percent encoding solves this by converting any problematic character into a universally safe representation: the percent sign followed by two hexadecimal digits.
What Breaks Without Encoding
// Problem: space in URL
https://example.com/search?q=hello world
// Different parsers treat this differently — truncate at space, error, or decode inconsistently
// Problem: ampersand in value
https://example.com/search?q=cats & dogs&lang=en
// The & in "cats & dogs" is parsed as a parameter separator
// Server receives: q="cats ", a stray parameter named " dogs" with no value, and lang="en"
// Problem: non-ASCII characters
https://example.com/search?q=日本語
// May work in modern browsers (which auto-encode), but many HTTP clients and servers reject this
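To see what a server-side parser can make of the unencoded ampersand, here is a small sketch using Python's standard `urllib.parse.parse_qsl`. The query strings are the ones from the examples in this section; other frameworks may treat the stray piece differently (many drop it silently).

```python
from urllib.parse import parse_qsl

# Unencoded value: the "&" inside "cats & dogs" splits the parameter in two.
broken = parse_qsl("q=cats & dogs&lang=en", keep_blank_values=True)
print(broken)  # [('q', 'cats '), (' dogs', ''), ('lang', 'en')]

# Properly encoded value: the parser sees a single q parameter.
fixed = parse_qsl("q=cats%20%26%20dogs&lang=en")
print(fixed)   # [('q', 'cats & dogs'), ('lang', 'en')]
```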
With Proper Encoding
// Safe in all contexts
https://example.com/search?q=hello%20world
https://example.com/search?q=cats%20%26%20dogs&lang=en
https://example.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E
How Percent Encoding Works
Percent encoding follows a simple three-step algorithm:
- Convert the character to its UTF-8 byte sequence
- Convert each byte to two uppercase hexadecimal digits
- Prefix each byte representation with %
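The three steps can be sketched in a few lines of Python. This is an illustrative helper, not a replacement for a library function: the name `percent_encode` is made up for this example, and it encodes everything except the RFC 3986 unreserved characters, which should match `urllib.parse.quote(text, safe="")` on Python 3.7 or later.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: never need encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode everything except unreserved characters (hypothetical helper)."""
    pieces = []
    for char in text:
        if char in UNRESERVED:
            pieces.append(char)
            continue
        # Step 1: character -> UTF-8 byte sequence
        for byte in char.encode("utf-8"):
            # Steps 2 and 3: byte -> "%" followed by two uppercase hex digits
            pieces.append(f"%{byte:02X}")
    return "".join(pieces)

print(percent_encode("cats & dogs"))  # cats%20%26%20dogs
print(percent_encode("é"))            # %C3%A9
print(percent_encode("🎉"))           # %F0%9F%8E%89

# Cross-check against the standard library
print(percent_encode("日本語 a/b?") == quote("日本語 a/b?", safe=""))  # True
```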
Examples
| Character | UTF-8 (hex) | Percent Encoded | Common In |
|---|---|---|---|
| Space | 0x20 | %20 | Any text in URLs |
| ! | 0x21 | %21 | Exclamation marks |
| & | 0x26 | %26 | Query string values |
| = | 0x3D | %3D | Key-value separators in values |
| ? | 0x3F | %3F | Query markers in values |
| # | 0x23 | %23 | Fragment markers in values |
| / | 0x2F | %2F | Path separators in values |
| é | 0xC3 0xA9 | %C3%A9 | Accented characters |
| 中 | 0xE4 0xB8 0xAD | %E4%B8%AD | CJK characters |
| 🎉 | 0xF0 0x9F 0x8E 0x89 | %F0%9F%8E%89 | Emoji |
Notice that multi-byte UTF-8 characters produce multiple percent-encoded sequences. The emoji 🎉 takes 4 bytes in UTF-8, producing 4 percent-encoded groups. This is why encodeURIComponent("🎉") returns a string 12 characters long.
Case Sensitivity
Percent encoding is case-insensitive: %2F and %2f are equivalent. RFC 3986 recommends uppercase, and most encoding functions produce it; conforming parsers accept both.
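If you compare or cache URLs as strings, it can help to normalize the hex digits first. The sketch below assumes a helper named `uppercase_escapes` (a made-up name) that uppercases every `%XX` escape and leaves the rest of the URL untouched.

```python
import re
from urllib.parse import unquote

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def uppercase_escapes(url: str) -> str:
    """Uppercase the hex digits of each percent escape (hypothetical helper)."""
    return _ESCAPE.sub(lambda match: match.group(0).upper(), url)

# Both spellings decode to the same character.
print(unquote("%2f") == unquote("%2F"))          # True
print(uppercase_escapes("/a%2fb?q=%c3%a9"))      # /a%2Fb?q=%C3%A9
```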
Reserved vs Unreserved Characters
RFC 3986 divides URL characters into categories that determine when encoding is needed.
Unreserved Characters — Never Need Encoding
A-Z a-z 0-9 - _ . ~
These 66 characters are always safe in any part of a URL. You'll never need to encode them, and if you do encode them (e.g., %41 for "A"), it's technically equivalent but unnecessary.
Reserved Characters — Context Dependent
These characters have specific meaning in URL structure. Encode them when they appear in values, not when they're part of the URL structure:
| Character | Role in URL | Encoded As |
|---|---|---|
| : | Scheme separator (https:), port separator (example.com:8080) | %3A |
| / | Path segment separator | %2F |
| ? | Marks start of query string | %3F |
| # | Marks start of fragment | %23 |
| [ ] | IPv6 address delimiters | %5B %5D |
| @ | User info separator | %40 |
| ! | Sub-delimiter | %21 |
| $ | Sub-delimiter | %24 |
| & | Query parameter separator | %26 |
| ' | Sub-delimiter | %27 |
| ( ) | Sub-delimiters | %28 %29 |
| * | Sub-delimiter | %2A |
| + | Sub-delimiter (also means space in form encoding) | %2B |
| , | Sub-delimiter | %2C |
| ; | Sub-delimiter | %3B |
| = | Key-value separator in query strings | %3D |
Context Is Everything
// & as a query separator — do NOT encode
https://example.com/search?q=cats&lang=en
// & as a value character — MUST encode
https://example.com/search?company=Johnson%26Johnson
// / as a path separator — do NOT encode
https://example.com/users/123/profile
// / as a literal in a value — MUST encode
https://example.com/redirect?to=%2Fusers%2F123
// ? as query start — do NOT encode
https://example.com/page?id=5
// ? as a character in a value — MUST encode
https://example.com/search?q=what%3F
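One way to keep structure and data apart is to add the structural characters yourself and encode every piece of data before it is joined in. The following Python sketch assumes a hypothetical `build_url` helper; the origin, segments, and parameter names are illustrative only.

```python
from urllib.parse import quote, urlencode

def build_url(origin, path_segments, params):
    """Join pre-split pieces into a URL, encoding data but not structure (hypothetical helper)."""
    # The "/" between segments is structure; a "/" inside a segment is data,
    # so each segment is encoded with safe="" before joining.
    path = "/".join(quote(segment, safe="") for segment in path_segments)
    # urlencode adds the structural "&" and "="; quote_via=quote gives %20 for spaces.
    query = urlencode(params, quote_via=quote)
    return f"{origin}/{path}?{query}"

url = build_url(
    "https://example.com",
    ["files", "2024/Q1 report"],
    {"company": "Johnson&Johnson", "q": "what?"},
)
print(url)
# https://example.com/files/2024%2FQ1%20report?company=Johnson%26Johnson&q=what%3F
```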
Encoding in JavaScript
encodeURI() — For Complete URLs
Use when you have a complete URL that may contain spaces or non-ASCII characters, but you want to preserve the URL structure:
const url = "https://example.com/path with spaces?q=hello world";
encodeURI(url);
// "https://example.com/path%20with%20spaces?q=hello%20world"
// Characters NOT encoded by encodeURI:
// A-Z a-z 0-9 - _ . ! ~ * ' ( ) ; , / ? : @ & = + $ #
// Note: a literal % IS encoded (to %25), so encodeURI double-encodes an already-encoded URL
encodeURIComponent() — For Values
Use for individual components — query parameter values, path segments that may contain special characters. It encodes everything except A-Z a-z 0-9 - _ . ! ~ * ' ( ):
encodeURIComponent("cats & dogs");
// "cats%20%26%20dogs"
encodeURIComponent("hello?world");
// "hello%3Fworld"
encodeURIComponent("price < 100");
// "price%20%3C%20100"
encodeURIComponent("user@example.com");
// "user%40example.com"
The Practical Rule
// Use encodeURIComponent for any user-provided value in a URL
const search = userInput; // Could be anything
const url = `https://example.com/search?q=${encodeURIComponent(search)}`;
// Use encodeURI only if you have a whole URL that just needs spaces fixed
const messy = "https://example.com/my files/doc.pdf";
encodeURI(messy); // Fixes the space, preserves URL structure
// Wrong: using encodeURI for a value — & won't get encoded
const query = "cats & dogs";
"?q=" + encodeURI(query); // "?q=cats%20&%20dogs" — & breaks the query string!
// Right: encodeURIComponent for values
"?q=" + encodeURIComponent(query); // "?q=cats%20%26%20dogs" ✓
Comparison Table
| Character | encodeURI | encodeURIComponent |
|---|---|---|
| Space | %20 | %20 |
| ? | ? (unchanged) | %3F |
| & | & (unchanged) | %26 |
| = | = (unchanged) | %3D |
| / | / (unchanged) | %2F |
| # | # (unchanged) | %23 |
| @ | @ (unchanged) | %40 |
| é | %C3%A9 | %C3%A9 |
URLSearchParams — The Modern Way
URLSearchParams handles encoding automatically and is available in all modern browsers and Node.js 10+:
// Build query string
const params = new URLSearchParams({
  q: "cats & dogs",
  lang: "en",
  page: 2
});
params.toString();
// "q=cats+%26+dogs&lang=en&page=2"
// Note: URLSearchParams uses + for spaces, not %20
// Append to URL
const url = new URL("https://example.com/search");
url.search = params.toString();
url.href;
// "https://example.com/search?q=cats+%26+dogs&lang=en&page=2"
// Read and decode parameters
const qs = new URLSearchParams("q=cats+%26+dogs&lang=en");
qs.get("q"); // "cats & dogs" — automatically decoded
// Append individual parameters
const p = new URLSearchParams();
p.append("tag", "javascript");
p.append("tag", "web"); // Multiple values for same key
p.toString(); // "tag=javascript&tag=web"
Note that URLSearchParams uses + for spaces (form encoding), not %20. Both are valid in query strings, but if you need %20 specifically, use encodeURIComponent manually.
Encoding in Python
urllib.parse
from urllib.parse import quote, quote_plus, urlencode, urljoin
# quote() — similar to encodeURIComponent (but safe='/' by default)
quote("cats & dogs")
# "cats%20%26%20dogs"
quote("hello/world")
# "hello/world" — / is safe by default!
quote("hello/world", safe='') # No safe characters
# "hello%2Fworld" — now / is encoded
# quote_plus() — form encoding (spaces → +)
quote_plus("cats & dogs")
# "cats+%26+dogs"
# urlencode() — build query strings from a dict
urlencode({"q": "cats & dogs", "lang": "en"})
# "q=cats+%26+dogs&lang=en"
urlencode({"q": "cats & dogs"}, quote_via=quote) # Use %20 instead of +
# "q=cats%20%26%20dogs"
# Multiple values for same key
urlencode([("tag", "python"), ("tag", "web")])
# "tag=python&tag=web"
# Build a full URL
from urllib.parse import urlunparse, urlencode, ParseResult
base = ParseResult(scheme='https', netloc='example.com', path='/search',
                   params='', query=urlencode({'q': 'test'}), fragment='')
base.geturl()
# "https://example.com/search?q=test"
# Parse a URL
from urllib.parse import urlparse, parse_qs
parsed = urlparse("https://example.com/search?q=cats+%26+dogs&lang=en")
parsed.scheme # "https"
parsed.netloc # "example.com"
parsed.path # "/search"
parsed.query # "q=cats+%26+dogs&lang=en"
params = parse_qs(parsed.query)
params["q"] # ["cats & dogs"] — decoded automatically
params["lang"] # ["en"]
Encoding in PHP
<?php
// urlencode() — form encoding (spaces → +)
urlencode("cats & dogs"); // "cats+%26+dogs"
// rawurlencode() — RFC 3986 (spaces → %20)
rawurlencode("cats & dogs"); // "cats%20%26%20dogs"
// http_build_query() — build query strings
$params = ['q' => 'cats & dogs', 'lang' => 'en', 'page' => 2];
http_build_query($params); // "q=cats+%26+dogs&lang=en&page=2"
// With separator and encoding type
http_build_query($params, '', '&', PHP_QUERY_RFC3986);
// "q=cats%20%26%20dogs&lang=en&page=2"
// Decoding
urldecode("cats+%26+dogs"); // "cats & dogs"
rawurldecode("cats%20%26%20dogs"); // "cats & dogs"
// Parse a query string
parse_str("q=cats+%26+dogs&lang=en", $output);
$output['q']; // "cats & dogs"
$output['lang']; // "en"
// Full URL building
$base = "https://example.com/search";
$query = http_build_query(['q' => 'hello world', 'n' => 10]);
$url = $base . '?' . $query;
// "https://example.com/search?q=hello+world&n=10"
?>
PHP's urlencode() is form encoding (+ for spaces). rawurlencode() follows RFC 3986 (%20 for spaces). For query parameters in web apps, either works — just be consistent on the encoding and decoding side.
Form Encoding (application/x-www-form-urlencoded)
HTML forms have their own encoding format, specified as application/x-www-form-urlencoded. It's similar to URL encoding but with one key difference: spaces are encoded as + rather than %20.
This was an older convention (predates RFC 3986) and it's baked deeply into the web. When an HTML form submits via GET, the query string uses form encoding. When it submits via POST with the default content type, the body uses form encoding.
Form Encoding vs URL Encoding
| Character | URL Encoding (RFC 3986) | Form Encoding (HTML) |
|---|---|---|
| Space | %20 | + (or %20) |
| + | %2B | %2B |
| & | %26 | %26 |
| = | %3D | %3D |
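The same split exists in Python's standard library, which makes the difference easy to check. This sketch uses only `urllib.parse` functions; the sample strings are arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "C++ & Go"

# RFC 3986 style: space -> %20, literal plus -> %2B
rfc_encoded = quote(value, safe="")
print(rfc_encoded)    # C%2B%2B%20%26%20Go

# Form style: space -> +, literal plus -> %2B
form_encoded = quote_plus(value)
print(form_encoded)   # C%2B%2B+%26+Go

# Decoding must match the encoding: unquote leaves + alone, unquote_plus turns it into a space.
print(unquote("John+Doe"))       # John+Doe
print(unquote_plus("John+Doe"))  # John Doe
```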
Why This Matters
// Form encoding: space → +
// If you use decodeURIComponent on form data, + won't be decoded as space
const formData = "name=John+Doe&city=New+York";
const params = new URLSearchParams(formData);
params.get("name"); // "John Doe" — URLSearchParams handles + correctly
// If you manually parse:
decodeURIComponent("John+Doe"); // "John+Doe" — wrong! + stays as +
decodeURIComponent("John%20Doe"); // "John Doe" — correct
// Correct way to handle + in form data:
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, '%20'));
}
Submitting Forms via JavaScript
// URLSearchParams automatically uses form encoding
const formData = new URLSearchParams({
  name: "John Doe",
  city: "New York"
});
// POST with form encoding content type
fetch('/api/submit', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: formData.toString()
  // body: "name=John+Doe&city=New+York"
});
// Using FormData (sends as multipart/form-data instead)
const fd = new FormData();
fd.append('name', 'John Doe');
fd.append('file', fileInput.files[0]);
fetch('/api/upload', { method: 'POST', body: fd });
// Content-Type is automatically set to multipart/form-data with boundary
Common Mistakes
Encoding the Entire URL
// Wrong: encoding a complete URL with encodeURIComponent
const url = "https://example.com/path?q=test";
encodeURIComponent(url);
// "https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dtest" — unusable as a URL
// Right: only encode values, not the URL structure
const base = "https://example.com/search?q=";
base + encodeURIComponent("test search"); // ✓
Using encodeURI for Parameter Values
// Wrong: encodeURI doesn't encode & so it breaks the query string
const query = "Tom & Jerry";
"/search?q=" + encodeURI(query);
// "/search?q=Tom%20&%20Jerry" — the & is interpreted as param separator!
// Server receives: q="Tom " plus a second parameter named " Jerry" with an empty value
// Right: encodeURIComponent encodes &
"/search?q=" + encodeURIComponent(query);
// "/search?q=Tom%20%26%20Jerry" ✓
Not Encoding + Characters
// User input: "C++ programming"
// Wrong: + isn't encoded
"?q=C++ programming";
// Server decodes each + as a space: q="C   programming", and the plus signs are lost
// Right: encode the +
"?q=" + encodeURIComponent("C++ programming");
// "?q=C%2B%2B%20programming" ✓
// Server decodes: q="C++ programming" ✓
Mixing Encoding Contexts
// Wrong: manually encoding some parts, using URLSearchParams for others
const params = new URLSearchParams();
params.set("q", encodeURIComponent("cats & dogs")); // Double encoding!
// q → "cats%20%26%20dogs" (from encodeURIComponent)
// URLSearchParams encodes it again: "q=cats%2520%2526%2520dogs"
// Right: let URLSearchParams handle the encoding
const fixedParams = new URLSearchParams();
fixedParams.set("q", "cats & dogs"); // Raw value
fixedParams.toString(); // "q=cats+%26+dogs" ✓
Double Encoding
Double encoding happens when an already-encoded string gets encoded again. It's one of the most common URL-related bugs, and it's subtle because the resulting URL looks superficially correct.
How It Happens
// Step 1: Correctly encode a value
const encoded = encodeURIComponent("hello world");
// "hello%20world"
// Step 2 (bug): Encode the already-encoded string
encodeURIComponent(encoded);
// "hello%2520world" — % became %25
// The % in %20 got encoded to %25, creating %2520
// Server decodes %2520 as %20, not as a space
// User sees: "hello%20world" instead of "hello world"
Detecting Double Encoding
Look for %25 in a URL — that's an encoded percent sign, a classic sign of double encoding. %2520 is double-encoded space, %252F is double-encoded slash.
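That check is easy to automate as a rough heuristic. The sketch below assumes a hypothetical `looks_double_encoded` helper that flags `%25` followed by two hex digits. It can report a false positive when the original text really contained something like the literal characters "%20", so treat a match as a hint to investigate, not proof.

```python
import re
from urllib.parse import quote

# "%25" followed by two hex digits: an escape whose "%" was itself escaped.
_DOUBLE_ESCAPE = re.compile(r"%25[0-9A-Fa-f]{2}")

def looks_double_encoded(url: str) -> bool:
    """Heuristic check for double encoding (hypothetical helper)."""
    return _DOUBLE_ESCAPE.search(url) is not None

once = quote("hello world", safe="")   # hello%20world
twice = quote(once, safe="")           # hello%2520world

print(looks_double_encoded(once))               # False
print(looks_double_encoded(twice))              # True
print(looks_double_encoded("discount=100%25"))  # False: a legitimately encoded percent sign
```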
Fixing Double Encoding
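// Option 1: Encode only raw (unencoded) strings, in the one place where the URL is assembled.
// If a value's state is genuinely unknown, this helper normalizes it, with a caveat:
// raw text that happens to contain an escape (the literal characters "%20") is decoded first
function safeEncode(value) {
  try {
    // Decode first in case it's already encoded, then encode
    return encodeURIComponent(decodeURIComponent(value));
  } catch {
    // decodeURIComponent throws URIError on a stray % (e.g. "100%"), so treat the value as raw
    return encodeURIComponent(value);
  }
}
// Option 2: Check if encoding is needed (a heuristic: also true for raw text containing %XX)
function isEncoded(str) {
  try {
    return str !== decodeURIComponent(str);
  } catch {
    return false; // Invalid encoding, so treat the value as raw
  }
}
const value = "hello%20world"; // Already encoded
const safe = isEncoded(value) ? value : encodeURIComponent(value);
// safe = "hello%20world" (not double-encoded)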
URL Rewrites and Proxies
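Double encoding often occurs in URL rewrite rules (Apache, nginx) or reverse proxies. If a rewrite rule or proxy re-escapes a URL that is already percent-encoded, every % becomes %25 and %20 arrives at the backend as %2520. The fix is usually to stop that layer from re-encoding (in Apache mod_rewrite, for example, the NE flag suppresses escaping of the rewritten URL) so the URL is passed upstream unchanged, or to make sure each layer decodes exactly once.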
Unicode, IDN, and Punycode
International Domain Names (IDN)
Domain names were originally restricted to ASCII. Internationalized Domain Names (IDN) extend this to include non-Latin scripts — Arabic, Chinese, Russian, Hebrew, etc. However, the underlying DNS system still requires ASCII, so IDN uses Punycode encoding.
Punycode
Punycode converts Unicode domain labels to an ASCII-compatible encoding. The encoding is prefixed with xn-- to mark it as Punycode.
münchen.de → xn--mnchen-3ya.de
日本語.jp → xn--wgv71a119e.jp
españa.com → xn--espaa-rta.com
россия.рф → xn--h1alffa9f.xn--p1ai
Modern browsers display the Unicode form in the address bar while sending the Punycode form in the actual HTTP request. If you're working with domain names programmatically, use a dedicated IDN library rather than implementing Punycode yourself.
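For quick experiments, Python ships an `idna` codec in the standard library. Note the assumption: the built-in codec implements the older IDNA 2003 rules, so for production handling of arbitrary domains a dedicated library is still the safer choice. The helper names `to_ascii` and `to_unicode` are made up for this sketch.

```python
def to_ascii(hostname: str) -> str:
    """Unicode hostname -> Punycode (ASCII) form, via the built-in idna codec."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname: str) -> str:
    """Punycode (ASCII) hostname -> Unicode form, via the built-in idna codec."""
    return hostname.encode("ascii").decode("idna")

print(to_ascii("münchen.de"))           # xn--mnchen-3ya.de
print(to_ascii("españa.com"))           # xn--espaa-rta.com
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```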
IDN in JavaScript
// Modern approach: use URL API — it handles IDN automatically
const url = new URL("https://münchen.de/path");
url.hostname; // "xn--mnchen-3ya.de" — Punycode
url.href; // "https://xn--mnchen-3ya.de/path"
// For just the domain conversion:
const idn = new URL("https://日本語.jp");
idn.hostname; // "xn--wgv71a119e.jp"
Non-ASCII in URL Paths and Queries
Non-ASCII characters in paths and queries (not domains) use UTF-8 percent encoding, not Punycode:
encodeURIComponent("日本語") // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent("Ñoño") // "%C3%91o%C3%B1o"
encodeURIComponent("مرحبا") // "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7"
Browsers will often display the decoded Unicode in the address bar even though the actual request uses percent encoding. This is a display convenience, not a change to the actual encoding.
The Punycode Converter Tool
The URL Encoder at ToolsDock handles both standard percent encoding and Punycode conversion for domain names.
Building Query Strings
The Safe Pattern
// JavaScript: use URLSearchParams
function buildURL(base, params) {
  const url = new URL(base);
  Object.entries(params).forEach(([key, value]) => {
    if (Array.isArray(value)) {
      value.forEach(v => url.searchParams.append(key, v));
    } else {
      url.searchParams.set(key, value);
    }
  });
  return url.toString();
}
buildURL("https://example.com/search", {
  q: "cats & dogs",
  tag: ["javascript", "web"],
  page: 2
});
// "https://example.com/search?q=cats+%26+dogs&tag=javascript&tag=web&page=2"
Embedding a URL in Another URL
This comes up in authentication flows (redirect_uri), link shorteners, and analytics tracking. The inner URL must be fully encoded so its special characters don't affect the outer URL's structure:
// Redirect after login
const returnUrl = "https://myapp.com/dashboard?tab=analytics&timeRange=7d";
const loginUrl = "https://auth.example.com/login?redirect_uri="
+ encodeURIComponent(returnUrl);
// loginUrl = "https://auth.example.com/login?redirect_uri=https%3A%2F%2Fmyapp.com%2Fdashboard%3Ftab%3Danalytics%26timeRange%3D7d"
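// Reading it on the server: most frameworks (Express's req.query, for example)
// have already decoded query values once, so don't call decodeURIComponent again
const redirectTarget = req.query.redirect_uri;
// "https://myapp.com/dashboard?tab=analytics&timeRange=7d"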
// Security note: always validate the redirect_uri against an allowlist
// to prevent open redirect attacks
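The allowlist check mentioned in the security note might look like the following Python sketch. Everything here is illustrative: `ALLOWED_REDIRECT_HOSTS` and `is_allowed_redirect` are hypothetical names, and a real application may need to compare full origins or registered callback URLs, not just hostnames.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical allowlist of hosts this app is willing to redirect to.
ALLOWED_REDIRECT_HOSTS = {"myapp.com"}

def is_allowed_redirect(target: str) -> bool:
    """Accept only https URLs whose host is on the allowlist (hypothetical helper)."""
    parts = urlsplit(target)
    return parts.scheme == "https" and parts.hostname in ALLOWED_REDIRECT_HOSTS

return_url = "https://myapp.com/dashboard?tab=analytics&timeRange=7d"
login_url = "https://auth.example.com/login?redirect_uri=" + quote(return_url, safe="")

# parse_qs decodes the value exactly once, which recovers the inner URL intact.
received = parse_qs(urlsplit(login_url).query)["redirect_uri"][0]

print(received == return_url)                                  # True
print(is_allowed_redirect(received))                           # True
print(is_allowed_redirect("https://myapp.com@evil.example/"))  # False: the real host is evil.example
```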
Query String Limits
HTTP/1.1 doesn't specify a URL length limit, but in practice:
- Most web servers: about 8 KB for the request line (nginx's default header buffer is 8k; Apache's LimitRequestLine defaults to 8,190 bytes)
- Internet Explorer (historical): 2,048 bytes (long dead, but some proxies still have old limits)
- Chrome: ~2 MB
- The practical safe limit: 2,000 characters
If your query string might exceed a few hundred characters, consider using POST with a JSON body instead.
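A simple guard can make that decision at request-building time. This Python sketch assumes a hypothetical `choose_method` helper and uses the 2,000-character figure from the list above as its default threshold; adjust it to whatever your servers and proxies actually accept.

```python
from urllib.parse import urlencode

SAFE_URL_LENGTH = 2000  # practical limit from the list above; tune per deployment

def choose_method(base_url, params, limit=SAFE_URL_LENGTH):
    """Return ("GET", full_url) if the URL fits, else ("POST", base_url) (hypothetical helper)."""
    url = f"{base_url}?{urlencode(params)}"
    if len(url) <= limit:
        return ("GET", url)
    # Too long for a query string: the caller should send params in the request body.
    return ("POST", base_url)

print(choose_method("https://example.com/search", {"q": "cats & dogs"}))
# ('GET', 'https://example.com/search?q=cats+%26+dogs')

print(choose_method("https://example.com/search", {"q": "x" * 3000}))
# ('POST', 'https://example.com/search')
```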
Debugging URL Issues
Browser Developer Tools
The Network tab in browser DevTools shows you the actual URL sent in each request, including headers. This is the ground truth — if the URL looks wrong here, that's what the server receives.
- Right-click a request → Copy → Copy as cURL to get the exact request as a command
- Check the Headers panel to see raw vs decoded URL values
- The Payload tab (Chrome) shows query parameters decoded
Decoding for Inspection
// JavaScript
decodeURIComponent("hello%20world%26foo%3Dbar");
// "hello world&foo=bar"
// Command line
python3 -c "from urllib.parse import unquote; print(unquote('hello%20world%26foo%3Dbar'))"
# hello world&foo=bar
# Or use the URL Decoder tool at ToolsDock
Common Debugging Checks
- See %25 in your URL? Double encoding: a % got encoded to %25
- Parameter disappears? An unencoded & in a value is being treated as a separator
- Plus signs turned into spaces? The server decoded + as space (form encoding), because you sent literal + signs without encoding them as %2B
- Wrong characters? Check for an encoding mismatch: encoding in one charset, decoding in another
Quick Reference
Use encodeURIComponent when:
- Encoding a query parameter value
- Encoding a path segment with special characters
- Embedding one URL inside another URL
- Any user input that goes into a URL
- Building API request URLs with dynamic values
Use URLSearchParams when:
- Building query strings in JavaScript
- Parsing incoming query strings
- Form submission via JavaScript
- Multiple values for the same key
- Reading/writing URL parameters on the URL object