HTML URL Encoding

What is URL Encoding

RFC 3986 specifies that the characters in a URL are limited to a set of reserved characters and a set of unreserved characters, all drawn from US-ASCII. No other characters are allowed in URLs. However, URLs often need to carry characters outside this allowed set, so those characters must be converted into a valid US-ASCII form for global interoperability.

URL encoding (also called percent-encoding) is the process of encoding the information in a URL so that it can be transmitted reliably over the internet.

 

A two-step process is used to map the wide range of characters in use around the world into this form:

  • First, the data is encoded as a sequence of bytes using UTF-8.
  • Then only the bytes that do not correspond to characters in the unreserved set are encoded as %HH, where HH is the byte's value in hexadecimal.

For instance, the string François would be written as Fran%C3%A7ois.

Ç, ç (c-cedilla) is a Latin-script letter. In UTF-8 the lowercase ç is encoded as the two bytes C3 A7, which is why it appears as %C3%A7.
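
The two steps above can be sketched in Python as follows. This is a minimal illustration, not a full RFC 3986 implementation: the helper name percent_encode and the UNRESERVED constant are made up for this example, and the last line compares the result with the standard library's urllib.parse.quote.

```python
from urllib.parse import quote

# Byte values of the unreserved characters (letters, digits, - . _ ~).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every byte that is not unreserved."""
    pieces = []
    for byte in text.encode("utf-8"):        # step 1: UTF-8 bytes
        if byte in UNRESERVED:
            pieces.append(chr(byte))         # unreserved bytes are kept as-is
        else:
            pieces.append(f"%{byte:02X}")    # step 2: %HH in hexadecimal
    return "".join(pieces)

print(percent_encode("François"))            # Fran%C3%A7ois
print(quote("François", safe=""))            # same result from the standard library
```

In everyday code the standard library function is usually the better choice; the hand-written loop is only there to make the two steps visible.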


Reserved Characters

Some characters are reserved because the general URL syntax, or a specific URL scheme, may use them as separators. For instance, the forward slash / is used to separate different parts of a URL.

If the data in a URL component contains a character that conflicts with a reserved character defined as a separator in the URL scheme, the conflicting character must be percent-encoded before the URL is formed. The reserved characters and their percent-encoded forms are:

!   #   $   &   '   (   )   *   +   ,   /   :   ;   =   ?   @   [   ]
%21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
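
As a rough illustration of why reserved characters inside data must be encoded, the Python sketch below encodes a query value that itself contains &, = and ?. The helper name encode_component, the sample value, and the example.com address are assumptions made for this example only.

```python
from urllib.parse import quote

RESERVED = "!#$&'()*+,/:;=?@[]"

def encode_component(value: str) -> str:
    """Hypothetical helper: encode a value so no reserved character survives."""
    # safe="" tells quote() to leave only unreserved characters untouched.
    return quote(value, safe="")

# Rebuild the table above: each reserved character and its %HH form.
table = {ch: encode_component(ch) for ch in RESERVED}

# A value containing separators must be encoded before it joins the URL,
# otherwise "&" and "=" would be read as query delimiters.
value = "rock&roll=fun?"
url = "https://example.com/search?q=" + encode_component(value)
print(url)  # https://example.com/search?q=rock%26roll%3Dfun%3F
```

Only the data inside a component is encoded here; the ? and = that structure the URL itself are left as they are.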

Unreserved Characters

Unreserved characters are characters that are allowed in URLs but have no reserved purpose, so they never need to be encoded. They include uppercase letters, lowercase letters, decimal digits, hyphens, periods, underscores, and tildes. Here is a table of all the unreserved characters in URLs:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~  
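
A short Python check, offered as a sketch, shows that the unreserved characters pass through encoding unchanged and that decoding reverses the process. The variable names are chosen for this example; quote and unquote come from the standard urllib.parse module.

```python
import string
from urllib.parse import quote, unquote

# The unreserved set from the table above.
UNRESERVED = string.ascii_letters + string.digits + "-_.~"

# Encoding leaves every unreserved character exactly as it was.
unchanged = quote(UNRESERVED, safe="") == UNRESERVED

# Decoding turns %HH sequences back into UTF-8 bytes, then into text.
decoded = unquote("Fran%C3%A7ois")

# A percent-encoded unreserved character decodes to the plain character.
tilde = unquote("%7Euser")

print(unchanged, decoded, tilde)  # True François ~user
```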
