Padding in base64/65

Tags: Base64, Encoding, Padding
Published: May 1, 2024

Encoding             Character set
base64/65            0-9, a-z, A-Z, +, / (= as padding)
base58               0-9, a-z, A-Z excluding I, l, O, 0
base52               a-z, A-Z
base32 (Crockford)   0-9, A-Z excluding I, L, O, U
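
The table lists which characters each encoding uses, but not their order. As a small sketch, the standard Base64 alphabet can be built in its index order (A-Z, a-z, 0-9, then + and /) from Python's `string` constants. The names `B64_ALPHABET` and `PAD` are illustrative choices, not part of any library.

```python
import string

# Standard Base64 alphabet in index order: A-Z, a-z, 0-9, then "+" and "/".
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # the 65th character; it never encodes data

print(len(B64_ALPHABET))      # 64
print(B64_ALPHABET[16])       # Q
print(PAD in B64_ALPHABET)    # False
```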

TL;DR:

The "=" character is used in Base64 as a padding character, not as a regular encoding character. Padding is needed because Base64 converts every 3 bytes of binary data into 4 Base64 characters. If the length of the data is not a multiple of 3 bytes, "=" is appended so that the encoded length becomes divisible by 4. Strictly speaking, Base64 therefore uses 65 characters in total, but only 64 of them are used for encoding.
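
The 3-bytes-to-4-characters rule means the padded output length can be predicted from the input length alone. The sketch below checks that idea against Python's standard `base64` module; the helper name `encoded_length` is hypothetical.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    # ceil(n_bytes / 3) groups, each producing 4 characters.
    return 4 * ((n_bytes + 2) // 3)

for n in range(7):
    encoded = base64.b64encode(b"x" * n).decode()
    assert len(encoded) == encoded_length(n)
    print(n, encoded_length(n), encoded)
```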

Base64 Encoding Process

  1. Convert to Binary: The first step in Base64 encoding is to convert the input data into a binary stream.
  2. Divide into 6-bit Blocks: The binary stream is then divided into blocks of 6 bits. Each 6-bit block corresponds to a single Base64 character, because 2^6 equals 64, which is the number of characters available in the Base64 alphabet.
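
The two steps above can be sketched in a few lines of Python. The function name `six_bit_groups` is made up for illustration, and the sketch assumes the final group is filled with zero bits when the stream does not divide evenly by 6.

```python
def six_bit_groups(data: bytes) -> list[str]:
    """Return the input as 6-bit groups, written as strings of 0s and 1s."""
    # Step 1: turn every byte into its 8-bit binary form and join them.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the tail so the final group is a full 6 bits.
    bits += "0" * (-len(bits) % 6)
    # Step 2: cut the stream into 6-bit blocks.
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]

print(six_bit_groups(b"Cat"))  # ['010000', '110110', '000101', '110100']
```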

Need for Padding

Since each group of 3 bytes (or 24 bits) is encoded into 4 Base64 characters (each representing 6 bits), sometimes the total number of bits in the data isn't a multiple of 24. This occurs when the input data is not a multiple of 3 bytes. Here's how padding is applied:
  • If the input has 1 byte left over (8 bits): The 8 bits are extended with 4 zero bits to reach 12 bits, which encode as two Base64 characters. Two padding characters ("==") are then appended, so the third and fourth positions of the block are entirely padding.
  • If the input has 2 bytes left over (16 bits): The 16 bits are extended with 2 zero bits to reach 18 bits, which encode as three Base64 characters. One padding character ("=") is appended to fill the fourth position; it carries no data bits.
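
To make the two cases concrete, here is a minimal, unoptimized encoder sketch that processes 3 bytes at a time and applies the padding rule to the last group. `encode_base64` and `ALPHABET` are illustrative names; the result is compared with Python's standard `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Encode bytes as padded Base64 (illustrative, not optimized)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Zero-fill the chunk to 24 bits so it splits into four 6-bit values.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters made only of fill bits are replaced by "=".
        out.append("".join(chars[:4 - missing]) + "=" * missing)
    return "".join(out)

for sample in (b"C", b"Ca", b"Cat"):
    assert encode_base64(sample) == base64.b64encode(sample).decode()
    print(sample, encode_base64(sample))
```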

Illustrative Example

Consider the word "Cat":
  1. In ASCII, "Cat" is represented as 67 (C), 97 (a), 116 (t).
  2. In binary, this is 01000011 01100001 01110100.
  3. These 24 bits are divided into four groups of 6 bits: 010000 110110 000101 110100.
  4. Base64 encoding for these groups gives you the characters: Q, 2, F, 0.
  5. No padding is needed as the data length is a multiple of 3.
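
The "Cat" walk-through can be checked by reading each 6-bit group as a number and looking it up in the alphabet. This is only a sketch; `ALPHABET` is an illustrative name for the standard Base64 alphabet in index order.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

groups = "010000 110110 000101 110100".split()  # the four groups from the example
indices = [int(g, 2) for g in groups]           # each group read as a 6-bit value
encoded = "".join(ALPHABET[i] for i in indices)

print(indices)  # [16, 54, 5, 52]
print(encoded)  # Q2F0
assert encoded == base64.b64encode(b"Cat").decode()
```
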
However, if the word were "Ca" (2 bytes):
  1. In ASCII, "Ca" is 67 (C), 97 (a).
  2. In binary, this is 01000011 01100001.
  3. These 16 bits are divided into three groups of 6 bits: 010000 110110 0001-- (incomplete).
  4. The incomplete group (0001) is completed with two zero bits, giving 000100 (value 4), which maps to E. Because only three characters carry data, one "=" is added to fill the fourth position. Therefore, the Base64 encoding of "Ca" is the characters: Q, 2, E, =.
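
The same result for "Ca" can be reproduced with integer shifts: two zero bits are appended to the 16 data bits, and the resulting 18 bits are read as three 6-bit values. As before, `ALPHABET` is an illustrative name.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

n = int.from_bytes(b"Ca", "big")  # the 16 data bits as one integer
n <<= 2                           # append two zero bits, giving 18 bits
indices = [(n >> shift) & 0x3F for shift in (12, 6, 0)]
chars = "".join(ALPHABET[i] for i in indices)
encoded = chars + "="             # one "=" fills the fourth position

print(indices)  # [16, 54, 4]
print(encoded)  # Q2E=
assert encoded == base64.b64encode(b"Ca").decode()
```
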
In summary, the "=" characters keep the output length a multiple of 4 and act as placeholders rather than encoding actual data. A decoder can use them to tell how many bytes the final group holds, so the original data is recovered exactly.
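
On the decoding side, Python's standard `base64.b64decode` rejects input whose padding has been stripped, which shows the padding at work. The helper `restore_padding` below is a hypothetical convenience, and it assumes the text is otherwise valid Base64.

```python
import base64
import binascii

def restore_padding(text: str) -> str:
    """Append the "=" characters needed to make len(text) a multiple of 4."""
    return text + "=" * (-len(text) % 4)

print(base64.b64decode("Q2E="))                  # b'Ca'
print(base64.b64decode(restore_padding("Q2E")))  # b'Ca'

try:
    base64.b64decode("Q2E")  # padding stripped
except binascii.Error as exc:
    print("decode failed:", exc)
```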