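Python 3 strings are sequences of Unicode code points, and UTF-8 is the default encoding when you convert them to bytes with str.encode(). UTF-8 is a variable-width format in which each character takes between one and four bytes.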
A decoder first checks the leading bit of a byte: if it is 0, the byte is a single-byte ASCII character (7 bits of payload stored in 8 bits). A two-byte (16-bit) sequence always starts with 110, and its second byte starts with 10. A three-byte (24-bit) sequence starts with 1110, and the following two bytes each start with 10 again. The longest, four-byte (32-bit) sequence starts with 11110 and, again, the following three bytes each start with 10.
The Wikipedia page on UTF-8 explains and visualizes it quite nicely.
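
To see those prefixes for yourself, here is a minimal sketch that encodes a few characters and reads the sequence length off the first byte. The helper name utf8_sequence_length is made up for illustration and is not part of the standard library. It only inspects the prefix bits described above and does not validate the rest of the sequence.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judging by its first byte.

    Hypothetical helper for illustration; it checks only the prefix bits.
    """
    if lead_byte >> 7 == 0b0:        # 0xxxxxxx: single-byte ASCII
        return 1
    if lead_byte >> 5 == 0b110:      # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:     # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:    # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and cannot start a sequence.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead_byte:08b}")


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        bits = " ".join(f"{b:08b}" for b in encoded)
        print(f"{ch!r}: {utf8_sequence_length(encoded[0])} byte(s) -> {bits}")
```

Every byte after the first one in each printed sequence starts with 10, which is how a decoder can tell continuation bytes apart from the start of a new character.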