ASCII table (re)explained
This totally blew my mind. Look at the ASCII table, organized in 4 columns:
bits  00  01  10  11
00000 NUL Spc @ `
00001 SOH ! A a
00010 STX " B b
00011 ETX # C c
00100 EOT $ D d
00101 ENQ % E e
00110 ACK & F f
00111 BEL ' G g
01000 BS ( H h
01001 TAB ) I i
01010 LF * J j
01011 VT + K k
01100 FF , L l
01101 CR - M m
01110 SO . N n
01111 SI / O o
10000 DLE 0 P p
10001 DC1 1 Q q
10010 DC2 2 R r
10011 DC3 3 S s
10100 DC4 4 T t
10101 NAK 5 U u
10110 SYN 6 V v
10111 ETB 7 W w
11000 CAN 8 X x
11001 EM 9 Y y
11010 SUB : Z z
11011 ESC ; [ {
11100 FS < \ |
11101 GS = ] }
11110 RS > ^ ~
11111 US ? _ DEL
Observations
The original paper lists the design considerations (it gets interesting at page 7). Wikipedia lists some interesting design decisions too. Some are explained here, some in the HN comments here and here, and in this pdf paper.
But here are my observations:
The chars are grouped in 4 columns of 32 each, indexed from 00000 (0) to 11111 (31). Each column has a different 2-bit prefix (00, 01, 10, 11), so a full 7-bit code is the 2-bit column prefix followed by the 5-bit row index.
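To make that concrete, here is a quick sketch of mine (not from the original post) that splits a char into its column prefix and row index:

let column (c: char) = int c >>> 5        // the 2-bit column prefix: 0b00 .. 0b11
let row (c: char) = int c &&& 0b11111     // the 5-bit row index: 0 .. 31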
All control chars are in the first column, except DEL, which is 0111_1111.
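In other words (my own sketch, assuming "control char" means column 00 plus DEL):

let isControl (c: char) = int c < 0b010_0000 || int c = 0b111_1111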
a-z and A-Z all begin at 00001 (1) and end at 11010 (26). The only difference is the column they are in: upper cases in column 10, lower cases in column 11. That means:
Converting to lower case means setting the 6th bit to 1 (OR 0b00100000).
Converting to upper case means setting the 6th bit to 0 (AND 0b11011111).
let toLower (c:char) = byte c ||| byte 0b00100000 |> char  // set the 6th bit: lower case
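The mirror image, a sketch of mine rather than code from the post, clears the same bit:

let toUpper (c:char) = byte c &&& byte 0b11011111 |> char  // clear the 6th bit: upper case

Both only make sense for letters; any other char gets silently remapped.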
Characters 0-9 are just their numeric values (0b0000 - 0b1001) prefixed with 0b0011 (OR 0b0011_0000).
Converting a char digit to a number can be done efficiently by masking out the high 4 bits:
let char_to_num (digit: char) = byte digit &&& byte 0b00001111 |> int  // keep only the low 4 bits
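The inverse works the same way, ORing the prefix back in (my own sketch, assuming 0 <= n <= 9):

let num_to_char (n: int) = byte n ||| byte 0b00110000 |> char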
Note how these characters align:
bits  10 11
11011 [ {
11100 \ |
11101 ] }
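So the same 6th-bit trick from toLower/toUpper above also maps between the bracket pairs (a quick check of mine):

printfn "%c %c" (toLower '[') (toUpper '{')  // prints: { [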