A) Strings to Numbers and Back Again
A common task in computer programs is to convert a string (e.g. "379") to the
numeric value 379. The string version comes from a text file, or perhaps is
typed in at the keyboard. The numeric value will be used in the program.
For output, the numeric values need to be converted back to strings so that
they are human readable.
Most languages provide convenient library routines to perform these operations.
Occasionally you might need to do this for yourself. These notes show you how
to deal with positive integer and real numbers.
Characters and character-codes
Characters are the things that humans recognize as having some meaning. Since
a computer only deals with 0/1 (binary numbers), there is a mapping between
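characters and character-codes (numbers). The mapping commonly used in computers
is ASCII. ASCII is a 7 bit code, resulting in 128 characters; each character is
normally stored in an 8 bit byte with the top bit unused.
Unicode assigns character-codes up to x10FFFF to support all languages and many
other symbols. Depending on the encoding, a character is stored in one to four
8 bit units (UTF-8) or in one or two 16 bit units (UTF-16). The first 128
Unicode characters are the ASCII characters.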
Here are some examples of character-codes and characters:
- Unicode character-code x32 is the number character: 2
- Unicode character-code x54 is the English character: T
- Unicode character-code x03C0 is the math character: π
- Unicode character-code x4E2D is the Chinese character: 中
- Unicode character-code x06A0 is the Arabic character: ڠ
- Unicode character-code x1F60A is the emoji character: 😊
Character-codes and digits
When dealing with a set of characters that a human interprets as a number, one
must convert each character to a digit. The digit is the number associated with
the character. Thus 'C' in base 16 becomes the digit 12 (xC). Note that the
character-code is NOT the digit value. The character-code of 'C' is
the bit pattern: 0100.0011 (x43). The digit C has the bit pattern:
0000.1100 (x0C). The conversion from character-code to digit is:
- for '0' to '9': digit = character-code - '0';
- for 'A' to 'Z': digit = character-code - 'A' + 10;
This relies on the fact that in ASCII/Unicode the character-codes of '0' to '9',
and likewise those of 'A' to 'Z', have consecutive, increasing values.
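The two formulas above can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `char_to_digit` and its `base` parameter are assumptions made for this example, and Python's built-in `ord()` is used to obtain the character-code.

```python
def char_to_digit(ch: str, base: int = 10) -> int:
    """Return the digit value of one character ('0'-'9' or 'A'-'Z')."""
    code = ord(ch)  # the character-code of ch
    if ord("0") <= code <= ord("9"):
        digit = code - ord("0")
    elif ord("A") <= code <= ord("Z"):
        digit = code - ord("A") + 10
    else:
        raise ValueError(f"not a digit character: {ch!r}")
    if digit >= base:
        raise ValueError(f"{ch!r} is not a valid digit in base {base}")
    return digit


print(char_to_digit("7"))      # 7
print(char_to_digit("C", 16))  # 12
print(hex(ord("C")))           # 0x43 (the character-code, not the digit)
```

The range check against `base` is an addition of this sketch; the formulas in the text do not include it.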
Why conversion is necessary
Consider the string "2573" and the number 2573. Both can be stored in 32 bits:
the string is four 8 bit ASCII characters, and an int is typically 32 bits on a
computer. However, as the following shows, the internal formats are
substantially different.
- 0011-0010|0011-0101|0011-0111|0011-0011 (the four ASCII chars "2573")
- 0000-0000-0000-0000-0000-1010-0000-1101 (the 2's complement value 2573)
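The difference between the two representations can be checked with a short Python sketch. Python integers are not fixed at 32 bits, so the 32 bit width here is only a formatting choice made to match the text; the variable names are illustrative.

```python
text = "2573"
number = 2573

# Each ASCII character occupies one byte; show the four bytes as bits.
text_bits = "|".join(format(b, "08b") for b in text.encode("ascii"))

# Show the integer as a 32 bit binary value.
number_bits = format(number, "032b")

print(text_bits)    # 00110010|00110101|00110111|00110011
print(number_bits)  # 00000000000000000000101000001101
```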
A1) Strings to Numbers
Converting strings to numbers can be done one character at a time. Process
the characters from left to right and build up the final value. The basic steps
are:
1. initialize: value = 0
2. loop over the characters in the string, left to right
3. convert the character to a digit
4. compute: value = value * base + digit
5. repeat steps 3 and 4 for all the characters
When the loop finishes, value contains the correct number.
When the base is a power of 2, the multiplication by base can be implemented by
a simple left shift operation (e.g. shift by 4 for hex, 3 for octal, etc).
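A minimal Python sketch of the steps above, assuming digits are limited to '0'-'9' and 'A'-'Z'. The names `digit_value`, `string_to_int` and `hex_string_to_int` are hypothetical helpers introduced for this example; the last one shows the left-shift variant for base 16.

```python
def digit_value(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"not a digit character: {ch!r}")


def string_to_int(text: str, base: int = 10) -> int:
    """Convert a string of digits in the given base to an integer."""
    value = 0                         # initialize
    for ch in text:                   # left to right
        digit = digit_value(ch)       # character -> digit
        if digit >= base:
            raise ValueError(f"{ch!r} is not a valid digit in base {base}")
        value = value * base + digit  # accumulate
    return value


def hex_string_to_int(text: str) -> int:
    """Same loop for base 16, using a shift by 4 instead of '* 16'."""
    value = 0
    for ch in text:
        value = (value << 4) + digit_value(ch)
    return value


print(string_to_int("263"))        # 263
print(string_to_int("010011", 2))  # 19
print(hex_string_to_int("C8A"))    # 3210
```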
Why does this work?
A number $a_n a_{n-1} a_{n-2} \ldots a_0$ in base $b$ means
$a_n b^n + a_{n-1} b^{n-1} + a_{n-2} b^{n-2} + \cdots + a_0 b^0$.
Applying Horner's rule for evaluation of polynomials, this can be rewritten as
$(((a_n b + a_{n-1}) b + a_{n-2}) b + \cdots + a_0)$. Note that for each pair
of parentheses, the value in the parentheses is multiplied by $b$ and then the
next term is added. This is expressed in step 4 above.
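To make the connection to the loop concrete, here is the three-digit case written out, including the initial value of 0. It is only an illustration of the identity for n = 2, not a general proof:

```latex
\begin{aligned}
a_2 b^2 + a_1 b^1 + a_0 b^0
  &= (a_2 b + a_1)\, b + a_0 \\
  &= ((0 \cdot b + a_2)\, b + a_1)\, b + a_0
\end{aligned}
```

Each "multiply by $b$, then add the next digit" in the last line corresponds to one pass through the loop.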
An example in decimal
Convert the string "263" to decimal.
- value = 0
- '2': value = 10 * 0 + 2 = 2
- '6': value = 10 * 2 + 6 = 26
- '3': value = 10 * 26 + 3 = 263
An example in binary
Convert the string "010011" to its decimal value.
- value = 0
- '0': value = 2 * 0 + 0 = 0
- '1': value = 2 * 0 + 1 = 1
- '0': value = 2 * 1 + 0 = 2
- '0': value = 2 * 2 + 0 = 4
- '1': value = 2 * 4 + 1 = 9
- '1': value = 2 * 9 + 1 = 19
An example in hex
Convert the string "C8A" to decimal.
- value = 0
- 'C': value = 16 * 0 + 12 = 12
- '8': value = 16 * 12 + 8 = 200
- 'A': value = 16 * 200 + 10 = 3210
Dealing with fractions
When one encounters a string with a '.', simply continue the
conversion described above, skipping over the '.' itself, and keep track of how
many characters follow the '.'. Call the count N. Once this is completed, take
the result and divide it by $\text{base}^N$. NOTE: be sure to force the division
to be done in doubles rather than integers.
Alternatively, one can convert only the fractional portion of the string to an
integer value using the procedure described above. Then do the division by the
power of the base and add the result to the integer portion of the number
(i.e. the part before the '.'). Under what circumstances might this be a
better solution than the original one?
An example with fractions
Convert the string "010.011" to its decimal value. Note that this is the same
string as in the binary example above, except that a '.' has been added.
- value = 0
- '0': value = 2 * 0 + 0 = 0
- '1': value = 2 * 0 + 1 = 1
- '0': value = 2 * 1 + 0 = 2
- '.': skipped; start counting the characters that follow
- '0': value = 2 * 2 + 0 = 4
- '1': value = 2 * 4 + 1 = 9
- '1': value = 2 * 9 + 1 = 19
There are three characters to the right of the '.', so the result is
$19 / 2^3 = 2.375$. The reason this works is that one can multiply
anything by 1 without changing its value. Thus:
$x = x \cdot (\text{base}^N / \text{base}^N)$
Now, by algebraic manipulation:
$x = (x \cdot \text{base}^N) / \text{base}^N$
But $x \cdot \text{base}^N$ is simply $x$ with the '.' moved right by N places.
Thus $x$ has been transformed from a number containing a '.' to one where the
'.' is at the right end, which is an integer. So, use the integer conversion,
then correct the result by doing the division.
Alternatively, convert the strings left and right of the '.' to values
(2 and 3 in the previous example). The final result is $2 + 3/2^3$
(because there are three characters to the right of the '.'), resulting in 2.375.
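Both approaches can be sketched in Python. The function names are hypothetical, the input is assumed to contain at most one '.', and no check is made that each digit is valid for the base. In Python 3 the `/` operator always performs floating-point division, which takes care of the "force division to be in doubles" note.

```python
def digit_value(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"not a digit character: {ch!r}")


def string_to_real(text: str, base: int = 10) -> float:
    """Convert all digits as one integer, then divide by base**N."""
    value = 0           # integer built from every digit, ignoring the '.'
    n = 0               # number of digits seen after the '.'
    seen_point = False
    for ch in text:
        if ch == ".":
            seen_point = True
            continue
        value = value * base + digit_value(ch)
        if seen_point:
            n += 1
    return value / base ** n


def string_to_real_split(text: str, base: int = 10) -> float:
    """Convert the parts left and right of the '.' separately."""
    whole, _, frac = text.partition(".")
    whole_value = 0
    for ch in whole:
        whole_value = whole_value * base + digit_value(ch)
    frac_value = 0
    for ch in frac:
        frac_value = frac_value * base + digit_value(ch)
    return whole_value + frac_value / base ** len(frac)


print(string_to_real("010.011", 2))        # 2.375
print(string_to_real_split("010.011", 2))  # 2.375
```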
A2) Numbers to Strings
Converting numbers to strings is just the opposite of converting strings to
numbers. It can also be done one digit at a time, but since it is the opposite,
the digits are computed from right to left. The basic operations used are
modulus (%) and integer division (/). The basic steps are:
1. initialize the output string to empty (i.e. "")
2. compute: r = value % base (i.e. the remainder)
3. compute: value = value / base
4. convert r to a character (e.g. 10 to 'A' in hex). For decimal digits the
expression (char)(r + '0') will give the correct character.
5. prepend the character to the output string
6. if value is 0, quit; otherwise return to step 2
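The steps above can be sketched in Python as follows. The name `int_to_string` is an assumption for this example, only non-negative values are handled, and `//` is Python's integer division operator (the `/` of the text).

```python
def int_to_string(value: int, base: int = 10) -> str:
    """Convert a non-negative integer to a string in the given base (2-36)."""
    if value < 0:
        raise ValueError("only non-negative values are handled")
    output = ""
    while True:
        r = value % base          # remainder: the rightmost digit
        value = value // base     # integer division drops that digit
        if r < 10:
            ch = chr(r + ord("0"))
        else:
            ch = chr(r - 10 + ord("A"))
        output = ch + output      # prepend, since digits arrive right to left
        if value == 0:
            break
    return output


print(int_to_string(156))      # 156
print(int_to_string(23, 2))    # 10111
print(int_to_string(231, 16))  # E7
```

Testing for 0 at the end of the loop, rather than at the start, means that the value 0 produces "0" instead of an empty string.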
When the base is a power of 2, the modulus and division operations are very
easy. Modulus is replaced by a bitwise AND with base - 1 (i.e. AND with 15 for
hex). Base minus 1 is a binary string of all 1's that acts as a mask. Division
can be performed by a right shift (four places for hex, three for octal, etc).
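A sketch of the mask-and-shift variant in Python, assuming a non-negative value and a base of 2, 4, 8, 16 or 32. The function name and the `bits_per_digit` parameter are hypothetical (4 for hex, 3 for octal, 1 for binary).

```python
def int_to_string_pow2(value: int, bits_per_digit: int) -> str:
    """Convert a non-negative integer using AND and shift only."""
    if value < 0 or not 1 <= bits_per_digit <= 5:
        raise ValueError("need value >= 0 and 1 <= bits_per_digit <= 5")
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # enough for base 32
    mask = (1 << bits_per_digit) - 1             # base - 1, all 1 bits
    output = ""
    while True:
        r = value & mask                  # same as value % base
        value = value >> bits_per_digit   # same as value // base
        output = digits[r] + output
        if value == 0:
            break
    return output


print(int_to_string_pow2(231, 4))  # E7
print(int_to_string_pow2(231, 3))  # 347
print(int_to_string_pow2(23, 1))   # 10111
```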
In the following examples, leading zeros were added to keep columns lined up.
An example in decimal
Convert the number 156 to a decimal string
- output = ""
- r = 156 % 10 = 6; val = 156 / 10 = 15; output = "6"
- r = 015 % 10 = 5; val = 015 / 10 = 01; output = "56"
- r = 001 % 10 = 1; val = 001 / 10 = 00; output = "156"
An example in binary
Convert the number 23 to a binary string
- output = ""
- r = 23 % 2 = 1; val = 23 / 2 = 11; output = "1"
- r = 11 % 2 = 1; val = 11 / 2 = 05; output = "11"
- r = 05 % 2 = 1; val = 05 / 2 = 02; output = "111"
- r = 02 % 2 = 0; val = 02 / 2 = 01; output = "0111"
- r = 01 % 2 = 1; val = 01 / 2 = 00; output = "10111"
An example in hex
Convert the number 231 to a hex string
- output = ""
- r = 231 % 16 = 07; val = 231 / 16 = 14; output = "7"
- r = 014 % 16 = 14; val = 14 / 16 = 0; output = "E7"
If one is doing this by hand, it is often easier to first convert to binary,
then group the digits into blocks of four for hex (three for octal) to get the
representation in that base.
Convert the number 231 to hex via binary
- output=""
- r = 231 % 2 = 1; val = 231 / 2 = 115; output = "1"
- r = 115 % 2 = 1; val = 115 / 2 = 057; output = "11"
- r = 057 % 2 = 1; val = 057 / 2 = 028; output = "111"
- r = 028 % 2 = 0; val = 028 / 2 = 014; output = "0111"
- r = 014 % 2 = 0; val = 014 / 2 = 007; output = "00111"
- r = 007 % 2 = 1; val = 007 / 2 = 003; output = "100111"
- r = 003 % 2 = 1; val = 003 / 2 = 001; output = "1100111"
- r = 001 % 2 = 1; val = 001 / 2 = 000; output = "11100111"
Now grouping the bits four at a time from right to left,
- 1110 0111: the rightmost group, 0111, gives the value 7
- 1110 0111: the next group, 1110, gives the value E
resulting in the value E7. As an alternative, if you add zeros to the
left of the binary representation until the number of bits is a multiple of four
(or three for octal), you can work left to right.
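The left-to-right grouping with zero padding can be sketched in Python as follows. The function name `binary_to_hex` is an assumption, and the input is assumed to contain only '0' and '1' characters.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary digit string to hex by grouping bits in fours."""
    # Pad with zeros on the left until the length is a multiple of four.
    padded_len = -(-len(bits) // 4) * 4   # round up to a multiple of 4
    bits = bits.zfill(padded_len)
    output = ""
    for i in range(0, len(bits), 4):      # work left to right
        value = 0
        for ch in bits[i:i + 4]:          # convert one 4 bit group
            value = value * 2 + (ord(ch) - ord("0"))
        output += "0123456789ABCDEF"[value]
    return output


print(binary_to_hex("11100111"))  # E7
print(binary_to_hex("10111"))     # 17
```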
Dealing with fractions
Converting a number with a fractional part to a string is a two step process.
First convert the integer portion of the value to a string using the method
described above. Then handle the fractional part. The string will be generated
left to right, unlike the integer portion which is generated right to left.
The basic process is:
1. append '.' to the output
2. multiply the fractional part by the base
3. convert the integer portion of the result to a character
4. append the character to the output
5. discard the integer portion of the result
6. return to step 2, repeating for as many digits as needed, or until the
fractional part becomes 0
Note that this truncates the result rather than rounding it if the
process stops before the fractional part is 0.
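A Python sketch of the two step process, assuming a non-negative value. The names `real_to_string` and `max_digits` are hypothetical; `max_digits` is the "as many digits as needed" limit, and the result is truncated, not rounded, when that limit is reached.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def int_to_string(value: int, base: int) -> str:
    """Integer part: digits are produced right to left."""
    output = ""
    while True:
        output = DIGITS[value % base] + output
        value //= base
        if value == 0:
            return output


def real_to_string(value: float, base: int = 10, max_digits: int = 8) -> str:
    """Fractional part: digits are produced left to right."""
    whole = int(value)            # integer portion
    fract = value - whole         # fractional portion, 0 <= fract < 1
    output = int_to_string(whole, base) + "."
    for _ in range(max_digits):
        if fract == 0:
            break
        fract *= base             # the next digit moves left of the '.'
        digit = int(fract)        # integer portion of the result
        output += DIGITS[digit]
        fract -= digit            # discard the integer portion
    return output


print(real_to_string(2.375, 2))   # 10.011
print(real_to_string(0.1, 2, 4))  # 0.0001 (truncated)
```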
An example with fractions
Convert 2.375 to binary.
- output = "10." (convert the integer portion and add '.')
- fract = 0.375; 0.375 * 2 = 0.75; output = "10.0"
- fract = 0.750; 0.750 * 2 = 1.50; output = "10.01"
- fract = 0.500; 0.500 * 2 = 1.00; output = "10.011"
B) Spreadsheet Numbering Systems
Spreadsheets like Excel label columns using a base 26 numbering system
employing the characters 'A' to 'Z'. However, the numbering is 1 based,
meaning that 'A' represents 1 and 'Z' represents 26 rather than the
more familiar 0 to 25. Thus AA means 27, not 0 as it would if the system
were 0 based. This only slightly complicates the conversion process. The reason
for the 1 based system is so that A and AA are different values. In a 0
based system, the letter 'A' would correspond to a value of 0, as would 'AA'
(just as 0 and 00 are the same number in decimal).
B1) String to Number
The only change one needs to make to the process described above is to force
the "value" of a character to be 1 based. Thus, we can convert the character
to a "value" by computing (character - 'A') and simply adding 1.
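The change described above can be sketched in Python as follows. The function name `column_to_number` is an assumption for this example, and only upper-case 'A' to 'Z' is accepted.

```python
def column_to_number(name: str) -> int:
    """Convert a spreadsheet column label such as 'CF' to its 1 based number."""
    value = 0
    for ch in name:
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        digit = (ord(ch) - ord("A")) + 1   # 1 based: 'A' -> 1, 'Z' -> 26
        value = value * 26 + digit
    return value


print(column_to_number("A"))   # 1
print(column_to_number("Z"))   # 26
print(column_to_number("AA"))  # 27
print(column_to_number("CF"))  # 84
```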
An Example
Convert "CF" to the decimal value
- value = 0
- "CF" - value = 26 * 0 + ('C' - 'A') + 1 = 00 + 3 = 3
- "CF" - value = 26 * 3 + ('F' - 'A') + 1 = 78 + 6 = 84
B2) Number to String
Again, the only change we need to make is to convert from 1 based to 0 based.
In converting a string to a value, the conversion involved adding 1 to the
character's value. As this conversion is the "opposite", 1 is subtracted from
the value before each modulus and division.
An Example
Convert 47 to a string.
- output = ""
- val = 47 - 1 = 46; r = 46 % 26 = 20; val = 46 / 26 = 1; output = "U"
- val = 01 - 1 = 00; r = 00 % 26 = 00; val = 00 / 26 = 0; output = "AU"
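A Python sketch of the reverse conversion, assuming the column number is at least 1. The function name `number_to_column` is hypothetical. Subtracting 1 before each modulus and division turns the 1 based digit into a 0 based remainder, where 0 maps to 'A' and 25 maps to 'Z'.

```python
def number_to_column(value: int) -> str:
    """Convert a 1 based column number to a spreadsheet label."""
    if value < 1:
        raise ValueError("column numbers start at 1")
    output = ""
    while value > 0:
        value -= 1                          # 1 based -> 0 based
        r = value % 26                      # 0..25
        value //= 26
        output = chr(r + ord("A")) + output # prepend the letter
    return output


print(number_to_column(1))   # A
print(number_to_column(26))  # Z
print(number_to_column(27))  # AA
print(number_to_column(47))  # AU
print(number_to_column(84))  # CF
```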
(c) Fritz Sieker, 2009-2016