Number and character representation within computers

March 2010

what's a number?

How is a number (a quantity) represented, stored in memory, or sent via some communications protocol? There is no "real" or "right" way. When you have the concept of "one hundred and twenty three" in your head, how you express it is utterly independent of the concept: Arabic digits, Roman numerals, a pile of 123 small stones, and so on. The quantity is one thing; its representation is another.
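To make the point concrete, here is a small Python sketch that produces several different representations of the same quantity. Python is only an illustration language here, and the function names are made up for this example.

```python
# One quantity, several representations. The quantity is the Python int;
# each function below only produces a different *representation* of it.

ROMAN_TABLE = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n):
    """Represent a positive integer as a Roman numeral (standard subtractive form)."""
    if not 0 < n < 4000:
        raise ValueError("standard Roman numerals cover 1..3999")
    parts = []
    for value, symbol in ROMAN_TABLE:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def to_stones(n):
    """Represent n as a row of n 'stones' (a unary tally)."""
    return "o" * n

quantity = 123
print(str(quantity))              # Arabic digits: 123
print(to_roman(quantity))         # Roman numerals: CXXIII
print(len(to_stones(quantity)))   # counting the stones recovers 123
```

Three spellings, one quantity: nothing about "123-ness" lives in any of them.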

The serial port (or USB port; the "S" in USB means Serial) on a desktop or laptop, and on microcontrollers like the Arduino, sends data in groups of eight one-bit digits (bits), one bit after another, asynchronously, i.e. "at any time", not synchronized with anything external (a shared clock, the time of day, whatever). Please see my "Bits, Baud, Modulation Rates" for how that works, and, for true priests of the order, read Danny Cohen's "On Holy Wars and a Plea for Peace" on the order in which bits and bytes are sent.
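As a rough illustration of what "asynchronous" means on the wire, here is a Python sketch of how a typical UART frames one byte. It assumes the common 8N1 setting (8 data bits, no parity, 1 stop bit); other settings exist, and USB carries data quite differently underneath even when it looks like a serial port to your program. The function name is hypothetical.

```python
def frame_8n1(byte):
    """Return the logic levels for one asynchronous serial character in 8N1 format.

    Assumes: one start bit (0), eight data bits sent least-significant bit
    first, no parity bit, one stop bit (1). The line idles at logic 1 between
    characters, so the 1-to-0 edge of the start bit tells the receiver a
    character has begun; no shared clock is needed.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("a serial character carries exactly 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]

print(frame_8n1(0x7B))  # [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
```

Ten bit-times carry eight bits of data; the start and stop bits are the overhead that buys "at any time" transmission.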

With 8 contiguous one-bit states you can ENCODE 256 distinct values: a decimal number 0 to 255, a binary number 00000000 to 11111111, a hexadecimal number 00 to ff, etc. By convention we call that a byte. I hope you know that 123 decimal is the same as 7B in hexadecimal notation and 01111011 in base-two notation, and that each of these is a representation of, say, the number of objects in a jar. It is not the objects in the jar.
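In Python, for instance, the built-in format() and int() functions convert between these notations; the value never changes, only its spelling.

```python
n = 123
print(format(n, "d"))    # 123       decimal
print(format(n, "02X"))  # 7B        hexadecimal
print(format(n, "08b"))  # 01111011  binary, padded to 8 digits

# Parsing goes the other way: three different spellings, one value.
assert int("123", 10) == int("7B", 16) == int("01111011", 2) == 123

# Eight bits give 2**8 distinct patterns, so a byte holds 0..255.
print(2 ** 8)            # 256
print(bytes([n]))        # b'{' : the single byte 0x7B, which Python displays as ASCII '{'
```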

With only byte-sized numbers, how do you represent 1234, which is larger than 255? Well, one obvious way is to use larger representations, like 16 or 32 bits, and this is of course commonly done. But it is impractical and unwieldy to keep inventing ad hoc representations for every possible kind of datum, and it fails to take advantage of our convenient machinery.
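For example, 1234 does not fit in one byte, but it fits in two. The Python sketch below splits it into a high and a low byte. Note that which byte is sent first ("endianness") is itself a convention both ends must agree on, which is exactly the subject of Cohen's paper.

```python
import struct

n = 1234  # too big for one byte (max 255)

# Two bytes hold 0..65535. The high byte is n // 256, the low byte n % 256.
high, low = divmod(n, 256)
print(high, low)                   # 4 210  (0x04, 0xD2)

# Which byte goes first is a convention:
print(n.to_bytes(2, "big"))        # b'\x04\xd2'  high byte first
print(n.to_bytes(2, "little"))     # b'\xd2\x04'  low byte first

# The same value as a 32-bit signed integer, little-endian:
print(struct.pack("<i", n))        # b'\xd2\x04\x00\x00'

# The receiver must know the convention to rebuild the value.
assert int.from_bytes(b"\x04\xd2", "big") == 1234
assert int.from_bytes(b"\x04\xd2", "little") == 53764  # wrong guess, wrong number
```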

what's a character?

Characters are graphical elements that represent a human idea or sound, and, like numbers, characters are independent of their representation. There are many character encoding schemes; ASCII and its modern offshoots (such as UTF-8, which encodes the ASCII characters identically) are the dominant ones.
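Concretely, a character encoding is a lookup table from characters to numbers. Python's ord() and chr() expose the table it uses (Unicode, whose first 128 entries are the ASCII characters).

```python
# A character set assigns a number to each character.
# In ASCII (and in UTF-8, which encodes ASCII characters identically):
for ch in "A", "a", "1", " ":
    print(repr(ch), ord(ch), format(ord(ch), "02X"))
# 'A' 65 41
# 'a' 97 61
# '1' 49 31
# ' ' 32 20

print(chr(65))               # A : the reverse lookup
print("A".encode("ascii"))   # b'A'
print("A".encode("utf-8"))   # b'A'  the same single byte, 0x41
```

Note that the character '1' is code 49, not 1: the digit character and the number it names are different things.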

But the letters and digits you are reading here are yet another layer of encoding, called "characters". To write the number "one hundred and twenty three" takes three characters (1, 2 and 3), three bytes in ASCII, to represent a value that in another encoding would fit into one byte.
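The difference is easy to see by encoding 123 both ways in Python: once as a raw byte, once as ASCII text.

```python
value = 123

as_binary = bytes([value])               # the value itself, packed into one byte
as_text = str(value).encode("ascii")     # the characters '1', '2', '3'

print(as_binary, len(as_binary))         # b'{' 1
print(as_text, len(as_text))             # b'123' 3
print(list(as_text))                     # [49, 50, 51]  ASCII codes of '1', '2', '3'

# Each ASCII digit's code is its digit value plus 48 (0x30, the code for '0').
print([code - ord("0") for code in as_text])  # [1, 2, 3]
```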

When a microcontroller like an Arduino communicates with a host computer program such as Max/MSP or Processing, is a motor speed of "one hundred and twenty three" packed into 8 bits (a single byte), or is it sent as three characters, 1, 2 and 3, that the receiver has to decode and re-pack into the value 123?
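If the number arrives as characters, the receiver has to do the re-packing. Here is a minimal Python sketch of the usual digit-by-digit method; the function name is hypothetical, and real host programs such as Max/MSP or Processing have their own ways of doing this.

```python
def parse_ascii_decimal(data):
    """Re-pack a run of ASCII digit characters into the value they spell.

    Works left to right: each new digit shifts the running total one decimal
    place (multiply by 10) and adds the digit's value. Stops at the first byte
    that is not a digit, such as an end-of-line character. Returns 0 if there
    are no leading digits; a real receiver would need to treat that as an error.
    """
    value = 0
    for code in data:
        if not ord("0") <= code <= ord("9"):
            break
        value = value * 10 + (code - ord("0"))
    return value

print(parse_ascii_decimal(b"123"))      # 123
print(parse_ascii_decimal(b"123\r\n"))  # 123 (line ending ignored)
print(b"{"[0])                          # 123: a single binary byte needs no decoding
```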

I can't answer that here. You'll have to look up how the program(s) involved transmit data over serial links. (For example, if you use Serial.print on the Arduino, numbers are sent as ASCII-encoded strings of digits: one hundred and twenty three is sent as three characters, 1, 2 and 3. Serial.println does the same and then adds an end-of-line sequence, a carriage return followed by a line feed. Serial.write, by contrast, sends the value as a single raw byte.)
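The byte sequences those Arduino calls produce are easy to model. The Python sketch below is not Arduino code; the dictionary is just an illustrative lookup of what Serial.print, Serial.println and Serial.write put on the wire for 123, followed by one way a host program might turn a println line back into a number.

```python
# A Python model (not Arduino code) of the bytes each Arduino call sends
# for the value 123. The dictionary keys are just labels.
value = 123
on_the_wire = {
    "Serial.print(123)":   str(value).encode("ascii"),            # digits only
    "Serial.println(123)": str(value).encode("ascii") + b"\r\n",  # digits, CR, LF
    "Serial.write(123)":   bytes([value]),                        # one raw byte
}
for call, data in on_the_wire.items():
    print(f"{call:20} {len(data)} bytes: {list(data)}")

# On the host, a line sent with println can be read back like this:
line = on_the_wire["Serial.println(123)"]
print(int(line.strip()))   # 123  (strip() removes the trailing b'\r\n')
```

The printed lengths (3, 5 and 1 bytes) are the whole story: same quantity, three different representations on the wire.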

It seems that today even people with CS degrees are not familiar with these fundamental concepts, so don't feel bad. It always makes me feel like I'm crazy when supposedly fully edumacated CS graduates have no idea what I'm talking about!