Or more accurately, why I don’t like James Gosling. I have two really big issues with Java as a language: first, that everything is an Object, and second, that there are no unsigned data types. The first point is easy to work around, but the second annoys me to no end. So this is mostly going to be a rant against James Gosling’s stance on unsigned integers.
The answer to the question of why Java doesn’t have unsigned data types can be inferred from an interview James Gosling did for the Java Report in 2000 where he was asked:
What does the phrase ‘Simple Language’ mean to you, and is Java a ‘Simple Language’?
To this he replied:
For me as a language designer, which I don’t really count myself as these days, what “simple” really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn’t — and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.
So why exactly does James Gosling think that almost no C developers know what is going on with unsigned arithmetic? Because he doesn’t understand it, obviously. In fact, if you take a quick look at the two systems, signed and unsigned, you quickly realize that signed arithmetic is the more complex one. That’s right, the exact opposite of what James Gosling said.
To prove this, let’s do a quick foray into the wonderful world of binary arithmetic. First, unsigned arithmetic on a byte. To determine the value contained in an unsigned byte, first take its binary representation:
0101 0101
What’s this?! Zeros and ones? Oh my, binary! Now, each position in this byte corresponds to a power of two. How we get these values is simple: take the index of the bit, numbering from zero at the rightmost bit and going to the left, and raise two to that power. So, written left to right, the eight positions are worth:
128, 64, 32, 16, 8, 4, 2, 1
So, in the above number, we get
0 + 64 + 0 + 16 + 0 + 4 + 0 + 1 = 85
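For anyone who would rather see that sum as code, here is a minimal sketch in Java. The class and method names are made up for illustration, and the bit pattern is passed as an int because Java’s own byte type is signed.

```java
final class UnsignedByteValue {
    // Sums the place value (a power of two) of every set bit in an 8-bit pattern.
    static int valueOf(int bits) {
        int total = 0;
        for (int position = 0; position < 8; position++) {
            int placeValue = 1 << position;   // 2^position: 1, 2, 4, ..., 128
            if ((bits & placeValue) != 0) {   // is the bit at this position set?
                total += placeValue;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(valueOf(0b0101_0101)); // prints 85
    }
}
```

The loop does nothing more than the addition written out above: 64 + 16 + 4 + 1.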
Easy, right? So what about adding and subtracting two numbers? Exactly the same as if you were adding or subtracting two decimal numbers, except each digit can only be 0 or 1. If you add two 1’s together, you get 10 (binary for two). So for addition, if a column exceeds the value of one, you put a zero and carry the one. For subtraction, if you subtract one from zero, you have to borrow a 1 from one of the more significant digits. In other words, exactly like you were taught in elementary school, except with a smaller range of digits.
Now, let’s take a look at signed integers. The first thing we need to know about them is that they are represented in two’s complement. To get the binary representation of a decimal number in this scheme, first take the binary value of its absolute value and, if the number is negative, perform the operation (~value + 1). Wait, what is that squiggly thing?! That is what we call the complement, in which you take all the bits and flip them. So, let’s find the value of -1. First off, the absolute value of -1 is 1, which is represented in binary as:
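The carry rule can be sketched column by column in Java. This is only an illustration of the schoolbook method, with hypothetical names (BinaryAddition, addBytes, toBits); real hardware and the + operator do the same thing in one step.

```java
final class BinaryAddition {
    // Adds two 8-bit patterns one column at a time, right to left,
    // carrying into the next column just like decimal long addition.
    static int addBytes(int a, int b) {
        int result = 0;
        int carry = 0;
        for (int position = 0; position < 8; position++) {
            int bitA = (a >> position) & 1;
            int bitB = (b >> position) & 1;
            int sum = bitA + bitB + carry;     // 0, 1, 2 or 3
            result |= (sum & 1) << position;   // digit written in this column
            carry = sum >> 1;                  // 1 when the column exceeded one
        }
        return result;                         // a carry out of the top bit is dropped
    }

    // Formats the low 8 bits as a zero-padded binary string.
    static String toBits(int value) {
        String raw = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", raw).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBits(addBytes(0b0101_0101, 0b0000_0001))); // prints 01010110
    }
}
```

Here 85 + 1 carries once out of the rightmost column and gives 0101 0110, which is 86.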
0000 0001
So, complementing this we get:
1111 1110
Now, add one to it:
1111 1111
This is nice for several reasons, human readability not being one of them. However, you can read a two’s complement binary value if you know the trick. First, check whether the most significant bit is set. If it is not, read the value as if it were unsigned. Otherwise, read it as unsigned but count only the unset bits, add one, and put a negative sign in front. Or you could just do the flip-and-add-one operation described above and add a negative sign. Now, to add or subtract the numbers. Well, this is actually easy to do, if not to understand: you do exactly the same thing as for unsigned values and let the signs take care of themselves.
Take for example -1 + 1:
1111 1111 + 0000 0001 = 0000 0000
Which works out to 0. Or, for a not so nice one: -1 + -1
1111 1111 + 1111 1111 = 1111 1110 (-2 in two’s complement)
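A small Java sketch of the same two examples, under the assumption that we model a byte as the low 8 bits of an int. All names here are hypothetical; the point is that there is one add, and only the way we read the result differs.

```java
final class TwosComplement {
    // Two's complement negation on 8 bits: flip every bit, add one, keep the low 8 bits.
    static int negate(int bits) {
        return (~bits + 1) & 0xFF;
    }

    // 8-bit addition; the carry out of the top bit is discarded.
    static int add(int a, int b) {
        return (a + b) & 0xFF;
    }

    // Reads an 8-bit pattern as unsigned (0..255).
    static int asUnsigned(int bits) {
        return bits & 0xFF;
    }

    // Reads an 8-bit pattern as signed (-128..127); the byte cast sign-extends.
    static int asSigned(int bits) {
        return (byte) bits;
    }

    public static void main(String[] args) {
        int minusOne = negate(0b0000_0001);        // 1111 1111
        int sum = add(minusOne, minusOne);         // 1111 1110
        System.out.println(asSigned(sum));         // prints -2
        System.out.println(asUnsigned(sum));       // prints 254
    }
}
```

Read as unsigned, the same sum is 255 + 255 = 510, which wraps to 254 in 8 bits. Same bits, same adder, two readings.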
So, operations on both signed/unsigned integers are exactly the same, but signed integers have a slightly less clear representation in binary. So why would James Gosling consider unsigned integers to be more difficult? To be fair, in the quote he was talking about corner cases appearing in languages right before he complained about unsigned arithmetic. So, let’s roll with that and examine some ‘corner cases’ concerning unsigned arithmetic.
Taking an example of a ‘corner case’ that someone posted on Stack Overflow: what happens if, god forbid, you try to compare a signed value with an unsigned value? (For reference, the link they posted was http://skeletoncoder.blogspot.com/2006/09/java-tutorials-why-no-unsigned.html)
First off, you shouldn’t, but let’s say you really have to. So, you have a nice unsigned value, let’s call him ‘a’, and a signed value ‘b’. Why you would do this, I have no idea, but let’s try to naively compare them.
if(b < a) { /* do something */ }
The example from Stack Overflow of this corner case mentioned something about ages and using -1 as an ‘invalid age’ for when we don’t know someone’s age. Already though, my common sense is tingling here and telling me that we should probably test for this before attempting to compare the two variables. Something like:
if(b == -1) {
    System.out.printf("Please actually give your age");
    return 1;
}
But, no, let’s not do the right thing -.-
So, we have our nice variable b, and it has our error code in it, -1. Now, going with the example, let’s say we want to bar people under 18 from accessing something. The problem results from not actually thinking about how the numbers are represented. The example states that, intuitively, -1 is less than 18, right? So, according to logic, comparing -1 (signed) < 18 (unsigned) should magically work. But, if you remember, signed values use two’s complement and unsigned values don’t. So, how should the computer evaluate the comparison? First off, it has to decide whether to compare them both as signed or both as unsigned values. Now, this is really the decision of the compiler, not the CPU, since the CPU typically offers separate instructions (or condition codes) for the two kinds of comparison. So, let’s consult the C language specification. According to it, when a signed integer is compared with an unsigned integer of the same width, the signed value is converted to unsigned.
So, the code from the example is evaluating as:
if(1111 1111 < 0001 0010) { /* do something */ }
which comes out to be false, so the part of the program that keeps people under 18 out of the restricted content fails when they don’t specify an age. Brilliance. So, let me ask this: if I, a (mostly) self-taught programmer who has been programming for all of 5 years now, know this, why do most C programmers not? Is it really that hard to take all of 5 minutes to learn about two’s complement and read up on how signed/unsigned comparisons and arithmetic are performed in the language you write code in?
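Java has no unsigned int, but the effect can be imitated. The sketch below assumes Java 8 or later and uses Integer.compareUnsigned to stand in for what a C compiler does with int versus unsigned int; the class, constant and method names are hypothetical. Note that it works on 32 bits rather than the 8 bits drawn above, so -1 reads as 4294967295 instead of 255.

```java
final class MixedComparison {
    static final int MINIMUM_AGE = 18;

    // Signed comparison: what the example intuitively expects.
    static boolean isUnderageSigned(int age) {
        return age < MINIMUM_AGE;
    }

    // Imitates C's int-versus-unsigned-int comparison: both sides are
    // treated as unsigned 32-bit values before being compared.
    static boolean isUnderageUnsigned(int age) {
        return Integer.compareUnsigned(age, MINIMUM_AGE) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isUnderageSigned(-1));          // prints true
        System.out.println(isUnderageUnsigned(-1));        // prints false
        System.out.println(Integer.toUnsignedString(-1));  // prints 4294967295
    }
}
```

With the ‘unknown age’ value of -1, the unsigned reading is a huge positive number, so the under-18 check quietly lets it through.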
Things like signed/unsigned arithmetic don’t make a language complex. They just expose the programmer to how their computer actually works. If you want to compare two unlike data types, you either specify how you want it to work, or you are at the mercy of the compiler. Hell, the above example could have been reduced to casting the minimum age to a signed integer. In fact, the proper way to do things is to not mix the two types of arithmetic at all. Or if you do, be very, very careful and understand what you are getting yourself into.
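One way that advice might look in Java, as a sketch rather than a recipe: treat the minimum age as an unsigned 32-bit value stored in an int (an assumption made purely for illustration), widen it to a signed long with Integer.toUnsignedLong, handle the sentinel first, and only then compare. The names AgeCheck, UNKNOWN_AGE and mayEnter are invented here.

```java
final class AgeCheck {
    static final int UNKNOWN_AGE = -1;   // sentinel for "no age given"

    // age is signed; minimumAgeBits holds an unsigned 32-bit value.
    static boolean mayEnter(int age, int minimumAgeBits) {
        if (age == UNKNOWN_AGE) {
            return false;                // deal with the error code before comparing
        }
        // Widen the unsigned value into a signed long so both sides are signed.
        long minimumAge = Integer.toUnsignedLong(minimumAgeBits);
        return age >= minimumAge;
    }

    public static void main(String[] args) {
        System.out.println(mayEnter(-1, 18)); // prints false
        System.out.println(mayEnter(17, 18)); // prints false
        System.out.println(mayEnter(30, 18)); // prints true
    }
}
```

Either step alone would have fixed the example; doing both just makes the intent obvious to whoever reads the code next.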