Why doesn't Java include support for unsigned integers?
It seems to me to be an odd omission, given that they allow one to write code that is less likely to produce overflows on unexpectedly large input.
Furthermore, using unsigned integers can be a form of self-documentation, since they indicate that the value which the unsigned int was intended to hold is never supposed to be negative.
Lastly, in some cases, unsigned integers can be more efficient for certain operations, such as division.
What's the downside to including these?
byte not being able to give a straight 140 gray level but a -116 that you need to & 0xff to get the correct value.
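A minimal sketch of the masking this comment describes, assuming a gray level of 140 stored in a Java byte. The class and method names are made up for illustration; `Byte.toUnsignedInt` is available from Java 8 on.

```java
// Hypothetical demo class; names are illustrative only.
class GrayLevelDemo {
    // Recover the 0..255 value stored in a signed Java byte.
    static int toGray(byte raw) {
        return raw & 0xff; // mask off the sign extension added by byte-to-int promotion
    }

    public static void main(String[] args) {
        byte raw = (byte) 140;                       // stored as -116
        System.out.println(raw);                     // -116
        System.out.println(toGray(raw));             // 140
        System.out.println(Byte.toUnsignedInt(raw)); // 140 (Java 8+)
    }
}
```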
This is from an interview with Gosling and others, about simplicity:
Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.
Reading between the lines, I think the logic was something like this:
generally, the Java designers wanted to simplify the repertoire of data types available
for everyday purposes, they felt that the most common need was for signed data types
for implementing certain algorithms, unsigned arithmetic is sometimes needed, but the kind of programmers that would be implementing such algorithms would also have the knowledge to "work round" doing unsigned arithmetic with signed data types
Mostly, I'd say it was a reasonable decision. Possibly, I would have:
made byte unsigned, or at least provided signed and unsigned alternatives, possibly with different names, for this one data type (making it signed is good for consistency, but when do you ever need a signed byte?)
done away with 'short' (when did you last use 16-bit signed arithmetic?)
Still, with a bit of kludging, operations on unsigned values up to 32 bits aren't too bad, and most people don't need unsigned 64-bit division or comparison.
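One common form of the kludging mentioned above is to widen the 32-bit value into a long and do the arithmetic there. The following is only a sketch under that assumption; the helper class and its method names are hypothetical.

```java
// Hypothetical helper; widening to long is one common workaround, not the only one.
class Unsigned32Kludge {
    // Interpret the 32 bits of an int as a value in 0..2^32-1.
    static long widen(int x) {
        return x & 0xFFFFFFFFL;
    }

    // Unsigned division done in 64-bit signed arithmetic, then narrowed back.
    static int divide(int a, int b) {
        return (int) (widen(a) / widen(b));
    }

    // Unsigned "less than" done on the widened values.
    static boolean lessThan(int a, int b) {
        return widen(a) < widen(b);
    }

    public static void main(String[] args) {
        int big = 0xFFFFFFFE;                 // 4294967294 read as unsigned, -2 as signed
        System.out.println(widen(big));       // 4294967294
        System.out.println(divide(big, 2));   // 2147483647
        System.out.println(lessThan(1, big)); // true
    }
}
```

Addition, subtraction and multiplication need no such help, since they produce the same bit pattern for signed and unsigned operands.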
short is used - deflate/gzip/inflate algorithms are 16-bit and they rely heavily on shorts... or at least short[] [admittedly they are native - yet Java implementations of the algorithm carry terabytes of data]. The latter (short[]) has a significant advantage over int[] since it takes half the memory, and less memory = better caching properties, much better performance.
This is an older question and pat did briefly mention char, I just thought I should expand upon this for others who will look at this down the road. Let's take a closer look at the Java primitive types:
byte - 8-bit signed integer
short - 16-bit signed integer
int - 32-bit signed integer
long - 64-bit signed integer
char - 16-bit character (unsigned integer)
Although char does not support unsigned arithmetic, it essentially can be treated as an unsigned integer. You would have to explicitly cast arithmetic operations back into char, but it does provide you with a way to specify unsigned numbers.
char a = 0;
char b = 6;
a += 1;
a = (char) (a * b);
a = (char) (a + b);
a = (char) (a - 16);
b = (char) (b % 3);
b = (char) (b / a);
//a = -1; // Generates compiler error, must be cast to char
System.out.println(a); // Prints ?
System.out.println((int) a); // Prints 65532
System.out.println((short) a); // Prints -4
short c = -4;
System.out.println((int) c); // Prints -4, notice the difference with char
a *= 2;
a -= 6;
a /= 3;
a %= 7;
a++;
a--;
Yes, there isn't direct support for unsigned integers (obviously, I wouldn't have to cast most of my operations back into char if there was direct support). However, there certainly exists an unsigned primitive data type. I would have liked to see an unsigned byte as well, but I guess doubling the memory cost and using char instead is a viable option.
Edit
With JDK8 there are new APIs for Long and Integer which provide helper methods when treating long and int values as unsigned values.
compareUnsigned
divideUnsigned
parseUnsignedInt
parseUnsignedLong
remainderUnsigned
toUnsignedLong
toUnsignedString
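A short sketch of a few of the methods listed above, using values chosen only for illustration (the demo class name is hypothetical):

```java
// Hypothetical demo class exercising some of the JDK 8 unsigned helpers.
class UnsignedApiDemo {
    public static void main(String[] args) {
        int a = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned
        int b = 2;

        System.out.println(Integer.compareUnsigned(a, b) > 0); // true
        System.out.println(Integer.divideUnsigned(a, b));      // 2147483647
        System.out.println(Integer.remainderUnsigned(a, b));   // 1
        System.out.println(Long.divideUnsigned(-1L, 2L));      // 9223372036854775807
    }
}
```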
Additionally, Guava provides a number of helper methods to do similar things for the integer types, which helps close the gap left by the lack of native support for unsigned integers.
char is too small to support long arithmetic, for example.
Java does have unsigned types, or at least one: char is an unsigned short. So whatever excuse Gosling throws up it's really just his ignorance why there are no other unsigned types.
Also Short types: shorts are used all the time for multimedia. The reason is you can fit 2 samples in a single 32-bit unsigned long and vectorize many operations. Same thing with 8-bit data and unsigned byte. You can fit 4 or 8 samples in a register for vectorizing.
char for anything but characters.
As soon as signed and unsigned ints are mixed in an expression things start to get messy and you probably will lose information. Restricting Java to signed ints only really clears things up. I’m glad I don’t have to worry about the whole signed/unsigned business, though I sometimes do miss the 8th bit in a byte.
static_casts around much to mix them. It is indeed messy.
byte be signed as it was in Pascal.
& 0xFF'ing every byte-to-int promotion makes the code even messier.
http://skeletoncoder.blogspot.com/2006/09/java-tutorials-why-no-unsigned.html
This guy says it is because the C standard defines operations involving unsigned and signed ints to be treated as unsigned. This could cause negative signed integers to roll around into a large unsigned int, potentially causing bugs.
-1 --to any unsigned quantity--even zero.
-1 as "unknown" age (as the article suggests) is one of the classic examples of "code smell". For instance, if you want to compute "how much Alice is older than Bob?", and A=25 and B=-1, you will get an answer of ±26 which is simply wrong. The proper handling of unknown values is some kind of Option<TArg> where Some(25) - None would return None.
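In Java the closest standard counterpart to the Option type this comment has in mind is `java.util.Optional`. The sketch below assumes that mapping; the class and method names are hypothetical.

```java
import java.util.Optional;

// Hypothetical example: an unknown age is an empty Optional, not -1.
class AgeDifference {
    // The difference is only defined when both ages are known.
    static Optional<Integer> yearsOlder(Optional<Integer> a, Optional<Integer> b) {
        return a.flatMap(x -> b.map(y -> x - y));
    }

    public static void main(String[] args) {
        Optional<Integer> alice = Optional.of(25);
        Optional<Integer> bob = Optional.empty(); // age unknown

        System.out.println(yearsOlder(alice, bob));            // Optional.empty
        System.out.println(yearsOlder(alice, Optional.of(20))); // Optional[5]
    }
}
```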
I think Java is fine as it is, adding unsigned would complicate it without much gain. Even with the simplified integer model, most Java programmers don't know how the basic numeric types behave - just read the book Java Puzzlers to see what misconceptions you might hold.
As for practical advice:
If your values are somewhat arbitrary size and don't fit into int, use long. If they don't fit into long use BigInteger.
Use the smaller types only for arrays when you need to save space.
If you need exactly 64/32/16/8 bits, use long/int/short/byte and stop worrying about the sign bit, except for division, comparison, right shift, and casting.
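Two of the exceptions named in the last item, comparison and casting, can be sketched as follows. The class and method names are hypothetical, and the sign-bit flip is just one well-known way to get an unsigned comparison out of signed operators.

```java
// Hypothetical demo class; the names are illustrative only.
class SignBitCaveats {
    // Unsigned comparison built from signed operators: flipping the top bit
    // maps unsigned order onto signed order.
    static int compareAsUnsigned(int a, int b) {
        return Integer.compare(a ^ Integer.MIN_VALUE, b ^ Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        int big = 0x80000000; // 2147483648 if read as unsigned, Integer.MIN_VALUE as signed

        // Comparison: the signed operator gets the unsigned order wrong.
        System.out.println(big > 1);                       // false
        System.out.println(compareAsUnsigned(big, 1) > 0); // true

        // Casting: widening sign-extends unless the upper bits are masked off.
        System.out.println((long) big);        // -2147483648
        System.out.println(big & 0xFFFFFFFFL); // 2147483648
    }
}
```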
See also this answer about "porting a random number generator from C to Java".
>> and >>> for signed and unsigned, respectively. Shifting left is no problem.
>>> doesn't work for short and byte. For example, (byte)0xff>>>1 yields 0x7fffffff rather than 0x7f. Another example: byte b=(byte)0xff; b>>>=1; will result in b==(byte)0xff. Of course you can do b=(byte)((b & 0xff) >> 1); but this adds one more operation (bitwise &). Note the parentheses around b & 0xff: >> binds tighter than &, so without them the mask itself would be shifted.
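The behaviour described in this comment can be reproduced with a few lines. This is a sketch only; the helper name `shiftRightUnsigned` is made up for illustration.

```java
// Hypothetical demo of >>> on a byte operand.
class ByteShiftDemo {
    // Logical right shift of the 8-bit value held in a byte.
    static byte shiftRightUnsigned(byte b, int n) {
        return (byte) ((b & 0xff) >>> n); // mask first, then shift
    }

    public static void main(String[] args) {
        byte b = (byte) 0xff;

        // b is promoted to int (-1) before the shift, so the high bits are all ones.
        System.out.println(Integer.toHexString(b >>> 1)); // 7fffffff

        byte c = b;
        c >>>= 1; // the int result is narrowed back to byte
        System.out.println(c); // -1, i.e. still (byte)0xff

        System.out.println(shiftRightUnsigned(b, 1)); // 127, i.e. 0x7f
    }
}
```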
I know this post is too old; however, for your interest, in Java 8 and later you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32 - 1. Use the Integer class to use the int data type as an unsigned integer; static methods like compareUnsigned(), divideUnsigned() etc. have been added to the Integer class to support the arithmetic operations for unsigned integers.
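As a small illustration of that range, the sketch below parses and prints the maximum unsigned value through the Integer helpers (the demo class name is hypothetical):

```java
// Hypothetical demo: an int holding the largest unsigned 32-bit value.
class UnsignedParseDemo {
    public static void main(String[] args) {
        int max = Integer.parseUnsignedInt("4294967295"); // 2^32 - 1

        System.out.println(max);                           // -1 (same bits, signed view)
        System.out.println(Integer.toUnsignedString(max)); // 4294967295
        System.out.println(Integer.toUnsignedLong(max));   // 4294967295
    }
}
```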
With JDK8 it does have some support for them.
We may yet see full support of unsigned types in Java despite Gosling's concerns.
I've heard stories that they were to be included close to the original Java release. Oak was the precursor to Java, and in some spec documents there was mention of unsigned values. Unfortunately these never made it into the Java language. As far as anyone has been able to figure out they just didn't get implemented, likely due to a time constraint.
char) were left out because the designers thought they were a bad idea ... given the goals of the language.
I once took a C++ course with someone on the C++ standards committee who implied that Java made the right decision to avoid having unsigned integers because (1) most programs that use unsigned integers can do just as well with signed integers and this is more natural in terms of how people think, and (2) using unsigned integers results in lots of easy-to-create but difficult-to-debug issues such as integer arithmetic overflow and losing significant bits when converting between signed and unsigned types. If you mistakenly subtract 1 from 0 using signed integers it often more quickly causes your program to crash and makes it easier to find the bug than if it wraps around to 2^32 - 1, and compilers and static analysis tools and runtime checks have to assume you know what you're doing since you chose to use unsigned arithmetic. Also, negative numbers like -1 can often represent something useful, like a field being ignored/defaulted/unset, while if you were using unsigned you'd have to reserve a special value like 2^32 - 1 or something similar.
Long ago, when memory was limited and processors did not automatically operate on 64 bits at once, every bit counted a lot more, so having signed vs unsigned bytes or shorts actually mattered a lot more often and was obviously the right design decision. Today just using a signed int is more than sufficient in almost all regular programming cases, and if your program really needs to use values bigger than 2^31 - 1, you often just want a long anyway. Once you're into the territory of using longs, it's even harder to come up with a reason why you really can't get by with 2^63 - 1 positive integers. Whenever we go to 128 bit processors it'll be even less of an issue.
Your question is "Why doesn't Java support unsigned ints"?
And my answer to your question is that Java wants all of its primitive types (byte, char, short, int and long) to be treated as byte, word, dword and qword respectively, exactly as in assembly, and the Java operators are signed operations on all of its primitive types except char, where they are unsigned 16-bit only.
So static methods are supposed to provide the unsigned operations for both 32 and 64 bit.
You need a final class whose static methods can be called for the unsigned operations.
You can create this final class, give it whatever name you want, and implement its static methods.
If you have no idea how to implement the static methods, then this link may help you.
In my opinion, Java is not similar to C++ at all, since it supports neither unsigned types nor operator overloading, so I think Java should be treated as a completely different language from both C++ and C.
It is also completely different in the name of the languages by the way.
So I don't recommend in Java to type code similar to C and I don't recommend to type code similar to C++ at all, because then in Java you won't be able to do what you want to do next in C++, i.e. the code won't continue to be C++ like at all and for me this is bad to code like that, to change the style in the middle.
I recommend to write and use static methods also for the signed operations, so you don't see in the code mixture of operators and static methods for both signed and unsigned operations, unless you need only signed operations in the code, and it's okay to use the operators only.
Also I recommend avoiding the short, int and long primitive types and using word, dword and qword respectively instead, calling the static methods for unsigned and/or signed operations instead of using operators.
If you are about to do signed operations only and use the operators only in the code, then this is okay to use these primitive types short, int and long.
Actually word, dword and qword don't exist in the language, but you can create new class for each and the implementation of each should be very easy:
The class word holds the primitive type short only, the class dword holds the primitive type int only and the class qword holds the primitive type long only. Now you can implement all the unsigned and signed methods in each class, static or not as you choose: all the 16-bit operations, both unsigned and signed, under meaningful names on the word class; all the 32-bit operations, both unsigned and signed, under meaningful names on the dword class; and all the 64-bit operations, both unsigned and signed, under meaningful names on the qword class.
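A rough sketch of what such a 16-bit wrapper might look like. Everything here is an assumption made for illustration: the class is capitalised as `Word` to follow Java naming conventions, it is immutable, and only division and comparison are shown.

```java
// Hypothetical sketch of the "word" wrapper described above; not a standard class.
final class Word {
    private final short bits; // the only state: 16 raw bits

    Word(short bits) {
        this.bits = bits;
    }

    // The stored bits read as a value in 0..65535.
    int toUnsignedInt() {
        return bits & 0xFFFF;
    }

    // 16-bit unsigned division.
    Word divideUnsigned(Word other) {
        return new Word((short) (toUnsignedInt() / other.toUnsignedInt()));
    }

    // 16-bit signed division, as a method instead of the / operator.
    Word divideSigned(Word other) {
        return new Word((short) (bits / other.bits));
    }

    // Unsigned ordering of the two 16-bit values.
    int compareUnsigned(Word other) {
        return Integer.compare(toUnsignedInt(), other.toUnsignedInt());
    }

    // The raw signed view of the stored bits.
    short raw() {
        return bits;
    }
}
```

For example, `new Word((short) 0xFFFE).divideUnsigned(new Word((short) 2)).raw()` gives 32767, while `divideSigned` on the same operands gives -1.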
If you don't like giving too many different names for each method, you can always use overloading in Java, good to read that Java didn't remove that too!
If you want methods rather than operators for 8 bit signed operations and methods for 8 bit unsigned operations that have no operators at all, then you can create the Byte class (note that the first letter 'B' is capital, so this is not the primitive type byte) and implement the methods in this class.
About passing by value and passing by reference:
If I am not wrong, as in C#, primitives are passed by value, but for class objects what gets passed is a reference to the object, so objects of type Byte, word, dword and qword will be shared with the callee rather than copied by default. I wish Java had struct objects as C# has, so that Byte, word, dword and qword could all be implemented as structs instead of classes and be copied by default, like the primitive types. But Java has only classes and interfaces, which are handled through references, and we have to deal with that. So if you want to pass Byte, word, dword and qword objects by value, as with any other class object in Java and in C#, you simply have to use a copy constructor.
That's the only solution that I can think of. I just wish that I could typedef the primitive types to word, dword and qword, but Java supports neither typedef nor using, unlike C#, which supports using, the equivalent of C's typedef.
About output:
For the same sequence of bits, you can print them in many ways: As binary, as decimal (like the meaning of %u in C printf), as octal (like the meaning of %o in C printf), as hexadecimal (like the meaning of %x in C printf) and as integer (like the meaning of the %d in C printf).
Note that C printf doesn't know the type of the variables being passed as parameters to the function, so printf knows the type of each variable only from the char* object passed to the first parameter of the function.
So in each of the classes: Byte, word, dword and qword, you can implement print method and get the functionality of printf, even though the primitive type of the class is signed, you still can print it as unsigned by following some algorithm involving logical and shift operations to get the digits to print to the output.
Unfortunately the link I gave you doesn't show how to implement these print methods, but I am sure you can google for the algorithms you need to implement these print methods.
That's all I can offer in answer to your question.
Because unsigned type is pure evil.
The fact that in C unsigned - int produces unsigned is even more evil.
Here is a snapshot of the problem that burned me more than once:
// We have odd positive number of rays,
// consecutive ones at angle delta from each other.
assert( rays.size() > 0 && rays.size() % 2 == 1 );
// Get a set of ray at delta angle between them.
for( size_t n = 0; n < rays.size(); ++n )
{
    // Compute the angle between nth ray and the middle one.
    // The index of the middle one is (rays.size() - 1) / 2,
    // the rays are evenly spaced at angle delta, therefore
    // the magnitude of the angle between nth ray and the
    // middle one is:
    double angle = delta * fabs( n - (rays.size() - 1) / 2 );
    // Do something else ...
}
Have you noticed the bug yet? I confess I only saw it after stepping in with the debugger.
Because n is of unsigned type size_t the entire expression n - (rays.size() - 1) / 2 evaluates as unsigned. That expression is intended to be a signed position of the nth ray from the middle one: the 1st ray from the middle one on the left side would have position -1, the 1st one on the right would have position +1, etc. After taking abs value and multiplying by the delta angle I would get the angle between nth ray and the middle one.
Unfortunately for me the above expression contained the evil unsigned and instead of evaluating to, say, -1, it evaluated to 2^32-1. The subsequent conversion to double sealed the bug.
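For comparison, here is a sketch of the same index arithmetic in Java, where int is signed. It assumes five rays and a 32-bit size_t on the C++ side, as the 2^32-1 figure above implies; the class and method names are hypothetical.

```java
// Hypothetical demo: the ray offset computed with signed Java ints.
class RayIndexDemo {
    // Signed position of the nth ray relative to the middle one.
    static int signedOffset(int n, int size) {
        return n - (size - 1) / 2;
    }

    public static void main(String[] args) {
        int offset = signedOffset(1, 5); // 1 - 2

        System.out.println(offset);           // -1
        System.out.println(Math.abs(offset)); // 1, the intended magnitude

        // The same 32 bits read as unsigned, mimicking a 32-bit size_t:
        System.out.println(Integer.toUnsignedLong(offset)); // 4294967295
    }
}
```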
After a bug or two caused by misuse of unsigned arithmetic one has to start wondering whether the extra bit one gets is worth the extra trouble. I am trying, as much as feasible, to avoid any use of unsigned types in arithmetic, although I still use them for non-arithmetic operations such as binary masks.
unsigned gets converted to int at every operation what's the use of unsigned? It won't have any functionality distinguishable from short. And if you convert to int only on mixed operations, such as unsigned+int or unsigned+float, then you still have the problem of ((unsigned)25-(unsigned)30)*1.0 > 0, which is a major cause of unsigned-related bugs.
exit(1); really 'worth the extra trouble'? Is not being able to open large files really worth the security that less experienced java programmers will not mess up using unsigned?
n - (rays.size() - 1) / 2. You should always bracket binary operators because the reader of the code should not need to assume anything about order of operations in a computer program. Just because we conventionally say a+b*c = a+(b*c) does not mean you can assume this when reading code. Furthermore, the computation should be defined outside the loop so that it can be tested without the loop present. This is a bug in not making sure your types line up rather than a problem of unsigned integers. In C it's up to you to make sure your types line up.
There's a few gems in the 'C' spec that Java dropped for pragmatic reasons but which are slowly creeping back with developer demand (closures, etc).
I mention a first one because it's related to this discussion; the adherence of pointer values to unsigned integer arithmetic. And, in relation to this thread topic, the difficulty of maintaining Unsigned semantics in the Signed world of Java.
I would guess if one were to get a Dennis Ritchie alter ego to advise Gosling's design team it would have suggested giving Signed's a "zero at infinity", so that all address offset requests would first add their ALGEBRAIC RING SIZE to obviate negative values.
That way, any offset thrown at the array can never generate a SEGFAULT. For example in an encapsulated class which I call RingArray of doubles that needs unsigned behaviour - in "self rotating loop" context:
// ...
// Housekeeping state variable
long entrycount; // A sequence number
int cycle; // Number of loops cycled
int size; // Active size of the array because size<modulus during cycle 0
int modulus; // Maximal size of the array
// Ring state variables
private int head; // The 'head' of the Ring
private int tail; // The ring iterator 'cursor'
// tail may get the current cursor position
// and head gets the old tail value
// there are other semantic variations possible
// The Array state variable
double [] darray; // The array of doubles
// somewhere in constructor
public RingArray(int modulus) {
    super();
    this.modulus = modulus;
    tail = head = cycle = 0;
    darray = new double[modulus];
    // ...
}
// ...
double getElementAt(int offset){
    return darray[(tail+modulus+offset%modulus)%modulus];
}
// remember, the above is treating steady-state where size==modulus
// ...
The above RingArray would never ever 'get' from a negative index, even if a malicious requestor tried to. Remember, there are also many legitimate requests for asking for prior (negative) index values.
NB: The outer %modulus de-references legitimate requests whereas the inner %modulus masks out blatant malice from negatives more negative than -modulus. If this were to ever appear in a Java +..+9 || 8+..+ spec, then the problem would genuinely become a 'programmer who cannot "self rotate" FAULT'.
I'm sure the so-called Java unsigned int 'deficiency' can be made up for with the above one-liner.
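For what it's worth, Java 8 added `Math.floorMod`, which yields a non-negative result for a positive modulus and so covers the same ground. The sketch below compares it with the expression from getElementAt; the class and method names are hypothetical, and tail is assumed to lie in 0..modulus-1.

```java
// Hypothetical comparison of the ring index expression with Math.floorMod.
class RingIndexDemo {
    // The indexing expression from getElementAt above, extracted for comparison.
    static int ringIndex(int tail, int offset, int modulus) {
        return (tail + modulus + offset % modulus) % modulus;
    }

    // The same mapping via Math.floorMod (Java 8+).
    static int ringIndexFloorMod(int tail, int offset, int modulus) {
        return Math.floorMod(tail + offset % modulus, modulus);
    }

    public static void main(String[] args) {
        System.out.println(ringIndex(0, -1, 5));         // 4
        System.out.println(ringIndexFloorMod(0, -1, 5)); // 4
        System.out.println(ringIndex(2, -7, 5));         // 0
        System.out.println(ringIndexFloorMod(2, -7, 5)); // 0
    }
}
```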
PS: Just to give context to above RingArray housekeeping, here's a candidate 'set' operation to match the above 'get' element operation:
void addElement(long entrycount,double value){ // to be called only by the keeper of entrycount
    this.entrycount= entrycount;
    cycle = (int)entrycount/modulus;
    if(cycle==0){ // start-up is when the ring is being populated the first time around
        size = (int)entrycount; // during start-up, size is less than modulus so use modulo size arithmetic
        tail = (int)entrycount%size; // during start-up
    }
    else {
        size = modulus;
        head = tail;
        tail = (int)entrycount%modulus; // after start-up
    }
    darray[head] = value; // always overwrite old tail
}
I can think of one unfortunate side-effect. In java embedded databases, the number of ids you can have with a 32bit id field is 2^31, not 2^32 (~2billion, not ~4billion).
The reason IMHO is because they are/were too lazy to implement/correct that mistake. Suggesting that C/C++ programmers do not understand unsigned, structure, union, bit flag... is just preposterous.
Either you were talking with a basic/bash/java programmer on the verge of beginning programming a la C, without any real knowledge of this language, or you are just talking out of your own mind. ;)
When you deal every day with formats, either from file or hardware, you begin to question what in the hell they were thinking.
A good example here would be trying to use an unsigned byte as a self rotating loop. For those of you who do not understand the last sentence, how on earth do you call yourself a programmer.
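One reading of a "self rotating" unsigned byte is a counter that rolls over from 255 to 0 by itself. A sketch of how that is usually emulated in Java, under that assumption and with hypothetical names:

```java
// Hypothetical demo of an 8-bit counter that wraps from 255 back to 0.
class RotatingCounterDemo {
    // Advance an 8-bit counter held in an int; the mask makes 255 roll over to 0.
    static int next(int counter) {
        return (counter + 1) & 0xFF;
    }

    public static void main(String[] args) {
        int i = 254;
        i = next(i);
        System.out.println(i); // 255
        i = next(i);
        System.out.println(i); // 0

        // A real byte also wraps on ++, but has to be masked when read back.
        byte b = (byte) 127;
        b++;
        System.out.println(b);        // -128
        System.out.println(b & 0xFF); // 128
    }
}
```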
DC