Issue
new BitSet(Integer.MAX_VALUE).size()
reports a negative value:
import java.util.BitSet;
public class NegativeBitSetSize {
public static void main(String[] args) {
BitSet a;
a = new BitSet(Integer.MAX_VALUE);
System.out.println(a.size()); // -2147483648
a = new BitSet(Integer.MAX_VALUE - 50);
System.out.println(a.size()); // -2147483648
a = new BitSet(Integer.MAX_VALUE - 62);
System.out.println(a.size()); // -2147483648
a = new BitSet(Integer.MAX_VALUE - 63);
System.out.println(a.size()); // 2147483584
}
}
On the test system:
$ java -version
openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment (build 11.0.14+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.14+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
I couldn't find a bug report for this. Is this known or documented?
Solution
I doubt this would be documented. It certainly won't be 'fixed', as there is no sensible fix available that doesn't break backwards compatibility, and it is nowhere near relevant enough to take such drastic steps.
Digging under the hood - why is this happening?
Whilst the API docs make no such guarantee, the effect of size()
is that it simply returns the nBits
value you passed when you constructed the BitSet
instance... but rounded up to the next value that is evenly divisible by 64:
sysout(new BitSet(1).size()); // 64
sysout(new BitSet(63).size()); // 64
sysout(new BitSet(64).size()); // 64
sysout(new BitSet(65).size()); // 128
sysout(new BitSet(100).size()); // 128
sysout(new BitSet(128).size()); // 128
sysout(new BitSet(129).size()); // 192
This is logical; the implementation uses an array of long
values to store these bits (as that's (by a factor of 8!) more efficient than using e.g. a boolean[]
, as each boolean still takes up a byte in an array, and an entire long's worth of bits as lone variable).
The spec doesn't guarantee this, but it explains why this is happening.
It then also explains why you are witnessing what you are: Integer.MAX_VALUE
is 2147483647. Round that up to the nearest multiple of 64 and you get... 2147483648. Which overflows int
- and Integer.MAX_VALUE + 1
/ (int) 2147483648L
- are both the same value: -2147483648. That is the one value that exists in signed int
space as a negative number with no matching positive number (that makes sense too: Some bit sequence needs to represent 0 which is neither positive or negative. By convention / by the rules of 2s complement, which is how java represents in bit form all numbers, the 0 is in the 'positive' space (given that it's all 0 bits). It thus 'leaches' a number from there, and that number is 2147483648.
Let's fix it!
One easy fix is to have the size()
method return a long
instead, which can trivially represent 2147483648, which is the correct answer. Unfortunately, this is not backwards compatible. Hence, extremely unlikely to succeed if one would ask for that change.
Another fix is to create a second method with some throw-in-the-towel name such as accurateSize()
or whatnot, so that size()
remains unmolested and thus backwards compatibility is preserved, which does return long
. But this is dirtying up the API forever, for a detail that is irrelevant for all cases except the largest 63 numbers you can ask for. (Integer.MAX_VALUE-62 through Integer.MAX_VALUE are the only values you can pass for nBits which result in size()
returning a negative value. The negative value returned will always be Integer.MIN_VALUE
. I doubt they'd do that.
A third fix is to lie and return Integer.MAX_VALUE instead, which isn't quite the right value (as 1 bit more is in fact 'available' in the bit space). Given that you can't actually 'set' that bit value, as you can't pass 2147483648 to the constructor (as you must pass an int
, that number is not passable as an int, if you try you end up with -2147483648, which is negative and causes the constructor to throw, hence not giving you an instance: Without hackery such as using reflection to set private fields, which APIs do not need to adress, you can't make a BitSet that can actually store the value of the 2147483648th bit.
This then gets us to what the point of size()
is. Is it for telling you the amount of bytes that the BitSet object occupies? If that's the point, it's never been a great way to go about it: The JVM doesn't guarantee that a long[]
's memory size is arrSize*8 bytes (though all JVM impls have that, + some low overhead for the array's header structure).
Instead it is perhaps simply to let you know what you can do with it. Even if you call, say, new BitSet(5)
, you can still set the 6th bit (because why not - it doesn't "cost" anything, I guess that was the intent). You can set all bits from 0 up to the .size()
minus 1.
And this gets us to the real answer!
size() is not actually broken. The number returned is entirely correct: That is, in fact, the size. It's merely that when you print it, it 'prints wrong' - because size()
's return value should be interpreted as unsigned. The javadoc of size()
explicitly calls out its only point, which is to take that number, and subtract 1: This then tells you the maximum element you can set.
And this works just fine:
BitSet x = new BitSet(Integer.MAX_VALUE);
int maxIndex = x.size() - 1;
System.out.println(maxIndex);
x.set(maxIndex);
The above code works fine. That maxIndex value is 2147483647 (Which is Integer.MAX_VALUE) as expected.
Hence, there's really nothing to do here: The API is fine as is and does what it suggests you use it for accurately. Any API you care to come up with that's 'better' would be backwards incompatible; changing BitSet is not a good idea, adding more methods, java.util.Vector
style uglies up the API which is definitely a case of the cure being worse than the disease.
That just leaves adding notes to the docs. If you delve into this level of exotics in docs, you end up with huge documentation that is, again, a cure worse than the disease. The sustainable solution would perhaps be for javadoc to gain a fundamental ability to write esoteric footnotes, which e.g. the javadoc
tool can turn into HTML by way of a 'folding' popdown interface element that is folded up by default (i.e. the exotic footnotes are not visible), but can be expanded if you really want to read the details.
Javadoc doesn't have this.
CONCLUSION: One can easily argue the API isn't broken at all; nothing in size()
explicitly says that the returned value should be interpreted as a signed int; the only explicit promise is that you can subtract 1 from the result and use that as index, which works fine. At best, you could file a bug report to get the docs updated, except that's not a good idea because it's not (easily) possible to add such esoterics to the documentation. If you do want to go down that path, there's a lot more of this sort of thing in the JDK libraries that aren't documented either.
Answered By - rzwitserloot
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.