String Comparison in a Little Detail

Why we avoid the '==' operator to compare two strings

We all know that we should use the String class's equals method to compare two strings. Using the '==' operator to compare two distinct but equivalent String objects gives 'false'; if both operands are String literals, it gives 'true'. The concern is that you don't know whether a String reference points to a literal or to a separately created String object. That's why you are always encouraged to use the equals method for string comparison.
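As a minimal sketch of that concern (the class and method names here are hypothetical, chosen only for illustration), consider a method that receives a string without knowing where it came from:

```java
// Hypothetical example: UnknownOrigin and its methods are illustrative names.
class UnknownOrigin {
    // Compares by reference: true only when both refer to the same object.
    static boolean isYesByReference(String answer) {
        return answer == "yes";
    }

    // Compares by content: true whenever the characters match.
    static boolean isYesByEquals(String answer) {
        return "yes".equals(answer);
    }

    public static void main(String[] args) {
        String literal = "yes";
        // Built at run time, so it is a separate object with the same content.
        String built = new String(new char[] {'y', 'e', 's'});

        System.out.println(isYesByReference(literal)); // true
        System.out.println(isYesByReference(built));   // false
        System.out.println(isYesByEquals(literal));    // true
        System.out.println(isYesByEquals(built));      // true
    }
}
```

The caller of isYesByReference cannot tell from the signature which case it is in, which is why equals is the safe default.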

Let's check a few examples with String literals, which are the focus of this article:

/**
 * Created by eananthaneshan on 6/8/16.
 */
public class Test {
    public static void main(String[] args) {
        String a = "a";
        String b = "b";
        String ab = "ab";
        System.out.println(ab==ab);
        System.out.println(ab==(a+b));
        System.out.println(ab== ("a" + "b"));
    }
}

Every string in this program starts out as a string literal, so why does the second println print false while the third prints true again?

Let's see the decompiled code of the above Test.java file:

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

public class Test {
    public Test() {
    }

    public static void main(String[] args) {
        String a = "a";
        String b = "b";
        String ab = "ab";
        System.out.println(ab == ab);
        System.out.println(ab == a + b);
        System.out.println(ab == "ab");
    }
}

Now we understand why the third println prints true: the compiler concatenates the two string constants, so the literal "ab" is created at compile time and is the same object that the reference 'ab' points to. We still have no clue why the second println prints false.
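This folding applies only to compile-time constant expressions. As a small sketch (the class and method names are illustrative, not from the original program), declaring the locals final turns a + b into a constant expression as well, so the compiler can fold it just like "a" + "b":

```java
// Illustrative sketch: ConstantFolding is a hypothetical class name.
class ConstantFolding {
    static boolean finalLocalsFold() {
        final String a = "a";
        final String b = "b";
        String ab = "ab";
        // a and b are constant variables, so a + b is a constant expression
        // and the compiler replaces it with the literal "ab".
        return ab == (a + b);
    }

    static boolean plainLocalsDoNotFold() {
        String a = "a";
        String b = "b";
        String ab = "ab";
        // a and b are ordinary variables, so nothing is folded at compile time.
        return ab == (a + b);
    }

    public static void main(String[] args) {
        System.out.println(finalLocalsFold());      // true
        System.out.println(plainLocalsDoNotFold()); // false
    }
}
```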

Let's look at the bytecode of the Test.class file.

Es-MacBook-Pro:TestString eananthaneshan$ javap -c -verbose Test
Classfile /Users/eananthaneshan/WSO2/samples/TestString/out/production/TestString/Test.class
  Last modified Aug 6, 2016; size 937 bytes
  MD5 checksum 6571cdd8b9c826e9e6baacb1660db7bf
  Compiled from "Test.java"
public class Test
  SourceFile: "Test.java"
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #12.#34        //  java/lang/Object."<init>":()V
   #2 = String             #24            //  a
   #3 = String             #26            //  b
   #4 = String             #27            //  ab
   #5 = Fieldref           #35.#36        //  java/lang/System.out:Ljava/io/PrintStream;
   #6 = Methodref          #37.#38        //  java/io/PrintStream.println:(Z)V
   #7 = Class              #39            //  java/lang/StringBuilder
   #8 = Methodref          #7.#34         //  java/lang/StringBuilder."<init>":()V
   #9 = Methodref          #7.#40         //  java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #10 = Methodref          #7.#41         //  java/lang/StringBuilder.toString:()Ljava/lang/String;
  #11 = Class              #42            //  Test
  #12 = Class              #43            //  java/lang/Object
  #13 = Utf8               <init>
  #14 = Utf8               ()V
  #15 = Utf8               Code
  #16 = Utf8               LineNumberTable
  #17 = Utf8               LocalVariableTable
  #18 = Utf8               this
  #19 = Utf8               LTest;
  #20 = Utf8               main
  #21 = Utf8               ([Ljava/lang/String;)V
  #22 = Utf8               args
  #23 = Utf8               [Ljava/lang/String;
  #24 = Utf8               a
  #25 = Utf8               Ljava/lang/String;
  #26 = Utf8               b
  #27 = Utf8               ab
  #28 = Utf8               StackMapTable
  #29 = Class              #23            //  "[Ljava/lang/String;"
  #30 = Class              #44            //  java/lang/String
  #31 = Class              #45            //  java/io/PrintStream
  #32 = Utf8               SourceFile
  #33 = Utf8               Test.java
  #34 = NameAndType        #13:#14        //  "<init>":()V
  #35 = Class              #46            //  java/lang/System
  #36 = NameAndType        #47:#48        //  out:Ljava/io/PrintStream;
  #37 = Class              #45            //  java/io/PrintStream
  #38 = NameAndType        #49:#50        //  println:(Z)V
  #39 = Utf8               java/lang/StringBuilder
  #40 = NameAndType        #51:#52        //  append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #41 = NameAndType        #53:#54        //  toString:()Ljava/lang/String;
  #42 = Utf8               Test
  #43 = Utf8               java/lang/Object
  #44 = Utf8               java/lang/String
  #45 = Utf8               java/io/PrintStream
  #46 = Utf8               java/lang/System
  #47 = Utf8               out
  #48 = Utf8               Ljava/io/PrintStream;
  #49 = Utf8               println
  #50 = Utf8               (Z)V
  #51 = Utf8               append
  #52 = Utf8               (Ljava/lang/String;)Ljava/lang/StringBuilder;
  #53 = Utf8               toString
  #54 = Utf8               ()Ljava/lang/String;
{
  public Test();
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0       
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return        
      LineNumberTable:
        line 4: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       5     0  this   LTest;

  public static void main(java.lang.String[]);
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=4, locals=4, args_size=1
         0: ldc           #2                  // String a
         2: astore_1      
         3: ldc           #3                  // String b
         5: astore_2      
         6: ldc           #4                  // String ab
         8: astore_3      
         9: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
        12: aload_3       
        13: aload_3       
        14: if_acmpne     21
        17: iconst_1      
        18: goto          22
        21: iconst_0      
        22: invokevirtual #6                  // Method java/io/PrintStream.println:(Z)V
        25: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
        28: aload_3       
        29: new           #7                  // class java/lang/StringBuilder
        32: dup           
        33: invokespecial #8                  // Method java/lang/StringBuilder."<init>":()V
        36: aload_1       
        37: invokevirtual #9                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        40: aload_2       
        41: invokevirtual #9                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        44: invokevirtual #10                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        47: if_acmpne     54
        50: iconst_1      
        51: goto          55
        54: iconst_0      
        55: invokevirtual #6                  // Method java/io/PrintStream.println:(Z)V
        58: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
        61: aload_3       
        62: ldc           #4                  // String ab
        64: if_acmpne     71
        67: iconst_1      
        68: goto          72
        71: iconst_0      
        72: invokevirtual #6                  // Method java/io/PrintStream.println:(Z)V
        75: return        
      LineNumberTable:
        line 6: 0
        line 7: 3
        line 8: 6
        line 9: 9
        line 10: 25
        line 11: 58
        line 12: 75
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0      76     0  args   [Ljava/lang/String;
               3      73     1     a   Ljava/lang/String;
               6      70     2     b   Ljava/lang/String;
               9      67     3    ab   Ljava/lang/String;
      StackMapTable: number_of_entries = 6
           frame_type = 255 /* full_frame */
          offset_delta = 21
          locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String, class java/lang/String ]
          stack = [ class java/io/PrintStream ]
           frame_type = 255 /* full_frame */
          offset_delta = 0
          locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String, class java/lang/String ]
          stack = [ class java/io/PrintStream, int ]
           frame_type = 95 /* same_locals_1_stack_item */
          stack = [ class java/io/PrintStream ]
           frame_type = 255 /* full_frame */
          offset_delta = 0
          locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String, class java/lang/String ]
          stack = [ class java/io/PrintStream, int ]
           frame_type = 79 /* same_locals_1_stack_item */
          stack = [ class java/io/PrintStream ]
           frame_type = 255 /* full_frame */
          offset_delta = 0
          locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String, class java/lang/String ]
          stack = [ class java/io/PrintStream, int ]

}

Go to the main method and look at the first instruction, ldc #2: the JVM is loading a value from the constant pool. Go to the constant pool and see what is in #2. It is a String reference that points to #24, and #24 is a Utf8 entry with the value 'a'. What does Utf8 mean? It is a stream of bytes representing a UTF-8 encoded sequence of characters[1]. So, this is how string literals are loaded.
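One consequence of loading literals this way can be sketched as follows (both class names are hypothetical): every class has its own constant pool, but equal string literals still resolve to the same String object at run time, so '==' holds even across classes.

```java
// Illustrative sketch: SharedLiterals and LiteralHolder are hypothetical names.
class SharedLiterals {
    static boolean sameObjectAcrossClasses() {
        // This "hello" comes from SharedLiterals' own constant pool, yet it
        // resolves to the same String object as the one in LiteralHolder.
        return LiteralHolder.greeting() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}

class LiteralHolder {
    static String greeting() {
        return "hello"; // loaded from LiteralHolder's constant pool
    }
}
```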

What happens when System.out.println(ab == a + b) is executed? See offset 29 in the bytecode of the main method: a new StringBuilder is created, append() is called on it twice (once for a and once for b), and then toString() returns a reference to a String object.

Let's see the StringBuilder.toString() method[2]

public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }

The StringBuilder.toString() method creates a new String object. That object is not the pooled literal, so the reference returned points to a different memory location than 'ab' does.
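To make this concrete, here is a rough hand-written equivalent of what the bytecode above does for a + b (the class and method names are illustrative). Newer compilers may emit a different instruction sequence for string concatenation, but the result of a non-constant concatenation is still a newly created String:

```java
// Illustrative sketch: ConcatAtRuntime is a hypothetical class name.
class ConcatAtRuntime {
    // Mirrors the new StringBuilder / append / append / toString sequence.
    static String concat(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        String ab = "ab";
        String built = concat("a", "b");

        System.out.println(ab == built);      // false: different objects
        System.out.println(ab.equals(built)); // true: same characters
        // Each call to toString() produces yet another object.
        System.out.println(concat("a", "b") == concat("a", "b")); // false
    }
}
```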

We should also see what happens when we create a new String object explicitly.
First, the Java code that creates a new String:

/**
 * Created by eananthaneshan on 6/8/16.
 */
public class Test {
    public static void main(String[] args) {
        String a = new String("a");
    }
}

Here is the decompiled code of the above Java file.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

public class Test {
    public Test() {
    }

    public static void main(String[] args) {
        new String("a");
    }
}

Not much difference. We should look at String's constructor too[3]:

/**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

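Interestingly, the constructor takes a String as its argument and stores that string's char array (sharing the array when it is exactly the right size, copying the relevant range otherwise). The new object is a distinct String instance, so its reference is equal to neither the argument's reference nor the literal's.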

Finally, the bytecode.

Es-MacBook-Pro:TestString eananthaneshan$ javap -c -verbose Test                                                                                                                           
Classfile /Users/eananthaneshan/WSO2/samples/TestString/out/production/TestString/Test.class
  Last modified Aug 6, 2016; size 475 bytes
  MD5 checksum 0d23a6e244f295b3d7bce7241b7022d7
  Compiled from "Test.java"
public class Test
  SourceFile: "Test.java"
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#23         //  java/lang/Object."<init>":()V
   #2 = Class              #24            //  java/lang/String
   #3 = String             #20            //  a
   #4 = Methodref          #2.#25         //  java/lang/String."<init>":(Ljava/lang/String;)V
   #5 = Class              #26            //  Test
   #6 = Class              #27            //  java/lang/Object
   #7 = Utf8               b
   #8 = Utf8               Ljava/lang/String;
   #9 = Utf8               <init>
  #10 = Utf8               ()V
  #11 = Utf8               Code
  #12 = Utf8               LineNumberTable
  #13 = Utf8               LocalVariableTable
  #14 = Utf8               this
  #15 = Utf8               LTest;
  #16 = Utf8               main
  #17 = Utf8               ([Ljava/lang/String;)V
  #18 = Utf8               args
  #19 = Utf8               [Ljava/lang/String;
  #20 = Utf8               a
  #21 = Utf8               SourceFile
  #22 = Utf8               Test.java
  #23 = NameAndType        #9:#10         //  "<init>":()V
  #24 = Utf8               java/lang/String
  #25 = NameAndType        #9:#28         //  "<init>":(Ljava/lang/String;)V
  #26 = Utf8               Test
  #27 = Utf8               java/lang/Object
  #28 = Utf8               (Ljava/lang/String;)V
{
  public Test();
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0       
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return        
      LineNumberTable:
        line 4: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       5     0  this   LTest;

  public static void main(java.lang.String[]);
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=3, locals=2, args_size=1
         0: new           #2                  // class java/lang/String
         3: dup           
         4: ldc           #3                  // String a
         6: invokespecial #4                  // Method java/lang/String."<init>":(Ljava/lang/String;)V
         9: astore_1      
        10: return        
      LineNumberTable:
        line 7: 0
        line 8: 10
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0      11     0  args   [Ljava/lang/String;
              10       1     1     a   Ljava/lang/String;
}

Look at the main method: first a new String object is created, then 'a' is loaded from the constant pool, then the constructor is called with the loaded 'a' as its parameter. The result is a newly created String object, and only the reference to this new object is stored in the variable.
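The effect of that sequence can be checked with a short sketch (the class and method names are illustrative): every new String("a") yields a fresh object, so reference comparison fails even though the content matches.

```java
// Illustrative sketch: NewStringDemo is a hypothetical class name.
class NewStringDemo {
    // A freshly constructed String is never the pooled literal itself.
    static boolean sameAsLiteral() {
        return new String("a") == "a";
    }

    // Two constructor calls give two separate objects.
    static boolean twoNewAreSame() {
        return new String("a") == new String("a");
    }

    // The content is still equal.
    static boolean contentEqual() {
        return new String("a").equals("a");
    }

    public static void main(String[] args) {
        System.out.println(sameAsLiteral()); // false
        System.out.println(twoNewAreSame()); // false
        System.out.println(contentEqual());  // true
    }
}
```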

How to take advantage of String literals

Comparing String literals with '==' is faster[4] than comparing String objects with equals, because only the references are compared. So how can we use that? We can take advantage of String.intern()[5]. The intern method returns the string pool reference that is equivalent to the given String object. So when comparing, call intern on the String object and then use the '==' operator.
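A minimal sketch of that idea (the class and method names are illustrative): a string built at run time does not match the literal by reference, but its interned reference does.

```java
// Illustrative sketch: InternDemo is a hypothetical class name.
class InternDemo {
    // intern() returns the pooled String with the same content,
    // which is the same object the literal "ab" refers to.
    static boolean internedMatchesLiteral(String s) {
        return s.intern() == "ab";
    }

    public static void main(String[] args) {
        String a = "a";
        String b = "b";
        String built = a + b; // created at run time, not the pooled literal

        System.out.println(built == "ab");                 // false
        System.out.println(built.intern() == "ab");        // true
        System.out.println(internedMatchesLiteral(built)); // true
    }
}
```

Note that intern() itself has to look the string up in the pool, so this likely pays off only when the interned reference is stored and compared many times.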

You may read these articles for more related information:
[1] http://blog.jamesdbloom.com/JVMInternals.html
[4] http://cs-fundamentals.com/tech-interview/java/use-of-string-intern-method.php
