When Are Two Identical Strings Different?

Think these two strings are the same?

"R. Padre Chagas 342"
"R. Padre Chagas 342"

Pretty much, right? So we open IRB:

1.9.3p0 :085 > "R. Padre Chagas 342" == "R. Padre Chagas 342"
  # => false

WTF!? It took me a couple of minutes of head-scratching to figure this one out. Let's look closer:

1.9.3p0 :086 > "R. Padre Chagas 342".bytes.to_a
  # => [82, 46, 32, 80, ... , 67, 104, 97, 103, 97, 115, 32, 51, 52, 50] 
1.9.3p0 :087 > "R. Padre Chagas 342".bytes.to_a
  # => [82, 46, 32, 80, ... , 67, 104, 97, 103, 97, 115, 194, 160, 51, 52, 50]


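Aha, there it is: an extra hidden byte! Where the first string has a plain space (byte 32), the second has two bytes, 194 and 160. That is the UTF-8 encoding of U+00A0, the non-breaking space, which renders exactly like an ordinary space. Crazy, huh? Even closer: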

1.9.3p0 :088 > "R. Padre Chagas 342".byteslice(9..15)
 # => "Chagas " 
1.9.3p0 :089 > "R. Padre Chagas 342".byteslice(9..15)
 # => "Chagas\xC2"
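`byteslice` counts bytes, not characters, so the range 9..15 stops halfway through the two-byte sequence and leaves a lone `\xC2`. Below is a minimal sketch for spotting look-alike characters directly instead of eyeballing byte arrays. It assumes a modern Ruby (2.x or later) and a UTF-8 string; `non_ascii_chars` is a hypothetical helper, not a built-in, and the `scraped` value simply reproduces the string from above with an explicit `\u00A0`.

```ruby
# Hypothetical helper: list every non-ASCII character in a string
# together with its character index, code point, and UTF-8 bytes.
def non_ascii_chars(str)
  str.each_char.with_index
     .reject { |ch, _i| ch.ascii_only? }
     .map { |ch, i| [i, format("U+%04X", ch.ord), ch.bytes] }
end

# Same text as above, with U+00A0 NO-BREAK SPACE written out explicitly.
scraped = "R. Padre Chagas\u00A0342"

p non_ascii_chars(scraped) # => [[15, "U+00A0", [194, 160]]]

# byteslice works on bytes, so ending at byte 15 cuts the character in half...
half = scraped.byteslice(9..15)
p half                     # => "Chagas\xC2"
p half.valid_encoding?     # => false

# ...while ordinary indexing works on characters and keeps it whole.
whole = scraped[9..15]
p whole.length             # => 7 characters
p whole.bytesize           # => 8 bytes (the last character takes two)
```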

This is what happens when you scrape data from the web. Don't believe everything you see :)
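One way to defend against this is to normalize whitespace right after scraping. The sketch below assumes you want every Unicode space separator (general category Zs, which includes U+00A0) treated as a plain ASCII space; `normalize_spaces` is a hypothetical helper name.

```ruby
# Hypothetical helper: map every Unicode space separator (\p{Zs}, which
# includes U+00A0 NO-BREAK SPACE) to an ASCII space, then collapse runs
# of spaces and trim the ends.
def normalize_spaces(str)
  str.gsub(/\p{Zs}/, " ").squeeze(" ").strip
end

typed   = "R. Padre Chagas 342"
scraped = "R. Padre Chagas\u00A0342"

p typed == scraped                                     # => false
p normalize_spaces(typed) == normalize_spaces(scraped) # => true
```

Note that `\s` would not catch this character, since in Ruby it matches only ASCII whitespace. On Ruby 2.2 and later, `unicode_normalize(:nfkc)` also folds U+00A0 into a plain space, but it rewrites other compatibility characters (ligatures, full-width forms) as well, so it is a broader change than you may want.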