The Artima Developer Community

Ruby Buzz Forum
From Iconv#iconv to String#encode

Eric Hodel


Eric Hodel is a long-time Rubyist and co-founder of Seattle.rb.
Posted: Dec 17, 2010 8:14 PM

This post originated from an RSS feed registered with Ruby Buzz by Eric Hodel.
Original Post: From Iconv#iconv to String#encode
Feed Title: Segment7
Feed URL: http://blog.segment7.net/articles.rss
Feed Description: Posts about and around Ruby, MetaRuby, ruby2c, ZenTest and work at The Robot Co-op.


In Ruby 1.8, if you wanted to transcode between two character sets, you had to use Iconv. James has an excellent explanation of Encoding Conversion With iconv on his blog.

Ruby 1.9.3 will warn when iconv is required:

$ ruby19 -v -riconv -e 0
ruby 1.9.3dev (2010-12-17 trunk 30231) [x86_64-darwin10.5.0]
iconv will be deprecated in the future, use String#encode instead.

runpaint has a great description of Encoding in Ruby 1.9 and specifically transcoding.

I don't wish to duplicate anything in James' or runpaint's documentation, so I'll just run through a simple example of porting from Iconv#iconv to String#encode.

We'll start with part of James' final example, which converts from UTF-8 to Latin 1 (ISO-8859-1), transliterates characters, and ignores unknown sequences:

require "iconv"

utf8_to_latin1 = Iconv.new("LATIN1//TRANSLIT//IGNORE", "UTF8")

on_and_on = "On and on… and on…"
utf8_to_latin1.iconv(on_and_on)  # => "On and on... and on..."

Ruby 1.9.2's String#encode doesn't support transliteration. Ruby 1.9.3 adds support for transliteration, but it doesn't ship Iconv's pre-built transliteration tables.

For Ruby 1.9.2, using String#encode instead of Iconv looks like this:

# encoding: UTF-8

on_and_on = "On and on… and on…"

on_and_on.encode Encoding::ISO_8859_1

However, the ellipsis character (U+2026) has no equivalent in ISO-8859-1, so String#encode raises an error:

U+2026 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)
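If you want to see exactly which character failed, you can rescue the exception: Encoding::UndefinedConversionError records the offending character and the names of both encodings. A minimal sketch, where the `error` variable exists only for illustration:

```ruby
# encoding: UTF-8

on_and_on = "On and on… and on…"

# Capture the exception instead of letting it escape, so it can be inspected.
error = begin
  on_and_on.encode Encoding::ISO_8859_1
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

p error.error_char                 # the first character that could not be converted
p error.source_encoding_name       # the encoding being converted from
p error.destination_encoding_name  # the encoding being converted to
```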

String#encode accepts an options Hash, which is described in runpaint's transcoding section. For 1.9.2 we tell String#encode to replace undefined characters with the replacement character:

# encoding: UTF-8

on_and_on = "On and on… and on…"

result = on_and_on.encode Encoding::ISO_8859_1, :undef => :replace

p result # => "On and on? and on?"

This isn't as nice as Iconv's transliteration, but it's the best 1.9.2 can do short of using the more verbose Encoding::Converter, which lets you recover from undefined conversions yourself. Even that approach lacks Iconv's transliteration tables, so you would have to supply your own.
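For reference, here is a rough sketch of that Encoding::Converter approach under 1.9.2, built on Encoding::Converter#primitive_convert, #primitive_errinfo and #insert_output. The helper name `encode_with_table` and its table argument are hypothetical, and the table is something you write yourself; this is not Iconv's transliteration.

```ruby
# encoding: UTF-8

# Hypothetical helper: transcode +str+ into the +to+ encoding, replacing each
# character that has no equivalent in +to+ with table[char], or "?" when the
# table has no entry for it. Assumes str.encoding differs from +to+.
def encode_with_table(str, to, table)
  ec  = Encoding::Converter.new(str.encoding, to)
  src = str.dup                        # primitive_convert consumes its input
  dst = String.new.force_encoding(to)  # converted bytes are appended here

  loop do
    case ec.primitive_convert(src, dst)
    when :finished
      return dst
    when :undefined_conversion
      # primitive_errinfo => [result, enc1, enc2, error_bytes, readagain_bytes]
      _, enc1, _, error_bytes, _ = ec.primitive_errinfo
      char = error_bytes.force_encoding(enc1)
      # insert_output transcodes the replacement into the destination encoding
      ec.insert_output(table.fetch(char, "?"))
    else
      # :invalid_byte_sequence or :incomplete_input
      raise ec.last_error || "unexpected converter result"
    end
  end
end

on_and_on = "On and on… and on…"
p encode_with_table(on_and_on, Encoding::ISO_8859_1, { "…" => "..." })
```

Each :undefined_conversion result leaves the already-converted text in dst and the rest of the input in src, so the loop only has to insert a replacement and resume.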

In 1.9.3, String#encode's options Hash will support a new option, :fallback, which can be either a Hash or a Proc that maps undefined characters to replacement Strings. The replacement String that :fallback returns can be in any encoding; String#encode will attempt to transcode it to the destination encoding. Here is the same example with a current 1.9.3dev build:

# encoding: UTF-8

on_and_on = "On and on… and on…"

result = on_and_on.encode Encoding::ISO_8859_1, :fallback => { "…" => '...' }

p result # => "On and on... and on..."
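Because :fallback also accepts a Proc, you can combine your own transliteration table with a default for anything the table doesn't cover. A sketch assuming Ruby 1.9.3 or later; the TRANSLIT table and the to_latin1 name are hypothetical, and the entries are only examples, not Iconv's tables.

```ruby
# encoding: UTF-8

# Hypothetical transliteration table; none of these characters exist in ISO-8859-1.
TRANSLIT = {
  "…" => "...",
  "“" => '"',
  "”" => '"',
  "€" => "EUR",
}

# The Proc is called with each undefined character. Returning "?" for anything
# missing from the table avoids the Encoding::UndefinedConversionError that a
# Hash fallback raises when it has no entry for a character.
to_latin1 = lambda { |char| TRANSLIT.fetch(char, "?") }

quote  = "“On and on…” for 5 € ☃"
result = quote.encode Encoding::ISO_8859_1, :fallback => to_latin1

p result # => "\"On and on...\" for 5 EUR ?"
```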

Finally, the above examples all assume that the input string is valid for its encoding. You can use String#valid_encoding? to check for invalid byte sequences. If your input string may contain invalid byte sequences, add :invalid => :replace to the encode options Hash to map these sequences to the replacement character.
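A short sketch of both, plus the :replace option, which sets the replacement string. Combining all three roughly approximates Iconv's //IGNORE by dropping anything that can't be converted. The sample strings are made up for illustration.

```ruby
# encoding: UTF-8

# A UTF-8 string with a stray 0xFF byte, which never appears in valid UTF-8
# (for example, data that was mislabeled as UTF-8).
broken = "On and on\xFF and on"
p broken.valid_encoding? # => false

# :invalid => :replace swaps the bad byte for the replacement character,
# which is "?" when the destination is not a Unicode encoding.
p broken.encode(Encoding::ISO_8859_1, :invalid => :replace) # => "On and on? and on"

# Roughly like Iconv's //IGNORE: replace invalid and undefined input with "".
mixed   = broken + " and on…"
ignored = mixed.encode(Encoding::ISO_8859_1,
                       :invalid => :replace, :undef => :replace, :replace => "")
p ignored # => "On and on and on and on"
```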



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use