The Artima Developer Community
Ruby Buzz Forum
String Encoding Quick-Start

Eric Hodel

Eric Hodel is a long-time Rubyist and co-founder of Seattle.rb.
Posted: Feb 2, 2011 3:56 PM

This post originated from an RSS feed registered with Ruby Buzz by Eric Hodel.
Original Post: String Encoding Quick-Start
Feed Title: Segment7
Feed URL: http://blog.segment7.net/articles.rss
Feed Description: Posts about and around Ruby, MetaRuby, ruby2c, ZenTest and work at The Robot Co-op.
So you're using Ruby 1.9 and don't want to read JEG's Understanding M17N Series or runpaint's Encoding documents. I understand: they're very long and very detailed, and you can go back and read them when you need the details. How about just the important stuff?

Files have encodings, and string literals in those files have matching encodings. You set the encoding with a magic comment, # coding: UTF-8, on the first line (or the second line if the file starts with a shebang). If you don't set the encoding, your strings are assumed to be in US-ASCII encoding instead.
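
As a minimal sketch of the magic comment, the snippet below compares the source encoding with the encoding of a literal. The variable names are made up for illustration, and the comment only takes effect when it really is the first line of the file (or the second, after a shebang).

```ruby
# coding: UTF-8
# The magic comment above must be the first line of the file
# (or the second, right after a shebang) to have any effect.

source_encoding = __ENCODING__   # encoding Ruby is using for this source file
literal = "cafe"                 # a plain literal picks up the source encoding

puts source_encoding
puts literal.encoding
```

Both lines should print the same encoding name, since the literal inherits whatever encoding the file was read with.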

Regular expressions are in US-ASCII encoding by default. You can use /u to make one UTF-8, /e to make it EUC-JP, or /s to make it Windows-31J. You can also use #force_encoding. US-ASCII regular expressions will match strings in any ASCII-compatible encoding. See ri Regexp for more details.
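
Here is a small sketch of the regular expression flags, assuming ASCII-only patterns. The sample strings and variable names are invented for the example; the Latin-1 string is built from raw bytes so the snippet doesn't depend on the file's own encoding.

```ruby
default_re = /caf/    # no flag: US-ASCII
utf8_re    = /caf/u   # UTF-8
eucjp_re   = /caf/e   # EUC-JP
sjis_re    = /caf/s   # Windows-31J

encodings = [default_re, utf8_re, eucjp_re, sjis_re].map { |re| re.encoding.name }
p encodings

# Two ASCII-compatible strings in different encodings.
utf8_text   = "caf\u00e9"
latin1_text = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding("ISO-8859-1")

# The US-ASCII regexp matches both of them.
utf8_match   = default_re =~ utf8_text
latin1_match = default_re =~ latin1_text
p [utf8_match, latin1_match]
```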

String#gsub may not preserve the input encoding. If the match happens at the beginning of the string, the output encoding may not match the input encoding. You can work around this by forcing the encoding on the replacement string before the replacement, or by using the block form of gsub. (This behavior may be a bug.)
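
The two workarounds might look something like this. It's only a sketch: the input string and names are made up, and it doesn't try to reproduce the encoding change itself, which may depend on your Ruby version.

```ruby
input = "h\u00e9llo"   # UTF-8 string; the pattern below matches at the very start

# Workaround 1: force the replacement into the input's encoding first.
replacement = "j".dup.force_encoding(input.encoding)
forced_result = input.gsub(/\Ah/, replacement)

# Workaround 2: use the block form of gsub.
block_result = input.gsub(/\Ah/) { replacement }

p forced_result.encoding == input.encoding
p block_result.encoding == input.encoding
```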

IO objects have two encodings: the external_encoding, which is how the content is stored outside Ruby (on disk for files, the stream or packet encoding for sockets), and the internal_encoding, which, when set, causes Ruby to transcode the content as it is read. There's no provision for guessing the encoding of a document, so you'll need to know it ahead of time.
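
To make the two encodings concrete, here's a sketch that writes four ISO-8859-1 bytes to a temporary file and reads them back with an "external:internal" mode string. The file contents and variable names are assumptions for the example.

```ruby
require "tempfile"

# Write "caf" plus the Latin-1 byte for e-acute, as raw bytes.
tmp = Tempfile.new("latin1")
tmp.binmode
tmp.write([0x63, 0x61, 0x66, 0xE9].pack("C*"))
tmp.close

# "r:ISO-8859-1:UTF-8" means: stored as ISO-8859-1, handed to us as UTF-8.
ext_enc, int_enc, text = File.open(tmp.path, "r:ISO-8859-1:UTF-8") do |f|
  [f.external_encoding, f.internal_encoding, f.read]
end
tmp.unlink

p ext_enc, int_enc
p text.encoding
p text.bytes.to_a   # the single Latin-1 byte becomes two UTF-8 bytes
```

Leaving off the second encoding ("r:ISO-8859-1") would read the same bytes without transcoding, and the string would simply be tagged ISO-8859-1.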

A string carries an encoding, but its bytes may not be valid in that encoding. Use String#valid_encoding? to verify. To transcode, use String#encode. I have an example of using String#encode in my From Iconv#iconv to String#encode article, and there are more in JEG's and runpaint's articles linked above.
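
A quick sketch of the difference between relabeling and transcoding, using the same made-up four bytes tagged two different ways:

```ruby
bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")   # "caf" + a lone 0xE9 byte

# force_encoding only changes the label; the bytes stay the same.
mislabeled = bytes.dup.force_encoding("UTF-8")
latin1     = bytes.dup.force_encoding("ISO-8859-1")

mislabeled_valid = mislabeled.valid_encoding?   # a lone 0xE9 is not valid UTF-8
latin1_valid     = latin1.valid_encoding?

# encode actually converts the bytes to the target encoding.
utf8 = latin1.encode("UTF-8")

p mislabeled_valid, latin1_valid
p utf8.encoding, utf8.valid_encoding?
```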
