This post originated from an RSS feed registered with Ruby Buzz
by Daniel Berger.
Original Post: ActiveRecord::Base.foreach
Feed Title: Testing 1,2,3...
Feed URL: http://djberg96.livejournal.com/data/rss
Feed Description: A blog on Ruby and other stuff.
Little did I know that ActiveRecord doesn't have a truly iterative way for dealing with results. You get the find method, with its various incantations of :all, :first, :limit, etc, but there's not way to iterate over every record in turn without slurping the result set into memory first.
There is the find_each method, but that still slurps 1000 records at a time into memory. Unfortunately, the composite_primary_keys library breaks that method. At least, the version we're forced to use, does. And we can't upgrade because the composite_primary_keys library is basically fubar at this point. But that's another story.
This became seriously problematic when we needed to perform a full data migration from Oracle to Postgres. Running find(:all) really bogged downs your machine when there are over 50,000 records in the data you're trying to migrate.
So, armed with my knowledge of, you know, stuff like Oracle, cursors, and blocks, I whipped up an ActiveRecord::Base.foreach method. Unlike combining .all with .each, this never puts more than 1 record at a time into memory.
Enjoy!
# Iterate over each record one at a time. Each record is yielded to the
# block in turn.
#
# You may pass a single, optional +condition+ which is passed as part of
# a 'where' clause.
#
# This is much more memory efficient when dealing with large numbers of
# records, such as a data migration.
#--
# ActiveRecord does not provide a truly iterative method. It does have
# find_each but that does batches of 1000, and it doesn't work in
# conjunction with the composite_primary_keys library.
#
def self.foreach(conditions = nil, &block)
conn = connection.raw_connection
sql = "select * from #{table_name}"
sql += " where #{conditions}" if conditions
begin
raw_cursor = conn.exec(sql)
raw_columns = raw_cursor.get_col_names
# A little extra work here is required to downcase the column names.
while rec = raw_cursor.fetch
result = {}
raw_columns.each do |name|
result[name.downcase] = rec.shift
end
yield new(result)
end
ensure
raw_cursor.close
end
end