What's New in Edge Rails: Batched Find
ActiveRecord got a little batch-help today with the addition of ActiveRecord::Base#each and ActiveRecord::Base#find_in_batches. The former lets you iterate over all the records in cursor-like fashion (only retrieving a set number of records at a time to avoid cramming too much into memory):
    Article.each { |a| ... }
    # => iterate over all articles, in chunks of 1000 (the default)

    Article.each(:conditions => { :published => true }, :batch_size => 100) { |a| ... }
    # iterate over published articles in chunks of 100
You’re not exposed to any of the chunking logic – all you need to do is iterate over each record and just trust that they’re only being retrieved in manageable groups.
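If you're curious what that hidden chunking logic looks like, here's a rough sketch of the idea, assuming a numeric auto-incrementing primary key (a simplification for illustration, not the actual Rails implementation):

    # Simplified sketch of the batching loop: fetch batch_size records at a
    # time in primary-key order, using the last id seen as the lower bound
    # for the next query, so the full result set is never held in memory.
    def naive_find_in_batches(model, batch_size = 1000)
      batch = model.find(:all, :order => "id ASC", :limit => batch_size)
      until batch.empty?
        yield batch
        batch = model.find(:all, :conditions => ["id > ?", batch.last.id],
                           :order => "id ASC", :limit => batch_size)
      end
    end

Keying each query off the last id seen is also why the ordering and limit belong to the loop itself, which brings us to the caveat below.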
find_in_batches performs a similar function, except that it hands back each chunk array directly instead of just a stream of individual records:
    Article.find_in_batches { |articles| articles.each { |a| ... } }
    # => articles is an array of size 1000

    Article.find_in_batches(:batch_size => 100) { |articles| articles.each { |a| ... } }
    # iterate over all articles in chunks of 100
find_in_batches is also kind enough to observe good scoping practices:
    class Article < ActiveRecord::Base
      named_scope :published, :conditions => { :published => true }
    end

    Article.published.find_in_batches(:batch_size => 100) { |articles| ... }
    # iterate over published articles in chunks of 100
One quick caveat: you can't specify :order or :limit in the options to each or find_in_batches, as those values are reserved for the internal looping logic.
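If you do need ordering, one workaround (my suggestion, not something the API provides) is to sort each in-memory chunk yourself; the overall traversal order is still governed by the primary key:

    Article.find_in_batches(:batch_size => 100) do |articles|
      # Each 100-record chunk is re-sorted locally by title.
      articles.sort_by(&:title).each { |a| puts a.title }
    end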
Batched finds are best used when you have a potentially large dataset and need to iterate through all rows. With a normal find the full result set is loaded into memory at once, which can cause problems on big tables. With batched finds you can be sure that at most batch_size records (1,000 by default) are instantiated at any one time.
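To make the contrast concrete, here's the same trivial traversal both ways (puts standing in for real per-record work):

    # Normal find: every Article is instantiated up front.
    Article.find(:all).each { |a| puts a.title }

    # Batched find: at most 1,000 Article objects are live at any one time.
    Article.each { |a| puts a.title }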