By coding through the ActiveRecord abstraction, we lose close ties to SQL. This is a major selling point for Rails, but it should not be why you are sold on Rails.
Avoiding vendor specific database features is silly, and doing so does not make you a better Rails developer. Database neutrality has never helped me because I have yet to swap database vendors. If I did need to swap, it would be due to a vendor specific feature. ActiveRecord is great for the majority of operations, but you should not live exclusively within its constraints. And so I begin the MySql Lovin' series...
Perfect Rails Code
Start with a simple data model for a blog. A post has text, and can have zero or more tags. Here is what the migration looks like:
1 create_table :posts do |t|
2 t.text :text
3 t.timestamps
4 end
5
6 create_table :post_tags do |t|
7 t.integer :post_id, :tag_id, :null => false
8 end
9
10 create_table :tags do |t|
11 t.string :name
12 t.string :post_tags_count
13 t.timestamps
14 end
15
16 add_index :tags, :name, :unique => true
The tags table also keeps track of the total posts per tag.
The models for the blog might look like this:
1 class Post < ActiveRecord::Base
2 has_many :post_tags
3 has_many :tags, :through => :post_tags
4
5 def tag_list=(tag_string)
6 self.tag = tag_string.split(',').map { |tag_name| Tag.find_or_create_by_name(tag_name.squish!) }
7 end
8
9 def tag_list
10 self.tags.map(&:name).join(', ')
11 end
12 end
13
14 class PostTag < ActiveRecord::Base
15 belongs_to :tag, :counter_cache => true
16 belongs_to :post
17 end
18
19 class Tag < ActiveRecord::Base
20 end
Notice that the Post model has a virtual tag_list attribute. This automatically splits a comma delimited list of tags, creates new tags if necessary, and generates the associations between posts and tags.
Too Many Queries
Fill out the form:

Now check the query log to see what queries hit the database:
1 SELECT * FROM tags WHERE (tags.name = 'latin') LIMIT 1
2 INSERT INTO tags (name, post_tags_count, created_at, updated_at) VALUES('latin', NULL, '2008-07-04 22:41:28', '2008-07-04 22:41:28')
3 SELECT * FROM tags WHERE (tags.name = ' illegible') LIMIT 1
4 INSERT INTO tags (name, post_tags_count, created_at, updated_at) VALUES(' illegible', NULL, '2008-07-04 22:41:28', '2008-07-04 22:41:28')
5 SELECT * FROM tags WHERE (tags.name = ' rails') LIMIT 1
6 INSERT INTO tags (name, post_tags_count, created_at, updated_at) VALUES(' rails', NULL, '2008-07-04 22:41:28', '2008-07-04 22:41:28')
7 INSERT INTO posts (created_at, updated_at, text) VALUES('2008-07-04 22:41:28', '2008-07-04 22:41:28', 'Lorem ipsum dolor sit amet...')
8 INSERT INTO post_tags (tag_id, post_id) VALUES(1, 1)
9 SELECT * FROM tags WHERE (tags.id = 1)
10 UPDATE tags SET post_tags_count = post_tags_count + 1 WHERE (id = 1)
11 INSERT INTO post_tags (tag_id, post_id) VALUES(2, 1)
12 SELECT * FROM tags WHERE (tags.id = 2)
13 UPDATE tags SET post_tags_count = post_tags_count + 1 WHERE (id = 2)
14 INSERT INTO post_tags (tag_id, post_id) VALUES(3, 1)
15 SELECT * FROM tags WHERE (tags.id = 3)
16 UPDATE tags SET post_tags_count = post_tags_count + 1 WHERE (id = 3)
16 queries. Gross, let's try to do better.
On Duplicate Key Update
MySql's INSERT ... ON DUPLICATE KEY UPDATE is a perfect solution to reduce the number of queries. It can be used to:
- Insert non-existing tags
- Update the counter cache for posts per tag.
- Remove the need to instantiate a Tag object for each tag.
Let's give it a try:
1 class Post < ActiveRecord::Base
2 has_many :post_tags
3 has_many :tags, :through => :post_tags
4
5 def tag_list=(tag_string)
6 self.tag_ids = tag_string.split(',').map { |tag_name| create_or_update_tag(tag_name.squish!) }
7 end
8
9 def tag_list
10 self.tags.map(&:name).join(', ')
11 end
12
13 private
14 def create_or_update_tag(name)
15 statement = %{
16 INSERT INTO tags(name, post_tags_count, created_at, updated_at)
17 VALUES (#{connection.quote(name)}, utc_timestamp, utc_timestamp, 1)
18 ON DUPLICATE KEY UPDATE
19 updated_at = utc_timestamp,
20 post_tags_count = post_tags_count + 1,
21 id=LAST_INSERT_ID(id)
22 }.squish!
23
24 connection.insert_sql(statement)
25 end
26 end
27
28 class PostTag < ActiveRecord::Base
29 belongs_to :tag
30 belongs_to :post
31 end
32
33 class Tag < ActiveRecord::Base
34 end
That SQL code sure looks ugly. It probably belongs in the Tag class, and can be generated automatically from the model's attributes. What if I made a plugin that added create_or_update to ActiveRecord::Base?
More importantly, the number of queries have decreased:
1 INSERT INTO tags(name, post_tags_count, created_at, updated_at) VALUES ('latin', utc_timestamp, utc_timestamp, 1) ON DUPLICATE KEY UPDATE updated_at = utc_timestamp, post_tags_count = post_tags_count + 1, id=LAST_INSERT_ID(id)
2 INSERT INTO tags(name, post_tags_count, created_at, updated_at) VALUES ('illegible', utc_timestamp, utc_timestamp, 1) ON DUPLICATE KEY UPDATE updated_at = utc_timestamp, post_tags_count = post_tags_count + 1, id=LAST_INSERT_ID(id)
3 INSERT INTO tags(name, post_tags_count, created_at, updated_at) VALUES ('rails', utc_timestamp, utc_timestamp, 1) ON DUPLICATE KEY UPDATE updated_at = utc_timestamp, post_tags_count = post_tags_count + 1, id=LAST_INSERT_ID(id)
4 SELECT * FROM `tags` WHERE (`tags`.`id` IN (1,2,3)
5 INSERT INTO `posts` (`created_at`, `updated_at`, `text`) VALUES('2008-07-04 23:17:20', '2008-07-04 23:17:20', 'Lorem ipsum dolor sit amet...')
6 INSERT INTO `post_tags` (`tag_id`, `post_id`) VALUES(1, 1)
7 INSERT INTO `post_tags` (`tag_id`, `post_id`) VALUES(2, 1)
8 INSERT INTO `post_tags` (`tag_id`, `post_id`) VALUES(3, 1)
This is a good start. The number of queries has decreased by a 2x factor, and Tag objects are no longer being created.
By alas, what is that query on line 4 for? It looks like Rails runs this query after assigning to tag_ids. This can be avoided by directly creating the entries for post_tags, but I can accept this annoyance for today.
3 comments:
If you wanted to do something similar without writing all the code yourself, you could try using the import method of the ar-extensions gem. It might not do all you did here, but it has its charms.
@pete - Thanks. I ended up finding ar-extensions from a post on igvita.com (Link). It has not been updated for about one year, which is a bit concerning given that it was written for rails 1.2.
@matthew higgins:
The author of ar-extensions warns that some stuff is broken with rails 2.x, but strictly using import is fine. I've been using it in a few production apps with rails 2.0.2.
Post a Comment