PostgreSQL's JSON vs Rails' Serialize

October 21, 2014

A recent article on persisting hashes in Rails using PostgreSQL made me curious about performance.

It made me wonder if there is a performance advantage to having PostgreSQL serialize the JSON instead of Rails doing it itself, and how either approach compares to the default YAML serialization. So I decided to run some tests.

I created a bare-bones Rails 4.1.6 application running against PostgreSQL 9.3.4, on my mid-2010 MacBook Pro with 8GB of RAM, with my best attempt to keep the machine relatively quiet.

The models:

# data is a PostgreSQL json column; PostgreSQL handles the serialization
class JsonThing < ActiveRecord::Base
end

# data is a text column; Rails serializes to JSON
class SerializejsonThing < ActiveRecord::Base
  serialize :data, JSON
end

# data is a text column; Rails serializes to YAML (the default)
class SerializeyamlThing < ActiveRecord::Base
  serialize :data
end
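
From the application's point of view all three models are used the same way; the difference is just where (and how) the hash gets turned into text. A quick console sketch, with the return values I'd expect (shown for illustration only):

JsonThing.create!(data: {s: "ipsum lorem"}).reload.data
# => {"s"=>"ipsum lorem"}   PostgreSQL stores json; keys come back as strings

SerializejsonThing.create!(data: {s: "ipsum lorem"}).reload.data
# => {"s"=>"ipsum lorem"}   Rails writes JSON into a text column; keys come back as strings

SerializeyamlThing.create!(data: {s: "ipsum lorem"}).reload.data
# => {:s=>"ipsum lorem"}    Rails writes YAML; symbol keys survive the round trip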

The database schema:

class CreateThings < ActiveRecord::Migration
  def change
    create_table :json_things do |t|
      t.json :data, default: {}, null: false
    end
    create_table :serializejson_things do |t|
      t.text :data
    end
    create_table :serializeyaml_things do |t|
      t.text :data
    end
  end
end
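
The serialize models store exactly what their coders produce in the text column, while the json column holds JSON text that PostgreSQL validates and can reach into. A quick look at the coder output, assuming the simple-string payload from the first run below:

JSON.dump({i: 0, s: "ipsum lorem"})
# => "{\"i\":0,\"s\":\"ipsum lorem\"}"

YAML.dump({i: 0, s: "ipsum lorem"})
# => "---\n:i: 0\n:s: ipsum lorem\n"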

I then wrote a little bit of code to create a bunch of records and then read them back. The entire code is at the end of this article, but basically it creates 50,000 records of each type, cycling between the three types in batches of 100. The cycling was done to try to even out any external load my Mac might put on the process, and I thought it might also help with any Ruby GC issues.

This run stores a simple string. Note that in all examples I merge in an incrementing integer so the data varies a little from record to record.

$ bin/rake db:reset && bin/rails r 'Fu.run(limit:50_000,step:100, data:{s: "ipsum lorem"})'

Creating 50000 records cycling every 100:
                      user     system      total        real
json          :  56.010000   8.400000  64.410000 ( 92.876464)
serialize_yaml:  77.390000   9.220000  86.610000 (117.776747)
serialize_json:  52.870000   8.240000  61.110000 ( 89.297786)

Reading 50000 records cycling every 100:
                      user     system      total        real
json          :   2.080000   0.060000   2.140000 (  7.577118)
serialize_yaml:  10.120000   0.380000  10.500000 ( 15.818386)
serialize_json:   2.840000   0.100000   2.940000 (  7.896888)

This run serializes the entire ENV hash. This is probably a lot more data than you should be serializing, but it’s still interesting.

$ bin/rake db:reset && bin/rails r 'Fu.run(limit:50_000,step:100, data:ENV)'

Creating 50000 records cycling every 100:
                      user     system      total        real
json          : 148.680000   9.780000 158.460000 (199.714663)
serialize_yaml: 314.510000  10.260000 324.770000 (368.929057)
serialize_json: 106.360000   8.780000 115.140000 (153.957679)

Reading 50000 records cycling every 100:
                      user     system      total        real
json          :   7.980000   0.380000   8.360000 ( 25.861588)
serialize_yaml:  79.650000   0.920000  80.570000 (100.403798)
serialize_json:   9.080000   0.370000   9.450000 ( 27.458656)

Some interesting results.

  • YAML is much slower than either JSON solution.

  • Serializing JSON in Rails is a tiny bit faster on creates and a tiny bit slower on reads than letting PostgreSQL do it, but not by much.

  • Letting PostgreSQL do the work is almost as fast as serializing in Rails, but not quite. However, if you need to query the JSON data directly in PostgreSQL (see the sketch just below), that could help sway the decision. Also, if you’re on an older version of Rails, this would alleviate the annoyance that changed? always returns true if the model has any serialized attributes. That has been fixed in Rails 4.2.
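
For example, here's roughly what filtering on the json column looks like from ActiveRecord, using PostgreSQL's ->> operator to pull a field out of the JSON. This is only a sketch based on the first run's data; the serialize-based models can't do this, since their text columns are opaque to PostgreSQL.

# ->> extracts a JSON field as text, so the predicate runs inside PostgreSQL
JsonThing.where("data->>'s' = ?", "ipsum lorem").count

# Cast the extracted text to filter numerically
JsonThing.where("(data->>'i')::int < 10").pluck(:id)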

Here is the entirety of the Fu class.

class Fu
  class << self

    def run(limit: 1_000, step: 100, data: {})
      # Silence logging and disable the query cache so reads actually hit the database
      dev_null = Logger.new("/dev/null")
      Rails.logger = dev_null
      ActiveRecord::Base.logger = dev_null
      ActiveRecord::Base.logger.level = 2
      ActiveRecord::Base.connection.disable_query_cache!

      # Seed each timer with a near-zero measurement so later measurements can be summed onto it
      json_bm = Benchmark.measure { true }
      serialize_yaml_bm = Benchmark.measure { true }
      serialize_json_bm = Benchmark.measure { true }

      # Create records in batches of `step`, cycling between the three types
      0.step(limit, step) do
        json_bm += Benchmark.measure { create_things(JsonThing, step, data) }
        serialize_yaml_bm += Benchmark.measure { create_things(SerializeyamlThing, step, data) }
        serialize_json_bm += Benchmark.measure { create_things(SerializejsonThing, step, data) }
      end

      puts 
      puts "Creating #{limit} records cycling every #{step}:"
      puts '                ' + Benchmark::CAPTION
      puts 'json          : ' + json_bm.to_s
      puts 'serialize_yaml: ' + serialize_yaml_bm.to_s
      puts 'serialize_json: ' + serialize_json_bm.to_s
      puts
      
      ################################################################################

      json_bm = Benchmark.measure { true }
      serialize_yaml_bm = Benchmark.measure { true }
      serialize_json_bm = Benchmark.measure { true }

      0.step(limit, step) do |i|
        json_bm += Benchmark.measure { read_things(JsonThing, i, step) }
        serialize_yaml_bm += Benchmark.measure { read_things(SerializeyamlThing, i, step) }
        serialize_json_bm += Benchmark.measure { read_things(SerializejsonThing, i, step) }
      end

      puts 
      puts "Reading #{limit} records cycling every #{step}:"
      puts '                ' + Benchmark::CAPTION
      puts 'json          : ' + json_bm.to_s
      puts 'serialize_yaml: ' + serialize_yaml_bm.to_s
      puts 'serialize_json: ' + serialize_json_bm.to_s
      puts
    end

    # Load `limit` records starting at `offset` and touch data to force deserialization
    def read_things(klass, offset = 0, limit = 100)
      klass.order(:id).limit(limit).offset(offset).map(&:data)
      nil
    end

    # Create `n` records, merging in a counter so the data varies slightly
    def create_things(klass, n = 1000, data = {})
      n.times do |i|
        klass.create(data: {i: i}.merge(data))
      end
      nil
    end
    
  end
end