Exploring Eventual Consistency of the EC2 API

I’ve recently been assisting on a project that relies heavily on the boto Python interface to manage AWS. In particular, the project creates a security group and then queries boto to get and set its attributes. Due to the eventually consistent nature of Amazon’s EC2 API, the results were frustrating. For instance, the program would create a group and assign rules, but the authorize_security_group API call would occasionally fail because the group didn’t exist yet.

I did notice during testing that particular calls were more or less likely to fail and wanted to explore this further. In particular, I thought I was seeing that:

  1. API calls using VPC resources failed more frequently
  2. Security Group “Gets” using filters failed more frequently than those using names or security group ids.
  3. Security Group “Gets” using names failed more frequently than those using security group ids.

I decided that I would run some tests to determine the consistency of the Amazon EC2 API and to test the success of the different commands.

The Test:

Note that the test is available in my “Snippets” github repository by following this link.

  1. Create an EC2 security group.
  2. Get the security group using either a filter, groupname or security group id. I used the “get_all_security_groups” boto method for this.
  3. Delete the EC2 security group.
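
The three steps above can be sketched roughly as follows. This is an illustration, not the script from the repository: the function name run_consistency_test, the get_by switch, and the way errors_count is tallied (any get that does not yield exactly one group) are assumptions. The conn argument is expected to be a boto EC2 connection, and only its create_security_group, get_all_security_groups and delete_security_group methods are used.

```python
def run_consistency_test(conn, count=100, vpc_id=None, get_by="id"):
    """Create, get and delete `count` security groups, tallying failures.

    get_by is one of "id", "name" or "filter" (hypothetical switch).
    """
    errors = {"errors_create": 0, "errors_get": 0,
              "errors_count": 0, "errors_delete": 0}
    for i in range(count):
        name = "test-%d" % i
        try:
            created = conn.create_security_group(
                name, "consistency test", vpc_id=vpc_id)
        except Exception:  # with boto this would be EC2ResponseError
            errors["errors_create"] += 1
            continue

        groups = []
        try:
            if get_by == "id":
                groups = conn.get_all_security_groups(group_ids=[created.id])
            elif get_by == "name":
                groups = conn.get_all_security_groups(groupnames=[name])
            else:
                group_filter = {"group-name": name}
                if vpc_id is not None:
                    group_filter["vpc-id"] = vpc_id
                groups = conn.get_all_security_groups(filters=group_filter)
        except Exception:
            errors["errors_get"] += 1

        # Assumption: a "count" error is any get that did not yield one group.
        if len(groups) != 1:
            errors["errors_count"] += 1

        try:
            conn.delete_security_group(group_id=created.id)
        except Exception:
            errors["errors_delete"] += 1
    return errors
```

Catching a bare Exception keeps the sketch free of imports; a real run would more likely catch boto's EC2ResponseError so that unrelated bugs are not counted as consistency failures.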

Test Results:

A screenshot of the ec2 api test results for a run creating, getting (using a filter) and deleting 100 security groups in EC2-VPC.


Group in EC2-Classic, using name based filter for gets:

Note that you should *never* do this: if groups with the same name exist in both EC2-Classic and EC2-VPC, this will return *both* groups.

group_filter = {'group-name': group_name}
groups = conn.get_all_security_groups(filters=group_filter)

Results:

  • errors_create: 0
  • errors_get: 0
  • errors_count: 0
  • errors_delete: 0

Group in EC2-VPC, using name based filters for gets:

group_filter = {'group-name': group_name, 'vpc-id': vpc_id}
groups = conn.get_all_security_groups(filters=group_filter)

Results:

  • errors_create: 0
  • errors_get: 0
  • errors_count: 32
  • errors_delete: 38

Group in EC2-Classic, using group_names for gets:

groups = conn.get_all_security_groups(groupnames=[group_name])

Results:

  • errors_create: 0
  • errors_get: 0
  • errors_count: 0
  • errors_delete: 0

Note that errors in “gets” and “deletes” did occur in occasional runs of this test.

Group in EC2-VPC, using group_names for gets:

groups = conn.get_all_security_groups(groupnames=[group_name])

This call fails with the following error:

Invalid value 'test-1' for groupName. You may not reference Amazon VPC security groups by name. Please use the corresponding id for this operation.
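
Because VPC groups cannot be referenced by name, a name-based lookup has to branch on where the group lives. A minimal sketch, assuming a helper named find_security_groups (hypothetical, not part of boto) that uses groupnames for EC2-Classic and the filter form shown earlier for EC2-VPC:

```python
def find_security_groups(conn, group_name, vpc_id=None):
    """Look up security groups by name without tripping the VPC restriction."""
    if vpc_id is None:
        # EC2-Classic: groups may be referenced directly by name.
        return conn.get_all_security_groups(groupnames=[group_name])
    # EC2-VPC: names are rejected, so filter on name and VPC id instead.
    group_filter = {"group-name": group_name, "vpc-id": vpc_id}
    return conn.get_all_security_groups(filters=group_filter)
```

Once the group id is known, looking the group up by group_ids avoids the name restriction altogether.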

Group in EC2-Classic, using group_ids for gets:

groups = conn.get_all_security_groups(group_ids=[created.id])

Results:

  • errors_create: 0
  • errors_get: 0
  • errors_count: 0
  • errors_delete: 0

Group in EC2-VPC, using group_ids for gets:

groups = conn.get_all_security_groups(group_ids=[created.id])

Results:

  • errors_create: 0
  • errors_get: 42
  • errors_count: 42
  • errors_delete: 31

Conclusion:

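In the runs shown above, all of the errors occurred against EC2-VPC, which matches my first observation. More to come in a future post, where I’m going to be testing the following: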

  1. caching result data to reduce the number of EC2 API calls. For example, I am experimenting with a cache that stores the returned security group object so that it can be re-used for subsequent queries.
  2. eliminating the need for subsequent calls: for instance, creating a resource and setting its attributes in a single call, if possible.
  3. a back-off algorithm, such as the one suggested in the AWS General Reference document “Error Retries and Exponential Backoff”.
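
As an illustration of the third item, here is a minimal retry helper with exponential backoff and random jitter. It is a sketch under assumptions, not the algorithm from the AWS document verbatim: the name retry_with_backoff, its parameters, and the default delays are all hypothetical choices.

```python
import random
import time


def retry_with_backoff(call, retries=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Run call(), retrying on failure with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Cap the exponential delay, then pick a random point below it
            # so that concurrent clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

A get that may race with group creation could then be wrapped as retry_with_backoff(lambda: conn.get_all_security_groups(group_ids=[created.id])), ideally with retry_on narrowed to the error type the API client raises.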

 
