Race condition Scenario 1

Recently we hit an issue where the client program got stuck. The client is written so that it sends requests to the server synchronously: it sends the next request only after receiving the acknowledgement for the current one. The issue happens intermittently, which suggests a race condition.

On the server side, there are two threads

thread 1

  1. submits a request to one of its internal submission queues.
  2. increments the io_submitted value.

thread 2

  1. picks up the item from the submission queue and issues an asynchronous IO using libaio.
    1. the libaio thread calls the callback function passed to it once the IO is done.
    2. as part of the callback, the aio thread enqueues the request into the completion queue.
  2. picks up the item from the completion queue and increments the io_completed value.
  3. then checks io_submitted == io_completed to decide whether to run the next set of tasks.
  4. after completing the next set of tasks, sends a response to the client.

The problem is that the client is not receiving the acknowledgment.  Why?

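There is a race: thread 2 can increment io_completed and run the comparison before thread 1 has incremented io_submitted. This happens when thread 1 is scheduled out after it has put the request on the submission queue but before it increments io_submitted. At that moment io_completed is one ahead of io_submitted, so the check fails, the next set of tasks is skipped, and no response is sent. Since the client waits for that response before sending anything else, it hangs.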
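To make the interleaving concrete, here is a small single-threaded replay of the steps above. It is only a toy model written in Python, not the server code: the `Server` class, the method names and the `replay` helper are made up for illustration, and the libaio callback is collapsed into a direct hand-off from the submission queue to the completion queue.

```python
from collections import deque


class Server:
    """Toy model of the shared state; field names follow the post."""

    def __init__(self):
        self.submission_queue = deque()
        self.completion_queue = deque()
        self.io_submitted = 0
        self.io_completed = 0
        self.responses = []  # stands in for responses sent to the client

    def t1_enqueue(self):
        # thread 1, step 1: put the request on the submission queue
        self.submission_queue.append("req-1")

    def t1_count(self):
        # thread 1, step 2: increment io_submitted
        self.io_submitted += 1

    def t2_process(self):
        # thread 2, steps 1-4 (the async IO and its callback are simulated)
        req = self.submission_queue.popleft()
        self.completion_queue.append(req)  # what the AIO callback would do
        done = self.completion_queue.popleft()
        self.io_completed += 1
        if self.io_submitted == self.io_completed:
            self.responses.append(done)  # next set of tasks + response


def replay(order):
    """Run the named steps in the given order and return the final state."""
    server = Server()
    for step in order:
        getattr(server, step)()
    return server


intended = replay(["t1_enqueue", "t1_count", "t2_process"])
racy = replay(["t1_enqueue", "t2_process", "t1_count"])
print(intended.responses)  # ['req-1']
print(racy.responses)      # []
```

In the racy order both counters end up at 1, but the comparison has already run and failed by then, and in this model nothing evaluates it again.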

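A couple of solutions:

  1. Move the increment of io_submitted before submitting the request to the internal submission queue.
  2. Use a spin lock to protect io_submitted. For this to close the window, thread 1 would need to hold the lock across both the enqueue and the increment, and thread 2 would need to take the same lock around its increment of io_completed and the comparison.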
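Both options can be tried out in a small threaded model. Again this is a hedged sketch in Python rather than the real server: `run_server`, the `strategy` argument and the `acks` queue are invented for the example, `threading.Lock` stands in for the spin lock, the asynchronous IO is skipped entirely, and the calling thread plays both the synchronous client and thread 1.

```python
import queue
import threading


def run_server(num_requests, strategy, ack_timeout=2.0):
    """Return how many of num_requests were acknowledged.

    strategy: "count_first" (solution 1) or "lock" (solution 2).
    """
    if strategy not in ("count_first", "lock"):
        raise ValueError(strategy)

    submission_queue = queue.Queue()
    acks = queue.Queue()  # responses going back to the client
    state = {"io_submitted": 0, "io_completed": 0}
    lock = threading.Lock()  # stand-in for the spin lock

    def thread2():
        while True:
            req = submission_queue.get()
            if req is None:  # shutdown marker
                return
            # Taken in both strategies to keep the sketch short; it only
            # matters for "lock", where thread 1 holds it while submitting.
            with lock:
                state["io_completed"] += 1
                balanced = state["io_submitted"] == state["io_completed"]
            if balanced:
                acks.put(req)

    def thread1_submit(req):
        if strategy == "count_first":
            state["io_submitted"] += 1  # count first ...
            submission_queue.put(req)   # ... then make the request visible
        else:
            with lock:  # enqueue and increment happen as one unit
                submission_queue.put(req)
                state["io_submitted"] += 1

    worker = threading.Thread(target=thread2)
    worker.start()
    received = 0
    try:
        for req in range(num_requests):
            thread1_submit(req)
            try:
                acks.get(timeout=ack_timeout)  # synchronous client waits
            except queue.Empty:
                break  # this is the hang from the post
            received += 1
    finally:
        submission_queue.put(None)
        worker.join()
    return received


print(run_server(500, "count_first"))
print(run_server(500, "lock"))
```

With either strategy thread 2 can no longer observe a request whose increment has not happened yet, so every request in this model gets its acknowledgement. In a C server the counters would also need to be atomic or otherwise synchronised between the two threads, which Python hides here.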

 

 
