Implementing multithreaded epoll

Posted on December 13, 2015
Tags: epoll
by Sanchayan Maity

Finally I have some time to relax a bit and write about a few things, now that my fall semester course has ended. I had taken a Distributed Systems class where we were taught the basics of distributed systems and had some assignments, the final one being an evaluation of some distributed key-value stores. The most interesting thing I learned on my own was how to use epoll in a multithreaded design. I was actually trying to use the libevent library during my second assignment; however, due to my lack of understanding, I could not get it to work the way I wanted. During my third assignment I tried using epoll directly, but I wanted a multithreaded design. Searching the internet did not turn up any examples, only notes on how one might implement it. So I tried, and finally got it working. This will probably be the first write-up on the net to talk about this. However, I am no expert in such things, so if anyone has suggestions or improvements to what I did, please share them in the comments or mail me directly.

One can clone the concerned repository with

The first requirement is a workqueue implementation. While working with libevent and looking for usable examples, I came across one that its author, Ron Cemer, was using with libevent [1]. The second requirement was a good, usable epoll example, which I found here [2].

With these two pieces I had a workqueue implementation and a usable epoll example on which to base my work. So let us jump to the thread which uses epoll and acts as the main event loop, queueing up work for the workqueues to process.

The listen descriptor is created when the main loop starts and is added to the list of descriptors epoll should wait on. After that we start our infinite loop, in which we block in epoll waiting for events on the registered descriptors, which at first is only the listen descriptor. Once an event occurs, which for the listen descriptor means an incoming connection request, we iterate over the returned event list. There are three possibilities from here on. The first is, of course, that some error occurred: we check for errors, we check whether the connection was closed (reported through EPOLLHUP), and we check whether the event is something other than the one we registered for.
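The setup and the error check described above can be sketched roughly as follows. This is a minimal illustration, not the code from dht.c: the function names `create_listener`, `watch_fd` and `is_bad_event` are hypothetical, and it assumes a blocking IPv4 TCP socket monitored for EPOLLIN only.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a blocking TCP listen socket on the given
 * port (0 lets the kernel pick a free port). Returns the fd or -1. */
int create_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: ask epoll to report readable data on fd. */
int watch_fd(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* First case: an error, a hang-up, or an event we did not register for.
 * Returns 1 if the descriptor should be closed, 0 otherwise. */
int is_bad_event(uint32_t events)
{
    return (events & EPOLLERR) || (events & EPOLLHUP) || !(events & EPOLLIN);
}
```

In an event loop, each entry returned by `epoll_wait` would first go through a check like `is_bad_event`, and the descriptor would be closed if it returns 1.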

The second is an incoming connection request. In this case we accept the connection and then add the descriptor returned by accept to the list of descriptors epoll should notify us about. This is the descriptor on which incoming messages will be received and outgoing messages will be sent.
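A rough sketch of this second case, under the assumption of an IPv4 listener and EPOLLIN-only monitoring. The name `accept_and_watch` is made up for illustration and does not come from the repository.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one pending connection on listen_fd and
 * register the new descriptor with epoll. Returns the connection
 * descriptor, or -1 on failure. */
int accept_and_watch(int epfd, int listen_fd)
{
    assert(epfd >= 0);

    struct sockaddr_in peer;          /* assumes an IPv4 listener */
    socklen_t len = sizeof(peer);
    int conn = accept(listen_fd, (struct sockaddr *)&peer, &len);
    if (conn < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;              /* report incoming client data */
    ev.data.fd = conn;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev) < 0) {
        close(conn);                  /* nobody would ever service it */
        return -1;
    }
    return conn;
}
```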

The third case is an incoming request on one of the connections accepted in the second step. In this case we dynamically allocate a job and add it to the workqueue along with the data it requires. However, before adding the job to the workqueue, we make sure to remove the descriptor we are about to process from the list of descriptors epoll monitors. Otherwise, while a worker is still reading the request, epoll would keep reporting the same descriptor as readable and the event loop would queue further jobs for it. See the comment in code.
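The remove-then-queue step might look something like the sketch below. The `struct request` type and the `detach_for_worker` function are hypothetical names used only for illustration; the actual job structure in the repository may carry different data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-request context handed to a worker. */
struct request {
    int epfd;   /* epoll instance the descriptor goes back to later */
    int fd;     /* client connection with a pending request */
};

/* Stop monitoring fd, then allocate the context a worker will receive.
 * Returns NULL if the descriptor could not be removed or memory ran out. */
struct request *detach_for_worker(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);

    /* The event argument is ignored for EPOLL_CTL_DEL, but kernels
     * before 2.6.9 require it to be non-NULL. */
    struct epoll_event unused;
    memset(&unused, 0, sizeof(unused));
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &unused) < 0)
        return NULL;

    struct request *req = malloc(sizeof(*req));
    if (req == NULL) {
        close(fd);      /* no longer monitored, so drop the connection */
        return NULL;
    }
    req->epfd = epfd;
    req->fd = fd;
    return req;         /* the caller queues this as the job's data */
}
```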

The above code snippets are from dht.c, which is the primary file. The workqueue implementation can be understood by going through workqueue.c and workqueue.h. Note that the function called by the workqueue does the job of processing the incoming client request as per one’s protocol requirements. Workqueue initialisation is simple enough, as below
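For readers without the repository at hand, here is a minimal sketch of what a workqueue of this kind does. It is not the workqueue.c used in the project: the names `workqueue_start`, `workqueue_submit`, `workqueue_stop` and `bump_counter` are hypothetical, and the design (one FIFO list guarded by a mutex and a condition variable) is just one common way to build it with pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct job {                       /* hypothetical job: callback + argument */
    void (*run)(void *arg);
    void *arg;
    struct job *next;
};

struct workqueue {                 /* hypothetical FIFO shared by workers */
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct job *head, *tail;
    int stopping;
    int nworkers;
    pthread_t *workers;
};

static void *worker_main(void *p)
{
    struct workqueue *wq = p;
    for (;;) {
        pthread_mutex_lock(&wq->lock);
        while (wq->head == NULL && !wq->stopping)
            pthread_cond_wait(&wq->wake, &wq->lock);
        struct job *j = wq->head;
        if (j == NULL) {           /* stopping and nothing left to run */
            pthread_mutex_unlock(&wq->lock);
            return NULL;
        }
        wq->head = j->next;
        if (wq->head == NULL)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        j->run(j->arg);            /* run the job outside the lock */
        free(j);
    }
}

/* Start nworkers threads. Returns 0 on success, -1 on failure. */
int workqueue_start(struct workqueue *wq, int nworkers)
{
    assert(wq != NULL && nworkers > 0);
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->wake, NULL);
    wq->head = wq->tail = NULL;
    wq->stopping = 0;
    wq->nworkers = 0;
    wq->workers = calloc((size_t)nworkers, sizeof(*wq->workers));
    if (wq->workers == NULL)
        return -1;
    for (int i = 0; i < nworkers; i++) {
        if (pthread_create(&wq->workers[i], NULL, worker_main, wq) != 0)
            break;
        wq->nworkers++;
    }
    return wq->nworkers == nworkers ? 0 : -1;
}

/* Append a job and wake one idle worker. Returns 0 or -1. */
int workqueue_submit(struct workqueue *wq, void (*run)(void *), void *arg)
{
    struct job *j = malloc(sizeof(*j));
    if (j == NULL)
        return -1;
    j->run = run;
    j->arg = arg;
    j->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail != NULL)
        wq->tail->next = j;
    else
        wq->head = j;
    wq->tail = j;
    pthread_cond_signal(&wq->wake);
    pthread_mutex_unlock(&wq->lock);
    return 0;
}

/* Let workers drain the remaining jobs, then join them. */
void workqueue_stop(struct workqueue *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stopping = 1;
    pthread_cond_broadcast(&wq->wake);
    pthread_mutex_unlock(&wq->lock);
    for (int i = 0; i < wq->nworkers; i++)
        pthread_join(wq->workers[i], NULL);
    free(wq->workers);
    pthread_mutex_destroy(&wq->lock);
    pthread_cond_destroy(&wq->wake);
}

/* Example job: atomically bump the counter passed as arg. */
void bump_counter(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

With a queue like this, the event loop would call something like `workqueue_submit` with the request-processing function and the per-request data once the descriptor has been removed from epoll.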

At the end of the request processing function, the descriptor which was passed to the function for processing needs to be added back to the list of descriptors epoll should monitor.
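As an illustration of that last step, the sketch below shows a worker-side function that handles one request and then hands the descriptor back to epoll. The names `struct request` and `process_request` are hypothetical, and the echo stands in for whatever the real protocol does with the request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-request context created by the event loop. */
struct request {
    int epfd;   /* epoll instance owned by the event loop thread */
    int fd;     /* client connection, currently not monitored */
};

/* Runs on a worker thread: handle one request, then resume monitoring.
 * Takes ownership of the heap-allocated struct request. */
void process_request(void *arg)
{
    struct request *req = arg;
    assert(req != NULL);

    char buf[512];
    int keep = 0;
    ssize_t n = read(req->fd, buf, sizeof(buf));

    /* Placeholder for real protocol handling: echo the bytes back. */
    if (n > 0 && write(req->fd, buf, (size_t)n) == n) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;
        ev.data.fd = req->fd;
        /* Add the descriptor back so the next request is noticed. */
        keep = (epoll_ctl(req->epfd, EPOLL_CTL_ADD, req->fd, &ev) == 0);
    }
    if (!keep)
        close(req->fd);   /* peer closed, I/O error, or re-add failed */
    free(req);
}
```

Calling `epoll_ctl` from a worker while the main thread is blocked in `epoll_wait` on the same epoll instance is allowed; the waiting thread will see events for the newly added descriptor.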

There is only one server thread running the epoll event loop, and the number of workqueues is chosen according to the number of cores. Since I had a four-core system, I had three workqueues and one thread with the epoll event loop. This kind of system can scale much better than naive approaches such as one thread or one process per client connection. Two pieces of further work remain. The first is to get the EPOLLONESHOT flag to work, as I am sure I was doing something wrong while trying to use it. The second is to use non-blocking sockets instead of blocking ones; however, that will be quite a bit of work, as non-blocking sockets are not so easy to handle.
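For anyone who wants to pick up that further work, here is a hedged sketch of how EPOLLONESHOT is generally used. It is not taken from the repository, and the helper names (`watch_oneshot`, `rearm_oneshot`, `set_nonblocking`) are hypothetical. With EPOLLONESHOT the kernel disables the descriptor after delivering one event, so the explicit removal before queueing a job becomes unnecessary, and the worker re-arms it with EPOLL_CTL_MOD instead of adding it back.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Shared helper: register or re-arm fd for a single EPOLLIN event. */
static int oneshot_ctl(int epfd, int op, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Event loop side: monitor fd, delivering at most one event until
 * it is re-armed. */
int watch_oneshot(int epfd, int fd)
{
    return oneshot_ctl(epfd, EPOLL_CTL_ADD, fd);
}

/* Worker side: call after the request has been processed. */
int rearm_oneshot(int epfd, int fd)
{
    return oneshot_ctl(epfd, EPOLL_CTL_MOD, fd);
}

/* Switch a descriptor to non-blocking mode. Reads and writes then
 * fail with EAGAIN instead of blocking, which the caller must handle. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Note that the descriptor stays registered while it is disabled, so re-arming must use EPOLL_CTL_MOD; a second EPOLL_CTL_ADD on the same descriptor fails with EEXIST.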