C++ Futures at Instagram  February 2, 2016 – 12:29 pm

Over the past few months, we’ve built two high-performing recommendation services that handle tens of thousands of queries per second and generate tens of millions of connections per day. In this blog post, we want to share our experience of scaling these two services using Futures and, most importantly, how we fine-tuned the details.

  • The first recommendation service is “Suggested Users.” The SU service fetches candidate accounts from various sources, such as your friends, accounts that you may be interested in, and popular accounts in your area. Then, a machine learning model blends them to produce a list of personalized account suggestions. It powers the people tab on explore, as well as one of the entry points after new users sign up. It is an important means of people discovery on Instagram and generates tens of millions of follows per day.
  • The second service is “Chaining.” Chaining generates a list of accounts that a viewer may be interested in during every profile load. The performance of this service is important: it must be ready to serve every profile visit, which translates to over 30,000 queries per second.

These two services share similar infrastructure: they need to make outbound network calls to retrieve suggestions from various sources, load features, and rank the suggestions before returning them to our Django backend for instrumentation and filtering.

The Thrift threading model

While most of our backend logic lives in Django, we write the services that generate and rank suggestions in C++ using fbthrift. To understand the evolution of our services’ threading model, we need to understand the life cycle of a thrift request. An fbthrift server has three kinds of threads: acceptor threads, I/O threads and worker threads.

When a request comes in:

  1. An acceptor thread accepts the client connection and assigns it to an I/O thread;
  2. The I/O thread reads the input data sent by the client and passes it to a worker thread. The same I/O thread is later responsible for sending outbound requests;
  3. The worker thread deserializes the input data into parameters, calls the request handler of the service in its context and spawns additional threads for outbound calls or computation.

The important part is that the thrift request handler runs in a worker thread and not in an I/O thread. This allows the server to be responsive to clients - even if all the worker threads are busy, the server will still have free I/O threads to send an overloaded response to clients and close sockets.
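The hand-off described above can be illustrated with a small sketch. This is not fbthrift's actual mechanism: `BoundedWorkQueue` and `dispatchFromIoThread` are hypothetical names, and the queue capacity stands in for "all worker threads are busy." The point is only that the I/O thread never blocks, so it stays free to answer "overloaded."

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical bounded hand-off queue between I/O threads and workers.
class BoundedWorkQueue {
 public:
  explicit BoundedWorkQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Called by an I/O thread. Never blocks: returns false when full.
  bool tryPush(std::string request) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (pending_.size() >= capacity_) {
      return false;
    }
    pending_.push_back(std::move(request));
    return true;
  }

  // Called by a worker thread. Returns false when there is no work.
  bool tryPop(std::string& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (pending_.empty()) {
      return false;
    }
    out = std::move(pending_.front());
    pending_.pop_front();
    return true;
  }

 private:
  const std::size_t capacity_;
  std::mutex mutex_;
  std::deque<std::string> pending_;
};

// What an I/O thread might do with a freshly read request (assumption):
// hand it to the workers if there is room, otherwise reject it at once.
std::string dispatchFromIoThread(BoundedWorkQueue& queue,
                                 std::string request) {
  return queue.tryPush(std::move(request)) ? "accepted" : "overloaded";
}
```

With a capacity of 2, the third request dispatched before any worker pops is answered with "overloaded" immediately instead of waiting for a free worker.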

Synchronous I/O: The initial version

The initial version of the service loaded candidates and features synchronously. To reduce latency, all the I/O calls were issued in parallel in separate threads. At the end of the handler was a join primitive which blocked until all the threads were done. What this essentially means is that one worker thread could only service one client request at a time, and one single request would block as many threads as the number of outbound calls.
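As a minimal sketch of this pattern, assuming plain `std::thread` rather than whatever primitives the original service used: `fetchFromSource` is a hypothetical stand-in for a blocking outbound call, and `handleRequestSync` is a hypothetical handler that spawns one thread per call and then joins them all.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking outbound call (for example,
// fetching candidates from one source). The sleep simulates latency.
std::string fetchFromSource(const std::string& source) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  return "candidates-from-" + source;
}

// Hypothetical handler in the style of the initial version: one thread
// per outbound call, then a join that blocks the calling worker thread.
std::vector<std::string> handleRequestSync(
    const std::vector<std::string>& sources) {
  std::vector<std::string> results(sources.size());
  std::vector<std::thread> threads;
  threads.reserve(sources.size());

  for (std::size_t i = 0; i < sources.size(); ++i) {
    // Each thread writes only to its own slot, so no locking is needed.
    threads.emplace_back([&results, &sources, i] {
      results[i] = fetchFromSource(sources[i]);
    });
  }

  // The worker thread is stuck here until every outbound call returns,
  // so it cannot service any other client request in the meantime.
  for (std::thread& t : threads) {
    t.join();
  }
  assert(results.size() == sources.size());
  return results;
}
```

A call such as `handleRequestSync({"friends", "popular", "nearby"})` occupies four threads for the duration of the slowest call: the worker plus one per source.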

This has several disadvantages:

  1. It leads to a large memory footprint - each thread by default has a stack size of several MBs.
  2. We need a separate worker thread to service each client request (and more threads created in the handler to make the I/O calls in parallel). With N concurrent client requests, each making M outbound calls, we will have O(M * N) threads waiting for responses.
  3. Thread scheduling also becomes a bottleneck in the kernel at around 400 threads.
  4. With this model, we had to run several hundred instances of the server across many machines to support our QPS, because we were not using CPU or memory efficiently.
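To make the thread and memory cost concrete, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions, not measurements from the services: 100 concurrent requests, 5 outbound calls each, and an 8 MiB stack per thread (a common Linux default; the stack is reserved address space and is not necessarily all resident). `estimateBlockingModel` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

struct ThreadEstimate {
  std::uint64_t threads;   // threads alive while requests are in flight
  std::uint64_t stackMiB;  // stack space reserved for those threads
};

// Estimates the cost of the blocking model: every in-flight request
// holds one worker thread plus one thread per outbound call.
ThreadEstimate estimateBlockingModel(std::uint64_t concurrentRequests,
                                     std::uint64_t callsPerRequest,
                                     std::uint64_t stackMiBPerThread) {
  assert(stackMiBPerThread > 0);
  const std::uint64_t threads = concurrentRequests * (callsPerRequest + 1);
  return ThreadEstimate{threads, threads * stackMiBPerThread};
}
```

Under those assumptions, `estimateBlockingModel(100, 5, 8)` gives 600 threads and 4800 MiB of reserved stack, already past the roughly 400 threads at which scheduling became a bottleneck.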

Clearly, there was room for improvement.

Using non-blocking I/O

fbthrift offers three ways to handle requests: synchronous, asynchronous and future-based. The latter two offer non-blocking I/O, which works as follows: every I/O thread has a list of file descriptors and waits in an event loop for their status to change (it detects the change through the select/poll/epoll system calls). When the status of a file descriptor changes to “completed,” the I/O thread calls the associated callback. In order to do non-blocking I/O under this mechanism, two things need to be specified:
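The event-loop mechanism itself can be sketched with POSIX `poll()`. This is an illustration under stated assumptions, not fbthrift code: `TinyEventLoop` and `demoReadViaCallback` are hypothetical, "completed" is approximated as "readable," callbacks are one-shot, and a pipe stands in for a network socket.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal event loop: maps file descriptors to callbacks
// and invokes a callback once poll() reports its descriptor readable.
class TinyEventLoop {
 public:
  using Callback = std::function<void(int)>;

  void watch(int fd, Callback onReadable) {
    callbacks_[fd] = std::move(onReadable);
  }

  // Waits up to timeoutMs, dispatches ready callbacks, and returns
  // how many callbacks were invoked.
  int runOnce(int timeoutMs) {
    std::vector<pollfd> fds;
    for (const auto& entry : callbacks_) {
      fds.push_back(pollfd{entry.first, POLLIN, 0});
    }
    if (poll(fds.data(), fds.size(), timeoutMs) <= 0) {
      return 0;  // timeout or error: nothing to dispatch
    }
    int fired = 0;
    for (const pollfd& p : fds) {
      auto it = callbacks_.find(p.fd);
      if ((p.revents & POLLIN) && it != callbacks_.end()) {
        Callback cb = std::move(it->second);
        callbacks_.erase(it);  // one-shot: stop watching before calling
        cb(p.fd);
        ++fired;
      }
    }
    return fired;
  }

 private:
  std::unordered_map<int, Callback> callbacks_;
};

// Demo: the "response" arrives on a pipe and is picked up by a callback.
std::string demoReadViaCallback() {
  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }
  std::string received;
  TinyEventLoop loop;
  loop.watch(fds[0], [&received](int fd) {
    char buf[64];
    const ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
      received.assign(buf, static_cast<std::size_t>(n));
    }
  });

  // Nothing has been written yet, so no callback fires.
  const int firedBefore = loop.runOnce(0);
  assert(firedBefore == 0);
  (void)firedBefore;

  const char msg[] = "pong";
  if (write(fds[1], msg, 4) == 4) {
    loop.runOnce(1000);  // the descriptor is readable: callback runs
  }
  close(fds[0]);
  close(fds[1]);
  return received;
}
```

The thread running `runOnce` is never parked on a single descriptor, which is what lets one I/O thread wait on many outstanding calls at once.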

