One of the interesting papers presented at USENIX 2009 was “Black-Box Performance Control for High-Volume Non-Interactive Systems” [pdf][html][slides]. Since this is right up my alley, I paid close attention and took some notes. The paper was authored by several IBM Research folks: Chunqiang Tang, Sunjit Tara, Rong N. Chang and Chun Zhang.
First of all, this is interesting and thought-provoking work. However, the paper deals with a very constrained environment: throughput-centric systems with only a single pool of threads. I have reservations about the general applicability of the system to, say, disk scheduling. Still, their black-box treatment of the system (multiple unknown bottlenecks) is quite interesting, and it really made me wonder how else it could be extended. The main problem is that if you have multiple controls in the system (e.g. CPU, memory, disk, etc.), the online search they are effectively performing will get really tricky. Nevertheless, good food for thought.
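To make the single-knob case concrete, here is a minimal sketch of an online search over one control, the thread count, against a system treated as a black box. This is not the algorithm from the paper; it is a plain hill climb, and the names `hill_climb_threads`, `measure_throughput`, `min_gain` and the toy throughput model are all my own assumptions for illustration.

```python
from typing import Callable


def hill_climb_threads(
    measure_throughput: Callable[[int], float],
    start: int = 1,
    max_threads: int = 64,
    min_gain: float = 0.01,
) -> int:
    """Add one thread at a time while throughput keeps improving.

    measure_throughput is the black box: it runs the system with the
    given thread count and reports the observed throughput.
    """
    threads = start
    best = measure_throughput(threads)
    while threads < max_threads:
        candidate = threads + 1
        observed = measure_throughput(candidate)
        # Stop once an extra thread no longer buys at least min_gain (1%).
        if observed <= best * (1 + min_gain):
            break
        threads, best = candidate, observed
    return threads


def toy_throughput(threads: int) -> float:
    # Hypothetical system: scales linearly up to 8 threads, after which
    # some hidden bottleneck makes every extra thread cost a little.
    return float(min(threads, 8)) - 0.1 * max(0, threads - 8)


chosen = hill_climb_threads(toy_throughput)
```

With one knob the search is a walk along a line. With several knobs (say, separate limits for CPU, memory and disk) every probe has to pick a direction in a multi-dimensional space, and each probe costs a real measurement interval on a live system, which is why I expect the search to get tricky.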
Slide 7 explains why they didn’t use TCP: (1) they deal with general distributed systems rather than just the network, (2) there is no packet loss to serve as a performance indicator, and (3) unlike routers, a general-purpose server’s service time is not constant. It turns out we encountered each of these in the PARDA work [pdf][html][slides][blog references 1 2 3 4 5].
In slide 22 they also raise the corresponding general research question of how far TCP can be taken. During the Q&A period I pointed them to our work along the lines of using TCP-style control for distributed IO scheduling in “PARDA”. Some of the authors from IBM subsequently read our PARDA paper, really liked it, and emailed us privately to discuss it further. I gave them some ideas on how the PARDA concepts could be applied to different resource allocation problems.