MultiLanes (from USENIX FAST 2014)

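With containers, the file system on the host OS becomes a bottleneck and slows everything down. This paper looks into a way to fix that.
The motivation is that fast storage such as flash and storage-class memory has appeared, and storage metadata management has not kept up with it.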

The study targets a single 16-core host.
The modified parts are the device mapper and the VFS. Contention in the I/O path is avoided as follows:

The device mapper exposes a separate device for each image.
For the VFS, a separate VFS is set up for each device.
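To make "one device per image" concrete, here is a user-space Python sketch of a loop-device-style mapping. Each container's image is a regular file, and sector N of the device is simply byte offset N * 512 of that file. This is not MultiLanes' kernel driver. The class name `FileBackedDevice` and the 512-byte sector size are my own assumptions. Every I/O in this sketch still goes through the host file system. As I read the paper, cutting that cost is what the block translation in MultiLanes' driver is for.

```python
import os

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


class FileBackedDevice:
    """Hypothetical sketch: present one regular image file as a sector-addressed device."""

    def __init__(self, image_path: str, num_sectors: int) -> None:
        self.num_sectors = num_sectors
        # Create the image if needed and size it so every sector is addressable
        # (regions never written read back as zeros).
        self._fd = os.open(image_path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self._fd, num_sectors * SECTOR_SIZE)

    def _check_range(self, sector: int, count: int) -> None:
        if sector < 0 or count <= 0 or sector + count > self.num_sectors:
            raise ValueError(f"sectors {sector}..{sector + count - 1} out of range")

    def read(self, sector: int, count: int = 1) -> bytes:
        self._check_range(sector, count)
        # Device sector N is byte offset N * SECTOR_SIZE in the image file.
        return os.pread(self._fd, count * SECTOR_SIZE, sector * SECTOR_SIZE)

    def write(self, sector: int, data: bytes) -> None:
        if len(data) == 0 or len(data) % SECTOR_SIZE:
            raise ValueError("writes must cover whole sectors")
        self._check_range(sector, len(data) // SECTOR_SIZE)
        os.pwrite(self._fd, data, sector * SECTOR_SIZE)

    def close(self) -> None:
        os.close(self._fd)
```

For example, `FileBackedDevice("/path/to/c1.img", 2048)` (a hypothetical path) would give one container a 1 MiB device. A second container would get its own image file and its own instance, so the two never share a device object.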

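The device-mapper part is implemented with bios (not the multiqueue block layer).
Embarrassingly late, I only just learned about lockstat. It is available on Linux, counts lock contentions and acquisitions, and measures wait and hold times.
That said, I had already heard about XFS's scalability, so that part didn't surprise me much, and Ext3 and Ext4 losing scalability seemed only natural. Still, I was oddly impressed that we've reached the point where file systems have to be evaluated this rigorously.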
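For reference, lockstat needs a kernel built with CONFIG_LOCK_STAT=y. Per the lockstat documentation:

- turn collection on with `echo 1 > /proc/sys/kernel/lock_stat`
- reset the counters with `echo 0 > /proc/lock_stat`
- read the results from /proc/lock_stat

Below is a small Python helper for ranking lock classes by contention from that text. The function names are my own. It assumes the usual layout: a "class name" header line followed by summary rows of the form `name: numbers...`.

```python
import re
from typing import Dict, List, NamedTuple

# A per-class summary row looks like "<class name>: <number> <number> ...".
# Call-site rows ("[<ffffffff...>] func+0x..") and separator lines do not match.
_ROW_RE = re.compile(r"^\s*(\S.*?):((?:\s+\d+(?:\.\d+)?)+)\s*$")


class LockClassStats(NamedTuple):
    name: str
    stats: Dict[str, float]  # header column name -> value as printed


def parse_lock_stat(text: str) -> List[LockClassStats]:
    """Parse the summary rows of /proc/lock_stat-formatted text."""
    columns: List[str] = []
    rows: List[LockClassStats] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("class name"):
            # Take the column names from the header rather than hard-coding them,
            # so the parser does not depend on one particular format version.
            columns = stripped[len("class name"):].split()
            continue
        match = _ROW_RE.match(line)
        if not columns or not match:
            continue
        values = match.group(2).split()
        if len(values) != len(columns):
            continue  # skip anything that does not line up with the header
        rows.append(LockClassStats(match.group(1), dict(zip(columns, map(float, values)))))
    return rows


def top_contended(text: str, n: int = 10) -> List[LockClassStats]:
    """Return the n lock classes with the most contentions."""
    rows = parse_lock_stat(text)
    return sorted(rows, key=lambda r: r.stats.get("contentions", 0.0), reverse=True)[:n]
```

Run as root, e.g. `top_contended(open("/proc/lock_stat").read(), 5)`. It prints nothing itself; it just returns the rows, so you can sort or filter on whichever column you care about.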

USENIX FAST 2014 MultiLanes
- MultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores | USENIX
Tool for measuring lock overhead, etc.
- https://github.com/torvalds/linux/blob/master/Documentation/lockstat.txt
Device mapper
- http://lc.linux.or.jp/lc2009/slide/T-02-slide.pdf
- 10分で分かるLinuxブロックレイヤ ("The Linux block layer in 10 minutes") - SSSSLIDE

Below is a summary from skimming the paper (bullet points, mostly excerpts).

Introduction
- The virtualized block device
- The partitioned VFS
- OpenVZ (Linux 2.6.32)
- Linux-VServer (Linux 3.7.10)
- LXC (Linux 3.8.2)
Motivation
- The poor scalability of the storage system is mainly caused by the concurrent accesses to shared data structures and the use of synchronization primitives.
MultiLanes Design
- Design Goals
  - The co-located containers share the same I/O stack, which not only leads to severe performance interference between them but also suppresses flexibility.
  - Design goals:
    1. it should be conceptually simple, self-contained, and transparent to applications and to various file systems;
    2. it should achieve good scalability with the number of containers on the host
    3. it should minimize the virtualization overhead on fast storage media so as to offer near native performance.
- Architectural Overview
  - MultiLanes is composed of two key design modules:
    1. the virtualized storage device

      MultiLanes maps regular files in the host file system to containers as virtualized storage devices, which provides the fundamental basis for running multiple guest file systems.
    2. the pVFS
- Design Components
  - Virtualized Storage
  - Driver Model
    - For the virtualized block device of MultiLanes, the sector region specified in a request is actually a data section of the back-end file. The driver is composed of two major components: block translation and request handling.
    - Block Translation

      The block translation unit of each virtualized driver consists of a cache table, a job queue and a translation thread.
    - Request Handling
  - Partitioned VFS
    - In particular, MultiLanes allocates an inode hash table and a dentry hash table for each container to eliminate the performance interference within the VFS layer.
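The Block Translation bullet (a cache table, a job queue and a translation thread per virtualized driver) can be sketched in user-space Python as below. This is my reading of the idea, not MultiLanes' code. `BlockTranslator` is my own name. The `lookup` callback is a stand-in for however the driver asks where a block of the back-end file lives on the host device, e.g. something bmap-like. The Future-based completion is also my assumption. Cache hits complete immediately. Misses are queued for the translation thread, which resolves them and fills the cache.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Optional, Tuple


class BlockTranslator:
    """Hypothetical per-device translation unit: cache table + job queue + translation thread."""

    def __init__(self, lookup: Callable[[int], int]) -> None:
        # lookup(virtual_block) -> physical block on the host device (assumed interface).
        self._lookup = lookup
        self._cache: Dict[int, int] = {}  # cache table: virtual -> physical block
        self._cache_lock = threading.Lock()
        self._jobs: "queue.Queue[Optional[Tuple[int, Future]]]" = queue.Queue()  # job queue
        self._worker = threading.Thread(target=self._translate_loop, daemon=True)
        self._worker.start()

    def translate(self, vblock: int) -> Future:
        """Return a Future that resolves to the physical block for vblock."""
        fut: Future = Future()
        with self._cache_lock:
            pblock = self._cache.get(vblock)
        if pblock is not None:
            fut.set_result(pblock)  # hit: no hand-off to the thread
        else:
            self._jobs.put((vblock, fut))  # miss: defer to the translation thread
        return fut

    def _translate_loop(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel from close()
                return
            vblock, fut = job
            try:
                pblock = self._lookup(vblock)
            except Exception as exc:  # report lookup failures to the waiting caller
                fut.set_exception(exc)
                continue
            with self._cache_lock:
                self._cache[vblock] = pblock  # fill the cache before completing
            fut.set_result(pblock)

    def close(self) -> None:
        """Stop the translation thread after it drains the queued jobs."""
        self._jobs.put(None)
        self._worker.join()
```

In the real driver this presumably sits behind the make_request_fn entry point, with the remapped bio passed on to the host device via submit_bio. One simplification in the sketch: two concurrent misses on the same block may both call `lookup`. That is harmless here, just slightly wasteful.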
Implementation
- We implemented MultiLanes in the Linux 3.8.2 kernel (LXC)
- Driver Implementation
  - standard interface
    - make_request_fn
    - submit_bio
- pVFS Implementation
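The pVFS bullet (one inode hash table and one dentry hash table per container) boils down to giving each container its own lock-protected tables. Lookups in one container then never touch another container's lock or buckets. Below is a minimal Python sketch. `LockedTable` and `PartitionedVFS` are hypothetical names, and keying dentries by `(parent inode, name)` is an illustrative assumption, not the kernel's actual dcache/icache layout.

```python
import threading
from typing import Dict, Hashable, Iterable, Optional


class LockedTable:
    """A hash table guarded by its own lock (stand-in for a kernel hash table + lock)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, object] = {}
        self.lock_acquisitions = 0  # how often this table's lock was taken

    def insert(self, key: Hashable, value: object) -> None:
        with self._lock:
            self.lock_acquisitions += 1
            self._entries[key] = value

    def lookup(self, key: Hashable) -> Optional[object]:
        with self._lock:
            self.lock_acquisitions += 1
            return self._entries.get(key)


class PartitionedVFS:
    """Hypothetical sketch: per-container inode and dentry tables instead of global ones."""

    def __init__(self, containers: Iterable[str]) -> None:
        names = list(containers)
        # Tables are created once per container up front, so the per-operation
        # path only ever touches that container's own locks.
        self.inode_tables = {c: LockedTable() for c in names}
        self.dentry_tables = {c: LockedTable() for c in names}

    def add_inode(self, container: str, ino: int, inode: object) -> None:
        self.inode_tables[container].insert(ino, inode)

    def find_inode(self, container: str, ino: int) -> Optional[object]:
        return self.inode_tables[container].lookup(ino)

    def add_dentry(self, container: str, parent_ino: int, name: str, ino: int) -> None:
        self.dentry_tables[container].insert((parent_ino, name), ino)

    def find_dentry(self, container: str, parent_ino: int, name: str) -> Optional[object]:
        return self.dentry_tables[container].lookup((parent_ino, name))
```

In a shared-table variant, every container's operations would go through the same lock. Here, each table's `lock_acquisitions` counter only moves with its own container's operations.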
Evaluation
- Experimental Setup
  - using a RAM disk could rule out any effect from SSDs so as to measure the maximum scalability benefits of MultiLanes.
  - Questions
    - Does MultiLanes achieve good scalability with the number of containers on many cores?
    - Are all of MultiLanes' design components necessary to achieve such good scalability?
    - Does the overhead induced by MultiLanes contribute marginally to the performance under most workloads?
  - Server
    - Intel 16-core machine with Intel Xeon(R) E7520 (1.87GHz)
      - 32KB L1 data cache, 32KB L1 instruction cache
      - 256KB L2 cache
      - 18MB L3 cache
      - Hyper-Threading off
    - 64GB memory
    - RAM disk size 40GB
    - Lock usage statistics on
- Performance Results
  - Microbenchmarks
    - Ocrd (Metadata intensive)
    - IOzone (Data intensive)
  - Macrobenchmarks
    - Filebench
    - MySQL
- Overhead Analysis
  - Apache Build (file I/O less intensive)
    - overhead: none
  - Webserver (read intensive)
    - overhead: none
  - Streamwrite (write intensive)
    - around 10% degradation