In our projects, when serving video from servers, the question of disk subsystem throughput comes up. A gigabit network interface can obviously push more data than, say, a two-disk RAID 0 can read. If all video clips were equally popular, the hard disks would be the bottleneck when serving content.

We, however, have been lucky: there has always been a small set of clips that accounts for 80% of the traffic, followed by a long tail of less frequently accessed clips. This small set resides in the file system cache and is served with practically no disk activity.

For example, if we have 100 gigabytes of clips on a server with 24 gigabytes of memory, we can draw the following graph. Along the horizontal axis, files are sorted by popularity; along the vertical axis is the traffic generated by each file. The total traffic is the area under the curve.

(figure: traffic per file, files sorted by popularity; the hot head fits in the file system cache)
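To make the area-under-the-curve reasoning concrete, here is a minimal Python sketch with a made-up popularity distribution (the 80/20 split described above is an assumption for illustration): sort the files hottest-first, fill the 24 GB of memory with the head, and sum that head's share of the traffic.

CACHE_GB = 24

# Hypothetical distribution: 200 files of 0.5 GB each (100 GB in total);
# a hot head of 10 files carries 80% of the traffic, the remaining
# 190 files share the other 20%. The list is sorted hottest-first.
files = [(0.5, 0.80 / 10)] * 10 + [(0.5, 0.20 / 190)] * 190

cached_gb = cached_share = 0.0
for size_gb, share in files:
    if cached_gb + size_gb > CACHE_GB:
        break
    cached_gb += size_gb
    cached_share += share

print("traffic served from cache: ~%.0f%%" % (100 * cached_share))   # ~84%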

On the other hand, there is always the question of how much memory a server should have, and of how to measure the cache hit/miss ratio. In this post, we will look at how to do this.

As far as I know, there is no standard utility that reports the hit ratio of the file system cache, so we measure it with SystemTap.

Here is the script; some notes on it follow below.

global total_bytes, disk_bytes, counter, overall_cache_bytes, overall_disk_bytes

# Count every successful read() at the VFS layer.
probe vfs.read.return {
        if ($return > 0) {
                total_bytes += $return
        }
}

# The server serves files via sendfile(), which bypasses vfs.read,
# so account for it separately.
probe generic.fop.sendfile.return {
        if ($return > 0) {
                total_bytes += $return
        }
}

# Count read requests that actually reach the block layer, i.e. page
# cache misses. Replace "dm-0" with the device your content lives on.
probe ioblock.request {
        if (rw == 0 && size > 0 && devname == "dm-0") {
                disk_bytes += size
        }
}

probe begin {
        disk_bytes = 0
        total_bytes = 0
}

# Once a second, print the totals; repeat the header every 15 lines.
probe timer.s(1) {
        if (counter % 15 == 0) {
                printf ("\n%18s %18s %18s %10s %10s\n",
                        "Total Reads (KB)", "Cache Reads (KB)", "Disk Reads (KB)", "Miss Rate", "Hit Rate")
        }
        counter++

        # Whatever was read but never reached the disk came from the cache.
        # Metadata I/O can push disk_bytes above total_bytes, so clamp at zero.
        cache_bytes = total_bytes - disk_bytes
        if (cache_bytes < 0) {
                cache_bytes = 0
        }
        # SystemTap arithmetic is integer-only: scale by 10000 to keep
        # two decimal digits of the percentage.
        if (cache_bytes + disk_bytes > 0) {
                hitrate =  10000 * cache_bytes / (cache_bytes + disk_bytes)
                missrate = 10000 * disk_bytes / (cache_bytes + disk_bytes)
        } else {
                hitrate = 0
                missrate = 0
        }
        printf ("%18d %18d %18d %6d.%02d%% %6d.%02d%%\n",
                total_bytes/1024, cache_bytes/1024, disk_bytes/1024,
                missrate/100, missrate%100, hitrate/100, hitrate%100)
        overall_cache_bytes += cache_bytes
        overall_disk_bytes += disk_bytes
        total_bytes = 0
        disk_bytes = 0
}

# On exit (e.g. Ctrl-C), print the averages over the whole run; guard
# against division by zero if nothing was measured.
probe end {
        if (overall_cache_bytes + overall_disk_bytes > 0) {
                avg_hitrate =  10000 * overall_cache_bytes / ( overall_cache_bytes + overall_disk_bytes )
                avg_missrate = 10000 * overall_disk_bytes  / ( overall_cache_bytes + overall_disk_bytes )
                printf("\n%s: %d.%02d\n%s: %d.%02d\n",
                        " Average Hit Rate", avg_hitrate/100, avg_hitrate%100,
                        "Average Miss Rate", avg_missrate/100, avg_missrate%100)
        }
}

The server serves files not with read()/write() but with sendfile(), so most of the read traffic is captured by the generic.fop.sendfile.return probe. Also, since we count only read and sendfile, while disk reads can also be triggered by stat, readdir, getdents, and similar metadata operations, disk_bytes can at times exceed total_bytes; this is why the script clamps cache_bytes at zero. But 99% of the operations on this server are read or sendfile, so we ignore this fairly minor distortion.
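A side note on the arithmetic: the SystemTap language is integer-only, so the script scales the ratio by 10000 and splits off the last two digits when printing, yielding two decimal places. Here is a minimal Python sketch of the same fixed-point trick (the function name is ours, for illustration):

def hit_rate_str(cache_bytes, disk_bytes):
    # Mirrors the script: 10000 * cache / total gives the hit rate
    # in hundredths of a percent, using integers only.
    total = cache_bytes + disk_bytes
    hitrate = 10000 * cache_bytes // total if total > 0 else 0
    return "%d.%02d%%" % (hitrate // 100, hitrate % 100)

print(hit_rate_str(113894, 2256))   # -> 98.05%, matching the first line below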

We get roughly the following results:

# stap cache-hit-rate.stp
  Total Reads (KB)   Cache Reads (KB)    Disk Reads (KB)  Miss Rate   Hit Rate
            116150             113894               2256      1.94%     98.05%
            119515             116643               2872      2.40%     97.59%
            117422             114438               2984      2.54%     97.45%
            115679             112203               3476      3.00%     96.99%
            121254             118526               2728      2.24%     97.75%
            119375             116059               3316      2.77%     97.22%
            114101             109885               4216      3.69%     96.30%
            114776             112416               2360      2.05%     97.94%
            118970             116062               2908      2.44%     97.55%
            121416             117820               3596      2.96%     97.03%
            115386             112790               2596      2.24%     97.75%
            119955             116831               3124      2.60%     97.39%
            118245             115893               2352      1.98%     98.01%
            118250             115398               2852      2.41%     97.58%
            117079             114171               2908      2.48%     97.51%

  Total Reads (KB)   Cache Reads (KB)    Disk Reads (KB)  Miss Rate   Hit Rate
            112801             110241               2560      2.26%     97.73%
            118078             115742               2336      1.97%     98.02%
            117863             114659               3204      2.71%     97.28%
            119600             117120               2480      2.07%     97.92%
            118293             114977               3316      2.80%     97.19%

Average Hit Rate: 97.51
Average Miss Rate: 2.48
#