As average C/C++ programmers, we usually use standard library functions to read and write files. Unless you are digging into the functions provided by your OS, you will most likely use streams from either the C or the C++ library for most of your work. For everyday use that is pretty much OK, but there is still the question of which of them to use.
I have always avoided the fgetc() family because it reads one character at a time. But lately I was about to write a small tokenizer, where reading one character at a time is really tempting. I had even read on different forums and in articles that it really does not matter whether you use fgetc(), fgets() or fread(), since the OS and the library buffer streams internally anyway. Well, I wanted to check if this is true and wrote a simple benchmark to test all three functions on my system.
Surprise: even with the internal buffering, fgetc() is consistently around 10 times slower than fread(), while fgets() is about 2 times slower. The test data consists of randomly generated printable ASCII characters, in sizes from 1 kilobyte up to 1 gigabyte, mostly growing by a factor of 10: 1k, 10k, 100k and so on. All measurements were done 3 times and the mean value calculated. The timer used is the one found at http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer/ (Edit 2017-02-05: the page seems to be down now).
The first comparison is between fgetc(), fgets() and fread(), with all times in seconds:
SIZE    fgetc()      fgets()     fread()
1K      0.000170     0.000045    0.000029
10K     0.001288     0.000301    0.000103
100K    0.012736     0.002904    0.000848
1M      0.120394     0.026483    0.007996
10M     1.120597     0.282562    0.080213
100M    10.798302    2.541511    0.744125
200M    21.437850    5.030052    1.488380
500M    -            -           3.819704
1G      -            -           7.494000
What we see is that fread() is around 10x or more faster than fgetc(), and around 1.5x to 3x faster than fgets(), especially for bigger files. It is also clearly visible that the execution time of all three methods depends linearly on the file size.
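For reference, the three read loops in the benchmark boil down to something like the sketch below. This is not the exact benchmark code: the file name "test.dat", the chunk sizes and the minimal error handling are all placeholders of mine.

```c
#include <stdio.h>
#include <stdlib.h>

static void read_with_fgetc(FILE *f)
{
    int c;
    while ((c = fgetc(f)) != EOF) {
        /* consume one character at a time */
    }
}

static void read_with_fgets(FILE *f)
{
    char line[1024];
    while (fgets(line, sizeof line, f) != NULL) {
        /* consume up to one line (or 1023 chars) at a time */
    }
}

static void read_with_fread(FILE *f)
{
    enum { CHUNK = 1 << 20 };           /* hypothetical 1M chunk */
    char *buf = malloc(CHUNK);
    size_t n;
    if (buf == NULL)
        return;
    while ((n = fread(buf, 1, CHUNK, f)) > 0) {
        /* consume n bytes at a time */
    }
    free(buf);
}

int main(void)
{
    FILE *f = fopen("test.dat", "rb");  /* placeholder test file */
    if (f == NULL)
        return 1;
    read_with_fgetc(f);  rewind(f);     /* time each method separately */
    read_with_fgets(f);  rewind(f);
    read_with_fread(f);
    fclose(f);
    return 0;
}
```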
For the second comparison, setvbuf() was used so that the buffer is allocated on the stack with a size of BUFSIZ * 4, which on my system translates into 512 * 4 bytes:
SIZE    fgetc()      fgets()     fread()
1K      0.000147     0.000044    0.000026
10K     0.001300     0.000312    0.000103
100K    0.012643     0.002995    0.000856
1M      0.110152     0.025210    0.007382
10M     1.077510     0.256205    0.074444
100M    11.294960    2.565854    0.743703
200M    21.893663    5.197142    1.492345
500M    -            -           3.769666
1G      -            -           7.475868
I don’t think those differences are really meaningful, since they seem to vary a little bit at random.
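For clarity, the setvbuf() call in this run looks roughly like this (a minimal sketch; the 4 * BUFSIZ size comes from the text above, the file name is a placeholder):

```c
#include <stdio.h>

int main(void)
{
    char vbuf[BUFSIZ * 4];              /* stack-allocated stream buffer */
    FILE *f = fopen("test.dat", "rb");  /* placeholder test file */
    if (f == NULL)
        return 1;
    /* setvbuf() must be called after fopen() but before the first read. */
    if (setvbuf(f, vbuf, _IOFBF, sizeof vbuf) != 0)
        return 1;
    /* ... same fgetc()/fgets()/fread() loops as before ... */
    fclose(f);                          /* vbuf must outlive the stream */
    return 0;
}
```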
I also measured fread() alone when used with chunks of different sizes, including reading the entire file in one go. Yep, I did attempt to read a 1 GB file into memory; since I have 16 GB of RAM in this computer, it shouldn't be a problem. I also wanted to see whether using setvbuf() makes a difference or not.
What the table below shows is how long, in seconds, it takes for files of 1k, 10k and so on to be read in chunks of 256 bytes, 512 bytes, 1024 bytes and up to 1 megabyte.
Data is copied with fread() into a user-allocated array, with the chunk size varying across the horizontal row and the stream left with its standard buffer size:
SIZE    256         512         1K          4K          16K         32K         1M          whole file
1K      0.000029    0.000023    0.000023    0.000025    0.000025    0.000025    0.000026    0.000029
10K     0.000121    0.000109    0.000107    0.000108    0.000098    0.000099    0.000100    0.000103
100K    0.001048    0.000985    0.000975    0.000954    0.000893    0.000878    0.000856    0.000848
1M      0.009288    0.008926    0.008762    0.008641    0.008180    0.007918    0.007758    0.007996
10M     0.096388    0.088114    0.084811    0.083801    0.078113    0.076242    0.074664    0.080213
100M    0.921991    0.971907    0.875404    0.864597    0.812085    0.818127    0.822893    0.744125
200M    1.787666    1.725685    1.696015    1.672898    1.568236    1.528346    1.495935    1.488380
500M    4.616582    4.526420    4.375277    4.318964    4.055546    3.937942    3.901319    3.819704
1G      9.031612    8.807461    8.797347    8.371005    7.826087    7.617529    7.461680    7.494000
It seems like for small files it does not matter at all: for data up to 10K there is no measurable difference regardless of whether we use a smaller or bigger chunk. From 100K and beyond, the chunk size does seem to make a difference. In the case of 1 gigabyte of data, the difference is around 1.5 seconds, which is around 15% faster in this case (I am just approximating here). The sweet spot seems to be around 16K, since doubling the buffer beyond that does not really make much of a difference.
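The chunked reads in that table boil down to something like the following sketch (CHUNK is what the columns above vary; the names and the file are placeholders of mine, not the benchmark's):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file in fixed-size chunks with fread(), leaving the
   stream buffer at its default. 16K is the sweet spot observed above. */
enum { CHUNK = 16 * 1024 };

int main(void)
{
    FILE *f = fopen("test.dat", "rb");  /* placeholder test file */
    char *buf = malloc(CHUNK);
    size_t n;
    unsigned long total = 0;

    if (f == NULL || buf == NULL)
        return 1;
    while ((n = fread(buf, 1, CHUNK, f)) > 0)
        total += (unsigned long)n;      /* process the n bytes here */
    printf("read %lu bytes\n", total);
    free(buf);
    fclose(f);
    return 0;
}
```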
Finally, what about setting the stream buffering to our own user buffer by calling setvbuf()? This should at least remove one copy operation, which in the case of 1 gigabyte might make a slight difference; but the biggest difference should probably come from fewer OS calls, since by setting a bigger buffer than the standard size we minimize the number of calls made to the OS:
SIZE    256B        512B        1KB         4KB         16KB        32KB        1MB         whole file
1K      0.000043    0.000035    0.000030    0.000029    0.000029    0.000030    0.000030    0.000026
10K     0.000285    0.000192    0.000152    0.000112    0.000104    0.000103    0.000105    0.000103
100K    0.002573    0.001729    0.001296    0.000962    0.000880    0.000863    0.000849    0.000856
1M      0.022267    0.014881    0.011085    0.008344    0.007724    0.007564    0.007498    0.007382
10M     0.223249    0.149266    0.112341    0.084416    0.076963    0.075942    0.075396    0.074444
100M    2.241777    1.532283    1.124452    0.845310    0.770661    0.758015    0.754588    0.743703
200M    4.366671    3.023781    2.267224    1.685182    1.540373    1.515737    1.511035    1.487029
500M    10.990479   7.360141    5.637200    4.225620    3.867625    3.803451    3.819377    3.769666
1G      22.123617   14.891769   11.394952   8.442253    7.823164    7.674322    7.627471    7.475868
If I didn’t do something crazy in my benchmark, there is not much of a difference here. There seems to be a very slight boost, but not consistently, and not below 16K buffers; maybe some microseconds up to half a second, but not always. The difference is probably within measurement noise. If that kind of performance matters for an application, I suppose some other I/O routines than those of the standard library should be used (those provided by the OS, for example).
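For reference, the setup for this last table is roughly the following sketch (size is the value varied across the columns; the names and the file are my placeholders):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 16 * 1024;            /* varied across the columns above */
    char *vbuf = malloc(size);          /* user buffer handed to the stream */
    char *chunk = malloc(size);         /* destination for fread() */
    FILE *f = fopen("test.dat", "rb");  /* placeholder test file */
    size_t n;

    if (f == NULL || vbuf == NULL || chunk == NULL)
        return 1;
    /* The stream now buffers into our own allocation instead of its own. */
    if (setvbuf(f, vbuf, _IOFBF, size) != 0)
        return 1;
    while ((n = fread(chunk, 1, size, f)) > 0)
        ;                               /* consume n bytes */
    fclose(f);
    free(vbuf);
    free(chunk);
    return 0;
}
```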
As a note, setting the buffer to a wrong size, less than or equal to the standard BUFSIZ, seems (not unexpectedly, at least according to the documentation) to effectively result in no buffering at all, as seen when the buffer is 256 or 512 bytes in size. The result is similar to fgetc(), which probably means that fread() ends up reading one byte at a time.
The most important conclusion I can draw from the last two benchmarks is that the size of the buffer where we copy data into the application should be at least BUFSIZ, or a multiple of that size.
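In code, that just means deriving the chunk size from BUFSIZ instead of picking an arbitrary constant (a sketch; the factor 32 is my choice to hit the ~16K sweet spot seen above on a system where BUFSIZ is 512):

```c
#include <stdio.h>

/* Derive the application read buffer from BUFSIZ so it is always a
   multiple of the stream's standard buffer size; 32 * 512 = 16K here. */
enum { CHUNK = 32 * BUFSIZ };

char chunk_buffer[CHUNK];   /* the buffer fread() copies into */
```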
I forgot to include measurements for getc(), which is usually implemented as a macro to remove the function call overhead. I tested it afterwards and got more or less the same results as for fgetc() on my system. As another remark, the results above are from a debug build. The release build did about 50% better in all cases, but I am too lazy to make a nice table; it is not really important, since the ratio between the methods is the same and the same conclusions apply. Here is my bench code; I would really appreciate it if someone could point out any big mistake I made there. To compile it, one also needs the high-resolution timer found on Intel's site (Windows only, unfortunately).
Measured on Windows 8, Intel i7 at 2.4 GHz (an Asus ROG laptop) with 16 GB RAM; the compiler was VS 2012.