
Measuring file IO in C standard library (fgetc, fgets and fread)

As average C/C++ programmers we usually use standard library functions to read and write files. Unless you are digging into the functions provided by your OS, you are most likely using streams from either the C or the C++ library for most of your work. For everyday work that is pretty much OK, but the question remains: which of them should you use?

I have always avoided the fgetc() family because it reads one character at a time. But lately I was about to write a small tokenizer, where reading one character at a time is really tempting. I had even read on different forums and in articles that it does not really matter whether you use fgetc, fgets or fread, since the OS and the library buffer streams internally anyway. Well, I wanted to check if this is true, so I wrote a simple benchmark to test all three functions on my system.

Surprise: even with internal buffering, fgetc is consistently around 10 times slower than fread, while fgets is about 2 times slower. The test data consists of randomly generated printable ASCII characters, in sizes from 1 kilobyte up to 1 gigabyte, growing by a factor of 10: 1K, 10K, 100K and so on. All measurements are done 3 times and the mean value is calculated. The timer used is the one found at http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer/ (Edit 2017-02-05: the page seems to be down now).
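
To give an idea of what is being measured, the three read loops look more or less like this (a simplified sketch without the timer, the repetitions and most error handling; CHUNK and the file name are just placeholders):

#include <stdio.h>
#include <string.h>

#define CHUNK 1024                      /* placeholder buffer size */

/* One character at a time. */
static long read_fgetc(FILE *f)
{
    long total = 0;
    while (fgetc(f) != EOF)
        ++total;
    return total;
}

/* "Line" at a time; with data containing no newlines each call
   returns up to CHUNK - 1 characters. */
static long read_fgets(FILE *f)
{
    char buf[CHUNK];
    long total = 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        total += (long)strlen(buf);
    return total;
}

/* Fixed-size chunks. */
static long read_fread(FILE *f)
{
    char buf[CHUNK];
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long)n;
    return total;
}

int main(void)
{
    FILE *f = fopen("testdata.bin", "rb");   /* placeholder file name */
    if (!f)
        return 1;
    printf("read %ld bytes\n", read_fread(f));   /* swap in read_fgetc/read_fgets to compare */
    fclose(f);
    return 0;
}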

The first comparison is between fgetc(), fgets() and fread(); all times below are in seconds:


SIZE    fgetc()        fgets()        fread()
1K      0.000170       0.000045       0.000029
10K     0.001288       0.000301       0.000103
100K    0.012736       0.002904       0.000848
1M      0.120394       0.026483       0.007996
10M     1.120597       0.282562       0.080213
100M    10.798302      2.541511       0.744125
200M    21.437850      5.030052       1.488380
500M    -              -              3.819704
1G      -              -              7.494000

What we see is that fread() is around 10x or more faster than fgetc(), and around 1.5x to 3x faster than fgets(), especially for bigger files. It is also clearly visible that the execution time of all three methods grows linearly with file size.

In the second comparison setvbuf() was used, so that the stream buffer is allocated on the stack with a size of BUFSIZ * 4, which on my system translates into 512 * 4 bytes (a sketch of this setup is shown after the table):

SIZE    fgetc()        fgets()        fread()
1K      0.000147       0.000044       0.000026
10K     0.001300       0.000312       0.000103
100K    0.012643       0.002995       0.000856
1M      0.110152       0.025210       0.007382
10M     1.077510       0.256205       0.074444
100M    11.294960      2.565854       0.743703
200M    21.893663      5.197142       1.492345
500M    -              -              3.769666
1G      -              -              7.475868

I don’t think these differences are significant, since they seem to vary a little at random.
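
For reference, the setvbuf() variant only differs from the plain version in a few lines; roughly like this (a sketch, with the file name as a placeholder and error handling trimmed down):

#include <stdio.h>

int main(void)
{
    /* Stack-allocated stream buffer of BUFSIZ * 4 bytes (4 * 512 on my system). */
    char streambuf[BUFSIZ * 4];

    FILE *f = fopen("testdata.bin", "rb");
    if (!f)
        return 1;

    /* setvbuf() must be called after fopen() but before the first read;
       it returns non-zero on failure. */
    if (setvbuf(f, streambuf, _IOFBF, sizeof streambuf) != 0) {
        fclose(f);
        return 1;
    }

    /* ... same fgetc()/fgets()/fread() loops as above ... */

    fclose(f);
    return 0;
}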

I also measured fread() alone when used with chunks of different sizes, including reading the entire file in one call. Yep, I did attempt to read a 1 GB file into memory; since I have 16 GB of RAM in this computer it shouldn’t be a problem. What I wanted to see was whether using setvbuf() makes a difference or not.
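
The chunked reading itself is nothing fancy; the loop being timed looks roughly like this (a sketch; the chunk size comes from the column headers below, and for the "filesize" column it is simply the size of the whole file):

#include <stdio.h>
#include <stdlib.h>

/* Read the whole stream in chunks of 'chunk' bytes; returns bytes read, -1 on error. */
static long read_in_chunks(FILE *f, size_t chunk)
{
    char *buf = malloc(chunk);
    long total = 0;
    size_t n;

    if (!buf)
        return -1;
    while ((n = fread(buf, 1, chunk, f)) > 0)
        total += (long)n;
    free(buf);
    return total;
}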

The table below shows how long, in seconds, it takes for files of size 1K, 10K and so on to be read in chunks of 256 bytes, 512 bytes, 1024 bytes and up to 1 megabyte.

Data is copied with fread() into a user-allocated array whose size is given by the column headers, while the stream keeps its standard buffer size:

SIZE	256B	        512B	        1KB	        4KB	        16KB	        32KB	        1MB	        filesize
1K	0.000029	0.000023	0.000023	0.000025	0.000025	0.000025	0.000026	0.000029
10K	0.000121	0.000109	0.000107	0.000108	0.000098	0.000099	0.000100	0.000103
100K	0.001048	0.000985	0.000975	0.000954	0.000893	0.000878	0.000856	0.000848
1M	0.009288	0.008926	0.008762	0.008641	0.008180	0.007918	0.007758	0.007996
10M	0.096388	0.088114	0.084811	0.083801	0.078113	0.076242	0.074664	0.080213
100M	0.921991	0.971907	0.875404	0.864597	0.812085	0.818127	0.822893	0.744125
200M	1.787666	1.725685	1.696015	1.672898	1.568236	1.528346	1.495935	1.488380
500M	4.616582	4.526420	4.375277	4.318964	4.055546	3.937942	3.901319	3.819704
1G	9.031612	8.807461	8.797347	8.371005	7.826087	7.617529	7.461680	7.494000

It seems that for small files the chunk size does not matter at all: for data up to 10K there is no measurable difference regardless of whether we use a smaller or a bigger buffer. From 100K and beyond, the buffer size does make a difference. For 1 gigabyte of data the difference is around 1.5 seconds, which is roughly 15% faster in this case (I am just approximating here). The sweet spot seems to be around 16K, since doubling the buffer beyond that does not really make much of a difference.

Finally, what about setting the stream buffer to our own user buffer by calling setvbuf()? This should at least remove one copy operation, which in the case of 1 gigabyte might make a slight difference; but probably the biggest gain should come from fewer OS calls, since by setting a bigger buffer size than the standard one we minimize the number of calls made to the OS.
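
Schematically, the setup behind the table below is the previous chunked loop with the stdio buffer sized to match the chunk; roughly like this (a simplified sketch with illustrative names; whether the extra copy is really avoided depends on the CRT implementation):

#include <stdio.h>
#include <stdlib.h>

/* Chunked fread() with the stdio buffer sized to match the chunk. */
static long read_with_setvbuf(const char *path, size_t chunk)
{
    char *streambuf = malloc(chunk);   /* buffer handed to setvbuf() */
    char *data      = malloc(chunk);   /* destination for fread() */
    long  total     = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (!f || !streambuf || !data ||
        setvbuf(f, streambuf, _IOFBF, chunk) != 0) {
        total = -1;
        goto done;
    }
    while ((n = fread(data, 1, chunk, f)) > 0)
        total += (long)n;

done:
    if (f)
        fclose(f);
    free(streambuf);
    free(data);
    return total;
}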

SIZE	256B	        512B	        1KB	        4KB	        16KB	        32KB	        1MB	        filesize
1K	0.000043	0.000035	0.000030	0.000029	0.000029	0.000030	0.000030	0.000026
10K	0.000285	0.000192	0.000152	0.000112	0.000104	0.000103	0.000105	0.000103
100K	0.002573	0.001729	0.001296	0.000962	0.000880	0.000863	0.000849	0.000856
1M	0.022267	0.014881	0.011085	0.008344	0.007724	0.007564	0.007498	0.007382
10M	0.223249	0.149266	0.112341	0.084416	0.076963	0.075942	0.075396	0.074444
100M	2.241777	1.532283	1.124452	0.845310	0.770661	0.758015	0.754588	0.743703
200M	4.366671	3.023781	2.267224	1.685182	1.540373	1.515737	1.511035	1.487029
500M	10.990479	7.360141	5.637200	4.225620	3.867625	3.803451	3.819377	3.769666
1G	22.123617	14.891769	11.394952	8.442253	7.823164	7.674322	7.627471	7.475868

Unless I did something crazy in my benchmark, there is not much of a difference here. There seems to be a very slight boost, but not consistently, and not for buffers below 16 kilobytes; maybe some microseconds up to half a second, but not always. The difference is probably not even significant. If that kind of performance matters for an application, I suppose some other IO routines than those of the standard library should be used (those provided by the OS, for example).
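
By "OS routines" I mean something along these lines on Windows; a minimal sketch just to show the shape of it, not benchmarked here and with an illustrative function name:

#include <windows.h>

/* Read a file with the Win32 API instead of stdio; the caller supplies
   the destination buffer and chooses the chunk size. */
static long long read_with_readfile(const char *path, char *buf, DWORD chunk)
{
    long long total = 0;
    DWORD got = 0;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    while (ReadFile(h, buf, chunk, &got, NULL) && got > 0)
        total += got;
    CloseHandle(h);
    return total;
}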

As a note, setting the buffer to the wrong size, less than or equal to the standard BUFSIZ, seems (not unexpectedly, at least according to the documentation) to effectively result in no buffering at all, as seen when the buffer is 256 or 512 bytes in size. The result is similar to fgetc(), which probably means that fread() is reading one byte at a time.

The most important conclusion I can draw from the last two benchmarks is that the buffer into which we copy data into the application should be at least BUFSIZ in size, or a multiple of that size.

I forgot to include a measurement for getc(), which is usually implemented as a macro to remove the function call overhead. I tested it afterwards and got more or less the same results as for fgetc() on my system. As another remark, the results above are from a debug build. The release build did about 50% better in all cases, but I am too lazy to make a nice table; it is not really important, since the ratio between the methods stays the same and the same conclusions apply. Here is my bench code; I would really appreciate it if someone could point out whether I made some big mistake there. To compile it one also needs the high resolution timer found on Intel’s site (Windows only, unfortunately).

Measured on Windows 8, on an i7 CPU at 2.4 GHz (an Asus ROG laptop) with 16 GB of RAM; the compiler was VS 2012.
