Justin's blog: Various Technical Topics
November 16

Creating Text Files - fast_ofstream and Custom FileStream

fast_ofstream

As we saw in the analysis of the ofstream program, we can achieve close to optimal throughput, but not together with the advanced features we require, such as defragmenting. One solution that doesn't involve starting from scratch is to write a custom std::streambuf implementation that can be used with the standard C++ iostreams. The application using fast_ofstream was able to write a defragmented file with a single loop using tellp() at 75MB/s. If we enable Write Through then the Defragment option becomes important.

reserve

By using our own streambuf implementation, we can implement a better Defragment feature. We use an interface similar to std::vector, in which we call a reserve() method passing the desired size in bytes. This is much better because, although we allocate space for the reserved size, we don't actually have to use the space or know for sure how big the file will be. Any unused space is automatically reclaimed when the stream is closed.

tellp()

Since we're implementing our own buffer, we can make tellp() more efficient, which lets us use the simpler code from the original example while still retaining most of the performance benefits.

final results (100MB)
The speed is close to what we observed with the best case using the default ofstream, but we’re able to use more advanced features, the most important being Defragment. Defragment is crucial to avoid overall file system fragmentation and to ensure optimal read speed. The current fast_ofstream implementation is not very robust, and it’s possible that even faster results could be achieved. The biggest remaining problems are:
Custom FileStream

It should be possible to achieve even better performance by avoiding the standard C++ iostream interfaces completely. I created a simple class as a test case.

    enum FS_OPTIONS {
        fsNone = 0,
        fsDisableCaching = 1,
        fsWriteThrough = 2,
        fsSequential = 4
    };

    class FileStream {
        FileStreamImpl* impl_;
    public:
        FileStream(const std::string& fname, FS_OPTIONS opts);
        ~FileStream();
        void write(const std::string& s);
        void write(const std::string& s,
                   std::string::size_type offset,
                   std::string::size_type length);
        unsigned long long size() const;
        void reserve(unsigned long long bytes);
    private:
        FileStream(const FileStream&);
        FileStream& operator=(const FileStream&);
    };

final results (100MB)
For many applications this could be worth the effort of implementing a custom FileStream. Next I'll discuss the use of other languages and platforms.
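To make the fast_ofstream idea from this post a little more concrete, here is a minimal sketch of a custom std::streambuf that answers tellp() from a running byte count instead of asking the file system. This is not the fast_ofstream implementation discussed above: the class name CountingFileBuf is hypothetical, it writes through plain C stdio rather than the Windows API, and it leaves out reserve() and the caching options entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ios>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical name; illustrates the technique only, not the post's fast_ofstream.
class CountingFileBuf : public std::streambuf {
public:
    explicit CountingFileBuf(const char* path, std::size_t bufsize = 64 * 1024)
        : file_(std::fopen(path, "wb")), buf_(bufsize), flushed_(0) {
        assert(bufsize > 0);
        setp(buf_.data(), buf_.data() + buf_.size());
    }
    ~CountingFileBuf() override {
        flush_buffer();
        if (file_) std::fclose(file_);
    }
    bool is_open() const { return file_ != nullptr; }

protected:
    // Called when the put area is full: flush it, then store the pending char.
    int_type overflow(int_type ch) override {
        if (!flush_buffer()) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }
    int sync() override {
        if (!flush_buffer()) return -1;
        return std::fflush(file_) == 0 ? 0 : -1;
    }
    // tellp() arrives here as seekoff(0, cur, out); answer it from the counter.
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if (off == 0 && dir == std::ios_base::cur && (which & std::ios_base::out))
            return pos_type(flushed_ + static_cast<off_type>(pptr() - pbase()));
        return pos_type(off_type(-1));  // real seeking is not supported in this sketch
    }

private:
    // Write the buffered bytes to the file and reset the put area.
    bool flush_buffer() {
        if (!file_) return false;
        const std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        if (n > 0 && std::fwrite(pbase(), 1, n, file_) != n) return false;
        flushed_ += static_cast<off_type>(n);
        setp(buf_.data(), buf_.data() + buf_.size());
        return true;
    }

    std::FILE* file_;
    std::vector<char> buf_;
    off_type flushed_;  // bytes already handed to the file
};
```

A std::ostream constructed with a pointer to a CountingFileBuf can then be used like any other stream, and a loop condition such as out.tellp() <= FILE_BYTES costs only a pointer subtraction and an addition.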
Creating Text Files - Optimizing std::ofstream

Where to start?

Just to make sure I was on the right track, I decided to write a quick proof-of-concept Windows program to verify that I could come close to the desired performance. This program creates a 64K buffer filled with 'X' characters and writes it repeatedly until the file reaches the desired size. The program supports the following options.
With this test program I was able to achieve 77MB/s with caching disabled and 145MB/s with caching enabled for 1000MB test files. Smaller files were even faster, sometimes approaching 1200MB/s effective speeds with caching enabled.

For this test, the Sequential Scan option didn't seem to make much difference. This makes sense, because I'm already accessing the file in fairly large chunks. It's possible that this flag is of primary benefit when reading smaller amounts of data, or that the hint is ignored by my drivers and/or hardware.

The Write Through and Disable Caching options seem to have about the same effect. Without these options the program obviously runs faster than the disk can physically handle.

The Defragment option was a little less conclusive. When I had all the above options turned on, I was only able to write at 65MB/s, but turning off Defragment only slowed it down to 63MB/s. However, without Write Through the Defragment option was good for an 8MB/s difference. Despite this, the disk noise made it very apparent when the Defragment option was enabled, and as I mentioned previously, file fragmentation is of paramount importance when reading the files.

Optimizing std::ofstream

Now that I had some idea of what was possible given the underlying OS, I decided to attempt to speed up the original ofstream program. Some of the changes that made the most difference were surprising, and should be helpful when optimizing any program that uses ofstream.

defragment

ofstream does not provide an option to directly create a large file, but by using the following statements I was able to create a defragmented file.
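The statements themselves did not survive in this copy of the post. One widely used approach, shown here purely as an assumption about what they might have looked like, is to seek to the last byte of the intended file, write a single byte so the file is extended to its full size in one step, and then seek back to the start. The helper name preallocate is hypothetical, and whether the file system actually hands back one contiguous run of clusters is up to the OS.

```cpp
#include <cassert>
#include <fstream>
#include <ios>

// Hypothetical helper: extend `out` to `bytes` bytes up front, then rewind.
// Assumes the stream was just opened and nothing has been written yet.
bool preallocate(std::ofstream& out, std::streamoff bytes) {
    assert(bytes > 0);
    out.seekp(bytes - 1, std::ios_base::beg);  // jump to the final byte
    out.put('\0');                             // writing it forces the file to grow
    out.seekp(0, std::ios_base::beg);          // return to the start for normal output
    return out.good();
}
```

Called right after opening the stream, for example as preallocate(out, FILE_BYTES), this leaves the file at its full size even if less data is written afterwards, so it only suits cases where the final size is known in advance.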
Unfortunately, this also slowed the program to 5.5MB/s from the original 11MB/s. For some applications this might be a worthwhile tradeoff, but hopefully this optimization will work better when combined with others.

buffer size

One problem with the existing program is that it takes 1200+ writes per MB to build the file. We should be able to improve performance by telling the ofstream to use a larger buffer.
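The code for this step is missing from this copy of the post. One way to give an ofstream a larger buffer is to call pubsetbuf() on its filebuf, sketched below under stated assumptions: the function name write_lines and its parameters are hypothetical, and because the effect of pubsetbuf() is implementation-defined, the buffer is installed before the file is opened, which is the order libstdc++ honours.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write `line` `count` times through a caller-sized buffer.
// Returns the number of characters handed to the stream, or -1 on failure.
long long write_lines(const std::string& path, std::size_t bufsize,
                      const std::string& line, int count) {
    assert(bufsize > 0);
    std::vector<char> buffer(bufsize);  // must outlive the stream that uses it
    std::ofstream out;
    // Install the buffer before open(); support after open varies by library.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path.c_str());
    if (!out) return -1;
    for (int i = 0; i < count; ++i) {
        out.write(line.data(), static_cast<std::streamsize>(line.size()));
    }
    out.close();                        // flush while `buffer` is still alive
    return out ? static_cast<long long>(line.size()) * count : -1;
}
```

Here bufsize plays the role of BUFSIZE in the measurements that follow; trying several values and timing each one is the only reliable way to pick it.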
When I ran this program with BUFSIZE=64K I was surprised to find that it slowed from 11MB/s to 1MB/s. What I discovered is that smaller BUFSIZE values actually increased performance. By using 1K I was able to achieve 15MB/s. This is a good example of why you should always test for performance rather than make design decisions based on faulty assumptions. This is not premature optimization. Addressing these issues early in a project, before it's too late, is just good engineering. Interestingly, BUFSIZE=64K resulted in 1037 writes/MB while BUFSIZE=1K resulted in 2000 writes/MB. Also, turning on the defragment option had no additional negative performance impact when using BUFSIZE=64K.

use ios::binary

Opening the stream in binary mode results in a speed increase to 23.5MB/s. This prevents automatic translation of '\n' to "\r\n", but has no other effect as far as I can tell. Furthermore, the Defragment option only slows this to 19MB/s, and the Buffer Size option now works as originally expected, decreasing the number of disk writes to 16 per MB (1MB/64K=16) and resulting in 26MB/s. Combining Defragment with Buffer Size was still 19MB/s, however.

writing single chars

Changing the loop that writes out tabs to write '\t' instead of "\t" results in a 0.7MB/s speedup.

better indentation

Although writing '\t' instead of "\t" was an improvement, we can do better by creating a string of tab characters and then writing a variable number of them.

    const string tabs(MAX_TABS, '\t');
    …
    int indent = num_lines % MAX_TABS;
    if (indent > 0) {
        out.write(tabs.c_str(), indent);
    }

This results in 32MB/s with all options except Defragment enabled.

eliminate tellp()

When I first ran the (so I thought) fully optimized version of the program on my Mac Mini under OSX, I was surprised to find that I could only get 0.8MB/s. A little tinkering with the code revealed that the call to tellp() was taking most of the time. I eliminated tellp() and the Mac Mini jumped to 40MB/s.
    const string tabs(MAX_TABS, '\t');
    const string msg1 = "All work and no play makes ";
    const string msg2 = " a dull boy.\n";

    // Calculate num_lines in a separate loop first
    unsigned num_lines = 0;
    for (int bytes = 0; bytes < num_bytes;) {
        ++num_lines;
        bytes += num_lines % MAX_TABS;
        const string& name = names[num_lines % names.size()];
        bytes += name.length();
        bytes += msg1.length();
        bytes += msg2.length();
    }

    for (unsigned line = 1; line <= num_lines; ++line) {
        int indent = line % MAX_TABS;
        if (indent > 0) {
            out.write(tabs.c_str(), indent);
        }
        const string& name = names[line % names.size()];
        out << msg1 << name << msg2;
    }

The first loop duplicates the logic in the original loop without actually writing anything. This allows the second loop to use a simple for() loop, because we now know exactly how many lines to write. This is the first change that really complicated our code, but it also has a big performance benefit. The program now writes at 60MB/s.

writing strings

One surprising thing I found was that writing std::string can be significantly slower than writing const char*. Simply changing the above code to write:

    out << msg1.c_str() << name.c_str() << msg2.c_str();

resulted in a speedup to 75MB/s. Making this change before the tellp() change did not have any significant effect.

ostream::write

Looking at the source code for operator<<(ostream& out, const char* str) showed that it essentially calls "out.write(str, strlen(str))". We already know the length of each string, so we should be able to write faster using:

    out.write(msg1.c_str(), msg1.length());
    out.write(name.c_str(), name.length());
    out.write(msg2.c_str(), msg2.length());

This resulted in a speedup to 80MB/s.

defragment revisited

With all other options in place, writing the file defragmented now runs at 40MB/s.

final results (100MB)
The biggest remaining problems are:
Performance Quiz - Creating Text Files

I have an interesting programming challenge. Write a program to create a sample text file of a specified size in megabytes. The program should write out as many lines as possible of the format "[tabs]All work and no play makes [name] a dull boy.", which you may recognize as the text that Jack Torrance types repeatedly in 1980's The Shining. It seems appropriate. The program should provide a little variation to the lines by starting each line with a varying number of tabs, and alternating the name in the sentence. Here's a C++ program that does the job:

    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    const int FILE_BYTES = 100 * 1024 * 1024;
    const int MAX_INDENT = 8;
    const int NUM_NAMES = 6;

    int main(int argc, char* argv[])
    {
        const string names[NUM_NAMES] = {
            "Jack",
            "Justin Michel",
            "Fred Flintstone",
            "Barney Rubble",
            "Homer J. Simpson",
            "John Jacob Jingleheimer Schmidt"
        };
        ofstream out("sample.txt", ios_base::out);
        for (unsigned num_lines = 1; out.tellp() <= FILE_BYTES; ++num_lines) {
            for (unsigned int i = 0; i < num_lines % MAX_INDENT; ++i) {
                out << "\t";
            }
            const string& name = names[num_lines % NUM_NAMES];
            out << "All work and no play makes " << name << " a dull boy.\n";
        }
        return 0;
    }

When I run this program, I find that it's able to generate a 100MB file at almost 11 MB/s. However, I noticed several issues that indicate this could be improved.
It was surprising to me that the above program was CPU-bound. After all, there's not much apparent processing going on. Few calculations are done within the loop, and it writes the same short strings over and over. If this simple program uses 100% of the CPU then any less trivial program is only going to use more. Ideally we'd like to have close to 0% CPU usage so that the program could make use of the CPU for other functionality. For example, an application with a logging facility using code similar to that above would spend all of the limited computer resources on logging, with nothing left over for running the actual application.

The noisy disk activity provides a clue to the biggest single problem. C++ ofstream will automatically grow the file as needed, but if you think about it this can only lead to excessive file fragmentation. The OS has no idea how big the file is going to get, so it will allocate small chunks as needed. What we'd like is some way to tell the OS that we're preparing to write 1GB so that it can allocate one nice consecutive chunk. This should allow optimal write speed assuming we write everything sequentially, but more importantly it will help prevent disk fragmentation and allow the file to be read in much faster. (Previous testing has shown that a fragmented file can take 10-15 times longer to read.)

Another reason for the excessive disk activity is that ofstream seems to use a small buffer. I know this because my real test program has instrumentation that measures Disk, CPU, and Memory usage, and it shows that a 100MB file is written using 125,820 disk writes, which works out to about 833 bytes per write.

I found the following review of my hard disk, which indicates that I should be able to achieve about 75MB/s, presumably with caching disabled. http://techreport.com/reviews/2006q2/raptor-wd1500/index.x?pg=12 I also checked the specs on the manufacturer's website, which claim that 84 MB/s should be possible.
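To compare a program against these drive figures, the write can be timed and converted to MB/s. The sketch below is one way to do that and is not the instrumented test program mentioned above: the names megabytes_per_second and timed_write are hypothetical, 1MB is taken as 1024*1024 bytes to match FILE_BYTES, and the result includes OS caching effects unless caching is disabled by other means.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Convert a byte count and an elapsed time into MB/s (1MB = 1024*1024 bytes).
// Returns 0 when no measurable time has passed.
double megabytes_per_second(double bytes, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return bytes / (1024.0 * 1024.0) / seconds;
}

// Hypothetical helper: write `chunk` repeatedly until at least `total_bytes`
// have been written, and report the observed throughput in MB/s.
double timed_write(const std::string& path, const std::string& chunk,
                   std::size_t total_bytes) {
    assert(!chunk.empty());
    const auto start = std::chrono::steady_clock::now();
    std::size_t written = 0;
    {
        std::ofstream out(path.c_str(), std::ios_base::out | std::ios_base::binary);
        while (out && written < total_bytes) {
            out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            written += chunk.size();
        }
    }  // stream closed here so the final flush is part of the measurement
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return megabytes_per_second(static_cast<double>(written), elapsed.count());
}
```

For instance, 100MB written in 10 seconds gives megabytes_per_second(100.0 * 1024 * 1024, 10.0), which is 10 MB/s, close to what the program above achieves.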
The other problem with using ofstream is that you don’t have any mechanism for specifying advanced features. For example, in Windows you can create a file with flags that specify Compression, Encryption, Temporary (Tells the OS to avoid writing to a physical file if possible), DeleteOnClose, DisableSystemCache, Sequential/Random access (A hint to the cache), DisableDriveCache, and many other options. In my next post, I'll attempt to optimize the program while still using std::ofstream.