Various Technical Topics


November 18

Creating Text Files - Downloads

The code (create_text_file.zip) can now be found in my SkyDrive public folder.
 
To build the C++ samples, you'll need to create project files using MPC. The C++/CLI and VB.NET projects use Visual Studio 2005 (The free Express edition will probably work.) The Java projects use Eclipse and Java 1.5.
 
MPC is a program that a coworker and I designed to make working with C++ projects much easier. Most of the C++ code that we write needs to run on a wide variety of platforms, and supporting all the different build tools can be a pain. Rather than including project files or makefiles for VC++, nmake, bmake, gmake, etc., we just create simple text files that can be used to generate any of the above.
 
How simple?
 
Here's the complete set of MPC files for all seven C++ projects:
 
create_text_file/create_text_file.mpb -- This is a Base project that is used to set defaults.
project {
  after += timer
  libs += timer
  libpaths += ../timer
  includes += ../timer
}
This basically says that all projects that derive from this base are dependent on a timer library found at ../timer.
 
create_text_file/create_text_file.mwc -- This is a workspace file used to group related projects
workspace {
  cmdline += -static
  implicit = create_text_file
  exclude {
    FileStreamDotNet
  }
}
This says to automatically create a project in any suitable subdirectories containing either C++ source code or .mpc files. Each implicitly created project will automatically inherit from the .mpb above. We also exclude one directory, because it contains a C++/CLI .NET project.
 
create_text_file/timer/timer.mpc -- This is an explicit project file
project {
  staticname = timer
}
 
This file causes a project to be created with the name timer. There are staticname and sharedname keywords, allowing you to specify different names depending on whether you generate a project to create a dynamic or static library.  If only one name is specified then the other defaults to the same name.
Actually, the timer.mpc file is totally unnecessary, because it merely specifies the same behavior as the default.
 
I also explicitly specify that a static library will be created by including a cmdline += -static in the .mwc.
 
 
You can read more about MPC and download it here.
To run MPC, you'll also need Perl.
 
To create the Visual Studio 2005 version of these projects, unzip create_text_file.zip to a directory and run mwc.pl like so:
 
c:\create_text_file>mwc.pl -type vc8
 
This should create a .SLN file which can then be used to build and run the projects.
 
 
November 16

Creating Text Files - Conclusion

SCSI Blues

On the one SCSI machine, the write speed was 2-5MB/s regardless of implementation or configuration options whenever WriteThrough was enabled. This probably just means that the drive does a more thorough job of honoring the WriteThrough setting. However, the SCSI drive should have been able to achieve something close to its specified transfer rate of 125MB/s.

 

One thing I tried to improve the performance on the SCSI machine was to install Windows Server 2003 R2. This had no benefit until I enabled a new “Enable advanced performance” option for the disk driver. This seemed to help quite a bit with the slowest tests now in the 9MB/s range. This is still quite a bit slower than the fastest system (#1) however.

results (win32 2000MB)

Write Through + Disable Caching + Defrag = 9MB/s @ <1% CPU

As above with 16MB buffer = 53MB/s @ <1% CPU

Write Through + Defrag + 16MB buffer = 58MB/s @ 28% CPU

Defrag = 55MB/s @ 20% CPU

Defrag + 16MB buffer = 47MB/s @ 21% CPU

compared to system #1

Write Through + Disable Caching + Defrag = 55MB/s @ <2% CPU

As above with 1MB buffer = 68MB/s @ <1% CPU (>1MB made no difference)

Write Through + Defrag + 1MB buffer = 68MB/s @ 8% CPU

Defrag = 90MB/s @ 10% CPU

Defrag + 1MB buffer = 56MB/s @ 6% CPU

A Grain of Salt

The performance measurements above paint a certain picture which can be a little misleading. One thing they don’t show is subjective disk thrashing that seemed to occur without Defragmenting. They also don’t show how repeatable the results were from run to run. The numbers only represent the best result I was able to achieve with the given options, but some tests were very consistent, while others varied widely. For instance the .NET results seemed to vary more than the others.

I also found that the Defrag option made a bigger difference on my older home machine #2, than it did on my work machine #1.  I suspect this was due to improvements in the newer version of the hard disk.

Before putting too much stock in these measurements, you should run them using your own platforms and compilers.

Conclusions

  • Measure performance using a simple application that has similar disk access patterns to your real application. (Or a subset of your application.)
  • Use the mechanisms above to help ensure that files are written defragmented.
  • If you have to use std::ofstream then performance may be greatly improved by providing a larger buffer, using ios::binary, preventing excessive calls to tellp(),  writing string::c_str() instead of string, and using ostream::write() instead of operator<<().
  • If you must use std::ostream, but need more powerful features such as writing encrypted or compressed files, automatically deleting a file when it’s closed, specifying caching hints to the OS, or just want better performance, then write a custom std::streambuf implementation similar to std::filebuf.
  • For the ultimate performance consider bypassing the std::ostream facility entirely.
  • SCSI drive systems behave differently, and must be tested separately. When using SCSI, you can get the best performance by using a very large buffer, and specifying WriteThrough and DisableCaching.
  • Let your ears guide you. A noisy disk drive may indicate fundamental problems with your design.
  • DotNet provided pretty good performance and features for very little effort, although some features such as Disable Buffering are inexplicably missing.
  • Java was not very performant or feature filled despite the NIO implementation. It was also very difficult to write the NIO version.
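
The first recommendation above can be sketched as a small timing harness. This is only an illustration: the names calc_throughput_mb_per_s and time_seconds are hypothetical (they are not the PerfTimer or CalcThroughput helpers used by the test programs), and the sketch measures wall-clock time only, not CPU usage.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: throughput in MB/s for a run that wrote `megabytes`
// in `seconds` of wall-clock time. Returns 0 for a non-positive duration.
double calc_throughput_mb_per_s(double megabytes, double seconds) {
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds, using a monotonic clock so that system clock adjustments
// do not distort the measurement.
double time_seconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A typical use would be to pass a lambda that writes the test file to time_seconds, and then feed the file size and the elapsed time to calc_throughput_mb_per_s. Repeating the run several times helps show how consistent the result is.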

System 1 Results for 100MB

Test                   MB/s  CPU%  Defrag  WriteThrough  NoCache
ofstream_unopt         11    100   0       0             0
ofstream               40    50    1       0             0
                       80    100   0       0             0
fast_ofstream          30    42    0       1             0
                       40    50    1       1             0
                       43    65    1       0             1
                       75    100   0       0             0
                       75    100   1       0             0
Custom FileStream      255   100   0       0             0
                       267   96    1       0             0
                       55    12    0       1             0
                       68    19    1       1             0
                       53    50    0       0             1
                       77    36    1       0             1
                       71    37    1       1             1
VB 2005                206   100   0       0             0
                       213   100   1       0             0
                       59    40    0       1             0
                       63    38    1       1             0
C++ CLI                228   100   0       0             0
                       237   100   1       0             0
                       53    32    0       1             0
                       66    38    1       1             0
Java FileOutputStream  26    100   0       0             0
                       20    100   1       0             0
Java NIO               22    100   0       0             0
                       24    100   1       0             0

 


Creating Text Files - Other Languages and Platforms

Some of the C++ programs above (fast_ofstream in particular) were fairly difficult to implement, and I wanted to see what could be achieved using a more rapid development tool. So far I’ve implemented the test in Visual Basic 2005, and Java (2 ways). Adam Mitz (a coworker) also contributed a C++/CLI implementation for comparison.

I also tested many of the programs on Macintosh as well as Linux, and these platforms achieved similar benefits.

In the future I’d like to try the tests under Mono on Linux, and possibly implement a Java version using FileOutputStream, since NIO didn’t seem to provide an advantage.

I ran tests on multiple systems.

  1. WinXP Core 2 Duo 6700 w/ 10Krpm drive and 2GB RAM
  2. WinXP Athlon64 3400 w/ 10Krpm drive and 1GB RAM
  3. WinXP Dual AthlonMP 2800 w/ 15Krpm SCSI drive and 1.5GB RAM
  4. Linux Athlon64 3500 w/ 10Krpm drive and 1GB RAM
  5. OSX Mac Mini Core Duo
  6. WinXP Mac Mini Core Duo

The results quoted throughout this article are those from the first configuration above, and this system unsurprisingly also gave the best performance.

All of these systems gave very similar relative results with the exception of #3, which seemed due primarily to the use of a SCSI disk drive.

Visual Basic 2005

I chose to implement the test in VB, because I find the language and IDE allow me to write more quickly than other .NET alternatives. I think the VB language has a lot of problems (such as some of the keyword names), but I find that case insensitivity, line orientation, English keywords, and other features make it much easier to work with than the C-like alternatives.

In the end it only took a few hours to create the VB version of the program, and it was actually this program that led me to create the fastest C++ programs. I was pretty satisfied with C++ fast_ofstream when it was running at 40MB/s, but the VB program was much faster with much less development effort. This led me to develop the C++ FileStream program which then led to further improvements in fast_ofstream.

No Options = 206MB/s @ 100% CPU

Defrag Only = 213MB/s @ 100% CPU

Write Through = 59MB/s @ 40% CPU

Write Through + Defrag = 63MB/s @ 38% CPU

One problem with .NET is that the System.IO.FileStream object doesn’t support disabling the system cache. I was therefore unable to measure that option.

Another problem is that all Strings are Unicode, and there is some small overhead outside the main loop to encode them as ASCII. While this may affect this test, it also would be trivial to change the VB program to write out UTF8, UTF16, UTF32, and other formats. I wouldn’t even want to try this with the C++ version.

For what it’s worth the VB program, despite the apparent verbosity of the language, was actually shorter than most of the C++ versions.

Sub WriteSampleFile()
  Dim pt As New PerfTimer
  Dim fname As String = OUT_DIR & FILESIZE_MB & "MB.txt"
  Dim fs As FileStream = CreateFile(fname)
  Dim numLines As Long
  Dim totalBytes As Int64 = FILESIZE_MB * 1024L * 1024L
  Using fs
    Console.WriteLine("Creating {0}MB file.", FILESIZE_MB)
    If DEFRAGMENTED Then
      Dim reserve As Long = ((totalBytes \ BUFSIZE) + 1) * BUFSIZE
      fs.SetLength(reserve)
    End If
    Dim enc As Text.Encoding = Text.Encoding.ASCII
    Dim msg1() As Byte = enc.GetBytes("All work and no play makes ")
    Dim msg2() As Byte = enc.GetBytes(" a dull boy." & Environment.NewLine)
    Dim tabs() As Byte = {9, 9, 9, 9, 9, 9, 9, 9}
    Dim names As List(Of Byte()) = CreateNames()
    Do
      numLines += 1
      fs.Write(tabs, 0, CInt(numLines Mod tabs.Length))
      fs.Write(msg1, 0, msg1.Length)
      Dim name() As Byte = names.Item(CInt(numLines Mod names.Count))
      fs.Write(name, 0, name.Length)
      fs.Write(msg2, 0, msg2.Length)
    Loop Until fs.Position >= totalBytes
    fs.SetLength(totalBytes)
  End Using
  pt.PrintElapsed("Done. ")
  Dim tp As Double = CalcThroughput(FILESIZE_MB, pt.ElapsedWall)
  Console.WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp)
End Sub

C++/CLI

I want to thank my coworker Adam Mitz for contributing a C++/CLI version of the program. This version was a little faster, but .NET performance varies quite a bit, and the VB version and the C++/CLI version have about the same performance overall.

No Options = 228MB/s @ 100% CPU

Defrag Only = 237MB/s @ 100% CPU

Write Through = 53MB/s @ 32% CPU

Write Through + Defrag = 66MB/s @ 38% CPU

int main(array<String^>^)
{
  using IO::FileStream;
  PerfTimer pt;
  String^ fname = gcnew String(OUT_DIR);
  fname += FILESIZE_MB;
  fname += "MB.txt";
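  // Widen before multiplying so large file sizes cannot overflow int arithmetic.
  long long totalBytes = static_cast<long long>(FILESIZE_MB) * 1024 * 1024;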
  Console::WriteLine("Creating {0}MB file.", FILESIZE_MB);
  FileStream^ fs = CreateFile(fname);
  if(DEFRAGMENTED)
  {
    // Round up to a whole number of buffers using integer division, as the VB version does.
    long long reserve = ((totalBytes / BUFSIZE) + 1) * BUFSIZE;
    fs->SetLength(reserve);
  }
  array<unsigned char>^ msg1 = Text::Encoding::ASCII->GetBytes("All work and no play makes ");
  array<unsigned char>^ msg2 = Text::Encoding::ASCII->GetBytes(" a dull boy.\n");
  array<unsigned char>^ tabs = {9, 9, 9, 9, 9, 9, 9, 9};
  array<array<unsigned char>^>^ names =
  {
    Text::Encoding::ASCII->GetBytes("Jack"),
    Text::Encoding::ASCII->GetBytes("Justin Michel"),
    Text::Encoding::ASCII->GetBytes("Fred Flintstone"),
    Text::Encoding::ASCII->GetBytes("Barney Rubble"),
    Text::Encoding::ASCII->GetBytes("Homer J. Simpson"),
    Text::Encoding::ASCII->GetBytes("John Jacob Jingleheimer Schmidt")
  };
  long numLines(0);
  do
  {
    ++numLines;
    fs->Write(tabs, 0, numLines % tabs->Length);
    fs->Write(msg1, 0, msg1->Length);
    array<unsigned char>^ name = names[numLines % names->Length];
    fs->Write(name, 0, name->Length);
    fs->Write(msg2, 0, msg2->Length);
  }
  while(fs->Position < totalBytes);
  fs->SetLength(totalBytes);
  pt.PrintElapsed("Done. ");
  double tp = CalcThroughput(FILESIZE_MB, pt.ElapsedWall);
  Console::WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp);
  return 0;
}

Java NIO

I first implemented the Java version using NIO rather than the much simpler FileOutputStream, because the latter didn’t appear to support the necessary features to create defragmented files, and I thought that NIO should give the best potential performance. Later, I rewrote the program using FileOutputStream, because the complexity of using NIO didn’t seem to have any perceivable benefit.

The first thing I noticed is that this implementation of the test program was far more difficult than any of the others. This is despite the fact that I have the most recent experience with Java, and had even implemented a project using NIO within the last six months. The biggest problems involved the complication of dealing with ByteBuffer and String encoding. Compare the code needed to write a String encoded as ASCII in VB.NET:

Dim enc As Text.Encoding = Text.Encoding.ASCII
Dim buf() As Byte = enc.GetBytes("This is a test.")
fs.Write(buf, 0, buf.Length)

Here is the equivalent in Java NIO:

Charset cs = Charset.forName("US-ASCII");
CharsetEncoder enc = cs.newEncoder();
enc.onMalformedInput(CodingErrorAction.REPLACE);
enc.onUnmappableCharacter(CodingErrorAction.REPLACE);
String str = "This is a test.";
// ASCII encodes one byte per char, so str.length() bytes is enough.
ByteBuffer buf = ByteBuffer.allocateDirect(str.length());
enc.reset();
CharBuffer cb = CharBuffer.wrap(str);
CoderResult cr = enc.encode(cb, buf, true);
if (cr == CoderResult.OVERFLOW) {
  throw new Exception("WTF");
} // CoderResult.UNDERFLOW ignored
cr = enc.flush(buf);
if (cr == CoderResult.OVERFLOW) {
  throw new Exception("WTF");
} // CoderResult.UNDERFLOW ignored
// Switch the buffer from filling to draining before handing it to the channel.
buf.flip();
channel.write(buf);

Another problem with the Java implementation is that I couldn’t find a good way to implement most of the features. There’s no capability to specify Write Through, Disable Caching, or Sequential options. Furthermore the FileChannel.truncate() method doesn’t support extending a file. I had to resort to using the same Defragment method that I used for ofstream.

No Options = 22MB/s

Defrag Only = 24MB/s

I didn’t know a way to measure CPU usage in Java, but it seemed to be about 100%.

The much simpler FileOutputStream implementation gave similar performance.

No Options = 26MB/s

Defrag Only = 20MB/s

Next, I'll try to wrap this all up with some conclusions and a summary of what we've learned.


Creating Text Files - fast_ofstream and Custom FileStream

fast_ofstream

As we saw with the analysis of the ofstream program we can achieve close to optimal throughput, but not with advanced features such as defragmenting that we require. One solution that doesn’t involve starting from scratch is to write a custom std::streambuf implementation that can be used with the standard C++ io streams. The application using fast_ofstream was able to write a defragmented file with a single loop using tellp() at 75MB/s. If we enable Write Through then the Defragment option becomes important.

reserve

By using our own streambuf implementation, we can implement a better Defragment feature. We use an interface similar to std::vector in which we call a reserve() method passing the desired size in bytes. This is much better, because although we allocate space for the reserved size, we don’t actually have to use the space or know for sure how big the file will be. Any unused space will automatically be reclaimed when the stream is closed.

tellp()

Since we’re implementing our own buffer, we can make tellp() more efficient, thereby allowing us to use the simpler code from the original example while still retaining most of the performance benefits.
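
To illustrate the idea, here is a minimal sketch of a write-only streambuf that owns its buffer and answers position queries from a counter instead of asking the file. It is not the fast_ofstream code: the class name CountingFileBuf and its position() method are hypothetical, it uses plain C stdio underneath, and it leaves out reserve(), seeking, and the Windows-specific options.

```cpp
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical sketch: a write-only stream buffer with a caller-sized buffer
// and a cheap position counter.
class CountingFileBuf : public std::streambuf {
public:
    CountingFileBuf(const std::string& path, std::size_t bufsize)
        : file_(std::fopen(path.c_str(), "wb")),
          buffer_(bufsize ? bufsize : 1),  // at least one byte of put area
          flushed_(0) {
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }
    ~CountingFileBuf() override {
        sync();
        if (file_) std::fclose(file_);
    }
    CountingFileBuf(const CountingFileBuf&) = delete;
    CountingFileBuf& operator=(const CountingFileBuf&) = delete;

    bool is_open() const { return file_ != nullptr; }

    // Bytes accepted so far: flushed bytes plus bytes still in the buffer.
    // No system call is needed to answer this.
    unsigned long long position() const {
        return flushed_ + static_cast<unsigned long long>(pptr() - pbase());
    }

protected:
    // Called when the put area is full: flush it, then store ch if present.
    int_type overflow(int_type ch) override {
        if (!flush_buffer()) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }
    int sync() override {
        if (!flush_buffer()) return -1;
        return std::fflush(file_) == 0 ? 0 : -1;
    }

private:
    // Writes the pending bytes to the file and resets the put area.
    bool flush_buffer() {
        if (!file_) return false;
        const std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        if (n > 0 && std::fwrite(pbase(), 1, n, file_) != n) return false;
        flushed_ += n;
        setp(buffer_.data(), buffer_.data() + buffer_.size());
        return true;
    }

    std::FILE* file_;
    std::vector<char> buffer_;
    unsigned long long flushed_;
};
```

The buffer is attached to an ordinary std::ostream (std::ostream out(&buf);), so existing operator<< code keeps working, while the loop condition can call buf.position() instead of out.tellp().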

final results (100MB)

Write Through = 30 MB/s @ 42% CPU

Write Through + Defragment = 40 MB/s @ 50% CPU

DisableCaching = 40 MB/s @ 65% CPU

DisableCaching + Defragment = 43 MB/s @ 65% CPU

No Options = 75MB/s @ 100% CPU

Defragment Only = 75MB/s @ 100% CPU

The speed is close to what we observed with the best case using the default ofstream, but we’re able to use more advanced features, the most important being Defragment. Defragment is crucial to avoid overall file system fragmentation and to ensure optimal read speed. The current fast_ofstream implementation is not very robust, and it’s possible that even faster results could be achieved.

The biggest remaining problems are:

1. We’re still using 100% CPU at least when running at full speed.
2. We’re still not achieving the best performance.

Custom FileStream

It should be possible to achieve even better performance by avoiding the standard C++ iostream interfaces completely. I created a simple class as a test case. 
This simple class supports all the options identified previously, and has just enough operations to implement the test program.

enum FS_OPTIONS { 
  fsNone = 0, 
  fsDisableCaching = 1, 
  fsWriteThrough = 2, 
  fsSequential = 4 
}; 
class FileStream { 
  FileStreamImpl* impl_; 
public: 
  FileStream(const std::string& fname, FS_OPTIONS opts); 
  ~FileStream(); 
  void write(const std::string& s); 
  void write(const std::string& s, 
  std::string::size_type offset, 
  std::string::size_type length); 
  unsigned long long size() const; 
  void reserve(unsigned long long bytes); 
private: 
  FileStream(const FileStream&); 
  FileStream& operator=(const FileStream&); 
};
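
The class declaration does not show how several options are combined. One possible approach, sketched here as an assumption rather than as the original implementation, is a bitwise-or operator for the enum plus a small query helper. The operator| overload and the has_option function are hypothetical names introduced for illustration.

```cpp
// Same option values as in the FS_OPTIONS enum above.
enum FS_OPTIONS {
    fsNone = 0,
    fsDisableCaching = 1,
    fsWriteThrough = 2,
    fsSequential = 4
};

// Hypothetical helper: combine two option sets. Without this overload,
// (fsWriteThrough | fsSequential) would be an int and could not be passed
// where FS_OPTIONS is expected.
inline FS_OPTIONS operator|(FS_OPTIONS a, FS_OPTIONS b) {
    return static_cast<FS_OPTIONS>(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical helper: true if `flag` is set in `opts`.
inline bool has_option(FS_OPTIONS opts, FS_OPTIONS flag) {
    return (static_cast<int>(opts) & static_cast<int>(flag)) != 0;
}
```

With these in place, a caller could write something like FileStream fs(fname, fsWriteThrough | fsSequential), and the implementation could test each flag with has_option when opening the file.
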
final results (100MB)

No Options = 255MB/s @ 100% CPU

Defrag Only = 267MB/s @ 96% CPU

Write Through = 55MB/s @ 12% CPU

Write Through + Defrag = 68 MB/s @ 19% CPU

Disable Cache = 53MB/s @ 50% CPU

Disable Cache + Defrag = 77MB/s @ 36% CPU

Disable Cache + Write Through + Defrag = 71MB/s @ 37% CPU

For many applications this could be worth the effort of implementing a custom FileStream.

Next I'll discuss the use of other languages and platforms.


Creating Text Files - Optimizing std::ofstream

Where to start?

Just to make sure I was on the right track, I decided to write a quick proof-of-concept Windows program to verify that I could come close to the desired performance. This program works by creating a 64K buffer filled with ‘X’ characters, and writes it repeatedly until it gets a file of the desired size. The program supports the following options.

  1. Sequential Scan
    Windows supports opening a file with a special flag to hint that the file will be accessed sequentially.
  2. Write Through
    This flag is supposed to ensure that any files are written directly to disk instead of being held in a cache and flushed later.
  3. Disable Caching
    This flag disables the Windows system cache for the file. In theory you could enable write-through, but still have the data in the system cache for reading. If the data is not expected to be read again, then this flag prevents filling the cache with unwanted data.
  4. Defragment
    Windows and Unix both support increasing the size of a file in one step. This allows the file to be written in as few fragments as possible.

With this test program I was able to achieve 77MB/s with caching disabled and 145MB/s with caching enabled for 1000MB test files. Smaller files were even faster, sometimes approaching 1200MB/s effective speeds with caching enabled.

For this test, the Sequential Scan option didn’t seem to make much difference. This makes sense, because I’m already accessing the file in fairly large chunks. It’s possible that this flag is of primary benefit when reading smaller amounts of data, or that the hint is ignored by my drivers and/or hardware.

The Write Through and Disable Caching options seem to have about the same effect. Without these options the program is obviously faster than the disk can physically handle.

The Defragment option was a little less conclusive. When I had all the above options turned on, I was only able to write at 65MB/s, but turning off Defragment only slowed it down to 63MB/s. However, without Write Through the Defragment option was good for 8MB/s difference. Despite this, it was very apparent when the Defragment option was enabled from a disk noise perspective, and as I mentioned previously, file fragmentation is of paramount importance when reading the files.

Optimizing std::ofstream

Now that I had some idea of what was possible given the underlying OS, I decided to attempt to speed up the original ofstream program. Some of the changes that made the most difference were surprising, and should be helpful when optimizing any program that uses ofstream.

defragment

ofstream does not provide an option to directly create a large file, but by using the following statements I was able to create a defragmented file.

out.rdbuf()->pubseekpos(num_bytes);
out.write("\n", 1);
out.rdbuf()->pubseekpos(0);

Unfortunately, this also slowed the program to 5.5MB/s from the original 11MB/s. For some applications this might be a worthwhile tradeoff, but hopefully this optimization will work better when combined with others.
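
For comparison, here is a sketch of a different way to reserve the space, using std::filesystem::resize_file from C++17 instead of the seek-and-write trick. This is not what the test programs do, and the function name write_preallocated is hypothetical. Whether growing the file this way actually reduces fragmentation depends on the OS and file system: on many Unix file systems it may simply produce a sparse file, so treat it as something to measure rather than a guarantee.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical sketch: grow the file to `reserve_bytes` in one step, write
// `payload` `repeat` times from the start, then trim the file to the number
// of bytes actually written. Returns the final file size.
std::uintmax_t write_preallocated(const fs::path& path,
                                  std::uintmax_t reserve_bytes,
                                  const std::string& payload,
                                  unsigned repeat) {
    {
        // Create (or empty) the file so that resize_file has something to grow.
        std::ofstream create(path, std::ios::binary | std::ios::trunc);
    }
    fs::resize_file(path, reserve_bytes);

    // Opening with in|out (and no trunc) keeps the reserved length intact.
    std::fstream out(path, std::ios::in | std::ios::out | std::ios::binary);
    std::uintmax_t written = 0;
    for (unsigned i = 0; i < repeat; ++i) {
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        written += payload.size();
    }
    out.close();

    // Give back whatever part of the reservation was not used.
    fs::resize_file(path, written);
    return fs::file_size(path);
}
```

The reservation only needs to be an upper bound, since the final resize_file call trims the file to what was written.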

buffer size

One problem with the existing program is that it takes 1200+ writes per MB to build the file. We should be able to improve performance by telling the ofstream to use a larger buffer.

char buffer[BUFSIZE];
out.rdbuf()->pubsetbuf(buffer, BUFSIZE);

When I ran this program with BUFSIZE=64K I was surprised to find that it slowed from 11MB/s to 1MB/s. What I discovered is that smaller BUFSIZE values actually increased performance. By using 1K I was able to achieve 15MB/s. This is a good example of why you should always test for performance before making design decisions based on faulty assumptions. This is not premature optimization. Addressing these issues early in a project before it’s too late is just good engineering.

Interestingly, BUFSIZE=64K resulted in 1037 writes/MB while BUFSIZE=1K resulted in 2000 writes/MB. Also, turning on the defragment option had no additional negative performance impact when using BUFSIZE=64K.
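
When experimenting with buffer sizes, it is worth knowing that the C++ standard leaves the effect of pubsetbuf() on a file buffer largely implementation-defined, and some standard libraries only honor it when it is called before the file is opened. The sketch below uses that ordering. The function name write_with_buffer is hypothetical, and nothing here claims to explain the measurements above.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch: writes `chunk` `count` times through an ofstream that
// uses a caller-sized buffer. Returns true if every operation succeeded.
bool write_with_buffer(const std::string& path, std::size_t bufsize,
                       const std::string& chunk, unsigned count) {
    // Declared before the stream so the buffer outlives it.
    std::vector<char> buffer(bufsize);
    std::ofstream out;

    // Install the buffer before open(): what pubsetbuf() does on an already
    // open file is implementation-defined.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::out | std::ios::trunc);
    if (!out) return false;

    for (unsigned i = 0; i < count; ++i) {
        out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    }
    out.close();  // flush while the buffer is still alive
    return !out.fail();
}
```

Timing this function with a few different bufsize values (1K, 64K, 1MB) on the target platform is a quick way to check whether a larger buffer helps or hurts there.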

use ios::binary

Opening the stream in binary mode results in a speed increase to 23.5MB/s. This prevents automatic translation of ‘\n’ to “\r\n”, but has no other effect as far as I can tell. Furthermore, the Defragment option only slows this to 19MB/s, and the Buffer Size option now works as originally expected, decreasing the number of disk writes to 16 per MB (1MB/64K=16) and resulting in 26MB/s. Combining Defragment with Buffer Size was still 19MB/s, however.

writing single chars

Changing the loop that writes out tabs to write ‘\t’ instead of “\t” results in a 0.7MB/s speedup.

better indentation

Although writing ‘\t’ instead of “\t” was an improvement, we can do better by creating a string of tab characters, and then writing a variable number of them.

const string tabs(MAX_TABS, '\t');
…
// Write the same number of tabs as the original loop: num_lines % MAX_TABS.
int indent = num_lines % MAX_TABS;
if (indent > 0) {
  out.write(tabs.c_str(), indent);
}

This results in 32MB/s with all options except Defragment enabled.

eliminate tellp()

When I first ran what I thought was the fully optimized version of the program on my Mac Mini under OSX, I was surprised to find that I could only get 0.8MB/s. A little tinkering with the code revealed that the call to tellp() was taking most of the time. I eliminated tellp() and the Mac Mini jumped to 40MB/s.

const string tabs(MAX_TABS, '\t');
const string msg1 = "All work and no play makes ";
const string msg2 = " a dull boy.\n";
// Calculate num_lines in separate loop first
for (int bytes = 0; bytes < num_bytes;) {
  ++num_lines;
  bytes += num_lines % MAX_TABS;
  const string& name = names[num_lines % names.size()];
  bytes += name.length();
  bytes += msg1.length();
  bytes += msg2.length();
}
for (unsigned line = 1; line <= num_lines; ++line) {
  int indent = line % MAX_TABS;
  if (indent > 0) {
    out.write(tabs.c_str(), indent);  // matches the byte count computed in the first loop
  }
  const string& name = names[line % names.size()];
  out << msg1 << name << msg2;
}

The first loop duplicates the logic in the original loop without actually writing anything. This allows the second loop to use a simple for() loop, because we now know exactly how many lines to write. This is the first change that really complicated our code, but it also has a big performance benefit. The program now writes at 60MB/s.

writing strings

One surprising thing I found was that writing std::string can be significantly slower than writing const char*. Simply changing the above code to write:

out << msg1.c_str() << name.c_str() << msg2.c_str();

resulted in a speedup to 75MB/s. Making this change before the tellp() change did not have any significant effect.

ostream::write

Looking at the source code for operator<<(ostream& out, const char* str) showed that it essentially calls “out.write(str, strlen(str))”. We already know the length of each string, so we should be able to write faster using:

out.write(msg1.c_str(), msg1.length());
out.write(name.c_str(), name.length());
out.write(msg2.c_str(), msg2.length());

This resulted in a speedup to 80MB/s

defragment revisited

With all other options in place writing the file defragmented now writes at 40MB/s.

final results (100MB)

No Options = 80MB/s @ 100% cpu

Defrag Only = 40MB/s @ 50% cpu

The biggest remaining problems are:

  1. We’re still using 100% CPU at least when running at full speed.
  2. We’re not able to write defragmented files without halving our performance.
  3. We have no mechanism for specifying advanced options.
  4. Considering that caching is enabled, we’re still not very fast.


Performance Quiz - Creating Text Files

I have an interesting programming challenge. Write a program to create a sample text file of a specified size in megabytes. The program should write out as many lines as possible of the format “[tabs]All work and no play makes [name] a dull boy.”, which you may recognize as the text that Jack Torrance types repeatedly in the 1980 film The Shining. It seems appropriate. The program should provide a little variation to the lines by starting each line with a varying number of tabs, and alternating the name in the sentence.

Here’s a C++ program that does the job:  

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
const int FILE_BYTES = 100 * 1024 * 1024;
const int MAX_INDENT = 8;
const int NUM_NAMES = 6;
int main(int argc, char* argv[]) {
  const string names[NUM_NAMES] = {
    "Jack", 
    "Justin Michel",
    "Fred Flintstone",
    "Barney Rubble",
    "Homer J. Simpson",
    "John Jacob Jingleheimer Schmidt"
  };
  ofstream out("sample.txt", ios_base::out);
  for (unsigned num_lines = 1; out.tellp() <= FILE_BYTES; ++num_lines)
  {
    for (unsigned int i = 0; i < num_lines % MAX_INDENT; ++i) {
      out << "\t";
    }
    const string& name = names[num_lines % NUM_NAMES];
    out << "All work and no play makes " << name << " a dull boy.\n";
  }
  return 0;
}

When I run this program, I find that it’s able to generate a 100MB file at almost 11 MB/s. However, I noticed several issues that indicate this could be improved.

  1. It’s completely CPU-bound.
  2. There is a lot of noisy disk-activity.
  3. The transfer rate does not come close to posted test results.
  4. No control over I/O options.

It was surprising to me that the above program was CPU-bound. After all, there’s not much apparent processing going on. Few calculations are done within the loop, and it writes the same short strings over and over. If this simple program uses 100% of the CPU then any less trivial program is only going to use more. Ideally we’d like to have close to 0% CPU usage so that the program could make use of the CPU for other functionality. For example, an application with a logging facility using code similar to that above would spend all the limited computer resources on logging with nothing left over for running the actual application.

The noisy disk activity provides a clue to the biggest single problem. C++ ofstream will automatically grow the file as needed, but if you think about it this can only lead to excessive file fragmentation. The OS has no idea how big the file is going to get, so it will allocate small chunks as needed. What we’d like is some way to tell the OS that we’re preparing to write 1GB so that it can allocate one nice consecutive chunk. This should allow optimal write speed assuming we write everything sequentially, but more importantly it will help prevent disk fragmentation and allow the file to be read in much faster. (Previous testing has shown that a fragmented file can take 10-15 times longer to read.)

Another reason for the excessive disk activity is that ofstream seems to use a small buffer. I know this because my real test program has instrumentation that measures Disk, CPU, and Memory usage, and it shows that a 100MB file is written using 125,820 disk writes.

I found the following review of my hard disk, which indicates that I should be able to achieve about 75MB/s presumably with caching disabled.

http://techreport.com/reviews/2006q2/raptor-wd1500/index.x?pg=12

I also checked the specs on the manufacturer's website, which claims that 84 MB/s should be possible.

Western Digital WD1500ADFD

The other problem with using ofstream is that you don’t have any mechanism for specifying advanced features. For example, in Windows you can create a file with flags that specify Compression, Encryption, Temporary (Tells the OS to avoid writing to a physical file if possible), DeleteOnClose, DisableSystemCache, Sequential/Random access (A hint to the cache), DisableDriveCache, and many other options.

In my next post, I'll attempt to optimize the program while still using std::ofstream.