
Blog


    May 13

    Time for an SSD

    The time is right to buy an SSD, because they are finally at the price/performance level where it’s just ridiculous to use anything else. Naysayers will claim that it’s still too early to switch, because you can buy a terabyte hard disk for $50 while a decent 60GB SSD costs $200. However, I think this price difference is irrelevant even in the current economy, because the performance difference is enormous and most of us don’t want or need a drive that big. Those who do need more storage (for a PVR?) can add an extra $50 drive alongside a fast primary drive.

    SSD technology is improving so rapidly that there are a lot of sub-par products still available. For example, you could buy an OCZ Apex 60GB drive for $145, and it will likely have similar performance to the more expensive new OCZ Vertex drives in some benchmarks. However, it may also suffer from some stuttering problems common to this generation of drives. For more info see the recent SSD articles at http://www.anandtech.com/storage/.

    The best advice is to buy only what you need, as the prices are dropping quickly. Common wisdom is that the price is cut in half each year; however, I paid $31/GB for a small MemoRight SLC drive for my laptop 1.5 years ago, and the latest drives I bought perform much better at 1/10th the price. No one predicted that the prices would drop this precipitously, and I see they’re still selling my drive for $21/GB even today. A year from now they’ll probably have terabyte DRAM-speed drives for $5 at Walgreens. :-)

    The safest bet is probably the Intel X25-M, which is available for $630 @ 160GB or about half that price for 80GB. Personally, I took a little more risk and bought four 60GB OCZ Vertex drives for $200 each. I have two set up as RAID 0 in each of my two home computers, and I believe I’m getting much better performance than the Intel drive in most cases, although I’ve had a few headaches. I’ve had to flash firmware updates to the drives to fix bugs and get new features, and each update erases the drive. I switched to the Windows 7 RC while I was at it, since it has some new features that work better with an SSD.

    So what are the benefits?

    · SSD is completely silent

    · I can read files at ~400MB/s and write at ~300MB/s

    · Seek time is ~0.1ms

    · Applications load instantly and never stutter or freeze.

    · Lower power consumption and heat generation

    My 2.66GHz Core 2 Duo NVIDIA 680i SLI machine takes about 40 seconds to start Windows 7 64-bit from the point of pushing the power button, with about half of that time (20.5s) spent on BIOS stuff. Here’s a picture of a common disk benchmark. Notice that it writes much faster than it reads in many cases. That seems to be an anomaly of this particular motherboard’s RAID controller, and it was the same when I was running Vista 32-bit. I haven’t noticed it in practice.

    My 3.16GHz Core 2 Duo Intel 975 machine takes 49 seconds to start Windows Vista 32-bit, with 16s of that spent on BIOS stuff (and I hit Esc to skip the memory test). The same benchmark on this machine shows much better (and more normal) results.

    This SSD thing is also a big deal for software developers. It’s time once again to adjust your perceptions and learn the new physical reality brought on by this change. Stop designing for outdated equipment. Planning your software architecture around the performance characteristics of a mechanical hard disk now makes no more sense than planning to run your software on a cassette tape drive. The rules have changed. Databases should be re-architected. Persistence should be revisited. Why load your files into in-memory objects when you may get better overall performance leaving them on the disk? Change the way you do things, and your software is likely to be better than legacy solutions. Some companies and people have a hard time coping with changing rules, so now’s a great time to challenge the status quo.

    Let me close with an analogy.

    If a truck were available that could teleport 30 miles at a time and got 100,000 mpg on regular gasoline, would you still want to buy a traditional truck? What if you could get a giant dump truck for $500 and the teleporting 100,000 mpg one cost $63,000? What if a sports car version were available for $20,000 that could go 500 mph when you didn’t feel like teleporting and still got 100,000 mpg? Does it really matter if 5 years from now you could get a 200,000 mpg version that could teleport 50 miles at a time for $10,000? City planners, architects, shipping companies, and other affected parties would have to get their act together to take advantage of “physics 2.0”, or new governments, communities, and businesses would replace them.

    If you are even slightly inconvenienced or annoyed by the speed, noise, stuttering, power consumption, or other problems associated with a legacy mechanical hard disk then a solution is available for as little as $200.

    Update 5/20/2009: Newegg has the G.Skill Falcon 128GB for $309. I think this is identical to the OCZ Vertex 120GB, but ~$70 less.

    December 01

    My Blog Personality

    Try out Typealyzer to find out what your blog says about your personality. Here's mine…

    INTJ - The Scientists

    The long-range thinking and individualistic type. They are especially good at looking at almost anything and figuring out a way of improving it - often with a highly creative and imaginative touch. They are intellectually curious and daring, but might be physically hesitant to try new things.

    The Scientists enjoy theoretical work that allows them to use their strong minds and bold creativity. Since they tend to be so abstract and theoretical in their communication, they often have a problem communicating their visions to other people and need to learn patience and use concrete examples. Since they are extremely good at concentrating, they often have no trouble working alone.

    August 02

    Quantifying Experience

    It's hard to find good developers. And it's worse than useless to filter candidates by "experience" as included on a resume. Statements like "10 years Java experience" or "15 years SQL experience" are meaningless. The problem is that you can't sum up real experience in a single concise number, and if you could it wouldn't be measured in years.

    We could try to come up with a formula, but any attempt to do so makes it clear that "Software Developer" encompasses a very wide variety of actual skills. What I'd really like to see on a resume, or in addition to a resume, is a real list summarizing all the things you've actually done, and the lessons you learned while doing them. This includes all of the following…

    1. List each book you've read that seems pertinent to software development, and summarize what you learned. Books are your number one resource for accumulating some subset of the experience of others. If you're not the type to read a book, then I hope you're able to make up for it in other ways. It's not likely, though. You might hope to get the same benefit from blogs, but the best of them are just going to restate something that's been said better and more thoroughly in a book.
    2. List each project you have worked on, no matter how small, and what you learned from each one. There's a lot to be learned from projects big and small. I've personally built full products by myself from scratch, and worked on a variety of teams ranging in size from 2 to 20 developers. I've even spent some time on one of hundreds of small teams working together on a truly massive project with something like 10,000 developers spread throughout the country. I've learned different lessons from each of these experiences, and I feel it adds up to more than "14 years experience". Many people in our industry get sucked into giant corporations, where they work on some tiny piece of a large maintenance-mode project for years at a time. They tend to slow down over time, cranking out a new SQL report once every month or two, and gradually falling further behind. I've nearly been trapped in similar situations myself, so I know you have to be ever-vigilant to avoid it.
    3. List the various roles you've played in the software development process, and what you learned from each experience. Have you ever led a team of people? Have you taken a back seat to another tech lead? Have you ever designed a real software product by yourself? Have you written documentation, or even a book? Have you taught courses, mentored junior developers, pair-programmed, designed web pages, designed a non-web GUI, played the part of a non-technical manager, or any of dozens of other roles that developers may experience in their career? What did you learn? Did you have good mentors in your early years, or did you have to learn everything the hard way?
    4. List all the types of processes and tools you've experienced in real professional development, not counting hobbies you've tinkered with at home (unless you built something that people actually use). I've built both GUI and server-side applications, both commercial software products and internal applications, used 8 different languages on non-trivial projects, become proficient in 6 different IDEs/editors, etc. I've found each of these experiences valuable.
    5. List your major successes and failures. There are always some high points and low points that stand out from the rest. When you look back on projects that were less successful than you'd hoped, take the time to think hard about what really went wrong, and what you personally could have done better. In your internal dialogue, accept full responsibility for any problems, and don't lay the blame on incompetent management, lazy coworkers, indecisive customers, or other distractions. Try instead to imagine what you could have done to make the project a perfect success. Maybe you should have tried harder to understand the customer requirements instead of trusting a Systems Analyst. Maybe you should have found a way to work around management incompetence. Maybe you should have devoted more time to mentoring junior developers instead of forcing them through the trial-by-fire that you suffered through? Or maybe you shouldn't have coddled those junior developers, and let them learn a few things the hard way? Then again, maybe you yourself are falling short in some ways. Do you understand the business point of view? Do you over-engineer, under-engineer, fail to test, test too much, write sloppy code, spend too little time designing before coding, or fail in any of thousands of other ways?

    The above list probably does give us a fairly good measure of experience. If you can work through each of the above items completely in a fairly terse writing style and finish in less than a month, then you're probably not very experienced. Looking at this list, I can't even estimate how long it would take me to thoroughly explore the 5 items.

    I'm not sure, but it might be worth working through the exercise for yourself. Just don't expect anyone else to read it.

    My next post, …

    Experience Isn't Everything

    May 26

    Comparing Strings Using Natural Ordering

    This topic has been brought up numerous times before, but I wanted to take a shot at another solution, because I think there is still room for improvement, and I wanted a simple example to experiment with some new VB9 functionality.

    Here are links to the most recent discussions of the problem and a range of potential solutions. By spelunking through these, you ought to be able to find a solution in just about any language you like, using nearly every combination of algorithms.

    http://www.codinghorror.com/blog/archives/001018.html

    http://www.davekoelle.com/alphanum.html

    http://nedbatchelder.com/blog/200712/human_sorting.html

    http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting

    http://www.codeproject.com/KB/string/NaturalComparer.aspx

    Probably the simplest example of sorting above is the following Python program which originally came from a comment posted to Ned Batchelder's blog above…

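    import re

    def natural_sort(lst):
        # Turn runs of digits into ints so they compare numerically
        to_int = lambda text: int(text) if text.isdigit() else text
        # The capture group makes re.split keep the digit runs in the result
        alphanum_key = lambda key: [to_int(c) for c in re.split('([0-9]+)', key)]
        lst.sort(key=alphanum_key)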

    Ian Griffiths was even able to more or less duplicate this solution in C# 3 by writing a couple of general purpose utility functions. After a little cleanup, it looks like this…

    static IOrderedEnumerable<string> NaturalSort(List<string> lst) {
        Func<string, object> ToInt = s => {
            try {
                return int.Parse(s);
            } catch {
            }
            return s;
        };
        return lst.OrderBy(s => Regex.Split(s.Replace(" ", ""), "([0-9]+)")
            .Select(ToInt), new EnumerableComparer<object>());
    }

    One problem with Ian's solution is that it doesn't sort in place, so it's not quite a fair comparison with the original. Also, OrderBy uses deferred execution, so when comparing performance you must be careful to force the sort to actually happen (e.g. by calling ToList() on the result).

    Of course, if it's fair to write a missing EnumerableComparer class, then why not just implement a NaturalComparer, which would make the resulting code even simpler…

    lst.Sort(NaturalComparer.CurrentCultureIgnoreCase);

    or even…

    lst.OrderBy(s => s, NaturalComparer.CurrentCultureIgnoreCase);

    This seems to give you the best solution with the added flexibility of working with OrderBy and Sort. (Interestingly, OrderBy often seems to be slightly faster than Sort, which seems strange considering that Sort doesn't have to allocate an entirely new structure to contain the results.)

    Before I show you my implementation for NaturalComparer, here are some problems found in most of the previous solutions I've seen.

    • They often rely on converting sequences of numeric characters to integers or some other numeric type. This is completely unnecessary, and if you think about what must be going on behind the scenes, fairly costly. Maybe I'll post an article on how to convert strings to integers so that you can get an appreciation for how much work is going on behind that seemingly innocent call to int.Parse.
    • Even if converting to Integers were a good idea, it wouldn't be advisable to throw and catch exceptions repeatedly as part of the algorithm. This can be fixed by using int.TryParse in the code above, but there is no point since we don't want to go down that road at all because…
    • Converting to integers doesn't work for long runs of digits. For example, try calling ToInt(new String('1', 500)) and see what happens.
    • Many solutions use a split() function to divide each string into a list of strings that are either numbers or alphabetic strings. This is completely unnecessary, and wasteful. For example, if I want to compare "1a2a3a…Na" with "2a3a…Ma", then this (silly, stupid, wasteful, inefficient, …) algorithm would allocate two lists of size N and M, whereas a reasonable algorithm would realize that string1 is less than string2 after comparing the first character.
    • Many solutions also use regular expressions to do the split(). Although this may be convenient, it's not a very efficient approach. I've seen some solutions that use a simple lookup table for '0'-'9' to help somewhat, but the split() approach seems like a dead end anyway.
    • By the way, if you were to use a Regex here, then many platforms (including .NET) let you compile it to get better performance. When I compare performance later, I'll use a modified version of Ian's code that uses a compiled Regex and int.TryParse.
    • Most of the solutions I've seen ignore all culture issues with string comparison. Considering that the most likely use is sorting lists of user-visible strings, this seems pretty shortsighted.
    • Many solutions are also case sensitive, which is also pretty fundamentally wrong. Any correct NaturalSorting solution should at least have the option to sort case-insensitively.
    • Another requirement for NaturalSorting is to support ignoring non-alphanumeric characters. Typically you want "a " = "a", but "a b" != "ab". In general terms, any character that is not a number or letter should act only as a separator. Many solutions, including the Python above, ignore this requirement.
    • A general class of problems with almost every solution is the side effect of creating lots of stuff during comparison. Some solutions dynamically create temporary strings, lists of strings, numbers, or even more complex objects. Sometimes this creation is somewhat hidden in a call to string.substring on a platform with immutable strings. Some authors try to use something like StringBuilder to help, but this does not address the real issue. NaturalCompare can probably be implemented without any need for dynamic allocation at all.
    • Another class of problems is algorithms that tend to do the same thing twice. One example is using a regex to divide an input string into a list of strings and numbers, and then having to use another pass to check whether each one is a number again. It would be better to record whether it's a string or a number on the first pass so you don't have to do the same work again. However, I confess that my own algorithm often has to make two passes through each section of a substring, once to search for a delimiter, and once to do the comparison.
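    The first three points can be made concrete with a short sketch. Comparing two runs of digits never needs int.Parse: skip the leading zeros, compare how many significant digits each run has, and if those counts match, compare digit by digit. The Python below is only an illustration; `compare_digit_runs` and its index-based signature are hypothetical choices of mine, and it assumes both runs are non-empty and contain only ASCII '0'-'9'.

```python
def compare_digit_runs(x: str, xs: int, xe: int, y: str, ys: int, ye: int) -> int:
    """Compare the digit runs x[xs:xe] and y[ys:ye] by numeric value.

    Works on indices only, so no integers are parsed and arbitrarily long
    runs are fine. Assumes both runs are non-empty ASCII digits.
    Returns -1, 0, or 1.
    """
    # Skip leading zeros, but keep the last digit so "000" still reads as "0".
    while xs < xe - 1 and x[xs] == "0":
        xs += 1
    while ys < ye - 1 and y[ys] == "0":
        ys += 1
    # A run with more significant digits is the larger number.
    xlen, ylen = xe - xs, ye - ys
    if xlen != ylen:
        return -1 if xlen < ylen else 1
    # Equal lengths: the first differing digit decides ('0' < '1' < ... < '9').
    for i in range(xlen):
        cx, cy = x[xs + i], y[ys + i]
        if cx != cy:
            return -1 if cx < cy else 1
    return 0
```

    For example, compare_digit_runs("007", 0, 3, "7", 0, 1) treats the two runs as equal, and a 500-digit run of '1's is handled without any overflow because nothing is ever converted to a number.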

    The following is the Compare function from my solution to the problem. You can download the full source from .

    Public Function Compare(ByVal x As String, ByVal y As String) _
        As Integer Implements IComparer(Of String).Compare

        ' Nothing sorts before any string
        If x Is Nothing AndAlso y Is Nothing Then
            Return 0
        End If
        If x Is Nothing Then
            Return -1
        End If
        If y Is Nothing Then
            Return 1
        End If
        Dim xpos, ypos As Integer
        Do While xpos < x.Length AndAlso ypos < y.Length
            ' Skip separators (anything that is not a letter or digit)
            xpos = FindFirstIndexOf(x, xpos, myIsLetOrDig)
            ypos = FindFirstIndexOf(y, ypos, myIsLetOrDig)
            If xpos = -1 AndAlso ypos = -1 Then
                Return 0
            ElseIf xpos = -1 Then
                Return -1
            ElseIf ypos = -1 Then
                Return 1
            ElseIf Char.IsNumber(x(xpos)) AndAlso Char.IsNumber(y(ypos)) Then
                ' Skip leading zeros, but keep one zero if the run is all zeros
                Dim xtmp = FindNextIndexOf(x, xpos, myIsNonZero)
                Dim ytmp = FindNextIndexOf(y, ypos, myIsNonZero)
                Dim xend = FindNextIndexOf(x, xpos, myIsNotNum)
                Dim yend = FindNextIndexOf(y, ypos, myIsNotNum)

                xpos = If(xtmp = xend, xtmp - 1, xtmp)
                ypos = If(ytmp = yend, ytmp - 1, ytmp)

                ' More significant digits means a bigger number
                If xend - xpos < yend - ypos Then
                    Return -1
                ElseIf xend - xpos > yend - ypos Then
                    Return 1
                Else
                    ' Same length: compare digit by digit
                    Dim iy = ypos
                    For ix = xpos To xend - 1
                        If x(ix) < y(iy) Then
                            Return -1
                        ElseIf x(ix) > y(iy) Then
                            Return 1
                        End If
                        iy += 1
                    Next
                End If
                ' Equal numbers: skip past both digit runs
                xpos = xend - 1    ' -1, because we're about to +1
                ypos = yend - 1
            ElseIf Char.IsNumber(x(xpos)) Then
                Return -1    ' numbers sort before letters
            ElseIf Char.IsNumber(y(ypos)) Then
                Return 1
            Else
                ' Culture-aware comparison of the two letter runs; each run is
                ' passed with its own length so neither side reads past its run
                Dim xend = FindNextIndexOf(x, xpos, myIsNotLet)
                Dim yend = FindNextIndexOf(y, ypos, myIsNotLet)
                Dim r = myCmpInfo.Compare(x, xpos, xend - xpos, y, ypos, yend - ypos, myCmpOpt)
                If r <> 0 Then
                    Return r
                End If
                xpos = xend - 1    ' -1, because we're about to +1
                ypos = yend - 1
            End If
            xpos += 1
            ypos += 1
        Loop
        ' Leftover text only counts if it still contains a letter or digit,
        ' so trailing separators are ignored ("a " = "a")
        Dim xMore = xpos < x.Length AndAlso FindFirstIndexOf(x, xpos, myIsLetOrDig) <> -1
        Dim yMore = ypos < y.Length AndAlso FindFirstIndexOf(y, ypos, myIsLetOrDig) <> -1
        If xMore = yMore Then
            Return 0
        End If
        Return If(xMore, 1, -1)
    End Function

    The basic idea is to iterate forward through the two input strings, using the FindXXX functions to find the separators between numbers, letters, and ignored characters. We return from Compare as soon as possible without having to look at any characters following the first difference. No extra allocations are performed. For example, when comparing strings I used the form of CompareInfo.Compare that takes offsets and lengths for the two strings to avoid having to allocate substrings. Originally the code used inline lambda expressions for the arguments to FindXXX, however testing showed a significant performance increase from predefining those functions outside any individual Compare, as they are always the same. For example, here's the definition for myIsNonZero, and the others are much the same.

          Private Shared ReadOnly myIsNonZero As CharPred = Function(c) c <> "0"c  

    Finally, here's what FindNextIndexOf looks like…

    Public Delegate Function CharPred(ByVal c As Char) As Boolean

    ' Returns the index of the first character at or after start that
    ' satisfies p, or s.Length if there is none.
    Public Function FindNextIndexOf(ByVal s As String, _
        ByVal start As Integer, ByVal p As CharPred) As Integer

        If s Is Nothing Then
            Return 0    ' treat Nothing like an empty string instead of dereferencing it
        End If
        For i = start To s.Length - 1
            If p(s(i)) Then
                Return i
            End If
        Next
        Return s.Length
    End Function
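    For anyone who wants to experiment with the same single-pass idea outside VB, here is a rough Python sketch. It is my own simplification, not a port of the downloadable source: `natural_compare` and `natural_key` are hypothetical names, letter runs are compared with `str.casefold()` instead of a culture-aware CompareInfo, only ASCII '0'-'9' count as digits, and Python slicing means the letter comparison does allocate small strings. It keeps the overall shape described above: skip separators, compare one pair of tokens at a time, and return at the first difference.

```python
from functools import cmp_to_key


def natural_compare(x: str, y: str) -> int:
    """Return -1, 0 or 1, comparing x and y in natural order."""

    def is_digit(c: str) -> bool:
        return "0" <= c <= "9"  # ASCII digits only, compared by ordinal

    def is_sep(c: str) -> bool:
        return not (is_digit(c) or c.isalpha())

    def skip(s: str, i: int, pred) -> int:
        # Advance i while s[i] satisfies pred; stops at len(s).
        while i < len(s) and pred(s[i]):
            i += 1
        return i

    xi = yi = 0
    while True:
        # Separators only divide tokens; they never take part in the comparison.
        xi, yi = skip(x, xi, is_sep), skip(y, yi, is_sep)
        if xi == len(x) or yi == len(y):
            if xi == len(x) and yi == len(y):
                return 0
            return -1 if xi == len(x) else 1
        x_num, y_num = is_digit(x[xi]), is_digit(y[yi])
        if x_num != y_num:
            return -1 if x_num else 1  # numbers sort before letters
        if x_num:
            xe, ye = skip(x, xi, is_digit), skip(y, yi, is_digit)
            # Drop leading zeros (keeping one); a longer run is a bigger number.
            xs = xi
            while xs < xe - 1 and x[xs] == "0":
                xs += 1
            ys = yi
            while ys < ye - 1 and y[ys] == "0":
                ys += 1
            if xe - xs != ye - ys:
                return -1 if xe - xs < ye - ys else 1
            for k in range(xe - xs):
                if x[xs + k] != y[ys + k]:
                    return -1 if x[xs + k] < y[ys + k] else 1
        else:
            xe, ye = skip(x, xi, str.isalpha), skip(y, yi, str.isalpha)
            # Case-insensitive comparison of the two letter runs.
            a, b = x[xi:xe].casefold(), y[yi:ye].casefold()
            if a != b:
                return -1 if a < b else 1
        xi, yi = xe, ye


natural_key = cmp_to_key(natural_compare)
```

    Usage is the same as any comparison-based key in Python: sorted(["a10", "a2", "A1", "b", "a 3"], key=natural_key) should give ["A1", "a2", "a 3", "a10", "b"].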

    Some Numbers

    I manually ran a few test comparisons between my solution and an optimized form of Ian's (compiled regex and int.TryParse), sorting two randomly shuffled lists of strings. The first list contained the numbers 1-5000, and the second contained alternating strings, numbers, and special characters of the form "string n string n string n", where n was 1-5000.

    Optimized IanG, string type one (108-115ms)

    Optimized IanG, string type two (291-316ms)

    Mine, string type one (16-20ms)

    Mine, string type two (158-171ms)

    Once again, these numbers aren't really fair to compare, because my solution handles things like case insensitivity, ignoring non-alphanumeric characters, culture idiosyncrasies, etc. But it's nice to know that the extra features are more than paid for by the more efficient algorithm.

    There's still room for improvement, so I may make updates from time to time if I need to use this code on a real project.

    • The code doesn't handle floating point numbers or number separators (e.g. ','). If this is added, then it should probably be optional, as it's not clear you would always want it.
    • Although I have some unit tests, I don't have anything to verify the Culture constraints. For example, I'm trusting Char.IsLetter to work correctly.
    • I also assume that numeric characters can be compared by ordinal position. (e.g. '0' < '9')

    Feel free to use the code any way you see fit, and please let me know if you find any problems or have any other suggestions.

    February 05

    128bit Encoding Explained

    First, here's the promised VB version of the code. It's almost identical to the Java version, except that...
    1. Unsigned integers remove the need for >>>= (the Java unsigned right-shift operator).
    2. Pass by reference eliminates the need for the Int128 wrapper class, and simplifies the logic in the strToInt function slightly.
    3. The lack of a decrement operator means two statements are required to access the character buffer and decrement the index.
    4. Support for Modules makes it clear that I'm providing utility functions (as compared to using static methods in a final class).
    5. Type inferencing eliminates the need to explicitly declare the type for local variables. (Not sure that I like this feature yet, because now it's sometimes hard to tell the type at a glance.)
    6. The Select Case statement used in CharToMod64() is much easier to read than the corresponding Java if statements.
    Imports System.Text    ' for ASCIIEncoding

    Module Utils
    
        Const bUnder As Byte = 95
        Const bDollar As Byte = 36
        Const bQuest As Byte = 63
        Const bFirstUAlpha As Byte = 65
        Const bFirstLAlpha As Byte = 97
        Const bFirstNum As Byte = 48
        Const mask6Bits As UInt64 = 63
        Const mask4bits As UInt64 = 15
        Const mask2Bits As UInt64 = 3
        Const bitsPerChar As Integer = 6
    
        Dim CharMap() As Char = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$".ToCharArray
    
        Function UInt128ToStr(ByVal msb As UInt64, ByVal lsb As UInt64) As String
            Const MAX_BYTE As Integer = 31
            Dim buf(0 To MAX_BYTE) As Char
            Dim i = MAX_BYTE
    
            ' 64 bit number has ten 6bit encoded values plus four bits left over
            For n = 1 To 10
                If lsb = 0 AndAlso msb = 0 Then
                    Exit For ' Eliminate leading zeros
                End If
                Dim b = CByte(lsb And mask6Bits)
                buf(i) = CharMap(b)
                i -= 1
                lsb >>= bitsPerChar
            Next
    
            ' 4 bits from the lsb, and 2 from the msb
            If lsb > 0 OrElse msb > 0 OrElse i = MAX_BYTE Then
                Dim leftOver = lsb And mask4bits
                Dim firstTwo = msb And mask2Bits
                Dim b = CByte((firstTwo << 4) Or leftOver)
                buf(i) = CharMap(b)
                i -= 1
                msb >>= 2
            End If
    
            Do While msb <> 0
                Dim b = CByte(msb And mask6Bits)
                buf(i) = CharMap(b)
                i -= 1
                msb >>= bitsPerChar
            Loop
    
            ' It's easiest to simply prefix an underscore to avoid
            ' illegal identifiers and clashes with keywords and literals.
            buf(i) = "_"c
            i -= 1
    
            Return New String(buf, i + 1, MAX_BYTE - i)
        End Function
    
        Public Sub StrToUInt128(ByVal s As String, ByRef msb As UInt64, ByRef lsb As UInt64)
            Dim buf() = ASCIIEncoding.ASCII.GetBytes(s)
            Dim maxByte = buf.Length - 1
            Dim i = maxByte
            Dim minByte = 0
            If buf(0) = bUnder Then
                minByte = 1
            End If
            msb = 0
            lsb = 0
    
            For n = 0 To 9
                If i < minByte Then
                    Return
                End If
                ' Widen to UInt64 before shifting; a Byte shift would drop the high bits
                Dim b As UInt64 = CharToMod64(buf(i))
                If b <> 0 Then
                    If n <> 0 Then
                        b <<= (n * bitsPerChar)
                    End If
                    lsb = lsb Or b
                End If
                i -= 1
            Next
    
            If i >= minByte Then
                Dim b = CharToMod64(buf(i))
                If b <> 0 Then
                    Dim leftOver = b And mask4bits
                    msb = (b >> 4) And mask2Bits
                    leftOver <<= 60
                    lsb = lsb Or leftOver
                End If
                i -= 1
            End If
    
            For m = 0 To 10
                If i < minByte Then
                    Return
                End If
                ' Widen to UInt64 before shifting; a Byte shift would drop the high bits
                Dim b As UInt64 = CharToMod64(buf(i))
                If b <> 0 Then
                    b <<= (m * bitsPerChar + 2)
                    msb = msb Or b
                End If
                i -= 1
            Next
        End Sub
    
        Private Function CharToMod64(ByVal c As Byte) As Byte
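            Const b10 As Byte = 10
            Const b36 As Byte = 36
            ' Case ranges are inclusive, so each range ends at +9 or +25
            Select Case c
                Case bFirstNum To bFirstNum + 9           ' "0"-"9" -> 0-9
                    Return c - bFirstNum
                Case bFirstUAlpha To bFirstUAlpha + 25    ' "A"-"Z" -> 10-35
                    Return c - bFirstUAlpha + b10
                Case bFirstLAlpha To bFirstLAlpha + 25    ' "a"-"z" -> 36-61
                    Return c - bFirstLAlpha + b36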
                Case bUnder
                    Return 62
                Case bDollar
                    Return 63
                Case Else
                    Debug.Assert(False, "Unexpected Character " & c)
                    Return 0
            End Select
        End Function
    
    End Module

    Turning Numbers Into Strings

    Because neither language has native support for 128-bit integers, we have to resort to using two 64-bit integers. The algorithm I devised simply encodes every 6 bits of the input integers as a single character. I thought this worked out perfectly, because my first reading of Section 3.8 of the Java Language Spec led me to believe that Java supports A-Z, a-z, 0-9, $, and _ as the only legal characters in identifiers. I learned later that it actually supports many more, so this algorithm could probably be extended to use N bits instead of only 6, which would make the resulting strings even smaller. In the end, I decided against this, because the current choice is less likely to have problems with character sets and encoding/decoding.

    With 6 bits per character and 128 bits in total, 128/6 ≈ 21.3, so at most 22 characters are required to hold any number. To ensure legal Java identifiers, we also prefix an underscore to each generated string.
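    The same 22-character bound falls out of the way the code splits the work between the two 64-bit halves: ten characters come purely from the lsb, one character mixes the lsb's last 4 bits with the msb's first 2 bits, and the msb's remaining 62 bits need at most eleven more characters.

```latex
\underbrace{\lfloor 64/6 \rfloor}_{\text{lsb only}}
+ \underbrace{1}_{4\ \text{lsb bits} + 2\ \text{msb bits}}
+ \underbrace{\lceil 62/6 \rceil}_{\text{msb only}}
= 10 + 1 + 11 = 22 = \lceil 128/6 \rceil
```

    Adding the underscore prefix gives at most 23 characters, which fits comfortably in the 32-character buffer.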

    The first step in the algorithm is to allocate a temporary buffer to hold the encoded characters. I used 32, because I'm a power-of-two kind of guy.

    Next we process the least significant 64-bit number, repeatedly masking out the lowest 6 bits, converting them to a character using a lookup table, and then shifting right 6 bits to get the next piece. In Java, we must use the unsigned shift operator for this, because a signed shift of a negative value pulls in 1s from the left and corrupts the remaining digits. Notice that we only break out of the loop once we've either taken the first 60 bits of the 64-bit number, or both msb and lsb are zero. If only lsb is zero, then we just keep emitting '0' characters (shifting 0 right by 6 leaves it 0) until 60 bits have been processed. This ensures that we get any trailing zeros (because 100 <> 100000).

    The middle section of code is to handle the remaining 4 bits from the lsb, and the first 2 bits of the msb. This is accomplished using masks and shifts appropriately. The i = MAX_BYTE condition is there to ensure that the number zero is written out as "_0" instead of "_".

    Finally, we process the remaining 62 bits of the msb. As soon as msb = 0, we're free to break out of the loop, because we don't want any leading zeros padding the string.
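    To check the bit bookkeeping of the three sections above, here is a line-for-line Python sketch of UInt128ToStr. It is an illustration, not the original code; `uint128_to_str`, `CHAR_MAP` and `MASK64` are names I made up. The inputs are non-negative, so Python's `>>` behaves like Java's `>>>` here.

```python
CHAR_MAP = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"
MASK64 = (1 << 64) - 1


def uint128_to_str(msb: int, lsb: int) -> str:
    """Encode the unsigned 128-bit value (msb << 64) | lsb, 6 bits per character."""
    if not (0 <= msb <= MASK64 and 0 <= lsb <= MASK64):
        raise ValueError("msb and lsb must each fit in 64 unsigned bits")
    out = []  # characters, least significant first
    # Section 1: up to 10 characters (60 bits) come from the lsb alone.
    for _ in range(10):
        if lsb == 0 and msb == 0:
            break  # nothing left to encode, so no leading zeros
        out.append(CHAR_MAP[lsb & 0x3F])
        lsb >>= 6  # lsb is non-negative, so this acts like Java's >>>
    # Section 2: the lsb's top 4 bits plus the msb's low 2 bits.
    # "not out" plays the role of i = MAX_BYTE: zero becomes "_0", not "_".
    if lsb > 0 or msb > 0 or not out:
        out.append(CHAR_MAP[((msb & 0x3) << 4) | (lsb & 0xF)])
        msb >>= 2
    # Section 3: the msb's remaining 62 bits, stopping once it reaches zero.
    while msb != 0:
        out.append(CHAR_MAP[msb & 0x3F])
        msb >>= 6
    # Prefix "_" so the result is always a legal Java identifier.
    return "_" + "".join(reversed(out))
```

    Because each character is simply one base-64 digit of the value (msb << 64) | lsb, the output matches a plain base-64 conversion of that 128-bit number, which makes this sketch easy to test: uint128_to_str(0, 64) gives "_10", and the largest value gives "_3" followed by 21 "$" characters.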

    One nice property of this algorithm is that it encodes sequential numbers in an efficient and almost human-readable format. The first 10 numbers are encoded as "_0" through "_9", followed by "_A" through "_Z", "_a" through "_z", then "__" and "_$". It then rolls over to the next digit, with the next 64 encoded as "_10" through "_1$". Anyone used to hexadecimal encoding should find this intuitive. Furthermore, it should be clear that for random 128-bit numbers this algorithm gives an optimal encoding using 64 characters.

    Turning Strings Into Numbers

    Reversing the process is only slightly trickier than the initial encoding. First we convert the input string into an array of Bytes. One complication is that I didn't want to assume in the decoder that every input string would start with an underscore, so we check for this at the start and set minByte to either 1 or 0. This was originally there, because in the original Int128ToString() function I only prepended the underscore when necessary to make a legal Java identifier. However, I removed that code due to all the complications in checking for keyword and literal conflicts, and the flexibility in the decoding seemed nice so I left it.

    Just as with encoding, the decoder steps through the first 60 bits' worth of characters (up to ten characters, starting from the end of the string). This time, we use the charToMod64 function to convert each input character back to its base-64 value. The trick here is that I found it most straightforward to shift the decoded value to the correct position (b <<= n * 6) and then combine it with the current value of the lsb using a bitwise OR.

    The middle section once again splits the 6 bits of the decoded character into the remaining top 4 bits of the lsb and the lowest 2 bits of the msb.

    The final section processes the remaining characters into the final 62 bits of the msb.
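    The three decoding sections can also be expressed as one loop keyed on digit position. This is a hypothetical restatement (the class name DecodeSketch and the exceptions are my own choices), assuming the same alphabet and the same optional leading underscore: digit p covers bits 6p through 6p + 5 of the 128-bit value, so positions 0-9 land in the lsb, position 10 straddles both halves, and positions 11-21 land in the msb starting at bit 2.

```java
import java.util.Arrays;

final class DecodeSketch {
    private static final String DIGITS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$";

    /** Decodes to {msb, lsb}; one leading underscore is skipped if present. */
    static long[] decode(String s) {
        int start = s.startsWith("_") ? 1 : 0;
        if (s.length() - start > 22) {
            throw new IllegalArgumentException("more than 22 digits: " + s);
        }
        long msb = 0;
        long lsb = 0;
        int pos = 0; // digit position counted from the right end of the string
        for (int i = s.length() - 1; i >= start; --i, ++pos) {
            long d = DIGITS.indexOf(s.charAt(i));
            if (d < 0) {
                throw new IllegalArgumentException("bad character: " + s.charAt(i));
            }
            if (pos < 10) {
                lsb |= d << (pos * 6);      // bits 0-59 of lsb
            } else if (pos == 10) {
                lsb |= (d & 15) << 60;      // low 4 bits -> bits 60-63 of lsb
                msb |= d >>> 4;             // high 2 bits -> bits 0-1 of msb
            } else {
                if (pos == 21 && d > 3) {   // only 2 bits are left in the msb
                    throw new IllegalArgumentException("value exceeds 128 bits: " + s);
                }
                msb |= d << (pos * 6 - 64); // bits 2-63 of msb
            }
        }
        return new long[] {msb, lsb};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decode("_1$")));          // [0, 127]
        System.out.println(Arrays.toString(decode("_G0000000000"))); // [1, 0]
    }
}
```

    As with the original, an input without the prefix whose first digit happens to be "_" (62) is ambiguous, since that underscore is treated as the prefix.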

    Performance

    I found a few interesting things with the performance of this algorithm.

    First, I wrote a simple program to time how long it takes to encode the first million numbers, encode the last million,  encode and decode the first million, and finally encode and decode the last million. Here's the code in VB.

    Public Sub Main()
        Dim startTime = DateTime.Now
        For i As UInt64 = 0 To 999999
            Dim enc = UInt128ToStr(i, i)
        Next
        Dim stopTime = DateTime.Now
        Console.WriteLine("Encode took " & (stopTime - startTime).TotalMilliseconds)
    
        startTime = DateTime.Now
        For i As Int64 = -1L To -1000000 Step -1
            Dim enc = UInt128ToStr(CULng(i), CULng(i))
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Encode big took " & (stopTime - startTime).TotalMilliseconds)
    
        startTime = DateTime.Now
        For i As UInt64 = 0 To 999999
            Dim enc = UInt128ToStr(i, i)
            Dim msb, lsb As UInt64
            StrToUInt128(enc, msb, lsb)
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Roundtrip took " & (stopTime - startTime).TotalMilliseconds)
    
        startTime = DateTime.Now
        For i As Int64 = -1L To -1000000 Step -1
            Dim enc = UInt128ToStr(CULng(i), CULng(i))
            Dim msb, lsb As UInt64
            StrToUInt128(enc, msb, lsb)
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Roundtrip big took " & (stopTime - startTime).TotalMilliseconds)
    End Sub

    Although both Java and .NET are probably more than fast enough, Java was 2-3 times faster at encoding, but a round trip encode/decode took about the same time for both. (Further testing of decode seemed to show that the .NET version really is faster at decoding.) It's not worth it to me at this point to figure out why, but if someone is interested I would recommend using ILDASM to inspect the generated IL assembly code for the .NET version. I don't think this is the kind of thing that's going to be helped by source analysis or profiling.

    Times in milliseconds:

    Test              Java    VB
    Encode            188     555.6285
    Encode big        249     565.3935
    Roundtrip         763     778.2705
    Roundtrip big     985     917.91

    For comparison I also found source code online for a fast Base64 encoder (http://migbase64.sourceforge.net/).

    Base64 (times in milliseconds)

    Test              Time
    Encode            375
    Encode big        374
    Roundtrip         1352
    Roundtrip big     1332

    This is not a slight on Mikael Grev's algorithm at all, because mine doesn't even pretend to generate compliant RFC 2045 Base64 encoded strings.

    In the end, I think I came up with a pretty tight little algorithm to solve a particular problem. I hope someone finds it useful.

    January 28

    Encoding 128bit Numbers as Strings

    My current project needed a way to convert between 128 bit numbers (e.g. java.util.UUID) and strings. A special requirement is that the strings be legal Java identifiers, because we use them for variable names in generated code. For the same reason, I wanted to ensure that the strings were of minimal size.

    I was able to come up with a trivial algorithm in a few minutes that works OK, but I felt that I should be able to come up with an optimal solution given a little more time. It turns out that the optimal solution consumed most of this weekend, but I finally got it working.

    There's no way I can bill my customer for this exercise, so I thought I'd post it here in case anyone else finds it interesting. I'm also curious to see if anyone can come up with a better or faster solution.

    Basically, my approach is to treat the 128-bit number as up to 22 groups of 6 bits each (21 full groups plus a final group holding the top 2 bits). This gives me 64 possible characters, which works out perfectly, because there are exactly 64 ASCII characters that are legal in Java identifiers (0-9, A-Z, a-z, _, and $). I also prepend an additional underscore to avoid illegal starting characters (an identifier can't begin with a digit) and clashes with keywords and literals.
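    The claim about the alphabet is easy to check mechanically. This sketch (the class name IdentifierAlphabet is hypothetical) counts the ASCII characters that Character.isJavaIdentifierPart accepts, excluding the control characters it also accepts because Character.isIdentifierIgnorable returns true for them, and confirms that the 64-character map is made of distinct, legal characters.

```java
final class IdentifierAlphabet {
    static final String DIGITS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$";

    /** Counts ASCII characters allowed after the first position of a Java identifier. */
    static int countAsciiIdentifierParts() {
        int count = 0;
        for (char c = 0; c < 128; ++c) {
            // isJavaIdentifierPart also accepts "ignorable" control characters; skip them.
            if (Character.isJavaIdentifierPart(c) && !Character.isIdentifierIgnorable(c)) {
                count++;
            }
        }
        return count;
    }

    /** True if every character of DIGITS is a legal identifier part. */
    static boolean alphabetIsLegal() {
        for (char c : DIGITS.toCharArray()) {
            if (!Character.isJavaIdentifierPart(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(countAsciiIdentifierParts());          // 64
        System.out.println(DIGITS.chars().distinct().count());    // 64
        System.out.println(alphabetIsLegal());                    // true
        // Digits may not start an identifier, one reason for the '_' prefix.
        System.out.println(Character.isJavaIdentifierStart('0')); // false
    }
}
```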

    For now, here's the Java source to ponder. Tomorrow I'll discuss the code in more detail, and post a VB version which I actually wrote first, then ported to Java.

    import java.nio.charset.StandardCharsets;

    public final class Utils {

        private static final byte bUnder = 95;       // '_'
        private static final byte bDollar = 36;      // '$'
        private static final byte bQuest = 63;       // '?'
        private static final byte bFirstUAlpha = 65; // 'A'
        private static final byte bFirstLAlpha = 97; // 'a'
        private static final byte bFirstNum = 48;    // '0'
        private static final long mask6Bits = 63;
        private static final long mask4Bits = 15;
        private static final long mask2Bits = 3;
        private static final int bitsPerChar = 6;

        private static final char[] charMap = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"
                                .toCharArray();

        public final static class Int128 {
            public final long msb;
            public final long lsb;
            public Int128(long m, long l) {
                msb = m;
                lsb = l;
            }
            public Int128() {
                this(0, 0);
            }
            @Override
            public final String toString() {
                return "" + msb + "," + lsb;
            }
        }

        public static String int128ToStr(long msb, long lsb) {
            final int MAX_BYTE = 31;
            char[] buf = new char[32];
            int i = MAX_BYTE;

            // Process the least significant bits
            // 64 bit number has 10 six bit encoded values plus four bits left over
            for (int n = 0; n < 10; ++n) {
                if (lsb == 0 && msb == 0) {
                    break; // Eliminate leading zeros
                }
                byte b = (byte) (lsb & mask6Bits);
                buf[i--] = charMap[b];
                lsb >>>= bitsPerChar;
            }

            // 4 bits from the lsb, and 2 from the msb
            if (lsb != 0 || msb != 0 || i == MAX_BYTE) {
                long leftOver = lsb & mask4Bits;
                long firstTwo = msb & mask2Bits;
                byte b = (byte) ((firstTwo << 4) | leftOver);
                buf[i--] = charMap[b];
                msb >>>= 2;
            }

            // Process the most significant bits
            while (msb != 0) {
                byte b = (byte) (msb & mask6Bits);
                buf[i--] = charMap[b];
                msb >>>= bitsPerChar;
            }

            // It's easiest to simply prefix an underscore to avoid
            // illegal identifiers and clashes with keywords and literals.
            buf[i--] = '_';

            return new String(buf, i + 1, MAX_BYTE - i);
        }

        public static Int128 strToInt128(String s) {
            byte[] buf = s.getBytes(StandardCharsets.US_ASCII);
            if (buf.length == 0) {
                throw new IllegalArgumentException("empty string");
            }
            int maxByte = buf.length - 1;
            int i = maxByte;
            int minByte = 0;
            if (buf[0] == bUnder) {
                minByte = 1; // skip the optional underscore prefix
            }
            long msb = 0;
            long lsb = 0;

            // Up to 10 characters fill bits 0-59 of the lsb
            for (int n = 0; n < 10; ++n) {
                if (i < minByte) {
                    return new Int128(msb, lsb);
                }
                long b = charToMod64(buf[i]);
                if (b != 0) {
                    if (n != 0) {
                        b <<= (n * 6);
                    }
                    lsb |= b;
                }
                --i;
            }

            // The 11th character: low 4 bits to the top of the lsb, high 2 bits to the bottom of the msb
            if (i >= minByte) {
                long b = charToMod64(buf[i]);
                if (b != 0) {
                    long leftOver = b & mask4Bits;
                    msb = (b >>> 4) & mask2Bits;
                    leftOver <<= 60;
                    lsb |= leftOver;
                }
                --i;
            }

            // Up to 11 more characters fill bits 2-63 of the msb
            for (int n = 0; n < 11; ++n) {
                if (i < minByte) {
                    return new Int128(msb, lsb);
                }
                long b = charToMod64(buf[i]);
                if (b != 0) {
                    b <<= (n * bitsPerChar + 2);
                    msb |= b;
                }
                --i;
            }
            return new Int128(msb, lsb);
        }

        // Maps '0'-'9', 'A'-'Z', 'a'-'z', '_', '$' back to 0-63
        private static byte charToMod64(byte c) {
            if (c >= bFirstNum && c < bFirstNum + 10) {
                return (byte) (c - bFirstNum);
            } else if (c >= bFirstUAlpha && c < bFirstUAlpha + 26) {
                return (byte) (c - bFirstUAlpha + 10);
            } else if (c >= bFirstLAlpha && c < bFirstLAlpha + 26) {
                return (byte) (c - bFirstLAlpha + 26 + 10);
            } else if (c == bUnder) {
                return 62;
            } else if (c == bDollar) {
                return 63;
            } else {
                assert false : "not an identifier character: " + (char) c;
                return 0;
            }
        }
    }
    October 16

    Windows Environment Editor released on SourceForge

    I finally got around to submitting the Windows Environment Editor to SourceForge. You may recall this as the project originally created for the ill-fated OCI Summer of Code programming contest. We haven't done much work on it since then, but the major bugs have been addressed, and I've been using it regularly, as have a few brave volunteers. I think it's the best of the many replacement environment editors currently available, but you can judge that for yourself.

    Technologies

    One of my primary purposes for this project was to learn how to create GUI applications using the relatively new Windows Presentation Foundation (WPF 1.0) for .NET. This framework, while similar to others in many respects, is really something fundamentally different from libraries like Swing, WinForms, GTK, MFC, etc. For more information I recommend reading http://arstechnica.com/reviews/os/pretty-vista.ars

    One of the benefits of WPF is the separation between GUI design using a markup language (XAML) similar to HTML, and a programming API for making that GUI work. In theory, this will allow designers to use programs like Microsoft Expression Blend to make applications like mine look better. For this reason, we tried to express as much of the GUI as possible using XAML, rather than reverting to code. We also wasted huge amounts of time using Blend to play with the GUI design, but in the end we gave up and went with a simple (ugly?) design that eschews all the fancy gradient effects.

    Of course, for reasons best explained in my previous posts, I used Visual Basic 2005 as the programming language. All told, the application is something less than 2000 lines of code and XAML.

    In the future, maybe I'll port everything to the upcoming new releases for WPF and VB.

    Features

    • Resizable dialog allows viewing more variables at once.
    • System and User settings are contained in collapsible sections.
    • Search box to quickly jump to a variable for editing.
    • Modify variables directly in the search box using Variable=Value syntax.
    • Edit variables directly without having to popup an additional dialog.
    • Both keyboard and mouse interfaces are supported.
    • Keyboard shortcuts for common tasks.
    • Size, Position, and other state is remembered between runs.
    • Vista support for running as a user or administrator.
    • Many small details enhance the user experience.

    Conclusion

    I estimate that this application represents several hundred hours of work by myself and Ngan that may have been better spent playing Scrabble or Age Of Empires. I hope that at least a few people find it useful. If you have any questions, want to contribute features or fixes, or just want to try it out, I encourage you to check out the project.

    https://sourceforge.net/projects/winenvedit/

    July 29

    NOT the World's Most Advanced Mouse

    I thought I'd share my experience with a recent consumer electronics purchase. Maybe I can save you some pain and trouble, and salvage something from my own wasted effort.

    The Problem

    I started the search for a new mouse because I am occasionally annoyed by the cord on my other mice, and I was hoping that wireless mouse technology was finally at a usable point. I first tried cordless mice two years ago, using a Christmas gift certificate to purchase the top-of-the-line Microsoft solution (Wireless Laser 6000). However, I found it completely unusable because of its penchant for missing clicks and its poor tracking.

    The Solution?

    So, I plunked down $100 last Wednesday to purchase a Logitech MX Revolution, which claims to be "The World's Most Advanced Mouse". I was disappointed to find that this mouse shares many of the flaws of the MS Wireless mouse, even manages to bring back some problems I thought we'd left behind with the old ball mice, and ups the ante with a product design so flawed that at first I could not believe it. But first…

    What I Want In a Mouse

    5 Buttons

    I routinely make use of 5 buttons while using a mouse. I use the obvious left button to click and double-click my way through GUI interfaces. I use the right button to pull up context menus. I use the middle button to open browser links in separate tabs and to close those tabs with a single click. And I use the back and forward buttons to retrace my steps through browser-style interfaces. Any other buttons are more likely to be accidentally pressed than activated on purpose, and should be minimal and unobtrusive, or better yet, absent.

    Scroll Wheel

    I constantly use the scroll wheel (if I'm using the mouse at all) for anything with a document-like interface. However, I'm skeptical of the value of side-to-side scrolling found on many newer mice. This might be useful for working with large pictures or something, but I don't see any value for me, and would always choose to disable the feature.

    Still, the scroll wheel has a lot of room for improvement. For me, it often scrolls too much with each step. Even the default Windows mouse driver assumes that the scroll wheel should scroll one or more lines at minimum. What I really want is something more precise, which I can currently only get by clicking and dragging the scroll bar "puck" or "thumb". This is an area where the MX Revolution really had a chance to innovate to fix one of my major gripes with current scroll wheel implementations.

    Precision

    Many people don't notice the precision of their mouse, but the quickest way I know to grasp the concept is to either a) try to write your name using a paint program, or b) try to play a game. Under these and similar scenarios, differences in mouse tracking become very apparent. When I'm playing Age Of Empires III, I only have so much time to click all the little guys to tell them where to attack. If my mouse causes me to miss, or makes it difficult to select the right units in the heat of battle, then it makes me upset. Of course, if you're observant you can readily notice the difference between a good mouse and a poor one when using any GUI. A good mouse makes it much easier to click on the myriad buttons, hyperlinks, and other elements of the modern graphical interface.

    Clicking

    This should go without saying, but the mouse buttons should respond as expected when clicked. Both of the wireless mice I've used have occasionally failed to register a click the first time I've pressed a button. This could be a driver issue, or a problem related to RF interference, or any number of things, but it's definitely unacceptable.

    The Review

    I went with the Logitech because I've been extremely happy with an MX518 wired gaming mouse, which is, to date, the best mouse I've ever owned. My hope was that they would essentially provide an MX518 in wireless form, perhaps with fixes for my few gripes with the wired version (it doesn't remember my DPI setting between reboots, and I occasionally hit the DPI and property buttons by accident).

    Gimmick One, A New Wheel

    The coolest thing about the new mouse is the idea for a new scroll wheel. They installed a nice heavy wheel in the center, which works in one of two software-controlled modes. In the first mode it behaves like any other scroll wheel, clicking as you spin it and scrolling N lines for each click. The second mode disengages something internally, allowing the wheel to spin freely. This could have been so cool had it been implemented correctly. What I expected was that the scroll wheel would finally give the illusion of a physical connection to the document. If I moved it the tiniest amount, then the document should scroll by a single pixel or less. If I were to spin it, then the document should scroll at the speed of the wheel until it lost momentum or I stopped the spinning manually. This is NOT what happened. Instead, the free-spinning mode seemed to simply control the same N-line scroll wheel mechanism as before. Perhaps a driver update could fix this by tying the free-spinning mode to sub-pixel manipulation of the scroll thumb instead of hooking into the scroll wheel mechanism. This wouldn't quite be the right effect for long documents, because you'd want to limit the maximum speed, but it would be better, and I'm not sure if perfection can be achieved without specific software support.

    The scroll wheel also supports the ubiquitous horizontal scrolling feature that I dislike, but I can't remember if it was possible to disable it.

    No Middle Button

    When using the Logitech mouse driver, pushing the scroll wheel either switches scrolling modes between free-spinning and "clicking", or can be disabled entirely. There is no provision at all for supporting a middle button, which I personally find totally unacceptable, especially considering the uselessness of their scroll wheel feature.

    Gimmick Two, Search

    Logitech added a new button behind the scroll wheel, which can only be used for their new search feature. When you click it with a word highlighted, it presumably looks up the word in the search engine of your choice. I tried the feature only a few times, and never accepted the default Yahoo service. However, it rarely worked, and I was never quite sure what I was doing wrong, or whether the feature is just buggy or non-intuitive. In any case, the only thing I really wanted to do was remap this button to control the scroll-wheel mode, and use the scroll wheel button for its usual middle-click duties. Of course, the Logitech software didn't support this. To be fair, clicking the search button on the mouse seemed to do the same thing as clicking the search button on my keyboard (MS Comfort Curve 2000: the best keyboard ever. I bought 3.). It brings up the new Vista search dialog, which I don't really like. Maybe there is simply some software conflict, or Logitech doesn't correctly support Vista.

    Gimmick Three, A Button On My Thumb

    Logitech added another strange feature to this mouse. There is a little toggle under your right thumb, which can be pressed forward or backward. When you do so, it will pop up a very ugly window where you can choose between running applications. This seemed equivalent, but inferior, to the Alt-Tab switcher or Flip3D found in Windows Vista. I wish I had thought to capture a picture so you could see just how poorly implemented this was. Once again, it was impossible to map these to more logical functions such as the usual forward/back buttons, which might have made sense. It also might have been cool to tie into Flip3D or the Alt-Tab switcher, but I probably would have mapped this feature to the buttons on the left of the mouse, as I would use it much less than browser forward/back navigation.

    The Software

    Strangely, this mouse seems to require more than the usual driver. To enable most of the advanced features you must run the SetPoint software in the system tray. If you exit this program, then the mouse reverts to normal MS mouse behavior. This does re-enable the middle button, but the thumb button and search button are disabled. If I were forced to keep this mouse, then this is probably the way I would try to use it.

    The biggest problem with the software is the lack of useful customizability. As mentioned above, most features were hardcoded to specific buttons, and could only be disabled rather than reassigned. There were several sliders for controlling the scroll wheel speed and acceleration, but I could find no setting that made this feature usable, or in any way better than any other scroll wheel.

    Precision Problems

    The most notable problems with this mouse had nothing to do with the flawed or missing features detailed above, but instead reflect my previous experience with a wireless mouse.

    Can't Click

    Several times while using my computer, I would click on a link while web browsing, or click a Settler in AOE3, only to find that the click didn't register. At the time, I blamed myself or even Windows Vista. I thought that perhaps I'd moved slightly, and the game thought I'd dragged rather than clicked. However, over time, and with a back-to-back comparison against an MX518 wired mouse, it became apparent that the mouse was just occasionally missing clicks. Maybe I have too much interference. I do have a Wireless-G network and 2.4GHz portable phones. However, I don't really think that's any excuse for Logitech here, because this sort of environment is commonplace among the user base for this product.

    Can't Move

    If you've used a mouse before 1998 or so, then you probably remember ball mice and all the problems associated with them. Somehow Logitech has managed to recreate the experience of using one of these. This probably has the same root cause as my click problem above, but frequently when using this mouse, the pointer would freeze to a spot on the screen, and I could get it going again only by moving the mouse quickly. This only seemed to happen when I was moving the mouse slowly in the first place, for example when trying to click on an AOE3 Settler or army unit to issue new orders.

    Lest you think that I'm just being picky, my girlfriend also noticed the same problems with both wireless mice while playing Age Of Empires, which was quickly remedied by reverting to an old Microsoft IntelliMouse Optical. These problems occurred with any software; they were just much more noticeable in a game, because you often need to move at a precise rate to click on a moving object.

    Summary

    In my first hardware review, I give the Logitech MX Revolution a 0/10. Although many of the problems might be fixable by future driver, firmware, and software updates, I don't think many of the decisions, such as the need to run a separate software program instead of just a driver, will ever be addressed.

    I know of other people who love this mouse, so feel free to experiment with it yourself, but I advise you to purchase from a local retailer with a forgiving return policy, and to hang on to your receipt. If your experience is anything like mine, then you'll probably regret the wasted effort.

    On Friday, I took the mouse back for a refund. In the end I'll probably just buy another Logitech MX518, but I'm still stuck with a wired mouse whose cable occasionally interferes with whatever I'm trying to do. Maybe I should just clean my desk.

    July 27

    RAII For .NET and Java

    One of the complaints that many C++ developers have with Java and .NET is the lack of destructor semantics. I've even included this as a proposal for a Scoped keyword in my post about VB shortcomings. Basically, with C++ you can implement a special method on a class which will automatically be called when instances go out of scope. Note that this really has absolutely nothing to do with freeing memory, although that was its most common usage in older (pre-1990?) C++ code. These days this mechanism is much more common, and used for all kinds of things, mostly still involving release of resources (Files, Mutexes, Database Connections, etc.). It's even become a best practice with its own acronym, RAII (Resource Acquisition Is Initialization).

    C# and VB have a similar general-purpose mechanism with Using statements, but the semantics can get unwieldy. Java and older versions of VB are stuck using try/finally (also available in C#).

    C++

    void foo() {
        Connection a("1");
        Connection z("26");
        // Do stuff
    }   // destructors run here: z first, then a

    C#

    void Foo() {
        using (Connection a = new Connection("1"))
        using (Connection z = new Connection("26")) {
            // Do stuff
        }
    }

    VB

    Sub Foo()
        Using a As New Connection("1")
            Using z As New Connection("26")
                ' Do stuff
            End Using
        End Using
    End Sub

    VB (older versions)

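    Sub Foo()
        ' Declared outside the Try block so they are visible in Finally
        Dim a As Connection = Nothing
        Dim z As Connection = Nothing
        Try
            a = New Connection("1")
            z = New Connection("26")
            ' Do stuff
        Finally
            If Not a Is Nothing Then a.Dispose()
            If Not z Is Nothing Then z.Dispose()
        End Try
    End Sub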

    Java

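    void foo() {
        // Declared outside the try block so they are visible in finally
        Connection a = null;
        Connection z = null;
        try {
            a = new Connection("1");
            z = new Connection("26");
            // Do stuff
        } finally {
            if (a != null) {
                a.close();
            }
            if (z != null) {
                z.close();
            }
        }
    }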

    The C++ code is very clean, because the special destructor method is called to close each connection as it goes out of scope at the bottom of the method. The C# code is much more verbose, but stacking the using statements at least alleviates the nesting problems of the VB version. The older VB and Java versions are stuck manually closing everything in the finally block, which can be especially cumbersome when you have to check each instance for null.

    I think this is definitely an area where these languages need some more syntactic sugar. I think a Scoped keyword would be ideal. Here's how it could look in Java.

    void foo() {
        scoped Connection a = new Connection("1");
        scoped Connection z = new Connection("26");
        // Do stuff
    }

    Each Connection object would be responsible for implementing a special method which would automatically be called when the references went out of scope. This could either tie into the Finalizer mechanism similar to how .NET does it, or it could use a new approach, perhaps consisting of a compile-time annotation for the Connection class. (See the .NET documentation on IDisposable for more information.) C# and VB could really use something similar, because the Using syntax often remains too bitter for my taste.

    For C# and especially VB, we can probably do better using something like Scott McMaster's suggestion. Here we trade some efficiency for convenience. I would write it like so…

    void Foo() {
        using (ScopeManager scope = new ScopeManager()) {
            Connection a = new Connection("1");
            scope.add(a);
            Connection z = new Connection("26");
            scope.add(z);
            // Do stuff
        }
    }

    Here we save ~26 lines of using statements, each possibly with its own nesting level. It's still not as nice as the C++ mechanism, but it's available now, and preferable to any alternative I can think of at the moment. This could potentially be useful in Java too, but less so due to the lack of Using semantics.
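    For comparison, here is a Java sketch of the same ScopeManager idea, written with plain try/finally since Java has no Using statement. ScopeManager and FakeConnection are hypothetical names for illustration, not a library API; the resources implement java.io.Closeable, and closing them last-in-first-out mirrors the order in which C++ runs destructors.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ScopeDemo {
    /** Hypothetical helper: remembers resources and closes them in reverse order. */
    static final class ScopeManager implements Closeable {
        private final Deque<Closeable> resources = new ArrayDeque<>();

        /** Registers a resource and hands it back, so creation and registration fit on one line. */
        <T extends Closeable> T add(T resource) {
            resources.push(resource);
            return resource;
        }

        /** Closes every resource (last added first) and rethrows the first failure, if any. */
        @Override
        public void close() throws IOException {
            IOException first = null;
            while (!resources.isEmpty()) {
                try {
                    resources.pop().close();
                } catch (IOException e) {
                    if (first == null) {
                        first = e;
                    }
                }
            }
            if (first != null) {
                throw first;
            }
        }
    }

    /** Stand-in for a real connection; it only records when it is opened and closed. */
    static final class FakeConnection implements Closeable {
        static final List<String> log = new ArrayList<>();
        private final String name;

        FakeConnection(String name) {
            this.name = name;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static void foo() throws IOException {
        ScopeManager scope = new ScopeManager();
        try {
            FakeConnection a = scope.add(new FakeConnection("1"));
            FakeConnection z = scope.add(new FakeConnection("26"));
            // Do stuff with a and z
        } finally {
            scope.close();
        }
    }

    public static void main(String[] args) throws IOException {
        foo();
        System.out.println(FakeConnection.log); // [open 1, open 26, close 26, close 1]
    }
}
```

    Only the single finally block grows with the method; adding more connections costs one scope.add line each rather than another nested try.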

     

     

    May 22

    What Does “Loosely Typed” Mean?

    Eric questions the meaning of "Loosely Typed"

    I think that there are two separate issues here.

    Required Declarations

    This corresponds to the "Option Explicit" feature added to VB3, and which still exists in VB.NET. You can choose for a particular file whether to require variable declarations or not. This feature helps catch bugs at compile time where you accidentally misspell the name of some symbol. Consider the following code.

    Class Person
        Private name
        Private age As Double
        Private money As Decimal = 100.0
    End Class

    The name field is declared, but no type is specified, whereas the age field is both declared and given a type, and money is declared, given a type, and an initial value.

    Required Type

    This somewhat corresponds to the "Option Strict" feature in VB, except that the feature enables quite a few other behaviors that are somewhat related, such as the ability to implicitly convert between Strings and Numbers. If I have "Option Strict Off" and "Option Explicit Off" at the top of a VB file, then the following will compile.

    greeting = "Hello"
    greeting = greeting.Length
    greeting = greeting + greting

    This shows one problem with making variable declarations optional, because I was able to misspell "greeting" as "greting" in the last line, and the compiler couldn't check it. This is why I like to be able to specify "Option Explicit On" in VB even when I set "Option Strict Off". I prefer to get a little red squiggly underline of "greting" rather than waiting to catch this in a unit test.

    Inferred Type

    VB9 and C#3 add another twist to this issue, because they now support type inference. With this feature, you can still get strong type safety, but without the hassle of explicitly stating the type. Take the following C# example:

    var name = "Justin";
    var len = name.Length;
    name = 42.0;

    In this example the first line is exactly equivalent to "string name = "Justin";", and the third line will be a compile time error, because the inferred type of name is String, which doesn't support implicit conversion from Double.

    The only thing I don't like about the current implementation of the type inference feature is that it's done by the compiler. What I'd really like is for the IDE to display the name of the inferred type, and perhaps let me modify its choice without making the type "sticky". Maybe the IDE could automatically display the above as "var string name = "Justin";", where "string" is non-editable and mutates itself automatically to match changes to the inferred type when refactoring.

    Strong vs. Loose

    These terms are currently pretty ambiguous. I think the term Strong should only mean that at runtime, the code will result in an error if you attempt an operation that is not known to be safe. Examples might include:

    • Calling a derived-type method on a base reference
    • Implicit numeric conversions
    • Calling a method that is unknown on the current reference

    The VB "Option Strict Off" setting basically throws "Loose Typing" in with "No Type Declaration Required", and makes the following code legal.

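    Class A
        Public Sub Foo()
        End Sub
    End Class

    Dim v As Object
    v = New A()
    v.Foo()               ' late-bound call, resolved at runtime
    v = "42"
    v += 1                ' the string is implicitly converted to a number
    Debug.Assert(v = 43)  ' late-bound numeric comparison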

    This is true Loose Type behavior, and can be useful in certain limited contexts. However, I wish you could have finer-grained control over the VB options so that I could retain some type safety while still disabling the need for explicit type specifications. I guess some of the need for this is obviated by the type-inference feature.

    Dynamic vs. Static

    This seems to clearly refer to whether type information is known at compile time. Static Typing means that all type errors short of invalid explicit casts will be caught at compile time.

    Conclusion

    So I guess I'm a little unclear about what kind of type system is found in Ruby. If it claims to be "Strongly Typed" then that's pretty ambiguous. Is there a summary of exactly which operations result in an error vs which are implicitly allowed? What value does "Strongly Typed" provide at runtime? If you're going to wait until runtime to detect errors, then why not just use "Loosely Typed"?
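Python is usually described as dynamically but strongly typed, so it makes a useful comparison point for these questions. It is Python here, not Ruby. This sketch uses hypothetical helper names. It shows a few operations that Python rejects at runtime and a few that it converts implicitly.

```python
def try_add(a, b):
    """Return a + b, or the name of the error Python raises at runtime."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__

print(try_add("42", 1))    # TypeError: no implicit str/int conversion
print(try_add(42, 1.5))    # 43.5: int widens to float implicitly
print(try_add("42", "1"))  # 421: + on two strings concatenates

class A:
    def foo(self):
        return "foo"

v = A()
print(v.foo())             # foo: late-bound call resolved at runtime
try:
    v.bar()                # no such method; detected only when this line runs
except AttributeError:
    print("AttributeError")
```

For Python, then, "strongly typed" roughly means the runtime checks described under Strong vs. Loose. The VB Option Strict Off example above produces 43 from "42" + 1. In Python, the same operation raises an error.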

    May 21

    VB and Language Choice

    A recent Coding Horror post got me thinking again about C# vs. Visual Basic. As you know, I myself think that VB is probably the best general purpose managed language available right now, so I thought I'd add my $.02 to the discussion.

    VB vs. C# is like Coke vs. Pepsi

    I don't think this analogy holds up, because it ignores the complexities of the situation. I believe there are real, substantive differences that make VB a better tool for writing programs than C#. While it's true that tools like CodeRush and Resharper can greatly improve the C# experience, they still fall short of what's theoretically possible with a VB-based toolset. No matter what you do to C# (or Java, or any other C-style language), you will always be missing the three key features that make VB better: the Readable Keywords, Line-Orientation, and Case-Insensitivity I detailed in a previous post.

    I'm not claiming VB/Visual Studio is better in every way than Eclipse/Java, IntelliJ/Java, Resharper+VS2005/C#, etc. I'm just saying that if all these tools were refined to their highest potential, I would like VB best because it has a better foundation. VB is currently missing some major features, such as the ability to manage Imports automatically, and that makes it difficult to compare favorably with these other tools.

    Case Insensitivity is Right and Case Sensitivity is Wrong

    I hope that most VB users would agree that it's not the Case Insensitivity of VB that we actually like. In fact, at the language level I believe it's a mistake for VB to be Case Insensitive. What people really like about VB is that most of the time it's not the author's responsibility to worry about case. The feature we really like is that the IDE fixes the case to match the declaration or keyword. I think a better way to implement this feature would be to require Case Hyper-Sensitivity at the language level, which would disallow multiple symbols in scope that differ only by case.

    No competent VB programmer would ever want something like "sySTeM.cONsolE.WRitELINE(foo)" to compile. We just think it's ridiculous that C# and Java editors make us constantly contort our hands with two-key combinations. This only slows down our typing, causes hand cramps, and makes us 2% (est.) less happy. For those of you who've grown accustomed to Resharper, IntelliJ or Eclipse, this is exactly equivalent to the many features you would miss if you switched to Notepad for Java. In fact, my first impression of IntelliJ was that somebody had copied 20% of the good features from VB and then thought of a bunch of new must-have features that I hadn't even considered. The problem is that none of the IDE vendors is really doing a good job of incorporating the best features of the competition. The ideal would be a JetBrains VB IDE based on Mono. Better yet would be a completely new language that keeps the important ideas of VB while stripping away the overly long keywords and the other features I dislike in VB. (Overall I like VB's readable keywords, but some are just too much.)
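The two halves of this idea can be sketched in a few lines of Python. The first is a checker for the proposed "hyper-sensitive" rule, which rejects identifiers in one scope that differ only by case. The second is the IDE-side fix-up that rewrites a carelessly typed name to its declared spelling. Both function names are hypothetical, and real scoping rules are ignored.

```python
from collections import defaultdict

def case_only_collisions(names):
    """Return groups of identifiers that differ only by case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Under the proposed rule, every group with 2+ spellings is a compile error.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

def fix_case(typed, declared):
    """Rewrite a typed identifier to its declared spelling, as the VB IDE does."""
    canonical = {name.casefold(): name for name in declared}
    return canonical.get(typed.casefold(), typed)

print(case_only_collisions(["customer", "Customer", "orderId", "OrderID", "total"]))
# [['Customer', 'customer'], ['OrderID', 'orderId']]
print(fix_case("sySTeM", ["System", "Console"]))  # System
```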

    Strongly Typed

    The Ruby and Python movements have convinced many people that they really want a loosely typed language. Of course, one of the great things about VB is that you can choose within a given file whether you want strong or loose typing, but I'm not convinced that most people even truly want loose typing in most cases. What people really seem to like is not having to waste time keying in all the type information. They like the flexibility that the tool can just figure it out.

    You can get most of this benefit without having to give up the performance and other benefits of a strongly typed language. Upcoming versions of C# and VB will have a feature called Type Inference, wherein the language will simply figure out the type from the context in which it is used. For instance, in C# 3.0 I can write "var x = foo.ToString();" and C# will infer that the type of x is String. One thing that I don't like about the new VB/C# type-inference feature is that it seems to be implemented at the language level.

    Language vs. IDE

    I believe that the single most important key to the future of programming is the realization that Programming Languages as classically defined are an idea whose time has passed. Tools like VB, IntelliJ, and Eclipse have blurred the line between IDE and language, but the real key is realizing that the language side of the line is no longer needed at all. In fact, it's a huge detriment. This is the primary reason why I can't really get excited about any new language such as Ruby. I see these all as vestiges of a legacy mode of thinking.

    Multi-Language

    One of the benefits of .NET was supposed to be that you could use any language you like. However, as Jeff has noticed, this dream has not been realized. The problem is that we're not really free to choose any language we like, because most code has to be written and maintained by multiple people, each with different preferences. Jeff's experience is that most people have settled on C# as the de facto standard .NET language, despite the superiority of VB. Over time he seems to have been worn down by the C# zealots and their fanatical devotion to a language attuned to those with masochistic tendencies, but I have to believe that deep down he still knows there's a better way.

    The fundamental problem with a multi-language platform like .NET is that what we really want is the ability to maintain the same code in multiple languages. If I write a class in VB, then someone else needs to be able to maintain that class in C#, Python, or Ruby without having to translate the language. The key is that the code itself can't know what language it's written in, and that can only happen when we get rid of this silly old-fashioned notion of parsing languages written as text files.

    April 13

    5 Things

    I think this meme jumped the shark a long time ago, so I won't tag anyone else. However, I'll try to come up with my 5 things.
    1. I'm from a relatively small town of 35,000, although I grew up more on the fringes of that metropolis. We had one neighbor next door, and a farm across the street. There weren't any other kids around so I had to learn to play with myself most of the time. ;)
    2. I still play video games, although I haven't upgraded to the Xbox360 yet. I'm currently in the middle of Psychonauts and Half-Life 2, so I'm a little behind the times.
    3. I like to cook. I watch the food network frequently, and I think I'm getting decent at making a few things although I'm a long way from Iron Chef. I did win the 1st annual OCI Chili cookoff last year though. Ok, I tied. :)
    4. I used to be a Ham. I got my first amateur radio license when I was 11 or 12, but I lost interest when I found out that computers could be more than my Commodore 64.
    5. Although my first computer was a C64, and I wrote some simple BASIC programs for it, I never *really* learned to program until college, when I learned Modula-2. Maybe it's because my first two languages were of the no-curly-braces variety that I prefer Python, VB, and Pascal to C++, Java, C#, or Perl.

    When I started this blog, I didn't think I would have any trouble keeping a steady stream of posts going, but it's been a while since my last entry. I'll try to do better going forward.

    Thanks for getting me back on track Weiqi. :)

    January 18

    Things I Would Change in VB

    VB is a great language, but I still think it has room for improvement. For example, there are far too many keywords, some of the keywords are confusing, and some keywords have too many overloaded meanings. There are also problems with the IDE that should be addressed. More than any other language, VB has always been about a partnership with the IDE, and this functionality is crucial for its ease of use and power.

    Language Changes

    Declare instead of Dim for variable declaration

    The use of the keyword ‘Dim’ for variable declaration is a historical artifact of Basic. A more descriptive name would be ‘Declare’, which is already a VB keyword but is used for a feature of dubious value. Here are some examples of what I would prefer. I'll use this syntax in the rest of the examples as well:

    Declare x,y,z As Integer 
    Declare names(1 to 10) As String 
    Declare teams() As String = GetTeams()
    
    Too many uses of parentheses

    I personally find it confusing that parentheses are used for expression grouping, array declarations, indexing, and function calling. I think it would be more readable if we used brackets for indexing and arrays.

    Declare names[10] As String 
    names[0] = “Justin” 
    names[(x + (y \ 2))] = “Michel” 
    Console.WriteLine(“Name = “ & GetNamePrefix() & names[2])
    
    GetType, TypeOf

    It doesn’t seem like we should need both of these keywords. One option would be to use TypeOf(T) for retrieving the type of a class, which I think is more consistent with VB’s readable syntax. Even better would be to support accessing a Shared GetType method or a read-only Type property on any type, which would completely eliminate the need for a keyword in this use case. Best might be to allow the type name itself to be used.

    Declare t As Type = FileStream.GetType()
    

    —or—

    Declare t As Type = TypeOf(FileStream)
    

    —or—

    Declare t As Type = FileStream.Type
    

    —or—

    Declare t As Type = FileStream
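Python already works like the last option: a class is an ordinary value, so the type name itself can be assigned. Here is a small illustration using the standard `io` module, where `io.FileIO` simply stands in for FileStream:

```python
import io

t = io.FileIO                            # the type name itself is the value
print(isinstance(t, type), t.__name__)   # True FileIO

stream = io.BytesIO(b"data")
print(type(stream) is io.BytesIO)        # True: type() covers the GetType() case
```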
    
    Is, IsA, and TypeOf

    The current keyword "Is" should be used to compare for identity only. If used with two reference types then it returns true if both refer to the same exact object. If used with value types then it returns true if both have the same value. Basically, I’m suggesting elimination of “If TypeOf X Is Y Then” as a special case, although if we retain TypeOf as outlined above, then “If TypeOf(X) Is Y Then” and “If TypeOf(X) = Y Then” are both legal and equivalent.

    A new keyword IsA should be introduced to allow easy comparison of types. A IsA B should return true if A is the same type as B, A derives from B, or if B is an interface implemented by A.

    Declare A, B As New Foo 
    A IsA Foo ' returns true 
    Foo IsA A ' won’t compile, because left hand side is a type. 
    A IsA B ' won’t compile, because B is an object. 
    A IsA B.GetType() ' returns true
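Python happens to split these two ideas the same way. `is` tests identity only. `isinstance` answers the proposed IsA question for the same type, a derived type, or an interface-like abstract base class. The classes in this sketch are hypothetical. Python has no interfaces as such, so an ABC with `register` stands in for one.

```python
from abc import ABC

class Shape:
    pass

class Circle(Shape):
    pass

class Drawable(ABC):          # stands in for an interface
    pass

Drawable.register(Circle)     # Circle now "implements" Drawable

a, b = Circle(), Circle()
print(a is a, a is b)           # True False: identity only
print(isinstance(a, Circle))    # True: same type
print(isinstance(a, Shape))     # True: derived type
print(isinstance(a, Drawable))  # True: registered interface
print(isinstance(a, type(b)))   # True: like "A IsA B.GetType()"
```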
    

    It’s also confusing that Is must be used in a Select expression. It seems like the extra keyword should be unnecessary.

    Select age 
    Case 5 To 10, 12 To 15 
    Case < 5, 13, 16, > 80 ' No need for “Case Is < 5, 13, 16, Is > 80” 
    Case Else 
    End Select 
    Too many conversion keywords

    I think all of CBool, CByte, CChar, CDate, CDbl, etc. can be eliminated by replacing them with a single Convert function. It would behave exactly the same as the current keywords.

    Declare d As Double = 3.14 
    Declare x As Integer = Convert(d, Integer) 
    

    -- but also allowing --

    Declare x As Integer = Convert(d) 
    

    My proposed syntax is a little more verbose, but eliminates 15 keywords, some of which can be hard to remember. It also supports implicit detection of the desired type if possible, as in the example above where it can see that the target of the Convert is an Integer, and saves you the trouble of explicitly stating that.

    I would also allow a third parameter to Convert for specifying the type converted from. This can be useful when the source type is not obvious, and when you want a warning if the source type changes during refactoring.

    Declare x As UInt32 = Convert(GetAge(), From := UInt64) 
    

    This takes advantage of the current named-argument syntax to allow both implicit detection of the To type and explicit specification of the From type. If the GetAge function is someday updated to return an Int64, the code would no longer compile, which prevents a possible bug when dealing with negative values.
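Python can't infer the target type from a declaration, so this sketch takes the To type explicitly. It only illustrates the optional From check, which turns a silent change in the source type into an error. Here the error occurs at runtime rather than at compile time. All names are hypothetical. This is not a faithful Convert, because Python's `int()` truncates while VB's CInt rounds.

```python
def convert(value, to, from_=None):
    """Hypothetical Convert(value, To, From) with an optional source-type check."""
    if from_ is not None and type(value) is not from_:
        raise TypeError(
            f"expected a {from_.__name__} source, got {type(value).__name__}")
    return to(value)

def get_age():
    return 42  # hypothetical; imagine this is later changed to return a float

x = convert(get_age(), int, from_=int)  # passes today
print(x)  # 42
```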

    DirectCast too verbose

    This keyword is unnecessarily verbose. Fortunately, there is already an intuitive and logical replacement in the language.

    Declare x As Object = GetFoo() 
    Declare y As Foo 
    y = x As Foo 
    

    The As keyword is already used to declare the type of an object, and the above syntax should be obvious in meaning. As with DirectCast, if x is not a Foo, then an exception would be thrown.

    It would also be nice to allow this syntax to be used when it’s known that the source and destination types differ. In this case, “As” would be equivalent to Convert.

    Declare d As Double = 3.14 
    Declare x As Integer = d As Integer 
    
    Eliminate TryCast

    The use of TryCast doesn’t really add any value. Instead of:

    Declare x As Foo = TryCast(y, Foo) 
    If x IsNot Nothing Then 
      x.Blah() 
    End If 
    

    you could just use:

    If y IsA Foo Then 
      Declare x As Foo = y As Foo 
      x.Blah() 
    End If 
    

    The compiler should be able to figure out how to make the latter syntax just as efficient as the former.
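Both patterns are easy to compare in Python. A hypothetical `try_cast` helper plays the role of TryCast, and `isinstance` plays the role of the proposed IsA. The sketch checks that the two give the same results. In a dynamic language, the second form doesn't even need the cast.

```python
class Foo:
    def blah(self):
        return "blah"

def try_cast(obj, cls):
    """Hypothetical TryCast: return obj if it is a cls, otherwise None."""
    return obj if isinstance(obj, cls) else None

def with_try_cast(y):
    x = try_cast(y, Foo)
    if x is not None:
        return x.blah()
    return None

def with_isa(y):
    if isinstance(y, Foo):   # the proposed "If y IsA Foo Then"
        return y.blah()
    return None

for y in (Foo(), "not a foo"):
    assert with_try_cast(y) == with_isa(y)
```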

    No need for CType

    There should be no need for this keyword, which is kind of a hybrid between casting and conversion. This should be replaced by either As or Convert. The use of CType for defining conversion operators on a class would use the keyword Convert instead.

    Nothing vs. Null

    The VB concept of Nothing is almost always referred to as Null in other languages, and even within much of the VB community (probably because of SQL). Having a separate term for such a common concept is just confusing for users old and new.

    Declare x As Foo = Null 
    Assert(x Is Null) 
    Assert(x = Null) 
    
    No need for AddressOf keyword

    Nowhere else in VB is the notion of pointer or address exposed, and it’s likely that AddressOf doesn’t really return an address anyway. We could just use the Function or Sub keyword in place of AddressOf.

    From the examples in AddressOf documentation:

    AddHandler Button1.Click, Sub Button1_Click 
    Declare t As New Thread(Sub CountSheep) 
    
    Add the ability to Define aliases

    The C++ typedef keyword is often useful for writing maintainable code, and sometimes C++ references are handy in situations other than parameter passing. I propose that a new Define keyword could provide this functionality for VB in an intuitive way.

    It can enhance readability. For example, by defining a new IntList type we save typing wherever List(Of Integer) is used. This becomes more valuable with more complex generic types.

    Define IntList As List(Of Integer) 
    Declare v As IntList 
    

    It can also be useful for changing a type without having to update all the code, or to allow conditional compilation to use different types.

    #If SafetyEnabled Then 
      Define MyInt As SafeInteger 
    #Else 
      Define MyInt As Integer 
    #End If 
    

    Define is already partially available with Imports. I would remove the ability for Imports to define new aliases and use the Define syntax instead.

    Imports Con = System.Console 
    

    becomes

    Define Con As System.Console 
    

    This keyword could also be useful for defining aliases for other things besides Types.

    Sub Foo 
      Declare aReallyReallyLongName As SomeType = GetSomeType() 
      Define s As aReallyReallyLongName 
      s.SomeMethod() 
      Define sm As s.SomeReallyLongMethodName 
      sm() 
    End Sub 
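Python has no Define keyword, but ordinary assignment comes close for all three uses above: a type alias, a conditionally chosen type, and a short local name for a long method. In this sketch, `SAFETY_ENABLED`, `SafeInteger`, and `Report` are hypothetical stand-ins.

```python
from typing import List

IntList = List[int]          # type alias, like "Define IntList As List(Of Integer)"

SAFETY_ENABLED = False       # hypothetical build flag

class SafeInteger(int):      # hypothetical checked integer type
    pass

MyInt = SafeInteger if SAFETY_ENABLED else int   # like the #If ... Define example

def total(values: IntList) -> MyInt:
    return MyInt(sum(values))

class Report:
    def some_really_long_method_name(self):
        return "done"

r = Report()
sm = r.some_really_long_method_name   # short alias for a bound method
print(total([1, 2, 3]), sm())         # 6 done
```

There is one difference from the proposal. `sm` captures the object that `r` referred to when the alias was created, so reassigning `r` later would not update it. A true Define might be expected to track the name instead.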
    More flexible Imports

    I’d like to Import a symbol within small scope. I see no reason why this couldn’t be allowed anywhere instead of only at the top of a file.

    Sub Log() 
      Imports System.Console 
      Write(a) 
      Write(b) 
      WriteLine(c) 
    End Sub 
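Python already allows this: an import statement inside a function binds the name only in that function's local scope. A minimal sketch, with a hypothetical `log` function:

```python
def log(a, b, c):
    from sys import stdout     # visible only inside log()
    stdout.write(str(a))
    stdout.write(str(b))
    stdout.write(str(c) + "\n")

log(1, 2, 3)                   # prints 123
print("stdout" in globals())   # False: the import did not leak to module scope
```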
    
    No need for Delegate keyword

    We should just be able to define a delegate type like any other.

    Define Sub OnClick(ByVal e As EventArgs) 
    AddHandler Button1.Click, Sub OnClick 
    Public Event Click As OnClick 
    
    Remove Alias, Ansi, Auto, Declare, Lib, and Unicode

    These keywords for declaring access to external functions make it marginally easier to access external Win32 routines, but cause unnecessary clutter in the language, making it that much harder to understand. They are also unnecessary, because it’s not much harder to just use the DllImport attribute.

    Consider removing keywords for basic types

    The CLR already defines nice, unambiguous names for all the basic types, such as Byte, Int16, Int32, and Int64. It might be better just to use those and eliminate the unnecessary keywords from VB. Most of the VB keywords are the same as the names defined in System anyway, so this wouldn’t be that onerous. The proposed Define feature could even provide easy compatibility for old code.

    No need for the Call keyword

    In VB you can call a subroutine by saying “Call Foo”, but you can also get exactly the same behavior by saying “Foo()”. I don’t see any benefit to the former syntax, and it adds yet another unnecessary keyword to the language.

    Select instead of “Select Case”

    There’s no need to support an optional Case keyword in the first line of a Select statement. We should just standardize on the following syntax, and eliminate a little more clutter from the language.

    Select age ' vs. Select Case age
    Case 0 To 20 
      Foo1() 
    Case 21 
      Foo2() 
    Case 25, > 30 
      Foo3() 
    Case Else 
      Foo4() 
    End Select 
    
    Literals

    Some basic types, such as Byte, currently have no literal syntax at all. For others, such as Char, the literal syntax ("c"c) is unintuitive. A better character literal might use the back-tick (e.g. `c`).

    Eliminate the colon

    One of the strengths of VB is its readable line-oriented syntax. The colon operator allows you to subvert this. Eliminating this would help keep VB code readable to all VB programmers.

    The other use for the colon is to declare labels for GoTo statements, but I’m also proposing elimination of GoTo for VB.

    Eliminate the underscore

    Similarly the underscore is a crutch that should not be needed. Code should either rely on wrapping in the IDE, or introduction of temporary variables to make code more readable.

    However, to make this work, we need to eliminate some of the most egregious causes of lengthy lines. The UsesAttribute keyword fixes one such area, new syntax for Implementing interfaces fixes another, and the proposed Define keyword another.

    Bring Back the underscore

    Instead of using the underscore for line continuation, I would use it to disambiguate VB keywords that match symbols. Currently the [] brackets are used for this purpose, but I think those work better for array access as stated previously, and prepending an underscore seems a more intuitive solution. Hopefully the elimination of many keywords will make the need for disambiguation much less prevalent.

    Remove the \ symbol

    Although integer division is probably used fairly frequently, I find having two separate division symbols confusing. Integer / Integer should produce an integer result, while Integer / Float and Float / Integer should produce a floating-point result. Anything else can be handled by explicit conversions and/or separate math library functions. It would also be helpful if the IDE colorized integer types differently from floating-point types.
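Here is a rough sketch of the proposed single-operator rule, written as a Python function. Python itself does the opposite: `/` always yields a float and `//` floors. The function name is hypothetical. Truncating toward zero in the integer case is my assumption, chosen to match VB's current `\`.

```python
def proposed_divide(a, b):
    """Hypothetical single '/' rule: Integer / Integer yields an Integer."""
    if isinstance(a, int) and isinstance(b, int):
        q = abs(a) // abs(b)                       # magnitude of the quotient
        return q if (a >= 0) == (b >= 0) else -q   # truncate toward zero
    return float(a) / float(b)                     # any floating operand gives a float

print(proposed_divide(7, 2), proposed_divide(-7, 2), proposed_divide(7, 2.0))
# 3 -3 3.5
print(7 // 2, -7 // 2, 7 / 2)  # 3 -4 3.5: Python floors, hence the helper
```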

    Eliminate If … Then … Else … Statements

    One of the primary values of VB is its line-oriented nature. I was completely unaware that the current VB allows If Then Else on a single line without an End If. This should not be allowed, as it subverts the nature of VB, by allowing a tiny syntactic convenience with far greater potential for misuse than valid use cases.

    For example, this currently compiles, and I don’t think it should.

    If x > 0 Then Return “>” Else Return “<=” 
    Introduce a short-circuiting ternary expression

    One of the problems with the C ternary expression is that the default case is separated from the context by the boolean expression. For example, if I want to Trim() only a non-null string in C#:

    String trimmed = s != null ? s.Trim() : "";
    

    Notice how the code we really want to write “String trimmed = s.Trim();” is very different from the code we have to write to handle the null.

    I think VB has the opportunity to make this common expression more readable by reordering the “arguments” to allow:

    String trimmed = s.Trim() ? s != null : "";

    Or in VB:

    Declare trimmed As String = s.Trim() IIf s IsNot Nothing Else ""
    

    Other examples:

    If a = (b IIf b < 5 Else 5) Then 
    If a = (b IIf b IsNot Nothing Else GetDefault()) Then 
    Foo(a, (x IIf a > 0 Else GetDefault()), z) 
    y = 10 IIf x Is Nothing Else x.Foo() 
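Python's conditional expression already uses this value-first ordering. It also short-circuits: the value before `if` is evaluated only when the condition is true. A small sketch with a hypothetical `trimmed` function:

```python
def trimmed(s):
    # s.strip() runs only when s is not None, so None never raises here.
    return s.strip() if s is not None else ""

print(repr(trimmed("  Justin  ")))  # 'Justin'
print(repr(trimmed(None)))          # ''
```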
    
    Introduce UsesAttribute

    Instead of overloading the < and > operators for applying attributes, I propose we add a new keyword to make this feel more natural in VB, and to eliminate one of the major needs for the underscore line continuation character.

    Sub Foo(ByVal s As String, ByVal n As Int32) 
      UsesAttribute Conditional(“A”) 
      UsesAttribute WebMethod 
      ' Body of subroutine 
    End Sub 
    
    Class SomeService 
      Inherits Foo 
      Implements Bar 
      UsesAttribute WebService(Namespace:=”blah”) 
    
      Sub New() 
      End Sub 
    
      Sub Calculate 
        UsesAttribute WebMethod 
      End Sub 
    End Class
    
    Unwieldy Interface-based polymorphism

    A strength of VB is its readable English keywords, but in practice this can sometimes get out of hand.

    Public Interface Shape 
      Function CalculateArea(ByVal X As Double, ByVal Y As Double) As Double 
    End Interface 
    Public Class RightTriangleClass 
      Implements Shape 
    
      Function CalculateArea(ByVal X As Double, _ 
        ByVal Y As Double) As Double Implements Shape.CalculateArea 
        Return 0.5 * (X * Y) 
      End Function 
    End Class 
    

    In general, any time the underscore has to be introduced to allow a logical line to continue on the next physical line, we see an area for improvement. The following would be a much nicer syntax for the above. The verbose syntax would still be available when there’s a conflict, or when you want the implementing method's name to differ from the interface member's name.

    Public Class RightTriangleClass2 
      Implements Shape 
    
      Function Implements CalculateArea(ByVal X As Double, ByVal Y As Double) As Double 
        Return 0.5 * (X * Y) 
      End Function 
    End Class 
    
    Erase

    There’s no need for this keyword, as you can get the same effect by assigning Null. The Erase statement supports a variable number of arguments, but the limited utility of this is not worth the introduction of a keyword. Instead, if this functionality is important, then it would be more useful to allow ParamArray to use ByRef making it easy to implement Erase as a user function, as well as other possibly useful functions.

    ReDim and Preserve

    This functionality seems easy enough to reproduce with a few small functions, possibly added to the Array class.
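For instance, ReDim and ReDim Preserve could be plain library functions. This Python sketch uses hypothetical names. It takes a length rather than VB's upper bound, so `ReDim a(4)` would correspond to a length of 5. .NET 2.0 already offers Array.Resize, which covers the Preserve case.

```python
def redim(length, fill=None):
    """Like ReDim without Preserve: a fresh array, old contents discarded."""
    return [fill] * length

def redim_preserve(arr, length, fill=None):
    """Like ReDim Preserve: keep existing elements, pad or truncate to length."""
    kept = list(arr[:length])
    return kept + [fill] * (length - len(kept))

a = [1, 2, 3]
print(redim_preserve(a, 5, 0))  # [1, 2, 3, 0, 0]
print(redim_preserve(a, 2))     # [1, 2]
print(redim(3, 0))              # [0, 0, 0]
```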

    On, Error, Resume

    These keywords are no longer necessary in a modern VB. We now have structured exception handling, and these keywords just clutter the language in the name of backward compatibility.

    No public fields in Class types

    By eliminating the ability to declare Public fields in a class type, we can use the following syntax to declare simple properties.

    Class Foo 
      Public ReadOnly name As String 
      Public age As Int32 
    End Class 
    

    More complex properties could revert to the older syntax using Property. Public fields would still be allowed in value types.

    No more Set or Get

    “Set” is just too valuable to waste on a keyword. I’d rather have a System.Collections.Generic.Set for holding sets of objects. When declaring properties we can just take advantage of the fact that the mutator is always a Sub, while the accessor is always a Function. Then we no longer need the two keywords.

    Private myName As String 
    Public Property Name As String 
      Sub (ByVal s As String) 
        myName = s 
      End Sub 
      Function 
        Return myName 
      End Function 
    End Property 
    
    Goodbye Goto

    Let’s follow Java and Python, which have no goto statement, and discard goto. CS programs usually go to great lengths to teach students that goto is almost always the wrong solution. Because the .NET platform offers a variety of languages, some, like VB and C#, can eliminate goto while others (C++?) keep it.

    Generic RAII scope

    The Using statement is one useful way to ensure that objects are disposed correctly, but it can sometimes be cumbersome and lead to unnecessarily deep nesting. It would be useful to introduce Scoped variables to handle this more cleanly in some cases.

    Sub Test 
      If blah Then 
        Scoped x, y, z As Connection 
        ' x, y, and z are disposed here 
      End If 
      If blahblah Then 
        Scoped lock1 As Mutex = … 
        Scoped lock2 As Mutex = … 
        … 
        ' lock1, lock2, … are released 
      End If 
    End Sub 
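Python's `contextlib.ExitStack` offers something close to Scoped variables. Resources are registered in one flat block and released in reverse order when the block ends, with no extra nesting per resource. This sketch uses a hypothetical `Resource` class that records when it is opened and closed.

```python
from contextlib import ExitStack

class Resource:
    """Hypothetical disposable resource that logs open/close events."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def __enter__(self):
        self.log.append("open " + self.name)
        return self

    def __exit__(self, *exc):
        self.log.append("close " + self.name)
        return False  # don't suppress exceptions

def run_scoped(log):
    with ExitStack() as scope:   # one flat scope instead of nested "with" blocks
        lock1 = scope.enter_context(Resource("lock1", log))
        lock2 = scope.enter_context(Resource("lock2", log))
        log.append("work with " + lock1.name + " and " + lock2.name)
    # Both resources have been released here, in reverse order of acquisition.
    return log

print(run_scoped([]))
# ['open lock1', 'open lock2', 'work with lock1 and lock2', 'close lock2', 'close lock1']
```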
    
    Eliminate SyncLock

    More general Scoped and Using keywords can replace the need for this.

    Modules, Namespace, and Shared Class members

    I find it a little confusing that all these exist. Maybe this could be simplified by allowing global functions in a Namespace as an alternative to Modules, or eliminating the Namespace keyword entirely and using the Module keyword for this concept.

    No Next

    It would make VB easier to understand and more consistent if “Loop” were used to end For loops. This is consistent with the other loop variants and makes it clearer what’s happening.

    Eliminate While Loops

    Do While … Loop should be sufficient without the need for While…End While. This doesn’t actually save keywords, but simplifies the language by eliminating an unnecessary and more verbose syntax for while loops. The IDE could allow leaving off the Do keyword, and filling it in automatically.

    Simplify Continue and Exit

    Continue is followed by Do, While, or For to indicate which type of loop to continue. This added flexibility is unnecessary, and Continue should just jump to the top of the current loop.

    For I = 1 to 10 
      Do While blah 
        … 
        If x Then 
          Continue 
        End If 
      Loop 
    Loop 
    

    The above would simply restart the Do While loop when x is True. If you really need to restart at the beginning of the For loop, the code can be refactored into multiple methods or restructured some other way.

    Similarly, only “Exit Loop” should be supported in the future. Exiting a sub, function, or property is accomplished with “Return”, and exiting a Select is unnecessary.
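Python already follows this rule. `continue` and `break` always apply to the innermost loop, and there are no labeled forms, so escaping an outer loop means restructuring the code. A small trace with a hypothetical `visit` function:

```python
def visit(log):
    for i in range(1, 3):
        j = 0
        while j < 3:
            j += 1
            if j == 2:
                continue      # restarts only the innermost (while) loop
            log.append((i, j))
    return log

print(visit([]))
# [(1, 1), (1, 3), (2, 1), (2, 3)]
```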

    ElseIf is inconsistent

    "For Each" and "End If" use two separate words, so Else If should also.

    Declare x As Object

    It would be nice if the “As Object” part of the declaration were optional even when Option Strict and Option Explicit are enabled. The default type would be Object. So I would advocate:

    Declare x

    More flexible Options

    As fancy new features such as Lambda Expressions and Closures are added to the language, we may want more flexible control over the options. For example, we may want to turn Strict and Explicit off for Lambdas, while leaving them on for everything else. We might also want finer control of the scope for these options, perhaps turning off Option Strict within the scope of a single Function.

    Rather than have keywords for these two statements, just introduce new VB attributes.

    StrictMethods(on|off) – controls whether late binding is allowed

    StrictConversions(on|off) – controls whether Int64 auto-converts to Int32

    ExplicitLambdas(on|off) – controls whether lambda expressions require types

    etc.

    Consider eliminating or extending With

    By allowing an empty With statement, we can introduce an artificial scope, which is sometimes useful for avoiding too many variables in scope. Usually it’s better to split such a method into separate subroutines, but that can sometimes make an algorithm harder to understand.

    With 
      Declare x As Integer 
    End With 
    ' x is not in scope

    It would also be useful to assign an alias for the With object.

    Sub Foo() 
      With f = GetEmployee.Name.FirstName 
        f.blah() 
        Log(f) 
      End With 
    End Sub 
    

    However, this is not so necessary with the proposed Define keyword:

    Sub Foo() 
      Define f As GetEmployee.Name.FirstName 
      f.blah() 
      Log(f) 
    End Sub 
    
    RaiseEvent unnecessary

    This keyword is similar to Call and just as unnecessary. Simply calling the event like a function would be more intuitive.

    Event LogonCompleted(ByVal UserName As String) 
    Sub Logon 
      LogonCompleted(“Justin”) 
    End Sub 
    
    AddHandler, RemoveHandler

    Using language keywords for these is confusing. The C# syntax might be fine, or VB could automatically give every Event/Delegate implicit AddHandler/RemoveHandler methods. Note that += should NOT be used, because we’re appending to the list of handlers, so &= concatenation is more appropriate.

    Public Event LogonCompleted(ByVal UserName As String) 
    Sub OnLogon(ByVal x As String) 
    End Sub 
    Sub New 
      LogonCompleted &= Sub OnLogon 
      LogonCompleted.AddHandler(Sub OnLogon) 
    End Sub 
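A hypothetical Python `Event` class can sketch both ideas from the last two sections. Handlers are appended with `&=` through Python's `__iand__` hook, and the event is raised by calling it like a function. The class and handler names are illustrative only.

```python
class Event:
    """Hypothetical event: add handlers with &=, raise it by calling it."""
    def __init__(self):
        self._handlers = []

    def __iand__(self, handler):       # the proposed "&=" syntax
        self._handlers.append(handler)
        return self

    def add_handler(self, handler):
        self._handlers.append(handler)

    def remove_handler(self, handler):
        self._handlers.remove(handler)

    def __call__(self, *args):         # LogonCompleted("Justin") instead of RaiseEvent
        for handler in list(self._handlers):
            handler(*args)

log = []

def on_logon(user_name):
    log.append("logged on: " + user_name)

logon_completed = Event()
logon_completed &= on_logon
logon_completed("Justin")
print(log)  # ['logged on: Justin']
```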
    
    No more REM, and add comment blocks

    There’s no point in having two mechanisms for line comments, although it might be useful to introduce a multiline comment.

    ''' Anything between here
    and here
    is a comment
    '''

    No more implicit return variable for functions

    Having two ways to return a value from a function just adds complexity to the language with very little benefit. At best it saves a line of code or two to declare the return value and return it, but often modern programs don’t (and shouldn’t) declare a single return value anyway.

    Rename Shadows

    The description of this feature provides the answer for what the keyword should be: “Specifies that a declared programming element redeclares and hides an identically named element, or set of overloaded elements, in a base class.” Renaming the keyword to Hides would make it more accessible to new or infrequent VB programmers.

    Hides should also be usable to explicitly state the desired behavior.

    Sub Foo 
      Declare I As Integer 
      If blah Then 
        Hides I As String 
      End If 
    End Sub 
    No End or Stop

    The Stop keyword should be removed from the language. There’s no need for this to be a language-specific feature, as it’s easy enough to use the portable System.Diagnostics.Debugger.Break() method.

    The End keyword (that halts the process) should also be removed for similar reasons. Normally, you would either let an uncaught exception percolate up and kill the process, or you could call Process.Kill() possibly followed by Process.WaitForExit().

    Allow automatic assignment of Arrays to variables

    Python has some useful syntax for working with Tuples. VB could add some of the same convenient syntax for working with arrays.

    Allow automatically assigning array values to variables. The a, b, and c variables would be filled with the first three values from the array.

    Function Foo() As Integer() 
      … 
    End Function 
    Declare a, b, c As Integer = Foo() 
    

    Also, allow more flexibility for array initializers.

    Declare x, y, z As Integer = 1, 2, 3 
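    For comparison, here is roughly what these proposals would replace in current VB. This is a minimal sketch. The module name ArrayAssignmentToday and the array contents are made up for illustration, and Foo() is assumed to return Integer() so that it matches the variables.

```vb
Option Strict On
Imports System

' Hypothetical module: how "Declare a, b, c As Integer = Foo()" and
' "Declare x, y, z As Integer = 1, 2, 3" have to be spelled out today.
Module ArrayAssignmentToday
    ' Stand-in for the Foo() function in the text.
    Function Foo() As Integer()
        Return New Integer() {10, 20, 30, 40}
    End Function

    ' Copies the first three elements into separate variables, one statement each.
    Function FirstThree() As String
        Dim values As Integer() = Foo()
        Dim a As Integer = values(0)
        Dim b As Integer = values(1)
        Dim c As Integer = values(2)
        Return String.Format("{0},{1},{2}", a, b, c)
    End Function

    ' Several variables can share one Dim, but each needs its own As and initializer.
    Function Initialized() As Integer
        Dim x As Integer = 1, y As Integer = 2, z As Integer = 3
        Return x + y + z
    End Function
End Module
```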
    
    Add Slices

    Python has the useful ability to access a subset of any List, String, or Tuple. VB could support similar syntax without complicating the language much.

    Declare s As String = "Hello World" 
    s[0] equals 'H' 
    s[0 To 2] equals "Hel" 
    s[-1] equals 'd' 
    s[-4 To -2] equals "orl" 
    

    etc.
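    Here is a minimal sketch of these slice semantics written as ordinary VB functions. It assumes that To bounds are inclusive (which is what makes s[-4 To -2] equal "orl") and that negative indexes count back from the end. SliceOps, Normalize, CharAt, and Slice are hypothetical names, and the If() operator requires VB 2008 or later.

```vb
Option Strict On
Imports System

' Hypothetical helpers emulating the proposed slice syntax.
Module SliceOps
    ' Converts a possibly negative index into a zero-based position.
    Private Function Normalize(ByVal s As String, ByVal index As Integer) As Integer
        Dim i As Integer = If(index < 0, s.Length + index, index)
        If i < 0 OrElse i >= s.Length Then
            Throw New ArgumentOutOfRangeException("index")
        End If
        Return i
    End Function

    ' s[i]: a single-element slice yields a Char, not a one-character String.
    Function CharAt(ByVal s As String, ByVal index As Integer) As Char
        Return s.Chars(Normalize(s, index))
    End Function

    ' s[first To last]: both bounds inclusive, negative values count from the end.
    Function Slice(ByVal s As String, ByVal first As Integer, ByVal last As Integer) As String
        Dim i As Integer = Normalize(s, first)
        Dim j As Integer = Normalize(s, last)
        If j < i Then Return String.Empty
        Return s.Substring(i, j - i + 1)
    End Function
End Module
```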

    Remove Mid

    By allowing slices to be assigned, you make a more flexible Mid that is also more intuitive to read.
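    Here is a sketch of the difference. The Mid statement can only overwrite characters in place, while a slice assignment could also change the string's length. ReplaceSlice is a hypothetical helper standing in for an assignment such as s[0 To 4] = "Hi", with inclusive, non-negative bounds assumed.

```vb
Option Strict On
Imports System

Module MidVsSlice
    ' Today: the Mid statement overwrites characters in place (1-based start),
    ' and the string keeps its original length.
    Function WithMid() As String
        Dim s As String = "Hello World"
        Mid(s, 1, 5) = "Jello"
        Return s
    End Function

    ' Hypothetical stand-in for slice assignment: the replacement may be
    ' shorter or longer than the slice it replaces (bounds are inclusive).
    Function ReplaceSlice(ByVal s As String, ByVal first As Integer, ByVal last As Integer, ByVal value As String) As String
        Return s.Substring(0, first) & value & s.Substring(last + 1)
    End Function
End Module
```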

    Eliminate need for GetChar

    A single element slice should automatically return a Char instead of a one element string. The new `a` character literal should be equivalent to “a”[0].

    No need for { } array initializers

    VB should be able to figure this out without the braces, and should allow parentheses to make the grouping explicit where it would otherwise be ambiguous.

    Declare x, y, z As Integer = 1, 2, 3 
    Declare x[] As Integer = 1, 2, 3 
    Declare x[][] As Integer = (1, 2, 3), (1, 2, 3) 
    Declare x[2, 3] As Integer = (1, 2, 3), (1, 2, 3) 
    Sub Foo(ByVal a() As Int32, ByVal b As Integer) 
    End Sub
    Foo((1, 2, 3), 4) 
    Array Bounds vs Last index confusion

    I still find the syntax choice for array declarations confusing, and it doesn’t help much that the To keyword is supported. (Unless the IDE were to add it automatically.)

    Declare x(5) As Integer should create a 5 element array with a lower bound of zero.

    This use of the To keyword should be removed again, because allowing flexible lower bounds just makes arrays more confusing to work with. For those who want or need explicit lower bounds, a generic BoundedArray class could be provided.
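    Here is a quick way to see the current behavior. In today's VB, the number in the declaration is the upper bound, so the array gets one more element than the number suggests. The module and function names below are illustrative.

```vb
Option Strict On
Imports System

Module ArrayBoundsToday
    ' Dim x(5) declares indexes 0 through 5.
    Function DeclaredLength() As Integer
        Dim x(5) As Integer
        Return x.Length              ' 6 elements, not 5
    End Function

    Function DeclaredUpperBound() As Integer
        Dim x(5) As Integer
        Return x.GetUpperBound(0)    ' last valid index is 5
    End Function

    ' What the proposal asks for (a 5-element, zero-based array)
    ' currently has to be written with an explicit last index.
    Function FiveElements() As Integer
        Dim x(5 - 1) As Integer
        Return x.Length
    End Function
End Module
```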

    Automatic ToString when using concatenation

    One of the benefits of having a separate, unambiguous concatenation operator ‘&’ in VB is that it can automatically convert its arguments to strings.

    Declare I As Integer 
    Declare f As New Foo 
    f & I & "blah" 
    I & f & "blah" 
    

    This is currently possible only by explicitly implementing the concatenation operator for Foo, but that requires an overload for every possible value type, or an inefficient overload for just Object that forces boxing. Allowing generic operator overloading would partially help, but would still be far too tedious for the common case. It might even be desirable to remove operator overloading for the ‘&’ operator, and instead force it to always represent string concatenation, handled automatically.
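    Here is a minimal sketch of the current workaround under Option Strict On, with Foo as a stand-in class that only overrides ToString. The Integer is converted by & automatically, but the object is not, so its ToString() call has to be written out.

```vb
Option Strict On
Imports System

' Stand-in class: it defines no & operator and no conversion to String,
' so using it directly as an operand of & is rejected under Option Strict On.
Public Class Foo
    Public Overrides Function ToString() As String
        Return "Foo"
    End Function
End Class

Public Module ConcatToday
    Public Function Describe() As String
        Dim i As Integer = 42
        Dim f As New Foo()
        ' & converts the Integer by itself; the object needs an explicit ToString().
        Return f.ToString() & i & "blah"
    End Function
End Module
```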

    Separate operator for assignment and equality

    One of the keys to VB’s ease of use is that it doesn’t allow assignment statements to return a value. Unlike in C-style languages, you can’t chain assignments like this:

    Declare a, b, c As Integer 
    a = b = c = 5 
    

    One result is that VB can always tell from the context whether ‘=’ is meant as an assignment or a comparison. This is actually a valuable feature, because it saves typing and prevents a common error from other languages.

    What I propose is that we use the := operator that is already used for named arguments as the only allowed assignment operator, and have the IDE automatically change ‘=’ into ‘:=’ as it’s parsed in the same way that it automatically changes “Endif” into “End If” and other similar transformations. One benefit is that we get immediate visual feedback that VB understood what we meant when it makes the transformation.

    For Loop should iterate only over the specified range

    The following statement currently increments M after each pass, then checks whether it’s still less than or equal to 5.

    For M = 0 To 5 

    This is confusing given the VB syntax, because the syntax seems to imply that “M” will only take on the values 0, 1, 2, 3, 4 and 5, but it will actually be 6 after the loop.

    Besides being confusing, it can easily lead to infinite loops if the incremented variable overflows. If overflow checking is enabled, then instead of an infinite loop, you get an exception. Either is unwarranted.
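    The loop behavior described above can be checked with a small sketch. The module and function names are illustrative.

```vb
Option Strict On
Imports System

Module ForLoopToday
    ' Value of the loop variable once "For M = 0 To 5" has finished.
    Function ValueAfterLoop() As Integer
        Dim M As Integer
        For M = 0 To 5
            ' the body sees M = 0, 1, 2, 3, 4, 5
        Next
        Return M   ' one past the upper bound: the final increment ended the loop
    End Function

    ' Number of times the body runs for the same range.
    Function IterationCount() As Integer
        Dim count As Integer = 0
        For M As Integer = 0 To 5
            count += 1
        Next
        Return count
    End Function
End Module
```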

    Eliminate MyClass

    I find this keyword hard to remember, and I’d just like to be able to use the name of my class to be explicit when necessary.

    Class Foo 
      Protected Overridable Sub Bar() 
        ... 
      End Sub 
      Public Sub CallMyBar() 
        Foo.Bar() ' calls our own Bar 
        Bar() ' calls the bar of our most derived class 
      End Sub 
    End Class 
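    For reference, here is how the same intent is written today with MyClass. This is a sketch. Derived is a hypothetical subclass added only to show that MyClass.Bar() ignores the override, while a plain Bar() call does not.

```vb
Option Strict On
Imports System

Public Class Foo
    Protected Overridable Function Bar() As String
        Return "Foo.Bar"
    End Function

    ' MyClass forces a non-virtual call to Foo's own implementation.
    Public Function CallMyBar() As String
        Return MyClass.Bar()
    End Function

    ' A plain call is virtual and reaches the most derived override.
    Public Function CallMostDerivedBar() As String
        Return Bar()
    End Function
End Class

' Hypothetical subclass used only to show the difference.
Public Class Derived
    Inherits Foo

    Protected Overrides Function Bar() As String
        Return "Derived.Bar"
    End Function
End Class
```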
    Allow alternate spellings for Overridable, NotOverridable

    It would be consistent with the English language to also allow Overrideable and NotOverrideable as synonyms.

    Eliminate Type Characters

    Old style code that used type characters should no longer be supported.

    Function StrFunc(ByVal x&, ByVal y$, ByVal z#) 
    

    They also shouldn’t be allowed for literals.

    Literal DateTime values should still be supported however, because they’re useful, and easier to read than alternatives.

    Declare d As DateTime = #11/11/1970 13:32# 
    
    Literal Values for every type
    tmp = 42    ' Int32 
    tmp = 42UI  ' Unsigned Int32 
    tmp = 42L   ' Int64 
    tmp = 42UL  ' Unsigned Int64 
    tmp = 42S   ' Int16 
    tmp = 42US  ' Unsigned Int16 
    tmp = 42B   ' Byte 
    tmp = 42SB  ' Signed Byte 
    tmp = 42.5F ' Float32 
    tmp = 42.5  ' Float64 

    Currently, VB supports all of these suffixes except B and SB; there is no literal suffix for Byte or SByte.

    Eliminate Assembly, Option Compare, Binary, and Text Keywords

    We shouldn’t have to use language keywords (even unreserved ones) to change these behaviors. The IDE or command line tools should be able to handle them when necessary, or the proposed UsesAttribute could.

    Option Compare may be completely unnecessary, as String already provides explicit control.
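    String's StringComparison overloads already give this control on a per-call basis. Here is a minimal sketch with illustrative names. Note that OrdinalIgnoreCase only approximates Option Compare Text, which uses culture-aware comparison rules.

```vb
Option Strict On
Imports System

Module CompareToday
    ' Case-insensitive, roughly what Option Compare Text gives for "=".
    Function TextEqual(ByVal a As String, ByVal b As String) As Boolean
        Return String.Equals(a, b, StringComparison.OrdinalIgnoreCase)
    End Function

    ' Case-sensitive, the default Option Compare Binary behavior.
    Function BinaryEqual(ByVal a As String, ByVal b As String) As Boolean
        Return String.Equals(a, b, StringComparison.Ordinal)
    End Function
End Module
```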

    Add missing Volatile keyword

    This should have the same meaning as in other languages.

    GoSub, Let, and Wend

    These have been deprecated long enough and can all be eliminated if making the kind of sweeping changes I’m advocating.

    Bring Back Variant

    Option Strict is currently used for many related features, such as implicit type conversions and duck typing. However, it might be even better to reintroduce the Variant keyword to allow duck typing for individual identifiers. This would work even with Option Strict enabled.

    No VB-specific extension libraries by default

    I’d like to be able to write portable code that doesn’t rely on VB-specific libraries. I don’t care about access to the legacy APIs that are currently in the Microsoft.VisualBasic namespace. Any needed features, such as the Convert function, should be available without pulling in other baggage as well.

    Lambda Expressions

    This feature is partially planned for the next release, but I’m afraid it’s going to allow code that is too cryptic and terse for Basic.

    The proposed syntax is:  

    inc = Function(y) y + 1
    

    Which creates an anonymous single-parameter function that returns its parameter incremented by 1. This really doesn’t feel like VB, because a) it’s not line-oriented, and b) it implicitly returns y + 1.

    Even though it would prevent some common lambda use cases, I’d prefer VB to allow lambda expressions only in variable initialization, and to require multiple lines.

    inc = Function(y) 
            Return y + 1 
          End Function 
    

    This is only a fairly minor extension to existing Function and Sub usages. The above would only be allowed with Option Strict and Explicit disabled. Otherwise you’d get the usual:

    Dim inc As Function(ByVal y As Integer) As Integer 
                 Return y + 1 
               End Function 
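    For comparison, this sketch shows the single-line form quoted above next to a multi-line lambda with an explicit Return. The multi-line form assumes VB 2010 or later. Func(Of Integer, Integer) is the standard delegate type; the module and field names are illustrative.

```vb
Option Strict On
Imports System

Module LambdaToday
    ' Single-line form: the expression's value is returned implicitly.
    Public ReadOnly IncOneLine As Func(Of Integer, Integer) = Function(y) y + 1

    ' Multi-line form with an explicit Return (VB 2010 and later).
    Public ReadOnly IncMultiLine As Func(Of Integer, Integer) = Function(y As Integer)
                                                                     Return y + 1
                                                                 End Function
End Module
```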
    
    Must Intrinsic Functions Be Keywords?

    Several VB features involve mapping syntax that looks like a function call to CLR features. For example, Convert(x) becomes an appropriate ILAsm conversion instruction for the target of the assignment. If none of these had to be reserved as keywords, then the language would be quite a bit simpler. Assuming all the intrinsic functions are in a suitable namespace, the usual namespace disambiguation rules would apply.

    If these keywords can become intrinsic functions in a namespace, then more useful variations could be added such as explicit SafeConvert and UnsafeConvert functions that map to the various ILAsm type conversion features.

    Support Unmanaged Code

    I’d like to be able to use VB syntax to write lower level code. Instead of dropping into C++, I’d like to forgo garbage collection and other fancy features to write small, efficient code without giving up the readability of VB. A separate, smaller set of System namespace libraries could be provided to give me everything I need to write device drivers, protocol stacks, and programs that need an absolutely minimal footprint.

    Summary

    In summary, the above proposals would eliminate 88 of the roughly 200 existing keywords while adding only 13 new ones. They would also eliminate some overloaded uses of current keywords, and introduce several powerful new features from other languages. The cost of all this is reduced compatibility with existing programs, which could no longer simply be recompiled. If this isn’t acceptable, then a new Basic-style language could be introduced instead. I believe the advantages of a much simpler yet more powerful language are worth it for writing new programs, and possibly even for porting existing ones.

    Although the language would retain approximately 120 keywords after these changes, the remaining ones seem to make for more readable code than the alternatives from other languages. Many of the remaining words can only be used as part of compound keywords, as with “Each”, which can only be used in a “For Each” statement.

    New Keywords (13)

    [], Declare, Null, IsA, Hides, Scoped, Volatile, ''', UsesAttribute, Define, ‘_’, Overrideable, NotOverrideable

    Removed Keywords (86)

    {}, CBool, CChar, CDec, CDbl, CInt, CLng, CObj, CSByte, CShort, CSng, CStr, CUInt, CULng, CUShort, CDate, CType, DirectCast, AddHandler, RemoveHandler, AddressOf, Alias, Declare, Ansi, Unicode, Lib, Dim, Object, Byte, Char, String, Integer, Short, Double, Single, Long, UInteger, ULong, UShort, SByte, Decimal, Date, Call, Declare, Delegate, Erase, ReDim, Preserve, Error, Resume, GetType, GoTo, Let, GoSub, Wend, Namespace (Or Module), Next, Nothing, Option, Explicit, Strict, Compare, Text, Binary, On, Off, RaiseEvent, REM, Shadows, Stop, SyncLock, TryCast, TypeOf, Assembly, IsTrue, IsFalse, Mid, ':', '_', ‘%’, ‘@’, ‘!’, ‘$’, Set, Get

    IDE Features

    Another problem with the current VB is some very annoying or broken IDE functionality.

    Case Fixing Broken

    VB is supposed to update the case of symbols to match their declaration, but this is currently broken in several instances. For example, when I type “imports system.data\n” at the top of the file, it should automatically change it to “Imports System.Data\n”.

    A concerted effort should be made to fix this in the various places where it doesn’t currently work, and additionally a simple Ctrl-K, Ctrl-D should fix up the whole document.

    Indentation is broken

    If you type “if blah then\n” then VB will automatically indent to the proper position on the next line. However, if you move the cursor to a different line, and then go back, it will no longer be at the correct position. This wouldn’t be too bad if the Tab key would take you to the correct position, but it does not.

    Improve syntax recognition

    The VB IDE used to be better about recognizing syntax as I type. For example, if I were to type “for<space>each<space>” on a line, then the IDE would capitalize “For” as soon as I hit the first <space> key. Currently, the IDE only colorizes “for” which tells me that it does recognize the keyword, but it doesn’t fix capitalization until the whole line is entered. I do think that most syntax errors shouldn’t be highlighted and indentation shouldn’t be changed until the line is finished, but case-fixing, coloring, and other formatting changes should happen as soon as possible. This used to be one of the best things about working with VB.

    Tab key shouldn’t indent

    It should be possible to redefine the Tab key to always indent the current line to the correct position, and NEVER to actually insert indentation. Basically it should behave the same as the Emacs Tab, but without having to choose Emacs emulation.

    Allow assigning macros to tab

    In previous versions of Visual Studio I was easily able to implement my own EmacsTab using a macro, and I was able to bind this to the Tab key. This would require cleaning up the Code Snippets feature so that it isn’t hard-coded to use the Tab key for moving between editable fields.

    Macros Too Difficult

    A macro should be able to programmatically duplicate anything I can do using the keyboard. I've tried repeatedly to write a macro that would duplicate the functionality of the Emacs tab key, but I haven't been able to get what I want. This macro took me only a few minutes to figure out with Visual C++ 6. In general the macro API is just too difficult to use for the amount of time I have to devote to customization. Maybe a simplified facade should be provided.

    More Assistance

    VB should be more helpful by more often sparing me from typing long keywords such as NotInheritable. For example, it could let me type the shorter text “not”, and then automatically expand it to the real keyword based on the current context. In general the VB parser should provide more context-sensitive assistance as I type code. It should always provide immediate but non-intrusive feedback that it understands what I’m typing.

    Non-Intrusive Layout

    Often when I’m editing code, the current VB will immediately make layout changes as I type. This can make it very confusing in practice, because the indentation could “jump around” causing me to lose track of what I was doing. In general this is an example of VB taking the viewpoint of the parser rather than providing the illusion that it’s reading my mind. If I want to surround some code with “If … Then … End If”, then it only upsets me when VB immediately matches my newly inserted If with a previously existing End If, and indents my code accordingly. It’s only a slight distraction that some previous If now has a squiggly red underline indicating that it no longer has a matching End If. The solution is to ensure that VB doesn’t “jump to conclusions” on incomplete information. If a block of code contains syntax errors, then it should not be formatted, and it probably shouldn’t even be shown as a syntax error until I do something (Ctrl-S, Ctrl-K D, etc.) to indicate that I think the code is now correct. Perhaps VB could use a squiggly beige underline to tell me that it knows the code is not yet correct, but is currently waiting to see what I’ll type next.

    Customizable

    I’d like some more advanced control over how the code is formatted. In addition to the indentation style and coloring that are currently available, I’d like control over how keywords are capitalized, the ability to disallow language features, and other controls.

    No More Compiling

    When working with Eclipse/Java there is no concept of Compiling per se. Instead, the syntax is highlighted as you type, and certain actions like saving files or running the program will cause the compilation to happen in the background. This seems very much in the spirit of VB, and seems almost there already.

    MultiFile Assemblies

    It would make unit testing much better if VB allowed you to easily create unit tests in the same assembly as the tested code, but in a separate project. This would allow me to expose testing interfaces as Friends. The current workaround is to just include the tests in the same project as the tested code.

    Intellisense shows invalid values

    Often, after typing “xyz<dot><ctrl-space>” I’ll be given a dropdown list of irrelevant information. If VB doesn’t know the actual members of the xyz object, then it shouldn’t provide a list at all.

    Automatic Imports

    One extremely nice feature of Eclipse/Java is that it can automatically figure out the imports. If I type in “Dim sw As StringWriter<ctrl-space>” then VB should prompt me with a list of namespaces that have StringWriter types. If there is only one possible choice then it should not even prompt. Additionally, it should insert the necessary Imports statement at the top of the file.

    A corresponding feature from Eclipse/Java is the Ctrl-Shift-O keystroke which optimizes Imports. This automatically removes Imports that are no longer needed, and attempts to add any that are necessary. VB would require a new syntax for only importing selected types from a namespace.

    Code Snippets

    Several things about this feature really annoy me. Foremost is that the replacement fields stay active until the file is closed. Maybe it wouldn’t bother me as much if they were formatted differently, such as with a solid underline that only appeared when the field was the active one. The ugly green color is enough to prevent me from using this feature. The workaround I’ve settled on is to make the background color white. This makes the code snippet fields invisible unless the cursor is inside the field. Still, I’d like to be able to hit Enter (or maybe Ctrl-Enter) and have the fields finalized as if I’d closed and reopened the file.

    Eliminate My

    I’ve never liked the My feature very much. It seems like it could be replaced with the ability to assign duplicate Imports aliases. This would also allow me to pick a different name than “My”, and better control exactly what is included. For example, I might do the following:

    Imports Foo As System.Text.RegularExpressions 
    Imports Foo As Microsoft.VisualBasic.Devices 
    Imports Foo As Microsoft.VisualBasic.FileIO 

    This would combine all the listed namespaces into a single Foo namespace. Any ambiguities would have to be resolved explicitly as usual. It would also work with classes, allowing Shared methods to be called directly as usual.

    Summary

    Even though the list of things I would change about VB seems longer than my previous post about the things I like about VB, that shouldn’t give you the impression that I dislike the language. On the contrary, I think it’s the best choice available on the .NET platform for most problems. C++/CLI provides more control, and the unique ability to commingle unmanaged and managed code. C# is similar to Java, more portable (Mono on Linux), and has a few features that haven’t yet made it into VB (yield, anonymous methods, unchecked, etc.). However, VB is the easiest to read and type, has the best editor, and is generally easier to work with for many problems.

    January 17

    Why VB is Best

    As I said in my previous post, I believe that Visual Basic is the best language for many types of applications. It provides all the power of C# and Java, but has arguably better tools and syntax. For now, I’ll concentrate on the positive, but in a later post I’ll discuss what I feel are VB’s shortcomings, and my suggestions for addressing them.

    For me, the most valuable VB language features are its Readable English Keywords, Line-Orientation, and Case-Insensitivity. The ability to quickly create GUI applications using drag-and-drop may have been the original selling point, but to me what makes VB great is the language. It’s currently my preferred tool for working on personal projects.

    Readable English Keywords

    For the most part, VB is a very easy language to read, and the keywords chosen come closer to conveying the correct meaning than with other languages.

    Many languages have a long list of special symbols that the user is required to memorize before the language is readable. Often the same symbol will mean completely different things depending on the context. While VB has the commonly used Math symbols, and even a separate concatenation symbol, overall its philosophy is to favor the use of keywords to help the user to understand the code. (As you’ll see later, I even think some of the current symbols should be removed.)

    The best VB keywords are those that help the reader to understand what’s going on without having to be fluent in VB or any other OO language. This helps beginners to learn the concepts, and also helps experienced programmers write more maintainable code.

    Examples

    Inherits, NotInheritable, MustInherit

    The VB syntax for object derivation is very clear and explicit as long as you’re familiar with the concept of inheritance in object oriented languages.

    Class Foo 
      Inherits Bar 
    End Class
    

    Furthermore the NotInheritable keyword is clearly related, and has the obvious meaning.

    NotInheritable Class Bar 
    End Class
    

    This would cause the Foo class above to have a compilation error.

    The MustInherit keyword is also clearly related to the other two.

    MustInherit Class Bar 
    End Class
    

    Even someone new to the language should be able to guess that this prevents the Bar class from being instantiated on its own.

    If you contrast the above with the corresponding concepts in Java (extends, final, and abstract) and C# (‘:’, sealed, abstract), it should be clear that the VB keywords are superior. Although Extends, NotExtendable, and MustExtend would work almost as well, the term “inheritance” for the whole concept is usually used even in Java circles.

    Overrides, Overridable, MustOverride, NotOverridable

    Similarly, the concept of “virtual” is overused in computer science, and therefore a poor choice for describing class methods that can be overridden in derived classes. The VB keywords have a more obvious meaning, and may even make OO polymorphism easier to grasp for beginners.

    The Java equivalents (nothing or the @Override annotation, nothing at all since methods are overridable by default, abstract, and final) and the C# equivalents (override, virtual, abstract, and sealed) are not as clear and don’t work as well together.

    Shared, Static

    VB uses the keyword Shared for methods and fields that apply at the class level. The C#, Java, and C++ keyword static seems a little less clear to me, and in C++ it gets muddled up with the separate concept of local data within a function that lives past a single call of that function. VB (Static) and C++ (static) both support this second concept; C# and Java don’t have static local variables at all.

    Class Foo
      Public Shared Function CreateFoo() As Foo
      End Function
      Private Sub Bar
        Static called As Integer = 0
        called += 1
      End Sub
    End Class
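    Here is a runnable variant of this Foo class. It is changed so that Bar returns its count and CreateFoo returns an instance; both changes are only for illustration. The Static local keeps its value between calls on the same instance, while a fresh instance starts over.

```vb
Option Strict On
Imports System

Class Foo
    ' Shared: called on the class itself, no instance needed.
    Public Shared Function CreateFoo() As Foo
        Return New Foo()
    End Function

    ' The Static local is initialized once per instance and survives between calls.
    Public Function Bar() As Integer
        Static called As Integer = 0
        called += 1
        Return called
    End Function
End Class
```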

    And, Or, Not

    Using these simple keywords instead of the common &&, !, and || is more approachable and actually easier to type, because the latter require shift-key combinations. The C-style symbols are also fairly arbitrary and make C-style languages just a little harder to learn and use.
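    One nuance is worth showing here. VB's And and Or always evaluate both operands, so the exact counterparts of && and || are AndAlso and OrElse. The sketch below counts evaluations through a hypothetical Check helper.

```vb
Option Strict On
Imports System

Module LogicalOperators
    Private calls As Integer

    ' Records that it was evaluated, then returns the given value.
    Private Function Check(ByVal value As Boolean) As Boolean
        calls += 1
        Return value
    End Function

    ' And evaluates both operands even when the first is False.
    Function CountWithAnd() As Integer
        calls = 0
        Dim result As Boolean = Check(False) And Check(True)
        Return calls
    End Function

    ' AndAlso short-circuits: the second operand is skipped.
    Function CountWithAndAlso() As Integer
        calls = 0
        Dim result As Boolean = Check(False) AndAlso Check(True)
        Return calls
    End Function
End Module
```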

    ByRef, ByVal

    As with C#’s ref and out, these do require you to understand the concepts of passing by value versus passing by reference, but I find the VB keywords a little clearer. Allowing the programmer to be explicit about ByVal also enhances readability, because I no longer have to remember which is the default. (It’s ByVal.)

    Do, While, Until, Loop

    I find the VB loop syntax to be flexible enough for any purpose, and much more intuitive than Java, C#, or C++. All the following are allowed.

    Do Until x 
    Loop 
    Do While x 
    Loop 
    Do 
    Loop Until x 
    Do 
    Loop While x
    

    You can also Continue or Exit a loop explicitly.
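    Where the condition goes matters. When the condition is false from the start, Do While ... Loop never runs its body, while Do ... Loop While runs it once before testing. Here is a small sketch with illustrative names.

```vb
Option Strict On
Imports System

Module DoLoops
    ' Pre-test: the condition is checked before every pass, including the first.
    Function PreTestCount(ByVal limit As Integer) As Integer
        Dim count As Integer = 0
        Do While count < limit
            count += 1
        Loop
        Return count
    End Function

    ' Post-test: the body runs once before the condition is checked.
    Function PostTestCount(ByVal limit As Integer) As Integer
        Dim count As Integer = 0
        Do
            count += 1
        Loop While count < limit
        Return count
    End Function
End Module
```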

    For, Each, In

    I think the VB iteration syntax is nearly perfect.

    For Each x In y
    

    is not so different from C#:

    foreach (var x in iterable)
    

    but the required parentheses seem to break up the statement in a weird place.

    Java is worse, because it uses an arbitrary ‘:’ symbol that you have to train yourself to read as “in”.

    for (Object x : iterable)
    
    For, To, Step

    The counted loop is also fairly simple, and I think much easier to read than C#, Java, C++ equivalents.

    For c = 0 To 9 
    For n = 1 To 20 Step 2
    

    as compared to :

    for (c = 0; c < 10; ++c) 
    for (n = 1; n <= 20; n += 2)
    
    Summary

    The above is certainly not an exhaustive list of the well-named features in VB, and I certainly don’t imply that all VB keywords have good names. (Later, I’ll explore in more detail just which keywords I think should be renamed, and I even advocate removing almost 90 keywords.) However, overall I find most of the VB keywords to be more intuitive than alternative languages, especially for those who don’t have preconceived notions of what a concept should be named. (e.g. virtual) This makes the language more approachable for beginners, but also easier for experienced programmers.

    Line Oriented

    Another major feature of VB is its preference for having one statement per line. This is a common practice by many programmers in Java, C#, and C++ as well, but VB has features to encourage the practice.

    No Semi-colon necessary

    In C-based languages you are required to put a semi-colon at the end of each statement, and an end-of-line has no special meaning. This allows a single statement to span multiple lines, and multiple statements to reside on a single line. I find that both of these practices tend to make code difficult to read. Tools such as Eclipse/Java have built-in code formatters to ensure that each statement resides on a single line, and to ensure that statements that span multiple lines have a consistent format.

    In VB, each statement ends at the end of the line unless the line ends with an underscore ‘_’ (preceded by a space), which lets a single statement span multiple lines. VB also allows a colon ‘:’ between statements to put multiple statements on one line. As you’ll see later, I’d actually like these exceptions removed to force the programmer to find a more readable line-oriented style.

    Assignment Side Effects

    In VB, assignment is a statement, not an expression, and has no return value.

    In C++ terms, the VB assignment operator behaves as if it were declared with a void return type:

    void operator=(const T& rhs);  // member of T; returns nothing

    If I have an example Java program like this:

    if (x == 1000) { 
      x = y = z = 0; 
    }
    

    Then in VB, I must write it as:

    If x = 1000 Then 
      x = 0 
      y = 0 
      z = 0 
    End If
    

    Alternatively, I could use the colon:

    If x = 1000 Then 
      x = 0: y = 0: z = 0 
    End If
    

    One other side-effect is that VB is able to disambiguate assignment from equality checking. The common problem of typing “if (x = y)” instead of “if (x == y)” in C-based languages is eliminated, because “If x = y Then” always refers to equality checking and “x = y” always refers to assignment. (Later, I discuss why I think VB should automatically display a different operator for assignment.)
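    Here is a small sketch of that disambiguation in current VB, with illustrative names. In the statement below, the first ‘=’ is read as assignment and the second as comparison.

```vb
Option Strict On
Imports System

Module AssignVsCompare
    Function AreEqual(ByVal x As Integer, ByVal y As Integer) As Boolean
        Dim same As Boolean
        same = x = y    ' assigns the Boolean result of comparing x and y
        Return same
    End Function
End Module
```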

    Syntax Validation

    When you press the Enter key, VB knows that the statement is likely complete, and usually does syntax validation as well as possible automatic syntax cleanup. For example, if you type the first line of a multi-line statement such as “If x Then”, VB will automatically insert a closing “End If”, and put the insertion point at the correct spot, formatting your code as necessary. In practice this makes it feel as if VB is really aware of what the programmer is trying to do, and the IDE is an active partner in code construction. Other tools have some of these features, but I’ve never seen another tool be as unobtrusive and natural about it as VB, although with VB.NET some things aren’t quite as smooth as in the past, which I’ll discuss later.

    Case Insensitive

    It’s pretty well known that VB is a case-insensitive language, but some may not realize just how much of a benefit this is to the programmer.

    Faster Typing

    Despite its propensity for fairly verbose keywords, I’ve always found that actually typing in VB programs is often faster than with other languages. This is partly because you rarely need to use two-key combinations. The IDE will automatically capitalize symbols to match the declaration. My educated guess is that VB will be 10-20% faster to type than an equivalent language like C# or Java. The key is to allow the IDE to share the load, and when switching to VB from another language I usually need some time to train myself not to capitalize symbols or add unnecessary parentheses, and to use caps-lock effectively.

    Here’s an example. Type in:

    mustinherit class Foo<Enter>public mustoverride function Count as integer<Enter><Delete Line><End><Enter>class Bar<Enter>inherits foo<Enter><Down><Down>return 0<Down>

    And what you will see is:

    MustInherit Class Foo 
      Public MustOverride Function Count() As Integer 
    End Class 
    
    Class Bar 
      Inherits Foo 
      Public Overrides Function Count() As Integer 
        Return 0 
      End Function 
    End Class
    

    Only 3 two-key combinations are required for the above in VB. (The initial declarations of Foo, Count, and Bar.) For comparison, the same code requires at least 16 two-key combinations in C#. Furthermore, the VB code doesn’t require cleanup for formatting and indentation, making it even faster to write equivalent code.

    Better Naming

    By disallowing multiple symbols with names differing only by case, VB forces you to pick less ambiguous names. This leads to better readability than common code in other languages. For example, I’ve often seen code like the following:

    Foo foo = new Foo();

    If you read this aloud then both “foo” symbols sound exactly alike (which is the definition of less-readable).

    To make the best use of VB, you should also pick a naming convention that works well. For example, by always prepending or appending an underscore to private fields, you are increasing the number of two-key combinations required. A better convention is to prepend “my” or “m” to private fields, and to use camel-case for all other names.

    Other

    Concatenation Operator

    Using the same operator for addition and concatenation leads to less readable code. By having a separate concatenation operator, the following code can compile.

    Dim s As String = 123 & "456"
    

    Note: With Option Strict On, using the + operator instead would be a compile error.

    Optional Duck Typing

    VB supports optional relaxing of type rules. When “Option Strict Off” is specified for a file, then you no longer have to explicitly specify the type when declaring variables. VB will also automatically convert types. For example:

    Console.WriteLine(123 + "456") 
    

    will print “579”, not “123456”.

    VB will also allow access to any public member of a type as long as it can be found at runtime. This is what’s known in Python as “Duck Typing”, because "If it walks like a duck and quacks like a duck, it must be a duck". VB is the only language I know which allows selectively enabling this feature.
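    Here is a minimal sketch of both behaviors in a file compiled with Option Strict Off. Duck and Robot are hypothetical classes that share a Quack method but no interface, so the call is resolved at run time (late binding).

```vb
Option Strict Off
Imports System

' Two unrelated hypothetical classes with a method of the same name.
Public Class Duck
    Public Function Quack() As String
        Return "Quack"
    End Function
End Class

Public Class Robot
    Public Function Quack() As String
        Return "Beep-quack"
    End Function
End Class

Public Module DuckTyping
    ' Late-bound call: any object with a Quack method is accepted.
    Public Function MakeItQuack(ByVal thing As Object) As String
        Return thing.Quack()
    End Function

    ' Implicit conversion: "456" is converted to a number before adding.
    Public Function AddMixed() As Double
        Return 123 + "456"
    End Function
End Module
```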

    Optional Explicit Declaration

    Whether to require variable declarations is a completely separate question from type safety. Even when duck typing is desired, it’s often nice to enforce variable declaration with Option Explicit On. This eliminates a common class of typos. Like Option Strict, it can be selectively disabled for a file (Option Explicit Off).

    Declarative Events

    VB supports the same events and delegates as C#, but it also lets you attach an event handler declaratively, with a Handles clause in the method’s signature.

    Class Foo 
      WithEvents myCon As Bar.Connection 
      
      Sub OnConnected() Handles myCon.ConnectedEvent 
      End Sub 
    End Class
    

    In this example, the OnConnected method is automatically “wired-up” to the myCon Connection object. This can often be more readable than the alternatives.

    Reference Parameters

    Occasionally it’s useful to be able to pass arguments by reference. This is supported by most (all?) .NET languages.

    Sub ChangeTwoInts(ByRef n As Int32, ByRef m As Int32) 
      n += 1 
      m += 2 
    End Sub
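For comparison, C++ expresses ByRef with reference parameters (&). A minimal sketch of the same method:

```cpp
#include <cassert>

// C++ equivalent of the VB ByRef example: the parameters are references,
// so the caller's variables are modified in place.
void ChangeTwoInts(int& n, int& m) {
    n += 1;
    m += 2;
}
```

Calling ChangeTwoInts(a, b) with a = 10 and b = 20 leaves a = 11 and b = 22.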
    
    Named Parameters

    VB allows specifying explicit names for arguments. This is useful, for example, when calling a function that takes a Boolean.

    Public Sub DoSomething(ByVal isFinished As Boolean) 
    End Sub 
    b.DoSomething(isFinished:=True)
    

    This makes the code much more readable, and eliminates the need for a comment or temporary variable to clarify the code.
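C++ has no named arguments. A common way to get a similarly readable call site there is to replace the Boolean with a dedicated enum type. In the sketch below, Finished and this version of DoSomething are hypothetical:

```cpp
#include <cassert>

// A dedicated enum names the value at the call site, which a bare bool
// parameter cannot do.
enum class Finished { No, Yes };

// Hypothetical function mirroring the VB DoSomething example; it returns
// whether it was told the work is finished, just so it does something.
bool DoSomething(Finished isFinished) {
    return isFinished == Finished::Yes;
}

// Call site: DoSomething(Finished::Yes) reads about as clearly as
// b.DoSomething(isFinished:=True) does in VB.
```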

    Optional Parameters

    VB also allows optional parameters, although using them in public methods is discouraged because it makes the code less portable. For example, C# doesn’t support optional parameters, so C# callers must pass every argument explicitly. Still, optional parameters can eliminate the code duplication that method overloading would require, and there’s no reason to avoid them for internal functionality or when portability isn’t important.
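C++ default arguments behave much like VB optional parameters. In both languages the compiler substitutes the default value at each call site. Here is a sketch with a hypothetical Greet function that would otherwise need two overloads:

```cpp
#include <cassert>
#include <string>

// One function with a default argument instead of two overloads.
// The default value is filled in by the compiler at each call site.
std::string Greet(const std::string& name, const std::string& greeting = "Hello") {
    return greeting + ", " + name;
}
```

The default is compiled into each caller, so changing it later has no effect on callers that aren't rebuilt. This is one more reason to be careful with optional parameters on public methods.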

    Parameter Arrays

    VB supports creating methods that take 0 or more optional arguments of a single type. For example:

    Public Sub Foo(ByVal ParamArray args() As String)
    

    This function accepts Foo(), Foo("a"), Foo("a", "b"), etc.
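The closest C++ analog to a ParamArray is a std::initializer_list parameter, although callers must wrap the arguments in braces. In this sketch, Foo returns the total length of its arguments just so that it does something observable:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Rough C++ counterpart of "ByVal ParamArray args() As String": the caller
// passes zero or more strings inside braces.
std::size_t Foo(std::initializer_list<std::string> args) {
    std::size_t totalLength = 0;
    for (const std::string& s : args) {
        totalLength += s.size();
    }
    return totalLength;
}
```

Foo({}), Foo({"a"}), and Foo({"a", "b"}) all work. A variadic template would remove the braces, at the cost of more complex code.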

    Powerful Select

    The VB Select statement is similar to the switch statement provided in C-like languages, except that it’s more flexible.

    Here are some examples of its more advanced features. Only the first matching Case runs, and there is no fall-through from one Case to the next.

    Select Case age 
    Case 0, 5, 10 
      Console.WriteLine("0,5,10") 
    Case Is > 50 
      Console.WriteLine(">50") 
    Case Is < 10 
      Console.WriteLine("<10") 
    Case 16 To 19, Is > 30 
      Console.WriteLine("16-19 or > 30") 
    Case Else 
      Console.WriteLine("else") 
    End Select 
    
    Dim name As String = "Justin" 
    Select Case name 
    Case "Abe" To "Barney" 
      Console.WriteLine("a-b") 
    Case "Boris" To "Kevin" 
      Console.WriteLine("b-k") 
    Case "Kevin" To "Vincent" 
      Console.WriteLine("k-v") 
    Case Else 
      Console.WriteLine("else") 
    End Select
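In C++, each Select Case becomes an if chain that is evaluated from top to bottom. The sketch below mirrors both examples (the function names are made up) and makes the first-match rule visible. In the age example, Is > 30 can only ever match 31 through 50, because anything over 50 is caught by the earlier case. The string version assumes VB's default Option Compare Binary, which compares character codes the same way std::string does for ASCII text.

```cpp
#include <cassert>
#include <string>

// The first Select Case example. Branch order matters: only the first
// matching branch runs, just as only the first matching Case runs in VB.
std::string ClassifyAge(int age) {
    if (age == 0 || age == 5 || age == 10)    return "0,5,10";
    if (age > 50)                             return ">50";
    if (age < 10)                             return "<10";
    if ((age >= 16 && age <= 19) || age > 30) return "16-19 or > 30";
    return "else";
}

// The string example: "A To B" is an inclusive range under ordinary
// (binary) string comparison.
std::string ClassifyName(const std::string& name) {
    if (name >= "Abe" && name <= "Barney")    return "a-b";
    if (name >= "Boris" && name <= "Kevin")   return "b-k";
    if (name >= "Kevin" && name <= "Vincent") return "k-v";
    return "else";
}
```

Under these rules, "Justin" lands in "b-k". "Kevin" falls inside two ranges, so the first one wins.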
    
    Full Featured Exception Catching

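    The .NET Framework supports an optional filter on catch blocks. VB exposes it through the When keyword, so you can catch an exception only when some condition is true. If the condition is False, the exception keeps propagating as though that Catch block weren’t there. For example: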

    Try 
      ' do something 
    Catch ex As Exception When LoggingIsEnabled 
      Log(ex) 
    End Try
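C++ has no exception filters. The nearest approximation is to catch the exception, test the condition, and rethrow with a bare throw; if the condition fails. In this sketch, loggingIsEnabled and Log are hypothetical stand-ins for the names in the VB example:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for LoggingIsEnabled and Log.
bool loggingIsEnabled = true;
std::vector<std::string> logLines;
void Log(const std::exception& ex) { logLines.push_back(ex.what()); }

// Catch, test, and rethrow. Unlike a .NET filter, the condition is only
// evaluated after the stack has already unwound to this handler, so this
// is an approximation rather than an exact equivalent.
void DoSomething() {
    try {
        throw std::runtime_error("disk full");   // stands in for "do something"
    } catch (const std::exception& ex) {
        if (!loggingIsEnabled) {
            throw;   // rethrow the same exception object unchanged
        }
        Log(ex);
    }
}
```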
    
    Refactoring

    VB has some basic refactoring support built in, and it also comes with a license for Refactor!, which provides even more. I’m generally not a big fan of refactoring tools, but the VB ones are intuitive and unobtrusive. Renaming is the refactoring I consider most important and the one I use most.

    Powerful Imports

    VB’s Imports statement can import more than just namespaces. For example, if I include “Imports System.Console” at the top of a file, I can call WriteLine() directly without qualifying it with the Console class.

    Summary

    Despite the length of the above list, I’m sure I’ve forgotten some features, but maybe it will entice you to download the free Express version and try it yourself. Overall I think that VB is the best language for most programming, and it should get even better in the future. The next version will likely introduce Type Inference, Lambda Expressions, Closures, LINQ, and other enhancements. In my next post I’ll discuss a large list of new features and changes that I personally would like to see in a future version of VB.

    January 16

    Visual Basic

    I’ve been wanting to write some posts about Visual Basic for a long time; it’s one of the reasons I finally started blogging. For the last 8 years I haven’t been able to use VB for much beyond personal projects, but I used it extensively before that. More recently, I’ve been using VB.NET for various personal projects, and I thought I’d share some of my thoughts and experiences.

    It seems that VB has always been misunderstood. Versions 6 and earlier have an undeserved reputation as a “toy” language, and to some extent this reputation has carried over to VB.NET, even though VB.NET is now nearly equivalent to Java or C# in capability. I feel VB was also misunderstood as a tool suitable only for database front-end applications. I actually never felt it was a very good tool for writing these types of programs. MS Access (which I suppose can be considered a VB dialect) was a far better choice because of its better support for bound controls and its built-in reporting tools. For years VB was the most logical choice for many types of applications; my problem was that it wasn’t very suitable for the kinds of programming I prefer.

    With the advent of VB for .NET, I think VB may actually be the best choice for writing many applications that have been traditionally developed in C++. The Java community has proved that it’s at least possible to write Databases, Compilers, Development Tools, and other systems programs using a similar platform, and from a purely technical perspective, I think Visual Basic is a better choice than C# or Java for many of these kinds of applications. In coming posts, I’ll detail my reasons for believing so as well as a pretty exhaustive list of what I think could/should be done to make VB even better in the future.

    In my next post, I’ll discuss what I think are the best features of VB.

    December 12

    Lambda Expressions In Visual Basic

    Last week Paul Vick posted a request for feedback on the syntax for lambda expressions, a feature that will likely be included in a future version of Visual Basic.

    http://www.panopticoncentral.net/archive/2006/12/08/18587.aspx#FeedBack

    I originally posted my opinion in a comment on his blog, but it apparently didn't make it past the censors. He proposed 3 possible syntax variations, but I'm not fond of any of them. My proposal is a fourth style, and I think it has a lot of merit.

    What's a Lambda Expression?

    This is probably better explained elsewhere, but basically it is an anonymous function: a function defined inline, where it's used, without being given a name.

    For example, given the following code, ...

    Class Test
      Private myNames As List(Of Name)
      ...
      Public Sub PrintLegalNames()
        For Each n As Name In myNames
          Console.WriteLine(n.ToString())
        Next
      End Sub
    End Class
    

    we could replace the explicit loop using List.ForEach.

    Public Sub PrintLegalNames()
      myNames.ForEach(AddressOf WriteName)
    End Sub

    However, for this to work, we have to provide a suitable WriteName function.

    Public Sub WriteName(ByVal n As Name) 
      Console.WriteLine(n.ToString()) 
    End Sub
    

    My proposal is to support the following syntax.

    Public Sub PrintLegalNames()
      myNames.ForEach(Console.WriteLine(ByVal(0).ToString()))
    End Sub

    The lambda expression is just a normal expression that uses parameters and return statements inline. Here are some more examples that show multiple parameters, functions, and ByRef parameters compared to roughly equivalent code.

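    Dim ary(0 To 9) As Integer
    
    Array.Find(ary, Return ByVal(0) = 2) 
    Function Find2(ByVal x As Integer) As Boolean
      Return x = 2
    End Function
    Array.Find(ary, AddressOf Find2)
    
    
    Array.ForEach(ary, ByRef(0) += 1) 
    Sub AddOne(ByRef x As Integer)
      x += 1
    End Sub
    ' Array.ForEach hands each element to an Action(Of T) ByVal, so it
    ' can't update the array; an explicit loop passes each element ByRef.
    For i As Integer = 0 To ary.Length - 1
      AddOne(ary(i))
    Next
    
    
    Array.Sort(ary, Return ByVal(0).CompareTo(ByVal(1)))
    ' Array.Sort expects a Comparison(Of T), which returns an Integer
    ' (negative, zero, or positive) rather than a Boolean.
    Function CompareInts(ByVal lhs As Integer, ByVal rhs As Integer) As Integer
      Return lhs.CompareTo(rhs)
    End Function
    Array.Sort(ary, AddressOf CompareInts)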
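For comparison, C++11 lambdas cover all three cases with roughly the brevity the proposal aims for. This sketch in standard C++ applies the operations to a std::array one after another (Demo is a made-up name):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

std::array<int, 10> Demo() {
    std::array<int, 10> ary{};
    std::iota(ary.begin(), ary.end(), 0);   // 0, 1, ..., 9

    // Like Array.Find(ary, Return ByVal(0) = 2)
    auto it = std::find_if(ary.begin(), ary.end(), [](int x) { return x == 2; });
    assert(it != ary.end() && *it == 2);

    // Like ByRef(0) += 1: the lambda takes int&, so each element is
    // updated in place.
    std::for_each(ary.begin(), ary.end(), [](int& x) { x += 1; });

    // std::sort takes a Boolean "comes before" predicate; this one sorts
    // in descending order.
    std::sort(ary.begin(), ary.end(), [](int lhs, int rhs) { return lhs > rhs; });
    return ary;
}
```

Note that std::sort wants a Boolean ordering predicate, whereas .NET's Array.Sort takes a Comparison(Of T) delegate that returns an Integer.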
    
    

    What about closures?

    Paul's post didn't mention them, but I think closures are also a necessary feature, so that the following code will work.

    Sub SendPartyInvitation(ByVal state As State)
      ...
      Dim legal As Integer = state.LegalDrinkingAge
      Dim names As List(Of Name) = CreateList()
      ...
      Dim legalNames As List(Of Name) = names.FindAll(Return ByVal(0).Age >= legal)
      ...
    End Sub

    The key feature is that the "legal" local variable is available inside the lambda expression. That would not be the case if each expression were compiled into a standalone function like Find2 or AddOne above.
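C++11 lambdas support exactly this kind of capture. Below is a sketch of the FindAll example. The Name struct and the FindLegal function are hypothetical stand-ins for the post's code:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the post's Name type.
struct Name {
    std::string text;
    int age;
};

// The lambda captures the local variable "legal" ([legal]), which a
// standalone function passed by address could not see.
std::vector<Name> FindLegal(const std::vector<Name>& names, int legalDrinkingAge) {
    int legal = legalDrinkingAge;
    std::vector<Name> result;
    std::copy_if(names.begin(), names.end(), std::back_inserter(result),
                 [legal](const Name& n) { return n.age >= legal; });
    return result;
}
```

[legal] copies the variable into the lambda. [&legal] would capture it by reference instead.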

    November 18

    Creating Text Files - Downloads

    The code (create_text_file.zip) can now be found in my SkyDrive public folder.
     
    To build the C++ samples, you'll need to create project files using MPC. The C++/CLI and VB.NET projects use Visual Studio 2005 (the free Express edition will probably work). The Java projects use Eclipse and Java 1.5.
     
    MPC is a program that a coworker and I designed to make working with C++ projects much easier. Most of the C++ code that we write needs to run on a wide variety of platforms, and supporting all the different build tools can be a pain. Rather than including project files or makefiles for VC++, nmake, bmake, gmake, etc, we just create simple text files that can be used to generate any of the above.
     
    How simple?
     
    Here's the complete set of MPC files for all seven C++ projects:
     
    create_text_file/create_text_file.mpb -- This is a Base project that is used to set defaults.
    project {
      after += timer
      libs += timer
      libpaths += ../timer
      includes += ../timer
    }
    This basically says that all projects that derive from this base are dependent on a timer library found at ../timer.
     
    create_text_file/create_text_file.mwc -- This is a workspace file used to group related projects
    workspace {
      cmdline += -static
      implicit = create_text_file
      exclude {
        FileStreamDotNet
      }
    }
    This says to automatically create a project in any suitable subdirectories containing either C++ source code or .mpc files. Each implicitly created project will automatically inherit from the .mpb above. We also exclude one directory, because it contains a C++/CLI .NET project.
     
    create_text_file/timer/timer.mpc -- This is an explicit project file
    project {
      staticname = timer
    }
     
    This file causes a project to be created with the name timer. There are staticname and sharedname keywords, allowing you to specify different names depending on whether you generate a project to create a dynamic or static library.  If only one name is specified then the other defaults to the same name.
    Actually, the timer.mpc file is totally unnecessary, because it merely specifies the same behavior as the default.
     
    I also explicitly specify that a static library will be created by including a cmdline += -static in the .mwc.
     
     
    You can read more about MPC and download it here.
    To run MPC, you'll also need Perl.
     
    To create the Visual Studio 2005 version of these projects unzip create_text_file.zip to a directory, and run mwc.pl like so:
     
    c:\create_text_file>mwc.pl -type vc8
     
    This should create a .SLN file which can then be used to build and run the projects.
     
     
    November 17

    Creating Text Files - Coming Soon...

    I forgot to include the source code in my previous posts, but now I can't figure out an easy way to do it. Why would my blogging server allow posting pictures, but not other files? It looks like I'm going to have to find some other server to hold the files. What a hassle.

    November 16

    Creating Text Files - Conclusion

    SCSI Blues

    On the one SCSI machine, the write speed was 2-5MB/s whenever WriteThrough was enabled, regardless of the implementation or other configuration options. This probably just means the drive honors the WriteThrough setting more thoroughly. However, the SCSI drive should have been able to achieve something close to its specified 125MB/s transfer rate.

     

    To try to improve performance on the SCSI machine, I installed Windows Server 2003 R2. This had no benefit until I turned on the disk driver's new “Enable advanced performance” option, which helped quite a bit: the slowest tests were now in the 9MB/s range. That is still much slower than the fastest system (#1).

    results (win32 2000MB)

    Write Through + Disable Caching + Defrag = 9MB/s @ <1% CPU

    As above with 16MB buffer = 53MB/s @ <1% CPU

    Write Through + Defrag + 16MB buffer = 58MB/s @ 28% CPU

    Defrag = 55MB/s @ 20% CPU

    Defrag + 16MB buffer = 47MB/s @ 21% CPU

    compared to system #1

    Write Through + Disable Caching + Defrag = 55MB/s @ <2% CPU

    As above with 1MB buffer = 68MB/s @ <1% CPU (>1MB made no difference)

    Write Through + Defrag + 1MB buffer = 68MB/s @ 8% CPU

    Defrag = 90MB/s @ 10% CPU

    Defrag + 1MB buffer = 56MB/s @ 6% CPU

    A Grain of Salt

    The performance measurements above paint a picture that can be a little misleading. For one thing, they don’t show the disk thrashing I could hear (admittedly a subjective measure) when the Defrag option was off. They also don’t show how repeatable the results were from run to run. The numbers represent only the best result I was able to achieve with the given options. Some tests were very consistent, while others varied widely; the .NET results, for instance, seemed to vary more than the others.

    I also found that the Defrag option made a bigger difference on my older home machine (#2) than on my work machine (#1). I suspect this was due to improvements in the newer version of the hard disk.

    Before putting too much stock in these measurements, you should run them using your own platforms and compilers.

    Conclusions

    • Measure performance using a simple application that has similar disk access patterns to your real application. (Or a subset of your application.)
    • Use the mechanisms above to help ensure that files are written defragmented.
    • If you have to use std::ofstream, performance may be greatly improved by providing a larger buffer, using ios::binary, avoiding excessive calls to tellp(), writing string::c_str() instead of string, and using ostream::write() instead of operator<<().
    • If you must use std::ostream but need more powerful features, write a custom std::streambuf implementation similar to std::filebuf. Examples of such features are writing encrypted or compressed files, automatically deleting a file when it’s closed, and specifying caching hints to the OS. A custom streambuf is also worth writing if you just want better performance.
    • For the ultimate performance consider bypassing the std::ostream facility entirely.
    • SCSI drive systems behave differently, and must be tested separately. When using SCSI, you can get the best performance by using a very large buffer, and specifying WriteThrough and DisableCaching.
    • Let your ears guide you. A noisy disk drive may indicate fundamental problems with your design.
    • .NET provided pretty good performance and features for very little effort, although some features, such as Disable Caching, are inexplicably missing.
    • Java was neither very fast nor very feature-rich, even with NIO, and the NIO version was also very difficult to write.
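The following minimal C++ sketch applies several of the ofstream tips above: a larger buffer installed with pubsetbuf(), binary mode, and write() instead of operator<<. WriteLines and the 1MB buffer size are illustrative assumptions, not the code used for these tests. The effect of pubsetbuf() on a file stream is implementation-defined. Some implementations (libstdc++, for one) honor it only before the file is opened, so the sketch installs the buffer before open().

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes lineCount copies of line to path and returns
// the number of bytes written, or 0 on failure.
std::size_t WriteLines(const std::string& path, const std::string& line, std::size_t lineCount) {
    // Assumed 1MB buffer; declared before the stream so it outlives it.
    std::vector<char> buffer(1 << 20);
    std::ofstream out;
    // Install the larger buffer before open().
    out.rdbuf()->pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // Binary mode avoids newline translation on Windows.
    out.open(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) {
        return 0;
    }
    for (std::size_t i = 0; i < lineCount; ++i) {
        // write() copies raw bytes, skipping operator<< formatting overhead.
        out.write(line.data(), static_cast<std::streamsize>(line.size()));
    }
    out.close();   // flush while the buffer is still alive
    return out ? line.size() * lineCount : 0;
}
```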

    System 1 Results for 100MB

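    (Defrag, WriteThrough, NoCache: 1 = option enabled, 0 = disabled)

    Test                   MB/s  CPU%  Defrag  WriteThrough  NoCache
    ofstream_unopt           11   100     0         0           0
    ofstream                 40    50     1         0           0
    ofstream                 80   100     0         0           0
    fast_ofstream            30    42     0         1           0
    fast_ofstream            40    50     1         1           0
    fast_ofstream            43    65     1         0           1
    fast_ofstream            75   100     0         0           0
    fast_ofstream            75   100     1         0           0
    Custom FileStream       255   100     0         0           0
    Custom FileStream       267    96     1         0           0
    Custom FileStream        55    12     0         1           0
    Custom FileStream        68    19     1         1           0
    Custom FileStream        53    50     0         0           1
    Custom FileStream        77    36     1         0           1
    Custom FileStream        71    37     1         1           1
    VB 2005                 206   100     0         0           0
    VB 2005                 213   100     1         0           0
    VB 2005                  59    40     0         1           0
    VB 2005                  63    38     1         1           0
    C++/CLI                 228   100     0         0           0
    C++/CLI                 237   100     1         0           0
    C++/CLI                  53    32     0         1           0
    C++/CLI                  66    38     1         1           0
    Java FileOutputStream    26   100     0         0           0
    Java FileOutputStream    20   100     1         0           0
    Java NIO                 22   100     0         0           0
    Java NIO                 24   100     1         0           0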

     


    Creating Text Files - Other Languages and Platforms

    Some of the C++ programs above (fast_ofstream in particular) were fairly difficult to implement, and I wanted to see what could be achieved using a more rapid development tool. So far I’ve implemented the test in Visual Basic 2005, and Java (2 ways). Adam Mitz (a coworker) also contributed a C++/CLI implementation for comparison.

    I also tested many of the programs on Macintosh as well as Linux, and these platforms achieved similar benefits.

    In the future I’d like to try the tests under Mono on Linux.

    I ran tests on multiple systems.

    1. WinXP Core 2 Duo 6700 w/ 10Krpm drive and 2GB RAM
    2. WinXP Athlon64 3400 w/ 10Krpm drive and 1GB RAM
    3. WinXP Dual AthlonMP 2800 w/ 15Krpm SCSI drive and 1.5GB RAM
    4. Linux Athlon64 3500 w/ 10Krpm drive and 1GB RAM
    5. OSX Mac Mini Core Duo
    6. WinXP Mac Mini Core Duo

    The results quoted throughout this article are those from the first configuration above, and this system unsurprisingly also gave the best performance.

    All of these systems gave very similar relative results with the exception of #3, which seemed due primarily to the use of a SCSI disk drive.

    Visual Basic 2005

    I chose to implement the test in VB because I find that the language and IDE let me write code more quickly than the other .NET alternatives do. I think the VB language has a lot of problems (such as some of the keyword names), but I find that case insensitivity, line orientation, English keywords, and other features make it much easier to work with than the C-like alternatives.

    In the end it only took a few hours to create the VB version of the program, and it was actually this program that led me to create the fastest C++ programs. I was pretty satisfied with C++ fast_ofstream when it was running at 40MB/s, but the VB program was much faster with much less development effort. This led me to develop the C++ FileStream program which then led to further improvements in fast_ofstream.

    No Options = 206MB/s @ 100% CPU

    Defrag Only = 213MB/s @ 100% CPU

    Write Through = 59MB/s @ 40% CPU

    Write Through + Defrag = 63MB/s @ 38% CPU

    One problem with .NET is that the System.IO.FileStream class doesn’t support disabling the system cache, so I was unable to measure that option.

    Another problem is that all .NET Strings are Unicode, so there is some small overhead outside the main loop to encode them as ASCII. This may slightly affect this test. On the other hand, it would be trivial to change the VB program to write out UTF8, UTF16, UTF32, and other formats, and I wouldn’t even want to try that with the C++ version.

    For what it’s worth the VB program, despite the apparent verbosity of the language, was actually shorter than most of the C++ versions.

    Sub WriteSampleFile()
      Dim pt As New PerfTimer
      Dim fname As String = OUT_DIR & FILESIZE_MB & "MB.txt"
      Dim fs As FileStream = CreateFile(fname)
      Dim numLines As Long
      Dim totalBytes As Int64 = FILESIZE_MB * 1024L * 1024L
      Using fs
        Console.WriteLine("Creating {0}MB file.", FILESIZE_MB)
        If DEFRAGMENTED Then
          Dim reserve As Long = ((totalBytes \ BUFSIZE) + 1) * BUFSIZE
          fs.SetLength(reserve)
        End If
        Dim enc As Text.Encoding = Text.Encoding.ASCII
        Dim msg1() As Byte = enc.GetBytes("All work and no play makes ")
        Dim msg2() As Byte = enc.GetBytes(" a dull boy." & Environment.NewLine)
        Dim tabs() As Byte = {9, 9, 9, 9, 9, 9, 9, 9}
        Dim names As List(Of Byte()) = CreateNames()
        Do
          numLines += 1
          fs.Write(tabs, 0, CInt(numLines Mod tabs.Length))
          fs.Write(msg1, 0, msg1.Length)
          Dim name() As Byte = names.Item(CInt(numLines Mod names.Count))
          fs.Write(name, 0, name.Length)
          fs.Write(msg2, 0, msg2.Length)
        Loop Until fs.Position >= totalBytes
        fs.SetLength(totalBytes)
      End Using
      pt.PrintElapsed("Done. ")
      Dim tp As Double = CalcThroughput(FILESIZE_MB, pt.ElapsedWall)
      Console.WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp)
    End Sub
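Standard C++ (C++17) can approximate the same reserve-then-truncate trick with std::filesystem::resize_file. In the sketch below, WriteReserved is a hypothetical helper. Whether the extra allocation actually produces a contiguous file is up to the operating system and file system; the sketch only mirrors what the SetLength calls do in the VB code.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Mirrors the DEFRAGMENTED option: extend the file to the next multiple of
// bufSize above the final size, write the data, then trim to the exact size.
void WriteReserved(const fs::path& path, const std::string& data, std::uintmax_t bufSize) {
    std::uintmax_t totalBytes = data.size();
    std::uintmax_t reserve = ((totalBytes / bufSize) + 1) * bufSize;

    { std::ofstream create(path, std::ios::binary | std::ios::trunc); }  // create an empty file
    fs::resize_file(path, reserve);                                      // like fs.SetLength(reserve)

    {
        // in|out opens the existing file without truncating it.
        std::fstream out(path, std::ios::in | std::ios::out | std::ios::binary);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }
    fs::resize_file(path, totalBytes);                                   // like fs.SetLength(totalBytes)
}
```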
    

    C++/CLI

    I want to thank my coworker Adam Mitz for contributing a C++/CLI version of the program. This version was a little faster, but .NET performance varies quite a bit from run to run, and overall the VB and C++/CLI versions perform about the same.

    No Options = 228MB/s @ 100% CPU

    Defrag Only = 237MB/s @ 100% CPU

    Write Through = 53MB/s @ 32% CPU

    Write Through + Defrag = 66MB/s @ 38% CPU

    int main(array<String^>^)
    {
      using IO::FileStream;
      PerfTimer pt;
      String^ fname = gcnew String(OUT_DIR);
      fname += FILESIZE_MB;
      fname += "MB.txt";
      long long totalBytes = FILESIZE_MB * 1024LL * 1024LL; // 64-bit math avoids int overflow for large files
      Console::WriteLine("Creating {0}MB file.", FILESIZE_MB);
      FileStream^ fs = CreateFile(fname);
      if(DEFRAGMENTED)
      {
        long long reserve = ((totalBytes / BUFSIZE) + 1) * BUFSIZE; // next multiple of BUFSIZE, as in the VB version
        fs->SetLength(reserve);
      }
      array<unsigned char>^ msg1 = Text::Encoding::ASCII->GetBytes("All work and no play makes ");
      array<unsigned char>^ msg2 = Text::Encoding::ASCII->GetBytes(" a dull boy.\n");
      array<unsigned char>^ tabs = {9, 9, 9, 9, 9, 9, 9, 9};
      array<array<unsigned char>^>^ names =
      {
        Text::Encoding::ASCII->GetBytes("Jack"),
        Text::Encoding::ASCII->GetBytes("Justin Michel"),
        Text::Encoding::ASCII->GetBytes("Fred Flintstone"),
        Text::Encoding::ASCII->GetBytes("Barney Rubble"),
        Text::Encoding::ASCII->GetBytes("Homer J. Simpson"),
        Text::Encoding::ASCII->GetBytes("John Jacob Jingleheimer Schmidt")
      };
      long numLines(0);
      do
      {
        ++numLines;
        fs->Write(tabs, 0, numLines % tabs->Length);
        fs->Write(msg1, 0, msg1->Length);
        array<unsigned char>^ name = names[numLines % names->Length];
        fs->Write(name, 0, name->Length);
        fs->Write(msg2, 0, msg2->Length);
      }
      while(fs->Position < totalBytes);
      fs->SetLength(totalBytes);
      pt.PrintElapsed("Done. ");
      double tp = CalcThroughput(FILESIZE_MB, pt.ElapsedWall);
      Console::WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp);
      return 0;
    }
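Neither listing shows the PerfTimer class or the CalcThroughput function. The sketch below shows what they might look like in standard C++. It assumes CalcThroughput simply divides megabytes by elapsed wall-clock seconds, and ElapsedWall is a method here rather than a property.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for PerfTimer: measures wall-clock time since
// construction using a monotonic clock.
class PerfTimer {
public:
    PerfTimer() : start_(std::chrono::steady_clock::now()) {}

    // Elapsed wall-clock time in seconds.
    double ElapsedWall() const {
        std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start_;
        return elapsed.count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};

// Assumed behavior: megabytes written divided by seconds taken.
double CalcThroughput(double megabytes, double seconds) {
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}
```

For example, writing 2000MB in 20 seconds gives CalcThroughput(2000, 20.0), which is 100MB/s.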
    

    Java NIO

    I first implemented the Java version using NIO rather than the much simpler FileOutputStream, because the latter didn’t appear to support the necessary features to create defragmented files, and I thought that NIO should give the best potential performance. Later, I rewrote the program using FileOutputStream, because the complexity of using NIO didn’t seem to have any perceivable benefit.

    The first thing I noticed is that this implementation of the test program was far harder to write than any of the others. This is despite the fact that Java is the language I have the most recent experience with, and I had even implemented a project using NIO within the last six months. The biggest problems involved the complications of dealing with ByteBuffer and String encoding. Compare the code needed to write an ASCII-encoded String in VB.NET.

    Dim enc As Text.Encoding = Text.Encoding.ASCII
    Dim buf() As Byte = enc.GetBytes("This is a test.")
    fs.Write(buf, 0, buf.Length)
    

    To the equivalent Java NIO.

    Charset cs = Charset.forName("US-ASCII");
    CharsetEncoder enc = cs.newEncoder();
    enc.onMalformedInput(CodingErrorAction.REPLACE);
    enc.onUnmappableCharacter(CodingErrorAction.REPLACE);
    String str = "This is a test.";
    // allocateDirect takes a capacity in bytes, not a String
    ByteBuffer buf = ByteBuffer.allocateDirect(
        (int) Math.ceil(str.length() * enc.maxBytesPerChar()));
    enc.reset();
    CharBuffer cb = CharBuffer.wrap(str);
    CoderResult cr = enc.encode(cb, buf, true);
    if (cr == CoderResult.OVERFLOW) {
      throw new Exception("WTF");
    } // CoderResult.UNDERFLOW ignored
    cr = enc.flush(buf);
    if (cr == CoderResult.OVERFLOW) {
      throw new Exception("WTF");
    } // CoderResult.UNDERFLOW ignored
    buf.flip(); // switch the buffer from being filled to being drained
    channel.write(buf);
    

    Another problem with the Java implementation is that I couldn’t find a good way to implement most of the features. I found no way to specify the Write Through, Disable Caching, or Sequential options. Furthermore, the FileChannel.truncate() method can only shrink a file, not extend it, so I had to resort to the same Defragment method that I used for ofstream.

    No Options = 22MB/s

    Defrag Only = 24MB/s

    I didn’t know of a way to measure CPU usage in Java, but it seemed to be about 100%.

    The much simpler FileOutputStream implementation gave similar performance.

    No Options = 26MB/s

    Defrag Only = 20MB/s

    Next, I'll try to wrap this all up with some conclusions and a summary of what we've learned.