Various Technical Topics

May 13
Time for an SSD

The time is right to buy an SSD, because they are finally at the price/performance level where it’s just ridiculous to use anything else. Naysayers will claim that it’s still too early to switch, because you can buy a terabyte hard disk for $50 while a decent 60GB SSD costs $200. However, I think this price difference is irrelevant even in the current economy, because the performance difference is enormous and most of us don’t want or need a drive that big. Those who do need more storage (for a PVR?) can use an extra $50 drive in addition to a primary fast drive.

SSD technology is improving so rapidly that there are a lot of sub-par products still available. For example, you could buy an OCZ Apex 60GB drive for $145, and it will likely have similar performance to the more expensive new OCZ Vertex drives in some benchmarks. However, it may also suffer from some stuttering problems common to this generation of drives. For more info see the recent SSD articles at http://www.anandtech.com/storage/.

The best advice is to buy only what you need, as the prices are dropping quickly. Common wisdom is that the price is cut in half each year; however, I paid $31/GB for a small MemoRight SLC for my laptop 1.5 years ago, and the latest drives I bought are much higher performance at 1/10th the price. No one predicted that the prices would drop this precipitously, and I see they’re still selling my drive for $21/GB even today. A year from now they’ll probably have terabyte DRAM-speed drives for $5 at Walgreens. :-)

The safest bet is probably the Intel X25-M, which is available for $630 at 160GB or about half that price for 80GB. Personally I took a little more risk and bought four 60GB OCZ Vertex drives for $200 each. I have two set up as RAID0 in each of my two home computers, and I believe I’m getting much better performance than the Intel drive in most cases, although I’ve had a few headaches.
I’ve had to flash firmware updates to the drives to fix bugs and get new features, which erases the drives. While I was at it I switched to Windows 7 RC, which has some new features to work better with an SSD.

So what are the benefits?

· SSD is completely silent
· I can read files at ~400MB/s and write at ~300MB/s
· Seek time is ~0.1ms
· Applications load instantly and never stutter or freeze
· Lower power consumption and heat generation

My 2.66GHz Core 2 Duo NVidia 680Sli machine takes about 40 seconds to start Windows 7 64bit from the point of pushing the power button, with about half of that time (20.5s) consisting of BIOS stuff. Here’s a picture of a common disk benchmark. Notice that it writes much faster than it reads in many cases. That seems to be an anomaly with this particular motherboard RAID controller, and was the same when I was running Vista 32bit. I haven’t noticed it in practice.

My 3.16GHz Core 2 Duo Intel 975 machine takes 49 seconds to start Windows Vista 32bit, with 16s of that consisting of BIOS stuff (and I hit Esc to skip the memory test). The same benchmark test on this machine shows much better (and more normal) results.

This SSD thing is also a big deal for software developers. It’s time once again to adjust your perceptions and learn the new physical reality brought on by this change. Stop designing for outdated equipment. Planning your software architecture around the performance characteristics of a mechanical hard disk now makes no more sense than planning to run your software on a cassette tape drive. The rules have changed. Databases should be re-architected. Persistence should be revisited. Why load your files into in-memory objects when you may get better overall performance leaving them on the disk? Change the way you do things, and your software is likely to be better than legacy solutions. Some companies and people have a hard time coping with changing rules, so now’s a great time to challenge the status quo.
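One way to picture "leaving them on the disk" is a memory-mapped file, where the operating system pages in only the bytes you actually touch. The Python sketch below is just an illustration under assumptions: the fixed 8-byte record layout and the helper names (write_records, read_record, demo) are hypothetical and made up for the example, and whether this beats loading everything into memory depends entirely on the workload and the drive.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout for this sketch: one little-endian unsigned 64-bit integer.
RECORD = struct.Struct("<Q")


def write_records(path, count):
    """Write `count` fixed-size records; record i stores the value i * i."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i * i))


def read_record(path, index):
    """Fetch a single record by offset without reading the rest of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            (value,) = RECORD.unpack_from(view, index * RECORD.size)
            return value


def demo():
    """Write 1000 records to a temporary file and read back record 12."""
    path = os.path.join(tempfile.mkdtemp(), "records.bin")
    write_records(path, 1000)
    return read_record(path, 12)
```

On a mechanical disk every random read_record call would pay a seek of several milliseconds; with a ~0.1ms seek time that penalty mostly disappears, which is the point of the paragraph above.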
Let me close with an analogy. If a truck were available that could teleport 30 miles at a time, and got 100,000 mpg on regular gasoline, would you still want to buy a traditional truck? What if you could get a giant dump truck for $500 and the teleporting 100,000 mpg one cost $63,000? What if a sports car version were available for $20,000 that could go 500mph when you didn’t feel like teleporting and still get 100,000mpg? Does it really matter if 5 years from now you could get a 200,000mpg version that could teleport 50 miles at a time for $10,000? City planners, architects, shipping companies, and other affected parties would have to get their act together to take advantage of “physics 2.0” or new governments, communities, and businesses would replace them.

If you are even slightly inconvenienced or annoyed by the speed, noise, stuttering, power consumption, or other problems associated with a legacy mechanical hard disk, then a solution is available for as little as $200.

Update 5/20/2009: Newegg has the G.Skill Falcon 128GB for $309. I think this is identical to the OCZ Vertex 120GB, but ~$70 less.

December 01
My Blog Personality

Try out Typealyzer to find out what your blog says about your personality. Here's mine…

INTJ - The Scientists
The long-range thinking and individualistic type. They are especially good at looking at almost anything and figuring out a way of improving it - often with a highly creative and imaginative touch. They are intellectually curious and daring, but might be physically hesitant to try new things.

August 02
Quantifying Experience

It's hard to find good developers. And it's worse than useless to filter candidates by "experience" as included on a resume. Statements like "10 years Java experience" or "15 years SQL experience" are meaningless. The problem is that you can't sum up real experience in a single concise number, and if you could it wouldn't be measured in years.
We could try to come up with a formula, but any attempt to do so makes it clear that "Software Developer" encompasses a very wide variety of actual skills. What I'd really like to see on a resume, or in addition to a resume, is a real list summarizing all the things you've actually done, and the lessons you learned while doing them. This includes all of the following…
The above list probably does give us a fairly good measure of experience. If you can work through each of the above items completely in a fairly terse writing style and finish in less than a month, then you're probably not very experienced. Looking at this list, I can't even estimate how long it would take me to thoroughly explore the 5 items. I'm not sure, but it might be worth working through the exercise for yourself. Just don't expect anyone else to read it.

My next post, … Experience Isn't Everything

May 26
Comparing Strings Using Natural Ordering

This topic has been brought up numerous times before, but I wanted to take a shot at another solution, because I think there is still room for improvement, and I wanted a simple example to experiment with some new VB9 functionality.

Here are a bunch of links to the most recent information on the problem and a bunch of potential solutions. By spelunking through these, you ought to be able to find a solution in any language you like, and every possible combination of algorithms.

http://www.codinghorror.com/blog/archives/001018.html
http://www.davekoelle.com/alphanum.html
http://nedbatchelder.com/blog/200712/human_sorting.html
http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting
http://www.codeproject.com/KB/string/NaturalComparer.aspx

Probably the simplest example of sorting above is the following Python program, which originally came from a comment posted to Ned Batchelder's blog above…

    import re

    def natural_sort(lst):
        # Digit runs become ints so they compare numerically; other text stays a string.
        to_int = lambda text: int(text) if text.isdigit() else text
        alphanum_key = lambda key: [ to_int(c) for c in re.split('([0-9]+)', key) ]
        lst.sort( key=alphanum_key )

Ian Griffiths was even able to more or less duplicate this solution in C# 3 by writing a couple of general purpose utility functions.
After a little cleanup, it looks like this…

    static IOrderedEnumerable<string> NaturalSort(List<string> lst) {
        Func<string, object> ToInt = s => {
            try {
                return int.Parse(s);
            } catch {
            }
            return s;
        };
        return lst.OrderBy(s => Regex.Split(s.Replace(" ", ""), "([0-9]+)")
                                .Select(ToInt), new EnumerableComparer<object>());
    }

One problem with Ian's solution is that it doesn't do an in-place sort, which makes it not quite a fair comparison with the original. Also, OrderBy uses deferred execution, so when comparing performance you must be careful to force the sort to happen (e.g. call GetEnumerator().MoveNext() on the result).

Of course, if it's fair to write a missing EnumerableComparer class, then why not just implement a NaturalComparer, which would make the resulting code even simpler…

    lst.Sort(NaturalComparer.CurrentCultureIgnoreCase);

or even…

    lst.OrderBy(s => s, NaturalComparer.CurrentCultureIgnoreCase);

This seems to give you the best solution with the added flexibility of working with both OrderBy and Sort. (Interestingly, OrderBy often seems to be slightly faster than Sort, which seems strange considering that Sort doesn't have to allocate an entirely new structure to contain the results.)

Before I show you my implementation for NaturalComparer, here are some problems found in most of the previous solutions I've seen.
The following is the Compare function from my solution to the problem. You can download the full source from .

    Public Function Compare(ByVal x As String, ByVal y As String) _
        As Integer Implements IComparer(Of String).Compare

        If x Is Nothing AndAlso y Is Nothing Then
            Return 0
        End If
        If x Is Nothing Then
            Return -1
        End If
        If y Is Nothing Then
            Return 1
        End If

        Dim xpos, ypos As Integer
        Do While xpos < x.Length AndAlso ypos < y.Length
            ' Skip ahead to the next letter or digit in each string.
            xpos = FindFirstIndexOf(x, xpos, myIsLetOrDig)
            ypos = FindFirstIndexOf(y, ypos, myIsLetOrDig)
            If xpos = -1 AndAlso ypos = -1 Then
                Return 0
            ElseIf xpos = -1 Then
                Return -1
            ElseIf ypos = -1 Then
                Return 1
            ElseIf Char.IsNumber(x(xpos)) AndAlso Char.IsNumber(y(ypos)) Then
                ' Both sides are numbers: skip leading zeros, then the longer
                ' run of digits is the larger number.
                Dim xtmp = FindNextIndexOf(x, xpos, myIsNonZero)
                Dim ytmp = FindNextIndexOf(y, ypos, myIsNonZero)
                Dim xend = FindNextIndexOf(x, xpos, myIsNotNum)
                Dim yend = FindNextIndexOf(y, ypos, myIsNotNum)
                xpos = If(xtmp = xend, xtmp - 1, xtmp)
                ypos = If(ytmp = yend, ytmp - 1, ytmp)
                If xend - xpos < yend - ypos Then
                    Return -1
                ElseIf xend - xpos > yend - ypos Then
                    Return 1
                Else
                    ' Same number of digits: compare digit by digit.
                    Dim iy = ypos
                    For ix = xpos To xend - 1
                        If x(ix) < y(iy) Then
                            Return -1
                        ElseIf x(ix) > y(iy) Then
                            Return 1
                        End If
                        iy += 1
                    Next
                End If
            ElseIf Char.IsNumber(x(xpos)) Then
                Return -1
            ElseIf Char.IsNumber(y(ypos)) Then
                Return 1
            Else
                ' Both sides are letters: compare the two runs in place.
                Dim xend = FindNextIndexOf(x, xpos, myIsNotLet)
                Dim yend = FindNextIndexOf(y, ypos, myIsNotLet)
                ' Each run is passed with its own length, so runs of different
                ' lengths ("abc" vs. "abcd") neither compare equal nor read
                ' past the end of the shorter string.
                Dim r = myCmpInfo.Compare(x, xpos, xend - xpos, _
                                          y, ypos, yend - ypos, myCmpOpt)
                If r <> 0 Then
                    Return r
                End If
                xpos = xend - 1 ' -1, because we're about to +1
                ypos = yend - 1
            End If
            xpos += 1
            ypos += 1
        Loop

        If xpos >= x.Length AndAlso ypos >= y.Length Then
            Return 0
        ElseIf xpos >= x.Length Then
            Return -1
        ElseIf ypos >= y.Length Then
            Return 1
        End If
        Return 0
    End Function

The basic idea is to iterate forward through the two input strings, using the FindXXX functions to find the separators between numbers, letters, and ignored characters.
We return from Compare as soon as possible without having to look at any characters following the first difference. No extra allocations are performed. For example, when comparing strings I used the form of CompareInfo.Compare that takes offsets and lengths for the two strings to avoid having to allocate substrings.

Originally the code used inline lambda expressions for the arguments to FindXXX; however, testing showed a significant performance increase from predefining those functions outside any individual Compare, as they are always the same. For example, here's the definition for myIsNonZero, and the others are much the same.

    Private Shared ReadOnly myIsNonZero As CharPred = Function(c) c <> "0"c

Finally, here's what FindNextIndexOf looks like…

    Public Delegate Function CharPred(ByVal c As Char) As Boolean

    Public Function FindNextIndexOf(ByVal s As String, _
        ByVal start As Integer, ByVal p As CharPred) As Integer

        ' Nothing to search: report "not found" without touching s.Length,
        ' which would throw if s were Nothing.
        If s Is Nothing OrElse s.Length = 0 Then
            Return 0
        End If
        For i = start To s.Length - 1
            Dim c = s(i)
            If p(c) Then
                Return i
            End If
        Next
        ' Not found: return the index just past the end of the string.
        Return s.Length
    End Function

Some Numbers

I manually did a few test comparisons between my solution and an optimized form of Ian's (compiled regex and int.TryParse). I compared using two lists of strings shuffled randomly. The first list contained the numbers 1-5000, and the second contained alternating strings, numbers, and special characters of the form "string n string n string n" where n was 1-5000.

    Optimized IanG, string type one (108-115ms)
    Optimized IanG, string type two (291-316ms)
    Mine, string type one (16-20ms)
    Mine, string type two (158-171ms)

Once again, these numbers aren't really fair to compare, because my solution handles things like case insensitivity, ignoring non-alphanumerics, culture idiosyncrasies, etc. But it's nice to know that the extra features are more than paid for by the more efficient algorithm.
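For readers who don't speak VB, the same walk-forward idea can be sketched in Python. This is a rough illustration under assumptions, not a port of my NaturalComparer: the names natural_compare, natural_sorted, _find, and _is_kept are made up for the example, letter runs are compared with a plain lowercase comparison instead of a culture-aware CompareInfo, and (unlike the VB) the letter runs are sliced into temporary strings.

```python
from functools import cmp_to_key


def _is_kept(c):
    """Letters and digits take part in the comparison; everything else is ignored."""
    return c.isdigit() or c.isalpha()


def _find(s, i, pred):
    """Return the first index >= i where pred(s[index]) is true, or len(s)."""
    while i < len(s) and not pred(s[i]):
        i += 1
    return i


def natural_compare(x, y):
    """Return -1, 0, or 1 for x < y, x == y, x > y in natural order."""
    xi = yi = 0
    while True:
        # Skip ignored characters on both sides.
        xi = _find(x, xi, _is_kept)
        yi = _find(y, yi, _is_kept)
        x_done, y_done = xi == len(x), yi == len(y)
        if x_done or y_done:
            # The string that runs out first sorts first; both done means equal.
            return int(y_done) - int(x_done)
        x_num, y_num = x[xi].isdigit(), y[yi].isdigit()
        if x_num != y_num:
            return -1 if x_num else 1  # numbers sort before letters
        if x_num:
            x_end = _find(x, xi, lambda c: not c.isdigit())
            y_end = _find(y, yi, lambda c: not c.isdigit())
            # Skip leading zeros; after that a longer digit run is a larger number.
            xs = _find(x, xi, lambda c: c != "0")
            ys = _find(y, yi, lambda c: c != "0")
            if (x_end - xs) != (y_end - ys):
                return -1 if (x_end - xs) < (y_end - ys) else 1
            # Same number of digits: the first differing digit decides.
            for a, b in zip(range(xs, x_end), range(ys, y_end)):
                if x[a] != y[b]:
                    return -1 if x[a] < y[b] else 1
            xi, yi = x_end, y_end
        else:
            x_end = _find(x, xi, lambda c: not c.isalpha())
            y_end = _find(y, yi, lambda c: not c.isalpha())
            # Simplification: case-insensitive ordinal compare of the two letter runs.
            a, b = x[xi:x_end].lower(), y[yi:y_end].lower()
            if a != b:
                return -1 if a < b else 1
            xi, yi = x_end, y_end


def natural_sorted(items):
    """Return a new list sorted with natural_compare."""
    return sorted(items, key=cmp_to_key(natural_compare))
```

Note that numbers are never parsed into integers here either, so arbitrarily long digit runs can't overflow, and the comparison stops at the first difference.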
There's still room for improvement, so I may make updates from time to time if I need to use this code on a real project.
Feel free to use the code any way you see fit, and please let me know if you find any problems or have any other suggestions.

February 05
128bit Encoding Explained

First, here's the promised VB version of the code. It's almost identical to the Java version, except that...
    Imports System.Text

    Module Utils
        Const bUnder As Byte = 95
        Const bDollar As Byte = 36
        Const bQuest As Byte = 63
        Const bFirstUAlpha As Byte = 65
        Const bFirstLAlpha As Byte = 97
        Const bFirstNum As Byte = 48
        Const mask6Bits As UInt64 = 63
        Const mask4bits As UInt64 = 15
        Const mask2Bits As UInt64 = 3
        Const bitsPerChar As Integer = 6
        Dim CharMap() As Char = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$".ToCharArray

        Function UInt128ToStr(ByVal msb As UInt64, ByVal lsb As UInt64) As String
            Const MAX_BYTE As Integer = 31
            Dim buf(0 To MAX_BYTE) As Char
            Dim i = MAX_BYTE
            ' 64 bit number has ten 6bit encoded values plus four bits left over
            For n = 1 To 10
                If lsb = 0 AndAlso msb = 0 Then
                    Exit For ' Eliminate leading zeros
                End If
                Dim b = CByte(lsb And mask6Bits)
                buf(i) = CharMap(b)
                i -= 1
                lsb >>= bitsPerChar
            Next
            ' 4 bits from the lsb, and 2 from the msb
            If lsb > 0 OrElse msb > 0 OrElse i = MAX_BYTE Then
                Dim leftOver = lsb And mask4bits
                Dim firstTwo = msb And mask2Bits
                Dim b = CByte((firstTwo << 4) Or leftOver)
                buf(i) = CharMap(b)
                i -= 1
                msb >>= 2
            End If
            Do While msb <> 0
                Dim b = CByte(msb And mask6Bits)
                buf(i) = CharMap(b)
                i -= 1
                msb >>= bitsPerChar
            Loop
            ' It's easiest to simply prefix an underscore to avoid
            ' illegal identifiers and clashes with keywords and literals.
            buf(i) = "_"c
            i -= 1
            Return New String(buf, i + 1, MAX_BYTE - i)
        End Function

        Public Sub StrToUInt128(ByVal s As String, ByRef msb As UInt64, ByRef lsb As UInt64)
            Dim buf() As Byte = ASCIIEncoding.ASCII.GetBytes(s)
            Dim maxByte = buf.Length - 1
            Dim i = maxByte
            Dim minByte = 0
            If buf.Length > 0 AndAlso buf(0) = bUnder Then
                minByte = 1
            End If
            msb = 0
            lsb = 0
            ' First 60 bits of the lsb, 6 bits per character.
            For n = 0 To 9
                If i < minByte Then
                    Return
                End If
                ' Widen to UInt64 before shifting; shifting a Byte would
                ' discard everything above the low 8 bits.
                Dim b As UInt64 = CharToMod64(buf(i))
                If b <> 0 Then
                    If n <> 0 Then
                        b <<= (n * bitsPerChar)
                    End If
                    lsb = lsb Or b
                End If
                i -= 1
            Next
            ' Last 4 bits of the lsb and first 2 bits of the msb.
            If i >= minByte Then
                Dim b As UInt64 = CharToMod64(buf(i))
                If b <> 0 Then
                    Dim leftOver = b And mask4bits
                    msb = (b >> 4) And mask2Bits
                    leftOver <<= 60
                    lsb = lsb Or leftOver
                End If
                i -= 1
            End If
            ' Remaining 62 bits of the msb.
            For m = 0 To 10
                If i < minByte Then
                    Return
                End If
                Dim b As UInt64 = CharToMod64(buf(i))
                If b <> 0 Then
                    b <<= (m * bitsPerChar + 2)
                    msb = msb Or b
                End If
                i -= 1
            Next
        End Sub

        Private Function CharToMod64(ByVal c As Byte) As Byte
            Const b10 As Byte = 10 ' CharMap index of "A"
            Const b36 As Byte = 36 ' CharMap index of "a"
            Select Case c
                Case bFirstNum To bFirstNum + 9 ' "0" to "9"
                    Return c - bFirstNum
                Case bFirstUAlpha To bFirstUAlpha + 25 ' "A" to "Z"
                    Return c - bFirstUAlpha + b10
                Case bFirstLAlpha To bFirstLAlpha + 25 ' "a" to "z"
                    Return c - bFirstLAlpha + b36
                Case bUnder
                    Return 62
                Case bDollar
                    Return 63
                Case Else
                    Debug.Assert(False, "Unexpected Character " & c)
                    Return 0
            End Select
        End Function
    End Module

Turning Numbers Into Strings

Because neither language has native support for 128bit integers, we have to resort to using two 64 bit integers. The algorithm I devised simply encodes every 6 bits of the input integers as a single character. I thought this worked out perfectly, because my first reading of Section 3.8 of the Java Language Spec led me to believe that Java supports A-Z, a-z, 0-9, $, and _ as the only legal characters in identifiers. I learned later that it actually supports many more, so this algorithm could probably be extended to use N bits instead of only 6, which would make the resulting strings even smaller.
In the end, I decided against this, because the current choice is less likely to have problems with character sets and encoding/decoding. With 6 bits per character and 128 bits total, you can see that (128/6 ≈ 21.3) at most 22 characters are required to hold any number. To ensure legal Java identifiers we also prefix an underscore to each generated string.

The first step in the algorithm is to allocate a temporary buffer to hold the encoded characters. I used 32, because I'm a power-of-two kind of guy.

Next we process the least significant 64 bit number, repeatedly masking out the lowest 6 bits, converting them to a character using a lookup table, and then shifting right 6 bits to get the next piece. In Java, we must use the unsigned shift operator (>>>) for this, because a signed shift will pull in 1's from the left, destroying the number. Notice that we only break out of the loop when we've either taken the first 60 bits of the 64 bit number, or both msb and lsb are zero. If only lsb is zero, then we just keep shifting out zeros until 60 bits have been processed. This ensures that we get any trailing zeros. (e.g. Because 100 <> 100000.)

The middle section of code is to handle the remaining 4 bits from the lsb, and the first 2 bits of the msb. This is accomplished using masks and shifts appropriately. The i = MAX_BYTE condition is there to ensure that the number zero is written out as "_0" instead of "_".

Finally we process the remaining 62 bits from the msb. As soon as msb = 0 we're free to break out of the loop, because we don't want any leading zeros to pad the string.

One nice property of this algorithm is that it encodes sequential numbers in an efficient and almost human readable format. The first 10 numbers are just encoded as "_0" through "_9", followed by "_A" through "_Z", "_a" through "_z", then "__" and "_$". It then rolls over to the next digit with the next 64 encoded as "_10" through "_1$".
Anyone used to hexadecimal to string encoding should find this intuitive. Furthermore it should be obvious that for random 128bit numbers this algorithm gives an optimal encoding for a 64-character alphabet.

Turning Strings Into Numbers

Reversing the process is only slightly trickier than the initial encoding. First we convert the input string into an array of Bytes. One complication is that I didn't want to assume in the decoder that every input string would start with an underscore, so we check for this at the start and set minByte to either 1 or 0. This was originally there because in the original Int128ToString() function I only prepended the underscore when necessary to make a legal Java identifier. However, I removed that code due to all the complications in checking for keyword and literal conflicts, and the flexibility in the decoding seemed nice so I left it.

Just as with encoding, the decoder steps through the first 60 bits worth of characters. This time, we use a function, CharToMod64, to convert the input characters back to base64 numbers. Each pass through the loop uses a bitwise OR operation to combine the returned base64 number with the current value of the lsb. The trick here is that I found it most straightforward to shift the decoded base64 number to the correct position (b <<= n * 6) and then combine with the lsb using a bitwise OR.

The middle section once again converts the 6 bits from the decoded number into the remaining 4 bits for the lsb and the first 2 bits of the msb.

The final section processes the remaining characters into the final 62 bits of the msb.

Performance

I found a few interesting things with the performance of this algorithm. First, I wrote a simple program to time how long it takes to encode the first million numbers, encode the last million, encode and decode the first million, and finally encode and decode the last million. Here's the code in VB.
    Public Sub Main()
        ' Encode the first million numbers.
        Dim startTime = DateTime.Now
        For i As UInt64 = 0 To 999999
            Dim enc = UInt128ToStr(i, i)
        Next
        Dim stopTime = DateTime.Now
        Console.WriteLine("Encode took " & (stopTime - startTime).TotalMilliseconds)

        ' Encode the last million numbers.
        ' Assumes the project is compiled with integer overflow checks removed,
        ' so that CULng(i) wraps a negative Int64 to the top of the UInt64 range.
        startTime = DateTime.Now
        For i As Int64 = -1L To -1000000 Step -1
            Dim enc = UInt128ToStr(CULng(i), CULng(i))
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Encode big took " & (stopTime - startTime).TotalMilliseconds)

        ' Encode and decode the first million numbers.
        startTime = DateTime.Now
        For i As UInt64 = 0 To 999999
            Dim enc = UInt128ToStr(i, i)
            Dim msb, lsb As UInt64
            StrToUInt128(enc, msb, lsb)
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Roundtrip took " & (stopTime - startTime).TotalMilliseconds)

        ' Encode and decode the last million numbers.
        startTime = DateTime.Now
        For i As Int64 = -1L To -1000000 Step -1
            Dim enc = UInt128ToStr(CULng(i), CULng(i))
            Dim msb, lsb As UInt64
            StrToUInt128(enc, msb, lsb)
        Next
        stopTime = DateTime.Now
        Console.WriteLine("Roundtrip big took " & (stopTime - startTime).TotalMilliseconds)
    End Sub

Although both Java and .NET are probably more than fast enough, Java was 2-3 times faster at encoding, but a round trip encode/decode took about the same time for both. (Further testing of decode seemed to show that the .NET version really is faster at decoding.) It's not worth it to me at this point to figure out why, but if someone is interested I would recommend using ILDASM to inspect the generated IL assembly code for the .NET version. I don't think this is the kind of thing that's going to be helped by source analysis or profiling.

Java Encode took 188
VB Encode took 555.6285
For comparison I also found source code online for a fast Base64 encoder (http://migbase64.sourceforge.net/).

Base64 Encode took 375
This is not a slight on Mikael Grev's algorithm at all, because mine doesn't even pretend to generate compliant RFC 2045 Base64 encoded strings. In the end, I think I came up with a pretty tight little algorithm to solve a particular problem. I hope someone finds it useful.

January 28
Encoding 128bit Numbers as Strings

My current project needed a way to convert between 128 bit numbers (e.g. java.util.UUID) and strings. A special requirement is that the strings be legal Java identifiers, because we use them for variable names in generated code. For the same reason, I wanted to ensure that the strings were of minimal size. I was able to come up with a trivial algorithm in a few minutes that works OK, but I felt that I should be able to come up with the optimal solution given a little more time.

It turns out that the optimal solution consumed most of this weekend, but I finally got it working. There's no way I can bill my customer for this exercise, so I thought I'd post it here in case anyone else finds it interesting. I'm also curious to see if anyone can come up with a better or faster solution.

Basically my approach is to treat the 128 bit number as up to 22 groups of 6 bits each (the last group holds only the top 2 bits). This gives me 64 possible characters which works out perfectly, because Java has exactly 64 legal characters for identifiers (0-9, A-Z, a-z, _, and $). I also prepend an additional underscore to avoid illegal starting characters and clashes with keywords and literals.

For now, here's the Java source to ponder. Tomorrow I'll discuss the code in more detail, and post a VB version which I actually wrote first, then ported to Java.
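The approach itself is small enough to sketch in a few lines of Python. To be clear, this is an illustration and not the Java source from this post: Python has arbitrary-size integers, so the sketch can treat the two 64 bit halves as one number instead of juggling masks across the msb/lsb boundary, and the names uint128_to_str and str_to_uint128 are made up for the example. It assumes the same 64-character alphabet and leading underscore described above.

```python
# Digit alphabet: value 0 is "0", value 10 is "A", value 36 is "a", 62 is "_", 63 is "$".
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"
DIGIT = {ch: value for value, ch in enumerate(ALPHABET)}
MASK64 = (1 << 64) - 1


def uint128_to_str(msb, lsb):
    """Encode two unsigned 64 bit halves as '_' plus base-64 digits, no leading zeros."""
    value = ((msb & MASK64) << 64) | (lsb & MASK64)
    chars = []
    while True:
        chars.append(ALPHABET[value & 0x3F])  # lowest 6 bits become one character
        value >>= 6
        if value == 0:
            break
    return "_" + "".join(reversed(chars))


def str_to_uint128(s):
    """Decode a string from uint128_to_str back into (msb, lsb)."""
    # As in the description above, the leading underscore is optional.
    body = s[1:] if s.startswith("_") else s
    value = 0
    for ch in body:
        value = (value << 6) | DIGIT[ch]
    return (value >> 64) & MASK64, value & MASK64
```

With this sketch, uint128_to_str(0, 64) gives "_10", and the largest possible value comes out as an underscore followed by 22 characters.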
October 16
Windows Environment Editor released on SourceForge

I finally got around to submitting the Windows Environment Editor to SourceForge. You may recall this as the project originally created for the ill-fated OCI Summer of Code programming contest. We haven't done much work on it since then, but the major bugs have been addressed, and I've been using it regularly, as have a few brave volunteers. I think it's the best of the many replacement environment editors currently available, but you can judge that for yourself.

Technologies

One of my primary purposes for this project was to learn how to create GUI applications using the relatively new Windows Presentation Foundation (WPF 1.0) for .NET. This framework, while similar to others in many respects, is really something fundamentally different than libraries like Swing, WinForms, GTK, MFC, etc. For more information I recommend reading http://arstechnica.com/reviews/os/pretty-vista.ars.

One of the benefits of WPF is the separation between GUI design using a markup language (XAML) similar to HTML, and a programming API for making that GUI work. In theory, this will allow designers to use programs like Microsoft Expression Blend to make applications like mine look better. For this reason, we tried to express as much of the GUI as possible using XAML, rather than reverting to code. We also wasted huge amounts of time using Blend to play with the GUI design, but in the end we gave up and went with a simple (ugly?) design that eschews all the fancy gradient effects.

Of course, for reasons best explained in my previous posts, I used Visual Basic 2005 as the programming language. All told the application is something less than 2000 lines of code and XAML. In the future, maybe I'll port everything to the upcoming new releases for WPF and VB.

Features
Conclusion

I estimate that this application represents several hundred hours of work by myself and Ngan that may have been better spent playing Scrabble or Age Of Empires. I hope that at least a few people find it useful. If you have any questions, want to contribute features or fixes, or just want to try it out, I encourage you to check out the project.

July 29
NOT the World's Most Advanced Mouse

I thought I'd share my experience with a recent consumer electronics purchase. Maybe I can save you some pain and trouble, and salvage something from my own wasted effort.

The Problem

I started the search for a new mouse because I am occasionally annoyed by the cord on my other mice, and I was hoping that wireless mouse technology was finally at a usable point. I first tried cordless mice two years ago, using a Christmas gift certificate to purchase the top-of-the-line Microsoft solution (Wireless Laser 6000). However, I found this to be completely unusable, because of its penchant for missing clicks, and poor tracking.

The Solution?

So, I plunked down $100 last Wednesday to purchase a Logitech MX Revolution, which claims to be "The World's Most Advanced Mouse". I was disappointed to find that this mouse shares many of the flaws of the MS wireless mouse, even manages to bring back some problems I thought we'd left behind with the old ball mice, and has upped the ante with a product design so flawed that at first I could not believe it. But first…

What I Want In a Mouse

5 Buttons

I routinely make use of 5 buttons while using a mouse. I use the obvious left button to click and double-click my way through GUI interfaces. I use the right button to pull up context menus. I use the middle button to open browser links in separate tabs and to close those tabs with a single click. And I use the back and forward buttons to retrace my steps through browser-style interfaces.
Any other buttons are more likely to be accidentally pressed than activated on purpose, and should be minimal and unobtrusive, or better yet, absent.

Scroll Wheel

I constantly use the scroll wheel (if I'm using the mouse at all) for anything with a document-like interface. However, I'm skeptical of the value of side-to-side scrolling found on many newer mice. This might be useful for working with large pictures or something, but I don't see any value for me, and would always choose to disable the feature.

Still, the scroll wheel has a lot of room for improvement. For me, it often scrolls too much with each step. Even the default Windows mouse driver assumes that the scroll wheel should scroll one or more lines at minimum. What I really want is something more precise, which I can currently only get by clicking and dragging the scroll bar "puck" or "thumb". This is an area where the MX Revolution really had a chance to innovate to fix one of my major gripes with current scroll wheel implementations.

Precision

Many people don't notice the precision of their mouse, but the quickest way I know to grasp the concept is to either a) try to write your name using a paint program, or b) try to play a game. Under these and similar scenarios, differences in mouse tracking become very apparent. When I'm playing Age Of Empires III, I only have so much time to click all the little guys to tell them where to attack. If my mouse causes me to miss, or makes it difficult to select the right units in the heat of battle, then it makes me upset.

Of course, if you're observant you can readily notice the difference between a good mouse and a poor one when using any GUI. A good mouse makes it much easier to click on the myriad buttons, hyperlinks, and other elements of the modern graphical interface.

Clicking

This should go without saying, but the mouse buttons should respond as expected when clicked.
Both of the wireless mice I've used have occasionally failed to register a click the first time I've pressed a button. This could be a driver issue, or a problem related to RF interference, or any number of things, but it's definitely unacceptable.

The Review

I went with the Logitech because I've been extremely happy with an MX518 wired gaming mouse, which is, to date, the best mouse I've ever owned. My hope was that they would essentially provide an MX518 in wireless form, perhaps with fixes for my only few gripes with the wired version. (It doesn't remember my DPI setting between reboots, and I occasionally accidentally hit the DPI and property buttons.)

Gimmick One, A New Wheel

The coolest thing about the new mouse is the idea for a new scroll wheel. They installed this nice heavy wheel in the center which works in one of two software-controlled modes. In the first mode it behaves like any other scroll wheel, clicking as you spin it, and scrolling N lines for each click. The second mode disengages something internally, allowing the wheel to spin freely. This could have been so cool had it been implemented correctly.

What I expected was that the scroll wheel would finally give the illusion of a physical connection to the document. If I moved it the tiniest amount, then the document should scroll by a single pixel or less. If I were to spin it, then the document should scroll at the speed of the wheel until it lost momentum or I stopped the spinning manually.

This is NOT what happened. Instead, the free-spinning mode seemed to simply control the same N-line scroll wheel mechanism as before. Perhaps a driver update could fix this issue, by tying the free-spinning mode to sub-pixel manipulation of the scroll thumb instead of hooking into the scroll wheel mechanism.
This wouldn't quite be the right effect for long documents, because you'd want to limit the maximum speed, but it would be better, and I'm not sure if perfection can be achieved without specific software support.

The scroll wheel also supports the ubiquitous horizontal scrolling feature that I dislike, but I can't remember if it was possible to disable it.

No Middle Button

When using the Logitech mouse driver, pushing the scroll wheel either switches scrolling modes between free-spinning and "clicking", or can be disabled entirely. There is no provision at all for supporting a middle button, which I personally find totally unacceptable, especially considering the uselessness of their scroll wheel feature.

Gimmick Two, Search

Logitech added a new button behind the scroll wheel, which can only be used for their new search feature. When clicked on a highlighted word, presumably it will pull up the word in the search engine of your choice. I tried the feature only a few times, and never accepted the default Yahoo service. However, it rarely worked, and I was never quite sure what I was doing wrong, or if the feature is just buggy or non-intuitive.

In any case, the only thing I really wanted to do was remap this button to control the scroll-wheel mode, and use the scroll wheel button for its usual middle-click duties. Of course, the Logitech software didn't support this.

To be fair, clicking the search button on the mouse seemed to do the same thing as clicking the search button on my keyboard (MS Comfort Curve 2000. The best keyboard ever. I bought 3.). It brings up the new Vista search dialog, which I don't really like. Maybe there is simply some software conflict, or Logitech doesn't correctly support Vista.

Gimmick Three, A Button On My Thumb

Logitech added another strange feature to this mouse. There is a little toggle under your right thumb, which can be pressed forward or backward.
When you do so, it will pop up a very ugly window where you can choose between running applications. This seemed equivalent, but inferior, to the Alt-Tab switcher, or Flip3D found in Windows Vista. I wish I had thought to capture a picture so you could see just how poorly implemented this was. Once again, it was impossible to map these to more logical functions such as the usual forward/back buttons, which might have made sense. It also might have been cool to tie into Flip3D or the Alt-Tab switcher, but I probably would have mapped this feature to the buttons on the left of the mouse, as I would use it much less than browser forward/back navigation.

The Software

Strangely, this mouse seems to require more than the usual driver. To enable most of the advanced features you must run the SetPoint software in the system tray. If you exit this program, then the mouse reverts to normal MS mouse behavior. This does re-enable the middle button, but the thumb button and search button are disabled. If I were forced to keep this mouse, then this is probably the way I would try to use it.

The biggest problem with the software is the lack of useful customizability. As mentioned above, most features were hardcoded to specific buttons, and could only be disabled rather than reassigned. There were several sliders for controlling the scroll wheel speed and acceleration, but I could find no setting that made this feature usable, or in any way better than any other scroll wheel.

Precision Problems

The most notable problems with this mouse had nothing to do with the flawed or missing features detailed above, but instead reflect my previous experience with a wireless mouse.

Can't Click

Several times while using my computer, I would click on a link while web browsing, or click a Settler in AOE3, only to find that the click didn't register. At the time, I blamed myself or even Windows Vista. I thought that perhaps I'd moved slightly, and the game thought I'd dragged rather than clicked.
However, over time, and with back-to-back comparison with an MX518 wired mouse, it became apparent that the mouse was just occasionally missing clicks. Maybe I have too much interference. I do have a Wireless-G network and 2.4GHz portable phones. However, I don't really think that's any excuse for Logitech here, because this sort of environment is commonplace among the user base for this product.

Can't Move

If you used a mouse before 1998 or so, then you probably remember ball mice, and all the problems associated with them. Somehow Logitech has managed to recreate the experience of using one of these. This probably has the same root cause as my click problem above, but frequently when using this mouse, the pointer would freeze to a spot on the screen, and I could get it to move only by quickly moving the mouse to get it going again. This only seemed to happen when I was moving the mouse slowly in the first place. For example, I might be trying to click on an AOE3 Settler or army unit to issue new orders.

Lest you think that I'm just being picky, my girlfriend also noticed the same problems with both wireless mice while playing Age Of Empires, which was quickly remedied by reverting to an old Microsoft IntelliMouse Optical. These problems occurred with any software; they were just much more noticeable in a game, because you often need to move at a precise rate to click on a moving object.

Summary

In my first hardware review, I give the Logitech MX Revolution a 0/10. Although many of the problems might be fixable by future driver, firmware, and software updates, I don't think many of the design decisions, such as the need to run a separate software program instead of just a driver, will ever be addressed. I know of other people who love this mouse, so feel free to experiment with it yourself, but I advise you to purchase from a local retailer with a forgiving return policy, and to hang on to your receipt.
If your experience is anything like mine, then you'll probably regret the wasted effort. On Friday, I took the mouse back for a refund. In the end I'll probably just buy another Logitech MX518, but I'm still stuck with a wired mouse whose cable occasionally interferes with whatever I'm trying to do. Maybe I should just clean my desk.

July 27

RAII For .NET and Java

One of the complaints that many C++ developers have with Java and .NET is the lack of destructor semantics. I've even included this as a proposal for a Scoped keyword in my post about VB shortcomings. Basically, with C++ you can implement a special method on a class which will automatically be called when instances go out of scope. Note that this really has absolutely nothing to do with freeing memory, although that was its most common usage in older (pre-1990?) C++ code. These days this mechanism is much more common, and used for all kinds of things, mostly still involving release of resources (files, mutexes, database connections, etc.). It's even become a best practice with its own acronym, RAII (Resource Acquisition Is Initialization).

C# and VB have a similar general-purpose mechanism with Using statements, but the semantics can get unwieldy. Java and older versions of VB are stuck using try/finally (also available in C#).

C++

    void foo() {
        Connection a("1");
        …
        Connection z("26");
        // Do stuff
    }

C#

    void Foo() {
        using (Connection a = new Connection("1"))
        …
        using (Connection z = new Connection("26"))
        {
            // Do stuff
        }
    }

VB

    Sub Foo
        Using a As New Connection("1")
            …
                …
                    Using z As New Connection("26")

VB (older versions)

    Sub Foo
        Dim a As Connection = Nothing
        …
        Dim z As Connection = Nothing
        Try
            a = New Connection("1")
            …
            z = New Connection("26")
            ' Do stuff
        Finally
            If a IsNot Nothing Then a.Dispose()
            …
            If z IsNot Nothing Then z.Dispose()
        End Try
    End Sub

Java

    void foo() {
        Connection a = null;
        …
        Connection z = null;
        try {
            a = new Connection("1");
            …
            z = new Connection("26");
            // Do stuff
        } finally {
            if (a != null) a.close();
            …
            if (z != null) z.close();
        }
    }

The C++ code is very clean, because the special destructor method is called to close each connection as it goes out of scope at the bottom of the method. The C# code is much more verbose, but stacking the using statements at least alleviates the nesting problems of the VB version. The older VB and Java versions are stuck manually closing everything in the finally block, which can be especially cumbersome when you have to check each instance for null.

I think this is definitely an area where these languages need some more syntactic sugar. I think a Scoped keyword would be ideal. Here's how it could look in Java.

    void foo() {
        scoped Connection a = new Connection("1");
        …
        scoped Connection z = new Connection("26");
        // Do stuff
    }

Each Connection object would be responsible for implementing a special method which would automatically be called when the references went out of scope. This could either tie into the Finalizer mechanism similar to how .NET does it, or it could use a new approach perhaps consisting of a compile-time annotation for the Connection class. (For more information on .NET IDisposable)

C# and VB could really use something similar, because the Using syntax often remains too bitter for my taste. For C# and especially VB, we can probably do better using something like Scott McMaster's suggestion. Here we trade some efficiency for convenience.
I would write it like so…

    void Foo() {
        using (ScopeManager scope = new ScopeManager()) {
            Connection a = new Connection("1");
            scope.add(a);
            …
            Connection z = new Connection("26");
            scope.add(z);
        }
    }

Here we save ~26 lines of using statements, each possibly with its own nesting level. It's still not as nice as the C++ mechanism, but it's available now, and preferable to any alternative I can think of at the moment. This could potentially be useful in Java too, but less so due to the lack of Using semantics.
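For what it's worth, later versions of Java (7 and up) did gain a Using-like construct, try-with-resources, so the same ScopeManager idea can be sketched there as well. The following is only a rough illustration under that assumption: ScopeManager, its add method, and the stand-in Connection class (including its static closed list, which exists purely so the closing order can be observed) are all hypothetical names, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: collects resources and closes them in reverse
// order of registration when the scope itself is closed.
final class ScopeManager implements AutoCloseable {
    private final Deque<AutoCloseable> resources = new ArrayDeque<>();

    // Registers a resource and returns it so it can be declared inline.
    <T extends AutoCloseable> T add(T resource) {
        resources.push(resource);
        return resource;
    }

    @Override
    public void close() {
        RuntimeException failure = null;
        // Keep closing the remaining resources even if one of them fails.
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new RuntimeException("failed to close resource", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}

// Stand-in for the Connection class used in the examples above.
final class Connection implements AutoCloseable {
    // Records the order in which connections were closed (demo only).
    static final List<String> closed = new ArrayList<>();
    private final String name;

    Connection(String name) {
        this.name = name;
    }

    @Override
    public void close() {
        closed.add(name);
    }
}

class ScopeDemo {
    static List<String> run() {
        Connection.closed.clear();
        try (ScopeManager scope = new ScopeManager()) {
            Connection a = scope.add(new Connection("1"));
            Connection z = scope.add(new Connection("26"));
            // Do stuff with a and z
        }
        return Connection.closed;
    }
}
```

Closing in reverse order mirrors what C++ destructors do at the end of a block, and a single try block replaces one nesting level per resource. The trade-off is the same as in the C# version: a little bookkeeping at runtime in exchange for flatter code.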
May 22

What Does “Loosely Typed” Mean?

Eric questions the meaning of "Loosely Typed". I think that there are two separate issues here.

Required Declarations

This corresponds to the "Option Explicit" feature added to VB3, and which still exists in VB.NET. You can choose for a particular file whether to require variable declarations or not. This feature helps catch bugs at compile time where you accidentally misspell the name of some symbol. Consider the following code.

    Class Person
        Private name
        Private age As Double
        Private money As Decimal = 100.0
    End Class

The name field is declared, but no type is specified, whereas the age field is both declared and given a type, and money is declared, given a type, and an initial value.

Required Type

This somewhat corresponds to the "Option Strict" feature in VB, except that the feature enables quite a few other behaviors that are somewhat related, such as the ability to implicitly convert between Strings and Numbers. If I have "Option Strict Off" and "Option Explicit Off" at the top of a VB file, then the following will compile.

    to = "Hello"
    to = to.Length
    to = to + too

This shows one problem with making variable declarations optional, because I was able to misspell "to" as "too" in the last line, and the compiler couldn't check it. This is why I like to be able to specify "Option Explicit On" in VB even when I set "Option Strict Off". I prefer to get a little red squiggly underline of "too" rather than waiting to catch this in a unit test.

Inferred Type

VB9 and C#3 add another twist to this issue, because they now support type inference. With this feature, you can still get strong type safety, but without the hassle of explicitly stating the type.
Take the following C# example:

    var name = "Justin";
    var len = name.Length;
    name = 42.0;

In this example the first line is exactly equivalent to "string name = "Justin";", and the third line will be a compile time error, because the inferred type of name is String, which doesn't support implicit conversion from Double.

The only thing I don't like about the current implementation of the type inference feature is that it's done by the compiler. What I'd really like is for the IDE to display the name of the inferred type, and perhaps let me modify its choice without making the type "sticky". Maybe the IDE could automatically display the above as "var string name = "Justin";", where "string" is non-editable and mutates itself automatically to match changes to the inferred type when refactoring.

Strong vs. Loose

These terms are currently pretty ambiguous. I think the term Strong should only mean that at runtime, the code will result in an error if you attempt an operation that is not known to be safe. Examples might include:
The VB "Option Strict Off" setting basically throws "Loose Typing" in with "No Type Declaration Required", and makes the following code legal.

    Class A
        Public Sub Foo()
        End Sub
    End Class

    Dim v As Object
    v = New A()
    v.Foo()
    v = "42"
    v += 1
    Assert(v.Equals(43))

This is true Loose Type behavior, and can be useful in certain limited contexts. However, I wish you could have finer-grained control over the VB options so that I could retain some type safety while still disabling the need for explicit type specifications. I guess some of the need for this is obviated by the type-inference feature.

Dynamic vs. Static

This seems to clearly refer to whether type information is known at compile time. Static Typing means that all type errors short of invalid explicit casts will be caught at compile time.

Conclusion

So I guess I'm a little unclear about what kind of type system is found in Ruby. If it claims to be "Strongly Typed" then that's pretty ambiguous. Is there a summary of exactly which operations result in an error vs. which are implicitly allowed? What value does "Strongly Typed" provide at runtime? If you're going to wait until runtime to detect errors, then why not just use "Loosely Typed"?

May 21

VB and Language Choice

A recent Coding Horror post got me thinking again about C# vs. Visual Basic. As you know, I myself think that VB is probably the best general purpose managed language available right now, so I thought I'd add my $.02 to the discussion.

VB vs. C# is like Coke vs. Pepsi

I don't think this analogy holds up, because it doesn't take into account the complexities of the situation. I believe that there are real substantive differences that make VB a better tool for writing programs than C#.
While it's true that tools like CodeRush and Resharper can greatly improve the C# experience, they still fall short of what's theoretically possible from a VB-based toolset, because, no matter what you do to C# (or Java or any other C-style language) you will always be missing the three key features that make VB better. Namely the Readable Keywords, Line-Orientation, and Case-Insensitivity I detailed in a previous post.

I'm not claiming VB/Visual Studio is better in every way than Eclipse/Java, IntelliJ/Java, Resharper+VS2005/C#, etc. I'm just saying that if all these tools were refined to their highest potential, I would like VB best because it has a better foundation. VB is currently missing some major features, such as the ability to automatically handle Imports, that make it difficult to compare it favorably to these other tools.

Case Insensitivity is Right and Case Sensitivity is Wrong

I hope that most VB users would agree that it's not the Case Insensitivity of VB that we actually like. In fact, at the language level I believe it's a mistake for VB to be Case Insensitive. What people really like about VB is that most of the time it's not the author's responsibility to worry about case. The feature that we really like is that the IDE will fix the case to match the declaration or keyword.

I think a better way to implement this feature would be to require Case Hyper-Sensitivity at the language level, which would disallow multiple symbols in scope that differ only by case. No competent VB programmer would ever want something like "sySTeM.cONsolE.WRitELINE(foo)" to compile. We just think it's ridiculous that C# and Java IDE editors make us constantly contort our hands with two-key combinations. This only slows down our typing, causes hand cramps, and makes us 2% (est.) less happy.

For those of you who've grown accustomed to Resharper, IntelliJ or Eclipse, this is exactly equivalent to many of the features you would miss if switching to Notepad/Java.
In fact, my first impression using IntelliJ was that somebody copied 20% of the good features from VB, and then thought of a bunch of new must-have features that I didn't even consider. The problem is that none of the IDE vendors are really doing a good job of incorporating the best features of their competition. What would be ideal is a JetBrains/VB based on Mono, or better yet a completely new language that takes the important ideas of VB while stripping away the too-long keywords, and other features that I dislike in VB. (Overall I like the readable keywords in VB, but some are just too much.)

Strongly Typed

The Ruby and Python movements have convinced many people that they really want a loosely typed language. Of course, one of the great things about VB is that you can choose within a given file whether you want strong or loose typing, but I'm not convinced that most people even truly want loose typing in most cases. What people really seem to like is not having to waste time keying in all the type information. They like the flexibility that the tool can just figure it out.

You can get most of this benefit without having to give up the performance and other benefits of a strongly typed language. Upcoming versions of C# and VB will have a feature called Type Inference, wherein the language will simply figure out the type from the context in which it is used. For instance, in C# 3.0 I can write "var x = foo.ToString();" and C# will infer that the type of x is String. One thing that I don't like about the new VB/C# type-inference feature is that it seems to be implemented at the language level.

Language vs. IDE

I believe that the single most important key to the future of programming is the realization that Programming Languages as classically defined are an idea whose time has passed. Tools like VB, IntelliJ, and Eclipse have blurred the line between IDE and language, but the real key is realizing that the language side of the line is no longer needed at all.
In fact, it's a huge detriment. This is the primary reason why I can't really get excited about any new language such as Ruby. I see these all as vestiges of a legacy mode of thinking.

Multi-Language

One of the benefits of .NET was supposed to be that you could use any language that you like. However, as Jeff has noticed, this dream has not been realized. The problem is that we're not really free to choose any language we like, because the fact is that most code is going to have to be maintained and written by multiple people, each with different preferences. Jeff's experience is that most people have settled on C# as the de facto .NET standard language despite the superiority of VB. Over time he seems to have been worn down by the C#-zealots, and their fanatical devotion to a language attuned to those with masochistic tendencies, but I have to believe that he still realizes deep down that he knows a better way.

The fundamental problem with a multi-language platform like .NET is that what we really want is the ability to maintain the same code in multiple languages. If I write a class in VB, then someone else needs to be able to maintain that class in C#, Python, or Ruby without having to translate the language. The key is that the code itself can't know what language it's written in, and that can only happen when we get rid of this silly old-fashioned notion of parsing languages written as text files.

April 13

5 Things

I think this meme jumped the shark a long time ago, so I won't tag anyone else. However, I'll try to come up with my 5 things.
When I started this blog, I didn't think I would have any trouble keeping a steady stream of posts going, but it's been a while since my last entry. I'll try to do better going forward. Thanks for getting me back on track, Weiqi. :)

January 18

Things I Would Change in VB
VB is a great language, but I still think it has room for improvement. For example, there are far too many keywords, some of the keywords are confusing, and some keywords have too many overloaded meanings. There are also problems with the IDE that should be addressed. More than any other language, VB has always been about a partnership with the IDE, and this functionality is crucial for its ease of use and power.

Language Changes

Declare instead of Dim for variable declaration

The use of the keyword ‘Dim’ for variable declaration is an historical artifact of Basic. A more descriptive name would be ‘Declare’, which is already a VB keyword, but used for a feature of dubious value. Here are some examples of what I would prefer; I'll continue to use Declare in subsequent examples:

    Declare x,y,z As Integer
    Declare names(1 to 10) As String
    Declare teams() As String = GetTeams()

Too many uses of parentheses

I personally find it confusing that parentheses are used for expression grouping, array declarations, indexing, and function calling. I think it would be more readable if we used brackets for indexing and arrays.

    Declare names[10] As String
    names[0] = "Justin"
    names[(x + (y \ 2))] = "Michel"
    Console.WriteLine("Name = " & GetNamePrefix() & names[2])

GetType, TypeOf

It doesn’t seem like we should need both of these keywords. One option would be to use TypeOf(T) for retrieving the type of a class. I think this is more consistent with VB’s readable syntax. Even better would be to support accessing a Shared GetType method or read-only Type property on any type, which would eliminate the need for a keyword in this use case completely. Best might be to allow the type name itself to be used.

    Declare t As Type = FileStream.GetType()
    -- or --
    Declare t As Type = TypeOf(FileStream)
    -- or --
    Declare t As Type = FileStream.Type
    -- or --
    Declare t As Type = FileStream

Is, IsA, and TypeOf

The current keyword "Is" should be used to compare for identity only.
If used with two reference types then it returns true if both refer to the same exact object. If used with value types then it returns true if both have the same value. Basically, I’m suggesting elimination of “If TypeOf X Is Y Then” as a special case, although if we retain TypeOf as outlined above, then “If TypeOf(X) Is Y Then” and “If TypeOf(X) = Y Then” are both legal and equivalent.

A new keyword IsA should be introduced to allow easy comparison of types. A IsA B should return true if A is the same type as B, A derives from B, or if B is an interface implemented by A.

    Declare A, B As Foo
    A IsA Foo ' returns true
    Foo IsA A ' won't compile, because left hand side is a type.
    A IsA B ' won't compile, because B is an object.
    A IsA B.GetType() ' returns true

It’s also confusing that Is must be used in a Select expression. It seems like the extra keyword should be unnecessary.

    Select age
        Case 5 To 10, 12 To 15
        Case < 5, 13, 16, > 80 ' No need for "Case Is < 5, 13, 16, Is > 80"
        Case Default
    End Select

Too many conversion keywords

I think all of CBool, CByte, CChar, CDate, CDbl, etc. can be eliminated by replacing them with a single Convert function. It would behave exactly the same as the current keywords.

    Declare d As Double = 3.14
    Declare x As Integer = Convert(d, Integer)
    -- but also allowing --
    Declare x As Integer = Convert(d)

My proposed syntax is a little more verbose, but eliminates 15 keywords, some of which can be hard to remember. It also supports implicit detection of the desired type if possible, as in the example above where it can see that the target of the Convert is an Integer, and saves you the trouble of explicitly stating that.

I would also allow a third parameter to Convert for specifying the type converted from. This can be useful when the source type is not always obvious, and when you want a warning when the source type is refactored.
    Declare x As UInt32 = Convert(GetAge(), From := UInt64)

This takes advantage of current named argument syntax to allow both implicit detection of the To type and explicit specification of the From type. If the GetAge function is someday updated to return an Int64, then the code would no longer compile, preventing a possible bug when dealing with negative values.

DirectCast too verbose

This keyword is unnecessarily verbose. Fortunately there is already an intuitive and logical replacement in the language.

    Declare x As Object = GetFoo()
    Declare y As Foo
    y = x As Foo

The As keyword is already used to declare the type of an object, and the above syntax should be obvious in meaning. As with DirectCast, if x is not a Foo, then an exception would be thrown.

It would also be nice to allow this syntax to be used when it’s known that the source and destination types differ. In this case, “As” would be equivalent to Convert.

    Declare d As Double = 3.14
    Declare x As Integer = d As Integer

Eliminate TryCast

The use of TryCast doesn’t really add any value. Instead of:

    Declare x As Foo = TryCast(y, Foo)
    If x IsNot Nothing Then
        x.Blah()
    End If

you could just use:

    If y IsA Foo Then
        Declare x As Foo = y As Foo
        x.Blah()
    End If

The compiler should be able to figure out how to make the latter syntax just as efficient as the former.

No need for CType

There should be no need for this keyword, which is kind of a hybrid between casting and conversion. This should be replaced by either As or Convert. The use of CType for defining conversion operators on a class would use the keyword Convert instead.

Nothing vs. Null

The VB concept of Nothing is almost always referred to as Null in other languages, and even within much of the VB community (probably because of SQL). Having a separate term for such a common concept is just confusing for users old and new.
    Declare x As Foo = Null
    Assert(x Is Null)
    Assert(x = Null)

No need for AddressOf keyword

Nowhere else in VB is the notion of pointer or address exposed, and it’s likely that AddressOf doesn’t really return an address anyway. We could just use the Function or Sub keyword in place of AddressOf. From the examples in AddressOf documentation:

    AddHandler Button1.Click, Sub Button1_Click
    Declare t As New Thread(Sub CountSheep)

Add the ability to Define aliases

The C++ typedef keyword is often useful for writing maintainable code, and sometimes C++ references are handy in situations other than parameter passing. I propose that a new Define keyword could provide this functionality for VB in an intuitive way.

It can allow you to enhance readability. For example, by defining a new IntList type we save typing and enhance readability wherever List(Of Integer) is used. This becomes more valuable with more complex generic types.

    Define IntList As List(Of Integer)
    Declare v As IntList

It can also be useful for changing a type without having to update all the code, or to allow conditional compilation to use different types.

    #If SafetyEnabled Then
        Define MyInt As SafeInteger
    #Else
        Define MyInt As Integer
    #End If

Define is already partially available with Imports. I would remove the ability for Imports to define new aliases and use the Define syntax instead.

    Imports Con = System.Console
becomes

    Define Con As System.Console

This keyword could also be useful for defining aliases for other things besides Types.

    Sub Foo
        Declare aReallyReallyLongName As SomeType = GetSomeType()
        Define s As aReallyReallyLongName
        s.SomeMethod()
        Define sm As s.SomeReallyLongMethodName
        s.sm()
    End Sub

More flexible Imports

I’d like to Import a symbol within a small scope. I see no reason why this couldn’t be allowed anywhere instead of only at the top of a file.

    Sub Log()
        Imports System.Console
        Write(a)
        Write(b)
        WriteLine(c)
    End Sub

No need for Delegate keyword

We should just be able to define a delegate type like any other.

    Define Sub OnClick(ByVal e As EventArgs)
    AddHandler Button1.Click, Sub OnClick
    Public Event As OnClick

Remove Alias, Ansi, Auto, Declare, Lib, and Unicode

These keywords for declaring access to external functions make it marginally easier to access external Win32 routines, but cause unnecessary clutter in the language, making it that much harder to understand. They are also unnecessary, because it’s not much harder to just use the DllImport attribute.

Consider removing keywords for basic types

The CLR already defines nice unambiguous names for all the basic types, such as Byte, Int16, Int32, Int64, etc. It might be better just to use those, and eliminate the unnecessary keywords from VB. Many of the VB keywords are the same as the names defined in System anyway, so this wouldn’t be that onerous. The proposed Define feature could even provide easy compatibility for old code.

No need for the Call keyword

In VB you can call a subroutine by saying “Call Foo”, but you can also get exactly the same behavior by saying “Foo()”. I don’t see any benefit to the former syntax, and it adds yet another unnecessary keyword to the language.

Select instead of “Select Case”

There’s no need to support an optional Case keyword in the first line of a Select statement. We should just standardize on the following syntax, and eliminate a little more clutter from the language.
    Select age ' vs. Select Case age
        Case 0 To 20
            Foo1()
        Case 21
            Foo2()
        Case 25, > 30
            Foo3()
        Case Else
            Foo4()
    End Select

Literals

There are currently some basic types that have no syntax for literal declaration, such as Byte. There are also some types, such as Char, where the literal syntax is unintuitive. A better character literal might use the back-tick (e.g. `c`).

Eliminate the colon

One of the strengths of VB is its readable line-oriented syntax. The colon operator allows you to subvert this. Eliminating this would help keep VB code readable to all VB programmers. The other use for the colon is to declare labels for GoTo statements, but I’m also proposing elimination of GoTo for VB.

Eliminate the underscore

Similarly, the underscore is a crutch that should not be needed. Code should either rely on wrapping in the IDE, or introduction of temporary variables to make code more readable. However, to make this work, we need to eliminate some of the most egregious causes of lengthy lines. The UsesAttribute keyword fixes one such area, new syntax for Implementing interfaces fixes another, and the proposed Define keyword another.

Bring Back the underscore

Instead of using the underscore for line continuation, I would use it to disambiguate VB keywords that match symbols. Currently the [] brackets are used for this purpose, but I think those work better for array access as stated previously, and prepending an underscore seems a more intuitive solution. Hopefully the elimination of many keywords will make the need for disambiguation much less prevalent.

Remove the \ symbol

Although integer division is probably fairly frequently used, I find the inclusion of two separate division symbols confusing. Integer / Integer should leave the arguments as integers, while Int / Float and Float / Int should convert the result to a floating point type. Anything else can be handled by explicit conversions and/or separate math library functions.
It would also be helpful if the IDE would colorize integer types differently than floating point.

Eliminate If … Then … Else … Statements

One of the primary values of VB is its line-oriented nature. I was completely unaware that the current VB allows If Then Else on a single line without an End If. This should not be allowed, as it subverts the nature of VB, by allowing a tiny syntactic convenience with far greater potential for misuse than valid use cases. For example, this currently compiles, and I don’t think it should.

    If x > 0 Then Return ">" Else Return "<="

Introduce a short-circuiting ternary expression

One of the problems with the C ternary expression is that the default case is separated from the context by the boolean expression. For example, if I want to Trim() only a non-null string in C#:

    String trimmed = s != null ? s.Trim() : "";
Notice how the code we really want to write, “String trimmed = s.Trim();”, is very different from the code we have to write to handle the null. I think VB has the opportunity to make this common expression more readable by reordering the “arguments” to allow:

    String trimmed = s.Trim() ? s != null : "";

Or in VB:

    Declare trimmed As String = s.Trim() IIf s IsNot Nothing Else ""

Other examples:

    If a = (b IIf b < 5 Else 5) Then
    If a = (b IIf b IsNot Nothing Else GetDefault()) Then
    Foo(a, (x IIf a > 0 Else GetDefault()), z)
    y = 10 IIf x Is Nothing Else x.Foo()

Introduce UsesAttribute

Instead of overloading the < and > operators for applying attributes, I propose we add a new keyword to make this feel more natural in VB, and to eliminate one of the major needs for the underscore line continuation character.

    Sub Foo(ByVal s As String, ByVal n As Int32)
        UsesAttribute Conditional("A")
        UsesAttribute WebMethod
        ' Body of subroutine
    End Sub

    Class SomeService
        Inherits Foo
        Implements Bar
        UsesAttribute WebService(Namespace:="blah")

        Sub New()
        End Sub

        Sub Calculate
            UsesAttribute WebMethod
        End Sub
    End Class

Unwieldy Interface-based polymorphism

A strength of VB is its readable English keywords, but in practice this can sometimes get out of hand.

    Public Interface Shape
        Function CalculateArea(ByVal X As Double, ByVal Y As Double) As Double
    End Interface

    Public Class RightTriangleClass
        Implements Shape
        Function CalculateArea(ByVal X As Double, _
                ByVal Y As Double) As Double Implements Shape.CalculateArea
            Return 0.5 * (X * Y)
        End Function
    End Class

In general, any time the underscore has to be introduced to allow a logical line to continue on the next physical line, we see an area for improvement. The following would be a much nicer syntax for the above, with the option of reverting to the really verbose syntax only when there’s a conflict, or when you want to change the name of the interface.
    Public Class RightTriangleClass2
        Implements Shape2
        Function Implements CalculateArea(ByVal X As Double, ByVal Y As Double) As Double
            Return 0.5 * (X * Y)
        End Function
    End Class

Erase

There’s no need for this keyword, as you can get the same effect by assigning Null. The Erase statement supports a variable number of arguments, but the limited utility of this is not worth the introduction of a keyword. Instead, if this functionality is important, then it would be more useful to allow ParamArray to use ByRef, making it easy to implement Erase as a user function, as well as other possibly useful functions.

ReDim and Preserve

This functionality seems easy enough to reproduce with a few small functions, possibly added to the Array class.

On, Error, Resume

These keywords are no longer necessary in a modern VB. We now have exception handling, and these just clutter the language in the name of backward compatibility.

No public fields in Class types

By eliminating the capability to create Public fields in a class type, we can use the following syntax to declare simple properties.

    Class Foo
        Public ReadOnly name As String
        Public age As Int32
    End Class

More complex properties could revert to the older syntax using Property. Public fields would still be allowed in value types.

No more Set or Get

“Set” is just too valuable to waste on a keyword. I’d rather have a System.Collections.Generic.Set for holding sets of objects. When declaring properties we can just take advantage of the fact that the mutator is always a Sub, while the accessor is always a Function. Then we no longer need the two keywords.

    Private myName As String
    Public Property Name As String
        Sub (ByVal s As String)
            myName = s
        End Sub
        Function
            Return myName
        End Function
    End Property

Goodbye Goto

Let’s be the first language to discard goto. CS programs usually go to great lengths to teach students that goto is almost always the wrong solution.
The variety of languages available on the .NET platform means that some languages like VB and C# can eliminate goto, while others (C++?) could keep it.

Generic RAII scope

The Using statement is one useful way to ensure that objects are disposed correctly, but sometimes it can be cumbersome, and can lead to unnecessarily deep nesting. It would be useful to introduce Scoped variables to handle this more cleanly in some cases.

    Sub Test
        If blah Then
            Scoped x, y, z As Connection
            ' x, y, and z are disposed here
        End If
        If blahblah Then
            Scoped lock1 As Mutex = …
            Scoped lock2 As Mutex = …
            …
            ' lock1, lock2, … are released
        End If
    End Sub

Eliminate SyncLock

More general Scoped and Using keywords can replace the need for this.

Modules, Namespace, and Shared Class members

I find it a little confusing that all these exist. Maybe this could be simplified by allowing global functions in a Namespace as an alternative to Modules, or eliminating the Namespace keyword entirely and using the Module keyword for this concept.

No Next

It would make VB easier to understand and more consistent if “Loop” were used to end For loops. This is consistent with the other loop variants, and makes it clearer what’s happening.

Eliminate While Loops

Do While … Loop should be sufficient without the need for While … End While. This doesn’t actually save keywords, but simplifies the language by eliminating an unnecessary and more verbose syntax for while loops. The IDE could allow leaving off the Do keyword, and fill it in automatically.

Simplify Continue and Exit

Continue is followed by Do, While, or For to indicate which type of loop to continue. This added flexibility is unnecessary, and Continue should just jump to the top of the current loop.

    For I = 1 To 10
        Do While blah
            …
            If x Then
                Continue
            End If
        Loop
    Loop

The above would simply restart the Do While loop when x is True.
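For comparison, this is already how a bare continue behaves in Python, where the statement always applies to the innermost enclosing loop. The sketch below is only an illustration in another language, not proposed VB; the function name and the loop bounds are made up for the example.

```python
def inner_continue_demo():
    """Record which (i, j) pairs are visited when continue fires at j == 2."""
    visited = []
    for i in range(1, 4):          # outer counted loop: i = 1, 2, 3
        j = 0
        while j < 3:               # inner loop, the one continue applies to
            j += 1
            if j == 2:
                continue           # jumps to the top of the while loop only
            visited.append((i, j))
    return visited


print(inner_continue_demo())
```

Every value of i is still visited; only the j == 2 step of the inner loop is skipped, which is the behavior proposed for an unqualified Continue.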
If you really need to restart at the beginning of the For loop, then the code can be refactored into multiple methods or some other approach. Similarly, only “Exit Loop” should be supported in the future. Exiting a sub, function, or property is accomplished with “Return”, and exiting a Select is unnecessary.

ElseIf is inconsistent

"For Each" and "End If" use two separate words, so Else If should also.

Declare x As Object

It would be nice if the “As Object” part of the declaration were optional even when Option Strict and Option Explicit are enabled. The default type for any declaration would be Object. So I would advocate:

    Declare x

More flexible Options

As fancy new features such as Lambda Expressions and Closures are added to the language, we may want more flexible control over the options. For example, we may want to turn Strict and Explicit off for Lambdas, while leaving them on for everything else. We might also want finer control of the scope for these options, perhaps turning off Option Strict within the scope of a single Function. Rather than have keywords for these two statements, just introduce new VB attributes.

    StrictMethods(on|off): controls whether late binding is allowed
    StrictConversions(on|off): controls whether Int64 auto-converts to Int32
    ExplicitLambdas(on|off): controls whether lambda expressions require types
    etc.

Consider eliminating or extending With

By allowing an empty With statement we can introduce an artificial scope, which can sometimes be useful to avoid having too many variables in scope. Usually it’s better to split such a method into separate subroutines, but this can sometimes make an algorithm harder to understand.

    With
        Declare x As Integer
        …
    End With
    ' x is not in scope

It would also be useful to assign an alias for the With object.
    Sub Foo()
        With f = GetEmployee.Name.FirstName
            f.blah()
            Log(f)
        End With
    End Sub

However, this is not so necessary with the proposed Define keyword:

    Sub Foo()
        Define f As GetEmployee.Name.FirstName
        f.blah()
        Log(f)
    End Sub

RaiseEvent unnecessary

This keyword is similar to Call, and is also unnecessary, since the obvious syntax of calling the event as a function would be more intuitive.

    Event LogonCompleted(ByVal UserName As String)

    Sub Logon
        LogonCompleted("Justin")
    End Sub

AddHandler, RemoveHandler

Using language keywords for these is confusing. The C# syntax might be fine, or VB could automatically allow any Event/Delegate to have implicit Add/RemoveHandler methods. Note that += should NOT be used: because we’re appending to the list of handlers, &= concatenation is more appropriate.

    Public Event LogonCompleted(ByVal UserName As String)

    Sub OnLogon(ByVal x As String)
    End Sub

    Sub New
        LogonCompleted &= Sub OnLogon
        LogonCompleted.AddHandler(Sub OnLogon)
    End Sub

No more REM, and add comment blocks

There’s no point in having two mechanisms for line comments, although it might be useful to introduce a multiline comment.

    '''
    Anything between here and here is a comment
    '''

No more implicit return variable for functions

Having two ways to return a value from a function just adds complexity to the language with very little benefit. At best it saves a line of code or two to declare the return value and return it, but often modern programs don’t (and shouldn’t) declare a single return value anyway.

Rename Shadows

The description for this feature provides the answer for what the keyword should be. “Specifies that a declared programming element redeclares and hides an identically named element, or set of overloaded elements, in a base class.” By renaming the keyword to Hides, it becomes more accessible to a new or infrequent VB programmer. Hides should also be usable to explicitly state desired behavior.
    Sub Foo
        Declare I As Integer
        If blah Then
            Hides I As String
        End If
    End Sub

No End or Stop

The Stop keyword should be removed from the language. There’s no need for this to be a language-specific feature, as it’s easy enough to use the portable System.Diagnostics.Debugger.Break() method. The End keyword (that halts the process) should also be removed for similar reasons. Normally, you would either let an uncaught exception percolate up and kill the process, or you could call Process.Kill() possibly followed by Process.WaitForExit().

Allow automatic assignment of Arrays to variables

Python has some useful syntax for working with Tuples. VB could add some of the same convenient syntax for working with arrays. Allow automatically assigning array values to variables. The a, b, and c variables would be filled with the first three values from the array.

    Function Foo() As Integer()
        …
    End Function

    Declare a, b, c As Integer = Foo()

Also, allow more flexibility for array initializers.

    Declare x, y, z As Integer = 1, 2, 3

Add Slices

Python has the useful ability to access a subset of any List, String, or Tuple. VB could support similar syntax without complicating the language much.

    Declare s As String = "Hello World"
    s[0] equals 'H'
    s[0 To 2] equals "Hel"
    s[-1] equals 'd'
    s[-4 To -2] equals "orl"
    etc.

Remove Mid

By allowing slices to be assigned, you make a more flexible Mid that is also more intuitive to read.

Eliminate need for GetChar

A single element slice should automatically return a Char instead of a one element string. The new `a` character literal should be equivalent to "a"[0].

No need for { } array initializers

VB should be able to figure this out without the braces, and should be able to use parentheses to make the association explicit where it would be ambiguous.
    Declare x, y, z As Integer = 1, 2, 3
    Declare x[] As Integer = 1, 2, 3
    Declare x[][] As Integer = (1, 2, 3), (1, 2, 3)
    Declare x[2, 2] As Integer = (1, 2, 3), (1, 2, 3)

    Sub Foo(ByVal a() As Int32, ByVal b As Integer)
    End Sub

    Foo((1, 2, 3), 4)

Array Bounds vs Last index confusion

I still find the syntax choice for array declarations confusing, and it doesn’t help much that the To keyword is supported. (Unless the IDE were to add it automatically.) Declare x(5) As Integer should create a 5 element array with a lower bound of zero. This use of the To keyword should be removed again, because allowing flexible lower bounds just makes arrays more confusing to work with. For those who want or need explicit lower bounds, a generic BoundedArray class could be provided.

Automatic ToString when using concatenation

One of the benefits of having a separate, unambiguous concatenation operator ‘&’ in VB is that it can automatically convert its arguments to strings.

    Declare I As Integer
    Declare f As New Foo
    f & I & "blah"
    I & f & "blah"

This is currently possible only by explicitly implementing the concatenation operator for Foo, but it requires providing an implementation for every possible value type, or inefficiently overloading it for just Object, forcing boxing. This could be partially helped by allowing generic operator overloading, but that would still be far too tedious for the common case. It might even be desirable to remove operator overloading for the ‘&’ operator, and instead force it to always represent string concatenation and be handled automatically.

Separate operator for assignment and equality

One of the keys to VB’s ease of use is that it doesn’t allow assignment statements to return a value. Unlike other languages, you can’t do:

    Declare a, b, c As Integer
    a = b = c = 5

One result is that VB can always tell from the context whether ‘=’ is meant as an assignment or a comparison.
This is actually a valuable feature, because it saves typing and prevents a common error from other languages. What I propose is that we use the := operator that is already used for named arguments as the only allowed assignment operator, and have the IDE automatically change ‘=’ into ‘:=’ as it’s parsed, in the same way that it automatically changes “Endif” into “End If” and makes other similar transformations. One benefit is that we get immediate visual feedback that VB understood what we meant when it makes the transformation.

For Loop should iterate only over the specified range

The following statement currently increments M, then checks whether it’s still no greater than 5.

    For M = 0 To 5

This is confusing given the VB syntax, because the syntax seems to imply that “M” will only take on the values 0, 1, 2, 3, 4 and 5, but it will actually be 6 after the loop. Besides being confusing, it can easily lead to infinite loops if the incremented variable overflows. If overflow checking is enabled, then instead of an infinite loop, you get an exception. Either is unwarranted.

Eliminate MyClass

I find this keyword hard to remember, and I’d just like to be able to use the name of my class to be explicit when necessary.

    Class Foo
        Protected Overridable Sub Bar()
            ...
        End Sub

        Public Sub CallMyBar()
            Foo.Bar()   ' calls our own Bar
            Bar()       ' calls the Bar of our most derived class
        End Sub
    End Class

Allow alternate spellings for Overridable, NotOverridable

It would be consistent with the English language to also allow Overrideable and NotOverrideable as synonyms.

Eliminate Type Characters

Old style code that used type characters should no longer be supported.

    Function StrFunc(ByVal x&, ByVal y$, ByVal z#)

They also shouldn’t be allowed for literals. Literal DateTime values should still be supported, however, because they’re useful and easier to read than the alternatives.
    Declare d As DateTime = #11/11/1970 13:32#

Literal Values for every type

    tmp = 42     ' Int32
    tmp = 42UI   ' Unsigned Int32
    tmp = 42L    ' Int64
    tmp = 42UL   ' Unsigned Int64
    tmp = 42S    ' Int16
    tmp = 42US   ' Unsigned Int16
    tmp = 42B    ' Byte
    tmp = 42SB   ' Signed Byte
    tmp = 42.5F  ' Float32
    tmp = 42.5   ' Float64

Currently, only L, UL, and UI are supported.

Eliminate Assembly, Option Compare, Binary, and Text Keywords

We shouldn’t have to use language keywords (even unreserved ones) to change these behaviors. The IDE or command line tools should be able to handle it when necessary, or UsesAttribute. Option Compare may be completely unnecessary, as String already provides explicit control.

Add missing Volatile keyword

This should have the same meaning as in other languages.

GoSub, Let, and Wend

These have been deprecated long enough and can all be eliminated if making the kind of sweeping changes I’m advocating.

Bring Back Variant

Option Strict is currently used for many related features, such as implicit type conversions and duck typing. However, it might be even better to reintroduce the Variant keyword to allow duck typing for individual identifiers. This would work even with Option Strict enabled.

No VB-specific extension libraries by default

I’d like to be able to write portable code that doesn’t rely on VB-specific libraries. I don’t care about access to legacy APIs that are currently in the Microsoft.VisualBasic namespace. Any features that are needed, such as the Convert function, should be available without having to pull in other baggage as well.

Lambda Expressions

This feature is partially planned for the next release, but I’m afraid it’s going to allow code that is too cryptic and terse for Basic. The proposed syntax is:

    inc = Function(y) y + 1
Which creates an anonymous single-parameter function that returns its parameter incremented by 1. This really doesn’t feel like VB, as a) it’s not line-oriented, and b) it implicitly returns y + 1. Even though it would prevent some common lambda use-cases, I’d prefer VB to only allow lambda expressions as variable initializers, and to require multiple lines.

    inc = Function(y)
        Return y + 1
    End Function

This is only a fairly minor extension to existing Function and Sub usages. The above would only be allowed with Option Strict and Explicit disabled. Otherwise you’d get the usual:

    Dim inc As Function(ByVal y As Integer) As Integer
        Return y + 1
    End Function

Must Intrinsic Functions Be Keywords?

Several VB features involve mapping syntax that looks like a function call to CLR features. For example, Convert(x) becomes an appropriate ILAsm conversion function for the target assignment. If none of these had to be reserved as keywords, then it would simplify the language quite a bit. Assuming all the intrinsic functions are in a suitable namespace, the usual namespace disambiguation rules would apply. If these keywords can become intrinsic functions in a namespace, then more useful variations could be added, such as explicit SafeConvert and UnsafeConvert functions that map to the various ILAsm type conversion features.

Support Unmanaged Code

I’d like to be able to use VB syntax to write lower level code. Instead of dropping in to C++, I’d like to forgo garbage collection and other fancy features to write small efficient code without giving up the readability of VB. A separate smaller set of System namespace libraries could be provided to give me everything I need to write device drivers, protocol stacks, and programs that need an absolutely minimal footprint.

Summary

In summary the above proposals would eliminate 88 of the roughly 200 existing keywords while only adding 12 new ones.
It would also eliminate some overloaded uses of current keywords, and introduce several powerful new features from other languages. The cost of all this is reduced compatibility with existing programs, eliminating the ability to simply recompile them. If this isn’t desirable, then a new Basic-style language could be introduced. I believe the advantages of a much simpler yet more powerful language are worth it for writing new programs, and possibly even porting existing ones. Although the language would retain approximately 120 keywords after these changes, the remaining ones seem to make for more readable code than alternatives from other languages. Many of the remaining words can only be used as part of compound keywords, as with “Each”, which can only be used in a “For Each” statement.

New Keywords (13)

[], Declare, Null, IsA, Hides, Scoped, Volatile, ''', UsesAttribute, Define, '_', Overrideable, NotOverrideable

Removed Keywords (86)

{}, CBool, CChar, CDec, CDbl, CInt, CLng, CObj, CSByte, CShort, CSng, CStr, CUInt, CULng, CUShort, CDate, CType, DirectCast, AddHandler, RemoveHandler, AddressOf, Alias, Declare, Ansi, Unicode, Lib, Dim, Object, Byte, Char, String, Integer, Short, Double, Single, Long, UInteger, ULong, UShort, SByte, Decimal, Date, Call, Declare, Delegate, Erase, ReDim, Preserve, Error, Resume, GetType, GoTo, Let, GoSub, Wend, Namespace (Or Module), Next, Nothing, Option, Explicit, Strict, Compare, Text, Binary, On, Off, RaiseEvent, REM, Shadows, Stop, SyncLock, TryCast, TypeOf, Assembly, IsTrue, IsFalse, Mid, ':', '_', '%', '@', '!', '$', Set, Get

IDE Features

Another problem with the current VB is some very annoying or broken IDE functionality.

Case Fixing Broken

VB is supposed to update the case of symbols to match their declaration, but this is currently broken in several instances. For example, when I type “imports system.data\n” at the top of the file, it should automatically change it to “Imports System.Data\n”.
A concerted effort should be made to fix this in the various places where it doesn’t currently work, and additionally a simple Ctrl-K, Ctrl-D should fix up the whole document.

Indentation is broken

If you type “if blah then\n” then VB will automatically indent to the proper position on the next line. However, if you move the cursor to a different line, and then go back, it will no longer be at the correct position. This wouldn’t be too bad if the Tab key would take you to the correct position, but it does not.

Improve syntax recognition

The VB IDE used to be better about recognizing syntax as I type. For example, if I were to type “for<space>each<space>” on a line, then the IDE would capitalize “For” as soon as I hit the first <space> key. Currently, the IDE only colorizes “for”, which tells me that it does recognize the keyword, but it doesn’t fix capitalization until the whole line is entered. I do think that most syntax errors shouldn’t be highlighted and indentation shouldn’t be changed until the line is finished, but case-fixing, coloring, and other formatting changes should happen as soon as possible. This used to be one of the best things about working with VB.

Tab key shouldn’t indent

It should be possible to redefine the tab key to always indent the current line to the correct position, and NEVER to actually insert indentation. Basically it should behave the same as Emacs Tab, but without having to choose Emacs emulation.

Allow assigning macros to tab

In previous versions of Visual Studio I was easily able to implement my own EmacsTab using a macro, and I was able to bind this to the tab key. Allowing this again will require cleaning up the Code Snippets feature so that it is not hard-coded to use the Tab key for moving between editable fields.

Macros Too Difficult

A macro should be able to programmatically duplicate anything I can do using the keyboard.
I've tried repeatedly to write a macro that would duplicate the functionality of the Emacs tab key, but I haven't been able to get what I want. This macro took me only a few minutes to figure out with Visual C++ 6. In general the macro API is just too difficult to use for the amount of time I have to devote to customization. Maybe a simplified facade should be provided.

More Assistance

VB should be more helpful by more often allowing me not to type long keywords such as NotInheritable. For example, it could let me type the shorter text “not”, and then automatically expand it to the real keyword based on the current context. In general the VB parser should provide more context-sensitive assistance as I type code. It should always provide immediate but non-intrusive feedback that it understands what I’m typing.

Non-Intrusive Layout

Often when I’m editing code, the current VB will immediately make layout changes as I type. This can make it very confusing in practice, because the indentation could “jump around”, causing me to lose track of what I was doing. In general this is an example of VB taking the viewpoint of the parser rather than providing the illusion that it’s reading my mind. If I want to surround some code with “If … Then … End If”, then it only upsets me when VB immediately matches my newly inserted If with a previously existing End If, and indents my code accordingly. It’s only a slight distraction that some previous If now has a squiggly red underline indicating that it no longer has a matching End If. The solution is to ensure that VB doesn’t “jump to conclusions” on incomplete information. If a block of code contains syntax errors, then it should not be formatted, and it probably shouldn’t even be shown as a syntax error until I do something (Ctrl-S, Ctrl-K D, etc.) to indicate that I think the code is now correct.
Perhaps VB could use a squiggly beige underline to tell me that it knows the code is not yet correct, but is currently waiting to see what I’ll type next.

Customizable

I’d like some more advanced control over how the code is formatted. In addition to the indentation style and coloring that are currently available, I’d like control over how keywords are capitalized, the ability to disallow language features, and other controls.

No More Compiling

When working with Eclipse/Java there is no concept of compiling per se. Instead, the syntax is highlighted as you type, and certain actions like saving files or running the program will cause the compilation to happen in the background. This seems very much in the spirit of VB, and VB seems almost there already.

MultiFile Assemblies

It would make unit testing much better if VB allowed you to easily create unit tests in the same assembly as the tested code, but in a separate project. This would allow me to expose testing interfaces as Friends. The current workaround is to just include the tests in the same project as the tested code.

Intellisense shows invalid values

Often, after typing “xyz<dot><ctrl-space>” I’ll be given a dropdown list of irrelevant information. If VB doesn’t know the actual members of the xyz object then it shouldn’t provide a list at all.

Automatic Imports

One extremely nice feature of Eclipse/Java is that it can automatically figure out the imports. If I type in “Dim sw As StringWriter<ctrl-space>” then VB should prompt me with a list of namespaces that have StringWriter types. If there is only one possible choice then it should not even prompt. Additionally it should insert the necessary Imports statement at the top of the file. A corresponding feature from Eclipse/Java is the Ctrl-Shift-O keystroke, which optimizes imports. This automatically removes imports that are no longer needed, and attempts to add any that are necessary. VB would require a new syntax for only importing selected types from a namespace.
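The prompt-or-insert decision described above can be sketched in a few lines. This is only an illustration, written in Python rather than VB, and it assumes a hypothetical index from type names to the namespaces that declare them; TYPE_INDEX and resolve_import are made-up names, not part of any IDE API.

```python
# Hypothetical index: type name -> namespaces that declare a type of that name.
TYPE_INDEX = {
    "StringWriter": ["System.IO"],
    "Timer": ["System.Threading", "System.Timers", "System.Windows.Forms"],
}


def resolve_import(type_name, index=TYPE_INDEX):
    """Return (imports_line, choices) for an unqualified type name.

    One candidate: return the Imports line to insert and no choices.
    Several candidates: return None and the list to prompt with.
    No candidates: return None and an empty list, so nothing is shown.
    """
    candidates = index.get(type_name, [])
    if len(candidates) == 1:
        return "Imports " + candidates[0], []
    return None, list(candidates)


print(resolve_import("StringWriter"))
print(resolve_import("Timer"))
print(resolve_import("Unknown"))
```

The unknown-type case returns an empty list on purpose, which matches the earlier point that the IDE should show no dropdown at all when it has nothing valid to offer.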
Code Snippets

Several things about this feature really annoy me. Foremost is that the replacement fields stay active till the file is closed. Maybe it wouldn’t bother me as much if it was formatted differently, such as a solid underline that only highlighted when the field was the active one. The ugly green color is enough to prevent me from using this feature. The workaround I’ve decided on is to make the background color white. This makes the code snippet fields invisible unless the cursor is inside the field. Still, I’d like to be able to hit Enter (or maybe Ctrl-Enter) and have the fields finalized as if I’d closed and reopened the file.

Eliminate My

I’ve never liked the My feature very much. It seems like it could be replaced with the ability to assign duplicate Imports aliases. This would also allow me to pick a different name than “My”, and better control exactly what is included. For example, I might do the following:

    Imports Foo As System.Text.RegularExpressions
    Imports Foo As Microsoft.VisualBasic.Devices
    Imports Foo As Microsoft.VisualBasic.FileIO

This would combine all the listed namespaces into a single Foo namespace. Any ambiguities would have to be resolved explicitly as usual. It would also work with classes, allowing Shared methods to be called directly as usual.

Summary

Even though the list of things I would change about VB seems longer than my previous post about the things I like about VB, that shouldn’t give you the impression that I dislike the language. On the contrary, I think it’s the best choice available on the .NET platform for most problems. C++/CLI provides more control, and the unique ability to commingle unmanaged and managed code. C# is similar to Java, more portable (Mono on Linux), and has a few features that haven’t yet made it into VB (yield, anonymous methods, unchecked, etc.). However, VB is the easiest to read and type, has the best editor, and is generally easier to work with for many problems.

January 17

Why VB is Best
As I said in my previous post, I believe that Visual Basic is the best language for many types of applications. It provides all the power of C# and Java, but has arguably better tools and syntax. For now, I’ll concentrate on the positive, but in a later post I’ll discuss what I feel are VB’s shortcomings, and my suggestions for addressing them. For me, the most valuable VB language features are its Readable English Keywords, Line-Orientation, and Case-Insensitivity. The ability to quickly create GUI applications using drag-and-drop may have been the original selling point, but to me what makes VB great is the language. It’s currently my preferred tool for working on personal projects.

Readable English Keywords

For the most part, VB is a very easy language to read, and the keywords chosen come closer to conveying the correct meaning than with other languages. Many languages have a long list of special symbols that the user is required to memorize before the language is readable. Often the same symbol will mean completely different things depending on the context. While VB has the commonly used Math symbols, and even a separate concatenation symbol, overall its philosophy is to favor the use of keywords to help the user to understand the code. (As you’ll see later, I even think some of the current symbols should be removed.) The best VB keywords are those that help the reader to understand what’s going on without having to be fluent in VB or any other OO language. This helps beginners to learn the concepts, and also helps experienced programmers write more maintainable code.

Examples

Inherits, NotInheritable, MustInherit

The VB syntax for object derivation is very clear and explicit as long as you’re familiar with the concept of inheritance in object oriented languages.

    Class Foo
        Inherits Bar
    End Class

Furthermore the NotInheritable keyword is clearly related, and has the obvious meaning.
    NotInheritable Class Bar
    End Class

This would cause the Foo class above to have a compilation error. The MustInherit keyword is also clearly related to the other two.

    MustInherit Class Bar
    End Class

Even someone new to the language should be able to guess that this prevents the Bar class from being instantiated on its own. If you contrast the above with the corresponding concepts in Java (extends, final, and abstract) and C# (‘:’, sealed, abstract), it should be clear that the VB keywords are superior. Although Extends, NotExtendable, and MustExtend would work almost as well, the term “inheritance” for the whole concept is usually used even in Java circles.

Overrides, Overridable, MustOverride, NotOverridable

Similarly, the concept of “virtual” is overused in computer science, and therefore a poor choice for describing class methods that can be overridden in derived classes. The VB keywords have a more obvious meaning, and may even make OO polymorphism easier to grasp for beginners. The equivalents in Java (no keyword for the first two, because methods are virtual by default, then abstract and final) and C# (override, virtual, abstract, and sealed) are not as clear and don’t work as well together.

Shared, Static

VB uses the keyword Shared for methods and fields that apply at the class level. The C#, Java, and C++ keyword static seems a little less clear to me, and gets muddled up with the separate concept of local data within a function that lives past the lifetime of the function call. C++ uses static for that concept as well, and VB’s Static keyword is reserved for it.

    Class Foo
        Public Shared Function CreateFoo() As Foo
        End Function

        Private Sub Bar
            Static called As Integer = 0
            called += 1
        End Sub
    End Class

And, Or, Not

Using these simple keywords instead of the common &&, ||, and ! is more approachable and actually easier to type, because the latter require shift-key combinations. The C-style symbols are also fairly arbitrary and make C-style languages just a little harder to learn and use.
ByRef, ByVal

Like C#, these do require you to understand the concepts of passing by value versus passing by reference, but I find the VB keywords a little clearer. Allowing the programmer to be explicit about ByVal also enhances readability, because I no longer have to remember which is the default. (It’s ByVal.)

Do, While, Until, Loop

I find the VB loop syntax to be flexible enough for any purpose, and much more intuitive than Java, C#, or C++. All the following are allowed.

    Do Until x
    Loop

    Do While x
    Loop

    Do
    Loop Until x

    Do
    Loop While x

You can also Continue or Exit a loop explicitly.

For, Each, In

I think the VB iteration syntax is nearly perfect.

    For Each x In y

is not so different from C#:

    foreach (var x in iterable)

but the required parentheses seem to break up the statement in a weird place. Java is worse, because it uses an arbitrary ‘:’ symbol that you have to train yourself to read as “in”.

    for (Object x : iterable)
For, To, Step

The counted loop is also fairly simple, and I think much easier to read than the C#, Java, and C++ equivalents.

    For c = 0 To 9
    For n = 1 To 20 Step 2

as compared to:

    for (c = 0; c < 10; ++c)
    for (n = 1; n <= 20; n += 2)

Summary

The above is certainly not an exhaustive list of the well-named features in VB, and I certainly don’t imply that all VB keywords have good names. (Later, I’ll explore in more detail just which keywords I think should be renamed, and I even advocate removing almost 90 keywords.) However, overall I find most of the VB keywords to be more intuitive than those of alternative languages, especially for those who don’t have preconceived notions of what a concept should be named (e.g. virtual). This makes the language more approachable for beginners, but also easier for experienced programmers.

Line Oriented

Another major feature of VB is its preference for having one statement per line. This is a common practice by many programmers in Java, C#, and C++ as well, but VB has features to encourage the practice.

No Semi-colon necessary

In C-based languages you are required to put a semi-colon at the end of each statement, and an end-of-line has no special meaning. This allows a single statement to span multiple lines, and multiple statements to reside on a single line. I find that both of these practices tend to make code difficult to read. Tools such as Eclipse/Java have built-in code formatters to ensure that each statement resides on a single line, and to ensure that statements that span multiple lines have a consistent format. In VB, the default is for each statement to be terminated at the end of the line unless an underscore ‘_’ is placed after the statement, allowing a single statement to span multiple lines. VB also allows a colon ‘:’ between statements to force multiple statements on one line. As you’ll see later, I’d actually like these exceptions removed to force the programmer to find a more readable line-oriented style.
Assignment Side Effects

In VB, assignment is a statement, not an expression, and has no return value. In C++ terms, the VB assignment operator is written like this:

    void operator=(T lhs, T rhs);

If I have an example Java program like this:

    if (x == 1000) {
        x = y = z = 0;
    }

Then in VB, I must write it as:

    If x = 1000 Then
        x = 0
        y = 0
        z = 0
    End If

Alternatively, I could use the colon:

    If x = 1000 Then
        x = 0: y = 0: z = 0
    End If

One other side-effect is that VB is able to disambiguate assignment from equality checking. The common problem of typing “if (x = y)” instead of “if (x == y)” in C-based languages is eliminated, because “If x = y Then” always refers to equality checking and “x = y” always refers to assignment. (Later, I discuss why I think VB should automatically display a different operator for assignment.)

Syntax Validation

When you press the Enter key, VB knows that the statement is likely complete, and usually does syntax validation as well as possible automatic syntax cleanup. For example, if you type the first line of a multi-line statement such as “If x Then”, VB will automatically insert a closing “End If”, and put the insertion point at the correct spot, formatting your code as necessary. In practice this makes it feel as if VB is really aware of what the programmer is trying to do, and the IDE is an active partner in code construction. Other tools have some of these features, but I’ve never seen another tool be as unintrusive and natural about it as VB, although with VB.NET some things aren’t quite as smooth as in the past, which I’ll discuss later.

Case Insensitive

It’s pretty well known that VB is a case-insensitive language, but some may not realize just how much of a benefit this is to the programmer.

Faster Typing

Despite its propensity for fairly verbose keywords, I’ve always found that actually typing in VB programs is often faster than with other languages. This is partly because you rarely need to use two-key combinations. The IDE will automatically capitalize symbols to match the declaration. My educated guess is that VB will be 10-20% faster to type than an equivalent language like C# or Java.
The key is to allow the IDE to share the load, and when switching to VB from another language I usually need some time to train myself not to capitalize symbols, add unnecessary parentheses, and to use caps-lock effectively. Here’s an example. Type in: mustinherit class Foo<Enter>public mustoverride function Count as integer<Enter><Delete Line><End><Enter>class Bar<Enter>inherits foo<Enter><Down><Down>return 0<Down> And what you will see is: MustInherit Class Foo Public MustOverride Function Count() As Integer End Class Class Bar Inherits Foo Public Overrides Function Count() As Integer Return 0 End Function End Class Only 3 two-key combinations are required for the above in VB. (The initial declarations of Foo, Count, and Bar.) For comparison, the same code requires at least 16 two-key combinations in C#. Furthermore, the VB code doesn’t require cleanup for formatting and indentation, making it even faster to write equivalent code. Better NamingBy disallowing multiple symbols with names differing only by case, VB forces you to pick less ambiguous names. This leads to better readability than common code in other languages. For example, I’ve often seen code like the following: Foo foo = new Foo(); If you read this aloud then both “foo” symbols sound exactly alike (which is the definition of less-readable). To make the best use of VB, you should also pick a naming convention that works well. For example, by always prepending or appending an underscore to private fields, you are increasing the number of two-key combinations required. A better convention is to prepend “my” or "m" to private fields, and to use camel-case for all other names. OtherConcatenation OperatorUsing the same operator for addition and concatenation leads to less readable code. By having a separate concatenation operator, the following code can compile. Dim s As String = 123 & "456" Note: If the + operator were used instead, then this would be a compile error. 
Optional Duck Typing
VB supports optional relaxing of type rules. When “Option Strict Off” is specified for a file, then you no longer have to explicitly specify the type when declaring variables. VB will also automatically convert types. For example:
Console.WriteLine(123 + "456")
will print “579”, not “123456”. VB will also allow access to any public member of a type as long as it can be found at runtime. This is what’s known in Python as “Duck Typing”, because "If it walks like a duck and quacks like a duck, it must be a duck". VB is the only language I know which allows selectively enabling this feature.
Optional Explicit Declaration
Completely separate from the question of type safety is whether to require variable declarations. Even when duck typing is desired, often it’s nice to still enforce variable declaration. This can eliminate some common typos, but can be selectively disabled just as with type safety.
Declarative Events
VB supports the same events and delegates as C#, but also provides the ability to specify event handlers in the signature of a method.
Class Foo
    WithEvents myCon As Bar.Connection
    Sub OnConnected() Handles myCon.ConnectedEvent
    End Sub
End Class
In this example, the OnConnected method is automatically “wired-up” to the myCon Connection object. This can often be more readable than the alternatives.
Reference Parameters
Occasionally it’s useful to be able to pass arguments by reference. This is supported by most (all?) .NET languages.
Sub ChangeTwoInts(ByRef n As Int32, ByRef m As Int32)
    n += 1
    m += 2
End Sub
Named Parameters
VB allows specifying explicit names for parameters. One place this can be useful is when calling a function that takes a Boolean.
Public Sub DoSomething(ByVal isFinished As Boolean)
End Sub
b.DoSomething(isFinished:=True)
This makes the code much more readable, and eliminates the need for a comment or temporary variable to clarify the code.
Optional Parameters
VB also allows optional parameters, although using this feature for public methods is discouraged because doing so makes the code non-portable. For example, C# doesn’t support code with optional parameters. Still, optional parameters can eliminate code duplication that would be required with method overloading, and shouldn’t be avoided for internal functionality or when portability isn’t important.
Parameter Arrays
VB supports creating methods that take 0 or more optional arguments of a single type. For example:
Public Sub Foo(ByVal ParamArray args() As String)
This function accepts Foo(), Foo("a"), Foo("a", "b"), etc.
Powerful Select
The VB Select statement is similar to the switch statement provided in C-like languages, except that it’s more flexible. Here are examples that use the advanced VB features.
Select Case age
    Case 0, 5, 10
        Console.WriteLine("0,5,10")
    Case Is > 50
        Console.WriteLine(">50")
    Case Is < 10
        Console.WriteLine("<10")
    Case 16 To 19, Is > 30
        Console.WriteLine("16-19 or > 30")
    Case Else
        Console.WriteLine("else")
End Select
Dim name As String = "Justin"
Select Case name
    Case "Abe" To "Barney"
        Console.WriteLine("a-b")
    Case "Boris" To "Kevin"
        Console.WriteLine("b-k")
    Case "Kevin" To "Vincent"
        Console.WriteLine("k-v")
    Case Else
        Console.WriteLine("else")
End Select
Full Featured Exception Catching
The .NET framework supports an optional filter for catch statements. VB exposes this functionality, allowing you to catch exceptions only when some criterion is met. For example:
Try
    ' do something
Catch ex As Exception When LoggingIsEnabled
    Log(ex)
End Try
Refactoring
VB has some basic Refactoring support built in, and a license for Refactor! which provides even more. I’m generally not a big fan of refactoring tools, but the VB stuff is pretty intuitive and unintrusive. The important ability to rename things is what I use most.
Powerful Imports
VB can use the Imports statement for more than just namespaces.
For example, if I include “Imports System.Console” at the top of a file, then I can call “WriteLine()” directly without having to specify the Console class. SummaryDespite the length of the above list, I’m sure I’ve forgotten some features, but maybe it will entice you to download the free Express version and try it yourself. Overall I think that VB is the best language for most programming, and it should get even better in the future. The next version will likely introduce Type Inference, Lambda Expressions, Closures, LINQ, and other enhancements. In my next post I’ll discuss a large list of new features and changes that I personally would like to see in a future version of VB. January 16 Visual BasicI’ve been wanting to write some posts about Visual Basic for a long time. It’s one of the reasons I finally started blogging. While I haven’t been able to use VB for much beyond personal projects for the last 8 years, I had used it pretty extensively in the past. More recently, I’ve been using VB.NET for various personal projects, and I thought I’d share some of my thoughts and experiences. It seems that VB has always been misunderstood. Versions <= 6 have an undeserved reputation as a “toy” language, and this has carried over to some extent for VB.NET, even though it’s now almost identical to Java or C#. I feel VB was also misunderstood as a tool only suitable for database front-end applications. I actually never felt it was a very good tool for writing these types of programs. MS Access (which I suppose can be considered a VB dialect) was a far better choice, because of its improved support for bound controls and built-in reporting tools. For years VB was the most logical choice for many types of applications, but my problem was that it wasn’t terribly suitable for the kinds of programming that I prefer. With the advent of VB for .NET, I think VB may actually be the best choice for writing many applications that have been traditionally developed in C++. 
The Java community has proved that it’s at least possible to write Databases, Compilers, Development Tools, and other systems programs using a similar platform, and from a purely technical perspective, I think Visual Basic is a better choice than C# or Java for many of these kinds of applications. In coming posts, I’ll detail my reasons for believing so as well as a pretty exhaustive list of what I think could/should be done to make VB even better in the future. In my next post, I’ll discuss what I think are the best features of VB.
December 12
Lambda Expressions In Visual Basic
Paul Vick had a post last week, where he solicited feedback on syntax for this feature that will likely be included in a future version of Visual Basic.
http://www.panopticoncentral.net/archive/2006/12/08/18587.aspx#FeedBack
I originally posted my opinion in a comment on his blog, but it apparently didn't make it past the censors. He proposed 3 possible syntax variations, but I'm not fond of any of them. My proposal is a fourth style, and I think it has a lot of merit.
What's a Lambda Expression?
This is probably better explained elsewhere, but basically it is an anonymous function. For example, given the following code,
...
Class Test
    Private myNames As List(Of Name)
    ...
    Public Sub PrintLegalNames()
        For Each n As Name In myNames
            Console.WriteLine(n.ToString())
        Next
    End Sub
End Class
we could replace the explicit loop using List.ForEach.
Public Sub PrintLegalNames()
    myNames.ForEach(AddressOf WriteName)
End Sub
However, for this to work, we have to provide a suitable WriteName function.
Public Sub WriteName(ByVal n As Name)
    Console.WriteLine(n.ToString())
End Sub
My proposal is to support the following syntax.
Public Sub PrintLegalNames()
    myNames.ForEach(Console.WriteLine(ByVal(0).ToString()))
End Sub
The lambda expression is just a normal expression that uses parameters and return statements inline.
Here are some more examples that show multiple parameters, functions, and ByRef parameters compared to roughly equivalent code.
Dim ary(0 to 9) As Integer
Array.Find(ary, Return ByVal(0) = 2)
Function Find2(ByVal x As Integer) As Boolean
    Return x = 2
End Function
Array.Find(ary, AddressOf Find2)
Array.ForEach(ary, ByRef(0) += 1)
Sub AddOne(ByRef x As Integer)
    x += 1
End Sub
Array.ForEach(ary, AddressOf AddOne)
Array.Sort(ary, ByVal(1) >= ByVal(0))
Function Less(ByVal lhs As Integer, ByVal rhs As Integer) As Boolean
    Return lhs < rhs
End Function
Array.Sort(ary, AddressOf Less)
What about closures?
It wasn't mentioned in Paul's post, but I think closures are also a necessary feature so that the following code will work.
Sub SendPartyInvitation(ByVal state As State)
    ...
    Dim legal As Integer = state.LegalDrinkingAge
    Dim names As List(Of Name) = CreateList()
    ...
    tmp = List.FindAll(names, Return ByVal(0).Age >= legal)
    ...
End Sub
The key feature is that the "legal" local variable is available for use within the lambda expression, which would not be the case if each expression were simply an anonymous function.
November 18
Creating Text Files - Downloads
The code (create_text_file.zip) can now be found in my SkyDrive public folder.
To build the C++ samples, you'll need to create project files using MPC. The C++/CLI and VB.NET projects use Visual Studio 2005 (The free Express edition will probably work.) The Java projects use Eclipse and Java 1.5.
MPC is a program that a coworker and I designed to make working with C++ projects much easier. Most of the C++ code that we write needs to run on a wide variety of platforms, and supporting all the different build tools can be a pain. Rather than including project files or makefiles for VC++, nmake, bmake, gmake, etc, we just create simple text files that can be used to generate any of the above.
How simple?
Here's the complete set of MPC files for all seven C++ projects:
create_text_file/create_text_file.mpb -- This is a Base project that is used to set defaults.
project {
    after += timer
    libs += timer
    libpaths += ../timer
    includes += ../timer
}
This basically says that all projects that derive from this base are dependent on a timer library found at ../timer.
create_text_file/create_text_file.mwc -- This is a workspace file used to group related projects
workspace {
    cmdline += -static
    implicit = create_text_file
    exclude {
        FileStreamDotNet
    }
}
This says to automatically create a project in any suitable subdirectories containing either C++ source code or .mpc files. Each implicitly created project will automatically inherit from the .mpb above. We also exclude one directory, because it contains a C++/CLI .NET project.
create_text_file/timer/timer.mpc -- This is an explicit project file
project {
staticname = timer
}
This file causes a project to be created with the name timer. There are staticname and sharedname keywords, allowing you to specify different names depending on whether you generate a project to create a dynamic or static library. If only one name is specified then the other defaults to the same name.
Actually, the timer.mpc file is totally unnecessary, because it merely specifies the same behavior as the default.
I also explicitly specify that a static library will be created by including a cmdline += -static in the .mwc.
You can read more about MPC and download it here.
To run MPC, you'll also need Perl.
To create the Visual Studio 2005 version of these projects unzip create_text_file.zip to a directory, and run mwc.pl like so:
c:\create_text_file>mwc.pl -type vc8
This should create a .SLN file which can then be used to build and run the projects.
November 17
Creating Text Files - Coming Soon...
I forgot to include the source code in my previous posts, but now I can't figure out an easy way to do it. Why would my blogging server allow posting pictures, but not other files? It looks like I'm going to have to find some other server to hold the files. What a hassle.
November 16
Creating Text Files - Conclusion
SCSI Blues
On the one SCSI machine, the write speed was 2-5MB/s regardless of implementation or configuration options if WriteThrough was enabled. This probably just means that the drive does a more thorough job of honoring the WriteThrough setting. However, the SCSI drive should have been able to achieve something close to its 125MB/s specified transfer rate.
One thing I tried to improve the performance on the SCSI machine was to install Windows Server 2003 R2. This had no benefit until I enabled a new “Enable advanced performance” option for the disk driver. This seemed to help quite a bit, with the slowest tests now in the 9MB/s range. This is still quite a bit slower than the fastest system (#1), however.
results (win32 2000MB)
Write Through + Disable Caching + Defrag = 9MB/s @ <1% CPU
As above with 16MB buffer = 53MB/s @ <1% CPU
Write Through + Defrag + 16MB buffer = 58MB/s @ 28% CPU
Defrag = 55MB/s @ 20% CPU
Defrag + 16MB buffer = 47MB/s @ 21% CPU
compared to system #1
Write Through + Disable Caching + Defrag = 55MB/s @ <2% CPU
As above with 1MB buffer = 68MB/s @ <1% CPU (>1MB made no difference)
Write Through + Defrag + 1MB buffer = 68MB/s @ 8% CPU
Defrag = 90MB/s @ 10% CPU
Defrag + 1MB buffer = 56MB/s @ 6% CPU
A Grain of Salt
The performance measurements above paint a certain picture which can be a little misleading. One thing they don’t show is the subjective disk thrashing that seemed to occur without Defragmenting. They also don’t show how repeatable the results were from run to run. The numbers only represent the best result I was able to achieve with the given options, but some tests were very consistent, while others varied widely. For instance, the .NET results seemed to vary more than the others. I also found that the Defrag option made a bigger difference on my older home machine #2 than it did on my work machine #1. I suspect this was due to improvements in the newer version of the hard disk. Before putting too much stock in these measurements, you should run them using your own platforms and compilers.
Conclusions
System 1 Results for 100MB
Creating Text Files - Other Languages and PlatformsSome of the C++ programs above (fast_ofstream in particular) were fairly difficult to implement, and I wanted to see what could be achieved using a more rapid development tool. So far I’ve implemented the test in Visual Basic 2005, and Java (2 ways). Adam Mitz (a coworker) also contributed a C++/CLI implementation for comparison. I also tested many of the programs on Macintosh as well as Linux, and these platforms achieved similar benefits. In the future I’d like to try the tests under Mono on Linux, and possibly implement a Java version using FileOutputStream, since NIO didn’t seem to provide an advantage. I ran tests on multiple systems.
The results quoted throughout this article are those from the first configuration above, and this system unsurprisingly also gave the best performance. All of these systems gave very similar relative results with the exception of #3, which seemed due primarily to the use of a SCSI disk drive. Visual Basic 2005I chose to implement the test in VB, because I find the language and IDE allow me to write more quickly than other .NET alternatives. I think the VB language has a lot of problems (such as some of the keyword names), but I find that case insensitivity, line orientation, English keywords, and other features make it much easier to work with than the C-like alternatives. In the end it only took a few hours to create the VB version of the program, and it was actually this program that led me to create the fastest C++ programs. I was pretty satisfied with C++ fast_ofstream when it was running at 40MB/s, but the VB program was much faster with much less development effort. This led me to develop the C++ FileStream program which then led to further improvements in fast_ofstream.
One problem with .NET is that the System.FileStream object doesn’t support disabling the system cache. I was therefore unable to measure that option. Another problem is that all Strings are Unicode, and there is some small overhead outside the main loop to encode them as ASCII. While this may affect this test, it also would be trivial to change the VB program to write out UTF8, UTF16, UTF32, and other formats. I wouldn’t even want to try this with the C++ version. For what it’s worth the VB program, despite the apparent verbosity of the language, was actually shorter than most of the C++ versions. Sub WriteSampleFile() Dim pt As New PerfTimer Dim fname As String = OUT_DIR & FILESIZE_MB & "MB.txt" Dim fs As FileStream = CreateFile(fname) Dim numLines As Long Dim totalBytes As Int64 = FILESIZE_MB * 1024L * 1024L Using fs Console.WriteLine("Creating {0}MB file.", FILESIZE_MB) If DEFRAGMENTED Then Dim reserve As Long = ((totalBytes \ BUFSIZE) + 1) * BUFSIZE fs.SetLength(reserve) End If Dim enc As Text.Encoding = Text.Encoding.ASCII Dim msg1() As Byte = enc.GetBytes("All work and no play makes ") Dim msg2() As Byte = enc.GetBytes(" a dull boy." & Environment.NewLine) Dim tabs() As Byte = {9, 9, 9, 9, 9, 9, 9, 9} Dim names As List(Of Byte()) = CreateNames() Do numLines += 1 fs.Write(tabs, 0, CInt(numLines Mod tabs.Length)) fs.Write(msg1, 0, msg1.Length) Dim name() As Byte = names.Item(CInt(numLines Mod names.Count)) fs.Write(name, 0, name.Length) fs.Write(msg2, 0, msg2.Length) Loop Until fs.Position >= totalBytes fs.SetLength(totalBytes) End Using pt.PrintElapsed("Done. ") Dim tp As Double = CalcThroughput(FILESIZE_MB, pt.ElapsedWall) Console.WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp) End Sub C++/CLII want to thank my coworker Adam Mitz for contributing a C++/CLI version of the program. This version was a little faster, but .NET performance varies quite a bit, and the VB version and C++ version have about the same performance overall.
int main(array<String^>^) { using IO::FileStream; PerfTimer pt; String^ fname = gcnew String(OUT_DIR); fname += FILESIZE_MB; fname += "MB.txt"; long long totalBytes = FILESIZE_MB * 1024 * 1024; Console::WriteLine("Creating {0}MB file.", FILESIZE_MB); FileStream^ fs = CreateFile(fname); if(DEFRAGMENTED) { long long reserve = static_cast<long long>(((totalBytes / (double)BUFSIZE) + 1) * BUFSIZE); fs->SetLength(reserve); } array<unsigned char>^ msg1 = Text::Encoding::ASCII->GetBytes("All work and no play makes "); array<unsigned char>^ msg2 = Text::Encoding::ASCII->GetBytes(" a dull boy.\n"); array<unsigned char>^ tabs = {9, 9, 9, 9, 9, 9, 9, 9}; array<array<unsigned char>^>^ names = { Text::Encoding::ASCII->GetBytes("Jack"), Text::Encoding::ASCII->GetBytes("Justin Michel"), Text::Encoding::ASCII->GetBytes("Fred Flintstone"), Text::Encoding::ASCII->GetBytes("Barney Rubble"), Text::Encoding::ASCII->GetBytes("Homer J. Simpson"), Text::Encoding::ASCII->GetBytes("John Jacob Jingleheimer Schmidt") }; long numLines(0); do { ++numLines; fs->Write(tabs, 0, numLines % tabs->Length); fs->Write(msg1, 0, msg1->Length); array<unsigned char>^ name = names[numLines % names->Length]; fs->Write(name, 0, name->Length); fs->Write(msg2, 0, msg2->Length); } while(fs->Position < totalBytes); fs->SetLength(totalBytes); pt.PrintElapsed("Done. "); double tp = CalcThroughput(FILESIZE_MB, pt.ElapsedWall); Console::WriteLine("Wrote {0} lines at {1} MB/s", numLines, tp); return 0; } Java NIOI first implemented the Java version using NIO rather than the much simpler FileOutputStream, because the latter didn’t appear to support the necessary features to create defragmented files, and I thought that NIO should give the best potential performance. Later, I rewrote the program using FileOutputStream, because the complexity of using NIO didn’t seem to have any perceivable benefit. The first thing I noticed is that this implementation of the test program was far more difficult than any of the others. 
This is despite the fact that I have the most recent experience with Java, and had even implemented a project using NIO within the last six months. The biggest problems involved the complication of dealing with ByteBuffer and String encoding. Compare the code needed to send a String encoded as ASCII in VB.NET. Dim enc As Text.Encoding = Text.Encoding.ASCII Dim buf() As Byte = enc.GetBytes("This is a test.") fs.Write(buf, 0, buf.Length) To the equivalent Java NIO.
Another problem with the Java implementation is that I couldn’t find a good way to implement most of the features. There’s no capability to specify Write Through, Disable Caching, or Sequential options. Furthermore the FileChannel.truncate() method doesn’t support extending a file. I had to resort to using the same Defragment method that I used for ofstream.
I didn’t know a way to measure CPU usage in Java, but it seemed to be about 100%. The much simpler FileOutputStream implementation gave similar performance.
Next, I'll try to wrap this all up with some conclusions and a summary of what we've learned.
Creating Text Files - fast_ofstream and Custom FileStreamfast_ofstreamAs we saw with the analysis of the ofstream program we can achieve close to optimal throughput, but not with advanced features such as defragmenting that we require. One solution that doesn’t involve starting from scratch is to write a custom std::streambuf implementation that can be used with the standard C++ io streams. The application using fast_ofstream was able to write a defragmented file with a single loop using tellp() at 75MB/s. If we enable Write Through then the Defragment option becomes important. reserveBy using our own streambuf implementation, we can implement a better Defragment feature. We use an interface similar to std::vector in which we call a reserve() method passing the desired size in bytes. This is much better, because although we allocate space for the reserved size, we don’t actually have to use the space or know for sure how big the file will be. Any unused space will automatically be reclaimed when the stream is closed. tellp()Since we’re implementing our own buffer, we can make tellp() more efficient, thereby allowing us to use the simpler code from the original example while still retaining most of the performance benefits. final results (100MB)
The speed is close to what we observed with the best case using the default ofstream, but we’re able to use more advanced features, the most important being Defragment. Defragment is crucial to avoid overall file system fragmentation and to ensure optimal read speed. The current fast_ofstream implementation is not very robust, and it’s possible that even faster results could be achieved. The biggest remaining problems are:
Custom FileStreamIt should be possible to achieve even better performance by avoiding the standard C++ iostream interfaces completely. I created a simple class as a test case. enum FS_OPTIONS { fsNone = 0, fsDisableCaching = 1, fsWriteThrough = 2, fsSequential = 4 }; class FileStream { FileStreamImpl* impl_; public: FileStream(const std::string& fname, FS_OPTIONS opts); ~FileStream(); void write(const std::string& s); void write(const std::string& s, std::string::size_type offset, std::string::size_type length); unsigned long long size() const; void reserve(unsigned long long bytes); private: FileStream(const FileStream&); FileStream& operator=(const FileStream&); }; final results (100MB)
For many applications this could be worth the effort of implementing a custom FileStream. Next I'll discuss the use of other languages and platforms.
Creating Text Files - Optimizing std::ofstreamWhere to start?Just to make sure I was on the right track, I decided to write a quick proof-of-concept Windows program to verify that I could come close to the desired performance. This program works by creating a 64K buffer filled with ‘X’ characters, and writes it repeatedly until it gets a file of the desired size. The program supports the following options.
With this test program I was able to achieve 77MB/s with caching disabled and 145MB/s with caching enabled for 1000MB test files. Smaller files were even faster, sometimes approaching 1200MB/s effective speeds with caching enabled. For this test, the Sequential Scan option didn’t seem to make much difference. This makes sense, because I’m already accessing the file in fairly large chunks. It’s possible that this flag is of primary benefit when reading smaller amounts of data, or that the hint is ignored by my drivers and/or hardware. The Write Through and Disable Caching options seem to have about the same effect. Without these options the program is obviously faster than the disk can physically handle. The Defragment option was a little less conclusive. When I had all the above options turned on, I was only able to write at 65MB/s, but turning off Defragment only slowed it down to 63MB/s. However, without Write Through the Defragment option was good for 8MB/s difference. Despite this, it was very apparent when the Defragment option was enabled from a disk noise perspective, and as I mentioned previously, file fragmentation is of paramount importance when reading the files. Optimizing std::ofstreamNow that I had some idea of what was possible given the underlying OS, I decided to attempt to speed up the original ofstream program. Some of the changes that made the most difference were surprising, and should be helpful when optimizing any program that uses ofstream. defragmentofstream does not provide an option to directly create a large file, but by using the following statements I was able to create a defragmented file.
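A minimal sketch of one way to do this with nothing but ofstream, assuming the common seek-and-write trick: seek to the last byte of the intended size, write one byte so the file is extended in a single step, then rewind and write normally. The helper name preallocate is hypothetical, and whether the file system actually hands back one contiguous run is up to the OS.

```cpp
#include <fstream>
#include <ios>

// Hypothetical helper: extend a freshly opened file to `bytes` by writing
// a single byte at the final offset, then rewind to the start so normal
// sequential writing can begin.
bool preallocate(std::ofstream& out, std::streamoff bytes) {
    if (bytes <= 0) return false;
    out.seekp(bytes - 1);   // jump to the last byte of the target size
    out.put('\0');          // writing here forces the file to grow
    out.flush();
    out.seekp(0);           // back to the beginning for the real data
    return out.good();
}
```

If fewer bytes are written than were reserved, the file keeps its padded length, so this approach only fits cases where the final size is known up front.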
Unfortunately, this also slowed the program to 5.5MB/s from the original 11MB/s. For some applications this might be a worthwhile tradeoff, but hopefully this optimization will work better when combined with others. buffer sizeOne problem with the existing program is that it takes 1200+ writes per MB to build the file. We should be able to improve performance by telling the ofstream to use a larger buffer.
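As a rough sketch of that change, assuming the standard pubsetbuf() route: the buffer is handed to the stream's filebuf before the file is opened, since what setbuf does after opening is implementation-defined. The wrapper name BufferedFile is hypothetical; it exists only to make sure the buffer outlives the stream.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <vector>

// Hypothetical wrapper: owns both the buffer and the stream. Members are
// destroyed in reverse order, so the stream is flushed and closed before
// the buffer is freed.
struct BufferedFile {
    std::vector<char> buffer;
    std::ofstream out;

    BufferedFile(const char* path, std::size_t bufsize) : buffer(bufsize) {
        // Install the buffer first; opening afterwards lets the filebuf
        // pick it up on the common implementations.
        out.rdbuf()->pubsetbuf(buffer.data(),
                               static_cast<std::streamsize>(buffer.size()));
        out.open(path, std::ios::out);
    }
};
```

With this in place, BUFSIZE is just the second constructor argument, for example BufferedFile f("sample.txt", 64 * 1024), and the writing loop uses f.out exactly as before.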
When I ran this program with BUFSIZE=64K I was surprised to find that it slowed from 11MB/s to 1MB/s. What I discovered is that smaller BUFSIZE values actually increased performance. By using 1K I was able to achieve 15MB/s. This is a good example of why you should always test for performance before making design decisions based on faulty assumptions. This is not premature optimization. Addressing these issues early in a project before it’s too late is just good engineering. Interestingly, BUFSIZE=64K resulted in 1037 writes/MB while BUFSIZE=1K resulted in 2000 writes/MB. Also, turning on the defragment option had no additional negative performance impact when using BUFSIZE=64K.
use ios::binary
Opening the stream in binary mode results in a speed increase to 23.5MB/s. This prevents automatic translation of ‘\n’ to “\r\n”, but has no other effect as far as I can tell. Furthermore, the Defragment option only slows this to 19MB/s, and the Buffer Size option now works as originally expected, decreasing the number of disk writes to 16 per MB (1MB/64K=16) resulting in 26MB/s. Combining Defragment with Buffer Size was still 19MB/s, however.
writing single chars
Changing the loop that writes out tabs to write '\t' instead of "\t" results in a 0.7MB/s speedup.
better indentation
Although writing '\t' instead of "\t" was an improvement, we can do better by creating a string of tab characters, and then writing a variable number of them.
const string tabs(MAX_TABS, '\t');
…
int indent = num_lines % MAX_TABS;
if (indent > 0) {
    out.write(tabs.c_str(), indent);
}
This results in 32MB/s with all options except Defragment enabled.
eliminate tellp()
When I first ran the (so I thought) fully optimized version of the program on my Mac Mini under OSX I was surprised to find that I could only get 0.8MB/s. A little tinkering with the code revealed that the call to tellp() was taking most of the time. I eliminated tellp() and the Mac Mini jumped to 40MB/s.
const string tabs(MAX_TABS, '\t');
const string msg1 = "All work and no play makes ";
const string msg2 = " a dull boy.\n";
// Calculate num_lines in separate loop first
for (int bytes = 0; bytes < num_bytes;) {
    ++num_lines;
    bytes += num_lines % MAX_TABS;
    const string& name = names[num_lines % names.size()];
    bytes += name.length();
    bytes += msg1.length();
    bytes += msg2.length();
}
for (unsigned line = 1; line <= num_lines; ++line) {
    int indent = line % MAX_TABS;
    if (indent > 0) {
        out.write(tabs.c_str(), indent);
    }
    const string& name = names[line % names.size()];
    out << msg1 << name << msg2;
}
The first loop duplicates the logic in the original loop without actually writing anything. This allows the second loop to use a simple for() loop, because we now know exactly how many lines to write. This is the first change that really complicated our code, but it also has a big performance benefit. The program now writes at 60MB/s.
writing strings
One surprising thing I found was that writing std::string can be significantly slower than writing const char*. Simply changing the above code to write:
out << msg1.c_str() << name.c_str() << msg2.c_str();
resulted in a speedup to 75MB/s. Making this change before the tellp() change did not have any significant effect.
ostream::write
Looking at the source code for operator<<(ostream& out, const char* str) showed that it essentially calls “out.write(str, strlen(str))”. We already know the length of each string, so we should be able to write faster using:
out.write(msg1.c_str(), msg1.length());
out.write(name.c_str(), name.length());
out.write(msg2.c_str(), msg2.length());
This resulted in a speedup to 80MB/s.
defragment revisited
With all other options in place, writing the file defragmented now runs at 40MB/s.
final results (100MB)
The biggest remaining problems are:
Performance Quiz - Creating Text Files
I have an interesting programming challenge. Write a program to create a sample text file of a specified size in megabytes. The program should write out as many lines as possible of the format “[tabs]All work and no play makes [name] a dull boy.”, which you may recognize as the text that Jack Torrance types repeatedly in 1980’s The Shining. It seems appropriate. The program should provide a little variation to the lines by starting each line with a varying number of tabs, and alternating the name in the sentence. Here’s a C++ program that does the job:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
const int FILE_BYTES = 100 * 1024 * 1024;
const int MAX_INDENT = 8;
const int NUM_NAMES = 6;
int main(int argc, char* argv[]) {
    const string names[NUM_NAMES] = {
        "Jack",
        "Justin Michel",
        "Fred Flintstone",
        "Barney Rubble",
        "Homer J. Simpson",
        "John Jacob Jingleheimer Schmidt"
    };
    ofstream out("sample.txt", ios_base::out);
    for (unsigned num_lines = 1; out.tellp() <= FILE_BYTES; ++num_lines) {
        for (unsigned int i = 0; i < num_lines % MAX_INDENT; ++i) {
            out << "\t";
        }
        const string& name = names[num_lines % NUM_NAMES];
        out << "All work and no play makes " << name << " a dull boy.\n";
    }
    return 0;
}
When I run this program, I find that it’s able to generate a 100MB file at almost 11 MB/s; however, I noticed several issues that indicate this could be improved.
It was surprising to me that the above program was CPU-bound. After all, there’s not much apparent processing going on. Few calculations are done within the loop, and it writes the same short strings over and over. If this simple program uses 100% of the CPU, then any less trivial program is only going to use more. Ideally we’d like to have close to 0% CPU usage so that the program could make use of the CPU for other functionality. For example, an application with a logging facility using code similar to that above would spend all the limited computer resources on logging, with nothing left over for running the actual application.

The noisy disk activity provides a clue to the biggest single problem. C++ ofstream will automatically grow the file as needed, but if you think about it this can only lead to excessive file fragmentation. The OS has no idea how big the file is going to get, so it will allocate small chunks as needed. What we’d like is some way to tell the OS that we’re preparing to write 1GB so that it can allocate one nice consecutive chunk. This should allow optimal write speed assuming we write everything sequentially, but more importantly it will help prevent disk fragmentation and allow the file to be read in much faster. (Previous testing has shown that a fragmented file can take 10-15 times longer to read.)

Another reason for the excessive disk activity is that ofstream seems to use a small buffer. I know this because my real test program has instrumentation that measures Disk, CPU, and Memory usage, and it shows that a 100MB file is written using 125,820 disk writes.

I found the following review of my hard disk, which indicates that I should be able to achieve about 75MB/s, presumably with caching disabled.

http://techreport.com/reviews/2006q2/raptor-wd1500/index.x?pg=12

I also checked the specs on the manufacturer’s website, which claim that 84 MB/s should be possible.
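To put the small-buffer observation in numbers: 100MB spread over 125,820 disk writes works out to roughly 833 bytes per write. The sketch below shows that arithmetic and one way a larger buffer could be requested while still using std::ofstream. The helper names (average_write_bytes, write_with_buffer), the file path, and the buffer size are made up for illustration, and whether pubsetbuf() is actually honored is implementation-defined, so treat this as an experiment to measure rather than a guaranteed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Average number of bytes per disk write,
// e.g. 100MB over 125,820 writes is roughly 833 bytes.
double average_write_bytes(double total_bytes, double num_writes) {
    assert(num_writes > 0);
    return total_bytes / num_writes;
}

// Hypothetical helper: writes up to total_bytes of sample lines through an
// ofstream that has been asked to use a buffer of buffer_bytes.
// Returns the number of bytes written, or 0 on failure.
std::size_t write_with_buffer(const std::string& path,
                              std::size_t buffer_bytes,
                              std::size_t total_bytes) {
    assert(buffer_bytes > 0);

    // The buffer is declared before the stream so that it outlives it.
    std::vector<char> buffer(buffer_bytes);
    std::ofstream out;

    // Request the buffer BEFORE opening the file. The standard leaves the
    // effect implementation-defined, so the library may ignore the request.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));

    // Binary mode (an assumption of this sketch) keeps the on-disk size
    // equal to the number of bytes written on every platform.
    out.open(path, std::ios_base::out | std::ios_base::binary);
    if (!out) {
        return 0;
    }

    const std::string line = "All work and no play makes Jack a dull boy.\n";
    std::size_t written = 0;
    while (written + line.size() <= total_bytes) {
        out.write(line.data(), static_cast<std::streamsize>(line.size()));
        written += line.size();
    }
    out.close();
    return out ? written : 0;
}
```

Calling write_with_buffer("sample.txt", 1024 * 1024, 100 * 1024 * 1024) with a few different buffer sizes, and comparing the number of disk writes reported by your own instrumentation, should show whether the library actually uses the larger buffer.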
The other problem with using ofstream is that you don’t have any mechanism for specifying advanced features. For example, in Windows you can create a file with flags that specify Compression, Encryption, Temporary (tells the OS to avoid writing to a physical file if possible), DeleteOnClose, DisableSystemCache, Sequential/Random access (a hint to the cache), DisableDriveCache, and many other options.

In my next post, I'll attempt to optimize the program while still using std::ofstream.