Wednesday, July 14, 2010

How to Think Like a Programmer: Part 3

This is the third installment in a three-part series on how to think like a programmer.

One of the biggest mistakes that I continually see in the world of technology is choosing short-term expediency over long-term solutions. All too often, a programmer takes shortcuts based on what is easiest, what tools happen to be in his toolbox, or what technologies are currently popular. These compromises produce kludged-together systems that harden into legacy code: nearly impossible to maintain, inefficient, and slow.

Recently on TechRepublic, there was a great discussion on whether or not a developer should try to anticipate their users’ needs. Part of that discussion came around to the idea of just how generic the code should be, which is really a trade-off between initial development time and long-term flexibility. For example, in an OOP language, the programmer can implement every piece of code as a class, ignoring the chance that the class may later need to have been an interface. On the other hand, writing everything as an interface with only one implementing class adds an extraordinary amount of time to the development process. It is indeed a fine line.
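
To make the trade-off concrete, here is a minimal C# sketch of the two extremes; the logger names are just for illustration:

    // Option 1: a concrete class only -- the fastest thing to write today.
    public class QuickLogger
    {
        public void Log(string message)
        {
            System.Console.WriteLine(message);
        }
    }

    // Option 2: program against an interface -- more typing up front, but a
    // second implementation (file, database, test double) can be swapped in
    // later without touching any calling code.
    public interface ILogger
    {
        void Log(string message);
    }

    public class ConsoleLogger : ILogger
    {
        public void Log(string message)
        {
            System.Console.WriteLine(message);
        }
    }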

One place where this trade-off should never be made is in the readability of the code. I do not care what kind of deadline you are under; there is simply no excuse for naming variables “a,” “b,” and “c.” The extra bytes will not break your budget. In fact, for every half second saved while writing the code, you are probably adding three minutes to the debugging time.
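
A contrived before-and-after in C# (the sales-tax scenario is invented purely for illustration):

    public static class PricingExamples
    {
        // Cryptic: the reader must reverse-engineer what a, b, and 1.08m mean.
        public static decimal Calc(decimal a, decimal b)
        {
            return a * b * 1.08m;
        }

        // Clear: the same logic, but every name explains itself.
        private const decimal SalesTaxRate = 0.08m;

        public static decimal TotalWithTax(decimal unitPrice, decimal quantity)
        {
            return unitPrice * quantity * (1m + SalesTaxRate);
        }
    }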

Another shortcut that causes big problems down the road is the inefficient use of libraries. It is all too tempting to load up a million libraries for one easily replicated function. The end result is an application that does not scale well. Take that little Web application you wrote, the one with a million dependencies, and look at its memory usage. “But it is only using a few hundred kilobytes!” you say. Now imagine your application suddenly becoming very popular, or being deployed in a large corporate environment. Multiply “only using a few hundred kilobytes” by 70,000 simultaneous users, and all of a sudden your systems administrator is banging on your door and using words like “refactor the code.” If you are loading a whole library for just one or two small calls, ask yourself if that is something you really need, or something you could re-write within your existing code. That is one nice thing about open source: you can take what you need and leave the rest behind.
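
As a hypothetical example (not aimed at any particular library): if all you need from some sprawling utility package is a routine that strips a phone number down to its digits, a few lines of your own C# will do the job with no extra dependency:

    public static class PhoneUtil
    {
        // Replaces an entire utility-library reference for this one need.
        public static string DigitsOnly(string input)
        {
            var digits = new System.Text.StringBuilder(input.Length);
            foreach (char c in input)
            {
                if (char.IsDigit(c))
                {
                    digits.Append(c);
                }
            }
            return digits.ToString();
        }
    }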

XML (and similar technologies) is another pitfall. I know that I keep harping and nagging about XML, but it bears repeating, since it seems like everyone likes it so much. XML is easy to use, and most major languages now have plenty of libraries and built-in functions to handle it. But it is ridiculously inefficient, both as a transport mechanism and in terms of system resource usage. XML manages to get the worst of all worlds: it spends a zillion more bytes on delimiters than a standard flat file, and it uses a tree structure that is CPU intensive (to say the least!) to parse. If you find yourself reaching for XML for tabular data, think again. Take the twenty minutes to write and test a CSV creator on the data end and a CSV parser on the client end. Not only will you save yourself the overhead of those XML functions, but the data will be significantly smaller and the parsing dramatically faster. Better yet, if you have the same language on each end, serialize your data structures and pass them around; the payload may be bigger than a flat file, but it will be even lighter on the CPU. Microsoft learned this lesson with its Live applications: they switched from XML to JSON and the speed went up dramatically.
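
To show how little code that takes, here is a deliberately minimal C# sketch; note the big assumption that fields never contain commas, quotes, or newlines, which a production-grade parser would have to handle:

    public static class SimpleCsv
    {
        // Data end: join the fields of one record into a CSV line.
        public static string WriteRow(string[] fields)
        {
            return string.Join(",", fields);
        }

        // Client end: split a CSV line back into its fields.
        public static string[] ParseRow(string line)
        {
            return line.Split(',');
        }
    }

So SimpleCsv.WriteRow(new[] { "42", "widget" }) produces "42,widget", and ParseRow reverses the trip with nothing but a string split.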

It has been my experience that debugging and testing often take up at least 25% of the programming process, and frequently extend to 50%. Code is relatively trivial to write if you start with a solid knowledge of the language, the business goals, and some quality pseudo code (which you should be writing anyway to verify that the program meets the customers’ needs). Spending a few extra moments to remove unnecessary lines of code, keep variable naming consistent, give variables proper names, avoid implicit operators unless their use is obvious, and so on goes a very long way towards making the debugging and code review process as smooth as possible. What is the use of saving twenty minutes of coding if you increase your debugging and maintenance effort by 10%?
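
One reading of that implicit-operator advice, in C# terms, using a made-up Money type: a user-defined implicit conversion hides work at every call site, while declaring it explicit keeps the conversion visible in the code:

    public struct Money
    {
        public readonly decimal Amount;

        public Money(decimal amount)
        {
            Amount = amount;
        }

        // Were this "implicit", the line "decimal d = price;" would compile
        // silently. Making it explicit forces a visible cast at the call site:
        //     decimal d = (decimal)price;
        public static explicit operator decimal(Money m)
        {
            return m.Amount;
        }
    }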

Also, learn how your interpreter or compiler handles conditional statements and loops. It is all too easy, out of laziness or ignorance, to write one of these statements in a way that makes your application run much slower than it needs to. For example, if the language you are using evaluates conditionals from right to left, put the conditions most likely to make or break the whole expression on the right. Why evaluate conditions that are unlikely to make a difference? The same holds true for loops. Sure, it is easier to write something like for (intRowCounter = 0; intRowCounter < tableDataSet.Rows.Count - 1; intRowCounter++) {, but your users will be much, much happier if you instead assign tableDataSet.Rows.Count - 1 to a variable and check that variable, as the sketch below shows. This saves the software from walking down the object tree to find the count of the row collection, and then subtracting from it, on every single iteration. Little things like this add up very quickly, especially in a commonly run block of code.
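
Here is that loop rewritten in C#; assume tableDataSet is a System.Data.DataTable and ProcessRow stands in for whatever you do with each row:

    // Before: the property chain is re-evaluated on every single iteration.
    for (int intRowCounter = 0; intRowCounter < tableDataSet.Rows.Count - 1; intRowCounter++)
    {
        ProcessRow(tableDataSet.Rows[intRowCounter]);
    }

    // After: compute the bound once, then loop against the local variable.
    int intRowLimit = tableDataSet.Rows.Count - 1;
    for (int intRowCounter = 0; intRowCounter < intRowLimit; intRowCounter++)
    {
        ProcessRow(tableDataSet.Rows[intRowCounter]);
    }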

Programming with this type of thinking requires a wholesale shift away from short-term expediency towards long-term thinking. I have listed only a few tips here, but I think you get the idea. Ignoring the long-term consequences of your coding for the sake of short-term expediency is the path to buggy, slow code that you will regret having written, and that your users will wish you never had.
