Stages of Programmer Skill
There’s an attitude in the tech industry today that you can place anyone who can write code into any position and expect working stuff. To a certain degree, I suppose that’s true. The question is whether the resulting code will work right, well, and reliably; whether it’ll be secure, handle trouble gracefully, and be maintainable.
This has led me to reflect on my 30+ years in tech and to speculate on a pattern along which we grow. I’ll be the first to admit not everyone will grow in precisely the stages I describe, but I think it’s a fair outline.
- 1. The Stages of Skill
- 2. The Problem of Judging Programmers
Whatever the case, whether it’s Python, C, HTML or R, in looking into their little project they see the power, and they want it. The excitement suckers them in, and now their plans expand and new plans form. They start building things.
Neophytes have no sense of design, so they build haphazardly. They don’t use functions or methods or data structures, because they don’t know what they are yet, thus don’t know how to (or even that they should) use them.
For the neophyte, writing code is slow-going and what comes out is clumsy. They don’t know the libraries well, so their code is confined to a sort of Pidgin, utilizing the most common functions available. Consequently, wheels get reinvented; there is more code than necessary to get a task done.
Alternately, a neophyte bodges something together from some snippets they found on the 'net, complete with unneeded junk inherited through their cargo-cult programming.
Either way, they’re dependent on looking up material (be it on Stack Overflow, in reference manuals, or in example code to cookbook from). Code typically has no error handling at all; the neophyte writes code that assumes the expected input and assumes that every call produces only good, correct results. They hard-code data, buffer sizes, and the names of the input and output files they want to process.
- features poorly scoped variables, with names that make no sense.
- has separate if-true and if-false conditions instead of using else clauses, or uses multiple if statements where a
- may use nested if statements instead of and-ing together several conditions, or successive duplicate if statements instead of
- comments, if they exist, are long-winded explanations of what’s going on.
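The conditional habits in the list above are easier to recognize in code. Here is a minimal Python sketch; the function names and the shipping rule are invented purely for illustration. The first version mimics the neophyte style, and the second expresses the same logic with a combined condition and an else clause.

```python
def shipping_fee_neophyte(total, is_member):
    # Duplicate and nested if statements, with a separate if-false test
    # where an else clause would do.
    fee = 0
    if total >= 50:
        if is_member:
            fee = 0
    if total >= 50:
        if not is_member:
            fee = 5
    if total < 50:
        fee = 5
    return fee


def shipping_fee(total, is_member):
    # One combined condition; else covers every remaining case.
    if total >= 50 and is_member:
        return 0
    else:
        return 5
```

Both functions return the same fee for every input, but only the second makes it obvious that there are exactly two outcomes.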
What the neophyte builds reflects the process they would go through if they did the work by hand, and everything went well.
We all have to start somewhere; in the beginning, we all did something like this. It’s how we learn, how we grow out of being a neophyte and move on to…
She has a growing knowledge of libraries, but it’s far from complete. She still reinvents wheels.
She may code using functions, but they are named as incoherently as her variables. The breakdown of functions makes no sense: instead of coherent, reusable functions, random sections of code have been moved out-of-line. Parameters and behavior are suboptimal: for example, a ProcFile function that “processes a file” but, instead of being passed an already-opened file handle on which to work, is passed a filename and has the side-effect of terminating the program if the file can’t be opened. Or, if it is passed a file handle, it does something unexpected like closing it before returning (an asymmetric behavior).
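As a contrast to the ProcFile example, here is a rough Python sketch of the symmetric design. The names count_lines and count_lines_in_path are hypothetical, and counting non-blank lines is just a stand-in for whatever “processing” means. The worker takes an already-open handle, never closes it, and never exits the program; a thin wrapper owns opening and closing.

```python
import io


def count_lines(handle):
    """Count non-blank lines read from an already-open text handle.

    The caller opened the handle, so the caller closes it; this function
    neither closes it nor terminates the program.
    """
    return sum(1 for line in handle if line.strip())


def count_lines_in_path(path):
    """Open path, delegate to count_lines, and let OSError propagate."""
    with open(path, encoding="utf-8") as handle:
        return count_lines(handle)


# An in-memory handle works as well as a real file, which helps testing.
sample = io.StringIO("alpha\n\nbeta\n")
line_count = count_lines(sample)
still_open = not sample.closed
```

If the file can’t be opened, the wrapper raises an exception and the caller decides what to do about it, rather than having the decision made deep inside a helper.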
Simple data structures appear. There may be a structure, an array, maybe even an array of structures—but nothing truly complex.
The beginner checks some input for validity and checks some error conditions, such as whether a file was successfully opened.
While the beginner’s code is a step up from Pidgin, she hasn’t attained fluency yet; she still consciously reasons out what the code does.
Problems you can expect:
- If there is any dynamic allocation going on, it’s probably unchecked and has leaks.
- File handles that get opened may also be leaked (never explicitly closed).
- There’s no sense of design, no overall structure; things are thrown together haphazardly.
- Comments, if they exist, aren’t on point.
- Algorithms use brute-force approaches.
- Arrays are fixed-size for expected data sets, ready to fail or cause unexpected behavior in the event of excess data.
The beginner builds a tool similar to that of the neophyte, but her code is slightly more solid; rudimentary checks for user error are in place—invalid input, or a wrong filename. Yet there’s still a frequent assumption that all will go well.
The code becomes more organized. Functions start doing discrete tasks, with fewer side-effects. Related functions are kept together in a source file, or organized into a class if the language supports it.
She begins using a wider range of data structures, and begins using some of the more complex ones—but requires time to get them right. This goes hand-in-hand with a better range of algorithms, producing more elegant and/or efficient solutions: instead of brute-force linear array searches, she may use sorted data with efficient binary searches or, oooh, maybe even hash tables or a database if it’s not too hard to set up.
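To illustrate the progression from brute-force search to sorted data and hash tables, here is a small Python sketch. The helper names and the sample IDs are made up for the example; bisect_left comes from the standard library’s bisect module.

```python
from bisect import bisect_left


def linear_contains(items, target):
    # Brute force: examine every element until a match turns up.
    for item in items:
        if item == target:
            return True
    return False


def sorted_contains(sorted_items, target):
    # Binary search; only valid if sorted_items is in ascending order.
    index = bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


user_ids = [42, 7, 19, 3, 88]   # hypothetical sample data
sorted_ids = sorted(user_ids)   # sort once, search many times
id_set = set(user_ids)          # hash-based membership test

found = (linear_contains(user_ids, 19),
         sorted_contains(sorted_ids, 19),
         19 in id_set)
```

All three approaches give the same answer; they differ in how the cost grows with the size of the data, which is the judgment this stage is starting to develop.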
She has a growing knowledge of the library. She may even attempt to use some abstraction, but will at times use it inappropriately, excessively, or bizarrely.
She begins to develop fluency. She understands different looping constructs, how to iterate over things, how to use data structures. She no longer has to work out what to do, then intentionally translate that into code; the language barrier is gone.
She may even have some self-tests for the code.
She may start using source control.
She names functions, types and variables well. Functions are without side-effects, or the side-effects are clearly documented. Functionality is grouped together in an orderly fashion using classes (or by convention if classes aren’t supported). Globals are rare, except perhaps a singleton instance; class variables and internal functions are private unless there’s good reason otherwise—only a public API is exposed.
She probably uses assert() to document function contracts, or adds assertions to assist with debugging.
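As one possible illustration of assertions documenting a contract, here is a short Python sketch. The clamp function is a hypothetical example, not something from a particular codebase.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    # Precondition: the range must not be inverted.
    assert low <= high, "clamp() requires low <= high"
    result = max(low, min(value, high))
    # Postcondition: the result always lies inside the range.
    assert low <= result <= high
    return result
```

Note that Python strips assert statements when run with the -O option, so assertions like these document and check programmer assumptions; they are not a substitute for validating user input.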
She has good judgement in selecting algorithms and data structures, and can use complex data structures correctly and without stumbling.
She reliably manages dynamic memory, checking allocations for success before use and freeing memory on completion.
She uses abstraction correctly.
If she needs to revise a knot in the code, she refactors it into something cleaner instead of creating a bigger snarl of spaghetti.
She documents using JavaDoc or language-equivalent comments, processing them with Doxygen or some other tool.
She uses source control.
She knows the library well, and rarely reinvents things, thus achieves more with less code.
If requirements are lax enough to allow fixed-size arrays, she’ll have a range-check to catch overflows. But she may prefer making it dynamic in the first place; it’s not that painful and avoids all the trouble (bug reports, having to document workarounds, emergency code fixes, potential security issues) when the limit is encountered. Because one day, she knows, it will be fed something unexpected and writing it correctly now will avoid the panic and scramble then.
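A range check on a fixed-size container might look something like the following Python sketch. Python lists already grow on demand, so the fixed capacity here is an artificial stand-in for a fixed-size array in a lower-level language, and the BoundedLog class is a hypothetical name.

```python
class BoundedLog:
    """Hold at most capacity entries and refuse to overflow silently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def append(self, entry):
        # Range check: fail loudly instead of corrupting or dropping data.
        if len(self.entries) >= self.capacity:
            raise OverflowError(f"log is full ({self.capacity} entries)")
        self.entries.append(entry)
```

The dynamic alternative simply drops the capacity test and lets the underlying list grow, which removes the limit and the error path along with it.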
There will be self-tests for the code.
She has started writing multithreaded code, but it may be susceptible to race conditions and such.
Her weak spots are the unusual error-handling cases.
So while the established programmer might do well with dynamic memory under normal circumstances, there may be little leaks or issues when an error condition is encountered. The experienced programmer gets it right in both cases.
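The difference shows up in the error path. Here is a minimal Python sketch, in which TrackedResource and risky_step are hypothetical stand-ins for a real handle or allocation and for real work that can fail. The try/finally block releases the resource whether the work succeeds or raises.

```python
class TrackedResource:
    """Hypothetical stand-in for a file handle, socket, or allocation."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


def risky_step(resource, fail):
    # Simulates work that sometimes fails partway through.
    if fail:
        raise ValueError("simulated failure partway through")
    return "ok"


def use_resource(resource, fail=False):
    # finally runs on the success path and on the error path alike.
    try:
        return risky_step(resource, fail)
    finally:
        resource.release()
```

Without the finally clause, the success case would behave identically and the leak would only appear when something goes wrong, which is exactly the kind of case that tends to go untested.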
Little sneaks by unchecked: error conditions, buffer vs. data sizes, race conditions, potential numerical overflows.
She has a good sense of the care needed when writing threaded code.
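One small example of that care, sketched in Python with the standard threading module: a shared counter whose read-modify-write update is guarded by a lock. The Counter class and run_workers helper are illustrative names only.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not guaranteed to be atomic;
        # the lock ensures only one thread performs it at a time.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, workers=4, increments=1000):
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Without the lock, two threads could read the same old value and each write back old-plus-one, losing an update; such bugs may appear only occasionally, which is what makes race conditions hard to find by testing alone.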
Her code is concise, simple, and clear—and correct, as evidenced by her automated test suite. She aggressively refactors until it meets these criteria.
All her code is kept in source control. And to save herself time, she’s got a check-in hook that automatically rebuilds documentation and runs a build and regression test on code check-in.
Unfortunately, the software industry does quite a poor job of evaluating programmer skill. Hiring managers or HR folks sorting through résumés look for keywords, but don’t understand the subtleties between skill levels. Basically, all they’re asking is, “Can you make this go?”
It would be like watching a couple of episodes of Junkyard Wars and deciding that farmers are perfectly serviceable engineers because they manage to cobble together machines that go just as well as, and probably better than, those the engineers build.
But building one-off machines from junk parts when you’re able to watch over and tweak the resulting machine is very different from building something that works reliably, day after day; that doesn’t have exposed parts ready to rip an arm off; that doesn’t risk burning your house down if it malfunctions while you’re away. There’s more to engineering than making a thing go. Hire those farmers to design your product instead of engineers, and your customers are going to be peeved about your temperamental, hassleful, possibly dangerous product line.
It’s the same thing in software: inexperienced programmers create incomplete code that doesn’t account for possible failures. When lots of this type of code comes together, it results in fragile, unreliable software. Although it works when its environment and input are correct, the code crashes or the application behaves erratically when things are off. Maybe there’s even data loss, or the application scrags its caches or config files, resulting in crashes every time it’s opened because it’s left itself bad data.
Additionally, the Dunning-Kruger effect leads inexperienced programmers to overestimate their skills, thinking that they are hot-shit because of their success with little projects; meanwhile, skilled programmers are humble because they’ve observed the maddening, furious pace of the industry, and been around enough to have a handle on just how little they know. Unfortunately, I doubt that HR folk can grok the difference between an overconfident beginner (perhaps with a few hundred hours of programming experience in just the niche the HR person wants) and an expert (a level attained only after multiple thousands of hours hacking on a variety of projects, languages, and systems).
I haven’t yet concluded where this leads us, but I’m sure it’s nowhere good.