Average Income per Programming Language

Update 8/21:  I’ve gotten a lot of feedback in the comments about issues with these rankings, and have tried to address some of them here. The data there has been updated to include confidence intervals.

———————————————————————————————————

A few weeks ago I described how I used Git commit metadata plus the Rapleaf API to build aggregate demographic profiles for popular GitHub organizations (blog post here, per-organization data available here).

I was also interested in slicing the data somewhat differently, breaking down demographics per programming language instead of per organization.  Stereotypes about developers of various languages abound, but I was curious how these lined up with reality.  The easiest place to start was age, income, and gender breakdowns per language. Given the data I’d already collected, this wasn’t too challenging:

  • For each repository I used GitHub’s estimate of the repository’s language composition.  For example, GitHub estimates this project at 75% Java.
  • For each language, I aggregated incomes for all developers who have contributed to a project which is at least 50% that language (by the above measure).
  • I filtered for languages with > 100 available income data points.
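
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code actually used for this post: the `repos` and `incomes` structures and the function name are hypothetical, and counting each developer once per language is an assumption that the steps above leave open.

```python
from collections import defaultdict
from statistics import mean


def income_by_language(repos, incomes, min_share=0.5, min_points=100):
    """Average income per language, sorted from lowest to highest.

    repos:   list of dicts like
             {"languages": {"Java": 0.75, ...}, "contributors": ["user1", ...]}
             (hypothetical shape; shares are fractions of the repository)
    incomes: dict mapping a contributor id to a household income;
             contributors without income data are simply absent.
    """
    developers = defaultdict(set)
    for repo in repos:
        for language, share in repo["languages"].items():
            # Step 2: only projects that are at least 50% this language count.
            if share >= min_share:
                developers[language].update(
                    user for user in repo["contributors"] if user in incomes
                )

    rows = []
    for language, users in developers.items():
        # Step 3: keep languages with more than `min_points` data points.
        if len(users) > min_points:
            average = mean(incomes[user] for user in users)
            rows.append((language, average, len(users)))
    return sorted(rows, key=lambda row: row[1])
```

Using a set per language means a developer who contributes to several Java projects is counted once for Java; whether the real data was deduplicated this way is not stated above.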

Here are the results for income, sorted from lowest average household income to highest:

Language        Average Household Income ($)    Data Points
Puppet          87,589.29                       112
Haskell         89,973.82                       191
PHP             94,031.19                       978
CoffeeScript    94,890.80                       435
VimL            94,967.11                       532
Shell           96,930.54                       979
Lua             96,930.69                       101
Erlang          97,306.55                       168
Clojure         97,500.00                       269
Python          97,578.87                       2314
JavaScript      97,598.75                       3443
Emacs Lisp      97,774.65                       355
C#              97,823.31                       665
Ruby            98,238.74                       3242
C++             99,147.93                       845
CSS             99,881.40                       527
Perl            100,295.45                      990
C               100,766.51                      2120
Go              101,158.01                      231
Scala           101,460.91                      243
ColdFusion      101,536.70                      109
Objective-C     101,801.60                      562
Groovy          102,650.86                      116
Java            103,179.39                      1402
XSLT            106,199.19                      123
ActionScript    108,119.47                      113

Here’s the same data in chart form:

[Chart: Language vs Income]

Most of the language rankings were roughly in line with my expectations, to the extent I had any:

  • Haskell is a very academic language, and academia is not known for generous salaries
  • PHP is a very accessible language, and it makes sense that casual / younger / lower paid programmers can easily contribute
  • On the high end of the spectrum, Java and ActionScript are used heavily in enterprise software, and enterprise software is certainly known to pay well

On the other hand, I’m unfamiliar with some of the other languages on the high/low ends like XSLT, Puppet, and CoffeeScript.  Any ideas on why these languages ranked higher or lower than average?

Caveats before drawing too many conclusions from the data here:

  • These are all open-source projects, which may not accurately represent compensation among closed-source developers
  • Rapleaf data does not have total income coverage, and the sample may be biased
  • I have not corrected for any other skew (age, gender, etc)
  • I haven’t crawled all repositories on GitHub, so the users for whom I have data may not be a representative sample

That said, even though the absolute numbers may be biased, I think this is a good starting point when comparing relative compensation between languages.

Let me know any thoughts or suggestions about the methodology or the results.  I’ll follow up soon with age and gender breakdowns per language in a similar fashion.

197 thoughts on “Average Income per Programming Language”

  1. XSLT is a pretty old technology, not even a language strictly speaking (it allows to convert XML documents into HTML using a set of XML rules), which didn’t get much hype. I think there aren’t many developers with the skill to maintain it, so they are probably well paid.

    However, I would have thought that older languages such as Haskell would pay much more for the same reason, so maybe that’s not the real reason.

    1. > “XSLT is a pretty old technology, not even a language strictly speaking ”

      XSLT is a language, it is even Turing Complete (http://www.unidex.com/turing/utm.htm). It’s a functional language (like Erlang or Clojure) rather than a procedural one (like Java or PHP). It might be the declarative style of functional languages that makes you think otherwise.

      > “it allows to convert XML documents into HTML using a set of XML rules”

      Actually it allows the transformation of XML to different XML. One of the most common uses for this is to transform XML into XHTML, but that’s just a use case.

    2. Not sure why you believe XSLT is not really a language “strictly speaking.” XSLT 2.0 is a Turing-complete language. In fact, it’s probably the most consistently popular functional programming language outside of academia.

      Not necessarily suggesting we all go out and learn XSLT; while it is technically Turing-complete, that doesn’t mean it’s *easy* to do everything you might like to do in it. However, it is a powerful, expressive (if entirely too verbose…) language.

      1. > In fact, it’s probably the most consistently popular functional programming language outside of academia.

        You mean, apart from SQL and Excel macros? 😛

  2. Average salary is biased toward higher salaries, so it is not useful as a metric in this regard.

    What is the median salary?

      1. When people say ‘Average’, they mean ‘Sum of all numbers divided by number of numbers’, which is the Mean.

        I have never heard of Median and Mode being referred to as averages.

  3. Coffeescript is a new Javascript framework.
    XSLT is a language used for querying XML documents. I’m familiar with it from the digital library world, but that doesn’t pay that much. Could it be one or two organisations with a large codebase throwing off your analysis?

  4. It looks like the most data points are found in the middle of the field. The extremes at both ends are underrepresented (except Java). That is a common finding, because the probability of deviation from a “true norm” would be highest in the smallest sample.

    The big question is: How significantly do these data points deviate from the null hypothesis of “all programming languages average the same pay”?

  5. XSLT transforms XML from one schema to another, so it’s widely used by “middleware”; ie. translation layers between GiantXmlSystemA and GiantXmlSystemB, the kind of things enterprise contractors make.

    I would have expected Haskell to be higher, considering that many financial institutions use and contribute to functional languages (Haskell, OCaml and variants).

    1. Other people have pointed out that many financial institutions don’t allow / encourage open-source contributions, which could certainly skew the numbers here. Unfortunately that’s a limitation of the data here, which only uses open-source contributions.

  6. I work in a Java shop, we use XSLT to render all of our templates. I think XSLT and Java showing up together makes a lot of sense.

    In my experience many Java programmers also use AS3; the syntax is similar to Java, and if you need to do something AS3 is good at, it is a natural choice for a Java shop.

  7. It would be interesting to see median (which is statistically more significant) and standard deviation (which gives us error bars). Could you please add those (or equivalently publish the data set so we can play with it)?

  8. XSLT is a terrible and difficult language, so salaries must be high for such developers.
    Puppet is simple, but the demand for it is very, very high.
    Coffeescript is cool, demand for it is high.

  9. Your post suggests the differences in averages reflect some real differences in the compensation of programmers. Yes, the numeric values obtained in your analysis are not exactly the same. The question is rather: Are they really different?

    To illustrate this: assume person A and B are each tossing the same 3 dice, obtaining [1,3,4], and [2,4,5] as results. Averages are 8/3 and 11/3 respectively — yes, they do differ. But did the dice change in the meantime?? Or will person A consistently get a lower average than person B? Yet this is exactly the kind of conclusion your post appears to suggest…

    There’s an established method to do this check — see http://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing-two-samples/v/hypothesis-test-for-difference-of-means for an example. Any statistics package nowadays supports this test.
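
As a rough illustration of the kind of check described above, here is a minimal sketch using the dice results from this comment. It swaps in an exact permutation test rather than the t-test covered in the linked video, because that needs only the Python standard library; the function name is made up for this example.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(a, b):
    """Two-sided exact permutation test for a difference in means.

    Pools both samples, tries every way of splitting the pool into groups
    of the original sizes, and reports the fraction of splits whose
    difference in means is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    hits = total = 0
    for indices in combinations(range(len(pooled)), len(a)):
        chosen = set(indices)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


p = permutation_p_value([1, 3, 4], [2, 4, 5])
print(p)  # 12 of the 20 possible splits are at least this extreme: 0.6
```

A p-value of 0.6 says a gap this large shows up in most random relabelings of the six rolls, so it is no evidence that the dice changed. Enumerating every split is only practical for tiny samples; for samples the size of the income data, a random subset of splits or a standard t-test would be the usual choice.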

    Also — why use averages at all? Would median, mode or some percentile not be more appropriate?

  10. 1. XSLT is an XML transformation language, used quite a bit in enterprise software.

    2. Puppet isn’t so much a language as a deployment and automation toolkit. Probably doesn’t belong on this list.

    3. CoffeeScript is an alternative to JavaScript that compiles to JavaScript. Very popular in the Ruby on Rails and Node.js circles, but gaining wider popularity in other web stacks as well.

  11. XSLT is a language for basically doing XML -> XML transforms, like turning XML from a file or database into HTML to be served to the user, or into a structure that a different program can use. Used a lot in web services/enterprise. I’ve never heard of the others.

  12. XSLT is an XML Transformation language and is mostly used in enterprise as well. Especially in the “Service” sector where IT plays an enormous role. Transformation of XML data from one format to another is what I see regularly here in the service industry which usually pays pretty decent.

    However, XSLT programming is mostly done in conjunction with another language, like C# or Java which explains why it’s among the higher ranking ones.

  13. I’ve only used XSLT to generate reports off data before. XSLT can be used to get XML from one format to another, and it can be used to generate PDFs. The high salary for it is surprising, but it could be a few people getting specialized jobs.

  14. XSLT is XML data transformation.
    CoffeeScript generates JavaScript; it looks like Python mixed with Haskell.
    Puppet controls server cluster automated installs, not sure why it pays so little.

  15. XSLT is for transforming XML into things like other XML or even XHTML and web pages. It is used heavily in enterprise. The feel good example is that your back end computing would generate XML which would then get converted into many different formats for consumption: web sites for all sorts of clients, APIs/web services, etc. In general it mostly gets used for web services. At a large enterprise, to do one user-initiated action, you may end up contacting a dozen XML/SOAP web services all spitting out XML and having to coordinate their transaction with each server having streaming XML processing with dozens of filters inserting things like security and authentication and rate limiting. Hackers tend to prefer REST and JSON services so wouldn’t encounter it often.

    Puppet is used by system admins to manage multiple servers at once, something like that. I’m a programmer, so I don’t know much about it. It’s from a different profession. CoffeeScript is syntactic sugar for JavaScript. It allows shorter, cleaner syntax and things like that. Its use is kind of dead in big enterprise, though, because big corps often have thousands of programmers they can’t even get to use a modern language or platform, let alone some fancy hacker variant of an established language. If you are a big corp with 1,000 Windows CE programmers, half of those may refuse to learn any new language, and you might only be able to afford bringing in trainers for 100 to get them even up to being web programmers, and that’s for HTML and JS and CSS; no one would waste time and money on CoffeeScript.

  16. Background: mostly games industry, but lots of enterprise like Sony and General Dynamics as well. I have personally known a lot of Actionscript developers and I can say that without a doubt, this is not the highest paid language in the industry. I usually code in a combination of C++, C#, Lua, Javascript, PHP, and SQL. My salary is usually off your scale. But I have never seen any Actionscript job that pays anywhere close to what I earn. Java jobs also tend to top out around 100k but there may be some that pay better than that.

    Another huge surprise in your chart is Cold Fusion… really? This isn’t a dead technology per se, but its demand in industry is near 0. Take a look at Indeed.com’s website. I can’t claim that Indeed.com is an oracle, but based on broad sampling of keywords, they show that Cold Fusion and Actionscript are dying.
    http://www.indeed.com/jobtrends?q=HTML5%2C+cold+fusion%2C+c%2B%2B%2C+Java%2C+Actionscript&l=

  17. It would be interesting to see these broken down by City/Region. For example, folks in Midwestern cities generally do not get paid as highly as those in West Coast cities. Cool stats though. Keep it up!

  18. XSLT is used a lot in Enterprise. XML ‘n all that. Coffeescript is reviled by a lot of JavaScript developers; it’s also very, very new — not many established companies putting Coffeescript codebases into production. Puppet is probably more often used by techops/sysadmin type roles.

  19. I for one would like to know where an XSLT programmer could be banking $100k+. If that’s real, then there’s a gaggle of XSLT programmers in Pittsburgh that would like to talk to someone 😉

  20. I don’t think this methodology can provide reliable data.

    Just one example:
    According to these results any backend developer who is not using Java or Perl is earning less than a frontend developer who writes CSS.

  21. XSLT is likely on the high end because of its use in Enterprise CMSs like HP’s Teamsite/Livesite. Puppet isn’t a language so much as a framework for sys admins. It is interesting that CoffeeScript is lower than JS since they are the same thing. Likely skewed based on the number of people being represented by each.

  22. > Any ideas on why these languages ranked higher or lower than average?

    Smaller samples show higher variability. This is the same reason that small schools dominate tables as both better and worse performers; why different poorly-populated rural areas turn up as both the highest and lowest average health.
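
A small simulation makes this point concrete. It is only a sketch under invented assumptions: every "language" draws incomes from the same normal distribution (the mean of 98,000 and standard deviation of 60,000 are made-up numbers, not estimates from the data), and the only thing that varies is the sample size.

```python
import random
from statistics import fmean, pstdev


def spread_of_sample_means(n, trials=200, seed=0):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical income distribution, identical for every sample.
        sample = [rng.gauss(98_000, 60_000) for _ in range(n)]
        means.append(fmean(sample))
    return pstdev(means)


small = spread_of_sample_means(100)    # roughly the smallest samples in the table
large = spread_of_sample_means(3000)   # roughly the largest samples in the table
print(f"n=100:  sample means vary by about {small:,.0f}")
print(f"n=3000: sample means vary by about {large:,.0f}")
```

Because the spread of a sample mean shrinks like 1/sqrt(n), the 100-point samples should wander several times further from the true average than the 3,000-point ones, even though nothing about the underlying pay differs.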

  23. Very interesting! I blog about the popularity of various languages. I focus mostly on those related to data analysis at http://bit.ly/statpop, but I do start out pointing to the various sites that cover programming language popularity measures in general. This is the first work I’ve seen that tied salary to it.

    I think Puppet is used to manage Linux systems and someone would not bother to use it unless they managed quite a few systems. That much responsibility would likely come with a high salary.

    Thanks for the interesting perspective!

    Cheers,
    Bob Muenchen

  24. Coldfusion is used in a lot of government contracting jobs, but is on the decline. Puppet is relatively new and used more by system admins than developers. You can try to correlate some data points from monster salary / craigslist.

  25. Puppet is less a language and more a configuration management system. These projects strike me as strange outliers. XSLT is an XML stylesheet language; without looking at the projects it’s hard to say why they might be positioned so high, but it probably goes along with your enterprise software assumption. CoffeeScript is like a shorthand version of Javascript (wow, that’s a horrible overgeneralization); I think its popularity is largely in the Web 2.0 startup world, though I might be making some lousy assumptions there, too.

  26. Coffeescript is a language that generates JavaScript as its target. The idea is to let you write code in a cleaner manner than raw JavaScript. Puppet is (as far as I know) a configuration language for managing/administering multiple machines. XSLT is a style sheet language for XML

    I’m struggling with the notion that someone would be using only one of these things and still be earning those salaries.

  27. In my experience, XSLT is primarily used for plumbing between XML services, which mostly means enterprise (especially SOA).

  28. XSLT and CoffeeScript are generally used by high-skilled web frontend developers, this explains higher ratings.

  29. Just a theory on some of your unknown languages:

    Puppet: server config / devOps automation scripting. Tools of the relatively new devOps field.
    CoffeeScript: a syntax sugar language that compiles to javascript.

    For the above languages, I have a feeling that due to their new-ishness they are more welcomed in open source projects willing to chance new tech and not Enterprise, which skews the salary.

    On the contrary, XSLTs are used to consume/transmit the crusty XML/SOAP web services that Enterprise juggernauts love to expose to the world, so that will pull more enterprise salaries.

  30. XSLT is an XML transformation language, also huge in Enterprise, particularly middleware (TIBCO/Oracle Fusion). In fact I would have thought this variation would be by far the highest paying, but I’m willing to bet this is obfuscated under the various branded extensions of XML

  31. Humble suggestion: the graph would benefit from being redone to show the Y axis starting at zero. Otherwise, it distorts the relative values of the data. For example, in the current graph Puppet is 22px tall and ActionScript is 298 px tall. This visually implies income from ActionScript is ~13.5 times as much as Puppet. In reality, it is only ~1.2 times as much.
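
The arithmetic behind those two ratios can be checked in a few lines. The pixel heights are the ones quoted in this comment and the incomes come from the table in the post; the variable names are just for illustration.

```python
# Bar heights in pixels, as measured in the comment above.
puppet_px, actionscript_px = 22, 298

# Average household incomes from the table in the post.
puppet_income, actionscript_income = 87_589.29, 108_119.47

bar_ratio = actionscript_px / puppet_px            # what the chart implies
value_ratio = actionscript_income / puppet_income  # what the data says

print(f"bars:   {bar_ratio:.1f}x")    # 13.5x
print(f"values: {value_ratio:.1f}x")  # 1.2x
```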

    A best practice quote from Edward Tufte, one of the most respected writers on the display of quantitative information, says:

    “The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.”

    Cheers!

    1. Yeah, the chart is from Google Docs, which decided to adjust the axis to keep things pretty. I’ll try to swap it out with an unscaled version.

      1. I hear ya, it’s kinda frustrating that google docs is always doing that to folks. No sweat. Have a great day!

    1. I guess GitHub doesn’t consider SQL a language, I didn’t find any repositories with SQL in the language breakdown.
