Hello Julia: New Data Science Programming Language Shows Promise Despite Flaws
If you're a programmer, then you may have run into a problem much like Stefan Karpinski did. I'll tell you the story, and you can see if it sounds familiar.
Stefan was an experienced grad student working on a software tool that could emulate wireless networks, and he was getting frustrated. He was using Matlab to code the linear algebra functionality, but Matlab isn't very good for statistics and visualization. For those parts, he was using R.
Neither language is very fast, so to speed things up he was using C for looping and other procedural optimizations. Finally, Ruby was his "glue," a general-purpose tool to keep the other three languages working together. One project, four languages. Later, he would describe it as a "Rube Goldberg" sort of program: complicated, elaborate, but not terribly elegant.
Understandably, Stefan began to wish for a language that could do everything he needed, an all-inclusive language to replace his Frankensteinian creation. He mentioned his frustration to fellow grad student Viral Shah, who in turn mentioned it to Jeff Bezanson, a computer scientist specializing in programming languages. Suddenly, the ball was rolling.
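That's the origin story of the Julia programming language in a nutshell. Designed by Bezanson, Karpinski, Shah, and Bezanson's graduate advisor at MIT, Alan Edelman, Julia was intended to fix the so-called "two-language" problem in technical computing: prototyping in a convenient but slow high-level language, then rewriting the performance-critical parts in a fast low-level one.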
Ideally, a data scientist or mathematician can start and end with Julia, from prototyping to the finished project, without ever having to call on another language. And to some degree, it's working.
This new language's popularity is largely due to three things: support, community, and speed. Though the project is completely open-source, the developers have created a start-up support company for the language. The new business, appropriately called Julia Computing, not only allows the developers to work on Julia full-time, but also helps businesses feel more comfortable adopting the new language because they know they can find assistance.
Those wanting to code in Julia can also seek help from the thriving community, which includes quite a number of MIT students, faculty, and alumni (including Bezanson and Edelman themselves), many of whom also contributed to the development of the language. And finally, Julia is primarily used in technical computing, where large data sets mean speed is key. Because Julia is almost as fast as C and C++ while offering the conveniences of a high-level language, it has seen most of its use in the data science community.
Being open-source is the first of a few powerful features in the Julia arsenal. Getting to the source is quick and straightforward, meaning that users who need low-level flexibility can get in there and "hack" the language, providing themselves with their own features and tweaks. For advanced programmers, this will be a big advantage over proprietary languages like Matlab, where the source code is unavailable and the feature set is fixed.
Because its primary usage is in data science, Julia was designed with data parallelism and distributed computation in mind. In other words, cluster computing is fairly straightforward in Julia, an intended feature rather than an afterthought. For the discerning data scientist, the ability to quickly implement cluster computing will be invaluable.
Julia's multiple dispatch (which, like many of the language's features, borrows heavily from Lisp) simplifies the coding process: the version of a function that runs is chosen according to the types of all of its arguments, not just the first one. The language also features macros that are described by the official site as "Lisp-like." Another feature that simplifies programming is that user-defined types are treated as first-class citizens, on equal footing with the built-in ones.
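For readers who haven't met multiple dispatch before, here is a rough sketch of the idea. It is written in Python rather than Julia, and it only imitates what Julia does natively. The `method` decorator, the `_METHODS` registry, and the `Meters` and `Feet` types are all made up for this illustration. Unlike Julia, this toy version matches argument types exactly and knows nothing about subtypes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical registry for this sketch:
# (function name, argument types) -> implementation.
_METHODS: Dict[Tuple[str, Tuple[type, ...]], Callable] = {}


def method(*types: type):
    """Register one implementation for one exact combination of argument types."""
    def register(func: Callable) -> Callable:
        _METHODS[(func.__name__, types)] = func

        def dispatcher(*args):
            # Look at the types of ALL arguments, not just the first one.
            arg_types = tuple(type(a) for a in args)
            impl = _METHODS.get((func.__name__, arg_types))
            if impl is None:
                names = ", ".join(t.__name__ for t in arg_types)
                raise TypeError(f"no method {func.__name__}({names})")
            return impl(*args)

        return dispatcher
    return register


# Two user-defined types (illustrative only).
@dataclass(frozen=True)
class Meters:
    value: float


@dataclass(frozen=True)
class Feet:
    value: float


METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


@method(Meters, Meters)
def add(a, b):
    return Meters(a.value + b.value)


@method(Meters, Feet)
def add(a, b):
    # Result takes the unit of the first argument.
    return Meters(a.value + b.value * METERS_PER_FOOT)


@method(Feet, Meters)
def add(a, b):
    return Feet(a.value + b.value / METERS_PER_FOOT)


print(add(Meters(1.0), Meters(2.0)))  # Meters(value=3.0)
```

Calling `add(Meters(1.0), Feet(10.0))` picks the second implementation, while `add(Feet(1.0), Feet(1.0))` raises a `TypeError` because no method was registered for that pair. In Julia, none of this scaffolding is needed: defining several methods of one function with different argument type annotations is enough.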
There are more features that I haven't mentioned here. The Julia home page has a quick list of the developers' favorite features.
Let's be clear: the goal of Julia is not to compromise between the functionality of R and the speed of C; rather, it wants to combine them (and also the best features from Matlab, Python, Lisp, Ruby, and virtually all other languages). It seems to be doing remarkably well at its Herculean task, but it's not perfect.
As with most new languages, it struggles with the usual adoption problems. Luckily for the fledgling contender, Julia has a healthy and committed community. Even so, it's been criticized for its lack of libraries, and many programmers aren't willing to switch simply because they can't afford to abandon their old code.
Additionally, Julia has been accused of having a high number of core bugs, and just this April the business intelligence startup Staffjoy announced via blog post that it would have to abandon Julia, citing lack of testing capability and stability issues as its motivation. Responding to the criticism, Julia Computing admitted in an interview with VentureBeat.com that much of it was warranted, but also said Julia has improved immensely since then.
"Many of the criticisms ... were quite legitimate," they said. Still, the future is bright: "We've improved our testing of both packages and the language itself massively since then ... on the whole, we think that Julia is remarkably reliable and stable for such a young language."
Every new programming language faces an uphill battle right from inception, and the trickiest part of that climb is leaving the niche, early-adopter audience and proving itself as a viable mainstream tool. Julia has reached this step. She has a lot going for her: She's not replicating the niche of any other language, she has a dedicated community and defined leadership, and she's remarkably well-designed.
Ultimately, the success or failure of this language depends on its ability to leave behind the bugs and instability of adolescence and take on the reliability of adulthood. Hopefully, the idealistic team that put Julia together can compromise enough to allow that to happen.
Currently, Julia has no viable certifications, though we can look to Julia Computing for these in the future. For more information on Big Data certifications, see our previous breakdowns of the best certifications.