More Big Data Certifications to Prime Your Employment Pump
Editor's Note: Last week, David Telford discussed Spark-specific Big Data certs. This week, he's back for another dive into Big Data.
All over the world, organizations large and small are gearing up for the Internet of Things.
We're seeing it already. The apps on your phone send data back to their makers' servers so those servers can serve you better. We're fairly used to that, but what about when your washing machine finishes your laundry and pings your phone to let you know the clothes are ready to be dried? Will it also send a few points of data back to the manufacturer so they can improve the next model?
Will your stove let General Electric know whether you're more likely to burn or undercook your spaghetti, and how you like your eggs? Will your fridge let Frigidaire know which foods are most likely to spoil without being eaten?
The adoption of IPv6, with its vastly larger address space, means that every one of your smart devices (phone, tablet, toaster) can soon have its own connection to the internet. It also means that the company servers monitoring those devices will receive floods of data far larger than anything seen before. Data sets like that can't be processed by humans; in fact, they can't even be processed by most single computers. This is the challenge of Big Data.
Put simply, big data is the IT field that deals with very large data sets: how to receive them, how to process them, and how to analyze them. Big data borrows elements from many other disciplines and combines them into something new. There's a new kind of IT professional in town, someone who can code solutions to process massive data sets and draw useful conclusions from them: the Data Scientist. Sound interesting?
Great! Below, we've compiled five certifications that can help you break into the field. Be warned that in a complicated field such as big data, there really isn't any such thing as an entry-level certification. If you don't have any experience with Apache Hadoop or, at the very least, a working understanding of database administration (including shell scripting and SQL), you might want to start there first. Otherwise, happy hunting!
Cloudera provides its own Hadoop-based solutions and is one of the biggest big data solution providers in the world. If you work with big data, you are very likely to work with or around some Cloudera distribution, and because it's built on Hadoop, your skills will largely transfer even if you don't. Cloudera's most respected cert is the CCP:DS (Cloudera Certified Professional: Data Scientist), but for getting started, the Spark and Hadoop Developer certification is a better bet.
The exam for this certification consists of 10 to 12 tasks that you must solve in two hours or less. To pass, you'll need knowledge of the Scala and Python programming languages, and you'll need to be able to use Sqoop, the Apache data transfer tool, to import data from and export data to a MySQL database. Cloudera's certification page has a full description of the exam.
Registration is $295 USD, and the exam itself requires a reliable internet connection, Google Chrome, and a webcam. To prepare, Cloudera recommends its Cloudera Developer Training for Spark and Hadoop course.
Many big data environments rely on MySQL servers, and luckily Oracle provides a certification for that. The OCP: MySQL 5.6 Database Administrator certification validates your ability to set up MySQL servers and perform typical administrative tasks on them, such as securing, optimizing, and replicating databases. If you're not familiar with SQL, or if your experience may be out of date, this would be a good exam to study for.
Registration for the exam costs $245 USD. The exam is administered at Pearson VUE testing centers, and the format is 100 multiple-choice questions with a time limit of 150 minutes. Oracle recommends that applicants have practical experience, and it also provides a useful exam guide.
MCSE: Business Intelligence
Microsoft needs no introduction, and its certifications are widely recognized. As part of its MCSE series, the Business Intelligence certification shows that its holder can build and implement a data solution for their organization.
To earn this certification, applicants must pass five exams based on Microsoft SQL Server. Expect questions on administration, querying, data models, data warehousing, and reporting. The first three exams earn you the Microsoft Certified Solutions Associate (MCSA): SQL Server 2012 certification on their own, and the last two take you to the expert level.
Microsoft recommends that applicants take Microsoft training courses to prepare, and it also provides an exam guide. The exams are administered by Pearson VUE and cost $150 USD apiece. Microsoft's certification page has a full overview.
Designed to measure the applicant's ability to develop MapReduce solutions in Java, this certification may be the most narrowly focused on this list. Although MapR is a respected member of the big data community, it's not as well known as some of the other vendors here, so this certification is more useful as a guide to the material than as a credential that will carry a lot of weight in an interview.
The exam covers writing MapReduce programs, using the MapReduce API, and managing, monitoring, and testing MapReduce programs and workflows.
MapR recommends that applicants have two years of Java programming experience, and it also recommends its own DEV 301 course. The exam itself is 60 to 80 questions in two hours or less, and registration is $250 USD. It is proctored online via webcam.
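The exam itself targets Java and Hadoop's MapReduce API, but the underlying model is easy to see in a few lines. Below is a minimal pure-Python simulation of the map, shuffle, and reduce phases for the classic word-count job. The function names are hypothetical, nothing here uses Hadoop's actual API, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value by its key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Apply the mapper to every input record; a real cluster spreads this across nodes.
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    # Each key's group is reduced independently, which is what lets reducers run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["the washer is done", "the dryer is next"]
    print(word_count(sample))
```

The design point to notice is that the mapper and reducer never share state; only the shuffle step brings matching keys together, which is the property a distributed framework exploits to split the work.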
EMC: Data Science Associate (EMCDSA)
An industry leader in big data, EMC Corporation also provides many other services related to cloud computing and analytics. Its EMC Proven Professional certification program is well established and respected, and among the 12 different tracks it offers, the EMCDSA is the big one for data scientists. There's quite a bit to cover in each exam, so it's worth reading the official exam overviews closely.
Two exams are relevant here: E20-007 (Data Science and Big Data Analytics) for the Associate level, or E20-005 (Backup Recovery Systems and Architecture Exam) for the Specialist level.
Either way, the exam will be 60 questions in 90 minutes or less, and is administered in Pearson VUE testing centers. Registration is $200 USD. EMC provides a number of preparation options, ranging from a practice exam to an instructor-led classroom course.