Hortonworks Beta a Huge Step for Hadoop on Microsoft Windows
By Marco Shaw
When talking about big data, Apache Hadoop is almost always discussed as the underlying platform for project endeavors.
Since Hadoop is open source, it almost goes without saying that most implementations run on Linux. Although the source code is freely available and Hadoop is written in Java, Linux is the logical choice, especially for proofs of concept.
You can download the Hadoop source, use your own preferred Linux distribution, and fire up your own Hadoop cluster. But that can seem like a daunting task for some.
Fortunately, just as Linux has Red Hat, Hadoop has a relatively long list of vendors that offer packaged Hadoop services. These vendors also offer downloadable virtual machines with Linux and Hadoop preinstalled.
But what if your organization doesn’t know the first thing about Linux and you want to get into big data? What if you don’t know the first thing about Java programming? When evaluating new projects, one of the first things you need to think about is “in-house skills.” Does it really make sense to roll out a big data analytics project on Linux and Hadoop when you’re a 100 percent “Microsoft shop”?
Big Data Without Linux
Fortunately, for those heavily invested in Microsoft technologies, Hadoop on Microsoft Windows took a big step yesterday: Hortonworks announced a beta of its Hortonworks Data Platform (HDP) for Microsoft Windows.
I was a bit surprised by the Hortonworks-only announcement, because in 2011, Hortonworks and Microsoft announced a partnership to bring Hadoop to the Windows platform. In October last year, as a result of the partnership, Microsoft announced public test builds of Windows Azure HDInsight Service for its Windows Azure public cloud service and HDInsight Server for Windows for regular Windows Server installations.
With both HDInsight and HDP now available, it’s not yet clear how the two are similar or different, but I’m sure that will come to light as the public has more time with the Hortonworks beta. Microsoft has said publicly, however, that the two products will be differentiated by their level of integration and support.
Back to my initial thoughts about how a Microsoft shop might deal with all of these new technologies when considering a big data project or initiative. For the most part, getting the best performance from Hadoop means writing MapReduce code in Java, but what if you don’t have Java coders? One of the benefits of having Microsoft on board is that the company is quite focused on developer features, and it has released a .NET-based SDK for Hadoop. That means you can leverage your existing Microsoft Windows skills, like administration and programming, so your new project can be a success!
HDInsight vs. Hortonworks Data Platform
I have one initial comment about the install experience of HDInsight versus HDP. I’ve installed HDInsight a few times before and did so again as I was writing this. The user experience HDInsight provides for getting Hadoop on Windows up and running is currently miles ahead of HDP. If you go to the Microsoft Big Data website, it’s just a few clicks to launch the Microsoft Web Platform Installer (WebPI) and let an automated install take over.
On the other hand, once I downloaded HDP, I tried to double-click the MSI and got a warning message about having to pass a “response file” on the command line. I checked the Hortonworks site briefly and found that I needed to manually install a few prerequisites. It wasn’t anything that appeared overly difficult, but the WebPI automation is a nice touch, especially when you’re trying to be as productive as possible. I’m not sure whether WebPI has an offline mode, though. If it doesn’t, that could be one point in favor of the HDP procedure, which allows a manual and more controlled install.
I can’t wait to see what Microsoft brings to the table for big data. Even if HDP turns out to be a better or more accepted implementation, the tools and integration possibilities seem to be pretty exciting for the data analytics space.
About the Author
Marco Shaw is an IT consultant working in Canada. He has been working in the IT industry for over 12 years. He was awarded the Microsoft MVP award for his contributions to the Windows PowerShell community for 5 consecutive years (2007-2011). He has co-authored a book on Windows PowerShell, contributed to Microsoft Press and Microsoft TechNet magazine, and also contributed chapters for other books such as Microsoft System Center Operations Manager and Microsoft SQL Server. He has spoken at Microsoft TechDays in Canada and at TechMentor in the United States. He currently holds the GIAC GSEC and RHCE certifications, and is actively working on others.