Apache Spark, a high-speed analytics engine for the Hadoop distributed processing framework, is now available to plug into the YARN resource management tool.
This development means that it can now be easily deployed along with other workloads on a Hadoop cluster, according to Hadoop specialist Hortonworks.
Released as version 1.0.0 at the end of May, Apache Spark is a high-speed engine for large-scale data processing, created with the aim of being much faster than Hadoop’s better-known MapReduce function, but for more specialised applications.
Hortonworks vice president of Corporate Strategy Shaun Connolly told The INQUIRER, “Spark is a memory-oriented system for doing machine learning and iterative analytics. It’s mostly used by data scientists and high-end analysts and statisticians, making it a sub-segment of Hadoop workloads but a very interesting one, nevertheless.”
As a relatively new addition to the Hadoop suite of tools, Spark is getting a lot of interest from developers using the Scala language to perform analysis on data in Hadoop for customer segmentation or other advanced analytics techniques such as clustering and classification of datasets, according to Connolly.
With Spark certified as YARN-ready, enterprise customers will be able to run memory and CPU-intensive Spark applications alongside other workloads on a Hadoop cluster, rather than having to deploy them in separate a cluster.
“Since Spark has requirements that are much heavier on memory and CPU, YARN-enabling it will ensure that the resources of a Spark user don’t dominate the cluster when SQL or MapReduce users are running their application,” Connolly explained.
Meanwhile, Hortonworks is also collaborating with Databricks, a firm founded by the creators of Apache Spark, in order to ensure that new tools and applications built on Spark are compatible with all implementations of it.
“We’re working to ensure that Apache Spark and its APIs and applications maintain a level of compatibility, so as we deliver Spark in our Hortonworks Data Platform, any applications will be able to run on ours as well as any other platform that includes the technology,” Connolly said.
Apache Software Foundation released an advisory warning that a patch issued in March for a zero-day vulnerability in Apache Struts did not fully patch the bug. Apparently, the patch for the patch is in development and will be released likely within the next 72 hours.
Rene Gielen of the Apache Struts team said that once the release is available, all Struts 2 users are strongly recommended to update their installations. ASF provided a temporary mitigation that users are urged to apply. On March 2, a patch was made available for a ClassLoader vulnerability in Struts up to version 220.127.116.11. All it took was an attacker to manipulate the ClassLoader via request parameters. However Apache admitted that its fix was insufficient to repair the vulnerability. An attacker exploiting the vulnerability could also cause a denial-of-service condition on a server running Struts 2.
“The default upload mechanism in Apache Struts 2 is based on Commons FileUpload version 1.3 which is vulnerable and allows DoS attacks. Additional ParametersInterceptor allows access to ‘class’ parameter which is directly mapped to getClass() method and allows ClassLoader manipulation.”
It will be the third time that Struts has been updated this year. In February, the Apache Struts team urged developers to upgrade Struts 2-based projects to use a patched version of the Commons FileUpload library to prevent denial-of-service attacks.
Intel has released its Apache Hadoop distribution, claiming significant performance benefits through its hardware and software optimisation.
Intel’s push into the datacentre has largely been visible with its Xeon chips but the firm works pretty hard on software as well, including contributing to open source projects such as the Linux kernel and Apache’s Hadoop to ensure that its chips win benchmark tests.
Now Intel has released its Apache Hadoop distribution, the third major revision of its work on Hadoop, citing significant performance benefits and claiming it will open source much of its work and push it back upstream into the Hadoop project.
According to Intel, most of the work it has done in its Hadoop distribution is open source, however the firm said it will retain the source code for the Intel Manager for Apache Hadoop, the cluster management part of the distribution. Intel said it will use this to offer support services to datacentres that deploy large Hadoop clusters.
Boyd Davis, VP and GM of Intel’s Datacentre Software Division said, “People and machines are producing valuable information that could enrich our lives in so many ways, from pinpoint accuracy in predicting severe weather to developing customised treatments for terminal diseases. Intel is committed to contributing its enhancements made to use all of the computing horsepower available to the open source community to provide the industry with a better foundation from which it can push the limits of innovation and realise the transformational opportunity of big data.”
Intel trotted out some impressive industry partners that it has been working with on the Hadoop distribution and while the firm’s direct income from the Hadoop distribution will come from support services, the indirect income from Xeon chip sales is likely what Intel is most looking towards as Hadoop adoption grows to manage the extremely large data sets that the industry calls “big data”.
Dell is offering access to its Zinc ARM based server to the Apache Software Foundation for development and testing purposes.
Dell had already shown off its Copper ARM based server earlier this year and said it intends to bring ARM servers to market “at the appropriate time”. Now the firm has allowed the Apache Software Foundation access to another Calxeda ARM based server codenamed Zinc.
Dell’s decision to give the Apache Software Foundation access to the hardware is not surprising as it is the organisation that oversees development of the popular Apache HTTPD, Hadoop and Cassandra software products, all applications that are widely regarded as perfect for ARM based servers. The firm said its Zinc server is accessible to all Apache projects for the development and porting of applications.
Forrest Norrod, VP and GM of Server Solutions at Dell said, “With this donation, Dell is further working hand-in-hand with the community to enable development and testing of workloads for leading-edge hyperscale environments. We recognize the market potential for ARM servers, and with our experience and understanding of the market, are enabling developers with systems and access as the ARM server market matures.”
Dell didn’t give any technical details on its Zinc server and said it won’t be generally available. However the firm reiterated its goal of bringing ARM based servers to the market, though given that it is trying to help the Apache Foundation, a good indicator of ARM server viability will be when the Apache web server project has been ported to the ARM architecture and has matured to production status.
Java Developers looking for a mobile-friendly platform could be happy with the next release of IBM’s Websphere Application Server, which is aimed at offering a lighter, more dynamic version of the app middleware.
Shown off at the IBM Impact show in Las Vegas on Tuesday, Websphere Application Server 8.5, codenamed Liberty, has a footprint of just 50MB. This makes it small enough to run on machines such as the Raspberry Pi, according to Marie Wieck, GM for IBM Application and Infrastructure Middleware.
Updates and bug fixes can also be done on the fly with no need to take down the server, she added.
The Liberty release will be launched this quarter, and already has 6,000 beta users, according to Wieck.
John Rymer of Forrester said that the compact and dynamic nature of the new version of Websphere Application Server could make it a tempting proposition for Java developers.
“If you want to install version seven or eight, it’s a big piece of software requiring a lot of space and memory. The installation and configuration is also tricky,” he explained.
“Java developers working in the cloud and on mobile were moving towards something like Apache Tomcat. It’s very light, starts up quickly and you can add applications without having to take the system down. IBM didn’t have anything to respond to that, and that’s what Liberty is.”
For firms needing to update applications three times a year, for example, the dynamic capability of Liberty will make it a much easier process.
“If developers want to run Java on a mobile device, this is good,” Rymer added.
The new features are also backwards compatible, meaning current Websphere users will be able to take advantage of the improvements.
However, IBM could still have difficulty competing in the app server space on a standalone basis, according to Rymer.
“Red Hat JBoss costs considerably less, and there’s been an erosion for IBM as it’s lost customers to Red Hat and Apache. Liberty might have an effect here,” he said.
“But IBM wins where the customer isn’t just focused on one product. It will never compete on price, but emphasises the broader values of a platform or environment.”
IBM will be demoing Websphere running on Raspberry Pi at Impact today.
The open source software project has reached the milestone of its first full release after six years of development. Hadoop is a software framework for reliable, scalable and distributed computing under a free licence. Apache describes it as “a foundation of cloud computing”.
“This release is the culmination of a lot of hard work and cooperation from a vibrant Apache community group of dedicated software developers and committers that has brought new levels of stability and production expertise to the Hadoop project,” said Arun Murthy, VP of Apache Hadoop.
“Hadoop is becoming the de facto data platform that enables organizations to store, process and query vast torrents of data, and the new release represents an important step forward in performance, stability and security,” he added.
Apache Hadoop allows for the distributed processing of large data sets, often Petabytes, across clusters of computers using a simple programming model.
The Hadoop framework is used by some big name organisations including Amazon, Ebay, IBM, Apple, Facebook and Yahoo.
Yahoo has significantly contributed to the project and hosts the largest Hadoop production environment with more than 42,000 nodes.
Jay Rossiter, SVP of the cloud platform group at Yahoo said, “Apache Hadoop will continue to be an important area of investment for Yahoo. Today Hadoop powers every click at Yahoo, helping to deliver personalized content and experiences to more than 700 million consumers worldwide.”
Google wants to make the Web faster. As well as optimizing its own sites and services to run at blazing speed, the company has been helping to streamline the rest of the Web, too. Now Google has released free software that could make many sites load twice as fast.
The software, called mod_pagespeed, can be installed and configured on Apache Web servers, the most commonly used software for running websites. Once installed, mod_pagespeed determines ways to optimize a site’s performance on the fly. For example, it will compress images more efficiently and change settings so that more of the pages are stored in a user’s browser cache, so that the same data doesn’t have to be loaded repeatedly. The software will be automatically updated, notes Richard Rabbat, product manager for the new project. He says that this means that as Google and others make improvements, people who install it will benefit without having to make any changes.
“We think making the whole Web faster is critical to Google’s success,” says Rabbat. Making the Web faster should encourage people to use it more and increase the likelihood that they will use Google’s services and software. Rabbat points to the frustration that people feel when they click a link or type a URL and see a blank page for several seconds. “In many cases,” he says, “I’ll navigate away when that happens.”
Google already offers a tool called Page Speed that measures the speed at which a website loads and suggests ways to make improvements. “We asked ourselves, instead of just telling people what the problems are, can we just fix it for them automatically?” Rabbat says.
The software could be particularly useful to operators of small websites. Such people may not have the skill or time to optimize their site’s performance themselves. It should also be useful for companies that use content management systems to operate their websites and lack the technical capabilities needed to make speed improvements to Web server software themselves.
Google tested mod_pagespeed on a representative sample of websites and found that it made some sites load three times faster, depending on how much optimization had already been done.
Speeding up the Web has a clear financial payoff for Google. “If websites are faster, Google makes more money,” says Ed Robinson, CEO of Aptimize, a startup that also provides software that automatically optimizes Web pages, much as Google’s new offering does. Robinson explains that the faster a website is, the more pages users will view, and the more ads Google can serve—on its search pages or through its ad networks. Because the company’s reach is so wide, even small improvements can add up to massive revenue gains for the Web giant. He adds, “Making the Web faster is the logical next step for moving the Web forward.”