Apache Spark, a high-speed analytics engine for the Hadoop distributed processing framework, is now available to plug into the YARN resource management tool.
This development means that it can now be easily deployed along with other workloads on a Hadoop cluster, according to Hadoop specialist Hortonworks.
Released as version 1.0.0 at the end of May, Apache Spark is a high-speed engine for large-scale data processing, created with the aim of being much faster than Hadoop’s better-known MapReduce function, but for more specialised applications.
Hortonworks vice president of Corporate Strategy Shaun Connolly told The INQUIRER, “Spark is a memory-oriented system for doing machine learning and iterative analytics. It’s mostly used by data scientists and high-end analysts and statisticians, making it a sub-segment of Hadoop workloads but a very interesting one, nevertheless.”
As a relatively new addition to the Hadoop suite of tools, Spark is getting a lot of interest from developers using the Scala language to perform analysis on data in Hadoop for customer segmentation or other advanced analytics techniques such as clustering and classification of datasets, according to Connolly.
With Spark certified as YARN-ready, enterprise customers will be able to run memory- and CPU-intensive Spark applications alongside other workloads on a Hadoop cluster, rather than having to deploy them in a separate cluster.
“Since Spark has requirements that are much heavier on memory and CPU, YARN-enabling it will ensure that the resources of a Spark user don’t dominate the cluster when SQL or MapReduce users are running their application,” Connolly explained.
Meanwhile, Hortonworks is also collaborating with Databricks, a firm founded by the creators of Apache Spark, in order to ensure that new tools and applications built on Spark are compatible with all implementations of it.
“We’re working to ensure that Apache Spark and its APIs and applications maintain a level of compatibility, so as we deliver Spark in our Hortonworks Data Platform, any applications will be able to run on ours as well as any other platform that includes the technology,” Connolly said.
The Apache Software Foundation (ASF) released an advisory warning that a patch issued in March for a zero-day vulnerability in Apache Struts did not fully fix the bug. A patch for the patch is in development and will likely be released within the next 72 hours.
Rene Gielen of the Apache Struts team said that once the release is available, all Struts 2 users are strongly recommended to update their installations. In the meantime, the ASF has provided a temporary mitigation that users are urged to apply. On March 2, a patch was made available for a ClassLoader vulnerability in Struts up to version 2.3.16.1. All an attacker had to do was manipulate the ClassLoader via request parameters. However, Apache admitted that its fix was insufficient to repair the vulnerability. An attacker exploiting the vulnerability could also cause a denial-of-service condition on a server running Struts 2.
“The default upload mechanism in Apache Struts 2 is based on Commons FileUpload version 1.3 which is vulnerable and allows DoS attacks. Additional ParametersInterceptor allows access to ‘class’ parameter which is directly mapped to getClass() method and allows ClassLoader manipulation.”
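The mechanism described in that advisory, a request parameter name being walked as a property path until it reaches the object's class, can be illustrated with a loose analogy. The sketch below is Python rather than Java and is not Struts code: the `UploadAction` class, the `bind_naive` and `bind_guarded` helpers and the `FORBIDDEN` pattern are all hypothetical names invented for illustration, and the pattern is not the exclusion rule that the Struts team published.

```python
import re


class UploadAction:
    """Stand-in for a web action whose fields are filled from request parameters."""

    def __init__(self):
        self.filename = ""


def bind_naive(target, path, value):
    """Walk a dotted parameter name and set the final attribute, with no filtering."""
    *parents, leaf = path.split(".")
    obj = target
    for name in parents:
        obj = getattr(obj, name)
    setattr(obj, leaf, value)


# Hypothetical exclusion rule: reject any path segment that names the class object.
FORBIDDEN = re.compile(r"(^|\.)(__class__|class)(\.|$)", re.IGNORECASE)


def bind_guarded(target, path, value):
    """Same binding, but refuse parameter names that reach for the class."""
    if FORBIDDEN.search(path):
        raise ValueError(f"rejected parameter name: {path}")
    bind_naive(target, path, value)


if __name__ == "__main__":
    action = UploadAction()
    bind_naive(action, "filename", "report.txt")       # intended use
    bind_naive(action, "__class__.shared_flag", True)  # escapes the instance
    print(hasattr(UploadAction(), "shared_flag"))      # a fresh instance is affected too
    try:
        bind_guarded(action, "class.classLoader.resources", "x")
    except ValueError as err:
        print(err)
```

In the naive version a single crafted parameter name changes state shared by every instance, which is the general shape of the problem the quote describes. Filtering parameter names, as in the guarded version, is only as good as the pattern used, which may be why an incomplete fix is plausible for this kind of bug.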
It will be the third time that Struts has been updated this year. In February, the Apache Struts team urged developers to upgrade Struts 2-based projects to use a patched version of the Commons FileUpload library to prevent denial-of-service attacks.
Database company SkySQL has announced a $20m round of funding to develop MariaDB, the open source database fork of MySQL.
The Series B funding round was led by Intel Capital to support developing the MariaDB database into “a world-class database to challenge American rivals such as IBM and Oracle”.
Since the merger in April of SkySQL, founded by ex-members of the MySQL team, and MariaDB architect Monty Program, the new company has been seeking ways to develop the open source project, including increased back-end support and scalability of the MariaDB server software.
Other investors in the consortium, alongside the California-based lead investor, are California Technology Ventures, Finnish Industry Investment, Open Ocean Capital and Spintop Private Partners.
“Adoption of the MariaDB database server has grown explosively in the last year,” said SkySQL CEO Patrik Sallner. “With the help of our loyal user base, we have built up significant market share when compared to other open source database technologies.”
Sallner added, “For large-scale internet players like Google and Wikipedia, MariaDB database server delivers clear benefits over existing relational databases.
“With this funding we plan to deliver commercial solutions that make it even easier for enterprises of any size to run MariaDB databases at scale.”
Since its formation in 2010, SkySQL has attracted some blue chip clients including Craigslist, EA, HP and Disney.
Intel has released its Apache Hadoop distribution, claiming significant performance benefits through its hardware and software optimisation.
Intel’s push into the datacentre has largely been visible with its Xeon chips but the firm works pretty hard on software as well, including contributing to open source projects such as the Linux kernel and Apache’s Hadoop to ensure that its chips win benchmark tests.
Now Intel has released its Apache Hadoop distribution, the third major revision of its work on Hadoop, citing significant performance benefits and claiming it will open source much of its work and push it back upstream into the Hadoop project.
According to Intel, most of the work it has done in its Hadoop distribution is open source. However, the firm said it will retain the source code for the Intel Manager for Apache Hadoop, the cluster management part of the distribution. Intel said it will use this to offer support services to datacentres that deploy large Hadoop clusters.
Boyd Davis, VP and GM of Intel’s Datacentre Software Division said, “People and machines are producing valuable information that could enrich our lives in so many ways, from pinpoint accuracy in predicting severe weather to developing customised treatments for terminal diseases. Intel is committed to contributing its enhancements made to use all of the computing horsepower available to the open source community to provide the industry with a better foundation from which it can push the limits of innovation and realise the transformational opportunity of big data.”
Intel trotted out some impressive industry partners that it has been working with on the Hadoop distribution and while the firm’s direct income from the Hadoop distribution will come from support services, the indirect income from Xeon chip sales is likely what Intel is most looking towards as Hadoop adoption grows to manage the extremely large data sets that the industry calls “big data”.
Big Blue wants to take on competitors such as Oracle and Hewlett Packard by offering a cheap and cheerful Power Systems server and storage product range.
Rod Adkins, a senior vice president in IBM’s Systems & Technology Group, said the company was rolling out new servers based on its Power architecture, with the Power Express 710 starting at $5,947. He said that the 710 is priced competitively against commodity hardware from Oracle and HP.
Adkins added that IBM is expanding its Power and Storage Systems business into SMB and growth markets. The product launches on Tuesday. IBM said it will start delivering by February 20.
Adding to an already considerable set of cloud IT offerings, Amazon has unveiled a hosted data warehouse service called Redshift, pitching it as a lower-cost alternative to on-premise data warehouse deployments.
“Anyone who has used a traditional old-guard data warehouse solution knows that it is really expensive and complicated to manage,” said Andy Jassy, senior vice president of Amazon Web Services, who announced the new offering at the company’s AWS re:Invent conference being held this week in Las Vegas. In contrast, Redshift “is about the tenth of a cost of [a] traditional data warehouse,” Jassy said. “It automates the deployment and administration and works with popular business intelligence tools.”
A limited preview version of the service is now available. Amazon said it will launch the service commercially in early 2013.
Redshift works with a number of business intelligence (BI) applications, including software packages from Microstrategy, SAP, IBM and Jaspersoft. Users would use one of these BI packages to parse data in the Amazon cloud, using PostgreSQL drivers along with ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) APIs.
Users can store up to 1.6 petabytes, spread across as many as 100 nodes of either 2 terabytes or 16 terabytes each. The data will be stored in a columnar format so the “queries will be much faster,” Jassy said.
Amazon will offer the service on a pay-as-you-go billing basis, or for slightly less expensive rates by reserving the service ahead of time. Prices start at US$0.85 per hour for ad-hoc querying and decline from there for greater usage. On the whole, the service could cost as little as $1,000 per year per terabyte of data, compared to an average cost of $19,000 to $25,000 per terabyte per year to maintain data warehouse operations in-house, Jassy noted.
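The capacity and cost figures quoted above can be checked with a little arithmetic. The following is a minimal sketch that uses only the numbers in the article; the function names are made up for illustration, and it assumes decimal units (1 petabyte = 1,000 terabytes) and a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def max_capacity_tb(node_size_tb: int, max_nodes: int) -> int:
    """Largest cluster size in terabytes for a given node size."""
    return node_size_tb * max_nodes


def yearly_cost(hourly_rate: float) -> float:
    """Cost of paying an hourly rate around the clock for one year."""
    return hourly_rate * HOURS_PER_YEAR


def savings_factor(in_house_per_tb: float, hosted_per_tb: float) -> float:
    """How many times cheaper the hosted option is per terabyte per year."""
    return in_house_per_tb / hosted_per_tb


if __name__ == "__main__":
    # 100 nodes of 16 TB each give 1,600 TB, matching the 1.6 PB ceiling.
    print(max_capacity_tb(16, 100), "TB")
    # The entry price of $0.85 an hour, if paid for a full year.
    print(f"${yearly_cost(0.85):,.0f} per year")
    # Jassy's in-house range of $19,000 to $25,000 against $1,000 hosted.
    print(savings_factor(19_000, 1_000), savings_factor(25_000, 1_000))
```

On these figures the 1.6 petabyte limit corresponds to 100 of the larger nodes, and the best-case hosted price is 19 to 25 times cheaper than the quoted in-house range, a wider gap than the “tenth of a cost” claim. The article does not say what the $0.85 hourly rate buys, so that figure, roughly $7,446 a year, cannot be converted to a per-terabyte price without further assumptions.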
Dell is offering access to its Zinc ARM-based server to the Apache Software Foundation for development and testing purposes.
Dell had already shown off its Copper ARM-based server earlier this year and said it intends to bring ARM servers to market “at the appropriate time”. Now the firm has allowed the Apache Software Foundation access to another Calxeda ARM-based server codenamed Zinc.
Dell’s decision to give the Apache Software Foundation access to the hardware is not surprising, as the foundation oversees development of the popular Apache HTTPD, Hadoop and Cassandra software products, all applications that are widely regarded as perfect for ARM-based servers. The firm said its Zinc server is accessible to all Apache projects for the development and porting of applications.
Forrest Norrod, VP and GM of Server Solutions at Dell said, “With this donation, Dell is further working hand-in-hand with the community to enable development and testing of workloads for leading-edge hyperscale environments. We recognize the market potential for ARM servers, and with our experience and understanding of the market, are enabling developers with systems and access as the ARM server market matures.”
Dell didn’t give any technical details on its Zinc server and said it won’t be generally available. However, the firm reiterated its goal of bringing ARM-based servers to market. Given that it is trying to help the Apache Software Foundation, a good indicator of ARM server viability will be when the Apache web server project has been ported to the ARM architecture and has matured to production status.
Java developers looking for a mobile-friendly platform could be happy with the next release of IBM’s Websphere Application Server, which is aimed at offering a lighter, more dynamic version of the app middleware.
Shown off at the IBM Impact show in Las Vegas on Tuesday, Websphere Application Server 8.5, codenamed Liberty, has a footprint of just 50MB. This makes it small enough to run on machines such as the Raspberry Pi, according to Marie Wieck, GM for IBM Application and Infrastructure Middleware.
Updates and bug fixes can also be done on the fly with no need to take down the server, she added.
The Liberty release will be launched this quarter, and already has 6,000 beta users, according to Wieck.
John Rymer of Forrester said that the compact and dynamic nature of the new version of Websphere Application Server could make it a tempting proposition for Java developers.
“If you want to install version seven or eight, it’s a big piece of software requiring a lot of space and memory. The installation and configuration is also tricky,” he explained.
“Java developers working in the cloud and on mobile were moving towards something like Apache Tomcat. It’s very light, starts up quickly and you can add applications without having to take the system down. IBM didn’t have anything to respond to that, and that’s what Liberty is.”
For firms needing to update applications three times a year, for example, the dynamic capability of Liberty will make it a much easier process.
“If developers want to run Java on a mobile device, this is good,” Rymer added.
The new features are also backwards compatible, meaning current Websphere users will be able to take advantage of the improvements.
However, IBM could still have difficulty competing in the app server space on a standalone basis, according to Rymer.
“Red Hat JBoss costs considerably less, and there’s been an erosion for IBM as it’s lost customers to Red Hat and Apache. Liberty might have an effect here,” he said.
“But IBM wins where the customer isn’t just focused on one product. It will never compete on price, but emphasises the broader values of a platform or environment.”
IBM will be demoing Websphere running on Raspberry Pi at Impact today.
The open source Apache Hadoop project has reached the milestone of its first full release after six years of development. Hadoop is a software framework for reliable, scalable and distributed computing under a free licence. Apache describes it as “a foundation of cloud computing”.
“This release is the culmination of a lot of hard work and cooperation from a vibrant Apache community group of dedicated software developers and committers that has brought new levels of stability and production expertise to the Hadoop project,” said Arun Murthy, VP of Apache Hadoop.
“Hadoop is becoming the de facto data platform that enables organizations to store, process and query vast torrents of data, and the new release represents an important step forward in performance, stability and security,” he added.
Apache Hadoop allows for the distributed processing of large data sets, often petabytes in size, across clusters of computers using a simple programming model.
The Hadoop framework is used by some big name organisations including Amazon, eBay, IBM, Apple, Facebook and Yahoo.
Yahoo has significantly contributed to the project and hosts the largest Hadoop production environment with more than 42,000 nodes.
Jay Rossiter, SVP of the cloud platform group at Yahoo said, “Apache Hadoop will continue to be an important area of investment for Yahoo. Today Hadoop powers every click at Yahoo, helping to deliver personalized content and experiences to more than 700 million consumers worldwide.”