Nvidia has made the CUDA 6 Release Candidate, the latest version of its GPU programming platform, available for developers to download for free.
The release arrives with several new features and improvements to make parallel programming “better, faster and easier” for developers creating next-generation scientific, engineering, enterprise and other applications.
Nvidia has aggressively promoted its CUDA platform as a way for developers to exploit the floating-point performance of its GPUs. Available now, the CUDA 6 Release Candidate brings one major new feature in unified memory, which lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other.
“This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications,” Nvidia said in a blog post on Thursday.
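To illustrate what this saves, here is a minimal sketch of a unified-memory program (a hypothetical example, assuming a CUDA 6 toolkit and a GPU that supports managed memory; the kernel and sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial illustrative kernel: double every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *data;

    // One managed allocation, visible to both CPU and GPU --
    // no separate host buffer and no cudaMemcpy calls.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes the pointer directly

    scale<<<(n + 255) / 256, 256>>>(data, n);    // GPU works on the same pointer
    cudaDeviceSynchronize();                     // required before the CPU touches it again

    printf("data[0] = %f\n", data[0]);           // CPU reads the result directly
    cudaFree(data);
    return 0;
}
```

Before CUDA 6, the same program would have needed a host buffer, a cudaMalloc, and explicit cudaMemcpy calls in each direction around the kernel launch.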
There’s also the addition of “drop-in libraries”, which Nvidia said will accelerate applications by up to eight times.
“The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent,” the chip designer added.
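In other words, application code like the following never changes; only the library it is linked (or preloaded) against does. A hedged sketch, assuming a Linux toolchain and the NVBLAS library that ships with CUDA 6 (the tiny matrices here are purely illustrative; NVBLAS only offloads sufficiently large Level-3 calls and routes the rest to the CPU BLAS named in its nvblas.conf):

```cuda
#include <cstdio>

// The standard Fortran BLAS entry point -- note there is no CUDA code here.
extern "C" void dgemm_(const char *transa, const char *transb,
                       const int *m, const int *n, const int *k,
                       const double *alpha, const double *a, const int *lda,
                       const double *b, const int *ldb,
                       const double *beta, double *c, const int *ldc);

int main() {
    const int n = 2;
    double a[] = {1, 0, 0, 1};            // 2x2 identity (column-major)
    double b[] = {1, 2, 3, 4};
    double c[] = {0, 0, 0, 0};
    const double one = 1.0, zero = 0.0;

    // C = 1.0 * A * B + 0.0 * C
    dgemm_("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);

    printf("c = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]);
    return 0;
}
```

Built against a CPU BLAS, this runs entirely on the host; launched with something like `LD_PRELOAD=libnvblas.so ./app`, the same binary can have its large GEMM calls executed on the GPU.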
Multi-GPU scaling has also been added to CUDA 6, introducing redesigned BLAS and FFT GPU libraries that automatically scale performance across up to eight GPUs in a single node. Nvidia said this provides over nine teraflops of double-precision performance per node and supports larger workloads than before, up to 512GB in size.
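The multi-GPU scaling is exposed through the cuBLAS-XT interface, which tiles a single BLAS call across the selected devices and manages the host-device transfers itself. A rough sketch (assuming two GPUs with device IDs 0 and 1; error checking and result verification are omitted for brevity):

```cuda
#include <cublasXt.h>
#include <cstdlib>

int main() {
    const int n = 4096;
    double *A = (double *)malloc((size_t)n * n * sizeof(double));
    double *B = (double *)malloc((size_t)n * n * sizeof(double));
    double *C = (double *)malloc((size_t)n * n * sizeof(double));
    for (long i = 0; i < (long)n * n; ++i) { A[i] = 1.0; B[i] = 1.0; C[i] = 0.0; }
    const double alpha = 1.0, beta = 0.0;

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);

    int devices[] = {0, 1};                    // assumed device IDs
    cublasXtDeviceSelect(handle, 2, devices);  // the library tiles the GEMM across both GPUs

    // Plain host pointers go straight in; cuBLAS-XT schedules all transfers.
    cublasXtDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                  &alpha, A, n, B, n, &beta, C, n);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```

The same handle-plus-device-selection pattern applies to the other cuBLAS-XT routines, so existing GEMM-heavy code needs only minor changes to spread across a node's GPUs.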
“In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides,” Nvidia said.
The previous CUDA 5.5 Release Candidate was issued last June and added support for ARM-based processors.
Aside from ARM support, Nvidia also improved Hyper-Q support in CUDA 5.5, which allowed developers to use MPI workload prioritisation. The firm also touted improved performance analysis and better cross-compilation performance on x86 processors.
AMD’s Mantle has been a hot topic for quite some time, and despite its delayed birth it has finally arrived and delivered performance gains in Battlefield 4. Microsoft is not sleeping either: it has its own answer to Mantle, which we mentioned here.
We have heard some industry people call it DirectX 12 or DirectX Next, and it looks like Microsoft is finally getting ready to unveil the next generation of DirectX. From what we hear, it will address some of the driver-overhead problems tackled by Mantle, which is good news for the whole industry and, of course, for gamers.
AMD got back to us, officially stating that “AMD would like you to know that it supports and celebrates a direction for game development that is aligned with AMD’s vision of lower-level, ‘closer to the metal’ graphics APIs for PC gaming. While industry experts expect this to take some time, developers can immediately leverage efficient API design using Mantle.”
AMD also told us that we can expect some information about this at the Game Developers Conference, which starts on March 17th, less than two weeks from now.
We have a feeling that Microsoft is finally ready to talk about DirectX Next, DirectX 11.X, DirectX 12 or whatever it ends up calling it, and we would not be surprised to see Nvidia’s 20nm Maxwell chips support this API, as well as future GPUs from AMD, possibly 20nm parts as well.
Intel’s NUC is about to get its biggest overhaul yet later this year. The tiny barebones should get Broadwell-based Core i3 and Core i5 processors, but that’s not all.
It appears that Intel is planning to introduce completely redesigned boxes with plenty of new features. Codenamed Rock Canyon, the new NUC kits will feature miniHDMI and miniDP video outputs, allowing triple-display setups and 4K/UHD output.
In the storage department, Intel went for a standard 2.5-inch bay backed by an M.2 SSD slot. This means users will be able to use a small SSD as a system drive along with cheap mechanical storage. On the other hand, the M.2 form factor is anything but popular at this point. All new NUCs will feature USB 3.0, and in terms of connectivity they’ll have built-in WiFi and Bluetooth, an IR sensor for HTPC remote controls, and replaceable lids with NFC and wireless charging.
That’s not all though. Rock Canyon is the mainstream kit, but another one is on the way. Maple Canyon is a “professional” unit and it features Intel vPro technology and TPM hardware, but it does not have an IR sensor or lids with wireless charging.
So, while Broadwell probably won’t appear in the form of socketed desktop CPUs, you’ll still be able to buy a Broadwell desktop, albeit in NUC form.
Intel has released details about its new Xeon E7 v2 processor family. The Xeon processor E7 8800/4800/2800 v2 product family is designed to support up to 32-socket servers, with configurations of up to 15 processing cores and up to 1.5 terabytes of memory per socket.
The chip is designed for the big data end of the Internet of Things movement, which the processor maker projects will grow to at least 30 billion devices by 2020. Beyond roughly double the performance, Intel is promising a few other upgrades with this generation of the data-focused family, including triple the memory capacity, four times the I/O bandwidth and the potential to reduce total cost of ownership by up to 80 percent.
The 15-core variants with the largest thermal envelope (155W) run at 2.8GHz with 37.5MB of cache and 8 GT/s QuickPath connectivity. The lowest-power models in the list have 105W TDPs and run at 2.3GHz with 24MB of cache and 7.2 GT/s of QuickPath bandwidth. There was also talk at ISSCC of 40W, 1.4GHz models, but they have not been announced yet.
Intel has signed on nearly two dozen hardware partners to support the platform, including Asus, Cisco, Dell, EMC, and Lenovo. On the software end, Microsoft, SAP, Teradata, Splunk, and Pivotal also already support the new Xeon family. IBM and Oracle are among the few that support Xeon E7 v2 on both the hardware and software sides.
GPU shipments in the fourth quarter of 2013 were in the green. Shipments were up 2 percent year-on-year and 1.6 percent sequentially. However, AMD did not have a stellar quarter. According to Jon Peddie Research, AMD’s overall unit shipments were down 10.4 percent last quarter. Intel gained 5.1 percent, while Nvidia was up 3.4 percent.
The attach rate was 137 percent, and 34 percent of all PCs sold in Q4 featured discrete graphics, while 66 percent relied solely on embedded graphics. The research firm pointed out that the overall PC market grew 1.8 percent quarter-on-quarter, but was still down 8.5 percent compared to a year ago.
“The one bright spot in the PC market has been the growth of gaming PCs, where discrete GPUs play a significant role. The CAGR for total PC graphics from 2013 to 2017 is -1.3%. In 2013, 446 million GPUs were shipped, and the forecast for 2017 is 422 million,” Jon Peddie Research said.
AMD’s shipments of desktop APUs were up 15 percent sequentially, but they dropped 26.7 percent in notebooks. AMD’s discrete desktop shipments increased 1.8 percent, while discrete notebook shipments were down 6.7 percent. Overall AMD’s PC graphics shipments were down 10.4 percent.
“Notebook build cycles are specific, and AMD was late with its new parts,” the researchers pointed out.
Nvidia’s desktop shipments were up 3.6 percent quarter-on-quarter and its notebook discrete shipments increased 3.2 percent. Overall Nvidia’s PC GPU shipments were up 3.4 percent.
Oxide Games’ Dan Baker is getting excited about Mantle in the upcoming game Star Swarm. He told Maximum PC that Mantle isn’t just a low-level API that’s close to the metal; compared to DirectX, it simply sits lower in the overall software stack.
Baker said that Mantle still abstracts the details of the shader cores themselves, so that it is not clear whether it is running on a vector machine or a scalar machine. However, what isn’t abstracted is the basic way a GPU operates, he said. “The GPU is another processor, just like any other, that reads and writes memory. One thing that has happened is that GPUs are now general in terms of functionality. They can read memory anywhere. They can write memory anywhere.”
Mantle puts the responsibility onto the developer. Some feel that is too much, but this really is not any different from managing multiple CPUs on a system, which Oxide has gotten good at. “Oxide does not program multiple CPUs with an API, it just does it itself. Mantle gives us a similar capability for the GPU,” he said. When asked about performance in Star Swarm, Baker indicated that it will depend on how exploitative you are and on the specifics of the engine. In the case of Star Swarm, the team was limited in what it could do by driver overhead problems, and in places it traded GPU performance for CPU performance.
Baker said that the Direct3D performance of the game is absolutely outstanding, since the team has spent a huge amount of time optimising around D3D and is biased in D3D’s favour. “Mantle, on the other hand, we’ve spent far less time with and currently have only pretty basic optimizations. But Mantle is such an elegant API that it still dwarfs our D3D performance,” Baker said.
Ubuntu will not offer cross-platform apps as soon as it had hoped.
Canonical had raised hopes that its plan for Ubuntu to span PCs and mobile devices would be realised with the upcoming Ubuntu 14.04 release, providing a write-once, run-on-many template similar to that planned by Google for its Chrome OS and Android app convergence.
This is already possible on paper and the infrastructure is in place on smartphone and tablet versions of Ubuntu through its new Unity 8 user interface.
However, Canonical has decided to postpone the rollout of Unity 8 for desktop machines, citing security concerns, and it will now not arrive until it ships alongside the Mir display server this coming autumn.
This will apply only to apps in the Ubuntu store, and in the true spirit of open source, anyone choosing to step outside that ecosystem will be able to test the converged Ubuntu before then.
Ubuntu community manager Jono Bacon told Ars Technica, “We don’t plan on shipping apps in the new converged store on the desktop until Unity 8 and Mir lands.
“The reason is that we use app insulation to (a) run apps securely and (b) not require manual reviews (so we can speed up the time to get apps in the store). With our plan to move to Mir, our app insulation doesn’t currently insulate against X apps sniffing events in other X apps. As such, while Ubuntu SDK apps in click packages will run on today’s Unity 7 desktop, we don’t want to make them readily available to users until we ship Mir and have this final security consideration in place.
“Now, if a core-dev or motu wants to manually review an Ubuntu SDK app and ship it in the normal main/universe archives, the security concern is then taken care of with a manual review, but we are not recommending this workflow due to the strain of manual reviews.”
As well as the aforementioned security issues, there are still concerns that cross-platform apps don’t look quite as good on the desktop as native desktop versions, and the intervening six months will be used to polish the user experience.
Getting the holistic experience right is essential for Ubuntu in order to attract OEMs to the converged operating system. Canonical’s attempt to crowdfund its own Ubuntu handset fell short of its ambitious $20m target, despite raising $10.2 million, the single largest crowdfunding total to date.
Samsung has joined Google, Mellanox, Nvidia and other tech companies as part of IBM’s OpenPower Consortium. The OpenPower Consortium is working toward giving developers access to an expanded and open set of server technologies to improve data centre hardware using chip designs based on the IBM Power architecture.
Last summer, IBM announced the formation of the consortium, following its decision to license the Power architecture. The OpenPower Foundation, the actual entity behind the consortium, opened up the Power architecture technology, including specs, firmware and software under a license. Firmware is offered as open source. Originally, OpenPower was the brand of a range of System p servers from IBM that utilized the Power5 CPU. Samsung’s products currently utilize both x86 and ARM-based processors.
The intention of the consortium is to develop advanced servers, networking, storage and GPU-acceleration technology for new products. The four priority technical areas for development are system software, application software, open server development platform and hardware architecture. Along with its announcement of Samsung’s membership, the organization said that Gordon MacKean, Google’s engineering director of the platforms group, will now become chairman of the group. Nvidia has said it will use its graphics processors on Power-based hardware, and Tyan will be releasing a Power-based server, the first one outside IBM.
Red Hat and Hortonworks have expanded their strategic partnership surrounding Hadoop and Big Data.
At a joint press briefing yesterday the partners announced innovations including Hortonworks Data Platform (HDP) on Red Hat Storage, Red Hat Enterprise Linux with OpenJDK and Red Hat JBoss Data Virtualisation with HDP.
The companies will offer a joint support service for customers to take advantage of the alliance, with Red Hat customers able to enjoy Hortonworks’ Hadoop specialist knowledge.
Hortonworks VP of corporate strategy Shaun Connolly said, “This enables you to pull data from multiple data sources including NoSQL, enterprise applications, and now Hadoop. It normalises the data model and makes it easier for developers to write apps.”
“People want to consume data in Hadoop using their existing skills and tools when building new analytic apps. This integration speaks to enabling business analysts and application developers in the enterprise today.”
The two firms have been working behind the scenes to combine their product stacks with the goal of allowing users of both systems to inject Hadoop workloads into the combined stack.
Red Hat VP of Storage and Big Data Ranga Rangachari said, “Now we truly have a best-of-breed solution that is truly open as well as giving customers choice.”
Some elements of the partnership are still in beta while others are ready for use now, with further announcements expected in the near future.
The news follows Red Hat’s recent announcement of its latest testing release, Fedora 20, codenamed Heisenbug.
Defense contractor Lockheed Martin has successfully demonstrated a 30-kilowatt fiber laser for the battlefield. One of the most powerful lasers of its kind, the system is regarded as a major step toward getting directed-energy weapons onto the battlefield.
The 30-kilowatt beam combines many fibre lasers operating at slightly different wavelengths into a single “near perfect” band of light. Lockheed says the upgraded system produced the highest power it has ever documented while retaining beam quality and electrical efficiency, using half the electrical power of comparable solid-state lasers. The idea is that such systems could be installed on military platforms such as aircraft, helicopters, ships and trucks.
However, a system of around 100 kilowatts is needed to destroy military targets like incoming artillery rounds or drones. It will also have to maintain near-perfect beam quality over long distances, and electrical efficiency will be crucial to ensuring the system can be cooled effectively and kept manageable in size. The next stage is to develop a 60-kilowatt laser.
It should be noted that using lasers against humans isn’t exactly legal: although it can technically be done with relative ease, the use of blinding laser weapons is prohibited. There are already a number of experimental laser systems designed to take out missiles, along with smaller systems used to wreck sensitive optical sensors on military gear such as main battle tanks.
Scientists have emerged from their smoke-filled labs with transparent thin-film organic semiconductors that could become the foundation for cheap, high-performance displays. Two university research teams have worked together to produce the world’s fastest thin-film organic transistors, proving that this experimental technology has the potential to achieve the performance needed for high-resolution television screens and similar electronic devices.
According to the latest issue of Nature Communications, engineers from the University of Nebraska-Lincoln (UNL) and Stanford University show how they created thin-film organic transistors that could operate more than five times faster than previous examples of this experimental technology.
Research teams led by Zhenan Bao, professor of chemical engineering at Stanford, and Jinsong Huang, assistant professor of mechanical and materials engineering at UNL, used their new process to make organic thin-film transistors with electronic characteristics comparable to those found in expensive, curved-screen television displays based on a form of silicon technology.
At the moment the high-tech method is to drop a special solution, containing carbon-rich molecules and a complementary plastic, onto a spinning platter made of glass. The spinning action deposits a thin coating of the materials over the platter. The boffins worked out that if they spun the platter faster and coated only a tiny portion of the spinning surface, equivalent in size to a postage stamp, they could put a denser concentration of the organic molecules into a more regular alignment. The result was a great improvement in carrier mobility, which measures how quickly electrical charges travel through the transistor.
Researchers at Purdue University have designed AI chips to run complex neural networks, which could let personal devices make sense of the world; the idea is that the designs might end up in smartphones.
The big idea is to commercialize a chip design that helps mobile processors make use of the AI method known as deep learning. Deep learning has inspired companies including Google, Facebook and Baidu to invest in the technology, but so far it has been limited to large clusters of high-powered computers.
Eugenio Culurciello, a professor at Purdue working on the project, said that being able to implement deep learning in more compact and power-efficient ways could lead to smartphones and other mobile devices that can understand the content of images and video.
At the Neural Information Processing Systems conference in Nevada, the group demonstrated that a co-processor connected to a conventional smartphone processor could help it run deep learning software. The software was able to detect faces or label parts of a street scene. The co-processor’s design was tested on an FPGA, a reconfigurable chip that can be programmed to test a new hardware design without the considerable expense of fabricating a completely new chip.
This is less powerful than systems like Google’s cat detector, but it shows how new forms of hardware could make it possible to use the power of deep learning more widely.
As we reported earlier, AMD scored a major coup with the new Mac Pro. The lovely bucket from Cupertino features AMD graphics, which came as a surprise in many circles, as Nvidia is the dominant player in the professional graphics market.
The high-profile design win is expected to generate quite a bit of cash for AMD’s professional graphics business and it will also help boost its unimpressive market share. AMD currently holds about a fifth of the market, so for every FirePro card sold Nvidia manages to ship about four Quadros.
According to Digitimes, this will change next year. AMD could bump its share up to 30 percent by the end of 2014, thanks to Apple. Since we are talking about high-margin products, they should also boost AMD’s overall profitability in 2014.
AMD’s new FirePro S10000 and Sky series server products are competitive, too. They use PCI Express Gen 3, while Nvidia’s Tesla K20C is on PCI Express Gen 2. AMD uses OpenCL, while Nvidia uses its own proprietary CUDA platform.
In addition, AMD has a habit of pricing its professional products more aggressively than Nvidia. Earlier this year we were told by some AMD reps that the company hopes to gain ground in the professional space, but then again AMD has been saying that for years. This time around AMD seems to have a good chance of eroding Nvidia’s lead. However, Nvidia won’t take this lying down.
Haswell launched in June 2013 across the whole desktop portfolio and was available in a wide range of products, with chips ranging from 15W to 84W. This won’t be the case for Broadwell.
New high-end desktop processors, including the replacement for the Core i7 4770K, will still come from the house of Haswell under the Haswell Refresh banner, while Broadwell comes to the desktop in a different form. It turns out that Broadwell for desktop will replace Haswell 28W and 15W desktop parts, mostly targeting all-in-one systems as well as home entertainment living room PCs and nettops. It is still unclear whether we will see NUC products based on Broadwell.
The highest-end Haswell-based Core i7 to be replaced by Broadwell is the Core i7 4558U, a dual-core clocked at 2.8GHz with 4MB of cache that reaches up to 3.3GHz with Turbo. The processor supports four threads and has a maximum TDP of 28W. With the Broadwell version, performance should rise and the TDP might actually go down.
At some point in the second half of 2014 Broadwell comes to the 15W segment to replace the 1.7GHz dual-core Core i7 4650U, which has 4MB of cache and a max turbo frequency of 3.3GHz. Even with a 22nm Haswell core this CPU maxes out at 15W, and with the Broadwell replacement things might get even better.
All 15W and 28W all-in-one Haswell parts will be replaced by Broadwell, with more than a dozen SKUs. This is also confirmation that Broadwell doesn’t come as a traditional socketed CPU, as these parts cannot be user-replaced. The Core i7 4650U uses the FCBGA1168 package, which is not compatible with LGA 1150 desktop boards, and we suspect that Broadwell won’t be either.
We suspect Apple will be very interested in 28W and 15W Broadwell processors for its overpriced iMac series coming in the second half of 2014.
This basically means that the only way of getting a Broadwell desktop will be to buy an AIO, or alternatively small-form factor PCs and barebones based on mobile parts.
Red Hat has made available a beta of Red Hat Enterprise Linux 7 (RHEL 7) for testers, just weeks after the final release of RHEL 6.5 to customers.
RHEL 7 is aimed at meeting the requirements of future applications as well as delivering scalability and performance to power cloud infrastructure and enterprise data centers.
Available to download now, the RHEL 7 beta introduces a number of enhancements, including better support for Linux Containers, in-place upgrades, XFS as the default file system, improved networking support and improved compatibility with Windows networks.
Inviting customers, partners, and members of the public to download the RHEL 7 beta and provide feedback, Red Hat is promoting the upcoming version as its most ambitious release to date. The code is based on Red Hat’s community developed Fedora 19 distribution of Linux and the upstream Linux 3.10 kernel, the firm said.
“Red Hat Enterprise Linux 7 is designed to provide the underpinning for future application architectures while delivering the flexibility, scalability, and performance needed to deploy across bare metal, virtual machines, and cloud infrastructure,” Senior Product Marketing Manager Kimberly Craven wrote on the Red Hat Enterprise Linux blog.
These improvements address a number of key areas, including virtualisation, management and interoperability.
Linux Containers, for example, were partially supported in RHEL 6.5, but this release enables applications to be created and deployed using Linux Container technology such as the Docker tool. Containers offer operating-system-level virtualisation, which provides isolation between applications without the overhead of virtualising the entire server.
Red Hat said it is now supporting an in-place upgrade feature for common server deployment types. This will allow customers to migrate existing RHEL 6.5 systems to RHEL 7 without downtime.
RHEL 7 also makes the switch to XFS as its default file system, supporting file systems of up to 500TB, while ext4 file systems are now supported up to 50TB in size and B-tree file system (btrfs) implementations are available for users to test.
Interoperability with Windows has also been improved, with Red Hat now including the ability to bridge Windows and Linux infrastructure by integrating RHEL 7 and Samba 4.1 with Microsoft Active Directory domains. Red Hat Enterprise Linux Identity Management can also be deployed in a parallel trust zone alongside Active Directory, the firm said.
On the networking side, RHEL 7 provides support for 40Gbps Ethernet, along with improved channel bonding, TCP performance improvements and low latency socket poll support.
Other enhancements include support for very large scale storage configurations, including enterprise storage arrays, and uniform management tools for networking, storage, file systems, identities and security using the OpenLMI framework.