50 times faster than CPU? NVIDIA uses RAPIDS to open up new possibilities for machine learning
At the NVIDIA GTC conference on October 10, the company released RAPIDS, an open source GPU acceleration platform. For a company whose main label is "hardware", it was notable that founder Jensen Huang (Huang Renxun) devoted a large part of his GTC keynote to this software product and the background of its launch.
Shortly afterwards, a media briefing was held at NVIDIA's Beijing office, where Zhao Liwei, senior director of solution architecture for the Asia Pacific region, discussed the market background and key technical details of the GPU acceleration platform.
The $20 billion market that is easily overlooked
"In the field of data science, although AI and deep learning have been talked about more in the last two years, the machine learning market has existed for longer than the deep learning market. After ten to twenty years of development, it now has great market value, with a market capacity of about $20 billion," Zhao Liwei said. "If the data analysis (big data analysis) market is regarded as a segment of HPC, the market is larger still, at about US $36 billion."
At the same time, being "data driven" has become a way for more and more enterprises to strengthen their core competitiveness. For example, about one third of purchases on Amazon are driven by its recommendation system. Similar cases can be found in retail, insurance, finance and other fields. "Now, without data and the so-called big data decision support system, many business activities have become unimaginable."
So why did NVIDIA choose to launch such a product at this time? From the media briefing, the author picked out several keywords about RAPIDS: 1. It is open source; 2. It is a software platform; 3. It targets the data science and machine learning market.
Data scientists: either drinking coffee or on the way to drinking coffee, a life that RAPIDS will bring to an end
Data scientist is a job that everyone in the world wants, because data scientists used to have plenty of time to drink coffee: they were either drinking coffee or on their way to it. In the data preparation stage, the data set to be pulled down may be gigabytes or even terabytes in size. Once it is downloaded, it has to go through ETL: data extraction, data transformation and data loading. This is when the coffee break starts, because loading and computing over the entire data set is very time-consuming. For data scientists, that means plenty of time for a leisurely coffee. For IT managers and business managers, it is far less pleasant. When a decision depends on a result derived from the data, waiting dozens of hours, days or even weeks is unacceptable to enterprise decision-makers. "These judgments should be made in seconds or even milliseconds. Each judgment should already have been made by the time the user swipes," Zhao Liwei said.
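The three ETL stages mentioned above can be sketched in a few lines of plain Python. This is only an illustration on a tiny, made-up data set: the column names, the sample rows and the in-memory `warehouse` list are all assumptions, and the sketch uses the standard library rather than RAPIDS or any GPU library.

```python
import csv
import io

# Hypothetical raw data standing in for a downloaded data set.
RAW_CSV = """user_id,amount,country
1,19.99,us
2,,de
3,5.00,US
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, then fix types and case."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "user_id": int(row["user_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].upper(),
        })
    return cleaned


def load(rows, store):
    """Load: append cleaned rows to a store (a plain list in this sketch)."""
    store.extend(rows)
    return len(store)


warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse)
```

On a real data set of gigabytes or terabytes, each of these passes touches every row, which is why the stage is slow on a CPU-only system.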
The RAPIDS software platform helps data scientists significantly improve their work performance. "Data analysis and machine learning are the largest segment of the high-performance computing market, but they have not yet been accelerated," said Huang Renxun, founder and CEO of NVIDIA, in his keynote speech at the GPU Technology Conference. "The world's largest industries are running machine learning algorithms on a massive scale in order to understand complex patterns in their markets and environments, and to make fast, accurate predictions that directly affect their bottom line." Once data scientists use the acceleration, data loading and processing become much shorter. Data scientists can then take part in the analysis itself and give full play to their own initiative, and GPU acceleration can also help improve the accuracy of the analysis.
It is understood that RAPIDS provides a complete set of open source libraries for GPU-accelerated analytics and machine learning, with visualization as the next goal. For the first time, RAPIDS gives data scientists the tools they need to run the entire data science pipeline on GPUs. Initial RAPIDS benchmarks, which used the XGBoost machine learning algorithm for training on an NVIDIA DGX-2™ system, showed a 50x speedup compared with CPU-only systems. This can help data scientists reduce typical training times from days to hours, or from hours to minutes, depending on the size of their data set.
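To make the "days to hours, hours to minutes" claim concrete, here is a small arithmetic sketch. It assumes the 50x figure applies uniformly to the whole job, which the benchmark does not promise for every workload. The function name and the example durations (a three-day job and a ten-hour job) are hypothetical.

```python
def accelerated_hours(cpu_hours, speedup=50.0):
    """Return the run time in hours after applying an assumed uniform speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_hours / speedup


# A hypothetical three-day (72-hour) CPU training job.
three_day_job_hours = accelerated_hours(72)

# A hypothetical ten-hour CPU training job, expressed in minutes.
ten_hour_job_minutes = accelerated_hours(10) * 60

print(f"72 h on CPU -> {three_day_job_hours:.2f} h with a 50x speedup")
print(f"10 h on CPU -> {ten_hour_job_minutes:.0f} min with a 50x speedup")
```

Under that assumption, a three-day job drops to about an hour and a half and a ten-hour job to about twelve minutes, which matches the orders of magnitude quoted above.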
Two meanings of "open source"
It is understood that RAPIDS is built on popular open source projects such as Apache Arrow, pandas and scikit-learn, adding GPU acceleration to the most popular data science tools. To bring more machine learning libraries and functions into RAPIDS, NVIDIA has worked extensively with open source ecosystem contributors, including Anaconda, BlazingDB, Databricks, Quansight and scikit-learn, as well as Wes McKinney, head of Ursa Labs and creator of Apache Arrow and pandas, a rapidly growing Python data science library. According to Zhao Liwei, open source has two meanings here. The first is the close cooperation between RAPIDS and many open source communities. The second is that the RAPIDS platform itself is open source. "We hope that in this way more friends from the open source world will contribute their code, share their wisdom, and keep improving and enriching the basic features of the whole platform, so that it can serve more scenarios in the future."
In addition to NVIDIA's DGX-2, DGX-1 and DGX Station, RAPIDS also supports a number of server products based on HGX-1 and HGX-2.
Copyright © 2011 JIN SHI