Apache Spark 2.0.0 is the first release on the 2.x line. The major updates are API usability, SQL 2003 support, performance improvements, structured streaming, R UDF support, as well as operational improvements. In addition, this release includes over 2500 patches from over 300 contributors.
To download Apache Spark 2.0.0, visit the downloads page. You can consult JIRA for the detailed changes. We have curated a list of high level changes here, grouped by major modules.
Apache Spark 2.0.0 is the first release in the 2.X major line. Spark is guaranteeing stability of its non-experimental APIs for all 2.X releases. Although the APIs have stayed largely similar to 1.X, Spark 2.0.0 does have API breaking changes. They are documented in the Removals, Behavior Changes and Deprecations section.
One of the largest changes in Spark 2.0 is the new updated APIs:
Spark 2.0 substantially improved SQL functionalities with SQL2003 support. Spark SQL can now run all 99 TPC-DS queries. More prominently, we have improved:
In addition, when building without Hive support, Spark SQL should have almost all the functionality as when building with Hive support, with the exception of Hive connectivity, Hive UDFs, and script transforms.
The DataFrame-based API is now the primary API. The RDD-based API is entering maintenance mode. See the MLlib guide for details
This talk lists many of these new features.
Vectors and Matrices stored in DataFrames now use much more efficient serialization, reducing overhead in calling MLlib algorithms. (SPARK-14850)
The largest improvement to SparkR in Spark 2.0 is user-defined functions. There are three user-defined functions: dapply, gapply, and lapply. The first two can be used to do partition-based UDFs using dapply and gapply, e.g. partitioned model learning. The latter can be used to do hyper-parameter tuning.
In addition, there are a number of new features:
Spark 2.0 ships the initial experimental release for Structured Streaming, a high level streaming API built on top of Spark SQL and the Catalyst optimizer. Structured Streaming enables users to program against streaming sources and sinks using the same DataFrame/Dataset API as in static data sources, leveraging the Catalyst optimizer to automatically incrementalize the query plans.
For the DStream API, the most prominent update is the new experimental support for Kafka 0.10.
There are a variety of changes to Spark’s operations and packaging process:
The following features have been removed in Spark 2.0:
The following changes might require updating existing applications that depend on the old behavior or API.
For a more complete list, please see SPARK-11806 for deprecations and removals.
The following features have been deprecated in Spark 2.0, and might be removed in future versions of Spark 2.x:
Last but not least, this release would not have been possible without the following contributors: Aaron Tokhy, Abhinav Gupta, Abou Haydar Elias, Abraham Zhan, Adam Budde, Adam Roberts, Ahmed Kamal, Ahmed Mahran, Alex Bozarth, Alexander Ulanov, Allen, Anatoliy Plastinin, Andrew, Andrew Ash, Andrew Or, Andrew Ray, Anthony Truchet, Anton Okolnychyi, Antonio Murgia, Arun Allamsetty, Azeem Jiva, Ben McCann, BenFradet, Bertrand Bossy, Bill Chambers, Bjorn Jonsson, Bo Meng, Brandon Bradley, Brian O’Neill, BrianLondon, Bryan Cutler, Burak Köse, Burak Yavuz, Carson Wang, Cazen, Cedar Pan, Charles Allen, Cheng Hao, Cheng Lian, Claes Redestad, CodingCat, Cody Koeninger, DB Tsai, DLucky, Daniel Jalova, Daoyuan Wang, Darek Blasiak, David Tolpin, Davies Liu, Devaraj K, Dhruve Ashar, Dilip Biswal, Dmitry Erastov, Dominik Jastrzębski, Dongjoon Hyun, Earthson Lu, Egor Pakhomov, Ehsan M.Kermani, Ergin Seyfe, Eric Liang, Ernest, Felix Cheung, Feynman Liang, Fokko Driesprong, Fonso Li, Franklyn D’souza, François Garillot, Fred Reiss, Gabriele Nizzoli, Gary King, GayathriMurali, Gio Borje, Grace, Greg Michalopoulos, Grzegorz Chilkiewicz, Guillaume Poulin, Gábor Lipták, Hemant Bhanawat, Herman van Hovell, Hiroshi Inoue, Holden Karau, Hossein, Huaxin Gao, Hyukjin Kwon, Imran Rashid, Imran Younus, Ioana Delaney, Iulian Dragos, Jacek Laskowski, Jacek Lewandowski, Jakob Odersky, James Lohse, James Thomas, Jason Lee, Jason Moore, Jason White, Jean Lyn, Jean-Baptiste Onofré, Jeff L, Jeff Zhang, Jeremy Derr, JeremyNixon, Jia Li, Jo Voordeckers, Joan, Jon Maurer, Joseph K. Bradley, Josh Howes, Josh Rosen, Joshi, Juarez Bochi, Julien Baley, Junyang, Junyang Qian, Jurriaan Pruis, Kai Jiang, KaiXinXiaoLei, Kay Ousterhout, Kazuaki Ishizaki, Kevin Yu, Koert Kuipers, Kousuke Saruta, Koyo Yoshida, Krishna Kalyan, Lewuathe, Liang-Chi Hsieh, Lianhui Wang, Lin Zhao, Lining Sun, Liu Xiang, Liwei Lin, Liye, Luc Bourlier, Luciano Resende, Lukasz, Maciej Brynski, Malte, Maciej Szymkiewicz, Marcelo Vanzin, Marcin Tustin, Mark Grover, Mark Yang, Martin Menestret, Masayoshi TSUZUKI, Matei Zaharia, Mathieu Longtin, Matthew Wise, Miao Wang, Michael Allman, Michael Armbrust, Michael Gummelt, Michel Lemay, Mike Dusenberry, Mortada Mehyar, Nakul Jindal, Nam Pham, Narine Kokhlikyan, Neelesh Srinivas Salian, Nezih Yigitbasi, Nicholas Chammas, Nicholas Tietz, Nick Pentreath, Nilanjan Raychaudhuri, Nirman Narang, Nishkam Ravi, Nong, Nong Li, Oleg Danilov, Oliver Pierson, Oscar D. Lara Yejas, Parth Brahmbhatt, Patrick Wendell, Pete Robbins, Peter Ableda, Pierre Borckmans, Prajwal Tuladhar, Prashant Sharma, Pravin Gadakh, QiangCai, Qifan Pu, Raafat Akkad, Rahul Tanwani, Rajesh Balamohan, Rekha Joshi, Reynold Xin, Richard W. Eggert II, Robert Dodier, Robert Kruszewski, Robin East, Ruifeng Zheng, Ryan Blue, Sachin Aggarwal, Saisai Shao, Sameer Agarwal, Sandeep Singh, Sanket, Sasaki Toru, Sean Owen, Sean Zhong, Sebastien Rainville, Sebastián Ramírez, Sela, Sergiusz Urbaniak, Seth Hendrickson, Shally Sangal, Sheamus K. Parkes, Shi Jinkui, Shivaram Venkataraman, Shixiong Zhu, Shuai Lin, Shubhanshu Mishra, Sin Wu, Sital Kedia, Stavros Kontopoulos, Stephan Kessler, Steve Loughran, Subhobrata Dey, Subroto Sanyal, Sumedh Mungee, Sun Rui, Sunitha Kambhampati, Suresh Thalamati, Takahashi Hiroshi, Takeshi YAMAMURO, Takuya Kuwahara, Takuya UESHIN, Tathagata Das, Ted Yu, Tejas Patil, Terence Yim, Thomas Graves, Timothy Chen, Timothy Hunter, Tom Graves, Tom Magrino, Tommy YU, Travis Crawford, Tristan Reid, Victor Chima, Vijay Kiran, Villu Ruusmann, Wang Fei, Wayne Song, Wei Mao, WeichenXu, Weiqing Yang, Wenchen Fan, Wesley Tang, Wilson Wu, Wojciech Jurczyk, Xiangrui Meng, Xiao Li, Xin Ren, Xin Wu, Xinh Huynh, Xiu Guo, Xusen Yin, Yadong Qi, Yanbo Liang, Yang Bo., Yash Datta, Yin Huai, Yonathan Randolph, Yong Gang Cao, Y ong Tang, Yu ISHIKAWA, Yucai Yu, Yuhao Yang, Yury Liavitski, Zhang, Zheng RuiFeng, Zheng Tan, dding3, depend, echo2mei, fwang1, guoxu1231, huangzhaowei, hushan, jayadevanmurali, junhao, kaklakariada, mcheah, meiyoula, movelikeriver, nfraison, oraviv, peng.zhang, petermaxlee, prabs, pshearer, rotems, sandy, seddonm1, sharkd, thomastechs, wangfei, wangyang, wujian, yzhou2001, zhonghaihua, zhuol, zlpmichelle, Örjan Lundberg, Łukasz Gieroń