Joint Program: 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH)

and

4th International Workshop on Distributed Storage Systems and Coding for Big Data (DSSCB)

In conjunction with 2016 IEEE Conference on Big Data (IEEE BigData 2016)

Dec. 5-8, 2016 @ Washington D.C., USA

Introduction

This workshop aims to bridge the latest technology developments in hardware and software with big data end users. The topics of the workshop center on the accessibility and applicability of the latest hardware and software to practical domain problems and educational settings. The workshop will discuss issues in facilitating data-driven discovery with the latest software and hardware technologies for domain researchers, such as performance evaluation, optimization, accessibility, usability, and education in new technologies. We anticipate workshop participation from computer scientists, domain users, service providers, technology inventors in industry, as well as educators in computer science and computing technology. We intend to invite cyber-infrastructure specialists to share their experience with the latest hardware and software advancements, data scientists to share their experiences and perspectives in using those technologies for data-driven discovery, and educators to share their stories in teaching big data theory, computing foundations, and essential tools and resources.

Data-intensive science has become the fourth paradigm in science and brings a profound transformation of scientific research. Indeed, data-driven discovery has already happened in various research fields, such as earth sciences, medical sciences, biology, and physics, to name a few. In brief, a vast volume of scientific data captured by new instruments will be publicly accessible for the purposes of continued and deeper data analysis. Big Data analytics will result in the development of many new theories and discoveries but will also require substantial computational resources in the process. However, the mainstream of many domain sciences still relies mostly on traditional experimental paradigms. It is often a major challenge in its own right to transform a solution working at a smaller scale on a standalone server into a massively parallel one running on tens, hundreds, or even thousands of servers. It is crucial to make the latest technology advancements in software and hardware accessible and usable to the domain scientists, especially those in fields traditionally not strong in computation and programming, who are the driving forces of scientific discovery.

Fueled by the needs of big data analytics, new computing and storage technologies are also developing rapidly and pushing for new high-end hardware geared toward solving big data problems. This new hardware brings new opportunities for performance improvement but also new challenges. The overall performance bottleneck of a problem can shift, requiring a different workload-balancing strategy, because of a significant performance boost in a particular piece of hardware. While those technologies have the potential to greatly improve capabilities in big data analytics and make significant contributions to data-driven science, the costs and sophistication of those technologies and their limited initial application support often make them remote to end users and not fully utilized in academia even years later. It is therefore all the more important to make those technologies understood by and accessible to data scientists early. Meanwhile, comprehensive open-source analytic software environments and platforms, such as R and Python, are freely available and have become increasingly popular for data analysis. Most data scientists have had experience with small to medium data, and now Big Data poses its own challenges in terms of its size. Such software not only provides a collection of analytic methods but also has the potential to utilize new hardware transparently and ease the effort required from the end user.

Following the success of the workshop held with the IEEE Big Data conference in the past two years, we are looking forward to organizing this workshop again with invited talks and peer-reviewed paper presentations. We believe this workshop will bring technology innovators, service providers, domain researchers, and computer science and computing educators together to discuss the research issues in the emerging field of data science, with a particular focus on how to utilize the latest software and hardware technologies to facilitate data-driven science. This unique combination of opportunities and challenges will attract much attention from both academia and industry. This workshop will directly contribute to facilitating data-driven discoveries in the near future.

Topics of interest include, but are not limited to

·         Adopting the latest hardware technology for Big Data analytics

·         Using high performance computing resources, cyber-infrastructures, and large systems for Big Data to knowledge discovery

·         New software schema designs and data models for big data collection management and analysis

·         Analysis, visualization, and retrieval on large-scale data sets

·         Applications and use cases of novel tools and resources for Big Data in science and engineering

·         Service oriented architectures to enable data science

·         Big data and interactive analysis languages (e.g., R, Python, Scala, and Matlab)

·         Demos of new software tools and hardware technologies

·         Education of data theory, computing foundation, and data infrastructure for data science

 

Important dates

·         Oct. 10, 2016: Due date for full workshop paper submission

·         Nov. 5, 2016: Notification of paper acceptance to authors 

·         Nov. 15, 2016: Camera-ready versions of accepted papers due

·         Dec. 5-8, 2016: Workshops

 

Program chairs

·         Weijia Xu (Texas Advanced Computing Center)

·         Hui Zhang (University of Louisville)

·         Hongfeng Yu (University of Nebraska)

 

 

Program Committee

·         Weijia Xu (Texas Advanced Computing Center)

·         Hui Zhang (University of Louisville)

·         Dan Stanzione (Texas Advanced Computing Center) 

·         Eric Wernert (Pervasive Technology Institute/Indiana University)

·         Nirav Merchant (University of Arizona)

·         Hongfeng Yu (University of Nebraska)

·         J. Ray Scott (Pittsburgh Supercomputing Center)

·         Ian Foster (Argonne National Laboratory)

·         George Ostrouchov (Oak Ridge National Lab/UTK)

·         Jian Li (Huawei Technology Inc.)

·         Avishkar Misra (Oracle Inc.)

·         Dhabaleswar K. Panda (Ohio State University)

·         Chaoli Wang (University of Notre Dame)

·         Robert Hsu (Chung Hua University, Taiwan)

·         Frank Zou (Worcester Polytechnic Institute)

·         Guangchen Ruan (Research Technology, Indiana University)

 

 

Paper Submission

Submit your paper to ASH’16.

1) Papers should be formatted to 10 pages following the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see "Formatting Instructions" below).

Formatting Instructions:
8.5" x 11" (DOC, PDF)
LaTeX Formatting Macros

2) Although we accept submissions in the form of PDF, PS, and DOC/RTF files, you are strongly encouraged to generate a PDF version for your paper submission if your paper was prepared in Word.

Final Program

Joint Program Schedule

 

8:15am – 8:30am

ASH & DSSCB opening remarks

slides

8:30am – 8:55am

A Scalable and Composable Map-Reduce System

Mahwish Arif, Hans Vandierendonck, Dimitrios S. Nikolopoulos, and Bronis R. de Supinski

slides

8:55am – 9:20am

Big Data Analytics on HPC Architectures: Performance and Cost

Peter Xenopoulos, Jamison Daniel, Michael Matheson, and Sreenivas Sukumar

slides

9:20am – 9:45am

Evaluation of K-Means Data Clustering Algorithm on Intel Xeon Phi

Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Nikos Hardavellas, and Alok Choudhary

slides

9:45am – 10:00am

Building a Research Data Science Platform from Industrial Machines

Fang Liu and Duen Horng Chau

slides

10:00am – 10:20am

Coffee Break

 

10:20am – 10:45am

Visually Programming Dataflows for Distributed Data Analytics

Lauritz Thamsen, Thomas Renner, Marvin Byfeld, Markus Paeschke, Daniel Schröder, and Felix Böhm

slides

10:45am – 11:05am

A Geohydrologic Data Visualization Framework with an Extendable User Interface Design

Yanfu Zhou, Jieting Wu, Lina Yu, Hongfeng Yu, and Zhenghong Tang

 

11:05am – 11:30am

Efficient Portfolio Allocation with Sparse Volatility Estimation for High-Frequency Financial Data

Jian Zou and Chuqin Huang

 

11:30am – 11:55am

Accelerating Mathematical Knot Simulations with R on the Web

Juan Lin, Hui Zhang, Di Zhong, and Yiwen Zhong

slides

noon – 1:45pm

Lunch Break

 

1:45pm – 2:10pm

A Workload Aware Model of Computational Resource Selection for Big Data Applications

Amit Gupta, Weijia Xu, Natalia Ruiz-Juri, and Kenneth Perrine

slides

2:10pm – 2:35pm

Supporting Large Scale Connected Vehicle Data Analysis using Hive

Weijia Xu, Natalia Ruiz-Juri, Amit Gupta, Amanda Deering, Chandra Bhat, James Kuhr, and Jackson Archer

 

2:35pm – 3:00pm

Legion-based Scientific Data Analytics on Heterogeneous Processors

Lina Yu and Hongfeng Yu

slides

3:00pm – 3:25pm

Materials Discovery: Understanding Polycrystals from Large-Scale Electron Patterns

Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Marc De Graef, and Alok Choudhary

slides

3:30pm – 3:50pm

Coffee Break

 

3:50pm – 4:15pm

Towards Optimizing Large-Scale Data Transfers with End-to-End Integrity Verification

Si Liu, Eun-Sung Jung, Rajkumar Kettimuthu, Xian-He Sun, and Michael Papka

 

4:15pm – 4:40pm

EStore: An Effective Optimized Data Placement Structure for Hive

Xin Li, Hui Li, Zhihao Huang, Bing Zhu, and Jiawei Cai

 

4:40pm – 5:05pm

SS-Dedup: A High Throughput Stateful Data Routing Algorithm for Cluster Deduplication System

Zhihao Huang, Hui Li, Xin Li, and Wei He

 

5:05pm – 5:30pm

CoLoc: Distributed Data and Container Colocation for Data-Intensive Applications

Thomas Renner, Lauritz Thamsen, and Odej Kao

 

5:30pm – 5:55pm

Persisting In-Memory Databases Using SCM

Ellis Giles, Kshitij Doshi, and Peter Varman

 

 

Registration

To attend the workshop, you will need to register with the IEEE Big Data 2016 Conference.

 

Hotel Information

IEEE Big Data 2016 will take place at the Hyatt Regency Washington on Capitol Hill from December 5-8, 2016. Discounted rates ($199 plus tax) are available until Friday, November 18, 2016 at 5pm EST, or until the room block is sold out, whichever comes first. Rooms at the headquarters hotel will fill up fast, so please book your reservation as soon as possible. Booking your room in the conference block allows IEEE Big Data 2016 to secure the meeting space and services required to host the meeting.

Hyatt Regency Washington on Capitol Hill

400 New Jersey Avenue, NW
Washington, D.C., USA, 20001
Tel: +1 202 737 1234
Rate: $199 S/D plus tax; cut-off date 11/18/2016

A special link for Big Data reservations and the special group rate will be provided shortly. In the meantime, please visit the hotel website for more information.

 

Previous ASH Workshops

·         ASH 2015

·         ASH 2014