Joint Program: 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH)
and
4th International Workshop on Distributed Storage Systems and Coding for Big Data (DSSCB)
In conjunction with 2016 IEEE Conference on Big Data (IEEE BigData 2016)
Dec. 5-8, 2016 @ Washington D.C., USA
Introduction
This workshop aims to bridge the latest technology
developments in hardware and software with big data end users. The topics of the
workshop are centered on the accessibility and applicability of the latest
hardware and software to practical domain problems and education settings. The
workshop will discuss issues in facilitating data-driven discovery with the
latest software and hardware technologies for domain researchers, such as
performance evaluation, optimization, accessibility, usability, and education
of new technologies. We anticipate workshop participation from computer
scientists, domain users, service providers, technology inventors in industry,
as well as educators in computer science and computing technology. We intend to
invite cyber-infrastructure specialists to share their experience with the
latest hardware and software advancements, data scientists to share their
experiences and perspectives in using those technologies for data-driven
discovery, and educators to share their stories of teaching big data theory,
computing foundations, and essential tools and resources.
Data-intensive science has become the fourth paradigm in
science and brings a profound transformation of scientific research. Indeed,
data-driven discovery has already happened in various research fields, such as
earth sciences, medical sciences, biology and physics, to name a few. In brief, a vast volume of scientific data
captured by new instruments will be publicly accessible for the purposes of
continued and deeper data analysis. Big Data analytics will result in the development
of many new theories and discoveries but will also require substantial
computational resources in the process. However, the mainstream of many domain
sciences still relies mostly on traditional experimental paradigms. It is often
a major challenge in its own right to transform a solution working at a smaller scale
on a standalone server into a massively parallel one running on tens, hundreds,
or even thousands of servers. It is crucial to make the latest
technology advancements in software and hardware accessible and usable to
domain scientists, especially those in fields traditionally not
strong in computation and programming, who are the driving forces of scientific
discovery.
Fueled by big data analytics needs, new computing and
storage technologies are also developing rapidly and pushing for new high-end
hardware geared toward solving big data problems. This new hardware brings new
opportunities for performance improvement but also new challenges. The overall
performance bottleneck of a problem can shift, requiring a different
workload-balancing strategy, when a particular piece of hardware delivers a
significant performance boost. While those technologies have the potential to greatly
improve capabilities in big data analytics and make significant
contributions to data-driven science, their cost, their sophistication,
and limited initial application support often keep them remote from
end users and not fully utilized in academia even years later. It is therefore even
more important to make those technologies understood by and accessible to data
scientists early. Meanwhile, comprehensive open-source analytic software
environments and platforms, such as R and Python, are freely available and have
become increasingly popular for data analysis. Most data
scientists have experience with small to medium data, and Big Data now
poses its own challenges because of its size. Such software not only
provides a collection of analytic methods but also has the potential to utilize
new hardware transparently and reduce the effort required from the end user.
Following the success of the workshops held with the IEEE Big
Data conference in the past two years, we look forward to organizing
this workshop again with invited talks and peer-reviewed paper presentations.
We believe this workshop will bring technology innovators, service providers,
domain researchers, and computer science and computing educators together to
discuss research issues in the emerging field of data science, with a
particular focus on how to utilize the latest software and hardware
technologies to facilitate data-driven science. This unique combination of
opportunities and challenges will attract much attention from both academia and
industry. This workshop will directly contribute to facilitating data-driven
discoveries in the near future.
Topics of interest include, but are not limited to:
· Adopting the latest hardware technology for Big Data analytics
· Using high performance computing resources, cyber-infrastructures and large systems for Big Data to knowledge discovery
· New software schema designs and data models for big data collection, management and analysis
· Analysis, visualization, and retrieval on large-scale data sets
· Applications and use cases of novel tools and resources for Big Data in sciences and engineering
· Service oriented architectures to enable data science
· Big data and interactive analysis languages (e.g., R, Python, Scala, and Matlab)
· Demos of new software tools and hardware technologies
· Education of data theory, computing foundation, and data infrastructure for data science
Important dates
· Oct. 10, 2016: Due date for full workshop paper submission
· Nov. 5, 2016: Notification of paper acceptance to authors
· Nov. 15, 2016: Camera-ready of accepted papers
· Dec. 5-8, 2016: Workshops
Program chairs
· Weijia Xu (Texas Advanced Computing Center)
· Hui Zhang (University of Louisville)
· Hongfeng Yu (University of Nebraska)
Program Committee
· Weijia Xu (Texas Advanced Computing Center)
· Hui Zhang (University of Louisville)
· Dan Stanzione (Texas Advanced Computing Center)
· Eric Wernert (Pervasive Technology Institute/Indiana University)
· Nirav Merchant (University of Arizona)
· Hongfeng Yu (University of Nebraska)
· J. Ray Scott (Pittsburgh Supercomputing Center)
· Ian Foster (Argonne National Laboratory)
· George Ostrouchov (Oak Ridge National Lab/UTK)
· Jian Li (Huawei Technology Inc.)
· Avishkar Misra (Oracle Inc.)
· Dhabaleswar K. Panda (Ohio State University)
· Chaoli Wang (University of Notre Dame)
· Robert Hsu (Chung Hua University, Taiwan)
· Frank Zou (Worcester Polytechnic Institute)
· Guangchen Ruan (Research Technology, Indiana University)
Paper Submission
1) Papers should be formatted to 10 pages following the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see link to "formatting instructions" below).
2) Although we accept submissions in the form of PDF, PS, and DOC/RTF files, you are strongly encouraged
Final Program
Joint Program Schedule
Time | Title | Authors
8:15am – 8:30am | ASH & DSSCB opening remarks |
8:30am – 8:55am | A Scalable and Composable Map-Reduce System | Mahwish Arif, Hans Vandierendonck, Dimitrios S. Nikolopoulos, and Bronis R. de Supinski
8:55am – 9:20am | Big Data Analytics on HPC Architectures: Performance and Cost | Peter Xenopoulos, Jamison Daniel, Michael Matheson, and Sreenivas Sukumar
9:20am – 9:45am | Evaluation of K-Means Data Clustering Algorithm on Intel Xeon Phi | Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Nikos Hardavellas, and Alok Choudhary
9:45am – 10:00am | Building a Research Data Science Platform from Industrial Machines | Fang Liu and Duen Horng Chau
10:00am – 10:20am | Coffee Break |
10:20am – 10:45am | Visually Programming Dataflows for Distributed Data Analytics | Lauritz Thamsen, Thomas Renner, Marvin Byfeld, Markus Paeschke, Daniel Schröder, and Felix Böhm
10:45am – 11:05am | A Geohydrologic Data Visualization Framework with an Extendable User Interface Design | Yanfu Zhou, Jieting Wu, Lina Yu, Hongfeng Yu, and Zhenghong Tang
11:05am – 11:30am | Efficient Portfolio Allocation with Sparse Volatility Estimation for High-Frequency Financial Data | Jian Zou and Chuqin Huang
11:30am – 11:55am | Accelerating Mathematical Knot Simulations with R on the Web | Juan Lin, Hui Zhang, Di Zhong, and Yiwen Zhong
noon – 1:45pm | Lunch Break |
1:45pm – 2:10pm | A Workload Aware Model of Computational Resource Selection for Big Data Applications | Amit Gupta, Weijia Xu, Natalia Ruiz-Juri, and Kenneth Perrine
2:10pm – 2:35pm | Supporting Large Scale Connected Vehicle Data Analysis using Hive | Weijia Xu, Natalia Ruiz-Juri, Amit Gupta, Amanda Deering, Chandra Bhat, James Kuhr, and Jackson Archer
2:35pm – 3:00pm | Legion-based Scientific Data Analytics on Heterogeneous Processors | Lina Yu and Hongfeng Yu
3:00pm – 3:25pm | Materials Discovery: Understanding Polycrystals from Large-Scale Electron Patterns | Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Marc De Graef, and Alok Choudhary
3:30pm – 3:50pm | Coffee Break |
3:50pm – 4:15pm | Towards Optimizing Large-Scale Data Transfers with End-to-End Integrity Verification | Si Liu, Eun-Sung Jun, Rajkumar Kettimuthu, Xian-He Sun, and Michael Papka
4:15pm – 4:40pm | EStore: An Effective Optimized Data Placement Structure for Hive | Xin Li, Hui Li, Zhihao Huang, Bing Zhu, and Jiawei Cai
4:40pm – 5:05pm | SS-Dedup: A High Throughput Stateful Data Routing Algorithm for Cluster Deduplication System | Zhihao Huang, Hui Li, Xin Li, and Wei He
5:05pm – 5:30pm | CoLoc: Distributed Data and Container Colocation for Data-Intensive Applications | Thomas Renner, Lauritz Thamsen, and Odej Kao
5:30pm – 5:55pm | Persisting In-Memory Databases Using SCM | Ellis Giles, Kshitij Doshi, and Peter Varman
Registration
To attend the workshop, you will need to register with the IEEE Big Data 2016 Conference.
Hotel Information
IEEE Big Data 2016 will take place at the Hyatt Regency Washington on Capitol Hill from December 5-8, 2016. Discounted rates ($199 plus tax) are available until Friday, November 18, 2016 at 5pm EST, or until the room block is sold out, whichever comes first. Rooms at the headquarters hotel will fill up fast, so please book your reservation as soon as possible. Booking your room in the conference block allows IEEE Big Data 2016 to secure the meeting space and services required to host the meeting.
400 New Jersey Avenue, NW
Washington, D.C., USA, 20001
Tel: +1 202 737 1234
Rate: $199 S/D plus tax; cut-off 11/18/2016
A special link for Big Data reservations and the special group rate will be provided shortly. In the meantime, please visit the hotel website for more information.
Previous ASH Workshops
· ASH 2015
· ASH 2014