12 Dec 2016
2016 Workshop on Non-volatile Memory and Its Application
Date: Monday, Dec. 12, 2016
Venue: E202, State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Contact: Dr. LI Qingan, qingan@whu.edu.cn
Time | Title & Speaker |
---|---|
9:00 - 9:30 | Flash Memory Performance and Reliability Optimization: From Chips, Controllers to Systems Prof. Chun Jason Xue, City University of Hongkong |
9:30 - 10:00 | Design Software and Systems for Non-volatile Processors on Self-Powered Devices Dr. Jingtong Hu, Oklahoma State University |
10:00 - 10:20 | Improving LDPC Performance Via Asymmetric Sensing Level Placement on Flash Memory Dr. Liang Shi, Chongqing University |
10:20 - 10:30 | Break |
10:30 - 10:50 | Near-storage processing for data-intensive applications Dr. Jianhua Li, Hefei University of Technology |
10:50 - 11:10 | Heterogeneous architecture for convolutional neural network (CNN) Dr. Keni Qiu, Capital Normal University |
11:10-11:30 | Designing FPGAs using non-volatile memory technologies Dr. Mengying Zhao, Shandong University |
Flash Memory Performance and Reliability Optimization: From Chips, Controllers to Systems
In this talk, I will present our recent works on Flash memory for performance and reliability optimization. This talk will include three parts. The first part will present the background and related works on Flash chip level design. As the quality and reliability of the data written is highly correlated with the Flash programming speed, how to exploit these characteristics are important for Flash memory based storage systems. In addition, state-of-the-art Flash memory is experiencing degraded reliability together with significant process variations. To cope with these issues, several works will be presented which optimize performance and reliability for Flash memories. The second part will present the work on Flash controller design, including parallelism exploration for performance and error correction codes optimizations for reliability. We are especially interested in Low Density Parity Codes, which have good error correction capability, but with high cost in terms of decoding performance. We will also discuss how to exploit the significant process variations for performance and lifetime improvement. The third part will discuss the work on system level of flash based storage devices. Current systems of flash storages have not yet fully exploited the characteristics of flash memory. In this part, I will presents the work on fragmentation issues on smartphone and discuss a cross layer supported I/O scheduling algorithm to achieve performance improvement.
Dr. Chun Jason Xue is an Associate Professor at City University of Hong Kong Computer Science Department. His research interests include non-volatile memories, embedded and realtime systems. He is currently Associate Editor for ACM Transaction on Embedded Computing Systems, Associate Editor for ACM Transaction on CPS, Associated Editor for ACM Transaction on Storage. He was the TPC co-chair for LCTES 2015, TPC co-char for ISVLSI 2016, and has served as TPC members in premiere conferences such as DAC, DATE, RTSS, RTAS, CODES, EMSOFT and ISLPED.
Design Software and Systems for Non-volatile Processors on Self-Powered Devices
The vision of Internet of Things (IoT) is to connect “things” embedded with electronics, software, sensors, and connectivity to enable objects to exchange data with the operators and other connected devices, resulting in improved social/economic benefits and overall human well-being. It is estimated that the IoT will consist of almost 50 billion objects by 2020. While the vision is promising and exciting, it is challenging to power these 50 billion embedded devices. Energy harvesting is one of the most promising techniques to power these embedded devices. However, harvested energy is intrinsically unstable. With an unstable power supply, the whole computer system will be interrupted frequently. In this talk, the speaker will present solutions and challenges in designing computer systems to survive frequent power interruption.
Jingtong Hu is currently an Assistant Professor in the School of Electrical and Computer Engineering at Oklahoma State University, OK, USA. He received his B.E. degree from School of Computer Science and Technology, Shandong University, China in 2007 and Ph.D. degree in Computer Science from the University of Texas at Dallas, TX, USA in Aug. 2013. His research interests include embedded systems, FPGA, non-volatile memory, and wireless sensor network. His research has been sponsored by National Science Foundation (NSF), Air Force Research Lab (AFRL), and Altera. He has served as Technical Program Committee for many international conferences such as ASP-DAC, DATE, DAC, ESWEEK, RTSS, etc. He is also the recipient of OSU CEAT Outstanding New Faculty Award, Women’s Faculty Council Research Award, and Air Force Summer Faculty Fellowship.
Improving LDPC Performance Via Asymmetric Sensing Level Placement on Flash Memory
Flash memory development through technology scaling and bit density has significant impact on the reliability of flash cells. Hence strong ECC schemes are highly recommended. With a strong error correction capability, LDPC is now applied for the state-of-the-art flash memory. However, LDPC needs fine- grained soft sensing between states to iteratively decode the raw data when the raw bit error rates are high, which requires long decoding latency and leads to performance degradation.
In this talk, I will introduce a novel sensing level placement scheme to reduce the LDPC decoding latency. The basic idea for the placement scheme is motivated by two asymmetric error characteristics of flash memory: the asymmetric errors at different states, and the asymmetric errors caused by voltage left-shifts and right-shifts. With understanding of these two types of error characteristics, the sensing levels are smartly placed to achieve reduced sensing levels while maintaining the error correction capability of LDPC. Experiment analysis shows that the proposed scheme achieves significant performance improvement.
石亮,重庆大学计算机学院副教授。2013年在中国科学技术大学获得博士学位和香港城市大学联合培养博士学位。研究方向是嵌入式系统、实时系统与存储系统。目前主持一项国家自然科学基金青年基金和多项中央高校基金,作为核心成员,参加多项国家自然科学基金项目,国家重点研究计划863项目以及企业合作项目等。近年来在国际重要学术期刊和会议上发表超过50篇学术论文,包括近18篇IEEE Transactions 和ACM Transactions 论文,以及多篇存储系统与嵌入式系统的重要会议论文,并最近三年发表了ICCD、DAC、MSST以及FAST等多个国际高水平会议论文。石亮博士一直在多个国际会议和期刊进行学术服务,其中包括担任LCTES、NVMSA、ASPDAC、VLSI、RTCSA和DAC等会议的技术评议会委员,担任IEEE TVLSI、IEEE ESL、IEEE TCAD、IEEE TC、ACM TECS、IEEE MSCS、 IEEE TPDS等杂志的审稿人。石亮博士是CCF、IEEE和ACM会员。
Near-storage processing for data-intensive applications
Modern high-performance processors are constrained by the end of Dennard scaling in terms of energy dissipation. Traditional compute-centric model is not effective for modern big data applications with large working sets and limited locality. The system performance is now dominated by large data volume and the cost of moving data to the locations where computations are performed. These trends challenge the traditional computation model, leading to the transition to move computation units nearby data, the so-called Near-Data Processing. In this talk, I will present the recent trends in near data processing, focusing on the storage layer. In addition, I will talk about our in-progress works on this emerging area.
Jianhua Li now is a full-time lecturer in the School of Computer and Information, Hefei University of Technology. His research interest is computer architecture, specifically on-chip memory system, network-on-chip, etc. Currently he is working with Professor Chun Jason Xue in City University of Hong Kong with the support of Hong Kong Scholar program, investigating energy-efficient near-data processing for analytics applications and data-intensive applications. Specifically, his work focuses on exploring FPGA for efficient and practical near-data processing for SSD-based storage systems. He received his PhD in Computer Science in University of Science and Technology of China and City University of Hong Kong in 2013.
Heterogeneous architecture for convolutional neural network (CNN)
Architecting emerging NVMs, especially STT-RAM, PCM, ReRAM etc., as cache or main memory for MCUs is a promising technique to replace conventional SRAM/DRAM deployment. However, these emerging NVMs often suffer from long write latency and large write energy. Targeting stencil loops with data dependencies, we present several loop transformation techniques such as loop retiming, loop tiling and loop scheduling to reschedule loop traversals such that the drawbacks of NVMs can be greatly mitigated. In this way, high performance and low dynamic power can be achieved during loop execution.
Most recently, we focus on designing a heterogeneous architecture to implement convolutional neural network (CNN) for lightweight and power budget devices. The architecture consists of ReRAM crossbar and CPU+FPGA SoC where the PIM-featured ReRAM crossbar offers highly efficient MAC computing and the SoC part provides other uniform computing and controlling. The key issues such as CNN layer mapping, data communication between modules and application-level parallelism have been ongoing explorations.
邱柯妮博士,2014年于香港城市大学获得计算机科学博士学位,目前为首都师范大学信息工程学院副研究员。邱柯妮博士的研究方向包括嵌入式系统及计算机体系结构、非易失性存储器体系结构,尤其关注系统的软硬件协同设计对计算和存储的性能和功耗的影响,她在IEEE/ACM TC、TCAD、TECS、DAC、ICCD等计算机体系结构顶级国际期刊和会议上发表了多篇高水平学术论文。邱柯妮博士于2016年获得计算机设计国际会议(ICCD’16)的最佳论文奖,于2016年接收超大规模集成电路国际会议(VLSI-SoC’16)邀请提交邀请论文。自2014年入职首都师范大学以来,邱柯妮博士主持4项来自国家自然科学基金、北京市教委基金、北京市大学生“实培计划”和北京市未来芯片高精尖中心开放基金项目,并参与过香港研究资助局、总参通信和二炮武器装备部的多个项目。她现为IEEE会员、ACM会员以及中国计算机学会CCF会员,并担任TC、TODAES、TECS、JSA等多家国际顶级期刊的审稿人。
Designing FPGAs using non-volatile memory technologies
Field Programmable Gate Array (FPGA) has been widely adopted as modern reconfigurable computing platforms. Traditionally, the storage elements in FPGAs are conventional static RAM (SRAM), which has large leakage power and limited scalability. Recently, non-volatile memory (NVM) is proposed to replace the SRAM in FPGA systems for static power and density considerations, at the cost of inducing larger dynamic power. Although NVM-based FPGA architectures have been proposed, the design flow of FPGA systems is still SRAM-oriented. In this talk, I will present our recent work on NVM-based FPGA design, where NVM characteristics are integrated into the FPGA design flow, including read/write asymmetry and usage of multiple level cells.
Mengying Zhao received B.E. degree in School of Computer Science and Technology from Shandong University, China in July 2011, and Ph.D. degree in Department of Computer Science, City University of Hong Kong in July 2015. She is now an Assistant Professor in School of Computer Science and Technology in Shandong University. Her research interests include computer architecture, embedded and real-time systems, and non-volatile memory.