TL;DR: An ultrafast non-volatile flash memory based on MoS2/hBN/multilayer graphene van der Waals heterostructures, which achieves an ultrafast writing/erasing speed of 20 ns through two-triangle-barrier modified Fowler–Nordheim tunnelling is demonstrated.
Abstract: Flash memory has become a ubiquitous solid-state memory device and is widely used in portable digital devices, computers, and enterprise applications. The development of the information age has put forward higher requirements for memory speed and retention performance. Here, we demonstrate an ultrafast non-volatile memory based on MoS2/h-BN/multi-layer graphene (MLG) van der Waals heterostructures, which has an atomically flat interface and achieves an ultrafast writing/erasing speed (~20 ns), surpassing the reported state-of-the-art flash memory (~100 ns). The ultrafast flash memory could lay the foundation for the next generation of high-speed non-volatile memory.
TL;DR: A new sensing level placement scheme with reduced number of sensing levels is proposed to achieve reduced read latency for LDPC decoding while maintaining the error correction capability of LDPC.
Abstract: By stacking layers vertically, the adoption of 3D NAND has significantly increased the capacity of storage systems. The complex structure of 3D NAND introduces more errors than planar flash. To address the reliability issue, low-density parity-check (LDPC) code with a strong error correction capability is now widely applied to 3D NAND flash memory. However, LDPC has long decoding latency when the raw bit error rates (RBER) are high. This is because it needs fine-grained soft sensing between voltage states to iteratively decode the raw data. Multiple sensing voltages are applied to the flash cell array to gain the necessary information for decoding. In this article, a new sensing level placement scheme with a reduced number of sensing levels is proposed. The basic idea of the placement scheme is motivated by three asymmetric error characteristics of flash memory: the asymmetric errors between different states, the asymmetric errors caused by voltage left-shifts and right-shifts, and the asymmetric errors among layers in a 3D NAND flash block. With awareness of these three types of error characteristics, a reduced number of sensing levels is placed to achieve reduced read latency for LDPC decoding while maintaining the error correction capability of LDPC. Experimental analysis shows that the proposed scheme achieves significant performance improvement.
TL;DR: This article presents a reconfigurable flash storage controller design that serves as a rapid prototype that can be synthesized into a field-programmable gate array device and used in a realistic performance evaluation environment.
Abstract: As semiconductor technology has advanced, many storage systems have begun to use non-volatile memories as storage media. The organization and architecture of storage controllers have become more complex to meet various design requirements in terms of performance, response time, quality of service (QoS), and so on. In addition, due to the evolution of memory technology and the emergence of new applications, storage controllers employ new firmware algorithms and hardware modules. When designing storage controllers, engineers often evaluate the performance impact of using new software and hardware components using software simulators. However, this technique often yields limited evaluation accuracy because of the difficulty of modeling complex operations of components and the interactions among them. In this article, we present a reconfigurable flash storage controller design that serves as a rapid prototype. This design can be synthesized into a field-programmable gate array device and used in a realistic performance evaluation environment. We show the usefulness of our design by demonstrating the performance impact of design parameters.
TL;DR: Experimental and simulation results demonstrate >10^200 key space, 0.58-pJ/b energy efficiency for <5% controllable bit error rate at 80 °C, up to 192.3-Mbps throughput, high Shannon entropy, and resiliency toward machine learning attacks.
Abstract: We present a lightweight integrated design of a physically unclonable function (PUF) and a true random number generator (TRNG), suitable for Internet of Things (IoT) devices, based on embedded flash memory in 55-nm CMOS. In the proposed approach, the randomness in nonlinear $I$-$V$ characteristics and temporal current fluctuations of embedded flash memories are exploited to generate the dynamic and static entropies. Sharing silicon between the PUF and the TRNG results in a very compact and energy-efficient topology. Experimental and simulation results demonstrate >10^200 key space, 0.58-pJ/b energy efficiency for <5% controllable bit error rate at 80 °C, up to 192.3-Mbps throughput, high Shannon entropy, and resiliency toward machine learning attacks. Accelerated aging measurements indicate stable physical unclonable function response after 900 min of baking at 85 °C.
TL;DR: In this paper, the authors present a review of recent progress in physically transient resistive switching memories that are comparable in performance to traditional nonvolatile switching memory and expand the potential application fields of the memory devices.
Abstract: With the aim to address the physical limits of recent non-volatile flash memory, resistive switching memory has been intensively investigated as a strong competitor of the next generation memory technology. In particular, by combining the advantages of non-volatile resistive switching memory and degradable electronics, physically transient resistive switching memory has attracted more and more attention for its great potential applications in green electronics, degradable or implanted electronic devices, and secure information storage systems. This review attempts to present prompt and comprehensive summaries of recent progress in physically transient resistive switching memories that are comparable in performance to traditional resistive switching memory and expand the potential application fields of the memory devices. First, we introduce the concept and development of physically transient electronics and resistive switching memory. Second, an overview of various degradable resistive switching materials, including organic and inorganic active films, electrodes, and substrates, is described in detail. Third, recent advances in some representative mechanisms of physically transient resistive switching memories, including resistive switching and degradable mechanisms, are reviewed. Finally, we end the review with a summary and outlook, pointing out further challenges of physically transient resistive switching memory.
TL;DR: A nonvolatile floating-gate memory device based on an ReS2/boron nitride/graphene heterostructure that can endure hundreds of switching cycles and shows stable retention characteristics with ∼40% charge remaining after 10 years is reported.
Abstract: Charge-trapping memory devices based on two-dimensional (2D) material heterostructures possess an atomically thin structure and excellent charge transport capability, making them promising candidates for next-generation flash memories to achieve miniaturized size, high storage capacity, fast switch speed, and low power consumption. Here, we report a nonvolatile floating-gate memory device based on an ReS2/boron nitride/graphene heterostructure. The implemented ReS2 memory device displays a large memory window exceeding 100 V, leading to an ultrahigh current ratio over 10^8 between programming and erasing states. The ReS2 memory device also exhibits an ultrafast switch speed of 1 μs. In addition, the device can endure hundreds of switching cycles and shows stable retention characteristics with ∼40% charge remaining after 10 years. More importantly, taking advantage of its anisotropic electrical properties, a single ReS2 flake can achieve direction-sensitive multi-level data storage to enhance the data storage density. On the basis of these characteristics, the proposed ReS2 memory device is potentially able to serve the entire memory device hierarchy, meeting the need for scalability, capacity, speed, retention, and endurance at each level.
TL;DR: In this paper, dual acceptors are incorporated into the copolymer backbone to improve the electron affinity to achieve ambipolarity for use in specific electronic devices, for example, organic field-effect transistors (OFETs) and flash memories.
Abstract: Incorporation of dual acceptors into the copolymer backbone can effectively improve the electron affinity to achieve ambipolarity for use in specific electronic devices, for example, organic field-effect transistors (OFETs) and flash memories. Herein, two diketopyrrolopyrrole (DPP)-based copolymers, pDPPy-BTz and pDPPy-ffBTA, are developed by introducing dialkoxybithiazole and difluorobenzotriazole, respectively. Upon evaluating their electrical properties in OFETs, the polarity of dominant carriers in the OFET channel is found to be tuned by the monomer structures and measurement atmosphere. Specifically, the device based on pDPPy-BTz exhibits a p-type dominant ambipolar character with a μh/μe of 5.3 in air, whereas the electron mobility is enhanced by removing the oxygen and water (vacuum condition), resulting in a charge carrier polarity change to display an n-type dominant feature with the μh/μe decreasing to 0.3. Owing to the electron-donating property of thiophene groups, the pDPPy-ffBTA polymer exhibits a p-type unipolar performance in air and balanced ambipolar (μh/μe = 0.8) charge carrier transport under vacuum. On account of the well-defined ambipolar behavior, the polymers are used in nonvolatile memory devices. High performance is obtained with both polymers with memory windows of 10–16 V, stable data retention of over 10^5 s, and high reliability during >500 programming and erasing cycles. Overall, this study demonstrates a charge carrier polarity change in OFETs fabricated with DPP-based dual-acceptor copolymers by incorporating various acceptors into the polymer backbone and reports a high-performance nonvolatile ambipolar flash memory.
TL;DR: A novel transaction-supporting SSD is proposed, called WAL-SSD, which logs transaction data at the internally-managed WAL area and relocates the data atomically via the FTL-level remap operation at the transaction checkpointing, and can be used to transform random write requests to sequential requests.
Abstract: Recent advances in flash memory technology have reduced the cost-per-bit of flash storage devices such as solid-state drives (SSDs), thereby enabling the development of large-capacity SSDs for enterprise-scale storage. However, two major concerns arise in designing SSDs. First, the size of the address mapping table is increasing in proportion to the capacity of the SSD. The SSD-internal firmware, called flash translation layer (FTL), must maintain the address mapping table in the internal DRAM. Although the previously proposed demand map loading technique uses a small size of cached map table, the technique aggravates poor random performance. Second, there are many redundant writes in storage workloads, which have an adverse effect on the performance and lifetime of the SSD. For example, many transaction-supporting applications use the write-ahead-log (WAL) scheme, which writes the same data twice. To resolve these problems, we propose a novel transaction-supporting SSD, called WAL-SSD, which logs transaction data at the internally-managed WAL area and relocates the data atomically via the FTL-level remap operation at the transaction checkpointing. It can also be used to transform random write requests to sequential requests. We implemented a prototype of WAL-SSD with a real SSD device. Experiments demonstrate the performance improvement by WAL-SSD with three use cases: remap-journaling, atomic multi-block update, and random write logging.
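The remap idea in the WAL-SSD abstract above can be illustrated with a toy mapping table. This is a minimal sketch under stated assumptions, not the authors' firmware: the class `ToyFTL`, its methods, the address values, and the dictionary-based mapping table are all hypothetical names chosen for illustration. Data is written once to a log address, and checkpointing only rewires the logical-to-physical mapping instead of writing the data a second time.

```python
class ToyFTL:
    """Toy flash translation layer: logical page -> physical page (hypothetical)."""

    def __init__(self):
        self.l2p = {}          # logical page number -> physical page number
        self.flash = {}        # physical page number -> data
        self.next_ppn = 0      # next free physical page (append-only)
        self.page_writes = 0   # counts physical page programs

    def write(self, lpn, data):
        # Flash is written out of place: always program a fresh physical page.
        ppn = self.next_ppn
        self.next_ppn += 1
        self.flash[ppn] = data
        self.l2p[lpn] = ppn
        self.page_writes += 1

    def read(self, lpn):
        return self.flash[self.l2p[lpn]]

    def remap(self, pairs):
        """Point each home address at the page already written for its log address.

        All new mappings are staged first and then applied together, which
        stands in for the atomic update described in the abstract.
        """
        staged = {home: self.l2p[log] for log, home in pairs}
        self.l2p.update(staged)
        for log, _ in pairs:
            del self.l2p[log]  # the log addresses become free again


ftl = ToyFTL()
ftl.write(1000, b"row A v2")        # transaction data logged in the WAL area
ftl.write(1001, b"row B v2")
ftl.remap([(1000, 7), (1001, 8)])   # checkpoint: no second data write
print(ftl.read(7), ftl.read(8), ftl.page_writes)
```

In this toy model the two updates cost two page programs in total, whereas a conventional write-ahead-log flow would program each page once for the log and once more for its home location.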
TL;DR: Results show that the 4-bit VMM of 200-element vectors, using the commercially available 64-layer gate-all-around macaroni-type 3D-NAND memory blocks designed in the 55-nm technology node, may provide an unprecedented area efficiency and energy efficiency, including the input/output and other peripheral circuitry overheads.
Abstract: We propose extremely dense, energy-efficient mixed-signal vector-by-matrix-multiplication (VMM) circuits based on the existing 3D-NAND flash memory blocks, without any need for their modification. Such compatibility is achieved using a time-domain-encoded VMM design. We have performed rigorous simulations of such a circuit, taking into account non-idealities such as drain-induced barrier lowering, capacitive coupling, charge injection, parasitics, process variations, and noise. Our results, for example, show that the 4-bit VMM of 200-element vectors, using the commercially available 64-layer gate-all-around macaroni-type 3D-NAND memory blocks designed in the 55-nm technology node, may provide an unprecedented area efficiency of 0.14 μm²/byte and energy efficiency of ~11 fJ/Op, including the input/output and other peripheral circuitry overheads.
TL;DR: A novel recurrent neural network (RNN)-based detector to effectively detect the data stored in the multi-level-cell (MLC) flash memory without the prior knowledge of the channel, and an RNN-aided (RNNA) dynamic threshold detector, whose detection thresholds can be derived based on the outputs of the RNN detector.
Abstract: Practical NAND flash memory suffers from various non-stationary noises that are difficult to predict. For example, the channel offset induced by data retention noise is unknown during the readback process, and hence severely affects the reliability of data recovery from the memory cell. In this paper, we first propose a novel recurrent neural network (RNN)-based detector to effectively detect the data stored in the multi-level-cell (MLC) flash memory without prior knowledge of the channel. However, compared with the conventional threshold detector, the proposed RNN detector introduces much longer read latency and more power consumption. To tackle this problem, we further propose an RNN-aided (RNNA) dynamic threshold detector, whose detection thresholds can be derived based on the outputs of the RNN detector. We thus only need to activate the RNN detector periodically when the system is idle. Moreover, to enable soft-decision decoding of error-correction codes, we first show how to obtain more read thresholds based on the hard-decision read thresholds derived from the RNN detector. We then propose integer-based reliability mappings based on the designed read thresholds, which can generate the soft information of the channel. Finally, we propose to apply density evolution (DE) combined with the differential evolution algorithm to optimize the read thresholds for low-density parity-check (LDPC) coded flash memory channels. Computer simulation results demonstrate the effectiveness of our proposed RNNA dynamic read thresholds design, for both the uncoded and LDPC-coded flash memory channels, without any prior knowledge of the channel.
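The step from hard-decision thresholds to soft information described in the abstract above can be sketched as follows. This is only an illustration under assumptions: the function names, the fixed spacing `delta`, and the symmetric placement are hypothetical, and the paper itself optimizes the thresholds with density evolution and differential evolution rather than using a fixed spacing. The sketch places extra read thresholds around one hard-decision threshold and maps the region a cell voltage falls into onto a signed integer reliability.

```python
from bisect import bisect_right


def soft_thresholds(hard_threshold, delta, extra_per_side):
    """Place extra read thresholds symmetrically around one hard-decision threshold."""
    offsets = range(-extra_per_side, extra_per_side + 1)
    return [hard_threshold + k * delta for k in offsets]


def integer_reliability(voltage, thresholds):
    """Map a cell voltage to a signed integer: sign = hard decision, size = confidence."""
    region = bisect_right(thresholds, voltage)   # 0 .. len(thresholds)
    half = (len(thresholds) + 1) // 2            # number of regions on each side
    # Regions below the hard threshold get negative values, regions above get positive.
    return region - half if region < half else region - half + 1


thresholds = soft_thresholds(hard_threshold=2.0, delta=0.25, extra_per_side=1)
readings = [1.5, 1.9, 2.1, 2.5]
print(thresholds, [integer_reliability(v, thresholds) for v in readings])
```

A decoder could treat these integers as coarse log-likelihood ratios: values near the hard threshold carry magnitude 1, and values far from it carry larger magnitudes.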
TL;DR: A novel technique for generating an aging-resistant physical unclonable function (PUF) from commercial off-the-shelf NAND flash memory chips, based on a “program-disturb” method that uses a single memory page to extract the inherent process variations unique to each chip.
Abstract: This article demonstrates a novel technique for generating an aging-resistant physical unclonable function (PUF) using commercial off-the-shelf NAND flash memory chips. The technique utilizes a novel “program-disturb” method using a single memory page to extract the inherent process variations unique to each chip. In addition, it employs an adaptively tunable PUF generation method to reduce the aging effects on PUF accuracy. The experimental evaluation utilizing several commercial flash memory chips shows that the proposed technique ensures accuracy, uniqueness, and randomness of PUFs generated from a single memory page for at least 1000 PUF-generating operations. Unlike prior flash PUF techniques, the proposed technique does not involve complex memory characterization or lengthy postprocessing steps, making it suitable for a wide range of resource-constrained systems.
TL;DR: Results suggest that the internal stability of CDT-DPP-TVT makes this copolymer a promising material for application in reliable organic flash memory.
Abstract: Organic flash memories that employ solution-processed polymer semiconductors preferentially require internal stability of their active channel layers. In this paper, a series of new donor–acceptor ...
TL;DR: In this paper, a memristive Dynamic Adaptive Neural Network Array (DANN) was developed to emulate the functionality of a biological neuron system using resistive random access memory (ReRAM), a form of nonvolatile memory.
Abstract: Resistive Random Access Memory (ReRAM), a form of non-volatile memory, has been proposed as a Flash memory replacement. In addition, novel circuit architectures have been proposed that rely on newly discovered or predicted behavior of ReRAM. One such architecture is the memristive Dynamic Adaptive Neural Network Array, developed to emulate the functionality of a biological neuron system. We demonstrated ReRAM devices that show a synaptic tendency by changing their resistance in an analog fashion. The CMOS compatible nanoscale ReRAM devices shown are based on an HfO2 switching layer that sits on a tungsten electrode and is covered by a titanium oxygen scavenger layer and a titanium nitride top electrode. In this work, we showed devices exceeding endurance values of 10B cycles with a discrete Roff/Ron ratio of 15. Multi-level states were achieved by using consecutive ultra-short 5/1.5 ns pulses during the reset operation. A neural network simulation was performed in which the synaptic weights were perturbed with the ReRAM variability, which was extracted from two different characterization methods: (1) via direct write, and (2) via a write/read verification approach during the reset operation. A substantial improvement of the neural network fitness was demonstrated when using the write/read verification approach.
TL;DR: A page semantic-aware strategy is proposed to precisely predict, mark, and relocate data or memory pages to the fast memory in advance by exploiting the process access patterns, so that the frequency of the slow memory accesses can be further reduced.
Abstract: To provide larger memory space with lower costs, NVDIMM is a production-ready device. However, directly placing NVDIMM as the main memory would seriously degrade the system performance because of the “great memory wall” caused by the fact that in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this article, we present a joint management framework of host/CPU and NVDIMM to break down the great memory wall by bridging the process information gap between host/CPU and NVDIMM. In this framework, a page semantic-aware strategy is proposed to precisely predict, mark, and relocate data or memory pages to the fast memory in advance by exploiting the process access patterns, so that the frequency of the slow memory accesses can be further reduced. The proposed framework with the proposed strategy was evaluated with several well-known benchmarks and the results are encouraging.
TL;DR: TLSM is presented, a temperature-aware persistent data management scheme for LSM-Tree-based KV store on 3-D NAND flash memory that can significantly enhance the data integrity and reduce write amplifications compared to representative schemes.
Abstract: Key-value (KV) stores have been widely deployed in both embedded systems and enterprise systems. Most KV stores today use the log structured merge tree (LSM-Tree), as the LSM-Tree can eliminate random write operations to the secondary storage and maintain acceptable read performance. The LSM-Tree was originally designed for secondary storage built on hard disk drives. As an emerging storage medium, three-dimensional (3-D) flash memory has become the mainstream technology to replace hard disk drives. Different from hard disk drives and the conventional planar flash memory, 3-D flash memory is vulnerable to temperature. High temperature will introduce both charge loss and retention degradation. Since the LSM-Tree converts random write operations into sequential ones, the access to consecutive physical addresses in flash memory will cause the temperature issue. This will affect the integrity of data stored in 3-D flash memory. This article presents TLSM, a temperature-aware persistent data management scheme for LSM-Tree-based KV store on 3-D NAND flash memory. TLSM offers both application-level LSM-Tree optimization and firmware-level address management to allocate persistent data to 3-D flash. At the application level, TLSM presents a novel temperature-aware LSM data structure to reduce the amount of data issued from the LSM-Tree to 3-D flash memory. At the firmware level, TLSM reallocates the data to physical blocks with relatively low temperature. This cross-layer optimization can effectively handle the temperature issue to ensure the data integrity of the LSM-Tree in 3-D flash memory. We demonstrate the viability of the proposed scheme using a set of standard benchmarks. Our extensive evaluations show that TLSM can significantly enhance the data integrity and reduce write amplification compared to representative schemes.
TL;DR: Experimental results demonstrate that single-bit bit-set faults can be injected in code and data without corrupting the Flash memory, even with a laser spot of more than 20 µm in diameter, which is several orders of magnitude larger than the process node of the floating-gate transistors in the experiments.
Abstract: Laser injection is a powerful fault injection technique with a high spatial accuracy which allows an adversary to efficiently extract the secret information from an electronic device. The control and the repeatability of faults require the attacker to understand the relation of the fault model to the setup (notably the laser spot size) and the process node of the target device. Most studies on laser fault injection report fault models resulting from a photo-electric current in CMOS transistors. This study provides a black-box analysis of the effect of a photo-electric current in floating-gate transistors of two embedded NOR Flash memories from two different manufacturers. Experimental results demonstrate that single-bit bit-set faults can be injected in code and data without corrupting the Flash memory, even with a laser spot of more than 20 µm in diameter, which is several orders of magnitude larger than the process node of the floating-gate transistors in the experiments. This article also presents the specifics of performing a "safe-error" attack on AES, leveraging the previously detailed single-bit bit-set fault model.
TL;DR: A new FTL named DSFTL (Dynamic Setting for FTL), which uses many SW log blocks to increase the number of switch merge operations and decrease the number of partial merge operations, thereby reducing the garbage collection overhead.
Abstract: Flash memory is widely used in solid state drives (SSD), smartphones and so on because of its non-volatility, low power consumption, rapid access speed, and resistance to shocks. Because the hardware features of flash memory differ from those of hard disk drives (HDD), a software layer called the FTL (Flash Translation Layer) was introduced. The function of the FTL is to make a flash memory device appear as a block device to its host. However, due to the erase-before-write feature of flash memory, free flash blocks must constantly be reclaimed through garbage collection (GC) of invalid pages, which incurs costly overhead. In previous hybrid mapping schemes, there are three problems that cause GC overhead. First, a partial merge operation causes more page copies than a switch merge operation. However, many authors concentrate only on reducing full merge operations. Second, the availability between a data block and a log block makes the space availability of the log block lower, and it also generates a very costly full merge operation. Third, the space availability of the data block is low because a data block that still has many free pages is merged. Therefore, we propose a new FTL named DSFTL (Dynamic Setting for FTL). In this FTL, we use many SW (sequential write) log blocks to increase the number of switch merge operations and to decrease the number of partial merge operations. In addition, DSFTL dynamically handles the data blocks and log blocks to reduce the number of erase operations and costly full merge operations. Additionally, our scheme prevents a data block with many free pages from being merged, which increases the space availability of the data block. Our extensive experimental results show that our proposed approach (DSFTL) reduces the erase count and increases the number of switch merge operations. As a result, DSFTL decreases the garbage collection overhead.
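The cost difference between the switch, partial, and full merge operations named in the abstract above can be made concrete with a small model. This is a hedged sketch of the usual log-block (hybrid mapping) accounting, not DSFTL itself: the function `merge_cost` is hypothetical, and it assumes that every page of the data block holds valid data and that each data block has exactly one log block.

```python
def merge_cost(log_pages, pages_per_block):
    """Return (merge_type, page_copies, erases) for one data block and its log block.

    log_pages lists the page offsets in the order they were written to the log block.
    """
    n = pages_per_block
    sequential = log_pages == list(range(len(log_pages)))
    if sequential and len(log_pages) == n:
        # Log block holds the whole block in order: it simply becomes the data block.
        return ("switch", 0, 1)
    if sequential:
        # Log block holds an in-order prefix: copy the remaining pages into it.
        return ("partial", n - len(log_pages), 1)
    # Out-of-order updates: copy every page into a fresh block, erase both old blocks.
    return ("full", n, 2)


print(merge_cost([0, 1, 2, 3], 4))
print(merge_cost([0, 1], 4))
print(merge_cost([2, 0, 2], 4))
```

Under these assumptions a switch merge needs no page copies at all, which is why a scheme that converts partial merges into switch merges can lower garbage collection overhead.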
TL;DR: A nonvolatile floating-gate flash memory based on MoTe2/h-BN/graphene van der Waals heterostructure, which possesses increased data storage capacity per cell and versatile tunability and can operate in both p- and n-type modes.
Abstract: Heterostructures formed by stacking atomically thin two-dimensional materials are promising candidates for flash memory devices to achieve premium performances, due to the capability of effective carrier modulation and unique charge trapping behavior at the interfaces with atomic flatness. Here, we report a nonvolatile floating-gate flash memory based on MoTe2/h-BN/graphene van der Waals heterostructure, which possesses increased data storage capacity per cell and versatile tunability. The decent memory behavior of the device is enabled by the carriers stored in the graphene floating gate, which tunnel through the h-BN dielectric layer from the MoTe2 channel layer under a static electric field. Consequently, the developed memory device is capable of storing 2 bits per cell by applying varied gate bias to implement multiple distinct current levels. The device also exhibits a remarkable erase/program current ratio of ∼10^5 with 1 µs switch speed and stable retention with estimated ∼30% charge loss after 10 yr. Furthermore, the memory device can operate in both p- and n-type modes through contact engineering, offering wide adaptability for emerging applications in electronic technologies, such as neuromorphic computing, data-adaptive energy efficient memory, and complex digital circuits.
TL;DR: An adaptive error prediction scheme is proposed to mitigate the process-variation effects on 3D TLC flash reliability and includes two parts: endurance variation and error feature variation.
Abstract: In Solid State Drives, flash management techniques such as wear-leveling and refresh usually assume NAND flash memories have the same endurance value. However, the actual endurance values differ from block to block. This reliability difference is introduced by process variation during flash fabrication. In recent years, to improve flash management techniques, various works have studied the reliability variation of 2D flash memory. As 2D NAND transitioned to 3D NAND flash, the vertical structure and multi-layer stacking changed the effect of previously known reliability problems. In this paper, we are the first to characterize the process-variation effects on 3D TLC flash reliability. The characterization includes two parts: endurance variation and error feature variation. Second, we propose an adaptive error prediction scheme to mitigate the process-variation effects. This scheme uses a machine-learning model to realize the error prediction operation. We also discuss the implications of this scheme for the main flash management techniques.
TL;DR: This paper proposes a novel embedded file system, LOFFS, to tackle the above issues and manage large-capacity NAND flash on resource-limited embedded devices and redesign the space management mechanisms and construct hybrid file structures to achieve high performance with minimum resource occupation.
Abstract: Emerging applications like machine learning in embedded devices (e.g., satellites and vehicles) require huge storage space, which has recently stimulated the widespread deployment of large-capacity flash memory in IoT devices. However, existing embedded file systems fall short in managing large-capacity storage efficiently because of excessive memory consumption and poor booting performance. In this paper, we propose a novel embedded file system, LOFFS, to tackle the above issues and manage large-capacity NAND flash on resource-limited embedded devices. We redesign the space management mechanisms and construct hybrid file structures to achieve high performance with minimum resource occupation. We have implemented LOFFS in Linux, and the experimental results show that LOFFS outperforms YAFFS by 55.8% on average with orders-of-magnitude reductions in memory footprint.
TL;DR: In this article, the low-energy proton-induced single event effect sensitivity of NAND flash memories with multiple feature sizes is investigated, along with the influence of cumulative dose on that sensitivity.
Abstract: In this paper, the low-energy proton-induced single event effect sensitivity of multiple feature size NAND flash memories has been investigated. Under 0.41 MeV protons, the single event effect cross-section peak appeared in 25 nm and 16 nm flash devices. SRIM simulation revealed the primary reason for this phenomenon. Single event upsets caused by direct ionization of low-energy protons could be several orders of magnitude higher than those caused by high-energy proton nuclear reactions. Moreover, the influence of cumulative dose on the single event effect sensitivity of flash devices was investigated. As the cumulative dose increased, the single event upset cross-section increased considerably. This phenomenon appears due to the threshold voltage shift induced by the combination of the proton and the cumulative dose.
TL;DR: In this article, a new functional thiol with a pentafluorophenyl group was synthesized for the surface modification of CdSe quantum-dot floating layers; this was aimed at the fabrication of organic field-effect transistors (OFETs).
TL;DR: In this paper, an end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers is presented, which exploits mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and models the inference graph with integer-only operations.
Abstract: This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations. Our approach aims at determining the minimum bit precision of every activation and weight tensor given the memory constraints of a device. This is achieved through a rule-based iterative procedure, which cuts the number of bits of the most memory-demanding layers, aiming at meeting the memory constraints. After a quantization-aware retraining step, the fake-quantized graph is converted into an inference integer-only model by inserting the Integer Channel-Normalization (ICN) layers, which introduce a negligible loss as demonstrated on INT4 MobilenetV1 models. We report the latency-accuracy evaluation of mixed-precision MobilenetV1 family networks on a STM32H7 microcontroller. Our experimental results demonstrate an end-to-end deployment of an integer-only Mobilenet network with Top1 accuracy of 68% on a device with only 2MB of FLASH memory and 512kB of RAM, improving by 8% the Top1 accuracy with respect to previously published 8-bit implementations for microcontrollers.
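The rule-based iterative procedure mentioned in the abstract above, which cuts the bit width of the most memory-demanding layers until the model fits, can be sketched in a few lines. This is a simplified illustration under assumptions, not the paper's exact rule set: the function `cut_bits` and its parameters are hypothetical, and it only budgets weight storage, whereas the paper also assigns precisions to activation tensors.

```python
def cut_bits(num_weights, budget_bytes, allowed=(8, 4, 2)):
    """Lower per-layer weight precision until the model fits the byte budget.

    num_weights: number of weights in each layer.
    Returns the chosen bit width per layer, or None if even the lowest
    precision everywhere does not fit.
    """
    bits = [allowed[0]] * len(num_weights)

    def layer_bytes(i):
        return num_weights[i] * bits[i] / 8

    while sum(layer_bytes(i) for i in range(len(bits))) > budget_bytes:
        # Candidates are layers that can still drop to a lower precision.
        candidates = [i for i in range(len(bits)) if bits[i] != allowed[-1]]
        if not candidates:
            return None
        # Rule: cut the layer that currently occupies the most memory.
        i = max(candidates, key=layer_bytes)
        bits[i] = allowed[allowed.index(bits[i]) + 1]
    return bits


print(cut_bits([1000, 4000, 500], budget_bytes=3000))
print(cut_bits([1000], budget_bytes=100))
```

In the first call the largest layer is cut twice (8 to 4 to 2 bits) while the smaller layers stay at 8 bits, which mirrors the mixed-precision outcome the abstract describes.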
TL;DR: In this article, a review of the recent efforts and research activities related to the fabrication and characterization of non-volatile memory device with metal floating gate/metal nanocrystals as the charge storage layer is presented.
Abstract: Traditional flash memory devices consist of Polysilicon Control Gate (CG) – Oxide-Nitride-Oxide (ONO - Interpoly Dielectric) – Polysilicon Floating Gate (FG) – Silicon Oxide (Tunnel dielectric) – Substrate. The dielectrics have to be scaled down considerably in order to meet the escalating demand for lower write/erase voltages and higher density of cells. But as the floating gate dimensions are scaled down, the charge stored in the floating gate leaks out more easily via the thin tunneling oxide below the floating gate, which causes serious reliability issues, and the whole amount of stored charge carrying information can be lost. A possible route to eliminate this problem is to use a high-k based interpoly dielectric and to replace the polysilicon floating gate with a metal floating gate. At a larger physical thickness, these materials provide a similar capacitance value, hence avoiding the tunneling effect. Discrete nanocrystal memory has also been proposed to solve this problem. Due to its high operation speed, excellent scalability and higher reliability, it has been shown to be a promising candidate for future non-volatile memory applications. This review paper focuses on the recent efforts and research activities related to the fabrication and characterization of non-volatile memory devices with metal floating gate/metal nanocrystals as the charge storage layer.
TL;DR: A dynamic on-line compression scheme, called SlimCache, to improve the cache hit ratio by virtually expanding the usable cache space through data compression, and leveraging the unique workload characteristics in key-value systems to efficiently identify and separate hot and cold data.
Abstract: Flash-based key-value caching is becoming popular in data centers for providing high-speed key-value services. These systems adopt slab-based space management on flash and provide a low-cost solution for key-value caching. However, optimizing cache efficiency for flash-based key-value cache systems is highly challenging, due to the huge number of key-value items and the unique technical constraints of flash devices. In this article, we present a dynamic on-line compression scheme, called SlimCache, to improve the cache hit ratio by virtually expanding the usable cache space through data compression. We have investigated the effect of compression granularity to achieve a balance between compression ratio and speed, and we leveraged the unique workload characteristics in key-value systems to efficiently identify and separate hot and cold data. To dynamically adapt to workload changes during runtime, we have designed an adaptive hot/cold area partitioning method based on a cost model. To avoid unnecessary compression, SlimCache also estimates data compressibility to determine whether the data are suitable for compression or not. We have implemented a prototype based on Twitter’s Fatcache. Our experimental results show that SlimCache can accommodate more key-value items in flash by up to 223.4%, effectively increasing throughput and reducing average latency by up to 380.1% and 80.7%, respectively.
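The compressibility check mentioned in this abstract can be sketched as follows. The abstract does not say how SlimCache estimates compressibility, so sampling a prefix of the value and compressing it with `zlib` is purely an assumption. The names `estimated_ratio` and `should_compress`, the 256-byte sample size, and the 0.9 threshold are all hypothetical.

```python
import os
import zlib


def estimated_ratio(value: bytes, sample_size: int = 256) -> float:
    """Compressed/original size ratio measured on a short prefix only,
    so the estimate is cheaper than compressing the whole value."""
    sample = value[:sample_size]
    if not sample:
        return 1.0
    return len(zlib.compress(sample, 1)) / len(sample)


def should_compress(value: bytes, threshold: float = 0.9) -> bool:
    """Skip compression when the sample barely shrinks (assumed policy)."""
    return estimated_ratio(value) < threshold


repetitive = b"a" * 1024      # highly compressible
random_like = os.urandom(1024)  # effectively incompressible
decisions = (should_compress(repetitive), should_compress(random_like))
```

A prefix sample trades accuracy for speed: a value whose redundancy only appears later would be misjudged, which is one reason a real system might use a different estimator.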
TL;DR: In this article, a vertical split-gate Flash memory was developed to enable embedded Flash scaling.
Abstract: We develop a vertical split-gate Flash memory to enable embedded Flash (eFlash) scaling. The device features much smaller cell size ( 30 (at 4-bit resolution, MAC only) and high TOPS/mm2~1 can be achieved. The excellent TOPS/mm2 suggests an effective way to boost computing performance: designing more cores to compute the DNN in parallel using the high-density and low-power CIM memory devices, and saving data (weight) movements by ~85%.
TL;DR: A sector-level classification (SLC) technique is proposed, which considers the diversity in the update frequencies of sectors and merges sectors with similar update frequencies to generate full, homogeneous pages, thereby reducing write amplification and increasing flash memory lifetime.
Abstract: A solid state drive (SSD) receives requests in multiples of sectors from the host system, which are then mapped to logical pages, the basic I/O units of the flash memory. As the SSD receives requests in sector units, the sectors in a logical page tend to exhibit diverse update frequencies. Therefore, frequent updates to some sectors of a page cause other sectors of the same page to be unnecessarily read and written to other free pages, thereby increasing write amplification and harming the flash memory lifetime. To eliminate unnecessary sector movement and to reduce write amplification, we propose a sector-level classification (SLC) technique. SLC considers the diversity in the update frequencies of sectors and merges sectors with similar update frequencies to generate full, homogeneous pages. Thus, multiple update operations can be converged to a single flash page, thereby reducing write amplification and increasing flash memory lifetime. SLC handles the merged sectors using the proposed shared-page mapping table (SMT), whereas pages whose sectors remain unmerged are handled by a conventional page mapping table. Despite the SMT overhead, SLC does not require excessive resources to accommodate the SMT. The capability of SLC is evaluated by a series of experiments, which provide highly encouraging results. It is demonstrated that SLC reduces flash writes, flash reads, block erasures, and flash write execution time by 42%, 23%, 45%, and 37%, respectively.
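The merging idea in this abstract (packing sectors with similar update frequencies into full, homogeneous pages) can be sketched as below. The two-class hot/cold split, the threshold of 10 updates, the 4-sector page size, and the function name `merge_by_frequency` are assumptions for illustration; the abstract does not specify how SLC classifies sectors.

```python
from collections import defaultdict

SECTORS_PER_PAGE = 4  # assumed page geometry


def merge_by_frequency(update_counts, hot_threshold=10):
    """Group sectors into 'hot' and 'cold' classes by update count, then
    emit only full pages per class. Leftover sectors stay unmerged, which
    in the abstract's terms means the conventional page mapping table."""
    classes = defaultdict(list)
    for sector, count in sorted(update_counts.items()):
        label = "hot" if count >= hot_threshold else "cold"
        classes[label].append(sector)

    pages = []
    for label, sectors in classes.items():
        last_full_start = len(sectors) - SECTORS_PER_PAGE
        for i in range(0, last_full_start + 1, SECTORS_PER_PAGE):
            pages.append((label, sectors[i:i + SECTORS_PER_PAGE]))
    return pages


counts = {0: 50, 1: 1, 2: 40, 3: 2, 4: 60, 5: 0, 6: 45, 7: 3, 8: 70}
pages = merge_by_frequency(counts)
```

Here sectors 0, 2, 4 and 6 form one hot page and sectors 1, 3, 5 and 7 one cold page, while sector 8 is left unmerged because it cannot fill a page on its own.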
TL;DR: Evaluation shows that MicroVault dramatically extends the lifetime of flash memory while minimising overhead, and enforces developer-specified lifetime guarantees through a range of lifetime extension techniques, which are adaptively applied based upon the needs of the application.
Abstract: The Internet of Things (IoT) is being deployed at large scale in a wide range of long-life applications. Examples range from Industry 4.0 to smart lighting systems. These applications have diverse requirements of non-volatile storage. However, the flash memory that is used in today’s IoT devices offers limited write endurance and must therefore be carefully managed if applications are to deliver on their promises of multiyear lifetimes. Managing the health of flash memory is difficult for application developers, as it requires in-depth hardware and software knowledge, which often leads to the problem being neglected. While various techniques have been proposed to preserve the health of flash memory, prior work tends to focus on a single hardware platform and data type. Furthermore, prior work does not provide lifetime guarantees. This paper tackles this problem by proposing MicroVault, a simple and unified interface for reliable non-volatile data storage on resource-constrained IoT devices. MicroVault enforces developer-specified lifetime guarantees through a range of lifetime extension techniques, which are adaptively applied based upon the needs of the application. Evaluation shows that MicroVault dramatically extends the lifetime of flash memory while minimising overhead.
TL;DR: This work proposes partial-refresh (PR), a novel lightweight data refresh scheme for 3D NAND flash memory in cyber-physical systems that leverages LDPC detectability to identify cells that are more vulnerable to errors and reduces the refresh cost and prolongs the SSD lifetime.
TL;DR: Not only endurance and performance but also energy consumption of the flash-memory storage system could be significantly improved by the proposed multi-level retention-time queues with a management scheme to meet the retention time requirement for a reliable storage system.
Abstract: As flash memory technology has been scaled down to 1x nm and more bits can be stored in a cell, the storage density of flash memory has been significantly improved. However, these technical trends also severely hurt the programming speed and endurance of flash memory. The internal data retention time is the duration for which a flash cell can correctly hold data. By relaxing internal data retention time, both the page programming speed and the block endurance could be improved. However, the retention time of flash memory is typically required to last for several years according to the industrial standard. Thus a refresh scheme is required to deal with the reduced retention time. In this article, we propose multi-level retention-time queues with a management scheme to meet the retention-time requirement for a reliable storage system. Observing that many data are overwritten in hours or days in real workloads, multiple retention-time queues could effectively separate data with different update frequencies. There are three challenging issues for a proper design: (1) Since the access pattern might change from time to time, a technical issue is how to promote/demote data so that data could be maintained in the proper retention-time queue to minimize the refresh overhead. (2) Another technical issue is how to refresh each retention-time queue in time to guarantee data integrity. (3) Since blocks residing in different retention-time queues would suffer from different levels of wear, the third technical issue is how to estimate the wear status of flash-memory blocks in an effective and efficient manner to achieve wear leveling. In our scheme, a data allocator, a multi-level refresh module, a garbage collector, and a wear leveler are introduced to deal with these technical issues. Based on our experimental results, not only endurance and performance but also energy consumption of the flash-memory storage system could be significantly improved by our scheme.
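The promote/demote and refresh issues listed in this abstract can be illustrated with a small sketch. The abstract does not describe the actual policy, so everything here is assumed: the three retention levels, the one-hour safety margin, the rule that an overwrite moves a page one level toward the shortest-retention queue, and the rule that a page reaching its refresh deadline moves one level toward the longest. Wear leveling and garbage collection are left out.

```python
RETENTION_HOURS = [24, 24 * 7, 24 * 365]  # assumed levels: a day, a week, a year
REFRESH_MARGIN_HOURS = 1                  # assumed margin: refresh before expiry


class RetentionQueues:
    """Hypothetical bookkeeping for multi-level retention-time queues."""

    def __init__(self):
        self.level = {}       # logical page -> queue index
        self.written_at = {}  # logical page -> time of last program (hours)

    def write(self, page, now):
        # Assumed policy: new pages start in the shortest-retention queue;
        # an overwrite moves an existing page one level toward it.
        previous = self.level.get(page)
        self.level[page] = 0 if previous is None else max(previous - 1, 0)
        self.written_at[page] = now

    def refresh(self, now):
        """Rewrite every page close to its retention deadline and move it
        one level toward the longest-retention queue, since it was not
        overwritten within its window. Returns the refreshed pages."""
        due = [
            page for page, t in self.written_at.items()
            if now - t >= RETENTION_HOURS[self.level[page]] - REFRESH_MARGIN_HOURS
        ]
        for page in due:
            self.level[page] = min(self.level[page] + 1, len(RETENTION_HOURS) - 1)
            self.written_at[page] = now
        return due


q = RetentionQueues()
q.write("A", now=0)
q.write("B", now=0)
q.write("A", now=10)         # A is updated often, so it stays at level 0
refreshed = q.refresh(now=23)  # only B is near its 24-hour deadline
```

With this policy, frequently overwritten data rarely needs a refresh, while long-lived data migrates to queues that are refreshed less often.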