TL;DR: A write-verify-write (WvW) scheme and a programmable offset cancellation sensing technique that achieve high-yield, high-performance and high-endurance 7Mb STT-MRAM arrays in a 22FFL FinFET technology are presented.
Abstract: STT-MRAM has been emerging as a very promising high-density embedded non-volatile memory (eNVM) [1], [2]. Embedded Flash memory has been the leading eNVM technology, but STT-MRAM has been developed as a better solution for continuing scaling, speed and cost. This paper presents a write-verify-write (WvW) scheme and a programmable offset cancellation sensing technique that achieve high-yield, high-performance and high-endurance 7Mb STT-MRAM arrays in a 22FFL FinFET technology [3]. The developed technology supports a wide range of operating temperatures, from $-40^{\circ}\mathrm{C}$ to $105^{\circ}\mathrm{C}$. Compared to prior art [4], [5], the two-stage current-sensing technique, with die-by-die tuning of a thin-film precision resistor that is used as a reference, can significantly improve the sensing margin during verify and read operations. Read disturb for reference cells is eliminated as there is no MTJ in the reference path.
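The abstract names a write-verify-write scheme but does not spell out its control flow. The following is a minimal sketch of one plausible reading: apply a write pulse, read the cell back, and write again only if the verify read does not match. `ToyCell`, `write_verify_write`, and the `max_pulses` limit are hypothetical names and values chosen for illustration; they are not taken from the paper.

```python
class ToyCell:
    """Hypothetical bit cell whose first few write pulses fail to switch it."""

    def __init__(self, failures_before_success=0, state=0):
        self.state = state
        self._failures_left = failures_before_success

    def write(self, bit):
        if self._failures_left > 0:
            self._failures_left -= 1  # this pulse did not switch the cell
            return
        self.state = bit

    def read(self):
        return self.state


def write_verify_write(cell, bit, max_pulses=4):
    """Pulse, verify, and re-pulse until the read-back matches.

    Returns the number of write pulses used. The retry limit is an
    assumption for this sketch, not a value from the source.
    """
    for pulse in range(1, max_pulses + 1):
        cell.write(bit)
        if cell.read() == bit:  # verify step
            return pulse
    raise RuntimeError("cell failed to verify within the pulse budget")


stubborn = ToyCell(failures_before_success=2)
pulses_used = write_verify_write(stubborn, 1)
```

In this toy model a cell that switches on the first pulse costs one write and one verify read, while only hard-to-switch cells pay for extra pulses.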
TL;DR: This work proposes a 512Gb 3b/cell 128WL-layer NAND Flash, with a bit density of 7.80Gb/mm2: a 31% improvement over the previously reported, and three key performance improving technologies have been implemented.
Abstract: Advancements in 3D-Flash memory-layer-stacking technology have enabled density scaling that circumvents the lithography limitations which have prevented 2D-NAND Flash memory from scaling [1]. Bit densities as high as 5.95Gb/mm2 on a single die were recently reported [2], where a 512Gb NAND Flash was built on 96 layers of memory. As memory density increases, with memory layers increasing from 96 layers to 128 layers, higher bit density can be achieved by adopting larger capacity die; however, NAND performance per bit density is reducing with the 2-plane architecture. In this work, we propose a 512Gb 3b/cell 128WL-layer NAND Flash, with a bit density of 7.80Gb/mm2: a 31% improvement over the previously reported density. Three key performance-improving technologies have been implemented. (1) A 4-plane architecture with circuit under array (CUA) technology to improve performance per bit density. (2) A multi-die peak-power management (PPM) system to manage peak-power consumption in the system, via the ZQ pin. (3) A 4KB-page-read mode to reduce power consumption. Figure 13.5.1(a) summarizes the key features and Fig. 13.5.1(b) shows the die photograph and the floorplan for this work. Figure 13.5.2 shows a table comparing this work to previous work.
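The 31% figure can be checked directly from the two bit densities quoted in the abstract. The short calculation below is only an arithmetic check; the function name `relative_gain` is made up for this sketch.

```python
def relative_gain(new, old):
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old


this_work_gb_per_mm2 = 7.80   # 128-WL-layer die, from the abstract
prior_gb_per_mm2 = 5.95       # 96-layer die reported in [2]

gain = relative_gain(this_work_gb_per_mm2, prior_gb_per_mm2)
print(f"density gain: {gain:.1%}")  # prints "density gain: 31.1%"
```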
TL;DR: Low-temperature solution-processed oxide-based flash memories with low leakage, tunable memory storage and good retention are reported, based on a tunable flash memory device without tunneling and blocking layers.
Abstract: Intrinsic charge trap capacitive non-volatile flash memories take a significant share of the semiconductor electronics market today. It is challenging to create intrinsic traps in the dielectric layer without high temperature processing steps. The main issue is to optimize the leakage current and intrinsic trap density simultaneously. Moreover, conventional memory devices need the support of tunneling and blocking layers since the charge trapping dielectric layer is incapable of preventing the memory leakage. Here we report a tunable flash memory device without tunneling and blocking layers by combining the discovery of high intrinsic charge traps of more than $10^{12}$ cm$^{-2}$, together with low leakage current of less than $10^{-7}$ A cm$^{-2}$ in solution derived, inorganic, spin-coated dielectric films which were heated at 200 °C or below. In addition, the memory storage capacity is tuned systematically up to 96% by controlling the trap density with increasing heating temperature. Realizing efficient non-volatile flash memories that do not require high temperature processing to create suitable charge trapping remains a challenge. Here, the authors report low-temperature solution-processed oxide-based flash memories with low leakage, tunable memory storage and good retention.
TL;DR: An end-to-end deployment of an integer-only Mobilenet network with Top1 accuracy of 68% on a device with only 2MB of FLASH memory and 512kB of RAM is demonstrated, improving by 8% the Top 1 accuracy with respect to previously published 8 bit implementations for microcontrollers.
Abstract: This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations. Our approach aims at determining the minimum bit precision of every activation and weight tensor given the memory constraints of a device. This is achieved through a rule-based iterative procedure, which cuts the number of bits of the most memory-demanding layers, aiming at meeting the memory constraints. After a quantization-aware retraining step, the fake-quantized graph is converted into an inference integer-only model by inserting the Integer Channel-Normalization (ICN) layers, which introduce a negligible loss as demonstrated on INT4 MobilenetV1 models. We report the latency-accuracy evaluation of mixed-precision MobilenetV1 family networks on a STM32H7 microcontroller. Our experimental results demonstrate an end-to-end deployment of an integer-only Mobilenet network with Top1 accuracy of 68% on a device with only 2MB of FLASH memory and 512kB of RAM, improving by 8% the Top1 accuracy with respect to previously published 8 bit implementations for microcontrollers.
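The abstract describes a rule-based iterative procedure that cuts the bit width of the most memory-demanding layers until the model fits the device. The sketch below is one simple interpretation of that idea for weight tensors only, not the authors' actual rule set: every layer starts at 8 bits, and the layer with the largest current footprint is stepped down through 8, 4, 2 bits until the total fits. `assign_bitwidths`, `weight_counts`, and `budget_bytes` are hypothetical names.

```python
def assign_bitwidths(weight_counts, budget_bytes, allowed_bits=(8, 4, 2)):
    """Greedy bit-width assignment under a weight-memory budget.

    weight_counts: number of weights per layer.
    budget_bytes:  available weight memory (assumed, e.g. a share of FLASH).
    Returns a list with one bit width per layer.
    """
    bits = [allowed_bits[0]] * len(weight_counts)

    def footprint(i):
        # Bytes used by layer i at its current precision.
        return weight_counts[i] * bits[i] / 8

    while sum(footprint(i) for i in range(len(bits))) > budget_bytes:
        # Only layers that are not yet at the lowest precision can be cut.
        candidates = [i for i in range(len(bits)) if bits[i] > allowed_bits[-1]]
        if not candidates:
            raise ValueError("budget cannot be met even at the lowest bit width")
        worst = max(candidates, key=footprint)
        bits[worst] = allowed_bits[allowed_bits.index(bits[worst]) + 1]
    return bits


example_bits = assign_bitwidths([1000, 4000, 500], budget_bytes=3000)
```

For the example call, the 4000-weight layer is cut twice (8 to 4 to 2 bits) while the two smaller layers keep 8 bits. A real flow would also account for activation memory and follow the cut with the quantization-aware retraining step that the abstract mentions.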
TL;DR: This paper shows that 3D NAND flash memory possesses very strong process similarity within a 3D flash block: the word lines on the same horizontal layer of the 3D flash block exhibit virtually equivalent reliability characteristics, and proposes a new program sequence, called mixed order scheme (MOS), which can further reduce the program latency.
Abstract: 3D NAND flash memory exhibits two contrasting process characteristics from its manufacturing process. While process variability between different horizontal layers is well known, little has been systematically investigated about strong process similarity (PS) within the horizontal layer. In this paper, based on an extensive characterization study using real 3D flash chips, we show that 3D NAND flash memory possesses very strong process similarity within a 3D flash block: the word lines (WLs) on the same horizontal layer of the 3D flash block exhibit virtually equivalent reliability characteristics. This strong process similarity, which was not previously utilized, opens simple but effective new optimization opportunities for 3D flash memory. In this paper, we focus on exploiting the process similarity for improving the I/O latency. By carefully reusing various flash operating parameters monitored from accessing the leading WL, the remaining WLs on the same horizontal layer can be quickly accessed, avoiding unnecessary redundant steps for subsequent program and read operations. We also propose a new program sequence, called mixed order scheme (MOS), for 3D NAND flash memory which can further reduce the program latency. We have implemented a PS-aware FTL, called cubeFTL, which takes advantage of the proposed techniques. Our evaluation results show that cubeFTL can improve the IOPS by up to 48% over an existing PS-unaware FTL.
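The reuse idea in this abstract, where parameters observed on the leading WL of a horizontal layer are applied to the remaining WLs of that layer, can be pictured as a per-layer cache. This is a rough sketch under assumed names: `LayerParamCache`, the `calibrate` callback, and the shape of the returned parameters are all hypothetical and do not describe cubeFTL's internals.

```python
class LayerParamCache:
    """Cache operating parameters per (block, horizontal layer)."""

    def __init__(self, calibrate):
        # calibrate(block, layer, wl) stands in for the slow monitoring step
        # performed when the leading WL of a layer is accessed.
        self._calibrate = calibrate
        self._cache = {}
        self.calibrations = 0

    def params_for(self, block, layer, wl):
        key = (block, layer)
        if key not in self._cache:
            # First (leading) WL on this layer: pay the full cost once.
            self._cache[key] = self._calibrate(block, layer, wl)
            self.calibrations += 1
        # Remaining WLs on the same layer reuse the stored parameters.
        return self._cache[key]


def fake_calibrate(block, layer, wl):
    # Placeholder values; real parameters would come from the flash chip.
    return {"read_offset_mv": 10 * layer}


cache = LayerParamCache(fake_calibrate)
results = [cache.params_for(block=0, layer=3, wl=w) for w in range(4)]
```

Four accesses on the same layer trigger a single calibration here. A practical design would also need a policy for invalidating entries, for example after an erase, which the sketch omits.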
TL;DR: This work develops a novel reprogramming scheme for TLCs in 3D NAND SSD, such that a cell can be programmed and reprogrammed several times before it is erased, which can reduce the frequency of erases, improve the speed of programming, and increase the amount of bits written in a cell per program/erase cycle.
Abstract: NAND flash memory based SSDs have been widely studied and adopted. The scaling of SSDs has evolved from planar (2D) to 3D stacking. Compared with 2D SSD, 3D SSD stacks more layers into one block, constructing one block with more flash pages. For reliability and other reasons, the technology node in 3D NAND SSD is larger than in 2D, but data density can be increased by increasing the number of bits per cell. However, representing multiple bits per cell encounters additional challenges such as endurance and access latency. In this work, we develop a novel reprogramming scheme for TLCs in 3D NAND SSD, such that a cell can be programmed and reprogrammed several times before it is erased. Such reprogramming can reduce the frequency of erases, which determines the endurance of a cell, improve the speed of programming, and increase the amount of bits written in a cell per program/erase cycle, i.e., effective capacity. Our work is the first to perform real 3D NAND SSD tests to validate the feasibility of the reprogram operation. From the collected data, we derive the restrictions of performing reprogramming due to reliability challenges. Further, a reprogrammable SSD (ReSSD) is designed to structure reprogram operations and determine when they should be applied. ReSSD is evaluated in a case study in a 3D TLC SSD based RAID 5 system (RSS-RAID). Experimental results show that RSS-RAID can improve the endurance by 30.3%, boost write performance by 16.7%, and increase effective capacity by 7.71%, with negligible overhead compared with a conventional 3D SSD based RAID 5 system.
TL;DR: To meet the continuously growing market demand for higher capacity and lower cost, a 4b/cell (QLC) 3D-Flash memory in a 96-WL-layer technology is presented and achieves an 8.5Gb/mm2 area capacity, which is $42 \sim 50$% greater than the 3D Flash memory reported in [2], [3].
Abstract: Since 3D-Flash memory took over for 2D-Flash memory, chip capacity has continuously improved [1]–[3]. In the 2D-Flash era, 2b/cell (MLC) offered higher performance and reliability, while 3b/cell (TLC) offered the lowest cost. Thanks to a larger feature size, 3D Flash cell reliability is much better than that of 2D. As a result, TLC 3D-Flash became the mainstream non-volatile memory, since it satisfies most market requirements in both performance and reliability. To meet the continuously growing market demand for higher capacity and lower cost, a 4b/cell (QLC) 3D-Flash memory in a 96-WL-layer technology is presented. It achieves an 8.5Gb/mm2 area capacity, which is $42 \sim 50$% greater than the 3D-Flash memory reported in [2], [3]. A chip micrograph and a table summarizing key features are shown in Fig. 13.1.7. The total 1.33Tb capacity is the highest single Flash memory chip capacity reported thus far.
TL;DR: A synaptic device based on charge-trap flash memory that has good CMOS compatibility and superior reliability characteristics compared with other synaptic devices and a 3-D stacked synapse array that could be a novel solution for neuromorphic systems for implementing deep neural networks are proposed.
Abstract: This paper proposes a synaptic device based on charge-trap flash memory that has good CMOS compatibility and superior reliability characteristics compared with other synaptic devices. Using hot-electron injection and hot-hole injection, we designed operation methods to implement gradual conductance modulation and spike-timing-dependent plasticity. We demonstrate the feasibility of the device for neuromorphic applications through both a device-level technology computer-aided design simulation and a system-level MATLAB simulation. For the first time, we also propose a 3-D stacked synapse array and present the structure, operation, and process methods. The proposed array architecture features a small area and low process cost and could be a novel solution for neuromorphic systems for implementing deep neural networks.
TL;DR: An in-depth analysis of the bit-error characteristics of state-of-the-art 64-layer 3D TLC NAND flash with a focus on read-voltage calibration and how the optimal read voltages change under different device stress is characterized.
Abstract: 3D NAND flash memory has entered dynamically into the space of enterprise server and storage systems, offering significantly higher capacity and better endurance than the latest 2D technology node. Moreover, the advancements in vertical stacking, cell design and program/read algorithms, have also enabled TLC 3D NAND flash with enterprise-level reliability, thus achieving further increase in capacity and cost-per-bit reduction. This paper presents an in-depth analysis of the bit-error characteristics of state-of-the-art 64-layer 3D TLC NAND flash with a focus on read-voltage calibration. We provide experimental measurements of the RBER and threshold voltage distributions using typical and mixed-mode test patterns of program/erase cycling, retention and read-disturb. Moreover, we quantify the RBER components attributed to threshold voltage level overlapping and on-chip 2-step program errors. Finally, we characterize how the optimal read voltages change under different device stress and we evaluate calibration schemes with different performance and complexity trade-offs.
TL;DR: The focus is on flash memory and ReRAM based on various functional materials as well as the recent development of photo‐tunable memories.
Abstract: As one of the five basic components in a modern computer system, memory plays a key role in data storage while the Von Neumann architecture still occupies a principal position in the modern digital era. With the rapid development of portable electronic devices, non‐volatile memories are of great importance in daily life. High‐performance memory devices are in high demand, and novel materials applied to flash memory and resistive random access memory (ReRAM) have been widely investigated. The functionalities of memories can be broadened with the development of semiconductor technologies. As a facile and low‐power electromagnetic wave, light can be another modulation medium, which will not bring destructive operation and can enhance the device performance. In this review, the focus is on flash memory and ReRAM based on various functional materials as well as the recent development of photo‐tunable memories.
TL;DR: The developed compact model would equip the circuit designers and system architects with an effective tool for design-exploration of 3D NAND flash memory devices for diverse unconventional analog applications.
Abstract: We present a behavioral compact model for static characteristics of 3D NAND flash memory for integrated circuits and system-level applications utilizing BSIM-CMG 110.0.0. This model is easy to implement, computationally efficient, fast, accurate, and effectively accounts for the different parasitic capacitance coupling effects applicable to the 3D geometry of the vertical channel Macaroni body charge-trap flash memory. The model parameter extraction methodology is simple and can be extended to reproduce the electrical behavior of different 3D NAND flash memory architectures (with different page size, dimension, or a number of stacked layers). We believe that the developed compact model would equip the circuit designers and system architects with an effective tool for design-exploration of 3D NAND flash memory devices for diverse unconventional analog applications.
TL;DR: It is projected that the semicircular split-gate FG cell is a promising candidate to realize more than four bits/cell (QLC) for significantly higher memory density at a lower number of stacking layers.
Abstract: Three-dimensional (3D) semicircular split-gate flash memory cells have been successfully developed for the first time. Reduction of fringing field effects is essential to extract maximum performance from the split-gate cells, and careful design of Floating Gate (FG) cells achieves superior program slope and program/erase window at much smaller cell size relative to circular Charge Trap (CT) cells. It is projected that the semicircular split-gate FG cell is a promising candidate to realize more than four bits/cell (QLC) for significantly higher memory density at a lower number of stacking layers.
TL;DR: In this paper, the integration of silicon-doped hafnium oxide (HSO) antiferroelectric (AFE) material for enhanced floating-gate Flash memory speed by means of field-enhanced AFE polarization switching is reported.
Abstract: We report for the first time the integration of silicon-doped hafnium oxide (HSO) antiferroelectric (AFE) material for enhanced floating-gate Flash memory speed by means of field-enhanced AFE polarization switching. An analytical description of the metal–ferroelectric–metal–insulator–semiconductor (MFMIS) stack physics during a write operation is introduced to study the effect of different stack optimization parameters on the interfacial oxide field. This, in turn, suggests different optimization routes for a ferroelectric field-effect transistor (FeFET) and Flash memories with a possible improved Flash interfacial field by AFE material integration. Improved Fowler–Nordheim tunneling based Flash speed of 300 ns is illustrated for the integrated devices. The theory and experiment of the MFMIS stack physics are discussed with emphasis on the role of stack parameters for optimized memory operation.
TL;DR: A new technique is introduced that perturbs split-gate NOR Flash memory cells and extracts randomness of read noise to generate true random numbers and enables extraction of high-throughput random sequences that pass the NIST statistical tests.
Abstract: This paper introduces a new technique that perturbs split-gate NOR Flash memory cells and extracts randomness of read noise to generate true random numbers. Flash memory cells exhibit threshold voltage fluctuations during read operations caused by thermal noise and random telegraph noise effects. Recent proposals demonstrate how these inherent properties of Flash memory cells can be used to create true random numbers in modern NAND Flash memories. However, they cannot be directly applied to NOR Flash memories in microcontrollers that have different architecture, improved data retention, high endurance, and are not as susceptible to noise as high-density NAND Flash memories. The proposed technique is experimentally demonstrated and evaluated using a family of commercial microcontrollers. The evaluation shows that it enables extraction of high-throughput random sequences that pass the NIST statistical tests. Advantages of the proposed technique are as follows: (a) it does not require any special hardware and/or interface modifications, (b) it is robust, cost-effective, and high-throughput, (c) it is entirely implemented in software, and (d) it is flexible and can be tailored to work in low-end microcontrollers that are often resource- or cost-constrained.
TL;DR: In this paper, a metal-assisted solid-phase single crystallization process is demonstrated for the first time to improve the channel conductance of 3D flash memory cells; it shows superior device characteristics (Icell, Vth, sub-threshold slope, transconductance) and uniformity while maintaining memory performance and reliability.
Abstract: In order to improve the channel conductance of 3D flash memory cells, a metal-assisted solid-phase single crystallization process has been demonstrated for the first time. The metal induced lateral crystallization (MILC) process is well known for thin film transistors (TFTs). We applied this technology to the channel Si in the vertical memory holes of 3D flash memory. Monocrystalline growth was confirmed by in-situ TEM and nano-beam diffraction (NBD) analysis. Moreover, the cell shows superior device characteristics (Icell, Vth, sub-threshold slope, transconductance) and uniformity while maintaining memory performance and reliability.
TL;DR: In this article, a tunable flash memory device without tunneling and blocking layer was proposed by combining the discovery of high intrinsic charge traps together with low leakage current in solution derived, inorganic, spin$-$coated dielectric films which were heated at 200$^\circ$C or below.
Abstract: Intrinsic charge trap capacitive non-volatile flash memories take a significant share of the semiconductor electronics market today. It is a challenge to create intrinsic traps in the dielectric layer without high temperature processing steps. While low temperature processed memory devices fabricated from polymers have been demonstrated as an alternative, their performance degrades rapidly after a few cycles of operation. Moreover, conventional memory devices need the support of tunneling and blocking layers since the memory dielectric or polymer is incapable of preventing memory leakage. The main issue in designing a memory device is to optimize the leakage current and intrinsic trap density simultaneously. Here we report a tunable flash memory device without tunneling and blocking layers by combining the discovery of high intrinsic charge traps ($>$10$^{12}$ cm$^{-2}$) together with low leakage current ($<$10$^{-7}$ A cm$^{-2}$) in solution derived, inorganic, spin$-$coated dielectric films which were heated at 200$^\circ$C or below. In addition, the memory storage is tuned systematically up to 96% by controlling the trap density with increasing heating temperature.
TL;DR: This paper analyzes a PCM implementation in depth, identifies the primary cause of PCM’s long latency, and proposes Low-Latency PCM (LL-PCM), which can give 119% higher performance and consume 43% lower memory energy than PCM for memory-intensive applications.
Abstract: PCM is a promising non-volatile memory technology, as it can offer a unique trade-off between density and latency compared with DRAM and flash memory. Although PCM is much faster than flash memory, it is still notably slower than DRAM, which can significantly degrade system performance. In this paper, we analyze a PCM implementation in depth, and identify the primary cause of PCM’s long latency, i.e., a long interconnect (high resistance/capacitance) path between a cell and a sense-amp/write-driver. This in turn requires (1) a very large charge pump consuming ~20% of PCM chip space, ~50% of the latency of write operations, and ~2× more power than a write operation itself; and (2) a large current sense-amp with a long time to pre-charge the interconnect path. Then, we propose the Low-Latency PCM (LL-PCM) architecture. Our analysis shows that LL-PCM can give 119% higher performance and consume 43% lower memory energy than PCM for memory-intensive applications. LL-PCM is only ~1% larger than PCM, as the cost of reducing the resistance/capacitance of the interconnect path is negated by its 4.1× smaller charge pump.
TL;DR: This paper proposes endurance-enhancing lower state encoding, which encodes input data to make the cell state as low as possible in consideration of interpage relation, and results indicate that the scheme shows better lifetime improvement than other schemes in most cases.
Abstract: During the past decade, the endurance of NAND flash memory has severely deteriorated. The maximum number of program and erase cycles has fallen significantly with the emergence of multilevel cell (MLC) and triple-level cell (TLC) technology, and the scaling down of the cell size. Wear leveling is a general solution used to alleviate this issue; it enables cells to wear down evenly but it cannot actually mitigate the wearing of the cells. Accordingly, techniques are required to minimize the actual cell degradation. This paper proposes endurance-enhancing lower state encoding. The key insight leveraged by the proposed technique is the data pattern-related characteristic of MLC and TLC NAND flash memories, in which the lower the state of the cells, the lower the occurrence of wear out. Thus, our proposed scheme encodes input data to make the cell state as low as possible in consideration of interpage relation. As a result, the wear out of the memory cells can be minimized and their lifetime is improved by 62.7% in a file type and 43.0% in MySQL. Experimental results indicate that our scheme shows better lifetime improvement than other schemes in most cases.
TL;DR: The experimental results demonstrate that the proposed FaGC+ algorithm outperforms existing garbage collection algorithms in terms of garbage collection overhead and time-sensitivity in wear leveling control.
Abstract: NAND flash memory has been widely used in consumer electronics, such as tablet personal computers and smart phones. However, unlike a traditional hard disk, NAND flash requires garbage collection to reclaim memory space during data updates. Garbage collection includes a series of extra read, write and erase operations. Both write and erase operations are time-consuming processes, which affect the effectiveness and efficiency of the NAND flash memory system. Moreover, flash memory blocks are limited in the number of erase operations they can endure. Thus, considerable effort has been devoted to reducing the garbage collection overhead and improving wear leveling. In this paper, an efficient and non-time-sensitive file-aware garbage collection algorithm, called FaGC+, is proposed. The FaGC+ algorithm involves a novel update frequency calculation method and a novel cold-hot logical page categorization scheme. The experimental results demonstrate that the proposed algorithm outperforms existing garbage collection algorithms in terms of garbage collection overhead and time-sensitivity in wear leveling control.
TL;DR: A new 3-D synaptic device with stackable AND-type Rounded Dual Channel (RDC) flash memory architecture is proposed for neuromorphic computing; it operates at low power by using the FN program/erase method and achieves high density with multi-layer stacking.
Abstract: A new 3-D synaptic device with stackable AND-type Rounded Dual Channel (RDC) flash memory architecture is proposed for neuromorphic computing. The RDC flash devices operate at low power by using the FN program/erase method, achieve high speed through a parallel read operation, and achieve high density with multi-layer stacking. Key fabrication steps are explained and the successful operation of the device in a 3-D stacked structure is verified by device simulation. In addition, devices are fabricated by stacking three layers, and their operation is confirmed.
TL;DR: A highly manufacturable speech recognition system, which consists of a 5-layer fully connected neural network (360k artificial synapses, 1576 neurons, and peripheral circuits) has been successfully built on 200 mm wafers and the high accuracy of speech recognition on 8 different Chinese speech words is demonstrated.
Abstract: In this work, a new type of flash memory-based memristor, named programmable linear random-access memory (PLRAM), is presented to store analog synaptic weights in a single flash memory cell. A PLRAM cell with a self-calibrating program/erase scheme can provide very stable and repeatable analog memory states up to 7 bits in a single cell, which is suitable for an artificial synapse in the neural network. The physical implementation of a discrete Fourier transform on PLRAM arrays shows a remarkably good agreement with theoretical calculations. By taking nonlinearity effects on both forward propagation and back-propagation into consideration, a highly manufacturable speech recognition system, which consists of a 5-layer fully connected neural network (360k artificial synapses, 1576 neurons, and peripheral circuits) has been successfully built on 200 mm wafers. For the first time, the high accuracy of speech recognition (>90%) on 8 different Chinese speech words is demonstrated.
TL;DR: Single-event effects and total ionizing dose testing are described for a 32-layer NAND flash memory, in both SLC and MLC configurations, with special considerations for unique three-dimensional test results.
Abstract: Single-event effects and total ionizing dose testing are described for a 32-layer NAND flash memory, in both SLC and MLC configurations, with special considerations for unique three-dimensional test results. Extraction of three-dimensional heavy-ion test data presents a unique visualization of angular effects in NAND flash. Pattern dependence differences between SLC and MLC mode are noted, along with slightly non-uniform angular cross-section results due to the complex volume. Finally, TID data under different operating and bias conditions shows that the failure point remains the higher-voltage erasure circuitry.
TL;DR: In this paper, the authors demonstrate the possibility to operate a mainstream NOR Flash memory array as an artificial synaptic array learning without external supervision according to the spike-timing-dependent plasticity (STDP) rule.
Abstract: This article and its part II demonstrate the possibility to operate a mainstream NOR Flash memory array as an artificial synaptic array learning without external supervision according to the spike-timing-dependent plasticity (STDP) rule. As a first mandatory step to this aim, suitable array working conditions allowing not only selective cell programming but also selective cell erase are here identified, overcoming the block erase operation typical of any Flash memory technology. The proposed array working conditions exploit channel hot-electron injection and hot-hole injection at the drain side to achieve selective, bidirectional, and virtually analog tuning of cell threshold voltage, with no need for changes in the design of a common-ground double-polysilicon NOR array. In part II, the new working scheme will be shown to allow for a straightforward implementation of STDP and unsupervised learning in the array.
TL;DR: A time-multiplexed architecture is designed to enhance the security and expand the challenge-response pair space to $10^{211}$ and shows strong resilience against machine learning attacks and the possibility of extremely energy-efficient, 0.56 pJ/b operation.
Abstract: We exploit randomness in static I-V characteristics and reconfigurability of embedded flash memories to design a very efficient physically unclonable function. Leakage current and subthreshold slope variations, nonlinearity, nondeterministic tuning error, and sneak path current in the redesigned commercial flash memory arrays are exploited to create a unique digital fingerprint. A time-multiplexed architecture is designed to enhance the security and expand the challenge-response pair space to $10^{211}$. Experimental results demonstrate 50.3% average uniformity, 49.99% average diffuseness, and native < 5% bit error rate. The analysis of the measured data also shows strong resilience against machine learning attacks and the possibility of extremely energy-efficient, 0.56 pJ/b operation.
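The abstract reports an average uniformity close to 50% without defining the metric. A common convention in the PUF literature, assumed here, is that uniformity is the percentage of 1s in a response, with 50% as the ideal. The helper below is a small illustration under that assumption; `uniformity_percent` is a hypothetical name and the sample responses are invented, not measured data.

```python
def uniformity_percent(response_bits):
    """Percentage of 1s in a single PUF response (ideal value: 50)."""
    bits = list(response_bits)
    if not bits:
        raise ValueError("response must contain at least one bit")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("response must contain only 0s and 1s")
    return 100.0 * sum(bits) / len(bits)


balanced = uniformity_percent([1, 0, 1, 0, 0, 1, 1, 0])
skewed = uniformity_percent([1, 1, 1, 0])
```

An average over many responses would be computed by applying the same function to each response and taking the mean.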
TL;DR: In this paper, the authors demonstrate microelectromechanical system-based flash memory (MEM-FLASH) for multinary bit storage, where the MEMS switch integrated with the transistor provides the precise control of the charges on the floating gate.
Abstract: We demonstrate microelectromechanical system-based flash memory (MEM-FLASH) for multinary bit storage. The MEMS switch integrated with the transistor provides the precise control of the charges on the floating gate. This maneuvering of the charges to 8 different levels provides 3-bit operation even at an elevated temperature of ∼300 °C. The key challenge in the realization of such a memory is knowing the amount of charge to be transferred to the floating gate to alter the bit state. The charge estimation on the floating gate cannot be performed by direct probing of the device, as this will disturb the original charge values of the floating gate and thus the threshold value. Therefore, an indirect read approach is developed. Furthermore, the cantilever switch is fabricated and tested in a vacuum environment for experimental validation of the approach. The variation between the theoretical and experimental results is within the acceptable limit of 2%.
TL;DR: A methodology for analyzing the impact of mechanical stress on the electrical performance of 3D NAND devices under application of an external load with a nanoindenter is developed.
Abstract: We have developed a methodology for analyzing the impact of mechanical stress on the electrical performance of 3D NAND devices. The methodology relies on in-situ electrical characterization of 3D NAND flash memory under application of an external load with a nanoindenter. The forces applied in the experiment are converted to stress using finite element modeling and the obtained values are correlated with electrical characteristics. With this method, $I_{\mathrm{ON}}$ and $I_{\mathrm{OFF}}$ degradation with compressive stress along the memory channel is demonstrated and compared for three types of channel materials: polysilicon full channel, single crystal silicon full channel, and polysilicon macaroni channel. TCAD simulations attribute the changes in $I_{\mathrm{ON}}$ and $I_{\mathrm{OFF}}$ to mobility decrease and Shockley-Read-Hall generation rate increase under stress, respectively.
TL;DR: Experimental results show that TempCure can effectively reduce the peak temperature and block erase counts with negligible timing overhead in comparison with representative schemes.
Abstract: Compared to conventional planar flash memory, advanced 3-D flash memory adopts charge-trap technology that can significantly enhance cell density and storage capacity. Despite these advantages, 3-D charge-trap flash memory brings several new challenges. First, charge-trap flash is sensitive to temperature. Recent studies demonstrate that high temperature incurs both charge loss and retention degradation. This issue does not occur in 2-D flash memory, which adopts floating-gate technology. Second, current 3-D charge-trap flash integrates extra-large-capacity physical blocks, each containing over 1024 physical pages. This large-capacity block infrastructure causes extra garbage collection overhead, which makes the thermal issue more complicated. This paper presents TempCure, a temperature-aware reliability enhancement strategy for 3-D charge-trap flash memory. TempCure is a novel hardware and file system interface that can transparently allocate physical space based on the temperature status. TempCure adopts two reliability enhancement strategies, temperature mining and block allotment, to prevent the generation of hotspots and enhance the data integrity of 3-D flash memory. We conduct a set of experiments using standard benchmarks. Experimental results show that TempCure can effectively reduce the peak temperature and block erase counts with negligible timing overhead in comparison with representative schemes.
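The idea of allocating physical space based on temperature status can be sketched as below. This is not TempCure's actual policy, which the abstract does not detail: the thermal limit, the per-block temperature and erase-count tables, and the function name are all hypothetical.

```python
def pick_block(free_blocks, temperature, erase_count, thermal_limit):
    """Choose a free block for the next write.

    Assumed policy (illustrative only): among free blocks cooler than
    thermal_limit, pick the one with the fewest erases; if every free
    block is at or above the limit, fall back to the coolest one.
    """
    if not free_blocks:
        raise ValueError("no free blocks available")
    cool = [b for b in free_blocks if temperature[b] < thermal_limit]
    if cool:
        return min(cool, key=lambda b: erase_count[b])
    return min(free_blocks, key=lambda b: temperature[b])
```

Separating the thermal filter from the wear criterion keeps the two concerns independent, so either rule can be swapped without touching the other.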
TL;DR: Experimental results show that WARCIP reduces write amplification dramatically and the number of block erasures by 4.45 times on average, implying extended lifetimes of flash SSDs.
Abstract: The storage volume of SSDs has greatly increased recently with emerging multi-layer 3D triple-level cell and quad-level cell flash. However, one critical overhead of any flash memory SSD is the garbage collection (GC) process that is necessary due to the inherent physical properties of flash memories. GC is a time-consuming process that slows down I/O performance and decreases the endurance of SSDs. To minimize the negative impact of GC, we introduce Write Amplification Reduction by Clustering I/O Pages (WARCIP). The idea is to use a clustering algorithm to minimize the rewrite interval variance of pages in a flash block. As a result, pages in a flash block tend to have a similar lifetime, minimizing write amplification during garbage collection. We have implemented WARCIP on an enterprise NVMe SSD. Both simulation and measurement experiments have been carried out. Real-world I/O traces and standard I/O benchmarks are used in our experiments to assess the potential benefit of WARCIP. Experimental results show that WARCIP reduces write amplification dramatically and the number of block erasures by 4.45 times on average, implying extended lifetimes of flash SSDs.
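Grouping pages by rewrite interval can be sketched with a simple online nearest-centroid scheme, where each cluster stands for one open flash block. This is an assumed stand-in, not the clustering algorithm used by WARCIP; the class name, the initial centroids, and the rule for first-time writes are hypothetical.

```python
class IntervalClusterer:
    """Route each written page to the cluster whose mean rewrite
    interval is closest to the page's own most recent interval."""

    def __init__(self, initial_centroids):
        self.centroids = [float(c) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)
        self.last_write = {}  # page id -> time of previous write

    def on_write(self, page, now):
        """Return the index of the cluster (open block) for this write."""
        prev = self.last_write.get(page)
        self.last_write[page] = now
        if prev is None:
            # Assumption: a page with no history is treated as cold and
            # sent to the cluster with the largest mean interval.
            return max(range(len(self.centroids)),
                       key=lambda i: self.centroids[i])
        interval = now - prev
        idx = min(range(len(self.centroids)),
                  key=lambda i: abs(self.centroids[i] - interval))
        # Running-mean update of the chosen centroid.
        self.counts[idx] += 1
        self.centroids[idx] += (interval - self.centroids[idx]) / self.counts[idx]
        return idx
```

Pages that land in the same cluster have similar rewrite intervals, so a block filled from one cluster tends to be invalidated at roughly the same time, which leaves fewer valid pages to copy when the block is garbage collected.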
TL;DR: A novel data organization is proposed to fortify the reliability of regular data by leaving approximate data unprotected and can improve read performance by 30% on average compared to current techniques.
Abstract: With the increasing bit density and adoption of 3D NAND, flash memory suffers from increased errors. To address the issue, flash devices adopt error correction codes (ECC) with strong error correction capability, such as low-density parity-check (LDPC) codes. The drawback of LDPC is that read latency is amplified when correcting data with a high raw bit error rate (RBER). This work proposes to address this issue with the assistance of approximate data. First, studies have been conducted and show that an ample amount of approximate data is available in flash storage. Second, a novel data organization is proposed to fortify the reliability of regular data by leaving approximate data unprotected. Finally, a new data allocation strategy and a modified garbage collection scheme are presented to complete the design. The experimental results show that the proposed approach can improve read performance by 30% on average compared to current techniques.