TL;DR: The seamless integration of a memristor and transistor into one multi-terminal device could enable complex neuromorphic learning and the study of the physics of defect kinetics in two-dimensional materials.
Abstract: Memristors are two-terminal passive circuit elements that have been developed for use in non-volatile resistive random-access memory and may also be useful in neuromorphic computing. Memristors have higher endurance and faster read/write times than flash memory and can provide multi-bit data storage. However, although two-terminal memristors have demonstrated capacity for basic neural functions, synapses in the human brain outnumber neurons by more than a thousandfold, which implies that multi-terminal memristors are needed to perform complex functions such as heterosynaptic plasticity. Previous attempts to move beyond two-terminal memristors, such as the three-terminal Widrow-Hoff memristor and field-effect transistors with nanoionic gates or floating gates, did not achieve memristive switching in the transistor. Here we report the experimental realization of a multi-terminal hybrid memristor and transistor (that is, a memtransistor) using polycrystalline monolayer molybdenum disulfide (MoS2) in a scalable fabrication process. The two-dimensional MoS2 memtransistors show gate tunability in individual resistance states by four orders of magnitude, as well as large switching ratios, high cycling endurance and long-term retention of states. In addition to conventional neural learning behaviour of long-term potentiation/depression, six-terminal MoS2 memtransistors have gate-tunable heterosynaptic functionality, which is not achievable using two-terminal memristors. For example, the conductance between a pair of floating electrodes (pre- and post-synaptic neurons) is varied by a factor of about ten by applying voltage pulses to modulatory terminals. In situ scanning probe microscopy, cryogenic charge transport measurements and device modelling reveal that the bias-induced motion of MoS2 defects drives resistive switching by dynamically varying Schottky barrier heights. 
Overall, the seamless integration of a memristor and transistor into one multi-terminal device could enable complex neuromorphic learning and the study of the physics of defect kinetics in two-dimensional materials.
TL;DR: Estimates show that a straightforward optimization of the hardware and its transfer to the already available 55-nm technology may increase the advantage over the best reported digital implementations of the same task to more than $10^{2}\times$ in speed and $10^{4}\times$ in energy efficiency.
Abstract: The potential advantages of analog and mixed-signal nanoelectronic circuits, based on floating-gate devices with adjustable conductance, for neuromorphic computing were recognized long ago. However, practical realizations of this approach suffered from using rudimentary floating-gate cells of relatively large area. Here, we report a prototype $28\times 28$ binary-input, ten-output, three-layer neuromorphic network based on arrays of highly optimized embedded nonvolatile floating-gate cells, redesigned from a commercial 180-nm NOR flash memory. All active blocks of the circuit, including 101 780 floating-gate cells, have a total area below 1 mm$^2$. The network has shown a 94.7% classification fidelity on the common Modified National Institute of Standards and Technology (MNIST) benchmark, close to the 96.2% obtained in simulation. The classification of one pattern takes a sub-1-$\mu\text{s}$ time and a sub-20-nJ energy, both numbers much better than in the best reported digital implementations of the same task. Estimates show that a straightforward optimization of the hardware and its transfer to the already available 55-nm technology may increase this advantage to more than $10^{2}\times$ in speed and $10^{4}\times$ in energy efficiency.
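As a rough consistency check on the cell count quoted above, the sketch below counts the weights of a fully connected network with 28 × 28 inputs and ten outputs. The hidden-layer width of 64, one bias weight per neuron, and two floating-gate cells per weight (a differential pair) are assumptions that the abstract does not state; under them, the total happens to match the reported 101 780 cells.

```python
def floating_gate_cells(n_in, n_hidden, n_out, cells_per_weight=2):
    """Count cells for a fully connected in -> hidden -> out network.

    Assumes one extra bias input per layer and `cells_per_weight`
    floating-gate cells per weight (2 = differential pair, an assumption).
    """
    hidden_weights = (n_in + 1) * n_hidden    # +1 accounts for the bias input
    output_weights = (n_hidden + 1) * n_out   # +1 accounts for the bias input
    return cells_per_weight * (hidden_weights + output_weights)


n_inputs = 28 * 28  # binary input pixels, from the abstract
total_cells = floating_gate_cells(n_inputs, n_hidden=64, n_out=10)  # 64 is assumed
print(total_cells)
```

Other hidden-layer widths or weight encodings would give a different total, so the match is suggestive rather than a statement of the authors' design.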
TL;DR: This paper performs a rigorous experimental characterization of real, state-of-the-art 3D NAND flash memory chips, and identifies three new error characteristics that were not previously observed in planar NAND flash memory, but are fundamental to the new architecture of 3D NAND flash memory.
Abstract: Compared to planar NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory. Our goal is to (1) identify and understand these new error characteristics of 3D NAND flash memory, and (2) develop new techniques to mitigate prevailing 3D NAND flash errors. In this paper, we perform a rigorous experimental characterization of real, state-of-the-art 3D NAND flash memory chips, and identify three new error characteristics that were not previously observed in planar NAND flash memory, but are fundamental to the new architecture of 3D NAND flash memory. First, 3D NAND flash memory exhibits layer-to-layer process variation, a new phenomenon specific to the 3D nature of the device, where the average error rate of each 3D-stacked layer in a chip is significantly different. We are the first to provide detailed experimental characterization results of layer-to-layer process variation in real flash devices in open literature. Our results show that the raw bit error rate in the middle layer can be 6× the error rate in the top layer. Second, 3D NAND flash memory experiences early retention loss, a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming, but then increases at a much slower rate. We are the first to perform an extended-duration observation of early retention loss over the course of 24 days. Our results show that the retention error rate in a 3D NAND flash memory block quickly increases by an order of magnitude within ~3 hours after programming.
Third, 3D NAND flash memory experiences retention interference, a new phenomenon where the rate at which charge leaks from a flash cell is dependent on the amount of charge stored in neighboring flash cells. Our results show that charge leaks at a lower rate (i.e., the retention loss speed is slower) when the neighboring cell is in a state that holds more charge (i.e., a higher-voltage state). Our experimental observations indicate that we must revisit the error models and error mitigation mechanisms devised for planar NAND flash, as they are no longer accurate for 3D NAND flash behavior. To this end, we develop new analytical models of (1) the layer-to-layer process variation in 3D NAND flash memory, and (2) retention loss in 3D NAND flash memory. Our models estimate the raw bit error rate (RBER), threshold voltage distribution, and the optimal read reference voltage (i.e., the voltage at which RBER is minimized when applied during a read operation) for each flash page. Both models are useful for developing techniques to mitigate raw bit errors in 3D NAND flash memory. Motivated by our new findings and models, we develop four new techniques to mitigate process variation and early retention loss in 3D NAND flash memory. Our first technique, LaVAR, reduces process variation by fine-tuning the read reference voltage independently for each layer. Our second technique, LI-RAID, improves reliability by changing how pages are grouped under the RAID (Redundant Array of Independent Disks) error recovery technique, using information about layer-to-layer process variation to reduce the likelihood that the RAID recovery of a group could fail significantly earlier during the flash lifetime than recovery of other groups. Our third technique, ReMAR, reduces retention errors in 3D NAND flash memory by tracking the retention age of the data using our retention model and adapting the read reference voltage to data age.
Our fourth technique, ReNAC, adapts the read reference voltage to the amount of retention interference to re-read the data after a read operation fails. These four techniques are complementary, and can be combined together to significantly improve flash memory reliability. Compared to a state-of-the-art baseline, our techniques, when combined, improve flash memory lifetime by 1.85×. Alternatively, if a NAND flash manufacturer wants to keep the lifetime of the 3D NAND flash memory device constant, our techniques reduce the storage overhead required to hold error correction information by 78.9%. For more information on our new experimental characterization of modern 3D NAND flash memory chips and our proposed models and techniques, please refer to the full version of our paper [luo.pomacs18].
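The notion of an optimal read reference voltage (the voltage at which RBER is minimized) can be illustrated with a small sketch. This is not the authors' model: it assumes two equally likely neighboring states with Gaussian threshold voltage distributions, and all means, standard deviations, and function names are hypothetical. A simple grid search then shows how the best read voltage moves when the higher state shifts down, for example due to retention loss.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def rber(v_ref, low, high):
    """Raw bit error rate for two equally likely states read at v_ref.

    `low` and `high` are (mean, sigma) tuples; Gaussian shape is an assumption.
    """
    p_low_misread = 1.0 - normal_cdf(v_ref, *low)   # low-state cells above v_ref
    p_high_misread = normal_cdf(v_ref, *high)       # high-state cells below v_ref
    return 0.5 * (p_low_misread + p_high_misread)


def optimal_read_voltage(low, high, steps=2000):
    """Grid-search the voltage between the two means that minimizes RBER."""
    lo, hi = low[0], high[0]
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: rber(v, low, high))


# Hypothetical distributions (volts): the high state drifts down as charge leaks.
fresh = optimal_read_voltage((1.0, 0.3), (3.0, 0.3))
aged = optimal_read_voltage((1.0, 0.3), (2.6, 0.3))
print(fresh, aged)
```

With equal standard deviations the optimum sits midway between the two means, so the shifted distribution calls for a lower read voltage. A per-layer or per-age adjustment would repeat this search with the distribution parameters estimated for that layer or retention age.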
TL;DR: A new model for 3D NAND flash memory reliability is developed, which predicts how retention, wearout, self-recovery, and temperature affect raw bit error rates and cell threshold voltages; the model is shown to be accurate, with an error of only 4.9%.
Abstract: NAND flash memory density continues to scale to keep up with the increasing storage demands of data-intensive applications. Unfortunately, as a result of this scaling, the lifetime of NAND flash memory has been decreasing. Each cell in NAND flash memory can endure only a limited number of writes, due to the damage caused by each program and erase operation on the cell. This damage can be partially repaired on its own during the idle time between program or erase operations (known as the dwell time), via a phenomenon known as the self-recovery effect. Prior works study the self-recovery effect for planar (i.e., 2D) NAND flash memory, and propose to exploit it to improve flash lifetime, by applying high temperature to accelerate self-recovery. However, these findings may not be directly applicable to 3D NAND flash memory, due to significant changes in the design and manufacturing process that are required to enable practical 3D stacking for NAND flash memory. In this paper, we perform the first detailed experimental characterization of the effects of self-recovery and temperature on real, state-of-the-art 3D NAND flash memory devices. We show that these effects influence two major factors of NAND flash memory reliability: (1) retention loss speed (i.e., the speed at which a flash cell leaks charge), and (2) program variation (i.e., the difference in programming speed across flash cells). We find that self-recovery and temperature affect 3D NAND flash memory quite differently than they affect planar NAND flash memory, rendering prior models of self-recovery and temperature ineffective for 3D NAND flash memory. Using our characterization results, we develop a new model for 3D NAND flash memory reliability, which predicts how retention, wearout, self-recovery, and temperature affect raw bit error rates and cell threshold voltages. We show that our model is accurate, with an error of only 4.9%. 
Based on our experimental findings and our model, we propose HeatWatch, a new mechanism to improve 3D NAND flash memory reliability. The key idea of HeatWatch is to optimize the read reference voltage, i.e., the voltage applied to the cell during a read operation, by adapting it to the dwell time of the workload and the current operating temperature. HeatWatch (1) efficiently tracks flash memory temperature and dwell time online, (2) sends this information to our reliability model to predict the current voltages of flash cells, and (3) predicts the optimal read reference voltage based on the current cell voltages. Our detailed experimental evaluations show that HeatWatch improves flash lifetime by 3.85× over a baseline that uses a fixed read reference voltage, averaged across 28 real storage workload traces, and comes within 0.9% of the lifetime of an ideal read reference voltage selection mechanism.
TL;DR: This paper experimentally characterizes read disturb errors on state-of-the-art 2Y-nm (i.e., 20-24 nm) MLC NAND flash memory chips and identifies that lowering pass-through voltage levels reduces the impact of read disturb and extends flash lifetime.
Abstract: This paper summarizes our work on experimentally characterizing, mitigating, and recovering read disturb errors in multi-level cell (MLC) NAND flash memory, which was published in DSN 2015, and examines the work's significance and future potential. NAND flash memory reliability continues to degrade as the memory is scaled down and more bits are programmed per cell. A key contributor to this reduced reliability is read disturb, where a read to one row of cells impacts the threshold voltages of unread flash cells in different rows of the same block.
For the first time in open literature, this work experimentally characterizes read disturb errors on state-of-the-art 2Y-nm (i.e., 20-24 nm) MLC NAND flash memory chips. Our findings (1) correlate the magnitude of threshold voltage shifts with read operation counts, (2) demonstrate how program/erase cycle count and retention age affect the read-disturb-induced error rate, and (3) identify that lowering pass-through voltage levels reduces the impact of read disturb and extends flash lifetime. In particular, we find that the probability of read disturb errors increases with both higher wear-out and higher pass-through voltage levels.
We leverage these findings to develop two new techniques. The first technique mitigates read disturb errors by dynamically tuning the pass-through voltage on a per-block basis. Using real workload traces, our evaluations show that this technique increases flash memory endurance by an average of 21%. The second technique recovers from previously-uncorrectable flash errors by identifying and probabilistically correcting cells susceptible to read disturb errors. Our evaluations show that this recovery technique reduces the raw bit error rate by 36%.
TL;DR: This work presents an inverse resistance change PCRAM with Cr2Ge2Te6 (CrGT) that shows a high-resistance crystalline reset state and a low-resistance amorphous set state, and demonstrates how this can break the trade-off relationship between the crystallization temperature and operating speed.
Abstract: Phase-change random access memory (PCRAM) has attracted much attention for next-generation nonvolatile memory that can replace flash memory and can be used for storage-class memory. Generally, PCRAM relies on the change in the electrical resistance of a phase-change material between high-resistance amorphous (reset) and low-resistance crystalline (set) states. Herein, we present an inverse resistance change PCRAM with Cr2Ge2Te6 (CrGT) that shows a high-resistance crystalline reset state and a low-resistance amorphous set state. The inverse resistance change was found to be due to a drastic decrease in the carrier density upon crystallization, which causes a large increase in contact resistivity between CrGT and the electrode. The CrGT memory cell was demonstrated to show fast reversible resistance switching with a much lower operating energy for amorphization than a Ge2Sb2Te5 memory cell. This low operating energy in CrGT should be due to a small programmed amorphous volume, which can be realized by a hig...
TL;DR: An NVMe SSD controller is introduced which leverages the advantages of the low-latency NAND and enables the reduction of total memory access time, thereby minimizing overall system latency.
Abstract: In a memory hierarchy, there are various classes of memory systems depending on the access latency. A typical memory hierarchy consists of a CPU cache, DRAM, and an SSD or HDD. DRAM has an access latency of 100ns, while flash memory has a latency of about 50μs [1]. Recently, new non-volatile memories with latencies of less than 10μs, including PRAM, MRAM, and ReRAM [2], have been getting attention for business-critical systems such as big-data analysis and storage caches. To meet the low-latency requirements, a new type of NAND flash, Z-NAND, with a read time (tR) of 3μs has also been introduced [3]. Figure 20.2.1 shows a feature comparison between Z-NAND and conventional 3D NAND [4,5]. The Z-NAND achieves a read time of 3μs, which is 15–20 times faster than conventional NAND. Write throughput reaches up to 160MB/s with a 100μs program time. To further minimize read latency, the I/O circuits support a DDR interface in both x8 and x16 modes. To take full advantage of such low-latency memory devices, reduction of memory access overhead is necessary. In this paper, we introduce an NVMe SSD controller which leverages the advantages of the low-latency NAND and enables the reduction of total memory access time, thereby minimizing overall system latency.
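The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below derives the conventional NAND read time implied by the 15–20× speedup and the amount of data written per program operation implied by the throughput and program time. It assumes that 1 MB means 10^6 bytes and that program operations run back to back on a single unit without overlap; neither assumption is stated in the abstract.

```python
READ_TIME_US = 3.0              # Z-NAND read time (tR), from the abstract
SPEEDUP_RANGE = (15, 20)        # "15-20 times faster than conventional NAND"
WRITE_THROUGHPUT_MB_S = 160.0   # peak write throughput, from the abstract
PROGRAM_TIME_US = 100.0         # program time, from the abstract

# Implied read time of conventional NAND, in microseconds.
conventional_read_us = tuple(READ_TIME_US * s for s in SPEEDUP_RANGE)

# Bytes per program operation: (MB/s) * (us) = (1e6 B/s) * (1e-6 s) = bytes,
# assuming decimal megabytes and strictly serialized program operations.
bytes_per_program = WRITE_THROUGHPUT_MB_S * PROGRAM_TIME_US

print(conventional_read_us, bytes_per_program)
```

Under these assumptions the implied conventional read time of 45 to 60 μs brackets the "about 50μs" flash latency mentioned earlier in the abstract, and each program operation would cover roughly 16 kB of data.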
TL;DR: This paper advocates reconsidering the cache system design and directly opening device-level details of the underlying flash storage for key-value caching, and implements a prototype, called DIDACache, based on the Open-Channel SSD platform.
Abstract: Key-value caching is crucial to today’s low-latency Internet services. Conventional key-value cache systems, such as Memcached, rely heavily on expensive DRAM. To lower the Total Cost of Ownership, the industry has recently been moving toward more cost-efficient flash-based solutions, such as Facebook’s McDipper [14] and Twitter’s Fatcache [56]. These cache systems typically take commercial SSDs and adopt a Memcached-like scheme to store and manage key-value cache data in flash. Such a practice, though simple, is inefficient due to the huge semantic gap between the key-value cache manager and the underlying flash devices. In this article, we advocate reconsidering the cache system design and directly opening device-level details of the underlying flash storage for key-value caching. We propose an enhanced flash-aware key-value cache manager, which consists of a novel unified address mapping module, an integrated garbage collection policy, dynamic over-provisioning space management, and a customized wear-leveling policy, to directly drive the flash management. A thin intermediate library layer provides a slab-based abstraction of low-level flash memory space and an API for directly and easily operating flash devices. Special flash SSD hardware that exposes flash physical details is adopted to store key-value items. This co-design approach bridges the semantic gap and connects the two layers closely, which allows us to leverage both the domain knowledge of key-value caches and the unique device properties. In this way, we can maximize the efficiency of key-value caching on flash devices while minimizing its weaknesses. We implemented a prototype, called DIDACache, based on the Open-Channel SSD platform. Our experiments on real hardware show that we can significantly increase the throughput by 35.5%, reduce the latency by 23.6%, and remove unnecessary erase operations by 28%.
TL;DR: In this paper, the authors develop new analytical models of layer-to-layer process variation and retention loss in 3D NAND flash memory, along with four complementary mitigation techniques that can be combined to significantly improve flash memory reliability.
Abstract: Compared to planar (i.e., two-dimensional) NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND flash memory. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory.
In this paper, through experimental characterization of real, state-of-the-art 3D NAND flash memory chips, we find that 3D NAND flash memory exhibits three new error sources that were not previously observed in planar NAND flash memory: (1) layer-to-layer process variation, where the average error rate of each 3D-stacked layer in a chip is significantly different; (2) early retention loss, a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming; and (3) retention interference, a new phenomenon where the rate at which charge leaks from a flash cell is dependent on the data value stored in the neighboring cell.
Based on our experimental results, we develop new analytical models of layer-to-layer process variation and retention loss in 3D NAND flash memory. Motivated by our new findings and models, we develop four new techniques to mitigate process variation and early retention loss in 3D NAND flash memory. These four techniques are complementary, and can be combined together to significantly improve flash memory reliability. Compared to a state-of-the-art baseline, our techniques, when combined, improve flash memory lifetime by 1.85×. Alternatively, if a NAND flash vendor wants to keep the lifetime of the 3D NAND flash memory device constant, our techniques reduce the storage overhead required to hold error correction information by 78.9%.
TL;DR: MoS2-based flash memory devices fabricated by stacking MoS2 and hexagonal boron nitride layers on an hBN/Au substrate can emulate various biological synaptic functions, including potentiation and depression processes, spike-rate-dependent plasticity, and spike-timing-dependent plasticity.
Abstract: We fabricated MoS2-based flash memory devices by stacking MoS2 and hexagonal boron nitride (hBN) layers on an hBN/Au substrate and demonstrated that these devices can emulate various biological synaptic functions, including potentiation and depression processes, spike-rate-dependent plasticity, and spike-timing-dependent plasticity. In particular, compared to a flash memory device prepared on an hBN substrate, the device fabricated on the hBN/Au exhibited considerably more symmetric and linear bidirectional gradual conductance change curves, which may be attributed to the device structure incorporating a double floating gate. For the device on the hBN/Au, electron transfers may occur between the floating gate MoS2 and Au, as well as between the floating gate MoS2 and the channel MoS2, allowing for more control over electron tunneling and injection. To test our hypothesis, we also fabricated a MoS2-based flash memory device on an hBN/Pd substrate and found behavior similar to the device fabricated on hBN/Au....
TL;DR: A novel technique for true random number generation using commercial off-the-shelf Flash memory that is cost-effective, easy to implement in software and to deploy through software updates, and widely applicable to all electronic devices utilizing modern NAND Flash memory chips is proposed.
Abstract: In this paper, we propose and demonstrate a novel technique for true random number generation using commercial off-the-shelf Flash memory. Flash memory cells are known to exhibit thermal noise and random telegraph noise during sensing of their threshold voltage. In order to extract these inherent noise properties of the Flash memory bits through a standard digital Flash memory interface, we utilize the program disturb and read noise characteristics, which are fundamental properties of all NAND Flash memory arrays. The proposed technique is experimentally demonstrated and evaluated using state-of-the-art Flash memory chips. The experimental evaluation shows that the proposed technique enables extraction of high-quality, high-throughput, controllable (or tunable), and temperature- and aging-tolerant random bits. The random bits generated by the proposed technique pass all tests in the National Institute of Standards and Technology statistical test suite. The advantages of the proposed technique are as follows: 1) it is cost-effective as it does not require any special circuitry or hardware modification; 2) it is tolerant to aging and temperature effects; 3) it is easy to implement in software and to deploy through software updates; and 4) it is widely applicable to all electronic devices utilizing modern NAND Flash memory chips.
TL;DR: In this article, a string-based start bias control scheme is proposed to improve the program performance of a 512Gb 3b/cell 3D Flash memory in a 96-word-line-layer BiCS FLASH technology.
Abstract: The first multi-layer stacked 3D Flash memory was proposed as BiCS FLASH in 2007 [1]. Since then, memory bit density has grown rapidly due to the increase in the number of stacked layers from continuous 3D technology innovations. On the other hand, the multi-level-cell technology, which was initially proposed for 2D Flash, has also been adopted to 3D Flash memories. The first 3b/cell 32-layer Flash was presented in 2015 [2], followed by a 48-layer one in 2016 [3], and a 64-layer one in 2017 [4,5]. This paper describes a 512Gb 3b/cell 3D Flash memory in a 96-word-line-layer BiCS FLASH technology. This work implements three key technologies to improve performance: (1) a string-based start bias control scheme achieves a 7% shorter program time; (2) a smart Vt-tracking read improves read retry performance by minimizing the tracking time and supporting a program suspend read function; and (3) a low-pre-charge sense-amplifier bus scheme reduces both the power consumption and the data-transfer time between the sense amplifier (SA) and the data cache by half. Figure 20.1.1 shows the die micrograph and the summary of the key features of the chip.
TL;DR: In this paper, the authors present a select device technology based on volatile resistive switching with Cu and Ag top electrodes and silicon oxide (SiOx) switching materials, which displays an ultrahigh resistance window and good current capability exceeding 2 MA cm−2.
Abstract: The cross-point architecture for memory arrays is widely considered as one of the most attractive solutions for storage and memory circuits thanks to simplicity, scalability, small cell size, and consequently high density and low cost. Cost-scalable vertical 3-D cross-point architectures, in particular, offer the opportunity to challenge Flash memory with comparable density and cost. To develop scalable cross-point arrays, however, select devices with sufficient ON–OFF ratio, current capability, and endurance must be available. This paper presents a select device technology based on volatile resistive switching with Cu and Ag top electrodes and silicon oxide (SiOx) switching materials. The select device displays an ultrahigh resistance window and good current capability exceeding 2 MA cm−2. A retention study shows a stochastic voltage-dependent ON–OFF transition time in the 10 μs–1 ms range, which needs to be further optimized for fast memory operation in storage class memory arrays.
TL;DR: A new SSD simulation framework, SimpleSSD 2.0 (namely Amber), is introduced that models embedded CPU cores, DRAMs, and various flash technologies (within an SSD), and operates under a full-system simulation environment by enabling data transfer emulation.
Abstract: SSDs have become a major storage component in modern memory hierarchies, and SSD research demands simulation-based studies that integrate SSD subsystems into a full-system environment. However, several challenges exist in modeling SSDs under full-system simulation: SSDs are built upon their own complete system and architecture, which employs all the necessary hardware, such as CPUs, DRAM and an interconnect network. On top of these hardware components, SSDs also require multiple device controllers, internal caches and software modules that respect a wide spectrum of storage interfaces and protocols. All of this SSD hardware and software is necessary to realize storage subsystems in a full-system environment that can operate in parallel with the host system. In this work, we introduce a new SSD simulation framework, SimpleSSD 2.0, namely Amber, that models embedded CPU cores, DRAMs, and various flash technologies (within an SSD), and operates under a full-system simulation environment by enabling data transfer emulation. Amber also includes a full firmware stack, including DRAM cache logic and flash firmware such as FTL and HIL, and obeys diverse standard protocols by revising the host DMA engines and system buses of all functional and timing CPU models of a popular full-system simulator (gem5). The proposed simulator can capture the details of the dynamic performance and power of embedded cores, DRAMs, firmware and flash under the execution of various operating systems and hardware platforms. Using Amber, we characterize several system-level challenges by simulating different types of full systems, such as mobile devices and general-purpose computers, and offer comprehensive analyses by comparing passive storage and active storage architectures.
TL;DR: This paper proposes an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision, in which the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories.
Abstract: Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work showed that analog multipliers based on nonvolatile memories have superior energy efficiency as compared to digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. Similar to some previous work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is an ultra-low-power sensing circuitry, which is designed based on a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital circuit is employed at the circuit backend. Such implementations of conversion and sensing allow for circuit operation entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400 × 400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells that are utilized for offset compensation.
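To put the quoted energy efficiency in more tangible terms, the sketch below converts 1.68 POps/J into an energy per operation and into an energy for one full 400 × 400 vector-matrix product. How the authors count operations is not given in the abstract; counting each multiply-accumulate as two operations is an assumption made here, so the per-VMM figure is only indicative.

```python
ROWS, COLS = 400, 400                   # VMM dimensions, from the abstract
ENERGY_EFFICIENCY_OPS_PER_J = 1.68e15   # 1.68 POps/J, from the abstract
OPS_PER_MAC = 2                         # assumption: multiply and add counted separately

# Energy per single operation, in femtojoules.
energy_per_op_fj = 1e15 / ENERGY_EFFICIENCY_OPS_PER_J

# Operations and energy for one full vector-matrix product, in picojoules.
ops_per_vmm = OPS_PER_MAC * ROWS * COLS
energy_per_vmm_pj = ops_per_vmm / ENERGY_EFFICIENCY_OPS_PER_J * 1e12

print(round(energy_per_op_fj, 3), ops_per_vmm, round(energy_per_vmm_pj, 1))
```

If a multiply-accumulate were instead counted as a single operation, the per-VMM energy estimate would halve.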
TL;DR: A 512Gb 3b/cell 3D Flash memory in a 96-word-line-layer BiCS FLASH technology is described, which implements three key technologies to improve performance: a string based start bias control scheme achieves a 7% shorter program time; a smart Vt-tracking read improves read retry performance by minimizing the tracking time and supporting a program suspend read function.
Abstract: The first multi-layer stacked 3D Flash memory was proposed as BiCS FLASH in 2007 [1]. Since then, memory bit density has grown rapidly due to the increase in the number of stacked layers from continuous 3D technology innovations. On the other hand, the multi-level-cell technology, which was initially proposed for 2D Flash, has also been adopted to 3D Flash memories. The first 3b/cell 32-layer Flash was presented in 2015 [2], followed by a 48-layer one in 2016 [3], and a 64-layer one in 2017 [4,5]. This paper describes a 512Gb 3b/cell 3D Flash memory in a 96-word-line-layer BiCS FLASH technology. This work implements three key technologies to improve performance: (1) a string-based start bias control scheme achieves a 7% shorter program time; (2) a smart Vt-tracking read improves read retry performance by minimizing the tracking time and supporting a program suspend read function; and (3) a low-pre-charge sense-amplifier bus scheme reduces both the power consumption and the data-transfer time between the sense amplifier (SA) and the data cache by half. Figure 20.1.1 shows the die micrograph and the summary of the key features of the chip.
TL;DR: In this paper, a robust and CMOS compatible nonvolatile memory solution is provided, which not only allows for precise trimming of the resonance frequency of the photonic device, but can also be easily manufactured and commercialized.
Abstract: Nonvolatile flash memory technology is widely used in our daily life. Following the recent progress in silicon photonics, there is now an opportunity to embed flash memories also in photonic applications. As of today, chip scale photonic devices, e.g., micro-resonators, are becoming essential building blocks in modern silicon photonics. However, their properties, such as their resonance frequencies, fluctuate due to fabrication tolerances, significantly limiting their applicability. Here, by integrating the well-established non-volatile flash memory technology into silicon photonic circuitry, this major obstacle is tackled and electrical post trimming of such resonators is demonstrated. Specifically, the Metal-Oxide-Nitride-Oxide-Silicon (MONOS) structure is used to trap charges in the thin silicon nitride layer, located in close proximity to the silicon device layer. This enables accumulating charges in the silicon, modifying the effective index of the optical mode and consequently the resonance frequency. By doing so, a robust and CMOS compatible nonvolatile memory solution is provided, which not only allows for precise trimming of the resonance frequency of the photonic device, but can also be easily manufactured and commercialized. This approach paves the way for efficient utilization of photonic structures such as resonators and interferometers in chip scale silicon photonics and electro optic systems, with a wide range of applications spanning from filters, switches and modulators, to sensors, and even lasers.
TL;DR: In this article, the authors presented a strategy for controlling the device performance of organic field effect transistor (OFET) memory devices by using a metallo-phthalocyanine (MPc)-cored star-shaped polystyrene as a charge storage material.
Abstract: We present a strategy for controlling the device performance of organic field effect transistor (OFET) memory devices by using a metallo-phthalocyanine (MPc)-cored star-shaped polystyrene as a charge storage material. MPc-cored four-armed star polymers (M = Cu or Zn) with polystyrene arms of three different number-average molecular weights were prepared by atom transfer radical polymerization and cyclization reactions with metal ions; the density of the MPc cores dispersed in the polymer matrix was dependent on the lengths of polymer arms. The charge carrier mobility of pentacene-based OFET memory devices containing the star polymer varied with the nature of the MPc-cored star polymer layer owing to the presence of the MPc core unit as well as the nanostructures of the polymer thin films. Application of an external gate bias to the OFET device caused significant reversible shifts in the threshold voltage, and the magnitude of the memory shifts was proportional to the weight percentage of MPc cores in the star polymer matrix. The memory device showed a high memory on/off current ratio (>10^5) and long retention characteristics (>10^5 s), permitting it to be characterized as a nonvolatile organic memory device; the retention time extended upon increasing the Mn of the star polymer. The MPc-cored star polymer is a promising material designed as a charge storage layer for controlling the performance of organic flash memory devices.
TL;DR: In this paper, the authors describe a 3D flash memory cell that can be produced despite a misalignment in at least two sections (top and bottom), each having multiple charge storage locations.
Abstract: Methods of forming 3-D flash memory cells are described. The methods allow the cells to be produced despite a misalignment in at least two sections (top and bottom), each having multiple charge storage locations. The methods include selectively gas-phase etching dielectric from the bottom memory hole portion by delivering the etchants through the top memory hole. Two options for completing the methods include (1) forming a ledge spacer to allow reactive ion etching of the bottom polysilicon portion without damaging the polysilicon or charge-trap/ONO layer on the ledge, and (2) placing sacrificial silicon oxide gapfill in the bottom memory hole, selectively forming protective conformal silicon nitride elsewhere, then removing the sacrificial silicon oxide gapfill before performing the reactive ion etching of the bottom polysilicon portion as before.
TL;DR: This chapter describes several mitigation and recovery techniques, including cell-to-cell interference mitigation; optimal multi-level cell sensing; error correction using state-of-the-art algorithms and methods; and data recovery when error correction fails.
Abstract: NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and cost has continuously decreased over decades. This positive growth is a result of two key trends: (1) effective process technology scaling; and (2) multi-level (e.g., MLC, TLC) cell data coding. Unfortunately, the reliability of raw data stored in flash memory has also continued to become more difficult to ensure, because these two trends lead to (1) fewer electrons in the flash memory cell floating gate to represent the data; and (2) larger cell-to-cell interference and disturbance effects. Without mitigation, worsening reliability can reduce the lifetime of NAND flash memory. As a result, flash memory controllers in solid-state drives (SSDs) have become much more sophisticated: they incorporate many effective techniques to ensure the correct interpretation of noisy data stored in flash memory cells. In this chapter, we review recent advances in SSD error characterization, mitigation, and data recovery techniques for reliability and lifetime improvement. We provide rigorous experimental data from state-of-the-art MLC and TLC NAND flash devices on various types of flash memory errors, to motivate the need for such techniques. Based on the understanding developed by the experimental characterization, we describe several mitigation and recovery techniques, including (1) cell-to-cell interference mitigation; (2) optimal multi-level cell sensing; (3) error correction using state-of-the-art algorithms and methods; and (4) data recovery when error correction fails. We quantify the reliability improvement provided by each of these techniques. Looking forward, we briefly discuss how flash memory and these techniques could evolve into the future.
TL;DR: In this paper, a 3D floating-gate (FG) synapse array for neuromorphic applications is proposed, which has a smaller cell size due to the stacked structure and smaller operation voltage by the gate-all-around geometry.
Abstract: This paper proposes a 3-D floating-gate (FG) synapse array for neuromorphic applications. The designed device has certain advantages over previous planar FG synapse devices: a smaller cell size due to the stacked structure and a smaller operation voltage due to the gate-all-around geometry. In addition, an operation method to implement spike-timing-dependent plasticity is proposed and demonstrated. The proposed array, based on commercialized flash memory technology, is expected to be one of the most promising candidate architectures for neuromorphic applications.
TL;DR: ApproxFTL is proposed, an approximate-write aware flash translation layer design, that uses approximate- write operations to store error-resilient data of modern applications to alleviate disturbance in physical blocks that save both precise and approximate data.
Abstract: 3-D NAND flash is one of the most promising advances in the flash memory industry. While 3-D flash improves cell density and reduces lithography cost through die stacking, it suffers from severe program disturbance, which leads to significant performance and lifetime degradation for 3-D flash-based SSDs. To address this challenge, we propose ApproxFTL, an approximate-write aware flash translation layer design that uses approximate-write operations to store error-resilient data of modern applications. By reducing the maximal threshold voltage and tightening the guard bands between multilevel cell states, approximate write operations not only finish early but also exhibit large disturbance reduction, which can be exploited to alleviate disturbance in physical blocks that save both precise and approximate data. ApproxFTL maximizes the disturbance mitigation through approximate-write aware data placement, wear leveling, and garbage collection enhancements. Our experimental results show that ApproxFTL, while preserving high data quality, improves the read and write response time of flash accesses by 41.38% and 45.64% on average, respectively, and extends the lifetime of 3-D flash-based SSDs by 5.75% compared with the state of the art.
TL;DR: This scheme attempts to merge subpage write requests to full page write requests in the write buffer to reduce the number of NAND writes and adds size information to the mapping table to detect unnecessary RMW operations.
Abstract: The manufacturers of NAND flash-based solid-state drives (SSDs) are increasing capacity and throughput by enlarging their page size, which is the minimum I/O unit in the NAND flash chips. Because the host and NAND flash chips have different I/O granularity units, the number of subpage requests increases. However, these subpage requests, especially writes, can cause internal fragmentation and endurance problems. Furthermore, subpage write requests inevitably involve read-modify-write (RMW) operations that increase the write response time because of the out-of-place update feature of NAND flash chips. In this paper, we propose a subpage-aware SSD to increase the lifetime and performance by reducing the number of NAND writes and eliminating unnecessary RMW operations. Our scheme attempts to merge subpage write requests into full-page write requests in the write buffer to reduce the number of NAND writes, and adds size information to the mapping table to detect unnecessary RMW operations. Our proposed scheme reduces the number of NAND writes by up to 30 percent (19 percent on average) and the write response time by up to 22 percent (13 percent on average).
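The two ideas in the abstract above, merging subpage writes in the buffer and using per-page size information to skip needless RMW, can be sketched roughly as follows. This is an illustrative toy model, not the paper's design: the class name `SubpageBuffer`, the four-sectors-per-page geometry, and the rule "RMW is needed only when NAND already holds valid sectors that the new write does not cover" are all assumptions.

```python
SECTORS_PER_PAGE = 4  # assumed geometry: one NAND page holds four host sectors

class SubpageBuffer:
    """Toy write buffer that merges subpage writes per logical page (hypothetical)."""

    def __init__(self):
        self.pending = {}     # logical page number -> buffered sector offsets
        self.valid = {}       # mapping-table side info: lpn -> sectors valid in NAND
        self.nand_writes = 0  # pages programmed to NAND
        self.rmw_ops = 0      # flushes that had to read the old page first

    def write(self, lpn, offset):
        """Buffer one sector; flush automatically once the page is complete."""
        self.pending.setdefault(lpn, set()).add(offset)
        if len(self.pending[lpn]) == SECTORS_PER_PAGE:
            self.flush(lpn)

    def flush(self, lpn):
        """Program the buffered sectors of one logical page to NAND."""
        sectors = self.pending.pop(lpn, set())
        if not sectors:
            return
        old = self.valid.get(lpn, set())
        # RMW only if NAND holds valid sectors that this write does not overwrite.
        if old - sectors:
            self.rmw_ops += 1
        self.valid[lpn] = old | sectors
        self.nand_writes += 1

buf = SubpageBuffer()
for off in range(SECTORS_PER_PAGE):
    buf.write(7, off)   # four subpage writes merge into one full-page write
buf.write(9, 0)
buf.flush(9)            # partial write to a fresh page: no RMW required
buf.write(9, 1)
buf.flush(9)            # sector 0 must be preserved, so this flush is an RMW
```

In this run, six host sector writes result in three NAND page writes and a single RMW; without the validity information, both partial flushes of page 9 would have to be treated as RMW.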
TL;DR: The successful demonstration of the wire connected 1S1R unit comprising this flexible selector and one bipolar resistor cell indicates the great potential of this cation-based selector to restrain the crosstalk issue in a large flexible RRAM array.
Abstract: Emerging resistive switching random access memory (RRAM), considered the most promising candidate to succeed flash memory, is well suited to flexible electronic systems. However, in high-density flexible crossbar RRAM arrays, the crosstalk issue, in which currents from neighboring unselected cells cause write and read operations to fail, remains a main bottleneck. Therefore, flexible selectors compatible with the flexibility of the RRAM array are needed to configure a one selector-one resistor (1S1R) system that is immune to crosstalk. In this paper, flexible cation-based threshold switching (TS) selectors (Pt/Ag/HfO2/Pt/Ti/parylene) are fabricated and the compressive performance is studied systematically. The device shows excellent bidirectional volatile TS characteristics, including a high selectivity ratio (10^9), low operating voltages (|VTH| < 1 V), ultra-low leakage current (~10^−13 A) and good flexibility. The successful demonstration of the wire-connected 1S1R unit comprising this flexible selector and one bipolar resistor cell indicates the great potential of this cation-based selector to restrain the crosstalk issue in a large flexible RRAM array.
TL;DR: The authors' logic-compatible eflash-based spiking neuromorphic core achieves a 91.8% handwritten digit recognition accuracy which is close to the accuracy of the software model with the same number of weight levels.
Abstract: A neuromorphic core utilizing logic-compatible embedded flash technology for storing multi-level synaptic weights is demonstrated in a 65nm standard CMOS process. A carefully designed program-verify sequence along with a bitline voltage regulation scheme allows the individual cell currents to be programmed precisely. This makes it possible to enable a large number of rows in parallel without impacting the current summation accuracy. Furthermore, eflash-based synapses are non-volatile and hence consume zero standby power and support instant on/off operation. Our design stores excitatory and inhibitory weights in adjacent bitlines whose voltage levels are regulated for accurate current programming and measurement. Output spikes are generated by comparing the excitatory and inhibitory bitline currents. Our logic-compatible eflash-based spiking neuromorphic core achieves a 91.8% handwritten digit recognition accuracy, which is close to the accuracy of the software model with the same number of weight levels. The maximum throughput of the core is 1.28G pixels/s and the average power consumption of a single neuron circuit is 15.9 µW.
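The spike-generation step described above, comparing summed excitatory and inhibitory bitline currents, can be illustrated with a short sketch. The function names, the binary row-enable inputs, and the integer cell-current levels are hypothetical; the actual circuit and its weight encoding are not specified beyond the abstract.

```python
def bitline_current(inputs, cell_currents):
    """Sum the currents of the cells whose rows are enabled (input = 1)."""
    return sum(i * c for i, c in zip(inputs, cell_currents))

def neuron_spikes(inputs, exc_cells, inh_cells):
    """Fire when the excitatory bitline current exceeds the inhibitory one."""
    return bitline_current(inputs, exc_cells) > bitline_current(inputs, inh_cells)

# Assumed multi-level cell currents in arbitrary units, one entry per row.
exc = [3, 2, 0, 1]
inh = [1, 3, 1, 0]

fires = neuron_spikes([1, 0, 1, 1], exc, inh)   # excitatory 4 vs inhibitory 2
silent = neuron_spikes([0, 1, 0, 0], exc, inh)  # excitatory 2 vs inhibitory 3
```

Storing the two polarities on separate bitlines means a signed weight is represented by the difference of two non-negative cell currents, so only a comparison is needed at the output.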
TL;DR: With the weight values stored in the non-volatile memory there is no need to move data around and this greatly improves the performance and energy efficiency for neural network applications.
Abstract: We propose a novel processing-in-memory (PIM) architecture based on the voltage summation concept to accelerate the vector-matrix multiplication for neural network (NN) applications. The core device is formed by adding a buried shunt resistor to a floating gate Flash memory device. The NN string is constructed the same way as in NAND Flash by connecting the core devices in series. In perceptron operation the weighting factors are stored in the floating gate device and the sum-of-product is readily obtained by summing the voltage drop of the cells in each NN string. The energy consumption for 128 multiply-and-sum operations within a string can be as low as 0.2pJ. Finally, with the weight values stored in the non-volatile memory there is no need to move data around and this greatly improves the performance and energy efficiency for neural network applications.
TL;DR: Higher learning accuracy is obtained with GSD and NAND synaptic devices compared to that with a memristor-based synapse and measured synaptic properties of the vertical NAND cells are reported for the first time.
Abstract: Four synaptic devices are introduced for spiking neural networks (SNNs) and deep neural networks (DNNs). Unsupervised learning is successfully demonstrated by applying the STDP learning rule reflecting the LTP/LTD characteristics of the fabricated TFT-type NOR flash memory cells. A gated Schottky diode (GSD) and a vertical NAND flash cell are proposed as synaptic devices for DNNs. Using matched simulation, we obtained higher learning accuracy with GSD and NAND synaptic devices compared to that with a memristor-based synapse. Measured synaptic properties of the vertical NAND cells are reported for the first time.
TL;DR: This work presents the first implementation of spike-timing-dependent plasticity (STDP) and unsupervised learning in a mainstream NOR Flash memory array based on floating-gate cells, paving the way for the development of large-scale and high-density neuromorphic systems based on mainstream nonvolatile memory technologies.
Abstract: In this work, we present the first implementation of spike-timing-dependent plasticity (STDP) and unsupervised learning in a mainstream NOR Flash memory array based on floating-gate cells. A simple yet effective word-line and bit-line pulse scheme is proposed to make a common-ground double-polysilicon NOR array in 40 nm embedded technology work as an artificial synaptic array in a spiking neural network learning according to the STDP rule, with no change required either to the array or to the cell design. With this scheme, long-term potentiation and long-term depression of the synaptic weights are achieved, respectively, by hot-hole injection and channel hot-electron injection at the drain side of the cells. Unsupervised learning is experimentally demonstrated in the array, paving the way for the development of large-scale and high-density neuromorphic systems based on mainstream nonvolatile memory technologies.
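For readers unfamiliar with STDP, the sketch below shows a common textbook form of the rule: a pair-based update with an exponential timing window. It is not the pulse scheme of the paper above, and the amplitudes, time constant, weight bounds, and function names are assumed values for illustration. A pre-synaptic spike that precedes the post-synaptic spike potentiates the weight (LTP), while the reverse order depresses it (LTD).

```python
import math

def stdp_delta(dt, a_plus=0.05, a_minus=0.05, tau=20.0):
    """Weight change for dt = t_post - t_pre (same time unit as tau)."""
    if dt > 0:   # pre before post: long-term potentiation
        return a_plus * math.exp(-dt / tau)
    if dt < 0:   # post before pre: long-term depression
        return -a_minus * math.exp(dt / tau)
    return 0.0

def update_weight(w, t_pre, t_post, w_min=0.0, w_max=1.0):
    """Apply one STDP update and clip the weight to its allowed range."""
    return min(max(w + stdp_delta(t_post - t_pre), w_min), w_max)

w = 0.5
w_ltp = update_weight(w, t_pre=10.0, t_post=15.0)  # causal pair: weight grows
w_ltd = update_weight(w, t_pre=15.0, t_post=10.0)  # anti-causal pair: weight shrinks
```

In a floating-gate array, the clipping stands in for the finite programming window of the cell, and the sign of the update would map to the two injection mechanisms named in the abstract.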
TL;DR: DLV improves flash access speeds based on process variations and data retention time difference across flash blocks and integrates access speed optimization with access scheduling such that the average access response time can be effectively reduced on flash memory storage systems.
Abstract: NAND flash has been widely adopted in storage systems due to its better read and write performance and lower power consumption over traditional mechanical hard drives. To meet the increasing performance demand of modern applications, recent studies speed up flash accesses by exploiting access latency variations at the device level. Unfortunately, existing flash access schedulers are still oblivious to such variations, leading to suboptimal I/O performance improvements. In this paper, we propose DLV, a novel flash access scheduler for exploring scheduling opportunities due to device level access latency variations. DLV improves flash access speeds based on process variations and data retention time difference across flash blocks. More importantly, DLV integrates access speed optimization with access scheduling such that the average access response time can be effectively reduced on flash memory storage systems. Our experimental results show that DLV achieves an average of 41.5% performance improvement over the state-of-the-art.
TL;DR: It is shown that NAND flash memory reliability can be improved at low cost and with low performance overhead by deploying various architectural techniques that are aware of higher-level application behavior and underlying flash device characteristics.
Abstract: Raw bit errors are common in NAND flash memory and will increase in the future. These errors reduce flash reliability and limit the lifetime of a flash memory device. We aim to improve flash reliability with a multitude of low-cost architectural techniques. We show that NAND flash memory reliability can be improved at low cost and with low performance overhead by deploying various architectural techniques that are aware of higher-level application behavior and underlying flash device characteristics.
We analyze flash error characteristics and workload behavior through experimental characterization, and design new flash controller algorithms that use the insights gained from our analysis to improve flash reliability at a low cost. We investigate four directions through this approach. (1) We propose a new technique called WARM that improves flash reliability by 12.9 times by managing flash retention differently for write-hot data and write-cold data. (2) We propose a new framework that learns an online flash channel model for each chip and enables four new flash controller algorithms to improve flash reliability by up to 69.9%. (3) We identify three new error characteristics in 3D NAND through a comprehensive experimental characterization of real 3D NAND chips, and propose four new techniques that mitigate these new errors and improve 3D NAND reliability by up to 66.9%. (4) We propose a new technique called HeatWatch that improves 3D NAND reliability by 3.85 times by utilizing the self-healing effect to mitigate retention errors in 3D NAND.