| Tuesday, January 20, 2009 |
| Wednesday, January 21, 2009 |
| A | B | C | D |
|---|---|---|---|
| Keynote Session II 9:00 - 10:00 | | | |
| System Level Architectures 10:15 - 12:20 | Beyond Traditional Floorplanning and Placement 10:15 - 12:20 | Signal/Power Integrity and Simulation 10:15 - 12:20 | Special Session: Challenges in 3D Integrated Circuit Design 10:15 - 12:20 |
| Energy-Aware System Level Design Methodology 13:30 - 15:35 | Design for Manufacturing and Reliability 13:30 - 15:35 | Analog, RF and Mixed-Signal CAD 13:30 - 15:35 | Designers' Forum: Consumer SoC 13:30 - 15:35 |
| System Level Simulation and Modeling 15:55 - 18:00 | Chip and Package Routing Techniques 15:55 - 18:00 | Designers' Forum: ESL Design Methods 15:55 - 18:00 | |
| Thursday, January 22, 2009 |
| Tuesday, January 20, 2009 |
| Title | (Keynote Address) Challenges to EDA System from the View Point of Processor Design and Technology Drivers |
| Author | Mitsuo Saito (Toshiba Corporation Semiconductor Company, Japan) |
| Abstract | Many microprocessors have been developed since the microprocessor was invented in the early 1970s. Microprocessor design has always faced the fiercest competition, so until recently microprocessors were the technology driver for semiconductor technology and design methodology. This talk discusses the relationship between revolutions in design methodology (EDA) and transitions in technology-driver products, based on the famous Makimoto's Wave hypothesis, and highlights what happened in the microprocessor world through typical examples. As a recent example, it mainly discusses the positioning of the Cell Broadband Engine as a high-performance computing processor and as flexible hardware, along with performance results and the future trend of microprocessors toward multi-core. It then explains why SpursEngine, derived from the Cell Broadband Engine, had to be developed. SoCs (combinations of a microprocessor and hardware functional units) for custom applications should be the technology driver for the next decade, which is a first since the microprocessor was born. The special requirements that the EDA system must meet to realize the next wave are predicted. Finally, the talk briefly mentions what will happen to the world when the following wave, a software-centric era, arrives, perhaps after 2017. |
| Title | Adaptive Inter-router Links for Low-Power, Area-Efficient and Reliable Network-on-Chip (NoC) Architectures |
| Author | Avinash Karanth Kodi (Ohio University, United States), Ashwini Sarathy, Ahmed Louri, *Janet Wang (University of Arizona, United States) |
| Keyword | network-on-chip, low-power architecture |
| Abstract | The increasing wire delay constraints in deep sub-micron VLSI designs have led to the emergence of scalable and modular Network-on-Chip (NoC) architectures. As the power consumption, area overhead and performance of the entire NoC is influenced by the router buffers, research efforts have targeted optimized router buffer design. In this paper, we propose iDEAL - inter-router, dual-function energy and area-efficient links capable of data transmission as well as data storage when required. iDEAL enables a reduction in the router buffer size by controlling the repeaters along the links to adaptively function as link buffers during congestion, thereby achieving nearly 30% savings in overall network power and 35% reduction in area with only a marginal 1-3% drop in performance. In addition, aggressive speculative flow control further improves the performance of iDEAL. Moreover, the significant reduction in power consumption and area provides sufficient headroom for monitoring Negative Bias Temperature Instability (NBTI) effects in order to improve circuit reliability at reduced feature sizes. |
| Title | Analysis of Communication Delay Bounds for Network on Chips |
| Author | *Yue Qian (National University of Defense Technology, China), Zhonghai Lu (Royal Institute of Technology, Sweden), Wenhua Dou (National University of Defense Technology, China) |
| Keyword | Network-on-chip, network calculus, delay bound |
| Abstract | In network-on-chip design, computing the worst-case delay bound for packet delivery is crucial for designing predictable systems, yet it is an intractable problem because of complicated resource contention scenarios. In this paper, we present an analysis technique to derive the communication delay bound for individual flows. Based on a network contention model, this technique, which is topology independent, employs network calculus theory to first compute the equivalent service curve for individual flows and then calculate their packet delay bound. To exemplify our method, we also present the derivation of a closed-form formula to calculate the delay bound for all-to-one gather communication. Our experimental results demonstrate that the theoretical bounds are correct and tight. |
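The service-curve step can be illustrated with the textbook network calculus result for a single flow. This is only a minimal sketch under assumed models, not the paper's contention analysis: the flow is assumed to follow a token-bucket arrival curve α(t) = σ + ρt, and its equivalent service curve is assumed to be rate-latency, β(t) = R·max(0, t − T). Under those assumptions the worst-case delay is T + σ/R whenever ρ ≤ R. The function name and the numbers are hypothetical.

```python
def delay_bound(sigma, rho, rate, latency):
    """Worst-case delay of a (sigma, rho) token-bucket flow served by a
    rate-latency server beta(t) = rate * max(0, t - latency).

    sigma:   burst size (e.g. flits)
    rho:     sustained arrival rate (flits per cycle)
    rate:    guaranteed service rate R (flits per cycle)
    latency: service latency T (cycles)
    """
    if rho > rate:
        # The backlog grows without bound, so no finite delay bound exists.
        raise ValueError("arrival rate exceeds service rate")
    # Maximum horizontal distance between arrival and service curves.
    return latency + sigma / rate


# Hypothetical flow: burst of 8 flits at 0.2 flits/cycle, served at
# 0.5 flits/cycle after a 4-cycle latency.
print(delay_bound(sigma=8, rho=0.2, rate=0.5, latency=4))  # 20.0
```

The bound depends on the burst size and the service curve but not on the sustained rate ρ, as long as the flow is stable.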
| Title | Frequent Value Compression in Packet-based NoC Architectures |
| Author | Ping Zhou, Bo Zhao, Yu Du, Yi Xu, Youtao Zhang, *Jun Yang (University of Pittsburgh, United States), Li Zhao (Intel, United States) |
| Keyword | compression, NoC, performance, power |
| Abstract | The proliferation of Chip Multiprocessors (CMPs) has led to the integration of large on-chip caches. For scalability reasons, a large on-chip cache is often divided into smaller banks that are interconnected through a packet-based Network-on-Chip (NoC). With the increasing number of cores and cache banks integrated on a single die, the on-chip network introduces significant communication latency and power consumption. In this paper, we propose a novel scheme that exploits Frequent Value compression to optimize the power and performance of the NoC. Our experimental results show that the proposed scheme reduces the router power by up to 16.7%, with a CPI reduction of as much as 23.5% in our setting. Compared to the recent zero pattern compression scheme, the frequent value scheme saves up to 11.0% more router power and achieves up to 14.5% more CPI reduction. The hardware design of the FV table and its overhead are also presented. |
| Title | Simultaneous Data Transfer Routing and Scheduling for Interconnect Minimization in Multicycle Communication Architecture |
| Author | Yu-Ju Hong (Purdue University, United States), Ya-Shih Huang, *Juinn-Dar Huang (National Chiao Tung University, Taiwan) |
| Keyword | multicycle communication, architectural synthesis, interconnect minimization, resource allocation and sharing, scheduling |
| Abstract | In deep submicron technology, wire delay is no longer negligible and is gradually becoming a dominant factor of system performance. Several state-of-the-art architectural synthesis flows have already adopted the distributed register architecture to cope with the increasing wire delay by allowing multicycle communication. In this paper, we formulate channel and register allocation within a refined regular distributed register architecture, named RDR-GRS, as a problem of simultaneous data transfer routing and scheduling for minimizing global interconnect resources. We also present an innovative algorithm with both spatial and temporal considerations. It features both a concentration-oriented path router gathering wire-sharable data transfers and a channel-based time scheduler resolving contentions for wires in a channel, which are in spatial and temporal domain, respectively. The experimental results show that the proposed algorithm can significantly outperform existing related works. |
| Title | Dynamically Reconfigurable On-Chip Communication Architectures for Multi Use-Case Chip Multiprocessor Applications |
| Author | *Sudeep Pasricha, Nikil Dutt, Fadi Kurdahi (University of California, Irvine, United States) |
| Keyword | crossbar, on-chip communication, synthesis, low power |
| Abstract | Digital convergence and increasing application complexity are motivating the design of chip multiprocessor (CMP) applications with multiple use cases. Most traditional on-chip communication architecture design techniques perform synthesis and optimization only for a single use case, which may lead to sub-optimal design decisions for multi-use-case applications. In this paper we present a framework to generate a dynamically reconfigurable crossbar-based on-chip communication architecture that can support multiple use-case bandwidth and latency constraints. Our framework generates on-chip communication architectures with low cost, low power dissipation, and minimal reconfiguration overhead. Results of applying our framework to several networking CMP applications show that our approach is able to generate a crossbar solution with significantly lower cost (2.4× to 3.8×) and lower power dissipation (1.5× to 3.1×) compared to the best previously proposed approach. |
| Title | Stochastic Thermal Simulation Considering Spatial Correlated Within-Die Process Variations |
| Author | *Pei-Yu Huang, Jia-Hong Wu, Yu-Min Lee (National Chiao Tung University, Taiwan) |
| Keyword | Statistical IC thermal simulator, Karhunen-Loeve expansion, Leakage power, stochastic Galerkin method |
| Abstract | In this work, a statistical thermal simulator including the effect of spatial correlation under within-die process variations is developed. This method utilizes the Karhunen-Loeve (KL) expansion to model the physical parameters, and applies the Polynomial Chaoses (PCs) and the stochastic Galerkin method to tackle the stochastic heat transfer equations. The experimental results not only demonstrate the accuracy and efficiency of the proposed method, but also point out that the stochastic thermal analysis is essential to provide a robust estimation of temperature distribution for the thermal-aware design flow. |
| Title | A Control Theory Approach for Thermal Balancing of MPSoC |
| Author | *Francesco Zanini, David Atienza, Giovanni De Micheli (Ecole Polytechnique Federale de Lausanne, Switzerland) |
| Keyword | thermal balancing, MPSoC, control theory, linear quadratic regulator |
| Abstract | Thermal balancing and reducing hot-spots are two important challenges facing MPSoC designers. In this work, we model the thermal behavior of an MPSoC as a control theory problem, which enables the design of an optimum frequency controller without depending on the thermal profile of the chip. The optimization performed by the controller is targeted to achieve thermal balancing on the MPSoC thermal profile to avoid hotspots and improve its reliability. The proposed system is able to perform an on-line minimization of chip thermal gradients based on both scheduler requirements and the chip thermal profile. A comparison with state-of-the-art thermal management approaches shows that the proposed system offers both a better thermal profile (temperature differences higher than 4 °C have been reduced from 27.9% to 0.45%) and better performance (up to 32% task waiting time reduction). |
| Title | Thermal Optimization in Multi-Granularity Multi-Core Floorplanning |
| Author | *Michael B. Healy, Hsien-Hsin S. Lee, Gabriel H. Loh, Sung Kyu Lim (Georgia Institute of Technology, United States) |
| Keyword | multicore, thermal, floorplanning |
| Abstract | Multi-core microarchitectures require a careful balance between many competing objectives to achieve the highest possible performance. Integrated Early Analysis is the consideration of all of these factors at an early stage. Toward this goal, this work presents the first adaptive multi-granularity multi-core microarchitecture-level floorplanner that simultaneously optimizes temperature and performance, and considers memory bus length. We include simultaneous optimization at both the module-level and the core/cache-bank level. Related experiments show that our methodology is effective for optimizing multi-core architectures. |
| Title | Temperature-Aware Dynamic Frequency and Voltage Scaling for Reliability and Yield Enhancement |
| Author | *Yu-Wei Yang, Katherine Shu-Min Li (Department of Computer Science and Engineering, National Sun Yat-Sen University, Taiwan) |
| Keyword | DVFS, DVS, oscillation ring, on-chip thermal sensors, on-chip DVFS monitor |
| Abstract | A novel oscillation-based on-chip thermal sensing architecture for dynamically adjusting supply voltage and clock frequency in System-on-Chip (SoC) is proposed. It is shown that the oscillation frequency of a ring oscillator decreases linearly as the temperature rises, and thus provides a good on-chip temperature sensing mechanism. An efficient Dynamic Frequency-to-Voltage Scaling (DF2VS) algorithm is proposed to dynamically adjust supply voltage according to the oscillation frequencies of the ring oscillators distributed in the SoC so that thermal sensing can be carried out at all potential hot spots. An on-chip Dynamic Voltage Scaling or Dynamic Voltage and Frequency Scaling (DVS or DVFS) monitor selects the supply voltage level and clock frequency according to the outputs of all thermal sensors. Experimental results on SoC benchmark circuits show the effectiveness of the algorithm: a 10% reduction in supply voltage alone achieves about 20% power reduction (DVS scheme), and nearly 50% power reduction is achievable if the clock frequency is also scaled down (DVFS scheme). The chip temperature is reduced accordingly. |
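The reported savings are roughly what the first-order CMOS dynamic power model, P ∝ C·V²·f, would predict. The sketch below is an illustration under that assumed model only (it ignores leakage and is not the paper's DF2VS algorithm); the function name is hypothetical, and the frequency ratio it derives for a 50% saving is a consequence of the model, not a figure from the abstract.

```python
def relative_dynamic_power(v_scale, f_scale=1.0):
    """Dynamic power relative to nominal, assuming P is proportional to
    V^2 * f (first-order model, leakage ignored)."""
    return v_scale ** 2 * f_scale


# DVS: lower the supply voltage by 10%, keep the clock frequency.
dvs = relative_dynamic_power(0.9)
print(f"DVS power reduction: {1 - dvs:.0%}")  # 19%

# DVFS: frequency ratio that would halve power at the same 0.9x voltage.
f_needed = 0.5 / relative_dynamic_power(0.9)
print(f"Frequency ratio for 50% reduction: {f_needed:.2f}")  # 0.62
```

A 10% voltage reduction alone gives 1 − 0.9² = 19%, consistent with the "about 20%" quoted for the DVS scheme.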
| Title | A Multiple Supply Voltage Based Power Reduction Method in 3-D ICs Considering Process Variations and Thermal Effects |
| Author | Shih-An Yu, *Pei-Yu Huang, Yu-Min Lee (National Chiao Tung University, Taiwan) |
| Keyword | Power Optimization, 3D ICs, Thermal analysis, Multiple Supply Voltage |
| Abstract | In this paper, a grid-based multiple supply voltage (MSV) assignment method is presented to statistically minimize the total power consumption of 3-D ICs. This method consists of a statistical electro-thermal simulator to get the mean and variance of on-chip temperature, a thermal-aware statistical static timing analysis (SSTA) to take into account the thermal effect on circuit timing, the statistical power delay sensitivity-slack product as the optimization criterion, and an incremental update of statistical timing to save runtime. The experimental results demonstrate the effectiveness of the developed methodology and indicate that considering the thermal effect in circuit simulation is imperative. |
| Title | FastYield: Variation-Aware, Layout-Driven Simultaneous Binding and Module Selection for Performance Yield Optimization |
| Author | *Gregory Lucas, Scott Cromar, Deming Chen (University of Illinois, Urbana-Champaign, United States) |
| Keyword | high level synthesis, process variation, ssta |
| Abstract | We propose a new variation-aware high-level synthesis binding/module selection algorithm, named FastYield, that takes into consideration multiplexers, functional units, registers, and interconnects. Additionally, FastYield connects with the lower levels of the design hierarchy through its inclusion of a timing driven floorplanner guided by a statistical static timing analysis (SSTA) engine which is used to modify/enhance the synthesis solution. On average, FastYield achieves an 85% performance yield clock period that is 14.5% smaller, and a performance yield gain of 78.9%, when compared to a variation-unaware algorithm. |
| Title | CriAS: A Performance-Driven Criticality-Aware Synthesis Flow for On-Chip Multicycle Communication Architecture |
| Author | *Chia-I Chen, Juinn-Dar Huang (National Chiao Tung University, Taiwan) |
| Keyword | Architectural synthesis, multicycle communication architecture, distributed register architecture, criticality-aware, performance-driven |
| Abstract | In the deep submicron era, wire delay is no longer negligible and is dominating system performance. Several state-of-the-art architectural synthesis flows have been proposed for distributed register architectures to cope with the increasing wire delay by allowing on-chip multicycle communication. In this paper, we present a new performance-driven criticality-aware synthesis flow, CriAS, targeting regular distributed register architectures. CriAS features a hierarchical binding strategy and a coarse-grained placer for minimizing the number of critical global data transfers. The key ideas are to take time criticality as the major concern at earlier binding stages before the detailed physical placement information is available, and to preserve the locality of closely related critical components in the later placement phase. The experimental results show that a 19% overall performance improvement can be achieved on average compared to the previous work. |
| Title | Tolerating Process Variations in High-Level Synthesis Using Transparent Latches |
| Author | *Yibo Chen, Yuan Xie (The Pennsylvania State University, United States) |
| Keyword | high-level synthesis, process variation, latch |
| Abstract | Considering process variability at the behavioral synthesis level is necessary, because it makes some instances of function units slower and others faster, resulting in unbalanced control steps and reducing the attainable frequency of the circuit. To tackle this problem, this paper proposes a methodology to replace edge-triggered flip-flops with transparent latches, exploiting the latches' extra ability to pass time slack and tolerate delay variations. In the paper we first define the timing yield in high-level synthesis, and then present how to replace flip-flops with latches to improve timing yield and mitigate the impact of process variations. We then discuss the benefits and overheads of the replacement, and propose an optimization framework for latch replacement in the high-level synthesis design flow. Experimental results show that the latch-based design can achieve an average 27% improvement in timing yield compared with a traditional flip-flop-based design. |
| Title | Variation-Aware Resource Sharing and Binding in Behavioral Synthesis |
| Author | Feng Wang (Qualcomm Inc., United States), Yuan Xie (Pennsylvania State University, United States), *Andres Takach (Mentor Graphics Corporation, United States) |
| Keyword | High level synthesis, resource sharing, resource binding, process variation |
| Abstract | As technology scales, the delay uncertainty caused by process variations has become increasingly pronounced in deep submicron designs. In the presence of process variations, worst-case timing analysis may lead to overly conservative synthesis, and may end up using excess resources to guarantee design constraints. In this paper, we propose an efficient variation-aware resource sharing and binding algorithm in behavioral synthesis, which takes into account the performance variations of functional units. The performance yield, which is defined as the probability that the synthesized hardware meets the target performance constraints, is used to evaluate the synthesis result. An efficient metric called statistical performance improvement is used to guide resource sharing and binding. The proposed algorithm is integrated into a commercial synthesis framework that transforms design specifications from behavioral descriptions to RTL netlists. The effectiveness of the proposed algorithm is demonstrated with a set of industrial benchmark designs, which consist of blocks that are commonly used in wireless and image processing applications. The experimental results show that our method achieves an average 33% area reduction over traditional methods, which are based on worst-case delay analysis, with an average 10% run time overhead. |
| Title | Peak Temperature Control in Thermal-aware Behavioral Synthesis through Allocating the Number of Resources |
| Author | *Junbo Yu, Qiang Zhou, Jinian Bian (Tsinghua University, China) |
| Keyword | resource usage allocation, behavioral synthesis, peak temperature |
| Abstract | High temperature adversely affects the reliability, performance, and leakage power of ICs. In behavioral synthesis, both resource usage allocation and resource binding influence the final thermal profile. Previous thermal-aware behavioral synthesis approaches focused only on binding, ignoring allocation. This paper proposes thermal-aware behavioral synthesis with resource usage allocation. According to power density and feedback from thermal simulation, we allocate the number of resources under an area constraint. Our flow effectively controls peak temperature and creates even power densities among resources of "different" and "same" types. Compared to classic behavioral synthesis of peak temperature control, our technique reduces peak temperature by 11.1 on average with no area overhead and only 1.2 more steps latency overhead. |
| Title | A Wireless Real-Time On-Chip Bus Trace System |
| Author | *Shusuke Kawai, Takayuki Ikari (Keio University, Japan), Yutaka Takikawa (Renesas Design Corp, Japan), Hiroki Ishikuro, Tadahiro Kuroda (Keio University, Japan) |
| Keyword | Inductive coupling, Wireless interface |
| Abstract | A 480Mb/s wireless real-time bus trace system with a pulse-based inductive coupling channel array was developed using a 0.25 µm CMOS digital process. The size and pitch of the inductor array are determined by numerical calculation to optimize the tradeoff between channel coupling, crosstalk, and alignment tolerance. A low-power quasi-synchronous system is proposed to obtain a sufficient timing margin for RX pulse detection in the presence of clock skew |
| Title | CKVdd: A Self-Stabilization Ramp-Vdd Technique for Dynamic Power Reduction |
| Author | Chin-Hsien Wang, *Ching-Hwa Cheng (Feng Chia University, Taiwan), Jiun-In Guo (National Chung Cheng University, Taiwan) |
| Keyword | Low power |
| Abstract | We propose a self-stabilized ramp voltage technique, CKVdd, to reduce power dissipation in conventional CMOS circuits. Normal CMOS circuits show a power increase proportional to clock frequency. CKVdd results in a lower-than-usual power increase. This technique is easily implemented in CMOS circuits. The CKVdd technique has several characteristics that differ from those of current circuits using a Vdd power source. First, CKVdd circuits have lower average and peak current consumption, so it can be applied as a low-power design technique to generic digital circuits. Second, the CKVdd technique combines the power source and clock signal, and can easily implement a power management mechanism. Compared to constant Vdd for multimedia decoders, the proposed technique has 45% of the usual power dissipation and 88% of the usual peak current reduction at the cost of a small delay penalty. |
| Title | A 300 nW, 7 ppm/℃ CMOS Voltage Reference Circuit based on Subthreshold MOSFETs |
| Author | *Ken Ueno (Hokkaido University, Japan), Tetsuya Hirose (Kobe University, Japan), Tetsuya Asai, Yoshihito Amemiya (Hokkaido University, Japan) |
| Keyword | Voltage reference, subthreshold, Ultra-low power, process variation |
| Abstract | An ultra-low power CMOS voltage reference circuit has been fabricated in a 0.35-um standard CMOS process. The circuit generates a reference voltage based on threshold voltage of a MOSFET at absolute zero temperature. Theoretical analyses and experimental results showed that the circuit generates a quite stable reference voltage of 745 mV on average. The temperature coefficient and line sensitivity of the circuit were 7 ppm/degC and 20 ppm/V, respectively. The power supply rejection ratio (PSRR) was -45 dB at 100 Hz. The circuit consists of subthreshold MOSFETs with a low-power dissipation of 0.3 uW or less and a 1.5-V power supply. Because the circuit generates a reference voltage based on threshold voltage of a MOSFET in an LSI chip, it can be used as an on-chip process monitoring circuit and as a part of the on-chip process compensation circuit systems. |
| Title | A 100Mbps, 0.19mW Asynchronous Threshold Detector with DC Power-Free Pulse Discrimination for Impulse UWB Receiver |
| Author | *Lechang Liu, Yoshio Miyamoto, Zhiwei Zhou, Kosuke Sakaida, Jisun Ryu, Koichi Ishida, Makoto Takamiya, Takayasu Sakurai (The University of Tokyo, Japan) |
| Keyword | Ultra-wideband (UWB), UWB receiver, Threshold detector, Pulse discriminator |
| Abstract | An asynchronous threshold detector for DC-960MHz band impulse ultra-wideband (UWB) receiver is proposed in this paper. It features a DC power-free pulse discriminator. The proposed architecture in 90nm CMOS achieves the lowest power consumption of 0.19mW and energy consumption of 1.9pJ/bit at 100Mbps in the UWB receiver. |
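The two headline figures are linked by a simple relation: energy per bit is power divided by data rate. The short check below is only an arithmetic illustration of that relation using the numbers quoted in the abstract; the function name is hypothetical.

```python
def energy_per_bit_pj(power_mw, rate_mbps):
    """Energy per bit in picojoules, from power in mW and data rate in Mbps."""
    power_w = power_mw * 1e-3      # mW -> W
    rate_bps = rate_mbps * 1e6     # Mbps -> bit/s
    return power_w / rate_bps * 1e12  # J/bit -> pJ/bit


# 0.19 mW at 100 Mbps, as quoted in the abstract.
print(f"{energy_per_bit_pj(0.19, 100):.1f} pJ/bit")  # 1.9 pJ/bit
```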
| Title | Low-Power CMOS Transceiver Circuits for 60GHz Band Millimeter-wave Impulse Radio |
| Author | *Ahmet Oncu, Minoru Fujishima (The University of Tokyo, Japan) |
| Keyword | Low-power, CMOS, 60GHz, impulse, radio |
| Abstract | In this paper we present an 8Gbps CMOS amplitude-shift-keying (ASK) modulator for the transmitter and a 19.2mW 2Gbps CMOS pulse receiver circuit for high-speed and low-power 60GHz millimeter-wave impulse radio. High-speed ASK modulation is obtained without using DC power by turning the shunt-connected short-channel NMOSFET switches on and off. The isolation is maximized using quarter-wavelength on-chip transmission lines. The isolation data-rate product of this work is 3.7 times higher than that of recently reported millimeter-wave ASK modulators. The proposed 60GHz pulse receiver circuit requires low power for high-speed data since it detects the envelope of the received pulses using a nonlinear detecting amplifier, and only the limiting amplifier processes the high-speed data. This receiver requires the lowest DC power among recently reported millimeter-wave receivers. |
| Title | An Inductor-less MPPT Design for Light Energy Harvesting Systems |
| Author | Hui Shao, *Chi-Ying Tsui, Wing-Hung Ki (The Hong Kong University of Science and Technology, Hong Kong) |
| Keyword | solar cell, power management, MPPT, energy harvesting |
| Abstract | An inductor-less maximum power point tracker was designed for light energy harvesting systems. We target systems that operate under different lighting environments, where the solar cell voltage may sometimes be low. A charge pump is used to convert the voltage to a higher value. At the same time, the control circuit tunes the charge pump switching frequency to track the system maximum output power point. The design was fabricated and measured to verify the system operation. |
| Title | A 1 GHz CMOS Comparator with Dynamic Offset Control Technique |
| Author | *Xiaolei Zhu (Keio University, Japan), Sanroku Tsukamoto (Fujitsu Laboratories Limited, Japan), Tadahiro Kuroda (Keio University, Japan) |
| Keyword | Offset cancel, Comparator, A/D converter |
| Abstract | A dynamic offset control technique that employs charge compensation by timing control is proposed for comparator design in scaled CMOS technology. The analysis has been verified by fabricating a 65 nm CMOS 1.2 V 1 GHz comparator that occupies 25 × 65 µm² and consumes 380 µW. The offset control circuits account for 21% of the area and 12% of the power consumption of the whole comparator chip. |
| Title | Circuit Design Using Stripe-Shaped PMELA TFTs on Glass |
| Author | *Keita Ikai, Jinmyoung Kim, Makoto Ikeda, Kunihiro Asada (University of Tokyo, Japan) |
| Keyword | TFT, PMELA, Design environment, Glass |
| Abstract | A design environment for stripe-shaped PMELA TFTs on glass has been developed and successfully tested. A cell library, including standard cells, a logic synthesis database, place-and-route rules, layout parasitic extraction rules, and transistor models, has been developed. Measurement results show that the digital circuits designed in this environment work correctly. They also show that the simulation environment is accurate enough for simulating digital circuits. |
| Title | Low Energy Level Converter Design for Sub-Vth Logics |
| Author | Hui Shao, *Chi-Ying Tsui (The Hong Kong University of Science and Technology, Hong Kong) |
| Keyword | low energy, sub-Vth logic, level converter |
| Abstract | A low energy consumption level converter (LC) is presented for logic voltage conversion from sub-Vth voltage to nominal high voltage. By employing a multi-stage architecture and implementing a unique circuit inside each stage, the proposed LC can reduce its energy consumption by almost 3 orders of magnitude while ensuring the robustness of its function. The LC was fabricated and measured to verify its operation and performance improvement. |
| Title | A Time-to-Digital Converter with Small Circuitry |
| Author | Kazuya Shimizu, *Masato Kaneta, HaiJun Lin, Haruo Kobayashi, Nobukazu Takai (Gunma University, Japan), Masao Hotta (Musashi Institute of Technology, Japan) |
| Keyword | Time-to-Digital Converter, Time Domain Analog Circuit, nano CMOS, Digital Assist Analog Technology, Time Measurement |
| Abstract | This paper describes a Time-to-Digital Converter (TDC) architecture with small CMOS circuitry as well as fine time resolution and better linearity compared to a conventional vernier delay line TDC. The TDC measures the time interval between two signals and is used in an all-digital PLL and a time-domain ADC. In the proposed TDC, the number of delay buffers is half that of the conventional TDC, which leads to small chip area and low power. The nonlinearity due to delay mismatch among buffers is also reduced, which we have demonstrated by MATLAB simulation. We have also designed and laid out its circuitry using a TSMC 0.18 um CMOS process, and the chip measurements show that its principal functions work as expected. |
| Title | A VDD Independent Temperature Sensor Circuit with Scaled CMOS Process |
| Author | *Hiroki Oshiyama, Toshihiro Matsuda, Kei-ichi Suzuki, Hideyuki Iwata (Toyama Prefectural University, Japan), Takashi Ohzone (Dawn Enterprise Co. Ltd., Japan) |
| Keyword | CMOS, temperature sensor, voltage reference |
| Abstract | A supply voltage (VDD) independent temperature sensor circuit implemented in a standard 90 nm CMOS process achieves predicted errors of about -1.0 to +2.0 °C (-0.6 to +0 °C) over the temperature range of -20 to +100 °C (+20 to +80 °C) for two-point calibration lines. This temperature sensor tolerates a change of VDD from 2.5 to 1.5 V well, which corresponds to a measurement error of 0.9 °C. |
| Title | A Current-mode DC-DC Converter using a Quadratic Slope Compensation Scheme |
| Author | *Chihiro Kawabata, Yasuhiro Sugimoto (Chuo University, Japan) |
| Keyword | DC-DC, converter, quadratic, slope, compensation |
| Abstract | A quadratic slope compensation scheme is proposed for a current-mode DC-DC converter to obtain stable frequency characteristics that do not depend on the input and output voltages. A 5 MHz, 500 mA buck DC-DC converter with input voltages ranging from 3.3 V to 2.5 V and output voltages ranging from 2.5 V to 0.5 V was designed and fabricated using a 0.35 um CMOS process to verify the effectiveness of the scheme. Little variation of the frequency characteristics at frequencies above 200 kHz was observed across the various input and output voltages. |
| Title | Ultra Low-Power ANSI S1.11 Filter Bank for Digital Hearing Aids |
| Author | *Yu-Ting Kuo, Tay-Jyi Lin, Yueh-Tai Li (National Chiao Tung University, Taiwan), Chou-Kun Lin (ITRI, STC, Taiwan), Chih-Wei Liu (National Chiao Tung University, Taiwan) |
| Keyword | hearing aid, filter bank, low power |
| Abstract | This paper presents an ANSI S1.11-compliant filter bank for digital hearing aids, whose power consumption is minimized through algorithmic, numerical and architectural optimizations. This filter bank has been implemented and fabricated using TSMC 0.13 µm CMOS technology. Transistor-level simulations show that the power dissipation is only 79 µW for 24 kHz, 18-band audio processing. |
| Title | An 11,424 Gate-Count Dynamic Optically Reconfigurable Gate Array with a Photodiode Memory Architecture |
| Author | Daisaku Seto, *Minoru Watanabe (Shizuoka University, Japan) |
| Keyword | ORGAs, FPGAs, optical configuration, multi-context devices |
| Abstract | The world's largest 11,424 gate-count dynamic optically reconfigurable gate array VLSI chip, which is based on the use of junction capacitance of photodiodes as configuration memory, has been fabricated. The size and process of the VLSI chip are, respectively, 96.04 mm2 and a 0.35 μm three-metal CMOS process technology. To clarify the availability of the VLSI, this paper shows an experimental result of
| Title | A Low-Power FPGA Based on Autonomous Fine-Grain Power-Gating |
| Author | *Shota Ishihara, Masanori Hariyama, Michitaka Kameyama (Tohoku University, Japan) |
| Keyword | FPGA, asynchronous architecture, power-gating, LEDR encoding, bit-serial architecture |
| Abstract | This is the first implementation of an FPGA based on autonomous fine-grain power-gating. To cut the power consumption of the clock network and detect the activity of the cell efficiently, asynchronous architecture is fully exploited. The proposed FPGA is fabricated in a 90nm CMOS process with dual threshold voltages. It is more efficient in power than the synchronous FPGA at less than 30% utilization. |
| Title | A 52-mW 8.29mm2 19-mode LDPC Decoder Chip for Mobile WiMAX Applications |
| Author | *Xin-Yu Shih, Cheng-Zhou Zhan, Cheng-Hung Lin, An-Yeu (Andy) Wu (National Taiwan University, Taiwan) |
| Keyword | LDPC, Mobile WiMAX, Multi-mode |
| Abstract | This paper presents an LDPC decoder chip supporting all 19 modes in Mobile WiMAX applications. An efficient IC design strategy is proposed to reduce decoding latency by 31.25% and to raise the hardware utilization ratio from 50% to 75%. In addition, we propose a new early termination scheme that can dynamically adjust the iteration number. The multi-mode chip, implemented in an 8.29 mm2 die area, operates at a measured maximum frequency of 83.3 MHz with only 52 mW power consumption. |
| Title | A Full-Synthesizable High-Precision Built-In Delay Time Measurement Circuit |
| Author | Ming-Chien Tsai, *Ching-Hwa Cheng (Feng Chia University, Taiwan) |
| Keyword | Built-in Delay Test, delay fault diagnosis, Vernier Delay Line |
| Abstract | Delay testing has become a major issue for manufacturing advanced Systems on a Chip. Automatic Test Equipment and scan techniques are usually applied in delay testing. However, the circuits under test have many circuit paths and dependent input patterns; it is hard to measure delay times accurately, especially when debugging small delay defects. We propose a Built-In Delay Measurement (BIDM) circuit that is modified from Vernier Delay Lines. All-digital BIDMs with small area overhead can be easily embedded within circuits under test. BIDMs can be used to record the data propagation delay times within circuit path segments, for delay testing, diagnosis, and calibration requirements internal to the chip. Our BIDM was implemented in a 32-bit error correction circuit on a chip using TSMC 0.18 um technology. Instrument measurements show that the BIDM chip correctly reported the CUT segment path delay times. The chip measurement results were a 95.83% match to the post-layout SPICE simulation values. This BIDM makes it possible to debug small delay defects in chips. |
| Title | A Dynamic Quality-Scalable H.264 Video Encoder Chip |
| Author | *Hsiu-Cheng Chang, Yao-Chang Yang, Jia-Wei Cheng (National Chung Cheng University, Taiwan), Ching-Lung Su (National Yunlin University of Science and Technology, Taiwan), Cheng-An Chien, Jiun-In Guo, Jinn-Shyan Wang (National Chung Cheng University, Taiwan) |
| Keyword | Quality-Scalable, H.264, Encoder, real-time |
| Abstract | This paper proposes a dynamic quality-scalable H.264 video encoder that comprises 470Kgates and 13.3Kbytes SRAM using 1P8M 0.13um CMOS technology. Exploiting parameterized algorithms for motion estimation and intra prediction, the proposed design can dynamically configure the encoding modes with the design trade-off between power consumption and video quality for various video encoding applications. It achieves real-time H.264 video encoding on CIF, D1, and HD720@30fps with 7mW-25mW, 27mW-162mW, and 122mW-183mW power dissipation in different quality modes. |
| Title | A High Performance LDPC Decoder for IEEE802.11n Standard |
| Author | *Wen Ji, Yuta Abe, Takeshi Ikenaga, Satoshi Goto (Waseda University, Japan) |
| Keyword | LDPC, message passing algorithm, partially-parallel LDPC decoder |
| Abstract | In this paper, we propose a partially-parallel irregular LDPC decoder for the IEEE 802.11n standard. The design is based on a novel sum-delta message passing schedule to achieve high throughput and low area cost. We further improve the design with a pipeline structure and parallel computation. The synthesis result in TSMC 0.18 μm CMOS technology demonstrates that for the (648,324) irregular LDPC code, our decoder achieves a 7.5X improvement in throughput, reaching 402 Mbps at a frequency of 200 MHz, with 11% area reduction. |
| Title | Design and Chip Implementation of the Ubiquitous Processor HCgorilla |
| Author | *Masa-aki Fukase, Kazunori Noda, Atsuko Yokoyama, Tomoaki Sato (Hirosaki University, Japan) |
| Keyword | Processor, Wave-pipeline, Ubiquitous |
| Abstract | HCgorilla is a hardware cryptography-embedded multimedia mobile processor that follows the parallelism of multicore and multiple pipelines dedicated for ubiquitous computing. Multiple pipelines are composed of media and cipher pipes. Each pipe is partly wave-pipelined to achieve power conscious high performance. Media pipes have user friendly functions due to Java compatibility. Random number addressing by cipher pipes is suited to cryptographic streaming. This paper describes the design and implementation of HCgorilla chips by using CMOS standard cell libraries |
| Title | An 8.69 Mvertices/s 278 Mpixels/s Tile-based 3D Graphics SoC HW/SW Development for Consumer Electronics |
| Author | *Liang-Bi Chen, Ruei-Ting Gu, Wei-Sheng Huang, Chien-Chou Wang, Wen-Chi Shiue, Tsung-Yu Ho, Yun-Nan Chang, Shen-Fu Hsiao, Chung-Nan Lee, Ing-Jer Huang (Department of Computer Science and Engineering, National Sun Yat-Sen University, Taiwan) |
| Keyword | 3D Graphics, SoC, Performance Tuning, Consumer Electronics, Tile-based |
| Abstract | This paper presents an 8.69 Mvertices/s, 278 Mpixels/s, 15.7 mm2 tile-based 3D graphics SoC HW/SW supporting OpenGL ES 1.0 running at 139 MHz. The SoC also includes embedded circuitry to monitor run time characteristics, detect bus protocol error/inefficiency, and capture bus traces at various abstraction levels with compression ratio up to 98%. |
| Title | A Multi-Task-Oriented Security Processing Architecture with Powerful Extensibility |
| Author | *Dan Cao, Jun Han, Xiaoyang Zeng, Shiting Lu (Fudan University, China) |
| Keyword | security processing, multi-core, SoC |
| Abstract | A multi-task-oriented security processing architecture is presented in this paper. This architecture contains a host microprocessor and multiple security processors (SPs). Each SP can integrate dedicated Crypto-Engines, which provide functional extensibility. Performance scalability and multi-task parallelism can be enhanced by increasing the number of SPs on the system bus. It is demonstrated that this architecture greatly improves system efficiency. A test chip is implemented based on SMIC 0.18 um standard CMOS technology, and its functionality is well verified. |
| Title | A Delay-Optimized Universal FPGA Routing Architecture |
| Author | *Fang Wu, Huowen Zhang, Lei Duan, Jinmei Lai, Yuan Wang, Jiarong Tong (Fudan University, China) |
| Keyword | Routing, Delay, GRB |
| Abstract | A universal FPGA routing architecture is presented, which ensures that every module in the FPGA, including CLBs and IOBs, has a uniform interconnect architecture and that the load of lines is equally distributed. As a result, this architecture is highly repeatable and the signal delay is predictable and regular. Furthermore, the realization of the Programmable Interconnect Point (PIP) and the BUFFER driver is also optimized, improving the signal delay by up to 5%. The test results of the example chip show the reasonableness of these ideas. |
| Title | Timing Variation-Aware Task Scheduling and Binding for MPSoC |
| Author | *Haneul Chon, Taewhan Kim (Seoul National University, Republic of Korea) |
| Keyword | Timing variation, task scheduling, binding |
| Abstract | This work addresses the new problem of timing variation-aware task scheduling and binding (TSB) for multiprocessor system-on-chip (MPSoC) architecture in the system-level design, where tasks have full flexibility of resource (i.e., processor) sharing to meet the design constraints. With the timing variation of processors' clock speed, it has been observed that considering the effects of resource sharing on the resulting performance yield computation is critically important for accurate design space exploration and evaluation in the system-level design. Unfortunately, previous statistical static timing analysis (SSTA) at the system level has never considered resource sharing in computing the performance yield, or has oversimplified it by employing gate-level SSTAs. In this work, we overcome those limitations by proposing an effective SSTA technique called TSB-SSTA, which schedules and binds tasks to resources in the presence of resource sharing. We also propose a timing variation-aware (TV) framework, called TSB-TV, tightly integrating TSB-SSTA. We have tested the effectiveness of our approach through experimentation with benchmarks, which showed an average of 56.1% improvement in performance yield over conventional methods. |
| Title | Flexible and Abstract Communication and Interconnect Modeling for MPSoC |
| Author | *Katalin Popovici (TIMA Laboratory, France), Ahmed Jerraya (CEA-LETI, Minatec, France) |
| Keyword | communication, exploration, modeling, NoC, H.264 |
| Abstract | Current multiprocessor systems on chip (MPSoC) architectures integrate a massive number of IPs that need to exchange data in complex and diverse synchronization ways. The key challenge when designing MPSoC is that the communication architecture needs to be decided at the beginning of the design, before all the details about mapping the application on the architecture are known. These early decisions cause two difficulties: how to select the best communication architecture and how to estimate the effect of mapping the application onto the communication resources. In this paper, we propose high level communication models that allow early accurate performance estimation of both communication architecture and communication mapping. We applied the proposed modeling methods to analyze the impact on performance in case of two network topologies and several communication mapping schemes for the H.264 Encoder application. |
| Title | Partial Order Method for Timed Simulation of System-Level MPSoC Designs |
| Author | *Eric Cheung, Harry Hsieh (University of California, Riverside, United States), Felice Balarin (Cadence Design Systems, United States) |
| Keyword | Partial Order Simulation, SystemC, MPSoC |
| Abstract | Current discrete event simulators incur heavy simulation overhead in switching between different components to simulate them in strictly chronological order. Therefore, timed simulation is significantly slower than un-timed simulation. By simply adding delays in the components and communication channels, our timed MPEG-2 decoder simulates more than 14 times slower than an un-timed simulation. In this paper, we propose a partial order method to speed up timed simulation by relaxing the order in which the components are simulated. With the partial order method, a component is not required to schedule a channel access if both behavioral and timing results of the access are known. The simulation switches less frequently, hence the simulation overhead is reduced. We show that the partial order method can be used in complex system-level simulation such as MPSoC implementations of the MPEG-2 decoder. In our experiments, the partial order method provides more than 10 times speedup over regular discrete event simulation for timed simulation. |
| Title | A UML-Based Approach for Heterogeneous IP Integration |
| Author | *Zhenxin Sun, Weng-Fai Wong (National University of Singapore, Singapore) |
| Keyword | System level design, UML |
| Abstract | With increasing availability of predefined IP (Intellectual Property) blocks and inexpensive microprocessors, embedded system designers are faced with more design choices than ever. On the other hand, there is a constant pressure on reducing the time to market. However, as the IP blocks are provided by different vendors, they differ in their interfaces. In order to improve design reuse, methods for combining heterogeneous IP blocks with incompatible protocols and I/Os are needed. In this paper, we propose an interface synthesis method that uses the UML notation to model the interfaces of predefined components and glue logic within the standard OCP-compliant environment. We built a code generator to produce the interface adapters from the UML models. We experimented with our approach using simple-bus and an MPEG-2 decoder as case studies. |
| Title | Statistical Modeling and Analysis of Chip-Level Leakage Power by Spectral Stochastic Method |
| Author | Ruijing Shen, Ning Mi, *Sheldon Tan (University of California at Riverside, United States), Yici Cai, Xianlong Hong (Tsinghua University, China) |
| Keyword | Leakage analysis, orthogonal polynomials, variational analysis |
| Abstract | In this paper, we present a novel statistical full-chip leakage power analysis method. The new method can provide a general framework to derive the full-chip leakage current or power in a closed form in terms of the variational parameters, such as the channel length, the gate oxide thickness, etc. It can accommodate various spatial correlations. The new method employs the orthogonal polynomials to represent the variational gate leakages in a closed form first, which is generated by a fast multi-dimensional Gaussian quadrature method. The total leakage currents then are computed by simply summing up the resulting orthogonal polynomials (their coefficients). Unlike many existing approaches, no grid-based partitioning and approximation are required. Instead, the spatial correlations are naturally handled by orthogonal decompositions. The proposed method is very efficient and it becomes linear when there exist strong spatial correlations. Experimental results show that the proposed method is about 10X faster than the recently proposed method (Chang, DAC'05) with consistently better accuracy. |
| Title | On the Futility of Statistical Power Optimization |
| Author | Jason Cong, Puneet Gupta, *John Lee (University of California, Los Angeles, United States) |
| Keyword | gate sizing, optimization, statistical power |
| Abstract | In response to the increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper we try to quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper-bound on the suboptimality that is incurred by using the deterministic optimum as an approximation for the statistical optimum. On average, the bound is 2.4% for a suite of benchmark circuits in a 45nm technology. We further give an intuitive explanation and show, by using solution rank orders, that the practical suboptimality gap is much lower. Therefore, the need for statistical power modeling for the purpose of optimization is questionable. |
| Title | Timing Driven Power Gating in High-Level Synthesis |
| Author | Shih-Hsu Huang, *Chun-Hua Cheng (Chung Yuan Christian University, Taiwan) |
| Keyword | Clock Skew Scheduling, High-Level Synthesis, Low Power Design, Resource Binding, Standby Leakage Minimization |
| Abstract | The power gating technique is useful in reducing standby leakage current, but it increases the gate delay. For a functional unit, its maximum allowable delay (for a target clock period) limits the smallest standby leakage current its power gating can achieve. In this paper, we point out: in the high-level synthesis of a non-zero clock skew circuit, the resource binding (including functional units and registers) has a large impact on the maximum allowable delays of functional units; as a result, different resource binding solutions have different standby leakage currents. Based on that observation, we present the first work to draw up the timing driven power gating in high-level synthesis. Given a target clock period and design constraints, our goal is to derive the minimum-standby-leakage-current resource binding solution. Benchmark data show: compared with the existing design flow, our approach can greatly reduce the standby leakage current without any overhead. |
| Title | Congestion-Aware Power Grid Optimization for 3D Circuits Using MIM and CMOS Decoupling Capacitors |
| Author | Pingqiang Zhou, Karthikk Sridharan, *Sachin S. Sapatnekar (ECE Dept, University of Minnesota, United States) |
| Keyword | 3D circuit, power grid, MIM decap, leakage power, congestion |
| Abstract | In three-dimensional (3D) chips, the amount of supply current per package pin is significantly more than in two-dimensional (2D) designs. Therefore, the power supply noise problem, already a major issue in 2D, is even more severe in 3D. CMOS decoupling capacitors (decaps) have been used effectively for controlling power grid noise in the past, but with technology scaling, they have grown increasingly leaky. As an alternative, metal-insulator-metal (MIM) decaps, with high capacitance densities and low leakage current densities, have been proposed. In this paper, we explore the tradeoffs between using MIM decaps and traditional CMOS decaps, and propose a congestion-aware 3D power supply network optimization algorithm to optimize this tradeoff. The algorithm applies a sequence-of-linear-programs based method to find the optimum tradeoff between MIM and CMOS decaps. Experimental results show that power grid noise can be more effectively optimized after the introduction of MIM decaps, with lower leakage power and little increase in the routing congestion, as compared to a solution using CMOS decaps only. |
| Title | Incremental and On-demand Random Walk for Iterative Power Distribution Network Analysis |
| Author | *Yiyu Shi, Wei Yao (Electrical Engineering Dept., University of California, Los Angeles, United States), Jinjun Xiong (IBM Thomas J. Watson Research Center, United States), Lei He (Electrical Engineering Dept., University of California, Los Angeles, United States) |
| Keyword | random walk, power grid, simulation, incremental analysis |
| Abstract | Power distribution networks (PDNs) are designed and analyzed iteratively. Random walk is among the most efficient methods for PDN analysis. We develop in this paper an incremental and on-demand random walk to reduce iterative analysis time. During each iteration, we map the design changes as positive or negative random walks for observed nodes. To update the PDN analysis result, we only need to apply these extra positive or negative walks, instead of doing all walks from scratch. We show that different execution orders for these walks do not affect accuracy but do affect the runtime because of the cancellation between positive and negative walks. Considering this cancellation effect, we optimize the walk order by solving a min-energy electromagnetic particles placement problem and, as a result, further reduce the runtime by about 8x compared to the worst order. Experiments show that, compared to random walk from scratch, our algorithm has similar accuracy but reduces the iterative analysis time by up to 18x for on-chip PDN sizing, and by up to 13x for package ball assignment with substrate routing. In addition, our incremental random walk has a linear time complexity with respect to the number of observed nodes and is more suitable for on-demand analysis, compared to random walk from scratch and its big warm-up cost. |
| Title | SAT-Controlled Redundancy Addition and Removal --- A Novel Circuit Restructuring Technique |
| Author | Chi-An Wu, Ting-Hao Lin, Shao-Lun Huang, Chung-Yang (Ric) Huang (National Taiwan University, Taiwan) |
| Keyword | Redundancy Addition and Removal, SAT, Logic Restructuring |
| Abstract | We propose a novel Boolean Satisfiability (SAT)-controlled redundancy addition and removal (RAR) algorithm to resolve the performance and quality problems of previous RAR approaches. With the introduction of modern SAT techniques, such as efficient Boolean constraint propagation (BCP), conflict-driven learning, and a flexible decision procedure, our RAR engine can identify 10x more alternative wires/gates while achieving a 70% reduction in runtime. |
| Title | On Improved Scheme for Digital Circuit Rewiring and Application on Further Improving FPGA Technology Mapping |
| Author | Fu Shing Chim, *Tak Kei Lam, Yu Liang Wu (The Chinese University of Hong Kong, Hong Kong) |
| Keyword | Rewiring, Graph-based, FPGA, Technology Mapping, VLSI CAD |
| Abstract | The digital circuit rewiring technique has been shown to be one of the most powerful logic transformation methods, able to further improve some already excellent results on many EDA problems. In this work, a new hybrid rewiring approach that combines the advantages of both ATPG-based and graph-based rewiring is proposed. Our hybrid approach utilizes structural characteristics and the ATPG technique to quickly identify alternative wires inside circuits. Experimental results suggest that our hybrid engine achieves about 50% of the alternative wire coverage of an ATPG-based rewiring engine with only 4% of the runtime. For problems that only require a good-enough and very quick solution, this new rewiring technique may serve as a useful alternative. |
| Title | Hybrid LZA: A Near Optimal Implementation of the Leading Zero Anticipator |
| Author | Amit Verma (National Institute of Technology, Rourkela, India), Ajay K. Verma, Philip Brisk, Paolo Ienne (Ecole Polytechnique Federale de Lausanne, Switzerland) |
| Keyword | leading zero anticipator, Error detection, Adder |
| Abstract | The Leading Zero Anticipator (LZA) is one of the main components used in floating point addition. It tends to be on the critical path, so it has attracted the attention of many researchers in the past. Most LZAs used today can be classified in two categories: exact and inexact. Inexact LZAs are normally preferred due to their shorter critical paths and reduced complexity; however, the inexact LZA requires an additional correction stage. In this paper we present a new LZA architecture that combines ideas taken from prior exact and inexact LZAs. Our new LZA improves the delay of floating point addition by 7-10% compared to state-of-the-art techniques and also reduces hardware area in most cases. We also establish theoretical lower bounds on the delay of an LZA and we show that our LZA is very close to these bounds. |
| Title | An Optimized Design of Serial-Parallel Finite Field Multiplier for GF(2^m) Based on All-One Polynomials |
| Author | Pramod Kumar Meher (Nanyang Technological University, Singapore), *Yajun Ha (National University of Singapore, Singapore), Chiou-Yng Lee (Lunghwa University of Science and Technology, Taiwan) |
| Keyword | finite field multiplication, VLSI, architecture optimization |
| Abstract | In this paper, we derive a recursive algorithm for finite field multiplication over GF(2^m) based on irreducible all-one-polynomials (AOP), where the modular reduction of degree is achieved by cyclic left-shift without any logic operations. A regular and localized bit level dependence graph (DG) is derived from the proposed algorithm and mapped into an array architecture, where the modular reduction is achieved by a serial-in parallel-out shift-register. The multiplier is optimized further to perform the accumulation of partial products by the T flip flops of the output register without XOR gates. It is interesting to note that the optimized structure consists of an array of (m+1) AND gates between an array of (m+1) D flip flops and an array of (m+1) T flip flops. The proposed structure therefore involves significantly less area and less computation time compared with the corresponding existing structures. |
| Title | (Invited Paper) Programming Multicore SIMD Architectures |
| Author | Reiji Suda (The University of Tokyo, Japan) |
| Abstract | A general introduction to SIMD and multi-threading programming. |
| Title | (Invited Paper) Designing and Optimizing Compute Kernels on Nvidia GPU's |
| Author | *Damir Jamsek (IBM Research Division, United States) |
| Keyword | GPU, NVIDIA |
| Abstract | The availability of high performance compute capability in NVIDIA GPUs has expanded their use in CAD environments. We will describe the basic compute models, including host/device programming models and device multi-thread programming models, as well as optimization and performance tuning techniques. |
| Title | (Invited Paper) Parallelizing Fundamental Algorithms such as Sorting on Multi-core Processors for EDA Acceleration |
| Author | Masato Edahiro (System IP Core Research Laboratories, NEC Corporation/Department of Computer Science, University of Tokyo, Japan) |
| Title | System-Level Cost Analysis and Design Exploration for Three-Dimensional Integrated Circuits (3D ICs) |
| Author | *Xiangyu Dong, Yuan Xie (Pennsylvania State University, United States) |
| Keyword | 3D Integration, Cost Analysis |
| Abstract | The three-dimensional integrated circuit (3D IC) is emerging as an attractive option for overcoming the barriers in interconnect scaling. The majority of the existing 3D IC research is focused on how to take advantage of the performance, power, smaller form-factor, and heterogeneous integration benefits offered by 3D integration. However, all such advantages ultimately have to translate into cost savings when a design strategy has to be decided: Is 3D integration a cost-effective way for a particular IC design? Consequently, system-level cost analysis at the early design stage is imperative to help decide whether 3D integration should be adopted. In this paper, we study the design estimation method for 3D ICs at the early design stage, propose a cost analysis model to study the cost implications for 3D ICs, and address the following cost-related problems in 3D IC design: (1) Do all the benefits of 3D IC design come with a much higher cost? (2) How to do 3D integration in a cost-effective way? (3) Are there any design options to compensate for the extra 3D bonding cost? A cost-driven 3D IC design flow is also proposed to guide the design space exploration for 3D ICs toward a cost-effective direction. |
| Title | Synthesis of Networks on Chips for 3D Systems on Chips |
| Author | *Srinivasan Murali, Ciprian Seiculescu (Ecole Polytechnique Federale de Lausanne, Switzerland), Luca Benini (University of Bologna, Italy), Giovanni De Micheli (Ecole Polytechnique Federale de Lausanne, Switzerland) |
| Keyword | Networks on Chips, 3D, topology, synthesis |
| Abstract | Three-dimensional stacking of silicon layers is emerging as a promising solution to handle the design complexity and heterogeneity of Systems on Chips (SoCs). Networks on Chips (NoCs) are necessary to efficiently handle the 3D interconnect complexity. Designing power-efficient NoCs for 3D SoCs that satisfy the application performance requirements while also satisfying the 3D technology constraints is a big challenge. In this work, we address this problem and present a synthesis approach for designing power-performance efficient 3D NoCs. We present methods to determine the best topology, compute paths and perform placement of the NoC components in each 3D layer. We perform experiments on varied, realistic SoC benchmarks to validate the methods and also perform a comparative study of the resulting 3D NoC designs with 3D optimized mesh topologies. The NoCs designed by our synthesis method result in large interconnect power reduction (average of 38%) and latency reduction (average of 25%) when compared to traditional NoC designs. |
| Title | An Application-centered Design Flow for Self Reconfigurable Systems Implementation |
| Author | *Fabio Cancare, Marco Domenico Santambrogio, Donatella Sciuto (Politecnico di Milano, Italy) |
| Keyword | Dynamic Reconfiguration, Reconfigurability, FPGA |
| Abstract | Up to now, every proposed methodology for implementing dynamic self reconfigurable systems has been architecture-centered. In most cases the system development process is time consuming and requires a very specific technical background. The aim of this work is to provide a fast brain-to-bit design flow whose goal is to simplify the dynamic reconfigurable system development process by shifting the designer's focus from the architecture point of view to the application point of view: designers will not need to possess Dynamic Reconfigurability expertise but only to be skilled in the application domain. |
| Title | System-Level Process Variability Compensation on Memory Organizations. On the Scalability of Multi-Mode Memories |
| Author | *Concepcion Sanz, Manuel Prieto, Jose Ignacio Gomez (Universidad Complutense de Madrid, Spain), Antonis Papanikolaou, Francky Catthoor (Inter-University Microelectronics Center, Belgium) |
| Keyword | Process variation, parametric yield, variability compensation |
| Abstract | Process variation and the dynamism of modern applications can degrade the expected performance of a system. Execution time can be severely affected by both factors, resulting in deadline violations and energy consumption overheads. Memory organizations, which account for a large part of the system energy and time budgets, are especially vulnerable to process variation. Configurable multi-mode memories are a promising technology to deal with these problems, but they also introduce new issues that need to be solved. Essentially, adding configuration capabilities to the memories comes with a cost, both in memory area and control complexity; hence, we need to evaluate the minimum amount of re-configurability needed to satisfy system constraints. In this paper, we analyze the scalability of configurable memories and highlight the relationship among mode allocation, memory mapping and data allocation. |
| Title | Accelerating Statistical Static Timing Analysis Using Graphics Processing Units |
| Author | Kanupriya Gulati, *Sunil P. Khatri (Texas A&M University, United States) |
| Keyword | Graphics Processing Units, Monte Carlo, Statistical Static Timing Analysis |
| Abstract | In this paper, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a Graphics Processing Unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive, but important step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that can be computed in parallel on a GPU suggests a natural fit for the problem of Monte Carlo based SSTA to the GPU platform. Our implementation performs multiple delay simulations at a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box-Muller transformations (also implemented on the GPU) is used for generating gate delay numbers from a normal distribution. The mean and standard deviation of the pin-to-output delay distributions for all inputs and for every gate, are obtained using a memory lookup, which benefits from the large memory bandwidth of the GPU. Threads which execute in parallel have no data/control dependencies on each other. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our approach is implemented on an NVIDIA GeForce GTX 8800 GPU card. Our results indicate that our approach can obtain an average speedup of about 260X as compared to a serial CPU implementation. With the recently announced quad 8800 GPU cards, we estimate that our approach would attain a speedup of over 785X. The correctness of the Monte Carlo based SSTA implemented on a GPU has been verified by comparing its results with a CPU based implementation. |
| Title | Trade-off Analysis between Timing Error Rate and Power Dissipation for Adaptive Speed Control with Timing Error Prediction |
| Author | *Hiroshi Fuketa, Masanori Hashimoto, Yukio Mitsuyama, Takao Onoye (Osaka University, Japan) |
| Keyword | adaptive speed control, timing error prediction, canary FF, low power design, subthreshold circuit |
| Abstract | Timing margin of a chip varies chip by chip due to manufacturing variability, and depends on operating environment and aging. Adaptive speed control with timing error prediction is a promising approach to mitigate the timing margin variation, whereas it inherently has a critical risk of timing error occurrence when a circuit is slowed down. This paper presents how to evaluate the relation between timing error rate and power dissipation in self-adaptive circuits with timing error prediction. The discussion is experimentally validated using a 32-bit ripple carry adder in subthreshold operation in a 90nm CMOS process. We show a trade-off between timing error rate and power dissipation, and reveal the dependency of the trade-off on design parameters. |
| Title | Statistical Analysis of On-Chip Power Grid Networks by Variational Extended Truncated Balanced Realization Method |
| Author | *Duo Li, Sheldon Tan (University of California at Riverside, United States), Gengsheng Chen, Xuan Zeng (Fudan University, China) |
| Keyword | Power grid, TBR, Reduction, Interconnect, Variation |
| Abstract | In this paper, we present a novel statistical analysis approach for large power grid network analysis under process variations. The new algorithm is very efficient and scalable for huge networks with a large number of variational variables. This approach, called varETBR for variational extended truncated balanced realization, is based on model order reduction techniques to reduce the circuit matrices before the variational simulation. It performs the parameterized reduction on the original system using variation-bearing subspaces. varETBR calculates variational response Gramians by Monte-Carlo based numerical integration considering both system and input source variations for generating the projection subspace. varETBR is very scalable for the number of variables and is flexible for different variational distributions and ranges as demonstrated in experimental results. After the reduction, Monte-Carlo based statistical simulation is performed on the reduced system and the statistical responses of the original system are obtained thereafter. Experimental results, on a number of IBM benchmark circuits [15] up to 1.6 million nodes, show that the varETBR can be 4500X faster than the Monte-Carlo method and is much more scalable than one of the recently proposed approaches. |
| Title | Bound-Based Identification of Timing-Violating Paths Under Variability |
| Author | *Lin Xie, Azadeh Davoodi (University of Wisconsin at Madison, United States) |
| Keyword | variability, statistical timing analysis, timing-violating path, violation probability |
| Abstract | We introduce a bound-based technique to identify the top M timing-violating paths in a circuit under variability. These are the paths with the highest violation probability (i.e., C_p) which is the probability that a path (i.e., p) violates the timing constraint. To compute C_p, we require the violation probabilities of the nodes (i.e., C_n) and edges (i.e., C_e) on the path. First, we show computing C_n and C_e of all the nodes and edges requires only two rounds of Statistical Static Timing Analysis and then for each node/edge we need one table lookup for probability calculation using a technique known as Pearson Curve. Given C_n and C_e, our major contribution is in computing upper and lower bounds for C_p of an arbitrary path segment. We show constant-time for incremental update of the bounds when extending a path segment to a longer one. These bounds can be used to exactly construct the top violating paths. If the goal is to find the single most-violating path, we show a bound-based formulation that can prune a large portion of circuit without losing optimality. In our simulations, we verify the correctness and accuracy of our bounds for individual paths. We also verify identification of selected paths using Monte Carlo simulation. We obtain near-optimal accuracy with extremely fast runtimes. |
| Title | Adaptive Techniques for Overcoming Performance Degradation due to Aging in Digital Circuits |
| Author | Sanjay Kumar, Chris Kim, *Sachin Sapatnekar (University of Minnesota, United States) |
| Keyword | Reliability, Adaptive Body Bias, NBTI, Leakage, Delay |
| Abstract | Negative Bias Temperature Instability (NBTI) in PMOS transistors has become a major reliability concern in present-day digital circuit design. Further, with the recent usage of Hf-based high-k dielectrics for gate leakage reduction, Positive Bias Temperature Instability (PBTI), the dual effect in NMOS transistors has also reached significant levels. Consequently, designers are required to build in substantial guardbands into their designs, leading to large area and power overheads, in order to guarantee reliable operation over the lifetime of a chip. We propose a guard-banding technique based on adaptive body bias (ABB) and adaptive supply voltage (ASV), to recover the performance of an aged circuit, and compare its merits over previous approaches. |
| Title | (Invited Paper) Introduction to Hardware-dependent Software Design |
| Author | Rainer Dömer (University of California at Irvine, United States), Andreas Gerstlauer (University of Texas at Austin, United States), Wolfgang Müller (University of Paderborn, Germany) |
| Abstract | Due to the rapidly increasing software content in embedded systems, Hardware-dependent Software (HdS) has become a critical topic in system design. In this talk, we will motivate the need for special attention to HdS in research and development and provide a brief introduction to the issues involved in the design of HdS. |
| Title | (Invited Paper) Using a Dataflow abstracted Virtual Prototype for Hardware-dependent Software Design |
| Author | Wolfgang Ecker, Stefan Heinen, *Michael Velten (Infineon Technologies AG, Germany) |
| Keyword | Abstraction, VP, TLM, HdS |
| Abstract | The complexity of Hardware-dependent Software (HdS) continuously grows stronger than chip complexity since more and more tasks are moved to software. Clearly, the pressure on the development of new methodologies for early validation of HdS increases as well. Existing methods must be continuously improved and new methods must be developed. This is exemplified with a state-of-the-art Transaction Level (TL) model used for firmware development of a productive wireless communication chip. By discussing the strengths and shortcomings of TL modeling we derive a set of requirements for a future modeling paradigm, which led to the new data flow abstraction approach presented in this paper. Experiments showed that we gain up to 10x performance improvement. |
| Title | (Invited Paper) Needs and Trends in Embedded Software Development for Consumer Electronics |
| Author | *Yasutaka Tsunakawa (Sony Corporation, Japan) |
| Keyword | Embedded software, Consumer electronics, Multi-Core, Many-Core |
| Abstract | As in other domains, the move to Many-Core cannot be avoided in consumer electronics. Multi-Core has already become the mainstream of system LSI, and the number of cores in a chip will continue to increase. Because of the advancement of required functions and the pressure to reduce power consumption, the move to Many-Core will continue without cessation. However, from the point of view of embedded software development, many unsolved problems lie like a huge cliff between current Multi-Core and Many-Core. Research organizations seem to focus their main efforts on establishing Many-Core technology, while tool vendors concentrate on offering solutions for current Multi-Core. Therefore, measures for the transition period, which will come several years from now, are still insufficient. In this article, I want to discuss the major problems that block the shift from current Multi-Core to Many-Core, from the viewpoint of consumer electronics. |
| Title | (Invited Paper) Hardware-dependent Software Synthesis for Many-Core Embedded Systems |
| Author | *Samar Abdi, Gunar Schirner, Ines Viskic, Hansu Cho, Yonghyun Hwang, Lochi Yu, Daniel Gajski (Center for Embedded Computer Systems, University of California, Irvine, United States) |
| Keyword | Embedded Software, Multicore Design, Software Synthesis |
| Abstract | This paper presents synthesis of Hardware Dependent Software (HdS) for multicore and many-core designs using Embedded System Environment (ESE). ESE is a tool set, developed at UC Irvine, for transaction level design of multicore embedded systems. HdS synthesis is a key component of the ESE backend design flow. We follow a design process that starts with an application model consisting of C processes communicating via abstract message passing channels. The application model is mapped to a platform net-list of SW and HW cores, buses and buffers. A high speed transaction level model (TLM) is generated to validate abstract communication between processes mapped to different cores. The TLM is further refined into a Pin-Cycle Accurate Model (PCAM) for board implementation. The PCAM includes C code for all the HdS layers including routing, packeting, synchronization and bus transfer. The generated HdS methods provide a library of application level services to the C processes on individual SW cores. Therefore, the application developer does not need to write low level HdS for board implementation. Synthesis results for a multi-core MP3 decoder design, using ESE, show that the HdS is generated in the order of seconds, compared to hours of manual coding. The quality of synthesized code is comparable to manually written code in terms of performance and code size. |
| Wednesday, January 21, 2009 |
| Title | (Keynote Address) Automated Synthesis and Verification of Embedded Systems: Wishful Thinking or Reality? |
| Author | Wolfgang Rosenstiel (Wilhelm-Schickard-Institute for Informatics, University of Tuebingen, Germany) |
| Abstract | More complex embedded hardware/software systems have to be developed with shorter design time and reduced cost. One solution for this problem is increasing design automation starting from higher levels of abstraction. Automatic synthesis and verification has been around in research for quite a while. This talk will show examples of state-of-the-art tools for system-level synthesis and verification of embedded systems and demonstrate their possibilities and limitations using some automotive applications. |
| Title | Computation and Data Transfer Co-Scheduling for Interconnection Bus Minimization |
| Author | Cathy qun Xu (University of Texas at Dallas, United States), *Chun Jason Xue, Bessie C Hu (City University of Hong Kong, Hong Kong), Edwin H.M. Sha (University of Texas at Dallas, United States) |
| Keyword | Scheduling, Interconnection network, clustered processors, data path synthesis |
| Abstract | High Instruction-Level-Parallelism in DSP and media applications demands highly clustered architecture. It is a challenge to design an efficient, flexible yet cost-saving interconnection network to satisfy the rapidly increasing inter-cluster data transfer needs. This paper presents a computation and data transfer co-scheduling technique to minimize the number of partially connected interconnection buses required for a given embedded application while minimizing its schedule length. Previous research in this area focused on scheduling computations to minimize the number of inter-cluster data transfers. The proposed co-scheduling technique not only schedules computations to reduce the number of inter-cluster data transfers, but also schedules inter-cluster data transfers to minimize the number of partially connected buses required for the inter-cluster connection network. Experimental results indicate that 52.3% fewer buses are required compared to the current best-known technique while achieving the same schedule length minimization. |
| Title | Prototyping Pipelined Applications on a Heterogeneous FPGA Multiprocessor Virtual Platform |
| Author | *Antonino Tumeo, Marco Branca, Lorenzo Camerini, Marco Ceriani (Politecnico di Milano, Italy), Matteo Monchiero (HP Labs, United States), Gianluca Palermo, Fabrizio Ferrandi, Donatella Sciuto (Politecnico di Milano, Italy) |
| Keyword | FPGA, Prototyping, Pipelining, Multiprocessor, Multimedia |
| Abstract | Multiprocessors on a chip are the reality of these days. The semiconductor industry has recognized this approach as the most efficient way to exploit chip resources, but the success of this paradigm heavily relies on the efficiency and widespread diffusion of parallel software. Among the many techniques to express the parallelism of applications, this paper focuses on pipelining, a technique well suited to data-intensive multimedia applications. We introduce an FPGA-based prototyping platform and a methodology for these applications. Our platform consists of a mix of standard and custom heterogeneous cores. We discuss several case studies, analyzing the interaction of the architecture and applications, and we show that multimedia and telecommunication applications with unbalanced pipeline stages can be easily deployed. Our framework eases the development cycle and enables the developers to focus directly on the problems posed by the programming model in the direction of the implementation of a production system. |
| Title | Variability-Aware Robust Design Space Exploration of Chip Multiprocessor Architectures |
| Author | *Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria (Politecnico di Milano, DEI, Italy) |
| Keyword | Design Space Exploration |
| Abstract | In the context of a design space exploration framework for supporting the platform-based design approach, we address the problem of robustness with respect to manufacturing process variations. First, we introduce response surface modeling techniques to enable an efficient evaluation of the statistical measures of execution time and energy consumption for each system configuration. We then introduce a robust design space exploration framework to address the impact of manufacturing process variations on the system-level metrics and consequently on the application-level constraints. We finally provide a comparison of our design space exploration technique with conventional approaches. |
| Title | Partial Conflict-Relieving Programmable Address Shuffler for Parallel Memories in Multi-Core Processor |
| Author | *Young-Su Kwon, Bon-Tae Koo, Nak-Woong Eum (Electronics and Telecommunications Research Institute, Republic of Korea) |
| Keyword | parallel memory, access conflict, multi-core, memory |
| Abstract | The advancement of process technology enables the integration of multiple cores featuring parallel processing. The requirement of extensive memory bandwidth puts a major performance bottleneck in multi-core architectures for media applications. While the parallel memory system is a viable solution to account for a large amount of memory transactions required by multiple cores, memory access conflicts caused by simultaneous accesses to an identical memory page by two or more cores limit the performance of multi-core architectures. We propose and evaluate the programmable memory address shuffler associated with the novel memory shuffling algorithm integrated in multi-core architectures with parallel memory system. The address shuffler efficiently translates the requested memory addresses into the shuffled addresses such that access conflicts diminish by analyzing the access pattern of the application. We demonstrate that the shuffling of sub-pages is represented by cyclic linked list which enables partial address shuffling with the minimal number of shuffling table entries. The programmable address shuffler reduces the amount of access conflicts by 83% for pitch-shifting audio decompression. |
| Title | HitME: Low Power Hit MEmory Buffer for Embedded Systems |
| Author | Andhi Janapsatya, *Sri Parameswaran, Aleksandar Ignjatovic (University of New South Wales, Australia) |
| Keyword | memory, low power, cache, loop cache |
| Abstract | In this paper, we present a novel HitME (Hit-MEmory) buffer to reduce the energy consumption of memory hierarchy in embedded processors. The HitME buffer is a small direct-mapped cache memory that is added as additional memory into existing cache memory hierarchies. The HitME buffer is loaded only when there is a hit on L1 cache. Otherwise, L1 cache is updated from the memory and the processor's memory request is served directly from the L1 cache. The strategy works due to the fact that 90% of memory accesses are only accessed once, and these often pollute the cache. Energy reduction is achieved by reducing the number of accesses to the L1 cache memory. Experimental results show that the use of HitME buffer will reduce the L1 cache accesses resulting in a reduction in the energy consumption of the memory hierarchy. This decrease in L1 cache accesses reduces the cache system energy consumption by an average of 60.9% when compared to traditional L1 cache memory architecture and an energy reduction of 6.4% when compared to filter cache architecture for 70nm cache technology. |
| Title | Signal Skew Aware Floorplanning and Bumper Signal Assignment Technique for Flip-Chip |
| Author | *Cheng-Yu Wang, Wai-Kei Mak (Department of Computer Science, National Tsing Hua University, Taiwan) |
| Keyword | Flip-chip, floorplanning, Bumper, pad, Assignment |
| Abstract | Flip-chip is a solution for designs requiring more I/O pins and higher speed. However, the higher speed demand also brings the issue of signal skew. In this paper, we propose a new 3-stage design layout methodology for flip-chip considering signal skew. Firstly, we produce an initial bumper signal assignment, and then solve the flip-chip floorplanning problem using a partitioning-based technique to spread the modules across the flip-chip as the distribution of its bumpers. With an anchoring and relocation strategy, we can effectively place I/O buffers at desirable locations. Finally, we further reduce signal skew and monotonic routing density by refining the bumper signal assignment. Experimental results show that the signal skew of traditional floorplanners ranges from 4% to 280% higher than ours, and the total wirelength of other floorplanners is as much as 100% higher than ours. Moreover, our signal refinement method can further decrease monotonic routing density by up to 8% and signal skew by up to 11%. |
| Title | A Novel Thermal Optimization Flow Using Incremental Floorplanning for 3D ICs |
| Author | Xin Li, *Yuchun Ma, Xianlong Hong (Tsinghua University, China) |
| Keyword | 3D ICs, incremental floorplanning, thermal |
| Abstract | Thermal issue is a critical challenge in 3D IC design. To eliminate hotspots, physical layouts are always adjusted by shifting or duplicating hot blocks. However, these modifications may degrade the packing area as well as interconnect distribution greatly. In this paper, we propose some novel thermal-aware incremental changes to optimize these multiple objectives including thermal issue in 3D ICs. Furthermore, to avoid random incremental modification, which may be inefficient and need long runtime to converge, here potential gain is modeled for each candidate incremental change. Based on the potential gain, a novel thermal optimization flow to intelligently choose the best incremental operation is presented. We distinguish the thermal-aware incremental changes in three different categories: migrating computation, growing unit and moving hotspot. Mixed integer linear programming (MILP) models are devised according to these different incremental changes. Experimental results show that migrating computation, growing unit and moving hotspot can reduce max on-chip temperature by 7%, 13% and 15% respectively on MCNC/GSRC benchmarks. Still, experimental results also show that the thermal optimization flow can reduce max on-chip temperature by 14% compared to an existing 3D floorplan tool CBA, and achieve better area and total wirelength improvement than individual operations do. |
| Title | Analog Placement with Common Centroid and 1-D Symmetry Constraints |
| Author | *Linfu Xiao, Evangeline Young (The Chinese University of Hong Kong, Hong Kong) |
| Keyword | analog placement, common centroid, symmetry |
| Abstract | In this paper, we present a placement method for analog circuits. We consider both common centroid and 1-D symmetry constraints, which are the two most common types of placement requirements in analog designs. The approach is based on a symmetric feasible condition on the sequence pair representation that can cover completely the set of all placements satisfying the common centroid and 1-D symmetry constraints. This condition is essential for a good searching process to solve the problem effectively. Symmetric placement is an important step to achieve matching of other electrical properties like delay and temperature variation. We have compared our results with those presented in the most recent previous works. Significant improvements can be obtained by our approach in both common centroid and 1-D symmetry placements, and we are the first to handle both constraints simultaneously. |
| Title | A Multilevel Analytical Placement for 3D ICs |
| Author | Jason Cong, *Guojie Luo (University of California, Los Angeles, United States) |
| Keyword | 3D IC, analytical placement, through-silicon via |
| Abstract | In this paper we propose a multilevel non-linear programming based 3D placement approach that minimizes a weighted sum of total wirelength and TS via number subject to area density constraints. This approach relaxes the discrete layer assignments so that they are continuous in the z-direction and the problem can be solved by an analytical global placer. A key idea is to do the overlap removal and device layer assignment simultaneously by adding a density penalty function for both area and TS via density constraints. Experimental results show that this analytical placer in a multilevel framework is effective to achieve trade-offs between wirelength and TS via number. Compared to the recently published transformation-based 3D placement method [1], we are able to achieve on average 12% shorter wirelength and 29% fewer TS vias compared to their cases with best wirelength; we are also able to achieve on average 20% shorter wirelength and 50% fewer TS vias compared to their cases with best TS via numbers. |
| Title | Exploring Adjacency in Floorplanning |
| Author | Jia Wang, *Hai Zhou (Northwestern University, United States) |
| Keyword | floorplanning, adjacency graph |
| Abstract | This paper describes a new floorplanning approach called Constrained Adjacency Graph (CAG) that helps explore adjacency in floorplans. CAG extends the previous adjacency graph approaches by adding explicit adjacency constraints to the graph edges. After sufficient and necessary conditions of CAG are developed based on dissected floorplans, CAG is extended to handle general floorplans in order to improve area without changing the adjacency relations dramatically. These characteristics are currently utilized in a randomized greedy improvement heuristic for wire length optimization. The results show that better floorplans are found with much less running time for problems with 100 to 300 modules in comparison to a simulated annealing floorplanner based on sequence pairs. |