A Decade of
Hardware/Software Codesign
IEEE Computer April 2003
By: Wayne Wolf
HW/SW Codesign Began
Early 1990s
- Described IC design problems
- Microprocessors: board-level systems
Non IC designers
- Integrated microprocessors with hw
components on a board
16-bit, 32-bit microprocessors already in
board-level designs
Eventually, chips large enough to have CPU
and other subsystems
Two new problems
- System design methodologies need to handle large predesigned CPUs
- Software needs to be treated as first-class component in chip design
Root of HW/SW Codesign
Researchers developed basic approaches to
design of embedded sw on CPUs
Hw/sw codesign increases predictability of
embedded system design
- Analysis methods that rapidly evaluate design methodologies
  - Performance
  - Power
  - Size goals
- Synthesis methods
Hw/sw codesign became major research
theme
First Steps
Foundational work
- SOS system, USC
- More research presented in CODES and CASHE
Hw/sw partitioning emerged as important first
step in creating models and algorithms
- Vulcan, Stanford
- Cosyma, Technical Univ. of Braunschweig
HW/SW Partitioning
Maps design onto architecture
System
- Single CPU
- One or more application-specific ICs
Early Designs
ASIC was accelerator rather than
coprocessor
- CPU’s execution unit didn’t dispatch ASIC
Assumed CPU and ASIC were to be on
separate chips
CPU and ASIC communicated by memory or
registers
Allocation
- Intensive tasks to ASIC
- Work less suited for hw implementation to CPU
Vulcan and Cosyma
C-like program
Vulcan
- Started with all functionality in hw, moved operations to CPU
- Reduce cost
Cosyma
- Started with all operations on CPU
- Moved to ASIC for performance
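The two starting points above can be sketched as greedy loops. This is a minimal illustration only, not the actual Vulcan or Cosyma algorithms: the `Op` record, the timing and cost numbers, and the greedy ordering rules are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    sw_time: int   # execution time on the CPU (hypothetical units)
    hw_time: int   # execution time on the ASIC
    hw_cost: int   # ASIC area cost if implemented in hardware

def total_time(ops, in_hw):
    # Single-threaded model: operation times simply add up.
    return sum(op.hw_time if op.name in in_hw else op.sw_time for op in ops)

def total_cost(ops, in_hw):
    return sum(op.hw_cost for op in ops if op.name in in_hw)

def partition_sw_first(ops, deadline):
    """Cosyma-like direction: start all-software, move ops to hw for performance."""
    in_hw = set()
    # Try the ops that save the most time per unit of hardware cost first.
    ranked = sorted(ops, key=lambda op: (op.sw_time - op.hw_time) / op.hw_cost,
                    reverse=True)
    for op in ranked:
        if total_time(ops, in_hw) <= deadline:
            break
        if op.sw_time > op.hw_time:
            in_hw.add(op.name)
    return in_hw

def partition_hw_first(ops, deadline):
    """Vulcan-like direction: start all-hardware, move ops to CPU to reduce cost."""
    in_hw = {op.name for op in ops}
    # Try to evict the most expensive hardware first, keeping the deadline.
    for op in sorted(ops, key=lambda op: op.hw_cost, reverse=True):
        trial = in_hw - {op.name}
        if total_time(ops, trial) <= deadline:
            in_hw = trial
    return in_hw

OPS = [
    Op("filter", sw_time=50, hw_time=5, hw_cost=30),
    Op("fft", sw_time=80, hw_time=10, hw_cost=60),
    Op("control", sw_time=10, hw_time=8, hw_cost=40),
]
DEADLINE = 70
```

With these made-up numbers the two directions end at different partitions that both meet the deadline: the software-first loop stops as soon as the deadline holds, while the hardware-first loop keeps removing hardware until the deadline would be violated.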
Performance Analysis
Three dimensions of analysis
- Hardware, software, system
Hardware easiest to analyze
Software more difficult
System most complex
HW Performance Analysis
Goal: determine max clock frequency
Had to be quick
Solution: use high-level synthesis techniques
- Estimate longest path through the logic
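The longest-path estimate can be sketched as a pass over a DAG of logic blocks in topological order. The block names and delays below are hypothetical, and the model ignores wiring delay and register setup time, so the resulting frequency is only an optimistic bound.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def critical_path_delay(delay_ns, preds):
    """Longest path through a combinational DAG.

    delay_ns: block name -> delay in ns
    preds:    block name -> list of blocks feeding it
    """
    arrival = {}
    # static_order() yields every block after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[p] for p in preds.get(node, ())), default=0.0)
        arrival[node] = latest_input + delay_ns[node]
    return max(arrival.values())

def max_clock_mhz(delay_ns, preds):
    # A period of T ns corresponds to a frequency of 1000 / T MHz.
    return 1000.0 / critical_path_delay(delay_ns, preds)

# Hypothetical datapath: mul -> add -> mux, with cmp also feeding mux.
DELAY = {"mul": 4.0, "add": 2.0, "mux": 1.0, "cmp": 1.5}
PREDS = {"mul": [], "cmp": [], "add": ["mul"], "mux": ["add", "cmp"]}
```

Here the critical path is mul, add, mux, for a total of 7 ns.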
SW Performance Analysis
Problem: worst-case execution time
Codesign community not aware of work
related to this aspect of performance
- Path-enumeration algorithm
Cosyma
- Ran test cases on target processor
Vulcan
- Analyzed program’s control dataflow
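A worst-case bound can be illustrated on a structured program: sequences add, branches take the slower arm, and loops multiply the body by a known iteration bound. This is a simplified structural sketch, not the analysis used by either tool; the tuple encoding and cycle counts are assumptions, and cache and pipeline effects are ignored.

```python
def wcet(node):
    """Worst-case cycle count of a structured program tree."""
    kind = node[0]
    if kind == "block":   # ("block", cycles)
        return node[1]
    if kind == "seq":     # ("seq", [children])
        return sum(wcet(child) for child in node[1])
    if kind == "if":      # ("if", condition_cycles, then_node, else_node)
        return node[1] + max(wcet(node[2]), wcet(node[3]))
    if kind == "loop":    # ("loop", max_iterations, body_node)
        return node[1] * wcet(node[2])
    raise ValueError(f"unknown node kind: {kind!r}")

# Hypothetical program: setup, an 8-iteration loop with a branch, teardown.
PROGRAM = ("seq", [
    ("block", 10),
    ("loop", 8, ("seq", [
        ("if", 2, ("block", 20), ("block", 5)),
        ("block", 3),
    ])),
    ("block", 4),
])
```

The bound is safe only if every loop bound is a true maximum; it can be pessimistic when the slow branch cannot actually be taken on every iteration.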
System Performance Analysis
CPU-ASIC system both multiprogramming
and multiprocessing system
- Multiple processes can interleave on CPU
- Multiple processes can run simultaneously
Cosyma and Vulcan
- Used simplified computational models
- Assumed implementation single-threaded
- Could determine total system execution time
Moving into the Mainstream
Hw/sw partitioning now practical design task
FPGAs emerged
- Combine programmable logic fabric with one or more CPUs
- CPUs implemented on the chip separately from programmable logic
- Perfect for cosynthesis
- Internal architecture exactly what hw/sw partitioning targets
New Problems
Identifying application that maps well onto
FPGA
Communication between CPU and ASIC in
CPU/ASIC architectures
- Delay can nullify performance gains in ASIC
- Physical, synchronization, etc
Need application that can move operations to
ASIC with small communication cost and
easily overlap useful work on CPU
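A back-of-the-envelope model shows how communication delay can cancel the accelerator's gain and how overlap helps. The function and its parameters are assumptions for illustration: all times are in the same arbitrary unit, and communication is assumed not to overlap with anything.

```python
def offload_speedup(t_sw, t_hw, t_comm, t_overlap=0.0):
    """Speedup from moving one kernel from the CPU to the ASIC.

    t_sw:      kernel time on the CPU
    t_hw:      kernel time on the ASIC
    t_comm:    CPU/ASIC transfer and synchronization time
    t_overlap: other CPU work that can run while the ASIC is busy
    """
    baseline = t_sw + t_overlap                 # CPU does everything serially
    offloaded = t_comm + max(t_hw, t_overlap)   # ASIC and CPU run side by side
    return baseline / offloaded

fast_link = offload_speedup(t_sw=100, t_hw=10, t_comm=5)
slow_link = offload_speedup(t_sw=100, t_hw=10, t_comm=95)
with_overlap = offload_speedup(t_sw=100, t_hw=40, t_comm=10, t_overlap=40)
```

With a cheap link the kernel speeds up several times over; with an expensive link the same 10x faster ASIC makes the system slower than the all-software version.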
Interface and Language
Must create interfaces for FPGA fabric and
CPU
- CPU: driver interfaces
- FPGA: system bus interface
Debate on best language for input to hw/sw
partitioning algorithms
- C versus Verilog
- Describe system in two languages
- When operations moved across partition, only small part of specification translated
System on Chip
No fixed architecture
- Variety of algorithms for analyzing and synthesizing general architectures important to SoC cosynthesis
Design is IP-oriented
- CPUs, predesigned special-purpose logic, FPGA fabric components
Cosynthesis: determining how best to use
large IP blocks without writing descriptions
directly in terms of these blocks
Design space large and irregular
SoC Design and Platform-based
Design
Platform = predesigned architecture used to
build systems for given range of applications
Platform also any architecture built from
CPUs, custom logic, interconnection hw
“Antithesis of codesign”
Codesign an ideal way to explore design
space and create suitable platform
architecture
Open Problems
Current embedded systems far more
sophisticated than 10 years ago, but
problems remain
- Computational models for jointly describing hw and sw systems
- System-level performance analysis
- Evaluation of algorithms for design-space exploration
  - Applying genetic algorithms and advanced methods of codesign
- Memory systems
Memory Systems
Profoundly influences system’s performance
and energy consumption
Cache models important in understanding
memory systems
- Good model predicts how changes to hw or sw will influence system performance and power
Cache synthesis helps choose cache
configuration for particular application
Alternatives to traditional caches
- Scratch-pad memory
- Sw-managed small memory, fast access to data, no fixed cache-management policy
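Choosing a cache configuration for one application can be sketched as simulating each candidate on an address trace and keeping the best. This is a minimal sketch assuming a direct-mapped cache, a synthetic trace, and miss count as the only metric; energy is not modeled, and the names are hypothetical.

```python
def miss_count(trace, num_lines, line_bytes):
    """Misses of a direct-mapped cache over a list of byte addresses."""
    lines = [None] * num_lines          # block number currently held per line
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        if lines[index] != block:
            lines[index] = block        # fill the line on a miss
            misses += 1
    return misses

def choose_cache(trace, configs):
    """Pick the (num_lines, line_bytes) pair with the fewest misses.

    Ties are broken in favor of the smaller total capacity.
    """
    return min(configs, key=lambda c: (miss_count(trace, *c), c[0] * c[1]))

# Hypothetical workload: two sequential passes over a 256-byte array of words.
TRACE = [i * 4 for i in range(64)] * 2
CONFIGS = [(4, 16), (8, 16), (16, 16), (8, 32)]
best = choose_cache(TRACE, CONFIGS)
```

On this trace the two configurations smaller than the array miss on every block in both passes, while the two 256-byte configurations miss only on the first pass, and the longer line halves those misses.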
New Modeling Languages
System-level design languages
Must consider computational models, system design methodologies, simulation, language acceptance, etc
New Classes of Architectures
VLIW processors
- Signal processing and networking
applications
New methods for performance analysis
and code generation with VLIW-based architectures
FPGAs need more study as a medium
for implementing embedded systems
Networks on Chips
Emerging problem: how to evaluate effect of NoCs on codesign
On one hand, NoC provides more structured system that should be easier to analyze
On the other hand, complex systems not trivial to analyze for performance or power,
and adding NoC to architecture makes analysis harder
Internet
Increasing connection of embedded
systems and the internet
- New workloads and mixtures of deadlines
Synthesis techniques for internet-
enabled machines necessary
Increasing demand for self-organizing systems
adaptable to environment changes, device failures, etc.
VLSI
VLSI systems more complex
- Codesign will expand to include SoC
systems
Require physically distributed computation to deal with fast response rates
- Automobiles
- All chips together must ensure joint satisfaction of application’s performance requirements