




Abstract

Packet classification is an important function in a router's line-card. Although many excellent solutions have been proposed in the past, implementing high-speed packet classification that reaches OC-192 and even OC-768 rates at reduced cost and low power consumption remains a challenge. In this paper, the HiCuts and HyperCuts algorithms are modified to make them more energy efficient and better suited to hardware acceleration. The hardware accelerator has been tested on large rulesets containing up to 25,000 rules, classifying up to 77 million packets per second (Mpps) on a Virtex5SX95T FPGA and 226 Mpps using 65 nm ASIC technology. Simulation results show that our hardware accelerator consumes up to 7,773 times less energy than the unmodified algorithms running on a StrongARM SA-1100 processor when classifying packets. Simulation results also indicate that an ASIC implementation of our hardware accelerator can reach OC-768 throughput with less power consumption than TCAM solutions.
1. Introduction

Packet classification is increasingly used by networking devices such as routers, switches and firewalls to implement policies such as the blocking of unwanted Internet traffic. It is also used for services such as giving priority to Voice over IP or IP-TV packets, and for billing traffic based on network usage. As line rates rise to OC-192 and move towards OC-768, corresponding to 31.25 million packets per second (Mpps) and 125 Mpps in the worst case when minimum-sized packets (40 bytes each) arrive back to back (a worked check of these figures follows the TCAM discussion below), great pressure is placed on the classifier in a router's line-card. Previous studies emphasize increasing throughput while reducing implementation cost, but seldom address power consumption. In fact, controlling the power drawn by the classifier is equally important when designing the line-card, given its tight space budget and limited power supply. Due to their large integration scale and high speed, network processors deployed in typical network equipment can consume more power than any other component in the equipment (e.g., the Intel IXP2800 has a peak power consumption of 30 W). As a key component attached to the network processor, the classifier must therefore be designed to be power efficient. Analysis in [1] demonstrated that up to 50% of ISP maintenance costs are power related, including the electricity consumed by routers and the corresponding cooling systems. Research by Gupta and Singh [2] showed that in 2000 the energy used by networking devices in the U.S. approached the yearly output of a nuclear reactor unit. When designing a classifier, a multi-dimensional metric should therefore be considered that includes power consumption alongside throughput and cost.

Software approaches, for example the packet classification algorithms in [3-11], have the advantage of reduced cost but fail to operate at very high speed due to their low throughput and the nondeterministic number of clock cycles needed to execute a packet lookup. Our recent research in [12] shows that when packet classification algorithms are implemented on devices such as the StrongARM SA-1100 running at 200 MHz, the maximum achievable throughput of even the best performing algorithms is only around 0.5 Mpps. For this reason, hardware methods for implementing packet classification are essential to prevent it from becoming a bottleneck.

The most popular hardware implementation at present employs Ternary Content Addressable Memory (TCAM), which can match rules in O(1) clock cycles by carrying out parallel comparisons on all stored rules in one clock cycle and by pipelining. State-of-the-art devices such as the Cypress Ayama 10000 Network Search Engine [13] can perform 133 million 144-bit search-key lookups per second. This high lookup rate, however, comes at the cost of consuming between 4.86 and 19.14 W depending on the TCAM size. Besides high power consumption, another drawback of TCAM is its poor storage efficiency for rulesets containing ranges. Research on real-world databases in [14] showed that TCAM storage efficiency ranged between 16% and 53%, with an average of 34%. TCAMs also occupy a large die area, with one bit requiring 10-12 transistors compared to the 4-6 transistors per bit of SRAM. The complexity of TCAMs also prevents them from running at the high clock speeds obtainable with SRAM. A search engine implemented with this approach requires multiple chips, including a host ASIC, TCAMs and the corresponding SRAMs.
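As a quick check of the worst-case packet rates quoted at the start of this introduction (approximating OC-192 as 10 Gb/s and OC-768 as 40 Gb/s, and taking 40-byte minimum-sized packets):

    OC-192: (10 x 10^9 b/s) / (40 bytes x 8 bits/byte) = 31.25 Mpps
    OC-768: (40 x 10^9 b/s) / (40 bytes x 8 bits/byte) = 125 Mpps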
Rule   Field0     Field1     Field2    Field3     Field4
R0     128-240    15-15      40-40     180-180    120-
R1     90-100     0-80       0-200     190-200    130-
R2     130-255    60-140     0-60      180-180    133-
R3     90-92      200-200    40-40     180-180    136-
R4     130-255    60-140     40-40     190-200    60-
R5     140-150    60-140     0-255     0-255      140-
R6     160-165    80-80      0-255     0-255      0-
R7     48-50      0-80       40-40     0-255      0-
R8     26-36      50-50      40-40     180-180    30-
R9     40-40      40-70      40-40     0-255      0-

Table 1: Ruleset containing 10 rules with 5 fields.
In this paper we present a hardware-supported single-chip solution which can be implemented on an FPGA utilizing the on-chip block RAM, or as an ASIC using on-chip SRAM. The hardware accelerator achieves high lookup rates by using multiple memory blocks in parallel, taking full advantage of the flexibility of an FPGA's RAM blocks and the design flexibility of ASICs. High storage efficiency of rulesets is achieved compared with the memory requirements of typical packet classification algorithms [12]. Compared with software solutions, the hardware accelerator significantly increases search speed while greatly reducing power consumption. The hardware accelerator is therefore well suited to being attached to a network processor, either as an on-chip execution unit or as an external component acting as a high-speed classifier.

The rest of this paper is laid out as follows. Section 2 briefly explains the two software algorithms HiCuts and HyperCuts. Section 3 details the changes made to the algorithms to make them better suited to hardware acceleration and more energy efficient when building the search structure. Section 4 presents the detailed implementation of the hardware accelerator, while Section 5 gives the simulation results and clarifies the parameters used to obtain them. Section 6 concludes the paper.
2. Packet Classification Algorithms
In order to better understand our hardware-oriented modifications to the original software algorithms, this section explains two typical packet classification algorithms, HiCuts and HyperCuts.
HiCuts by Gupta and McKeown [5] is a decision-tree based algorithm which allows incremental updates to a ruleset. It takes a geometric view of packet classification, treating each rule in a ruleset as a hypercube in the hyperspace defined by the F fields of a packet's header. The algorithm constructs the decision tree by recursively cutting the hyperspace one dimension at a time into sub-regions, each containing the rules whose hypercubes overlap it. Each cut along a dimension increases the number of sub-regions, with each sub-region containing fewer rules. The algorithm keeps cutting into the hyperspace until no sub-region exceeds a predetermined number of rules called binth. Figure 1 shows a decision tree built from the ruleset in Table 1.

The more cuts performed at an internal node (represented by an ellipse in Figure 1), the fatter and shorter the decision tree. Too many cuts, however, result in an unacceptable amount of memory being needed to store the decision tree. For that reason the number of cuts np which can be performed on a dimension at an internal node i is limited using a predefined variable known as spfac. Each cut creates child nodes, with the number of cuts starting at 2 and doubling as long as the following condition remains satisfied:

    spfac * (number of rules at i) >= (sum of rules at each child of i) + np    (1)

The algorithm has many heuristics for choosing which dimension to cut. The method we chose is to record the largest number of rules contained in any child after cutting each dimension, and to pick the dimension which returns the smallest such number. Each time a packet arrives, the tree is traversed from the root node until a leaf node (represented by a rectangle in Figure 1) is reached; a leaf stores a small number of rules limited by the binth value. Once a leaf node is reached, a short linear search of the rules contained within it finds the matching rule.

Figure 2 shows the cuts performed by the decision tree in Figure 1. Field 0 is selected to cut the root node into 4, resulting in 4 child nodes of which 1 exceeds the binth value. The node exceeding binth is split in 2 using Field 4, with both child nodes then meeting the predetermined binth value.
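To make the construction concrete, the following is a minimal C sketch of HiCuts build and lookup under the heuristics just described: equal-width cuts, the "smallest fullest child" dimension choice, and cut doubling governed by equation (1). The two-field toy ruleset, the fixed leaf capacity, and the fallback that forces a leaf when cutting stops separating rules are our simplifications for illustration, not the paper's implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define F     2   /* header fields (the paper uses 5) */
#define BINTH 2   /* max rules stored in a leaf       */
#define SPFAC 4   /* space factor used in eq. (1)     */

typedef struct { unsigned lo[F], hi[F]; int id; } Rule;

typedef struct Node {
    int dim, ncuts;               /* ncuts == 0 marks a leaf          */
    unsigned lo[F], hi[F];        /* region of space this node covers */
    struct Node **kid;
    const Rule *rules[16];        /* leaf contents (toy-sized array)  */
    int n;
} Node;

static int overlap(const Rule *r, const unsigned lo[F], const unsigned hi[F])
{
    for (int f = 0; f < F; f++)
        if (r->hi[f] < lo[f] || r->lo[f] > hi[f]) return 0;
    return 1;
}

/* Cut v's region into nc equal slices along d; return the largest per-child
 * rule count (dimension heuristic) and the total over children (eq. (1)). */
static int worst_child(const Node *v, const Rule *rs, int nr, int d, int nc, int *total)
{
    unsigned width = (v->hi[d] - v->lo[d]) / nc + 1;
    int worst = 0;
    *total = 0;
    for (int c = 0; c < nc; c++) {
        unsigned lo[F], hi[F];
        memcpy(lo, v->lo, sizeof lo);
        memcpy(hi, v->hi, sizeof hi);
        lo[d] = v->lo[d] + (unsigned)c * width;
        hi[d] = (c == nc - 1) ? v->hi[d] : lo[d] + width - 1;
        int cnt = 0;
        for (int i = 0; i < nr; i++)
            if (overlap(&rs[i], lo, hi)) cnt++;
        if (cnt > worst) worst = cnt;
        *total += cnt;
    }
    return worst;
}

static Node *build(const Rule *rs, int nr, const unsigned lo[F], const unsigned hi[F])
{
    Node *v = calloc(1, sizeof *v);
    memcpy(v->lo, lo, sizeof v->lo);
    memcpy(v->hi, hi, sizeof v->hi);
    for (int i = 0; i < nr; i++)        /* collect rules overlapping this region */
        if (overlap(&rs[i], lo, hi)) v->rules[v->n++] = &rs[i];
    int t, best = 0, bw = v->n + 1;     /* dimension with the emptiest fullest child */
    for (int d = 0; d < F; d++) {
        int w = worst_child(v, rs, nr, d, 2, &t);
        if (w < bw) { bw = w; best = d; }
    }
    if (v->n <= BINTH || bw == v->n)    /* leaf: small enough, or cuts no longer help */
        return v;
    v->dim = best;
    v->ncuts = 2;                       /* double the cuts while eq. (1) holds */
    while (v->ncuts < 16) {
        worst_child(v, rs, nr, best, 2 * v->ncuts, &t);
        if (t + 2 * v->ncuts > SPFAC * v->n) break;
        v->ncuts *= 2;
    }
    v->kid = malloc((size_t)v->ncuts * sizeof *v->kid);
    unsigned width = (hi[best] - lo[best]) / v->ncuts + 1;
    for (int c = 0; c < v->ncuts; c++) {
        unsigned clo[F], chi[F];
        memcpy(clo, lo, sizeof clo);
        memcpy(chi, hi, sizeof chi);
        clo[best] = lo[best] + (unsigned)c * width;
        chi[best] = (c == v->ncuts - 1) ? hi[best] : clo[best] + width - 1;
        v->kid[c] = build(rs, nr, clo, chi);
    }
    v->n = 0;                           /* internal nodes keep no rules */
    return v;
}

static const Rule *lookup(const Node *v, const unsigned key[F])
{
    while (v->ncuts) {                  /* walk internal nodes to a leaf */
        unsigned width = (v->hi[v->dim] - v->lo[v->dim]) / v->ncuts + 1;
        v = v->kid[(key[v->dim] - v->lo[v->dim]) / width];
    }
    for (int i = 0; i < v->n; i++)      /* short linear search, bounded by binth */
        if (overlap(v->rules[i], key, key)) return v->rules[i];
    return NULL;
}

int main(void)
{
    Rule rs[] = { {{0,0},{99,255},0},      {{100,0},{199,127},1},
                  {{100,128},{255,255},2}, {{0,0},{255,255},3} };
    unsigned lo[F] = {0, 0}, hi[F] = {255, 255};
    Node *root = build(rs, 4, lo, hi);  /* tree memory is leaked: sketch only */
    unsigned pkt[F] = {150, 200};
    const Rule *m = lookup(root, pkt);
    printf("packet (150,200) matched rule %d\n", m ? m->id : -1);
    return 0;
}
```

Running it, the packet (150, 200) walks one cut at the root, one cut at an internal node, and a two-rule linear search before reporting rule 2, mirroring the traversal described above.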
Figure 1: HiCuts Decision tree (binth 3).
Figure 2: Cuts Performed by HiCuts and HyperCuts.
No.      Software                Hardware
Rules    HiCuts      HyperCuts   HiCuts     HyperCuts
60       2,200       1,745       3,000      3,
150      6,200       5,382       6,000      5,
500      28,776      13,372      24,000     15,
1000     43,020      25,592      35,400     28,
1600     79,444      43,298      69,600     46,
2191     110,704     56,161      97,200     61,

Table 2: Memory needed for the search structure and ruleset (bytes), spfac=4, speed=1.
3. Modifications to the Algorithms

Another modification made to the algorithms is to store the actual rule in the leaf node rather than a pointer. Testing on the many rulesets created using ClassBench showed this to cause only a small increase in memory consumption in exchange for a large increase in throughput, as data is presented to the hardware accelerator one clock cycle earlier. Each saved rule uses 160 bits of memory. The destination and source ports use 32 bits each, with 16 bits for each of the min and max range values. The source and destination IP addresses use 35 bits each, with 32 bits storing the address and 3 bits the mask; the storage requirement for the mask has been reduced from 6 to 3 bits by encoding the mask and storing 3 bits of the encoded mask value in the 3 least significant bits of the IP address when the mask is 0-27. The protocol number uses 9 bits, with 8 bits storing the number and 1 bit the mask. The number of the stored rule uses 16 bits. Each 4800-bit memory word can therefore hold up to 30 rules, and it is possible to search these rules in parallel in one clock cycle (a sketch of this bit layout follows this discussion).

In order to reduce memory consumption, the nodes are rearranged after the search structure has been built: all the internal nodes are stored first, followed by the leaf nodes. This means the leaf nodes can be saved contiguously in the search structure, improving the storage efficiency of rules. To locate a leaf node, the number of the memory word where it is located and the starting position of the leaf node within that memory word are needed.

Both the HiCuts and HyperCuts algorithms use the parameters spfac and binth to trade speed against memory consumption. A third parameter we use to trade speed against memory consumption is called speed. When the speed parameter is set to 0, the leaf nodes are stored contiguously. The search structure is then saved in the most memory-efficient way possible, but this does not give the highest possible throughput, as the number of clock cycles needed to classify a packet will be:

    cycles = floor((pos + z) / 30) + 1 + x    (5)
The number of internal nodes traversed to reach the leaf node is represented by x, the starting position of the leaf node within its memory word by pos, and the position of the matching rule within the leaf node by z. If the speed parameter is set to 1, a leaf node is only stored at a starting position greater than 0 within a memory word if:

    RulesStoredInLeaf + pos <= 30    (6)

This may reduce storage efficiency, as the leaf nodes may no longer be stored contiguously.
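The sketch below packs one rule into its 160-bit slot of a 4800-bit word, using the field widths listed above (the stated widths total 159 bits, leaving one bit spare). The field order and the bit-writer helper are our assumptions; the paper's exact scheme for folding encoded prefix lengths 0-27 into the address LSBs is only summarized in a comment, not implemented.

```c
#include <stdint.h>
#include <stdio.h>

enum { RULE_BITS = 160, RULES_PER_WORD = 30, WORD_BITS = 4800 };

typedef struct {
    uint16_t sport_lo, sport_hi;   /* source port range, 16+16 bits      */
    uint16_t dport_lo, dport_hi;   /* destination port range, 16+16 bits */
    uint32_t saddr, daddr;         /* IP addresses, 32 bits each         */
    uint8_t  scode, dcode;         /* 3-bit encoded masks; for prefix
                                      lengths 0-27 the paper stores part
                                      of the code in the masked
                                      (don't-care) address LSBs          */
    uint8_t  proto, proto_wild;    /* 8-bit protocol + 1-bit wildcard    */
    uint16_t rule_no;              /* stored rule number, 16 bits        */
} Rule;

/* Append nbits of val, MSB first, at bit offset *pos in buf. */
static void put_bits(uint8_t *buf, unsigned *pos, uint32_t val, unsigned nbits)
{
    while (nbits--) {
        if ((val >> nbits) & 1u)
            buf[*pos >> 3] |= (uint8_t)(0x80u >> (*pos & 7));
        (*pos)++;
    }
}

/* Pack rule r into slot 'slot' (0..29) of a 4800-bit (600-byte) word. */
static void pack_rule(uint8_t word[WORD_BITS / 8], unsigned slot, const Rule *r)
{
    unsigned pos = slot * RULE_BITS;
    put_bits(word, &pos, r->sport_lo, 16); put_bits(word, &pos, r->sport_hi, 16);
    put_bits(word, &pos, r->dport_lo, 16); put_bits(word, &pos, r->dport_hi, 16);
    put_bits(word, &pos, r->saddr, 32);    put_bits(word, &pos, r->scode, 3);
    put_bits(word, &pos, r->daddr, 32);    put_bits(word, &pos, r->dcode, 3);
    put_bits(word, &pos, r->proto, 8);     put_bits(word, &pos, r->proto_wild, 1);
    put_bits(word, &pos, r->rule_no, 16);  /* 159 bits used; 1 bit padding */
}

int main(void)
{
    uint8_t word[WORD_BITS / 8] = {0};
    Rule r = { 0, 65535, 80, 80, 0xC0A80000u, 0x0A000001u, 2, 4, 6, 0, 42 };
    pack_rule(word, 0, &r);                /* rule 42 occupies bits 0..159 */
    printf("packed rule %u into slot 0\n", r.rule_no);
    return 0;
}
```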
No.      Software                Hardware
Rules    HiCuts      HyperCuts   HiCuts      HyperCuts
60       1.32E-02    9.58E-03    9.94E-03    4.65E-
150      7.44E-02    1.00E-01    3.94E-02    8.81E-
500      7.61E-01    2.44E-01    2.89E-01    4.20E-
1000     2.47E+00    6.66E-01    1.00E+00    7.30E-
1600     7.46E+00    1.65E+00    2.05E+00    1.34E+
2191     3.79E+01    2.17E+00    3.20E+00    1.84E+

Table 3: Energy used to build the search structure (Joules), spfac=4, speed=1.
acl1_rules    HiCuts                 HyperCuts
              memory      cycles     memory      cycles
300           7,800       2          7,800       2
1,200         30,600      2          30,600      2
2,500         63,600      2          63,600      2
5,000         127,200     4          127,200     4
10,000        254,400     4          254,400     4
15,000        384,000     4          384,000     4
20,000        471,600     4          468,600     5
24,920        589,200     5          589,200     5

fw1_rules     memory      cycles     memory      cycles
300           7,200       2          7,200       2
1,200         28,200      2          28,200      2
2,500         59,400      2          59,400      2
5,000         142,200     3          142,200     3
10,000        1,086,600   3          657,600     4
15,000        1,244,400   4          1,226,400   4
20,000        1,931,400   6          2,964,600   6
23,087        3,311,400   8          8,256,000   6

ipc1_rules    memory      cycles     memory      cycles
300           7,200       2          7,200       2
1,200         27,000      2          28,200      2
2,500         64,800      3          61,800      3
5,000         144,000     3          144,000     3
10,000        292,800     3          292,800     3
15,000        379,800     4          379,800     4
20,000        491,400     5          491,400     5
24,274        585,000     5          585,000     5

Table 4: Memory consumption (bytes) and worst-case number of clock cycles needed to classify a packet for synthetic filter sets generated using ClassBench, spfac=4, speed=1.
The reduced storage efficiency does, however, lead to an increase in throughput, as the number of cycles needed to classify a packet becomes:

    cycles = floor(z / 30) + 1 + x    (7)

The hardware accelerator has been designed to handle up to 1024 memory words, each 4800 bits wide, meaning that search structures of up to 614,400 bytes can be saved in memory. This could easily be doubled to 2048 memory words and implemented on devices such as the Virtex XC5VLX330T, which can store up to 1,458 Kbytes. The 4800-bit memory word is spread over 134 memory blocks. Each memory word can hold either 1 internal node or up to 30 rules, and a memory word can be accessed by the classifier in 1 clock cycle through a 4800-bit wide data bus.
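The two cycle-count formulas fit in a couple of lines of C. The speed=0 variant below is our reconstruction from the definitions of x, pos and z (the surrounding equation numbering suggests it was equation (5)); the speed=1 variant is equation (7):

```c
#include <stdio.h>

#define RULES_PER_WORD 30

/* speed = 0: leaves stored contiguously, so a leaf may start mid-word at pos */
static unsigned cycles_speed0(unsigned x, unsigned pos, unsigned z)
{
    return x + 1 + (pos + z) / RULES_PER_WORD;       /* eq. (5), reconstructed */
}

/* speed = 1: a leaf starts mid-word only when it fits there, per eq. (6),
 * so the start offset no longer costs extra memory-word reads */
static unsigned cycles_speed1(unsigned x, unsigned z)
{
    return x + 1 + z / RULES_PER_WORD;               /* eq. (7) */
}

int main(void)
{
    /* e.g. 3 internal nodes, leaf starting at offset 20, match at rule 15 */
    printf("speed=0: %u cycles\n", cycles_speed0(3, 20, 15)); /* 3+1+1 = 5 */
    printf("speed=1: %u cycles\n", cycles_speed1(3, 15));     /* 3+1+0 = 4 */
    return 0;
}
```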
Figure 4: Hardware Accelerator Architecture.
The results in Tables 2, 3, 5, 6, 7 and 8 were generated using acl1 rulesets and traces obtained from [16]. Table 2 shows the memory used by the search structure built for the hardware accelerator using the modified HiCuts and HyperCuts algorithms, as well as the memory used by software implementations of the original algorithms. Table 3 compares the energy used to build the search structures for the hardware accelerator with that of the search structure implemented in software. The approach adopted in [12] was used to obtain the energy figures, with the algorithms simulated running on a StrongARM SA-1100 using Sim-Panalyzer [17].

It can be seen that the storage efficiency of the search structure for the hardware accelerator compares well with that of the software approach, with HiCuts outperforming its software counterpart and HyperCuts showing only a slight increase over its software counterpart. In terms of the energy used building the search structure, the modified algorithms show a large improvement as the rulesets increase in size; the modified HiCuts algorithm uses 11.84 times less energy than the unmodified software version when building the search structure for 2,191 rules.

Table 4 shows the memory consumption for acl1, fw1 and ipc1 rulesets generated using ClassBench. The modified algorithms scale well to large rulesets, both in memory consumption and in the worst-case number of clock cycles needed to classify a packet. The fw1 rulesets consume more memory than the acl1 and ipc1 rulesets because they contain many wildcard rules. The fw1 rulesets with over 10,000 rules can still be stored in the FPGA's block RAM by reducing spfac, trading memory against throughput.
4. Hardware Implementation Architecture
The hardware accelerator has been designed to traverse an internal node of the decision tree in 1 clock cycle. It can also perform a parallel comparison of up to 30 rules contained within a leaf node in 1 clock cycle. This is possible because the hardware accelerator can access a 4800-bit memory word every clock cycle. By storing the decision tree's root node information in a register separate from main memory, it is possible to traverse the root node for an incoming packet while searching a leaf node for the previous packet. Carrying out these tasks in parallel reduces the worst-case number of clock cycles needed to classify a packet by one.
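As a rough software model of the leaf-search stage, the sketch below checks a packet header against the (up to 30) rules held in one memory word; in the accelerator the loop body corresponds to 30 comparator blocks operating in parallel within a single clock cycle. The struct layout, helper names and demo rules are our assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t sport_lo, sport_hi, dport_lo, dport_hi;
    uint32_t saddr, daddr;
    uint8_t  splen, dplen;        /* decoded prefix lengths, 0..32   */
    uint8_t  proto, proto_wild;   /* protocol value + wildcard flag  */
    uint16_t rule_no;
} Rule;

typedef struct {
    uint16_t sport, dport;
    uint32_t saddr, daddr;
    uint8_t  proto;
} Header;

static int prefix_match(uint32_t addr, uint32_t pfx, unsigned plen)
{
    uint32_t mask = plen ? 0xFFFFFFFFu << (32 - plen) : 0;
    return ((addr ^ pfx) & mask) == 0;
}

static int rule_match(const Rule *r, const Header *h)
{
    return h->sport >= r->sport_lo && h->sport <= r->sport_hi
        && h->dport >= r->dport_lo && h->dport <= r->dport_hi
        && prefix_match(h->saddr, r->saddr, r->splen)
        && prefix_match(h->daddr, r->daddr, r->dplen)
        && (r->proto_wild || h->proto == r->proto);
}

/* One "clock cycle": in hardware all 30 comparisons run in parallel;
 * here we scan and return the first (highest-priority) match. */
static int leaf_search(const Rule rules[], int n, const Header *h)
{
    for (int i = 0; i < n && i < 30; i++)
        if (rule_match(&rules[i], h)) return rules[i].rule_no;
    return -1;
}

int main(void)
{
    Rule r[2] = {
        { 0, 65535, 80, 80, 0xC0A80000u, 0x00000000u, 16, 0, 6, 0, 7 },
        { 0, 65535, 0, 65535, 0, 0, 0, 0, 0, 1, 99 }   /* catch-all */
    };
    Header h = { 1234, 80, 0xC0A80001u, 0x08080808u, 6 };
    printf("matched rule %d\n", leaf_search(r, 2, &h)); /* prints 7 */
    return 0;
}
```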
                    FPGA               ASIC         Software
                    (Virtex5SX95T)     (65 nm)      (SA-1100)
Process [nm]        65                 65           180
Voltage [V]         1                  1.08         1.8
Frequency [MHz]     77                 226          200
Power [mW]          1811               18.32        42.45*
Area (gates)        17,600,998         51,488       -
Slices              3,280 (22%)        -            -
Block RAMs          134 (54%)          -            -

Table 5: Device comparison (* = normalized power).

No.      Software (SA-1100)      ASIC (65 nm)            FPGA (Virtex5SX95T)
Rules    HiCuts      HyperCuts   HiCuts      HyperCuts   HiCuts      HyperCuts
60       4.60E-07    7.82E-07    7.58E-11    7.90E-11    2.39E-08    2.38E-
150      5.69E-07    1.09E-06    7.32E-11    7.55E-11    2.43E-08    2.41E-
500      6.72E-07    1.28E-06    1.00E-10    1.21E-10    3.21E-08    3.09E-
1000     8.62E-07    1.85E-06    1.24E-10    1.19E-10    3.94E-08    3.45E-
1600     1.09E-06    1.40E-06    1.81E-10    1.42E-10    4.89E-08    3.86E-
2191     1.09E-06    1.94E-06    2.07E-10    1.46E-10    5.22E-08    3.87E-

Table 6: Average normalized energy needed to classify a packet (Joules), spfac=4, speed=1.
No.      Software (SA-1100)     ASIC (65 nm)                   FPGA (Virtex5SX95T)
Rules    HiCuts     HyperCuts   HiCuts        HyperCuts        HiCuts       HyperCuts
60       88,125     51,794      226,000,000   226,000,000      77,000,000   77,000,000
150      71,181     37,323      221,919,129   226,000,000      75,609,614   77,000,000
500      60,245     31,721      164,389,580   171,530,362      56,008,839   58,441,760
1000     47,544     22,249      135,333,231   155,475,310      46,109,109   52,971,676
1600     37,760     29,201      105,444,530   161,201,374      35,925,791   46,663,555
2191     37,399     21,168      99,498,019    136,131,129      33,899,767   46,380,959

Table 7: Total number of packets classified in 1 second, spfac=4, speed=1.

No.      Software               Hardware
Rules    HiCuts     HyperCuts   HiCuts     HyperCuts
60       17         22          2          2
150      27         38          3          2
500      29         52          3          3
1000     46         103         4          4
1600     58         70          5          4
2191     58         114         5          4

Table 8: Worst-case number of memory accesses, spfac=4, speed=1.
5. Simulation Results

Table 5 gives power figures for the hardware accelerator and the processor so that a fair comparison can be made. The power has been normalized so that all devices are compared using 65 nm technology with a core voltage of 1 V. The normalized power P' (indicated by an asterisk in Table 5) is calculated using the following equation, where S is the scaling factor of the process technologies and U is the scaling factor of the voltage:
    P' = P * S^2 * U    (8)

The ASIC and StrongARM figures consider only the power consumption and area of the datapath logic, whilst the FPGA figures include the power consumption and area of both the datapath logic and the memory.
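As a worked instance of equation (8), the snippet below normalizes a power figure from the SA-1100's 180 nm / 1.8 V operating point to the 65 nm / 1 V point used in Table 5. The raw input power is a hypothetical value chosen so that the output reproduces Table 5's 42.45 mW entry; the actual pre-normalization figure is not given in the excerpt.

```c
#include <stdio.h>

static double normalize_power(double p_mw, double s, double u)
{
    return p_mw * s * s * u;        /* eq. (8): P' = P * S^2 * U */
}

int main(void)
{
    double s = 65.0 / 180.0;        /* process scaling: 180 nm -> 65 nm */
    double u = 1.0 / 1.8;           /* voltage scaling: 1.8 V -> 1.0 V  */
    double p_raw = 586.0;           /* hypothetical raw SA-1100 power, mW */
    printf("normalized: %.2f mW\n", normalize_power(p_raw, s, u)); /* 42.45 */
    return 0;
}
```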
The results in Table 8 show the worst-case number of memory accesses needed to classify a packet. For the hardware accelerator this also represents the worst-case number of clock cycles needed to classify a packet, which means a minimum bandwidth for a given ruleset can be guaranteed under worst-case operating conditions. For the rulesets and packet traces used in Table 7, the hardware accelerator implemented as an ASIC can classify up to 546 times more packets than RFC, the best performing software algorithm tested in [12], running on a StrongARM SA-1100 processor. Compared with HiCuts, the best performing software algorithm that also supports incremental updates, the hardware accelerator can classify up to 4,269 times more packets. The results also show that, after modification, HyperCuts is the best performing algorithm in terms of both memory usage and throughput. The reason for the increase in throughput is that the modified HyperCuts algorithm allows more cuts at internal nodes than the unmodified version, so a shorter linear search of leaf nodes is needed.
The area used by the accelerator is equivalent to 51,488 2-input NAND gates, meaning it can compete with even the most basic RISC-type processing engines using no data or instruction cache. It would therefore make sense to implement the proposed approach as a hardware accelerator attached on-chip or on-board to a network processor, removing the burden of packet classification from the network processor's processing engines and allowing it to achieve line speeds of up to OC-768. The figures also show that OC-192 line speeds are obtainable if the proposed approach is implemented on an FPGA.
Table 6 compares the average normalized energy needed to classify a packet for the two unmodified packet classification algorithms running on a StrongARM SA-1100 processor against that of the hardware accelerator implemented using ASIC and FPGA technology. The energy figures for the FPGA include the energy used by both the memory and the datapath logic, whilst the ASIC and RISC figures include only the energy used by the datapath logic; for this reason it is fairer to compare the energy used by the StrongARM SA-1100 with that of the ASIC. When the power consumption of the hardware accelerator is compared with HiCuts, the most energy efficient software algorithm tested in [12] that supports incremental ruleset updates, the hardware accelerator shows energy savings of up to 7,773 times on the rulesets tested in Table 6. This massive energy saving shows that the hardware accelerator is ideally suited to low-power packet classification.

The average power consumption of the hardware accelerator when implemented on an FPGA with 614,400 bytes of memory is 1.8 W when running at 77 MHz. This is a large power saving over one of the most energy efficient commercial TCAM solutions, the Cypress Ayama 10128 Network Search Engine, which consumes 2.9 W when running at 77 MHz with 576,000 bytes of memory [13]. The Cypress Ayama 10512 Network Search Engine can classify at most 133 Mpps when running at its top speed of 133 MHz with 2.304 MB of memory; at this speed it consumes 19.14 W [13]. When implemented as an ASIC, the hardware accelerator consumes 11.65 mW when running at 133 MHz. This shows that massive power savings are possible, considering that the CY7C1381D 2.304 MB SRAM chip from Cypress alone consumes 693 mW when running at 133 MHz with a core voltage of 3.3 V [19]. When running at 226 MHz the hardware accelerator consumes 19.79 mW, while the CY7C1370DV25 2.304 MB SRAM chip from Cypress consumes 875 mW when running at 250 MHz with a core voltage of 2.5 V [20]. It is therefore possible for the hardware accelerator to classify packets at higher speeds than TCAMs while using less power.
6. Conclusions
With ever-increasing line speeds, packet classification has become a bottleneck in wire-speed processing for high-speed routers. Software solutions running on the processing engines of network processors cannot keep up with the high line rates due to their low throughput, while existing hardware methods for high-speed packet classification such as TCAMs have the drawbacks of high power consumption, large board area and poor storage efficiency of rulesets. In this paper we have introduced an energy efficient packet classification hardware accelerator capable of classifying packets at line rates exceeding OC-768 if implemented using 65 nm ASIC technology, and at rates in excess of OC-192 if implemented on a Xilinx Virtex5SX95T FPGA. The proposed architecture is ideally suited to implementation as a hardware accelerator attached on-chip or on-board to a network processor due to its low area footprint, high throughput, low power consumption and high storage efficiency of rulesets. The hardware accelerator shows throughput gains of up to 4,269 times and energy savings of up to 7,773 times when compared with software algorithms implementing packet classification on the processing engines of typical programmable network processors. It also shows the possibility of clock speed gains of up to 1.7 times and a clear decrease in power consumption compared to existing state-of-the-art TCAM technology.
7. Acknowledgments
We would like to thank Kealan McCusker from the Centre for Digital Video Processing, Dublin City University, for his help with the ASIC implementation of the hardware accelerator. This work was co-funded by the Irish Research Council for Science, Engineering and Technology (funded by the National Development Plan), the China/Ireland Science and Technology Collaboration Research Fund (2006DFA11170), and the Cultivation Fund of the Key Scientific and Technical Innovation Project, MoE, China (705003).
8. References
[1] A. Gallo, "Meeting Traffic Demands with Next-Generation Internet Infrastructure," Lightwave, vol. 18, no. 5, pp. 118-123, May 2001.
[2] M. Gupta and S. Singh, "Greening of the Internet," in ACM SIGCOMM 2003, pp. 19-26.
[3] F. Baboescu and G. Varghese, "Scalable packet classification," IEEE/ACM Trans. Netw., vol. 13, no. 1, pp. 2-14.
[4] P. Gupta and N. McKeown, "Packet classification on multiple fields," in ACM SIGCOMM 1999, pp. 147-.
[5] P. Gupta and N. McKeown, "Packet classification using hierarchical intelligent cuttings," IEEE Micro, vol. 20, no. 1, pp. 34-41, 2000.
[6] S. Singh, F. Baboescu, G. Varghese and J. Wang, "Packet classification using multidimensional cutting," in ACM SIGCOMM 2003, pp. 213-.
[7] F. Baboescu, S. Singh, and G. Varghese, "Packet classification for core routers: Is there an alternative to CAMs?" in IEEE INFOCOM 2003, pp. 53-63.
[8] V. Srinivasan, S. Suri, and G. Varghese, "Packet classification using tuple space search," in ACM SIGCOMM 1999, pp. 135-146.
[9] P. Gupta and N. McKeown, "Algorithms for packet classification," IEEE Network Mag., vol. 15, no. 2, pp. 24-32, 2001.
[10] T. Woo, "A modular approach to packet classification: algorithms and results," in IEEE INFOCOM, Mar. 2000, pp. 1213-.
[11] P. C. Wang, C. T. Chan, C. L. Lee and H. Y. Chang, "Scalable packet classification for enabling Internet differentiated services," IEEE Trans. on Multimedia, vol. 8, no. 6, pp. 1239-1249, 2006.
[12] A. Kennedy, D. Bermingham, X. Wang and L. Bin, "Power analysis of packet classification on programmable network processors," in IEEE ICSPC, Nov. 2007, pp. 1231-1234.
[13] Cypress Ayama 10000 Network Search Engine, http://download.cypress.com.edgesuite.net/design_resources/datasheets/contents/cynse10256_8.pdf
[14] E. Spitznagel, D. Taylor, and J. Turner, "Packet classification using extended TCAMs," in Proc. 11th Int'l Conf. on Network Protocols (ICNP '03), 2003.
[15] D. Hoffman and P. Strooper, "Classbench: A framework for automated class testing," Software Practice and Experience, vol. 27, no. 5, pp. 573-597, May 1997.
[16] ACL1 rulesets and packet traces [Online]. Available: www.arl.wustl.edu/~hs1/PClassEval.html
[17] Sim-Panalyzer, the SimpleScalar-ARM power modeling project [Online]. Available: www.eecs.umich.edu/~panalyzer/
[18] A. Kinane, "Energy Efficient Hardware Acceleration of Multimedia Processing Tools," PhD thesis, Dublin City University, May 2006.
[19] Cypress high-speed synchronous SRAMs, http://download.cypress.com.edgesuite.net/design_resources/datasheets/contents/cy7c1381d_8.pdf
[20] Cypress high-speed synchronous SRAMs, http://download.cypress.com.edgesuite.net/design_resources/datasheets/contents/cy7c1370dv25_8.pdf