Designing an FIR (Finite Impulse Response) decimation filter (also called an extraction filter) for a Xilinx XC2V1000 FPGA is a classic digital signal processing task. The XC2V1000 belongs to the older Virtex-II family, but its dedicated multipliers and block RAM are well suited to such designs.
Here is a comprehensive guide to the design process, from specification to implementation on the XC2V1000.
1. System Overview & Key Concepts
FIR Filter: A digital filter with a finite impulse response. It works by multiplying a window of recent input samples by a set of coefficients and summing the products. It is inherently stable and has exactly linear phase when its coefficients are symmetric.
Decimation (Extraction): Reducing the sampling rate by an integer factor M. It consists of low-pass filtering (to prevent aliasing) followed by downsampling (keeping every M-th sample and discarding the rest).
XC2V1000 Resources:
- Slices: 5,120 (10,240 four-input LUTs and 10,240 flip-flops)
- Block RAM (BRAM): 40 blocks of 18 Kbit each (720 Kbit total)
- Dedicated Multipliers: 40 (18x18-bit, signed)
Your design should use these resources efficiently, especially the multipliers and BRAM.
2. Design Flow
The design process follows these steps:
Specification → Filter Design → Architecture Choice → Implementation → Simulation & Testing
3. Step 1: Filter Specification
Define the requirements for your extraction filter:
- Input Sampling Rate (Fs_in): e.g., 50 MHz
- Output Sampling Rate (Fs_out): e.g., 10 MHz
- Decimation Factor (M): M = Fs_in / Fs_out = 5
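- Passband Frequency (Fpass): e.g., 0 to 4 MHz (must be < Fs_out/2 = 5 MHz)
- Stopband Frequency (Fstop): e.g., 4.5 MHz (keeping Fstop ≤ Fs_out/2 ensures that every component that can alias into the 0-5 MHz output band is attenuated by the full stopband attenuation)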
- Passband Ripple: e.g., < 0.1 dB
- Stopband Attenuation: e.g., > 60 dB
- Data Bit Width: e.g., 16-bit input samples
- Coefficient Bit Width: e.g., 18 bits (to use the full multiplier width effectively)
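Before running the filter design tool, a quick length estimate shows whether a specification is realistic. The sketch below applies Kaiser's empirical formula for equiripple filters, N ≈ (-20·log10√(δp·δs) - 13) / (14.6·Δf) + 1, in plain Python. The function name `estimate_taps` is hypothetical, the passband ripple is treated as peak-to-peak, and the result is only an approximation (MATLAB's firpmord uses its own estimate and may differ by a few taps).

```python
import math

def estimate_taps(fs, fpass, fstop, ripple_db, atten_db):
    """Kaiser's empirical length estimate for an equiripple low-pass FIR."""
    # Passband ripple (peak-to-peak dB) -> linear deviation delta_p
    g = 10 ** (ripple_db / 20)
    delta_p = (g - 1) / (g + 1)
    # Stopband attenuation (dB) -> linear deviation delta_s
    delta_s = 10 ** (-atten_db / 20)
    # Transition width as a fraction of the sampling rate
    delta_f = (fstop - fpass) / fs
    n = (-20 * math.log10(math.sqrt(delta_p * delta_s)) - 13) / (14.6 * delta_f) + 1
    return math.ceil(n)

M = 5
taps = estimate_taps(50e6, 4e6, 4.5e6, 0.1, 60)
print(taps, math.ceil(taps / M))  # -> 271 55 (total taps, taps per polyphase branch)
```

The narrow 0.5 MHz transition band drives the result: length scales roughly with 1/Δf, so this specification needs far more than the 50 taps assumed in the resource estimate later in this guide.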
4. Step 2: Filter Coefficient Generation
Use a tool like MATLAB, Python (SciPy), or a dedicated filter design tool to generate the optimal FIR coefficients.
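MATLAB Example:
```matlab
M      = 5;      % Decimation factor (entered later in CoreGen)
Fs_in  = 50e6;   % Input sample rate (Hz)
Fpass  = 4e6;    % Passband edge (Hz)
Fstop  = 4.5e6;  % Stopband edge (Hz)
Ap     = 0.1;    % Passband ripple (dB)
Ast    = 60;     % Stopband attenuation (dB)

% Convert the dB specs to linear deviations and estimate the filter order
dev = [(10^(Ap/20) - 1) / (10^(Ap/20) + 1), 10^(-Ast/20)];
[n, fo, ao, w] = firpmord([Fpass Fstop], [1 0], dev, Fs_in);

% Equiripple (Parks-McClellan) design with n+1 taps. firpmord's estimate can
% fall slightly short: check the response and increase n if needed.
h = firpm(n, fo, ao, w);

% Scale so the largest coefficient uses the full 18-bit signed range
h = h / max(abs(h));
h_quant = round(h * (2^17 - 1));

% Save coefficients to a .coe file for Xilinx CoreGen
fid = fopen('fir_filter.coe', 'w');
fprintf(fid, 'radix=10;\n');
fprintf(fid, 'coefdata=\n');
fprintf(fid, '%d,\n', h_quant(1:end-1));
fprintf(fid, '%d;\n', h_quant(end));
fclose(fid);
```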
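This designs an equiripple filter for the specification and writes its quantized coefficients for the next step. Check the order n: a 0.5 MHz transition band with 60 dB of attenuation requires a filter several hundred taps long, and the tap count directly sets the multiplier budget.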
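Because the coefficients are scaled so the largest one reaches full scale, the filter's full-precision output grows well beyond 16 bits. A small check like the one below can confirm that the quantized values fit the 18-bit signed range and give a conservative accumulator width for choosing output scaling in CoreGen or sizing a custom accumulator. This is a sketch: `parse_coe` and `check_coefficients` are hypothetical helpers, and the parser assumes the lowercase radix-10 format written by the MATLAB script above.

```python
COEF_BITS = 18  # signed coefficient width entered in CoreGen
DATA_BITS = 16  # signed input sample width

def parse_coe(text):
    """Return the integer coefficients from radix-10 .coe text (lowercase keywords)."""
    compact = "".join(text.split())            # drop all whitespace and newlines
    data = compact.split("coefdata=", 1)[1]    # text after the keyword
    return [int(tok) for tok in data.rstrip(";").split(",") if tok]

def check_coefficients(h_quant, coef_bits=COEF_BITS, data_bits=DATA_BITS):
    """Return (fits_in_coef_bits, conservative_accumulator_bits)."""
    lo, hi = -(2 ** (coef_bits - 1)), 2 ** (coef_bits - 1) - 1
    fits = all(lo <= c <= hi for c in h_quant)
    # Worst case |y| <= max|x| * sum|h|, with max|x| = 2^(data_bits-1)
    worst = 2 ** (data_bits - 1) * sum(abs(c) for c in h_quant)
    return fits, worst.bit_length() + 1        # +1 for the sign bit

example = "radix=10;\ncoefdata=\n-120,\n131071,\n-120;"
print(check_coefficients(parse_coe(example)))  # -> (True, 34)
```

The bound is deliberately pessimistic (it assumes every product has its worst-case sign); real signals rarely approach it, which is why cores usually offer a scaled or truncated output option.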
5. Step 3: FPGA Architecture Choice
For a decimating FIR filter, the polyphase decomposition architecture is the most efficient. It is a fundamental technique for implementing decimation filters.
- Why Polyphase? Each branch runs at the output rate (Fs_out = 10 MHz) rather than the input rate (Fs_in = 50 MHz), so no work is spent on outputs that downsampling would discard. This cuts the multiply rate by a factor of M, which reduces power and lets each multiplier be time-shared across several taps.
- How it works: The single long FIR filter is split into M (5) shorter sub-filters (polyphase branches), where branch p holds the coefficients h[p], h[p+M], h[p+2M], and so on. A commutator feeds each new input sample to a different branch in round-robin fashion, and the outputs of all branches are summed once every M input samples to produce one output.
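The polyphase identity can be checked numerically. The pure-Python sketch below (function names are hypothetical; integer arithmetic keeps the comparison exact) computes the decimated output two ways: directly, as y[i] = Σ h[k]·x[iM - k], and through M branch filters e_p[j] = h[p + jM], each fed by the commutator with x_p[i] = x[iM - p]. Both give identical results, but in the polyphase form every branch runs at the output rate.

```python
def fir_decimate_direct(x, h, m):
    """Reference: y[i] = sum_k h[k] * x[i*m - k], i.e. filter, keep every m-th output."""
    y = []
    for n in range(0, len(x), m):            # only the outputs that survive downsampling
        y.append(sum(hk * x[n - k] for k, hk in enumerate(h) if n - k >= 0))
    return y

def fir_decimate_polyphase(x, h, m):
    """Same result via m branch filters, each running at the output rate."""
    n_out = (len(x) + m - 1) // m
    # Branch p holds taps e_p[j] = h[p + j*m]
    branches = [h[p::m] for p in range(m)]
    # Commutator: branch p receives x_p[i] = x[i*m - p] (zero before the signal starts)
    inputs = [[x[i * m - p] if i * m - p >= 0 else 0 for i in range(n_out)]
              for p in range(m)]
    y = [0] * n_out
    for p in range(m):
        for i in range(n_out):
            for j, c in enumerate(branches[p]):
                if i - j >= 0:                # branch delay line starts empty
                    y[i] += c * inputs[p][i - j]
    return y

# An impulse returns every m-th coefficient: h[0] and h[5]
print(fir_decimate_polyphase([1] + [0] * 9, [1, 2, 3, 4, 5, 6], 5))  # -> [1, 6]
```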
For the XC2V1000, the ideal implementation uses:
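- Dedicated Multipliers: For the multiplies in the multiply-accumulate (MAC) operations. Virtex-II multipliers have no built-in accumulator, so the adders and accumulation registers are built in slice logic.
- Block RAM (BRAM): To store the delay lines (previous input samples) for each polyphase branch. This avoids building the delay line from flip-flops (taps × 16 bits, i.e., thousands of flip-flops for a filter a few hundred taps long).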
6. Step 4: Implementation using Xilinx Tools (ISE)
Since the Virtex-II family is supported by Xilinx ISE (not Vivado), the workflow is as follows:
Option A: Using CoreGen (Highly Recommended)
- Create a New Project in ISE: Target the xc2v1000 device.
- Launch CoreGen: Select FIR Compiler from the DSP->Filters menu.
- Configure the Core:
  - Component Name: fir_decimator
  - Filter Type: Decimation
  - Decimation Rate: 5
  - Coefficient File: Load the fir_filter.coe file you generated.
  - Data & Coefficient Widths: Set to 16 and 18 bits, respectively.
  - Hardware Oversampling: Set to 1, because the 50 MHz clock equals the 50 MHz input sample rate (one clock cycle per input sample). A clock faster than the input rate would let the core time-share each multiplier across more taps.
  - Memory Type: Choose Block RAM. This instructs the core to use the XC2V1000's BRAMs for delay-line storage.
  - Multiplier Type: Choose Dedicated Multipliers. This is crucial for performance and logic efficiency.
- Generate the Core: CoreGen creates a netlist (fir_decimator.ngc), an instantiation template (.vho), and a simulation model.
Option B: Custom VHDL/Verilog Polyphase Implementation
If you need more control, you can write the code yourself. The structure would include:
- A Delay Line (in BRAM): A circular buffer to store incoming samples.
- A Coefficient ROM (in BRAM): To store the precomputed polyphase coefficients.
- A Control FSM: To manage the polyphase branch selection, reading from RAMs, and the MAC operation.
- A Multiplier-Accumulator (MAC): Instantiate the dedicated multiplier primitives (MULT18X18) for efficiency.
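The Option B datapath can be modeled at the level of memory accesses before writing any HDL. The sketch below is a hypothetical Python model (class and method names are illustrative, not a Xilinx API): each incoming sample is written to a circular buffer, and on every M-th input a single MAC walks the buffer and the coefficient ROM, one multiply per loop iteration, as one multiply per clock would in hardware. Evaluating all N taps once per output is the same work as summing the M polyphase branches. It assumes a buffer exactly as long as the filter; an HDL version would typically round the depth up to a power of two so the address wraps naturally.

```python
class SingleMacDecimator:
    """Hypothetical model of Option B: a circular-buffer delay line, a
    coefficient ROM, and one MAC, producing one output per m inputs."""

    def __init__(self, coeffs, m):
        self.rom = list(coeffs)           # coefficient ROM contents
        self.ram = [0] * len(self.rom)    # delay line, initially cleared
        self.m = m
        self.wr = 0                       # circular-buffer write address
        self.phase = 0                    # position within the current output period
        self.mac_cycles = 0               # multiplies performed so far

    def push(self, sample):
        """Store one input sample; return an output every m-th call, else None."""
        newest = self.wr
        self.ram[newest] = sample
        self.wr = (self.wr + 1) % len(self.ram)
        emit = self.phase == 0
        self.phase = (self.phase + 1) % self.m
        if not emit:
            return None
        acc = 0                            # accumulator cleared for each output
        n = len(self.ram)
        for k in range(n):                 # one multiply per clock in hardware
            acc += self.rom[k] * self.ram[(newest - k) % n]
            self.mac_cycles += 1
        return acc

dec = SingleMacDecimator([1, 2, 3, 4, 5, 6], 5)
outputs = [dec.push(s) for s in [1] + [0] * 9]
print([y for y in outputs if y is not None], dec.mac_cycles)  # -> [1, 6] 12
```

The `mac_cycles` counter makes the hardware cost explicit: one output takes N multiply cycles and arrives every M input samples, so a single multiplier would need a clock of at least N × Fs_in / M. For N = 50 and Fs_in = 50 MHz that is 500 MHz, which is why the MAC work is spread across several dedicated multipliers.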
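Simplified VHDL Snippet (Conceptual):
```vhdl
-- Conceptual MAC datapath; signal declarations and the control FSM are omitted.
-- Hypothetical signals (ieee.numeric_std), driven by the FSM and RAMs:
--   sample_from_ram : signed(15 downto 0); coefficient_from_rom : signed(17 downto 0)
--   product : signed(33 downto 0); output_acc, data_out : signed(43 downto 0) (10 guard bits)
--   product_valid, first_product, data_valid : std_logic (flags aligned with product)
-- XST can typically map the registered multiply onto a MULT18X18S primitive.
multiplier : process(clk)
begin
  if rising_edge(clk) then
    product <= sample_from_ram * coefficient_from_rom;  -- 16 x 18 -> 34 bits
  end if;
end process;

accumulator : process(clk)
begin
  if rising_edge(clk) then
    data_valid <= '0';
    if reset = '1' then
      output_acc <= (others => '0');
    elsif product_valid = '1' then
      if first_product = '1' then
        -- Previous sum is complete: output it, then start the next sum
        -- by loading (not adding) the first product.
        data_out   <= output_acc;
        data_valid <= '1';
        output_acc <= resize(product, output_acc'length);
      else
        output_acc <= output_acc + resize(product, output_acc'length);  -- MAC
      end if;
    end if;
  end if;
end process;
```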
7. Step 5: Integration, Simulation, and Testing
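- Instantiate the Core: In your top-level design, instantiate the generated module. Because the core is delivered as an .ngc netlist, declare it as a component (the CoreGen .vho template contains the declaration) rather than using a direct entity instantiation.
```vhdl
-- Port names follow the generated .vho template; check them for your core version.
your_fir_filter : fir_decimator
  port map (
    clk  => clk_50MHz,       -- Input clock
    nd   => data_valid_in,   -- New data input strobe
    rfd  => ready_for_data,  -- Core is ready for new data
    rdy  => data_valid_out,  -- Output is valid
    din  => filter_input,    -- 16-bit input
    dout => filter_output    -- Filter output (width depends on the core's output settings)
  );
```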
- Constrain the Design: Create a .ucf file to define the clock period and I/O pins.
```ucf
NET "clk_50MHz" TNM_NET = "clk_50MHz";
TIMESPEC "TS_clk_50MHz" = PERIOD "clk_50MHz" 20 ns HIGH 50%; # 50 MHz clock
NET "filter_input<0>" LOC = "A10"; # example pin assignment
```
Simulate: Use ISim or ModelSim to simulate the filter with test vectors (e.g., a sine wave sweep) to verify its frequency response and decimation functionality.
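A stimulus file for the testbench is easy to generate in software. The sketch below (hypothetical function names, plain Python) builds a 1 MHz passband tone plus a 7 MHz stopband tone at the 50 MHz input rate, quantizes the sum to 16-bit signed integers, and prints four-digit two's-complement hex, one sample per line, a format a textio-based VHDL testbench can read. After decimation to 10 MHz the 7 MHz tone would alias to 3 MHz, so a correct filter should suppress it by at least 60 dB at the output.

```python
import math

def tone_vectors(freqs_hz, fs_hz, n, amplitude=0.45, bits=16):
    """Sum of sine tones (each with the given peak amplitude, full scale = 1.0),
    rounded and clipped to signed `bits`-bit integers."""
    full_scale = 2 ** (bits - 1) - 1
    samples = []
    for i in range(n):
        v = sum(amplitude * math.sin(2 * math.pi * f * i / fs_hz) for f in freqs_hz)
        samples.append(max(-full_scale - 1, min(full_scale, round(v * full_scale))))
    return samples

def to_hex_lines(samples, bits=16):
    """Format samples as fixed-width two's-complement hex strings."""
    digits = (bits + 3) // 4
    mask = (1 << bits) - 1
    return [format(s & mask, "0{}X".format(digits)) for s in samples]

if __name__ == "__main__":
    # 1 MHz passband tone + 7 MHz stopband tone at the 50 MHz input rate
    x = tone_vectors([1e6, 7e6], 50e6, 1000)
    print("\n".join(to_hex_lines(x)))
```

Redirecting the output to a file (for example `python gen_vectors.py > fir_input.hex`, with a hypothetical script name) gives the testbench its input; the simulated output can then be compared against a software model that uses the same quantized coefficients.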
Synthesize, Map, & Place+Route: Run the full implementation flow in ISE.
Analyze the Report:
- Check the Timing Report to ensure the design meets the 50 MHz clock constraint.
- Check the Map Report to verify resource usage: for a 50-tap decimate-by-5 filter clocked at 50 MHz, expect roughly 5-10 multipliers and 1-2 BRAMs, well within the XC2V1000's limits. A longer filter needs proportionally more multipliers.
8. Resource Estimation for XC2V1000
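For a 16-bit input, 18-bit coefficient, 50-tap decimate-by-5 filter implemented with CoreGen and clocked at 50 MHz:
- Multipliers: ~5-10 (of 40 available). Each output needs 50 multiplies and there are 5 clock cycles per output, so 50 / 5 = 10 multipliers, or about half that if the core exploits coefficient symmetry.
- Block RAMs: 1-2 (of 40 available), for the delay line (50 × 16 bits) and the coefficients (50 × 18 bits).
- Slices: ~200-500 (of 5,120 available), for control logic and accumulators.
Note that a 50-tap filter does not meet the Step 1 specification: a 0.5 MHz transition band with 60 dB of attenuation at a 50 MHz input rate needs roughly 270 taps. At a 50 MHz clock that means about 55 multipliers, more than the XC2V1000 has, so meeting that specification requires a faster MAC clock, exploiting coefficient symmetry, or a wider transition band.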
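The multiplier count follows from the multiply rate rather than from the tap count alone: each output needs one multiply per tap (about half as many if the architecture pre-adds symmetric coefficient pairs), outputs arrive at Fs_out, and each dedicated multiplier performs at most one multiply per clock. The hypothetical helper below turns that into a ceiling division, using integer Hz to keep the arithmetic exact. It ignores control overhead and memory-port limits, so treat the result as a lower bound; the 271-tap case is Kaiser's length estimate for the Step 1 specification, and the 150 MHz case assumes timing closes at that rate on your speed grade.

```python
def estimate_multipliers(taps, fs_out_hz, f_clk_hz, symmetric=False):
    """Lower bound on dedicated multipliers, assuming one multiply per clock each.

    Each output sample needs one multiply per tap (about half as many if the
    architecture pre-adds symmetric coefficient pairs), at fs_out_hz outputs/s.
    """
    products_per_output = (taps + 1) // 2 if symmetric else taps
    multiplies_per_second = products_per_output * fs_out_hz
    return -(-multiplies_per_second // f_clk_hz)   # integer ceiling division

print(estimate_multipliers(50, 10_000_000, 50_000_000))                   # -> 10
print(estimate_multipliers(50, 10_000_000, 50_000_000, symmetric=True))   # -> 5
print(estimate_multipliers(271, 10_000_000, 50_000_000))                  # -> 55
print(estimate_multipliers(271, 10_000_000, 150_000_000))                 # -> 19
```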
This design would fit very comfortably in an XC2V1000, leaving ample resources for other parts of your system. The use of dedicated resources (Multipliers, BRAM) is the key to an efficient implementation.