<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ai pics</title>
    <description>The latest articles on DEV Community by ai pics (@ai_pics_6442ad429fc2ff12f).</description>
    <link>https://dev.to/ai_pics_6442ad429fc2ff12f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3521344%2F3ec6a7fd-66f1-450a-8154-5aa0add29189.png</url>
      <title>DEV Community: ai pics</title>
      <link>https://dev.to/ai_pics_6442ad429fc2ff12f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ai_pics_6442ad429fc2ff12f"/>
    <language>en</language>
    <item>
      <title>Drive 16 (or More) LEDs with Two 74HC595 Shift Registers Using Only 3 Arduino Pins</title>
      <dc:creator>ai pics</dc:creator>
      <pubDate>Fri, 10 Oct 2025 02:26:27 +0000</pubDate>
      <link>https://dev.to/ai_pics_6442ad429fc2ff12f/drive-16-or-more-leds-with-two-74hc595-shift-registers-using-only-3-arduino-pins-5af8</link>
      <guid>https://dev.to/ai_pics_6442ad429fc2ff12f/drive-16-or-more-leds-with-two-74hc595-shift-registers-using-only-3-arduino-pins-5af8</guid>
      <description>&lt;p&gt;Why this project?&lt;/p&gt;

&lt;p&gt;Arduino boards run out of GPIO pins fast when you start doing LED patterns or building a control panel. The 74HC595 serial-in/parallel-out (SIPO) shift register lets you trade a few pins (data, clock, latch) for many outputs. Each chip adds 8 outputs, and by chaining &lt;a href="https://mozelectronics.com/semiconductor-ics/" rel="noopener noreferrer"&gt;IC chips&lt;/a&gt; you can scale well beyond 16 channels—limited mainly by signal integrity, update speed, and power.&lt;/p&gt;

&lt;p&gt;This guide shows you how to:&lt;/p&gt;

&lt;p&gt;Wire two 74HC595s to control 16 LEDs using only three Arduino pins&lt;/p&gt;

&lt;p&gt;Add more chips without rewriting code (change one constant)&lt;/p&gt;

&lt;p&gt;Use PWM on OE for global brightness&lt;/p&gt;

&lt;p&gt;Initialize cleanly to avoid “mystery LEDs” turning on at power-up&lt;/p&gt;

&lt;p&gt;Organize code so each LED is addressed by a single channel index (0..N-1)&lt;/p&gt;

&lt;p&gt;Bill of Materials&lt;/p&gt;

&lt;p&gt;Required&lt;/p&gt;

&lt;p&gt;1 × Arduino Uno (or any 5 V-logic compatible board)&lt;/p&gt;

&lt;p&gt;2 × &lt;a href="https://mozelectronics.com/tutorials/74hc595-shift-register-pinout-datasheet-arduino/" rel="noopener noreferrer"&gt;74HC595 shift registers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;16 × LEDs&lt;/p&gt;

&lt;p&gt;16 × 220 Ω resistors (one per LED; 180–330 Ω is typical, but see the Power &amp;amp; Current section, which may call for a higher value)&lt;/p&gt;

&lt;p&gt;Breadboard(s) and jumper wires&lt;/p&gt;

&lt;p&gt;Optional but recommended&lt;/p&gt;

&lt;p&gt;2 × 0.1 µF &lt;a href="https://mozelectronics.com/passive-components/capacitors-and-capacitor-kits/ceramic-capacitors/" rel="noopener noreferrer"&gt;ceramic capacitors&lt;/a&gt; (one per 74HC595 between VCC and GND for decoupling)&lt;/p&gt;

&lt;p&gt;1 × External 5 V supply if you plan to light many LEDs at once&lt;/p&gt;

&lt;p&gt;1 × Potentiometer (if you insist on analog dimming in series—not recommended)&lt;/p&gt;

&lt;p&gt;1 × PWM-capable Arduino pin wired to OE for global brightness (recommended)&lt;/p&gt;

&lt;p&gt;How the 74HC595 Works (Quickly)&lt;/p&gt;

&lt;p&gt;DS (SER, pin 14): Data in (one bit per clock)&lt;/p&gt;

&lt;p&gt;SH_CP (SRCLK, pin 11): Shift clock; on rising edge the bit on DS enters the shift register&lt;/p&gt;

&lt;p&gt;ST_CP (RCLK, pin 12): Latch clock; rising edge copies the internal shift register to outputs Q0..Q7&lt;/p&gt;

&lt;p&gt;Q7′ (pin 9): Serial data out; chain this to the next chip’s DS&lt;/p&gt;

&lt;p&gt;OE (pin 13): Output enable, active low (LOW = outputs active). Tie to GND, or to a PWM pin for dimming&lt;/p&gt;

&lt;p&gt;MR (pin 10): Master reset, active low. Keep HIGH for normal use. Pulse LOW to clear all bits&lt;/p&gt;

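&lt;p&gt;Data is clocked into the first chip, and each bit pushed out of a chip's Q7′ enters the next chip, so bits ripple toward the last chip. The outputs change only on the latch pulse, when all of them update together, so the LEDs never show the intermediate shift states.&lt;/p&gt;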
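&lt;p&gt;The following host-side C++ model of a 74HC595 chain makes the ripple-and-latch behavior concrete. It runs on a PC, not on the Arduino. Each clock shifts one bit in at DS, and the bit pushed out of Q7′ feeds the next chip. A latch then copies every shift register to its outputs at once. The Hc595Chain class and its method names are hypothetical and only illustrate the timing.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of N daisy-chained 74HC595s (hypothetical helper, not a library).
class Hc595Chain {
public:
    explicit Hc595Chain(std::size_t chips) : shift_(chips, 0), out_(chips, 0) {
        assert(chips > 0);
    }

    // One rising edge on SH_CP: every chip shifts toward Q7. Chip 0 takes DS,
    // chip k takes the bit that chip k-1 pushed out of its Q7' pin.
    void clock(bool ds) {
        bool carry = ds;
        for (std::size_t k = 0; k < shift_.size(); ++k) {
            bool q7 = (shift_[k] & 0x80) != 0;  // old Q7 stage, visible on Q7'
            shift_[k] = static_cast<std::uint8_t>((shift_[k] << 1) | (carry ? 1 : 0));
            carry = q7;                          // becomes the next chip's DS
        }
    }

    // Rising edge on ST_CP: all outputs update together.
    void latch() { out_ = shift_; }

    // Same bit order as Arduino shiftOut(..., MSBFIRST, value).
    void shiftOutMsbFirst(std::uint8_t value) {
        for (int bit = 7; bit >= 0; --bit) clock(((value >> bit) & 1) != 0);
    }

    std::uint8_t outputs(std::size_t chip) const { return out_.at(chip); }

private:
    std::vector<std::uint8_t> shift_;  // internal shift registers (bit 0 = Q0 stage)
    std::vector<std::uint8_t> out_;    // storage registers driving Q0..Q7
};
```
]]>

&lt;p&gt;Nothing reaches the outputs while bits are being clocked; only the latch moves them. The model also shows why the byte for the farthest chip is shifted out first: it has to travel through every chip in front of it.&lt;/p&gt;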

&lt;p&gt;Wiring (Two Chips → 16 LEDs)&lt;/p&gt;

&lt;p&gt;Common Power/Control&lt;/p&gt;

&lt;p&gt;VCC (pin 16) → 5 V&lt;/p&gt;

&lt;p&gt;GND (pin 8) → GND&lt;/p&gt;

&lt;p&gt;0.1 µF between VCC and GND on each chip (close to the IC)&lt;/p&gt;

&lt;p&gt;Control Lines (Arduino → both chips)&lt;/p&gt;

&lt;p&gt;SH_CP (pin 11) → Arduino D12 (clock)&lt;/p&gt;

&lt;p&gt;ST_CP (pin 12) → Arduino D8 (latch)&lt;/p&gt;

&lt;p&gt;OE (pin 13) → GND (or a PWM pin like D3 for global brightness)&lt;/p&gt;

&lt;p&gt;MR (pin 10) → 5 V (or a digital pin if you want software reset)&lt;/p&gt;

&lt;p&gt;Serial Data &amp;amp; Daisy-Chain&lt;/p&gt;

&lt;p&gt;Chip 1 DS (pin 14) → Arduino D11 (data)&lt;/p&gt;

&lt;p&gt;Chip 1 Q7′ (pin 9) → Chip 2 DS (pin 14)&lt;/p&gt;

&lt;p&gt;If you add a third chip: Chip 2 Q7′ → Chip 3 DS, and so on&lt;/p&gt;

&lt;p&gt;LEDs&lt;/p&gt;

&lt;p&gt;For each chip:&lt;/p&gt;

&lt;p&gt;Q0..Q7 (pins 15, 1, 2, 3, 4, 5, 6, 7) → 220 Ω resistor → LED anode; LED cathode → GND&lt;br&gt;
(This wiring sources current, so a HIGH output lights the LED. To sink current instead, connect 5 V → LED → resistor → Q pin. A LOW output then lights the LED, so invert the bits before shifting them out.)&lt;/p&gt;
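&lt;p&gt;Here is a minimal sketch of that inversion. It assumes you keep the register bytes in “1 = LED on” form and flip them only when sending. toWireByte and kSinkingWiring are hypothetical names. With sinking wiring you would shift out toWireByte(regs[i]) instead of regs[i].&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stdint.h>

// Assumption: set to match your board. false = sourcing (Q -> resistor -> LED -> GND),
// true = sinking (5 V -> LED -> resistor -> Q).
constexpr bool kSinkingWiring = true;

// Hypothetical helper: convert a logical "1 = on" byte into the byte actually shifted out.
inline uint8_t toWireByte(uint8_t logical) {
    return kSinkingWiring ? static_cast<uint8_t>(~logical) : logical;
}
```
]]>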

&lt;p&gt;Channel Numbering (Simple Mental Model)&lt;/p&gt;

&lt;p&gt;The chip closest to the Arduino (the one whose DS is wired to D11) is register 0: it controls channels 0..7&lt;/p&gt;

&lt;p&gt;The next chip is register 1: channels 8..15&lt;/p&gt;

&lt;p&gt;With N chips, channels go 0..(8N−1).&lt;/p&gt;

&lt;p&gt;The code below lets you call regWrite(channel, state) without caring which register or bit that is.&lt;/p&gt;

&lt;p&gt;Power &amp;amp; Current: What You Can Safely Drive&lt;/p&gt;

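&lt;p&gt;LED current is roughly (VCC − VF) / R. With 220 Ω at 5 V, a blue or white LED (VF ≈ 3.0 V) draws about 9 mA. A red LED (VF ≈ 2.0 V) draws (5 V − 2.0 V) / 220 Ω ≈ 13.6 mA. The real value is somewhat lower because the 74HC595 output does not swing fully to VCC under load. Even so, a red LED at 220 Ω can exceed the per-pin guideline below. Many LEDs are bright enough at 5 mA, so 390–680 Ω is a reasonable starting range.&lt;/p&gt;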

&lt;p&gt;The 74HC595 can’t source/sink large current on all pins simultaneously. Keep per-pin current ≤ 8 mA and total per chip ≤ ~50 mA (check your datasheet).&lt;/p&gt;
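&lt;p&gt;This arithmetic is easy to get wrong when mixing LED colors, so here is a small C++ sketch for checking a design against the per-pin and per-chip limits. It uses the idealized formula I = (VCC − VF) / R, which ignores the output drop and therefore overestimates current. The 8 mA and 50 mA defaults are this section's guideline numbers, not datasheet values. The function names are hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// Idealized LED current in mA: I = (Vcc - Vf) / R.
// Real current is a bit lower because the 74HC595 output sags under load.
inline double ledCurrent_mA(double vcc, double vf, double r_ohm) {
    assert(r_ohm > 0.0 && vcc > vf);
    return (vcc - vf) / r_ohm * 1000.0;
}

// Smallest resistor (ohms) that keeps the idealized current at or below target_mA.
inline double minResistor_ohm(double vcc, double vf, double target_mA) {
    assert(target_mA > 0.0 && vcc > vf);
    return (vcc - vf) / (target_mA / 1000.0);
}

// True if 'ledsOn' LEDs at 'perLed_mA' each stay within the per-pin and per-chip budgets.
inline bool chipBudgetOk(int ledsOn, double perLed_mA,
                         double pinLimit_mA = 8.0, double chipLimit_mA = 50.0) {
    return perLed_mA <= pinLimit_mA && ledsOn * perLed_mA <= chipLimit_mA;
}
```
]]>

&lt;p&gt;For a red LED (VF ≈ 2.0 V), the 8 mA guideline calls for about 375 Ω or more, and the standard 390 Ω value gives roughly 7.7 mA. If all eight outputs of a chip can be on together, the 50 mA per-chip guideline is the tighter limit: 50 / 8 ≈ 6.25 mA per LED.&lt;/p&gt;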

&lt;p&gt;If you need to drive lots of LEDs at once at higher current, use transistor arrays (e.g., ULN2803) or MOSFETs, or multiplex with 74HC595 + 74HC138 etc.&lt;/p&gt;

&lt;p&gt;For many simultaneously-on LEDs, use a separate 5 V supply for the LED side and common GND with Arduino.&lt;/p&gt;

&lt;p&gt;Avoid “Random LEDs on at Power-Up”&lt;/p&gt;

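&lt;p&gt;At power-up, both the shift register and the output (storage) register hold undefined contents, so random outputs can be on until the first latch.&lt;/p&gt;

&lt;p&gt;Software fix: immediately write zeros and latch them in setup(). LEDs can still flash briefly while the Arduino boots, before setup() runs.&lt;/p&gt;

&lt;p&gt;Hardware fix: tie MR to an Arduino pin, briefly drive it LOW→HIGH after boot, then pulse ST_CP. MR clears only the shift register; the outputs follow on the next latch.&lt;/p&gt;

&lt;p&gt;No-glare boot: hold OE HIGH (outputs off), preload zeros, then set OE LOW. Arduino pins float during reset, so add a pull-up resistor (e.g., 10 kΩ) from OE to 5 V. This keeps the outputs off until your code drives OE.&lt;/p&gt;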

&lt;p&gt;Why PWM on OE Beats a Series Pot&lt;/p&gt;

&lt;p&gt;Putting a pot in series with all LEDs changes current and can cause color mismatch and uneven brightness. Driving OE from a PWM pin keeps per-LED resistors fixed and dims everything uniformly via duty cycle. (Remember: OE is active LOW. More PWM duty = darker unless you invert it in code.)&lt;/p&gt;
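&lt;p&gt;Because OE is active LOW, the value passed to analogWrite() sets the fraction of time the outputs are off. Here is a minimal sketch of the mapping from 0–100 % brightness to that value. It assumes the 8-bit (0–255) analogWrite() range of AVR Arduinos. oePwmValue is a hypothetical name.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stdint.h>

// Map brightness (0-100 %) to an analogWrite() value for an active-LOW OE pin.
// 100 % -> OE LOW all the time -> 0; 0 % -> OE HIGH all the time -> 255.
inline uint8_t oePwmValue(int brightnessPercent) {
    if (brightnessPercent < 0) brightnessPercent = 0;
    if (brightnessPercent > 100) brightnessPercent = 100;
    int enabledDuty = (brightnessPercent * 255 + 50) / 100;  // rounded "outputs on" duty
    return static_cast<uint8_t>(255 - enabledDuty);
}
```
]]>

&lt;p&gt;For example, analogWrite(OE_PIN, oePwmValue(30)) keeps the outputs enabled about 30 % of the time.&lt;/p&gt;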

&lt;p&gt;Step-by-Step Assembly&lt;/p&gt;

&lt;p&gt;Place the two &lt;a href="https://mozelectronics.com/parts/texas-instruments-sn74hc595n-3104/" rel="noopener noreferrer"&gt;SN74HC595N&lt;/a&gt; on the breadboard with power rails connected; add 0.1 µF caps per chip.&lt;/p&gt;

&lt;p&gt;Wire Arduino D11→DS(14), D12→SH_CP(11), D8→ST_CP(12). Tie OE(13)→GND and MR(10)→5 V (or to Arduino pins as described).&lt;/p&gt;

&lt;p&gt;Connect Chip 1 Q7′(9)→Chip 2 DS(14).&lt;/p&gt;

&lt;p&gt;Add resistors from each Q pin to its LED, and LEDs back to GND.&lt;/p&gt;

&lt;p&gt;Double-check power and grounds; verify no shorts.&lt;/p&gt;

&lt;p&gt;Upload the sketch and test.&lt;/p&gt;

&lt;p&gt;The Code (drop-in, scalable)&lt;/p&gt;

&lt;p&gt;Change NUM_REGS to match how many 74HC595s you chained.&lt;/p&gt;

&lt;p&gt;Use regWrite(channel, state) to set any LED.&lt;/p&gt;

&lt;p&gt;flush() pushes the entire state array out.&lt;/p&gt;

&lt;p&gt;Optional OE_PIN for global brightness via analogWrite.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// ===== User Configuration =====
const int DATA_PIN  = 11;  // DS    -&amp;gt; 74HC595 pin 14
const int CLOCK_PIN = 12;  // SH_CP -&amp;gt; 74HC595 pin 11
const int LATCH_PIN = 8;   // ST_CP -&amp;gt; 74HC595 pin 12

const int MR_PIN = -1;     // Tie to 5V or assign a pin (active LOW). -1 = tied HIGH.
const int OE_PIN = -1;     // Tie to GND or assign a PWM pin (active LOW). -1 = tied LOW.

const int NUM_REGS = 2;    // 2 chips = 16 channels; set to your chain length
// ==============================

byte regs[NUM_REGS]; // Each byte is Q0..Q7; bit 0 -&amp;gt; Q0, bit 7 -&amp;gt; Q7

// Rising edge on ST_CP copies the shift registers to the outputs
inline void latch() {
  digitalWrite(LATCH_PIN, LOW);
  digitalWrite(LATCH_PIN, HIGH);
}

// Push regs[] to the chain (send farthest chip first)
void flush() {
  digitalWrite(LATCH_PIN, LOW);
  for (int i = NUM_REGS - 1; i &amp;gt;= 0; --i) {
    shiftOut(DATA_PIN, CLOCK_PIN, MSBFIRST, regs[i]);
  }
  digitalWrite(LATCH_PIN, HIGH);  // rising edge: all outputs update together
}

void clearAll() {
  for (int i = 0; i &amp;lt; NUM_REGS; ++i) regs[i] = 0;
  flush();
}

// Set one channel (0..NUM_REGS*8-1), then flush
void regWrite(int channel, bool state) {
  if (channel &amp;lt; 0 || channel &amp;gt;= NUM_REGS * 8) return;
  int r = channel / 8;         // which register
  int b = channel % 8;         // which bit (0=Q0 .. 7=Q7)
  bitWrite(regs[r], b, state);
  flush();
}

// Optionally set all 8-bit registers at once then flush (length must be NUM_REGS)
void writeAll(const byte* values) {
  for (int i = 0; i &amp;lt; NUM_REGS; ++i) regs[i] = values[i];
  flush();
}

void setup() {
  pinMode(DATA_PIN,  OUTPUT);
  pinMode(CLOCK_PIN, OUTPUT);
  pinMode(LATCH_PIN, OUTPUT);

  if (OE_PIN &amp;gt;= 0) {
    digitalWrite(OE_PIN, HIGH); // outputs off (active LOW) while the registers are loaded
    pinMode(OE_PIN, OUTPUT);    // set the level first so the pin never drives LOW here
  }
  if (MR_PIN &amp;gt;= 0) {
    digitalWrite(MR_PIN, HIGH); // not in reset (LOW would clear the shift register)
    pinMode(MR_PIN, OUTPUT);
  }

  // Clean startup: shift in zeros and latch them
  clearAll();

  // If you wired MR to a pin and want a hardware reset at boot. MR clears only
  // the shift register, so latch afterwards to copy the zeros to the outputs:
  // if (MR_PIN &amp;gt;= 0) { digitalWrite(MR_PIN, LOW); delay(1); digitalWrite(MR_PIN, HIGH); latch(); }

  if (OE_PIN &amp;gt;= 0) digitalWrite(OE_PIN, LOW); // registers are clean: enable outputs
}

void loop() {
  const int N = NUM_REGS * 8;

  // 1) Light up one-by-one, then switch off in reverse
  for (int i = 0; i &amp;lt; N; ++i) { regWrite(i, true); delay(60); }
  delay(200);
  for (int i = N - 1; i &amp;gt;= 0; --i) { regWrite(i, false); delay(40); }
  delay(200);

  // 2) Single "runner" back and forth
  clearAll();
  for (int pass = 0; pass &amp;lt; 2; ++pass) {
    for (int i = 0; i &amp;lt; N; ++i) { clearAll(); regWrite(i, true); delay(40); }
    for (int i = N - 1; i &amp;gt;= 0; --i) { clearAll(); regWrite(i, true); delay(40); }
  }

  // 3) Even/odd blink pattern
  byte evenMask = 0b01010101; // Q0,2,4,6
  byte oddMask  = 0b10101010; // Q1,3,5,7
  for (int k = 0; k &amp;lt; 6; ++k) {
    for (int r = 0; r &amp;lt; NUM_REGS; ++r) regs[r] = (k % 2 == 0) ? evenMask : oddMask;
    flush();
    delay(180);
  }

  // 4) Global brightness sweep via OE (uncomment if OE is on a PWM pin)
  // NOTE: OE is active LOW. We invert the duty with (255 - d).
  /*
  if (OE_PIN &amp;gt;= 0) {
    for (int r = 0; r &amp;lt; NUM_REGS; ++r) regs[r] = 0xFF; // all on
    flush();
    for (int d = 0; d &amp;lt;= 255; d += 5) { analogWrite(OE_PIN, 255 - d); delay(8); }
    for (int d = 255; d &amp;gt;= 0; d -= 5) { analogWrite(OE_PIN, 255 - d); delay(8); }
    analogWrite(OE_PIN, 0); // the sweep ends fully dark: restore full brightness
  }
  */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Scaling to More LEDs&lt;/p&gt;

&lt;p&gt;Hardware: Chain Q7′ of the last chip to DS of the new chip; share clock, latch, OE, MR, VCC, GND.&lt;/p&gt;

&lt;p&gt;Software: Set NUM_REGS to your new count. Your channel numbers keep increasing linearly (e.g., with 3 chips, channels 0–23).&lt;/p&gt;
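&lt;p&gt;You can size a longer chain quickly. Each chip adds 8 channels, and each full update clocks 8 × NUM_REGS bits before one latch, so update time grows linearly with chain length. The sketch below assumes a constant bit rate that you supply, for example from timing flush() on a scope. It does not know the actual speed of shiftOut() or SPI on your board. The function names are hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// Number of 74HC595s needed for 'leds' channels (8 outputs per chip, rounded up).
inline int chipsNeeded(int leds) {
    assert(leds >= 0);
    return (leds + 7) / 8;
}

// Microseconds to clock a whole chain at 'bitRateHz' (latch pulse ignored).
inline double updateTime_us(int numRegs, double bitRateHz) {
    assert(numRegs > 0 && bitRateHz > 0.0);
    return numRegs * 8 * 1e6 / bitRateHz;
}
```
]]>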

&lt;p&gt;Troubleshooting&lt;/p&gt;

&lt;p&gt;Some LEDs randomly on at power-up&lt;br&gt;
Initialize with clearAll() in setup(); optionally wire MR and/or OE to Arduino pins as explained.&lt;/p&gt;

&lt;p&gt;Nothing lights&lt;br&gt;
Check VCC/GND, verify latch wiring (ST_CP). Make sure you call flush() or use regWrite() which calls it for you.&lt;/p&gt;

&lt;p&gt;Only first 8 work&lt;br&gt;
Q7′(pin 9) of chip 1 must go to DS(pin 14) of chip 2. Also confirm you’re shifting MSBFIRST and sending the last register first in the loop.&lt;/p&gt;

&lt;p&gt;Flicker or unreliable updates with many chips&lt;br&gt;
Lower the clock rate (shiftOut() is already fairly slow; add small delays if needed), keep wires short, add decoupling capacitors, and ensure a solid ground. Consider buffering the shared clock and latch lines if the chain gets long.&lt;/p&gt;

&lt;p&gt;Uneven brightness&lt;br&gt;
Use individual current-limiting resistors per LED and dim via OE PWM, not a shared series potentiometer.&lt;/p&gt;

&lt;p&gt;Frequently Asked (Useful) Variations&lt;/p&gt;

&lt;p&gt;Can I multiplex instead to reduce current and chips?&lt;br&gt;
Yes. For matrices (e.g., 8×8), pair the 74HC595 with a row/column driver (like ULN2803, TPIC6B595, or a 74HC138) and scan rows. The code is different but very scalable.&lt;/p&gt;

&lt;p&gt;What about SPI for speed?&lt;br&gt;
You can wire DS→MOSI and SH_CP→SCK (D11 and D13 on an Uno, so the clock moves from D12 to D13) and keep toggling ST_CP manually as the latch. Then use SPI.transfer() in SPI mode 0, MSB first, for much faster updates than shiftOut().&lt;/p&gt;

&lt;p&gt;Can I use 3.3 V boards?&lt;br&gt;
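The 74HC595 is specified for supplies from 2 V to 6 V, so it runs at 3.3 V. Power it from the board's 3.3 V rail. At 5 V, a 74HC595 needs about 0.7 × VCC = 3.5 V for a guaranteed HIGH input, which 3.3 V logic does not reach. Recompute the LED resistors for the lower supply.&lt;/p&gt;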
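&lt;p&gt;The logic-level concern can be checked numerically. HC-family inputs need roughly 0.7 × VCC to register a guaranteed HIGH. For example, TI's SN74HC595 specifies 3.15 V minimum at VCC = 4.5 V. The 0.7 factor below is that approximation, so confirm it against your part's V_IH table. guaranteedHigh is a hypothetical helper.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// Approximate HC-family V_IH(min) as 0.7 * VCC (assumption; check your datasheet).
inline bool guaranteedHigh(double driverHigh_V, double hcVcc_V) {
    assert(hcVcc_V > 0.0);
    return driverHigh_V >= 0.7 * hcVcc_V;
}
```
]]>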

&lt;p&gt;Summary&lt;/p&gt;

&lt;p&gt;Two 74HC595s = 16 LED channels using just 3 Arduino pins&lt;/p&gt;

&lt;p&gt;Global brightness via OE on a PWM pin is cleaner than a series potentiometer&lt;/p&gt;

&lt;p&gt;Add more chips: wire Q7′→DS, change NUM_REGS, done&lt;/p&gt;

&lt;p&gt;The provided code presents a universal channel interface and example effects you can extend&lt;/p&gt;


</description>
      <category>programming</category>
      <category>arduino</category>
    </item>
    <item>
      <title>STM32 Internal Temperature Sensor Reading (With DMA + Timer Trigger) — Complete Guide &amp; Example Code</title>
      <dc:creator>ai pics</dc:creator>
      <pubDate>Thu, 25 Sep 2025 02:48:23 +0000</pubDate>
      <link>https://dev.to/ai_pics_6442ad429fc2ff12f/stm32-internal-temperature-sensor-reading-with-dma-timer-trigger-complete-guide-example-code-cjk</link>
      <guid>https://dev.to/ai_pics_6442ad429fc2ff12f/stm32-internal-temperature-sensor-reading-with-dma-timer-trigger-complete-guide-example-code-cjk</guid>
      <description>&lt;p&gt;STM32 &lt;a href="https://mozelectronics.com/semiconductor-ics/embedded-processors-and-controllers/microcontrollers-mcus/" rel="noopener noreferrer"&gt;MCUs&lt;/a&gt; include a built-in temperature sensor wired to a dedicated ADC channel. It’s meant primarily for on-die temperature monitoring (trend/change detection), not precision ambient measurement. With the right sampling time, reference-voltage compensation, and a clean trigger, you can still get stable, repeatable readings that are good enough for system health checks, thermal throttling, and failsafe logic.&lt;/p&gt;

&lt;p&gt;This tutorial shows you how to:&lt;/p&gt;

&lt;p&gt;Enable the internal temperature sensor and VREFINT channels&lt;/p&gt;

&lt;p&gt;Trigger conversions at 50 Hz via TIM3 TRGO&lt;/p&gt;

&lt;p&gt;Stream two ADC channels via DMA (circular)&lt;/p&gt;

&lt;p&gt;Compensate for VDD changes using the VREFINT reading&lt;/p&gt;

&lt;p&gt;Convert VSENSE → °C using the datasheet equation&lt;/p&gt;

&lt;p&gt;Calibrate and stabilize results for real projects&lt;/p&gt;

&lt;p&gt;⚠️ Always check your exact MCU’s datasheet: the conversion equation and parameters (V25, Avg_Slope) vary by family/line and sometimes by revision.&lt;/p&gt;

&lt;p&gt;Table of Contents&lt;/p&gt;

&lt;p&gt;What the Internal Temp Sensor Measures&lt;/p&gt;

&lt;p&gt;Reading Flow &amp;amp; Conversion Equation&lt;/p&gt;

&lt;p&gt;Project Architecture (50 Hz pipeline)&lt;/p&gt;

&lt;p&gt;Step-by-Step CubeMX Configuration&lt;/p&gt;

&lt;p&gt;HAL Example Code (STM32F103-style)&lt;/p&gt;

&lt;p&gt;Calibration &amp;amp; Accuracy Tips&lt;/p&gt;

&lt;p&gt;Troubleshooting FAQ&lt;/p&gt;

&lt;p&gt;Wrap-Up&lt;/p&gt;

&lt;p&gt;1) What the Internal Temp Sensor Measures&lt;/p&gt;

&lt;p&gt;The sensor reports a voltage (VSENSE) proportional to the die temperature.&lt;/p&gt;

&lt;p&gt;It’s internally connected to a dedicated ADC channel.&lt;/p&gt;

&lt;p&gt;A second internal channel (VREFINT) exposes a stable bandgap reference used to estimate actual VDD and correct readings.&lt;/p&gt;

&lt;p&gt;It’s excellent for trend detection and thermal protection, but not meant as a lab-grade ambient probe.&lt;/p&gt;

&lt;p&gt;2) Reading Flow &amp;amp; Conversion Equation&lt;/p&gt;

&lt;p&gt;High-level steps:&lt;/p&gt;

&lt;p&gt;Enable TempSensor ADC channel&lt;/p&gt;

&lt;p&gt;Set sampling time ≥ 17 µs (per datasheet)&lt;/p&gt;

&lt;p&gt;Start ADC (ideally with a timer trigger)&lt;/p&gt;

&lt;p&gt;Read VSENSE and VREFINT&lt;/p&gt;

&lt;p&gt;Convert VSENSE → temperature using datasheet constants&lt;/p&gt;

&lt;p&gt;Typical equation style (family-specific):&lt;/p&gt;

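&lt;p&gt;Temperature (°C) = ((V25 - VSENSE) / Avg_Slope) + 25, with V25 and VSENSE in the same unit as the numerator of Avg_Slope (for example, both in mV when Avg_Slope is in mV/°C)&lt;/p&gt;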

&lt;p&gt;Where:&lt;/p&gt;

&lt;p&gt;V25 = sensor output at 25 °C (e.g., ~1.43 V on many F1 parts)&lt;/p&gt;

&lt;p&gt;Avg_Slope = mV/°C (e.g., ~4.3 mV/°C on many F1 parts)&lt;/p&gt;

&lt;p&gt;VSENSE = computed from raw ADC code with VREFINT-based VDD correction&lt;/p&gt;
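&lt;p&gt;Here is a minimal C++ sketch of the equation with explicit units: V25 in volts, Avg_Slope in mV/°C, and the voltage difference scaled to millivolts. The defaults are the typical F1 values quoted above; replace them with your datasheet's. tempFromVsense is a hypothetical name.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// Typical STM32F1 values (check your datasheet): V25 = 1.43 V, Avg_Slope = 4.3 mV/deg C.
inline float tempFromVsense(float vsense_V, float v25_V = 1.43f,
                            float avgSlope_mV_per_C = 4.3f) {
    assert(avgSlope_mV_per_C > 0.0f);
    // (V25 - VSENSE) is in volts; convert to mV before dividing by the mV/deg C slope.
    return ((v25_V - vsense_V) * 1000.0f) / avgSlope_mV_per_C + 25.0f;
}
```
]]>

&lt;p&gt;Note the sign: on F1 parts VSENSE falls as the die warms, so a reading below V25 means a temperature above 25 °C. For example, 1.34 V gives about 45.9 °C.&lt;/p&gt;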

&lt;p&gt;For devices that provide factory temperature calibration points (e.g., TS_CAL1/TS_CAL2 at known temperatures), prefer those over the generic V25/Avg_Slope constants.&lt;/p&gt;

&lt;p&gt;3) Project Architecture (50 Hz pipeline)&lt;/p&gt;

&lt;p&gt;We’ll build a stable pipeline with deterministic sampling and voltage compensation:&lt;/p&gt;

&lt;p&gt;TIM3 generates TRGO = Update at 50 Hz (period = 20 ms)&lt;/p&gt;

&lt;p&gt;ADC1 (regular group) is externally triggered by TRGO&lt;/p&gt;

&lt;p&gt;Regular conversions scan two internal channels: VREFINT then TempSensor&lt;/p&gt;

&lt;p&gt;DMA (circular) moves both results into memory every trigger&lt;/p&gt;

&lt;p&gt;In the ADC conversion complete callback, we flip a GPIO (rate probe) and set a flag&lt;/p&gt;

&lt;p&gt;In the main loop, we compute VDD, then VSENSE, then °C, and print via UART (115200)&lt;/p&gt;

&lt;p&gt;4) Step-by-Step CubeMX Configuration&lt;/p&gt;

&lt;p&gt;MCU/Board: e.g., STM32F103C8 (Blue Pill). The flow applies broadly; names may differ by family.&lt;/p&gt;

&lt;p&gt;RCC / Clock&lt;/p&gt;

&lt;p&gt;Use HSE (external crystal) → PLL → SYSCLK 72 MHz (typical for F103)&lt;/p&gt;

&lt;p&gt;Keep the ADC clock within the datasheet limit (14 MHz on STM32F103); from a 72 MHz PCLK2, the /6 prescaler gives 12 MHz&lt;/p&gt;
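&lt;p&gt;Picking the ADC prescaler is a small search. On F1 the ADC clock is PCLK2 divided by 2, 4, 6 or 8, and it must not exceed 14 MHz. This sketch assumes those F1 divider options; other families differ. pickAdcDivider is hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// Smallest STM32F1 ADC prescaler (2, 4, 6, 8) keeping ADCCLK <= maxAdcHz.
// Returns 0 if no divider is slow enough.
inline int pickAdcDivider(unsigned long pclk2Hz, unsigned long maxAdcHz = 14000000UL) {
    assert(pclk2Hz > 0);
    const int dividers[] = {2, 4, 6, 8};
    for (int d : dividers) {
        // Compare pclk2 / d <= max without integer-division rounding.
        if (pclk2Hz <= maxAdcHz * (unsigned long)d) return d;
    }
    return 0;
}
```
]]>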

&lt;p&gt;ADC1&lt;/p&gt;

&lt;p&gt;Regular conversions: 2 channels (VREFINT, TempSensor)&lt;/p&gt;

&lt;p&gt;Sampling time: choose the nearest ≥ 17 µs.&lt;/p&gt;

&lt;p&gt;Example: at 12 MHz ADC clock, 239.5 cycles ≈ 19.96 µs&lt;/p&gt;

&lt;p&gt;External trigger: TIM3 TRGO (Update event)&lt;/p&gt;

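&lt;p&gt;Scan mode enabled and continuous mode disabled, so each trigger converts both channels exactly once&lt;/p&gt;

&lt;p&gt;DMA: add one request for ADC1, circular mode, half-word width, memory increment enabled&lt;/p&gt;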
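&lt;p&gt;The sampling-time choice can be automated the same way. This sketch assumes the F1's selectable sample times (1.5 to 239.5 ADC cycles) and the 17 µs minimum quoted above. It also reports the total conversion time, which on F1 adds 12.5 cycles per channel. The names are hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>

// STM32F1 sample-time options, in ADC clock cycles.
static const double kF1SampleCycles[] = {1.5, 7.5, 13.5, 28.5, 41.5, 55.5, 71.5, 239.5};

// Smallest option giving at least minSample_us at adcHz; returns -1.0 if none does.
inline double pickSampleCycles(double adcHz, double minSample_us = 17.0) {
    assert(adcHz > 0.0);
    for (double c : kF1SampleCycles) {
        if (c / adcHz * 1e6 >= minSample_us) return c;
    }
    return -1.0;
}

// Total time for one channel: sampling + 12.5 cycles of successive approximation.
inline double conversionTime_us(double sampleCycles, double adcHz) {
    assert(adcHz > 0.0);
    return (sampleCycles + 12.5) / adcHz * 1e6;
}
```
]]>

&lt;p&gt;At 12 MHz the two-channel scan takes about 2 × 21 µs = 42 µs per trigger, far inside the 20 ms period.&lt;/p&gt;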

&lt;p&gt;TIM3&lt;/p&gt;

&lt;p&gt;Timer clock source = internal&lt;/p&gt;

&lt;p&gt;Set PSC and ARR so Update = 20 ms (50 Hz)&lt;/p&gt;

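&lt;p&gt;Example with a 72 MHz timer clock (on F103, APB1 runs at 36 MHz and its timer clock is doubled to 72 MHz): PSC = 23, ARR = 59999, since 72 MHz / (24 × 60000) = 50 Hz&lt;/p&gt;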

&lt;p&gt;TRGO = Update Event&lt;/p&gt;
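&lt;p&gt;The PSC/ARR pair can be found with a short search. The update rate is f_TIM / ((PSC + 1)(ARR + 1)), and both registers are 16-bit on TIM3. The sketch looks for the smallest prescaler that divides the required tick count exactly with ARR + 1 ≤ 65536. TimerSetting and findTimerSetting are hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stdint.h>

struct TimerSetting {
    uint32_t psc;  // PSC register value (divider - 1)
    uint32_t arr;  // ARR register value (period - 1)
    bool ok;       // false if no exact 16-bit solution exists
};

// Exact PSC/ARR for a 16-bit timer so that timerHz / ((psc+1)(arr+1)) == targetHz.
inline TimerSetting findTimerSetting(uint32_t timerHz, uint32_t targetHz) {
    assert(targetHz > 0);
    if (timerHz % targetHz != 0) return {0, 0, false};  // tick count is not an integer
    const uint32_t ticks = timerHz / targetHz;           // = (psc+1) * (arr+1)
    for (uint32_t div = 1; div <= 65536; ++div) {
        if (ticks % div == 0 && ticks / div <= 65536) {
            return {div - 1, ticks / div - 1, true};
        }
    }
    return {0, 0, false};
}
```
]]>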

&lt;p&gt;USART1&lt;/p&gt;

&lt;p&gt;115200 8N1 for logging&lt;/p&gt;

&lt;p&gt;GPIO&lt;/p&gt;

&lt;p&gt;One output (e.g., PB0) for sampling-rate verification (toggle in ADC ISR)&lt;/p&gt;

&lt;p&gt;NVIC&lt;/p&gt;

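&lt;p&gt;Enable the DMA1 Channel 1 interrupt (CubeMX normally does this when you add the ADC1 DMA request). With DMA, HAL_ADC_ConvCpltCallback() is called from the DMA transfer-complete interrupt, so the ADC1 global interrupt is optional&lt;/p&gt;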

&lt;p&gt;5) HAL Example Code (STM32F103-style)&lt;/p&gt;

&lt;p&gt;This example uses V25 = 1.43 V and Avg_Slope = 4.3 mV/°C, which are common for many F1 parts. Adjust to your datasheet. If your family provides VREFINT calibration or TS_CAL1/TS_CAL2, prefer those for accuracy.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/*
 * Demo: STM32 Internal Temperature Sensor (ADC + DMA + TIM3 TRGO @ 50 Hz)
 * Target style: STM32F103 (adjust constants &amp;amp; addresses to your MCU)
 */
#include "main.h"
#include &amp;lt;stdio.h&amp;gt;   /* snprintf */

/* === Datasheet Parameters (adjust!) === */
#define AVG_SLOPE_mV_per_C   (4.3f)     // mV/°C
#define V_AT_25C_V           (1.43f)    // V @ 25°C
#define VREFINT_TYP_V        (1.20f)    // Typical internal reference (only if no cal value)

/* HAL handles (CubeMX will generate these) */
ADC_HandleTypeDef   hadc1;
DMA_HandleTypeDef   hdma_adc1;
TIM_HandleTypeDef   htim3;
UART_HandleTypeDef  huart1;

/* DMA target in rank order: [0] = VREFINT ADC code, [1] = VSENSE ADC code */
static volatile uint16_t adc_buf[2];
static volatile uint8_t  new_sample = 0;

/* App state */
static float vref_V = 0.0f;
static float vsense_V = 0.0f;
static float temperature_C = 0.0f;

static char line[48];

/* Prototypes generated by CubeMX */
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_DMA_Init(void);
static void MX_ADC1_Init(void);
static void MX_TIM3_Init(void);
static void MX_USART1_UART_Init(void);

/* === Optional: If your device has VREFINT calibration, read it here ===
 * Many non-F1 families define VREFINT_CAL_ADDR &amp;amp; VREFINT_CAL_VREF in headers.
 * For plain F1, you may not have this and must use VREFINT_TYP_V.
 */
// #define VREFINT_CAL_ADDR ((uint16_t*)0x1FFFxxxx) // family-specific
// #define VREFINT_CAL_VREF (3.0f)                  // e.g., 3.0 V or per docs

int main(void)
{
  HAL_Init();
  SystemClock_Config();
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_ADC1_Init();
  MX_TIM3_Init();
  MX_USART1_UART_Init();

  /* Calibrate &amp;amp; start ADC in DMA circular mode (F1 HAL signature; other families differ) */
  HAL_ADCEx_Calibration_Start(&amp;amp;hadc1);
  HAL_ADC_Start_DMA(&amp;amp;hadc1, (uint32_t *)adc_buf, 2);

  /* Start the 50 Hz trigger last, so the first TRGO finds the ADC ready */
  HAL_TIM_Base_Start(&amp;amp;htim3);

  for (;;)
  {
    if (new_sample)
    {
      new_sample = 0;

      const float adc_fullscale = 4095.0f;
      const float vrefint_code  = (float)adc_buf[0];
      const float vsense_code   = (float)adc_buf[1];

      /* Estimate effective VDD using VREFINT.
         Without a factory cal value, approximate with the typical Vrefint:
         VDD ≈ (VREFINT_TYP_V * adc_fullscale) / ADC[VREFINT] */
      float vdd_V = (VREFINT_TYP_V * adc_fullscale) / (vrefint_code &amp;gt; 0.5f ? vrefint_code : 0.5f);

      /* Now compute VSENSE in volts using that VDD */
      vref_V   = vdd_V;                                     // ADC reference = VDDA here
      vsense_V = (vsense_code * vref_V) / adc_fullscale;

      /* Convert to temperature (°C). Avg_Slope is in mV/°C */
      temperature_C = (((V_AT_25C_V - vsense_V) * 1000.0f) / AVG_SLOPE_mV_per_C) + 25.0f;

      /* Print one line per sample (for Serial Plotter/Monitor).
         With newlib-nano, %f needs the linker flag -u _printf_float. */
      int n = snprintf(line, sizeof(line), "%.2f\r\n", temperature_C);
      if (n &amp;gt; 0 &amp;amp;&amp;amp; n &amp;lt; (int)sizeof(line))
      {
        HAL_UART_Transmit(&amp;amp;huart1, (uint8_t *)line, (uint16_t)n, 50);
      }
    }
  }
}

/* ADC end-of-conversion callback: one pair (VREFINT, VSENSE) ready */
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
  if (hadc-&amp;gt;Instance == ADC1)
  {
    HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_0); // Scope this to verify 50 Hz rate
    new_sample = 1;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notes on accuracy upgrades (when your MCU supports it):&lt;/p&gt;

&lt;p&gt;If your device provides VREFINT_CAL (a factory ADC code measured at a known VDD, e.g., 3.0 V), then compute:&lt;br&gt;
VDD = (VREF_KNOWN * VREFINT_CAL_CODE) / ADC[VREFINT]&lt;br&gt;
This removes the approximation of VREFINT_TYP_V.&lt;/p&gt;
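&lt;p&gt;In code, the calibrated VDD estimate is a one-liner. The sketch below stays host-testable by taking the calibration code as a parameter instead of reading it from the family-specific address. The 3.0 V reference voltage is an assumption, since some families calibrate at 3.3 V. The sketch also assumes the ADC reference (VREF+) equals VDDA. vddFromVrefint and codeToVolts are hypothetical names.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stdint.h>

// VDD = VREF_KNOWN * VREFINT_CAL / ADC[VREFINT]
// vrefintCal:  factory code read from the part's calibration address (family-specific)
// vrefKnown_V: VDD at which that code was measured (assumed 3.0 V; check your manual)
inline float vddFromVrefint(uint16_t vrefintMeasured, uint16_t vrefintCal,
                            float vrefKnown_V = 3.0f) {
    assert(vrefintMeasured > 0);
    return vrefKnown_V * (float)vrefintCal / (float)vrefintMeasured;
}

// Convert a raw 12-bit code to volts, assuming VREF+ == VDDA == vdd_V.
inline float codeToVolts(uint16_t code, float vdd_V, float fullScale = 4095.0f) {
    return (float)code * vdd_V / fullScale;
}
```
]]>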

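&lt;p&gt;Some devices expose TS_CAL1 and TS_CAL2: factory ADC codes taken at two known temperatures, commonly 30 °C and 110 °C or 130 °C depending on the family. On those parts, first scale the measured code to the VDDA used during calibration. Then compute the slope from the two points and interpolate linearly. This is typically more accurate than using V25 + Avg_Slope.&lt;/p&gt;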
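&lt;p&gt;Here is a sketch of the two-point interpolation. It assumes the calibration codes were taken at a known VDDA, often 3.0 V, so the measured code is first rescaled to that supply. Calibration temperatures, addresses and calibration VDDA are family-specific, so they are parameters here rather than constants. tempFromTsCal is hypothetical.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stdint.h>

// Two-point temperature from factory calibration codes.
// tsData:    raw ADC code from the temperature channel
// vdda_V:    present VDDA (e.g. from the VREFINT calculation)
// calVdda_V: VDDA at which TS_CAL1/TS_CAL2 were measured (family-specific)
inline float tempFromTsCal(uint16_t tsData, float vdda_V,
                           uint16_t tsCal1, float tCal1_C,
                           uint16_t tsCal2, float tCal2_C,
                           float calVdda_V = 3.0f) {
    assert(tsCal2 != tsCal1 && calVdda_V > 0.0f);
    // Rescale the reading to what it would be at the calibration supply voltage.
    const float scaled = (float)tsData * vdda_V / calVdda_V;
    const float slope  = (tCal2_C - tCal1_C) / ((float)tsCal2 - (float)tsCal1);  // deg C per code
    return slope * (scaled - (float)tsCal1) + tCal1_C;
}
```
]]>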

&lt;p&gt;6) Calibration &amp;amp; Accuracy Tips&lt;/p&gt;

&lt;p&gt;Use the right sampling time&lt;br&gt;
The temp channel needs ≥ 17 µs of sampling time. With less, the sampling capacitor does not fully settle, and readings become noisy and biased.&lt;/p&gt;

&lt;p&gt;Compensate VDD with VREFINT&lt;br&gt;
Always read VREFINT alongside VSENSE and correct for VDD changes.&lt;/p&gt;

&lt;p&gt;Factory calibration beats typical constants&lt;br&gt;
Prefer TS_CAL1/TS_CAL2 and VREFINT_CAL when your part provides them. They capture per-die variation.&lt;/p&gt;

&lt;p&gt;Thermal reality check&lt;br&gt;
The internal sensor reports die temperature. CPU load, flash waits, and DC/DC activity heat the silicon. It will not match a distant ambient probe.&lt;/p&gt;

&lt;p&gt;Averaging &amp;amp; rate&lt;br&gt;
A simple moving average (e.g., 8–16 samples) helps. Don’t oversample; 10–50 Hz is plenty for thermal trends.&lt;/p&gt;
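&lt;p&gt;Here is a minimal fixed-size moving average for the main loop. It assumes 8 samples (within the 8–16 suggested) and float temperatures. The MovingAverage template is a hypothetical helper, not part of HAL.&lt;/p&gt;

<![CDATA[
```cpp
#include <assert.h>
#include <stddef.h>

// Simple moving average over the last N samples (ring buffer + running sum).
template <size_t N>
class MovingAverage {
    static_assert(N > 0, "N must be positive");

public:
    float push(float x) {
        sum_ += x - buf_[idx_];       // swap the oldest sample out of the running sum
        buf_[idx_] = x;
        idx_ = (idx_ + 1) % N;
        if (count_ < N) ++count_;
        return sum_ / (float)count_;  // average of the samples seen so far (up to N)
    }

private:
    float buf_[N] = {};
    float sum_ = 0.0f;
    size_t idx_ = 0;
    size_t count_ = 0;
};
```
]]>

&lt;p&gt;Usage: declare MovingAverage&amp;lt;8&amp;gt; filt; once, then print filt.push(temperature_C) instead of the raw value. Over very long runs a running float sum can accumulate rounding error. Recomputing it occasionally, or summing raw integer ADC codes instead, avoids that.&lt;/p&gt;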

&lt;p&gt;One-time alignment&lt;br&gt;
If absolute accuracy matters, co-calibrate with a known good external sensor placed near the MCU package and fit offset/slope.&lt;/p&gt;

&lt;p&gt;7) Troubleshooting FAQ&lt;/p&gt;

&lt;p&gt;Q: My reading is noisy or jumps a lot.&lt;/p&gt;

&lt;p&gt;Increase sampling time (e.g., 239.5 cycles).&lt;/p&gt;

&lt;p&gt;Average multiple samples.&lt;/p&gt;

&lt;p&gt;Check that the ADC is triggered by TIM3 TRGO; software-triggered, irregular sampling adds timing jitter.&lt;/p&gt;

&lt;p&gt;Q: Numbers drift when VDD changes.&lt;/p&gt;

&lt;p&gt;You’re likely not using VREFINT compensation. Read VREFINT every cycle.&lt;/p&gt;

&lt;p&gt;Q: I get obviously wrong temperatures (e.g., –20 °C at room).&lt;/p&gt;

&lt;p&gt;Check channel order (VREFINT vs TempSensor).&lt;/p&gt;

&lt;p&gt;Verify reference equation constants (V25, Avg_Slope) match your MCU.&lt;/p&gt;

&lt;p&gt;Confirm ADC clock/dividers and resolution.&lt;/p&gt;

&lt;p&gt;Q: How do I validate 50 Hz sampling?&lt;/p&gt;

&lt;p&gt;Toggle a GPIO in HAL_ADC_ConvCpltCallback() and measure on a scope. You should see 20 ms between edges.&lt;/p&gt;

&lt;p&gt;Q: Can I do this without DMA?&lt;/p&gt;

&lt;p&gt;Yes, poll or interrupt per conversion, but DMA keeps the CPU free and stores the two results of each scan in a fixed buffer order.&lt;/p&gt;

&lt;p&gt;8) Wrap-Up&lt;/p&gt;

&lt;p&gt;With timer-triggered ADC, DMA, and VREFINT compensation, STM32’s internal temperature &lt;a href="https://mozelectronics.com/sensors/" rel="noopener noreferrer"&gt;sensor&lt;/a&gt; becomes a reliable trend monitor for thermal management and safety logic. For tighter absolute numbers, lean on factory calibration points (when present) and perform a quick in-system alignment against a trusted external probe.&lt;/p&gt;


</description>
      <category>programming</category>
      <category>stm32</category>
      <category>code</category>
    </item>
    <item>
      <title>How to Optimize HLS Designs for FPGAs (A Practical, Vendor-Agnostic Playbook)</title>
      <dc:creator>ai pics</dc:creator>
      <pubDate>Mon, 22 Sep 2025 08:39:19 +0000</pubDate>
      <link>https://dev.to/ai_pics_6442ad429fc2ff12f/how-to-optimize-hls-designs-for-fpgas-a-practical-vendor-agnostic-playbook-49k0</link>
      <guid>https://dev.to/ai_pics_6442ad429fc2ff12f/how-to-optimize-hls-designs-for-fpgas-a-practical-vendor-agnostic-playbook-49k0</guid>
      <description>&lt;p&gt;Optimizing High-Level Synthesis (HLS) for &lt;a href="https://mozelectronics.com/semiconductor-ics/embedded-processors-and-controllers/fpgas/" rel="noopener noreferrer"&gt;FPGAs&lt;/a&gt; is about turning C/C++ into RTL that meets your throughput, latency, area, and power targets—without breaking correctness. Below is a concise, field-tested checklist you can apply in Vitis HLS (Xilinx), Intel HLS, Catapult, etc. Examples use Vitis HLS-style pragmas, with notes for portability.&lt;/p&gt;

&lt;p&gt;1) Know the Optimization Stack&lt;/p&gt;

&lt;p&gt;Algorithm level – choose math/data representations that minimize work.&lt;/p&gt;

&lt;p&gt;Loop &amp;amp; task level – expose parallelism (pipeline, unroll, dataflow).&lt;/p&gt;

&lt;p&gt;Memory &amp;amp; I/O – feed the beast (partition, reshape, burst, stream).&lt;/p&gt;

&lt;p&gt;Micro-architecture – bind operators/memories, balance latencies, share resources.&lt;/p&gt;

&lt;p&gt;Closure – verify (C/COSIM), analyze (util/timing/II/latency), iterate.&lt;/p&gt;

&lt;p&gt;2) Numerics &amp;amp; Code Structure&lt;/p&gt;

&lt;p&gt;Use bit-accurate fixed types&lt;/p&gt;

&lt;p&gt;Prefer ap_(u)int / ap_fixed (or vendor equivalents) over float/double when error budget allows.&lt;/p&gt;

&lt;p&gt;Right-size widths aggressively to cut LUTs, FFs, and DSP usage.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include "ap_int.h"
#include "ap_fixed.h"

using pix_t   = ap_uint&amp;lt;10&amp;gt;;      // example: 10-bit pixel
using coeff_t = ap_fixed&amp;lt;16,2&amp;gt;;   // 16 bits total: 2 integer (incl. sign), 14 fractional
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
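&lt;p&gt;To see what ap_fixed&amp;lt;16,2&amp;gt; actually stores, here is a plain C++ emulation of it; this is not the ap_fixed class itself. It models a signed 16-bit word with 2 integer bits, giving a resolution of 2^-14 and a range of [-2, 2). It assumes the ap_fixed defaults: truncation toward minus infinity (AP_TRN) and wrap-around on overflow (AP_WRAP). toQ2_14 and fromQ2_14 are hypothetical names.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Emulate ap_fixed<16,2>: 16-bit two's complement, 2 integer bits (incl. sign), 14 fraction bits.
constexpr int kFracBits = 14;

// Quantize with truncation toward -infinity (AP_TRN) and wrap on overflow (AP_WRAP).
inline std::int16_t toQ2_14(double x) {
    assert(std::isfinite(x));
    const double scaled = std::floor(x * (1 << kFracBits));  // AP_TRN
    const std::int64_t raw = static_cast<std::int64_t>(scaled);
    // AP_WRAP: keep only the low 16 bits, reinterpreted as signed.
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(raw & 0xFFFF));
}

inline double fromQ2_14(std::int16_t raw) {
    return static_cast<double>(raw) / (1 << kFracBits);
}
```
]]>

&lt;p&gt;A value of 2.0 silently wraps to -2.0, and tiny negative values truncate to -2^-14. Your error budget has to account for both effects when you narrow a type.&lt;/p&gt;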

&lt;p&gt;Make dependencies obvious (or remove them)&lt;/p&gt;

&lt;p&gt;Keep hot loops simple; hoist conditionals outside loops when possible.&lt;/p&gt;

&lt;p&gt;Replace complex if/else trees on the critical path with tables or precomputed constants where sensible.&lt;/p&gt;

&lt;p&gt;Use const, restrict (where safe), and pass-by-reference to help the compiler infer no-aliasing and enable bursting.&lt;/p&gt;
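&lt;p&gt;To illustrate replacing a branch tree with a table, here is a hypothetical 3-bit classifier written both ways in plain C++. Both functions are made up for the example. In HLS the table version usually maps to a small ROM or LUT, but check your tool's report to confirm it actually helps.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstdint>

// Branchy version: a chain of comparisons on the critical path.
inline std::uint8_t weightBranchy(std::uint8_t code3) {
    if (code3 == 0) return 0;
    else if (code3 < 3) return 1;
    else if (code3 < 6) return 4;
    else return 9;
}

// Table version: one indexed read of a constant array (only the low 3 bits are used).
inline std::uint8_t weightTable(std::uint8_t code3) {
    static const std::uint8_t kWeights[8] = {0, 1, 1, 4, 4, 4, 9, 9};
    return kWeights[code3 & 0x7];
}
```
]]>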

&lt;p&gt;3) Loop-Level Optimization&lt;/p&gt;

&lt;p&gt;Pipeline first&lt;/p&gt;

&lt;p&gt;Goal: II=1 on the critical loop whenever feasible.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (int i = 0; i &amp;lt; N; i++) {
#pragma HLS PIPELINE II=1
  // body with no loop-carried true dependencies
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
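&lt;p&gt;Why II=1 matters: a common first-order model of a pipelined loop is (trip count - 1) x II + depth cycles, where depth is the latency of one iteration. The sketch below assumes that model and ignores loop entry and exit overhead; the trip count and depth in the usage note are made up for illustration.&lt;/p&gt;

```cpp
// Approximate cycle count of a pipelined loop: the first iteration takes
// `depth` cycles, and each later iteration starts `ii` cycles after the
// previous one. Loop entry/exit overhead is ignored (model assumption).
constexpr long long pipelined_loop_cycles(long long trip_count, int ii, int depth) {
  return trip_count == 0 ? 0 : (trip_count - 1) * ii + depth;
}

// Without pipelining, each iteration runs to completion before the next starts.
constexpr long long sequential_loop_cycles(long long trip_count, int depth) {
  return trip_count * depth;
}
```

&lt;p&gt;With 1024 iterations and a 5-cycle body, II=1 gives 1028 cycles, II=2 gives 2051, and no pipelining gives 5120.&lt;/p&gt;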

&lt;p&gt;Tip: If HLS cannot reach II=1, the synthesis log states why the scheduler had to relax it. The usual causes:&lt;/p&gt;

&lt;p&gt;Memory port conflicts → partition/reshape arrays or widen the data path.&lt;/p&gt;

&lt;p&gt;Loop-carried dependency (RAW/WAR/WAW) → restructure buffers or prove independence:&lt;/p&gt;

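&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Inside the loop body. Only add this if accesses to buf really never conflict
// across iterations: a false DEPENDENCE pragma can produce incorrect hardware
// that C simulation will not catch.
#pragma HLS DEPENDENCE variable=buf inter false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;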
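&lt;p&gt;When the dependence is real, as with an accumulator, the pragma cannot help and the loop has to be restructured instead. One common restructuring, sketched here in plain C++, splits the sum into independent partial sums; kLanes = 4 is an assumed value that you would match to the adder latency.&lt;/p&gt;

```cpp
// Serial form: every iteration reads the sum written by the previous one.
// If the add takes L cycles, that read-after-write chain limits II to about L.
constexpr double sum_serial(const double* x, int n) {
  double acc = 0.0;
  for (int i = 0; n > i; ++i) acc += x[i];
  return acc;
}

// Restructured form: kLanes independent partial sums. Iteration i only
// depends on iteration i - kLanes, so the adder latency can overlap.
constexpr int kLanes = 4;

constexpr double sum_interleaved(const double* x, int n) {
  double part[kLanes] = {0.0, 0.0, 0.0, 0.0};
  for (int i = 0; n > i; ++i) part[i % kLanes] += x[i];
  double acc = 0.0;
  for (int k = 0; kLanes > k; ++k) acc += part[k];  // final combine
  return acc;
}
```

&lt;p&gt;With floating-point data this reorders the additions, so results can differ from the serial sum in the last bits; for integer data the two are identical.&lt;/p&gt;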

&lt;p&gt;Unroll to trade area for throughput&lt;/p&gt;

&lt;p&gt;Partial unroll to match available memory banks/ports; full unroll only if you can feed it.&lt;/p&gt;

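&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (int k = 0; k &amp;lt; K; k++) {
#pragma HLS UNROLL factor=4
  ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;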
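&lt;p&gt;Matching the unroll factor to the number of banks can be checked with a little arithmetic. This plain C++ sketch assumes a cyclically partitioned array and an unrolled loop that touches consecutive elements. If each bank is a dual-port BRAM, up to two accesses per bank per cycle can be served; anything more forces a higher II.&lt;/p&gt;

```cpp
// Cyclic partitioning with factor F puts element k in bank k % F.
constexpr int cyclic_bank(int k, int factor) { return k % factor; }

// Worst-case number of the `unroll` consecutive accesses issued in one cycle
// that land in the same bank. The starting index only rotates which bank is
// busiest, so j starts at 0.
constexpr int max_accesses_per_bank(int unroll, int factor) {
  int worst = 0;
  for (int b = 0; factor > b; ++b) {
    int hits = 0;
    for (int j = 0; unroll > j; ++j)
      if (cyclic_bank(j, factor) == b) ++hits;
    if (hits > worst) worst = hits;
  }
  return worst;
}
```

&lt;p&gt;Unroll 4 over 4 banks puts one access on each bank; unroll 8 over 4 banks needs two ports per bank, which a dual-port BRAM can still serve; unroll 8 over 2 banks needs four and will not reach II=1.&lt;/p&gt;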

&lt;p&gt;Tile / block for locality&lt;/p&gt;

&lt;p&gt;Break large loops into tiles that fit BRAM/URAM; combine with on-chip buffers to reduce DDR traffic.&lt;/p&gt;

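&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (int ii = 0; ii &amp;lt; N; ii += Ti)
  for (int jj = 0; jj &amp;lt; M; jj += Tj)
    compute_tile(ii, jj);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;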
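&lt;p&gt;The tile loops above only cover the whole array when N and M are multiples of the tile sizes. A minimal plain C++ sketch of the usual fix clamps the last tile in each dimension. The sizes and the element() data are hypothetical stand-ins; the check is simply that the tiled traversal visits every element exactly once.&lt;/p&gt;

```cpp
// Hypothetical problem and tile sizes; kN and kM are deliberately not
// multiples of the tile sizes, so the edge tiles are partial.
constexpr int kN = 5, kM = 7;
constexpr int kTi = 2, kTj = 3;

constexpr int element(int i, int j) { return 10 * i + j; }  // stand-in data
constexpr int min_int(int a, int b) { return b > a ? a : b; }

constexpr long reference_sum() {
  long total = 0;
  for (int i = 0; kN > i; ++i)
    for (int j = 0; kM > j; ++j) total += element(i, j);
  return total;
}

// Same traversal, tile by tile. In an HLS kernel, the body of the (ii, jj)
// loop would copy one tile into an on-chip buffer, process it, and write it back.
constexpr long tiled_sum() {
  long total = 0;
  for (int ii = 0; kN > ii; ii += kTi)
    for (int jj = 0; kM > jj; jj += kTj) {
      const int i_end = min_int(ii + kTi, kN);  // clamp the partial edge tile
      const int j_end = min_int(jj + kTj, kM);
      for (int i = ii; i_end > i; ++i)
        for (int j = jj; j_end > j; ++j) total += element(i, j);
    }
  return total;
}
```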

&lt;p&gt;Help the estimator&lt;/p&gt;

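&lt;p&gt;LOOP_TRIPCOUNT tells the latency report how many iterations to assume for loops with variable bounds. It is for analysis only and does not change the synthesized hardware.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (int i = 0; i &amp;lt; n; i++) {   // n is only known at run time
#pragma HLS LOOP_TRIPCOUNT min=64 max=128
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;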

&lt;p&gt;4) Task-Level Concurrency (DATAFLOW)&lt;/p&gt;

&lt;p&gt;Use dataflow to run producer/consumer stages concurrently. Connect stages with hls::stream (or Intel channels).&lt;/p&gt;

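&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include "hls_stream.h"

using data_t = int;  // placeholder element type (assumption); use your own type

void stageA(hls::stream&amp;lt;data_t&amp;gt;&amp;amp; in, hls::stream&amp;lt;data_t&amp;gt;&amp;amp; out);
void stageB(hls::stream&amp;lt;data_t&amp;gt;&amp;amp; in, hls::stream&amp;lt;data_t&amp;gt;&amp;amp; out);
void stageC(hls::stream&amp;lt;data_t&amp;gt;&amp;amp; in, hls::stream&amp;lt;data_t&amp;gt;&amp;amp; out);

void top(hls::stream&amp;lt;data_t&amp;gt;&amp;amp; in, hls::stream&amp;lt;data_t&amp;gt;&amp;amp; out) {
#pragma HLS DATAFLOW
  static hls::stream&amp;lt;data_t&amp;gt; s1("s1"), s2("s2");
#pragma HLS STREAM variable=s1 depth=64
#pragma HLS STREAM variable=s2 depth=64

  stageA(in, s1);   // producer: reads the top-level input
  stageB(s1, s2);
  stageC(s2, out);  // consumer: writes the top-level output
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;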
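&lt;p&gt;A rough model shows what DATAFLOW buys. It assumes each stage must finish a whole frame before handing it on (ping-pong buffering), so the slowest stage sets the interval between frames. With streams, stages also overlap within a frame and the first-frame latency shrinks. The stage latencies in the usage note are made-up numbers.&lt;/p&gt;

```cpp
// Simplified model: each stage has a fixed latency per frame; with DATAFLOW,
// the slowest stage sets the interval between frames.
constexpr long long max3(long long a, long long b, long long c) {
  long long m = a;
  if (b > m) m = b;
  if (c > m) m = c;
  return m;
}

// Without DATAFLOW: the three stages run back to back for every frame.
constexpr long long sequential_cycles(long long a, long long b, long long c,
                                      long long frames) {
  return (a + b + c) * frames;
}

// With DATAFLOW: the first frame still pays a + b + c; after that a new
// frame completes every max(a, b, c) cycles.
constexpr long long dataflow_cycles(long long a, long long b, long long c,
                                    long long frames) {
  return frames == 0 ? 0 : (a + b + c) + (frames - 1) * max3(a, b, c);
}
```

&lt;p&gt;With stages of 100, 300 and 200 cycles, 10 frames take 6000 cycles back to back but about 3300 with DATAFLOW; from there, only speeding up the 300-cycle stage makes the pipeline faster.&lt;/p&gt;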

&lt;p&gt;Tips&lt;/p&gt;

&lt;p&gt;Choose FIFO depths to absorb burstiness and meet initiation intervals across stages.&lt;/p&gt;

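&lt;p&gt;Keep every stream and array single-producer, single-consumer: an array passed between tasks becomes a ping-pong buffer, and a second reader or writer usually prevents the tool from building the dataflow pipeline.&lt;/p&gt;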
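&lt;p&gt;FIFO depth for a bursty producer can be estimated with a cycle-by-cycle model. This sketch assumes a producer that writes one item per cycle for a burst and a consumer that reads one item every k cycles. Both rates are hypothetical, and real stalls depend on the full schedule, so treat the result as a starting point for depth=.&lt;/p&gt;

```cpp
// Cycle-by-cycle model: the producer pushes one item per cycle for `burst`
// cycles; the consumer pops one item every `k` cycles (cycles 0, k, 2k, ...).
// Returns the peak FIFO occupancy, a rough lower bound on the depth needed to
// keep the producer from stalling. Push-before-pop ordering is an assumption.
constexpr int peak_occupancy(int burst, int k) {
  int occ = 0, peak = 0;
  for (int t = 0; burst > t; ++t) {
    ++occ;                        // producer writes
    if (occ > peak) peak = occ;
    if (t % k == 0) --occ;        // consumer reads on its turn (occ is at least 1 here)
  }
  return peak;                    // after the burst the FIFO only drains
}
```

&lt;p&gt;For a 64-item burst, a consumer that keeps up (k = 1) needs almost no buffering, k = 2 peaks at 32 entries and k = 4 at 48.&lt;/p&gt;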

&lt;p&gt;5) Memory &amp;amp; Interface Tuning&lt;br&gt;
Partition / reshape arrays to add ports&lt;/p&gt;

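&lt;p&gt;PARTITION splits an array into independent banks, each with its own ports, so several elements can be accessed in the same cycle as long as they fall in different banks.&lt;/p&gt;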

&lt;p&gt;RESHAPE packs multiple elements per word (great for sequential access and burst width).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Several parallel reads per cycle (e.g., a loop unrolled by 4)
#pragma HLS ARRAY_PARTITION variable=buf cyclic factor=4 dim=1

// Wide sequential loads/stores (e.g., 16 x 32-bit elements = one 512-bit DDR beat)
#pragma HLS ARRAY_RESHAPE variable=line cyclic factor=16 dim=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
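&lt;p&gt;The difference between the two directives is where each element ends up. The plain C++ sketch below models that mapping under two assumptions: cyclic reshaping packs consecutive elements into one word, and lane 0 occupies the lowest bits. Location and the two functions are illustrative names only.&lt;/p&gt;

```cpp
// Where element i ends up under each directive, modeled in plain C++.
struct Location {
  int bank;  // which physical memory (always 0 for a reshaped array)
  int addr;  // word address inside that memory
  int lsb;   // first bit of the element inside the word (0 if not packed)
};

// ARRAY_PARTITION cyclic factor=F: F separate memories, element i in bank i % F.
constexpr Location cyclic_partition(int i, int factor) {
  return Location{i % factor, i / factor, 0};
}

// ARRAY_RESHAPE cyclic factor=F: one memory whose words hold F consecutive
// elements; element i sits in word i / F at lane i % F.
constexpr Location cyclic_reshape(int i, int factor, int elem_bits) {
  return Location{0, i / factor, (i % factor) * elem_bits};
}
```

&lt;p&gt;Partitioning gives you more ports at the cost of more memories; reshaping keeps one memory but makes each access wider, which is what a 512-bit burst wants.&lt;/p&gt;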

&lt;p&gt;Burst DDR and align widths&lt;/p&gt;

&lt;p&gt;Use m_axi (Vitis) and wide types (ap_uint&amp;lt;256/512&amp;gt;) to match DDR or NoC widths; ensure contiguous access patterns.&lt;/p&gt;

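&lt;p&gt;Use offset=slave so buffer addresses are passed through the AXI4-Lite control registers, and give each port its own bundle= name if it should get a separate AXI master.&lt;/p&gt;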

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;void kernel(ap_uint&amp;lt;512&amp;gt;* in, ap_uint&amp;lt;512&amp;gt;* out, int N) {
  #pragma HLS INTERFACE m_axi     port=in  offset=slave bundle=gmem0 depth=1024
  #pragma HLS INTERFACE m_axi     port=out offset=slave bundle=gmem1 depth=1024
  #pragma HLS INTERFACE s_axilite port=N   bundle=control
  #pragma HLS INTERFACE s_axilite port=return bundle=control
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
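&lt;p&gt;The payoff from wide ports is easy to estimate. This sketch counts bus beats for contiguous, tightly packed data and computes the ideal bandwidth of one beat per clock. The 300 MHz kernel clock in the usage note is an assumed value, and real throughput is lower once burst setup and DDR efficiency are included.&lt;/p&gt;

```cpp
// Number of bus beats needed to move n elements of elem_bits each over a
// bus_bits-wide AXI data bus, assuming tightly packed, contiguous data.
constexpr long long beats_needed(long long n, int elem_bits, int bus_bits) {
  return (n * elem_bits + bus_bits - 1) / bus_bits;  // ceiling division
}

// Upper bound on bandwidth: one full beat per clock, no gaps between bursts.
constexpr long long peak_bytes_per_second(int bus_bits, long long clock_hz) {
  return (bus_bits / 8) * clock_hz;
}
```

&lt;p&gt;1000 32-bit elements take 63 beats on a 512-bit port versus 1000 on a 32-bit port, and at 300 MHz the 512-bit port peaks at 19.2 GB/s.&lt;/p&gt;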

&lt;p&gt;Stream for high throughput and low latency&lt;/p&gt;

&lt;p&gt;Use AXI4-Stream at the top and hls::stream internally for line-rate pipelines (video, radio, ML).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#pragma HLS INTERFACE axis port=in_axis
#pragma HLS INTERFACE axis port=out_axis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;6) Resource Binding &amp;amp; Micro-Architecture&lt;br&gt;
Bind operations and memories&lt;/p&gt;

&lt;p&gt;Map multiplies to DSPs (throughput) or LUTs (save DSPs).&lt;/p&gt;

&lt;p&gt;Choose BRAM or URAM for large buffers, and single- or dual-port storage to match the access pattern.&lt;/p&gt;

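&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Vitis HLS form; older Vivado HLS used: #pragma HLS RESOURCE variable=mul_op core=DSP48
#pragma HLS BIND_OP variable=mul_op op=mul impl=dsp
#pragma HLS BIND_STORAGE variable=tile type=ram_2p impl=bram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;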

&lt;p&gt;Control sharing vs. replication&lt;/p&gt;

&lt;p&gt;Use UNROLL to replicate compute, or the ALLOCATION pragma to cap the number of operator instances and save area.&lt;/p&gt;

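&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#pragma HLS ALLOCATION operation instances=mul limit=2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;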
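&lt;p&gt;Limiting instances has a direct cost in II. A standard lower bound, sketched here assuming each unit is fully pipelined and accepts one operation per cycle, is the number of operations per iteration divided by the number of units, rounded up.&lt;/p&gt;

```cpp
// Resource-constrained lower bound on II: if one loop iteration needs
// ops_per_iteration operations of a kind and only `units` instances exist
// (e.g. capped by ALLOCATION ... limit=units), the units must be time-shared.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

constexpr int min_ii_from_resources(int ops_per_iteration, int units) {
  return ceil_div(ops_per_iteration, units);
}
```

&lt;p&gt;With limit=2 and eight multiplies per iteration, the loop cannot go below II=4.&lt;/p&gt;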

&lt;p&gt;Latency balancing&lt;/p&gt;

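&lt;p&gt;For long adder trees or MAC chains, HLS usually inserts pipeline registers on its own; you can also bound the latency of a region:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#pragma HLS LATENCY min=1 max=6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;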
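&lt;p&gt;Balancing matters because the depth of an adder structure sets how many adder delays lie on the critical path, or how many pipeline stages the tool must insert. This plain C++ sketch counts two-input adder levels for a sequential chain versus a balanced pairwise tree.&lt;/p&gt;

```cpp
// Levels of two-input adders needed to sum n values in sequence (acc += x[i]).
constexpr int chain_levels(int n) { return n > 1 ? n - 1 : 0; }

// Levels for a balanced pairwise reduction: each level halves the count,
// rounding up when an odd value is carried to the next level.
constexpr int tree_levels(int n) {
  int levels = 0;
  for (int width = n; width > 1; width = (width + 1) / 2) ++levels;
  return levels;
}
```

&lt;p&gt;Summing 64 products takes 63 levels as a chain but 6 as a tree.&lt;/p&gt;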

&lt;p&gt;7) Throughput vs. Latency vs. Fmax&lt;/p&gt;

&lt;p&gt;II (initiation interval) is the number of cycles between successive inputs, so throughput is 1/II samples per cycle.&lt;/p&gt;

&lt;p&gt;Latency is total cycles from input to output.&lt;/p&gt;

&lt;p&gt;Fmax: HLS reports only an estimate; the real number comes from post-implementation (place-and-route) timing. Shorten critical paths by reducing fan-out, balancing trees, and using DSPs.&lt;/p&gt;

&lt;p&gt;Clocking note: Set the target period in tool constraints (e.g., Vitis HLS create_clock -period 5) rather than in code; adjust until timing is clean with margin.&lt;/p&gt;
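&lt;p&gt;In steady state, II and the clock combine simply: throughput in samples per second is the clock frequency divided by II. The sketch assumes the 5 ns target from the clocking note is actually met; what counts is the Fmax achieved after implementation.&lt;/p&gt;

```cpp
// Convert a clock period in nanoseconds to a frequency in Hz,
// rounded to the nearest Hz.
constexpr long long clock_hz_from_period_ns(double period_ns) {
  return (long long)(1e9 / period_ns + 0.5);
}

// Steady-state throughput of a pipelined datapath: one new input every II
// cycles, so samples per second = f_clk / II.
constexpr long long samples_per_second(long long clock_hz, int ii) {
  return clock_hz / ii;
}
```

&lt;p&gt;A 5 ns period is 200 MHz, so II=1 gives 200 Msamples/s and II=2 halves that to 100 Msamples/s.&lt;/p&gt;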

&lt;p&gt;8) Verification &amp;amp; Reporting&lt;/p&gt;

&lt;p&gt;C-sim: Prove algorithm correctness fast.&lt;/p&gt;

&lt;p&gt;C/RTL Co-sim: Validate that RTL matches C under realistic I/O.&lt;/p&gt;

&lt;p&gt;Reports: Inspect&lt;/p&gt;

&lt;p&gt;Achieved II and latency,&lt;/p&gt;

&lt;p&gt;Stall reasons (dependencies/ports),&lt;/p&gt;

&lt;p&gt;Resource map (LUT/FF/DSP/BRAM/URAM),&lt;/p&gt;

&lt;p&gt;Interface burst efficiency.&lt;/p&gt;

&lt;p&gt;Fixed-point accuracy: measure SNR/PSNR, or check your error budget, against a floating-point golden model.&lt;/p&gt;
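&lt;p&gt;A minimal accuracy check against a floating-point golden model can be written in plain C++. This sketch assumes truncation toward minus infinity (ap_fixed's default AP_TRN quantization) and ignores overflow. The 60 dB SNR threshold is only an example error budget; comparing power ratios avoids the need for log10.&lt;/p&gt;

```cpp
// Truncate x to a grid of 2^-frac_bits, rounding toward minus infinity like
// ap_fixed's default AP_TRN mode. Overflow/wrap is not modeled.
constexpr double quantize_trn(double x, int frac_bits) {
  double scale = 1.0;
  for (int i = 0; frac_bits > i; ++i) scale *= 2.0;
  const double v = x * scale;
  long long t = (long long)v;  // truncates toward zero
  if ((double)t > v) t -= 1;   // step down for negative non-integers
  return (double)t / scale;
}

// True if the signal is at least 60 dB above its quantization error,
// i.e. signal power is at least 1e6 times the noise power.
constexpr bool meets_60db_snr(const double* golden, int n, int frac_bits) {
  double signal = 0.0, noise = 0.0;
  for (int i = 0; n > i; ++i) {
    const double e = golden[i] - quantize_trn(golden[i], frac_bits);
    signal += golden[i] * golden[i];
    noise += e * e;
  }
  return signal >= 1e6 * noise;
}
```

&lt;p&gt;For samples such as {0.3, -0.7, 0.9, -0.1}, 14 fractional bits pass the 60 dB check while 4 fractional bits fail it.&lt;/p&gt;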

&lt;p&gt;9) Example: Streaming FIR with One-Sample-per-Cycle&lt;/p&gt;

&lt;p&gt;This version sustains II=1 by unrolling the tap MAC and fully partitioning coefficients and the shift register. It uses fixed-point, AXI-Stream I/O, and works nicely inside a DATAFLOW pipeline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include "ap_fixed.h"
#include "hls_stream.h"

using data_t = ap_fixed&amp;lt;16,8&amp;gt;;
using acc_t  = ap_fixed&amp;lt;32,12&amp;gt;;   // wider accumulator
const int N = 64;

struct axis_t {
  data_t data;
  bool   last;
};

void fir64(hls::stream&amp;lt;axis_t&amp;gt;&amp;amp; in, hls::stream&amp;lt;axis_t&amp;gt;&amp;amp; out, const data_t coeff[N]) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS ARRAY_PARTITION variable=coeff complete dim=1

  static data_t shift_reg[N];
#pragma HLS ARRAY_PARTITION variable=shift_reg complete dim=1

  while (true) {
#pragma HLS PIPELINE II=1
    axis_t x = in.read();

    // shift
    for (int i = N-1; i &amp;gt; 0; --i) {
#pragma HLS UNROLL
      shift_reg[i] = shift_reg[i-1];
    }
    shift_reg[0] = x.data;

    // MAC
    acc_t acc = 0;
    for (int i = 0; i &amp;lt; N; ++i) {
#pragma HLS UNROLL
      acc += (acc_t)shift_reg[i] * (acc_t)coeff[i];
    }

    axis_t y;
    y.data = (data_t)acc;
    y.last = x.last;
    out.write(y);

    if (x.last) break;  // simple frame terminator
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
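&lt;p&gt;A plain C++ reference model is the natural golden for C-sim of a kernel like this. The sketch below reduces the filter to 4 taps with integer data (both simplifications, not the 64-tap fixed-point kernel) and applies the classic FIR check: a unit impulse at the input should reproduce the coefficients at the output. FirModel and impulse_response_matches are illustrative names.&lt;/p&gt;

```cpp
// Integer software model of the FIR's per-sample behavior: shift the new
// sample into a delay line, then take the dot product with the coefficients.
constexpr int kTaps = 4;

struct FirModel {
  int delay[kTaps];  // delay[0] holds the newest sample, like shift_reg[0]
  int coeff[kTaps];

  constexpr int step(int sample) {
    for (int i = kTaps - 1; i > 0; --i) delay[i] = delay[i - 1];  // shift
    delay[0] = sample;
    int acc = 0;
    for (int i = 0; kTaps > i; ++i) acc += delay[i] * coeff[i];   // MAC
    return acc;
  }
};

// Feeds a unit impulse followed by zeros and checks that the output sequence
// reproduces the coefficients (requires a zeroed delay line at the start).
constexpr bool impulse_response_matches(FirModel fir) {
  for (int t = 0; kTaps > t; ++t) {
    const int y = fir.step(t == 0 ? 1 : 0);
    if (y != fir.coeff[t]) return false;
  }
  return true;
}
```

&lt;p&gt;The same impulse test, run against the HLS kernel in C-sim and co-sim, catches off-by-one shifts and coefficient ordering mistakes early.&lt;/p&gt;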

</description>
    </item>
  </channel>
</rss>
