Revolutionizing Chip Design with Intelligent, Energy-Efficient Communication Frameworks

Introduction

In the fast-paced world of semiconductor technology, designing chips that are both high-performance and energy-efficient is more critical than ever. With increasing demand for powerful computing across industries, the challenge lies in balancing performance with energy consumption and fault tolerance. Our breakthrough framework for on-chip communication offers an innovative, learning-based solution that addresses these demands head-on.

The Challenge

Traditional on-chip communication frameworks often struggle to keep up with the ever-increasing data flow in modern chips. This bottleneck limits performance and increases energy consumption, particularly in high-demand environments such as AI, data centers, and edge computing. Moreover, ensuring fault tolerance without sacrificing performance or efficiency presents a major hurdle in semiconductor design.

The Solution

Our learning-based communication framework addresses these challenges by optimizing on-chip communication in real time, reducing network latency while minimizing energy usage. The system continuously adapts to data flow, allowing chips to operate more efficiently under varying workloads. Its fault-tolerant design also detects and corrects errors, so chips continue to function with minimal disruption when failures occur.

Why This Matters

  1. Energy Efficiency and Performance: With energy efficiency becoming a priority across industries, this framework reduces power consumption while boosting chip performance. Whether applied in data centers, AI applications, or consumer electronics, this technology helps reduce costs and environmental impact.
  2. Fault Tolerance: This framework doesn’t just offer high performance; it’s built to withstand failures. In industries where downtime is costly or dangerous, such as automotive and telecommunications, this technology keeps systems operational even when faults occur.
  3. Learning-Based Optimization: By using AI to optimize on-chip communication, this framework continuously improves chip performance, making it a future-proof solution that adapts to evolving demands.

Why License This Technology?

Licensing this communication framework gives your company the tools to design chips that are smarter, faster, and more efficient. In an industry where demand for energy-efficient, high-performance semiconductors is growing rapidly, this solution offers a competitive edge. It enables your products to stand out by delivering not just raw power, but intelligent performance management that maximizes efficiency and reliability.

The Opportunity

As industries increasingly rely on advanced semiconductor technology, now is the time to adopt frameworks that will define the next generation of chips. License this learning-based, fault-tolerant communication design and take a bold step into the future of semiconductor innovation.

The technology is a proactive fault-tolerant scheme that improves performance and energy efficiency for networks-on-chip (NoCs). The scheme allows routers to switch among several fault-tolerant operation modes, each with different trade-offs among fault-tolerant capability, retransmission traffic, latency, and energy efficiency. A proactive, dynamic control policy balances and optimizes these interactions and trade-offs. The example control policy uses a machine learning algorithm called reinforcement learning (RL). Each RL-based controller independently observes a set of NoC system parameters at runtime and, over time, evolves an optimal control policy for its router. By automatically switching among the four fault-tolerant modes, the trained control policy minimizes system-level network latency and maximizes energy efficiency while detecting and correcting errors.
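
The per-router control loop described above can be illustrated with a minimal tabular Q-learning sketch. It is not the patented implementation: the mode names are borrowed from claims 12 to 15, the state attributes from claim 16, and the reward objectives from claim 18, while the class name, bin boundaries, hyperparameters, and reward weights are all hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict

# Mode names follow claims 12-15; everything else here is illustrative.
MODES = ("ARQ_CRC", "SECDED", "SECDED_PROACTIVE_RETX", "RELAXED")


class RouterModeController:
    """Tabular Q-learning agent; one instance would sit in each router."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # State-action table: state -> one Q-value per mode.
        self.q = defaultdict(lambda: [0.0] * len(MODES))

    def select_mode(self, state):
        """Epsilon-greedy choice of a mode index for the observed state."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(MODES))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward_value, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[next_state])
        target = reward_value + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def observe_state(buffer_usage, nack_rate, temperature_c):
    """Bucket raw router attributes into a small discrete state (hypothetical bins)."""
    return (
        min(int(buffer_usage * 4), 3),   # buffer occupancy in [0, 1]
        min(int(nack_rate * 10), 3),     # fraction of transfers NACKed
        0 if temperature_c < 70 else 1,  # cool / hot
    )


def reward(latency_cycles, energy_pj, residual_errors, w=(1.0, 1.0, 10.0)):
    """Negative weighted cost: lower latency, energy, and errors score higher."""
    return -(w[0] * latency_cycles + w[1] * energy_pj + w[2] * residual_errors)
```

In such a loop, a router would call `observe_state` once per control interval, pick a mode with `select_mode`, run in that mode, and then feed the measured latency, energy, and error counts back through `reward` and `update`.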

The invention claimed is:

1. A network on chip (NoC) reconfigurable router assembly comprising:

a plurality of input ports for receiving a plurality of input data packets from an adjacent upstream reconfigurable router assembly and connected to the adjacent reconfigurable upstream router assembly;
a plurality of discrete input decoders each connected to one of said plurality of input ports, receiving one of the plurality of input data packets from one of said plurality of input ports, and providing a decoded input data packet;
a plurality of discrete input storage buffers, each connected to one of said plurality of input decoders and receiving a decoded input data packet from one of said plurality of input decoders;
an interconnection device receiving the plurality of decoded input data packets from said plurality of input storage buffers, and providing a plurality of interconnection output data packets;
a plurality of discrete output encoders each receiving one of said plurality of interconnection output data packets and providing an encoded output data packet;
a plurality of output ports each receiving the encoded output data packet from one of said plurality of output encoders;

a discrete fault-tolerant controller coupled to each of said plurality of discrete input decoders and each of said plurality of discrete output encoders, said discrete fault-tolerant controller:

transmitting a downstream synchronization control signal to a downstream reconfigurable router to synchronize the plurality of downstream input decoders in the downstream reconfigurable router with said plurality of said output encoders of the current reconfigurable router,
transmitting an upstream synchronization control signal to the upstream reconfigurable router to synchronize said plurality of output encoders in the upstream reconfigurable router with said plurality of input decoders in the current reconfigurable router,
receiving a downstream synchronization control signal from the upstream reconfigurable router to synchronize said plurality of input decoders of the current reconfigurable router with said plurality of output encoders in the upstream router,
receiving an upstream synchronization control signal from the downstream reconfigurable router to synchronize said plurality of output encoders of the current reconfigurable router with said plurality of input decoders in the downstream router,
receiving a downstream control signal from an upstream reconfigurable router and disabling said plurality of input decoders in response to the received downstream control signal,
transmitting a downstream control signal to a downstream reconfigurable router to disable said plurality of input decoders in said downstream reconfigurable router, transmitting an upstream control signal to the upstream reconfigurable router to disable said output encoders of the upstream reconfigurable router, and
receiving an upstream control signal from a downstream reconfigurable router and disabling said output encoders in response to the received upstream control signal,
a discrete processing device configured to select one of a plurality of fault-tolerant operation modes based on reinforcement machine-learning optimization, and controlling operation of said plurality of input decoders and said plurality of output encoders based on the selected fault-tolerant operation mode, wherein the plurality of fault-tolerant operation modes comprise Cyclic Redundancy Check (CRC), Automatic Retransmission Query (ARQ), Single-bit Error Correction and Double-bit Error Detection (SECDED), and Double-bit Error Correction and Triple-bit Error Detection (DECTED);
said processing device further configured to dynamically determine a state of said reconfigurable router assembly based on attributes of said plurality of input ports, said plurality of output ports, said error correction code (ECC) encoder, said error correction code (ECC) decoder, output buffers, or acknowledgement response;
said processing device further configured to dynamically disable at least one of said plurality of input decoders or at least one of said plurality of output encoders in response to the determined action to operate in the selected fault-tolerant operation mode, and dynamically enable at least one of said plurality of input decoders or at least one of said plurality of output encoders in response to the determined action to operate in the selected fault-tolerant operation mode.

2. The NoC reconfigurable router assembly of claim 1, further comprising:

an error checking encoder encoding source and providing a first encoded error checking output data packet;
an error checking input storage buffer receiving the first encoded error checking output data packet from said error checking encoder;
said interconnection device receiving the first error checking encoded output data packet from said error checking input storage buffer and receiving a second error checking encoded output data packet from the adjacent upstream reconfigurable router assembly; and
an error checking decoder connected to said interconnection device, receiving the second error checking encoded output data packet, decoding the second error checking encoded output data packet, determining if the second error checking encoded output data packet has error, and sending a retransmit request to the adjacent upstream reconfigurable router assembly to retransmit the plurality of input data packets if the second error checking encoded output data packet has error;
said plurality of output encoders receiving the first error checking encoded output data packet from said interconnection device.
3. The NoC reconfigurable router assembly of claim 2, wherein said error checking encoder comprises a Cyclic Redundancy Check (CRC) encoder, and said error correction decoder comprises a CRC decoder.
4. The NoC reconfigurable router assembly of claim 1, wherein said interconnection device comprises a crossbar.
5. The NoC reconfigurable router assembly of claim 1, wherein said plurality of input decoders comprise Automatic Retransmission Query (ARQ) protocol with error correction code (ECC) (ARQ+ECC) decoders, and said plurality of output encoders comprise ARQ+ECC encoders.
6. The NoC reconfigurable router assembly of claim 1, further comprising a processing device sending an acknowledgment request to the adjacent upstream reconfigurable router assembly and receiving said acknowledgement response from an adjacent downstream reconfigurable router assembly.
7. The NoC reconfigurable router assembly of claim 6, further comprising a plurality of output buffers each receiving one of the plurality of second interconnection output data packets, said processing device further retransmitting the plurality of second interconnection output data packets from said plurality of output buffers in response to receiving a negative acknowledgement response from the adjacent downstream reconfigurable router assembly.
8. The NoC reconfigurable router assembly of claim 1, further comprising a plurality of output buffers each receiving one of the plurality of second interconnection output data packets, and further comprising a processing device further receiving a retransmit request from an adjacent downstream reconfigurable router assembly and retransmitting the plurality of second interconnection output data packets from said plurality of output buffers in response to the retransmit request.

9. The NoC reconfigurable router assembly of claim 1, wherein:

the reconfigurable router assembly further comprising:
a switch that enables and disables said plurality of output encoders and plurality of input decoders in response to an activation from the fault-tolerant controller.
10. The NoC reconfigurable router assembly of claim 9, said processing device delaying at least one cycle before receiving input data packets.
11. The NoC reconfigurable router assembly of claim 1, wherein said plurality of input decoders and said plurality of output encoders perform single-bit error correction and double-bit error detection (SECDED).
12. The NoC reconfigurable router assembly of claim 1, said action comprising fault-tolerant methodology using ARQ+CRC error detection.
13. The NoC reconfigurable router assembly of claim 1, said action comprising fault-tolerant methodology using SECDED error correction.
14. The NoC reconfigurable router assembly of claim 1, said action comprising fault-tolerant methodology using SECDED error correction with proactive retransmission method.
15. The NoC reconfigurable router assembly of claim 1, said action comprising fault-tolerant methodology using relaxed transmission method.
16. The NoC reconfigurable router assembly of claim 1, wherein said attributes comprise usage of said plurality of input storage buffers, usage of said plurality of input ports, usage of said plurality of output ports, rate of received negative acknowledgement response, rate of transmitted negative acknowledgement response, and/or temperature.
17. The NoC reconfigurable router assembly of claim 1, further comprising a state-action table associating each state with an associated action.
18. The NoC reconfigurable router assembly of claim 1, said processing device selecting the action with maximum benefit according to a reward that is a function of objectives including energy, performance, and reliability.
19. The NoC reconfigurable router assembly of claim 1, wherein said plurality of input storage buffers each comprise a register.
20. The NoC reconfigurable router assembly of claim 1, wherein said processing device selects an operation mode according to machine learning algorithms including supervised learning or a reinforcement learning algorithm.
21. The NoC reconfigurable router assembly of claim 17, wherein the state-action table is stored in the fault-tolerant controller.

22. A Network on Chip (NoC) reconfigurable router assembly comprising:

a plurality of input ports for receiving a plurality of input data packets from an upstream reconfigurable router assembly;
a plurality of discrete input decoders each receiving one of the plurality of input data packets from one of said plurality of input ports and providing a decoded input;
a plurality of input storage buffers, each receiving a decoded input from one of said plurality of input decoders;
an interconnection device receiving the plurality of decoded input data packets from said plurality of input storage buffers, and providing a plurality of interconnection output data packets;
a plurality of discrete output encoders each receiving one of said plurality of interconnection output data packets and providing an encoded output data packet;
a plurality of output ports each receiving the encoded output data packet from one of said plurality of output encoders;
an error checking encoder encoding source and providing a first encoded error checking output data packet;
an error checking input storage buffer receiving the first encoded error checking output data packet from said error checking encoder;
said interconnection device receiving the first error checking encoded output data packet from said error checking input storage buffer and receiving a second error checking encoded output data packet from the upstream reconfigurable router assembly;
an error checking decoder connected to said interconnection device, receiving the second error checking encoded output, decoding the second error checking encoded output data packet, determining if the second error checking encoded output data packet has error, and sending a retransmit request to the upstream reconfigurable router assembly to retransmit the plurality of input data packets if the second error checking encoded output data packet has error;
said plurality of output encoders receiving the first error checking encoded output data packet from said interconnection device;

a discrete fault-tolerant controller coupled to each of said plurality of discrete input decoders and each of said plurality of discrete output encoders, said discrete fault-tolerant controller:

transmitting a downstream synchronization control signal to a downstream reconfigurable router to synchronize the plurality of downstream input decoders in the downstream reconfigurable router with said plurality of said output encoders of the current reconfigurable router,
transmitting an upstream synchronization control signal to the upstream reconfigurable router to synchronize said plurality of output encoders in the upstream reconfigurable router with said plurality of input decoders in the current reconfigurable router,
receiving a downstream synchronization control signal from the upstream reconfigurable router to synchronize said plurality of input decoders of the current reconfigurable router with said plurality of output encoders in the upstream router,
receiving an upstream synchronization control signal from the downstream reconfigurable router to synchronize said plurality of output encoders of the current reconfigurable router with said plurality of input decoders in the downstream router,
receiving a downstream control signal from an upstream reconfigurable router and disabling said plurality of input decoders in response to the received downstream control signal,
transmitting a downstream control signal to a downstream reconfigurable router to disable said plurality of input decoders in said downstream reconfigurable router, transmitting an upstream control signal to the upstream reconfigurable router to disable said output encoders of the upstream reconfigurable router, and
receiving an upstream control signal from a downstream reconfigurable router and disabling said output encoders in response to the received upstream control signal, and
a discrete processing device configured to select one of a plurality of fault-tolerant operation modes based on reinforcement machine-learning optimization, and controlling operation of said plurality of input decoders and said plurality of output encoders based on the selected fault-tolerant operation mode, wherein the plurality of fault-tolerant operation modes comprise Cyclic Redundancy Check (CRC), Automatic Retransmission Query (ARQ), Single-bit Error Correction and Double-bit Error Detection (SECDED), and Double-bit Error Correction and Triple-bit Error Detection (DECTED);
said processing device further configured to dynamically determine a state of said reconfigurable router assembly based on attributes of said plurality of input ports, said plurality of output ports, said error correction code (ECC) encoder, said error correction code (ECC) decoder, said buffers, or acknowledgement response;
said processing device further configured to dynamically disable at least one of said plurality of input decoders or at least one of said plurality of output encoders in response to the determined action to operate in the selected fault-tolerant operation mode, and dynamically enable at least one of said plurality of input decoders or at least one of said plurality of output encoders in response to the determined action to operate in the selected fault-tolerant operation mode.
23. The NoC reconfigurable router assembly of claim 22, wherein said error checking encoder comprises a Cyclic Redundancy Check (CRC) encoder, and said error correction decoder comprises a CRC decoder.
24. The NoC reconfigurable router assembly of claim 1, wherein each of the plurality of fault-tolerant operation modes has a fault-tolerant capability, retransmission traffic, latency and energy efficiency.
25. The NoC reconfigurable router assembly of claim 1, wherein the machine-learning optimization balances design trade-offs and simultaneously optimizes parameters including power, latency and error rate.
26. The NoC reconfigurable router assembly of claim 1, wherein the downstream and upstream synchronization control signals synchronize the error correction codes used in said plurality of output encoders and said plurality of input decoders of the current router, downstream router, and upstream router.
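
To make the SECDED operation mode named in the claims concrete, the following is a minimal sketch of single-bit error correction and double-bit error detection using an extended Hamming (8,4) code. The claims do not specify a code construction or word width, so the 4-bit payload, the bit layout, and the function names `secded_encode` and `secded_decode` are assumptions made purely for illustration.

```python
def secded_encode(nibble):
    """Encode a 4-bit value into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # code[0]: overall parity; code[1..7]: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the whole word even parity
    return code


def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double_error'."""
    code = list(code)
    # XOR of the indices of all set bits is zero for a valid Hamming codeword.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped. The syndrome gives its
        # position; syndrome 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "double_error"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

In a router, a `double_error` result would be the natural point to issue a retransmit request to the upstream router, which is where the retransmission traffic and latency costs of this mode arise.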

Title

Learning-based high-performance, energy-efficient, fault-tolerant on-chip communication design framework

Inventor(s)

Ke Wang, Ahmed Louri

Assignee(s)

George Washington University

Patent #

12040897

Patent Date

July 16, 2024
