


 [ 245 posts ]  Go to page Previous  1 ... 13, 14, 15, 16, 17
 Qupls (Q+) 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2432
Location: Canada
Quote:
With a project your size,
how do you check for the stupid little errors?
I run something called lint on the code. It checks for all kinds of mistakes. Then I also run synthesis to get the size of things.
I spot a lot of the little errors while editing the code. There are bound to be errors yet undetected.
I work on things incrementally, and have ported code from other projects that was working. I try not to do too much at once without running some sort of test.
Doing diffs helps too.
I have lots of experience by now and that helps. I tend to avoid common mistakes.

*****************************

Re-did micro-op loading into the decode stream. The number of micro-ops allowed per instruction was increased to 48 from eight. That is enough micro-ops to cover more complex sequences like float divide and reciprocal. The synthesized size is just slightly less than it was before, while at the same time allowing many more micro-ops per instruction.

I have been busy reducing the size of the core. A whopping 60k LUTs have been removed, without changing functionality. I got the 23,000 LUT ROB down to about 18,000.

Did a lot of work on the micro-op (instruction) decoder, converting about a dozen modules for Qupls4.

Got register r0 working as a general-purpose register now, except when it is used in an address calculation.
Rbase = r0 bypasses to 0
Rindex = r0 bypasses to 0
Bypassing r0 for both base and index allows absolute addressing mode.
Otherwise r0 is a general-purpose reg.
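
A minimal sketch of the r0 bypass described above, written as a small Python model rather than the actual hardware description. The function name, the scale and displacement parameters, and the 32-entry register file are assumptions for illustration; the post only states that r0 reads as zero when used as base or index.

```python
# Hypothetical model of the r0 bypass in address generation.
# effective_address, scale, disp and the 32-register file are assumptions,
# not taken from the Qupls sources.

def effective_address(regs, rbase, rindex, scale=1, disp=0):
    """Return base + index * scale + disp, reading r0 as zero."""
    base = 0 if rbase == 0 else regs[rbase]
    index = 0 if rindex == 0 else regs[rindex]
    return base + index * scale + disp


regs = [0] * 32
regs[0] = 0x1234   # r0 holds a real value as a general-purpose register
regs[5] = 0x1000
regs[6] = 4

# Indexed addressing: r5 + r6 * 8 + 16
indexed = effective_address(regs, 5, 6, scale=8, disp=16)

# Absolute addressing: both fields name r0, so only the displacement remains
absolute = effective_address(regs, 0, 0, disp=0x2000)
```

The value stored in r0 is untouched; only the address calculation treats the r0 field as zero.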

Register fields can be used as small or large constants for most instructions. There is not much need to bypass r0 to 0.
Added a LOADI instruction so that loading a constant into a register is possible while r0 is non-zero. An ADD instruction was being used before to load constants.

Added two opcodes to allow instruction pointer relative addressing for loads and stores.

Updated some of the documentation again.

_________________
Robert Finch http://www.finitron.ca


Sun Dec 07, 2025 2:55 am WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1863
> I have been busy reducing the size of the core. A whopping 60k LUTs has been removed, without changing functionality

Very good!


Sun Dec 07, 2025 1:13 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2432
Location: Canada
Reduced the size of the core again, due to work on the instruction dispatcher.

Figured out what was causing the tools to take a long time to synthesize. It was in the instruction dispatcher. There were way, way too many muxes. It worked previously because data structure sizes were smaller, and there were fewer functional units. I managed to re-write things in a better fashion and now the instruction dispatch is much smaller. There are some limitations however.

Previously there was no limitation on instruction dispatch other than a max of four per clock cycle. Now, there are restrictions on which functional units can be issued instructions in the same clock cycle. For instance, both a multiply and a divide cannot be issued in the same clock cycle as they are sharing a dispatch slot now. But I managed to increase the number of dispatch slots to five.
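
The slot restriction can be illustrated with a rough Python model. Only the shared multiply/divide slot and the count of five slots come from the post; the rest of the slot map is invented, and preferring the divide when both compete for the shared slot follows the later reply in this thread.

```python
# Hypothetical dispatch model: each functional unit maps to one slot,
# and a slot accepts at most one instruction per clock cycle.
SLOT_OF_UNIT = {
    "alu": 0,
    "mul": 1,
    "div": 1,      # shares a dispatch slot with mul
    "fpu": 2,      # assumed assignment
    "mem": 3,      # assumed assignment
    "branch": 4,   # assumed assignment
}

# Lower number wins a contested slot; divide goes first because it is slow.
PRIORITY = {"div": 0}


def dispatch_cycle(ready):
    """Choose at most one instruction per slot for this clock cycle.

    Returns (dispatched, deferred); deferred instructions wait a cycle.
    """
    used_slots = set()
    dispatched, deferred = [], []
    # sorted() is stable, so queue order is kept among equal priorities.
    for unit in sorted(ready, key=lambda u: PRIORITY.get(u, 1)):
        slot = SLOT_OF_UNIT[unit]
        if slot in used_slots:
            deferred.append(unit)
        else:
            used_slots.add(slot)
            dispatched.append(unit)
    return dispatched, deferred


dispatched, deferred = dispatch_cycle(["mul", "alu", "div", "mem"])
```

Here the multiply is deferred to a later cycle because the divide claims the shared slot, while the ALU and memory operations dispatch alongside it.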

_________________
Robert Finch http://www.finitron.ca


Tue Dec 09, 2025 6:16 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 881
Does scaling where you multiply and then divide work ok? tax 7.25 %
pennys = (pennys * 725) / 100
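
A small sketch of the scaling arithmetic, assuming pennys is an integer count of pennies and that integer (truncating) division is wanted. The helper name and the sample amount are made up. Note that with a divisor of 100 the result is in hundredths of a penny; a divisor of 10000 yields 7.25 % in whole pennies.

```python
def scale(amount, numerator, denominator):
    """Multiply first, then divide, using integer arithmetic."""
    return (amount * numerator) // denominator


pennys = 1999  # $19.99, an arbitrary example amount

as_posted = scale(pennys, 725, 100)      # hundredths of a penny
tax_pennys = scale(pennys, 725, 10000)   # whole pennies, truncated

# Dividing first throws away the fraction before the multiply can use it.
divide_first = (pennys // 100) * 725
```

Multiplying before dividing keeps the low-order digits, which is why the order of the two operations matters for this kind of scaling.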


Tue Dec 09, 2025 9:04 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2432
Location: Canada
Quote:
Does scaling where you multiply and then divide work ok? tax 7.25 %
pennys = (pennys * 725) / 100
In theory it should work. It will still do all the instructions eventually, but it may not be in the first cycle that it is encountered. If a divide and multiply happen at the same time, it will choose to queue the divide first as that can take many clock cycles. The CPU does take dependencies into consideration when going to execute instructions. I think in your example it will do the multiply first then the divide afterwards because the division depends on the result of the multiply. It might queue the instructions in a different order but they do not execute until all the operands are valid. Since the divide depends on the multiply result it won't execute until later.
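
The dependency behaviour can be sketched with a toy Python model. This is not the core's actual scheduler: the latencies, the temporary register name t, and the simulate function are all assumptions. It only shows that an instruction queued first still waits until its operands are valid.

```python
# Hypothetical issue model: an instruction starts executing only once
# every source register holds a valid value.

def simulate(queue, latency, valid_regs, max_cycles=1000):
    """Return {name: start_cycle} for instructions in `queue`.

    queue: list of (name, op, dest, sources), in queue order.
    """
    valid = set(valid_regs)
    pending = list(queue)
    running = []            # (finish_cycle, dest) for in-flight operations
    start_cycle = {}
    cycle = 0
    while (pending or running) and cycle < max_cycles:
        # Results finishing by this cycle become valid operands.
        for finish, dest in running:
            if finish <= cycle:
                valid.add(dest)
        running = [r for r in running if r[0] > cycle]

        # Start every pending instruction whose sources are all valid.
        still_waiting = []
        for name, op, dest, sources in pending:
            if all(s in valid for s in sources):
                start_cycle[name] = cycle
                running.append((cycle + latency[op], dest))
            else:
                still_waiting.append((name, op, dest, sources))
        pending = still_waiting
        cycle += 1
    return start_cycle


latency = {"mul": 3, "div": 20}   # invented latencies
queue = [
    ("div", "div", "pennys", ["t"]),      # pennys = t / 100, queued first
    ("mul", "mul", "t", ["pennys"]),      # t = pennys * 725
]
start = simulate(queue, latency, valid_regs=["pennys"])
```

Even with the divide queued ahead of the multiply, the multiply starts at cycle 0 and the divide only starts once t becomes valid.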

The core is too large to fit in the FPGA even in its smallest configuration. It is close to fitting but I am not going to try and shoehorn it in. I may end up writing a software emulator for the core. There is an even larger core with 512-bit registers being worked on.

Started working on getting the thing running in simulation.

_________________
Robert Finch http://www.finitron.ca


Wed Dec 10, 2025 6:13 am WWW