MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
robfinch wrote:
Sometimes I think Thor's too complex for me. There's currently a bug I can't seem to find. I put debug status LED displays in the software, and the location of the problem changes depending on the presence of the status display. Thor executes a break instruction which fills the screen with zeros (a dump of the address). I think it may be a bad DRAM interface causing a return to address zero. I guess I need to hammer-test the interface.

I know what you mean about the complexity. I use my experience with my "simple" core to temper expectations for my teams at work. I treat the moments when my mind loses its grasp on the whole project as a signal to reduce the complexity of whatever we're trying to implement at work. It's natural to add complexity to resolve problems, but invariably there comes a time when we lose our grasp on the problem. When that happens, it becomes difficult to isolate and solve issues. You're already doing the things I would suggest to isolate and solve the issue, so I'll only offer encouragement. I hope you find your issue quickly, that its resolution is not too involved, and that it doesn't affect too many of your core's modules.
_________________ Michael A.
Mon Jan 18, 2016 1:42 pm

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Still working on Thor. I got a little further when I discovered an error in the software that drives the debug LEDs. I did something like the following:
Code:
    ldi   r1,32[bp]
    sc    r1,LEDS
The intent was to set the LEDs from a value passed on the stack, but I used the load immediate instruction. The assembler quietly assembled it as a load immediate of #32, so the LEDs always showed 32. LED status 32 is set by an interrupt subroutine, so I couldn't figure out why the interrupt routine was being called when all interrupts were disabled.
Some work has been done on a vector extension to Thor. Thor's now too big for the FPGA, but I keep working on it anyway.
_________________Robert Finch http://www.finitron.ca
Wed Apr 06, 2016 1:59 pm

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Narrowed the processor bug down to this chunk of code: it gets to 272B but not to FMTKc_149 (LED status 33 shows, but not LED status 34 from the next C line). It's all mundane code and nothing obvious stands out. Looks like I have to drop down to the assembler code to find the exact line it fails on.
Code:
    ; for (jj = 0; jj < 8; jj++) {
    FFFC2703 01 93 1A 00 DF              sw     r0,-16[bp]
    FMTKc_148:
    FFFC2708 01 86 DA 00 DF              lw     r3,-16[bp]
    FFFC270D 01 21 03 02                 cmpi   p1,r3,#8
    FFFC2711 16 30 59                    p1.ge  br FMTKc_149
    ; LEDS(33);
    FFFC2714 01 47 1B FE                 addui  sp,sp,#-8
    FFFC2718 01 6F 43 08                 ldi    r3,#33
    FFFC271C 01 93 DB 00 C0              sw     r3,0[sp]
    FFFC2721 01 A2 01 28 08 FC           jsr    _LEDS
    FFFC2727 01 47 1B 02                 addui  sp,sp,#8
    ; jcbs[nn].tasks[jj] = -1;
    FFFC272B 01 86 1A 01 DF              lw     r4,-16[bp]
    FFFC2730 01 4E C4 20 00              mulu   r3,r4,#2
    FFFC2735 01 86 DA 81 DF              lw     r7,-8[bp]
    FFFC273A 20 08 01 4E 87 01 80        mulu   r6,r7,#2048
    FFFC2741 40 18 C2 00 01 4C 46 01 80  addu   r5,r6,#_jcbs
    FFFC274A 01 4C 05 41 6B              addu   r4,r5,#1716
    FFFC274F 01 6F 46 00                 ldi    r6,#1
    FFFC2753 01 A7 46 11                 neg    r5,r6
    FFFC2757 01 C1 C4 50 20              sc     r5,[r4+r3]
    FMTKc_150:
    FFFC275C 01 86 DA 00 DF              lw     r3,-16[bp]
    FFFC2761 01 47 43 00                 addui  r3,r3,#1
    FFFC2765 01 93 DA 00 DF              sw     r3,-16[bp]
    FFFC276A 01 3F 9B                    br     FMTKc_148
    FMTKc_149:
    ; LEDS(34);
Sat Apr 09, 2016 11:27 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
This bug has me stumped. I tried shifting the position of the code and it made no difference. I tried moving the code to the start of the boot ROM and running it in simulation: it ran through just fine, with no errors. All the instructions are ones the core has encountered and run successfully before, so it's not likely to be something stupid like a wrong instruction length or a bad opcode. The only thing I can think of is an error in the sequence of instructions and how they interact in the pipeline. So I guess the next thing to do is to break up the instruction sequence and see if that makes a difference.
Tue Apr 12, 2016 2:21 am

BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
Moving a nop around between each pair of instructions might illuminate - perhaps that's what you already have in mind.
Tue Apr 12, 2016 8:23 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Quote:
Moving a nop around between each pair of instructions might illuminate - perhaps that's what you already have in mind.

It's a good idea. I think I found the error, though. When I split up the code the error disappeared, but it reappeared later in the code. This time, however, I was able to reproduce the error in simulation. It turns out I made ALU #1 smaller by omitting the multiply operation, but I forgot to account for that in the issue logic. So finally, after thousands of instructions had run, the core issued a multiply to ALU #1 and hung waiting for a multiply result that never came. It's fixed now and runs in simulation, but I have yet to try it in the FPGA. I found another issue with divide by zero at the same time: after a divide-by-zero exception, the divide-by-zero state never cleared.
Thu Apr 14, 2016 2:21 am

BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
Well done! It's not a Pentium is it?!?
Thu Apr 14, 2016 8:14 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Well, I tried it in the FPGA and it got further. It crashes a couple of dozen lines further down in the code, just after LED status 46. It looks similar: mundane code, all previously executed instructions. That point is supposedly after a screen clear and a printf, but nothing shows on the screen, so something isn't quite right. At least it got past the first crash point. It's a pita because it isn't really practical to run it in simulation. There are loops like clearing the message list (16,000 messages). Chances are that if I modify the code at all, to shorten the loops for instance, the bug will go away. I already tried one minor mod in simulation and *poof*, no bug. The sim ran fine, with Thor running for hundreds of thousands of cycles! I suppose I could try a sync instruction between each pair of instructions (taking the nop idea). Running syncs will likely "fix" a pipeline bug, so I could call it a feature and leave it that way.

Quote:
Well done! It's not a Pentium is it?!?

Was there a Pentium bug that's now a feature?
Thu Apr 14, 2016 12:51 pm

BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
I was thinking of the Pentium's two slightly different execution pipelines (if I'm using the right terminology).
Thu Apr 14, 2016 6:15 pm

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
I was reading up on the Pentium, and yes, Thor is a NAP processor (Not A Pentium). I think it's sensible to relegate infrequently used instructions to a single functional unit, so in that respect Thor is somewhat like the Pentium with its two different U and V pipelines. The Pentium is a slightly different architecture, though. Thor organizes instructions into a reorder buffer, or queue. Instructions at the head of the queue are committed to the machine's state, and new instructions are added at the tail of the queue as room permits. It's a bit like a pipeline whose length adjusts dynamically. A pipeline acts like a conveyor belt with a fixed distance to travel: things are placed at one end and fall off the other. A couple of newer architectures use the conveyor belt concept. I believe some newer processors also use reorder queues rather than strict pipelines. (The reorder queue idea has been around for a while.)
I fixed a software bug and Thor gets a few lines further yet. I had coded an expression like "ptr = &stack[nn] + 511" when I really wanted "ptr = &stack[nn][511]". In the first case the compiler multiplied the 511 by the size of a whole stack (8192), because &stack[nn] points to an entire stack array rather than to its first element, and adding an integer to a pointer scales the integer by the size of the pointed-to type. So it set the top of stack to some strange place. Pointer arithmetic is narrower than it looks: you can add an integer to a pointer, and you can subtract two pointers of the same type into the same array to get an index difference, but adding two pointers isn't allowed, and going past the end of the array is undefined.
Fri Apr 15, 2016 2:50 pm

MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
Rob:
Just wanted to let you know I am reading this thread. I am interested in your problems and their solutions. I have an interest in the processor core for a project of my own. Naturally, since it would use IP not of my design, I'm interested in understanding the problems that I might encounter.
Fri Apr 15, 2016 4:13 pm

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
I'm working on another problem now. Got to LED status 56, which is right at the end of a subroutine. It looks like something is screwy with the subroutine return. I'm guessing it's a software problem with vars on the stack (e.g. the return address) getting messed up. I also note that printf() is called by the routine and doesn't display anything, so it's likely something to do with the video processing exception. Just testing the theory now by removing the printf() (the call to the video BIOS).
I'm also working on another core (acronym DSD2), which is like a streamlined version of Thor (but not software compatible). DSD2 has only four different instruction sizes rather than Thor's eight, and it doesn't do some of the things that Thor does, so the decoding is a bit simpler. DSD2 works on the notion of a short queue and hence doesn't support predication. It should be a good 20 to 30% smaller (or more) than Thor, and hopefully faster as well. Sure, Thor can execute two instructions at a time, but he's not particularly fast: the core is running at only 25 MHz. I expect that the RISC-V compatible core (another core I've worked on) running at 75 MHz is probably faster, even though it probably takes an average of 2-3 clocks per instruction.
Sat Apr 16, 2016 6:08 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Confounded DDR2. Can't get the DRAM to work reliably; SRAM is so much simpler to work with. A number of OS vars are stored in the DRAM, so things can't boot without working RAM. On readback from a checkerboard test it responds with:
    55555555 55555555 55555555 55555555 AAAAAAAA 55555555 AAAAAAAA 55555555
It's supposed to be alternating A's and 5's:
    AAAAAAAA 55555555 AAAAAAAA 55555555 AAAAAAAA 55555555 AAAAAAAA 55555555
Looks like there may be a bad address connection or a timing problem. I'm trying to figure out which address line is messed up. Maybe I can only use half the RAM: it looks like if A4 is low it doesn't work, but it might if A4 is high.
Sun Apr 17, 2016 6:24 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
Progress comes in little bits at a time. On the chance that the DDR RAM was actually working, I decided to try bypassing the data cache for my RAM test routine. Sure enough, with the cache bypassed the RAM test routine ran successfully! The problem has something to do with the data cache when DDR memory is accessed. I used the volatile load instructions rather than the regular load instructions and presto, things worked.
Wed Apr 20, 2016 8:39 am

robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
A eureka moment has arrived. I found a problem with the RAM interface. When I put the system together, I built a nice pipelined scratch RAM (BRAM) interface that can return data every clock cycle after the pipeline startup delay. This was interfaced to Thor using an eight-word burst access mode controller, and the same controller is used for all memory accesses. Unfortunately, the DRAM interface I built only supports four-word burst accesses, so the controller would return the first four words properly and scramble the rest. I'm amazed the system worked as well as it did. The system can now get through a RAM test routine successfully, but it still hangs, so there is another problem out there.
Wed Apr 20, 2016 8:08 pm