CPU (from: https://www.flickr.com/photos/2top/10402551773/)

qSoC – The OR1200 CPU

The OR1200 is a RISC-type, Harvard architecture (separate instruction and data buses) synthesizable CPU core, written by the OpenCores community.

It can be configured with a number of optional components, such as a cache, MMU, FPU, timer, programmable interrupt controller, debug unit, etc. For the sake of simplicity, I decided to disable most of the optional components, except the hardware multiplier and divider. All other features will be added if/when needed.

The OR1200 has standard GNU tools available, like gcc and binutils, and a few standard libraries, like uClibc and newlib. There is also an official port of Linux available.

One of the requirements was that the CPU should be able to run at 50MHz on a Cyclone III-class FPGA. The hardware divider implementation used in the OR1200 is an 8-cycle divider, which was on a critical timing path. It needed to be relaxed to a 16-cycle divider, which does less work per clock cycle and, while slower, is still heaps faster than a software division implementation. The changes are in this commit: 01a18ffbbb86b074. There’s a testbench for the updated division code in this commit: 8232f830c60a1166.
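To see where the cycle counts come from, here is a minimal Python model of an iterative restoring divider. It is only a sketch, not the OR1200 RTL: it assumes 32-bit unsigned operands, and `bits_per_cycle` is a made-up parameter for illustration. Resolving 4 quotient bits per clock takes 8 cycles, while resolving 2 bits per clock takes 16 cycles but chains fewer subtract/compare stages within each cycle, which is what eases the timing.

```python
def iterative_divide(dividend, divisor, bits_per_cycle, width=32):
    """Model of an unsigned restoring divider.

    Returns (quotient, remainder, cycles). Each modeled clock cycle
    resolves `bits_per_cycle` quotient bits, one bit per chained
    subtract/compare stage.
    """
    if width % bits_per_cycle != 0:
        raise ValueError("width must be a multiple of bits_per_cycle")
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    quotient = 0
    remainder = 0
    cycles = 0
    bit = width - 1
    while bit >= 0:
        # One clock cycle: `bits_per_cycle` chained subtract/compare stages.
        for _ in range(bits_per_cycle):
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
            bit -= 1
        cycles += 1
    return quotient, remainder, cycles


print(iterative_divide(1000, 7, bits_per_cycle=4))  # (142, 6, 8)
print(iterative_divide(1000, 7, bits_per_cycle=2))  # (142, 6, 16)
```

Both calls return the same quotient and remainder; only the number of cycles, and with it the amount of logic between two register stages, differs.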

There are other settings (defines) available in the or1200_defines.v file; you can tune some of them to get better resource utilization or speed from the code. One of the things that I changed to get some more speed was the type of compare logic used in the ALU (the change is in this commit: 337217f9511888c2).

The OR1200 version used here is not exactly in sync with the official OR1200 repository: this core was split from the original a long time ago and has changed a lot since, and the changes didn’t propagate back to the original repo. One of the important changes, besides some bug fixes, is the different bus: QMEM replaces the original Wishbone bus.

With most of the optional components removed, and with the QMEM bus, the OR1200, as used in this project, is a nice, small but fast little softcore CPU.

SoC

qSoC, or how to build an FPGA SoC from scratch

Introduction

I’d like to talk about how to build a fast, lean, clean SoC machine. What is an SoC? An SoC, or a System on a Chip, is, simply put, a microprocessor with some common peripherals attached, like ROM, RAM, an SDRAM controller, UART, SPI, GPIO, and other I/O ports or protocols, all tied up into a system. An SoC is not unlike a microcontroller, which also has a microprocessor bundled with some sort of memory and I/O; the distinction is more or less cosmetic.

An SoC can be used standalone, running bare-metal code or some form of operating system and communicating with the world outside of the FPGA. Alternatively, it can be used as a part of a larger system, incorporating bigger logic blocks and acting as a sort of support for them. The latter will be the focus of this and the following posts.

What I want from the SoC is a small footprint (low consumption of FPGA resources), good processing speed, and a well-defined interface to other parts of the FPGA or the outside world. What I have in mind will need at least these components:

  • CPU
  • a well-defined bus interface
  • ROM for the bootloader or the whole firmware
  • RAM, either internal or external SDRAM
  • ‘registers’, which is a catch-all phrase for access to memory-mapped I/O and other functions
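As a rough illustration of the ‘registers’ item, here is a small Python sketch of memory-mapped address decoding: every bus address is routed to ROM, RAM or the register block based on the range it falls into. The region names, base addresses and sizes are invented for the example and are not the qSoC memory map.

```python
# Hypothetical memory map: (name, base address, size in bytes).
MEMORY_MAP = [
    ("rom",  0x0000_0000, 0x0000_2000),  # 8 KiB boot ROM
    ("ram",  0x0010_0000, 0x0000_8000),  # 32 KiB internal RAM
    ("regs", 0x0080_0000, 0x0000_1000),  # memory-mapped I/O registers
]


def decode(address):
    """Return (region name, offset inside the region) for a bus address."""
    for name, base, size in MEMORY_MAP:
        if base <= address < base + size:
            return name, address - base
    raise ValueError(f"address {address:#010x} is not mapped")


print(decode(0x0080_0004))  # ('regs', 4)
```

From the CPU's point of view, a write to an address in the `regs` range looks like any other store; the decoder is what turns it into an access to a peripheral instead of memory.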

CPU

So, let’s start with the CPU. There are plenty of open-source CPU architectures available, and I’ll definitely want to try and incorporate more than just one into this SoC, so there’s a choice of processing power vs. resource consumption. But for now, I’ll start with the OpenCores OR1200 (or a slightly modified version of it), since that is the CPU core I’m most familiar with. Later on, I’ll definitely look into adding at least an AVR, an ARM and a RISC-V variant to this SoC.

The OR1200 is a nice, small and relatively simple Harvard architecture RISC CPU, loosely based on the MIPS architecture and instruction set. It is pretty configurable, so you can, for example, disable the hardware multiplication and division support to save logic gates, and run a software implementation instead. It also has support for a cache and an MMU, and an optional FPU, but I’m going to strip it of all those (unnecessary) add-ons and just keep the CPU core.

Bus

Next on the list is the bus interface. In my opinion, a well-defined bus interface is a cornerstone of a good SoC design, and must not be overlooked or brushed over quickly. All of the components in an SoC will communicate through this bus, so it is best that it is well designed and thought through at the beginning, so that no strange errors pop up all of a sudden when you add a component somewhere down the line. There are quite a few bus architectures intended for SoC interconnect to choose from, like Wishbone, APB, AHB, etc., but I chose the QMEM bus, mainly for its simplicity and speed. As will be explained later, the QMEM bus is not much more than a standard synchronous memory bus with added flow control and optional tags attached to it.

Memory

There’s not much to say about memory, besides that it is needed. At the very least, a small ROM is needed for the bootloader or a sort of monitor program that can read and write memory and registers, and load firmware from the serial port or an SD card, but I’ll talk about that later. Besides the ROM, the CPU will need some form of RAM, either from the FPGA’s internal memory blocks, or an external memory like SRAM or SDRAM. The SoC should preferably support all of these variants.

I/O

The SoC will need support for some standard external communication protocols. The most important one is a UART, or a serial port, which can be used for debugging, controlling the SoC with the help of the monitor firmware, uploading new firmware, etc. Another very useful one is an SPI master, so the SoC can talk to an SD card and load files from it. A GPIO (general-purpose I/O) controller can be added to the SoC, so a range of pins can be controlled for digital I/O. I plan to add support for many other I/O channels later on, like a VGA controller, audio output (including a sigma-delta DAC and support for DAC ICs), and other such interfaces.

Other important components

The SoC needs a couple of other standard components, like clock management, a reset synchronization module, an interrupt controller, and a timer or two.

Frequency considerations

Another thing to consider is at what frequencies the SoC should run. Usually, in an ASIC product, this would need to be a balance between power consumption and processing speed, within the limits of the particular process node. Luckily, for an FPGA project, power consumption is not so important, especially taking into account the large static power consumption of FPGAs, so the frequency can be more easily selected based on actual needs.

Personally, I like the trio of 25/50/100MHz frequencies, and I’ll explain what I mean and why.

An SoC will very probably contain an SDRAM controller, and 100MHz is the maximum frequency of many SDRAM ICs. Of course, many can run at higher frequencies, up to 166MHz, but 100MHz is a safe bet that should work with any SDRAM IC.

Next up, 50MHz is in my opinion a nice operating speed for most of the logic in the FPGA, as it nicely balances the number of flip-flops required for the combinational logic between them to meet timing at this frequency, without wasting a lot of LEs. Any CPU for the qSoC should be capable of working (at least) at this frequency.

25MHz is the maximum speed at which an SD card operates over the SPI bus, and a 25MHz clock can be generated by a toggling flip-flop clocked at 50MHz.
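The divide-by-two behaviour of a toggling flip-flop can be checked with a tiny Python model. This is only a behavioural sketch with made-up function names, not RTL: the flip-flop inverts its output on every rising edge of the 50MHz clock, so its output completes one period every two input cycles.

```python
def divide_by_two(num_cycles):
    """Model a toggle flip-flop clocked by the faster clock.

    Returns the flip-flop output sampled after each rising edge of the
    input clock.
    """
    q = 0
    samples = []
    for _ in range(num_cycles):
        q ^= 1  # toggle on every rising edge of the input clock
        samples.append(q)
    return samples


def count_rising_edges(samples, initial=0):
    """Count 0 -> 1 transitions in a sampled waveform."""
    edges = 0
    previous = initial
    for value in samples:
        if previous == 0 and value == 1:
            edges += 1
        previous = value
    return edges


out = divide_by_two(50)  # 50 cycles of a 50MHz clock span 1 microsecond
print(count_rising_edges(out))  # 25 rising edges per microsecond, i.e. 25MHz
```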

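50MHz and 25MHz are also frequencies that can be used as pixel clocks for two VGA video resolutions: 640×480 VGA with a 25MHz pixel clock (the nominal value is 25.175MHz, which 25MHz approximates closely) and 800×600 SVGA with a 50MHz pixel clock (the 72Hz variant of that mode).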
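The relation behind these numbers is that the refresh rate equals the pixel clock divided by the total number of pixel periods per frame, blanking included. The short Python sketch below works this out for both modes. The total line and frame sizes are the commonly published timing figures for these modes, assumed here rather than taken from this post, so treat the results as approximate.

```python
def refresh_rate_hz(pixel_clock_hz, h_total, v_total):
    """Frame rate for a pixel clock and total (visible + blanking) timings."""
    return pixel_clock_hz / (h_total * v_total)


# Commonly published totals (assumed for this example).
VGA_640x480 = {"h_total": 800, "v_total": 525}
SVGA_800x600_72 = {"h_total": 1040, "v_total": 666}

vga = refresh_rate_hz(25_000_000, **VGA_640x480)
svga = refresh_rate_hz(50_000_000, **SVGA_800x600_72)
print(f"640x480 at 25MHz: {vga:.2f} Hz")
print(f"800x600 at 50MHz: {svga:.2f} Hz")
```

Under these assumptions the 25MHz clock gives a refresh rate of roughly 59.5Hz instead of the nominal 59.94Hz, and the 50MHz clock gives roughly 72.2Hz.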

There’s another benefit to using frequencies that are integer multiples of each other: you can program the PLL in the FPGA to generate these frequencies so that they are synchronous to each other, which means that you don’t have to use clock-domain crossing (CDC) logic, since all three clocks can be edge-aligned. This way, you save gates, plus you don’t have to deal with the headaches that CDC will definitely bring.

Code repository & structure

The repository for this SoC experiment (which I named qSoC for Qmem SoC) is here:

https://github.com/rkrajnc/qsoc

Currently, there’s only OR1200 CPU RTL code there, together with some QMEM modules and a testbench for the OR1200 divider.

I like to keep all files in the repository nicely organized into these directories:

  • rtl – for all common synthesizable Verilog / VHDL code
  • fw – for all common CPU firmware code
  • tools – for all tools / scripts needed for building / converting etc. files needed for the qSoC
  • bench – for Verilog top-level testbench files
  • ver – for any scripts needed for verification / benchmarking
  • fpga – for any FPGA board specific files, like Quartus or ISE project files, sdc files etc.

The directory structure might change in the future, as I’d like to keep the projects I build with this SoC in the same repo, so I’ll probably have to add a project directory, with any project-specific files, but I’ll cross that bridge when I get to it.

Boards

I plan to support at least these boards:

More could be added in the future.

Planned projects

The first project I plan to make is a simple WAV audio player, so I’ll be able to test the sigma-delta implementation used in the minimig. After that, probably something involving a VGA controller, a character generator, perhaps a whole system capable of running simple 2D SDL games. Another interest is definitely some sound processing / generation projects, like a MIDI synthesizer, or an FPGA guitar effect. We’ll see.

 

Coming up next, a few more words on the selected CPU and its modifications.

Pen

minimig AGA v1.2 for the MiST board released

There is a new release of minimig-AGA for the MiST board available, grab it on the minimig-mist page!

While the CPU is still not as stable as I’d like, the latest bitfield fixes by Till Harbaum fixed a lot of games and demos that didn’t work properly before. The OSD Turbo option is no longer needed or recommended for games or demos, but can still be used for a little speed-up in Workbench or programs. The F11 key was fixed: when HRTmon is enabled, F11 enters the monitor program; when it is disabled, F11 acts as a normal Amiga HELP key. There is a CD32Pad option in the OSD, but I didn’t have time to fix the CD32 joypad emulation, so the option doesn’t have any effect for now. Initial support for programmable display modes (productivity modes, like 640x480x60Hz) is available in this core, but there are still some problems with them: some pixels are missing on the right edge of the screen, and lowres sprites (the mouse pointer) seem to have some problems. You can switch to a hires pointer to get around this.

The planned ethernet support will unfortunately have to wait till the next release.

Please report any bugs or problems here:
https://github.com/rkrajnc/minimig-mist/issues (if you have a Github account)
– or –
https://gitreports.com/issue/rkrajnc/minimig-mist (no account needed)

Here are all of the updates made in this release:

  • initial implementation of programmable display modes
  • OSD HRTmon enable / disable switch is working (ON – F11 acts as NMI and enters HRTmon, OFF – F11 is HELP key)
  • proper handling of CPU access to chipRAM & custom regs – CPU waits for free slot when turbo option is disabled
  • fixed playfield 2 color lut offset when playfield 2 has priority for OCS/ECS modes
  • some IRQ changes
  • added support for undocumented Agnus / Denise behaviour when BPL=7 in ECS/OCS mode (fixes demos like SushiBoyz by Ghostown, Sliced&Diced by Dekadence, etc)
  • blitter line mode fixed (fixes demos like SushiBoyz & Sunglasses by Ghostown, Vectorize by RSi, etc)
  • bitfield instructions fixed, barrel shifter implemented by Till Harbaum
  • fixed colortable for sprites (bplxor) – fixes Alien Breed 3D (and probably many others)
  • fixed keyboard rate
  • kickstart ROM is also uploaded to mirror position (E0)
  • fixed reset problems & fastRAM disappearing
  • added CPU CACR cache control, fixed reading CACR reg
  • added scanlines for non-doublescanned config
  • fixed video dithering – both spatial & random dithering weren’t working properly

Enjoy!