Beautiful graphics from simple math – Mandelbrot set on FPGA, Part 1

Introduction to the Mandelbrot set

Like I mentioned in the previous post, I wanted to familiarize myself with the HDMI output on the DE10-nano FPGA board, and there is no harm in getting some pretty graphics along the way, so I decided to implement a Mandelbrot set viewer on FPGA.

If you are not familiar with the Mandelbrot set, it is a type of a fractal, which is simply put an infinitely – repeating recursive pattern. That means that you can zoom into the set and discover ever repeating, self-mirroring patterns. They are also a little weird ;). You can read more about them on the linked Wikipedia article.

The familiar look of the Mandelbrot set is depicted in this picture:

Mandelbrot set

Mandelbrot set

For the formal definition of the Mandelbrot set, I’ll just quote the Wikipedia article:

The Mandelbrot set is the set of values of c in the complex plane for which the orbit of critical point z= 0 under iteration of the quadratic map  remains bounded. Thus, a complex number c is a member of the Mandelbrot set if, when starting with z0 = 0 and applying the iteration repeatedly, the absolute value of zn remains bounded for all n > 0.

For example, for c = 1, the sequence is 0, 1, 2, 5, 26, …, which tends to infinity, so 1 is not an element of the Mandelbrot set. On the other hand, for c = −1, the sequence is 0, −1, 0, −1, 0, …, which is bounded, so −1 does belong to the set.

Which put in simple terms means that the Mandelbrot set can be calculated simply by iteratively calculating the equation for all “interesting” complex numbers , and starting the iteration with . The complex number is expressed as x + iy, where x and y can be mapped to the horizontal and vertical position on the screen. If the iteration doesn’t escape to infinity, then is part of the set.

That handles the “black&white” part. You can then decide how to color the image. A popular (and simple) algorithm you can use is the Escape time algorithm, which selects the color of each individual pixel based on whether the pixel is part of the set or not, and if not, how many iterations it took to “escape”.

The simplified Escape time algorithm can be represented in pseudocode like this:

x2 := 0
y2 := 0

while (x2 + y2 ≤ 4 and iteration < max_iteration) do
  y := 2 × x × y + y0
  x := x2 - y2 + x0
  x2 := x × x
  y2 := y × y
  iteration := iteration + 1

The Mandelbrot set colored with the Escape time algorithm has some properties that are very helpful if you want to calculate it on an FPGA:

  • the set is closed and contained in a closed disk of radius 2 – if the distance from the origin (starting point) is bigger than 2 for the current iteration, then that origin is not part of the set, and you can stop iterating
  • since the set has a limited radius, that means that the range of numbers we need to work with is also limited – important if you want to use integer only math
  • the pixel values are independent of one another – you can calculate all pixels in parallel
  • not a lot of iterations are needed for each pixel
  • the equation to calculate it is simple, probably the simplest of all fractals that I know of


So, to implement the algorithm on an FPGA, you can already see the requirements:

  • memory to hold the calculated pixel values – how much depends on the FPGA resources, and the amount of available memory will determine the maximum resolution
  • the maximum number of iterations will be also limited by the memory – using 8, 9 or 10 bit memory width will provide the best resource utilization
  • as can be seen from the pseudocode above, a lot of multipliers will be needed, ideally, you would use all the FPGA DSP elements available
  • the multipliers will need to be quite wide, at least 27 bits, since you’d want to zoom deep into the Mandelbrot set


This covers the introduction to the Mandelbrot set. I’ll go deeper into the implementation in the next post. Cheers!


An update on minimig, and a (late) Happy New Year!

I know there wasn’t much updates on the minimig status, but with good (;)) reason: I’ve been pretty busy, so I had little time to work on minimig updates. I did spend quite some time on rewriting the CPU <-> minimig interface, which will hopefully reduce timing problems and enable me to finally add the ethernet Zorro interface. The CPU clock will also be reduced to 28MHz, hopefully that won’t affect the CPU speed much. This is quite a complex task, so don’t expect results immediately.

On the other hand, I’m trying to fix some smaller problems and bug reports, and the intermediate beta releases will be available as I work toward the v1.3 release. I already have the supporting scripts ready that will help me build the firmware & core, increase build numbers, git tag & commit, zip it up and upload it to the server. First of these releases is already available here:

Or, alternatively, there will be a .zip file that will always contain the last released version here:

Don’t expect too many changes though – each new release will probably only contain two or three small updates. In this release, just two changes are important: fixing the SDRAM refresh counter, which should help with RAM stability, and the OSD display fix for productivity video modes. More to follow soon, though! 😉



The latest beta builds will be available here:



qSoC – The QMEM bus

QMEM bus specification

This post describes the QMEM bus, the different cycles allowed, the bus elements and different bus configurations supported.

1. Introduction
2. Features
3. Signals Description
4. Cycles Description
5. Bus Elements
6. Bus Configurations


QMEM (abbreviated from quick memory), is a flexible, portable, simple and fast system interconnect bus, specifically targeted at SoC systems for their inter-chip communication needs.

QMEM is based on synchronous memory bus with added flow control signals, which makes it very simple and fast. The origin of QMEM is the OR1200 open-source CPU implementation, where it was used as a tightly-coupled memory (TCM) bus inside the CPU.


  • flexible: endian-independent, supports different data and address widths, flexible access speeds, flexible interconnect methods like point-to-point, shared bus, multi-layered interconnect
  • portable: fully vendor-, tool-, language- and technology independent
  • simple: based on synchronous memory bus with added flow control, it is the simplest bus with minimal bus interconnect logic
  • fast: fully pipelined, single cycle reads and writes, with no setup or end cycles
  • extensible: allows any number of transfer tags added to support different features, like master identification, slave error reporting, etc

Signals description

  • cs – master output signal denoting valid master cycle when cs=’1′, and idle cycle when cs=’0′
  • we – master read/write select signal, denotes write cycle when we=’1′, and read cycle when we=’0′
  • sel – master byte select signal, one bit for each byte in data words, generally only taken into account during write cycles and ignored during read cycles, sel=’1111′ denotes four bytes, sel=’0011′ denotes lower two bytes, sel=’0100′ denotes single byte
  • adr – master address signal
  • dat_w – master write data
  • dat_r – slave read data
  • ack – slave cycle acknowledge, asserted and valid when master cs=’1′

All signals are active-high. Optionally, clock (clk) and reset (rst) signals can be considered part of the QMEM bus, especially if the bus uses different clock domains that the rest of master or slave logic. Other common optional signals are slave error response (err), and the master id signal (mid).

The master can start the cycle at any time (synchronously to the clock), by asserting cs=’1′, and set any other signals as appropriate. The cycle ends with the slave acknowledge (ack=’1′). After the slave acknowledges the cycle, the master is free to start a new cycle immediately, by keeping cs=’1′, or going to an idle state by asserting cs=’0′.

Cycles description

Reset condition

In reset state (rst=’1′), all QMEM bus signals should be ignored, and their state can be undefined. The first cycle out of reset should be initialized as an IDLE cycle.

IDLE cycle

IDLE cycle is denoted when master cs=’0′ and slave ack=’0′. There is no activity on the bus, other than the possible slave read data (dat_r), if the previous cycle was a read cycle.

QMEM reset state & idle cycle

QMEM reset state & idle cycle

WRITE cycles

A WRITE cycle is denoted when master asserts cs=’1′, we=’1′, and puts the desired address on adr, the byte-select on sel and data to be written on dat_w. The master must not change any of its signals, or stop the cycle by asserting cs=’0′, without receiving the slave acknowledge (ack=’1′) first. The slave can insert any number of delay cycles by holding ack=’0′ while the master asserts cs=’1′, until it is ready to service masters’ request. The master can start a new cycle immediately after synchronously detecting slave acknowledge response (ack=’1′).

QMEM single write cycle with no delay

QMEM single write cycle with no delay

QMEM single write cycle with 1 cycle delay

QMEM single write cycle with 1 cycle delay

QMEM multiple write cycles with no delay

QMEM multiple write cycles with no delay

QMEM multiple write cycles with 1 cycle delay

QMEM multiple write cycles with 1 cycle delay

READ cycles

A READ cycle is denoted when master asserts cs=’1′, we=’0′, and puts the desired address on adr, and the byte-select on sel . The master must not change any of its signals, or stop the cycle by asserting cs=’0′, without receiving the slave acknowledge (ack=’1′) first. The slave can insert any number of delay cycles by holding ack=’0′ while the master asserts cs=’1′, until it is ready to service masters’ request. The master can start a new cycle immediately after synchronously detecting slave acknowledge response (ack=’1′).

QMEM single read cycle with no delay

QMEM single read cycle with no delay

QMEM single read cycle with 1 cycle delay

QMEM single read cycle with 1 cycle delay

QMEM multiple read cycles with no delay

QMEM multiple read cycles with no delay

QMEM multiple read cycles with 1 cycle delay

QMEM multiple read cycles with 1 cycle delay

MIXED cycles

A QMEM master is free to mix READ, WRITE and IDLE cycles any way it chooses. The slave must be ready to respond to a master WRITE cycle, even if it is in the same clock period as the previous cycle’s master read request, since reads are pipelined.

QMEM mixed cycles with no delay

QMEM mixed cycles with no delay

QMEM mixed cycles with 1 cycle delay

QMEM mixed cycles with 1 cycle delay

ERROR cycle

The QMEM bus has an optional err signal. The slave must keep this signal tied to ground (err=’0′), unless it wishes to communicate an error condition to the master. The slave can do that by asserting err=’1′ at the same time it is acknowledging the cycle with ack=’1′. Usually, the error signal is high if the slave is in reset. Another case where a slave might raise the error condition, is if the master is trying to address a memory or register that is bigger than the size of the slave memory.

QMEM error cycle

QMEM error cycle

QMEM bus elements

QMEM bus has four major bus components: masters, slaves, arbiters and decoders.

QMEM master

A QMEM master is a master device on the bus. It can start cycles, set bus direction and number of bytes affected, sets data to be written or reads data from slaves.

An example of a QMEM master (non-synthesizeable) can be seen here: qmem_master.v

QMEM slave

A QMEM slave responds to master cycles, either writing the master data to its memory or registers, or reading its memory or registers and sending them to the master.

An example of a QMEM slave (non-synthesizeable) can be seen here: qmem_slave.v

QMEM arbiter

An arbiter is a bus element that decides which master has access to a slave device in a given cycle. Each QMEM slave that is accessed by multiple masters must have an arbiter. An arbiter can grant masters access to the slave on priority-basis, it can use a round-robin scheme, or a combination of the two.

An example of a priority-based QMEM arbiter (synthesizeable) can be seen here: qmem_arbiter.v

QMEM decoder

An decoder is a bus element that directs master requests to an appropriate slave, based on a slave decoding scheme, which is usually address-based. Each QMEM master that accesses multiple slaves must have a decoder. This only applies to immediate connections, so in a shared bus configuration where the masters connect to a single arbiter (= single slave), there is no need for a decoder on the master’s side (the slave side of the arbiter in this bus configuration could still have a slave decoder attached, if there are multiple slaves).

An example of a QMEM decoder (synthesizeable) can be seen here: qmem_decoder.v

There are other elements that can be attached to a QMEM bus, like a bus register stage, which can register master, slave, or both side of the bus, a bus monitor that validates bus signals according to the rules, and bus converters, which can convert form and to QMEM bus from other bus architectures, like APB, AHB and Wishbone. These still need to be written, so I’ll write a

QMEM bus configurations

A QMEM bus can be built in multiple different configurations, depending on the speed or logic utilization needs. Some of the common configurations are: shared bus, point-to-point and multilayer.

In the following graphs, the mX represents masters, aX aribters, dX decoders and sX slaves.

QMEM shared bus

QMEM shared bus

QMEM point-to-point

QMEM point-to-point bus

QMEM multilayer bus

QMEM multilayer bus

CPU (from:

qSoC – The OR1200 CPU

The OR1200 is a RISC-type, Harvard architecture (separate instruction and data buses) synthesizable CPU core, written by the OpenCores community.

It can be configured with a number of optional components, such as cache, MMU, FPU, timer, programmable interrupt controller, debug unit, etc. For sake of simplicity, I decided to disable most of the optional components, except the hardware multiplier and divider, all other features will be added if/when needed.

The OR1200 has standard GNU tools available, like gcc, binutils, and a few standard libraries, like uClibc and newlib. There is also an official port of linux available.

One of the requirements was that the CPU would be able to run with a 50MHz frequency on a CycloneIII – class FPGA. The hardware divider implementation used in OR1200 is a 8-cycle divider, which was on a critical timing path, and needed to be reduced to a 16-cycle divider, which, while slow, is still heaps faster than a software division implementation. The changes are in this commit: 01a18ffbbb86b074. There’s a testbench for the updated division code in this commit: 8232f830c60a1166.

There are other settings (defines) available in the or1200_defines.v file, you can tune some of them to get a better utilization or speed from the code. One of the things that I changed to get some more speed was the type of the compare used in the ALU (the change is in this commit: 337217f9511888c2).

The OR1200 version used here is not exactly in sync with the official OR1200 repository, as this core was split from the original a long time ago and changed a lot and the changes didn’t propagate back to the original repo. One of the important changes, besides some bug fixes, is the different bus – QMEM, replacing the original Wishbone bus.

With most of the optional components removed, and with the QMEM bus, the OR1200, as used in this project, is a nice, small but fast little softcore CPU.