Cellular Automata

Optimisation

Simulating and displaying massively-parallel hardware systems - such as cellular automata - on conventional, serial von Neumann-architecture computers can easily mop up CPU cycles - and performance problems can appear rather rapidly.

This article looks at some methods that have been developed historically in an attempt to cope with such problems.

It attempts to concentrate on issues that transcend programming languages - and tries to avoid giving too much optimisation advice that is not specific to cellular automata.

Some of the methods described are simple and obvious. Others require significant programming skill to implement well.

Before we start:

Cautionary notes

  • Knuth

    We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

    - Donald Knuth

    I'm sure you've heard this before, but I have to say it again.

    Optimisation is probably the last thing you should be doing. Get your program to work properly first, and then profile it, and see if it needs further work.

  • The Pareto Principle

    This is often known as the 80/20 rule. Some think that should be called the 90/10 rule.

    There are various statements of it that relate to computer programming:

    • 80% of the bugs lie in 20% of the code;

    • 80% of the budget is spent on 20% of the project;

    • 80% of the code is written by 20% of the programmers;

    ...I'm sure you get the idea.

    In the context of optimisation we have:

    • 80% of the time is spent in 20% of the code;

      • Moral: profile your code - and concentrate your efforts where they can do some good;

    • 80% of the improvement comes in with the first 20% of the effort;

      • Moral: give up early - optimisation can rapidly run into diminishing returns;

  • Maintenance

    Optimisation is the sworn enemy of maintainable, comprehensible code. In some cases the two aims coincide - but this occurs more by chance than anything else. If you think you (or anyone else) will want to maintain, modify or understand your code in the future, don't optimise it too far.

  • Meddling

    Optimisation can introduce bugs that were not present before. To a first approximation, if your program works, leave it well alone.

  • Optimising as-you-go

    The general rule of thumb is don't optimise as you go along.

    You might think this will avoid a costly rewrite when you find the program does not perform - but in practice optimisation can be applied more effectively and efficiently to an existing working program.

    It's harder to predict where the bottlenecks are going to be than it is to use tools to measure what the code is doing once it has been written.

    There is sometimes a place for optimising as you go - but it is usually not a good idea.

  • Cost

    Programmer time is expensive. Hardware is often relatively cheap - and Moore's law suggests it will get cheaper as time passes. Make sure you don't make the wrong tradeoff here.

  • Proverbs

    • The best is the enemy of the good.

    • Working toward perfection may prevent completion.

    • Complete it first, then perfect it.

    • The part that needs to be perfect is usually small.

OK - if you have digested that lot and still want to proceed, then we'll begin.

Optimisations

  1. Naive implementation
  2. Buffer swap
  3. Bit packing
  4. Lookup table (LUT)
  5. Single buffer
  6. Greedy reading
  7. Gigantic lookup table (GLUT)
  8. Sparse matrix techniques
  9. Only redraw changed regions
  10. Update as you go along
  11. Don't redraw too fast
  12. Adaptive runtime optimisation
  13. Avoid unnecessary boundary checks
  14. Run directly in the display buffer
  15. Avoid simple boundary implementations
  16. Use low-level programming
  17. Take advantage of the hardware

  1. Naive implementation
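
    • This is the baseline that everything else gets measured against: a full scan of the grid every generation, the whole neighbourhood re-read for every cell, and the results written to a second array. A minimal sketch - assuming, purely for illustration, a two-state Moore-neighbourhood rule (Conway's Life) on a fixed-size grid with dead cells beyond the edges:

        #define W 256
        #define H 256

        /* naive update: visit every cell, re-count its whole
           neighbourhood, write the result to a separate array */
        void step_naive(unsigned char cur[H][W], unsigned char next[H][W])
        {
            for (int y = 0; y < H; y++) {
                for (int x = 0; x < W; x++) {
                    int n = 0;
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            if (dx == 0 && dy == 0)
                                continue;
                            int nx = x + dx, ny = y + dy;
                            if (nx >= 0 && nx < W && ny >= 0 && ny < H)
                                n += cur[ny][nx];
                        }
                    }
                    /* Life: birth on 3 neighbours, survival on 2 or 3 */
                    next[y][x] = (n == 3) || (cur[y][x] && n == 2);
                }
            }
        }

      Nothing clever happens here - which is rather the point: this is the version you profile before deciding which of the techniques below are worth the trouble.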

  2. Buffer swap
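
    • Keep two grids, compute the new generation into the spare one, and then exchange the roles of the two by swapping a pair of pointers - rather than copying the whole new grid back over the old one. A sketch, reusing the step_naive() routine from the sketch above:

        /* two grids; after each generation the "next" grid becomes
           the "current" one by swapping pointers - no per-cell copy */
        static unsigned char grid_a[H][W], grid_b[H][W];

        void run(int generations)
        {
            unsigned char (*cur)[W]  = grid_a;
            unsigned char (*next)[W] = grid_b;

            for (int i = 0; i < generations; i++) {
                step_naive(cur, next);
                unsigned char (*tmp)[W] = cur;    /* swap the buffer   */
                cur  = next;                      /* roles rather than */
                next = tmp;                       /* copying H*W cells */
            }
        }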

  3. Bit packing
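
    • Storing one cell per byte (or, worse, per int) wastes memory and - more importantly - cache. With two-state automata you can pack 32 or 64 cells into each machine word, so a whole word of cells can be loaded or stored at once and the grid shrinks by a factor of eight or more. A sketch of the storage layout and access helpers, assuming 32-bit words and a width that is a multiple of 32:

        #include <stdint.h>

        #define W 256                        /* width in cells         */
        #define H 256
        #define WORDS_PER_ROW (W / 32)       /* 32 cells per word      */

        static uint32_t grid[H][WORDS_PER_ROW];    /* one bit per cell */

        static inline int get_cell(int x, int y)
        {
            return (grid[y][x >> 5] >> (x & 31)) & 1;
        }

        static inline void set_cell(int x, int y, int alive)
        {
            uint32_t mask = (uint32_t)1 << (x & 31);
            if (alive)
                grid[y][x >> 5] |= mask;
            else
                grid[y][x >> 5] &= ~mask;
        }

      The real win comes when the update rule itself is rewritten in terms of bitwise operations on whole words, so that 32 or 64 cells are advanced per instruction - but even the reduction in memory traffic on its own can be worthwhile.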

  4. Lookup table (LUT)
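
    • Work out the new state for every possible neighbourhood configuration once, up front, and then replace the rule evaluation in the inner loop with a single indexed read from a table. For a two-state, nine-cell neighbourhood the table only needs 2^9 = 512 entries. A sketch (Life again, interior cells only):

        #define W 256
        #define H 256

        /* 512-entry table indexed by the nine neighbourhood bits */
        static unsigned char lut[1 << 9];

        /* fill the table once, before the simulation starts */
        void build_lut(void)
        {
            for (int bits = 0; bits < (1 << 9); bits++) {
                int self = (bits >> 4) & 1;      /* centre cell = bit 4 */
                int n = 0;
                for (int i = 0; i < 9; i++)
                    if (i != 4)
                        n += (bits >> i) & 1;
                lut[bits] = (n == 3) || (self && n == 2);
            }
        }

        /* inner loop: pack nine cells into an index, do one table read;
           assumes 1 <= x < W-1 and 1 <= y < H-1 (or a guard border)    */
        unsigned char next_state(unsigned char cur[H][W], int x, int y)
        {
            int bits = 0, i = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    bits |= cur[y + dy][x + dx] << i;
                    i++;
                }
            return lut[bits];
        }

      For a rule as cheap as Life the gain is modest, but for expensive rules - or multi-state automata - it can be substantial. The index can also be maintained incrementally as x advances, shifting in one new column of cells rather than re-reading all nine - and the table needs to stay small enough to sit comfortably in the cache.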

  5. Single buffer

  6. Greedy reading

  7. Gigantic lookup table (GLUT)

    • This is a generalisation of the look-up table approach.

      Rather than processing one cell at a time, you combine multiple cells into a block, combine their neighbourhoods, and use that to index into an array.

      The same comments that were given about cache usage apply (in spades) here.

      As in the greedy reading approach, you should ideally attempt to maximise the number of cells you deal with at once while minimising the size of the neighbourhood you need to read in order to calculate the results - all while remaining within your memory constraints. This favours dealing with near-round regions in one chunk - rather than with long, thin ones.
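
      By way of illustration, here is about the simplest possible case: a two-state, one-dimensional, radius-1 rule, with the universe packed eight cells to a byte. Ten input bits - a block of eight cells plus one neighbouring cell on each side - completely determine the next state of those eight cells, so a 1024-entry table of bytes advances the universe a whole byte at a time:

        #include <stdint.h>

        #define LINE_BYTES 128    /* 1D universe of 128 * 8 = 1024 cells */

        /* GLUT: 10 input bits (8 cells plus a neighbour at each end)
           -> the next state of all 8 cells, as one byte               */
        static uint8_t glut[1 << 10];

        /* the underlying rule applied to one cell (rule 110 here) */
        static int rule1(int left, int self, int right)
        {
            return (110 >> ((left << 2) | (self << 1) | right)) & 1;
        }

        /* build the table once: cell j of the block is input bit j+1;
           input bit 0 is the left outside neighbour, bit 9 the right */
        void build_glut(void)
        {
            for (int in = 0; in < (1 << 10); in++) {
                uint8_t out = 0;
                for (int j = 0; j < 8; j++) {
                    int left  = (in >> j)       & 1;
                    int self  = (in >> (j + 1)) & 1;
                    int right = (in >> (j + 2)) & 1;
                    out |= (uint8_t)(rule1(left, self, right) << j);
                }
                glut[in] = out;
            }
        }

        /* advance one generation, eight cells per table lookup
           (bit j of byte i holds cell i*8 + j; wrap-around edges) */
        void step_glut(const uint8_t cur[LINE_BYTES], uint8_t next[LINE_BYTES])
        {
            for (int i = 0; i < LINE_BYTES; i++) {
                uint8_t lbyte = cur[(i + LINE_BYTES - 1) % LINE_BYTES];
                uint8_t rbyte = cur[(i + 1) % LINE_BYTES];
                int in = ((lbyte >> 7) & 1)      /* left neighbour    */
                       | (cur[i] << 1)           /* the 8 block cells */
                       | ((rbyte & 1) << 9);     /* right neighbour   */
                next[i] = glut[in];
            }
        }

      In two dimensions the same idea applies, but the table grows very quickly with the size of the block and its surrounding neighbourhood - which is why the cache and block-shape considerations above matter so much.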

  8. Sparse matrix techniques
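
    • If the population is tiny compared with the size of the universe, scanning every cell is almost all wasted effort. Instead, keep track of where the live cells are, and each generation only visit those cells and their immediate neighbours - nothing else can possibly change. A sketch of the bookkeeping for Life; for simplicity the scratch arrays here are still full-sized, so only the work is sparse - a hash table keyed on coordinates would make the storage sparse as well:

        #define W 256
        #define H 256

        typedef struct { short x, y; } Coord;

        static unsigned char alive[H][W];     /* current state            */
        static unsigned char count[H][W];     /* scratch neighbour counts */
        static unsigned int  seen[H][W];      /* generation stamps        */
        static unsigned int  generation;

        static Coord live[W * H];             /* coordinates of live cells */
        static int   nlive;

        void step_sparse(void)
        {
            static Coord cand[W * H];         /* candidates this step */
            int ncand = 0;

            generation++;

            /* every cell in the 3x3 block around a live cell is a
               candidate; register each once and bump neighbour counts */
            for (int i = 0; i < nlive; i++) {
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int x = (live[i].x + dx + W) % W;   /* toroidal */
                        int y = (live[i].y + dy + H) % H;
                        if (seen[y][x] != generation) {
                            seen[y][x] = generation;
                            count[y][x] = 0;
                            cand[ncand].x = (short)x;
                            cand[ncand].y = (short)y;
                            ncand++;
                        }
                        if (dx || dy)
                            count[y][x]++;
                    }
                }
            }

            /* apply the rule to the candidates only, and rebuild the
               live-cell list for the next generation */
            nlive = 0;
            for (int i = 0; i < ncand; i++) {
                int x = cand[i].x, y = cand[i].y;
                int n = count[y][x];
                int next = (n == 3) || (alive[y][x] && n == 2);
                alive[y][x] = (unsigned char)next;
                if (next) {
                    live[nlive].x = (short)x;
                    live[nlive].y = (short)y;
                    nlive++;
                }
            }
        }

      The pay-off depends entirely on the population density: for sparse patterns the cost per generation is proportional to the number of live cells rather than to the area of the grid - but for dense ones the extra bookkeeping makes it slower than a plain scan.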

  9. Only redraw changed regions

  10. Update as you go along

  11. Don't redraw too fast

  12. Adaptive runtime optimisation

  13. Avoid unnecessary boundary checks
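
    • Testing whether each neighbour lies inside the grid costs something for every single cell, even though the answer only ever differs along the edges. One common dodge (sketched here on the assumption that cells beyond the edges are meant to behave as permanently dead) is to store the grid with a one-cell border of "ghost" cells around it, so the inner loop can read neighbours unconditionally; another is to peel the border rows and columns off into their own loops and keep the interior loop test-free.

        #define W 256
        #define H 256

        /* the stored arrays are two cells wider and taller than the
           universe; the outer ring stays permanently dead, so neighbour
           reads never need a bounds test                               */
        static unsigned char cur[H + 2][W + 2];
        static unsigned char next[H + 2][W + 2];

        void step_padded(void)
        {
            for (int y = 1; y <= H; y++) {
                for (int x = 1; x <= W; x++) {
                    int n = cur[y-1][x-1] + cur[y-1][x] + cur[y-1][x+1]
                          + cur[y  ][x-1]               + cur[y  ][x+1]
                          + cur[y+1][x-1] + cur[y+1][x] + cur[y+1][x+1];
                    next[y][x] = (n == 3) || (cur[y][x] && n == 2);
                }
            }
        }

      For wrap-around (toroidal) universes the same trick works if the ghost border is refreshed with copies of the opposite edges before each generation.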

  14. Run directly in the display buffer

  15. Avoid simple boundary implementations

  16. Use low-level programming

  17. Take advantage of the hardware

If you have got as far as this, you will probably have noticed that the list got a bit anal towards the end.

I list some of these techniques for the sake of completeness - not because anyone should necessarily be using them.

I wish you happy optimisations.

Don't forget to count to ten before embarking.


Tim Tyler | tim@tt1.org | http://cell-auto.com/