Monthly Archives: October 2014

LMS syntax highlighting in Sublime Text 2

I’ve been working a bit on getting LMS syntax highlighting into Sublime Text 2.

This is what it looks like so far:

[Screenshot: LMS syntax highlighting in Sublime]

I’m struggling to get labels to highlight properly, but I’ve got a lot of the basics in there. It’s easy but boring to add all op codes, so I’ll add them as I go along. It will be fantastic!



For a while I’ve been wondering about the performance of the EV3. Since it has a 300 MHz ARM9 processor, I figured performance shouldn’t be an issue.

But after building a simple framework that draws one image using UI_DRAW BITMAP and two sprites that call UI_DRAW PIXEL for each pixel in the sprite, I am already using up 90 of the 100 ms I have for each frame. This basically means that to achieve 10 fps, I can’t do more than draw one background and two sprites.

So that is a bit disappointing!

There is a trick where you copy the current frame buffer to a temporary buffer and later copy that buffer back to the frame buffer, and it is a cheap operation. So I stopped calling UI_DRAW BITMAP every frame: I draw the background only once, copy the drawn image to a buffer, and then, for each frame, copy that buffer back to the frame buffer. That gets me down to 59 ms per frame, which is a bit nicer.
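In C terms, the buffer trick amounts to drawing the expensive background once, caching a copy, and restoring it with a single block copy every frame. A minimal sketch, assuming a 178 x 128 screen stored as one byte per pixel; `Screen`, `draw_background`, `cache_background` and `restore_background` are hypothetical names, not LMS2012 functions:

```c
#include <assert.h>
#include <string.h>

/* Assumed screen size: EV3 LCD of 178 x 128, stored as one byte per pixel. */
#define SCREEN_W 178
#define SCREEN_H 128

typedef struct {
    unsigned char pixels[SCREEN_W * SCREEN_H];
} Screen;

/* Hypothetical stand-in for the expensive UI_DRAW BITMAP background. */
static void draw_background(Screen *fb)
{
    for (int y = 0; y < SCREEN_H; y++)
        for (int x = 0; x < SCREEN_W; x++)
            fb->pixels[y * SCREEN_W + x] = (unsigned char)((x + y) & 1);
}

/* Done once: draw the background, then keep a copy of the frame buffer. */
static void cache_background(Screen *fb, Screen *cache)
{
    assert(fb != NULL && cache != NULL);
    draw_background(fb);
    memcpy(cache->pixels, fb->pixels, sizeof fb->pixels);
}

/* Done every frame: one block copy replaces redrawing the bitmap. */
static void restore_background(Screen *fb, const Screen *cache)
{
    assert(fb != NULL && cache != NULL);
    memcpy(fb->pixels, cache->pixels, sizeof fb->pixels);
}
```

Each frame then starts with restore_background and draws the sprites on top, so only the first frame pays for the full background draw.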

When I remove my own sprite drawing function but keep the buffer copying, the frame time drops from 59 ms to 0.3 ms, soooo... that’s embarrassing. :)

If I keep all my code but remove the call to UI_DRAW PIXEL, the frame time is 44 ms. So the actual drawing is somewhat expensive, but since a lot of frame time is used even without drawing, there is perhaps a chance to optimize this.

If I also remove the reading of the pixel from the sprite, I get down to a frame time of 21 ms. These 21 ms are all “my” stuff: my inner loop in the sprite drawing, my algorithm to sort the game objects (bubble sort ftw), my overhead for managing game objects, etc.

And if I remove the call to Sprite_Draw entirely I’m down to 1.5 ms. So the time is spent in that function.

So to summarize:

Full, old drawing: 90 ms
Drawing with the buffer trick: 59 ms
Drawing with the buffer trick but removing reading pixels from the sprite and writing them to the screen: 22 ms
Drawing with the buffer trick but removing the call to Sprite_Draw: 1.5 ms
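The cost of each piece is the difference between two of these measurements. A small sketch of that arithmetic; the constant names are hypothetical, and the values are the frame times listed above:

```c
#include <assert.h>

/* Measured frame times in milliseconds (from the summary above). */
#define FRAME_BUFFER_TRICK_MS  59.0  /* full drawing with the buffer trick */
#define FRAME_NO_PIXEL_IO_MS   22.0  /* no pixel reads, no UI_DRAW PIXEL */
#define FRAME_NO_SPRITES_MS     1.5  /* no Sprite_Draw calls at all */

/* Cost of a piece of code = frame time with it minus frame time without it. */
static double cost_ms(double with_ms, double without_ms)
{
    assert(with_ms >= without_ms);
    return with_ms - without_ms;
}

/* Frames per second achievable at a given frame time (100 ms -> 10 fps). */
static double fps(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

The gap between the two Sprite_Draw figures, cost_ms(59.0, 22.0) = 37 ms, is what the pixel reads and UI_DRAW PIXEL calls cost on their own.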

So 57.5 ms is spent in this function (20.5 ms if we remove the calls to ARRAY READ_CONTENT and UI_DRAW PIXEL):

subcall Sprite_Draw
{
  IN_16   SpriteDataHandle
  IN_16   ScreenX
  IN_16   ScreenY

  DATA8   SpriteWidth
  DATA8   SpriteHeight
  DATA32  ReadOfs
  DATA16  WriteX
  DATA16  WriteY
  DATA16  MaxX
  DATA16  MaxY
  DATA8   ReadPixel
  DATA8   WritePixel

  // Read sprite dimensions. Keeping READ_CONTENT since this code works.
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, 0, 1, SpriteWidth )
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, 1, 1, SpriteHeight )

  // Setup: MaxX/MaxY are the first screen column/row after the sprite
  MOVE8_16( SpriteWidth, MaxX )
  ADD16( MaxX, ScreenX, MaxX )

  MOVE8_16( SpriteHeight, MaxY )
  ADD16( MaxY, ScreenY, MaxY )

  // The whole thing: pixel data starts at offset 2, after width and height
  MOVE32_32( 2, ReadOfs )
  MOVE16_16( ScreenY, WriteY )
Loop_Y:
  MOVE16_16( ScreenX, WriteX )
Loop_X:
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, ReadOfs, 1, ReadPixel )
  JR_EQ8( ReadPixel, 2, DoneDrawing )   // Don't draw this pixel at all
  JR_EQ8( ReadPixel, 1, DrawBlack )
  MOVE8_8( BG_COLOR, WritePixel )       // Set WritePixel to White
  JR( Draw )
DrawBlack:
  MOVE8_8( FG_COLOR, WritePixel )       // Set WritePixel to Black
Draw:
  UI_DRAW( PIXEL, WritePixel, WriteX, WriteY )
DoneDrawing:
  ADD32( ReadOfs, 1, ReadOfs )
  ADD16( WriteX, 1, WriteX )
  JR_LT16( WriteX, MaxX, Loop_X )

  ADD16( WriteY, 1, WriteY )
  JR_LT16( WriteY, MaxY, Loop_Y )
}

Reducing compares in the inner loop so it looks like this:

Loop_Y:
  MOVE16_16( ScreenX, WriteX )
Loop_X:
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, ReadOfs, 1, ReadPixel )
  JR_EQ8( ReadPixel, 2, DoneDrawing )   // Don't draw this pixel at all
  // Sprite values 0 (white) and 1 (black) are passed straight through as the color
  UI_DRAW( PIXEL, ReadPixel, WriteX, WriteY )
DoneDrawing:
  ADD32( ReadOfs, 1, ReadOfs )
  ADD16( WriteX, 1, WriteX )
  JR_LT16( WriteX, MaxX, Loop_X )

  ADD16( WriteY, 1, WriteY )
  JR_LT16( WriteY, MaxY, Loop_Y )

will reduce the frame time to 52 ms, so a slight improvement, but still not good enough.
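For readers more at home in C, this is roughly what the reduced-compare loop does. A minimal sketch, assuming the sprite layout read by Sprite_Draw (width at byte 0, height at byte 1, pixels from byte 2), pixel value 2 meaning "don't draw this pixel", a 178 x 128 frame buffer with one byte per pixel, and BG_COLOR = 0 / FG_COLOR = 1. `sprite_draw` is a hypothetical name, and unlike the LMS version it clips to the screen:

```c
#include <assert.h>
#include <stddef.h>

#define SCREEN_W 178   /* assumed EV3 LCD width  */
#define SCREEN_H 128   /* assumed EV3 LCD height */

/* Assumed pixel values: 0 = BG_COLOR (white), 1 = FG_COLOR (black), 2 = skip. */
enum { BG_COLOR = 0, FG_COLOR = 1, SKIP_PIXEL = 2 };

/* Sprite layout as read by Sprite_Draw: byte 0 = width, byte 1 = height,
 * then width * height pixel bytes, row by row, starting at offset 2. */
static void sprite_draw(unsigned char *fb, const unsigned char *sprite,
                        int screen_x, int screen_y)
{
    assert(fb != NULL && sprite != NULL);
    int width  = sprite[0];
    int height = sprite[1];
    const unsigned char *read = sprite + 2;   /* ReadOfs starts at 2 */

    for (int y = screen_y; y < screen_y + height; y++) {
        for (int x = screen_x; x < screen_x + width; x++, read++) {
            unsigned char pixel = *read;
            if (pixel == SKIP_PIXEL)
                continue;                      /* leave this pixel untouched */
            if (x < 0 || x >= SCREEN_W || y < 0 || y >= SCREEN_H)
                continue;                      /* clipping: not in the LMS code */
            /* 0 and 1 already equal BG_COLOR and FG_COLOR: no mapping branch. */
            fb[y * SCREEN_W + x] = pixel;
        }
    }
}
```

Dropping the DrawBlack/Draw branches works for the same reason in both versions: the stored value already is the color to draw.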

So I’ve gotten the frame time down from 90 ms to 52 ms. I’ll accept that performance for now, leave the optimization aside for a while, and get on with the actual project.

Sorted sprites

In the game engine thingie I’m writing, I have game objects. Btw, “game engine” sounds way more fancy and pretentious than what this is. Anyways, in the game... framework that I’m writing, I have game objects. I want to be able to load a scenario from disk, and in this scenario there should be items like enemies, pickups, etc.

So I’ve created the concept of game objects, which I explained previously. A game object is basically just an array of handles; in C terms, an array of pointers. Each handle, or pointer, refers to another part of memory where content is stored. So a game object can have one piece of memory that describes its world position, another that describes its AI state, and a third that describes its animation state.
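A minimal C sketch of that layout, with hypothetical component slots for the three examples above (world position, AI state, animation state); the type and function names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component slots for the three examples in the text. */
enum { COMP_TRANSFORM, COMP_AI, COMP_ANIMATION, COMP_COUNT };

typedef struct { int x, y; } Transform;      /* world position */
typedef struct { int state; } AiState;       /* AI state */
typedef struct { int frame; } AnimState;     /* animation state */

/* A game object is just an array of pointers ("handles"), one per component.
 * NULL means the object does not have that component. */
typedef struct {
    void *components[COMP_COUNT];
} GameObject;

static void *get_component(const GameObject *obj, int slot)
{
    assert(obj != NULL && slot >= 0 && slot < COMP_COUNT);
    return obj->components[slot];
}
```

The game object itself stays tiny; all the real data lives in the separately allocated components it points to.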

So it is a retained-mode model, where the game programmer doesn’t immediately tell the framework to draw a sprite. Instead, the game programmer loads a sprite and assigns it to a game object, and the framework has its own draw function that draws all the enabled game objects. This allows the framework to sort the draw order of all game objects, and that is what I did today: I wrote the sorting function.

It is a quite simple bubble sort algorithm. So far performance isn’t a concern of mine. I just want to get something up and running so I can start experimenting with a gameplay mechanic.

Here is the implementation for the sorting:

// Reorder the draw list so the objects are drawn in the correct order.
// Simple bubble sort
subcall GameObjectManager_SortForDrawing
{
  DATA32 SlotIndex0
  DATA32 SlotIndex1

  DATA16 Sort0
  DATA16 Sort1

  HANDLE hGameObject0
  HANDLE hGameObject1

  DATA8 ContinueSorting

MainLoop:
  MOVE32_32( 0, SlotIndex0 )
  MOVE32_32( 1, SlotIndex1 )

  // Assume this is the last iteration of the sort
  MOVE8_8( 0, ContinueSorting )

Loop:
  // Read sort values
  ARRAY_READ( GOM_hDrawOrder, SlotIndex0, hGameObject0 )
  ARRAY_READ( GOM_hDrawOrder, SlotIndex1, hGameObject1 )
  JR_EQ16( -1, hGameObject1, EndOfList )

  CALL( Transform_GetSort, hGameObject0, Sort0 )
  CALL( Transform_GetSort, hGameObject1, Sort1 )

  // If object 0 has a lower or equal sort value than object 1, jump to SkipSort
  JR_LTEQ16( Sort0, Sort1, SkipSort )

  // These two need to swap. That also means we need to sort again.
  ARRAY_WRITE( GOM_hDrawOrder, SlotIndex0, hGameObject1 )
  ARRAY_WRITE( GOM_hDrawOrder, SlotIndex1, hGameObject0 )
  MOVE8_8( 1, ContinueSorting )

SkipSort:
  ADD32( 1, SlotIndex0, SlotIndex0 )
  ADD32( 1, SlotIndex1, SlotIndex1 )

  // Check if we've reached the last index. If we haven't
  // reached the last index yet, jump to Loop.
  JR_LT32( SlotIndex1, GOM_MAXOBJECTS, Loop )

EndOfList:
  // So this was the last entry. If the list was modified this iteration we need to do another iteration.
  JR_EQ8( 1, ContinueSorting, MainLoop )
}
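The same algorithm in C, as a sketch: `GOM_MAXOBJECTS` is assumed to be 8, a handle of -1 marks the end of the list as in the LMS code, occupied slots are assumed to be packed at the front, and `transform_get_sort` is a hypothetical stand-in for Transform_GetSort that looks the sort value up in a table:

```c
#include <assert.h>

#define GOM_MAXOBJECTS 8     /* assumed capacity */
#define NO_OBJECT      (-1)  /* empty slot, as tested with JR_EQ16( -1, ... ) */

/* Hypothetical stand-in for Transform_GetSort: handle -> sort value. */
static int transform_get_sort(const int *sort_keys, int handle)
{
    return sort_keys[handle];
}

/* Same shape as GameObjectManager_SortForDrawing: repeat passes over
 * adjacent pairs until a pass makes no swaps. */
static void sort_for_drawing(int draw_order[GOM_MAXOBJECTS], const int *sort_keys)
{
    int continue_sorting;
    do {
        continue_sorting = 0;                          /* assume last pass */
        for (int i0 = 0, i1 = 1; i1 < GOM_MAXOBJECTS; i0++, i1++) {
            int h0 = draw_order[i0];
            int h1 = draw_order[i1];
            if (h1 == NO_OBJECT)
                break;                                 /* EndOfList */
            assert(h0 != NO_OBJECT);                   /* list must be packed */
            if (transform_get_sort(sort_keys, h0) <= transform_get_sort(sort_keys, h1))
                continue;                              /* SkipSort */
            draw_order[i0] = h1;                       /* swap, sort again */
            draw_order[i1] = h0;
            continue_sorting = 1;
        }
    } while (continue_sorting);
}
```

Because a pass with no swaps ends the loop, an already sorted draw list costs only a single pass per frame.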

Function pointers?

Now that all the array headaches have been sorted out, I’ve been able to go back to working on the actual game. (The game engine, rather. :) )

Today I’ve expanded a bit on the component way of thinking. I used to be able to create a game object and add transform and sprite components to them, and today I also added an AI component. I also figured I want each component to have an update function that is called each frame.

Each component has an index into an array, so the transform is index 0, sprite is index 1, AI is index 3, etc. So I figured it would be sweet to have a lookup table with function pointers for each component. I tried simply reading a label into a variable, and that seemed to work fine. I made this little test:

vmthread MAIN
{
  CALL( ClearLog )
  DATA16 TestVar
  MOVE16_16( TestFunc, TestVar )
  CALL( WriteLog16, 'TestVar: ', TestVar )
  CALL( WriteLog16, 'TestFunc: ', TestFunc )
  CALL( TestFunc )
  CALL( TestVar )
}

subcall TestFunc
{
  CALL( WriteLog, 'From testfunc' )
}

And the log looked like this:

TestVar: 0x00000002
TestFunc: 0x00000002
From testfunc

and after that the app halted with a VM verification error. I had a quick look in the code, and it seemed like only constants could be evaluated as labels. But now that I think about it, my code wouldn’t have to evaluate as a label: the label has the value 2, my variable holds the value 2, and the code that performs the call looks like this:

ObjectIdToCall  =  *(OBJID*)PrimParPointer();
if ((*VMInstance.pObjList[ObjectIdToCall]).ObjStatus == STOPPED)

So close but no cigar!
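For comparison, the lookup table I was after is ordinary C: an array of function pointers indexed by component. A sketch, with hypothetical update functions and the component indices from this post (slot 2 is left empty):

```c
#include <assert.h>
#include <stddef.h>

/* Component indices as numbered in the post; slot 2 is unused here. */
enum { COMP_TRANSFORM = 0, COMP_SPRITE = 1, COMP_AI = 3, COMP_COUNT = 4 };

typedef void (*UpdateFunc)(int object_id);

static int update_calls[COMP_COUNT];   /* counts calls, for illustration */

static void transform_update(int object_id) { (void)object_id; update_calls[COMP_TRANSFORM]++; }
static void sprite_update(int object_id)    { (void)object_id; update_calls[COMP_SPRITE]++; }
static void ai_update(int object_id)        { (void)object_id; update_calls[COMP_AI]++; }

/* Lookup table: component index -> update function (NULL = none). */
static const UpdateFunc update_table[COMP_COUNT] = {
    [COMP_TRANSFORM] = transform_update,
    [COMP_SPRITE]    = sprite_update,
    [COMP_AI]        = ai_update,
};

/* Call the update function of every component the object has. */
static void update_components(int object_id, const int has_component[COMP_COUNT])
{
    assert(has_component != NULL);
    for (int c = 0; c < COMP_COUNT; c++) {
        if (has_component[c] && update_table[c] != NULL)
            update_table[c](object_id);
    }
}
```

This is exactly the indirection the CALL( TestVar ) experiment was trying to get: the table entry is data, and the call goes through whatever it holds.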

Array bliss!


So I found the ARRAY_READ and ARRAY_WRITE op codes, and I have just tried them out. I won’t make a long blog post about it now; I’ll just say that it seems to work fine.

At first I saw the call to cMemoryResize and got concerned. But then I saw that it is wrapped in an if statement, so it is only called when the write index is beyond the last allocated element. Ok, I’ll make this post a bit longer then, since you’re asking so kindly for it.

So this is the op code I found, and its implementation:

/*! \page cMemory
 *  <hr size="1"/>
 *  <b>     opARRAY_WRITE (HANDLE, INDEX, VALUE)  </b>
 *- Array element write\n
 *- Dispatch status can change to FAILBREAK
 *  \param  (HANDLER) HANDLE    - Array handle
 *  \param  (DATA32)  INDEX     - Index to element to write
 *  \param  (type)    VALUE     - Value to write - type depends on type of array\n
 */
/*! \brief  opARRAY_WRITE byte code
 */
void      cMemoryArrayWrite(void)
{
  PRGID   TmpPrgId;
  HANDLER TmpHandle;
  void    *pTmp;
  void    *pValue;
  DESCR   *pDescr;
  DATA32  Elements;
  DATA32  Index;
  void    *pArray;
  DATA8   *pData8;
  DATA16  *pData16;
  DATA32  *pData32;
  DATAF   *pDataF;

  TmpPrgId        =  CurrentProgramId();
  TmpHandle       =  *(HANDLER*)PrimParPointer();
  Index           =  *(DATA32*)PrimParPointer();
  pValue          =  PrimParPointer();

  if (cMemoryGetPointer(TmpPrgId,TmpHandle,&pTmp) == OK)
  {
    pDescr        =  (DESCR*)pTmp;
    if (Index >= 0)
    {
      Elements  =  Index + 1;

      DspStat   =  NOBREAK;
      if (Elements > (*pDescr).Elements)
      {
        if (cMemoryResize(TmpPrgId,TmpHandle,Elements) == NULL)
        {
          DspStat   =  FAILBREAK;
        }
      }
      if (DspStat == NOBREAK)
      {
        if (cMemoryGetPointer(TmpPrgId,TmpHandle,&pTmp) == OK)
        {
          pDescr      =  (DESCR*)pTmp;
          pArray      =  (*pDescr).pArray;
#ifdef DEBUG
          printf("  Write  P=%1u H=%1u     I=%8lu A=%8p\r\n",(unsigned int)TmpPrgId,(unsigned int)TmpHandle,(unsigned long)Index,pArray);
#endif
          switch ((*pDescr).Type)
          {
            case DATA_8 :
              pData8          =  (DATA8*)pArray;
              pData8[Index]   =  *(DATA8*)pValue;
              DspStat         =  NOBREAK;
              break;

            case DATA_16 :
              pData16         =  (DATA16*)pArray;
              pData16[Index]  =  *(DATA16*)pValue;
              DspStat         =  NOBREAK;
              break;

            case DATA_32 :
              pData32         =  (DATA32*)pArray;
              pData32[Index]  =  *(DATA32*)pValue;
              DspStat         =  NOBREAK;
              break;

            case DATA_F :
              pDataF          =  (DATAF*)pArray;
              pDataF[Index]   =  *(DATAF*)pValue;
              DspStat         =  NOBREAK;
              break;
          }
        }
      }
    }
  }
  if (DspStat != NOBREAK)
  {
#ifdef DEBUG
    printf("  WR ERR P=%1u H=%1u     I=%8lu\r\n",(unsigned int)TmpPrgId,(unsigned int)TmpHandle,(unsigned long)Index);
#endif
  }
}

As you can see, there is a scary reallocation going on (the cMemoryResize call, on line 3966 of the source file), but it only happens if the array is too small to fit the requested index, whereas ARRAY WRITE_CONTENT always resizes the array, even if the index is within bounds.
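To make the difference concrete, here is a small C model of the two behaviours. It is a sketch under assumptions: `Array`, `array_resize`, `array_write` and `array_write_truncating` are hypothetical names, the truncating version models the WRITE_CONTENT behaviour I ran into (resize to index + 1 on every write) rather than the real LMS2012 code, and new elements are zeroed here where the VM may leave garbage:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal model of a VM array: element count plus a heap buffer of bytes. */
typedef struct {
    int elements;
    unsigned char *data;
} Array;

/* Resize to exactly new_elements (like realloc); zero any newly added tail.
 * Returns 0 on failure, like cMemoryResize returning NULL. */
static int array_resize(Array *a, int new_elements)
{
    unsigned char *p = realloc(a->data, (size_t)new_elements);
    if (p == NULL)
        return 0;
    if (new_elements > a->elements)
        memset(p + a->elements, 0, (size_t)(new_elements - a->elements));
    a->data = p;
    a->elements = new_elements;
    return 1;
}

/* ARRAY_WRITE-style: grow only if the index is past the end, then store. */
static int array_write(Array *a, int index, unsigned char value)
{
    if (index < 0)
        return 0;
    if (index + 1 > a->elements && !array_resize(a, index + 1))
        return 0;
    a->data[index] = value;
    return 1;
}

/* Model of the WRITE_CONTENT behaviour: always resize to index + 1,
 * which throws away every element above the written index. */
static int array_write_truncating(Array *a, int index, unsigned char value)
{
    if (index < 0 || !array_resize(a, index + 1))
        return 0;
    a->data[index] = value;
    return 1;
}
```

Writing indices 0, 2 and then 1 keeps index 2 intact with array_write, while array_write_truncating shrinks the array to two elements as soon as index 1 is written.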

So I wrote up this little test code:

vmthread  MAIN
{
	CALL( ClearLog )
	CALL( WriteLog, 'Hello rocktest 3!' )

	ARRAY( CREATE8, 20, hTemp )

	CALL( WriteLog16, 'hTemp: ', hTemp )

	ARRAY_WRITE( hTemp, 0, 20 )
	ARRAY_WRITE( hTemp, 2, 22 )
	ARRAY_WRITE( hTemp, 1, 21 )

	DATA8 readApa

	ARRAY_READ( hTemp, 0, readApa )
	CALL( WriteLog8, 'Index 0: ', readApa )

	ARRAY_READ( hTemp, 1, readApa )
	CALL( WriteLog8, 'Index 1: ', readApa )

	ARRAY_READ( hTemp, 2, readApa )
	CALL( WriteLog8, 'Index 2: ', readApa )
}

I decided to only try the case I knew was broken with WRITE_CONTENT: write to an arbitrary index within bounds, then write to a lower index, and confirm that the contents of the higher index are still intact. And according to the log, it works!

Compiling /Users/magnusrunesson/Projects/Rockhammer/rocktest.lms
539 bytes
Copied /Users/magnusrunesson/Projects/Rockhammer/rocktest.rbf to /media/card/rockhammer/rocktest.rbf
Starting '/media/card/rockhammer/rocktest.rbf'
Reading log from '/media/card/rockhammer/rocktest_log.txt'
Hello rocktest 3!
hTemp: 2
Index 0: 20
Index 1: 21
Index 2: 22

So, great success! I know, I know, it is dangerous to be optimistic. So far I’ve mostly found out that stuff never works as well as it seems at first glance. But having looked through the code and written a test case, I am optimistic about this.

Another positive thing about this is that the index you use is actually the index of the element, not the byte offset in memory.

So I’ll revisit my game code and make it work with this new array stuff, and let you know how it goes.

Funny how things just work out!

My last rant on handles and addresses

Dear friends and loved ones,

It is with a heavy heart I must inform you that I just can’t get those stupid addresses and handles to work!

I have tried allocating memory using ARRAY CREATE and reading and writing to it using READ_CONTENT and WRITE_CONTENT. It sort of worked, but it turned out that WRITE_CONTENT reallocates the buffer, and it only allocates up to the entry I am currently writing to. So if I have an array of 4 entries and I write to entry 3, the content of entry 4 becomes garbage.

I have tried reading the address of an array allocated with ARRAY CREATE but I haven’t been successful. I did manage to read the content of the array by doing @myHandle, but reading was never the issue. Writing was.

I also tried to get the physical address of some memory by using the & operator, but what that actually does is take the parameter, read its content, use that content as an address, and read another value from there, which is then used as the pointer:

Result  =  (void*)*(DATA32*)Result;

I then went on to try and statically allocate some memory by doing this:

DATAS		MyMemory			100000

vmthread  MAIN
{
	DATA32 MyMemoryAddress
	MOVE32_32( &MyMemory, MyMemoryAddress )
}

But that didn’t work either. My best guess is that it reads the content of MyMemory, which is zero-initialized, and then uses 0 as the address to read another address from.
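That guess can be modelled in a few lines of C. This is only a toy: memory is an array of 32-bit words addressed by word index (the real VM uses byte pointers), and `amp_operator` is a hypothetical name for the double read done by `Result = (void*)*(DATA32*)Result;`:

```c
#include <assert.h>
#include <stdint.h>

/* Toy VM memory: 32-bit words addressed by word index (an assumption). */
#define MEM_WORDS 16
static uint32_t mem[MEM_WORDS];

/* The & behaviour as described: read the parameter's content, treat that
 * content as an address, and return the value stored there (which the VM
 * then uses as the pointer). */
static uint32_t amp_operator(uint32_t param_addr)
{
    assert(param_addr < MEM_WORDS);
    uint32_t addr = mem[param_addr];   /* content of the parameter... */
    assert(addr < MEM_WORDS);
    return mem[addr];                  /* ...used as an address */
}
```

If MyMemory sits at word 5 and is zero-initialized, amp_operator(5) returns whatever is stored at word 0, never the address 5 itself.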

So after all that, what is next?

I have NO clue how to work with memory in LMS2012. I can’t for my life figure out if it is possible to .. ARGH! It has happened AGAIN.

When writing up my blog post I often find myself going back to earlier things to investigate further. Something I did again today.

I looked through all existing code and found that only a small test LMS script actually used ARRAY READ_CONTENT and WRITE_CONTENT, and I wasn’t surprised as they don’t work very well.

What I did find, though, were these op codes: ARRAY_WRITE and ARRAY_READ.

So I will read up on those and then come back, either with yet another blog post about how handles and arrays don’t work, or another “this is the last rant on arrays” blog post.

Either way I will return shortly!