Confession time

Hi all!

I have a confession to make. I wasn’t planning on writing about anything other than LMS stuff on this blog, but I have something I need to get off my chest.

I have been cheating on you.

It’s true. I have had other projects besides hacking away on the EV3. I’m not proud of it, but that is how it is. And that is why I haven’t made any updates on this blog in a while. My other project sort of has a deadline, and with Christmas coming up, finding time for one project is difficult and finding time for two is near impossible for me.

So I will keep working on my other project for a bit, and when that is done I can come back to wrestle with the LMS VM on the EV3.

Hugs and kisses!

LMS syntax highlighting in Sublime Text 2

I’ve been working a bit on getting LMS syntax highlighting into Sublime Text 2.

This is what it looks like so far:

[Image: LMS syntax highlighting in Sublime]

I’m struggling to get labels to highlight properly but I’ve got a lot of the basics in there. It’s easy but boring to add all op codes so I’ll add them as I go along. It will be fantastic!

 

Performance

For a while I’ve been wondering about the performance of the EV3. Since it has a 300 MHz ARM9 processor, I figured performance shouldn’t be an issue.

But after building a simple framework that draws one image using UI_DRAW BITMAP and draws 2 sprites with one UI_DRAW PIXEL call for each pixel in the sprite, I am already using up 90 of the 100 ms I have for each frame. This basically means that to achieve 10 fps I can’t do more than draw one background and two sprites.

So that is a bit disappointing!

There is a trick: copy the current frame buffer to a temporary buffer and later copy that buffer back to the frame buffer, which is a cheap operation. So I removed the per-frame UI_DRAW BITMAP that draws the background. Instead I draw it once, copy the drawn image to a buffer, and then for each frame copy that buffer back to the frame buffer. That gets me down to 59 ms per frame. So that’s a bit nicer.
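The buffer trick boils down to two block copies. Here is a minimal C sketch of the idea, not the VM's actual code: the 178x128 1-bit-per-pixel layout, the buffer names and the two helper functions are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 178x128 monochrome, packed 8 pixels per byte. */
#define LCD_WIDTH   178
#define LCD_HEIGHT  128
#define LCD_BYTES   (((LCD_WIDTH + 7) / 8) * LCD_HEIGHT)

static uint8_t FrameBuffer[LCD_BYTES];   /* what ends up on screen       */
static uint8_t Background[LCD_BYTES];    /* saved copy of the background */

/* Call once, right after the expensive background draw. */
void SaveBackground(void)
{
  memcpy(Background, FrameBuffer, LCD_BYTES);
}

/* Call at the start of every frame instead of redrawing the bitmap. */
void RestoreBackground(void)
{
  memcpy(FrameBuffer, Background, LCD_BYTES);
}
```

Copying a few kilobytes in one go is cheap compared to decoding and drawing a bitmap, which fits the drop in frame time described above.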

Removing my own sprite drawing function but keeping the buffer copying thing reduces the frame time from 59 ms to 0.3 ms, soooo .. that’s embarrassing. :)

If I keep all my code but remove the call to UI_DRAW PIXEL the frame time is 44 ms, so the actual drawing is somewhat expensive. But since quite a lot of frame time is used even without drawing, perhaps there is a chance to optimize this.

If I also remove the reading of the pixel in the sprite I get down to a frame time of 21 ms. Those 21 ms are all “my” stuff, meaning my inner loop in the sprite drawing, my algorithm to sort the game objects (bubble sort ftw), my overhead for managing game objects, etc.

And if I remove the call to Sprite_Draw entirely I’m down to 1.5 ms. So the time is spent in that function.

So to summarize:

Full, old drawing: 90 ms
Drawing with the buffer trick: 59 ms
Drawing with the buffer trick but removing reading pixels from the sprite and writing them to the screen: 22 ms
Drawing with the buffer trick but removing the call to Sprite_Draw: 1.5 ms
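The differences between those measurements are where the interesting numbers come from. A small C sketch of the arithmetic, using only the four frame times from the summary (the function names are made up for this example):

```c
#include <assert.h>

/* Frame times in ms, taken from the summary above. */
static const double FullDraw        = 90.0;
static const double BufferTrick     = 59.0;
static const double NoPixelReadDraw = 22.0;
static const double NoSpriteDraw    = 1.5;

/* Cost of redrawing the background bitmap every frame. */
double BitmapCost(void)     { return FullDraw - BufferTrick; }

/* Total cost of Sprite_Draw with the buffer trick in place. */
double SpriteDrawCost(void) { return BufferTrick - NoSpriteDraw; }

/* Part of Sprite_Draw spent reading pixels and calling UI_DRAW PIXEL. */
double PixelIoCost(void)    { return BufferTrick - NoPixelReadDraw; }

/* What Sprite_Draw still costs without the pixel read and write. */
double LoopOverhead(void)   { return NoPixelReadDraw - NoSpriteDraw; }
```

With these inputs the bitmap costs 31 ms, Sprite_Draw costs 57.5 ms in total, 37 ms of that is pixel reading and writing, and 20.5 ms is left for the loop itself.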

So 57.5 ms is spent in this function (20.5 ms if we remove the calls to ARRAY_READ and UI_DRAW PIXEL):

subcall Sprite_Draw
{
  IN_16   SpriteDataHandle
  IN_16   ScreenX
  IN_16   ScreenY

  DATA8   SpriteWidth
  DATA8   SpriteHeight
  DATA32  ReadOfs
  DATA16  WriteX
  DATA16  WriteY
  DATA16  MaxX
  DATA16  MaxY
  DATA8   ReadPixel
  DATA8   WritePixel

  // Read sprite dimensions. Keeping READ_CONTENT since this code works.
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, 0, 1, SpriteWidth )
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, 1, 1, SpriteHeight )

  // Setup
  MOVE8_16( SpriteWidth, MaxX )
  ADD16( MaxX, ScreenX, MaxX )

  MOVE8_16( SpriteHeight, MaxY )
  ADD16( MaxY, ScreenY, MaxY )

  // The whole thing
  MOVE32_32( 2, ReadOfs )
  MOVE16_16( ScreenY, WriteY )

Loop_Y:
  MOVE16_16( ScreenX, WriteX )

Loop_X:
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, ReadOfs, 1, ReadPixel )
  JR_EQ8( ReadPixel, 2, DoneDrawing )   // Don't do any drawing at all
  JR_EQ8( ReadPixel, 1, DrawBlack )
  MOVE8_8( BG_COLOR, WritePixel )       // Set WritePixel to White
  JR( Draw )

DrawBlack:
  MOVE8_8( FG_COLOR, WritePixel )       // Set WritePixel to Black

Draw:
  UI_DRAW( PIXEL, WritePixel, WriteX, WriteY )

DoneDrawing:
  ADD32( ReadOfs, 1, ReadOfs )
  ADD16( WriteX, 1, WriteX )
  JR_LT16( WriteX, MaxX, Loop_X )

  ADD16( WriteY, 1, WriteY )
  JR_LT16( WriteY, MaxY, Loop_Y )
}

Reducing compares in the inner loop so it looks like this:

Loop_Y:
  MOVE16_16( ScreenX, WriteX )

Loop_X:
  ARRAY( READ_CONTENT, -1, SpriteDataHandle, ReadOfs, 1, ReadPixel )
  JR_EQ8( ReadPixel, 2, DoneDrawing )   // Don't do any drawing at all
  UI_DRAW( PIXEL, ReadPixel, WriteX, WriteY )

DoneDrawing:
  ADD32( ReadOfs, 1, ReadOfs )
  ADD16( WriteX, 1, WriteX )
  JR_LT16( WriteX, MaxX, Loop_X )

  ADD16( WriteY, 1, WriteY )
  JR_LT16( WriteY, MaxY, Loop_Y )

will reduce the frame time to 52 ms, so a slight improvement, but still not good enough.

So I’ve gotten the frame time down from 90 ms to 52 ms. I’ll accept that performance for now, leave the optimization aside for a while and get on with the actual project.
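For comparison, here is the reduced inner loop as a C sketch. It is only a model of the LMS code above: the one-byte-per-pixel `Screen` array, the `SpriteDraw` name, the clipping check and the assumption that sprite values 0 and 1 can be passed straight through as colors (which is what the reduced loop relies on) are all mine.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W     178
#define SCREEN_H     128
#define TRANSPARENT  2     /* sprite value meaning "don't draw" */

/* Hypothetical screen model: one byte per pixel, 0 = white, 1 = black. */
static uint8_t Screen[SCREEN_H][SCREEN_W];

/* Sprite layout as in the LMS code: [0] = width, [1] = height, then pixels. */
void SpriteDraw(const uint8_t *Sprite, int ScreenX, int ScreenY)
{
  int Width  = Sprite[0];
  int Height = Sprite[1];
  const uint8_t *Read = Sprite + 2;

  for (int y = 0; y < Height; y++)
  {
    for (int x = 0; x < Width; x++)
    {
      uint8_t Pixel = *Read++;
      int Wx = ScreenX + x;
      int Wy = ScreenY + y;

      if (Pixel == TRANSPARENT)
        continue;                 /* the single compare of the reduced loop */
      if (Wx < 0 || Wx >= SCREEN_W || Wy < 0 || Wy >= SCREEN_H)
        continue;                 /* clipping: not part of the LMS version */
      Screen[Wy][Wx] = Pixel;
    }
  }
}
```

The point of the reduction is visible here: once the sprite data already holds the color values, the only decision left per pixel is whether to skip it.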

Sorted sprites

In the game engine thingie I’m writing I have game objects. Btw, game engine sounds way more fancy and pretentious than what this is. Anyways, in the game .. framework, that I’m writing, I have game objects. I want to be able to load a scenario from disk, and in this scenario there should be items, like enemies, pickups, etc..

So I’ve created the concept of game objects, which I explained previously. A game object is basically just an array of handles. In C terms, that means an array of pointers. Each handle, or pointer, refers to another part of memory where content is stored. So a game object can have one piece of memory that describes its world position, another that describes its AI state, and a third that describes its animation state.

So it is a retained mode model where the game programmer doesn’t immediately tell the framework to draw a sprite. Instead the game programmer loads a sprite and assigns it to a game object, and the framework has its own draw function where all the enabled game objects are drawn. This allows the framework to sort the draw order for all game objects. And that is what I did today: I wrote the sorting function.

It is a quite simple bubble sort algorithm. So far performance isn’t a concern of mine. I just want to get something up and running so I can start experimenting with a gameplay mechanic.

Here is the implementation for the sorting:

//
// Reorder the draw list so they are drawn in the correct order.
// Simple bubble sort
//
subcall GameObjectManager_SortForDrawing
{
  DATA32 SlotIndex0
  DATA32 SlotIndex1

  DATA16 Sort0
  DATA16 Sort1

  HANDLE hGameObject0
  HANDLE hGameObject1

  DATA8 ContinueSorting

MainLoop:
  // Assume this is the last pass of the sort. A swap below sets the flag again.
  MOVE8_8( 0, ContinueSorting )

  // Start each pass at the first pair of slots
  MOVE32_32( 0, SlotIndex0 )
  MOVE32_32( 1, SlotIndex1 )

Loop:

  // Read sort values
  ARRAY_READ( GOM_hDrawOrder, SlotIndex0, hGameObject0 )
  ARRAY_READ( GOM_hDrawOrder, SlotIndex1, hGameObject1 )
  JR_EQ16( -1, hGameObject1, EndOfList )

  CALL( Transform_GetSort, hGameObject0, Sort0 )
  CALL( Transform_GetSort, hGameObject1, Sort1 )

  // If object 0 has a lower or equal sort value than object 1 we jump to SkipSort
  JR_LTEQ16( Sort0, Sort1, SkipSort )

  // These two need to swap. That also means we need to do another pass.
  ARRAY_WRITE( GOM_hDrawOrder, SlotIndex0, hGameObject1 )
  ARRAY_WRITE( GOM_hDrawOrder, SlotIndex1, hGameObject0 )
  MOVE8_8( 1, ContinueSorting )

  //
SkipSort:
  ADD32( 1, SlotIndex0, SlotIndex0 )
  ADD32( 1, SlotIndex1, SlotIndex1 )

  // Check if we've reached the last index. If we haven't
  // reached the last index yet, jump to Loop.
  JR_LT32( SlotIndex1, GOM_MAXOBJECTS, Loop )

EndOfList:
  // So this was the last entry. If the list was modified this iteration we need to do another iteration.
  JR_EQ8( 1, ContinueSorting, MainLoop )
}
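The same pass structure can be sketched in C. This is an illustration rather than a translation of the framework: `MAX_OBJECTS`, `NO_OBJECT` and the `GetSort` stand-in (which simply uses the handle as the sort key) are assumptions. Note that the swapped flag is cleared once at the start of each pass, not for each pair, so a swap early in a pass is not forgotten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OBJECTS  8      /* stand-in for GOM_MAXOBJECTS */
#define NO_OBJECT    (-1)   /* empty slot marks the end of the list */

/* Hypothetical stand-in for Transform_GetSort: the handle is the key. */
static int16_t GetSort(int16_t hGameObject)
{
  return hGameObject;
}

void SortForDrawing(int16_t DrawOrder[MAX_OBJECTS])
{
  bool Swapped;

  do
  {
    Swapped = false;                      /* reset once per pass */
    for (int i = 0; i + 1 < MAX_OBJECTS; i++)
    {
      if (DrawOrder[i + 1] == NO_OBJECT)
        break;                            /* reached the end of the list */
      if (GetSort(DrawOrder[i]) > GetSort(DrawOrder[i + 1]))
      {
        int16_t Tmp      = DrawOrder[i];
        DrawOrder[i]     = DrawOrder[i + 1];
        DrawOrder[i + 1] = Tmp;
        Swapped = true;                   /* list changed, run another pass */
      }
    }
  } while (Swapped);
}
```

Bubble sort is quadratic in the worst case, but a draw list that barely changes between frames is close to its best case, where one pass with no swaps ends the sort.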

Function pointers?

When all the array headache has been sorted out I’ve been able to go back to work on the actual game. (The game engine, rather.. :) )

Today I’ve expanded a bit on the component way of thinking. I used to be able to create a game object and add transform and sprite components to it, and today I also added an AI component. I also figured I want each component to have an update function that is called each frame.

Each component has an index into an array, so the transform is index 0, sprite is index 1, AI is index 3, etc. So I figured it would be sweet if I could have a lookup table with function pointers for each component. So I tried simply reading a label into a variable, and that seems to work fine. I made this little test:

vmthread MAIN
{
  CALL( ClearLog )
  DATA16 TestVar
  MOVE16_16( TestFunc, TestVar )
  CALL( WriteLog16, 'TestVar: ', TestVar )
  CALL( WriteLog16, 'TestFunc: ', TestFunc )
  CALL( TestFunc )
  CALL( TestVar )
}

subcall TestFunc
{
  CALL( WriteLog, 'From testfunc' )
}

And the log looked like this:

TestVar: 0x00000002
TestFunc: 0x00000002
From testfunc

and after that the APP halted with a VM verification error. I had a quick look in the code and it seemed like only constants could evaluate as labels, but now that I think about it my code wouldn’t have to evaluate as a label. If the label has the value 2 and my variable holds the value 2, the VM should end up with the same object ID either way, since the code looks like this:

ObjectIdToCall  =  *(OBJID*)PrimParPointer();
if ((*VMInstance.pObjList[ObjectIdToCall]).ObjStatus == STOPPED)

So close but no cigar!
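For reference, this is the kind of lookup table I was after, sketched in C where function pointers are a normal thing. The component indices, the function names and the `int` state are placeholders for this example and do not match the framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component indices; the names are made up for this sketch. */
enum { COMPONENT_TRANSFORM, COMPONENT_SPRITE, COMPONENT_AI, COMPONENT_COUNT };

typedef void (*UpdateFn)(int *State);

static void TransformUpdate(int *State) { *State += 1;   }
static void SpriteUpdate(int *State)    { *State += 10;  }
static void AiUpdate(int *State)        { *State += 100; }

/* The lookup table: component index -> update function. */
static const UpdateFn UpdateTable[COMPONENT_COUNT] =
{
  [COMPONENT_TRANSFORM] = TransformUpdate,
  [COMPONENT_SPRITE]    = SpriteUpdate,
  [COMPONENT_AI]        = AiUpdate,
};

/* Called once per frame: run every component's update through the table. */
void UpdateAllComponents(int *State)
{
  for (size_t i = 0; i < COMPONENT_COUNT; i++)
    UpdateTable[i](State);
}
```

The appeal is that adding a component only means adding one table entry; the per-frame loop never changes.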

Array bliss!

Wohoo!

So I found the ARRAY_READ and ARRAY_WRITE OP codes and I have just tried them out. I won’t make a long blog post about it now, I’ll just say that it seems to work fine.

First I saw the call to cMemoryResize and got concerned. But then I saw that it was encapsulated in an if statement so it would only be called if the index of the write was greater than the number of elements allocated. Ok, I’ll make this post a bit longer then, since you’re asking so kindly for it.

So this is the OP code I found, and its implementation:

/*! \page cMemory
 *
 *  <hr size="1"/>
 *  <b>     opARRAY_WRITE (HANDLE, INDEX, VALUE)  </b>
 *
 *- Array element write\n
 *- Dispatch status can change to FAILBREAK
 *
 *  \param  (HANDLER) HANDLE    - Array handle
 *  \param  (DATA32)  INDEX     - Index to element to write
 *  \param  (type)    VALUE     - Value to write - type depends on type of array\n
 *
 *\n
 *
 */
/*! \brief  opARRAY_WRITE byte code
 *
 */
void      cMemoryArrayWrite(void)
{
  DSPSTAT DspStat = FAILBREAK;
  PRGID   TmpPrgId;
  HANDLER TmpHandle;
  void    *pTmp;
  void    *pValue;
  DESCR   *pDescr;
  DATA32  Elements;
  DATA32  Index;
  void    *pArray;
  DATA8   *pData8;
  DATA16  *pData16;
  DATA32  *pData32;
  DATAF   *pDataF;

  TmpPrgId        =  CurrentProgramId();
  TmpHandle       =  *(HANDLER*)PrimParPointer();
  Index           =  *(DATA32*)PrimParPointer();
  pValue          =  PrimParPointer();

  if (cMemoryGetPointer(TmpPrgId,TmpHandle,&pTmp) == OK)
  {
    pDescr        =  (DESCR*)pTmp;
    if (Index >= 0)
    {
      Elements  =  Index + 1;

      DspStat   =  NOBREAK;
      if (Elements > (*pDescr).Elements)
      {
        if (cMemoryResize(TmpPrgId,TmpHandle,Elements) == NULL)
        {
          DspStat   =  FAILBREAK;
        }
      }
      if (DspStat == NOBREAK)
      {
        if (cMemoryGetPointer(TmpPrgId,TmpHandle,&pTmp) == OK)
        {
          pDescr      =  (DESCR*)pTmp;
          pArray      =  (*pDescr).pArray;
#ifdef DEBUG
          printf("  Write  P=%1u H=%1u     I=%8lu A=%8p\r\n",(unsigned int)TmpPrgId,(unsigned int)TmpHandle,(unsigned long)Index,pArray);
#endif
          switch ((*pDescr).Type)
          {
            case DATA_8 :
            {
              pData8          =  (DATA8*)pArray;
              pData8[Index]   =  *(DATA8*)pValue;
              DspStat         =  NOBREAK;
            }
            break;

            case DATA_16 :
            {
              pData16         =  (DATA16*)pArray;
              pData16[Index]  =  *(DATA16*)pValue;
              DspStat         =  NOBREAK;
            }
            break;

            case DATA_32 :
            {
              pData32         =  (DATA32*)pArray;
              pData32[Index]  =  *(DATA32*)pValue;
              DspStat         =  NOBREAK;
            }
            break;

            case DATA_F :
            {
              pDataF          =  (DATAF*)pArray;
              pDataF[Index]   =  *(DATAF*)pValue;
              DspStat         =  NOBREAK;
            }
            break;

          }
        }
      }
    }
  }
  if (DspStat != NOBREAK)
  {
#ifdef DEBUG
    printf("  WR ERR P=%1u H=%1u     I=%8lu\r\n",(unsigned int)TmpPrgId,(unsigned int)TmpHandle,(unsigned long)Index);
#endif
    SetDispatchStatus(DspStat);
  }
}

As you can see there is a scary realloc going on in the call to cMemoryResize (line 3966 in the source), but it is only called if the array is too small to fit the index requested, whereas ARRAY WRITE_CONTENT always resizes the array, even if the index is within bounds.
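The difference between the two behaviours is easy to model in C. This is a simplified sketch, not the VM code: `ByteArray`, `Resize` and both write functions are invented for the example, and the "always resize" variant only mirrors how WRITE_CONTENT is described in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
  int32_t  Elements;   /* number of elements currently allocated */
  int8_t  *pArray;     /* element storage */
  int      Resizes;    /* how many times the storage was reallocated */
} ByteArray;

/* Resize the storage to exactly Elements entries. Returns 0 on failure. */
static int Resize(ByteArray *a, int32_t Elements)
{
  int8_t *p = realloc(a->pArray, (size_t)Elements);
  if (p == NULL)
    return 0;
  a->pArray   = p;
  a->Elements = Elements;
  a->Resizes++;
  return 1;
}

/* ARRAY_WRITE-like: only resize when Index is past the end. */
int WriteGrowOnly(ByteArray *a, int32_t Index, int8_t Value)
{
  if (Index < 0)
    return 0;
  if (Index + 1 > a->Elements && !Resize(a, Index + 1))
    return 0;
  a->pArray[Index] = Value;
  return 1;
}

/* WRITE_CONTENT-like, as described here: always resize to Index + 1. */
int WriteAlwaysResize(ByteArray *a, int32_t Index, int8_t Value)
{
  if (Index < 0 || !Resize(a, Index + 1))
    return 0;
  a->pArray[Index] = Value;
  return 1;
}
```

With a 20-element array, `WriteGrowOnly` at index 1 leaves all 20 elements alone, while `WriteAlwaysResize` at index 1 shrinks the array to 2 elements, so whatever lived at index 2 and up is gone.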

So I wrote up this little test code:

vmthread  MAIN
{
	CALL( ClearLog )
	CALL( WriteLog, 'Hello rocktest 3!' )

	HANDLE hTemp
	ARRAY( CREATE8, 20, hTemp )

	CALL( WriteLog16, 'hTemp: ', hTemp )

	ARRAY_WRITE( hTemp, 0, 20 )
	ARRAY_WRITE( hTemp, 2, 22 )
	ARRAY_WRITE( hTemp, 1, 21 )

	DATA8 readApa

	ARRAY_READ( hTemp, 0, readApa )
	CALL( WriteLog8, 'Index 0: ', readApa )

	ARRAY_READ( hTemp, 1, readApa )
	CALL( WriteLog8, 'Index 1: ', readApa )

	ARRAY_READ( hTemp, 2, readApa )
	CALL( WriteLog8, 'Index 2: ', readApa )
}

I decided to only try the case I knew was broken with ARRAY WRITE_CONTENT: write to an arbitrary index within bounds, then write to a lower index and confirm that the content of the higher index is still intact. And according to the log it works!

Compiling /Users/magnusrunesson/Projects/Rockhammer/rocktest.lms
539 bytes
Copied /Users/magnusrunesson/Projects/Rockhammer/rocktest.rbf to /media/card/rockhammer/rocktest.rbf
Starting '/media/card/rockhammer/rocktest.rbf'
Reading log from '/media/card/rockhammer/rocktest_log.txt'
Hello rocktest 3!
hTemp: 2
Index 0: 20
Index 1: 21
Index 2: 22

So great success! I know, I know, it is dangerous to be optimistic. So far I’ve mostly found out that stuff never works as well as it seems at first glance. But having looked through the code and written a test case I am optimistic about this.

Another positive thing about this is that the index you use is actually the index of the element, not the byte offset in memory.
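In C terms this is the difference between `pData16[Index]` and doing the byte arithmetic by hand. A tiny sketch, with function names invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element Index when each element is ElementSize bytes. */
size_t ByteOffset(size_t Index, size_t ElementSize)
{
  return Index * ElementSize;
}

/* Read element Index of a 16-bit array, the way pData16[Index] does. */
int16_t ReadElement16(const int16_t *pData16, size_t Index)
{
  return pData16[Index];
}
```

So for a 16-bit array, element index 2 lives at byte offset 4, and with an element index I never have to think about that.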

So I’ll revisit my game code and make it work with this new array stuff, and let you know how it goes.

Funny how things just work out!

My last rant on handles and addresses

Dear friends and loved ones,

It is with a heavy heart I must inform you that I just can’t get those stupid addresses and handles to work!

I have tried allocating memory using ARRAY CREATE and reading and writing to it using READ_CONTENT and WRITE_CONTENT. It sort of worked, but it turned out that WRITE_CONTENT reallocates the buffer and only allocates up to the entry I am currently writing to. So if I have an array of 4 entries and I write to entry 3, the content of entry 4 will become garbage.

I have tried reading the address of an array allocated with ARRAY CREATE but I haven’t been successful. I did manage to read the content of the array by doing @myHandle, but reading was never the issue. Writing was.

I also tried to get the physical address of some memory by using the & operator, but what that does is take the parameter, read its content, and use that as the address from which other content is read and used as a pointer.

Result  =  (void*)*(DATA32*)Result;

I then went on to try and statically allocate some memory by doing this:

DATAS		MyMemory			100000
vmthread  MAIN
{
	DATA32 MyMemoryAddress
	MOVE32_32( &MyMemory, MyMemoryAddress )
}

But that didn’t work either. My best guess is that it reads the content of MyMemory, which is zero initialized, and then uses 0 as the address to read another address from.
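To make that guess concrete, here is a small C model of the two ways a variable parameter could be resolved. It is a sketch of my reading of `Result = (void*)*(DATA32*)Result;`, not VM code: the `Slot` type and both function names are assumptions, and `uintptr_t` stands in for DATA32 so the example also works on 64-bit machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a VM variable slot. The real VM uses DATA32 on a 32-bit
   target; uintptr_t keeps this sketch valid on 64-bit hosts too. */
typedef uintptr_t Slot;

/* What a plain variable parameter yields: a pointer to the slot itself. */
void *ResolvePlain(Slot *pVariable)
{
  return pVariable;
}

/* What the & path appears to do: read the slot's content and treat that
   value as the address to use. */
void *ResolveAddr(Slot *pVariable)
{
  return (void *)*pVariable;
}
```

If the slot is zero initialized, `ResolveAddr` hands back address 0, and anything that then reads through that pointer is in trouble, which would match what I saw.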

So after all that, what is next?

I have NO clue how to work with memory in LMS2012. I can’t for the life of me figure out if it is possible to .. ARGH! It has happened AGAIN.

When writing up my blog post I often find myself going back to earlier things to investigate further. Something I did again today.

I looked through all existing code and found that only a small test LMS script actually used ARRAY READ_CONTENT and WRITE_CONTENT, and I wasn’t surprised as they don’t work very well.

What I did find though were these OP codes: ARRAY_WRITE and ARRAY_READ.

So I will read up on those and then come back, either with yet another blog post about how handles and arrays don’t work, or another “this is the last rant on arrays” blog post.

Either way I will return shortly!

& and @!

Quick update since I’m on the move and don’t have access to the source code.

My final experiment last night was sort of successful. I allocated an array using ARRAY CREATE and on the handle that was returned I used the @ operator, so I did MOVE32_32( @handle, variable ) and the content of the variable was something like 0x00032ac30.

Then I did MOVE32_32( &variable, var2 ) and var2 was set to 0x0201.

So that seems great but the reason why I’m not yet happy is because I then tried to set the contents of the array with ARRAY WRITE_CONTENT and after that my experiment fell apart and I could no longer use the & operator, not even when I reverted back to the exact same code I had used when it worked the first time.

So in conclusion, this seems like it might work but I’m also having trouble taming it.

No love for the address?

I’ve been looking more into the & operator, which seems pretty great. But it doesn’t seem to be fully supported.

When reading through the code that decodes parameters, and looking at the comments, the code seems to fully decode the address, but the comment has completely omitted the information.

So basically, the compiler builds a stream of bytes from the LMS code. The byte stream is super simple: one byte that represents the opcode, and then a bunch of bytes representing potential parameters. If we look at the byte code for:

MOVE32_32( 1145324612, MyVar1 )

It can compile into:

3A834444444440

If we split it up into opcode and parameters it becomes this:

0x3a,         // Opcode for MOVE32_32
0x83,         // Information for first parameter.  Binary: 1000 0011
0x44444444,   // Constant for 1145324612
0x40,         // Information for second parameter. Binary: 0100 0000

So the opcode is 0x3a, and if we look at the native code for MOVE32_32 we see this:

void      cMove32to32(void)
{
  DATA32  Tmp;

  Tmp  =  *(DATA32*)PrimParPointer();
  *(DATA32*)PrimParPointer()  =  Tmp;
}

That code contains two calls to PrimParPointer. What that does is first look at the information for the parameter, and then fetch either a constant, a local variable or a global variable. The bitfield is explained in a comment for PrimParPointer and looks like this:

/*! \page parameterencoding Parameter Encoding
 *
 *  Parameter types and values for primitives, system calls and subroutine calls are encoded in the callers byte code stream as follows:
 *
 *  opADD8 (ParCode1, ParCode2, ParCodeN)
 * \verbatim
Bits  76543210
      --------
      0Ttxxxxx    short format
      ||||||||
      |0||||||    constant
      ||||||||
      ||0vvvvv    positive value
      ||1vvvvv    negative value
      |||
      |1|         variable
      | |
      | 0iiiii    local index
      | 1iiiii    global index
      |
      1ttt-bbb    long format
       ||| |||
       0|| |||    constant
       ||| |||
       |0| |||    value
       |1| |||    label
       ||| |||
       1|| |||    variable
        || |||
        0| |||    local
        1| |||    global
         | |||
         0 |||    value
         1 |||    handle
           |||
           000    Zero terminated string  (subject to change)
           001    1 bytes to follow       (subject to change)
           010    2 bytes to follow       (subject to change)
           011    4 bytes to follow       (subject to change)
           100    Zero terminated string  \endverbatim
 *
 */

So if we again look at our parameters. First the first parameter:

Binary: 1000 0011

According to the comment it is in long format, it is a constant, it is a value and it contains four bytes. PrimParPointer will then go on to read those 4 bytes from the byte stream, which are, as we saw earlier:

0x44444444

The second parameter information is then:

Binary: 0100 0000

So according to the comment that means it is in short format, it is a variable, and the variable index is 0.

So that’s it, that is how the parameter encoding works. Those of you with a keen eye may have noticed that in the long format there is one bit that isn’t explained: bit 3 (if you follow their naming of the bits). Bit 4 tells if it is a handle or not, but bit 3 is left out.

So I went back to look at the code that is generated for this:

MOVE32_32( &MyVar1, MyVar2 )

And the generated byte code is:

0x3A,  // Opcode for MOVE32_32
0xC9,  // First parameter information.  Binary: 1100 1001
0x00,  // ??
0x44,  // Second parameter information. Binary: 0100 0100

The encoding of the parameter information when doing & seems to be perfectly fine, and the code in PrimParPointer seems to be perfectly valid too. It looks like this:

    if (Data & PRIMPAR_HANDLE)
    {
      VMInstance.Handle  =  *(HANDLER*)Result;
      cMemoryArraryPointer(VMInstance.ProgramId,VMInstance.Handle,&Result);
    }
    else
    {
      if (Data & PRIMPAR_ADDR)
      {
        Result  =  (void*)*(DATA32*)Result;
        VMInstance.Value  =  (DATA32)Result;
      }
    }

So it seems like the @ operator (handle) will return a pointer looked up from a handle created by ARRAY( CREATEx, … ), and the & operator (address) will typecast the Result (which is a pointer) and dereference that address to get a new address, which it returns.

I’m confused..

In any case, that 0x00 that snuck in straight after the 0xc9 seems misplaced, but other than that it seems like addresses may work. I will remove the 0x00 and see what the VM thinks about that.
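The parameter information bytes in this post can be decoded mechanically from the table in the comment. Below is a minimal C sketch of such a decoder. The `ParInfo` struct and `DecodeParInfo` are invented for this example, bit 3 is only reported as a raw flag since the comment does not explain it, and sign handling for short constants is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
  int LongFormat;     /* bit 7: 0 = short format, 1 = long format        */
  int Variable;       /* bit 6: 0 = constant, 1 = variable               */
  int Bit5;           /* bit 5: sign, label or local/global, by context  */
  int Handle;         /* bit 4, long format only                         */
  int Bit3;           /* bit 3, long format only: not in the comment     */
  int BytesToFollow;  /* long format: 1, 2 or 4; 0 means string or short */
  int ShortValue;     /* short format: low 5 bits (value or index)       */
} ParInfo;

ParInfo DecodeParInfo(uint8_t Data)
{
  /* Low 3 bits of a long format byte -> number of bytes that follow. */
  static const int Bytes[8] = { 0, 1, 2, 4, 0, 0, 0, 0 };
  ParInfo p = { 0 };

  p.LongFormat = (Data >> 7) & 1;
  p.Variable   = (Data >> 6) & 1;
  p.Bit5       = (Data >> 5) & 1;

  if (p.LongFormat)
  {
    p.Handle        = (Data >> 4) & 1;
    p.Bit3          = (Data >> 3) & 1;
    p.BytesToFollow = Bytes[Data & 0x07];
  }
  else
  {
    p.ShortValue = Data & 0x1F;
  }
  return p;
}
```

Run on the bytes above, 0x83 comes out as a long format constant with 4 bytes to follow, 0x40 and 0x44 as short format local variables with index 0 and 4, and 0xC9 as a long format local variable with bit 3 set and 1 byte to follow. If that reading of the table is right, the 0x00 after 0xC9 could simply be the one-byte variable index rather than a stray byte, but that is only what the table suggests.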

&?

Yesterday I played around with using & to get the address of a variable. In assembler.logo there is a function called get-adr that is called when you do & on a variable.

However, the VM got stuck in an infinite loop when doing this super simple test:

vmthread  MAIN
{
	DATA32 MyVar
	DATA32 MyAddress
	MOVE32_32( &MyVar, MyAddress )
}

So I looked at all the existing LMS files that shipped with the LMS2012 and none of them used the & operator. I’ve done some simple tests to just look at the compiled RBF file when doing & and see if I get any wiser. After that I’ll look into @ and see if that is anything I could use.