
February 2, 2017
macOS screen updating, 2017 edition

TL;DR: Retina iMac (4k/5k) owners can greatly improve the graphics performance of many applications (including REAPER) by setting the color profile (in System Preferences, Displays, Color tab) to "Generic RGB" or "Adobe RGB." (and restarting REAPER and/or other applications being tested)

I previously wrote in mid-2014 about the state of blitting bitmaps to screen on modern OS X (now macOS) versions. Since then, Apple has released new hardware (including Retina iMacs) and a couple of new macOS versions.

Much of that article is still useful today, but I made a mistake in the second update:

    OK, if you provide a bitmap that is twice the size of the drawing rect, you can avoid argb32_image_mark_RGBXX, and get the Retina display to update in about 5-7ms, which is a good improvement (but by no means impressive, given how powerful this machine is). I made a very simple software scaler (that turns each pixel into 4), and it uses very little CPU.
While this was helpful (and did decrease the amount of time spent blitting), it was wrong: the blit was faster because the system was parallelizing it across multiple cores. So it was faster, but it also used more CPU (and was generally wasteful).
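For illustration, the pixel-doubling scaler mentioned above can be sketched in a few lines of C. This is a hypothetical minimal version (the function name and signature are made up here, not REAPER's actual code): each 32-bit source pixel becomes a 2x2 block of the destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a 2x software upscale: each 32-bit source
   pixel is duplicated into a 2x2 block of the destination.
   dst_span is the destination row stride in pixels (>= sw*2). */
void scale2x(const uint32_t *src, int sw, int sh,
             uint32_t *dst, int dst_span)
{
  for (int y = 0; y < sh; y++)
  {
    uint32_t *out = dst + (y*2) * dst_span;
    for (int x = 0; x < sw; x++)
    {
      const uint32_t p = src[y*sw + x];
      out[x*2] = out[x*2 + 1] = p;
    }
    /* the second output row is identical to the first */
    memcpy(out + dst_span, out, sw * 2 * sizeof(uint32_t));
  }
}
```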

I discovered this because I've been researching how to improve REAPER's graphics performance on the iMac 5k in particular, so I started benchmarking. This time around, I figured I should measure how many screen pixels are updated and divide that by how long it takes. Some results, from memory (I'm not going to rerun them for this article; laziness):

Initial version (REAPER 5.32 state, using the retina hack described above, public WDL as of today):
  • old C2D iMac, 10.6: 350MPix/sec
  • mid-2012 RMBP 15", 10.12, Thunderbolt display (non-retina): 1500MPix/sec
  • mid-2012 RMBP 15", 10.12, built-in display (retina): 800MPix/sec
  • late-2015 Retina iMac 5k, 10.12: 192MPix/sec
The one that really jumped out at me was the Retina iMac 5k -- it's a quarter of the speed of the RMBP! WTF. We'll get to that later.

After I realized the hack above was actually doing more work (thank you, Xcode instrumentation), I did some more experiments without the hack. I found that the newer SDKs define kCGImageByteOrderXYZ flags (I don't believe they were in previous SDKs), that these alias to kCGBitmapByteOrderXYZ, and that using kCGBitmapByteOrder32Host in the pixel format for CGImageCreate()/etc. speeds things up. With the retina hack removed:
  • mid-2012 RMBP 15", 10.12, built-in display (retina): 300MPix/sec
  • late-2015 Retina iMac 5k, 10.12: 152MPix/sec
With retina hack removed and byte order set to host:
  • old C2D iMac, 10.6: 350MPix/sec
  • mid-2012 RMBP 15", 10.12, Thunderbolt display (non-retina): 1500MPix/sec
  • mid-2012 RMBP 15", 10.12, built-in display (retina): 720MPix/sec
  • late-2015 Retina iMac 5k, 10.12: 200MPix/sec
The non-retina displays might have changed slightly, but it was insignificant. So, by setting the byte order to native, we get the Retina MBP close to the level of performance of the hack, which isn't great but is serviceable, and at least the CPU use is decreased. This also has the benefit (drawback?) of making the byte-order of pixels the same on macOS/Intel and win32, which will take some more attention (and a lot of testing).
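The byte-order point is easy to illustrate in portable C: a 32-bit pixel value stored as a host-order integer lands in memory at different byte offsets depending on endianness, which is why declaring the buffer to be in host order lets the system skip a per-pixel swizzle. This is a hypothetical illustration of the memory-layout issue, not the CoreGraphics internals:

```c
#include <stdint.h>

/* Returns nonzero on a little-endian host (Intel macs, win32/x64). */
static int host_is_little_endian(void)
{
  const uint32_t one = 1;
  return *(const uint8_t *)&one == 1;
}

/* Extract the blue channel of a 0xAARRGGBB pixel by memory offset,
   the way a byte-oriented blitter would see it.  On little-endian
   hardware the bytes sit in memory as B G R A; on big-endian, A R G B. */
static uint8_t blue_by_memory_offset(uint32_t argb_pixel)
{
  const uint8_t *bytes = (const uint8_t *)&argb_pixel;
  return host_is_little_endian() ? bytes[0] : bytes[3];
}
```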

From profiling and looking at the code, this blit performance could easily be improved by Apple -- the inner loop where most time is being spent does a lot more than it needs to. Come on Apple, make us happy. Details offered on request.

Of course, this really doesn't do anything for the iMac 5k -- 200MPix/sec is *TERRIBLE*. The full screen is 15 megapixels, so at most that gets you around 13fps, and that's at 100% CPU use. After some more profiling, I found that the function chewing the most CPU ended in "64". Then it hit me -- was this display running in 16 bits per channel? A quick google search later, it was clear: the Retina iMacs have 10-bit displays, and you can run them in 10 bits per channel, which means 64 bits per pixel. macOS is converting all of our pixels to 64 bits per pixel (I should also mention that it seems to be doing a very slow job of it). Luckily, changing the color profile (in system preferences, displays) to "Generic RGB" or similar disables this, and it gets the ~800MPix/sec level of performance similar to the RMBP, which is at least tolerable.
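The framerate ceiling above is just arithmetic: the best-case full-screen update rate is blit throughput divided by screen pixels. A trivial sketch (5120x2880 is the 5k iMac's panel resolution; the function name is made up for the example):

```c
/* Best-case full-screen updates per second given a blit throughput in
   megapixels/sec.  200 MPix/sec over a 5120x2880 (~15 MPix) screen
   works out to roughly 13.5 fps. */
static double max_fps(double mpix_per_sec, int w, int h)
{
  return (mpix_per_sec * 1.0e6) / ((double)w * (double)h);
}
```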

Sorry for the long wordy mess above; I'm posting it here so that Google finds it and anybody looking into why their software is slow on macOS 10.11 or 10.12 on Retina iMacs has some explanation.

Also please please please Apple optimize CGContextDrawImage()! I'm drawing an image with no alpha channel and no interpolation and no blend mode and the inner loop is checking each pixel to see if the alpha is 255? I mean wtf. You can do better. Hell, you've done way better. All that "new" Retina code needs optimizing!

Update a few hours later:
While fixing various issues with the updated byte ordering, I noticed that CoreText produces quite different output for CGBitmapContexts created with different byte orderings:


Hmph! Not sure which one is "correct" there... hmm... If you use kCGImageAlphaPremultipliedFirst for the CGBitmapContext rather than kCGImageAlphaNoneFirst, then it looks closer to the original, maybe?

One other caveat: NSBitmapImageRep can't seem to deal with the ARGB format either, so if you use that you need to manually byte-swap the pixels...

Update (2019): Worked around most of this issue by using Metal; read here.

4 Comments


January 20, 2017
oh dear

where has the time gone? HNY and stuff.

I was honored to be asked to open for the Silver Sound Showdown festival at the Brooklyn Bowl this year. It was an amazing experience. I used the following tools:

  • Thinkpad X60 laptop with a 1.83GHz Core 2 Duo CPU and 2GB RAM.
I updated this to Windows 10, but for the month before the show, while practicing, I wouldn't connect it to the internet. I was afraid I'd get to the venue and it'd start installing updates... or at best, it'd interrupt the audio when it tried to download some.
  • Zoom R24 audio interface/control surface
    Zoom gave me this one a while back, kindly, and it's really great. My only wish would be that I could use the drum trigger buttons to send MIDI to REAPER... but it works well anyway.
  • Line6 FBV Express Mk II pedal (you can use this via USB without a Line6 amp)
  • 2x dynamic mics
  • Home made wood furniture to hold the laptop and audio interface
  • Electric guitar (direct-in to the Zoom's Hi-Z port)
  • Flute (amazon $100 special)
  • House drum kit
    I brought my cymbals, but my ride wouldn't fit in my case, so I brought the crash to use as a ride (which wasn't ideal).
  • REAPER, Super8 (JSFX), plug-ins
    ReaDelay, ReaEQ, various other JSFX, and the classic SimulAnalog JCM900 VST. I also wrote a spectral hold that samples the master output, and another instance which synthesizes from a track with anticipative FX threads, for better performance.
    In hindsight I should've used the headphones I had brought for monitoring, rather than the stage monitors, but oh well.
The performance was completely improvised, and while it has quite a few rough spots there are at least a few nice bits in there.

Here's a video (shot with a Contour Roam 2 pointed at my blurred-out crotch, and mixed with the audio recorded by REAPER itself):
(youtube link)

Muchas gracias to Silver Sound for having me and to everybody who came out to see the show! Woohoo! Let's not talk about what happened today.

Recordings:

not enough recovery

1 Comment


October 15, 2016
Live at Sidewalk NYC on September 15th

I should have posted this a month ago, but forgot:

(youtube link)

It's Cory's song, with André and Anette, with a not-fully-planned appearance by Jason on the sax. The full recording is available in the music page thing.

In other news, if you've listened to the other music on this page, you may have noticed that this is my new best friend:


Allison's clarinet from grade school, an amazing gift. Mmmm.

Comment...


April 17, 2016
collaboration



Made in March by Kara Daving (for another project--slightly repurposed with permission here).

Recordings:

freeform jam with 3choys
the four corners

Comment...


December 5, 2015
time lapse

bleh colors fail:

2 Comments


December 3, 2015
no idea what I'm doing, but learning some things



Recordings:

and a dump truck too

Comment...


November 20, 2015
ah, 10 years

I have spent about 27% of my life programming REAPER! That's not entirely true -- I've done other things in the last 10 years, like eating and occasionally sleeping, but you get the idea. It is, by a huge margin, the longest I've worked on anything, ever*. Happy days.

Here is a commit from today's date, around this time, in 2005. Most of the changes did not survive the decade, but the files still exist at least.


commit 64bd59b56fb4edac13d264a516d194aa9715a09d
Author: Justin <justin@localhost>
Date:   Sun Nov 20 18:51:40 2005 +0000

diff --git a/jmde/mediaitem.h b/jmde/mediaitem.h
index 8fbc575..c272d1d 100644
--- a/jmde/mediaitem.h
+++ b/jmde/mediaitem.h
@@ -4,23 +4,47 @@
 #include "pcmsrc.h"
 #include "../WDL/string.h"
 
+#define SOURCE_TYPE_MEDIAITEM 0x1000
 
 #define WM_USER_RESIZECHILD (WM_USER+1020)
 
-class MediaItem 
+class MediaItem : public PCM_source
 {
 public:
   MediaItem();
-  ~MediaItem() { delete m_src; }
+  virtual ~MediaItem() { delete m_src; }
 
-  int GetNumChannels() { return m_src?m_src->GetNumChannels():0; }
-  int PropertiesWindow(HWND hwndParent) { return -1; } // todo: properties window
+  virtual PCM_source *Duplicate()
+  {
+    MediaItem *ni=new MediaItem;
+
+    ni->m_position=m_position;
+    ni->m_length=m_length;
+    ni->m_startoffs=m_startoffs;
+    ni->m_loop=m_loop;
+    ni->m_fade_in_len = m_fade_in_len;
+    ni->m_fade_out_len = m_fade_out_len;
+    ni->m_fade_in_shape = m_fade_in_shape;
+    ni->m_fade_out_shape = m_fade_out_shape;
+    ni->m_name.Set(m_name.Get());
+    ni->m_src = m_src ? m_src->Duplicate() : 0;
+    ni->m_volume=m_volume;
+    ni->m_pan=m_pan;
+    ni->m_ui_sel = m_ui_sel;
+
+    return ni;
+  }
+
+  virtual int GetNumChannels() { return m_src?m_src->GetNumChannels():0; }
+  virtual int PropertiesWindow(HWND hwndParent) { return -1; } // todo: properties window
 
   // times passed to these should be global time, i.e. area not relative to m_position
   // should handle out-of-bounds, too, and just return silence for those regions
-  void GetSamples(PCM_source_transfer_t *block);
-  void GetPeakInfo(PCM_source_peaktransfer_t *block);
+  virtual void GetSamples(PCM_source_transfer_t *block);
+  virtual void GetPeakInfo(PCM_source_peaktransfer_t *block);
 
+  virtual double GetLength() { return m_length; }
+  virtual int GetType() { return SOURCE_TYPE_MEDIAITEM; }
 
   double m_position;
   double m_length;
@@ -30,8 +54,6 @@ public:
   double m_fade_in_len, m_fade_out_len;
   int m_fade_in_shape, m_fade_out_shape; // shape 0 = linear, ...
 
-  double m_volume, m_pan;
-
   WDL_String m_name;
 
   PCM_source *m_src;
diff --git a/jmde/pcmsrc.h b/jmde/pcmsrc.h
index 5a904ac..abe3a3e 100644
--- a/jmde/pcmsrc.h
+++ b/jmde/pcmsrc.h
@@ -81,6 +81,8 @@ class PCM_section_source : public PCM_source
     virtual PCM_source *Duplicate() 
     {      
       PCM_section_source *ns=new PCM_section_source;
+      ns->m_volume=m_volume;
+      ns->m_pan=m_pan;
       ns->SetSource(m_src?m_src->Duplicate():0,m_startpos,m_length,m_edgeoverlap_time);
       return ns;
     }
@@ -131,6 +133,8 @@ class PCM_mixing_source : public PCM_source
     virtual PCM_source *Duplicate() 
     {
       PCM_mixing_source *ns=new PCM_mixing_source;
+      ns->m_volume=m_volume;
+      ns->m_pan=m_pan;
 
       int x;
       for (x = 0; x < m_channels.GetSize(); x ++)
@@ -174,6 +178,8 @@ class PCM_source_wavefile : public PCM_source
     virtual PCM_source *Duplicate()
     {
       PCM_source_wavefile *ns=new PCM_source_wavefile;
+      ns->m_volume=m_volume;
+      ns->m_pan=m_pan;
       ns->Open(m_filename.Get());
       return ns;
     }

* Most of my projects have a lifespan measured in minutes, sometimes exceeding an hour.

5 Comments


October 28, 2014
my own private can of worms

First, from a recent 'git log' command:

    commit f94d5a07541a672b4446248409568c20bca9487d
    Author: Justin <justin@localhost>
    Date:   Sun Sep 11 21:52:27 2005 +0000
    
        Vss2Git
    
    diff --git a/jmde/mediaitem.h b/jmde/mediaitem.h*
    new file mode 100644
    index 0000000..52b8a8f
    --- /dev/null
    +++ b/jmde/mediaitem.h
    @@ -0,0 +1,37 @@
    #ifndef _MEDIAITEM_H_
    #define _MEDIAITEM_H_
    
    #include "pcmsrc.h"
    #include "../WDL/string.h"
    
    class MediaItem 
    {
    public:
      double m_position;
      double m_length;
    
      double m_startoffs;
      double m_fade_in_len, m_fade_out_len;
      int m_fade_in_shape, m_fade_out_shape;
    
      double m_volume, m_pan;
    
      WDL_String m_name;
    
      PCM_source *m_src;
    };
    
    class AudioChannel
    {
      WDL_PtrList<MediaItem> m_items;
      double m_volume, m_pan;
      bool m_mute, m_solo;
      WDL_String m_name;
      // recording source stuff, too
      // effect processor list
    
      // getsamples type interface
    };
    
    
    #endif
    
    * Trivia: guess what jmde (JMDE) stands for?
...and to think, back when we used VSS we didn't even have commit messages! Soon after, "AudioChannel" became instantiable and went on to be known as "MediaTrack", and, as one would hope, many other things ended up changing.

Wow, 9 years have gone by.

I've been having a blast this week working on something that let me make this:

The interesting bit of this is not the contents of the video itself -- 3 hasty first-takes with drums, bass, and guitar, each with 2 cameras (a Canon 6D and a Contour Roam 2) -- but how it was put together.

I've spent much of the last week experimenting with improving the video features of REAPER, specifically adding support for fades and video processing. This is a ridiculously large can of worms to open, so I'm keeping it mostly contained in my office and studio.

Working on video features is reminding me of when I was first starting work on what would become REAPER: I was focused on doing things that I could use then and there for things I wanted to make. It is incredibly satisfying to work this way. So now I'm doing it in a branch (thank you, git), as it is useful for me, but it is so incredibly far from the usability standard that REAPER represents now (even if you argue that REAPER is poorly designed, it's still 100x better than what I've done this week). You can't go putting half-baked, poorly-performing, completely programmer-oriented video features into a 9-year-old program.

The syntax has since been simplified a bit, but basically you have meta-video items which can combine other video items on the fly. So you can write new transitions or customize existing transitions while you work (which is something I love about JSFX).

I'm going to keep working on this, it might get there someday. Former Vegas fans, fear not, REAPER isn't going to become a video editor. I'm just going for a taste...

6 Comments


July 24, 2014
licecap 1.25 beta 3

I just posted (to our prerelease site) LICEcap 1.25 beta 3, which includes support for using transparency for smaller images. I had some fun debugging this (including some very stupid mistakes on my part that took about an entire day to debug, oops).

I also had some fun writing logic to decide what to do when a pixel could be encoded as transparent, but could also be well-represented by an indexed color. Initially I had it use the indexed value only if the previous pixel was indexed, but it ended up being quite a bit better to track the occurrence of transparent pixels and pixels of that index, and use whichever is more common. There is probably a better algorithm to use here, but that saw some good gains. For comparison, I ran 1.24 and 1.25 beta 3 at the same time for a stupid demo video. The 1.24 version was 2.5MB; the 1.25 beta 3 version was 1.5MB. WIN. Ultimately it is highly dependent on the content, though, so I might look at trying some other things out...
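The occurrence-tracking idea can be sketched roughly as follows. This is a hypothetical reconstruction for illustration only, not LICEcap's actual code; the struct and function names are made up:

```c
/* Hypothetical sketch: when a pixel could be encoded either as
   GIF-transparent (i.e. unchanged from the previous frame) or as a
   matching palette index, pick whichever choice has been more common
   so far -- longer runs of identical codes compress better. */
typedef struct {
  int transparent_count;
  int indexed_count;
} EncodeStats;

/* Returns 1 to encode the pixel as transparent, 0 to use the index. */
static int choose_transparent(EncodeStats *s,
                              int can_be_transparent, int can_be_indexed)
{
  int use_transparent;
  if (can_be_transparent && can_be_indexed)
    use_transparent = s->transparent_count >= s->indexed_count;
  else
    use_transparent = can_be_transparent;

  if (use_transparent) s->transparent_count++;
  else if (can_be_indexed) s->indexed_count++;
  return use_transparent;
}
```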



1 Comment


July 15, 2014
checking assumptions

Very often in computer code, integers are divided by powers of two (2, 4, 8, 16, 32, 64, and so on). These divisions are much faster than dividing by other numbers (since computers represent the underlying number in binary). In C/C++, there are two ways this type of division is typically expressed: normal division (x/256), or a shift (x>>8). These two methods produce the same results for non-negative values, and depending on the meaning of the code in question, (x/256) is often more readable, so I use it regularly.

If x is signed and is negative, division rounds towards 0, whereas the shift rounds towards negative infinity, but in situations where rounding of negative values is not important, I had generally assumed that modern compilers would generate similarly efficient code (reducing x/256 into a single x86 sar instruction).
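The rounding difference is easy to verify directly; for negative values the two forms disagree:

```c
/* Division rounds toward zero; an arithmetic right shift rounds toward
   negative infinity, so the two disagree for negative operands.
   (Strictly, right-shifting a negative signed value is
   implementation-defined in C, but mainstream compilers implement it
   as an arithmetic shift, which is what this article assumes.) */
static int div256(int x)   { return x / 256; }
static int shift256(int x) { return x >> 8; }
```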

It turns out, testing with two modern compilers (and one slightly out of date compiler), this is not the case.

Here is the C code:

  void b(int r);
  void f(int v)
  {
    int i;
    for (i=0;i<v/256;i++) b(v > 0 ? v/65536 : 0);
  }

  void f2(int v)
  {
    int i;
    for (i=0;i<(v>>8);i++) b(v > 0 ? v >> 16 : 0);
  }

Ideally, a compiler should generate identical code for each of these functions. In the case of the loop counter, if v is less than 0, how it is rounded makes no difference. In the case of the parameter to b(), the code v >> 16 is only evaluated if v is known to be above 0.

Let's look at the output of some compilers (removing decoration and unrelated code). I've marked some code as bold to signify instructions that could be eliminated (with slight changes to the surrounding instructions):

  • Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
    Xcode 5.1.1 on OSX 10.9.4
    Target: x86_64-apple-darwin13.3.0
    Command line flags: -O2 -fomit-frame-pointer

    f():                              # using divides
    	cmpl	$256, %edi
    	jl	LBB0_3            # if v is less than 256, skip loop
    
            # v is known to be 256 or greater. 
    
    	movl	%edi, %ebx
    	sarl	$31, %ebx         # ebx=0xffffffff if negative, 0 if non-negative
    	movl	%ebx, %r14d    
    	shrl	$24, %r14d        # r14d=0xff if negative, 0 if non-negative
    	addl	%edi, %r14d       # r14d = (v+255) if v negative, v if v non-negative
    	sarl	$8, %r14d         # r14d = v/256
    
    	shrl	$16, %ebx         # this will make ebx 65535 if v negative, 0 if v non-negative
    	addl	%edi, %ebx        # ebx = (v+65535) if v negative, v if v non-negative
    	sarl	$16, %ebx         # ebx = v/65536
    
            # interestingly, the optimizer used its knowledge of v being greater than 0 to remove the ternary conditional expression completely.
    
    	xorl	%ebp, %ebp
    LBB0_2:
    	movl	%ebx, %edi
    	callq	_b
    	incl	%ebp
    	cmpl	%r14d, %ebp
    	jl	LBB0_2
    LBB0_3:
    
    
    
    f2():                             # using shifts
    	movl	%edi, %ebp
    	sarl	$8, %ebp          # ebp = v>>8
    	testl	%ebp, %ebp
    	jle	LBB1_3            # if less than or equal to 0, skip
    
    	movl	%edi, %eax
    	sarl	$16, %eax         # eax = v>>16
    	xorl	%ebx, %ebx
    	testl	%edi, %edi
    	cmovgl	%eax, %ebx        # if v is greater than 0, set ebx to eax
    
            # the optimizer could also have removed the xorl/testl/cmovgl sequence as well
    
    LBB1_2:
    	movl	%ebx, %edi
    	callq	_b
    	decl	%ebp
    	jne	LBB1_2
    LBB1_3:
    
    
    In the first function (division), the LLVM optimizer appears to have removed the ternary expression (checking to see if v was greater than 0), likely because it knew that if the loop was running, v was greater than 0. Unfortunately, it didn't apply this knowledge to the integer divisions of v, which would have allowed it to not generate (substantial) rounding code.

    In the second function (shifts), LLVM wasn't required to generate rounding code (as C's >> maps to x86 sar directly), but it also didn't use the knowledge that v would be greater than 0.

  • Microsoft (R) C/C++ Optimizing Compiler Version 18.00.30501 for x64
    Visual Studio Express 2013 for Windows Desktop on Windows 7
    Command line flags: /O2 (/Os produced different results, but nothing related to rounding)
    
    f():                              # using divides
    
            mov     eax, ecx
            mov     ebx, ecx
            cdq                       # set edx to 0xffffffff if v negative, 0 otherwise
            movzx   edx, dl           # set edx to 0xff if v negative, 0 otherwise
            add     eax, edx          # eax = v+255 if v negative, v otherwise
            sar     eax, 8            # eax = v/256
            test    eax, eax
            jle     SHORT $LN1@f      # skip loop if v/256 is less than or equal to 0
            mov     QWORD PTR [rsp+48], rdi
    
            mov     edi, eax          # edi is loop counter
    $LL3@f:
            test    ebx, ebx
            jle     SHORT $LN6@f      # if v is less than or equal to 0, jump to set eax to 0
            mov     eax, ebx
            cdq                       # set edx to 0xffffffff if v negative, 0 otherwise
            movzx   edx, dx           # set edx to 0xffff if v negative, 0 otherwise
            add     eax, edx          # eax = v+65535 if v negative, v otherwise
            sar     eax, 16           # eax = v/65536
            jmp     SHORT $LN7@f
    $LN6@f:
            xor     eax, eax
    $LN7@f:
            mov     ecx, eax
            call    b
            dec     rdi
            jne     SHORT $LL3@f
    
            mov     rdi, QWORD PTR [rsp+48]
    $LN1@f:
    
    f2():                             # using shifts
            mov     eax, ecx
            mov     ebx, ecx
            sar     eax, 8            # eax = v>>8
            test    eax, eax
            jle     SHORT $LN1@f2     # skip loop if v>>8 is less than or equal to 0
    
            mov     QWORD PTR [rsp+48], rdi
            mov     edi, eax
    $LL3@f2:
            test    ebx, ebx
            jle     SHORT $LN6@f2     # if v is less than or equal to 0, jump to set ecx to 0
            mov     ecx, ebx
            sar     ecx, 16           # ecx = v>>16
            jmp     SHORT $LN7@f2
    $LN6@f2:
            xor     ecx, ecx
    $LN7@f2:
            call    b
            dec     rdi
            jne     SHORT $LL3@f2
            mov     rdi, QWORD PTR [rsp+48]
    $LN1@f2:
    
    
    VS 2013 generates different rounding code for the division, using cdq/movzx (or cdq/and if shifting by something other than 8 or 16 bits).

    Also worth noting: VS 2013 doesn't even bother hoisting the invariant ternary expression and (v/65536) or (v>>16) out of the loop. Ideally, it could move that calculation out of the loop, or remove the ternary operator completely. Ouch. I have to say, VS 2013 does seem to produce pretty good code overall, but I guess most of our code is heavily floating-point these days.

  • gcc 4.4.5
    Linux x86_64
    Command line flags: -O2 -fomit-frame-pointer

    f():                              # using divides
            testl   %edi, %edi
            movl    %edi, %ebp        # ebp = edi = v
            leal    255(%rbp), %r12d  # r12d = v+255
    
            cmovns  %edi, %r12d       # set r12d to v, if v is non-negative (otherwise r12d was v+255)
            sarl    $8, %r12d         # r12d = v/256
            testl   %r12d, %r12d
            jle     .L14              # if r12d is less than or equal to 0, skip
            movl    %edi, %r14d
            xorl    %ebx, %ebx        # ebx is loop counter
            xorl    %r13d, %r13d
            sarl    $16, %r14d        # r14d = v>>16
    .L13:
            testl   %ebp, %ebp
            movl    %r13d, %edi
            cmovg   %r14d, %edi       # if v is greater than 0, use v>>16 instead of 0
            addl    $1, %ebx
            call    b
            cmpl    %r12d, %ebx
            jl      .L13
    .L14:
    
    f2():                             # using shifts
            movl    %edi, %r12d
            sarl    $8, %r12d
            testl   %r12d, %r12d
            movl    %edi, %ebp
            jle     .L6               # skip loop if (v>>8) is less than or equal to 0
            movl    %edi, %r14d
            xorl    %ebx, %ebx        # ebx is loop counter
            xorl    %r13d, %r13d
            sarl    $16, %r14d        # r14d = (v>>16)
    .L5:
            testl   %ebp, %ebp
            movl    %r13d, %edi
            cmovg   %r14d, %edi       # if v is greater than 0, use v>>16 instead of 0
            addl    $1, %ebx
            call    b
            cmpl    %r12d, %ebx
            jl      .L5
    .L6:
    
    
    
    gcc 4.4 does an interesting job, using lea to generate v+255, and then cmovns to replace it with v if v is non-negative. It doesn't bother generating rounding code for v/65536, but it does still generate rounding code for v/256, even though any non-positive result for v/256 is treated the same way throughout. Also, gcc doesn't eliminate the non-varying ternary expression, nor put the constant v/65536 or v>>16 outside of the loop.

Conclusions?

I'm not sure what to say here -- modern compilers can generate a lot of really good code, especially looking at floating point and SSE, but this makes me feel as though some of the basics have been neglected. If I were a better programmer I'd go dig into LLVM and GCC and submit patches.

I should have also tested ICC, but I've spent enough time on this, and the only ICC version we use is old enough that I would just regret not using the latest.

For comparison, here is what I would like to see LLVM generate for f():


f():                              # using divides
	cmpl	$256, %edi
	jl	LBB0_3            # if v is less than 256, skip loop

	movl	%edi, %ebx
	movl	%edi, %ebp
	shrl	$8, %ebx          # ebx = v/256, since v is non-negative
	shrl	$16, %ebp         # ebp = v/65536, since v is non-negative

LBB0_2:
	movl	%ebp, %edi
	callq	_b
	decl    %ebx
	jnz	LBB0_2
LBB0_3:

Performance-wise, I'm sure they wouldn't differ in any meaningful way, but the decrease in size would be nice.

Finally: write for the compiler you have, not the compiler you wish you had. When performance is important, use shifts instead of divides, or use unsigned types (which really should generate the same code for (x/256) vs (x>>8)). Move as much logic out of the loop as you can -- yes, the compiler might be able to do it for you, but why depend on that? But most important of all: test your assumptions.
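The unsigned suggestion works because unsigned division by a power of two needs no rounding fixup: x/256 and x>>8 are the same operation for unsigned operands, so the compiler can emit a single shift for either form.

```c
#include <stdint.h>

/* For unsigned operands, x/256 and x>>8 always agree, so either form
   can compile to a single shr instruction with no rounding code. */
static uint32_t udiv256(uint32_t x)   { return x / 256; }
static uint32_t ushift256(uint32_t x) { return x >> 8; }
```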

5 Comments


July 2, 2014
This is 2014 backwards: OSX blit performance

I've been investigating, once again, the performance of drawing code-rendered RGBA bitmaps to NSViews in OSX. I found that on my Retina Macbook Pro (when the application was not in low-resolution legacy mode), calling CGContextSetInterpolationQuality with kCGInterpolationNone would cause CGContextDrawImage() to be more than twice as fast (with less filtering of the image, which was a fair tradeoff and often desired).

The above performance gain aside, I am still not satisfied with the bitmap drawing performance on recent OSX versions, which has led me to benchmark SWELL's blitting code. My test uses the LICE test application, with a screen full of lines, an opaque NSView, and 720x500 resolution.

OSX 10.6 vs 10.8 on a C2D iMac

My (C2D 2.93GHz) iMac running 10.6 easily runs the benchmark at close to 60 FPS, using about 45% of one core, with the BitBlt() call typically taking 1ms for each frame.

Here is a profile -- note that CGContextDrawImage() accounts for a modest 3.9% of the total CPU use:


It might be possible to reduce the work required by changing our bitmap representation from ABGR to RGBA (avoiding sseCGSConvertXXXX8888TransposeMask and performing a memcpy() instead), but in my opinion 1ms for a good sized blit (and less than 4% of total CPU time for this demo) is totally acceptable.

I then rebooted the C2D iMac into OSX 10.8 (Mountain Lion) for a similar test.

Running the same benchmark on the same hardware in Mountain Lion, we see that each call to BitBlt() takes over 6ms, the application struggles to exceed 57 FPS, and the CPU usage is much higher, at about 73% of a core.

Here is the time sampling of the CGContextDrawImage() -- in this case it accounts for 36% of the total CPU use!


Looking at the difference between these functions, it is obvious where most of the additional processing takes place -- within img_colormatch_read and CGColorTransformConvertData, where it apparently applies color matching transformations.

I'm happy that Apple cares about color matching, but forcing it on (without allowing developers control over it) is wasteful. I'd much rather have the ability to transform the colors before rendering, and be able to quickly blit to screen, than have every single pixel pushed to the screen color-transformed. There may be some magical way to pass the right colorspace value to CGImageCreate() to bypass this, but I have not found it yet (and I have spent a great deal of time looking, and trying things like querying the monitor's colorspace).

That's what OpenGL is for!
But wait, you say -- the preferred way to quickly draw to screen is OpenGL.

Updating a complex project to use OpenGL would be a lot of work, but for this test project I did implement a very naive OpenGL blit, which enabled an OpenGL context for the view and created a texture for drawing each frame, more or less like:

    glDisable(GL_TEXTURE_2D);
    glEnable(GL_TEXTURE_RECTANGLE_EXT);

    GLuint texid=0;
    glGenTextures(1, &texid);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, texid);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, sw);
    glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MIN_FILTER,  GL_LINEAR);
    glTexImage2D(GL_TEXTURE_RECTANGLE_EXT,0,GL_RGBA8,w,h,0,GL_BGRA,GL_UNSIGNED_INT_8_8_8_8, p);

    glBegin(GL_QUADS);

    glTexCoord2f(0.0f, 0.0f);
    glVertex2f(-1,1);
    glTexCoord2f(0.0f, h);
    glVertex2f(-1,-1);
    glTexCoord2f(w,h);
    glVertex2f(1,-1);
    glTexCoord2f(w, 0.0f);
    glVertex2f(1,1);

    glEnd();

    glDeleteTextures(1,&texid);
    glFlush();
This resulted in better performance on OSX 10.8: each BitBlt() took about 3ms, the framerate increased to 58 FPS, and CPU use went down to about 50% of a core. It's an improvement over CoreGraphics, but still not as fast as CoreGraphics on 10.6.

The memory use when using OpenGL blitting increased by about 10MB, which may not sound like much, but if you are drawing to many views, the RAM use would potentially increase with each view.

I also tested the OpenGL implementation on 10.6, but it was significantly slower than CoreGraphics there: 3ms per frame and nearly 60 FPS, but CPU use was 60% of a core. So if you do ever implement OpenGL blitting, you will probably want to disable it for 10.6 and earlier.

Core 2 Duo?! That's ancient, get a new computer!
After testing on the C2D, I moved back to my modern quad-core i7 Retina Macbook Pro running 10.9 (Mavericks) and did some similar tests.

  • Normal: 12-14ms per frame, 46 FPS, 70% of a core CPU use
  • Normal, in "Low Resolution" mode: 6-7ms per frame, 58 FPS, 60% of a core CPU use
  • Normal, without kCGInterpolationNone: 29ms per frame, 29 FPS, 70% of a core CPU use
  • Normal, in "Low Resolution" mode, without kCGInterpolationNone: same as with kCGInterpolationNone.
  • GL: 1-2ms per frame, 57 FPS, 37% of a core CPU use
  • GL, in "Low Resolution" mode: 1-2ms per frame, 57 FPS, 40% of a core CPU use
Interestingly, "Low Resolution" mode is faster in all modes except GL, where apparently it is slower (I'm guessing because the hardware accelerates the GL scaling, whereas "Low Resolution" mode puts it through a software scaler at the end).

Let's see where the time is spent in the "Normal, Low Resolution" mode:

This looks very similar to the 10.8, non-retina rendering, though some function names have changed. There is the familiar img_colormatch_read/CGColorTransformConvertData call which is eating a good chunk of CPU. The ripc_RenderImage/ripd_Mark/argb32_image stack is similar to 10.8, and reasonable in CPU cycles consumed.

Looking at the Low Resolution mode, it really does behave similarly to 10.8 (though it's depressing to see that it still takes as long to run on an i7 as 10.8 did on a C2D, hmm). Let's look at the full-resolution Retina mode:

img_colormatch_read is present once again, but what's new is that ripc_RenderImage/ripd_Mark/argb32_image have a new implementation, calling argb32_image_mark_RGB24 -- and argb32_image_mark_RGB24 is a beast! It uses more CPU than just about anything else. What is going on there?

Conclusions
If you ever feel as if modern OSX versions have gotten slower when it comes to updating the screen, you would be right. The basic method of drawing pixels rendered in a platform-independent fashion to screen has gotten significantly slower since Snow Leopard, most likely in the name of color accuracy. In my opinion this is an oversight on Apple's part, and they should extend the CoreGraphics APIs to allow manual application of color correction.

Additionally, I'm suspicious that something odd is going on within the function argb32_image_mark_RGB24, which appears to only be used on Retina displays, and that the performance of that function should be evaluated. Improving the efficiency of that function would have a positive impact on the performance of many third party applications (including REAPER).

If anybody has an interest in duplicating these results or doing further testing, I have pushed the updates to the LICE test application to our WDL git repository (see WDL/lice/test/).

Update: July 3, 2014
After some more work, I've managed to get the CPU use down to a respectable level in non-Retina mode (10.8 on the iMac, 10.9/Low Resolution on the Retina MBP), by using the system monitor's colorspace:

    CMProfileRef systemMonitorProfile = NULL;
    CMError getProfileErr = CMGetSystemProfile(&systemMonitorProfile);
    if(noErr == getProfileErr)
    {
      cs = CGColorSpaceCreateWithPlatformColorSpace(systemMonitorProfile);
      CMCloseProfile(systemMonitorProfile);
    }
Using this colorspace with CGContextCreateImage prevents CGContextDrawImage from calling img_colormatch_read/CGColorTransformConvertData/etc. On the C2D 10.8, it gets it down to 1-2ms per frame, which is reasonable.

However, this mode appears to be slower on the Retina MBP in high resolution mode, as it calls argb32_image_mark_RGB32 instead of argb32_image_mark_RGB24 (presumably operating on my buffer directly rather than the intermediate colorspace-converted buffer), which is even slower.

Update: July 3, 2014, later
OK, if you provide a bitmap that is twice the size of the drawing rect, you can avoid argb32_image_mark_RGBXX, and get the Retina display to update in about 5-7ms, which is a good improvement (but by no means impressive, given how powerful this machine is). I made a very simple software scaler (that turns each pixel into 4), and it uses very little CPU. So this is acceptable as a workaround (though Apple should really optimize their implementation). We're at least around 6ms, which is way better than 12-14ms (or 29ms which is where we were last week!), but there's no reason this can't be faster. Update (2017): the mentioned method was only "faster" because it triggered multiprocessing, see this new post for more information.
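The simple software scaler mentioned above is only a few lines; here is a minimal sketch of the "turn each pixel into 4" approach (not the actual WDL implementation, just an illustration of the idea):

```c
#include <string.h>

/* Expand each 32-bit pixel into a 2x2 block, producing a buffer twice the
   width and height of the source. A sketch of the "turn each pixel into 4"
   approach described above -- not the actual WDL code. */
static void scale2x(const unsigned int *src, int sw, int sh,
                    unsigned int *dst) /* dst must hold (2*sw)*(2*sh) pixels */
{
  const int dw = sw * 2;
  int x, y;
  for (y = 0; y < sh; y++)
  {
    unsigned int *out = dst + (y * 2) * dw;
    for (x = 0; x < sw; x++)
    {
      const unsigned int px = src[y * sw + x];
      out[x * 2] = out[x * 2 + 1] = px;
    }
    /* duplicate the expanded row to form the second output line */
    memcpy(out + dw, out, dw * sizeof(unsigned int));
  }
}
```

Each source row is expanded once and then duplicated with a memcpy, so the per-pixel work is trivial, which is why this uses so little CPU.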

As a nice side effect, I'm adding SWELL_IsRetinaDC(), so we can start making some things Retina aware -- JSFX GUIs would be a good place to start...

5 Comments


December 14, 2013
a month later, midi2osc becomes OSCII-bot

midi2osc, as mentioned in the last post, got some updates which made its name obsolete, particularly the ability to send MIDI and receive OSC, so it has now been renamed OSCII-bot. Other mildly interesting updates:

  • OSX version
  • Can load multiple scripts, which run independently but can share hardware
  • Better string syntax (normal quotes rather than silly {} etc), user strings identified by values 0..1023
  • Better string manipulation APIs (sprintf(), strcpy(), match(), etc).
  • match() and oscmatch(), which can be used for simple regular expressions with the ability to extract values
  • Ability to detect stale devices and reopen them
  • Scripts can output text to the newly resizeable console, including basic terminal emulation (\r, and \33[2J for clear)
  • Vastly improved icon
I'll probably get around to putting it on cockos.com soon enough, but for now a new build is here. Read the readme.txt for instructions before using it.

The thing I'm most excited about, in this, is the creation of eel_strings.h, which is a framework for extending EEL2 (the scripting engine that powers JSFX, for one) to add string support. Adding support for strings to JSFX will be pretty straightforward, so we'll likely be doing that in the next few weeks. Fun stuff. Very few things are as satisfying as making fun programming languages to use...

8 Comments


November 14, 2013
another quick project - Cockos midi2osc

Making installers and web pages is too much of a pain for a small project, so:

Here's another project, this one took less than 24 hours to make. It's a little win32 program called "midi2osc" (license: GPL, binary included, code requires WDL to compile), and it simply listens to any number of MIDI devices, and broadcasts to any number of destinations via OSC-over-UDP.

MIDI and OSC use completely different types of encoding -- MIDI consists of 1-3 byte sequences (excluding sysex), and OSC is encoded as a string and any number of values (strings, floats, integers, whatever). It would be very easy to make a simplistic conversion of every MIDI event, such as 90 53 7f being converted to "/note/on/53" with an integer value of 7f, and so on. This would be useful, but also might be somewhat limited.
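The simplistic per-event conversion described above might look like this in C (a hypothetical sketch for illustration -- the address scheme and helper are made up, not midi2osc's actual code):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the naive MIDI->OSC mapping described above:
   e.g. 90 53 7f becomes the address "/note/on/53" (hex note number) with
   the velocity (0x7f) as an integer argument. Returns 0 for unhandled
   status bytes. Not midi2osc's actual code. */
static int midi_to_osc(unsigned char status, unsigned char d1, unsigned char d2,
                       char *addr, size_t addrlen, int *value)
{
  switch (status & 0xf0)
  {
    case 0x90: snprintf(addr, addrlen, "/note/on/%02x", d1); break;
    case 0x80: snprintf(addr, addrlen, "/note/off/%02x", d1); break;
    case 0xb0: snprintf(addr, addrlen, "/cc/%02x", d1); break;
    default: return 0; /* sysex, clock, etc: not handled in this sketch */
  }
  *value = d2;
  return 1;
}
```

This is exactly the kind of fixed, one-size-fits-all mapping the post calls limited, which is what motivates the scripting approach below.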

In order to make this as useful as possible, I made it use EEL2 to enable user scripting of events. EEL2 is a fast scripting engine designed for floating point computation that was originally developed as part of Nullsoft AVS, and evolved as part of our Jesusonic/JSFX code. EEL2 compiles extremely quickly to native code, and can have context that is used by code running in multiple threads simultaneously.

For this project the EEL2 syntax was extended slightly, via the use of a preprocessor, so that you can specify format strings for OSC. For example, you can tell REAPER to set a track's volume via:

    oscfmt0 = trackindex;
    oscsend(destination, { "/track/%.0f/volume" }, 0.5);
Internally, { xyz } is stored to a string table and inserted as a magic number which refers to that string table entry. It is cheap, but it works.
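The mechanism can be sketched like so (hypothetical constants and names -- the real EEL2/midi2osc implementation differs):

```c
/* Sketch of the string-table trick described above: each { } literal is
   stored in a table at preprocess time and replaced in the script source
   with a magic number (a base offset plus the table index). At runtime a
   value in the magic range is looked up in the table. The base value and
   limits here are made up for illustration. */
#define STR_MAGIC_BASE 90000.0
#define MAX_STRINGS 1024

static const char *g_strtab[MAX_STRINGS];
static int g_nstrings;

static double add_string(const char *s) /* returns the magic value */
{
  if (g_nstrings >= MAX_STRINGS) return -1.0;
  g_strtab[g_nstrings] = s;
  return STR_MAGIC_BASE + g_nstrings++;
}

static const char *lookup_string(double v) /* 0 if not a string reference */
{
  const int idx = (int)(v - STR_MAGIC_BASE);
  if (v < STR_MAGIC_BASE || idx >= g_nstrings) return 0;
  return g_strtab[idx];
}
```

Since EEL2 values are just doubles, this lets string literals flow through the existing engine untouched, which is why it's cheap.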

Other than that pretty much everything else was a matter of copying and pasting some tedious bits (win32 MIDI device input, OSC message construction and bundling) and writing small bits of glue.

Since writing this, I've found myself fixing a lot of small OSC issues in REAPER. I always tell people how using the thing you make is very important -- I should update that to include the necessity of having good test environments (plural).

Why did I make this? My Zoom R24. This is a great device, but the Windows 7/64 driver has some issues. Particularly:
  • If the MIDI input device of the R24 is not opened, audio playback is not reliable. This includes when listening to Winamp or watching YouTube. So basically, for this thing to be useful, I need something to keep hitting the MIDI device constantly. So for you Zoom R24 win64 users who have this problem, midi2osc might be able to fix your problems.
  • If REAPER crashes or is otherwise debugged with the MIDI device open, the process hangs and it's a pain to continue. Moving the MIDI control to a separate process that can run in the system tray = win.
  • (not midi2osc related): I wish the drum pads would send MIDI too... *ahem*
As a result of this, the midi2osc.cfg that comes in the .zip represents basic support for the R24:
// @input lines:
// usage: @input devicenameforcode "substring match" [skip]
// can use any number of inputs. devicenameforcode must be unique, if you specify multiple @input lines
// with common devicenameforcode, it will use the first successful line and ignore subsequent lines with that name
// you can use any number of devices, too

@input r24 "ZOOM R"


// @output lines
// usage: @output devicenameforcode "127.0.0.1:8000" [maxpacketsize] [sleepamt]
// maxpacketsize is 1024 by default, can lower or raise depending on network considerations
// sleepamt is 10 by default, sleeps for this many milliseconds after each packet. can be 0 for no sleep.

@output localhost "127.0.0.1:8000"

@init

// called at init-time
destdevice = localhost; // can also be -1 for broadcast

// 0= simplistic /track/x/volume, /master/volume
// 1= /r24/rawfaderXX (00-09)
// 2= /action/XY/cc/soft (tracks 1-8), master goes to /r24/rawfader09
fader_mode=2;

@timer

// called around 100Hz, after each block of @msg

@msg

// special variables:
// time (seconds)
// msg1, msg2, msg3 (midi message bytes)
// msgdev == r24  // can check which device, if we care

(msg1&0xf0) == 0xe0 ? (

  // using this to learn for monitoring fx, rather than master track
  fader_mode > 0 ? (
     fmtstr = { f/r24/rawfader%02.0f }; // raw fader
     oscfmt0 = (msg1&0xf)+1;

     fader_mode > 1 && oscfmt0 != 9 ? (
       fmtstr = { f/action/%.0f/cc/soft }; // this is soft-takeover, track 01-08 volume
       oscfmt0 = ((oscfmt0-1) * 8) + 20;
     );

     val=(msg2 + (msg3*128))/16383;
     val=val^0.75;
     oscsend(destdevice,fmtstr,val);
  ) : (
     fmtstr = (msg1&0xf) == 8 ? { f/master/volume } : { "f/track/%.0f/volume"};
     oscfmt0 = (msg1&0xf)+1;
     oscsend(destdevice,fmtstr,(msg2 + (msg3*128))/16383);
  );
);

msg1 == 0x90 ? (
  msg2 == 0x5b ? oscsend(destdevice, { b/rewind }, msg3>64);
  msg2 == 0x5c ? oscsend(destdevice, { b/forward }, msg3>64);

  msg3>64 ? (
    oscfmt0 = (msg2&7) + 1;

    msg2 < 8 ?  oscsend(destdevice, { t/track/%.0f/recarm/toggle }, 0) :
      msg2 < 16 ?  oscsend(destdevice, { t/track/%.0f/solo/toggle }, 0) :
        msg2 < 24 ?  oscsend(destdevice, { t/track/%.0f/mute/toggle }, 0) : 
    (
      msg2 == 0x5e ? oscsend(destdevice, { b/play }, 1);
      msg2 == 0x5d ? oscsend(destdevice, { b/stop }, 1);
      msg2 == 0x5f ? oscsend(destdevice, { b/record }, 1);
    )
  );
);

msg1 == 0xb0 ? (
  msg2 == 0x3c ? (
    oscsend(destdevice, { f/action/992/cc/relative }, ((msg3&0x40) ? -1 : 1));
  );

);
The 9th fader sends "/r24/rawfader09" because I have that OSC string mapped (with soft-takeover) to a volume plug-in in my monitoring FX chain.

6 Comments


October 7, 2013
papier tigre

Digging the album "Recreation" by Papier Tigre. Here's a pretty good live video (bad mix from one side of the stage, but, it's still quite listenable IMO):



Comment...


March 27, 2013
REAPER development fun

Using stuff coming in the next release (though it is not obvious what that stuff is, aside from webm encoding, due to the nature of it, but I didn't play it nearly this well live, and of course it only took a handful of minutes of recording and a couple of hours to do start to finish):


2 Comments


March 7, 2013
my bass rides in the jump seat of an airbus



2 Comments


July 27, 2012
and yet another crappy music video



Comment...


July 18, 2012
another crappy music video



1 Comment


February 28, 2012
twitter

As a bit of an experiment, and after having set up a Twitter feed for Cockos, I'm probably going to start posting more random things (that I wouldn't bother updating this blog for) to twitter.com/lejustinfrankel... All that I have so far is a link to a mp3 I made today. Maybe I'll complain about [insert random API here] from time to time, too. The big meaty stuff will go here, still, though. For example, a discussion I've had with a friend:

Someone really needs to make an open system (think web, or email) for social (or anti-social) content publishing. Imagine a world where all of the content of each user on Facebook can be hosted and delivered by the provider of your choice (or yourself), and where the privacy controls and failures are not controlled by one monolithic company.. *cough* hurry up people *cough*.. The hard part, of course, would be getting such a thing to the size where it is useful, but there should be enough people out there interested. Speaking of which, this Onion video is absolutely brilliant:





Recordings:

staring makes things disappear

9 Comments


February 11, 2012
jersey

from my last ride

2 Comments


January 11, 2012
music and claymation

I saw this excellent claymation music video:

...which hooked me. "Portugal. The Man" is great, I've listened to this album ("In The Mountain In The Cloud") more times than I can count:

...and the animator of the first video, Lee Hardcastle, has some fantastic videos. Check out Pingu's The Thing (which is quite popular on YouTube, so you might have already seen it):

Sorry for the flood of YouTube embeds, but I enjoyed them greatly, and having already spammed many people with links, felt compelled to continue.

1 Comment


December 23, 2011
Big Ideas

I have occasionally found myself in conversation, often in the presence of alcohol, about the ownership and value of ideas. This post will attempt to document my current state of mind on these matters.

We routinely say "I have an idea", but I assert that nobody can own an idea. The closest one can come to owning an idea is to have private (possibly exclusive) possession of it. Sitting on an idea (or indeed implementing it and using it privately) could keep others from learning of the idea, but it would not prevent them from having it independently. What is interesting about this, too, is that one would have no way of knowing whether they had exclusive possession of the idea, since others could also possess it privately.

I've often heard things such as: "If I have an idea for something great, I should be able to benefit from it." I don't think it is that simple, nor do I think it should be. I like to imagine it from the perspective of conservation of energy. In my opinion, "having an idea" doesn't cost anything -- there's no work done, no trial and error, no refinement, no experimentation, it's purely the creation of an abstract concept. All of the work, all of the energy required to develop the idea into something real, happens after having the idea. All of the work of implementing (or at least designing an implementation or possible implementation) is where the value is created, and that is what a person should be able to derive benefit from.

If you suppose for a moment that someone could have some sort of exclusive right over an idea, what would that actually mean? Could they prevent other people from doing anything that could conceivably be based on that idea? Could they demand a share of any derived revenue, or control? Could they demand credit? For how long? Ugh, chaos follows. The world would grind to a halt due to this complexity. The advantage you get for having an original (or at least somewhat original) idea is a slight head start.

An idea is something that one can benefit from, but that the rest of society also has the same opportunity for benefit. This actually makes me quite happy.

What does one do if they have an idea and want to make it into something real, but have no applicable skills? Hire people. No money? Make non-disclosure agreements or other contracts which will help protect what other people do with the information you give to them, and prevent them from doing other things that could possibly relate to the idea. This is a joke, though; few talented developers will agree to these sorts of terms. Both sides need to have something of value to offer, and ideas are not value held by either side, because they cannot be owned.

A friend brought the subject of Facebook up after seeing The Social Network.

  1. Facebook became huge because of how it was made and marketed.
  2. The idea for Facebook wasn't a new, nor original, idea.
  3. Even if it was some brand new idea, it doesn't matter, since nobody can own the idea.

TL;DR: Ideas are worth a lot to society, but not much to individuals. Execution is the opposite.

Finally, some advice for anybody who wants to make things and profit from them: figure out something you can contribute; ideas aren't enough. If you're content to just contribute to society: publish your ideas, let people use them, and hope for the best.

(also, a related post that I previously linked to and agreed with, but the notion that the worth of ideas differs for society as a whole vs the "owner" is new for me)

7 Comments


March 8, 2011
another one

funny days (the internet), includes samples from this video.

Recordings:

funny days

Comment...


February 13, 2011
Unauthorized Addition

I recorded some drums to the audio from this video, and sent it to the author, but the author hasn't responded, and as best I can tell, hasn't listened to the mp3. So I'm posting it here:

Chris Jeffries - I think I'm Alive (drums added).

My preference for what happens is:

  1. the original author notices and does not mind.
  2. the original author does not notice
  3. the original author notices and minds
  4. the original author does not notice and minds

That is all.


2 Comments


November 19, 2010
A video from last night:

I had a wonderful time.


Apologies for the lousy video quality, and the rough audio mix.

6 Comments


November 18, 2010
So excited...

(and here is the video for a song I mentioned in a previous post):


4 Comments


August 17, 2010
I am drinking kool-aid

This is a good video for anybody who does software development:

Some people I know don't like Linus after they've watched this, but I think he's awesome, even though he called me stupid and ugly. He was right, I guess.

Using SVN was a great thing for me, as I'd constantly diff my work to make sure it was what I wanted. It also (obviously) enables collaboration.

Git, however, is utterly awesome, an order of magnitude more useful. Branches in SVN were a huge pain, we rarely used them. In Git, you can actually use them, effectively and without having to deal with nonsense, it is fantastic.

It is fast, efficient at storing data, easy to synchronize and automate backups, I love it.

The only downside I see is that TortoiseSVN doesn't exist for it, TortoiseGit is getting there, from what I hear, but I've just been using the command line thus far.

Anyway, I'm just giddy with it. I would say life changing, but that would be overdramatic. It is work-changing, I guess.

4 Comments


August 11, 2010
Home made iPhone tripod

I find my iPhone 3GS does a decent job as a video camera, so I made this:

4 Comments


July 30, 2010
Yes, I love technology

Here's a youtube collaboration I accidentally participated in:

I must say, these are exciting times... We have all kinds of crazy tools, it is so awesome.

1 Comment


July 13, 2010
Buggy fmod() with Visual C++ 2005/2008 targeting x64

I am posting this in case anybody debugging something needs to find it -- I did find mention of it on some Java related site, but nothing conclusive. This may affect VC2010, too, but I haven't tested it.

While VC 2005/2008 targeting x64 generates SSE code for floating point code, fmod() still uses the x87 FPU, and more importantly it assumes that the divide-by-0 exception flag is clear going in (meaning that if it is set prior to the call, the call will throw an exception or return #.IND regardless of the input). Apparently they assume that since the compiler won't possibly generate code that would cause the divide-by-0 floating point exception flag to be set, it would be safe to assume that the flag will always be clear. Or it could be a typo. If you use assembly code, or load a module compiled with another compiler that generates x87 code, this can be a huge problem.

Take this example (hi.cpp):

#include <stdio.h>
#include <math.h>

extern "C" void asmfunc();

int main() {
  asmfunc();
  printf("%f\n",fmod(1.0,2.0));
  return 0;
}

and hihi.asm (compile with nasm -f win64):

SECTION .text

global asmfunc
asmfunc:
  fld1
  fldz
  fdivp
  fstp st0
  ret

Compiling this (cl.exe hi.cpp hihi.obj) and running it does not print 1.0, as it should.

The solution we use is to call 'fclex' after any code that might use the FPU. Or not use fmod(). Or call fclex before fmod() every time. I should note that if you use ICC with VC200x, it doesn't have this problem (it presumably has a faster, correct fmod() implementation).

6 Comments


June 7, 2010
LICEcap!

We've just released a new piece of open source software for Windows, called LICEcap! It allows one to create animated screen captures. I know, there's a lot of software out there that does this already, but none of them are both free and meet my needs, so we made LICEcap.

LICEcap has a nice UI (in that you position/size the window where you want to capture, and can move it around while recording). We support writing to .GIF directly (big thanks/credit/blame to Schwa for getting the palette generation working as well as it does), as well as to a new format called .LCF.

LCF compresses by taking a series of frames, say, 20 frames, and then dividing each frame into slices, approx 128x16px each. Each slice is then compared to the same slice on the previous frame, and (if different) encoded directly after the previous frame. zlib is used to remove redundancy (often slices don't completely change from frame to frame, i.e. scrolls or small updates will compress very well). This is all done in 16bpp, and the end result is quite good compression, and lossless (well, 16bpp lossless) quality. REAPER supports playing back the .LCF files, too. The biggest downside is high memory use during compression/decompression (20 frames of 640x480x16bpp is about 12MB, and for smooth CPU distribution you end up using twice that).
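The slice-comparison step described above can be sketched as follows (a simplified illustration; the real LCF encoder also handles the slice encoding and zlib compression, which are omitted here):

```c
#include <string.h>

/* Sketch of LCF's slice-comparison step described above: divide a 16bpp
   frame into fixed-size slices and report whether a given slice differs
   from the previous frame (only changed slices get encoded). Simplified
   illustration -- omits the actual encoding and zlib stages. */
#define SLICE_W 128
#define SLICE_H 16

/* Returns nonzero if the slice whose top-left pixel is (sx,sy) differs
   between the two frames. fw/fh give the frame size in pixels. */
static int slice_changed(const unsigned short *cur, const unsigned short *prev,
                         int fw, int fh, int sx, int sy)
{
  int w = SLICE_W, h = SLICE_H, y;
  if (sx + w > fw) w = fw - sx; /* clip slices at the right/bottom edges */
  if (sy + h > fh) h = fh - sy;
  for (y = 0; y < h; y++)
  {
    const int off = (sy + y) * fw + sx;
    if (memcmp(cur + off, prev + off, w * sizeof(unsigned short))) return 1;
  }
  return 0;
}
```

Unchanged slices cost only a memcmp, which is what makes mostly-static screen captures (the common case) so cheap to encode.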

I should mention that the primary reason for us making this tool was the desire to post animated gifs of new features in REAPER with the changelog. Hopefully we'll follow through on that.

On a related note, tomorrow (or soonish), I plan to post my latest additions on how to make OS X applications not perform terribly (new one: avoid avoid AVOID CGBitmapContextCreateImage() like the plague. HOLY CRAP it is bad to use). Apple: please, for the love of God, either make your documentation a Wiki, or hire someone who actually writes (multi-platform) applications with your APIs to write documentation.

9 Comments


May 19, 2010
this is the best movie i've ever seen



Recordings:

freeform jam with brennewt

3 Comments


March 4, 2010
eeePC 901

A while back I got this ASUS eeePC 901, with a 1.6GHz Atom and 20GB of SSD. When I first got it I upgraded the RAM to 2GB, and installed Windows XP. It wasn't that great, so then I tried Win7 Pro on it, which almost worked, but would just freeze for lengths of time for no apparent reason. The keyboard is tiny and has a very different arrangement from what I'm used to. It sat idle for a while, and since I have been developing on Linux (hello, SWELL/generic), I decided to install Ubuntu on it. The new verdict:

I love it. It's reasonably fast for compiling, installing new stuff is easy (apt-get ftw), Firefox w/ flash is fine, Xchat is decent enough, the stock mail client is very usable. The best part is that the battery life is fantastic, the screen is bright, there are very few moving parts, and it feels really solid. Yes, a bit like a toy, but I'm not worried about breaking it. Anyway, I'm fully on board with the netbook thing. Maybe next time I'd get one with a slightly larger keyboard, though...

Oh yeah and the other part of what I'm saying is: I'm definitely appreciating where Linux distributions for the desktop have gotten. Almost there... ;)

March 4, 2010
which reminds me: strict aliasing


I get that the "strict aliasing" rules of recent C/C++ standards allow for great optimizations. And I get that gcc has an anciently-designed optimizer, but at any rate, it annoys me that gcc will detect strict-aliasing-violating code and still go ahead and generate code that is obviously wrong -- i.e. when it knows that two pointers ARE in fact pointing to the same memory, it assumes that they can't possibly be, and optimizes as if they don't. LLVM probably doesn't have the same problem, heh. Oh well, I'll use -fno-strict-aliasing, and meanwhile go through and use unions (and occasionally C++ templates) to make our stuff compatible with strict aliasing optimizations.
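The union approach mentioned above looks like this for, say, denormal detection (a generic sketch assuming 64-bit IEEE doubles and a 64-bit unsigned long long -- not the actual REAPER/WDL anti-denormal code):

```c
/* Generic sketch of the union technique mentioned above: inspect the bits
   of a double without violating strict aliasing (casting double* to an
   integer pointer would). A denormal has an all-zero exponent field and a
   nonzero mantissa. Assumes 64-bit IEEE doubles; not the actual
   REAPER/WDL anti-denormal code. */
typedef union { double d; unsigned long long i; } dbl_bits;

static int is_denormal(double v)
{
  dbl_bits u;
  u.d = v;
  return (u.i & 0x7ff0000000000000ULL) == 0 && /* exponent bits all zero */
         (u.i & 0x000fffffffffffffULL) != 0;   /* mantissa nonzero */
}
```

Reading a union member other than the last one written is explicitly allowed type punning in C99 (with TC3), which is why this compiles cleanly under -fstrict-aliasing where pointer casts do not.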

Of course, on performance sensitive code this is a huge time sink -- I ended up (on our anti-denormal code) looking at the output of many iterations of the same code on gcc i686, x86_64, vc6+icc10, vc2005+icc10, icc11 on osx, gcc ppc, etc, to try to find source code that worked properly and produced decent assembly code. The variety of code produced by each combination is staggering. Also, I found that often the code I thought would be fastest was not, when benchmarked. Oh well.



9 Comments


February 22, 2010
Ideas vs Execution

A better delivered post of something I've been telling people for years -- that execution is way, WAY more important than ideas. Yes, the ideas still need to be reasonable, but beyond that, it's how you do what you do that counts.

10 Comments


February 4, 2010
The mighty iPhone vs the Nokia n900

After seeing a slashdot post about people running OS X on the Nokia n900, I read some more info about the n900. It seemed like great hardware, and was debian-linux based, so it seemed like a good platform to play with. Enticed, I found it on Nokia's site, complete with a 14 day return policy.

I should mention, I have/use an iPhone 3GS. Apple ends up pissing me off to no end, but I really end up liking the 3GS. It's a great phone/browser/apprunner/notetaker/calendar/ipod/etc. If it wasn't locked down so tight, I would like it even more. So really I end up disliking the idea of the iPhone, but liking it in reality.

The n900 is pretty much the opposite -- the idea is great. Having a phone I can ssh into and install g++ and make on and build stuff and run on, is great. On paper, everything's there. This is what I found:

  • Screen: the screen looks good. It's high resolution, but its touch sensitivity isn't great; it ends up feeling clunky.
  • Storage: on paper it has 32GB of flash. This is great. What's stupid is that the root fs is only 256MB of NAND memory, and while you can install extra packages via apt-get etc, if those packages aren't carefully designed, they end up filling your root filesystem. Even the obvious things, like making /var/cache/apt point to the big disk, they could do but haven't. So basically you have to do one of many hacks if you wish to install much. The biggest thing I found was moving various /usr/[share|lib|bin]/xxx directories -- all stuff nonessential to booting -- to the bigger disk and symlinking them. Anyway, it's dumb that you should have to do this. Complete pain in the ass. I eventually got everything I wanted installed, but if the point is to have an open, extensible phone, you gotta make it do that out of the box.
  • WiFi: support seems solid. When the phone is sleeping, you can still ssh to the phone (if you installed OpenSSH, which is easy). RAD.
  • SSH: awesome. Fast, the thing really feels quick. It is 600mhz, and for command line linux that is super fast. I remember my P133 being quick, too.
  • Web: the browser is pretty solid, and flash support kinda works (wasn't really fast enough for YouTube, but there seems to be an app for this).
  • Keyboard is usable. Better than the iphone's, for me, but not fantastic either.
  • No AT&T 3G support. I don't care whose fault it is, but come on?!
  • Camera: quality was decent. The video recording was pretty good, sound seemed better than the iPhone 3GS's. Here's a youtube video we did as a test (apologies for the content).
  • General UI: Some things are super fancy (nice blurs, transitions), but other things are not even half baked. There's a nice standard for "close window" or "go back" button locations, but 90% of the time the buttons for those are not visible, yet if you click them they are there. Silly stuff.
  • Multitasking. Mmmm. so good. I like. Hear that, Apple? It's not even a pad...
So I sent the thing back. It didn't last 24 hours. The dealing with root fs, I could get past that. The lack of AT&T 3G support, that made the decision easier. I really tried. I wanted to like it.

6 Comments


February 4, 2010
REAPER on TV

Thanks to a user on the REAPER forums: 1000s of ways to die: Greateful Bed.

Doesn't seem like a show I'd watch, but amusing to see REAPER on it, at any rate. Now if only it ended up on It's Always Sunny in Philadelphia. Then we would have made it.

1 Comment


December 19, 2009
xmas

I wish I could provide as good a present to the world this Christmas as this:

Jason Lytle's "Merry Xmas 2009" :)

I'm listening now, very happily.

Edit: updated the URL. mmm I can't wait for his next album, too.

2 Comments


November 22, 2009
an old photo

I took this in 2000, on slide film.. but here it is, processed and uploaded with snapease heh (testing it)



Comment...


November 13, 2009
A show I saw in August



Dungen!!!

Video was taken using an iPhone 3GS, and the audio using a Zoom H2, then I stuck em together, mmm. There's another video on the YT from the same show, too...

Speaking of other people I love and would likely stalk if I wasn't lazy/busy/shy:

Jason Lytle

I saw you at The Independent last month, and while I was disappointed that you were opening (rather than the main), I was completely enchanted with your set. Even without the full band you had previously at Cafe du Nord. As my friend Dave and I were leaving to walk home while the main act was playing, we saw you outside, and I resisted the urge to give you a hug and all of the money I had in my pocket. ***** redacted for my own good***** ok I've made enough of a fool out of myself.

4 Comments


April 9, 2009
Nazi Zombies

I really really like Call of Duty: World At War's Nazi Zombie minigame. It's so good, the most fun I've had playing video games in years.

Recordings:

brenchr - 1 -- [6:07]
brenchr - 2 -- [101:53]

Comment...


February 23, 2009
Excited, waiting for UPS (aka Brown Santa) to come...

I ordered a Digilent Nexys2, and have been reading up on HDLs (primarily Verilog since it seems closer to C than VHDL).. Mmmm. Should be fun.

Have a few ideas for things to try, but I won't post them here just yet.

P.S. a song.

1 Comment


February 20, 2009
Consume had a bit of rehearsal

Here we go, from the other day -- the mix isn't ideal but it's a start.

Got a bit of projector action with AVS + IRC on it, the music room cleaned up, and lots of fancy abuse of technology happening.

Time to write some more songs though perhaps...

Recordings:

chr - 1 -- [29:20]
chr - 2 -- [21:28]

1 Comment


December 2, 2008
Zoom H2 modding

I opened my Zoom H2 up, removed the 4 mics, connected the wires to a 9 pin serial connector, and put it back together...

Here it is in a picture with the original mics connected to a DB9 plug via speaker wire (it's proof of concept, hey, I know this is crappy).



I'm going to make a DB9 plug to 4x XLR breakout cable today. There is no phantom power (just a piddly 1.9v or so), so it'll have to be used with dynamic mics, but the preamps seem to work fine with the SM57 I have here. Also need to figure out where to connect the XLR's pin 1 (well, it'll go to the DB9 casing, but then on the inside the H2 will need to find a good place to ground that).

If only the (otherwise awesome) ASIO driver supported 4ch input!

1 Comment


April 4, 2008
a remix someone sent me!

Apparently someone remixed the vocals to the song "Ode To ZB" (which was an improv with me and Dave Biderman), and it appears it was played on some faraway radio station:

Kivonat Radio

The original track is here:

Ode to ZB

Woohoo!! So awesome!

2 Comments


April 3, 2008
wtf

Radiohead's Remix Thing looks neat at a glance. But I think they missed the boat here and should've done it differently.

1) Don't make us use iTunes, for the love of God (sorry, Steve, but you know I'm right). 7digital or whoever else would've been forgivable.

2) Making people PAY for stems is dodgy. It would be one thing if we could buy stems for the whole album for our own enjoyment... THAT would be worth buying. I understand they probably don't want to pay for the bandwidth--but seriously, then use BitTorrent or something. Never mind that: if they didn't want to pay for bandwidth, why would they host all the remixes on their site? None of them seem to play for me, so maybe they didn't get enough...

3) The terms of the remix site are pretty terrible. I mean, giving them total ownership of everything that you upload just sucks. It's completely one-sided. Not only do I have to PAY them for stems, but anything I give back to them they get ALL rights to, and I get absolutely NONE? This is a tough sell.

Anyway, it's just disappointing. With just some slight changes this could feel like much less of a marketing stunt and more like something legit and good.

2 Comments


January 24, 2008
no time prototype video

My old Canon XL1 finally gets some use... I love putting it in full-manual mode, mmm.

Shot this in 45m, then spent an hour or so editing:

It's actually a lot better in high quality form (17MB XVID AVI).

Perhaps if I'd actually written a song ahead of time it would have been better...

2 Comments


January 1, 2008
HAPPY NEW YEAR(S)

(this serves as a reply to those who texted me HNY and didn't get a response from me).

The Radiohead NYE special was (predictably?) good. Mmm. My TiVo won't erase that for a while.

We migrated to SVN for version control a while back. Definitely liking it, and TortoiseSVN rules. No good free Mac SVN clients that I've found, though (anybody?). So for now the command line version isn't bad. I can't believe I used VSS and/or SourceOffSite for over 8 years. Eep.

I've set up an SVN server for use with audio and REAPER projects, to aid in some collaboration. It's sort of working, although I think most people aren't used to version control.

Someone REALLY needs to make a web site where you can upload projects with media, then other people can make derived versions, and upload them, and you can go through the whole tree of projects etc. Seriously. Either using something like SVN or whatever. Would be awesome to open up that sort of collaboration. I know there are sites out there doing half of this, but I haven't seen it done really well.

Oh and I never posted this link here, our show we played in November:

But now sadly Christophe has fled the country off to a beach somewhere, and we're lonely and drummerless.

What else? Well it'll have to wait til next time.



Recordings:

freeform jam with biderman

5 Comments


November 20, 2007
wooo

We played a fun show last night, should have some video up soon.

Man, this is funny shit though (which I first read on At Ease): Lily Allen complaining about Radiohead: "It's arrogant for them to give their music away for free - they've got millions of pounds. It sends a weird message to younger bands who haven't done as well."

YEAH! THAT'S MESSED UP! And all those people making open source software shouldn't do that either, because it's unfair to all of the soon-to-be Microsofts of the world as well!

(As a side note: sorry Craig, I know you like Lily Allen, but think how this could be directed at Prince, too...)

The comments on that At Ease article page are hilarious, though. Some great reading, especially if you are putting off writing a tab control wrapper layer...



2 Comments


November 8, 2007
blogness

OK I'm going to start using this more so REAPER users can see what's happening with development. The last bit of time has been spent on Mac porting. To do the mac port we are developing some software called SWELL, which is part of WDL.

SWELL allows you to easily adapt Windows code to target Quartz natively. It's not trying to be a complete compatibility layer like WINE or Winelib; rather, it focuses on providing a minimal subset, with maximum efficiency and minimal overhead. I'm actually getting into it, too.

If you are a Windows developer considering porting stuff to OS X, you should check it out. Well, actually, wait a few weeks, 'cause we're in the process of making it a LOT better. :) Oh, and don't forget to remap Xcode's keys to make it behave more like MSVC!



1 Comment


November 3, 2007
holy shit the daily show rules

Been watching some old Daily Shows (and, I might add, it is fucking great that they put them ALL online! HOLY CRAP! AWESOME!).

Just watched election night coverage from 2000, and man, Stephen Colbert CALLED IT. Watch this video:

Now I know you could say he was just joking, but holy shit.



2 Comments


[ older results... ]



search : rss : recent comments : Copyright © 2024 Justin Frankel