-O2. This takes longer to compile (but not much) and the
speed difference is pretty big over -O0 or -O1.
-O3 is also available, but it goes nuts with the inlining of
functions, and that can blow out your cache pretty well. Give it a try and
time it both ways.
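Timing it both ways can be as simple as the sketch below, built once with -O2 and once with -O3. The function names (sum_bytes, time_sum_bytes) and the workload are made up for illustration, and clock() is coarse, so use enough repetitions for the run to last a second or more.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: stands in for whatever routine you want to time. */
unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Runs the workload `reps` times, stores the accumulated sum in *result
   (so the work is not optimized away), and returns elapsed CPU seconds. */
double time_sum_bytes(const unsigned char *buf, size_t len, int reps,
                      unsigned long *result)
{
    clock_t start;
    unsigned long total = 0;
    int r;

    assert(buf != NULL && result != NULL);

    start = clock();
    for (r = 0; r < reps; r++)
        total += sum_bytes(buf, len);
    *result = total;

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_sum_bytes() from each build with the same buffer and repetition count, and compare the seconds it returns.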
-m386 or -m486. Pick which machine you are targeting. It'll still
work on either one if it's run on the other. Use -m486
for Pentium and up, too.
-fomit-frame-pointer if and only if you will not be using:
-funroll-loops. I used to think -O3 would turn this on, but it doesn't.
Do not just turn this on for the hell of it, though. Time the code before and
after. It speeds up loops on 486s but won't have as much effect on Pentiums
and up. And the extra code size may have cache side effects. But in my code,
I usually turn this on for the tight graphics loops.
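To see where the extra code size comes from, here is roughly what unrolling does, written out by hand. This is only an illustration with made-up names (fill_plain, fill_unrolled4); the code gcc actually generates with -funroll-loops will differ in the unroll factor and the details.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one store and one counter test per byte. */
void fill_plain(unsigned char *dst, size_t len, unsigned char value)
{
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = value;
}

/* Hand-unrolled by four: one counter test per four stores,
   followed by a tail loop for the 0 to 3 leftover bytes. */
void fill_unrolled4(unsigned char *dst, size_t len, unsigned char value)
{
    size_t i = 0;

    assert(dst != NULL || len == 0);

    for (; i + 4 <= len; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }
    for (; i < len; i++)
        dst[i] = value;
}
```

Both functions write the same bytes. The unrolled one does fewer loop tests but takes up more of the instruction cache.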
-S. This option causes gcc to emit the assembler code it would feed into
its assembler into a .s file. Look at this. Find out exactly
what is being generated.
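A small routine like the one below is a reasonable thing to try -S on. The file name blend.c and the function blend_half are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file blend.c. Compile with:  gcc -O2 -S blend.c
   gcc stops after the compile step and writes blend.s instead of
   an object file. */
void blend_half(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;

    assert(len == 0 || (dst != NULL && src != NULL));

    /* Average each destination byte with the matching source byte. */
    for (i = 0; i < len; i++)
        dst[i] = (unsigned char)((dst[i] + src[i]) >> 1);
}
```

Open the .s file and find the loop body. That is where you see whether a change of flags or source actually changed the instructions.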
__djgpp_nearptr_enable(). WARNING! This command turns
off all memory protection! You could blow things up bad! Of course,
if you're used to complete lack of memory protection, you'll live.
_dosmemput().
ints and 8-bit chars (chars don't slow it down, just shorts).
This is because DJGPP runs your code in a 32-bit segment, so for every 16-bit
register operation it must issue an operand-size override prefix (which stalls
the pipeline) to specify that the register width differs from the segment width.
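As a sketch of the difference, the two functions below do the same job with a short and an int loop counter. The names are made up, and whether a given gcc version really emits prefixed 16-bit instructions for the first one is an assumption you should check in the -S output.

```c
#include <assert.h>
#include <stddef.h>

/* 16-bit counter: in a 32-bit code segment, 16-bit register
   operations may each need an operand-size prefix. */
long sum_short_index(const unsigned char *pixels, short count)
{
    long total = 0;
    short i;

    assert(pixels != NULL || count <= 0);

    for (i = 0; i < count; i++)
        total += pixels[i];
    return total;
}

/* Native-width counter: no prefix needed. The 8-bit pixel data
   stays unsigned char, since chars are not the problem. */
long sum_int_index(const unsigned char *pixels, int count)
{
    long total = 0;
    int i;

    assert(pixels != NULL || count <= 0);

    for (i = 0; i < count; i++)
        total += pixels[i];
    return total;
}
```

The results are identical. Only the width of the counter, and so possibly the generated instructions, changes.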
outports, you can try using
CWSDPR0. It runs your app at ring 0, which speeds up port accesses. The
drawback: No virtual memory. But if you're going for performance, disk swaps
would kill you anyway. It also locks all memory, which is nice for when
you want interrupt handlers and don't want to deal with locking every byte
they touch. You can use stubedit to force your binary to load it instead of
CWSDPMI.EXE. However, this won't help you in Windows or OS/2 DOS boxes.
for (i = len; i; i--) instead of for (i = 0; i < len; i++).
Otherwise len must either be kept in a register or loaded from memory every time.
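Here are the two loop forms side by side, as a sketch with made-up function names. Note that the count-down version sees i run from len down to 1, so it walks the data with a pointer instead of indexing with i. It also assumes len is not negative.

```c
#include <assert.h>
#include <stddef.h>

/* Count up: the loop test compares i against len on every pass. */
long sum_countup(const unsigned char *p, int len)
{
    long total = 0;
    int i;

    for (i = 0; i < len; i++)
        total += p[i];
    return total;
}

/* Count down: the loop test is just "is i zero yet?", so len is
   only read once, when i is initialised. */
long sum_countdown(const unsigned char *p, int len)
{
    long total = 0;
    int i;

    assert(len >= 0);  /* a negative len would never reach zero cleanly */

    for (i = len; i; i--)
        total += *p++;
    return total;
}
```

Both return the same sum. Whether the second one is actually faster depends on the compiler, so look at the generated code and time it.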