Downsizing some FreeDOS utils

Eric Auer writes:

Hi, apart from upx-ing files, I think a few of our files also just contain too many libraries. In addition, SWSUBST / JOIN / SUBST could be done in a way that there is, for example, join.bat which calls swsubst with some /you-are-now-join argument. SWSUBST currently uses its own filename to determine which syntax it should accept. As DOS does not allow symlinks or hardlinks, you have to have 3 copies of this 54k file if you want all 3 commands.
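As an illustration of the idea above, here is a minimal sketch (not SWSUBST's actual code) of how one binary could choose its personality either from its own file name or from an explicit switch passed by a batch file. The switch names /JOIN and /SUBST, the enum and the function names are all made up for this example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical personalities for one shared binary. */
enum mode { MODE_SWSUBST, MODE_SUBST, MODE_JOIN };

/* Case-insensitive compare of the first n characters of a against b,
   requiring that b is exactly n characters long. */
static int name_is(const char *a, size_t n, const char *b)
{
    size_t i;

    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Pick the personality. first_arg may be NULL. An explicit switch
   (assumed names /JOIN and /SUBST) wins over the program name. */
enum mode pick_mode(const char *argv0, const char *first_arg)
{
    const char *base = argv0;
    const char *p;
    size_t len;

    if (first_arg != NULL) {
        if (name_is(first_arg, strlen(first_arg), "/JOIN"))
            return MODE_JOIN;
        if (name_is(first_arg, strlen(first_arg), "/SUBST"))
            return MODE_SUBST;
    }

    /* Strip drive and directory: keep what follows the last \, / or : */
    for (p = argv0; *p != '\0'; p++)
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;

    /* Strip the extension, if any. */
    p = strchr(base, '.');
    len = (p != NULL) ? (size_t)(p - base) : strlen(base);

    if (name_is(base, len, "JOIN"))
        return MODE_JOIN;
    if (name_is(base, len, "SUBST"))
        return MODE_SUBST;
    return MODE_SWSUBST;
}
```

With something like this, a one-line JOIN.BAT along the lines of "swsubst /JOIN %1 %2 %3" would select the JOIN syntax without a second copy of the executable, while a copy renamed to JOIN.EXE would keep working as before.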

So here we go...:

WCD is 60k, TAIL 55k, FDISK 36k, LABEL 26k, FC 23k, CHOICE 17k, MORE 15k, [mklink crashes DOSEMU, what is that?], PAUSE 14k, POPD 13k, PUSHD 10k, my COUNT which does not even fully work 10k, VOL 7k, ... other files seem to be as big as they have to be, although there are several files that are surprisingly -small- for what they are able to do.

Any ideas why those files are that big? I would be glad if somebody succeeded in compiling them with other options or fewer libraries to create smaller versions of them, thanks.

VOL, PUSHD, POPD and PAUSE are all available as FreeCOM built-ins as well. If other command.com implementations do not have them, it is nice to have standalone programs in general, but one should point out that users of FreeCOM do not need the separate exe / com files in those cases (same for DOSKEY, which I do not even have installed as it has been a FreeCOM built-in for a long time).

CHOICE and LABEL are good candidates for future inclusion in FreeCOM (as probably is COUNT...!). FC sounds simple but the FreeDOS one is too full of features to make it part of FreeCOM ;-). It even has case-insensitive compare, although I do not know if this is NLS-aware.

Improving TYPE would give a MORE builtin, I guess. The big "good thing" would be that a FreeCOM internal MORE could be designed in a way that does not require DOS pipes (which require temp files, as DOS is not multitasking) - as there are several things in FreeCOM that could use some /P page-wise output options, the code could be shared among them.

By the way, would it be an idea to keep strings compressed in RAM? On disk, you can UPX FreeCOM from 92k to 66k, but I do not know if non-XMS swapping or other features would get confused by this. If you have no XMS, you must either use KSSF and CALL /S to "replace command.com by a program and restart command.com afterwards, losing history data" or spend lots of RAM, so maybe string compression, especially of help strings, could help. As you know, LZ, Huffman and "copy N bytes from M bytes back" type compression can be decompressed with very little code (although LZ requires a bit of RAM and Huffman makes you work on bit strings).
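To give a feel for how little code a "copy N bytes from M bytes back" decompressor needs, here is a minimal sketch. The byte format is invented for this example and is not the format used by UPX, BriefLZ or FreeCOM; the function name lz_unpack is likewise hypothetical.

```c
#include <stddef.h>

/* Decode a made-up byte-oriented format:
     0x00..0x7F  c     : c + 1 literal bytes follow
     0x80..0xFF  c, d  : copy (c & 0x7F) + 3 bytes starting d bytes back
                         in the output (d must be at least 1)
   Returns the number of bytes written, or (size_t)-1 on malformed
   input or when the output buffer is too small. */
size_t lz_unpack(const unsigned char *src, size_t src_len,
                 unsigned char *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    while (in < src_len) {
        unsigned c = src[in++];

        if (c < 0x80) {
            size_t n = (size_t)c + 1;

            if (n > src_len - in || n > dst_cap - out)
                return (size_t)-1;
            while (n--)
                dst[out++] = src[in++];
        } else {
            size_t n = (size_t)(c & 0x7F) + 3;
            size_t d;

            if (in >= src_len)
                return (size_t)-1;
            d = src[in++];
            if (d == 0 || d > out || n > dst_cap - out)
                return (size_t)-1;
            /* Byte by byte on purpose: source and destination may
               overlap (d < n), which is how repeated runs are encoded. */
            while (n--) {
                dst[out] = dst[out - d];
                out++;
            }
        }
    }
    return out;
}
```

For example, the eight input bytes 02 'a' 'b' 'c' 80 03 00 '!' unpack to the seven characters "abcabc!". The only working memory is the output buffer itself, since back-references read from data that has already been written.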


This starts an interesting discussion on the fd-dev mailing list:

Arkady responds:

EA: Any ideas why those files are that big?

Because (1) they are written in an HLL and (2) there is no compiler with a smart enough optimizer. C especially is a bad language to optimize (even poor TP beats almost all compilers on program size).

EA: would be that a FreeCOM internal MORE could be designed in a way that does
EA: not require DOS pipes (which require temp files, as DOS is not multitasking)

This is practically impossible (though in some tricky cases additional resident code may do this).

EA: By the way, would it be an idea to keep strings compressed in RAM? On disk,
EA: you can UPX FreeCOM from 92k to 66k, but I do not know if non-XMS swapping
EA: or other features would get confused by this.

This is not hard. For example, use BriefLZ (a lightweight compression library from Jibz, http://www.ibsensoftware.com). Its decompressor is very small and fast and does not require extra memory, so there is no reason not to keep resources packed internally.

Aitor writes:

EA: PAUSE 14k, ... POPD 13k, PUSHD 10k, ...

These are internal, so I guess the best way of reducing size is simply to leave them out. ;-)

EA: but one should point out that users of FreeCOM do not need the separate exe / com files in those cases (same for DOSKEY, which I do not even have installed as it has been a FreeCOM built-in for a long time).

I do not support this idea; we have a nice chance to reduce FreeCOM to the minimum, within the constraint of MS compatibility.

EA: CHOICE and LABEL are good candidates for future inclusion

I guess CHOICE is not frequently used; regardless of how big it is, it's better to keep it out... Just my opinion.

Steve Nickolas writes:

Personally, I would say this: VOL is in every command.com clone I know of (MS/PC, DR, DOSPLUS, RMF, FreeCOM, 4DOS/NDOS), so an external one would be a waste in any case. As for PUSHD, POPD, CDD, etc., I personally believe that extra commands in COMMAND.COM should be compile-time options and omitted from the main distribution. That is just MHO, however.

Aitor: I guess CHOICE is not frequently used; regardless of how big it is, it's better to keep it out... Just my opinion.

I agree; IMHO, I don't think there should be any commands in FreeCOM *by default* that aren't in M$ command.com. (There could be a separate copy on SF or ibiblio with all features enabled).


On a slightly different (but related) thread, Bart Oldeman writes this:

Poorly written Turbo Pascal code can be more compact than other compilers' code. (But compare Micro-C) ...

it's just that the C RTL in a DOS setting takes up a fair amount of space -- two things especially (sometimes more, sometimes less):

printf() -- needs buffered output; pulls in a lot of things.

argc/argv() processing -- this has to be parsed from the string at psp+80 and may pull in malloc() and free().

This is as opposed to say Linux where argc/argv (and the environment settings) are given to the program ready to use and printf can be in a shared library.

But of course you can limit your RTL use with C if you want to -- you can, say, use your own stripped down printf() (find one in the kernel source for instance) and parse the command line directly from psp+80 at the cost of some portability (doesn't matter if the program is DOS specific though).
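As a rough illustration of parsing the command line directly, here is a minimal sketch that splits the DOS command tail in place, without malloc(). It assumes the usual PSP layout (a length byte at offset 80h, the text from 81h on, ended by a carriage return) and takes a pointer to a PSP-sized buffer so that it can be tried on any system; how to obtain that pointer in a real DOS program depends on the compiler and memory model. The names parse_tail and MAX_ARGS are made up, and quoting is not handled.

```c
#define PSP_TAIL_LEN  0x80  /* length byte of the command tail */
#define PSP_TAIL_TEXT 0x81  /* first character of the command tail */
#define MAX_ARGS      16    /* arbitrary limit for this sketch */

/* Split the command tail into blank-separated words, in place.
   Returns the number of words found; argv[i] points into psp. */
int parse_tail(unsigned char *psp, char *argv[], int max_args)
{
    unsigned len = psp[PSP_TAIL_LEN];
    char *p = (char *)psp + PSP_TAIL_TEXT;
    char *end;
    int argc = 0;

    if (len > 126)          /* keep the terminator inside the PSP */
        len = 126;
    end = p + len;
    *end = '\0';            /* overwrites the trailing CR */

    while (p < end && argc < max_args) {
        while (p < end && (*p == ' ' || *p == '\t'))
            p++;            /* skip blanks before a word */
        if (p == end)
            break;
        argv[argc++] = p;
        while (p < end && *p != ' ' && *p != '\t')
            p++;            /* walk to the end of the word */
        if (p < end)
            *p++ = '\0';    /* terminate the word in place */
    }
    return argc;
}
```

For a command tail of " /P  file.txt" this yields two words, "/P" and "file.txt". Because the words are terminated inside the PSP itself, no heap is needed, at the cost of destroying the original tail.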

Here are some experiments with the tiny model (.com files):

int main(void) {return 0;}

open watcom 1.0 979 bytes
turbo c 2.01 1742 bytes
turbo c++ 1.01 3258 bytes

int main(int argc, char **argv) {return 0;}

open watcom 1.0 2936 bytes
turbo c 1742 bytes
turbo c++ 3534 bytes

#include <stdio.h>
int main(void) {printf("hello world!\n"); return 0;}

open watcom 1.0 10022 bytes
turbo c 5108 bytes
turbo c++ 5506 bytes

but using our own printf() for open watcom 1.0: 2446 bytes

this compares to BP 7.0:

program hello;
begin
        writeln('hello world');
end.

2240 bytes (which takes up the same amount of disk space as 2446 bytes, since files are allocated in whole clusters).

Now Micro-C can generate small executables too, and that's largely because its printf can be very small since it doesn't have to worry about (long) longs, far pointers, floating point, "%*d" and so on.
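To show how small a restricted formatter can be, here is a minimal sketch that handles only %s, %d, %c and %%, with no field widths, longs, far pointers or floating point. It is not the printf from the FreeDOS kernel or from Micro-C; the name mini_format and the choice to format into a caller-supplied buffer (which the program would then write out in one call) are assumptions made for this example.

```c
#include <stdarg.h>
#include <stddef.h>

/* Append one character if there is room, always leaving space for
   the terminating '\0'. */
static void put(char *buf, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        buf[(*pos)++] = c;
}

/* Supports only %s, %d, %c and %%. Output that does not fit is
   dropped. Returns the number of characters stored, not counting
   the terminator. */
size_t mini_format(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;

    if (cap == 0)
        return 0;
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put(buf, cap, &pos, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;                      /* lone % at the end */
        switch (*fmt) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put(buf, cap, &pos, *s++);
            break;
        }
        case 'c':
            put(buf, cap, &pos, (char)va_arg(ap, int));
            break;
        case 'd': {
            int v = va_arg(ap, int);
            /* Work in unsigned so the most negative int is handled. */
            unsigned u = (v < 0) ? 0u - (unsigned)v : (unsigned)v;
            char digits[24];
            int n = 0;

            if (v < 0)
                put(buf, cap, &pos, '-');
            do {
                digits[n++] = (char)('0' + u % 10);
                u /= 10;
            } while (u != 0);
            while (n > 0)
                put(buf, cap, &pos, digits[--n]);
            break;
        }
        default:    /* %% and unknown conversions: emit the character */
            put(buf, cap, &pos, *fmt);
            break;
        }
    }
    va_end(ap);
    buf[pos] = '\0';
    return pos;
}
```

A call such as mini_format(buf, sizeof buf, "%s: %d bytes", "hello.com", 2446) leaves "hello.com: 2446 bytes" in buf. Each conversion left out is code that never has to be linked in, which is the trade-off described above.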

Arkady adds:

There are two things that affect optimization:

1. Quality of code optimization. This is important especially for speed and may often increase code size.

2. Global optimization (_automatic_ cross-function and cross-module code analysis with exclusion of unused code/data, function inlining, etc.; impossible with lossy .OBJ files). This is the biggest factor in decreasing program size.

The C standard does not define modules and (suggests) independent files instead. This greatly complicates global optimization. With some tricks (generating a .LIB instead of an .OBJ for each file, where the .LIB contains a separate module for each global variable and function) C compilers may introduce a sort of global optimization with almost no effort (see how Turbo Vision tries to emulate this feature: there are 42 small files nm*.cpp, each of which contains one declaration). Unfortunately, even with such tricks other features will be missing (for example, automatic function inlining), and this does not help in the case of OOP (where _all_ virtual methods, even unused ones, are recorded in the VMT, which prohibits their elimination). This is why I hate C and, especially, C++.