Running a GDI printer under Linux 
part 6 - Writing the printer software


Other articles on this subject, as well as the motivations for this work, are on my homepage.

In this final article of the series I will look at some details of writing the actual printer software. The main idea, as I said before, is to imitate the original working driver (under the other OS) as closely as possible, so you will not have to think too much about command details or discover the meaning of every register inside the printer interface. If it works for the other OS, it will work for Linux as well. Of course, there will be problems with timings, with required commands (most are garbage, as you will discover, which explains why my Linux driver is twice as fast), and with the minimum set of parameters needed to set up a correct printed image. The easiest way to get the parameters is to print the same material (say, an image in MS Paintbrush) with several page sizes and note what changes between the captured data for each case.

The pulse timings are more difficult to get. You will have to experiment with them, placing delays with usleep() or even sleep(). First get the log generated by Bochs, calling the function bx_printf() in the devices.cc source file to record the timings in machine clocks, and convert them to microseconds. I have rewritten my Linux driver several times to get the best fit.
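
The conversion from machine clocks to microseconds can be sketched as below. This is only an illustration, not code from the driver: ticks_to_usec(), delay_usec() and the clock_hz parameter are hypothetical names, and the clock rate must be whatever your own emulated machine runs at.

```c
#define _DEFAULT_SOURCE   /* makes usleep() visible on glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define USEC_PER_SEC 1000000u

/* Convert an interval in machine clock ticks to microseconds.
 * clock_hz is an assumption: the clock rate behind the logged tick counts. */
uint64_t ticks_to_usec(uint64_t ticks, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    /* Whole seconds and the remainder are converted separately, so the
     * multiplication stays small even for long intervals. */
    return (ticks / clock_hz) * USEC_PER_SEC
         + (ticks % clock_hz) * USEC_PER_SEC / clock_hz;
}

/* Wait for the given time: sleep() for whole seconds, usleep() for the rest. */
void delay_usec(uint64_t usec)
{
    if (usec >= USEC_PER_SEC) {
        sleep((unsigned)(usec / USEC_PER_SEC));
        usec %= USEC_PER_SEC;
    }
    if (usec > 0)
        usleep((useconds_t)usec);
}
```

For instance, with a 100 MHz clock, 100 ticks between two logged writes become a delay of 1 microsecond. Treat the computed values as starting points and tune them by experiment.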

This article is a tour through my printer driver's implementation, so you get a feeling for what to do once you start exploring your own printer. As I did myself, you can start coding your driver even before everything is well understood, if you follow the mimic principle. Much of my printer's functionality was discovered that way, as my first driver was too crude to be useful. I dare you to write a working driver, even if it only prints a tiny line, like my first experiment. Go ahead! Make your printer sing!

Mimicking the original driver

Will you be able to reuse some of my code? I don't really know, but I hope so. Low-priced printers share several characteristics with mine: (1) they don't have enough memory to hold a full page, which means you will have to print in "bands", or "strips of paper", while the laser is burning the image onto the printer's drum; (2) most of them need a fast way to transfer huge amounts of image data from the processor to the printer's memory in real time, so you will find non-standard protocols for that; (3) for the same reason, a compression algorithm is used to encode the image data. The compression is the tricky part of the reverse engineering, and it is not easy to mimic unless you understand it fully. In a previous article I asked you to make several patterned images to discover how it works. If you don't understand it fully, there is no way to write a printer driver, sorry.

Another matter of concern is the parallel port mode. If your printer supports SPP, EPP, and ECP, don't choose the most advanced one! Linux is very efficient, so try the simplest protocol first; it is also the easiest to debug. I mean SPP (Standard Parallel Port), of course. A difficult issue I found in my printer was the band size. First, each band must contain a whole number of rows, in my case 4800 dots each. Second, when the size of the compressed data varies too much, the printer gets lost, so I had to add a dynamic band sizing patch to my original compression algorithm (to see the full compression algorithm, please get the driver's source at ml85p-0.0.5.tar.gz, or at Metalab under /pub/Linux/hardware/drivers). The band is resized as it is compressed by fragments of code like:


       if ((cnt < LINE_SIZE) &&
           ((pktcnt + pcnt/256) < 5000) &&
           (linecnt < LINES_BY_PAGE))
       {
          cnt += LINE_SIZE;
       }

In this code, LINE_SIZE is the number of bytes in each printer row (4800/8 = 600), pktcnt is the number of packets of compressed data assembled so far, and pcnt is the count of similar data found so far, which will generate further packets, each of size 256 at most. This guarantees that my compressed bands will be about 5000 packets in size. The implementation seems very complex, but you have to take into account that each packet may carry 2, 3 or more bytes of data, so I have to take care not to overflow the printed lines. When such an overflow occurs, I get an ugly shifted page, or even a black band at the end of the page (and there goes my precious toner...).
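
The same test can be written as a small predicate, which makes the packet budget easier to see. This is a sketch only: estimated_packets() and band_may_grow() are hypothetical names, the constants are taken from the fragment above, and the exact role of cnt in the real driver is not shown here, so check the driver's source before reusing it.

```c
#include <assert.h>

#define LINE_SIZE      600   /* 4800 dots / 8 bits per byte */
#define MAX_BAND_PKTS  5000  /* packet budget per band, as in the fragment */
#define MAX_RUN        256   /* a run of similar data yields packets of at most 256 */

/* Packets the band would hold if it were closed now: those already
 * assembled plus those the pending run would still produce
 * (integer division, as in the original test). */
int estimated_packets(int pktcnt, int pcnt)
{
    assert(pktcnt >= 0 && pcnt >= 0);
    return pktcnt + pcnt / MAX_RUN;
}

/* Nonzero when one more row may be added to the current band. */
int band_may_grow(int cnt, int pktcnt, int pcnt, int linecnt, int lines_by_page)
{
    return cnt < LINE_SIZE
        && estimated_packets(pktcnt, pcnt) < MAX_BAND_PKTS
        && linecnt < lines_by_page;
}
```

With 4999 packets assembled and a pending run of 255, the band may still grow; once the pending run reaches 256 the estimate hits 5000 and the band is closed.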

You will have to recognize what is needed to reset the printer and which is the actual print page command. This is easy to get. Just capture your printer data for a job without any printed page and you will get the reset procedure. There are different kinds of commands, at least in my printer, so take care with the strobing of commands. The Samsung printer has two "kinds of strobe", I suppose one for selecting the printer's ASIC register and the other for sending the register's new value. The lpoutw function shows the two strobe sequences:


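void
lpoutw ( int data,int type ) {
	int mask=0;
	outlp(data);		/* output the data byte */
	if (type) {
		mask=2;		/* second kind of strobe: other AUTOFD level */
	}
	coutlp( 4+mask );	/* strobe pulse: 4 -> 5 -> 4 */
	coutlp( 5+mask );
	sinpwfast(0x7f);
	coutlp( 4+mask );
	toggle_control(17);	/* not well understood, but required (see text) */
}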

The type argument tells it which kind of strobe to use. Of course, both generate a physical STB pulse, but with different AUTOFD signal levels. The toggle_control function call at the end is not well understood, but as I told you, I mimic the Windows driver. If I take out this function call, my whole driver stops working, so let us leave it there! Your printer will possibly not be the same, but pay attention to all control signals or you will be in trouble.
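
To keep track of what values such as 4, 5, 6 or 7 do on the wires, it helps to decode them against the standard PC parallel port control register layout. The sketch below assumes that coutlp() writes that register (base address + 2), which is not stated above; ctl_decode() and struct ctl_pins are hypothetical helpers, not part of the driver.

```c
#include <assert.h>

/* Bits of the standard PC parallel port control register (base + 2). */
enum {
    CTL_STROBE   = 0x01,  /* inverted by the port hardware */
    CTL_AUTOFD   = 0x02,  /* inverted by the port hardware */
    CTL_INIT     = 0x04,  /* not inverted */
    CTL_SELECTIN = 0x08   /* inverted by the port hardware */
};

/* Electrical level of each control pin: 1 = high, 0 = low. */
struct ctl_pins { int strobe, autofd, init, selectin; };

/* Translate a control register value into pin levels. */
struct ctl_pins ctl_decode(unsigned value)
{
    struct ctl_pins p;
    assert(value <= 0xffu);
    p.strobe   = !(value & CTL_STROBE);    /* bit set -> pin low  */
    p.autofd   = !(value & CTL_AUTOFD);    /* bit set -> pin low  */
    p.init     = !!(value & CTL_INIT);     /* bit set -> pin high */
    p.selectin = !(value & CTL_SELECTIN);  /* bit set -> pin low  */
    return p;
}
```

Under this assumption, the sequence 4, 5, 4 pulls the STROBE pin low and back high while AUTOFD stays high, and 6, 7, 6 does the same with AUTOFD held low.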

Help from ghostscript

Ghostscript is a nice PostScript interpreter that translates not only to the printer languages of the most common printers, but also to several formats unrelated to printers. I need my printer to "understand" PostScript, so this is the way to go. I use ghostscript to translate to an easy-to-process format and send the result to my printer.

I have chosen pbmraw as my target output format. I call ghostscript to translate the PostScript source into several pbmraw formatted pages, and then call my driver to read them (this format is very easy to read!) and send them to the printer. The problem with this approach is that it needs lots of disk space, for the image files are very large. A better approach is to pipe ghostscript's output directly into my driver, so there is no disk access at all. This is Unix magic! The kernel connects ghostscript to my driver, and as ghostscript sends me image data, I read it and process it on the fly. The driver must know exactly where each picture ends in order to find the header of the next one, or it will get lost.
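
Reading a page header from the pipe can be sketched as follows. A raw PBM page starts with the magic "P4", then the width and height in decimal separated by whitespace (possibly with '#' comment lines), then a single whitespace character, then the packed rows. The function names here are hypothetical and this is not the driver's own parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Read the next decimal number, skipping whitespace and '#' comments.
 * Also consumes the one character that ends the number.
 * Returns -1 on EOF or malformed input. */
static long pbm_next_int(FILE *f)
{
    long v = 0;
    int c;
    for (;;) {
        c = getc(f);
        if (c == '#')                      /* comment runs to end of line */
            while ((c = getc(f)) != EOF && c != '\n')
                ;
        if (c == EOF)
            return -1;
        if (!isspace(c))
            break;
    }
    if (!isdigit(c))
        return -1;
    do {
        v = v * 10 + (c - '0');
        if (v > 100000000L)                /* refuse absurd dimensions */
            return -1;
        c = getc(f);
    } while (c != EOF && isdigit(c));
    return v;
}

/* Parse one "P4" header; on success the stream is positioned at the
 * first raster byte. Returns 0 on success, -1 on EOF or a bad header. */
int pbm_read_header(FILE *f, int *width, int *height)
{
    long w, h;
    assert(f != NULL && width != NULL && height != NULL);
    if (getc(f) != 'P' || getc(f) != '4')
        return -1;
    w = pbm_next_int(f);
    h = pbm_next_int(f);
    if (w <= 0 || h <= 0)
        return -1;
    *width = (int)w;
    *height = (int)h;
    return 0;
}

/* Rows are packed 8 pixels per byte and padded to a whole byte. */
int pbm_row_bytes(int width)
{
    return (width + 7) / 8;
}
```

After the header, a page occupies exactly height times pbm_row_bytes(width) bytes, so once that many bytes are consumed the next byte in the pipe is the 'P' of the following page, or end of file.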

Another problem I found was the page size. My default printer page is 4800 by 6774 pixels (as I only plan to use A4 paper), and ghostscript-generated pages vary in size. So my routines have to be careful about that, filling the missing space or clipping when the picture is larger, both in width and in height. Once the piping mechanism was ready, the only disk space used was for the compressed images. This is a much lower requirement, and it is temporary, as I remove each page after it is sent to the printer.

There are some tricks for doing this in a modular fashion. First, save the bitmap file dimensions when reading its header (I use the bmwidth and bmheight variables). Then, when your get_bitmap() function is called, which returns one byte of bitmap data, check whether the bitmap's width is greater than your page image size. If it is larger, simply read your page width and skip the remaining bytes of the row in the bitmap file; otherwise read its real size. If you clear the bitmap buffer beforehand (the memset() call), the space to the right of each line will be blank, as expected.


unsigned char
get_bitmap () {
	if (bmcnt==0) {			/* buffer used up: fetch the next row */
		memset(bmbuf,0,800);	/* blank row, so short rows are padded */
		if (linecnt<(bmheight-topskip)) {
			if (bmwidth > 800) {
				/* row wider than the buffer: clip it */
				fread(bmbuf,1,800,bitmapf);
				bitmap_seek(bmwidth-800);
			}
			else {
				fread(bmbuf,1,bmwidth,bitmapf);
			}
		}
		bmptr = bmbuf+leftskip/8;	/* left margin, in whole bytes */
		bmcnt = LINE_SIZE;
		linecnt++;
	}
	bmcnt--;
	return *bmptr++;
}

The bitmap_seek() function is supposed to do a seek, but I can't call fseek() directly, as I'm reading from a pipe! It just reads and discards bytes. The variables bmptr, bmcnt, and bmbuf implement a simple buffer that supplies the next bitmap bytes when get_bitmap() is called again. Notice linecnt, which tracks the current line of the printed page, and topskip and leftskip, which control the margins at the top and left of the printed page. Ghostscript tends to put a larger margin at the top and left than I want, so this control only reduces those margins. It would be easy to make them grow, though, if needed.
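
A read-and-discard "seek" for a pipe can be sketched like this. The function skip_bytes() is a hypothetical stand-in, not the driver's bitmap_seek(): it takes the stream as a parameter and reports a short read, which the driver's version may or may not do.

```c
#include <assert.h>
#include <stdio.h>

/* Skip n bytes of a non-seekable stream by reading them into a scratch
 * buffer and throwing them away.
 * Returns 0 on success, -1 if the stream ends before n bytes are read. */
int skip_bytes(FILE *f, long n)
{
    unsigned char scratch[256];
    assert(f != NULL && n >= 0);
    while (n > 0) {
        size_t want = n < (long)sizeof scratch ? (size_t)n : sizeof scratch;
        size_t got = fread(scratch, 1, want, f);
        if (got == 0)
            return -1;      /* EOF or read error: the page was truncated */
        n -= (long)got;
    }
    return 0;
}
```

Checking the return value is worthwhile in a pipeline: if ghostscript dies in the middle of a page, the driver can stop instead of sending a half-empty band to the printer.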

Pulse timing and status checking

A problem with gathering the pulses from Bochs is that, although the simulation is perfect in every detail, including the real time clock of the virtual machine, the real hardware (the printer) is not simulated. It just "thinks" it is connected to a slow machine. So most status reads will return an already ready condition. To know exactly what to look for, there are several possibilities:
  • you can check "unofficially" the status of the printer's port each time you change something by outputting data to it. When the printer driver checks it again, you will notice which bits changed and get an idea of what is being tested. This is not infallible, but it gives you a hint.
  • you can disassemble the original driver at the point where it checks the status. Notice there are several such checks, and you must get them all to be sure you understand it. This breakpoint is tricky to set, as I will explain below.
  • The first spying I did was to read the printer's status port each time something is written to the port and log it to stderr or to the impr.log file. This file (impr.log) is opened at the very start of Bochs and closed just before it finishes, so I can write to it during the run. This spying goes in the bx_devices_c::outp(Bit16u addr, Bit32u value, unsigned io_len) routine, after checking that our printer port is being accessed.
          if ((addr <= 0x37a) && (addr >= 0x378)) {
                    port_real_outb(addr,value);
                    st379 = port_real_inb(0x379);
                    fprintf(stderr,"O%x,%x i1(%x)\n",(addr-0x378),value,st379);
          }
    Notice that port_real_outb() and port_real_inb() are not part of Bochs. I added them to interface directly with the hardware. This routine is called when the virtual machine tries to simulate an output. My code translates the simulated access into a real hardware access, but also reads the status port into st379, so we can see the status change when the printer hardware detects the command. Otherwise I would not see much, because the simulation is very slow compared with the hardware.
    Of course, this output is shown in real time. Sometimes I replace stderr with impr_log (see my patched Bochs source) and log to it for later analysis, but it is good to see it in real time as well.

To get the status checking disassembly, I modified the bx_devices_c::inp() function to also print the instruction pointer (EIP) whenever a port is read. My capturing statement is fprintf(impr_log,"I%x(%x) 0x%x\n",(addr-0x378),ret,EIP);, so I get in the output (the impr.log file) something like I1(7f) 0x80020965, where the last large number is the EIP register at the time of the status check. Then I filter all those statements (with grep I1, for instance), edit them to cut everything except the last number, and then run the result through sort | uniq to get a list of all the status checking points. I marvel at how useful simple programs like uniq and sort are, and at how long I lived without them (programming under MS-DOS/Windows).

After getting these status checking points, I restart Bochs (with the printer installed, of course) and disassemble several bytes after the input instruction, as in the following example (here I included 1 byte more at the beginning to show the reported "in" instruction):

            <bochs:6> disas 0x8001f350 0x8001f356
            8001f350: ec: in AL, DX
            8001f351: 24f0: and AL, #f0
            8001f353: 3cf0: cmp AL, #f0
            8001f355: 7522: jnz +#22

Most of the time you don't really need to know which bit is the meaningful one. You can just perform the same test in your Linux driver. First get the addresses from the impr_log file, then stop Bochs by pressing <Ctrl>-c (to stop the simulation) and execute the instruction disas 0x8001f350 0x8001f356. The second address is where the disassembly stops, so give something like ten bytes after the start address. Repeat the process for all the checkpoints you recorded with the procedure given before. You don't need to understand everything now, just record the assembly output. You don't need to know much assembly language either, just your machine's architecture (registers) and a handful of logical instructions. In the example given, the code tests whether the four high-order bits of the status port are all turned on. If you can't understand this, please go read a good assembly language book or ask a friend for help.
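
The "and AL, #f0 / cmp AL, #f0" pair from the example translates to C as below. This is a sketch: status_high_nibble_set() and wait_status() are hypothetical names, read_status stands in for whatever wrapper you use to read port 0x379, and polling in a loop is an assumption, since the disassembly does not show what the original driver does after the jnz.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_MASK 0xf0u   /* the four high-order bits of the status port */

/* Same test as "and AL, #f0 / cmp AL, #f0": nonzero when all four
 * high-order status bits are on. */
int status_high_nibble_set(unsigned status)
{
    return (status & STATUS_MASK) == STATUS_MASK;
}

/* Poll the status until the test passes or max_tries reads were made.
 * read_status is a hypothetical callback returning the status byte.
 * Returns 0 when the condition was seen, -1 on timeout. */
int wait_status(unsigned (*read_status)(void), long max_tries)
{
    assert(read_status != NULL);
    while (max_tries-- > 0)
        if (status_high_nibble_set(read_status()))
            return 0;
    return -1;
}
```

Note that a status value of 7f fails this test, because bit 7 is off, while f0 and ff pass. Bounding the number of reads keeps the driver from hanging forever when the printer is switched off.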

If you have an SMP (multiple processor) machine, you have to watch out for trouble when disabling interrupts. It is better to try first with only one processor. When everything works fine, you can write an SMP version of your code. In critical parts of the data transfer you will need all the speed possible, or the printer will lose data, so the use of cli() and sti() in some places is unavoidable.

Tools for the future

Real time techniques are invaluable for analyzing unknown data streams. We can make a versatile logic analyzer with RT-Linux plus some driver code and a suitable graphical interface. I plan to make something like that available in the near future, not only to detect and reverse engineer printers, but anything connected to the parallel port or even to other ports. The only concern here is that we will have to stop the CPU until the trigger conditions occur, and if they don't occur you will have a rock-solid frozen machine. Of course, we can use an interrupt source to do the triggering, but this doesn't guarantee real time performance, because RT-Linux, while much faster to react than a normal Linux kernel driver, has a finite response time.



    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming