Wwwoffle's cache contains only directories named after the domains, each holding files whose names begin with D or U. Each Uxxxxxx file holds the URI of its corresponding Dxxxxxx file, which has the same xxxxxx suffix. The Dxxxxxx file is the complete HTTP response as received, with a header followed by the html, text, gif, or whatever kind of data was sent in answer to the GET request.
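As a rough illustration of that layout, the sketch below pairs one U file with its D file and returns the URI together with the HTTP header lines. The proc name cacheEntry is hypothetical (it is not part of wwwoffle), and it assumes that the URI is the first line of the U file and that the header ends at the first blank line.

```tcl
# Hypothetical helper, not part of wwwoffle: given the path of a Uxxxxxx
# file, return a two-element list {uri headerLines}.
proc cacheEntry {ufile} {
    # Assumption: the URI is the first line of the U file.
    set in [open $ufile]
    set uri [gets $in]
    close $in

    # The D file lives in the same directory and shares the suffix;
    # only the first letter of the name differs.
    set suffix [string range [file tail $ufile] 1 end]
    set dfile [file join [file dirname $ufile] D$suffix]

    # Collect header lines up to the first blank line.
    # Binary translation keeps the CR of a CRLF ending, so trim it.
    set in [open $dfile]
    fconfigure $in -translation binary
    set header {}
    while {[gets $in line] >= 0} {
        set line [string trimright $line "\r"]
        if {$line eq ""} break
        lappend header $line
    }
    close $in
    return [list $uri $header]
}
```

Calling cacheEntry on a U file in the site directory should give back the original URI and the status and header lines of the stored response, which is a quick way to check what a given Dxxxxxx file really is before converting it.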
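Our procedure for making "local" files from the cached version can be stated as:
1. For each Uxxxxxx file, read the URI it contains and take the last path component as the local file name.
2. Open the matching Dxxxxxx file, skip the HTTP header, and copy the rest to the new file in the destination directory.
3. If the new file is an html file, rewrite its SRC and BACKGROUND attributes so that they point to the local copies of the pictures.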
Of course, this code is not bullet-proof, but if the html is correct, only the SRC and BACKGROUND attributes (in practice, IMG tags and page backgrounds) will be touched.
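To see what the link rewriting does to a single line, here is a minimal sketch of the same substitution wrapped in a proc. The name localizeLinks and the sample URL are made up for the illustration. Like the program itself, it only handles upper-case, double-quoted attributes.

```tcl
# Hypothetical helper wrapping the substitution used by the program:
# drop the directory part of a double-quoted SRC or BACKGROUND value.
proc localizeLinks {line} {
    foreach attr {SRC BACKGROUND} {
        if {[string match "*$attr=*" $line]} {
            # \1 = text up to and including ATTR=, \2 = bare file name,
            # \3 = remainder of the line after the closing quote.
            regsub "(.*$attr=)\".*/(.*)\"(.*)" $line {\1"\2"\3} line
        }
    }
    return $line
}

puts [localizeLinks {<IMG SRC="http://members.xoom.com/someone/pics/logo.gif" ALT="logo">}]
```

For the sample line this prints the tag with SRC="logo.gif". Because the patterns are greedy and regsub is used without -all, a line holding two pictures gets only its last SRC rewritten. This is one of the ways the code is not bullet-proof.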
To use the program, change the cachedir variable to something suitable (the site directory inside your wwwoffle cache). If you like, also change the destdir (destination directory) variable. If you prefer a widgetized version, make an entry widget for each of those variables. This is left as an exercise!
Here's the code:
#!/bin/sh
#
# Utility to filter the wwwoffle cache, creating
# a local set of html pages and the pictures they include
# \
exec wish "$0" "$@"

set cachedir /var/spool/wwwoffle/http/members.xoom.com
set destdir [pwd]

#
# converts the names of the Dxxxxxx files according to the contents of
# the Uxxxxxx files and saves the result in the destination directory
#
proc cvfiles { } {
    global cachedir destdir
    foreach f [glob -nocomplain $cachedir/U*] {
        regexp {(.*)/U(.*)} $f match prefix sufix
        # the first line of the U file is the original URI
        set inf [open $f]
        set destname [gets $inf]
        close $inf
        puts "file: D$sufix --> [file tail $destname]"

        ### open file and discard its http header
        set inf [open $prefix/D$sufix]
        set newfn $destdir/[file tail $destname]
        set outf [open $newfn w]
        fconfigure $inf -translation binary
        fconfigure $outf -translation binary
        # the header ends at the first empty line (a lone CR in binary mode)
        while {[string length [gets $inf]] > 1} {
            puts -nonewline .
        }
        fcopy $inf $outf
        close $inf
        close $outf

        ### change links to the pictures in the html files
        if {[string match {*.html} $newfn]} {
            set inf [open $newfn]
            set outf [open tmp_html w]
            # reading with "gets $inf line" avoids writing a spurious
            # empty line at end of file
            while {[gets $inf line] >= 0} {
                if {[string match {*SRC=*} $line]} {
                    regsub {(.*SRC=)\".*/(.*)\"(.*)} $line {\1"\2"\3} line
                }
                if {[string match {*BACKGROUND=*} $line]} {
                    regsub {(.*BACKGROUND=)\".*/(.*)\"(.*)} $line {\1"\2"\3} line
                }
                puts $outf $line
            }
            close $inf
            close $outf
            file rename -force tmp_html $newfn
        }
    }
}

cvfiles
exit
That's all, folks. Happy hacking!