dpkg technical manual



Version 3.1.3



Table of Contents

1. Quick summary of dpkg's external interface
1.1. Control files
1.2. The dpkg status area
1.3. The dpkg library files
1.4. The "dpkg" command-line utility
1.4.1. "Documented" command-line interfaces
1.4.2. Environment variables which dpkg responds to
1.4.3. Assertions
1.4.4. --predep-package
2. dpkg-deb and .deb file internals
2.1. The .deb archive format
2.2. The dpkg-deb command-line
2.2.1. Internal checks used by dpkg-deb when building packages
3. dpkg internals
3.1. Updates
3.2. What happens when dpkg reads the database
3.3. How dpkg compares version numbers
   

The basic dpkg package control file supports the following major features:-

   
  • 5 types of dependencies:-

    • Pre-Depends, which must be satisfied before a package may be unpacked

    • Depends, which must be satisfied before a package may be configured

    • Recommends, to specify a package which if not installed may severely limit the usefulness of the package

    • Suggests, to specify a package which may increase the productivity of the package

    • Conflicts, to specify a package which must NOT be installed in order for the package to be configured

    • Breaks, to specify a package which is broken by the package and which should therefore not be configured while broken

    Each of these dependencies can specify a version and a dependency on that version, for example "<= 0.5-1", "== 2.7.2-1", etc. The comparators available are:-

    • "<<" - less than

    • "<=" - less than or equal to

    • ">>" - greater than

    • ">=" - greater than or equal to

    • "==" - equal to

  • The concept of "virtual packages", which many other packages may provide, using the Provides mechanism. An example of this is the "httpd" virtual package, which all web servers should provide. Virtual package names may be used in dependency headers. However, current policy is that virtual packages do not support version numbers, so dependencies on virtual packages with versions will always fail.

  • Several other control fields, such as Package, Version, Description, Section, Priority, etc., which are mainly for classification purposes. The package name must consist entirely of lowercase characters, plus the characters '+', '-', and '.'. Fields can extend across multiple lines - on the second and subsequent lines, there is a space at the beginning instead of a field name and a ':'. Empty lines must consist of the text " .", which will be ignored, as will the initial space for other continuation lines. This feature is usually only used in the Description field.

   

The "dpkg status area" is the term used to refer to the directory where dpkg keeps its various status files (GNU would have you call it the dpkg shared state directory). This is always, on Debian systems, /var/lib/dpkg. However, the default directory name should not be hard-coded, but #define'd, so that alteration is possible (it is available via configure in dpkg 1.4.0.9 and above). Of course, in a library, code should be allowed to override the default directory, but the default should be part of the library (so that the user may change the dpkg admin dir simply by replacing the library).

   

Dpkg keeps a variety of files in its status area. These are discussed later on in this document, but a quick summary of the files is here:-

   
  • available - this file contains a concatenation of control information from all the packages which dpkg knows about. This is updated using the dpkg commands "--update-avail <file>", "--merge-avail <file>", and "--clear-avail".

  • status - this file contains information on the following things for every package:-

    • Whether it is installed, not installed, unpacked, removed, failed configuration, or half-installed (deconfigured in favour of another package).

    • Whether it is selected as install, hold, remove, or purge.

    • If it is "ok" (no installation problems), or "not-ok".

    • It usually also contains the section and priority (so that dselect may classify packages not in available)

    • For packages which did not initially appear in the "available" file when they were installed, the other control information for them.

    The exact format for the "Status:" field is:

          Status: Want Flag Status
    

    Where Want may be one of unknown, install, hold, deinstall, purge. Flag may be one of ok, reinstreq. Status may be one of not-installed, config-files, half-installed, unpacked, half-configured and installed. The states are as follows:-

    not-installed

    No files are installed from the package, it has no config files left, it uninstalled cleanly if it ever was installed.

    unpacked

    The basic files have been unpacked (and are listed in /var/lib/dpkg/info/[package].list. There are config files present, but the postinst script has _NOT_ been run.

    half-configured

    The package was installed and unpacked, but the postinst script failed in some way.

    installed

    All files for the package are installed, and the configuration was also successful.

    half-installed

    An attempt was made to remove the package but there was a failure in the prerm script.

    config-files

    The package was "removed", not "purged". The config files are left, but nothing else.

    The two last items are only left in dpkg for compatibility - they are understood by it, but never written out in this form.

    Please see the dpkg source code, lib/parshelp.c, statusinfos, eflaginfos and wantinfos for more details.

  • info - this directory contains files from the control archive of every package currently installed. They are installed with a prefix of "<packagename>.". In addition to this, it also contains a file called <package>.list for every package, which contains a list of files. Note also that the control file is not copied into here; it is instead found as part of status or available.

  • methods - this directory is reserved for "method"-specific files - each "method" has a subdirectory underneath this directory (or at least, it can have). In addition, there is another subdirectory "mnt", where misc. filesystems (floppies, CD-ROMs, etc.) are mounted.

  • alternatives - directory used by the "update-alternatives" program. It contains one file for each "alternatives" interface, which contains information about all the needed symlinked files for each alternative.

  • diversions - file used by the "dpkg-divert" program. Each diversion takes three lines. The first is the package name (or ":" for user diversion), the second the original filename, and the third the diverted filename.

  • updates - directory used internally by dpkg. This is discussed later, in the section Section 3.1, “Updates”.

  • parts - temporary directory used by dpkg-split

   

This strange option is described as follows in the source code:

   
/* Print a single package which:
 *  (a) is the target of one or more relevant predependencies.
 *  (b) has itself no unsatisfied pre-dependencies.
 * If such a package is present output is the Packages file entry,
 *  which can be massaged as appropriate.
 * Exit status:
 *  0 = a package printed, OK
 *  1 = no suitable package available
 *  2 = error
 */
   

On further inspection of the source code, it appears that what is does is this:-

   
  • Looks at the packages in the database which are selected as "install", and are installed.

  • It then looks at the Pre-Depends information for each of these packages from the available file. When it find a package for which any of the pre-dependencies are not satisfied, it breaks from the loop through the packages.

  • It then looks through the unsatisfied pre-dependencies, and looks for packages which would satisfy this pre-dependency, stopping on the first it finds. If it finds none, it bombs out with an error.

  • It then continues this for every dependency of the initial package.

   

Eventually, it writes out the record of all the packages to satisfy the pre-dependencies. This is used by the disk method to make sure that its dependency ordering is correct. What happens is that all pre-depending packages are first installed, then it runs dpkg -iGROEB on the directory, which installs in the order package files are found. Since pre-dependencies mean that a package may not even be unpacked unless they are satisfied, it is necessary to do this (usually, since all the package files are unpacked in one phase, the configured in another, this is not needed).

   

This chapter describes the internals to the "dpkg-deb" tool, which is used by "dpkg" as a back-end. dpkg-deb has its own tar extraction functions, which is the source of many problems, as it does not support long filenames, using extension blocks.

   

dpkg-deb documents itself thoroughly with its '--help' command-line option. However, I am including a reference to these for completeness. dpkg-deb supports the following options:-

   
  • --build (-b) <dir> - builds a .deb archive, takes a directory which contains all the files as an argument. Note that the directory <dir>/DEBIAN will be packed separately into the control archive.

  • --contents (-c) <debfile> - Lists the contents of the "data.tar.gz" member.

  • --control (-e) <debfile> - Extracts the control archive into a directory called DEBIAN. Alternatively, with another argument, it will extract it into a different directory.

  • --info (-I) <debfile> - Prints the contents of the "control" file in the control archive to stdout. Alternatively, giving it other arguments will cause it to print the contents of those files instead.

  • --field (-f) <debfile> <field> ... - Prints any number of fields from the "control" file. Giving it extra arguments limits the fields it prints to only those specified. With no command-line arguments other than a filename, it is equivalent to -I and just the .deb filename.

  • --extract (-x) <debfile> <dir> - Extracts the data archive of a debian package under the directory <dir>.

  • --vextract (-X) <debfile> <dir> - Same as --extract, except it is equivalent of giving tar the '-v' option - it prints the filenames as it extracts them.

  • --fsys-tarfile <debfile> - This option outputs a gunzip'd version of data.tar.gz to stdout.

  • --new - sets the archive format to be used to the new Debian format

  • --old - sets the archive format to be used to the old Debian format

  • --debug - Tells dpkg-deb to produce debugging output

  • --nocheck - Tells dpkg-deb not to check the sanity of the control file

  • --help (-h) - Gives a help message

  • --version - Shows the version number

  • --licence/--license (UK/US spellings) - Shows a brief outline of the GPL

   

Here is a list of the internal checks used by dpkg-deb when building packages. It is in the order they are done.

   
  • First, the output Debian archive argument, if it is given, is checked using stat. If it is a directory, an internal flag is set. This check is only made if the archive name is specified explicitly on the command-line. If the argument was not given, the default is the directory name, with ".deb" appended.

  • Next, the control file is checked, unless the --nocheck flag was specified on the command-line. dpkg-deb will bomb out if the second argument to --build was a directory, and --nocheck was specified. Note that dpkg-deb will not be able to determine the name of the package in this case. In the control file, the following things are checked:-

    • The package name is checked to see if it contains any invalid characters (see Section 1.1, “Control files” for this).

    • The priority field is checked to see if it uses standard values, and user-defined values are warned against. However, note that this check is now redundant, since the control file no longer contains the priority - the changes file now does this.

    • The control file fields are then checked against the standard list of fields which appear in control files, and any "user-defined" fields are reported as warnings.

    • dpkg-deb then checks that the control file contains a valid version number.

  • After this, in the case where a directory was specified to build the .deb file in, the filename is created as "directory/pkg_ver.deb" or "directory/pkg_ver_arch.deb", depending on whether the control file contains an architecture field.

  • Next, dpkg-deb checks for the <dir>/DEBIAN directory. It complains if it doesn't exist, or if it has permissions < 0755, or > 0775.

  • It then checks that all the files in this subdir are either symlinks or plain files, and have permissions between 0555 and 0775.

  • The conffiles file is then checked to see if the filenames are too long. Warnings are produced for each that is. After this, it checks that the package provides initial copies of each of these conffiles, and that they are all plain files.

   

This chapter describes the internals of dpkg itself. Although the low-level formats are quite simple, what dpkg does in certain cases often does not make sense.

   

This describes the /var/lib/dpkg/updates directory. The function of this directory is somewhat strange, and seems only to be used internally. A function called cleanupdates is called whenever the database is scanned. This function in turn uses scandir(3), to sort the files in this directory. Files who names do not consist entirely of digits are discarded. dpkg also causes a fatal error if any of the filenames are different lengths.

   

After having scanned the directory, dpkg in turn parses each file the same way it parses the status file (they are sorted by the scandir to be in numerical order). After having done this, it then writes the status information back to the "status" file, and removes all the "updates" files.

   

These files are created internally by dpkg's "checkpoint" function, and are cleaned up when dpkg exits cleanly.

   

Judging by the use of the updates directory I would call it a Journal. Inorder to efficiently ensure the complete integrity of the status file dpkg will "checkpoint" or journal all of it's activities in the updates directory. By merging the contents of the updates directory (in order!!) against the original status file it can get the precise current state of the system, even in the event of a system failure while dpkg is running.

   

The other option would be to sync-rewrite the status file after each operation, which would kill performance.

   

It is very important that any program that uses the status file abort if the updates directory is not empty! The user should be informed to run dpkg manually (what options though??) to correct the situation.