
Project: Parse an Outlook PST using Python


I’m still getting some endian issues and code quirks figured out (this is my first foray into a Python project), but I’m coming along. The big part is going to be figuring out whether the pointers I’m reading are accurate :) Whoever made the PST file format should be shot!
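On the endian side, one option that may help is Python's `struct` module, which lets you state the byte order in the format string instead of checking the host's byte order. The snippet below is only a sketch: it decodes the four signature bytes `!BDN` both ways to show why the order matters. The variable names are made up for illustration.

```python
import struct

# The first four bytes of a PST file are the ASCII characters "!BDN".
magic = b"!BDN"

# "<" forces little-endian decoding no matter what the host CPU uses,
# so no sys.byteorder check or byteswap() call is needed.
little = struct.unpack("<I", magic)[0]

# ">" decodes the same bytes as big-endian, which gives a different number.
big = struct.unpack(">I", magic)[0]
```

Here `little` comes out as 0x4E444221, the signature value the PST header check expects, while `big` is 0x2142444E.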

Anyway, read the excerpt if you want to see what I have so far (which I believe is correct, amazingly enough).

==

#!/usr/bin/python
# Written for Python 2 (print statements).

from array import array
import os, sys

def main():

  bo = sys.byteorder

  filename = 'Users/phaedo/Desktop/backup.pst'
  #filename = 'E:\backup.pst'
  #filename = 'C:\Documents and Settings\CEcker\My Documents\Email Backup\backup3.pst'
  fp = open(filename,'rb')

  data = array('i') # 'i' is a signed int (assumed to be 4 bytes here); fill in whatever type you need
  data.fromfile(fp, 1) # second argument is how many items (not bytes) to read

  # the values are stored little-endian, so swap on big-endian machines
  if bo == 'big':
    data.byteswap()

  sig = data.pop()

  if sig == 0x4E444221:
    print "Valid PST signature"
  else:
    print "Invalid PST signature: "+repr(sig)

  # file size as recorded in the header
  fp.seek(0xa8,0)
  data.fromfile(fp, 1)

  if bo == 'big':
    data.byteswap()

  filesize = data.pop()
  print "Datafile reports: "+repr(filesize)+" bytes"

  osfilesize = os.stat(filename)[6] # index 6 is st_size
  print "OS reports: "+repr(osfilesize)+" bytes"

  if osfilesize == filesize:
    print " They agree!"
  else:
    print " They disagree :("

  # read both header pointers into the array, then swap them together
  fp.seek(0xc4,0)
  data.fromfile(fp,1)

  fp.seek(0xbc,0)
  data.fromfile(fp,1)

  if bo == 'big':
    data.byteswap()

  pointer2 = data.pop() # value read from 0xbc
  pointer1 = data.pop() # value read from 0xc4
  print "Pointer 1 is "+hex(pointer1)
  print "Pointer 2 is "+hex(pointer2)

  fp.seek(pointer1,0)
  data.fromfile(fp,3)
  if bo == 'big':
    data.byteswap()

  print

  tablepointer = data.pop() # third int read at pointer1
  firstid = data.pop(0) # first int read at pointer1

  print "Table of items is at " + hex(tablepointer)
  print "First ID is " + repr(firstid)
  print

  emptyarray(data)

  fp.seek(tablepointer,0)
  data.fromfile(fp,2)

  if bo == 'big':
    data.byteswap()

  id1 = data.pop(0)

  if id1 != 0:

    # rewind to the start of the table and walk it until an ID of 0 turns up
    fp.seek(tablepointer,0)
    while id1 != 0:

      data = array('i')
      data.fromfile(fp,2)

      if bo == 'big':
        data.byteswap()

      id1 = data.pop(0)
      itemoffset = data.pop(0)

      emptyarray(data)
      print "Item ID found is "+repr(id1)
      print "Offset of Item is " + hex(itemoffset)

      # two shorts follow; the first is taken as the size
      data = array('h')
      data.fromfile(fp,2)

      if bo == 'big':
        data.byteswap()

      sizedata = data.pop(0)
      print "Size of data there is " + repr(sizedata) + "B"

      print

def emptyarray(data):
  while len(data) > 0:
    data.pop()

if __name__ == '__main__':
  main()

==

This should run fine on OS X or Windows NT. Running it on a PST file should produce output something like this:

Valid PST signature
Datafile reports: 9453568 bytes
OS reports: 9453568L bytes They agree!
Pointer 1 is 0x2b2800
Pointer 2 is 0x785400

Table of items is at 0x5e5000
First ID is 4

Item ID found is 4
Offset of Item is 0xda6
Size of data there is 18944B

Item ID found is 308
Offset of Item is 0x1c1
Size of data there is 22016B

Item ID found is 556
Offset of Item is 0x1b9
Size of data there is 28160B

Item ID found is 844
Offset of Item is 0x1bd
Size of data there is 24064B

...

The data in these records appears to make sense, too: the addresses are all in the same general area, the item sizes are consistent, and there are no wacky ID numbers. The only problem I see is with the size of the data that should supposedly be at those addresses: the addresses are much too tightly packed to hold that amount of data! We’ll see how this pans out. When you run libpst in verbose debug mode, you see some errors saying that tables might be corrupted, but those tables seem to be pretty inconsequential in the long run, because libpst simply moves on and forgets about them.
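To put a number on that suspicion, here is a rough sketch that checks whether the byte ranges from the sample output overlap. It assumes the offset and the size are both plain byte counts in the same address space, which may well be wrong (the offset could be relative to something else). The function name `overlapping_pairs` is made up for this example.

```python
# (offset, size) pairs copied from the sample output above.
records = [
    (0xDA6, 18944),
    (0x1C1, 22016),
    (0x1B9, 28160),
    (0x1BD, 24064),
]

def overlapping_pairs(records):
    """Return pairs of records whose byte ranges [offset, offset+size) overlap."""
    ordered = sorted(records)
    clashes = []
    for i, (start, size) in enumerate(ordered):
        end = start + size
        # Any later record that starts before this one ends is a clash.
        for other in ordered[i + 1:]:
            if other[0] < end:
                clashes.append(((start, size), other))
    return clashes

clashes = overlapping_pairs(records)

# Side observation: is every reported size an exact multiple of 256?
sizes_multiple_of_256 = all(size % 256 == 0 for _, size in records)
```

With these four records every pair overlaps (six clashes in total), and every size is an exact multiple of 256. That second point might hint that the size field is being read a byte off from where it really starts, though that is only a guess.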

There are a few ways to check whether what you’re reading is accurate. The bytes that represent the PST’s file size are easy enough to check, and the PST signature in the header is also easy to validate. The initial pointers at least specify which ID you should find at their offsets, which I think will be a good way to validate that they’re correct.
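Those three checks could be bundled up roughly as follows. This is a sketch, not tested against a real PST: the helper names (`read_u32_le`, `check_header`, `first_id_matches`) are made up, the offset 0xA8 is simply the one the script above uses, and the values are read as unsigned where the script reads them as signed. The fake file at the end is a zero-filled stand-in so the snippet runs on its own.

```python
import io
import struct

PST_SIGNATURE = 0x4E444221  # "!BDN" read as a little-endian 32-bit integer
FILESIZE_OFFSET = 0xA8      # offset used by the script above

def read_u32_le(fp, offset):
    """Read one unsigned little-endian 32-bit integer at the given offset."""
    fp.seek(offset, 0)
    raw = fp.read(4)
    if len(raw) != 4:
        raise ValueError("file too short to read 4 bytes at offset %#x" % offset)
    return struct.unpack("<I", raw)[0]

def check_header(fp, actual_size):
    """Return (signature_ok, size_ok) for an open binary file object."""
    signature_ok = read_u32_le(fp, 0) == PST_SIGNATURE
    size_ok = read_u32_le(fp, FILESIZE_OFFSET) == actual_size
    return signature_ok, size_ok

def first_id_matches(fp, table_offset, expected_id):
    """Check that the ID stored at table_offset is the one we were told to expect."""
    return read_u32_le(fp, table_offset) == expected_id

# Hypothetical stand-in for a real PST: zero-filled except for three fields.
fake = bytearray(0x100)
fake[0:4] = b"!BDN"
fake[FILESIZE_OFFSET:FILESIZE_OFFSET + 4] = struct.pack("<I", len(fake))
fake[0x40:0x44] = struct.pack("<I", 4)  # pretend a table starts at 0x40 with ID 4

fake_fp = io.BytesIO(bytes(fake))
result = check_header(fake_fp, len(fake))
id_ok = first_id_matches(fake_fp, 0x40, 4)
```

On a real file you would pass the open file object and the size the OS reports, then call `first_id_matches` with the table offset and the ID that the pointer entry promised.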

I’m going to hack on this a little bit more, but if you have some ideas, shoot me a message!

Oh yeah, and big props to the libpst guys, who figured this all out long before I came along ;) Their documentation of the format is adequate and should be easy enough to follow. Their application (readpst) also produces some useful output if you enable debugging in the C source code. Check out their project on SourceForge.