Migrating to a new computer recently, I observed that my Outlook PST file (sitting in C:\Users\Amrinder\AppData\Local\Microsoft\Outlook folder) was about 4 GB. While not necessarily a problem, the folder size of Personal Folders itself was only showing up to be about 2.4 GB. (You can right click on Personal Folders -> Properties for Personal Folders -> Folder Size).
So, this begs the question, if Personal Folders is 2.4 GB, why is the PST file 4 GB? The answer to that lies in the structure of the PST file. Essentially, PST is a format that can be very easily loaded, very easily saved and very easily searched (gigabytes of information that we load when we start up Outlook, and is searchable in about 5 seconds). As the students and practitioners of data structures and algorithms will no doubt note, this requires that data files be indexed on many different columns, and then those index files stored, along with the data files. Further, whenever index files are created, they contain pointers back to data files themselves, and then when the data files are changed (for example, an email deleted, etc), data file cannot be simply modified without modifying those index files as well. So, that leads to two choices: every time you delete an email (or any element), you reprocess your index files. This would simply lead to a very non-responsive program. Other option is to let some of the changes go pending, and then run this maintenance once in a while, in which you shrink the data files, reclaim the deleted blocks and reindex the files.
Interestingly, you can run a small test for yourself. First, note the size of Personal Folders and the size of the PST file. Then, send a 10 MB file as an attachment to someone, and note your Folder size and the size of the PST file. Then, delete the email from Sent Items and check the folder size and the size of the PST file again. You will likely observe that while the size of the Personal Folders went from x to x+10 to x, the PST file only went from y to y+10+delta, and didn’t really go down to y after you deleted the email. So, read on.
This aspect of dirty records is not limited to Outlook only. Relational database management systems (such as Oracle) essentially do the same – they let delete records run without index recomputations, and then recompute the indexes (and shrink the data files) in a periodic fashion. In Oracle, you can trigger off this “data cleansing/recomputation” manually by using the analyze table queries. In Outlook too, you can do this by: Personal Folders -> Properties for Personal Folders -> Advanced -> Compact Now.
[Now, if you continue your test, you may observe that the PST file does go back after the file deletion and data compaction.]
One thing that I did observe, that even after doing the data compaction (really, there is such a verb), the file did not reach the same size as the Personal Folders was suggesting, but after shutting down Outlook and restarting it, and running data compaction one more time, it got to about 20% of the folder size.