I am looking for a way to minimize the footprint of our rehosted test and development sites. Currently, the 50 GB DB is exported from Production and imported into the test/dev sites. Then, on our Unix server, we use rsync to copy the application files and vault files (1500 GB) from the production disk to the test/dev disks. After the rehost is done, we run "remove unreferenced files" to tidy up, so that any "extra" files copied over from the production disk are deleted from the test/dev disks.
But we may only want to work with a 100 GB vault on the test/dev sites. How can we rehost only the latest year of data, for example?
I also thought of running rsync only on files last modified within the past year, but I'm not sure that is a good idea. Are there necessary files in the vaults that are older than 1 year, for example the workflow templates? Those might be stored in the DB instead, but that is just one example of the data I'm worried about missing if I limit the scope of rsync to the past year.
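To make the rsync idea concrete, here is a rough sketch of the kind of filter I have in mind. It only selects files by modification time and knows nothing about which files the DB still references, which is exactly my concern. The function names `recent_files` and `write_file_list`, the 365-day default, and the paths in the usage note are all made up for illustration.

```python
import os
import time
from pathlib import Path


def recent_files(root, days=365, now=None):
    """Return paths (relative to root) of regular files modified within `days` days.

    Selection is purely by mtime; it does not check DB references.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    root = Path(root)
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip broken symlinks and anything that is not a regular file.
            if path.is_file() and path.stat().st_mtime >= cutoff:
                selected.append(path.relative_to(root).as_posix())
    return sorted(selected)


def write_file_list(root, list_path, days=365):
    """Write one relative path per line, suitable for rsync --files-from."""
    paths = recent_files(root, days)
    with open(list_path, "w", encoding="utf-8") as fh:
        for p in paths:
            fh.write(p + "\n")
    return len(paths)
```

The resulting list could then be passed to rsync, for example `rsync -a --files-from=recent.txt /prod/vault/ /test/vault/` (hypothetical paths). Any file older than the cutoff that the DB still points to would be missing on the test/dev side.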