Cleaning Out Old Data and Files
From SugarCRM Wiki
SugarCRM produces a lot of data. Once you start using it you quickly see your database increasing in size and your file system getting larger and larger. You'll probably notice this mostly as your backups start getting bigger. As a responsible systems administrator, it is your job to cut down on the flab: reducing the data in the file system will make your backups quicker and decrease the amount of storage space they take up; cleaning un-needed data out of the database will improve performance as well. What's not to like about that?
PLEASE HELP OUT BY ADDING YOUR KNOWLEDGE TO THIS PAGE. Especially if you have an understanding of the detailed database structure.
Contents |
File System Cleaning
There are a lot of files in the cache directory which are created when you upload files, when you perform upgrades, when you attach files to messages and when you install modules. Many of these can be removed. However the documentation doesn't really tell you which you can take and which you can't. In many cases the file name has been replaced by a uid() so it is hard to tell from a file listing as well. Please contribute your knowledge below.
| Directory | Description / Recommendation |
| /cache/diagnostic | Contains the output of diagnostic reports. These can be deleted unless you want to keep them. They don't take up much room anyway. |
| /cache/images | These seem to be (on my installation) images which were embedded in the emails I received in the group email box. Each image has been renamed with a uid. Assuming it is OK to delete all the files which have been named with a UID, the following command will remove all files beginning with eight hex characters, and a dash:
rm -rf [0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f]-* |
| /cache/upload | Lots of stuff in here. After 3 months of operation this folder alone had doubled the size of my sugarcrm installation. Seems like this is where most email attachments and documents end up. How do we decide which ones to delete and which to keep? |
| /cache/upload/upgrade | Contains several subdirectories. Seems like this is where temporary files needed for upgrades to modules etc are unzipped. Also, this directory contains backups of all the old (pre-upgrade) files. These are needed to uninstall modules and should not be deleted. |
Database Cleaning
The way SugarCRM handles deleted items is quite smart: when an item is deleted it remains in the database but is just marked as deleted. Thus a smart system administrator armed only with a few SQL commands can re-instate any records which have been deleted accidentally. But this in itself creates a problem: at some point you may want to throw some information away, based on age or other criteria. This is truer for some types of data than others. Emails for example, once deleted may never be required again. Share your techniques for removing unwanted information below, and comment or improve upon others' stragegies.
Built in Pruning Function
SugarCRM has a built in Pruning function. You can activate this as Admin through the Scheduler. Here's what the manual says:
| Prune Database on 1st of Month. This job reads all the tables in your Sugar Suite database, finds records that have been soft-deleted (deleted = 1), creates a large SQL file, and physically deletes those records on the 1st of every month. It is mostly a performance job, and is not essential to the use of Sugar Suite. The backup files are placed in cache/backups with time-stamped filenames. |
However, there are some problems with using this script.
- It will indiscriminately go through and delete all the items which have been marked as deleted of any age, from any table. There is no fine-grained control.
- There are threads in the forums which suggest that when this script deletes an item from a table, it won't go through all the other associated tables and remove references to that item. e.g if you delete a contact, the item will be gone from the contacts table, but not from the contacts_cstm table. Bug Report
- On large databases, there are issues with php script timeouts and memory. In this case you'll probably want to break the task down into smaller steps. Forum posting
So, if you need a bit more control over your installation you'll need some custom scripts.
Custom Scripts
Here are some user-contributed custom scripts which you can use to manually prune your database. These can of course be automated through the use of cron. As always, take backups of your database before use. Test them on a test installation first. Use at your own risk. Your mileage may vary. etc.
Emails Tables
The MySQL script below will remove all emails which have been marked as deleted and which are older than 30 days from the email table. It will then go through the related tables and remove any reference to emails which no longer appear in the emails table.
-- Remove all deleted emails older than 30 days. DELETE FROM `emails` WHERE deleted=1 AND date_modified <= (now() - INTERVAL 30 DAY); -- Delete references from associated tables DELETE FROM `emails_accounts` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_bugs` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_cases` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_contacts` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_leads` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_opportunities` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_projects` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_project_tasks` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_prospects` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_tasks` WHERE email_id NOT IN (SELECT id FROM `emails`); DELETE FROM `emails_users` WHERE email_id NOT IN (SELECT id FROM `emails`); -- Now need to compress the tables to remove the overhead, especially for emails table. OPTIMIZE TABLE emails, emails_accounts, emails_bugs, emails_cases, emails_contacts, emails_leads, emails_opportunities, emails_projects, emails_project_tasks, emails_prospects, emails_tasks, emails_users;
Accounts Tables
Please comment on the script below. This script (supposedly) removes all Accounts with no contacts, and then tidies up the accounts_contacts and accounts_cstm tables accordingly. This may not be appropriate to your installation
-- Delete all the Accounts / Companies which don't have any live Contacts underneath them. DELETE FROM `accounts` WHERE accounts.id NOT IN ( SELECT DISTINCT accounts_contacts.account_id FROM `accounts_contacts` LEFT JOIN `contacts` ON accounts_contacts.contact_id=contacts.id WHERE contacts.deleted=0 ); -- And remove all entries from Accounts.contacts table which reference deleted contacts. DELETE FROM `accounts_contacts` LEFT JOIN `contacts` ON accounts_contacts.contact_id=contacts.id WHERE accounts_contacts.contact_id NOT IN ( SELECT DISTINCT id FROM `contacts` WHERE contacts.deleted=0 ); -- Now clean up accounts_cstm DELETE FROM `accounts_cstm` WHERE id_c NOT IN (SELECT id FROM `accounts`);
Campaigns Tables
These can get ridiculously huge. Anyone understand the structure so they can contribute cleaning code?
Code:
Contacts Tables
The regular pruning only deletes contacts with deleted=1 set. How about removing related data from all other tables? Can anyone help with this?
-- Remove all deleted contacts older than 30 days. DELETE FROM `contacts` WHERE deleted=1 AND date_modified <= (now() - INTERVAL 30 DAY); -- Delete references from associated tables DELETE FROM `accounts_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `contacts_audit` WHERE parent_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `contacts_bugs` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `contacts_cases` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `contacts_users` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `contract_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `calls_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `meetings_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `opportunities_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `projects_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); DELETE FROM `quotes_contacts` WHERE contact_id NOT IN (SELECT id FROM `contacts`); TO BE CONTINUED, It misses to check parent_type = contact and parent_id not in select id from contact .. and probably more. -- Now need to compress the tables to remove the overhead, especially for emails table. OPTIMIZE TABLE contacts, accounts_contacts, contacts_audit, contacts_bugs, contacts_cases, contacts_users, contract_contacts, calls_contacts, meetings_contacts, opportunities_contacts, projects_contacts, quotes_contacts;
