When Auditing entries or AttributeService entries are being deleted, Alfresco does not actually delete all of the associated data. The structures of thealf_prop_* tables are designed to heavily reuse individual textual or numerical data elements, much to the point that cascade deletion upon removal of audit entries or attributes is no longer possible as the same values could be referenced in other elements.
Since Alfresco 4.1.9, 4.2.2 (Enterprise) and 5.0 (Community), Alfresco has included a default job to clean up dangling data inalf_prop_* tables. This CRON job is disabled by default and must be re-configured to be able to run.
You can schedule or manually trigger thepropTablesCleanupTrigger script to clean up audit and property values tables (alf_audit_ andalf_prop_ tables).
If you plan to run a scheduled cleanup job for a large database (i.e. table sizes greater than 10 million rows), use the V2 cleanup job. Note that this is the default algorithm used in Content Services 6.2.1 onwards. SeeAlternative method for scheduling cleanup of database tables for more.
propTablesCleanupTrigger is aMonitoredCronTrigger script, and a schedule to run this script can be set in thealfresco-global.properties file or by using JMX.
To run this script on a schedule, add the following line to thealfresco-global.properties file:
attributes.propcleaner.cronExpression=* * * * * ? 2099The default setting of this property is a Quartz Cron expression,0 0 3 ? * SAT which runs every Saturday at 3 am. Alfresco recommends that you do not change the default value unless required.
Here is an example of the output you should expect to see in the debug information:
2020-04-14 13:22:17,493 INFO [schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-3] Processing from 0 to 100000 rows of 2 rows from table alf_audit_app.2020-04-14 13:22:17,493 INFO [schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-3] Processing from 0 to 1 rows of 0 rows from table alf_audit_entry.2020-04-14 13:22:17,509 INFO [schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-3] Processing from 0 to 100000 rows of 1 rows from table alf_prop_unique_ctx.2020-04-14 13:22:17,524 INFO [schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-3] Processing from 0 to 100000 rows of 3 rows from table alf_prop_root.The default batch size is 100000 rows.
CAUTION: The
propTablesCleanupTriggerscript is designed for occasional, scheduled cleanups of the database tables. Ensure that you run this script during periods when there is minimal or no load on the server. If there is load on the server whilepropTablesCleanupTriggerscript is running, you might experience database conflicts and related errors.
You can use an alternative algorithm for the cleanup of database tables. This is recommended for large databases (i.e. table sizes greater than 10 million rows). Note that this is the default algorithm used in Content Services 6.2.1 onwards.
Below are the default settings for the V1 cleanup job:
system.prop_table_cleaner.algorithm=V1# Prop cleaner V2 propertiessystem.delete_not_exists.batchsize=100000system.delete_not_exists.delete_batchsize=1000system.delete_not_exists.read_only=falsesystem.delete_not_exists.timeout_seconds=-1If you’re running an earlier version than Content Services 6.2.1, then follow the steps in the remainder of this page to use the V2 algorithm.
Change the prop table cleaner default properties inalfresco-global.properties:
Enable the new prop cleaner algorithm:
system.prop_table_cleaner.algorithm=V2Set the prop cleaner to read only:
system.delete_not_exists.read_only=trueSet the prop cleaner timeout:
# This will stop the next batch from being processed if the elapsed time from the job start is greater than 1 hour system.delete_not_exists.timeout_seconds=3600Restart the application server.
Open a web browser.
Go to<alfresco_ip>/alfresco/s/enterprise/admin/admin-log-settings.
Enable debug logging to see the amount of data that would be deleted by the algorithm. Change the log setting toDEBUG for package name:
org.alfresco.repo.domain.schema.script.DeleteNotExistsExecutorGo to<alfresco_ip>/alfresco/s/enterprise/admin/admin-scheduledjobs in a web browser, and execute thepropTablesCleanupJobDetail job.
Check the logs to see which values would be deleted from each table:
alf_prop_root alf_prop_value alf_prop_string_value alf_prop_serializable_value alf_prop_double_valueYou should see something like:
[org.alfresco.repo.domain.schema.script.DeleteNotExistsExecutor] [DefaultScheduler_Worker-10] Script would have deleted a total of X items from table alf_prop_root.Change the prop table cleaner default properties inalfresco-global.properties:
Enable the new prop cleaner algorithm:
system.prop_table_cleaner.algorithm=V2Set the prop cleaner to delete:
system.delete_not_exists.read_only=falseSet the prop cleaner timeout:
# This will stop the next batch from being processed if the elapsed time from the job start is greater than 1 hour system.delete_not_exists.timeout_seconds=3600Restart the application server.
Open a web browser.
Go to<alfresco_ip>/alfresco/s/enterprise/admin/admin-log-settings.
Enable debug logging to see the amount of data that was deleted by the algorithm:
Change the log setting toDEBUG for package nameorg.alfresco.repo.domain.schema.script.DeleteNotExistsExecutor.
CAUTION: The algorithm could delete more than the initial value due to cascading deletes being enabled in some tables.
Open a web browser, go to<alfresco_ip>/alfresco/s/enterprise/admin/admin-scheduledjobs, and execute thepropTablesCleanupJobDetail job.
Check the logs to see which values were deleted from each table.
alf_prop_root alf_prop_value alf_prop_string_value alf_prop_serializable_value alf_prop_double_valueYou should see something like:
[org.alfresco.repo.domain.schema.script.DeleteNotExistsExecutor] [DefaultScheduler_Worker-10] Script deleted a total of X items from table alf_prop_root.Noteworthy:
Since the server(s) and database could be configured in a specific way for each customer, we recommend that you test the setting for the timeout property. This should be changed according to the capabilities of the system or its load.
# This will stop the next batch from being processed if the elapsed time from the job start is greater than 1 hour system.delete_not_exists.timeout_seconds=3600The algorithm can be further improved by changing the batch values according to the capabilities of the system:
# Items processed in one prepared statement system.delete_not_exists.batchsize=100000 # Items deleted in one prepared delete statement system.delete_not_exists.delete_batchsize=1000CAUTION: The algorithm could delete more than the initial value stated when running in read-only mode due to cascading deletes being enabled in some tables.
For more detailed logging, change the log setting to TRACE for package nameorg.alfresco.repo.domain.schema.script.DeleteNotExistsExecutor.
Important:This could generate a lot of logging, so revert it to its default logging level when not needed.