Number of threads to use for host health evaluation: [health].health threads (default: number of CPUs * 2)
Number of threads to use for host health initialization (default: 25% of health threads)
Number of threads to use for host health maintenance, i.e. cleanup of host alert transitions (default: 25% of health threads)
Number of threads to use for host health cleanup on child disconnection (default: 25% of health threads)
Migrate health_log, health_log_detail and alert_hash tables to netdata-health.db and delete them from netdata-meta.db
Migrate aclk_queue, alert_queue and alert_version tables to netdata-aclk.db and delete them from netdata-meta.db
Switch alert transition store from metasync to health event loop
Alert config cleanup during child disconnection offloaded to health event loop
Schedule periodic host alert transition cleanup job
Add workers for extended statistics
Deduplicate info/summary fields in transitions
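
The thread defaults listed in this description can be illustrated with a minimal sketch. It is not the code from this PR: the struct, the function names, the integer rounding, the minimum of one thread per job, and the use of 0 to mean "option not set" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the four pool sizes named in the PR description. */
struct health_thread_counts {
    size_t evaluation;   /* host health evaluation */
    size_t init;         /* host health initialization */
    size_t maintenance;  /* cleanup of host alert transitions */
    size_t cleanup;      /* cleanup on child disconnection */
};

/* 25% of n using integer division, clamped to at least 1.
 * Both the rounding and the clamp are assumptions of this sketch. */
static size_t quarter_at_least_one(size_t n)
{
    size_t q = n / 4;
    return q > 0 ? q : 1;
}

/* configured == 0 stands for "option not set" (an assumption of this sketch):
 * fall back to the documented default of number of CPUs * 2. */
struct health_thread_counts health_thread_defaults(size_t cpus, size_t configured)
{
    struct health_thread_counts c;

    if (cpus == 0)
        cpus = 1; /* guard against a failed CPU count lookup */

    c.evaluation  = configured > 0 ? configured : cpus * 2;
    c.init        = quarter_at_least_one(c.evaluation);
    c.maintenance = quarter_at_least_one(c.evaluation);
    c.cleanup     = quarter_at_least_one(c.evaluation);

    assert(c.evaluation > 0);
    return c;
}
```

Under these assumptions, a machine with 4 CPUs and no explicit setting gets 8 evaluation threads and 2 threads for each of the initialization, maintenance and cleanup jobs.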

[TBA]

Test Plan

[TBA]

github-actions bot added area/health area/daemon area/database labels

Feb 10, 2025

stelfrag force-pushed the health_evloop branch from 5e8d9f8 to fa79ce7

February 17, 2025 07:48

github-actions bot added the area/streaming label

Feb 17, 2025

stelfrag force-pushed the health_evloop branch 2 times, most recently from 6493d55 to fd57440

February 18, 2025 14:08

github-actions bot added the area/aclk label

Feb 18, 2025

stelfrag force-pushed the health_evloop branch 6 times, most recently from 8eedfbc to 163eb41

February 26, 2025 08:45

github-actions bot added the area/build label

Feb 26, 2025

stelfrag force-pushed the health_evloop branch 6 times, most recently from 4edcc4d to 352e44d

February 27, 2025 12:09

thiagoftsm reviewed

Feb 27, 2025

src/database/sqlite/sqlite_aclk.c

		const char *database_aclk_config[] = {

		"CREATE TABLE IF NOT EXISTS alert_queue "
		" (host_id BLOB, health_log_id INT, unique_id INT, alarm_id INT, status INT, date_scheduled INT, "

thiagoftsm (Contributor) commented Feb 27, 2025

If status and date_scheduled cannot be null, please set them explicitly to NOT NULL.

thiagoftsm reviewed

Feb 27, 2025

src/database/sqlite/sqlite_aclk.c (resolved)

thiagoftsm reviewed

Feb 27, 2025

src/database/sqlite/sqlite_db_migration.c

		"red text, warn text, crit text, exec text, to_key text, info text, delay text, options text, "
		"repeat text, host_labels text, p_db_lookup_dimensions text, p_db_lookup_method text, p_db_lookup_options int, "
		"p_db_lookup_after int, p_db_lookup_before int, p_update_every int, source text, chart_labels text, "
		"summary text, time_group_condition INT, time_group_value DOUBLE, dims_group INT, data_source INT)",

thiagoftsm (Contributor) commented Feb 27, 2025

What I wrote above about the nullable fields also applies to each table here. Let us make the engine enforce the check, so that we can be sure we will not lose data.

thiagoftsm (Contributor) commented Feb 27, 2025 (edited)

If every is our update_every or another frequency, ideally it should be an integer.

thiagoftsm reviewed

Feb 27, 2025

src/database/sqlite/sqlite_functions.c (resolved)

thiagoftsm reviewed

Feb 27, 2025

src/database/sqlite/sqlite_health.c (outdated, resolved)

thiagoftsm reviewed

Feb 27, 2025

src/health/health_event_loop.c (outdated, resolved)

thiagoftsm reviewed

Feb 27, 2025

src/health/health_event_loop.c (outdated, resolved)

thiagoftsm reviewed

Feb 27, 2025

src/health/health_event_loop.c (outdated, resolved)

thiagoftsm reviewed

Feb 27, 2025

src/health/health_event_loop.c

		static void timer_cb(uv_timer_t *handle)
		{
		uv_stop(handle->loop);
		uv_update_time(handle->loop);

thiagoftsm (Contributor) commented Feb 27, 2025

I am leaving this comment here so that we do not forget: if the next commits do not use the commented-out code, please remove it.

stelfrag force-pushed the health_evloop branch from 352e44d to 83a4f48

February 28, 2025 10:31

ktsaou (Member) commented Mar 1, 2025

I updated to the current master, to have better crash detection.

stelfrag force-pushed the health_evloop branch 4 times, most recently from c8e306b to f715a21

March 4, 2025 14:17

thiagoftsm self-requested a review

March 6, 2025 02:31

stelfrag force-pushed the health_evloop branch 4 times, most recently from c1ea9d7 to b99ba95

March 12, 2025 15:30

stelfrag force-pushed the health_evloop branch from b99ba95 to b08e783

March 17, 2025 09:11

stelfrag force-pushed the health_evloop branch from b08e783 to f5d807b

April 3, 2025 12:33

stelfrag added 2 commits

April 9, 2025 18:09

Rebase

2a841b5

Cache info/summary ids
Fix table
Deduplicate info/summary from alert transitions
Add job_running flag and cleanup handling to health event loop
Queue store sql statements in metadata thread
Remove unnecessary health check condition from event loop
Add new job names for alert host snapshot and process events
Prepare ACLK OP to handle alert processing per host
Remove transactions for now
Fix compile error
Revert serialization test
Add opcodes for Health Pause/Resume
Serialize alert transition processing / wrap in transaction (test)
Set fixed timer intervals for health event loop
Accurate scheduling health run after maintenance cannot be done yet
Remove unused thread-local variable for health thread and simplify SQL statement preparation
Refactor job submission logic to improve worker handling
Simplify job execution handling
Maintenance task should try to keep the health run schedule
Implement worker data pool to avoid memory allocations
Fix delay calculation in health_host_register function
Fix wrong memory allocations
Refactor job management to use dynamic worker data allocation
Cleanup workers
Refactor health_host_initialize to remove delay parameter
Remove VACUUM command from database health cleanup
If cleanup is running return
Refactor health job scheduling
Prevent scheduling job if we reached the limit
Configure init, maint and cleanup health thread count based on the configured health threads
Schedule a host health init when we one is completed
Fix maintenance rescheduling
Add worker jobs
Code cleanup
Add health thread config
Run maintenance
Rebase and code cleanup
Transient alert entries are queued for deletion at the end of the evaluation loop (making sure they are processed and saved)
Host unregister with cleanup
New job for host rrdcalc cleanup during unregister
stream receiver will schedule a host unregister and rrdcalc cleanup now
Configure max threads per job (needs improved config)
Reschedule alert evaluation if rrdcalc cleanup is pending
Collect all alert transitions and save in a batch
Remove alert transition store from metadata event loop
Add host heath maintenance job (not used yet)
cleanup_health_log removed from metadata thread
Migrate health tables and aclk tables to new databases. Drop old tables.
Use attach_database
Store alert transitions to the new databases (includes queues and version book keeping for the cloud)
Add sql_init_databases to configure all databases during startup
Do not create aclk and health tables in netdata-meta.db
New netdata-health database for alert transitions
New netdata-aclk database to store transitions that need to go to the cloud
Fix compilation with internal checks
Remove static thread
Init and shutdown health
Register/Unregister hosts for health
Handle resume from suspension / health delay of child reconnect
Remove static health thread
New health event loop

Fix compilation

edaa9ee

stelfrag force-pushed the health_evloop branch from f5d807b to edaa9ee

April 9, 2025 15:13

stelfrag added 2 commits

April 9, 2025 18:50

Fix compilation

d02c8fd

Handle invalid ids

a2e167e

Labels

area/aclk area/build

Build system (autotools and cmake).

area/daemon area/database area/health area/streaming

3 participants

Rework health event loop #19612

stelfrag commented Feb 10, 2025 (edited)