Error "ETL service aggregation to hourly tables has encountered an error"

1. Problem

The following error appeared in the events of the management manager:

ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

What causes it, how critical is it, and how can it be fixed?

2. Diagnostics

Errors shown in the web interface:

ETL service sampling has encountered an error. Please consult the service log for more details.
ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

In the service log, the error looks like this:

2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_2|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_7|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_6|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_5|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_3|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
Exception in component tRunJob_5
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
        at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs$2.run(SampleRunJobs.java:2683)
2023-03-16 17:09:06|UmugSh|Q4jbCQ|CX4xDc|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_5|java.lang.RuntimeException:Child job running failed|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6196)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5938)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4573)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4448)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4317)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3722)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:3106)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2424)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1778)
        at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11524)
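
To see which jobs and components fail most often, the pipe-delimited lines of such a log can be counted. The following is a minimal sketch: the field positions (6 for the job name, 9 for the error type, 10 for the component) are inferred from the excerpt above and not from any documented format, and the function name summarize_dwh_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Count "Java Exception" entries per job and component.
# Field layout is an assumption inferred from the log excerpt:
#   $6 = job name, $9 = error type, $10 = component, 12 fields per entry.
summarize_dwh_errors() {
    # Reads log text on stdin. Stack-trace lines contain no "|" separators,
    # so they have fewer than 12 fields and are skipped.
    awk -F'|' 'NF >= 12 && $9 == "Java Exception" { count[$6 " " $10]++ }
               END { for (k in count) print count[k], k }' | LC_ALL=C sort -k2
}
```

It could be fed the service log on stdin, for example `summarize_dwh_errors < /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log`. That path is an assumption about a default installation, so adjust it to wherever the dwhd log is stored on your system.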

3. Solution

3.1. Solution 1

Before running the commands below, back up the management manager. Follow the backup instructions to create a full backup.

Commands:

/usr/bin/engine-backup --mode=backup --scope=dwhdb --file="/var/lib/ovirt-engine-backups/dwhdb-$(date +%Y%m%d%H%M%S).tar.bz2" --log=/var/log/dwhdb.log

su - postgres

psql -U postgres -d ovirt_engine_history

SELECT now();

SELECT * from history_configuration;

UPDATE history_configuration set var_datetime = date_trunc('hour', now())- interval '1 hour' WHERE var_name = 'lastHourAggr';

UPDATE history_configuration set var_datetime = cast(now() as date)- interval '1 day' WHERE var_name = 'lastDayAggr';

\q

exit

systemctl restart ovirt-engine-dwhd

systemctl status ovirt-engine-dwhd
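
The two UPDATE statements move lastHourAggr to the start of the previous hour and lastDayAggr to the start of the previous day. As an illustration only, the sketch below reproduces that arithmetic in the shell for a given timestamp. It assumes GNU date and works in UTC, whereas the database evaluates now() in its own session time zone; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce, in UTC, what the two UPDATE expressions evaluate to.
# Assumes GNU date (the -d option and the @epoch syntax).

# Mirrors: date_trunc('hour', ts) - interval '1 hour'
expected_last_hour_aggr() {
    local epoch
    epoch="$(date -u -d "$1" +%s)"
    # Drop minutes and seconds, then step back one hour.
    date -u -d "@$(( epoch - epoch % 3600 - 3600 ))" '+%Y-%m-%d %H:%M:%S'
}

# Mirrors: cast(ts as date) - interval '1 day'
expected_last_day_aggr() {
    local epoch
    epoch="$(date -u -d "$1" +%s)"
    # Drop the time of day, then step back one day.
    date -u -d "@$(( epoch - epoch % 86400 - 86400 ))" '+%Y-%m-%d %H:%M:%S'
}
```

For the timestamp from the log excerpt, `expected_last_hour_aggr '2023-03-16 17:09:06'` prints `2023-03-16 16:00:00` and `expected_last_day_aggr '2023-03-16 17:09:06'` prints `2023-03-15 00:00:00`. After the update, the stored values can be compared with `SELECT var_name, var_datetime FROM history_configuration WHERE var_name IN ('lastHourAggr', 'lastDayAggr');` in psql.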

3.2. Solution 2

engine-setup

This command reinstalls DWH, after which the problem should be resolved.

These messages come from the dwhd service, which collects monitoring information about hosts, VMs, and storage. A switch to or from daylight saving time in some time zones can trigger them. If the error message appears only once, it can be ignored.
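
To judge whether a daylight saving time switch is a plausible cause, one option is to compare the UTC offset of the server's time zone shortly before and after the error appeared. This is a rough sketch that assumes GNU date; the function name utc_offset_changed is hypothetical, and the zone name in the usage note is only an example.

```shell
#!/usr/bin/env bash
# Sketch: report whether the UTC offset of a time zone differs between two
# points in time, which indicates a DST switch (or another offset change)
# in between. Assumes GNU date for the -d option.
# Usage: utc_offset_changed TZ "earlier timestamp" "later timestamp"
utc_offset_changed() {
    local tz="$1" before after
    before="$(TZ="$tz" date -d "$2" +%z)"
    after="$(TZ="$tz" date -d "$3" +%z)"
    # Exit status 0 if the offsets differ, non-zero otherwise.
    [ "$before" != "$after" ]
}
```

For example, `utc_offset_changed Europe/Berlin '2 days ago' 'now' && echo "offset changed"` prints the message only if the offset of that zone changed during the last two days. Replace Europe/Berlin with the time zone configured on the manager host.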