Error "ETL service aggregation to hourly tables has encountered an error"
1. Problem
The following error appeared in the management engine events:
ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
What causes it, how critical is it, and how can it be fixed?
2. Diagnostics
Errors in the web interface:
ETL service sampling has encountered an error. Please consult the service log for more details.
ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
In the logs, the error looks like this:
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_2|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_7|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_6|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_5|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2023-03-16 17:09:06|MhcU4N|Q4jbCQ|UmugSh|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_3|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
Exception in component tRunJob_5
java.lang.RuntimeException: Child job running failed
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
at ovirt_engine_dwh.samplerunjobs_4_4.SampleRunJobs$2.run(SampleRunJobs.java:2683)
2023-03-16 17:09:06|UmugSh|Q4jbCQ|CX4xDc|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_5|java.lang.RuntimeException:Child job running failed|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6196)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5938)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4573)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4448)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4317)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3722)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:3106)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2424)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1778)
at ovirt_engine_dwh.sampletimekeepingjob_4_4.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11524)
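The `current transaction is aborted` lines are follow-on errors: PostgreSQL reports them for every statement issued after an earlier statement in the same transaction has already failed, so the root cause is usually an earlier error in the log. As a rough sketch, the helper below counts `Java Exception` entries per job and component, assuming the pipe-separated layout shown above. The function name `summarize_dwh_exceptions` and the default log path are assumptions; adjust the path to your installation.

```shell
#!/bin/sh
# Count "Java Exception" entries per job/component in a DWH log.
# Assumes the pipe-separated layout from the sample above:
# field 6 = job, field 9 = "Java Exception", field 10 = component.
summarize_dwh_exceptions() {
    awk -F'|' '
        $9 == "Java Exception" { count[$6 "/" $10]++ }
        END { for (key in count) print count[key], key }
    ' "$1" | sort -k2
}

# Assumed log location; override with DWH_LOG if yours differs.
DWH_LOG="${DWH_LOG:-/var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log}"
if [ -r "$DWH_LOG" ]; then
    summarize_dwh_exceptions "$DWH_LOG"
fi
```

A component that appears only once, and earlier than the others, is a reasonable place to start looking for the original failure.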
3. Solution
3.1. Solution 1
Before running the following commands, back up the management engine. Follow the instructions for creating a full backup.
Commands:
/usr/bin/engine-backup --mode=backup --scope=dwhdb --file="/var/lib/ovirt-engine-backups/dwhdb-$(date +%Y%m%d%H%M%S).tar.bz2" --log=/var/log/dwhdb.log
su - postgres
psql -U postgres -d ovirt_engine_history
SELECT now();
SELECT * from history_configuration;
UPDATE history_configuration set var_datetime = date_trunc('hour', now())- interval '1 hour' WHERE var_name = 'lastHourAggr';
UPDATE history_configuration set var_datetime = cast(now() as date)- interval '1 day' WHERE var_name = 'lastDayAggr';
\q
exit
systemctl restart ovirt-engine-dwhd
systemctl status ovirt-engine-dwhd
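The interactive steps above can also be run non-interactively. The sketch below only prints the SQL, wrapped in a single transaction so that both markers are updated together or not at all. The function name `build_aggr_reset_sql` is hypothetical; the UPDATE statements are the ones listed above, and the trailing SELECT is added here only to show the new values.

```shell
#!/bin/sh
# Print the SQL that resets the hourly and daily aggregation markers.
# Same UPDATE statements as above, wrapped in one transaction.
build_aggr_reset_sql() {
    cat <<'SQL'
BEGIN;
UPDATE history_configuration SET var_datetime = date_trunc('hour', now()) - interval '1 hour' WHERE var_name = 'lastHourAggr';
UPDATE history_configuration SET var_datetime = cast(now() AS date) - interval '1 day' WHERE var_name = 'lastDayAggr';
SELECT var_name, var_datetime FROM history_configuration WHERE var_name IN ('lastHourAggr', 'lastDayAggr');
COMMIT;
SQL
}

# Review the statements before applying them.
build_aggr_reset_sql
```

To apply it as root, the output could be piped to psql, for example `build_aggr_reset_sql | su - postgres -c 'psql -v ON_ERROR_STOP=1 -d ovirt_engine_history'`, followed by the same restart of ovirt-engine-dwhd.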
3.2. Solution 2
engine-setup
This command reinstalls DWH; after that, the problem will be resolved.
These messages come from the dwhd service, which collects monitoring information about hosts, VMs, and storage. A switch to or from daylight saving time in different time zones can trigger them. If the error message appears only once, it can be ignored.
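To check whether a clock change could be involved, one option is to compare the UTC offset of the engine's time zone shortly before the error and at the time of the error. A minimal sketch, assuming a `date` implementation that accepts `-d` (GNU coreutils does); the function name `utc_offset_changed` and the example zone and timestamps are illustrative only.

```shell
#!/bin/sh
# Succeed (exit 0) if the UTC offset of time zone $1 differs
# between the timestamps $2 and $3, i.e. a clock change occurred.
# Returns 2 if date cannot parse a timestamp.
utc_offset_changed() {
    before=$(TZ="$1" date -d "$2" +%z) || return 2
    after=$(TZ="$1" date -d "$3" +%z) || return 2
    [ "$before" != "$after" ]
}

# Illustrative call: did Europe/Berlin change offset around 2023-03-26?
if utc_offset_changed "Europe/Berlin" "2023-03-25 12:00" "2023-03-26 12:00"; then
    echo "UTC offset changed in this interval"
fi
```

If the offset did change around the time of a single error event, that supports treating the message as a one-off.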