Added a reader thread so that the thread calling nextTuple does not block. Blocking caused other problems: threads are reused between bolts and spouts, so the MetricAggregationBolts weren't always getting their tuples and evaluation wasn't happening.
The reader thread is coded to work with the shutdown sequence that runs when a bolt is moved, but that sequence doesn't seem to run on a plain shutdown.
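A minimal sketch of the reader-thread idea, not the actual spout code: a background thread performs the blocking read and hands messages to a queue, so nextTuple only does a non-blocking poll. The class name ReaderThreadSketch, the Supplier-based source, and the queue capacity are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Hypothetical stand-in for a spout; it does not depend on Storm.
class ReaderThreadSketch {
    // Bounded queue so a stalled consumer eventually applies back-pressure to the reader.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final Thread reader;
    private volatile boolean running = true;

    // blockingSource is an assumed stand-in for the blocking read from the message source.
    ReaderThreadSketch(Supplier<String> blockingSource) {
        reader = new Thread(() -> {
            while (running && !Thread.currentThread().isInterrupted()) {
                String message = blockingSource.get(); // may block; only this thread waits
                if (message == null) {
                    continue;
                }
                try {
                    queue.put(message);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // loop condition ends the thread
                }
            }
        }, "reader-thread-sketch");
        reader.setDaemon(true);
        reader.start();
    }

    // Never blocks: returns the next message, or null if none is ready.
    String nextTuple() {
        return queue.poll();
    }

    // Stops the reader thread, for example when the component is shut down or moved.
    void close() {
        running = false;
        reader.interrupt();
    }
}
```

Because nextTuple returns immediately, a thread shared with other components is free to go on delivering their tuples.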
Also fix a problem where the windows were not slid while the metrics were lagging. If metrics are lagging for several minutes, the windows could get so far out of date that valid metrics are discarded.
Use the new oldState and unchangedExpressions fields in the AlarmCreatedEvent to force the resend of unchanged sub alarms so the alarm will be re-evaluated.
Change the code that checks for "lagging" metrics to use this timestamp instead of the Metric timestamp. The lagging code deals with emptying the Kafka queue, and this timestamp is a much better measure of how backed up the Kafka queue is. The metric timestamp is set by the agent and is much more likely to be off.
Switch to mon-common build 48, which has the new timestamp. The MetricFilteringBolt will work correctly if the API is using an older version of MetricEnvelope without the timestamp; the lagging code just won't be invoked.
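The lag check described above could look roughly like the following sketch. The class name LagCheckSketch, the use of seconds as the unit, and representing a missing envelope timestamp as null are all assumptions; the real MetricFilteringBolt may differ.

```java
// Hypothetical helper; not the actual MetricFilteringBolt code.
class LagCheckSketch {
    private final long maxLagSeconds;

    LagCheckSketch(long maxLagSeconds) {
        this.maxLagSeconds = maxLagSeconds;
    }

    // envelopeTimestampSeconds is the timestamp carried by the envelope,
    // or null when an older envelope without the timestamp was received.
    boolean isLagging(Long envelopeTimestampSeconds, long nowSeconds) {
        if (envelopeTimestampSeconds == null) {
            return false; // no timestamp: the lag check is simply skipped
        }
        return nowSeconds - envelopeTimestampSeconds > maxLagSeconds;
    }
}
```

Passing the current time in as a parameter keeps the check easy to unit test.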
Change the tests to work with the new timestamp.
Had to back down to an older version of Scala; otherwise the Threshold Engine would fail to start with a java.lang.NoClassDefFoundError: scala/reflect/ClassManifest.
Needed to strip out the storm jar from the consolidated jar.
Needed to remove storm.yaml because Storm complains. Moved the registration of the Serializer to ThresholdingEngine.
Added the storm-core jar to the deb so it can be used for the local mode in mini-mon
Added logback.xml to the deb since storm-core.jar also has a logback.xml and that confuses logback. Ensure our logback.xml is used.
Added the storm-core.jar and logback.xml to the start of thresh in the deb
Had to rework how the AlarmEventForwarder is injected into the AlarmThresholdingBolt. The old way didn't work in a Storm cluster because the TopologyModule wasn't loaded on the worker when prepare was called.
Change so that all view slots must have metrics before the SubAlarm transitions to ALARM. Also, change SubAlarmStats so it doesn't transition the SubAlarm to UNDETERMINED on startup until there have been emptyWindowObservationThreshold calls to evaluate(). Previously only one call was required for new sub alarms and for sub alarms on restart. The Threshold Engine should behave the same on restart as it does after running for a long time.
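The two rules above can be sketched as follows. This is an illustration only, not the real SubAlarmStats: it assumes a greater-than threshold, represents an empty view slot as NaN, and takes the initial state as a constructor argument. Only emptyWindowObservationThreshold and evaluate come from the description; every other name is hypothetical.

```java
// Hypothetical sketch of the evaluation rules; not the actual SubAlarmStats.
class SubAlarmStatsSketch {
    enum State { OK, ALARM, UNDETERMINED }

    private final double threshold;
    private final int emptyWindowObservationThreshold;
    private int emptyWindowObservations = 0;
    private State state;

    SubAlarmStatsSketch(double threshold, int emptyWindowObservationThreshold, State initial) {
        this.threshold = threshold;
        this.emptyWindowObservationThreshold = emptyWindowObservationThreshold;
        this.state = initial;
    }

    // viewSlots holds one value per view slot; NaN marks a slot with no metrics.
    State evaluate(double[] viewSlots) {
        boolean anyValue = false;
        boolean anyEmpty = false;
        boolean allExceed = true;
        for (double v : viewSlots) {
            if (Double.isNaN(v)) {
                anyEmpty = true;
            } else {
                anyValue = true;
                if (v <= threshold) {
                    allExceed = false;
                }
            }
        }
        if (!anyValue) {
            // Go to UNDETERMINED only after enough consecutive empty evaluations.
            emptyWindowObservations++;
            if (emptyWindowObservations >= emptyWindowObservationThreshold) {
                state = State.UNDETERMINED;
            }
            return state;
        }
        emptyWindowObservations = 0;
        if (!allExceed) {
            state = State.OK;
        } else if (!anyEmpty) {
            state = State.ALARM; // every slot has a metric and every metric exceeds the threshold
        }
        // Otherwise some slots are still empty: keep the current state.
        return state;
    }
}
```

With a threshold of 3 empty observations, a freshly started sub alarm keeps its initial state for the first two empty evaluations, which is the same behavior a long-running engine would show.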
Added code to MetricFilteringBolt to check on the lag between current time and the timestamp in the metric. If too large, a message is sent to the MetricAggregationBolt to hold off on evaluating alarms.
Added unit tests.
This is to support the use cases of aggregation of Metrics across systems and to handle Metrics where an extra Dimension is added but we still want the old MetricDefinition to be matched.
Added new MetricDefinitionAndTenantIdMatcher class to do the matching. Added its associated test.
Added tests for these cases.
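One way to read the matching rule above is that a MetricDefinition matches a metric when the tenant id and name are equal and every dimension in the definition appears in the metric, so extra dimensions on the metric are ignored. The sketch below illustrates only that reading; the class DefinitionMatchSketch and its method signature are hypothetical and are not the MetricDefinitionAndTenantIdMatcher API.

```java
import java.util.Map;

// Hypothetical illustration of subset matching on dimensions.
class DefinitionMatchSketch {
    static boolean matches(String defTenantId, String defName, Map<String, String> defDimensions,
                           String metricTenantId, String metricName, Map<String, String> metricDimensions) {
        if (!defTenantId.equals(metricTenantId) || !defName.equals(metricName)) {
            return false;
        }
        // Every (key, value) pair in the definition must be present in the metric;
        // the metric is allowed to carry additional dimensions.
        return metricDimensions.entrySet().containsAll(defDimensions.entrySet());
    }
}
```

Under this reading, a definition with no dimensions matches every metric with the same tenant id and name, which is what allows aggregation across systems.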