HomeHabit is not refreshing when Daydream is active

igor · April 7, 2022, 4:12pm

Just to clarify, large string question was not specific to values that update often, since an update can touch other things in the system that would trigger some checks. If there are no large values, it might something else then, that was just a guess.

Yeah, I wasn’t clear in the previous message. It is harder to address because it is a low level code that has large change impact, so I would need to have a reproducible flow before that can be changed/tested.
I will run more tests and see if there is some root cause for that.

Flole · April 7, 2022, 4:56pm

I am not aware of any large values, I have only (short) strings, numbers and dates. I do have lot’s of items though (1099 if I see that correctly) if that would explain a potential negative performance impact (I would assume there’s some kind of searching an item when a new state is received). But I would have seen that in the profiler if there would be some method taking a long time, and the ones that were looking particularly bad are the ones mentioned above, everything else looked pretty good. The equals-method that I saw in the profiler could be potentially part of searching the right item to update the state, but that’s a wild guess and I haven’t looked at where it’s called from, I’ll do that later on though so we can see if it’s part of the updating process and could explain a slowdown or if that’s for something else and unrelated.

igor · April 15, 2022, 2:22pm

@Flole The latest patch release (28.7) has the fix for getSimpleName() usage. Let me know if it makes a difference on your device.
There are also some other performance improvements, but those were related to processing large number items initially, not a large update volume, so might not benefit you specifically.

Flole · April 16, 2022, 7:14pm

@igor Thank you very much, the CPU usage is now much lower, only about 50% of what it was before. Unfortunately the issue is still there, so I tried another improvement: You are currently subscribing to openHAB’s like this /rest/events?topics=smarthome/items/*,openhab/items/*. That subscribes to ItemStateEvent and ItemStateChangedEvent and also GroupStateChangedEvent and you seem to be processing all of those. However, every ItemStateChangedEvent also comes with a ItemStateEvent (and GroupItemStateChangedEvents are a subset of ItemStateChangedEvents), so ItemStateChangedEvents are effectively transmitted twice now. As I think we are only interested in when something changed, what I did was modifying that subscription so it subscribes to rest/events?topics=smarthome/items/*/statechanged,openhab/items/*/statechanged, which again reduced the CPU usage significantly (again by more than 50%). This means basically I filtered out all the ItemStateEvents and now there are only ItemStateChangedEvents and GroupItemStateChangedEvents.While I was at 170% before all the improvements (so 1 core fully utilized, another one 60%), I am now down to 35%. Your changes alone brought it down to around 90%. So far this is working without any issues, so it might be worth looking into this as it cuts the load of things that need to be processed in half, and of course also network traffic is reduced significantly by this.

I haven’t tested if it comes with any side effects (like for example states being out of sync, but then the issue is somewhere else as a (Group|Item)StateEvent is only sent if the Group/Item received an update in openHAB, the difference to the ChangedEvent is just that those are sent even if there wasn’t an update), however that change seems like a good idea to me assuming that during startup the states are initially requested and updated (but only if they are uninitialized! Otherwise there is a risk of a race condition). That of course also assumes that on a reconnect all states are set back to uninitialized and then the current state is requested and applied to all items (assuming there wasn’t a statechanged-event that updated it before, meaning it’s still uninitialized).

Flole · April 17, 2022, 4:12pm

I’ve reverted my change for testing now and after a few minutes saw the effect. I added some additional logging to some apply method where the filtering is done (the String “Unsupported event type” is in there, that should help you finding what I am talking about) and there are 150 messages per second arriving there right now (I only looked at one of the cases in that apply method, so those are not duplicates)… That is because of all the StateEvents which openHAB is transmitting even though nothing was changed (some of my devices report their state multiple times per second, even if nothing was changed). Also the “missing” events are arriving there, unfortunately I don’t know at which place I could check next to figure out where they get lost, it’s getting a little hard to follow from this point, if you could give me a string from another function where I could check next I would probably be able to find it and see what arrives there. But for me the solution seems to be to disable the unnecessary notifications as that reduces the message load significantly (and still no visible side effects by the way). Maybe that’s something that could be added as a Lab-option if you are not confident about making this the default? Seems like a small change to select a different string for the subscription and maybe even disable some of the filtering logic, but that way there could be easier testing, also by others.

igor · April 17, 2022, 5:17pm

ItemStateEvent processing was were from the beginning it seems, so I would be reluctant to exclude it, since I don’t remember what was the reasoning for it (Labs option might work).
Will look more through history of changes, to see if I can find some more info.

What do you mean by missing events?

Flole · April 17, 2022, 5:34pm

Oh I thought I stated it clearly what the current state is, but seems like I didnt, sorry about that. So Homehabit seems to be building up a “message queue” up to some point (that’s when I started to see the time item lagging behind), and then at some point it started skipping updates, so the time item was incremented in 2 second steps for example. “One-Time-Events” like turning on a light were often skipped/not represented properly in the UI, periodic events like the power consumption reporting from my meter was working (but probably just like the time item with skipped messages in between, just that it doesn’t really matter in this case). I first thought it might be an openHAB issue, but all events reach that apply function in HomeHabit, but then they get lost somewhere internally and never get represented in the UI. So I turn on a light, I see the json being passed to that apply function, but HomeHabit never refreshes the state (so it stays off).

I don’t think it’s any specific change that introduced this issue, probably just my system getting bigger and bigger with more events coming in and being forwarded to HomeHabit. If you want to reproduce this you can probably just add more and more items to your OH Installation and then at some point you will probably see the same behavior (it could even be that a huge load of messages breaks it so it’s skipping messages and then it never recovers from that), I think that makes more sense than looking through changes.

igor · April 17, 2022, 5:47pm

Got it, thanks for additional details.

On the history of changes, ItemStateEvent might have been needed at some point to workaround some missing change events, but not anymore. That was my thinking.
So far I couldn’t reproduce any performance degradation with 100 messages per second, we will see if increasing will have effect. That probably depends on the device and its load too, so might be hard to reproduce. But optimizations can still be done.

Flole · April 17, 2022, 6:04pm

I’m really curious now and will setup another openHAB test instance now to see what happens with 100 time items on my devices. I tried it also with my phone (OnePlus 3), it shows the exact same behaviour. Maybe the difference between my setup and your tests is that time items are refreshing all at the same time while my changes are more evenly spread. Shouldn’t take me long to start an openHAB instance with 100 time items, so soon we will know for sure if that triggers the issue for me or not.

Talking about optimizations: There is one call to getGenericParameterTypes that get’s called for the same method over and over again, returning the same results in my case and that is a quite expensive call aswell (about 15% of the total runtime of that thread). Caching the result in a HashMap or something like that would probably be another significant improvement. That’s the last obvious thing that I can see in the profiler, there are no other obvious calls to external functions that take a long time.

igor · April 17, 2022, 6:04pm

There is also a possibility that update is getting “lost” somewhere else after it is processed, like saving into database could be a bottleneck (some recent performance fixes were for similar issue).
When you see changes not reflecting in UI, like light switch, did you ever notice if you just switch to another dashboard and then back, was that state correct then? or you actually need to restart the app?

igor · April 17, 2022, 6:10pm

Yeah, that’s possible.
From my current measurements, it takes 100ms to process 150 messages (per second), so while it could be faster, it still far from getting backed up.

Do you know any approximate place it was called from?

Flole · April 17, 2022, 6:14pm

It’s called from the function with the string “Unable to promote access for private factory”, or “ctor can’t be null”.

igor · April 17, 2022, 6:18pm

Okay, that is not directly related OH integration then. It is in JSON parsing library, but OH integration is using an older parser. Probably, some other integrations (if you have any), or other parts of the app cause that.

Flole · April 17, 2022, 6:27pm

I don’t have any other integrations. It is being called with “com.habitautomated.shdp.value.AutoValue_Device” as Argument quite often, also multiple times per second (about 100 times per second). Is that new parser activated with the SHDP OH Integration option from the Labs section? I haven’t seen any mention what “SHDP OH Integration” is or does, it’s disabled in my case.

igor · April 17, 2022, 6:40pm

No, OH integration does not support new parser yet.
Btw, if you see that in Labs section, it means you have developer mode enabled, which will slow down the app in some places because of extra logging. It should not be enable-able through the normal UI.

Flole · April 17, 2022, 7:04pm

Just tried a brand-new OH installation and a brand new HomeHabit installation (so developer mode off), added 100 DateTime items with NTP to OH, 1 Dimmer and 1 Switch, Homehabit got 1 Value item to show the time (it’s updated properly), one slider for the Dimmer and one Switch for the Dimmer. If I turn on the Dimmer with the Switch it’s stuck in “Turning on…” and only after some time it is updated (along with the slider). I don’t see time lagging behind or skipping seconds though, just the Dimmer update getting missed or delayed. Just to confirm: You are not seeing the same behaviour? You are seeing instant updates, even when you use the Release Version I am using? Maybe you need to wait a little (like a minute or two), it seems like it’s fast immediately after a (re)connect and then get’s slower over time until it stops working completely.

When I put the App in the background and then in the foreground again the state is updated, I think the connection is re-established in that case?

igor · April 18, 2022, 1:24am

Managed to reproduce this behavior with artificially multiplying events coming from OH.
With 120 DateTime items updating every second and repeated 10 times each, so 1200 events per second, it started skipping some seconds. With 120 events x 50 = 6000 per second, it was skipping most of the time, and switches were loosing state updates.

Since you were saying that most of events coming from OH in your case are updates and not changes, I think there is a solution that would help. I believe event processing can handle higher volume, but each item update triggers other things, including database save and that is probably the issue here.
I will add a filter that would exclude events without changes to the state at the first possible opportunity. Let’s see if that will help.

Flole · April 18, 2022, 1:27am

You’re artificially building the statechanged option in OH now I’d probably be happy if that is used, comparing states seems to be additional overhead again, not sure if any other integration emits “useless” events.

igor · April 18, 2022, 1:33am

OH integration code already keeps cache of all current items, since events only have partial data, so this change is minimal, and hopefully helps in the short term.
Removing ItemStateEvent is still desired, but it would require more testing, so will be planned for a later release.

Flole · April 18, 2022, 1:39am

Awesome, sounds good.

Are you already preventing duplicate Updates? If 10 updates are “queued” only the last one needs to be processed, so all the other previous ones can be skipped in that case. I know that I have some pumps that report their RPMs multiple times per second, but what we really care about is the last one. I’m not sure what the best way to implement this would be so an event can be updated without losing its position in the queue (then it would “starve” and never be updated).