HomeHabit is not refreshing when Daydream is active

igor · April 17, 2022, 5:17pm

ItemStateEvent processing was were from the beginning it seems, so I would be reluctant to exclude it, since I don’t remember what was the reasoning for it (Labs option might work).
Will look more through history of changes, to see if I can find some more info.

What do you mean by missing events?

Flole · April 17, 2022, 5:34pm

Oh I thought I stated it clearly what the current state is, but seems like I didnt, sorry about that. So Homehabit seems to be building up a “message queue” up to some point (that’s when I started to see the time item lagging behind), and then at some point it started skipping updates, so the time item was incremented in 2 second steps for example. “One-Time-Events” like turning on a light were often skipped/not represented properly in the UI, periodic events like the power consumption reporting from my meter was working (but probably just like the time item with skipped messages in between, just that it doesn’t really matter in this case). I first thought it might be an openHAB issue, but all events reach that apply function in HomeHabit, but then they get lost somewhere internally and never get represented in the UI. So I turn on a light, I see the json being passed to that apply function, but HomeHabit never refreshes the state (so it stays off).

I don’t think it’s any specific change that introduced this issue, probably just my system getting bigger and bigger with more events coming in and being forwarded to HomeHabit. If you want to reproduce this you can probably just add more and more items to your OH Installation and then at some point you will probably see the same behavior (it could even be that a huge load of messages breaks it so it’s skipping messages and then it never recovers from that), I think that makes more sense than looking through changes.

igor · April 17, 2022, 5:47pm

Got it, thanks for additional details.

On the history of changes, ItemStateEvent might have been needed at some point to workaround some missing change events, but not anymore. That was my thinking.
So far I couldn’t reproduce any performance degradation with 100 messages per second, we will see if increasing will have effect. That probably depends on the device and its load too, so might be hard to reproduce. But optimizations can still be done.

Flole · April 17, 2022, 6:04pm

I’m really curious now and will setup another openHAB test instance now to see what happens with 100 time items on my devices. I tried it also with my phone (OnePlus 3), it shows the exact same behaviour. Maybe the difference between my setup and your tests is that time items are refreshing all at the same time while my changes are more evenly spread. Shouldn’t take me long to start an openHAB instance with 100 time items, so soon we will know for sure if that triggers the issue for me or not.

Talking about optimizations: There is one call to getGenericParameterTypes that get’s called for the same method over and over again, returning the same results in my case and that is a quite expensive call aswell (about 15% of the total runtime of that thread). Caching the result in a HashMap or something like that would probably be another significant improvement. That’s the last obvious thing that I can see in the profiler, there are no other obvious calls to external functions that take a long time.

igor · April 17, 2022, 6:04pm

There is also a possibility that update is getting “lost” somewhere else after it is processed, like saving into database could be a bottleneck (some recent performance fixes were for similar issue).
When you see changes not reflecting in UI, like light switch, did you ever notice if you just switch to another dashboard and then back, was that state correct then? or you actually need to restart the app?

igor · April 17, 2022, 6:10pm

Yeah, that’s possible.
From my current measurements, it takes 100ms to process 150 messages (per second), so while it could be faster, it still far from getting backed up.

Do you know any approximate place it was called from?

Flole · April 17, 2022, 6:14pm

It’s called from the function with the string “Unable to promote access for private factory”, or “ctor can’t be null”.

igor · April 17, 2022, 6:18pm

Okay, that is not directly related OH integration then. It is in JSON parsing library, but OH integration is using an older parser. Probably, some other integrations (if you have any), or other parts of the app cause that.

Flole · April 17, 2022, 6:27pm

I don’t have any other integrations. It is being called with “com.habitautomated.shdp.value.AutoValue_Device” as Argument quite often, also multiple times per second (about 100 times per second). Is that new parser activated with the SHDP OH Integration option from the Labs section? I haven’t seen any mention what “SHDP OH Integration” is or does, it’s disabled in my case.

igor · April 17, 2022, 6:40pm

No, OH integration does not support new parser yet.
Btw, if you see that in Labs section, it means you have developer mode enabled, which will slow down the app in some places because of extra logging. It should not be enable-able through the normal UI.

Flole · April 17, 2022, 7:04pm

Just tried a brand-new OH installation and a brand new HomeHabit installation (so developer mode off), added 100 DateTime items with NTP to OH, 1 Dimmer and 1 Switch, Homehabit got 1 Value item to show the time (it’s updated properly), one slider for the Dimmer and one Switch for the Dimmer. If I turn on the Dimmer with the Switch it’s stuck in “Turning on…” and only after some time it is updated (along with the slider). I don’t see time lagging behind or skipping seconds though, just the Dimmer update getting missed or delayed. Just to confirm: You are not seeing the same behaviour? You are seeing instant updates, even when you use the Release Version I am using? Maybe you need to wait a little (like a minute or two), it seems like it’s fast immediately after a (re)connect and then get’s slower over time until it stops working completely.

When I put the App in the background and then in the foreground again the state is updated, I think the connection is re-established in that case?

igor · April 18, 2022, 1:24am

Managed to reproduce this behavior with artificially multiplying events coming from OH.
With 120 DateTime items updating every second and repeated 10 times each, so 1200 events per second, it started skipping some seconds. With 120 events x 50 = 6000 per second, it was skipping most of the time, and switches were loosing state updates.

Since you were saying that most of events coming from OH in your case are updates and not changes, I think there is a solution that would help. I believe event processing can handle higher volume, but each item update triggers other things, including database save and that is probably the issue here.
I will add a filter that would exclude events without changes to the state at the first possible opportunity. Let’s see if that will help.

Flole · April 18, 2022, 1:27am

You’re artificially building the statechanged option in OH now I’d probably be happy if that is used, comparing states seems to be additional overhead again, not sure if any other integration emits “useless” events.

igor · April 18, 2022, 1:33am

OH integration code already keeps cache of all current items, since events only have partial data, so this change is minimal, and hopefully helps in the short term.
Removing ItemStateEvent is still desired, but it would require more testing, so will be planned for a later release.

Flole · April 18, 2022, 1:39am

Awesome, sounds good.

Are you already preventing duplicate Updates? If 10 updates are “queued” only the last one needs to be processed, so all the other previous ones can be skipped in that case. I know that I have some pumps that report their RPMs multiple times per second, but what we really care about is the last one. I’m not sure what the best way to implement this would be so an event can be updated without losing its position in the queue (then it would “starve” and never be updated).

Flole · April 18, 2022, 11:07pm

I just found the json library you’re most likely using, have you tried the @CompiledJson annotation for the class I mentioned? It’s supposed to avoid reflection, and I think that would also avoid that expensive call (and probably accelerate conversion aswell).

igor · April 19, 2022, 2:16pm

The latest fix is available in 28.9 release.

No, all item updates are treated the same.

Yes, that annotation is used.

Flole · April 19, 2022, 2:54pm

I am a little surprised that there is not DslJsonConverter created for the AutoValueDevice, there are converters for Device, DeviceUpdate, Binding and so on, but not for the AutoValue* things that are extending some other classes (at least in 28.8, 28.9 isn’t here yet). That seems to be another bottleneck, and it seems to be for item saving in the database? Then optimizing this so the converter is created (I don’t really understand why it isn’t if there is an annotation in com.habitautomated.shdp.value.AutoValue_Device) would probably help increasing performance aswell.

TommySharp · April 20, 2022, 8:08am

I’ve got nothing worthwhile to add except I’m loving the collaboration depth of investigation here! Nice work!

Flole · April 26, 2022, 1:20pm

I just had a chance to try and look at the latest version. Again it is a huge improvement, CPU at around 40% now (so almost as good as my solution) and more importantly: The issue seems to be finally gone!

One thing I noticed though is a very high i/o activity (also on the older versions): Homehabit is writing almost constantly 6MB/s to the flash. That is not good at all, as it causes excessive wear for the flash. Is it really necessary to store the state of the items persistent across App-restarts? They will most likely be outdated anyways when the App is restarted, so I don’t see the point in doing that? I suggest either keeping the current state in-memory only or keep it in memory and only commit the states once every 5 minutes (or only when the App is exited). That would further reduce CPU usage and will ensure that the flash is not killed due to excessive writes (6MB/s is a lot! Even if some of that is caught by the Operating System’s caching this is not good at all).

As a test I moved the entire database-directory to a RAM-Disk and that reduced the memory I/O to basically 0. Of course I will lose any changes across reboots of my device but that’s not an issue for me at the moment. If the eMMC dies from this I would have much bigger issues.