HomeHabit is not refreshing when Daydream is active

The delay only happened when the screensaver had been active for a longer time. I’ve switched my device to Ethernet now to rule out a WiFi connection issue, but that didn’t help.

I noticed that with openHAB as the backend, when trying to switch on a dimmer it stays at “Turning on” for a long time and then eventually switches back to “Off”, even though the light is on. Only after restarting HomeHabit does it pick up the “On” status after a few seconds.

The constraint errors were gone after a reboot; unfortunately I forgot to actually submit my post until I had another look at it today.

One thing I noticed today as well: in the connection to openHAB you seem to be subscribing to something like /rest/events?topics=openhab/items, while all you really seem to be interested in is something like /rest/events?topics=openhab/items/*/statechanged. In my case many events per second are pushed to HomeHabit this way (even when nothing changed, just because some device told openHAB that the state is still the same). If you only need the state changes, it is sufficient to subscribe to those. I suggest having one connection just for the statechanged events / dashboard refreshing, and in case you need other events as well (for example for automatically detecting new items), a separate connection for those. That would lower the CPU and network usage of HomeHabit quite a bit in my case, I would guess.
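
To illustrate the difference, here is a minimal standalone sketch (Java 11+, not HomeHabit’s actual client code) that reads only the filtered statechanged stream; the hostname openhab:8080 is just a placeholder:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class StateChangedStream {
        public static void main(String[] args) throws Exception {
            // Subscribe only to actual state *changes*; the broader topic
            // "openhab/items" would also deliver every ItemStateEvent.
            String url = "http://openhab:8080/rest/events?topics=openhab/items/*/statechanged";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Accept", "text/event-stream")
                    .GET()
                    .build();

            // SSE payload lines start with "data: "; print just those.
            client.send(request, HttpResponse.BodyHandlers.ofLines())
                    .body()
                    .filter(line -> line.startsWith("data: "))
                    .forEach(System.out::println);
        }
    }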

Adding to this (it’s apparently completely unrelated to anything): right now on my phone, when I launch HomeHabit I can see the values of a donut widget updating. Then I try to turn on a light; it switches on immediately, but it takes a long time for the status to update. I tried to enable debug mode for the openHAB instance but didn’t see anything additional in the logs afterwards.

I think I finally know what is wrong now: HomeHabit just can’t keep up with the 20 messages per second it gets. When I change a dimmer in several steps, the light reacts immediately, but the slider stays in its position; only after some time does it start to move, reflecting my changes from before.

Additionally, I added a value item and bound it to a clock in OH that posts a change every second; there’s about a minute of delay. So the code that parses and posts the updates needs to be improved, it is not fast enough. My device has 4 CPUs and about 50% of them are idle, so maybe splitting things up would be an option, so that the ItemStateEvent has its own connection and thread. Or split things up even further and have 3 connections for ItemStateChangedEvent, ItemStateEvent and GroupItemStateChangedEvent; then you could get rid of the filtering/separation completely and also use the idle CPUs more effectively.
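
To sketch that split-connection idea (just an illustration, not HomeHabit’s code): one subscription per topic filter, each on its own thread. readEventStream() is a hypothetical stand-in for whatever actually consumes a single /rest/events connection, and the exact per-event-type topic strings are assumptions that would need to be checked against the openHAB documentation.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PerTopicSubscriptions {

        // Hypothetical helper: opens one /rest/events?topics=... connection
        // and dispatches every event it receives.
        static void readEventStream(String topicFilter) {
            // ... open the SSE connection for this filter and process events ...
        }

        public static void main(String[] args) {
            // One filter per event type we care about (topic strings are assumptions).
            List<String> topicFilters = List.of(
                    "openhab/items/*/statechanged",   // dashboard refreshes
                    "openhab/items/*/added"           // e.g. detecting new items
            );

            // One connection per filter, each on its own thread, so idle cores
            // get used and no per-message filtering is needed afterwards.
            ExecutorService pool = Executors.newFixedThreadPool(topicFilters.size());
            topicFilters.forEach(filter -> pool.submit(() -> readEventStream(filter)));
        }
    }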

Continuing this investigation I ran the Android Profiler on the app, and some issues in the code were immediately visible: you are using getSimpleName() to get a class name for some log messages; it is appended right after “Converter not found for value type:”. That function is known for bad performance (see https://bugs.openjdk.java.net/browse/JDK-8187123 for reference), so if you can somehow find a way around it, please do: this one call uses more than 50% of the time of one of the threads. Looking at the next thread I see something very similar; at the very same location, again more than 50% of the time is wasted. For the third thread it’s even worse: 78% wasted there. So if you can remove that somehow (it looks like a debug string to me, so maybe just remove it completely for release builds), it would be a massive performance gain.
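
One possible workaround, just to sketch the general technique (not HomeHabit’s code): since getSimpleName() always returns the same string for a given class, its result can be cached per class, for example like this:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class ClassNames {
        // Class.getSimpleName() is relatively expensive (see JDK-8187123) and
        // always returns the same value for the same class, so cache it.
        private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

        static String simpleName(Class<?> cls) {
            return CACHE.computeIfAbsent(cls, Class::getSimpleName);
        }
    }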

Last but not least, there is another method call that’s using about 70% of another thread: equals() in something that was probably an “Item” class before obfuscation made it unreadable; the string “Cannot build Item, some of required attributes are not set” exists in that file (you should be able to find it with this “hint”). There is a string comparison in that equals() method which is not good and eats quite some performance. You could probably pre-hash the strings of the item you are comparing and store the hash somewhere (don’t forget to update it whenever there are changes!); then you only need to hash the other half and compare, which is roughly a 50% performance gain for that function as well (in practice you would probably have a getCachedHash() method that builds a hash over all strings of the class, call that on both items and compare the results). Or just use the hashCode() function and give it a cachedHash that is set to null on every modification; then the first call would be slow and all following ones would be very fast. There are several options, and the best solution depends on your code and how it’s used.

I am quite confident that with these optimizations it would work a lot better in my case, and of course this does also improve performance and therefore battery life for others.

@Flole Can you upload debug logs from that device (in the Support screen)?
“Converter not found” is not an expected path, so it is not really optimized for performance. There must be another issue if it hits that code so much.

To clarify, you are seeing a minute delay just when adding an item that changes every second? Or are there other high-volume changes in the system at the same time?

There are no “Converter not found” messages in the logs, but it doesn’t matter whether the text is actually used; it is generated in each iteration. I obviously don’t know what your code looks like, but the bytecode ended up being something like this:

StringBuilder errMsg = createSbFromMessage("Converter not found for value type: ");
errMsg.append(cls.getSimpleName());
Objects.requireNonNull(converter, errMsg.toString());

So even if the object is not null (which, based on your explanation, it seems to be, otherwise I would see it in the logs?), that “bad line of code” is executed. So you probably want something like this instead:

if(converter == null) {
   StringBuilder errMsg = createSbFromMessage("Converter not found for value type: ");
   errMsg.append(cls.getSimpleName());
   throw new NullPointerException(errMsg.toString());
}

I haven’t done much Java recently, so this might not be 100% correct, but I think you get the idea. Of course you could also do the requireNonNull instead of directly throwing the exception, but that wouldn’t be quite as good as this.
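
For completeness, the requireNonNull route can also avoid building the message eagerly if the Supplier overload of Objects.requireNonNull is available (Java 8+ / newer Android API levels). A small sketch, assuming a hypothetical checkConverter() helper: the message is only built when the check actually fails, so getSimpleName() is never called on the happy path.

    import java.util.Objects;

    final class ConverterCheck {
        static <T> T checkConverter(T converter, Class<?> cls) {
            // The message supplier is only invoked when converter is actually null,
            // so getSimpleName() costs nothing on the happy path.
            return Objects.requireNonNull(converter,
                    () -> "Converter not found for value type: " + cls.getSimpleName());
        }
    }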

My system is constantly seeing changes and state updates, even when I did that test. Right now that time item is matching the real time, but it’s skipping seconds (only updating every 2 seconds).
I tested it with the openHAB Demo instance and with that it’s working properly (they don’t have a high volume of updates though).

Thanks for a very detailed analysis of the issue!

Yes, getSimpleName() usage in non-error flow is an oversight, so that will be fixed.
However, I wasn’t able to reproduce the performance issue so far (even on slower debug builds). With 50 clock items updating every second, it was still processed without delay. So I’m not able to test whether the change makes any difference, but the fix will be in the next patch release.

This one is harder to address, since it is in generated class code. I will see if I can artificially reproduce degraded performance to get a better idea.
Do you have any OpenHAB item that contains a lot of data? Like very large string values or something similar?

What device did you run this profiling on?

There are no large string values involved. I had a look with wget to see what OH is actually publishing, and those were all “normal”/“short” messages, nothing that looked different from what the OH Demo instance is sending. I will wait and see whether this improves the situation now; if not, I will do additional debugging by hooking functions and adding debug output. That’s a little harder to set up, but I can certainly do it in case these optimizations were not enough / didn’t solve the problem. Then I should be able to see which messages actually arrive and when they arrive. I am pretty sure that it’s fixed now though.

I don’t see any reason why it shouldn’t work: You could still override the generated class code, or provide an alternative implementation for example by adding a cachedHashCode() method which then does this:


    if (cachedHash == null)
        cachedHash = hashCode();
    return cachedHash;

And then override the equals to something like this (or add a cachedEquals and use that one):

    if (otherItem == null)
        return false;
    if (getClass() != otherItem.getClass())
        return false;
    return cachedHashCode() == otherItem.cachedHashCode();

Don’t forget to set cachedHash = null in every setter method to invalidate the cached value. I intentionally left out the “this == other” check, as that seems to be something that doesn’t happen very often, and the cachedHashCode() values would be equal then as well, so it would still work correctly, just slower if the cached hash isn’t generated yet (but an additional comparison would be gone, so if that case indeed almost never happens, it sounds like a fair deal to me; that’s just a personal opinion from someone who doesn’t know the code at all).
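
Putting the pieces together, a minimal self-contained sketch of the idea; the CachedHashItem class and its fields are made up for illustration, since the real class is generated/obfuscated:

    import java.util.Objects;

    final class CachedHashItem {
        private final String name;
        private String state;
        private Integer cachedHash;              // null = needs recomputing

        CachedHashItem(String name, String state) {
            this.name = name;
            this.state = state;
        }

        void setState(String state) {
            this.state = state;
            this.cachedHash = null;              // invalidate in every setter
        }

        int cachedHashCode() {
            if (cachedHash == null)
                cachedHash = Objects.hash(name, state);   // hash all relevant fields once
            return cachedHash;
        }

        @Override
        public int hashCode() {
            return cachedHashCode();
        }

        @Override
        public boolean equals(Object other) {
            if (other == null || getClass() != other.getClass())
                return false;
            // Trade-off of this shortcut: equal hashes don't strictly guarantee
            // equal field values, but the comparison becomes very cheap.
            return cachedHashCode() == ((CachedHashItem) other).cachedHashCode();
        }
    }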

I am seeing this on a wall-mounted Xiaomi MiPad 2 tablet which is my main device. It has a 4 core Intel CPU and 2GB of RAM (about 500MB are usually free).

Just to clarify, the large string question was not specific to values that update often, since an update can touch other things in the system that would trigger some checks. If there are no large values, it might be something else then; that was just a guess.

Yeah, I wasn’t clear in the previous message. It is harder to address because it is low-level code with a large change impact, so I would need a reproducible flow before it can be changed/tested.
I will run more tests and see if there is some root cause for that.

I am not aware of any large values; I have only (short) strings, numbers and dates. I do have lots of items though (1099 if I see that correctly), if that would explain a potential negative performance impact (I would assume there’s some kind of search for an item when a new state is received). But I would have seen that in the profiler if some method were taking a long time, and the ones that looked particularly bad are the ones mentioned above; everything else looked pretty good. The equals() method that I saw in the profiler could potentially be part of searching for the right item to update the state, but that’s a wild guess and I haven’t looked at where it’s called from. I’ll do that later on so we can see if it’s part of the updating process and could explain a slowdown, or if it’s for something else and unrelated.

@Flole The latest patch release (28.7) has the fix for getSimpleName() usage. Let me know if it makes a difference on your device.
There are also some other performance improvements, but those were related to processing a large number of items initially, not a large update volume, so they might not benefit you specifically.

@igor Thank you very much, the CPU usage is now much lower, only about 50% of what it was before. Unfortunately the issue is still there, so I tried another improvement: you are currently subscribing to openHAB’s event stream like this: /rest/events?topics=smarthome/items/*,openhab/items/*. That subscribes to ItemStateEvent and ItemStateChangedEvent and also GroupItemStateChangedEvent, and you seem to be processing all of those. However, every ItemStateChangedEvent also comes with an ItemStateEvent (and GroupItemStateChangedEvents are a subset of ItemStateChangedEvents), so ItemStateChangedEvents are effectively transmitted twice. As I think we are only interested in when something changed, what I did was modify that subscription so it subscribes to /rest/events?topics=smarthome/items/*/statechanged,openhab/items/*/statechanged, which again reduced the CPU usage significantly (again by more than 50%). This basically means I filtered out all the ItemStateEvents, and now there are only ItemStateChangedEvents and GroupItemStateChangedEvents.

While I was at 170% before all the improvements (so 1 core fully utilized, another one at 60%), I am now down to 35%. Your changes alone brought it down to around 90%. So far this is working without any issues, so it might be worth looking into this, as it cuts the load of things that need to be processed in half, and of course network traffic is also reduced significantly.

I haven’t tested whether it comes with any side effects (like states being out of sync, for example; but then the issue would be somewhere else, as a (Group|Item)StateEvent is only sent when the group/item received an update in openHAB, and the difference to the ChangedEvent is just that those are also sent when the value didn’t actually change). That change seems like a good idea to me, assuming that during startup the states are initially requested and applied (but only if they are still uninitialized! Otherwise there is a risk of a race condition). That of course also assumes that on a reconnect all states are set back to uninitialized and then the current state is requested and applied to all items (unless a statechanged event already updated an item before, meaning it is no longer uninitialized).
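
To make that startup/reconnect assumption concrete, here is a minimal sketch of the race-free logic described above (the class and names are hypothetical, not HomeHabit’s code): events overwrite the cached state unconditionally, while the initial fetch only fills states that are still uninitialized.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class ItemStateCache {
        private final Map<String, String> states = new ConcurrentHashMap<>();

        // Called for every statechanged event: the event always wins.
        void onStateChanged(String itemName, String newState) {
            states.put(itemName, newState);
        }

        // Called with the result of the initial REST fetch after (re)connect:
        // only fills items that are still uninitialized, so a statechanged
        // event that arrived in the meantime is not overwritten.
        void applyInitialState(String itemName, String fetchedState) {
            states.putIfAbsent(itemName, fetchedState);
        }

        // Called on reconnect before the fetch: everything becomes uninitialized again.
        void reset() {
            states.clear();
        }
    }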

I’ve reverted my change for testing now and after a few minutes saw the effect. I added some additional logging to some apply method where the filtering is done (the string “Unsupported event type” is in there, which should help you find what I am talking about), and there are 150 messages per second arriving there right now (I only looked at one of the cases in that apply method, so those are not duplicates)… That is because of all the StateEvents which openHAB transmits even though nothing changed (some of my devices report their state multiple times per second, even if nothing changed).

The “missing” events are also arriving there; unfortunately I don’t know where I could check next to figure out where they get lost, it’s getting a little hard to follow from this point. If you could give me a string from another function where I could check next, I would probably be able to find it and see what arrives there. For me the solution seems to be to disable the unnecessary notifications, as that reduces the message load significantly (and still no visible side effects, by the way). Maybe that’s something that could be added as a Labs option if you are not confident about making this the default? It seems like a small change to select a different string for the subscription and maybe even disable some of the filtering logic, and that way there could be easier testing, also by others.

ItemStateEvent processing was there from the beginning, it seems, so I would be reluctant to exclude it, since I don’t remember what the reasoning for it was (a Labs option might work).
I will look more through the history of changes to see if I can find some more info.

What do you mean by missing events?

Oh, I thought I had stated clearly what the current state is, but it seems I didn’t, sorry about that. HomeHabit seems to be building up a “message queue” up to some point (that’s when I started to see the time item lagging behind), and then at some point it starts skipping updates, so the time item was incremented in 2-second steps, for example. “One-time” events like turning on a light were often skipped / not represented properly in the UI; periodic events like the power consumption reporting from my meter kept working (probably with skipped messages in between, just like the time item, except that it doesn’t really matter in this case). I first thought it might be an openHAB issue, but all events reach that apply function in HomeHabit; they then get lost somewhere internally and are never represented in the UI. So I turn on a light, I see the JSON being passed to that apply function, but HomeHabit never refreshes the state (so it stays off).

I don’t think any specific change introduced this issue; it’s probably just my system getting bigger and bigger, with more events coming in and being forwarded to HomeHabit. If you want to reproduce this, you can probably just add more and more items to your OH installation, and at some point you will probably see the same behavior (it could even be that a huge burst of messages breaks it, so it skips messages and never recovers from that). I think that makes more sense than looking through changes.

Got it, thanks for additional details.

On the history of changes: ItemStateEvent might have been needed at some point to work around some missing change events, but not anymore. That was my thinking.
So far I couldn’t reproduce any performance degradation with 100 messages per second; we will see if increasing it has an effect. That probably depends on the device and its load too, so it might be hard to reproduce. But optimizations can still be done.

I’m really curious now and will set up another openHAB test instance to see what happens with 100 time items on my devices. I also tried it with my phone (OnePlus 3), and it shows the exact same behaviour. Maybe the difference between my setup and your tests is that time items all refresh at the same time, while my changes are spread more evenly. It shouldn’t take me long to start an openHAB instance with 100 time items, so soon we will know for sure whether that triggers the issue for me or not.

Talking about optimizations: there is one call to getGenericParameterTypes() that gets called for the same method over and over again, returning the same results in my case, and that is a quite expensive call as well (about 15% of the total runtime of that thread). Caching the result in a HashMap or something like that would probably be another significant improvement. That’s the last obvious thing I can see in the profiler; there are no other obvious calls to external functions that take a long time.
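
For illustration, a minimal sketch of such a cache (just the general technique, not HomeHabit’s code): Method works as a map key, and the result for a given method never changes, so it can be memoized.

    import java.lang.reflect.Method;
    import java.lang.reflect.Type;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class ParameterTypeCache {
        // getGenericParameterTypes() showed up as expensive in the profiler,
        // so compute it once per Method and reuse the result afterwards
        // (callers must not modify the returned array).
        private static final Map<Method, Type[]> CACHE = new ConcurrentHashMap<>();

        static Type[] genericParameterTypes(Method method) {
            return CACHE.computeIfAbsent(method, Method::getGenericParameterTypes);
        }
    }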

There is also a possibility that the update is getting “lost” somewhere else after it is processed; for example, saving into the database could be a bottleneck (some recent performance fixes were for a similar issue).
When you see changes not reflected in the UI, like the light switch, did you ever notice whether the state was correct after just switching to another dashboard and back, or do you actually need to restart the app?

Yeah, that’s possible.
From my current measurements, it takes 100ms to process 150 messages (per second), so while it could be faster, it is still far from getting backed up.

Do you know any approximate place it was called from?

It’s called from the function with the string “Unable to promote access for private factory”, or “ctor can’t be null”.