Status Update
Comments
as...@google.com <as...@google.com> #2
Branch: androidx-master-dev
commit ec33fd30f60c310dc47ffeb481c79a50c16232f9
Author: Chris Craik <ccraik@google.com>
Date: Fri Apr 10 12:33:03 2020
Fix startupBenchmark in presubmit
Fixes: 153742334
Test: ./gradlew benchmark:integ-tests:startup-benchmark:cC # fails on device without locked clocks
Test: ./gradlew benchmark:integ-tests:startup-benchmark:cC -i -Pandroid.testInstrumentationRunnerArguments.notAnnotation=android.support.test.filters.FlakyTest,androidx.test.filters.FlakyTest # passes when we pass arg that presubmit passes
Skip StartupBenchmark when running in presubmit configuration by using
@FlakyTest.
Unfortunately, this seems to be the only easy way to skip the test,
without library-side complications. We can't early return from within
the test body, since then the Benchmark library complains that we
didn't include a benchmark block in our test.
The empty test may not be necessary for running in CI, but it does
allow the notAnnotation=FlakyTest gradle command above to complete
successfully.
Change-Id: I143454e06f0458adbe71560d78e2ac11cfdc0bba
M benchmark/integration-tests/startup-benchmark/src/androidTest/java/androidx/benchmark/integration/startup/benchmark/ArgumentInjectingApplication.kt
A benchmark/integration-tests/startup-benchmark/src/androidTest/java/androidx/benchmark/integration/startup/benchmark/EmptyTest.kt
M benchmark/integration-tests/startup-benchmark/src/androidTest/java/androidx/benchmark/integration/startup/benchmark/StartupBenchmark.kt
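The skip mechanism the commit message describes can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual contents of the files listed above; class and method bodies are illustrative, though @FlakyTest, BenchmarkRule, and measureRepeated are real androidx APIs:

```kotlin
import androidx.benchmark.junit4.BenchmarkRule
import androidx.benchmark.junit4.measureRepeated
import androidx.test.filters.FlakyTest
import org.junit.Rule
import org.junit.Test

// Marked @FlakyTest so the presubmit configuration (which passes
// notAnnotation=androidx.test.filters.FlakyTest to the runner) skips it.
// Early-returning inside the test body isn't an option: the Benchmark
// library fails any test that never runs a benchmark block.
@FlakyTest
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = BenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated {
        // ... startup work under measurement ...
    }
}

// Keeps the filtered run non-empty, so the gradle command above with
// notAnnotation=FlakyTest completes successfully.
class EmptyTest {
    @Test
    fun empty() {
    }
}
```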
as...@google.com <as...@google.com> #3
Chris, could you take a look at this benchmark? It seems weird that it is showing significant improvement without any related changes in the vicinity.
cc...@google.com <cc...@google.com> #4
This is a super weird improvement - during this period of improvement, androidx.room.benchmark.InvalidationTrackerBenchmark and RelationBenchmark were consistently failing. After the fix, the number went back to where it was.
Seems that either:
- androidx.room.benchmark crashing was affecting these other benchmarks in an entirely separate process
- the room change itself was affecting the other benchmark
- this is a weird coincidence
Double-checking it's not the third option (coincidence) with some ABTD runs to try and repro.
Nothing in the traces jumps out - most of the secondary metrics are the same or extremely similar, but for some reason time is very different.
Could still be an unlucky layout issue.
cc...@google.com <cc...@google.com> #5
Looked into this with the help of Yigit's new ABTD comparison workflow.
The thing that appears to have caused the improved performance is that the test running right before the wear benchmarks was crashing, which caused a full reset just before the wear runs. See screenshot #1; this is from the invocation perfetto trace, which is a top-level artifact on the ATI page.
These wear benchmarks are likely very sensitive to device resets, as they perform a very large number of binder transactions, which run faster in the faster configurations. My guess is that hammering on binder slightly overwhelms system_server.
When running faster, the average binder transaction takes 0.86ms (2nd screenshot). When running slower, the average binder transaction takes 1.05ms (3rd screenshot).
As the loop seems to have at least 2 transactions, that probably accounts for the gap with and without the reset.
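A back-of-the-envelope check of the numbers above supports this (a sketch; the exact per-iteration transaction count is an assumption from the trace):

```python
# Average binder transaction latency observed in the two configurations (ms)
fast_ms = 0.86  # after the full device reset (2nd screenshot)
slow_ms = 1.05  # without the reset (3rd screenshot)

# The benchmark loop appears to make at least 2 binder transactions
transactions_per_iteration = 2

# Extra time per iteration attributable to the slower binder calls
gap_ms = transactions_per_iteration * (slow_ms - fast_ms)
print(f"~{gap_ms:.2f} ms extra per iteration")  # prints: ~0.38 ms extra per iteration
```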
This is a broader problem, though - neither of these looks trivial to remove from the benchmarks.
One is: AndroidComposeViewAccessibilityDelegateCompat.<init>: (Landroidx/compose/ui/platform/AndroidComposeView;)V -> android.view.accessibility.AccessibilityManager.getEnabledAccessibilityServiceList: (I)Ljava/util/List;
And another is: androidx.wear.compose.material.TimeBroadcastReceiver.register: (Landroid/content/Context;)V -> android.content.ContextWrapper.registerReceiver: (Landroid/content/BroadcastReceiver;Landroid/content/IntentFilter;)Landroid/content/Intent;
I got both of these by looking at a trace (on an API 35 device so it gets method traces), and selecting the binder transactions that happen within the method trace. Trace here:
Leland, what would you recommend we do here? Avoid the binder transactions with microbenchmark-specific caching? (Only in cases where we've reviewed it to be unavoidable?)
With the recent microbench refactor, it's much more possible for us to create trace-based metrics in microbenchmark, so counting transactions may be a good long-term tool for cases like this. In the short term, though, how can we avoid cases like these in the benchmarks?
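One possible shape for the microbenchmark-specific caching mentioned above, applied to the first of the two call sites (a hypothetical sketch, not an existing androidx API; the cache object and names are illustrative, though AccessibilityManager.getEnabledAccessibilityServiceList is the real binder-backed call):

```kotlin
import android.accessibilityservice.AccessibilityServiceInfo
import android.view.accessibility.AccessibilityManager

// Hypothetical: cache the binder-backed query so the benchmark loop pays
// the getEnabledAccessibilityServiceList() binder transaction once,
// instead of once per AndroidComposeViewAccessibilityDelegateCompat init.
object EnabledServiceListCache {
    private var cached: List<AccessibilityServiceInfo>? = null

    fun get(
        manager: AccessibilityManager,
        feedbackTypeFlags: Int
    ): List<AccessibilityServiceInfo> =
        cached ?: manager.getEnabledAccessibilityServiceList(feedbackTypeFlags)
            .also { cached = it }
}
```

The obvious caveat is staleness: this would only be safe behind a benchmark-only flag, after reviewing that the value can't change mid-run.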
cc...@google.com <cc...@google.com> #6
Going to try and add some sleeps when binder is detected.
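That mitigation might look something like this (purely a sketch of the stated idea; the detection hook, function name, and sleep duration are all assumptions):

```kotlin
// Hypothetical: when the trace shows binder transactions inside the
// measured loop, pause briefly between measurements so system_server can
// drain, reducing sensitivity to device state around resets.
fun cooldownIfBinderHeavy(binderTransactionCount: Int) {
    if (binderTransactionCount > 0) {
        Thread.sleep(50) // duration is a guess and would need tuning
    }
}
```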
Description
Perf Regression (Low) found, matching 2 tracked metrics from benchmarks.
To triage this regression, see the guide at go/androidx-bench-triage.
Test classes affected:
Test methods affected:
Devices affected:
API Level: