Fixed
ap...@google.com <ap...@google.com> #2
Project: platform/frameworks/support
Branch: androidx-master-dev
commit 87230c0d52d3d534692a31c8ffbb378fc3508111
Author: Chris Craik <ccraik@google.com>
Date: Tue Oct 08 08:28:15 2019
Remove warmup log and array alloc from critical path
Bug:142058671
Test: ./gradlew benchmark:benchmark-benchmark:cC # with systrace / method tracing
When transitioning out of warmup, saw array allocation in method
trace, and log work in systrace. Still more improvements to make here.
Change-Id: Ibddbc3dd90e7802d3777f690be1a684ced4fc339
M benchmark/common/src/main/java/androidx/benchmark/BenchmarkState.kt
M benchmark/common/src/main/java/androidx/benchmark/WarmupManager.kt
https://android-review.googlesource.com/1134877
https://goto.google.com/android-sha1/87230c0d52d3d534692a31c8ffbb378fc3508111
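The change above moves logging and array allocation out of the warmup-to-measurement transition. Below is a minimal sketch of that idea. It uses a hypothetical `MeasurementRecorder` class, not the actual `BenchmarkState`/`WarmupManager` code. The results buffer is allocated once up front, and the warmup outcome is stored as a primitive. String formatting happens only after the timed loop ends.

```kotlin
// Hypothetical sketch: names are illustrative, not the androidx.benchmark API.
class MeasurementRecorder(repeatCount: Int) {
    // Allocated once, before the timed loop, so leaving warmup allocates nothing.
    private val results = LongArray(repeatCount)
    private var count = 0
    private var warmupIterations = -1

    // Store a primitive instead of logging: no string building, no I/O here.
    fun onWarmupFinished(iterations: Int) {
        warmupIterations = iterations
    }

    // Writes into the preallocated buffer; throws if called too many times.
    fun record(durationNs: Long) {
        results[count++] = durationNs
    }

    val isComplete: Boolean get() = count == results.size

    // Called after measurement: a safe place to allocate and format log text.
    fun summary(): String {
        check(isComplete) { "measurement not complete" }
        return "warmup=$warmupIterations iterations, runs=${results.joinToString()}"
    }
}
```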
ap...@google.com <ap...@google.com> #3
Project: platform/frameworks/support
Branch: androidx-master-dev
commit b15ef41ce16efc5c3b7d3dfe0d04157ffa474ff2
Author: Chris Craik <ccraik@google.com>
Date: Tue Oct 08 15:06:18 2019
Add tracing to benchmark
Bug:142058671
Test: tests in benchmark-benchmark, with systrace
Change-Id: Ic7c35ad8c8b18e478ad2a19627df92e3241d8a0a
M benchmark/common/api/1.0.0-rc01.txt
M benchmark/common/api/current.txt
M benchmark/common/api/public_plus_experimental_1.0.0-rc01.txt
M benchmark/common/api/public_plus_experimental_current.txt
M benchmark/common/api/restricted_1.0.0-rc01.txt
M benchmark/common/api/restricted_current.txt
M benchmark/common/src/main/java/androidx/benchmark/BenchmarkState.kt
A benchmark/common/src/main/java/androidx/benchmark/TraceCompat.kt
M benchmark/junit4/src/main/java/androidx/benchmark/junit4/BenchmarkRule.kt
https://android-review.googlesource.com/1136816
https://goto.google.com/android-sha1/b15ef41ce16efc5c3b7d3dfe0d04157ffa474ff2
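This commit adds a `TraceCompat.kt` helper. For illustration only, here is a sketch of the usual begin/end pairing pattern for trace sections. `TraceBackend`, `RecordingBackend`, and `traced` are hypothetical names. On a device, a backend would forward to `android.os.Trace.beginSection`/`endSection`. The in-memory backend lets the nesting be checked off-device.

```kotlin
// Hypothetical abstraction over a trace sink; on Android a real backend would
// forward to android.os.Trace.beginSection / endSection.
interface TraceBackend {
    fun beginSection(label: String)
    fun endSection()
}

// Records events in memory so begin/end pairing can be verified in a unit test.
class RecordingBackend : TraceBackend {
    val events = mutableListOf<String>()
    override fun beginSection(label: String) { events.add("B:$label") }
    override fun endSection() { events.add("E") }
}

// Wraps a block in a begin/end pair. The finally clause closes the section even
// if the block throws, which keeps nested sections balanced.
inline fun <T> traced(backend: TraceBackend, label: String, block: () -> T): T {
    backend.beginSection(label)
    try {
        return block()
    } finally {
        backend.endSection()
    }
}
```

Sections nest naturally: `traced(b, "warmup") { traced(b, "iteration") { work() } }` produces a well-formed begin/begin/end/end sequence.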
ap...@google.com <ap...@google.com> #4
Project: platform/frameworks/support
Branch: androidx-master-dev
commit 345f86d34f7e373f464c0d8185e392b067a2de4a
Author: Chris Craik <ccraik@google.com>
Date: Wed Oct 09 17:37:17 2019
Bump thread priority of benchmarks and JIT during benchmarks
The JIT thread is so low priority that other parallel tasks can starve
it, especially for the first few benchmarks when a process runs.
The system can spin up significant background work right after install
and/or instrumentation start, and on locked devices with only two big
cores, there aren't enough CPUs to go around - warmup and benchmark
both complete before relevant JIT is complete.
Now, we bump the priority of both the benchmark and JIT thread.
Tracing benchmarks show that the JIT thread goes much faster, which
should significantly reduce the chance we capture results on unjitted
code.
This may also motivate us to use CPU affinity + locked small cores in
the future; we can keep monitoring.
Test: ./gradlew benchmark:b-c:cC
Test: ./gradlew benchmark:b-b:cC
Test: ./gradlew recyclerview:r-b:cC
This CL also adds more logging, and unifies all logging under
"benchmark" tag. This logging was very useful in discovering and
diagnosing the priority problem, since it showed the edge cases where
jit finished *during* the measure pass.
Bug: 140773023
Bug: 142058671
Change-Id: If542e3cb8867165cf7b4688090ee534e68a23562
M benchmark/common/src/androidTest/java/androidx/benchmark/BenchmarkStateTest.kt
M benchmark/common/src/main/java/androidx/benchmark/BenchmarkState.kt
A benchmark/common/src/main/java/androidx/benchmark/ThreadPriority.kt
M benchmark/common/src/main/java/androidx/benchmark/WarmupManager.kt
M benchmark/junit4/src/main/java/androidx/benchmark/junit4/BenchmarkRule.kt
https://android-review.googlesource.com/1138018
https://goto.google.com/android-sha1/345f86d34f7e373f464c0d8185e392b067a2de4a
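Raising the JIT thread's priority first requires finding its thread id. The sketch below is a minimal version under stated assumptions. It scans `/proc/self/task/<tid>/comm` and assumes ART names its JIT worker `Jit thread pool`. The helper names are hypothetical, and this is not the `ThreadPriority.kt` code. On Android, a matching tid could then be passed to `android.os.Process.setThreadPriority(tid, priority)`.

```kotlin
import java.io.File

// Assumption: the ART JIT worker thread's comm name; not taken from ThreadPriority.kt.
const val JIT_THREAD_NAME = "Jit thread pool"

// Reads /proc/<pid>/task/<tid>/comm for each thread (Linux/Android only).
// Missing directories or unreadable entries are skipped rather than thrown.
fun readThreadNames(taskDir: File = File("/proc/self/task")): Map<Int, String> =
    taskDir.listFiles().orEmpty()
        .mapNotNull { dir ->
            val tid = dir.name.toIntOrNull() ?: return@mapNotNull null
            val comm = runCatching { File(dir, "comm").readText().trim() }.getOrNull()
                ?: return@mapNotNull null
            tid to comm
        }
        .toMap()

// Returns the tid of the first thread whose name matches exactly, or null.
fun findThreadId(names: Map<Int, String>, name: String): Int? =
    names.entries.firstOrNull { it.value == name }?.key
```

Returning `null` lets the caller simply skip the priority bump when no thread matches, for example on a runtime that names the thread differently.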
cc...@google.com <cc...@google.com> #5
As part of trying to reland Owen's looping arch change, I verified that it does significantly improve this problem, see attached .json files - specifically the measured numbers.
E.g. Parameterized benchmark, before (both variants):
"runs": [
18,
12,
12,
12,
12,
"runs": [
17,
5,
5,
6,
5,
After:
"runs": [
14,
11,
11,
11,
11,
"runs": [
4,
4,
4,
4,
4,
Or TrivialJavaBenchmark, Before:
"runs": [
24,
12,
12,
12,
12,
After:
"runs": [
11,
11,
11,
11,
11,
Let's mark this fixed once the warmup rearch lands again.
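The sketch below reduces each before/after pair to one figure by subtracting the median of the later runs from the first run. It uses only the five-run prefixes quoted above, so its results describe those prefixes, not the full JSON files. `median` and `firstRunOverhead` are hypothetical helpers.

```kotlin
// Median of a non-empty list; averages the two middle values for even sizes.
fun median(values: List<Int>): Double {
    require(values.isNotEmpty()) { "empty list" }
    val s = values.sorted()
    val mid = s.size / 2
    return if (s.size % 2 == 1) s[mid].toDouble() else (s[mid - 1] + s[mid]) / 2.0
}

// Extra cost of the first run relative to the steady state of the remaining runs,
// in whatever units the runs were reported in.
fun firstRunOverhead(runs: List<Int>): Double {
    require(runs.size >= 2) { "need at least two runs" }
    return runs.first() - median(runs.drop(1))
}
```

On those prefixes the first-run overhead drops from 6 to 3 and from 12 to 0 for the two Parameterized variants, and from 12 to 0 for TrivialJavaBenchmark.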
cc...@google.com <cc...@google.com> #6
In addition, we've lowered back down our measurements for our smallest (noop) benchmarks. Looks like making warmup and measurement more similar means we've fully jitted a lot more code. Since tiny benchmarks measure quickly, we were likely not giving the code in measurement codepaths time to jit.
I'd guess much of the remaining cost for the first loop is branch misprediction on the loop's early return itself (used only during measurement), but that only seems to significantly affect the first benchmark (the first 'after' number above in ParameterizedBenchmark).
Description
"timeNs": {
"minimum": 10,
"maximum": 28,
"median": 10,
"runs": [
28,
11,
10,
10,
10,
10,
10,
...
"name": "nothing",
"timeNs": {
"minimum": 9,
"maximum": 25,
"median": 9,
"runs": [
25,
9,
9,
9,
9,
9,
9,
...
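The `minimum`, `maximum`, and `median` fields above summarize the `runs` array. The single slow first run sets `maximum` but does not move `minimum` or `median`. The sketch below computes that summary. It assumes the mean of the two middle values for even-length inputs; the library's own convention may differ. `TimeStats` and `summarize` are hypothetical names.

```kotlin
// Summary in the shape of the "timeNs" objects above.
data class TimeStats(val minimum: Long, val maximum: Long, val median: Double)

fun summarize(runs: LongArray): TimeStats {
    require(runs.isNotEmpty()) { "no runs" }
    val sorted = runs.sortedArray()
    val mid = sorted.size / 2
    // Assumption: even-length inputs use the mean of the two middle values.
    val median = if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
    return TimeStats(sorted.first(), sorted.last(), median)
}
```

Applied to the visible prefixes `28, 11, 10, 10, 10, 10, 10` and `25, 9, 9, 9, 9, 9, 9`, it yields (10, 28, 10) and (9, 25, 9). Both match the reported values.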