This directory contains a set of webdriver-driven benchmarks for Cobalt.
Each file should contain a set of tests in Python `unittest` format. All tests included within `performance.py` will be run on the build system. Results can be recorded in the build results database.
In most cases, you will want to run all of the performance tests, which you can do by executing the script `performance.py`. Run `python performance.py --help` to see the command-line parameters it accepts. For example, to run the tests against the raspi-2 QA build:

```
python performance.py -p raspi-2 -c qa -d $RASPI_ADDR
```

where `RASPI_ADDR` is set to the IP address of the target Raspberry Pi device.
To run an individual test, execute its script directly. For all tests, the platform configuration is inferred from the environment when set; otherwise, it must be specified via command-line parameters.
To create a new test:

1.  If appropriate, create a new file borrowing the boilerplate from an existing simple file, such as `browse_horizontal.py`.
2.  Add the file name to the tests added within `performance.py`, so that it runs whenever `performance.py` is run.
3.  If the file contains internal names or details, consider adding it to the “EXCLUDE.FILES” list.
4.  Use the `record_test_result*` methods in `tv_testcase_util` where appropriate (see the sketch after this list).
5.  Add the results to the build results database schema. See the internal README-Updating-Result-Schema.md file.
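As a rough illustration of steps 1 and 4, here is a minimal sketch of what a new test file might look like. The `tv_testcase` module, its `TvTestCase` base class, and `main()` helper are assumptions based on the pattern of the existing tests; take the actual boilerplate from `browse_horizontal.py` and the actual recording signatures from `tv_testcase_util`.

```python
# my_new_test.py -- hypothetical file name; copy the real boilerplate
# from an existing simple test such as browse_horizontal.py.
import tv_testcase
import tv_testcase_util


class MyNewTest(tv_testcase.TvTestCase):

  def test_simple(self):
    # Measure something, then record it so that the runner prints a
    # "webdriver_benchmark TEST_RESULT:" line for it.
    duration_us = 0  # Placeholder for a real measurement.
    tv_testcase_util.record_test_result("wbMyNewTestDurExampleUs",
                                        duration_us)


if __name__ == "__main__":
  tv_testcase.main()
```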
To run the benchmarks against a desired loader, provide the `--url` command-line parameter; the tests will run against that URL. For example:

```
python performance.py -p raspi-2 -c qa -d $RASPI_ADDR --url https://www.youtube.com/tv?loader=nllive
```
The results are printed to stdout; redirect the output to a file if you would like to store them. Each line of the benchmark output prefixed with `webdriver_benchmark TEST_RESULT:` provides the result of one measurement. Those lines have the following format:

```
webdriver_benchmark TEST_RESULT: result_name result_value
```

where `result_name` is the name of the result and `result_value` is a number providing the measured result for that metric. For example,

```
webdriver_benchmark TEST_RESULT: wbBrowseHorizontalDurLayoutBoxTreeUpdateUsedSizesUsPct50 3061.5
```
gives the 50th percentile of the duration Cobalt took to update the box tree's used sizes, on a horizontal scroll event, in microseconds.
Note that most time-based measurements are in microseconds.
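Because every result line carries the same fixed prefix, the output is easy to post-process programmatically. Here is a minimal parsing sketch; the `results.txt` file name is just an assumed capture of stdout:

```python
# Collect "webdriver_benchmark TEST_RESULT: <name> <value>" lines
# from saved benchmark output into a {name: value} dictionary.
PREFIX = "webdriver_benchmark TEST_RESULT:"

results = {}
with open("results.txt") as f:  # assumed capture of stdout
    for line in f:
        if PREFIX in line:
            name, value = line.split(PREFIX, 1)[1].split()
            results[name] = float(value)

print(results.get("wbStartupDurBlankToBrowseUsPct50"))
```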
Some particularly interesting timing-related benchmark results are:

*   `wbStartupDurBlankToBrowseUs*`: Measures the startup time, until all images finish loading.
*   `wbBrowseToWatchDurVideoStartDelay*`: Measures the browse-to-watch time.
*   `wbBrowseVerticalDurTotalUs*`: Measures the input latency (i.e. JavaScript execution time + layout time) during vertical scroll events.
*   `wbBrowseVerticalDurRasterizeAnimationsUs*`: Measures the time it takes to render each frame of the animation triggered by a vertical scroll event. The inverse of this number is the framerate.
*   `wbBrowseHorizontalDurTotalUs*`: Same as `wbBrowseVerticalDurTotalUs*`, except for horizontal scroll events.
*   `wbBrowseHorizontalDurRasterizeAnimationsUs*`: Same as `wbBrowseVerticalDurRasterizeAnimationsUs*`, except for horizontal scroll events.

In each case above, the `*` symbol can be one of `Mean`, `Pct25`, `Pct50`, `Pct75`, or `Pct95`. For example, `wbStartupDurBlankToBrowseUsMean` and `wbStartupDurBlankToBrowseUsPct95` are both valid measurements. The webdriver benchmarks run their tests many times in order to obtain multiple samples, so you can drill into the data by exploring either the mean or the various percentiles.
Some particularly interesting count-related benchmark results are:

*   `wbBrowseVerticalCntDomHtmlElements*`: Lists the number of HTML elements in existence after the event. This includes HTML elements that are no longer in the DOM but have not been garbage collected yet.
*   `wbBrowseVerticalCntLayoutBoxes*`: Lists the number of layout boxes within the layout tree after the event.
*   `wbBrowseVerticalCntLayoutBoxesCreated*`: Lists the number of new layout boxes that were created during the event.
*   `wbBrowseHorizontalCntDomHtmlElements*`: Same as `wbBrowseVerticalCntDomHtmlElements*`, except for horizontal scroll events.
*   `wbBrowseHorizontalCntLayoutBoxes*`: Same as `wbBrowseVerticalCntLayoutBoxes*`, except for horizontal scroll events.
*   `wbBrowseHorizontalCntLayoutBoxesCreated*`: Same as `wbBrowseVerticalCntLayoutBoxesCreated*`, except for horizontal scroll events.

In each case above, the `*` symbol can be one of `Max`, `Median`, or `Mean`. For example, `wbBrowseVerticalCntDomHtmlElementsMax` and `wbBrowseVerticalCntDomHtmlElementsMedian` are both valid measurements. The webdriver benchmarks run their tests many times in order to obtain multiple samples, so you can drill into the data by exploring the max, median, or mean.
The webdriver benchmarks output many metrics, but you may only be interested in a few; you will have to filter the output down to the metrics you care about yourself. You can do so with `grep`, for example:
```
python performance.py -p raspi-2 -c qa -d $RASPI_ADDR > results.txt
echo "" > filtered_results.txt
grep -o "wbStartupDurBlankToBrowseUs.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseToWatchDurVideoStartDelay.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseVerticalDurTotalUs.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseVerticalDurRasterizeAnimationsUs.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseHorizontalDurTotalUs.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseHorizontalDurRasterizeAnimationsUs.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseVerticalCntDomHtmlElements.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseVerticalCntLayoutBoxes.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseHorizontalCntDomHtmlElements.*$" results.txt >> filtered_results.txt
grep -o "wbBrowseHorizontalCntLayoutBoxes.*$" results.txt >> filtered_results.txt
cat filtered_results.txt
```
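If you would rather filter in Python (for example, to combine filtering with the parsing sketch above), prefix matching works the same way. The metric tuple below simply mirrors a subset of the `grep` commands above:

```python
# Print only the benchmark result lines whose metric name starts
# with one of the chosen prefixes, mirroring the grep pipeline.
PREFIX = "webdriver_benchmark TEST_RESULT:"
METRICS = (
    "wbStartupDurBlankToBrowseUs",
    "wbBrowseToWatchDurVideoStartDelay",
    "wbBrowseVerticalDurTotalUs",
    "wbBrowseHorizontalDurTotalUs",
)

with open("results.txt") as f:  # assumed capture of stdout
    for line in f:
        parts = line.split(PREFIX, 1)
        if len(parts) == 2 and parts[1].lstrip().startswith(METRICS):
            print(parts[1].strip())
```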