Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
- ARM32 optimizations for loopfilter, ipred_dc|h|v
- Add section-5 raw OBU demuxer
- Improve speed by reducing L2 cache collisions
- Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
- SSE2 optimizations for CDEF, wiener and warp_affine
- NEON optimizations for SGR on ARM32
- Fix mismatch issue in x86 asm in inverse identity transforms
- Fix build issue in ARM64 assembly if debug info was enabled
- Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
- Export ITU T.35 metadata
- Speed improvements on blend_ on ARM
- Speed improvements on decode_coef and MSAC
- NEON optimizations for blend*, w_mask_, ipred functions for ARM64
- NEON optimizations for CDEF and warp on ARM32
- SSE2 optimizations for MSAC hi_tok decoding
- SSSE3 optimizations for deblocking loopfilters and warp_affine
- AVX-2 optimizations for film grain and ipred_z2
- SSE4 optimizations for warp_affine
- VSX optimizations for wiener
- Fix inverse transform overflows in x86 and NEON asm
- Fix integer overflows with large frames
- Improve film grain generation to match reference code
- Improve compatibility with older binutils for ARM
- More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

- Fix playback with unknown OBUs
- Add an option to limit the maximum frame size
- SSE2 and ARM64 optimizations for MSAC
- Improve speed on 32-bit systems
- Optimization in obmc blend
- Reduce RAM usage significantly
- Initial PPC SIMD code for cdef_filter
- NEON optimizations for blend functions on ARM
- NEON optimizations for w_mask functions on ARM
- NEON optimizations for inverse transforms on ARM64
- VSX optimizations for CDEF filter
- Improve handling of malloc failures
- Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

- Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
- Reduce binary size, notably on Windows
- SSSE3 optimizations for ipred_filter
- ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
- Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

- Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
  The impact is important on SSSE3, SSE4 and AVX-2 CPUs
- SSSE3 optimizations for all block sizes in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

- SSSE3 optimization for cdef_dir
- AVX-2 improvements of the existing CDEF optimizations
- NEON improvements of the existing CDEF and wiener optimizations
- Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

- ARM64 and ARM optimizations using NEON instructions
- SSSE3 optimizations for both 32-bit and 64-bit
- More AVX-2 assembly, reaching almost completion
- Fix installation of includes
- Rewrite inverse transforms to avoid overflows
- Snap packaging for Linux
- Updated API (ABI and API break)
- Fixes for undecodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
- Support for all features of the AV1 bitstream
- Support for all bit depths: 8, 10 and 12 bits
- Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
- Full acceleration for 64-bit AVX-2 processors, making it the fastest decoder
- Partial acceleration for SSSE3 processors
- Partial acceleration for NEON processors