3.0 Release Note¶
Declaration: This document was translated by Baidu Translate.
As China’s first independently developed industrial-grade deep learning platform, PaddlePaddle has always adhered to the open-source path, supporting the intelligent upgrade of industries. PaddlePaddle framework 3.0 not only carries forward the 2.0 series characteristics of unified dynamic and static graphs and integrated training and inference, but also achieves breakthroughs in automatic parallelism, neural network compilation, and high-order automatic differentiation, providing strong support for technological innovation and industrial applications in the era of large models and offering developers a one-stop, high-performance deep learning development experience. Whether for cutting-edge algorithm research or the industrial deployment of large models, PaddlePaddle framework 3.0 will be the preferred tool for developers. The key features are described as follows:
Unified Static and Dynamic Automatic Parallelism: This feature significantly reduces the cost of industrial development and training. Users only need to add a small amount of tensor sharding markup on a single card; the PaddlePaddle framework then automatically derives the distributed sharding information and inserts the communication operators required to guarantee logical correctness. At the same time, based on the model structure and cluster information, and combined with memory- and scheduling-level optimizations, PaddlePaddle can automatically find the most efficient distributed parallel strategy, significantly reducing the development cost of hybrid parallel training and letting developers focus on model and algorithm innovation. The automatic parallel architecture has undergone in-depth verification and polishing to better support the pre-training + fine-tuning workflow for common large model scenarios such as text-only dense models, text-only sparse models (MoE), and multi-modal understanding models. It improves the sharding derivation rules of operators and supports converting automatic-parallel training parameters into manual-parallel parameters for downstream inference, making automatic parallelism fully usable and helping users reduce the development cost of large model parallel programs. Additionally, to further simplify the distributed development workflow, a new paddle.distributed.parallel interface is introduced. Built on top of the distributed tensor markup syntax, it lets users configure common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism non-intrusively, outside of the model networking code. Furthermore, the static-graph automatic parallel architecture has been comprehensively upgraded on top of PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly on the extended PIR DistDialect, further enhancing the consistency of automatic parallelism between dynamic and static graphs and achieving performance on the Llama series models that is on par with, or even surpasses, manual parallelism.
Integrated Training and Inference for Large Models: Since version 2.0, PaddlePaddle has followed the design philosophy of “unified dynamic and static graphs, integrated training and inference”, and version 3.0 continues to uphold it. Thanks to the unified architecture and interface design, PaddlePaddle fully supports both dynamic and static graph modes and offers excellent whole-graph export capability: the success rate of whole-graph dynamic-to-static export in PaddlePaddle reaches 95%, compared with 62% for PyTorch. “Integrated training and inference” means that training and inference code, especially model networking code, can be reused within the same framework; after model development and training are complete, only a small amount of additional work is needed for rapid inference deployment. This gives the industry a streamlined development experience, enabling reuse of training and inference capabilities and providing a unified workflow and high training efficiency across the entire large model pipeline. Through dynamic-to-static conversion, training and inference tasks are seamlessly connected. Multiple mainstream large models are supported, and the full version of DeepSeek-R1 can be deployed on a single machine with doubled throughput.
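As an illustration of the train-then-deploy flow described above, here is a minimal sketch that exports a whole graph with paddle.jit.to_static and paddle.jit.save; the toy network and the output path are placeholders, not part of the release itself.

```python
import paddle
from paddle.static import InputSpec

# A toy network standing in for a real model trained in dynamic-graph mode (placeholder).
class Net(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc = paddle.nn.Linear(16, 4)

    def forward(self, x):
        return paddle.nn.functional.relu(self.fc(x))

model = Net()
# ... dynamic-graph training would happen here ...

# Convert the trained layer to a static whole graph and export it for inference deployment.
static_model = paddle.jit.to_static(
    model, input_spec=[InputSpec(shape=[None, 16], dtype="float32")]
)
paddle.jit.save(static_model, "./inference/net")  # output prefix is a placeholder
```

The exported program and parameters can then be loaded by the Paddle Inference deployment stack without rewriting the networking code.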
High-Order Automatic Differentiation for Scientific Computing: PaddlePaddle framework 3.0 provides high-order automatic differentiation, compiler optimization, and distributed training capabilities for scientific computing. Experiments on 41 different equations from NVIDIA Modulus show that PaddlePaddle solves differential equations on average 115% faster than PyTorch with compiler optimization enabled. In addition, PaddlePaddle has established the PaddleScience toolkit for solving general mathematical problems and the PaddleHelix toolkit focused on biological computing. Furthermore, PaddlePaddle framework 3.0 natively supports complex numbers, which is of great significance for data feature analysis in scenarios such as weather forecasting and the aerodynamic analysis of automobiles and aircraft.
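The high-order automatic differentiation mentioned above can be exercised from the Python API with nested paddle.grad calls; a minimal sketch computing a second derivative (the cubic function is only an illustration):

```python
import paddle

x = paddle.to_tensor(2.0, stop_gradient=False)
y = x ** 3

# First-order derivative dy/dx = 3 * x**2; keep the graph so we can differentiate again.
(dy_dx,) = paddle.grad(y, x, create_graph=True)
# Second-order derivative d2y/dx2 = 6 * x.
(d2y_dx2,) = paddle.grad(dy_dx, x)

print(float(dy_dx))    # 12.0
print(float(d2y_dx2))  # 12.0
```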
Neural Network Compiler: This feature significantly reduces the cost of performance optimization. PaddlePaddle's compiler is designed as an integral part of the framework and supports efficient training and variable-shape inference for a wide range of models, including generative and scientific computing models, striking a good balance between computational flexibility and high performance. With the CINN compiler enabled, over 60% of the tested models show significant performance improvements, with an average speedup of 27.4%. The CINN neural network compiler has been comprehensively improved in both completeness and performance. In this version, we have optimized both the front end and the back end of the compiler: adding an automatic re-compute mechanism for backward computation graphs, optimizing front-end Pass performance, upgrading the symbol derivation mechanism, optimizing operator fusion strategies, and enhancing the back-end Schedule strategies and subscript expression simplification capabilities. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically improving the compiler's general optimization capabilities.
Heterogeneous Multi-Chip Adaptation: One of PaddlePaddle's key features is its ability to adapt to heterogeneous multi-chip environments and fully exploit hardware potential. For the access mechanism, PaddlePaddle provides simple and efficient abstract interfaces and a basic operator system, reducing adaptation cost. For the runtime mechanism, it optimizes scheduling and storage-sharing mechanisms to improve scheduling efficiency. At the operator kernel level, PaddlePaddle offers a compiler-based automatic fusion and tuning solution to improve end-to-end performance. PaddlePaddle has also established research and development infrastructure for new hardware vendors, including code integration, continuous integration, and model regression testing. These mechanisms ensure that new hardware is included in PaddlePaddle's normal release pipeline, allowing users to install and try it directly without compiling from source. PaddlePaddle's comprehensive functionality and low-cost access mechanism have attracted hardware vendors to contribute a total of 4001 pull requests (PRs), encompassing 26584 commits.
In addition to the above core features, there is the Highly Extensible Intermediate Representation: to enhance the scalability of the PaddlePaddle framework, we have developed the highly extensible intermediate representation PIR, which systematically abstracts the underlying core concepts and provides flexible and efficient components. As infrastructure, PIR underpins technologies such as dynamic-to-static conversion, automatic differentiation, automatic parallelism, composite operators, and graph optimization, and is widely used in distributed training, model compression, and inference deployment scenarios. Through the Declarative Rewrite Rule (DRR) mechanism provided by PIR, the development cost of a Pass can be reduced by 60%. PIR has been verified in all scenarios and is enabled by default, supporting one-click dynamic-to-static conversion and ensuring excellent performance and good scalability of the framework. Existing capabilities of the 2.0 framework have also been continuously improved, and the new features bring significant gains in user experience, performance, ease of secondary development, and hardware adaptability. At the user-experience level, this version continues to enrich and enhance the API surface to cover more scenarios. For large model scenarios, the distributed parallel strategies and inference capabilities have been optimized and improved. Usability in compilation and installation has been thoroughly improved, with installation methods and the versions of dependent packages upgraded in sync. System security has been comprehensively reinforced, and the product documentation has undergone a full round of error checking and correction. At the same time, a large amount of obsolete code has been cleaned up to keep the architecture simple.
Incompatible upgrade¶
PaddlePaddle APIs now support implicit type promotion. In the most common calculations such as addition, subtraction, multiplication, and division, if the data types of the two inputs differ, the data type of the output must be determined. Historically, PaddlePaddle only partially supported implicit type promotion, and the actual rules were unclear. In practice this manifested as inconsistencies between dynamic and static graphs, inconsistencies between APIs and operator overloading, and violations of commutativity. Especially now that large models widely mix bf16/fp16 with fp32 computations, unexpected issues are easy to trigger and hard to locate. Starting from the 3.0 beta, PaddlePaddle has clarified the implicit data type promotion rules, defining in detail the result types of Tensor-with-Tensor and Tensor-with-Scalar computations and ensuring that calculations obey commutativity, that operator overloading is consistent with the corresponding binary API, and that dynamic and static graphs produce consistent results. This is more in line with user expectations and industry conventions. #60638, #63842, #60011
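A quick illustration of the promoted result types (the exact rules are defined in the PRs above; the widening behaviour shown in the comments is the expected outcome under those rules):

```python
import paddle

a = paddle.ones([2], dtype="float16")
b = paddle.ones([2], dtype="float32")

# Tensor-with-Tensor: the result is promoted to the wider floating type,
# and the operation is commutative.
print((a + b).dtype)  # paddle.float32
print((b + a).dtype)  # paddle.float32

# Operator overloading and the binary API now agree.
print(paddle.add(a, b).dtype)  # paddle.float32
```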
Discontinued Features¶
Support for 0-dimensional Tensor has been stable for two versions. In this version, the FLAGS_set_to_1d switch, which in some cases converted a 0-dimensional Tensor into a 1-dimensional Tensor containing a single element, has been removed. This switch existed to accommodate incorrect usage in some suites where a 0-dimensional Tensor was represented by a 1-dimensional Tensor containing a single element. PaddlePaddle now fully distinguishes the semantics of a 0-dimensional Tensor from those of a 1-dimensional Tensor containing a single element; the two are not equivalent. #61227
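A short example of the distinction as it behaves with the switch removed:

```python
import paddle

scalar = paddle.to_tensor(3.0)    # 0-dimensional Tensor
vector = paddle.to_tensor([3.0])  # 1-dimensional Tensor with one element

print(scalar.shape)  # []
print(vector.shape)  # [1]

# Reducing over all axes also yields a 0-dimensional Tensor.
print(paddle.ones([2, 3]).sum().shape)  # []
```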
1. User experience upgrade¶
New Features¶
Added new PaddlePaddle APIs to expand the framework's functionality, including paddle.nn.FeatureAlphaDropout, paddle.cartesian_prod, paddle.distributed.to_distributed, paddle.pi, etc. #64881, #65605, #70757, #71030, #69946, #70021, #69613, #68123, #70032
Introduced new Tensor class methods and attributes, along with corresponding unit tests, to enhance the usability of Tensor. #68334, #68681, #69132, #69270, #69256, #69197, #69231, #69222, #69257, #69301, #69361, #69348, #69464, #69542, #69667, #69563, #69796, #69477, #69779, #69724, #69835, #69781, #69982, #69913, #70026, #70013, #69539, #69736, #69841, #70277, #69580, #69599, #69693, #69848, #69751, #70556, #70591, #69673, #70647, #68192, #68511, #68833, #69406, #69480, #69463, #69632, #69473, #68694, #69534, #69820, #70121
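As an example of two of the additions, here is a minimal sketch; the behaviour and the list-style call signature shown for paddle.cartesian_prod are assumptions based on the usual Cartesian-product convention for 1-D inputs, so check the linked PRs for the authoritative semantics:

```python
import paddle

# paddle.pi: the mathematical constant exposed at the framework level.
print(paddle.pi)  # 3.141592653589793

# paddle.cartesian_prod: Cartesian product of 1-D tensors, one row per combination
# (call signature and output layout assumed here).
a = paddle.to_tensor([1, 2])
b = paddle.to_tensor([4, 5, 6])
print(paddle.cartesian_prod([a, b]).shape)  # expected: [6, 2]
```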
API Function Enhancement¶
Enhanced the functionality of 43 APIs, making existing APIs easier to use and facilitating code conversion. This includes but is not limited to adding API parameters, expanding the data types supported by APIs, and correcting existing unreasonable designs. #65105, #65103, #62975, #64436, #63346, #68079, #67878, #68432, #68677, #69012, #69385, #65032, #64977, #67071, #67298, #66687, #65946, #66170, #66929, #67994, #67947, #68033, #68046, #68294, #68214, #68281, #68390, #68772, #69451, #69252, #69529, #69750, #69827, #69099, #68594, #70090, #70228, #70166, #70389, #70790, #71029, #71283, #71342
PaddlePaddle Python API fully supports type hints. All parameters and return values of Python API have been annotated with type hints for ease of development and use. #65209, #65201, #65190, #65082, #65226, #65076, #65238, #65236, #65247, #65249, #65244, #65272, #65191, #65290, #65255, #65292, #65300, #65301, #65332, #65323, #65326, #65273, #65317, #65354, #65283, #65372, #65337, #65085, #65382, #65381, #65378, #65274, #65380, #65386, #65351, #65284, #65366, #65308, #65375, #65376, #65464, #65197, #65455, #65457, #65487, #65486, #65547, #65504, #65460, #65183, #65454, #65559, #65560, #65570, #65569, #65566, #65620, #65568, #65567, #65660, #65645, #65600, #65532, #65765, #65767, #65770, #65768, #65771, #65772, #65774, #65769, #65773, #65766, #65776, #65775, #65755, #65779, #65777, #65823, #65807, #65821, #65819, #65810, #65808, #65824, #65553, #65818, #65812, #65803, #65865, #65870, #65866, #65844, #65845, #65853, #65874, #65871, #65809, #65867, #65822, #65872, #65873, #65869, #65868, #65849, #65875, #65876, #65843, #65727, #65587, #66006, #66005, #65785, #65784, #65811, #65919, #65838, #65852, #65847, #66014, #65805, #66009, #66012, #65633, #66011, #66010, #66013, #66015, #66016, #66030, #66028, #66029, #66054, #66040, #65993, #66058, #66280, #66037, #66057, #66077, #66051, #65912, #66090, #66189, #66127, #66277, #66119, #66270, #66305, #66306, #66279, #66276, #66295, #66301, #66473, #66384, #66505, #66328, #66394, #66392, #66432, #66575, #66572, #66656, #66475, #66654, #66616, #66694, #66686, #66766, #66749, #66760, #66803, #66770, #66693, #66771, #66792, #66862, #66867, #66684, #66966, #66793, #66987, #66985, #66989, #66639, #66994, #66986, #66993, #67002, #66996, #67001, #66864, #67031, #67089, #67143, #67179, #67178, #67284, #67104, #67079, #67132, #67147, #67204, #67112, #67233, #67366, #67067, #67391, #67428, #67197, #67047, #66890, #67159, #67439, #67555, #67448, #67556, #67469, #67558, #67405, #67644, #67624, #67679, #67677, #67785, #67767, #65319, #65277, #67673, #65557, #67527, #66965, #65905, #65657, #66357, #68163
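With full type-hint coverage, user code built on the Python API can be checked with standard tooling such as mypy; a small sketch of what annotated user code looks like (the helper function is illustrative, not a framework API):

```python
import paddle
from paddle import Tensor

def normalize(x: Tensor, eps: float = 1e-6) -> Tensor:
    """Scale a tensor to zero mean and unit variance (illustrative helper)."""
    return (x - x.mean()) / (x.std() + eps)

out: Tensor = normalize(paddle.rand([4, 8]))
print(out.shape)  # [4, 8]
```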
Optimized the error messages of many PaddlePaddle APIs, making the errors more understandable. #67148, #67154, #67546, #67335, #67255, #67099, #67074, #67073, #66957, #67063, #67575, #67608, #67634, #67325, #67429, #67401, #66881, #68492, #67695, #69833, #70398
Bug Fixes¶
Fixed a bug in paddle.nn.functional.max_unpool1d when the input output_size is a tuple. #65910
Fixed the issue where paddle.base.core.eager.Tensor did not support paddle::DataType. #66765
Fixed an error that occurred during BF16 training when the PIR switch was turned on. #66833
Fixed an issue with the bias of the linear layer under pipeline parallelism. #67212
Fixed an error when using the loss for condition checks under pipeline parallelism. #66980
Fixed an error when using paddle.Tensor.item under pipeline parallelism. #67441
Fixed bugs in paddle.einsum in specific scenarios. #67588
Fixed an error in paddle.nn.SyncBatchNorm during gradient computation. #67559
Fixed the issue mentioned in issue #69992. #70017
Fixed the issue where paddle.arange produced incorrect results for large integers. #70188
Fixed incorrect propagation in paddle.max and paddle.min when the input contained NaN values. #70049
Fixed issues with APIs such as paddle.linalg.svd and paddle.linalg.any when handling 0-size Tensors. #70235, #70489, #70047, #70103, #70127, #70098, #70077, #70130, #70254, #70125, #70342, #70369, #71094, #71089, #71185, #70537, #70481
Fixed some type hint annotation and documentation issues. #65429, #65496, #65461, #65542, #65575, #65545, #65609, #65644, #65700, #65697, #65719, #65639, #65742, #65891, #65877, #65895, #66007, #66679, #66680, #66676, #66677, #66884, #67288, #67302, #66978, #67295, #67520, #67421, #67529, #67536, #67618, #67661, #67698, #67800, #67933, #67893, #68108, #67927, #68322, #68341, #68415, #68372, #68559, #68598, #68708, #68780, #68992, #68989, #68895, #69014, #69139, #68996, #69090, #68922, #69333, #69141, #69609, #69652, #69715, #69716, #69934, #70253, #70297, #70252, #70468, #70102, #70546, #70616, #70582, #70635, #70499, #70755, #70935, #71133, #71172, #71238, #71230, #71394
Document optimization¶
Enhanced several API documents to make them easier to read and understand. #67772, #69895, #65904, #66480, #66974, #67100, #66991, #67287, #67841, #68206, #68305, #68462, #67061, #66503, #68856, #68866, #68768, #69215, #69449, #69396, #69498, #69413, #69404, #69729, #69749, #69266, #69989, #70209, #70128, #70143, #69874, #70242, #70145, #70813, #71046
2. Basic execution architecture¶
PIR has been fully implemented and is enabled by default, supporting one-click dynamic-to-static conversion and ensuring excellent performance and good scalability of the framework.
Bug Fixes¶
Fixed accuracy issues caused by parameter configuration. #65814
Fixed bugs related to save/load. #65268, #65359, #65373, #65314, #65446, #65476, #66891, #66931, #65978, #67654, #67906, #68723, #71452, #71457, #67819, #68120, #68300, #68315, #68743, #68744, #69585, #71165, #71400
Skip/fix failed unit tests in PIR mode, including scenarios such as Windows and XPU. #65690, #65759, #65730, #65760, #65833, #65834, #65856, #65886, #65899, #65932, #65998, #65953, #65997, #66061, #66111, #66137, #66073, #66203, #66227, #65744, #66234, #67487, #67561, #67584, #67742, #69832, #65885, #66709, #66734, #66959, #67399, #67389, #67230, #67403, #67619, #67662, #67902, #67382, #67430, #67517, #67533, #67573, #67468, #67640, #67667, #67716, #68386, #67234, #67266, #67362, #67631, #68081
Fixed bugs related to dynamic graphs. #65619, #69163, #68862, #68164, #69867
Fixed kernel operation-related bugs, including issues with operation positions and null pointers. #66334, #67931, #70353
Fixed the bug related to the transition from dynamic to static. #67617, #67936, #68938, #68734, #69010, #69408, #69461, #69699, #69774, #69803, #69853, #70510, #70830, #70904, #70913, #71040, #71048, #71106, #71201, #71216, #71223, #71296, #71385, #71505, #66934, #71096, #71144, #71430, #71437, #71473, #71412, #65648, #67853, #66543, #68229, #70846, #67532
Fixed other bugs, including issues related to backpropagation gradient calculation, memory copying, and executor errors. #65493, #65678, #65673, #65794, #66358, #66875, #67339, #67465, #67754, #67835, #67892, #67967, #67952, #68036, #68063, #68128, #68151, #68140, #68167, #68200, #68325, #68376, #68539, #68530, #68637, #68639, #68688, #68751, #68806, #68810, #68779, #68811, #68844, #68790, #68870, #68960, #68999, #69036, #69188, #69234, #69375, #69399, #69538, #69603, #69633, #69765, #69768, #69821, #70091, #70123, #70147, #70201, #70198, #69815, #70420, #70377, #70552, #70545, #70595, #70836, #70771, #70922, #70969, #70926, #71117, #71151, #71194, #71234, #71339, #71445, #66350, #66533, #66622, #67721, #67700, #69207, #69615, #69785, #67805
Function optimization¶
Support save/load. #65296, #65671, #66231, #66185, #66722, #66863, #67057, #68101, #68628, #66359, #68481
Optimize the compilation process of custom operators. #67615, #67659
Support for composite operators. #69121, #69144, #70204, #71098, #71335
Support for custom devices. #70909, #71294, #71362, #71010, #71036, #70637, #71085
Execution support for other scenarios. #65050, #65664, #65741, #65786, #65499, #66441, #67668, #68199, #69088, #70199, #70308, #70709, #70937, #71066, #71079, #71121, #71136, #71205
New Features¶
SOT adapts to Python 3.13 bytecode, supporting static graph conversion (SOT mode) under Python 3.13; see the sketch after this list. #68071, #69126, #69131, #69196, #69232, #69253, #69267, #69412, #69431, #69432, #69436, #69557, #69567, #69700, #69707, #69735, #69738, #69744, #69753, #69887, #69920, #69950, #70319, #70927
Adapted PIR forward execution. #65335
Support save/load. #67910
Adapted to pylayer. #70335
Optimize the logic under PIR. #67961
Support for other scenarios. #68344, #70071, #70291, #70752, #70812, #71033
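A minimal sketch of SOT-mode conversion, assuming the full_graph=False option of paddle.jit.to_static selects the SOT (partial-graph, fallback-capable) path as in recent releases; the function body is only an illustration:

```python
import paddle

# full_graph=False (assumed SOT switch) lets unsupported Python syntax
# fall back to dynamic execution instead of failing whole-graph capture.
@paddle.jit.to_static(full_graph=False)
def approx_gelu(x):
    return 0.5 * x * (1.0 + paddle.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

print(approx_gelu(paddle.rand([8])).shape)  # [8]
```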
Security Issues¶
Introduced approval rules for IR (Intermediate Representation) save/load operations to enhance security and governance during model serialization. #65737
Developer¶
Fix issues in dynamic-to-static conversion. Improve overall graph conversion success rate and optimize inference export experience. #65291, #66153, #66379, #66557, #67021, #67482, #67495, #67981, #68030, #68078, #68328, #68442, #68679, #68850, #68892, #68991, #69043, #69097, #69210, #69295, #69428, #69518, #69642, #69940, #70118, #70169, #70218, #70287, #70412, #71099, #71156, #71193, #71336, #71463, #71476, #71503
Inplace strategy upgrade. #65491
Control flow related development. #67251
Add environment variables. #68467
Support sparse operator operations. #67111
Other execution support development, including logic optimization, version adaptation, and adding unit tests. #69241, #69806, #70768, #66829, #67110, #67442, #67041, #67452, #69061, #69307, #68669, #69829, #70003, #70443, #70364, #71495
Performance optimization¶
Optimize dynamic shape handling in static graph conversion, reducing graph construction iterations and compilation time. #65235, #65477, #65517, #65882, #66346, #66746, #67786, #67876, #68113, #68302, #68337, #68616, #69354, #70009, #70877
End-to-end performance optimization for SOT, minimizing subgraph fragmentation, reducing scheduling overhead, and improving static training efficiency. #67591, #67746, #67823, #67890, #67921, #68031, #68153, #68729, #69249, #69263, #69300, #69313, #69325, #69353, #69411, #69506, #69672, #69746, #69834, #69836, #69852, #69975, #70151, #70293, #70405, #70851, #71039, #71254, #71295, #71298, #71346, #71377, #71407
Optimize the performance of dynamic shape scenarios. #68491, #68629
Accelerate the execution speed of PIR executor. #69513
Optimize PIR saving and loading performance. #69683
Optimize for device. #69676
Clean up redundant input and output information. #66278
Discontinued Features¶
Remove outdated test cases. #66269, #66690, #67505, #67464, #68400, #68178, #68194
Clean up obsolete flags and configurations. #69124, #69176, #69274, #68384
Cleaned up PIR redundancy strategy and single test. #66366, #70534, #68444, #70599, #68801, #66303, #67854, #70795
Discard the related unit tests and APIs for dynamic-to-static conversion. #66421, #68251, #68252, #68253, #68254, #68409, #70569, #71279
Discard the related unit tests for automatic parallelism. #67857, #67862, #67995, #68012, #68013, #67798
3. Compiler architecture¶
The CINN compiler has seen comprehensive improvements in completeness and performance. In this version, we have carried out thorough optimizations across the compiler's front end and back end, including the addition of an automatic re-compute mechanism for backward computation graphs, front-end Pass performance optimization, upgrades to the symbol derivation mechanism, operator fusion strategy optimization, and enhancements to the back-end Schedule strategies and subscript expression simplification capabilities. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically enhancing the compiler's general optimization capabilities. When the CINN compiler is enabled for the PaddlePaddle PaddleX series models, over 60% of the models show significant performance improvements compared with dynamic graph mode.
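From user code, the compiler is typically enabled through the dynamic-to-static entry point; a minimal sketch, assuming the backend="CINN" option of paddle.jit.to_static and a CINN-enabled build (the toy network is a placeholder):

```python
import paddle

class MLP(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc1 = paddle.nn.Linear(64, 64)
        self.fc2 = paddle.nn.Linear(64, 64)

    def forward(self, x):
        return self.fc2(paddle.nn.functional.gelu(self.fc1(x)))

model = MLP()
# backend="CINN" asks the static-graph path to compile subgraphs with the CINN compiler.
compiled = paddle.jit.to_static(model, backend="CINN", full_graph=True)
out = compiled(paddle.rand([16, 64]))
print(out.shape)  # [16, 64]
```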
New Features¶
New hardware backend support: Added support for two new backends, HIP and SYCL. (#65146, #65329, #69554, #71204, #65438, #66476, #66620, #67813)
Added support for manually specifying information such as numerical ranges and equality constraints for symbolic dimensions in inference scenarios. (#67628, #67384)
Function optimization¶
Performance optimization¶
New backend optimization strategies such as GridReduce, loop merging, Transpose tuning, and automatic vectorization have been added, significantly enhancing Kernel performance across various dimensional spaces and under different hardware configurations in all scenarios. (#67236, #68897, #69409, #65336, #66419, #68338, #68364, #71087, #68019, #68122, #65187, #66742, #67083, #68667, #68750, #69376, #69350, #69740, #68918, #70092, #69607, #69794, #70258, #70547, #70581, #70649, #69732, #70786, #70942, #71014, #71263, #71249, #71340, #71301, #71380)
Optimize operator fusion strategies, upgrading various strategies including horizontal fusion, multi-downstream fusion, Reshape alignment fusion, etc., to further enhance the fusion capabilities of operators and improve end-to-end optimization performance. (#66034, #67829, #68171, #69478, #69691, #70665, #71103, #70873)
The simplification capability of backend subscript expressions has been upgraded, supporting the simplification of complex expressions with dynamic and static dimensions, significantly reducing the subscript computation overhead in the generated backend Kernel. (#68011, #68617, #68624, #68685, #68220, #68720, #68753, #68986, #68987, #69071, #69164, #69282, #69522, #69857, #70208, #70355, #70427, #70450, #68737, #70500, #70953, #70933, #71026, #70456, #70257, #70461, #70142, #71018, #71278)
A new automatic Re-Compute mechanism for reverse computation graphs has been added, which can effectively reduce model training memory usage and improve performance. (#69342, #70255, #68241, #69954, #70832)
Optimize the backend Host and Device code compilation process to reduce compilation time and improve the processing performance of branches in the Broadcast scenario. (#65669, #65916, #66109, #65611, #65990, #66088, #66207, #66537, #66768, #70685, #71410, #66062)
Improved and upgraded the mechanisms for symbol derivation, simplification, and caching in dynamic dimensions, added symbol derivation interface implementations for all conventional operators (580+), and provided more constraint information for Kernel compilation.(#65343、#66582、#65500、#65591、#66637、#68208、#68056、#68015、#68096、#68236、#68973、#68967、#69133、#68550、#68882、#69005、#69911、#70376、#71153、#66644、#66650、#66642、#66729、#66838、#66762、#66580、#66612、#66625、#66643、#66837、#66946、#67018、#67049、#66956、#67008、#66930、#66877、#66896、#67120、#67117、#67098、#67136、#67294、#67327、#66827、#67201、#66892、#67377、#66619、#67037、#67412、#67394、#67374、#67418、#67348、#67337、#67390、#67407、#67491、#67422、#67461、#67458、#67486、#67490、#67462、#67364、#67435、#67665、#67426、#67507、#67730、#67776、#67806、#67803、#67788、#67705、#67814、#67858、#67751、#67875、#67663、#67434、#67818、#68180、#68547、#68548、#68670、#68964、#68929、#68907、#68917、#68984、#68644、#69167、#68975、#68947、#68978、#68980、#68979、#69329、#69055、#69331、#69414、#69335、#69017、#69344、#69069、#69698、#69919、#69964、#70337、#70282、#70741、#70818、#71031、#70541、#66609、#66889、#66633、#66735、#66935、#66627、#66730、#67210、#67115、#67275、#67472、#67577、#67328、#67566、#67451、#68098、#68225、#68177、#68102、#67951、#67957、#68235、#68447、#68446、#68183、#68318、#68385、#67635、#65623、#65956、#66063、#65992、#65880、#66343、#65889、#66606、#66618、#66737、#66607、#66579、#66732、#66849、#66400、#66952、#66570、#66967、#66595、#67121、#67206、#67444、#67494、#67499、#67267、#67567、#67455、#67161、#67581、#67539、#67625、#67690、#67454、#67731、#67734、#67735、#67607、#67413、#67387、#67882、#67864、#67503、#67861、#67888、#67884、#67826、#68044、#67851、#68276、#69888、#70093、#70436、#70914、#71222)
Optimized some front-end passes to enhance the robustness of the front-end processing flow and improve the performance of computationally intensive subgraphs. (#65142, #67466, #69228, #70994, #71226, #71297, #71443)
Designed new backend IR basic components and related Pass interfaces to provide a more concise and efficient way of developing optimization strategies. Through automatic pruning strategies, it can effectively reduce the traversal overhead of backend IR. (#70485, #70765, #71042, #70952, #69454, #70361, #70334, #70406, #70191, #70462, #70548, #70592, #70437, #70619, #70543, #69611, #70739, #70533, #70696, #70498, #70829, #71111, #70883)
Bug fixes¶
Fix some bugs in the derivation and implementation logic of operator symbols. (#65185, #65231, #65266, #65951, #67142, #67286, #65958, #65955, #66470, #66764, #66036, #66662, #66741, #66745, #66807, #66791, #66859, #66880, #66962)
Fixed bugs in the lowering of some special operators to the compiler. (#68698, #68699, #68691, #68948, #70144, #70895)
Fixed the issue of errors reported in some scenarios when integrating operators. (#67038, #67400, #67655, #67723, #68029, #68042, #68888, #69250, #69937, #70924)
Fix the correctness issue of the backend when handling extreme values, and improve the robustness of the compiler. (#68327)
Fixed implementation logic bugs in the backend Schedule and post-processing tuning process, resolving errors and performance issues in some cases. (#68605, #68937, #68587, #69060, #69608, #71471, #71068)
Resolved the issue of randomness in the operator fusion process. (#69547, #70931)
4. Automatic parallel architecture¶
In the official 3.0 version, we have conducted in-depth verification and refinement of the automatic parallel architecture to better support the pre-training + fine-tuning workflow for common large model scenarios such as text-only dense models, text-only sparse models (MoE), and multi-modal understanding models. Specifically, we have added sharding derivation rules for over 20 operators used in these scenarios, and we support converting automatic-parallel training parameters into manual-parallel parameters for downstream inference, making automatic parallelism fully usable and helping users reduce the development cost of large model parallel programs. Additionally, to further simplify the distributed development workflow, we have introduced a new paddle.distributed.parallel interface. Built on top of the distributed tensor markup syntax, it lets users configure common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism non-intrusively, outside of the model networking code. Furthermore, the static-graph automatic parallel architecture has been comprehensively upgraded on top of PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly on the extended PIR DistDialect. This further enhances the consistency of automatic parallelism between dynamic and static graphs, achieving performance on the Llama series models that is on par with, or even surpasses, manual parallelism.
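To give a concrete sense of the tensor-sharding markup that these derivation rules operate on, here is a minimal sketch using paddle.distributed.ProcessMesh and shard_tensor; the mesh shape and placements are illustrative, and the script is meant to be launched with paddle.distributed.launch across the devices in the mesh:

```python
import paddle
import paddle.distributed as dist

# A 1-D mesh over two devices, with the axis named "dp" (illustrative).
mesh = dist.ProcessMesh([0, 1], dim_names=["dp"])

w = paddle.rand([1024, 1024])
# Mark the weight as sharded along dim 0 across the "dp" mesh axis; the framework
# derives the sharding of downstream ops and inserts the needed communication.
w_dist = dist.shard_tensor(w, mesh, [dist.Shard(0)])

x = paddle.rand([8, 1024])
x_dist = dist.shard_tensor(x, mesh, [dist.Replicate()])

y = paddle.matmul(x_dist, w_dist)  # sharding of y is derived automatically
```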
New Features¶
Added the paddle.distributed.parallel interface to support configuring common parallel strategies outside of model networking, simplifying the distributed development process. #69004, #69033, #69077, #69136, #69169, #69212, #69217, #69283, #69288, #69326, #69365, #69384, #69426, #69443, #69462, #69492, #69628, #69677, #69697, #69776, #69896, #70138, #70182, #70539, #71116, #71210
For text-only sparse scenarios, MoE expert parallelism is supported, a conversion mechanism from expert parallelism to mesh sharding is implemented, and automatic invocation of all2all communication is supported. #66462, #66750, #68004, #68053, #68187, #68477, #69098, #69262, #69296, #70715, #71292, #71320
To meet the needs of users who manage sharding status and communication operations manually in extreme optimization scenarios, and to address the inability to use the tensor sharding syntax in some non-SPMD scenarios, added the LocalLayer interface to support hybrid networking of automatic and manual parallelism. #70519, #70525, #70600, #71232, #71264, #71373
To enable users to run automatic parallel programs on domestic hardware, completed the adaptation for Kunlun chips; support for other chips is underway. #70997, #71126, #71229, #71289, #71425, #71500
For situations where the data dimension cannot be divided evenly by the device dimension, non-balanced splitting derivation and splitting transformation are supported. #66103, #67756, #69265, #70072
The shard_dataloader function has been upgraded to support setting the gradient accumulation step count through batch_sampler, and to support scenarios with multiple model inputs. #65325, #70659
Upgraded the parameter saving and loading functions, supporting asynchronous storage of parameters, mutual loading of master_weight between dynamic and static graphs, as well as parameter version control and offload. #66858, #67427, #70105, #70639
To meet users’ needs for converting dynamic networking that involves PyLayer to static graphs, added support for PyLayer in static graph mode, allowing distributed tensors to be used within PyLayer. #67326, #68190, #69089, #70831
To address incorrect dynamic-to-static conversion caused by inconsistency between the data stream input format and the input_spec actually required by the model, the dynamic-to-static conversion interface supports a user-defined input_spec, allowing users to supply the required input_spec themselves. #69183
For hybrid parallel scenarios, the gradient clipping strategy has been adapted and supported. #65259, #65928, #69287, #69760, #71421
For scenarios where the number of model layers is not divisible by the number of devices, a non-balanced pipeline parallel strategy is supported, allowing users to split different numbers of network layers at different pipeline stages. #69728, #70164, #70230
Added set_mesh and get_mesh interfaces to enable users to easily set and retrieve the global mesh. #69999
Added automatic and manual parallelism accuracy alignment switches to facilitate the conversion of existing manual parallelism models to automatic parallelism and verify the accuracy of the results. #67681
Functional improvements¶
Improved and optimized operator sharding derivation rules:
Added sharding derivation rules for the add_n, split, and softmax_grad operators. #65606, #69439
Added sharding derivation rules for the assign and embedding_grad operators. #67457
Added the sharding derivation rule for the clip operator. #70632
Added sharding derivation rules for the dist_stack and gather_nd operators. #65426
Added the sharding derivation rule for the dropout operator. #70216
Added the sharding derivation rule for the fused_dropout_add operator. #67722
Added the sharding derivation rule for the fast_ln custom operator. #68148
Added sharding derivation rules for the greater_equal and less_equal operators. #68868
Added sharding derivation rules for the greater_than and less_than operators. #68133
Added the sharding derivation rule for the if operator. #69357
Added sharding derivation rules for the logical_and, logical_not, logical_or, and logical_xor operators. #67840
Added the sharding derivation rule for the logsumexp operator. #67840
Added the sharding derivation rule for the non_zero operator. #67996
Added the sharding derivation rule for the pad operator. #68304
Added the sharding derivation rule for the p_norm operator. #68317
Added the sharding derivation rule for the scatter_nd operator. #67980
Added the sharding derivation rule for the sigmoid operator. #71092
Upgrades to the PIR-based static graph automatic parallel architecture:
Upgrades to Automatic Mixed Precision (AMP) training. #65089, #65892, #66418, #66674, #68545
Upgrades to the parameter slicing parallel strategy. #63542, #67748, #68288, #68314, #69059, #71167
Upgrading the pipeline parallelism strategy. #66810, #67174, #67522, #68141, #68742, #68962, #69052, #69201, #69244, #69578, #69584, #69654, #69799, #69894, #70360, #70615
Gradient accumulation strategy upgrade. #66641, #67254, #67907, #68391, #68460, #68472, #68664, #68727, #69171, #69805
Operator fusion strategy upgrade. #68087, #68207, #68383, #68623, #68650, #68736, #69103, #70889
The tensor_fusion optimization strategy has been upgraded. #66130, #68475, #69243, #69560, #69823, #70195, #70309, #70363, #70869
Tensor parallel optimization strategy upgrade. #68182, #68389
Upgrade of custom operator segmentation derivation mechanism. #67614
Upgrades to the parameter saving and loading mechanism. #66416, #67045, #67369, #68203
Optimize computation graph compilation time. #68796
Bug fixes¶
Fixed bugs in the segmentation derivation mechanism and the segmentation derivation rules for several operators. #65702, #65835, #66098, #66955, #67052, #67059, #67101, #67283, #67729, #67996, #68413, #68455, #68533, #68976, #68977, #69027, #69203, #69223, #69862, #69991, #70100, #70624, #71024, #71152, #71214, #71253, #71388
Fixed several bugs in the segmentation conversion mechanism. #65060, #65820, #67630, #67809, #68115, #68468, #70023
Fixed the bug of incorrect derivation of shard_degree in parameter sharding parallelism. #68781, #69214
Fixed issues in scenarios such as inconsistent results between dynamic and static graphs in shard_dataloader, sharding of dict-type data, and custom sampler usage. #65262, #66096, #66882, #69620
Fixed the bug where the recompute setting with use_reentrant=false was incompatible with parameter sharding. #65188
Fixed bugs in the parameter loading and saving functions. #66266, #69764
Fixed bugs in operators such as Conv2D, fill_constant, flash_attn_grad, reduce_scatter, if, tuple_push, and tuple_pop. #67587, #68008, #68586, #68589, #69519, #70207
Fixed bugs in communication operators such as reduce_scatter, p_send, and p_recv. #67386, #71433
Fixed the bug where GPU memory was automatically allocated when converting uninitialized distributed tensors to NumPy arrays on some devices. #66361
Fixed the bug that triggered data copying when calling to_tensor on non-sharded tensors. #67169
Fixed the bug related to the sharding of the scaler parameter. #68289
Fixed the accuracy issue of enable_delay_scale_loss. #68525
Fixed the hang issue caused by different creation orders of communication groups. #68847
Fixed the bug of incorrect op_role settings in static graph scenarios. #67850, #67986, #68156
Fixed the bug where the output variable of the random number operator could not be sharded in static graphs. #67589, #67750, #68067
Fixed the bug where the graph cache mechanism failed in static graphs. #68488
Fixed the index out-of-bounds bug in paddle.distributed.to_distributed. #70174
Fixed a bug in the pipeline parallel visualization tool. #71386
5. Operator mechanism¶
Operator-related PRs, covering the decomposition of composite operators, the adaptation of operator kernels for new hardware, sparse operator support, and the retirement of legacy IR operators, lay the foundation for the PIR-based compiler and for achieving performance advantages across multiple hardware platforms. Standardization of the operator system has optimized the code structure, reduced technical debt, and improved maintainability.
New Features¶
Support the splitting of combinatory operators. #65148, #65007, #65482, #65006, #65692, #65961, #65968, #65967, #66510, #66795, #66835, #67151, #67342, #67481, #67502, #67606, #67757, #67775, #67891, #67790, #67965, #67968, #68168, #68125, #68228, #68295, #68353, #68357, #68827, #68834, #69239, #68817, #69108, #69373, #69372, #68829, #69684, #68818, #68835, #69838, #69998, #69675, #70367, #70080, #71352, #66450, #67593, #67988, #68346, #68399, #68319, #68485, #68961, #68575
Support for XPU-related operator computations. #65684, #65976, #68497
PIR supports sparse operators. #62663, #67885, #67976, #68261, #68326
Support manual Recompute. #65879
Implement the kernel and register the operator. #63130
Added the dynamic-graph second-order backward decomposition for acos. #70409
Support initialization and computation of 0-size tensors. #70504
Bug Fixes¶
Fixed bugs related to composite operators. #70250, #67170, #71218, #69095, #70189
Fixed save/load-related bugs. #69153
Fixing issues during the invocation and execution of other operators, including type matching, type inference, parameter type support, etc,. #65360, #65024, #66308, #67085, #67285, #67076, #67547, #68007, #68527, #68549, #68543, #68604, #68741, #68859, #69025, #69065, #69405, #69688, #69912, #70177, #70517, #70596, #70788, #70870, #71332, #71454, #71442, #71499, #67459, #68470, #70206
Others¶
Optimize code style. #68536
Fix spelling errors. #67456, #66673, #68702, #68735, #68718, #70700, #70682, #70670, #70241, #69626, #70051, #67764, #68872, #70055, #67954, #67404, #69273, #66981, #68145, #69148, #69145, #69168, #68940, #70344
Modify the interface documentation. #69378
Replaced operator and parameter naming under the fluid operator system. #69345, #69382, #69484, #69444
Discarded¶
Retired the xshape output. #66769, #67009, #67152, #67172, #67355, #67373, #66089
Remove the obsolete operators, their kernels, related unit tests, and related calling codes under the fluid system. #67370, #67088, #67324, #67666, #68058, #68311, #68358, #68312, #68355, #67528, #68316, #68356, #68397, #68441, #68417, #68567, #68583, #68649, #68331, #68730, #69754, #69445, #69921, #70268, #69446, #69544, #70272, #69745, #70300, #70388, #70421, #70302, #70445, #69275, #69081, #70588, #67778, #67953, #68093, #68092, #67684, #69665, #67915, #67917, #68403, #68404, #68969, #68953, #68954, #68942, #68950, #69381, #69380, #69448, #69680, #69775, #69812, #69840, #69828, #69742, #69923, #69922, #69904, #70002, #70054, #70052, #70053, #70713, #70718, #70718, #70717
Remove the deprecated API of combination operators. #69873, #69309
Improvement¶
Supported more data types. #69143
Update xpu interface. #69800
Improved operator printing functionality. #69916
Upgraded the normalize operation to support more scenarios. #70152
Extended group_norm to handle cases where the rank is greater than 5. #68774
Improved the usage of backward_blacklist. #69356
6. Framework performance optimization¶
PRs related to performance optimization, encompassing optimizing operator performance, enhancing kernel performance, optimizing memory usage, and refining namespaces, all aim to provide users with a superior development experience.
New Features¶
Functional improvements¶
Bug Fixes¶
Fixed bugs related to PIR, CINN, SOT, OneDNN, etc. #68951, #69553, #69682, #67741, #69346, #69401, #68903
Fixed bugs related to composite operators. #69479, #69487, #67176
Fixed the issue with the FP8 data type on the CPU. #65539
Remove unnecessary overhead for creating events in computational flow. #67315
Fixed performance issues. #68378
Fixed issues related to types. #69720
Fixed other issues. #70019, #70008, #70645, #71209, #68152, #69907, #71207
Performance optimization¶
Optimizations related to the CINN compiler. #69455, #70284, #67576, #68946, #68615
Memory-related optimizations. #68660, #69930, #68174, #68660, #70359
Kernel computation-related optimizations. #65507, #68541, #71479, #71403
XPU-related optimizations. #67051
Other optimizations include pass optimization of the inference process, dynamic shape optimization in automatic parallelism, and FlashAttention computation optimization. #68394, #68696, #68759, #68791, #69390, #69961, #69939, #70455, #70663, #71290
Others¶
7. Inference Deployment¶
Focusing on two core directions, building out the ecosystem of the new-generation intermediate representation (PIR) and optimizing large model inference, the main breakthroughs include:
Deep integration of PIR and TensorRT
Completed refactoring and code optimization of the core execution mechanism, and developed over 50 operator converters
Added low-precision support (FP16/INT8) and Generic Plugin execution capability
Built a complete unit-testing system covering the entire model loading/saving process
Leap in large model inference performance
Added full-process support for the Mixture of Experts (MoE) system, covering Hopper architecture optimization
Supports processing of 128K ultra-long sequences, enhancing long-text inference capabilities
Implemented cutting-edge quantization schemes such as FP8 and W8A8 to reduce memory usage
Comprehensive upgrade of infrastructure
OneDNN has been upgraded to version 3.6, significantly enhancing CPU inference performance
Model loading speed optimized by over 40%, supporting fast loading of PIR models
Improved distributed inference support and fixed allreduce data type issues
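A minimal sketch of loading an exported PIR-format model with the Paddle Inference Python API; the file names are placeholders for a model saved with paddle.jit.save (PIR programs are stored as .json alongside the .pdiparams weights, as noted below):

```python
import numpy as np
import paddle.inference as paddle_infer

# Placeholder paths for a model exported with paddle.jit.save.
config = paddle_infer.Config("./inference/net.json", "./inference/net.pdiparams")
predictor = paddle_infer.create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 16).astype("float32"))

predictor.run()

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)
```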
New Features¶
Support Paddle-TensorRT based on PaddlePaddle’s new generation of intermediate representation (PIR)
Development of core basic execution mechanism functions and code optimization. #64995, #67054, #67660, #67755, #70762,
Development of operator Marker and Converter. #67753,#67956,#68084,#67974,#68395,#68216,#68529,#68608, #68663,#68757,#68614,#68783,#68775,#68839,#68686,#68840,#68941,#69015,#69038,#69117,#69208,#69315,#69261,#68878,#69705,#69706,#70170,#70267,#70429,#69330,#70507,#70535,#70667,#70816,#70826,#70955,#71028,#71013,#71157,#71231,#69199,#68956,#66658,#66811,#67519,#67877,#68090,#69086,#68787,#68778,#69318,#69995,#70325,#70817,#70879,#70875,#71041,#68876
Support for Generic Plugin execution function. #66634, #70251
Low-precision (FP16, INT8) function support. #69597, #71127,
Auxiliary functions such as the unit test system and Pass usage support have been improved. #67525, #68034, #71281, #71235, #67568, #70139, #70529
Large model inference optimization
Added fused_moe function support (basic support/non-standard TopK/Hopper architecture) #66084, #67425, #67732
Support for mixed precision computation (GQA mixed precision/BF16 registration) #65078, #67769
Added inference optimization features (dynamic graph inference/support for 128K long sequences) #65962, #70088
Added implementation of quantization inference operator (FP8 W8A8 computation/weight-only int4 quantization) #65441, #64094
Functional improvements¶
Inference functional mechanisms have been fully established under PIR
The executor supports loading .json models #65223
Support controllable PIR mode switch-on/off #65596
Improved inference mechanisms for large models
Optimized gemm algorithm search (cublaslt global search/offline caching) #65597, #66132
Enhance type system compatibility (PD_VISIT_FLOATING_AND_HALF_TYPES) #71022
Optimized attention mechanism (support for multiple blocks of MMHA/XPU) #67211, #68104
Performance optimization¶
Bug fixes¶
Fixed issues related to Predictor when saving/loading PIR models. #65180, #65019, #65714, #69619, #67570, #65595, #69200
Fixed execution issues of inference unit tests in scenarios such as PIR and multiple hardware configurations. #65763, #66481, #67105, #67248, #67470, #67638, #68135, #68191, #68211, #68160, #68185, #68127, #68887, #69191, #70961, #68020, #67923, #67963, #68482, #68546, #68593, #68793
Fixed issues related to Paddle TensorRT conversion and execution. #66932,#66655,#67274,#67504,#65780,#68170,#68647,#68776,#69573,#69598,#69510,#69864,#69885,#70161,#70116,#70791,#70801,#70824,#70939, #71143,#71154,#71163,#71183,#71233,#71287,#71319,#67720,#69671,#70168,#69957
Fixed issues related to Paddle Inference compilation and linking. #65846, #67081, #63184
Fixed quantization issues. #67839, #68049, #70099, #64878, #65717, #67552, #67715
Fixed OneDNN inference issues. #67836, #68021, #68132, #71426, #68057
Fixed OpenVINO-related issues in Paddle Inference. #70212, #70288
Fixed issues related to Pass. #65349,#65421,#65677,#66850,#67443,#67620,#68158,#68642,#68837,#68880,#68935,#69112,#69205,#69242,#69352,#69421,#69690,
Fixed issues related to fused_moe (testing/GEMM/WINT4/multi-architecture compatibility/Bias optional) #67353, #67396, #67717, #67794, #67783
Fixed issues in the block_attention series (GQA discrepancy/out-of-bounds risk/multi-head support) #67175, #69001, #70763
Fixed PIR-related issues (layout conversion/BF16 replacement errors) #66977, #67830
Fixed distributed-related issues (allreduce data type/parameter synchronization) #67449, #69157
Fixed kernel execution issues (forward-backward conflict/default stream argsort) #67218, #68374
Other key fixes (reducing the size of the C++ library/fixing RoPE calculation in NeoX format/fixing static graph execution) #66041, #66583, #67580
8. Hardware adaptation¶
Continuously improve and upgrade support for platforms such as Kunlun XPU and Haiguang DCU to enhance the user experience.
New Features¶
The addition of operations (ops) and improvement of functions on Kunlun Core XPU involve the following ops: flash attention/flash_attn_unpadded, multinomial, matmul, repeat_interleave, logsumexp, index_put_grad, mean_grad, pow, pow_grad, rsqrt, full, rms_norm, rms_norm_grad, put_along_axis, Cumsum, argmin, masked_select/grad, expand_v2/grad, all2all, expand, reduce_sum, reduce_max, reduce_min, moe, fused_linear_param_grad_add, adamw, clip/clip_grad, tan, acos, blha_get_max_len, gather/gather_grad, scatter/scatter_grad, round, index_select/sindex_select_grad, isfinite, isinf, quantize_linear, dequantize_linear, conv3d_transpose, logsumexp_grad, index_add_grad, eye, gather_element, tril, triu, set_value_grad, argmax, take_along_axis, etc #65413, #64846, #65656, #65963, #66143, #66482, #66585, #67077, #67173, #67551, #63989, #67919, #68052, #68176, #68408, #68454, #68478, #68473, #68453, #68770, #68933, #69042, #68713, #69368, #69723, #69767, #69898, #69970, #69771, #70176, #70428, #70573, #70576, #70633, #70114, #70627, #71038, #71132, #71228, #71274, #71364, #71375, #71431, #71451, #67585, #67637, #67914, #67641, #67913, #67955, #68411, #68560, #68423, #68894, #71053, #71047, #69056, #70843, #65653, #68023, #67780, #68622, #67215
Add support for rocsolver and warpctc on Haiguang DCU, and carry out the addition of OPs and improvement of functions. The involved ops include: flash_attention, hipblaslt, fastgelu, multiclass_nms3
Bug fixes¶
Bug fix for OP on Kunlun Core XPU #65020, #65251, #65418, #65387, #65525, #65613, #65533, #65705, #65915, #66238, #66485, #67349, #67372, #67276, #67460, #67496, #67530, #67828, #68010, #68157, #68172, #68388, #68213, #68501, #68504, #68585, #69229, #69374, #69424, #69440, #69614, #68542, #69990, #70351, #70479, #70431, #70638, #70856, #70974, #70973, #71027, #71062, #71115, #71110, #70858, #71147, #71212, #71361, #71423, #70859, #71492, #71493, #69826, #67341, #68906, #71171
Bug fix for OP on Haiguang DCU #69617, #65716, #66630, #65399
Performance optimization¶
Kunlun Core XPU upgrades the functions of basic components such as streams and optimizes the performance of certain operations. #65102, #69727, #69899, #69942, #70025, #70640
9. Environment update¶
We optimized the framework’s stability and cross-platform compatibility, fixed issues related to test coverage and compilation environment compatibility, and enhanced support for multiple platforms such as Windows, XPU, and DCU. Simultaneously, we streamlined the code structure, removed obsolete code and unnecessary dependent libraries to reduce maintenance costs, upgraded key dependencies such as CUDA, further optimized the CI/CD process, improved build speed, and enhanced overall system stability.
Bug Fixes¶
Improve the CI/CD process, fix test cases, resolve compilation and installation issues in different environments, and enhance the stability and cross-environment compatibility of the framework. #65627, #65736, #65900, #66069, #67000, #67312, #67432, #67540, #67670, #68449, #70806, #65665, #65652, #70644, #68119, #68466, #68858, #68788, #68934, #69883, #69924, #71187, #70798, #71248, #70512, #71363, #71438, #71291
Improvement and Upgrade¶
Environmental upgrade #69491, #66560, #65686, #71177, #71284, #69791, #69349, #70944, #65411
Improvement of DCU/NPU/KUNLUN pipeline #67516, #67629, #67987, #69903, #68448, #70401, #71192, #71197, #68027
Support for Windows environment #70390, #70785, #71286, #71414, #68901
Improvement of third-party libraries #71419
Other optimizations are aimed at enhancing CI stability and execution efficiency #67574, #69058, #70610, #67093, #69037, #65213, #65913, #65947, #66479, #71054, #71396
New Features¶
10. Others¶
Changes unrelated to user usage, including cleanup of obsolete code, code migration, cleanup of unit tests, debugging, or upgrades to monitoring mechanisms.
Discarded¶
Clean up abandoned code and useless unit tests #65894, #66165, #66293, #66102, #66442, #66922, #66531, #65518, #66800, #66372, #65902, #65462, #65327, #65189, #65181, #66535, #65383, #65173, #66429, #66386, #66447, #66367, #66160, #65408, #65433, #65481, #65444, #65389, #65663, #65649, #65629, #66142, #65796, #66163, #66291, #65480, #65495, #65498, #65503, #65502, #65501, #65512, #65528, #65472, #65390, #65344, #65384, #65388, #65198, #65248, #65443, #65430
11. List of contributors¶
0x3878f, 0x45f, 2742195759, 86kkd, A-nnonymous, ADream-ki, Aganlengzi, Albresky, AndPuQing, AndSonder, Aoraki-Dream, ApricityXX, Asthestarsfalll, Aurelius84, BHmingyang, BeingGod, Betelgeu, BiynXu, CJ77Qi, Caogration, DDDivano, Dale1314, Deleter-D, DesmonDay, Difers, Dmovic, DongBaiYue, DrRyanHuang, DrownFish19, Eddie-Wang1120, EgoistSA, FeixLiu, ForFishes, Fripping, From00, Function-Samuel, GoldenStain, Guanhuachen2003, GuoxiaWang, Hanyonggong, HarperCy, Hongqing-work, HydrogenSulfate, JZ-LIANG, Jeff114514, JiaWenxuan, LLee233, LanCole, Lans1ot, Layssy, Leoforever123, LiYuRio, LielinJiang, LittleHeroZZZX, Liujie0926, Liyulingyue, Luohongzhige, Marcusryz, MarisaSparkL, Micalling, MikhayEeer, MrXnneHang, MufanColin, NKNaN, Neo-WY, NeroLoh, PolaKuma, Qin-sx, QingshuChen, RachelXu7, RichardWooSJTU, RuohengMa, SCUcookie, Sekiro-x, SigureMo, Sunny-bot1, SylarTiaNII, Sylence8, TBD1, TR666, TimeYWL, Tom-Zheng, Turingg, Victor-Bayim, Vvsmile, WAYKEN-TSE, Wanglongzhi2001, Wangzheee, Waynezee, Wennie396, Whsjrczr, Wizard-ZP, Wong4j, XavierZXY, XiaociZhang, XieYunshen, Xing-lil, Xreki, YKTian-x2b, YZW-explorer, YanhuiDua, YuanRisheng, ZHOU05030, ZhangHandi, ZhangX-21, ZibinGuo, a2064968462, anderson101866, aooxin, aquagull, baoqiwen, bapijun, blacksheep-Aristotle, bukejiyu, carryyu, ccsuzzh, chang-wenbin, changeyoung98, chen2016013, ckl117, cmcamdy, co63oc, continue-coding, cqulilujia, crazyxiaoxi, cszdrg, cubehan3, cyber-pioneer, danleifeng, decade-afk, deepllz, dynamicheart, eee4017, eggman-1024, enkilee, epiphanyer, ethan-sem, fangfangssj, feixi21, fightfat, fufu0615, fxfxfxfxfxfxfxfx, fxy1699, gitliuyf, gongel, gongshaotian, gongweibao, gouzil, gsq7474741, guixxiic, gzy19990617, hanyang2508, haoyu2022, heavyrain-lzy, houj04, huangjiyi, huangkr03, hxzd5568, icpcccpc, inaomIIsfarell, iosmers, jeff41404, jerrywgz, jiachengdai, jiahy0825, jinmingyi1998, jinyouzhi, joseflv, jychen21, jzhang533, kangguangli, kanze1, kineast, kircle888, l1cacheDell, leo0519, lifulll, linkk08, little1d, liufengwei0103, liuruyan, lixcli, liym27, liyongchao911, lizexu123, lizhenyun01, lj970926, lshpku, lszxb, ltd0924, luotao1, lwkhahaha, lxd-cumt, mayang002, megemini, mikemikimike, ming1753, monster1015, mori0umi, ndyysheep, nizne9, nobodynobody, ooooo-create, penPenf28, phlrain, pkuzyc, qili93, rich04lin, risemeup1, ronny1996, rsmallblue, runzhech, skywalker2012, smile2game, sneaxiy, successfulbarrier, sunzhongkai588, swgu98, tc20042008, tianhaodongbd, tianshuo78520a, tizhou86, tlxd, uanu2002, umiswing, vivienfanghuagood, waliwali777, walkalone20, wanghuancoder, wangna11BD, will-jl944, winffke, winter-wang, wwwuyan, xiaoguoguo626807, xiaoluomi, xiaoyao0115, xingmingyyj, xkkkkkk23, xu8117, xuxinyi389, xz-alex, yangrongxinuser, yeteye, yinfan98, yongqiangma, yuan20041218, yuanlehome, yuguo-Jack, yumin066, zbt78, zeroRains, zhangbo9674, zhanghonggeng, zhanglirong1999, zhangting2020, zhangyk0314, zhangyuqin1998, zhiminzhang0830, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhwesky2010, zoooo0820, zrr1999, zty-king, zxcd, zyfncg