PhD, University of California, Los Angeles, 2014 - 2019
MS, University of California, Los Angeles, 2012 - 2014
BS, Southeast University, 2008 - 2012
Wu, Y., Tang, Y., Zeng, D., Zhang, X., Zhou, P., Shi, Y., & Hu, J. (2024). Efficient Hardware and Software Design for On-device Learning. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Pasricha, S., & Shafique, M. (Eds.). (pp. 371-404). Springer Nature Switzerland. doi: 10.1007/978-3-031-39932-9_15.
Ollivier, S., Li, S., Tang, Y., Cahoon, S., Caginalp, R., Chaudhuri, C., Zhou, P., Tang, X., Hu, J., & Jones, A.K. (2023). Sustainable AI Processing at the Edge. IEEE Micro, 43(1), 19-28. Institute of Electrical and Electronics Engineers (IEEE). doi: 10.1109/MM.2022.3220399.
Tang, Y., Wu, Y., Zhou, P., & Hu, J. (2022). Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), 3910-3921. Institute of Electrical and Electronics Engineers (IEEE). doi: 10.1109/TCAD.2022.3197536.
Tang, Y., Zhang, X., Zhou, P., & Hu, J. (2022). EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization. ACM Transactions on Design Automation of Electronic Systems, 27(5), 1-36. Association for Computing Machinery (ACM). doi: 10.1145/3505633.
Zhang, X., Wu, Y., Zhou, P., Tang, X., & Hu, J. (2021). Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices. ACM Transactions on Embedded Computing Systems, 20(5), 1-24. Association for Computing Machinery (ACM). doi: 10.1145/3477002.
Zhang, C., Sun, G., Fang, Z., Zhou, P., Pan, P., & Cong, J. (2019). Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(11), 2072-2085. Institute of Electrical and Electronics Engineers (IEEE). doi: 10.1109/TCAD.2017.2785257.
Li, Y., Zhao, K., Zhao, J., Wang, Q., Zhong, S., Lalam, N., Wright, R., Zhou, P., & Chen, K.P. FiberFlex: Real-time FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition. ACM Transactions on Reconfigurable Technology and Systems. Association for Computing Machinery (ACM). doi: 10.1145/3690389.
Zhuang, J., Lau, J., Ye, H., Yang, Z., Ji, S., Lo, J., Denolf, K., Neuendorffer, S., Jones, A., Hu, J., Shi, Y., Chen, D., Cong, J., & Zhou, P. CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture. ACM Transactions on Reconfigurable Technology and Systems. Association for Computing Machinery (ACM). doi: 10.1145/3686163.
Yang, Z., Ji, S., Chen, X., Zhuang, J., Zhang, W., Jani, D., & Zhou, P. (2024). Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets. In 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE. doi: 10.1109/asp-dac58780.2024.10473961.
Zhuang, J., Yang, Z., Ji, S., Huang, H., Jones, A.K., Hu, J., Shi, Y., & Zhou, P. (2024). SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration. In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM. doi: 10.1145/3626202.3637569.
Yang, Z., Zhuang, J., Yin, J., Yu, C., Jones, A.K., & Zhou, P. (2023). AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP. In 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE. doi: 10.1109/iccad57390.2023.10323754.
Zhang, C., Sun, G., Fang, Z., Zhou, P., & Cong, J. (2023). Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks. In Proceedings of the ACM Turing Award Celebration Conference - China 2023. ACM. doi: 10.1145/3603165.3607390.
Zhou, P., Zhuang, J., Cahoon, S., Tang, Y., Yang, Z., Chen, X., Shi, Y., Hu, J., & Jones, A.K. (2023). REFRESH FPGAs: Sustainable FPGA Chiplet Architectures. In Proceedings of the 14th International Green and Sustainable Computing Conference. ACM. doi: 10.1145/3634769.3634798.
Zhuang, J., Lau, J., Ye, H., Yang, Z., Du, Y., Lo, J., Denolf, K., Neuendorffer, S., Jones, A., Hu, J., Chen, D., Cong, J., & Zhou, P. (2023). CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture. In Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM. doi: 10.1145/3543622.3573210.
Zhuang, J., Yang, Z., & Zhou, P. (2023). High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives. In 2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE. doi: 10.1109/dac56929.2023.10247981.
Zhang, X., Hao, C., Zhou, P., Jones, A., & Hu, J. (2022). H2H. In Proceedings of the 59th ACM/IEEE Design Automation Conference, 12, (pp. 601-606). ACM. doi: 10.1145/3489517.3530509.
Zhou, P., Sheng, J., Yu, C.H., Wei, P., Wang, J., Wu, D., & Cong, J. (2021). MOCHA. In The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, (pp. 273-279). ACM. doi: 10.1145/3431920.3439304.
Lo, M., Fang, Z., Wang, J., Zhou, P., Chang, M.C.F., & Cong, J. (2020). Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit. In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 523, (pp. 157-166). IEEE. doi: 10.1109/fccm48280.2020.00029.
Chi, Y., Cong, J., Wei, P., & Zhou, P. (2018). SODA. In Proceedings of the International Conference on Computer-Aided Design. ACM. doi: 10.1145/3240765.3240850.
Cong, J., Wei, P., Yu, C.H., & Zhou, P. (2018). Latte: Locality Aware Transformation for High-Level Synthesis. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 1, (pp. 125-128). IEEE. doi: 10.1109/fccm.2018.00028.
Ruan, Z., He, T., Li, B., Zhou, P., & Cong, J. (2018). ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE. doi: 10.1109/fccm.2018.00011.
Zhou, P., Ruan, Z., Fang, Z., Shand, M., Roazen, D., & Cong, J. (2018). Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework. In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE. doi: 10.1109/ispass.2018.00011.
Cong, J., Wei, P., Yu, C.H., & Zhou, P. (2017). Bandwidth Optimization Through On-Chip Memory Restructuring for HLS. In Proceedings of the 54th Annual Design Automation Conference 2017, (pp. 1-6). ACM. doi: 10.1145/3061639.3062208.
Zhang, C., Fang, Z., Zhou, P., Pan, P., & Cong, J. (2016). Caffeine. In Proceedings of the 35th International Conference on Computer-Aided Design. ACM. doi: 10.1145/2966986.2967011.
Zhou, P., Park, H., Fang, Z., Cong, J., & DeHon, A. (2016). Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 24, (pp. 172-175). IEEE. doi: 10.1109/fccm.2016.50.
Cong, J., Huang, H., Ma, C., Xiao, B., & Zhou, P. (2014). A Fully Pipelined and Dynamically Composable Architecture of CGRA. In 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. IEEE. doi: 10.1109/fccm.2014.12.