2025, Vol. 34, No. 09: 641-648
Comparison of the Effectiveness of Generative Artificial Intelligence in Constructing Patient Drug Information
Foundation: Project of the China Society for Drug Regulation (No. 2024-Z-Y-001)
Email: yanyingying89@163.com; zhaisuodi@163.com
DOI: 10.19577/j.1007-4406.2025.09.001
Abstract (Chinese version):

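AIM To compare the effectiveness of combinations of different large language models (LLMs) and prompt strategies in constructing patient drug information, and to explore a feasible approach for building patient drug information with generative artificial intelligence. METHODS Eight LLMs (Doubao, Doubao Deep Thinking, Kimi, Kimi Long Thinking k1.5, ChatGPT, DeepSeek, Grok, and Gemini) were combined with five prompt strategies [Zero-Shot, Zero-Chain-of-Thought (Zero-CoT), Tree-of-Thought (ToT), Zero-Shot & Few-Shot (ZS&FS), and Zero-CoT & Few-Shot (ZC&FS)] to generate patient drug information for pregabalin capsules, acetaminophen extended-release tablets, and levofloxacin tablets. Two evaluators assessed the quality of the generated patient drug information across six dimensions: scientific accuracy, comprehensiveness, hallucination, readability, reading time required, and accessibility. The standard deviation of each combination's scores was used to assess the stability of the generated texts, and the optimal LLM + prompt strategy combination was determined by weighing quality and stability together. RESULTS The 8 LLMs and 5 prompt strategies yielded 40 combinations, which generated 360 patient drug information texts. The average-measures intraclass correlation coefficient (ICC) between the two evaluators was 0.93 (95% CI 0.88-0.97, P < 0.01). Mean quality scores across the 40 combinations ranged from 19.16 (Grok + ZC&FS) to 26.15 (DeepSeek + ZS&FS), and standard deviations (stability) ranged from 0.31 (Gemini + ZC&FS) to 7.42 (ChatGPT + Zero-Shot). Overall, DeepSeek performed better than the other LLMs in generating patient drug information. Taking both quality and stability into account, DeepSeek + ZS&FS was the optimal combination. CONCLUSION DeepSeek + ZS&FS is the best combination for generating patient drug information with generative artificial intelligence. After professional review by pharmacists, AI-generated content can be turned into accurate patient drug information that promotes rational clinical drug use.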
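The design arithmetic and the quality/stability summary described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: quality is taken as the mean total score of a combination's texts and stability as their standard deviation, as the abstract describes, but the use of the sample SD (n - 1 denominator) and the `pick_best` rule (highest mean, ties broken by lowest SD) are assumptions, because the abstract does not state how quality and stability were weighed against each other. All function names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev

# Models and prompt strategies as listed in the abstract.
LLMS = ["Doubao", "Doubao Deep Thinking", "Kimi", "Kimi Long Thinking k1.5",
        "ChatGPT", "DeepSeek", "Grok", "Gemini"]
STRATEGIES = ["Zero-Shot", "Zero-CoT", "ToT", "ZS&FS", "ZC&FS"]
TOTAL_TEXTS = 360  # generated texts reported in the abstract


def design_size():
    """Return (number of LLM x strategy combinations, texts per combination)."""
    combos = list(product(LLMS, STRATEGIES))
    per_combo, remainder = divmod(TOTAL_TEXTS, len(combos))
    if remainder:
        raise ValueError("texts do not split evenly across combinations")
    return len(combos), per_combo


def summarize(scores):
    """Map each combination to (quality, stability) = (mean, sample SD).

    `scores` maps a combination key to the total quality scores of the
    texts it generated (at least two per key, as stdev requires).
    """
    return {combo: (mean(vals), stdev(vals)) for combo, vals in scores.items()}


def pick_best(summary):
    """Hypothetical rule: highest mean quality; ties go to the lower SD."""
    return max(summary, key=lambda combo: (summary[combo][0], -summary[combo][1]))


if __name__ == "__main__":
    print(design_size())  # (40, 9)
    demo = summarize({("A", "S1"): [20, 22, 24], ("B", "S1"): [22, 22, 22]})
    print(pick_best(demo))  # ('B', 'S1')
```

`design_size()` returns `(40, 9)`: 8 × 5 = 40 combinations, and 360 / 40 = 9 texts per combination, which would fit three generations for each of the three drugs, although the abstract does not state the repetition count. With the made-up scores in the demo, both combinations have a mean of 22, and `pick_best` prefers `("B", "S1")` because its SD is 0.0 rather than 2.0.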

Abstract:

AIM This study aims to compare the effectiveness of different combinations of large language models (LLMs) and prompt strategies in generating patient drug information, and to explore feasible combinations for constructing patient drug information with generative artificial intelligence. METHODS Pregabalin capsules, acetaminophen extended-release tablets, and levofloxacin tablets were selected as example drugs. Eight LLMs (Doubao, Doubao Deep Thinking, Kimi, Kimi Long Thinking k1.5, ChatGPT, DeepSeek, Grok 3, and Gemini) were combined with 5 prompt strategies [Zero-Shot, Zero-Chain-of-Thought (Zero-CoT), Tree-of-Thought (ToT), Zero-Shot & Few-Shot (ZS&FS), and Zero-CoT & Few-Shot (ZC&FS)] to generate the target texts. Two evaluators assessed the quality of the generated patient drug information across 6 dimensions: scientific accuracy, comprehensiveness, hallucination, readability, reading time required, and accessibility. The standard deviation of each combination's scores was used to evaluate the stability of the generated texts, and the optimal combination of LLM and prompt strategy was determined by considering quality and stability together. RESULTS Testing 8 LLMs with 5 prompt strategies produced 40 combinations and 360 generated target texts. The average-measures interrater reliability (ICC) was 0.93 (95% CI 0.88-0.97, P < 0.01). Significant differences in quality and stability were observed across combinations, with mean scores ranging from 19.16 (Grok + ZC&FS) to 26.15 (DeepSeek + ZS&FS) and standard deviations ranging from 0.31 (Gemini + ZC&FS) to 7.42 (ChatGPT + Zero-Shot). Overall, DeepSeek outperformed the other models. Considering both quality and stability, DeepSeek + ZS&FS was identified as the optimal combination. CONCLUSION This study identified DeepSeek + ZS&FS as the optimal combination for generating patient drug information with generative artificial intelligence. After verification by pharmacists, AI-generated texts can provide accurate and accessible medication information that supports rational drug use in clinical practice.
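The abstract reports an average-measures ICC of 0.93 for the two evaluators but does not say which ICC model was used. The sketch below computes two common average-measures forms from a two-way ANOVA decomposition, in Shrout and Fleiss's notation: ICC(2,k) (two-way random effects, absolute agreement) and ICC(3,k) (two-way mixed effects, consistency). Which form, if either, matches the study is an assumption, and the function names are illustrative rather than the authors' code.

```python
def _mean_squares(ratings):
    """Two-way ANOVA mean squares for an n-texts x k-raters table.

    `ratings` is a list of rows, one per rated text, each holding the k
    raters' scores for that text. Returns (MS_rows, MS_cols, MS_error, n).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between texts
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    return (ss_rows / (n - 1),
            ss_cols / (k - 1),
            ss_error / ((n - 1) * (k - 1)),
            n)


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters."""
    ms_r, ms_c, ms_e, n = _mean_squares(ratings)
    return (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)


def icc_3k(ratings):
    """ICC(3,k): two-way mixed effects, consistency, mean of k raters."""
    ms_r, _, ms_e, _ = _mean_squares(ratings)
    return (ms_r - ms_e) / ms_r
```

For two raters who agree exactly, e.g. `[[20, 20], [22, 22], [24, 24]]`, both functions return 1.0. The two forms differ when one rater scores systematically higher: with `[[20, 22], [22, 24], [24, 26]]`, ICC(3,k) stays at 1.0 because the raters are perfectly consistent, while ICC(2,k) drops to 0.8 because it also penalizes the constant offset.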

Basic information:

Chinese Library Classification (CLC) number: R95

Citation:

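MA Kainan, YAN Yingying, HE Na, et al. Comparison of the effectiveness of generative artificial intelligence in constructing patient drug information[J]. Chinese Journal of Clinical Pharmacy, 2025, 34(09): 641-648. DOI: 10.19577/j.1007-4406.2025.09.001.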
