Musk Struck Gold! A Pathology AI Tool Shows Twitter Is Highly Valuable
Source: BackChina (backchina.com). Feature: Buy the Dip or Run? Latest U.S. Stock News (BackChina feature / By Science columnist Lily). Twitter has recently been rebranded as X. In recent years, Twitter's stock price has swung up and down enough to give anyone heart palpitations. Many people think Twitter may be finished, but the truth is: not necessarily. Look around — most people still rely on Twitter to follow breaking events, hot news, and global trends, and plenty of influencers still use Twitter to show off their food, their looks, and their romances. Countless memes and jokes drift across Twitter every day; many people still interact, confess, or bicker on it daily; and companies and celebrities have stuffed every corner of Twitter with ads and promotion. On top of all that, scientists recently built a pathology-analysis AI tool on Twitter data, restoring one's confidence in the platform: you've still got some tricks up your sleeve, kid, making a mark even in science and medicine.
Huang Z. and colleagues at Stanford University School of Medicine published a paper titled "A visual–language foundation model for pathology image analysis using medical Twitter" in Nature Medicine on August 17, 2023. These scientists had observed that Twitter hosts more than 30 pathology subcategories containing a wealth of clinical discussion and de-identified pathology images — leaving all that unused would be an enormous waste. How many images did the researchers collect from Twitter? Take a guess. 243,375!
The researchers curated and filtered these images under strict criteria: images and text had to be paired, retweeted images did not count, and the more likes an image had, the better. Once the Twitter-derived library was assembled, they merged in another open online collection, PathLAION, to further expand the dataset. The researchers named the newly created database OpenPath — the largest publicly available pathology image collection with annotated text descriptions to date. OpenPath contains 208,414 image–text pairs in total: 116,504 from original tweets, 59,869 from associated replies, and 32,041 from PathLAION.
The researchers then wrote extensive code to let the machine master OpenPath's images through contrastive learning. Since this new AI tool is built for pathology image recognition — a robot pathologist, so to speak — they gave it the grand name PLIP. Contrastive learning works mainly by comparing positive and negative images: for instance, when an image of normal lung tissue is placed beside an image of lung adenocarcinoma, PLIP must judge whether the two are "similar" or "different". The researchers further fine-tuned and optimized PLIP's judgment, first building text and image encoders and then generating embedded features. After this optimization, PLIP could judge each matched image–text pair as "similar" and each mismatched pair as "dissimilar". For example, when lung adenocarcinoma images are placed together, they are all "similar" because they share characteristic pathological features, even if they come from different patients and were stained differently in different laboratories.
Now comes the part we've all been waiting for: PLIP must pass several rounds of exams to show what it's really made of! Round one is the entrance exam. From four externally validated, labeled datasets, the researchers drew new images PLIP had never seen, spanning multiple tissues and organs, and asked PLIP to classify each as "normal", "benign", or "malignant". This exam was a piece of cake: PLIP sat it without any retraining, answered most questions correctly, and handily beat CLIP, an AI model developed a few years earlier, with noticeably better retrieval effectiveness and precision. In round two, PLIP had to localize specific lesions, cells, and tissues within complex images. This exam doubled as further training: the researchers mainly used image-embedding analysis (in plain terms, extracting the most essential information from an image) and linear probing (in plain terms, training a simple linear classifier on top of the frozen embeddings), covering four image datasets (Kather colon, PanNuke, DigestPath, and WSSS4LUAD). Unsurprisingly, PLIP outperformed the two other models taking the exam alongside it.
Two more rounds remain! Round three tests text-to-image retrieval: given a text query, PLIP must fetch the right images. For example, given the query "normal colon mucosa in colorectal H&E tissue", PLIP immediately searches image databases drawn from Twitter, PathPedia, PubMed pathology, and pathology textbooks, and lists what it retrieves. After round three comes the hardest, final exam: given a pathology image, PLIP must retrieve images of the same kind of lesion from the library. PLIP lived up to expectations, decisively beating the three other models taking the exam alongside it (CLIP, MuDiPath, and SISH). The other models could perform these tasks too, but PLIP did them better. Having fought its way through all these trials, PLIP is now a qualified robot pathologist — one with a digital library installed in its super brain, no less! Born on a mass social platform and arriving at the sacred halls of science, PLIP is bound to make major contributions to disease diagnosis and classification, pathologist education and training, rare-case identification, quality control of pathology diagnosis, diagnostic benchmarking, and more.
Some pundits predict that Twitter's stock price is bound to rise over the long term. Does this research contribute positively to that? Sure — a tiny, tiny bit. Used properly, AI only gets smarter with use. The same goes for Twitter.
Novel Pathology AI Tool Shows Twitter Is Valuable, Easing Fears of Imminent Collapse
BackChina Science, By Lily
You may need an iron will to face the roller-coaster stock price of Twitter (now rebranded as X), which may give you the impression that Twitter will shut down for good any day now. However, most people still rely on Twitter to learn about breaking news, current events, and trends around the globe. Influencers still use Twitter to show off their food, drinks, looks, and sweethearts. Humor enthusiasts still take advantage of Twitter to share memes and jokes. Online socializers still use Twitter to interact with friends, share opinions, and argue with opponents. A plethora of corporations, celebrities, and public personalities continue to stuff Twitter full of ads for products, services, projects, and events. Recently, a newly developed pathological analysis AI tool might give you an even more positive indication: Twitter is still very useful, even in the scientific/medical field.
On Aug 17, 2023, Huang Z. and colleagues from Stanford University School of Medicine published a research paper titled "A visual–language foundation model for pathology image analysis using medical Twitter" in Nature Medicine. They realized that the tons of clinical discussions and de-identified pathology images posted on Twitter across over 30 pathology subcategories would be a big waste if researchers did not utilize them. Guess how many images the researchers harvested from Twitter? 243,375! Huang et al. first meticulously curated the Twitter data under strict filtering protocols: images and texts had to be paired, retweeted images were excluded, highly liked images were favored, and so on. To further expand the dataset, they combined the Twitter data with another open-source online collection (PathLAION). They ultimately created a new database named OpenPath, which contains 208,414 image–text pairs — 116,504 from tweets, 59,869 from the associated replies, and 32,041 from PathLAION — making it the largest publicly available pathology image collection with annotated text descriptions to date.
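The curation rules described above can be sketched in a few lines of code. This is a hypothetical illustration only — the field names (`image`, `text`, `likes`, `is_retweet`) and the helper `filter_tweets` are invented for the example, not taken from the paper's pipeline.

```python
# Hypothetical sketch of the described filtering: keep only tweets that pair
# an image with text, drop retweets, and favor highly liked posts.
def filter_tweets(tweets, min_likes=1):
    """Return (image, text) pairs that pass the curation rules."""
    pairs = []
    for t in tweets:
        if t.get("is_retweet"):                # retweeted images are not counted
            continue
        if not t.get("image") or not t.get("text"):
            continue                           # image and text must be paired
        if t.get("likes", 0) < min_likes:      # favor well-liked posts
            continue
        pairs.append((t["image"], t["text"]))
    return pairs

sample = [
    {"image": "img1.png", "text": "H&E, lung adenocarcinoma", "likes": 12},
    {"image": "img2.png", "text": "", "likes": 30},                          # no caption
    {"image": "img3.png", "text": "colon mucosa", "likes": 5, "is_retweet": True},
]
print(filter_tweets(sample))  # only the first record survives
```

The real pipeline of course involved far more steps (de-identification checks, text cleaning, language filtering); this only shows the pairing/retweet/likes logic the article mentions.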
Based on OpenPath and employing a contrastive learning strategy, the researchers then developed an AI tool — a robot pathologist — called pathology language–image pretraining (PLIP). Contrastive learning allowed the model to tell "similar" from "different" by comparing positive and negative pairs of images, e.g., normal lung tissue versus lung adenocarcinoma. Further fine-tuning and optimization enabled the robot pathologist to label each matched image–text pair "similar" and each mismatched pair "dissimilar". For example, all lung adenocarcinomas share certain pathological traits, even if the images originated from distinct patients and laboratories using different staining methods. This process was implemented by first developing text and image encoders, and then generating embedded features.
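The contrastive objective behind PLIP can be illustrated with a toy version of the CLIP-style InfoNCE loss: embeddings of matched image–text pairs are pushed together, mismatched ones apart. This is a minimal sketch assuming tiny hand-made 3-d vectors; real encoders are deep networks and the loss is computed over large batches.

```python
# Toy CLIP-style contrastive loss: the i-th image should match the i-th text.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """InfoNCE: softmax cross-entropy where the paired text is the positive."""
    loss, n = 0.0, len(image_embs)
    for i in range(n):
        sims = [cosine(image_embs[i], t) / temperature for t in text_embs]
        log_z = math.log(sum(math.exp(s) for s in sims))
        loss += -(sims[i] - log_z)
    return loss / n

# Well-aligned pairs give a low loss; deliberately shuffled pairs a high one.
imgs  = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
print(contrastive_loss(imgs, texts) < contrastive_loss(imgs, texts[::-1]))  # True
```

Training drives this loss down, which is exactly what makes paired images and captions end up "similar" in embedding space.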
Then comes the critical and exciting part: the robot pathologist needs to take some exams to evaluate how well it functions. In the entrance exam, PLIP was exposed to new images from four external validation datasets, curated from multiple tissues and organs, and was required to tell which images were normal, benign, or malignant. It turned out that this task was a piece of cake for PLIP, which provided correct answers to most of the questions without any retraining. The effectiveness and precision of its retrievals were much better than those of another previously developed contrastive language–image pretraining (CLIP) model. The next exam required PLIP to locate the characteristic cell structures, tissue compositions, and disease manifestations representing specific pathological changes on complicated images. This exam involved further retraining of PLIP, including image-embedding analysis (in simple terms, extracting the most essential information from an image) and linear probing (in simple terms, training a simple linear classifier on top of the frozen embeddings). Four different datasets (Kather colon, PanNuke, DigestPath, and WSSS4LUAD) were used for this training and exam. Unsurprisingly, PLIP outperformed the two other models.
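The zero-shot classification used in the entrance exam works by embedding each candidate label as a text prompt and picking the label whose embedding is closest to the image's. A minimal sketch, with invented toy vectors standing in for real encoder outputs:

```python
# Zero-shot classification, CLIP/PLIP style: no retraining, just similarity
# between the image embedding and the text embedding of each label prompt.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding best matches the image."""
    return max(label_embs, key=lambda lbl: cosine(image_emb, label_embs[lbl]))

label_embs = {              # hypothetical text-encoder outputs per prompt
    "normal":    [1.0, 0.0, 0.0],
    "benign":    [0.0, 1.0, 0.0],
    "malignant": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.2, 0.95]  # hypothetical image-encoder output
print(zero_shot_classify(image_emb, label_embs))  # malignant
```

This is why PLIP "took the exam without retraining": classifying a new image only requires embedding it and comparing against the label prompts.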
Still, there are two more exams waiting for PLIP! The third exam was to input a text request — for example, "normal colon mucosa in colorectal H&E tissue" — and test whether PLIP could retrieve the correct images from different image datasets, including Twitter, PathPedia, PubMed pathology, and pathology books. The last exam was even more challenging: inputting an image and testing whether PLIP could retrieve representative images of the same kind. PLIP achieved the best performance on both text and image retrieval, although other models, including CLIP, MuDiPath, and SISH, could carry out similar tasks. Having passed the two final exams, PLIP proved to be a qualified robot pathologist with a digital library installed in its super brain. This AI tool, developed from social media data, will be valuable for disease diagnosis and classification, education and training, rare-case identification, quality control, benchmarking, and more.
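Both retrieval exams reduce to the same mechanism: embed the query (text or image), then rank every image in the database by similarity. A sketch with invented file names and toy vectors:

```python
# Cross-modal retrieval: rank database images by cosine similarity to a
# query embedding, which may come from the text encoder or the image encoder.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def retrieve(query_emb, database, top_k=2):
    """Return the top_k image names ranked by similarity to the query."""
    ranked = sorted(database,
                    key=lambda name: cosine(query_emb, database[name]),
                    reverse=True)
    return ranked[:top_k]

database = {                               # hypothetical image embeddings
    "colon_mucosa_1.png": [0.9, 0.1, 0.0],
    "lung_adeno_7.png":   [0.0, 0.2, 0.9],
    "colon_mucosa_4.png": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # e.g. the embedded text "normal colon mucosa"
print(retrieve(query, database))  # the two colon images rank first
```

In practice such searches run over hundreds of thousands of embeddings with approximate nearest-neighbor indexes, but the ranking principle is the same.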
Based on some forecasts, a long-term increase is expected for Twitter's stock price. Will this scientific discovery contribute to that increase? Definitely a tiny, tiny bit. It is noteworthy, nevertheless, that AI tools become more versatile and smarter the more they are used. Twitter is the same.