
Microsoft’s AI Outperforms Doctors in Diagnoses
Microsoft has developed an AI system that surpasses human doctors at diagnosing complex health conditions, calling it a “path to medical superintelligence.” The project, led by Mustafa Suleyman, imitates a panel of expert physicians and, when paired with OpenAI’s o3 model, correctly diagnosed more than 80% of challenging case studies, compared with roughly 20% accuracy for unaided doctors. Microsoft also emphasized efficiency, noting the AI orders tests more cost-effectively than human clinicians.

  1. AI as a Complement, Not a Replacement
    Despite its diagnostic superiority, Microsoft downplayed concerns about AI replacing doctors, stating that clinicians’ roles extend beyond diagnosis to building patient trust and managing ambiguity. However, the “medical superintelligence” framing hints at transformative ambitions: while artificial general intelligence (AGI) matches human cognition, superintelligence implies surpassing it entirely. Suleyman predicts near-error-free performance within 5–10 years, which he says would alleviate global healthcare burdens.

  2. Beyond Standardized Testing
    Microsoft critiqued AI’s high scores on exams like the US Medical Licensing Examination, arguing that multiple-choice formats favor memorization over deep understanding. Instead, its system mimics real-world clinicians by methodically requesting tests (e.g., bloodwork, X-rays) and analyzing symptoms (e.g., cough, fever) to diagnose conditions like pneumonia. This approach was tested on more than 300 complex case studies from the New England Journal of Medicine (NEJM).
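The step-by-step workflow described above can be illustrated with a minimal sketch. Everything here is hypothetical: the class names, the rule for pneumonia, and the test prices are illustrative stand-ins, not Microsoft's actual implementation, which has not been published in code form.

```python
from dataclasses import dataclass, field

@dataclass
class CaseState:
    """Accumulated findings for one case, mirroring the sequential workflow."""
    symptoms: list[str]
    test_results: dict[str, str] = field(default_factory=dict)
    cost: float = 0.0

# Hypothetical price list; real test costs vary widely by provider.
TEST_COSTS = {"bloodwork": 50.0, "chest_xray": 120.0}

def order_test(state: CaseState, test: str, result: str) -> None:
    """Record a test result and its cost, as a clinician ordering it would."""
    state.test_results[test] = result
    state.cost += TEST_COSTS.get(test, 0.0)

def diagnose(state: CaseState) -> str:
    """Toy rule: cough + fever + an infiltrate on X-ray suggests pneumonia."""
    if ("cough" in state.symptoms and "fever" in state.symptoms
            and state.test_results.get("chest_xray") == "infiltrate"):
        return "pneumonia"
    return "undetermined"

case = CaseState(symptoms=["cough", "fever"])
order_test(case, "bloodwork", "elevated WBC")
order_test(case, "chest_xray", "infiltrate")
print(diagnose(case), case.cost)  # → pneumonia 170.0
```

Tracking cost alongside findings is the point of the sketch: it shows why a system that chooses which tests to order can be compared to clinicians on cost-effectiveness, not just accuracy.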

  3. How the AI System Works
    Microsoft’s “diagnostic orchestrator” AI collaborates with models from OpenAI (o3), Meta, Anthropic, xAI (Grok), and Google (Gemini), simulating a panel of physicians. It decides which tests to order and synthesizes the results into a diagnosis. The system solved 80% of NEJM cases, outperforming human doctors’ 20% success rate. Microsoft highlighted its cross-disciplinary expertise, suggesting AI could revolutionize healthcare by supporting both routine patient self-care and complex clinician decision-making. Neovestx’s GUD platform takes a related approach: it uses multiple custom models tailored to medical information, with built-in algorithms that cross-check outputs against global standards of practice.
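One way to picture the "panel of physicians" idea is an orchestrator that polls several models and synthesizes their answers. The sketch below uses majority voting as the synthesis rule; this is an assumption for illustration, since Microsoft has not published the orchestrator's actual mechanism, and the stub panelists stand in for calls to vendor APIs.

```python
from collections import Counter
from typing import Callable

# Each "panelist" stands in for a different model (o3, Gemini, Grok, ...).
# These stubs return fixed answers; a real system would call each vendor's API.
def panelist_a(case: str) -> str: return "pneumonia"
def panelist_b(case: str) -> str: return "pneumonia"
def panelist_c(case: str) -> str: return "bronchitis"

def orchestrate(case: str, panel: list[Callable[[str], str]]) -> str:
    """Collect a diagnosis from every panelist, then take the majority vote."""
    votes = Counter(p(case) for p in panel)
    diagnosis, _count = votes.most_common(1)[0]
    return diagnosis

print(orchestrate("NEJM case #42", [panelist_a, panelist_b, panelist_c]))
# → pneumonia
```

The design choice being illustrated is diversity: disagreement among independently trained models is a signal, and an orchestrator can surface or resolve it rather than trusting any single model's answer.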

  4. Limitations and Future Steps
    Microsoft clarified that its system isn’t yet ready for clinical use and requires further testing on common symptoms. The orchestrator’s scalability and accuracy in broader scenarios remain under evaluation. While promising, the research, which has been submitted for peer review, needs validation before deployment, balancing optimism with caution about AI’s near-term role in medicine. Relying on generic models such as ChatGPT can harm clinical practice: they make mistakes, hallucinate, and produce plausible-sounding suggestions that can be entirely wrong. Systems like GUD include mechanisms to mitigate these failure modes.