<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Machine Learning on Sumukh Acharya</title><link>https://sumukh-acharya.vercel.app/tags/machine-learning/</link><description>Recent content in Machine Learning on Sumukh Acharya</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 01 Jun 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://sumukh-acharya.vercel.app/tags/machine-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Data Science Intern @ CODMAV Research Centre (PES University)</title><link>https://sumukh-acharya.vercel.app/experience/codmav/</link><pubDate>Sat, 01 Jun 2024 00:00:00 +0000</pubDate><guid>https://sumukh-acharya.vercel.app/experience/codmav/</guid><description>&lt;p&gt;&lt;strong&gt;PES University · Jan 2026 – May 2026&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="overview"&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;During my tenure at the &lt;strong&gt;Centre of Data Modelling, Analytics and Visualization (CODMAV)&lt;/strong&gt;, I worked at the intersection of healthcare and artificial intelligence. My primary objective was to build a robust predictive system that identifies lung cancer risk at an early stage, since early detection is critical for patient survival.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="the-technical-challenge"&gt;&lt;strong&gt;The Technical Challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The core difficulty of this project lay in the scale and sparsity of the raw clinical data. Sourced from the &lt;a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GD5XWE" target="_blank" rel="noopener noreferrer"&gt;Harvard Dataverse (Lung Cancer Risk Prediction Dataset)&lt;/a&gt;, the initial dataset was large and noisy, comprising &lt;strong&gt;22,811 patient records&lt;/strong&gt; and &lt;strong&gt;788 health markers&lt;/strong&gt;.&lt;/p&gt;</description></item></channel></rss>