{"id":1611,"date":"2023-10-24T03:13:47","date_gmt":"2023-10-24T03:13:47","guid":{"rendered":"https:\/\/fabiogori.ai\/?p=1611"},"modified":"2023-10-26T11:42:44","modified_gmt":"2023-10-26T11:42:44","slug":"why-a-good-performing-ai-model-could-actually-be-a-bad-model","status":"publish","type":"post","link":"https:\/\/fabiogori.ai\/it\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/","title":{"rendered":"Why a Good-Performing AI Model Could Actually Be a Bad Model"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Why a Good-Performing AI Model Could Actually Be a Bad Model<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold.png\" alt=\"\" class=\"wp-image-1612\" srcset=\"https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold.png 1024w, https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-300x300.png 300w, https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-150x150.png 150w, https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-768x768.png 768w, 
https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-12x12.png 12w, https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-600x600.png 600w, https:\/\/fabiogori.ai\/wp-content\/uploads\/2023\/10\/DALL\u00b7E-2023-10-22-23.22.29-Photo-of-a-maze-shaped-like-a-brain-highlighting-the-complexity-of-models-and-the-caption-Not-All-That-Glitters-Is-Gold-100x100.png 100w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Image generated by DALL-E based on blog post title<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h3>Introduction<\/h3>\n<p>If you\u2019re in any industry that leverages data, you\u2019ve likely heard the buzz about how artificial intelligence and machine learning models can revolutionize your business, automate complex tasks, and provide invaluable insights. With all the hype, it\u2019s easy to fall into the trap of thinking that a high-performing model\u2014judged by metrics like accuracy, precision, or recall\u2014is always a good model. After all, if it performs well, it must be learning the right things, right?<\/p>\n<p>Well, not necessarily.<\/p>\n<p>In this blog post, we\u2019ll delve into the somewhat counterintuitive idea that a well-performing model can actually be a bad model. 
We\u2019ll explore why a model that excels in training might fail miserably when deployed in a real-world scenario, why some models are like \u201csmart but lazy students,\u201d and how a model might be leveraging spurious statistical relations to give the illusion of high performance.<\/p>\n<p>So, if you\u2019re interested in not just building machine learning models but building <em>good<\/em> machine learning models, read on. This post aims to equip you with the knowledge to critically evaluate your models beyond just performance metrics.<\/p>\n<h3>The \u201cSmart but Lazy Student\u201d Analogy<\/h3>\n<p>We\u2019ve all encountered them at some point in our academic journeys: the smart but lazy students who somehow manage to ace exams without appearing to put in much effort. How do they do it? Often, they\u2019re experts at finding loopholes, shortcuts, or tricks that allow them to get good grades without truly understanding the subject matter. Interestingly, machine learning algorithms can behave in a similar manner. They are exceptionally good at optimizing for the objective function you give them, but sometimes, they do so in ways that are unexpected and undesirable.<\/p>\n<h4>The Objective Function: A Double-Edged Sword<\/h4>\n<p>In machine learning, the objective function (or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Loss_function\">loss function<\/a>) is what the algorithm aims to optimize. For example, a classification model might aim to minimize the cross-entropy loss, while a regression model might aim to minimize the mean squared error. However, the algorithm doesn\u2019t \u201ccare\u201d how it achieves this optimization. 
If it finds a shortcut that allows it to minimize the loss function without capturing the true underlying patterns in the data, it will take it.<\/p>\n<h4>Why This is a Problem<\/h4>\n<p>Much like the smart but lazy student who finds a way to ace the exam without understanding the subject, a machine learning model that finds a loophole will perform well on the training data but is likely to perform poorly on new, unseen data. This is because it hasn\u2019t actually learned the underlying patterns in the data; it\u2019s merely found a shortcut to optimize the objective function.<\/p>\n<h4>An Example: Text Classification<\/h4>\n<p>Consider a text classification problem where you\u2019re trying to distinguish between positive and negative reviews. If your training data contains a lot of negative reviews that happen to mention the word \u201cterrible,\u201d the model might learn that the presence of \u201cterrible\u201d is a strong indicator of a negative review. However, what happens when the model encounters a sentence like \u201cNot terrible at all, I loved it!\u201d in the test data? The model, taking the shortcut it learned, might incorrectly classify this as a negative review.<\/p>\n<h4>How to Mitigate This Issue<\/h4>\n<p>One way to address this problem is to use techniques that promote model interpretability, such as LIME (Local Interpretable Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations). These techniques can help you understand what features the model is using to make its predictions, allowing you to spot and correct any \u201cshortcuts\u201d it might be taking.<\/p>\n<p>In conclusion, machine learning models, much like smart but lazy students, are excellent at finding shortcuts to optimize their objective functions. While this can lead to high performance on the training data, it can also result in poor generalization to new data. 
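A minimal sketch can make this kind of shortcut concrete. The classifier below is hand-coded purely for illustration (the reviews and the single-keyword rule are hypothetical); it stands in for what a trained model might implicitly learn when the word "terrible" dominates the training signal:

```python
# Hand-coded stand-in for a model that has learned the shortcut
# "contains 'terrible' => negative review". Purely illustrative.
def shortcut_classifier(review: str) -> str:
    return "negative" if "terrible" in review.lower() else "positive"

# On data resembling the training set, the shortcut looks flawless...
train_like = [
    ("The plot was terrible and boring.", "negative"),
    ("Terrible acting, terrible script.", "negative"),
    ("A wonderful, moving film.", "positive"),
]
train_acc = sum(shortcut_classifier(text) == label
                for text, label in train_like) / len(train_like)  # 1.0

# ...but it breaks as soon as the cue appears in a new context.
prediction = shortcut_classifier("Not terrible at all, I loved it!")
# prediction == "negative", even though the review is clearly positive
```

An interpretability tool such as LIME or SHAP would reveal that a single token carries almost all of the decision weight, which is exactly the kind of red flag worth investigating.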
In the next section, we\u2019ll delve into another fascinating aspect of machine learning models: their ability to leverage spurious statistical relations to give the illusion of high performance.<\/p>\n<h3>Spurious Statistical Relations: Correlation is Not Causation<\/h3>\n<p>We\u2019ve all heard the phrase \u201ccorrelation is not causation,\u201d but it\u2019s especially crucial to remember this when working with machine learning models. Sometimes a model may perform well because it has identified a statistical relationship between features and the target variable. However, this relationship might be spurious\u2014meaning it\u2019s a coincidence rather than indicative of an underlying cause-and-effect relationship.<\/p>\n<h4>What Are Spurious Statistical Relations?<\/h4>\n<p>A spurious statistical relation occurs when two variables appear to be related but are actually both influenced by a third variable, or when the relationship is a mere coincidence. In such cases, the model might perform well on the training data, where the spurious relationship exists, but fail to generalize to new data where the relationship doesn\u2019t hold.<\/p>\n<h4>The Danger of Spurious Relations<\/h4>\n<p>The primary danger of a model learning a spurious relation is that it can give the illusion of high performance. Because the model\u2019s predictions are based on coincidental relationships in the training data, it\u2019s likely to perform poorly when exposed to new data where those coincidental relationships don\u2019t exist.<\/p>\n<h4>Example: Ice Cream Sales and Drowning Incidents<\/h4>\n<p>A classic example of a spurious relationship is the correlation between ice cream sales and drowning incidents. Both tend to increase during the summer and decrease during the winter. A naive analysis might suggest that ice cream sales cause an increase in drownings, which is, of course, not the case. 
The hidden variable here is the temperature; warm weather influences both ice cream sales and the likelihood of people going swimming, which in turn increases the risk of drowning incidents.<\/p>\n<h4>How to Detect and Avoid Spurious Relations<\/h4>\n<ol>\n<li>\n<p><strong>Domain Knowledge<\/strong>: Understanding the domain you\u2019re working in can help you identify features that are unlikely to have a causal relationship with the target variable.<\/p>\n<\/li>\n<li>\n<p><strong>Feature Importance Analysis<\/strong>: Techniques like Random Forest\u2019s feature importance or linear model coefficients can help identify which features are most influential in making predictions. If a feature seems disproportionately influential, it might be worth investigating further.<\/p>\n<\/li>\n<li>\n<p><strong>Statistical Tests<\/strong>: Conducting statistical tests for independence can help identify if the relationship between features and the target variable is likely to be spurious.<\/p>\n<\/li>\n<li>\n<p><strong>Cross-Validation<\/strong>: Using different subsets of your data for training and validation can help identify if the model is learning spurious relations. A model based on spurious relations is likely to have a significant performance drop when validated on a different subset of data.<\/p>\n<\/li>\n<\/ol>\n<p>In summary, while spurious statistical relations can give the illusion of a high-performing model, they are a pitfall that can lead to poor generalization on new data. Always remember that correlation does not imply causation, and take steps to ensure your model is learning meaningful relationships, not mere coincidences. 
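The ice-cream example can be reproduced in a few lines of simulation (the sample size, coefficients, and noise levels below are invented for illustration): temperature drives both quantities, so they correlate strongly, yet the association vanishes once the linear effect of temperature is removed from each.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def residuals(y, x):
    """Remove the linear effect of x from y (simple least-squares fit)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

random.seed(0)
temp = [random.uniform(0.0, 35.0) for _ in range(500)]        # the confounder
ice_cream = [2.0 * t + random.gauss(0.0, 5.0) for t in temp]  # driven by temp
drownings = [0.5 * t + random.gauss(0.0, 3.0) for t in temp]  # also driven by temp

raw_corr = pearson(ice_cream, drownings)            # strong, but spurious
partial_corr = pearson(residuals(ice_cream, temp),
                       residuals(drownings, temp))  # near zero once temp is controlled
```

A model given only ice-cream sales would happily "predict" drownings, and the raw correlation would make it look good; the partial correlation shows there is nothing causal for it to learn.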
In the next section, we\u2019ll look at a real-world example to illustrate these concepts further.<\/p>\n<h4>Real-world Example: The Parkinson\u2019s Disease Score Predictor<\/h4>\n<p>To bring all these abstract concepts to life, let\u2019s consider a real-world example involving a machine learning model designed to predict Parkinson\u2019s Disease scores. This example will illustrate how a seemingly well-performing model can actually be a bad model due to the pitfalls we\u2019ve discussed.<\/p>\n<h4>The Objective<\/h4>\n<p>The goal of this <a href=\"https:\/\/www.kaggle.com\/competitions\/amp-parkinsons-disease-progression-prediction\">Kaggle project<\/a> was to build a model that could predict the progression of Parkinson\u2019s Disease in patients based on protein expression measurements. A high-performing model in this context could be invaluable for healthcare providers in tailoring treatment plans for patients.<\/p>\n<h4>The \u201cHigh-Performing\u201d Model<\/h4>\n<p>Initially, many models published on Kaggle seemed promising. However, upon closer inspection, it was discovered that these models had essentially learned to distinguish between control and test patients, rather than predicting the progression of Parkinson\u2019s Disease based on protein expression.<\/p>\n<h4>The Pitfall: Spurious Relations<\/h4>\n<p>These models had found a spurious relationship between some features and the target variable. These features were not causally related to Parkinson\u2019s Disease but were different between the control and test groups. As a result, these models performed well on the training data but were essentially useless for their intended purpose of predicting disease severity in new patients.<\/p>\n<h4>The Consequences<\/h4>\n<p>Relying on these models in a clinical setting could have led to incorrect treatment plans and a waste of healthcare resources. 
This example underscores the importance of thoroughly evaluating and understanding what a machine learning model has learned.<\/p>\n<h4>Lessons Learned<\/h4>\n<p>In this case the issue was known from the beginning, but when it is not, the following steps can help identify the problem.<\/p>\n<ol>\n<li>\n<p><strong>Always Validate on Unseen Data<\/strong>: Issues of this kind are often discovered only when the model is tested on new, unseen data, which highlights the importance of validation.<\/p>\n<\/li>\n<li>\n<p><strong>Interpretability Matters<\/strong>: Techniques like SHAP or LIME can be used to understand what the model is actually learning, potentially flagging the issue early on.<\/p>\n<\/li>\n<li>\n<p><strong>Domain Knowledge is Crucial<\/strong>: A healthcare expert might be able to identify the irrelevant features that a model is using, emphasizing the importance of domain knowledge in feature selection and model evaluation.<\/p>\n<\/li>\n<\/ol>\n<p>In summary, this real-world example serves as a cautionary tale of how a seemingly high-performing model can turn out to be a bad model when it learns spurious relations or fails to generalize. It\u2019s a reminder that performance metrics are just one piece of the puzzle; understanding what the model has actually learned is equally, if not more, important. 
Before turning to strategies for mitigating these issues, let\u2019s first take a closer look at why performance metrics can be so deceptive.<\/p>\n<h3>The Illusion of Performance<\/h3>\n<p>When we talk about a machine learning model\u2019s performance, we often refer to <a href=\"https:\/\/neptune.ai\/blog\/performance-metrics-in-machine-learning-complete-guide\">metrics<\/a> like accuracy, precision, recall, <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=F_1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"F_1\" class=\"latex\" \/>, or even area under the ROC curve (AUC-ROC) for classification problems. For regression models, we might look at the mean squared error (MSE), root mean square error (RMSE), or R-squared values. These metrics give us a quantitative way to assess how well our model is doing, and they are invaluable tools for model evaluation.<\/p>\n<p>However, these metrics can sometimes create an illusion of performance. A high accuracy rate might make us think that our model is doing an excellent job. But what if the dataset is imbalanced, and the model is simply predicting the majority class for all inputs? In such a case, the model\u2019s high accuracy is misleading. Similarly, a low MSE in a regression model might make us feel confident, but what if the model is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Overfitting\">overfitting<\/a> to the training data and performs poorly on new, unseen data?<\/p>\n<p>The point is, while performance metrics are essential, they are not the be-all and end-all of model quality. They give us a snapshot of how well the model is doing on a particular dataset, but they don\u2019t necessarily tell us how well the model will perform in the real world, on new and unseen data. 
They also don\u2019t tell us anything about whether the model has actually learned the underlying patterns in the data, or if it has simply memorized the training data or found some loophole to exploit.<\/p>\n<p>In the following sections, we\u2019ll explore some of the reasons why a model that appears to perform well might actually be a bad model. We\u2019ll look at the pitfalls of overfitting, the dangers of spurious correlations, and the importance of understanding what the model has actually learned. So let\u2019s dive in and unravel the complexities behind the illusion of performance.<\/p>\n<h3>The Problem of Overfitting: When a Model Does Not Generalize<\/h3>\n<p>One of the most common pitfalls in machine learning is overfitting. But what exactly is overfitting? In simple terms, overfitting occurs when a model learns the training data too well, capturing not just the underlying patterns but also the noise and random fluctuations. As a result, while the model performs exceptionally well on the training data, it fails to generalize to new, unseen data. In essence, the model becomes a \u201cmemorization machine\u201d rather than a \u201cgeneralization machine.\u201d<\/p>\n<h4>Why Overfitting is a Problem<\/h4>\n<p>Imagine you\u2019re studying for an exam, and instead of understanding the core principles of the subject, you memorize the answers to all the questions in the textbook. You might score well if the exam questions are identical to those in the book, but you\u2019ll likely perform poorly on questions that require a deep understanding of the subject matter. Similarly, an overfit model performs well on the data it has seen but is likely to make incorrect predictions on new data.<\/p>\n<h4>Signs of Overfitting<\/h4>\n<p>How can you tell if your model is overfitting? One classic sign is a significant discrepancy between the model\u2019s performance on the training set and its performance on a validation or test set. 
If your model has a high accuracy on the training set but a much lower accuracy on the validation set, that\u2019s a red flag.<\/p>\n<h4>A Simple Example<\/h4>\n<p>Let\u2019s consider a simple example using polynomial regression. Suppose you\u2019re trying to fit a model to a set of points that follow a linear trend but also contain some random noise. If you fit a high-degree polynomial to this data, the curve might pass through almost all the points in the training set, resulting in a low MSE. However, this complex model is likely to perform poorly on new data points, as it has essentially \u201cmemorized\u201d the noise in the training set.<\/p>\n<h3>How to Mitigate These Issues<\/h3>\n<p>So far, we\u2019ve discussed various pitfalls that can make a seemingly high-performing machine learning model a bad one. But all is not lost; there are several strategies and best practices you can employ to mitigate these issues. Here\u2019s how:<\/p>\n<h4>Cross-Validation<\/h4>\n<p>Cross-validation is a powerful technique to assess how well your model will generalize to an independent dataset. By partitioning your original training data into a set of smaller train and test datasets and evaluating performance across all sets, you can get a more reliable estimate of the model\u2019s generalization error.<\/p>\n<h4>Regularization<\/h4>\n<p>Techniques like L1 or L2 regularization add a penalty term to the loss function, discouraging the model from fitting the noise in the training data. This can be particularly useful for preventing overfitting.<\/p>\n<h4>Feature Engineering and Selection<\/h4>\n<p>Carefully selecting which features to include in your model can go a long way in preventing overfitting and spurious correlations. 
Domain knowledge is invaluable here, as it allows you to understand which features are likely to have a genuine relationship with the target variable.<\/p>\n<h4>Ensemble Methods<\/h4>\n<p>Using ensemble methods like Random Forests or Gradient Boosting can improve generalization by combining the predictions of multiple base estimators. This often results in a more robust model that is less likely to overfit or rely on spurious correlations.<\/p>\n<h4>Model Interpretability<\/h4>\n<p>As we\u2019ve discussed, understanding what your model has learned is crucial. Techniques like LIME or SHAP can provide insights into your model\u2019s decision-making process, helping you identify if it\u2019s taking shortcuts or relying on irrelevant features.<\/p>\n<h4>Consult Domain Experts<\/h4>\n<p>Especially in fields like healthcare, finance, or any other specialized area, consulting with domain experts can provide invaluable insights. They can help identify whether the features you\u2019re considering are genuinely relevant or if you\u2019re missing critical variables that could improve your model\u2019s performance and reliability.<\/p>\n<h4>Continuous Monitoring<\/h4>\n<p>Once deployed, continuous monitoring of your model\u2019s performance can help you quickly identify any issues or declines in performance, allowing for timely updates or interventions.<\/p>\n<p>By employing these strategies, you can build machine learning models that are not just high-performing but also robust and reliable. Remember, a good model is not just about high performance metrics; it\u2019s about understanding what the model has learned, how it generalizes to new data, and whether it\u2019s truly capturing the underlying patterns in the data or merely exploiting loopholes and coincidences. In the next section, we\u2019ll wrap up and summarize the key takeaways from this discussion. 
Stay tuned!<\/p>\n<h3>Conclusion<\/h3>\n<p>In the rapidly evolving field of machine learning, it\u2019s easy to get caught up in the race for higher performance metrics. While accuracy, precision, and other statistical measures are undoubtedly important, they are just one piece of the puzzle. As we\u2019ve explored in this blog post, a model that appears to perform well may actually be a bad model for various reasons, such as overfitting, exploiting loopholes, or relying on spurious correlations.<\/p>\n<p>The key takeaway is that building a good machine learning model requires a holistic approach. It\u2019s not just about training a model to achieve the highest possible score on some metric; it\u2019s about understanding what the model has actually learned, how well it generalizes to new data, and whether it\u2019s capturing meaningful relationships in the data. Employing strategies like cross-validation, regularization, feature selection, and model interpretability can go a long way in ensuring that your model is both high-performing and robust.<\/p>\n<p>So the next time you find yourself marveling at the performance metrics of your latest model, take a step back and consider the bigger picture. Dive deeper into the model\u2019s behavior, consult with domain experts, and most importantly, validate on unseen data. Remember, a truly good model is one that performs well not just on your training data, but in the real world.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p><strong>Take the free <a href=\"https:\/\/fabiogori.ai\/it\/1237-2\/\">Data Maturity Quiz<\/a> and book a Free Consultation<\/strong><\/p>\n<p>In the world of data science, understanding where you stand is the first step toward improvement. Are you curious to know how data-savvy your company really is? Do you want to identify areas for improvement and assess your organization\u2019s level of Data Maturity? 
If so, I have just the tool for you.<\/p>\n<p><strong>Introducing the <a href=\"https:\/\/fabiogori.ai\/it\/1237-2\/\">Data Maturity Quiz<\/a><\/strong>:<\/p>\n<ul>\n<li class=\"translation-block\"><em><strong>Quick and Easy<\/strong>: with only 14 questions, you can complete the quiz in under 9 minutes.<\/em><\/li>\n<li class=\"translation-block\"><em><strong>Comprehensive Assessment<\/strong>: Get a holistic view of your company\u2019s Data Maturity. Understand its strengths and the areas that need attention.\n<\/em><\/li>\n<li class=\"translation-block\"><em><strong>Detailed Insight<\/strong>: Receive a free score for each of the four essential elements of Data Maturity. This will give you a clear picture of where your organization excels and where there is room for improvement.<\/em><\/li>\n<\/ul>\n<p>Becoming a truly data-driven organization requires a moment of introspection. It\u2019s about understanding your current capabilities, recognizing areas for improvement, and charting the path forward. This quiz was designed to give you exactly these insights.<\/p>\n<p><strong>Are you ready to embark on this journey?<\/strong><br>\n<a href=\"https:\/\/fabiogori.ai\/it\/1237-2\/\">Take the Data Maturity Quiz now!<\/a><\/p>\n<p>Remember, knowledge is power. 
By understanding where you stand today, you can make informed decisions for a better, data-driven future.<\/p>\n<p><strong>Free Consultation<\/strong>: book a <a href=\"https:\/\/fabiogoriai.simplybook.it\/v2\/?widget-type=iframe&amp;theme=hugo&amp;theme=hugo&amp;timeline=modern&amp;datepicker=top_calendar#book\/service\/10\/count\/1\/\">1-hour consultation<\/a> for free!<\/p>\n<\/div>\n\n\n<div class=\"wp-block-post-date\"><time datetime=\"2023-10-24T03:13:47+00:00\">24 Oct 2023<\/time><\/div>","protected":false},"excerpt":{"rendered":"<p>Why a Good-Performing AI Model Could Actually Be a Bad Model<\/p>","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"disabled","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","two_page_speed":[],"_jetpack_memberships_contains_paid_content":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[25,26],"tags":[40,34,32,50],"class_list":["post-1611","post","type-post","status-publish","format-standard","hentry","category-diversely-intelligent","category-allblogs","tag-aiandml","tag-artificial-intelligence","ta
g-data-science","tag-overfitting"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Why a Good-Performing AI Model Could Actually Be a Bad Model - Dr Fabio Gori<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\" \/>\n<meta property=\"og:locale\" content=\"it_IT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Why a Good-Performing AI Model Could Actually Be a Bad Model - Dr Fabio Gori\" \/>\n<meta property=\"og:description\" content=\"Why a Good-Performing AI Model Could Actually Be a Bad Model\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\" \/>\n<meta property=\"og:site_name\" content=\"Dr Fabio Gori\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-24T03:13:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-10-26T11:42:44+00:00\" \/>\n<meta name=\"author\" content=\"Fabio Gori\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@igorfobia\" \/>\n<meta name=\"twitter:label1\" content=\"Scritto da\" \/>\n\t<meta name=\"twitter:data1\" content=\"Fabio Gori\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tempo di lettura stimato\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minuti\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\"},\"author\":{\"name\":\"Fabio Gori\",\"@id\":\"https:\/\/www.fabiogori.ai\/#\/schema\/person\/6cbc17c31f500556c04c39cafeb43429\"},\"headline\":\"Why a Good-Performing AI Model Could Actually Be a Bad Model\",\"datePublished\":\"2023-10-24T03:13:47+00:00\",\"dateModified\":\"2023-10-26T11:42:44+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\"},\"wordCount\":3119,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.fabiogori.ai\/#\/schema\/person\/91e698d640a31468ff83c06886a33511\"},\"keywords\":[\"aiandml\",\"artificial intelligence\",\"data science\",\"overfitting\"],\"articleSection\":[\"Diversely Intelligent\",\"Fabio Gori's Blogs\"],\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\",\"url\":\"https:\/\/www.fabiogori.ai\/allblogs\/diversely-intelligent\/why-a-good-performing-ai-model-could-actually-be-a-bad-model\/\",\"name\":\"Why a Good-Performing AI Model Could Actually Be a Bad Model - Dr Fabio 