etiennedm

I don't think there is a single rule that answers this. It strongly depends on how relevant the country information is relative to the other input data and to what you want to predict.

You may face cases where similar input data in different countries leads to different outputs. In that case, you have to either add the country as an input or create one model per country.

In other cases, the country information does not improve the model at all, so there is no need to build a specific model per country.

Finally, there are cases where part of the information is global (valid whatever the country) and part is specific to each country. There are multiple ways to deal with that. The first and most common is to include the country as an input of your global model. As @Fnguyen mentioned, why treat the country differently from the other inputs?
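As a minimal sketch of "country as an input of the global model", the snippet below one-hot encodes a categorical country field so it sits next to the other features. The function name `add_country_one_hot`, the `income` feature and the toy rows are all made up for illustration; in practice you would probably use your library's own categorical encoder.

```python
def add_country_one_hot(rows, country_key="country"):
    """Replace the categorical country field with one 0/1 column per country.

    `rows` is a list of dicts; the layout is an assumption made for this sketch.
    """
    # Sort so the column order is deterministic.
    countries = sorted({row[country_key] for row in rows})
    encoded = []
    for row in rows:
        # Keep every non-country feature unchanged.
        features = {k: v for k, v in row.items() if k != country_key}
        for c in countries:
            features[f"country_{c}"] = 1.0 if row[country_key] == c else 0.0
        encoded.append(features)
    return encoded, countries


# Hypothetical toy data: one numeric feature plus the country.
rows = [
    {"income": 30.0, "country": "FR"},
    {"income": 45.0, "country": "DE"},
    {"income": 38.0, "country": "FR"},
]
encoded, countries = add_country_one_hot(rows)
```

The encoded rows can then be fed to a single global model, which is free to use or ignore the country columns like any other input.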

Update

If you think that the country has a specific impact on the prediction, here is a non-exhaustive list of ways to build models that reflect that assumption:

  • Using transfer learning: train a global model to capture general trends, then continue training the same model on each specific country, starting from the global one. You may still not be able to capture specific country effects.
  • Using a boosting method: train a first classifier on all countries, then train one model per country that boosts on the output of the globally trained classifier. That way you keep the global trends and then add country-specific information.
  • Using a bagging method: train some classifier(s) on all countries and others on a specific country, then combine them in parallel into one big model per country.
  • A specific example using NNs: train a global model and one model per country. Then, for each country, use a model that combines the global model and the country-specific model you trained before, and retrain only the 'head'. For instance, if using DNNs / CNNs, you only retrain the green part of the final model (figure: global-local NN combination).
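The boosting idea from the list above can be sketched in a few lines, under strong simplifying assumptions: a regression target instead of a classifier, a one-feature least-squares line as the global model, and a constant offset per country as the second stage fitted on the global model's residuals. The names `fit_line` and `fit_global_plus_country` and the toy data are hypothetical, not taken from any library.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_global_plus_country(data):
    """data: list of (country, x, y) tuples.

    Stage 1 fits one global line on all countries.
    Stage 2 fits, per country, the mean residual left by the global line.
    """
    slope, intercept = fit_line([x for _, x, _ in data], [y for _, _, y in data])

    residuals = defaultdict(list)
    for country, x, y in data:
        residuals[country].append(y - (slope * x + intercept))
    offsets = {c: sum(r) / len(r) for c, r in residuals.items()}

    def predict(country, x):
        # Unseen countries fall back to the global model alone.
        return slope * x + intercept + offsets.get(country, 0.0)

    return predict, offsets


# Toy data: same trend everywhere, but country "A" sits 2 units above "B".
data = [("A", 0, 1), ("A", 1, 3), ("A", 2, 5),
        ("B", 0, -1), ("B", 1, 1), ("B", 2, 3)]
predict, offsets = fit_global_plus_country(data)
```

Here the global stage recovers the shared trend and the per-country stage only has to learn what the global model missed. With a real model you would replace the constant offset with a proper learner trained on the residuals.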

The list is non-exhaustive, and you need a good reason to use such approaches, which give more importance to the country information. Normally, machine learning algorithms pick up the country effect on their own when the country is given as an input.
