I took part in Track 1 of the 2020 Samsung Card Data Analysis Contest as a member of a team. The project’s goal was to build a model that predicts which customers will use online franchises in the coming month, based on customer data covering online franchise use in the previous month, and then to derive marketing strategies for raising the utilization rate of online franchises. We used Pandas, a Python library, to handle the data and chose a Random Forest classifier as our machine learning model. Because we wanted marketing strategies that would attract customers who were not yet using the franchises, we split the customers into “customers who use franchises” and “customers who do not” during data preprocessing, oversampled the data in the hope of improving model accuracy, and trained the Random Forest classifier on the result. However, the model’s prediction accuracy was lower than expected, and I believe this poor performance is why our team was eliminated. A further limitation of the model was that we could not analyze the characteristics of each category, because the columns in the contest data were not explicitly labeled. I would therefore like to learn concrete approaches to analyzing de-identified data.
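To make the oversampling step concrete, here is a minimal sketch of random oversampling. It illustrates the general technique and is not the code our team submitted. The function name `random_oversample` and the toy rows and labels are hypothetical, and it uses only the Python standard library rather than Pandas.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy training data: 1 = used an online franchise, 0 = did not.
train_rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
train_labels = [0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(Counter(balanced_labels))
```

Oversampling of this kind should be applied only to the training split. If duplicated rows also appear in the evaluation data, the measured accuracy will look better than the model really is.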
