What Can Boston Airbnb Customers Tell Us?

Mohamed Gamal
3 min read · Jun 5, 2021

What is the Airbnb service?

Airbnb is an online marketplace that connects people who want to rent out their homes with people who are looking for accommodations in that locale. For hosts, participating in Airbnb is a way to earn some income from their property, but with the risk that the guest might do damage to it. [1]


CRISP-DM Methodology

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. The methodology provides a structured approach to planning a data mining project, organized into six stages:

1. Business understanding

2. Data understanding

3. Data preparation

4. Modeling

5. Evaluation

6. Deployment

CRISP-DM cycle

A- Business understanding:

This is the first stage, and it guides the rest of the analysis. In this step we define our objective from a business perspective.
For the Airbnb data set, we will try to predict the price of a unit based on some categorical and numerical parameters.

B- Data Understanding:

The second stage of the cycle is acquiring and getting to know the data in our data set, which includes data loading, data exploration, etc.
Our data includes 3 CSV files: 1- calendar.csv 2- listings.csv 3- reviews.csv
By exploring the data, we found that the price varies over time once outliers are removed, as follows:

Box plot of price by date before removing outliers
Box plot of price by date after removing outliers
Distribution of price over time
Heat map for correlation between numerical values
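The outlier removal behind the box plots above can be sketched roughly as follows. This is only an illustration under assumptions: it assumes the price is stored as text such as "$1,250.00", it uses the common 1.5 × IQR rule (the exact rule used for the plots is not stated here), and the helper names `clean_price` and `drop_price_outliers` plus the tiny sample frame are made up for the example.

```python
import pandas as pd


def clean_price(series: pd.Series) -> pd.Series:
    """Turn strings such as "$1,250.00" into floats; unparseable values become NaN."""
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")


def drop_price_outliers(df: pd.DataFrame, column: str = "price", k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Rows with a missing value in `column` are dropped as well.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Tiny made-up sample standing in for calendar.csv (assumed columns: date, price).
calendar = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-02", "2017-01-03",
        "2017-01-04", "2017-01-05", "2017-01-06",
    ]),
    "price": ["$100.00", "$120.00", "$110.00", "$130.00", "$115.00", "$4,000.00"],
})

calendar["price"] = clean_price(calendar["price"])
trimmed = drop_price_outliers(calendar)
```

In this sample the single 4,000 entry falls outside the IQR fence and is dropped, while the five ordinary prices remain. A different threshold `k`, or a percentile cut-off, would be an equally reasonable choice depending on how aggressive the trimming should be.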

Sentiment analysis to study the polarity and subjectivity of customer comments

First, we can check a word cloud of the words that are repeated most often in the listing summaries.
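A word cloud is essentially a picture of word frequencies, so the counting step behind it can be sketched with the standard library alone. This is a minimal illustration, not the code used for the figure: the stop-word list is deliberately tiny, and the sample summaries and the helper name `word_frequencies` are invented for the example.

```python
import re
from collections import Counter

# A short, illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "with", "from"}


def word_frequencies(texts):
    """Count lowercase words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up summaries standing in for the summary column of the listings data.
summaries = [
    "Cozy apartment in the heart of Boston",
    "Sunny apartment, short walk to the subway",
    "Quiet room in a Boston brownstone",
]

freqs = word_frequencies(summaries)
top_words = freqs.most_common(2)
```

The resulting counts are what a plotting library would scale into font sizes, so the words that dominate the cloud are simply the ones with the highest counts here.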

C- Data preparation for price prediction

To get the best model, we include only the columns that contribute to the model and the final result. Some of them are numerical and others are categorical; the categorical ones need to be converted to dummy variables before we start modeling.
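Converting categorical columns to dummies can be sketched with `pandas.get_dummies`. The column names below (`room_type`, `accommodates`, `price`) and the sample rows are assumptions chosen for illustration; the actual feature set used for the model may differ.

```python
import pandas as pd

# Made-up rows standing in for the listings data.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Private room", "Shared room"],
    "accommodates": [4, 2, 1, 1],
    "price": [200.0, 90.0, 75.0, 40.0],
})

# Keep only the columns we want the model to see.
feature_columns = ["room_type", "accommodates"]

# One-hot encode the categorical column; drop_first=True removes one level
# so the dummy columns are not perfectly collinear in a linear model.
X = pd.get_dummies(listings[feature_columns], columns=["room_type"], drop_first=True)
y = listings["price"]
```

Numerical columns pass through unchanged, while each remaining category becomes its own indicator column. Dropping the first level matters mostly for linear regression; tree-based models are not sensitive to it.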

D- Modeling & Evaluation

Linear regression model

After choosing the features that affect our model, we fit a linear regression model.

Linear regression model: predicted vs. real price
R² score for the test and train datasets
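Fitting a linear regression and scoring it on both splits can be sketched as below. The data is synthetic (two made-up numeric features with a linear relation to the target plus noise), and the 70/30 split and random seeds are assumptions, so the scores say nothing about the Airbnb data itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix and price target.
rng = np.random.default_rng(0)
X = rng.uniform(1, 8, size=(200, 2))
y = 40 * X[:, 0] + 15 * X[:, 1] + rng.normal(0, 10, size=200)

# Hold out 30% of the rows to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# R² on the data the model saw and on the data it did not see.
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
```

Comparing `r2_train` with `r2_test` is the point of the exercise: similar values suggest the model generalizes, while a much lower test score hints at overfitting.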

Random Forest Regression:
A random forest regressor is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Random Forest Regression prediction
R² score for Random Forest regression
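A random forest can be fitted and scored in much the same way. Again this is a sketch on synthetic data: the non-linear target, the 100 trees, and the seeds are assumptions for illustration, not the settings behind the plots above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data with a non-linear relation that a straight line fits poorly.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = 50 + 20 * np.sin(X[:, 0]) + 5 * X[:, 1] ** 2 + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each tree is trained on a bootstrap sample; predictions are averaged.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# For regressors, .score() returns the R² coefficient of determination.
rf_r2_train = forest.score(X_train, y_train)
rf_r2_test = forest.score(X_test, y_test)
```

Random forests often score very high on the training set, so the test score is the more honest number when comparing against linear regression.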

E- Deployment and Final Outcomes

We used two models to predict the price based on some numerical and categorical variables:
1- Linear Regression
2- Random Forest Regressor
The plots show that the Random Forest Regressor is more accurate and achieves a higher score on both the test and train sets.