What would customers of Boston Airbnb tell us?
What is the Airbnb service?
Airbnb is an online marketplace that connects people who want to rent out their homes with people who are looking for accommodation in that area. For hosts, participating in Airbnb is a way to earn income from their property, with the risk that a guest might damage it.
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. The CRISP-DM methodology provides a structured approach to planning a data mining project. Its phases are:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
A- Business understanding:
This is the first stage, and it guides the rest of the analysis: in business understanding we define our objective from a business perspective.
For the Airbnb dataset, we will try to predict the price of a unit based on a set of categorical and numerical features.
B- Data Understanding:
This is the second stage of the CRISP-DM cycle, in which we acquire and examine the data in our dataset; it includes data loading, data exploration, etc.
Our data consist of three CSV files: 1- calendar.csv, 2- listings.csv, 3- reviews.csv.
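A minimal loading sketch with pandas, assuming the three files sit together in one local directory under exactly these names. The helper names `load_airbnb_data` and `describe_frames` are hypothetical, not part of the original analysis:

```python
from pathlib import Path

import pandas as pd


def load_airbnb_data(data_dir):
    """Load calendar.csv, listings.csv and reviews.csv into a dict of DataFrames.

    Assumes all three files live directly inside `data_dir`.
    """
    data_dir = Path(data_dir)
    names = ["calendar", "listings", "reviews"]
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in names}


def describe_frames(frames):
    """Print the shape and the first few column names of each DataFrame."""
    for name, df in frames.items():
        print(f"{name}: {df.shape[0]} rows x {df.shape[1]} columns")
        print("   first columns:", list(df.columns[:5]))


# Usage (hypothetical path):
# frames = load_airbnb_data("data/boston")
# describe_frames(frames)
```

Keeping the three tables in one dictionary lets the later steps (calendar prices, listing features, review text) work from a single loaded copy of the data.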
By exploring the data, we found that, after removing outliers, the price varies across time.
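The outlier rule used in the analysis is not stated. The sketch below assumes the common 1.5 * IQR rule and assumes that calendar.csv has a `date` column and a `price` column stored as text such as "$1,250.00". The function names are hypothetical:

```python
import pandas as pd


def parse_price(series):
    """Convert price strings like '$1,250.00' to floats; missing values stay NaN."""
    return pd.to_numeric(series.str.replace(r"[$,]", "", regex=True), errors="coerce")


def remove_outliers_iqr(df, column, k=1.5):
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask].copy()


def monthly_price_stats(calendar):
    """Mean, standard deviation and count of the nightly price per calendar month."""
    cal = calendar.copy()
    cal["price"] = parse_price(cal["price"])
    cal = cal.dropna(subset=["price"])  # unavailable nights have no price
    cal = remove_outliers_iqr(cal, "price")
    cal["month"] = pd.to_datetime(cal["date"]).dt.to_period("M")
    return cal.groupby("month")["price"].agg(["mean", "std", "count"])
```

Plotting the `mean` column of the result against the month index gives a price-over-time chart of the kind described above.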
We also used sentiment analysis to study the polarity and subjectivity of customer comments.
First, we can check a word cloud of the words most frequently repeated in the listing summaries.
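A word cloud is essentially a picture of word frequencies. The standard-library sketch below computes those frequencies from the listing summaries; the small stop-word list and the function name `word_frequencies` are illustrative assumptions. The resulting counts could then be passed, as a dict, to `WordCloud.generate_from_frequencies` from the wordcloud package for plotting:

```python
import re
from collections import Counter

# A small, illustrative stop-word list (an assumption; a fuller list such as
# wordcloud's STOPWORDS would normally be used).
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is",
              "with", "for", "on", "our", "this"}


def word_frequencies(texts, stop_words=STOP_WORDS, top_n=10):
    """Count lower-cased words across texts, skipping stop words and missing values.

    Returns a list of (word, count) pairs, most frequent first;
    top_n=None returns every word.
    """
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):
            continue  # skip NaN / None summaries
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(top_n)
```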
C- Data preparation for price prediction
To get the best model, we include only the columns that contribute to the model and the final result.
Some of them are numerical and others are categorical; the categorical ones need to be converted to dummy variables before we start modeling.
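A sketch of this step with pandas. The exact columns kept in the analysis are not listed, so `NUMERIC_COLS` and `CATEGORICAL_COLS` below are hypothetical choices modeled on typical Airbnb listing columns; median imputation is an added assumption, and `price` is assumed to be numeric already:

```python
import pandas as pd

# Hypothetical column choices for illustration only.
NUMERIC_COLS = ["accommodates", "bedrooms", "bathrooms"]
CATEGORICAL_COLS = ["room_type", "neighbourhood_cleansed"]
TARGET = "price"


def prepare_features(listings):
    """Select model columns, fill numeric gaps with the median, one-hot encode categoricals."""
    df = listings[NUMERIC_COLS + CATEGORICAL_COLS + [TARGET]].copy()
    df = df.dropna(subset=[TARGET])  # rows without a price cannot be used for training
    df[NUMERIC_COLS] = df[NUMERIC_COLS].fillna(df[NUMERIC_COLS].median())
    X = pd.get_dummies(
        df[NUMERIC_COLS + CATEGORICAL_COLS],
        columns=CATEGORICAL_COLS,
        drop_first=True,  # one dummy per category is redundant for linear models
        dtype=float,
    )
    y = df[TARGET]
    return X, y
```

`drop_first=True` removes one dummy per categorical column, which avoids perfectly collinear dummy columns when the features are fed to a linear regression.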
D- Modeling & Evaluation
Linear regression model
After selecting the data that affect our model, we fitted a linear regression model.
Random Forest Regressor:
A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
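To see which selected features push the predicted price up or down, the fitted coefficients can be inspected. A minimal sketch with scikit-learn, assuming `X` is a DataFrame of numeric and dummy columns; the helper name `linear_coefficients` is hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def linear_coefficients(X, y):
    """Fit ordinary least squares and return (model, coefficients sorted by absolute size)."""
    model = LinearRegression().fit(X, y)
    coefs = pd.Series(model.coef_, index=X.columns)
    # Largest effects first, regardless of sign.
    return model, coefs.reindex(coefs.abs().sort_values(ascending=False).index)
```

Coefficients are only comparable in size when the features share a scale, so standardizing numeric columns first is a common refinement.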
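The "uses averaging" part can be checked directly: for scikit-learn's `RandomForestRegressor`, the forest's prediction is the mean of the predictions of its fitted trees, which are available as `estimators_`. A small sketch on toy data (the helper name and data are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def forest_vs_tree_average(X_train, y_train, X_new, n_estimators=100, random_state=0):
    """Fit a random forest and return (forest prediction, mean of the individual tree predictions)."""
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X_train, y_train)
    forest_pred = forest.predict(X_new)
    # Each fitted tree was trained on its own bootstrap sample of the rows
    # (bootstrap=True is the default), so individual trees disagree.
    tree_preds = np.stack([tree.predict(X_new) for tree in forest.estimators_])
    return forest_pred, tree_preds.mean(axis=0)
```

Averaging many trees that each saw a different sample reduces the variance of any single deep tree, which is how the forest controls over-fitting.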
E- Deployment and Final Outcomes
We used two models to predict the price based on numerical and categorical variables:
1- Linear Regression
2- Random Forest Regressor
The plots show that the Random Forest Regressor is more accurate, achieving higher scores on both the training and test sets.
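The comparison can be reproduced along these lines, assuming "score" means scikit-learn's default `score` for regressors, the coefficient of determination R², computed on a held-out split. The split ratio, seed, number of trees and the helper name `compare_models` are assumptions, not the values used in the analysis:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def compare_models(X, y, test_size=0.3, random_state=42):
    """Return {model_name: (train R², test R²)} for linear regression and a random forest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest Regressor": RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For regressors, .score() returns the coefficient of determination R².
        scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    return scores
```

Comparing the train and test scores side by side also shows over-fitting: a forest whose train score is far above its test score is memorizing the training rows.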