How to choose the best Airbnb in Rio de Janeiro: A data science approach

Adilson Vital Junior
Jan 23, 2021
Rio de Janeiro, the Wonderful City, home to Christ the Redeemer, one of the New Seven Wonders of the World.

Introduction

Have you ever wondered which qualities an Airbnb needs in order to satisfy you and deliver a delightful experience?

In this article, I will show you some of the things you need to pay attention to when deciding, so you can have a very good experience without paying more than you need to.

For this project, all Rio de Janeiro listings for November 2020 were extracted from Inside Airbnb. The goal is to analyze which characteristics and features are most related to the price and the quality of a place, using data science and machine learning techniques to highlight some of the main correlations.

Are the amenities important for your experience? If so, which ones matter most? Are these amenities correlated with the price?

Yes, they are!

As the correlations in figure 1 show, amenities have a high positive correlation with the review scores. Kitchen items such as dishes and silverware, a refrigerator, hot water, and cooking basics are the most important ones. These items probably help a lot for guests who want to cook their own meals, giving them extra comfort and a better experience.

Other amenities such as long-term stays allowed, the host greeting you, luggage drop-off allowed, and free street parking are some of the most appreciated services as well. Services like these show commitment and responsibility on the host's side, which gives guests a sense of security.

An important thing to notice is that the amenities that improve the experience are the same ones that have a negative correlation with price. That is, listings with more of these good amenities tend to have lower prices, which is awesome for the guests! :)

Figure 1 — Top correlations with the review scores and the listing’s price.
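
To make the idea behind figure 1 concrete, here is a minimal sketch of how a correlation between one amenity and the review score or price can be computed. Each listing gets a 0/1 flag for "has this amenity", and that flag is correlated with the target using the Pearson coefficient. The listings, the field names, and the `amenity_correlations` helper are made up for illustration and are not taken from the Inside Airbnb data. In practice a library such as pandas would probably do this in one call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined if either sequence is constant (zero variance).
    return cov / sqrt(var_x * var_y)


# Hypothetical listings, for illustration only (not the real dataset).
listings = [
    {"amenities": {"Refrigerator", "Hot water"}, "score": 98, "price": 150},
    {"amenities": {"Refrigerator"}, "score": 95, "price": 180},
    {"amenities": {"Hot water"}, "score": 90, "price": 200},
    {"amenities": set(), "score": 80, "price": 260},
    {"amenities": {"Refrigerator", "Hot water"}, "score": 97, "price": 160},
    {"amenities": set(), "score": 85, "price": 240},
]


def amenity_correlations(rows, amenity):
    """Correlate a 0/1 'has this amenity' flag with score and with price."""
    flag = [1 if amenity in row["amenities"] else 0 for row in rows]
    scores = [row["score"] for row in rows]
    prices = [row["price"] for row in rows]
    return pearson(flag, scores), pearson(flag, prices)


score_corr, price_corr = amenity_correlations(listings, "Refrigerator")
print(f"Refrigerator vs score: {score_corr:+.2f}, vs price: {price_corr:+.2f}")
```

In this toy data the flag is positively correlated with the score and negatively correlated with the price, which is the same pattern described above. Keep in mind that a correlation only says that two things move together, not that one causes the other.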

Is the host a fundamental factor for the guest experience? If so, what can a host do to improve their service?

Yes, and a LOT!

From the number of host verifications to the response time, host-related features are strongly correlated with the review scores. As figure 3 shows, the faster the host responds, the more positive the correlation with the review score, and the slower the response, the more negative it becomes. It seems that guests look favorably on hosts who are committed to responding as soon as possible.

The host acceptance and response rates, superhost status, and a verified identity are also host-related features associated with better scores, again showing commitment from the hosts and security for the guests.

So, as we can see, a host who wants to improve their service needs to give guests more security and be committed to the platform: complete as many verifications as possible, answer guests as fast as possible, and keep a good acceptance rate.

Figure 2 — Top numerical features correlations for the price and the scores.
Figure 3 — Top correlations for the host response time and the scores.
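
Response time is a category rather than a number, so one way to correlate it with the scores is to turn each category into its own 0/1 column (one-hot encoding) and correlate each column separately. The sketch below does that on invented data. The category labels are assumed to match the ones used by Inside Airbnb, and the scores are made up.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation from population covariance and standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical (response time, review score) pairs, for illustration only.
rows = [
    ("within an hour", 98),
    ("within an hour", 96),
    ("within a few hours", 93),
    ("within a day", 88),
    ("a few days or more", 75),
    ("within an hour", 97),
    ("a few days or more", 80),
    ("within a day", 90),
]

scores = [score for _, score in rows]
categories = sorted({response_time for response_time, _ in rows})

# One-hot encode: one 0/1 column per category, each correlated with the score.
correlations = {}
for category in categories:
    flag = [1 if response_time == category else 0 for response_time, _ in rows]
    correlations[category] = pearson(flag, scores)

for category, corr in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<20} {corr:+.2f}")
```

With this toy data, "within an hour" ends up with a positive correlation and "a few days or more" with a negative one, which mirrors the shape of figure 3.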

Is the neighborhood important for the price? If so, which ones are the most expensive? Are neighborhoods correlated with the guests' experience?

Sure, it is!

As the average neighborhood prices in figure 4 show, some neighborhoods, like São Cristóvão and Boa Vista, are among the most expensive ones and have a considerable correlation with price. Also, figure 5 shows that only two locations, Copacabana and Ipanema, appear to deliver a good experience for guests, and these are in fact two of the most famous and beautiful places to be in Rio de Janeiro.

At the same time, all the other locations seem to have a negative correlation with the scores. Maybe this is something to pay attention to, right?

Figure 4 — Average price for each neighborhood.
Figure 5 — Top correlations of the neighborhoods an
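
The average price per neighborhood is a simple group-and-average operation. Here is a minimal sketch of it, with made-up prices and a hypothetical helper name (`average_price_by_neighborhood`):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (neighborhood, nightly price) pairs, for illustration only.
listings = [
    ("Copacabana", 300.0),
    ("Copacabana", 500.0),
    ("Ipanema", 600.0),
    ("Ipanema", 400.0),
    ("Centro", 150.0),
    ("Centro", 250.0),
    ("Centro", 200.0),
]


def average_price_by_neighborhood(rows):
    """Group prices by neighborhood and rank the averages, highest first."""
    prices = defaultdict(list)
    for neighborhood, price in rows:
        prices[neighborhood].append(price)
    averages = {name: mean(values) for name, values in prices.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = average_price_by_neighborhood(listings)
for neighborhood, avg_price in ranking:
    print(f"{neighborhood:<12} {avg_price:.2f}")
```

One thing to keep in mind when reading a chart like this: the mean is sensitive to outliers, so a neighborhood with only a few listings and one very expensive place can climb to the top. Checking the number of listings per neighborhood, or using the median, might give a more robust picture.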

Which features influence the price the most? And the review scores?

Using one XGBoost regression model for the price and another for the review scores, we can see that the location (latitude and longitude), the commitment of the host shown through good acceptance and response rates, the number of amenities, and a fast response time are some of the most important features for the review scores.

For the price, the location is again the most significant feature, which agrees with the average price by neighborhood. The number of guests accommodated, bedrooms, bathrooms, and beds are also very important and, as we saw in the correlations, the higher these numbers, the higher the price. The number of amenities can also influence the price: as we saw before, places with kitchen items tend to have lower prices while still delivering a good experience and receiving nice reviews for it.

Figure 6 — Most important features for the review scores, from the XGBoost regression model.
Figure 7 — Most important features for the price, from the XGBoost regression model.

Conclusions

In the end, a well-equipped kitchen will likely please guests and deliver a very good experience for them. The commitment of the host is also something that can change everything: answer guests' questions, and do it as soon as possible. And finally, choose the number of rooms wisely, since this will certainly influence the price. :)

More Info?

Want to see this analysis in depth? Go to my GitHub!

You can also stay in touch with me through my LinkedIn, it will be a pleasure :)

Adilson Vital Junior

Alchemy apprentice, enthusiastic about machine learning and curious about the world. Love teaching math and science to my little sister.