Posted: 2017-10-29

Apartments for rent in Lublin - exploration

Keywords: data mining, data analysis, python, pandas

Let's say you got a job offer from Lublin. Unfortunately, remote work is out of the question, so you need to relocate. Long story short, you need an apartment, but how do you go about it? Maybe you should contact a real estate agent? Maybe click through all the online postings? Or maybe filter the postings before searching, based on some initial assumptions?

Goals

Determine the state of the rental real estate market in Lublin as of October 2017. Assumptions:

  • We can spend at most 2500 PLN (2017 prices) on rent.
  • We want an apartment that is no less than 40 m2 in area.
  • The distance to the city centre should be less than 3 km.
  • There should be at least one convenience store within 1000 m of the apartment.
  • We want a decent finish: no past-era apartments, no rat-traps.

Exploration

Let's start by plotting a histogram of apartment rent prices. If the rental cost exceeds the salary from our job offer, hunting for apartments is pointless since we won't be able to afford one.

Figure 1. Apartment rent prices

It seems like most of the apartments for rent are listed for around 2000 PLN (Figure 1.), so that's within our reach. Now it would be neat if we could plot all those listings on a map to get a sense of where the apartments are. Additionally, let's colour code the markers by rent in PLN: dark blue cat_1 = (0, 1000], light blue cat_2 = (1000, 2000], green cat_3 = (2000, 2500], orange cat_4 = (2500, 3000], red cat_5 = (3000, inf).
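One way to assign these categories is pandas' cut, which bins a numeric column into right-closed intervals. The snippet below is a minimal sketch, not the code from the notebook: the column names price and price_cat are assumptions, and the sample prices are made up purely to illustrate the binning.

```python
import numpy as np
import pandas as pd

# Bin edges follow the categories above; intervals are right-closed,
# e.g. cat_2 = (1000, 2000].
PRICE_BINS = [0, 1000, 2000, 2500, 3000, np.inf]
PRICE_LABELS = ["cat_1", "cat_2", "cat_3", "cat_4", "cat_5"]


def add_price_category(df, price_col="price"):
    """Return a copy of df with a 'price_cat' column (column names are assumptions)."""
    out = df.copy()
    out["price_cat"] = pd.cut(out[price_col], bins=PRICE_BINS, labels=PRICE_LABELS)
    return out


# Made-up prices, only to illustrate the binning.
sample = pd.DataFrame({"price": [900, 1000, 1500, 2500, 2501, 3200]})
categorised = add_price_category(sample)
print(categorised["price_cat"].value_counts().sort_index())
```

The resulting column can then drive both the marker colours and the per-category box plots.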

Figure 2. Colour coded locations of apartments for rent

Most listings are located in the city centre and along the main routes (Figure 2.). Let's check the price distribution based on previously assigned categories.

Figure 3. Box plot price vs price category

There are 18 records in cat_1, 305 in cat_2, 135 in cat_3, 65 in cat_4, and 66 in cat_5 (Figure 3.). Now let's check the distribution of apartment sizes based on their rent price category.

Figure 4. Box plot space vs price category

We can see that there seems to be a positive trend between the amount of rent and apartment size (Figure 4.).

Figure 5. Price vs space

Indeed, there is a positive correlation between an apartment's price and its size (Figure 5.). Bigger apartments tend to cost more.
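To put a number on such a relationship, one option is the Pearson correlation coefficient, which pandas exposes through Series.corr. This is a hedged sketch only: the column names area and price are assumptions, and the five listings are made up to show the call, not taken from the data set.

```python
import pandas as pd


def pearson_r(df, x_col, y_col):
    """Pearson correlation coefficient between two numeric columns."""
    return df[x_col].corr(df[y_col])  # Series.corr defaults to Pearson


# Made-up listings, only to show the call; the column names are assumptions.
sample = pd.DataFrame({
    "area": [30, 45, 50, 62, 80],
    "price": [1200, 1700, 1800, 2300, 2900],
})
r = pearson_r(sample, "area", "price")
print(f"r = {r:.2f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none; the same helper can be reused for the distance comparisons that follow.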

Now let's check if there is any correlation between an apartment's location and its price. Since we have the latitude and longitude of the listings, we could compute the commute time from each apartment to the city centre. However, that would require heavy use of the Google Maps API. So instead let's just calculate the shortest distance from the apartment to a point in the city centre. It should be an OK approximation of the commute time.
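For a whole data set, the distance can be computed for all rows at once with NumPy instead of row by row. The sketch below assumes columns named latitude and longitude, uses made-up listing coordinates, and picks an approximate reference point for the centre of Lublin; none of these values come from the notebook.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371e3  # mean Earth radius in metres


def haversine_m(lat, lon, target_lat, target_lon):
    """Great-circle distance in metres; inputs are degrees, scalars or arrays."""
    phi1, phi2 = np.radians(lat), np.radians(target_lat)
    d_phi = phi2 - phi1
    d_lambda = np.radians(target_lon) - np.radians(lon)
    a = np.sin(d_phi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Assumed reference point near the centre of Lublin (approximate coordinates).
CENTRE_LAT, CENTRE_LON = 51.2465, 22.5684

# Made-up listing coordinates, only to illustrate the call.
listings = pd.DataFrame({
    "latitude": [51.2465, 51.2555, 51.2300],
    "longitude": [22.5684, 22.5684, 22.5000],
})
listings["dist_centre_m"] = haversine_m(
    listings["latitude"], listings["longitude"], CENTRE_LAT, CENTRE_LON
)
print(listings)
```

The same function could be pointed at shop coordinates to get the distance to a convenience store.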

Figure 6. Price vs distance to city centre

There seems to be a very weak negative correlation (Figure 6.) between the rent cost and the distance to the city centre: apartments closer to the city centre tend to cost slightly more. Is there a correlation between apartment size and distance to the city centre?

Figure 7. Space vs distance to city centre

This time it seems like there is no correlation (Figure 7.) between apartment size and distance to the city centre. What about the rent price and the distance to the nearest convenience store?

Figure 8. Price vs distance to the closest convenience store

Again no correlation (Figure 8.); it seems like the distance to the nearest convenience store isn't one of the factors taken into account when pricing an apartment for rent. Time to limit the number of listings to search through. We filter the data set using our initial assumptions for the maximum distance from the apartment to the city centre and to a convenience store. Using this filtered data we plot a price histogram.
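Filtering of this kind maps naturally onto pandas boolean masks. The following is a sketch under assumptions: the column names (price, area, dist_centre_m, dist_shop_m) are hypothetical, the rows are made up, and the thresholds are taken from the assumptions listed under Goals. The finish quality is left out because it cannot be expressed as a simple numeric condition.

```python
import pandas as pd

# Thresholds taken from the assumptions listed under Goals.
MAX_CENTRE_DIST_M = 3000
MAX_SHOP_DIST_M = 1000
MAX_PRICE_PLN = 2500
MIN_AREA_M2 = 40


def filter_by_distance(df):
    """Keep listings close enough to the centre and to a convenience store."""
    mask = (df["dist_centre_m"] < MAX_CENTRE_DIST_M) & (df["dist_shop_m"] <= MAX_SHOP_DIST_M)
    return df[mask]


def filter_by_budget_and_area(df):
    """Keep listings within budget and at least the minimum area."""
    mask = (df["price"] <= MAX_PRICE_PLN) & (df["area"] >= MIN_AREA_M2)
    return df[mask]


# Made-up listings, only to illustrate the two filtering steps.
sample = pd.DataFrame({
    "price": [1800, 2400, 2600, 1500],
    "area": [45, 38, 60, 50],
    "dist_centre_m": [1200, 2500, 800, 3500],
    "dist_shop_m": [300, 900, 200, 150],
})
nearby = filter_by_distance(sample)
shortlist = filter_by_budget_and_area(nearby)
print(len(sample), len(nearby), len(shortlist))
```

Keeping the distance filter separate from the budget and area filter makes it easy to plot the intermediate result before narrowing the list further.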

Figure 9. Filtered price histogram

Great, there are apartments that fulfil our assumptions (Figure 9.). Now let's add the rest of the initial assumptions and draw a box plot of price vs price category.

Figure 10. Filtered box plot price vs price category

We have 98 listings for apartments between 1000 and 2000 PLN per month, and 38 between 2000 and 2500 PLN per month (Figure 10.). Now we can manually search the filtered postings for a great deal.

Code grimoire

Geographical coordinates

Getting latitude and longitude from an address using geopy

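import pandas as pd
from geopy.geocoders import Nominatim

# Create the geocoder once, outside the loop. Recent geopy versions
# require a user_agent; the name used here is only a placeholder.
geolocator = Nominatim(user_agent="lublin-apartments")

lat = []
long = []
for address in df['location']:
    try:
        place = geolocator.geocode(address)
        lat.append(float(place.latitude))
        long.append(float(place.longitude))
    except AttributeError:
        # geocode() returned None (address not found): store NaN in both
        # lists so they stay the same length as the data frame.
        lat.append(float('nan'))
        long.append(float('nan'))
df['latitude'] = pd.Series(lat, index=df.index)
df['longitude'] = pd.Series(long, index=df.index)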

Plotting points on a map

Google provides some usable tutorials on the Google Maps APIs on their website. I combined the code from the custom markers guide with the polygon example code for the city border. I wasn't able to find a way to get the city border out of the box from Google's APIs, so here's the workaround:

  1. Go to http://nominatim.openstreetmap.org/ and search for the region you want to plot a border for.
  2. Click on details and copy the OSM number.
  3. Go to http://polygons.openstreetmap.fr/ and search for the OSM number.
  4. Download the latitude/longitude set in your preferred format and implement it (I used the above-mentioned polygon example code).
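If the border is downloaded as GeoJSON, the last step could be sketched as below. This is an illustration under assumptions, not the code used for the map: it assumes the geometry is a Polygon or MultiPolygon, keeps only the first outer ring, and converts it into lat/lng pairs of the kind the Google Maps polygon example takes. The square in the example is made up.

```python
import json


def outer_ring_to_path(geometry):
    """Convert the first outer ring of a GeoJSON Polygon or MultiPolygon
    into a list of {"lat": ..., "lng": ...} points."""
    if geometry["type"] == "Polygon":
        ring = geometry["coordinates"][0]
    elif geometry["type"] == "MultiPolygon":
        ring = geometry["coordinates"][0][0]
    else:
        raise ValueError(f"Unsupported geometry type: {geometry['type']}")
    # GeoJSON stores each position as [longitude, latitude].
    return [{"lat": pos[1], "lng": pos[0]} for pos in ring]


# Made-up square, only to illustrate the conversion.
sample_geometry = {
    "type": "Polygon",
    "coordinates": [[
        [22.5, 51.2], [22.6, 51.2], [22.6, 51.3], [22.5, 51.3], [22.5, 51.2],
    ]],
}
path = outer_ring_to_path(sample_geometry)
print(json.dumps(path))
```

The printed JSON can be pasted into the map page as the polygon's list of points.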

Calculating distance between two points - method 1

The great-circle distance between two points on a sphere is calculated using the haversine formula.

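import math

def get_distance(lat, long, target_lat, target_long):
    """Return the great-circle distance in whole metres (haversine formula)."""
    R = 6371e3  # mean Earth radius in metres
    fi1 = math.radians(lat)
    fi2 = math.radians(target_lat)
    d_fi = math.radians(target_lat - lat)
    d_lambda = math.radians(target_long - long)
    a = math.sin(d_fi / 2) ** 2 + math.cos(fi1) * math.cos(fi2) * math.sin(d_lambda / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    # math.modf returns (fractional part, integer part); keep whole metres only.
    d = math.modf(R * c)[1]
    return d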

Calculating distance between two points - method 2

The distance between two points on a sphere is calculated using geopy's built-in great_circle function.

from geopy.distance import great_circle

dist = great_circle((lat,long),(shop_lat,shop_long)).meters

Summary

  • Most apartments are listed for around 2000 PLN (2017 prices).
  • There is no correlation between distance to closest convenience store and rent price.
  • There is no correlation between apartment area and distance to city centre.
  • There is a very weak negative correlation between rent price and distance to city centre.
  • There is a positive correlation between rent price and apartment size.

This exploratory analysis allowed us to cut down the number of listings to check by 76%. The data set along with the Jupyter notebook may be cloned from this GitHub repo.

DB