ALGORITHM FOR CONSTRUCTIONAL CHARACTERISTICS DATA CLEANSING OF LARGE-SCALE PUBLIC BUILDINGS DATABASE
Free (open access)
213 - 224
HRVOJE KRSTIC, MIHAELA TENI
Research presented in this paper utilizes a public-sector buildings database obtained from the Croatian Energy Management Information System (EMIS), which comprises over 3,500 public-sector buildings. EMIS provides transparent oversight and control of energy consumption, making it an indispensable tool for systematic energy management. The EMIS database holds static technical data for each facility, including general, constructional and energy performance data, as well as dynamic energy usage data. However, many variables in the database contain impossible values, i.e. values that are illogical or fall outside possible, acceptable ranges, most likely as a consequence of user input errors. In addition, some cases have missing data. This raises the question: is it possible to devise an algorithm for data cleansing and find a way to calculate the missing data? To use the obtained database for further, more complex analyses such as clustering, machine learning and neural network applications, it is necessary to remove extreme values from the database. The research presented in this paper addresses this problem with an emphasis on buildings' constructional characteristics and proposes a cleansing algorithm. As a result, possible ranges of variables and a procedure for replacing invalid input values are proposed. The research results and findings can be applied to similar buildings databases to optimize the datasets and exclude variables with extreme values, which can significantly affect the modelling process. Further, the proposed algorithm can be useful when making decisions on energy refurbishment and building maintenance, since it eliminates cases with misleading data from the database. The presented results show that in some cases more than 80% of the data are missing or excluded. The findings can also be implemented in EMIS or a similar system to prevent further entry of unacceptable data values.
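The abstract does not spell out the cleansing rules, so the following is only a minimal Python sketch of the general idea under stated assumptions. It checks each value against a plausibility range, treats out-of-range entries as missing, reports the share of missing or excluded values per variable, and then fills the gaps. The field names and ranges in the hypothetical `VALID_RANGES` table are illustrative, not the ranges derived in the paper. Median imputation is a generic, swapped-in replacement strategy and not the replacement procedure the authors propose.

```python
from statistics import median
from typing import Optional

# HYPOTHETICAL plausibility ranges (inclusive min, max) for illustration only;
# the ranges actually derived for EMIS are not reproduced here.
VALID_RANGES: dict[str, tuple[float, float]] = {
    "construction_year": (1800, 2024),
    "heated_area_m2": (10.0, 100_000.0),
    "floors_above_ground": (1, 50),
}

Record = dict[str, Optional[float]]


def flag_invalid(records: list[Record]) -> list[Record]:
    """Return copies of the records with out-of-range values set to None."""
    cleaned = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are not modified
        for field, (lo, hi) in VALID_RANGES.items():
            value = new_rec.get(field)
            if value is not None and not (lo <= value <= hi):
                new_rec[field] = None  # impossible value is treated as missing
        cleaned.append(new_rec)
    return cleaned


def missing_share(records: list[Record]) -> dict[str, float]:
    """Fraction of records per field whose value is missing or was excluded."""
    if not records:
        return {field: 0.0 for field in VALID_RANGES}
    n = len(records)
    return {
        field: sum(1 for r in records if r.get(field) is None) / n
        for field in VALID_RANGES
    }


def impute_median(records: list[Record]) -> list[Record]:
    """Replace missing values with the median of that field's valid values.

    Generic stand-in strategy, not the paper's replacement procedure.
    """
    medians: dict[str, Optional[float]] = {}
    for field in VALID_RANGES:
        valid = [r[field] for r in records if r.get(field) is not None]
        medians[field] = median(valid) if valid else None
    return [
        {**r, **{f: medians[f] for f in VALID_RANGES if r.get(f) is None}}
        for r in records
    ]


# Usage with made-up example records (not EMIS data).
buildings: list[Record] = [
    {"construction_year": 1975, "heated_area_m2": 1200.0, "floors_above_ground": 3},
    {"construction_year": 19750, "heated_area_m2": 850.0, "floors_above_ground": None},
    {"construction_year": 1962, "heated_area_m2": -5.0, "floors_above_ground": 2},
    {"construction_year": 2001, "heated_area_m2": 640.0, "floors_above_ground": 4},
]
flagged = flag_invalid(buildings)
print(missing_share(flagged))
# {'construction_year': 0.25, 'heated_area_m2': 0.25, 'floors_above_ground': 0.25}
filled = impute_median(flagged)
```

Separating the flagging step from the replacement step keeps the share of missing or excluded values measurable before any imputation, which is the kind of figure the abstract reports (more than 80% for some variables). When a variable is that sparse, excluding it may be more defensible than filling it in.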
Energy Management Information System, public sector buildings database, building characteristics, building maintenance, building energy refurbishment