NYC Planning’s Data Engineering team is transforming the way we think about Open Data in government by opening up the processes behind making public datasets. As data engineers, we develop new data products and modernize the creation of existing datasets such as PLUTO, NYC’s definitive tax lot dataset, which contains over 800k rows and 87 columns capturing lot-level, building-level, and geospatial attributes drawn from a dozen input data sources. As an example of the type of work that we do, during this talk we’ll show how we re-engineered PLUTO and verified that the data matched previous versions, discuss why it is important to us that the code to build PLUTO is available on GitHub, and describe where we’re going next.
We’re not just excited about the data products we build, though; we’re equally passionate about how we build them. In the latter portion of the talk, we’ll do a deep dive into a couple of the core technologies that enable us to iteratively integrate improvements, distribute our maintenance responsibilities, and generate products efficiently.