Big Data to Pave the Way for Smart Cities
There has been a lot of discussion about the concept of a ‘Smart City’ for quite some time now, and today’s cities are already being thought of as future smart cities. But wait: what exactly is a smart city? In theory, smart cities can transform people’s lives on numerous levels: less garbage, lower pollution, energy savings and a lot more. Though the idea seems very appealing at first, actual implementation of the concept has always faced massive hurdles for a number of reasons. With the advent of big data analytics and numerous technology upgrades, however, the ‘hype’ is set to turn into reality in the near future.
If the figures revealed by the New Jersey Institute of Technology (NJIT) are to be believed, technological upgrades for smart cities will grow into an industry worth more than $25 billion by 2023. The same study projects around 80 smart cities across the globe by 2025.
Popularly known as the ‘city-in-a-box’, Songdo, South Korea is a perfect example of how a city can be transformed by big data analytics. Built with the vision of a connected city, Songdo’s developers wired every inch of it with fiber optic broadband. Dubbed “the smart city of the future” by some, Songdo features eco-friendly buildings, a lower energy footprint and a lot of green space.
Numerous sensors across the city monitor traffic flow, pollution levels, temperature and energy use, among other things. This data makes residents’ lives safer and more secure: children wearing sensor-integrated bracelets, for example, can be tracked if they go missing. Even garbage collection in this one-of-its-kind smart city generates data, and Songdo’s planners and developers are working on a model that eliminates garbage collection trucks altogether. Other features of the city include smart energy grids and RFID (Radio Frequency Identification) tags on vehicles.
The concept of smart cities might sound like a far-fetched dream, but some urban areas have already been leveraging it for some time. Los Angeles’ interconnected network of LED street lights lets the local government know the condition of each bulb in the city; it results in brighter streets and tracks the status of any malfunctioning bulbs. The Natural History Museum in Shanghai has a unique spiral shape, partially inspired by the shell of the pearly nautilus, and uses big data analytics to control crowd movement.
The smart city chapter is still in the ‘I’ll get it when I see it’ phase, but developers have already started exploring every possible aspect of the concept. From managing parking problems to reducing emissions and bringing down pollution levels, big data analytics has opened up a new scope for what can be controlled in our cities.
Building and incorporating big data applications will require developers to address a new set of challenges through rigorous development and design models. With a better understanding of the concept and the right success factors in place, building a smart city, and then improving it further with even smarter services and models, becomes a sustainable and attainable goal. https://goo.gl/U98ShG #DataScience #Cloud
Image identification using a convolutional neural network
This blog explores a typical image identification task using a convolutional (“Deep Learning”) neural network. For this purpose we will use the simple JavaCNN package by D. Persson, and keep our example small and concise using the Python scripting language. This example can also be rewritten in Java, Groovy, JRuby or any other scripting language supported by the Java virtual machine.
This example will use images in the grayscale PGM format. The name “PGM” is an acronym for “Portable Gray Map”; cell values range from 0 to 255. The files are typically binary (with the magic value “P5”, which you can see by opening one of these files in a text editor), but you can convert them to the “plain” (uncompressed) PGM format, where each pixel in the raster is represented as an ASCII decimal number (of arbitrary size).
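The plain format is easy to inspect by eye. As an illustration, a small hypothetical 4×4 image in the plain format looks like this: the “P2” magic value marks a plain (ASCII) PGM file and is followed by the width, the height, the maximum gray value and the pixel values themselves:
P2
4 4
255
0 51 102 153
51 102 153 204
102 153 204 255
153 204 255 255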
Our input images come from a publicly available database (the CBCL Face Database, MIT Center for Biological and Computational Learning). Let us copy a zip file with this (slightly modified) database and unzip it. To do this, install the most recent DataMelt program, create a file “example.py” and run these commands using DataMelt:
from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
The last command unzips two directories, “train” and “test”. You can omit “print”, which is only used to show the status of these commands. Each directory contains images with faces (“face_*”) and other images (“cmu_*”). Note that “_” in the file name is important, since it helps identify the image type. The “train” directory has about 1500 files with images of faces and 13700 files with other types of images. Let us look at one image and study its properties. We will use the ImageJ (IJ) Java package. Append the following code to your previous lines:
from ij import *
imp = IJ.openImage("mitcbcl_pgm_set2/train/face_00001.pgm")
print "Width:", imp.width, " Height:", imp.height
imp.show() # show this image in a frame
ip = imp.getProcessor().convertToFloat()
pixels = ip.getPixels() # get array of pixels
print pixels # print array with pixels
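# Note that getPixels() on the float processor returns the image as a
# flat 1D array in row-major order, so a pixel at column x and row y
# (hypothetical coordinates, used here only for illustration) can be
# addressed as:
x, y = 3, 5
print "Pixel at (3,5):", pixels[y*imp.width + x]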
These commands show the image on the screen, print its size (19×19 pixels) and print the array of pixel values. Now let us create a program which will do the following:
* Reads the images from the “train/” directory
* Reads the images from the “test/” directory
* Initializes the CNN using several convolutional and pooling layers
* Runs over 50 iterations; you can increase or decrease this number depending on the required precision of image identification
* During each iteration, calculates the probability of correctly identifying the face images from the “test/” directory and saves the CNN to a file
* At the end of the training, reads the trained CNN back from the file and performs a final run over the test images, printing the predictions
Copy the lines below, save them in a file “example.py”, and run this code inside DataMelt:
from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
NMax=50 # Total runs. Reduce this number to get results faster
from org.ea.javacnn.data import DataBlock,OutputDefinition,TrainResult
from org.ea.javacnn.layers import DropoutLayer,FullyConnectedLayer,InputLayer,LocalResponseNormalizationLayer
from org.ea.javacnn.layers import ConvolutionLayer,RectifiedLinearUnitsLayer,PoolingLayer
from org.ea.javacnn.losslayers import SoftMaxLayer
from org.ea.javacnn.readers import ImageReader,MnistReader,PGMReader,Reader
from org.ea.javacnn.trainers import AdaGradTrainer,Trainer
from org.ea.javacnn import JavaCNN
from java.util import ArrayList,Arrays
from java.lang import System
layers = ArrayList(); de = OutputDefinition()
print "Total number of runs=", NMax
print "Reading train sample.."
mr = PGMReader("mitcbcl_pgm_set2/train/")
print "Total number of training images=", mr.size(), " Nr of types=", mr.numOfClasses()
print "Reading test sample.."
mrTest = PGMReader("mitcbcl_pgm_set2/test/")
print "Total number of test images=", mrTest.size(), " Nr of types=", mrTest.numOfClasses()
modelName = "model.ser" # save the trained network to this file
layers.add(InputLayer(de, mr.getSizeX(), mr.getSizeY(), 1))
layers.add(ConvolutionLayer(de, 5, 32, 1, 2)) # uses different filters
layers.add(RectifiedLinearUnitsLayer()) # applies the non-saturating activation function
layers.add(PoolingLayer(de, 2,2, 0)) # creates a smaller zoomed out version
layers.add(ConvolutionLayer(de, 5, 64, 1, 2))
layers.add(RectifiedLinearUnitsLayer())
layers.add(PoolingLayer(de, 2,2, 0))
layers.add(FullyConnectedLayer(de, 1024))
layers.add(LocalResponseNormalizationLayer())
layers.add(DropoutLayer(de))
layers.add(FullyConnectedLayer(de, mr.numOfClasses()))
layers.add(SoftMaxLayer(de))
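# The stack above is a typical small CNN: two convolution+ReLU+pooling
# stages extract and progressively downsample image features, the
# fully connected layer with 1024 neurons combines them, dropout
# reduces overfitting, and the last fully connected layer produces one
# output per image class, which the softmax layer converts into class
# probabilities.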
print "Training.."
net = JavaCNN(layers)
trainer = AdaGradTrainer(net, 20, 0.001)
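# The AdaGradTrainer implements the adaptive-gradient optimizer; the
# arguments after the network (20 and 0.001) are presumably the
# mini-batch size and a learning/decay constant. Check the JavaCNN
# documentation for their exact meaning.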
from jarray import zeros
numberDistribution, correctPredictions = zeros(10, "i"), zeros(10, "i")
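# Per-class counters: numberDistribution[c] counts the test images of
# class c, while correctPredictions[c] counts how many of them the
# network identifies correctly.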
start = System.currentTimeMillis()
db = DataBlock(mr.getSizeX(), mr.getSizeY(), 1, 0)
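# DataBlock holds one image to feed to the network; the arguments are
# presumably the width, the height, the number of channels (1 for
# grayscale) and an initial fill value.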
for j in range(NMax):
    loss = 0
    for i in range(mr.size()):
        db.addImageData(mr.readNextImage(), mr.getMaxvalue())
        tr = trainer.train(db, mr.readNextLabel())
        loss = loss + tr.getLoss()
        if (i != 0 and i % 500 == 0):
            print "Nr of images: ", i, " Loss: ", (loss/float(i))
    print "Loss: ", (loss/float(mr.size())), " for run=", j
    mr.reset()
    print "Wait.. Calculating predictions for labels=", mr.getLabels()
    Arrays.fill(correctPredictions, 0)
    Arrays.fill(numberDistribution, 0)
    for i in range(mrTest.size()):
        db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
        net.forward(db, False)
        correct = mrTest.readNextLabel()
        prediction = net.getPrediction()
        if (correct == prediction): correctPredictions[correct] += 1
        numberDistribution[correct] += 1
    mrTest.reset()
    print " -> Testing time: ", int(0.001*(System.currentTimeMillis() - start)), " s"
    print " -> Current run:", j
    print net.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())
    print " -> Save current state to ", modelName
    net.saveModel(modelName)
print "Read trained network from ", modelName, " and make the final test"
cnn = net.loadModel(modelName)
Arrays.fill(correctPredictions, 0)
Arrays.fill(numberDistribution, 0)
for i in range(mrTest.size()):
    db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
    cnn.forward(db, False) # use the network loaded from the file
    correct = mrTest.readNextLabel()
    prediction = cnn.getPrediction()
    if (correct == prediction): correctPredictions[correct] += 1
    numberDistribution[correct] += 1
print "Final test:"
print cnn.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())
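Once “model.ser” exists, there is no need to retrain the network to classify new images. Below is a minimal sketch of how a single image could be classified with the saved model; it assumes the objects created in the script above (net, mrTest) are still available and reuses the same calls as the final test:
mrTest.reset() # rewind the reader to the first test image
cnn = net.loadModel(modelName) # restore the trained network
db1 = DataBlock(mrTest.getSizeX(), mrTest.getSizeY(), 1, 0)
db1.addImageData(mrTest.readNextImage(), mrTest.getMaxvalue()) # one image
cnn.forward(db1, False) # forward pass only, no training
print "Predicted class index:", cnn.getPrediction()
print "Known labels:", mrTest.getLabels()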
Fifty iterations usually take a few hours. The final probability of correctly identifying images with human faces is close to 85%. Given the complexity of this task, that is rather decent performance. https://goo.gl/QeHEku #DataScience #Cloud