Amazon EC2 Spot Instances from Amazon Web Services (AWS) allow us to bid on spare Amazon EC2 computing capacity. Since Spot Instances are often available at a discount compared to On-Demand pricing, we can significantly reduce the cost of running our applications. However, the availability and price of Spot Instances are driven by market demand, and there is no guarantee of availability. If another bid exceeds ours, or if the capacity is reclaimed to serve On-Demand customers, our instances can be terminated on short notice. In this work, we developed an analytics framework that retrieves historical spot price data through web APIs, builds models using linear, polynomial, multiple linear, and k-nearest-neighbors (kNN) regression, and predicts availability for the current scenario. Data for the current scenario is retrieved through web scraping.
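As a rough illustration of two of the model families named above, the sketch below fits a simple linear regression and a kNN regression to a short price history and compares the predicted price with a bid. The price values, the bid, and the rule "predicted price at or below our bid means the instance is likely available" are assumptions made for this example, not the framework's actual data or logic.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def knn_predict(xs, ys, x_new, k=3):
    """Predict y at x_new as the mean target of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return mean(y for _, y in nearest)


# Made-up hourly spot prices in USD/hour (not real AWS data).
hours = [0, 1, 2, 3, 4, 5]
prices = [0.030, 0.032, 0.034, 0.036, 0.038, 0.040]

slope, intercept = fit_line(hours, prices)
predicted = slope * 6 + intercept      # linear forecast for the next hour
knn_estimate = knn_predict(hours, prices, 6, k=3)

our_bid = 0.045                        # hypothetical bid
likely_available = predicted <= our_bid  # assumed availability rule
```

On a steadily rising series like this one, the linear fit extrapolates the trend while kNN can only average prices it has already seen, which is one reason to compare several model types.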
Fitbit is a popular activity tracker that measures data such as the number of steps walked, heart rate, quality of sleep, steps climbed, and other personal fitness metrics. Users can sync a Fitbit with the mobile app, which provides a dashboard. In addition, Fitbit allows users to download the raw data and analyze their activities themselves. In this project, we analyzed the Fitbit data of several users and compared their activities. We analyzed the weekly seasonality of activities (weekdays vs. weekends) and sleep patterns. We also developed statistical models to obtain an empirical relationship between number of steps and calories burned based on age, gender, height, and weight.
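The weekday vs. weekend comparison can be sketched as a simple grouping of daily step counts by day of week. The step counts below are invented for illustration, and the function name `weekday_weekend_means` is hypothetical; an actual Fitbit export would need to be parsed into this shape first.

```python
from datetime import date
from statistics import mean

# Hypothetical daily step counts for one user over one week.
daily_steps = {
    date(2016, 5, 2): 8000,   # Monday
    date(2016, 5, 3): 7000,
    date(2016, 5, 4): 9000,
    date(2016, 5, 5): 8000,
    date(2016, 5, 6): 8000,   # Friday
    date(2016, 5, 7): 12000,  # Saturday
    date(2016, 5, 8): 10000,  # Sunday
}


def weekday_weekend_means(steps_by_day):
    """Return (mean weekday steps, mean weekend steps)."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    weekday = [s for d, s in steps_by_day.items() if d.weekday() < 5]
    weekend = [s for d, s in steps_by_day.items() if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


weekday_avg, weekend_avg = weekday_weekend_means(daily_steps)
```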
Many complex machine learning algorithms are developed and used for image processing applications. Hyperspectral image processing has niche application areas, of which vegetation classification is an important one. In this project, AVIRIS sensor data from a test site was processed and classified into 16 types of crops. Dimensionality reduction techniques such as PCA and ICA and classification algorithms such as Support Vector Machines and Linear Discriminant Analysis were implemented. A classification accuracy of 80% was obtained. Spectral unmixing techniques were also evaluated.
Analyzing the portfolios of companies is a key activity for investment banking companies. In this project, a dashboard was built to compare key portfolio metrics of different companies. A recommendation engine was developed to identify a list of companies that could potentially be acquired.
There is increasing concern over health issues, especially for women. Numerous diagnostic tests exist; however, not all of them are relevant for a given individual. Typically, a defined set of master health checkup packages is available. The many possible combinations of health complications make it hard for a patient to select a particular package. In this project, classification models were built on historical patient data (health complications and the checkup package recommended by the doctor). Decision trees were built, which make it easy for patients to choose the right package for them.
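To give a sense of how a decision tree chooses its first question, the sketch below scores each yes/no feature by weighted Gini impurity and picks the one that separates the packages most cleanly. The features (diabetes, hypertension), the package names, and the four patient records are hypothetical, and Gini impurity is one common splitting criterion, not necessarily the one used in this project.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, weighted_gini) for the best boolean feature."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        yes = [y for r, y in zip(rows, labels) if r[f]]
        no = [y for r, y in zip(rows, labels) if not r[f]]
        if not yes or not no:
            continue  # this feature does not separate the data at all
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best


# Hypothetical patients: (has_diabetes, has_hypertension).
patients = [(True, False), (True, True), (False, True), (False, False)]
# Hypothetical package the doctor recommended for each patient.
packages = ["diabetic", "diabetic", "basic", "basic"]

root_feature, root_impurity = best_split(patients, packages)
```

Here the diabetes feature splits the records into two pure groups, so it becomes the root question; a full tree would repeat the same selection within each branch.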
Achieving high throughput is a core target for any manufacturing company. However, the target throughput should be realistic for the plant floor managers to achieve. In this project, historical throughputs for different shifts and different manufacturing processes were analyzed for a steel manufacturing company. Models were developed to predict the throughput based on the day of the week, operating shift, thickness of steel, and other factors. This exercise also helped the manufacturing company make commitments to its clients on the delivery of finished pieces.
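A very simple baseline for this kind of prediction is the historical mean throughput for each combination of day and shift. The sketch below shows that baseline only; the records, units, and function names are made up, and the project's actual models (which also used steel thickness and other inputs) are not described in enough detail to reproduce here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (day of week, shift, throughput in tons).
records = [
    ("Mon", "day", 120.0),
    ("Mon", "day", 130.0),
    ("Mon", "night", 100.0),
    ("Tue", "day", 140.0),
]


def fit_group_means(rows):
    """Map each (day, shift) pair to its mean historical throughput."""
    groups = defaultdict(list)
    for day, shift, throughput in rows:
        groups[(day, shift)].append(throughput)
    return {key: mean(values) for key, values in groups.items()}


def predict(model, day, shift, default=None):
    """Look up the expected throughput; return default for unseen pairs."""
    return model.get((day, shift), default)


model = fit_group_means(records)
monday_day_target = predict(model, "Mon", "day")
```

A target set from such group means reflects what each shift has actually achieved, which is the sense in which it stays realistic for the plant floor.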
Analyzing the macroeconomics of a country is important not only for policy makers but also for people doing business in that country. For real estate companies, the unemployment rate and new housing construction are key macroeconomic factors that must be monitored. In this project, historical trends in the unemployment rate and new housing construction in the United States were analyzed. The analysis was used to forecast the future of the real estate market.