This solution aims to create a powerful decision support system with rich data visualizations. It does this by identifying relationships between the various data sources available on data.gov.in and building specialized data marts that enable users to make informed decisions.
This solution is designed to inspire users to put on their thinking hats and spark creativity by enabling deeper analysis of government data.
The solution shall build an Operational Data Store (ODS), a Data Warehouse (DW) and Data Marts that pull data from data.gov.in datasets. Data received in CSV format will be staged to a SQL Server database, and SQL Server Integration Services (SSIS) packages will be used to perform the ETL operations.
Data quality rules will be defined for domains using Data Quality Services (DQS). SSIS packages shall use the domain rules defined in Data Quality Services to quality-check and cleanse the data. The Kimball data quality screen process will be implemented to log the quality of the data, and data quality reports will be generated using Reporting Services.
The solution shall provide a Master Data Management (MDM) tool, built on Master Data Services, that authorized government members can edit. For example, regions within the country are managed by the Government of India; in the recent case of Telangana becoming a state, an authorized person can use the MDM interface to create an entry in the State attribute of the MDM entity Region.
The ODS shall be modelled in our solution; for example, Region shall be one entity of the ODS. The hierarchies within each entity shall also be defined, say Country -> State -> District -> Corporation. When data is received from the different sources it goes through the ETL process, which loads it into these entities. The ODS shall be exposed as open data through REST APIs that the public can access (read only) and that authorized people can also write to.
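The read-only-for-public / writable-for-authorized-users access rule can be sketched as follows. This is a minimal Python illustration of the dispatch logic, not the actual service implementation; the entity data, user names and status codes are hypothetical.

```python
# Illustrative ODS access-control sketch: GET is open to everyone,
# POST requires an authorized writer. All names here are made up.

ODS_ENTITIES = {"Region": [{"Country": "India", "State": "Telangana"}]}
AUTHORIZED_WRITERS = {"gov_user_1"}

def handle_request(method, entity, user=None, payload=None):
    """Dispatch a REST-style request against the ODS and return
    an (http_status, body) pair."""
    if entity not in ODS_ENTITIES:
        return 404, None
    if method == "GET":                      # public, read-only
        return 200, list(ODS_ENTITIES[entity])
    if method == "POST":                     # writes need authorization
        if user not in AUTHORIZED_WRITERS:
            return 403, None
        ODS_ENTITIES[entity].append(payload)
        return 201, payload
    return 405, None
```

A public caller issuing `handle_request("GET", "Region")` gets the rows back, while a `POST` from an unrecognized user is rejected with 403.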
The data warehouse shall be modelled following a top-down approach. Data from the ODS is moved to the data warehouse, and data marts are then created for each section. For example, Trade Analysis shall be a data mart for analyzing trade data from the Ministry of Commerce; Accident Analysis is another data mart built on data from the Ministry of Road Transport, and so forth.
The public can access the data using a Portal which will provide rich Data visualization categorized by subject area – Health, Road Accidents, Trade, Agriculture etc.
Government/NGO/public – The solution will enable public users to visualize and analyze open data using data marts designed for each subject area
Government/Government Statistical Departments – Improve the quality of data by setting rules for data cleansing with the help of our technical team.
Media/Internet feeds – The solution will implement an operational data store with data retrieved from various datasets. A RESTful web service will be built over this ODS to provide data in XML/JSON format that third parties can interpret and present in a different format, e.g. charts
All devices supported by Power BI tools – Windows mobile primarily
SharePoint 2013 Website shall be accessible from mobile browsers
Data Loading from Sources / ETL – SQL Server Integration Services 2012
Data Cleanse and Quality control – Data Quality Services
Master Data Management – Master Data Services
Data Storage – SQL Server 2012
Data Marts – SSAS Tabular and Multidimensional
Portal – SharePoint 2013
Data Visualization – Power View / Reporting Services
Screen shots of the solution
This is the login page. Users can log in to the system and, based on their role (Citizen or Government), the appropriate modules will be shown
Best place to live analysis
This is one of the sample pre-defined analyses in the system, available for public as well as government use.
Government user landing page
Reflections follows a standard ETL architecture from staging through to the data marts, and we have a well-defined format for our SSIS packages. As shown below, data load packages will have four steps –
1. Capture the audit log – which package is being loaded and the start time of the package
2. Delete the existing data in the table (staging only)
3. Use data flow transformations to load the new data
4. Complete the package with another audit task that captures the end time of the package and the number of records loaded
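The four-step package layout above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the SSIS implementation; the in-memory lists stand in for the audit and staging tables, and all names are illustrative.

```python
# Sketch of the four-step load package: audit start, truncate staging,
# load, audit end. Lists stand in for database tables.
import datetime

AUDIT_LOG = []   # stands in for the audit table
STAGING = []     # stands in for a staging table

def run_load_package(package_name, source_rows):
    # 1. Capture the audit log: package name and start time.
    entry = {"package": package_name,
             "start": datetime.datetime.now(), "end": None, "rows": 0}
    AUDIT_LOG.append(entry)
    # 2. Delete the existing data (staging only).
    STAGING.clear()
    # 3. Load the new data (the data-flow step).
    STAGING.extend(source_rows)
    # 4. Closing audit task: end time and record count.
    entry["end"] = datetime.datetime.now()
    entry["rows"] = len(STAGING)
    return entry
```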
The data in staging is not strongly typed. Staging tables will have nvarchar fields so as to accommodate all incoming data during the load. This data is then loaded to the Quality DB by type-casting it; rows that fail the type cast or the insert into the Quality DB are pushed to the Bad DB. Correct data is sent to the Quality DB for data quality and cleansing activities.
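The cast-and-route step can be illustrated with a small Python sketch. The column names, target types and sample rows are hypothetical; the point is only that every staging value is a string, and rows that fail casting are diverted to the Bad DB rather than dropped.

```python
# Sketch: staging holds untyped strings; rows that cast cleanly go to
# the Quality DB, rows that fail go to the Bad DB. Names are illustrative.

STAGING_ROWS = [
    {"district": "Hyderabad", "accidents": "120"},
    {"district": "Chennai",   "accidents": "n/a"},   # bad value
]
TYPES = {"district": str, "accidents": int}

def route_rows(rows, types):
    quality_db, bad_db = [], []
    for row in rows:
        try:
            typed = {col: types[col](val) for col, val in row.items()}
            quality_db.append(typed)
        except (ValueError, TypeError):
            bad_db.append(row)        # keep the raw row for later review
    return quality_db, bad_db
```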
Data from the Quality DB is passed through a Data Quality Services transformation that applies cleansing rules to identify valid and invalid data and to correct the invalid data where possible. Valid/corrected data moves to the Clean DB, and the invalid data is pushed to the Bad DB.
In the DQS transformation, we map the incoming data fields to a domain already created in the DQS client. The rules defined in the domain are then applied to each mapped field.
In the Data Quality Services client, we first create a knowledge base and then a domain. The domain holds the set of rules, which are published to the knowledge base to validate the data. We can then run knowledge discovery against existing data to identify new values in the database; based on these, the rules can be modified and republished.
DQS Domain Rules
This is where we can specify the valid and invalid values for a domain. We can add multiple rules for a single domain. Once the domain is created, it can be published.
DQS Domain Values
Once we have the domain rules created, we can identify domain values and suggest corrections to the data. The output of data validated using these rules will have valid values, corrected values and invalid data. The valid/corrected data is sent to the Clean DB, then to the data warehouse, and finally to the specialized data marts for further analysis. The bad data is pushed to the Bad DB, and a data quality report is generated from the Bad DB statistics. This is explained further below.
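The valid/corrected/invalid split produced by the domain values can be sketched as follows. This is a Python illustration of the screening logic, not the DQS engine; the state names and the correction table are made up for the example.

```python
# Sketch of domain-value screening: known-good values pass, known
# misspellings are corrected, everything else goes to the Bad DB.
# The value and correction sets are illustrative.

VALID_VALUES = {"Telangana", "Kerala", "Tamil Nadu"}
CORRECTIONS = {"Telengana": "Telangana", "Kerela": "Kerala"}

def screen_domain(values):
    clean, bad = [], []
    for v in values:
        if v in VALID_VALUES:
            clean.append(v)                  # valid as-is
        elif v in CORRECTIONS:
            clean.append(CORRECTIONS[v])     # corrected value
        else:
            bad.append(v)                    # routed to the Bad DB
    return clean, bad
```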
Master Data Services
We use Master Data Services as our master data management tool. Reflections will define the master entities, and authorized government personnel can add or update the master data. This ensures that the same golden record is available across all the government data that this system processes.
MDS – Explorer
Authorized government users can add or edit data in the master tables.
MDS – Manage users
Reflections can add users to the MDS system using MDS security settings. Users added can be validated against Active Directory, and user groups can be added as well.
MDS – Manage permissions
We can set permissions for users/groups in MDS at various levels. Security settings can be applied at the model, entity, or attribute level, which means a user can be granted access to the entire database or to a single column in a table. The same applies to access denial.
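The model/entity/attribute layering can be illustrated with a most-specific-wins lookup. This is a Python sketch of the idea only, not the MDS security API; the user, model and permission names are hypothetical.

```python
# Sketch of layered permissions: the most specific scope that has an
# explicit setting wins, so an attribute-level Deny overrides an
# entity-level Update. All names are illustrative.

PERMISSIONS = {
    ("analyst", "GovData"): "Read",                     # model level
    ("analyst", "GovData", "Region"): "Update",         # entity level
    ("analyst", "GovData", "Region", "State"): "Deny",  # attribute level
}

def effective_permission(user, model, entity=None, attribute=None):
    """Walk from the most specific scope to the least and return the
    first explicit permission found."""
    scopes = [(user, model, entity, attribute),
              (user, model, entity),
              (user, model)]
    for scope in scopes:
        if None not in scope and scope in PERMISSIONS:
            return PERMISSIONS[scope]
    return "None"
```

An entity with no explicit setting (say, a hypothetical `Customer`) simply inherits the model-level permission.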
The ETL Framework
Data Quality Reports
These reports help us understand the quality of data over time. They help bring down data quality issues and hence raise the standard of the data being fed into the system.
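A minimal version of the trend report is a count of Bad DB rejects per load date. The sketch below assumes a hypothetical Bad DB row shape with `load_date` and `reason` fields; the actual report would be an SSRS report over the real Bad DB tables.

```python
# Sketch: summarize Bad DB rejects per load date to show whether
# data quality improves over time. Row fields are illustrative.
from collections import defaultdict

BAD_DB = [
    {"load_date": "2014-06-01", "reason": "cast_error"},
    {"load_date": "2014-06-01", "reason": "invalid_domain_value"},
    {"load_date": "2014-06-08", "reason": "cast_error"},
]

def bad_rows_by_date(bad_rows):
    report = defaultdict(int)
    for row in bad_rows:
        report[row["load_date"]] += 1
    return dict(sorted(report.items()))
```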
Other specialized analysis – Accident Analysis
Other specialized analysis – Industrial GDP Analysis