Data Mining - Data Warehouse Security, Study notes of Data Mining

Detailed summary of Data Warehouse Security: security, user access hierarchy, backup and recovery, testing the data warehouse, and references.

Typology: Study notes

2010/2011

Uploaded on 09/04/2011

amit-mohta
November 25, 2014 Data Mining: Concepts and Techniques

Data Warehouse Security

  • Responsibility and Confidentiality: The Data Warehouse contains confidential and sensitive University data. To use its data, you must have proper authorization. Authorization gives you both the authority to use the data and the responsibility to share stewardship of the data with the other users of the collection.
  • Once authorized, you can access the data that you need to do your job. All authorized users are cautioned, however, that they are entrusted to use the data they retrieve from the Warehouse with care. Confidential data should not be released to anyone except those with a "legitimate need to know."
  • Never share Business Objects queries with other users with the data intact; send the query without the data. More information is available about sending and saving Business Objects documents.
  • Querying Data with Security Restrictions: If you execute a query requesting data that you are not authorized to access, the results may be incomplete because they omit the data you are not allowed to access.
  • If your authorization is limited to a specific set of data, make sure that your record selection conditions include your security restrictions. For example, if you are authorized to access data for just one department, one of your record selection conditions should state something like "If Organization = 'My Organization'," where My Organization is the code of your department. This documents why the query returns the results it does, and also helps the query run faster.
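
The record selection condition described above can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module rather than Business Objects; the table name `expenses`, its columns, and the department code "1234" are all hypothetical.

```python
import sqlite3

# Hypothetical table, columns, and department codes, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (organization TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [("1234", 100.0), ("1234", 250.0), ("5678", 75.0)],
)

def query_expenses(connection, my_organization):
    """Return expense rows restricted to the caller's own department code."""
    # The WHERE clause states the security restriction explicitly, so the
    # query itself documents why only this department's rows come back.
    sql = (
        "SELECT organization, amount FROM expenses "
        "WHERE organization = ? ORDER BY amount"
    )
    return connection.execute(sql, (my_organization,)).fetchall()

rows = query_expenses(conn, "1234")
print(rows)
```

Stating the restriction in the query, instead of relying on the warehouse to filter rows silently, makes it clear to anyone reading the query why the result set is limited to one department.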

SECURITY

  • A DWH is by nature an open, accessible system. The aim of a DWH is generally to make large amounts of data easily accessible to users, thereby enabling them to extract information about the business as a whole.
  • It is important to establish early any security and audit requirements that will be placed on the DWH.
  • Adding security will affect performance, because the extra checks take CPU cycles and time to perform.
  • Requirements: Security can affect many different parts of the DWH, such as:
    • User access, which can be controlled by
      • Data classification
        • By level of security required
        • By data sensitivity
        • By job function
      • User classification
        • Top-down company hierarchy (department, section, group)
        • By role
    • Data load
    • Legal requirements: it is vital to establish any legal requirements on the data being stored.
    • Audit requirements: such as connections, disconnections, data access, data change.
    • Network requirements: (routes)
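
One way to combine data classification, user classification by role, and an audit trail of data access is sketched below. The sensitivity levels, role clearances, and names such as `AccessControl` are assumptions made for this example, not part of any particular DWH product.

```python
from dataclasses import dataclass, field

# Hypothetical sensitivity levels (data classification) and role clearances
# (user classification). Higher numbers mean more sensitive / more trusted.
SENSITIVITY = {
    "reference": 0,
    "summarized_sales": 1,
    "detailed_sales": 2,
    "detailed_customer": 3,
}
ROLE_CLEARANCE = {"analyst": 1, "senior_analyst": 2, "administrator": 3}

@dataclass
class AccessControl:
    # Every access attempt is recorded, granted or not (audit requirement).
    audit_log: list = field(default_factory=list)

    def can_read(self, user, role, dataset):
        """Allow access when the role's clearance covers the data's sensitivity."""
        allowed = ROLE_CLEARANCE[role] >= SENSITIVITY[dataset]
        outcome = "granted" if allowed else "denied"
        self.audit_log.append((user, role, dataset, outcome))
        return allowed

acl = AccessControl()
summary_ok = acl.can_read("alice", "analyst", "summarized_sales")
detail_ok = acl.can_read("alice", "analyst", "detailed_customer")
print(summary_ok, detail_ok, acl.audit_log)
```

Each check and each log entry adds a little work per request, which is the performance cost of security that the notes mention.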

User Access Hierarchy

[Figure: user access hierarchy for Data Warehouse Inc. Labels recoverable from the diagram: Administrators, Senior Analyst, Sales, Marketing, and Analyst roles; "Administer" access labels; data sets Detailed Sales Data, Detailed Customer Data, Summarized Sales Data, and Reference Data.]

Backup and Recovery

  • Backup is one of the most important regular operations carried out on any system.
  • It is especially important in the DWH environment because of the volumes of data involved and the complexity of the system.
  • Types of backup
    • In a complete backup, the entire database is backed up at the same time. This includes all database data files, the control files and the journal files.
    • A partial backup is any backup that is not complete.
    • A cold backup is a backup taken while the database is completely shut down. In a multi-instance environment, all instances of that database must be shut down.
    • A hot backup is any backup that is not cold.
    • Online backup is a synonym for hot backup.
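
The definitions above reduce to two independent questions: does the backup cover every database file, and was any instance running when it was taken? A minimal sketch, assuming hypothetical file names and a simple count of running instances:

```python
def classify_backup(files_backed_up, all_database_files, running_instances):
    """Classify a backup by scope (complete/partial) and state (cold/hot).

    all_database_files should list every data file, control file and
    journal file; running_instances is how many instances of the database
    were up while the backup was taken.
    """
    covers_everything = set(all_database_files) <= set(files_backed_up)
    scope = "complete" if covers_everything else "partial"
    # Cold requires every instance to be shut down; anything else is hot.
    state = "cold" if running_instances == 0 else "hot"
    return scope, state

# Hypothetical file names, for illustration only.
all_files = ["data01.dbf", "data02.dbf", "control.ctl", "journal.log"]
full_cold = classify_backup(all_files, all_files, running_instances=0)
part_hot = classify_backup(["data01.dbf"], all_files, running_instances=2)
print(full_cold, part_hot)
```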

Backup and Recovery

  • Software (Omniback II, ADSM, Alexandria, Epoch, Networker)
    • Performance: such as degree of parallelism, I/O bottlenecks
    • Requirements: When considering which backup package to use, it is important to check the following criteria:
      - What degree of parallelism is possible?
      - How scalable is the product as tape drives are added?
      - What platforms are supported by the package?
      - What tape drives and tape media are supported by the package?
      - Does the package support easy access to information about tape contents?
  • Backup strategies:
    • Effect on database design: such as DB partitioning strategies.
    • Design strategies: the main aim should be to reduce the amount of data that has to be backed up on a regular basis, e.g. read-only tablespaces, automation of backup.
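
The read-only tablespace idea can be sketched as a selection rule for the regular backup run. This assumes that a tablespace marked read-only no longer changes, so one backup taken after it became read-only is enough; the tablespace names and the function `tablespaces_to_back_up` are hypothetical.

```python
def tablespaces_to_back_up(tablespaces, already_backed_up):
    """Pick the tablespaces that the next regular backup must include.

    tablespaces maps a tablespace name to True when it is read-only.
    already_backed_up holds read-only tablespaces that already have a
    backup taken after they were made read-only.
    """
    return sorted(
        name
        for name, read_only in tablespaces.items()
        # Skip only tablespaces that are read-only AND already backed up.
        if not (read_only and name in already_backed_up)
    )

# Hypothetical partitioned sales data: old partitions are read-only.
tablespaces = {"sales_2009": True, "sales_2010": True, "sales_current": False}
todo = tablespaces_to_back_up(tablespaces, already_backed_up={"sales_2009"})
print(todo)
```

Partitioning the data by time is what makes this work: old partitions can be made read-only and dropped from the regular backup, which ties the backup strategy back to database design.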

Backup and Recovery

  • Recovery strategies: these depend on the kind of failure and consist of a set of failure scenarios and their resolutions. Recovery steps must be worked out and documented for each of the following failure scenarios:
    1. Instance failure
    2. Media failure
    3. Loss or damage of a tablespace or data file
    4. Loss or damage of a table
    5. Loss or damage of a control file
    6. Failure during data movement
    A plan must also be made for the following data movement scenarios:
    1. Data load into staging tables
    2. Movement from staging to fact table
    3. Partition roll-up into larger partitions
    4. Creation of aggregations
  • Testing the strategy: backup and recovery tests need to be carried out on a regular basis, but it is advisable to avoid testing at busy times such as end of year and to schedule tests for periods of low load in the business year.

Tuning the Data Warehouse

  • Tuning the data warehouse deals with measures such as:
    • Average query response times
    • Scan rates
    • I/O throughput rates
    • Time used per query (fixed or ad hoc)
    • Number of users in the group
    • Whether they use ad hoc queries frequently, occasionally at unknown intervals, or at regular and predictable times
    • The average and maximum size of query they tend to run
    • The peak time of daily usage
    • Memory usage per process
  • The more unpredictable the load, the larger the queries, or the greater the number of users, the bigger the tuning task.
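
A few of these measures can be derived from a query log. The sketch below assumes a hypothetical log of (user, hour of day, response time in seconds, rows scanned) tuples; a real DWH would read these figures from its own monitoring tools.

```python
from statistics import mean

# Hypothetical query log: (user, hour_of_day, response_seconds, rows_scanned)
query_log = [
    ("ann", 9, 2.0, 1000),
    ("bob", 9, 4.0, 3000),
    ("ann", 14, 6.0, 2000),
]

def summarize(log):
    """Compute a few tuning measures from the query log."""
    total_seconds = sum(entry[2] for entry in log)
    hours = [entry[1] for entry in log]
    return {
        "avg_response_s": mean(entry[2] for entry in log),
        "max_response_s": max(entry[2] for entry in log),
        # Rows scanned per second of query time, a rough scan rate.
        "scan_rate_rows_per_s": sum(entry[3] for entry in log) / total_seconds,
        "users": len({entry[0] for entry in log}),
        # Hour of day with the most queries, i.e. the peak time of usage.
        "peak_hour": max(set(hours), key=hours.count),
    }

stats = summarize(query_log)
print(stats)
```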

Testing the data warehouse

  • Three levels of testing
    • Unit testing: each development unit is tested on its own.
    • Integration testing: the separate development units that make up a component of the DWH application are tested to ensure that they work together.
    • System testing: the whole DWH application is tested together. The components are tested to ensure that they work properly together and that they don't cause system bottlenecks.
  • Developing the test plan
    • Test schedule: metrics for estimating the amount of time required for testing
    • Data load
      • How will the data be generated?
      • Where will the data be generated?
      • How will the generated data be loaded?
      • Will the data be correctly skewed?
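
On the question of skew: uniformly random test data rarely behaves like production data, where a few keys usually account for most rows. A minimal sketch of generating deliberately skewed test rows, in which the store names, the 80/15/5 weights, and the amount range are all assumptions chosen for illustration:

```python
import random
from collections import Counter

def generate_sales_rows(n, seed=0):
    """Generate n (store, amount) test rows with a deliberately skewed store key."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    stores = ["store_1", "store_2", "store_3"]
    weights = [0.80, 0.15, 0.05]  # assumed skew: one store dominates
    return [
        (rng.choices(stores, weights=weights)[0], round(rng.uniform(1, 500), 2))
        for _ in range(n)
    ]

rows = generate_sales_rows(10_000)
counts = Counter(store for store, _ in rows)
print(counts.most_common())
```

Checking the generated distribution against the intended weights before loading is a cheap way to answer "Will the data be correctly skewed?" for a given test run.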

Testing the data warehouse

  • Scheduler: Because many of the processes in the DWH can swamp system resources if allowed to run at the wrong time, scheduling control of these processes is essential to the success of the DWH.
  • Management tools (event / system / configuration / backup recovery / database)
  • Database management: testing the database can be broken down into three separate sets of tests:
    • Testing the database manager and monitoring tools (creation, running and management of the test database)
    • Testing database features (querying / create index / data load in parallel)
    • Testing database performance (test queries with different aggregations, index strategies, degrees of parallelism, different-sized data sets)
  • Testing the application
  • Logistics of the test (DWH application code, day-to-day operational procedures, backup recovery strategy, query performance, management and monitoring tools, scheduling software)

Chapter 3: Data Warehousing and OLAP Technology: An Overview

  • What is a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehousing to data mining
  • Summary
