[ 2015-05-28 ]

Space Management and Oracle Direct Path Load

Most of you will be familiar with the concept of direct path load and how it’s an efficient way to load large volumes of data into an Oracle database as well as being a great technique to use when moving and transforming data inside the database. It’s easy to use in conjunction with parallel execution too, so you can scale out and use multiple CPUs and database servers if you need to process more data in less time.
Probably less well known is how the Oracle database manages space during direct path load operations. This is understandable because the Oracle database uses a variety of approaches and it has not always been very obvious which one it has chosen. The good news is that Oracle Database 12c from version 12.1.0.2 onwards makes this information visible in the SQL execution plan and it is also possible to understand what’s happening inside earlier releases with a bit of help from this post and some scripts I’ve linked to at the end.
Why isn’t there a one-size-fits-all approach for space management? Simply put, direct path load has to work well in a wide variety of different circumstances. It is used to fill non-partitioned and partitioned tables, which means that it must work well with a small or large number of database segments. It must operate serially or in parallel, be confined to a single database instance or distributed across an entire RAC database.There are some subtle complexities to take into account too, such as data skew. For example, some table partitions might contain much more data than others. Successful parallel execution depends on the even distribution of workloads across a number of parallel execution servers. If this isn’t the case then some of the servers will finish early, leaving them with no useful work to do. This results in a low effective degree of parallelism, poor machine resource utilization and an extended elapsed time. Avoiding issues associated with data skew increases the complexity of space management because it’s usually not appropriate to simply map each parallel execution server to an individual database segment. 
    The strategies Oracle uses are designed to achieve excellent scalability in a wide variety of circumstances, avoiding the extended run times associated with skewed datasets. In this post I will focus on how the strategies work without delving too deeply into how the database makes its choice. I will explain space management using Oracle Database 11g Release 2 as the baseline and then I will introduce the changes we’ve made in Oracle Database 12c. To put things in perspective, a full rack Oracle Exadata Database Machine is capable of loading data at over 20 terabytes an hour, so you can be sure that space management has received some attention! Even if you don’t have a system like that at your disposal, you will still benefit from the improvements and optimizations Oracle makes to keep your database operating at its full potential.
    Continuar aqui

    No hay comentarios:

    Publicar un comentario