Secondary Storage

Processors and Systems

Marilyn Wolf , in The Physics of Computing, 2017

5.5.3 Storage and performance

Secondary storage is used for two main purposes: file systems and demand paging. A program can avoid file system performance issues in many cases by working in main memory. However, all programs are affected by the performance characteristics of demand paging.

Demand paging is a component of a virtual memory system. A program's address space is divided into pages. The program operates on a virtual memory model with logical addresses. Pages may not reside in physical memory at any given time; the status of program pages is maintained by a combination of hardware and software. When a program accesses a page that is not in physical memory, a situation known as a page fault, the page is fetched from secondary storage.

We can model paging performance using a model similar to the cache model. In this case, however, segments are retrieved from the storage device. We can define a drive/DRAM ratio to describe the relative performance of primary memory and secondary storage:

(5.31) $M_d = \frac{t_{drive}}{t_{DRAM}}$

The average access time for a page depends on the probability of the page residing in memory, $p_{res}$:

(5.32) $t_{page} = t_{res} \left[ p_{res} + M_d (1 - p_{res}) \right]$

Example 5.6 showed that solid-state drives are about 100× faster than magnetic disk drives. We can find the ratio of average paging times for SSD versus magnetic disk:

(5.33) $\frac{t_{page,SSD}}{t_{page,mag}} = \frac{p_{res} + M_{ssd} (1 - p_{res})}{p_{res} + M_{mag} (1 - p_{res})}$

At low page residency probabilities, this ratio approaches $t_{SSD} / t_{mag}$.
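The paging model in Eqs. 5.31 to 5.33 can be explored numerically. The following is a minimal sketch, not code from the chapter: the function names and the DRAM and disk timings are assumptions chosen only to preserve the roughly 100× gap between SSD and magnetic disk.

```python
def drive_dram_ratio(t_drive: float, t_dram: float) -> float:
    """Eq. 5.31: M_d = t_drive / t_DRAM."""
    return t_drive / t_dram


def page_access_time(t_res: float, p_res: float, m_d: float) -> float:
    """Eq. 5.32: t_page = t_res * (p_res + M_d * (1 - p_res))."""
    return t_res * (p_res + m_d * (1.0 - p_res))


# Hypothetical timings in seconds; only their ratios matter here.
T_DRAM = 100e-9        # assumed DRAM access time
T_MAG = 10e-3          # assumed magnetic disk access time
T_SSD = T_MAG / 100    # SSD taken as about 100x faster than magnetic disk

M_MAG = drive_dram_ratio(T_MAG, T_DRAM)
M_SSD = drive_dram_ratio(T_SSD, T_DRAM)


def ssd_to_mag_ratio(p_res: float) -> float:
    """Eq. 5.33: ratio of average paging times, SSD versus magnetic disk."""
    return (p_res + M_SSD * (1 - p_res)) / (p_res + M_MAG * (1 - p_res))


for p in (0.0, 0.5, 0.99, 0.9999, 1.0):
    print(f"p_res={p}: t_page,SSD / t_page,mag = {ssd_to_mag_ratio(p):.4f}")
```

With no pages resident the ratio reduces to the ratio of the two drive times, and with every page resident the drive type no longer matters and the ratio is 1.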

URL:

https://www.sciencedirect.com/science/article/pii/B9780128093818000055

Computer systems and technology

Stuart Ferguson , Rodney Hebels , in Computers for Librarians (Third Edition), 2003

Secondary storage

Secondary storage, sometimes called auxiliary storage, is non-volatile and is used to store data and programs for later retrieval. There are many kinds of secondary storage, each with advantages and disadvantages. Most storage devices use either magnetic or optical storage media.

Magnetic storage devices use the principle that magnetically charged material has both a North and a South Pole. These two poles are used to represent 0's and 1's and hence binary numbers. When data are read from magnetic media, the read/write head is used to convert the different magnetic poles into binary numbers that the CPU can process. Conversely, when writing data to magnetic media, the read/write head converts the binary signals from the CPU into magnetic charges. Care needs to be taken that magnetic radiation doesn't spoil any stored data.

Optical media devices use lasers to burn tiny craters or pits onto the surface of a plastic or metal disk. The presence or absence of a pit on the surface of the disk is used for binary storage. Although slower than magnetic media, optical media are more robust (stored data are not affected by magnetic radiation) and have larger storage capacities for their size.

Some common examples of secondary storage devices are:

Hard disk drive – this form of magnetic media is used for bulk storage of data and programs. They are generally found within the case of the computer and hence are not portable. Some hard drives are removable and so provide convenient portable storage between computers. They are generally reliable and robust with fast access to stored data. They are called hard disks because the metal oxide on which the magnetic information is stored is placed on a solid, hard disk. This enables the disk to spin at much higher speeds than otherwise without distortion to the surface of the disk. RAID (Redundant Array of Independent Disks) drives are combinations of two or more drives combined in such a way as to increase the performance and fault tolerance of the drive. RAID drives come in three different levels, 0, 3 or 5, for varying performance and fault tolerances. They are usually used on file servers where the hard drive is under constant interrogation and performance and fault tolerance are imperative.

Floppy disk (diskette) – Floppy disks are a magnetic medium used primarily for small data storage. As technology has developed, the storage capacity of floppy disks has increased from 120 KB to 120 MB (over 1,000 times more storage) despite the fact that the physical size of these disks has decreased from 8" to 3.5". They are called floppy disks because the metal oxide is placed on a flexible plastic disk. Since the disk is flexible, it cannot spin as fast as a hard disk. Consequently, floppy disks have much slower access speeds than hard disks and are much less reliable. They are, however, much more portable than hard disks, although even this distinction is being blurred with new technologies such as removable hard drives.

The floppy disks in current use have a capacity of 1.44 MB (megabytes – memory is discussed later in the chapter) – an insufficient capacity for many database, spreadsheet and multimedia applications. As a result, a number of floppy disk cartridges have been developed, the best-known being Zip disks, SuperDisks and HiFD disk drives. The most commonly used at present is almost certainly the Zip disk, which has a capacity of 100 or 250 MB and which requires a special drive (see below).

CD-ROM – this is an optical storage medium for bulk storage of data and programs. Unlike hard or floppy disks, the contents of a CD-ROM cannot be changed, and hence it is ideally suited for static information such as bibliographic details, newspapers, journals, periodicals, directories, dictionaries and encyclopedias. They are extremely portable and offer excellent reliability for long term storage with reasonable access times. Each CD-ROM disk can hold up to 650 MB of data.

CD-R – increasing use is made of CD-R or CD-Recordable, also known as WORMs (Write Once, Read Many disks), which allow users to write to them once, using a special drive. Many CD-Rs cannot be written over or erased once they have been written to and are therefore particularly useful for the archiving of data (and of course creating one's own music CDs) or any other application which requires the data, once saved, to be retained without alteration.

There are also erasable optical disks known as CD-RW or compact disk rewritable, which look as if they may replace floppy disks – or at any rate, the 3.5" diskettes.

DVD-ROM – DVD is a new optical storage technology for bulk storage of data and programs including audio and video. DVD stands for Digital Versatile Disk (originally called Digital VideoDisk). DVD technology hit the market in 1997 and works like a CD-ROM. Its difference is that whereas a CD-ROM uses one side of the disk to store data, a DVD disk uses both sides, and each side can store data on one of two layers. The laser beam used to read the DVD operates at two intensity levels, one for each of the two layers on each side. DVD utilises the MPEG-2 file and compression standard, enabling a DVD disk to hold up to 17 GB of data – twenty-eight times greater than a CD-ROM (see below for an explanation of compression and multimedia files). Consequently, a single DVD can store four-and-a-half hours of a movie. It has been said that DVD technology will eventually supersede our current storage methods for music, video and film. A DVD-RAM device enables users to create their own DVD-ROM disks.

Tape – this is a magnetic storage medium mainly used for the backing up of important data to safeguard against data loss or computer malfunction (for example, loss of bibliographic, borrower and loans data). The tape itself is very similar to, and often the same as, that used in video cameras. It offers reasonable long-term security and large storage capacities, while being of small physical size. Its major disadvantage is its slow access times. Searching for data on a tape is much like searching for one's favourite song on an audiocassette tape.

Paper – although it may seem strange to include paper (sometimes called a hard copy) as a storage medium, many organisations still rely heavily on it. For many, physically seeing data printed on paper makes them easier to read and provides confidence against data loss. However, long term storage of paper is a problem due to its deterioration and its physical size. Compare, for example, the shelf space required for an entire set of printed encyclopedias with just one CD-ROM. The biggest disadvantage of paper storage is that it represents a departure from the computer system itself, thereby making the manipulation and processing of the data on the paper impossible. Of course, the majority of libraries store their books in paper format, despite the fact that there is a growing effort to digitise books for electronic storage and retrieval.

Many other storage media are available. With advances in technology, new storage methods are becoming more reliable, cheaper and faster, with larger storage capacities and smaller physical sizes.

Many software packages are available which enable data to be compressed before being written to a storage device. In many cases, data can be compressed to about half the original size, thereby doubling the storage capacity of a device (more on data compression in Chapter 7).
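As a rough illustration of compressing data before it is written to storage, the sketch below uses Python's standard `zlib` module. The sample record is hypothetical, and the ratio achieved depends heavily on the data: highly repetitive text such as this compresses far better than half, while already-compressed media may hardly shrink at all.

```python
import zlib

# Hypothetical sample: repetitive catalogue-style text.
record = b"Title: Computers for Librarians; Format: book; Status: on loan\n"
data = record * 200

compressed = zlib.compress(data)        # what would be written to the device
restored = zlib.decompress(compressed)  # what is recovered when reading back

ratio = len(compressed) / len(data)
print(f"original={len(data)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.3f}")
```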

URL:

https://www.sciencedirect.com/science/article/pii/B9781876938604500124

Database Machines

Catherine M. Ricardo , in Encyclopedia of Information Systems, 2003

III.C. Specialized Storage Devices

A variety of secondary storage technologies have been used for database machines, including charge-coupled device (CCD) memory, magnetic bubble memory, and head-per-track disks. In associative disk storage, an intelligent controller is used to manage data searching. Although a variety of techniques can be used to implement associative memory, all of these devices function as if there were a fixed read/write head containing a processor for each track. With the ability to perform logical comparisons on the track, data items can be retrieved by value, rather than by location. By moving the search logic to the controller hardware as the data is being read from disk, it is possible to retrieve only records that qualify for the query being processed, reducing the amount of data that is transmitted to the buffer. The result is the offloading of the basic retrieval functions from the host computer to these units. Eventually these specialized storage units were found to be prohibitively expensive, and parallel disks were used instead.
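The idea of retrieving by value at the controller can be sketched in software. This is only an analogy under stated assumptions: the record layout, field names and function names are hypothetical, and a real associative disk performs the comparison in hardware as data passes the head.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # one stored record; the field names used below are hypothetical


def read_track(track: Iterable[Record]) -> Iterator[Record]:
    """Conventional path: every record on the track is sent to the host buffer."""
    yield from track


def associative_read(track: Iterable[Record],
                     predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Controller-side filtering: test each record as it is read and
    forward only the ones that qualify for the query."""
    for record in track:
        if predicate(record):
            yield record


track = [
    {"id": 1, "dept": "sales", "salary": 40},
    {"id": 2, "dept": "it", "salary": 55},
    {"id": 3, "dept": "sales", "salary": 62},
]

all_records = list(read_track(track))
qualifying = list(associative_read(track, lambda r: r["dept"] == "sales"))
print(len(all_records), "records read,", len(qualifying), "sent to the buffer")
```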

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000277

Operating Systems

Thomas Sterling , ... Maciej Brodowicz , in High Performance Computing, 2018

11.2.6 Secondary Storage Management

As mentioned, the OS is responsible for secondary storage. Usually comprising many hard-disk drives, but possibly also some solid-state NVRAM, secondary storage delivers high density and nonvolatility for long-term storage. The OS may manage access to local disks for each node or a separate part of the system of disks connected by a storage area network such as a redundant array of independent disks configuration (there are several) for higher access bandwidth and greater reliability through redundancy of storage. While secondary storage is important to users in its OS support for file systems, it also provides other services. Virtual memory, in which pages of data for a process may be temporarily stored in secondary storage, gives the impression of larger memory capacity, although the data pages are actually distributed between physical main memory and secondary storage. The OS also uses secondary storage to buffer processes for future scheduling, or sometimes when swapping jobs in and out of memory systems. In all these cases and more, the OS is responsible for managing secondary storage, providing interfaces to it, and including services.

URL:

https://www.sciencedirect.com/science/article/pii/B9780124201583000113

CPUs

Marilyn Wolf , in Computers as Components (Third Edition), 2012

3.5.2 Memory Management Units and Address Translation

A memory management unit translates addresses between the CPU and physical memory. This translation process is often known as memory mapping because addresses are mapped from a logical space into a physical space. Memory management units are not especially common in embedded systems because virtual memory requires a secondary storage device such as a disk. However, that situation is slowly changing with lower component prices in general and the advent of Internet appliances in particular. It is helpful to understand the basics of MMUs for embedded systems complex enough to require them.

Many DSPs, including the C55x, don't use MMUs. Because DSPs are used for compute-intensive tasks, they often don't require the hardware assist for logical address spaces.

Memory mapping philosophy

Early computers used MMUs to compensate for limited address space in their instruction sets. When memory became cheap enough that physical memory could be larger than the address space defined by the instructions, MMUs allowed software to manage multiple programs in a single physical memory, each with its own address space.

Because modern CPUs typically do not have this limitation, MMUs are used to provide virtual addressing. As shown in Figure 3.10, the memory management unit accepts logical addresses from the CPU. Logical addresses refer to the program's abstract address space but do not correspond to actual RAM locations. The MMU translates them from tables to physical addresses that do correspond to RAM. By changing the MMU's tables, you can change the physical location at which the program resides without modifying the program's code or data. (We must, of course, move the program in main memory to correspond to the memory mapping change.)

Figure 3.10. A virtually addressed memory system.

Furthermore, if we add a secondary storage unit such as a disk, we can eliminate parts of the program from main memory. In a virtual memory system, the MMU keeps track of which logical addresses are actually resident in main memory; those that do not reside in main memory are kept on the secondary storage device. When the CPU requests an address that is not in main memory, the MMU generates an exception called a page fault. The handler for this exception executes code that reads the requested location from the secondary storage device into main memory. The program that generated the page fault is restarted by the handler only after

the required memory has been read back into main memory, and

the MMU's tables have been updated to reflect the changes.

Of course, loading a location into main memory will usually require throwing something out of main memory. The displaced memory is copied into secondary storage before the requested location is read in. As with caches, LRU is a good replacement policy.
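The page-fault sequence described above can be modeled in a few lines. This is a toy simulation, not an MMU or operating-system implementation: the class name, the dictionary standing in for the disk, and the placeholder page contents are all assumptions made for illustration.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model: a fixed number of physical frames backed by secondary storage."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.resident = OrderedDict()   # page number -> contents, least recent first
        self.secondary = {}             # page number -> contents held on "disk"
        self.page_faults = 0

    def access(self, page: int) -> str:
        if page in self.resident:
            self.resident.move_to_end(page)       # mark as most recently used
            return self.resident[page]
        # Page fault: the handler must finish before the access can complete.
        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            victim, contents = self.resident.popitem(last=False)  # LRU page
            self.secondary[victim] = contents     # copy the displaced page out
        # Read the requested page in (or create it on first touch) and
        # update the table of resident pages.
        self.resident[page] = self.secondary.pop(page, f"page-{page}")
        return self.resident[page]


mem = PagedMemory(num_frames=2)
for p in (0, 1, 0, 2, 1):
    mem.access(p)
print("page faults:", mem.page_faults, "resident:", list(mem.resident))
```

In this run the access to page 2 evicts page 1 rather than page 0, because page 0 was touched more recently; the final access to page 1 therefore faults again.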

There are two styles of address translation: segmented and paged. Each has advantages and the two can be combined to form a segmented, paged addressing scheme. As illustrated in Figure 3.11, segmenting is designed to support a large, arbitrarily sized region of memory, while pages describe small, equally sized regions. A segment is usually described by its start address and size, allowing different segments to be of different sizes. Pages are of uniform size, which simplifies the hardware required for address translation. A segmented, paged scheme is created by dividing each segment into pages and using two steps for address translation. Paging introduces the possibility of fragmentation as program pages are scattered around physical memory.

Figure 3.11. Segments and pages.

In a simple segmenting scheme, shown in Figure 3.12, the MMU would maintain a segment register that describes the currently active segment. This register would point to the base of the current segment. The address extracted from an instruction (or from any other source for addresses, such as a register) would be used as the offset for the address. The physical address is formed by adding the segment base to the offset. Most segmentation schemes also check the physical address against the upper limit of the segment by extending the segment register to include the segment size and comparing the offset to the allowed size.

Figure 3.12. Address translation for a segment.
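The segment translation just described amounts to a bounds check followed by an addition. A minimal sketch, in which the exception name and the segment register values are hypothetical:

```python
class SegmentFault(Exception):
    """Raised when an offset falls outside the active segment."""


def translate_segment(base: int, size: int, offset: int) -> int:
    """Physical address = segment base + offset, after checking the offset
    against the segment size held alongside the base."""
    if not 0 <= offset < size:
        raise SegmentFault(f"offset {offset:#x} outside segment of size {size:#x}")
    return base + offset


# Hypothetical segment register contents.
SEG_BASE, SEG_SIZE = 0x4000, 0x1000
physical = translate_segment(SEG_BASE, SEG_SIZE, 0x0123)
print(hex(physical))
```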

The translation of paged addresses requires more MMU state but a simpler calculation. As shown in Figure 3.13, the logical address is divided into two sections, a page number and an offset. The page number is used as an index into a page table, which stores the physical address for the start of each page. However, because all pages have the same size and it is easy to ensure that page boundaries fall on the proper boundaries, the MMU simply needs to concatenate the top bits of the page starting address with the bottom bits from the page offset to form the physical address. Pages are small, typically between 512 bytes and 4 KB. As a result, an architecture with a large address space requires a large page table. The page table is normally kept in main memory, which means that an address translation requires memory access.

Figure 3.13. Address translation for a page.
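Paged translation can be sketched as a table lookup followed by bit concatenation. The 4 KB page size and the contents of the flat page table below are assumptions for illustration only.

```python
PAGE_BITS = 12                      # assumed 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS
OFFSET_MASK = PAGE_SIZE - 1

# Hypothetical flat page table: page number -> physical page start address.
page_table = {0: 0x8000, 1: 0x3000, 2: 0xA000}


def translate_page(logical: int) -> int:
    page_number = logical >> PAGE_BITS        # top bits select the table entry
    offset = logical & OFFSET_MASK            # bottom bits pass through unchanged
    page_start = page_table[page_number]      # a memory access in a real MMU
    # Page starts are aligned to the page size, so OR-ing the two fields
    # is the same as concatenating their bits.
    return page_start | offset


print(hex(translate_page(0x1ABC)))
```

No addition or bounds check is needed, which is the "simpler calculation" the text refers to; the cost is the table itself.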

The page table may be organized in several ways, as shown in Figure 3.14. The simplest scheme is a flat table. The table is indexed by the page number and each entry holds the page descriptor. A more sophisticated method is a tree. The root entry of the tree holds pointers to pointer tables at the next level of the tree; each pointer table is indexed by a part of the page number. We eventually (after three levels, in this case) arrive at a descriptor table that includes the page descriptor we are interested in. A tree-structured page table incurs some overhead for the pointers, but it allows us to build a partially populated tree. If some part of the address space is not used, we do not need to build the part of the tree that covers it.

Figure 3.14. Alternative schemes for organizing page tables.
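A tree-structured page table can be sketched with nested dictionaries, where a missing key stands for a subtree that was never built. The three 4-bit index fields and the descriptor strings are assumptions; real tables use fixed-size arrays of pointers.

```python
# Assumed split of a page number into three index fields, top level first.
LEVEL_BITS = (4, 4, 4)


def split_page_number(page_number: int) -> list:
    """Break a page number into one index per tree level."""
    indices = []
    shift = sum(LEVEL_BITS)
    for bits in LEVEL_BITS:
        shift -= bits
        indices.append((page_number >> shift) & ((1 << bits) - 1))
    return indices


def map_page(root: dict, page_number: int, descriptor: str) -> None:
    """Insert a descriptor, creating pointer tables only along this path."""
    *upper, last = split_page_number(page_number)
    table = root
    for index in upper:
        table = table.setdefault(index, {})
    table[last] = descriptor


def lookup(root: dict, page_number: int):
    """Walk the tree; return None if that part of the tree was never built."""
    node = root
    for index in split_page_number(page_number):
        if not isinstance(node, dict) or index not in node:
            return None
        node = node[index]
    return node


root = {}
map_page(root, 0x123, "frame 7")
map_page(root, 0x124, "frame 9")
print(lookup(root, 0x123), lookup(root, 0x800))
```

Both mapped pages share the same upper-level tables, and nothing is allocated for the unused remainder of the address space.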

The efficiency of paged address translation may be increased by caching page translation information. A cache for address translation is known as a translation lookaside buffer (TLB). The MMU reads the TLB to check whether a page number is currently in the TLB cache and, if so, uses that value rather than reading from memory.
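A TLB can be modeled as a small cache placed in front of the page table. In this sketch the capacity, the 4 KB page size, the page table contents and the LRU eviction policy are all assumptions; the text does not specify how TLB entries are replaced.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KB pages


class TLB:
    """Small cache of page number -> physical page start (LRU eviction assumed)."""

    def __init__(self, page_table: dict, capacity: int = 2):
        self.page_table = page_table      # stands in for the table in main memory
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, logical: int) -> int:
        page_number = logical >> PAGE_BITS
        offset = logical & ((1 << PAGE_BITS) - 1)
        if page_number in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_number)
        else:
            self.misses += 1              # costs a page-table read from memory
            self.entries[page_number] = self.page_table[page_number]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return self.entries[page_number] | offset


tlb = TLB({0: 0x8000, 1: 0x3000, 2: 0xA000})
addresses = [tlb.translate(a) for a in (0x0010, 0x0020, 0x1004, 0x0030)]
print([hex(a) for a in addresses], "hits:", tlb.hits, "misses:", tlb.misses)
```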

Virtual memory is typically implemented in a paging or segmented, paged scheme so that only page-sized regions of memory need to be transferred on a page fault. Some extensions to both segmenting and paging are useful for virtual memory:

At minimum, a present bit is necessary to show whether the logical segment or page is currently in physical memory.

A dirty bit shows whether the page/segment has been written to. This bit is maintained by the MMU, because it knows about every write performed by the CPU.

Permission bits are often used. Some pages/segments may be readable but not writable. If the CPU supports modes, pages/segments may be accessible by the supervisor but not in user mode.
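The present, dirty and permission bits listed above can be represented as flags in a descriptor. The bit positions, names and the order in which the checks are made below are hypothetical; actual layouts and fault priorities are architecture-specific.

```python
from enum import IntFlag


class PageFlags(IntFlag):
    """Hypothetical bit layout for a page/segment descriptor."""
    PRESENT = 0x1
    DIRTY = 0x2
    WRITABLE = 0x4
    SUPERVISOR_ONLY = 0x8


def check_access(flags: PageFlags, write: bool, supervisor: bool) -> str:
    """Return 'ok' or the kind of exception the access would raise."""
    if PageFlags.SUPERVISOR_ONLY in flags and not supervisor:
        return "protection fault"
    if write and PageFlags.WRITABLE not in flags:
        return "protection fault"
    if PageFlags.PRESENT not in flags:
        return "page fault"
    return "ok"


def record_write(flags: PageFlags) -> PageFlags:
    """Set the dirty bit, as the MMU does when the page is written."""
    return flags | PageFlags.DIRTY


read_only = PageFlags.PRESENT
print(check_access(read_only, write=True, supervisor=False))
```

A page whose dirty bit is clear need not be copied back to secondary storage when it is displaced, since the copy already there is still current.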

A data or instruction cache may operate either on logical or physical addresses, depending on where it is positioned relative to the MMU.

A memory management unit is an optional part of the ARM architecture. The ARM MMU supports both virtual address translation and memory protection; the architecture requires that the MMU be implemented when cache or write buffers are implemented. The ARM MMU supports the following types of memory regions for address translation:

a section is a 1-Mbyte block of memory,

a large page is 64 KB, and

a small page is 4 KB.

An address is marked as section mapped or page mapped. A two-level scheme is used to translate addresses. The first-level table, which is pointed to by the Translation Table Base register, holds descriptors for section translation and pointers to the second-level tables. The second-level tables describe the translation of both large and small pages. The basic two-level process for a large or small page is illustrated in Figure 3.15. The details differ between large and small pages, such as the size of the second-level table index. The first- and second-level pages also contain access control bits for virtual memory and protection.

Figure 3.15. ARM two-stage address translation.
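The region sizes above determine how an address divides into index and offset fields. The following sketch assumes a 32-bit virtual address and derives the field widths purely from the stated sizes; it is a simplification and does not reproduce the actual ARM descriptor formats or table indexing details.

```python
ADDRESS_BITS = 32                      # assumption: 32-bit virtual addresses
SECTION_SIZE = 1 << 20                 # 1 MB section
LARGE_PAGE_SIZE = 64 * 1024            # 64 KB large page
SMALL_PAGE_SIZE = 4 * 1024             # 4 KB small page


def offset_bits(region_size: int) -> int:
    """Number of low address bits that select a byte within a region."""
    return region_size.bit_length() - 1


# One first-level entry covers 1 MB, so the remaining top bits index that table.
FIRST_LEVEL_BITS = ADDRESS_BITS - offset_bits(SECTION_SIZE)


def split_address(vaddr: int, page_size: int):
    """Return (first-level index, second-level index, offset within page)."""
    first = vaddr >> offset_bits(SECTION_SIZE)
    second = (vaddr & (SECTION_SIZE - 1)) >> offset_bits(page_size)
    return first, second, vaddr & (page_size - 1)


small = split_address(0x12345678, SMALL_PAGE_SIZE)
large = split_address(0x12345678, LARGE_PAGE_SIZE)
print([hex(f) for f in small], [hex(f) for f in large])
```

The smaller the page, the more bits are left for the second-level index, which is one way the details differ between large and small pages.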

URL:

https://www.sciencedirect.com/science/article/pii/B9780123884367000039

Introduction

Paul J. Fortier , Howard E. Michel , in Computer Systems Performance Evaluation and Prediction, 2003

1.1.5 Secondary storage and peripheral device architectures

I/O devices connect to and control secondary storage devices. Main memory has grown over the years to a fairly high volume, but still not to the point where additional data and program storage is not needed. The storage hierarchy (Figure 1.6) consists of a variety of data storage types. From the highest-speed memory element, cache, to the slowest-speed elements, such as tape drives, the tradeoff the systems architect must make is the cost and speed of the storage medium per unit of memory. Typical secondary storage devices include magnetic tape drives, magnetic disk drives, compact optical disk drives, and archival storage devices such as disk jukeboxes.

Figure 1.6. Memory hierarchy.

Magnetic tape data storage provides a low-cost, high-density storage medium for low-access or slow-access data. An improvement over tape storage is the random access disk units, which can have either removable or internal fixed storage media. Archival storage devices typically are composed of removable media configured into some array of devices.

URL:

https://www.sciencedirect.com/science/article/pii/B9781555582609500011

Computer Data Processing Hardware Architecture

Paul J. Fortier , Howard E. Michel , in Computer Systems Performance Evaluation and Prediction, 2003

2.10.12 Database and system mismatch

The operating system migrates storage from primary memory to secondary storage, based on the operating system's perspective on when this should be done. Demand paging and limited storage dictate that this be performed on a page fault basis. The database, however, may not wish the page to be written back to secondary memory due to concurrency control and atomicity issues. The database may wish to hold pages in memory until transaction commit time and then flush to secondary storage. This would allow the database not to require undo of transactions on failure, simply abort, and restart.

Related to this is I/O management and device management. The database may wish to order access based on the queries being presented to it in order to maintain ACID execution, whereas the operating system simply will order the accesses to deliver the greatest throughput of data back to the CPU. The order in which it returns data may be counterproductive to the database, to the point where the database has waited so long for needed data that when the data do come the operating system pages out the database software to make room for the data, or it removes the data that the new data is to be processed against. In either case this is not conducive to optimal database processing.

The problem with the operating system for this type of problem is the I/O buffer management policies and mechanisms. The database wants to utilise and optimize buffers to maximize transaction throughput, while the operating system wants to maximize average process response.

The control of the processor itself by the operating system may block essential functions that the database must perform. For example, the database requires that the log of database actions be flushed to secondary storage at specific points and in an uninterruptable manner in order to guarantee recovery and correct execution. Also, keeping the database as consistent as possible requires the database to flush committed data to the persistent store when necessary and in an atomic operation. The operating system, in its wish to be fair, may time out a database function doing specifically this operation. On another related issue, if a database is sorting and processing two large data files against each other, it may wish to maintain direct control over how and when data traverse the boundaries from the storage to the processor and back. Without direct control over the allocation and deallocation mechanisms, the database could be removed from one resource while still holding another, causing a loss of the intended operation's continuity.

The operating system's locking mechanism works well for simple file management, and for the majority of applications this is sufficient. But a database needs better control over locking to allow locking at possibly a data item level only. The reason for this is to allow more concurrency and less blocking of data. The intent is to increase data availability by only locking what is being used, not an entire file. To rectify this, databases are forced to use direct addressing and direct file management features to allow for their own control over the file level of locking. However, in some operating systems the database still suffers under the control of the operating system's lock manager, regardless of what mode is used.

An operating system's interprocess communication mechanisms may be too expensive to use within a database system. Many operating systems use a form of message passing involving interrupt processing. Such mechanisms may have a high cost in terms of overhead. A database may wish to provide simpler IPC mechanisms using shared memory or semaphores, especially since a database is only another process within the operating system.

Scheduling in an operating system looks to maximize overall average response time and to share resources fairly. Scheduling only deals with the selection of a process to place onto the executing hardware. A database, on the other hand, has a multilevel scheduling problem: not only must it select which transaction to place into service at any point in time, but it must also schedule which operation to perform on the underlying database to meet concurrency control requirements. An operating system's scheduler will not and does not provide such a service.

A database requires the use of copying, backup, and recovery services of the underlying infrastructure to aid in constructing database recovery protocols. The problem is that many of the other features of an operating system may get in the way and hinder the easy operation of database recovery. The database wishes to dictate how and when it will force data out to persistent storage. This is done in order to minimize the work (UNDO and REDO) that must be done to recover the database to a known consistent state. The operating system, on the other hand, will do this based on its needs to reallocate storage for processes in execution. The operating system will not take into account that this least recently used page will actually be the next page to be used by the database. It will simply choose this page and force it out immediately, based on its needs.

To make the operating system and database interface more compatible, it is desirable that the operating system use semantic information, which can be provided by the database, to make sound, informed decisions. This is not to say that the database should overtake or dictate the moves of the operating system. Instead it should act in a cooperative fashion to maximize the system-oriented needs of a database, which are more diverse than those of a typical application. See [1] for further information on database systems.

URL:

https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Parallel hardware and parallel software

Peter S. Pacheco , Matthew Malensek , in An Introduction to Parallel Programming (Second Edition), 2022

2.10.4 Input and output

In general parallel systems, multiple cores can access multiple secondary storage devices. We won't try to write programs that make use of this functionality. Rather, in our programs for homogeneous MIMD systems, we'll write programs in which one process or thread can access stdin, and all processes can access stdout and stderr. However, because of nondeterminism, except for debug output we'll usually have a single process or thread accessing stdout. For heterogeneous CPU-GPU systems, we'll use a single thread on the CPU for all I/O with the sole exception of debug output. For debug output, we'll use the threads on the GPU.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128046050000099

Managed Overlays

John F. Buford , ... Eng Keong Lua , in P2P Networking and Applications, 2009

Managing a Distributed File Storage Service

A P2P distributed file storage service can be constructed using the secondary storage areas of peers in the overlay. A number of designs have been proposed, including PAST [539], which runs on Pastry, and CFS [541], which runs on Chord. In PAST, each user has a quota for the amount of storage that can be used in the overlay. Files are stored using the hashed file name as the key. Thus two different files from the same owner are likely to be stored at different sets of peers in the overlay. Each file is stored at the k closest peers in the overlay. Since peer addresses are generated randomly, there is a high probability that the k nodes are geographically dispersed.

CFS provides a distributed read-only file system and stores files in the overlay by dividing each file into blocks. The blocks for popular files will be spread over many servers. Each block is identified using a hash of its contents as the key. Each block is replicated k times in the overlay, with replicas at the peers immediately after the block's successor in the Chord ring. A peer sends a request for a block as a DHT lookup of the block's key. Each peer along the lookup path checks its cache to see whether the block is present. If it is, the block is immediately returned. If the request reaches the primary peer, the block is returned to the requesting peer, which then sends a copy to each peer along the lookup path to add to its cache.
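The block placement described for CFS can be sketched as content hashing plus a successor search on a sorted ring. This is a simplified illustration, not the CFS or Chord implementation: the identifier width, the tiny block size, the peer names and the helper functions are all assumptions.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32        # assumed identifier space, kept small for readability
BLOCK_SIZE = 8        # assumed tiny block size in bytes, for demonstration


def ring_id(data: bytes) -> int:
    """Hash bytes onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (1 << RING_BITS)


def split_blocks(content: bytes) -> dict:
    """Divide a file into blocks keyed by the hash of each block's contents."""
    blocks = [content[i:i + BLOCK_SIZE] for i in range(0, len(content), BLOCK_SIZE)]
    return {ring_id(b): b for b in blocks}


def holders(key: int, peers: list, k: int) -> list:
    """The key's successor on the ring, plus the k peers that follow it."""
    ring = sorted(peers)
    start = bisect_left(ring, key) % len(ring)      # first peer id >= key, wrapping
    return [ring[(start + i) % len(ring)] for i in range(min(k + 1, len(ring)))]


peers = [ring_id(f"peer-{n}".encode()) for n in range(8)]   # hypothetical peers
blocks = split_blocks(b"a popular file spread over many servers")
placement = {key: holders(key, peers, k=2) for key in blocks}
for key, where in placement.items():
    print(f"block {key:#010x} -> peers {[hex(p) for p in where]}")
```

Because each key depends only on a block's contents, the blocks of one file land at unrelated points on the ring, which is how a popular file ends up spread over many servers.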

Figure 15.1 illustrates the role of the management agent in monitoring the service quality of a distributed file store that is similar to the By model. A peer stores a file by the hash of its filename. The peer that receives the asking forwards it to the k-1 closest peers in the overlay address space for replication. The direction agent is notified that the distributed file service has accustomed a file with a specified identifier. Later one of the peers storing the replica leaves the overlay. This generates a replication-level exception to both the peer that inserted the file and the management amanuensis. If the primary storing peer fails to add together another replica peer within a given time window, the direction agent may intervene.

Figure 15.1. Overlay messages between peers storing file replicas and the management agent.

In addition, the management agent can monitor the uptime and storage integrity of each peer. The software for each peer can perform periodic file system integrity checks and send negative results to the management agent. A notification is also sent when the file system usage exceeds the threshold.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123742148000155

Mathematical background

Barry Dwyer, in Systems Analysis and Synthesis, 2016

2.7.5 Files

A file is a data structure stored on a persistent secondary storage medium. Physically, the medium is arranged in blocks of fixed size, e.g., 4 KB. The computer's operating system will usually try to arrange that a file occupies a contiguous series of blocks, or if that isn't possible, a few contiguous areas, called extents.

From the programmer's point of view, a file consists of records. There are typically several records per block. Usually, the operating system unpacks the blocks into individual records when they are read and packs records into blocks when they are written. Each record typically represents one element of a set, one pair of a relation, or one pair, triple, etc., of one or more functions. The components of records are called attributes. For instance, a function from X to Y would have attributes named X and Y.

As a rule, each record has a unique identifier: each pair in a function from X to Y must have a different value from the domain X. X is called a key of the file; Y is a non-key attribute. In the case of a relation from X to Y, only the (x, y) pairs are unique, and so the file has the composite key (X, Y). In the case of a sequence, records are physically stored in order of the sequence.

Records don't have to be fixed in length. For example, a file could represent a graph, in which each record could contain a vertex followed by a list of edges of arbitrary length.

A file can represent any data structure. Records can be added to a file dynamically, so that dynamic data structures can be created.

A record has both an absolute and a relative address. The absolute address specifies the physical storage location (e.g., surface, track, sector and byte) of the record. The relative address specifies the number of bytes from the start of the file. The computer's operating system translates relative addresses into physical addresses by reference to the file's directory entry, which lists the physical locations of its extents. When a file is copied, all its physical addresses change, but the relative addresses stay the same. Therefore, records can be linked into list or tree structures using their relative addresses.
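The translation from relative to physical addresses can be sketched as a walk over the file's extent list. This is a simplified model, not any particular operating system's scheme: a physical location is reduced to a (block number, byte within block) pair instead of surface, track and sector, and the `Extent` type, `to_physical` function and 4 KB block size are assumptions for the example.

```python
from typing import NamedTuple

BLOCK_SIZE = 4096  # assumed block size in bytes


class Extent(NamedTuple):
    start_block: int  # first physical block of the extent
    block_count: int  # number of contiguous blocks in the extent


def to_physical(extents: list[Extent], offset: int) -> tuple[int, int]:
    """Map a byte offset from the start of the file to (physical block, byte in block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    logical_block, byte_in_block = divmod(offset, BLOCK_SIZE)
    for ext in extents:
        if logical_block < ext.block_count:
            return ext.start_block + logical_block, byte_in_block
        logical_block -= ext.block_count  # skip past this extent
    raise ValueError("offset beyond end of file")


# A file stored in two extents: blocks 100-101 and blocks 500-502.
extents = [Extent(100, 2), Extent(500, 3)]
print(to_physical(extents, 2 * BLOCK_SIZE + 10))  # falls in the second extent
```

Copying the file would produce a different extent list, so every physical result changes, while the offsets passed in stay valid.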

Most modern computer systems allow an entire file to be read into and processed entirely inside RAM, the file being saved periodically and, again, when processing is complete. The file can therefore contain whatever data structures are convenient to the application. There is one restriction: it cannot be guaranteed that a saved file will occupy the same memory locations when it is later reloaded into RAM. Therefore, the file cannot contain absolute memory addresses. It may, on the other hand, contain relative addresses (i.e., relative to the start of the file), or it may be saved in a form that is free of addresses. For example, it can be reduced to a textual form with mark-up, such as XML.

On the other hand, when the volume of data exceeds the available RAM, or when saving must be continuous, it is wise to deal with the persistent data directly. In such a case, each file usually represents one set, one relation, a cluster of functions, or a sequence.

For our purposes, two kinds of files are important.

A sequential file consists of a sequence of records stored one after another. A sequential file is read or written sequentially, starting with the first record, then the second record, and so on.

An indexed-sequential file is indexed, allowing the records to be located either by key or in sequential order. Records may be added or deleted anywhere within the file. The file is structured to make sequential access efficient: blocks contain records with contiguous key values, but the blocks may only be partly filled, and the blocks themselves may not always be stored in key order. A B-tree structure is often used, with the nodes of the tree mapped directly onto blocks. Often, it is possible to store the entire index of a file in RAM, so that, after consulting the index, a record can be located by direct access to the block that contains it.
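The idea of consulting an in-RAM index and then going straight to one block can be illustrated with a deliberately simplified model. The sketch below uses a flat sorted index of each block's lowest key in place of a B-tree, and a dictionary in place of the storage medium; the class and method names are hypothetical.

```python
from bisect import bisect_right
from typing import Iterator, Optional


class IndexedSequentialFile:
    """Toy indexed-sequential file: a flat in-memory index over keyed blocks."""

    def __init__(self, blocks: dict[int, list[tuple[int, str]]]):
        # blocks maps a block number to its records, sorted by key.
        # Block numbers need not follow key order, as in the text.
        self.blocks = blocks
        # In-memory index: (lowest key in block, block number), in key order.
        self.index = sorted(
            (recs[0][0], num) for num, recs in blocks.items() if recs
        )
        self._first_keys = [k for k, _ in self.index]

    def find(self, key: int) -> Optional[str]:
        """Consult the index, then search only the one block that could hold key."""
        pos = bisect_right(self._first_keys, key) - 1
        if pos < 0:
            return None  # key is smaller than every key in the file
        _, block_no = self.index[pos]
        for k, value in self.blocks[block_no]:
            if k == key:
                return value
        return None

    def scan(self) -> Iterator[tuple[int, str]]:
        """Yield all records in key order by following the index."""
        for _, block_no in self.index:
            yield from self.blocks[block_no]


f = IndexedSequentialFile({
    7: [(10, "a"), (20, "b")],
    2: [(30, "c"), (45, "d")],
    5: [(50, "e")],
})
print(f.find(45), [k for k, _ in f.scan()])
```

A real B-tree would also keep the index balanced as records are inserted and deleted, which this sketch does not attempt.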

An indexed file can have several indices, allowing access via more than one attribute. It is also possible for several records to share the same value of a secondary key, in which case the index will need to store the set of records with each key value, perhaps as a linked list.
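A secondary index of this kind can be sketched as a mapping from each attribute value to the relative addresses of the records that carry it. A Python list stands in for the linked list mentioned above, and the record layout, offsets and function name are invented for the example.

```python
from collections import defaultdict


def build_secondary_index(
    records: dict[int, dict[str, object]], attribute: str
) -> dict[object, list[int]]:
    """Map each value of `attribute` to the offsets of the records that hold it."""
    index: dict[object, list[int]] = defaultdict(list)
    for offset, record in records.items():
        index[record[attribute]].append(offset)
    return dict(index)


# Hypothetical records keyed by their relative address in the file.
records = {
    0: {"id": 1, "city": "Perth"},
    64: {"id": 2, "city": "Adelaide"},
    128: {"id": 3, "city": "Perth"},
}
by_city = build_secondary_index(records, "city")
print(by_city["Perth"])  # offsets of every record whose city is Perth
```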

Finally, it is perhaps worth mentioning that the directories or folders that a hierarchical file store uses to contain files form a kind of decision tree, in which the directory names and filename spell the path from the root to the file. For example, the path '/Users/barry/Documents/Synthesis/Maths.tex' describes a path from the root of the file system to the document I am currently editing. Each step in the path translates a name into the storage address of the next node.
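The step-by-step translation of names can be sketched with nested dictionaries standing in for directories. This is only a model of the idea: the `resolve` function is hypothetical, and the integer stored at the leaf is a made-up storage address.

```python
def resolve(root: dict, path: str):
    """Walk a path one name at a time, each step selecting the next node."""
    node = root
    for name in (part for part in path.split("/") if part):
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]  # translate this name into the next node
    return node


# Toy directory tree; 4711 is an invented storage address for the file.
root = {"Users": {"barry": {"Documents": {"Synthesis": {"Maths.tex": 4711}}}}}
print(resolve(root, "/Users/barry/Documents/Synthesis/Maths.tex"))
```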

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128053041000114