File Classification Performance Options

DiskSorter is optimized for modern multi-core and multi-CPU systems and can parallelize file classification across multiple CPUs to speed up classification operations. DiskSorter provides a number of performance options that control how many parallel threads are used to scan directories and how many parallel threads are used to classify files.

To customize the file classification performance options, open the file classification profile dialog, press the 'Options' button and select the 'Advanced' tab. The 'Max Dir Scan Threads' option sets the maximum number of parallel threads used to scan input disks, directories and network shares. This option is especially useful when processing a large number of network shares, because scanning several shares in parallel mitigates network latency and slowly responding servers and NAS storage devices. The 'Processing Threads' option sets the number of parallel threads used to classify files.
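The two options map naturally onto a two-stage pipeline: one thread pool scans the input directories and a second pool classifies the files that were found. The following is a minimal Python sketch of that idea, not DiskSorter's actual implementation. The function names, the parameter names `max_dir_scan_threads` and `processing_threads`, and the extension-based classifier are all assumptions made for illustration.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scan_directory(root):
    """Return every file path under root (one scan task per input directory)."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def classify_file(path):
    """Toy classifier (hypothetical): categorize a file by its extension."""
    ext = os.path.splitext(path)[1].lower()
    return ext or "(none)"


def classify_tree(roots, max_dir_scan_threads=2, processing_threads=4):
    """Scan the input directories in parallel, then classify files in parallel."""
    # Stage 1: at most max_dir_scan_threads directories are scanned at once.
    with ThreadPoolExecutor(max_workers=max_dir_scan_threads) as scanners:
        file_lists = list(scanners.map(scan_directory, roots))
    files = [path for sublist in file_lists for path in sublist]

    # Stage 2: at most processing_threads files are classified at once.
    with ThreadPoolExecutor(max_workers=processing_threads) as workers:
        categories = list(workers.map(classify_file, files))
    return Counter(categories)
```

Calling `classify_tree(["/data/share1", "/data/share2"])` (hypothetical paths) returns a count of files per category. Keeping the two pool sizes separate lets the scan stage be tuned for storage and network latency independently of the classification stage.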

File Classification Options Show Files User Names

Another option that significantly affects the performance of file classification operations is the 'Show Files User Names' option, which is located on the 'General' tab of the file classification profile options dialog. When this option is enabled, DiskSorter queries the user name for every processed file and saves the user names in the file classification report, making it possible to show file classification statistics per user. Querying the user name for a file is a relatively slow operation, especially when performed over the network, and for performance reasons this option is disabled by default. If this option needs to be enabled, it is highly recommended to configure the file classification operation to use at least 4 parallel file classification threads, even on single-core or dual-core systems.

File Classification Performance Results

The performance of file classification operations depends heavily on the type of the storage device, the number of available CPUs and, for operations performed over the network, the speed of the network. For example, when classifying files located on a local SSD disk (without querying file user names), the performance of file classification operations can reach up to 50,000 files per second. As shown in the example performance graph, the maximum file classification performance is reached with 4 parallel file classification threads.

File Classification Performance Results SSD Disks

On the other hand, when the same file classification operation is performed with the 'Show Files User Names' option enabled, the single-CPU performance drops significantly, from 31,500 Files/Sec to 4,900 Files/Sec, while the multi-CPU performance continues to scale very well up to 8 parallel file classification threads and reaches 23,000 Files/Sec when all 8 CPUs are used to classify files in parallel.

File Classification Performance Results SSD Disks With Files User Names

Almost the same level of multi-threaded performance scaling is observed when classifying files with the 'Show Files User Names' option enabled on a system with a small number of physical CPU cores. In general, querying the user name for a file requires practically no CPU resources: for each processed file, DiskSorter just waits for the operating system to return the user name. This makes it effective to use a large number of parallel processing threads to query user names for many files simultaneously.
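The effect described above, where threads spend their time waiting rather than computing, can be reproduced with a small simulation. The Python sketch below is illustrative only: `lookup_owner` is a hypothetical stand-in that sleeps for a fixed latency instead of calling a real operating system API, and the 10 ms latency is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_owner(path, latency=0.01):
    """Hypothetical stand-in for an owner lookup.

    It sleeps to mimic waiting on the operating system or a remote server
    and uses almost no CPU while doing so.
    """
    time.sleep(latency)
    return "user-for-" + path


def lookup_all(paths, threads):
    """Look up owners for all paths with the given number of threads.

    Returns the list of owners (in input order) and the elapsed seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        owners = list(pool.map(lookup_owner, paths))
    return owners, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"file{i}" for i in range(40)]
    for threads in (1, 4, 8):
        _, elapsed = lookup_all(paths, threads)
        print(f"{threads} threads: {len(paths) / elapsed:.0f} lookups/sec")
```

Because each lookup only waits, the throughput in this simulation grows with the thread count even on a machine with one or two cores, which is the same reasoning behind using more processing threads than physical cores when user names are queried.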

File Classification Performance Over Network

From a performance point of view, classifying files located on multiple network shares is a slightly different operation, which depends on the number of processed network shares, the speed and latency of the network and the type of the processed storage devices. For example, when classifying files over a high-speed, low-latency local network, the performance of file classification operations scales from 6,250 Files/Sec for a single network share classified using a single CPU to 23,800 Files/Sec when 4 network shares are processed simultaneously using 8 parallel file classification threads.

File Classification Performance Results Over Network

In such a configuration, the performance of file classification operations depends heavily on the number of parallel threads used to scan network shares and the number of parallel threads used to classify files. For high-speed, low-latency networks, the number of parallel threads used to scan directories should be equal to the number of parallel file classification threads. For slow, high-latency networks, a high file classification speed can still be reached by classifying a large number of network shares simultaneously and using a large number of parallel directory scanning threads.

File Classification Performance Results Over Network With Files User Names

When the same file classification operation is performed with the 'Show Files User Names' option enabled, the performance of the file classification operation drops dramatically to just 150 Files/Sec for a single-threaded operation and scales up to 1,136 Files/Sec when files are classified using 8 parallel file classification threads. In this case, the performance bottleneck is the operation of querying the user name for a file over the network. To increase the performance of such operations, it is recommended to use a large number of parallel processing threads so that user names for many files are queried simultaneously.
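The scaling figures quoted in this section can be compared using the speedup (parallel rate divided by single-threaded rate) and the parallel efficiency (speedup divided by the number of threads). The short Python sketch below only restates the numbers given in the text. The helper names are illustrative, and the plain network case also changes the number of shares from 1 to 4, so its figure is not a pure thread-scaling measurement.

```python
def speedup(single_rate, parallel_rate):
    """Ratio of the parallel rate to the single-threaded rate."""
    return parallel_rate / single_rate


def efficiency(single_rate, parallel_rate, threads):
    """Speedup per thread; 1.0 would mean perfect linear scaling."""
    return speedup(single_rate, parallel_rate) / threads


# (single-threaded Files/Sec, 8-thread Files/Sec, threads), taken from the text.
results = {
    "SSD, user names enabled": (4_900, 23_000, 8),
    "Network (1 share vs. 4 shares)": (6_250, 23_800, 8),
    "Network, user names enabled": (150, 1_136, 8),
}

if __name__ == "__main__":
    for name, (single, parallel, threads) in results.items():
        s = speedup(single, parallel)
        e = efficiency(single, parallel, threads)
        print(f"{name}: speedup {s:.2f}x, efficiency {e:.0%}")
```

With these inputs the speedups come out at roughly 4.7x, 3.8x and 7.6x respectively. The last case is the closest to linear scaling, which is consistent with the user name query over the network being dominated by waiting rather than by CPU work.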