You can set up Google Cloud Search to return results from your organization's Microsoft Windows shares in addition to your Google Workspace content. To do so, you install the Google Cloud Search File Systems connector and configure it to access the Windows shares you specify. A single connector instance can support multiple Microsoft Windows shares.
Important considerations
Continuous automatic updates
By default, when the connector starts up, it continuously monitors the start paths (the values of fs.src in the connector configuration file). When the file system reports changes to content or access controls, the connector re-crawls the file system. This re-crawl can be resource intensive. To turn off file system monitoring, set fs.monitorForUpdates to false. This significantly reduces the connector's resource use, but it delays when the connector reflects changes.
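The following Java sketch illustrates the monitor-then-re-crawl pattern described above. It uses the JDK's java.nio.file.WatchService as a stand-in; the connector's own Windows change-notification mechanism isn't described on this page, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: uses the JDK WatchService as a stand-in for the
// connector's own change notifications, which this page doesn't describe.
final class ChangeMonitorSketch {

    // Registers one start path for create, modify, and delete events.
    // WatchService does not watch subdirectories recursively.
    static WatchService watch(Path startPath) throws IOException {
        WatchService watcher = startPath.getFileSystem().newWatchService();
        startPath.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
        return watcher;
    }

    // Returns true if a change arrives before the timeout; in this sketch,
    // that is the point where a monitoring connector would schedule a re-crawl.
    static boolean changeReported(WatchService watcher, long timeout, TimeUnit unit)
            throws InterruptedException {
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return false; // no change reported, so nothing to re-crawl
        }
        boolean sawEvent = !key.pollEvents().isEmpty();
        key.reset(); // re-arm the key so later changes are still delivered
        return sawEvent;
    }
}
```

In this model, setting fs.monitorForUpdates to false corresponds to never creating the watcher at all. Because WatchService watches a single directory, covering a whole share this way would require registering every folder.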
DFS access control
Distributed File System (DFS) applies access control to its links, and each DFS link usually has its own ACL. One mechanism that DFS uses is Access-based Enumeration (ABE), which can restrict the DFS links returned to a user. Users might see only a subset of the DFS links, or only one link when ABE isolates hosted home directories. When the connector traverses a DFS system, it respects both the DFS link ACL and the target's share ACL; the share ACL inherits from the DFS ACL.
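To make the ABE behavior concrete, here is a hypothetical Java model in which each DFS link's ACL is a set of principal names, and a user sees only the links whose ACL matches one of the user's principals. Real ACLs also include deny entries and inheritance, which the sketch ignores; the names AbeSketch and visibleLinks are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of Access-based Enumeration: each DFS link maps to the
// set of principals allowed to see it. Deny entries and inheritance are ignored.
final class AbeSketch {

    // Returns, in sorted order, only the links whose ACL includes the user
    // or one of the user's groups.
    static List<String> visibleLinks(Map<String, Set<String>> linkAcls, Set<String> userPrincipals) {
        return linkAcls.entrySet().stream()
                .filter(e -> e.getValue().stream().anyMatch(userPrincipals::contains))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With home directories isolated this way, a user whose only matching principal is their own account would see just their own link.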
Known limitations
- File System: The File Systems connector doesn't support mapped drives or local drives.
- Distributed File System: A mapped drive that points to a UNC DFS path doesn't work correctly. Some ACLs aren't read correctly.
- The File Systems connector supports Distributed File System (DFS) namespaces and links. However, the connector supports DFS links only in a DFS namespace, not the regular folders in the DFS namespace.
- File links returned in cloudsearch.google.com aren't clickable. The file links returned by the Query API aren't clickable in most browsers, either.
System requirements
System requirements | |
---|---|
Operating system | |
Software | |
File system protocols | Not supported: local Windows file systems, Sun Network File System (NFS) 2.0, Sun Network File System (NFS) 3.0, or local Linux file systems. |
Deploy the connector
Prerequisites
Before you deploy the Cloud Search File Systems connector, ensure that your environment has all the following prerequisite components:
Google Workspace information required to establish relationships between Google Cloud Search and the data source:
- Google Workspace private key (which contains the service account ID). For information on obtaining a private key, go to Configure access to the Google Cloud Search REST API.
- Google Workspace data source ID. For information on obtaining a data source ID, go to Add a data source to search.
- An identity source ID. For information about how to get an identity source ID, go to Create an identity source. If you sync your Google Workspace directory with Active Directory, set up the identity source with GCDS.
The Google Workspace admin for your organization can usually get you these credentials.
Ensure that the Windows account that runs the connector has sufficient permissions, as described in the following section.
Required Microsoft Windows account permissions
The Microsoft Windows account that the connector is running under must have sufficient permissions to perform the following actions:
- List the content of folders
- Read the content of documents
- Read attributes of files and folders
- Read permissions (ACLs) for both files and folders
- Write basic attributes
Membership in one of the following groups grants a Windows account the permissions that the connector needs:
- Administrators
- Power Users
- Print Operators
- Server Operators
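Before you deploy, you might verify the account's access by running a small check as that account. The following Java sketch is hypothetical (it isn't shipped with the connector) and exercises the read-side permissions from the list above with standard java.nio calls. It deliberately skips the write-basic-attributes permission, because testing it would modify file timestamps.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pre-flight check, run as the account the connector will use.
// Each entry records whether one read-side operation succeeded.
final class PermissionCheckSketch {

    interface IoAction { void run() throws IOException; }

    static Map<String, Boolean> check(Path folder, Path document) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        // List the content of folders.
        results.put("listFolder", attempt(() -> {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
                entries.iterator().hasNext();
            }
        }));
        // Read the content of documents.
        results.put("readDocument", attempt(() -> {
            try (InputStream in = Files.newInputStream(document)) {
                in.read();
            }
        }));
        // Read attributes of files and folders.
        results.put("readAttributes", attempt(() -> {
            Files.readAttributes(folder, BasicFileAttributes.class);
            Files.readAttributes(document, BasicFileAttributes.class);
        }));
        // Read ACLs. The view is null on file systems without ACL support.
        results.put("readAcl", attempt(() -> {
            AclFileAttributeView view = Files.getFileAttributeView(document, AclFileAttributeView.class);
            if (view == null) {
                throw new IOException("ACL view not supported on this file system");
            }
            view.getAcl();
        }));
        return results;
    }

    // Runs one operation and reports success instead of propagating the failure.
    private static boolean attempt(IoAction action) {
        try {
            action.run();
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }
}
```

Against a real share, you would pass UNC paths such as a folder and a document under one of your fs.src start paths; a false entry points to the permission to review.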
Step 1. Install the Google Cloud Search File Systems connector
Get the connector repository from GitHub and build it.
To use git on the Windows server:
Clone the repository:
> git clone https://github.com/google-cloudsearch/windows-filesystems-connector.git
> cd windows-filesystems-connector
Check out the desired version of the connector:
> git checkout tags/v1-0.0.3
To download from GitHub directly:
- Go to https://github.com/google-cloudsearch/windows-filesystems-connector.
- Click Clone or download, and then click Download ZIP.
- Unzip the package.
- Move to the new directory:
> cd windows-filesystems-connector
Build the connector. If necessary, install Apache Maven.
> mvn package
To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.
Copy the connector zip file to your local installation directory and unzip it:
> cp target/google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip installation-dir
> cd installation-dir
> unzip google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip
> cd google-cloudsearch-windows-filesystems-connector-v1-0.0.3
Step 2. Create the connector configuration file
In the same directory as the connector installation, create a file named connector-config.properties. Add parameters to the file as key/value pairs, as in the following example:
### File system connector configuration ###
# Required parameters for Cloud Search data source and identity source access
api.serviceAccountPrivateKeyFile=/path/to/file.json
api.sourceId=0123456789abcde
api.identitySourceId=a1b1c1234567
# Required parameters for file system access
# Backslashes are escape characters in .properties files, so each backslash in a UNC path is doubled
fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace;\\\\dfshost\\dfsnamespace\\link
# Optional parameters for crawling and file system monitoring
traverse.abortAfterExceptions=500
fs.monitorForUpdates = true
fs.preserveLastAccessTime = IF_ALLOWED
For detailed descriptions of each parameter, go to the configuration parameters reference.
(Optional) Configure other connector parameters, as needed. For details, go to Google-supplied connector parameters.
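As a sanity check on the file from Step 2, the following hypothetical Java sketch loads it with java.util.Properties and lists any of the four required keys from the example that are missing. Reading the file as UTF-8 is an assumption. The sketch also shows why the UNC paths in fs.src use doubled backslashes: Properties treats a backslash as an escape character, so \\\\host\\share in the file becomes \\host\share once loaded.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical sketch: load connector-config.properties and report missing
// required parameters. This is not the connector's own validation.
final class ConfigCheckSketch {

    // Required keys taken from the example configuration file.
    static final List<String> REQUIRED = List.of(
            "api.serviceAccountPrivateKeyFile",
            "api.sourceId",
            "api.identitySourceId",
            "fs.src");

    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        // Properties.load unescapes backslashes, so a doubled backslash
        // in the file becomes a single backslash in the loaded value.
        props.load(reader);
        return props;
    }

    // Assumption: the file is UTF-8 encoded.
    static Properties load(Path file) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return load(reader);
        }
    }

    // Returns the required keys that are absent or blank, in declaration order.
    static List<String> missingRequired(Properties props) {
        return REQUIRED.stream()
                .filter(key -> props.getProperty(key, "").isBlank())
                .collect(Collectors.toList());
    }
}
```

For example, ConfigCheckSketch.missingRequired(ConfigCheckSketch.load(Path.of("connector-config.properties"))) returns an empty list when all four keys are set.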
Step 3. Enable logging
- Create a folder named logs in the directory that contains the connector binary.
- Create an ASCII or UTF-8 file named logging.properties in the same directory and add the following content:
handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = WARNING
com.google.enterprise.cloudsearch.level = INFO
com.google.enterprise.cloudsearch.fs.level = INFO
# Uncomment the following line to raise the logging level and enable API tracing
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-fs.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
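The -Djava.util.logging.config.file option on the launch command makes the JVM read this file at startup. The following Java sketch does the equivalent programmatically with LogManager.readConfiguration, and creates the logs folder first because FileHandler doesn't create missing directories. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;

// Hypothetical sketch of applying logging.properties programmatically.
final class LoggingSetupSketch {

    static void configure(Path loggingProperties, Path logsDir) throws IOException {
        // FileHandler won't create the parent folder of its log pattern,
        // so make sure the logs folder exists before applying the file.
        Files.createDirectories(logsDir);
        try (InputStream in = Files.newInputStream(loggingProperties)) {
            // Resets the current logging configuration, then applies the
            // levels and handlers defined in the file.
            LogManager.getLogManager().readConfiguration(in);
        }
    }
}
```

After configure runs with the file above, Logger.getLogger("com.google.enterprise.cloudsearch.fs").getLevel() reports INFO.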
Step 4. (Optional) Configure media types
By default, the connector tries to detect the media type of each file with JDK-provided media type detection. On Microsoft Windows, the JDK relies on the Windows registry to determine media types for files. A missing registry entry can result in a null media type for certain files.
If necessary, you can specify media types that override any existing bindings or prevent a null media type.
- In the connector directory, create a Latin-1-encoded file named mime-type.properties. Enter file extensions and their corresponding media types, as in the following examples:
xlsx=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
one=application/msonenote
txt=text/plain
pdf=application/pdf
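The following Java sketch shows one way an override file like mime-type.properties could be consulted before falling back to JDK detection with Files.probeContentType. The lookup order, case-insensitive extension matching, and the MediaTypeSketch name are assumptions for illustration, not documented connector behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of an override-then-detect media type lookup.
final class MediaTypeSketch {

    private final Properties overrides = new Properties();

    // Properties.load(InputStream) decodes the stream as ISO 8859-1 (Latin-1),
    // matching the encoding the instructions call for.
    MediaTypeSketch(InputStream mimeTypeProperties) throws IOException {
        overrides.load(mimeTypeProperties);
    }

    Optional<String> mediaType(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            // Assumption: extensions are matched case-insensitively.
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            String override = overrides.getProperty(ext);
            if (override != null) {
                return Optional.of(override);
            }
        }
        // Fall back to JDK detection, which may return null.
        return Optional.ofNullable(Files.probeContentType(file));
    }
}
```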
Step 5. Run the File Systems connector
After you install and configure the File Systems connector, launch it on the host machine with a command like the following:
> java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-windows-filesystems-connector-v1-0.0.3.jar [-Dconfig=my.config]
JVM system properties such as java.util.logging.config.file must appear before -jar; arguments after the JAR file name are passed to the connector instead. Use -Dconfig to specify the configuration file path if it differs from the default, which is a file named connector-config.properties in the same directory as the binary.
Configuration parameters reference
Data source access
Setting | Parameter |
Data source ID | api.sourceId=1234567890abcdef
Required. The Google Cloud Search source ID set up by the Google Workspace administrator. |
Path to the service account private key file | api.serviceAccountPrivateKeyFile=./PrivateKey.json
Required. The Google Cloud Search service account key file that the File Systems connector uses to access Cloud Search. |
Identity source ID | api.identitySourceId=x0987654321
Required. The Cloud Search identity source ID set up by the Google Workspace administrator for syncing Active Directory identities with GCDS. |
File system access
Setting | Parameter |
Source file systems | fs.src=path1[;path2;...]
Required. Specify source file systems as one or more UNC sources that are separated by the delimiter configured by |
Path separator character
Setting | Parameter |
Path separator character | fs.src.separator=separator-character
The default separator is ";". If your source paths contain semicolons, you can set a different delimiter, such as a comma (","), that does not conflict with characters in your paths and isn't reserved by property file syntax itself. If the |
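The following hypothetical Java sketch splits an fs.src value into start paths using the fs.src.separator character, with ";" as the default. Trimming whitespace and ignoring empty entries are assumptions; the documented behavior covers only the delimiter itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: split fs.src on fs.src.separator (default ";").
// Trimming and dropping empty entries are assumptions.
final class SourceListSketch {

    static final String DEFAULT_SEPARATOR = ";";

    static List<String> startPaths(String fsSrc, String separator) {
        String sep = (separator == null || separator.isEmpty()) ? DEFAULT_SEPARATOR : separator;
        // Pattern.quote keeps separators such as "|" or "." from being
        // interpreted as regular-expression syntax.
        return Arrays.stream(fsSrc.split(Pattern.quote(sep)))
                .map(String::trim)
                .filter(path -> !path.isEmpty())
                .collect(Collectors.toList());
    }
}
```

With fs.src.separator set to a comma, a path that contains a semicolon stays intact as a single start path.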
Connector behavior
Setting | Parameter |
Windows domain | fs.supportedDomain=domain
Required to let users who are set up with GCDS access documents through Cloud Search. Specify as a single NetBIOS domain name of the Active Directory. |
Include accounts in ACLs | fs.supportedAccounts=account-1[, account-2,...]
A comma-delimited list of accounts to include in ACLs regardless of whether they are built-in accounts. The default value is |
Exclude built-in accounts from ACLs | fs.builtinGroupPrefix=prefix
Specify the prefix of built-in accounts. An account that starts with this prefix is considered a built-in account and will be excluded from the ACLs. The default value is |
Allow indexing of hidden files and folders | fs.crawlHiddenFiles=boolean
Set to |
Allow indexing of crawled folder listings and DFS Namespace enumerations | fs.indexFolders=boolean
When set to |
Enable file system change monitoring | fs.monitorForUpdates=boolean
When set to |
Set the maximum size of the cache of directories | fs.directoryCacheSize=number-of-entries
The maximum size of the directory cache. The connector uses the cache to identify hidden folders to avoid indexing files and folders in hidden folders. The default is 50,000 entries, which typically consume 10–15 megabytes of RAM. |
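To illustrate what a bounded directory cache like fs.directoryCacheSize implies, the following Java sketch uses a LinkedHashMap in access order with least-recently-used eviction. The eviction policy, the Boolean "is hidden" value type, and the class name are assumptions; this page doesn't describe the connector's cache internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded directory cache with least-recently-used eviction.
final class DirectoryCacheSketch<V> extends LinkedHashMap<String, V> {

    private final int maxEntries;

    DirectoryCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // access order: each read moves the entry to the end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Evict the least recently used directory once the limit is exceeded.
        return size() > maxEntries;
    }
}
```

For example, new DirectoryCacheSketch<Boolean>(50_000) would mirror the documented default size, mapping each folder path to whether it is hidden.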
Timestamp preservation and crawl control
Setting | Parameter |
Preserve last-access timestamp | fs.preserveLastAccessTime=value
When the connector crawls files and folders, the connector can change the last access timestamp of the files and folders to the time of the crawl. When last access times aren't preserved, backup and archive systems might not move appropriate files and folders to secondary storage because of the connector's visit. By default, the connector attempts to preserve the last access time ( Possible values:
|
Crawl only files that were accessed after a certain date | fs.lastAccessedDate=YYYY-MM-DD
Crawl content only if the last access time is after the specified date. The default value is Specify the date in ISO 8601 date format: YYYY-MM-DD. For example, if the value is 2010-01-01, the connector only crawls content that was accessed after the beginning of 2010. If you specify |
Crawl only files that were accessed within the past number of days | fs.lastAccessedDays=number-of-days
Crawl content only if the last access time is within the number of days before present. The default value is Use this property to expire previously indexed content that has not been accessed in a while. For example, set to 365 to crawl content only if it was accessed in the last year. If you specify |
Crawl only files that were modified after a certain date | fs.lastModifiedDate=YYYY-MM-DD
Crawl content only if the last modified time is after the specified date. The default value is Specify the date in ISO 8601 date format: YYYY-MM-DD. For example, if the value is 2010-01-01, the connector only crawls content that was modified after the beginning of 2010. If you specify |
Crawl only files that were modified within the past number of days | fs.lastModifiedDays=number-of-days
Crawl content only if the last modification time is within the number of days before present. The default value is Use this property to expire previously indexed content that has not been modified in a while. For example, set to 365 to crawl content only if it was modified in the last year. If you specify |
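The following Java sketch expresses the date-based and age-based filters as predicates over a file timestamp. Interpreting the date at the start of the day in a given time zone, and the method names, are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical predicates for the crawl filters. The time zone used to
// interpret the date is a caller-supplied assumption.
final class CrawlFilterSketch {

    // fs.lastAccessedDate / fs.lastModifiedDate: crawl only if the
    // timestamp is after the start of the given date.
    static boolean afterDate(Instant timestamp, LocalDate date, ZoneId zone) {
        return timestamp.isAfter(date.atStartOfDay(zone).toInstant());
    }

    // fs.lastAccessedDays / fs.lastModifiedDays: crawl only if the
    // timestamp falls within the given number of days before now.
    static boolean withinDays(Instant timestamp, long days, Instant now) {
        return !timestamp.isBefore(now.minus(Duration.ofDays(days)));
    }
}
```

In practice, the timestamps could come from Files.readAttributes(path, BasicFileAttributes.class), through lastAccessTime().toInstant() or lastModifiedTime().toInstant().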
Skip file share access control
By default, the connector preserves access control integrity when it sends access control lists (ACLs) to the indexing API, including the ACLs on the file share. In some configurations, however, the connector might not have sufficient permissions to read the share ACL. In those cases, files on that share don't appear in search results.
You can set the connector to ignore the share ACL so that content on the share can appear in search results. In this case, the indexing API receives a completely permissive share ACL instead of the actual share ACL.
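The effect of fs.skipShareAccessControl can be modeled as follows. In this hypothetical Java sketch, a user can see a file only if both the share ACL and the file's own ACL allow it, and skipping share access control replaces the share check with a permit-all check. That file-level ACLs still apply is an assumption of the model; deny entries and inheritance are ignored.

```java
import java.util.Set;

// Hypothetical model of share ACL and file ACL evaluation. Principals are
// plain strings; deny entries and inheritance are ignored.
final class ShareAclSketch {

    // With skipShareAccessControl, the share check is replaced by a
    // completely permissive one that every principal passes.
    private static boolean shareAllows(Set<String> shareAcl, String principal,
                                       boolean skipShareAccessControl) {
        return skipShareAccessControl || shareAcl.contains(principal);
    }

    // Assumption: a user sees a file only if both the share ACL and the
    // file's own ACL allow it.
    static boolean canSee(String principal, Set<String> shareAcl, Set<String> fileAcl,
                          boolean skipShareAccessControl) {
        return shareAllows(shareAcl, principal, skipShareAccessControl)
                && fileAcl.contains(principal);
    }
}
```

Passing an empty share ACL mimics a share whose ACL the connector couldn't read: without the setting nobody sees the file, and with it, access depends only on the file ACL.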
Setting | Parameter |
Skip file share access control | fs.skipShareAccessControl=boolean
Set to |