Fusion 2.4.5 and above support security trimming for Google Drive datasources. The high-level steps are as follows:
- Before indexing, you must enable the Google Drive API and Admin SDK, then create a Google service account with appropriate permissions.
- Configure a Google Drive datasource in Fusion using the Google service account ID, email, and private key file.
- Configure the Security Trimming stage in Fusion's query pipeline to use Google Drive metadata.
Before indexing your Google Drive data in Fusion, you must set up the Google API using the Google Admin Console and Developers Console.
- Go to https://console.developers.google.com/.
- Click the Create Project button.
- Enter a new project name, such as "fusion".
- Select the new project, if it isn't selected already.
- Navigate to Library > Drive API.
- Enable the Google Drive API by clicking the Enable button.
- Navigate to Library > Admin SDK.
- Again, enable the API by clicking the Enable button.
- Next, create credentials. Navigate to Credentials > Create Credentials > Service account key.
- Select the P12 key option and save it.
- Click Manage service accounts, then Create Service Account.
- Click Enable Google Apps Domain-wide Delegation, then Save.
- Click the View Client ID link for your service account.
- Copy the Client ID and Service account. Save them in a convenient place.
- Click the menu in the upper left and switch from the API Manager to IAM & Admin.
- Select the fusioncrawler project and set its permissions to Owner and Service Account Actor.
- Go to https://admin.google.com.
- Navigate to Admin Console > Security.
You must be logged in to an administrator account; see Google support for help.
- Go to Show more > Advanced settings > Manage API client access.
- Create a new API client, where Client Name is the Client ID from your Service Account (above) and One or More API Scopes is as follows:
About authentication types for Google Drive
It is important to understand the difference between the authentication types for Google Drive and what each one means for crawling.
- Use OAuth 2.0 Authentication for server-to-server interactions. This is the recommended method for Fusion. See Google support for instructions.
- Use OAuth Authentication when you only want to crawl the items that a particular user has access to. This is especially useful if you have a diagnostics share group. See Google support for instructions.
Configure the Google Drive datasource in Fusion
- Navigate to Home > Datasources > Add and select Google Drive.
- Configure the fields as follows:
- StartLinks can be set to root if you want to search all documents available to the user.
- Service Account ID and P12 Private Key File - Specify these if you want to use Service Account authentication. If these are specified, then the Google account client ID, refresh token and client secret will be ignored.
- Service Account Email - This must be an actual Service Account Actor that has been granted domain-wide delegation for your service account.
- Google Account Client ID, Refresh Token, and Client Secret - Specify these if you want to use Google Account authentication.
- Security Trimming: Select include to expose the following fields:
- Apply Group Security Filtering - When selected, Fusion calls the Google Admin SDK to get the groups for each user, so that the user’s groups can be used during the security trimming stage. This is only available if you use Service Account authentication.
- Default domain for Google Drive - If a user logs in with a username that isn’t of the form user@domain, then this default domain is used as the user’s domain during security trimming.
- Click Save.
- Click Start Crawl to index your Google Drive data.
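The credential precedence and the default-domain rule described in the field list above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Fusion's implementation; the dictionary keys (`service_account_id`, `p12_key_file`, `client_id`, `refresh_token`, `client_secret`) and both function names are hypothetical and do not correspond to actual Fusion property names.

```python
def pick_auth_mode(config):
    """Return which credential set would apply under the precedence rule above.

    The keys used here are hypothetical stand-ins for the datasource fields.
    """
    # Service account credentials take priority: when both are present,
    # the Google account client ID, refresh token, and client secret are ignored.
    if config.get("service_account_id") and config.get("p12_key_file"):
        return "service_account"
    # Otherwise all three Google account values are needed.
    if all(config.get(k) for k in ("client_id", "refresh_token", "client_secret")):
        return "google_account"
    return "incomplete"


def qualify_username(username, default_domain):
    """Apply the default domain when the username is not of the form user@domain."""
    if "@" in username:
        return username
    return f"{username}@{default_domain}"


if __name__ == "__main__":
    both = {
        "service_account_id": "crawler-id",
        "p12_key_file": "key.p12",
        "client_id": "abc",
        "refresh_token": "r",
        "client_secret": "s",
    }
    print(pick_auth_mode(both))
    print(qualify_username("alice", "example.com"))
    print(qualify_username("bob@other.org", "example.com"))
```

A check like this can be useful before saving the datasource, to confirm which set of credentials you expect to take effect.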
Set up the Security Trimming Stage
- Navigate to Home > Query Pipelines and select the pipeline that corresponds to your Google Drive datasource.
- Click Add a new pipeline stage and select Security Trimming.
- Set User ID source to "header".
- Set User ID key to "Fusion-User-Name".
- Click Save.
- In the list of query pipeline stages, drag the Security Trimming stage down until it is immediately before the Query Solr stage.
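With the stage configured this way, the user identity is read from the `Fusion-User-Name` request header. The following Python sketch shows one way a client might attach that header to a query, and how the stage ordering from the last step could be checked. The endpoint URL in the usage example is a placeholder and an assumption (substitute the select URL of your own query pipeline), the function names are hypothetical, and authentication to Fusion itself is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(select_url, query, username):
    """Build (but do not send) a query request that carries the user ID header."""
    url = f"{select_url}?{urlencode({'q': query})}"
    req = Request(url)
    # The header name matches the "User ID key" set on the Security Trimming stage.
    req.add_header("Fusion-User-Name", username)
    return req


def trimming_precedes_solr(stage_names):
    """Return True if Security Trimming sits immediately before Query Solr."""
    if "Security Trimming" not in stage_names or "Query Solr" not in stage_names:
        return False
    return stage_names.index("Security Trimming") + 1 == stage_names.index("Query Solr")


if __name__ == "__main__":
    # Placeholder URL: an assumption for illustration only.
    req = build_query_request(
        "http://localhost:8764/your-pipeline/select",
        "quarterly report",
        "alice@example.com",
    )
    print(req.full_url)
    print(req.header_items())
    print(trimming_precedes_solr(["Facets", "Security Trimming", "Query Solr"]))
```

Note that `urllib` normalizes the capitalization of stored header names; HTTP header names are case-insensitive, so this does not change what the server receives semantically.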
From now on, you’ll see security trimming based on the logged-in user’s three fields:
If security trimming isn't working, check for the following:
- If group permissions are not configured correctly, the Google API may return permission errors.
- Check whether the username from the query stage matches the ACL field you created during indexing.
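When checking the username against indexed ACL values, a small helper can make the common kinds of mismatch easier to spot. The sketch below is a debugging aid only: it assumes you have already copied the ACL values for a document out of the index into a list, and it makes no claim about how Fusion itself compares values. The function name and the result labels are hypothetical.

```python
def diagnose_username(query_user, acl_values):
    """Classify how a query-time username relates to a document's ACL values."""
    if query_user in acl_values:
        return "match"
    # Same characters but different letter case.
    if query_user.lower() in {v.lower() for v in acl_values}:
        return "case mismatch"
    # Username sent without a domain while the ACL stores user@domain.
    if "@" not in query_user:
        local_parts = {v.split("@", 1)[0] for v in acl_values if "@" in v}
        if query_user in local_parts:
            return "missing domain"
    return "no match"


if __name__ == "__main__":
    acl = ["alice@example.com", "engineering@example.com"]
    for user in ("alice@example.com", "Alice@Example.com", "alice", "carol"):
        print(user, "->", diagnose_username(user, acl))
```

A "missing domain" result suggests reviewing the Default domain for Google Drive setting on the datasource.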