How to configure Google Drive for security trimming

Fusion 2.4.5 and above support security trimming for Google Drive datasources.  The high-level steps are as follows:

  1. Before indexing, enable the Google Drive API and the Admin SDK, then create a Google service account with appropriate permissions.
  2. Configure a Google Drive datasource in Fusion using the Google service account ID, email, and private key file.
  3. Configure the Security Trimming stage in Fusion's query pipeline to use Google Drive metadata.

Before indexing

Before indexing your Google Drive data in Fusion, you must set up the Google API using the Google Admin Console and Developers Console.

  1. Go to https://console.developers.google.com/.
  2. Click the Create Project button.
  3. Enter a new project name, such as "fusion".
  4. Select the new project, if it isn't selected already: 
    Screen_Shot_2017-03-15_at_9.47.53_AM.png
  5. Navigate to Library > Drive API.
  6. Enable the Google Drive API by clicking the Enable button.
  7. Navigate to Library > Admin SDK.
  8. Again, enable the API by clicking the Enable button.
  9. Next, create credentials. Navigate to Credentials > Create Credentials > Service account key.
  10. Select the P12 key option and save the key file.
  11. Click Manage service accounts, then Create Service Account.
  12. Click Enable Google Apps Domain-wide Delegation, then Save:
    Screen_Shot_2017-03-15_at_1.20.49_PM.png
  13. Click the View Client ID link for your service account:
    Screen_Shot_2017-03-15_at_3.39.42_PM.png
  14. Copy the Client ID and Service account.  Save them in a convenient place.
  15. Click the menu in the upper left and switch from the API Manager to IAM & Admin.
  16. Select the fusioncrawler project and assign it the Owner and Service Account Actor roles, as shown below:
    Screen_Shot_2017-03-15_at_3.54.08_PM.png
  17. Go to https://admin.google.com.
  18. Navigate to Admin Console > Security.
    You must be logged in to an administrator account; see Google support for help.
  19. Go to Show more > Advanced settings > Manage API client access.
  20. Create a new API client, where Client Name is the Client ID from your Service Account (above) and One or More API Scopes is as follows:

https://www.googleapis.com/auth/admin.directory.group,https://www.googleapis.com/auth/admin.directory.group.readonly,https://www.googleapis.com/auth/admin.directory.user,https://www.googleapis.com/auth/admin.directory.user.alias.readonly,https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/drive.readonly
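
The scope list above is long enough that copy-and-paste mistakes are easy to make. The following is a minimal Python sketch for assembling the same value from a list and checking it before pasting it into the One or More API Scopes field. The names `SCOPES` and `scope_string` are illustrative and are not part of Fusion or any Google tool; the comma-separated, no-whitespace format simply mirrors the value shown above.

```python
# Scopes copied from the step above. SCOPES and scope_string are
# illustrative names, not part of Fusion or Google tooling.
SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.group",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user",
    "https://www.googleapis.com/auth/admin.directory.user.alias.readonly",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/drive.readonly",
]


def scope_string(scopes):
    """Join scopes with commas, rejecting empty, whitespace-bearing, or duplicate entries."""
    seen = set()
    for scope in scopes:
        if not scope or any(ch.isspace() for ch in scope):
            raise ValueError("empty scope or scope containing whitespace: %r" % scope)
        if scope in seen:
            raise ValueError("duplicate scope: %r" % scope)
        seen.add(scope)
    return ",".join(scopes)


if __name__ == "__main__":
    # Print the value to paste into the admin console field.
    print(scope_string(SCOPES))
```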

About authentication types for Google Drive

It is important to understand the difference between the authentication types for Google Drive and what each one means for crawling.

  • Use OAuth 2.0 service account authentication for server-to-server interactions. This is the recommended method for Fusion. See Google support for instructions.
  • Use OAuth authentication with a Google account when you want to crawl only the items that a particular user can access. This is especially useful if you have a diagnostics share group. See Google support for instructions.

Configure the Google Drive datasource in Fusion

To configure the Google Drive datasource in Fusion:

  1. Navigate to Home > Datasources > Add and select Google Drive.
  2. Configure the fields as follows:
    • StartLinks can be set to root if you want to search all documents available to the user.
    • Service Account ID and P12 Private Key File - Specify these if you want to use Service Account authentication. If these are specified, then the Google account client ID, refresh token and client secret will be ignored.
    • Service Account Email - This must be the email address of an actual Service Account Actor that has been granted domain-wide authority for your service account.
    • Google Account Client ID, Refresh Token, and Client Secret - Specify these if you want to use Google Account authentication.
    • Security Trimming: Select include to expose the following fields:
      • Apply Group Security Filtering - When selected, Fusion calls the Google Admin SDK to get the groups for each user, so that the user’s groups can be used during the security trimming stage. This is only available if you use Service Account authentication.
      • Default domain for Google Drive - If a user logs in with a username that isn’t of the form user@domain, then this default domain is used as the user’s domain during security trimming.
  3. Click Save.
  4. Click Start Crawl to index your Google Drive data.
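
Two of the rules above can be easier to follow in code: service account credentials take precedence over Google account credentials, and the default domain is applied to usernames that are not of the form user@domain. The Python sketch below only illustrates those two rules. The function names and the configuration keys (`service_account_id`, `p12_key_file`, `client_id`, `refresh_token`, `client_secret`) are hypothetical and are not Fusion's actual property names or implementation.

```python
from typing import Dict, Tuple


def auth_mode(config: Dict[str, str]) -> str:
    """Return which credentials take effect under the precedence rule above.

    The dictionary keys are hypothetical stand-ins for the datasource fields.
    """
    # Service account credentials win whenever both values are present.
    if config.get("service_account_id") and config.get("p12_key_file"):
        return "service_account"
    needed = ("client_id", "refresh_token", "client_secret")
    if all(config.get(key) for key in needed):
        return "google_account"
    raise ValueError("incomplete credentials for either authentication type")


def resolve_user(username: str, default_domain: str) -> Tuple[str, str]:
    """Split a login name into (user, domain), falling back to default_domain."""
    user, sep, domain = username.partition("@")
    if sep and user and domain:
        return user, domain
    # Not of the form user@domain: keep the name and use the default domain.
    return username, default_domain
```

For example, `resolve_user("joe.testerson", "lucidworks.com")` and `resolve_user("joe.testerson@lucidworks.com", "example.com")` both yield the user `joe.testerson` in the domain `lucidworks.com`.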

Set up the Security Trimming Stage 

  1. Navigate to Home > Query Pipelines and select the pipeline that corresponds to your Google Drive datasource.
  2. Click Add a new pipeline stage and select Security Trimming.
  3. Set User ID source to "header".
  4. Set User ID key to "Fusion-User-Name".
  5. Click Save.
  6. In the list of query pipeline stages, drag the Security Trimming stage down until it is immediately before the Query Solr stage.
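
With the stage configured this way, the user ID is read from the Fusion-User-Name request header. The following Python sketch shows how a client might build such a request using only the standard library. The URL path, host, port, and the pipeline and collection names are assumptions for illustration and should be checked against your Fusion version's API documentation; authentication to Fusion itself is omitted, and the request is only constructed, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_query_request(base_url, pipeline, collection, query, user):
    """Build (but do not send) a query request carrying the user ID header.

    The path layout below is an assumed Fusion query-pipeline endpoint;
    verify it against your Fusion version's API documentation.
    """
    params = urlencode({"q": query})
    url = "%s/api/apollo/query-pipelines/%s/collections/%s/select?%s" % (
        base_url.rstrip("/"),
        quote(pipeline, safe=""),
        quote(collection, safe=""),
        params,
    )
    # The header name matches the User ID key set in the Security Trimming stage.
    return Request(url, headers={"Fusion-User-Name": user})


if __name__ == "__main__":
    # Hypothetical host, pipeline, and collection names.
    req = build_query_request(
        "http://localhost:8764", "gdrive-default", "gdrive", "report", "joe.testerson"
    )
    print(req.full_url)
    print(req.header_items())
```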

From now on, query results are security-trimmed based on the following three fields for the logged-in user:

Field name    Field type    Example
acl_user      String        joe.testerson
acl_domain    String        lucidworks.com
acl_group     String        engineering
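
As a conceptual illustration only, the Python sketch below shows one way to think about how the three fields could decide whether a document is visible to a user: the document matches if it lists the user, the user's domain, or any of the user's groups. This is an assumption made for explanation and is not Fusion's actual trimming logic; the function name and the document layout (a dictionary of field name to list of values) are hypothetical.

```python
from typing import Dict, Iterable, List


def is_visible(doc: Dict[str, List[str]], user: str, domain: str,
               groups: Iterable[str]) -> bool:
    """Return True if any ACL field on the document matches the user.

    Conceptual model only; the real Security Trimming stage may combine
    these fields differently.
    """
    if user in doc.get("acl_user", []):
        return True
    if domain in doc.get("acl_domain", []):
        return True
    doc_groups = doc.get("acl_group", [])
    return any(group in doc_groups for group in groups)


if __name__ == "__main__":
    doc = {"acl_user": ["joe.testerson"], "acl_group": ["engineering"]}
    print(is_visible(doc, "joe.testerson", "lucidworks.com", []))
    print(is_visible(doc, "someone.else", "example.com", ["sales"]))
```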


Common Issues

If security trimming isn't working, check for the following:

  • Group permissions may not be configured correctly, in which case the Google API may return permissions errors.
  • The username passed in from the query stage may not match the ACL fields created during indexing.