Title: | Search Data Frames for Personally Identifiable Information |
---|---|
Description: | Check a data frame for personal information, including names, location, disability status, and geo-coordinates. |
Authors: | Jacob Patterson-Stein [aut, cre] |
Maintainer: | Jacob Patterson-Stein <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2025-02-10 06:30:10 UTC |
Source: | https://github.com/jacobpstein/pii |
Search Data Frames for Personally Identifiable Information
check_PII(df)
| Argument | Description |
|---|---|
| `df` | A data frame object. |
Returns a data frame listing the columns that potentially contain PII.
```r
library(pii)

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
  lat = c(40.7128, 34.0522, 41.8781),
  long = c(-74.0060, -118.2437, -87.6298),
  first_name = c("John", "Michael", "Linda"),
  phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
  age = sample(30:60, 3, replace = TRUE),
  email = c("[email protected]", "[email protected]", "[email protected]"),
  disabled = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# list the columns flagged as potential PII
check_PII(pii_df)
```
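This manual does not describe how `check_PII()` decides which columns to flag. Purely as an illustration, the base-R sketch below shows one common approach: matching column names against PII keywords and matching column values against email and phone-number patterns. The function name `flag_pii_columns()`, the keyword list, the regular expressions, and the `column`/`reason` output layout are all assumptions made for this example; they are not the package's actual rules or return format.

```r
# Hypothetical sketch: NOT the pii package's implementation.
# Flags a column if its name matches an (assumed) PII keyword, or if any
# of its values look like an email address or a ###-###-#### phone number.
flag_pii_columns <- function(df) {
  # Illustrative keyword list for column names (assumption)
  name_pattern <- "name|phone|email|lat|long|address|zip|disab|age"
  email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[[:alpha:]]+$"
  phone_pattern <- "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"

  # One reason string per column, or NA when nothing matched
  reasons <- vapply(names(df), function(col) {
    values <- as.character(df[[col]])
    if (grepl(name_pattern, col, ignore.case = TRUE)) {
      "column name matches a PII keyword"
    } else if (any(grepl(email_pattern, values))) {
      "values look like email addresses"
    } else if (any(grepl(phone_pattern, values))) {
      "values look like phone numbers"
    } else {
      NA_character_
    }
  }, character(1))

  # Keep only the flagged columns
  flagged <- !is.na(reasons)
  data.frame(
    column = names(df)[flagged],
    reason = unname(reasons[flagged]),
    stringsAsFactors = FALSE
  )
}

# Usage: "first_name" is flagged by name, "score" is not flagged
flag_pii_columns(data.frame(first_name = c("Ann", "Bo"), score = c(1, 2)))
```

Keyword matching on names is cheap but noisy (for example, "age" also matches "message"), which is why a value-based check is layered on top in this sketch.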
Split Data Into PII and Non-PII Columns
split_PII_data(df, exclude_columns = NULL)
| Argument | Description |
|---|---|
| `df` | A data frame object. |
| `exclude_columns` | Columns to exclude from the data frame split. |
Assigns two data frames to the global environment: one containing the PII columns and one containing the remaining, non-PII columns. A unique merge key is created so the two data frames can be joined back together. The function then prints the flagged and split columns to the console.
```r
library(pii)

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
  lat = c(40.7128, 34.0522, 41.8781),
  long = c(-74.0060, -118.2437, -87.6298),
  first_name = c("John", "Michael", "Linda"),
  phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
  age = sample(30:60, 3, replace = TRUE),
  email = c("[email protected]", "[email protected]", "[email protected]"),
  disabled = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# split the PII columns from the rest, excluding "phone" from the split
split_PII_data(pii_df, exclude_columns = c("phone"))
```
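Because `split_PII_data()` links its two outputs with a merge key, the pieces can be recombined with an ordinary join. The base-R sketch below illustrates that split-and-rejoin idea under several assumptions: the helper `split_with_key()` is hypothetical, the key is assumed to be a simple row index named `merge_key`, columns in `exclude_columns` are assumed to stay with the non-PII data, and the parts are returned in a list rather than assigned to the global environment as the package function does.

```r
# Hypothetical sketch: NOT the pii package's implementation.
# Splits df into PII and non-PII parts that share a generated key.
split_with_key <- function(df, pii_cols, exclude_columns = NULL) {
  # Assumption: excluded columns are not moved into the PII part
  pii_cols <- setdiff(pii_cols, exclude_columns)
  # Assumption: the merge key is just the row index
  df$merge_key <- seq_len(nrow(df))
  list(
    pii = df[, c("merge_key", pii_cols), drop = FALSE],
    non_pii = df[, setdiff(names(df), pii_cols), drop = FALSE]
  )
}

example_df <- data.frame(
  first_name = c("John", "Linda"),
  phone = c("123-456-7890", "345-678-9012"),
  score = c(10, 20),
  stringsAsFactors = FALSE
)

parts <- split_with_key(example_df,
                        pii_cols = c("first_name", "phone"),
                        exclude_columns = "phone")

# Rejoin the two parts on the shared key
rejoined <- merge(parts$non_pii, parts$pii, by = "merge_key")
```

Returning a list keeps the split free of side effects, which makes it easy to test; with the real function, the two data frames it creates would be rejoined in the same way on the key column it adds.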